psycopg2 incorrectly casting intervals from Redshift #558

Closed
tjhiggins opened this Issue May 23, 2017 · 17 comments


tjhiggins commented May 23, 2017

This doesn't appear to be an issue with standard PostgreSQL: I tried to replicate the scenario by adding a new test case, but it passed. As far as I know it is only an issue with Redshift.

SELECT interval '1 day'

Correctly = timedelta(days=1)

SELECT interval '1 day' from {table}

Incorrectly = timedelta(days=0) in psycopg2==2.6.1
It throws an overflow exception in psycopg2==2.7.1

For whatever reason the value passed to the psycopg2.extensions.INTERVAL typecaster is 1 day without selecting from a table, but if you select from any table the value is 86400000000, i.e. 1 day in microseconds (8.64e+10).

My current hack, which is probably wrong in other ways, was to add a new_type to override the INTERVAL casting:

from datetime import timedelta

import psycopg2
import psycopg2.extensions

def cast_redshift_interval(value, curs, _interval=psycopg2.extensions.INTERVAL):
    # https://github.com/psycopg/psycopg2/issues/558
    try:
        casted_value = _interval(value, curs)
    except OverflowError as error:
        # psycopg2==2.7.x incorrectly throws an OverflowError if you select an interval from a table
        try:
            return timedelta(microseconds=float(value))
        except ValueError:
            raise error

    # psycopg2==2.6.x incorrectly zeros out intervals if you select an interval from a table
    if isinstance(casted_value, timedelta) and casted_value.total_seconds() == 0:
        try:
            return timedelta(microseconds=float(value))
        except ValueError:
            return casted_value
    return casted_value

...

redshift_interval = psycopg2.extensions.new_type(
    psycopg2.extensions.INTERVAL.values,
    str('REDSHIFTINTERVAL'),
    cast_redshift_interval)
psycopg2.extensions.register_type(redshift_interval, conn)

dvarrazzo commented Jun 8, 2017

Hi, sorry, I didn't really notice this bug appearing.

What is the text representation of the case that fails in psycopg 2.7.1? I.e. if you run:

SELECT (interval '1 day')::text from {table}

what is the result?

Thank you: if we get a response quickly enough I'll try to fix this and release in a few days. Tentatively added to milestone 2.7.2.

dvarrazzo added this to the psycopg 2.7.2 milestone Jun 8, 2017


tjhiggins commented Jun 8, 2017

@dvarrazzo Interestingly SELECT (interval '1 day')::text from {table} returns 1 day.
However, SELECT interval '1 day' from {table} returns 86400000000.
That is 1 day in microseconds. That's at least the value that gets passed to my interval cast override cast_redshift_interval.


dvarrazzo commented Jun 8, 2017

That's helpful, thank you.


tjhiggins commented Jun 8, 2017

@dvarrazzo Shoot, I'm sorry, I was still on psycopg 2.6.1.
In 2.7.1:
SELECT interval '1 day' from {table} throws an overflow error

Edit: Huh, I just upgraded and it doesn't throw an overflow error, but it still has the same issue as 2.6.1 where it returns timedelta(0).


dvarrazzo commented Jun 8, 2017

I will try and check that upon input 86400000000 the output is 1 day.


dvarrazzo commented Jun 16, 2017

Hi @tjhiggins, I should have fixed the problem in branch bug-558. Could you please test whether Redshift can also return values smaller than 1 day as microseconds? E.g. if you select

SELECT interval '1 second' from {table}

would it return 1000000?

Let me know, thank you.


tjhiggins commented Jun 16, 2017

@dvarrazzo Nice job! It appears to be fixed. Sorry, I should have provided more examples earlier, but it's not particular to 1 day. However, all of the following now return the correct timedelta:
select '1 second'::interval from {table} = 1000000 = 1 second timedelta
select '10 seconds'::interval from {table} = 10000000 = 10 seconds timedelta
select '15 hours'::interval from {table} = 54000000000 = 15 hours timedelta
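These microsecond counts decode to the expected timedeltas with plain stdlib arithmetic, independent of psycopg2 (a quick sanity check, not part of the fix itself):

```python
from datetime import timedelta

# The raw microsecond counts Redshift returns when an interval is
# selected from a table, paired with the timedelta each should map to.
cases = [
    (1000000, timedelta(seconds=1)),
    (10000000, timedelta(seconds=10)),
    (54000000000, timedelta(hours=15)),
]

for micros, expected in cases:
    assert timedelta(microseconds=micros) == expected
```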

Thanks again! I look forward to upgrading.


dvarrazzo commented Jun 16, 2017

Ok cool: I just wanted to understand in which cases we should follow the path of parsing a number as a count of microseconds. From your first example it seemed only necessary in the overflowing path, but I'd say we will do that every time we don't find any indicator of the value being an interval (e.g. if we don't find ':' or day/month).

Of course I assume negative intervals are generated too, and are represented as negative numbers...
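A rough sketch of that heuristic, with a hypothetical function name (this is not psycopg2's actual parser, just an illustration of the rule described above):

```python
from datetime import timedelta

def cast_interval_or_micros(value):
    # Hypothetical sketch: if the text contains no interval indicator
    # (a ':' from the time part, or a letter from a unit word such as
    # 'day' or 'mon'), treat it as a bare count of microseconds, as
    # Redshift emits; otherwise defer to the normal interval parser.
    if ':' in value or any(c.isalpha() for c in value):
        raise NotImplementedError('fall through to the regular interval parser')
    return timedelta(microseconds=int(value))
```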


tjhiggins commented Jun 16, 2017

Good point. I just checked and negative intervals selected from tables are negative numbers.
select '-1 day'::interval from {table} = -86400000000
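For what it's worth, the negative count round-trips through timedelta the same way as the positive one (stdlib only):

```python
from datetime import timedelta

# -86400000000 microseconds is exactly minus one day.
assert timedelta(microseconds=-86400000000) == timedelta(days=-1)
```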


dvarrazzo commented Jun 16, 2017

Ok, it seems the tests in the test suite should cover all these base cases then.

However I find it so surprising that select '-1 day'::interval returns one thing and select '-1 day'::interval from {table} returns another... Is that even Postgres? It goes against the grain of everything I know. Are you sure there isn't anything obviously wrong?

What oid does cur.description contain? 1186 in both cases?


dvarrazzo commented Jun 16, 2017

Actually here it says intervals are not even supported...

tjhiggins commented Jun 16, 2017

cur.description returns the same thing for both, I think:
Column(name='interval', type_code=1186, display_size=None, internal_size=12, precision=None, scale=None, null_ok=None)


tjhiggins commented Jun 16, 2017

I guess they implemented their own version. Now that you mention this I do see the following note:
You cannot use the INTERVAL data type for columns in Amazon Redshift tables.
That might explain why it oddly returns microseconds, since the column cannot be an interval? Not sure where that leaves us.


dvarrazzo commented Jun 16, 2017

Yes, I was reading exactly that. It seems that intervals exist in the type system, you can create literals but you cannot store them.

The fact that the representation is different depending on whether a table is involved seems more like a bug on their side... However, if it doesn't create ambiguities, I'm ok with releasing the adapter with this change. Please note, though, that if some unexpected incompatibility should arise, this support will be immediately dropped in favour of what is needed by Postgres instead. Looking at the Postgres docs, this shouldn't be the case though.

On a related note, I just discovered we don't correctly parse literals with mixed signs... oh my.


tjhiggins commented Jun 16, 2017

Cool, makes sense. I'm okay with the possibility of it being removed from the adapter. I do have a hack that works, and it does seem super particular to Redshift. I wouldn't be sad if you didn't include it at all at this point. I am glad it uncovered an unrelated error, so at least not all is lost!

dvarrazzo closed this in 315f728 Jun 16, 2017


dvarrazzo commented Jun 16, 2017

No, at that point I prefer to include it :) Merged to master, to be released in 2.7.2.
