to_datetime returns NaT for epoch with errors='coerce' #11758

Closed
bhy opened this Issue Dec 4, 2015 · 9 comments

@bhy

bhy commented Dec 4, 2015

xref #11760

Parsing unix epoch timestamps gives NaT with errors='coerce', while they parse correctly without it:

>>> pd.to_datetime(1420043460, errors='coerce', unit='s')
NaT
>>> pd.to_datetime(1420043460, unit='s')
Timestamp('2014-12-31 16:31:00')

pandas: 0.17.1
numpy: 1.10.1
Python 2.7.7

@jreback

Contributor

jreback commented Dec 4, 2015

hmm, that does seem like a bug...

pull-requests welcome!

@engelmav

engelmav commented Dec 8, 2015

I will take this one.

@jreback

Contributor

jreback commented Jan 24, 2016

@engelmav want to do the PR?

@engelmav

engelmav commented Jan 25, 2016

Hey Jeff, sure. I don't have the fix yet; had to pause due to issues at work.

On Sunday, January 24, 2016, Jeff Reback notifications@github.com wrote:

@engelmav https://github.com/engelmav want to do the PR?



@jreback jreback modified the milestones: Next Major Release, 0.18.0 Jan 30, 2016

@sumitbinnani

sumitbinnani commented Apr 29, 2016

Similarly, this also does not work as desired:

>>> pd.to_datetime(11111111, unit='D', errors='ignore')
OverflowError: long too big to convert
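
The overflow here likely comes from the nanosecond scaling step rather than from the errors option being ignored outright; a rough sketch of the arithmetic (not pandas internals):

>>> ns_per_day = 86400 * 10**9          # unit='D' gets scaled to nanoseconds
>>> 11111111 * ns_per_day > 2**63 - 1   # the scaled value does not fit in int64
True

With errors='ignore' one would expect the original input to be returned unchanged rather than the OverflowError propagating.
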
@jorisvandenbossche

Member

jorisvandenbossche commented Apr 29, 2016

@jreback I was looking at this, and some questions:

It seems this is deliberate, given the comment here: https://github.com/pydata/pandas/blob/084391126cc4edda35b41378d2905c788e5b573a/pandas/tslib.pyx#L2041 ("if we are coercing, dont' allow integers"). I suppose this is to deal with the case of mixed types: for example, when parsing strings with pd.to_datetime(['2012-01-01', 1]), you want the integer to be coerced to NaT.
But of course, when the input contains only integers, this behaviour is not correct.

But I was also wondering: is there a reason we don't have a special path specifically for converting epochs (integers with a unit specified), instead of using the general tslib.array_to_datetime, where the types are checked for each element?
A simplified idea (a runnable sketch follows below):

def to_datetime(arg, ...):

    ....
    elif com.is_integer(arg):
        return _epoch_to_datetime(arg, unit)
    ....

def _epoch_to_datetime(vals, unit):
    return vals.astype('datetime64[{}]'.format(unit))
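
A runnable version of that simplified idea might look like the following sketch (hypothetical helper name, not the actual pandas code path):

import numpy as np

def _epoch_to_datetime(vals, unit='s'):
    # Integers with a unit are plain offsets from the epoch, so a single
    # astype to datetime64[unit] converts them without per-element type checks.
    arr = np.asarray(vals, dtype='int64')
    return arr.astype('datetime64[{}]'.format(unit))

>>> _epoch_to_datetime([1420043460])
array(['2014-12-31T16:31:00'], dtype='datetime64[s]')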

@jreback

Contributor

jreback commented Apr 29, 2016

@jorisvandenbossche so it could be in a routine. What you could do is multiply, then feed it back to to_datetime with the same coerce option. You can directly astype only if errors='raise'. With ignore I think you might have to iterate as well.
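
A minimal sketch of that approach, assuming a hypothetical helper; this is not the patch that eventually landed:

import numpy as np
import pandas as pd

_NS_PER_UNIT = {'D': 86400 * 10**9, 's': 10**9, 'ms': 10**6, 'us': 10**3, 'ns': 1}

def _convert_epochs(values, unit='s', errors='raise'):
    arr = np.asarray(values, dtype='int64')
    if errors == 'raise':
        # With errors='raise', a direct astype is enough.
        return arr.astype('datetime64[{}]'.format(unit))
    # Otherwise scale to nanoseconds and feed the numeric values back to
    # to_datetime so the same coerce/ignore semantics apply.
    # Note the scaling itself can overflow, which is why 'ignore' may still
    # need iteration, as mentioned above.
    return pd.to_datetime(arr * _NS_PER_UNIT[unit], unit='ns', errors=errors)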

@jreback

Contributor

jreback commented Apr 29, 2016

Further, I suspect we can error out on the above example, as unit doesn't make sense when you don't have integers/floats (though you can allow mixed input as long as it can be cast to ints/floats), e.g.

In [1]: pd.to_numeric(['1.0',2,3.0])
Out[1]: array([ 1.,  2.,  3.])

In fact this could be a pre-processing step, assuming we pass the errors option through, because that's the rub: you potentially have to iterate here.
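
A sketch of that pre-processing step, assuming the errors policy can simply be passed through to pd.to_numeric before the epoch conversion (hypothetical helper name):

import pandas as pd

def _prepare_unit_values(values, errors='raise'):
    # Coerce mixed input (strings, ints, floats) to numeric first, using the
    # same errors policy, then hand the result to the epoch conversion path.
    return pd.to_numeric(values, errors=errors)

>>> _prepare_unit_values(['1.0', 2, 3.0])
array([ 1.,  2.,  3.])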

@jreback jreback modified the milestones: 0.18.2, Next Major Release Apr 29, 2016

@jreback

Contributor

jreback commented Apr 29, 2016

I think I can fix this shortly.

@jreback jreback modified the milestones: 0.18.1, 0.18.2 Apr 30, 2016

jreback added a commit to jreback/pandas that referenced this issue Apr 30, 2016

@jreback jreback closed this in 286782d Apr 30, 2016

nps added a commit to nps/pandas that referenced this issue May 17, 2016

BUG: to_datetime when called with a unit and coerce is buggy
closes pandas-dev#11758

Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#13033 from jreback/to_datetime and squashes the following commits:

ed3cdf0 [Jeff Reback] BUG: to_datetime when called with a unit and coerce is buggy