Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_datetime returns NaT for epoch with errors='coerce' #11758

Closed
bhy opened this issue Dec 4, 2015 · 9 comments
Closed

to_datetime returns NaT for epoch with errors='coerce' #11758

bhy opened this issue Dec 4, 2015 · 9 comments
Labels
Bug Compat pandas objects compatability with Numpy or Python functions Timeseries
Milestone

Comments

@bhy
Copy link

bhy commented Dec 4, 2015

xref #11760

Parsing unix epoch timestamps give NaT with errors='coerce' while they can be parsed correctly without it:

>>> pd.to_datetime(1420043460, errors='coerce', unit='s')
NaT
>>> pd.to_datetime(1420043460, unit='s')
Timestamp('2014-12-31 16:31:00')

pandas: 0.17.1
numpy: 1.10.1
Python 2.7.7

@jreback
Copy link
Contributor

jreback commented Dec 4, 2015

hmm, that does seem like a bug...

pull-requests welcome!

@jreback jreback added Bug Timeseries Difficulty Novice Compat pandas objects compatability with Numpy or Python functions labels Dec 4, 2015
@jreback jreback added this to the 0.18.0 milestone Dec 4, 2015
@engelmav
Copy link

engelmav commented Dec 8, 2015

I will take this one.

@jreback
Copy link
Contributor

jreback commented Jan 24, 2016

@engelmav want to do the PR?

@engelmav
Copy link

hey Jeff, sure. I don't have the fix yet. had to pause due to issues at
work.

On Sunday, January 24, 2016, Jeff Reback notifications@github.com wrote:

@engelmav https://github.com/engelmav want to do the PR?


Reply to this email directly or view it on GitHub
#11758 (comment).

@jreback jreback modified the milestones: Next Major Release, 0.18.0 Jan 30, 2016
@sumitbinnani
Copy link

Similarly, this is also not working as desired

>>> pd.to_datetime(11111111, unit='D', errors='ignore')
OverflowError: long too big to convert

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Apr 29, 2016

@jreback I was looking at this, and some questions:

It seems this is deliberate, given the comment here: https://github.com/pydata/pandas/blob/084391126cc4edda35b41378d2905c788e5b573a/pandas/tslib.pyx#L2041 ("if we are coercing, dont' allow integers"). And I suppose this is to deal with the case of mixed types. For example, if you are parsing strings: pd.to_datetime(['2012-01-01', 1]), you want the integer to be coerced to NaT.
But, of course when having only integers, this behaviour is not correct.

But, I was also wondering, is there a reason we don't have a special path specifically for converting epochs (integers with unit specified), instead of using the general tslib.array_to_datetime where the types are checked for each element?
(simplified idea)

def to_datetime(arg, ...):

    ....
    elif com.is_integer(arg):
        _epoch_to_datetime(arg, unit)
    ....

def _epoch_to_datetime(vals, unit):
    return vals.astype('datetime[{}]'.format(unit))

@jreback
Copy link
Contributor

jreback commented Apr 29, 2016

@jorisvandenbossche so it could be in a routine. What you could do is multiply then feed it back to to_datetime with the same coerce option. you can directly astype if only if errors='raise'. I think ignore you might have to iterate as well.

@jreback
Copy link
Contributor

jreback commented Apr 29, 2016

further I suspect we can error out on the above example as unit doesn't make sense when you don't have integers/floats (though you can allow mixed things as long as you can astype them to ints/floats) eg.

In [1]: pd.to_numeric(['1.0',2,3.0])
Out[1]: array([ 1.,  2.,  3.])

in fact could be a pre-processing step assume we pass thru the errors, because that's the rub you have to potentially iterate here.

@jreback jreback modified the milestones: 0.18.2, Next Major Release Apr 29, 2016
@jreback
Copy link
Contributor

jreback commented Apr 29, 2016

I think I can fix this shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Compat pandas objects compatability with Numpy or Python functions Timeseries
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants