BUG: coercion of non-M8[ns] in datetime ops #7996

Closed
jreback opened this Issue Aug 12, 2014 · 16 comments

@jreback
Contributor

jreback commented Aug 12, 2014

import datetime
s = pd.Series(pd.date_range('20130101',periods=3))
s-pd.Timestamp('20130101')
s-datetime.datetime(2013,1')

This fails as the datetime64 is not converted properly (because numpy datetime ops suck)

s-np.datetime64('20130101')

e.g. np.datetime64('20130101').astype('M8[ns]') is a bug, no?

@jreback jreback added this to the 0.15.0 milestone Aug 12, 2014

@jreback jreback added Bug labels Aug 12, 2014

@jorisvandenbossche

Member

jorisvandenbossche commented Aug 31, 2014

I am not fully following here. Isn't this just a limitation of numpy's string parsing? np.datetime64('2013-01-01').astype('M8[ns]') looks OK to me.

Numpy only parses ISO timestrings, so np.datetime64('20130101') will never work? Doing np.datetime64('2013-01-01').astype('M8[ns]') does not look wrong. Or is that not what you mean?

But in any case s-np.datetime64('2013-01-01') should still ideally work. Can pandas work around this? (by always doing a .astype('M8[ns]') in __rsub__?)
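The workaround suggested here (always casting to M8[ns] before the subtraction) could look roughly like the sketch below. `normalize_to_ns` is a hypothetical helper name, not a pandas function, and a plain numpy array stands in for the Series values, so this only illustrates the normalization step, not how pandas implements its arithmetic.

```python
import numpy as np

def normalize_to_ns(other):
    """Hypothetical helper: cast a numpy datetime64 scalar to M8[ns].

    Anything that is not a np.datetime64 is returned unchanged.
    """
    if isinstance(other, np.datetime64):
        return other.astype("M8[ns]")
    return other

# Stand-in for the nanosecond values backing the Series in the report.
values = np.array(["2013-01-01", "2013-01-02", "2013-01-03"], dtype="M8[ns]")

other = np.datetime64("2013-01-01")      # dtype M8[D], not M8[ns]
diff = values - normalize_to_ns(other)   # both operands are now M8[ns]
print(diff.dtype, diff)
```

This only helps when the scalar is representable in the nanosecond range.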

@jreback

Contributor

jreback commented Aug 31, 2014

no I think it's just broken in numpy

np.datetime64('2013-01-01') makes this a dtype of M8[D], but I don't think it allows astype to M8[ns]

@jreback

Contributor

jreback commented Aug 31, 2014

it still IS possible though, I think you just have to get the value and figure it out based on the dtype (which is where pandas can handle it - if the astype worked then it would be easy)
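A minimal sketch of "get the value and figure it out based on the dtype", assuming only fixed-length units need handling. `datetime64_to_ns_int` and the `_NS_PER_UNIT` table are hypothetical names for illustration; `np.datetime_data` is the numpy call that reports a dtype's unit and step count.

```python
import numpy as np

# Nanoseconds per fixed-length unit. Calendar units (Y, M) are left out
# on purpose because their length in nanoseconds is not constant.
_NS_PER_UNIT = {
    "D": 86_400 * 10**9,
    "h": 3_600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}

def datetime64_to_ns_int(value):
    """Hypothetical helper: epoch nanoseconds of a datetime64 scalar."""
    unit, count = np.datetime_data(value.dtype)
    if unit not in _NS_PER_UNIT:
        raise ValueError(f"unsupported datetime64 unit: {unit!r}")
    raw = int(value.astype("i8"))  # integer count of `count * unit` steps
    return raw * count * _NS_PER_UNIT[unit]

print(datetime64_to_ns_int(np.datetime64("2013-01-01")))
```

Using Python integers for the multiplication avoids silent int64 wraparound, so a range check can be applied before the result is handed back to numpy.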

@jorisvandenbossche

Member

jorisvandenbossche commented Aug 31, 2014

In [94]: np.datetime64('2013-01-01')
Out[94]: numpy.datetime64('2013-01-01')

In [95]: np.datetime64('2013-01-01').astype('M8[ms]')
Out[95]: numpy.datetime64('2013-01-01T01:00:00.000+0100')

In [96]: np.datetime64('2013-01-01').astype('M8[ns]')
Out[96]: numpy.datetime64('2013-01-01T01:00:00.000000000+0100')

This looks like astype to M8[ns] works?

@jreback

Contributor

jreback commented Aug 31, 2014

hmm maybe doesn't work in 1.7 I think

@jorisvandenbossche

Member

jorisvandenbossche commented Aug 31, 2014

that was in 1.7.1 (and 1.8.1)

@jreback

Contributor

jreback commented Aug 31, 2014

try exactly how I have it (the format)

@jorisvandenbossche

Member

jorisvandenbossche commented Aug 31, 2014

In [98]: np.datetime64('20130101').astype('M8[ns]')
Out[98]: numpy.datetime64('2151-06-04T08:32:39.009206272+0200')

That is indeed bullshit output, but that is because numpy does not support non-ISO string parsing, not because the astype is not working.
Without the astype, it is also not really working (note that the datetime is not parsed, but it is strange that it does not give an error here):

In [12]: np.datetime64('20130101')
Out[12]: numpy.datetime64('20130101')
@jorisvandenbossche

Member

jorisvandenbossche commented Aug 31, 2014

That seems to be a bug in numpy, that it does not raise:

In [99]: np.datetime64('20130101')
Out[99]: numpy.datetime64('20130101')

In [100]: np.datetime64('20130101 10:00')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-100-6968cb1137d9> in <module>()
----> 1 np.datetime64('20130101 10:00')

ValueError: Error parsing datetime string "20130101 10:00" at position 8

In [101]: np.datetime64('2013-01-01 10:00')
Out[101]: numpy.datetime64('2013-01-01T10:00+0100')

It does raise when there is also an hour and not only a date. Or is there a reason a date-only string would allow more flexible parsing?

@jreback

Contributor

jreback commented Aug 31, 2014

ahh ok

but we still don't handle this input correctly (a non ns numpy datetime input)

@jorisvandenbossche

Member

jorisvandenbossche commented Aug 31, 2014

yes, so it is a legitimate issue :-) (only the comment about astype was not correct)

@jorisvandenbossche

Member

jorisvandenbossche commented Sep 1, 2014

@jreback Figured out the strange behaviour of numpy :-)

It is interpreting np.datetime64('20130101') as the year 20,130,101, so it is logical that this does not raise an error about a malformed date, and that it does not fit in the ns range:

In [16]: np.datetime64('20130101').dtype
Out[16]: dtype('<M8[Y]')

In [17]: np.datetime64('20130101').astype('M8[D]')
Out[17]: numpy.datetime64('20130101-01-01')

But it apparently does not give an out-of-range date error when astyping to the ns range.
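Since the cast itself does not report the overflow, a caller could check the range explicitly before casting. The sketch below is one way to do that under stated assumptions: `fits_in_ns` and the two bounds are hypothetical, and the bounds are deliberately slightly narrower than the true int64 nanosecond range. The year-20,130,101 value is built from an integer year offset rather than from the string, so the example does not depend on numpy's string parsing.

```python
import numpy as np

# Conservative bounds, in seconds, for what M8[ns] can represent
# (assumed values, slightly inside the real int64 nanosecond limits).
NS_MIN = np.datetime64("1677-09-22", "s")
NS_MAX = np.datetime64("2262-04-11", "s")

def fits_in_ns(value):
    """Hypothetical check: True if `value` is safely inside the M8[ns] range."""
    as_seconds = value.astype("M8[s]")  # seconds resolution has far more headroom
    return bool(NS_MIN <= as_seconds <= NS_MAX)

# Year 20,130,101 expressed as an offset from 1970 in units of years.
far_future = np.datetime64(20_130_101 - 1970, "Y")

print(fits_in_ns(np.datetime64("2013-01-01")))
print(fits_in_ns(far_future))
```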

@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 19, 2014

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

@jreback jreback modified the milestones: 0.18.1, Next Major Release Apr 8, 2016

@jreback jreback modified the milestones: 0.18.2, 0.18.1 Apr 26, 2016

@jbrockmendel

Member

jbrockmendel commented Oct 29, 2017

For copy/pasting, the OP has a typo in the line s-datetime.datetime(2013,1'); it should read s-datetime.datetime(2013,1,1).

AFAICT the np.datetime64('20130101') is unsalvageable and the open issue is s-np.datetime64('2013-01-01'). Is that correct?

@jbrockmendel

Member

jbrockmendel commented Oct 29, 2017

Slightly different kind of wrong when using a DatetimeIndex instead of Series:

>>> dti = pd.date_range('20130101',periods=3)
>>> dti - np.datetime64('2013-01-01')
DatetimeIndex(['1970-01-01', '1970-01-02', '1970-01-03'], dtype='datetime64[ns]', freq=None)
@jorisvandenbossche

Member

jorisvandenbossche commented Oct 30, 2017

the open issue is s-np.datetime64('2013-01-01'). Is that correct?

Yes, it is:

In [2]: import datetime
   ...: s = pd.Series(pd.date_range('20130101',periods=3))
   ...: 

In [3]: s-pd.Timestamp('20130101')
Out[3]: 
0   0 days
1   1 days
2   2 days
dtype: timedelta64[ns]

In [5]: s-datetime.datetime(2013,1,1)
Out[5]: 
0   0 days
1   1 days
2   2 days
dtype: timedelta64[ns]

In [6]: s-np.datetime64('2013-01-01')
Out[6]: 
0   15705 days 23:59:59.999984
1   15706 days 23:59:59.999984
2   15707 days 23:59:59.999984
dtype: timedelta64[ns]
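
One possible reading of Out[6] (an inference from the numbers, not something confirmed in this thread): np.datetime64('2013-01-01') has dtype M8[D] and a raw integer value of 15706, and 15706 days minus 15706 nanoseconds is 15705 days 23:59:59.999984294, which matches the first row at microsecond precision. That would be consistent with the raw day count being subtracted as if it were nanoseconds. The arithmetic can be checked with plain integers:

```python
import numpy as np

NS_PER_DAY = 86_400 * 10**9

other = np.datetime64("2013-01-01")   # dtype M8[D]
raw = int(other.astype("i8"))         # days since the epoch

# Epoch nanoseconds of the first Series element (2013-01-01).
first_ns = int(np.datetime64("2013-01-01", "ns").astype("i8"))

# Subtract the raw day count as though it were a nanosecond count.
wrong_ns = first_ns - raw
days, rem_ns = divmod(wrong_ns, NS_PER_DAY)
print(days, "days and", rem_ns, "ns")
```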
@jbrockmendel

Member

jbrockmendel commented Dec 14, 2017

I'm about to submit a PR that fixes this. It works for Series but still fails for DataFrame.
