BUG: coercion of non-M8[ns] in datetime ops #7996

Closed
jreback opened this issue Aug 12, 2014 · 17 comments · Fixed by #18783
Labels
Bug Datetime Datetime data dtype Timedelta Timedelta data type
Milestone

Comments

@jreback
Contributor

jreback commented Aug 12, 2014

import datetime
import numpy as np
import pandas as pd
s = pd.Series(pd.date_range('20130101',periods=3))
s-pd.Timestamp('20130101')
s-datetime.datetime(2013,1')

This fails as the datetime64 is not converted properly (because numpy datetime ops suck)

s-np.datetime64('20130101')

e.g. np.datetime64('20130101').astype('M8[ns]') is a bug, no?

@jreback jreback added this to the 0.15.0 milestone Aug 12, 2014
@jorisvandenbossche
Member

I am not fully following here. Numpy only parses ISO time strings, so np.datetime64('20130101') will never work; isn't this just a limitation of numpy's string parsing? np.datetime64('2013-01-01').astype('M8[ns]') does not look wrong to me. Or is that not what you mean?

But in any case s-np.datetime64('2013-01-01') should still ideally work. Can pandas work around this? (by always doing a .astype('M8[ns]') in __rsub__?)
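A caller-side version of that workaround can be sketched as follows. This is only an illustration of normalising the scalar to nanosecond resolution before the subtraction, not what pandas does internally; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("2013-01-01", periods=3))

# A date-only ISO string gives a day-resolution scalar (datetime64[D]).
other = np.datetime64("2013-01-01")

# Convert the scalar to nanosecond resolution explicitly before the op,
# so the subtraction never sees a non-ns numpy datetime.
other_ns = other.astype("M8[ns]")
result = s - other_ns

print(result)
```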

@jreback
Contributor Author

jreback commented Aug 31, 2014

no I think it's just broken in numpy

np.datetime64('2013-01-01') gives this a dtype of M8[D], but I don't think it allows astype to M8[ns]

@jreback
Contributor Author

jreback commented Aug 31, 2014

it still IS possible though, I think: you just have to get the value and figure it out based on the dtype (which is where pandas can handle it; if the astype worked then it would be easy)
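For illustration, "get the value and figure it out based on the dtype" could look roughly like the sketch below. The helper name to_ns_int and the lookup table are hypothetical and are not pandas internals; the sketch only handles fixed-length units and assumes the input is a numpy datetime64 scalar.

```python
import numpy as np

# Nanoseconds per unit for the fixed-length datetime64 units.
# Calendar units (Y, M) vary in length and are left out of this sketch.
_NS_PER_UNIT = {
    "W": 7 * 86_400 * 10**9,
    "D": 86_400 * 10**9,
    "h": 3_600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}


def to_ns_int(value):
    """Return nanoseconds since the epoch for a datetime64 scalar (hypothetical helper)."""
    if np.isnat(value):
        raise ValueError("NaT has no integer nanosecond value")
    # np.datetime_data gives the unit and step count of the dtype, e.g. ('D', 1).
    unit, count = np.datetime_data(value.dtype)
    if unit not in _NS_PER_UNIT:
        raise ValueError(f"unit {unit!r} is not handled by this sketch")
    raw = int(value.astype("int64"))  # integer offset from the epoch, in `unit`
    ns = raw * count * _NS_PER_UNIT[unit]  # Python ints, so no silent wraparound
    info = np.iinfo(np.int64)
    # The minimum int64 value is reserved for NaT, hence the strict lower bound.
    if not info.min < ns <= info.max:
        raise OverflowError("value does not fit in datetime64[ns]")
    return ns


print(to_ns_int(np.datetime64("2013-01-01")))
```

Doing the arithmetic in Python integers means an out-of-range value raises instead of wrapping around.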

@jorisvandenbossche
Member

In [94]: np.datetime64('2013-01-01')
Out[94]: numpy.datetime64('2013-01-01')

In [95]: np.datetime64('2013-01-01').astype('M8[ms]')
Out[95]: numpy.datetime64('2013-01-01T01:00:00.000+0100')

In [96]: np.datetime64('2013-01-01').astype('M8[ns]')
Out[96]: numpy.datetime64('2013-01-01T01:00:00.000000000+0100')

This looks like astype to M8[ns] works?

@jreback
Contributor Author

jreback commented Aug 31, 2014

hmm maybe doesn't work in 1.7 I think

@jorisvandenbossche
Member

that was in 1.7.1 (and 1.8.1)

@jreback
Contributor Author

jreback commented Aug 31, 2014

try exactly how I have it (the format)

@jorisvandenbossche
Member

In [98]: np.datetime64('20130101').astype('M8[ns]')
Out[98]: numpy.datetime64('2151-06-04T08:32:39.009206272+0200')

That is indeed bullshit output, but that is because numpy does not support non-ISO string parsing, not because the astype is not working.
Without the astype, it is also not really working (note that the datetime is not parsed, though it is strange that it does not give an error here):

In [12]: np.datetime64('20130101')
Out[12]: numpy.datetime64('20130101')

@jorisvandenbossche
Member

That seems to be a bug in numpy, that it does not raise:

In [99]: np.datetime64('20130101')
Out[99]: numpy.datetime64('20130101')

In [100]: np.datetime64('20130101 10:00')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-100-6968cb1137d9> in <module>()
----> 1 np.datetime64('20130101 10:00')

ValueError: Error parsing datetime string "20130101 10:00" at position 8

In [101]: np.datetime64('2013-01-01 10:00')
Out[101]: numpy.datetime64('2013-01-01T10:00+0100')

It does raise when there is also an hour and not only a date. Or is there a reason a date-only string would allow more flexible parsing?

@jreback
Contributor Author

jreback commented Aug 31, 2014

ahh ok

but we still don't handle this input correctly (a non ns numpy datetime input)

@jorisvandenbossche
Member

yes, so it is a legitimate issue :-) (only the comment about astype was not correct)

@jorisvandenbossche
Member

@jreback Figured out the strange behaviour of numpy :-)

It is interpreting np.datetime64('20130101') as the year 20,130,101, so it is logical that this does not raise an error about a malformed date, and that it does not fit in the ns range:

In [16]: np.datetime64('20130101').dtype
Out[16]: dtype('<M8[Y]')

In [17]: np.datetime64('20130101').astype('M8[D]')
Out[17]: numpy.datetime64('20130101-01-01')

But it apparently does not give an out-of-range date error when astyping to the ns range.
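Since the astype itself does not complain, a range check has to happen before the conversion. The sketch below is one conservative way to do that by comparing calendar years against the bounds of pd.Timestamp; fits_in_ns is a hypothetical helper name, and the year-20,130,101 value is built from its integer offset rather than by relying on the string parsing discussed above.

```python
import numpy as np
import pandas as pd


def fits_in_ns(value):
    """Conservatively check that a datetime64 scalar is safe to cast to M8[ns].

    Hypothetical helper: compares calendar years only, so the partially
    representable boundary years (1677 and 2262) are rejected as well.
    """
    # Casting to year resolution yields the offset from 1970 as an integer.
    year = int(value.astype("M8[Y]").astype("int64")) + 1970
    return pd.Timestamp.min.year < year < pd.Timestamp.max.year


# Year-resolution value for the year 20,130,101, built from its offset to 1970.
far_future = np.datetime64(20130101 - 1970, "Y")

print(np.datetime_data(far_future.dtype))        # unit and step count of the dtype
print(fits_in_ns(far_future))                    # far outside the ns range
print(fits_in_ns(np.datetime64("2013-01-01")))   # an ordinary date
```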

@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 19, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback jreback modified the milestones: 0.18.1, Next Major Release Apr 8, 2016
@jreback jreback modified the milestones: 0.18.2, 0.18.1 Apr 26, 2016
@jbrockmendel
Member

For copy/pasting, the OP has a typo in line s-datetime.datetime(2013,1').

AFAICT the np.datetime64('20130101') is unsalvageable and the open issue is s-np.datetime64('2013-01-01'). Is that correct?

@jbrockmendel
Member

Slightly different kind of wrong when using a DatetimeIndex instead of Series:

>>> dti = pd.date_range('20130101',periods=3)
>>> dti - np.datetime64('2013-01-01')
DatetimeIndex(['1970-01-01', '1970-01-02', '1970-01-03'], dtype='datetime64[ns]', freq=None)

@jorisvandenbossche
Member

the open issue is s-np.datetime64('2013-01-01'). Is that correct?

Yes, it is:

In [2]: import datetime
   ...: s = pd.Series(pd.date_range('20130101',periods=3))
   ...: 

In [3]: s-pd.Timestamp('20130101')
Out[3]: 
0   0 days
1   1 days
2   2 days
dtype: timedelta64[ns]

In [5]: s-datetime.datetime(2013,1,1)
Out[5]: 
0   0 days
1   1 days
2   2 days
dtype: timedelta64[ns]

In [6]: s-np.datetime64('2013-01-01')
Out[6]: 
0   15705 days 23:59:59.999984
1   15706 days 23:59:59.999984
2   15707 days 23:59:59.999984
dtype: timedelta64[ns]

@jbrockmendel
Member

I'm about to submit a PR that fixes this. It works for Series but still fails for DataFrame.

@sunnychase

sunnychase commented Oct 4, 2023

Hello, I am having issues with the datetime64, can anyone help me here?

C:\Users\v_ichase\Desktop\Ultimate GEX>PY "C:\Users\v_ichase\Desktop\Ultimate GEX\Gex.py"
Traceback (most recent call last):
  File "C:\Users\v_ichase\Desktop\Ultimate GEX\Gex.py", line 78, in <module>
    dfAgg = df.groupby(['StrikePrice']).sum()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\v_ichase\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\groupby.py", line 3053, in sum
    result = self._agg_general(
             ^^^^^^^^^^^^^^^^^^
  File "C:\Users\v_ichase\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\groupby.py", line 1835, in _agg_general
    result = self._cython_agg_general(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\v_ichase\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\groupby.py", line 1926, in _cython_agg_general
    new_mgr = data.grouped_reduce(array_func)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\v_ichase\AppData\Roaming\Python\Python311\site-packages\pandas\core\internals\managers.py", line 1431, in grouped_reduce
    applied = blk.apply(func)
              ^^^^^^^^^^^^^^^
  File "C:\Users\v_ichase\AppData\Roaming\Python\Python311\site-packages\pandas\core\internals\blocks.py", line 366, in apply
    result = func(self.values, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\v_ichase\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\groupby.py", line 1902, in array_func
    result = self.grouper._cython_operation(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\v_ichase\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\ops.py", line 815, in _cython_operation
    return cy_op.cython_operation(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\v_ichase\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\ops.py", line 525, in cython_operation
    return values._groupby_op(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\v_ichase\AppData\Roaming\Python\Python311\site-packages\pandas\core\arrays\datetimelike.py", line 1637, in _groupby_op
    raise TypeError(f"datetime64 type does not support {how} operations")
TypeError: datetime64 type does not support sum operations
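The traceback above says that the grouped frame contains a datetime64 column, which cannot be summed. One possible way around it, sketched under assumptions, is to restrict the sum to numeric columns. Only the StrikePrice column name comes from the traceback; the other columns and all values are invented for the example.

```python
import pandas as pd

# Hypothetical frame: only "StrikePrice" is taken from the traceback above.
df = pd.DataFrame(
    {
        "StrikePrice": [100, 100, 105],
        "ExpirationDate": pd.to_datetime(["2023-10-06"] * 3),  # datetime64 column
        "OpenInterest": [10, 5, 7],
    }
)

# numeric_only=True drops non-numeric columns (here the datetime64 one)
# before summing, so the TypeError is not raised.
agg = df.groupby("StrikePrice").sum(numeric_only=True)

print(agg)
```

Selecting the columns to aggregate explicitly, for example df.groupby("StrikePrice")[["OpenInterest"]].sum(), is an alternative when only a few columns matter.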

4 participants