np.datetime64 values cast to datetime.date iff dtype == M8 #6529

Closed
dhirschfeld opened this issue Mar 3, 2014 · 6 comments · Fixed by #6530
Labels
API Design Datetime Datetime data dtype

@dhirschfeld
Contributor

This seems a little inconsistent to me. IMHO, it would be good to be able to rely on the fact that if you pass a numpy array of any datetime64 type, the result will still be a datetime64 type rather than being cast to a different type depending on the exact dtype passed in.

In [1]: dates = pd.date_range('01-Jan-2015', '01-Dec-2015', freq='M')
   ...: values1 = dates.view(np.ndarray).astype('M8[D]')
   ...: values2 = dates.view(np.ndarray).astype('datetime64[ns]')
   ...: series1 = pd.TimeSeries(values1, dates)
   ...: series2 = pd.TimeSeries(values2, dates)
   ...: 

In [2]: series1
Out[2]: 
2015-01-31    2015-01-31
2015-02-28    2015-02-28
2015-03-31    2015-03-31
2015-04-30    2015-04-30
2015-05-31    2015-05-31
2015-06-30    2015-06-30
2015-07-31    2015-07-31
2015-08-31    2015-08-31
2015-09-30    2015-09-30
2015-10-31    2015-10-31
2015-11-30    2015-11-30
Freq: M, dtype: object

In [3]: series2
Out[3]: 
2015-01-31   2015-01-31
2015-02-28   2015-02-28
2015-03-31   2015-03-31
2015-04-30   2015-04-30
2015-05-31   2015-05-31
2015-06-30   2015-06-30
2015-07-31   2015-07-31
2015-08-31   2015-08-31
2015-09-30   2015-09-30
2015-10-31   2015-10-31
2015-11-30   2015-11-30
Freq: M, dtype: datetime64[ns]

In [4]: series1.values
Out[4]: array([datetime.date(2015, 1, 31), datetime.date(2015, 2, 28), datetime.date(2015, 3, 31), datetime.date(2015, 4, 30), datetime.date(2015, 5, 31), datetime.date(2015, 6, 30), datetime.date(2015, 7, 31), datetime.date(2015, 8, 31), datetime.date(2015, 9, 30), datetime.date(2015, 10, 31), datetime.date(2015, 11, 30)], dtype=object)

In [5]: series2.values
Out[5]: array(['2015-01-31T00:00:00.000000000+0000', '2015-02-28T00:00:00.000000000+0000', '2015-03-31T01:00:00.000000000+0100', '2015-04-30T01:00:00.000000000+0100', '2015-05-31T01:00:00.000000000+0100', '2015-06-30T01:00:00.000000000+0100', '2015-07-31T01:00:00.000000000+0100', '2015-08-31T01:00:00.000000000+0100', '2015-09-30T01:00:00.000000000+0100', '2015-10-31T00:00:00.000000000+0000', '2015-11-30T00:00:00.000000000+0000'], dtype='datetime64[ns]')

In [6]: pd.__version__
Out[6]: '0.13.1-339-g6c3755b'
@jreback
Contributor

jreback commented Mar 3, 2014

The problem is that some people actually want to keep an actual datetime.date value in a Series, even though it's pretty much useless and inefficient.

Note that series1.dtype is object, while series2 is converted to datetime64[ns].

There is no way to 'tell' what should be done here.

IMHO we should convert these, and we could, but that would break backward compatibility. Thoughts?

@dhirschfeld
Contributor Author

The issue isn't keeping datetime.date instances about - there weren't any in the first place until the TimeSeries class cast the datetime64[D] instances to datetime.date instances. So it's actually changing the type of the input you're supplying to it, to, as you've noted, a pretty useless and non-performant type.

As for backward compatibility, I picked this up from a breakage in our unittests. It looks like it is a regression between 0.13.0 and 0.13.1, though 0.12.0 seems to have the most consistent behaviour as it doesn't change the dtype of the array at all, but that could equally be a numpy 1.7 vs numpy 1.8 issue.

pandas 0.12.0

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.12.0'
>>> dates = pd.date_range('01-Jan-2015', '01-Dec-2015', freq='M')
>>> values1 = dates.view(np.ndarray).astype('M8[D]')
>>> series1 = pd.TimeSeries(values1, dates)
>>> series1
2015-01-31   2015-01-31 00:00:00
2015-02-28   2015-02-28 00:00:00
2015-03-31   2015-03-31 00:00:00
2015-04-30   2015-04-30 00:00:00
2015-05-31   2015-05-31 00:00:00
2015-06-30   2015-06-30 00:00:00
2015-07-31   2015-07-31 00:00:00
2015-08-31   2015-08-31 00:00:00
2015-09-30   2015-09-30 00:00:00
2015-10-31   2015-10-31 00:00:00
2015-11-30   2015-11-30 00:00:00
Freq: M, dtype: datetime64[D]
>>>

pandas 0.13.0

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.13.0'
>>> dates = pd.date_range('01-Jan-2015', '01-Dec-2015', freq='M')
>>> values1 = dates.view(np.ndarray).astype('M8[D]')
>>> series1 = pd.TimeSeries(values1, dates)
>>> series1
2015-01-31   2015-01-31 00:00:00
2015-02-28   2015-02-28 00:00:00
2015-03-31   2015-03-31 00:00:00
2015-04-30   2015-04-30 00:00:00
2015-05-31   2015-05-31 00:00:00
2015-06-30   2015-06-30 00:00:00
2015-07-31   2015-07-31 00:00:00
2015-08-31   2015-08-31 00:00:00
2015-09-30   2015-09-30 00:00:00
2015-10-31   2015-10-31 00:00:00
2015-11-30   2015-11-30 00:00:00
Freq: M, dtype: datetime64[ns]
>>>

pandas 0.13.1

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.13.1'
>>> dates = pd.date_range('01-Jan-2015', '01-Dec-2015', freq='M')
>>> values1 = dates.view(np.ndarray).astype('M8[D]')
>>> series1 = pd.TimeSeries(values1, dates)
>>> series1
2015-01-31    2015-01-31
2015-02-28    2015-02-28
2015-03-31    2015-03-31
2015-04-30    2015-04-30
2015-05-31    2015-05-31
2015-06-30    2015-06-30
2015-07-31    2015-07-31
2015-08-31    2015-08-31
2015-09-30    2015-09-30
2015-10-31    2015-10-31
2015-11-30    2015-11-30
Freq: M, dtype: object
>>>
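
When comparing sessions like the three above, it can help to record the numpy version alongside the pandas version, since either library could be responsible for the change. The following is only a sketch: `describe_cast` is a hypothetical helper, the sample dates are illustrative, and `pd.Series` is used instead of `pd.TimeSeries` on the assumption that the former is what the reader's install provides.

```python
import numpy as np
import pandas as pd


def describe_cast(values):
    """Report library versions and the dtype a Series gives to `values`."""
    series = pd.Series(values)
    return {
        "pandas": pd.__version__,
        "numpy": np.__version__,
        "input_dtype": str(values.dtype),
        "series_dtype": str(series.dtype),
    }


# Day-resolution datetime64 input, as in the sessions above.
values = np.array(["2015-01-31", "2015-02-28", "2015-03-31"], dtype="M8[D]")
report = describe_cast(values)
print(report)
```

The reported `series_dtype` depends on the installed pandas version, which is exactly what the comparison is meant to reveal.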

@jreback
Contributor

jreback commented Mar 3, 2014

what numpy are these under? 1.7.1/2? 1.8?

0.12.0 is wrong; no numpy datetime dtypes can live except for datetime64[ns]; they all must be converted upon input.

I can probably put it back to the 0.13.0 behaviour, which seems to make the most sense (if you are actually passing datetime.date then I would expect those to stay).
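
For code that has to run against an affected version, one option is to do the conversion to datetime64[ns] explicitly before the data reaches pandas, so the result does not depend on how the constructor treats other resolutions. This is a minimal sketch with illustrative sample dates, not a description of what #6530 changes internally.

```python
import numpy as np
import pandas as pd

values_day = np.array(["2015-01-31", "2015-02-28", "2015-03-31"], dtype="M8[D]")

# Convert to nanosecond resolution before handing the data to pandas.
values_ns = values_day.astype("datetime64[ns]")
series = pd.Series(values_ns)

print(series.dtype)
```

The cast only changes the resolution of the stored values; the dates themselves still compare equal to the day-resolution originals.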

@jreback jreback added this to the 0.14.0 milestone Mar 3, 2014
@jreback jreback self-assigned this Mar 3, 2014
@dhirschfeld
Contributor Author

Agreed on the fact that datetime.date can stay if they're explicitly passed in. I don't mind the dates being cast to datetime64[ns] - so long as they're still numpy datetime64 instances.
FWIW the (simplified) test which works under 0.12.0 and 0.13.0 but breaks with 0.13.1 was:

import numpy as np
import pandas as pd

def test_datetime64_values_arent_cast():
    dates = pd.date_range('01-Jan-2015', '01-Dec-2015', freq='M').view(np.ndarray).astype('M8[D]')
    series = pd.Series(dates)
    assert np.all(series.values == dates)

NB: I'm using anaconda so pandas 0.12.0 was tested against numpy 1.7.1 and 0.13+ against numpy 1.8
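
The requirement stated above (any datetime64 resolution is acceptable, Python objects are not) can also be checked directly on the dtype. The sketch below is one way to write that check; `is_datetime64` is a hypothetical helper and the sample dates are illustrative.

```python
import numpy as np
import pandas as pd


def is_datetime64(array):
    """True if `array` holds numpy datetime64 values of any resolution."""
    return np.asarray(array).dtype.kind == "M"


dates = np.array(["2015-01-31", "2015-02-28", "2015-03-31"], dtype="M8[D]")
series = pd.Series(dates)

# The values may be upcast in resolution, but should not become Python objects.
assert is_datetime64(series.values)
assert np.all(series.values == dates)
```

Checking `dtype.kind` rather than an exact dtype keeps the test independent of which resolution pandas chooses to store.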

@jreback
Contributor

jreback commented Mar 3, 2014

ok...#6530 should bring this back in line!

thanks for the report!

@dhirschfeld
Contributor Author

Thanks for the super quick response!
