API (BUG?): DatetimeArray.astype vs to_numpy behaviour differences #44484

Open
jorisvandenbossche opened this issue Nov 16, 2021 · 1 comment
Labels: API - Consistency (Internal Consistency of API/Behavior), Astype, Bug, Timeseries

Comments

@jorisvandenbossche

Currently, the to_numpy() implementation on the datetimelike arrays is quite simple: it is essentially np.array(EA, dtype), which calls __array__, which in turn returns (or casts) the underlying numpy array. As a result, to_numpy() follows numpy's casting rules, whereas astype() applies a whole set of custom conversion rules.
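To make the mechanism concrete, here is a minimal sketch of the np.array(EA, dtype) path using a toy class. ToyDatetimeArray and its _ndarray attribute are hypothetical stand-ins, not pandas code; the only point is that a to_numpy() built this way adds no rules of its own on top of numpy's.

```python
import numpy as np


class ToyDatetimeArray:
    """Hypothetical stand-in for a datetimelike extension array (not pandas code)."""

    def __init__(self, values):
        # Underlying storage is a plain datetime64[ns] ndarray.
        self._ndarray = np.asarray(values, dtype="datetime64[ns]")

    def __array__(self, dtype=None, copy=None):
        # Hand the underlying data to numpy. Any cast to ``dtype`` follows
        # numpy's own rules; there is no custom validation here.
        return np.array(self._ndarray, dtype=dtype)

    def to_numpy(self, dtype=None):
        # Roughly the "np.array(EA, dtype)" shape described above.
        return np.array(self, dtype=dtype)


toy = ToyDatetimeArray(["2012-01-01", "2012-01-02", "2012-01-03"])
as_float = toy.to_numpy(float)
print(as_float.dtype)
```

Because nothing in this path checks the target dtype, a float request goes through as a plain numpy cast of the epoch-nanosecond values.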

Some examples:

In [1]: arr = pd.date_range("2012-01-01", periods=3).array

In [2]: arr
Out[2]: 
<DatetimeArray>
['2012-01-01 00:00:00', '2012-01-02 00:00:00', '2012-01-03 00:00:00']
Length: 3, dtype: datetime64[ns]

# conversion to string (-> different formatting)

In [3]: arr.astype(str)
Out[3]: array(['2012-01-01', '2012-01-02', '2012-01-03'], dtype=object)

In [4]: arr.to_numpy(str)
Out[4]: 
array(['2012-01-01T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
       '2012-01-03T00:00:00.000000000'], dtype='<U48')

# conversion to float or timedelta (error vs working cast)

In [5]: arr.astype(float)
...
TypeError: Cannot cast DatetimeArray to dtype float64

In [6]: arr.to_numpy(float)
Out[6]: array([1.3253760e+18, 1.3254624e+18, 1.3255488e+18])

In [7]: arr.astype('timedelta64[ns]')
...
TypeError: Cannot cast DatetimeArray to dtype timedelta64[ns]

In [8]: arr.to_numpy('timedelta64[ns]')
Out[8]: 
array([1325376000000000000, 1325462400000000000, 1325548800000000000],
      dtype='timedelta64[ns]')
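
For comparison, the same three casts can be run on a plain datetime64[ns] ndarray with no pandas involved. This is only a sketch to show that the to_numpy() results above look like ordinary numpy casts; the variable names are made up for illustration.

```python
import numpy as np

# The raw datetime64[ns] values corresponding to the example array above.
raw = np.array(["2012-01-01", "2012-01-02", "2012-01-03"], dtype="datetime64[ns]")

# String cast: numpy renders full ISO 8601 timestamps at nanosecond precision.
as_str = np.array(raw, dtype=str)

# Float and timedelta casts: numpy carries over the epoch-nanosecond integers.
as_float = np.array(raw, dtype=float)
as_td = np.array(raw, dtype="timedelta64[ns]")

print(as_str[0])
print(as_float[0])
print(as_td[0])
```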

One could argue that to_numpy() exists to convert to a numpy array, so following numpy's casting rules is fine. Even so, it would be good to decide this explicitly.
On the other hand, it is strange to have two different sets of rules, and the fact that to_numpy() uses numpy's rules here is largely an implementation detail.

@mroeschke added the API - Consistency (Internal Consistency of API/Behavior) and Bug labels on Dec 27, 2021
@jbrockmendel

Agreed, we should just dispatch whenever possible.
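
A rough sketch of what dispatching could look like, assuming it means routing to_numpy() through astype() so that both share one set of rules. ToyDatetimeArray is a hypothetical toy class, and the rules inside its astype() are simplified illustrations rather than pandas' actual conversion logic.

```python
import numpy as np


class ToyDatetimeArray:
    """Hypothetical toy class; the rules below are illustrative, not pandas' own."""

    def __init__(self, values):
        self._ndarray = np.asarray(values, dtype="datetime64[ns]")

    def astype(self, dtype):
        dtype = np.dtype(dtype)
        if dtype.kind in "fm":
            # Custom rule: refuse casts to float or timedelta outright.
            raise TypeError(f"Cannot cast {type(self).__name__} to dtype {dtype}")
        if dtype.kind == "U":
            # Custom rule (toy assumption): day-resolution strings as objects.
            return np.datetime_as_string(self._ndarray, unit="D").astype(object)
        return self._ndarray.astype(dtype)

    def to_numpy(self, dtype=None):
        # Dispatch to astype so both entry points apply the same rules.
        if dtype is None:
            return self._ndarray.copy()
        return self.astype(dtype)


toy = ToyDatetimeArray(["2012-01-01", "2012-01-02"])
strings = toy.to_numpy(str)

try:
    toy.to_numpy(float)
    float_cast_rejected = False
except TypeError:
    float_cast_rejected = True

print(strings, float_cast_rejected)
```

With this shape, a cast that astype() rejects is also rejected by to_numpy(), and the string formatting is identical on both paths.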
