DataFrame (and Series?) constructor ignores request for naive datetime64 dtype when passed datetime64 objects with timezone #25843

Closed
tswast opened this issue Mar 22, 2019 · 4 comments

Comments

@tswast
Contributor

commented Mar 22, 2019

I can reproduce this in pandas (development version, 0.24.0+, but not 0.23.4) with this minimal example:

import datetime

import pandas as pd
import pytz


dates = [
    datetime.datetime(2019, 1, 1, 12, tzinfo=pytz.utc),
    datetime.datetime(2018, 4, 1, 17, 13, tzinfo=pytz.utc),
]

df = pd.DataFrame({"dates": dates})
print(df.dtypes)

df2 = pd.DataFrame({"dates": dates}, dtype="datetime64[ns]")
print(df2.dtypes)

It prints:

# df
dates    datetime64[ns, UTC]  <-- I expect this.
dtype: object

# df2
dates    datetime64[ns, UTC]  <-- I didn't expect this.
dtype: object

There do appear to be a lot of changes to datetime64 behavior in the changelog for 0.24.0 http://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html so maybe this is intended behavior? Maybe the distinction between datetime64[ns, UTC] and datetime64[ns] when you pass in an explicit dtype shouldn't actually be a meaningful difference?

@tswast

Contributor Author

commented Mar 23, 2019

I think this is related to #23990 (though if I'm doing something that's deprecated, I didn't see a deprecation warning).

I tried using pd.DatetimeTZDtype(unit='ns') to force a timezone-naive dtype, but the timezone is required. The only way I could find to request datetime64[ns] is to pass it as a string, as I'm doing above. Is it just not possible anymore to treat a timezone-aware datetime.datetime as a naive datetime64[ns]?
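For reference, here is a minimal sketch of the two dtype spellings in question, assuming a reasonably recent pandas release. The variable names are illustrative only; the point is that DatetimeTZDtype always needs a tz, while the naive dtype is the plain NumPy datetime64[ns].

```python
import numpy as np
import pandas as pd

# Tz-aware dtype: a timezone is mandatory.
aware_dtype = pd.DatetimeTZDtype(unit="ns", tz="UTC")

# Omitting tz is rejected (a TypeError in the versions this sketch assumes).
try:
    pd.DatetimeTZDtype(unit="ns")
    tz_required = False
except TypeError:
    tz_required = True

# Tz-naive dtype: just the NumPy dtype, usually passed as a string.
naive_dtype = np.dtype("datetime64[ns]")

print(aware_dtype, naive_dtype, tz_required)
```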

@mroeschke

Member

commented Mar 26, 2019

I guess in theory the dtype argument in the DataFrame constructor should enforce the final dtype (in this case, strip the timezone). For this use case, though, it's more idiomatic to call df2.dates.dt.tz_localize(None) after construction.
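A minimal sketch of that workaround, reusing the dates from the report. It assumes datetime.timezone.utc as a stand-in for pytz.utc so the snippet has no extra dependency. Note that tz_localize(None) returns a new Series, so the result has to be assigned back.

```python
import datetime

import pandas as pd

dates = [
    datetime.datetime(2019, 1, 1, 12, tzinfo=datetime.timezone.utc),
    datetime.datetime(2018, 4, 1, 17, 13, tzinfo=datetime.timezone.utc),
]

# Construct without a dtype, so the column is inferred as tz-aware.
df2 = pd.DataFrame({"dates": dates})
was_aware = df2["dates"].dt.tz is not None

# Drop the timezone, keeping the wall-clock values (here: the UTC times).
df2["dates"] = df2["dates"].dt.tz_localize(None)

print(df2.dtypes)
```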

@tswast

Contributor Author

commented Apr 3, 2019

Interesting that tz_localize is more idiomatic. It seemed much more natural to me to set the dtype once at construction time. I suspect it's more idiomatic because setting the dtype to a tz-aware dtype didn't really work before 0.24. I had a bear of a time in pydata/pandas-gbq#263 trying to get pandas versions before 0.24 to act the same as 0.24 regarding datetime64[ns], but it seems earlier versions don't like getting DatetimeTZDtype as a dtype.

@mroeschke

Member

commented Apr 3, 2019

Exactly. The reason I'd call it more idiomatic is that timezone conversion is a lot more robust and explicit with the tz_localize/tz_convert methods.

I don't think as much care has been given to ensuring that the correct timezone conversion occurs when a dtype is passed to the constructor. The one exception is .astype(), but I still advocate for tz_localize/tz_convert.
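To illustrate why the explicit methods are less ambiguous, here is a small sketch contrasting the two ways of dropping a timezone. The fixed UTC-05:00 offset is a hypothetical zone chosen for illustration: tz_localize(None) keeps the local wall-clock time, while tz_convert(None) converts to UTC first and then drops the timezone.

```python
import datetime

import pandas as pd

# Hypothetical fixed-offset zone, used only for this illustration.
minus_five = datetime.timezone(datetime.timedelta(hours=-5))

s = pd.Series(
    pd.to_datetime(["2019-01-01 12:00", "2018-04-01 17:13"]).tz_localize("UTC")
)

# Same instants, expressed in the UTC-05:00 zone.
local = s.dt.tz_convert(minus_five)

# Strip the timezone but keep the local wall-clock values.
wall_clock = local.dt.tz_localize(None)

# Convert back to UTC, then strip the timezone.
as_utc = local.dt.tz_convert(None)

print(wall_clock)
print(as_utc)
```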

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#tz-aware-dtypes
