DataFrame (and Series?) constructor ignores request for naive datetime64 dtype when passed datetime64 objects with timezone #25843

Closed
tswast opened this issue Mar 22, 2019 · 4 comments · Fixed by #26167
Labels
DataFrame (DataFrame data structure), Datetime (Datetime data dtype), Regression (Functionality that used to work in a prior pandas version), Timezones (Timezone data dtype)

@tswast
Contributor

tswast commented Mar 22, 2019

I can reproduce this in pandas (development version and 0.24.0+, but not 0.23.4) with this minimal example:

import datetime

import pandas as pd
import pytz


dates = [
    datetime.datetime(2019, 1, 1, 12, tzinfo=pytz.utc),
    datetime.datetime(2018, 4, 1, 17, 13, tzinfo=pytz.utc),
]

df = pd.DataFrame({"dates": dates})
print(df.dtypes)

df2 = pd.DataFrame({"dates": dates}, dtype="datetime64[ns]")
print(df2.dtypes)

It prints:

# df
dates    datetime64[ns, UTC]  <-- I expect this.
dtype: object

# df2
dates    datetime64[ns, UTC]  <-- I didn't expect this.
dtype: object

The changelog for 0.24.0 (http://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html) does list a lot of changes to datetime64 behavior, so maybe this is intended? Maybe the distinction between datetime64[ns, UTC] and datetime64[ns] isn't meant to be meaningful when you pass in an explicit dtype?

@tswast
Contributor Author

tswast commented Mar 23, 2019

I think this is related to #23990 (though if I'm doing something that's deprecated, I didn't see a deprecation warning).

I tried using pd.DatetimeTZDtype(unit='ns') to force a timezone-naive dtype, but a timezone is required. The only way I could figure out to request datetime64[ns] is to pass in a string, as I'm doing. Is it just not possible anymore to choose to treat a timezone-aware datetime.datetime as a naive datetime64[ns]?
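
For reference, a small sketch of the two dtype spellings discussed in this comment. It assumes a reasonably recent pandas; the exact exception type raised when tz is omitted may differ between versions, so the except clause is deliberately broad.

```python
import numpy as np
import pandas as pd

# The naive dtype is a plain NumPy dtype; the string form resolves to it.
naive_dtype = np.dtype("datetime64[ns]")

# The tz-aware dtype is a pandas extension dtype and needs an explicit tz.
aware_dtype = pd.DatetimeTZDtype(unit="ns", tz="UTC")

# Omitting tz is rejected rather than yielding a naive dtype.
try:
    pd.DatetimeTZDtype(unit="ns")
    tz_required = False
except (TypeError, ValueError):
    tz_required = True

print(naive_dtype, aware_dtype, tz_required)
```

So the string "datetime64[ns]" (or the equivalent NumPy dtype) appears to be the only way to name the naive dtype; DatetimeTZDtype only describes the tz-aware case.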

@mroeschke
Member

I guess in theory the dtype argument in the DataFrame constructor should enforce the final dtype (in this case, strip the timezone). For this use case, though, it's more idiomatic to call df2.dates.dt.tz_localize(None) after construction.
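
A minimal sketch of the suggestion above, reusing the data from the original report. It swaps pytz.utc for the standard library's datetime.timezone.utc so that it runs without pytz installed; that substitution is an assumption made for the example, not part of the report. Note that tz_localize returns a new Series, so the result has to be assigned back.

```python
import datetime

import pandas as pd

utc = datetime.timezone.utc
dates = [
    datetime.datetime(2019, 1, 1, 12, tzinfo=utc),
    datetime.datetime(2018, 4, 1, 17, 13, tzinfo=utc),
]

# Build the frame first; the column is inferred as tz-aware.
df2 = pd.DataFrame({"dates": dates})
print(df2.dtypes)

# Then drop the timezone explicitly. tz_localize(None) keeps the
# wall-clock values and returns a new, tz-naive Series.
df2["dates"] = df2["dates"].dt.tz_localize(None)
print(df2.dtypes)
```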

mroeschke added the Datetime, Timezones, and DataFrame labels Mar 26, 2019
jorisvandenbossche added the Regression label Mar 30, 2019
@tswast
Contributor Author

tswast commented Apr 3, 2019

Interesting that tz_localize is more idiomatic. It seemed much more natural to me to set the dtype once at construction time. I suspect it's more idiomatic because setting the dtype to a tz-aware dtype didn't really work before 0.24. I had a bear of a time in googleapis/python-bigquery-pandas#263 trying to get pandas versions before 0.24 to act the same as 0.24 regarding datetime64[ns], but it seems earlier versions don't like getting DatetimeTZDtype as a dtype.

@mroeschke
Member

mroeschke commented Apr 3, 2019

Exactly. The reason I'd call it more idiomatic is that the timezone conversion process is a lot more robust and explicit with the tz_localize/tz_convert methods.

I don't think as much care has been given to ensuring that the correct timezone conversion occurs when using dtypes in the constructor. The one exception is .astype(), but I still advocate for tz_localize/tz_convert.

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#tz-aware-dtypes
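
To illustrate what "explicit" means here, a short sketch using the dates from the original report. The target zone America/New_York is an arbitrary choice for the example, and it assumes timezone data is available to pandas. The point is that dropping a timezone is ambiguous unless you say which wall clock you want to keep, and tz_convert followed by tz_localize(None) makes that choice visible.

```python
import datetime

import pandas as pd

utc = datetime.timezone.utc
s = pd.Series(
    [
        datetime.datetime(2019, 1, 1, 12, tzinfo=utc),
        datetime.datetime(2018, 4, 1, 17, 13, tzinfo=utc),
    ]
)

# Keep the UTC wall-clock values and drop the timezone.
utc_wall = s.dt.tz_localize(None)

# Convert to another zone first, then drop the timezone, which keeps
# that zone's wall-clock values instead.
ny_wall = s.dt.tz_convert("America/New_York").dt.tz_localize(None)

print(utc_wall)
print(ny_wall)
```

Both results are tz-naive, but they hold different values: noon UTC on January 1 is 07:00 in New York, so a bare request for a naive dtype leaves open which of the two is meant.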
