Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python!): use time zone from dtype to overwrite output time zone when initialising Series #10689

Merged

Conversation

MarcoGorelli
Copy link
Collaborator

@MarcoGorelli MarcoGorelli commented Aug 23, 2023

closes #10662

I think this should probably follow what currently happens for dt.strptime

So, the following two should return the same result:

print(pl.Series([dtm], dtype=dtype))
print(pl.Series([dtm_string]).str.strptime(dtype))

Unfortunately, to make behaviour consistent, this does involve a breaking change. My fault for not having noticed the inconsistency earlier really - anyway, here it is:

before:

In [9]: pl.DataFrame({'a': [datetime(2020, 1, 1)]}, schema={'a': pl.Datetime('us', 'Asia/Tokyo')})
Out[9]:
shape: (1, 1)
┌──────────────────────────┐
│ a                        │
│ ---                      │
│ datetime[μs, Asia/Tokyo] │
╞══════════════════════════╡
│ 2020-01-01 09:00:00 JST  │
└──────────────────────────┘

In [10]: pl.Series([datetime(2020, 1, 1)], dtype=pl.Datetime('us', 'Asia/Tokyo'))
Out[10]:
shape: (1,)
Series: '' [datetime[μs]]
[
        2020-01-01 00:00:00
]

In [11]: pl.Series(['2020-01-01']).str.strptime(pl.Datetime('us', 'Asia/Tokyo'))
Out[11]:
shape: (1,)
Series: '' [datetime[μs, Asia/Tokyo]]
[
        2020-01-01 00:00:00 JST
]

Now:

In [1]: pl.DataFrame({'a': [datetime(2020, 1, 1)]}, schema={'a': pl.Datetime('us', 'Asia/Tokyo')})
Out[1]: 
shape: (1, 1)
┌──────────────────────────┐
│ a                        │
│ ---                      │
│ datetime[μs, Asia/Tokyo] │
╞══════════════════════════╡
│ 2020-01-01 00:00:00 JST  │
└──────────────────────────┘

In [2]: pl.Series([datetime(2020, 1, 1)], dtype=pl.Datetime('us', 'Asia/Tokyo'))
Out[2]: 
shape: (1,)
Series: '' [datetime[μs, Asia/Tokyo]]
[
        2020-01-01 00:00:00 JST
]

In [3]: pl.Series(['2020-01-01']).str.strptime(pl.Datetime('us', 'Asia/Tokyo'))
Out[3]: 
shape: (1,)
Series: '' [datetime[μs, Asia/Tokyo]]
[
        2020-01-01 00:00:00 JST
]

@github-actions github-actions bot added fix Bug fix python Related to Python Polars labels Aug 23, 2023
@@ -197,7 +197,7 @@ def test_write_delta(df: pl.DataFrame, tmp_path: Path) -> None:
pl.Series(
"date_ns",
[datetime(2010, 1, 1, 0, 0)],
dtype=pl.Datetime(time_unit="ns", time_zone="ETC"),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'ETC' isn't a valid time zone, but this never errored because before it was being ignored

@MarcoGorelli MarcoGorelli force-pushed the use-tz-from-dtype-when-initting-series branch 4 times, most recently from 09b1c6b to 231829f Compare August 23, 2023 15:58
@MarcoGorelli MarcoGorelli changed the title fix(python): use time zone from dtype to overwrite output time zone when initialising Series fix(python!): use time zone from dtype to overwrite output time zone when initialising Series Aug 23, 2023
@github-actions github-actions bot added the breaking python Change that breaks backwards compatibility for the Python package label Aug 23, 2023
@ritchie46
Copy link
Member

Yes, we definitely need this one in 0.19!

@ritchie46
Copy link
Member

@MarcoGorelli does this need to remain in draft? Seems like the right way to go. If a user provides a datetype, we must respect it.

@MarcoGorelli
Copy link
Collaborator Author

Thanks - what's slightly bothering me is if someone mixes tz-naive and tz-aware values, e.g.:

In [5]: pl.Series([datetime(2020, 1, 1), datetime(2020, 1, 1, tzinfo=ZoneInfo('US/Pacific'))], dtype=pl.Datetime('us', 'Asia/Kathmandu'))
Out[5]: 
shape: (2,)
Series: '' [datetime[μs, Asia/Kathmandu]]
[
        2020-01-01 00:00:00 +0545
        2020-01-01 08:00:00 +0545
]

am I'm not sure that the second value would be expected to most users. It's essentially done "convert to UTC, and then replace the time zone with what the user passed".
Though maybe this is rare enough, and users working with mixed tz-naive and tz-aware objects are kind of playing with fire anyway

@MarcoGorelli MarcoGorelli marked this pull request as ready for review August 25, 2023 09:44
@ritchie46
Copy link
Member

Though maybe this is rare enough, and users working with mixed tz-naive and tz-aware objects are kind of playing with fire anyway

Using mixed timezone datetime objects AND setting a time zone on a dtype. Might be even juggling with fire. ;)

@ritchie46 ritchie46 merged commit 03550be into pola-rs:main Aug 25, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking python Change that breaks backwards compatibility for the Python package fix Bug fix python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series constructor ignores time zone of Datetime dtype
2 participants