Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: rename with_time_zone to convert_time_zone #6767

Closed
MarcoGorelli opened this issue Feb 9, 2023 · 8 comments · Fixed by #6768
Closed

API: rename with_time_zone to convert_time_zone #6767

MarcoGorelli opened this issue Feb 9, 2023 · 8 comments · Fixed by #6768
Labels
enhancement New feature or an improvement of an existing feature

Comments

@MarcoGorelli
Copy link
Collaborator

Problem description

The current methods for working with time zones are:

  • with_time_zone: convert from one time zone to another
  • cast_time_zone: set/unset/change time zone

The second one is clear enough, but the first one seems to regularly cause confusion.

As the next release will already see some changes to the API (deprecating tz_localize), why not go all the way and rename with_time_zone to convert_time_zone?

@MarcoGorelli MarcoGorelli added the enhancement New feature or an improvement of an existing feature label Feb 9, 2023
@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Feb 10, 2023

I think cast_time_zone should probably be renamed replace_time_zone (in addition to changing with_time_zone to convert_time_zone), because even staring right at cast_ / convert_ I'm still not completely sure I know what the difference is between them as they are almost synonyms (for instance: if I cast an int to float, I have also converted it 😅).

The word "replace" should help to better disambiguate the two methods (and would also be similar to the equivalent python datetime method).

We would then have the following two methods, which (at least to me!) seem a lot less interchangeable or synonymous, naming-wise:

  • convert_time_zone:
    convert from current timezone to new timezone
  • replace_time_zone:
    replace current timezone (possibly None) with new timezone (also possibly None)

In use:

from datetime import datetime
import polars as pl

df = pl.DataFrame(
    {
        "utc": pl.date_range(
            datetime(2020, 3, 1, 12, 30),
            datetime(2020, 7, 1, 23, 00),
            "1mo 6h 15s",
            time_zone="UTC",
        ),
    }
)
df.with_columns(
    pl.col('utc').dt.replace_time_zone("America/New_York").alias('as_est'),
    pl.col('utc').dt.replace_time_zone(None).alias('as_naive'),
    pl.col('utc').dt.convert_time_zone("Asia/Tokyo").alias('to_jst'),
)
# ┌─────────────────────────┬────────────────────────────────┬─────────────────────┬──────────────────────────┐
# │ utc                     ┆ as_est                         ┆ as_naive            ┆ to_jst                   │
# │ ---                     ┆ ---                            ┆ ---                 ┆ ---                      │
# │ datetime[μs, UTC]       ┆ datetime[μs, America/New_York] ┆ datetime[μs]        ┆ datetime[μs, Asia/Tokyo] │
# ╞═════════════════════════╪════════════════════════════════╪═════════════════════╪══════════════════════════╡
# │ 2020-03-01 12:30:00 UTC ┆ 2020-03-01 12:30:00 EST        ┆ 2020-03-01 12:30:00 ┆ 2020-03-01 21:30:00 JST  │
# │ 2020-04-01 18:30:15 UTC ┆ 2020-04-01 18:30:15 EDT        ┆ 2020-04-01 18:30:15 ┆ 2020-04-02 03:30:15 JST  │
# │ 2020-05-02 00:30:30 UTC ┆ 2020-05-02 00:30:30 EDT        ┆ 2020-05-02 00:30:30 ┆ 2020-05-02 09:30:30 JST  │
# │ 2020-06-02 06:30:45 UTC ┆ 2020-06-02 06:30:45 EDT        ┆ 2020-06-02 06:30:45 ┆ 2020-06-02 15:30:45 JST  │
# └─────────────────────────┴────────────────────────────────┴─────────────────────┴──────────────────────────┘

Seeing "replace" / "convert", I'd feel more confident that I can reason about the expected outcome without having to go back to the docstrings. 👍

@ritchie46
Copy link
Member

I agree with @alexander-beedie on this one. I had the same question, how do cast and convert differ.

If we go for replace/ convert I also think we should specifically state in the docs that the old time zone is ignored.

@stinodego
Copy link
Member

stinodego commented Feb 10, 2023

I don't think replace is correct, because it implies that there is already a timezone present. But it also works when there is no timezone?

set_time_zone makes more sense to me.

But then again, time zones as a whole don't make any sense to me, so take this comment with a grain of salt 😄

I am reading the current docs and doc examples and honestly I can't make out what is happening for either method. I think we should update the doc examples when doing the deprecation.

@MarcoGorelli
Copy link
Collaborator Author

that's a good point, although even set_time_zone might not be 100% accurate, as it works to unset a timezone too. I think replace might be accurate enough

I am reading the current docs and doc examples and honestly I can't make out what is happening for either method.

😆 does this help pola-rs/polars-book#234 ?

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Feb 10, 2023

I don't think replace is correct, because it implies that there is already a timezone present. But it also works when there is no timezone?

Yup; and it would be completely consistent with Python behaviour, where a naive datetime can be considered equivalent to a datetime with tzinfo=None, and you can replace that None with a real timezone (this consistency is one of the reasons I thought replace_ might be good, though I also like set_ - both would work for me ;)

from datetime import datetime
from zoneinfo import ZoneInfo

d = datetime( 2021,1,1,12,30,45 )
dtz = d.replace( tzinfo=ZoneInfo('Asia/Tokyo') )

d
# datetime.datetime(2021, 1, 1, 12, 30, 45)
dtz
# datetime.datetime(2021, 1, 1, 12, 30, 45, tzinfo=ZoneInfo('Asia/Tokyo'))

@stinodego
Copy link
Member

it would be completely consistent with Python behaviour, where a naive datetime can be considered equivalent to a datetime with tzinfo=None, and you can replace that None with a real timezone (this consistency is one of the reasons I thought replace_ might be good

Sounds good enough to me, thanks for the explanation!

@deanm0000
Copy link
Collaborator

I like with_time_zone as it is consistent R's lubridate::with_tz. If that's not compelling enough then how about doing as_timezone to be consistent with python datatime's native astimezone. Of course there's also a reason why with_time_zone was chosen in the first place.

@sorhawell
Copy link
Contributor

I also like the renaming, because cast/with meant the opposite for x_time_zone than for x_time_unit which was confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature
Projects
None yet
6 participants