feat(python): support DataFrame init with Datetime dtypes that specify a timezone #5174

Merged
merged 1 commit into from
Oct 12, 2022

Conversation

alexander-beedie
Collaborator

@alexander-beedie alexander-beedie commented Oct 12, 2022

A DataFrame can now be initialized with timezone-aware Datetime dtypes; previously the timezone had to be applied to each column after construction.

Example

from datetime import datetime
import polars as pl

df = pl.DataFrame(
    data={
        "d1": [datetime(2022,10,12,12,30)],
        "d2": [datetime(2022,10,12,12,30)],
    },
    columns=[
        ("d1", pl.Datetime(time_zone="America/New_York")),
        ("d2", pl.Datetime(time_zone="Asia/Tokyo")),
    ],
)

Before:

# NotImplementedError: Conversion of polars data type 
#  datetime[μs, America/New_York] to Python type not implemented.

After:

# ┌────────────────────────────────┬──────────────────────────┐
# │ d1                             ┆ d2                       │
# │ ---                            ┆ ---                      │
# │ datetime[μs, America/New_York] ┆ datetime[μs, Asia/Tokyo] │
# ╞════════════════════════════════╪══════════════════════════╡
# │ 2022-10-12 12:30:00 EDT        ┆ 2022-10-12 12:30:00 JST  │
# └────────────────────────────────┴──────────────────────────┘

df.row(0)

# (datetime(2022,10,12, 8,30,tzinfo=ZoneInfo(key='America/New_York')),
#  datetime(2022,10,12,21,30,tzinfo=ZoneInfo(key='Asia/Tokyo')))
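The row values above follow from reading the naive 12:30 input as a UTC instant and then viewing it in each target zone. A minimal standard-library sketch of that arithmetic, assuming fixed offsets for 2022-10-12 (EDT = UTC-4, JST = UTC+9) in place of `ZoneInfo` so it runs without a timezone database; the names `NEW_YORK`, `TOKYO`, and `as_utc` are illustrative only and not part of polars:

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets in effect on 2022-10-12 (illustrative stand-ins for ZoneInfo).
NEW_YORK = timezone(timedelta(hours=-4), "EDT")
TOKYO = timezone(timedelta(hours=9), "JST")

naive = datetime(2022, 10, 12, 12, 30)

# Assumption: the naive wall-clock value is stored as a UTC instant.
as_utc = naive.replace(tzinfo=timezone.utc)

# Viewing the same instant in each zone changes the wall-clock reading only.
in_new_york = as_utc.astimezone(NEW_YORK)
in_tokyo = as_utc.astimezone(TOKYO)

print(in_new_york.isoformat())  # 2022-10-12T08:30:00-04:00
print(in_tokyo.isoformat())     # 2022-10-12T21:30:00+09:00
```

Both results refer to the same instant, which is why the underlying physical values of the two columns compare equal.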

Misc:

  • Also improves/simplifies some datatype lookups/conversions.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Oct 12, 2022
@ritchie46 ritchie46 merged commit 67cfeac into pola-rs:master Oct 12, 2022
@alexander-beedie alexander-beedie deleted the frame-init-with-timezones branch October 12, 2022 12:48
@ritchie46
Member

ritchie46 commented Oct 12, 2022

Hmm.. This has some strange locale issues. :/

On my laptop (in the Netherlands) it fails with:

    def test_init_with_timezone() -> None:
        for tu in DTYPE_TEMPORAL_UNITS | frozenset([None]):
            df = pl.DataFrame(
                data={
                    "d1": [datetime(2022, 10, 12, 12, 30)],
                    "d2": [datetime(2022, 10, 12, 12, 30)],
                },
                columns=[
                    ("d1", pl.Datetime(tu, "America/New_York")),  # type: ignore[arg-type]
                    ("d2", pl.Datetime(tu, "Asia/Tokyo")),  # type: ignore[arg-type]
                ],
            )
            # note: setting timezone doesn't change the underlying/physical value...
            assert (df["d1"].to_physical() == df["d2"].to_physical()).all()
    
            # ...but (as expected) it _does_ change the interpretation of that value
>           assert df.rows() == [
                (
                    datetime(2022, 10, 12, 8, 30, tzinfo=ZoneInfo("America/New_York")),
                    datetime(2022, 10, 12, 21, 30, tzinfo=ZoneInfo("Asia/Tokyo")),
                )
            ]
E           AssertionError: assert [(datetime.da...sia/Tokyo')))] == [(datetime.da...sia/Tokyo')))]
E             At index 0 diff: (datetime.datetime(2022, 10, 12, 7, 30, tzinfo=zoneinfo.ZoneInfo(key='America/New_York')), datetime.datetime(2022, 10, 12, 20, 30, tzinfo=zoneinfo.ZoneInfo(key='Asia/Tokyo'))) != (datetime.datetime(2022, 10, 12, 8, 30, tzinfo=zoneinfo.ZoneInfo(key='America/New_York')), datetime.datetime(2022, 10, 12, 21, 30, tzinfo=zoneinfo.ZoneInfo(key='Asia/Tokyo')))
E             Use -v to get more diff

tests/unit/test_df.py:2414: AssertionError

@alexander-beedie
Collaborator Author

alexander-beedie commented Oct 12, 2022

@ritchie46 argh... must be some kind of implicit local/UTC shenanigans on load into pyarrow from naive datetime objects - what's your current UTC offset? (+1 or +2?) I'll try and harden the test ASAP.
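As a rough arithmetic check on the failure above (not a diagnosis of where the shift occurs), the sketch below attaches different assumed UTC offsets to the same naive input and shows what each would produce in the two target zones. The helper `interpret` and the fixed-offset constants are hypothetical names for illustration, not polars or pyarrow APIs:

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets in effect on 2022-10-12 (stand-ins for ZoneInfo).
NEW_YORK = timezone(timedelta(hours=-4))  # EDT
TOKYO = timezone(timedelta(hours=9))      # JST


def interpret(naive, assumed_offset_hours):
    """Attach an assumed UTC offset to a naive datetime, then return the
    wall-clock readings in New York and Tokyo (hypothetical helper)."""
    aware = naive.replace(tzinfo=timezone(timedelta(hours=assumed_offset_hours)))
    return (
        aware.astimezone(NEW_YORK).replace(tzinfo=None),
        aware.astimezone(TOKYO).replace(tzinfo=None),
    )


naive = datetime(2022, 10, 12, 12, 30)
for offset in (0, 1, 2):
    ny, tokyo = interpret(naive, offset)
    print(f"UTC+{offset}: New York {ny:%H:%M}, Tokyo {tokyo:%H:%M}")
```

An offset of 0 reproduces the expected 08:30 / 21:30, while an offset of +1 reproduces the 07:30 / 20:30 seen in the assertion diff. A test that builds its expected values from an explicitly UTC-aware input, rather than a naive one, would not depend on the machine's local timezone.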

@ritchie46
Member

> @ritchie46 argh... must be some kind of implicit local/UTC shenanigans on load into pyarrow from naive datetime objects - what's your current UTC offset? (+1 or +2?) I'll try and harden the test ASAP.

I am not entirely sure, but I think UTC +2
https://www.timeanddate.com/time/zone/netherlands/amsterdam

@alexander-beedie
Collaborator Author

Cheers; think I have a probable fix... one sec.

@alexander-beedie
Collaborator Author

Fingers crossed: #5177 🙏

zundertj pushed a commit to zundertj/polars that referenced this pull request Jan 7, 2023