This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Parquet: Reading timezones leads to incorrect datetimes #861

Closed
ritchie46 opened this issue Feb 23, 2022 · 0 comments · Fixed by #862
Labels
bug Something isn't working no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments

@ritchie46
Collaborator

If we create a pandas DataFrame with timestamps in 2022, write it to Parquet, and read it back with arrow2, we are back in the 70's 🕺.

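import io
import pandas as pd
import polars as pl

# Hourly, timezone-aware (UTC) timestamps on 2022-01-01.
df = pd.DataFrame(data={"Timestamp": pd.date_range("2022-01-01T00:00+00:00", "2022-01-01T10:00+00:00", freq="H")})

# Round-trip through an in-memory Parquet file.
f = io.BytesIO()
df.to_parquet(f)
f.seek(0)
print(pl.read_parquet(f))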
shape: (11, 1)
┌─────────────────────────┐
│ Timestamp               │
│ ---                     │
│ datetime[ns]            │
╞═════════════════════════╡
│ 1970-01-19 23:49:55.200 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1970-01-19 23:49:58.800 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1970-01-19 23:50:02.400 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1970-01-19 23:50:06     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ...                     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1970-01-19 23:50:20.400 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1970-01-19 23:50:24     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1970-01-19 23:50:27.600 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1970-01-19 23:50:31.200 │
└─────────────────────────┘
Conversion of timezone aware to naive datetimes. TZ information may be lost.

I put a dbg! around the arrow chunks we get to ensure I did not make a conversion error in polars:

[
    Timestamp(Nanosecond, Some("UTC"))[1970-01-19 23:49:55.200 +00:00, 1970-01-19 23:49:58.800 +00:00, 1970-01-19 23:50:02.400 +00:00, 1970-01-19 23:50:06 +00:00, 1970-01-19 23:50:09.600 +00:00, 1970-01-19 23:50:13.200 +00:00, 1970-01-19 23:50:16.800 +00:00, 1970-01-19 23:50:20.400 +00:00, 1970-01-19 23:50:24 +00:00, 1970-01-19 23:50:27.600 +00:00, 1970-01-19 23:50:31.200 +00:00],
]
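
The numbers above fit a unit mismatch: the first value sits about 1,640,995.2 seconds after the epoch, and consecutive rows are 3.6 seconds apart instead of one hour, both a factor of 1000 off. One reading consistent with that (an assumption, not confirmed in this issue) is that the file holds microsecond counts that are being interpreted as nanoseconds. The standard-library sketch below reproduces the arithmetic; `misread_as_ns` is a hypothetical helper for illustration and not part of arrow2 or polars.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def misread_as_ns(ts: datetime) -> datetime:
    """Take ts as a microsecond count since the epoch, then decode that
    integer as if it were a nanosecond count (assumed failure mode)."""
    micros = (ts - EPOCH) // timedelta(microseconds=1)
    # Reading the integer as nanoseconds shrinks the offset by 1000x.
    return EPOCH + timedelta(microseconds=micros // 1000)


first = datetime(2022, 1, 1, 0, 0, tzinfo=timezone.utc)
print(misread_as_ns(first))                       # 1970-01-19 23:49:55.200000+00:00
print(misread_as_ns(first + timedelta(hours=1)))  # 1970-01-19 23:49:58.800000+00:00
```

Both printed values match the first two rows of the table, which suggests the time unit, rather than the timezone offset, is what goes wrong on read.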
@jorgecarleitao jorgecarleitao added bug Something isn't working no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog labels Feb 23, 2022