Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust, python): auto-infer fmt for tz-aware date strings #7405

Merged
merged 1 commit into from
Apr 9, 2023

Conversation

MarcoGorelli
Copy link
Collaborator

@MarcoGorelli MarcoGorelli commented Mar 7, 2023

closes #7198

check this out

In [14]: data = [
    ...:     "2021-03-27T00:00:00+01:00",
    ...:     "2021-03-28T00:00:00+01:00",
    ...:     "2021-03-29T00:00:00+02:00",
    ...:     "2021-03-30T00:00:00+02:00",
    ...: ]

In [15]: pl.Series(data).str.strptime(pl.Datetime, utc=True).dt.convert_time_zone('Europe/Brussels')
Out[15]: 
shape: (4,)
Series: '' [datetime[μs, Europe/Brussels]]
[
        2021-03-27 00:00:00 CET
        2021-03-28 00:00:00 CET
        2021-03-29 00:00:00 CEST
        2021-03-30 00:00:00 CEST
]

limitations:

  • the only format I've added in this PR is '%+', but the list can be expanded later
  • this'll take some extra work for it to work in read_csv, but...it should make it possible!

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Mar 7, 2023
@MarcoGorelli MarcoGorelli marked this pull request as ready for review April 7, 2023 09:45
@MarcoGorelli
Copy link
Collaborator Author

One month later, finally marking this as 'ready-for-review' 🎉 @ritchie46 @alexander-beedie in case you have time / want to take a look

@ritchie46
Copy link
Member

Thanks @MarcoGorelli. Could you help me a bit with the review. What have you changed? What are expected performance implications and which snippets are the gist of the changes?

@MarcoGorelli MarcoGorelli changed the title feat(rust, python): auto-infer tz-aware date strings feat(rust, python): auto-infer fmt for tz-aware date strings Apr 8, 2023
@MarcoGorelli
Copy link
Collaborator Author

Sure!

The main thing is that this PR expands Pattern to

 pub enum Pattern {
     DateDMY,
     DateYMD,
     DatetimeYMD,
     DatetimeDMY,
+    DatetimeYMDZ,
 }

to include YMD where an offset is present.

To keep track of the offset, I've also changed return type of infer_pattern_single from Option<Pattern> to Option<PatternWithOffset>.
The latter is just like the former, but also keeps track of the offset:

pub struct PatternWithOffset {
    pub pattern: Pattern,
    pub offset: Option<FixedOffset>,
}

To deal with the offset, I've introduced transform_tzaware_datetime_*s (similar to transform_datetime_*s) which also needs to take a utc argument and an offset argument - so I've added them to transform_datetie_*s too, but prefixed them with underscores as they're unused.

The rest is mostly just updating based on the above.

Just like with parsing with an explicit fmt, multiple offsets are only allowed if utc=True.

I don't think there should be any performance implications. A timing test I've done doesn't show any difference in performance for dates.str.strptime(pl.Datetime) compared with master

@ritchie46
Copy link
Member

Thanks for the explanation. This makes the review so much easier. 👍

The code looks good and the rationale of expanding the tz aware functionality make sense, so merging in. Thanks a lot!

@ritchie46 ritchie46 merged commit f3b6c14 into pola-rs:master Apr 9, 2023
14 checks passed
alicja-januszkiewicz pushed a commit to alicja-januszkiewicz/polars that referenced this pull request Apr 17, 2023
chitralverma pushed a commit to chitralverma/polars that referenced this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Auto-infer tz-aware date strings
2 participants