BUG: Fix parsing of ISO8601 durations #37159

Merged Oct 17, 2020 (6 commits)

@mgmarino mgmarino commented Oct 16, 2020

This PR fixes the following behavior:

  • Adds "W" as a valid weeks designator
  • Fixes parsing of durations with empty periods (e.g. "PT10M")
  • Fixes partial parsing of durations (e.g. "P1DT1H")
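The three cases above can be illustrated with a minimal sketch. This is not pandas' actual parser: `parse_iso_duration` and `_ISO_DURATION` are hypothetical names, the sketch uses only the standard library, it ignores year and month designators, and `datetime.timedelta` only has microsecond resolution.

```python
import re
from datetime import timedelta

# Hypothetical helper, not part of pandas: a regex-based sketch of
# ISO 8601 duration parsing covering weeks, an empty date period,
# and combined date/time periods.
_ISO_DURATION = re.compile(
    r"P"
    r"(?:(?P<weeks>\d+)W)?"
    r"(?:(?P<days>\d+)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?"
    r")?"
)


def parse_iso_duration(text: str) -> timedelta:
    """Parse a non-negative ISO 8601 duration such as "P1DT1H"."""
    match = _ISO_DURATION.fullmatch(text)
    # Reject strings that carry no components at all, such as "P" or "PT".
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"Invalid ISO 8601 duration: {text!r}")
    # Keep only the components that were present and pass them to timedelta.
    parts = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    return timedelta(**parts)


print(parse_iso_duration("P1WT0S"))  # weeks designator
print(parse_iso_duration("PT10M"))   # empty date period
print(parse_iso_duration("P1DT1H"))  # date and time periods together
```

Making every component optional while requiring at least one is what lets "PT10M" and "P1DT1H" both parse completely while "P" on its own is still rejected.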

Fixes #29773.
Fixes #36204.

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@mgmarino marked this pull request as ready for review October 16, 2020 11:29
@jreback added the Bug and Timedelta (Timedelta data type) labels Oct 16, 2020
@jreback added this to the 1.2 milestone Oct 16, 2020
("PT1S", Timedelta(seconds=1)),
("PT0S", Timedelta(seconds=0)),
("P1WT0S", Timedelta(days=7, seconds=0)),
("P1D", Timedelta(days=1)),
Contributor

do we handle negatives of these? (IIRC that's on one of the issues)

Contributor Author

I didn't change the behavior of negativity, and the question is how to handle it. One of the only references I can find regarding the expected behavior is this comment, which also seems to indicate it's not very well defined in the ISO 8601 standard. How would you feel about pulling that out into a separate issue?

Contributor

Yes, please create a separate issue. The only question for now: do we raise if a negative is passed in? (I think we should, unless/until we decide how to handle it.) Though isn't the result just the negative of the Timedelta itself, which is well defined?

Contributor Author

Ok, will do.

No, nothing is raised when it is negative, and the behavior is, admittedly, quite odd. For example, "P-6DT0H50M3.010010012S" parses as Timedelta(days=-6, minutes=50, seconds=3, milliseconds=10, microseconds=10, nanoseconds=12), and the negative sign is only allowed directly after the "P" designator. A negative sign in any other position raises an error. As far as I can tell, there are two possible ways forward:

  • only support an overall "-", e.g. "-P6DT1H" = Timedelta('-7 days +23:00:00'), and/or
  • support a positive/negative sign on each component, e.g. "P7DT-1H3M" = Timedelta('6 days 23:03:00')
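The arithmetic behind these two options, and behind the current behavior, can be sketched with the standard library's `datetime.timedelta`. This is only an illustration under simplifying assumptions: the components are written out by hand rather than parsed, the variable names are made up for the example, and the sub-second part of the current-behavior example is dropped.

```python
from datetime import timedelta

# Option 1: a single overall sign, as in "-P6DT1H".
# The whole duration (6 days + 1 hour) is negated.
overall_sign = -timedelta(days=6, hours=1)

# Option 2: a sign on each component, as in "P7DT-1H3M".
# Each signed component is simply added up.
per_component = timedelta(days=7) + timedelta(hours=-1) + timedelta(minutes=3)

# Current behavior for "P-6DT0H50M3...S": only the days are negated,
# while the time components stay positive.
days_only_negated = timedelta(days=-6, minutes=50, seconds=3)

# What a fully negated reading of the same string would give instead.
fully_negated = -timedelta(days=6, minutes=50, seconds=3)

print(overall_sign.total_seconds())
print(per_component.total_seconds())
print(days_only_negated.total_seconds(), fully_negated.total_seconds())
```

The last two values differ, which is why the sign semantics need an explicit decision rather than being left to the position of the "-".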

Contributor Author

Issue now here: #37172. I would prefer to change/update the negativity behavior in a separate PR if that's ok.

Contributor

yep that's great. well scoped PRs are good. one thing at a time. and thanks for working on this.

- ISO 8601 doesn't seem to express that explicit limitations here are
  necessary.
@jreback merged commit 009ffa8 into pandas-dev:master Oct 17, 2020

jreback commented Oct 17, 2020

thanks @mgmarino very nice!

@mgmarino deleted the fix-iso8601-parsing branch October 17, 2020 06:34
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Successfully merging this pull request may close these issues:

  • BUG: Incorrect parsing of ISO 8601 durations
  • Timedelta not parsing ISO-8601 strings properly