Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datetime: Expand scope of fromisoformat to include all of ISO 8601. #80010

Closed
rdb mannequin opened this issue Jan 25, 2019 · 17 comments
Closed

datetime: Expand scope of fromisoformat to include all of ISO 8601. #80010

rdb mannequin opened this issue Jan 25, 2019 · 17 comments
Labels
3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@rdb
Copy link
Mannequin

rdb mannequin commented Jan 25, 2019

BPO 35829
Nosy @brettcannon, @abalkin, @jwilk, @agronholm, @dlenski, @rdb, @ammaraskar, @mehaase, @pganssle, @hongweipeng, @godlygeek

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-01-25.19:36:47.536>
labels = ['type-feature', 'library', '3.11']
title = 'datetime: parse "Z" timezone suffix in fromisoformat()'
updated_at = <Date 2022-02-02.17:32:27.477>
user = 'https://github.com/rdb'

bugs.python.org fields:

activity = <Date 2022-02-02.17:32:27.477>
actor = 'p-ganssle'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2019-01-25.19:36:47.536>
creator = 'rdb'
dependencies = []
files = []
hgrepos = []
issue_num = 35829
keywords = []
message_count = 14.0
messages = ['334365', '334368', '334370', '334372', '334378', '334430', '349163', '350643', '367533', '367695', '385309', '405759', '411713', '412385']
nosy_count = 11.0
nosy_names = ['brett.cannon', 'belopolsky', 'jwilk', 'alex.gronholm', 'dlenski', 'rdb', 'ammar2', 'mehaase', 'p-ganssle', 'hongweipeng', 'godlygeek']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue35829'
versions = ['Python 3.11']

@rdb
Copy link
Mannequin Author

rdb mannequin commented Jan 25, 2019

The fromisoformat() function added in 3.7 is a very welcome addition. But one quite noticeable absence was the inability to parse Z instead of +00:00 as the timezone suffix.

Its absence is particularly noticeable given how ubiquitous use of Z is in ISO 8601 timestamps on the web; it is also part of the RFC 3339 subset. In particular, JavaScript produces it in its canonical ISO 8601 format and is therefore quite common in JSON APIs; this would be the only piece missing to parse ISO dates produced by JavaScript correctly.

I realise that the function was not intended to be able to parse *all* timestamps. But given the triviality of this change, the ubiquity of this particular formatting feature, and the fact that this change is designed in particular for operability with the widely-used JavaScript date format, I don't think this is a slippery slope, and I would personally see no harm in accepting a 'Z' instead of a timezone.

I am happy to follow up with a patch for this, but would first like confirmation that there is any chance that such a change would be accepted. Thanks for your consideration!

@rdb rdb mannequin added 3.8 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jan 25, 2019
@pganssle
Copy link
Member

pganssle commented Jan 25, 2019

You can see the discussion in bpo-15873 for the full rationale of why "Z" was omitted - to quote from https://bugs.python.org/issue15873#msg307607 :

We can have further discussion later about what exactly should be supported in Python 3.8,
but even in the pre-release discussions I'm already seeing pushback about some of the more
unusual 8601 formats, and it's a lot easier to explain (in documentation) that fromisoformat()
is intended to be the inverse of isoformat() than it is to explain which variations of ISO 8601
are and are not supported (fractional minutes? if you're following the standard, the separator has
to be a T, so what other variations of the standard are allowed?)

With the current implementation, the contract of the function is very simple to explain: datetime.fromisoformat() is the inverse operation of datetime.isoformat(), which is to say that every valid input to datetime.fromisoformat() is a possible output of datetime.isoformat(), and every possible output of datetime.isoformat() is a valid input to datetime.fromisoformat().

With that as the background - fromisoformat() was designed to be a conservative API because scope is a one-way ratchet, and it's better to under-commit than over-commit. We do have the option going forward of widening the scope of the function in a backwards-compatible way. The main problem I see is that I think we should maintain the property that it should be dead simple to explain what a function does, and having to enumerate edge cases is a code smell. So "it is the inverse operation of fromisoformat(), but it also supports specifying using Z for UTC" fails that test in my opinion.

I see a few rational choices here:

  1. Supports the full ISO 8601 datetime spec and all outputs from datetime.isoformat() (these inputs mostly but not completely overlap). We would then just have to decide on a simple policy for how to deal with the optional portions of the spec.

  2. Support only the rfc3339 standard + the outputs of datetime.isoformat(), with the option to switch to #‍1 later.

  3. Add the ability for datetime.isoformat() to output 'Z' instead of 00:00, which would allow us to support it as an input and also keep the scope of datetime.fromisoformat unchanged.

  4. Add a separate function (either a classmethod or a bare function) for parsing exactly the ISO 8601 standard, maybe parse_iso8601, so both parse_iso8601 and fromisoformat have a clean, rational explanation for what they do.

  5. Leave the current scope alone and don't add anything.

5a. Leave the current scope alone and point people in the direction of dateutil.parser.isoparse in the documentation.

@rdb
Copy link
Mannequin Author

rdb mannequin commented Jan 25, 2019

I'm a fan of "be lenient in what you accept" but I can see your point in not causing confusion about what this method is meant to be used for.

Because what I'm trying to use it for technically falls outside the intended use, I say it would make the most sense to expand the intended use a bit. From a cursory glance at the RFC3339 spec it looks like the only other change needed to fully support RFC3339 would be to support an arbitrary number of sub-second digits, whereas fromisoformat() currently requires either exactly 3 or 6.

So, I can bundle this together with a change making it more lenient about the number of decimal places for seconds, and we can change the docs for fromisoformat() to be "it accepts any RFC3339 timestamp, including those generated by isoformat()".

Does this seem acceptable? We can always expand further to allow any ISO 8601 timestamp later, but RFC3339 would already make this function immensely more useful. I really think that parsing RFC3339 dates is a feature Python needs to have in the standard library given their ubiquity on the web.

Alternatively I am happy to consider adding something like a utc=True flag to isoformat(), but I would personally feel reluctant to add any features that I can't think of a solid use case for.

@pganssle
Copy link
Member

I can see your point in not causing confusion about what this method is meant to be used for.

In this case, making it easy to explain what it does is less important than making the scope and contract of the function clear so that we don't have to argue about what should and should not be supported. Having a narrowly-scoped function is also useful for other reasons:

  1. The API is clearer - there are no options to configure on this function, if you start supporting a bunch of features, people will inevitably want to turn some of them *off*, because they only want to accept a subset of the valid inputs.

  2. The interface to test is clear - we can exhaustively test the entire contract of the function if desired.

  3. Development will not get stalled in decision-making about which features to support or how they might interfere with one another.

From a cursory glance at the RFC3339 spec it looks like the only other change needed to fully support RFC3339 would be to support an arbitrary number of sub-second digits, whereas fromisoformat() currently requires either exactly 3 or 6.

There are other differences, for example a comma can be used in place of a dot as the delimiter for fractional seconds. Looking at the grammar in the RFC, it seems that it might also support datetimes like 2018-W03-D4, but I don't see any mention of that in the text.

So, I can bundle this together with a change making it more lenient about the number of decimal places for seconds, and we can change the docs for fromisoformat() to be "it accepts any RFC3339 timestamp, including those generated by isoformat()".

No, because the isoformat outputs are not a subset of RFC 3339. For example, 2015-01-01T00:00:00 is not a valid RFC 3339 datetime string, nor is 2015-01-01Q00:00:00, but they are valid outputs of datetime.isoformat(). datetime.fromisoformat() also supports fractional seconds on time zone offsets, which is not part of ISO 8601.

Because what I'm trying to use it for technically falls outside the intended use, I say it would make the most sense to expand the intended use a bit.

Is there a reason you can't use dateutil.parser.isoparse? The contract of that function is to parse any valid ISO8601 datetime, and fromisoformat is adapted from it.

@rdb
Copy link
Mannequin Author

rdb mannequin commented Jan 25, 2019

> From a cursory glance at the RFC3339 spec it looks like the only other change needed to fully support RFC3339 would be to support an arbitrary number of sub-second digits, whereas fromisoformat() currently requires either exactly 3 or 6.

There are other differences, for example a comma can be used in place of a dot as the delimiter for fractional seconds. Looking at the grammar in the RFC, it seems that it might also support datetimes like 2018-W03-D4, but I don't see any mention of that in the text.

I think you're looking at the appendix, which collects the ABNF from
ISO 8601, but this is not part of RFC3339. The grammar for RFC3339 is
purposefully very restrictive to make parsing it simple. The comma
for delimiter is in though, good catch; also a trivial change.

So, I can bundle this together with a change making it more lenient about the number of decimal places for seconds, and we can change the docs for fromisoformat() to be "it accepts any RFC3339 timestamp, including those generated by isoformat()".

No, because the isoformat outputs are not a subset of RFC 3339. For example, 2015-01-01T00:00:00 is not a valid RFC 3339 datetime string, nor is 2015-01-01Q00:00:00, but they are valid outputs of datetime.isoformat(). datetime.fromisoformat() also supports fractional seconds on time zone offsets, which is not part of ISO 8601.

Fair enough (though I'd say "isoformat()" is a misnomer then). I was
just going by your option #2. We would change the wording to imply
"supports RFC 3339 or anything produced by isoformat()"

Because what I'm trying to use it for technically falls outside the intended use, I say it would make the most sense to expand the intended use a bit.

Is there a reason you can't use dateutil.parser.isoparse? The contract of that function is to parse any valid ISO8601 datetime, and fromisoformat is adapted from it.

It seems a little odd to need to pull in a third-party library for
this; it seems far more tempting for me to just do
"datetime.fromisoformat(str.replace('Z', '+00:00'))" instead since I
know my dates are produced by a JSON API.

I don't intend to get argumentative about whether supporting RFC3339
belongs in the standard library; that is clearly a decision for the
Python maintainers, and I'm not sure what criteria they follow on
this. I just find it odd to point people to a third-party library for
parsing a simple but ubiquitous date standard when there are many
modules in the standard library for far more specific use cases.

FWIW, I do think that fromisoformat() is the right function to provide
RFC3339 support. I don't think users would benefit from having to
choose between several different functions that parse similar but
subtly different date formats; this seems likely to cause confusion.

Thanks for your consideration!

@pganssle
Copy link
Member

It seems a little odd to need to pull in a third-party library for this; it seems far more tempting for me to just do "datetime.fromisoformat(str.replace('Z', '+00:00'))" instead since I know my dates are produced by a JSON API.

Yes, this is also a viable solution. Generally speaking, third party libraries are less onerous these days than they have been in the past, and there are many things that are delegated to third party libraries because staying out of the standard library gives more flexibility in release cycles and the APIs don't need to be quite as stable.

FWIW, I do think that fromisoformat() is the right function to provide RFC3339 support. I don't think
users would benefit from having to choose between several different functions that parse similar but
subtly different date formats; this seems likely to cause confusion.

This is in fact one of the reasons to proceed with caution here, because ISO 8601, RFC 3339 and datetime.isoformat() are three slightly different and in some senses incompatible datetime serialization formats. If I had the choice, I would probably either not have named isoformat the way it is named, or I would have stuck to the standard, but what's done is done. As it is now, all the "fromX" alternate constructors are simply the inverse operation of the corresponding "X" method. If we make fromisoformat accept the RFC 3339 subset of ISO 8601, people will find it confusing that it doesn't support even some of the most common other ISO 8601 formats, considering it's called fromisoformat not fromrfcformat.

To give you an idea of why this sort of thing is a problem, it's that with each minor change, expanding the scope a little sounds reasonable, but along with that comes maintenance burdens. People start to rely on the specific behavior of the function, and eventually you get into a position where someone asks for a very reasonable expansion of the scope that is incompatible with the way people are already using the function. This leads you to either stop developing the function at some arbitrary point or to start tacking on a configuration API to resolve these incompatibilities.

If instead we design the function from the beginning with a very clear scope, we can also design the configuration API (and the default values) from the beginning as well. I definitely believe there is a place for a function that parses at least the timestamp portions of the ISO 8601 spec in CPython. I think I would prefer it to be a separate function from fromisoformat. I also think that it's worth letting it marinate in dateutil a bit, so that we can get a sense of what works and what doesn't work as a configuration API so that it's at least *easier* for people to select which of the subtly different datetime formats they're intending to parse.

@mehaase
Copy link
Mannequin

mehaase mannequin commented Aug 7, 2019

Defining isoformat() and fromisoformat() as functional inverses is misguided. Indeed, it's not even true:

Python 3.7.2 (default, Dec 28 2018, 14:27:11)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> s = '2019-08-07t10:44:00+00:00'
>>> assert s == datetime.isoformat(datetime.fromisoformat(s))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

I agree with rdb that not parsing "Z" is inconvenient and counter intuitive. We have the same use case: parsing ISO strings created by JavaScript (or created by systems that interoperate with JavaScript). We have also memorized the same .replace("Z", "+00:00") hack, but this feels like a missing battery in the stdlib.

As Paul points out the legacy of isoformat() complicates the situation. A new pair of functions for RFC-3339 sounds reasonable to me, either rfcformat()/fromrfcformat() or more boldly inetformat()/frominetformat(). The contracts for these functions are simple: fromrfcformat() parses RFC-3339 strings, and rfcformat() produces an RFC-3339 string. The docs for the ISO functions should be updated to point towards the RFC-compliant functions.

I'd be willing to work on a PR, but a change of this size probably needs to through python-ideas first?

@pganssle
Copy link
Member

Defining isoformat() and fromisoformat() as functional inverses is misguided. Indeed, it's not even true:

isoformat() is not the inverse of fromisoformat(), that doesn't work because there are multiple strings that isoformat() can create from any given datetime. There is, however, only one datetime that is represented by any given datetime (assuming you consider truncation to create a new datetime), so it is fine for fromisoformat() to be the inverse of isoformat().

I have explained the reason that was chosen for the contract in several places (including in this thread), so I won't bother to repeat it. I think from a practical point of view we should eventually grow more generalized ISO 8601 parsing functionality, and the main question is what the API will look like. In dateutil.parser.isoparse, I still haven't figured out a good way to do feature flags.

I'd be willing to work on a PR, but a change of this size probably needs to through python-ideas first?

I don't think it *needs* to go to python-ideas, though it's probably a good idea to try and work out the optimal API in a post on the discourse ( discuss.python.org ), and the "ideas" category seems like the right one there. Please CC me (pganssle) if you propose modifications to the fromisoformat API on the discourse.

@agronholm
Copy link
Mannequin

agronholm mannequin commented Apr 28, 2020

Has this effort gone forwards lately, or has there been any discussion elsewhere? I implemented support for "Z" in .fromisoformat() before finding this issue. Even after reading the discussion I still don't quite understand why it's such a big problem.

@ammaraskar
Copy link
Member

There's been some additional discussion on https://discuss.python.org/t/parse-z-timezone-suffix-in-datetime/2220

@dlenski
Copy link
Mannequin

dlenski mannequin commented Jan 20, 2021

Like many others here, I've run into this issue because I'm trying to parse timestamps from JSON.

(Specifically, I'm trying to parse timestamps from JSON serialization of Java POJOs and/or Kotlin data classes, as serialized by the Jackson serialization library for JVM languages, in conjunction with JavaTimeModule.
https://fasterxml.github.io/jackson-modules-java8/javadoc/datetime/2.9/com/fasterxml/jackson/datatype/jsr310/JavaTimeModule.html)

In order to "be lenient in what I accept" (adhering to the robustness principal), I need to add a special case for deserialization of strings ending with 'Z'. This gets pretty tricky and pretty subtle quickly.

Here is my Python 3.7+ code path (the strptime-based code path for earlier versions is much, much uglier).

    from numbers import Number
    from datetime import datetime, timezone
    def number_or_iso8601_to_dt(ts, t=datetime):
        if isinstance(ts, Number):
            return datetime.utcfromtimestamp(ts).replace(tzinfo=timezone.utc)
        elif ts.endswith('Z'):
            # This is not strictly correct, since it would accept a string with
            # two timezone specifications (e.g. ending with +01:00Z) and 
            # silently pass that erroneous representation:
            #
            # return datetime.fromisoformat(ts[:-1]).replace(tzinfo=timezone.utc)
            #
            # This version is better:
            d = datetime.fromisoformat(ts[:-1])
            if d.tzinfo is not None:
                raise ValueError(f"time data '{ts}' contains multiple timezone suffixes")
            return d.replace(tzinfo=timezone.utc)
        else:
            return datetime.fromisoformat(ts)

I don't really understand why .fromisoformat() must be *strictly* the inverse of .isoformat(). As @mehaase points out, the transformation isn't strictly reversible as is.

There are other functions where the Python standard library has special-cased options for extremely common use cases. For example, str.split(None), which is certainly not the inverse of the non-existent None.join().

This feels to me like a case where the standard library should simply just accept an extremely-common real-world variant in the interests of interoperability.

I would also be in favor of @p-ganssle's proposal (3), wherein datetime.isoformat would also output the 'Z' suffix for the UTC timezone.

@brettcannon
Copy link
Member

I also support the idea of adding an allow_Z or some equivalent keyword parameter to isoformat() and then allowing for Z in fromisoformat().

@godlygeek
Copy link
Mannequin

godlygeek mannequin commented Jan 26, 2022

I agree with Brett. Adding allow_z (or perhaps compact or use_utc_designator if we're bikeshedding) as an optional keyword only argument to .isoformat() would allow us to keep the explanation that what .fromisoformat() can parse is exactly what .isoformat() can produce, while giving a feature to .fromisoformat() that many have asked for and that would have minimal overhead in terms of code complexity or runtime performance.

Would you accept a PR that implements that, Paul?

@pganssle
Copy link
Member

pganssle commented Feb 2, 2022

I don't think it's necessary to add a feature to isoformat() just for the purpose of being able to add the corresponding parser, particularly when the plan is to implement a much broader ISO 8601 parser for Python 3.11 (I've done most of the implementation in C already, I can share the progress I've made, particularly if someone else wants to pick up the baton there before I get back to it).

That said, I think it's useful for isoformat() to be able to output UTC times as "Z", so we may as well do that as part of 3.11 anyway. I think that's a separate issue to discuss, so I've created bpo-46614 to hammer out the details.

@pganssle pganssle added 3.11 only security fixes and removed 3.8 (EOL) end of life labels Feb 2, 2022
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@pganssle pganssle changed the title datetime: parse "Z" timezone suffix in fromisoformat() datetime: Expand scope of fromisoformat to include all of ISO 8601. May 3, 2022
@pganssle
Copy link
Member

pganssle commented May 3, 2022

For the historical record, originally this was slated to include an expansion to parse all of ISO 8601, but after a discussion with @godlygeek, I was convinced of the wisdom of deliberately excluding the formats with non-second fractional components. No one uses these obscure formats, and supporting them would be an invitation for people to introduce bugs into their code.

I don't think anyone wants this:

>>> datetime.fromisoformat("2020-01-01T01:05.211")
datetime.datetime(2020, 1, 1, 1, 5, 12, 660000)

pganssle added a commit that referenced this issue May 6, 2022
This expands `fromisoformat` to cover most of the common uses of ISO 8601. We may expand the scope more in the future.
felixxm added a commit to felixxm/django that referenced this issue May 9, 2022
date/datetime/time.fromisoformat() support any valid ISO 8601 format
in Python 3.11+, see python/cpython#80010.
felixxm added a commit to django/django that referenced this issue May 9, 2022
date/datetime/time.fromisoformat() support any valid ISO 8601 format
in Python 3.11+, see python/cpython#80010.
@hauntsaninja
Copy link
Contributor

Does anything remain to be done here?

terjekv added a commit to hubuum/hubuum that referenced this issue Apr 14, 2023
Migrate to dateutil.parser.isoparse instead of datetime.datetime.fromisoformat because the latter only supportes Z for UTC in python 3.11.
See python/cpython#80010
@pganssle
Copy link
Member

I think this is done, closing.

Akeboshiwind added a commit to ake-forks/xtdb that referenced this issue Apr 9, 2024
Better parsing in datetime.fromisoformat was only added in 3.11:
python/cpython#80010

Another option is to add python-dateutil as a dependency, not tested
that though
Akeboshiwind added a commit to xtdb/xtdb that referenced this issue Apr 10, 2024
* feat(python): reduce minimum supported version to 3.10

This is the version that google colab runs

* fix(python): type error

This syntax was only supported in 3.11:
https://peps.python.org/pep-0646/

* fix(python): type error

* fix(python): manually replace `Z` in system time

Better parsing in datetime.fromisoformat was only added in 3.11:
python/cpython#80010

Another option is to add python-dateutil as a dependency, not tested
that though

* fix(python): use python-dateutil instead of string hack
Pablito9422 pushed a commit to Pablito9422/python that referenced this issue May 8, 2024
date/datetime/time.fromisoformat() support any valid ISO 8601 format
in Python 3.11+, see python/cpython#80010.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
Archived in project
Development

No branches or pull requests

4 participants