datetime: add ability to parse RFC 3339 dates and times #60077

Closed
nagle mannequin opened this issue Sep 6, 2012 · 90 comments
Assignees
Labels
3.7 stdlib type-feature

Comments

@nagle
Mannequin

@nagle nagle mannequin commented Sep 6, 2012

BPO 15873
Nosy @warsaw, @jcea, @cben, @ncoghlan, @abalkin, @vstinner, @jwilk, @mcepl, @merwok, @bitdancer, @karlcow, @elprans, @flying-sheep, @mihaic, @Fak3, @berkerpeksag, @vadmium, @boxed, @jstasiak, @offby1, @deronnax, @pbryan, @pganssle, @sirex, @jaitaiwan
PRs
  • #4699
  • #4841
  • #5559
  • #5939
Files
  • issue15873-proto.diff
  • test-cases.py: Test cases
  • fromisoformat.patch
  • fromisoformat2.patch: slightly improved version, better use of timedelta
  • fromisoformat3.patch
  • fromisoformat4.patch
  • simplerfromisoformat.patch: simpler, stricter version of fromisoformat
  • fromisoformat_new.patch
  • fromisoformat_new2.patch
  • fromisoformat_new3.patch
  • fromisoformat_singledispatch.patch
  • fromisoformat_regexinclasses.patch: regex as class attributes
  • fromisoformat_strptimesingledispatch.patch: regex looked up by class type in _strptime.py
  • fromisoformat_regexinclasses2.patch: regex as class attributes, 2nd version
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/abalkin'
    closed_at = <Date 2018-03-01.18:50:17.237>
    created_at = <Date 2012-09-06.21:08:49.605>
    labels = ['3.7', 'type-feature', 'library']
    title = 'datetime: add ability to parse RFC 3339 dates and times'
    updated_at = <Date 2018-03-01.18:50:17.236>
    user = 'https://bugs.python.org/nagle'

    bugs.python.org fields:

    activity = <Date 2018-03-01.18:50:17.236>
    actor = 'belopolsky'
    assignee = 'belopolsky'
    closed = True
    closed_date = <Date 2018-03-01.18:50:17.237>
    closer = 'belopolsky'
    components = ['Library (Lib)']
    creation = <Date 2012-09-06.21:08:49.605>
    creator = 'nagle'
    dependencies = []
    files = ['27141', '27165', '41922', '41923', '41926', '41927', '41934', '41935', '41940', '41945', '41951', '44015', '44016', '44019']
    hgrepos = []
    issue_num = 15873
    keywords = ['patch']
    message_count = 90.0
    messages = ['169941', '169952', '169966', '169968', '169970', '170098', '170104', '170109', '170112', '170114', '170116', '170180', '170181', '174339', '183672', '183743', '183809', '183921', '183931', '221829', '221830', '221831', '221903', '260099', '260100', '260150', '260266', '260276', '260280', '260282', '260292', '260293', '260294', '260295', '260298', '260303', '260309', '260318', '260337', '260342', '260343', '260344', '260345', '260347', '260350', '260356', '260382', '260420', '260426', '260427', '260440', '260441', '260442', '260445', '260449', '260989', '260990', '260991', '263867', '269714', '269722', '270529', '270828', '270829', '270831', '270899', '272021', '272026', '273609', '291822', '291831', '304950', '307603', '307604', '307605', '307606', '307607', '307610', '307616', '308214', '308505', '308507', '308510', '308569', '308637', '308851', '309168', '309175', '311703', '313105']
    nosy_count = 36.0
    nosy_names = ['barry', 'jcea', 'cben', 'roysmith', 'ncoghlan', 'belopolsky', 'nagle', 'vstinner', 'jwilk', 'mcepl', 'eric.araujo', 'Arfrever', 'r.david.murray', 'davydov', 'cvrebert', 'karlcow', 'SilentGhost', 'Elvis.Pranskevichus', 'perey', 'flying sheep', 'mihaic', 'aymeric.augustin', 'Roman.Evstifeev', 'berker.peksag', 'martin.panter', 'piotr.dobrogost', 'kirpit', 'Anders.Hovmöller', 'jstasiak', 'Eric.Hanchrow', 'deronnax', 'pbryan', 'p-ganssle', 'sirex', 'larsonreever', 'jaitaiwan']
    pr_nums = ['4699', '4841', '5559', '5559', '5939']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue15873'
    versions = ['Python 3.7']

    @nagle
    Mannequin Author

    @nagle nagle mannequin commented Sep 6, 2012

    The datetime module has support for output to a string of dates and times in ISO 8601 format ("2012-09-09T18:00:00-07:00"), with the object method "isoformat([sep])". But there's no support for parsing such strings. A string to datetime class method should be provided, one capable of parsing at least the RFC 3339 subset of ISO 8601.

    The problem is parsing time zone information correctly.
    The allowed formats for time zone are
    empty - no TZ, date/time is "naive" in the datetime sense
    Z - zero, or Zulu time, i.e. UTC.
    [+-]nn:nn - offset from UTC
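The three zone forms listed above map directly onto objects that later Python versions provide (`datetime.timezone`, Python 3.2+). A minimal sketch, assuming a hypothetical helper named `parse_zone_suffix` (it is not part of any library) and the colon-separated offsets of RFC 3339:

```python
import re
from datetime import timedelta, timezone

# Hypothetical helper for illustration only; not a stdlib function.
_OFFSET_RE = re.compile(r"([+-])([0-9]{2}):([0-9]{2})")


def parse_zone_suffix(suffix):
    """Map a zone suffix to None (naive), UTC, or a fixed offset."""
    if suffix == "":
        return None                  # no zone: "naive" in the datetime sense
    if suffix in ("Z", "z"):
        return timezone.utc          # Zulu time
    match = _OFFSET_RE.fullmatch(suffix)
    if match is None:
        raise ValueError(f"invalid zone suffix: {suffix!r}")
    sign, hours, minutes = match.groups()
    delta = timedelta(hours=int(hours), minutes=int(minutes))
    return timezone(-delta if sign == "-" else delta)
```

Anything that is not one of the three forms raises ValueError rather than being guessed at, which matches the strictness argued for later in the thread.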

    "strptime" does not understand timezone offsets. The "datetime" documentation suggests that the "z" format directive handles time zone info, but that's not actually implemented for input.

    PyPI has four modules for parsing ISO 8601 dates. Each has at least one major
    problem in time zone handling:

    iso8601 0.1.4
    Abandonware. Mishandles time zone when time zone is "Z" and
    the default time zone is specified.
    iso8601.py 0.1dev
    Always returns a "naive" datetime object, even if zone specified.
    iso8601plus 0.1.6
    Fork of abandonware version above. Same bug.
    zc.iso8601 0.2.0
    Zope version. Imports the pytz module with the full Olson time zone
    database, but doesn't actually use that database.

    Thus, nothing on PyPI provides a good alternative.

    It would be appropriate to handle this in the datetime module. One small, correct, tested function would be better than the existing five bad alternatives.

    @nagle nagle mannequin added stdlib type-feature labels Sep 6, 2012
    @AlexanderBelopolsky
    Mannequin

    @AlexanderBelopolsky AlexanderBelopolsky mannequin commented Sep 6, 2012

    %z format is supported, but it cannot accept colon in TZ offset. It can parse offsets like -0600 just fine. What OP is looking for is the GNU date %:z format which datetime does not support.
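As a small illustration of the colon-free case described above, here is a sketch that parses an offset written as -0600 with `%z`. It assumes Python 3.2 or later, where `strptime` returns an aware datetime for this directive; the timestamp itself is made up for the example.

```python
from datetime import datetime, timedelta

# %z with an offset written without a colon, as in the comment above.
stamp = datetime.strptime("2012-09-09T18:00:00-0600", "%Y-%m-%dT%H:%M:%S%z")
offset = stamp.utcoffset()   # the parsed fixed offset as a timedelta
```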

    For ISO 8601 compliance, however, I think we need a way to specify a parser that will accept any valid 8601 format: with T or space separator, with or without : in time and timezone, and with or without dashes in date.

    I would very much like such promiscuous parser to be implemented in datetime.__new__. So that we can create datetime objects from strings the way we do it with numbers.

    @nagle
    Mannequin Author

    @nagle nagle mannequin commented Sep 7, 2012

    Re: "%z format is supported".

    That's platform-specific; the actual parsing is delegated to the C library. It's not in Python 2.7 / Win32:

    ValueError: 'z' is a bad directive in format '%Y-%m-%dT%H:%M:%S%z'

    It really shouldn't be platform-specific; the underlying platform is irrelevant to this task. That's more of a documentation error; the features not common to all supported Python platforms should not be mentioned in the documentation.

    Re: "I would very much like such promiscuous parser to be implemented in datetime.__new__. "

    For string input, it's probably better to do this conversion in a specific class-level function. Full ISO 8601 dates/times generally come from computer-generated data via a file or API. If invalid text shows up, it should be detected as an error, not be heuristically interpreted as a date. There's already "fromtimestamp" and "fromordinal",
    and "isoformat" as an instance method, so "fromisoformat" seems reasonable.

    I'd also suggest providing a standard subclass of tzinfo in datetime for fixed offsets. That's needed to express the time zone information in an ISO 8601 date. The new "fromisoformat" would convert an ISO 8601 date/time to a time-zone "aware" datetime object. If converted back to an ISO 8601 string with .isoformat(), the round trip should preserve the original data, including time zone offset.
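The round trip asked for here can be sketched with the class method that this issue was eventually closed with, `datetime.fromisoformat` (Python 3.7+). The sample string is the one from the opening comment:

```python
from datetime import datetime, timedelta

text = "2012-09-09T18:00:00-07:00"
parsed = datetime.fromisoformat(text)   # aware datetime with a fixed offset
round_tripped = parsed.isoformat()      # back to a string, offset preserved
```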

    (Several more implementations of this conversion have turned up. In addition to the four already mentioned, there was one in xml.util, and one in feedparser. There are probably more yet to be found.)

    @AlexanderBelopolsky
    Mannequin

    @AlexanderBelopolsky AlexanderBelopolsky mannequin commented Sep 7, 2012

    On Thu, Sep 6, 2012 at 9:51 PM, John Nagle <report@bugs.python.org> wrote:

    It's not in Python 2.7 / Win32.

    Python 2.x series is closed and cannot accept new features. Both %z
    and fixed offset tzinfo subclass are implemented in 3.2.

    @abalkin
    Member

    @abalkin abalkin commented Sep 7, 2012

    I am attaching a quick python only prototype for the proposed feature. My goal is to make date/time objects behave like numeric types for which constructors accept strings produced by str(). Since str() format is ISO 8601, it is natural to accept ISO 8601 formats in constructors.

    @roysmith
    Mannequin

    @roysmith roysmith mannequin commented Sep 9, 2012

    We need to define the scope of what input strings will be accepted. ISO-8601 defines a lot of stuff which we may not wish to accept.

    Do we want to accept both basic format (YYYYMMDD) and extended format (YYYY-MM-DD)?

    Do we want to accept things like "1985-W15-5", which is (if I understand this correctly) the 5th day of the 15th week of 1985 [section 4.1.4.2]?

    Do we want to accept [section 4.2.2.4], "23:20,8", which is 23 hours, 20 minutes, 8 tenths of a minute.
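To make the week-date question concrete, here is a sketch of what "1985-W15-5" denotes. It relies on `date.isocalendar()` and on the `%G`, `%V` and `%u` directives, which `strptime` accepts in Python 3.6 and later; none of this was available when the question was asked.

```python
from datetime import date, datetime

# ISO week date: year 1985, week 15, weekday 5 (Friday).
friday = date(1985, 4, 12)
iso_year, iso_week, iso_weekday = friday.isocalendar()

# %G, %V and %u are the ISO 8601 year, week number and weekday directives.
parsed = datetime.strptime("1985-W15-5", "%G-W%V-%u").date()
```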

    I suspect most people who have been following the recent thread (https://groups.google.com/d/topic/comp.lang.python/Q2w4R89Nq1w/discussion) would say none of the above are needed. All that's needed is if you have an existing datetime object, d1, you can do:

    s = str(d1)
    d2 = datetime.datetime(s)
    assert d1 == d2

    for all values of d1.

    But, let's at least agree on that. Or, in the alternative, agree on something else. Then we know what we're shooting for.

    @AlexanderBelopolsky
    Mannequin

    @AlexanderBelopolsky AlexanderBelopolsky mannequin commented Sep 9, 2012

    On Sep 9, 2012, at 8:15 AM, Roy Smith <report@bugs.python.org> wrote:

    We need to define the scope of what input strings will be accepted.

    Since it is easier to widen the domain of acceptable arguments than to narrow it in the future, I would say let's start by accepting str(x) only where x is date, time, timezone or datetime. I would leave out timedelta for now because its str(x) does not resemble ISO at all.

    Either that or full ISO 8601. Anything in between is just too hard to explain.

    @roysmith
    Mannequin

    @roysmith roysmith mannequin commented Sep 9, 2012

    I see I mis-stated my example. When I wrote:

    s = str(d1)
    d2 = datetime.datetime(s)
    assert d1 == d2

    what I really meant was:

    s = d1.isoformat()
    d2 = datetime.datetime(s)
    assert d1 == d2

    But, now I realize that while that is certainly an absolute lower bound, it's almost certainly not sufficient. The most common use case I see on a daily basis is parsing strings that look like "2012-09-07T23:59:59+00:00". This is also John Nagle's original use case from the cited mailing list thread:

    I want to parse standard ISO date/time strings such as
    2012-09-09T18:00:00-07:00

    Datetime.isoformat() returns something that matches the beginning of that, but doesn't have the time zone offset. And it's the offset that makes strptime() not usable as a solution, because "%z" isn't portable.

    If we don't satisfy the "2012-09-07T23:59:59+00:00" case, then we won't have really done anything useful.

    @nagle
    Mannequin Author

    @nagle nagle mannequin commented Sep 9, 2012

    For what parts of ISO 8601 to accept, there's a standard: RFC3339, "Date and Time on the Internet: Timestamps". See section 5.6:

    date-fullyear   = 4DIGIT
    date-month      = 2DIGIT  ; 01-12
    date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                              ; month/year
    time-hour       = 2DIGIT  ; 00-23
    time-minute     = 2DIGIT  ; 00-59
    time-second     = 2DIGIT  ; 00-58, 00-59, 00-60 based on leap second
                              ; rules
    time-secfrac    = "." 1*DIGIT
    time-numoffset  = ("+" / "-") time-hour ":" time-minute
    time-offset     = "Z" / time-numoffset

    partial-time    = time-hour ":" time-minute ":" time-second
                      [time-secfrac]
    full-date       = date-fullyear "-" date-month "-" date-mday
    full-time       = partial-time time-offset

    date-time       = full-date "T" full-time

    NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this
    syntax may alternatively be lower case "t" or "z" respectively.

      ISO 8601 defines date and time separated by "T".
      Applications using this syntax may choose, for the sake of
      readability, to specify a full-date and full-time separated by
      (say) a space character.
    

    That's straightforward, and can be expressed as a regular expression.
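A rough sketch of that regular expression follows. The names `RFC3339_RE` and `parse_rfc3339` are hypothetical, the fraction is truncated to microseconds because that is the resolution of `datetime`, and a leap second (`:60`) is rejected by the `datetime` constructor with ValueError. It is one possible reading of the grammar above, not the implementation that was merged.

```python
import re
from datetime import datetime, timedelta, timezone

# The RFC 3339 date-time grammar as a single regular expression.
RFC3339_RE = re.compile(
    r"(?P<year>[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})"
    r"[Tt ]"
    r"(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2}):(?P<second>[0-9]{2})"
    r"(?:\.(?P<frac>[0-9]+))?"
    r"(?P<zone>[Zz]|[+-][0-9]{2}:[0-9]{2})"
)


def parse_rfc3339(text):
    """Hypothetical parser; raises ValueError on anything off-grammar."""
    match = RFC3339_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an RFC 3339 date-time: {text!r}")
    parts = match.groupdict()
    # Pad or truncate the fraction to exactly six digits (microseconds).
    micro = int(((parts["frac"] or "") + "000000")[:6])
    zone = parts["zone"]
    if zone in ("Z", "z"):
        tz = timezone.utc
    else:
        offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
        tz = timezone(-offset if zone[0] == "-" else offset)
    return datetime(
        int(parts["year"]), int(parts["month"]), int(parts["day"]),
        int(parts["hour"]), int(parts["minute"]), int(parts["second"]),
        micro, tzinfo=tz,
    )
```

Because RFC 3339 makes the offset mandatory, a string without a zone is rejected here; a parser that also accepts naive timestamps would make the `zone` group optional.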

    @abalkin
    Member

    @abalkin abalkin commented Sep 9, 2012

    I realize that while that is certainly an absolute lower bound,
    it's almost certainly not sufficient. The most common use case
    I see on a daily basis is parsing strings that look like
    "2012-09-07T23:59:59+00:00".

    This is exactly what isoformat() of an aware datetime looks like:

    >>> datetime.now(timezone.utc).isoformat()
    '2012-09-09T16:09:46.165886+00:00'

    str() is the same up to T replaced by space:

    >>> print(datetime.now(timezone.utc))
    2012-09-09 15:19:12.567692+00:00
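The same relationship can be checked with a fixed timestamp instead of `now()`. This is only a sketch using the first value shown above; `str()` of a datetime matches `isoformat()` with the separator changed to a space.

```python
from datetime import datetime, timezone

stamp = datetime(2012, 9, 9, 16, 9, 46, 165886, tzinfo=timezone.utc)
iso_text = stamp.isoformat()   # 'T' between date and time
str_text = str(stamp)          # space between date and time
```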

    @abalkin
    Member

    @abalkin abalkin commented Sep 9, 2012

    For what parts of ISO 8601 to accept, there's a standard: RFC3339

    This is almost indistinguishable from the idea of accepting .isoformat() and str() results. From what I see the only difference is that 't' is accepted for date/time separator and 'z' is accepted as a timezone.

    Let's start with this.

    As an ultimate solution, I would like to see something like codec registry so that we can do things like datetime(.., format='rfc3339') or date(.., format='gnu') for GNU parse_datetime. I think this will look more pythonic than strptime(). Of course, strptime format can also be accepted as the value for the format keyword.

    @abalkin abalkin self-assigned this Sep 10, 2012
    @roysmith
    Mannequin

    @roysmith roysmith mannequin commented Sep 10, 2012

    I've started collecting some test cases. I'll keep adding to the collection. I'm going to start trolling ISO 8601:2004(E) for more. Let me know if there are other sources I should be considering.

    @roysmith
    Mannequin

    @roysmith roysmith mannequin commented Sep 10, 2012

    Ooops, clicked the wrong button.

    @flying-sheep
    Mannequin

    @flying-sheep flying-sheep mannequin commented Oct 31, 2012

    there is a module that parses those strings pretty nicely, it’s called pyiso8601: http://code.google.com/p/pyiso8601/

    in the context of writing a better plistlib, i also needed the capability to parse those strings, and decided not to use the sucky incomplete implementation of plistlib, but the one mentioned above.

    i py3ified it, eliminating quite some code, and the result is pretty terse, check it out: https://github.com/flying-sheep/plist/blob/master/iso8601.py

    note that that implementation returns utc-datetimes for timezoneless strings, instead of naive ones. (l.30)

    @boxed
    Mannequin

    @boxed boxed mannequin commented Mar 7, 2013

    I've written a parser for ISO 8601: https://github.com/boxed/iso8601

    Some basic tests are included and it supports most of the standard. Haven't gotten around to the more obscure parts like durations and intervals, but those are trivial to add...

    @merwok
    Member

    @merwok merwok commented Mar 8, 2013

    Are you offering the module for inclusion in the stdlib?

    @boxed
    Mannequin

    @boxed boxed mannequin commented Mar 9, 2013

    Éric Araujo: absolutely. Although I think my code can be improved (speed-wise, elegance, etc.) since I just wrote it quickly over a weekend :)

    @merwok
    Member

    @merwok merwok commented Mar 11, 2013

    John listed four modules with issues in the first message, and now we have proposals for two more modules. Could you work together to make a unified patch?

    Alexander, do you think there is a need to check python-ideas or python-dev before working on this?

    (I changed the title to clarify scope: ISO 8601 is huge and not easily accessible whereas W3CDTF/RFC 3339 is narrower in scope and freely accessible.)

    @merwok merwok changed the title "datetime" cannot parse ISO 8601 dates and times datetime: add ability to parse RFC 3339 dates and times Mar 11, 2013
    @abalkin
    Member

    @abalkin abalkin commented Mar 11, 2013

    Éric> do you think there is a need to check python-ideas or python-dev before working on this?

    Yes, I think this is python-ideas material. IMHO, what should be added to datetime module in 3.4 is ability to construct date/time objects from their str() representation:

    assert time(str(t)) == t
    assert date(str(d)) == d
    assert datetime(str(dt)) == dt

    I am not sure the same is needed for timedelta, but this can be discussed.

    An implementation of any standard external to Python should be vetted on PyPI first. There may be a reason why there is no rfc3339.py module on PyPI.

    @karlcow
    Mannequin

    @karlcow karlcow mannequin commented Jun 29, 2014

    I had the issue today. I needed to parse a date with the following format.

    2014-04-04T23:59:00+09:00
    

    and could not with strptime.

    I see a discussion in March 2014 http://code.activestate.com/lists/python-ideas/26883/ but no followup.

    For references:
    http://www.w3.org/TR/NOTE-datetime
    http://tools.ietf.org/html/rfc3339

    @karlcow
    Mannequin

    @karlcow karlcow mannequin commented Jun 29, 2014

    On closer inspection, Anders Hovmöller's proposal doesn't work.
    https://github.com/boxed/iso8601

    At least for the microseconds part.

    In http://tools.ietf.org/html/rfc3339#section-5.6, the microsecond part is defined as:

    time-secfrac = "." 1*DIGIT

    In http://www.w3.org/TR/NOTE-datetime, same thing:
    s = one or more digits representing a decimal fraction of a second

    Anders considers it to be only six digits. It can be more or it can be less. :)
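One way to cope with a fraction of any length is to pad or truncate it to six digits before handing it to `datetime`, which stores microseconds. A minimal sketch; `secfrac_to_microseconds` is a hypothetical name, and truncation (rather than rounding) is an assumption made for this example:

```python
import re


def secfrac_to_microseconds(digits):
    """Convert the digits after the "." to microseconds, truncating extras."""
    if re.fullmatch(r"[0-9]+", digits) is None:
        raise ValueError(f"expected one or more digits, got {digits!r}")
    # Right-pad short fractions with zeros, then keep the first six digits.
    return int(digits.ljust(6, "0")[:6])
```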

    Will comment on github too.


    @karlcow
    Mannequin

    @karlcow karlcow mannequin commented Jun 29, 2014

    After inspections, the best library for parsing RFC3339 style date is definitely:
    https://github.com/tonyg/python-rfc3339/

    Main code at
    https://github.com/tonyg/python-rfc3339/blob/master/rfc3339.py

    @deronnax
    Mannequin

    @deronnax deronnax mannequin commented Feb 11, 2016

    So, shall we include it? Otherwise, py8601 (https://bitbucket.org/micktwomey/pyiso8601/) looks pretty popular and well maintained (various committers, started in 2012, last commit in 2016).
    I think we should hurry; it's a great shame that, for so long, Python has been able to generate an ISO 8601 datetime but not parse it back.

    @vstinner
    Member

    @vstinner vstinner commented Feb 11, 2016

    I'm working on the OpenStack project and iso8601 is heavily used.

    Otherwise, py8601 (https://bitbucket.org/micktwomey/pyiso8601/) looks pretty popular and well maintained (various committers, started in 2012, last commit in 2016).

    I don't think that we should add the iso8601 module to the stdlib, but merge iso8601 "features" into the datetime module.

    The iso8601 module supports Python 2.7 and so has to implement its own timezone classes. The datetime module now has datetime.timezone since Python 3.2 for fixed timezone.
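For reference, a fixed-offset zone built with `datetime.timezone` looks like this. The sketch reuses the +09:00 timestamp from the 2014 comment above; the variable name `plus_nine` is made up for the example.

```python
from datetime import datetime, timedelta, timezone

plus_nine = timezone(timedelta(hours=9))            # fixed +09:00 offset
stamp = datetime(2014, 4, 4, 23, 59, tzinfo=plus_nine)
text = stamp.isoformat()
```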

    The iso8601 module provides functions. I would prefer datetime.datetime *methods*.

    Would you mind trying to implement that? It would be kind to contact the iso8601 author first.

    The important part is also unit tests.

    @vstinner
    Member

    @vstinner vstinner commented Feb 12, 2016

    See also bpo-12006 for ISO 8601: "The datetime.strftime() and date.strftime() methods now support ISO 8601 date directives %G, %u and %V. (Contributed by Ashley Anderson in bpo-12006.)".

    @deronnax
    Mannequin

    @deronnax deronnax mannequin commented Jul 19, 2016

    because it limits itself to only support the RFC 3339 subset, as
    explained in the beginning of the discussion.

    2016-07-19 16:07 GMT+02:00 Anders Hovmöller <report@bugs.python.org>:

    Anders Hovmöller added the comment:

    The tests attached to this ticket seem pretty bare. Issues that I can spot directly:

    • only tests for datetimes, not times or dates
    • only tests for zulu and "-8:00" timezones
    • no tests for invalid input (parsing a valid date as a datetime for example)
    • only tests for YYYY-MM-DDTHH:MM:SSZ, but ISO8601 supports:
      • Naive times
      • Timezone information (specified as offsets or as Z for 0 offset)
      • Year
      • Year-month
      • Year-month-date
      • Year-week
      • Year-week-weekday
      • Year-ordinal day
      • Hour
      • Hour-minute
      • Hour-minute-second
      • Hour-minute-second-microsecond
      • All combinations of the three "families" above!
        (the above list is a copy paste from my project that implements all ISO8601 that fits into native python: https://github.com/boxed/iso8601)

    This is a more reasonable test suite: https://github.com/boxed/iso8601/blob/master/iso8601.py#L166 although it lacks the tests for bogus inputs.

    > On 2016-07-16, at 03:41, Alexander Belopolsky <report@bugs.python.org> wrote:
    >
    >
    > Alexander Belopolsky added the comment:
    >
    > I would very much like to see this ready before the feature cut-off for Python 3.6. Could someone post a summary on python-ideas to get a show of hands on some of the remaining wrinkles?
    >
    > I would not worry about a C implementation at this point. We can put python implementation in _strptime.py and call it from C as we do for the strptime method.
    >

    @boxed
    Mannequin

    @boxed boxed mannequin commented Jul 19, 2016

    Hmm, ok. I guess I was confused by "dates and times" part of the subject. Ok, so only datetimes. My other comments still apply though.

    On 19 Jul 2016, at 16:20, Mathieu Dupuy <report@bugs.python.org> wrote:

    Mathieu Dupuy added the comment:

    because it limits itself to only support the RFC 3339 subset, as
    explained in the beginning of the discussion.



    @vadmium
    Member

    @vadmium vadmium commented Jul 21, 2016

    Mathieu: Maybe you haven’t seen some of the comments on your older patches. E.g. my comment on fromisoformat4.patch about improper use of “with self.assertRaises(...)” still stands.

    Also, adding some documentation to the patch might help the likes of Anders figure out the scope of the change. I think we decided to parse RFC 3339’s “internet date and time format” profile of ISO 8601 with the date, time, and datetime classes, including tolerating arbitrary resolutions of fractions of seconds in the time, and parsing time zones.

    I don’t think we need to test every combination of the other ISO 8601 formats. There are already a couple of negative tests. Are there any in particular you think are important to add?
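As an illustration of a negative test, here is a sketch written against the `datetime.fromisoformat` method that later shipped in Python 3.7. The test class name and the sample strings are invented for this example. It keeps exactly one call inside each `assertRaises` block, since any statement placed after a raising call in the same block would never run; whether that is the specific problem flagged in the patch review is an assumption.

```python
import unittest
from datetime import datetime


class FromIsoformatNegativeTests(unittest.TestCase):
    def test_rejects_malformed_strings(self):
        for bad in ("garbage", "2016-13-01T00:00:00", "2016-07-19T25:00:00"):
            with self.subTest(bad=bad):
                # One call per block, so every input is really checked.
                with self.assertRaises(ValueError):
                    datetime.fromisoformat(bad)


# Run the case programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FromIsoformatNegativeTests)
result = unittest.TestResult()
suite.run(result)
```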

    @deronnax
    Mannequin

    @deronnax deronnax mannequin commented Aug 5, 2016

    I'm back on the issue. I'm currently stuck on the design. We need to store the regexes somewhere, and that's what causes the problem: I can't really find a good place to store them. We basically have two possible designs:

    • single-dispatch style: a dictionary lookup of regexes keyed by class type, stored in _strptime.py. It's minimally invasive, allows a very simple C implementation, and lets us avoid adding a 're' import to datetime.py. Problem: it breaks when the given class is not exactly date, time or datetime, and it currently breaks the tests, which exercise subclasses. We could rely on "isinstance", but do we want this?

    • regexes stored as class attributes. More robust but more invasive: it needs an 're' import in datetime.py, allows subclassing, and passes the tests. The C implementation is not done yet. Since it requires a better understanding of the C API, I will do it only once we are sure that's the way to go.

    I am posting the two versions of the implementation as patches here. They address all the concerns expressed before (Martin). If we can't decide, I will post to the mailing list Martin suggested, python-ideas. By the way, are you sure it's the right one to ask? Wouldn't python-dev be more appropriate?
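The trade-off between the two designs can be sketched in a few lines. Everything here is hypothetical and simplified: the pattern covers dates only, `RegexDate` stands in for adding the attribute to `date` itself, and `MyDate` stands in for the subclasses used by the test suite. The attached patches are not reproduced.

```python
import re
from datetime import date

DATE_PATTERN = r"([0-9]{4})-([0-9]{2})-([0-9]{2})"   # simplified for illustration

# Design 1: regexes kept outside the classes, looked up by exact type.
REGEX_BY_TYPE = {date: re.compile(DATE_PATTERN)}


def parse_by_type_lookup(cls, text):
    regex = REGEX_BY_TYPE[cls]                # KeyError for any subclass
    match = regex.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid date string: {text!r}")
    return cls(*map(int, match.groups()))


# Design 2: regex stored as a class attribute, inherited by subclasses.
class RegexDate(date):
    _iso_regex = re.compile(DATE_PATTERN)

    @classmethod
    def parse(cls, text):
        match = cls._iso_regex.fullmatch(text)
        if match is None:
            raise ValueError(f"invalid date string: {text!r}")
        return cls(*map(int, match.groups()))


class MyDate(RegexDate):
    """Stand-in for a user-defined subclass."""
```

With the dictionary lookup, `parse_by_type_lookup(MyDate, ...)` fails because `MyDate` is not a key, while `MyDate.parse(...)` finds the inherited attribute and returns a `MyDate` instance.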

    @deronnax
    Mannequin

    @deronnax deronnax mannequin commented Aug 5, 2016

    updated version with SilentGhost's concerns addressed.

    @abalkin
    Member

    @abalkin abalkin commented Aug 24, 2016

    Please move _parse_isotime to _strptime so that it can be called from C implementation. Also, the new method should be documented.

    @abalkin abalkin added the 3.7 label Sep 22, 2016
    @larsonreever
    Mannequin

    @larsonreever larsonreever mannequin commented Apr 18, 2017

    Otherwise, py8601 (https://bitbucket.org/micktwomey/pyiso8601/) looks pretty popular and well maintained (various committers, started in 2012, last commit in 2016). I don't think that we should add the iso8601 module to the stdlib, but merge iso8601 "features" into the datetime module. The iso8601 module supports Python 2.7 and so has to implement its own timezone classes. The datetime module now has datetime.timezone since Python 3.2 for fixed timezone. To me it's the finest, the most elegant, and no other one can claim to be more robust since it's probably the #1 iso parsing functions used in python. Have a look at https://docs.djangoproject.com/en/1.9/_modules/django/utils/dateparse/#parse_datetime.

    @boxed
    Mannequin

    @boxed boxed mannequin commented Apr 18, 2017

    @larsonreever That lib is pretty limited, in that it doesn't handle dates or deltas. Again: my lib that is linked above does and has comprehensive tests.

    @elprans
    Mannequin

    @elprans elprans mannequin commented Oct 24, 2017

    I think that both the pyiso8601 and boxed/iso8601 implementations parse ISO 8601 strings incorrectly. The standard explicitly says that all truncated datetime strings are *reduced accuracy timestamps*. In other words, "2017-10" is *not* equal to "2017-10-01". Instead, "2017-10" represents the whole month of October 2017. Same thing with hours. Earlier versions of ISO 8601 even allowed dropping the year: "--10-01", which meant October 1st of _any year_. They dropped this from more recent revisions of the standard.

    The only place where the truncated representation means "default to zero" is the timezone offset, so "10:10:00+4" and "10:10:00+04:00" mean the same thing.
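The reduced-accuracy reading can be modelled by returning a range instead of a single day. A minimal sketch, assuming a hypothetical helper `month_interval` that maps "YYYY-MM" to a half-open interval of dates:

```python
from datetime import date


def month_interval(text):
    """Return the half-open [start, end) range covered by "YYYY-MM"."""
    year_text, month_text = text.split("-")
    year, month = int(year_text), int(month_text)
    start = date(year, month, 1)
    # First day of the following month, rolling over in December.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end
```

Under this model "2017-10" covers every day from October 1 up to, but not including, November 1, which is different from the single date "2017-10-01".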

    @vadmium
    Member

    @vadmium vadmium commented Dec 4, 2017

    P-ganssle seems to be proposing to limit parsing to exactly what “datetime.isoformat” produces; i.e. whole number of seconds, milliseconds or microseconds. Personally I would prefer it without this limitation, like in Mathieu’s patches. But P-ganssle has done some documentation, so perhaps we can combine the work of each?

    @vadmium
    Member

    @vadmium vadmium commented Dec 4, 2017

    The other difference is Mathieu guarantees ValueError for invalid input strings, which I think is good.

    @abalkin
    Member

    @abalkin abalkin commented Dec 4, 2017

    The better is the enemy of the good here. Given the history of this issue, I would rather accept a well documented restrictive parser than wait for a more general code to be written. Note that we can always relax the parsing rules in the future.

    @deronnax
    Mannequin

    @deronnax deronnax mannequin commented Dec 4, 2017

    I'm right now available again to work on this issue. I'll submit a pull
    request within a week with all issues addressed



    @pganssle pganssle commented Dec 5, 2017

    The better is the enemy of the good here. Given the history of this issue, I would rather accept a well-documented restrictive parser than wait for more general code to be written. Note that we can always relax the parsing rules in the future.

    This is in fact the exact reason why I wrote the isoformat parser like I did, because ISO 8601 is actually a quite expansive standard, and this is the least controversial subset of the features. In fact, I spent quite a bit of time on adapting the general purpose ISO8601 parser I wrote for dateutil *into* one that only accepts the output of isoformat() because it places a minimum burden on ongoing support, so it's not really a matter of waiting for a more general parser to be written.

    I suggest that for Python 3.7 we only support output of isoformat(). Many general iso8601 parsers exist, including the one I have already implemented for python-dateutil (which will be part of the dateutil 2.7.0 release). We can have further discussion later about what exactly should be supported in Python 3.8, but even in the pre-release discussions I'm already seeing pushback about some of the more unusual 8601 formats, and it's a lot easier to explain (in documentation) that fromisoformat() is intended to be the inverse of isoformat() than it is to explain which variations of ISO 8601 are and are not supported (fractional minutes? if you're following the standard, the separator has to be a T, so what other variations of the standard are allowed?).
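As a rough illustration of the "inverse of isoformat()" contract, the sketch below round-trips a few values through `datetime.isoformat()` and `datetime.fromisoformat()`. It assumes Python 3.7 or later, where `fromisoformat()` exists; the sample values are arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Arbitrary sample values: naive, with microseconds, and with UTC offsets.
samples = [
    datetime(2017, 12, 5, 10, 30),
    datetime(2017, 12, 5, 10, 30, 15, 123456),
    datetime(2017, 12, 5, 10, 30, 15, tzinfo=timezone.utc),
    datetime(2017, 12, 5, 10, 30, 15, 500000, tzinfo=timezone(timedelta(hours=-5))),
]

for dt in samples:
    # Whatever isoformat() emits, fromisoformat() should parse back to the same value.
    assert datetime.fromisoformat(dt.isoformat()) == dt
    # isoformat() also allows a custom separator; the round trip should still hold.
    assert datetime.fromisoformat(dt.isoformat(sep=" ")) == dt

print("all round trips succeeded")
```

Strings outside what `isoformat()` can produce are deliberately not exercised here, since whether they are accepted is exactly the open question.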


    @abalkin abalkin commented Dec 5, 2017

    +1 on what Paul said.

    Mathieu, the goal for 3.7 will be to get Paul's PR merged. It will be great if you could help in reviewing it. We can return to the features in your PR during the 3.8 development cycle.


    @pganssle pganssle commented Dec 5, 2017

    The other difference is Mathieu guarantees ValueError for invalid input strings, which I think is good.

    I forgot to address this - but I don't think this is a difference in approaches. If you pass None or an int or something, the problem is with the type, not the value, so at a minimum you're looking at TypeError and ValueError - and those are the only exceptions raised in my patch.

    (I'll note that my patch does not accept bytes, though this is something of an artificial limitation: the patch relies on the fact that a valid isoformat() string contains at most one non-ASCII character, in position 10, so we could easily work around it. But I think the trend for CPython is to avoid blurring the lines between bytes and str rather than encouraging their interchangeable use.)
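A short sketch of the TypeError/ValueError split described above, assuming Python 3.7 or later. `classify` is a hypothetical helper used only to report which exception, if any, `datetime.fromisoformat()` raises.

```python
from datetime import datetime


def classify(value):
    """Report the outcome of datetime.fromisoformat(value) as a string.

    Hypothetical helper for illustration only.
    """
    try:
        datetime.fromisoformat(value)
    except TypeError:
        # Wrong type: the argument is not a str at all.
        return "TypeError"
    except ValueError:
        # Right type, but the contents are not a valid string.
        return "ValueError"
    return "ok"


print(classify("2017-12-05T10:30:00"))  # ok
print(classify("not a date"))           # ValueError
print(classify(None))                   # TypeError
print(classify(20171205))               # TypeError
print(classify(b"2017-12-05"))          # TypeError: bytes are not accepted
```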


    @deronnax deronnax mannequin commented Dec 13, 2017

    I finally released my work. It looks like Paul's work is more comprehensive, but if you want to pick a thing or two from mine, feel free.


    @vadmium vadmium commented Dec 18, 2017

    Regarding Mathieu’s RFC 3339 parser, Victor wanted to use the round-half-to-even rule to get a whole number of microseconds. But considering the “time” class cannot represent 24:00, how do you round up in the extreme case past 23:59?

    time.fromisoformat("23:59:59.9999995")

    Perhaps it is better to always truncate toward zero, only support 6 digits (rejecting fractions of a microsecond), or add Anders’s truncate_microseconds=True option.
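The edge case can be reproduced without any parser. The sketch below converts the fractional-second digits to microseconds under two rounding modes; `fraction_to_microseconds` is a hypothetical helper, not code from any of the patches on this issue.

```python
from datetime import time
from decimal import ROUND_DOWN, ROUND_HALF_EVEN, Decimal


def fraction_to_microseconds(digits, rounding):
    """Convert the digits after the decimal point of a seconds field to microseconds.

    Hypothetical helper for illustration only.
    """
    micro = Decimal("0." + digits) * 1_000_000
    return int(micro.quantize(Decimal(1), rounding=rounding))


half_even = fraction_to_microseconds("9999995", ROUND_HALF_EVEN)
truncated = fraction_to_microseconds("9999995", ROUND_DOWN)
print(half_even)  # 1000000: one full second, which no longer fits the microsecond field
print(truncated)  # 999999

# Truncation always yields a representable time.
print(time(23, 59, 59, truncated))  # 23:59:59.999999

# Rounding half to even would need a carry that time cannot express.
try:
    time(23, 59, 59, half_even)
except ValueError as exc:
    print("cannot represent:", exc)
```

Rejecting more than 6 fractional digits avoids the question entirely, at the cost of refusing some otherwise valid RFC 3339 input.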


    @pganssle pganssle commented Dec 18, 2017

    @martin.panter I don't see the problem here? Wouldn't 23:59:59.9999995 round up to 00:00?


    @vadmium vadmium commented Dec 18, 2017

    Not if the time is associated with a particular day. Imagine implementing datetime.fromisoformat by separately calling date.fromisoformat and time.fromisoformat. The date will be off by one day if you naively rounded 2017-12-18 23:59 “up” to 2017-12-18 00:00.


    @pganssle pganssle commented Dec 18, 2017

    Not if the time is associated with a particular day. Imagine implementing datetime.fromisoformat by separately calling date.fromisoformat and time.fromisoformat. The date will be off by one day if you naively rounded 2017-12-18 23:59 “up” to 2017-12-18 00:00.

    Yes, I suppose this is a problem if you implement it that way. Seems like a somewhat moot point, but I think any decision about rounding should probably be driven by what people are expecting more than by how it is implemented.

    That said, I can see a good case for truncation *and* for rounding up for something like '2016-12-31T23:59:59.999999999'. Rounding up to '2017-01-01' certainly gives the closest whole microsecond, *but* people writing "23:59:59.9999999" are often trying to express "the last possible moment *before* 00:00".
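For a full datetime the two behaviours can be compared directly. This is only a sketch of the arithmetic, not of any proposed parser: it assumes the fractional part has already been converted to a microsecond count that may equal 1,000,000 after rounding, and `apply_fraction` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def apply_fraction(base, microseconds):
    """Add a microsecond count (possibly 1_000_000 after rounding) to `base`.

    Hypothetical helper for illustration only. Using timedelta lets the carry
    propagate through seconds, minutes, hours, and the date itself.
    """
    return base + timedelta(microseconds=microseconds)


base = datetime(2016, 12, 31, 23, 59, 59)

rounded = apply_fraction(base, 1_000_000)  # .999999999 rounded up
truncated = apply_fraction(base, 999_999)  # .999999999 truncated

print(rounded)    # 2017-01-01 00:00:00
print(truncated)  # 2016-12-31 23:59:59.999999
```

Rounding moves the value into the next year, while truncation keeps it at the last representable moment of 2016, so the choice changes which day, month, and year the result falls in.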


    @jaitaiwan jaitaiwan mannequin commented Dec 19, 2017

    I wanted to note here... I've been trying to get strptime to work with the types of dates specified in this request and came across a documentation bug here: https://docs.python.org/3.5/library/time.html#time.strptime

    You can see that the examples given for the %z directive contain colons, while the format specified is +HHMM rather than the +HH:MM that the examples allude to.
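One workaround for the +HHMM requirement is to drop the colon from the offset before calling `datetime.strptime()`. The sketch below assumes a fixed `YYYY-MM-DDTHH:MM:SS±HH:MM` layout with no fractional seconds; `parse_with_offset` is a hypothetical helper. From Python 3.7 onward, `%z` in `datetime.strptime()` also accepts a colon, so the normalization matters only on older versions.

```python
import re
from datetime import datetime, timedelta


def parse_with_offset(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS+HH:MM' by rewriting the offset as +HHMM.

    Hypothetical helper for illustration only.
    """
    # Remove the colon from a trailing +HH:MM or -HH:MM offset.
    normalized = re.sub(r"([+-]\d{2}):(\d{2})$", r"\1\2", text)
    return datetime.strptime(normalized, "%Y-%m-%dT%H:%M:%S%z")


dt = parse_with_offset("2017-12-19T10:00:00+01:00")
print(dt.utcoffset())  # 1:00:00
```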


    @abalkin abalkin commented Dec 21, 2017

    New changeset 09dc2f5 by Alexander Belopolsky (Paul Ganssle) in branch 'master':
    bpo-15873: Implement [date][time].fromisoformat (GH-4699)
    09dc2f5


    @deronnax deronnax mannequin commented Dec 29, 2017

    Maybe it's worth adding an entry to the Python 3.7 "What's New"? I think this was a very long-awaited feature.
    The opposite of isoformat() is a very frequent question from Python newcomers.


    @bitdancer bitdancer commented Dec 29, 2017

    Correct, a new feature should always get a what's new entry. You could submit a PR for it :)


    @abalkin abalkin commented Feb 6, 2018

    New changeset 22864bc by Alexander Belopolsky (Paul Ganssle) in branch 'master':
    Add What's new entry for datetime.fromisoformat (GH-5559)
    22864bc


    @abalkin abalkin commented Mar 1, 2018

    New changeset 0e06be8 by Alexander Belopolsky (Miss Islington (bot)) in branch '3.7':
    Add What's new entry for datetime.fromisoformat (GH-5559) (GH-5939)
    0e06be8

    @abalkin abalkin closed this as completed Mar 1, 2018
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022