Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pure Python strptime() (PEP 42) #35391

Closed
brettcannon opened this issue Oct 23, 2001 · 39 comments
Closed

Pure Python strptime() (PEP 42) #35391

brettcannon opened this issue Oct 23, 2001 · 39 comments
Assignees
Labels
stdlib Python modules in the Lib dir

Comments

@brettcannon
Copy link
Member

BPO 474274
Nosy @gvanrossum, @smontanaro, @warsaw, @brettcannon, @nascheme, @jaraco
Files
  • diff_time: diff for time importing _strptime against 2002-07-10 CVS
  • _strptime.py: uploaded 2002-07-17
  • test_time.py: uploaded 2002-07-17
  • test_strptime.py: uploaded 2002-07-17
  • libtime_diff: uploaded 2002-07-17: patch for lib/libtime.tex
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gvanrossum'
    closed_at = <Date 2002-09-03.18:24:23.000>
    created_at = <Date 2001-10-23.23:15:29.000>
    labels = ['library']
    title = 'Pure Python strptime() (PEP 42)'
    updated_at = <Date 2002-09-03.18:24:23.000>
    user = 'https://github.com/brettcannon'

    bugs.python.org fields:

    activity = <Date 2002-09-03.18:24:23.000>
    actor = 'nnorwitz'
    assignee = 'gvanrossum'
    closed = True
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2001-10-23.23:15:29.000>
    creator = 'brett.cannon'
    dependencies = []
    files = ['3706', '3707', '3708', '3709', '3710']
    hgrepos = []
    issue_num = 474274
    keywords = ['patch']
    message_count = 39.0
    messages = ['37953', '37954', '37955', '37956', '37957', '37958', '37959', '37960', '37961', '37962', '37963', '37964', '37965', '37966', '37967', '37968', '37969', '37970', '37971', '37972', '37973', '37974', '37975', '37976', '37977', '37978', '37979', '37980', '37981', '37982', '37983', '37984', '37985', '37986', '37987', '37988', '37989', '37990', '37991']
    nosy_count = 7.0
    nosy_names = ['gvanrossum', 'skip.montanaro', 'barry', 'nnorwitz', 'brett.cannon', 'nascheme', 'jaraco']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue474274'
    versions = ['Python 2.3']

    @brettcannon
    Copy link
    Member Author

    The attached file contains a pure Python version of
    strptime(). It attempts to operate as much like
    time.strptime() within reason. Where vagueness or
    obvious platform dependence existed, I tried to
    standardize and be reasonable.

    PEP-42 makes a request for a portable, consistent
    version of time.strptime():

    • Add a portable implementation of time.strptime() that
      works in
      clearly defined ways on all platforms.

    This module attempts to close that feature request.

    The code has been tested thoroughly by myself as well
    as some other people who happened to have caught the
    post I made to c.l.p a while back and used the module.

    It is available at the Python Cookbook
    (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
    It has been approved by the editors there and thus is
    listed as approved. It is also being considered for
    inclusion in the book (thanks, Alex, for encouraging
    this submission).

    A PyUnit testing suite for the module is available at
    http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
    along with the code for the function itself.
    Localization has been handled in a modular way using
    regexes. All of it is self-explanatory in the doc
    strings. It is very straight-forward to include your
    own localization settings or modify the two languages
    included in the module (English and Swedish).

    If the code needs to have its license changed, I am
    quite happy to do it (I have already given the OK to
    the Python Cookbook).

    -Brett Cannon

    @brettcannon brettcannon added the stdlib Python modules in the Lib dir label Oct 23, 2001
    @brettcannon brettcannon added the stdlib Python modules in the Lib dir label Oct 23, 2001
    @nascheme
    Copy link
    Member

    Logged In: YES
    user_id=35752

    I'm pretty sure this code needs a different license before
    it can be accepted. The current license contains the
    "BSD advertising clause". See
    http://www.gnu.org/philosophy/bsd.html.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    Oops. I thought I had removed the clause. Feel free to
    remove it.

    I am going to be cleaning up the module, though, so if you
    would rather not bother reviewing this version and wait on
    the cleaned-up one, go ahead.

    Speaking of which, should I just reply to this bugfix when I
    get around to the update, or start a new patch?

    @nascheme
    Copy link
    Member

    Logged In: YES
    user_id=35752

    Go ahead and reuse this item. I'll wait for the updated
    version.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    Version 2 of strptime() has now been uploaded. This nearly
    complete rewrite includes the removal of the need to input
    locale-specific time info. All need locale info is gleaned
    from time.strftime(). This makes it able to behave exactly
    like time.strptime().

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Jun 1, 2002

    Logged In: YES
    user_id=33168

    Overall, the patch looks pretty good.
    I didn't check for completeness or consistency, though.

    • You don't need: from exceptions import Exception
    • The comment "from strptime import * will only export
      strptime()" is not correct.
    • I'm not sure what should be included for the license.
    • Why do you need success flag in CheckIntegrity, you raise
      an exception?
      (You don't need to return anything, raise an exception,
      else it's ok)
    • In return_time(), could you change xrange(9) to
      range(len(temp_time))
      this removes a dependancy.
    • strings are sequences, so instead of if found in ('y', 'Y')
      you can do if found in 'yY'
    • daylight should use the new bools True, False
      (this also applies to any other flags) * The formatting
      doesn't follow the standard (see PEP-8)
      (specifically, spaces after commas, =, binary ops,
      comparisons, etc)
    • Long lines should be broken up
      The test looks pretty good too. I didn't check it for
      completeness.
      The URL is wrong (too high up), the test can be found here:
      http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
      I noticed a spelling mistake in the test: anme -> name.

    Also, note that PEP-42 has a comment about a python strptime.
    So if this gets implemented, we need to update PEP-42.
    Thanks.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    I'm afraid you looked at the wrong patch! My fault since I
    accidentally forgot to add a description for my patch. So
    the file with no description is the newest one and
    completely supercedes the older file. I am very sorry about
    that. Trust me, the new version is much better.

    I realized the other day that since the time module is a C
    extension file, would getting this accepted require getting
    BDFL approval to add this as a separate module into the
    standard library? Would the time module have to have a
    Python interface module where this is put and all other
    methods in the module just pass directly to the extension file?

    As for the suggestions, here are my replies to the ones that
    still apply to the new file:

    • strings are sequences, so instead of if found in ('y',
      'Y') you can do if found in 'yY'
      -> True, but I personally find it easier to read using the
      tuple. If it is standard practice in the standard library
      to do it the suggested way, I will change it.

    • daylight should use the new bools True, False (this also
      applies to any other flags)
      -> Oops. Since I wrote this under Python 2.2.1 I didn't
      think about it. I will go through the code and look for
      places where True and False should be used.

    -Brett C.

    @smontanaro
    Copy link
    Contributor

    Logged In: YES
    user_id=44345

    Brett,

    Please see the drastically shortened test_strptime.py. (Basically all I'm
    interested in here is whether or not strptime.strptime and time.strptime
    will pass the tests.) Near the top are two lines, one commented out:

      parsetime = time.strptime
      #parsetime = strptime.strptime

    Regardless which version of parsetime I get, I get some errors. If
    parsetime == time.strptime I get

    ======================================================================
    ERROR: Test timezone directives.
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "test_strptime.py", line 69, in test_timezone
        strp_output = parsetime(strf_output, "%Z")
    ValueError: unconverted data remains: 'CDT'

    If parsetime == strptime.strptime I get

    ERROR: *** Test %c directive. ***
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "test_strptime.py", line 75, in test_date_time
        self.helper('c', position)
      File "test_strptime.py", line 17, in helper
        strp_output = parsetime(strf_output, '%'+directive)
      File "strptime.py", line 380, in strptime
        found_dict = found.groupdict()
    AttributeError: NoneType object has no attribute 'groupdict'

    ======================================================================
    ERROR: Test for month directives.
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "test_strptime.py", line 31, in test_month
        self.helper(directive, 1)
      File "test_strptime.py", line 17, in helper
        strp_output = parsetime(strf_output, '%'+directive)
      File "strptime.py", line 393, in strptime
        month = list(locale_time.f_month).index(found_dict['b'])
    ValueError: list.index(x): x not in list

    This is with a very recent interpreter (updated from CVS in the past
    day) running on Mandrake Linux 8.1.

    Can you reproduce either or both problems? Got fixes for the
    strptime.strptime problems?

    Thx,

    Skip

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    I have uploaded a verision 2.0.1 which fixes the %b format
    bug (stupid typo on a variable name).

    As for the %c directive, I pass that test. Can you please
    send the output of strftime and the time tuple used to
    generate it?

    As for the time.strptime() failure, I don't have
    time.strptime() on any system available to me, so could you
    please send me the output you have for strftime('%Z'), and
    time.tzname?

    I don't know how much %Z should be worried about since its
    use is deprecated (according to the time module's
    documentation). Perhaps strptime() should take the
    initiative and not support it?

    -Brett

    @smontanaro
    Copy link
    Contributor

    Logged In: YES
    user_id=44345

    Here ya go...

    % ./python
    Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
    [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import time
    >>> now = time.localtime(time.time())
    >>> now
    (2002, 6, 4, 0, 53, 39, 1, 155, 1)
    >>> time.strftime("%c", now)
    'Tue Jun  4 00:53:39 2002'
    >>> time.tzname
    ('CST', 'CDT')
    >>> time.strftime("%Z", now)
    'CDT'

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    Thanks for being so prompt with your response, Skip.

    I found the problem with your %c. If you look at your
    output you will notice that the day of the month is '4', but
    if you look at the docs for time.strftime() you will notice
    that is specifies the day of the month (%d) as being in the
    range [01,31]. The regex for %d (simplified) is
    '(3[0-1])|([0-2]\d)'; not being represented by 2 digits
    caused the regex to fail.

    Now the question becomes do we follow the spec and chaulk
    this up to a non-standard strftime() implementation, or do
    we adapt strptime to deal with possible improper output from
    strftime()? Changing the regexes should not be a big issue
    since I could just tack on '\d' as the last option for all
    numerical regexes.

    As for the test error from time.strptime(), I don't know
    what is causing it. If you look at the test you will notice
    that all it basically does is parsetime(time.strftime("%Z"),
    "%Z"). Now how that can fail I don't know. The docs do say
    that strptime() tends to be buggy, so perhaps this is a case
    of this.

    One last thing. Should I wait until the bugs are worked out
    before I post to python-dev asking to either add this as a
    module to the standard library or change time to a Python
    stub and rename timemodule.c? Should I ask now to get the
    ball rolling? Since I just joined python-dev literally this
    morning I don't know what the protocol is.

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Jun 4, 2002

    Logged In: YES
    user_id=33168

    Hopefully, I'm looking at the correct patch this time. :-)

    To answer one question you had (re: 'yY' vs. ('y', 'Y')),
    I'm not sure people really care. It's not big to me.
    Although 'yY' is faster than ('y', 'Y').

    In order to try to reduce the lines where you raise an error
    (in __init__)
    you could change 'sequence of ... must be X items long' to
    '... must have/contain X items'.

    Generally, it would be nice to make sure none of the lines
    are over 72-79 chars (see PEP-8).

    Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

    you could do:
    x = tuple(orig[:])
    or something like that. Perhaps a helper function?

    In __init__ do you want to check the params against 'is None'
    If someone passes a non-sequence that doesn't evaluate
    to False, the __init__ won't raise a TypeError which it
    probably should.

    What is the magic date used in __calc_weekday()?
    (1999/3/15+ 22:44:55) is this significant, should there
    be a comment?
    (magic dates are used elsewhere too, e.g., __calc_month,
    __calc_am_pm, many more)

    __calc_month() doesn't seem to take leap year into account?
    (not sure if this is a problem or not)
    In __calc_date_time(), you use date_time[offset] repetatively,
    couldn't you start the loop with something like dto =
    date_time[offset] and then use dto
    (dto is not a good name, I'm just making an example)

    Are you supposed to use __init__ when deriving from
    built-ins (TimeRE(dict)) or __new__?
    (sorry, I don't remember the answer)

    In __tupleToRE.sorter(), instead of the last 3 lines, you
    can do:
    return cmp(b_length, a_length)

    Note: you can do x = y = z = -1, instead of x = -1 ; y = -1
    ; z = -1

    It could be problematic to compare x is -1. You should
    probably just use ==.
    It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
    were not defined in Objects/intobject.c.

    This docstring seems backwards:
    def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
    I know a lot of these things seem like a pain.
    And it's not that bad now, but the problem is maintaining
    the code. It will be easier for everyone else if the code
    is similar to the rest.

    BTW, protocol on python-dev is pretty loose and friendly. :-)

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    I went ahead an implemented most of Neal's suggestions. On
    a few, of them, though, I either didn't do it or took a
    slightly different route.

    For the 'yY' vs. ('y', 'Y'), I went with 'yY'. If it gives
    a performance boost, why not since it doesn't make the code
    harder to read. Implementing it actually had me catch some
    redundant code for dealing with a literal %.

    The tests in the __init__ for LocaleTime have been reworked
    to check that they are either None or have the proper
    length, otherwise they raise a TypeError.

    I have gone through and tried to catch all the lines that
    were over 80 characters and cut them up to fit.

    For the adding of '' to tuples, I created a method that
    could specify front or back concatination. Not much
    different from before, but it allows me to specify front or
    back concatination easily.

    I explained why the various magic dates were used.

    I in no way have to worry about leap year. Since it is not
    validating the data string for validity the fxn just takes
    the data and uses it. I have no reason to calc for leap year.

    date_time[offset] has been replaced with current_format and
    added the requisite two lines to assign between it and the list.

    You are only supposed to use __new__ when it is immutable.
    Since dict is obviously mutable, I don't need to worry about it.

    Used Neal's suggested shortening of the sorter helper fxn.

    I also used the suggestion of doing x = y = z = -1. Now it
    barely fits on a single line instead of two.

    All numerical compares use == and != instead of is and is
    not. Didn't know about that dependency on
    NSMALL((POS)|(NEG))INTS; good thing to know.

    The doc string was backwards. Thanks for catching that, Neal.

    I also went through and added True and False where
    appropriate. There is a line in the code where True = 1;
    False = 0 right at the top. That can obviously be removed
    if being run under Python 2.3.

    And I completely understand being picky about minute details
    where maintainability is a concern. I just graduated from
    Cal and so the memory of seeing beginning programmers' code
    is still fresh in my mind <shudders>.

    And I will query python-dev about how to go about to get
    this added after the bugs are fixed and I am back home
    (going to be out of town until June 16). I will still be
    periodically checking email, though, so I will continue to
    implement any suggestions/bugfixes that anyone suggests/finds.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    I am back from my vacation and ready to email python-
    dev about getting this patch accepted (whether to modify
    time or make this a separate module, etc.). I think I will
    do the email on June 17.

    Before then, though, I am going to make two changes.
    One is the raise a Value Error exception if the regex doesn't
    match (to try to match time.strptime()s exception as seen
    in Skip's run of the unit test). The other change is to tack
    on a \d on all numeric formats where it might come out as
    a single digit (i.e., lacking a leading zero). This will be for
    v2.0.3 which I will post before June 17.

    If there is any reason anyone thinks I should hold back on
    this, please let me know! I would like to have this code as
    done as possible before I make any announcement.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    I uploaded v.2.0.3. Beyond implementing what I mentioned
    previously (raising TypeError when a match fails, adding \d
    to all applicable regexes) I did a few more things.

    For one, I added a special " \d" to the numeric month regex.
    I discovered that ANSI C for ctime displays the month with
    a leading space if it is a single digit. So to deal with
    that since at least Skip's C library likes to use that
    format for %c, I went ahead and added it.

    I changed all attributes in LocaleTime to lists. A recent
    mail on python-dev from GvR said that lists are for
    homogeneous data, which everything that is grouped together
    in LocaleTime is. It also simplified the code slightly and
    led to less conversions of data types.

    I also added a method that raises a TypeError if you try to
    assign to any of LocaleTime's attributes. I thought that if
    you left out the set value for property() it wouldn't work;
    didn't realize it just defaults over to __setitem__. So I
    added that method as the set value for all of the property()s.

    It does require 2.2.1 now since I used True and False
    without defining them. Obviously just set those values to 1
    and 0 respectively if you are running under 2.2

    I also updated the overly exhaustive PyUnit suite that I
    have for testing my code. It is not black-box testing,
    though; Skip's pruned version of my testing suite fits that
    bill (I think).

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    I have uploaded v. 2.0.4. It now uses the calendar module
    to figure out the names of weekdays and months. Thanks goes
    out to Guido for pointing out this undocumented feature of
    calendar.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    2.1.0 is now up and ready for use. I only changed two
    things to the code, but since they change the semantics of
    stprtime()s use, I made this a new minor release.

    One, I removed the ability to pass in your own LocaleTime
    object. I did this for two reasons. One is because I
    forgot about how default arguments are created at the time
    of function creation and not at each fxn call. This meant
    that if someone was not thinking and ran strptime() under
    one locale and then switched to another locale without
    explicitly passing in a new LocaleTime object for every call
    for the new locale, they would get bad matches. That is not
    good.

    The other reason was that I don't want to force users to
    pass in a LocaleTime object on every call if I can't have a
    default value for it. This is meant to act as a drop-in
    replacement for time.strptime(). That forced the removal of
    the parameter since it can't have a default value.

    In retrospect, though, people will probably never parse log
    files in other languages other then there default locale.
    And if they were, they should change the locale for the
    interpreter and not just for strptime().

    The second change was what triggers strptime() to return an
    re object that it can use. Initially it was any nothing
    value (i.e., would be considered false), but I realized that
    an empty string could trigger that and it would be better to
    raise a TypeError then let some error come up from trying to
    use the re object in an incorrect way.

    Now, to have an re object returned, you pass in False. I
    figured that there is a very minimal chance of passing in
    False when you meant to pass in a string. Also, False as
    the data_string, to me, means that I don't want what would
    normally be returned.

    I debated about removing this feature from strptime(), but I
    profiled it and most of the time comes from TimeRE's
    __getitem__. So building the string to be compiled into a
    regex is the big bottleneck. Using a precompiled regex
    instead of constructing a new one everytime took 25% of the
    time overall for strptime() when calling strptime() 10,000
    times in a row. This is a conservative number, IMO, for
    calls in a row; I checked the Apache hit logs for a single
    day on Open Computing Facility's web server
    (http://www.ocf.berkeley.edu/) and there were 188,562 hits
    on June 16 alone. So I am going to keep the feature until
    someone tells me otherwise.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    I have uploaded a contextual diff of timemodule.c with a
    callout to strptime.strptime when HAVE_STRPTIME is not
    defined just as Guido requested.

    It's my first extension module, so I am not totally sure of
    myself with it. But since Alex Marttelli told me what I
    needed to do I am fairly certain it is correct.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    2.1.1 is now uploaded. Almost a purely syntatical change.
    From discussions on python-dev I renamed the helper fxns so
    they are all lowercase-style. Also changed them so that
    they state what the fxn returns.

    I also put all of the imports on their own line as per PEP-8.

    The only semantical change I did was directly import
    re.compile since it is the only thing I am using from the re
    module.

    These changes required tweaking of my exhaustive testing
    suite, so that got uploaded, too.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
    below!). Just a little bit more cleanup. Biggest change is
    that I changed the default format string and made strptime()
    raise ValueError instead of TypeError. This was all done to
    match the time module docs.

    I also fiddled with the regexes so that the groups were
    none-capturing. Mainly done for a possible performance
    improvement.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    The actual 2.1.3 edition of strptime is now up. I don't
    think there are any changes, but since I renamed the file
    _strptime.py, I figured uploading it again wouldn't hurt.

    I also uploaded a new contextual diff of the time module
    taken from CVS on 2002-07-10. The only difference between
    this and the previous diff (which was against 2.2.1's time
    module) is the change of the imported module to _strptime.

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    Hm. This isn't done yet. I get these problems:

    (a) the patch for timemodule.c doesn't apply cleanly in
    current CVS (trivial)

    (b) it still tries to import strptime (no leading '_') (also
    trivial)

    (c) so does test_strptime.py (also trivial)

    (d) the simplest of simple examples fails:

    With Linux's strptime:

    >>> time.strptime("7/12/02", "%m/%d/%y")
    (2002, 7, 12, 0, 0, 0, 4, 193, 0)
    >>>

    With yours:

    >>> time.strptime("7/12/02", "%m/%d/%y")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/home/guido/python/dist/src/Lib/_strptime.py", line
    392, in strptime
        raise ValueError("time data did not match format")
    ValueError: time data did not match format
    >>> 

    Perhaps you should write a regression test suite for the
    strptime function as found in the time module courtesy of
    libc, and then make sure that your code satisfies it?

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    To respond to your points, Guido:

    (a) I accidentally uploaded the old file. Sorry about that.
    I misnamed the new one 'time_diff" but in my head I meant
    to overwrite "diff_time". I have uploaded the new one.

    (b) See (a)

    (c) Oops. That is a complete oversight on my part. Now in
    (d) you mention writing up regression tests for the standard
    time.strptime. I am quite hapy to do this. Do you want
    that as a separate patch? If so I will just stop with
    uploading tests here and just start a patch with my strptime
    tests for the stdlib tests.

    (d) The reason this test failed is because your input is not
    compliant with the Python docs. Read what %m accepts:

    Month as a decimal number [01,12]

    Notice the leading 0 for the single digit month. My
    implementation follows the docs and not what glibc suggests.
    If you want, I can obviously add on to all the regexes \d
    as an option and eliminate this issue. But that means it
    will no longer be following the docs. This tripped Skip up
    too since no one writes numbers that way; strftime does, though.
    Now if the docs meant for no trailing 0, I think they should
    be rewritten since that is misleading.

    In other words, either strptime stays as it is and follows
    the docs or I change the regexes, but then the docs will
    have to be changed. I can go either way, but I personally
    would want to follow the docs as-is since strptime is meant
    to parse strftime output and not human output. =)

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    Hm, the new diff_time *still* fails to apply. But don't
    worry about that.

    I'd love to see regression tests for time.strptime. Please
    upload them here -- don't start a new patch.

    I think your interpretation of the docs is overly
    restrictive; the table shows what strftime does but I think
    it's reasonable for strptime to accept missing leading
    zeros. You can upload a patch for the docs too if you feel
    that's necessary. You may also try to read up on what the
    XPG standard says about strptime.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    Uploaded 2.1.4. I added \d to the end of all relevant
    regexes (basically all of them but %y and %Y) to deal with
    non-zero-leading numbers.

    I also made the regex case-insensitive.

    As for the diff failing, I am wondering if I am doing
    something wrong. I am just running diff -c CVS_file
    modified_file > diff_file . Isn't that right?

    I will work on merging my strptime tests into the time
    regression tests and upload a patch here.

    I will do a patch for the docs since it is not consistent
    with the explanation of struct_time (or at least in my opinion).

    I tried finding XPG docs, but the best Google came up with
    was the NetBSD man pages for strptime (which they claim is
    XPG compliant). The difference between that implementation
    and mine is that NetBSD's allows whitespace (defined as
    isspace()) in the format string to match \s* in the data
    string. It also requires a whitespace or a non-alphanumeric
    character while my implementation does not require that.

    Personally, I don't like either difference. If they were
    used, though, there might be a possibility of rewriting
    strptime to just use a bunch of string methods instead of
    regexes for a possible performance benefit. But I prefer
    regexes since it adds checks of the input. That and I just
    like regexes period. =)

    Also, I noticed that your little test returned 0 for all
    unknown values. Mine returns -1 since 0 can be a legitimate
    value for some and I figured that would eliminate ambiguity.
    I can change it to 0, though.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    Two things have been uploaded. First, test_time.py w/ a
    strptime test. It is almost an exact mirror of the strftime
    test; only difference is that I used strftime to test
    strptime. So if strftime ever fails, strptime will fail
    also. I feel this is fine since strptime depends on
    strftime so much that if strftime were to fail strptime
    would definitely fail.

    The other file is version 2.1.5 of strptime. I made two
    changes. One was to remove the TypeError raised when %I was
    used without %p. This was from me being very picky about
    only accepting good data strings. The second was to go
    through and replace all whitespace in the format string with
    \s*. That basically makes this version of strptime XPG
    compatible as far as I (and the NetBSD man page) can tell.
    The only difference now is that I do not require whitespace
    or a non-alphanumeric character between format strings.
    Seems like a ridiculous requirement since the requirement
    that whitespace be able to compress down to no whitespace
    negates this requirement. Oh well, we are more than
    compliant now.

    I decided not to write a patch for the docs to make them
    read more leniently for what the format directives. Figured
    I would just let people who think like me do it in a more
    "proper" way with leading zeros and those who don't read it
    like that to still be okay.

    I think that is everything. If you want more in-depth
    tests, Guido, I can add them to the testing suite, but I
    figured that since this is (hopefully) going in bug-free it
    needs only be checked to make sure it isn't broken by
    anything. And if you do want more in-depth tests, do you
    want me to add mirror tests for strftime or not worry about
    that since that is the ANSI C library's problem? Other then
    that, I think strptime is pretty much done.

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    • Can you please delete all the obsolete uploads? (If SF
      won't let you, let me know and I'll do it for you, leaving
      only the most recend version of each.)

    • There' still a confusion between strptime.py and
      _strptime.py; your test_time.py imports strptime, and so
      does the latest version of test_strptime.py I can find.

    • The "from __future__ import division" is unnecessary,
      since you're never using the single / operator (// doesn't
      need the future statement). Also note that future statements
      should come *after* a module's docstring (for future
      reference :-).

    • When I run test_strptime.py, I get one failure:

    ======================================================================
    FAIL: Test TimeRE.pattern.
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "../Lib/test/test_strptime.py", line 124, in test_pattern
       
    self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|(
    \d))") != -1, "did not find 'd' directive pattern string
    '%s'" % pattern_string)
      File "/home/guido/python/dist/src/Lib/unittest.py", line
    262, in failUnless
        if not expr: raise self.failureException, msg
    AssertionError: did not find 'd' directive pattern string
    '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d|
    \d)'

    I haven't looked into this deeper.

    Back to you...

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    God I wish I could delete those old files! Poor Neal
    Norwitz was nice enough to go over my code once to help me
    make it sure it was up for being included in the stdlib, but
    he initially used an old version. Thankfully he was nice
    enough to look over the newer version at the time. But no,
    SF does not give me the priveleges to delete old files (and
    why is that? I am the creator of the patch; you would think
    I could manage my own files). I re-uploaded everything now.
    All files that specify they were uploaded 2002-07-17 are
    the newest files.

    I am terribly sorry about this whole name mix-up. I have
    now fixed test_strptime.py to use _strptime. I completely
    removed the strptime import so that the strptime testing
    will go through time and thus test which ever version time
    will export.

    I removed the __future__ import. And thanks for the piece
    of advice; I was taking the advice that __future__
    statements should come before code a little too far. =)

    As for your error, that is because the test_strptime.py you
    are using is old. I originally had a test in there that
    checked to make sure the regex returned was the same as the
    one being tested for; that was a bad decision. So I went
    through and removed all hard-coded tests like that.
    Unfortunately the version you ran still had that test in
    there. SF should really let patch creators delete old files.

    That's it this time. Now I await the next drama in this
    never-ending saga of trying to make a non-trivial
    contribution to Python. =)

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    OK, deleting all old files as promised. All tests succeed. I
    think I'll check this version in (but it may be tomorrow,
    since I've got a few other things to take care of).

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    Wonderful!

    About the docs; do you want me to email Fred or upload a
    patched version of the docs for time fixed? And for
    removing the request in PEP-42, should I email Jeremy about
    it or Barry?

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    Since I had the time, I went ahead and did a patch for
    libtime.tex that removes the comment saying that strptime
    fully relies on the C library and uploaded it.

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Jul 19, 2002

    Logged In: YES
    user_id=33168

    Brett, I'm still following. It wasn't that bad. :-)
    Guido, let me know if you want me to
    do anything/check stuff in.

    Docs are fine to upload here. I can change PEP-42 also.

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    Thanks! All checked in.

    @jaraco
    Copy link
    Member

    jaraco commented Aug 23, 2002

    Logged In: YES
    user_id=599869

    _strptime still doesn't handle case correctly. Although it
    matches case-insensitive, the lookups for month names are
    still case sensitive (since they must match exactly the
    month names returned by time.strftime). Allow me to
    demonstrate below.

    >>> import _strptime as strptime
    >>> strptime.strptime( 'Aug', '%b' )
    (-1, 8, -1, -1, -1, -1, -1, -1, -1)
    >>> strptime.strptime( 'AUG', '%b' )
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "C:\Program Files\Python\Lib\site-
    packages\strptime.py", line 410, in strptime
        month = locale_time.a_month.index(found_dict['b'])
    ValueError: list.index(x): x not in list

    The same result is encountered using any variation of 'aug'
    other than 'Aug'.

    I imagine the solution to this problem is as simple as
    changing lines 408, 410, 430, and 432 to something like:
    month = locale_time.a_month.index(capwords(found_dict
    ['b']))
    and adding
    from string import capwords
    to the header of the file. This is what I did in strptime v2.0.0
    and the latest v2.1.5, and it has worked well.

    As a general note, I believe the test suite should also test for
    cases not automatically generated. Many of the parse
    routines are going to be parsing times generated by systems
    other than time.strftime, to include variants such as the
    aforementioned case variant and other things like
    miscellaneous interspersed strings.

    I will only suggest these changes, not implement them
    myself on this page, as I'm not intimately familiar with
    sourceforge or code customs, so I'll allow someone such as
    Guido or Brett to make the changes.

    Regards,
    Jason

    @gvanrossum
    Copy link
    Member

    Logged In: YES
    user_id=6380

    Reopening to address Jason's bug report.

    Brett, can you provide a patch?

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    Yep, I will patch it. I will add it to patch 593560 which
    has my last bugfix and code cleanup in it.

    @brettcannon
    Copy link
    Member Author

    Logged In: YES
    user_id=357491

    OK, all patched and attached to the other patch entry. I
    also discovered a bug in PyUnit and got to submit a patch
    for that to its patch DB. Such fun. =)

    @warsaw
    Copy link
    Member

    warsaw commented Sep 3, 2002

    Logged In: YES
    user_id=12800

    Closing. Refer to patch 593560 for further details.

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Sep 3, 2002

    Logged In: YES
    user_id=33168

    I'm assuming Barry really meant to keep this closed. :-)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants