strpdate('20141110', '%Y%m%d%H%S') returns wrong date #67029

Closed
dgorley mannequin opened this issue Nov 10, 2014 · 11 comments
Labels
docs Documentation in the Doc dir type-feature A feature request or enhancement

@dgorley
Mannequin

dgorley mannequin commented Nov 10, 2014

BPO 22840
Nosy @malemburg, @brettcannon, @abalkin, @ethanfurman

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/abalkin'
closed_at = <Date 2015-03-01.19:58:23.233>
created_at = <Date 2014-11-10.22:01:30.995>
labels = ['type-feature', 'invalid', 'docs']
title = "strpdate('20141110', '%Y%m%d%H%S') returns wrong date"
updated_at = <Date 2015-03-01.19:58:23.232>
user = 'https://bugs.python.org/dgorley'

bugs.python.org fields:

activity = <Date 2015-03-01.19:58:23.232>
actor = 'belopolsky'
assignee = 'belopolsky'
closed = True
closed_date = <Date 2015-03-01.19:58:23.233>
closer = 'belopolsky'
components = ['Documentation']
creation = <Date 2014-11-10.22:01:30.995>
creator = 'dgorley'
dependencies = []
files = []
hgrepos = []
issue_num = 22840
keywords = []
message_count = 11.0
messages = ['230977', '230979', '230980', '230983', '230986', '230988', '230989', '230991', '230993', '231028', '231029']
nosy_count = 5.0
nosy_names = ['lemburg', 'brett.cannon', 'belopolsky', 'ethan.furman', 'dgorley']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue22840'
versions = ['Python 3.5']

@dgorley
Mannequin Author

dgorley mannequin commented Nov 10, 2014

strptime() returns the wrong date if I parse today's date (2014-11-10) as a string with no separators and ask strptime() to look for nonexistent hour and minute fields.

>>> import datetime
>>> datetime.datetime.strptime('20141110', '%Y%m%d').isoformat()
'2014-11-10T00:00:00'
>>> datetime.datetime.strptime('20141110', '%Y%m%d%H%M').isoformat()
'2014-01-01T01:00:00'

@dgorley dgorley mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Nov 10, 2014
@ethanfurman
Member

What result did you expect?

@dgorley
Mannequin Author

dgorley mannequin commented Nov 10, 2014

I expected the second call to strptime() to throw an exception, because %Y consumed '2014', %m consumed '11', and %d consumed '10', leaving nothing for %H and %M to match. That would be consistent with the first call.

@ethanfurman
Member

The documentation certainly appears to say that %m, for example, will consume two digits, but it could just as easily be only for output (i.e. strftime).

I suspect this is simply a documentation issue as opposed to a bug, but let's see what the others think.

@abalkin
Member

abalkin commented Nov 10, 2014

I have recently closed a similar issue (bpo-5979) as "won't fix". The winning argument there was that Python behavior was consistent with C. How does C strptime behave in this case?

@abalkin
Member

abalkin commented Nov 11, 2014

With the following C code:

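#define _XOPEN_SOURCE 700  /* glibc declares strptime() only when this is defined */
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){

  char buf[255];
  struct tm tm;

  /* Zero the struct so fields that strptime() does not set have defined values. */
  memset(&tm, 0, sizeof(tm));
  strptime("20141110", "%Y%m%d%H%M", &tm);
  strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M", &tm);
  printf("%s\n", buf);

  return 0;
}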

I get

$ ./a.out
2014-11-10 00:00

So I think Python behavior is wrong.

@abalkin
Member

abalkin commented Nov 11, 2014

Here is the case that I think illustrates the current logic better:

>>> from datetime import datetime
>>> datetime.strptime("20141234", "%Y%m%d%H%M")
datetime.datetime(2014, 1, 2, 3, 4)

@abalkin
Member

abalkin commented Nov 11, 2014

Looking at the POSIX standard

http://pubs.opengroup.org/onlinepubs/009695399/functions/strptime.html

It appears that Python may be compliant:

%H The hour (24-hour clock) [00,23]; leading zeros are permitted but not required.
%m The month number [01,12]; leading zeros are permitted but not required.
%M The minute [00,59]; leading zeros are permitted but not required.
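
The optional leading zeros are easy to observe when separators are present. A minimal check, assuming only the standard `datetime` module (the variable names are illustrative):

```python
from datetime import datetime

# With separators between the fields, "%m" and "%d" accept both the
# zero-padded and the unpadded spelling and yield the same date.
padded = datetime.strptime("2014-01-02", "%Y-%m-%d")
unpadded = datetime.strptime("2014-1-2", "%Y-%m-%d")

print(padded == unpadded)
```

Without separators, the same leniency is what lets a field shrink to a single digit.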

@abalkin
Member

abalkin commented Nov 11, 2014

Here is another interesting bit from the standard: "The application shall ensure that there is white-space or other non-alphanumeric characters between any two conversion specifications."

This is how they get away with not specifying whether the parser of variable-width fields should be greedy or not.

@brettcannon
Member

strptime very much follows the POSIX standard as I implemented strptime by reading that doc.

If you want to see how the behaviour is implemented you can look at https://hg.python.org/cpython/file/ac0334665459/Lib/_strptime.py#l178 . But the key thing here is that the OP has unused formatters. Since strptime uses regexes under the hood, the re module does its best to match the entire format. Since POSIX says that e.g. the leading 0 for %m is optional, the regex goes with the single-digit version to let the %H format match _something_ (same goes for %d and %M). So without rewriting strptime to not use regexes to support unused formatters and to stop being so POSIX-compliant, I don't see how to change the behaviour. Plus it would be backwards-incompatible, as this is how strptime has worked since 2002.

It's Alexander's call, but I vote to close this as "not a bug".
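
The backtracking described above can be reproduced with a plain regex. This is a rough sketch, not the module's code: the patterns are simplified approximations modeled on the directive table in `_strptime.py`, and `DIRECTIVES` and `split_fields` are hypothetical names introduced here for illustration.

```python
import re

# Simplified per-directive patterns (an approximation for illustration).
# In each alternation the two-digit forms come first, but the regex engine
# may backtrack to a one-digit form so that later groups can still match.
DIRECTIVES = {
    "Y": r"(?P<Y>\d\d\d\d)",
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
    "H": r"(?P<H>2[0-3]|[01]\d|\d)",
    "M": r"(?P<M>[0-5]\d|\d)",
}


def split_fields(text, directives="YmdHM"):
    """Hypothetical helper: show how the digits get divided among fields."""
    pattern = "".join(DIRECTIVES[name] for name in directives)
    match = re.fullmatch(pattern, text)
    return match.groupdict() if match else None


print(split_fields("20141110"))         # five fields must share eight digits
print(split_fields("20141234"))
print(split_fields("20141110", "Ymd"))  # three fields: no backtracking needed
```

With five directives, `%m`, `%d`, `%H` and `%M` each end up with a single digit, which is the split behind the `2014-01-01T01:00:00` result in the original report.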

@abalkin
Member

abalkin commented Nov 11, 2014

After reading the standard a few more times, I agree with Brett and Ethan that this is at most a call for better documentation.

I'll leave this open in case someone comes up with a succinct description of what exactly datetime.strptime does. (Maybe we should just document the format-to-regexp translation implemented in _strptime.py.)

We may also include POSIX's directive "The application shall ensure that there is white-space or other non-alphanumeric characters between any two conversion specifications" as a recommendation.
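
As an illustration of that recommendation, here is a hedged sketch of two ways calling code can avoid the ambiguity: put separators in the format, or check the width of a separator-less string before parsing. `parse_compact` and `accepts_date_only` are hypothetical helpers, not part of the standard library.

```python
from datetime import datetime


def parse_compact(stamp):
    """Hypothetical helper: strictly parse a YYYYmmddHHMM string."""
    # Enforce the fixed width up front so no field can shrink to one digit.
    if len(stamp) != 12 or not (stamp.isascii() and stamp.isdigit()):
        raise ValueError(f"expected 12 digits (YYYYmmddHHMM), got {stamp!r}")
    return datetime.strptime(stamp, "%Y%m%d%H%M")


def accepts_date_only(text):
    """Hypothetical helper: does a separated date+time format accept text?"""
    try:
        datetime.strptime(text, "%Y-%m-%d %H:%M")
    except ValueError:
        return False
    return True


print(parse_compact("201411100000"))
print(accepts_date_only("2014-11-10"))        # time part missing
print(accepts_date_only("2014-11-10 00:00"))
```

With separators in the format, a string that lacks the time part fails to match instead of being redistributed across the fields.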

@abalkin abalkin added docs Documentation in the Doc dir and removed stdlib Python modules in the Lib dir labels Nov 11, 2014
@abalkin abalkin self-assigned this Nov 11, 2014
@abalkin abalkin added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Nov 11, 2014
@abalkin abalkin closed this as completed Mar 1, 2015
@abalkin abalkin added the invalid label Mar 1, 2015
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022