-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
strptime should implement %G, %V and %u directives #56215
Comments
Python is via datetime.isocalendar() able to produce the ISO week number and ISO weekday from a given date. But it is not possible to do the reverse; calculate the date from a year, ISO week number and weekday. libc strptime()/strftime() mentions a %V and %u directive which does this, i.e. Monday in ISO week 22 of the year 2011: datetime.strptime('2011221', '%Y%V%u') returning 2011-05-30 and datetime.strptime('2011227', '%Y%V%u') returning 2011-06-05 libc (on FreeBSD) has this to say: %V is replaced by the week number of the year (Monday as the first day %u is replaced by the weekday (Monday as the first day of the week) as |
This is a reasonable request and easy to implement. I am not sure how often this functionality is needed, so I am only +0 on this feature. |
At least in Denmark, week numbers are used regularly in everyday communication and planning, and the numbering follows the ISO rules. Also, the week starts on Monday. |
I've recently joined the python-mentors mailing list because I love Python and want to get involved. I found this bug in the list of "Easy issues" and thought I'd try my hand. Anyway, this is my first patch, so please forgive me if I am breaking protocol or stepping on anyone's toes here. I also hope my code isn't embarrassing. This adds a constructor to the date class that allows construction based on an ISO-8601 YYYYWWD string. Does this address the issue in a logical way? |
Thanks for wanting to work on this. It is a little hard to parse from the OP's initial post, but he is asking that %V and %u be supported by datetime.strptime, which they currently are not. strptime is the generalized constructor for turning a string into a datetime, so no, a new constructor is not the right approach. The code that implements the strptime method of datetime is in the file _strptime.py in Lib. Your patch should be against that code. (Note that this is pretty un-obvious, I had to dig a bit to figure it out!) |
Thanks, I think I understand the original post now. Upon reading the docs and code, however, it seems this is possible using the %W and %w directives. Is the issue just to support the different letters (%V and %u) specifically, or that they are not quite the same format as the corresponding ISO numbers? |
Getting a date from ISO week /weekday is not possible with the %W and %w directives. ISO week numbers and normal week numbers are calculated differently (see my libc qoute in the original post). Also, using %w instead of %u would be ambiguous, since %w weeks start on Sunday, while %u start with Monday. I agree that this should be implemented in strptime(), to follow libc strptime(). datetime() has a clear constructor while strptime() has all the whacky cases. |
OK, here is my second attempt. I think it functions as desired, but a code review may reveal flaws in my implementation. I'm sure there are associated tests and documentation to write, but I have basically no experience with that yet. If this looks like the right direction, I can take it to the python-mentors list for help with the test/docs. |
I'm not familiar with this module, so at the moment I just gave the patch a quick scan. The approach looks OK to me. Hopefully Alexander can review it. In the meantime I think you should go ahead and work on tests and doc. It is easier to do a detailed review when tests are included. |
I'll try to give this a more thorough review by the end of the week. For now, just a nit-pick, but _calc_julian_from_V jumped at me as a really odd name. Either "iso_to_julian" or "julian_from_iso" would be much clearer. The patch also needs tests and documentation. |
Thanks everyone, please take your time if there are more pressing issues; I'll get to work on tests and documentation in the mean time. I agree that '_calc_julian_from_V' is a bit strange. I was mimicking a similar helper function's name ('_calc_julian_from_U_or_W'), but perhaps that is no defense. Also, I know the functionality is there with 'toisocalendar' and 'toisoweekday', but maybe %V and %u should be implemented for 'strftime' for completeness. Any thoughts? |
On Wed, May 25, 2011 at 6:28 PM, Ashley Anderson <report@bugs.python.org> wrote:
This is a perfect defense. Local consistency usually trumps global |
It looks like strftime already support %V and %u (presumably if and only if the platform strftime does). |
Uploaded patch adding unit tests for %V and %u. Patch is against python2.7 |
Patch by aganders3 doesn't compile. Added comments in review of his patch. Adding a patch against python2.7 with my changes, which pass all unit tests posted previously. |
Documentation (I have no idea how to create a patch for this): %V ISO week number of the year (Monday as the first day of the week) as a decimal number (01-53). The week containing January 4 is week 1; %u ISO weekday (Monday as the first day of the week) as a decimal number (1-7). |
Since this is a new feature it can only go into 3.3. Documentation is in the Doc subdirectory of a checkout, and is in the form of *.rst files. Modify the appropriate .rst file, and the doc update will be included in the patch you generate. If you wish to test your doc patch, execute 'make html' in the Doc subdirectory, which will pull in the appropriate software and build the docs in a Doc/build/html subdirectory. Ashely's patch worked for me on 3.3 in a minimal test; I haven't looked it it further at this point. Thanks for reviewing and for working on the unit tests and docs. |
On Thu, May 26, 2011 at 9:44 AM, R. David Murray <report@bugs.python.org> wrote:
More specifically, strftime/strptime directives are listed in a table |
When trying to add cases for %V and %u in the tests, I ran into an issue of year ambiguity. The problem comes when the ISO year does not match the Gregorian year for a given date. I think this should be fixed by implementing the %G directive (ISO year, which is present in strftime) as well. |
On Thu, May 26, 2011 at 12:07 PM, Ashley Anderson
Correct. |
Ashley, can you elaborate on wherein the ambiguity lies? Which combination of year, ISO week number and ISO weekday would lead to ambiguity? Regarding documentation, the %G, %V and %u directives are listed in IEEE Std 1003.1, 2004 Edition for strftime(): http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html For some reason, these directives are not mentioned in the related strptime(): http://pubs.opengroup.org/onlinepubs/009695399/functions/strptime.html |
The example that triggered the issue in testing was January 1, 1905. The ISO date for this day is 1904 52 7. This is reported correctly if you use datetime.isocalendar() or datetime.strftime('%G'), but you get 1905 if you use datetime.strftime('%Y'). When it's read back in it causes the test to fail because ISO '1904 52 7' is different from ISO '1905 52 7'. Likewise if you consider a year starting on a Tuesday, Wednesday, or Thursday, there will be several days at the end of the previous year that will have an ISO year ahead of their Gregorian representation. |
Oh, I see, you're talking about strftime(). You're correct that you would need %G to print the year to which the ISO week containing the specific date belongs. I see no ambiguity or need for %G in strptime(). |
I disagree, I think %G is necessary in strptime(). Take Monday, December 31, 2001 as an example. The ISO date is 2002 01 1. Now, given only the Gregorian year (2001) for this date, and the ISO week and weekday (01 1), there is an ambiguity with Monday, January 1, 2001, which has an ISO date of 2001 01 1. The ISO week/weekday combination of (01 1) occurs twice in the year 2001, and can only be differentiated by the corresponding ISO year. We can, of course, debate on what the behavior in this case should be. The way I see it, we can:
I'm attaching a patch now that includes some minor changes, includes %G and adds some tests. I am also working on the documentation but it's not quite ready. |
I respectfully disagree. I take strptime('2002 01 1', '%Y %V %u') as mening "first day of first week in the year 2002" There is only one date that corresponds to the first day of the first week of 2002, i.e. Dec. 31, 2001. If you specify the first day of the first week of 2001 instead, then that's another date (Jan. 1, 2001). The last and the first week in a year may span dates belonging to two years. That's just the way it is. Are you suggesting strptime('2002 01 1', '%Y %V %u') to mean "first day of first week of 2002, except if that would change the year, in which case it means ???" If you want to strftime(strptime(date, fmt), fmt) and arrive at the original date then yes, you need %G (but only in strftime). |
Reading you comment again, I see the ambiguity now, if %Y is interpreted as "The resulting date MUST be in 2001". I think the safest way would be to implement %G and fail if %Y is used in combination with %V. Maybe even fail if %V and %u aren't used in combination (since using %w could mean either the Sunday in the end of ISO week X or the Sunday preceeding ISO week X). |
Alexander, http://bugs.python.org/file36417/12006_3.5_complete.patch updates the previous patches and is ready for review. Unit tests pass as of today. Regards, Alex W. |
I find this footnote somewhat confusing: """ The existing footnote (7) is much clearer: """ Why not use the same language in (8)? """ """ |
How was "%Y %V" issue resolved? I don't see any tests for this case. |
Well, it's an ambiguous situation, as established earlier. I'm not sure what the policy is wrt. foot-shooting, but I'd opt to fail if %G, %V and %u are not used together. |
I don't have the repo handy to make a patch against 3.5, but would an addition like this do? in Lib/_strptime.py: + elif iso_week != -1 and iso_year == -1: def test_ValueError(self):
+ # Make sure ValueError is raised when match fails or format is bad
+ self.assertRaises(ValueError, _strptime._strptime, data_string="1905 52",
+ format="%Y %V") |
For future reference, here is the example showing %Y %V ambiguity: >>> date(2013,12,31).strftime('%Y %V %u')
'2013 01 2'
>>> date(2013,1,1).strftime('%Y %V %u')
'2013 01 2' which is resolved by using %G >>> date(2013,12,31).strftime('%G %V %u')
'2014 01 2'
>>> date(2013,1,1).strftime('%G %V %u')
'2013 01 2' In other words, '2013 01 2' cannot be unambiguously parsed using '%Y %V %u' format. |
I think we need more tests showing that new directives don't violate strftime - strptime round-trip invariants. |
Documentation should say "new in 3.5". |
It would be nice to get this in for 3.5. Does anyone have time/interest to address the last two comments? |
The patch is synchronized with the tip, added the versionadded directives and whatsnews entry, addressed sasha's comment on Rietveld. Fixed a bug: %a and %A now are interchangeable with %u as well as with %w. I don't understand what more test are needed. Existing tests cover %Y %V ambiguity: >>> date(1906, 12, 31).strftime('%G %V %u')
'1907 01 1'
>>> date(1917, 12, 31).strftime('%G %V %u')
'1918 01 1' |
Addressed Berker's comments. |
There are other issues. strptime with single %Y returns a date of the first day of the year. What should return single %G? >>> time.strptime('2015', '%Y')
time.struct_time(tm_year=2015, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=-1)
>>> time.strptime('2015', '%G')
time.struct_time(tm_year=1900, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=-1) |
I think POLA would be to raise ValueError on time.strptime('2015', '%G') and other situations that don't make sense or are ambiguous. That, or reproduce the bugs that the native OS strptime() implementation has. I got the %G, %V, and %u directives accepted for the next POSIX spec (http://austingroupbugs.net/view.php?id=879). But it will be years before operating systems catch up, so I would opt for the ValueError approach instead. |
Reverting to "needs patch" stage because there are still issues to be ironed out. |
Wow, I can't believe this issue is so old now! I'm motivated to finally come It sounds like the consensus is to raise a ValueError in cases where ambiguity Ambiguous or incomplete cases I have identified are:
Hopefully that covers it...let me know if I need to add any cases or change |
Here is an updated patch with implementation as outlined in msg247525. |
Thanks for the review and the good suggestions. Hopefully this new patch is an improvement. I didn't know about the context manager for assertRaises - I was just following the format for another ValueError test a few lines above. The frozenset and re-wrapped comment were left from playing around with another way to do the checks, and I've corrected them. I think the conditionals around calculating the julian and year are clearer now as well. Please (obviously) let me know if there are further changes. Also please let me know if this is not the proper way to respond to the code review! |
The going's a bit tough here. I've spent at least 10 times as long on this bug than it took me to work around the fact that Python doesn't support ISO week number round-trip. Python puts a smile on my face every day and I enjoy giving back where I can. But after four years, the smile is getting a bit stiff when I look at this enhancement request. I'm hereby offering a scrumptious, mouth-watering, virtual, blackberry mousse gateau with a honey-roasted almond meringue cake bottom to the person who gets this enhancement committed. In good open-source spirit, everyone who contributed along the way is entitled to a healthy serving and eternal gratitude from me. Bon appetit! |
Haha, thanks Erik. It seems you know my tastes enough to not offer a chocolate-based reward. =) I was actually set to "ping" to request a patch review today, as it's been one month since my last submission. Please let me know if I need to update the patch against the current |
Another *ping* for a patch review since it's been almost a month since the last one. |
Ashley, You don't have to be a committer to review a patch. Given that it is unlikely that any of the committers is an expert on ISO calendar, an external review will be most welcome. Would you complete one? |
Thanks Alexander, but I think the latest patch is still mine. It seems strange to review my own patch. I'll do it if that's common, but since this will (hopefully, eventually) be my first accepted patch I would appreciate the feedback from another reviewer. |
My bad. In this case, maybe Erik can review your latest patch? |
I believe Rietveld gives you a "done" button on each reviewer comment. If all you do is to take the reviewer suggestion, you can just press it. Responding with an issue note as you did is perfectly fine as well. |
I have reviewed the latest patch, and it looks good to me. There are tests for the tricky conversions around Jan 1, and the docs are brief and succinct. Until the full set of new c99 strftime directives are supported, I think it's overkill to include a lecture about the origin of these new directives and their support in the underlying OS. There are a number of foot-shooting possibilities when parsing date strings with the directives supported by strptime(), but at least this patch does not make it worse. It validates the input nicely when using the new directives and prints useful error messages it input is ambiguous. I'm not a committer, but I approve of the patch as-is. |
New changeset acdebfbfbdcf by Alexander Belopolsky in branch 'default': |
I've committed the latest patch. Thank you Ashley and Erik for your work and perseverance. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: