Add timespec optional flag to datetime isoformat() to choose the precision #63674
I have a CSV file. Here are a few rows:

"2013-10-30 14:26:46.000528","1.36097023829"

I want to parse the strings in the first column as timestamps. I can, and often do, use dateutil.parser.parse(), but in situations like this where all the timestamps are of the same format, it can be incredibly slow. OTOH, there is no single format I can pass to datetime.datetime.strptime() that will parse all the above timestamps. Using "%Y-%m-%d %H:%M:%S" I get errors about the leftover microseconds. Using "%Y-%m-%d %H:%M:%S.%f" I get errors when I try to parse a timestamp which doesn't have microseconds.

Alas, it is datetime itself which is to blame for this problem. The above timestamps were all printed from an earlier Python program which just dumps the str() of a datetime object to its output CSV file. Consider:

>>> dt = dateutil.parser.parse("2013-10-30 14:26:50")
>>> print dt
2013-10-30 14:26:50
>>> dt2 = dateutil.parser.parse("2013-10-30 14:26:51.000549")
>>> print dt2
2013-10-30 14:26:51.000549

The same holds for isoformat():

>>> print dt.isoformat()
2013-10-30T14:26:50
>>> print dt2.isoformat()
2013-10-30T14:26:51.000549

Whatever happened to "be strict in what you send, but generous in what you receive"? If strptime() is going to complain the way it does, then str() should always generate a full timestamp, including microseconds.

The above is from a Python 2.7 session, but I also confirmed that Python 3.3 behaves the same. I've checked 2.7 and 3.3 in the Versions list, but I don't think it can be fixed there. Can the __str__ and isoformat methods of datetime (and time) objects be modified for 3.4 to always include the microseconds? Alternatively, can the %S format character be modified to consume an optional decimal point and microseconds?

I rate this as "easy" considering the easiest fix is to modify __str__ and isoformat, which seems unchallenging. |
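The workaround most people reach for with this kind of mixed input is a two-format fallback, as a sketch (parse_timestamp is a hypothetical helper name, not anything from the stdlib):

```python
from datetime import datetime

def parse_timestamp(s):
    """Parse an ISO-like timestamp whether or not it has microseconds."""
    # Try the fractional-seconds format first; fall back to whole seconds.
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass
    raise ValueError("unrecognized timestamp: %r" % (s,))

print(parse_timestamp("2013-10-30 14:26:46.000528"))
print(parse_timestamp("2013-10-30 14:26:50"))
```

This is much faster than dateutil.parser.parse() for uniform data, at the cost of an extra exception on rows without microseconds.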
See bpo-7342. |
It may be simple but as Ezio has pointed out, it has already been rejected :) The problem with being generous in what you accept in this context is that the parsing is using a specific format string, and the semantics of that format string are based on external "standards" and are pretty inflexible. The pythonic solution, IMO, is to have datetime's constructor accept what its str produces. And indeed, exactly this has been suggested by Alexander Belopolsky in bpo-15873. So I'm going to close this one as a duplicate of that one. |
I don't accept your conclusion. I understand that making %S consume microseconds or ".%f" be "optional" would be a load. What's the problem with forcing __str__ and isoformat to emit microseconds in all cases though? That would allow you to parse what they produce using existing code. No new constructor needed. The issue of sometimes emitting microseconds, sometimes not, is annoying, even beyond this issue. I think for consistency's sake it makes sense for the string version of datetime and time objects to always be the same length. |
It's not my conclusion. It's Guido's and the other developers who designed datetime. Argue with them. (I'd guess it would be better argued on python-ideas rather than python-dev, but use your own judgement.) |
The decision to omit microseconds when 0 was a Guido pronouncement, back when datetime was first written. The idea is that str() is supposed to be friendly, and for the vast number of applications that don't use microseconds at all, it's unfriendly to shove ".000000" in their face all the time. Much the same reason is behind why, e.g., str(2.0) doesn't produce "2.0000000000000000". I doubt this will change.

If you want to use a single format, you could massage the data first, like:

if '.' not in dt:
    dt += ".000000" |
Okay, so no to __str__. What about isoformat? |
I don't know, Skip. Since |
Well, I don't know if this sways anything, but I was probably responsible, and I think my argument was something about not all timestamp sources having microseconds, and not wanting to emit the ".000000" in that case. If I could go back I'd probably do something else; after all str(1.0) doesn't return '1' either. But that's water under the bridge; "fixing" this is undoubtedly going to break a lot of code. Maybe we can give isoformat() a flag parameter to force the inclusion or exclusion of the microseconds (with a default of None meaning the current behavior)? |
The ultimate culprit here is actually the csv module. :-) It calls str() on every element it's about to write. In my applications which write to CSV files I can special case datetime objects. I will stop swimming upstream. |
I suppose in an ideal world the csv module would have some sort of hookable serialization protocol, like the database modules do :) |
As I understand Guido's message, he reopened this to consider adding a new parameter. Given an existing csv file like that given, either Tim's solution or |
+1 on adding an option to isoformat(). We already have an optional <sep> argument, so the symmetry with __str__ is not complete. To make this option more useful, rather than implementing an always_emit_microseconds=False flag, I would add a keyword argument 'precision' that would take a ('hour'|'minute'|'second'|'millisecond'|'microsecond') value. |
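For reference, the option this thread converged on eventually shipped in Python 3.6 as the timespec keyword of isoformat(), with plural value names:

```python
from datetime import datetime

dt = datetime(2013, 10, 30, 14, 26, 46, 528)

# Each timespec value cuts the string off at that resolution.
print(dt.isoformat(timespec='hours'))         # date plus hour only
print(dt.isoformat(timespec='minutes'))
print(dt.isoformat(timespec='seconds'))
print(dt.isoformat(timespec='milliseconds'))  # 528 usec truncate to .000
print(dt.isoformat(timespec='microseconds'))  # always six digits
```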
I would like to implement this feature. I already wrote the Python part. Is there anything else to decide? |
2013/11/5 Alexander Belopolsky <report@bugs.python.org>:

Hour precision is not part of the ISO 8601 standard.

"resolution" is maybe a better name for the new parameter than "precision".

The new parameter should be added to datetime.datetime.isoformat() but … |
+1 on all Victor's points. I like 'resolution' because this is the term that the datetime module uses already:

>>> from datetime import *
>>> datetime.resolution
datetime.timedelta(0, 0, 1)

There is a slight chance of confusion stemming from the fact that datetime.resolution is a timedelta, but the proposed parameter is a string. I believe ISO 8601 uses the word "accuracy" to describe this kind of format variation. I am leaning towards "resolution", but would like to hear from others. Here are the candidates: …

(Note that "accuracy" is the shortest but "resolution" is the most correct.) |
On 05.11.2013 21:31, STINNER Victor wrote:

Since this ticket is about being able to remove the seconds fraction …

BTW: Have you thought about the rounding/truncation issues? A safe bet is truncation, but this can lead to inaccuracies of … |
> MAL: Have you thought about the rounding/truncation issues

I believe it has to be truncation. Rounding is better left to the user code, where it can be done either using timedelta arithmetic or at the time source. I would expect that in the majority of cases where lower-resolution printing is desired, the times will already be at lower resolution at the source. |
On 06.11.2013 16:51, Alexander Belopolsky wrote:

Sure, otherwise I wouldn't have mentioned it :-)

mxDateTime always uses 2 digit fractions when displaying date/time values:

/* Fix a second value for display as string.

   Seconds are rounded to the nearest microsecond in order to avoid …

   Special care is taken for second values which would cause rounding …

   The second value returned by this function should be formatted …
*/

This approach has worked out well, though YMMV.

In practice you often don't know the resolution of … MS SQL Server datetime is the exception to that rule, with a …

http://msdn.microsoft.com/en-us/library/ms187819.aspx

For full seconds, truncation will add an error of +/- 1 second, … |
I am afraid that the rounding issues may kill this proposal. Can we start with something simple? For example, we can start with a show=None keyword argument and allow a single value, 'microseconds' (or 'us'). This will solve the issue at hand with a reasonable syntax: t.isoformat(show='us'). If other resolutions are required, we can later add more values and may even allow t.isoformat(show=2) to show 2 decimal digits. |
I don't think the meaning of this proposed show keyword argument …

Furthermore... If we go far enough back, my original problem was really that the …

In my own code (where I first noticed the problem) I acquiesced, and changed

d["time"] = now

to this:

d["time"] = now.strftime("%Y-%m-%dT%H:%M:%S.%f")

where "now" is a datetime object. I thus guarantee that I can parse …

So, fiddle all you want with isoformat(), but do it right. I vote that …

>>> import datetime
>>> x = datetime.datetime.now()
>>> x
datetime.datetime(2013, 11, 6, 12, 19, 5, 759020)
>>> x.strftime("%Y-%m-%d %H:%M:%S")
'2013-11-06 12:19:05'

(%S doesn't produce "06")

Skip |
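Skip's fixed-format approach round-trips reliably because %f always emits exactly six zero-padded digits, so the output width never varies; a quick check:

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f"
now = datetime(2013, 11, 6, 12, 19, 5, 759020)

s = now.strftime(FMT)               # microseconds always included
back = datetime.strptime(s, FMT)    # the same format parses it again
print(s)
print(back == now)
```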
Here is some "prior art": the GNU date utility has an --iso-8601[=timespec] option, defined as:

‘-I[timespec]’ …
‘auto’ …

https://www.gnu.org/software/coreutils/manual/html_node/Options-for-date.html |
I left some comments on Rietveld. |
What about milliseconds? I'll leave it for Guido to make a call on nanoseconds. My vote is +0.5. |
The timespec feature is modeled after GNU date --iso-8601[=timespec] option which does support nanoseconds. It is fairly common to support nanoseconds these days and it does not cost much to implement. |
Yes, but the module does not support nanoseconds. And putting any such options would require a huge banner saying that the nanosecond option will just always result in three zeros at the end. My suggestion is not to pretend that we suddenly "support" nanoseconds, but rather to follow the actual implementation of the module and add the support for nanoseconds timespec when the module actually adds support for them. |
You can leave out the nanoseconds but please do add the milliseconds. I'm sure they would find more use than the option to show only the hours. |
New patch |
Left some review suggestions |
New patch after @martin.panter comments on Rietveld. I left only this:

I think it is quite obvious that a datetime.now() can't be rounded into the future if microseconds are 999500. |
About rounding: I'm not too sure what people would expect. Obviously it is much easier to implement truncating towards zero. But it is different from many other rounding cases in Python; that is why I thought to make it explicit.

>>> datetime.fromtimestamp(59.9999999).isoformat(timespec="microseconds")
'1970-01-01T00:01:00.000000'
>>> datetime.fromtimestamp(59.999999).isoformat(timespec="milliseconds")
'1970-01-01T00:00:59.999'
>>> format(59.999999, ".3f")
'60.000' |
Oh, now I see your point. I've uploaded a new patch with a note for that. |
Out of context here, but regarding round vs. truncate, IIUC for time |
We discussed truncation vs. rounding some time ago. See msg202270 and the posts around it. The consensus was the same as Guido's current advice: do the truncation. |
@belopolsky could you please review one of the latest two patches submitted? I think I've done everything required. Now I'll wait to hear from you whether I have to do more. |
Guido, Did you consider MAL's msg202274? I am still in favor of truncation, but would like to make sure we are not missing something that MAL knows from experience. |
Another argument for truncation is that this is what GNU date does:

$ date --iso-8601=seconds --date="2016-03-01 15:00:00.999"
2016-03-01T15:00:00-0500 |
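The behavior that shipped matches GNU date on this point: isoformat() with a coarser timespec truncates the extra digits rather than rounding them up:

```python
from datetime import datetime

# 999 ms of fractional seconds are dropped, not rounded to :01.
dt = datetime(2016, 3, 1, 15, 0, 0, 999000)
print(dt.isoformat(timespec='seconds'))
```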
Given that we're talking about what to do when we're suppressing the usecs I don't think roundtripping matters. :-) |
I changed many times how Python rounds nanoseconds in the private PyTime API, and I got a bug report because of that! => issue bpo-23517. By the way, I wrote an article to explain the history of the private PyTime API, especially changes in rounding ;-) https://haypo.github.io/pytime.html |
But what should we do in your opinion? |
I hope my prediction "I am afraid that the rounding issues may kill this proposal" (see msg202276) will not come true.

I think the correct way to view "timespec" is as a way to suppress/enforce printing of trailing digits. Users that choose to print less than the full usec format should make sure that their datetime instances are properly rounded before printing. Unfortunately, it does not look like the datetime module makes rounding easy. The best I can think of is something like

def round_datetime(dt, delta):
    dt0 = datetime.combine(dt.date(), time(0))
    return dt0 + round((dt - dt0) / delta) * delta

Maybe a datetime.round() method along these lines would be a worthwhile addition? |
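A self-contained version of that sketch, with imports and a usage example added (this relies on timedelta/timedelta division returning a float, which round() then snaps to the nearest whole multiple of delta):

```python
from datetime import datetime, time, timedelta

def round_datetime(dt, delta):
    # Round dt to the nearest multiple of delta, measured from midnight.
    dt0 = datetime.combine(dt.date(), time(0))
    return dt0 + round((dt - dt0) / delta) * delta

dt = datetime(2013, 11, 6, 12, 19, 5, 759020)
print(round_datetime(dt, timedelta(seconds=1)))        # rounds :05.759020 up
print(round_datetime(dt, timedelta(milliseconds=1)))   # keeps .759 ms
```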
Use the ROUND_FLOOR rounding method. time.time(), datetime.datetime.now(), etc. round the current time using ROUND_FLOOR. Only datetime.datetime.fromtimestamp() uses ROUND_HALF_EVEN, but it's more an exception than the rule: this function takes a float as input, and to be consistent it must use the same rounding method as other Python functions taking a float parameter, like round(), hence ROUND_HALF_EVEN. So I suggest to also use ROUND_FLOOR for .isoformat(). Hopefully, we don't have to discuss the exact rounding method for negative numbers, since the minimum datetime object is datetime.datetime(1, 1, 1), which is "positive" ;-)

You have a similar rounding question for file timestamps. Depending on the file system, you may have a resolution of 2 seconds (FAT), 1 second (ext3) or 1 nanosecond (ext4). But Linux syscalls accept subsecond resolution. The Linux kernel uses the ROUND_FLOOR rounding method if I recall correctly. I guess that it's a requirement for makefiles. If you have ever experienced a system clock slew, you may understand me :-)

What is truncation? Is it the ROUND_FLOOR (towards -inf) rounding method, like math.floor(float)? Python's int(float) uses ROUND_DOWN (towards zero), which is different from ROUND_FLOOR, but only for negative numbers: int(-0.9) returns 0, whereas math.floor(-0.9) returns -1. I guess that "rounding" here means ROUND_HALF_EVEN, the funny "round to nearest with ties going to the nearest even integer" method, like round(float)? |
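The three rounding methods Victor distinguishes can be checked directly in the interpreter (Python 3 semantics):

```python
import math

# ROUND_DOWN (towards zero) vs. ROUND_FLOOR (towards -inf):
# they only differ for negative inputs.
print(int(-0.9))         # towards zero
print(math.floor(-0.9))  # towards -inf

# ROUND_HALF_EVEN: ties go to the nearest even integer.
print(round(0.5), round(1.5))
```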
Except for the case where you're closer than half a usec from the next value, IMO rounding makes no sense when suppressing digits. I most definitely would never want 9:59:59 to be rounded to 10:00 when suppressing seconds. If you really think there are use cases for that you could add a 'round=True' flag (as long as it defaults to False). That seems better than supporting rounding on datetime objects themselves. But I think you're just speculating. |
IIUC truncation traditionally means "towards zero" -- that's why we have separate "floor" and "ceiling" operations meaning "towards [negative] infinity". Fortunately we shouldn't have to deal with negative values here so floor and truncate mean the same thing. Agreed that isoformat() should also truncate. |
Sorry, what is the use case of this method? |
Personally, I don't think rounding is that useful. My working assumption is that users will select, say, timespec='millisecond' only when they know that their time source produces datetime instances with millisecond precision and they don't want to kill more trees by printing redundant 0's. MAL's objection to this line of argument was that some time sources have odd resolution (he reported MS SQL's use of 333 ms) and a user may want to have perfect round-tripping when using a sub-usec timespec and such an odd time source or destination. |
Nice, it looks like I agree with you on using ROUND_FLOOR :-) I don't think that we should be prepared for theoretical user requests, but rather focus on the concrete and well-defined existing user request: "Add timespec optional flag to datetime isoformat() to choose the precision". Let's wait until users request a datetime.round() method to understand concrete issues better. |
I feel odd trying to advocate a POV that I disagree with, so let me just quote MAL:

"""
For full seconds, truncation will add an error of +/- 1 second, …
"""

I somehow missed this argument when Marc-Andre made it, so I want to make sure that it is properly considered before we finalize this issue. |
Meanwhile I made corrections after @belopolsky's latest review |
Alessandro, thank you very much for your work and perseverance. I will do my best to commit this next weekend. |
New changeset eb120f50df4a by Alexander Belopolsky in branch 'default': |