-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Have date.isocalendar() return a structseq instance #68604
Comments
Currently, It would be a bit more useful if this tuple could be made into a namedtuple (with fields year, week and weekday). |
Is there any way that this could break existing code? |
As far as I know, using a namedtuple in place of a tuple is fully backwards-compatible. |
No, it is not fully backwards-compatible. First, if pickle a namedtuple, it can't be unpickled in previous versions. Second, namedtuple is slower and larger than tuple, so it shouldn't be used in memory or performance critical code. |
True, but I don't think Python goes as far as to promise that objects pickled in one version can be unpickled in previous versions.
True, but I doubt that such tuples are often used extensively in performance-critical areas of code. Also, code which does care about this could immediately convert these namedtuples into plain tuples, which would be backwards compatible. |
I didn't know about issues with pickling. As for the performance issue, is date.isocalendar() really performance critical? I found a precedent for replacing a tuple by a namedtuple: https://hg.python.org/cpython/rev/ef72142eb8a2 I'm trying my hand at writing a patch (I'm a new contributor) but it seems the code is implemented twice: both in Lib/ (python) and in Modules/ (C). I'm having a hard time figuring out how to conjure up a namedutple from the C code. Any hepl is appreciated. Thanks. |
You can take a look at lru_cache_cache_info in Modules/_functoolsmodule.c for an example of namedtuple instantiation in C code. But that code gets the namedtuple class as a parameter. This is not my area of expertise, but you could try using PyObject_CallFunction to call the Python collections.namedtuple function, keep the result as a module attribute, and then call that whenever you want to create an instance. |
Here's a second attempt at a patch which seems to work. I don't know C at all but I used _decimal.c to try and figure out how to use namedtuple in C. The code compiles and the datetime test suite runs. I've added a test for the new feature and a note in Misc/NEWS as well as a versionadded note in the relevant documentation. Let me know if I've missed anything. |
Some C code cleanups suggested by haypo. |
Now for the color of the bike-shed: What should the namedtuple returned by date.isocalendar() be named? Perhaps CalendarDate or DateTuple? Baptiste's patches use ISOCalendarResult. In my opinion that is a poor name since it does not tell you anything about what it represents (you have to know or look up what isocalendar() returns). |
For the name, I took (un)inspiration from ParseResult: https://docs.python.org/3/library/urllib.parse.html?highlight=urlparse#urllib.parse.ParseResult Any better suggestion is welcome of course. |
How about IsoCalendarDate? |
Here's a new patch with the name Thanks |
+1 This looks like a reasonable and useful API improvement. |
C code should generally use structseq instead of collections.namedtuple. See the code for time.localtime() in Modules/timemodule.c. Also, in the docs use "named tuple" as two words and link to the glossary: https://docs.python.org/3/glossary.html#term-named-tuple |
I updated the patch based on Raymond's feedback. I don't know C at all and I tried to mimic the namedtuple usage of timemodule.c as much as I could (until the code compiled and the test suite passed). I still have two questions:
Thanks. |
Regarding the name "IsoCalendarDate", see for example this question on Stack Overflow[1] where both of the leading answers suggest beginning with "Iso" or "iso" rather than "ISO". However, taking a look at the relatively new module urllib.request[2], it uses names such as "HTTPHandler" rather than "HttpHandler". I'll leave this up to someone more knowledgeable about the conventions to settle. .. [1]: http://stackoverflow.com/questions/15526107/acronyms-in-camelcase |
This patch/The patch |
Dong-hee Na, yes, that would be welcome. Making a PR would be a great first step. I also suggest adding a note about the issue unpickling with earlier versions, in the NEWS entry and it the "What's New" entry. |
Sorry for the late response after a patch, but I'm actually -1 on this patch. I don't think it buys us very much in terms of the API considering it only has 3 parameters, and it adds some complexity to the code (in addition to the performance issue). Honestly, I think the main reason we didn't go with positional-only parameters in |
@p-ganssle @taleinat I agree that there are penalties of this patch what you said. AS-IS: >>> year, week, weekday = a.isocalendar()
>>> datetime.date.fromisocalendar(year, week, weekday)
datetime.date(2019, 9, 1)
TO-BE:
>>> b = a.isocalendar()
>>> datetime.date.fromisocalendar(b.year, b.week, b.weekday)
datetime.date(2019, 9, 1) |
You can use datetime.date.fromisocalendar(*b). |
The I think that the usability of |
Is this an issue that anyone is currently insisting upon? From what I can tell the current implementation uses a structseq and none of my objections had to do with the properties of a structseq.
I generally agree with this - it is nice to not break this compatibility when there's no good reason to do so, but pickling data in one version of Python and unpickling it in another is not something that's supported by the pickle module anyway.
I must take some umbrage at this framing of the question. I don't even know where meekness comes into it - adding any new functionality brings support burdens and additional code complexity and changes to code that's been stable for a long time like |
In an effort to get a sense of how useful this would actually be, I did a code search for That is not the kind of usage pattern I was envisioning when I said that this was a marginal improvement, a *lot* of this code could be made considerably more readable with named fields. If indeed the performance is similar or the same and this won't impact consumers of the pure python version of the module unduly (I checked in #pypy and they think that it shouldn't be more than a minor irritation if anything), then I am changing my -1 to a +1. |
Paul, you're a new core dev. Please don't take issues away from the assignee. Please try to learn from more seasoned core devs. Don't argue with everything they say and demand they give you justifications. I asked Tim to opine on this because 1) he is one of the principal authors of the datetime module, 2) he was party to previous decisions to upgrade from tuples to struct seq, 3) he has a good deal of common sense and knows what is important, and 4) he's far more senior than me, so there is a chance you might listen to him (my comments don't seem to carry any weight with you). |
The main blocker issue (if we decide to accept this feature) is the pickle serialization issue. Serhiy Storchaka:
Would it be possible to tell pickle to serialize .isocalendar() as a tuple, and deserialize it from a tuple to a structseq? I'm thinking at __getstate__ and __setstate__ methods, but I'm not sure how to use them. Serhiy Storchaka:
vstinner@apu$ python3
Python 3.7.4 (default, Jul 9 2019, 16:32:37)
>>> import sys
>>> s=sys.version_info
>>> t=tuple(s)
>>> sys.getsizeof(t), sys.getsizeof(s)
(96, 96)
>>> type(s)
<class 'sys.version_info'> Hum, structseq and tuple seem to have exactly the same memory footprint: only the type is larger, not instances. According to msg350986, the performance to instanciate a new object is exactly the same between tuple and structseq. |
The former is possible but that latter is not: If the object is pickled as a tuple, it will always be unpickled as a simple tuple. To customize unpickling, the serialized data must include the name of the class to use, and that class will never exist in earlier Python versions. I don't think there's a way around this. However, I find Raymond's note very convincing, in that we should likely not let the unpickling issue get in the way of this improvement:
|
""" Let me know if the benchmark code is not appropriate Hum wait, this benchmark measures the performance of the pass opcode... That's why it's so fast :-) -s 'a = datetime.datetime.now().isocalendar()' is run exactly once for the whole benchmark, not at each benchmark iteration... I suggest you this microbenchmark instead: ./python.exe -m pyperf timeit -s 'import datetime; dt = datetime.datetime.now()' 'dt.isocalendar()' For example, on my Fedora 30 with Python 3.7.4, I get: Mean +- std dev: 108 ns +- 4 ns |
Oh ok. So this change makes .isocalendar() slower and breaks the backward compatibility on the serialization. I'm not longer sure that it's worth it. I defer the decision the datetime module maintainers: Alexander Beloposky and Paul Ganssle. |
I favor making this a structseq, primarily based on Paul's attempt to find actual use cases, which showed that named member access would be most useful most often. I have no intuition for that myself, because while I wrote the original functions here, I've never used them and never will ;-) But if I ever did, I'd probably want the ISO week number, and would appreciate not having to remember which meaningless index means "week". Don't care about moving pickles to older versions. Don't care about the putative speed hit - it's still measured in nanoseconds, so can still create over a million per second. In apps with a large number of dates, they're typically not calculated, but read up from a relatively snail-like database or network connection. I doubt anyone would notice if Note: I'm unassigning myself, because I have nothing else to say :-) |
[Tim Peters]
If there are no further objections to upgrading isocalendar() to return a structseq, I would like to take the issue back so I can supervise Dong-hee Na writing a patch and then help bring it fruition. |
I haven't had time to try this with an optimized build, I have done a few more benchmarks using a standard build, I'm seeing almost a 50% regression on isocalendar() calls, but the picture is a little rosier if you consider the fact that you need to construct a I have compared the following cases: call only: -s 'import datetime; dt = datetime.datetime(2019, 1, 1)' 'dt.isocalendar()' The results are summarized below, the most likely real practical impact on performance-sensitive "hot loops" would be a 29% performance regression *if* isocalendar() is the bottleneck:
------------------------------------+---------------+-----------------+---------- The numbers for
------------------------------------+---------------+-----------------+---------- This is a bit disheartening, but I think the fact that these performance sensitive situations are rare and the "grab a single component" use is overwhelmingly the common case, I think it's worth it in the end. If there are significant complaints about the performance regression, we may be able to create a variant method similar to the way that |
I have compiled both versions with optimizations on, looks like the gap gets a bit smaller (percentage-wise) after that:
------------------------------------+---------------+-----------------+---------- If this were something fundamental like a performance regression in building a tuple or constructing a dictionary or something I'd be concerned, but this just reinforces my feeling that, on balance, this is worth it, and that we are probably not going to need a "fast path" version of this. |
[Paul Ganssle]
Dong-hee Na, you can now go ahead with the patch. The relevant function is date_isocalendar() located in Modules/_datetime.c. Model the structseq code after the PyFloat_GetInfo() function in Objects/floatobject.h. When you have a PR with a doc update and tests, please request my review and I help you wrap it up. Ask questions here if you get stuck. Good luck! |
namedtuple is much faster now that four years ago. New namedtuple type creation, instantiating a namedtuple object, access to its members -- all this is times faster. It is still slower than tuple in some aspects, because tuples are everywere and the interpreter has special optimizations for tuples. But date.isocalendar() is not used in such performance critical code. The size of pickles will grow, the time of pickling and unpickling will increase, and old Python versions will not be able to unpickle a new type by default. But I do not think that instances of this type will be pickled in large numbers so it will has a global effect, and the problem with compatibility can be solved with some hacks. If anybody pickles them at all. It was not clear whether adding a new type will give any benefit in real code. Thanks to Paul for his investigations. The only thing that makes this case unique is that we have two implementations of the datetime module. Therefore we will need either add two implementation of a new named tuple type (which are different in details) or import the Python implementation into the C code. This increases the cost of the implementation and maintaining. |
It's okay for the pure python equivalent to use collections.namedtuple so long as the C code uses structseq. As Serhiy points out, there is no need to reinvent the wheel. |
I agree with Raymond here: using collections.namedtuple is fine in the pure Python version. Since Raymond checked in doc changes to purge the phrase "struct sequences" (thanks again for that!), it's consistent with everything else now for the new datetime docs to say that this function returns a "named tuple" (the docs are now clear that there may be more than one implementation of such a thing, and that the only things you can _rely_ on are that they support named element access in addition to tuple methods). |
The current state of the PR doesn't hinge on the pure Python implementation, we went with a very simple tuple subclass to keep the two more closely in sync and because we don't need any of the additional functionality that namedtuple brings, but if it were any more complicated than what we did we probably would have just gone with a namedtuple. The only thing that's holding things up now is that we're working out a way to maintain the ability to pickle the object without making the class public. This is not really a hard requirement, but I'd like to give it an honest effort before calling it a day and just relying on "it's not in __all__, therefore it's not public." (I should note that we can only take that approach after issue bpo-38155 is resolved, which is another reason for the delay). In any case, the bulk of the conversation on the implementation has been taking place on #59838, sorry for the split discussion location, folks. |
This is now merged, thanks for the debate and opinions offered everyone, and thanks to Dong-hee for the implementation! The way we did the implementation, a pickle/unpickle cycle on the result of .isocalendar() will return a plain tuple. Despite the fact that I suggested it, I recognize that this is a /weird thing to do/, and I'm sorta banking on the fact that in all likelihood no one is relying on pickling and unpickling the result of .isocalendar() returning anything but a tuple (since, currently, that's already what it does, regardless of what the input type was). I suppose we'll have to see if it causes problems in the beta period, and I'll keep a close eye on it. |
This broke compilation with mingw; see https://bugs.python.org/issue40777 |
(Apologies if this isn't the right place and/or time for this kind of negative feedback. I'm open to suggestions for a more appropriate venue) I found it disheartening that my work on this ticket has been erased. While I understand that other contributors have worked a lot more on this than I have, I did put in a non-negligible amount of work into this ticket. It would have been nice to have been credited in the commit message for example. As-is, it gives me the impression that the only things this project values are pure code contributions. I couldn't find documentation about how this projects credits contributors. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: