improved allocation of PyUnicode objects #46235
Comments
This is an attempt at improving allocation of str (PyUnicode) objects.
There is a ~10% speedup in stringbench, and a slight improvement in
The Unicode object was designed not to be a PyVarObject (in contrast to Note that turning the objects into PyVarObjects removes the ability to Regarding your second point: Unicode objects already use a free list and Tuning the KEEPALIVE_SIZE_LIMIT will likely have similar effects w/r to |
I just tried bumping KEEPALIVE_SIZE_LIMIT to 200. It makes up for a bit (then there are of course microbenchmarks. For example: I don't understand the argument for codecs having to resize the unicode I admit I don't know the exact reasons for PyUnicode's design. I just |
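The KEEPALIVE_SIZE_LIMIT scheme under discussion can be sketched in plain C, outside the interpreter. This is an illustrative model only, not CPython's actual unicodeobject.c code: freed string objects at or below a size limit are kept on a singly linked free list and handed back to the next allocation instead of calling malloc() again. The names `strobj`, `str_alloc`, `str_free` and the depth cap are invented for the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KEEPALIVE_SIZE_LIMIT 200   /* keep buffers up to this many chars */
#define MAX_FREELIST_DEPTH   80    /* bound the memory held by the cache */

typedef struct strobj {
    struct strobj *next;   /* free-list link, only meaningful while freed */
    size_t length;
    char data[];           /* characters stored inline after the header */
} strobj;

static strobj *freelist = NULL;
static int freelist_depth = 0;

strobj *str_alloc(const char *s, size_t len)
{
    strobj *obj;
    if (len <= KEEPALIVE_SIZE_LIMIT && freelist != NULL) {
        obj = freelist;              /* reuse a cached object: no malloc */
        freelist = obj->next;
        freelist_depth--;
    }
    else {
        /* small objects always get LIMIT+1 capacity so they stay reusable */
        size_t cap = (len <= KEEPALIVE_SIZE_LIMIT)
                     ? KEEPALIVE_SIZE_LIMIT + 1 : len + 1;
        obj = malloc(sizeof(strobj) + cap);
        if (obj == NULL)
            return NULL;
    }
    obj->length = len;
    memcpy(obj->data, s, len);
    obj->data[len] = '\0';
    return obj;
}

void str_free(strobj *obj)
{
    if (obj->length <= KEEPALIVE_SIZE_LIMIT &&
        freelist_depth < MAX_FREELIST_DEPTH) {
        obj->next = freelist;        /* cache for later reuse */
        freelist = obj;
        freelist_depth++;
    }
    else {
        free(obj);
    }
}
```

Raising the limit (as tried above with 200) widens the range of allocations that hit the fast path, at the cost of memory pinned in the cache.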
Your microbenchmark is biased towards your patched version. The Regarding memory usage: this is difficult to measure in Python, since Regarding resize: you're right - the string object is a PyVarObject as The reason for using an external buffer for the Unicode object was to be Like I already mentioned, PyObjects are also easier to extend at C level
How much speedup do you get when you compare the pybench test with
With KEEPALIVE_SIZE_LIMIT = 200, the pybench runtime is basically the You say that RAM size is cheaper than CPU power today, which is true but I understand the argument about possible optimizations with an external |
I don't really see the connection with bpo-1629305. An optimization that would be worth checking is hooking up the Another strategy could involve a priority queue style cache with the aim This could also be enhanced using an offline approach: you first run an Coming from a completely different angle, you could also use the Regarding memory constrained environments: these should simply switch |
All of those proposals are much heavier to implement; they also make the The reason that I mentioned bpo-1629305 was that it was such an |
Agreed, those optimizations do make the implementation more complicated. bpo-1629305 only provided speedups for the case where you write s += 'abc'. In your case, I think that closing the door for being able to easily |
I know it's not the place to discuss bpo-1629305, but the join() solution As for cStringIO, I actually observed in some experiments that it was |
FWIW, I tried using the freelist scheme introduced in my patch without |
Yes, definitely. Some comments on style in your first patch:
After some more tests I must qualify what I said. The freelist patch is With a small string: With a medium-sized string: With a long string: (the numbers are better than in my previous posts because the Also, given those results, it is also clear that neither pybench nor That said, the freelist patch is attached. |
Here is an updated patch against the current py3k branch, and with
Here is an updated patch, to comply with the introduction of the
Marc-Andre: With the updated patches, is this a set of patches we can accept?
Thanks for your interest, Sean :)
Antoine, as I've already mentioned in my other comments, I'm -1 on I also don't think that the micro-benchmarks you are applying really do I'm +1 on the free list changes, though, in the long run, I think that BTW: Unicode slices would be a possible and fairly attractive target for |
Marc-Andre: don't all your objections also apply to the 8-bit string
With Python 3.0, all strings are unicode. Shouldn't this type be
Yes, all those objections apply to the string type as well. The fact BTW: Please also see ticket bpo-2321 to see how the change affects your |
Hi, Marc-André, I'm all for "real-life" benchmarks if someone proposes some. You are talking about slicing optimizations but you forget that the As I said the freelist changes actually have mixed consequences, and in Why wouldn't you express your arguments in the python-3000 thread I |
Regarding benchmarks: It's difficult to come up with decent benchmarks Regarding the lazy slice patches: those were not using subclassing, they Regarding discussions on the py3k list: I'm not on that list, since I |
Well I'm not subscribed to the python-3k list either - too much traffic As for instrumenting the interpreter, this would tell us when and which As for the explicit slicing approach, "explicit string views" have been The reason I'm bringing in those previous discussions is that, in regards Antoine. |
I've read the comments from Guido and Martin, but they don't convince me As you say: it's difficult to get support for optimizations such a With the Unicode implementation and the subclassing support for builtin I'm also for making Python faster, but not if it limits future |
Well, I'm not gonna try to defend my patch eternally :) Since all the arguments have been laid down, I'll let other developers
Regarding the benchmark: You can instrument a 2.x version of the I also expect that patch bpo-2321 will have an effect on the performance |
You are right, bpo-2321 made the numbers a bit tighter: With a small string: With a medium-sized string: With a long string: stringbench3k: Regarding your benchmarking suggestion, this would certainly be an I'm going to post the updated patches. |
Thanks for running the tests again. The use of pymalloc for the buffer It is interesting to see that the free list patch only appears to Dev-Python/LICENSE: Dev-Python/Misc/HISTORY: Compare that to a typical Python module source file... Dev-Python/Lib/urllib.py: The distributions differ a lot, but they both show that typical strings Setting KEEPALIVE_SIZE_LIMIT to 32 should cover most of those cases
Antoine Pitrou wrote:
Has Guido pronounced on this already?
On Fri, Jun 5, 2009 at 4:06 AM, Marc-Andre Lemburg
I don't want it added to 3.1 unless we start the beta cycle afresh. I think it's fine to wait for 3.2. Maybe add something to the docs |
Guido van Rossum wrote:
We should have a wider discussion about this on python-dev. I'll publish the unicoderef extension and then we can see Antoine's patch makes such extensions impossible (provided you Note that in Python 2.x you don't have such issues because |
The new buffer API has a provision for type flags, although none of them |
In the interest of possibly improving the imminent 3.1 release, I wonder if it is possible to make it generically easier to subclass |
Terry J. Reedy wrote:
Thanks for opening that ticket.
Even if we were to add some pointer arithmetic tricks to at least The reason is simple: subclassing is about reusing existing method If you want to change the way the allocation works, you'd have Furthermore, using your subclasses objects with the existing APIs In summary: Implementations like the unicoderef type I posted The current implementation has no problem with working on referenced That's what I meant with closing the door on future enhancements |
Points against the subclassing argument:
Terry: PyVarObjects would be much easier to subclass if the type object stored an offset to the beginning of the variable section, so it could be automatically recalculated for subclasses based on the size of the struct. This'd mean the PyBytesObject struct would no longer end with a char ob_sval[1]. The down side is a tiny bit more math when accessing the variable section (as the offset is no longer constant). |
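Adam's suggestion can be sketched with plain C structs (no Python headers; all names here are invented for illustration, this is not actual CPython code): the type records the byte offset of its variable-length section, and a subclass that appends fixed fields simply passes sizeof() of its own struct as the new offset.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t ob_size;       /* number of items in the variable section */
    size_t var_offset;    /* byte offset of the variable section     */
} varobj;

typedef struct {
    varobj base;
    /* the char data that used to be a trailing ob_sval[1] member
       now starts at base.var_offset instead of a fixed offset */
} bytesobj;

typedef struct {
    varobj base;
    int extra_field;      /* a subclass adds its own fixed fields */
} bytes_subclass;

/* the "tiny bit more math": one load instead of a constant offset */
static char *var_section(varobj *op)
{
    return (char *)op + op->var_offset;
}

/* struct_size is sizeof(bytesobj) or sizeof(bytes_subclass) */
varobj *bytes_new(size_t struct_size, size_t len, const char *s)
{
    varobj *op = malloc(struct_size + len + 1);
    if (op == NULL)
        return NULL;
    op->ob_size = len;
    op->var_offset = struct_size;   /* recomputed per (sub)class */
    memcpy(var_section(op), s, len);
    var_section(op)[len] = '\0';
    return op;
}
```

The subclass's extra fields sit between the header and the character data, and all accessors keep working because they go through the stored offset.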
Adam Olsen wrote:
Base type Unicode buffers end with a null-Py_UNICODE termination,

There's no such thing as a null-termination invariant for Unicode.
Actually, Unicode objects were designed to be subclassable right See the prototype implementation of such a subclass uniref that I've BTW, I'm not aware of any changes to the PyUnicodeObject by some |
I find that the null termination for 8-bit strings makes low-level parsing operations (e.g., parsing a numeric string) safer and easier: for example, it makes skipping a series of digits with something like: while (isdigit(*s)) ++s; safe. I'd imagine that null terminated PyUNICODE arrays would have similar benefits. |
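The safety point above can be shown with a self-contained sketch (plain C, no Python API): with a guaranteed NUL terminator, a digit scanner needs no separate length check, because the terminator itself stops the loop ('\0' is not a digit).

```c
#include <ctype.h>

/* Advance past leading decimal digits in a NUL-terminated string.
 * The cast to unsigned char avoids undefined behavior when passing
 * possibly-negative char values to isdigit(). */
const char *skip_digits(const char *s)
{
    while (isdigit((unsigned char)*s))
        ++s;
    return s;
}
```

A caller can then inspect `*skip_digits(s)` to see what follows the numeric prefix, without ever having threaded a length through the parse.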
Not to mention faster. The new IO library makes use of it (for newline
On Sun, Jan 10, 2010 at 14:59, Marc-Andre Lemburg
Antoine Pitrou wrote:
I'd consider that a bug. Esp. the IO lib should be 8-bit clean Besides, using a for-loop with a counter is both safer and faster Just think of what can happen if you have buggy code that overwrites If you're lucky, you get a segfault. If not, you end up with The Python Unicode API deliberately tries to always use the combination |
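The counter-based style advocated here iterates over an explicit (pointer, length) pair instead of scanning for a terminator, so the loop never depends on a NUL byte being present or still intact. An illustrative sketch (not CPython code):

```c
#include <ctype.h>
#include <stddef.h>

/* Count leading digits in a buffer of known length.  The loop is
 * bounded by len, so it is safe even on a buffer whose terminator
 * has been overwritten by buggy code. */
size_t count_digits(const char *s, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (!isdigit((unsigned char)s[i]))
            break;
    }
    return i;
}
```

Whether this is also faster is disputed in the following messages: the bounded loop carries an extra index comparison, while the NUL-scan version carries an extra data-dependent load.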
It doesn't add any special meaning to them. It just relies on a NUL
It's slower, since it has one more condition to check.
Well, buggy code leads to bugs :)
Again, on Windows there are many usages of PyUnicode_AS_UNICODE() that pass the result to various Windows API functions, expecting a nul-terminated array of WCHARs. Please don't change this!
Amaury Forgeot d'Arc wrote:
The above usage is clearly wrong. PyUnicode_AS_UNICODE() should For such uses, the Unicode conversion APIs need to be used, Note that Python is free to change the meaning of Py_UNICODE |
Then there are many places to change, in core Python as well as in third-party code. And PyArg_ParseTuple("u") would not work any more.
Python-UCS4 has never worked on Windows. Most developers on Windows, following the example of core Python source code, implicitly assumed that HAVE_USABLE_WCHAR_T is true, and use the Py_Unicode* API the same way they use the PyString* functions. PyString_AsString is documented to return a nul-terminated array. If PyUnicode_AsUnicode starts to behave differently, people will have more trouble porting their modules to py3k.
It is, otherwise I would have documented it. The fact that some Note that PyUnicode_AsUnicode() only returns a pointer to the But no worries: We're not going to change it. It's too late Still, developers will have to be aware of the fact that 0-termination |
Le lundi 01 février 2010 à 19:21 +0000, Marc-Andre Lemburg a écrit :
Ok, so the current allocation scheme of unicode objects is an
If, as Antoine claimed, 'it' is a documented feature of str strings, and Py3 says str = Unicode, it is a plausible inference.
@antoine: do you wish to try and take this forward?
No reply to msg110599, I'll close this in a couple of weeks unless anyone objects.
2010/9/20 Mark Lawrence <report@bugs.python.org>:
Please don't. This is still a valid issue.
Updated patch against current py3k.
I just found that the extension zope.i18nmessageid (http://pypi.python.org/pypi/zope.i18nmessageid) subclasses unicode at the C level. Notably, the Message structure is defined this way:
typedef struct {
    PyUnicodeObject base;
    PyObject *domain;
    PyObject *default_;
    PyObject *mapping;
} Message;

How would such an extension type behave after the patch? Is there a workaround we can propose?
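Why this pattern is sensitive to the allocation change can be shown with simplified stand-in structs (illustrative types only, not the real PyUnicodeObject): with the pre-patch pointer-to-buffer design, the base header has a fixed size, so embedding it as the first member gives the subclass's extra fields a well-defined position right after it. If the character data were instead allocated inline after the header, the data and the subclass fields would compete for the same memory.

```c
#include <stddef.h>

/* Stand-in for the two-block PyUnicodeObject layout: the header is
 * fixed-size and the characters live in a separately allocated buffer
 * pointed to by 'str'. */
typedef struct {
    size_t length;
    void  *str;        /* character data in a separate buffer */
    long   hash;
} fake_unicode_fixed;

/* Stand-in for zope.i18nmessageid's Message: base embedded first,
 * extra PyObject* slots after it. */
typedef struct {
    fake_unicode_fixed base;
    void *domain;
    void *default_;
    void *mapping;
} message;
```

The compile-time layout guarantee below is exactly what an inline-buffer design takes away: with characters following the header, `sizeof(base)` no longer tells a subclass where its own fields may safely begin.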
PEP 393 is based on the idea proposed in this issue (use only one memory block, not two), but also enhances it to further reduce memory usage with other techniques:
PEP 393 has been accepted and merged into Python 3.3, so I consider this issue done.
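The single-allocation idea that PEP 393 adopted can be sketched as follows. This is a heavily simplified model (real PEP 393 strings additionally choose a 1-, 2-, or 4-byte representation per string and have several layout kinds; the sketch stores 1-byte data only and uses invented names):

```c
#include <stdlib.h>
#include <string.h>

/* Header and character data in ONE memory block: a single malloc()
 * instead of the old header-plus-separate-buffer pair. */
typedef struct {
    size_t length;
    long   hash;       /* cached hash, -1 until computed */
    char   data[];     /* characters follow the header directly */
} compact_str;

compact_str *compact_new(const char *s)
{
    size_t len = strlen(s);
    compact_str *op = malloc(sizeof(compact_str) + len + 1);  /* one block */
    if (op == NULL)
        return NULL;
    op->length = len;
    op->hash = -1;
    memcpy(op->data, s, len + 1);  /* copy including the NUL terminator */
    return op;
}
```

One allocation per string halves the allocator traffic and removes a pointer indirection on every character access, which is the core of what this issue proposed.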