New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add jump table for certain safe match-case statements #88449
Comments
Anecdotally, a few people I've talked to have expected that match-case statements would improve performance in the same way that switch-cases sometimes do in C-like languages. While this is impossible in general without changing semantics (since, for example, __eq__ can have side effects), it would be nice if a jump table was implemented in certain safe cases. My idea was to implement this whenever the list of cases begins with 2 or more of the following in a row:
Example:
This would compile to something like
Where the new opcode would be would be something like case TARGET(ATTEMPT_SAFE_MATCH): {
PyObject *jump_dict = POP();
PyObject *x = TOP();
if (PyLong_CheckExact(x) ||
PyUnicode_CheckExact(x) ||
Py_Is(x, Py_None))
{
PyObject *boxed = PyDict_GetItemWithError(jump_dict, x);
Py_DECREF(jump_dict);
if (res == NULL) {
if (PyErr_Occurred()) {
goto error;
}
else {
JUMPBY(oparg);
}
}
assert(PyLong_CheckExact(boxed));
int target = (int)PyLong_AsLong(boxed);
JUMPBY(target);
}
else {
// fall back to general matching code
Py_DECREF(jump_dict);
DISPATCH();
}
} I can work on an implementation in the next couple of weeks. |
Big +1 from me. This would make use of match-case more compelling. |
Seems like a good idea as long as we're careful about the implementation. I've just jotted down a few notes here:
|
At first I thought of matches with only the int/str/None cases and then I saw the easy generalization, but yeah, each contiguous block seems better. One solution to the code hashability/lookup speed apparent dilemma: writing the equivalent of this in C: class HashableDict(dict):
def __new__(cls, pairs):
return dict.__new__(cls, pairs)
def __eq__(self, other):
return tuple(self.mapping.items()) == tuple(other.sequence.items())
def __hash__(self):
return CONSTANT ^ hash(tuple(self.items()))
def __getnewargs__(self):
return tuple(self.items())
# forbid all mutating methods
__setitem__ = __delitem__ = update = __ior__ = None # etc. (Or maybe using composition rather than inheritence) Maybe using the HAMT in /Python/hamt.c would suffice instead. |
Hm. I’m not sure if the HAMT makes much sense. We don’t really care about efficient mutation or copies... constant-time lookups seem to be overwhelmingly the most valuable factor here. I personally think that creating some sort of hashable dict and teaching marshal how to serialize/deserialize it could be a promising path forward. I imagine its actual design could mirror dict (sort of like set/frozenset). That might be a rather drastic step though. Perhaps it’s better to first prove that building a mapping at runtime is too slow. |
I agree that we should benchmark for what the value of "two" should be ;) . Maybe only match statements with >=20 consecutive simple cases end up being worth it. I would assume that if "case 1, case 2, ..., case 20" are all in a row then "case 1" wouldn't be *that* much more likely than "case 20", so a slowdown on case 1 wouldn't matter too much if the average gets better. I'm under the impression that match-case statements would frequently be used only once per function call. If I understand your proposal, that would mean that calling a function with a N-case constant-pattern match would require N hashes and N insertions into a dict (including memory allocation), followed by O(1) lookup. Wouldn't that eliminate any hope of making such a function O(1)? Unless there's a way to cache the populated dict somewhere else? I suggested HAMT because it's a well-tested pre-existing fast immutable mapping that already implements __hash__, so it would be safe to store in co_consts already populated. Re-implementing all of the dict logic seems like an unnecessary pain, which is why I was suggesting either the HAMT or a thin wrapper around dict, not a re-implementation of a new hash table. I was hoping to do the minimum amount of disruption possible to get reasonable O(1) performance, and then seeing where that leaves us. |
Yeah, that was the idea sketched out above. Basically, if we go with a vanilla dictionary here, we're going to need to build it at runtime. It only really makes sense to do that once, the first time we need it. Then we stash it away... uh... somewhere. As you note, it doesn't make much sense to spend linear time building a constant-time jump table if it's only going to be used once. :) Maybe somebody familiar with the constantly-evolving opcaches could chime in if this is the sort of thing that could benefit from that infrastructure. Basically, we would need a separate cache per bytecode offset, per code object. My understanding is that we don't have anything like that yet. (A quick-and-dirty prototype could probably store them at "hidden" local names like ".table0", ".table1", ".table2", etc. I know comprehensions do something like that.)
Well, I don't think we'd need to do any of that. I believe set and frozenset share the same core design and routines, but differ only in the interfaces provided by the objects themselves. I imagine we could do something similar with a hypothetical _PyFrozenDict_Type... copy most of the type definition, dropping all of the methods except mp_subcript, tp_hash, and a few other members. That would probably be all we needed to get this design up and running for a proof-of-concept. A lot of work goes into making dicts fast (especially for things like strings), so it would be nice for a new type to be able to benefit from those incremental improvements.
Agreed, but the HAMT mapping has logarithmic time complexity for lookups, right? Maybe for realistic cases the coefficients make it basically equivalent, but on the surface it seems more promising to try reusing the dict guts instead. |
FWIW PEP-603 includes a graph (Figure 2) of dict.get versus hamt.get performance as the mappings grow, and it seems the hamt is roughly 1.4x slower on inputs of sizes between 10 and 1000. |
For anyone curious, I had some free time today and took a stab at creating a minimal _frozendict type (sharing as much implementation with dict as possible) to see how difficult it would be. It was actually much simpler than I thought... just a few dozen lines of code, including marshal support. If you'd like to see it, you can check out my "frozendict" branch here: main...brandtbucher:frozendict For testing purposes, I've exposed in the _testcapi module. It has the same constructor signature as dict, and basically only supports lookups: >>> from _testcapi import _frozendict
>>> fd = _frozendict({1: 2, "3": 4, None: 5})
>>> fd
<_frozendict object at 0x7f4e127e4ef0>
>>> fd[1]
2
>>> fd[3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 3
>>> import marshal
>>> marshal.loads(marshal.dumps(fd)) == fd
True I'm not gonna lie... I really like it. It sidesteps any runtime construction/caching issues that using a normal dict would have, but with all of the same performance characteristics. It also seems like it would not introduce much of a maintenance burden, since it has very little of its own code. Anyways, food for thought. It was a fun experiment at the very least. |
Very interesting -- that is shorter than I thought also! If we really wanted to optimize the tar out of this, we could probably find a way to re-use just the PyDictKeysObject to find the index into a C-array of integer jump targets and not have to worry about unboxing. |
I'm hoping we can get something close that for free by just waiting... the faster-cpython folks are working on a tagged-pointer implementation of integers as we speak. I've been looking for a new project, so I'd love to help work on this issue (if you're open to it). The pattern compiler is a bit of a beast, and I like to think I have a better than-average-understanding of it. ;) |
Hi Brandt, I would welcome your collaboration/mentorship in whatever capacity makes sense. I sent an email to brandt@python.org from sweeney.dennis650@gmail.com. |
Are you sure you want to do this? This optimisation is not applicable if the matched values are given symbolic names. You would be encouraging people to write bad code with lots of literals, for speed. |
No, not yet. But there has been some positive feedback here, and it seems like an interesting project to prototype. :)
I had the same concern initially, but I'm honestly not losing too much sleep over it. Literals already get lots of optimizations that symbolic constants don't, and to me it seems a bit silly to avoid optimizing literals here when it is quite straightforward (and powerful) to do so. There are also many cases where using symbolic constants for certain literals doesn't make much sense. I'd be okay not loudly publicizing this change if it lands. After all, I'm the guy who spent the better part of a year trying to convince people that this is Not A Switch Statement. :) (Although, on the flip side, maybe it will help appease the people who have been asking for one all these years.) |
Here are some benchmarks: PS C:\Users\sween\Source\Repos\cpython2\cpython> .\python.bat -m pyperf compare_to .\main.json .\PR-26697.json |
How common match-case statements with literal integers? I expect that in most cases such magic numbers are not used directly, but as named constants. |
Nothing is common yet because the feature hasn't been released ;-) I suspect that if match-case were optimized for integer constants, they would be used. In discussions about case statements, there are two groups, one looking for a prettier if-elif-else chain and the other looking for a fast vectored jump. This optimization is aimed at the latter group -- they would be happy to have this opportunity to make faster code, even if it means using integer constants. |
This is going in the wrong direction. Rather than add more complex instructions for use only by pattern matching, we need to simplify the individual operations and re-use existing instructions. If you are serious about improving the performance of pattern matching, you need to do the following in the compiler:
E.g. match x: The initial decision tree looks like: if is_sequence(x):
if len(x) == 1:
if x[0] == 1:
A
if is_sequence(x):
if len(x) == 1:
if x[0] == 2:
B which can be improved to: if is_sequence(x):
if len(x) == 1:
if x[0] == 1:
A
elif x[0] == 2:
B For a sequence of integer constants, introducing the test |
Sorry, I’ve been out on vacation this weekend. I didn’t realize that there was already a PR for this… I’m honestly not sure that it’s totally ready yet. While I absolutely agree that compiling efficient decision trees lies in our future, it certainly seems to me that using *both* strategies (decision trees and jump tables) would create the most efficient branching structure. In effect, our control flow would be a rose tree. In your example, Mark, we could perform the checks for the integers 1 and 2 simultaneously, right? Or consider a slightly more complicated example (simplified from PEP-636): match …: We could compile this to something like:
…which looks ideal to me. I’m also confident that the jumping ops could also be broken up into fairly primitive instructions. A multi-way branch would just look something like: (subject on TOS) …where only JUMP_FORWARD_TOS is new.
Sorry, I’m having a bit of difficulty following this part. Is it something like the strategy I just described? |
Also (because some of the people who might be interested are nosied on this issue), I’ve been working a bit on general performance benchmarks for our pattern-matching implementation: https://github.com/brandtbucher/patmaperformance I still need something that hits sequence patterns hard, but I think these will help give us a good baseline for measuring future perf work in this area. (There’s also the perf script at the bottom of Lib/test/test_patma.py, but the test cases it runs on probably are much less representative of “real” code.) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: