with should be as fast as try/finally #46432
Comments
Currently, 'with' costs about 0.2 usec over try/finally:

$ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'lock.acquire()' 'try: pass' 'finally: lock.release()'
1000000 loops, best of 3: 0.617 usec per loop
$ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'with lock: pass'
1000000 loops, best of 3: 0.774 usec per loop

Since it's doing the same thing (and calling the same C functions to do it), it should be possible to close that gap.

[The disassembly comparing the two forms was garbled in migration; the surviving fragments show both versions executing LOAD_GLOBAL 0 (lock), SETUP_FINALLY, and POP_BLOCK.]

The major difference I see is the extra local variable (_[1]) used by the with statement.

I've added everyone on the previous bug to the nosy list. Sorry if you …
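A sketch of how the bytecode comparison above can be regenerated with the `dis` module. This is my own harness, not from the issue: `threading.Lock()` stands in for the Python 2 `thread.allocate_lock()`, and the exact opcodes vary by CPython version, but the extra work done by the with statement is still visible side by side.

```python
import io
import dis
import threading

# Modern stand-in for the Python 2 thread.allocate_lock() used above.
lock = threading.Lock()

def with_version():
    with lock:
        pass

def try_finally_version():
    lock.acquire()
    try:
        pass
    finally:
        lock.release()

# Capture both disassemblies so the two forms can be compared line by line.
buf = io.StringIO()
dis.dis(with_version, file=buf)
dis.dis(try_finally_version, file=buf)
print(buf.getvalue())
```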
A closer approximation of what the with statement is doing would be:

exit = lock.release()
lock.acquire()
try:
    pass
finally:
    exit()

The problem with trying to store the result of the retrieval of __exit__ …

However, changing WITH_CLEANUP to take an argument indicating which …
Scratch the parentheses on that first line of sample code in my previous comment.
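With that correction applied, a runnable version of the approximation looks like the sketch below (my own transcription, with `threading.Lock()` replacing the Python 2 `thread.allocate_lock()`):

```python
import threading

# Stand-in for the original thread.allocate_lock().
lock = threading.Lock()

exit = lock.release  # no parentheses: store the bound method, don't call it
lock.acquire()
try:
    pass             # body of the with block
finally:
    exit()           # call the saved method on the way out

assert not lock.locked()
```

Storing the bound method up front mirrors how the with statement looks up __exit__ once, before the block runs, rather than at cleanup time.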
Here's a proof-of-concept patch that keeps the __exit__ method on the stack instead of in a temporary variable.

The patch changes the compilation of:

def with_(l):
    with l:
        pass

[The before/after disassembly was garbled in migration; the surviving fragments show LOAD_FAST 0 (l) followed by POP_BLOCK in both versions, with the patched bytecode two bytes shorter.]

And speeds it up from:

$ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'with lock: pass'
1000000 loops, best of 3: 0.832 usec per loop

to:

$ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'with lock: pass'
1000000 loops, best of 3: 0.762 usec per loop

That's only half of the way to parity with try/finally:

$ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'lock.acquire()' 'try: pass' 'finally: lock.release()'
1000000 loops, best of 3: 0.638 usec per loop

What's strange is that calling __enter__ and __exit__ in a try/finally is nearly as slow as the with statement:

$ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'lock.__enter__()' 'try: pass' 'finally: lock.__exit__()'
1000000 loops, best of 3: 0.754 usec per loop

Any ideas?
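The three command-line runs above can be reproduced from inside Python with the `timeit` module. This harness is my own (not part of the patch), uses `threading.Lock()` in place of the Python 2 `thread` module, and its absolute numbers will differ from those above; only the relative gap between the two forms matters.

```python
import timeit

setup = "import threading; lock = threading.Lock()"
n = 100_000  # fewer iterations than the command-line runs, for speed

t_with = timeit.timeit("with lock: pass", setup=setup, number=n)
# Multi-line statement: equivalent to passing the three quoted
# arguments on the timeit command line, which joins them with newlines.
t_tryfin = timeit.timeit(
    "lock.acquire()\ntry: pass\nfinally: lock.release()",
    setup=setup,
    number=n,
)

print(f"with:        {t_with / n * 1e6:.3f} usec per loop")
print(f"try/finally: {t_tryfin / n * 1e6:.3f} usec per loop")
```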
Now with documentation, a working test_compile, and one less refleak.
Looking carefully at the code, there are two reasons for this: …

Phew.
Patch applied cleanly for me and all tests pass. It also looked good on a visual scan of the diff.
I went ahead and committed the change to the bytecode generation as r61290. The deficiencies in the lock implementation should probably be raised as a separate issue.
Hm, my tests do not see any speedup with this patch. Maybe the optimization is only useful with gcc? |
Thanks Nick and Amaury!

Amaury, what times are you seeing? It could be a gcc-only speedup, but I …

Here are my current timings. To avoid the lock issues, I wrote a simple_cm module containing:

class CM(object):
    def __enter__(self):
        pass
    def __exit__(self, *args):
        pass

$ ./python.exe -m timeit -s 'import simple_cm; cm = simple_cm.CM()' 'with cm: pass'
1000000 loops, best of 3: 0.885 usec per loop
$ ./python.exe -m timeit -s 'import simple_cm; cm = simple_cm.CM()' 'cm.__enter__()' 'try: pass' 'finally: cm.__exit__()'
1000000 loops, best of 3: 0.858 usec per loop

If __exit__ doesn't take *args (making it not a context manager), the …

I think in theory, with could be slightly faster than finally with the …
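The simple_cm experiment can be inlined into a single file by using `timeit`'s `globals` parameter instead of a separate importable module. This is my own reworking of the setup above; the empty __enter__/__exit__ isolate the statement overhead from any lock behaviour.

```python
import timeit

class CM(object):
    """Context manager that does nothing, so only statement overhead is timed."""
    def __enter__(self):
        pass
    def __exit__(self, *args):
        pass

cm = CM()
n = 100_000

# globals=globals() makes cm visible to the timed statements without
# needing a separate simple_cm module on the import path.
t_with = timeit.timeit("with cm: pass", number=n, globals=globals())
t_manual = timeit.timeit(
    "cm.__enter__()\ntry: pass\nfinally: cm.__exit__()",
    number=n,
    globals=globals(),
)

print(f"with cm:            {t_with / n * 1e6:.3f} usec per loop")
print(f"manual try/finally: {t_manual / n * 1e6:.3f} usec per loop")
```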
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.