Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-97933: inline list/dict/set comprehensions #101441

Merged
merged 56 commits into from
May 9, 2023
Merged

Conversation

carljm
Copy link
Member

@carljm carljm commented Jan 30, 2023

Closes #97933.

In builds with --enable-optimizations:

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]' --compare-to ../mainopt/python
/home/carljm/build/mainopt/python: ..................... 200 ns +- 3 ns
/home/carljm/build/ic2opt/python: ..................... 120 ns +- 2 ns

Mean +- std dev: [/home/carljm/build/mainopt/python] 200 ns +- 3 ns -> [/home/carljm/build/ic2opt/python] 120 ns +- 2 ns: 1.67x faster

Credit to @vladima for authoring the original version of this code in Cinder 3.8. The compiler approach is different in this version (so as to support inlining all comprehensions, regardless of name clashes); some of the symbol table changes remain the same.

A couple implementation notes:

  • We need a new opcode LOAD_FAST_AND_CLEAR because we need to push/pop the value of possibly-undefined variables on the stack while clearing them. To do this with existing opcodes would require eliminating the invariant/assert that LOAD_FAST never loads NULL, and would add inefficient refcount operations and interpreter loop cycles.
  • If a comprehension iteration variable is a cell inside the comprehension (i.e. the comprehension builds closures) we end up including that variable in both co_varnames and co_cellvars for the inlined-into outer function and using the non-offset (varnames) storage location for it. This is how function arguments that are also cells are currently handled, so we just piggyback on that handling. This makes the needed push/pop of the outer cell simpler as we can just use LOAD_FAST_AND_CLEAR/STORE_FAST as-is for it, and it will push/pop the cell itself. Otherwise we would need some new handling in the compiler for allowing push/pop of a cell in offset storage.

This PR also adds a lot of new tests for comprehension scoping edge cases, and runs all new and existing scoping
tests in module, function, and class scopes (without requiring duplication of the test code.) All comprehension
scoping tests added in this diff also pass with comprehension inlining disabled (ensuring semantics did not change.)

@carljm
Copy link
Member Author

carljm commented Jan 31, 2023

@markshannon @iritkatriel would be grateful for a pyperformance run on this one (presuming signal all looks good.)

@carljm carljm changed the title gh-97933: inline sync list/dict/set comprehensions gh-97933: inline list/dict/set comprehensions in functions Jan 31, 2023
@carljm carljm added the performance Performance or resource usage label Jan 31, 2023
@markshannon
Copy link
Member

I'll run the benchmarks now

@markshannon
Copy link
Member

Benchmarking is delayed a bit, but should be back tomorrow at the latest.

@markshannon
Copy link
Member

markshannon commented Jan 31, 2023

You don't seem to be clearing the inner variable before use, so that

def f():
    l = [ None ]
    [ (l[0], l) in [[1,2]] ]
    print(l)

will, I think, print [1] instead of raising an UnboundLocalError

You can actually make the code more efficient by clearing the local when stashing it on the stack:

inst(LOAD_FAST_AND_CLEAR, ( -- val)) {
    val = LOCALS[oparg];
    LOCALS[oparg] = NULL;
}

as there is no need for a NULL check or INCREF.

Python/compile.c Outdated Show resolved Hide resolved
@markshannon
Copy link
Member

markshannon commented Jan 31, 2023

A couple of obscure issues, but this looks like a very nice improvement.

I think the gain in performance is well worth the cost in code size, given how common comprehensions are.

* main:
  pythongh-101440: fix json snippet error in logging-cookbook.rst (python#101439)
  pythongh-99276 - Updated Doc/faq/general.rst (python#101396)
  Add JOBS parameter to docs Makefile (python#101395)
  pythongh-98831: rewrite GET_LEN, GET_ITER, BEFORE_WITH and a few simple opcodes in the instruction definition DSL (python#101443)
  pythongh-77607: Improve accuracy of os.path.join docs (python#101406)
  Fixes typo in asyncio.TaskGroup context manager code example (python#101449)
  pythongh-98831: Clean up and add cache size static_assert to macro (python#101442)
  pythongh-99955: use SUCCESS/ERROR return values in optimizer and assembler. Use RETURN_IF_ERROR where appropriate. Fix a couple of bugs. (python#101412)
@carljm
Copy link
Member Author

carljm commented Jan 31, 2023

You don't seem to be clearing the inner variable before use

Your example code doesn't involve a comprehension, I'm assuming you meant something like

def f():
    l = [ None ]
    [ (l[0], l) for l in [[1,2]] ]
    print(l)

But this doesn't raise UnboundLocalError on main either, it works fine; the iteration variable l is always bound within the comprehension before the expression is evaluated.

So far I haven't been able to construct a test case where failing to clear an incoming variable bound inside the comprehension has any visible effect, because AFAICT any locals in a comprehension are always (necessarily by the syntactic limitations of how a comprehension is built) defined before use. Let me know if you can think of a test case I'm missing!

You can actually make the code more efficient by clearing the local when stashing it on the stack

But I think this is sufficient reason regardless to justify switching from LOAD_FAST_OR_NULL to LOAD_FAST_AND_CLEAR, so I'll do that!

@carljm
Copy link
Member Author

carljm commented Jan 31, 2023

Actually I don't think LOAD_FAST_AND_CLEAR will work, because of cases like this:

def f(x):
    return [x for x in x]

We need to save/restore the outer value of x, but we cannot clear it on entry because we are also relying on being able to load it when we evaluate the iterable part of the comprehension (which in the non-inlined case would have been evaluated outside the function and passed in as the .0 parameter.) So I think we need to stick with a non-clearing LOAD_FAST_OR_NULL on entry.

If there were a correctness reason for needing to clear in other cases, I could add an additional DELETE_FAST after the LOAD_FAST_OR_NULL, but so far I haven't found any such correctness need, and clearing this way is not an efficiency improvement.

EDIT: there is also the option of doing clearing pushes and also moving evaluation of the outermost iterable to before the pushes; this would require some non-elidable SWAPs to preserve it as TOS through the pushes, though...

@carljm
Copy link
Member Author

carljm commented Jan 31, 2023

@markshannon I think LOAD_FAST_OR_NULL could also become a pseudo-instruction that resolves to LOAD_FAST (which would get around having it turned into LOAD_FAST_CHECK), but we'd have to remove the assert(value != NULL) in the LOAD_FAST implementation. Do you prefer keeping that assertion at the cost of another (non-pseudo) opcode?

@carljm
Copy link
Member Author

carljm commented Jan 31, 2023

I've addressed the review comments. I have some TODOs for potential optimizations that could be done here or in a follow-up diff:

  • Eliminate some unnecessary push/pops in cases where the value is never loaded after the comprehension. This would fall into the general category of dead store elimination. I'm not sure if we can actually do this or if it violates some guarantees about introspectable frame state in tracing?
  • Better handling of SWAPs on the post-comprehension pops. apply_static_swaps can't help us here if the comprehension value is immediately returned (because we can't "swap" a RETURN_VALUE -- this case would go away if we can do the dead store elimination, since if it's right before a RETURN_VALUE it's clearly a dead store) or if it's immediately popped (because the POP_TOP will have a lineno of -1 and apply_static_swaps will thus refuse to swap it). Instead of having one SWAP 2 per pop, we could have a single SWAP N and do the last pop first.
  • Teach add_checks_for_loads_of_uninitialized_variables to understand and trace through the push/pop pairs.

* main:
  pythonGH-100288: Skip extra work when failing to specialize LOAD_ATTR (pythonGH-101354)
  pythongh-101409: Improve generated clinic code for self type checks (python#101411)
  pythongh-98831: rewrite BEFORE_ASYNC_WITH and END_ASYNC_FOR in the instruction definition DSL (python#101458)
  pythongh-101469: Optimise get_io_state() by using _PyModule_GetState() (pythonGH-101470)
@markshannon
Copy link
Member

Sorry for posting a gibberish example.
Here's what I should have posted:

def f():
    l = [ None ]
    [ 1 for (l[0], l) in [[1,2]] ]
    print(l)
f()

3.11

$ python3.11 ~/test/test.py 
Traceback (most recent call last):
   ...
UnboundLocalError: cannot access local variable 'l' where it is not associated with a value

This PR

$ ./python ~/test/test.py 
[1]

Python/compile.c Outdated Show resolved Hide resolved
Python/compile.c Show resolved Hide resolved
Python/compile.c Show resolved Hide resolved
Python/compile.c Outdated Show resolved Hide resolved
Python/symtable.c Outdated Show resolved Hide resolved
Python/symtable.c Show resolved Hide resolved
Python/symtable.c Show resolved Hide resolved
Copy link
Member Author

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review!

Python/compile.c Show resolved Hide resolved
Python/compile.c Show resolved Hide resolved
carljm and others added 3 commits May 8, 2023 23:19
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
* main: (47 commits)
  pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)
  pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)
  pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)
  pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)
  pythonGH-104284: Fix documentation gettext build (python#104296)
  pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)
  pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)
  pythongh-99108: fix typo in Modules/Setup (python#104293)
  pythonGH-104145: Use fully-qualified cross reference types for the bisect module (python#104172)
  pythongh-103193: Improve `getattr_static` test coverage (python#104286)
  Trim trailing whitespace and test on CI (python#104275)
  pythongh-102500: Remove mention of bytes shorthand (python#104281)
  pythongh-97696: Improve and fix documentation for asyncio eager tasks (python#104256)
  pythongh-99108: Replace SHA3 implementation HACL* version (python#103597)
  pythongh-104273: Remove redundant len() calls in argparse function (python#104274)
  pythongh-64660: Don't hardcode Argument Clinic return converter result variable name (python#104200)
  pythongh-104265 Disallow instantiation of `_csv.Reader` and `_csv.Writer` (python#104266)
  pythonGH-102613: Improve performance of `pathlib.Path.rglob()` (pythonGH-104244)
  pythongh-103650: Fix perf maps address format (python#103651)
  pythonGH-89812: Churn `pathlib.Path` methods (pythonGH-104243)
  ...
@carljm carljm added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label May 9, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @carljm for commit a8425a6 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label May 9, 2023
Copy link
Contributor

@erlend-aasland erlend-aasland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see Jelle and Irit did the heavy review, so this is your PEP-7 bot chiming in for some style suggestions 🤖

Python/compile.c Outdated Show resolved Hide resolved
Python/compile.c Outdated Show resolved Hide resolved
Python/flowgraph.c Outdated Show resolved Hide resolved
Python/symtable.c Outdated Show resolved Hide resolved
Python/symtable.c Outdated Show resolved Hide resolved
Python/symtable.c Outdated Show resolved Hide resolved
Python/symtable.c Outdated Show resolved Hide resolved
carljm and others added 3 commits May 9, 2023 07:08
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
* main:
  pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277)
  pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298)
  pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320)
  require-pr-label.yml: Add missing "permissions:" (python#104309)
  pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939)
  pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181)
@carljm carljm added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 9, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @carljm for commit 95401fe 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 9, 2023
@carljm
Copy link
Member Author

carljm commented May 9, 2023

The s390x Fedora LTO + PGO PR and s390x Fedora LTO PR buildbots look generally unhealthy. The latter has a full disk.

AMD64 Debian root PR is weird. The failure looks a bit flaky (segfault in test_threads_join) and unlikely to be related, but that failure isn't showing up on any of its other builds. Requested a re-build to see if it repros.

@carljm
Copy link
Member Author

carljm commented May 9, 2023

The buildbot worker hosting the various s390x Fedora ... builds appears to be out of disk space and unhealthy. So I'm satisfied that there aren't any buildbot issues caused by this PR.

Merging!

@carljm carljm merged commit c3b595e into python:main May 9, 2023
17 checks passed
@carljm carljm deleted the inlinecomp2 branch May 9, 2023 17:02
carljm added a commit to carljm/cpython that referenced this pull request May 9, 2023
* main:
  pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304)
  pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441)
  pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096)
  pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217)
  pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312)
  pythongh-104240: return code unit metadata from codegen (python#104300)
carljm added a commit to carljm/cpython that referenced this pull request May 9, 2023
* main: (156 commits)
  pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304)
  pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441)
  pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096)
  pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217)
  pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312)
  pythongh-104240: return code unit metadata from codegen (python#104300)
  pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277)
  pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298)
  pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320)
  require-pr-label.yml: Add missing "permissions:" (python#104309)
  pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939)
  pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181)
  pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)
  pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)
  pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)
  pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)
  pythonGH-104284: Fix documentation gettext build (python#104296)
  pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)
  pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)
  pythongh-99108: fix typo in Modules/Setup (python#104293)
  ...
carljm added a commit to carljm/cpython that referenced this pull request May 9, 2023
* main: (35 commits)
  pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304)
  pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441)
  pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096)
  pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217)
  pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312)
  pythongh-104240: return code unit metadata from codegen (python#104300)
  pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277)
  pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298)
  pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320)
  require-pr-label.yml: Add missing "permissions:" (python#104309)
  pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939)
  pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181)
  pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)
  pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)
  pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)
  pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)
  pythonGH-104284: Fix documentation gettext build (python#104296)
  pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)
  pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)
  pythongh-99108: fix typo in Modules/Setup (python#104293)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inline dict/list/set comprehensions in the compiler for better performance
9 participants