
GH-105848: Get rid of KW_NAMES (again) #108496

Closed
wants to merge 6 commits

Conversation

brandtbucher
Member

@brandtbucher commented Aug 25, 2023

This is an updated version of #105849.


📚 Documentation preview 📚: https://cpython-previews--108496.org.readthedocs.build/
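
For context, the KW_NAMES side channel being removed here is easy to see with `dis` on a CPython that still has it (3.11/3.12). This snippet is illustrative only; the exact opcodes vary by version:

```python
# Illustrative only: on CPython 3.11/3.12, disassembling a keyword call shows
# a KW_NAMES instruction that names the keyword arguments for the following
# CALL.  This PR removes that side-channel instruction.
import dis

dis.dis("f(a, b=c)")
```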

Member

@gvanrossum left a comment


Looks great!

@brandtbucher
Member Author

I might have messed something up. Benchmarking came back 2% slower with wonky stats.

@markshannon
Member

I see a few ways to make this faster.

  1. Leave the code much as it is now, but store kwnames on the threadstate instead of in a C local variable. We would need to take extra care to ensure that no Python code can run between KW_NAMES and the start of the following call, and to move kwnames from the threadstate to a local at the start of every CALL instruction. (A toy model of this is sketched after the list.)
  2. Use a borrowed reference to kwnames. We would need to make sure that error handling doesn't decref the kwnames. (If we get deferred references, we could make kwnames a deferred reference, and it wouldn't need any special casing for error handling.)
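
A toy, pure-Python model of option 1 (this is not CPython code; the `ThreadState.kwnames` slot and the helper names below are invented purely for illustration):

```python
# Toy model of option 1: KW_NAMES stashes the keyword-name tuple on the
# thread state instead of a C local, and every CALL consumes it immediately.
# Nothing here mirrors CPython's real structs or APIs.

class ThreadState:
    def __init__(self):
        self.kwnames = None  # must never survive past the next CALL


def op_kw_names(tstate, names):
    # "KW_NAMES": record, out of band, which trailing arguments are keywords.
    tstate.kwnames = names


def op_call(tstate, func, args):
    # "CALL": move kwnames into a local *first*, so nothing that runs during
    # the call can observe or clobber the thread-state slot.
    kwnames, tstate.kwnames = tstate.kwnames, None
    if kwnames:
        n_kw = len(kwnames)
        positional, kw_values = args[:-n_kw], args[-n_kw:]
        return func(*positional, **dict(zip(kwnames, kw_values)))
    return func(*args)


tstate = ThreadState()
op_kw_names(tstate, ("sep",))
op_call(tstate, print, ("a", "b", "-"))  # same as print("a", "b", sep="-")
```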

Option 2 sounds a lot simpler, if a bit slower.

Unless we go with option 1, we are going to see a lot more PUSH_NULL instructions. It might be worth adding LOAD_FAST_PUSH_NULL and LOAD_CONST_PUSH_NULL superinstructions.

My guess is that with option 2 and the superinstructions, there won't be much of a slowdown.

@markshannon
Member

markshannon commented Aug 27, 2023

(And please add a link to the stats/performance numbers)

@brettcannon removed their request for review August 28, 2023 19:22
@brandtbucher
Member Author

brandtbucher commented Aug 28, 2023

(The stats issue I observed was in the comparison vs main, and is unrelated.)

So, here's the current situation. This branch adds over 7 billion new PUSH_NULL instructions to the benchmark run, which is a significant (~5%) increase in the number of instructions executed. This makes the benchmarks 2% slower (stats).

I tried the "easy" route of adding a new LOAD_FAST_PUSH_NULL superinstruction, since the stats suggest that this is by far the most common PUSH_NULL pairing. However, that made things an additional 2% slower (for a total of 4% slower than main), which is surprising (stats). Note that a handful of the slowest benchmarks on this branch got way faster on the superinstruction branch.

So clearly something needs to be done, but superinstructions aren't going to help. I'll try some of the other suggested ideas tomorrow, but in general this is getting to be a pretty annoying problem.
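
(As a rough, small-scale analogue of the pairing stats referred to above, one can count which opcode precedes PUSH_NULL in ordinary code with `dis`. The real numbers come from the pyperformance stats builds, and on 3.11+ LOAD_GLOBAL can fold the NULL push into itself, so counts vary by version:)

```python
# Count which opcode immediately precedes PUSH_NULL across the functions of
# one stdlib module.  Small-scale illustration only; the numbers discussed in
# this thread come from the full pyperformance stats runs.
import collections
import dis
import json  # arbitrary sample module

pairs = collections.Counter()
for obj in vars(json).values():
    code = getattr(obj, "__code__", None)
    if code is None:
        continue
    prev = None
    for instr in dis.get_instructions(code):
        if instr.opname == "PUSH_NULL" and prev is not None:
            pairs[prev] += 1
        prev = instr.opname

print(pairs.most_common(5))
```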

@gvanrossum
Member

That's disappointing; it would seem that moving kwnames to the tstate makes the most sense to try next?

@brandtbucher
Member Author

brandtbucher commented Aug 30, 2023

Moving keyword names to the frame is a little less than 1% slower vs main. I was getting crashes with moving it to the tstate (which is trickier), but I think I resolved those and I'm benchmarking that approach now.

@brandtbucher
Member Author

brandtbucher commented Aug 30, 2023

Sticking it on the tstate is a little more than 1% slower vs main.

@gvanrossum
Member

Off-line we decided to at least look at splitting CALL into two separate opcode families, CALL and CALL_KW. Looking at the specializations, most of them seem to either insist on no keywords (the majority) or insist on keywords; only a few need to handle either case. So we would not be adding many new opcodes, and the split makes specialization more efficient. Using uops, we could avoid having more code. Of course, we (Brandt :-) still have to code this and benchmark it before we can be sure of those predictions -- this has been a surprising adventure so far.
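
For reference, this is the direction that eventually shipped (CALL_KW exists in CPython 3.13); a quick, illustrative way to see both families on a 3.13+ interpreter:

```python
# On CPython 3.13+, a call without keywords compiles to the CALL family,
# while a call with keywords compiles to CALL_KW; on 3.12 the second example
# still shows KW_NAMES followed by CALL.
import dis

dis.dis("f(a, b)")    # no keywords: CALL
dis.dis("f(a, b=c)")  # keywords: CALL_KW on 3.13+
```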

@brandtbucher
Member Author

Closing in favor of the CALL/CALL_KW split (PR up soon).

Labels: awaiting merge, interpreter-core (Objects, Python, Grammar, and Parser dirs)