Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-105848: Get rid of KW_NAMES #105849

Closed
wants to merge 13 commits into from

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Jun 16, 2023

Instead, optionally push tuple of keyword names before every CALL (using the low bit of the oparg as a flag to indicate their presence).


馃摎 Documentation preview 馃摎: https://cpython-previews--105849.org.readthedocs.build/

Copy link
Member

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. Definitely happy to see KW_NAMES and its side-channel local variable go away.

It's not clear to me that it wouldn't be simpler to just do the conditional version with oparg flag right away in this PR (you've added a lot of PUSH_NULL that are going to be removed shortly, and I wouldn't want this intermediate state to persist without measuring the code size and perf impact), but since you've already done it this way, seems OK.

@@ -1387,6 +1390,10 @@ iterations of the loop.

.. versionadded:: 3.11

.. versionchanged:: 3.12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this is really a candidate for 3.12 backport?

Suggested change
.. versionchanged:: 3.12
.. versionchanged:: 3.13

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, typo.

Py_DECREF(callable);
GO_TO_INSTRUCTION(CALL_PY_EXACT_ARGS);
}

inst(CALL_PY_EXACT_ARGS, (unused/1, func_version/2, method, callable, args[oparg] -- unused)) {
inst(CALL_PY_EXACT_ARGS, (unused/1, func_version/2, method, callable, args[oparg], kwnames -- unused)) {
assert(kwnames == NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this assert is worth having kwnames instead of unused in the instruction signature? I'm guessing compilers will be smart enough to eliminate the unused load from stack when asserts are not enabled...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, probably not worth keeping around. The new assert can just check the oparg flag instead.

SKIP_OVER(INLINE_CACHE_ENTRIES_CALL);
frame->return_offset = 0;
DISPATCH_INLINED(new_frame);
}

inst(CALL_PY_WITH_DEFAULTS, (unused/1, func_version/2, method, callable, args[oparg] -- unused)) {
inst(CALL_PY_WITH_DEFAULTS, (unused/1, func_version/2, method, callable, args[oparg], kwnames -- unused)) {
assert(kwnames == NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question as above, and several more places below as well

@brandtbucher
Copy link
Member Author

It's not clear to me that it wouldn't be simpler to just do the conditional version with oparg flag right away in this PR (you've added a lot of PUSH_NULL that are going to be removed shortly, and I wouldn't want this intermediate state to persist without measuring the code size and perf impact), but since you've already done it this way, seems OK.

I originally benchmarked this and it was "0% faster", but I re-ran the benchmarks after merging in main and it came back 1% slower. So while it's probably in the noise, I agree it's probably just cleaner to do the oparg flag here since that is our end goal anyways, and it should make the perf impact less iffy.

I'll have some new commits with the flag up later today.

@brandtbucher brandtbucher marked this pull request as draft June 16, 2023 18:14
@brandtbucher
Copy link
Member Author

Rerunning the benchmarks...

@brandtbucher brandtbucher marked this pull request as ready for review June 16, 2023 21:13
@brettcannon brettcannon removed their request for review June 16, 2023 22:27
@brandtbucher
Copy link
Member Author

Hm... still 1% slower. Maybe because of the new branches and bitwise operations calls have now?

@brandtbucher
Copy link
Member Author

Given the disappointing performance results, I'm going to close this. It's probably still worth exploring in the future, but isn't really blocking anything we have in the pipeline right now.

@gvanrossum
Copy link
Member

Not having this is holding back converting some Tier 1 CALL specializations to Tier 2, because the Tier 2 interpreter doesn't have access to the kwnames variable. Moving that onto the stack seems the obvious solution (as opposed to passing the address of the kwnames variable around, which seems horrible).

@gvanrossum
Copy link
Member

(Reopening as a reminder.)

@gvanrossum gvanrossum reopened this Aug 21, 2023
@markshannon
Copy link
Member

Given that we hold a strong reference to the code object, we can safely use a borrowed reference to the kwnames tuple.
Using a borrowed reference saves a lot of branches removing the Py_XDECREF to clean up the kwnames.

@gvanrossum
Copy link
Member

I have a feeling it would be easier to start over than to attempt to merge main into this PR. @brandtbucher Your thoughts? This is the next frontier in translating remaining CALL specializations to Tier 2.

@brandtbucher
Copy link
Member Author

My plan is to start over (I can start on this today). Do we have a preference between extra PUSH_NULL instructions vs stealing a bit from CALL's oparg? I'm slightly leaning towards the former, since it's much simpler and they both seem to be about the same, performance-wise.

@brandtbucher
Copy link
Member Author

Re: borrowed references, that seems like something we could bolt onto this after (with a new LOAD_CONST_BORROWED opcode or similar).

@gvanrossum
Copy link
Member

My plan is to start over (I can start on this today). Do we have a preference between extra PUSH_NULL instructions vs stealing a bit from CALL's oparg? I'm slightly leaning towards the former, since it's much simpler and they both seem to be about the same, performance-wise.

Excellent. I'd vote for extra PUSH_NULLs -- in Tier 2 that'd lead to less work (no need to check a bit in oparg and shift to get arg count).

@brandtbucher
Copy link
Member Author

Closing in favor of #108496 (updated version).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants