bpo-45256: Avoid C calls for more Python to Python calls. #28937

Merged: markshannon merged 6 commits into python:main from flatten-all-py-calls on Oct 18, 2021

Conversation

Member

@markshannon markshannon commented Oct 13, 2021

Extends the approach used for CALL_FUNCTION to CALL_FUNCTION_KW, CALL_METHOD and CALL_METHOD_KW.

Also modifies initialize_locals and _PyTuple_FromArraySteal to have the same behavior w.r.t. reference counting regardless of whether they succeed or fail.

https://bugs.python.org/issue45256

@Fidget-Spinner
Member

Did anyone discover what caused the slight slowdown in the initial PR's benchmark?

…onsume the argument references regardless of whether they succeed or fail.
@markshannon
Member Author

> Did anyone discover what caused the slight slowdown in the initial PR's benchmark?

Maybe the additional tests for consuming references?
I don't think it matters much, as specialization will handle all the fast paths.

@markshannon markshannon added skip news 🔨 test-with-buildbots Test PR w/ buildbots; report in status section labels Oct 14, 2021
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 7c0f498 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 14, 2021
@markshannon
Member Author

A little bit faster

@pablogsal pablogsal added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 14, 2021
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 03e7ad9 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 14, 2021
@@ -5480,7 +5459,7 @@ initialize_locals(PyThreadState *tstate, PyFrameConstructor *con,
                 _PyErr_Format(tstate, PyExc_TypeError,
                               "%U() keywords must be strings",
                               con->fc_qualname);
-                goto fail;
+                goto kw_fail;
Member

I find this a bit confusing. How is this cleaning up the positional arguments in the case where some positional args have been copied and we fail to copy some of the keywords?

Member Author

All the positional arguments have been copied at this point.

Member

@pablogsal pablogsal Oct 14, 2021

Yeah exactly, so if the call fails you need to increment those references so the cleanup succeeds, no? This is because if the references have been stolen, we own the references back to the positional arguments

Member Author

@markshannon markshannon Oct 14, 2021

If steal_args is true, then we always consume the references. #28937 (comment)

By this point the references to all positional arguments have been consumed, as have all references of kwargs up to i (exclusive).
kw_fail consumes the references to kwargs from i (inclusive) to kwcount (exclusive), so it can then jump to fail_late, as all references will have been consumed.
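
To make the bookkeeping described above concrete, here is a minimal Python sketch of the failure path. It is a toy model, not the C code in ceval.c: `Ref`, `initialize_locals_model` and the integer `refcount` field are hypothetical stand-ins, and it only covers the case where steal_args is true.

```python
class Ref:
    """Toy stand-in for a PyObject* with an explicit reference count."""

    def __init__(self, value):
        self.value = value
        self.refcount = 1  # the single reference held by the caller's stack

    def decref(self):
        # A second release of the same reference would be a double free.
        assert self.refcount > 0, "double free"
        self.refcount -= 1


def initialize_locals_model(args, kwnames, kwvalues):
    """Return the locals mapping on success, or None on failure.

    Assumption: steal_args is true, so every reference in args and
    kwvalues is consumed exactly once whether or not the call succeeds.
    """
    local_vars = {}
    # Positional arguments: ownership moves into the frame's locals.
    for index, ref in enumerate(args):
        local_vars[index] = ref
    for i, name in enumerate(kwnames):
        if not isinstance(name, str):
            # Model of `goto kw_fail`: consume kwargs i..kwcount-1 ...
            for ref in kwvalues[i:]:
                ref.decref()
            # ... then the late-failure path: tearing down the frame
            # releases everything the frame already owns.
            for ref in local_vars.values():
                ref.decref()
            return None
        # kwargs 0..i-1 have already been consumed by the frame.
        local_vars[name] = kwvalues[i]
    return local_vars
```

In this model a failing call leaves every argument with a count of zero and a successful call leaves every argument owned once by the locals, so the caller never has to decref anything in either case.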

Member

Ah, wait a minute, I misread what we are doing! We are consuming references now, while before we were increasing them so the stack cleans them.

Member Author

👍

Member

Ah, we are avoiding the double free by shrinking the stack here, no?

                    STACK_SHRINK(stackadj);
                    // The frame has stolen all the arguments from the stack,
                    // so there is no need to clean them up.
                    Py_XDECREF(kwnames);
                    Py_DECREF(function);
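
A rough way to see why shrinking the stack is enough is to compare the two protocols side by side. The sketch below is an illustrative Python model with made-up names (`call_borrowing`, `call_stealing`, and a dict with a `refcount` key standing in for an object); it assumes the old code took new references and let the caller clean the stack, as described in this thread.

```python
def call_borrowing(stack, nargs):
    """Old protocol (as described in the thread): the callee takes new
    references and the caller releases the stack's references afterwards."""
    base = len(stack) - nargs
    frame_locals = []
    for obj in stack[base:]:
        obj["refcount"] += 1          # callee: models Py_INCREF
        frame_locals.append(obj)
    for _ in range(nargs):
        stack.pop()["refcount"] -= 1  # caller: pop and Py_DECREF
    return frame_locals


def call_stealing(stack, nargs):
    """New protocol: the callee steals the stack's references, so the
    caller only shrinks the stack and never decrefs the arguments."""
    base = len(stack) - nargs
    frame_locals = stack[base:]       # ownership moves, counts unchanged
    del stack[base:]                  # caller: models STACK_SHRINK
    return frame_locals
```

Both functions leave each argument with exactly one owner (the new frame). Under the stealing protocol, a decref by the caller after the shrink would release a reference it no longer holds, which is the double free being asked about.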

Member Author

👍

@@ -1702,6 +1702,11 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, InterpreterFrame *frame, int thr
switch (opcode) {
#endif

/* Variables used for making calls */
PyObject *kwnames;
Member

Could we put these into a struct instead of having more locals? It makes the code a bit cleaner to read and contextualize, and it will have the same performance as long as the struct is stack allocated.

Member Author

Do we want to rely on all compilers being able to do perfect escape analysis here?
I'd rather leave these as local variables so the compiler can easily allocate them to registers.

Member

There is no way that, given the size of this function, these are going to end up in registers. Indeed, I checked with GCC, clang, ICC and xlc at all optimization levels, and not a single one places these locals in registers.

Up to you anyway, I don't feel super strongly about it but I think it helps with clarity and organization.

Member Author

How did you determine which of these were in registers? Once in SSA form, there are 15 (I think) variables here.

Member

@pablogsal pablogsal Oct 15, 2021

gdb/dbx + breakpoint for this function + disas + info registers. I checked if any of the values in these locals are stored or partially stored in registers.

I only checked x86-64, though.

Member

@pablogsal pablogsal left a comment

I left some minor comments, otherwise LGTM.

Great work! I have not checked for refleaks, will do this afterwards but I also scheduled a buildbot run

@markshannon
Member Author

I ran the buildbots earlier. All good except the Gentoo ones failing for tkinter stuff.
Feel free to re-run them, in case I broke something in the last couple of commits.

@markshannon markshannon merged commit 70945d5 into python:main Oct 18, 2021
@markshannon markshannon deleted the flatten-all-py-calls branch October 18, 2021 09:55
@pablogsal
Member

pablogsal commented Oct 27, 2021

Unfortunately, it seems that this commit has broken the AMD64 FreeBSD Shared 3.x buildbot:

https://buildbot.python.org/all/#/builders/483/builds/1003/steps/5/logs/stdio

The buildbot was green until we merged this

@markshannon
Member Author

The BSD entry in https://devguide.python.org/experts/ is empty. Not only that, but the gdb tests are decidedly uninformative when they fail. I'll see what I can come up with.

@pablogsal
Member

I am taking a look as well. It seems that the problem is that it cannot find builtin_id:

Function "builtin_id" not defined.

@pablogsal
Member

This is one of the errors:

AssertionError:


'Breakpoint 1 (builtin_id) pending.
Breakpoint 1, builtin_id (self=<optimized out>, v=42) at Python/bltinmodule.c:1197
1197        PyObject *id = PyLong_FromVoidPtr(v);
#4 Frame 0x800235020, for file ...gdb_sample.py, line 12, in <module> ()
    foo(1, 2, 3)
Unable to find an older python frame
Locals for <module>

did not match

'^.*\nLocals for foo\na = 1\nb = 2\nc = 3\nLocals for <module>\n.*$'

@pablogsal
Member

So it seems that it was not able to find the previous Python frame for some reason.

@pablogsal
Member

Something is going on. I logged into the buildbot (you need to ask koobs for access by writing to koobs@freebsd.org) and indeed many commands are broken:

140-CURRENT-amd64-564d% gdb --args ./python Lib/test/gdb_sample.py
(gdb) b builtin_id
(gdb) r
Breakpoint 1, builtin_id (self=<optimized out>, v=0x801424c00) at Python/bltinmodule.c:1197
...
(gdb) py-list
   7        baz(a, b, c)
   8
   9    def baz(*args):
  10        id(42)
  11
 >12    foo(1, 2, 3)

while on my Linux system:

❯ gdb --args ./python Lib/test/gdb_sample.py
(gdb) b builtin_id
(gdb) r
Breakpoint 1, builtin_id (self=0x7ffff7928470, v=42) at Python/bltinmodule.c:1196
1196    {
(gdb) py-list
   5
   6    def bar(a, b, c):
   7        baz(a, b, c)
   8
   9    def baz(*args):
 >10        id(42)
  11
  12    foo(1, 2, 3)
(gdb)

@pablogsal
Member

@markshannon I know the problem. The problem is this code:

cpython/Tools/gdb/libpython.py

Lines 1804 to 1813 in d02ffd1

# gdb is unable to get the "frame" argument of PyEval_EvalFrameEx()
# because it was "optimized out". Try to get "frame" from the frame
# of the caller, _PyEval_Vector().
orig_frame = frame
caller = self._gdbframe.older()
if caller:
    frame = caller.read_var('frame')
    frame = PyFramePtr(frame)
    if not frame.is_optimized_out():
        return frame

This is not correct anymore when the frames are inlined as that frame is the top frame of all of them. The problem is that in the buildbot, the frame is optimized so it goes into this fallback.
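
The failure mode can be sketched with a small stand-alone Python model. Everything here is hypothetical and only mirrors the shape of the quoted fallback: `InterpFrame`, `CFrame`, `get_pyframe` and `build_stack` are invented names, `None` stands in for a variable that gdb reports as optimized out, and the call chain is the one from gdb_sample.py (`<module>` calls foo, bar, baz, then `id(42)`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InterpFrame:
    """Toy interpreter frame, linked to the frame that called it."""
    name: str
    previous: Optional["InterpFrame"] = None


@dataclass
class CFrame:
    """Toy C stack frame with the value of its local variable `frame`."""
    function: str
    frame_var: Optional[InterpFrame]  # None models "optimized out"
    older: Optional["CFrame"] = None


def get_pyframe(c_frame):
    """Mimic the fallback: use `frame`, else read it from the C caller."""
    frame = c_frame.frame_var
    if frame is not None:
        return frame
    caller = c_frame.older
    if caller is not None and caller.frame_var is not None:
        return caller.frame_var
    return None


def build_stack(optimized_out):
    """One eval-loop C frame runs all four Python frames (inlined calls)."""
    module = InterpFrame("<module>")
    foo = InterpFrame("foo", module)
    bar = InterpFrame("bar", foo)
    baz = InterpFrame("baz", bar)
    # The C caller only ever saw the entry frame it created.
    vector = CFrame("_PyEval_Vector", module)
    current = None if optimized_out else baz
    return CFrame("_PyEval_EvalFrameDefault", current, vector)
```

With `build_stack(False)` the lookup yields the `baz` frame, matching the Linux `py-list` output. With `build_stack(True)` it yields `<module>`, which is consistent with the FreeBSD output pointing at line 12 instead of line 10.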

@pablogsal
Member

I will try to prepare a PR for this.


5 participants