gh-117958: Expose JIT code via access method in experimental UOpExecutor #117959

tonybaloney · 2024-04-17T05:56:37Z

Adds the get_jit_code() access method to the UOp Executor along with the existing access methods.

This is only accessible via internal C APIs but would be helpful for testing and debugging.

Issue: Expose jit_code field for UOp Executor #117958

… Executor Type

tonybaloney · 2024-04-17T05:57:03Z

@brandtbucher copy of the original PR to your JIT branch.

Python/optimizer.c

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>

tonybaloney · 2024-04-18T08:23:39Z

Python/optimizer.c

+    }
+    _PyExecutorObject *executor = (_PyExecutorObject *)self;
+    if (executor->jit_code == NULL || executor->jit_size == 0) {
+        PyErr_SetString(PyExc_ValueError, "No JIT code available.");


This could return an empty string instead of an error.

I think the exception makes more sense -- though maybe you should only check for jit_code == NULL.

gvanrossum

This LGTM. I have a nit for the news item and a suggestion for the actual code.

Curious what you're planning to do with this?

We might also worry about security implications -- while this doesn't allow writing the JIT code, it might give an attacker an easier way to analyze the JIT code and look for vulnerabilities. Though they can access this using ctypes as well, of course.

gvanrossum · 2024-04-19T00:10:29Z

Python/optimizer.c

+    }
+    _PyExecutorObject *executor = (_PyExecutorObject *)self;
+    if (executor->jit_code == NULL || executor->jit_size == 0) {
+        PyErr_SetString(PyExc_ValueError, "No JIT code available.");


I think the exception makes more sense -- though maybe you should only check for jit_code == NULL.

Misc/NEWS.d/next/Core and Builtins/2024-04-18-03-49-41.gh-issue-117958.-EsfUs.rst

…e-117958.-EsfUs.rst Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>

tonybaloney · 2024-04-19T00:51:26Z

Curious what you're planning to do with this?

I found a feature like this really useful for debugging in Pyjion and other JITs like ryuJIT. I know there are some other compiler folks who would use this too. There was a discuss thread where someone else was asking for this linked in the issue.

- dumping the machine code into a file and disassembling it to do some analysis
- looking at CFGs to understand the control flow and compare it with other JITs
- (hopefully) emitting some debug symbols in the future, or at least markers for which offsets relate to the higher-level instructions

Some security teams may also want the ability to export the JIT code for analysis, beyond what you can gather by looking at the C templates.
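
A minimal sketch of the first use case above (dumping the machine code to a file for disassembly). It assumes the bytes have already been obtained from `get_jit_code()`; the helper name `dump_jit_code`, the file name, and the sample bytes are hypothetical and only illustrate the flow.

```python
import tempfile
from pathlib import Path
from typing import Optional


def dump_jit_code(jit_code: Optional[bytes], path: Path) -> bool:
    """Write raw machine code to `path`; return False if there is nothing to write."""
    if not jit_code:
        # Covers both None (no JIT code) and b"" (empty JIT code).
        return False
    path.write_bytes(jit_code)
    return True


# Stand-in bytes (x86-64 `nop; nop; ret`), not real output from an executor.
sample = bytes([0x90, 0x90, 0xC3])
out_path = Path(tempfile.mkdtemp()) / "trace.bin"
wrote = dump_jit_code(sample, out_path)

# The file could then be handed to an external disassembler, for example:
#   objdump -D -b binary -m i386:x86-64 trace.bin
```

Writing a flat binary keeps the dump tool-agnostic; the disassembler has to be told the target architecture because the file carries no headers.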

brandtbucher

I'm slightly leaning towards ditching the exceptions and returning None in the case where either _Py_JIT is not defined or jit_code == NULL, and an empty string if jit_size == 0.

I'm thinking about things like regression tests or runtime introspection (where it's more ergonomic to check for None instead of catching an exception in cases where the JIT was built but is disabled, etc.). Not a huge deal, but I think I'd personally rather see an empty string if the JIT code is empty or a None if no JIT code exists when debugging a buggy JIT. :)

Otherwise, this looks good. What do you think?
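
To illustrate the proposed return convention from a caller's point of view, here is a small sketch. `FakeExecutor` is a hypothetical stand-in for the internal executor type (which is not constructed directly from Python), and `describe` is an illustrative helper, not part of the PR.

```python
from typing import Optional


class FakeExecutor:
    """Hypothetical stand-in for the internal executor type, for illustration only."""

    def __init__(self, jit_code: Optional[bytes]) -> None:
        self._jit_code = jit_code

    def get_jit_code(self) -> Optional[bytes]:
        return self._jit_code


def describe(executor) -> str:
    """Distinguish the three outcomes without any try/except."""
    code = executor.get_jit_code()
    if code is None:
        # No JIT code exists (JIT not built, disabled, or nothing compiled).
        return "no JIT code"
    if len(code) == 0:
        # JIT code exists but is empty, which may point at a JIT bug.
        return "empty JIT code"
    return f"{len(code)} bytes of JIT code"
```

With exceptions, the first branch would need a `try`/`except ValueError` around the call, which is the ergonomic cost the comment above refers to.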

Python/optimizer.c

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>

…to jit_access_method

tonybaloney · 2024-04-23T01:47:57Z

@brandtbucher amended with your feedback.

- Removed the extra check (the descriptor check seems to catch anyone trying to call it on another type)
- Now returns None instead of raising an exception if there is no JIT code.

diegorusso · 2024-04-23T13:21:31Z

Hello, thanks for the PR! It certainly does the job of capturing the machine code generated by the JIT but I was hoping to have a map between the uop byte code and the related machine code similarly to what I was envisaging here

gvanrossum · 2024-04-30T23:48:29Z

Python/optimizer.c

+
+static PyMethodDef uop_executor_methods[] = {
+    { "is_valid", is_valid, METH_NOARGS, NULL },
+    { "get_jit_code", get_jit_code, METH_NOARGS, NULL},


I'm happy to merge at this point, but did you consider putting this line inside #ifdef _Py_JIT instead? (And then the entire function definition as well.) That would make it possible to test whether this functionality exists without calling it, which is generally Pythonic API design.
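
A sketch of what that feature test could look like from Python if the method were only defined under `#ifdef _Py_JIT`. The two executor classes are hypothetical stand-ins for builds with and without the JIT; `jit_code_or_none` is an illustrative helper.

```python
class ExecutorWithJit:
    """Hypothetical executor from a build where the method is compiled in."""

    def get_jit_code(self):
        return b"\xc3"


class ExecutorWithoutJit:
    """Hypothetical executor from a build where the method is left out."""


def jit_code_or_none(executor):
    """Check that the functionality exists before calling it."""
    getter = getattr(executor, "get_jit_code", None)
    if getter is None:
        # The attribute is missing, so this build does not expose JIT code.
        return None
    return getter()
```

A test suite could use the same `hasattr`-style check to skip JIT-specific tests on builds that lack the method.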

brandtbucher · 2024-05-01T06:07:15Z

Hello, thanks for the PR! It certainly does the job of capturing the machine code generated by the JIT but I was hoping to have a map between the uop byte code and the related machine code similarly to what I was envisaging here

So, I've thought about this, and it should be possible with a couple of tweaks.

Basically, this current PR returns a byte string, which consists of the code for each instruction in sequence, followed by the auxiliary data for each instruction in sequence.

Meaning, for a trace of:

[A, B, C, D]

It returns:

b"".join([<A code>, <B code>, <C code>, <D code>, <A data>, <B data>, <C data>, <D data>, <padding>])

However, the executor knows the uops that make up its trace. If we #include "jit_stencils.h", we should be able to use stencil_groups[instruction->opcode].code.body_size and stencil_groups[instruction->opcode].data.body_size to compute these chunks.
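
The chunking could be sketched in Python as follows, assuming the per-uop sizes are already known. Here `code_sizes` and `data_sizes` are plain lists standing in for the `code.body_size` and `data.body_size` values from the stencil groups; `split_trace_bytes` and the sample bytes are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_trace_bytes(
    blob: bytes,
    code_sizes: Sequence[int],
    data_sizes: Sequence[int],
) -> Tuple[List[bytes], List[bytes], bytes]:
    """Split a code-then-data blob into per-uop chunks plus trailing padding."""
    if len(code_sizes) != len(data_sizes):
        raise ValueError("expected one code size and one data size per uop")
    if sum(code_sizes) + sum(data_sizes) > len(blob):
        raise ValueError("sizes exceed the length of the blob")

    offset = 0
    code_chunks: List[bytes] = []
    for size in code_sizes:  # all code bodies come first, in trace order
        code_chunks.append(blob[offset:offset + size])
        offset += size

    data_chunks: List[bytes] = []
    for size in data_sizes:  # then all data bodies, in the same order
        data_chunks.append(blob[offset:offset + size])
        offset += size

    return code_chunks, data_chunks, blob[offset:]  # the remainder is padding


# Made-up bytes for a trace [A, B, C, D]; B has no auxiliary data.
blob = b"aabcccd" + b"1334" + b"\x00\x00"
code_chunks, data_chunks, padding = split_trace_bytes(blob, [2, 1, 3, 1], [1, 0, 2, 1])
```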

Maybe @tonybaloney and @diegorusso can confirm, but it seems like the most useful info to return would be a 3-tuple of base address, a list of code byte strings (corresponding to uops) and a list of data byte strings (again, corresponding to uops).

So, for the above example, the return value would be:

(
    <base address>,
    [<A code>, <B code>, <C code>, <D code>],
    [<A data>, <B data>, <C data>, <D data>],
)

(I think base address is needed for some absolute addressing that we use in places.)

So each of the code or data lists can be zip'd with the executor to map them to individual uops. And if I want the raw string of data that this PR returns now, I can just take this tuple and do b"".join(result[1] + result[2]).
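
A short sketch of consuming such a 3-tuple, under the assumption that the lists line up one-to-one with the uops in the trace. The uop names, base address, and byte values are invented for illustration.

```python
# Hypothetical uop names and return value; none of this is real JIT output.
uops = ["A", "B", "C", "D"]
result = (
    0x7F0000001000,                  # base address
    [b"aa", b"b", b"ccc", b"d"],     # code bytes, one entry per uop
    [b"1", b"", b"33", b"4"],        # data bytes, one entry per uop
)
base_address, code_chunks, data_chunks = result

# Map each uop to its code and data by zipping the lists with the trace.
per_uop = {
    name: (code, data)
    for name, code, data in zip(uops, code_chunks, data_chunks)
}

# Absolute start address of each uop's code: base plus the sizes before it.
code_addresses = {}
offset = 0
for name, code in zip(uops, code_chunks):
    code_addresses[name] = base_address + offset
    offset += len(code)

# Recover the flat byte string by concatenating code, then data.
raw = b"".join(result[1] + result[2])
```

Note that the joined string contains only the code and data chunks; any trailing padding present in the flat form is not part of the tuple in this sketch.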

Would this meet everyone's needs, or am I overthinking it? Even though it's internal, I don't want to tweak this too much after the beta freeze on Monday, so I'm leaning towards providing more information rather than less.

markshannon · 2024-05-01T06:47:07Z

I'd be inclined to keep it simple (just returning a bytes object) for 3.13 as feature freeze is imminent.
We can always implement a richer API for 3.14.

Unless someone really needs the fancier API and is able and willing to implement it in the next two or three days.

diegorusso · 2024-05-01T09:54:57Z

Hello, thanks for the follow-up. I was taking a different route by adding a couple of fields to the executor struct

+    size_t *instruction_starts;
+    size_t trace_length;

and then work out where every instruction starts.
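
A sketch of how such an offsets array could be used to recover per-instruction chunks. It assumes `instruction_starts` holds ascending byte offsets into the code region and that the last instruction runs to the end of that region; `slice_by_starts` and the sample bytes are hypothetical.

```python
from typing import List, Sequence


def slice_by_starts(code: bytes, instruction_starts: Sequence[int]) -> List[bytes]:
    """Cut `code` into chunks, one per instruction, using start offsets only."""
    # Each instruction ends where the next one starts; the last ends at len(code).
    ends = list(instruction_starts[1:]) + [len(code)]
    return [code[start:end] for start, end in zip(instruction_starts, ends)]


# Made-up bytes for a four-instruction trace; trace_length is len(starts).
starts = [0, 2, 3, 6]
chunks = slice_by_starts(b"aabcccd", starts)
```

Storing only start offsets keeps the struct small, at the cost of needing the total code size to bound the final instruction.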

Anyway, because the feature freeze is imminent, I would vote for accepting this PR as it is, improving it in the next cycle, and dedicating more thought to the API. Better something good enough than nothing perfect :)
I will create a new issue with what @brandtbucher has suggested in his comment so we don't lose track of it.

It also helps that I'm off for a few days and would miss the feature freeze deadline anyway.

gvanrossum · 2024-05-01T14:10:28Z

Okay, then I'll merge it as is.

…7959)

Expose JIT code via access method in byte string for experimental UOp…

6838308

… Executor Type

tonybaloney requested review from markshannon and gvanrossum as code owners April 17, 2024 05:56

bedevere-app bot added the awaiting review label Apr 17, 2024

bedevere-app bot mentioned this pull request Apr 17, 2024

Expose jit_code field for UOp Executor #117958

Closed

gvanrossum reviewed Apr 17, 2024

Python/optimizer.c (resolved)

tonybaloney and others added 2 commits April 18, 2024 13:45

Catch undefined behaviour of JIT fields

7be0d85

📜🤖 Added by blurb_it.

21b547e

JelleZijlstra reviewed Apr 18, 2024

Python/optimizer.c (outdated, resolved)

Update Python/optimizer.c

87f1d74

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>

tonybaloney commented Apr 18, 2024

gvanrossum approved these changes Apr 19, 2024

bedevere-app bot added awaiting merge and removed awaiting review labels Apr 19, 2024

tonybaloney and others added 2 commits April 19, 2024 10:37

Update Misc/NEWS.d/next/Core and Builtins/2024-04-18-03-49-41.gh-issu…

4b13419

…e-117958.-EsfUs.rst Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>

Update 2024-04-18-03-49-41.gh-issue-117958.-EsfUs.rst

bc9284a

gvanrossum requested a review from brandtbucher April 22, 2024 16:38

gvanrossum assigned brandtbucher Apr 22, 2024

brandtbucher reviewed Apr 22, 2024

Python/optimizer.c (outdated, resolved)

tonybaloney and others added 4 commits April 23, 2024 11:39

Update Python/optimizer.c

bf5bd37

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>

Return none if there is no JIT

23bbe83

Merge branch 'jit_access_method' of github.com:tonybaloney/cpython in…

7019c1c

…to jit_access_method

Merge branch 'main' into jit_access_method

aa5930a

gvanrossum reviewed Apr 30, 2024

diegorusso mentioned this pull request May 1, 2024

JIT: map uops with code generated by the JIT #118467

Open

gvanrossum merged commit beb653c into python:main May 1, 2024
50 of 54 checks passed

bedevere-app bot removed the awaiting merge label May 1, 2024

tonybaloney deleted the jit_access_method branch May 1, 2024 22:42

SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024

pythongh-117958: Expose JIT code via method in UOpExecutor (python#11…

4ff9489

…7959)

brandtbucher added the topic-JIT label May 31, 2024
