@JeffersGlass JeffersGlass commented May 28, 2024

This PR adds an additional table to the output of summarize_stats.py: a table of (number of times a UOp was executed) * (length of that UOp in machine code), sorted by this value. This makes it clear how much time* is being spent in each UOp, as opposed to just which ones are most frequently executed.

*Machine instruction count is a rough proxy for time, but a really easy one to calculate.

The new table looks like this:

Total Machine Instruction Counts per UOp

| Name | Product | Self | Cumulative | Count | Length (Machine Instructions) |
| --- | ---: | ---: | ---: | ---: | ---: |
| _COLD_EXIT | 741,336 | 14.4% | 14.4% | 1,173 | 632 |
| _TIER2_RESUME_CHECK | 436,914 | 8.5% | 22.8% | 2,511 | 174 |
| _STORE_FAST_0 | 389,712 | 7.6% | 30.4% | 2,118 | 184 |
| _BINARY_OP_ADD_INT | 327,339 | 6.3% | 36.7% | 983 | 333 |
| _START_EXECUTOR | 231,442 | 4.5% | 41.2% | 1,193 | 194 |
| _LOAD_FAST_0 | 225,144 | 4.4% | 45.6% | 1,416 | 159 |
| ... | ... | ... | ... | ... | ... |

Closes #119692. Tagging @brandtbucher as the requester of this feature, and @mdboom for pystats visibility.

@mdboom mdboom left a comment

Thanks. This LGTM, but let's let @brandtbucher confirm the stats themselves make sense for what he needs.

@brandtbucher brandtbucher left a comment

Thanks for doing this. I think it will be really useful!

A few notes:

  • I don't love searching for and parsing jit_stencils.h like this... it's pretty fragile (the JIT is changing quite a bit right now, and out-of-tree builds are a thing unfortunately). A more robust solution would be to dump the code size as part of the stats themselves in the interpreter, since we have access to the stencil_groups array in the C code. Then we could just parse it out of the stats dump normally.
  • No need to repeat the "Self" and "Cumulative" values in the new table.
  • I'd rename "length" to "size" (minor, but we use "length" to mean other things in the stats summary).
  • Remove all mention of "machine instructions", since what we're really measuring is size (in bytes).
  • I'd replace "Product" with something like "Total Size" or "Total Cost" in the table, and move it after the "Count" and "Size" columns (which makes it a bit clearer to me that it is derived from them).

JeffersGlass commented May 31, 2024

Thanks for the feedback @brandtbucher! I've reworked things so that the code size (and data size) of each stencil are dumped as part of the stats, and the summarize script picks them up from there. I've also renamed and re-ordered the table fields to clarify what is being shown.

I've left the Self and Cumulative columns for now - they're the percentage of all the bytes that were jitted by the current UOp (and the running total of the same), if that makes sense, so they're not just repeating the values from an earlier table. But I'm happy to remove or re-label them if that doesn't actually seem useful.

The new table with some sample data looks like:

Total Bytes Executed per JIT'ed UOp

| Name | Count | Stencil Size (Bytes) | Total Size | Self (Total Size) | Cumulative (Total Size) |
| --- | ---: | ---: | ---: | ---: | ---: |
| _COLD_EXIT | 23,808 | 447 | 10,642,176 | 30.9% | 30.9% |
| _STORE_NAME | 23,800 | 259 | 6,164,200 | 17.9% | 48.7% |
| _START_EXECUTOR | 23,808 | 170 | 4,047,360 | 11.7% | 60.5% |
| _EXIT_TRACE | 23,808 | 151 | 3,595,008 | 10.4% | 70.9% |
| _ITER_NEXT_RANGE | 23,800 | 86 | 2,046,800 | 5.9% | 76.8% |
| _ITER_CHECK_RANGE | 23,808 | 82 | 1,952,256 | 5.7% | 82.5% |
| _CHECK_VALIDITY | 23,800 | 76 | 1,808,800 | 5.2% | 87.7% |
| _GUARD_NOT_EXHAUSTED_RANGE | 23,808 | 72 | 1,714,176 | 5.0% | 92.7% |
| _TIER2_RESUME_CHECK | 23,808 | 66 | 1,571,328 | 4.6% | 97.2% |
| _SET_IP | 23,800 | 40 | 952,000 | 2.8% | 100.0% |
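
For readers who want to check the arithmetic, here is a minimal sketch of how the derived columns relate to the two measured ones. It is not the code in summarize_stats.py; the function name `total_size_table` is hypothetical, and the counts and stencil sizes are simply copied from the sample table above.

```python
# Counts and stencil sizes (bytes) copied from the sample table above.
rows = [
    ("_COLD_EXIT", 23_808, 447),
    ("_STORE_NAME", 23_800, 259),
    ("_START_EXECUTOR", 23_808, 170),
    ("_EXIT_TRACE", 23_808, 151),
    ("_ITER_NEXT_RANGE", 23_800, 86),
    ("_ITER_CHECK_RANGE", 23_808, 82),
    ("_CHECK_VALIDITY", 23_800, 76),
    ("_GUARD_NOT_EXHAUSTED_RANGE", 23_808, 72),
    ("_TIER2_RESUME_CHECK", 23_808, 66),
    ("_SET_IP", 23_800, 40),
]


def total_size_table(rows):
    """Hypothetical helper: derive Total Size, Self and Cumulative per UOp.

    Returns (name, count, size, total, self_share, cumulative_share) tuples,
    sorted by total size, largest first.
    """
    totals = [(name, count, size, count * size) for name, count, size in rows]
    totals.sort(key=lambda row: row[3], reverse=True)
    grand_total = sum(row[3] for row in totals)

    table = []
    running = 0
    for name, count, size, total in totals:
        running += total
        table.append(
            (name, count, size, total, total / grand_total, running / grand_total)
        )
    return table


table = total_size_table(rows)
for name, count, size, total, self_share, cumulative in table:
    print(f"{name:<28}{count:>8,}{size:>6,}{total:>12,}{self_share:>8.1%}{cumulative:>8.1%}")
```

"Self" is each UOp's share of all bytes executed, and "Cumulative" is the running sum of those shares down the sorted table, which is why the last row reaches 100%.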

** Updated - see below **
Right now, there's a bit of a kludge in load_raw_data so that the stencil lengths don't get summed. The lines containing the info about the code stencils look like uops[_MATCH_KEYS].code_size : 74, and this snippet makes sure they're just recorded, not summed.

# Data about JIT stencils isn't cumulative
if "code_size" in key or "data_size" in key:
    stats[key.strip()] = int(value)
else:
    stats[key.strip()] += int(value)

I can see breaking this data into a new prefix (uops[_MATCH_KEYS].data.XXX maybe, and looking for data in the key?) and reworking how it's loaded, if that seems cleaner?
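
To make the effect of that rule concrete, here is a self-contained sketch of a loader that applies it to `key : value` lines. This is an illustration only, not the actual load_raw_data: the function name `load_stats_lines`, the in-memory "files", and the `execution_count` key used as the summed example are all assumptions.

```python
from collections import Counter


def load_stats_lines(lines, stats=None):
    """Hypothetical loader: sum ordinary counters, but only record stencil sizes."""
    stats = Counter() if stats is None else stats
    for line in lines:
        key, sep, value = line.rpartition(":")
        if not sep:
            continue  # blank line or anything that is not "key : value"
        key = key.strip()
        if "code_size" in key or "data_size" in key:
            # Data about JIT stencils isn't cumulative: keep the value as-is.
            stats[key] = int(value)
        else:
            stats[key] += int(value)
    return stats


# Two made-up stats dumps from the same build (illustrative values only).
file_a = [
    "uops[_MATCH_KEYS].execution_count : 10",
    "uops[_MATCH_KEYS].code_size : 74",
]
file_b = [
    "uops[_MATCH_KEYS].execution_count : 5",
    "uops[_MATCH_KEYS].code_size : 74",
]

stats = load_stats_lines(file_a)
stats = load_stats_lines(file_b, stats)
print(stats["uops[_MATCH_KEYS].execution_count"])  # counts are summed across files
print(stats["uops[_MATCH_KEYS].code_size"])        # size is recorded, not summed
```

Without the special case, the stencil size would grow with the number of stats files loaded, which would inflate every "Total Size" in the table.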

JeffersGlass commented May 31, 2024

I made a small format change - keys that have metadata in them are now simply set when loading the input files (instead of being summed across them). So the dumped stencil-length data looks like:

uops[_CONVERT_VALUE].metadata.code_size : 227
uops[_CONVERT_VALUE].metadata.data_size : 280
uops[_COPY].metadata.code_size : 137
uops[_COPY].metadata.data_size : 216
uops[_COPY_FREE_VARS].metadata.code_size : 396
uops[_COPY_FREE_VARS].metadata.data_size : 480
...

I think ideally, there would be some checking that these values are consistent across all the stats files, but currently it'll just use the last value it finds. A bit of a kludge still, but since I would guess it's rare to have stats files hanging around from multiple builds with different jit stencils, perhaps this is fine for now?
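
As a rough sketch of what such a consistency check could look like (it is not part of this PR, and the function name `load_with_metadata_check` and the file names are hypothetical), a loader could refuse to overwrite a metadata value with a different one:

```python
def load_with_metadata_check(files):
    """Hypothetical loader: sum counters, but require metadata to agree across files.

    `files` maps a file name to an iterable of "key : value" lines.
    """
    stats = {}
    for name, lines in files.items():
        for line in lines:
            key, sep, value = line.rpartition(":")
            if not sep:
                continue  # skip lines that are not "key : value"
            key, value = key.strip(), int(value)
            if ".metadata." in key:
                # setdefault returns the earlier value if one was already recorded.
                previous = stats.setdefault(key, value)
                if previous != value:
                    raise ValueError(
                        f"{name}: {key} is {value}, but an earlier file had {previous}; "
                        "are these stats from builds with different stencils?"
                    )
            else:
                stats[key] = stats.get(key, 0) + value
    return stats


# Illustrative input: two dumps that agree on the stencil size of _COPY.
consistent = {
    "a.txt": ["uops[_COPY].metadata.code_size : 137"],
    "b.txt": ["uops[_COPY].metadata.code_size : 137"],
}
print(load_with_metadata_check(consistent))
```

Raising is only one option; warning and keeping the last value would preserve the current behavior while still flagging mixed builds.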

Development

Successfully merging this pull request may close these issues.

Add "Total Machine Code Cost" of UOps to PyStats
