Compiling tiny traces wastes lots of memory #116017

brandtbucher · 2024-02-28T00:18:35Z

The JIT has a minimum allocation granularity of 1 page, with code and data on separate pages. Reducing this minimum would mean having pages of memory shared between executors, which is much harder to manage (and isn't thread-safe).

So the minimum amount of memory owned by an executor is 2 pages. This equals 8KB of memory on most platforms (the notable exception being newer Macs which have 16KB pages, and thus a whopping 32KB minimum allocation per trace). And all of that memory is paged in, since there's at least some memory being used on every page.
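The arithmetic above can be sketched as follows. This is only an illustration of page-granular allocation with code and data on separate pages; the helper names `pages_needed` and `executor_footprint` are made up for this example and are not part of CPython.

```python
import mmap


def pages_needed(nbytes: int, page_size: int) -> int:
    """Whole pages needed to hold nbytes (never fewer than one)."""
    return max(1, -(-nbytes // page_size))  # ceiling division


def executor_footprint(code_bytes: int, data_bytes: int, page_size: int) -> int:
    """Bytes owned by one executor when code and data sit on separate pages.

    Hypothetical model of the scheme described in the issue, not the JIT's
    actual allocator.
    """
    total_pages = pages_needed(code_bytes, page_size) + pages_needed(data_bytes, page_size)
    return total_pages * page_size


if __name__ == "__main__":
    # A tiny trace: a few dozen bytes of code and data (made-up sizes).
    print(executor_footprint(64, 32, 4096))    # 4KB pages
    print(executor_footprint(64, 32, 16384))   # 16KB pages
    print(executor_footprint(64, 32, mmap.PAGESIZE))  # this machine's page size
```

With 4KB pages the tiny trace owns 8192 bytes, and with 16KB pages it owns 32768 bytes, no matter how little of each page it actually uses.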

We should tune the trace collection machinery to find a sensible minimum trace length for tier two. We might also consider compiling fewer traces.

A separate, but hairier, issue is what to do about the array of cold exit executors. We have 512 of them, and they're one instruction each, adding up to 4MB of mostly empty memory (16MB on Macs with 16KB pages). For reference, if optimally packed (i.e. compiled as a single trace of length 512), they would take up about 200KB.
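As a rough check of those figures, here is the same calculation in code. The 512 count, the 2-page minimum, and the "about 200KB" packed size come from the text above; the per-exit size and the utilization are simply derived from those approximate numbers, so treat them as estimates.

```python
COLD_EXITS = 512          # from the issue text
PAGES_PER_EXECUTOR = 2    # one code page plus one data page
MIB = 1024 * 1024


def cold_exit_array_bytes(page_size: int) -> int:
    """Memory owned by the cold exit array at page granularity."""
    return COLD_EXITS * PAGES_PER_EXECUTOR * page_size


# "About 200KB" if packed into a single trace (approximate figure from the issue).
packed_bytes = 200 * 1024

print(cold_exit_array_bytes(4096) / MIB)    # 4.0
print(cold_exit_array_bytes(16384) / MIB)   # 16.0
print(packed_bytes // COLD_EXITS)           # 400 (bytes per cold exit, estimated)
print(round(100 * packed_bytes / cold_exit_array_bytes(4096), 1))  # percent of memory in use
```

Under these assumptions, each cold exit needs roughly 400 bytes but owns 8KB, so about 95% of the array's memory is empty on 4KB-page platforms.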

@mdboom has been working on measuring the memory overhead of tier two on our benchmark suite. Early results show that:

The tier two interpreter uses ~1% more memory (max RSS).
On Linux, the JIT used ~5% more memory prior to GH-112354: Initial implementation of warm up on exits and trace-stitching #114142, and ~15% after that change landed.
On M1 Mac, the JIT used ~15% more memory prior to GH-112354: Initial implementation of warm up on exits and trace-stitching #114142, and ~80% after that change landed.

(Keep in mind that most of the benchmarks have a tiny memory footprint to begin with, so moderate increases can manifest as large percentages.)

CC: @gvanrossum, @markshannon

Linked PRs

GH-116017: Put JIT code and data on the same page #116845

brandtbucher · 2024-02-28T00:28:28Z

We may also want to consider a scheme for garbage-collecting cold traces. We already have a linked list of executors and a way of invalidating/removing them, so it probably wouldn't be too big of a project.
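A minimal sketch of what such a sweep could look like, assuming each executor in the linked list carries a hit counter. The `Executor` class, its `hits` field, and `sweep_cold` are hypothetical names for illustration only; the real executor list and invalidation machinery live in C and may look quite different.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Executor:
    name: str
    hits: int = 0                        # hypothetical "times entered" counter
    valid: bool = True
    next: Optional["Executor"] = None    # singly linked list of executors


def sweep_cold(head: Optional[Executor], min_hits: int) -> Tuple[Optional[Executor], List[str]]:
    """Unlink executors entered fewer than min_hits times since the last sweep.

    Survivors have their counters reset so the next sweep measures fresh use.
    Returns the new list head and the names of the removed executors.
    """
    removed: List[str] = []
    dummy = Executor("dummy", next=head)
    prev = dummy
    node = head
    while node is not None:
        following = node.next
        if node.hits < min_hits:
            node.valid = False           # stands in for invalidating and freeing memory
            node.next = None
            prev.next = following
            removed.append(node.name)
        else:
            node.hits = 0
            prev = node
        node = following
    return dummy.next, removed
```

For example, sweeping a list of executors with 10, 0, and 3 hits using `min_hits=2` would drop only the middle one and leave the other two linked together.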

gvanrossum · 2024-02-28T01:06:24Z

Yeah, for a hot executor the memory waste feels like less of an issue than for a cold executor. Without measuring what's hot, we could at least have separate min-trace-size limits depending on whether or not the executor has a loop in it. Mark probably has more interesting things to say. Meeting agenda for tomorrow?
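One way to picture the suggestion is a compile-or-skip check with two thresholds. Everything here is hypothetical: the `Trace` type, the `should_compile` function, and both threshold values are placeholders, since the thread does not propose concrete numbers.

```python
from dataclasses import dataclass

# Placeholder values; real limits would have to come from benchmarking.
MIN_LOOP_TRACE_LENGTH = 4       # traces that loop are likely to be re-executed
MIN_LINEAR_TRACE_LENGTH = 16    # straight-line traces must be longer to pay off


@dataclass
class Trace:
    length: int      # number of instructions in the trace
    has_loop: bool   # whether the trace jumps back to its own start


def should_compile(trace: Trace) -> bool:
    """Compile only traces that meet the minimum length for their kind."""
    minimum = MIN_LOOP_TRACE_LENGTH if trace.has_loop else MIN_LINEAR_TRACE_LENGTH
    return trace.length >= minimum
```

With these placeholder limits, a 5-instruction loop would be compiled while a 5-instruction straight-line trace would be left to the interpreter.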

brandtbucher · 2024-02-28T17:51:27Z

Action item for me here is to investigate a more efficient allocation scheme (with multiple executors per page).
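To get a feel for the potential savings, the two schemes can be compared with a small simulation. This is a sketch under simplifying assumptions: executors are packed back to back by a bump allocator, nothing is ever freed, and the thread-safety and page-protection concerns raised at the top of the issue are not modelled. The function names and the 100-byte executor size are made up.

```python
from typing import Iterable


def pages_unshared(sizes: Iterable[int], page_size: int) -> int:
    """Pages used when every executor gets its own whole pages."""
    return sum(max(1, -(-size // page_size)) for size in sizes)


def pages_packed(sizes: Iterable[int], page_size: int) -> int:
    """Pages used when executors are bump-allocated into shared pages.

    If an executor does not fit in the space left in the current region,
    the leftover is abandoned and enough fresh pages are opened for it.
    """
    pages = 0
    free = 0
    for size in sizes:
        if size > free:
            needed = -(-size // page_size)   # ceiling division
            pages += needed
            free = needed * page_size
        free -= size
    return pages


if __name__ == "__main__":
    tiny = [100] * 512   # 512 tiny executors of a made-up size
    print(pages_unshared(tiny, 4096), pages_packed(tiny, 4096))
```

For 512 executors of 100 bytes each on 4KB pages, the unshared scheme uses 512 pages while the packed one uses 13. The hard part, as noted above, is managing the shared pages once executors need to be freed or written concurrently.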

savannahostrowski · 2024-05-01T16:48:58Z

I'll take a look at this ahead of 3.14!

brandtbucher added the performance (Performance or resource usage), interpreter-core (Objects, Python, Grammar, and Parser dirs), and 3.13 (bugs and security fixes) labels Feb 28, 2024

brandtbucher mentioned this issue Feb 28, 2024

Produce plots for memory usage faster-cpython/bench_runner#137

Closed

bedevere-app bot mentioned this issue Mar 14, 2024

GH-116017: Put JIT code and data on the same page #116845

Merged

brandtbucher added a commit that referenced this issue Mar 19, 2024

GH-116017: Put JIT code and data on the same page (GH-116845)

2c82592

vstinner pushed a commit to vstinner/cpython that referenced this issue Mar 20, 2024

pythonGH-116017: Put JIT code and data on the same page (pythonGH-116845)

318e6f1

adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024

pythonGH-116017: Put JIT code and data on the same page (pythonGH-116845)

62ef665

diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024

pythonGH-116017: Put JIT code and data on the same page (pythonGH-116845)

6b82138

brandtbucher assigned brandtbucher and unassigned brandtbucher May 1, 2024

brandtbucher assigned savannahostrowski May 1, 2024
