gh-81392: obmalloc: eliminate limit on pool size #13934
Conversation
Force-pushed: 0da278d → c5ef280
It sounds irrational to multiply sizes by 4 (why not simply 2?) when going from 32-bit to 64-bit. More generally, this change will make freed memory less likely to be reclaimed by the system. So there is a tradeoff.
If someone cares enough to pursue it, they can measure a range of possibilities. The intuition isn't just that we moved from 32- to 64-bit, but also that typical machines have far more RAM now than they did when obmalloc was first written (about 18 years ago). Even if 64-bit machines had never been created, I'd be in favor of at least doubling pool and arena sizes by now for 32-bit machines (but since I never use such machines anymore, nor does anyone I interact with, I'm not proposing to change what they use). BTW, in related work on a different approach, Neil S saw consistent measurable speed gains from boosting arena size even more. Apparently
Sure. But in the absence of quantification, I'm not much inclined to care. Whether arenas can be freed is mostly a matter of blind luck regardless of which sizes are used, with only one poke-&-hope heuristic employed to try to increase the likelihood of arenas emptying. I'm not worried about it. That doesn't mean I shouldn't be - but I nevertheless am not 😉.
That's true, but typical machines also run many processes at once.
By "working fine", you mean he's fine with all Python processes allocating memory in chunks of 16 MiB and releasing those chunks only when they are perfectly empty? I'm not sure everyone would agree. mmap/munmap is probably expensive. We should study what other allocators do. IIRC, jemalloc calls madvise() with MADV_FREE. A more sophisticated approach is a free list of arenas on which you call MADV_FREE, and when that free list overflows you just munmap() the extraneous arenas. The granularity of memory management is a delicate tradeoff. For example, Linus Torvalds is still a proponent of 4 kB pages at the HW and OS level. See this subthread. And this quote of his is also interesting for our context:
Consuming more memory in Python might make system-level performance worse.
Note that this PR only boosts arena size to 1 MiB - it's Neil who is usually using 16 MiB. If you're a fan of
About Torvalds, Python is not the OS. The OS has to worry about actually backing pages with physical RAM. We don't. We merely reserve address space, which doesn't get associated with actual RAM until we actually write into a page. So long as Linux sticks to 4 KiB pages, that's the granularity at which we consume RAM too, regardless of how large our arena address reservations are. On a 64-bit box, an address reservation of 1 MiB is trivial. Even on the feeblest edition of 64-bit Windows, we can do that 8 million times (2**23) before exhausting a process's user virtual address space. Nothing in the PR directly changes the amount of RAM we actually use (note that I rearranged the pool header so that its size didn't change despite adding a new member), and for the smallest size classes RAM efficiency directly increases a tiny bit (we can, e.g., fit 6 more 16-byte objects into a 4 KiB pool than in four 1 KiB pools). Maybe we'll be able to free less arena space, but maybe not - depends on the app. So far I've seen cases go both ways - luck of the draw.
A bunch of comments need updating.
Restored the original pool & arena sizes for 32-bit boxes. Got rid of the distinct "page" overhead & quantization stats, and folded them into the "pool" stats.
address_in_range() is page-based (rather than pool-based) now. Also other assorted comment changes.
inputs and outputs are correctly aligned. This should have been done from day one.
Force-pushed: c5ef280 → 2cf1f3c
@tim-one What's the fate of this PR?
This is dead. Arenas are 4x larger already now on 64-bit boxes, and so are pools if radix tree tracking is used (which it should be - there's no sane reason to keep the old code around anymore). It would probably be valuable to make both arenas and pools larger still, but I have no intent to pursue it.
As described in bpo-37211, this changes address_in_range() to be page-based rather than pool-based, allows pools to span any power-of-2 number of pages, and on 64-bit boxes quadruples the size of both pools and arenas. It would be great to get feedback from 64-bit apps that do massive amounts of small-object allocations and deallocations.
https://bugs.python.org/issue37211