-
Notifications
You must be signed in to change notification settings - Fork 347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow passing custom memory management functions #704
Comments
|
Probably some benefit to this in PostGIS too, getting inside the MemoryContext stucture means things like configured memory upper limits can be enforced, and small allocations (should) be a little faster. |
|
I just realised that the Python allocator will require the GIL (global interpreter lock) to be held. I missed that detail when scanning the docs earlier. So for the shapely usecase this will probably be a reason to keep using the system allocator. Or am I missing something @jorisvandenbossche |
|
Yes, that indeed seems a reason we wouldn't want to set up GEOS with Python's memory allocator in Shapely. There is also a Raw Memory Interface that does not require the GIL, but it seems this is just a plain wrapper around That said, I think this would still be an interesting enhancement to explore. Whether it is for external users of GEOS to plug in their own allocator (like PostGIS might do), or whether it is to experiment with different memory allocators in GEOS itself. For example, in the Arrow C++ project, we have a configurable MemoryPool (header), and have implementations based on the system allocator, jemalloc and mimalloc. But we default to jemalloc / mimalloc (depending on the OS) because those give better performance (if enabled at build time). |
|
In the GEOS context, a custom memory allocator would require all references to C++ containers to be changed from In Arrow C++, as far as I can see, the MemoryPool infrastructure is only used for "big" allocations (that is for the content of Arrow arrays), that are done manually through it, but I don't see it used for standard C++ containers (probably because they don't consume a lot of memory) |
|
I would be seriously interested in attempting to pick this up. Are there any other challenges that immediately comes to mind? I suppose the c-api would have to be extended as well. |
|
Not to my mind. I looked into it a while ago, and to my untrained eye this looked like a reasonable way to do the implementation: https://github.com/aws/aws-sdk-cpp/tree/bb1fdce01cc7e8ae2fe7162f24c8836e9d3ab0a2/aws-cpp-sdk-core/include/aws/core/utils/memory |
Providing custom memory management functions would allow dependent libraries to use their memory management policies. Especially when small amount of bytes are repeatedly allocated/deallocated, caching/pooling may show a decent speedup.
Also out-of-memory situations may be handled more gracefully (specifically for pygeos/shapely, we could throw an error instead of going in to swap, see shapely/shapely#1349)
Ref. https://trac.osgeo.org/geos/ticket/540
One way to set up such a C interface is well described in the Numpy docs (NEP49): https://numpy.org/neps/nep-0049.html and another one in the Python docs: https://docs.python.org/3/c-api/memory.html#memoryoverview
@jorisvandenbossche @hobu
The text was updated successfully, but these errors were encountered: