Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow passing custom memory management functions #704

Open
caspervdw opened this issue Oct 17, 2022 · 6 comments
Open

Allow passing custom memory management functions #704

caspervdw opened this issue Oct 17, 2022 · 6 comments
Labels
Enhancement New feature or feature improvement.

Comments

@caspervdw
Copy link
Contributor

caspervdw commented Oct 17, 2022

Providing custom memory management functions would allow dependent libraries to use their memory management policies. Especially when small amount of bytes are repeatedly allocated/deallocated, caching/pooling may show a decent speedup.

Also out-of-memory situations may be handled more gracefully (specifically for pygeos/shapely, we could throw an error instead of going in to swap, see shapely/shapely#1349)

Ref. https://trac.osgeo.org/geos/ticket/540

One way to set up such a C interface is well described in the Numpy docs (NEP49): https://numpy.org/neps/nep-0049.html and another one in the Python docs: https://docs.python.org/3/c-api/memory.html#memoryoverview

@jorisvandenbossche @hobu

@pramsey
Copy link
Member

pramsey commented Oct 17, 2022

Probably some benefit to this in PostGIS too, getting inside the MemoryContext stucture means things like configured memory upper limits can be enforced, and small allocations (should) be a little faster.

@pramsey pramsey added the Enhancement New feature or feature improvement. label Oct 17, 2022
@caspervdw
Copy link
Contributor Author

I just realised that the Python allocator will require the GIL (global interpreter lock) to be held. I missed that detail when scanning the docs earlier.

So for the shapely usecase this will probably be a reason to keep using the system allocator. Or am I missing something @jorisvandenbossche

@jorisvandenbossche
Copy link
Contributor

Yes, that indeed seems a reason we wouldn't want to set up GEOS with Python's memory allocator in Shapely. There is also a Raw Memory Interface that does not require the GIL, but it seems this is just a plain wrapper around malloc et al, so using this wouldn't give any advantage (it's rather meant for overriding this in Python with your custom allocator).

That said, I think this would still be an interesting enhancement to explore. Whether it is for external users of GEOS to plug in their own allocator (like PostGIS might do), or whether it is to experiment with different memory allocators in GEOS itself.
(I would assume that the leg work for both use cases would largely overlap?)

For example, in the Arrow C++ project, we have a configurable MemoryPool (header), and have implementations based on the system allocator, jemalloc and mimalloc. But we default to jemalloc / mimalloc (depending on the OS) because those give better performance (if enabled at build time).

@rouault
Copy link
Contributor

rouault commented Nov 14, 2022

In the GEOS context, a custom memory allocator would require all references to C++ containers to be changed from std::vector<T> to std::vector<T, my_allocator> (cf https://stackoverflow.com/a/826635). For example the std::vector<Coordinate> vect member of the CoordinateArraySequence class. This could be a bit of an invasive change.

In Arrow C++, as far as I can see, the MemoryPool infrastructure is only used for "big" allocations (that is for the content of Arrow arrays), that are done manually through it, but I don't see it used for standard C++ containers (probably because they don't consume a lot of memory)

@Maxxen
Copy link
Contributor

Maxxen commented Jan 16, 2023

I would be seriously interested in attempting to pick this up. Are there any other challenges that immediately comes to mind? I suppose the c-api would have to be extended as well.

@dbaston
Copy link
Member

dbaston commented Jan 16, 2023

Not to my mind. I looked into it a while ago, and to my untrained eye this looked like a reasonable way to do the implementation: https://github.com/aws/aws-sdk-cpp/tree/bb1fdce01cc7e8ae2fe7162f24c8836e9d3ab0a2/aws-cpp-sdk-core/include/aws/core/utils/memory

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or feature improvement.
Projects
None yet
Development

No branches or pull requests

6 participants