Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose the out parameter to avoid malloc of output array #147

Merged
merged 2 commits into from
Sep 27, 2024

Conversation

kif
Copy link
Contributor

@kif kif commented Oct 13, 2023

This patch offers the user the ability to provide an output buffer for all functions exposed at the Python level. This feature avoids the array creation and permits to recycle explicitly temporary buffers.

Despite the Python memory allocator is supposed to recycle memory in the back of the programmer, it fails sometimes, for example in timeit mode where the garbage collector is disabled. Under such situation the code can becomes terribly slow and causes huge memory leaks.

@kiyo-masui
Copy link
Owner

Hello - CI was broken when you opened this pull request. It's fixed now, can you merge in the latest from master to trigger the CI again?

@kiyo-masui
Copy link
Owner

I don't think you should have removed the cdef np.ndarray out statements, since those should be required for the threaded sections. Have you done a detail performance comparison in a multithreaded context?

Tests pass. API does change (in a backward compatible way) so we would want to bump the minor version.

@kif
Copy link
Contributor Author

kif commented Sep 27, 2024

The cdef np.ndarray out is no more needed and would break the declaration in the signature of the function, i.e. out would be declared twice which prevents the code to compile.

@kif
Copy link
Contributor Author

kif commented Sep 27, 2024

Feel free to bump the version number, you are the maintainer.

@kiyo-masui
Copy link
Owner

The cdef np.ndarray out is no more needed and would break the declaration in the signature of the function, i.e. out would be declared twice which prevents the code to compile.

Oh I see. That makes sense.

Can you add a unit test for this functionality?

@kif
Copy link
Contributor Author

kif commented Sep 27, 2024

About performances, it allows us to provided a pinned-memory region for fast transfer to the GPU.
I also tested the patch on this notebook: http://www.silx.org/doc/pyFAI/dev/usage/tutorial/Parallelization/Direct_chunk_read.html

@kiyo-masui kiyo-masui merged commit 22fafa7 into kiyo-masui:master Sep 27, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants