llama: fix leaked buffers for mmap + split files #16765
Open
+16
−10
Fixes #16762. As Aman correctly pointed out, the problem is that the pointer to the buffer is overwritten on each iteration of the loop over the split files. As a consequence, the backend buffers created in earlier iterations are currently leaked.
More generally, with the combination of mmap and split files, more than one backend buffer can be associated with a single ggml context, which the type `vector<pair<ggml_context_ptr, ggml_buffer_ptr>>` could not represent correctly. I changed the type to `vector<pair<ggml_context_ptr, vector<ggml_buffer_ptr>>>` instead.
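To illustrate the ownership problem in isolation, here is a minimal sketch that does not use ggml at all. Every name in it (`fake_context`, `fake_buffer`, `fake_buffer_alloc`, `load_old`, `load_new`) is a hypothetical stand-in, and the loader logic is simplified to the pattern described above: a raw pointer reassigned inside the per-file loop versus one owning pointer stored per file. Allocation is simulated with a counter, so the example itself leaks no real memory.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a ggml context and a backend buffer.
struct fake_context {};
struct fake_buffer { int file_idx; bool freed; };

// The arena owns the actual memory, so the demo never leaks real heap memory.
// g_live_buffers counts buffers that were allocated but not yet released.
static std::vector<std::unique_ptr<fake_buffer>> g_arena;
static int g_live_buffers = 0;

static fake_buffer * fake_buffer_alloc(int file_idx) {
    g_arena.push_back(std::make_unique<fake_buffer>(fake_buffer{file_idx, false}));
    ++g_live_buffers;
    return g_arena.back().get();
}

// Deleter that marks the simulated buffer as released.
struct fake_buffer_deleter {
    void operator()(fake_buffer * buf) const {
        buf->freed = true;
        --g_live_buffers;
    }
};

using fake_context_ptr = std::unique_ptr<fake_context>;
using fake_buffer_ptr  = std::unique_ptr<fake_buffer, fake_buffer_deleter>;

// Old shape: exactly one owned buffer per context.
using ctx_buf_old  = std::vector<std::pair<fake_context_ptr, fake_buffer_ptr>>;
// New shape: any number of owned buffers per context.
using ctx_bufs_new = std::vector<std::pair<fake_context_ptr, std::vector<fake_buffer_ptr>>>;

// Returns how many simulated buffers are still live after the owner is destroyed.
static int load_old(int n_files) {
    const int before = g_live_buffers;
    {
        ctx_buf_old owned;
        fake_buffer * buf = nullptr;
        for (int i = 0; i < n_files; ++i) {
            buf = fake_buffer_alloc(i); // overwrites the pointer from the previous file
        }
        // Only the last buffer gets an owner.
        owned.emplace_back(std::make_unique<fake_context>(), fake_buffer_ptr(buf));
    } // owned is destroyed here and releases a single buffer
    return g_live_buffers - before;
}

static int load_new(int n_files) {
    const int before = g_live_buffers;
    {
        ctx_bufs_new owned;
        std::vector<fake_buffer_ptr> bufs;
        for (int i = 0; i < n_files; ++i) {
            bufs.emplace_back(fake_buffer_alloc(i)); // every buffer gets its own owner
        }
        owned.emplace_back(std::make_unique<fake_context>(), std::move(bufs));
    } // owned is destroyed here and releases all buffers
    return g_live_buffers - before;
}
```

With three simulated split files, `load_old(3)` reports two buffers still live after the owner goes away, while `load_new(3)` reports none. With a single file the two shapes behave the same, which is consistent with the leak only showing up for split files.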