JAVA-1125 Change way remove(query) on gridfs is performed to improve performances (branch 3.0.x) #192
Problem
When a query is used to remove data from GridFS (both the [bucket].files and [bucket].chunks collections), the current driver first issues a select, then loops over the results and runs 2 removes (files and chunks) on each iteration.
On large result sets, this behavior can result in thousands of iterations, each issuing 2 requests (files and chunks).
I understand that removing the files and chunks documents one after the other is a way to limit data inconsistency, but a risk remains.
Thus, as long as the linked removal between files and chunks isn't managed by the server itself, the client side is responsible for checking whether both files and chunks are consistent.
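A client-side consistency check of the kind mentioned above could look roughly like the sketch below. It is not part of this PR: the class and method names are hypothetical, and plain Integer values stand in for the real _id / files_id values that would be read from the two collections.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not driver code: ids are plain Integers for illustration.
public class GridFsConsistencySketch {

    /** Returns the files_id values seen in chunks that have no matching _id in files. */
    static Set<Integer> orphanedChunkOwners(Collection<Integer> fileIds,
                                            Collection<Integer> chunkFilesIds) {
        Set<Integer> orphans = new TreeSet<>(chunkFilesIds); // de-duplicates files_id values
        orphans.removeAll(fileIds);                          // keep only ids with no files document
        return orphans;
    }

    public static void main(String[] args) {
        List<Integer> fileIds = Arrays.asList(1, 2);             // _id values from [bucket].files
        List<Integer> chunkFilesIds = Arrays.asList(1, 1, 2, 3, 3); // files_id values from [bucket].chunks
        System.out.println(orphanedChunkOwners(fileIds, chunkFilesIds)); // prints [3]
    }
}
```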
Solution
I updated my previous PR by adding a parameter that keeps the legacy behavior by default but allows forcing the "bulk removal".
remove(query) is equivalent to remove(query, true), which keeps the existing behavior as the default.
remove(query, false) issues only 3 requests, regardless of the number of matched files.
The ids of the matched files are remembered in a list.
On a single remove this won't be a great improvement, but on large sets of files it'll be worthwhile.
"Legacy" remove(query) = 2 * n requests for remove on gridfs.
"Bulk" remove(query, false) = 3 requests for remove on gridfs.
where n = number of files matched by the query.
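To make the request counts concrete, here is a minimal simulation of the two strategies. It does not use the driver: FakeBucket and its methods are hypothetical in-memory stand-ins that only count simulated round trips. The simulation also counts the initial select, so the legacy path shows 2 * n + 1 requests rather than 2 * n.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class GridFsRemoveSketch {

    /** Hypothetical in-memory bucket; every method call below counts as one request. */
    static class FakeBucket {
        final Map<Integer, String> files = new HashMap<>();   // _id -> filename
        final Map<Integer, Integer> chunks = new HashMap<>(); // files_id -> chunk count
        int requests = 0;

        void store(int id, String filename) { // test setup only, not counted
            files.put(id, filename);
            chunks.put(id, 1);
        }

        List<Integer> findIds(Predicate<String> query) {
            requests++;
            List<Integer> ids = new ArrayList<>();
            for (Map.Entry<Integer, String> e : files.entrySet()) {
                if (query.test(e.getValue())) ids.add(e.getKey());
            }
            return ids;
        }

        void removeFiles(List<Integer> ids)  { requests++; files.keySet().removeAll(ids); }
        void removeChunks(List<Integer> ids) { requests++; chunks.keySet().removeAll(ids); }
    }

    /** Legacy strategy: one select, then 2 removes per matched file. */
    static void legacyRemove(FakeBucket bucket, Predicate<String> query) {
        for (Integer id : bucket.findIds(query)) {
            bucket.removeFiles(Collections.singletonList(id));
            bucket.removeChunks(Collections.singletonList(id));
        }
    }

    /** Bulk strategy: one select, then one remove per collection for all remembered ids. */
    static void bulkRemove(FakeBucket bucket, Predicate<String> query) {
        List<Integer> ids = bucket.findIds(query);
        bucket.removeFiles(ids);
        bucket.removeChunks(ids);
    }

    static FakeBucket sample(int n) {
        FakeBucket bucket = new FakeBucket();
        for (int i = 0; i < n; i++) bucket.store(i, "tmp-" + i);
        bucket.store(n, "keep");
        return bucket;
    }

    public static void main(String[] args) {
        Predicate<String> query = name -> name.startsWith("tmp-");
        FakeBucket legacy = sample(5);
        FakeBucket bulk = sample(5);
        legacyRemove(legacy, query);
        bulkRemove(bulk, query);
        System.out.println("legacy requests: " + legacy.requests); // 11 (1 select + 2 * 5 removes)
        System.out.println("bulk requests: " + bulk.requests);     // 3
    }
}
```

Both strategies leave the same data behind in this model; only the number of round trips differs.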
[Fixes : https://jira.mongodb.org/browse/JAVA-1125]
[Updates/improves PR https://github.com//pull/171 against branch 3.0.x]