
@finalspy finalspy commented May 2, 2014

Problem

When a query is used to remove data from GridFS (both the [bucket].files and [bucket].chunks collections), the current driver first issues a select and then loops over the results, running 2 removes (one on files, one on chunks) on each iteration.
On large result sets, this behavior can result in thousands of iterations, each issuing 2 requests (files and chunks).

I understand that removing a file and then its chunks, one after the other, is a way to limit data inconsistency. But a risk remains either way.
Thus, as long as the linked removal of files and chunks isn't managed by the server itself, the client side is responsible for checking that files and chunks are consistent.
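
To illustrate the kind of client-side check meant here, below is a minimal sketch in plain Java. It does not use the driver: the `files` ids and the per-chunk `files_id` values are assumed to have been read into memory already, and `GridFsConsistencySketch` and `orphanedChunkOwners` are hypothetical names, not part of the driver API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: detects chunks whose owning file document no longer exists. */
final class GridFsConsistencySketch {

    /**
     * @param fileIds      ids present in the (assumed) files collection
     * @param chunkFileIds files_id value of every chunk in the (assumed) chunks collection
     * @return the files_id values that chunks still point to but that have no files document
     */
    static Set<Integer> orphanedChunkOwners(Set<Integer> fileIds, List<Integer> chunkFileIds) {
        Set<Integer> orphans = new TreeSet<>(chunkFileIds); // distinct owners referenced by chunks
        orphans.removeAll(fileIds);                         // keep only owners that are missing
        return orphans;
    }

    public static void main(String[] args) {
        Set<Integer> fileIds = new HashSet<>(Arrays.asList(1, 2));
        // File 3 was removed from files, but its chunks were left behind.
        List<Integer> chunkFileIds = Arrays.asList(1, 1, 2, 3, 3);
        System.out.println(orphanedChunkOwners(fileIds, chunkFileIds)); // prints [3]
    }
}
```

A non-empty result would mean a remove was interrupted between the files step and the chunks step, whichever removal strategy was used.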

Solution

I updated my previous PR by adding a parameter that keeps the legacy behavior by default but allows forcing the "bulk removal".
remove(query) is equivalent to remove(query, true), the existing default behavior.
remove(query, false) issues only 3 requests:

  • one select on "files",
  • then a remove using the query on files,
  • and a remove with a $in clause on chunks.

File ids are remembered in a list between the select and the chunks removal.
On a single remove this won't be a great improvement, but on large sets of files it will be worthwhile.

"Legacy" remove(query) = 2 * n remove requests on GridFS (after the initial select).
"Bulk" remove(query, false) = 3 requests on GridFS in total.
where n = number of files matched by the query.
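
The request counts above can be checked with a small simulation. This is only a sketch under assumptions: it uses an in-memory stand-in instead of the driver, every simulated select or remove counts as one request (cursor batching is ignored), and `GridFsRemoveSketch`, `legacyRemove` and `bulkRemove` are hypothetical names rather than the code in this PR.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.function.Predicate;

/** In-memory stand-in for a GridFS bucket that counts simulated requests. */
final class GridFsRemoveSketch {
    private final List<Integer> fileIds = new ArrayList<>();      // stands in for [bucket].files
    private final List<Integer> chunkFileIds = new ArrayList<>(); // files_id of each chunk
    private int requests = 0;

    GridFsRemoveSketch(int fileCount, int chunksPerFile) {
        for (int id = 0; id < fileCount; id++) {
            fileIds.add(id);
            for (int c = 0; c < chunksPerFile; c++) chunkFileIds.add(id);
        }
    }

    // Simulated select on files: one request.
    private List<Integer> find(Predicate<Integer> query) {
        requests++;
        List<Integer> ids = new ArrayList<>();
        for (Integer id : fileIds) if (query.test(id)) ids.add(id);
        return ids;
    }

    // Simulated remove on files: one request.
    private void removeFiles(Predicate<Integer> query) {
        requests++;
        fileIds.removeIf(query);
    }

    // Simulated remove on chunks with a $in-style id set: one request.
    private void removeChunksIn(Collection<Integer> ids) {
        requests++;
        chunkFileIds.removeIf(ids::contains);
    }

    /** Legacy strategy: one select, then two removes per matched file. */
    int legacyRemove(Predicate<Integer> query) {
        for (Integer id : find(query)) {
            removeFiles(id::equals);
            removeChunksIn(Collections.singleton(id));
        }
        return requests;
    }

    /** Bulk strategy: one select, one remove on files, one $in remove on chunks. */
    int bulkRemove(Predicate<Integer> query) {
        List<Integer> ids = find(query);      // ids are remembered in a list
        removeFiles(query);
        removeChunksIn(new HashSet<>(ids));
        return requests;
    }

    int fileCount() { return fileIds.size(); }
    int chunkCount() { return chunkFileIds.size(); }

    public static void main(String[] args) {
        Predicate<Integer> matchAll = id -> true;
        System.out.println(new GridFsRemoveSketch(1000, 4).legacyRemove(matchAll)); // prints 2001
        System.out.println(new GridFsRemoveSketch(1000, 4).bulkRemove(matchAll));   // prints 3
    }
}
```

In this model the legacy path costs 2 * n removes plus the initial select, while the bulk path stays at 3 requests whatever n is; both leave the same files and chunks behind.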

[Fixes : https://jira.mongodb.org/browse/JAVA-1125]
[Updates/Improves of PR https://github.com//pull/171 against branch 3.0.x]

@finalspy finalspy changed the title 3.0.x JAVA-1125 Change way remove(query) on gridfs is performed to improve performances (branch 3.0.x) May 2, 2014
@jyemin
Collaborator

jyemin commented Feb 6, 2015

My sincere apologies. We recently merged the 3.0.x branch into master, and when the 3.0.x branch was deleted, all the pull requests against it were summarily closed.
