JAVA-1125 Change way remove(query) on gridfs is performed to improve performances (branch 3.0.x) #192

finalspy · 2014-05-02T23:14:00Z

Problem

When a query is used to remove data from GridFS (both the [bucket].files and [bucket].chunks collections), the current driver first issues a select and then loops over the results, running 2 removes (one on files, one on chunks) on each iteration.
On large result sets, this behavior can result in thousands of iterations, each issuing 2 requests (files and chunks).

I can understand that removing files and chunks one after the other is a way to limit data inconsistency, but a risk still remains.
Thus, as long as the linked removal between files and chunks isn't managed by the server itself, the client side is responsible for checking that files and chunks are consistent.

Solution

I updated my previous PR by adding a parameter that keeps the legacy behavior by default but allows forcing the "bulk removal".
remove(query) is equivalent to remove(query, true), the existing default behavior.
remove(query, false) issues only 3 requests:

one select on "files",
then a remove using the query on files,
and a remove with a $in clause on chunks.
The file ids returned by the select are remembered in a list and used for the $in clause.
On a single file this won't be a great improvement, but on large sets of files it will be worthwhile.

"Legacy" remove(query) = 2 * n remove requests on GridFS (plus the initial select).
"Bulk" remove(query, false) = 3 requests on GridFS (one select and two removes).
Here n is the number of files matched by the query.
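
To make the request counts concrete, here is a minimal sketch that models the two strategies against an in-memory stand-in for a bucket. It is not the driver code from this PR: the class `GridFsRemoveSketch`, its methods, and the `requests` counter are all hypothetical, and a "query" is reduced to a predicate on the file name. Each simulated round trip increments the counter, so the legacy path costs one select plus 2 * n removes, while the bulk path costs 3 requests whatever n is.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for a GridFS bucket; NOT the driver's API.
class GridFsRemoveSketch {
    final Map<Integer, String> files = new HashMap<>(); // _id -> filename
    final List<Integer> chunks = new ArrayList<>();     // one files_id per chunk
    int requests = 0;                                   // simulated round trips

    void store(int id, String filename, int chunkCount) {
        files.put(id, filename);
        for (int i = 0; i < chunkCount; i++) {
            chunks.add(id);
        }
    }

    // One select on "files": returns the ids of the matching files.
    Set<Integer> selectIds(Predicate<String> query) {
        requests++;
        Set<Integer> ids = new HashSet<>();
        for (Map.Entry<Integer, String> e : files.entrySet()) {
            if (query.test(e.getValue())) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // Legacy strategy: one select, then 2 removes per matched file.
    void removeLegacy(Predicate<String> query) {
        for (Integer id : selectIds(query)) {
            requests++;                          // remove one document from files
            files.remove(id);
            requests++;                          // remove that file's chunks
            chunks.removeIf(c -> c.equals(id));
        }
    }

    // Bulk strategy: one select, one remove by query, one remove by id list.
    void removeBulk(Predicate<String> query) {
        Set<Integer> ids = selectIds(query);     // request 1: select on files
        requests++;                              // request 2: remove(query) on files
        files.values().removeIf(query);
        requests++;                              // request 3: { files_id: { $in: ids } } on chunks
        chunks.removeIf(ids::contains);
    }

    // Builds a bucket with n removable files and one file that must survive.
    static GridFsRemoveSketch sample(int n) {
        GridFsRemoveSketch bucket = new GridFsRemoveSketch();
        for (int i = 0; i < n; i++) {
            bucket.store(i, "tmp-" + i, 2);
        }
        bucket.store(n, "keep", 2);
        return bucket;
    }

    public static void main(String[] args) {
        Predicate<String> query = name -> name.startsWith("tmp-");
        GridFsRemoveSketch legacy = sample(1000);
        legacy.removeLegacy(query);
        GridFsRemoveSketch bulk = sample(1000);
        bulk.removeBulk(query);
        System.out.println("legacy requests: " + legacy.requests); // 2001
        System.out.println("bulk requests: " + bulk.requests);     // 3
    }
}
```

Both strategies leave the same data behind in this model. The difference is only in the number of round trips, and in how long files and chunks can be out of sync if the client fails between the two bulk removes.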

[Fixes : https://jira.mongodb.org/browse/JAVA-1125]
[Updates/improves PR https://github.com//pull/171, against branch 3.0.x]

jyemin · 2015-02-06T21:05:17Z

My sincere apologies. We recently merged the 3.0.x branch into master, and when the 3.0.x branch was deleted, all pull requests against it were summarily closed.

finalspy changed the title ~~3.0.x~~ JAVA-1125 Change way remove(query) on gridfs is performed to improve performances (branch 3.0.x) May 2, 2014

finalspy mentioned this pull request May 2, 2014

JAVA-1125 Change way remove(query) on gridfs is performed to improve performances #171

Closed

add test and implementation for both legacy and bulk gridfs remove

4d098f8

jyemin force-pushed the 3.0.x branch from 3f431ea to 3e0bc8e Compare August 19, 2014 21:36

trishagee force-pushed the 3.0.x branch from 9bb1094 to 14e9da5 Compare September 9, 2014 09:23

jyemin force-pushed the 3.0.x branch from f4d6698 to 27637b5 Compare September 24, 2014 20:41

rozza force-pushed the 3.0.x branch 2 times, most recently from c3e1b16 to e5fb2de Compare October 7, 2014 09:36

jyemin closed this Feb 6, 2015

finalspy mentioned this pull request Mar 21, 2015

[JAVA-1125] Change way remove(query) on gridfs is performed to improve p... #300

Closed
