finalspy

Problem

When a query is used to remove data from GridFS (both the [bucket].files and [bucket].chunks collections), the current driver first issues a select, then loops over the results and runs 2 removes (one on files, one on chunks) on each iteration.
On large result sets, this behaviour can result in thousands of iterations, each issuing 2 requests (files and chunks).

I understand that removing files and chunks one after the other is a way to limit data inconsistency, but a risk remains either way.

Thus, as long as the linked removal between files and chunks isn't managed by the server itself, the client side is responsible for checking whether both files and chunks are consistent.

Solution

With this pull request, remove(query) issues only 3 requests:

  • one select on "files",
  • one remove on "files" using the same query,
  • one remove on "chunks" using a $in clause.

The ids of the matched files are collected in a list for the $in clause.
For a single file this is not much of an improvement, but on large sets of files it is worthwhile.
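
To make the request counts concrete, here is a minimal in-memory sketch. It is not the driver code from this pull request: the class `GridFsRemoveSketch` and all of its methods are hypothetical, and each method that increments `requests` merely stands in for one round trip to the server. The only GridFS detail it relies on is that each chunk refers to its file through `files_id`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory stand-in for a GridFS bucket; it only counts simulated requests. */
class GridFsRemoveSketch {
    // "files" collection: _id -> filename
    final Map<Integer, String> files = new HashMap<>();
    // "chunks" collection: one entry per chunk, holding that chunk's files_id
    final List<Integer> chunks = new ArrayList<>();
    // number of simulated server round trips
    int requests = 0;

    void store(int id, String filename, int chunkCount) {
        files.put(id, filename);
        for (int i = 0; i < chunkCount; i++) {
            chunks.add(id);
        }
    }

    // One request: select the ids of the files matching the query.
    List<Integer> findIds(Predicate<String> query) {
        requests++;
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, String> e : files.entrySet()) {
            if (query.test(e.getValue())) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // One request: remove a single file document by id.
    void removeFileById(int id) {
        requests++;
        files.remove(id);
    }

    // One request: remove every file document matching the query.
    void removeFilesByQuery(Predicate<String> query) {
        requests++;
        files.values().removeIf(query);
    }

    // One request: remove all chunks whose files_id is in the given ids ($in style).
    void removeChunksIn(Collection<Integer> ids) {
        requests++;
        chunks.removeIf(ids::contains);
    }

    // Behaviour described under "Problem": 1 select, then 2 removes per matched file.
    int removePerFile(Predicate<String> query) {
        for (int id : findIds(query)) {
            removeFileById(id);
            removeChunksIn(Collections.singletonList(id));
        }
        return requests;
    }

    // Behaviour described under "Solution": 3 requests regardless of the match count.
    int removeBatched(Predicate<String> query) {
        List<Integer> ids = findIds(query);
        removeFilesByQuery(query);
        removeChunksIn(ids);
        return requests;
    }

    public static void main(String[] args) {
        GridFsRemoveSketch perFile = new GridFsRemoveSketch();
        GridFsRemoveSketch batched = new GridFsRemoveSketch();
        for (int i = 0; i < 1000; i++) {
            perFile.store(i, "tmp-" + i, 2);
            batched.store(i, "tmp-" + i, 2);
        }
        Predicate<String> query = name -> name.startsWith("tmp-");
        System.out.println(perFile.removePerFile(query)); // 2001 (1 + 2 * 1000)
        System.out.println(batched.removeBatched(query)); // 3
    }
}
```

In this model, the per-file path costs 1 + 2N requests for N matched files, while the batched path stays at 3. The trade-off is that the `$in` list grows with N, so very large result sets may need to be split to stay within the server's document size limit.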

[Fixes : https://jira.mongodb.org/browse/JAVA-1125]

finalspy and others added 30 commits February 27, 2014 20:57
Added a test to ensure that getMore triggers a MongoExecutionTimeoutException
…not generate the index name if none is provided
Small documentation clarification and testcase update JAVA-1105
…Ensure that key checking and _id creation occurs for all insert paths
…turns false if the server is unable to provide the count. BulkWriteResult.getModifiedCount() now throws if the count is unavailable.
…ting of both write errors and write concern errors
…h either wnote or jnote as write concern errors
…r by removing unnecessary synchronization in DefaultServer
…hunks inserted into a collection using power-of-two allocator do not waste a lot of space
jyemin added 14 commits May 3, 2014 00:57
…properly when the driver is running as an OSGI module.
…the write commands are exceeded.

There is one limit for the number of items that are allowed in each command (maxWriteBatchSize from ismaster).
There is a second limit for the number of bytes in the encoded message (maxBsonObjectSize from ismaster), with an exception for a write command containing just a single item, which is allowed to exceed that limit.
…direct connections, and in general only use the set of server selectors that make sense for the connection mode and cluster type
…recated method is the one to get the results.
finalspy commented May 2, 2014

Updated/improved by #192
