
Performance enhancements for the MongoDB Java driver #66


Closed
wants to merge 4 commits into from



@thkala thkala commented Mar 16, 2012

After several runs under a profiler for a specific use case, I made a few changes that produced a noticeable throughput increase. The improvement was consistently in the 10-15% range (or better; that figure is rather conservative). This changeset affects three classes:

  • Adjustable socket input buffer in DBPort, set by default to 64K
  • Optimizations in PoolOutputBuffer
  • A significantly faster version of BasicBSONDecoder$BSONInput.readCstr()

The target benchmark was a single-threaded dump of a collection containing mostly ASCII C-style strings. The changes are self-contained: each commit can be applied independently. The test suite passes cleanly on my own single-server setup. Given the nature of the changes, I believe they are an improvement with no potential for adverse effects.

thkala added 4 commits March 15, 2012 14:55
that is wrapped around the DBPort network socket input stream.

- Set the default buffer size to 65536, overriding the default
BufferedInputStream buffer size, which is currently 8192 bytes.
This results in a consistent throughput increase of about 2% in
a microbenchmark. Users can set a different size, should the
need (or curiosity) arise.
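A minimal sketch of the idea in this commit, assuming plain `java.io` streams: the raw socket stream is wrapped in a `BufferedInputStream` whose size the caller chooses, defaulting to 64 KiB. The class `SocketInputBuffering`, its constant, and the `CountingInputStream` helper are hypothetical illustrations, not the driver's actual `DBPort` code. The counter makes the effect visible: a larger buffer asks the underlying stream for data fewer times.

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

public class SocketInputBuffering {
    // Hypothetical default mirroring the 64 KiB value proposed in the commit
    public static final int DEFAULT_INPUT_BUFFER_SIZE = 64 * 1024;

    // Wrap a raw stream in a BufferedInputStream with a caller-chosen size
    // (BufferedInputStream's own default is 8192 bytes)
    public static InputStream buffered(InputStream raw, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be > 0");
        }
        return new BufferedInputStream(raw, bufferSize);
    }

    // Convenience overload for an already connected socket
    public static InputStream buffered(Socket socket) throws IOException {
        return buffered(socket.getInputStream(), DEFAULT_INPUT_BUFFER_SIZE);
    }

    // Counts how often the buffer refills itself from the underlying stream
    public static final class CountingInputStream extends FilterInputStream {
        public int bulkReads = 0;

        public CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            bulkReads++;
            return super.read(b, off, len);
        }
    }
}
```

Reading the same data one byte at a time through an 8192-byte buffer triggers several times more refills than through a 65536-byte one; on a real socket, each refill typically becomes a separate native read.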
for optimization. Using an extra class field to avoid unnecessary
method calls results in a throughput increase of at least 2% in a
few benchmarks.
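The commit text does not show the `PoolOutputBuffer` internals, so the following is only a hypothetical illustration of the general technique, assuming a buffer made of fixed-size chunks: the chunk currently being written is cached in a field, so the hot `write(int)` path does not look it up through a method call on every byte. `ChunkedOutputBuffer` and its 16 KiB chunk size are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedOutputBuffer {
    private static final int CHUNK_SIZE = 16 * 1024; // hypothetical chunk size

    private final List<byte[]> chunks = new ArrayList<byte[]>();
    // Cached reference to the chunk being filled, so write(int) does not
    // call chunks.get(chunks.size() - 1) for every byte
    private byte[] current;
    private int posInCurrent;
    private int size;

    public ChunkedOutputBuffer() {
        current = new byte[CHUNK_SIZE];
        chunks.add(current);
    }

    public void write(int b) {
        if (posInCurrent == current.length) {
            // The cached field is refreshed only when a chunk fills up
            current = new byte[CHUNK_SIZE];
            chunks.add(current);
            posInCurrent = 0;
        }
        current[posInCurrent++] = (byte) b;
        size++;
    }

    public void write(byte[] b, int off, int len) {
        while (len > 0) {
            if (posInCurrent == current.length) {
                current = new byte[CHUNK_SIZE];
                chunks.add(current);
                posInCurrent = 0;
            }
            // Copy as much as fits into the current chunk in one call
            int n = Math.min(len, current.length - posInCurrent);
            System.arraycopy(b, off, current, posInCurrent, n);
            posInCurrent += n;
            off += n;
            len -= n;
            size += n;
        }
    }

    public int size() {
        return size;
    }

    public byte[] toByteArray() {
        byte[] result = new byte[size];
        int copied = 0;
        // Every chunk except the last is full, so copy min(chunk, remaining)
        for (byte[] chunk : chunks) {
            int n = Math.min(chunk.length, size - copied);
            System.arraycopy(chunk, 0, result, copied, n);
            copied += n;
        }
        return result;
    }
}
```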
reading bytes into a temporary buffer (_random) and using the more
efficient PoolOutputBuffer.write(byte[], int, int) method to update
the output buffer. Micro-benchmarks show a throughput increase in
the 5-6% range.
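A minimal sketch of the `readCstr()` change described above, under stated assumptions: bytes up to the terminating NUL are collected in a small temporary array (playing the role of `_random`) and flushed to the output with one bulk `write(byte[], int, int)` call per chunk, instead of one `write(int)` call per byte. A `java.io.ByteArrayOutputStream` stands in for `PoolOutputBuffer`, and the `CstrReaderSketch` class is hypothetical; the real `BSONInput` code may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class CstrReaderSketch {
    private final InputStream in;
    // Temporary buffer, analogous to _random in the commit message
    private final byte[] scratch = new byte[1024];
    // Stand-in for PoolOutputBuffer
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    public CstrReaderSketch(InputStream in) {
        this.in = in;
    }

    // Reads one NUL-terminated UTF-8 string (BSON "cstring")
    public String readCstr() throws IOException {
        out.reset();
        int n = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("unterminated C-string");
            }
            if (b == 0) {
                break; // terminator reached
            }
            scratch[n++] = (byte) b;
            if (n == scratch.length) {
                out.write(scratch, 0, n); // one bulk copy per full scratch buffer
                n = 0;
            }
        }
        out.write(scratch, 0, n); // flush the remainder
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```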
@scotthernandez
Contributor

Cool, the changes generally look good. I've added a JIRA issue to track this: https://jira.mongodb.org/browse/JAVA-541

@jyemin
Collaborator

jyemin commented Mar 18, 2012

Hi. Can you provide your benchmark in a comment or gist?

@thkala
Author

thkala commented Mar 18, 2012

Are you asking about the benchmark results, or the benchmark code itself? I could post the former in a comment, but the latter depends on significant parts of my codebase that I cannot release at this time, at least not without a lot of work.

@jyemin
Collaborator

jyemin commented Mar 19, 2012

The benchmark code. I understand you can't provide your application's source code, but can you create a small benchmark program that demonstrates the effectiveness of this change?

@thkala
Author

thkala commented Mar 24, 2012

I cobbled together this little benchmark. Just point it at a database and collection with lots of C-strings, although some of the improvements should be visible with any data:

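import java.net.UnknownHostException;

import com.mongodb.DBCursor;
import com.mongodb.Mongo;
import com.mongodb.MongoException;

public class CursorDumpBenchmark {
    public static void main(String[] args) {
        try {
            long time = System.nanoTime();

            // Connection setup is included in the measured time
            Mongo mongo = new Mongo("127.0.0.1", 27017);

            // Full scan of the target collection ("db" and "collection" are placeholders)
            DBCursor cur = mongo.getDB("db").getCollection("collection").find();

            long count = 0;

            while (cur.hasNext()) {
                cur.next(); // decode each document and discard it

                ++count;

                // Progress indicator: one dot per 100,000 documents
                if ((count % 100000) == 0) {
                    System.out.print('.');
                }
            }

            time = System.nanoTime() - time;

            System.out.println("\n" + count + " documents in " + (time / 1000000000D) + " seconds");

            mongo.close(); // release pooled connections
        } catch (UnknownHostException e) {
            e.printStackTrace();
        } catch (MongoException e) {
            e.printStackTrace();
        }
    }
}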
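The program above times a single pass, including connection setup and JIT warm-up, which may blur a 10-15% difference. One common refinement, sketched here under the assumption that the cursor loop is wrapped in a `Runnable`, is to run a few untimed warm-up passes and report the median of several timed passes. `RepeatedTiming` is a hypothetical helper, not part of the driver.

```java
import java.util.Arrays;

public class RepeatedTiming {
    // Runs `task` `warmups` times untimed (lets the JIT compile hot paths),
    // then times `runs` executions and returns the median in seconds.
    public static double medianSeconds(Runnable task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be > 0");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        double[] samples = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = (System.nanoTime() - start) / 1e9;
        }
        Arrays.sort(samples);
        return (runs % 2 == 1)
                ? samples[runs / 2]
                : (samples[runs / 2 - 1] + samples[runs / 2]) / 2.0;
    }
}
```

In the benchmark, the runnable's body would call `find()` and iterate the cursor, while a single `Mongo` instance created outside the timed region is reused across passes.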

In my own code I was using a custom callback that did some pre-processing on the received documents. This benchmark is as simple as it gets, and it seems to highlight the performance improvements even more: I see a 25% improvement over the 2.7.3 stable release. I'll try comparing against git master shortly.
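A "25% improvement" can mean either 25% higher throughput or a 25% shorter run, and the two differ. For a fixed document count, the arithmetic is sketched below; the timings are illustrative inputs, not measurements from this thread, and `Improvement` is a hypothetical helper.

```java
public class Improvement {
    // Throughput gain for the same document count:
    // (old time - new time) / new time
    public static double throughputGainPercent(double oldSeconds, double newSeconds) {
        return (oldSeconds - newSeconds) / newSeconds * 100.0;
    }

    // Reduction in elapsed time relative to the old run:
    // (old time - new time) / old time
    public static double timeReductionPercent(double oldSeconds, double newSeconds) {
        return (oldSeconds - newSeconds) / oldSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Illustrative inputs only: 10 s before, 8 s after
        System.out.println(throughputGainPercent(10.0, 8.0)); // 25.0
        System.out.println(timeReductionPercent(10.0, 8.0));  // 20.0
    }
}
```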

@gerner

gerner commented Nov 6, 2012

Are there any plans to incorporate these changes, in particular the buffer size? I'm working with large result sets, and I suspect a larger network buffer would help.

@trishagee
Contributor

Hi,

We're currently working on a new driver on the 3.0 branch. We won't be making major changes to the 2.x driver, especially ones made purely for performance, but we do hope the 3.0 driver will (eventually) be more performant. We certainly hope that people will be able to tweak and tune it better.

When that's released, please feel free to profile that version and send us feedback.

Trisha

@trishagee trishagee closed this Jul 2, 2013

5 participants