Performance enhancements for the MongoDB Java driver #66

thkala · 2012-03-16T00:34:36Z

After several runs under a profiler for a specific use case, a few alterations resulted in a noticeable throughput increase. The improvement was consistently measured in the 10-15% range (or better - this number is rather conservative). This changeset affects three classes:

Adjustable socket input buffer in DBPort, set by default to 64K
Optimizations in PoolOutputBuffer
A significantly faster version of BasicBSONDecoder$BSONInput.readCStr()

The target benchmark was a single-threaded dump of a collection that contains mostly ASCII C-style strings. The changes are self-contained - each commit can be applied independently. The test suite passes cleanly on my own single-server setup. Considering the particulars, I believe these changes to be an improvement with no potential for adverse effects.

[...] that is wrapped around the DBPort network socket input stream. Set the default buffer size to 65536, overriding the default BufferedInputStream buffer size, which currently seems to be 8192 bytes. This results in a consistent throughput increase of about 2% in a microbenchmark. Users can set a different size, should the need (or curiosity) arise.
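As a rough illustration of the buffering change described above, here is a minimal sketch that wraps a raw input stream in a BufferedInputStream with an explicit 64K buffer. It is not the DBPort code: the class name, the constant name and the helper methods are hypothetical, and a ByteArrayInputStream stands in for the stream that would normally come from socket.getInputStream().

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedSocketInput {
    // Hypothetical constant mirroring the 64K default proposed in this changeset.
    public static final int DEFAULT_INPUT_BUFFER_SIZE = 65536;

    // Wraps a raw stream (for example socket.getInputStream()) in a buffer of
    // the requested size instead of the 8192-byte JDK default.
    public static InputStream wrap(InputStream raw, int bufferSize) {
        return new BufferedInputStream(raw, bufferSize);
    }

    // Reads the stream to the end and returns the number of bytes consumed.
    public static long drain(InputStream in) throws IOException {
        byte[] chunk = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for network data; a real test would read from a socket.
        byte[] payload = new byte[200000];
        InputStream in = wrap(new ByteArrayInputStream(payload), DEFAULT_INPUT_BUFFER_SIZE);
        System.out.println(drain(in) + " bytes read");
    }
}
```

With an in-memory source the buffer size makes no measurable difference; any gain would only show up when each underlying read is a comparatively expensive socket call.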

According to the profiler, PoolOutputBuffer.reset() is a good target for optimization. Using an extra class field to avoid unnecessary method calls results in a throughput increase of at least 2% in a few benchmarks.

Improve the performance of BasicBSONDecoder$BSONInput.readCStr() by reading bytes into a temporary buffer (_random) and using the more efficient PoolOutputBuffer.write(byte[], int, int) method to update the output buffer. Micro-benchmarks show a throughput increase in the 5-6% range.
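The general idea of staging bytes in a scratch array and flushing them with a bulk write can be sketched as follows. This is an assumption-laden illustration, not the driver's code: the class CStringReader is hypothetical, a ByteArrayOutputStream stands in for PoolOutputBuffer, and the scratch parameter plays the role of the _random buffer mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class CStringReader {
    // Reads one NUL-terminated string. Bytes are collected in the scratch
    // array and copied to the output with write(byte[], int, int), so the
    // output buffer is touched once per chunk rather than once per byte.
    public static String readCStr(InputStream in, byte[] scratch) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stand-in for PoolOutputBuffer
        int used = 0;
        int b;
        // read() returns 0 for the terminator and -1 at end of stream.
        while ((b = in.read()) > 0) {
            scratch[used++] = (byte) b;
            if (used == scratch.length) {
                out.write(scratch, 0, used); // scratch is full: flush in bulk
                used = 0;
            }
        }
        if (b == -1) {
            throw new EOFException("unterminated C-string");
        }
        out.write(scratch, 0, used); // flush whatever remains
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello\0world\0".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(data);
        byte[] scratch = new byte[4]; // deliberately tiny to force a mid-string flush
        System.out.println(readCStr(in, scratch)); // prints "hello"
        System.out.println(readCStr(in, scratch)); // prints "world"
    }
}
```

A short string that fits in the scratch array needs a single bulk write, which is presumably where a collection of mostly short ASCII strings would benefit.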

scotthernandez · 2012-03-18T14:43:36Z

Cool, the changes look good generally. I've added a jira issue to track this: https://jira.mongodb.org/browse/JAVA-541

jyemin · 2012-03-18T22:40:10Z

Hi. Can you provide your benchmark in a comment or gist?

thkala · 2012-03-18T23:27:08Z

Are you asking about the benchmark results, or the benchmark code itself? The first I could type into a comment, but the second refers to significant parts of my codebase that I would not be able to release at this time - at least not without a lot of work.

jyemin · 2012-03-19T11:57:47Z

The benchmark code. I understand you can't provide your application's source code, but can you create a small benchmark program that demonstrates the effectiveness of this change?

thkala · 2012-03-24T08:58:00Z

I cobbled together this little benchmark. Just point it at a DB and collection with lots of C-strings, although some of the improvements should be visible everywhere:

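import java.net.UnknownHostException;

import com.mongodb.DBCursor;
import com.mongodb.Mongo;
import com.mongodb.MongoException;

// The wrapper class name is arbitrary; only the body of main() matters.
public class CursorBenchmark {
    public static void main(String[] args) {
        try {
            // The timer covers connection setup as well as the full scan.
            long time = System.nanoTime();

            Mongo mongo = new Mongo("127.0.0.1", 27017);

            // Unfiltered find(): every document is fetched and decoded.
            DBCursor cur = mongo.getDB("db").getCollection("collection").find();

            long count = 0;

            while (cur.hasNext()) {
                cur.next();

                ++count;

                // Progress marker every 100,000 documents.
                if ((count % 100000) == 0) {
                    System.out.print('.');
                }
            }

            time = System.nanoTime() - time;

            System.out.println("\n" + count + " documents in " + (time / 1000000000D) + " seconds");
        } catch (UnknownHostException e) {
            e.printStackTrace();
        } catch (MongoException e) {
            e.printStackTrace();
        }
    }
}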

In my own code I was using a custom callback that did some pre-processing on the received documents. This one is as simple as it gets and it seems to highlight the performance improvements even more. I see a 25% improvement over the 2.7.3 stable release - I'll see about comparing with git master shortly...

gerner · 2012-11-06T19:56:05Z

Are there any plans to incorporate any of these changes (in particular the buffer size)? I'm working with large result sets and I suspect a larger network buffer size would help things for me.

trishagee · 2013-07-02T16:12:51Z

Hi,

We're currently working on a new driver, on the 3.0 branch. We won't be making major changes to the 2.x driver, especially simply for performance, but we do hope the 3.0 driver will (eventually) be more performant. We certainly hope that people will be able to tweak and tune it better.

When that's released, please feel free to profile that version and send us feedback.

Trisha

thkala added 4 commits March 15, 2012 14:55

According to the profiler, PoolOutputBuffer.reset() is a good target for optimization. Using an extra class field to avoid unnecessary method calls results in a throughput increase of at least 2% in a few benchmarks.

495f50c

Improve the performance of BasicBSONDecoder$BSONInput.readCStr() by reading bytes into a temporary buffer (_random) and using the more efficient PoolOutputBuffer.write(byte[], int, int) method to update the output buffer. Micro-benchmarks show a throughput increase in the 5-6% range.

ee3523b

Merge remote-tracking branch 'upstream/master'

515bb9f

trishagee closed this Jul 2, 2013
