This repository has been archived by the owner on Jan 29, 2022. It is now read-only.

HADOOP-213: Reuse DBObjects when inserting data #129

Closed
wants to merge 2 commits into from

nadenf commented Jul 13, 2015

In the MongoOutputCommitter, a new DBObject is created for each insert, resulting in unnecessary object allocation.

Informal testing showed a noticeable performance improvement (5-10%) over tens of millions of inserts.

@@ -102,12 +105,12 @@ public void commitTask(final TaskAttemptContext taskContext) {
     int mwType = inputStream.readInt();
     if (MongoWritableTypes.BSON_WRITABLE == mwType) {
         bw.readFields(inputStream);
-        bulkOp.insert(new BasicDBObject(bw.getDoc().toMap()));
+        insert.putAll(bw.getDoc().toMap());

putAll won't remove the fields that were in insert from a previous document, so if not all documents have exactly the same schema, then later documents will have extraneous fields in them. A similar problem applies to query and modifiers below as well. If you made these all BasicDBObjects instead of DBObjects, you could call the clear method before putAll.
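
The stale-field problem described in this comment can be reproduced without the MongoDB driver. The following is a minimal sketch, not code from the pull request: it uses a plain LinkedHashMap as a stand-in for the reused object (BasicDBObject is itself a LinkedHashMap<String, Object>, so putAll and clear behave the same way), and the class name, method names, and sample fields are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReuseDemo {

    // Reuse one map for two documents WITHOUT clearing it in between.
    // Keys that exist only in the first document leak into the second.
    static Map<String, Object> reuseWithoutClear(Map<String, Object> first,
                                                 Map<String, Object> second) {
        Map<String, Object> insert = new LinkedHashMap<>(); // stand-in for the reused object
        insert.putAll(first);
        insert.putAll(second); // overwrites shared keys, keeps the rest
        return insert;
    }

    // Same reuse, but clear() is called before loading the next document,
    // as the review comment suggests.
    static Map<String, Object> reuseWithClear(Map<String, Object> first,
                                              Map<String, Object> second) {
        Map<String, Object> insert = new LinkedHashMap<>();
        insert.putAll(first);
        insert.clear(); // drop every field of the previous document
        insert.putAll(second);
        return insert;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("_id", 1);
        first.put("name", "a");
        first.put("tags", "x"); // field present only in the first document

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("_id", 2);
        second.put("name", "b");

        System.out.println(reuseWithoutClear(first, second)); // {_id=2, name=b, tags=x}
        System.out.println(reuseWithClear(first, second));    // {_id=2, name=b}
    }
}
```

This only illustrates the schema leak. Whether reusing a single object is safe at all also depends on whether the bulk operation copies each document when it is queued, which the sketch does not address.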

llvtt commented Jul 13, 2015

Thanks for the pull request! This change sounds fine to me after addressing the one comment.

nadenf commented Jul 14, 2015

New pull request submitted with changes.

llvtt commented Aug 21, 2015

I'm just checking in on this. Did you see the comment I made on https://jira.mongodb.org/browse/HADOOP-213?

llvtt added the core label Aug 21, 2015

llvtt commented Jul 13, 2016

Closing this PR for now, since the design of this pull request introduces some problems of its own that outweigh the performance benefits. Perhaps reusing a pool of DBObjects can be investigated at a later date for performance gain.

llvtt closed this Jul 13, 2016