This repository has been archived by the owner on Jan 29, 2022. It is now read-only.

HADOOP-213: Reuse DBObjects when inserting data #129

Closed
wants to merge 2 commits into from
Changes from 1 commit
@@ -95,19 +95,22 @@ public void commitTask(final TaskAttemptContext taskContext) {
         // Read Writables out of the temporary file.
         BSONWritable bw = new BSONWritable();
         MongoUpdateWritable muw = new MongoUpdateWritable();
+        DBObject query = new BasicDBObject();
+        DBObject insert = new BasicDBObject();
+        DBObject modifiers = new BasicDBObject();
         while (filePos < fileLen) {
             try {
                 // Determine writable type, and perform corresponding operation
                 // on MongoDB.
                 int mwType = inputStream.readInt();
                 if (MongoWritableTypes.BSON_WRITABLE == mwType) {
                     bw.readFields(inputStream);
-                    bulkOp.insert(new BasicDBObject(bw.getDoc().toMap()));
+                    insert.putAll(bw.getDoc().toMap());

putAll won't remove fields left in insert by a previous document, so unless every document has exactly the same set of fields, later documents will carry extraneous fields. The same problem applies to query and modifiers below. If you declared all three as BasicDBObject instead of DBObject, you could call clear() before each putAll.
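
A minimal sketch of the problem described in this comment. It assumes a plain java.util.LinkedHashMap as a stand-in for the reused object (BasicDBObject is itself a LinkedHashMap subclass, which is why clear() is available on it), so it runs without the MongoDB driver. The class name ReuseDemo, the helper fill, and the sample documents are hypothetical and are not part of this pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo: LinkedHashMap stands in for a reused BasicDBObject.
class ReuseDemo {

    // Copies doc into the reused holder, optionally clearing it first.
    static Map<String, Object> fill(Map<String, Object> reused,
                                    Map<String, Object> doc,
                                    boolean clearFirst) {
        if (clearFirst) {
            reused.clear(); // drop fields left over from the previous document
        }
        reused.putAll(doc); // overwrites matching keys, never removes others
        return reused;
    }

    public static void main(String[] args) {
        // Two sample documents with different sets of fields.
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("_id", 1);
        first.put("name", "a");
        first.put("tag", "x");

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("_id", 2);
        second.put("name", "b");

        // Reuse without clearing: "tag" from the first document leaks through.
        Map<String, Object> reused = new LinkedHashMap<>();
        fill(reused, first, false);
        fill(reused, second, false);
        System.out.println(reused); // {_id=2, name=b, tag=x}

        // Reuse with clear(): only the second document's fields remain.
        fill(reused, first, true);
        fill(reused, second, true);
        System.out.println(reused); // {_id=2, name=b}
    }
}
```

With documents that all share one schema the two variants behave the same, which is why the stale field only shows up on mixed input.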

+                    bulkOp.insert(insert);
                 } else if (MongoWritableTypes.MONGO_UPDATE_WRITABLE == mwType) {
                     muw.readFields(inputStream);
-                    DBObject query = new BasicDBObject(muw.getQuery().toMap());
-                    DBObject modifiers =
-                        new BasicDBObject(muw.getModifiers().toMap());
+                    query.putAll(muw.getQuery().toMap());
+                    modifiers.putAll(muw.getModifiers().toMap());
                     if (muw.isMultiUpdate()) {
                         if (muw.isUpsert()) {
                             bulkOp.find(query).upsert().update(modifiers);