Adding a field to the JSON of a PDF in MongoDB => NullPointerException for the river #91

Closed
antoinecarton opened this issue Jun 20, 2013 · 5 comments
@antoinecarton

Hi,

First of all, here is the exception from ElasticSearch:

Exception in thread "elasticsearch[Nathaniel Richards][mongodb_river_slurper][T#1]" java.lang.NullPointerException
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.processOplogEntry(MongoDBRiver.java:1074)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:986)
at java.lang.Thread.run(Thread.java:679)

Here is my configuration:
River: 1.6.9
ElasticSearch: 0.90.1
MongoDB: 2.4.4

Configuration used for MongoDB:
http://docs.mongodb.org/manual/tutorial/deploy-replica-set/, section "Deploy a Development or Test Replica Set"

Next, in a console:

mongo --port 27017
use pdf_database5

In a second console, I add a PDF file:

mongofiles --host localhost:27017 --db pdf_database5 --collection fs --type applicaton/pdf put /PATH_TO_A_PDF

After that, I create a MongoDB river for ElasticSearch:

curl -XPUT "${host}/_river/mongodb/_meta" -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "pdf_database5",
    "collection": "fs",
    "gridfs": true
  },
  "index": {
    "name": "mongoindex",
    "type": "files"
  }
}'

Up to this point, everything works: my PDF file is correctly indexed and full-text search returns it.

However, the problem appears once I add a field to the JSON document of the PDF file. First, I look up the file in the MongoDB console:

db.fs.files.find({});

(For instance, 51c05f881a13d534df7463c4 is the ID of my PDF.)

I then add a field "titleDoc" to the object with the ID 51c05f881a13d534df7463c4 using the following command:

db.fs.files.update({"_id": ObjectId("51c05f881a13d534df7463c4")}, {$set: {"titleDoc":"MY TITLE DOC"}})

I then get the exception in the ElasticSearch log. I tried editing the _mapping in ElasticSearch, but the error persists.

Maybe the error occurs because I forgot some river setting needed to map new fields on raw files such as PDFs in Mongo.

Thanks in advance,

Antoine

@richardwilly98
Owner

Hi Antoine,

Additional GridFS metadata should be stored in the metadata attribute (see [1]).

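// Assumption: doc was loaded beforehand, e.g. doc = db.fs.files.findOne({"filename": "test-document.pdf"})
doc.metadata = {}
doc.metadata.title = "woww"
db.fs.files.save(doc)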
{
        "_id" : ObjectId("51c78a054ce10426a81a3e27"),
        "filename" : "test-document.pdf",
        "chunkSize" : 262144,
        "uploadDate" : ISODate("2013-06-23T23:51:33.229Z"),
        "md5" : "947090a3e9cac07c13adabb25b9a3fa9",
        "length" : 50573,
        "contentType" : "applicaton/pdf",
        "title" : "test",
        "metadata" : {
                "title" : "woww"
        }
}
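
The read-modify-save steps above can be sketched as a small helper in plain JavaScript (the mongo shell's language). This is only an illustration: the name `withMetadata` is hypothetical and is not part of MongoDB or the river. It builds the full document that would then be handed to db.fs.files.save(), which is what makes the oplog entry carry the whole document, including its _id.

```javascript
// Hypothetical helper (not a MongoDB or river API): returns a shallow copy of
// a fs.files document with one metadata field set. The input is not modified.
function withMetadata(doc, key, value) {
  var copy = {};
  var metadata = {};
  var existing = doc.metadata || {};
  var k;
  // Copy the top-level fields (_id, filename, length, ...) as they are.
  for (k in doc) {
    if (Object.prototype.hasOwnProperty.call(doc, k)) copy[k] = doc[k];
  }
  // Copy any metadata that is already there, then set the new field.
  for (k in existing) {
    if (Object.prototype.hasOwnProperty.call(existing, k)) metadata[k] = existing[k];
  }
  metadata[key] = value;
  copy.metadata = metadata;
  return copy;
}

// Assumed usage in the mongo shell:
//   var doc = db.fs.files.findOne({ "filename": "test-document.pdf" });
//   db.fs.files.save(withMetadata(doc, "title", "woww"));
```

Because save() writes the complete document back, it sidesteps partial-update operators such as $set.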

Does it help?

[1] - http://docs.mongodb.org/manual/reference/gridfs/#gridfs-files-collection

Thanks,
Richard.

@antoinecarton
Author

Hi,

Thank you for your answer.

You are right about the metadata attribute. However, I had already tried using it, and I still get the problem with the following steps:

My initial object:

{ "_id" : ObjectId("51c7f5dc71f6549c212cae37"), "filename" : "/home/acarton/Téléchargements/Cairngorm.pdf", "chunkSize" : 262144, "uploadDate" : ISODate("2013-06-24T07:31:41.611Z"), "md5" : "2d7d1f636a4e07b675eebb873330205e", "length" : 661649, "contentType" : "applicaton/pdf" }

The update command:

db.fs.files.update({"_id": ObjectId("51c7f5dc71f6549c212cae37")}, {$set: {"metadata.titleDoc":"Framework CAIRNGORM"}})

And the final object:

{ "_id" : ObjectId("51c7f5dc71f6549c212cae37"), "chunkSize" : 262144, "contentType" : "applicaton/pdf", "filename" : "/home/acarton/Téléchargements/Cairngorm.pdf", "length" : 661649, "md5" : "2d7d1f636a4e07b675eebb873330205e", "metadata" : { "titleDoc" : "Framework CAIRNGORM" }, "uploadDate" : ISODate("2013-06-24T07:31:41.611Z") }

I still get the NullPointerException with this update command.

However, the steps you gave work fine. What is the difference between the "update" and the "save" commands?

Thank you in advance,

Antoine

@richardwilly98
Owner

Hi,

The oplog entry is different for a $set operation.

The entry for the "save" operation is:

{
        "ts" : {
                "t" : 1372032972,
                "i" : 1
        },
        "h" : NumberLong("2162081457563127592"),
        "v" : 2,
        "op" : "u",
        "ns" : "mydb91.fs.files",
        "o2" : {
                "_id" : ObjectId("51c78a054ce10426a81a3e27")
        },
        "o" : {
                "_id" : ObjectId("51c78a054ce10426a81a3e27"),
                "filename" : "test-document.pdf",
                "chunkSize" : 262144,
                "uploadDate" : ISODate("2013-06-23T23:51:33.229Z"),
                "md5" : "947090a3e9cac07c13adabb25b9a3fa9",
                "length" : 50573,
                "contentType" : "applicaton/pdf",
                "title" : "test",
                "metadata" : {
                        "title" : "woww"
                }
        }
}

For the $set operation:

{
        "ts" : {
                "t" : 1372065805,
                "i" : 1
        },
        "h" : NumberLong("8302104313737943305"),
        "v" : 2,
        "op" : "u",
        "ns" : "mydb91.fs.files",
        "o2" : {
                "_id" : ObjectId("51c78a07ae251a257e0e4d3e")
        },
        "o" : {
                "$set" : {
                        "metadata.titleDoc" : "test91"
                }
        }
}

The object ID was extracted from "o", but with $set it is only available in "o2". I will fix the code soon.
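
To make the difference between the two entries concrete, here is a rough sketch in plain JavaScript. It is not the river's actual Java code, and the function names `extractObjectId` and `isOperatorUpdate` are hypothetical. It only illustrates the idea of reading _id from "o" when present and falling back to "o2" otherwise.

```javascript
// Sketch only: illustrates the fallback idea, not the river's implementation.
// Returns the _id targeted by an oplog entry, or null if none is found.
function extractObjectId(entry) {
  var o = entry.o || {};
  var o2 = entry.o2 || {};
  // Full-document entries (such as the one produced by save) carry _id in "o".
  if (o._id !== undefined && o._id !== null) return o._id;
  // Operator updates such as $set only carry _id in "o2".
  if (o2._id !== undefined && o2._id !== null) return o2._id;
  return null;
}

// True when "o" of an update entry holds operators ($set, ...) rather than
// a full replacement document.
function isOperatorUpdate(entry) {
  if (entry.op !== "u" || !entry.o) return false;
  for (var key in entry.o) {
    if (key.charAt(0) === "$") return true;
  }
  return false;
}
```

With entries shaped like the two above, the "save" entry yields its _id from "o", while the $set entry yields it from "o2" and is flagged as an operator update.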

richardwilly98 added a commit that referenced this issue Jun 24, 2013
@antoinecarton
Author

Perfect! Thank you!

@richardwilly98
Owner

Fix is available in release 1.6.11.

Thanks,
Richard.

cheald referenced this issue in kdkeck/elasticsearch-river-mongodb Apr 18, 2014