mongodb river does not index all documents #229

Open
salzamt opened this issue Mar 9, 2014 · 3 comments
salzamt commented Mar 9, 2014

Hi,
I have a few collections to index with the MongoDB river. Two of them (about 100,000 documents) are indexed without problems, but the last one (about 500,000 documents) stops after a seemingly random number of documents (1,000 one time, then 2,000, then 1,500, and so on). I always delete the river and add it again between attempts.
Any idea why this happens?
I create a simple river with:
curl -XPUT "localhost:9200/_river/content.videos/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      { "host": "xxxx", "port": "27017" },
      { "host": "xxxx", "port": "27017" },
      { "host": "xxxx", "port": "27017" }
    ],
    "options": {
      "secondary_read_preference": true
    },
    "db": "tubeomatic",
    "collection": "content.videos.raw",
    "gridfs": false
  },
  "index": {
    "name": "tubeomatic",
    "type": "content.videos"
  }
}'

It runs on a Linux machine in the Amazon cloud and connects to MongoDB within the same security group.
I am working with ES 1.0.0 and 2.4.8.
When it stops, the log shows only this:
[2014-03-09 13:20:57,493][INFO ][cluster.metadata ] [tom-el-01] [tubeomatic] update_mapping contentVideos
[2014-03-09 13:20:57,568][INFO ][cluster.metadata ] [tom-el-01] [tubeomatic] update_mapping contentVideos
[2014-03-09 13:20:59,739][INFO ][cluster.metadata ] [tom-el-01] [tubeomatic] update_mapping contentVideos
[2014-03-09 13:21:01,775][INFO ][cluster.metadata ] [tom-el-01] [tubeomatic] update_mapping contentVideos

[2014-03-09 13:21:29,565][DEBUG][action.bulk ] [tom-el-01] [tubeomatic][2] failed to execute bulk item (index) index {[tubeomatic][contentVideos][52e773ad5e5b95574c00f27f], source[{"title":"asdfasdf","_id":"52e773ad5e5b95574c00f27f","search":{"cat":"asdf","title":"asdf","cp":"asdf","or":"straight"},"keyTermCriteria":{"cat":"sadf","title":"asdf","cp":"asdf","or":"straight"},"created":"2014-01-28T09:09:01.318Z","foreign_id":"1387249","contentpartner_token":"asdf","prevpics_s3":[{"url":"videos/pics//0.jpeg","sort":0},{"url":"videos/pics/52e773ad5e5b95574c00f27f/1.jpeg","sort":1},{"url":"videos/pics//2.jpeg","sort":2},{"url":"videos/pics/52e773ad5e5b95574c00f27f/3.jpeg","sort":3},{"url":"videos/pics//4.jpeg","sort":4},{"url":"videos/pics/52e773ad5e5b95574c00f27f/5.jpeg","sort":5},{"url":"videos/pics//6.jpeg","sort":6}],"length":429,"prevpics":[{"url":"http://cdn1.vidcaps..com/0/0/0/2/7/7/8/4/9/archive/0018.jpeg","sort":0},{"url":"http://cdn1.vidcaps..com/0/0/0/2/7/7/8/4/9/archive/0021.jpeg","sort":1},{"url":"http://cdn1.vidcaps..com/0/0/0/2/7/7/8/4/9/archive/0022.jpeg","sort":2},{"url":"http://cdn1.vidcaps..com/0/0/0/2/7/7/8/4/9/archive/0028.jpeg","sort":3},{"url":"http://cdn1.vidcaps..com/0/0/0/2/7/7/8/4/9/archive/0026.jpeg","sort":4},{"url":"http://cdn1.vidcaps..com/0/0/0/2/7/7/8/4/9/archive/0013.jpeg","sort":5},{"url":"http://cdn1.vidcaps..com/0/0/0/2/7/7/8/4/9/archive/0012.jpeg","sort":6}],"contentpartner":54,"categories":["asf"],"url":"http://www.asdf.com/videos/asdf?cid=2312","orientations":["straight"]}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [keyTermCriteria]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:418)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:517)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:459)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:392)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:394)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:153)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: unknown property [cat]
at org.elasticsearch.index.mapper.core.StringFieldMapper.parseCreateFieldForString(StringFieldMapper.java:331)
at org.elasticsearch.index.mapper.core.StringFieldMapper.parseCreateField(StringFieldMapper.java:277)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:408)
... 12 more

Thanks in advance.

salzamt commented Mar 9, 2014

I think I found the error: some documents in Mongo have an array under the "orientations" attribute, while others have an object. It seems that this is too schemaless for Elasticsearch :P.
I guess I need to bring the Mongo data into a consistent schema?
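
One way to confirm this kind of diagnosis is to scan the documents and report every field that shows up with more than one JSON type. The following is only a minimal sketch: the function names and the two sample documents are hypothetical, it looks at top-level fields only, and in practice the documents would come from a MongoDB cursor rather than a literal list.

```python
from collections import defaultdict


def json_type(value):
    """Return a coarse JSON type name for a value.

    Arrays are reported as the type of their elements, because
    Elasticsearch maps an array field like a single value of that type.
    """
    if isinstance(value, list):
        kinds = {json_type(v) for v in value}
        if not kinds:
            return "empty"
        return kinds.pop() if len(kinds) == 1 else "mixed"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, bool):  # must come before the int check
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


def find_type_conflicts(docs):
    """Map each top-level field to its types, keeping only conflicting fields."""
    seen = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            kind = json_type(value)
            if kind not in ("null", "empty"):  # these never pin down a mapping
                seen[field].add(kind)
    return {field: sorted(kinds) for field, kinds in seen.items() if len(kinds) > 1}


# Hypothetical sample data imitating the situation described above.
docs = [
    {"_id": "a", "orientations": ["straight"], "length": 429},
    {"_id": "b", "orientations": {"primary": "straight"}, "length": 300},
]
print(find_type_conflicts(docs))
```

Any field listed in the result is a candidate for the kind of MapperParsingException shown in the log, since the first document indexed fixes the field's mapping and later documents of a different shape are rejected.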

salzamt commented Mar 10, 2014

After I created a schema so that each field in a document can only ever have one type, everything works fine!
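
The thread does not show the schema that was actually used, so the following is only an illustrative sketch of the idea: coerce a field to one shape before the documents reach Elasticsearch. The helper names are hypothetical, and the rule for objects (keep their values as the list of orientation names) is an assumption about data that is not shown here.

```python
def normalize_orientations(value):
    """Coerce the "orientations" field to a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        # Assumption: the object's values are the orientation names.
        return [str(v) for v in value.values()]
    return [str(v) for v in value]


def normalize_doc(doc):
    """Return a copy of the document with a consistently typed field."""
    fixed = dict(doc)  # shallow copy, the input document is left untouched
    if "orientations" in fixed:
        fixed["orientations"] = normalize_orientations(fixed["orientations"])
    return fixed


# Hypothetical documents in the two shapes mentioned in the thread.
as_array = {"_id": "a", "orientations": ["straight"]}
as_object = {"_id": "b", "orientations": {"0": "straight"}}
print(normalize_doc(as_array))
print(normalize_doc(as_object))
```

Whether this cleanup runs once as a migration in MongoDB or in the application that writes the documents is a design choice; either way the river then only sees one type per field.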

qraynaud (Contributor) commented

Yeah. I advise you to use templates to create mappings dynamically. Here is the related documentation: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html
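
As a rough illustration of that suggestion, the sketch below builds a template body in the format used by the Elasticsearch 1.x series from this thread, where the `template` key holds the index name pattern (later major versions changed this format). The field names come from the document in the log above; the function name, the `tubeomatic*` pattern, and the choice of the type name `contentVideos` (as it appears in the log) are assumptions.

```python
import json


def build_template(index_pattern, type_name):
    """Build an ES 1.x index template that fixes the shape of two fields."""
    return {
        "template": index_pattern,  # index name pattern the template applies to
        "mappings": {
            type_name: {
                "properties": {
                    # An array of strings is mapped as a plain string field.
                    "orientations": {"type": "string"},
                    "keyTermCriteria": {
                        "type": "object",
                        "properties": {
                            "cat": {"type": "string"},
                            "title": {"type": "string"},
                            "cp": {"type": "string"},
                            "or": {"type": "string"},
                        },
                    },
                }
            }
        },
    }


body = build_template("tubeomatic*", "contentVideos")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent with a PUT request to `/_template/<name>` before the index is created. A template only declares the expected shape: documents whose fields do not match it are still rejected, so the data itself has to be consistent as well.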
