org.elasticsearch.index.mapper.MapperParsingException while adding Tweets #1331

Closed
ramv opened this Issue Sep 13, 2011 · 16 comments

@ramv

curl -XPUT http://localhost:9200/tweets/tweet/113429251348373500 -d ' {"contributors":{},"truncated":false,"text":"@SocialMediaCoop Americans Spend 23% of Online Time on Social Networks #socialmedia tweeps http://t.co/UVUOcA7 #facebook... via @DioFavatas","geo":{},"entities":{"urls":[{"indices":[91,110],"display_url":"mydio.me/o1jUPk","expanded_url":"http://mydio.me/o1jUPk","url":"http://t.co/UVUOcA7"}],"hashtags":[{"text":"socialmedia","indices":[71,83]},{"text":"facebook","indices":[111,120]}],"user_mentions":[{"indices":[0,16],"screen_name":"SocialMediaCoop","name":"Social Media Coop","id":188151799,"id_str":"188151799"},{"indices":[128,139],"screen_name":"DioFavatas","name":"Dio Favatas","id":16186656,"id_str":"16186656"}]},"favorited":false,"place":{},"coordinates":{},"source":"Evion","in_reply_to_screen_name":"SocialMediaCoop","in_reply_to_user_id":188151799,"possibly_sensitive":false,"retweeted":false,"created_at":"Tue Sep 13 01:50:15 +0000 2011","in_reply_to_status_id_str":{},"user":{"contributors_enabled":false,"profile_background_image_url":"http://a0.twimg.com/images/themes/theme1/bg.png","show_all_inline_media":false,"geo_enabled":false,"profile_image_url":"http://a2.twimg.com/profile_images/1099763562/evion_logo_3_normal.png","profile_text_color":"333333","profile_image_url_https":"https://si0.twimg.com/profile_images/1099763562/evion_logo_3_normal.png","location":"","default_profile_image":false,"lang":"en","profile_background_image_url_https":"https://si0.twimg.com/images/themes/theme1/bg.png","profile_sidebar_fill_color":"DDEEF6","description":"Evion lets you discover interesting tweets from other people on Twitter as well as find interested new readers for your tweets.","screen_name":"evwo6","statuses_count":17950,"profile_background_tile":false,"default_profile":true,"followers_count":755,"follow_request_sent":{},"following":{},"notifications":{},"friends_count":720,"profile_link_color":"0084B4","verified":false,"created_at":"Fri Aug 06 07:59:25 
+0000 2010","profile_sidebar_border_color":"C0DEED","protected":false,"favourites_count":0,"name":"Evion","is_translator":false,"profile_use_background_image":true,"id":175318654,"id_str":"175318654","listed_count":19,"time_zone":"Quito","utc_offset":-18000,"profile_background_color":"C0DEED","url":"http://evion.org"},"in_reply_to_status_id":{},"id":113429250345930750,"in_reply_to_user_id_str":"188151799","id_str":"113429250345930752","retweet_count":0,"updated_at":1315878615699}'

{"error":"MapperParsingException[object_mapper [tweet] tried to parse as object, but got EOF, has a concrete value been provided to it?]","status":400}

org.elasticsearch.index.mapper.MapperParsingException: object_mapper [in_reply_to_screen_name] tried to parse as object, but got EOF, has a concrete value been provided to it?
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:439)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:569)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:441)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:567)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:491)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:289)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:185)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:428)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:341)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)

@kimchy
elastic member

This does get indexed. Usually, this error comes from a field being indexed one way and then another: for example, a document that has a JSON object for a field, followed by a document that has a plain value for the same field.

@ramv

I don't think the tweet is getting indexed. Here is an example

DEBUG: Elastical: the request failed {"method":"PUT","json":{"contributors":{},"truncated":false,"text":"@CMBInfo @ConstantContact - Over 6,000 #socialmedia regular users in study on brand #engagement by @SMG_London http://t.co/hGkhqX4 #SMBI","geo":{},"entities":{"urls":[{"indices":[112,131],"display_url":"bit.ly/oQz5ar","expanded_url":"http://bit.ly/oQz5ar","url":"http://t.co/hGkhqX4"}],"hashtags":[{"text":"socialmedia","indices":[39,51]},{"text":"engagement","indices":[84,95]},{"text":"SMBI","indices":[132,137]}],"user_mentions":[{"indices":[0,8],"screen_name":"CMBInfo","name":"ChadwickMartinBailey","id":80942445,"id_str":"80942445"},{"indices":[9,25],"screen_name":"ConstantContact","name":"Constant Contact","id":25960305,"id_str":"25960305"},{"indices":[99,110],"screen_name":"SMG_London","name":"SMG London","id":38452444,"id_str":"38452444"}]},"favorited":false,"place":{},"coordinates":{},"source":"web","in_reply_to_screen_name":"CMBInfo","in_reply_to_user_id":80942445,"possibly_sensitive":false,"retweeted":false,"created_at":"Tue Sep 13 14:06:24 +0000 2011","in_reply_to_status_id_str":"113293957642977280","user":{"default_profile":true,"contributors_enabled":false,"profile_background_image_url":"http://a0.twimg.com/images/themes/theme1/bg.png","show_all_inline_media":false,"geo_enabled":false,"profile_image_url":"http://a2.twimg.com/profile_images/1281593358/Steve_Parker_normal.jpg","profile_text_color":"333333","profile_image_url_https":"https://si0.twimg.com/profile_images/1281593358/Steve_Parker_normal.jpg","location":"London","default_profile_image":false,"lang":"en","profile_background_image_url_https":"https://si0.twimg.com/images/themes/theme1/bg.png","profile_sidebar_fill_color":"DDEEF6","description":"","screen_name":"steveparkersmg","statuses_count":338,"profile_background_tile":false,"followers_count":232,"follow_request_sent":{},"following":{},"notifications":{},"friends_count":330,"profile_link_color":"0084B4","verified":false,"created_at":"Mon May 
11 16:41:20 +0000 2009","profile_sidebar_border_color":"C0DEED","protected":false,"favourites_count":3,"name":"Steve Parker","is_translator":false,"profile_use_background_image":true,"id":39286278,"id_str":"39286278","listed_count":6,"time_zone":{},"utc_offset":{},"profile_background_color":"C0DEED","url":"http://emergingspaces.co.uk"},"in_reply_to_status_id":113293957642977280,"id":113614508500590600,"in_reply_to_user_id_str":"80942445","id_str":"113614508500590592","retweet_count":0,"updated_at":"2011-09-13T14:06:25.143Z"},"url":"http://127.0.0.1:9200/tweets/tweet/113614508500590600","timeout":10000,"encoding":"utf8"}

$ curl -XGET http://127.0.0.1:9200/tweets/tweet/113614508500590600
{"_index":"tweets","_type":"tweet","_id":"113614508500590600","exists":false}

@kimchy
elastic member

If you get the failure you pasted in the issue, then it won't be indexed. As I said, it's probably because you are trying to index a plain value into a field that is already mapped as an object, i.e., you first index something like this: { "obj1" : { "field1" : "value1" } }, and then index this: { "obj1" : "value" }.
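A minimal sketch of how one might look for this kind of conflict in a batch before indexing it. It only compares documents with each other, not with the mapping already stored in the index, and the function names (`record`, `find_conflicts`) are made up for illustration. In the two tweets pasted above, for example, `in_reply_to_status_id_str` is `{}` in the first and a string in the second, and `user.time_zone` is `"Quito"` in the first and `{}` in the second.

```python
from collections import defaultdict


def record(path, value, seen):
    """Note whether `value` at `path` is an object or a concrete value."""
    if value is None:
        return  # null carries no shape information
    if isinstance(value, list):
        # An array takes the shape of its elements.
        for item in value:
            record(path, item, seen)
    elif isinstance(value, dict):
        seen[path].add("object")
        for key, child in value.items():
            record(f"{path}.{key}" if path else key, child, seen)
    else:
        seen[path].add("value")


def find_conflicts(docs):
    """Return the field paths that appear both as an object and as a value."""
    seen = defaultdict(set)
    for doc in docs:
        record("", doc, seen)
    return sorted(p for p, shapes in seen.items() if shapes == {"object", "value"})


if __name__ == "__main__":
    # Trimmed-down versions of the two tweets from this issue.
    first = {"in_reply_to_status_id_str": {}, "user": {"time_zone": "Quito"}}
    second = {"in_reply_to_status_id_str": "113293957642977280",
              "user": {"time_zone": {}}}
    print(find_conflicts([first, second]))
```

Any path this reports is a candidate for the "object, then value" situation described above, which may also help narrow down which field is in conflict.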

@uzquiano

Would it be possible to have the error message indicate which field from the object mapped JSON is in conflict?

@mehtryx

I've encountered an issue where the symptoms are the same. What I've discovered is that some of my data fields are null in some documents and hold a JSON date in others, and I get the error when I index a document whose field differs from whatever I had indexed first.

As an example the json for a record I might try to index contains:

"AvailableOn": "/Date(-2206274400000-0500)/",

however most of my documents had the following:

"AvailableOn": null,

My theory is that when a null value is indexed first, it is turned into a null object inside Elasticsearch, and the documents that do not have null then generate the error you reported and I'm experiencing. My issue is very much related to how this date is presented, and we can figure out a workaround, but I suspect the overall scenario may give people with this issue something to look for in their own data to determine how and why they are getting it.

@mehtryx

Further checking on other fields did not reproduce this. So while it was a theory and may have some relevance to what I am experiencing, my other fields that are sometimes null and sometimes regular strings have no issue.

The fact remains, however, that indexing works if I remove this field from my data contract, or override it so that it is always null or always a string representation, like this:

item.AvailableOn = if item.AvailableOn? then "#{item.AvailableOn}" else ""

I do this in the code that loops through the JSON docs and indexes them.
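For anyone not using CoffeeScript, the same workaround might look roughly like this in Python. It is only a sketch of the idea above (force one field to always be a string before indexing); `normalize_field` is a hypothetical helper name, not part of any client library.

```python
def normalize_field(doc, field, default=""):
    """Return a copy of `doc` in which `field` is always a string.

    A missing or null field becomes `default`; anything else is
    converted with str(), mirroring the CoffeeScript one-liner above.
    """
    out = dict(doc)
    value = out.get(field)
    out[field] = default if value is None else str(value)
    return out


if __name__ == "__main__":
    docs = [
        {"AvailableOn": "/Date(-2206274400000-0500)/"},
        {"AvailableOn": None},
    ]
    for doc in docs:
        print(normalize_field(doc, "AvailableOn"))
```

The point is only that every document presents the field with the same type, so the first document indexed cannot fix a mapping that later documents contradict.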

@markmacgillivray

Hello there. Great software, by the way; I have been using it for a while, but I am now having trouble with this error too.

I have a mapping like this:

"record" : {
  "record" : {
    "dynamic_templates" : [
      {
        "default" : {
          "match" : "*",
          "mapping" : {
            "type" : "multi_field",
            "fields" : {
              "{name}" : {"type" : "{dynamic_type}", "index" : "analyzed", "store" : "no"},
              "exact" : {"type" : "{dynamic_type}", "index" : "not_analyzed", "store" : "yes"}
            }
          }
        }
      }
    ]
  }
}

but when I send it items like this:

[{'url': 'http://dx.doi.org/10.1007/3-540-34266-4_1'}]
[{'url': 'http://stat.berkeley.edu/users/pitman/AOP445.pdf', 'anchor': '[pdf]'}, {'url': 'http://projecteuclid.org/euclid.aop/1253539862', 'anchor': '[Project', 'format': 'Euclid]'}]
[{'url': 'http://arxiv.org/abs/0910.0405', 'anchor': 'arXiv'}, {'url': 'http://projecteuclid.org/euclid.bj/1297173851', 'anchor': 'Project', 'format': 'Euclid'}]
[{'url': 'http://www.bibkn.org/bibjson/index.html'}]

They index fine except for that last one. (These are a subset of examples from a batch of hundreds, some of which get indexed and others of which fail.)

Here is the error:

MapperParsingException: object_mapper [links] tried to parse as object, but got EOF, has a concrete value been provided to it?

I spent ages trying to strip out any odd characters that might be causing an EOF, but there are none...

@markmacgillivray

I have avoided this issue, for my particular version of the problem, by changing my dynamic mapping so that it applies only to string types. That keeps the template away from objects (which still get mapped properly anyway).
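The revised mapping was not posted, so the following is a guess at what such a change could look like, assuming the dynamic template was restricted with `match_mapping_type`. The mapping is built as a Python dict and printed as JSON; the type name `record` and the field settings are copied from the mapping quoted earlier in this comment thread.

```python
import json

# Assumption: restrict the catch-all template to string fields only,
# so that objects such as the entries in "links" are no longer
# forced into a multi_field mapping.
string_only_template = {
    "default": {
        "match": "*",
        "match_mapping_type": "string",
        "mapping": {
            "type": "multi_field",
            "fields": {
                "{name}": {"type": "{dynamic_type}", "index": "analyzed", "store": "no"},
                "exact": {"type": "{dynamic_type}", "index": "not_analyzed", "store": "yes"},
            },
        },
    }
}

mapping = {"record": {"dynamic_templates": [string_only_template]}}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
```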

@chrislovecnm

I am having this issue as well. Has there been any progress? The data that I am storing does not have any objects inside it.

@ramv
@chrislovecnm

@ramv the data I have in it does not have ANY objects.... Ideas?

@ramv

Can you share example data? It is very hard to debug without looking at the data you are trying to index.

@chrislovecnm

Happy to. Maybe the dates are giving me grief ... Not too familiar with your JSON parsing :D Would the mapping be helpful as well? How do I get that?


{
"gateway": "Inspire",
"state": "CO",
"address1": "my address",
"address2": null,
"publisherCampaignGuid": "0886d40b-4f97-4193-a211-920dee663990",
"gatewayResponseCode": null,
"city": "Littleton",
"amount": "100.00",
"transactionState": "approved",
"publisherName": null,
"sponsorName": null,
"totalAmount": null,
"sponsorCampaignGuid": null,
"gatewayResponseMessage": null,
"firstName": "Christopher",
"zip": "80125",
"lastName": "Love",
"nonProfitName": null,
"gatewayTransactionId": "1664247231",
"paymentType": null,
"processingFeePercentage": "0.085",
"lastUpdated": "2012-07-26T01: 30: 55.800Z",
"nonProfitCampaignGuid": "7961eca6-69a5-49d4-8d51-4f5b680b422e",
"country": "UnitedStates",
"guid": "523ce9ee-3a43-4317-a5d7-27b1c03c5d69",
"email": "wow@wow.com",
"dateCreated": "2012-07-26T01: 30: 55.800Z",
"sponsorMatchPercentage": "0"
}


DEBUG [elasticsearch[Hurricane][bulk][T#3]] 2012-07-26 01:26:07,726 Log4jESLogger.java (line 99) [Hurricane] [com.igive][4] failed to execute bulk item (index) index {[com.igive][transaction][394], source[{"gateway":"Inspire","state":"CO","address1":"my address,"address2":null,"publisherCampaignGuid":"0886d40b-4f97-4193-a211-920dee663990","gatewayResponseCode":null,"city":"Littleton","amount":"100.00","transactionState":"approved","publisherName":null,"sponsorName":null,"totalAmount":null,"sponsorCampaignGuid":null,"gatewayResponseMessage":null,"firstName":"Christopher","zip":"80125","lastName":"Love","nonProfitName":null,"gatewayTransactionId":"1664247231","paymentType":null,"processingFeePercentage":"0.085","lastUpdated":"2012-07-26T01:30:55.800Z","nonProfitCampaignGuid":"7961eca6-69a5-49d4-8d51-4f5b680b422e","country":"United States ","guid":"523ce9ee-3a43-4317-a5d7-27b1c03c5d69","email":"wow@wow.com","dateCreated":"2012-07-26T01:30:55.800Z","sponsorMatchPercentage":"0"}]}
org.elasticsearch.index.mapper.MapperParsingException: object mapping for [transaction] tried to parse as object, but got EOF, has a concrete value been provided to it?
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:447)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:437)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:311)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
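One thing worth checking in a case like this is whether the source in the log is valid JSON at all. As pasted above, the logged source reads "address1":"my address,"address2":null, with no closing quote after the address, although that may just be a side effect of editing the paste. A small sketch for checking a payload before sending it, using only the Python standard library (`check_json` is a made-up helper name):

```python
import json


def check_json(source):
    """Return None if `source` parses as JSON, else a short error description."""
    try:
        json.loads(source)
    except json.JSONDecodeError as err:
        # Show a little context around the position where parsing failed.
        start = max(err.pos - 20, 0)
        context = source[start:err.pos + 20]
        return f"{err.msg} at char {err.pos}: ...{context}..."
    return None


if __name__ == "__main__":
    # Shortened from the bulk item source in the log above.
    logged = '{"gateway":"Inspire","address1":"my address,"address2":null}'
    print(check_json(logged))
```

If this reports an error for the real payload, the problem is in whatever produces the JSON rather than in the mapping.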

@ramv
@chrislovecnm

@ramv grrr ... it is the grails plugin not me :D Will take a look. DOH. How do I clear out the mapping?

@ramv

curl -XDELETE http://localhost:9200/{index}/{type}/_mapping
