How is Chinese handled in mongodb? #95

Closed
fzxu opened this issue Jun 23, 2013 · 4 comments

@fzxu

fzxu commented Jun 23, 2013

My MongoDB uses UTF-8 and stores ordinary documents with Chinese content. But when the content is indexed into Elasticsearch, I get a parse error (it seems every Chinese character arrives as '?'):

[2013-06-23 04:51:44,512][DEBUG][action.bulk ] [Omen] [test][1] failed to execute bulk item (index) index {[test][question][51c6d1f14b90c3be18174882], source[{"_id":"51c6d1f14b90c3be18174882","_class":"me.test.entities.Question","title":"??????????????????","content":"?????????????????????????????? ????????75??????????(Morgan Freeman)??????????????????(Michael Caine)??????????????????...http://t.cn/zHtoIH1","answers":[{"_id":"51c6d1f14b90c3be18174881","content":"@?????","createdAt":"2013-05-25T10:40:20.000Z","updatedAt":"2013-06-23T10:46:09.890Z","votesCount":0,"source":{"providerId":"weibo","referenceId":"3581914474562346"},"createdBy":"{ "$ref" : "users", "$id" : "51c6d1f14b90c3be1817487d" }"}],"createdAt":"2013-05-25T09:20:02.000Z","updatedAt":"2013-06-23T10:46:09.888Z","tags":[],"viewsCount":0,"votesCount":0,"source":{"providerId":"weibo","referenceId":"3581894262156440"},"createdBy":"{ "$ref" : "users", "$id" : "51c6d1f14b90c3be18174880" }"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:553)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:450)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:327)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:381)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:155)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Illegal character '?' (code 0xe3) in base64 content
at [Source: [B@13c519a5; line: 1, column: 152]
at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1369)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.getBinaryValue(UTF8StreamJsonParser.java:428)
at org.elasticsearch.common.jackson.core.JsonParser.getBinaryValue(JsonParser.java:1048)
at org.elasticsearch.common.xcontent.json.JsonXContentParser.binaryValue(JsonXContentParser.java:183)
at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:276)
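
The last frames of the trace show `AttachmentMapper.parse` calling `binaryValue`, so a string value is being decoded as base64, and the decoder rejects byte 0xe3. That byte is the UTF-8 lead byte for code points U+3000 to U+3FFF, which include CJK punctuation such as 《. The sketch below only illustrates why plain Chinese text cannot pass a base64 decoder; it is not a diagnosis of this issue. It uses `java.util.Base64` (Java 8+) rather than the Jackson decoder Elasticsearch uses, and the class `Utf8Base64Demo` and its helpers are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; not part of Elasticsearch or the river plugin.
class Utf8Base64Demo {

    // First byte of the text's UTF-8 encoding, as an unsigned value.
    static int leadByte(String text) {
        return text.getBytes(StandardCharsets.UTF_8)[0] & 0xFF;
    }

    // True if the text's UTF-8 bytes are accepted by the JDK base64 decoder.
    static boolean isBase64(String text) {
        try {
            Base64.getDecoder().decode(text.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown for any byte outside the base64 alphabet, e.g. 0xe3.
            return false;
        }
    }

    public static void main(String[] args) {
        // The text 《中文》 written with escapes, so the source file encoding does not matter.
        String cjk = "\u300a\u4e2d\u6587\u300b";
        String encoded = Base64.getEncoder()
                .encodeToString(cjk.getBytes(StandardCharsets.UTF_8));

        System.out.printf("lead byte: 0x%02x%n", leadByte(cjk));
        System.out.println("plain text accepted as base64: " + isBase64(cjk));
        System.out.println("encoded text accepted as base64: " + isBase64(encoded));
    }
}
```

If a field really is meant to hold attachment data, its value would have to be base64-encoded before indexing; if it is meant to be ordinary text, the field mapping is the place to look.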

@richardwilly98
Owner

Hi,

At this point I am not sure where the encoding issue comes from but I will investigate.

Could you please try to index another document with trace logging enabled?

Add the following line to the logger: section of $ES_HOME\config\logging.yml:
river.mongodb: TRACE

Then restart ES.
Please post the ES log file.

Thanks,
Richard.

@benmccann
Collaborator

As a note to help reproduce this: my logging file was located at /etc/elasticsearch/logging.yml

@benmccann
Collaborator

The test to reproduce this issue was broken. However, after fixing it (#130), I don't see any problem with Chinese characters. Can you try again with the latest versions of elasticsearch and elasticsearch-river-mongodb?

@benmccann
Collaborator

We have a test ensuring Chinese works now, so I think it's probably safe to close this issue. @arkxu, let us know if you still have any trouble.

3 participants