Unicode support #12

tatsuya · 2017-05-11T01:04:41Z

Great work! It would also be nice if this supported Unicode text input (for example, café). See the error below:

➜  elyzer git:(master) python __main__.py --es "http://localhost:9200" --index my_index --analyzer my_analyzer "café"
TOKENIZER: kuromoji_tokenizer
Traceback (most recent call last):
  File "__main__.py", line 47, in <module>
    main()
  File "__main__.py", line 36, in main
    es=es))
  File "/Users/toiwa/Projects/Private/elyzer/elyzer/elyzer.py", line 72, in stepWise
    analyzeResp = es.indices.analyze(index=indexName, body=body)
  File "/Library/Python/2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/elasticsearch/client/indices.py", line 32, in analyze
    '_analyze'), params=params, body=body)
  File "/Library/Python/2.7/site-packages/elasticsearch/transport.py", line 284, in perform_request
    body = self.serializer.dumps(body)
  File "/Library/Python/2.7/site-packages/elasticsearch/serializer.py", line 50, in dumps
    raise SerializationError(data, e)
elasticsearch.exceptions.SerializationError: ({'text': 'caf\xc3\xa9', 'char_filter': [], 'tokenizer': u'kuromoji_tokenizer'}, UnicodeDecodeError('ascii', '"caf\xc3\xa9"', 4, 5, 'ordinal not in range(128)'))
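
The traceback suggests that the command-line argument reaches the serializer as a UTF-8 byte string (`'caf\xc3\xa9'`), which Python 2 then tries to decode as ASCII. A minimal sketch of one possible workaround, assuming that is the cause: decode the argument to Unicode before building the request body. `to_text` is a hypothetical helper for illustration, not part of elyzer or the elasticsearch client, and the UTF-8 default is an assumption about the terminal encoding.

```python
import json


def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding byte strings if needed."""
    # On Python 2, `bytes` is an alias for `str`, so raw argv values match here.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Stand-in for the raw bytes that Python 2 puts in sys.argv for "café".
raw_arg = b"caf\xc3\xa9"
text = to_text(raw_arg)

# Same shape as the body shown in the traceback; the values are illustrative.
body = {"text": text, "char_filter": [], "tokenizer": u"kuromoji_tokenizer"}

# With a Unicode "text" value, JSON serialization no longer hits the ASCII codec.
payload = json.dumps(body)
```

In elyzer itself, the equivalent change would presumably be to decode the text argument once, right after argument parsing, so that everything passed to `es.indices.analyze` is already Unicode.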

tatsuya mentioned this issue May 11, 2017

Support unicode text #13

Closed
