Unicode is far from my expertise, so I may be very wrong about this, but it seems that Sense is URL-encoding Unicode characters in a way that prevents Elasticsearch from decoding them properly.
For example, if we set up a smartcn analyzer and analyze some Chinese characters:
That part looks OK (unlike in previous versions of Sense), so I suspect the proxy is what's encoding incorrectly. I pulled out a packet sniffer, and this is what the proxy sends to ES:
GET /test_chinese/_analyze?text=%27���L}!%27 HTTP/1.1
connection: keep-alive
x-forwarded-proto: http
accept: text/plain, */*; q=0.01
referer: http://localhost:5601/app/sense
kbn-xsrf-token: 959b10246601e4bc85e7f57d254ea23c31800cd60b36ee50627d0b6ef84f52f7
accept-encoding: gzip, deflate, sdch
x-forwarded-for: 127.0.0.1
accept-language: en-US,en;q=0.8
x-forwarded-port: 59597
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.73 Safari/537.36
Host: localhost:9200
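The `���L}!` run in the captured request line suggests the Chinese text reached Elasticsearch as bytes that are not valid UTF-8. The exact cause isn't established here. One possible mechanism, offered purely as an assumption, is a proxy that writes the request line with a single-byte encoding and keeps only the low 8 bits of each character. The Python sketch below contrasts that hypothetical truncation with the UTF-8 bytes a correct client sends. The helper names and the sample string `中文` are made up, because the original input text can't be recovered from the dump.

```python
def utf8_wire_bytes(text: str) -> bytes:
    """Bytes a UTF-8-aware client puts on the wire for `text`."""
    return text.encode("utf-8")


def low_byte_wire_bytes(text: str) -> bytes:
    """HYPOTHETICAL faulty proxy: keep only the low 8 bits of each code point,
    as a single-byte write of a Unicode string would."""
    return bytes(ord(ch) & 0xFF for ch in text)


sample = "中文"  # made-up input, not the text from the issue
good = utf8_wire_bytes(sample)
bad = low_byte_wire_bytes(sample)

print(good.hex(" "))  # e4 b8 ad e6 96 87
print(bad.hex(" "))   # 2d 87

# The UTF-8 bytes round-trip cleanly.
print(good.decode("utf-8"))  # 中文
# The truncated bytes are lossy: decoding them as UTF-8 yields "-" followed by
# a U+FFFD replacement character, and the original text cannot be recovered.
print(bad.decode("utf-8", errors="replace"))
```

If this hypothesis holds, the information is discarded before the request ever leaves the proxy. No decoding step in Elasticsearch could then restore the original characters.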
From: https://discuss.elastic.co/t/smart-chinese-analysis-returns-unicodes-instead-of-chinese-tokens
The tokens are incorrect:
If we look at what gets sent over the wire:
Full packet dump:
For comparison, if you run the command via curl, you get the proper tokens back:
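For reference, a request target that stays intact across an ASCII-only hop percent-encodes every UTF-8 byte of the text. This minimal sketch builds such a target for the `_analyze` endpoint seen in the dump. The helper name `analyze_request_target` and the sample text are hypothetical, and the sketch makes no claim to reproduce exactly what curl or Sense send.

```python
from urllib.parse import quote


def analyze_request_target(index: str, text: str) -> str:
    """Hypothetical helper: build the target for GET /<index>/_analyze?text=...

    quote() encodes the str as UTF-8 and, with safe="", percent-escapes every
    byte outside the unreserved set, so the result is pure ASCII.
    """
    encoded = quote(text, safe="", encoding="utf-8")
    return f"/{index}/_analyze?text={encoded}"


# Sample text is made up; the issue's original input is not preserved.
target = analyze_request_target("test_chinese", "'中文'")
print(target)  # /test_chinese/_analyze?text=%27%E4%B8%AD%E6%96%87%27
assert target.isascii()
```

`quote` also turns the single quotes into `%27`, which matches the captured request line. The Chinese characters, however, become `%E4%B8%AD%E6%96%87` instead of raw bytes.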