Analyzers for Vietnamese? #6647
Indeed, we don't have a good tokenizer for Vietnamese today. Although we would like to have one, Vietnamese segmentation is quite hard, so I'm afraid this won't be fixed anytime soon.
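To illustrate why segmentation is hard: written Vietnamese puts spaces between syllables, not between words, so a whitespace tokenizer splits multi-syllable words apart. The sketch below is only a toy, not how Lucene or any existing plugin works. `TOY_DICTIONARY`, `whitespace_tokenize`, and `longest_match_segment` are hypothetical names made up for this example, and the three-entry dictionary is an assumption chosen to show the ambiguity.

```python
# Toy dictionary of multi-syllable words (hypothetical, for illustration only).
TOY_DICTIONARY = {
    ("học", "sinh"),  # "student"
    ("sinh", "học"),  # "biology"
    ("đại", "học"),   # "university"
}
MAX_WORD_SYLLABLES = max(len(word) for word in TOY_DICTIONARY)


def whitespace_tokenize(text):
    """Split on whitespace, which yields syllables rather than words."""
    return text.lower().split()


def longest_match_segment(text):
    """Greedy left-to-right longest match of syllable runs against the toy dictionary."""
    syllables = whitespace_tokenize(text)
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first; a single syllable is always accepted.
        for n in range(min(MAX_WORD_SYLLABLES, len(syllables) - i), 0, -1):
            candidate = tuple(syllables[i:i + n])
            if n == 1 or candidate in TOY_DICTIONARY:
                words.append(" ".join(candidate))
                i += n
                break
    return words


sentence = "học sinh học sinh học"  # "students study biology"
print(whitespace_tokenize(sentence))
print(longest_match_segment(sentence))
```

The intended reading is "học sinh" (student) / "học" (study) / "sinh học" (biology), but the greedy pass groups the syllables as "học sinh" / "học sinh" / "học". Resolving this kind of overlap needs context, which is why a plain dictionary lookup is not enough.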
Do you by any chance have a list of Vietnamese stopwords? Thanks.
No, we don't.
@jpountz How about this one? https://github.com/CaoManhDat/VNAnalyzer
vnTokenizer is licensed under the GPL, which would be an issue for inclusion in Lucene or Elasticsearch. However, Elasticsearch supports custom analyzers through plugins, so you could write a plugin that exposes this analyzer; see for instance https://github.com/elasticsearch/elasticsearch-analysis-kuromoji
Thanks. For those who want a Vietnamese plugin, check out this one: https://github.com/duydo/elasticsearch-analysis-vietnamese
@nguyenchiencong Do you want to submit a PR adding this to the plugins page here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html#analysis-plugins ?
Thanks @nguyenchiencong for mentioning the plugin. @clintongormley It would be great if you could add the plugin to the plugins page. Thank you.
Added Vietnamese Analyser to plugins page Closes #6647
That plugin has issues with highlighting offsets, at least in version 2.2.0.
@duydo @jpountz: We are trying to support Vietnamese with Solr. Is there a plugin available that integrates https://github.com/CaoManhDat/VNAnalyzer ?
Any analyzers for Vietnamese on the roadmap?
Thanks and cheers