Analyzers for Vietnamese? #6647

Closed
nguyenchiencong opened this issue Jun 30, 2014 · 11 comments

Comments

@nguyenchiencong

Any analyzers for Vietnamese on the roadmap?

Thx and cheers

@jpountz jpountz assigned rmuir and unassigned rmuir Jun 30, 2014
@jpountz
Contributor

jpountz commented Jun 30, 2014

Indeed, we don't have a good tokenizer for Vietnamese today. Although we would like to have one, Vietnamese segmentation is quite hard, so I'm afraid this won't be fixed anytime soon.
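
To illustrate why segmentation is hard: Vietnamese puts spaces between syllables rather than words, so a whitespace tokenizer splits multi-syllable words apart. The sketch below is a toy greedy longest-match segmenter, shown only as an illustration; it is not the method used by any tool mentioned in this thread, and the `LEXICON` contents and the `segment` helper are hypothetical. On the commonly cited sentence "học sinh học sinh học" (intended reading: "học sinh" (student) + "học" (study) + "sinh học" (biology)), the greedy strategy picks the wrong word boundaries.

```python
def segment(text, lexicon, max_syllables=4):
    """Greedy longest-match segmentation over whitespace-separated syllables."""
    syllables = text.split()
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first; a single syllable always matches.
        for n in range(min(max_syllables, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate.lower() in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Hypothetical toy lexicon, for illustration only.
LEXICON = {"học sinh", "sinh học", "học"}

sentence = "học sinh học sinh học"

# Whitespace splitting yields five syllables, not three words.
print(sentence.split())

# Greedy matching groups syllables, but not into the intended words.
print(segment(sentence, LEXICON))
```

Resolving this kind of ambiguity generally needs more than a dictionary lookup, which is part of why a good general-purpose tokenizer is difficult to provide.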

@nguyenchiencong
Author

Do you by any chance have a list of Vietnamese stopwords? Thanks.

@jpountz
Contributor

jpountz commented Jul 2, 2014

No, we don't.
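
Without a bundled list, one workaround is to supply your own stopwords through Elasticsearch's `stop` token filter. The following is a minimal sketch of index settings built as a Python dict; the filter name `vi_stop`, the analyzer name `vi_basic`, and the five sample words are assumptions made for illustration, not a vetted stopword list.

```python
import json

# Illustrative single-syllable function words only; NOT a vetted stopword list.
VI_STOPWORDS = ["và", "là", "của", "các", "những"]


def build_index_settings(stopwords):
    """Build index settings with a custom stop filter (names are hypothetical)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # The built-in "stop" filter accepts an inline word list.
                    "vi_stop": {"type": "stop", "stopwords": list(stopwords)}
                },
                "analyzer": {
                    "vi_basic": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "vi_stop"],
                    }
                },
            }
        }
    }


# Print the JSON body that could be sent when creating an index.
print(json.dumps(build_index_settings(VI_STOPWORDS), ensure_ascii=False, indent=2))
```

Because the `standard` tokenizer emits one token per syllable here, only single-syllable stopwords would match in this setup; multi-syllable stopwords would need a tokenizer that performs word segmentation.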

@anhtran

anhtran commented Oct 14, 2014

@jpountz How about this? https://github.com/CaoManhDat/VNAnalyzer
It is based on the research at http://mim.hus.vnu.edu.vn/phuonglh/tools/userguide-vnTokenizer.pdf
I believe it can handle about 80-90% of cases in Vietnamese. That's good enough for searching.

@jpountz
Contributor

jpountz commented Oct 14, 2014

vnTokenizer is licensed under the GPL, which would be an issue for inclusion in Lucene or Elasticsearch. However, Elasticsearch supports custom analyzers through plugins, so you could write a plugin that exposes this analyzer; see for instance https://github.com/elasticsearch/elasticsearch-analysis-kuromoji

@nguyenchiencong
Author

Thanks. For those wanting a Vietnamese plugin, you can check out this one: https://github.com/duydo/elasticsearch-analysis-vietnamese

@clintongormley

@nguyenchiencong
Author

@duydo is the author. I think we should ask him first. @duydo it would be great if you can do it.

@duydo

duydo commented Oct 20, 2014

Thanks @nguyenchiencong for mentioning the plugin.

@clintongormley It would be great if you can add the plugin to the plugins page. Thank you.

clintongormley added a commit that referenced this issue Oct 20, 2014
Added Vietnamese Analyser to plugins page

Closes #6647
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
Added Vietnamese Analyser to plugins page

Closes elastic#6647
@dripp1

dripp1 commented Jul 17, 2016

That plugin has issues with highlighting offsets, at least in version 2.2.0.
Has anybody been able to use it, or can anyone recommend another Vietnamese plugin?
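
The comment above does not say what exactly goes wrong with the offsets, so the following is only a generic sanity check, not a diagnosis of that plugin. Highlighting relies on each token's start and end offsets pointing into the original text. This sketch assumes tokens shaped like the `tokens` array returned by Elasticsearch's `_analyze` API (`token`, `start_offset`, `end_offset`); the helper name `find_bad_offsets` and the sample tokens are hypothetical.

```python
def find_bad_offsets(text, tokens):
    """Return tokens whose offsets fall outside the text or move backwards."""
    bad = []
    prev_start = 0
    for tok in tokens:
        start, end = tok["start_offset"], tok["end_offset"]
        # Offsets must lie inside the text, with start not after end.
        in_bounds = 0 <= start <= end <= len(text)
        # Start offsets are expected not to go backwards across tokens.
        ordered = start >= prev_start
        if not (in_bounds and ordered):
            bad.append(tok)
        prev_start = max(prev_start, start)
    return bad


text = "học sinh"

# Hypothetical analyzer output: the second token ends past the end of the text.
tokens = [
    {"token": "học sinh", "start_offset": 0, "end_offset": len(text)},
    {"token": "sinh", "start_offset": 4, "end_offset": len(text) + 5},
]

for tok in find_bad_offsets(text, tokens):
    print("suspicious offsets:", tok)
```

Note that Elasticsearch reports offsets in UTF-16 code units while Python's `len` counts code points; the two agree for text in the Basic Multilingual Plane, which is an assumption of this check.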

@jayeshgoyal1995

@duydo @jpountz: We are trying to support Vietnamese with Solr. Is there a plugin available to integrate https://github.com/CaoManhDat/VNAnalyzer?
Also, is there any other way to add Vietnamese support in Solr?
