
Given the document "CBD", match searches for c, b, d, cb, cbd #9

Open: wants to merge 7 commits into base: master

Conversation

@ocre commented May 11, 2017

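The lc-pinyin analyzer is very powerful, and it works great for autocomplete in our search box. Thanks to the author for sharing it!
However, when handling mixed Chinese/English documents, or text made up only of English words or digits, we ran into a small problem that we hope can be improved:
Index the document "CBD" with the lc_index analyzer, then build a match_phrase query with the lc_search analyzer and search for "CBD": nothing matches.
After repeated testing, we found that by default the lc_index analyzer does not split English words and digits into single characters, so "CBD" is still "CBD" after analysis. The lc_search analyzer, on the other hand, splits them into single characters by default, so "CBD" becomes "C", "B", "D". The query terms therefore never line up with the indexed term, and the search returns no results.
So we modified how lc_index generates tokens so that the input "CBD" produces "CBD", "C", "B", "D", which fixes the missing matches. We hope this can be merged and refined further: the feature that splits words into single characters could be made a switch, so that people who need it can turn it on. Thanks!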
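The mismatch described above can be reproduced with a small, self-contained simulation. This is a sketch under stated assumptions, not the PR's actual code: the class and method names (CharSplitSketch, expandForIndex, splitForSearch, phraseMatches) are hypothetical, the phrase check is a simplified slop-0 version of what match_phrase does, case folding is omitted, and the positions chosen for the split characters (whole word at position 0, character i at position i) are one plausible layout, not necessarily the one the PR uses.

```java
import java.util.ArrayList;
import java.util.List;

public class CharSplitSketch {

    // A term plus its position in the token stream (offsets omitted).
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    // Hypothetical index-side expansion for one English word or number.
    // The whole word is kept at position 0; when splitChars is on, each
    // character is also emitted, character i at position i, so the
    // characters form a consecutive run that a phrase query can match.
    static List<Token> expandForIndex(String word, boolean splitChars) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token(word, 0));
        int[] cps = word.codePoints().toArray();
        if (splitChars && cps.length > 1) {
            for (int i = 0; i < cps.length; i++) {
                tokens.add(new Token(new String(cps, i, 1), i));
            }
        }
        return tokens;
    }

    // Search-side behaviour described in the issue: one token per character.
    static List<Token> splitForSearch(String text) {
        List<Token> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        for (int i = 0; i < cps.length; i++) {
            tokens.add(new Token(new String(cps, i, 1), i));
        }
        return tokens;
    }

    // True if the index holds `term` at exactly `position`.
    static boolean has(List<Token> index, String term, int position) {
        for (Token t : index) {
            if (t.term.equals(term) && t.position == position) {
                return true;
            }
        }
        return false;
    }

    // Simplified slop-0 phrase check: starting from an indexed copy of the
    // first query term, every query term must appear at the same relative
    // position in the index.
    static boolean phraseMatches(List<Token> index, List<Token> query) {
        if (query.isEmpty()) {
            return false;
        }
        Token first = query.get(0);
        for (Token start : index) {
            if (!start.term.equals(first.term)) {
                continue;
            }
            boolean all = true;
            for (Token q : query) {
                if (!has(index, q.term, start.position + q.position - first.position)) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> query = splitForSearch("CBD");
        for (boolean split : new boolean[] {false, true}) {
            List<Token> index = expandForIndex("CBD", split);
            System.out.println(index + " -> " + phraseMatches(index, query));
        }
    }
}
```

Running main prints "[CBD@0] -> false" and then "[CBD@0, C@0, B@1, D@2] -> true". Stacking the first character on the same position as the whole word keeps the original "CBD" term searchable while letting partial phrases such as "CB" or "BD" match; "DB" still does not match because its characters are not consecutive in that order.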

@gitchennan (Owner)

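Thanks @ocre for your interest and support. For the English tokenization case you describe, I suggest using an English analyzer directly for both indexing and querying. The lc-pinyin plugin is mainly intended for pinyin tokenization, so I would rather not add English tokenization directly to lc-pinyin. If you need to support pinyin, Chinese, and English tokenization at the same time, the best approach is to use Elasticsearch multi-fields (the fields parameter) with a different analyzer for each field. Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html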
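The multi-fields approach suggested above might look roughly like the request bodies below, held in Java text blocks (Java 15+) so they can be printed or sent with any HTTP client. This is a sketch, not configuration taken from the plugin: the index name my-index, the field name title and the sub-field name en are hypothetical, it assumes the plugin registers analyzers under the names lc_index and lc_search as the issue implies, and it uses the typeless mapping format of Elasticsearch 7+ (versions from 2017 also required a mapping type name).

```java
public class MultiFieldsSketch {

    // Index mapping: the main field uses the plugin's analyzers named in the
    // issue; the "en" sub-field indexes the same text with the built-in
    // standard analyzer, which splits on word boundaries and lowercases.
    static final String MAPPING = """
        {
          "mappings": {
            "properties": {
              "title": {
                "type": "text",
                "analyzer": "lc_index",
                "search_analyzer": "lc_search",
                "fields": {
                  "en": { "type": "text", "analyzer": "standard" }
                }
              }
            }
          }
        }
        """;

    // Phrase query over both fields; each field is searched with its own
    // search analyzer, so "CBD" can match through title.en even when the
    // pinyin field does not.
    static final String QUERY = """
        {
          "query": {
            "multi_match": {
              "query": "CBD",
              "type": "phrase",
              "fields": ["title", "title.en"]
            }
          }
        }
        """;

    public static void main(String[] args) {
        System.out.println("PUT /my-index\n" + MAPPING);
        System.out.println("GET /my-index/_search\n" + QUERY);
    }
}
```

With this layout the pinyin field keeps its existing behaviour, and the English sub-field indexes "CBD" as the single term "cbd", which the standard analyzer also produces at query time.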
