Highlighting problem in versions after 6.0 #169
Comments
I didn't describe this well. The problem occurs when a document contains a mix of Chinese, English, and digits:
In the results, the
I've run into this problem too. Is there a fix?
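@medcl The pinyin tokenizer seems to have a problem: with "ignore_pinyin_offset": false, indexing starts throwing a "startOffset must be non-negative" exception once a certain amount of data has been written, which looks like a bug in the code. For now, the problem can be worked around by using an ngram tokenizer plus the pinyin filter, as follows: {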
Configuration for version 7.6.2:
{
  "analyzer": {
    "pinyin_analyzer": {
      "tokenizer": "my_ngram",
      "filter": [
        "pinyin_filter"
      ]
    }
  },
  "tokenizer": {
    "my_ngram": {
      "type": "ngram",
      "min_gram": 3,
      "max_gram": 3,
      "token_chars": [
        "letter",
        "digit",
        "punctuation",
        "symbol"
      ]
    }
  },
  "filter": {
    "pinyin_filter": {
      "type": "pinyin",
      "keep_full_pinyin": false,
      "keep_joined_full_pinyin": true,
      "keep_none_chinese_in_joined_full_pinyin": true,
      "none_chinese_pinyin_tokenize": false,
      "remove_duplicated_term": true
    }
  }
}
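For anyone unsure where this fragment goes: it looks like the contents of the `analysis` section of the index settings. Below is a rough Python sketch of how it might be wrapped into a create-index body and a highlight query for 7.x. The field name `title` and the helper function names are my own assumptions for illustration, not something from this thread, and the script only builds and prints the JSON without talking to a cluster.

```python
import json

# Analysis section copied from the 7.6.2 configuration above.
ANALYSIS = {
    "analyzer": {
        "pinyin_analyzer": {
            "tokenizer": "my_ngram",
            "filter": ["pinyin_filter"],
        }
    },
    "tokenizer": {
        "my_ngram": {
            "type": "ngram",
            "min_gram": 3,
            "max_gram": 3,
            "token_chars": ["letter", "digit", "punctuation", "symbol"],
        }
    },
    "filter": {
        "pinyin_filter": {
            "type": "pinyin",
            "keep_full_pinyin": False,
            "keep_joined_full_pinyin": True,
            "keep_none_chinese_in_joined_full_pinyin": True,
            "none_chinese_pinyin_tokenize": False,
            "remove_duplicated_term": True,
        }
    },
}


def build_index_body(field_name: str = "title") -> dict:
    """Wrap the analysis section in a create-index request body.

    `field_name` is a hypothetical text field used only for illustration.
    """
    return {
        "settings": {"analysis": ANALYSIS},
        "mappings": {
            "properties": {
                field_name: {"type": "text", "analyzer": "pinyin_analyzer"}
            }
        },
    }


def build_highlight_query(text: str, field_name: str = "title") -> dict:
    """Build a match query that also asks for highlights on the same field."""
    return {
        "query": {"match": {field_name: text}},
        "highlight": {"fields": {field_name: {}}},
    }


if __name__ == "__main__":
    # Print the bodies so they can be pasted into a REST client.
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(build_highlight_query("abc123"), indent=2))
```

The first body would be sent when creating the index and the second to the search endpoint. With the ngram tokenizer producing the offsets, the highlight fragments should no longer depend on the offsets emitted by the pinyin tokenizer, though I have not verified this on every version.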
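Since 6.0 there is a new "ignore_pinyin_offset" parameter. At index time startOffset and endOffset are both 0, so every highlighted term in the results comes out as
<em></em>xxxx
. If I set ignore_pinyin_offset: false
, an error is thrown: How can I resolve this?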
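To illustrate why zero offsets would give an empty highlight, here is a toy model in Python. It is only a sketch under the assumption that a highlighter wraps the span between a token's start and end offsets; it is not the actual Elasticsearch highlighter code, and the `highlight` function is made up for this example.

```python
def highlight(text: str, start_offset: int, end_offset: int) -> str:
    """Wrap text[start_offset:end_offset] in <em> tags (toy model only)."""
    return (
        text[:start_offset]
        + "<em>" + text[start_offset:end_offset] + "</em>"
        + text[end_offset:]
    )


# Both offsets are 0, as reported for ignore_pinyin_offset left at its default.
print(highlight("xxxx", 0, 0))  # -> <em></em>xxxx

# With real offsets the matched span itself is wrapped.
print(highlight("xxxx", 0, 2))  # -> <em>xx</em>xx
```

In this model the tags land at position 0 with nothing between them, which matches the `<em></em>xxxx` output described above.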