chapter24_part5: /270_Fuzzy_matching/50_Scoring_fuzziness.asciidoc #154
_did-you-mean_ {ref}/search-suggesters-phrase.html[`phrase` suggester].
Fuzzy queries are rarely useful on their own, despite first appearances. They work better as part of a ``bigger'' feature, such as the _search-as-you-type_
{ref}/search-suggesters-completion.html[`completion` suggester] or the
In the preview, the bold background on the two references here looks a bit odd.
LGTM
LGTM
TIP: Fuzzy matching should not be used for scoring purposes--only to widen
the net of matching terms in case there are misspellings.
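Imagine that we have 1,000 documents containing ``Schwarzenegger``, and just one document with the misspelling ``Schwarzeneger``.
According to the theory of <<tfidf,term frequency/inverse document frequency>>, the misspelled document is more relevant than the correctly spelled ones, because it appears less often in the documents!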
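Suggested wording: "因为它在更少的文档中出现!" ("because it appears in fewer documents!")
Explanation: the emphasis here is on the IDF component of TF-IDF.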
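To make the IDF point concrete, here is a minimal sketch in Python. It assumes the classic Lucene-style formula `idf = 1 + ln(numDocs / (docFreq + 1))` and an index of exactly 1,001 documents (1,000 with the correct spelling, 1 with the misspelling); the function name `classic_idf` and the constant are illustrative, not taken from the PR.

```python
import math


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency, assuming the classic formula
    1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Assumption: the index holds only these 1,001 documents.
NUM_DOCS = 1001

idf_correct = classic_idf(NUM_DOCS, doc_freq=1000)   # ``Schwarzenegger``
idf_misspelled = classic_idf(NUM_DOCS, doc_freq=1)   # ``Schwarzeneger``

# The misspelling occurs in fewer documents, so its IDF is larger.
print(f"correct spelling IDF:    {idf_correct:.3f}")
print(f"misspelled spelling IDF: {idf_misspelled:.3f}")
```

The rarer term gets the larger weight only because of how many documents contain it, not how often it occurs within a document, which is why "appears in fewer documents" is the more precise wording.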
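Imagine that we have 1,000 documents containing ``Schwarzenegger``, and just one document with the misspelling ``Schwarzeneger``.
According to the theory of <<tfidf,term frequency/inverse document frequency>>, the misspelled document is more relevant than the correctly spelled ones, because it appears less often in the documents!
According to the theory of <<tfidf,term frequency/inverse document frequency>>, the misspelled document is more relevant than the correctly spelled ones, because the misspelling appears in fewer documents!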
Wording fix: "更少在文档中" -> "更少的文档中" (so that the phrase reads "in fewer documents")
LGTM
Scoring fuzziness. Scoring_fuzziness