Tantivy index #149
To my mind, one of the next steps here could be swapping out the tokenizer for a custom routine. The NK tokenizer does a ton of things that may not be needed at all, and I'd rather rely on how the tantivy people did it. We might want to keep transliteration and a few bits of normalisation in place, but applied on a per-type basis :)
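To make the "per-type basis" idea concrete, here is a minimal sketch of what such a routine might look like. Everything in it is an assumption for illustration: the type names (`name`, `identifier`), the function names, and the choice of rules are hypothetical and not taken from this PR. It also only does accent folding via the standard library, which is a much weaker stand-in for real transliteration.

```python
import unicodedata


def fold_accents(text):
    """Strip combining marks after NFKD decomposition (accent folding only,
    not a full transliteration of non-Latin scripts)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_name(text):
    """Hypothetical rule for name-like values: fold accents, casefold,
    treat anything non-alphanumeric as a separator."""
    folded = fold_accents(text).casefold()
    cleaned = "".join(ch if ch.isalnum() else " " for ch in folded)
    return cleaned.split()


def normalize_identifier(text):
    """Hypothetical rule for identifier-like values: keep one upper-cased
    token with all punctuation and whitespace removed."""
    token = "".join(ch for ch in text if ch.isalnum()).upper()
    return [token] if token else []


# Assumed mapping from property type to normaliser; unknown types fall
# back to plain whitespace splitting.
NORMALIZERS = {
    "name": normalize_name,
    "identifier": normalize_identifier,
}


def tokens_for(type_name, text):
    normalizer = NORMALIZERS.get(type_name, lambda value: value.split())
    return normalizer(text)
```

The point of the lookup table is that each type only pays for the normalisation it needs, and the rest of the tokenisation can be left to the index.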
Agreed. I just want to spend a little bit of time identifying why some of the entities we're not matching are missing: whether they're dropping out because they never come up as candidates, or because of a bug somewhere. But I definitely want to make better use of the tokenisation available in tantivy.
It's giving results as if only one term needs to match, and candidates that match multiple terms aren't scoring noticeably higher.
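A toy illustration of the symptom, in plain Python with no tantivy involved: with "any term" (OR) matching, every document sharing a single token becomes a candidate, whereas "all terms" (AND) matching keeps only the full match. The documents, function names, and the coverage score below are made up for the example, and this is not how tantivy scores results.

```python
def tokenize(text):
    return text.lower().split()


def match_any(query, docs):
    """OR semantics: a document qualifies if it shares at least one term."""
    terms = set(tokenize(query))
    return {doc_id for doc_id, text in docs.items() if terms & set(tokenize(text))}


def match_all(query, docs):
    """AND semantics: a document qualifies only if it contains every term."""
    terms = set(tokenize(query))
    return {doc_id for doc_id, text in docs.items() if terms <= set(tokenize(text))}


def coverage(query, text):
    """Fraction of distinct query terms present in the document."""
    terms = set(tokenize(query))
    return len(terms & set(tokenize(text))) / len(terms) if terms else 0.0


# Made-up sample data for the illustration.
DOCS = {
    "a": "john smith",
    "b": "john doe",
    "c": "jane smith trading",
}

if __name__ == "__main__":
    query = "john smith"
    print(sorted(match_any(query, DOCS)))
    print(sorted(match_all(query, DOCS)))
    for doc_id in sorted(DOCS, key=lambda d: coverage(query, DOCS[d]), reverse=True):
        print(doc_id, coverage(query, DOCS[doc_id]))
```

If the real results look like `match_any` with little separation between full and partial matches, one thing to check is whether the query is being built as a disjunction of terms and how much weight each extra matching term actually adds.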