[Feature request] consider moving to uFuzzy #179

Closed
KraXen72 opened this issue Jan 16, 2023 · 5 comments

@KraXen72

Is your feature request related to a problem? Please describe.
https://github.com/leeoniya/uFuzzy is apparently faster than MiniSearch and yields more results.
see comparison:
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,MiniSearch&search=super%20ma

Describe the solution you'd like
If it's not too much hassle, swap the search library to uFuzzy.

Describe alternatives you've considered
Continue using MiniSearch, as it works OK for now.

Additional context
More info about uFuzzy is in its README.

@scambier
Owner

I can take a look to see how it performs (in terms of speed and quality) just out of curiosity, but unless I'm blown away by the results, there's little chance I'm going to replace Minisearch.

@leeoniya

leeoniya commented Jan 25, 2023

> but unless I'm blown away by the results

when not trying to fuzzy match, MiniSearch is one of the better ones (fast to search, no terrible matches), though its startup indexing still takes 600ms (vs 0.5ms for uFuzzy) on my 162k test dataset and consumes hundreds of MB of RAM. so you won't be as blown away as someone coming from Fuse.
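To make the startup-cost point concrete, here is a minimal sketch of search with no index at all: each query is compiled into a regular expression and tested against every string in a plain array, so there is nothing to build up front and the cost is paid per query instead. It is only loosely in the spirit of uFuzzy's regex-based filtering, not its actual implementation; `escapeRegExp`, `buildPattern`, `filterHaystack` and the `maxGap` parameter are hypothetical.

```js
// Hypothetical helper: escape regex metacharacters in user input.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Compile a needle like "super ma" into one case-insensitive regex.
// Terms must appear in order; within a term, up to `maxGap` extra
// letters may sit between consecutive characters (crude typo tolerance).
function buildPattern(needle, maxGap = 1) {
  const terms = needle.trim().split(/\s+/).filter(Boolean);
  const parts = terms.map(term =>
    [...term].map(escapeRegExp).join(`[a-z]{0,${maxGap}}`)
  );
  return new RegExp(parts.join(".*"), "i");
}

// Linear scan: no index is built, every query tests every string.
function filterHaystack(haystack, needle, maxGap = 1) {
  const re = buildPattern(needle, maxGap);
  const matches = [];
  haystack.forEach((item, i) => {
    if (re.test(item)) matches.push(i);
  });
  return matches;
}

const haystack = ["Super Mario Bros", "Superman", "Smash Mouth"];
console.log(filterHaystack(haystack, "super ma")); // [ 0, 1 ]
console.log(filterHaystack(haystack, "supr mar")); // [ 0 ]
```

The trade-off is the one described above: zero indexing time and memory, but every query touches the whole haystack, which gets expensive as the corpus grows.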

that being said, MiniSearch is quite bad outside of exact matches or exact prefixes. its fuzzy option brings back completely irrelevant results, so it is not good for typo tolerance. it also suffers from the same problem as many other search libraries: it produces a single composite "score" that is then used for ordering. if we change the uFuzzy settings to match the way MiniSearch matches (outOfOrder, exact + prefixes), you can see how a slight change in MiniSearch's scores puts its results in 🤯 order. i don't think this is something that can fundamentally be fixed in most libraries that rely on composite relevance scores. you can read more thoughts about it here: https://github.com/leeoniya/uFuzzy#a-biased-appraisal-of-similar-work

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,MiniSearch&outOfOrder&interLft=1&search=super%20ma

[screenshot: uFuzzy vs. MiniSearch results for the demo link above]
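The composite-score problem described above can be shown with a toy example. The numbers are made up, and neither function is MiniSearch's or uFuzzy's actual ranking code: `rankByComposite` folds several hypothetical match statistics into one weighted score, so nudging a single weight reorders the results, while `rankByTiers` compares one criterion at a time (a lexicographic sort), so the order follows from an explicit priority list rather than from tuned weights.

```js
// Made-up match statistics for three hypothetical results.
const results = [
  { id: "A", exactTerms: 2, gaps: 3, length: 20 },
  { id: "B", exactTerms: 1, gaps: 0, length: 8 },
  { id: "C", exactTerms: 2, gaps: 1, length: 30 },
];

// Composite relevance: one weighted number per result, sorted descending.
function rankByComposite(items, w) {
  const score = r => w.exact * r.exactTerms - w.gap * r.gaps - w.len * r.length;
  return [...items].sort((a, b) => score(b) - score(a)).map(r => r.id);
}

// Tiered ordering: compare one criterion at a time, falling through on ties.
function rankByTiers(items) {
  return [...items]
    .sort((a, b) =>
      b.exactTerms - a.exactTerms || // more exact terms first
      a.gaps - b.gaps ||             // then fewer gaps
      a.length - b.length            // then shorter strings
    )
    .map(r => r.id);
}

console.log(rankByComposite(results, { exact: 20, gap: 3, len: 1 })); // [ 'B', 'A', 'C' ]
console.log(rankByComposite(results, { exact: 20, gap: 2, len: 1 })); // [ 'A', 'B', 'C' ]
console.log(rankByTiers(results)); // [ 'C', 'A', 'B' ]
```

Which tier order is "right" is still a design decision; the tiered version just makes that decision visible instead of burying it in weights.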

@leeoniya

leeoniya commented Jan 25, 2023

i copied your settings from

```js
prefix: term => term.length >= options.prefixLength,
// length <= 3: no fuzziness
// length <= 5: fuzziness of 10%
// length > 5: fuzziness of 20%
fuzzy: term => (term.length <= 3 ? 0 : term.length <= 5 ? 0.1 : 0.2),
```

into

https://github.com/leeoniya/uFuzzy/blob/15c0809b29f80d60abd6df33b83a499ef0085644/demos/compare.html#L857-L860

and got 0 results with `options.prefixLength = 3`, and the screenshot above with `options.prefixLength = 1` (basically just `prefix: true`)
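To see what those options actually request for the query "super ma", here is a small sketch that evaluates them per term. It assumes, based on MiniSearch's documentation, that a fractional `fuzzy` value becomes a maximum edit distance of `Math.round(fraction * term.length)`, capped by `maxFuzzy` (assumed default 6); `effectiveParams` is a hypothetical helper, not part of MiniSearch. With `prefixLength = 3`, the two-letter term "ma" gets neither prefix nor fuzzy matching, which plausibly contributes to the empty result, depending on how the query terms are combined.

```js
// Hypothetical helper: evaluate the quoted prefix/fuzzy options for one term.
// Assumes MiniSearch turns a fuzzy fraction < 1 into a max edit distance of
// Math.round(fraction * term.length), capped by maxFuzzy (assumed default 6).
function effectiveParams(term, prefixLength, maxFuzzy = 6) {
  const prefix = term.length >= prefixLength;
  // length <= 3: no fuzziness; length <= 5: 10%; length > 5: 20%
  const fuzzy = term.length <= 3 ? 0 : term.length <= 5 ? 0.1 : 0.2;
  const maxDistance = Math.min(maxFuzzy, Math.round(term.length * fuzzy));
  return { term, prefix, maxDistance };
}

for (const prefixLength of [3, 1]) {
  for (const term of "super ma".split(" ")) {
    console.log(prefixLength, effectiveParams(term, prefixLength));
  }
}
// 3 { term: 'super', prefix: true, maxDistance: 1 }
// 3 { term: 'ma', prefix: false, maxDistance: 0 }
// 1 { term: 'super', prefix: true, maxDistance: 1 }
// 1 { term: 'ma', prefix: true, maxDistance: 0 }
```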

@scambier
Owner

Thanks for those comments and clarifications. From your README:

> In its default configuration, [uFuzzy] is likely a poor fit for applications like spellcheck or fulltext/document search.

For context, fulltext search is an important feature for me. My goal with Omnisearch is to allow retrieval of documents from queries that often don't match with filenames or headings. I need it to work in a "good enough" way for varied users who all use Obsidian differently; many of them have folders that contain thousands of (sometimes quite large) documents.

I see how your demo returns relevant results with shorter queries, but I'm mainly worried about performance and result quality for all those users. Since Minisearch is at least "good enough" for them, I'm cautious about swapping it for something else and breaking their workflows.

@leeoniya

leeoniya commented Jan 26, 2023

yep, no worries. 👍

for sure there's a tipping point when building an index is going to be more appropriate; "thousands of (sometimes quite large) documents" is definitely one of those cases. for fulltext search, you're better off stemming and doing spelling correction on the needle before continuing with efficient lookups in the index.
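The approach suggested here (stem and spell-correct the needle, then do fast lookups in an index) can be sketched with a tiny inverted index. This is a hypothetical illustration, not how MiniSearch or Omnisearch work internally: `stem` is a naive suffix stripper standing in for a real stemmer such as Porter's, `correct` picks the nearest vocabulary term by Levenshtein distance using a linear scan (a real implementation would use something like a trie or BK-tree), and multi-term queries use AND semantics.

```js
// Naive suffix stripper (hypothetical stand-in for a real stemmer).
function stem(word) {
  for (const suffix of ["ing", "ed", "s"]) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Classic dynamic-programming Levenshtein distance, one row at a time.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

class InvertedIndex {
  constructor() {
    this.postings = new Map(); // stemmed term -> Set of document ids
  }

  // Indexing cost is paid once per document.
  add(id, text) {
    for (const token of tokenize(text)) {
      const term = stem(token);
      if (!this.postings.has(term)) this.postings.set(term, new Set());
      this.postings.get(term).add(id);
    }
  }

  // Spelling correction on the needle: nearest vocabulary term within maxDistance.
  correct(term, maxDistance = 1) {
    if (this.postings.has(term)) return term;
    let best = null;
    let bestDist = maxDistance + 1;
    for (const candidate of this.postings.keys()) {
      const d = levenshtein(term, candidate);
      if (d < bestDist) {
        best = candidate;
        bestDist = d;
      }
    }
    return best;
  }

  // Every (stemmed, corrected) query term must match: intersect the posting sets.
  search(query) {
    let result = null;
    for (const token of tokenize(query)) {
      const term = this.correct(stem(token));
      const ids = term ? this.postings.get(term) : new Set();
      result = result === null ? new Set(ids) : new Set([...result].filter(id => ids.has(id)));
    }
    return result ? [...result] : [];
  }
}

const index = new InvertedIndex();
index.add(0, "Meeting notes for the search project");
index.add(1, "Daily notes");
index.add(2, "Search engine design");
console.log(index.search("searching notes")); // [ 0 ]
console.log(index.search("serch")); // [ 0, 2 ]
```

The expensive work (tokenizing and stemming every document) happens once in `add`; each query then only stems and corrects a few terms before cheap set lookups, which is the trade-off that favors an index for large document collections.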

if you do end up experimenting with uFuzzy, i can help you with the settings to get the most out of it.

@scambier closed this as not planned on May 29, 2024