[Feature request] consider moving to uFuzzy #179

KraXen72 · 2023-01-16T09:49:08Z

Is your feature request related to a problem? Please describe.
https://github.com/leeoniya/uFuzzy is apparently faster & yields more results than minisearch.
see comparison:
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,MiniSearch&search=super%20ma

Describe the solution you'd like
if not too much hassle, swap searching library to uFuzzy

Describe alternatives you've considered
continue using minisearch, as it is ok now.

Additional context
more info about uFuzzy in their readme

scambier · 2023-01-16T12:09:22Z

I can take a look to see how it performs (in terms of speed and quality) just out of curiosity, but unless I'm blown away by the results, there's little chance I'm going to replace Minisearch.

leeoniya · 2023-01-25T23:33:24Z

> but unless I'm blown away by the results

when not trying to fuzzy match, MiniSearch is one of the better ones (fast to search, no terrible matches, but startup indexing still takes 600ms vs 0.5ms for my 162k test dataset and consumes hundreds of MB ram), so you won't be as blown away as someone coming from Fuse.

that being said, MiniSearch is quite bad outside of exact matches or exact prefixes. its fuzzy option brings back completely irrelevant results so is not good for typo tolerance. it also suffers from the same problem as many other searches -- it produces a single "score" that is then used for ordering. if we change uFuzzy settings to behave similarly to MiniSearch in terms of matching (outOfOrder, exact + prefixes), you can see how a slight change in its scores puts the results in 🤯 order. i don't think this is something that can fundamentally be fixed in most libraries that rely on composite relevance scores. you can read more thoughts about it here: https://github.com/leeoniya/uFuzzy#a-biased-appraisal-of-similar-work

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,MiniSearch&outOfOrder&interLft=1&search=super%20ma
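To make the point about composite scores concrete, here is a minimal sketch contrasting a single weighted score with a comparator that checks criteria in a fixed priority order. It is not the scoring code of MiniSearch or uFuzzy: the `MatchStats` fields, the weights, and the three sample entries are all made up for illustration.

```typescript
// Hypothetical per-match statistics for the query "super ma".
// Field names and values are illustrative, not taken from either library.
interface MatchStats {
  text: string;
  exactTerms: number;  // query terms matched exactly
  prefixTerms: number; // query terms matched only as a prefix
  gapChars: number;    // characters between the matched terms
}

const matches: MatchStats[] = [
  { text: "Super Mario Bros", exactTerms: 1, prefixTerms: 1, gapChars: 1 },
  { text: "Super Street Fighter: Marvel", exactTerms: 1, prefixTerms: 1, gapChars: 17 },
  { text: "Superman", exactTerms: 0, prefixTerms: 1, gapChars: 0 },
];

// Approach 1: fold every signal into one number. The weights are arbitrary.
function compositeScore(m: MatchStats, gapWeight: number): number {
  return 2 * m.exactTerms + 1 * m.prefixTerms - gapWeight * m.gapChars;
}

function rankByScore(ms: MatchStats[], gapWeight: number): string[] {
  return [...ms]
    .sort((a, b) => compositeScore(b, gapWeight) - compositeScore(a, gapWeight))
    .map(m => m.text);
}

// Approach 2: compare criteria one at a time, most important first.
// A later criterion only breaks ties left by the earlier ones.
function compareTiered(a: MatchStats, b: MatchStats): number {
  return (
    b.exactTerms - a.exactTerms ||
    b.prefixTerms - a.prefixTerms ||
    a.gapChars - b.gapChars
  );
}

function rankTiered(ms: MatchStats[]): string[] {
  return [...ms].sort(compareTiered).map(m => m.text);
}

console.log(rankByScore(matches, 0.01));
console.log(rankByScore(matches, 0.2));
console.log(rankTiered(matches));
```

With these toy numbers, raising `gapWeight` from 0.01 to 0.2 pushes the entry that matches both terms below "Superman", which matches only one, while the tiered comparator has no weight to tune and keeps both-term matches first.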

leeoniya · 2023-01-25T23:58:19Z

i copied your settings from

obsidian-omnisearch/src/search/omnisearch.ts

Lines 161 to 165 in c896fd4

    
    prefix: term => term.length >= options.prefixLength,
    // length <= 3: no fuzziness
    // length <= 5: fuzziness of 10%
    // length > 5: fuzziness of 20%
    fuzzy: term => (term.length <= 3 ? 0 : term.length <= 5 ? 0.1 : 0.2),

into

https://github.com/leeoniya/uFuzzy/blob/15c0809b29f80d60abd6df33b83a499ef0085644/demos/compare.html#L857-L860

and got 0 results with options.prefixLength = 3, and the screenshot above with options.prefixLength = 1 (basically just prefix: true)
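As a rough illustration of why `options.prefixLength` matters for the query `super ma`, the sketch below evaluates the two callbacks from the quoted omnisearch.ts lines for each term. The helper names (`termOptions`, `describeQuery`) and the whitespace tokenization are assumptions made for this example; they are not part of Omnisearch, MiniSearch, or the comparison demo.

```typescript
// Result of evaluating the per-term callbacks for one query term.
interface TermOptions {
  term: string;
  prefix: boolean; // whether the term may match as a prefix
  fuzzy: number;   // fuzziness passed on for this term (0 = none)
}

// Hypothetical helper: `prefixLength` stands in for `options.prefixLength`.
function termOptions(term: string, prefixLength: number): TermOptions {
  // Same expressions as in the quoted omnisearch.ts lines.
  const prefix = term.length >= prefixLength;
  const fuzzy = term.length <= 3 ? 0 : term.length <= 5 ? 0.1 : 0.2;
  return { term, prefix, fuzzy };
}

// Naive whitespace tokenization, assumed for illustration only.
function describeQuery(query: string, prefixLength: number): TermOptions[] {
  return query
    .split(/\s+/)
    .filter(t => t.length > 0)
    .map(t => termOptions(t, prefixLength));
}

for (const prefixLength of [3, 1]) {
  console.log(`prefixLength = ${prefixLength}`, describeQuery("super ma", prefixLength));
}
```

With `prefixLength = 3`, the two-character term `ma` gets neither prefix matching nor fuzziness, so it can only match as an exact term. If every term is required to match, that would be consistent with the empty result reported above; with `prefixLength = 1`, both terms are allowed to match as prefixes.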

scambier · 2023-01-26T06:16:19Z

Thanks for those comments and clarifications. From your README:

> In its default configuration (uFuzzy) is likely a poor fit for applications like spellcheck or fulltext/document search.

For context, fulltext search is an important feature for me. My goal with Omnisearch is to allow retrieval of documents from queries that often don't match with filenames or headings. I need it to work in a "good enough" way for varied users who all use Obsidian differently; many of them have folders that contain thousands of (sometimes quite large) documents.

I see how your demo returns relevant results with shorter queries, but I'm mainly worried about performance and result quality for all those users. Since Minisearch is at least "good enough" for them, I'm cautious about swapping it for something else and breaking their workflow.

leeoniya · 2023-01-26T07:55:46Z

yep, no worries. 👍

for sure there's a tipping point when building an index is going to be more appropriate; "thousands of (sometimes quite large) documents" is definitely one of those cases. for fulltext search, you're better off stemming and doing spelling correction on the needle before continuing with efficient lookups in the index.

if you do end up experimenting with uFuzzy, i can help you with the settings to get the most out of it.
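The index-based approach described above (stem and spell-correct the needle, then do cheap lookups in a prebuilt index) could look roughly like the following. This is a toy sketch under stated assumptions, not how MiniSearch, uFuzzy, or Omnisearch work: the suffix-stripping `stem` is a stand-in for a real stemmer, `correct` simply picks the nearest indexed term within a hypothetical edit-distance limit of 1, and all terms are required to match.

```typescript
// Naive suffix stripper standing in for a real stemmer (assumption).
function stem(word: string): string {
  return word.toLowerCase().replace(/(ing|s)$/, "");
}

// Classic Levenshtein distance using a single rolling row.
function editDistance(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Inverted index: stemmed term -> ids of the documents containing it.
function buildIndex(docs: string[]): Map<string, Set<number>> {
  const index = new Map<string, Set<number>>();
  docs.forEach((doc, id) => {
    for (const word of doc.split(/\W+/).filter(Boolean)) {
      const key = stem(word);
      if (!index.has(key)) index.set(key, new Set<number>());
      index.get(key)!.add(id);
    }
  });
  return index;
}

// Spelling correction on the needle: exact hit, else nearest indexed term.
function correct(term: string, index: Map<string, Set<number>>, maxDistance = 1): string | undefined {
  if (index.has(term)) return term;
  let best: string | undefined;
  let bestDist = maxDistance + 1;
  for (const key of Array.from(index.keys())) {
    const d = editDistance(term, key);
    if (d < bestDist) {
      best = key;
      bestDist = d;
    }
  }
  return best;
}

// Every query term must match (AND semantics, chosen for simplicity).
function search(query: string, index: Map<string, Set<number>>): number[] {
  let result: number[] | undefined;
  for (const word of query.split(/\W+/).filter(Boolean)) {
    const key = correct(stem(word), index);
    const ids = key === undefined ? new Set<number>() : index.get(key)!;
    result = result === undefined ? Array.from(ids) : result.filter(id => ids.has(id));
  }
  return (result ?? []).sort((a, b) => a - b);
}

const docs = [
  "Indexing large notes in Obsidian",
  "Fuzzy searching for short text",
  "Notes on search relevance",
];
const index = buildIndex(docs);
console.log(search("serch notes", index));
```

The expensive work (tokenizing and stemming every document) happens once in `buildIndex`; each query then only normalizes a handful of needle terms and intersects small id sets, which is the trade-off that favors an index for large vaults.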

scambier mentioned this issue Mar 14, 2024

Stronger boolean search with date range #352

Closed

scambier closed this as not planned May 29, 2024
