Prefix match is ranked lower than expected #2

MoonZ · 2024-02-15T16:17:53Z

Following the HN article, quickly tried the demo on the places set but came up with weird results:

How come a result that doesn't match the input string gets a better quality score than the one that includes the entry ?
And how come another one not maching is having the same score just below it ?
If it's due to length having a closer match, I then don't understand the advantage for making it real time ? (If I need to type my string almost entirely to make my targeted result come up in the visible list, then "time gained" by indeed good performances is already lost)

But anyway, performances are very good indeed, just trying to figure out the use case.

m31coding · 2024-02-18T10:53:28Z

Hi, thank you very much for reporting this observation. I think this is an interesting topic for many people.
Carcasse and Carasso are good matches for the query carcasso because the words are similar. To be precise, the computed quality is the number of common n-grams between the query and the matched term, divided by the number of n-grams of the longer string.

Most fuzzy searchers are based on such string similarities, related heuristics, or well-defined string metrics. That being said, I can definitely see where you are coming from. A correctly typed prefix could potentially yield a higher-quality match than what is currently computed.

One solution that can be adopted is to implement a search controller that queries a fuzzy searcher as well as a suffix array searcher. The latter is good at finding prefix and suffix matches. The result matches of both can then be distinct and combined. However, I consider this approach to be out of scope for this library.

To get closer to your desired behaviour with the library as is, you could boost the qualities of prefix matches like so:

const query = new fuzzySearch.Query(queryString, 50, 0.2);
const result = searcher.getMatches(query);
for (const match of result.matches) {
  if (match.matchedString.toLowerCase().normalize('NFKD').startsWith(query.string.toLowerCase().normalize('NFKD'))) {
    match.quality = Math.pow(match.quality, 0.5);
  }
}
result.matches.sort((m1, m2) => m2.quality - m1.quality);

The lower the exponent, the more weight is given to prefix matches. With my choice of 0.5 Carcassonne is at rank 1 with a quality of 0.8 for the query carcasso. The quality of the other results don't change because their prefixes don't match.

I will try to find time in Q2 to investigate whether something like this can be implemented in the core of the searcher in a proper and performant way.

m31coding changed the title ~~Quality index: how come ?~~ Prefix match is ranked low Feb 18, 2024

m31coding added the enhancement New feature or request label Feb 18, 2024

m31coding changed the title ~~Prefix match is ranked low~~ Prefix match is ranked lower than expected Feb 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prefix match is ranked lower than expected #2

Prefix match is ranked lower than expected #2

MoonZ commented Feb 15, 2024

m31coding commented Feb 18, 2024 •

edited

Loading

Prefix match is ranked lower than expected #2

Prefix match is ranked lower than expected #2

Comments

MoonZ commented Feb 15, 2024

m31coding commented Feb 18, 2024 • edited Loading

m31coding commented Feb 18, 2024 •

edited

Loading