Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefix match is ranked lower than expected #2

Open
MoonZ opened this issue Feb 15, 2024 · 1 comment
Open

Prefix match is ranked lower than expected #2

MoonZ opened this issue Feb 15, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@MoonZ
Copy link

MoonZ commented Feb 15, 2024

Following the HN article, quickly tried the demo on the places set but came up with weird results:

image

How come a result that doesn't match the input string gets a better quality score than the one that includes the entry ?
And how come another one not maching is having the same score just below it ?
If it's due to length having a closer match, I then don't understand the advantage for making it real time ? (If I need to type my string almost entirely to make my targeted result come up in the visible list, then "time gained" by indeed good performances is already lost)

But anyway, performances are very good indeed, just trying to figure out the use case.

@m31coding m31coding changed the title Quality index: how come ? Prefix match is ranked low Feb 18, 2024
@m31coding m31coding added the enhancement New feature or request label Feb 18, 2024
@m31coding
Copy link
Owner

m31coding commented Feb 18, 2024

Hi, thank you very much for reporting this observation. I think this is an interesting topic for many people.
Carcasse and Carasso are good matches for the query carcasso because the words are similar. To be precise, the computed quality is the number of common n-grams between the query and the matched term, divided by the number of n-grams of the longer string.

Most fuzzy searchers are based on such string similarities, related heuristics, or well-defined string metrics. That being said, I can definitely see where you are coming from. A correctly typed prefix could potentially yield a higher-quality match than what is currently computed.

One solution that can be adopted is to implement a search controller that queries a fuzzy searcher as well as a suffix array searcher. The latter is good at finding prefix and suffix matches. The result matches of both can then be distinct and combined. However, I consider this approach to be out of scope for this library.

To get closer to your desired behaviour with the library as is, you could boost the qualities of prefix matches like so:

const query = new fuzzySearch.Query(queryString, 50, 0.2);
const result = searcher.getMatches(query);
for (const match of result.matches) {
  if (match.matchedString.toLowerCase().normalize('NFKD').startsWith(query.string.toLowerCase().normalize('NFKD'))) {
    match.quality = Math.pow(match.quality, 0.5);
  }
}
result.matches.sort((m1, m2) => m2.quality - m1.quality);

The lower the exponent, the more weight is given to prefix matches. With my choice of 0.5 Carcassonne is at rank 1 with a quality of 0.8 for the query carcasso. The quality of the other results don't change because their prefixes don't match.

I will try to find time in Q2 to investigate whether something like this can be implemented in the core of the searcher in a proper and performant way.

@m31coding m31coding changed the title Prefix match is ranked low Prefix match is ranked lower than expected Feb 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants