Synonyms can create excessively complex search queries #3125

Closed
3 tasks
Tracked by #3111
loiclec opened this issue Nov 23, 2022 · 2 comments
Labels

  • impacts docs: This issue involves changes in the Meilisearch's documentation
  • milli: Related to the milli workspace
  • performance: Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption
  • v1.0.0: PRs/issues solved in v1.0.0 released on 2023-02-06

Comments

@loiclec
Contributor

loiclec commented Nov 23, 2022

When a word has a multi-word synonym, for example:

"btw" -> "by the way"

or, worse, multiple multi-word synonyms, such as:

"poc" -> ["person of colour", "proof of concept"]

then search queries containing these synonyms will take much longer to resolve, especially if the queries are already long and contain many words with synonyms.
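To give a rough sense of why long queries with many synonyms get expensive, here is a minimal sketch that counts how many ways a query can be read when each word is either kept as typed or replaced by one of its synonyms. The `SYNONYMS` table and the `count_interpretations` helper are hypothetical names used for illustration only; this is not how milli represents queries internally, just the arithmetic behind the growth.

```python
from math import prod

# Hypothetical synonym table built from the examples in this issue.
SYNONYMS = {
    "btw": ["by the way"],
    "poc": ["person of colour", "proof of concept"],
    "nyc": ["new york city"],
}


def count_interpretations(query: str) -> int:
    """Count the readings of a query when every word can be kept as typed
    or replaced by any one of its synonyms."""
    words = query.lower().split()
    # Each word contributes (1 + number of synonyms) independent choices.
    return prod(1 + len(SYNONYMS.get(word, [])) for word in words)


print(count_interpretations("btw I am going to nyc soon"))  # 4
print(count_interpretations("btw the poc is in nyc"))       # 12
```

The count is a product over the words of the query, so each additional word with synonyms multiplies the number of readings.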

The slowdown mostly comes from the fact that we treat a multi-word synonym as a list of regular search query terms rather than as a phrase.

Therefore the following search query:

btw I am going to nyc soon

gets mapped to:

by the way I am going to New York City soon

as opposed to:

"by the way" I am going to "New York City" soon
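The two interpretations above can be sketched as follows. This is an illustration only: the `SYNONYMS` table and the `expand` function are hypothetical, and counting a quoted phrase as a single query term is an assumption of the sketch, not a statement about milli's internals.

```python
# Hypothetical synonym table for the example query.
SYNONYMS = {"btw": "by the way", "nyc": "New York City"}


def expand(query: str, as_phrases: bool) -> list[str]:
    """Replace each word that has a synonym, either by loose words
    or by a single quoted phrase."""
    terms = []
    for word in query.split():
        synonym = SYNONYMS.get(word.lower())
        if synonym is None:
            terms.append(word)                # no synonym: keep the word
        elif as_phrases:
            terms.append(f'"{synonym}"')      # one phrase term
        else:
            terms.extend(synonym.split())     # one term per synonym word
    return terms


loose = expand("btw I am going to nyc soon", as_phrases=False)
phrases = expand("btw I am going to nyc soon", as_phrases=True)

print(" ".join(loose), len(loose))      # by the way I am going to New York City soon 11
print(" ".join(phrases), len(phrases))  # "by the way" I am going to "New York City" soon 7
```

With loose words the seven-word query grows to eleven terms, while the phrase interpretation keeps it at seven.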

It is worth considering the impact of treating multi-word synonyms as phrases in terms of relevancy. If the impact is neutral or just slightly negative, then we should make this change to ensure good search performance.


TODO

  • Implement changes in Milli
  • Release a Milli version containing these changes
  • Bump this new Milli version in Meilisearch and merge it into main
loiclec added the milli and performance labels Nov 23, 2022
loiclec self-assigned this Nov 23, 2022
loiclec mentioned this issue Nov 23, 2022 (8 tasks)
curquiza added this to the v1.0.0 milestone Jan 3, 2023
bors bot added a commit to meilisearch/milli that referenced this issue Jan 4, 2023
732: Interpret synonyms as phrases r=loiclec a=loiclec

# Pull Request

## Related issue
Fixes (when merged into meilisearch) meilisearch/meilisearch#3125

## What does this PR do?
We now map multi-word synonyms to phrases instead of loose words, so that the request:
```
btw I am going to nyc soon
```
is interpreted as (when the synonym interpretation is chosen for both `btw` and `nyc`):
```
"by the way" I am going to "New York City" soon
```
instead of:
```
by the way I am going to New York City soon
```

This prevents queries containing multi-word synonyms from exceeding the word length limit and degrading search performance.

In terms of relevancy, there is a debate to be had. I personally think this could be considered an improvement, since it would be strange for a user to search for:
```
good DIY project
```
and have a result such as:
```
{
    "text": "whether it is a good project to do, you'll have to decide for yourself"
}
```
However, for synonyms such as `NYC -> New York City`, we will stop matching documents where `New York` is separated from `City`. This can be solved by adding an additional mapping: `NYC -> New York`.
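The relevancy trade-off can be illustrated with a small sketch. It assumes a synonym `DIY -> do it yourself` for the first example, uses a deliberately naive tokenizer, and treats "loose" matching as "every synonym word appears somewhere in the document". The helpers `tokens`, `loose_match` and `phrase_match` are hypothetical and much simpler than the real ranking logic; the second document is also made up for the example.

```python
def tokens(text: str) -> list[str]:
    """Naive tokenizer: lowercase, drop commas, split on whitespace."""
    return text.lower().replace(",", " ").split()


def loose_match(document: str, synonym: str) -> bool:
    """True if every word of the synonym appears anywhere in the document."""
    doc = tokens(document)
    return all(word in doc for word in tokens(synonym))


def phrase_match(document: str, synonym: str) -> bool:
    """True if the synonym words appear consecutively and in order."""
    doc, phrase = tokens(document), tokens(synonym)
    n = len(phrase)
    return any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))


diy_doc = "whether it is a good project to do, you'll have to decide for yourself"
print(loose_match(diy_doc, "do it yourself"))   # True: the surprising match
print(phrase_match(diy_doc, "do it yourself"))  # False: no longer matched

# Made-up document where "New York" is separated from "City".
nyc_doc = "The best pizza in New York, a city that never sleeps"
print(phrase_match(nyc_doc, "New York City"))   # False
# Adding the extra mapping NYC -> New York restores the match.
print(any(phrase_match(nyc_doc, s) for s in ["New York City", "New York"]))  # True
```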

## Performance

With the old behaviour, some long search requests making heavy use of synonyms could take minutes to execute. This is no longer the case: these search requests now take an average amount of time to resolve.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
@curquiza
Member

curquiza commented Jan 4, 2023

Closed by #3269, which integrates milli v0.38.0, which includes the changes for this issue.

curquiza closed this as completed Jan 4, 2023
@curquiza
Member

@meilisearch/docs-team might be interested (it's up to you if you want to explain)

curquiza added the impacts docs label Jan 10, 2023
meili-bot added the v1.0.0 label Feb 7, 2023