✨ Ranking of aggregated search results based on relevancy of the search result to the user's search query #549

Merged
merged 12 commits into from
Mar 25, 2024


spencerjibz
Contributor

@spencerjibz spencerjibz commented Mar 20, 2024

What does this PR do?

Implements an algorithm to rank (that is, sort) the aggregated search results by their relevance to the user's search query. It analyzes three properties of each search result:

  • title
  • description
  • URL

Each property is also checked for how closely it matches the query: the result that matches most closely should appear first among all search results, and the least related result should appear last.

Chosen Ranking function: TF-IDF (Term Frequency-Inverse Document Frequency)
TF-IDF is a weighting scheme often used by search engines as a central tool for scoring and ranking a document's relevance to a user query. Implementation details:

  • Our implementation measures the relevance of the user's query against a corpus built from each result's title, description, and URL, then sorts the results by relevance score.
  • Optional: the query can also be weighted by synonyms found in the title, description, and URL. This is enabled with the feature flag use-synonyms-search or use-non-static-synonyms-search.
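
As an illustration of the ranking step described above, here is a minimal TF-IDF sketch in Rust. It is not the code from this PR: the `SearchResult` struct, the `tokenize` and `rank_by_tf_idf` functions, and the smoothed IDF formula are all assumptions made for the example.

```rust
/// Hypothetical, simplified search result; the project's real struct may differ.
#[derive(Debug, Clone)]
pub struct SearchResult {
    pub title: String,
    pub description: String,
    pub url: String,
}

/// Lowercases the text and splits it into alphanumeric tokens.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Scores each result against the query with TF-IDF and reorders
/// `results` so that the highest-scoring result comes first.
pub fn rank_by_tf_idf(query: &str, results: &mut Vec<SearchResult>) {
    // One "document" per result: title, description and URL joined together.
    let docs: Vec<Vec<String>> = results
        .iter()
        .map(|r| tokenize(&format!("{} {} {}", r.title, r.description, r.url)))
        .collect();
    let n_docs = docs.len() as f64;
    let query_terms = tokenize(query);

    // Document frequency: in how many documents each query term occurs.
    let doc_freq: Vec<f64> = query_terms
        .iter()
        .map(|term| docs.iter().filter(|d| d.contains(term)).count() as f64)
        .collect();

    let scores: Vec<f64> = docs
        .iter()
        .map(|doc| {
            if doc.is_empty() {
                return 0.0;
            }
            query_terms
                .iter()
                .zip(&doc_freq)
                .map(|(term, df)| {
                    // Term frequency, normalised by document length.
                    let tf = doc.iter().filter(|t| *t == term).count() as f64
                        / doc.len() as f64;
                    // Smoothed inverse document frequency (an assumed variant).
                    let idf = ((1.0 + n_docs) / (1.0 + df)).ln() + 1.0;
                    tf * idf
                })
                .sum::<f64>()
        })
        .collect();

    // Pair scores with results, sort descending (stable for ties), write back.
    let mut scored: Vec<(f64, SearchResult)> =
        scores.into_iter().zip(results.drain(..)).collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    results.extend(scored.into_iter().map(|(_, r)| r));
}
```

Calling `rank_by_tf_idf("rust async", &mut results)` leaves results that mention neither term at the end of the vector, and results that mention both terms in a short title or URL near the front.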

Caution

This feature flag enables an offline thesaurus library (moby) that finds synonyms and adds them to the relevance calculations. Enabling it has both a performance cost and a binary-size cost.
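
To show why the cost is only paid when the flag is on, here is a rough sketch of feature-gated query expansion using Rust's `#[cfg(feature = ...)]` attribute. The `synonyms_of` and `expand_query` functions are hypothetical, and the hard-coded table merely stands in for a thesaurus lookup; none of this is the PR's actual code or the moby crate's API.

```rust
/// Compiled only when the `use-synonyms-search` feature is enabled.
/// The table below is a stand-in for a real thesaurus lookup.
#[cfg(feature = "use-synonyms-search")]
fn synonyms_of(term: &str) -> Vec<String> {
    match term {
        "fast" => vec!["quick".to_owned(), "rapid".to_owned()],
        _ => Vec::new(),
    }
}

/// Fallback used when the feature is disabled: no synonyms, no extra work,
/// and no thesaurus data compiled into the binary.
#[cfg(not(feature = "use-synonyms-search"))]
fn synonyms_of(_term: &str) -> Vec<String> {
    Vec::new()
}

/// Returns the original query terms followed by any synonyms found for them.
pub fn expand_query(terms: &[&str]) -> Vec<String> {
    let mut expanded: Vec<String> = terms.iter().map(|t| t.to_string()).collect();
    for term in terms {
        expanded.extend(synonyms_of(term));
    }
    expanded
}
```

With the feature disabled, `expand_query` returns the query terms unchanged, so the relevance calculation sees exactly what the user typed.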

For more information on this ranking function, check out this article

Why is this change important?

Currently, the search engine doesn't sort results by importance or relevance. This PR aims to solve that.

How to test this PR locally?

To test, install and run Websurfx as described in the docs and the README, then open it in a browser and try several searches. For any topic, the most relevant results should be returned first.

To try out synonyms, build or run Websurfx with the feature flag use-synonyms-search.

Related issues

Closes #393

Contributor

@ddotthomas ddotthomas left a comment


LGTM, the search is much more accurate now too.

@alamin655 alamin655 added this to the Complete v2.0.0 release milestone Mar 21, 2024
@alamin655 alamin655 requested review from alamin655 and neon-mmd and removed request for ddotthomas March 21, 2024 06:39
@alamin655 alamin655 removed this from the Complete v2.0.0 release milestone Mar 21, 2024
@alamin655 alamin655 changed the title Rank and sort aggregated search results ✨ Ranking of aggregated search results based on relevancy of the search result to the user's search query Mar 21, 2024
Owner

@neon-mmd neon-mmd left a comment


Sorry for the late review.

Thanks ❤️ for working on this PR. We really appreciate it 👍 Keep it up!! But before we merge this PR, we would like to suggest a few changes. 🙂

Resolved review threads (now outdated): Cargo.toml (3), src/models/aggregation_models.rs (2)
spencerjibz and others added 7 commits March 23, 2024 23:57
Co-authored-by: neon_arch <mustafadhuleb53@gmail.com>
Owner

@neon-mmd neon-mmd left a comment


Thanks ❤️ again for working on this. Now that everything looks good, we will merge the PR as soon as possible. 🙂

Collaborator

@alamin655 alamin655 left a comment


Approved! Let's merge this in. 👍

@mergify mergify bot merged commit bb50e8b into neon-mmd:rolling Mar 25, 2024
6 checks passed
4 participants