[RFC] OpenSearch Search Relevance #1

Closed
ps48 opened this issue Jul 5, 2022 · 3 comments
Labels
enhancement: change or upgrade that increases software capabilities beyond original client specifications
rfc: Substantial changes or new features that require community input to garner consensus.

Comments

@ps48
Member

ps48 commented Jul 5, 2022

OpenSearch Search Relevance

Overview

In a search engine, relevance measures how accurately the search results match the search query. The higher the relevance, the higher the quality of the search results and the easier it is for users to find the content they need. This project aims to add plugins to OpenSearch that help users make their query results more accurate, contextual, and relevant.

Relevancy and OpenSearch

Today, OpenSearch returns results ordered by scores that its algorithms compute by matching the indexed document contents against the input query. Documents that are more relevant to the query rank higher than those that are less relevant. These rankings may make sense for one set of users or applications and be largely irrelevant for others. For example, for an e-commerce company, relevancy can mean surfacing more similar products in the same category as the search query, while for document search, relevancy may mean searching the query across the different topics and categories present in the document store. This is why we need more ways to customize the results and their rankings according to the needs of the user or business.
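As a rough illustration of customizing rankings for one business need, the sketch below builds a `function_score` request body that boosts documents in a given category. The field names `title` and `category`, the helper name, and the default weight are assumptions made for this example and are not part of the RFC.

```python
import json


def build_category_boost_query(user_query, category, boost_weight=2.0):
    """Build a search body that boosts hits in one category.

    Assumption: documents have a text field `title` and a keyword
    field `category`. Both names are hypothetical.
    """
    return {
        "query": {
            "function_score": {
                # Base relevance still comes from the text match.
                "query": {"match": {"title": user_query}},
                "functions": [
                    {
                        # Only documents in the target category get the weight.
                        "filter": {"term": {"category": category}},
                        "weight": boost_weight,
                    }
                ],
                # Multiply the base score by the function result.
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    body = build_category_boost_query("running shoes", "footwear")
    print(json.dumps(body, indent=2))
```

An e-commerce application might send this body as-is, while a document search application would likely drop the category filter and match across several fields instead.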

Relevancy Engineering

Relevancy as a problem can't be solved at the search layer alone. Improving relevancy should be envisioned holistically, from understanding the ingested data and usage signals to extracting features, adding re-writers, and improving algorithms. Below is the architecture of OpenSearch Relevancy Engineering.

[Initially presented at Haystack 2022 by @anirudha, @JohannesDaniel and @ps48].

Overall, Relevancy Engineering can be divided into two tiers:
  1. Ingestion Tier: This tier handles getting the data from different sources to OpenSearch. This data may include:
    1. Search Data:
      1. Core search data that OpenSearch needs to query.
      2. Ingestion connectors to fetch the data from different data sources and sink it into OpenSearch indices.
    2. Search Management Data:
      1. Adding rules and judgments to the rewriter indices.
    3. Observability Data:
      1. Adding customer usage signals to OpenSearch. These signals may include granular details like anonymized customer queries, clicks, orders, and session details.
  2. Search & Relevancy Platform Tier: This tier is responsible for analytics, re-writers, model improvements, and adding search configurations.
    1. Search Analytics & Discovery:
      1. Dashboards for analytics, metrics for search tests, search UIs and query profiling.
    2. Querqy-based query rewriting:
      1. Rewriters to customize queries with synonyms, word-breaks, spell corrections, query relaxation.
    3. Search Back Office:
      1. Manage business rules, ontologies and manual judgments.
    4. Relevancy workbench:
      1. Improve algorithms with automated testing, relevance model training, personalization, and custom re-rankers.
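
To make the rewriting step in the platform tier more concrete, here is a minimal sketch of rule-based query rewriting with spell corrections and synonyms. It is a toy illustration and not Querqy's API. The `RewriteRules` class, the function names, and the `title` field are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RewriteRules:
    """Hypothetical rule set; real rules would live in a rewriter index."""
    synonyms: Dict[str, List[str]] = field(default_factory=dict)
    spelling: Dict[str, str] = field(default_factory=dict)


def rewrite_query(query: str, rules: RewriteRules) -> List[List[str]]:
    """Return, for each query term, the list of alternatives to match."""
    rewritten = []
    for raw in query.lower().split():
        # Apply the spell correction first, then expand with synonyms.
        term = rules.spelling.get(raw, raw)
        extras = [s for s in rules.synonyms.get(term, []) if s != term]
        rewritten.append([term] + extras)
    return rewritten


def to_bool_query(alternatives: List[List[str]], field_name: str = "title") -> dict:
    """Every term must match, but any of its alternatives is acceptable."""
    must = []
    for alts in alternatives:
        should = [{"match": {field_name: alt}} for alt in alts]
        must.append({"bool": {"should": should, "minimum_should_match": 1}})
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    rules = RewriteRules(
        synonyms={"laptop": ["notebook"]},
        spelling={"lapptop": "laptop"},
    )
    print(to_bool_query(rewrite_query("Lapptop bag", rules)))
```

Keeping the rules as data separate from the rewriting logic mirrors the split described above, where rules are managed in a back office and applied at query time.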

Appendix

OpenSearch is flexible thanks to its plugin-based architecture. Each plugin interface in OpenSearch provides different options to intercept an existing workflow or extend the engine with new workflow options. One of the places where relevancy plugins should focus is the “Search Plugin interface”. This interface lets plugins intercept, add, or modify:

  1. Score functions
  2. Significance Heuristics
  3. Aggregators and their functions
  4. Highlighters
  5. Suggesters
  6. Queries
  7. Sorting orders
  8. Re-scorers

More details on each interface can be found in the OpenSearch code base and in this blog.
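
As a conceptual illustration of what a re-scorer from the list above does, the sketch below re-scores only the top window of first-pass hits and leaves the rest in their original order. It is written in Python for brevity and is not the Java plugin interface. The function name, the weights, and the `clicks` signal are assumptions made for this example.

```python
from typing import Callable, Dict, List

Hit = Dict[str, object]


def rescore_top_window(
    hits: List[Hit],
    window_size: int,
    second_pass: Callable[[Hit], float],
    query_weight: float = 1.0,
    rescore_weight: float = 1.0,
) -> List[Hit]:
    """Re-rank the first `window_size` hits with a second scoring pass."""
    head, tail = hits[:window_size], hits[window_size:]
    rescored = []
    for hit in head:
        # Combine the first-pass score with the second-pass score.
        combined = query_weight * hit["_score"] + rescore_weight * second_pass(hit)
        rescored.append({**hit, "_score": combined})
    # Only the window is re-sorted; hits outside it keep their order.
    rescored.sort(key=lambda h: h["_score"], reverse=True)
    return rescored + tail


if __name__ == "__main__":
    hits = [
        {"_id": "a", "_score": 3.0, "clicks": 0},
        {"_id": "b", "_score": 2.0, "clicks": 5},
        {"_id": "c", "_score": 1.0, "clicks": 100},
    ]
    reranked = rescore_top_window(hits, 2, lambda h: 0.5 * h["clicks"])
    print([h["_id"] for h in reranked])
```

Limiting the second pass to a window keeps an expensive scoring function affordable, at the cost of never promoting documents that fall outside the window.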

@ps48 added the enhancement label Jul 5, 2022
@macohen
Collaborator

macohen commented Sep 23, 2022

Is there a link somewhere to the Haystack 2022 presentation for more context? While the Querqy integration is already awesome and makes a ton of sense, it's the one piece of this diagram and RFC that is opinionated in the implementation. What do you think of updating the diagram for the purposes of this RFC to abstract the implementation details?

@ps48
Member Author

ps48 commented Sep 29, 2022

Hi @macohen, here's the link to the slides from our Haystack talk. The above RFC was our first take on the OpenSearch Search-Relevance project; it is very open to change with more comments and feedback from the community. We can surely update the diagram with a generic name like "Rules based Re-writing" instead of Querqy.

@noCharger added the rfc label Dec 20, 2022
@macohen
Collaborator

macohen commented Dec 27, 2022

I'm closing this issue. It's broad and covers a lot so I moved it to RELEVANCE.md. We'll create other RFCs to cover this topic.

@macohen macohen closed this as completed Dec 27, 2022