[RFC] OpenSearch Search Relevance #1

ps48 · 2022-07-05T18:17:24Z

OpenSearch Search Relevance

Overview

In a search engine, relevance measures how well a search result matches the search query. The higher the relevance, the higher the quality of the search results and the more useful the content users receive. This project aims to add plugins to OpenSearch that help users make their query results more accurate, contextual, and relevant.

Relevancy and OpenSearch

Today, OpenSearch returns results ordered by scores that its algorithms generate by matching indexed document contents against the input query. Documents that are more relevant to the query rank higher than less relevant ones. These rankings may make sense for one set of users or applications and be largely irrelevant for others. For example, for an e-commerce company, relevancy can mean surfacing similar products in the same category as the search query, while for document search it may mean matching the query across the different topics and categories present in the document store. This is why we need more ways to customize results and their rankings according to the needs of the user or business.
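To make the point concrete, here is a minimal sketch in plain Python (not OpenSearch code) of how the same documents rank differently under two notions of relevance. The documents, the `text_score` field, and the boost value are all made up for illustration; `text_score` merely stands in for whatever text-match score the engine would produce.

```python
from typing import Callable, Dict, List

# Hypothetical documents; "text_score" stands in for an engine's text-match score.
DOCS: List[Dict] = [
    {"id": "a", "category": "shoes", "text_score": 2.0},
    {"id": "b", "category": "books", "text_score": 3.0},
    {"id": "c", "category": "shoes", "text_score": 1.5},
]


def text_only(doc: Dict) -> float:
    """Relevance as pure text match, e.g. for document search."""
    return doc["text_score"]


def category_boosted(category: str, boost: float = 2.5) -> Callable[[Dict], float]:
    """Relevance that favours one category, e.g. for an e-commerce search."""
    def score(doc: Dict) -> float:
        factor = boost if doc["category"] == category else 1.0
        return doc["text_score"] * factor
    return score


def rank(docs: List[Dict], score_fn: Callable[[Dict], float]) -> List[str]:
    """Return document ids ordered from highest to lowest score."""
    return [d["id"] for d in sorted(docs, key=score_fn, reverse=True)]


document_search_order = rank(DOCS, text_only)
ecommerce_order = rank(DOCS, category_boosted("shoes"))
print(document_search_order)  # ['b', 'a', 'c']
print(ecommerce_order)        # ['a', 'c', 'b']
```

The two orderings come from the same data; only the definition of "relevant" changed, which is the kind of customization this RFC argues for.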

Relevancy Engineering

Relevancy cannot be solved at the search layer alone. Improving relevancy should be approached holistically, from understanding the ingested data and usage signals to extracting features, adding rewriters, and improving algorithms. Below is the architecture of OpenSearch Relevancy Engineering.

[Initially presented at Haystack 2022 by @anirudha, @JohannesDaniel and @ps48.]

Overall, Relevancy Engineering can be divided into two tiers:

Ingestion Tier: This tier handles getting data from different sources into OpenSearch. This data may include:
1. Search Data:
  1. Core search data that OpenSearch needs to query.
  2. Ingestion connectors that fetch data from different data sources and sink it into OpenSearch indices.
2. Search Management Data:
  1. Rules and judgments added to the rewriter indices.
3. Observability Data:
  1. Customer usage signals added to OpenSearch. These signals may include granular details such as anonymized customer queries, clicks, orders, and session details.
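
As an illustration of the observability data described above, the following is a rough sketch of what a single usage-signal document might look like before it is indexed. The field names, the `UsageSignal` class, and the salted-hash approach are assumptions made for this example and are not defined by the RFC; a salted hash is strictly pseudonymization, and a real deployment would need its own privacy review.

```python
import hashlib
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class UsageSignal:
    """Hypothetical shape of one usage-signal document."""
    session_id: str             # hashed, never the raw identifier
    query: str                  # the query text the customer searched for
    clicked_doc_ids: List[str]  # results the customer clicked
    ordered_doc_ids: List[str]  # results that led to an order


def anonymize(raw_id: str, salt: str) -> str:
    """Replace a raw identifier with a salted SHA-256 digest."""
    return hashlib.sha256((salt + raw_id).encode("utf-8")).hexdigest()


def to_document(raw_session_id: str, query: str, clicks: List[str],
                orders: List[str], salt: str = "example-salt") -> Dict:
    """Build a plain dict that could be sent to an index as a JSON document."""
    signal = UsageSignal(
        session_id=anonymize(raw_session_id, salt),
        query=query,
        clicked_doc_ids=list(clicks),
        ordered_doc_ids=list(orders),
    )
    return asdict(signal)


doc = to_document("session-123", "red shoes", ["p1", "p7"], ["p7"])
print(doc["query"], doc["clicked_doc_ids"], doc["ordered_doc_ids"])
```

Keeping clicks and orders on the same record as the query is one possible design; it makes it straightforward to derive implicit judgments later, at the cost of larger documents.
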
Search & Relevancy Platform Tier: This tier is responsible for analytics, rewriters, model improvements, and adding search configurations.
1. Search Analytics & Discovery:
  1. Dashboards for analytics, metrics for search tests, search UIs, and query profiling.
2. Querqy-based Query Rewriting:
  1. Rewriters to customize queries with synonyms, word breaks, spelling corrections, and query relaxation.
3. Search Back Office:
  1. Manage business rules, ontologies, and manual judgments.
4. Relevancy Workbench:
  1. Improve algorithms with automated testing, relevance model training, personalization, and custom re-rankers.
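
The query rewriting item in the list above can be sketched as follows. This is a simplified illustration of rule-based rewriting in plain Python, not Querqy's actual rule format or API; the `SYNONYMS` and `SPELLING` tables, the function names, and the `title` field are hypothetical. The output uses the standard `bool` / `match` query DSL.

```python
from typing import Dict, List

# Hypothetical rule tables; a real system would load these from a rules index.
SYNONYMS: Dict[str, List[str]] = {"laptop": ["notebook"], "tv": ["television"]}
SPELLING: Dict[str, str] = {"labtop": "laptop"}


def rewrite(query: str) -> List[List[str]]:
    """Return, for each query term, the corrected term plus its synonyms."""
    rewritten = []
    for term in query.lower().split():
        term = SPELLING.get(term, term)                     # spelling correction
        rewritten.append([term] + SYNONYMS.get(term, []))   # synonym expansion
    return rewritten


def to_bool_query(alternatives: List[List[str]], field: str = "title") -> Dict:
    """Every term must match, but any of its alternatives satisfies it."""
    return {
        "bool": {
            "must": [
                {
                    "bool": {
                        "should": [{"match": {field: alt}} for alt in alts],
                        "minimum_should_match": 1,
                    }
                }
                for alts in alternatives
            ]
        }
    }


alternatives = rewrite("cheap labtop")
query_body = to_bool_query(alternatives)
print(alternatives)  # [['cheap'], ['laptop', 'notebook']]
```

Separating the rewrite step from query construction keeps the rules independent of the query DSL, so the same rules could feed a different query builder.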

Appendix

OpenSearch is flexible thanks to its plugin-based architecture. Each plugin interface in OpenSearch provides options to intercept an existing workflow or extend the engine with new workflows. One place where relevancy plugins should focus is the “Search Plugin interface”. This interface lets plugins intercept, add, or modify:

Score functions
Significance Heuristics
Aggregators and their functions
Highlighters
Suggesters
Queries
Sorting orders
Re-scorers

More details on each interface can be found in the OpenSearch code base and in this blog.

macohen · 2022-09-23T16:58:01Z

Is there a link somewhere to the Haystack 2022 presentation for more context? While the Querqy integration is already awesome and makes a ton of sense, it's the one piece of this diagram and RFC that is opinionated in the implementation. What do you think of updating the diagram for the purposes of this RFC to abstract the implementation details?

ps48 · 2022-09-29T22:57:18Z

Hi @macohen, here's the link to the slides from our Haystack talk. The above RFC was our first take on the OpenSearch Search-Relevance project, and it is very open to change with more comments and feedback from the community. We can surely update the diagram with a generic name like "Rules based Re-writing" instead of Querqy.

macohen · 2022-12-27T14:07:10Z

I'm closing this issue. It's broad and covers a lot so I moved it to RELEVANCE.md. We'll create other RFCs to cover this topic.

ps48 added the enhancement label Jul 5, 2022

ps48 mentioned this issue Jul 6, 2022

[RFC] OpenSearch Querqy Plugin Design Document #2

Closed

ps48 mentioned this issue Dec 2, 2022

[RFC] OpenSearch Relevancy Workbench opensearch-project/dashboards-search-relevance#105

Open

noCharger added the rfc label Dec 20, 2022

macohen closed this as completed Dec 27, 2022
