
Discussion on combination of result sets from different query types #4557

Closed
jmazanec15 opened this issue Sep 19, 2022 · 21 comments
Labels
discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request Indexing & Search

Comments

@jmazanec15 (Member)

Recently, on the k-NN plugin, we have received interest in combining results from text-matching queries with k-NN queries (ref). So, I wanted to start a discussion about result combination and the problems with doing it when scores are computed at the shard level.

From my understanding, currently the only way to combine scores from different queries is through a boolean query. In a boolean query, multiple query clauses are provided in the should or must section and the scores are combined via addition. On top of this, with function scoring, it is possible to manipulate the scores further before they are added.

That being said, I see a few limitations with this approach.

First, it is difficult to combine result sets whose scores are produced by different scoring methods. To combine results effectively, the different queries' scores would need to be put on the same scale. By this, I mean that a score would need to meet 2 requirements: (1) indicate the document's relevance relative to the other documents scored in the query and (2) be comparable with the relative relevance of results from other queries. For example, for k-NN the score range may be 0-1, while BM25 scores range from 0 to Float.MAX_VALUE (I think). With this, it would be difficult to figure out an effective way to weight each result appropriately. One way to do this would be to normalize the scores before combining them. Normalization might be possible through rescoring, but this would happen at the shard level, which could cause problems when results are combined. For instance, if one shard has better results than another, per-shard normalization may skew the importance so that the top results from the weaker shard appear better than those from the stronger shard.
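The per-shard normalization pitfall can be illustrated with a small sketch. The scores below and the min-max scheme are hypothetical, just one possible normalization:

```python
def min_max_normalize(scores):
    """Min-max normalize a list of raw scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

# Hypothetical raw BM25 scores from two shards.
shard_a = [42.0, 40.0, 38.0]  # strong matches
shard_b = [9.0, 5.0, 1.0]     # weak matches

# Normalizing per shard erases the gap between shards: the top hit
# of each shard maps to 1.0, even though shard A's is far better.
norm_a = min_max_normalize(shard_a)
norm_b = min_max_normalize(shard_b)

# Normalizing once, globally at the coordinator, keeps shard B's
# results ranked well below shard A's.
norm_global = min_max_normalize(shard_a + shard_b)
```

Here the coordinator-level pass keeps shard B's best hit near 0.2 instead of 1.0, which is the skew the paragraph above describes.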

Second, it is not possible to consider global hits for re-ranking. Because scores are assigned at the shard level, any rescoring has to be done at the shard level too. I see a problem with this in two cases. First, for score combination: if an index has a significant number of shards, a user may want only the top results to be combined, instead of combining the results from all shards in the index and then aggregating at the coordinator. Second, in the future a user may have an expensive model that they want to run the results through to re-rank them, and they don't want to run it for each shard in the index.

That being said, my preliminary thought on a solution would be to create some kind of search-and-merge API that might look like:

GET /{index_name}/_search_and_merge
{
    "queries": [
      {
        "query": {
          "knn": {
            "my_vector2": {
              "vector": [2, 3, 5, 6],
              "k": 2
            }
          }
        }
      },
      {
        "query": {
          "match_all": {}
        }
      }
    ],
    "merge": {
      "norm": true,
      "disjunctive": false,
      "strategy": "linear_combo"
      "params": {
        "weight_1": 10.2,
        "weight_2": 0.25
      }
    },
    "size": 100
}

Here the queries would be a list of queries to be executed independently and merge would contain the logic to combine the results at the coordinator level.
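A rough sketch of what the coordinator-level merge for a `linear_combo` strategy could do. The function name, the min-max normalization, and the weights are all hypothetical, not an existing API:

```python
def merge_linear(results_1, results_2, w1, w2, size):
    """Min-max normalize each result set, then combine scores with a
    weighted sum and return the top `size` (doc_id, score) pairs."""
    def normalize(results):
        lo, hi = min(results.values()), max(results.values())
        span = (hi - lo) or 1.0  # guard against a constant score set
        return {doc: (s - lo) / span for doc, s in results.items()}

    n1, n2 = normalize(results_1), normalize(results_2)
    combined = {}
    for doc in set(n1) | set(n2):  # union of hits (a disjunctive merge)
        combined[doc] = w1 * n1.get(doc, 0.0) + w2 * n2.get(doc, 0.0)
    return sorted(combined.items(), key=lambda kv: -kv[1])[:size]

# Hypothetical per-query hits (doc id -> raw score) gathered at the
# coordinator from the two sub-queries.
knn_hits = {"doc1": 0.9, "doc2": 0.4}
bm25_hits = {"doc2": 30.0, "doc3": 12.0}
top = merge_linear(knn_hits, bm25_hits, w1=10.2, w2=0.25, size=2)
```

Because each query is normalized over its full (coordinator-side) result set before the weighted sum, the two score scales no longer need to match.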

All that being said, I want to see what people think of doing this. Is there a way to accomplish this with the current search interface?

@jmazanec15 jmazanec15 added enhancement Enhancement or improvement to existing feature or request untriaged labels Sep 19, 2022
@tlfeng tlfeng added discuss Issues intended to help drive brainstorming and decision making Indexing & Search and removed untriaged labels Sep 20, 2022
@dblock (Member) commented Sep 20, 2022

Would it make sense to extend rescoring to run at a higher level than shard (i.e. merge)?
Another idea is that merge could be seen as a map/reduce, so rescore (map) and merge (reduce)? That could scale like map/reduce as well.

@jmazanec15 (Member Author)

In general, I think it would be similar to a rescore at the coordinator level. The workflow I have in mind would:

  1. Execute multiple queries that would return results to coordinator (almost like _msearch API)
  2. Pre-process results from each query (i.e. normalization)
  3. Expose rescore functionality based on these results (take into account score combination)
  4. Reduce results based on score and execute fetch phase

I think extending rescore might be confusing from a user perspective (i.e., does it happen at the shard or the coordinator?).

@navneet1v (Contributor) commented Sep 27, 2022

Would it make sense to extend rescoring to run at a higher level than shard (i.e. merge)? Another idea is that merge could be seen as a map/reduce, so rescore (map) and merge (reduce)? That could scale like map/reduce as well.

Yes, this can be seen as a possible solution, and I think we should do some brainstorming around how we want to do it.

But from a requirements standpoint, what Jack mentioned in the comment above captures the exact requirements. Whether we want to call it rescore or some other name is something we can debate.

@peterdm commented Dec 20, 2022

Take a look at this paper from the Pinecone team: https://arxiv.org/abs/2210.11934

Convex Combination shows promise as a simple/effective way to combine the two hybrid scores. (Normalize each, and regress to find a linear scaling factor between the two distributions).

Distributed normalization of course is the challenge. Shard-level might be workable.

I’d suggest starting with simply allowing the scores from each query type to be returned and combined as the developer sees fit.

For example, convex combination could be implemented with two script_score queries inside a dis_max query with tie_breaker=1 (where each script_score applies a precomputed normalization function and boost for its query type).
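A sketch of what that dis_max query body might look like, written as a Python dict in the style of the other examples in this thread. The field names, boosts, and the precomputed `max_bm25` normalization constant are all hypothetical placeholders a user would derive from their own corpus:

```python
# Convex combination via dis_max with tie_breaker=1 (sums clause scores).
# Each clause is pre-scaled into [0, 1] and weighted by its boost.
body = {
    "size": 10,
    "query": {
        "dis_max": {
            "tie_breaker": 1.0,  # tie=1: add the scores of all clauses
            "queries": [
                {   # lexical clause, scaled by a precomputed max BM25 score
                    "script_score": {
                        "query": {"match": {"category": "hairdresser"}},
                        "script": {
                            "source": "_score / params.max_bm25",
                            "params": {"max_bm25": 35.0},
                        },
                        "boost": 0.7,
                    }
                },
                {   # vector clause; cosinesimil scores are already in [0, 1]
                    "script_score": {
                        "query": {"match_all": {}},
                        "script": {
                            "source": "knn_score",
                            "lang": "knn",
                            "params": {
                                "field": "text_embeddings",
                                "query_value": [0.1, 1.0, 0.3, 0.0],
                                "space_type": "cosinesimil",
                            },
                        },
                        "boost": 0.3,
                    }
                },
            ],
        }
    },
}
```

Note the normalization here is still per shard, which is the limitation discussed earlier in this issue.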

@aishwaryabajaj-54

What will be the order of execution of the BM25 (maybe multi-match) query and the k-NN query if we use a hybrid search query?

@navneet1v (Contributor)

We have not decided yet. I am working on building a proposal around this and will update this issue once I have done some research. But at a high level, I was thinking of running the queries in parallel, so that we can reduce the latency, and then doing the normalization of the scores before we combine them.

But this is still very early thought. Will need to see what is best.

@navneet1v (Contributor)

What will be the order of execution of the BM25 (maybe multi-match) query and the k-NN query if we use a hybrid search query?

@aishwaryabajaj-54 do we see a difference if the queries are performed in a different order? As per my understanding, if we are normalizing the scores for each query and then combining them, it doesn't matter in which order the queries are executed.

Please let us know if you think otherwise, or if there is a case where this assumption doesn't hold true.

@navneet1v (Contributor) commented Jan 9, 2023

Would it make sense to extend rescoring to run at a higher level than shard (i.e. merge)? Another idea is that merge could be seen as a map/reduce, so rescore (map) and merge (reduce)? That could scale like map/reduce as well.

@dblock
The idea of extending rescoring to a higher level than shard looks very promising. But on a different track, I was thinking of building another phase in the overall Query and Fetch flow. This could be used by different plugins to do operations at the coordinator level before control goes to the Fetch phase. I will try to create a proposal for this to start some discussion.

@dblock (Member) commented Jan 10, 2023

Thanks @navneet1v. Since we're talking additional phases in query and fetch, cc-ing @nknize who has been thinking about refactoring aggregation, which I believe is either on the same path or at least related.

@rhvaz commented Jan 11, 2023

I think Elasticsearch has a solution for this https://www.elastic.co/guide/en/elasticsearch/reference/master/knn-search.html#_combine_approximate_knn_with_other_features, maybe this could be replicated in OpenSearch? I've created a feature request with the same recommendation too: opensearch-project/k-NN#717

I realised that the query below simply retrieves based on keyword matching and then re-ranks based on cosinesimil, which leaves out many relevant candidates in vector space.

body = {
 "size": 4,
 "query": {
   "script_score": {
     "query": {
       "query_string": {
           "fields": ["category"],
           "query": "hairdresser"
       }
     },
     "script": {
       "source": "knn_score",
       "lang": "knn",
       "params": {
         "field": "text_embeddings",
         "query_value": [0.1, 1.0, 0.3, 0.0],
         "space_type": "cosinesimil"
       }
     }
   }
 }
}

@rhvaz commented Jan 11, 2023

I've tried to adapt the recommended answer on a test index from this https://stackoverflow.com/questions/67701323/full-text-and-knn-vector-hybrid-search-for-elastic and it seems to work for me. Any thoughts on this solution? Perhaps latency concerns?

body = {
  "size": 5,
  "query": {
    "function_score": {
      "query": {
        "bool": { 
          "should" : [
            {
              "multi_match" : { 
                "query": "electrician",
                "fields": ["category"]
              }
            },
            {
              "match_all": { }
            }
          ],
          "minimum_should_match" : 0
        }
      },
      "functions": [
      {
        "script_score" : {
          "script" : {
            "source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.query_value, doc['my_vector']) + _score",
            "params": {
              "query_value": [1.0, 1.0, 0.0, 0.0]
            }
          }
        },
        "weight": 1
      }
      ],
      "score_mode": "sum",
      "boost_mode": "sum"
    }
  }
}

@navneet1v (Contributor)

I think Elasticsearch has a solution for this https://www.elastic.co/guide/en/elasticsearch/reference/master/knn-search.html#_combine_approximate_knn_with_other_features, maybe this could be replicated in OpenSearch? I've created a feature request with the same recommendation too: opensearch-project/k-NN#717

I have responded on the feature request. Let's use that issue to talk more about it.

Coming back to the problem statement of this particular GitHub issue: all the current score combinations work at the shard level. We don't want to combine the scores at the shard level, but at the coordinator node level. Also, the scores generated by keyword match and k-NN are not on the same scale: k-NN scores can be between 0 and 1 while BM-25 produces floating point values > 1. Quoting from the description:

First, it is difficult to combine result sets whose scores are produced by different scoring methods. To combine results effectively, the different queries' scores would need to be put on the same scale. By this, I mean that a score would need to meet 2 requirements: (1) indicate the document's relevance relative to the other documents scored in the query and (2) be comparable with the relative relevance of results from other queries. For example, for k-NN the score range may be 0-1, while BM25 scores range from 0 to Float.MAX_VALUE (I think). With this, it would be difficult to figure out an effective way to weight each result appropriately. One way to do this would be to normalize the scores before combining them. Normalization might be possible through rescoring, but this would happen at the shard level, which could cause problems when results are combined. For instance, if one shard has better results than another, per-shard normalization may skew the importance so that the top results from the weaker shard appear better than those from the stronger shard.

@rhvaz commented Jan 12, 2023

k-NN scores can be between 0 and 1 while BM-25 produces floating point values > 1.

Yeah, that makes sense. One way could be to use a sigmoid function on the BM-25 score to map it between 0 and 1. The parameters of the function could be hyperparameters that the user selects. For example:

1 / (1 + Math.exp( - a * (BM25_score - b)))

The parameters a and b can be used to control the slope and the transition point of the curve.
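The suggestion above can be sketched as a small helper. The default values for a and b here are arbitrary placeholders; as noted, they would be user-tuned hyperparameters:

```python
import math

def sigmoid_normalize(bm25_score, a=0.1, b=10.0):
    """Map an unbounded BM25 score into (0, 1) with a logistic curve.

    a controls the slope of the curve and b the transition point
    (the score that maps to exactly 0.5). Both are hypothetical
    defaults a user would tune for their corpus.
    """
    return 1.0 / (1.0 + math.exp(-a * (bm25_score - b)))

# At the transition point the output is exactly 0.5, and large
# scores saturate toward 1.0, matching the k-NN score range.
mid = sigmoid_normalize(10.0)    # 0.5
high = sigmoid_normalize(100.0)  # close to 1.0
```

One caveat of this approach is that, unlike min-max normalization, it needs no knowledge of the full result set, so it can run per document, but a poor choice of a and b can compress most scores into a narrow band.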

@ankitas3 commented Jan 16, 2023

We want to use something similar to get results from a vector field along with exact match. Currently the scores received from the two queries are not on the same scale, as mentioned in the discussion above. Following is the query we would expect to work to achieve this.

body = {
 "size": 10,
 "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
              "query": "smart tv",
              "fields": ["summary"],
              "_name": "multi_match",
              "type": "phrase"
          }
        },
        {
          "knn": {
            "text_vector": {
              "vector": [2, 3, 5, 6],
              "k": 10,
                "_name": "knn"
            }
          }
        }
      ]
    }
  }
}

Will the queries above run in parallel or in series? The search speed of BM25 is relatively slow.

@navneet1v (Contributor) commented Jan 17, 2023

Hi @ankitas3,
The queries present in the should clause don't run in parallel.
Say a shard on a data node has, let's say, 3 segments. The sub-queries (k-NN and multi-match) execute on the segments one by one, sequentially, and at the bool query level the scores for that one shard are combined.

The speed of BM-25 depends on many factors. I would recommend running the multi-match query separately to find out whether BM-25 is really running slow; if yes, please try to optimize the BM-25 query.

@aishwaryabajaj-54

@navneet1v, in continuation of what @ankitas3 is doing: is it possible to execute the queries in parallel, or maybe to specify the execution order, to reduce the latency?

@ankitas3

@navneet1v I tried running BM25 separately, and it is slow. Can you share any references on improving its speed?

@navneet1v (Contributor) commented Jan 18, 2023

@navneet1v, in continuation of what @ankitas3 is doing: is it possible to execute the queries in parallel, or maybe to specify the execution order, to reduce the latency?

@ankitas3, @aishwaryabajaj-54

Right now there is no way to execute them in parallel in a single query. You can look at the _msearch API, but I am not sure whether that is what you are looking for. It works something like the _bulk API.

For improving the speed of the query, I would recommend cutting a separate GitHub issue, as this issue is not about improving BM-25 speed. In the new issue, please make sure to provide details like the OpenSearch version, operating system, index shards, number of nodes, etc. On that issue we can discuss what you can do to reduce the latency.

@ankitas3
Some places I would start looking is:

  1. Count of segments for the index. More segments per shard means more latency.
  2. See whether your cluster is under-scaled. Check the CPU utilization of the data nodes.
  3. Check the service-side latency, i.e. the latency reported in the API response rather than the latency seen by the client. This can help pinpoint where the latency is coming from.

@aishwaryabajaj-54

@navneet1v, I have tried _msearch and it has its own cons. Can we consider speed and relevance benchmarking of parallel execution and normalization of scores of both queries (BM-25 and ANN) in the same GH issue?

@navneet1v (Contributor)

Can we consider speed and relevance benchmarking of parallel execution and normalization of scores of both queries (BM-25 and ANN) in the same GH issue?

@aishwaryabajaj-54 can you please elaborate? What is your ask here, and what is it related to?

@navneet1v (Contributor)

I am closing this issue, as the problem defined above is being addressed via opensearch-project/neural-search#123. Please give a +1 on that issue to support the work. I will keep it updated and will use it to track progress.

8 participants