
Add option to combine several query scores with multiply or other #17116

Closed
brwe opened this issue Mar 15, 2016 · 26 comments

9 participants
@brwe
Contributor

commented Mar 15, 2016

Currently we can use a bool query to combine the results of different queries as a sum, or a dis_max query, which takes the max. But sometimes people might want to combine the results of several queries in other ways, for example: http://stackoverflow.com/questions/31755642/how-can-i-multiply-the-score-of-two-queries-together-in-elasticsearch

function_score could be changed to enable multiply, min, and average.
Currently function_score can only use the score of one query and then combine it with some functions. But that could be changed by replacing the filter of each function with a query and then giving the functions access to the individual scores of these queries.

Something like this (not even trying to name stuff...):

POST _search
{
  "query": {
    "function_score": {
      "query": {
        // here be a query or not, resulting score can be used later via _score
      },
      "functions": [
        {
          "query": {
            // another fancy query, maybe even another function_score?, resulting score can be used via _xyz_score
          }, 
          "boost_mode": "multiply", // need this here now too because we need to know how to combine the function with the _xyz_score
          "script_score": {
            "script": "_xyz_score * _score"
          }
        },
        {
          "query": {
            // even more query!, resulting score can be used via _xyz_score
          }, 
          "boost_mode": "sum", // need this here now too because we need to know how to combine the function with the _xyz_score
          "script_score": {
            "script": "_xyz_score * doc['b'].value"
          }
        },
        ...
      ],
      "score_mode": "multiply"
    }
  }
}
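
To make the combination modes concrete, here is a minimal Python sketch, assuming each sub-query has already produced a per-document score. "sum" mirrors what the bool query does, "max" mirrors dis_max with its default tie_breaker of 0, and "multiply", "min" and "avg" are the additional modes this issue asks for. The `combine` helper and the scores are hypothetical, not Elasticsearch code.

```python
import math
from statistics import mean


def combine(scores, mode):
    """Combine per-query scores for one document (illustrative only)."""
    if mode == "sum":        # what a bool query does with its matching clauses
        return sum(scores)
    if mode == "max":        # dis_max with tie_breaker = 0
        return max(scores)
    if mode == "multiply":   # the mode requested in this issue
        return math.prod(scores)
    if mode == "min":
        return min(scores)
    if mode == "avg":        # plain (unweighted) average
        return mean(scores)
    raise ValueError(f"unknown mode: {mode}")


scores = [2.0, 3.0]  # hypothetical scores of two queries for one document
for mode in ("sum", "max", "multiply", "min", "avg"):
    print(mode, combine(scores, mode))
```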

I think there was an issue about that already somewhere but I cannot find it.
This might relate to #10049 because people could then make arbitrarily complicated combinations using scripts and nested function_score queries. It would be crude though.

@brwe

Contributor Author

commented Mar 18, 2016

We just discussed this at Fixit Friday, and now we think it should be structured differently, more like in #10049:

Each function produces a variable which can be named with some parameter (var_name?).

We add an additional option, score_mode: script, that exposes the results of the functions as variables. The final score is then the result of the script.

In addition, we need a new function, query_function, which returns the score of a query. We thought that the approach above (turning the existing per-function filters into queries whose scores the functions can use) would be confusing and complicate things too much.

Something like:

POST _search
{
  "query": {
    "function_score": {
      "query": {
        // same as before, score will be accessible via _score
      },
      "functions": [
        {
          "query_function": {
            "query": {
              // here be any query, can also be function_score
            },
            "var_name": "score_a"
          }
        },
        {
          "random_score": {
            "var_name": "score_b",
            ...
          }
        },
        ...
      ],
      "score_mode": "script",
      "combine_script": "score_a * score_b + _score"
    }
  }
}
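
A rough Python model of the proposed score_mode: script, assuming each function's output is bound to its var_name, the main query score is bound to _score, and a combine expression computes the final score. The lambda stands in for the script; `script_score_mode` and all values are hypothetical.

```python
from typing import Callable, Dict


def script_score_mode(query_score: float,
                      variables: Dict[str, float],
                      combine: Callable[..., float]) -> float:
    """Evaluate a combine 'script' over named function outputs (sketch)."""
    # The main query score is exposed as _score, alongside each var_name.
    return combine(_score=query_score, **variables)


final = script_score_mode(
    query_score=1.5,
    # e.g. the outputs of a query_function and a random_score function
    variables={"score_a": 2.0, "score_b": 0.5},
    combine=lambda _score, score_a, score_b: score_a * score_b + _score,
)
print(final)  # 2.5
```

</imports>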
@babadofar

commented Mar 27, 2016

Cool!

@jpountz referenced this issue Apr 22, 2016: Dis Min Query #17820 (Closed)

@brwe added the help wanted label Apr 22, 2016

@synhershko

Contributor

commented Apr 23, 2016

I'm the OP of #10049 and #17820 - both seem to be satisfied by the proposed solution, so looking forward to this implementation.

@brwe

Contributor Author

commented May 26, 2016

I guess best would be to split this in two: 1. implement query function and 2. implement custom combine. I'll start working on this unless anyone else calls dibs.

brwe added a commit to brwe/elasticsearch that referenced this issue May 27, 2016

function_score: add query_function to function_score
Allows combining query scores with multiply, sum etc.
by wrapping individual queries in a query_function
of a function_score like so:

```
{
  "query": {
    "function_score": {
      "score_mode": "multiply",
      "functions": [
        {
          "query_function": {
            "query": {
              "match": {
                "text": "cat"
              }
            }
          }
        },
        {
          "query_function": {
            "query": {
              "match": {
                "text": "dog"
              }
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}
```

relates to elastic#17116
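
Under the semantics proposed in that commit, the two query_function scores are multiplied (score_mode: multiply) and boost_mode: replace discards the main query's score, which for a function_score without a query is the constant score of match_all. A hedged Python sketch of that arithmetic, assuming (the commit message does not say) that a wrapped query which does not match contributes 0; the helper name and scores are hypothetical.

```python
import math
from typing import Optional, Sequence


def function_score_multiply_replace(main_query_score: float,
                                    function_scores: Sequence[Optional[float]]) -> float:
    """score_mode=multiply, boost_mode=replace (illustrative sketch)."""
    # Assumption: a query_function whose query does not match (None) contributes 0.
    combined = math.prod(s if s is not None else 0.0 for s in function_scores)
    # boost_mode=replace: main_query_score is deliberately ignored.
    return combined


print(function_score_multiply_replace(1.0, [2.0, 0.5]))   # "cat" and "dog" both match: 1.0
print(function_score_multiply_replace(1.0, [2.0, None]))  # "dog" does not match: 0.0
```
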
@brwe

Contributor Author

commented May 30, 2016

@JnBrymn-EB and I discussed a little about the combine script parts and we thought that we should probably change the above syntax. The variable name per function could be on the same level as the filter, weight and function instead of being a parameter inside the function definition because each function score can be assigned to a variable just like every function can have a weight or a filter. Also, the script should probably follow the same script syntax we have elsewhere. The query would then look like this:

POST _search
{
  "query": {
    "function_score": {
      "query": {
        // same as before, score will be accessible via _score
      },
      "functions": [
        {
          "query_function": {
            "query": {
              // here be any query, can also be function_score
            }
          },
          "var_name": "score_a",
          "filter": {
               // some filter
          }
        },
        {
          "random_score": {
            ...
          },
          "var_name": "score_b",
          "weight": 3.33
        },
        ...
      ],
      "score_mode": "script",
      "combine_script": {
          "lang": "groovy",
          "inline": "score_a * score_b + _score"
      }
    }
  }
}

@clintongormley

Member

commented Jun 1, 2016

I'd suggest changing query_function to query_score, and combine_script to score_script. otherwise looks great!

@JnBrymn-EB

commented Jun 6, 2016

I'm building the combine part as we speak. Should we go with var_name as stated above or should we use _name as I've seen in other places?

@brwe

Contributor Author

commented Jun 28, 2016

We settled on var_name.

In addition, another question came up: A function might be associated with a filter that does not match. What value do we assign to the variable in this case? I have the feeling we need a default value here. Something like:

...
"functions": [
        {
          "script_variable": {
             "name": "score_a",
             "default": 123
          },
          "filter": {
               // some filter
          },
          "field_value_factor": {...}
        }
....
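
The open question, what a variable holds when its function's filter does not match, can be modeled like this: if the filter matches, bind weight times the function output to the variable name; otherwise bind the configured default. A minimal Python sketch, assuming a default like the one in the script_variable snippet above; the `FunctionSpec` type, its fields and `bind_variables` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Doc = Dict[str, float]


@dataclass
class FunctionSpec:
    name: str                        # variable name exposed to the combine script
    default: float                   # value used when the filter does not match
    matches: Callable[[Doc], bool]   # stands in for the "filter" clause
    score: Callable[[Doc], float]    # stands in for e.g. field_value_factor
    weight: float = 1.0


def bind_variables(doc: Doc, functions: List[FunctionSpec]) -> Dict[str, float]:
    bound: Dict[str, float] = {}
    for f in functions:
        if f.matches(doc):
            bound[f.name] = f.weight * f.score(doc)
        else:
            bound[f.name] = f.default  # filter did not match: fall back to the default
    return bound


fns = [FunctionSpec("score_a", 123.0, lambda d: "age" in d, lambda d: d["age"])]
print(bind_variables({"age": 4.0}, fns))  # {'score_a': 4.0}
print(bind_variables({}, fns))            # {'score_a': 123.0}
```
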
@JnBrymn-EB

commented Jun 28, 2016

Could we add a missing parameter here, just like the one field_value_factor has, and make it default to 0 for the sake of a score_script? We'd have to be careful not to affect existing functionality like score_mode=avg, which just assumes that the value doesn't exist. It might be a bit misleading.

Maybe another approach would be to add a default_vals key to the combine_script that would enumerate the default value of each clause that might be missing.

@clintongormley

Member

commented Jul 1, 2016

I'd go with missing, and in fact we should probably apply this to all functions (this has come up before). I'm wondering if the change to score_mode:avg is a problem?

@brwe

Contributor Author

commented Jul 1, 2016

Just to be clear: I meant adding a default for when the "filter" does not match. If the field is missing, it would still be up to the function to decide what to do.

@brwe

Contributor Author

commented Jul 1, 2016

I'll explain in more detail what I mean.
We have two cases:

  1. the field is missing in the document
  2. the filter associated with the function does not match

In the first case, we have three functions that have to deal with it: field_value_factor (takes a missing parameter and if the value is missing uses that instead of an actual value), decay_function (assumes the value is perfectly at the origin, which has greatly annoyed many users and might change, see #18892) and script_score where everyone has to adjust the script to deal with it.

In the second case, function_score currently treats the document as if the function did not exist at all.

I was only talking about 2., filter not matching.

We could add a score_missing or default parameter that would do the following: if the filter for a function does not match, we always return this value.

This would also have the advantage of letting everyone control not only the input to individual functions when a field is missing (with the missing parameter) but also the output, like so:

"function_score": {
      "functions": [
        {
          "filter": {
            "exists": {
              "field": "age"
            }
          },
          "field_value_factor": {
            "field": "age",
            "modifier": "ln"
          },
          "score_missing": 5 
        }
      ]
    }
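
Per document, that snippet would behave roughly as follows: when the exists filter matches, field_value_factor with modifier ln returns ln(factor * age) (factor defaults to 1), and when it does not, the proposed score_missing value of 5 is returned instead of the function being skipped. A minimal Python sketch; score_missing is only a proposal in this thread, and `age_function` is a hypothetical helper. Note that ln is negative for values between 0 and 1 and undefined for values of 0 or below.

```python
import math
from typing import Dict, Optional


def age_function(doc: Dict[str, float], factor: float = 1.0,
                 score_missing: float = 5.0) -> float:
    """exists filter + field_value_factor(modifier=ln) + proposed score_missing."""
    age: Optional[float] = doc.get("age")
    if age is None:
        # The {"exists": {"field": "age"}} filter does not match this document.
        return score_missing
    return math.log(factor * age)  # modifier "ln" is the natural logarithm


print(age_function({"age": 1.0}))  # 0.0
print(age_function({}))            # 5.0
```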

Also, it would allow people to control what score_mode: avg means when a filter does not match, which is awkward right now.

For example in this case:

"function_score": {
      "score_mode": "avg", 
      "functions": [
        {
          "filter": {
            "term": {
              "skill": "codes_java"
            }
          },
          "weight": 5, 
          "score_missing": 0
        },
        {
          "filter": {
            "term": {
              "skill": "speaks_human"
            }
          },
          "weight": 2, 
          "score_missing": 0
        }
      ]
    }

If the term codes_java is not in the skill field, the score would be computed as (0+2)/(5+2) = 2/7 instead of just 2/2 = 1, which is the current behavior and might not be desirable.
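
The arithmetic in the avg example follows from function_score's weighted average for score_mode: avg (sum of weighted function scores divided by the sum of weights), where a weight-only function has an output of 1 before weighting. A small Python sketch comparing the current behavior, which skips non-matching functions, with the proposed score_missing: 0. It assumes score_missing replaces the unweighted function output and that the function's weight still counts in the denominator; `weighted_avg` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# Each entry: (filter_matches, weight, score_missing); weight-only functions output 1.
Func = Tuple[bool, float, Optional[float]]


def weighted_avg(functions: List[Func]) -> float:
    numerator = 0.0
    total_weight = 0.0
    for matches, weight, score_missing in functions:
        if matches:
            numerator += weight * 1.0            # weight-only function: 1 * weight
            total_weight += weight
        elif score_missing is not None:
            numerator += weight * score_missing  # proposed: use score_missing, keep the weight
            total_weight += weight
        # else: current behavior, the function is ignored for this document
    if total_weight == 0:
        raise ValueError("no function applies to this document")
    return numerator / total_weight


# A document that has "speaks_human" but not "codes_java".
print(weighted_avg([(False, 5.0, None), (True, 2.0, None)]))  # 2/2 = 1.0
print(weighted_avg([(False, 5.0, 0.0), (True, 2.0, 0.0)]))    # 2/7
```
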

For the combine_script we should then enforce that this parameter exists if a function is associated with a filter.

I would not call it missing because I at least might mix that up with the missing in case the field does not exist in the doc.

@clintongormley

Member

commented Jul 4, 2016

This makes sense to me. What about calling it no_match_score or default_score? I think I prefer the former because it is more explicit.

@mckinnovations

commented Jul 18, 2016

Any timeline for this feature? When is it going to be released?

@JnBrymn-EB

commented Aug 1, 2016

@serj-p

commented Mar 2, 2017

Have query_score or query_function keywords been added?
I am trying to compute max scores for docs from two queries: one computes a field_value_factor on the updated field of a child doc, and the other a field_value_factor on the parent's updated value. So the doc score I need is max(child.updated, doc.updated). I currently see no way to make Elasticsearch return such a max.

@erebus1

commented Mar 2, 2017

I think you can use a dis_max query over 2 function_score queries:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html
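
A minimal sketch of that suggestion as a request body, built as a Python dict and serialized with json. The child type name "child", the field "updated" (assumed numeric) and the has_child wrapper with score_mode: max, so the parent gets its best child's score, are assumptions about serj-p's mapping. With tie_breaker 0, dis_max returns the higher of the two function_score results.

```python
import json


def fvf(field: str) -> dict:
    """function_score whose score is just the numeric value of `field` (sketch)."""
    return {
        "function_score": {
            "field_value_factor": {"field": field, "missing": 0},
            "boost_mode": "replace",  # ignore the implicit match_all score
        }
    }


body = {
    "query": {
        "dis_max": {
            "tie_breaker": 0.0,  # 0 => score is max(best child score, parent score)
            "queries": [
                # hypothetical child type "child"; requires a parent/child mapping
                {"has_child": {"type": "child", "score_mode": "max", "query": fvf("updated")}},
                fvf("updated"),
            ],
        }
    }
}

print(json.dumps(body, indent=2))
```
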

@erebus1

commented Mar 2, 2017

But the question about timeline for query_score is really important.

@erebus1

commented Mar 30, 2017

Guys, do you plan to implement this feature?

@JnBrymn-EB

commented Mar 30, 2017

This is becoming more important for upcoming work at Eventbrite.

@clintongormley

Member

commented Mar 31, 2017

We've been rethinking this approach. Apparently, according to research, the best way to combine scores is to add them together (which the bool query does, now that coordination and query norm are gone).

So we're looking at better ways of exposing primitives for incorporating non-textual scores into the overall score.

Closing in favour of #23850

@JnBrymn-EB

commented Mar 31, 2017

"coordination and query norm are gone" - do you have any documentation on that @clintongormley ?

@clintongormley

Member

commented Apr 3, 2017

@JnBrymn-EB they've been removed in Lucene 7 https://issues.apache.org/jira/browse/LUCENE-7347

Query coordination was a hack to make TF/IDF work better in the face of poor TF saturation, and query norm (I believe) was essentially a failed experiment to try to make the scores from different queries comparable.

With those removed, the bool query now just does a simple sum, and boosting clauses is a much simpler calculation than before.

@JnBrymn-EB

commented Apr 3, 2017

Fascinating! I'll have to soak this in.

@marcusklaas

commented Apr 26, 2018

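For anyone else struggling with the absence of a way to multiply scores directly, note that you can take logarithms, either in a function_score script_score or with the modifier option of field_value_factor (e.g. ln). Adding logarithms is equivalent to multiplication for scoring: log(a) + log(b) = log(a * b), and since log is increasing, the sum ranks documents the same way as the product.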
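
A quick Python check of that equivalence, with made-up positive scores: ranking by the sum of natural logs gives the same order as ranking by the product. It only holds for strictly positive values, since zero or negative factors have no logarithm. The document ids and score tuples are hypothetical.

```python
import math

docs = {  # hypothetical per-document factors, e.g. (text score, popularity, freshness)
    "d1": (2.0, 3.0, 0.5),
    "d2": (1.0, 4.0, 1.0),
    "d3": (5.0, 0.5, 2.0),
}

by_product = sorted(docs, key=lambda d: math.prod(docs[d]), reverse=True)
by_log_sum = sorted(docs, key=lambda d: sum(math.log(x) for x in docs[d]), reverse=True)

print(by_product)  # ['d3', 'd2', 'd1']
print(by_log_sum)  # same order
```
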

@JnBrymn-EB

commented Apr 30, 2018

@marcusklaas, I understand the math of logarithms (score = A * B * C sorts the same as score = log(A * B * C), and log(A * B * C) = log(A) + log(B) + log(C)), but I'm unclear how this helps. Do you have an example? For instance, if I want to multiply the values of 3 fields together, I would just use score_mode=multiply. But if I wanted an interesting combination of field values like A*B + C, the logarithm trick doesn't help me because that isn't just a product.

And if I want arbitrary polynomials of the fields, and want to incorporate the text score in the mix as well, what do I do then?
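
That objection can be checked with a small counterexample: for a target like A*B + C, ranking by log(A) + log(B) + log(C) can pick a different winner. In this Python sketch with made-up values, d1 wins on A*B + C while d2 wins on the sum of logs, so the log trick does not reproduce that formula.

```python
import math

docs = {"d1": (10.0, 10.0, 0.01), "d2": (1.0, 1.0, 5.0)}  # hypothetical (A, B, C)


def target(a: float, b: float, c: float) -> float:
    return a * b + c  # the combination we actually want


def log_sum(a: float, b: float, c: float) -> float:
    return math.log(a) + math.log(b) + math.log(c)  # what the log trick gives


best_target = max(docs, key=lambda d: target(*docs[d]))
best_log = max(docs, key=lambda d: log_sum(*docs[d]))
print(best_target, best_log)  # d1 d2
```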
