Function score query functions needs offset param #18537

Closed
JnBrymn-EB opened this issue May 24, 2016 · 8 comments
Labels
discuss :Search/Search Search-related issues that do not fall into other categories

Comments

@JnBrymn-EB

The current implementation of field_value_factor accepts factor and modifier to shape and scale the resulting function, but it lacks a way to offset the value of the function.

Consider the following scenario: Documents represent events that one might attend. For a given query the total document score should be a function of the base query score, the distance, and the popularity. The query score is based solely upon the text match; distance uses a geo-based decay function; and popularity is based upon a field_value_factor function with modifier: "sqrt". The function score query allows us to create a boost by either adding or multiplying the distance and popularity values. For our purposes it doesn't make sense to sum the popularity and distance values: a distant event that no one can attend would nonetheless rank highly based solely upon its high popularity. So instead we will multiply the popularity and distance boosts.
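
A rough sketch of the query described above, written as a Python dict in the Elasticsearch 2.x query DSL. The field names (`title`, `location`, `tickets_sold`), the search text, the origin, and the scale are hypothetical placeholders and do not come from the issue.

```python
# Sketch of the scenario as a function_score query body (Elasticsearch 2.x DSL).
# Field names, search text, origin and scale are illustrative assumptions.
query = {
    "query": {
        "function_score": {
            # Base score: plain text match.
            "query": {"match": {"title": "jazz festival"}},
            "functions": [
                # Distance boost: geo-based decay around the searcher's location.
                {"gauss": {"location": {"origin": {"lat": 36.16, "lon": -86.78},
                                        "scale": "25km"}}},
                # Popularity boost: sqrt of the number of tickets sold.
                {"field_value_factor": {"field": "tickets_sold",
                                        "modifier": "sqrt"}},
            ],
            # Multiply the function values together, then multiply by the query score.
            "score_mode": "multiply",
            "boost_mode": "multiply",
        }
    }
}
```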

But there's a problem - popularity is based upon number of tickets already sold and if 0 tickets are sold, then the popularity will be 0. Since we are multiplying the boosts together and since the query score is multiplied by the boosts, this means that the total score for those events will also be 0. Thus new events with no tickets sold are all but eliminated from the search results.
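
To make the problem concrete, here is a small numeric model of the multiplicative scoring. It is not Elasticsearch code; the function name and the sample scores are made up for illustration.

```python
import math


def total_score(query_score, distance_boost, tickets_sold):
    """Model of score_mode=multiply, boost_mode=multiply with a sqrt popularity boost."""
    popularity_boost = math.sqrt(tickets_sold)
    return query_score * distance_boost * popularity_boost


# A well-matching, nearby event with no sales yet (hypothetical numbers).
new_event = total_score(query_score=3.2, distance_boost=0.9, tickets_sold=0)
# A weaker, more distant match that has already sold tickets.
older_event = total_score(query_score=1.1, distance_boost=0.4, tickets_sold=400)

print(new_event, older_event)
```

However good the text match and distance are, the zero popularity factor wipes out the new event's score.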

We are prepared to resolve this issue with a script_score function, but this is not ideal. I propose adding an offset parameter to field_value_factor so that the value of the function would be factor*modifier(field) + offset. With a positive offset, popularity could never be zero.
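
A minimal sketch of the proposed formula, factor*modifier(field) + offset, in plain Python. The `offset` argument models the parameter proposed here and is not an existing Elasticsearch option; the function name and the small modifier table are assumptions for illustration.

```python
import math

# A few modifiers, modeled as plain functions (illustrative subset).
MODIFIERS = {
    "none": lambda v: v,
    "sqrt": math.sqrt,
    "log1p": math.log1p,
}


def value_with_offset(field_value, factor=1.0, modifier="none", offset=0.0):
    """Proposed value: factor * modifier(field) + offset (offset is hypothetical)."""
    return factor * MODIFIERS[modifier](field_value) + offset


# An event with no tickets sold keeps a non-zero popularity boost.
print(value_with_offset(0, modifier="sqrt", offset=1.0))
print(value_with_offset(400, factor=0.5, modifier="sqrt", offset=1.0))
```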

The same problem exists with decay functions. When using functions multiplicatively there are times when it would be beneficial to have a non-zero minimum value.

@clintongormley

@brwe what do you think about this? I'm loath to add more parameters unless they are genuinely, generically useful, given that this can all be achieved with a script (and will be improved with #17116)

@JnBrymn-EB
Author

@clintongormley your prior issue #6955 is relevant here. There you needed a weight for all of the functions - but when using the functions multiplicatively, just having weights does not make sense. Consider that in the multiplicative use case the total score of a document is query_score*(weight1*field_value1)*(weight2*field_value2) - you can see that the final score with weighting is equal to the original non-weighted score times weight1*weight2. Every document is scaled by the same constant, so the ranking does not change.
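
A quick check of that algebra in plain Python. The documents, scores, and weights are invented for illustration.

```python
def multiplied_score(query_score, values, weights):
    """query_score * (w1 * v1) * (w2 * v2) * ... as in the multiplicative case."""
    score = query_score
    for value, weight in zip(values, weights):
        score *= weight * value
    return score


# name -> (query_score, [field_value1, field_value2]); hypothetical data.
docs = {
    "a": (2.0, [0.5, 9.0]),
    "b": (1.5, [0.8, 4.0]),
    "c": (3.0, [0.1, 10.0]),
}


def ranking(weights):
    """Document names ordered by descending score for the given weights."""
    scores = {name: multiplied_score(q, vals, weights)
              for name, (q, vals) in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


unweighted = ranking([1.0, 1.0])
weighted = ranking([5.0, 0.25])  # every score is scaled by 5.0 * 0.25
print(unweighted, weighted)
```

Both calls produce the same ordering, because the weights only multiply every score by the same constant.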

Because weight doesn't have any effect when the function values are multiplied together, I think there is a need for an offset parameter.

Consider also that the most basic definition for a linear function is: y = m*x + b. I think we're missing the b.

@rjernst
Member

rjernst commented May 24, 2016

I don't see the point, given, as Clint said, that this can be done with a script, and even with an expression script, which will be very fast (I benchmarked this when adding expressions, specifically comparing to field_value_factor, and the perf was identical).

@brwe
Contributor

brwe commented May 25, 2016

I also do not think that we can or should cover too many score combinations with the functions we provide so far. The idea of function score originally was to allow basic functionality out of the box and leave more sophisticated stuff to script_score. I somewhat agree that we are missing the b and that the weights do not make sense when the functions are multiplied in the end. But on the other hand the offset does not make sense either when the functions are summed up...not sure. I am more inclined to work on #17116 and leave the factor function as is.
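
One possible reading of the point about summing, sketched in plain Python: when function values are added, per-function offsets collapse into a single constant, much as weights collapse when values are multiplied. The `offsets` argument is hypothetical and the numbers are invented.

```python
def summed_boost(values, offsets):
    """Model of score_mode=sum where each function value gets its own offset."""
    return sum(value + offset for value, offset in zip(values, offsets))


values = [0.5, 9.0]
# Two separate offsets behave exactly like one combined offset on a single function.
separate = summed_boost(values, offsets=[1.0, 2.0])
collapsed = summed_boost(values, offsets=[3.0, 0.0])
print(separate, collapsed)
```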

@JnBrymn-EB
Author

Ok - presuming I'm on Elasticsearch 2.3.x which scripting approach should I use? Groovy? Painless? Lucene Expression Script? My impression is that Groovy is insecure (which actually probably doesn't matter for my case); Painless is not available yet; Lucene Expression Script is marked as undergoing development.

@clintongormley

Expressions are very fast and stable, I'll remove that warning from the docs. I'd definitely use expressions if it does what you need (which it sounds like it will). The only downside is it may not support all the syntax you need, in which case your only option for the moment is Groovy. Painless will be the lang to move to once 5.0 is out.
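
For reference, a sketch of what the expression-based workaround might look like as a query body, written as a Python dict. It assumes the 2.x script syntax (`lang`, `inline`, `params`) and that `sqrt` and numeric params are usable inside an expression; the field names, search text, and offset value are hypothetical, so check this against the docs for the version in use.

```python
# Sketch only: script_score using a Lucene expression to emulate an offset.
# Assumes Elasticsearch 2.x script syntax; field names and values are made up.
query = {
    "query": {
        "function_score": {
            "query": {"match": {"title": "jazz festival"}},
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "lang": "expression",
                            # sqrt(tickets_sold) + offset, so the boost is never zero.
                            "inline": "sqrt(doc['tickets_sold'].value) + offset",
                            "params": {"offset": 1},
                        }
                    }
                }
            ],
            # Multiply the query score by the script's value.
            "boost_mode": "multiply",
        }
    }
}
```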

@clintongormley

Removed in cf7b13d

@JnBrymn-EB
Author

thanks, All

-John


@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018