Commit
It can be useful to multiply the original score of a document by a function that decays with the distance of a numeric field value from a user-given reference. Such functions can be computed for several numeric fields, combined as a sum or a product, and multiplied onto the score of the original query.

This commit adds new score functions, similar to boost factor and custom script scoring, that can be used together with the <code>function_score</code> keyword in a query.

To use distance scoring, the user has to define

1. a reference and
2. a scale

for each field the function should be applied to. The reference is the point from which the distance of the document is measured, and the scale defines the rate of decay.

Example use case
----------------

Suppose you are searching for a hotel in a certain town. Your budget is limited. You would also like the hotel to be close to the town center, so the farther the hotel is from the desired location, the less likely you are to check in. You would like the query results that match your criteria (for example, "hotel, Berlin, non-smoker") to be scored with respect to both the distance to the town center and the price.

Intuitively, you would define the town center as the origin, and maybe you are willing to walk 2km from the hotel to the town center. In this case your *reference* for the location field is the town center and the *scale* is ~2km.

If your budget is low, you would probably prefer something cheap over something expensive. For the price field, the *reference* would be 0 Euros and the *scale* depends on how much you are willing to pay, for example 20 Euros.

Usage
----------------

The distance score functions can be applied in two ways:

In the simplest case, only one numeric field is to be evaluated. To do so, call <code>function_score</code> with the appropriate function.
In the above example, this might be:

    curl 'localhost:9200/hotels/_search/' -d '{
      "query": {
        "function_score": {
          "gauss": {
            "location": {
              "reference": [52.516272, 13.377722],
              "scale": "2km"
            }
          },
          "query": {
            "bool": {
              "must": {
                "match": { "city": "Berlin" }
              }
            }
          }
        }
      }
    }'

which would then search for hotels in Berlin and weight them depending on how far they are from the Brandenburg Gate.

If you have more than one numeric field, you can combine them by defining a series of functions and filters, for example:

    curl 'localhost:9200/hotels/_search/' -d '{
      "query": {
        "function_score": {
          "functions": [
            {
              "filter": { "match_all": {} },
              "gauss": {
                "location": { "reference": "11,12", "scale": "2km" }
              }
            },
            {
              "filter": { "match_all": {} },
              "linear": {
                "price": { "reference": "0", "scale": "20" }
              }
            }
          ],
          "query": {
            "bool": {
              "must": {
                "match": { "city": "Berlin" }
              }
            }
          },
          "score_mode": "multiply"
        }
      }
    }'

This computes the decay functions for "location" and "price" and multiplies them onto the score. See <code>function_score</code> for the different options for combining functions.

Supported fields
----------------

Only single-valued numeric fields, including time and geo locations, are supported.

What if a field is missing?
----------------

If the numeric field is missing in a document, that field is not taken into account at all for this document: the function value for this field is set to 1. Suppose you have two hotels, both of which are in Berlin and cost the same. If one of the documents does not have a "location", it would typically get a higher score than the document that has the "location" field set, because a decay value never exceeds 1. To avoid this, you could, for example, use the exists or the missing filter and add a custom boost factor to the functions.
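To make the missing-field effect concrete, here is a minimal Python sketch of the scoring described above. It is not the code added by this commit: the decay formulas (a Gaussian `exp(-d²/(2·scale²))` and a linear ramp that reaches 0 at `scale`) are assumptions chosen for illustration, the geo distance is simplified to a precomputed number of kilometers, and all names (`gauss_decay`, `linear_decay`, `field_factor`, `combined_score`, `distance_km`, `missing_penalty`) are hypothetical.

```python
import math


def gauss_decay(value, reference, scale):
    """Assumed Gaussian decay: 1.0 at the reference, smaller farther away."""
    distance = abs(value - reference)
    return math.exp(-0.5 * (distance / scale) ** 2)


def linear_decay(value, reference, scale):
    """Assumed linear decay: 1.0 at the reference, 0.0 at `scale` and beyond."""
    distance = abs(value - reference)
    return max(0.0, 1.0 - distance / scale)


def field_factor(doc, field, decay, reference, scale):
    """Decay factor for one field; a missing field contributes 1.0."""
    if doc.get(field) is None:
        return 1.0
    return decay(doc[field], reference, scale)


def combined_score(query_score, doc, missing_penalty=1.0):
    """Multiply the query score by the product of all field factors."""
    # Hypothetical documents store the distance to the town center in km.
    factor = field_factor(doc, "distance_km", gauss_decay, 0.0, 2.0)
    factor *= field_factor(doc, "price", linear_decay, 0.0, 20.0)
    # Optional extra factor for documents that lack the location value,
    # playing the role of a boost factor behind a "missing" filter.
    if doc.get("distance_km") is None:
        factor *= missing_penalty
    return query_score * factor


with_location = {"distance_km": 1.0, "price": 10.0}
without_location = {"price": 10.0}

print(combined_score(1.0, with_location))
print(combined_score(1.0, without_location))
print(combined_score(1.0, without_location, missing_penalty=0.001))
```

Under these assumptions, the hotel without a location outscores the one with a location until the penalty factor is applied, which is the situation the filter-plus-boost-factor approach is meant to correct.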
For example, the list of functions could look like this:

    …
    "functions": [
      {
        "filter": { "match_all": {} },
        "gauss": {
          "location": { "reference": "11, 12", "scale": "2km" }
        }
      },
      {
        "filter": { "match_all": {} },
        "linear": {
          "price": { "reference": "0", "scale": "20" }
        }
      },
      {
        "boost_factor": 0.001,
        "filter": {
          "missing": {
            "field": "location",
            "existence": true,
            "null_value": true
          }
        }
      }
    ],
    ...

Here the third function matches documents that have no "location" value and applies a boost factor of 0.001 to them.

Closes #3423