Distance scoring
================

It is sometimes desirable to multiply the original score of a document by a function that decays with the distance of one of the document's numeric field values from a user-given reference.

These functions can be computed for several numeric fields, combined as a sum or a product, and then multiplied onto the score of the original query.

This commit adds new score functions, similar to boost factor and custom script scoring, that can be used together with the <code>function_score</code> keyword in a query.

To use distance scoring, the user has to define

 1. a reference and
 2. a scale

for each field the function should be applied to. The reference is needed to define a distance for the document, and the scale defines the rate of decay.
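
How a reference and a scale turn into a decay factor can be sketched as follows. This is a minimal illustration, not the code added by this commit: the class name `GaussDecaySketch` is hypothetical, and the Gaussian formula is an assumption chosen so that the factor equals the scale weight (0.5 by default, matching `SCALE_DEFAULT` in `DecayFunctionBuilder`) when the distance equals the scale.

```java
// Sketch only: names and formula are assumptions, not taken from the commit.
final class GaussDecaySketch {

    // Converts the user-given scale into the internal Gaussian parameter so that
    // evaluate(scale, processed) == scaleWeight. scaleWeight must lie in (0, 1).
    static double processScale(double scale, double scaleWeight) {
        return 0.5 * scale * scale / Math.log(scaleWeight); // negative, since log(w) < 0
    }

    // Decay factor in (0, 1] for a document at the given distance from the reference.
    static double evaluate(double distance, double processedScale) {
        return Math.exp(0.5 * distance * distance / processedScale);
    }

    public static void main(String[] args) {
        // Reference = town center, scale = 2 km, factor 0.5 at exactly 2 km.
        double s = processScale(2.0, 0.5);
        for (double km : new double[] {0.0, 1.0, 2.0, 4.0}) {
            System.out.println(km + " km -> " + evaluate(km, s));
        }
    }
}
```

Under these assumptions a document at the reference keeps its full score (factor 1), a document at 2 km gets half of it, and a document at 4 km gets one sixteenth.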

Example use case
----------------

Suppose you are searching for a hotel in a certain town. Your budget is limited. Also, you would like the hotel to be close to the town center, so the farther the hotel is from the desired location the less likely you are to check in.
You would like the query results that match your criteria (for example, "hotel, Berlin, non-smoker") to be scored with respect to both the distance to the town center and the price.

Intuitively, you would like to define the town center as the origin and maybe you are willing to walk 2km to the town center from the hotel.
In this case your *reference* for the location field is the town center and the *scale* is ~2km.

If your budget is low, you would probably prefer something cheap over something expensive.
For the price field, the *reference* would be 0 Euros and the *scale* depends on how much you are willing to pay, for example 20 Euros.

Usage
----------------

The distance score functions can be applied in two ways:

In the simplest case, only one numeric field is to be evaluated. To do so, call <code>function_score</code> with the appropriate function. In the above example, this might be:

    curl 'localhost:9200/hotels/_search/' -d '{
    "query": {
        "function_score": {
            "gauss": {
                "location": {
                    "reference": [
                        52.516272,
                        13.377722
                    ],
                    "scale": "2km"
                }
            },
            "query": {
                "bool": {
                    "must": {
                        "match": {
                            "city": "Berlin"
                        }
                    }
                }
            }
        }
    }
    }'

which would then search for hotels in Berlin and weight them depending on how far they are from the Brandenburg Gate.

If you have more than one numeric field, you can combine them by defining a series of functions and filters, for example:

    curl 'localhost:9200/hotels/_search/' -d '{
    "query": {
        "function_score": {
            "functions": [
                {
                    "filter": {
                        "match_all": {}
                    },
                    "gauss": {
                        "location": {
                            "reference": "11,12",
                            "scale": "2km"
                        }
                    }
                },
                {
                    "filter": {
                        "match_all": {}
                    },
                    "linear": {
                        "price": {
                            "reference": "0",
                            "scale": "20"
                        }
                    }
                }
            ],
            "query": {
                "bool": {
                    "must": {
                        "match": {
                            "city": "Berlin"
                        }
                    }
                }
            },
            "score_mode": "multiply"
        }
    }
    }'

This would effectively compute the decay functions for "location" and "price" and multiply them onto the score. See <code>function_score</code> for the different options for combining functions.
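
The arithmetic behind the "multiply" score mode can be sketched like this. All names (`CombineDecaySketch`, `gauss`, `linear`, `multiply`) are hypothetical, and both decay formulas are assumptions chosen so that each factor equals the given weight when the distance equals the scale; the commit's own implementation may differ.

```java
// Sketch only: names and formulas are assumptions, not the commit's code.
final class CombineDecaySketch {

    // Gaussian decay that equals `weight` when distance == scale.
    static double gauss(double distance, double scale, double weight) {
        return Math.exp(Math.log(weight) * distance * distance / (scale * scale));
    }

    // Linear decay that equals `weight` when distance == scale and is clamped at 0.
    static double linear(double distance, double scale, double weight) {
        double zeroAt = scale / (1.0 - weight); // distance at which the factor reaches 0
        return Math.max(0.0, (zeroAt - distance) / zeroAt);
    }

    // "multiply" mode: the product of all function values is multiplied onto the query score.
    static double multiply(double queryScore, double... factors) {
        double product = 1.0;
        for (double f : factors) {
            product *= f;
        }
        return queryScore * product;
    }

    public static void main(String[] args) {
        // A hotel 1 km from the town center that costs 30 Euros, with a query score of 1.5.
        double location = gauss(1.0, 2.0, 0.5);  // scale "2km"
        double price = linear(30.0, 20.0, 0.5);  // reference 0, scale 20
        System.out.println(multiply(1.5, location, price));
    }
}
```

With these assumed formulas the price factor drops to 0 at 40 Euros, so under "multiply" any hotel at or above that price ends up with a score of 0 regardless of its location.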

Supported fields
----------------
Only single-valued numeric fields, including time and geo locations, are supported.

What if a field is missing?
----------------

If the numeric field is missing in the document, that field is not taken into account at all for this document: the function value for this field is set to 1. Suppose you have two hotels, both of which are in Berlin and cost the same. If one of the documents does not have a "location", it gets a higher score than the document that has the "location" field set.
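
The effect can be illustrated with a small sketch. The class `MissingFieldSketch` and its method are hypothetical, a `null` distance stands in for a document without the field, and the Gaussian formula (factor 0.5 at the scale) is an assumption rather than the commit's code.

```java
// Sketch only: names and formula are assumptions, not the commit's code.
final class MissingFieldSketch {

    // Returns the neutral factor 1 when the field is missing (distance == null),
    // otherwise a Gaussian decay that equals 0.5 when distance == scale.
    static double decayOrNeutral(Double distance, double scale) {
        if (distance == null) {
            return 1.0; // missing field: the function does not change the score
        }
        return Math.exp(Math.log(0.5) * distance * distance / (scale * scale));
    }

    public static void main(String[] args) {
        double withLocation = decayOrNeutral(1.0, 2.0);     // hotel 1 km from the reference
        double withoutLocation = decayOrNeutral(null, 2.0); // hotel with no "location" field
        // The hotel without a location outranks the one that has it.
        System.out.println(withoutLocation > withLocation);
    }
}
```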

To avoid this, you could, for example, use the exists or the missing filter and add a custom boost factor to the functions.

    ...
    "functions": [
        {
            "filter": {
                "match_all": {}
            },
            "gauss": {
                "location": {
                    "reference": "11, 12",
                    "scale": "2km"
                }
            }
        },
        {
            "filter": {
                "match_all": {}
            },
            "linear": {
                "price": {
                    "reference": "0",
                    "scale": "20"
                }
            }
        },
        {
            "boost_factor": 0.001,
            "filter": {
                "missing": {
                    "existence": true,
                    "field": "location",
                    "null_value": true
                }
            }
        }
    ],
    ...

Closes #3423
brwe committed Aug 6, 2013
1 parent 720b550 commit e707308
Showing 18 changed files with 1,461 additions and 45 deletions.
45 changes: 27 additions & 18 deletions src/main/java/org/elasticsearch/index/fielddata/GeoPointValues.java
@@ -24,30 +24,45 @@

/**
*/
public interface GeoPointValues {
public abstract class GeoPointValues {

static final GeoPointValues EMPTY = new Empty();
public static final GeoPointValues EMPTY = new Empty();

private final boolean multiValued;

/**
* Is one of the documents in this field data values is multi valued?
*/
boolean isMultiValued();
public final boolean isMultiValued() {
return multiValued;
}

/**
* Is there a value for this doc?
*/
boolean hasValue(int docId);
public abstract boolean hasValue(int docId);

public abstract GeoPoint getValue(int docId);

GeoPoint getValue(int docId);
public abstract GeoPoint getValueSafe(int docId);

GeoPoint getValueSafe(int docId);
public abstract Iter getIter(int docId);

Iter getIter(int docId);
public abstract Iter getIterSafe(int docId);

protected GeoPointValues(boolean multiValued) {
this.multiValued = multiValued;
}

Iter getIterSafe(int docId);
public GeoPoint getValueMissing(int docId, GeoPoint defaultGeoPoint) {
if (hasValue(docId)) {
return getValue(docId);
}
return defaultGeoPoint;
}


static interface Iter {
public static interface Iter {

boolean hasNext();

@@ -93,34 +108,28 @@ public GeoPoint next() {
}
}

static class Empty implements GeoPointValues {
@Override
public boolean isMultiValued() {
return false;
static class Empty extends GeoPointValues {
protected Empty() {
super(false);
}

@Override
public boolean hasValue(int docId) {
return false;
}

@Override
public GeoPoint getValueSafe(int docId) {
return getValue(docId);
}

@Override
public Iter getIterSafe(int docId) {
return getIter(docId);
}


@Override
public GeoPoint getValue(int docId) {
return null;
}

@Override
public Iter getIter(int docId) {
return Iter.Empty.INSTANCE;
}
@@ -94,7 +94,7 @@ public ScriptDocValues getScriptValues() {
}
}

public static class WithOrdinals extends GeoPointDoubleArrayAtomicFieldData {
static class WithOrdinals extends GeoPointDoubleArrayAtomicFieldData {

private final BigDoubleArrayList lon, lat;
private final Ordinals ordinals;
@@ -126,10 +126,10 @@ public long getMemorySizeInBytes() {

@Override
public GeoPointValues getGeoPointValues() {
return new GeoPointValues(lon, lat, ordinals.ordinals());
return new GeoPointValuesWithOrdinals(lon, lat, ordinals.ordinals());
}

static class GeoPointValues implements org.elasticsearch.index.fielddata.GeoPointValues {
public static class GeoPointValuesWithOrdinals extends GeoPointValues {

private final BigDoubleArrayList lon, lat;
private final Ordinals.Docs ordinals;
@@ -138,19 +138,15 @@ static class GeoPointValues implements org.elasticsearch.index.fielddata.GeoPoin
private final ValuesIter valuesIter;
private final SafeValuesIter safeValuesIter;

GeoPointValues(BigDoubleArrayList lon, BigDoubleArrayList lat, Ordinals.Docs ordinals) {
GeoPointValuesWithOrdinals(BigDoubleArrayList lon, BigDoubleArrayList lat, Ordinals.Docs ordinals) {
super(ordinals.isMultiValued());
this.lon = lon;
this.lat = lat;
this.ordinals = ordinals;
this.valuesIter = new ValuesIter(lon, lat);
this.safeValuesIter = new SafeValuesIter(lon, lat);
}

@Override
public boolean isMultiValued() {
return ordinals.isMultiValued();
}

@Override
public boolean hasValue(int docId) {
return ordinals.getOrd(docId) != 0;
@@ -204,12 +200,10 @@ public ValuesIter reset(Ordinals.Docs.Iter ordsIter) {
return this;
}

@Override
public boolean hasNext() {
return ord != 0;
}

@Override
public GeoPoint next() {
scratch.reset(lat.get(ord), lon.get(ord));
ord = ordsIter.next();
@@ -285,11 +279,11 @@ public long getMemorySizeInBytes() {

@Override
public GeoPointValues getGeoPointValues() {
return new GeoPointValues(lon, lat, set);
return new GeoPointValuesSingleFixedSet(lon, lat, set);
}


static class GeoPointValues implements org.elasticsearch.index.fielddata.GeoPointValues {
static class GeoPointValuesSingleFixedSet extends GeoPointValues {

private final BigDoubleArrayList lon;
private final BigDoubleArrayList lat;
@@ -298,17 +292,14 @@ static class GeoPointValues implements org.elasticsearch.index.fielddata.GeoPoin
private final GeoPoint scratch = new GeoPoint();
private final Iter.Single iter = new Iter.Single();

GeoPointValues(BigDoubleArrayList lon, BigDoubleArrayList lat, FixedBitSet set) {

GeoPointValuesSingleFixedSet(BigDoubleArrayList lon, BigDoubleArrayList lat, FixedBitSet set) {
super(false);
this.lon = lon;
this.lat = lat;
this.set = set;
}

@Override
public boolean isMultiValued() {
return false;
}

@Override
public boolean hasValue(int docId) {
return set.get(docId);
@@ -386,27 +377,24 @@ public long getMemorySizeInBytes() {

@Override
public GeoPointValues getGeoPointValues() {
return new GeoPointValues(lon, lat);
return new GeoPointValuesSingle(lon, lat);
}

static class GeoPointValues implements org.elasticsearch.index.fielddata.GeoPointValues {
static class GeoPointValuesSingle extends GeoPointValues {

private final BigDoubleArrayList lon;
private final BigDoubleArrayList lat;

private final GeoPoint scratch = new GeoPoint();
private final Iter.Single iter = new Iter.Single();

GeoPointValues(BigDoubleArrayList lon, BigDoubleArrayList lat) {

GeoPointValuesSingle(BigDoubleArrayList lon, BigDoubleArrayList lat) {
super(false);
this.lon = lon;
this.lat = lat;
}

@Override
public boolean isMultiValued() {
return false;
}

@Override
public boolean hasValue(int docId) {
return true;
@@ -0,0 +1,54 @@
/*
* Licensed to ElasticSearch and Shay Banon under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. ElasticSearch licenses this
* file to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.index.query.functionscore;

import org.apache.lucene.search.Explanation;
import org.elasticsearch.index.query.functionscore.gauss.GaussDecayFunctionParser;

/**
* Implement this interface to provide a decay function that is executed on a
* distance. For example, this could be an exponential drop-off, a triangle
* function or something of the kind. This is used, for example, by
* {@link GaussDecayFunctionParser}.
*
* */

public interface DecayFunction {

public double evaluate(double value, double scale);

public Explanation explainFunction(String valueString, double value, double scale);

/**
* The final scale parameter is computed from the scale parameter given by
* the user and a value. This value is the value that the decay function
* should compute if document distance and user defined scale equal. The
* scale parameter for the function must be adjusted accordingly in this
* function
*
* @param scale
* the raw scale value given by the user
* @param value
* the value which decay function should take once the distance
* reaches this scale
* */
public double processScale(double scale, double value);

}
@@ -0,0 +1,74 @@
/*
* Licensed to ElasticSearch and Shay Banon under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. ElasticSearch licenses this
* file to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.index.query.functionscore;


import org.elasticsearch.ElasticSearchIllegalStateException;
import org.elasticsearch.common.xcontent.XContentBuilder;

import java.io.IOException;

public abstract class DecayFunctionBuilder implements ScoreFunctionBuilder {

protected static final String REFERNECE = "reference";
protected static final String SCALE = "scale";
protected static final String SCALE_WEIGHT = "scale_weight";
protected static final String SCALE_DEFAULT = "0.5";

private String fieldName;
private String reference;
private String scale;
private String scaleWeight;

public void setParameters(String fieldName, String reference, String scale, String scaleWeight) {
if(this.fieldName != null ) {
throw new ElasticSearchIllegalStateException("Can not set parameters of decay function more than once.");
}
this.fieldName = fieldName;
this.reference = reference;
this.scale = scale;
this.scaleWeight = scaleWeight;
}

public void setParameters(String fieldName, String reference, String scale) {
setParameters(fieldName, reference, scale, SCALE_DEFAULT);
}

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject(getName());
builder.startObject(fieldName);
builder.field(REFERNECE, reference);
builder.field(SCALE, scale);
builder.field(SCALE_WEIGHT, scaleWeight);
builder.endObject();
builder.endObject();
return builder;
}

public void addGeoParams(String fieldName, double lat, double lon, String scale) {
addGeoParams(fieldName, lat, lon, scale, SCALE_DEFAULT);
}

public void addGeoParams(String fieldName, double lat, double lon, String scale, String scaleWeight) {
String geoLoc = Double.toString(lat) + ", " + Double.toString(lon);
setParameters(fieldName, geoLoc, scale, scaleWeight);
}
}
