Sorting and Relevance

By default, results are returned sorted by relevance, with the most relevant docs first. Later in this chapter, we explain what we mean by relevance and how it is calculated, but let’s start by looking at the sort parameter and how to use it.

Sorting

In order to sort by relevance, we need to represent relevance as a value. In Elasticsearch, the relevance score is represented by the floating-point number returned in the search results as the _score, so the default sort order is _score descending.
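As a rough illustration, and not Elasticsearch's own code, the default ordering amounts to sorting hits by `_score` from highest to lowest. The hit dictionaries below are hypothetical stand-ins for search results.

```python
# Hypothetical hits; only the _score field matters for the default order.
hits = [
    {"_id": "1", "_score": 0.42},
    {"_id": "2", "_score": 1.37},
    {"_id": "3", "_score": 0.88},
]

# Default sort: _score descending, so the most relevant document comes first.
by_relevance = sorted(hits, key=lambda hit: hit["_score"], reverse=True)
print([hit["_id"] for hit in by_relevance])  # ['2', '3', '1']
```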

Sometimes, though, you don’t have a meaningful relevance score. For instance, the following query just returns all tweets whose user_id field has the value 1:

GET /_search
{
    "query" : {
        "bool" : {
            "filter" : {
                "term" : {
                    "user_id" : 1
                }
            }
        }
    }
}

There isn’t a meaningful score here: because we are using a filter, we are indicating that we just want the documents that match user_id: 1 with no attempt to determine relevance. Documents will be returned in effectively random order, and each document will have a score of zero.

Note

If a score of zero makes your life difficult for logistical reasons, you can use a constant_score query instead:

GET /_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "user_id" : 1
                }
            }
        }
    }
}

This applies a constant score (1 by default) to every matching document. It performs the same as the previous query, and documents are still returned in effectively random order; they just have a score of one instead of zero.
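To make the difference between the two queries concrete, here is a minimal sketch that models only their scoring on an in-memory list. The functions `filter_only` and `constant_score` are hypothetical and do not call Elasticsearch; the `boost` argument is modeled on the boost value that constant_score applies, assumed here to default to 1.0 as the text states. Ordering is not modeled.

```python
from typing import Any


def filter_only(docs: list[dict[str, Any]], field: str, value: Any) -> list[dict[str, Any]]:
    """Mimic a bool query with only a filter clause: each match scores 0.0."""
    return [{**doc, "_score": 0.0} for doc in docs if doc.get(field) == value]


def constant_score(docs: list[dict[str, Any]], field: str, value: Any,
                   boost: float = 1.0) -> list[dict[str, Any]]:
    """Mimic a constant_score query: each match gets the same score (1.0 by default)."""
    return [{**doc, "_score": boost} for doc in docs if doc.get(field) == value]


tweets = [
    {"_id": "a", "user_id": 1},
    {"_id": "b", "user_id": 2},
    {"_id": "c", "user_id": 1},
]
print([d["_score"] for d in filter_only(tweets, "user_id", 1)])     # [0.0, 0.0]
print([d["_score"] for d in constant_score(tweets, "user_id", 1)])  # [1.0, 1.0]
```

Both functions match the same documents; only the score attached to each match differs.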

Sorting by Field Values

In this case, it probably makes sense to sort tweets by recency, with the most recent tweets first. We can do this with the sort parameter:

GET /_search
{
    "query" : {
        "bool" : {
            "filter" : { "term" : { "user_id" : 1 }}
        }
    },
    "sort": { "date": { "order": "desc" }}
}

You will notice two differences in the results:

"hits" : {
    "total" :           6,
    "max_score" :       null, (1)
    "hits" : [ {
        "_index" :      "us",
        "_type" :       "tweet",
        "_id" :         "14",
        "_score" :      null, (1)
        "_source" :     {
             "date":    "2014-09-24",
             ...
        },
        "sort" :        [ 1411516800000 ] (2)
    },
    ...
}
  1. The _score is not calculated, because it is not being used for sorting.

  2. The value of the date field, expressed as milliseconds since the epoch, is returned in the sort values.

The first is that we have a new element in each result called sort, which contains the value(s) used for sorting. In this case, we sorted on date, which internally is indexed as milliseconds since the epoch. The long number 1411516800000 is equivalent to the date string 2014-09-24 00:00:00 UTC.

The second is that the _score and max_score are both null. Calculating the _score can be quite expensive, and its only purpose is usually sorting; since we’re not sorting by relevance, it doesn’t make sense to keep track of the _score. If you want the _score to be calculated regardless, you can set the track_scores parameter to true.
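The sketch below reproduces the shape of the response above on plain data: dates are converted to milliseconds since the epoch (taking each yyyy-MM-dd date as midnight UTC, an assumption matching the example), hits are ordered newest first, each hit carries a sort array, and `_score` is left as None. The helper names are hypothetical, and this only imitates the visible result, not how Elasticsearch computes it.

```python
from datetime import datetime, timezone


def to_epoch_millis(date_str: str) -> int:
    """Convert a yyyy-MM-dd date, taken as midnight UTC, to epoch milliseconds."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


def sort_by_date_desc(docs):
    """Mimic the response shape: newest first, a sort array, and no _score."""
    hits = []
    for doc in docs:
        millis = to_epoch_millis(doc["date"])
        hits.append({"_id": doc["_id"], "_score": None, "sort": [millis]})
    return sorted(hits, key=lambda hit: hit["sort"][0], reverse=True)


tweets = [
    {"_id": "12", "date": "2014-09-22"},
    {"_id": "14", "date": "2014-09-24"},
    {"_id": "13", "date": "2014-09-23"},
]
hits = sort_by_date_desc(tweets)
print(hits[0])                       # {'_id': '14', '_score': None, 'sort': [1411516800000]}
print([hit["_id"] for hit in hits])  # ['14', '13', '12']
```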

Tip

As a shortcut, you can specify just the name of the field to sort on:

    "sort": "number_of_children"

Fields will be sorted in ascending order by default, and the _score value in descending order.
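One way to picture the shortcut is as shorthand that expands into the full form with the default orders just described. `normalize_sort` below is a hypothetical helper, not part of any Elasticsearch client; it only handles bare field names and passes entries that are already in the full `{field: {"order": ...}}` form through unchanged.

```python
def normalize_sort(spec):
    """Expand bare field names into {field: {"order": ...}} entries.

    Defaults follow the text: ascending for fields, descending for _score.
    """
    items = spec if isinstance(spec, list) else [spec]
    normalized = []
    for item in items:
        if isinstance(item, str):
            # Bare field name: apply the default order for that field.
            order = "desc" if item == "_score" else "asc"
            normalized.append({item: {"order": order}})
        else:
            # Assume the entry is already in the full form.
            normalized.append(item)
    return normalized


print(normalize_sort("number_of_children"))
# [{'number_of_children': {'order': 'asc'}}]
print(normalize_sort(["date", "_score"]))
# [{'date': {'order': 'asc'}}, {'_score': {'order': 'desc'}}]
```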

Multilevel Sorting

Perhaps we want to combine the _score from a query with the date, and show all matching results sorted first by date, then by relevance:

GET /_search
{
    "query" : {
        "bool" : {
            "must":   { "match": { "tweet": "manage text search" }},
            "filter" : { "term" : { "user_id" : 2 }}
        }
    },
    "sort": [
        { "date":   { "order": "desc" }},
        { "_score": { "order": "desc" }}
    ]
}

Order is important. Results are sorted by the first criterion first. Only results whose first sort value is identical will then be sorted by the second criterion, and so on.
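The tie-breaking rule can be sketched in Python using the fact that its built-in sort is stable: sorting by the last criterion first and the first criterion last yields the same order as comparing criterion by criterion. The hits and the `multilevel_sort` helper are hypothetical; this is an illustration of the ordering rule, not of how Elasticsearch sorts internally.

```python
hits = [
    {"_id": "a", "date": "2014-09-24", "_score": 0.5},
    {"_id": "b", "date": "2014-09-23", "_score": 2.0},
    {"_id": "c", "date": "2014-09-24", "_score": 1.5},
    {"_id": "d", "date": "2014-09-22", "_score": 0.9},
]


def multilevel_sort(docs, criteria):
    """Sort by several (field, order) criteria; earlier criteria take priority.

    Python's sort is stable (also with reverse=True), so later passes keep
    the order produced by earlier passes whenever their keys are equal.
    """
    result = list(docs)
    for field, order in reversed(criteria):
        result.sort(key=lambda doc: doc[field], reverse=(order == "desc"))
    return result


ordered = multilevel_sort(hits, [("date", "desc"), ("_score", "desc")])
print([doc["_id"] for doc in ordered])  # ['c', 'a', 'b', 'd']
```

Documents c and a share the newest date, so only they are compared by `_score`; b and d are placed by date alone.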

Multilevel sorting doesn’t have to involve the _score. You could sort on several different fields, on geo-distance, or on a custom value calculated in a script.

Note

Query-string search also supports custom sorting, using the sort parameter in the query string:

GET /_search?sort=date:desc&sort=_score&q=search
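If you build such a URL programmatically, the sort parameter has to appear once per criterion. A minimal sketch with Python's standard library: passing a list of pairs rather than a dict lets the key repeat, and `safe=":"` keeps the colon unescaped so the result matches the example above.

```python
from urllib.parse import urlencode

# A list of pairs (rather than a dict) allows the sort key to repeat.
params = [("sort", "date:desc"), ("sort", "_score"), ("q", "search")]
query = urlencode(params, safe=":")
print("/_search?" + query)  # /_search?sort=date:desc&sort=_score&q=search
```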

Sorting on Multivalue Fields

When sorting on fields with more than one value, remember that the values do not have any intrinsic order; a multivalue field is just a bag of values. Which one do you choose to sort on?

For numbers and dates, you can reduce a multivalue field to a single value by using the min, max, avg, or sum sort modes. For instance, you could sort on the earliest date in each dates field by using the following:

"sort": {
    "dates": {
        "order": "asc",
        "mode":  "min"
    }
}
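The effect of the sort modes can be sketched by reducing each document's list of values to a single key before sorting. The documents, the `sort_multivalue` helper, and the reducer table are hypothetical; dates are stored as epoch milliseconds (as Elasticsearch indexes them) so that avg and sum are also meaningful.

```python
# Reducers mirroring the min, max, avg, and sum sort modes named in the text.
MODES = {
    "min": min,
    "max": max,
    "avg": lambda values: sum(values) / len(values),
    "sum": sum,
}


def sort_multivalue(docs, field, order="asc", mode="min"):
    """Sort docs on a multivalue field by first reducing it to one value."""
    reduce = MODES[mode]
    return sorted(docs, key=lambda doc: reduce(doc[field]), reverse=(order == "desc"))


docs = [
    {"_id": "x", "dates": [1411516800000, 1411171200000]},  # 2014-09-24, 2014-09-20
    {"_id": "y", "dates": [1411344000000]},                  # 2014-09-22
    {"_id": "z", "dates": [1411430400000, 1411257600000]},  # 2014-09-23, 2014-09-21
]

# Earliest date in each dates field, ascending (the example above).
print([d["_id"] for d in sort_multivalue(docs, "dates", "asc", "min")])  # ['x', 'z', 'y']
# Latest date in each dates field, ascending.
print([d["_id"] for d in sort_multivalue(docs, "dates", "asc", "max")])  # ['y', 'z', 'x']
```

The same documents come back in a different order depending on the mode, which is why the choice of mode matters for multivalue fields.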