Query time boost for _all field #16920

Closed
jimczi opened this issue Mar 3, 2016 · 3 comments
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories

Comments

@jimczi
Contributor

jimczi commented Mar 3, 2016

Index-time boosts in the _all field are currently implemented via payloads. For each token/position we encode a float in the payload, and that float is used at query time to influence the score of the query. The boost value is read from the mapping at index time, when the value of a field is copied into _all.
For instance:

{
    "mappings": {
        "type1": {
            "properties": {
                "title": {
                    "type": "string",
                    "boost": 3.5
                }
            }
        }
    }
}

When querying the _all field, terms that originated from the title field have their score multiplied by 3.5. Unfortunately, you cannot change this value without reindexing your data.
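
To make the limitation concrete, here is a minimal toy model of payload-based boosting. It is only a sketch: `FIELD_BOOSTS`, `index_into_all`, and `score` are hypothetical names, the `body` field is invented for the example, and summing `base_score * payload` is an assumed stand-in for Lucene's real scoring. The point is that the float is frozen into each token at index time.

```python
from typing import Dict, List, Tuple

# Hypothetical per-field boosts, as they would be read from the mapping.
FIELD_BOOSTS: Dict[str, float] = {"title": 3.5, "body": 1.0}


def index_into_all(doc: Dict[str, str]) -> List[Tuple[str, float]]:
    """Copy every field into a simulated _all field as (token, payload) pairs."""
    tokens: List[Tuple[str, float]] = []
    for field, text in doc.items():
        boost = FIELD_BOOSTS.get(field, 1.0)  # the boost is fixed here, at index time
        for token in text.lower().split():
            tokens.append((token, boost))
    return tokens


def score(all_tokens: List[Tuple[str, float]], term: str, base_score: float = 1.0) -> float:
    """Toy scoring: sum base_score * payload over every occurrence of the term."""
    return sum(base_score * payload for token, payload in all_tokens if token == term)


all_field = index_into_all({
    "title": "Payload boosts",
    "body": "boosts are frozen at index time",
})
```

With this model, changing `FIELD_BOOSTS` after `index_into_all` has run has no effect on `all_field`; the stored payloads only change if the document is indexed again.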

Proposal

Now that dynamic boosting at index time is prohibited (only the field boost in the mapping is allowed), we could investigate a hybrid approach: instead of directly encoding a boost value in the payload, we would encode a small number that refers to a score class.
Suppose you have 3 fields in your mapping and you want to score them differently in the _all field. By assigning 3 score classes (0, 1, 2), it is easy to remap those classes to different boost values at query time.
For instance, a mapping that assigns score classes at index time, and a query that remaps them to boosts, could look like this:

{
    "mappings": {
        "test": {
            "properties": {
                "title": {
                    "type": "string",
                    "payload": 0
                },
                "text": {
                    "type": "string",
                    "payload": 1
                },
                "author": {
                    "type": "string",
                    "payload": 2
                }
            }
        }
    }
}

{
    "query": {
        "payload_score_remap": {
            "0": 2.45,
            "1": 3.45,
            "2": 0.75,
            "mode": "max"
        },
        "query_string": {
            "query": "foo bar"
        }
    }
}
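
The remapping idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an Elasticsearch or Lucene API: `FIELD_CLASSES`, `index_into_all`, and `score` are hypothetical names, and `"max"` is assumed to mean "take the highest remapped boost among the matching positions" (the `"sum"` mode is added purely for contrast).

```python
from typing import Dict, List, Tuple

# Index time: each field carries a small class id instead of a float boost.
FIELD_CLASSES: Dict[str, int] = {"title": 0, "text": 1, "author": 2}


def index_into_all(doc: Dict[str, str]) -> List[Tuple[str, int]]:
    """Copy every field into a simulated _all field as (token, class_id) pairs."""
    tokens: List[Tuple[str, int]] = []
    for field, value in doc.items():
        class_id = FIELD_CLASSES[field]
        for token in value.lower().split():
            tokens.append((token, class_id))
    return tokens


def score(all_tokens: List[Tuple[str, int]], term: str,
          remap: Dict[int, float], mode: str = "max") -> float:
    """Query time: translate each matching class id into a boost, then combine."""
    boosts = [remap[class_id] for token, class_id in all_tokens if token == term]
    if not boosts:
        return 0.0
    if mode == "max":
        return max(boosts)
    if mode == "sum":  # assumed alternative mode, not part of the proposal
        return sum(boosts)
    raise ValueError(f"unknown mode: {mode}")


all_field = index_into_all({"title": "foo", "text": "foo bar", "author": "bar"})

# Two different remappings applied to the same indexed data.
first = {0: 2.45, 1: 3.45, 2: 0.75}
second = {0: 5.0, 1: 1.0, 2: 0.5}
```

Scoring `"foo"` against `all_field` with `first` and then with `second` yields different results without touching the indexed tokens, which is the whole benefit of storing a class id rather than the boost itself.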
@jimczi jimczi added >enhancement discuss :Search/Search Search-related issues that do not fall into other categories labels Mar 3, 2016
@clintongormley

Interesting idea. Instead of giving the payloads numbers, it'd be nice to just refer to field names, perhaps as part of the _all field configuration.

@jimczi
Contributor Author

jimczi commented Mar 3, 2016

Yes, especially because the boost parameter of a field can be used for query-time boosting of the field itself (and not of the _all field), so it's kind of confusing. The key point is that we need to assign a small unique id to each field when we add it to the _all field, so that the payload encoding stays small. The tricky part is that this unique id cannot be computed automatically for dynamic fields. If it's part of the _all field configuration, then it's easy to assign one each time the configuration is updated.
It could be something like:

"_all": {
   "payload_fields": ["text", "title"]
}

... but I'll need to find a better name for "payload_fields" ;)
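
A rough sketch of how ids could be derived from such a configuration follows. Everything here is an assumption for illustration: `assign_payload_ids` and `encode_payload` are hypothetical helpers, the one-byte encoding is just one way to keep the payload small, and keeping previously assigned ids stable across updates is assumed to be necessary because tokens that are already indexed carry the old ids.

```python
from typing import Dict, List, Optional


def assign_payload_ids(payload_fields: List[str],
                       existing: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Give each configured field a small id, keeping ids that were already assigned."""
    ids = dict(existing or {})
    next_id = max(ids.values(), default=-1) + 1
    for field in payload_fields:
        if field not in ids:  # only new fields receive a fresh id
            ids[field] = next_id
            next_id += 1
    return ids


def encode_payload(field_id: int) -> bytes:
    """Encode an id as a single byte (assumed layout; limits a mapping to 256 fields)."""
    if not 0 <= field_id < 256:
        raise ValueError("field id does not fit in one byte")
    return field_id.to_bytes(1, "big")


# Initial configuration, then an update that adds a field and reorders the list.
initial = assign_payload_ids(["text", "title"])
updated = assign_payload_ids(["title", "author", "text"], existing=initial)
```

In this sketch the ids depend on the order in which fields were first configured, not on the current order of the list, so reordering `payload_fields` does not invalidate existing payloads.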

@jimczi
Contributor Author

jimczi commented Mar 31, 2017

Index-time boosts are deprecated in Lucene (https://issues.apache.org/jira/browse/LUCENE-6819) and the _all field is gone.

@jimczi jimczi closed this as completed Mar 31, 2017