From a536bd5f811150633e26d31f375cfaa999bdc92e Mon Sep 17 00:00:00 2001
From: Clinton Gormley
Date: Fri, 8 May 2015 08:31:15 +0200
Subject: [PATCH] Docs: Rewrote the term query docs to explain analyzed vs not_analyzed

---
 docs/reference/query-dsl/term-query.asciidoc | 157 +++++++++++++++++--
 1 file changed, 146 insertions(+), 11 deletions(-)

diff --git a/docs/reference/query-dsl/term-query.asciidoc b/docs/reference/query-dsl/term-query.asciidoc
index 5b0acf6bace1a..ed81870bfdd58 100644
--- a/docs/reference/query-dsl/term-query.asciidoc
+++ b/docs/reference/query-dsl/term-query.asciidoc
@@ -1,31 +1,166 @@
 [[query-dsl-term-query]]
 == Term Query
 
-Matches documents that have fields that contain a term (*not analyzed*).
-The term query maps to Lucene `TermQuery`. The following matches
-documents where the user field contains the term `kimchy`:
+The `term` query finds documents that contain the *exact* term specified
+in the inverted index. For instance:
 
 [source,js]
 --------------------------------------------------
 {
-    "term" : { "user" : "kimchy" }
-}
+  "term" : { "user" : "Kimchy" } <1>
+}
 --------------------------------------------------
+<1> Finds documents which contain the exact term `Kimchy` in the inverted index
+    of the `user` field.
 
-A boost can also be associated with the query:
+A `boost` parameter can be specified to give this `term` query a higher
+relevance score than another query, for instance:
 
 [source,js]
 --------------------------------------------------
+GET /_search
 {
-    "term" : { "user" : { "value" : "kimchy", "boost" : 2.0 } }
-}
+  "query": {
+    "bool": {
+      "should": [
+        {
+          "term": {
+            "status": {
+              "value": "urgent",
+              "boost": 2.0 <1>
+            }
+          }
+        },
+        {
+          "term": {
+            "status": "normal" <2>
+          }
+        }
+      ]
+    }
+  }
+}
 --------------------------------------------------
 
-Or :
+<1> The `urgent` query clause has a boost of `2.0`, meaning it is twice as important
+    as the query clause for `normal`.
+<2> The `normal` clause has the default neutral boost of `1.0`.
+
+.Why doesn't the `term` query match my document?
+**************************************************
+
+String fields can be `analyzed` (treated as full text, like the body of an
+email), or `not_analyzed` (treated as exact values, like an email address or a
+zip code). Exact values (like numbers, dates, and `not_analyzed` strings) have
+the exact value specified in the field added to the inverted index in order
+to make them searchable.
+
+By default, however, `string` fields are `analyzed`. This means that their
+values are first passed through an analyzer to produce a list of
+terms, which are then added to the inverted index.
+
+There are many ways to analyze text: the default
+`standard` analyzer drops most punctuation,
+breaks up text into individual words, and lowercases them. For instance,
+the `standard` analyzer would turn the string ``Quick Brown Fox!'' into the
+terms [`quick`, `brown`, `fox`].
+
+This analysis process makes it possible to search for individual words
+within a big block of full text.
+
+The `term` query looks for the *exact* term in the field's inverted index --
+it doesn't know anything about the field's analyzer. This makes it useful for
+looking up values in `not_analyzed` string fields, or in numeric or date
+fields. When querying full text fields, use the
+`match` query instead, which understands how the field
+has been analyzed.
+
+
+To demonstrate, try out the example below. First, create an index, specifying the field mappings, and index a document:
 
 [source,js]
 --------------------------------------------------
+PUT my_index
+{
+  "mappings": {
+    "my_type": {
+      "properties": {
+        "full_text": {
+          "type": "string" <1>
+        },
+        "exact_value": {
+          "type": "string",
+          "index": "not_analyzed" <2>
+        }
+      }
+    }
+  }
+}
+
+PUT my_index/my_type/1
 {
-    "term" : { "user" : { "term" : "kimchy", "boost" : 2.0 } }
-}
+  "full_text": "Quick Foxes!", <3>
+  "exact_value": "Quick Foxes!" <4>
+}
 --------------------------------------------------
+// AUTOSENSE
+
+<1> The `full_text` field is `analyzed` by default.
+<2> The `exact_value` field is set to be `not_analyzed`.
+<3> The `full_text` inverted index will contain the terms: [`quick`, `foxes`].
+<4> The `exact_value` inverted index will contain the exact term: [`Quick Foxes!`].
+
+Now, compare the results for the `term` query and the `match` query:
+
+[source,js]
+--------------------------------------------------
+
+GET my_index/my_type/_search
+{
+  "query": {
+    "term": {
+      "exact_value": "Quick Foxes!" <1>
+    }
+  }
+}
+
+GET my_index/my_type/_search
+{
+  "query": {
+    "term": {
+      "full_text": "Quick Foxes!" <2>
+    }
+  }
+}
+
+GET my_index/my_type/_search
+{
+  "query": {
+    "term": {
+      "full_text": "foxes" <3>
+    }
+  }
+}
+
+GET my_index/my_type/_search
+{
+  "query": {
+    "match": {
+      "full_text": "Quick Foxes!" <4>
+    }
+  }
+}
+--------------------------------------------------
+// AUTOSENSE
+
+<1> This query matches because the `exact_value` field contains the exact
+    term `Quick Foxes!`.
+<2> This query does not match, because the `full_text` field only contains
+    the terms `quick` and `foxes`. It does not contain the exact term
+    `Quick Foxes!`.
+<3> A `term` query for the term `foxes` matches the `full_text` field.
+<4> This `match` query on the `full_text` field first analyzes the query string,
+    then looks for documents containing `quick` or `foxes` or both.
+**************************************************
+
+
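
The behaviour described in the new sidebar can be illustrated without a running cluster. The Python below is a toy model only, not how Elasticsearch or Lucene is implemented. `toy_standard_analyze` is a rough approximation of the `standard` analyzer (drop punctuation, split into words, lowercase), and every function name and the dictionary-based index are hypothetical, chosen for illustration. It indexes the same `Quick Foxes!` value both ways and runs the four lookups from the patch.

```python
import re
from collections import defaultdict


def toy_standard_analyze(text):
    """Rough stand-in for the `standard` analyzer (assumption, not the real one):
    keep runs of word characters and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs, analyzed):
    """Build a toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        # analyzed fields store analyzer output; not_analyzed fields store the raw value
        terms = toy_standard_analyze(value) if analyzed else [value]
        for term in terms:
            index[term].add(doc_id)
    return index


def term_query(index, term):
    """Exact lookup: the query string is used as-is, with no analysis."""
    return set(index.get(term, set()))


def match_query(index, text):
    """Analyze the query string first, then union the hits for each term."""
    hits = set()
    for term in toy_standard_analyze(text):
        hits |= index.get(term, set())
    return hits


docs = {1: "Quick Foxes!"}
full_text = build_index(docs, analyzed=True)     # terms: quick, foxes
exact_value = build_index(docs, analyzed=False)  # term: Quick Foxes!

print(term_query(exact_value, "Quick Foxes!"))  # {1}
print(term_query(full_text, "Quick Foxes!"))    # set()
print(term_query(full_text, "foxes"))           # {1}
print(match_query(full_text, "Quick Foxes!"))   # {1}
```

The only difference between `term_query` and `match_query` in this sketch is whether the query string passes through the analyzer before the lookup, which is the distinction the sidebar draws.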