Docs: Rewrote the term query docs to explain analyzed vs not_analyzed

javanna · May 8, 2015 · a536bd5 · a536bd5
1 parent 748a040
commit a536bd5
Showing 1 changed file with 146 additions and 11 deletions.
diff --git a/docs/reference/query-dsl/term-query.asciidoc b/docs/reference/query-dsl/term-query.asciidoc
@@ -1,31 +1,166 @@
 [[query-dsl-term-query]]
 == Term Query
 
-Matches documents that have fields that contain a term (*not analyzed*).
-The term query maps to Lucene `TermQuery`. The following matches
-documents where the user field contains the term `kimchy`:
+The `term` query finds documents that contain the *exact* term specified
+in the inverted index.  For instance:
 
 [source,js]
 --------------------------------------------------
 {
-    "term" : { "user" : "kimchy" }
-}    
+    "term" : { "user" : "Kimchy" } <1>
+}
 --------------------------------------------------
+<1> Finds documents which contain the exact term `Kimchy` in the inverted index
+    of the `user` field.
 
-A boost can also be associated with the query:
+A `boost` parameter can be specified to give this `term` query a higher
+relevance score than another query, for instance:
 
 [source,js]
 --------------------------------------------------
+GET /_search
 {
-    "term" : { "user" : { "value" : "kimchy", "boost" : 2.0 } }
-}    
+  "query": {
+    "bool": {
+      "should": [
+        {
+          "term": {
+            "status": {
+              "value": "urgent",
+              "boost": 2.0 <1>
+            }
+          }
+        },
+        {
+          "term": {
+            "status": "normal" <2>
+          }
+        }
+      ]
+    }
+  }
+}
 --------------------------------------------------
 
-Or :
+<1> The `urgent` query clause has a boost of `2.0`, meaning it is twice as important
+    as the query clause for `normal`.
+<2> The `normal` clause has the default neutral boost of `1.0`.
+
+.Why doesn't the `term` query match my document?
+**************************************************
+
+String fields can be `analyzed` (treated as full text, like the body of an
+email), or `not_analyzed` (treated as exact values, like an email address or a
+zip code).  Exact values (like numbers, dates, and `not_analyzed` strings) have
+the exact value specified in the field added to the inverted index in order
+to make them searchable.
+
+By default, however, `string` fields are `analyzed`. This means that their
+values are first passed through an <<analysis,analyzer>> to produce a list of
+terms, which are then added to the inverted index.
+
+There are many ways to analyze text: the default
+<<analysis-standard-analyzer,`standard` analyzer>> drops most punctuation,
+breaks up text into individual words, and lower cases them.    For instance,
+the `standard` analyzer would turn the string ``Quick Brown Fox!'' into the
+terms [`quick`, `brown`, `fox`].
+
+This analysis process makes it possible to search for individual words
+within a big block of full text.
+
+The `term` query looks for the *exact* term in the field's inverted index --
+it doesn't know anything about the field's analyzer.  This makes it useful for
+looking up values in `not_analyzed` string fields, or in numeric or date
+fields.  When querying full text fields, use the
+<<query-dsl-match-query,`match` query>> instead, which understands how the field
+has been analyzed.
+
+
+To demonstrate, try out the example below.  First, create an index, specifying the field mappings, and index a document:
 
 [source,js]
 --------------------------------------------------
+PUT my_index
+{
+  "mappings": {
+    "my_type": {
+      "properties": {
+        "full_text": {
+          "type":  "string" <1>
+        },
+        "exact_value": {
+          "type":  "string",
+          "index": "not_analyzed" <2>
+        }
+      }
+    }
+  }
+}
+
+PUT my_index/my_type/1
 {
-    "term" : { "user" : { "term" : "kimchy", "boost" : 2.0 } }
-}    
+  "full_text":   "Quick Foxes!", <3>
+  "exact_value": "Quick Foxes!"  <4>
+}
 --------------------------------------------------
+// AUTOSENSE
+
+<1> The `full_text` field is `analyzed` by default.
+<2> The `exact_value` field is set to be `not_analyzed`.
+<3> The `full_text` inverted index will contain the terms: [`quick`, `foxes`].
+<4> The `exact_value` inverted index will contain the exact term: [`Quick Foxes!`].
+
+Now, compare the results for the `term` query and the `match` query:
+
+[source,js]
+--------------------------------------------------
+
+GET my_index/my_type/_search
+{
+  "query": {
+    "term": {
+      "exact_value": "Quick Foxes!" <1>
+    }
+  }
+}
+
+GET my_index/my_type/_search
+{
+  "query": {
+    "term": {
+      "full_text": "Quick Foxes!" <2>
+    }
+  }
+}
+
+GET my_index/my_type/_search
+{
+  "query": {
+    "term": {
+      "exact_value": "foxes" <3>
+    }
+  }
+}
+
+GET my_index/my_type/_search
+{
+  "query": {
+    "match": {
+      "full_text": "Quick Foxes!" <4>
+    }
+  }
+}
+--------------------------------------------------
+// AUTOSENSE
+
+<1> This query matches because the `exact_value` field contains the exact
+    term `Quick Foxes!`.
+<2> This query does not match, because the `full_text` field only contains
+    the terms `quick` and `foxes`. It does not contain the exact term
+    `Quick Foxes!`.
+<3> A `term` query for the term `foxes` matches the `full_text` field.
+<4> This `match` query on the `full_text` field first analyzes the query string,
+    then looks for documents containing `quick` or `foxes` or both.
+**************************************************
+
+