From 7c3ed47cde2b69fdb50842a1f9d192e2fc1fbb8d Mon Sep 17 00:00:00 2001
From: Andrew Cholakian <andrew@andrewvc.com>
Date: Tue, 3 Dec 2013 21:11:15 -0500
Subject: [PATCH 1/3] Cleanup some of the documentation for the fuzzy query.

---
 .../query-dsl/queries/fuzzy-query.asciidoc    | 45 +++++++++++++------
 .../query-dsl/queries/match-query.asciidoc    |  8 ++--
 2 files changed, 37 insertions(+), 16 deletions(-)
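
To illustrate the trade-offs the new "Performance Tuning" section describes, here is a
minimal sketch of how `prefix_length` and `max_expansions` shrink the candidate set.
It is not Lucene's implementation: `levenshtein`, `fuzzy_candidates`, and the sample
term list are hypothetical names made up for this note, and the choice of which terms
survive the `max_expansions` cap (first found in sorted order) is a simplifying assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_candidates(value, terms, max_edits=1, prefix_length=0, max_expansions=50):
    """Hypothetical helper: collect index terms within max_edits of value.

    Terms that do not share the first prefix_length characters with value are
    skipped without computing a distance, and collection stops once
    max_expansions terms have been gathered (an assumed, simplified cut-off).
    """
    prefix = value[:prefix_length]
    matches = []
    for term in sorted(terms):
        if not term.startswith(prefix):
            continue  # pruned by prefix_length, no edit distance computed
        if levenshtein(value, term) <= max_edits:
            matches.append(term)
            if len(matches) >= max_expansions:
                break  # capped by max_expansions
    return matches


# Made-up terms standing in for the term dictionary of the "user" field.
terms = ["ki", "kim", "ski", "kit", "bi", "kimchy"]
print(fuzzy_candidates("ki", terms))
print(fuzzy_candidates("ki", terms, prefix_length=1))
print(fuzzy_candidates("ki", terms, max_expansions=2))
```

With `prefix_length=1` the terms `bi` and `ski` are no longer found, which is the
"edits at the start of the term" cost the docs mention.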

diff --git a/docs/reference/query-dsl/queries/fuzzy-query.asciidoc b/docs/reference/query-dsl/queries/fuzzy-query.asciidoc
index 86a1062d16922..fd08cba81626d 100644
--- a/docs/reference/query-dsl/queries/fuzzy-query.asciidoc
+++ b/docs/reference/query-dsl/queries/fuzzy-query.asciidoc
@@ -1,14 +1,21 @@
 [[query-dsl-fuzzy-query]]
 === Fuzzy Query
 
-A fuzzy query that uses similarity based on Levenshtein (edit
-distance) algorithm. This maps to Lucene's `FuzzyQuery`.
+A fuzzy query that uses similarity based on the Levenshtein (edit distance) algorithm for text,
+and ranges for numeric and date data. This maps to Lucene's `FuzzyQuery` for text.
+Maximum edit distance is determined via the `min_similarity` parameter,
+which can only take the values 1 or 2 for text for performance reasons.
 
-Warning: this query is not very scalable with its default prefix length
-of 0 - in this case, *every* term will be enumerated and cause an edit
-score calculation or `max_expansions` is not set.
+Be aware that performance can easily degrade when using this query on an index with a
+large number of terms, especially when a `min_similarity` of 2 is set.
+Execution speed can be dramatically improved through the use of the `prefix_length` and
+`max_expansions` settings, described later in this document.
 
-Here is a simple example:
+Note that `min_similarity` can also take a `float` value, which is
+converted to an integer edit distance based on the text's properties, but this usage is deprecated.
+Please use only integer values for `min_similarity`.
+
+Here is a simple example of matching text with a `fuzzy` query:
 
 [source,js]
 --------------------------------------------------
@@ -17,8 +24,7 @@ Here is a simple example:
 }
 --------------------------------------------------
 
-More complex settings can be set (the values here are the default
-values):
+More complex settings can be seen below (the values here are the defaults):
 
 [source,js]
 --------------------------------------------------
@@ -27,20 +33,17 @@ values):
             "user" : {
                 "value" : "ki",
                 "boost" : 1.0,
-                "min_similarity" : 0.5,
+                "min_similarity" : 1,
                 "prefix_length" : 0
             }
         }
     }
 --------------------------------------------------
 
-The `max_expansions` parameter (unbounded by default) controls the
-number of terms the fuzzy query will expand to.
-
 [float]
 ==== Numeric / Date Fuzzy
 
-`fuzzy` query on a numeric field will result in a range query "around"
+A `fuzzy` query on a numeric field will result in a range query "around"
 the value using the `min_similarity` value. For example:
 
 [source,js]
@@ -77,3 +80,19 @@ For example, for dates, a fuzzy factor of "1d" will result in
 multiplying whatever fuzzy value provided in the min_similarity by it.
 Note, this is explicitly supported since query_string query only allowed
 for similarity valued between 0.0 and 1.0.
+
+==== Performance Tuning
+
+The default settings for this query prefer correct behavior over speed. Given an index
+with a large number of terms, performance can quickly degrade. The `prefix_length` and `max_expansions`
+parameters can be used to remedy performance problems in larger datasets by significantly
+reducing the search space at cost of not matching some valid documents.
+
+The `prefix_length` parameter restricts matches to those that share an exact prefix with the query `value`.
+The number of matching characters is controlled with this parameter.
+Using `prefix_length` greatly shortens the search space at the expense of not detecting edits that occur at the start of the term.
+
+The `max_expansions` parameter controls the number of alternate versions of the input term to look for.
+When the query is executed, only a set number of permutations, by default 50, are actually matched.
+The lower the value of `max_expansions`, the faster the query will be. The trade-off here is that some documents that
+should match may not be returned because their specific edit is not part of the expansion list.
\ No newline at end of file
diff --git a/docs/reference/query-dsl/queries/match-query.asciidoc b/docs/reference/query-dsl/queries/match-query.asciidoc
index 5460cbff1e448..9a06da5647548 100644
--- a/docs/reference/query-dsl/queries/match-query.asciidoc
+++ b/docs/reference/query-dsl/queries/match-query.asciidoc
@@ -38,10 +38,12 @@ definition, or the default search analyzer.
 string types it should be a value between `0.0` and `1.0`) to constructs
 fuzzy queries for each term analyzed. The `prefix_length` and
 `max_expansions` can be set in this case to control the fuzzy process.
+Please see the documentation for the <<query-dsl-fuzzy,fuzzy>> query type for
+more information on `max_expansions` and `prefix_length`.
+
 If the fuzzy option is set the query will use `constant_score_rewrite`
-as its <<query-dsl-multi-term-rewrite,rewrite
-method>> the `rewrite` parameter allows to control how the query will get
-rewritten.
+as its <<query-dsl-multi-term-rewrite,rewrite method>>
+the `rewrite` parameter controls how the query will get rewritten.
 
 Here is an example when providing additional parameters (note the slight
 change in structure, `message` is the field name):

From a17a1251e88c6f057308cc43aa54160ff2f67e81 Mon Sep 17 00:00:00 2001
From: Andrew Cholakian <andrew@andrewvc.com>
Date: Thu, 12 Dec 2013 10:49:24 -0800
Subject: [PATCH 2/3] Cleanup some wording on docs for Fuzzy queries

---
 docs/reference/query-dsl/queries/fuzzy-query.asciidoc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
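
For the numeric wording in this patch, a minimal sketch of the "range around the value"
behavior the docs describe. The function name `numeric_fuzzy_range` is hypothetical, and
the symmetric, inclusive bounds are an assumption made for illustration rather than a
statement about the actual implementation.

```python
def numeric_fuzzy_range(value, min_similarity):
    """Hypothetical helper: bounds of the range "around" value.

    Assumes the range is symmetric and inclusive: value - min_similarity
    up to value + min_similarity.
    """
    return (value - min_similarity, value + min_similarity)


# A fuzzy query for 12 with min_similarity 2 would then behave like a range over 10..14.
low, high = numeric_fuzzy_range(12, 2)
print(low, high)
```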

diff --git a/docs/reference/query-dsl/queries/fuzzy-query.asciidoc b/docs/reference/query-dsl/queries/fuzzy-query.asciidoc
index fd08cba81626d..d4b3f9579c6e4 100644
--- a/docs/reference/query-dsl/queries/fuzzy-query.asciidoc
+++ b/docs/reference/query-dsl/queries/fuzzy-query.asciidoc
@@ -2,7 +2,8 @@
 === Fuzzy Query
 
 A fuzzy query that uses similarity based on the Levenshtein (edit distance) algorithm for text,
-and ranges for numeric and date data. This maps to Lucene's `FuzzyQuery` for text.
+and ranges for numeric and date data.
+This maps to Lucene's `FuzzyQuery` when run against a text field, and maps to a range filter when run against a numeric field.
 Maximum edit distance is determined via the `min_similarity` parameter,
 which can only take the values 1 or 2 for text for performance reasons.
 
@@ -88,8 +89,7 @@ with a large number of terms, performance can quickly degrade. The `prefix_lengt
 parameters can be used to remedy performance problems in larger datasets by significantly
 reducing the search space at cost of not matching some valid documents.
 
-The `prefix_length` parameter restricts matches to those that share an exact prefix with the query `value`.
-The number of matching characters is controlled with this parameter.
+The `prefix_length` parameter restricts matches to those that have an exact prefix match of the provided length with the query `value`.
 Using `prefix_length` greatly shortens the search space at the expense of not detecting edits that occur at the start of the term.
 
 The `max_expansions` parameter controls the number of alternate versions of the input term to look for.

From b7b0d1d2ae668a8f6519215ef27ccd9f6a34ad17 Mon Sep 17 00:00:00 2001
From: Andrew Cholakian <andrew@andrewvc.com>
Date: Tue, 17 Dec 2013 07:43:18 -0800
Subject: [PATCH 3/3] Better document fuzziness within match queries

---
 .../query-dsl/queries/match-query.asciidoc        | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)
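
As a concrete picture of what the reworded paragraph describes, here is a sketch of a
`match` query body that sets `fuzziness` together with the `fuzzy` options. The field
name `message` follows the existing example in match-query.asciidoc; the query text and
the option values are illustrative assumptions, not recommended settings.

```python
import json

# Request body for a match query with fuzziness enabled.
# "fuzziness" is the maximum edit distance for string fields (1 or 2 per the docs);
# "prefix_length" and "max_expansions" are passed through to the fuzzy expansion.
body = {
    "match": {
        "message": {
            "query": "this is a tset",  # made-up text containing a typo
            "fuzziness": 1,
            "prefix_length": 1,
            "max_expansions": 50,
        }
    }
}

print(json.dumps(body, indent=4))
```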

diff --git a/docs/reference/query-dsl/queries/match-query.asciidoc b/docs/reference/query-dsl/queries/match-query.asciidoc
index 9a06da5647548..4a3296a1dc270 100644
--- a/docs/reference/query-dsl/queries/match-query.asciidoc
+++ b/docs/reference/query-dsl/queries/match-query.asciidoc
@@ -1,3 +1,4 @@
+
 [[query-dsl-match-query]]
 === Match Query
 
@@ -34,12 +35,14 @@ The `analyzer` can be set to control which analyzer will perform the
 analysis process on the text. It default to the field explicit mapping
 definition, or the default search analyzer.
 
-`fuzziness` can be set to a value (depending on the relevant type, for
-string types it should be a value between `0.0` and `1.0`) to constructs
-fuzzy queries for each term analyzed. The `prefix_length` and
-`max_expansions` can be set in this case to control the fuzzy process.
-Please see the documentation for the <<query-dsl-fuzzy,fuzzy>> query type for
-more information on `max_expansions` and `prefix_length`.
+The `fuzziness` option can be set to search for values slightly different from the query value.
+The `fuzziness` option controls the maximum Levenshtein distance when querying string fields,
+with distances of `1` or `2` being legal. When querying numeric fields, it can take larger values,
+and is internally converted into a range query.
+For more information on fuzzy queries see <<query-dsl-fuzzy-query,fuzzy>>, which documents
+this query type more fully. Note that all the options supported by the `fuzzy` query,
+such as `max_expansions` and `prefix_length`, are supported within a `match` query when
+`fuzziness` is specified.
 
 If the fuzzy option is set the query will use `constant_score_rewrite`
 as its <<query-dsl-multi-term-rewrite,rewrite method>>