Closed
45 changes: 32 additions & 13 deletions docs/reference/query-dsl/queries/fuzzy-query.asciidoc
@@ -1,14 +1,22 @@
[[query-dsl-fuzzy-query]]
=== Fuzzy Query

A fuzzy query that uses similarity based on Levenshtein (edit
distance) algorithm. This maps to Lucene's `FuzzyQuery`.
A fuzzy query that uses similarity based on the Levenshtein (edit distance) algorithm for text,
and ranges for numeric and date data.
This maps to Lucene's `FuzzyQuery` when run against a text field, and maps to a range filter when run against a numeric field.
Maximum edit distance is determined via the `min_similarity` parameter,
which can only take the values 1 or 2 for text for performance reasons.
Member


  • for text?

Contributor Author


ah, I just pushed a clarification about how the query is a FuzzyQuery searched against text, but is actually a range query when run against numeric fields.


Warning: this query is not very scalable with its default prefix length
of 0 - in this case, *every* term will be enumerated and cause an edit
score calculation or `max_expansions` is not set.
Be warned that performance can easily degrade when this query is used on an index with a
large number of terms, especially when a `min_similarity` of 2 is set.
Execution speed can be dramatically improved through the use of the `prefix_length` and
`max_expansions` settings, described later in this document.

Here is a simple example:
Note that `min_similarity` can also take a `float` value, which is
converted to an integer edit distance based on the text's properties, but this usage is deprecated.
Please use only integer values for `min_similarity`.

Here is a simple example of matching text with a `fuzzy` query:

[source,js]
--------------------------------------------------
@@ -17,8 +25,7 @@ Here is a simple example:
}
--------------------------------------------------

More complex settings can be set (the values here are the default
values):
More complex settings can be seen below (the values here are the defaults):

[source,js]
--------------------------------------------------
@@ -27,20 +34,17 @@ values):
"user" : {
"value" : "ki",
"boost" : 1.0,
"min_similarity" : 0.5,
"min_similarity" : 1,
"prefix_length" : 0
}
}
}
--------------------------------------------------

The `max_expansions` parameter (unbounded by default) controls the
number of terms the fuzzy query will expand to.

[float]
==== Numeric / Date Fuzzy

`fuzzy` query on a numeric field will result in a range query "around"
A `fuzzy` query on a numeric field will result in a range query "around"
the value using the `min_similarity` value. For example:

[source,js]
@@ -77,3 +81,18 @@ For example, for dates, a fuzzy factor of "1d" will result in
multiplying whatever fuzzy value provided in the min_similarity by it.
Note, this is explicitly supported since query_string query only allowed
for similarity valued between 0.0 and 1.0.
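
As a rough illustration of the numeric case, the range "around" a value can be sketched as below. This is a minimal sketch, not the actual implementation: the helper name `numericFuzzyRange` is hypothetical, and the symmetric bounds (value minus and plus `min_similarity`) are an assumption about how the range is derived.

```javascript
// Hypothetical helper (not an Elasticsearch API): derive the bounds of the
// range query that a numeric `fuzzy` query is assumed to expand to.
// Assumption: the range is symmetric around `value`.
function numericFuzzyRange(value, minSimilarity) {
  return { from: value - minSimilarity, to: value + minSimilarity };
}

// A fuzzy query for the value 12 with a min_similarity of 2.
const bounds = numericFuzzyRange(12, 2);
console.log(bounds);
```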

==== Performance Tuning

The default settings for this query prefer correct behavior over speed. Given an index
with a large number of terms, performance can quickly degrade. The `prefix_length` and `max_expansions`
parameters can be used to remedy performance problems in larger datasets by significantly
reducing the search space at the cost of not matching some valid documents.

The `prefix_length` parameter restricts matches to those that have an exact prefix match of the provided length with the query `value`.
Using `prefix_length` greatly shortens the search space at the expense of not detecting edits that occur at the start of the term.

The `max_expansions` parameter controls the number of alternate versions of the input term to look for.
When the query is executed, only a set number of permutations (50 by default) are actually matched.
The lower the value of `max_expansions`, the faster the query will be. The trade-off is that some documents that
should match may not be returned because their specific edit is not part of the expansion list.
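
The trade-offs described in this section can be sketched in a few lines of JavaScript. This is a simplified model, not Lucene's `FuzzyQuery`: the helper `expandFuzzy`, its option names, and the sample term list are all hypothetical, and keeping the closest terms first when truncating to `maxExpansions` is an assumption made for the illustration.

```javascript
// Classic dynamic-programming Levenshtein distance
// (insertions, deletions, and substitutions each cost 1).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: expand `value` into matching index terms.
function expandFuzzy(terms, value, options = {}) {
  const { maxEdits = 1, prefixLength = 0, maxExpansions = 50 } = options;
  const prefix = value.slice(0, prefixLength);
  return terms
    .filter((term) => term.startsWith(prefix)) // prefix_length: cheap pre-filter
    .map((term) => ({ term, edits: editDistance(term, value) }))
    .filter((candidate) => candidate.edits <= maxEdits)
    .sort((x, y) => x.edits - y.edits) // assumption: closest terms are kept first
    .slice(0, maxExpansions) // max_expansions: cap the expansion list
    .map((candidate) => candidate.term);
}

// Made-up terms standing in for the contents of an index.
const terms = ["kimchy", "kimchi", "ximchy", "kimchee", "kitchen"];

const noPrefix = expandFuzzy(terms, "kimchy", { maxEdits: 1 });
const withPrefix = expandFuzzy(terms, "kimchy", { maxEdits: 1, prefixLength: 1 });
const capped = expandFuzzy(terms, "kimchy", { maxEdits: 2, maxExpansions: 2 });
console.log({ noPrefix, withPrefix, capped });
```

With a prefix length of 1, the term `ximchy` is no longer returned even though it is a single edit away, because the edit falls inside the required prefix. With the expansion list capped at 2, `kimchee` is dropped although it is within two edits.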
19 changes: 12 additions & 7 deletions docs/reference/query-dsl/queries/match-query.asciidoc
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@

[[query-dsl-match-query]]
=== Match Query

@@ -34,14 +35,18 @@ The `analyzer` can be set to control which analyzer will perform the
analysis process on the text. It default to the field explicit mapping
definition, or the default search analyzer.

`fuzziness` can be set to a value (depending on the relevant type, for
string types it should be a value between `0.0` and `1.0`) to constructs
fuzzy queries for each term analyzed. The `prefix_length` and
`max_expansions` can be set in this case to control the fuzzy process.
The `fuzziness` option can be set to search for values slightly different from the query value.
The fuzziness option controls the maximum Levenshtein distance when querying string fields
with distances of `1` or `2` being legal. When querying numeric fields, it can take larger values,
and is internally converted into a range query.
For more information on fuzzy queries see <<query-dsl-fuzzy-query,fuzzy>>, which documents
this query type more fully. Note that all the options supported by the `fuzzy` query,
such as `max_expansions` and `prefix_length`, are supported within a `match` query when
`fuzziness` is specified.
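
For illustration, a `match` query body combining `fuzziness` with the fuzzy tuning options might be built as in the sketch below. The query text and the chosen values are made up for the example; `message` stands in for the field name.

```javascript
// Illustrative request body only: the query text and option values are
// assumptions chosen for the example, not recommended defaults.
const fuzzyMatch = {
  match: {
    message: {
      query: "this is a tset", // "tset" is one transposition-like typo of "test"
      fuzziness: 1,            // maximum edit distance for string fields (1 or 2)
      prefix_length: 1,        // first character must match exactly
      max_expansions: 50,      // cap on the number of expanded terms
    },
  },
};

// Serialize to the JSON that would be sent as the query body.
console.log(JSON.stringify(fuzzyMatch, null, 2));
```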

If the fuzzy option is set the query will use `constant_score_rewrite`
as its <<query-dsl-multi-term-rewrite,rewrite
method>> the `rewrite` parameter allows to control how the query will get
rewritten.
as its <<query-dsl-multi-term-rewrite,rewrite method>>.
The `rewrite` parameter controls how the query is rewritten.

Here is an example when providing additional parameters (note the slight
change in structure, `message` is the field name):