Remove nGram and edgeNGram token filter names (#38911) #39070

Merged
merged 4 commits into from
Feb 21, 2019
@@ -1,9 +1,9 @@
[[analysis-edgengram-tokenfilter]]
=== Edge NGram Token Filter

A token filter of type `edgeNGram`.
A token filter of type `edge_ngram`.

The following are settings that can be set for a `edgeNGram` token
The following are settings that can be set for a `edge_ngram` token
filter type:

[cols="<,<",options="header",]
@@ -1,9 +1,9 @@
[[analysis-ngram-tokenfilter]]
=== NGram Token Filter

A token filter of type `nGram`.
A token filter of type `ngram`.

The following are settings that can be set for a `nGram` token filter
The following are settings that can be set for a `ngram` token filter
type:

[cols="<,<",options="header",]
11 changes: 10 additions & 1 deletion docs/reference/migration/migrate_7_0/analysis.asciidoc
@@ -38,4 +38,13 @@ The `standard` token filter has been removed because it doesn't change anything
The `standard_html_strip` analyzer has been deprecated, and should be replaced
with a combination of the `standard` tokenizer and `html_strip` char_filter.
Indexes created using this analyzer will still be readable in elasticsearch 7.0,
but it will not be possible to create new indexes using it.
but it will not be possible to create new indexes using it.

[float]
==== The deprecated `nGram` and `edgeNGram` token filter names cannot be used on new indices

The `nGram` and `edgeNGram` token filter names were deprecated in an earlier 6.x version.
Indexes created using these token filters will still be readable in elasticsearch 7.0, but indexing
documents using those filter names will issue a deprecation warning. Using the deprecated names on
new indices created with version 7.0.0 or later is prohibited and will throw an error when indexing
or analyzing documents. The names should be replaced by `ngram` and `edge_ngram`, respectively.
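
The behaviour described in this migration note can be illustrated with a small, self-contained sketch. It is not Elasticsearch code: the class `DeprecatedFilterNames`, the methods `replacementFor` and `resolve`, and the `WARNINGS` list are hypothetical names introduced only for this example. It also assumes that only the major version of the index creation version matters. The message texts are copied from the diff in this pull request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not Elasticsearch code. */
final class DeprecatedFilterNames {

    /** Warnings are collected in a list so the sketch needs no logging framework. */
    static final List<String> WARNINGS = new ArrayList<>();

    /** Maps a deprecated name to its replacement, or returns null if the name is not deprecated. */
    static String replacementFor(String name) {
        switch (name) {
            case "nGram":
                return "ngram";
            case "edgeNGram":
                return "edge_ngram";
            default:
                return null;
        }
    }

    /**
     * Resolves a token filter name for an index created on the given major version
     * (assumption: only the major version matters for this sketch).
     */
    static String resolve(String name, int indexCreatedMajor) {
        String replacement = replacementFor(name);
        if (replacement == null) {
            return name; // not one of the deprecated names
        }
        if (indexCreatedMajor >= 7) {
            // New indices: the deprecated name is rejected outright.
            throw new IllegalArgumentException("The [" + name + "] token filter name was deprecated in 6.4 "
                + "and cannot be used in new indices. Please change the filter name to [" + replacement + "] instead.");
        }
        if (indexCreatedMajor == 6) {
            // Existing 6.x indices: still usable, but a warning is recorded.
            WARNINGS.add("The [" + name + "] token filter name is deprecated and will be removed in a future version. "
                + "Please change the filter name to [" + replacement + "] instead.");
        }
        return replacement;
    }

    public static void main(String[] args) {
        // A 6.x index keeps working; the warning list records the deprecation.
        System.out.println(resolve("edgeNGram", 6));
        System.out.println(WARNINGS);
        // A 7.x index rejects the deprecated name.
        try {
            resolve("edgeNGram", 7);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In practice the fix on the user side is simply to rename the filter type in the index settings before creating new indices on 7.0; the sketch only makes the warn-versus-reject split explicit.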
@@ -415,7 +415,11 @@ public List<PreConfiguredTokenFilter> getPreConfiguredTokenFilters() {
filters.add(PreConfiguredTokenFilter.singleton("edge_ngram", false, false, input ->
new EdgeNGramTokenFilter(input, 1)));
filters.add(PreConfiguredTokenFilter.singletonWithVersion("edgeNGram", false, false, (reader, version) -> {
if (version.onOrAfter(org.elasticsearch.Version.V_6_4_0)) {
if (version.onOrAfter(org.elasticsearch.Version.V_7_0_0)) {
throw new IllegalArgumentException(
"The [edgeNGram] token filter name was deprecated in 6.4 and cannot be used in new indices. "
+ "Please change the filter name to [edge_ngram] instead.");
} else if (version.onOrAfter(org.elasticsearch.Version.V_6_0_0)) {
deprecationLogger.deprecatedAndMaybeLog("edgeNGram_deprecation",
"The [edgeNGram] token filter name is deprecated and will be removed in a future version. "
+ "Please change the filter name to [edge_ngram] instead.");
@@ -439,7 +443,10 @@ public List<PreConfiguredTokenFilter> getPreConfiguredTokenFilters() {
LimitTokenCountFilterFactory.DEFAULT_CONSUME_ALL_TOKENS)));
filters.add(PreConfiguredTokenFilter.singleton("ngram", false, false, reader -> new NGramTokenFilter(reader, 1, 2, false)));
filters.add(PreConfiguredTokenFilter.singletonWithVersion("nGram", false, false, (reader, version) -> {
if (version.onOrAfter(org.elasticsearch.Version.V_6_4_0)) {
if (version.onOrAfter(org.elasticsearch.Version.V_7_0_0)) {
throw new IllegalArgumentException("The [nGram] token filter name was deprecated in 6.4 and cannot be used in new indices. "
+ "Please change the filter name to [ngram] instead.");
} else if (version.onOrAfter(org.elasticsearch.Version.V_6_0_0)) {
deprecationLogger.deprecatedAndMaybeLog("nGram_deprecation",
"The [nGram] token filter name is deprecated and will be removed in a future version. "
+ "Please change the filter name to [ngram] instead.");
@@ -41,11 +41,12 @@
public class CommonAnalysisPluginTests extends ESTestCase {

/**
* Check that the deprecated name "nGram" issues a deprecation warning for indices created since 6.3.0
* Check that the deprecated name "nGram" issues a deprecation warning for indices created since 6.0.0
*/
public void testNGramDeprecationWarning() throws IOException {
Settings settings = Settings.builder().put(Environment.PATH_HOME_SETTING.getKey(), createTempDir())
.put(IndexMetaData.SETTING_VERSION_CREATED, VersionUtils.randomVersionBetween(random(), Version.V_6_4_0, Version.CURRENT))
.put(IndexMetaData.SETTING_VERSION_CREATED,
VersionUtils.randomVersionBetween(random(), Version.V_6_0_0, VersionUtils.getPreviousVersion(Version.V_7_0_0)))
.build();

IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("index", settings);
@@ -62,12 +63,11 @@ public void testNGramDeprecationWarning() throws IOException {
}

/**
* Check that the deprecated name "nGram" does NOT issues a deprecation warning for indices created before 6.4.0
* Check that the deprecated name "nGram" throws an error since 7.0.0
*/
public void testNGramNoDeprecationWarningPre6_4() throws IOException {
public void testNGramDeprecationError() throws IOException {
Settings settings = Settings.builder().put(Environment.PATH_HOME_SETTING.getKey(), createTempDir())
.put(IndexMetaData.SETTING_VERSION_CREATED,
VersionUtils.randomVersionBetween(random(), Version.V_6_0_0, Version.V_6_3_0))
.put(IndexMetaData.SETTING_VERSION_CREATED, VersionUtils.randomVersionBetween(random(), Version.V_7_0_0, null))
.build();

IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("index", settings);
@@ -76,16 +76,21 @@ public void testNGramNoDeprecationWarningPre6_4() throws IOException {
TokenFilterFactory tokenFilterFactory = tokenFilters.get("nGram");
Tokenizer tokenizer = new MockTokenizer();
tokenizer.setReader(new StringReader("foo bar"));
assertNotNull(tokenFilterFactory.create(tokenizer));
IllegalArgumentException ex = expectThrows(IllegalArgumentException.class, () -> tokenFilterFactory.create(tokenizer));
assertEquals(
"The [nGram] token filter name was deprecated in 6.4 and cannot be used in new indices. Please change the filter"
+ " name to [ngram] instead.",
ex.getMessage());
}
}

/**
* Check that the deprecated name "edgeNGram" issues a deprecation warning for indices created since 6.3.0
* Check that the deprecated name "edgeNGram" issues a deprecation warning for indices created since 6.0.0
*/
public void testEdgeNGramDeprecationWarning() throws IOException {
Settings settings = Settings.builder().put(Environment.PATH_HOME_SETTING.getKey(), createTempDir())
.put(IndexMetaData.SETTING_VERSION_CREATED, VersionUtils.randomVersionBetween(random(), Version.V_6_4_0, Version.CURRENT))
.put(IndexMetaData.SETTING_VERSION_CREATED,
VersionUtils.randomVersionBetween(random(), Version.V_6_4_0, VersionUtils.getPreviousVersion(Version.V_7_0_0)))
.build();

IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("index", settings);
@@ -102,12 +107,11 @@ public void testEdgeNGramDeprecationWarning() throws IOException {
}

/**
* Check that the deprecated name "edgeNGram" does NOT issues a deprecation warning for indices created before 6.4.0
* Check that the deprecated name "edgeNGram" throws an error for indices created since 7.0.0
*/
public void testEdgeNGramNoDeprecationWarningPre6_4() throws IOException {
public void testEdgeNGramDeprecationError() throws IOException {
Settings settings = Settings.builder().put(Environment.PATH_HOME_SETTING.getKey(), createTempDir())
.put(IndexMetaData.SETTING_VERSION_CREATED,
VersionUtils.randomVersionBetween(random(), Version.V_6_0_0, Version.V_6_3_0))
.put(IndexMetaData.SETTING_VERSION_CREATED, VersionUtils.randomVersionBetween(random(), Version.V_7_0_0, null))
.build();

IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("index", settings);
@@ -116,11 +120,14 @@ public void testEdgeNGramNoDeprecationWarningPre6_4() throws IOException {
TokenFilterFactory tokenFilterFactory = tokenFilters.get("edgeNGram");
Tokenizer tokenizer = new MockTokenizer();
tokenizer.setReader(new StringReader("foo bar"));
assertNotNull(tokenFilterFactory.create(tokenizer));
IllegalArgumentException ex = expectThrows(IllegalArgumentException.class, () -> tokenFilterFactory.create(tokenizer));
assertEquals(
"The [edgeNGram] token filter name was deprecated in 6.4 and cannot be used in new indices. Please change the filter"
+ " name to [edge_ngram] instead.",
ex.getMessage());
}
}


/**
* Check that the deprecated analyzer name "standard_html_strip" throws exception for indices created since 7.0.0
*/
@@ -81,7 +81,7 @@ public void testNgramHighlightingWithBrokenPositions() throws IOException {
.put("analysis.tokenizer.autocomplete.max_gram", 20)
.put("analysis.tokenizer.autocomplete.min_gram", 1)
.put("analysis.tokenizer.autocomplete.token_chars", "letter,digit")
.put("analysis.tokenizer.autocomplete.type", "nGram")
.put("analysis.tokenizer.autocomplete.type", "ngram")
.put("analysis.filter.wordDelimiter.type", "word_delimiter")
.putList("analysis.filter.wordDelimiter.type_table",
"& => ALPHANUM", "| => ALPHANUM", "! => ALPHANUM",
@@ -23,14 +23,14 @@
- match: { detail.tokenizer.tokens.0.token: Foo Bar! }

---
"nGram":
"ngram":
- do:
indices.analyze:
body:
text: good
explain: true
tokenizer:
type: nGram
type: ngram
min_gram: 2
max_gram: 2
- length: { detail.tokenizer.tokens: 3 }
@@ -40,7 +40,7 @@
- match: { detail.tokenizer.tokens.2.token: od }

---
"nGram_exception":
"ngram_exception":
- skip:
version: " - 6.99.99"
reason: only starting from version 7.x this throws an error
@@ -51,7 +51,7 @@
text: good
explain: true
tokenizer:
type: nGram
type: ngram
min_gram: 2
max_gram: 4
---
@@ -133,7 +133,7 @@
text: "foobar"
explain: true
tokenizer:
type: nGram
type: ngram
min_gram: 3
max_gram: 3
- length: { detail.tokenizer.tokens: 4 }
@@ -162,9 +162,9 @@
body:
text: "foo"
explain: true
tokenizer: nGram
tokenizer: ngram
- length: { detail.tokenizer.tokens: 5 }
- match: { detail.tokenizer.name: nGram }
- match: { detail.tokenizer.name: ngram }
- match: { detail.tokenizer.tokens.0.token: f }
- match: { detail.tokenizer.tokens.1.token: fo }
- match: { detail.tokenizer.tokens.2.token: o }
@@ -194,7 +194,7 @@
text: "foo"
explain: true
tokenizer:
type: edgeNGram
type: edge_ngram
min_gram: 1
max_gram: 3
- length: { detail.tokenizer.tokens: 3 }
@@ -219,9 +219,9 @@
body:
text: "foo"
explain: true
tokenizer: edgeNGram
tokenizer: edge_ngram
- length: { detail.tokenizer.tokens: 2 }
- match: { detail.tokenizer.name: edgeNGram }
- match: { detail.tokenizer.name: edge_ngram }
- match: { detail.tokenizer.tokens.0.token: f }
- match: { detail.tokenizer.tokens.1.token: fo }

@@ -76,7 +76,7 @@
analysis:
tokenizer:
trigram:
type: nGram
type: ngram
min_gram: 3
max_gram: 3
filter: