Remove `nGram` and `edgeNGram` token filter names #38911

cbuescher · 2019-02-14T17:08:16Z

In #30209 we deprecated the camel case `nGram` filter name in favour of `ngram` and
did the same for `edgeNGram` and `edge_ngram`. These names have been deprecated
since 6.4 and have issued deprecation warnings since then.
I think we can remove these filter names in 8.0. In a backport of this PR I would change what has been a
deprecation warning since 6.4 to an error, starting with new indices created in 7.0.

elasticmachine · 2019-02-14T17:08:18Z

Pinging @elastic/es-search

cbuescher · 2019-02-14T17:09:00Z

docs/reference/analysis/tokenfilters/edgengram-tokenfilter.asciidoc

@@ -1,9 +1,9 @@
 [[analysis-edgengram-tokenfilter]]
 === Edge NGram Token Filter

-A token filter of type `edgeNGram`.
+A token filter of type `edge_ngram`.


These doc changes should be backported to 7.0 as well.

cbuescher · 2019-02-14T17:10:19Z

...is-common/src/test/java/org/elasticsearch/analysis/common/HighlighterWithAnalyzersTests.java

@@ -81,7 +81,7 @@ public void testNgramHighlightingWithBrokenPositions() throws IOException {
                        .put("analysis.tokenizer.autocomplete.max_gram", 20)
                        .put("analysis.tokenizer.autocomplete.min_gram", 1)
                        .put("analysis.tokenizer.autocomplete.token_chars", "letter,digit")
-                        .put("analysis.tokenizer.autocomplete.type", "nGram")
+                        .put("analysis.tokenizer.autocomplete.type", "ngram")


This is about the tokenizer, not the filter, but I think we also shouldn't use the camel case version of that one anymore.

cbuescher · 2019-02-14T17:11:08Z

modules/analysis-common/src/test/resources/rest-api-spec/test/analysis-common/30_tokenizers.yml

@@ -133,7 +101,7 @@
            text: "foobar"
            explain: true
            tokenizer:
-              type: nGram
+              type: ngram


Same here: this is about the tokenizer, but we shouldn't use the camel case name here regardless.

jimczi

LGTM, can you add an entry in the breaking changes (you'll need to create one for 8.0)?

cbuescher · 2019-02-15T16:08:58Z

> LGTM, can you add an entry in the breaking changes (you'll need to create one for 8.0)?

Will do. I was wondering if we also need to start throwing errors starting with 7.0 when token filters are used via "nGram" or "edgeNGram" in new indices, so essentially throwing an error where we have issued a deprecation warning since 6.4, or if we can remove the filter names in 8.0 purely on the basis of having deprecated them in 6.4. The problem with token filters is that, AFAIK, we currently cannot easily throw errors at index creation, only when the filter is actually used (e.g. in an index or analyze operation). Wdyt?
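The timing concern in this comment can be illustrated with a small sketch. This is not Elasticsearch code: `LazyTokenFilterFactory` and its `create()` method are hypothetical names, and the only thing taken from the comment is the assumption that the name check runs when a filter is first requested, not when the index settings are registered.

```java
import java.util.Set;

/** Hypothetical factory; it only illustrates use-time validation. */
final class LazyTokenFilterFactory {
    // The two camel case names discussed in this PR.
    private static final Set<String> REMOVED_NAMES = Set.of("nGram", "edgeNGram");

    private final String typeName;

    // Registering the factory (think: index creation) performs no name check.
    LazyTokenFilterFactory(String typeName) {
        this.typeName = typeName;
    }

    // The check only runs here, i.e. on the first index or analyze operation.
    String create() {
        if (REMOVED_NAMES.contains(typeName)) {
            throw new IllegalArgumentException(
                "token filter name [" + typeName + "] is no longer supported");
        }
        // Stand-in for a real filter instance.
        return "filter:" + typeName;
    }
}
```

Under this assumption, constructing the factory with `nGram` succeeds and the failure only surfaces on the first `create()` call, which is the behaviour the comment describes as hard to avoid.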

jimczi · 2019-02-15T16:19:47Z

> I was wondering if we also need to start throwing errors starting with 7.0 when token filters are used via "nGram" or "edgeNGram" in new indices, so essentially throwing an error

+1, we should throw an error if the deprecated name is used on an index created in 7.0+.
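The rule agreed on here (deprecation warning for older indices, error for indices created in 7.0+) might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual change: the class `DeprecatedFilterNames`, the `resolve` method, the integer major version parameter, and the message wording are all hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of Elasticsearch. */
final class DeprecatedFilterNames {
    // Deprecated camel case name -> preferred name, as listed in this PR.
    static final Map<String, String> REPLACEMENTS = Map.of(
        "nGram", "ngram",
        "edgeNGram", "edge_ngram");

    private DeprecatedFilterNames() {}

    /**
     * Resolves a token filter type name for an index created on the given
     * major version. Deprecated names are rejected for 7.0+ indices and
     * mapped to their replacement, with a warning, for older indices.
     */
    static String resolve(String name, int indexCreatedMajor, List<String> warnings) {
        String replacement = REPLACEMENTS.get(name);
        if (replacement == null) {
            return name; // not a deprecated name, nothing to do
        }
        if (indexCreatedMajor >= 7) {
            // Illustrative wording only.
            throw new IllegalArgumentException(
                "The [" + name + "] token filter name is deprecated, use ["
                    + replacement + "] instead.");
        }
        warnings.add("The [" + name + "] token filter name is deprecated, use ["
            + replacement + "] instead.");
        return replacement;
    }
}
```

Keying the decision on the version the index was created with, rather than the node version, is what lets existing 6.x indices keep working after an upgrade while new indices are forced onto the new names.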

In #30209 we deprecated the camel case `nGram` filter name in favour of `ngram` and did the same for `edgeNGram` and `edge_ngram`; those names are being removed in 8.0. This change disallows the deprecated names for new indices created in 7.0 by throwing an error when these filters are used. Relates to #38911

Christoph Büscher added 3 commits February 14, 2019 17:47

Remove deprecated nGram alias

d30c4c8

Remove deprecated edgeNGram alias

4db6cd4

Don't remove tokenizers just yet

e5fe2c3

cbuescher added the :Search Relevance/Analysis, >refactoring and v8.0.0 labels Feb 14, 2019

cbuescher commented Feb 14, 2019

Merge branch 'master' into remove-nGram-edgeNGram

fd9a314

jimczi approved these changes Feb 15, 2019

Adding removal of deprecated token filters to new 8.0 migration doc

b540986

cbuescher merged commit 7bb2da1 into elastic:master Feb 15, 2019

cbuescher added the backport pending label Feb 15, 2019

This was referenced Feb 18, 2019

Remove nGram and edgeNGram token filter names (#38911) #39070

Merged

Move some token filter migration notes #39072

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

cbuescher removed the backport pending label Feb 2, 2022
