
Unclear error message when using custom "synonym" with Analyze API #23943

Closed
tsouza opened this issue Apr 6, 2017 · 8 comments
Labels
>enhancement help wanted adoptme :Search/Analysis How text is split into tokens

Comments

@tsouza

tsouza commented Apr 6, 2017

According to the synonym filter's code comment:

/*
 * synonym and synonym_graph are different than everything else since they need access to the tokenizer factories for the index.
 * instead of building the infrastructure for plugins we rather make it a real exception to not pollute the general interface and
 * hide internal data-structures as much as possible.
*/

Because of this limitation, if you try to use it as follows (which is valid syntax, since the Analyze API docs show the same pattern, just with the stop filter instead):

GET _analyze
{
  "tokenizer": "standard",
  "filter": [{
    "type": "synonym",
    "synonyms": ["test,testing"]
  }],
  "text": "this is a test"
}

You will get the following error: failed to find global token filter under [synonym]

This is misleading, since it is the same error you get when you use an unknown token filter name. For newcomers, especially those who have just learned about the Analyze API, it is confusing: synonym should be a valid token filter name, and the docs do not mention this particular limitation of the synonym token filter.

Moreover, IMO synonym should be supported by the /_analyze API anyway, if possible. Since it only works in the context of an index, having to create a new test index every time you change the synonym configuration is time-consuming compared to simply using the /_analyze API; a sketch of the current workaround follows.
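For illustration, a minimal sketch of that workaround (the index name my_synonym_test and the filter name my_synonyms are just placeholders):

PUT my_synonym_test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": ["test,testing"]
        }
      }
    }
  }
}

GET my_synonym_test/_analyze
{
  "tokenizer": "standard",
  "filter": ["my_synonyms"],
  "text": "this is a test"
}

Every change to the synonym list then means recreating (or reconfiguring) the index before re-running the analysis.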

@cbuescher cbuescher added the :Search/Analysis How text is split into tokens label Apr 6, 2017
@javanna javanna removed the discuss label Apr 7, 2017
@javanna
Member

javanna commented Apr 7, 2017

We discussed this as part of FixItFriday and agreed that the error message is odd. We were also wondering why an index is required and whether we can remove that requirement. I could reproduce this against 5.3.0.

@javanna javanna added good first issue low hanging fruit help wanted adoptme labels Apr 7, 2017
@tsouza
Author

tsouza commented Apr 7, 2017

Removing the index requirement would be a big win!

@s1monw
Contributor

s1monw commented Apr 7, 2017

I wonder if we should just special-case it here too. It's really a cyclic dependency, and I think this could fix it without too much trouble. I agree it's a bit of a hack:

diff --git a/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java b/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java
index d7e299b1cf..37a4bf1788 100644
--- a/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java
+++ b/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java
@@ -50,6 +50,8 @@ import org.elasticsearch.index.analysis.CharFilterFactory;
 import org.elasticsearch.index.analysis.CustomAnalyzer;
 import org.elasticsearch.index.analysis.IndexAnalyzers;
 import org.elasticsearch.index.analysis.NamedAnalyzer;
+import org.elasticsearch.index.analysis.SynonymGraphTokenFilterFactory;
+import org.elasticsearch.index.analysis.SynonymTokenFilterFactory;
 import org.elasticsearch.index.analysis.TokenFilterFactory;
 import org.elasticsearch.index.analysis.TokenizerFactory;
 import org.elasticsearch.index.mapper.AllFieldMapper;
@@ -505,7 +507,7 @@ public class TransportAnalyzeAction extends TransportSingleShardAction<AnalyzeRe
         return charFilterFactories;
     }
 
-    private static TokenFilterFactory[] getTokenFilterFactories(AnalyzeRequest request, IndexSettings indexSettings, AnalysisRegistry analysisRegistry,
+    private static TokenFilterFactory[] getTokenFilterFactories(AnalyzeRequest request, IndexSettings indexSettings, final AnalysisRegistry analysisRegistry,
                                                                 Environment environment, TokenFilterFactory[] tokenFilterFactories) throws IOException {
         if (request.tokenFilters() != null && request.tokenFilters().size() > 0) {
             tokenFilterFactories = new TokenFilterFactory[request.tokenFilters().size()];
@@ -521,7 +523,16 @@ public class TransportAnalyzeAction extends TransportSingleShardAction<AnalyzeRe
                     AnalysisModule.AnalysisProvider<TokenFilterFactory> tokenFilterFactoryFactory =
                         analysisRegistry.getTokenFilterProvider(filterTypeName);
                     if (tokenFilterFactoryFactory == null) {
-                        throw new IllegalArgumentException("failed to find global token filter under [" + filterTypeName + "]");
+                        switch(filterTypeName) {
+                            case "synonym":
+                                tokenFilterFactoryFactory = (is, env, name, s) -> new SynonymTokenFilterFactory(is, env, analysisRegistry, name, s);
+                                break;
+                            case "synonym_graph":
+                                tokenFilterFactoryFactory = (is, env, name, s) -> new SynonymGraphTokenFilterFactory(is, env, analysisRegistry, name, s);
+                                break;
+                            default:
+                                throw new IllegalArgumentException("failed to find global token filter under [" + filterTypeName + "]");
+                        }
                     }
                     // Need to set anonymous "name" of tokenfilter
                     tokenFilterFactories[i] = tokenFilterFactoryFactory.get(getNaIndexSettings(settings), environment, "_anonymous_tokenfilter_[" + i + "]", settings);
diff --git a/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java b/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java
index 61950942e6..ba1df40a7e 100644
--- a/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java
+++ b/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java
@@ -128,6 +128,8 @@ import org.elasticsearch.index.analysis.StemmerTokenFilterFactory;
 import org.elasticsearch.index.analysis.StopAnalyzerProvider;
 import org.elasticsearch.index.analysis.StopTokenFilterFactory;
 import org.elasticsearch.index.analysis.SwedishAnalyzerProvider;
+import org.elasticsearch.index.analysis.SynonymGraphTokenFilterFactory;
+import org.elasticsearch.index.analysis.SynonymTokenFilterFactory;
 import org.elasticsearch.index.analysis.ThaiAnalyzerProvider;
 import org.elasticsearch.index.analysis.ThaiTokenizerFactory;
 import org.elasticsearch.index.analysis.TokenFilterFactory;
diff --git a/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java b/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java
index bcd7bba8d3..3423eb5329 100644
--- a/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java
+++ b/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java
@@ -35,6 +35,8 @@ import org.elasticsearch.test.ESTestCase;
 import org.elasticsearch.test.IndexSettingsModule;
 
 import java.io.IOException;
+import java.util.Arrays;
+import java.util.HashMap;
 import java.util.List;
 
 import static java.util.Collections.emptyList;
@@ -108,6 +110,26 @@ public class TransportAnalyzeActionTests extends ESTestCase {
         assertEquals("ck", tokens.get(3).getTerm());
         assertEquals("brown", tokens.get(4).getTerm());
         assertEquals("fox", tokens.get(5).getTerm());
+
+
+        request = new AnalyzeRequest();
+        request.analyzer(null);
+        request.tokenizer("standard");
+        request.addCharFilter("html_strip");
+        HashMap<String, Object> filter = new HashMap<>();
+        filter.put("type", "synonym");
+        filter.put("synonyms", Arrays.asList("test,testing"));
+        request.addTokenFilter(filter);
+        request.text("<p>this is a test</p>");
+        analyze = TransportAnalyzeAction.analyze(request, AllFieldMapper.NAME, null, randomBoolean() ? indexAnalyzers : null,
+            registry, environment);
+        tokens = analyze.getTokens();
+        assertEquals(5, tokens.size());
+        assertEquals("this", tokens.get(0).getTerm());
+        assertEquals("is", tokens.get(1).getTerm());
+        assertEquals("a", tokens.get(2).getTerm());
+        assertEquals("test", tokens.get(3).getTerm());
+        assertEquals("testing", tokens.get(4).getTerm());
     }
 
     public void testFillsAttributes() throws IOException {
@@ -190,6 +212,7 @@ public class TransportAnalyzeActionTests extends ESTestCase {
         assertEquals("hay", tokens.get(1).getTerm());
     }
 
+
     public void testGetIndexAnalyserWithoutIndexAnalyzers() throws IOException {
         IllegalArgumentException e = expectThrows(IllegalArgumentException.class,
             () -> TransportAnalyzeAction.analyze(

@s1monw s1monw removed the good first issue low hanging fruit label Apr 7, 2017
@mr-mos

mr-mos commented Nov 21, 2017

+1 — the synonym filter should work with the _analyze endpoint.
Any plans to fix this for ES 6.x?

@mr-mos

mr-mos commented Jan 24, 2018

@javanna Any news on this?

@romseygeek
Contributor

cc @elastic/es-search-aggs

@romseygeek
Contributor

This ought to work properly in Elasticsearch 7.0, after #34034.

@romseygeek
Contributor

Confirmed that #34034 fixes this, so I'm closing.
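For reference, after that change the request from the issue description should work directly against the cluster-level _analyze endpoint, without any index. A rough sketch of the expected outcome, based on the test case in the patch above (token metadata such as offsets and positions omitted):

GET _analyze
{
  "tokenizer": "standard",
  "filter": [{
    "type": "synonym",
    "synonyms": ["test,testing"]
  }],
  "text": "this is a test"
}

Expected tokens: this, is, a, test, testing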
