
Unclear error message when using custom "synonym" with Analyze API #23943

Closed
tsouza opened this issue Apr 6, 2017 · 8 comments
Labels
>enhancement help wanted adoptme :Search/Analysis How text is split into tokens

Comments

@tsouza

tsouza commented Apr 6, 2017

According to the synonym filter's code comment:

/*
 * synonym and synonym_graph are different than everything else since they need access to the tokenizer factories for the index.
 * instead of building the infrastructure for plugins we rather make it a real exception to not pollute the general interface and
 * hide internal data-structures as much as possible.
*/

Because of this limitation, if you try to use it as follows (which is valid syntax, since the Analyze API docs show the same pattern, just with the stop filter instead):

GET _analyze
{
  "tokenizer": "standard",
  "filter": [{
    "type": "synonym",
    "synonyms": ["test,testing"]
  }],
  "text": "this is a test"
}

You will get the following error: failed to find global token filter under [synonym]

This is misleading, since it is the same error you get when you use an unknown token filter name. For newcomers, especially those who have just learned about the Analyze API, it is confusing: synonym should be a valid token filter name, and the docs do not mention this particular limitation of the synonym token filter.

Moreover, IMO synonym should be supported by the /_analyze API anyway, if possible. Since it only works in the context of an index, having to create a new test index every time you change the synonym configuration is time-consuming compared to simply using the /_analyze API; a sketch of the current workaround follows.
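For illustration, a minimal sketch of that workaround (the index name my_synonym_test and the filter name my_synonyms are just placeholders):

PUT my_synonym_test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": ["test,testing"]
        }
      }
    }
  }
}

GET my_synonym_test/_analyze
{
  "tokenizer": "standard",
  "filter": ["my_synonyms"],
  "text": "this is a test"
}

Every change to the synonym list then means recreating (or reconfiguring) the index before re-running the analysis.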

@cbuescher cbuescher added the :Search/Analysis How text is split into tokens label Apr 6, 2017
@javanna javanna removed the discuss label Apr 7, 2017
@javanna
Member

javanna commented Apr 7, 2017

We discussed this as part of FixItFriday and agreed that the error message is odd. We were also wondering why an index is required and whether we can remove that requirement. I could reproduce this against 5.3.0.

@javanna javanna added good first issue low hanging fruit help wanted adoptme labels Apr 7, 2017
@tsouza
Author

tsouza commented Apr 7, 2017

Removing the index requirement would be a big win!

@s1monw
Contributor

s1monw commented Apr 7, 2017

I wonder if we should just special-case it here too. It's really a cyclic dependency, and I think this could fix it without too much trouble. I agree it's a bit of a hack:

diff --git a/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java b/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java
index d7e299b1cf..37a4bf1788 100644
--- a/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java
+++ b/core/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java
@@ -50,6 +50,8 @@ import org.elasticsearch.index.analysis.CharFilterFactory;
 import org.elasticsearch.index.analysis.CustomAnalyzer;
 import org.elasticsearch.index.analysis.IndexAnalyzers;
 import org.elasticsearch.index.analysis.NamedAnalyzer;
+import org.elasticsearch.index.analysis.SynonymGraphTokenFilterFactory;
+import org.elasticsearch.index.analysis.SynonymTokenFilterFactory;
 import org.elasticsearch.index.analysis.TokenFilterFactory;
 import org.elasticsearch.index.analysis.TokenizerFactory;
 import org.elasticsearch.index.mapper.AllFieldMapper;
@@ -505,7 +507,7 @@ public class TransportAnalyzeAction extends TransportSingleShardAction<AnalyzeRe
         return charFilterFactories;
     }
 
-    private static TokenFilterFactory[] getTokenFilterFactories(AnalyzeRequest request, IndexSettings indexSettings, AnalysisRegistry analysisRegistry,
+    private static TokenFilterFactory[] getTokenFilterFactories(AnalyzeRequest request, IndexSettings indexSettings, final AnalysisRegistry analysisRegistry,
                                                                 Environment environment, TokenFilterFactory[] tokenFilterFactories) throws IOException {
         if (request.tokenFilters() != null && request.tokenFilters().size() > 0) {
             tokenFilterFactories = new TokenFilterFactory[request.tokenFilters().size()];
@@ -521,7 +523,16 @@ public class TransportAnalyzeAction extends TransportSingleShardAction<AnalyzeRe
                     AnalysisModule.AnalysisProvider<TokenFilterFactory> tokenFilterFactoryFactory =
                         analysisRegistry.getTokenFilterProvider(filterTypeName);
                     if (tokenFilterFactoryFactory == null) {
-                        throw new IllegalArgumentException("failed to find global token filter under [" + filterTypeName + "]");
+                        switch(filterTypeName) {
+                            case "synonym":
+                                tokenFilterFactoryFactory = (is, env, name, s) -> new SynonymTokenFilterFactory(is, env, analysisRegistry, name, s);
+                                break;
+                            case "synonym_graph":
+                                tokenFilterFactoryFactory = (is, env, name, s) -> new SynonymGraphTokenFilterFactory(is, env, analysisRegistry, name, s);
+                                break;
+                            default:
+                                throw new IllegalArgumentException("failed to find global token filter under [" + filterTypeName + "]");
+                        }
                     }
                     // Need to set anonymous "name" of tokenfilter
                     tokenFilterFactories[i] = tokenFilterFactoryFactory.get(getNaIndexSettings(settings), environment, "_anonymous_tokenfilter_[" + i + "]", settings);
diff --git a/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java b/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java
index 61950942e6..ba1df40a7e 100644
--- a/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java
+++ b/core/src/main/java/org/elasticsearch/indices/analysis/AnalysisModule.java
@@ -128,6 +128,8 @@ import org.elasticsearch.index.analysis.StemmerTokenFilterFactory;
 import org.elasticsearch.index.analysis.StopAnalyzerProvider;
 import org.elasticsearch.index.analysis.StopTokenFilterFactory;
 import org.elasticsearch.index.analysis.SwedishAnalyzerProvider;
+import org.elasticsearch.index.analysis.SynonymGraphTokenFilterFactory;
+import org.elasticsearch.index.analysis.SynonymTokenFilterFactory;
 import org.elasticsearch.index.analysis.ThaiAnalyzerProvider;
 import org.elasticsearch.index.analysis.ThaiTokenizerFactory;
 import org.elasticsearch.index.analysis.TokenFilterFactory;
diff --git a/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java b/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java
index bcd7bba8d3..3423eb5329 100644
--- a/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java
+++ b/core/src/test/java/org/elasticsearch/action/admin/indices/TransportAnalyzeActionTests.java
@@ -35,6 +35,8 @@ import org.elasticsearch.test.ESTestCase;
 import org.elasticsearch.test.IndexSettingsModule;
 
 import java.io.IOException;
+import java.util.Arrays;
+import java.util.HashMap;
 import java.util.List;
 
 import static java.util.Collections.emptyList;
@@ -108,6 +110,26 @@ public class TransportAnalyzeActionTests extends ESTestCase {
         assertEquals("ck", tokens.get(3).getTerm());
         assertEquals("brown", tokens.get(4).getTerm());
         assertEquals("fox", tokens.get(5).getTerm());
+
+
+        request = new AnalyzeRequest();
+        request.analyzer(null);
+        request.tokenizer("standard");
+        request.addCharFilter("html_strip");
+        HashMap<String, Object> filter = new HashMap<>();
+        filter.put("type", "synonym");
+        filter.put("synonyms", Arrays.asList("test,testing"));
+        request.addTokenFilter(filter);
+        request.text("<p>this is a test</p>");
+        analyze = TransportAnalyzeAction.analyze(request, AllFieldMapper.NAME, null, randomBoolean() ? indexAnalyzers : null,
+            registry, environment);
+        tokens = analyze.getTokens();
+        assertEquals(5, tokens.size());
+        assertEquals("this", tokens.get(0).getTerm());
+        assertEquals("is", tokens.get(1).getTerm());
+        assertEquals("a", tokens.get(2).getTerm());
+        assertEquals("test", tokens.get(3).getTerm());
+        assertEquals("testing", tokens.get(4).getTerm());
     }
 
     public void testFillsAttributes() throws IOException {
@@ -190,6 +212,7 @@ public class TransportAnalyzeActionTests extends ESTestCase {
         assertEquals("hay", tokens.get(1).getTerm());
     }
 
+
     public void testGetIndexAnalyserWithoutIndexAnalyzers() throws IOException {
         IllegalArgumentException e = expectThrows(IllegalArgumentException.class,
             () -> TransportAnalyzeAction.analyze(

@s1monw s1monw removed the good first issue low hanging fruit label Apr 7, 2017
@mr-mos

mr-mos commented Nov 21, 2017

+1 — the synonym filter should work with the _analyze endpoint.
Any plans to fix this for ES 6.x?

@mr-mos

mr-mos commented Jan 24, 2018

@javanna Any news on this?

@romseygeek
Contributor

cc @elastic/es-search-aggs

@romseygeek
Contributor

This ought to work properly in Elasticsearch 7.0, after #34034.

@romseygeek
Contributor

Confirmed that #34034 fixes this, so I'm closing.
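For reference, after that change the request from the issue description should work directly against the cluster-level _analyze endpoint, without any index. A rough sketch of the expected outcome, based on the test case in the patch above (token metadata such as offsets and positions omitted):

GET _analyze
{
  "tokenizer": "standard",
  "filter": [{
    "type": "synonym",
    "synonyms": ["test,testing"]
  }],
  "text": "this is a test"
}

Expected tokens: this, is, a, test, testing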
