When I tried to create an index from documents with the above analyzer, the following error appeared.
/var/log/elasticearch/searcher.log
```
[2019-09-21T16:05:10,498][DEBUG][o.e.a.a.i.t.p.TransportPutIndexTemplateAction] [n01] failed to put template [products_template]
java.lang.IllegalArgumentException: failed to build synonyms
at org.elasticsearch.analysis.common.SynonymTokenFilterFactory.buildSynonyms(SynonymTokenFilterFactory.java:138) ~[?:?]
at org.elasticsearch.analysis.common.SynonymTokenFilterFactory.getChainAwareTokenFilterFactory(SynonymTokenFilterFactory.java:90) ~[?:?]
at org.elasticsearch.index.analysis.AnalyzerComponents.createComponents(AnalyzerComponents.java:84) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.index.analysis.CustomAnalyzerProvider.create(CustomAnalyzerProvider.java:63) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.index.analysis.CustomAnalyzerProvider.build(CustomAnalyzerProvider.java:50) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.index.analysis.AnalysisRegistry.produceAnalyzer(AnalysisRegistry.java:584) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:534) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:216) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.index.IndexService.<init>(IndexService.java:180) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:411) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:563) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:512) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.metadata.MetaDataIndexTemplateService.validateAndAddTemplate(MetaDataIndexTemplateService.java:235) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.metadata.MetaDataIndexTemplateService.access$300(MetaDataIndexTemplateService.java:65) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.metadata.MetaDataIndexTemplateService$2.execute(MetaDataIndexTemplateService.java:176) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:687) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:210) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.3.2.jar:7.3.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.text.ParseException: Invalid synonym rule at line 4
at org.apache.lucene.analysis.synonym.SynonymMap$Parser.analyze(SynonymMap.java:325) ~[lucene-analyzers-common-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:35:41]
at org.elasticsearch.analysis.common.ESSolrSynonymParser.analyze(ESSolrSynonymParser.java:57) ~[?:?]
at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:114) ~[lucene-analyzers-common-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:35:41]
at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:35:41]
at org.elasticsearch.analysis.common.SynonymTokenFilterFactory.buildSynonyms(SynonymTokenFilterFactory.java:134) ~[?:?]
... 27 more
Caused by: java.lang.IllegalArgumentException: term: 焼餃子 analyzed to a token (焼餃子) with position increment != 1 (got: 0)
at org.apache.lucene.analysis.synonym.SynonymMap$Parser.analyze(SynonymMap.java:325) ~[lucene-analyzers-common-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:35:41]
at org.elasticsearch.analysis.common.ESSolrSynonymParser.analyze(ESSolrSynonymParser.java:57) ~[?:?]
at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:114) ~[lucene-analyzers-common-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:35:41]
at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:35:41]
at org.elasticsearch.analysis.common.SynonymTokenFilterFactory.buildSynonyms(SynonymTokenFilterFactory.java:134) ~[?:?]
... 27 more
```
I researched this error and understand the following points.
1. kuromoji_tokenizer's search mode outputs three tokens for 焼餃子.
In this case the tokens "焼" and "焼餃子" have the same position, so ESSolrSynonymParser fails its position increment check.
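To make the position check concrete, here is a minimal Python sketch. The token list is an assumption reconstructed from the error message (a compound token stacked on its first part), shaped like the `tokens` array of an `_analyze` response. `position_increments` and `stacked_tokens` are hypothetical helper names, not part of Elasticsearch or Lucene.

```python
from typing import Dict, List


def position_increments(tokens: List[Dict]) -> List[int]:
    """Convert absolute token positions into Lucene-style position increments."""
    increments = []
    previous = -1  # positions are counted from just before the first token
    for token in tokens:
        increments.append(token["position"] - previous)
        previous = token["position"]
    return increments


def stacked_tokens(tokens: List[Dict]) -> List[str]:
    """Return the tokens whose increment is 0, i.e. the ones the synonym parser rejects."""
    return [
        token["token"]
        for token, inc in zip(tokens, position_increments(tokens))
        if inc == 0
    ]


# Assumed search-mode output for 焼餃子 (illustrative, not captured from a cluster).
sample = [
    {"token": "焼", "position": 0},
    {"token": "焼餃子", "position": 0},
    {"token": "餃子", "position": 1},
]

print(position_increments(sample))  # [1, 0, 1]
print(stacked_tokens(sample))       # ['焼餃子']
```

Under these assumed positions, 焼餃子 is the token with increment 0, which matches the term named in the exception message.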
So I was able to create the index by updating the synonym content.
from:
"synonyms": [
'焼きぎょうざ, 焼き餃子, 焼餃子'
]
to:
"synonyms": [
'焼 きぎ ょうざ, 焼き餃子, 焼 餃子'
]
We need to tokenize the synonym words ourselves and put a space between the tokens.
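The rewrite can be sketched in Python as follows. This is only an illustration: `space_separate_rule` and `lookup_tokenize` are hypothetical names, and the segmentation table is copied from the rewritten rule in this report rather than produced by a real tokenizer. In practice the `tokenize` callable would have to return whatever segmentation your analyzer actually produces.

```python
from typing import Callable, Dict, List


def space_separate_rule(rule: str, tokenize: Callable[[str], List[str]]) -> str:
    """Rewrite a comma-separated synonym rule so each term is pre-split by spaces."""
    terms = [term.strip() for term in rule.split(",")]
    return ", ".join(" ".join(tokenize(term)) for term in terms)


# Hypothetical segmentation table, taken from the rewritten rule in this issue.
SEGMENTS: Dict[str, List[str]] = {
    "焼きぎょうざ": ["焼", "きぎ", "ょうざ"],
    "焼き餃子": ["焼き餃子"],
    "焼餃子": ["焼", "餃子"],
}


def lookup_tokenize(term: str) -> List[str]:
    """Stand-in tokenizer: look the term up, or keep it whole if it is unknown."""
    return SEGMENTS.get(term, [term])


rewritten = space_separate_rule("焼きぎょうざ, 焼き餃子, 焼餃子", lookup_tokenize)
print(rewritten)  # 焼 きぎ ょうざ, 焼き餃子, 焼 餃子
```

Passing the tokenizer in as a callable keeps the rewriting logic independent of how the segmentation is obtained.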
My Opinion
I think many Japanese users want to use kuromoji_tokenizer's search mode and synonym_token_filter together.
So I would be glad if the analyzer were improved so that it does not fail with this combination.
Or, at least, this behavior should be described in the Elasticsearch documentation.
At the moment I couldn't find a good description of this problem, and I could understand the cause only after reading the source code.
Elasticsearch version (bin/elasticsearch --version): 7.2
Plugins installed: []
JVM version (java -version):
openjdk 11.0.4 2019-07-16
OpenJDK Runtime Environment (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3)
OpenJDK 64-Bit Server VM (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3, mixed mode, sharing)
OS version (uname -a if on a Unix-like system): Ubuntu 18.04.3 LTS
The Kuromoji tokenizer now supports a discard_compound_token option that can be used in conjunction with the search mode to output a single path in the tokenization. When this option is set to true, the synonym graph filter should work fine so I am closing this issue.
Please reopen if the provided solution doesn't work as expected.
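For readers who want to try the suggested option, the following Python snippet builds an index settings body as a sketch. The names `ja_search_tokenizer`, `ja_synonyms`, and `ja_synonym_analyzer` are made up for illustration, the synonym rule is the original one from this issue, and the snippet has not been run against a cluster. It assumes the kuromoji analysis plugin is installed and that your Elasticsearch version supports `discard_compound_token`.

```python
import json

# Hypothetical index settings combining search mode with a synonym graph filter.
settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "ja_search_tokenizer": {
                    "type": "kuromoji_tokenizer",
                    "mode": "search",
                    # Drop the original compound token so only one path is emitted.
                    "discard_compound_token": True,
                }
            },
            "filter": {
                "ja_synonyms": {
                    "type": "synonym_graph",
                    "synonyms": ["焼きぎょうざ, 焼き餃子, 焼餃子"],
                }
            },
            "analyzer": {
                "ja_synonym_analyzer": {
                    "type": "custom",
                    "tokenizer": "ja_search_tokenizer",
                    "filter": ["ja_synonyms"],
                }
            },
        }
    }
}

# Serialize without escaping the Japanese characters, ready to send as a request body.
body = json.dumps(settings, ensure_ascii=False, indent=2)
print(body)
```

The printed JSON can be used as the body of an index creation or index template request.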
There is an inconvenience when we use kuromoji_tokenizer's search mode and synonym_token_filter together.
Problem Description
Now I have the following analyzer setting,
When I tried to create an index from documents with the above analyzer, the error shown earlier on this page was logged in /var/log/elasticearch/searcher.log.
I researched this error and understand the following points.
1. kuromoji_tokenizer's search mode outputs three tokens for 焼餃子.
2. ESSolrSynonymParser checks the position increment (see ESSolrSynonymParser.java#L55 and SynonymMap.java).