Synonym building fails if filter preceded by compound word filter #40000

Closed
BastianHofmann opened this issue Mar 13, 2019 · 1 comment

@BastianHofmann

I'm using the official Docker image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2

Elasticsearch version (bin/elasticsearch --version): 6.3.2

Plugins installed: []

JVM version (java -version): 10.0.2

OS version (uname -a if on a Unix-like system): Linux 596b10634157 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

When creating an index with a custom analyzer that uses a dictionary_decompounder filter followed by a synonym filter (in this order), index creation fails with an illegal_argument_exception.

Steps to reproduce:

  1. Create index
PUT http://localhost:9200/test

{
    "analysis": {
        "analyzer": {
            "test": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "myCompounds",
                    "mySynonyms"
                ]
            }
        },
        "filter": {
            "myCompounds": {
                "type": "dictionary_decompounder",
                "word_list": [
                    "Kaufmann"
                ]
            },
            "mySynonyms": {
                "type": "synonym",
                "synonyms": [
                    "Verkäufer, Kaufmann im Einzelhandel"
                ]
            }
        }
    }
}
  2. See error response
{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "failed to build synonyms"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms",
        "caused_by": {
            "type": "parse_exception",
            "reason": "Invalid synonym rule at line 1",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "term: Kaufmann im Einzelhandel analyzed to a token (Kaufmann) with position increment != 1 (got: 0)"
            }
        }
    },
    "status": 400
}

It works if you switch the order of the filters. I know that the synonym list is analyzed with all the filters that come before the synonym filter. But why does that cause an error as soon as a compound word appears in the synonyms? Shouldn't we just get a list of "Verkäufer, Kaufmann im Einzelhandel, Kaufmann" for a value of "Verkäufer"?
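
For reference, the reordered chain mentioned above can be sketched as follows. This is only an illustration in Python: `build_index_body` is a hypothetical helper, not part of any Elasticsearch client, and the body simply mirrors the settings from the reproduction steps with the two filters swapped. Note that swapping the order also changes what the analyzer emits, since synonyms are then matched before decompounding runs.

```python
import json


def build_index_body(filter_order):
    """Return the analysis settings from the report with the given filter order.

    Hypothetical helper for illustration; only the "filter" list of the
    analyzer varies between the failing and the working variant.
    """
    return {
        "analysis": {
            "analyzer": {
                "test": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": list(filter_order),
                }
            },
            "filter": {
                "myCompounds": {
                    "type": "dictionary_decompounder",
                    "word_list": ["Kaufmann"],
                },
                "mySynonyms": {
                    "type": "synonym",
                    "synonyms": ["Verkäufer, Kaufmann im Einzelhandel"],
                },
            },
        }
    }


# Order from the report: rejected on 6.3.2 with "failed to build synonyms".
failing = build_index_body(["myCompounds", "mySynonyms"])

# Swapped order: the variant reported above as working.
working = build_index_body(["mySynonyms", "myCompounds"])

print(json.dumps(working, indent=4))
```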

@jimczi
Contributor

jimczi commented Mar 13, 2019

Thanks for reporting, @BastianHofmann. This issue is already fixed by #34331, so I hope you don't mind if I close it.
Starting in 6.6, we don't apply the decompounding when building synonyms, so only the original term is used to match the synonym rule.
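
To make the error message easier to read, here is a toy model of what happens, written in Python. It is not Lucene or Elasticsearch code: the function names are made up, the tokenizer is a plain whitespace split, and the decompounder is a simplified stand-in that assumes sub-words are stacked on the same position as the original token (position increment 0). The check mirrors the wording of the exception in the report.

```python
from typing import List, Set, Tuple

Token = Tuple[str, int]  # (term, position increment)


def standard_tokenize(text: str) -> List[Token]:
    """Whitespace split; every token advances the position by 1."""
    return [(word, 1) for word in text.split()]


def decompound(tokens: List[Token], dictionary: Set[str]) -> List[Token]:
    """Keep each token and stack every dictionary hit on the same position.

    Simplified stand-in for a dictionary decompounder: matching is a
    case-insensitive substring test, and hits get position increment 0.
    """
    lowered = sorted(word.lower() for word in dictionary)
    out: List[Token] = []
    for term, inc in tokens:
        out.append((term, inc))
        for word in lowered:
            start = term.lower().find(word)
            if start != -1:
                out.append((term[start:start + len(word)], 0))
    return out


def check_synonym_term(term: str, tokens: List[Token]) -> None:
    """Reject a synonym term if any analyzed token has an increment other than 1."""
    for tok, inc in tokens:
        if inc != 1:
            raise ValueError(
                f"term: {term} analyzed to a token ({tok}) "
                f"with position increment != 1 (got: {inc})"
            )


term = "Kaufmann im Einzelhandel"
plain = standard_tokenize(term)

# Decompounding skipped while building synonyms (the 6.6+ behaviour
# described above): every increment is 1, so the rule is accepted.
check_synonym_term(term, plain)

# Decompounding applied first (the 6.3.2 behaviour in the report):
# "Kaufmann" is emitted a second time with increment 0 and the rule is rejected.
try:
    check_synonym_term(term, decompound(plain, {"Kaufmann"}))
except ValueError as err:
    print(err)
```

In this model the rule only fails because the dictionary word itself occurs in the synonym text; the same rule passes unchanged once the decompounding step is left out of the synonym analysis.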

@jimczi jimczi closed this as completed Mar 13, 2019