Stemmer Token Filter error when multiple names are given #34170

Closed
TomonoriSoejima opened this issue Oct 1, 2018 · 1 comment
Labels: >bug, :Search/Analysis (How text is split into tokens)

Comments

@TomonoriSoejima
Contributor

Elasticsearch version (bin/elasticsearch --version):
6.4.0
Plugins installed: []

JVM version (java -version):
1.8.x
OS version (uname -a if on a Unix-like system):

Description of the problem including expected versus actual behavior:

Steps to reproduce:

If you run this request to create a mapping, no error occurs until you push a document to it.
I think we should fail at the time of mapping creation.

DELETE test2
PUT /test2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "my_stemmer"
          ]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": [
            "english",
            "light_english",
            "minimal_english",
            "possessive_english",
            "porter2",
            "lovins"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "country": {
          "type": "text",
          "index": "true"
        },
        "hotel_name": {
          "type": "text",
          "analyzer": "my_analyzer"
        },
        "null_test": {
          "type": "text",
          "analyzer": "standard"
        },
        "message": {
          "type": "text"
        },
        "postDate": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        },
        "price": {
          "type": "long"
        },
        "user": {
          "type": "text"
        }
      }
    }
  }
}



 

PUT test2/test/4
{
  "hotel_name": "fdfa"
}
Response

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Invalid stemmer class specified: [english, light_english, minimal_english, possessive_english, porter2, lovins]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Invalid stemmer class specified: [english, light_english, minimal_english, possessive_english, porter2, lovins]",
    "caused_by": {
      "type": "class_not_found_exception",
      "reason": "org/tartarus/snowball/ext/[english, light_english, minimal_english, possessive_english, porter2, lovins]Stemmer"
    }
  },
  "status": 400
}
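The error text suggests the whole `name` list was turned into a string and used as a single stemmer name. Until the server rejects such settings itself, a client-side pre-flight check can catch the problem before the index is created. The sketch below is only an illustration: the helper name `find_invalid_stemmer_filters` is hypothetical, and it assumes the `stemmer` filter expects one string value under `name` (or `language`).

```python
def find_invalid_stemmer_filters(index_body):
    """Return (filter_name, offending_value) pairs for stemmer filters whose
    `name`/`language` setting is not a single non-empty string.

    Hypothetical client-side helper; not part of Elasticsearch.
    """
    filters = index_body.get("settings", {}).get("analysis", {}).get("filter", {})
    problems = []
    for filter_name, config in filters.items():
        if config.get("type") != "stemmer":
            continue  # only stemmer filters are checked here
        for key in ("name", "language"):
            if key in config:
                value = config[key]
                if not isinstance(value, str) or not value:
                    problems.append((filter_name, value))
    return problems


# Trimmed-down version of the settings from the report, plus a valid filter.
body = {
    "settings": {"analysis": {"filter": {
        "my_stemmer": {"type": "stemmer", "name": ["english", "light_english"]},
        "ok_stemmer": {"type": "stemmer", "name": "english"},
    }}}
}

print(find_invalid_stemmer_filters(body))
```

Only `my_stemmer` is reported, because its `name` is a list rather than a single string.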
tlrx added the >bug and :Search/Analysis (How text is split into tokens) labels on Oct 1, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

cbuescher self-assigned this on Oct 18, 2018
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Oct 18, 2018
Currently the StemmerTokenFilterFactory checks the validity of the language
setting only when the first TokenStream is processed. Instead we should throw an
error earlier at mapping creation time. This change adds a check to the
StemmerTokenFilterFactory constructor that checks for a valid `language` setting
by trying to create a new TokenStream from an empty input stream. This will
throw errors about wrong language settings early on.

Closes elastic#34170
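
The idea in this commit message, probing the factory once with empty input inside the constructor, can be illustrated with a small Python sketch. This is not the Elasticsearch code: `LazyStemmerFactory`, `EagerStemmerFactory`, and the `KNOWN` set (which holds only the names that appear in this issue) are hypothetical stand-ins, and no real stemming is performed.

```python
class LazyStemmerFactory:
    """Mimics the reported behaviour: the setting is checked only on first use."""

    # Assumption: only the names mentioned in this issue, not a complete list.
    KNOWN = {"english", "light_english", "minimal_english",
             "possessive_english", "porter2", "lovins"}

    def __init__(self, language):
        self.language = language  # stored without any validation

    def create(self, tokens):
        # Validation happens here, i.e. when the first stream is processed.
        if not isinstance(self.language, str) or self.language not in self.KNOWN:
            raise ValueError(f"Invalid stemmer class specified: {self.language}")
        return list(tokens)  # placeholder: real stemming is omitted


class EagerStemmerFactory(LazyStemmerFactory):
    """Mimics the fix: probe with empty input so bad settings fail early."""

    def __init__(self, language):
        super().__init__(language)
        self.create([])  # raises immediately if the setting is invalid


bad_setting = ["english", "lovins"]

lazy = LazyStemmerFactory(bad_setting)  # no error yet; it would surface on create()

try:
    EagerStemmerFactory(bad_setting)
except ValueError as err:
    print("rejected at construction:", err)
```

Reusing the normal `create` path for the probe keeps a single source of truth for validation: whatever would fail on the first document also fails at construction time.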
cbuescher pushed a commit that referenced this issue Oct 19, 2018
Closes #34170
kcm pushed a commit that referenced this issue Oct 30, 2018
Closes #34170