Stemmer Token Filter error when multiple names are given #34170

TomonoriSoejima · 2018-10-01T01:57:54Z

Elasticsearch version (bin/elasticsearch --version):
6.4.0
Plugins installed: []

JVM version (java -version):
1.8.x
OS version (uname -a if on a Unix-like system):

Description of the problem including expected versus actual behavior:

Steps to reproduce:

If you run this request to create a mapping, no error occurs until you push a document to it.
I think we should fail at the time of mapping creation.

DELETE test2
PUT /test2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "my_stemmer"
          ]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": [
            "english",
            "light_english",
            "minimal_english",
            "possessive_english",
            "porter2",
            "lovins"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "country": {
          "type": "text",
          "index": "true"
        },
        "hotel_name": {
          "type": "text",
          "analyzer": "my_analyzer"
        },
        "null_test": {
          "type": "text",
          "analyzer": "standard"
        },
        "message": {
          "type": "text"
        },
        "postDate": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        },
        "price": {
          "type": "long"
        },
        "user": {
          "type": "text"
        }
      }
    }
  }
}



 

PUT test2/test/4
{
  "hotel_name": "fdfa"
}

Response

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Invalid stemmer class specified: [english, light_english, minimal_english, possessive_english, porter2, lovins]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Invalid stemmer class specified: [english, light_english, minimal_english, possessive_english, porter2, lovins]",
    "caused_by": {
      "type": "class_not_found_exception",
      "reason": "org/tartarus/snowball/ext/[english, light_english, minimal_english, possessive_english, porter2, lovins]Stemmer"
    }
  },
  "status": 400
}
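
Judging from the `caused_by` reason above, the whole `name` array appears to be rendered as a single string and spliced into a Snowball class name. The following is only a sketch of that effect in plain Java, not the actual Elasticsearch code; `StemmerNameLookupDemo`, `classNameFor` and `resolves` are hypothetical names made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; it only mimics the string handling suggested by the error message.
final class StemmerNameLookupDemo {

    // Assumption: the setting value is concatenated into a class name as-is.
    static String classNameFor(Object nameSetting) {
        return "org.tartarus.snowball.ext." + nameSetting + "Stemmer";
    }

    // Returns true only if a class with that name can be loaded.
    static boolean resolves(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("english", "porter2");
        // A List prints as "[english, porter2]", so the class name is malformed.
        String className = classNameFor(names);
        System.out.println(className);
        System.out.println(resolves(className));
    }
}
```

Under that assumption, a list value can never resolve to a stemmer class, which matches the `class_not_found_exception` in the response.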

elasticmachine · 2018-10-01T08:15:07Z

Pinging @elastic/es-search-aggs

Currently the StemmerTokenFilterFactory checks the validity of the language setting only when the first TokenStream is processed. Instead we should throw an error earlier at mapping creation time. This change adds a check to the StemmerTokenFilterFactory constructor that checks for a valid `language` setting by trying to create a new TokenStream from an empty input stream. This will throw errors about wrong language settings early on. Closes elastic#34170
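
The idea described above, validating the setting when the factory is constructed instead of on first use, can be sketched roughly as follows. This is a simplified stand-in, not the real `StemmerTokenFilterFactory`: the class `EagerStemmerFactory`, its `create` method and the hard-coded list of languages are all assumptions for illustration, and a plain string function takes the place of a Lucene TokenStream.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical factory; the real one wraps Lucene token filters.
final class EagerStemmerFactory {

    // Illustrative subset of stemmer names, assumed for this sketch.
    private static final Set<String> KNOWN_LANGUAGES = new HashSet<>(Arrays.asList(
        "english", "light_english", "minimal_english",
        "possessive_english", "porter2", "lovins"));

    private final String language;

    EagerStemmerFactory(String language) {
        this.language = language;
        // Eager check: build one throwaway stemmer so that an invalid
        // setting fails at construction time rather than at index time.
        create();
    }

    // Builds the stemmer, or throws if the language is not recognised.
    UnaryOperator<String> create() {
        if (!KNOWN_LANGUAGES.contains(language)) {
            throw new IllegalArgumentException("Invalid stemmer class specified: " + language);
        }
        return token -> token; // placeholder: no real stemming is done here
    }

    // Convenience helper for callers that only want a yes/no answer.
    static boolean isValid(String language) {
        try {
            new EagerStemmerFactory(language);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        try {
            new EagerStemmerFactory("[english, lovins]");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this in the constructor, the `PUT /test2` request from the report would be rejected immediately, because the analysis chain is built when the index is created.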

tlrx added the >bug and :Search/Analysis labels Oct 1, 2018

cbuescher self-assigned this Oct 18, 2018

cbuescher mentioned this issue Oct 18, 2018

Check stemmer language setting early #34601

Merged

cbuescher closed this as completed in #34601 Oct 19, 2018
