[0.90.0, 0.90.2] Can't use empty replacement string in pattern_replace filter #3359

Closed
NoICE opened this Issue Jul 19, 2013 · 3 comments

NoICE commented Jul 19, 2013

After upgrading from 0.20.0.rc1 to 0.90.2, index creation fails when we use an empty replacement string in a pattern_replace filter:

...
org.elasticsearch.indices.IndexCreationException: [xxx] failed to create index
    at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:382)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:296)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:162)
    at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:321)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:95)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: org.elasticsearch.ElasticSearchIllegalArgumentException: replacement is missing for [whitespace_remove] token filter of type 'pattern_replace'
    at org.elasticsearch.index.analysis.PatternReplaceTokenFilterFactory.<init>(PatternReplaceTokenFilterFactory.java:54)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
    at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
    at org.elasticsearch.common.inject.InjectorImpl$4$1.call(InjectorImpl.java:763)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:819)
    at org.elasticsearch.common.inject.InjectorImpl$4.get(InjectorImpl.java:759)
    at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:221)
    at $Proxy19.create(Unknown Source)
    at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:152)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
    at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:819)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
    at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
    at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
    at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
    at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at
...

Our elasticsearch.json configuration looks like this (note whitespace_remove at the bottom, we need that to strip any whitespace in between characters):

{
  "cluster": {
    "name": "1188_production_19_07"
  },
  "action": {
    "auto_create_index": false
  },
  "indices": {
    "memory": {
      "index_buffer_size": "1024m"
    }
  },
  "index": {
    "number_of_replicas": 2,
    "number_of_shards": 3,
    "analysis": {
      "analyzer": {
        "autocomplete_exact_index_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "standard",
            "lowercase",
            "edge_ngram"
          ]
        },
        "autocomplete_exact_search_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "standard",
            "lowercase"
          ]
        },
        "autocomplete_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "edge_ngram"
          ]
        },
        "autocomplete_search_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "whitespace_remove",
            "lowercase"
          ]
        },
        "ascii_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "language": "Czech",
          "filter": [
            "standard",
            "lowercase",
            "czech_stem",
            "asciifolding"
          ]
        },
        "ascii_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "language": "Czech",
          "filter": [
            "standard",
            "lowercase",
            "czech_stem",
            "asciifolding"
          ]
        },
        "untouched_ascii_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "language": "Czech",
          "filter": [
            "standard",
            "lowercase",
            "asciifolding"
          ]
        },
        "untouched_ascii_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "language": "Czech",
          "filter": [
            "standard",
            "lowercase",
            "asciifolding"
          ]
        },
        "czech_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "language": "Czech",
          "filter": [
            "standard",
            "lowercase",
            "czech_stem"
          ]
        },
        "czech_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "language": "Czech",
          "filter": [
            "standard",
            "lowercase",
            "czech_stem"
          ]
        },
        "untouched_czech_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "language": "Czech",
          "filter": [
            "standard",
            "lowercase"
          ]
        },
        "untouched_czech_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "language": "Czech",
          "filter": [
            "standard",
            "lowercase"
          ]
        }
      },
      "filter": {
        "full_ngram": {
          "type": "nGram",
          "min_gram": "1",
          "max_gram": "20"
        },
        "edge_ngram": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 20,
          "side": "front"
        },
        "whitespace_remove": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": ""
        }
      }
    }
  }
}
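
For reference, the intended effect of whitespace_remove can be sketched in plain Java with java.util.regex. This only illustrates the pattern/replacement semantics from the config above; it is not Elasticsearch's filter code, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class WhitespaceRemoveSketch {
    // Same pattern and replacement values as the whitespace_remove filter above.
    private static final Pattern PATTERN = Pattern.compile(" ");
    private static final String REPLACEMENT = "";

    // Hypothetical helper: replaces every match of the pattern in one token.
    static String apply(String token) {
        return PATTERN.matcher(token).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        System.out.println(apply("a b  c")); // prints "abc"
    }
}
```

An empty replacement is a legitimate value here, which is why treating it as "missing" breaks this use case.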

To find out when this change was introduced, I tested 0.90.0, 0.90.1 and 0.90.2. All three produce the same error.

At first I suspected the JSON parser, so I went digging and found that the JSON loader returns null for empty strings in JSON, while the YML loader returns empty strings for the same values in YML (an inconsistency that is strange by itself and should not happen imo :)) ). I checked this by adding relevant values to test-settings.json and test-settings.yml + their tests:

https://gist.github.com/NoICE/6039088
(note these added lines: https://gist.github.com/NoICE/6039088#file-jsonsettingsloadertests-java-L51-L53
https://gist.github.com/NoICE/6039088#file-yamlsettingsloadertests-java-L51-L53)

So I rewrote our elasticsearch.json to elasticsearch.yml, but the error still remains.

So:

  • JSON parser returns null for "" values
  • YML parser returns "" for "" values
  • pattern_replace does not accept an empty replacement string in either JSON or YML format (so the JSON parser is probably not the culprit, maybe...)

Let me know if I can provide some more tests or something...

@ghost ghost assigned kimchy Jul 19, 2013

Member

kimchy commented Jul 19, 2013

It happens because we changed the logic in our settings behavior a bit so that empty strings represent no value set (for other reasons). We can easily fix it in the pattern replace filter...
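
A minimal sketch of the mechanism described above, assuming a settings lookup that treats an empty string as "not set". None of this is the actual Elasticsearch code or the actual fix; the class, the `get` helper, and both `replacement*` methods are hypothetical and only show how such a lookup makes `""` indistinguishable from a missing key, and one way a filter could tolerate that.

```java
import java.util.HashMap;
import java.util.Map;

public class EmptySettingSketch {
    // Hypothetical lookup: an empty string is treated the same as a missing key.
    static String get(Map<String, String> settings, String key, String defaultValue) {
        String value = settings.get(key);
        return (value == null || value.isEmpty()) ? defaultValue : value;
    }

    // Strict variant: with a null default, "" looks like "missing" and is rejected.
    static String replacementStrict(Map<String, String> settings) {
        String replacement = get(settings, "replacement", null);
        if (replacement == null) {
            throw new IllegalArgumentException("replacement is missing");
        }
        return replacement;
    }

    // Lenient variant (one possible approach): fall back to "" instead of failing.
    static String replacementLenient(Map<String, String> settings) {
        return get(settings, "replacement", "");
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("pattern", " ");
        settings.put("replacement", "");

        System.out.println("[" + replacementLenient(settings) + "]"); // prints "[]"
        try {
            replacementStrict(settings);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "replacement is missing"
        }
    }
}
```

The lenient variant cannot tell an omitted replacement from an explicitly empty one, which is the trade-off of treating empty strings as unset.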

NoICE commented Jul 19, 2013

Cool :) So if everything goes well, the fix will be available in 0.90.3? Can I help with something?

Member

kimchy commented Jul 19, 2013

Simple fix, pushed it: eb75a81.

@kimchy kimchy closed this Jul 19, 2013
