Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Painless (update context) no longer allows returning boolean in 6.5 #35888

Closed
askkemp opened this issue Nov 26, 2018 · 9 comments
Closed

Painless (update context) no longer allows returning boolean in 6.5 #35888

askkemp opened this issue Nov 26, 2018 · 9 comments
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache

Comments

@askkemp
Copy link

askkemp commented Nov 26, 2018

Elasticsearch version (bin/elasticsearch --version):
Version: 6.5.1, Build: default/tar/8c58350/2018-11-16T02:22:42.182257Z, JVM: 1.8.0_181
Version: 6.4.2, Build: default/tar/04711c2/2018-09-26T13:34:09.098244Z, JVM: 1.8.0_181

JVM version (java -version):
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

OS version (uname -a if on a Unix-like system):
Linux localhost.localdomain 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
I receive "Too many dynamic script compilations" circuit breaker errors in ES 6.5.1 but not in ES 6.4.2 when running the below example. Both of these tests are "out of the box" extractions of the tar.gz excluding the change of the network host IP address and data path location. I believe that since params.event.get is being used in the painless script that it should not be compiling for each input log line. In ES 6.4.2, there are no errors and the index is as expected.

Steps to reproduce:
Create a file 'timestamps.test.json' with these three lines in it repeating until you have 420 lines total

{"timestamp":"1534023333", "hash":"1"}
{"timestamp":"1534022222", "hash":"1"}
{"timestamp":"1534011111", "hash":"1"}

Create this Logstash configuration

input {
  file {
    path => "timestamps.test.json"
    start_position => "beginning"
    codec => "json"
  }
}

filter {
    mutate {
        split => { "timestamp" => "," }
    }
}

output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "test1"
    document_id => "%{[hash]}"
    doc_as_upsert => true
    script =>     'if(ctx._source.timestamp.contains(params.event.get("timestamp"))) return true; else (ctx._source.timestamp.add(params.event.get("timestamp")))'
    action => "update"
    retry_on_conflict=>500

  }
  #stdout { codec => rubydebug }
}

Provide logs (if relevant):
Run the above Logstash configuration against Elasticsearch 6.5.1 and the following error will be seen:

[2018-11-25T20:04:11,948][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["update", {:_id=>"1", :_index=>"test1", :_type=>"doc", :_routing=>nil, :_retry_on_conflict=>500}, #<LogStash::Event:0x6fbb1973>], :response=>{"update"=>{"_index"=>"test1", "_type"=>"doc", "_id"=>"1", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to execute script", "caused_by"=>{"type"=>"general_script_exception", "reason"=>"Failed to compile inline script [if(ctx._source.timestamp.contains(params.event.get(\"timestamp\"))) return true; else (ctx._source.timestamp.add(params.event.get(\"timestamp\")))] using lang [painless]", "caused_by"=>{"type"=>"circuit_breaking_exception", "reason"=>"[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_rate] setting", "bytes_wanted"=>0, "bytes_limit"=>0}}}}}}

The GET /_nodes/stats/script for 6.5.1 make no sense when the above error is received. It seems it actually did compile only four times, (not sure why 4 and not just 1) and not 75+ time which would cause the circuit breaker to trip:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "-XWuINKDSB2PgDdo6FX9DQ" : {
      "timestamp" : 1543232326968,
      "name" : "-XWuINK",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "attributes" : {
        "ml.machine_memory" : "16241254400",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "script" : {
        "compilations" : 4,
        "cache_evictions" : 0
      }
    }
  }
}

GET _cluster/settings/

{
  "persistent" : {
    "xpack" : {
      "monitoring" : {
        "collection" : {
          "enabled" : "true"
        }
      }
    }
  },
  "transient" : { }
}

Run the above Logstash configuration against Elasticsearch 6.4.2 and the log indexing will process correctly.

@askkemp askkemp changed the title Circuit breaker errors in ES 6.5.1 but not in ES 6.4.2 using Painless and params.event.get() Circuit breaker in ES 6.5.1 but not in ES 6.4.2 using Painless and params.event.get() Nov 26, 2018
@colings86 colings86 added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache :Data Management/Indices APIs APIs to create and manage indices and templates labels Nov 26, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

@jakelandis
Copy link
Contributor

I was able to replicate this difference in behavior between 6.4.2 and 6.5.1. However the root cause of the issue isn't due to the circuit breaker (as it appears), but rather it is change to the validity of a boolean for a return value.

A simpler example:

POST test/_doc/1/_update
{
    "script" : {
        "source": "if('a' == 'a'){ return true}",
        "lang": "painless"
    },
    "upsert" : {
        "counter" : 1
    }
}

On the second POST (when the script executes) results in:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[QPilpgj][127.0.0.1:9300][indices:data/write/update[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to execute script",
    "caused_by": {
      "type": "script_exception",
      "reason": "compile error",
      "script_stack": [
        "if('a' == 'a'){ return true}",
        "                       ^---- HERE"
      ],
      "script": "if('a' == 'a'){ return true}",
      "lang": "painless",
      "caused_by": {
        "type": "class_cast_exception",
        "reason": "Cannot cast from [boolean] to [void]."
      }
    }
  },
  "status": 400
}

The error does not occur on 6.4.2. This failure is the root cause of "Too many dynamic script compilations" ... when bulk requests are processed it attempts to compile/fails, compile/fail, repeat. There maybe a secondary issue here such that we shouldn't let bulk requests with script failures manifest like this. The real issue here is the change in no longer allowing return true.

However, returning 'true' in this context is arguably incorrect and you can work around this issue with something like this (not tested):

if(ctx._source.timestamp.contains(params.event.get("timestamp")) == false) {ctx._source.timestamp.add(params.event.get("timestamp")}

FWIW, I can also reproduce with (no errors on 6.4.2, but errors on 6.5.1:

POST test/_doc/1/_update
{
    "script" : {
        "source": "return true",
        "lang": "painless"
    },
    "upsert" : {
        "counter" : 1
    }
}

Pinging @elastic/infra for additional eyes on the change of behavior for Painless return values (in the update context)

@askkemp
Copy link
Author

askkemp commented Nov 27, 2018

Thank you for clarification.

The reason I did "return true" because it was the only way I could figure out to not compile a unique script with each request. This script will be run 1000s of times per second for days at a time. And in my testing, I believe I was experiencing a memory leak in JVM because I'd run out of memory. I ran the "return true" method against 200 million lines without issue.

Anyhow... would this be a proper work around?

script => 'if(ctx._source.timestamp.contains(params.event.get("timestamp"))) ctx.op = "noop"; else (ctx._source.timestamp.add(params.event.get("timestamp")))'

Because it seems that the below provided workaround creates a dynamic compile each time it is ran:

script => 'if(ctx._source.timestamp.contains(params.event.get("timestamp") == false) {ctx._source.timestamp.add(params.event.get("timestamp"))}'

I'm glad to ask the work around question under Discuss if it is not relevant to this issue.

@jakelandis
Copy link
Contributor

@askkemp - the earlier (not tested) example doesn't compile and results in the same compile/fail issue as described above. I went back to your original example and tested the following to work:

 script =>  "if(ctx._source.timestamp.contains(params.event.get('timestamp')[0]) == false) {ctx._source.timestamp.add(params.event.get('timestamp')[0])}"

I'm glad to ask the work around question under Discuss if it is not relevant to this issue.

Yes, please move to Discuss the above does not work around the issue for you so we can keep this issue focused on the change of behavior w/r/t Painless return values.

Also, I will update the description of the issue.

@jakelandis jakelandis changed the title Circuit breaker in ES 6.5.1 but not in ES 6.4.2 using Painless and params.event.get() Painless (update context) no longer allows returning boolean in 6.5 Dec 2, 2018
@jakelandis jakelandis added team-discuss and removed :Data Management/Indices APIs APIs to create and manage indices and templates labels Dec 2, 2018
@jakelandis jakelandis added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache and removed :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache labels Dec 12, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@jdconrad
Copy link
Contributor

jdconrad commented Dec 12, 2018

Update should never return a value as the value would not be used for anything. Modifying the update context to have a void return type as part of the signature is the best practice for this behavior. What needs to be looked into is why this script

script => 'if(ctx._source.timestamp.contains(params.event.get("timestamp") == false) {ctx._source.timestamp.add(params.event.get("timestamp"))}'

wouldn't be appropriately cached because this is the correct way to do this.

Another issue is also why we keep retrying to compile the script after failing to compile. If a script fails the update needs to stop and return an error immediately.

@jakelandis
Copy link
Contributor

Update should never return a value as the value would not be used for anything.

++ I think this is the crux of the issue and the reason for the discuss label.

What needs to be looked into is why this script ...

There is a typo in that referenced script, I provided a typo free version on this comment.

Another issue is also why we keep retrying to compile the script after failing to compile. If a script fails the update needs to stop and return an error immediately.

For a single update, it does indeed return immediately. For bulk updates, it processes the bulk requests and can trigger circuit breaker via one bulk request if there are enough updates with failed compilations in that bulk.

This issue can probably be closed.

@jdconrad
Copy link
Contributor

jdconrad commented Jan 9, 2019

Closing this issue as using return here does not make sense for the reasons listed in prior comments.

@villasv
Copy link

villasv commented Feb 5, 2019

I have a question regarding returning from update context. If I want to short-circuit the script with a condition... why can't I do if(condition) return;? It fails to compile.

If I try anything else, I get the expected Cannot cast ... to a primitive type [void].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache
Projects
None yet
Development

No branches or pull requests

6 participants