Create a new scripting language #13084

jdconrad · 2015-08-24T16:30:37Z

ElasticSearch needs a scripting language that can be used dynamically while remaining secure. While Lucene Expressions covers those two points, it does not meet the needs of many scripts due to the following behavior:

Expressions is designed to only work effectively with numerical values. Elasticsearch requires a language that can handle strings, dates, and possibly data structures such as map and list. To add these features to expressions would require a large architectural change within Lucene that doesn't really make sense for the purpose of that language.
Expressions is designed to be a single mathematical equation using only one line of code. This does not lend itself well to using things such as a loop to go through multi-valued fields.
Expressions is built to be extremely similar to JavaScript. This makes it hard to support types other than double while still translating efficiently into Java. Translating a language like JavaScript into Java would require every variable to be an Object that also tracks its own type, which is extremely inefficient.

One of the main goals of this language is that any somewhat experienced developer should be able to learn all of it in about fifteen minutes. For this reason I'm going to keep the control flow simple by allowing only the equivalent of one linearly-run static Java function to be written in total for any given script. Scripts with multiple functions/methods will not be allowed; at that point, users should be writing custom code for their application instead of leaning on scripting.

For the new language I intend to initially have the following:

Native types - boolean, byte, short, int, long, float, double, string, date, point, list, and map including the ability to cast when necessary
Arithmetic operators - multiplication *, division /, addition +, subtraction -, precedence ( )
Comparison operators - less than <, less than or equal to <=, greater than >, greater than or equal to >=, equal to ==, and not equal to !=
Boolean operators - not !, and &&, or ||
Bitwise operators - shift left <<, shift right >>, unsigned shift >>>, and &, or |, xor ^, not ~
A way to call a set list of external functions to be defined at a later time (math functions, geo functions)
API for strings (possibly a limited API for regular expressions)
API for dates
Assignment operations for native types (int x; x = 0;)
Control flow - if, else if, else and for, while, do-while using brackets { } and semicolons ; to denote the end of operations/lines
Bindings - single-valued and multi-valued field access as available variables, along with a way to find out the number of values in a multi-valued field and a way to access the existing MultiValueMode API
Shortcuts for map and list access such as (double)map0.item1.0.item2.1 where map0 is the initial map, item1 is an element in the map of type list, 0 is the first element in the list of type map, item2 is an element in the map of type list, and finally 1 is the second element in the list of type double.
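
The path shortcut in the last item can be illustrated with a small Java sketch. It assumes the shortcut simply walks nested `Map`/`List` values, treating a segment as a list index when the current value is a `List` and as a map key otherwise; `PathShortcut` and `resolve` are hypothetical names, not part of any Elasticsearch API.

```java
import java.util.List;
import java.util.Map;

public class PathShortcut {
    // Hypothetical helper: walks a dotted path such as "item1.0.item2.1" through
    // nested Maps and Lists. A segment is a list index when the current value is
    // a List, and a map key when it is a Map.
    static Object resolve(Object root, String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            if (current instanceof List) {
                current = ((List<?>) current).get(Integer.parseInt(segment));
            } else if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(segment);
            } else {
                // Also reached when a previous key was missing (current == null).
                throw new IllegalArgumentException(
                        "cannot descend into " + current + " with segment " + segment);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> map0 = Map.of(
                "item1", List.of(Map.of("item2", List.of(1.5, 2.5))));
        // Equivalent of (double)map0.item1.0.item2.1 from the proposal.
        double value = (Double) resolve(map0, "item1.0.item2.1");
        System.out.println(value); // 2.5
    }
}
```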

This list will be updated as the project moves forward. To ensure the language does not hang due to an infinite loop or extremely long operational set, the number of instructions will be counted, and an exception will be thrown if a specified limit is reached.
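
One way to picture the instruction counting is a budget counter that generated code would call on every loop iteration. This is a hypothetical sketch (`LoopCounter`, `tick`, and the limit of 1,000 are illustrative); the proposal does not specify where the counting hooks are inserted or what the limit is.

```java
public class LoopCounter {
    // Hypothetical budget guard: generated code would call tick() on each loop
    // iteration and the script aborts once the budget is exhausted.
    private final long limit;
    private long count;

    LoopCounter(long limit) {
        this.limit = limit;
    }

    void tick() {
        if (++count > limit) {
            throw new IllegalStateException("script exceeded " + limit + " loop iterations");
        }
    }

    public static void main(String[] args) {
        LoopCounter counter = new LoopCounter(1_000);
        try {
            while (true) { // a runaway script loop
                counter.tick();
            }
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // script exceeded 1000 loop iterations
        }
    }
}
```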

I intend to build the language using ANTLR and ASM as the backbone. The following steps will be required for the language to be created.

1. Create the ANTLR grammar.
2. Write the code to build the ASM function from the provided script.
3. Write tests.
4. Integrate the language into the ElasticSearch code base.
5. Write more tests.
6. Refine the feature set.
7. Write more tests.
8. Repeat 6 and 7 until completed.

clintongormley · 2015-08-24T16:52:49Z

w00t

uboness · 2015-08-24T16:54:31Z

w00t indeed

uboness · 2015-08-24T16:58:22Z

A way to call set list of external functions to be defined at a later time (math functions, geo functions)

This is a super important aspect. Beyond the basic native operations in the language, the only other functions that will be available are those that we pre-register with the language (in code, that is). So this mechanism needs to be generic and not hard-coded for the math/geo functions.
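
A generic registration mechanism along these lines might look like the following sketch. `FunctionRegistry`, `register`, and `call` are hypothetical names; the point is only that a script can call nothing beyond what host code has registered, so math and geo functions become ordinary registrations rather than hard-coded cases.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FunctionRegistry {
    // Hypothetical registry: host code registers named functions up front;
    // scripts may only call names that appear here.
    private final Map<String, Function<Object[], Object>> functions = new HashMap<>();

    void register(String name, Function<Object[], Object> fn) {
        if (functions.putIfAbsent(name, fn) != null) {
            throw new IllegalArgumentException("function already registered: " + name);
        }
    }

    Object call(String name, Object... args) {
        Function<Object[], Object> fn = functions.get(name);
        if (fn == null) {
            throw new IllegalArgumentException("unknown function: " + name);
        }
        return fn.apply(args);
    }

    public static void main(String[] args) {
        FunctionRegistry registry = new FunctionRegistry();
        registry.register("sqrt", a -> Math.sqrt(((Number) a[0]).doubleValue()));
        registry.register("max", a -> Math.max(((Number) a[0]).doubleValue(),
                                               ((Number) a[1]).doubleValue()));
        System.out.println(registry.call("sqrt", 16));   // 4.0
        System.out.println(registry.call("max", 3, 7.5)); // 7.5
    }
}
```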

jdconrad · 2015-10-07T04:51:22Z

Haven't commented on here in a while, so I thought I would give a quick update. The majority of the features are implemented as a first pass in a separate project.

What needs to happen before a PR can really be made at this point --

1. A bit better string support. (goal: this week)
2. Bindings for search fields. (goal: week after next -- needs item 6)
3. Tests, lots and lots of tests. (goal: next week)
4. Generally improved stability and bug fixes as the tests reveal them. (goal: next week)
5. Documentation. (goal: week after next)
6. Integration into a plugin. (goal: week after next)
7. General clean up. (goal: week after next)
8. Loop counting to prevent runaway code. (goal: week after next)
9. A name for the language.

This timeline may be a bit ambitious (again), but I'll update in another couple weeks.

clintongormley · 2015-10-07T16:14:35Z

Awesome

A name for the language.

Well that's going to push the delivery date out to 2019 :)

nik9000 · 2015-10-07T16:29:52Z

I wonder if the language can continue to live outside of Elasticsearch? The reason we made it is because there isn't a good, safe, sandboxed scripting language in the JVM. So I imagine it'll be useful to other people. If it were easy to embed in other places that'd be good exposure for us and help the open source community.

jdconrad · 2015-10-07T16:36:33Z

@nik9000 That's a cool thought, and maybe something to consider down the road, but it's beyond the scope of the initial project by quite a bit. I think the biggest limiter for doing something like that is that the language is really designed to only allow scripts that are the equivalent of a single static Java method running on one thread, so for it to be effective outside ES it would certainly need to have its feature set expanded significantly.

jdconrad · 2015-10-13T18:36:10Z

I wanted to give an update about some of the features and the current state of the language:

The prior list remains except for string features; however, after speaking with @rmuir and @rjernst I would like to take a week to explore the possibility of using invokedynamic to offer a language that doesn't require casting. Currently the language has strict typing, making it very similar to Java. However, given the languages people are used to, this may become an issue longer term. It's likely I won't solve this in a week, but I want to explore it now to make sure it is at least possible with the current design, so that it can be added after the initial release.

As it stands the following features exist:

Native types - boolean, byte, short, int, long, float, double, object, string, list, and map including the ability to cast when necessary
Arithmetic operators - multiplication *, division /, addition +, subtraction -, precedence ( )
Comparison operators - less than <, less than or equal to <=, greater than >, greater than or equal to >=, equal to ==, and not equal to !=
Boolean operators - not !, and &&, or ||
Bitwise operators - shift left <<, shift right >>, unsigned shift >>>, and &, or |, xor ^, not ~
A language definition specified in a properties file that allows the ability to add more types (Java classes), and serves as a whitelist for all the things available to create and call.
Assignment operations for native types (int x; x = 0;) including increment, decrement, +=, -=, etc.
Control flow - if, else if, else and for, while, do-while using brackets { } and semicolons ; to denote the end of operations/lines
Shortcuts for maps and lists using bracket notation, such as (int x = (int)map["test0"][0]["test1"];), where "test0" is a key into a map, 0 is an index into a list, "test1" is a key into a map, and the resulting value is cast from object to int.
A string concatenation operator in the form of '..' instead of '+', because '+' leads to ambiguities: in Java, "string" + 2 + 2 evaluates to the string "string22", which I believe may be confusing
Numeric promotion follows Java: values are widened as necessary (int -> double, long -> float, etc.), and an explicit cast is required when promotion cannot be done (long -> int, etc.)
Auto-boxing - since the language uses primitive types and runs on the JVM, auto-boxing is necessary and will be done automatically wherever possible
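
For reference, this small program shows the plain Java behavior that the '..' operator and the promotion rules are designed around: left-to-right `+` concatenation, implicit widening, and explicit narrowing casts. It is ordinary Java, not the new language, and `JavaSemantics` is just a name for the sketch.

```java
public class JavaSemantics {
    // Java evaluates + left to right: once a String is on the left, + concatenates.
    static String stringFirst() {
        return "string" + 2 + 2; // "string22"
    }

    // Here the two ints are added first, then the sum is concatenated.
    static String numbersFirst() {
        return 2 + 2 + "string"; // "4string"
    }

    // Widening (int -> double) needs no cast.
    static double widen(int value) {
        double d = value;
        return d;
    }

    // Narrowing (long -> int) needs an explicit cast and keeps only the low 32 bits.
    static int narrow(long value) {
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(stringFirst());    // string22
        System.out.println(numbersFirst());   // 4string
        System.out.println(widen(3));         // 3.0
        System.out.println(narrow(1L << 40)); // 0
    }
}
```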

A small example of the language definition:
```
class.object = object java.lang.Object // define java class Object as the type object
class.string = string java.lang.String // define java class String as the type string
method.object.string = object string string toString() // define java class Object toString method as string for use on the object type
...
```
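
To make the whitelist idea concrete, here is a hypothetical Java sketch that reads only the `class.` lines of a definition in the format shown above and maps each script type name to a Java class. The format details (trailing `//` comments, whitespace-separated fields) are assumptions inferred from the example, and `method.` lines are skipped rather than interpreted.

```java
import java.util.HashMap;
import java.util.Map;

public class DefinitionSketch {
    // Parses "class.<key> = <typeName> <javaClass>" lines into typeName -> Class.
    // Trailing "//" comments are stripped; method lines and blank lines are skipped.
    static Map<String, Class<?>> parseClasses(String definition) {
        Map<String, Class<?>> whitelist = new HashMap<>();
        for (String raw : definition.split("\n")) {
            int comment = raw.indexOf("//");
            String line = (comment >= 0 ? raw.substring(0, comment) : raw).trim();
            if (!line.startsWith("class.")) {
                continue; // only class entries are handled in this sketch
            }
            String[] keyValue = line.split("=", 2);
            String[] parts = keyValue.length < 2 ? new String[0] : keyValue[1].trim().split("\\s+");
            if (parts.length < 2) {
                throw new IllegalArgumentException("malformed line: " + raw);
            }
            try {
                whitelist.put(parts[0], Class.forName(parts[1]));
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("unknown class: " + parts[1], e);
            }
        }
        return whitelist;
    }

    public static void main(String[] args) {
        String definition = String.join("\n",
                "class.object = object java.lang.Object // define java class Object as the type object",
                "class.string = string java.lang.String // define java class String as the type string",
                "method.object.string = object string string toString()");
        Map<String, Class<?>> whitelist = parseClasses(definition);
        System.out.println(whitelist.get("string").getName()); // java.lang.String
        System.out.println(whitelist.containsKey("thread"));   // false: never whitelisted
    }
}
```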

A small example of what a script will look like:

```
list nums = input["inner"]["list"];
int size = nums.size();
double total = 0;
for (int count = 0; count < size; ++count) {
    total += (double)nums[count];
}
return total;
```

where the automatically generated signature for the script is Object execute(Map<String, Object> input);
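
As an illustration of the "single static Java method" model, this is hand-written Java that does roughly what the script above does, using the quoted `Object execute(Map<String, Object> input)` signature. The language itself is planned to emit bytecode with ASM rather than compile Java source; `ScriptEquivalent` is a hypothetical name for this sketch.

```java
import java.util.List;
import java.util.Map;

public class ScriptEquivalent {
    // Roughly what the example script computes: sum the values in input["inner"]["list"].
    @SuppressWarnings("unchecked")
    public static Object execute(Map<String, Object> input) {
        Map<String, Object> inner = (Map<String, Object>) input.get("inner");
        List<Object> nums = (List<Object>) inner.get("list");
        int size = nums.size();
        double total = 0;
        for (int count = 0; count < size; ++count) {
            // Like the script's (double) cast, this fails if an element is not a Double.
            total += (Double) nums.get(count);
        }
        return total; // auto-boxed to Double for the Object return type
    }

    public static void main(String[] args) {
        Map<String, Object> input = Map.of("inner", Map.of("list", List.of(1.0, 2.5, 3.5)));
        System.out.println(execute(input)); // 7.0
    }
}
```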

Note that a script can be thought of as the body of a single static Java method. There is no way to define new functions/methods in a script; if that is necessary, scripting is probably not the best choice for the work in most cases. It may also be possible to add the ability to execute other scripts from the original script to make up for the lack of method calls, but this will likely not be included in the initial release.

jdconrad · 2015-10-19T17:44:54Z

Quick update:

@rmuir has added ES plugin logic for the prototype language. This week will be about adding tests and fixing bugs as they arise. Not much else to add for now.

jdconrad · 2015-10-21T07:25:41Z

I have removed all shortcuts for now to reduce the amount of debugging necessary for a first iteration. Shortcuts add a huge amount of complication and ambiguity to the language at this point in time. A second iteration somewhere down the road will aim to add shortcuts, dynamic method calls, and inferred casting. The main goal of this project right now is a simple language that is safe enough to run dynamic scripts in ES.

eskibars · 2015-10-26T17:27:40Z

I see "+=", what about ".="?

jdconrad · 2015-10-26T17:47:03Z

@eskibars To be clear, is .= for string concatenation? If so, we have decided to use ..= because the (.) operator may later be overloaded with an alternative shortcut for reading through maps/lists and will be needed for that. We do not want to use += because that creates some possible ambiguities of its own and is a math operator. (While this works in Java, some of the assumptions that need to be made may not be for the best in all situations.)

eskibars · 2015-10-26T20:50:58Z

@jdconrad yes, I was referring to string concatenation

damienalexandre · 2015-11-05T15:35:01Z

Great stuff! One issue I have with scripting in Elasticsearch is that it is really hard to test and debug a script. We need, IMO:

1. a way to play a script against any indexed document (maybe an API index/type/123/_script?)
2. a debug mode, maybe integrated in ?explain, to show:
   - the count of instructions (especially if you limit them)
   - all the available variables
   - the return value, unedited
3. a way to log directly to the Elasticsearch logs without playing with Java imports
4. better exceptions: if a script fails, we often get a QueryExec exception, but the actual scripting error is hidden in the stack.

Maybe this new (un-named?) scripting language could fix or at least improve those points :)

Cheers from Elastic{ON} Paris! 🍻

jdconrad · 2015-11-05T17:16:41Z

@damienalexandre Thanks for the feedback here. Points 1, 2, and 4 are all solid ideas, and hopefully somewhere down the road we will have time to spec and code some incarnation of them. For point 3, it's very unlikely that we will ever log anything from this language as we really don't want to write any files because it would mean that we have to open up security to allow this to happen.

clintongormley · 2016-01-28T12:59:10Z

Closed by #15136

jdconrad added the >feature and :Core/Infra/Scripting labels Aug 24, 2015

jdconrad self-assigned this Aug 24, 2015

clintongormley added the Meta label Aug 24, 2015

rjernst mentioned this issue Aug 24, 2015: Can't use complex params with lang=expression scripts? #13071 (closed)

jpountz mentioned this issue Sep 8, 2015: Should scripts have context? #13375 (closed)

clintongormley mentioned this issue Sep 19, 2015: Change default script language to expressions #10491 (closed)

clintongormley mentioned this issue Nov 21, 2015: Support RFC 6902 style PATCH updates #7030 (closed)

This was referenced Dec 5, 2015: Lua for scripting #9665 (closed); security manager for groovy? #9831 (closed)

jdconrad mentioned this issue Dec 9, 2015: Added a new scripting language (PlanA) #15136 (closed)

rashidkpc mentioned this issue Dec 10, 2015: Load script files from elasticsearch config/scripts directory elastic/kibana#3797 (closed)

spalger mentioned this issue Dec 18, 2015: Added scripted_metric in Visualize elastic/kibana#5558 (closed)

This was referenced Jan 18, 2016: script debugging #11648 (closed); Get function score query results as fields #13469 (closed)

jccq mentioned this issue Jan 26, 2016: Time Picker Smart Textbox Enhancement elastic/kibana#6009 (closed)

clintongormley closed this as completed Jan 28, 2016
