-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow painless to implement more interfaces #22983
Conversation
Generalizes two previously hard coded things in painless into generic concepts: 1. The "main method" is no longer hardcoded to: ``` public abstract Object execute(Map<String, Object> params, Scorer scorer, LeafDocLookup doc, Object value); ``` Instead Painless's compiler takes a functional interface and implements that. It looks like: ``` @FunctionalInterface public interface NoArgs { Object test(); } NoArgs script = scriptEngine.compile(NoArgs.class, null, "the_script_here", emptyMap()); ``` `PainlessScriptEngine` now compiles all scripts to the new `GenericElasticsearchScript` interface by default for compatibility with the rest of Elasticsearch until it is able to use this new ability. 2. `_score` and `ctx` are no longer hardcoded to be extracted from `#score` and `params` respectively. Instead the compiler has the concept of a `DerivedArgument` which is an argument to the script that is derived from the others, immutable, and only declared and set if used. Using it looks like: ``` ManyArgs script = scriptEngine.compile(ManyArgs.class, null, "a2", emptyMap(), new DerivedArgument(Definition.INT_TYPE, "a2", (writer, locals) -> { // final int a2 = 2 * a; Variable a = locals.getVariable(null, "a"); Variable a2 = locals.getVariable(null, "a2"); writer.push(2); writer.visitVarInsn(Opcodes.ILOAD, a.getSlot()); writer.math(GeneratorAdapter.MUL, Definition.INT_TYPE.type); writer.visitVarInsn(Opcodes.ISTORE, a2.getSlot()); })); assertEquals(2, script.test(1, 0, 0, 0)); ``` These are obviously fairly advanced to create but they allow us to remove hard coding elasticsearch specifics into Painless's interals. And they give us a clear path away from `GenericElasticsearchScript` when Elasticsearch is ready to be done with that.
I believe this should help with the "script context" concept for Elasticsearch. It is, at least, how I was envisioning it. Ulterior motive: one day I'd like to be able to use Painless as an embedded oriented language outside of Elasticsearch. One day. |
* The whitelisted type of the parameter. The default if unspecified is to look for an exact match for the type from the whitelist. If | ||
* there is no exact match then the compilation will fail. | ||
*/ | ||
String type() default "$infer$"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm torn here. Supporting $infer$
can be a little weird but it saves quite a bit of thought when you have an interface with a whitelisted type - especially for something like int
.
@Documented | ||
@Retention(RetentionPolicy.RUNTIME) | ||
@Target(ElementType.PARAMETER) | ||
public @interface Arg { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to make this an annotation so it was right next to that parameter on the method in the functional interface. I feel like it makes the interfaces much more readable. I chose a short name because you type it a lot when you have many arguments. I can certainly be convinced to go with a more obvious name like PainlessArg
or something.
@@ -43,7 +44,7 @@ | |||
/** | |||
* The maximum number of characters allowed in the script source. | |||
*/ | |||
static int MAXIMUM_SOURCE_LENGTH = 16384; | |||
static final int MAXIMUM_SOURCE_LENGTH = 16384; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was just an oversight so I fixed it while I was here.
|
||
try { | ||
Class<? extends Executable> clazz = loader.define(CLASS_NAME, root.getBytes()); | ||
java.lang.reflect.Constructor<? extends Executable> constructor = | ||
clazz.getConstructor(String.class, String.class, BitSet.class); | ||
|
||
return constructor.newInstance(name, source, root.getStatements()); | ||
return iface.cast(constructor.newInstance(name, source, root.getStatements())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is super convenient to just return the thing of the provided type.
|
||
DerivedArgument[] DERIVED_ARGUMENTS = new DerivedArgument[] { | ||
new DerivedArgument(Definition.DOUBLE_TYPE, "_score", (writer, locals) -> { | ||
// If _score is used in the script then run this before any user code: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are just lifted from SSource
. The only change is that the string are no longer constants in Locals because this isn't really part of Painless's core anymore.
return compile(GenericElasticsearchScript.class, scriptName, scriptSource, params, GenericElasticsearchScript.DERIVED_ARGUMENTS); | ||
} | ||
|
||
<T> T compile(Class<T> iface, String scriptName, final String scriptSource, final Map<String, String> params, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Package private because I want to test with this. At some point I think we should yank some of this code into another spot that isn't Elasticsearch specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that can totally wait.
return new Walker(sourceName, sourceText, settings, debugStream).source; | ||
public static SSource buildPainlessTree(MainMethod mainMethod, String sourceName, String sourceText, CompilerSettings settings, | ||
Printer debugStream) { | ||
return new Walker(mainMethod, sourceName, sourceText, settings, debugStream).source; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The walker needs to know stuff about the main method so it can figure out the variables in scope.
return Locals.MAIN_KEYWORDS.contains(name); | ||
} | ||
|
||
public boolean usesScore() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now part of usedDerivedArguments
.
|
||
void setMaxLoopCounter(int max); | ||
int getMaxLoopCounter(); | ||
} | ||
|
||
public static final class MainMethodReserved implements Reserved { | ||
private boolean score = false; | ||
private boolean ctx = false; | ||
private final Map<String, DerivedArgument> usedDerivedArguments = new LinkedHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we might change the order of the derived argument now - it used to be they'd be in the declared order. Now they are in the order of first used (as seen by walking the AST).
if (reserved.usesScore()) { | ||
// if the _score value is used, we do this once: | ||
// final double _score = scorer.score(); | ||
Variable scorer = mainMethod.getVariable(null, Locals.SCORER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to GenericElasticsearchScript
.
Hmmm.... I'm not really sure how either the annotation or |
Instead of implementing fancy derived arguments which require some asm knowledge, the script now has `getUsedVariables` which you can use to skip expensive computations required to build arguments to the painless script.
…ipts Also, it doesn't have the execute method on it any more....
* Removes the PainlessScript interface and use an abstract base class instead. * Remove the NeedsScript marker interface. Now that we have `getUsedVariables()` we don't need it.
@jdconrad - I had another go at this this morning. I removed the need for |
1 similar comment
@jdconrad - I had another go at this this morning. I removed the need for |
Notes from talking to @jdconrad and @rjernst:
|
Now we use the uses$argName methods to get that information.
That means we no longer have to make compilation metadata available outside of Painless at all.
This is actually failure complicated to get right and I'd kind of like to delay that until the next PR.... |
No need to leak it.
@rjernst and @jdconrad, I think this is ready for another look. I did almost all the things we talked about. I skipped whitelisting I also moved the exception handling from |
I'll try to take a look today. |
Thanks! I realized while I was out today that this still isn't fixed for |
Updated the PR description to be accurate for what the PR now implements. I'm keeping the (inaccurate) name for now though. |
Why does this work, actually?
@nik9000 Just want you to know that I'm not ignoring this review. I just want some more time to think about it again because it is pretty fundamental, and I don't want to deal with any backcompat issues down the road here. |
++ Fine by me. I found something else to have a look at in the mean time. I think it is safe. Whether or not it is what we want to do is another matter. I'm certainly willing to rework as needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left a few comments.
/** | ||
* Marker interface that a generated {@link Executable} uses the {@code _score} value | ||
* Generic script interface that all scripts that elasticsearch uses implement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this comment be modified until this class is moved out to scripting in ES?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment is accurate until we get contexts. Maybe it should read "Generic script interface that Painless implements for all Elasticsearch scripts." or something like that. One day we'll be able to remove this entirely. I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems good just for now at least.
/** | ||
* Metadata about the script. | ||
*/ | ||
public static class ScriptMetadata { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave this like how it was before without this extra internal class. It's not actually used in any new places outside of Painless from what I can tell (maybe I missed something), so these things are literally just part of the PainlessScript still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move it back. When I had this implementing arbitrarily named methods it made sense to try and minimize the number of methods on the superclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
@@ -324,6 +315,38 @@ void write(MethodWriter writer, Globals globals) { | |||
writer.visitInsn(Opcodes.ACONST_NULL); | |||
writer.returnValue(); | |||
} | |||
|
|||
writer.mark(endTry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason for making this part of the ASM because it makes it much more difficult to read about the types of exceptions we catch, especially if anyone not familiar with this project has to do some weekend/on-call debugging? As far as I can tell this was simply cut from the previous location it was at. However, if the reason for this is because we want all implementations to use this, I don't think this is the correct approach. I think we want to this to remain on a case by case basis especially if this projects ends up outside of ES.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did move it out for the reason you mentioned. The error messages become very difficult to make sense of without the exception interpretation. I agree that it is unfortunate that this references a class in Elasticsearch core, but I think that is a thing we can fix later.
The other thing I like about this is that it makes the internals required to generate the right exception not publicly facing.
I'm happy to back it out propose it separately. Without it the exceptions that don't go through ScriptImpl are wonky though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely conflicted here; you make good arguments. I would just leave this as you have it for now then and it's something we will have to address when we split off down the road. Maybe users pick the exceptions that should be caught as part of a script, but that's a discussion for a different PR for sure.
Generalizes three previously hard coded things in painless into generic concepts: 1. The "main method" is no longer hardcoded to: ``` public abstract Object execute(Map<String, Object> params, Scorer scorer, LeafDocLookup doc, Object value); ``` Instead Painless's compiler takes an interface and implements it. It looks like: ``` public interface SomeScript { // Argument names we expose to Painless scripts String[] ARGUMENTS = new String[] {"a", "b"}; // Method implemented by Painless script. Must be named execute but can have any parameters or return any value. Object execute(String a, int b); // Is the "a" argument used by the script? boolean uses$a(); } SomeScript script = scriptEngine.compile(SomeScript.class, null, "the_script_here", emptyMap()); Object result = script.execute("a", 1); ``` `PainlessScriptEngine` now compiles all scripts to the new `GenericElasticsearchScript` interface by default for compatibility with the rest of Elasticsearch until it is able to use this new ability. 2. `_score` and `ctx` are no longer hardcoded to be extracted from `#score` and `params` respectively. Instead Painless's default implementation of Elasticsearch scripts uses the `uses$_score` and `uses$ctx` methods to determine if it is used and gives them dummy values if they are not used. 3. Throwing the `ScriptException` is now handled by the Painless script itself. That way Painless doesn't have to leak the metadata that is required to build the fancy stack trace. And all painless scripts get the fancy stack trace.
) Fixes Painless to properly implement scripts that return primitives and void. Adds some simple tests that we emit sane opcodes and some other tests that we implement primitives as expected. Mostly this is just a fix following up from #22983 but there is one thing I did really worth talking about, I think. So, before this script Painless scripts could only ever return Object and they did would always return null for paths that didn't return any values. Now that they can return primitives the question is "what should Painless return from paths that don't return any values?" And I answered that with "whatever the JLS default value is". So 0/0L/0f/0d/false.
) Fixes Painless to properly implement scripts that return primitives and void. Adds some simple tests that we emit sane opcodes and some other tests that we implement primitives as expected. Mostly this is just a fix following up from #22983 but there is one thing I did really worth talking about, I think. So, before this script Painless scripts could only ever return Object and they did would always return null for paths that didn't return any values. Now that they can return primitives the question is "what should Painless return from paths that don't return any values?" And I answered that with "whatever the JLS default value is". So 0/0L/0f/0d/false.
Generalizes two previously hard coded things in painless into
generic concepts:
Instead Painless's compiler takes an and implements that. It looks like:
PainlessScriptEngine
now compiles all scripts to the newGenericElasticsearchScript
interface by default for compatibilitywith the rest of Elasticsearch until it is able to use this new
ability.
_score
andctx
are no longer hardcoded to be extracted from#score
andparams
respectively. Instead Painless's defaultimplementation of Elasticsearch scripts uses the
uses$_score
anduses$ctx
methods to determine if it is used and gives themdummy values if they are not used.
Throwing the
ScriptException
is now handled by the Painlessscript itself. That way Painless doesn't have to leak the metadata
that is required to build the fancy stack trace. And all painless scripts
get the fancy stack trace.