Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate elasticsearch native script examples to the main repo #19334

Conversation

imotov
Copy link
Contributor

@imotov imotov commented Jul 8, 2016

I have been maintaining a separate project that demonstrates how to build different native scripts for elasticsearch-native-script-example. I think it might make sense to add this project to the main repo similarly to jvm-example.

Closes #14662

imotov and others added 30 commits February 12, 2013 15:54
Implemented Function Score Query model
Switeched to 0.90.7
- use ElasticSearchIntegrationTest
- use ElasticsearchAssertions
- index random
Check isEmpty() before casting to Longs or Strings. If a segment does
not conatain any document with the looked up field, the lookup will
return an instance of Empty instead of Longs/Strings that contains no
value. The search will then fail with a ClassCastException.
The doc rec-0 got the same score as rec5-9 so rec0 cannot
be assumed to be at osition 5 when retrieving result.
Adding 1 to the "number" field fixes that.
@imotov imotov added review :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache v5.0.0-alpha5 labels Jul 8, 2016
try {
float score = 0;
// first, get the ShardTerms object for the field.
IndexField indexField = this.indexLookup().get(field);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be promoting using this feature of scripting. If someone wants this level of control over scores, they should implement their own Similarity; that is its purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brwe what do you think? Does it still make sense to have it here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. The example was never meant to "promote" scripts over custom similarities. IndexLookup makes it easy to test different methods before writing an actual plugin and this example shows how. However, we can add a "don't try this in production" warning.

Copy link
Member

@rjernst rjernst Jul 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should even have IndexLookup. It is no more work to write and use a custom similarity than to write and use a native script, and the point of similarity is for customizing exactly this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Not sure about the custom similarity part. For you maybe :) The whole purpose of IndexLookup was to have an easy to use api for access to term stats. I found it very convenient.
Whether or not we should have IndexLookup is a different discussion because it can be used in other contexts too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @rjernst we should not promote anything slow. the right way to do this is a similarity. that is the extension point to do expert stuff like this. I think in hindsight adding this index lookup was a mistake and we should rethink it if possible. maybe painless offers something that is bridging between the functionality and a custom similarity?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened an issue for better visibility of this discussion: #19359

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#19359 will take more time to discuss (so far no reaction at all) and it makes no sense to stall this pr on it. We can remove the examples for now and if we find we keep IndexLookup anyway add them again.

@imotov
Copy link
Contributor Author

imotov commented Jul 18, 2016

@rjernst I removed all examples of using indexLookup() from this PR. Could you take another look?

@imotov
Copy link
Contributor Author

imotov commented Aug 10, 2016

@rjernst any chance you can review it?


h3. Is Prime Native Script

p. One of the example scripts in this project is the "is_prime" script that can be used to check if a field contains a possible prime number. The script accepts two parameters @field@ and @certainty@. The @field@ parameter contains the name of the field that needs to be checked and the @certainty@ parameter specifies a measure of the uncertainty that the caller is willing to tolerate. The script returns @true@ if the field contains a probable prime number and @false@ otherwise. The probability that the number for which the script returned @true@ is prime exceeds (1 - 0.5^certainty). The script can be used in "Script Filter":http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html as well as a "Script Field":http://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html. The implementation of the "is_prime" native script and it's factory can be found in the "IsPrimeSearchScript":https://github.com/elastic/elasticsearch/blob/master/plugins/native-script-example/src/main/java/org/elasticsearch/examples/nativescript/script/IsPrimeSearchScript.java class.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to have documentation about the examples on the script classes themselves?

@rjernst
Copy link
Member

rjernst commented Aug 11, 2016

@imotov I left some comments. A general concern that I have is we should not really be showing these simple examples, because then users will find the examples and copy/paste them, when in fact these should be done in normal scripts (with painless or expressions). The only case I see for native scripts is when someone needs to call some custom code, but that should be extremely rare and advanced (which would be fine to have examples for, but then show an example of calling custom code, not trivial examples doing things that can be done in painless). Everything else should be done through painless examples. Another concern I have is someone thinking that copying these native scripts will be faster because they are "native" (a general problem with native scripts altogether). I think there is a misbelief that native scripts are always faster. In fact, the current scripting api in how they access fields can cause them to be slower than eg expressions.

My comments are only a general caution for us to think about and discuss; there is nothing generally wrong with the gradle side or anything like that, it is only apprehension about them in general, now that I have gone through all of the examples (I stopped in my first review once I saw the index stats examples).

@imotov
Copy link
Contributor Author

imotov commented Aug 11, 2016

These examples are simple because their purpose was to show how to interact with elasticsearch and not how to call non-trivial custom code. I think they are useful because they are simple. If we make them complex and heavy by pulling some large 3rd party dependencies it will confuse users who will follow these examples. I think pulling this code out, compiling it, installing a plugin, rebuilding the plugin every time elasticsearch version changes are good enough deterrents that would convince any reasonable user to use painless instead of maintaining native plugins unless they really have to.

However, I understand your concern about sending a wrong message by placing these example into the main repo and I am fine with continuing to maintain these examples as a separate project in my spare time. It would be great if we could decide if we want this in or not rather sooner than later, though.

@rjernst
Copy link
Member

rjernst commented Aug 11, 2016

I don't think we need to pull in a separate library to show an example, but there could be some dummy function being called which is documented as "do something special that cannot be done in normal scripts", and also comments in the readme/docs that recommend first trying to build scripts with painless/expressions, and only resorting to native scripts if their is some complex code which must be called that cannot be done inside painless. I had thought BigInteger was available inside painless, but it looks like not, so perhaps that is a sufficient example for now. But the scripted metric agg?

At minimum, if we are going to merge this (I'm on the fence personally, I would like to hear opinions from others), we should have the readme/docs warn that these should only be used in rare circumstances.

@jdconrad
Copy link
Contributor

jdconrad commented Aug 11, 2016

I understand the desire to make simple examples for native scripts to just show the structure; however, I agree with @rjernst here that this may give the wrong impression.

@rmuir and I went through a bunch of scripting samples from actual users pulled from tickets, and I would wager 75% or more of them were almost direct copies from the examples. I really feel like these scripts should show off something that can't be done via some other scripting language, Expressions, Painless, Groovy, Javascript, Python, etc, otherwise a user may be lead to believe he/she needs to do this for something that could be easily taken care of with another language. This is an advanced feature and advanced examples should be okay here.

Our documentation doesn't lend itself to easily figure out that another language could be used should a user Google to native scripts documentation directly. This should likely also be fixed, but it also does emphasize the existing problem that a user will copy an example for a native script when it's not necessary.

@rjernst
Copy link
Member

rjernst commented Aug 11, 2016

@imotov I wonder if we should consider removing native scripts altogether? But we could still have custom java scripts, it would just be examples of how to build a script engine. Eg, making NativeScriptEngineService the example we have? It's not really any different amount of setup code, it would just be different, and it could be very clear in the docs, as well as just the naming (building a script engine, vs having a "native script" which sounds so much more lightweight) that this is a heavyweight solution for advanced use cases. Just a thought...

@imotov
Copy link
Contributor Author

imotov commented Aug 11, 2016

First of all, I agree with a need of a large disclaimer, saying that in most cases, users should use painless.

@rmuir and I went through a bunch of scripting samples from actual users pulled from tickets, and I would wager 75% or more of them were almost direct copies from the examples.

@jdconrad Are these native scripts or just script samples? In either case, do you think there is a bit of selection bias with this sampling? It seems to me that users are more likely to ask questions about scripts when they just start working with them and they are more likely to start with provided examples. Moreover, they might be reluctant of posting large internal scripts on public forums.

@imotov I wonder if we should consider removing native scripts altogether?

All script that were added to the project were added in response to needs of concrete users who needed to have a script that plays a particular role or uses a certain technique. If it was just about writing a native script this plugin would have contained only one script. We can remove native scripts and replace them with a single NativeScriptEngineService (native scrip are now really toothless in 5.0 anyway) but it wouldn't change the need for different scripts used in different contexts. So, I am not sure how it would help.

@jdconrad
Copy link
Contributor

jdconrad commented Aug 11, 2016

@imotov I completely agree that there is a likely selection bias here, but I would speculate these are the users that are the most vulnerable to falling into the trap of following a native script example without understanding there may be a better option available to them.

As a side note, out of a personal curiosity, are there any examples of real users native scripts that you have available? I'd just like to see what people are using them for in case there are features that we could possibly add to Painless.

@imotov
Copy link
Contributor Author

imotov commented Aug 11, 2016

I would speculate these are the users that are the most vulnerable to falling into the trap of following a native script example without understanding there may be a better option available to them.

I disagree with this statement because I think we made it hard enough for users to fall into this trap, but I cannot offer anything better than my contra-speculation :)

@rjernst
Copy link
Member

rjernst commented Aug 11, 2016

The current examples I see here are:

  1. isPrime
  2. adding popularity to scoring
  3. adding randomness to scoring
  4. using scripted_metric

Of these, the 2 and 3 shouldn't be done in native scripts: even if someone asked for how to do it, they should be pointed to existing docs we have on using an expression script for adding popularity, and a function score to add randomness. For scripted_metric, I think anyone that can understand how scripted metrics work (with multiple scripts) would be trivially able to use it from any scripting language (whether that be with native scripts or not, it is just how the script is referenced in the scripted metric request). But the example here is trival and I would not want someone to find this example and be doing scripted metrics with native scripts.

We can remove native scripts and replace them with a single NativeScriptEngineService (native scrip are now really toothless in 5.0 anyway) but it wouldn't change the need for different scripts used in different contexts. So, I am not sure how it would help.

I think it would help in the naming at least, and understanding that these are heavyweight things (an engine) instead of something lightweight (a script).

@imotov
Copy link
Contributor Author

imotov commented Aug 11, 2016

Yep, you are right. It's kind of pointless with all the interesting stuff that used index lookup being removed. Thanks for the review!

@imotov imotov closed this Aug 11, 2016
@imotov imotov deleted the issue-14662-migrate-elasticsearch-native-script-example-to-main-repo branch May 1, 2020 22:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement v5.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants