Migrate elasticsearch native script examples to the main repo #19334

imotov · 2016-07-08T13:48:07Z

I have been maintaining a separate project, elasticsearch-native-script-example, that demonstrates how to build different native scripts for Elasticsearch. I think it might make sense to add this project to the main repo, similarly to jvm-example.

Closes #14662

Check isEmpty() before casting to Longs or Strings. If a segment does not contain any document with the looked-up field, the lookup returns an instance of Empty instead of a Longs/Strings instance with no values, and the search then fails with a ClassCastException.


rjernst · 2016-07-08T16:35:58Z

...rc/main/java/org/elasticsearch/examples/nativescript/script/CosineSimilarityScoreScript.java

+        try {
+            float score = 0;
+            // first, get the ShardTerms object for the field.
+            IndexField indexField = this.indexLookup().get(field);


I don't think we should be promoting using this feature of scripting. If someone wants this level of control over scores, they should implement their own Similarity; that is its purpose.
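The CosineSimilarityScoreScript body is cut off above, so its exact weighting scheme is not shown. For context, the quantity its name refers to is the standard cosine similarity, dot(q, d) / (|q| * |d|), between a query vector and a document vector. The plain-Java sketch below computes it over sparse term-weight maps; the class name `CosineSketch` and the map-based representation are assumptions for illustration. It uses neither IndexLookup nor Lucene's Similarity API, which, per the comment above, is the intended extension point for this kind of scoring.

```java
import java.util.Map;

/** Hypothetical helper: cosine similarity over sparse term -> weight vectors. */
final class CosineSketch {

    private CosineSketch() {}

    /** Returns dot(a, b) / (|a| * |b|), or 0.0 if either vector has zero norm. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        // Only terms present in both vectors contribute to the dot product.
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // avoid dividing by zero for empty or all-zero vectors
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }
}
```

In a real deployment, the per-term weights would come from index statistics, which is exactly the part a custom Similarity gets from Lucene directly.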

imotov · 2016-07-08

@brwe what do you think? Does it still make sense to have it here?

brwe · 2016-07-09

It does. The example was never meant to "promote" scripts over custom similarities. IndexLookup makes it easy to test different methods before writing an actual plugin, and this example shows how. However, we can add a "don't try this in production" warning.

rjernst · 2016-07-09

I don't think we should even have IndexLookup. It is no more work to write and use a custom similarity than to write and use a native script, and the point of similarity is for customizing exactly this.

brwe · 2016-07-09

Hm. Not sure about the custom similarity part. For you maybe :) The whole purpose of IndexLookup was to have an easy-to-use API for accessing term stats. I found it very convenient.
Whether or not we should have IndexLookup is a different discussion, because it can be used in other contexts too.

s1monw · 2016-07-09

I agree with @rjernst: we should not promote anything slow. The right way to do this is a similarity; that is the extension point for expert stuff like this. In hindsight, I think adding this index lookup was a mistake, and we should rethink it if possible. Maybe Painless offers something that bridges the gap between this functionality and a custom similarity?

brwe · 2016-07-11

I opened an issue for better visibility of this discussion: #19359

brwe · 2016-07-12

#19359 will take more time to discuss (so far there has been no reaction at all), and it makes no sense to stall this PR on it. We can remove the examples for now and add them back if we decide to keep IndexLookup.

imotov · 2016-07-18T15:14:49Z

@rjernst I removed all examples of using indexLookup() from this PR. Could you take another look?

imotov · 2016-08-10T16:10:11Z

@rjernst any chance you can review it?

rjernst · 2016-08-11T20:00:20Z

plugins/native-script-example/README.textile

+
+h3. Is Prime Native Script
+
+p. One of the example scripts in this project is the "is_prime" script that can be used to check if a field contains a probable prime number. The script accepts two parameters: @field@ and @certainty@. The @field@ parameter contains the name of the field that needs to be checked, and the @certainty@ parameter specifies a measure of the uncertainty that the caller is willing to tolerate. The script returns @true@ if the field contains a probable prime number and @false@ otherwise. The probability that a number for which the script returned @true@ is prime exceeds (1 - 0.5^certainty). The script can be used in a "Script Filter":http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html as well as a "Script Field":http://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html. The implementation of the "is_prime" native script and its factory can be found in the "IsPrimeSearchScript":https://github.com/elastic/elasticsearch/blob/master/plugins/native-script-example/src/main/java/org/elasticsearch/examples/nativescript/script/IsPrimeSearchScript.java class.


Wouldn't it be better to document the examples on the script classes themselves?
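The (1 - 0.5^certainty) bound in the "is_prime" description matches the contract of Java's `BigInteger.isProbablePrime(int certainty)`. A minimal sketch of the core check, documented on the class itself and assuming the script relies on that method, could look like this. The class name `IsPrimeSketch` and its treatment of missing, non-numeric, and sub-2 values are hypothetical, and the Elasticsearch plumbing (factory, field access) that the README attributes to `IsPrimeSearchScript` is deliberately left out.

```java
import java.math.BigInteger;

/**
 * Hypothetical helper sketching the check an "is_prime"-style script performs
 * on a single field value. It does not use the Elasticsearch scripting API.
 */
final class IsPrimeSketch {

    private IsPrimeSketch() {}

    /**
     * Returns true if value is probably prime. When it returns true, the
     * probability that value is actually prime exceeds 1 - 0.5^certainty;
     * false means value is definitely not prime. Note that, as with
     * BigInteger.isProbablePrime, a certainty of 0 or less yields true for
     * every value of 2 or more.
     */
    static boolean isProbablePrime(Object value, int certainty) {
        if (!(value instanceof Number)) {
            return false; // missing (null) or non-numeric field value
        }
        // Fractional values are truncated toward zero by longValue().
        long n = ((Number) value).longValue();
        if (n < 2) {
            return false; // 0, 1 and negative numbers are not prime
        }
        return BigInteger.valueOf(n).isProbablePrime(certainty);
    }
}
```

Wiring this into a search script would amount to reading the @field@ value for the current document and passing it, together with @certainty@, to the helper.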

rjernst · 2016-08-11T20:27:26Z

@imotov I left some comments. A general concern I have is that we should not really be showing these simple examples, because users will find them and copy/paste them, when in fact these things should be done in normal scripts (with Painless or expressions). The only case I see for native scripts is when someone needs to call some custom code, but that should be extremely rare and advanced (which would be fine to have examples for, but then the examples should show calling custom code, not trivial things that can be done in Painless). Everything else should be done through Painless examples. Another concern I have is someone thinking that copying these native scripts will be faster because they are "native" (a general problem with native scripts altogether). I think there is a misconception that native scripts are always faster. In fact, the way the current scripting API accesses fields can make them slower than, e.g., expressions.

My comments are only a general caution for us to think about and discuss; there is nothing generally wrong with the gradle side or anything like that, it is only apprehension about them in general, now that I have gone through all of the examples (I stopped in my first review once I saw the index stats examples).

imotov · 2016-08-11T21:02:25Z

These examples are simple because their purpose was to show how to interact with Elasticsearch, not how to call non-trivial custom code. I think they are useful because they are simple. If we make them complex and heavy by pulling in some large 3rd-party dependencies, it will confuse users who follow these examples. I think pulling this code out, compiling it, installing a plugin, and rebuilding the plugin every time the Elasticsearch version changes are deterrents enough to convince any reasonable user to use Painless instead of maintaining native plugins unless they really have to.

However, I understand your concern about sending the wrong message by placing these examples into the main repo, and I am fine with continuing to maintain them as a separate project in my spare time. It would be great if we could decide whether we want this in or not sooner rather than later, though.

rjernst · 2016-08-11T21:16:53Z

I don't think we need to pull in a separate library to show an example, but there could be some dummy function being called which is documented as "do something special that cannot be done in normal scripts", and also comments in the readme/docs that recommend first trying to build scripts with Painless/expressions, and only resorting to native scripts if there is some complex code which must be called that cannot be done inside Painless. I had thought BigInteger was available inside Painless, but it looks like it isn't, so perhaps that is a sufficient example for now. But the scripted metric agg?

At minimum, if we are going to merge this (I'm on the fence personally, I would like to hear opinions from others), we should have the readme/docs warn that these should only be used in rare circumstances.

jdconrad · 2016-08-11T21:25:52Z

I understand the desire to make simple examples for native scripts to just show the structure; however, I agree with @rjernst here that this may give the wrong impression.

@rmuir and I went through a bunch of scripting samples from actual users pulled from tickets, and I would wager 75% or more of them were almost direct copies of the examples. I really feel like these scripts should show off something that can't be done via some other scripting language (Expressions, Painless, Groovy, JavaScript, Python, etc.); otherwise a user may be led to believe he/she needs to do this for something that could easily be taken care of with another language. This is an advanced feature, and advanced examples should be okay here.

Our documentation doesn't make it easy to figure out that another language could be used if a user Googles their way directly to the native scripts documentation. This should likely also be fixed, but it also emphasizes the existing problem that a user will copy an example for a native script when it's not necessary.

rjernst · 2016-08-11T21:31:18Z

@imotov I wonder if we should consider removing native scripts altogether? But we could still have custom java scripts, it would just be examples of how to build a script engine. Eg, making NativeScriptEngineService the example we have? It's not really any different amount of setup code, it would just be different, and it could be very clear in the docs, as well as just the naming (building a script engine, vs having a "native script" which sounds so much more lightweight) that this is a heavyweight solution for advanced use cases. Just a thought...

imotov · 2016-08-11T21:55:07Z

First of all, I agree with the need for a large disclaimer saying that, in most cases, users should use Painless.

> @rmuir and I went through a bunch of scripting samples from actual users pulled from tickets, and I would wager 75% or more of them were almost direct copies from the examples.

@jdconrad Are these native scripts or just script samples? In either case, do you think there is a bit of selection bias in this sampling? It seems to me that users are more likely to ask questions about scripts when they are just starting to work with them, and they are more likely to start with the provided examples. Moreover, they might be reluctant to post large internal scripts on public forums.

> @imotov I wonder if we should consider removing native scripts altogether?

All scripts that were added to the project were added in response to the needs of concrete users who needed a script that plays a particular role or uses a certain technique. If it were just about writing a native script, this plugin would have contained only one script. We can remove native scripts and replace them with a single NativeScriptEngineService (native scripts are now really toothless in 5.0 anyway), but it wouldn't change the need for different scripts used in different contexts. So, I am not sure how it would help.

jdconrad · 2016-08-11T21:59:22Z

@imotov I completely agree that there is a likely selection bias here, but I would speculate these are the users that are the most vulnerable to falling into the trap of following a native script example without understanding there may be a better option available to them.

As a side note, out of personal curiosity, are there any examples of real users' native scripts that you have available? I'd just like to see what people are using them for, in case there are features that we could add to Painless.

imotov · 2016-08-11T22:05:06Z

> I would speculate these are the users that are the most vulnerable to falling into the trap of following a native script example without understanding there may be a better option available to them.

I disagree with this statement because I think we made it hard enough for users to fall into this trap, but I cannot offer anything better than my contra-speculation :)

rjernst · 2016-08-11T22:05:06Z

The current examples I see here are:

1. isPrime
2. adding popularity to scoring
3. adding randomness to scoring
4. using scripted_metric

Of these, 2 and 3 shouldn't be done in native scripts: even if someone asked how to do it, they should be pointed to the existing docs we have on using an expression script for adding popularity, and a function score for adding randomness. For scripted_metric, I think anyone who can understand how scripted metrics work (with multiple scripts) would be trivially able to use it from any scripting language (whether that be native scripts or not, it is just how the script is referenced in the scripted metric request). But the example here is trivial, and I would not want someone to find this example and start doing scripted metrics with native scripts.
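The "multiple scripts" in a scripted_metric aggregation are its phases: an init script runs once per shard before documents are collected, a map script runs once per matching document, a combine script runs once per shard to produce a partial result, and a reduce script runs once on the coordinating node over all partial results. The plain-Java simulation below sums a numeric field to show that data flow; `ScriptedMetricSketch`, its method names, and the long[][] "shards" input are hypothetical stand-ins, not the Elasticsearch API, in which each phase is supplied as a separate script (init_script, map_script, combine_script, reduce_script) in the request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of the four scripted_metric phases, summing a field. */
final class ScriptedMetricSketch {

    private ScriptedMetricSketch() {}

    /** Per-shard state shared by the init, map and combine phases. */
    static final class ShardState {
        final List<Long> values = new ArrayList<>();
    }

    /** "init" phase: runs once per shard before any document is collected. */
    static ShardState init() {
        return new ShardState();
    }

    /** "map" phase: runs once for each matching document on the shard. */
    static void map(ShardState state, long fieldValue) {
        state.values.add(fieldValue);
    }

    /** "combine" phase: runs once per shard and returns that shard's partial result. */
    static long combine(ShardState state) {
        long sum = 0;
        for (long v : state.values) {
            sum += v;
        }
        return sum;
    }

    /** "reduce" phase: runs once on the coordinating node over all shard results. */
    static long reduce(List<Long> shardResults) {
        long total = 0;
        for (long partial : shardResults) {
            total += partial;
        }
        return total;
    }

    /** Simulates one request; each inner array holds one shard's matching field values. */
    static long run(long[][] shards) {
        List<Long> partials = new ArrayList<>();
        for (long[] docs : shards) {
            ShardState state = init();
            for (long v : docs) {
                map(state, v);
            }
            partials.add(combine(state));
        }
        return reduce(partials);
    }
}
```

Since the phase logic is the same whichever scripting language implements it, a simple sum like this is exactly the kind of job that needs no native code.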

> We can remove native scripts and replace them with a single NativeScriptEngineService (native scripts are now really toothless in 5.0 anyway) but it wouldn't change the need for different scripts used in different contexts. So, I am not sure how it would help.

I think it would help in the naming at least, and understanding that these are heavyweight things (an engine) instead of something lightweight (a script).

imotov · 2016-08-11T22:07:37Z

Yep, you are right. It's kind of pointless with all the interesting stuff that used index lookup being removed. Thanks for the review!

imotov and others added 30 commits February 12, 2013 15:54

first commit

fc57847

Fix link in README

4dc7b46

Upgrade to elasticsearch 0.20.5

4e8d053

Add check for null parameters

b15f57f

Add an example of injecting client into native script

ecb5cc5

Add license

418c4c8

Add Lookup Script description

9525fa5

Make sure that the local shard will be used for the lookup operation …

c9629a1

…if it's allocated on the local node

Add random sort order native script

25b1565

Add an example of using logger in a native script

92c4950

Add a curl example for lookup script

f90c8b0

Add random sort script description

cb8fcc5

Add a curl example for random sort script

9ab9fb1

Make sure that standard naming convention is used for plugin file

211f267

Release v1.0.0

2b184f3

Move to 1.1.0-SNAPSHOT

e5d10c7

Release v1.1.0

210aab0

Move to 1.1.0-SNAPSHOT

574fd62

Change node settings to avoid interference with other clusters

803337a

Add example of a custom score script

d12bfa7

Small cosmetic changes

f280b76

Fix typo in index settings

a7f8b4d

Add an example of popularity script

1e2da9c

Upgrade to elasticsearch v0.90.5

eed5d53

Function Score Query

1e3c365

Implemented Function Score Query model; switched to 0.90.7

Fix elasticsearch.yml

0420b7a

use elasticsearch version 0.90.8 and move from testNG to junit

d472879

- use ElasticSearchIntegrationTest
- use ElasticsearchAssertions
- index random

fix popularity test

4d774aa

The doc rec-0 got the same score as rec5-9, so rec-0 cannot be assumed to be at position 5 when retrieving results. Adding 1 to the "number" field fixes that.

Add -s flag to curl requests in the sample scripts

a6db7f9

imotov added the review, :Core/Infra/Scripting, and v5.0.0-alpha5 labels Jul 8, 2016

rjernst reviewed Jul 8, 2016

Remove all scripts that demonstrate use of indexLookup()

3c7ea4c

imotov mentioned this pull request Jul 20, 2016

Confused about versioning imotov/elasticsearch-native-script-example#24

Closed

clintongormley added v5.0.0-beta1 and removed v5.0.0-alpha5 labels Jul 28, 2016

rjernst reviewed Aug 11, 2016
View reviewed changes

imotov closed this Aug 11, 2016

imotov mentioned this pull request Aug 11, 2016

Migrate elasticsearch-native-script-example to the main repo #14662

Closed

imotov deleted the issue-14662-migrate-elasticsearch-native-script-example-to-main-repo branch May 1, 2020 22:14

imotov commented Jul 8, 2016

rjernst Jul 8, 2016

imotov Jul 8, 2016

brwe Jul 9, 2016

rjernst Jul 9, 2016 (edited)

brwe Jul 9, 2016

s1monw Jul 9, 2016

brwe Jul 11, 2016

brwe Jul 12, 2016
