Remove script access to term statistics #19462

rjernst · 2016-07-15T22:05:09Z

In scripts (at least some of the languages), the terms dictionary and
postings can be accessed with the special _index variable. This is for
very advanced use cases which want to do their own scoring. The problem
is that segment-level statistics must be recomputed for every document.
Additionally, this is not friendly to the terms index caching, as the
order of looking up terms should be controlled by Lucene.

This change removes _index from scripts. Anyone using it can and should
instead write a Similarity plugin, which is explicitly designed to allow
doing the calculations needed for a relevance score.

closes #19359

clintongormley · 2016-07-17T18:44:21Z

You should also remove the docs https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-advanced-scripting.html

Please could you change the title to something more meaningful, such as "Remove script access to term statistics"

jpountz · 2016-07-18T07:22:18Z

The code changes LGTM

rjernst · 2016-07-18T07:28:22Z

@clintongormley I removed those docs and updated the title as you suggested.

clintongormley · 2016-07-18T08:08:58Z

There's also a mention and link which you'll need to remove in this section: https://github.com/elastic/elasticsearch/blob/master/docs/reference/modules/scripting/fields.asciidoc#search-and-aggregation-scripts

Would it be possible to add the appropriate deprecation logging in 2.4.0?

rjernst · 2016-07-18T08:12:45Z

> Would it be possible to add the appropriate deprecation logging in 2.4.0?

I'm not sure how to do that without creating potentially very large logs. We only know that _index is accessed while a script is running, because the access happens inside the script itself. So, for example, if a script runs on a million docs, you would get a million deprecation messages?

clintongormley · 2016-07-18T08:14:29Z

Deprecation logging is off by default. But yes, I see what you mean. I wonder if we should be rate-limiting duplicate messages in the deprecation log infra itself.
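As a rough illustration of what handling duplicates in the deprecation log infra could look like, here is a minimal sketch. It assumes a hypothetical `DedupingDeprecationLog` wrapper with a `deprecated(key, message)` method and a plain `Consumer<String>` as the sink; none of these names are existing Elasticsearch APIs. Note that it suppresses repeats of a key outright rather than limiting them per time window.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical wrapper (not an Elasticsearch class): emits each deprecation key at most once. */
class DedupingDeprecationLog {
    // Thread-safe set of keys that have already produced a message.
    private final Set<String> seenKeys = ConcurrentHashMap.newKeySet();
    // Stand-in for the real deprecation logger.
    private final Consumer<String> sink;

    DedupingDeprecationLog(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Logs the message only the first time a key is seen; returns true if it was logged. */
    boolean deprecated(String key, String message) {
        if (seenKeys.add(key)) { // add() returns false when the key is already present
            sink.accept(message);
            return true;
        }
        return false;
    }
}
```

With this shape, a script that touches `_index` on a million docs would produce one message per key. The set grows without bound in this sketch, so a real implementation would probably want a size cap or expiry.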

jpountz · 2016-07-18T08:17:15Z

Another way would be to do something like

if (logged == false) {
  // log message
  logged = true;
}

in every method of LeafIndexLookup in order to have one message per segment, which would make the volume lower.
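Spelled out a little more, the once-per-instance guard could look like the sketch below. `OncePerSegmentLookup` is a hypothetical stand-in for LeafIndexLookup, and its two accessor methods, the message text, and the `Consumer<String>` sink are assumptions made for illustration, not the real class or logger.

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for a per-segment lookup object such as LeafIndexLookup. */
class OncePerSegmentLookup {
    // Stand-in for the deprecation logger.
    private final Consumer<String> deprecationSink;
    // One flag per instance, so at most one message per segment-level lookup.
    private boolean logged = false;

    OncePerSegmentLookup(Consumer<String> deprecationSink) {
        this.deprecationSink = deprecationSink;
    }

    // The guard from the comment above, shared by every public method.
    private void warnOnce() {
        if (logged == false) {
            deprecationSink.accept("_index access from scripts is deprecated");
            logged = true;
        }
    }

    // Hypothetical accessors; the returned values are placeholders.
    int docFreq(String term) {
        warnOnce();
        return 0;
    }

    long totalTermFreq(String term) {
        warnOnce();
        return 0L;
    }
}
```

The flag is a plain field, which assumes each lookup instance is only used from a single thread; if that does not hold, an `AtomicBoolean` with `compareAndSet` would be the safer choice.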

clintongormley · 2016-07-18T13:52:22Z

I think we should rethink this PR given #19359 (comment)

astefan · 2016-08-08T10:54:14Z

In our public community, I have seen scripts used to retrieve term statistics and then re-score (or sort) documents based on them. It is true that this is not used often, but I have seen it. Removing this possibility means the user will need to pick up Java and write code for something that was possible in queries in a much simpler and more accessible way.

dakrone · 2016-09-12T22:12:33Z

@rjernst is this PR still needed given Clint's earlier comment about rethinking it?

rjernst · 2016-09-15T00:47:49Z

@dakrone I think this PR still makes sense, and I left a comment on #19359 explaining why.

elasticmachine · 2017-02-23T18:14:40Z

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

rjernst · 2017-05-16T08:57:26Z

@jpountz I've updated this PR now that index lookup is deprecated in 5.5. Can you take a look again?

rjernst added the >breaking, :Core/Infra/Scripting (Scripting abstractions, Painless, and Mustache), and v5.0.0-alpha5 labels Jul 15, 2016

rjernst changed the title ~~Scripting: Removing _index access~~ Remove script access to term statistics Jul 18, 2016

rjernst added 2 commits July 18, 2016 00:25

Remove advanced scripting docs

191cdaf

rjernst force-pushed the remove_index_lookup branch from 4f8bf3d to 191cdaf Compare July 18, 2016 07:28

Remove remaining reference to advanced scripting

345e90c

clintongormley added v5.0.0-beta1 and removed v5.0.0-alpha5 labels Jul 28, 2016

clintongormley removed the v5.0.0-beta1 label Sep 14, 2016

rjernst added 4 commits May 10, 2017 20:48

Merge branch 'master' into remove_index_lookup

5ccb918

Merge branch 'master' into remove_index_lookup

e969746

remove test

8143e9b

Merge branch 'master' into remove_index_lookup

99c80b9

rjernst added the v6.0.0 label May 16, 2017

jpountz approved these changes May 16, 2017

rjernst merged commit 97d2657 into elastic:master May 16, 2017

rjernst deleted the remove_index_lookup branch May 16, 2017 16:10

rjernst mentioned this pull request May 21, 2017

Painless script don't have access to a _index variable and groovy is depreciated in 5.4 #24820

Closed

clintongormley added v6.0.0-alpha2 and removed v6.0.0 labels Jun 6, 2017
