
Remove ability to gather index statistics in scripts (aka IndexLookup) #19359

Closed
brwe opened this issue Jul 11, 2016 · 35 comments

@brwe (Contributor) commented Jul 11, 2016

IndexLookup was added to provide access to term statistics from scripts. However, it has some limitations and it is not clear whether the effort to maintain the code is worth it (see also https://github.com/elastic/elasticsearch/pull/19334/files/7df2d06eb62dbbc5cc365153088993d70801c286#r70103016). I'll add the pros and cons as far as I know them here:

cons:

  • when used for scoring, IndexLookup cannot make use of sparse postings lists the way a similarity can, and will therefore always be slower than a custom similarity
  • IndexLookup needs the list of terms in advance and cannot make use of query rewriting
  • when used within a script field and scroll it will be terribly slow because all the caching mechanisms to get, for example, document frequency are currently broken, see #17110
  • it is rarely used

pros:

  • IndexLookup makes it easy to test custom similarities as a script before one actually goes and writes a custom similarity
  • there is currently no other way to access term statistics in a script (none that I know of anyway)
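For concreteness, the kind of computation these scripts enable is small: given a term's frequency in a document, its document frequency, and the number of documents in the index, produce a score. A dependency-free sketch of that kind of calculation (class and method names are illustrative only, not Elasticsearch APIs):

```java
// Illustrative only: a TF-IDF weight computed from the raw statistics
// IndexLookup exposed to scripts (term frequency, document frequency,
// number of documents). Names are not Elasticsearch APIs.
class TermStatsScore {
    /** Classic tf * idf, with add-one smoothing on the document frequency. */
    static double tfIdf(int tf, int df, int numDocs) {
        if (tf == 0) {
            return 0.0;
        }
        return tf * Math.log((double) numDocs / (df + 1));
    }

    public static void main(String[] args) {
        // A term occurring 3 times in the doc and in 10 of 1000 docs.
        System.out.println(tfIdf(3, 10, 1000));
    }
}
```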

I'd be a little sad to see it go because it was useful for some of my experiments. But if I am the only one then so be it. I'd also be willing to work on a replacement.

@clintongormley (Member) commented Jul 15, 2016

Discussed in FixItFriday. Given that:

  • this is a really advanced use case
  • it performs badly and would be hard (impossible?) to fix
  • it could be replaced by a similarity (or possibly a native Java script)

... we're in favour of removing this functionality.

rjernst added a commit to rjernst/elasticsearch that referenced this issue Jul 18, 2016

Scripting: Remove script access to term statistics
In scripts (at least some of the languages), the terms dictionary and
postings can be accessed with the special _index variable. This is for
very advanced use cases which want to do their own scoring. The problem
is that segment-level statistics must be recomputed for every document.
Additionally, this is not friendly to terms index caching, as the
order of looking up terms should be controlled by Lucene.

This change removes _index from scripts. Anyone using it can and should
instead write a Similarity plugin, which is explicitly designed to allow
doing the calculations needed for a relevance score.

closes elastic#19359
@brwe (Contributor, Author) commented Jul 18, 2016

it could be replaced by a similarity (or possibly a native Java script)

When I wrote scripts I meant native scripts as well. I agree we can use a custom similarity instead of a script for scoring, but in all the other places where we can use scripts I do not see any way to access, for example, term frequencies, even within native scripts. I at least used IndexLookup there as well.
For example, I used it to get the term frequencies for text classification with naive bayes in a native script. I also used it to extract the term frequencies for a given list of terms to feed that information to an external tool.

when used within a script field and scroll it will be terribly slow because all the caching mechanisms to get for example document frequency are currently broken, see #17110

To clarify, #17110 applies to all lookups we have right now and is not a problem inherent to IndexLookup alone, although I think it hurts IndexLookup most.
There is one problem that makes IndexLookup slower than it was in 1.x even when not used within scan and scroll: IndexLookup currently computes document frequencies over the whole shard once per segment, not once per shard as in 1.x. While this decreases performance on 2.x it is not horrible, and it is not impossible to fix.

Again, I'd be willing to work on fixing the current problems or on an alternative implementation to IndexLookup but I would first like to hear why people think it is so important to remove it. Also, if there is a way to replace this functionality right now with a native script please point me to it, I might just be missing it.

@rmuir (Contributor) commented Jul 18, 2016

If we really want to use scripting to implement similarity, IMO the simplest approach would be:

public class ExpressionSimilarity extends SimilarityBase {
   public ExpressionSimilarity(String expression) { ... }
   ...
}

It could just expose the simplified statistics of SimilarityBase via the ValueSource api, and should really be as fast as e.g. DFRSimilarity.

IMO it doesn't necessarily make Similarity any easier, because the true challenges are elsewhere: it's really hard to get this stuff right. Many published algorithms struggle with this: e.g. with ES not removing stopwords by default, actual term distributions are very unlike the expected ones and can cause problems in some cases. Even the classic BM25 IDF had an issue with this, and it's not the only instance.

The code comments in DFR basic models give examples of problems:

 * WARNING: for terms that do not meet the expected random distribution
 * (e.g. stopwords), this model may give poor performance, such as
 * abnormally high or NaN scores for low tf values.

When possible we applied simple adjustments (e.g. BM25, DFR geometric model), but in other cases it is not obvious how or if things can even be fixed mathematically :) It is very expert stuff.
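The BM25 IDF issue mentioned above is easy to demonstrate: the classic Robertson IDF, log((N - df + 0.5) / (df + 0.5)), goes negative as soon as a term occurs in more than half of the documents, which is exactly what unremoved stopwords do; Lucene's BM25 adds 1 inside the log so the value stays non-negative. A dependency-free check:

```java
// The classic Robertson IDF goes negative for very common terms
// (df > N/2); Lucene's BM25 shifts the ratio by 1 to avoid this.
class ClassicIdf {
    /** Classic Robertson IDF: negative once df exceeds numDocs / 2. */
    static double idf(long df, long numDocs) {
        return Math.log((numDocs - df + 0.5) / (df + 0.5));
    }

    /** Lucene-style BM25 IDF: log(1 + ...) keeps the value non-negative. */
    static double luceneIdf(long df, long numDocs) {
        return Math.log(1 + (numDocs - df + 0.5) / (df + 0.5));
    }

    public static void main(String[] args) {
        // A stopword in 950 of 1000 docs: classic IDF < 0, shifted IDF > 0.
        System.out.println(idf(950, 1000));
        System.out.println(luceneIdf(950, 1000));
    }
}
```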

@s1monw (Contributor) commented Jul 18, 2016

++ to a scripted similarity (must be a fast script, i.e. expressions)

@clintongormley (Member) commented Jul 18, 2016

@rmuir that works for the scoring aspect. @brwe is also using these scripts to export term frequency statistics for ML purposes. That was the original purpose of this feature. Any ideas how this part could be better implemented?

@rmuir (Contributor) commented Jul 18, 2016

The only thing I can suggest is to see if we can avoid mixing all these concerns in one api. It sounds like we have our current scripting api trying to tackle 3 different concerns:

  • per-doc: (typical script field/script score)

  • per-term-per-doc: (Similarity)

  • per-term??? (ml-type stuff)

Similarity is a per-term-per-doc api. Much other stuff, like script-score functionality, is a per-doc api. These are two different perspectives. That's why I think e.g. ExpressionSimilarity would be simpler (I am playing around with it now), because it would be "its own thing", removed from here. I think it will be easier on the user, too.

"exporting frequencies for ML purposes" isn't concrete enough for my understanding to know what would be needed (e.g. would it just be a proper per-term api or something more complicated? would it go out to all shards and really return correct numbers?), but it would be great to separate this too.

I am suspicious of generic apis that claim to solve all ML problems. Let's at least separate this concern from complex issues like term weighting and script-based ranking and fields, because it's nothing like those two things, and those two things are nothing like each other :)

@brwe (Contributor, Author) commented Jul 18, 2016

"exporting frequencies for ML purposes" isn't concrete enough for my understanding to know what would be needed (e.g. would it just be a proper per-term api or something more complicated?

Here is what I did:

  • for a given list of terms, export per document the term frequencies as a sparse vector of [term index, frequency] pairs (so for terms "a", "b", "c" you get something like [[0,3], [2,1]] per document if the doc is "a c a a")
  • train some classifier outside elasticsearch
  • store the parameters back to elasticsearch somewhere
  • execute a native script that uses these parameters together with the term frequencies per document to get some classification going

In the export I used IndexLookup to get the frequencies for training documents. In the script execution I used the term frequencies of new documents to run the classification. I hope that makes a little more sense; I am not sure how much detail I need to go into here.
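The export step is essentially an intersection of the document's tokens with a fixed vocabulary, emitting [term index, frequency] pairs. A self-contained sketch of that transformation (outside Elasticsearch, with the document reduced to a token array):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SparseTermVector {
    /** For a fixed vocabulary, return [termIndex, tf] pairs for one document. */
    static List<int[]> toSparse(String[] vocabulary, String[] docTokens) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < vocabulary.length; i++) {
            index.put(vocabulary[i], i);
        }
        int[] counts = new int[vocabulary.length];
        for (String token : docTokens) {
            Integer i = index.get(token);
            if (i != null) {
                counts[i]++;
            }
        }
        List<int[]> sparse = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                sparse.add(new int[] {i, counts[i]});
            }
        }
        return sparse;
    }

    public static void main(String[] args) {
        // vocabulary ["a","b","c"], document "a c a a" -> [[0,3],[2,1]]
        for (int[] entry : toSparse(new String[] {"a", "b", "c"},
                                    new String[] {"a", "c", "a", "a"})) {
            System.out.println(entry[0] + ":" + entry[1]);
        }
    }
}
```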

BTW, this was a bit of a hassle, because in a way this is an intersection of the terms in a document with the list of terms I was interested in, so I tried a mixture of fielddata, term vectors and IndexLookup.
Conceptually it seemed to me a problem similar to search, just that the number of terms is huge and the "score" that is output can be several numbers as well. But I never got to convert that vague feeling into code.

@rmuir (Contributor) commented Jul 18, 2016

I don't understand how this really scales. If the term is common it may need to transfer millions or billions of entries across the wire.

@brwe (Contributor, Author) commented Jul 18, 2016

It would not be slower than scroll in general, I think. Or what do you mean?

@rmuir (Contributor) commented Jul 18, 2016

It doesn't lend itself to a top-N approach in any way, hence I think it makes sense to try to divorce it from the search API.

@s1monw (Contributor) commented Jul 21, 2016

so I think we are on the right path here, IMO what we are missing is:

  • a way to script similarities (that would be awesome)
  • a way to script a scorer

Let me explain the latter. From my perspective, the last remaining purpose of the index lookup was to have full control over the linear combination of the individual term scores. Now if you think about it, it's really a way to change what BooleanScorer does, but with added flexibility. I think if we add a dedicated query that calls a script to calculate the linear combination of the subqueries' scores, we are at the point where @brwe has everything she needs. The input of the script can simply be a double array holding all scores of the subqueries, where the index is the ordinal in which they were specified.

@brwe @rmuir does this make sense to you?
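For illustration, the contract such a score-combining script would fulfil can be sketched in a few lines: it receives the subquery scores in declaration order and returns the combined score. The interface and class names below are made up, not a proposed API:

```java
// Illustrative only: the contract a score-combining script would have.
// The interface and class names are made up, not a proposed API.
class ScoreCombination {
    /** scores[i] is the score of the i-th subquery, in declaration order. */
    interface Combiner {
        double combine(double[] subScores);
    }

    public static void main(String[] args) {
        // A weighted linear combination, the case discussed here.
        Combiner weighted = scores -> 0.7 * scores[0] + 0.3 * scores[1];
        System.out.println(weighted.combine(new double[] {2.0, 1.0}));
    }
}
```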

@rmuir (Contributor) commented Jul 21, 2016

Why does a user need to change how scores are combined? Mathematically, what else makes sense? Users often ask for multiply because they think they want that, but that is supported via the sum of logs.
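The sum-of-logs point works because log is monotonic: ranking by a product of positive per-clause scores and ranking by the sum of their logs produce the same order, so multiplication does not need to be a separate combine mode. A quick check:

```java
// Ranking by a product of positive scores equals ranking by the sum
// of their logs, since log is strictly increasing.
class SumOfLogs {
    static double product(double[] scores) {
        double p = 1.0;
        for (double s : scores) {
            p *= s;
        }
        return p;
    }

    static double sumOfLogs(double[] scores) {
        double sum = 0.0;
        for (double s : scores) {
            sum += Math.log(s);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] docA = {2.0, 3.0};
        double[] docB = {1.5, 3.5};
        // Same order either way: 6.0 > 5.25 and log 6.0 > log 5.25.
        System.out.println(product(docA) + " " + sumOfLogs(docA));
        System.out.println(product(docB) + " " + sumOfLogs(docB));
    }
}
```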

@s1monw (Contributor) commented Jul 21, 2016

I think, and please correct me if I am wrong @brwe, that the context here is for instance something like logistic regression, where your score is the result of some sigmoid function (I know negative scores might be a problem here but there is a way to model that). The result of this could be used in aggregations to see how many docs fall into a certain class. I am really just making things up, but I can totally see the value for experiments like this.
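A sketch of what such a per-document score would look like: sigma(w · x) for weights w learned offline and a document feature vector x. Conveniently, the sigmoid keeps the result in (0, 1), so negative raw scores are not a problem after the squash. (The code is illustrative only, not tied to any Elasticsearch API.)

```java
// Illustrative only: a logistic-regression "score" for one document,
// from offline-learned weights and per-document features (e.g. term
// frequencies). The sigmoid keeps the result in (0, 1).
class LogisticScore {
    static double score(double[] weights, double[] features) {
        double dot = 0.0;
        for (int i = 0; i < weights.length; i++) {
            dot += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-dot));
    }

    public static void main(String[] args) {
        // Two term-frequency features with made-up learned weights; the
        // result could be bucketed in an aggregation to count docs per class.
        System.out.println(score(new double[] {0.8, -1.2}, new double[] {3.0, 1.0}));
    }
}
```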

@rmuir (Contributor) commented Jul 21, 2016

As I stated above, I don't think we should piggyback such things on our script search api... sorry, it's not search to me at all.

@s1monw (Contributor) commented Jul 21, 2016

As I stated above, I don't think we should piggyback such things on our script search api... sorry, it's not search to me at all.

I think you misunderstood my comments; I am suggesting we remove the IndexLookup and extend ScriptQuery to also score. Since we already have a scorer on LeafSearchScript, the script can just access it, call Scorer#getChildren and do whatever it wants. All we need to guarantee is that the iteration order of this method is consistent.

@rmuir (Contributor) commented Jul 21, 2016

I sense my concerns may not have been communicated very well either.

Scoring APIs (in both lucene and elasticsearch) are a sensitive part of the system, a real hotspot. Can we please not try to wedge things like document classification into the scoring system? It already has a difficult job: there are serious challenges, especially in ES (e.g. a very slow scripting language as the default).

If we want to support document classification, let's do that with some other API. I don't think it makes sense to use the search data structures, algorithms, and apis for this problem; but if we do, let's make that an implementation detail and expose a very simple api to the user.

As far as things being "flexible", it scares me to hear that term. Please, none of this is personal to any developers, but this is like a raging infection in the elasticsearch codebase. There are far too many overcomplicated abstractions in this codebase, and despite that fact, a significant portion of our developers' time is spent on refactoring. In my opinion, those abstractions are not helping; instead they just make things complicated.

I really like how the linux kernel explains the problem [1] (you kinda gotta translate the gist of this from C to Java or a REST api or whatever is relevant):

* Abstraction layers

Computer Science professors teach students to make extensive use of
abstraction layers in the name of flexibility and information hiding.
Certainly the kernel makes extensive use of abstraction; no project
involving several million lines of code could do otherwise and survive.
But experience has shown that excessive or premature abstraction can be
just as harmful as premature optimization.  Abstraction should be used to
the level required and no further.

At a simple level, consider a function which has an argument which is
always passed as zero by all callers.  One could retain that argument just
in case somebody eventually needs to use the extra flexibility that it
provides.  By that time, though, chances are good that the code which
implements this extra argument has been broken in some subtle way which was
never noticed - because it has never been used.  Or, when the need for
extra flexibility arises, it does not do so in a way which matches the
programmer's early expectation.  Kernel developers will routinely submit
patches to remove unused arguments; they should, in general, not be added
in the first place.

[1] https://www.kernel.org/doc/Documentation/development-process/4.Coding

So instead of making the search api "flexible" enough to also do document classification, and instead of trying to make it "flexible" enough to support vague, ill-defined concepts like "ML purposes", I would prefer we added smaller, simpler, concrete apis that address those needs. For example, an API around document classification could be added that is very simple and easy to use, and has minimal parameters. Because it's been separated from scoring, it's no longer stuck with a priority queue + DAAT approach; it can use whatever data structures it wants.

The search api already has enough problems, it's already doing enough "dark magic", it's already overwhelmed. Can we please separate this stuff?

@s1monw (Contributor) commented Jul 21, 2016

I am 100% behind that statement. I think we should try to simplify the entire scripting API. Let's detach this from vague statements and look at issues like #17116: it's pretty much what I described above, and if we say we go with #17116 to allow this flexibility on that issue, then we can just move on. But apparently the ability to combine scores is something users would like to have. I am more than in favor of simplifying the script API to a minimum and using higher-level APIs like the one in that issue to add flexibility that doesn't cost much performance.

@rmuir (Contributor) commented Jul 21, 2016

I don't think we should do #17116 either. Instead of multiplying, users can use the sum of logs.

@brwe (Contributor, Author) commented Jul 21, 2016

For example an API around document classification could be added that is very simple and easy to use, and has minimal parameters. Because it's been separated from scoring, it's no longer stuck with a priority queue + DAAT approach; it can use whatever data structures it wants.

Just to be sure: You mean an actual api like

GET index/type/doc_id/_classify

?

@rmuir (Contributor) commented Jul 21, 2016

Sorry, I don't really have a concrete design for how classification should work: which algorithms are best or how we should expose it in our APIs.

Just hypothetically, I would think of it as a document-enrichment type of thing: maybe we start from, say, an ingest processor that adds a new field (the assigned "classification") to the document, and then figure out from there how it should work. That would imply classifying documents before they are indexed at all, based on some model, whatever that is.

Maybe that model is something complicated, based on a bunch of index statistics from a separate index ("the training set"). Maybe it's just a decision tree based on a big pile of regular expressions. Or maybe it's best to leave such tasks to tools outside of ES, like nltk. Honestly, I have no idea :)

@rmuir (Contributor) commented Jul 22, 2016

ok given that we should use scoring for scoring ;) and we have some really strong arguments towards not mixing concerns, while rather minor arguments towards opening things up more to satisfy the use case of ML or classification, we should remove this API without a full replacement. I would still like us to explore the way of a scripted similarity, I think this could help quite a bit for experimenting with scoring. @rmuir WDYT?

Are users of ES typically information retrieval researchers? I can't think of who else would need that.

Like I said: if we really truly need it, then ExpressionSimilarity is the way to do it. However, I feel like all the time I would invest in such a feature may be wasted: nobody would ever use it :(

It really should not be so hard to remove things in elasticsearch. This issue is a classic example! That's why elasticsearch has half a million lines of code.

@s1monw (Contributor) commented Jul 22, 2016

Like I said: if we really truly need it, then ExpressionSimilarity is the way to do it. However, I feel like all the time I would invest in such a feature may be wasted: nobody would ever use it :(

Can it be a Lucene feature? I could see some folks using it...

@s1monw s1monw reopened this Jul 22, 2016

@s1monw (Contributor) commented Jul 22, 2016

Well, to begin with, I think let's remove the IndexLookup; then we can have individual issues about improvements, and we can still decide if it makes sense to add such a similarity.

@softwaredoug (Contributor) commented Jul 22, 2016

Just want to chime in on this issue as a practitioner. I have used term statistics in scripts and find them very handy. The use case that comes to mind is a small collection, hosted somewhere that doesn't allow plugins, that cares about relevance.

Sure, in a larger-scale system scripting isn't a great answer. But many folks do search at a smaller scale. These orgs look at you sideways when you start talking about maintaining custom scorers and similarities for them. They are terrified of Java and Lucene. And they don't want to keep having to hire someone to maintain even a simple similarity.

Second, Elasticsearch seems to be being used as a general-purpose matching/ranking system in analytics solutions, not strictly a search engine. I'm not just speaking from my own experience at OSC, but from Elastic's own marketing. Often when doing weird things like searching DNA, predicting the weather, or building recommenders, what a "term" is becomes a data modeling exercise.

In these cases, erring towards a flexible "frameworky" approach IMO works best. Allowing several options to perform arbitrary scoring computations is really important for these users. I don't think the flexibility argument is meritless, based on the very cool things I see folks doing in Lucene.

So my bias as a practitioner, not as someone steeped in the ES code base, would be to caution you against any change that doesn't have an alternative for the folks that don't want to, or can't, maintain Java plugins. And toward something that lets the non-text use cases continue to do well with Elasticsearch.

I really like the idea of scripting scorers and similarities. That removes a lot of Java code and adds a ton for relevance use cases.

@softwaredoug (Contributor) commented Jul 22, 2016

I've seen uncertainty mentioned a few times here about why users should need to control how scores are combined. I think it's vital. Lucene-based search has been successful compared to commercial offerings precisely because of how much you can customize its behavior. And the more of that I can do outside of Java, the easier it is for me to use that power. (Otherwise these days I could just plug in algolia or something and be done with it.)

For example, a "term" in the Lucene world, as I'm sure everyone has seen, may have nothing to do with text. It might have something to do with product affinities in a recommendation system. Or ngrams of DNA. Or identifiers in some taxonomy. Or something even more exotic. And who knows then what the term stats around it are supposed to mean and how they might be used? In this context, the more control one can have before having to deal with custom plugins, the more successful ES can be at solving the meaty search problems out there.

A high-quality ES solution is still only as free as your time is (or as powerful as your time investment is). So while I do see more "IR researchers" in pure Lucene and Solr land, there are still quite a few using Elasticsearch precisely because it hits a nice sweet spot between control and ease of use compared to Solr or pure Lucene.

So again 👍 to scripted scorers and similarities. I see a lot of power there.

(PS I'm just trying to offer an "outside" practitioner's devil's advocate view on your efforts. Totally get that you have a large, complex code base to maintain, and I can empathize with the need to tidy things up and avoid bugs. Keep up the great work! :) )

@rmuir (Contributor) commented Jul 25, 2016

For example, a "term" in the Lucene world, as I'm sure everyone has seen, may have nothing to do with text. It might have something to do with product affinities in a recommendation system. Or ngrams of DNA. Or identifiers in some taxonomy. Or something even more exotic. And who knows then what the term stats around it are supposed to mean and how they might be used? In this context, the more control one can have before having to deal with custom plugins, the more successful ES can be at solving the meaty search problems out there.

Sorry this argument just does not hold. For example BM25 tells us exactly how to handle such "non-textual relevance features". Please read it.

http://www.staff.city.ac.uk/~sb317/papers/foundations_bm25_review.pdf

@softwaredoug (Contributor) commented Jul 25, 2016

Great article Rob, I believe I've read it in the past. BM25 is pretty awesome for what it's built for (document search).

The area you cite provides a very general framework for a linear combination of scalars. It says

Here are some Vi functions that we have used with success in the past for different features
(lists sigmoid, log, etc)

The linear combination looks like the cosine similarity summation under the Bool query. But most of Elasticsearch's documentation (and my experience) points to multiplying text relevance by these scalar factors, instead of treating them as another "boolean clause" to optimize for, as the first tool to reach for. Indeed, this is what Elasticsearch's function_score query does by default. It's likely both approaches have their applications.

Additionally, like I said, BM25 is proven at general document search. This is only one application of Elasticsearch. I still believe Lucene-based search's sweet spot is as a framework, in that (1) its focus is on deep configurability and (2) it offers a variety of ways of thinking about and configuring "relevance". People want to use Elasticsearch to build recommendation systems, perform image search, and search chemical structures. They have something unique to what they're doing; that's why they didn't grab an off-the-shelf product that did what they wanted.

I don't want to derail this thread. I'm just trying to offer my support & perspective on where I see Elasticsearch in the marketplace when people are making the "buying decision." Little features like tf in scripts seem like a small thing, but in some sense it's these little bits of configurability that sell me on Lucene-based search, as opposed to a commercial "plug and play" model where all the decisions are made for me.

@rjernst (Member) commented Sep 15, 2016

I've been thinking about how to allow looking at index stats from "advanced scripts", but without needing IndexLookup, and I think the answer falls in line with #19966. Without IndexLookup, the script engine api will still take the LeafReaderContext (which is currently used to construct the LeafIndexLookup). Replacing native scripts with advanced users simply writing script engines means they would still have access to statistics by using the LeafReader from the LeafReaderContext. IMO this is better anyway, as LeafReader is well documented and does not have the overhead that IndexLookup incurs even when it is not used (setDocument is always called, advancing various iterators, even when they are not used by a script).

rjernst added a commit to rjernst/elasticsearch that referenced this issue May 10, 2017

Scripting: Replace advanced and native scripts with ScriptEngine docs
This commit documents how to write a `ScriptEngine` in order to use
expert internal apis, such as using Lucene directly to find index term
statistics. These documents prepare the way to remove both native
scripts and IndexLookup.

The example java code is actually compiled and tested under a new gradle
subproject for example plugins. This change does not yet breakup
jvm-example into the new examples dir, which should be done separately.

relates elastic#19359
relates elastic#19966

rjernst added a commit that referenced this issue May 11, 2017

Scripting: Replace advanced and native scripts with ScriptEngine docs (
…#24603)

This commit documents how to write a `ScriptEngine` in order to use
expert internal apis, such as using Lucene directly to find index term
statistics. These documents prepare the way to remove both native
scripts and IndexLookup.

The example java code is actually compiled and tested under a new gradle
subproject for example plugins. This change does not yet breakup
jvm-example into the new examples dir, which should be done separately.

relates #19359
relates #19966

rjernst added a commit to rjernst/elasticsearch that referenced this issue May 15, 2017

Scripting: Deprecate index lookup
This commit adds a deprecation warning if `_index` is used in scripts.
It is emitted each time a script is invoked, but not per document. There
is no test because constructing a LeafIndexLookup is quite difficult,
but the deprecation warning does show up in IndexLookupIT, there is just
no way to assert warnings in integ tests.

relates elastic#19359

rjernst added a commit that referenced this issue May 16, 2017

Scripting: Deprecate index lookup (#24691)
This commit adds a deprecation warning if `_index` is used in scripts.
It is emitted each time a script is invoked, but not per document. There
is no test because constructing a LeafIndexLookup is quite difficult,
but the deprecation warning does show up in IndexLookupIT, there is just
no way to assert warnings in integ tests.

relates #19359

rjernst added a commit that referenced this issue May 16, 2017

Remove script access to term statistics (#19462)
In scripts (at least some of the languages), the terms dictionary and
postings can be accessed with the special _index variable. This is for
very advanced use cases which want to do their own scoring. The problem
is that segment-level statistics must be recomputed for every document.
Additionally, this is not friendly to terms index caching, as the
order of looking up terms should be controlled by Lucene.

This change removes _index from scripts. Anyone using it can and should
instead write a Similarity plugin, which is explicitly designed to allow
doing the calculations needed for a relevance score.

closes #19359
@MLnick commented Sep 4, 2017

Hey folks

Late coming to this but I used indexLookup in a native script in my vector-scoring plugin: https://github.com/MLnick/elasticsearch-vector-scoring.

This use case doesn't access TF/DF, but needs to access term vector (specifically PAYLOADS).

I see there might be a way to use ScriptEngine to implement this now. But it does seem that, other than a native script, there is no way to do something similar using painless. (Well, other than using source, and I think that would be extremely slow? It was very slow before when I tried a version using that.)

But users on cloud deployments typically have locked-down ES clusters where they aren't able to install custom plugins. So under the new ScriptEngine this use case is impossible for these deployment scenarios (e.g. see https://discuss.elastic.co/t/how-do-i-access-term-vector-in-painless-scripting/84579).
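For reference, once the per-document vector has been decoded from the payloads, the score such a plugin computes is just a cosine (or dot product) between the query vector and the document vector. Stripped of the payload decoding, the computation is:

```java
// Illustrative only: the core of a vector-scoring script, once the
// document vector has been decoded from term-vector payloads.
class VectorScore {
    /** Cosine similarity between a query vector and a document vector. */
    static double cosine(double[] query, double[] doc) {
        double dot = 0.0, qNorm = 0.0, dNorm = 0.0;
        for (int i = 0; i < query.length; i++) {
            dot += query[i] * doc[i];
            qNorm += query[i] * query[i];
            dNorm += doc[i] * doc[i];
        }
        if (qNorm == 0.0 || dNorm == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(qNorm) * Math.sqrt(dNorm));
    }

    public static void main(String[] args) {
        // Identical vectors score ~1.0; orthogonal vectors score 0.0.
        System.out.println(cosine(new double[] {1, 0, 1}, new double[] {1, 0, 1}));
        System.out.println(cosine(new double[] {1, 0, 0}, new double[] {0, 1, 0}));
    }
}
```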

@rjernst (Member) commented Sep 7, 2017

@MLnick There is a tradeoff we must balance between ease of use and customizability. We tend to favor ease of use for the common cases, and are happy to add customizability with less ease of use. In this case, the complexity of maintaining access to these very low level and advanced structures within lucene was not worthwhile. As you said, one can still access these with a custom ScriptEngine. And the Elastic Cloud does support custom plugins.

@mickeystroller commented Sep 14, 2017

Hi all,

I want to make sure my understanding is correct. In ES 5.4.0, the only way to access index statistics in scripts is to write a custom ScriptEngine, which can only be written in Java. If that is the case, I want to further ask whether there is a way to force the usage of Groovy in ES 5.4.0, assuming it has not been completely deleted.

Thanks.

@rjernst (Member) commented Sep 15, 2017

@mickeystroller
Please ask questions on our forum. We use github for feature requests and confirmed bug reports. Commenting on closed issues often results in missed messages. It is best to open a new discussion on the forum for a question such as this.
