Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fielddata accessors (.value/.values/.distance()/etc) #18169

Closed
wants to merge 3 commits into from

Conversation

Projects
None yet
5 participants
@rmuir
Copy link
Contributor

rmuir commented May 5, 2016

Currently fields in painless are recommended to be accessed as input.doc['field'].0.

Instead we should work closer to the current syntax input.doc['field'].value. To do this, we have to add ScriptDocValues to the whitelist so we can see its getValue() method.

This has other benefits, it means we expose more of the scripting api (see https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-groovy.html#_doc_value_properties_and_methods)

So input.doc['field_name'].lat and doc['field_name'].arcDistance(34, 56) work too and so on, and geo fields become usable.

I TODO'd whitelisting any joda-time stuff to be a separate change.

Field loads for this common .value case with some temporary cheating. This is a 467% performance improvement for scripts accessing document fields from painless, so I think we should do it for now. We can do fancy stuff later and remove this, but it means at least common use cases are much faster.

I cutover the integration tests and docs to use this syntax.

copyStruct("Strings", "List<String>", "Collection<String>", "Object");
copyStruct("Longs", "List<Object>", "Collection<Object>", "Object");
copyStruct("Doubles", "List<Object>", "Collection<Object>", "Object");
copyStruct("GeoPoints", "List<Object>", "Collection<Object>", "Object");

This comment has been minimized.

Copy link
@jdconrad

jdconrad May 5, 2016

Contributor

I think for Longs, Doubles, and GeoPoints it's safer to use List (List < def > ) so as an example Longs would be

copyStruct("Longs", "List", "Collection", "Object");

The reason being is that if anyone ever casts to one of these then we probably want to be able to handle the Object coming out of it without requiring a second cast.

((Longs)longField).get(0) is now a def type instead of an Object type. Really rare that this would ever happen, but I think for consistency this could be important.

@jdconrad

This comment has been minimized.

Copy link
Contributor

jdconrad commented May 5, 2016

Looks great! Just one comment.

@rjernst

This comment has been minimized.

Copy link
Member

rjernst commented May 5, 2016

LGMT, awesome speedup!

@rmuir

This comment has been minimized.

Copy link
Contributor Author

rmuir commented May 5, 2016

@jdconrad i cut these over to <Def>

@jdconrad

This comment has been minimized.

Copy link
Contributor

jdconrad commented May 5, 2016

LGTM. Thanks!

@rmuir

This comment has been minimized.

Copy link
Contributor Author

rmuir commented May 5, 2016

thanks for reviewing. there is a lot to followup with for this one, I will take care of it.

@rmuir rmuir closed this in e3ce6c9 May 5, 2016

@jdconrad jdconrad referenced this pull request May 6, 2016

Closed

Ongoing Painless Improvements #17992

15 of 18 tasks complete
@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented May 6, 2016

@rmuir should i move these properties/methods (https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-groovy.html#_doc_value_properties_and_methods) back to the general scripting section instead of leaving them in groovy? or perhaps add them to the painless docs?

@rmuir

This comment has been minimized.

Copy link
Contributor Author

rmuir commented May 6, 2016

@clintongormley I don't know what to do here!

These properties/methods, lets call it "the scripting api", is really exposed to all script engines via Java. So today groovy, javascript, python, and painless support it.

Expressions basically exposes what looks like a subset of this API to users, but its a lie: it in fact bypasses the whole thing entirely, and the expressions API is inconsistent in random annoying ways: e.g. doc['field'].count() vs doc['field'].size(). So not really a proper subset :(

On one hand, I think painless needs to support this API, to get people off of groovy. On the other hand, I'm not sure if we want to declare it front-and-center "the scripting api", because its not very good. I don't mean this in an offensive way, the API is both slow and bad, its just a fact. Expressions is 2x faster than anything else because it bypasses it completely.

One reason it is cumbersome and slow is that it exposes a document's fields as always being multivalued lists, but its not clear to me user's even care about multi-valuedness whatsoever: (4ddf916)

There are also tons and tons of geo functions, but many of these look very obscure. I think its ok to continue supporting them, to not break anyones scripts, but do we really have to document all the obscure stuff?

I think it makes things overwhelming. An easy win might be to try to drop some of these obscure methods from the documentation. I think all the multi-valued methods (e.g. .values, .lats, .lons) should probably be dropped from the documentation too. It would be better to instead have a footnote at the bottom that says "each field is really a list so to get access to the other values you can use .get(1)/.size()/etc"

@jdconrad

This comment has been minimized.

Copy link
Contributor

jdconrad commented May 6, 2016

I really like the proposal to drop the more obscure stuff other than as a footnote. On the differences in API between Expressions and others -- maybe we can deprecate the ones that don't match and add ones that do match?

@s1monw

This comment has been minimized.

Copy link
Contributor

s1monw commented May 9, 2016

On one hand, I think painless needs to support this API, to get people off of groovy. On the other hand, I'm not sure if we want to declare it front-and-center "the scripting api", because its not very good. I don't mean this in an offensive way, the API is both slow and bad, its just a fact. Expressions is 2x faster than anything else because it bypasses it completely.

I do understand that we need to provide an API which is similar in it's functionality but putting ourself into the position of building painless on top of an already known problem sounds weird to me. I think we can build a new API that allows for good performance based on a subset of the current API and deprecate the current one. All other scripting languages can potentially have access to both such that folks can move gradually but we can build a new and fast API? Does this make sense and / or is this feasible at all?

@clintongormley clintongormley changed the title Painless: add fielddata accessors (.value/.values/.distance()/etc) Add fielddata accessors (.value/.values/.distance()/etc) May 17, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.