The fields option should always return an array #4542

Closed
martijnvg opened this issue Dec 24, 2013 · 17 comments

Comments

@martijnvg
Member

The fields option allows extracting field values from _source or loading specific stored fields. The fields option is supported in various APIs (get, search and explain).

The behaviour for array fields with a single value is inconsistent between APIs, and between _source and stored fields. Depending on the combination, such a field is serialised either as a single value or as an array containing a single value.

Doing the right thing here is difficult because the fields option works on both _source and stored fields. The _source contains the meta information (JSON) needed to serialise field values correctly, but this information isn't available for stored fields. The plan is to make fields always return an array, for both _source and stored fields and in all APIs, with the goal of being consistent. Also, the fields option will only serialise leaf fields, again for consistency between stored fields and _source. Metadata fields (_id, _routing, _parent etc.) are always single-valued, so in the response the metadata fields are never wrapped in a JSON array.
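As a rough illustration of the rule described above (leaf field values always wrapped in an array, metadata fields left as single values), here is a minimal JavaScript sketch. The function name serialiseFields and the exact set of metadata field names are assumptions for illustration; this is not Elasticsearch's actual implementation.

```javascript
// Hypothetical sketch of the proposed "fields" serialisation rule.
// Not Elasticsearch's own code; the metadata set below is an assumption.
const METADATA_FIELDS = new Set(["_id", "_routing", "_parent", "_version", "_timestamp"]);

// values: map of field name -> value(s) as loaded from _source or stored fields.
function serialiseFields(values) {
  const out = {};
  for (const [name, value] of Object.entries(values)) {
    if (METADATA_FIELDS.has(name)) {
      // Metadata fields are always single-valued, so they are never wrapped.
      out[name] = Array.isArray(value) ? value[0] : value;
    } else {
      // Leaf fields always come back as an array, even with a single value.
      out[name] = Array.isArray(value) ? value.slice() : [value];
    }
  }
  return out;
}

console.log(JSON.stringify(serialiseFields({ user: "kimchy", tags: ["a", "b"], _routing: "r1" })));
// {"user":["kimchy"],"tags":["a","b"],"_routing":"r1"}
```

The point of the sketch is that a client can rely on the shape of every non-metadata field without knowing whether it came from _source or from a stored field.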

If better serialisation is required for _source, the source filtering feature should be used instead: http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/search-request-source-filtering.html
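To make the difference concrete, the following JavaScript sketch builds two search request bodies, one using the fields option and one using source filtering (array form of _source), and pairs them with hand-written example hits. The field names, document and hit shapes are illustrative assumptions, not captured responses.

```javascript
// Hypothetical field names, chosen only for illustration.
const wanted = ["user", "message"];

// Search body using the "fields" option: each returned value is wrapped in an array.
const fieldsBody = { query: { match_all: {} }, fields: wanted };

// Search body using source filtering: the selected part of _source keeps its JSON shape.
const sourceBody = { query: { match_all: {} }, _source: wanted };

// Hand-written example hits for the document {"user": "kimchy", "message": "hi"}.
const fieldsHit = { _id: "1", fields: { user: ["kimchy"], message: ["hi"] } };
const sourceHit = { _id: "1", _source: { user: "kimchy", message: "hi" } };

console.log(JSON.stringify(sourceBody)); // {"query":{"match_all":{}},"_source":["user","message"]}
console.log(fieldsHit.fields.user[0] === sourceHit._source.user); // true
```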

@ghost ghost assigned martijnvg Dec 24, 2013
@clintongormley

What about _version, _timestamp etc, where we know they are single-valued fields?

@kimchy
Member

kimchy commented Jan 2, 2014

good Q..., _routing also...

@martijnvg
Member Author

I lean towards having the same behaviour for both json and metadata fields for consistency.

@clintongormley

Hmmm, not so sure...

For stored fields (or extracted from source) we don't know if they are single or multi-valued, so there it feels right to default to arrays. But for metadata fields, returning these values as an array starts to feel like a lot of overhead. We know and the user knows that these are always single-valued fields, and it feels cleaner to treat them as such.

@martijnvg
Member Author

I'm okay with this exception to the rule, I'll update the issue description.

@kimchy
Member

kimchy commented Jan 2, 2014

++, agreed. I would love to eventually also have an "_source + all_metadata" option, since it is needed when reindexing for example (since the routing doesn't have to be part of the _source). This can be a different issue though.

@clintongormley

++ agreed re source and all metadata!

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jan 2, 2014
…ields and single valued field for metadata fields.

The `fields` option can only be used to fetch leaf fields; trying to fetch object fields will result in a client error.

Closes elastic#4542
honzakral added a commit to honzakral/elasticsearch that referenced this issue Jan 4, 2014
honzakral added a commit that referenced this issue Jan 7, 2014
brusic pushed a commit to brusic/elasticsearch that referenced this issue Jan 19, 2014
…ields and single valued field for metadata fields.

Also, the `fields` option can only be used to fetch leaf fields; trying to fetch object fields will result in a client error.

Closes elastic#4542
brusic pushed a commit to brusic/elasticsearch that referenced this issue Jan 19, 2014
@ehsanul

ehsanul commented Feb 26, 2014

This was a breaking change for 1.0, but was unfortunately not mentioned in the breaking changes section of the manual. Our application relied on getting back the fields as the same type they were stored as originally in the _source, but getting back an array with a single result broke that assumption.

I'm not sure about the best way to prevent this breakage without handling it in the application wherever we've used the fields option. @Mpdreamz mentioned using the includes / excludes option for fields mapping, but I'm unsure how that works.

@ehsanul

ehsanul commented Feb 26, 2014

Oh, I think I see. The link was to the source mapping, but I'm guessing I just want to use source filtering here.

@javanna
Member

javanna commented Feb 26, 2014

Hi @ehsanul, the breaking change is mentioned here. You should indeed switch to source filtering, which is meant to extract portions of the _source and is more flexible as well.

@ehsanul

ehsanul commented Feb 26, 2014

Ah, I must have not read that closely enough, my apologies. Thanks for the confirmation, I will give source filtering a shot!

@nemosupremo

This is probably a done deal and forgive my ignorance on the subject / elasticsearch. I am using ES as my single datastore, and I have relatively large _source objects (a single field has up to 256kb, the rest are tiny). However when I want to query objects, I don't need that field so I _source_exclude it.

The problem is that, by comparison, using _source_exclude is slow (I'd imagine because the JSON object has to be parsed), and when querying 500+ objects at a time, I see query times jump from 5-15ms on average to 800-1400ms. I also have fewer than 10,000 objects (a tiny database).

The solution I came up with was to store the fields I need in the index, then return those with the fields parameter, and it works: the queries are back down in the 5-15ms range and I only get the data I need.

Now, I don't know how common my use case is, but having to deal with the arrays is bothersome, and in my specific use case, in Go, it means having two struct definitions: one for the fields response that handles arrays, and another for everything else (getting/putting/etc.).
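One way to avoid maintaining two separate models on the client is a single accessor that reads a value from either a fields-style hit (array-wrapped values) or a _source-style hit (original JSON). The sketch below is in JavaScript rather than Go; readField and the hit shapes are hypothetical.

```javascript
// Hypothetical accessor: returns the value of `name` from a search hit,
// whether the hit came back with "fields" (array-wrapped) or "_source" (original JSON).
function readField(hit, name) {
  const has = (obj) => obj != null && Object.prototype.hasOwnProperty.call(obj, name);
  if (has(hit.fields)) {
    const values = hit.fields[name];
    // A single-element array is unwrapped; genuinely multi-valued fields stay arrays.
    return Array.isArray(values) && values.length === 1 ? values[0] : values;
  }
  if (has(hit._source)) {
    return hit._source[name];
  }
  return undefined; // field not present in either part of the hit
}

console.log(readField({ fields: { subject: ["hello"] } }, "subject")); // hello
console.log(readField({ _source: { subject: "hello" } }, "subject")); // hello
```

The trade-off is that unwrapping cannot tell a one-element array field from a scalar field, which is exactly the ambiguity this issue describes; it only works if the application already knows which fields are single-valued.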

Edit: I'm not sure if you wish to tackle this use case, but with large _source fields, _source_exclude can cripple a query. A query with 100 results on my local MacBook Air (no network latency) takes ~5ms with _source=false, ~200ms with _source=true (the response size is over 100M), ~160ms for _source=true&fields=subject,date,etc and 30000ms for _source_exclude=ast, over 150x slower (understandably so: as I understand it, excluding fields requires parsing and then reserializing the whole JSON object).

The reason I am asking for greater flexibility in the fields option is that it allows me to store all my data in ES and have quicker access to the fields I need without parsing the _source. I'm new to ES and I'm not sure whether my use case is common, or whether I am using ES properly, but I thought it would be useful to present it.

@ajhalani

Upgrading to v1.0.x, I just ran into this change. I am worried that once we switch to source filtering and later decide to store fields for performance reasons, it's going to be very difficult to switch back to "fields" because it returns non-array fields as arrays.

Maybe we could have a field-level option like allow_array: true|false (defaulting to true) which, if set to false, would only allow indexing/returning a non-array value, so users could switch from source filtering to fields without breaking their code. Just an idea off the top of my head; I am sure there may be better ways.

esmarkowski added a commit to esmarkowski/tire that referenced this issue Mar 13, 2014
Elasticsearch v1 returns field values as an array when using `fields` option.

elastic/elasticsearch#4542
@seti123

seti123 commented Mar 31, 2014

+1. We have to change our whole application: check everywhere we query with fields and handle an array instead of a value. Please note that it is a lot of work to find every place in ES applications where a fields query is used and change how the results are handled. This delays all my projects.

Is there any option to switch it OFF, so that single values are NOT returned as an array? Is there an option in the node.js Elasticsearch client to convert it?

@seti123

seti123 commented Mar 31, 2014

Please note, I'm talking about the REST interface (I don't care about the internal structure in ES, but if I store a single value I expect to get the same back...). For the past week we have been adding code like this all around our applications for ES 1.x: if (Array.isArray(doc.fields.xx)) doc.fields.xx = doc.fields.xx[0];
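A more contained variant of the per-field workaround above is to post-process each search response once, right after it is received, and unwrap only the fields the application knows to be single-valued. This is a minimal sketch; unwrapKnownSingleValued is a hypothetical helper, and the response shape (hits.hits[].fields) is assumed from the ES 1.x search response.

```javascript
// Hypothetical post-processor for a search response shaped like
// { hits: { hits: [ { _id, fields: { name: [values...] } } ] } }.
// Only fields listed in `singleValued` are unwrapped, and only when they hold
// exactly one value, so an unexpected multi-valued field is never truncated.
function unwrapKnownSingleValued(response, singleValued) {
  const names = new Set(singleValued);
  const hits = (response.hits && response.hits.hits) || [];
  for (const hit of hits) {
    if (!hit.fields) continue; // hit returned without a fields section
    for (const name of Object.keys(hit.fields)) {
      const value = hit.fields[name];
      if (names.has(name) && Array.isArray(value) && value.length === 1) {
        hit.fields[name] = value[0];
      }
    }
  }
  return response; // mutated in place and returned for chaining
}

const res = { hits: { hits: [{ _id: "1", fields: { title: ["Report"], tags: ["a"] } }] } };
unwrapKnownSingleValued(res, ["title"]);
console.log(JSON.stringify(res.hits.hits[0].fields)); // {"title":"Report","tags":["a"]}
```

Keeping the list of single-valued fields in one place means the rest of the application can read values directly, and the list doubles as documentation of which fields the code treats as scalars.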

@kimchy
Member

kimchy commented Mar 31, 2014

@seti123 in 1.0 the response is much more consistent. Even before, if a field could have a single value or multiple values, you would have had to check for both cases. It was worse when _source extraction was used instead of stored fields, since the format differed between _source and stored fields for different types of structures. Now it only handles stored fields, and it is much more consistent, always returning an array of "core" values.

@seti123

seti123 commented Mar 31, 2014

OK, I understand, and we have already started making the changes. I think it's more of a business issue: we have a product in evaluation by customers, for another reason we had to update to 1.x (now 1.1), and all the effort delayed it.
Back to the technical part: is there a workaround? Can we change the mapping (e.g. declare all relevant fields used in Report and UI as "stored") so that they are not wrapped in "[]"? Could somebody please advise?

8 participants