Format doc values fields. #22146

Closed
wants to merge 4 commits into from

7 participants
@jpountz
Contributor

commented Dec 13, 2016

Currently docvalue_fields returns the values of the fields as they are stored
in doc values. I don't like that it exposes implementation details, but there
are also user-facing issues, like the fact that it cannot work with binary fields.
This change will also make it easier for users to reindex if they do not store
the source, since docvalue_fields will return data in such a format that it
can be put in an indexing request with the same mappings.

The hard part of the change is backward compatibility, since it is breaking.
The approach taken here is that 5.x will keep exposing the internal
representation by default, with a special format name called use_field_format which
will format the field depending on how it is mapped. This will become the
default in 6.0, and the hardcoded format name, which only exists to ease
the transition from 5.x to 6.x, will be removed in 7.0.
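
To make the proposed behaviour concrete, here is a minimal sketch of how a date doc value might be rendered under each option. It is not the code in this PR: the class `DocValueFormatSketch`, the method `formatDate` and the `mappingPattern` parameter are hypothetical names, and returning `epoch_millis` as a string is an assumption made for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
class DocValueFormatSketch {
    // Special format name as spelled in the PR description (the name is still under discussion).
    static final String USE_FIELD_FORMAT = "use_field_format";

    /**
     * Renders a date doc value, which is stored internally as millis since epoch.
     *
     * @param millis         the internal doc value
     * @param format         null keeps the internal representation (5.x default)
     * @param mappingPattern assumed date pattern configured in the field mapping
     */
    static Object formatDate(long millis, String format, String mappingPattern) {
        if (format == null) {
            return millis; // raw internal representation
        }
        if (format.equals("epoch_millis")) {
            return Long.toString(millis); // assumption: rendered as a string
        }
        // The special name defers to the mapping; anything else is used as a pattern.
        String pattern = format.equals(USE_FIELD_FORMAT) ? mappingPattern : format;
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochMilli(millis));
    }
}
```

With a mapping pattern of `yyyy-MM-dd`, calling `formatDate(1481587200000L, "use_field_format", "yyyy-MM-dd")` would give a value that can be sent back in an indexing request, whereas passing `null` keeps the raw long.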

@nik9000
Contributor

left a comment

I like the idea. I like anything that makes it clear that these fields have been processed somehow and we aren't just returning them directly from source.

* @param name The field to get from the docvalue
* @param format How to format the field, {@code null} to use defaults.
*/
public SearchRequestBuilder addDocValueField(String name, String format) {

@nik9000

nik9000 Dec 13, 2016

Contributor

Since we're usually OK with making breaking changes to the Transport API in a minor release it is probably ok to add the format parameter to this method if you think it makes sense.

@jpountz

jpountz Dec 14, 2016

Author Contributor

I know Kibana uses this feature to retrieve dates in millis since Epoch, so with this option they will be able to pass format: epoch_millis when we format docvalue fields. I also plan to use it for the transition period: users will be able in 5.x to enable the 6.x behaviour by using the special format use_field_defaults.

core/src/main/java/org/elasticsearch/index/query/InnerHitBuilder.java Outdated
* @param name name of the field
* @param format how to format the field
*/
public InnerHitBuilder addDocValueField(String name, String format) {

@nik9000

nik9000 Dec 13, 2016

Contributor

Can you make format @Nullable and describe what null means here?

@jimczi

Member

commented Dec 14, 2016

+1 as well. I think it's useful for dates, and it also fixes problems we can have with binary doc_values. Not sure about the name of the BWC param though: in the code you use use_field_format and in the docs you use use_field_defaults. What about use_field_mapping?

@jpountz

Contributor Author

commented Dec 14, 2016

Woooooops, good catch. I like use_field_mapping too, I'll wait for this to be discussed in Fixit Friday to see what others think about it.


@Override
public void writeTo(StreamOutput out) throws IOException {
out.writeString(name);

@nik9000

nik9000 Dec 14, 2016

Contributor

Can you move this up under the reading constructor? It helps to have them on the same screen.
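
The layout suggested here can be sketched as follows. This is a stand-in, not the class under review: `FieldAndFormatSketch` is a hypothetical name, and `java.io.DataInputStream`/`DataOutputStream` replace Elasticsearch's `StreamInput`/`StreamOutput` so that the example is self-contained. Writing the optional format behind a boolean flag is likewise an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a serializable (name, format) pair.
class FieldAndFormatSketch {
    final String name;
    final String format; // may be null, meaning "use defaults"

    FieldAndFormatSketch(String name, String format) {
        this.name = name;
        this.format = format;
    }

    /** Reading constructor: consumes fields in the order writeTo emits them. */
    FieldAndFormatSketch(DataInputStream in) throws IOException {
        this.name = in.readUTF();
        this.format = in.readBoolean() ? in.readUTF() : null;
    }

    /** Kept directly under the reading constructor so both fit on one screen. */
    void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(name);
        out.writeBoolean(format != null); // presence flag for the optional format
        if (format != null) {
            out.writeUTF(format);
        }
    }

    /** Serializes and deserializes in memory to check that the two sides agree. */
    static FieldAndFormatSketch roundTrip(FieldAndFormatSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.writeTo(out);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return new FieldAndFormatSketch(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the two methods adjacent makes it easier to spot a field that is written but never read, or read in a different order.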

--------------------------------------------------
GET _search
{
"docvalue_fields" : [ { "name": "my_date", "format": "epoch_millis" } ]

@nik9000

nik9000 Dec 14, 2016

Contributor

We should keep this as the default if possible in 5.x so we aren't making breaking changes, I think.

@jpountz

jpountz Dec 15, 2016

Author Contributor

Agreed

@jpountz jpountz force-pushed the jpountz:fix/format_doc_values_fields branch Dec 27, 2016

@jpountz jpountz referenced this pull request Dec 27, 2016

Closed

Format doc_value fields #22354

@jpountz jpountz removed the discuss label Dec 27, 2016

@jpountz

Contributor Author

commented Dec 27, 2016

I'm removing the discuss label since we agreed to do this in FixitFriday. I also opened another PR for 5.x to better show the bw compat layer: #22354.

"docvalue_fields" : [ { "name": "my_date", "format": "epoch_millis" } ]
}
--------------------------------------------------
// CONSOLE

@nik9000

nik9000 Dec 30, 2016

Contributor

Optionally, you may want to do something like // TEST[setup:twitter] to create a small index so the test generated from the snippet will find something and exercise the snippet more fully. It certainly isn't required but I've been trying to do it lately.

@clintongormley clintongormley added v5.3.0 and removed v5.2.0 labels Jan 24, 2017

@jpountz jpountz force-pushed the jpountz:fix/format_doc_values_fields branch Jan 31, 2017

@clintongormley clintongormley added v5.4.0 and removed v5.3.0 labels Feb 7, 2017

Format doc values fields.

@jpountz jpountz force-pushed the jpountz:fix/format_doc_values_fields branch to d1490e1 Apr 21, 2017

jpountz added some commits Apr 21, 2017

@clintongormley clintongormley removed the v5.4.1 label May 15, 2017

@rjernst

Member

commented Jun 9, 2017

@jpountz Should this PR (and the related 5.x PR) be brought up to date and merged?

@clintongormley clintongormley added v5.4.3 and removed v5.4.2 labels Jun 14, 2017

@clintongormley clintongormley added v5.4.4 and removed v5.4.3 labels Jun 27, 2017

@dakrone

Member

commented Aug 15, 2017

@jpountz ping on this issue again for updating and merging (or closing if it's not needed any more)

@jpountz

Contributor Author

commented Aug 16, 2017

This PR needs to be synced with Kibana before it can be merged, so I'll do it after 6.0 is out.

@jpountz jpountz closed this Oct 10, 2017

@jpountz jpountz referenced this pull request Oct 10, 2017

Closed

Format doc values fields #26948

@lcawl lcawl added v6.0.0-rc2 and removed v6.0.0 labels Oct 30, 2017
