Deprecate _field_stats endpoint #23914

Merged
merged 1 commit into from
Apr 10, 2017

Conversation

Contributor

@jimczi jimczi commented Apr 5, 2017

_field_stats has evolved quite a lot and has become a multi-purpose API capable of retrieving both the field capabilities and the min/max values for a field.
In the meantime, a more focused API called _field_caps has been added. This endpoint is a good replacement for _field_stats since it can retrieve the field capabilities just by looking at the field mapping (no lookup in the index structures).
Also, the recent improvements made to range queries make the _field_stats API obsolete, since these queries are now rewritten per shard based on the min/max values found for the field.
This means that a range query that does not match any document in a shard can return quickly and can be cached efficiently.
For these reasons this change deprecates the _field_stats endpoint. The deprecation should happen in 5.4, but we won't remove this API in 6.x yet, which is why this PR is made directly against 6.0.
The REST tests have also been adapted so that they do not throw an error while this change is being backported to 5.4.
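
To make the per-shard rewrite concrete, here is a minimal sketch in Python of the overlap check it relies on. This is an illustration only, not Elasticsearch's actual implementation: the function name `shard_can_match` and its parameters are hypothetical, and values are assumed to be comparable numbers such as epoch milliseconds.

```python
from typing import Optional


def shard_can_match(
    shard_min: int,
    shard_max: int,
    gte: Optional[int] = None,
    lte: Optional[int] = None,
) -> bool:
    """Return False when the range [gte, lte] cannot overlap [shard_min, shard_max].

    Hypothetical helper: a missing bound (None) means the range is open on that side.
    """
    # The lower bound of the query is above every value stored in the shard.
    if gte is not None and gte > shard_max:
        return False
    # The upper bound of the query is below every value stored in the shard.
    if lte is not None and lte < shard_min:
        return False
    # Otherwise the two intervals overlap and the shard has to run the query.
    return True
```

When the check returns False, the shard can answer with an empty result without touching its documents, which is what makes the response cheap to compute and cache.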

@@ -59,7 +59,14 @@ setup:

---
"Basic field stats":
  - skip:
      version: " - 5.99.99"
      reason: Deprecation was added in 6.0.0
Contributor

I am a bit confused: are we deprecating in 5.4 or 6.0?

Contributor

Is it to avoid breaking the tests until this is backported to 5.4, as you mention in the PR description?

Contributor Author

Yes, sorry for the confusion. I should have added a note here.

Contributor

@jpountz jpountz left a comment

LGTM

@jimczi jimczi merged commit 9b3c85d into elastic:master Apr 10, 2017
@jimczi jimczi deleted the deprecate_field_stats branch April 10, 2017 08:10
jimczi added a commit that referenced this pull request Apr 10, 2017
@aleph-zero
Contributor

Kinda late to the party on this one, but can we reconsider the name field_caps? Reason being, our API is already overwhelming to beginner and intermediate users, so our naming has to be easy to understand. Naming an endpoint field_caps makes one think at first glance it has something to do with capitalization. Why not just call it field_capabilities? Much easier to remember.

@trevan

trevan commented May 17, 2017

https://www.elastic.co/blog/managing-time-based-indices-efficiently mentions using _field_stats to find indices that are old. With _field_stats gone, is there a way to quickly get the list of indices that are older than a certain point? I think the min/max aggregation would require querying every single index (or at least all the potentially old indices) to see if it is old enough.

@hakanai

hakanai commented Jun 5, 2017

We are using this to get the min and max for a field without having to do a search. Unless the old version was already implemented by doing a search, isn't changing our code to an aggregation just going to make it slower?

@jpountz
Contributor

jpountz commented Jun 5, 2017

The field stats API was just looking at index stats, while aggs do a search all the time, so aggs will indeed be slower. We could look into making aggs look at index statistics too when the query matches all docs and there are no deletions.

@ggrossetie
Contributor

Kibana 5.4+ uses the _field_stats API to get index names from a time range. It's a great feature because users can configure an index pattern with a wildcard, but the search/aggregation will only be executed on indices that match the time range.
I was planning to implement a similar feature in Grafana, but since this API is deprecated I don't think it's a good idea 😐

In our case, running an aggregation is not desirable because we have one billion documents each day. Will the field_caps API include min/max values at some point? Do you plan to create another API to get index stats?

@ncepuwanghui

@trevan Hi, I want to get the list of indices within a specified time range, based on a timestamp field, in Elasticsearch 6.3. How can I do this without _field_stats (see the request below)? I could not get it with the new _field_caps API. Do you have a better way?

GET search-logs/_field_stats?level=indices
{
  "fields": ["@timestamp"],
  "index_constraints": {
    "@timestamp": {
      "max_value": {
        "lt": "2016/07/03",
        "format": "yyyy/MM/dd"
      }
    }
  }
}

@trevan

trevan commented Jul 23, 2018

@ncepuwanghui, the only workaround I have is to do a search request with an aggregation on the indices and the min/max of the timestamp in each index. It is a LOT slower than _field_stats, so we are actively trying to get away from that workflow.
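
A rough sketch of that workaround in Python, for illustration. It assumes a `terms` aggregation on the `_index` metadata field with `min`/`max` sub-aggregations on the timestamp. The aggregation names (`by_index`, `min_value`, `max_value`), the function names and the `max_indices` default are all hypothetical, and sending the request body to `_search` is left to whatever client you use.

```python
from typing import Any, Dict, List


def build_index_range_request(
    field: str = "@timestamp", max_indices: int = 1000
) -> Dict[str, Any]:
    """Build a search body that returns the min/max of `field` per index."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "by_index": {
                # One bucket per index that contributed documents.
                "terms": {"field": "_index", "size": max_indices},
                "aggs": {
                    "min_value": {"min": {"field": field}},
                    "max_value": {"max": {"field": field}},
                },
            }
        },
    }


def indices_older_than(response: Dict[str, Any], cutoff_millis: float) -> List[str]:
    """Return the indices whose max timestamp is strictly below the cutoff.

    Assumes the response shape produced by the request above, with date
    min/max values reported as epoch milliseconds.
    """
    buckets = response["aggregations"]["by_index"]["buckets"]
    return sorted(
        bucket["key"]
        for bucket in buckets
        # An index whose documents lack the field reports a null max value.
        if bucket["max_value"]["value"] is not None
        and bucket["max_value"]["value"] < cutoff_millis
    )
```

Unlike _field_stats, this runs a real search on every index that matches the request, which is why it is much slower on large clusters.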

@radu-gheorghe
Contributor

We were in need of this functionality as well and added it back via a plugin: https://github.com/sematext/elasticsearch-field-stats

Adding the link here in case it's useful for someone.
