[Data Tiers] Add telemetry enhancements for data tiers utilization #71204

Open
jethr0null opened this issue Apr 1, 2021 · 4 comments
Labels
:Data Management/Indices APIs (APIs to create and manage indices and templates), >enhancement, Team:Data Management (Meta label for data/management team)

Comments

@jethr0null

jethr0null commented Apr 1, 2021

Telemetry for data tiers was added in this PR.

Currently collected data:

node_count :: number of nodes with this tier/role
index_count :: number of indices on this tier
total_shard_count :: total number of shards for all nodes in this tier
primary_shard_count :: number of primary shards for all nodes in this tier
doc_count :: number of documents for all nodes in this tier
total_size_bytes :: total number of bytes for all shards for all nodes in this tier
primary_size_bytes :: number of bytes for all primary shards on all nodes in this tier
primary_shard_size_avg_bytes :: average size of the primary shards in this tier
primary_shard_size_median_bytes :: median size of the primary shards in this tier
primary_shard_size_mad_bytes :: median absolute deviation of the size of the primary shards in this tier
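
To make the three primary_shard_size_* statistics concrete, here is a minimal sketch of how they relate to a list of primary shard sizes. The function name and the sample sizes are made up for illustration, and this is not the Elasticsearch implementation, which may compute these values differently (for instance approximately).

```python
from statistics import mean, median


def primary_shard_size_stats(sizes_bytes):
    """Return avg, median and median absolute deviation (MAD) of shard sizes.

    `sizes_bytes` is a hypothetical list of primary shard sizes for one tier.
    """
    if not sizes_bytes:
        return {"avg": 0, "median": 0, "mad": 0}
    med = median(sizes_bytes)
    # MAD: the median of each shard's absolute distance from the median size.
    mad = median([abs(size - med) for size in sizes_bytes])
    return {"avg": mean(sizes_bytes), "median": med, "mad": mad}


# Illustrative sizes only: one large shard pulls the average up,
# while the median and MAD stay close to the typical shard.
stats = primary_shard_size_stats([100, 200, 300, 400, 1000])
print(stats)
```

The median and MAD are less sensitive to a few oversized shards than the average, which is presumably why all three are collected.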

Challenges with the current data:

The existing telemetry does not let us distinguish actual utilization: if a node is tagged with multiple data tier roles, stats such as index_count are reported under every one of those tiers. To report accurately on the actual utilization of each tier, we need telemetry that associates these fields with the tier the data is currently associated with.
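
A small sketch of the double counting described above, under stated assumptions: the node and index structures, the function names, and the idea of a single "assigned tier" per index are hypothetical and only illustrate the difference between counting by node role and counting by the tier the data belongs to.

```python
from collections import defaultdict


def index_count_by_node_role(nodes):
    """Count indices per tier the way role-based stats would.

    An index is counted under every tier role of the node that holds it.
    """
    indices_per_tier = defaultdict(set)
    for node in nodes:
        for role in node["roles"]:
            indices_per_tier[role].update(node["indices"])
    return {tier: len(names) for tier, names in indices_per_tier.items()}


def index_count_by_assigned_tier(index_tiers):
    """Count each index once, under the single tier it is associated with."""
    counts = defaultdict(int)
    for tier in index_tiers.values():
        counts[tier] += 1
    return dict(counts)


# Hypothetical cluster: one node carries both the hot and warm roles,
# but both indices are really hot-tier data.
nodes = [{"roles": ["data_hot", "data_warm"], "indices": ["logs-1", "logs-2"]}]
index_tiers = {"logs-1": "data_hot", "logs-2": "data_hot"}

print(index_count_by_node_role(nodes))            # warm appears with 2 indices
print(index_count_by_assigned_tier(index_tiers))  # only hot is reported
```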

For example, I would expect a query of our telemetry data like the following to return only data that is “actively associated” with the warm tier: stack_stats.xpack.data_tiers.data_warm.index_count > 1

A concrete example of how this data will be used is to report on and visualize the number of unique clusters that have data residing on a given tier (the ability to drill down into more detailed stats such as the doc_count or index_count for the data residing on each tier would also be useful).
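
As a rough sketch of that reporting use case, the snippet below counts the unique clusters with data on a tier from a list of telemetry documents. The document layout follows the stack_stats.xpack.data_tiers path used above; the cluster_uuid field name, the function name, and the threshold parameter are assumptions for illustration.

```python
def clusters_with_data_on_tier(docs, tier, min_index_count=1):
    """Return the set of cluster ids whose telemetry shows data on `tier`.

    `docs` is a hypothetical list of telemetry documents as nested dicts.
    """
    clusters = set()
    for doc in docs:
        tier_stats = (
            doc.get("stack_stats", {})
            .get("xpack", {})
            .get("data_tiers", {})
            .get(tier, {})
        )
        # Documents without stats for this tier count as having no data on it.
        if tier_stats.get("index_count", 0) >= min_index_count:
            clusters.add(doc["cluster_uuid"])
    return clusters
```

This only gives a meaningful answer once index_count reflects data actually associated with the tier rather than node roles.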

It would also be useful to be able to distinguish whether the tier an index is located on matches its first preference (index.routing.allocation.include._tier_preference). So for example, an index might specify cold as its first preference but if no cold nodes are available it could reside on its tier of second preference (say warm). We could use this distinction to suggest actions to the user such as scaling or enabling autoscaling.
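
The preference check could be sketched roughly as follows, assuming the setting value is a comma-separated list of tier names in priority order. The function name and the way the current tier is obtained are hypothetical.

```python
def tier_preference_rank(tier_preference, current_tier):
    """Return the 0-based position of `current_tier` in the preference list.

    0 means the index sits on its first-preference tier; None means the
    current tier is not in the preference list at all.
    """
    preferences = [t.strip() for t in tier_preference.split(",") if t.strip()]
    try:
        return preferences.index(current_tier)
    except ValueError:
        return None


# An index that prefers cold but currently resides on warm.
rank = tier_preference_rank("data_cold,data_warm,data_hot", "data_warm")
print(rank)  # 1, i.e. the second preference
```

A rank greater than 0 would be the signal for suggesting scaling or autoscaling of the preferred tier.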

cc @dakrone @sajjadwahmed

@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@dakrone
Member

dakrone commented Apr 26, 2021

> distinguish between what roles a node is capable of acting as versus what role(s) it is actively acting as

Can you explain this one a little more? I don't think I understand what you mean by "actively acting as".

@jethr0null
Author

Sure thing. I updated the original comment for clarity/corrections.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)
