Mandatory data tier preference #76147

Open
7 tasks done
henningandersen opened this issue Aug 5, 2021 · 6 comments
Assignees
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >enhancement Team:Data Management Meta label for data/management team

Comments

@henningandersen
Contributor

henningandersen commented Aug 5, 2021

Using data tiers allows allocating indices to dedicated tiers of nodes. Such nodes would typically have different characteristics, either physically (storage type, RAM:storage ratio) or from a usage standpoint (my hot tier is expected to respond fast).

Using data tiers is optional in that using the data role will assign all data tiers to the node. However, if a cluster is using separate data tiers it is desirable to be explicit about where a specific index belongs.

Today we allow index.routing.allocation.include._tier_preference to be unspecified for an index. This prevents Elasticsearch and its clients from relying on which tier an index/shard is located on, affecting the following:

  • Autoscaling does not know which data tier to scale up.
  • The _tier query will not know the tier of an index/shard.

Furthermore, making the tier preference mandatory allows us to rely on it for future developments, such as shard balancing, UI, monitoring and more. There is no known good use case for a tier-less index; allowing it only adds complexity for ourselves and users, and such an index can be considered bad data.
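To make the gap concrete, here is a minimal sketch of how a client could spot tier-less indices. It assumes the input is a dict shaped like the response of the get-settings API with flat settings (index name, then "settings", then flat keys); the function name and the sample index names are hypothetical.

```python
from typing import Any, Dict, List

# Setting name taken from the issue text.
TIER_PREFERENCE_KEY = "index.routing.allocation.include._tier_preference"


def indices_without_tier_preference(
    settings_by_index: Dict[str, Dict[str, Any]]
) -> List[str]:
    """Return the names of indices whose flat settings lack a tier preference."""
    missing = []
    for name, body in settings_by_index.items():
        flat = body.get("settings", {})
        value = flat.get(TIER_PREFERENCE_KEY)
        # Treat an absent key, an explicit null and an empty string as "no preference".
        if value is None or str(value).strip() == "":
            missing.append(name)
    return sorted(missing)


# Hypothetical sample data, not taken from a real cluster.
example = {
    "logs-000001": {"settings": {TIER_PREFERENCE_KEY: "data_warm,data_hot"}},
    "legacy-index": {"settings": {"index.number_of_shards": "1"}},
}
print(indices_without_tier_preference(example))
```

For any index this returns, a consumer such as autoscaling or the _tier query has nothing to go on.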

The proposal here is to work towards having index.routing.allocation.include._tier_preference be mandatory for all indices in the following steps:

  • Add a cluster setting to signal that creating new indices should always result in a tier preference. When set, creating an index should add the default tier preference if no explicit preference was given in the request. This will be default off in 7.x, default on in 8.
    • And, in fact, only on allowed in 8. (edit: it's always treated as on, in that we disregard the value of the setting)
  • Add deprecation info and warnings in 7.x, only for clusters that have data nodes without all data roles.
    • Add deprecation info for indices without a tier preference set.
    • Add deprecation warning when creating an index results in no tier preference set. This should include create index, rollover and create data stream.
  • Make ILM migrate action mandatory in 8.0, regardless of allocate action.
  • On 7.x, change the migrate_to_data_tiers API to apply the default data tier preference to any index that results in no tier preference otherwise and set the cluster setting mentioned in the first work item to ensure new indices are assigned a tier preference.
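The first work item can be sketched as a small pure function. This is an illustration only, not the Elasticsearch implementation: the function name, the `enforce_default` flag standing in for the proposed cluster setting, and the `"data_content"` default value are all assumptions made for the example.

```python
from typing import Dict, Optional

# Setting name taken from the issue text.
TIER_PREFERENCE_KEY = "index.routing.allocation.include._tier_preference"


def resolve_create_settings(
    requested: Dict[str, Optional[str]],
    enforce_default: bool,
    default_preference: str = "data_content",  # assumed default, for illustration
) -> Dict[str, Optional[str]]:
    """Return the settings an index would be created with under the proposal."""
    resolved = dict(requested)  # never mutate the caller's request
    # Only fill in the default when the request did not mention the setting.
    # An explicit null is left untouched here, since the issue lists
    # "enforce not setting tier preference to null" as a separate, later step.
    if enforce_default and TIER_PREFERENCE_KEY not in resolved:
        resolved[TIER_PREFERENCE_KEY] = default_preference
    return resolved


# Flag off (proposed 7.x default): the index stays tier-less.
print(resolve_create_settings({}, enforce_default=False))
# Flag on (proposed 8 behaviour): the default preference is added.
print(resolve_create_settings({}, enforce_default=True))
# An explicit preference always wins.
print(resolve_create_settings({TIER_PREFERENCE_KEY: "data_hot"}, enforce_default=True))
```

The same rule would have to apply to every path that creates an index, which per the list above includes create index, rollover and create data stream.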

In a future release (possibly 9.0) we should close the loop and:

  • Remove the flag from cluster settings.
  • Enforce not setting tier preference to null (we could consider doing this in 8.0 too).
  • Evaluate at what point we need/want to drop the migrate_to_data_tiers API from the code (8.x? 9.x?)
@henningandersen henningandersen added >enhancement :Data Management/ILM+SLM Index and Snapshot lifecycle management needs:triage Requires assignment of a team area label labels Aug 5, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Aug 5, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@gwbrown gwbrown removed the needs:triage Requires assignment of a team area label label Aug 5, 2021
@gwbrown
Contributor

gwbrown commented Aug 5, 2021

@colings86 @dakrone This issue also seems relevant to recent conversations regarding how node shutdown should interact with tier preference - if tier preference becomes mandatory, that could change our calculus on the issue.

@joegallo
Contributor

joegallo commented Oct 15, 2021

Things to come back to:

@droberts195
Contributor

What is the expectation when a snapshot taken in a pre-7.10 cluster is restored into an 8.x cluster? The indices in that snapshot will not have a tier preference set. Does it mean that 8.x code cannot safely assume that every index will have a tier preference set? Or will snapshot restoration be changed for 8.0 and above to automatically set a tier preference on restored indices that didn't have one when snapshotted?

@henningandersen
Contributor Author

@droberts195
We still allow indices without a _tier_preference in 8.0; we did not break this. Disallowing them is for an upcoming release, after a proper deprecation period. We cannot be guaranteed a _tier_preference, but will assume it is there (without breaking if it is missing) for things like autoscaling, since users should run the migrate API to fix deprecations before upgrading.

@droberts195
Contributor

> users should run the migrate api to fix deprecations before upgrading

If it's not already documented I think it might be worth calling out in the docs that even if you fix all the deprecations before upgrading you can reintroduce indices that have those same deprecated characteristics by restoring an old snapshot.

> will assume it is there (without breaking if missing)

This needs to be well-known among developers who write code that searches Elasticsearch. It's not safe to assume _tier_preference is always set in 8.0+. Code needs to be written in such a way that it won't break if it encounters an index that was restored from an old snapshot. This knowledge doesn't just need to be in the core ES team, but also for example in teams writing UI code that wants to quickly obtain an example document from the fastest indices that match a pattern. I will make sure the ML UI team are aware.
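As a rough illustration of what "won't break" could look like for the "fastest indices first" case, here is a sketch that orders indices by the first tier in their preference and sorts tier-less indices last instead of failing. The helper names and the speed ranking in `TIER_ORDER` are assumptions made for the example; the first entry of a preference is only where the shard would ideally live, not necessarily where it currently is.

```python
from typing import Dict, List, Optional

# Assumed speed ranking for the example; anything else (missing or unknown) sorts last.
TIER_ORDER = ["data_hot", "data_warm", "data_cold", "data_frozen"]


def first_preferred_tier(tier_preference: Optional[str]) -> Optional[str]:
    """Return the first tier of a comma-separated preference, or None if unset."""
    if not tier_preference:
        return None
    for tier in tier_preference.split(","):
        tier = tier.strip()
        if tier:
            return tier
    return None


def order_fastest_first(preference_by_index: Dict[str, Optional[str]]) -> List[str]:
    """Sort index names by assumed tier speed, tolerating a missing preference."""

    def rank(name: str):
        tier = first_preferred_tier(preference_by_index[name])
        position = TIER_ORDER.index(tier) if tier in TIER_ORDER else len(TIER_ORDER)
        return (position, name)  # name breaks ties deterministically

    return sorted(preference_by_index, key=rank)


# Hypothetical indices; "restored-old" mimics an index restored from an old snapshot.
print(order_fastest_first({
    "restored-old": None,
    "logs-cold": "data_cold,data_warm,data_hot",
    "logs-hot": "data_hot",
}))
```

The point is only that a missing preference becomes an "unknown" bucket rather than an exception.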
