(Doc+) Flush out Data Tiers #107981

Merged
merged 12 commits into from
Jul 18, 2024

Conversation

stefnestor
Contributor

👋🏽 howdy, team!

I highly value the content on this [Data Tiers](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-tiers.html) page. Thanks for writing it! In my experience, its brevity can leave some users slightly confused about its golden nuggets. This PR attempts to flesh out answers to common questions while remaining concise.

The main changes are in the first and second-to-last sections; however, I also attempt some heading restructuring to make the TOC's idea groupings clearer and easier to scan.

The specific clarifications I'd like to push, in order of appearance:

  • There's the content tier (for "data category" > "content", as we've dubbed it on the parent page) and the data temperature tiers (for time series). That the temperature tiers group together is technically not stated, so users end up asking when they'd go hot>warm vs content>warm, etc. I suspect this confusion only arises because users come straight to this page instead of starting at the hierarchy-parent page, so I have linked up to it.
  • (Main) Frozen being accessed/searched "rarely" should imply, well, rarely. I wrote 1% in the PR's `[TIP]` guideline section as a discussion starting point. Frequently we see users not realizing either that they do have ≥25% of all searches hitting the frozen tier or that they shouldn't. This comes up because of architecture bugs (e.g. frozen indices with future timestamps) but also just happenstance (e.g. 01605242, where of searches they hit majority hot, ~5% cold, but then again hit 75% frozen).
  • There's a slew of "how do I check that?", "how do I change that (at creation/later)?", and "what if I set it null?" questions we get about `_tier_preference`, so I just extended the existing section about it.

TIA! 🙏 cc: @dakrone @bytebilly
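
For readers unfamiliar with the `_tier_preference` questions listed above, here is a rough Python sketch of the REST requests they map to. It only builds the method, path, and JSON body and sends nothing. The index name `my-index`, the example tier order, and the helper function names are hypothetical; the setting name and the index settings endpoints are the standard Elasticsearch ones. To my understanding, sending `null` removes the preference rather than restoring the default, but check that against the docs for your version.

```python
import json

# Index setting that holds the ordered, comma-separated tier preference.
TIER_PREFERENCE = "index.routing.allocation.include._tier_preference"


def check_request(index):
    """'How do I check that?': read the setting for one index."""
    return ("GET", f"/{index}/_settings/{TIER_PREFERENCE}", None)


def create_request(index, tiers):
    """'How do I change that at creation?': set it in the create-index body."""
    return ("PUT", f"/{index}", {"settings": {TIER_PREFERENCE: ",".join(tiers)}})


def update_request(index, tiers):
    """'How do I change that later?' and 'what if I set it null?'.

    Passing tiers=None serializes to JSON null in the update-settings body.
    """
    value = ",".join(tiers) if tiers is not None else None
    return ("PUT", f"/{index}/_settings", {TIER_PREFERENCE: value})


if __name__ == "__main__":
    # Hypothetical index name and tier order, for illustration only.
    requests_to_show = [
        check_request("my-index"),
        create_request("my-index", ["data_warm", "data_hot"]),
        update_request("my-index", ["data_cold", "data_warm", "data_hot"]),
        update_request("my-index", None),
    ]
    for method, path, body in requests_to_show:
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

Each printed request can be pasted into Kibana Dev Tools or sent with any HTTP client.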

@stefnestor stefnestor added >enhancement >docs General docs changes Team:Data Management Meta label for data/management team Team:Docs Meta label for docs team Supportability Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better. labels Apr 27, 2024
@elasticsearchmachine elasticsearchmachine added v8.15.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Apr 27, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine elasticsearchmachine removed the Team:Data Management Meta label for data/management team label Apr 27, 2024
@stefnestor stefnestor added the Team:Data Management Meta label for data/management team label Apr 27, 2024
@elasticsearchmachine elasticsearchmachine removed the Team:Data Management Meta label for data/management team label Apr 27, 2024
@shainaraskas shainaraskas self-requested a review April 29, 2024 15:33
Contributor

@shainaraskas shainaraskas left a comment

🔥 you added so many great details in this PR!

I've reviewed and provided some feedback/edits from an organization and clarity POV. There are some nuances around tier hardware profiles that I didn't completely understand, so I apologize for any inaccuracies I injected with my edits and for any feedback that doesn't exactly align with your goals.

docs/reference/datatiers.asciidoc Outdated Show resolved Hide resolved
stefnestor and others added 2 commits May 2, 2024 11:05
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
@stefnestor
Contributor Author

👋🏽 @shainaraskas , thanks for hanging out! Apologies for the delay, I work weekends so today's my Monday.

Your edits are also 🔥 , cheers! I accepted all grammar and most rewordings; I've left comments on what remains because I agree it matters to get these parts right to avoid confusion.

Contributor

@shainaraskas shainaraskas left a comment

just working through your comments on the index allocation section but thought I'd throw these comments your way :)

docs/reference/datatiers.asciidoc Outdated Show resolved Hide resolved
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Contributor

@shainaraskas shainaraskas left a comment

looking so good! left a couple of comments that are up to your preference.

I think we're basically ready to go, but I'm not sure why the tests are failing. looking into it now. 👍

edit: this looks like it's maybe the same error as your other PR, so I'm going to rebase this one too.

edit 2: after it's green and you check out my comments, feel free to merge (unless you're waiting on an engineering review).

docs/reference/datatiers.asciidoc Outdated Show resolved Hide resolved
@shainaraskas
Contributor

we can also probably target 8.14.0, 8.13.3, and 8.13.4 with this so the docs are available asap.

Member

@dakrone dakrone left a comment

I left some comments for this change.

I also have concerns that we give a false sense of specificity with giving hard recommendations for percentages in these docs. My preference would be to teach the reader to weigh the values of cost, performance, and configuration complexity rather than giving hard numbers that are likely to mislead a user. I'm curious what your thoughts about this are.

docs/reference/datatiers.asciidoc Outdated Show resolved Hide resolved
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
@stefnestor
Contributor Author

> I also have concerns that we give a false sense of specificity with giving hard recommendations for percentages in these docs.

From sub-thread, we're agreed to leave this out for now & consider in future doc/blog. Ready again for your review, @dakrone 🙏 & sorry for the delay.

Member

@dakrone dakrone left a comment

LGTM, thanks for iterating on this Stef!

@stefnestor stefnestor merged commit 67a8e89 into main Jul 18, 2024
6 checks passed
@stefnestor stefnestor deleted the stefnestor-patch-7 branch July 18, 2024 20:35
stefnestor added a commit to stefnestor/elasticsearch that referenced this pull request Jul 18, 2024
stefnestor added a commit to stefnestor/elasticsearch that referenced this pull request Jul 18, 2024
@elasticsearchmachine
Collaborator

💚 Backport successful

Branches: 8.14, 8.13

elasticsearchmachine pushed a commit that referenced this pull request Jul 18, 2024
elasticsearchmachine pushed a commit that referenced this pull request Jul 18, 2024
ioanatia pushed a commit to ioanatia/elasticsearch that referenced this pull request Jul 22, 2024
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request Jul 23, 2024
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request Jul 23, 2024
Labels
>docs General docs changes >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Supportability Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better. Team:Docs Meta label for docs team v8.13.3 v8.13.5 v8.14.0 v8.14.4 v8.16.0
4 participants