Native support for data stream naming scheme and related assets in Kibana #134883

Open
ruflin opened this issue Jun 22, 2022 · 3 comments
Labels
Feature:Index Management (Index and index templates UI), Team:Kibana Management (Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more)

Comments

@ruflin
Member

ruflin commented Jun 22, 2022

With 7.13, data streams and component templates were introduced in Elasticsearch as a feature. Along with them came the data stream naming scheme, and Elasticsearch started loading templates for logs-*-*, metrics-*-*, etc. The data stream naming scheme not only describes how data should be organised and indexed; Fleet also follows conventions for how index templates and ingest pipelines are named and how they are extended.

Up to today, all of this is based on conventions and is enforced only when assets are modified through Fleet. As soon as users access the stack directly through the Stack Management UI, these conventions are no longer enforced. Currently (8.3), the Stack Management UI can show which templates were loaded by Elasticsearch or Fleet, because these carry a "managed: true" tag. But that is where its understanding of how these assets are organised stops.

As the data stream naming scheme is not only a Fleet concept but is now at the core of Elasticsearch and the recommended way to ingest and manage data in Elasticsearch, the Stack Management UI should understand how it works in order to provide a better UX around it. There are many aspects to this, both inside the Stack Management UI and outside it, for example in the unified search bar. Here is an example of what could be done in the Stack Management UI.

User story

If a user goes to the Stack Management UI and tries to modify any of the templates belonging to the data stream naming scheme, the Stack Management UI provides guidance. Index templates and component templates that belong together are grouped together in the UI. If a user tries to modify one of the managed assets, Stack Management guides the user to use the @custom template instead and creates it if needed. The same applies to ingest pipelines.
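A minimal sketch of the "use the @custom template instead" redirect, assuming managed assets carry `managed: true` in their `_meta` and that the custom counterpart is named `<name>@custom`. The `Asset` type, `edit_target` function, and the template name in the example are hypothetical illustrations, not a Kibana or Elasticsearch API.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    kind: str  # e.g. "component_template" or "ingest_pipeline"
    meta: dict = field(default_factory=dict)


def is_managed(asset: Asset) -> bool:
    # Assumption: the "managed: true" marker lives in the asset's _meta.
    return asset.meta.get("managed") is True


def edit_target(asset: Asset, existing: dict) -> tuple:
    """Return (asset the UI should edit, whether it had to be created)."""
    if not is_managed(asset):
        return asset, False  # user-owned assets are edited directly
    # Hypothetical convention: strip any "@suffix" and append "@custom".
    custom_name = asset.name.split("@", 1)[0] + "@custom"
    if custom_name in existing:
        return existing[custom_name], False
    created = Asset(name=custom_name, kind=asset.kind)
    existing[custom_name] = created  # "creates it if needed"
    return created, True
```

With this design the managed asset itself is never handed to the editor; a second edit attempt reuses the @custom asset created by the first one.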

The main goal of all flows is to help users make the right decisions, prevent them from shooting themselves in the foot, and provide a good UX around the conventions we have.

@ruflin added the Team:Kibana Management label Jun 22, 2022
@elasticmachine
Contributor

Pinging @elastic/platform-deployment-management (Team:Deployment Management)

@cjcenizal
Contributor

cjcenizal commented Jun 22, 2022

Thanks @ruflin! I have questions about the naming scheme and how we're enforcing it. For clarity, I'm going to use "managed data stream" to refer to the data streams described by the naming scheme docs. According to these docs, a managed data stream must satisfy these criteria:

  • It must be named {type}-{dataset}-{namespace}.
  • It must have the field data_stream.type.
  • The value of the data_stream.type field must equal {type} from the data stream's name.
  • It must have the field data_stream.dataset.
  • The value of the data_stream.dataset field must equal {dataset} from the data stream's name.
  • It must have the field data_stream.namespace.
  • The value of the data_stream.namespace field must equal {namespace} from the data stream's name.

Is that right?
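The criteria above could be checked roughly like this. A sketch, assuming documents use the nested `data_stream` object and that a valid name splits on `-` into exactly three non-empty parts (the real naming rules may be stricter or different); `parse_name` and `violations` are hypothetical helpers.

```python
def parse_name(name: str):
    """Split '{type}-{dataset}-{namespace}'; None unless there are three non-empty parts."""
    parts = name.split("-")
    if len(parts) != 3 or not all(parts):
        return None
    return dict(zip(("type", "dataset", "namespace"), parts))


def violations(name: str, doc: dict) -> list:
    """List which criteria the document `doc` in data stream `name` fails."""
    parsed = parse_name(name)
    if parsed is None:
        return [f"name {name!r} is not {{type}}-{{dataset}}-{{namespace}}"]
    ds = doc.get("data_stream", {})
    problems = []
    for key, expected in parsed.items():
        if key not in ds:
            problems.append(f"missing data_stream.{key}")
        elif ds[key] != expected:
            problems.append(f"data_stream.{key}={ds[key]!r}, expected {expected!r}")
    return problems
```

An empty list means all criteria hold; a name like `logs-nginx.access-default` whose documents carry `data_stream.type: metrics` would be reported, which is the "kinda looks managed" case.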

I just did some testing, and it looks like ES allows the user to create data streams and index templates that only partially meet these criteria. So it's easy for a user to create entities that kinda look like they're managed, but actually aren't. In order to accurately identify a data stream as managed, we'll need logic to assess whether it meets all of the above criteria. This logic will need to live on both the front-end (for any special behavior or presentation specific to managed data streams) and the back-end (for any APIs that perform operations on managed data streams).

I think we can simplify this substantially if we can build a strong validation strategy into ES, to ensure that managed data streams and related entities are easy to create and identify, and behave as expected. For example:

  • Signifying managed data streams with a reserved _meta.managed: true field. Something like this will make it trivial to identify whether a data stream or other entity is managed, assuming that we have strong validation that ensures it will always meet the above criteria.
  • Validation that a request to create a managed data stream satisfies the criteria listed above.
  • Validation that a request to change any of a data stream's special mappings doesn't invalidate these criteria.
  • Validation that a request to create an index template that would generate a managed data stream also satisfies these criteria.
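A rough sketch of what the proposed template validation might check, assuming the reserved marker is the composable template's top-level `_meta.managed` and that a managed template must declare `data_stream` and map the three `data_stream.*` fields. `validate_template` and `mapped_fields` are hypothetical; this is not an existing Elasticsearch check.

```python
REQUIRED_FIELDS = ("data_stream.type", "data_stream.dataset", "data_stream.namespace")


def mapped_fields(properties: dict, prefix: str = "") -> set:
    """Flatten a mappings `properties` tree into dotted field paths."""
    out = set()
    for name, spec in properties.items():
        path = prefix + name
        out.add(path)
        out |= mapped_fields(spec.get("properties", {}), path + ".")
    return out


def validate_template(template: dict) -> list:
    """Errors for a template that claims `_meta.managed: true` but breaks the scheme."""
    if template.get("_meta", {}).get("managed") is not True:
        return []  # only templates claiming to be managed are checked
    errors = []
    if "data_stream" not in template:
        errors.append("managed template must create data streams")
    for pattern in template.get("index_patterns", []):
        # Assumption: each pattern must have the three-part {type}-{dataset}-{namespace} shape.
        if len(pattern.split("-")) != 3:
            errors.append(f"pattern {pattern!r} does not follow {{type}}-{{dataset}}-{{namespace}}")
    props = template.get("template", {}).get("mappings", {}).get("properties", {})
    fields = mapped_fields(props)
    errors += [f"missing mapping for {f}" for f in REQUIRED_FIELDS if f not in fields]
    return errors
```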

These are just a few thoughts off the top of my head. We'd probably need to do a more thorough analysis concerning other related entities like component templates and ingest pipelines. For example, a request to change a component template's mappings needs to be validated against the composed result in case it removes a required field. Ingest node pipelines that feed into managed data streams need to be validated to ensure they don't remove a managed data stream's special fields.
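For the component template case, field presence in the composed result can be approximated as the union of fields mapped by all components, since composition merges mappings rather than deleting fields. The sketch below uses that simplification, plus a check for `remove` processors in a pipeline. Helper names are illustrative, and nested processors (`on_failure`, `pipeline`) are ignored.

```python
REQUIRED_FIELDS = {"data_stream.type", "data_stream.dataset", "data_stream.namespace"}


def flatten(properties: dict, prefix: str = "") -> set:
    """Turn a mappings `properties` tree into dotted field paths."""
    out = set()
    for name, spec in properties.items():
        path = prefix + name
        out.add(path)
        out |= flatten(spec.get("properties", {}), path + ".")
    return out


def missing_after_change(components: dict, name: str, new_properties: dict) -> set:
    """Required fields absent from the composed mappings if component `name` is replaced."""
    proposed = {**components, name: new_properties}
    present = set()
    for props in proposed.values():
        present |= flatten(props)  # simplification: composition = union of fields
    return REQUIRED_FIELDS - present


def removed_required_fields(processors: list) -> set:
    """Required fields deleted by top-level `remove` processors of a pipeline."""
    removed = set()
    for proc in processors:
        spec = proc.get("remove")
        if spec is None:
            continue
        field = spec["field"]  # a single field name or a list of names
        removed |= {field} if isinstance(field, str) else set(field)
    # Removing a parent object (e.g. "data_stream") also removes its subfields.
    return {f for f in REQUIRED_FIELDS
            for r in removed if f == r or f.startswith(r + ".")}
```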

@ruflin
Member Author

ruflin commented Jun 24, 2022

The points you make above, @cjcenizal, are all correct. I like the idea of building this even deeper into Elasticsearch. But I think there is one aspect that you partially missed, which is the UX. In an ideal world, from my perspective, a user never has to know about the data stream naming scheme and all the conventions around it. If the user is using the Kibana UI or API, we hide it from them. If a user wants to modify some mappings, we let the user add, remove, or edit a mapping; which template it is stored in is not relevant, it just works.

Taking this to data streams: a user wants to store nginx log data in Elasticsearch. We can first ask the user whether it is logs or metrics, then what they would like to call this dataset. We create logs-nginx-default for the user and explain how to ship data there. But the user never created a data stream or had to learn what an index template is and what component templates are.
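The flow described above boils down to deriving the name from two answers plus a default namespace. A minimal sketch: the lowercase rule comes from Elasticsearch index naming, while the ban on `-` inside dataset and namespace is an assumption made here so the name can be split back unambiguously. `data_stream_name` and `data_stream_fields` are hypothetical helpers.

```python
ALLOWED_TYPES = ("logs", "metrics")  # the two choices offered in the example flow


def data_stream_name(ds_type: str, dataset: str, namespace: str = "default") -> str:
    """Build {type}-{dataset}-{namespace} from the user's answers."""
    if ds_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {ALLOWED_TYPES}")
    for label, value in (("dataset", dataset), ("namespace", namespace)):
        # Assumption: no '-' inside parts, so the name can be split back unambiguously.
        if not value or "-" in value or value != value.lower():
            raise ValueError(f"{label} must be non-empty, lowercase and contain no '-'")
    return f"{ds_type}-{dataset}-{namespace}"


def data_stream_fields(ds_type: str, dataset: str, namespace: str = "default") -> dict:
    """The data_stream.* values that shipped documents should carry for that name."""
    data_stream_name(ds_type, dataset, namespace)  # reuse the same validation
    return {"data_stream": {"type": ds_type, "dataset": dataset, "namespace": namespace}}
```

For example, `data_stream_name("logs", "nginx")` returns `logs-nginx-default`; the user only answers two questions and never sees a template.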

@yuliacech added the Feature:Index Management label Dec 8, 2023