[Discussion] Naming convention for builtin pipelines #96267

Closed
eyalkoren opened this issue May 23, 2023 · 27 comments
Assignees
Labels
discuss Team:Data Management Meta label for data/management team

Comments

@eyalkoren
Contributor

eyalkoren commented May 23, 2023

TL;DR

This issue quickly evolved to include not only ingest pipelines, but all builtin assets.
The eventual agreed schema is described in #96267 (comment), with examples shown in a spreadsheet.

Description

#95782 introduced a way for stack components to automatically install prebuilt ingest pipelines.
Since then, the following pipelines were already added:

  1. logs-default-pipeline (through [Logs+] Default pipeline for logs data streams #95971)
  2. behavioral_analytics-events-final_pipeline (through [Behavioral Analytics] Analytics pipeline to the index template registry #96104)
  3. logs@json-message (through [Logs+] Add pipeline that parses JSON log events into top-level fields #96083)

We want to come up with a naming convention that will:

  • allow sufficient flexibility to easily create unique names
  • produce descriptive names
  • produce names that are less likely to collide with custom pipelines added by users

Just as an example, consider a template such as: {data-stream-type}@{purpose}[-{description}], where -{description} is optional, yielding the following names:

  1. logs@default
  2. behavioral_analytics@final
  3. logs@json-message
  4. all@ecs-core
  5. all@ecs-extended
  6. logs@ecs
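
To make the example template concrete, here is a minimal Python sketch that assembles names from its parts. The function name `pipeline_name` and its parameters are illustrative assumptions, not an existing Elasticsearch API.

```python
from typing import Optional


def pipeline_name(data_stream_type: str, purpose: str,
                  description: Optional[str] = None) -> str:
    """Build a name of the form {data-stream-type}@{purpose}[-{description}]."""
    name = f"{data_stream_type}@{purpose}"
    if description:
        # The description is optional and appended after a hyphen.
        name += f"-{description}"
    return name


print(pipeline_name("logs", "default"))
print(pipeline_name("all", "ecs", "core"))
```

With these inputs the sketch yields `logs@default` and `all@ecs-core`, matching items 1 and 4 of the list above.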

Maybe as an extension to this discussion, we should decide on how and where we properly document the builtin pipelines, as well as builtin component templates. We could use the org.elasticsearch.xpack.core.template.IndexTemplateRegistry concrete implementations to produce this documentation, or at least to verify that all automatically-installed components are documented (e.g. through a unit test that matches documented lists to registry-produced lists).
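
The documentation check mentioned above could look roughly like the following sketch. It is written in Python for illustration only (the real registry implementations are Java classes), and `doc_drift` and both input sets are hypothetical stand-ins for "what the registries produce" and "what the docs list".

```python
from typing import Dict, Set


def doc_drift(registered: Set[str], documented: Set[str]) -> Dict[str, Set[str]]:
    """Compare registry-produced component names with the documented list."""
    return {
        # Installed automatically but not mentioned in the docs.
        "missing_from_docs": registered - documented,
        # Mentioned in the docs but no longer installed.
        "stale_in_docs": documented - registered,
    }


registered = {"logs-default-pipeline", "logs@json-message"}
documented = {"logs-default-pipeline"}
print(doc_drift(registered, documented))
```

A unit test would then simply assert that both sets in the result are empty.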

@ruflin @felixbarny @jimczi @dakrone - please provide your input.

@elasticsearchmachine added the needs:triage label May 23, 2023
@astefan added the :Data Management/Other label and removed the needs:triage label May 23, 2023
@elasticsearchmachine added the Team:Data Management label May 23, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@felixbarny
Member

Summarizing the conventions for other assets:

| Type | Naming convention | Example |
| --- | --- | --- |
| Index template | {type} | logs |
| ILM Policy | {type} | logs |
| Mappings component template | {type}-mappings | logs-mappings |
| Settings component template | {type}-settings | logs-settings |

Given that, the proposed naming convention {data-stream-type}@{purpose}[-{description}] seems not to be following the existing conventions.

What about {type}-pipeline_{purpose}/{type}-{dataset}-pipeline_{purpose} as a convention?
For example: logs-pipeline_default, logs-pipeline_json_message, behavioral_analytics-events-pipeline_final.

In #96083 (comment), @ruflin mentioned a concern that this naming convention may conflict with the data stream naming scheme so that behavioral_analytics-events-pipeline_final may be interpreted as type: behavioral_analytics, dataset: events, namespace: pipeline_final.
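
The ambiguity can be illustrated with a small Python sketch that reads a name as if it followed the {type}-{dataset}-{namespace} data stream scheme. The helper `parse_as_data_stream` is hypothetical, and it assumes that type and dataset contain no hyphens.

```python
from typing import Dict


def parse_as_data_stream(name: str) -> Dict[str, str]:
    """Split a name as if it followed {type}-{dataset}-{namespace}."""
    # Assumption: type and dataset contain no hyphens, so the first two
    # hyphens delimit the three parts.
    type_, dataset, namespace = name.split("-", 2)
    return {"type": type_, "dataset": dataset, "namespace": namespace}


print(parse_as_data_stream("behavioral_analytics-events-pipeline_final"))
```

Under that reading, the pipeline name parses cleanly as a data stream whose namespace is `pipeline_final`.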

To be honest, I don't think that's a huge concern and reason enough to break with existing naming conventions. After all, that's a name of a pipeline, not the name of a data stream.

@ruflin
Member

ruflin commented May 24, 2023

For me this discussion is linked to elastic/kibana#149484 @joshdover. We should not only discuss pipeline names but also index template and component names as I expect we will also ship some of these by default.

The old ones don't follow any convention as there wasn't any. One difference here is that the existing logs-mappings were never recommended / documented for reuse AFAIK. I expect this to change, which is why the convention is more important. Even if the new convention conflicts with the old one, we don't need to break the old one but ensure all the new ones follow it.

I like the direction @eyalkoren is taking with {data-stream-type}@{purpose}[-{description}]. Part of that for me is that the @ marks it as something internal / managed by Elastic. Applying this to potential ecs component templates, this would become all@ecs-8.2.0. Instead of all, we could use signals. Or should this be ecs@8.2.0, meaning it is {data-stream-type}|{purpose}@{description}?

For {type}-{dataset}-pipeline_{purpose}, my assumption is all the pipeline / reusable templates we load are never specific to a dataset but always on the type or global level.

@ruflin
Member

ruflin commented Jun 5, 2023

As #96171 is moving forward, we should come to a conclusion here quickly. The most important part for me is that it is directly visible that an ingest pipeline or template is managed by the stack. @ should be used as this sign as we already started down this path with Fleet. @custom also falls under this: even though the user can modify it, it has a special purpose.

Based on the above, I propose we follow the {type}@... rule with the exception of ecs@. Several ECS component templates will exist in the future and ECS is important enough to have its own prefix.

We still need to continue iterating on what the conventions are after @ but we can figure this out along the way.

@eyalkoren
Contributor Author

Anyone wishing to propose a convention that will cover what we need- please take a look at the current list of resources we use to get a good idea.

@eyalkoren
Contributor Author

Documenting here the decision we made in our offline discussion: we are good with replacing component names within the registry, as opposed to registering each component both with the old and the new name.

This means that in rolling upgrades, the already registered components will not be removed, so the cluster would serve both.
New clusters will serve the components only through the new names.
External integrations that rely on stack templates may still break, but we believe that there are very few of those, if any, in which case the fix would be to switch to using the new name. This should be included in docs.

@dakrone
Member

dakrone commented Jun 5, 2023

I think the idea of {type}@… is reasonable, and would allow us to stick with our …@custom standard that's already used today. I don't think that we need to make the @ mandatory (i.e., enforced in the API). I think a convention is enough for these rather than any hard rules.

@ruflin I would prefer to leave out ES or stack versions anywhere in the id of the thing itself (so no @8.2.0). With Serverless these versions will not make any user sense and can easily be mismatched between the client and server.

If I had to pick my preference, it would be the format of {type}[-{description}]@{kind}, where we enumerate the {kind} values. For example:

| Type | ID | Description |
| --- | --- | --- |
| Index template | logs@default | Default index template for logs |
| ILM Policy | logs@default | Default ILM policy for logs |
| ILM Policy | logs-30days@optional | Optional ILM policy for logs with 30d retention |
| Component template | logs-mappings@default | Default logs mappings |
| Component template | logs-settings@default | Default component template |
| User customizable template | logs@custom | Location for user customization |
| Pipeline | logs-pipeline@default | Default logs pipeline |
| Optional pipeline | logs-json-parsing-pipeline@optional | Optional default pipeline for JSON parsing |

I don't have an extremely strong opinion about the {type}[-{description}]@{kind} vs {type}@{kind}[-{description}] except that I like being able to filter out items for *@custom or *@optional. Would it be worth having a separate delimiter for the description since - is almost "special-cased" in a lot of ways? Something like logs<delimiter>30days@optional or logs@optional<delimiter>30days?
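
As a rough illustration of the filtering benefit, the following Python sketch applies wildcard patterns such as `*@custom` and `*@optional` to the IDs from the table above. The helper `filter_ids` is hypothetical and uses plain shell-style matching rather than any Elasticsearch API.

```python
from fnmatch import fnmatchcase
from typing import List

IDS = [
    "logs@default",
    "logs-30days@optional",
    "logs-mappings@default",
    "logs-settings@default",
    "logs@custom",
    "logs-pipeline@default",
    "logs-json-parsing-pipeline@optional",
]


def filter_ids(ids: List[str], pattern: str) -> List[str]:
    """Return the IDs that match a shell-style wildcard pattern."""
    return [i for i in ids if fnmatchcase(i, pattern)]


print(filter_ids(IDS, "*@optional"))
print(filter_ids(IDS, "*@custom"))
```

Keeping the {kind} as the final segment is what makes a single trailing pattern sufficient here.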

I'd like to enumerate the values for what {kind} can be though, I think @default and @custom make sense. What about @optional to indicate that it is something installed but not necessarily used by default?

What about the {type} options? In the original I see logs, behavioral_analytics, and all. I think almost all of those will correspond to a dataset except for "all", but what others do we expect?

@ruflin
Member

ruflin commented Jun 6, 2023

I think a convention is enough for these rather than any hard rules.

++

With Serverless these versions will not make any user sense and can easily be mismatched between the client and server.

I don't expect the versioned templates to be loaded by Elasticsearch itself. Agree, it doesn't make sense for these. I expect us eventually to have an ECS / semconv package which is installed / managed by Fleet and could install different versions of the templates to be used. These are only there on demand.

logs-mappings

I like the @ directly after type because it ensures, these "global" assets do not conflict / are confused with a dataset name which could be called mapping. There already is for example logs-nginx.access@custom for dataset specific templates.

For the delimiter, the rule I was thinking of is that - goes between kind / description and _ is used to combine words together. This also matches the data stream naming scheme conventions.

For the {type} values, at the moment I think we only have logs, metrics, traces, ecs. I'm not even sure we need all as all is likely ECS? And if we need all, I don't like all too much, maybe we can find a better term?

@felixbarny
Member

And if we need all, I don't like all too much, maybe we can find a better term?

We could just use data_stream. For example data_stream@mappings or data_stream@default-mappings for what's currently called data-streams-mappings.

@eyalkoren
Contributor Author

eyalkoren commented Jun 20, 2023

Summarizing points from the discussion above:

  • @ is a good delimiter to indicate that a component is managed by Elastic
  • - is not a great delimiter, I already saw us struggling with it, especially in combination with _
  • no versions in names
  • use data_stream (or data-stream if - is not part of the schema) if a component is general and not intended for a specific data type

Adding my own points:

  • if we try to create consistency, we may either include or exclude the type of the component in the name (I vote to include). For example- logs@default represents both index template and ILM policy for logs, whereas logs-mappings@default and logs-settings@default include the component type.
  • also for the sake of consistency, I'd avoid optional part in the name, so that all names are constructed the same

As a first attempt to accommodate all of those, I propose this schema: {data-type}@{component-type}@{description}. The description is not optional and it can be written freely, including either - or _ or both, like other parts can.
Taking some examples for existing names, it would look like this:

| Current ID | New ID |
| --- | --- |
| data-streams-mappings | data-stream@mappings@default |
| ecs@dynamic_templates | data-stream@mappings@ecs |
| 30-days-default | data-stream@ilm-policy@30-days |
| logs-mappings | logs@mappings@default |
| logs-settings | logs@settings@default |
| logs | logs@ilm-policy@default |
| logs | logs@index-template@default |
| metrics-mappings | metrics@mappings@default |
| .fleet-actions-results-ilm-policy | fleet-actions-results@ilm-policy@default |
| .fleet-actions-results | fleet-actions-results@index-template@default |
| .monitoring-kibana | monitoring-kibana@index-template@default |
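
A minimal Python sketch of this three-part schema follows. The helper `three_part_id` is a hypothetical name; it only shows how including the component type disambiguates the two assets that are both currently called `logs`.

```python
def three_part_id(data_type: str, component_type: str, description: str) -> str:
    """Build {data-type}@{component-type}@{description}; all parts are mandatory."""
    parts = (data_type, component_type, description)
    if not all(parts) or any("@" in p for p in parts):
        # @ is the only special character, so it may not appear inside a part.
        raise ValueError(f"invalid parts: {parts!r}")
    return "@".join(parts)


# The ILM policy and the index template no longer share one name.
print(three_part_id("logs", "ilm-policy", "default"))
print(three_part_id("logs", "index-template", "default"))
```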

Open questions:

  • do we need to include index templates in this, or only component templates? (for now I assume we do need)
  • do we need to enforce this naming change on xpack components as well? Others (like Fleet's)? If so- how should we manage this change?

Any feedback would be highly appreciated.

@felixbarny
Member

do we need to include index templates in this, or only component templates? (for now I assume we do need)

Yes, I think it makes sense to not only apply this to component templates, but also index templates.

do we need to enforce this naming change on xpack components as well? Others (like Fleet's)? If so- how should we manage this change?

I don't think we need to enforce this but I agree that there should be consistency in terms of naming conventions.

Currently, the convention in fleet seems to be {data-type}@{description}.

For example, logs-nginx.access@custom may refer to both a custom component template or a custom ingest pipeline for the logs-nginx.access-* data stream.

While I do agree that having the component type as a part of the identifier would be nice, I'm not sure if that's worth breaking with the conventions in fleet, which either implies an inconsistency or that we need changes in fleet. However, we can't change the conventions in fleet easily as it would mean the existing @custom component templates and ingest pipelines wouldn't be picked up anymore, which would be a breaking change. Therefore, I'd be inclined to not include the component type and piggy-back on the current conventions in fleet.

cc @jpountz FYI that we're planning to change the names of the built-in assets like index templates, component templates, and ILM policies.

@eyalkoren
Contributor Author

eyalkoren commented Jun 21, 2023

Well, if we are using this as a guideline, then we already have a convention and what we need to do is make everything adhere to it.

Second attempt then: the naming scheme has only two mandatory parts with one delimiter: {data-type}@{description}, where the only special character is @ and each part is free-form otherwise, meaning it may contain -, _, . and the component type.
Please 👍 or 👎 or propose something else
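
For illustration, the two-part scheme could be checked with a sketch like the one below. `split_name` is a hypothetical helper; the only rule it applies is the one stated above: exactly one @ with a non-empty part on each side.

```python
from typing import Tuple


def split_name(name: str) -> Tuple[str, str]:
    """Split {data-type}@{description}, requiring exactly one @."""
    data_type, sep, description = name.partition("@")
    if not sep or not data_type or not description or "@" in description:
        raise ValueError(f"{name!r} does not match {{data-type}}@{{description}}")
    return data_type, description


print(split_name("logs-nginx.access@custom"))
print(split_name("logs@default"))
```

Everything else (-, _, . and even the component type) passes through untouched, which is the point of keeping both parts free-form.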

@ruflin
Member

ruflin commented Jun 26, 2023

There is a potential way to include what we do today in Fleet into this:

{data-type}[-{dataset}]?[-{namespace}]?@{description}

This gives a bit more structure to what is before the @
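
One way to read that pattern is as a regular expression, sketched here in Python. The expression and the `parse_fleet_style` helper are assumptions for illustration; in particular, they assume that data-type, dataset and namespace never contain a hyphen themselves.

```python
import re
from typing import Dict, Optional

# {data-type}[-{dataset}]?[-{namespace}]?@{description}
FLEET_STYLE = re.compile(
    r"^(?P<data_type>[^-@]+)"
    r"(?:-(?P<dataset>[^-@]+))?"
    r"(?:-(?P<namespace>[^-@]+))?"
    r"@(?P<description>[^@]+)$"
)


def parse_fleet_style(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named parts, or None if the name does not match."""
    match = FLEET_STYLE.match(name)
    return match.groupdict() if match else None


print(parse_fleet_style("logs-nginx.access@custom"))
print(parse_fleet_style("logs@default"))
```

Optional parts that are absent come back as `None`, so `logs@default` and `logs-nginx.access@custom` both fit the same structure.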

Most important from my perspective is that we have Fleet following the same convention (but I hope it already does).

@eyalkoren Your table above did not include ingest pipelines, is that on purpose?

@eyalkoren
Contributor Author

There is a potential way to include what we do today in Fleet into this:

{data-type}[-{dataset}]?[-{namespace}]?@{description}

I am OK with that, but I tried to avoid - as part of the schema, as it seems quite confusing with _.

Let me try and explain my thinking behind take 2 in a different way: this discussion made me think that it's better to make the schema looser, as long as it contains @, because the main driver for having the schema is making it easy to distinguish stack-managed from custom components.

The other important guidance is that the first part of the schema describes the data that the component is for, and the second part is essentially the ID of the component. For example, if your data is described as XXX, then you know that XXX@default is the default component, XXX@custom is the reserved component for custom settings and you can also add XXX@ruflin to override this component for the XXX data.
I think that a two-mostly-free-form-parts schema is the best way to accommodate everyone's wishes without breaking it.
Maybe it's better to call these parts differently, so think about it as:
{data-description}@{component-id}

I hope this makes sense...

Your table above did not include ingest pipelines, is that on purpose?

No, not intended, but since we decided to not include the component type in the name by default, no need for examples for such.

@ruflin
Member

ruflin commented Jun 27, 2023

{data-description}@{component-id}

I like it. I got confused by the usage of type and related it to data_stream.type but it is now clear.

+1 on using this.

No, not intended, but since we decided to not include the component type in the name by default, no need for examples for such.

Having some examples for pipelines as well would help make the point that pipelines are also affected by it, especially as pipelines is in the title of the GitHub issue. If everyone agrees with your recommendation above, it would be great to have a final table as a conclusion where we can send people to look up examples.

@eyalkoren
Contributor Author

I started laying out my third proposal, based on everything discussed above.
A major part of it relied on @felixbarny's proposal to not include the component type, thus stay consistent with Fleet's naming.
However, when I started creating the examples, I realized that this would be an issue with how composable templates work, at least when mappings and settings are separated, which is quite common within the stack-managed components.
The composed_of list has no notion of the component type, only component ID, which means that the component ID needs to be unique across the entire component template registries.
For example, logs-template can use both the mappings and the settings components because they have different IDs that include the component template type.
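
The collision can be sketched in Python by modelling the component template registry as a plain dict keyed by component ID. This is a simplification: `resolve` and the template bodies are hypothetical and do not reproduce how Elasticsearch actually merges templates.

```python
from typing import Dict, List


def resolve(composed_of: List[str], registry: Dict[str, dict]) -> dict:
    """Merge the bodies of the referenced components, looked up by ID only."""
    merged: dict = {}
    for component_id in composed_of:
        merged.update(registry[component_id])
    return merged


mappings = {"mappings": {"dynamic": True}}
settings = {"settings": {"number_of_shards": 1}}

# Same ID for both components: the second registration replaces the first.
shared_id: Dict[str, dict] = {}
shared_id["logs@default"] = mappings
shared_id["logs@default"] = settings
collided = resolve(["logs@default"], shared_id)

# Distinct IDs that include the component type: both survive.
distinct = {"logs-mappings": mappings, "logs-settings": settings}
composed = resolve(["logs-mappings", "logs-settings"], distinct)

print(sorted(collided))
print(sorted(composed))
```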

Some ways we can deal with that, while still accommodating everything we decided so far:

  • since this problem is specific to mappings and settings components, we can merge those into a single component template. For example- logs-mappings.json and logs-settings.json will be merged into one file
  • include component template type in the name only when required (settings and mappings), but leave the naming scheme simple, with two mandatory parts and one delimiter (@)

@felixbarny @ruflin your input on that will be required in order for me to proceed

@jpountz
Contributor

jpountz commented Jul 10, 2023

One difference here is that the existing logs-mappings were never recommended / documented for reuse AFAIK. I expect this to change, which is why the convention is more important.

I'm wary about making these templates reusable, as components usually have dependencies. E.g. if you have an ingest pipeline that introduces a new field, this field needs to exist in mappings. You can't easily reason about components in isolation. So things become quite complicated if we start mixing up component templates across Elasticsearch's built-in component templates and Fleet's templates, as the two have different upgrade lifecycles. I worry that this would eventually result in any change in built-in templates being considered as a potential breaking change.

I can think of two ways to avoid this issue:

  • Discourage reuse across different owners of templates. E.g. Elasticsearch could prefix its templates with internal/ (or managed/ or ...) to reflect that it owns these templates and that they are not expected to be reused. Fleet would use a different prefix.
  • Enable reuse by never changing component templates after they have been released, e.g. by introducing versioning in the naming scheme. So we could have logs-template-5.json using component template logs-mappings-2.json. When we update default mappings for logs, we'd introduce a new logs-mappings-3.json as well as a new logs-template-6.json that would include logs-mappings-3.json. This would require more work on the Fleet side to keep up-to-date with new templates, but would be safer as things would never change under Fleet's feet. Fleet would also introduce version numbers to enable users to reuse components in their own templates if they want to.
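
The second option can be sketched as an append-only registry in which a released name is never overwritten. The class `VersionedRegistry` and its methods are hypothetical; the names and version numbers are the ones used in the example above.

```python
class VersionedRegistry:
    """Append-only store: a released {name}-{version} never changes."""

    def __init__(self) -> None:
        self._items = {}

    def publish(self, name: str, version: int, body: dict) -> str:
        key = f"{name}-{version}"
        if key in self._items:
            raise ValueError(f"{key} is already released and must not change")
        self._items[key] = body
        return key

    def get(self, key: str) -> dict:
        return self._items[key]


registry = VersionedRegistry()
mappings_v2 = registry.publish("logs-mappings", 2, {"mappings": {}})
registry.publish("logs-template", 5, {"composed_of": [mappings_v2]})

# Updating the default mappings means publishing new versions of both.
mappings_v3 = registry.publish("logs-mappings", 3, {"mappings": {}})
registry.publish("logs-template", 6, {"composed_of": [mappings_v3]})

print(registry.get("logs-template-5"))
print(registry.get("logs-template-6"))
```

Anything that pinned `logs-template-5` keeps resolving to `logs-mappings-2`, which is the "never change under Fleet's feet" property.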

I have less experience dealing with templates than you do so I'm happy to take feedback: am I making it a bigger issue than it is? In any case, I'd like to make sure that changing the default template for logs would never be considered a breaking change, as it's an important way of continuing to improve the out-of-the-box user experience.

@felixbarny
Member

@eyalkoren, I think we can solve these issues by treating mappings/settings as description part of the naming convention. For example logs@mappings/logs@settings. Compared to previous suggestions, we'd remove the @default suffix, which isn't adding much, I think. The @custom suffix still makes sense and it may contain any custom configuration, including both mappings and settings.


@jpountz

I'm wary about making these templates reusable, as components usually have dependencies.
So things become quite complicated if we start mixing up component templates across Elasticsearch's built-in component templates and Fleet's templates, as the two have different upgrade lifecycles.

I think we'll want to use some of the built-in component templates, such as default mappings and settings, in Fleet in the future. Currently, when we want to make changes, such as using ignore_malformed, we need to duplicate the work in Elasticsearch (#95329) and in Fleet (elastic/kibana#157184).

It does imply, however, that we'll need to evaluate all changes to the built-in component templates on whether they make sense for Fleet and the integrations. I think this is a feature, not a bug, and acts as a forcing function to ensure integrations are leveraging best practices.


I'm having second thoughts about renaming the component templates. We've previously concluded (#96267 (comment)) that renaming wouldn't be a breaking change as we wouldn't delete component templates in existing deployments so that the current names will continue to live in the cluster and only new installations are affected.

One use-case came to mind that would break if we did that: ephemeral Elasticsearch environments that are provisioned with custom index templates that use built-in component templates that are versioned in a git repository.

When spinning up an ephemeral ES cluster, we can't differentiate between existing use-cases and new ones. Therefore, we might want to reconsider adding the components under both their old and their new name or bring up this question in the breaking changes committee.
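
A small Python sketch of that scenario, assuming the old names logs-mappings/logs-settings were replaced by logs@mappings/logs@settings as suggested above. `missing_components` and the custom template are hypothetical; the sets stand in for what a freshly started cluster has installed.

```python
from typing import Iterable, List, Set

# Assumed rename mapping, old name -> new name.
RENAMES = {"logs-mappings": "logs@mappings", "logs-settings": "logs@settings"}


def missing_components(composed_of: Iterable[str], installed: Set[str]) -> List[str]:
    """List the referenced component templates that the cluster lacks."""
    return [c for c in composed_of if c not in installed]


# A custom index template kept in git that still uses the old names.
custom_template = {
    "index_patterns": ["my-logs-*"],
    "composed_of": ["logs-mappings", "logs-settings"],
}

fresh_new_names_only = set(RENAMES.values())
fresh_both_names = set(RENAMES) | set(RENAMES.values())

print(missing_components(custom_template["composed_of"], fresh_new_names_only))
print(missing_components(custom_template["composed_of"], fresh_both_names))
```

With only the new names installed, both references are missing; registering each component under both names leaves nothing missing.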

@eyalkoren
Contributor Author

So things become quite complicated if we start mixing up component templates across Elasticsearch's built-in component templates and Fleet's templates, as the two have different upgrade lifecycles.

@jpountz I need a clarification on this- is it about the order in which stack components get upgraded? How is this different from any other possible inconsistencies between Elasticsearch and Kibana?
@ruflin when mentioning reusability of components, is it about components that are reused by Fleet, or is it also about reuse by the integrations at the endpoints (Agents)? If the latter use case is valid, then I better understand this concern.

@eyalkoren, I think we can solve these issues by treating mappings/settings as description part of the naming convention. For example logs@mappings/logs@settings. Compared to previous suggestions, we'd remove the @default suffix, which isn't adding much, I think.

@felixbarny I don't get what the convention is in this case and how it will be applied to other components. I created a spreadsheet so that we can compare multiple convention options and examples of how components will be renamed, please add yours.

What I was thinking is that part of the refactoring will include the merge of mappings and settings components. From what I can see, pairs of those (logs/metrics/synthetics-mappings/settings) are only used together, which indicates that the separation is somewhat artificial and that they can be merged. It also makes more sense because the counterpart @custom components are used for both settings and mappings.

Based on that, following is my third attempt to propose a convention:

Take III

There are three dimensions for the matrix we are trying to describe here

  • the data- examples: logs, metrics, nginx-access-log
  • the component-type- examples: mappings/settings, ILM policy, pipeline
  • ID- a specific version of the component type used for some specific data

We need a scheme that will allow any combination of those. For example, if we want to have specific mappings for a data that we describe as logs-nginx.access, it may have its default mappings and its custom mappings.
Since each component type has its own registry, the uniqueness needs to be enforced only within each component type. Using the example above, we can use logs-nginx.access@default and logs-nginx.access@custom for mappings/settings, as well as for ilm-policies, pipelines etc. This allows us to use a two-dimensional matrix for each component type.
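
The "one registry per component type" idea can be sketched as follows. The registry keys (`component_template`, `ilm_policy`, `ingest_pipeline`) and the `register` helper are illustrative assumptions; the sketch only shows that the same ID may be reused across types but not within one.

```python
from collections import defaultdict
from typing import DefaultDict, Dict

# One namespace of IDs per component type.
registries: DefaultDict[str, Dict[str, dict]] = defaultdict(dict)


def register(component_type: str, component_id: str, body: dict) -> None:
    """Register a component; IDs must be unique only within their type."""
    if component_id in registries[component_type]:
        raise ValueError(f"{component_id} already exists for {component_type}")
    registries[component_type][component_id] = body


# The same ID is fine across different component types.
for component_type in ("component_template", "ilm_policy", "ingest_pipeline"):
    register(component_type, "logs-nginx.access@default", {})
    register(component_type, "logs-nginx.access@custom", {})

print({t: sorted(ids) for t, ids in registries.items()})
```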

Based on this rationale, the naming scheme can have only two mandatory parts with one delimiter:

{data-description}@{component-id}

where the only special character is @ and each part is free-form otherwise, meaning it may contain -, _, . etc.

data-description

A free-form description of the data for which we provide opinionated configuration in the form of component templates. We assume that we have a good idea of what the default configuration would be for this type of data, but we are aware that users would know better and should have the ability to customize.

component-id

This part allows multiple versions of a specific type of configuration for a specific type of data.
Although it is free-form, we encourage the use of conventions, such as default and custom.

Examples

See in the examples spreadsheet.

@ruflin
Member

ruflin commented Jul 31, 2023

I have a slight preference for the proposal from @felixbarny in the spreadsheet. The reason is the split of settings and mappings. I understand, @eyalkoren, that you want to merge these as they are often used together. But I would argue it should be possible to use logs mappings only without being forced to inherit the global settings too. One thing I don't like too much in @felixbarny's proposal is that it creates many prefixes, but I get the logic that it matches the data stream / index names so hopefully there should not be a naming conflict. For templates like fleet-action-results my understanding is that these should never be reused or touched by users, so they are basically "hidden / system" component templates. Ideally this could be indicated in the template itself to make it "hidden".

Around the concern from @jpountz about the upgrade lifecycles being different from the Fleet / Package Manager ones: I historically shared this concern but have come around to think that the benefits here outweigh the risks. Let's take a step back and decouple the template conventions from who installs them. Let's assume there is a component X that ensures some built-in templates are always of the most recent version and we expect users to be able to use this when they build integrations. I'm stating integrations here because I expect that in most future scenarios, users using these templates are doing so while building an integration. Decoupled from who loads the templates, do we agree on a convention?

As a follow up, we can focus on who installs it. As the templates are needed directly after startup, Elasticsearch is currently our only option. I personally would prefer if we could bundle these default templates in one or more integrations that are loaded on startup by Elasticsearch instead of building them into the source code. But we are not there yet. So I think for now Elasticsearch is the right place.

What about breaking changes? In general, all these templates should only get additions. But even then, things could become breaking as a field previously not mapped has suddenly become mapped. My assumption is that this is the edge case and not the norm. Most users will get a better experience thanks to changes made to the basic templates. But how do we make sure we don't break some of the users where this would be a breaking change? Could we make the upgrade a config option? By default, these component templates are always overwritten and ideally everything using these templates would be rolled over automatically. A config option could exist to disable this feature and the existing component templates would be kept forever. It is then up to the user to get the newest templates and update these manually through docs. Having an integration package with all the templates would make this even simpler. The nice part about our @custom option is that the user can always include a template or setting that is more important to keep the existing settings / mappings.

@eyalkoren
Contributor Author

eyalkoren commented Aug 2, 2023

OK, then trying to summarize @felixbarny's proposal, as it is reflected in the examples spreadsheet:

  1. First and most strict rule: @ is a special character, used to distinguish components that are registered by the stack from those that are registered by users. Note that this is still true for @custom components: even though users are invited to use those, they are still registered by the stack. The documentation around adding component templates should state clearly that the safest way to avoid component ID collision is to avoid using @ in the ID.
  2. If the component is used for a specific index or data-stream, the first part (before the @) should reflect that. For example: logs-apache.access@XXX for a logs-apache.access-* index pattern or .fleet-action-results@XXX for a .fleet-action-result index pattern. Otherwise, the first part should reflect the general purpose of this component. For example: ecs@mappings or data-stream@XXX if it is a general default for all data streams.
  3. The second part of the name (after the @) can either state the type of the component (e.g. mappings, settings, lifecycle etc.) or any sub-ID of the used component that distinguishes it from the default one (e.g. custom or package). Wherever applicable, it is advised to pick from a list of common uses, that contains: mappings, settings, lifecycle, template, pipeline, default-pipeline, final-pipeline, custom.
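
The three rules above can be summarized in a short Python sketch. `describe` is a hypothetical helper: it treats the presence of @ as the stack-managed marker (a convention, not something enforced by any API) and checks the part after @ against the list of common uses from rule 3.

```python
from typing import Dict, Optional, Union

COMMON_SUFFIXES = {
    "mappings", "settings", "lifecycle", "template",
    "pipeline", "default-pipeline", "final-pipeline", "custom",
}


def describe(component_id: str) -> Dict[str, Union[bool, Optional[str]]]:
    """Classify a component ID according to the agreed convention."""
    target, sep, suffix = component_id.partition("@")
    stack_managed = bool(sep)
    return {
        # Rule 1: @ marks components registered by the stack.
        "stack_managed": stack_managed,
        # Rule 2: the part before @ names the index/data stream or purpose.
        "target": target if stack_managed else None,
        # Rule 3: the part after @ is a component type or sub-ID.
        "suffix": suffix if stack_managed else None,
        "common_suffix": stack_managed and suffix in COMMON_SUFFIXES,
    }


for name in ("logs-apache.access@custom", "ecs@mappings", "my-own-template"):
    print(name, describe(name))
```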

@dakrone @joshdover would this convention cover the use cases you can think of? Please feel free to add either rows to spreadsheet with additional examples and/or add a column with an alternative proposal.

@eyalkoren
Contributor Author

@dakrone @joshdover we really need your input in order to proceed with actually enforcing a convention. If the latest proposal makes sense, a simple 👍 would do 😉

@dakrone
Member

dakrone commented Aug 10, 2023

I think the proposal sounds good in general. The only thing that I would change is instead of doing anything at the "ilm" level, make it generic and name it "lifecycle". This would allow us to configure ILM, a data stream lifecycle, or both within the component template without messing with the naming convention. Additionally it is one place for lifecycle-related configuration instead of two.

@joshdover
Contributor

Apologies for my lack of engagement on this topic and thanks to everyone driving this forward.

👍 from my end on the naming convention we landed on, it seems to cover all of the existing and upcoming use cases for the Elastic Package Manager installed templates.

@eyalkoren
Contributor Author

Awesome, thanks everyone for your input 🙏
I changed the spreadsheet based on @dakrone's feedback, as well as the comment above, which will serve as the base for the convention once we get to apply it.

@ruflin
Member

ruflin commented Aug 15, 2023

LGTM.

@felixbarny
Member

Closing this discussion issue as decided now. I've created implementation issues for ES and Fleet-managed components.

axw added a commit to axw/elasticsearch that referenced this issue Sep 13, 2023
See elastic#96267 (comment)

- Index templates have a "@template" suffix
- Component templates are split into settings and mappings,
  and have a @settings and @mappings suffix respectively
- Ingest pipelines generally have a @pipeline suffix,
  except for one special case where we use @default-pipeline

We no longer have a component template for every data stream, rather we
have reusable component templates where it makes sense. For example, we
now have metrics-apm@settings and metrics-apm@mappings, which are
included by all APM metrics data streams.

We now set both default_pipeline and final_pipeline in the index
templates, preventing users from overriding them completely. The
default pipeline is always apm@default-pipeline, which performs
some rerouting of legacy data, and invokes user-defined @custom
pipelines based on the data_stream.type and data_stream.dataset
fields. The final pipeline just performs built-in processing.
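
To make the commit message above more concrete, here is a minimal sketch of what such an index template body might look like, written as a Python dict. The template name, index pattern and final pipeline name are assumptions made for illustration; only the `@template` suffix, the two component template names and `apm@default-pipeline` are taken from the message. The actual templates shipped with the plugin may differ.

```python
import json

# Hypothetical template name; only the "@template" suffix follows the message above.
INDEX_TEMPLATE_NAME = "metrics-apm.example@template"

index_template = {
    # Hypothetical index pattern for a single APM metrics data stream.
    "index_patterns": ["metrics-apm.example-*"],
    "data_stream": {},
    # Reusable component templates named in the commit message.
    "composed_of": ["metrics-apm@settings", "metrics-apm@mappings"],
    "template": {
        "settings": {
            "index": {
                # Default pipeline named in the commit message.
                "default_pipeline": "apm@default-pipeline",
                # Assumed name, following the "@pipeline" suffix rule.
                "final_pipeline": "metrics-apm@pipeline",
            }
        }
    },
}

print(json.dumps({INDEX_TEMPLATE_NAME: index_template}, indent=2))
```

Both pipeline settings live in the index template itself rather than in a component template, which matches the stated intent of setting them in the index templates.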
axw added a commit that referenced this issue Oct 31, 2023
* x-pack/plugin/apm: introduce x-pack-apm plugin

* Dependency fix and tests

* Restore addition to ESRestTestCase

* Replace IngestPipelineConfig instantiation

* Update DataStreamUpgradeRestIT to expect logs-apm.error

* Adding rollover functionality

* Extend basic rollover functionality tests

* Start adding integration test

* Hide rollup data streams

* Apm ingest fixes

- Map transaction.duration.us independently of span.duration.us,
  fix ingest pipeline. We might consider aliasing later.
- Set event.ingested
- Set processor.event, using constant_keyword where possible

* Fix error.grouping_name script

* Only set event.ingested for traces-apm.sampled

* Enabling APMRolloverIT

* Spotless...

* Assertion change

* Wait a bit before assertBusy

* Adjust template and pipeline names to convention

See #96267 (comment)

- Index templates have a "@template" suffix
- Component templates are split into settings and mappings,
  and have a @settings and @mappings suffix respectively
- Ingest pipelines generally have a @pipeline suffix,
  except for one special case where we use @default-pipeline

We no longer have a component template for every data stream, rather we
have reusable component templates where it makes sense. For example, we
now have metrics-apm@settings and metrics-apm@mappings, which are
included by all APM metrics data streams.

We now set both default_pipeline and final_pipeline in the index
templates, preventing users from overriding them completely. The
default pipeline is always apm@default-pipeline, which performs
some rerouting of legacy data, and invokes user-defined @custom
pipelines based on the data_stream.type and data_stream.dataset
fields. The final pipeline just performs built-in processing.

* No need for manual cluster change events anymore

* Fix test

* Lower template priority to 110

Certain index templates installed by Fleet are given
a priority of 150, so lower the priority to avoid
conflicts.

See https://github.com/elastic/kibana/blob/a18c68f7ac0b2cb6a46552328e9c6fc28c223970/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/template.ts#L58

* Add logging

* Remove explicit timeout for rollovers

* Add _meta to component templates too

* Increase template priorities to 140

This goes closer to the Fleet-installed template priorities,
allowing for more builtin templates to be introduced without
affecting APM.
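
The reason priorities matter here is that when several index templates match a new data stream, Elasticsearch uses the one with the highest priority. A small sketch of that selection rule follows. The function `pick_index_template`, the template names and the patterns are hypothetical; only the priority values 140 and 150 come from the text above. `fnmatchcase` is used as a stand-in for Elasticsearch's `*` wildcard matching.

```python
from fnmatch import fnmatchcase


def pick_index_template(index_name, templates):
    """Return the name of the highest-priority template whose pattern matches."""
    matching = [
        (tpl["priority"], name)
        for name, tpl in templates.items()
        if any(fnmatchcase(index_name, pattern) for pattern in tpl["index_patterns"])
    ]
    if not matching:
        return None
    # Tuples compare by priority first, so max() picks the highest priority.
    return max(matching)[1]


templates = {
    # Hypothetical builtin template at the priority chosen in this commit.
    "traces-apm@template": {"index_patterns": ["traces-apm-*"], "priority": 140},
    # Hypothetical Fleet-installed template at the priority mentioned above.
    "fleet-managed-example": {"index_patterns": ["traces-apm-*"], "priority": 150},
}

winner = pick_index_template("traces-apm-default", templates)
```

In this example `winner` is the Fleet-style template, because 150 is higher than 140; a name matching neither pattern yields `None`. Ties are not handled in the sketch.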

* Test multiple index template upgrades with rollovers

* Enforce correct component template versions

* Rename to x-pack-apm-data, default to disabled

Because there's already an "apm" plugin,
we haven't yet solved all issues with rollovers,
and the templates need some more love.

* Fix checkstyle

* Fix more checkstyle

* Fix even more linting issues

* Adjust test to disabled by default

* remove todo

* Fixing test

* Refactor leftovers

* Fix constant keyword mapping conflict

* Fix after merge

* Comment fixes

* Rename ApmIngestPipelineConfig to YamlIngestPipelineConfig

* Always return a registry

* spotless, what else

* Add rollover integration test to core

* Remove APMRolloverIT

* Refactor in renamed components

---------

Co-authored-by: eyalkoren <41850454+eyalkoren@users.noreply.github.com>