[Feature Request] Dynamic variables in data stream names #64
cc @elastic/apm-server
As above, the current proposal is to have CC @ruflin
In the above, we have Taking this, when you ingest data, the data stream names are created out of What are the limitations on the service name? It should not contain
Sorry, I didn't explain myself well.
In case we don't have a common pre or postfix, how will we make the index template match only these indices and not all indices?
@ruflin I'm not sure I follow your last question above. I'll try to restate what we intend to do.
The last point means that we will have a "common pre or postfix", enabling index security etc.
Sorry, this fell through the cracks. @ruflin that sounds good to me, I don't have a strong opinion on whether we should use a prefix or a postfix, or on its value. We can maybe go with
What this will mean is that @axw @simitt @graphaelli Service prefix it is?
@ruflin I'm rather confused. Why do we need a In #64 (comment) where I said Going back to @jalvz's example in the description (modified slightly to remove the "_service" from service names), let's say we have two services: For
Then for
Expanding slightly: @ruflin clarified that a pre/postfix is required in order to have the integration package install an index template for the documents APM produces. The alternative would be to rely on the minimal templates installed by Elasticsearch. We can't at the moment as we have both keyword and text fields, and the template will index strings as keywords. My current thinking is that we should install our own templates anyway, so that we're not tying APM Server too tightly to Elasticsearch versions and they can be upgraded independently. I'll update the proposal doc with something concrete.
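For context, a composable index template along these lines might look roughly like the following. This is only a sketch: the index pattern, priority value, and field choices are illustrative assumptions, not the actual APM templates.

```json
{
  "index_patterns": ["traces-apm*"],
  "data_stream": {},
  "priority": 200,
  "template": {
    "mappings": {
      "properties": {
        "service.name": { "type": "keyword" },
        "transaction.name": {
          "type": "keyword",
          "fields": { "text": { "type": "text" } }
        }
      }
    }
  }
}
```

The point being that fields like transaction.name need an explicit keyword-plus-text mapping, which the minimal Elasticsearch-provided templates would not give us.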
You mean minimal templates for apm metrics and logs? If so, is there really much difference between a dataset prefix and a whole set of different types for apm data altogether? A dataset prefix is a harmless safeguard in any case.
ACTUALLY: Maybe we do need our own minimal templates for all the types?
What I meant was that we should install our own dataset templates, and not rely on base templates in Elasticsearch.
I think we should just require ingest-after-install for APM. If we require running in Fleet mode, then this is implied.
I've updated the proposal. In summary, we'll have the following data streams:
You'll notice that the first data stream lacks a prefix. That's because the logs streams will be produced by Filebeat rather than APM, and won't rely on any APM-specific mappings. For everything else there will be a common
Looks like we definitely need to add support for the prefix template feature. Trying to understand: if I understand this correctly, let's assume Kibana contains the APM Agent which writes the service logs. These are picked up and then shipped to
That's the idea -- this would depend on something like https://github.com/elastic/integrations-dev/issues/368, with APM Server informing Filebeat (by way of Elastic Agent) of the dataset. However, this part is not strictly necessary. As long as we have a keyword-type
Sounds like we are on the same page. We should now figure out what the config in the package spec for this looks like and how it is installed in Kibana correctly. @jalvz Could you make a proposal for the package spec?
I took a stab at this, and unless I am too dense, I don't think we need dynamic dataset names. To this effect, I opened #102 to allow us to customize the index pattern for each dataset; then we would need Kibana to pick it up for template creation. Let me know how that looks or if I am missing something.
Interesting idea. So far the names of indices, pipelines and everything related to data_streams were predictable, based on the convention that all the names are always the same. With this, we would allow certain packages to diverge from that. I'm now wondering if allowing this additional flexibility might have unexpected side effects. For example, at the moment we can easily link integrations and data streams together, as it is not a pattern but a fixed name. Not saying we should not do it, but we should think through whether this causes some other issues. For the pattern, I think it is important that we keep the naming scheme with the 3 parts, so it should be
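The 3-part naming scheme referred to here can be sketched as follows. This is a hypothetical helper, not Fleet code; the set of types and the restriction on "-" inside the parts are assumptions based on the convention discussed in this thread.

```python
# Sketch of the {type}-{dataset}-{namespace} naming convention.
# Illustrative only: VALID_TYPES and the "-" restriction are assumptions.

VALID_TYPES = {"logs", "metrics", "traces"}

def data_stream_name(type_: str, dataset: str, namespace: str) -> str:
    """Build a data stream name following the {type}-{dataset}-{namespace} scheme."""
    if type_ not in VALID_TYPES:
        raise ValueError(f"unknown type: {type_}")
    # "-" is the separator between the three parts, so allowing it inside
    # dataset or namespace would make the name ambiguous.
    for part in (dataset, namespace):
        if "-" in part:
            raise ValueError(f"'-' not allowed in {part!r}")
    return f"{type_}-{dataset}-{namespace}"

print(data_stream_name("metrics", "apm.backend_service", "production"))
# metrics-apm.backend_service-production
```

With a service-derived dataset, each APM service still produces a name that parses unambiguously into the three parts.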
As a matter of fact, this way pipeline names stay the same. Kibana automatically renames pipelines to Regarding index names, the point of this issue is that they won't have a predictable name (one way or another), so I'm not sure I understand the concern. Can you elaborate on:
?
Today on the data streams page in Fleet we show an icon indicating which integration a data stream belongs to. If I remember correctly, this is based on the knowledge about data streams. The above causes another issue around the Kibana index pattern. With this it would be possible that the type defined is I think we agree that we need to change the pattern that matches. There are at least 2 ways of doing it: let the user use any pattern, or tell Kibana through a config to build a pattern that matches. If we hand over full control to the creator of the package to define it, at one stage a user will break it. If we do it in Kibana, we can enforce that it is done correctly. My more general thinking around the packages is:
If we define a flag in the manifest that it should define a pattern for the data_stream instead of a fixed name, I would argue it still can fall under "a". The reason is as follows:
Having a config option in the manifest to tell Fleet to create a pattern is a bit more complex at first than your proposal of defining the index pattern directly, but I think long term it makes sure everything that is put into
Thanks @ruflin. I am having a hard time understanding your argument, let's see:
An APM user, you mean? How does that happen if they can't edit the integration manifest files?
Isn't that the case already? What is it in my proposal that makes this not be the case?
My proposal is a field in the manifest to tell Fleet to create a template with a given pattern. I guess you can call that a "config option", is that what you mean by it? What exactly are you proposing, and how is it different from this? Sorry, I'm really confused
I think we are talking about 2 different users. The user I'm referring to is the one creating the package (in the apm case, @jalvz). There is a future where not only Elastic but anyone can build packages. Let's take the
What I propose would be:
In your case, Kibana would only know there is a custom index_pattern that it should add, but how will this affect the Kibana index pattern? With your option, it is also possible to set this to With my example, Kibana will build the index pattern based on the above flag, which is probably something like Does this help?
Ah, didn't know that. That clarifies things. In that case: if you put What does That also seems a very APM-specific thing to have in Kibana Fleet code. Why
IIUC, @ruflin is not suggesting that we hard-code the index pattern, but that we allow package developers to define a pattern for the data set only. This would guarantee the index pattern always adheres to the indexing strategy. (Correct me if I misinterpreted you, Nicolas.) You may still be able to shoot yourself in the foot in some other way, but I think it would be more ergonomic to define things in terms of the indexing strategy rather than a complete index pattern that has to match the indexing strategy anyway. Is there a reason why we can't allow the top-level "dataset" property in the data stream manifest to be a pattern? e.g. in the apm package's "traces" data stream, change the dataset from
#102 allows setting the index pattern for the data set only, doesn't it?
@axw Correct. The @jalvz There is "shooting yourself in the foot" and there is "breaking the system". If you build the APM package and set What Is there an issue with
No, it matches the entire index name. What I meant is to define a pattern which matches only the part after the type, and before the namespace. Fleet would take care of joining them all together to form a complete index pattern.
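The joining described here could be sketched as follows. This is a hypothetical helper to illustrate the idea, not actual Fleet code: the package supplies only a pattern for the dataset part, and Fleet combines it with the fixed type and a namespace wildcard.

```python
# Sketch: the package defines a pattern for the dataset part only;
# Fleet (hypothetically) joins it into a complete index pattern of
# the form {type}-{dataset_pattern}-{namespace}. Illustrative only.

def full_index_pattern(type_: str, dataset_pattern: str) -> str:
    """Join a dataset-only pattern into a complete three-part index pattern."""
    # The namespace is chosen by the user at enrollment time, so the
    # package cannot know it; it is always a wildcard here.
    return f"{type_}-{dataset_pattern}-*"

print(full_index_pattern("traces", "apm.*"))   # traces-apm.*-*
print(full_index_pattern("metrics", "apm.*"))  # metrics-apm.*-*
```

Because the package never controls the type or namespace positions, the resulting pattern always adheres to the indexing strategy.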
@ruflin fair enough. I'd be fine with
That is what I meant, it is very apm-specific. But I think I got it this time: the index pattern will be hardcoded to one thing or another depending on that boolean value. I'll update the related PR and file a Kibana ticket.
Oh, just read this after sending my reply:
Maybe we need a Zoom after all? 😅 What I get from there is that you suggest a string that Kibana will insert in the right place, while @ruflin suggested a boolean. So string or boolean?
Two additional suggestions for names, taken out of the descriptions used above:
There must be a better name out there.
The default pattern also matches "dynamically", doesn't it? 😅
@jalvz Could you write the proposed doc for it here? The reason I'm asking is that often the "correct" name can be deduced from how we document / describe it. BTW I don't see an issue with a pretty long key as long as it is descriptive about what it does.
Sure, it is in the description of the field: https://github.com/elastic/package-spec/pull/102/files#diff-8096817edd596c8787f8d5efb39c5c1a1555c016b5fccc30f30eced78864d494R115 I intend to add the same as a comment when I add it to the apm package
The default pattern matches a dynamic index but not a dynamic dataset. I suggested something similar in the open PR, calling it
Two more options:
i.e. highlighting not just that it's dynamic, but the specific way (prefix) in which it's dynamic.
additionally throwing in
I like all the names that start with
Done, closing.
As per the current APM Index Strategy proposal, we need the ability to use service.name as a data stream name, or part of a data stream name. service.name is a top-level indexed field in APM documents (example, spec), and having it in the data_stream would give us the guarantee that each APM service will have its own indices. This would improve performance, allow users to have more granular security and retention policies, and simplify the APM experience in general.

So for metrics data, for instance, we would have indices looking like:

metrics-backend_service-production
metrics-backend_service.profiles-production
metrics-frontend_service-staging

Same thing for other types (traces, logs).
Right now, I believe folders in the package need to match exactly the data stream name set in manifest.yml, so we would need to circumvent that limitation as well.
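The naming proposed in this description could be sketched like this. It is a hypothetical helper matching the example indices above; the sanitization rule (replacing "-" with "_") is an assumption, since the exact restrictions on service names are only discussed later in the thread.

```python
# Sketch of deriving a data stream name from service.name, matching
# the example indices in the description. The "-" to "_" sanitization
# is an assumption, not a settled spec.

def service_data_stream(type_: str, service_name: str, namespace: str,
                        suffix: str = "") -> str:
    """Build a {type}-{service}-{namespace} style data stream name."""
    dataset = service_name.replace("-", "_")  # "-" is reserved as the separator
    if suffix:
        dataset += f".{suffix}"  # e.g. a "profiles" sub-dataset
    return f"{type_}-{dataset}-{namespace}"

print(service_data_stream("metrics", "backend_service", "production"))
# metrics-backend_service-production
print(service_data_stream("metrics", "backend_service", "production",
                          suffix="profiles"))
# metrics-backend_service.profiles-production
```

The suffix parameter covers the "backend_service.profiles" case above, where one service emits more than one metrics dataset.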