[OpenTelemetry] data_stream.namespace and data_stream.dataset aren't being respected #10191
I've looked through https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#elastic-common-schema and I can't see any references to data_stream in the OTel-spec-to-ECS mappings. But I'm not sure how to best solve this besides using this method. Fundamentally, I'm trying to separate out OTel data into indices that are managed and replicated the same way the APM ones are.
Currently this is not possible, neither for OTel nor for Elastic APM agent collected data. Only metrics events with service-specific metricsets are written to a service-specific data stream. We might send more events to service-specific data streams in the future. Could you share a bit more about your use case for configuring these fields?
Ah, so I'm trying to handle the amount of data that is coming in by separating it and applying ILM and shard-related index policies. Basically we want different sharding policies and, as you mention, different retention per env and per application. Right now, with traces and logs coming into a single index, that causes a lot of churn and possible lock-up. We were expecting traces and logs to work similarly to metrics and were trying to more cleanly handle the amount of data coming in.
I'll also add that I'm able to see the attributes set correctly when looking at them in NR, which means they are being added and passed by OTel properly; we'd just need APM Server to support them.
+1 for this; as an enterprise customer managing a centralized monitoring platform for hundreds of applications, this is critical for us.
@knechtionscoding @mholttech the team has been discussing built-in routing rules such as this. There are some edge cases related to untrusted agents, such as RUM and mobile, where it may be undesirable to automatically route without compensating controls (e.g. authorization tokens that can restrict permitted values). In the meantime: are you aware that Elasticsearch 8.8.0 introduced a new `reroute` ingest processor?
Interesting, that might work for my needs. I'm on 8.8.2, though, and don't see reroute available when adding a processor to an ingest pipeline; does it need to be added as a custom option?
@mholttech do you mean in the visual pipeline editor in Kibana? Support was only added there in 8.9 (elastic/kibana#159224). Only Elasticsearch support was added in 8.8.
Thanks for the clarification. I was referring to the visual pipeline editor. Good to know it was added to the UI in 8.9; I just have to find out when we'll be upgrading.
@axw unfortunately that won't work for our use case. We don't use ingest pipelines, and don't plan to. I definitely understand the concern about untrusted agents. Out of curiosity, could we just segment those off? Or could we be allowed to choose to accept that risk?
@knechtionscoding got it, thanks for the additional context. To be clear, I think it's likely that we will add support for data stream routing through attributes. So far the team has discussed a few options:

(1) Never route on any attributes by default; require users to define rules with an ingest pipeline. This would be the safest in terms of trust, but it requires centralisation of the routing rules, so more operational overhead for some users. (Not sure if that's the reason in your case -- if you're willing to share more, I'd be keen to hear why you would not be interested in using ingest pipelines for routing.)

(2) Route to different … On the face of it this is nice and simple, but it overloads the meaning of the attribute. Some users may just want to separate their data logically (e.g. adding a …).

(3) Route to different … This is currently my favourite. Would you be able to elaborate on your use case for controlling the …?
I think so. We have a way to identify untrusted RUM agents, and could disallow client-controlled routing for those. We plan to eventually add support for constraining allowed attributes/values for trusted agents, e.g. enable a central ops team to lock down auth tokens so that the bearer can only ingest data for …
We have a strategy where multiple k8s namespaces and clusters log against the same Elastic Cloud instance. As the schema might differ, we custom-log against data streams using the log-[app]-[namespace] format, which also works very well with permissions within Kibana spaces. We'd like to use the same mechanism with OTel telemetry: use the same integration endpoint, but route telemetry to different data streams via attributes, so as to:
That aside, APM comes bundled with Fleet when an integration server is created via the Elastic Cloud console. The default integration pushes to …
@federicobarera thanks for chiming in.
No, and there are no plans to support that.
👍 I think there may be a high-level UI in the future for defining routing rules. For now, using the `reroute` processor in a custom ingest pipeline is the way to do this.
@axw I just gave this a try with the reroute processor, but I'm running into an issue with the API key not being authorized properly:

failed to index document (security_exception): action [indices:admin/auto_create] is unauthorized for API key id [xxxx] of user [elastic/fleet-server] on indices [traces-apm-pdcs_dms2_dev], this action is granted by the index privileges [auto_configure,create_index,manage,all]
@mholttech sorry, I forgot a crucial detail regarding 8.9. In 8.9.x, when running under Fleet, we don't get sufficient privileges to write to arbitrary logs/metrics/traces data streams. From 8.10.0 on, that will be fixed. We're expecting 8.10.0 to be released in the next month or so. |
That's unfortunate :( I'll keep an eye out, but we also don't like upgrading our clusters right away, because the last time we did, it cost us two months of Fleet being almost unusable due to a bug.
Apologies for the delay @axw
First, as you point out, it would centralize the config quite a bit. We work on a very self-service model, and being able to scale effectively, without our team's intervention, when adding a new application is critical. We want teams and engineers to be able to route effectively without us having to update any config. Second, adding ingest pipelines would slow down all the other parts of the cluster. We process somewhere in the neighborhood of 1 billion log messages and 300 million trace events/segments. Adding ingest pipelines (rather than controlling routing at the OpenTelemetry Collector level) would add significant overhead and cost to our ES cluster (rather than letting us dynamically scale the OTel Collector as needed).
Is it overloading the meaning of the attribute? As far as I can tell from the spec, I don't think it would be. Allowing for, but not requiring, separation based on the …
I was only attempting to control dataset, as it seemed to be the closest thing I could get at the time. If I can route based on data_stream.namespace, I'd be happy with that. The only caveat is that it is unique to ES, so if there's something that can exist inside the OTel spec, I have a slight preference for that. I'm not super sure why …
@knechtionscoding thanks for the details!
As an example, let's say you have 1000 deployments of a set of services, e.g. an Elasticsearch cluster + Kibana + whatever. Each Elasticsearch cluster would have …

If we just automatically routed every …

There would be some scenarios where this is the right thing to do, e.g. maybe each deployment needs to be physically separated for security/compliance/retention/whatever reasons. It's just not always the right thing. Maybe it could be the default, with a way to opt out. This is not yet clear to me.
My thinking is that data streams and routing are Elastic-specific features, so therefore Elastic-specific attributes would be OK. Not to say that my thinking is necessarily right -- just explaining myself :)
IMHO, we get the best of both worlds if we route by default on …
I don't anticipate that adding such a routing rule to the ingest pipeline will have any noticeable impact on the Elasticsearch cluster. The reroute processor was implemented with high-throughput use cases in mind: it just looks up a property (such as …) in the document and changes the target data stream accordingly.

All you'll need to do is add a custom pipeline like this:
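The pipeline example appears to have been lost from this comment. A minimal sketch of what such a custom pipeline could look like, assuming routing traces by `service.environment` into the `data_stream.namespace` (the pipeline name `traces-apm@custom` and the routing field are assumptions based on the surrounding discussion; the reroute processor's `namespace` option accepts a list of Mustache templates tried in order, with a literal fallback):

```json
PUT _ingest/pipeline/traces-apm@custom
{
  "processors": [
    {
      "reroute": {
        "namespace": [
          "{{service.environment}}",
          "default"
        ]
      }
    }
  ]
}
```

With this in place, a trace document with `service.environment: dev` would be routed to the `traces-apm-dev` data stream, and documents without that field would fall back to `traces-apm-default`.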
See also https://www.elastic.co/guide/en/apm/guide/current/ingest-pipelines.html#custom-ingest-pipeline-create for more details on custom pipelines.
Hi, I am able to reroute the data using an ingest pipeline as mentioned by @felixbarny, rerouting based on {{service.environment}}. Is there a way to set up different ILM policies for these data streams? For example, I would like to set up different retention periods for my traces based on environment:

traces-apm-dev --> 7d

As all these data streams use the same index template (traces-apm) and component template (traces-apm@custom), is there a way to dynamically assign different ILM policies during data stream creation?
@rpanand24 please see https://www.elastic.co/guide/en/observability/current/ilm-how-to.html#data-streams-custom-policy. If you have follow-up questions, please raise a topic at https://discuss.elastic.co/c/observability/82
@axw, that was quick. I have gone through that document, but it seems to be misleading. I have created a topic as you suggested, btw: https://discuss.elastic.co/t/custom-ilm-policies-for-apm-datastreams/355568.
APM Server version (`apm-server version`): 8.6.1

Description of the problem including expected versus actual behavior:
Trying to make opentelemetry data behave the same as APM data coming into the APM server. Primarily separate the traces and the logs into a datastream/index per application. Currently all data from OTEL hitting the APM server is being sent to traces-apm-default (traces) and logs-apm-default (logs).
Currently setting this via resource attributes either as an env variable:
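The env var example appears to have been stripped from the issue. A sketch of the standard OpenTelemetry SDK mechanism, using `OTEL_RESOURCE_ATTRIBUTES` (the attribute values `myapp` and `dev` here are illustrative, not from the original report):

```shell
# Set data_stream.* as resource attributes via the standard SDK env var;
# values are comma-separated key=value pairs (illustrative values).
export OTEL_RESOURCE_ATTRIBUTES="data_stream.dataset=myapp,data_stream.namespace=dev"
```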
Or via the resource attribute processor in the OTEL Collector:
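The collector config also appears to have been stripped. A sketch of what the resource processor section of an OTel Collector configuration might look like (attribute values are illustrative; the processor would still need to be referenced from the relevant `service.pipelines` entries):

```yaml
processors:
  resource/datastream:
    attributes:
      # Illustrative values; upsert adds the attribute or overwrites it if present.
      - key: data_stream.dataset
        value: myapp
        action: upsert
      - key: data_stream.namespace
        value: dev
        action: upsert
```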
Regardless, the logs and traces being produced aren't being shipped to their own dataset:

I've successfully set lots of other resource attributes (deployment.environment, service.name, etc.), but I can't get the data_stream attributes to work.
Steps to reproduce:
Please include a minimal but complete recreation of the problem,
including server configuration, agent(s) used, etc. The easier you make it
for us to reproduce it, the more likely that somebody will take the time to
look at it.
I can't find a place where the mappings are listed, so I'm not even sure if this is possible right now, or if there is a translation between OTel and ECS for the data_stream fields.