Provide an OpenTelemetry scaler #2353
It's a really good improvement, I will have a look to see if I can help with this topic 🙂 |
Awesome, thank you! |
@mknet3 , ping me if you need help ;) |
just to confirm, I'm on it and I will help with this scaler |
Great, thanks! |
Hi @tomkerkhove, I have had a look at this issue and I would like to clarify some things. AFAIK the goal of this issue is to provide a scaler based on metrics exported by an exporter configured in the collector. This exporter will expose metrics in a KEDA format to be read by the scaler. Quick question: does the exporter already exist, or is there a plan to develop it? (I suppose it will be in opentelemetry-collector-contrib.) This question is to figure out what the format of the exposed data will be so we can pull it in the scaler. |
That would be part of the investigation but I think we'll need to build our own exporter to get the metrics in; or use the gRPC OTEL exporter / HTTP OTEL exporter as a starting point to push it to KEDA. I'd prefer the latter approach to get started as we don't have a preference on the metric format, so OTEL is fine. |
@mknet3 prefers to keep it free for the moment because it's his first task with Golang |
Working on this |
Before we go all in, might be good to post a proposal here @SushmithaVReddy to avoid having to redo things but think relying on OTEL exporter is best |
@tomkerkhove, sure. I'll put a proposal here before we start the implementation. Quick doubt: is the idea here to scale based on the metrics obtained from go.opentelemetry.io/otel/exporters/otlp/otlpmetrics? |
@tomkerkhove Will KEDA be acting as a collector that gets metrics data from an exporter? Is the idea to create metrics using https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#instrument and observe them through the HPA to scale accordingly? I'm slightly confused by the terms exporter and collector w.r.t. KEDA. A plausible solution looks like one where the user has an exporter that exports metrics; KEDA connects to this exporter and pulls the metrics (as a collector?) and makes scaling decisions based on the metrics mentioned in the ScaledObject. |
The idea is to use the OTEL exporter (https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/otlpexporter/README.md) from which KEDA fetches metrics to make scaling decisions. This is similar to how we integrate with Prometheus, where we pull the metrics from Prometheus and move on; however, here the metrics are in OTEL format, coming from the OTEL exporter that end-users have to add to their OTEL collector (so not up to KEDA). From an end-user perspective, they should give us:
Hope that helps? |
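As a rough sketch of what end users might have to hand KEDA for such a trigger (an endpoint, a metric name, and a target value, as discussed above), the metadata parsing could look something like this. All field and key names here are hypothetical, not a settled KEDA spec:

```go
package main

import (
	"fmt"
	"strconv"
)

// otelMetadata is a guess at the minimal trigger configuration an end user
// would supply: where the OTLP exporter is reachable, which metric to watch,
// and the target value to scale on. Names are illustrative only.
type otelMetadata struct {
	Endpoint    string
	MetricName  string
	TargetValue float64
}

// parseOtelMetadata mimics how KEDA scalers read their trigger metadata
// from a map of strings in the ScaledObject.
func parseOtelMetadata(m map[string]string) (otelMetadata, error) {
	meta := otelMetadata{
		Endpoint:   m["endpoint"],
		MetricName: m["metricName"],
	}
	if meta.Endpoint == "" || meta.MetricName == "" {
		return meta, fmt.Errorf("endpoint and metricName are required")
	}
	v, err := strconv.ParseFloat(m["targetValue"], 64)
	if err != nil {
		return meta, fmt.Errorf("invalid targetValue: %w", err)
	}
	meta.TargetValue = v
	return meta, nil
}

func main() {
	meta, err := parseOtelMetadata(map[string]string{
		"endpoint":    "otel-collector:4317",
		"metricName":  "http_requests_active",
		"targetValue": "100",
	})
	fmt.Println(meta, err)
}
```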
This helps Tom. Thanks! |
@tomkerkhove any thoughts on the scaled object here (ref below)? The idea is to use OTEL (https://pkg.go.dev/go.opentelemetry.io/otel), connect to the endpoint mentioned in the ScaledObject, pull the metric value, and compare it to the threshold to scale.
I was also wondering about scenarios where users want to pull multiple metrics from their application and scale based on conditions on those metrics. E.g. as below
Any ideas on the scope of the scaler we'll be building in terms of multiple metrics? |
It's ok for me to use that package since that's the official SDK - thanks for checking. I don't see the difference between the two proposals other than one vs multiple metrics, though? Can you elaborate on it? In terms of supporting multiple metrics - I'd argue that, given we support multiple triggers, it might be more aligned with other scalers to only support 1 metric per trigger to keep a consistent approach in KEDA. The only consideration I would have here is performance, but I think we can manage that in the implementation. Thoughts @zroubalik @JorTurFer? Based on that we'll need to review the YAML spec, but in general I think it's ok; however, if we use multiple levels then I would use |
Yes @tomkerkhove, the proposals point out multiple-metrics usage as you understood. I agree with consistency with the other scalers we have, but I'm concerned about how much value our scaling will add given it can scale on a single metric, where OpenTelemetry is mostly used to emit a lot of metrics. nitpick: If we have one metric per ScaledObject and a user wants to scale based on multiple metrics and goes ahead and creates that many ScaledObjects, I wonder how we handle concurrent scenarios where multiple metrics result in scaling (over-scaling? because the scaled-up instances could have been reused?) |
It would be nice to also make the protocol configurable given OTEL supports both http and gRPC |
Are we going to store the metrics internally? I mean, that exporter pushes the metrics to another server; it's not a pulling endpoint AFAIK: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/otlpexporter/otlp.go#L103-L119. On the other hand, as a user I'd not like to run the collector without HA just to use it with KEDA; observability is crucial and HA is required. Each instance of the collector could hold different metrics, because each app pushes its metrics randomly to one of the collector instances, so if KEDA hits only one of them we could miss some metrics. That's why I asked about aggregating, or having another collector on top of them only for KEDA. |
It's push indeed, so we'd just need an endpoint to store the latest metric in memory, and that's fine; am I missing something? Based on your remark there is a serious concern, but I think I'm missing it. |
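The "store the latest metric in memory" idea above could be as small as a thread-safe map keyed by metric name: KEDA would not persist telemetry, it would just remember the last pushed data point per metric. A minimal sketch, with all names hypothetical (a real implementation would be fed by an OTLP receive handler):

```go
package main

import (
	"fmt"
	"sync"
)

// latestStore keeps only the most recent value per metric name.
// It deliberately stores nothing else: no history, no raw telemetry.
type latestStore struct {
	mu     sync.RWMutex
	values map[string]float64
}

func newLatestStore() *latestStore {
	return &latestStore{values: make(map[string]float64)}
}

// Record is what an OTLP push handler would call for each incoming data point;
// a newer push simply overwrites the previous value.
func (s *latestStore) Record(metric string, value float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.values[metric] = value
}

// Latest is what the scaler would call when the HPA asks for the metric.
func (s *latestStore) Latest(metric string) (float64, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.values[metric]
	return v, ok
}

func main() {
	s := newLatestStore()
	s.Record("queue_depth", 10)
	s.Record("queue_depth", 42) // newer push replaces the old value
	v, _ := s.Latest("queue_depth")
	fmt.Println(v)
}
```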
Most probably I have missed some important point in the thread :/
Based on the trigger, I guess that KEDA will establish a gRPC connection to the collector instance. TBH, my knowledge gap here could be the problem, because I'm a noob OTEL user and I could be overthinking things |
@JorTurFer I'm trying to find the answer to the same question about aggregation of multiple OTEL agents, but I have no answer yet. We probably need to take a look at how Jaeger or any other OTEL visualizer did this |
We haven't specced this out in this much detail yet, but in my opinion push is the best way to avoid polling constantly and just let them send the metrics to us.
That's the problem of OpenTelemetry Collector as this is just how their receivers work 🤷 That's something they have to fix if they offer push-based metrics. I'm eager to know what @zroubalik thinks |
Are there any new thoughts? @zroubalik? |
I am probably not fully up to speed with all the conversation here, but how do other metrics-related tools (Jaeger, ...) handle this? I don't think we are the first ones to hit this issue. We should stick to what is common in this domain. By chance, can we use the push scaler for this scenario? |
@tomkerkhove, @JorTurFer, @Friedrich42, is the idea here to build a KEDA exporter?
I have been researching a similar topic for a while now, while customising the OTEL Kafka receiver to solve a problem my organisation is working on. After researching this repo thoroughly: https://github.com/open-telemetry/opentelemetry-collector-contrib, I have seen that whether it is a connector, exporter, processor, or receiver, each has its own set of configs. How will we deal with cases where we need metrics in a certain way, maybe with a certain query on the receiver side, so that a KEDA exporter can get the metrics? |
I think in our case we can just use the built-in OTEL gRPC/HTTP exporter and use that; I don't personally see the need to create a dedicated exporter. May I ask why you believe that would be an added value? |
Sure, I thought that maybe attaching any data-source receiver would be easier. Can you kindly elaborate on how the OTEL gRPC/HTTP exporter can be integrated? |
We haven't investigated this in much detail yet, but I'm happy to review proposals. |
Pardon my ignorance since I don't have extensive experience with OTEL collectors, but can you explain how we plan to provide the parameters required for defining the event pertaining to a respective data source? Will they somehow be defined in the collector config itself or are we going to provide them from KEDA? |
That makes sense. I couldn't figure out how to use it while I was trying to resolve the issue, but now I understand it a little more. I agree with you, it doesn't really make sense to get the data from the OTEL collector.
…On Thu, Jun 29, 2023, 11:43 AM Jorge Turrado Ferrero < ***@***.***> wrote:
I'm not sure we can use OpenTelemetry as a scaler, because OpenTelemetry doesn't store the data; it's just a "producer and communication protocol". I mean, OpenTelemetry defines how to generate and send the data, but it doesn't have any store where we can query the values. To achieve this, we (KEDA) would have to be a data store using OTLP (OpenTelemetry Protocol) to receive the telemetry information. You can't just ask the collector for the information, because the collector isn't a backend store; it's a "routing pipe".
I wouldn't like to receive all the telemetry in KEDA to scale based on it, because it'd be crazy and we would need to manage it securely, having access to ALL telemetry data on our side. I think we can close this issue because it doesn't make sense (IMO). End users should use the proper backend storage scaler (Loki, Prometheus, Elastic, etc.) to scale based on them.
|
This is not the goal of this proposal. This is exactly why exporters are available: in this case, KEDA is a system that will pull the metrics when it has to, using the built-in HTTP/gRPC exporter, and evaluate the metrics, which are in OTEL format. I do not want to create an exporter, as this is out of scope for us, and I agree there are enough backend technologies already that do this.
This is exactly what I do not want to do with KEDA, because that means integrating and maintaining x systems while there is a vendor-neutral spec that can simplify this for us. Even if we had those scalers already, it would still be beneficial to have the spec-based scaler, to decouple the effective metric provider from the actual running technology. |
This is the problem: does the collector support being queried? I think it's not supported, because the collector isn't a data store; it's just a pipe. We use the collector in our apps, and when I checked the docs I didn't see anything similar that we could use; maybe something has changed during the last months, but I don't think so. Apart from this, even if we could query them somehow, the collector usually runs with > 1 replicas (for HA), and KEDA would need to query all of them, because each replica is independent (they are pipes, not stores), and aggregate them |
Yes you can by using the exporters mentioned above.
Are you sure or is that an assumption? |
Q with OTEL collector folks: open-telemetry/opentelemetry-collector#8006 |
I'm 100% sure that each replica is totally independent of the others. We have faced this problem using the Prometheus receiver, and AFAIK they haven't found a good solution for it yet. Assuming that we could query the metrics somehow in pull mode (I don't think so, but maybe), we have to manage the topology and aggregate across the instances. That's why we need a backend that aggregates the info, deduplicates the info, etc.
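The fan-out-and-aggregate concern above can be made concrete with a tiny sketch: with N independent collector replicas, no single replica sees the whole picture, so a scaler would have to combine readings from all of them. The data here is hardcoded for illustration; a real implementation would fetch each replica's value over gRPC/HTTP:

```go
package main

import "fmt"

// aggregate combines per-replica readings for the same metric.
// Whether sum or max is correct depends on the metric's semantics
// (e.g. total queue depth vs. worst-case latency), which is itself
// part of the design problem discussed above.
func aggregate(perReplica []float64) (sum, max float64) {
	for _, v := range perReplica {
		sum += v
		if v > max {
			max = v
		}
	}
	return sum, max
}

func main() {
	// Apps push randomly, so each replica holds only a slice of the traffic.
	readings := []float64{12, 0, 30} // three replicas, disjoint subsets
	sum, max := aggregate(readings)
	fmt.Println(sum, max)
}
```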
It could be the best option, yep 😄 |
An alternative approach is to use push instead, if that is the case. However, that might require a separate component, probably as an add-on then. |
I think that this can be done with a Prometheus server: the OTEL Collector pushing with the Prometheus remote-write exporter to a Prometheus server, and KEDA querying Prometheus. I see the power of OpenTelemetry, I use it indeed, but I still think that OpenTelemetry is not meant to be used the way we want for a scaler. In this case I disagree with adding another component for it; there are already open-source options that users can add for scaling with OpenTelemetry data, for example Prometheus or Grafana Mimir. I think that we shouldn't store any kind of user data at all, and a component that stores telemetry for scaling is storing user information. Personally, I'd wait until they answer open-telemetry/opentelemetry-collector#8006 and, based on their answer, continue or abandon this issue |
I'm OK with waiting! However, I don't really agree on this:
One does not simply "add Prometheus" as this increases the infrastructure you are running. If you don't have Prometheus already, you shouldn't have to add it to autoscale your apps IMO. Edit - Actually nevermind, it's just exposing the Prometheus endpoint so we can scrape the collector directly instead of needing a Prometheus installation :) |
This doesn't work: the collector exposes metrics in Prometheus format, but the query API is totally different, and it can't work just using the metrics endpoint. We would still need a Prometheus in that case |
Proposal
OpenTelemetry allows applications/vendors to push metrics to a collector or to integrate its own exporters in the app.
KEDA should provide an OpenTelemetry scaler which is used as an exporter so we can pull metrics and scale accordingly.
Scaler Source
OpenTelemetry Metrics
Scaling Mechanics
Scale based on returned metrics.
Authentication Source
TBD
Anything else?
OpenTelemetry Metrics are still in beta but going GA by end of the year.
Go SDK: https://github.com/open-telemetry/opentelemetry-go
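The "scale based on returned metrics" mechanic in the proposal boils down to the standard HPA-style ratio calculation, which a KEDA scaler ultimately feeds into. A minimal sketch of that formula (generic HPA behaviour, not KEDA-specific code):

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas implements the generic HPA scaling formula:
// desired = ceil(currentReplicas * currentMetricValue / targetValue).
// Guard clauses keep the replica count unchanged for degenerate inputs.
func desiredReplicas(current int32, metricValue, targetValue float64) int32 {
	if targetValue <= 0 || current == 0 {
		return current
	}
	return int32(math.Ceil(float64(current) * metricValue / targetValue))
}

func main() {
	// 2 replicas observing a metric of 300 against a per-replica target
	// of 100 should scale out to 6 replicas.
	fmt.Println(desiredReplicas(2, 300, 100))
}
```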