
Provide an OpenTelemetry scaler #2353

Open
Tracked by #3698
tomkerkhove opened this issue Nov 26, 2021 · 60 comments
Labels
feature (All issues for new features that have been committed to), opentelemetry scaler

Comments

@tomkerkhove
Member

Proposal

OpenTelemetry allows applications/vendors to push metrics to a collector or to integrate its own exporters in the app.

KEDA should provide an OpenTelemetry scaler which is used as an exporter so we can pull metrics and scale accordingly.

Scaler Source

OpenTelemetry Metrics

Scaling Mechanics

Scale based on returned metrics.

Authentication Source

TBD

Anything else?

OpenTelemetry Metrics is still in beta but is expected to go GA by the end of the year.

Go SDK: https://github.com/open-telemetry/opentelemetry-go

@tomkerkhove tomkerkhove added the help wanted (Looking for support from community), needs-discussion, and scaler labels Nov 26, 2021
@mknet3
Contributor

mknet3 commented Nov 26, 2021

It's a really good improvement, I will have a look to see if I can help with this topic 🙂

@tomkerkhove
Member Author

Awesome, thank you!

@JorTurFer
Member

@mknet3 , ping me if you need help ;)

@tomkerkhove tomkerkhove added the feature (All issues for new features that have been committed to) and help wanted (Looking for support from community) labels and removed the help wanted label Nov 29, 2021
@mknet3
Contributor

mknet3 commented Dec 12, 2021

just to confirm, I'm on it and I will help with this scaler

@tomkerkhove
Member Author

Great, thanks!

@mknet3
Contributor

mknet3 commented Dec 19, 2021

Hi @tomkerkhove, I have had a look at this issue and I would like to clarify some things. AFAIK the goal of this issue is to provide a scaler based on metrics exported by an exporter configured in the collector. This exporter will expose metrics in a KEDA-readable format. Quick question: does the exporter already exist, or is there a plan to develop it? (I suppose it will live in opentelemetry-collector-contrib.) This question is to figure out the format of the exposed data so the scaler can pull it.

@tomkerkhove
Member Author

That would be part of the investigation, but I think we'll need to build our own exporter to get the metrics in, or use the gRPC OTEL exporter / HTTP OTEL exporter as a starting point to push them to KEDA.

I'd prefer the latter approach to get started as we don't have a preference on the metric format, so OTEL is fine.

@JorTurFer
Member

@mknet3 prefers to keep it open-ended for the moment because it's his first task with Go

@sushmithavangala
Contributor

Working on this

@tomkerkhove
Member Author

Before we go all in, it might be good to post a proposal here @SushmithaVReddy to avoid having to redo things, but I think relying on the OTEL exporter is best

@sushmithavangala
Contributor

@tomkerkhove , sure. I'll put a proposal here before we start the implementation.

Quick question: is the idea here to scale based on the metrics obtained from go.opentelemetry.io/otel/exporters/otlp/otlpmetrics?

@sushmithavangala
Contributor

@tomkerkhove Will KEDA be acting as a collector that gets metrics data from an exporter? Is the idea to create metrics using https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#instrument and observe them through the HPA to scale accordingly? I'm slightly confused by the terms exporter and collector w.r.t. KEDA. A plausible solution looks like one where the user has an exporter that exports metrics, and KEDA connects to this exporter and gets the metrics (as a collector?) to make scaling decisions based on the metrics mentioned in the ScaledObject.

@tomkerkhove
Member Author

tomkerkhove commented May 23, 2022

The idea is to use the OTEL exporter (https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/otlpexporter/README.md) from which KEDA fetches metrics to make scaling decision on.

This is similar to how we integrate with Prometheus, where we pull the metrics from Prometheus and move on; however, here it's in OTEL format, coming from the expected OTEL exporter that end-users have to add to their OTEL collector (so it's not up to KEDA)

From an end-user perspective, they should give us:

  1. URI of the OTEL endpoint to talk to on the collector (they add the following to their collector: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/otlpexporter/README.md#getting-started)
  2. Optional parameter to use gRPC or HTTP (but we can just start with gRPC for now as well)

Hope that helps?
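To make the wiring concrete, the end-user side could look roughly like the collector config below. This is a sketch based on the otlpexporter README linked above; `keda-otlp.keda.svc:4317` is a hypothetical KEDA-side OTLP receiver address, not an existing KEDA service.

```yaml
# Sketch: collector config forwarding metrics to a (hypothetical)
# KEDA-side OTLP receiver via the standard otlp exporter.
exporters:
  otlp:
    endpoint: keda-otlp.keda.svc:4317
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp]
```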

@sushmithavangala
Contributor

The idea is to use the OTEL exporter (https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/otlpexporter/README.md) from which KEDA fetches metrics to make scaling decision on.

This is similar to how we integrate with Prometheus, where we pull the metrics from Prometheus and move on; however, here it's in OTEL format, coming from the expected OTEL exporter that end-users have to add to their OTEL collector (so it's not up to KEDA)

From an end-user perspective, they should give us:

  1. URI of the OTEL endpoint to talk to on the collector (they add the following to their collector: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/otlpexporter/README.md#getting-started)
  2. Optional parameter to use gRPC or HTTP (but we can just start with gRPC for now as well)

Hope that helps?

This helps Tom. Thanks!

@sushmithavangala
Contributor

sushmithavangala commented Jun 6, 2022

Before we go all in, might be good to post a proposal here @SushmithaVReddy to avoid having to redo things but think relying on OTEL exporter is best

@tomkerkhove any thoughts on the ScaledObject here (ref below)? The idea is to use OTEL (https://pkg.go.dev/go.opentelemetry.io/otel), connect to the endpoint mentioned in the ScaledObject, pull the metric value, and compare it to the threshold to scale.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: opentelemetry-scaledobject
  namespace: keda
  labels:
    deploymentName: dummy
spec:
  maxReplicaCount: 12
  scaleTargetRef:
    name: dummy
  triggers:
    - type: opentelemetry
      metadata:
        exporter: http://otel-collector:4317
        metrics:
          - metricName: http_requests_total
            threshold: '100'
      authenticationRef:
        name: authdata

I was also wondering about scenarios where users want to pull multiple metrics from their application and scale based on conditions over those metrics. E.g. as below:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: opentelemetry-scaledobject
  namespace: keda
  labels:
    deploymentName: dummy
spec:
  maxReplicaCount: 12
  scaleTargetRef:
    name: dummy
  triggers:
    - type: opentelemetry
      metadata:
        exporter: http://otel-collector:4317
        metrics:
          - metricName: http_requests_total
            threshold: '100'
            operator: greaterthan
          - metricName: http_timeouts
            threshold: '5'
            operator: lesserthan
        query: http_requests_total and http_timeouts
      authenticationRef:
        name: authdata

Any ideas on what the scope of the scaler we'll be building should be in terms of multiple metrics?

@tomkerkhove
Member Author

It's ok for me to use that package since that's the official SDK - Thanks for checking.

I don't see the difference between both proposals other than one vs multiple metrics though? Can you elaborate on it?

In terms of supporting multiple metrics - I'd argue that given we support multiple triggers it might be more aligned with other scalers to only support 1 metric per trigger to keep a consistent approach in KEDA. The only consideration I would have here is performance but I think we can manage that in the implementation. Thoughts @zroubalik @JorTurFer?

Based on that we'll need to review the YAML spec but in general I think it's ok; however if we use multiple levels then I would use exporter.url instead of exporter given we might need auth in the future or similar settings.

@sushmithavangala
Contributor

It's ok for me to use that package since that's the official SDK - Thanks for checking.

I don't see the difference between both proposals other than one vs multiple metrics though? Can you elaborate on it?

In terms of supporting multiple metrics - I'd argue that given we support multiple triggers it might be more aligned with other scalers to only support 1 metric per trigger to keep a consistent approach in KEDA. The only consideration I would have here is performance but I think we can manage that in the implementation. Thoughts @zroubalik @JorTurFer?

Based on that we'll need to review the YAML spec but in general I think it's ok; however if we use multiple levels then I would use exporter.url instead of exporter given we might need auth in the future or similar settings.

Yes @tomkerkhove, the proposals differ in multiple-metrics usage, as you understood. I agree with consistency across the other scalers we have, but I'm concerned about how much value our scaler will add if it can only scale on a single metric, given that OpenTelemetry is mostly used to emit a lot of metrics.

nitpick: If we have one metric per ScaledObject and a user who wants to scale on multiple metrics goes ahead and creates that many ScaledObjects, I wonder how we handle concurrent scenarios where multiple metrics result in scaling (over-scaling? because the scaled-up instances could have been reused?)

@tomkerkhove
Member Author

It would be nice to also make the protocol configurable, given OTEL supports both HTTP and gRPC

@JorTurFer
Member

JorTurFer commented Oct 13, 2022

Metrics are pushed in the collector and we just consume them; we shouldn't have to worry about all of this as this is a OTEL problem. https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/otlpexporter/README.md

Are we going to store the metrics internally? I mean, that exporter pushes the metrics to another server, it's not a pulling endpoint AFAIK https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/otlpexporter/otlp.go#L103-L119.

On the other hand, as a user I'd not like to run the collector without HA just to use it with KEDA; observability is crucial and HA is required. Each instance of the collector could have different metrics, because each app pushes its metrics randomly to one collector instance, so if KEDA hits only one of them, we could miss some metrics. That's why I asked about aggregating, or having another collector on top of them only for KEDA.
If I were a user, how would I configure this scaler when my collector deployment has >=2 instances?

@tomkerkhove
Member Author

It's push indeed so we'd just need an endpoint to store the latest metric in memory and that's fine; am I missing something?

Based on your remark there seems to be a serious concern, but I think I'm missing it.
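The "store the latest metric in memory" idea above can be sketched as a small concurrency-safe store. This is an illustration of the approach only, not KEDA code; `MetricStore`, `Record`, and `Get` are made-up names.

```go
// Sketch (not KEDA code): a concurrency-safe store that keeps only the
// latest value per metric name, which is all a push-based scaler needs.
package main

import (
	"fmt"
	"sync"
)

type MetricStore struct {
	mu     sync.RWMutex
	latest map[string]float64
}

func NewMetricStore() *MetricStore {
	return &MetricStore{latest: make(map[string]float64)}
}

// Record overwrites the previous value; older data points are discarded.
func (s *MetricStore) Record(name string, value float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.latest[name] = value
}

// Get returns the most recent value pushed for a metric, if any.
func (s *MetricStore) Get(name string) (float64, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.latest[name]
	return v, ok
}

func main() {
	store := NewMetricStore()
	store.Record("http_requests_total", 90)
	store.Record("http_requests_total", 120) // newer push wins
	if v, ok := store.Get("http_requests_total"); ok {
		fmt.Println(v)
	}
}
```

The design choice here is that KEDA would never accumulate history, only the latest point, which keeps memory bounded regardless of push volume.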

@JorTurFer
Member

Most probably I have missed some important point in the thread :/
Are the collector instances going to push metrics to KEDA, or is it KEDA that will connect to the collector?

  • If it's KEDA that establishes the connection, we need to take care of the collector instances, because each collector instance could have different values (and we need to connect to all of them, not just one).
  • If the collector instances are the ones establishing the connections and pushing the metrics to KEDA, we need to be careful about performance: the collector can potentially push a huge amount of metrics that we need to manage (even if we just update the stored value based on time).

Based on the trigger, I guess that KEDA will establish a gRPC connection to the collector instance (exporter key) and then the collector will use that channel to push metrics to KEDA. What will happen if there are 2 or more collector instances? Will KEDA establish a connection with every collector instance? What about collector autoscaling?

TBH, my knowledge gap here could be the problem; I'm a noob OTEL user and I may be overthinking things

@fira42073
Contributor

@JorTurFer I'm trying to find the answer to the same question about aggregation across multiple OTEL agents, but I have no answer yet. We probably need to take a look at how Jaeger or any other OTEL visualizer does this

@tomkerkhove
Member Author

We didn't spec this out to this level of detail yet, but in my opinion push is the best way to avoid polling constantly: just let them send the metrics to us.

What will happen if there are 2 or more collector instances? Will KEDA establish a connection with every collector instance? What about collector autoscaling?

That's the problem of OpenTelemetry Collector as this is just how their receivers work 🤷 That's something they have to fix if they offer push-based metrics.

I'm eager to know what @zroubalik thinks

@fira42073
Contributor

Are there any new thoughts? @zroubalik?

@zroubalik
Member

I am probably not fully up to speed with all the conversation here, but how do other metrics-related tools handle this (Jaeger, ...)? I don't think we are the first ones to hit this issue. We should stick to what is common in this domain.

By chance, can we use the push scaler for this scenario?

@fira42073 fira42073 removed their assignment Feb 2, 2023
@26tanishabanik
Contributor

26tanishabanik commented Jun 23, 2023

@tomkerkhove , @JorTurFer , @Friedrich42 , is the idea here to build a KEDA exporter?
If yes, I have a few questions:

  • How will the format for different OTEL receivers be generalised?
  • Would we need any processor between the receiver and the KEDA exporter if someone wants to run a query to get the metric?

I have been researching on a similar topic for a while now, while I was customising the OTEL kafka receiver to solve a problem my organisation is working on.

After researching this repo thoroughly: https://github.com/open-telemetry/opentelemetry-collector-contrib, I have seen that whether it is a connector, exporter, processor, or receiver, each has its own set of configs. How will we deal with needing metrics in a certain shape, with maybe a certain query on the receiver side, so that the KEDA exporter can get the metrics?

@tomkerkhove
Member Author

I think in our case we can just use the built-in OTEL gRPC/HTTP exporter and use that; I don't personally see the need to create a dedicated exporter.

May I ask why you believe that would be an added value?

@26tanishabanik
Contributor

I think in our case we can just use the built-in OTEL gRPC/HTTP exporter and use that; I don't personally see the need to create a dedicated exporter.

May I ask why you believe that would be an added value?

Sure, I thought that maybe attaching any data-source receiver would be easier.

Can you kindly elaborate on how the OTEL gRPC/HTTP exporter can be integrated? And how will we handle different data formats coming from the data source, say, if someone wants to query a data source for scaling?

@tomkerkhove
Member Author

We haven't investigated this in much detail yet, but I'm happy to review proposals.

@neelanjan00
Contributor

We haven't investigated this in much detail yet, but I'm happy to review proposals.

Pardon my ignorance since I don't have extensive experience with OTEL collectors, but can you explain how we plan to provide the parameters required for defining the event pertaining to a respective data source? Will they somehow be defined in the collector config itself or are we going to provide them from KEDA?

@kedacore kedacore deleted a comment from JorTurFer Jun 29, 2023
@JorTurFer
Member

Please, ignore this:
(screenshot of the deleted comment)
I deleted my own comment with the wrong browser 🤦

I'm not sure if we can use OpenTelemetry as a scaler, because OpenTelemetry doesn't store the data; it's just a "producer and communication protocol". I mean, OpenTelemetry defines how to generate and send the data, but it doesn't have any store where we can query the values. To achieve this, we (KEDA) would have to be a data store using OTLP (OpenTelemetry Protocol) to receive the telemetry information. You can't just ask the collector for the information, because the collector isn't a backend store; it's a "routing pipe".

I wouldn't like to receive all the telemetry in KEDA to scale based on it, because it'd be crazy and we would need to manage it securely, having access to ALL telemetry data on our side. I think that we can close this issue because it doesn't make sense (IMO). End users should use the proper backend-storage scaler (Loki, Prometheus, Elastic, etc.) to scale based on them.

@fira42073
Contributor

fira42073 commented Jun 29, 2023 via email

@tomkerkhove
Member Author

tomkerkhove commented Jun 30, 2023

I wouldn't like to receive all the telemetry in KEDA to scale based on it, because it'd be crazy and we would need to manage it securely, having access to ALL telemetry data on our side. I think that we can close this issue because it doesn't make sense (IMO).

This is not the goal of this proposal; the proposal is to have an opentelemetry-collector scaler that queries the collector. I was under the impression that using the OTLP exporter, which we query every now and then, is just enough. I don't personally see what the problem is with that?

This is exactly why Exporters are available:
"An exporter, which can be push or pull based, is how you send data to one or more backends/destinations. Exporters may support one or more data sources."

In this case, KEDA is a system that will pull the metrics when it has to, using the built-in HTTP/gRPC exporter, and evaluate those metrics, which are in OTEL format.

This behaviour would be identical to what the metrics-api scaler does: just scale based on the point-in-time metric value. If you want over-time queries, then another scaler such as Prometheus is more suitable, but at least then users have a solid reason to use Prometheus.

I do not want to create an exporter as this is out of scope for us and agree there are enough backend technologies already that do this.

End users should use the proper backend-storage scaler (Loki, Prometheus, Elastic, etc.) to scale based on them.

This is exactly what I do not want to do with KEDA, because that means integrating and maintaining X systems while there is a vendor-neutral spec that can simplify this for us. Even if we had those scalers already, it would still be beneficial to have the spec-based scaler to decouple the effective metric provider from the actual running technology.
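For reference, the point-in-time evaluation described above feeds the standard HPA-style ratio calculation. Below is a minimal sketch assuming a single metric and a fixed target; `desiredReplicas` is a hypothetical helper, and min/max clamping and the HPA tolerance band are ignored.

```go
// Sketch of the HPA-style calculation a point-in-time scaler feeds into:
// desiredReplicas = ceil(currentReplicas * currentValue / targetValue).
package main

import (
	"fmt"
	"math"
)

func desiredReplicas(currentReplicas int, currentValue, targetValue float64) int {
	if targetValue <= 0 {
		return currentReplicas // invalid target; leave the scale unchanged
	}
	return int(math.Ceil(float64(currentReplicas) * currentValue / targetValue))
}

func main() {
	// 2 replicas, metric at 250 against a target of 100 -> scale to 5.
	fmt.Println(desiredReplicas(2, 250, 100))
}
```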

@JorTurFer
Member

JorTurFer commented Jun 30, 2023

This is not the goal of this proposal; the proposal is to have an opentelemetry-collector scaler that queries the collector

This is the problem: does the collector support being queried? I don't think it's supported, because the collector isn't a data store; it's just a pipe. We use the collector in our apps, and when I checked the docs I didn't see anything like that which we could use. Maybe something has changed in the last months, but I don't think so

Apart from this, even if we could query them somehow, the collector usually runs with more than one replica (for HA), and KEDA would need to query all of them (because each replica is independent; they are pipes, not stores) and aggregate the results
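To illustrate why multiple independent replicas are a problem: if KEDA pulled from each collector instance, it would have to aggregate the partial values itself. The sketch below shows only the aggregation step; the per-replica fetch is left out because the collector exposes no pull/query API to get these values from, which is exactly the objection here.

```go
package main

import "fmt"

// Sketch: if each collector replica held a partial value, KEDA would
// have to fan out to all replicas and aggregate. Whether sum or average
// is correct depends on the metric's semantics (counter vs gauge).
func aggregate(perReplica []float64) (sum, avg float64) {
	for _, v := range perReplica {
		sum += v
	}
	if len(perReplica) > 0 {
		avg = sum / float64(len(perReplica))
	}
	return sum, avg
}

func main() {
	// Values as seen by three independent collector replicas.
	sum, avg := aggregate([]float64{40, 35, 25})
	fmt.Println(sum, avg)
}
```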

@tomkerkhove
Member Author

This is not the goal of this proposal; the proposal is to have an opentelemetry-collector scaler that queries the collector

This is the problem: does the collector support being queried? I don't think it's supported, because the collector isn't a data store; it's just a pipe.

Yes you can by using the exporters mentioned above.

Apart from this, even if we could query them somehow, the collector usually runs with more than one replica (for HA), and KEDA would need to query all of them (because each replica is independent; they are pipes, not stores) and aggregate the results

Are you sure or is that an assumption?

@tomkerkhove
Member Author

Q with OTEL collector folks: open-telemetry/opentelemetry-collector#8006

@JorTurFer
Member

JorTurFer commented Jun 30, 2023

Are you sure or is that an assumption?

I'm 100% sure that each replica is totally independent of the others. We have faced this problem using the Prometheus receiver and, AFAIK, they haven't found a good solution for it yet.
They don't share anything between them. You can run 1 instance without HA, multiple instances with HA, or the collector as a sidecar on each pod; there are multiple options, but all of them result in independent collector instances.

Even assuming we could query the metrics somehow in pull mode (I don't think we can, but maybe), we would have to manage the topology and aggregate across the instances.

That's why we need a backend that aggregates the info, deduplicates the info, etc.

Q with OTEL collector folks: open-telemetry/opentelemetry-collector#8006

It could be the best option, yep 😄

@tomkerkhove
Member Author

Are you sure or is that an assumption?

I'm 100% sure that each replica is totally independent of the others. We have faced this problem using the Prometheus receiver and, AFAIK, they haven't found a good solution for it yet. They don't share anything between them. You can run 1 instance without HA, multiple instances with HA, or the collector as a sidecar on each pod; there are multiple options, but all of them result in independent collector instances.

Even assuming we could query the metrics somehow in pull mode (I don't think we can, but maybe), we would have to manage the topology and aggregate across the instances.

That's why we need a backend that aggregates the info, deduplicates the info, etc.

An alternative approach is to use push instead, if that is the case. However, that might require a separate component, probably shipped as an add-on.

@JorTurFer
Member

An alternative approach is to use push instead, if that is the case. However, that might require a separate component, probably shipped as an add-on.

I think that this can be done with a Prometheus server: the OTEL Collector pushing via the Prometheus remote-write exporter to a Prometheus server, and KEDA querying Prometheus. I see the power of OpenTelemetry, and I use it myself, but I still think that OpenTelemetry isn't meant to be used the way we want for a scaler.

In this case I disagree with adding another component for it; there are already open-source options that users can use for scaling with OpenTelemetry data, for example Prometheus or Grafana Mimir. I think that we shouldn't store any kind of user data at all, and a component that stores telemetry for scaling is storing user information.

Personally, I'd wait until they answer open-telemetry/opentelemetry-collector#8006 and based on their answer, continue or abandon this issue

@tomkerkhove
Member Author

tomkerkhove commented Aug 29, 2023

I'm OK with waiting! However, I don't really agree on this:

In this case I disagree with adding another component for it; there are already open-source options that users can use for scaling with OpenTelemetry data, for example Prometheus or Grafana Mimir. I think that we shouldn't store any kind of user data at all, and a component that stores telemetry for scaling is storing user information.

One does not simply "add Prometheus" as this increases the infrastructure you are running. If you don't have Prometheus already, you shouldn't have to add it to autoscale your apps IMO.

Edit - Actually nevermind, it's just exposing the Prometheus endpoint so we can scrape the collector directly instead of needing a Prometheus installation :)

@JorTurFer
Member

Edit - Actually nevermind, it's just exposing the Prometheus endpoint so we can scrape the collector directly instead of needing a Prometheus installation :)

This doesn't work: the collector exposes metrics in Prometheus format, but the query API is totally different, so this can't work just using the metrics endpoint. We would still need a Prometheus in that case
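For context on the difference: the collector's Prometheus endpoint serves the text exposition format, while a Prometheus-style scaler talks to the `/api/v1/query` PromQL API. A scraper reading the exposition endpoint directly would have to parse the lines itself, e.g. the simplified sketch below (`scrapeValue` is a made-up helper; it ignores labels, timestamps, and escaping).

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// scrapeValue extracts the sample value for a metric from Prometheus text
// exposition output. Simplified: ignores labels, timestamps, and escaping.
func scrapeValue(exposition, metric string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip HELP/TYPE comments and blank lines
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == metric {
			v, err := strconv.ParseFloat(fields[1], 64)
			return v, err == nil
		}
	}
	return 0, false
}

func main() {
	payload := `# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total 1027`
	if v, ok := scrapeValue(payload, "http_requests_total"); ok {
		fmt.Println(v)
	}
}
```

This only yields a point-in-time value; anything like `rate()` over a window is exactly what the PromQL API provides and the raw endpoint does not.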

Projects
Status: In Progress
Development

No branches or pull requests

10 participants