Introduce (Cluster)EventSource CRD to subscribe to CloudEvents emitted by KEDA #3533

Open
tomkerkhove opened this issue Aug 9, 2022 · 25 comments
Assignees
Labels
cloudevents All events related to CloudEvents to extend KEDA extensibility All issues related to extensibility of KEDA feature All issues for new features that have been committed to operations

Comments

@tomkerkhove
Member

tomkerkhove commented Aug 9, 2022

Use-Case

From the design proposal:

Goal

Allow end-users to subscribe to events emitted by KEDA allowing them to gain insights into what is going on and how their applications are being scaled.

These should serve several purposes such as:

  • Allow for autoscaling awareness (scaling)
  • Allow for automated notifications for incidents (errors, misconfiguration)
  • Allow for extensibility to enable end-users and our community to build on top of KEDA

Events that are being emitted must be CloudEvent-compliant and support pushing to destination endpoints inside/outside of the cluster.

With this proposal, our intention is to extend the event capabilities of KEDA with CloudEvents.

From a high-level perspective, KEDA will be doing three things in this area:

  1. Scale workloads across one or more namespaces
  2. Provide Kubernetes events inside the cluster that can be used through typical tooling (CLI, UIs, …)
  3. Emit events to endpoints inside/outside the cluster that are CloudEvents-compliant

[Image]

What are we doing

With the introduction of CloudEvents, we are targeting two audiences:

  • Cluster operators who are interested in subscribing to events across all namespaces to have a holistic overview of everything going on in the cluster
  • App developers who want to subscribe only to the events of their own application. This reduces the noise from other teams and saves the compute otherwise spent on filtering those events out

With the introduction of EventSource & ClusterEventSource CRD, we cover both scenarios and give end-users the controls that they need. The goal is that they can define an endpoint, authentication, and optional filter to which KEDA will emit its events.

Similar to TriggerAuthentication & ClusterTriggerAuthentication CRDs, the idea is that EventSource is scoped to a single namespace while ClusterEventSource is cluster-wide.

End-users can optionally define event types they want to exclude so that the destination will never receive them.

While the event subscription configuration opens up a lot of opportunities, we should strive to keep the control as minimal as possible so that we don’t implement our own eventing engine.

End-users who need more robust filtering capabilities have to use another tool that is a better fit in this scenario.
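
As a rough illustration of the minimal filtering described above, here is a sketch of how an include/exclude subscription could be evaluated. The function name and the precedence rules are assumptions for illustration only (an empty include list means "everything", and an exclusion always wins); the proposal itself does not define them.

```python
from typing import Iterable


def should_emit(event_type: str,
                included: Iterable[str] = (),
                excluded: Iterable[str] = ()) -> bool:
    """Decide whether an event type passes an include/exclude subscription.

    Assumed semantics (not confirmed by the proposal):
    - an empty include list means "all event types";
    - an exclusion always wins over an inclusion.
    """
    included_set, excluded_set = set(included), set(excluded)
    if event_type in excluded_set:
        return False
    return not included_set or event_type in included_set


# No filter configured: everything is forwarded.
print(should_emit("keda.example.Event"))
# Excluded types never reach the destination.
print(should_emit("keda.example.Event", excluded=["keda.example.Event"]))
```

Anything beyond this kind of exact-match check would be left to a dedicated eventing tool, in line with the goal of not building our own eventing engine.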

Do we need a new CRD?

To build a scalable way of emitting events across various teams and parties, configuring an endpoint directly on our KEDA control plane is not an option; we should rely on a new CRD instead.

Otherwise, it would quickly become a bottleneck for end-users that have numerous namespaces and want to have more control over who is allowed to receive what events:

  • Cluster admins/operators will want to have events for scaling in all namespaces
  • App devs/operators will only be interested in the events for their applications

While every team could do the filtering on their own, this is a waste of compute.

Requirements

Introduce a new (Cluster)EventSource CRD that allows people to subscribe for CloudEvents:

apiVersion: events.keda.sh/v1alpha1
kind: EventSource # Or ClusterEventSource
metadata:
  name: operations-cross-cluster-events
spec:
  destination:
    # Support regular webhook endpoints over HTTP(S)
    http:
      uri: http://foo.bar
      authentication:
        apiKey:
          headerName: x-api-key
          valueFrom:
            secretKeyRef:
              name: secrets-operations-events
              key: webhook-api-key
  eventSubscription:
    includedEventTypes:
      - keda.example.Event
    excludedEventTypes:
      - keda.example.Event

Once these are created by end-users, KEDA will automatically push the events to the configured sink(s).
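
To make the push to an HTTP sink more concrete, here is a minimal sketch of the request that could be built for a webhook destination with API-key authentication. It assumes the CloudEvents structured content mode (the whole event as a JSON body with the `application/cloudevents+json` media type); the helper name, the event attribute values, and the key value are placeholders, and this is not KEDA's actual implementation.

```python
import json
import urllib.request


def build_webhook_request(uri: str, event: dict,
                          header_name: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying one CloudEvent.

    Structured content mode: the event is serialized as the JSON body.
    """
    body = json.dumps(event).encode("utf-8")
    headers = {
        "Content-Type": "application/cloudevents+json; charset=utf-8",
        # Header name comes from the spec (e.g. x-api-key); the value would
        # be read from the referenced Kubernetes Secret.
        header_name: api_key,
    }
    return urllib.request.Request(uri, data=body, headers=headers, method="POST")


# Hypothetical event; only the four required CloudEvents attributes are set.
event = {
    "specversion": "1.0",
    "id": "example-id-1",
    "source": "my-cluster/my-namespace/keda",
    "type": "keda.example.Event",
}
request = build_webhook_request("http://foo.bar", event, "x-api-key", "s3cr3t")
# urllib.request.urlopen(request) would deliver it; omitted in this sketch.
```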

Issues for events are created separately.

Anything else?

Relates to #479

@tomkerkhove tomkerkhove added needs-discussion feature-request All issues for new features that have not been committed to labels Aug 9, 2022
@tomkerkhove
Member Author

@kedacore/keda-maintainers I don't have a preference on the implementation - We can start doing things in our current operator and later on introduce a dedicated container (if we really have to)

@tomkerkhove tomkerkhove added extensibility All issues related to extensibility of KEDA feature All issues for new features that have been committed to operations cloudevents All events related to CloudEvents to extend KEDA and removed needs-discussion feature-request All issues for new features that have not been committed to labels Aug 9, 2022
@tomkerkhove tomkerkhove modified the milestone: CloudEvents - Initial version Aug 9, 2022
@tomkerkhove
Member Author

@zroubalik @JorTurFer We had some discussion on what the best model for source & subject is so I reached out to @duglin and landed on this:

  • source: Who is emitting the event? In our case KEDA, and more specifically the CRD that subscribes to it: <cluster-name>/<namespace>/keda (<cluster-name>/keda if it's clustercloudevent)
  • subject: For what/whom was the event emitted? In most cases this is the workload that we are scaling: cluster/namespace/workload/resource-name

Make sense?
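
A small sketch of how the two attributes could be assembled from the formats above. The function names and the sample values are hypothetical, and it assumes that "workload" in the subject format stands for the kind of the resource being scaled.

```python
from typing import Optional


def event_source(cluster_name: str, namespace: Optional[str] = None) -> str:
    """<cluster-name>/<namespace>/keda, or <cluster-name>/keda when cluster-scoped."""
    parts = [cluster_name]
    if namespace:
        parts.append(namespace)
    parts.append("keda")
    return "/".join(parts)


def event_subject(cluster_name: str, namespace: str, kind: str, name: str) -> str:
    """cluster/namespace/workload/resource-name (kind is assumed for 'workload')."""
    return "/".join([cluster_name, namespace, kind, name])


print(event_source("prod", "orders"))   # namespaced subscription
print(event_source("prod"))             # cluster-wide subscription
print(event_subject("prod", "orders", "scaledobject", "orders-consumer"))
```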

@SpiritZhou
Contributor

SpiritZhou commented Sep 5, 2023

Hi all, here's the high-level idea of how I'd like to implement CloudEvents in KEDA:

[Image: eventemitrefactor]

To achieve this, we'll need to:

  1. Introduce new CRD and start watching it
  2. Refactor current event emitting and add internal adapter in code to handle normal k8s event emitting and CloudEvent emitting.
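
The internal adapter from step 2 could look roughly like this: one emitter that fans each event out to a handler for regular Kubernetes events and a handler for CloudEvents. All class names are hypothetical, and both handlers only collect what they would send; this is a sketch of the pattern, not the actual refactoring.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass
class Event:
    type: str      # e.g. "keda.example.Event"
    subject: str   # what the event is about
    message: str


class EventHandler(Protocol):
    def handle(self, event: Event) -> None: ...


class KubernetesEventHandler:
    """Stand-in for the existing Kubernetes event recorder."""

    def __init__(self) -> None:
        self.recorded: List[Tuple[str, str]] = []

    def handle(self, event: Event) -> None:
        self.recorded.append((event.subject, event.message))


class CloudEventHandler:
    """Wraps each event in a CloudEvents 1.0 envelope and queues it."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.outbox: List[Dict[str, object]] = []

    def handle(self, event: Event) -> None:
        self.outbox.append({
            "specversion": "1.0",
            "id": str(uuid.uuid4()),
            "source": self.source,
            "type": event.type,
            "subject": event.subject,
            "data": {"message": event.message},
        })


class EventEmitter:
    """Single entry point; callers no longer care which sinks exist."""

    def __init__(self, handlers: List[EventHandler]) -> None:
        self.handlers = handlers

    def emit(self, event: Event) -> None:
        for handler in self.handlers:
            handler.handle(event)


k8s = KubernetesEventHandler()
cloud = CloudEventHandler(source="my-cluster/my-namespace/keda")
EventEmitter([k8s, cloud]).emit(
    Event("keda.example.Event", "my-cluster/my-namespace/workload/my-app", "example"))
```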

Operations

To help operate this at scale, we should offer new Prometheus & OTEL metrics:

  • keda_event_emitted_error_totals - Provides an indication of all the errors related to pushing events, per event sink
  • keda_event_emitted_totals - Provides an indication of all the events that have been emitted, per event sink
  • keda_event_sinks_totals - Provides an indication of all the event sinks created, per event sink type, per type (namespace/cluster)
  • keda_event_queue_status - Provides an indication of how many events are dropped or still queued
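
To illustrate the label dimensions of the first three metrics, here is an in-memory stand-in using plain counters. It is not a Prometheus or OTEL client, the class and method names are hypothetical, and it assumes that keda_event_emitted_totals counts every push attempt while keda_event_emitted_error_totals counts only the failed ones.

```python
from collections import Counter


class EventMetrics:
    """In-memory model of the proposed metrics and their labels."""

    def __init__(self) -> None:
        self.emitted: Counter = Counter()  # keda_event_emitted_totals, per sink
        self.errors: Counter = Counter()   # keda_event_emitted_error_totals, per sink
        self.sinks: Counter = Counter()    # keda_event_sinks_totals, per (sink type, scope)

    def sink_created(self, sink_type: str, scope: str) -> None:
        # scope is "namespace" or "cluster"
        self.sinks[(sink_type, scope)] += 1

    def event_pushed(self, sink: str, ok: bool) -> None:
        self.emitted[sink] += 1
        if not ok:
            self.errors[sink] += 1


metrics = EventMetrics()
metrics.sink_created("http", "namespace")
metrics.event_pushed("ops-webhook", ok=True)
metrics.event_pushed("ops-webhook", ok=False)
```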

Proposed Action plan Estimation:

The proposal is to implement the whole scope in multiple phases to be more agile and merge changes faster.

For the MVP, it feels best to implement the following features with one event example:

  1. Introduce new CRD
  2. Implement the logic of emitting CloudEvent.
  3. Emit event when authentication fails

The following features can be implemented with follow-up PRs (in order):

  1. Add Prometheus & OTEL metrics about CloudEvent
  2. Filter events in KEDA side
  3. Support Azure Event Grid
  4. Introduce ClusterCloudEvent
  5. Fulfill all current events to be emitted.

@tomkerkhove
Member Author

LGTM

@JorTurFer
Member

LGTM but I have one question. How are we going to get the cluster name? I mean, IIRC we don't have that info inside the pod, so we have to request it from the users (it's not a problem IMHO, but just to be sure)

@tomkerkhove
Member Author

That's a valid question but no need to worry. This is configurable and when not specified "default" will be used (AFAIK, need to check design doc)

@zroubalik
Member

Hi, sorry for the delay. I like the proposal and the proposed direction. Great job @SpiritZhou and @tomkerkhove!

@tomkerkhove
Member Author

All the work was done by @SpiritZhou :)

@SpiritZhou
Contributor

There may be a scenario where the user needs to send one CloudEvent to multiple destinations, and there are two possible solutions:

  1. Let the user create multiple CloudEvent resources, with each resource having only one destination.
  2. Change the destination specification in CloudEvent from an object type to an array type so that the user can add different destinations in one CloudEvent resource.

I think it would be convenient for users to create one CloudEvent resource, but I am not sure if there are any drawbacks or if the concept of CloudEvents would be better served by creating multiple CloudEvent resources. What do you think? Which one is better? @tomkerkhove @JorTurFer @zroubalik
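
For comparison, the two shapes could be sketched like this. The resource names and URIs are made up, and the helper is only there to show that both options lead to the same deliveries and differ in how many resources the user has to manage.

```python
from typing import Dict, List, Tuple

# Option 1: several resources, one destination each.
option_1 = [
    {"name": "ops-a", "destination": {"http": {"uri": "http://a.example"}}},
    {"name": "ops-b", "destination": {"http": {"uri": "http://b.example"}}},
]

# Option 2: one resource whose destination is an array.
option_2 = [
    {"name": "ops", "destination": [
        {"http": {"uri": "http://a.example"}},
        {"http": {"uri": "http://b.example"}},
    ]},
]


def deliveries(resources: List[Dict]) -> List[Tuple[str, Dict]]:
    """Return (resource name, destination) pairs for either spec shape."""
    pairs = []
    for resource in resources:
        destination = resource["destination"]
        targets = destination if isinstance(destination, list) else [destination]
        for target in targets:
            pairs.append((resource["name"], target))
    return pairs


print(deliveries(option_1))
print(deliveries(option_2))
```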

@tomkerkhove
Member Author

There is actually a 3rd option which is what i originally had in mind:

  • Allow to use multiple destinations, but only 1 per type (ie HTTP and to Azure Event Grid)

I personally think creating 1 resource per scenario (aka send all events when auth fails for my app) is what we should strive for, and we should avoid having monolithic subscriptions. That's why I prefer to keep the model simple and have multiple event source resources.

Does that mean I want to avoid using an array? No. But if that is at the cost of having a good schema which we can validate against then I'd say yes.

I'd love to know what business scenario there is for creating 1 EventSource resource that pushes to 3 HTTP endpoints?

@JorTurFer
Member

I'm not an expert in CloudEvents (and not even a noob, I'm really ignorant here), so maybe I'm saying something stupid, but what is the problem with supporting multiple CloudEvent resources with multiple targets of the same type?
Wouldn't I want to emit the event to multiple receivers for different things? I see this (with a lot of imagination) as an audit trail about what has happened in KEDA. I see this as something that different departments could configure for different things as self-service. Maybe the SRE team wants some events on some targets, and the security team wants other events on another target, etc.

I'm not saying that we have to support it now (we could just wait to get feedback before doing a lot of things), it's just a question. Maybe in the beginning we could start with a single CloudEvent with multiple targets or even with a single CloudEvent with a single target to get feedback from end users

@JorTurFer
Member

JorTurFer commented Sep 22, 2023

I have also reviewed the PR and I'm curious about how you will choose between HTTP and Event Grid. Will you include the extra CRD that @tomkerkhove proposed for it? Will it be part of the current CRD?
Once again, from my absolute ignorance, IDK how many different targets we would have, but if there are several of them, maybe another CRD for defining them, instead of adding all the options to the current one, is more flexible

@tomkerkhove
Member Author

tomkerkhove commented Sep 23, 2023

I see this as something that different departments could configure for different stuff as self-service. Maybe SRE team wants some events on any targets, and security team wants other events on other target, etc

Yes, and because it's owned by different departments they should create their own CRD instance and not bundle it into a single "subscription" in my opinion. That's why for me 1 subscription is linked to 1 destination (or at least destinations of the same type IMO). If you need multiple destinations, then it's because that's for a different scenario and thus a separate resource, but that's just my opinion.

I have also reviewed the PR and I'm curious about how you will choose between HTTP and Event Grid. Will you include the extra CRD that @tomkerkhove proposed for it? Will it be part of the current CRD? Once again, from my absolute ignorance, IDK how many different targets we would have, but if there are several of them, maybe another CRD for defining them, instead of adding all the options to the current one, is more flexible

The current PR is not according to what we specced out and I believe @SpiritZhou is going to update the PR to align with it but he's waiting for our outcome on the single or multiple destination per type.

The original proposal which we agreed on is this:

apiVersion: events.keda.sh/v1alpha1
kind: EventSource # Or ClusterEventSource
metadata:
  name: operations-cross-cluster-events
spec:
  destination:
    http:
      uri: http://foo.bar/
      authentication:
        apiKey:
          headerName: x-api-key
          valueFrom:
            secretKeyRef:
              name: secrets-operations-events
              key: webhook-api-key
    azureEventgrid:
      topicEndpoint: https://{resource-name}.{region}.eventgrid.azure.net/api/events # Mandatory
      authentication: # End-users must use accessKey or activeDirectory
        accessKey:
          # Allow end-users to pull information from Kubernetes secret or from TriggerAuthentication resources
          valueFrom:
            secretKeyRef:
              name: secrets-operations-events
              key: webhook-api-key
            triggerAuthenticationRef:
              name: trigger-auth-sample
              parameterName: eventGridAuth
        activeDirectory:
          tenantId: xyz
          clientApplication:
            id: ABC
            secret:
              valueFrom:
                secretKeyRef:
                  name: secrets-operations-events
                  key: webhook-api-key
                triggerAuthenticationRef:
                  name: trigger-auth-sample
                  parameterName: eventGridAuth
          managedIdentity:
            valueFrom:
              triggerAuthenticationRef:
                name: trigger-auth-sample
            key: webhook-api-key
  eventSubscription:
    includedEventTypes:
      - keda.example.Event
    excludedEventTypes:
      - keda.example.Event

This allows end-users to use HTTP and/or Azure Event Grid, but only 1 destination and not an array.

The thing that is now brought up is:

  1. Do we allow end-users to use HTTP, Azure Event Grid and future destinations to be mixed in 1 CRD instance?
  2. Do we allow multiple destinations of the same type? (ie multiple HTTP endpoints)

Personally I'd say no to both and they need separate EventSource instances so that we can offer proper error metric per resource, log per resource, proper status on CRD, etc.

But curious about your thoughts @zroubalik & @JorTurFer.

@JorTurFer
Member

I think that we can start with 1-1 and iterate on it. I mean, currently we can support 1 destination per EventSource, and then based on feedback, we could update it to support multiple destinations or not.

I have a question here. Does it make sense to have another CRD for destinations and link to it in the EventSource? I mean, could a situation like a cluster admin setting allowed destinations (and teams just using them inside their EventSources) be a real world scenario or it doesn't apply here?

@tomkerkhove
Member Author

That's a valid question, but I'd instead introduce a CRD in the future for cluster admins to define what is (not) allowed, and use that for event source validation going forward.

I think that is nicer rather than adding more resources because of destinations?

@zroubalik
Member

These are valid points; we should think about what the main use case for multiple destinations is, though.

Is it that an admin would like to create different types of eventSubscription and, for each type, add multiple destinations? If so, then 1 CRD is probably better than creating a separate CRD for each relation.

I agree that we can start with a 1-1 relation, but we shouldn't block ourselves from adding more in the future (in that case we probably shouldn't change the CRD spec, i.e. go from a single field to an array).

@tomkerkhove
Member Author

Based on @JorTurFer's remarks above and a chat I've had with @zroubalik we've agreed to go with:

  • CloudEventSource instead of EventSource in case a new format/pattern comes in the future
  • Allow 1 destination per type, and not use an array (ie 1 HTTP destination, 1 for Azure Event Grid, 1 for AWS/GCP/...)
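
A sketch of what the "1 destination per type, no array" decision means for validation: because destinations are keyed by type in a single object, a second HTTP endpoint simply cannot be expressed, and a list is rejected. The function name is hypothetical, and the set of known types is taken from the example spec earlier in this thread; the final field names may differ.

```python
from typing import List

# Type names as used in the example spec in this thread (may not be final).
KNOWN_DESTINATION_TYPES = {"http", "azureEventgrid"}


def validate_destination(destination: object) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    if not isinstance(destination, dict) or not destination:
        return ["destination must be a non-empty object keyed by destination type"]
    errors = []
    for dest_type, config in destination.items():
        if dest_type not in KNOWN_DESTINATION_TYPES:
            errors.append(f"unknown destination type: {dest_type}")
        elif not isinstance(config, dict):
            errors.append(
                f"{dest_type} must be a single object, not {type(config).__name__}")
    return errors


# One HTTP destination plus one Azure Event Grid destination: accepted.
print(validate_destination({
    "http": {"uri": "http://foo.bar"},
    "azureEventgrid": {"topicEndpoint": "https://example.invalid/api/events"},
}))
# Two HTTP endpoints expressed as an array: rejected.
print(validate_destination({"http": [{"uri": "http://a"}, {"uri": "http://b"}]}))
```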

I have a question here. Does it make sense to have another CRD for destinations and link to it in the EventSource? I mean, could a situation like a cluster admin setting allowed destinations (and teams just using them inside their EventSources) be a real world scenario or it doesn't apply here?

We'll start simple and will evaluate this if it comes up in the future

@tomkerkhove
Member Author

This is done

@tomkerkhove
Member Author

Re-opening to better list events supported in our docs

@tomkerkhove
Member Author

@SpiritZhou did we add cluster-wide CRD already or should we open a separate issue for this?

@tomkerkhove
Member Author

@SpiritZhou did we add cluster-wide CRD already or should we open a separate issue for this?

@SpiritZhou any update on this?

@neelanjan00
Contributor

Any ETA on this? I'd like to contribute to a few of the CloudEvents integrations, is this issue a blocker for them?

@tomkerkhove
Member Author

What are you planning on contributing? The crds are already in

@neelanjan00
Contributor

What are you planning on contributing? The crds are already in

I can start with this one: #3527
It looks easy to begin with 😅

@tomkerkhove
Member Author

Sure, you should be able to get started - Thanks @neelanjan00!

Projects
Status: In Progress

5 participants