Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bubble Up vault errors to ScaledObject Events #5190

Closed
BojanZelic opened this issue Nov 16, 2023 · 4 comments · Fixed by #5219
Closed

Bubble Up vault errors to ScaledObject Events #5190

BojanZelic opened this issue Nov 16, 2023 · 4 comments · Fixed by #5219
Assignees
Labels
feature-request All issues for new features that have not been committed to needs-discussion

Comments

@BojanZelic
Copy link
Contributor

Proposal

Any errors in vault TriggerAuth configurations such as permissions or connectivity problems make it confusing to debug. Since the vault configuration lies within each ScaledObject, it could bubble up more descriptive errors when it's misconfigured instead of having to parse keda logs to find the root cause.

Use-Case

For example; when defining a ScaledObject and a TriggerAuth with an incorrect params;

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
spec:
...
  - authenticationRef:
      kind: TriggerAuthentication
      name: my-trigger-auth
    metadata:
...
    type: kafka
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
...
spec:
  hashiCorpVault:
    address: my-vault-server
    authentication: kubernetes
    credential:
      serviceAccount: /var/run/secrets/kubernetes.io/serviceaccount/token
    mount: cluster
    role: keda-triggerauth
    secrets:
    - key: MY_PASSWORD_KEY
      parameter: password
      path: secret_v2/data/my-secret-path
    - key: MY_USERNAME_KEY
      parameter: username
      path: secret_v2/data/my-secret-path

& lets say that our user can't auth to vault because of some permission issues and we get the following:

2023-11-16T21:28:40Z    ERROR   scale_handler   error authenticate to Vault     {"type": "ScaledObject", "namespace": "my-namespace", "name": "my-scaled-object", "triggerAuthRef.Name": "my-trigger-auth", "error": "Error making API request.\n\nURL: PUT https://my-vault-server/v1/auth/cluster/login\nCode: 403. Errors:\n\n* permission denied"}
2023-11-16T21:28:40Z    ERROR   Reconciler error        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"my-scaled-object","namespace":"my-namespace"}, "namespace": "my-namespace", "name": "my-scaled-object", "reconcileID": "2e5660bd-9d71-46ee-9b93-23141af2f92d", "error": "error parsing kafka metadata: no username given"}

this gives us the following Events on the ScaledObject

Warning  KEDAScalerFailed         38s (x17 over 6m6s)    keda-operator  error parsing kafka metadata: no username given

it's a confusing error message to debug; it would be easier if it exposed the direct error from vault:
ex:

Warning  KEDAScalerFailed         38s (x17 over 6m6s)    keda-operator  Error making API request.\n\nURL: PUT https://my-vault-server/v1/auth/cluster/login\nCode: 403. Errors:\n\n* permission denied

Is this a feature you are interested in implementing yourself?

Yes

Anything else?

No response

@BojanZelic BojanZelic added feature-request All issues for new features that have not been committed to needs-discussion labels Nov 16, 2023
@JorTurFer
Copy link
Member

JorTurFer commented Nov 26, 2023

I think that this will be really useful. I have assigned it to you (as you are willing to implement it 😄 )

@zroubalik
Copy link
Member

We can probably also update TriggerAuth controller to check this when TriggerAuth object is created/updated.

@kedacore/keda-maintainers WDYT?

@tomkerkhove
Copy link
Member

I think we should do both:

  • Check during create/update to fail fast
  • Bubble errors as connectivity/ACLs might change outside of KEDA scope, so we still need to better surface them

@BojanZelic
Copy link
Contributor Author

I tried to add support for this in the validating webhook but it's much more complicated then it seems;
since the webhook code lives in v1alpha1 there would be a cyclical dependency between v1alpha1 and the resolvers package where the vault/azuekeyvault logic lives.

I see a path forward but it's not trivial,

  • would either have to rewrite hashicorpvault_handler and azure_keyvault_handler to not depend on v1alpha1 packages by having a new struct & not use the ones in v1alpha1
  • or.... moving the webhooks to a different package (keda/v1alpha1/triggerauthentication_webhook.go -> webhooks or pkg/webhooks and refactoring the code there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request All issues for new features that have not been committed to needs-discussion
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants