Support IRSA for SQS Scalar #837

Closed
NasAmin opened this issue May 19, 2020 · 10 comments
Labels
support All issues related to questions and supporting customers

Comments

@NasAmin

NasAmin commented May 19, 2020

We use EKS as our Kubernetes cluster. To allow our pods to authenticate against AWS and access AWS services, we use IAM Roles for Service Accounts (IRSA). We'd like to use the same approach on the KEDA operator so the scaler can get its AWS authentication from the operator.

Specification

  • I have enabled IRSA on the service account for the KEDA operator on my EKS cluster
  • This means that the service account is annotated with an IAM role.
  • KEDA operator was deployed using the Helm chart
  • I have deployed the SQS scaler with the following config
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-queue-scaledobject
  namespace: default
  labels:
    test: my-deployment
spec:
  scaleTargetRef:
    deploymentName: my-deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  pollingInterval: 5
  triggers:
  - type: aws-sqs-queue
    metadata:
      # Required: queueURL
      queueURL: https://sqs.eu-west-2.amazonaws.com/someaccount/cluster-AuditEventsQueue
      queueLength: "5"  # Default: "5"
      # Required: awsRegion
      awsRegion: "eu-west-2" 
      identityOwner: operator 
  • An HPA is created automatically but it gives the following error
Warning  FailedGetExternalMetric       81s (x60 over 16m)  horizontal-pod-autoscaler  unable to get external metric default/AWS-SQS-Queue-ApproximateNumberOfMessages-cluster-AuditEventsQueue/&LabelSelector{MatchLabels:map[string]string{deploymentName: my-deployment,},MatchExpressions:[],}: unable to fetch metrics from external metrics API: No matching metrics found for aws-sqs-queue-approximatenumberofmessages-cluster-auditeventsqueue
  • I have enabled debug logs on the operator and I am seeing this error
 {"level":"debug","ts":1589916310.2131512,"logger":"scalehandler","msg":"Error getting scale decision","ScaledObject.Namespace":"default","ScaledObject.Name":"my-sqs-queue-scaledobject","ScaledObject.ScaleType":"deployment","Error":"WebIdentityErr: unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token\ncaused by: open /var/run/secrets/eks.amazonaws.com/serviceaccount/token: permission denied"}

I suspect this may be because the SQS scaler isn't using the right SDK version.

I'd really appreciate some help with this.

Regards

Nas

@NasAmin NasAmin added feature-request All issues for new features that have not been committed to needs-discussion labels May 19, 2020
@ahmelsayed
Contributor

according to go.sum, we're using aws-sdk-go v1.25.6 which should be fine. The error message is about not being able to find file /var/run/secrets/eks.amazonaws.com/serviceaccount/token, which according to this, should be injected into the deployment in the form of

  env:
  - name: AWS_ROLE_ARN
    value: arn:aws:iam::123456789012:role/eksctl-irptest-addon-iamsa-default-my-serviceaccount-Role1-UCGG6NDYZ3UE
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
  volumeMounts:
  - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
    name: aws-iam-token
    readOnly: true
volumes:
- name: aws-iam-token
  projected:
    defaultMode: 420
    sources:
    - serviceAccountToken:
        audience: sts.amazonaws.com
        expirationSeconds: 86400
        path: token

If you do

kubectl get deployment keda-operator -n keda -o yaml

do you see those added on the deployment?
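One way to run that check is to scan a manifest for the three things the snippet above says should be injected. This is only a minimal sketch: `irsa_injection_report` is a hypothetical helper name, and it does a plain string search rather than parsing the YAML. The EKS pod identity webhook mutates pods rather than the Deployment object, so the usage line queries pods.

```shell
# Read a manifest (YAML) on stdin and report which IRSA pieces are present.
# irsa_injection_report is a hypothetical helper, not a kubectl or KEDA command.
irsa_injection_report() {
  manifest=$(cat)
  rc=0
  for needle in AWS_ROLE_ARN AWS_WEB_IDENTITY_TOKEN_FILE aws-iam-token; do
    # -F treats the needle as a fixed string, not a regular expression.
    if printf '%s\n' "$manifest" | grep -qF -- "$needle"; then
      echo "found: $needle"
    else
      echo "absent: $needle"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (namespace taken from the kubectl command above):
#   kubectl get pods -n keda -o yaml | irsa_injection_report
```

A non-zero exit status means at least one of the three strings was not found, which would suggest the service account annotation or the webhook is the place to look first.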

@NasAmin
Author

NasAmin commented May 20, 2020

@ahmelsayed Thanks for the quick response.
Yes, EKS automatically injects the AWS credentials into the pod (not the deployment).
When I describe the KEDA operator pod, I get the following:

    env:
    - name: WATCH_NAMESPACE
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: OPERATOR_NAME
      value: keda-operator
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::032356282346:role/nas-dev-pod-sqs-access
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    image: docker.io/kedacore/keda:1.4.1
    imagePullPolicy: Always
    name: keda-operator
    resources: {}
    securityContext: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: keda-operator-token-crlp9
      readOnly: true
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true

As you can see, the pod does seem to have the right credentials mounted.
So it would seem that the pod does not actually have permission to read that path.
When I open a shell in the pod and try to read the token file, I get a permission denied error:

bash-4.4$ ls /var/run/secrets/eks.amazonaws.com/serviceaccount/token
/var/run/secrets/eks.amazonaws.com/serviceaccount/token
bash-4.4$ cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token
cat: /var/run/secrets/eks.amazonaws.com/serviceaccount/token: Permission denied
bash-4.4$ cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token

So I am not really sure what I can do. I would appreciate any help

Thanks

@NasAmin
Author

NasAmin commented May 22, 2020

I would really appreciate it if anyone could help. Currently we are using a custom autoscaler based on a public GitHub repo. We'd like to get away from that and use KEDA where possible.

I originally created this issue as a feature request but it seems like IRSA should already be supported. Can it be changed to a defect?

Regards,

Nas

@tomkerkhove tomkerkhove added support All issues related to questions and supporting customers and removed feature-request All issues for new features that have not been committed to needs-discussion labels May 25, 2020
@ahmelsayed
Contributor

I wonder if it's the same issue as aws/amazon-eks-pod-identity-webhook#8

what do you see if you run

$ id

$ ls -alh /var/run/secrets/eks.amazonaws.com/serviceaccount/

keda container doesn't run as root by default

There is a workaround described here: kubernetes-sigs/external-dns#1185 (comment), but I haven't verified it.
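The two commands above can be combined into a single check run from a shell inside the operator container. The following is a rough sketch, not something shipped with KEDA: `check_token_file` is a hypothetical helper name, and the default path is the one from the error message in this thread.

```shell
# Report whether the current user can read the projected IRSA token.
# check_token_file is a hypothetical helper name, not part of KEDA or the AWS CLI.
check_token_file() {
  # Prefer an explicit argument, then the env var injected by EKS, then the path from the error log.
  token_path="${1:-${AWS_WEB_IDENTITY_TOKEN_FILE:-/var/run/secrets/eks.amazonaws.com/serviceaccount/token}}"
  if [ ! -e "$token_path" ]; then
    echo "missing: $token_path"
    return 2
  fi
  if [ ! -r "$token_path" ]; then
    echo "unreadable by uid=$(id -u) gid=$(id -g): $token_path"
    # -L follows the symlink that projected volumes use, so the real mode and owner are shown.
    ls -lL "$token_path"
    return 1
  fi
  echo "readable: $token_path"
}

# Usage inside the container:
#   check_token_file
```

If the file is reported as unreadable, comparing the printed uid and gid with the owner and mode in the `ls` output should show whether the non-root user is the cause.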

@RaymondKYLiu

Hi,

I have run into an IRSA problem with Grafana, though I am not sure whether the KEDA case is the same: grafana/grafana#20473 (comment)

The solution is to add securityContext.

Could you try adding this to the KEDA operator?

securityContext:
  fsGroup: 1001
  runAsGroup: 1001
  runAsUser: 1001

@ben11211

Can confirm this works with IRSA with the following

  • keda-operator service account annotated with an EKS role with appropriate sqs:GetQueueAttributes permissions.
  • identityOwner: operator configured for the trigger
  • Appropriate securityContext per Support IRSA for SQS Scalar #837 (comment).
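For the first item, a policy granting only the action named above might look like the output of the sketch below. `sqs_read_policy` is a hypothetical helper name, and the queue ARN in the usage line is a placeholder; a real setup may need further statements that this thread does not cover.

```shell
# Print a minimal IAM policy document allowing sqs:GetQueueAttributes on one queue.
# sqs_read_policy is a hypothetical helper name used only for illustration.
sqs_read_policy() {
  queue_arn="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sqs:GetQueueAttributes",
      "Resource": "$queue_arn"
    }
  ]
}
EOF
}

# Usage (role name, policy name, and ARN are placeholders):
#   aws iam put-role-policy --role-name <role> --policy-name <name> \
#     --policy-document "$(sqs_read_policy arn:aws:sqs:eu-west-2:123456789012:example-queue)"
```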

@zroubalik
Member

@ben11211 thanks! Would you mind contributing this info to the Troubleshooting guide? Thanks!

https://keda.sh/docs/2.0/troubleshooting/

@mzupan

mzupan commented Nov 27, 2020

The one thing that tripped me up when using the Helm chart was assuming that setting the security context here would work:

https://github.com/kedacore/charts/blob/master/keda/values.yaml#L76

That sets the security context for the containers, but the securityContext settings need to go in the pod section. I had to fork the Helm chart to make that change.
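To make the pod-versus-container distinction concrete, here is a sketch that builds a merge patch placing the settings at `spec.template.spec.securityContext`, which is the pod level, rather than under an individual entry in `containers`. `pod_security_patch` is a hypothetical helper name, and 1001 is simply the ID suggested earlier in this thread.

```shell
# Build a merge patch that sets the security context at the pod level
# (spec.template.spec.securityContext), not on an individual container.
# pod_security_patch is a hypothetical helper name; 1001 is the ID used in this thread.
pod_security_patch() {
  id_value="${1:-1001}"
  printf '{"spec":{"template":{"spec":{"securityContext":{"fsGroup":%s,"runAsGroup":%s,"runAsUser":%s}}}}}\n' \
    "$id_value" "$id_value" "$id_value"
}

# Usage (deployment name and namespace as in the kubectl command earlier in the thread):
#   kubectl patch deployment keda-operator -n keda --type merge -p "$(pod_security_patch 1001)"
```

Note that a patch applied this way to a Helm-managed deployment is likely to be overwritten by the next `helm upgrade`, so it is better suited to verifying the fix than to keeping it.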

@zroubalik
Member

@mzupan keen to send a PR for this?

@NasAmin
Author

NasAmin commented Mar 2, 2021

Sorry for taking such a long time to get back to this.
I can confirm that setting podSecurityContext fixes my problem.
Given that:

  • I have an IAM role with the appropriate SQS queue permissions
  • I have annotated the operator service account with that IAM role
  • I have set the security context like this for the operator
podSecurityContext:
  fsGroup: 1001
  runAsGroup: 1001
  runAsUser: 1001

Closing this issue
