feat: add annotation to ignore local storage volume during scale down #5594

vadasambar (Member) commented Mar 14, 2023

  • this is so that scale down is not blocked on a local storage volume
  • for pods where it is okay to ignore the local storage volume

Signed-off-by: vadasambar <surajrbanakar@gmail.com>

What type of PR is this?

/kind feature

What this PR does / why we need it:

Some pods, such as those with Istio sidecars, have local storage volumes that block node scale-down. We would like to be able to ignore local storage volumes on certain pods, provided the pod carries an annotation of the form cluster-autoscaler.kubernetes.io/ignore-local-storage-volume: <volume-name>.

Which issue(s) this PR fixes:

Fixes #3947

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added: cluster-autoscaler now supports a new pod annotation `cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes: "volume-1,volume-2"` to evict pods with local storage volumes (where `volume-1` and `volume-2` are safe to evict local storage volumes) during node scale down.  

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

TBD

@k8s-ci-robot added the do-not-merge/work-in-progress, kind/feature, cncf-cla: yes, and size/S labels Mar 14, 2023
@vadasambar (Member Author)

Test suite is failing because the tests haven't been updated for the new annotation yet.

@k8s-ci-robot added the size/M label and removed the size/S label Mar 15, 2023
@@ -38,6 +38,8 @@ const (
// PodSafeToEvictKey - annotation that ignores constraints to evict a pod like not being replicated, being on
// kube-system namespace or having a local storage.
PodSafeToEvictKey = "cluster-autoscaler.kubernetes.io/safe-to-evict"
// IgnoreLocalStorageVolumeKey - annotation that ignores (doesn't block on) a local storage volume during scale down
IgnoreLocalStorageVolumeKey = "cluster-autoscaler.kubernetes.io/ignore-local-storage-volume"
Member Author

Better naming suggestions are welcome.

Collaborator

"safe-to-evict" is a widely recognized concept at this point, maybe we could reuse it, e.g. cluster-autoscaler.kubernetes.io/local-volume-safe-to-evict? I know we're not technically evicting the volume, but I think it conveys the intent better. Or maybe include "scale down" and "blocking" instead, like cluster-autoscaler.kubernetes.io/scale-down-non-blocking-local-volume? Just "ignore" doesn't tell much about what purpose it's being ignored for.

On a slightly related note, I don't think we should restrict ourselves to just one volume here, what if there's more that should be ignored? WDYT about making the value of the annotation a comma-separated list of volume names instead?
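
A comma-separated value could be parsed along these lines. This is a minimal sketch, not the code from this PR: `parseVolumeNames` is a hypothetical helper, and trimming whitespace and dropping empty entries are assumptions about how lenient the parsing should be.

```go
package main

import (
	"fmt"
	"strings"
)

// parseVolumeNames is a hypothetical helper (not part of cluster-autoscaler).
// It splits a comma-separated annotation value into a set of volume names.
// Assumption: surrounding whitespace is trimmed and empty entries are dropped.
func parseVolumeNames(value string) map[string]bool {
	names := make(map[string]bool)
	for _, part := range strings.Split(value, ",") {
		name := strings.TrimSpace(part)
		if name != "" {
			names[name] = true
		}
	}
	return names
}

func main() {
	names := parseVolumeNames("volume-1, volume-2,")
	fmt.Println(len(names), names["volume-1"], names["volume-2"])
}
```

Using a set keeps the later "is this volume listed?" lookup simple, and an empty annotation value naturally yields an empty set.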

@@ -115,6 +115,7 @@ func TestGetPodsToMove(t *testing.T) {
Spec: apiv1.PodSpec{
Volumes: []apiv1.Volume{
{
Name: "empty-vol",
Member Author

Added a volume name because it is a required field in Kubernetes. I realized the tests in this file were failing because I wasn't handling the case of an empty volume name. This has been fixed in this PR.

I thought about removing the volume name above, since I now handle the empty-name case, but I kept it because it makes the manifest closer to an actual Kubernetes manifest.

suraj@suraj:~/Downloads$ kubectl explain pod.spec.volumes | grep name
     Volume represents a named volume in a pod that may be accessed by any
   name	<string> -required-
     name of the volume. Must be a DNS_LABEL and unique within the pod. More
     https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
     PersistentVolumeClaim in the same namespace. More info:

Collaborator

Ack, that makes sense.

@@ -136,6 +137,7 @@ func TestGetPodsToMove(t *testing.T) {
Spec: apiv1.PodSpec{
Volumes: []apiv1.Volume{
{
Name: "my-repo",

@vadasambar (Member Author)

Can I ask for a review on this before I go and test it on a live cluster? 🙏

Asking this because I think it's a small change and I think the code should work. If possible, I want to address the comments first, get an OK and go test things on a live cluster. This would save me one (or more) round(s) of manual testing 🙇

@vadasambar marked this pull request as ready for review March 16, 2023 09:31
@k8s-ci-robot removed the do-not-merge/work-in-progress label Mar 16, 2023
@k8s-ci-robot requested a review from x13n March 16, 2023 09:31
towca (Collaborator) commented Mar 20, 2023

Hi! We already have an annotation that you can use to mark a pod as "never blocking scale down" - cluster-autoscaler.kubernetes.io/safe-to-evict=true. Maybe that's enough for your use-case? If not, could you share why? The "pod blocking scale-down" logic basically guesses whether it's safe to evict a given pod. If you're annotating the pod manually anyway, shouldn't you be able to tell if it's generally safe to evict (so there's no need to guess)?

@vadasambar (Member Author)

> Hi! We already have an annotation that you can use to mark a pod as "never blocking scale down" - cluster-autoscaler.kubernetes.io/safe-to-evict=true. Maybe that's enough for your use-case? If not, could you share why? The "pod blocking scale-down" logic basically guesses whether it's safe to evict a given pod. If you're annotating the pod manually anyway, shouldn't you be able to tell if it's generally safe to evict (so there's no need to guess)?

@towca I had the same question myself. Quoting a snippet from @ldemailly's comment:

> the annotation would evict any pod; while what we want is have the sidecar not prevent eviction but also not change the logic for the main/app container
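
The difference between the two annotations can be sketched as follows. This is an illustration only, not cluster-autoscaler code: the `pod` struct, its fields, and `blocksScaleDown` are hypothetical, and only two of the autoscaler's checks (replication and local storage) are modeled.

```go
package main

import "fmt"

// pod is a simplified, hypothetical stand-in for a Kubernetes pod.
type pod struct {
	replicated       bool     // owned by a controller such as a ReplicaSet
	localVolumes     []string // names of the pod's local storage volumes
	safeToEvict      bool     // pod-level safe-to-evict annotation is "true"
	safeLocalVolumes []string // names listed in the per-volume annotation
}

// blocksScaleDown is a hypothetical model of the decision, not the real logic.
func blocksScaleDown(p pod) bool {
	if p.safeToEvict {
		return false // pod-level annotation: every check is skipped
	}
	if !p.replicated {
		return true // still enforced under the per-volume annotation
	}
	safe := make(map[string]bool)
	for _, name := range p.safeLocalVolumes {
		safe[name] = true
	}
	for _, name := range p.localVolumes {
		if !safe[name] {
			return true // a local volume that was not marked safe
		}
	}
	return false
}

func main() {
	sidecarVolumes := []string{"istio-envoy"}
	// An unreplicated app pod with an injected sidecar volume.
	podLevel := pod{localVolumes: sidecarVolumes, safeToEvict: true}
	volumeLevel := pod{localVolumes: sidecarVolumes, safeLocalVolumes: sidecarVolumes}
	fmt.Println("pod-level annotation blocks:", blocksScaleDown(podLevel))
	fmt.Println("volume-level annotation blocks:", blocksScaleDown(volumeLevel))
}
```

In this model the pod-level annotation makes the unreplicated pod evictable, while the per-volume annotation only removes the local storage objection and leaves the other checks in place.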

@towca (Collaborator) left a comment

Ahh I get it, it makes a lot of sense for sidecars, thanks for the explanation. Overall the PR looks good, I just had a couple comments. Also, could you please squash the intermediate commits? I think this is all pretty self-contained and 1 commit for the whole PR should suffice.

@@ -489,6 +509,16 @@ func TestDrain(t *testing.T) {
expectBlockingPod: &BlockingPod{Pod: emptydirPod, Reason: LocalStorageRequested},
expectDaemonSetPods: []*apiv1.Pod{},
},
{
Collaborator

I'd consider also including test cases for pod with local volume(s) and:

  • The annotation value being empty (=no local volumes match, pod is blocking)
  • The annotation value not matching any local volumes on the pod (=pod is blocking)
  • The annotation value only matching one local volume on the pod, but there is another one without the annotation (=pod is still blocking)
  • [if we have multiple values available] Multiple annotation values matching all local volumes on the pod (=pod is not blocking)
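
The suggested cases can be written as a table. The sketch below is not the PR's test code: `blockedByLocalStorage` is a hypothetical helper that takes the names of a pod's local volumes and the raw annotation value, and it assumes a pod is unblocked only when every local volume is listed. The volume names `vol-1` and `vol-2` are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// blockedByLocalStorage is a hypothetical helper, not the PR's implementation.
// It reports whether any local volume is missing from the annotation value.
func blockedByLocalStorage(localVolumes []string, annotation string) bool {
	listed := make(map[string]bool)
	for _, part := range strings.Split(annotation, ",") {
		if name := strings.TrimSpace(part); name != "" {
			listed[name] = true
		}
	}
	for _, volume := range localVolumes {
		if !listed[volume] {
			return true
		}
	}
	return false
}

func main() {
	cases := []struct {
		name       string
		volumes    []string
		annotation string
		want       bool
	}{
		{"empty annotation value", []string{"vol-1"}, "", true},
		{"value matches no volume", []string{"vol-1"}, "other", true},
		{"only one of two volumes listed", []string{"vol-1", "vol-2"}, "vol-1", true},
		{"all volumes listed", []string{"vol-1", "vol-2"}, "vol-1,vol-2", false},
	}
	for _, c := range cases {
		got := blockedByLocalStorage(c.volumes, c.annotation)
		fmt.Printf("%s: blocking=%t (want %t)\n", c.name, got, c.want)
	}
}
```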

Member Author

All of them make sense. I came up with a couple more after reading your comment.

vadasambar (Member Author), Mar 28, 2023

I have added the new test cases. Thank you for the suggestions.

Collaborator

Looks good, thanks!

towca (Collaborator) commented Mar 24, 2023

/assign @towca

vadasambar (Member Author) commented Mar 28, 2023

> "safe-to-evict" is a widely recognized concept at this point, maybe we could reuse it, e.g. cluster-autoscaler.kubernetes.io/local-volume-safe-to-evict? I know we're not technically evicting the volume, but I think it conveys the intent better. Or maybe include "scale down" and "blocking" instead, like cluster-autoscaler.kubernetes.io/scale-down-non-blocking-local-volume? Just "ignore" doesn't tell much about what purpose it's being ignored for.

I think safe-to-evict conveys the intent better than ignore. cluster-autoscaler.kubernetes.io/scale-down-non-blocking-local-volume is good too, but when I read it I thought of 2 possible readings of the annotation:

  1. scale-down the non-blocking-local-volume
  2. scale-down-non-blocking (for node) local-volume

ignore-local-storage-volume also has the same vagueness about it e.g., does it mean ignore storage volume during scale down or ignore local storage volume for something else?

I think safe-to-evict-local-volume might be slightly easier to remember than local-volume-safe-to-evict. WDYT?

P.S.: Thank you for the suggestions.

vadasambar (Member Author) commented Mar 28, 2023

> On a slightly related note, I don't think we should restrict ourselves to just one volume here, what if there's more that should be ignored? WDYT about making the value of the annotation a comma-separated list of volume names instead?

I like the idea. 👍 Let me think about it. Thank you for the suggestion!

P.S.: updated the code to support this idea

@vadasambar (Member Author)

> Also, could you please squash the intermediate commits?

Wanted to squash but after seeing your comments I think I need to add more commits. Do you mind if I do squashing at the end before we merge the PR?

@k8s-ci-robot added the size/L label and removed the size/M label Mar 28, 2023
@@ -38,6 +39,8 @@ const (
// PodSafeToEvictKey - annotation that ignores constraints to evict a pod like not being replicated, being on
// kube-system namespace or having a local storage.
PodSafeToEvictKey = "cluster-autoscaler.kubernetes.io/safe-to-evict"
// SafeToEvictLocalVolumeKey - annotation that ignores (doesn't block on) a local storage volume during node scale down
SafeToEvictLocalVolumeKey = "cluster-autoscaler.kubernetes.io/safe-to-evict-local-volume"
Member Author

I have changed the annotation key here to safe-to-evict-local-volume instead of the suggested local-volume-safe-to-evict because I think it's easier to read and remember. This is of course my thinking. Happy to change it based on review comments.

Member Author

Annotation key name is being discussed in this comment: #5594 (comment)

Collaborator

The name LGTM, but I'd make it plural if we allow multiple volumes - safe-to-evict-local-volumes.

Member Author

Makes sense 👍

Member Author

Updated to safe-to-evict-local-volumes

@vadasambar (Member Author)

Testing

Without the annotation

image

Here's what the pod looks like
suraj@suraj:~/bin$ k get po nginx-777b5694f6-hfvx6 -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/default-container: nginx
    kubectl.kubernetes.io/default-logs-container: nginx
    prometheus.io/path: /stats/prometheus
    prometheus.io/port: "15020"
    prometheus.io/scrape: "true"
    sidecar.istio.io/status: '{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null,"revision":"default"}'
  creationTimestamp: "2023-04-12T05:27:23Z"
  generateName: nginx-777b5694f6-
  labels:
    app: nginx
    pod-template-hash: 777b5694f6
    security.istio.io/tlsMode: istio
    service.istio.io/canonical-name: nginx
    service.istio.io/canonical-revision: latest
  name: nginx-777b5694f6-hfvx6
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: nginx-777b5694f6
    uid: 83078dec-aedd-433e-bde3-4f6ed87540f0
  resourceVersion: "35993"
  uid: ce4c95f7-7a5d-4e8f-820e-e40e901cb54e
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-wknsc
      readOnly: true
  - args:
    - proxy
    - sidecar
    - --domain
    - $(POD_NAMESPACE).svc.cluster.local
    - --proxyLogLevel=warning
    - --proxyComponentLogLevel=misc:error
    - --log_output_level=default:info
    - --concurrency
    - "2"
    env:
    - name: JWT_POLICY
      value: third-party-jwt
    - name: PILOT_CERT_PROVIDER
      value: istiod
    - name: CA_ADDR
      value: istiod.istio-system.svc:15012
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: INSTANCE_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: SERVICE_ACCOUNT
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.serviceAccountName
    - name: HOST_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    - name: PROXY_CONFIG
      value: |
        {}
    - name: ISTIO_META_POD_PORTS
      value: |-
        [
        ]
    - name: ISTIO_META_APP_CONTAINERS
      value: nginx
    - name: ISTIO_META_CLUSTER_ID
      value: Kubernetes
    - name: ISTIO_META_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: ISTIO_META_INTERCEPTION_MODE
      value: REDIRECT
    - name: ISTIO_META_WORKLOAD_NAME
      value: nginx
    - name: ISTIO_META_OWNER
      value: kubernetes://apis/apps/v1/namespaces/default/deployments/nginx
    - name: ISTIO_META_MESH_ID
      value: cluster.local
    - name: TRUST_DOMAIN
      value: cluster.local
    image: docker.io/istio/proxyv2:1.17.1
    imagePullPolicy: IfNotPresent
    name: istio-proxy
    ports:
    - containerPort: 15090
      name: http-envoy-prom
      protocol: TCP
    readinessProbe:
      failureThreshold: 30
      httpGet:
        path: /healthz/ready
        port: 15021
        scheme: HTTP
      initialDelaySeconds: 1
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 3
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 40Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 1337
      runAsNonRoot: true
      runAsUser: 1337
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/workload-spiffe-uds
      name: workload-socket
    - mountPath: /var/run/secrets/credential-uds
      name: credential-socket
    - mountPath: /var/run/secrets/workload-spiffe-credentials
      name: workload-certs
    - mountPath: /var/run/secrets/istio
      name: istiod-ca-cert
    - mountPath: /var/lib/istio/data
      name: istio-data
    - mountPath: /etc/istio/proxy
      name: istio-envoy
    - mountPath: /var/run/secrets/tokens
      name: istio-token
    - mountPath: /etc/istio/pod
      name: istio-podinfo
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-wknsc
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - args:
    - istio-iptables
    - -p
    - "15001"
    - -z
    - "15006"
    - -u
    - "1337"
    - -m
    - REDIRECT
    - -i
    - '*'
    - -x
    - ""
    - -b
    - '*'
    - -d
    - 15090,15021,15020
    - --log_output_level=default:info
    image: docker.io/istio/proxyv2:1.17.1
    imagePullPolicy: IfNotPresent
    name: istio-init
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 40Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: false
      runAsGroup: 0
      runAsNonRoot: false
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-wknsc
      readOnly: true
  nodeName: gke-cluster-1-default-pool-87aace2f-vwr9
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: workload-socket
  - emptyDir: {}
    name: credential-socket
  - emptyDir: {}
    name: workload-certs
  - emptyDir:
      medium: Memory
    name: istio-envoy
  - emptyDir: {}
    name: istio-data
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels
        path: labels
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.annotations
        path: annotations
    name: istio-podinfo
  - name: istio-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: istio-ca
          expirationSeconds: 43200
          path: istio-token
  - configMap:
      defaultMode: 420
      name: istio-ca-root-cert
    name: istiod-ca-cert
  - name: kube-api-access-wknsc
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-04-12T05:27:25Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-04-12T05:27:26Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-04-12T05:27:26Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-04-12T05:27:23Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://9f2762d4f98efa5cde0070c6679fabbfcf099c450aba1a0b72f073eacd7c607b
    image: docker.io/istio/proxyv2:1.17.1
    imageID: docker.io/istio/proxyv2@sha256:2152aea5fbe2de20f08f3e0412ad7a4cd54a492240ff40974261ee4bdb43871d
    lastState: {}
    name: istio-proxy
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-04-12T05:27:25Z"
  - containerID: containerd://e7e2a70b7a4632a9105d58cd7f19d9f5d5a7c84621027172dddf8e2ba39e2461
    image: docker.io/library/nginx:latest
    imageID: docker.io/library/nginx@sha256:82c42833dbed48dc403246c532caef3aa7f5c6b5633d74a17ae565745f7215e9
    lastState: {}
    name: nginx
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-04-12T05:27:25Z"
  hostIP: 10.128.15.201
  initContainerStatuses:
  - containerID: containerd://f4452f732aa8fea3145b5281b55a097cb8c7f12c319162349baf31727a26e38c
    image: docker.io/istio/proxyv2:1.17.1
    imageID: docker.io/istio/proxyv2@sha256:2152aea5fbe2de20f08f3e0412ad7a4cd54a492240ff40974261ee4bdb43871d
    lastState: {}
    name: istio-init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: containerd://f4452f732aa8fea3145b5281b55a097cb8c7f12c319162349baf31727a26e38c
        exitCode: 0
        finishedAt: "2023-04-12T05:27:24Z"
        reason: Completed
        startedAt: "2023-04-12T05:27:24Z"
  phase: Running
  podIP: 10.4.3.5
  podIPs:
  - ip: 10.4.3.5
  qosClass: Burstable
  startTime: "2023-04-12T05:27:23Z"

With the annotation

These are the local storage volumes present on the pod
image

Added annotation
image
image

Node is being scaled down
image

Here's what the pod looks like
suraj@suraj:~/bin$ k get po nginx-7bd5c976c7-tctrh -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes: workload-socket,credential-socket,workload-certs,istio-envoy,istio-data
    kubectl.kubernetes.io/default-container: nginx
    kubectl.kubernetes.io/default-logs-container: nginx
    prometheus.io/path: /stats/prometheus
    prometheus.io/port: "15020"
    prometheus.io/scrape: "true"
    sidecar.istio.io/status: '{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null,"revision":"default"}'
  creationTimestamp: "2023-04-12T05:57:32Z"
  generateName: nginx-7bd5c976c7-
  labels:
    app: nginx
    pod-template-hash: 7bd5c976c7
    security.istio.io/tlsMode: istio
    service.istio.io/canonical-name: nginx
    service.istio.io/canonical-revision: latest
  name: nginx-7bd5c976c7-tctrh
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: nginx-7bd5c976c7
    uid: 53caba4f-d437-4e53-b108-ac2ac5a8a7d5
  resourceVersion: "53326"
  uid: 5607da02-69d1-452c-b049-3348f69696d0
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-lfqxf
      readOnly: true
  - args:
    - proxy
    - sidecar
    - --domain
    - $(POD_NAMESPACE).svc.cluster.local
    - --proxyLogLevel=warning
    - --proxyComponentLogLevel=misc:error
    - --log_output_level=default:info
    - --concurrency
    - "2"
    env:
    - name: JWT_POLICY
      value: third-party-jwt
    - name: PILOT_CERT_PROVIDER
      value: istiod
    - name: CA_ADDR
      value: istiod.istio-system.svc:15012
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: INSTANCE_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: SERVICE_ACCOUNT
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.serviceAccountName
    - name: HOST_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    - name: PROXY_CONFIG
      value: |
        {}
    - name: ISTIO_META_POD_PORTS
      value: |-
        [
        ]
    - name: ISTIO_META_APP_CONTAINERS
      value: nginx
    - name: ISTIO_META_CLUSTER_ID
      value: Kubernetes
    - name: ISTIO_META_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: ISTIO_META_INTERCEPTION_MODE
      value: REDIRECT
    - name: ISTIO_META_WORKLOAD_NAME
      value: nginx
    - name: ISTIO_META_OWNER
      value: kubernetes://apis/apps/v1/namespaces/default/deployments/nginx
    - name: ISTIO_META_MESH_ID
      value: cluster.local
    - name: TRUST_DOMAIN
      value: cluster.local
    image: docker.io/istio/proxyv2:1.17.1
    imagePullPolicy: IfNotPresent
    name: istio-proxy
    ports:
    - containerPort: 15090
      name: http-envoy-prom
      protocol: TCP
    readinessProbe:
      failureThreshold: 30
      httpGet:
        path: /healthz/ready
        port: 15021
        scheme: HTTP
      initialDelaySeconds: 1
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 3
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 40Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 1337
      runAsNonRoot: true
      runAsUser: 1337
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/workload-spiffe-uds
      name: workload-socket
    - mountPath: /var/run/secrets/credential-uds
      name: credential-socket
    - mountPath: /var/run/secrets/workload-spiffe-credentials
      name: workload-certs
    - mountPath: /var/run/secrets/istio
      name: istiod-ca-cert
    - mountPath: /var/lib/istio/data
      name: istio-data
    - mountPath: /etc/istio/proxy
      name: istio-envoy
    - mountPath: /var/run/secrets/tokens
      name: istio-token
    - mountPath: /etc/istio/pod
      name: istio-podinfo
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-lfqxf
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - args:
    - istio-iptables
    - -p
    - "15001"
    - -z
    - "15006"
    - -u
    - "1337"
    - -m
    - REDIRECT
    - -i
    - '*'
    - -x
    - ""
    - -b
    - '*'
    - -d
    - 15090,15021,15020
    - --log_output_level=default:info
    image: docker.io/istio/proxyv2:1.17.1
    imagePullPolicy: IfNotPresent
    name: istio-init
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 40Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: false
      runAsGroup: 0
      runAsNonRoot: false
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-lfqxf
      readOnly: true
  nodeName: gke-cluster-1-default-pool-87aace2f-vwr9
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: workload-socket
  - emptyDir: {}
    name: credential-socket
  - emptyDir: {}
    name: workload-certs
  - emptyDir:
      medium: Memory
    name: istio-envoy
  - emptyDir: {}
    name: istio-data
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels
        path: labels
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.annotations
        path: annotations
    name: istio-podinfo
  - name: istio-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: istio-ca
          expirationSeconds: 43200
          path: istio-token
  - configMap:
      defaultMode: 420
      name: istio-ca-root-cert
    name: istiod-ca-cert
  - name: kube-api-access-lfqxf
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-04-12T05:57:34Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-04-12T05:57:35Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-04-12T05:57:35Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-04-12T05:57:32Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://56f5be6512dcb1edf444e06cbcf1e09e7ca55d9c1bce7f126f0a795b77d69cec
    image: docker.io/istio/proxyv2:1.17.1
    imageID: docker.io/istio/proxyv2@sha256:2152aea5fbe2de20f08f3e0412ad7a4cd54a492240ff40974261ee4bdb43871d
    lastState: {}
    name: istio-proxy
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-04-12T05:57:34Z"
  - containerID: containerd://248f080691804d591428bcda1f4546e83eec20df1ab16c96eeb5a88c23b540e0
    image: docker.io/library/nginx:latest
    imageID: docker.io/library/nginx@sha256:82c42833dbed48dc403246c532caef3aa7f5c6b5633d74a17ae565745f7215e9
    lastState: {}
    name: nginx
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-04-12T05:57:34Z"
  hostIP: 10.128.15.201
  initContainerStatuses:
  - containerID: containerd://464130fec23a9688f1355436b2a93ca5dd8dac859a4c5ad1127395bccee2e67a
    image: docker.io/istio/proxyv2:1.17.1
    imageID: docker.io/istio/proxyv2@sha256:2152aea5fbe2de20f08f3e0412ad7a4cd54a492240ff40974261ee4bdb43871d
    lastState: {}
    name: istio-init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: containerd://464130fec23a9688f1355436b2a93ca5dd8dac859a4c5ad1127395bccee2e67a
        exitCode: 0
        finishedAt: "2023-04-12T05:57:33Z"
        reason: Completed
        startedAt: "2023-04-12T05:57:33Z"
  phase: Running
  podIP: 10.4.3.6
  podIPs:
  - ip: 10.4.3.6
  qosClass: Burstable
  startTime: "2023-04-12T05:57:32Z"

CA params for both the cases above

image

@vadasambar
Member Author

@towca I think I'm done from my end. Waiting for your review.

@towca
Collaborator

towca commented Apr 14, 2023

Everything looks good, thanks for the contribution!

I have one more small nit if you agree and want to fix, otherwise feel free to unhold the PR.

/lgtm
/approve
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 14, 2023
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 17, 2023
- this is so that scale down is not blocked on local storage volume
- for pods where it is okay to ignore local storage volume
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: tests failing
- there was a problem in the logic
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add unit test for `IgnoreLocalStorageVolumeKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `IgnoreLocalStorageVolumeKey` in tests instead of hardcoding the annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: wording for test name
- `pod with EmptyDir but IgnoreLocalStorageVolumeKey annotation` -> `pod with EmptyDir and IgnoreLocalStorageVolumeKey annotation`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: simulator drain tests failing
- set local storage vol name (required)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: add support for multiple vals in `safe-to-evict-local-volume` annotation
- add more unit tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename ignore local vol key `safe-to-evict-local-volume` -> `safe-to-evict-local-volumes`
- abstract code to process annotation into a separate fn
- shorten name for test cases
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: update FAQ with info about `safe-to-evict-local-volumes` annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add the FAQ for `safe-to-evict-local-volumes` annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: fix formatting for `safe-to-evict-local-volumes` in FAQ
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: format the `safe-to-evict-local-volumes` as a bullet
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: fix `Unless` -> `unless` to make it consistent with other lines
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add an extra test for mismatching local vol value in annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: make the wording clearer
- for `safe-to-evict-local-volumes` annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
@vadasambar vadasambar force-pushed the feat/3947/ignore-some-local-storage-volumes branch from c37f4fd to b663f13 Compare April 17, 2023 04:23
@vadasambar
Member Author

Everything looks good, thanks for the contribution!

I have one more small nit if you agree and want to fix, otherwise feel free to unhold the PR.

/lgtm /approve /hold

Thank you for the review. I have updated the PR to address the last comment.

@vadasambar
Member Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 17, 2023
@vadasambar
Member Author

/lgtm

@k8s-ci-robot
Contributor

@vadasambar: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vadasambar
Member Author

@towca need a /lgtm to merge the PR. It was removed automatically after I updated the PR to address the last comment.

@towca
Collaborator

towca commented Apr 17, 2023

Thanks again!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 17, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: towca, vadasambar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 1009797 into kubernetes:master Apr 17, 2023
@jonathan-mothership

Nice feature @vadasambar!

@MaciekPytel any idea on timeline for this to get released? It doesn't seem to be in latest which was cut about a month ago.

@vadasambar
Member Author

vadasambar commented Apr 28, 2023

Nice feature @vadasambar!

@MaciekPytel any idea on timeline for this to get released? It doesn't seem to be in latest which was cut about a month ago.

You can find it in 1.27, which will be released soon (check the cluster-autoscaler-release-1.27 branch). For other versions, we might have to backport.

P.S.: https://github.com/kubernetes/autoscaler/blob/cluster-autoscaler-release-1.27/cluster-autoscaler/utils/drain/drain.go#L43

@thesuperzapper

@vadasambar @towca what do we think about allowing cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes to be set as "*", so that all local volumes for a Pod will be ignored, rather than having to specify them all?

@vadasambar
Member Author

@vadasambar @towca what do we think about allowing cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes to be set as "*", so that all local volumes for a Pod will be ignored, rather than having to specify them all?

I think there is room for improvement in the annotation. "cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes": "*" will have the same effect as "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" (ref). There are 2 cases here:

  1. You have set "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" (pod is not safe to evict)
  2. You have set "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" (pod is safe to evict; if this is set, the "cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes" annotation will be ignored)

In 1, if you add "cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes": "*", it will have the same effect as setting 2
In 2, if you add "cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes": "*", it will have the same effect as setting 2

Unless there is a specific use case, I don't see a lot of value in adding "cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes": "*" because we already support "cluster-autoscaler.kubernetes.io/safe-to-evict": "true".

I guess what you want is "cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes": "<container-name>:*" e.g., "cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes": "istio:*" which makes sense because it provides better user experience than specifying the entire list.
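
To make the interaction between the two annotations concrete, here is a minimal sketch of the local storage check as described in this thread. It is not the code from `drain.go`: the `pod` type and `localStorageBlocksScaleDown` function are hypothetical, and trimming whitespace around volume names is a choice made for this sketch only. It covers only the local storage decision, not the other reasons a pod can block scale down (for example `safe-to-evict: "false"`).

```go
package main

import (
	"fmt"
	"strings"
)

const (
	safeToEvictKey             = "cluster-autoscaler.kubernetes.io/safe-to-evict"
	safeToEvictLocalVolumesKey = "cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes"
)

// pod is a simplified, hypothetical stand-in holding only what the sketch needs.
type pod struct {
	annotations  map[string]string
	localVolumes []string // names of the pod's local storage volumes
}

// localStorageBlocksScaleDown reports whether, in this sketch, the pod's
// local storage would block node scale down.
func localStorageBlocksScaleDown(p pod) bool {
	// safe-to-evict: "true" wins; the per-volume annotation is ignored.
	if p.annotations[safeToEvictKey] == "true" {
		return false
	}
	// Collect the volume names listed as safe to evict.
	allowed := map[string]bool{}
	for _, name := range strings.Split(p.annotations[safeToEvictLocalVolumesKey], ",") {
		allowed[strings.TrimSpace(name)] = true
	}
	// Any local volume that is not listed still blocks scale down.
	for _, v := range p.localVolumes {
		if !allowed[v] {
			return true
		}
	}
	return false
}

func main() {
	vols := []string{"istio-envoy", "istio-data"}
	partial := pod{map[string]string{safeToEvictLocalVolumesKey: "istio-envoy"}, vols}
	full := pod{map[string]string{safeToEvictLocalVolumesKey: "istio-envoy,istio-data"}, vols}
	override := pod{map[string]string{safeToEvictKey: "true"}, vols}

	fmt.Println("partial list blocks:", localStorageBlocksScaleDown(partial))
	fmt.Println("full list blocks:", localStorageBlocksScaleDown(full))
	fmt.Println("safe-to-evict=true blocks:", localStorageBlocksScaleDown(override))
}
```

Under this model a `"*"` value would behave like listing every local volume, which is why it would overlap with `safe-to-evict: "true"` as far as local storage is concerned.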

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Ability to ignore some containers (sidecars) when evaluating if a pod is evictable or not
5 participants