
secrets-store dropping connection to registrar, liveness-probe after 30-45sec of deployment, only works correctly after container is restarted #620

Closed
jddexxx opened this issue Jul 8, 2021 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

@jddexxx

jddexxx commented Jul 8, 2021

What steps did you take and what happened:
When spinning up the project, the CSI driver cannot be found.

The SecretProviderClass is in the same namespace as the project (apps)
The driver and provider are running on all nodes, and the project runs on those same nodes
The driver and provider have been up and running in the same namespace
The driver and SecretProviderClass both exist, with no suspicious entries in the logs

project describe (clipped):

Mounts:
      /mnt/secrets-store from secrets (ro)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from main-api-token-zllnd (ro)

Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  secrets:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            secrets-store.csi.k8s.io
    FSType:            
    ReadOnly:          true
    VolumeAttributes:      secretProviderClass=aws-secrets
  main-api-token-zllnd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  main-api-token-zllnd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    92s                default-scheduler  Successfully assigned apps/main-api-6d57964949-zcbgm to ip-10-0-132-173.us-east-2.compute.internal
  Warning  FailedMount  29s (x8 over 92s)  kubelet            MountVolume.SetUp failed for volume "secrets" : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers

What did you expect to happen:
The driver is found and the volume mounts successfully.

Anything else you would like to add:

The drivers and providers running:

infrastructure/k8s/mainnet$ kubectl get pods -owide -napps
NAME                                          READY   STATUS             RESTARTS   AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
csi-secrets-store-provider-aws-gfwkg          1/1     Running            0          20m   10.0.97.35     ip-10-0-97-35.us-east-2.compute.internal     <none>           <none>
csi-secrets-store-provider-aws-xbrbf          1/1     Running            0          20m   10.0.132.173   ip-10-0-132-173.us-east-2.compute.internal   <none>           <none>
secret-store-secrets-store-csi-driver-rmcfx   3/3     Running            1          20m   10.0.147.94    ip-10-0-132-173.us-east-2.compute.internal   <none>           <none>
secret-store-secrets-store-csi-driver-xl6mp   3/3     Running            1          20m   10.0.108.206   ip-10-0-97-35.us-east-2.compute.internal     <none>           <none>

Logs for secrets-store:

infrastructure/k8s/mainnet$ kubectl logs secret-store-secrets-store-csi-driver-rmcfx secrets-store -napps
I0708 08:46:14.462180       1 exporter.go:33] metrics backend: prometheus
I0708 08:46:15.369386       1 secrets-store.go:74] Driver: secrets-store.csi.k8s.io 
I0708 08:46:15.369409       1 secrets-store.go:75] Version: v0.0.23, BuildTime: 2021-06-10-18:14
I0708 08:46:15.369415       1 secrets-store.go:76] Provider Volume Path: /etc/kubernetes/secrets-store-csi-providers
I0708 08:46:15.369420       1 secrets-store.go:77] GRPC supported providers will be dynamically created
I0708 08:46:15.369434       1 driver.go:80] "Enabling controller service capability" capability="CREATE_DELETE_VOLUME"
I0708 08:46:15.369608       1 driver.go:90] "Enabling volume access mode" mode="SINGLE_NODE_READER_ONLY"
I0708 08:46:15.369617       1 driver.go:90] "Enabling volume access mode" mode="MULTI_NODE_READER_ONLY"
I0708 08:46:15.369887       1 server.go:111] Listening for connections on address: //csi/csi.sock
I0708 08:46:15.369952       1 main.go:157] starting manager

Logs for csi-driver:

infrastructure/k8s/mainnet$ kubectl logs secret-store-secrets-store-csi-driver-rmcfx -napps node-driver-registrar
I0708 08:44:28.749280       1 main.go:113] Version: v2.2.0
I0708 08:44:28.749927       1 main.go:137] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0708 08:44:28.749953       1 connection.go:153] Connecting to unix:///csi/csi.sock
I0708 08:44:28.760042       1 main.go:144] Calling CSI driver to discover driver name
I0708 08:44:28.760109       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0708 08:44:28.760116       1 connection.go:183] GRPC request: {}
I0708 08:44:28.763043       1 connection.go:185] GRPC response: {"name":"secrets-store.csi.k8s.io","vendor_version":"v0.0.23"}
I0708 08:44:28.763111       1 connection.go:186] GRPC error: <nil>
I0708 08:44:28.763121       1 main.go:154] CSI driver name: "secrets-store.csi.k8s.io"
I0708 08:44:28.763145       1 node_register.go:52] Starting Registration Server at: /registration/secrets-store.csi.k8s.io-reg.sock
I0708 08:44:28.763331       1 node_register.go:61] Registration Server started at: /registration/secrets-store.csi.k8s.io-reg.sock
I0708 08:44:28.763471       1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0708 08:44:28.891910       1 main.go:80] Received GetInfo call: &InfoRequest{}
I0708 08:44:28.929750       1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
E0708 08:44:49.082364       1 connection.go:131] Lost connection to unix:///csi/csi.sock.

CSIDriver list:

infrastructure/k8s/mainnet$ kubectl get csidriver
NAME                       ATTACHREQUIRED   PODINFOONMOUNT   MODES        AGE
efs.csi.aws.com            false            false            Persistent   22h
secrets-store.csi.k8s.io   false            true             Ephemeral    22h

SecretProviderClass correctly exists:

infrastructure/k8s/mainnet$ kubectl describe secretproviderclass aws-secrets -napps
Name:         aws-secrets
Namespace:    apps
Labels:       app.kubernetes.io/managed-by=pulumi
Annotations:  <none>
API Version:  secrets-store.csi.x-k8s.io/v1alpha1
Kind:         SecretProviderClass

Provider log:

infrastructure/k8s/mainnet$ kubectl logs csi-secrets-store-provider-aws-gfwkg -napps
I0708 08:44:25.753995       1 main.go:31] Starting secrets-store-csi-driver-provider-aws version 1.0.r1-10-g1942553-2021.06.04.00.07
I0708 08:44:25.852168       1 main.go:72] Listening for connections on address: /etc/kubernetes/secrets-store-csi-providers/aws.sock

SecretProviderClass config (via Pulumi):

export const awsSecrets = new k8s.apiextensions.CustomResource(
  'aws-secrets',
  {
    apiVersion: 'secrets-store.csi.x-k8s.io/v1alpha1',
    kind: 'SecretProviderClass',
    metadata: {
      name: 'aws-secrets',
      namespace: 'apps',
    },
    spec: {  ....

Secrets-related YML for app:

volumeMounts:
  - name: secrets
    mountPath: '/mnt/secrets-store'
    readOnly: true

volumes:
  - name: secrets
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: 'aws-secrets'
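
The inline CSI volume above resolves `secretProviderClass: aws-secrets` to a SecretProviderClass in the pod's namespace. For context, a minimal AWS-provider class of that shape might look like the following (a hypothetical sketch only; the Secrets Manager object name is a placeholder, not the reporter's elided spec):

```yaml
# Hypothetical example - the reporter's actual spec is elided above.
apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
  name: aws-secrets
  namespace: apps
spec:
  provider: aws
  parameters:
    # The AWS provider takes its object list as a YAML string.
    objects: |
      - objectName: "example-secret-name"   # placeholder
        objectType: "secretsmanager"
```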

Which provider are you using:
AWS - although the provider part seems to be irrelevant until the CSI driver is picked up.

Environment:

  • Secrets Store CSI Driver version: v0.0.23
  • Kubernetes version: (use kubectl version):
infrastructure/k8s/mainnet$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.8-eks-96780e", GitCommit:"96780e1b30acbf0a52c38b6030d7853e575bcdf3", GitTreeState:"clean", BuildDate:"2021-03-10T21:32:29Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
@jddexxx jddexxx added the kind/bug Categorizes issue or PR as related to a bug. label Jul 8, 2021
@nilekhc
Contributor

nilekhc commented Jul 8, 2021

Hello @jddexxx, thanks for reporting this.
From the logs, it looks like the node registrar lost the connection: E0708 08:44:49.082364 1 connection.go:131] Lost connection to unix:///csi/csi.sock.

Also, could you try installing driver and provider in kube-system namespace?

@tam7t
Contributor

tam7t commented Jul 8, 2021

Lost connection to unix:///csi/csi.sock is definitely suspicious.

Another area to look at is OOM kills or abnormal process failures. I found Azure/secrets-store-csi-driver-provider-azure#328, a similar report on the Azure provider.

@jddexxx
Author

jddexxx commented Jul 8, 2021

@nilekhc running the driver and provider in kube-system has the same result (and the same logs).
@tam7t I'd found the same issue report, but I don't see any OOM issues. There are <100 secrets running off this node and the memory limit doesn't come close to being reached. I'm certainly not a Kubernetes professional, but I don't see any process failures either (aside from the one health-check failure at boot).

description of the pod (after moving it into kube-system)

infrastructure/k8s/mainnet$ kubectl describe pod -nkube-system secret-store-secrets-store-csi-driver-48xnl secrets-store
Name:         secret-store-secrets-store-csi-driver-48xnl
Namespace:    kube-system
Priority:     0
Node:         ip-10-0-97-35.us-east-2.compute.internal/10.0.97.35
Start Time:   Fri, 09 Jul 2021 02:24:55 +0800
Labels:       app=secrets-store-csi-driver
              app.kubernetes.io/instance=secret-store
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=secrets-store-csi-driver
              app.kubernetes.io/version=0.0.23
              controller-revision-hash=69dd75f68f
              helm.sh/chart=secrets-store-csi-driver-0.0.23
              pod-template-generation=1
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Running
IP:           10.0.100.82
IPs:
  IP:           10.0.100.82
Controlled By:  DaemonSet/secret-store-secrets-store-csi-driver
Containers:
  node-driver-registrar:
    Container ID:  docker://632a6ece97aaf875d05eab2dab25a47e6c9f9219712d37e84ca7e3c9ac639300
    Image:         k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.2.0
    Image ID:      docker-pullable://k8s.gcr.io/sig-storage/csi-node-driver-registrar@sha256:2dee3fe5fe861bb66c3a4ac51114f3447a4cd35870e0f2e2b558c7a400d89589
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=/csi/csi.sock
      --kubelet-registration-path=/var/lib/kubelet/plugins/csi-secrets-store/csi.sock
    State:          Running
      Started:      Fri, 09 Jul 2021 02:24:57 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment:
      KUBE_NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /csi from plugin-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from secrets-store-csi-driver-token-mtt6p (ro)
  secrets-store:
    Container ID:  docker://df7b22da9647bec37328b4f6c5705d4c6fc43d380f7dab97443ea7584894cda1
    Image:         k8s.gcr.io/csi-secrets-store/driver:v0.0.23
    Image ID:      docker-pullable://k8s.gcr.io/csi-secrets-store/driver@sha256:e25d4daca186e8c3b26db395d67af56c7a5d0e3aaf61d36de39ea2f17b21ed98
    Port:          9808/TCP
    Host Port:     0/TCP
    Args:
      --endpoint=$(CSI_ENDPOINT)
      --nodeid=$(KUBE_NODE_NAME)
      --provider-volume=/etc/kubernetes/secrets-store-csi-providers
      --metrics-addr=:8095
      --provider-health-check-interval=2m
      --max-call-recv-msg-size=4194304
    State:          Running
      Started:      Fri, 09 Jul 2021 02:26:41 +0800
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 09 Jul 2021 02:24:57 +0800
      Finished:     Fri, 09 Jul 2021 02:26:41 +0800
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     200m
      memory:  200Mi
    Requests:
      cpu:     50m
      memory:  100Mi
    Liveness:  http-get http://:healthz/healthz delay=30s timeout=10s period=15s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:    unix:///csi/csi.sock
      KUBE_NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /csi from plugin-dir (rw)
      /etc/kubernetes/secrets-store-csi-providers from providers-dir (rw)
      /var/lib/kubelet/pods from mountpoint-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from secrets-store-csi-driver-token-mtt6p (ro)
  liveness-probe:
    Container ID:  docker://2b40f65ef551dac4b85f3fb00530e4cd3fefba3c150a739c9c08c48728fe036e
    Image:         k8s.gcr.io/sig-storage/livenessprobe:v2.3.0
    Image ID:      docker-pullable://k8s.gcr.io/sig-storage/livenessprobe@sha256:1b7c978a792a8fa4e96244e8059bd71bb49b07e2e5a897fb0c867bdc6db20d5d
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=/csi/csi.sock
      --probe-timeout=3s
      --http-endpoint=0.0.0.0:9808
      -v=2
    State:          Running
      Started:      Fri, 09 Jul 2021 02:24:57 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /csi from plugin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from secrets-store-csi-driver-token-mtt6p (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  mountpoint-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods
    HostPathType:  DirectoryOrCreate
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  Directory
  plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/csi-secrets-store/
    HostPathType:  DirectoryOrCreate
  providers-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/secrets-store-csi-providers
    HostPathType:  DirectoryOrCreate
  secrets-store-csi-driver-token-mtt6p:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  secrets-store-csi-driver-token-mtt6p
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  10m                    default-scheduler  Successfully assigned kube-system/secret-store-secrets-store-csi-driver-48xnl to ip-10-0-97-35.us-east-2.compute.internal
  Normal   Pulled     10m                    kubelet            Container image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.2.0" already present on machine
  Normal   Created    10m                    kubelet            Created container node-driver-registrar
  Normal   Started    10m                    kubelet            Started container node-driver-registrar
  Normal   Pulled     10m                    kubelet            Container image "k8s.gcr.io/sig-storage/livenessprobe:v2.3.0" already present on machine
  Normal   Created    10m                    kubelet            Created container liveness-probe
  Normal   Started    10m                    kubelet            Started container liveness-probe
  Normal   Pulled     8m48s (x2 over 10m)    kubelet            Container image "k8s.gcr.io/csi-secrets-store/driver:v0.0.23" already present on machine
  Normal   Created    8m48s (x2 over 10m)    kubelet            Created container secrets-store
  Normal   Started    8m48s (x2 over 10m)    kubelet            Started container secrets-store
  Warning  Unhealthy  8m48s (x5 over 9m48s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    8m48s                  kubelet            Container secrets-store failed liveness probe, will be restarted
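
The kill timing is consistent with the probe settings shown in the describe output (delay=30s, period=15s, #failure=5). A quick sanity check of the arithmetic (a sketch; it ignores probe timeouts and jitter):

```typescript
// Sanity check: when does a failing liveness probe first kill the container,
// given the settings from the describe output above?
function firstKillSeconds(
  initialDelaySec: number,
  periodSec: number,
  failureThreshold: number,
): number {
  // First probe fires after initialDelaySec; one probe per period thereafter.
  // The kubelet kills the container on the failureThreshold-th consecutive failure.
  return initialDelaySec + (failureThreshold - 1) * periodSec;
}

// delay=30s period=15s #failure=5
console.log(firstKillSeconds(30, 15, 5)); // 90
```

That ~90 seconds to first kill, plus per-probe timeouts, roughly lines up with the container running from 02:24:57 to 02:26:41 (~104s) before its restart.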

top output:

infrastructure/k8s/mainnet$ kubectl top pods -nkube-system
W0709 02:51:42.131478   69009 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME                                          CPU(cores)   MEMORY(bytes)   
aws-node-7jz2t                                4m           41Mi            
aws-node-jlj8k                                5m           44Mi            
coredns-56b458df85-cp7vl                      3m           9Mi             
coredns-56b458df85-j9m7q                      3m           9Mi             
csi-secrets-store-provider-aws-f6958          1m           6Mi             
csi-secrets-store-provider-aws-wl5b8          1m           6Mi             
kube-proxy-6jnmb                              1m           13Mi            
kube-proxy-qw7rd                              3m           13Mi            
metrics-server-9f459d97b-gf7rd                4m           16Mi            
secret-store-secrets-store-csi-driver-48xnl   1m           22Mi            
secret-store-secrets-store-csi-driver-shmb7   1m           22Mi 

both nodes are reporting no pressure of any kind:

  Type            Status  Message
  ----            ------  -------
  MemoryPressure  False   kubelet has sufficient memory available
  DiskPressure    False   kubelet has no disk pressure
  PIDPressure     False   kubelet has sufficient PID available
  Ready           True    kubelet is posting ready status

Does the driver perhaps get unregistered when the liveness-probe restarts the secrets-store container? If so, that would explain why the driver doesn't show up, but not why the disconnect occurs. Here are the logs from the interim, before the liveness-probe causes the secrets-store to restart:

liveness-probe
I0708 20:05:46.704921       1 main.go:149] calling CSI driver to discover driver name
I0708 20:05:46.706755       1 main.go:155] CSI driver name: "secrets-store.csi.k8s.io"
I0708 20:05:46.706771       1 main.go:183] ServeMux listening at "0.0.0.0:9808"
E0708 20:06:23.124681       1 main.go:64] failed to establish connection to CSI driver: context deadline exceeded
W0708 20:06:30.124690       1 connection.go:172] Still connecting to unix:///csi/csi.sock
E0708 20:06:38.124552       1 main.go:64] failed to establish connection to CSI driver: context deadline exceeded
W0708 20:06:40.125086       1 connection.go:172] Still connecting to unix:///csi/csi.sock
W0708 20:06:45.124832       1 connection.go:172] Still connecting to unix:///csi/csi.sock
W0708 20:06:50.124533       1 connection.go:172] Still connecting to unix:///csi/csi.sock
E0708 20:06:53.125344       1 main.go:64] failed to establish connection to CSI driver: context deadline exceeded
W0708 20:06:55.124562       1 connection.go:172] Still connecting to unix:///csi/csi.sock



registrar:
I0708 20:05:46.458101       1 main.go:113] Version: v2.2.0
I0708 20:05:46.458632       1 main.go:137] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0708 20:05:46.458663       1 connection.go:153] Connecting to unix:///csi/csi.sock
I0708 20:05:46.459836       1 main.go:144] Calling CSI driver to discover driver name
I0708 20:05:46.459868       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0708 20:05:46.459875       1 connection.go:183] GRPC request: {}
I0708 20:05:46.462768       1 connection.go:185] GRPC response: {"name":"secrets-store.csi.k8s.io","vendor_version":"v0.0.23"}
I0708 20:05:46.462835       1 connection.go:186] GRPC error: <nil>
I0708 20:05:46.462844       1 main.go:154] CSI driver name: "secrets-store.csi.k8s.io"
I0708 20:05:46.463254       1 node_register.go:52] Starting Registration Server at: /registration/secrets-store.csi.k8s.io-reg.sock
I0708 20:05:46.464697       1 node_register.go:61] Registration Server started at: /registration/secrets-store.csi.k8s.io-reg.sock
I0708 20:05:46.464756       1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0708 20:05:46.495006       1 main.go:80] Received GetInfo call: &InfoRequest{}
I0708 20:05:46.527450       1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
E0708 20:06:04.785235       1 connection.go:131] Lost connection to unix:///csi/csi.sock.



secrets-store:
I0708 20:05:46.669549       1 exporter.go:33] metrics backend: prometheus
I0708 20:05:47.626534       1 secrets-store.go:74] Driver: secrets-store.csi.k8s.io 
I0708 20:05:47.626556       1 secrets-store.go:75] Version: v0.0.23, BuildTime: 2021-06-10-18:14
I0708 20:05:47.626562       1 secrets-store.go:76] Provider Volume Path: /etc/kubernetes/secrets-store-csi-providers
I0708 20:05:47.626567       1 secrets-store.go:77] GRPC supported providers will be dynamically created
I0708 20:05:47.626582       1 driver.go:80] "Enabling controller service capability" capability="CREATE_DELETE_VOLUME"
I0708 20:05:47.626602       1 driver.go:90] "Enabling volume access mode" mode="SINGLE_NODE_READER_ONLY"
I0708 20:05:47.626609       1 driver.go:90] "Enabling volume access mode" mode="MULTI_NODE_READER_ONLY"
I0708 20:05:47.626850       1 server.go:111] Listening for connections on address: //csi/csi.sock
I0708 20:05:47.626900       1 main.go:157] starting manager

top during that time shows the CSI driver using a little more CPU:

secret-store-secrets-store-csi-driver-47xvd   3m           10Mi            
secret-store-secrets-store-csi-driver-7fzng   3m           19Mi   

but there is still no resource pressure on either node

@aramase
Member

aramase commented Jul 12, 2021

@jddexxx Thank you for the details. This looks very similar to kubernetes-csi/node-driver-registrar#139. The node-driver-registrar is the component that registers the driver with the kubelet. It seems there is no retry in place when the connection is lost, and the problem is only resolved when the pod is restarted.
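
The missing retry could, in principle, look like a generic reconnect loop with exponential backoff. A sketch (this is illustrative only, not the actual node-driver-registrar code, which is Go):

```typescript
// Generic sketch of a reconnect loop with exponential backoff: keep retrying
// the dial instead of giving up and waiting for a pod restart.
async function retryConnect(
  dial: () => Promise<void>,
  maxAttempts: number,
  baseDelayMs: number,
): Promise<number> {
  let delay = baseDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      await dial();
      return attempt; // connected
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay *= 2; // exponential backoff between attempts
    }
  }
}

// Simulated socket that refuses the first two connection attempts.
let calls = 0;
const dial = async () => {
  calls++;
  if (calls < 3) throw new Error('connection refused');
};

retryConnect(dial, 5, 1).then((attempts) =>
  console.log(`connected after ${attempts} attempts`),
);
```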

@jddexxx
Author

jddexxx commented Jul 12, 2021

I can confirm that after simply restarting the pod by deleting it and letting the DaemonSet recreate it (not redeploying), it works. I'll keep a note in the stack deployment manifest that this will likely need to be manually restarted after each deployment. I didn't try this originally, as I assumed restarting would be the same as redeploying.

Thank you for your help. I won't close this issue for now, as this is not a proper solution: it seems very odd that it only works after a restart. Perhaps some connection-retry facility would help, as you mentioned.

@jddexxx jddexxx changed the title kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers secrets-store stalling and dropping connection to registrar, liveness-probe after 30-45sec of deployment, only works correctly after container is restarted Jul 12, 2021
@jddexxx jddexxx changed the title secrets-store stalling and dropping connection to registrar, liveness-probe after 30-45sec of deployment, only works correctly after container is restarted secrets-store dropping connection to registrar, liveness-probe after 30-45sec of deployment, only works correctly after container is restarted Jul 12, 2021
@aramase aramase added this to Backlog in Roadmap Aug 3, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 10, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 9, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
