Leader election lease not removed on shutdown #40

Closed
thomascube opened this issue Jun 22, 2023 · 8 comments
@thomascube

I'm running kubedock on OpenShift to enable Testcontainers within Tekton pipelines.

When using the --lock option, the first time kubedock server starts it creates the kubedock-lock lease. But when kubedock is terminated (with kill -3), the lease remains, and on the next run the server hangs with the message "leaderelection.go:245] attempting to acquire leader lease esta-tekton-predev/kubedock-lock..." while the Testcontainers tests fail because the Docker API is not available.

Maybe it would make sense to remove the lease when shutting down kubedock. Or is there something I'm missing when using the --lock option?

@thomascube
Author

Update: I see this works when starting the kubedock server from my local machine with full permissions on the target namespace; the lease is successfully released with holderIdentity: ''. However, this does not happen when running kubedock from within a container (Tekton pipeline) with the privileges suggested in the README (I added the "update" verb for leases).

@joyrex2001
Owner

Fixed the documentation with regard to the RBAC required for locking. After adding the 'update' verb, I could not reproduce this with kubectl run and kubectl delete pod. I also removed the other verbs, since the lease implementation doesn't require them.
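
For reference, a minimal Role granting the lease permissions could look like the sketch below. The Role name is a placeholder and the exact verb list should follow the updated README; get/create/update is what client-go's lease-based leader election typically needs.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubedock-lease   # hypothetical name, pick your own
  namespace: tekton-dev
rules:
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "create", "update"]   # 'update' was the missing verb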

@thomascube
Author

thomascube commented Jun 26, 2023

Thanks for updating the docs. I can confirm that the lease is returned correctly (having holderIdentity: '') when running kubedock as a pod and then deleting it. However, we're running kubedock inside a Tekton task, which is essentially also a pod, but for some reason the lease remains unchanged once the process ends. This just leads to a delay when the next run starts and has to wait until it can acquire the lease. I'll keep on digging...

@joyrex2001
Owner

Can you share the logs during start & exit of kubedock?

@thomascube
Author

There's a difference in the shutdown behavior, depending on how we run kubedock and how we terminate it.
When running it as an individual Pod (or as a Sidecar container inside a Tekton pipeline), the lease is correctly released.

When we start kubedock inside a running container and then stop it explicitly with kill $(pidof kubedock), the lease remains unchanged. I tried different termination signals with the kill command (none, 1, 2, 4), but it made no difference. Here are the logs for both cases just described:

Logs when stopped with kill $(pidof kubedock)

$ kubedock server --lock -n tekton-dev --reverse-proxy --unix-socket /tmp/kubedock.sock -v 4 &
I0724 10:28:26.456072     703 main.go:28] kubedock 0.11.0 (20230524-110633)
I0724 10:28:26.546374     703 main.go:105] kubernetes config: namespace=tekton-dev, initimage=joyrex2001/kubedock:0.11.0, ready timeout=2m0s
I0724 10:28:26.547537     703 leaderelection.go:245] attempting to acquire leader lease tekton-dev/kubedock-lock...
I0724 10:28:26.562013     703 leaderelection.go:255] successfully acquired lease tekton-dev/kubedock-lock
I0724 10:28:26.562089     703 main.go:85] new leader elected: 4dec02674662
I0724 10:28:26.562333     703 main.go:129] reaper started with max container age 1h0m0s
I0724 10:28:26.562426     703 main.go:75] enabled reverse-proxy services via 0.0.0.0 on the kubedock host
I0724 10:28:26.562517     703 main.go:102] default image pull policy: ifnotpresent
I0724 10:28:26.562945     703 main.go:107] using namespace: tekton-dev
...
I0724 10:30:10.191123     703 main.go:175] exit signal recieved, removing pods, configmaps and services

Lease after kill

kind: Lease
apiVersion: coordination.k8s.io/v1
metadata:
  name: kubedock-lock
  namespace: tekton-dev
spec:
  holderIdentity: 4dec02674662
  leaseDurationSeconds: 60
  acquireTime: '2023-07-24T08:28:26.547551Z'
  renewTime: '2023-07-24T08:30:06.744343Z'
  leaseTransitions: 12
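
While debugging, a stale lease like this can also be cleared by hand with standard kubectl (adjust the namespace to yours), so the next kubedock run doesn't have to wait out the lease duration:

$ kubectl -n tekton-dev get lease kubedock-lock -o yaml   # inspect the current holder
$ kubectl -n tekton-dev delete lease kubedock-lock        # remove the stale lease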

Logs when running as Pod (Tekton Sidecar)

(or when starting kubedock from my workstation and stopping it with Ctrl-C)

$ kubedock server --lock -n tekton-dev --reverse-proxy --unix-socket /tmp/kubedock.sock -v 4
I0724 10:36:11.109712   82406 main.go:28] kubedock 0.11.0 (20230524-110633)
I0724 10:36:11.117669   82406 main.go:105] kubernetes config: namespace=tekton-dev, initimage=joyrex2001/kubedock:0.11.0, ready timeout=1m0s
I0724 10:36:11.119303   82406 leaderelection.go:245] attempting to acquire leader lease tekton-dev/kubedock-lock...
I0724 10:36:11.474071   82406 main.go:85] new leader elected: 4dec02674662
I0724 10:37:15.758226   82406 main.go:85] new leader elected: 377cd6884941
I0724 10:37:15.758261   82406 leaderelection.go:255] successfully acquired lease tekton-dev/kubedock-lock
I0724 10:37:15.761447   82406 main.go:129] reaper started with max container age 1h0m0s
I0724 10:37:15.762991   82406 main.go:75] enabled reverse-proxy services via 0.0.0.0 on the kubedock host
I0724 10:37:15.763410   82406 main.go:102] default image pull policy: ifnotpresent
I0724 10:37:15.763520   82406 main.go:105] service account used in deployments: default
I0724 10:37:15.763547   82406 main.go:107] using namespace: tekton-dev
...
I0724 10:38:32.251400   82406 main.go:175] exit signal recieved, removing pods, configmaps and services
I0724 10:38:32.432919   82406 main.go:82] lost lock on namespace tekton-dev

Lease after pod termination

kind: Lease
apiVersion: coordination.k8s.io/v1
metadata:
  name: kubedock-lock
  namespace: tekton-dev
spec:
  holderIdentity: ''
  leaseDurationSeconds: 1
  acquireTime: '2023-07-24T08:38:32.371780Z'
  renewTime: '2023-07-24T08:38:32.371780Z'
  leaseTransitions: 13

@joyrex2001
Owner

I worked a bit on the shutdown code, which might have fixed your issue as well. Can you retest?

@thomascube
Author

Great! I can confirm that with the latest version from Git master (tested with 24f9ef4), the lease is now released as expected when terminating the kubedock process as described earlier.
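
For anyone who wants to verify the same, the holder can be checked after termination with plain kubectl (the namespace here is ours); an empty result means the lease was released:

$ kubectl -n tekton-dev get lease kubedock-lock -o jsonpath='{.spec.holderIdentity}'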

@joyrex2001
Owner

Awesome, thanks for confirming. I will close this issue, and the fix will be included in the next release.
