fix/indexed-search: add livenessProbe to zoekt-webserver to prevent hung pods#850

Open
devdinu wants to merge 1 commit intomainfrom
03-27-fix_indexed-search_add_liveness_probe_to_zoekt-webserver_container

@devdinu devdinu commented Mar 27, 2026

zoekt-webserver was unhealthy for a long time:

NAME               READY   STATUS    RESTARTS   AGE    
indexed-search-0   1/2     Running   0          6h18m  
indexed-search-1   1/2     Running   0          6h19m  
indexed-search-2   1/2     Running   0          6h21m  
indexed-search-3   1/2     Running   0          6h23m  
indexed-search-4   1/2     Running   0          6h25m  
indexed-search-5   1/2     Running   0          6h27m  
indexed-search-6   1/2     Running   0          6h29m  
indexed-search-7   1/2     Running   0          6h31m  
  Warning  Unhealthy  4m20s (x3300 over 4h39m)  kubelet  Readiness probe failed: Get "http://192.168.11.20:6070/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

ref https://linear.app/sourcegraph/issue/PLAT-509/incident-indexed-search-pods-were-unhealthy-for-long-time PLAT-509

Checklist

Test plan

  • Manual Verification - local with kind cluster
Screenshot 2026-03-27 at 11 52 41 AM
  • Tested against cloud-dev-qa with mi2 generate kustomize and kustomize apply

…ung pods

Without a livenessProbe, zoekt-webserver pods that hang silently remain
in an unhealthy state indefinitely. The probe acts as a backup to the
in-process watchdog: failureThreshold=10 with period=60s gives a 600s
window, which is longer than the watchdog's 9x60s (540s) detection window.
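
For readers without the diff open, a minimal sketch of what such a probe could look like on the zoekt-webserver container. Only `failureThreshold` and `periodSeconds` come from the commit message; the `/healthz` path and port 6070 are borrowed from the readiness probe failure event above, and `timeoutSeconds` is an assumed value, so the actual change in this PR may differ.

```yaml
# Hypothetical fragment of the indexed-search pod spec, not copied from the PR diff.
containers:
  - name: zoekt-webserver
    livenessProbe:
      httpGet:
        path: /healthz        # assumption: same endpoint the readiness probe uses
        port: 6070            # port seen in the readiness probe failure event
        scheme: HTTP
      periodSeconds: 60       # from the commit message
      failureThreshold: 10    # from the commit message: 10 x 60s = 600s before restart
      timeoutSeconds: 5       # assumption: not stated anywhere in this PR
```

With these values the kubelet would restart the container only after about 600s of consecutive failures, so the in-process watchdog (540s) gets the first chance to act.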

ref incident INC-484 https://sourcegraph.slack.com/archives/C0APJUXBG4R
@devdinu devdinu force-pushed the 03-27-fix_indexed-search_add_liveness_probe_to_zoekt-webserver_container branch from 2838bed to 8b403c5 Compare March 27, 2026 18:56
@devdinu devdinu requested review from a team March 27, 2026 18:56
@devdinu devdinu marked this pull request as ready for review March 27, 2026 18:59

@michaellzc michaellzc left a comment

can you quickly validate this on a cloud instance?

@eseliger eseliger requested review from a team and keegancsmith March 27, 2026 19:01

devdinu commented Mar 27, 2026

> can you quickly validate this on a cloud instance?

Tested the diff and applied it against clouddev-qa; the only change is to indexed-search.

Screenshot 2026-03-27 at 1 51 19 PM

https://sourcegraph.slack.com/archives/C05DWT4ANHH/p1774644755678949


3 participants