Fix k8s operator exit handler pod loop and TTL cleanup, add tolerations#26971
Conversation
…ns support (#26772)

Fix two bugs in the OMJob operator:
- Exit handler pods were recreated indefinitely because findExitHandlerPod() lacked the name-based fallback that findMainPod() already had, causing label propagation delays to trigger repeated pod creation events
- Terminal phase handler never rescheduled for TTL-based cleanup, so pods were never cleaned up after ttlSecondsAfterFinished expired

Add tolerations support for ingestion pod scheduling across the full stack:
- Operator: OMJobPodSpec field, PodManager.buildPod(), CRD schema
- Server: OMJob model, K8sPipelineClientConfig parsing, K8sPipelineClient builder, K8sJobUtils serialization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR fixes two operational issues in the OMJob Kubernetes operator (exit handler pod recreation loop and missing TTL-based cleanup rescheduling) and adds end-to-end tolerations support for ingestion pod scheduling.
Changes:
- Add a name-based fallback when discovering the exit handler pod to prevent repeated creation when label propagation is delayed.
- Reschedule reconciliation in terminal phases based on remaining TTL so pod cleanup happens when `ttlSecondsAfterFinished` expires.
- Introduce tolerations configuration and propagate it through the server-side OMJob builder, CRD schema, operator model, and pod creation.
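The name-based fallback for exit handler pod discovery can be sketched in plain Java. This is a minimal model, not the operator's actual code: a `Map` stands in for the Kubernetes pod list, and the `-exit-handler` name suffix is an assumed naming convention for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ExitHandlerLookup {
  // Hypothetical pod store: pod name -> omjob-name label value. A null label
  // value simulates a pod whose labels have not propagated yet.
  static Optional<String> findByLabel(Map<String, String> pods, String omJobName) {
    return pods.entrySet().stream()
        .filter(e -> omJobName.equals(e.getValue()))
        .map(Map.Entry::getKey)
        .findFirst();
  }

  // Name-based fallback mirroring findMainPod(): if the label selector finds
  // nothing (label propagation delayed), look the pod up by its deterministic
  // name before concluding it must be created again.
  static Optional<String> findExitHandlerPod(Map<String, String> pods, String omJobName) {
    return findByLabel(pods, omJobName)
        .or(() -> {
          String expected = omJobName + "-exit-handler"; // assumed naming convention
          return pods.containsKey(expected) ? Optional.of(expected) : Optional.empty();
        });
  }

  public static void main(String[] args) {
    // Pod exists, but its omjob-name label has not propagated yet.
    Map<String, String> pods = new HashMap<>();
    pods.put("job1-exit-handler", null);
    // Fallback finds it by name, so no duplicate creation event is published.
    System.out.println(findExitHandlerPod(pods, "job1").isPresent());
  }
}
```

Without the `.or(...)` fallback, the label lookup alone would return empty here and the reconciler would enter the "create" branch again, which is exactly the recreation loop this PR fixes.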
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| openmetadata-service/src/test/java/org/openmetadata/service/clients/pipeline/k8s/K8sPipelineClientConfigTest.java | Adds unit tests covering tolerations parsing and default/invalid cases. |
| openmetadata-service/src/main/java/org/openmetadata/service/clients/pipeline/k8s/OMJob.java | Extends OMJob pod spec model to include tolerations. |
| openmetadata-service/src/main/java/org/openmetadata/service/clients/pipeline/k8s/K8sPipelineClientConfig.java | Adds tolerations config key and parsing into V1Toleration objects. |
| openmetadata-service/src/main/java/org/openmetadata/service/clients/pipeline/k8s/K8sPipelineClient.java | Propagates tolerations into constructed Job/CronJob pod specs and OMJob specs. |
| openmetadata-service/src/main/java/org/openmetadata/service/clients/pipeline/k8s/K8sJobUtils.java | Serializes tolerations into the OMJob pod spec map when present. |
| openmetadata-k8s-operator/src/test/java/org/openmetadata/operator/unit/CRDSchemaValidationTest.java | Ensures tolerations exists in the CRD schema and Java model field set. |
| openmetadata-k8s-operator/src/test/java/org/openmetadata/operator/service/PodManagerTest.java | Adds tests for tolerations application and exit handler pod name-fallback lookup behavior. |
| openmetadata-k8s-operator/src/test/java/org/openmetadata/operator/controller/OMJobReconcilerTest.java | Adds reconciler tests for exit handler create-once behavior and TTL rescheduling/cleanup. |
| openmetadata-k8s-operator/src/main/resources/crds/omjob-crd.yaml | Adds tolerations schema to both main and exit handler pod specs. |
| openmetadata-k8s-operator/src/main/java/org/openmetadata/operator/service/PodManager.java | Adds name-based fallback to find exit handler pods and applies tolerations to pod specs. |
| openmetadata-k8s-operator/src/main/java/org/openmetadata/operator/model/OMJobSpec.java | Extends operator-side OMJob pod spec model with tolerations field. |
| openmetadata-k8s-operator/src/main/java/org/openmetadata/operator/controller/OMJobReconciler.java | Reschedules reconciliation in terminal phases until TTL expiry; cleans up when expired. |
🟡 Playwright Results — all passed (21 flaky)

✅ 3597 passed · ❌ 0 failed · 🟡 21 flaky · ⏭️ 207 skipped

🟡 21 flaky test(s) (passed on retry)

How to debug locally:

```shell
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip  # view trace
```
Adds the tolerations config binding so the server picks up the K8S_TOLERATIONS env var set by the Helm chart secret.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove redundant server-created pod selector fallback in findMainPod() since buildPodSelector() now matches all pods by omjob-name alone
- Add null guard for getItems() in deletePods() to prevent NPE
- Update local test values for namespace and image config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review ✅ Approved · 1 resolved / 1 finding

Fixes the Kubernetes operator exit handler pod loop and TTL cleanup, adding tolerations support. Resolves the TTL boundary race condition where pods were never cleaned at the exact expiry second.

✅ 1 resolved · Edge Case: TTL boundary race: pods never cleaned at exact expiry second
```java
if (map.get("tolerationSeconds") != null) {
  toleration.setTolerationSeconds(
      Long.parseLong(map.get("tolerationSeconds").toString()));
}
result.add(toleration);
```
parseTolerations wraps the whole parsing loop in a single try/catch, so a single malformed field (e.g., non-numeric tolerationSeconds) will throw and cause the method to return an empty list, discarding any tolerations already parsed. Consider handling parse errors per-item/per-field (e.g., catch NumberFormatException around tolerationSeconds) so valid tolerations are still applied while skipping only invalid entries.
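The per-entry error handling the comment suggests can be sketched as follows. This is a simplified model, not the actual `K8sPipelineClientConfig` code: the nested `Toleration` record is a stand-in for `V1Toleration`, carrying only the fields parsed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TolerationParsing {
  // Simplified stand-in for V1Toleration: only the fields this sketch parses.
  record Toleration(String key, Long tolerationSeconds) {}

  // Parse each toleration map independently, so one malformed entry (e.g. a
  // non-numeric tolerationSeconds) is skipped instead of discarding every
  // toleration parsed so far.
  static List<Toleration> parseTolerations(List<Map<String, Object>> raw) {
    List<Toleration> result = new ArrayList<>();
    for (Map<String, Object> map : raw) {
      try {
        Long seconds = null;
        Object s = map.get("tolerationSeconds");
        if (s != null) {
          seconds = Long.parseLong(s.toString()); // may throw NumberFormatException
        }
        result.add(new Toleration((String) map.get("key"), seconds));
      } catch (NumberFormatException e) {
        // Skip only this entry; previously parsed tolerations are still applied.
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Toleration> parsed = parseTolerations(List.of(
        Map.of("key", "gpu", "tolerationSeconds", "300"),
        Map.of("key", "bad", "tolerationSeconds", "oops"),  // invalid, skipped
        Map.of("key", "spot")));                            // no seconds, valid
    System.out.println(parsed.size());
  }
}
```

With the catch inside the loop, the invalid middle entry is dropped while the two valid tolerations survive; with a single try/catch around the whole loop, all three would be lost.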
```java
LOG.info("TTL expired for OMJob: {}, cleaning up pods", omJob.getMetadata().getName());
podManager.deletePods(omJob);

eventPublisher.publishNormalEvent(
    omJob, "ResourcesCleanedUp", "Cleaned up pods due to TTL expiration");
```
deletePods is called here, but PodManager.deletePods swallows exceptions internally. That means reconciliation will proceed to publish ResourcesCleanedUp even if deletion failed. Consider having deletePods return a boolean (or throw) and only publishing the cleanup event on success.
```java
return UpdateControl.<OMJobResource>noUpdate();
}
```
After attempting TTL cleanup, the reconciler returns noUpdate() with no reschedule. If pod deletion fails due to a transient API error, the operator won't retry cleanup until the next external reconcile trigger. Consider rescheduling after a short delay when cleanup fails (and only stopping reconciliation once deletion succeeds).
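One way to act on both review comments is to have `deletePods` report success and branch on the result. The sketch below models only the decision logic: the `Control` record is a hypothetical stand-in for the operator SDK's `UpdateControl`, and the 30-second retry delay is an assumption, not a value from the PR.

```java
import java.time.Duration;

public class TerminalPhaseRetry {
  // Hypothetical stand-in for UpdateControl: either stop reconciling
  // (reschedule == false) or retry after a delay.
  record Control(boolean reschedule, Duration delay) {
    static Control noUpdate() { return new Control(false, null); }
    static Control rescheduleAfter(Duration d) { return new Control(true, d); }
  }

  static final Duration CLEANUP_RETRY = Duration.ofSeconds(30); // assumed value

  // If deletePods reported failure (e.g. a transient API error), retry after a
  // short delay; publish ResourcesCleanedUp and stop only once deletion
  // actually succeeded.
  static Control afterTtlCleanup(boolean podsDeleted) {
    if (!podsDeleted) {
      return Control.rescheduleAfter(CLEANUP_RETRY);
    }
    // eventPublisher.publishNormalEvent(omJob, "ResourcesCleanedUp", ...);
    return Control.noUpdate();
  }

  public static void main(String[] args) {
    System.out.println(afterTtlCleanup(false).reschedule()); // failure -> retry
    System.out.println(afterTtlCleanup(true).reschedule());  // success -> done
  }
}
```

This couples the cleanup event to an observed deletion result, so a swallowed exception inside `deletePods` can no longer produce a misleading `ResourcesCleanedUp` event.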
```java
List<Pod> pods = client.pods().inNamespace(namespace).withLabels(selector).list().getItems();
if (pods == null) {
  pods = List.of();
}
for (Pod pod : pods) {
  String podName = pod.getMetadata().getName();
  client.pods().inNamespace(namespace).withName(podName).delete();
  LOG.info("Deleted pod: {}", podName);
}
```
This changes pod cleanup from a single label-selector delete to listing pods and deleting them one-by-one, which increases API calls and can become expensive when many pods exist (e.g., after prior reconciliation loops). Unless per-pod deletion is required for correctness, consider using withLabels(selector).delete() (or batch deletion) and rely on logging at a higher level.
Suggested change:

```diff
-List<Pod> pods = client.pods().inNamespace(namespace).withLabels(selector).list().getItems();
-if (pods == null) {
-  pods = List.of();
-}
-for (Pod pod : pods) {
-  String podName = pod.getMetadata().getName();
-  client.pods().inNamespace(namespace).withName(podName).delete();
-  LOG.info("Deleted pod: {}", podName);
-}
+client.pods().inNamespace(namespace).withLabels(selector).delete();
+LOG.info("Deleted pods for OMJob: {}", omJob.getMetadata().getName());
```
Changes have been cherry-picked to the 1.12.6 branch.
…ns (#26971)

* Fix k8s operator exit handler pod loop and TTL cleanup, add tolerations support (#26772)

  Fix two bugs in the OMJob operator:
  - Exit handler pods were recreated indefinitely because findExitHandlerPod() lacked the name-based fallback that findMainPod() already had, causing label propagation delays to trigger repeated pod creation events
  - Terminal phase handler never rescheduled for TTL-based cleanup, so pods were never cleaned up after ttlSecondsAfterFinished expired

  Add tolerations support for ingestion pod scheduling across the full stack:
  - Operator: OMJobPodSpec field, PodManager.buildPod(), CRD schema
  - Server: OMJob model, K8sPipelineClientConfig parsing, K8sPipelineClient builder, K8sJobUtils serialization

* Add K8S_TOLERATIONS env var mapping in openmetadata.yaml

  Adds the tolerations config binding so the server picks up the K8S_TOLERATIONS env var set by the Helm chart secret.

* Add tolerations to k8s test values for local validation

* fix cleanup

* Address PR review: remove redundant pod lookup and guard null items

  - Remove redundant server-created pod selector fallback in findMainPod() since buildPodSelector() now matches all pods by omjob-name alone
  - Add null guard for getItems() in deletePods() to prevent NPE
  - Update local test values for namespace and image config

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

(cherry picked from commit cfd71e8)



requires open-metadata/openmetadata-helm-charts#492
Summary
- `findExitHandlerPod()` was missing the name-based fallback that `findMainPod()` already had. When label propagation was delayed, the reconciler kept entering the "create" branch every 10 seconds, publishing `ExitHandlerCreated` events indefinitely.
- `handleTerminalPhase()` returned `noUpdate()` without rescheduling, so the operator never re-checked when `ttlSecondsAfterFinished` expired. Now it reschedules after the remaining TTL duration.

Closes #26772
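The remaining-TTL computation this rescheduling relies on can be sketched with plain `java.time`; the method name and the clamp-to-zero behavior are illustrative, not taken from the operator source.

```java
import java.time.Duration;
import java.time.Instant;

public class TtlReschedule {
  // How long to wait before re-reconciling a finished OMJob:
  //   remaining = (completionTime + ttlSecondsAfterFinished) - now
  // A non-positive result means the TTL already expired, so cleanup should
  // run immediately instead of rescheduling.
  static Duration remainingTtl(Instant completionTime, long ttlSeconds, Instant now) {
    Duration remaining = Duration.between(now, completionTime.plusSeconds(ttlSeconds));
    return remaining.isNegative() ? Duration.ZERO : remaining;
  }

  public static void main(String[] args) {
    Instant done = Instant.parse("2024-01-01T00:00:00Z");
    // 100s after completion with a 300s TTL: reschedule after the remaining 200s.
    System.out.println(remainingTtl(done, 300, done.plusSeconds(100)).toSeconds());
    // 400s after completion: TTL already expired, clean up now.
    System.out.println(remainingTtl(done, 300, done.plusSeconds(400)).toSeconds());
  }
}
```

Rescheduling after exactly the remaining duration means the reconciler wakes up at (or just past) expiry, which also closes the boundary case where pods were never cleaned at the exact expiry second.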
Test plan
- `OMJobReconcilerTest` — verifies exit handler pod is not recreated when already found, TTL rescheduling for terminal phases, immediate cleanup when TTL expired, no reschedule without TTL
- `PodManagerTest` — verifies tolerations are applied to pod spec, name-based fallback for exit handler pod discovery, no fallback when status has no pod name
- `CRDSchemaValidationTest` — verifies tolerations field exists in CRD schema and Java model
- `K8sPipelineClientConfigTest` — verifies tolerations parsing from config maps, empty/invalid tolerations handling