k8s operator service improvements #26772

@pmbrull

Description

1. The omjob operator does not clean up pods after "ttlSecondsAfterFinished" has elapsed. Instead, it keeps running indefinitely and recreates the exit-handler pod every 10 seconds, as the event log shows:
```text
Normal ExitHandlerCreated 46s   omjob-operator Created exit handler pod: om-job-123b421c-357b-49a2-9e8d-7fd6dfa23a2f-5decd8bf-exit
Normal ExitHandlerCreated 36s   omjob-operator Created exit handler pod: om-job-123b421c-357b-49a2-9e8d-7fd6dfa23a2f-5decd8bf-exit
Normal ExitHandlerCreated 26s   omjob-operator Created exit handler pod: om-job-123b421c-357b-49a2-9e8d-7fd6dfa23a2f-5decd8bf-exit
Normal ExitHandlerCreated 16s   omjob-operator Created exit handler pod: om-job-123b421c-357b-49a2-9e8d-7fd6dfa23a2f-5decd8bf-exit
Normal ExitHandlerCreated 6s    omjob-operator Created exit handler pod: om-job-123b421c-357b-49a2-9e8d-7fd6dfa23a2f-5decd8bf-exit
```

This happens regardless of the main pod's lifecycle (completion or failure), and it leaves a large number of Completed or Error pods in our cluster waiting to be deleted.
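The log suggests the reconcile loop neither checks whether the exit-handler pod already exists nor whether the TTL has expired. A minimal sketch of the intended decision logic, assuming the reconciler can see whether the main pod and exit handler have finished and when the job finished. `OMJobState`, `Action` and `decide` are hypothetical names, not the operator's actual code; only `ttlSecondsAfterFinished` comes from this issue, and counting the TTL from the exit handler's completion is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Action(Enum):
    # Possible outcomes of one reconcile pass (hypothetical).
    WAIT = "wait"
    CREATE_EXIT_HANDLER = "create_exit_handler"
    DELETE_PODS = "delete_pods"


@dataclass
class OMJobState:
    # Hypothetical snapshot of an OMJob as seen by the reconciler.
    main_pod_finished: bool                  # main pod reached Succeeded or Failed
    exit_handler_exists: bool                # exit-handler pod was already created
    exit_handler_finished: bool              # exit-handler pod reached a terminal phase
    finished_at: Optional[datetime]          # assumption: TTL counts from here
    ttl_seconds_after_finished: Optional[int]


def decide(state: OMJobState, now: datetime) -> Action:
    """Return the next action; idempotent, so repeated reconciles never duplicate work."""
    if not state.main_pod_finished:
        return Action.WAIT
    # Create the exit handler once, not on every reconcile pass.
    if not state.exit_handler_exists:
        return Action.CREATE_EXIT_HANDLER
    if not state.exit_handler_finished or state.finished_at is None:
        return Action.WAIT
    # No TTL configured: leave the pods in place.
    if state.ttl_seconds_after_finished is None:
        return Action.WAIT
    expires_at = state.finished_at + timedelta(seconds=state.ttl_seconds_after_finished)
    return Action.DELETE_PODS if now >= expires_at else Action.WAIT


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    done = OMJobState(True, True, True, t0, 60)
    print(decide(done, t0 + timedelta(seconds=30)))  # Action.WAIT
    print(decide(done, t0 + timedelta(seconds=61)))  # Action.DELETE_PODS
```

Because the decision depends only on observed state, a reconcile that fires every 10 seconds would create the exit handler at most once and delete the pods once the TTL passes.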

2. Let's add support for tolerations when the operator creates ingestion pods.
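Tolerations are a standard Kubernetes pod spec field (`spec.tolerations`); each entry has `key`, `operator` (`Equal` or `Exists`), `value`, `effect` (`NoSchedule`, `PreferNoSchedule`, `NoExecute`, or empty to match all) and optionally `tolerationSeconds`. A minimal sketch of copying user-configured tolerations into the ingestion pod spec, assuming the spec is handled as a plain dict. `with_tolerations` and `validate_toleration` are hypothetical helpers, and the checks are only a subset of what the API server enforces.

```python
import copy
from typing import Any, Dict, List

VALID_OPERATORS = {"Equal", "Exists"}
# An empty effect matches all taint effects.
VALID_EFFECTS = {"", "NoSchedule", "PreferNoSchedule", "NoExecute"}


def validate_toleration(t: Dict[str, Any]) -> None:
    """Raise ValueError for tolerations the API server would reject (subset of its checks)."""
    operator = t.get("operator", "Equal")  # Kubernetes defaults operator to Equal
    if operator not in VALID_OPERATORS:
        raise ValueError(f"invalid operator: {operator!r}")
    if not t.get("key") and operator != "Exists":
        raise ValueError("operator must be Exists when key is empty")
    if operator == "Exists" and t.get("value"):
        raise ValueError("value must be empty when operator is Exists")
    effect = t.get("effect", "")
    if effect not in VALID_EFFECTS:
        raise ValueError(f"invalid effect: {effect!r}")
    if t.get("tolerationSeconds") is not None and effect != "NoExecute":
        raise ValueError("tolerationSeconds requires effect NoExecute")


def with_tolerations(pod_spec: Dict[str, Any],
                     tolerations: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of pod_spec with the tolerations appended (hypothetical helper)."""
    for t in tolerations:
        validate_toleration(t)
    spec = copy.deepcopy(pod_spec)  # never mutate the caller's template
    spec.setdefault("tolerations", []).extend(copy.deepcopy(tolerations))
    return spec
```

For example, `with_tolerations({"containers": [{"name": "ingestion"}]}, [{"key": "dedicated", "operator": "Equal", "value": "ingestion", "effect": "NoSchedule"}])` returns a spec that lets ingestion pods schedule onto nodes tainted `dedicated=ingestion:NoSchedule`. Validating before building the pod surfaces a bad configuration in the operator instead of as a rejected pod.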

Metadata

Assignees

Labels

No labels

Type

No type

Projects

Status

No status

Status

In Progress 🏗️

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions