
RFE Optional CronJob Parameter: Auto-Suspend Kubernetes CronJob After Specified Number of Successful Executions #124473

Open
jangel97 opened this issue Apr 23, 2024 · 9 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/apps Categorizes an issue or PR as relevant to SIG Apps.

Comments

@jangel97

jangel97 commented Apr 23, 2024

What would you like to be added?

/sig apps
/assign

Want to get feedback on this proposal. I can proceed to the KEP process later if everyone thinks it's worthwhile.

Feature Summary
This RFE proposes adding an optional parameter to Kubernetes CronJobs that sets a maximum number of successful executions, after which the CronJob is automatically suspended. This feature would help manage resource usage, ensure tasks do not run indefinitely, and serve use cases where you want to check something a few times but not forever, similar to the `at` command but within a Kubernetes environment.

Context
While Kubernetes supports scheduled jobs via CronJobs, there is currently no native way to automatically stop them after a certain number of successful completions. This enhancement would let users cap the number of executions, which is useful whenever a task is only needed for a bounded number of runs.

Technical Details
The implementation could involve extending the CronJob specification with an optional `.spec.successfulJobCountLimit` field, which defines the maximum number of successful executions before the CronJob suspends. The CronJob controller would need to be enhanced to track the count of successful executions and update the CronJob's suspend status accordingly.

Example of a CronJob using this optional parameter

apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cronjob
  namespace: default
spec:
  schedule: "0 1 * * *"
  successfulJobsHistoryLimit: 3         # Keeps the last three successful Job records
  successfulJobCountLimit: 5            # Hypothetical field to suspend the CronJob after 5 successful executions
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
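For illustration, the controller-side decision implied by this field could be as small as the following sketch. Note that both `successfulJobCountLimit` and the `successfulJobCount` status counter are assumptions of this proposal, not existing API fields:

```python
def should_suspend(successful_count, limit):
    """Return True when the hypothetical successfulJobCountLimit is set
    and the CronJob has already reached that many successful runs.

    A missing limit (None) or 0 disables the feature, preserving the
    current always-run behavior for existing CronJobs.
    """
    if not limit:
        return False
    return successful_count >= limit


def reconcile(cronjob):
    """One sync step: flip .spec.suspend once the limit is reached.

    `cronjob` is a plain dict mirroring the CronJob manifest; the
    `successfulJobCount` status field is likewise hypothetical.
    """
    spec = cronjob.setdefault("spec", {})
    status = cronjob.get("status", {})
    limit = spec.get("successfulJobCountLimit")
    count = status.get("successfulJobCount", 0)
    if should_suspend(count, limit):
        spec["suspend"] = True
    return cronjob
```

Because unset (or zero) means "feature off", existing CronJobs would be unaffected by default.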

Why is this needed?

Consider a scenario where temporary cronjobs are created dynamically and jobs are scheduled for specific operational checks or data processing tasks that only need to run a predetermined number of times. Without a native feature to limit executions, developers must implement additional logic to track and suspend the cronjobs.

This feature is particularly beneficial in scenarios such as:

  • Jobs that are relevant only for a specific duration or a limited number of executions.
  • Certain tasks may need to be run a specific number of times to meet regulatory requirements.
  • Environments where jobs need to run a finite number of times to test and validate software under controlled conditions.

Some notes

An alternative solution for this use case would be a finish date/time, instead of a fixed number of executions.

Practical use case

In my practical use case, my backend needs to monitor if an Instagram story has been removed. Since Instagram stories are ephemeral and last only 24 hours, I need a system that can perform checks several times within that period. The checks are triggered whenever a new story is detected.

I use Kubernetes CronJobs to manage these checks, as Kubernetes provides reliable scheduling to ensure the tasks are executed on time. However, the challenge arises with the cleanup of these dynamically generated CronJobs. As new stories can be added at any time, CronJobs can accumulate rapidly, leading to clutter, unnecessary resource usage, and inconsistent results if they are not limited to a few runs during the day. Manually cleaning up these CronJobs isn't practical, especially when managing them in large numbers, which could easily exceed 100.

To address this issue systematically and possibly assist others in the community facing similar challenges, I proposed a Request for Enhancement (RFE). This proposal aims to explore solutions for automating the cleanup process of dynamically generated cronjobs in Kubernetes environments.

@jangel97 jangel97 added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 23, 2024
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 23, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sftim
Contributor

sftim commented Apr 23, 2024

My key question: why should we build this into Kubernetes (and not, for example, let someone write an external controller that achieves the same outcome)?

@jangel97
Author

jangel97 commented Apr 23, 2024

hey @sftim,

In my opinion, integrating a maximum-successful-execution limit for CronJobs directly into Kubernetes offers significant advantages over external controllers. A native feature simplifies management by removing the need for bespoke controllers or additional logic, reducing both complexity and the potential for error.

A native implementation also mitigates the resource overhead linked with external controllers, such as operational and maintenance costs, and potential performance degradation from managing multiple controllers. By incorporating this feature directly into Kubernetes, we foster community contributions and continuous improvement, ensuring robustness under diverse operational conditions.

The feature is designed as an optional parameter that activates only when configured by the user (with a default value of 0, for example), ensuring it does not impact existing CronJobs. This approach offers flexibility, allowing users to adapt the feature as needed without disrupting existing setups and preserving Kubernetes' commitment to modularity and user-driven configuration.

This optional parameter would benefit use cases like mine, where automation dynamically creates cronjobs to monitor specific conditions over short periods. Kubernetes, with its distributed architecture and reliable scheduling, is ideally suited for this. The proposed feature would automatically suspend these cronjobs after their intended number of executions, streamlining management and enhancing efficiency.

Consider the built-in Jobs history limits in Kubernetes, which already streamline job management and could have been handled by an external controller. However, having this functionality built into Kubernetes provides a more convenient, reliable, and cohesive experience. This proposed feature follows a similar rationale, embedding essential functionality directly within Kubernetes to enhance user convenience.

@sftim
Contributor

sftim commented Apr 23, 2024

If the only benefit of it not being an external controller is not needing an external controller, I think you'll want a stronger case @jangel97.

You could build an out-of-tree prototype to show how it might work. Would you be willing to do that?

@jangel97
Author

@sftim,

The proposal not only removes the requirement for an external controller but also enhances functionality to support a variety of potential use cases. For example, including this feature in Kubernetes cronjobs would perfectly address my requirements, and likely those of others. Take the implementation of Job history limits as another instance; it was introduced to meet a common need, providing multiple users with a native solution to manage job subresources as per need.

When you mention an out-of-tree prototype, are you suggesting a demonstration of how the proposed implementation would work? If so, I'm on board with that. Additionally, I'm keen to collaborate on the implementation tasks, provided that there's agreement on the viability of this proposal.
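For a first out-of-tree pass, the core decision logic could live outside any controller framework as plain functions. The annotation name below is just a made-up convention for the prototype:

```python
# Core logic an out-of-tree controller could run on each sync: count the
# succeeded Jobs owned by a CronJob and decide whether to suspend it.
LIMIT_ANNOTATION = "example.com/successful-job-count-limit"  # hypothetical


def count_successes(jobs, cronjob_name):
    """Count Jobs owned by the named CronJob that completed successfully."""
    total = 0
    for job in jobs:
        owners = job.get("metadata", {}).get("ownerReferences", [])
        owned = any(
            o.get("kind") == "CronJob" and o.get("name") == cronjob_name
            for o in owners
        )
        if owned and job.get("status", {}).get("succeeded", 0) >= 1:
            total += 1
    return total


def needs_suspend(cronjob, jobs):
    """True when the annotated limit is set, the CronJob is not already
    suspended, and enough owned Jobs have succeeded."""
    ann = cronjob.get("metadata", {}).get("annotations", {})
    limit = int(ann.get(LIMIT_ANNOTATION, 0))
    if limit <= 0 or cronjob.get("spec", {}).get("suspend"):
        return False
    return count_successes(jobs, cronjob["metadata"]["name"]) >= limit
```

A real controller would wrap this in a watch/resync loop, listing the owned Jobs with the official client (e.g. `BatchV1Api.list_namespaced_job` in the Python client) and patching `.spec.suspend` via `patch_namespaced_cron_job` whenever `needs_suspend` returns True.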

@sftim
Contributor

sftim commented Apr 23, 2024

@jangel97 you should take this to a SIG Apps Zoom meeting (or Slack conversation) and test the water to see if other people are sold on adding more code to maintain.

If they're not, it'll likely find a better home out of tree. If there is interest, the next step is to find a project manager (KEP owner) willing to supervise delivery for the enhancement.

@jangel97
Author

Hey @sftim ! Thank you for letting me know about this!

@jwhittem

I agree with @sftim; this seems like an opportunity to write something outside of the k8s code base. I could see this wanting more and more features, which would be approved more quickly in another project anyhow. You could create a new type of 'job' object in that project, so that if there was enough adoption, it could perhaps merge into the k8s codebase some day. IMO, this use case is not so common.

@jangel97
Author

@jwhittem, I agree that if this RFE doesn't receive sufficient support from the community because it is a niche use case, it would be more appropriate to integrate this setting into another project that focuses specifically on job scheduling (or to just build a custom solution for myself).

As for future RFEs concerning this setting, I find it hard to foresee any, as the setting appears to be quite distinct and standalone.
