Skip to content

Latest commit

 

History

History
521 lines (376 loc) · 18.9 KB

File metadata and controls

521 lines (376 loc) · 18.9 KB

KEP-3140: TimeZone support in CronJob

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
    • e2e Tests for all Beta API Operations (endpoints)
    • (R) Ensure GA e2e tests for meet requirements for Conformance Tests
    • (R) Minimum Two Week Window for GA e2e tests to prove flake free
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • (R) Production readiness review approved
  • "Implementation History" section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

CronJob creates Jobs based on the schedule specified by the author, but the Time Zone used during the creation depends on where kube-controller-manager is running. This proposal aims to extend CronJob resource with the ability for a user to define the TimeZone when a Job should be created.

Motivation

Not long after the introduction of CronJob in kubernetes a request was raised to support setting time zones. The initial response from SIG-Apps and SIG-Architecture at the time was that introducing such functionality would require cluster operator to manually include TimeZone database since golang did not have one. Starting from golang 1.15 time/tzdata package can be embedded in the binary thus removing the requirement for external database. At that time, the majority of the focus was towards moving CronJob to GA, so the effort to support TimeZone was again delayed. Now that we have CronJob fully GA-ed it's time to satisfy the original request.

Goals

  • Add the field .spec.timeZone which allows specifying a valid TimeZone name

Non-Goals

Proposal

Add the field .spec.timeZone to the CronJob resource. The cronjob controller will take the field into account when scheduling the next Job run. In case the field is not specified or is empty, the controller will maintain the current behavior, which is to rely on the time zone of the kube-controller-manager process.

Notes/Constraints/Caveats (Optional)

The current mechanism, which will still be the default behavior, heavily relies on the time zone of the kube-controller-manager process, which is hard for a regular user to figure out. Exposing an explicit field for setting a time zone will allow CronJob authors to simplify the user experience when creating CronJobs.

Risks and Mitigations

  • Outdated time zone in the golang might lead to wrong schedule times

This problem can be mitigated with a fresh build of kube-controller-manager with updated golang version, but it heavily relies on the go community to keep the time zone database up-to-date.

  • Malicious user can create multiple CronJobs with different time zone which can actually trigger Jobs at the exact same time

Cluster administrators should enforce quota to ensure not that many Jobs and CronJobs can be created per user.

Design Details

CronJob API

The .spec for a CronJob is expanded with a new timeZone field which allows specifying the name of the time zone to be used, the list of valid time zones can be found in tz database. Missing or empty value of the field indicates the current behavior, which relies on the time zone of the kube-controller-manager process.

In the API code, that looks like:

type CronJobSpec struct {

    // The schedule in Cron format, see https://en.wikipedia.org/wiki/Cron.
    Schedule string

    // Time zone for the above schedule
    TimeZone *string

}

The value provided in TimeZone field will be validated against the embedded golang timezone database, which will result in the kube-apiserver and kube-controller-manager binaries growing by roughly extra 500kB.

CronJob controller

CronJob controller will use non-nil, non-empty value from TimeZone field when parsing the schedule and during scheduling the next run time. Additionally, the time zone will be reflected in the .status.lastSuccessfulTime and .status.lastScheduleTime. In all other cases the controller will maintain the current behavior.

Test Plan

Unit and integration tests covering the time zone mechanics of CronJob, including:

  • defaulting
  • validation
  • creating CronJob
  • updating CronJob

Additionally, all of tests will be performed with feature gate enabled and disabled.

Graduation Criteria

Alpha

  • Functionality implemented behind feature gate:
    • TimeZone field added to API (kube-apiserver)
    • CronJob controller reacting to new field (kube-controller-manager)

Beta

More TBD

GA

TBD

Upgrade / Downgrade Strategy

  • Upgrades

When upgrading from a release without this feature to a release with TimeZone users should not notice any change in behavior. The default value for TimeZone is nil, which is equal to current behavior for backwards compatibility.

  • Downgrades

When downgrading from a release with this feature, to a release without TimeZone there are a few cases:

  1. If TimeZone feature gate was enabled and user specified a TimeZone, the newly created Jobs after downgrade will return to previous behavior, as if no TimeZone was ever specified.
  2. If TimeZone feature gate was enabled and user did not specify TimeZone, there should be no change in behavior.
  3. If TimeZone feature gate was not enabled there should be no change in behavior.

Irrespectively of the option, cluster administrator should monitor cronjob_job_creation_skew which reports the skew between schedule and actual job creation.

Version Skew Strategy

This feature has no node runtime implications.

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?
  • Feature gate (also fill in values in kep.yaml)
    • Feature gate name: CronJobTimeZone
    • Components depending on the feature gate: kube-apiserver, kube-controller-manager
Does enabling the feature change any default behavior?

No, the default behavior is maintained independently of the feature gate.

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, the feature can be disabled. The outcome of disabling will depend if the new field was set by the user:

  1. If there was no value set in TimeZone field there will be no change in behavior.
  2. If the user set a valid TimeZone, the newly created Jobs will be triggered as if the field was never set.
What happens if we reenable the feature if it was previously rolled back?

The controller will start reading the new field.

Are there any tests for feature enablement/disablement?

Yes, both units and integration tests for enablement, disablement and transitions.

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?

An upgrade flow can be vulnerable to the enable, disable, enable if you have a lease that is acquired by a new kube-controller-manager, then an old kube-controller-manager, then a new kube-controller-manager.

What specific metrics should inform a rollback?
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?
How can someone using this feature know that it is working for their instance?
  • Events
    • Event Reason:
  • API .status
    • Condition name:
    • Other field:
  • Other (treat as last resort)
    • Details:
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
  • Metrics
    • Metric name: cronjob_controller_rate_limiter_use
    • Components exposing the metric: kube-controller-manager
  • Other (treat as last resort)
    • Details:
Are there any missing metrics that would be useful to have to improve observability of this feature?

Dependencies

Does this feature depend on any specific services running in the cluster?

Scalability

Will enabling / using this feature result in any new API calls?

No new API calls are expected.

Will enabling / using this feature result in introducing new API types?

Yes, .spec.timeZone will be present in CronJob API.

Will enabling / using this feature result in any new calls to the cloud provider?

No new calls to cloud provider are expected.

Will enabling / using this feature result in increasing size or count of the existing API objects?

Yes.

  • API type(s): CronJob
  • Estimated increase in size: new field in CronJob spec up to 50 bytes.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No increase it existing SLIs/SLOs is expected.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

Additional CPU and memory increase in the kube-controller-manager is negligible since the current schedule parsing is already covering time zone specification. We're not using it, yet.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?
What are other known failure modes?
What steps should be taken if SLOs are not being met to determine the problem?

Implementation History

Drawbacks

Alternatives

Another approach was to specify time zone as an offset to UTC, but using the name instead seems more user friendly.

Infrastructure Needed (Optional)