Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FlushFrequency config type #107618

Merged

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Jan 18, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

The intention for Kubernetes 1.23 was to support flushFrequency: 60 and treat
that as seconds. But time.Duration counts nanoseconds. Because the struct is
embedded inside the beta KubeletConfiguration, we cannot switch to a type that
serializes differently (like metav1.Duration) and therefore (for now) just
update the documentation to match the implementation from Kubernetes 1.23.

Special notes for your reviewer:

This was found in #105797 while working on a JSON unmarshaling unit test for LoggingConfiguration.

Does this PR introduce a user-facing change?

kubelet: The `flushFrequency` field in the kubelet configuration was meant to accept an integer counting seconds, but the actual implementation used nanonseconds. Configuration files that use, for example, `flushFrequency: 60` for 60s must be updated to `flushFrequency: 60000000000`.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 18, 2022
@pohly
Copy link
Contributor Author

pohly commented Jan 18, 2022

/sig instrumentation

@pohly pohly force-pushed the log-flush-frequency-config-type branch from 5838bd4 to 0246bcd Compare January 18, 2022 12:25
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 18, 2022
@pohly pohly changed the title LogflushFrequency config type FlushFrequency config type Jan 18, 2022
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@pohly
Copy link
Contributor Author

pohly commented Jan 18, 2022

This could also be solved with metav1.Duration, albeit without support for flushFrequency: 60. Would it be okay to do that?

It would be a much simpler solution.

@pohly pohly force-pushed the log-flush-frequency-config-type branch from 0246bcd to b8ea09c Compare January 18, 2022 16:21
@fedebongio
Copy link
Contributor

/cc @jpbetz
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 18, 2022
@pohly pohly force-pushed the log-flush-frequency-config-type branch from b8ea09c to c086919 Compare January 19, 2022 07:14
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 19, 2022
@pohly pohly force-pushed the log-flush-frequency-config-type branch from c086919 to 1838788 Compare January 19, 2022 07:16
@pohly
Copy link
Contributor Author

pohly commented Jan 19, 2022

This could also be solved with metav1.Duration, albeit without support for flushFrequency: 60.

I switched to that and updated the PR description.

The number of users of flushFrequency: 60 should be so small, not breaking those does not justify adding an entire new type that doesn't offer that much actual additional functionality. metav1.Duration.UnmarshalJSON could be extended to handle a number as input, but even that looks like overkill.

The original code with a new resource.Duration is in https://github.com/pohly/kubernetes/commits/log-flush-frequency-config-type-resource.

@liggitt liggitt added this to In progress in API Reviews Jan 20, 2022
@liggitt
Copy link
Member

liggitt commented Jan 20, 2022

echoing from slack:

I'm a pretty strong -1 on breaking decoding of fields exposed in beta+ config files.

the main option I see is to add a parallel field named something like flushFrequencyDuration that accepts duration, mark flushFrequency as deprecated, and translate values in that field into seconds in flushFrequencyDuration

changing the existing field to a string would mean anything using 1.23 structs to generate or read/modify config would break on a serialized config containing a string value for that field

flushFrequency: 5000000000
flushFrequency: 5s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the field is used in a beta API, there are beta stability expectations for this field.

Comment on lines 96 to 98
// Maximum number of seconds between log flushes. Ignored if the
// selected logging backend writes log messages without buffering.
FlushFrequency time.Duration
// The maximum delay between log flushes, specified as a string in the
// format understood by time.ParseDuration ("1m", "1h30m"). Ignored if
// the selected logging backend writes log messages without buffering.
FlushFrequency metav1.Duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change the type, we will break things that expect the API to work as released in 1.23. We should consider instead adding a new field with the desired type and deprecate the old one, e.g. FlushFrequencyDuration as @liggitt suggested.

Copy link
Contributor Author

@pohly pohly Jan 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a future improvement to make the configuration more user-friendly. I want to keep this bug fix PR simple and not spend more time on it than necessary (mine and of reviewers).

So let's make the documentation match the implementation from 1.23. I'll update the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehashman: update pushed, please take another look.

@liggitt liggitt moved this from In progress to Changes requested in API Reviews Jan 20, 2022
The intention for Kubernetes 1.23 was to support `flushFrequency: 60` and treat
that as seconds. But time.Duration counts nanoseconds. Because the struct is
embedded inside the beta KubeletConfiguration, we cannot switch to a type that
serializes differently (like metav1.Duration) and therefore (for now) just
update the documentation to match the implementation from Kubernetes 1.23.
@pohly pohly force-pushed the log-flush-frequency-config-type branch from 85cec62 to f9c8a9d Compare January 20, 2022 21:05
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 20, 2022
@liggitt liggitt moved this from Changes requested to In progress in API Reviews Jan 20, 2022
Copy link
Member

@ehashman ehashman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold cancel
/kind documentation
/lgtm

/assign @liggitt

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jan 26, 2022
@liggitt
Copy link
Member

liggitt commented Jan 26, 2022

therefore (for now) just update the documentation to match the implementation from Kubernetes 1.23.

There will never be a more convenient or less impactful time to resolve this than now. Given the other discussion about settling on a single stable serializable version of this struct, do we really want to leave this as-is? Marking this field as deprecated, setting it to omitempty so it doesn't get serialized when unset, printing a warning when it is used, and using a peer flushFrequencyDuration field that parses durations as originally proposed seems better long term

@liggitt
Copy link
Member

liggitt commented Jan 26, 2022

(I don't object to fixing the documentation, but I couldn't tell if that was all that was planned here)

@pohly
Copy link
Contributor Author

pohly commented Jan 26, 2022

There will never be a more convenient or less impactful time to resolve this than now.

I don't think that this really affects many users in practice. I understand that we must not break serialization because that might break tools even when this feature is not actively set (serialize defaults, restore later), but I doubt that there are many users who need the nicer serialization with a string instead of nanoseconds. Therefore implementing that improvement had a lower priority for me.

Given the other discussion about settling on a single stable serializable version of this struct, do we really want to leave this as-is?

My intention was to tackle this in this order:

  • fix the documentation to match the implementation (this PR)
  • convert to a single struct (the other PR)
  • change the serialization as you proposed

From a compatibility perspective it doesn't matter because we still need to be compatible with the same JSON fromat as in 1.23, but it would be less code that needs to be changed (only one struct).

@pohly
Copy link
Contributor Author

pohly commented Jan 27, 2022

@liggitt: I've implemented your proposal in https://github.com/pohly/kubernetes/commits/log-flush-frequency-new-field, with commits in the order that I proposed above (first this PR, ready for backporting to 1.23, then the open single-struct API PR for 1.24, and finally the flush frequency enhancement, also for 1.24).

@liggitt
Copy link
Member

liggitt commented Feb 8, 2022

/lgtm
/approve

for doc fix as first in the chain of updates

@liggitt liggitt moved this from In progress to API review completed, 1.24 in API Reviews Feb 8, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. labels Feb 8, 2022
@pohly
Copy link
Contributor Author

pohly commented Feb 8, 2022

/retest

@k8s-ci-robot k8s-ci-robot merged commit 131d145 into kubernetes:master Feb 8, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Feb 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. kind/documentation Categorizes issue or PR as related to documentation. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.
Projects
Status: API review completed, 1.24
Development

Successfully merging this pull request may close these issues.

None yet

9 participants