New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bury KubeletConfiguration.ConfigTrialDuration for now #59628

Merged
merged 1 commit into from Feb 10, 2018

Conversation

@mtaufen
Contributor

mtaufen commented Feb 9, 2018

Based on discussion in https://github.com/kubernetes/kubernetes/pull/53833/files#r166669046, this PR chooses not to expose a knob for the trial duration yet. It is unclear exactly which shape this functionality should take in the API.

The alpha KubeletConfiguration.ConfigTrialDuration field is no longer available.
@mtaufen

This comment has been minimized.

Contributor

mtaufen commented Feb 9, 2018

For context, here is a re-print of the discussion on the kubeletconfig-to-beta PR:

liggitt 2 days ago Member
if this is only used for dynamic config, doesn't it make more sense in the Node ConfigSource?

mtaufen 2 days ago Member
I think this is a question of whether we should associate the measure of risk tolerance with the config being applied to a Node (KubeletConfiguration.ConfigTrialDuration), or with the Node the config is being applied to (Node.ConfigSource.ConfigTrialDuration). Now that you bring it up, I'm not sure I have a clear winner:

KubeletConfiguration.ConfigTrialDuration:

  • + Directly associates idea of "risk" with a configuration, which makes some sense because whatever is in the configuration probably determines how risky it is.
  • - Feels weird because, as you mentioned, it only affects config from a specific delivery mechanism.
  • +/- Simpler, in some ways, but less flexible (see below).

Node.ConfigSource.ConfigTrialDuration:

  • + Can set the baseline for how "skeptical" a Node is of new configs, without having to determine the "risk" of each configuration.
  • +/- Can technically achieve the same effect as KubeletConfiguration.ConfigTrialDuration by changing the assigned config and trial duration simultaneously, though you'd have to maintain that (config, duration) association independent of the config.
  • - Makes the NodeConfigSource API more complex, e.g. what does it mean to change the trial duration without changing the assigned config? I wonder if it would be confusing for a Kubelet to say: "Ah, config passed trial, now it's my last-known-good. Oh, you increased the trial? Guess now it's not my last known good anymore." This scenario requires the Kubelet to remember current and previous last-known-good configs, so that it can go back and forth when the trial duration changes. I worry this could be a confusing API (more implicit state and a parallel timeline).
  • + Makes some sense when you think about higher-level declarative APIs that control config across pools of nodes; you probably want to allow these APIs to control config policy without having to edit serialized payloads.

As third option, we could consider Node.ConfigSource.LastKnownGood:

  • +/- Elevates the problem of last-know-good selection to higher-level APIs. Inversion of control = inversion of responsibility.
  • + Completely explicit.
  • - Kubelet is maybe less robust by default, though default fallback to "provisioned with" config is probably plenty good in circumstances where people have fungible VMs.

As a fourth option, we could remove the external knob altogether, have the Kubelet just use 10 minutes as an internal default, and decide where to expose control of last-known-good later on (I don't really like kicking the can down the road, but it's an option).

WDYT?

liggitt 12 hours ago • edited Member
I guess I have a hard time understanding how I would choose a value for this. It doesn't seem like something you'd tune super carefully... whatever gets you past the "I'm crash looping on config X, better roll back", which I'm not sure is best represented by a duration (something like "completed startup operations and N successful sync loops" might be more meaningful)

liggitt 12 hours ago Member
Without a clear picture of how we'd recommend someone tune this, I'd probably omit it and keep it internal and low(ish) for now, but happy to hear opposing views.

mtaufen 7 hours ago Member
I think that's probably what we should do for now, simply because it would be silly to block Kubelet's component config going beta while we argue this out.

@mtaufen mtaufen referenced this pull request Feb 9, 2018

Merged

Graduate kubeletconfig API group to beta #53833

33 of 33 tasks complete
@liggitt

This comment has been minimized.

Member

liggitt commented Feb 10, 2018

moving internally for now LGTM

as an aside, there are normal conditions where the kubelet exits/restarts (waiting on client credential grants, or an available API server, etc). how does this interact with crashloops caused by things like that? could that trigger rolling back config because the kubelet thinks the config was responsible for the crashloop?

@liggitt

This comment has been minimized.

Member

liggitt commented Feb 10, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 10, 2018

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Feb 10, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, mtaufen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these OWNERS Files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Feb 10, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@fejta-bot

This comment has been minimized.

fejta-bot commented Feb 10, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Feb 10, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@mtaufen

This comment has been minimized.

Contributor

mtaufen commented Feb 10, 2018

Right now, this is just a grace-period after which the last-known-good updates. We removed crash-loop detection over the summer because we weren't confident the implementation covered the full spectrum of edge cases. So right now the only thing that can cause a fallback to last-known good is a failure to load the assigned config (corrupt checkpoint, invalid configuration, etc.).

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Feb 10, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit 9d33926 into kubernetes:master Feb 10, 2018

12 of 13 checks passed

Submit Queue Required Github CI test is not green: pull-kubernetes-e2e-gce
Details
cla/linuxfoundation mtaufen authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-unit Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

k8s-merge-robot added a commit that referenced this pull request Feb 15, 2018

Merge pull request #53833 from mtaufen/kubeletconfig-to-beta
Automatic merge from submit-queue (batch tested with PRs 59353, 59905, 53833). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Graduate kubeletconfig API group to beta

Regarding kubernetes/enhancements#281, this PR moves the kubeletconfig API group to beta. 

After #53088, the KubeletConfiguration type should not contain any deprecated or experimental fields, and we should not have to remove any more fields from the type before graduating it to beta. 

We need the community to double check for two things, however:
1. Are there any fields currently in the KubeletConfiguration type that you were going to mark deprecated this quarter, but haven't yet?
2. Are there any fields currently in the KubeletConfiguration type that are experimental or alpha, but were not explicitly denoted as such?

Please comment on this PR if you can answer "yes" to either of those two questions. Please cc anyone with a stake in the kubeletconfig API, so we get as much coverage as possible.

/cc @thockin @dchen1107 @Random-Liu @yujuhong @dashpole @tallclair @vishh @abw @freehan @dnardo @bowei @MrHohn @luxas @liggitt @ncdc @derekwaynecarr @mikedanese 

@kubernetes/sig-network-pr-reviews, @kubernetes/sig-node-pr-reviews 

```release-note
action required: The `kubeletconfig` API group has graduated from alpha to beta, and the name has changed to `kubelet.config.k8s.io`. Please use `kubelet.config.k8s.io/v1beta1`, as `kubeletconfig/v1alpha1` is no longer available. 
```

**TODO:**
- [x] Move experimental/non-gated-alpha/soon-to-be-deprecated fields to `KubeletFlags`
  - [x] #53088
  - [x] #54154
  - [x] #54160
  - [x] #55562
  - [x] #55983
  - [x] #57851
- [x] Lift embedded structure out of strings
  - [x] #53025
  - [x] #54643
  - [x] #54823
  - [x] #55254
- [x] Resolve relative paths against the location config files are loaded from
  - [x] #55648 
- [x] Rename to `kubelet.config.k8s.io`
- [x] Comments
  - [x] Make sure existing comments at least read sensibly.
  - [x] Note default values in comments on the versioned struct.
  - [x] Remove any reference to default values in comments on the internal struct.
- [x] Most fields should be `+optional` and `omitempty`. Add where necessary. ~Where omitted, explicitly comment.~ Edit: We should not distinguish between nil and empty, see below items.
- [x] Ensure defaults are specified via `pkg/kubelet/apis/kubelet.config.k8s.io/v1beta1/defaults.go`, not `cmd/kubelet/app/options/options.go`.
  - [x] #57770
- [x] Ensure kubeadm does not persist v1alpha1 KubeletConfiguration objects (or feature-gates this functionality)
- [x] Don't make a distinction between empty and nil, because of #43203.
  - [x] #59515
  - [x] #59681
- [x] Take the opportunity to fix insecure Kubelet defaults @tallclair 
  - [x] #59666
- [x] Remove CAdvisorPort from KubeletConfiguration wrt #56523.
  - [x] #59580
- [x] Hide `ConfigTrialDuration` until we're more sure what to do with it.
   - [x] #59628
- [x] Fix `// default: x` comments after rebasing on recent changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment