Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alpha Dynamic Kubelet Configuration #46254

Merged
merged 3 commits into from Aug 9, 2017
Merged

Conversation

mtaufen
Copy link
Contributor

@mtaufen mtaufen commented May 22, 2017

Feature: kubernetes/enhancements#281

This proposal contains the alpha implementation of the Dynamic Kubelet Configuration feature proposed in #29459 community/contributors/design-proposals/dynamic-kubelet-configuration.md.

Please note:

  • The proposal doc is not yet up to date with this implementation, there are some subtle differences and some more significant ones. I will update the proposal doc to match by tomorrow afternoon.
  • This obviously needs more tests. I plan to write several O(soon). Since it's alpha and feature-gated, I'm decoupling this review from the review of the tests. I've beefed up the unit tests, though there is still plenty of testing to be done.
  • I'm temporarily holding off on updating the generated docs, api specs, etc, for the sake of my reviewers 馃槃 these files now live in a separate commit; the first commit is the one to review.

/cc @dchen1107 @vishh @bgrant0607 @thockin @derekwaynecarr

Adds (alpha feature) the ability to dynamically configure Kubelets by enabling the DynamicKubeletConfig feature gate, posting a ConfigMap to the API server, and setting the spec.configSource field on Node objects. See the proposal at https://github.com/kubernetes/community/blob/master/contributors/design-proposals/dynamic-kubelet-configuration.md for details.

@mtaufen mtaufen added area/kubelet area/kubelet-api sig/node Categorizes an issue or PR as relevant to SIG Node. labels May 22, 2017
@mtaufen mtaufen added this to the v1.7 milestone May 22, 2017
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 22, 2017
@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 22, 2017
if err != nil {
return err
}
s.KubeletConfiguration = kcToUseInternal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts on the structure of this:

  • can we separate the dynamic config setup into a separate function/file to make the inputs (config and client) and outputs (controller and resulting config) clearer?
  • mutating the config in the KubeletServer object makes this hard to reason about. can you hold the dynamically fetched config locally and propagate it to where it needs to go instead?
  • I would expect all the dynamic config lookup to happen after the BootstrapKubeconfig handling... otherwise, bootstrapped kubelets won't have a client yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can split that into another function in server.go.

KubeletServer already gets propagated to where the config is needed, and this concept is deeply threaded through the server.go code. I'm open to discussing whether this is the right flow to have, but I think it's out of scope for this PR - which is really more focused on the "obtain a valid config" part vs. the "pass that config around" part.

I'll look into moving the setup to after the BootstrapKubeconfig handling.


var ncc *nodeconfig.NodeConfigController
if useDynamicConfig {
client, err := getKubeClient(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will fail in TLS bootstrap cases where the kubelet doesn't have a client yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into moving this after the TLS bootstrap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that TLS bootstrap shouldn't depend on the KubeletConfiguration, but who was I kidding, everything depends on the KubeletConfiguration ;). It needs ContentType, KubeAPIQPS, and KubeAPIBurst. This creates a mild chicken-egg problem.

Present "solution" (imperfect) is to do the bare minimum to get a client based on the config supplied at startup and initialize the config controller with this client, then allow the client to be regenerated once we have a validated config.

This still leaves the config controller using the old client for its sync-loop, which means the QPS, etc. parameters might not be uniform between the clients. I'll look into splitting starting the conifg sync-loop out from running the Run() function on the controller, so that any new client can be injected there when it is available.

Copy link
Member

@liggitt liggitt May 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that TLS bootstrap shouldn't depend on the KubeletConfiguration, but who was I kidding, everything depends on the KubeletConfiguration ;). It needs ContentType, KubeAPIQPS, and KubeAPIBurst. This creates a mild chicken-egg problem.

I don't see those settings being required. From what I can see, the kubelet TLS bootstrap requires this information:

  • bootstrap kubeconfig file
  • path to write resulting kubeconfig to
  • dir to write resulting client certificate to
  • name of the node (which requires the following info to determine)
    • cloud provider
    • cloud provider config file
    • hostname override

Those files and paths don't seems like something you'd need to be dynamic, and the name of the node is needed to look up the dynamic config in the first place.

Copy link
Contributor Author

@mtaufen mtaufen May 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right. I was erroneously thinking of "get a client" as somehow involved because I saw that path reading the kubeconfig file. It's the client getting that needs the API call config parameters I mentioned above.

The cert dir, cloud provider, and cloud provider config file path, however, are still on the KubeletConfiguration. I wasn't planning on touching the KubeletConfiguration type itself in this PR, since I had another PR for refactoring that... but I suppose it doesn't hurt to remove a few parameters from it that clearly shouldn't be dynamic - especially given it means we won't have to worry about regenerating certs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok those fields are yanked, no cert regen.
Clients are regenerated from latest config so that the QPS, etc. parameters are up-to-date, and then the latest client is injected when starting the sync loop.

return err
// when you run the controller, either you get back a valid config to use, or the Kubelet crashes because
// something was fatally wrong with the configuration. Non-fatal errors will be logged, but not returned from Run().
kcToUse := ncc.Run()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still expect an error returned, not an internal panic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Kubelet cannot reliably determine a configuration to use, it should refuse to start.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree, but that should happen via a bubbling error, not a panic deep in a config library

// If specified, the source to get node configuration from
// The DynamicKubeletConfig feature gate must be enabled for the Kubelet to use this field
// +optional
ConfigSource *NodeConfigSource `json:"configSource,omitempty" protobuf:"bytes,6,opt,name=configSource"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we determine whether a node is permitted to access a particular configmap? nodes create their API objects on startup, if needed. Should they be permitted to create a Node API object that references a configmap config source? That's letting a node escalate their view permissions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Node can use the configuration from a particular ConfigMap if its Kubelet can read that ConfigMap. IIUC, the Kubelet registers the Node object. Either the Kubelet has permission to read the config it sets on the Node at that time, or it doesn't, and I don't see how this permission changes when the Node object is created.

I think we should probably allow Nodes to reference config at creation time, so an autoscaler can spin up Nodes that immediately refer to the desired config.

Copy link
Contributor Author

@mtaufen mtaufen May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-read your scoped Kubelet access proposal (kubernetes/community#585) just now. If the node authorizer takes "related to the requesting node" to include "node -> configmap", then the permissions I just noted are too broad to keep things scoped to just that Kubelet's node. Since that uses specs as sources of scope information, you're correct that there is an escalation path. The node authorizer would have to either disallow Kubelets from creating nodes that initially reference configuration, or have some other source of permissions information for this configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fixed with the additions to plugin/pkg/admission/noderestriction/admission.go in this PR.

// Only used for dynamic configuration.
// The maximum number of Kubelet restarts that can occur in ConfigTrialDuration before the
// configuration fails the trial and the Kubelet rolls back to its last-known-good config.
// Default 3, mimimum allowed is 2, maximum allowed is 10.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minimum of 2 is odd... why are we required to let the kubelet crash twice?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, it is currently defaulting to 10

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for that catch on the default vs. the comment.

The Kubelet has to restart at least once to take up the new configuration, and since the curSymlink would have been updated when the new configuration was downloaded, the startups file will contain one restart since that update. So we can't count the restart as a "crash" until we see 2 restarts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the kubelet can't tell when it crashed vs when it shut down in an orderly way to pick up a new config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's an existing way to do that.

I sketched one idea out just now:
Have a file that contains a bool - True by default (so the initial Kubelet startup isn't treated as a crash). If the Kubelet is about to exit to refresh its config, it writes True to the file. On restart, only startups where the file is False are recorded, and the Kubelet always resets the file to False after checking whether it should record a startup.

But there are edge cases - if you reboot a node for a legitimate reason (like an upgrade), it will be counted as a Kubelet crash unless you write True to the file as part of your reboot sequence.

I could also just make the threshold non-inclusive (> rather than >=), so the minimum would be 1, which might be less confusing. Just relying on the threshold is an imperfect solution, I agree, but I think it gives users decent wiggle-room without the implementation having to track additional state.

Copy link
Contributor Author

@mtaufen mtaufen May 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it so the minimum can be 1 0, meaning no crashes allowed. I'd like to postpone trying to differentiate crashes from config-exits, as I have a feeling any hasty solution I dream up now will be riddled with edge-cases.

} else {
s = format
}
glog.FatalDepth(1, fmt.Sprintf(nodeconfigLogFmt, s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process exiting inside a config library is unexpected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this case. We want the Kubelet to complain mightily if something is messing with its configuration filesystem by refusing to start if it can't reliably determine a configuration.

I wouldn't think of this as a library. It's a core part of the Kubelet startup workflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning an error is more expected and makes testing easier. os.Exit inside a config package is unexpected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, please don't exit here

Copy link
Contributor Author

@mtaufen mtaufen May 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll see if I can find a better way to plumb errors around. It was much nicer to implement this under the assumption that the Kubelet should just stop when it hits these classes of error, but the point about testing is enough reason for me to change this.

There are a lot of very low-level, very-bad errors that should prevent the controller from continuing, so I'm going to see if I can find a way return these fatal-class errors from the high-level Run() without plumbing everything through return values under the hood - which makes error handling much uglier, and makes it harder to differentiate between "fatal" and "non-fatal" error classes. Maybe something along these lines: https://blog.golang.org/errors-are-values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of immediately stopping the controller when things are really really wrong without inserting a bunch of error-handling boilerplate, how do you feel about panics in the very low-level functions that are recovered in Run() and in the sync-loop? These are testable, unlike the fatal errors, allow us to return an error from Run(), and only require a very small change (fatalf calls glog.ErrorDepth and panics instead of calling glog.FatalDepth).

There is precedent in the Go standard library for using panic like this, see https://golang.org/src/encoding/json/decode.go (func (d *decodeState) error(err error), which calls panic, and the recover call in unmarshal, for example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to the "panic" model are in the second commit, for reference.

}

// TODO(mtaufen): Once the KubeletConfiguration type is decomposed (#44252), allow multiple keys to contain configuration.
// Since we presently only expect one key, the for loop is a convenient way to express "take any value in the map".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the initial version of this should define the key we load config from. If we want to decompose into multiple keys later, we'll be able to, but "take the first arbitrary key" limits how we can evolve this in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this limit us? I don't see a problem evolving from "take the first arbitrary key" to "take all arbitrary keys". Besides, as long as it's in alpha we can make any breaking change we want.

This evolution will happen alongside #44252, which should be done for 1.8, prior to dynamic configuration exiting alpha.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means the only direction we can evolve is "take all arbitrary keys", which I'm not a fan of.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? The information we care about is the set of config objects (presently just one) in the data associated with those keys. As long as we get the right set of objects, why should we care about how they're distributed across the keys? We can evolve by adding new object types.

We will always have the schematic requirement that all the correct objects exist in the configuration payload. I'm wary of layering arbitrary schematic requirements on top of this, like requiring specific key names, because it just adds unnecessary complexity and more opportunities for user error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we get the right set of objects, why should we care about how they're distributed across the keys?

We should start with a well-specified key. It is far easier to relax later than it is to tighten.

We can evolve by adding new object types.

In the simplest example, consider a kubelet built with this PR. Pointed at a configmap containing two keys, one of which is the current config object type and the other is a new config object type, it will fail if it encounters the other unknown object first.

Instead, if it looked for a specific key, then a single configmap could provide a compatible object to kubelets with this PR, and a new config object type to kubelets that knew to look for the appropriate key.

I'm wary of layering arbitrary schematic requirements on top of this, like requiring specific key names, because it just adds unnecessary complexity and more opportunities for user error.

Given the naming requirements around the config map hash/algorithm, I assume a tool will be needed to generate these. It should set the key names correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, it wouldn't be safe to roll out newly-typed configuration until all clients are updated to understand it. If we were planning on adding a new config object, we'd have to make it optional, upgrade the Kubelets to understand it, and then roll out the new configurations that use that object.

That said, I see some usefulness to being able to express a transition of configuration across Kubelet upgrades in the ConfigMap. What would you think of requiring the key name match the minimum version of the Kubelet that should be allowed to use a given configuration, e.g. 1.7.0, or a key expressing the version of the API group that the config came from, e.g. kubelet.v1alpha1? A tool could definitely be created to understand these relationships.

Alternatively, we could just glue each config object to a key name, and Kubelets could just ignore keys they don't understand.

For this PR, I'll just require the config live in a key called kubelet.v1alpha1, and we can keep discussing this as the feature develops.

Copy link
Member

@liggitt liggitt May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could just glue each config object to a key name, and Kubelets could just ignore keys they don't understand.

That is what I expected, and seems more in line with how most APIs behave

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll actually reduce it all the way back to the kubelet key that was originally in the proposal, so we don't commit to any scheme beyond that.

@dchen1107
Copy link
Member

dchen1107 commented Aug 7, 2017

Had some offline discussion with @mtaufen and we agreed to handle above comments through separate issues: #50215, #50216 and #50217 since this pr is already too big.

Please rebase the pr, we should continue.

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 8, 2017
@mtaufen mtaufen force-pushed the dkcfg branch 2 times, most recently from 93cac39 to 5aa8264 Compare August 8, 2017 03:51
@luxas
Copy link
Member

luxas commented Aug 8, 2017

/retest

obj.ConfigTrialDuration = &metav1.Duration{Duration: 10 * time.Minute}
}
if obj.CrashLoopThreshold == nil {
obj.CrashLoopThreshold = utilpointer.Int32Ptr(3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we bump this up to 10 for now until we resolve #50216? Otherwise, this could be very dangerous for anyone attempt to use this feature end up to no good kubelet config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Alpha implementation of the Dynamic Kubelet Configuration feature.
See the proposal doc in kubernetes#29459.
@dchen1107
Copy link
Member

/retest

@dchen1107
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 8, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, mtaufen, thockin

Associated issue: 281

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@mtaufen
Copy link
Contributor Author

mtaufen commented Aug 8, 2017

/retest

@mtaufen
Copy link
Contributor Author

mtaufen commented Aug 8, 2017

/test pull-kubernetes-verify

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 50016, 49583, 49930, 46254, 50337)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/kubelet-api cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet