
WIP Refactor and Remove KubeletConfiguration from componentconfig #44252

Closed
wants to merge 1 commit into from

Conversation

mtaufen
Contributor

@mtaufen mtaufen commented Apr 9, 2017

I've finished my initial pass at grouping the KubeletConfiguration types into substructures and I'm looking for feedback.

The proposed groupings are in a temporary file for now (pkg/apis/componentconfig/v1alpha1/temp_v1a1_refactor.gogo), to make it easy to see them side by side with the original type (pkg/apis/componentconfig/v1alpha1/types.go) and to make it easier to grep just the KubeletConfiguration fields. The .gogo extension on the temp file is just so tools don't get confused by a malformed Go file.

There are a few TODO(kc-refactor) comments noting questions I have or things I still have to think about.

@yujuhong Please let me know if I effectively covered the set of flags-only-eventually-remove parameters related to container runtimes, or if I went too far.

@timstclair Please take a look at the PodSecurityConfig struct and let me know if you like grouping things this way.

@bgrant0607 @thockin @dchen1107 @vishh @dashpole @lavalamp @Random-Liu @derekwaynecarr @justinsb @smarterclayton @ncdc @mikedanese

@mtaufen mtaufen added area/kubelet area/kubelet-api do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. release-note-none Denotes a PR that doesn't merit a release note. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Apr 9, 2017
@mtaufen mtaufen added this to the v1.7 milestone Apr 9, 2017
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 9, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mtaufen

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 9, 2017
@k8s-reviewable

This change is Reviewable

Member

@thockin thockin left a comment

I am not sure what to do with this PR. What is a .gogo file ? :)

ManifestURLHeader string `json:"manifestURLHeader"`

// maxOpenFiles is Number of files that can be opened by Kubelet process.
MaxOpenFiles int64 `json:"maxOpenFiles"`
Member

you're sure this shouldn't be a cluster-wide param?

Contributor Author

As it's just for the Kubelet process, I'd configure it in node-level config.

// mounts,etc).
RootDirectory string `json:"rootDirectory"`

// containerRuntime is the container runtime to use.
Member

why wouldn't these be cluster-wide?

Contributor Author

ContainerRuntime is probably the same cluster-wide. This is also really a CRI detail and I should probably move this to the section of stuff that is subsumed by CRI.

The RemoteRuntime/RemoteImage endpoints are paths to socket files on the node, for communicating with CRI. Probably the same cluster-wide if all of your nodes have the same filesystem layout, but it's a node-level config parameter and should be treated as such.

Contributor

In the true CRI world, this field (ContainerRuntime) will not exist. Since it's a legacy field, let's not include it in the kubelet configuration (and the kubelet can't transition from using docker to rkt properly anyway).

Contributor

As for the runtime/image endpoints, I think it's perfectly fine to have a hybrid cluster composed of docker and rkt nodes if you'd like.

It's also risky to give the impression that you can simply change the endpoint in your cluster-wide KubeletConfiguration to use another CRI implementation without any consequence, so I'd not want to encourage people to use this.

Member

agree that this should not be cluster-wide.
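
As an aside on those endpoint fields, here is a minimal sketch of node-level config pointing at local CRI sockets; the struct, field names, and paths are illustrative only, not the PR's actual types:

package main

import (
	"fmt"
	"net"
	"time"
)

// criEndpoints is a hypothetical node-level config fragment. The values are
// paths to sockets on this particular node's filesystem, which is why they
// belong in node-level rather than cluster-wide config.
type criEndpoints struct {
	RemoteRuntimeEndpoint string
	RemoteImageEndpoint   string
}

// checkSocket verifies that a CRI socket is reachable on this node; the gRPC
// wiring a real kubelet would use on top of the socket is omitted here.
func checkSocket(path string) error {
	conn, err := net.DialTimeout("unix", path, 2*time.Second)
	if err != nil {
		return fmt.Errorf("CRI socket %s not reachable: %v", path, err)
	}
	return conn.Close()
}

func main() {
	ep := criEndpoints{
		RemoteRuntimeEndpoint: "/var/run/dockershim.sock", // example path only
		RemoteImageEndpoint:   "/var/run/dockershim.sock",
	}
	if err := checkSocket(ep.RemoteRuntimeEndpoint); err != nil {
		fmt.Println(err)
	}
}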

type FeatureGates struct {
// featureGates is a string of comma-separated key=value pairs that describe feature
// gates for alpha/experimental features.
FeatureGates string `json:"featureGates,omitempty"`
Member

Is this something that really wants to be cluster-config? E.g. master and node components both need this.

Contributor Author

Yes, it does. We ought to put this somewhere central, because so many components use it. I've yet to really think about where that central place should be.
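
For readers unfamiliar with the format described in the snippet above, a toy sketch of parsing the comma-separated key=value string into a map; this is not the actual feature-gate utility code:

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFeatureGates turns a string like "DynamicKubeletConfig=true,AllAlpha=false"
// into a map. Illustrative only; the real parsing lives in shared utility code.
func parseFeatureGates(s string) (map[string]bool, error) {
	gates := map[string]bool{}
	if s == "" {
		return gates, nil
	}
	for _, pair := range strings.Split(s, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("invalid feature gate %q, expected key=value", pair)
		}
		enabled, err := strconv.ParseBool(strings.TrimSpace(kv[1]))
		if err != nil {
			return nil, fmt.Errorf("invalid value for %q: %v", kv[0], err)
		}
		gates[strings.TrimSpace(kv[0])] = enabled
	}
	return gates, nil
}

func main() {
	gates, err := parseFeatureGates("DynamicKubeletConfig=true,AllAlpha=false")
	if err != nil {
		panic(err)
	}
	fmt.Println(gates["DynamicKubeletConfig"]) // true
}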

@mtaufen
Contributor Author

mtaufen commented Apr 10, 2017

@thockin this isn't intended to be merged. .gogo is a made-up extension so my tools don't try to parse that file. This PR is just so that we can talk about the categories in a file that only contains the stuff from KubeletConfiguration.

Contributor

@yujuhong yujuhong left a comment

At a glance, the categories in the file look too fine-grained to me.

@thockin this isn't intended to be merged. .gogo is a made-up extension so my tools don't try to parse that file. This PR is just so that we can talk about the categories in a file that only contains the stuff from KubeletConfiguration.

A markdown file may be better -- we'd get the colored text for the code :-)

type KubeletDebugConfig struct {
// enableDebuggingHandlers enables server endpoints for log collection
// and local running of containers and commands
EnableDebuggingHandlers *bool `json:"enableDebuggingHandlers"`
Contributor

Caveat: EnableDebuggingHandlers is enabled by default because it's required for exec, attach, logs, portforward, etc to work. It's probably a misnomer, and I am not sure it should really be grouped into the KubeletDebugConfig.

Contributor Author

Um, does that mean that debug-only endpoints like configz are also served in production clusters? We really shouldn't do that...

Contributor Author

This would also be a good opportunity for us to change the name (and split debug from exec, attach, etc.), since this config API is still alpha.

Contributor Author

Further, since this is required for kubectl to interact directly with containers, what are the specific cases where you'd want to turn it off?

Contributor

On the master nodes.
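
To make the proposed rename/split concrete, a rough sketch of how required container endpoints and debug-only endpoints could be gated separately; the option names and handlers here are hypothetical, not the kubelet server's real wiring:

package main

import (
	"fmt"
	"net/http"
)

// serverOptions is a hypothetical split of what EnableDebuggingHandlers
// currently controls as a single switch.
type serverOptions struct {
	EnableContainerEndpoints bool // exec, attach, logs, portforward
	EnableDebugEndpoints     bool // configz, pprof, and similar
}

func newMux(opts serverOptions) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	if opts.EnableContainerEndpoints {
		mux.HandleFunc("/exec/", func(w http.ResponseWriter, r *http.Request) {
			// required for `kubectl exec` and friends to work
		})
	}
	if opts.EnableDebugEndpoints {
		mux.HandleFunc("/configz", func(w http.ResponseWriter, r *http.Request) {
			// debug-only: dump the current configuration
		})
	}
	return mux
}

func main() {
	// e.g. a master node could keep the container endpoints but drop debug ones.
	_ = newMux(serverOptions{EnableContainerEndpoints: true, EnableDebugEndpoints: false})
}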

KubeAPIBurst int32 `json:"kubeAPIBurst"`
// syncFrequency is the max period between synchronizing running
// containers and config
SyncFrequency metav1.Duration `json:"syncFrequency"` // TODO(kc-refactor): This feels right here but take another look later.
Contributor

SyncFrequency is only related to the kubelet's internal implementation and is not directly related to how the kubelet interacts with the apiserver (if that is what KubeAPIConfig is about).

Contributor Author

Ah, ok then KubeAPIConfig would not be the appropriate place for it.

Contributor Author

I think I'm going to stick it in PodLifecycleConfig. That said, it does still have some indirect effect on the frequency of API calls, right?

Contributor

Very indirectly. If you sync the pods and the statuses remain unchanged, no updates will be sent.
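
For context, a toy sketch of the loop that SyncFrequency bounds; the real kubelet sync loop is far more involved, and the point is only that an unchanged sync produces no apiserver traffic:

package main

import (
	"fmt"
	"time"
)

// runSyncLoop re-syncs pods at most every syncFrequency; when nothing changed,
// no status update is sent to the apiserver, per the discussion above.
func runSyncLoop(syncFrequency time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(syncFrequency)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if changed := syncPods(); changed {
				fmt.Println("would send status updates to the apiserver")
			}
		case <-stop:
			return
		}
	}
}

// syncPods stands in for reconciling running containers against desired config.
func syncPods() bool { return false }

func main() {
	stop := make(chan struct{})
	go runSyncLoop(10*time.Second, stop)
	time.Sleep(50 * time.Millisecond)
	close(stop)
}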


}

type NodeStatusConfig struct {
Contributor

Do we really need separate structs for NodeRegistrationConfig and NodeStatusConfig?

Member

I guess I am wondering if we would ever have more than one field in this struct?

// dockerEndpoint is the path to the docker endpoint to communicate with.
DockerEndpoint string `json:"dockerEndpoint"`
// enable gathering custom metrics.
EnableCustomMetrics bool `json:"enableCustomMetrics"` // for Docker
Contributor

I don't think EnableCustomMetrics belongs in DockerOptions. We support this through cadvisor, and may eventually move it out of the kubelet.

CNIBinDir string `json:"cniBinDir"`
// podInfraContainerImage is the image whose network/ipc namespaces
// containers in each pod will use.
PodInfraContainerImage string `json:"podInfraContainerImage"`
Contributor

This is a docker-specific field.

/* BEGIN RUNTIME REMOVAL */

// I believe the plan is to remove ALL of these from the Kubelet/Node configuration
// e.g. they become flags-only for now and are immediately marked deprecated, for later removal.
Contributor

I don't think we can deprecate these flags anytime soon because there are no alternatives to specify these settings. We can only deprecate them when/if we deprecate the docker integration and/or enable running dockershim as a separate process.

Contributor Author

Why can't we deprecate the integration now and tell people to move to CRI as the replacement?


@timstclair timstclair removed their assignment Apr 24, 2017

type CgroupsConfig struct {
// kubeletCgroups is the absolute name of cgroups to isolate the kubelet in.
KubeletCgroups string `json:"kubeletCgroups"`
Member

it would be nice if this could be made optional.

// for no container. Rolling back the flag requires a reboot.
SystemCgroups string `json:"systemCgroups"`
// runtimeCgroups are cgroups that container runtime is expected to be isolated in.
RuntimeCgroups string `json:"runtimeCgroups"` // TODO(kc-refactor): Does this become part of CRI now?
Member

I think this would be deprecated; the kubelet should cease to re-parent runtimes.

Contributor Author

cool, the more we can yank the better

RuntimeCgroups string `json:"runtimeCgroups"` // TODO(kc-refactor): Does this become part of CRI now?
// cgroupRoot is the root cgroup to use for pods. This is handled by the
// container runtime on a best effort basis.
CgroupRoot string `json:"cgroupRoot"`
Member

I thought we updated this text. With cgroups-per-QoS enabled, this is the cgroup that will be the parent of the kubepods cgroup for all end-user pods.

KubeReserved map[string]string `json:"kubeReserved"`
// This flag helps kubelet identify absolute name of top level cgroup used to enforce `SystemReserved` compute resource reservation for OS system daemons.
// Refer to [Node Allocatable](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) doc for more information.
SystemReservedCgroup string `json:"systemReservedCgroup,omitempty"`
Member

This is where the separate struct for CgroupsConfig feels awkward. Can we combine them into a common NodeAllocatableConfig struct?
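
A rough sketch of the combined struct being suggested here; the grouping is hypothetical, though the field names mirror existing KubeletConfiguration fields:

// Hypothetical grouping only; field names mirror the existing flat type.
package nodeconfig

// NodeAllocatableConfig would absorb both the cgroup-placement fields and the
// resource-reservation fields so they live side by side.
type NodeAllocatableConfig struct {
	KubeletCgroups         string            `json:"kubeletCgroups,omitempty"`
	SystemCgroups          string            `json:"systemCgroups,omitempty"`
	CgroupRoot             string            `json:"cgroupRoot,omitempty"`
	KubeReserved           map[string]string `json:"kubeReserved,omitempty"`
	SystemReserved         map[string]string `json:"systemReserved,omitempty"`
	KubeReservedCgroup     string            `json:"kubeReservedCgroup,omitempty"`
	SystemReservedCgroup   string            `json:"systemReservedCgroup,omitempty"`
	EnforceNodeAllocatable []string          `json:"enforceNodeAllocatable,omitempty"`
}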

@mtaufen
Contributor Author

mtaufen commented Apr 25, 2017

I'll post a comment when I push that update.

@mtaufen
Contributor Author

mtaufen commented Apr 26, 2017

I restructured into these categories:
Original 6:

  • NodeLifecycleConfig
    • Node registration, labels, taints, etc.
  • PodLifecycleConfig
    • I was surprised to only find one parameter that really made sense in this group (SyncFrequency)
  • NodeMonitoringConfig
    • Status update frequency, event QPS, etc.
  • NodeResourcesConfig
    • node allocatable, cgroups, garbage collection, eviction, etc.
  • NodeSecurityConfig
    • allowed sysctls, seccomp, etc.
  • NodeNetworkConfig

Additional:

  • VolumeConfig
    • Still not super sure what to do with these; I stuck them here because they're both volume-related. Open to suggestions.
  • KubeletServerConfig
    • Config for the endpoints the Kubelet serves: those that make attach, exec, portforward, etc. work, plus the healthz and cadvisor endpoints.
    • Also includes auth and TLS; I'm on the fence about whether these should be here or in NodeSecurityConfig.
    • Also, given that this involves log collection, there's some relation to NodeMonitoringConfig.
  • KubeletDebugConfig
    • I suspect we'll find more appropriate parameters for this one when we split "debug" from "endpoints needed for kubectl attach, exec, etc."
  • ClusterFeatureGates
    • Again, this should be exposed somewhere central to all components
  • ClusterAPIConfig
    • QPS to the apiserver, etc.
  • ImageAPIConfig
    • QPS to the registry, etc.
  • CloudConfig
    • Cloud provider information
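
To make the categories above concrete, a minimal sketch of how the composed type might look; the substructure and field placement are illustrative, not the PR's final layout:

// Illustrative sketch of the composed type; not the PR's final layout.
package nodeconfig

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PodLifecycleConfig groups parameters governing the pod sync loop.
type PodLifecycleConfig struct {
	// syncFrequency is the max period between synchronizing running
	// containers and config.
	SyncFrequency metav1.Duration `json:"syncFrequency"`
}

// ClusterAPIConfig groups parameters governing traffic to the apiserver.
type ClusterAPIConfig struct {
	KubeAPIQPS   int32 `json:"kubeAPIQPS"`
	KubeAPIBurst int32 `json:"kubeAPIBurst"`
}

// KubeletConfiguration composes the substructures listed above.
type KubeletConfiguration struct {
	metav1.TypeMeta `json:",inline"`

	PodLifecycle PodLifecycleConfig `json:"podLifecycle"`
	ClusterAPI   ClusterAPIConfig   `json:"clusterAPI"`
	// ... NodeLifecycleConfig, NodeResourcesConfig, NodeSecurityConfig, etc.
}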

@mtaufen
Contributor Author

mtaufen commented Apr 27, 2017

Also, wrt FeatureGates, that mechanism is unversioned -- it should be made versioned and not coupled to other components' versioned APIs. We shouldn't have to rev the API version of every component every time feature gates are added or removed; components from a new K8s version should be able to convert an old FeatureGates object to what they expect.
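
One way to read that requirement, purely as an illustration rather than anything proposed in this PR: a versioned FeatureGates object whose gates are an open map, so a newer component can consume an older object and fall back to its own defaults for gates the object doesn't mention:

package main

import "fmt"

// FeatureGates as a hypothetical versioned object: an open map rather than a
// fixed struct, so adding or removing gates does not require a new API version.
type FeatureGates struct {
	APIVersion string          `json:"apiVersion"`
	Gates      map[string]bool `json:"gates"`
}

// enabled returns the gate's value, falling back to a component default when
// the (possibly older) object does not mention the gate at all.
func (fg FeatureGates) enabled(name string, def bool) bool {
	if v, ok := fg.Gates[name]; ok {
		return v
	}
	return def
}

func main() {
	old := FeatureGates{
		APIVersion: "featuregates/v1alpha1", // hypothetical group/version
		Gates:      map[string]bool{"AllAlpha": false},
	}
	// A newer component asks about a gate the older object never heard of.
	fmt.Println(old.enabled("SomeNewGate", true)) // true (the default)
}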

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/new-api size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 18, 2017
@mtaufen
Contributor Author

mtaufen commented May 18, 2017

I made some progress on this today but it won't build yet as some test code relies on bits of the alpha dynamic config API that this PR removes. I'll probably continue this work in the 1.8 timeframe, as I really need to focus on the alpha implementation of the new dynamic config mechanism for 1.7, which matters more than the API right now.

Updates:

  • The example refactoring now lives in the Kubelet's tree, under pkg/kubelet/apis/nodeconfig
  • Most, if not all, dependencies have been redirected appropriately.
  • It almost builds :P.
  • I moved several fields out of KubeletConfiguration and into KubeletFlags for the time being. I plan to do an extensive audit of all of the fields to determine which are "safe" and "sensible" to dynamically update, and which are not. Those which are not will not be allowed into the dynamic configuration API. So more fields will probably leave KubeletConfiguration, which is currently the "dynamic" type.
  • I probably won't have time to do that audit before 1.7, which means this PR won't be done for 1.7. This is ok because the Kubelet's config API is still planned to be alpha in 1.7.
  • Part of my roadmap for 1.8 is going to be a static config API for the Kubelet so these static fields can live somewhere other than flags. But for now it's safer to keep them flags than it is to carelessly shove them into dynamic config.

Commit text from the latest squash:

This moves KubeletConfiguration from componentconfig to
pkg/kubelet/apis/nodeconfig

The types themselves still need a lot of refactoring work. For example,
a serious audit of what can SAFELY be dynamically updated should be
done. Part of this commit moves a lot of parameters out of the
KubeletConfiguration type and into KubeletFlags, because they need more
thought before we allow them to be dynamic (like changing cgroups on the fly).

Unfortunately this somewhat reduces the utility of dynamic config for testing,
and presently the pods_container_manager_test in e2e_node is a blocker
because it relied on dynamic config to change cgroups parameters.

I probably shouldn't be surprised at how many dependencies on dynamic
config have sprung up in the last ~year, but it is kind of incredible.
After all, the current implementation is barely alpha, and the API is
still subject to violent changes like this commit.
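
For illustration, a minimal sketch of the flags-vs-config split the commit describes; which fields land on which side was still being audited at this point, so the assignments below are examples only:

// Illustrative only; the real audit of which fields are safe to make dynamic
// had not been completed when this PR was open.
package options

// KubeletFlags holds node-local, non-dynamic settings that stay on the command
// line for now (things that cannot be safely changed at runtime).
type KubeletFlags struct {
	RootDirectory  string
	KubeletCgroups string
	SystemCgroups  string
}

// KubeletConfiguration holds the settings that remain candidates for the
// dynamic config mechanism, i.e. values judged safe to update on the fly.
type KubeletConfiguration struct {
	MaxOpenFiles int64
	KubeAPIQPS   int32
	KubeAPIBurst int32
}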

@mtaufen mtaufen removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label May 18, 2017
@mtaufen mtaufen changed the title WIP Group the KubeletConfiguration parameters into substructures WIP Refactor and Remove KubeletConfiguration from componentconfig May 18, 2017
@k8s-ci-robot
Contributor

@mtaufen: The following test(s) failed:

Test name Commit Details Rerun command
Jenkins Bazel Build ef8277b link @k8s-bot bazel test this
Jenkins GCE Node e2e ef8277b link @k8s-bot node e2e test this
pull-kubernetes-federation-e2e-gce ef8277b link @k8s-bot pull-kubernetes-federation-e2e-gce test this
pull-kubernetes-cross ef8277b link @k8s-bot pull-kubernetes-cross test this
Jenkins Kubemark GCE e2e ef8277b link @k8s-bot kubemark e2e test this
Jenkins unit/integration ef8277b link @k8s-bot unit test this
Jenkins kops AWS e2e ef8277b link @k8s-bot kops aws e2e test this
Jenkins GCE etcd3 e2e ef8277b link @k8s-bot gce etcd3 e2e test this
Jenkins verification ef8277b link @k8s-bot verify test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot

@mtaufen PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 20, 2017
)

// GroupName is the group name use in this package
const GroupName = "nodeconfig"
Member

Why is this called nodeconfig and not kubeletconfig? Should it include configuration for other node agents like the node problem detector or kube-proxy?

Contributor Author

The KubeletConfiguration (and its decomposition) describes how you want your node to behave, but the Kubelet is really just an implementation detail of how these node-level things are configured, so it doesn't make sense to marry the name to the Kubelet (even though the Kubelet tree is, at present, the best location for the types, because the Kubelet is the only thing that implements them).

Those node agents should expose their configuration types from their own trees. Interesting idea about putting them in the same group, though. Can you get name collisions between components' API groups during registration (say, if kube-proxy also called its API group nodeconfig and had some dependency on the Kubelet)? Come to think of it, this seems possible... maybe it would be better to call the group kubeletconfig just to be safe. This might also be less confusing for people.

My bias is toward trying to design higher-level APIs, and "configure general node behavior" is a higher level concept than "configure the Kubelet". I think it also gets people to think in terms of what's really necessary for achieving the node behavior they want, rather than putting all kinds of unnecessary stuff in the configuration API.

WDYT?

@grodrigues3
Contributor

@mtaufen this wasn't /lgtm'd prior to code freeze and seems like WIP. Removing the PR from the milestone.

@grodrigues3 grodrigues3 removed this from the v1.7 milestone Jun 6, 2017
@mtaufen
Contributor Author

mtaufen commented Jun 6, 2017 via email

@k8s-github-robot

This PR hasn't been active in 62 days. It will be closed in 27 days (Sep 4, 2017).

cc @bgrant0607 @dchen1107 @derekwaynecarr @justinsb @mtaufen @smarterclayton @vishh @yujuhong

You can add 'keep-open' label to prevent this from happening, or add a comment to keep it open another 90 days

k8s-github-robot pushed a commit that referenced this pull request Aug 15, 2017
Automatic merge from submit-queue (batch tested with PRs 50198, 49051, 48432)

move KubeletConfiguration out of componentconfig API group

I'm splitting #44252 into more manageable steps. This step moves the types and updates references.

To reviewers: the most important changes are the removals from pkg/apis/componentconfig and additions to pkg/kubelet/apis/kubeletconfig. Almost everything else is an import or name update.

I have one unanswered question: Should I create a whole new api scheme for Kubelet APIs rather than register e.g. a kubeletconfig group with the default runtime.Scheme instance? This feels like the right thing, as the Kubelet should be exposing its own API, but there's a big fat warning not to do this in `pkg/api/register.go`. Can anyone answer this?
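
On the scheme question, a minimal sketch of giving the kubeletconfig group its own runtime.Scheme instead of registering into the shared one; the wiring is simplified and apimachinery details vary by release:

package kubeletconfig

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// GroupName is the group name for the Kubelet's own configuration API.
const GroupName = "kubeletconfig"

// SchemeGroupVersion identifies the alpha version of the group.
var SchemeGroupVersion = schema.GroupVersion{Group: GroupName, Version: "v1alpha1"}

// KubeletConfiguration is a stub for this sketch; the real type carries the
// full set of fields discussed in this PR.
type KubeletConfiguration struct {
	metav1.TypeMeta `json:",inline"`
}

// DeepCopyObject satisfies runtime.Object; a shallow copy suffices for the stub.
func (in *KubeletConfiguration) DeepCopyObject() runtime.Object {
	out := *in
	return &out
}

// NewScheme builds a dedicated scheme for the Kubelet's API group rather than
// registering its types into the shared legacy scheme.
func NewScheme() *runtime.Scheme {
	scheme := runtime.NewScheme()
	scheme.AddKnownTypes(SchemeGroupVersion, &KubeletConfiguration{})
	return scheme
}
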
@mtaufen mtaufen closed this Aug 24, 2017
@mtaufen
Contributor Author

mtaufen commented Aug 24, 2017

I'm going to do something similar across a few PRs.
