New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: Detect CRIs automatically #69366

Merged
merged 1 commit into from Jan 23, 2019

Conversation

@rosti
Copy link
Member

rosti commented Oct 3, 2018

What this PR does / why we need it:

In order to allow for a smoother UX with CRIs different than Docker, we have to make the --cri-socket command line flag optional when just one CRI is installed.

This change does that by doing the following:

  • Introduce a new runtime function (DetectCRISocket) that will attempt to detect a CRI socket, or return an appropriate error.
  • Default to using the above function if --cri-socket is not specified and CRISocket in NodeRegistrationOptions is empty.
  • Stop static defaulting to DefaultCRISocket. And rename it to DefaultDockerCRISocket. Its use is now narrowed to "Docker or not" distinguishment and tests.
  • Introduce AddCRISocketFlag function that adds --cri-socket flag to a flagSet. Use that in all commands, that support --cri-socket.
  • Remove the deprecated --cri-socket-path flag from kubeadm config images pull and deprecate --cri-socket in kubeadm upgrade apply.

In short, if multiple CRIs are detected, we bail out with an error. If no CRI is detected, we attempt to use Docker by default and this will fail if it's not installed and we actually need the CRI for the operation.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Refs kubernetes/kubeadm#1117 (probably needs docs update too)

Special notes for your reviewer:
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/area kubeadm
/kind feature
/assign @fabriziopandini
/assign @timothysc
/cc @neolit123 @bart0sh @kad

Release note:

kubeadm now attempts to detect an installed CRI by its usual domain socket, so that --cri-socket can be omitted from the command line if Docker is not used and there is a single CRI installed.

@k8s-ci-robot k8s-ci-robot requested review from bart0sh , kad and neolit123 Oct 3, 2018

@rosti rosti changed the title kubeadm: Detect CRIs automatically [WIP] kubeadm: Detect CRIs automatically Oct 3, 2018

if err != nil {
return err
}
glog.V(1).Infof("detected and using CRI socket: %s\n", cfg.CRISocket)

This comment has been minimized.

@neolit123

neolit123 Oct 3, 2018

Member
  • glog calls don't need the extra \n. appears multiple times in the diff.
  • better move the Infof() call in DetectCRISocket()?
  • shouldn't we add this for nodeconfig.go too?
    JoinConfiguration also has a NodeRegistrationOptions

This comment has been minimized.

@rosti

rosti Nov 1, 2018

Author Member

This code is called for JoinConfiguration.

@fabriziopandini
Copy link
Contributor

fabriziopandini left a comment

Thanks @rosti !
I think that the approach you are proposing can be slightly improved by leveraging on the node annotation that contains the CRI used for node initialisation
Looking forward for a second pass on this! Ping me when it is ready

@@ -23,6 +23,6 @@ const (
DefaultCACertPath = "/etc/kubernetes/pki/ca.crt"
// DefaultSocketUrlScheme defines default socket url prefix
DefaultUrlScheme = "unix"
// DefaultCRISocket defines the default cri socket
DefaultCRISocket = "/var/run/dockershim.sock"
// DefaultDockerCRISocket defines the default Docker CRI socket

This comment has been minimized.

@fabriziopandini

fabriziopandini Oct 5, 2018

Contributor

Now that v1beta1 merged those changes should be implemented there.
With regards to implement changes on v1alpha3 as well, I think that we are not allowed to alter the behaviour of an existing API versions, but I'm open to discuss if this case (change of a default for improving UX) is an acceptable exception

@@ -212,7 +212,7 @@ func TestImagesPull(t *testing.T) {
LookPathFunc: func(cmd string) (string, error) { return "/usr/bin/docker", nil },
}

containerRuntime, err := utilruntime.NewContainerRuntime(&fexec, kubeadmapiv1alpha3.DefaultCRISocket)
containerRuntime, err := utilruntime.NewContainerRuntime(&fexec, kubeadmapiv1alpha3.DefaultDockerCRISocket)

This comment has been minimized.

@fabriziopandini

fabriziopandini Oct 5, 2018

Contributor

v1beta1 (please revise this PR)

@@ -56,7 +56,7 @@ func assertDirEmpty(t *testing.T, path string) {
func TestNewReset(t *testing.T) {
var in io.Reader
certsDir := kubeadmapiv1alpha3.DefaultCertificatesDir
criSocketPath := kubeadmapiv1alpha3.DefaultCRISocket
criSocketPath := kubeadmapiv1alpha3.DefaultDockerCRISocket

This comment has been minimized.

@fabriziopandini

fabriziopandini Oct 5, 2018

Contributor

v1beta1

@@ -55,6 +55,12 @@ func NewCmdReset(in io.Reader, out io.Writer) *cobra.Command {
ignorePreflightErrorsSet, err := validation.ValidateIgnorePreflightErrors(ignorePreflightErrors)
kubeadmutil.CheckErr(err)

if criSocketPath == "" {
criSocketPath, err = utilruntime.DetectCRISocket()

This comment has been minimized.

@fabriziopandini

fabriziopandini Oct 5, 2018

Contributor

I think that the detection process should initially try to read the kubeadm config (that now includes also the node annotation with the cri socket used for node initialisation) and only as a fallback use cri detection.

This has two advantages:

  1. it will work also in case for multiple cri (because the cri to use is already known)
  2. it will enable further cleanup actions e.g. cleanup of cluster status in case of control plane nodes (out of scope of this PR)
@@ -69,7 +75,7 @@ func NewCmdReset(in io.Reader, out io.Writer) *cobra.Command {
)

cmd.PersistentFlags().StringVar(
&criSocketPath, "cri-socket", kubeadmapiv1alpha3.DefaultCRISocket,
&criSocketPath, "cri-socket", "",

This comment has been minimized.

@fabriziopandini

fabriziopandini Oct 5, 2018

Contributor

Might be adding a something to the flag description like e.g. this parameter should be used only in case automatic cri detection fails

Show resolved Hide resolved cmd/kubeadm/app/cmd/upgrade/apply.go
@chuckha
Copy link
Member

chuckha left a comment

this is going to save me a lot of time once it merges! thanks for the PR

return "", goerrors.New("No CRI socket detected, please use the --cri-socket option")
case 1:
return foundCRISockets[0], nil
default:

This comment has been minimized.

@chuckha

chuckha Oct 8, 2018

Member

i wonder it would be a better or worse experience to print a warning that multiple sockets were detected and kubeadm is continuing with the first one it found? My thought is that this would allow more users to get started without having to specify a specific socket to use, but still be aware there may be a problem.

This comment has been minimized.

@rosti

rosti Nov 1, 2018

Author Member

Every possible solution, from UX perspective, is a double edged sword.

On one hand, if we bail with error (the current version) the user will have to re-run the command with correct --cri-socket passed in. This may be annoying to some users and it's still not guaranteed, that they will provide the correct CRI.

On other hand, if we warn the user and just pick up the first one, then the user may ignore or not see the warning and end up in a mess (requiring reset).

For me the first option is more viable in the case of multiple CRIs and therefore I picked that solution.

}

for _, socket := range knownCRISockets {
if isSocket(socket) {

This comment has been minimized.

@bart0sh

bart0sh Oct 16, 2018

Contributor

Checking existence of the socket is not enough from my point of view. I'd suggest to also check if CRI API is accessible through socket.

This comment has been minimized.

@neolit123

neolit123 Oct 16, 2018

Member

maybe we should do https://godoc.org/k8s.io/kubernetes/pkg/kubelet/apis/cri#RuntimeVersioner
or something that is known to be supported by all implementers.

This comment has been minimized.

@bart0sh

bart0sh Oct 29, 2018

Contributor

crictl info should be enough, I believe.

This comment has been minimized.

@rosti

rosti Nov 2, 2018

Author Member

I don't think, that we need to make the check too complex. Even checking, that the path leads to a domain socket is a bit of an overkill on my part. The sockets we are checking for at the moment, cannot be created without root privileges and I doubt that any process running as root could create an exact socket path by accident.

My opinion is to keep things as simple as possible. If there aren't any use cases, where someone could deliberately setup an invalid CRI socket, then I don't think we should add any additional checks for it.

@neolit123

This comment has been minimized.

Copy link
Member

neolit123 commented Oct 29, 2018

we probably want this in 1.13?

@rosti

This comment has been minimized.

Copy link
Member Author

rosti commented Oct 29, 2018

@neolit123 I'll try to get to it this week or the next one.

@timothysc timothysc added this to the v1.13 milestone Nov 1, 2018

@rosti rosti force-pushed the rosti:cri-auto-detect branch from 00ef7ed to 1b7b7cd Nov 2, 2018

@rosti rosti changed the title [WIP] kubeadm: Detect CRIs automatically kubeadm: Detect CRIs automatically Nov 2, 2018

@rosti

This comment has been minimized.

Copy link
Member Author

rosti commented Nov 2, 2018

This should be ready for review now.

func detectCRISocketImpl(isSocket func(string) bool) (string, error) {
foundCRISockets := []string{}
knownCRISockets := []string{
// /var/run/dockershim.sock is deliberately omitted from here, we have a separate Docker check below

This comment has been minimized.

@bart0sh

bart0sh Nov 2, 2018

Contributor

I didn't get the point. Why not to add /var/run/dockershim.sock here and remove duplicate check below?

This comment has been minimized.

@rosti

rosti Nov 2, 2018

Author Member

/var/run/dockershim.sock is backed by kubelet, which is in continuous restart cycle if the node is uninitialized (like before kubeadm init or join).
Therefore the check for /var/run/dockershim.sock is unreliable and it actually was not working in something like 80% of the time I tested it on a fresh machine.

case 0:
return "", pkgerrors.New("No CRI socket detected, please use the --cri-socket option")
case 1:
return foundCRISockets[0], nil

This comment has been minimized.

@bart0sh

bart0sh Nov 2, 2018

Contributor

Does this mean that if docker is running this function will always return /var/run/docker.sock ? Would it be better to change the order of known sockets? If CRI-O or containerd is configured and running should we prefer those even if docker is running?

This comment has been minimized.

@rosti

rosti Nov 2, 2018

Author Member

No, this means, that if you have only one runtime environment (no matter if Docker or CRI socket backed one) it will use that. If there is no runtime environment detected or multiple ones are detected (for example Docker & CRI-O) it will display error message and force the user to use --cri-socket to supply the socket.

@rosti rosti changed the title kubeadm: Detect CRIs automatically [WIP] kubeadm: Detect CRIs automatically Nov 2, 2018

@rosti rosti force-pushed the rosti:cri-auto-detect branch from ec9305f to 53c9f3e Jan 15, 2019

@rosti rosti changed the title [WIP] kubeadm: Detect CRIs automatically kubeadm: Detect CRIs automatically Jan 15, 2019

@rosti rosti force-pushed the rosti:cri-auto-detect branch from 53c9f3e to f7beaa6 Jan 15, 2019

@@ -79,6 +79,7 @@ func newCmdPreFlightNode() *cobra.Command {
cfg := &kubeadmapiv1beta1.JoinConfiguration{}
kubeadmscheme.Scheme.Default(cfg)

// Ross: CRI?

This comment has been minimized.

@neolit123

neolit123 Jan 15, 2019

Member

possibly a TODO leftover?

Show resolved Hide resolved cmd/kubeadm/app/cmd/upgrade/apply.go
@neolit123

This comment has been minimized.

Copy link
Member

neolit123 commented Jan 15, 2019

/assign @bart0sh

@rosti rosti force-pushed the rosti:cri-auto-detect branch from f7beaa6 to d3a5dea Jan 15, 2019

@rosti

This comment has been minimized.

Copy link
Member Author

rosti commented Jan 16, 2019

/test pull-kubernetes-e2e-kops-aws

@fabriziopandini
Copy link
Contributor

fabriziopandini left a comment

@rosti I really appreciate you are keeping up in this effort for so a long time!
IMO the PR is shaping out nicely; I left some minor comments, but nothing blocking and I'm ready to lgtm asap. in the meantime
/approve

PS. it would be great if during the grand discussion about CI test improvement we define an MVP for different CRI as well

@@ -185,6 +185,9 @@ func getDefaultedInitConfig() (*kubeadmapi.InitConfiguration, error) {
KubernetesVersion: fmt.Sprintf("v1.%d.0", constants.MinimumControlPlaneVersion.Minor()+1),
},
BootstrapTokens: []kubeadmapiv1beta1.BootstrapToken{placeholderToken},
NodeRegistration: kubeadmapiv1beta1.NodeRegistrationOptions{
CRISocket: kubeadmapiv1beta1.DefaultDockerCRISocket, // avoid CRI detection

This comment has been minimized.

@fabriziopandini

fabriziopandini Jan 20, 2019

Contributor

Side note:
It seems to me that the defaulted configuration is getting more and more driven by the need of passing validation and unit tests, vs being driven by UX needs. This is not optimal and should probably be addressed in the long term.

This comment has been minimized.

@rosti

rosti Jan 21, 2019

Author Member

Indeed this is the case. Especially the fact, that we carry NodeRegistrationOptions around as part of the InitConfiguration, which we fetched up only to use something from ClusterConfiguration or the LocalAPIEndpoint. This in itself triggers the CRI autodetect code too often and sometimes without needing it at all.
In my opinion, in the long run, we have to decouple the configs and fetch them as we need them from the cluster or config file(s). I believe, that this is what @luxas would want too.

Show resolved Hide resolved cmd/kubeadm/app/cmd/init.go Outdated
Show resolved Hide resolved cmd/kubeadm/app/cmd/join.go Outdated
Show resolved Hide resolved cmd/kubeadm/app/cmd/upgrade/apply.go Outdated
Show resolved Hide resolved cmd/kubeadm/app/cmd/util/cmdutil.go Outdated
Show resolved Hide resolved cmd/kubeadm/app/cmd/util/cmdutil.go Outdated
Show resolved Hide resolved cmd/kubeadm/app/cmd/config.go
Show resolved Hide resolved cmd/kubeadm/app/apis/kubeadm/v1beta1/defaults_unix.go Outdated
Show resolved Hide resolved cmd/kubeadm/app/apis/kubeadm/v1beta1/defaults_windows.go Outdated
kubeadm: Detect CRIs automatically
In order to allow for a smoother UX with CRIs different than Docker, we have to
make the --cri-socket command line flag optional when just one CRI is
installed.

This change does that by doing the following:

- Introduce a new runtime function (DetectCRISocket) that will attempt to
  detect a CRI socket, or return an appropriate error.
- Default to using the above function if --cri-socket is not specified and
  CRISocket in NodeRegistrationOptions is empty.
- Stop static defaulting to DefaultCRISocket. And rename it to
  DefaultDockerCRISocket. Its use is now narrowed to "Docker or not"
  distinguishment and tests.
- Introduce AddCRISocketFlag function that adds --cri-socket flag to a flagSet.
  Use that in all commands, that support --cri-socket.
- Remove the deprecated --cri-socket-path flag from kubeadm config images pull
  and deprecate --cri-socket in kubeadm upgrade apply.

Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>

@rosti rosti force-pushed the rosti:cri-auto-detect branch from d3a5dea to f97770b Jan 21, 2019

@fabriziopandini

This comment has been minimized.

Copy link
Contributor

fabriziopandini commented Jan 23, 2019

@rosti well done!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Jan 23, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Jan 23, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini, rosti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fabriziopandini

This comment has been minimized.

Copy link
Contributor

fabriziopandini commented Jan 23, 2019

/test pull-kubernetes-e2e-kops-aws

@k8s-ci-robot k8s-ci-robot merged commit b66e332 into kubernetes:master Jan 23, 2019

18 checks passed

cla/linuxfoundation rosti authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-godeps Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details

// The CRI socket flag is deprecated here, since it should be taken from the NodeRegistrationOptions for the current
// node instead of the command line. This prevents errors by the users (such as attempts to use wrong CRI during upgrade).
cmdutil.AddCRISocketFlag(cmd.Flags(), &flags.criSocket)

This comment has been minimized.

@inercia

inercia Jan 23, 2019

Does this mean there will be no way to force a specific CRI? In that case, what would happen if users have docker and crio running at the same time?

This comment has been minimized.

@neolit123

neolit123 Jan 23, 2019

Member

i had concerns about this myself but forgot to mention at today's office hours. :
maybe we still want a flag that is set to a value of "auto" by default.

This comment has been minimized.

@MalloZup

MalloZup Jan 24, 2019

Contributor

agree

This comment has been minimized.

@rosti

rosti Jan 24, 2019

Author Member

This is error prone and is there only for historical reasons. It's there, because in the past we did not keep the CRI socket in the cluster on per node basis.
As long as the option is still there, you can use it, but it's now deprecated.
Anyway, forcing the CRI should be done upon init/join and then leave kubeadm to use that socket for all other operations. In fact, strictly speaking, the only operations that should allow passing of CRI sockets should be init, join and reset.

This comment has been minimized.

@neolit123

neolit123 Jan 24, 2019

Member

In fact, strictly speaking, the only operations that should allow passing of CRI sockets should be init, join and reset.

do you think we should undeprecated the flag and have it with a value of "auto" by default?

This comment has been minimized.

@rosti

rosti Jan 24, 2019

Author Member

I don't think so, it's best that this flag is removed altogether. The only thing, that users can attempt to do with this flag is to change the CRI upon upgrade. However, I don't think, that this is going to work properly and I don't think, that we should support such user story.

This comment has been minimized.

@neolit123

neolit123 Jan 24, 2019

Member

but how would we handle the scenario @inercia mentioned:

Does this mean there will be no way to force a specific CRI? In that case, what would happen if users have docker and crio running at the same time?

i.e. multiple CRIs installed. how would the user pick one of the sockets?

This comment has been minimized.

@rosti

rosti Jan 24, 2019

Author Member

The persisted NodeRegistrationOptions in the cluster for this node is going to contain a CRI socket, that was setup upon init/join. Hence, no detection will be required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment