Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node e2e non docker specific #57976

Merged

Conversation

@Random-Liu
Copy link
Member

@Random-Liu Random-Liu commented Jan 8, 2018

Fixes #57977.

Make node e2e test generic to container runtimes.

With this change, other than tests with [Feature:Docker], all tests can run against all CRI container runtimes.

Note that this PR also marks cpu manager test as Serial, because it restarts kubelet during the test. It doesn't cause problem in regular node e2e suite today, because it is skipped if node has less than 2 CPUs, which is the case for our test environment. /cc @balajismaniam

@yujuhong @mrunalp @feiskyer
/cc @dashpole @balajismaniam @bprashanth Because I addressed your comments.
/cc @kubernetes/sig-node-pr-reviews
Release note:

none
@Random-Liu
Copy link
Member Author

@Random-Liu Random-Liu commented Jan 8, 2018

@miaoyq @yanxuean This PR added the container-runtime-pid-file and container-runtime-process flag in node e2e test, which you guys can also use in the benchmark test.

@Random-Liu
Copy link
Member Author

@Random-Liu Random-Liu commented Jan 8, 2018

/test pull-kubernetes-unit

1 similar comment
@Random-Liu
Copy link
Member Author

@Random-Liu Random-Liu commented Jan 8, 2018

/test pull-kubernetes-unit

Copy link
Member

@yujuhong yujuhong left a comment

Can we trigger the affected test suites to verify the change?

@@ -203,9 +205,11 @@ func RegisterCommonFlags() {
flag.StringVar(&TestContext.FeatureGates, "feature-gates", "", "A set of key=value pairs that describe feature gates for alpha/experimental features.")
flag.StringVar(&TestContext.Viper, "viper-config", "e2e", "The name of the viper config i.e. 'e2e' will read values from 'e2e.json' locally. All e2e parameters are meant to be configurable by viper.")
flag.StringVar(&TestContext.ContainerRuntime, "container-runtime", "docker", "The container runtime of cluster VM instances (docker/rkt/remote).")
flag.StringVar(&TestContext.ContainerRuntimeEndpoint, "container-runtime-endpoint", "", "The container runtime endpoint of cluster VM instances.")
flag.StringVar(&TestContext.ContainerRuntimeEndpoint, "container-runtime-endpoint", "unix:///var/run/dockershim.sock", "The container runtime endpoint of cluster VM instances.")
flag.StringVar(&TestContext.ContainerRuntimeProcess, "container-runtime-process", "dockerd", "The name of the container runtime process.")
Copy link
Member

@yujuhong yujuhong Jan 8, 2018

nit: container-runtime-process-name for clarity.

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Will do.

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Done.

Copy link
Member

@yujuhong yujuhong Jan 8, 2018

Did you push again? I didn't see any change.

@@ -203,9 +205,11 @@ func RegisterCommonFlags() {
flag.StringVar(&TestContext.FeatureGates, "feature-gates", "", "A set of key=value pairs that describe feature gates for alpha/experimental features.")
flag.StringVar(&TestContext.Viper, "viper-config", "e2e", "The name of the viper config i.e. 'e2e' will read values from 'e2e.json' locally. All e2e parameters are meant to be configurable by viper.")
flag.StringVar(&TestContext.ContainerRuntime, "container-runtime", "docker", "The container runtime of cluster VM instances (docker/rkt/remote).")
flag.StringVar(&TestContext.ContainerRuntimeEndpoint, "container-runtime-endpoint", "", "The container runtime endpoint of cluster VM instances.")
flag.StringVar(&TestContext.ContainerRuntimeEndpoint, "container-runtime-endpoint", "unix:///var/run/dockershim.sock", "The container runtime endpoint of cluster VM instances.")
Copy link
Member

@yujuhong yujuhong Jan 8, 2018

Why not relying on kubelet's defaults? Does this mean if we change the dockershim socket in kubelet, we'd have to change the test as well?

Also, the relationship between --container-runtime and the new flags may be hard for people (w/o prior knowledge of kubelet internals) to understand.

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

The problem is that when connecting the container runtime in the test, we need to know which socket to connect anyway.

Unless we add the default in kubelet/remote client, we have to set the default in the test somewhere.

Copy link
Member

@yujuhong yujuhong Jan 8, 2018

Do we also need the defaults set in hack/make-rules/test-e2e-node.sh?

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Another option is to skip setting endpoint in hack/make-rules/test-e2e-node.sh if it's empty. I can do that.

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Done.

flag.StringVar(&TestContext.SystemdServices, "systemd-services", "docker", "The comma separated list of systemd services the framework will dump logs for.")
flag.StringVar(&TestContext.ImageServiceEndpoint, "image-service-endpoint", "", "The image service endpoint of cluster VM instances.")
flag.StringVar(&TestContext.ImageServiceEndpoint, "image-service-endpoint", "unix:///var/run/dockershim.sock", "The image service endpoint of cluster VM instances.")
Copy link
Member

@yujuhong yujuhong Jan 8, 2018

Same question about setting the default in both the node e2e and in kubelet

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

ditto.

@@ -150,6 +150,7 @@ var _ = framework.KubeDescribe("Container Manager Misc [Serial]", func() {
})
// Log the running containers here to help debugging. Use `docker ps`
// directly for now because the test is already docker specific.
// TODO(random-liu): Use `crictl ps` and `crictl sandboxes` instead.
Copy link
Member

@yujuhong yujuhong Jan 8, 2018

Why can't you use the CRIClient you add in this PR to do this?

Also, without changing this, wouldn't the test still fail for non-docker runtimes? How do we currently skip the test for rkt and and non-docker CRI runtimes?

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

@yujuhong If the test doesn't fail, this line won't be run.
I'll change this to use CRIClient to dump debug information.

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Done.

}
return nil
}, 1*time.Minute, 2*time.Second).Should(BeNil())
for _, pid := range runtimePids {
Copy link
Member

@yujuhong yujuhong Jan 8, 2018

When do we expect multiple pids?

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Asked the same question when I review the original container manager test PR. https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/container_manager_test.go#L80-L86

There should only be one unless there are multiple instances of docker running. However, the function getPidsForProcess returns multiple pids. We can either assume there is only one, or do things in a loop. This PR goes with the second option now.

Copy link
Member

@yujuhong yujuhong Jan 8, 2018

Can we check the number of pids and error out if there are more than one?

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Yeah, that's option one. We can. Will do.
When we start doing benchmark, we may want to include containerd for both Docker and cri-containerd.
However, getPidsForProcess should still only return one pid, we just need to extend ContainerRuntimeProcessName to be a list. So I think we could assert 1 pid here.

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Done for restart_test.go. Leave container_manager_test.go as it is.

}, 1*time.Minute, 2*time.Second).Should(BeNil())
for _, pid := range runtimePids {
if stdout, err := exec.Command("sudo", "kill", fmt.Sprintf("%d", pid)).CombinedOutput(); err != nil {
framework.Failf("Failed to kill container runtime (pid=%d): %v, stdout: %q", pid, err, string(stdout))
Copy link
Member

@yujuhong yujuhong Jan 8, 2018

If we expect multiple pids for the runtime, can we print some information about the process being killed, so that we know what order of killing cause the failure?

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Actually there should only be one. :)

Copy link
Member

@yujuhong yujuhong Jan 8, 2018

Ack

// getCRIClient connects CRI and returns CRI runtime service clients and image service client.
func getCRIClient() (internalapi.RuntimeService, internalapi.ImageManagerService, error) {
// connection timeout for CRI service connection
const connectionTimeout = 15 * time.Minute
Copy link
Member

@yujuhong yujuhong Jan 8, 2018

Why setting such a high timeout? kubelet's default timeout is 2 minutes, right?

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Copied from image_list.go I can change it to 2 minutes.

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Done.

@@ -142,6 +142,32 @@ var _ = framework.KubeDescribe("GarbageCollect [Serial]", func() {
// while containers are running, if not constrained by maxPerPodContainer or maxTotalContainers, keep an extra copy of each container
// once pods are killed, all containers are eventually cleaned up
func containerGCTest(f *framework.Framework, test testRun) {
var runtime internalapi.RuntimeService
Copy link
Member

@yujuhong yujuhong Jan 8, 2018

+@dashpole for reviewing the GC test change

@Random-Liu
Copy link
Member Author

@Random-Liu Random-Liu commented Jan 8, 2018

Can we trigger the affected test suites to verify the change?

I've run the updated test locally, and they all pass. I don't think there is a PR node e2e serial test. :)

@@ -175,7 +201,7 @@ func containerGCTest(f *framework.Framework, test testRun) {
for i := 0; i < pod.numContainers; i++ {
containerCount := 0
for _, containerName := range containerNames {
if strings.Contains(containerName, pod.getContainerName(i)) {
if containerName == pod.getContainerName(i) {
Copy link
Contributor

@dashpole dashpole Jan 8, 2018

Have you tested this? I dont remember why I used Contains initially, but if it works the new way, I guess that is better.

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

Yeah, because container name in docker ps is pretty long, it's composed of a lot of things.

Copy link
Contributor

@dashpole dashpole Jan 8, 2018

Oh, so because this is using Labels[types.KubernetesContainerNameLabel], it should be exactly the name we gave it, rather than an element of the name. I like it

Copy link
Member Author

@Random-Liu Random-Liu Jan 9, 2018

Yeah

Copy link
Contributor

@balajismaniam balajismaniam left a comment

The changes to CPU manager tests LGTM except for the comment.


setOldKubeletConfig(f, oldCfg)
})
}

// Serial because the test updates kubelet configuration.
var _ = SIGDescribe("CPU Manager [Feature:CPUManager]", func() {
var _ = SIGDescribe("CPU Manager [Serial] [Feature:CPUManager]", func() {
Copy link
Contributor

@balajismaniam balajismaniam Jan 8, 2018

The [Serial] tag was removed as the CPU manager tests are run as a separate job in the CI/CD env. We can add it back when we want to run it as a part of the node e2e serial tests.

Copy link
Member Author

@Random-Liu Random-Liu Jan 8, 2018

@balajismaniam Actually the test is running in regular node e2e test today (Search "CPU Manager" in https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet/9809/build-log.txt), it is just skipped because the node has less than 2 CPUs.
Given that I prefer run and skip this in Serial test suite.

Copy link
Contributor

@balajismaniam balajismaniam Jan 9, 2018

Alright. Sounds good.

@Random-Liu Random-Liu force-pushed the node-e2e-non-docker-specific branch from 707c629 to 21f0bb5 Jan 8, 2018
@Random-Liu
Copy link
Member Author

@Random-Liu Random-Liu commented Jan 8, 2018

@yujuhong Addressed comments.

@Random-Liu
Copy link
Member Author

@Random-Liu Random-Liu commented Jan 8, 2018

Note that the new commit include the PR #53865 to make hack/make-rules/test-e2e-node.sh change cleaner.

Will squash the new commit before merging.

@Random-Liu Random-Liu force-pushed the node-e2e-non-docker-specific branch from 21f0bb5 to 8ca324f Jan 8, 2018
Copy link
Member

@yujuhong yujuhong left a comment

Looks good w/ some nits.

//https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go#L517
imageManagerEndpoint = framework.TestContext.ImageServiceEndpoint
}
i, err := remote.NewRemoteImageService(imageManagerEndpoint, connectionTimeout)
Copy link
Member

@yujuhong yujuhong Jan 9, 2018

Do we use this to pre-pull the images and is 2 min enough?

Copy link
Member Author

@Random-Liu Random-Liu Jan 9, 2018

if framework.TestContext.ImageServiceEndpoint != "" {
//ImageServiceEndpoint is the same as ContainerRuntimeEndpoint if not
//explicitly specified
//https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go#L517
Copy link
Member

@yujuhong yujuhong Jan 9, 2018

Change the link to point to a versioned code, or just remove the link completely.

Copy link
Member Author

@Random-Liu Random-Liu Jan 9, 2018

WIll remove it.

Copy link
Member Author

@Random-Liu Random-Liu Jan 9, 2018

Done.

@yujuhong
Copy link
Member

@yujuhong yujuhong commented Jan 9, 2018

LGTM.
Squash the commits?

@Random-Liu Random-Liu force-pushed the node-e2e-non-docker-specific branch from 1e15a00 to a3949bf Jan 9, 2018
@Random-Liu
Copy link
Member Author

@Random-Liu Random-Liu commented Jan 9, 2018

@yujuhong Done
@yujuhong @dashpole @balajismaniam Thanks for reviewing! :)

@yujuhong
Copy link
Member

@yujuhong yujuhong commented Jan 9, 2018

/lgtm

You still need approvers!

@dashpole
Copy link
Contributor

@dashpole dashpole commented Jan 9, 2018

/lgtm

ContainerRuntime string
ContainerRuntimeEndpoint string
ContainerRuntimeProcessName string
ContainerRuntimePidFile string
Copy link

@miaoyq miaoyq Jan 9, 2018

Maybe ContainerRuntimeProcessName and ContainerRuntimePidFile need to be a list or map, since some runtimes contain several binaries that belong to different cgroups (e.g. containerd and cri-cointainerd)

Copy link
Member Author

@Random-Liu Random-Liu Jan 9, 2018

@miaoyq You can do that in your benchmark PR if you find that is required. :)

@dchen1107
Copy link
Member

@dchen1107 dchen1107 commented Jan 9, 2018

/lgtm

Please rebase it.

@k8s-ci-robot
Copy link
Contributor

@k8s-ci-robot k8s-ci-robot commented Jan 9, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, dchen1107, Random-Liu, yujuhong

Associated issue: #57977

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@Random-Liu
Copy link
Member Author

@Random-Liu Random-Liu commented Jan 9, 2018

Just did a rebase, reapply LGTM label.

@Random-Liu Random-Liu added the lgtm label Jan 9, 2018
@Random-Liu
Copy link
Member Author

@Random-Liu Random-Liu commented Jan 9, 2018

/test pull-kubernetes-unit

@k8s-github-robot
Copy link
Contributor

@k8s-github-robot k8s-github-robot commented Jan 10, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@Random-Liu
Copy link
Member Author

@Random-Liu Random-Liu commented Jan 10, 2018

/test pull-kubernetes-bazel-build

@k8s-github-robot
Copy link
Contributor

@k8s-github-robot k8s-github-robot commented Jan 10, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit ecd525d into kubernetes:master Jan 10, 2018
12 of 13 checks passed
@Random-Liu Random-Liu deleted the node-e2e-non-docker-specific branch Jan 10, 2018
vdemeester pushed a commit to vdemeester/kubernetes that referenced this issue Feb 7, 2018
Automatic merge from submit-queue (batch tested with PRs 59276, 51042, 58973, 59377, 59472). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

clean up unused function GetKubeletDockerContainers

**What this PR does / why we need it**:
fix todo: function GetKubeletDockerContainers is not unused,it has been migrated off in test/e2e_node/garbage_collector_test.go  in [kubernetes#57976](https://github.com/kubernetes/kubernetes/pull/57976/files)
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

8 participants