Node e2e non docker specific #57976

Merged
merged 2 commits into kubernetes:master from Random-Liu:node-e2e-non-docker-specific Jan 10, 2018

Conversation

@Random-Liu
Member

Random-Liu commented Jan 8, 2018

Fixes #57977.

Make the node e2e test generic across container runtimes.

With this change, all tests except those tagged [Feature:Docker] can run against any CRI container runtime.

Note that this PR also marks the cpu manager test as Serial, because it restarts the kubelet during the test. This doesn't cause a problem in the regular node e2e suite today, because the test is skipped on nodes with fewer than 2 CPUs, which is the case for our test environment. /cc @balajismaniam

@yujuhong @mrunalp @feiskyer
/cc @dashpole @balajismaniam @bprashanth Because I addressed your comments.
/cc @kubernetes/sig-node-pr-reviews
Release note:

none
@Random-Liu

Member

Random-Liu commented Jan 8, 2018

@miaoyq @yanxuean This PR adds the container-runtime-pid-file and container-runtime-process flags to the node e2e test, which you can also use in the benchmark test.

@Random-Liu

Member

Random-Liu commented Jan 8, 2018

/test pull-kubernetes-unit

@Random-Liu

Member

Random-Liu commented Jan 8, 2018

/test pull-kubernetes-unit

@yujuhong
Contributor

yujuhong left a comment

Can we trigger the affected test suites to verify the change?

@@ -203,9 +205,11 @@ func RegisterCommonFlags() {
flag.StringVar(&TestContext.FeatureGates, "feature-gates", "", "A set of key=value pairs that describe feature gates for alpha/experimental features.")
flag.StringVar(&TestContext.Viper, "viper-config", "e2e", "The name of the viper config i.e. 'e2e' will read values from 'e2e.json' locally. All e2e parameters are meant to be configurable by viper.")
flag.StringVar(&TestContext.ContainerRuntime, "container-runtime", "docker", "The container runtime of cluster VM instances (docker/rkt/remote).")
-flag.StringVar(&TestContext.ContainerRuntimeEndpoint, "container-runtime-endpoint", "", "The container runtime endpoint of cluster VM instances.")
+flag.StringVar(&TestContext.ContainerRuntimeEndpoint, "container-runtime-endpoint", "unix:///var/run/dockershim.sock", "The container runtime endpoint of cluster VM instances.")
+flag.StringVar(&TestContext.ContainerRuntimeProcess, "container-runtime-process", "dockerd", "The name of the container runtime process.")

@yujuhong

yujuhong Jan 8, 2018

Contributor

nit: container-runtime-process-name for clarity.

@Random-Liu

Random-Liu Jan 8, 2018

Member

Will do.

@yujuhong

yujuhong Jan 8, 2018

Contributor

Did you push again? I didn't see any change.

@@ -203,9 +205,11 @@ func RegisterCommonFlags() {
flag.StringVar(&TestContext.FeatureGates, "feature-gates", "", "A set of key=value pairs that describe feature gates for alpha/experimental features.")
flag.StringVar(&TestContext.Viper, "viper-config", "e2e", "The name of the viper config i.e. 'e2e' will read values from 'e2e.json' locally. All e2e parameters are meant to be configurable by viper.")
flag.StringVar(&TestContext.ContainerRuntime, "container-runtime", "docker", "The container runtime of cluster VM instances (docker/rkt/remote).")
-flag.StringVar(&TestContext.ContainerRuntimeEndpoint, "container-runtime-endpoint", "", "The container runtime endpoint of cluster VM instances.")
+flag.StringVar(&TestContext.ContainerRuntimeEndpoint, "container-runtime-endpoint", "unix:///var/run/dockershim.sock", "The container runtime endpoint of cluster VM instances.")

@yujuhong

yujuhong Jan 8, 2018

Contributor

Why not rely on kubelet's defaults? Does this mean that if we change the dockershim socket in kubelet, we'd have to change the test as well?

Also, the relationship between --container-runtime and the new flags may be hard for people (w/o prior knowledge of kubelet internals) to understand.

@Random-Liu

Random-Liu Jan 8, 2018

Member

The problem is that when connecting to the container runtime in the test, we need to know which socket to connect to anyway.

Unless we add the default in the kubelet remote client, we have to set the default in the test somewhere.

@yujuhong

yujuhong Jan 8, 2018

Contributor

Do we also need the defaults set in hack/make-rules/test-e2e-node.sh?

@Random-Liu

Random-Liu Jan 8, 2018

Member

Another option is to skip setting the endpoint in hack/make-rules/test-e2e-node.sh if it's empty. I can do that.
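
As a rough illustration of the alternative being discussed, the fallback could live next to the flag parsing, along these lines (a hedged sketch, not the PR's actual code; the merged change defaults the flag itself to the dockershim socket instead):

```go
// Hypothetical helper in the test framework: leave the flag default empty and
// fall back to kubelet's dockershim socket only when nothing was passed on
// the command line.
package framework

const defaultDockershimEndpoint = "unix:///var/run/dockershim.sock"

func resolveRuntimeEndpoint(flagValue string) string {
	if flagValue != "" {
		// An explicit --container-runtime-endpoint always wins.
		return flagValue
	}
	// Otherwise mirror kubelet's default so the test still reaches dockershim.
	return defaultDockershimEndpoint
}
```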

flag.StringVar(&TestContext.SystemdServices, "systemd-services", "docker", "The comma separated list of systemd services the framework will dump logs for.")
-flag.StringVar(&TestContext.ImageServiceEndpoint, "image-service-endpoint", "", "The image service endpoint of cluster VM instances.")
+flag.StringVar(&TestContext.ImageServiceEndpoint, "image-service-endpoint", "unix:///var/run/dockershim.sock", "The image service endpoint of cluster VM instances.")

@yujuhong

yujuhong Jan 8, 2018

Contributor

Same question about setting the default in both the node e2e test and in the kubelet.

@@ -150,6 +150,7 @@ var _ = framework.KubeDescribe("Container Manager Misc [Serial]", func() {
})
// Log the running containers here to help debugging. Use `docker ps`
// directly for now because the test is already docker specific.
// TODO(random-liu): Use `crictl ps` and `crictl sandboxes` instead.

@yujuhong

yujuhong Jan 8, 2018

Contributor

Why can't you use the CRIClient you added in this PR to do this?

Also, without changing this, wouldn't the test still fail for non-docker runtimes? How do we currently skip the test for rkt and non-docker CRI runtimes?

@Random-Liu

Random-Liu Jan 8, 2018

Member

@yujuhong If the test doesn't fail, this line won't be run.
I'll change this to use CRIClient to dump debug information.
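
A hedged sketch of what that CRI-based debug dump might look like; the helper name and the glog logging are assumptions, while the list calls mirror the CRI RuntimeService interface exposed by kubelet's remote client:

```go
package e2enode // hypothetical placement for illustration

import (
	"github.com/golang/glog"

	internalapi "k8s.io/kubernetes/pkg/kubelet/apis/cri"
)

// logRunningContainers dumps all CRI containers and pod sandboxes for
// debugging, replacing the docker-specific `docker ps` call.
func logRunningContainers(rs internalapi.RuntimeService) {
	containers, err := rs.ListContainers(nil) // nil filter: list everything
	if err != nil {
		glog.Errorf("Failed to list containers: %v", err)
		return
	}
	for _, c := range containers {
		glog.Infof("Running container: id=%q state=%v labels=%v", c.Id, c.State, c.Labels)
	}
	sandboxes, err := rs.ListPodSandbox(nil) // nil filter: list everything
	if err != nil {
		glog.Errorf("Failed to list pod sandboxes: %v", err)
		return
	}
	for _, s := range sandboxes {
		glog.Infof("Pod sandbox: id=%q state=%v", s.Id, s.State)
	}
}
```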

}
return nil
}, 1*time.Minute, 2*time.Second).Should(BeNil())
for _, pid := range runtimePids {

@yujuhong

yujuhong Jan 8, 2018

Contributor

When do we expect multiple pids?

@Random-Liu

Random-Liu Jan 8, 2018

Member

I asked the same question when I reviewed the original container manager test PR: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/container_manager_test.go#L80-L86

There should only be one unless there are multiple instances of docker running. However, the function getPidsForProcess returns a list of pids. We can either assume there is only one, or handle them in a loop. This PR goes with the second option for now.

@yujuhong

yujuhong Jan 8, 2018

Contributor

Can we check the number of pids and error out if there are more than one?

@Random-Liu

Random-Liu Jan 8, 2018

Member

Yeah, that's option one. We can. Will do.
When we start doing benchmarks, we may want to include containerd for both Docker and cri-containerd.
However, getPidsForProcess should still only return one pid; we'd just need to extend ContainerRuntimeProcessName to be a list. So I think we can assert one pid here.

@Random-Liu

Random-Liu Jan 8, 2018

Member

Done for restart_test.go. Left container_manager_test.go as it is.

}, 1*time.Minute, 2*time.Second).Should(BeNil())
for _, pid := range runtimePids {
if stdout, err := exec.Command("sudo", "kill", fmt.Sprintf("%d", pid)).CombinedOutput(); err != nil {
framework.Failf("Failed to kill container runtime (pid=%d): %v, stdout: %q", pid, err, string(stdout))

@yujuhong

yujuhong Jan 8, 2018

Contributor

If we expect multiple pids for the runtime, can we print some information about the process being killed, so that we know what order of killing caused the failure?

@Random-Liu

Random-Liu Jan 8, 2018

Member

Actually there should only be one. :)

@yujuhong

yujuhong Jan 8, 2018

Contributor

Ack
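
Put together, the stricter variant for restart_test.go presumably reduces to asserting a single pid before the kill; a sketch assuming the surrounding context quoted in this thread (runtimePids, framework, exec, fmt), not necessarily the exact merged code:

```go
// Sketch: fail fast if more than one runtime pid was found, then kill the
// single remaining process, keeping the existing error reporting.
if len(runtimePids) != 1 {
	framework.Failf("Expected exactly one container runtime pid, got %d: %v", len(runtimePids), runtimePids)
}
pid := runtimePids[0]
if stdout, err := exec.Command("sudo", "kill", fmt.Sprintf("%d", pid)).CombinedOutput(); err != nil {
	framework.Failf("Failed to kill container runtime (pid=%d): %v, stdout: %q", pid, err, string(stdout))
}
```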

// getCRIClient connects CRI and returns CRI runtime service clients and image service client.
func getCRIClient() (internalapi.RuntimeService, internalapi.ImageManagerService, error) {
// connection timeout for CRI service connection
const connectionTimeout = 15 * time.Minute

@yujuhong

yujuhong Jan 8, 2018

Contributor

Why set such a high timeout? kubelet's default timeout is 2 minutes, right?

@Random-Liu

Random-Liu Jan 8, 2018

Member

Copied from image_list.go. I can change it to 2 minutes.
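
Combining the shorter timeout with the endpoint defaults discussed earlier, getCRIClient presumably ends up close to the sketch below (import paths are the kubelet remote-client paths of that era; treat the exact code as an assumption rather than the merged change):

```go
package e2enode // hypothetical placement for illustration

import (
	"time"

	internalapi "k8s.io/kubernetes/pkg/kubelet/apis/cri"
	"k8s.io/kubernetes/pkg/kubelet/remote"
	"k8s.io/kubernetes/test/e2e/framework"
)

// getCRIClient connects to the CRI runtime and image services used by the test.
func getCRIClient() (internalapi.RuntimeService, internalapi.ImageManagerService, error) {
	// Match kubelet's default runtime request timeout instead of 15 minutes.
	const connectionTimeout = 2 * time.Minute

	r, err := remote.NewRemoteRuntimeService(framework.TestContext.ContainerRuntimeEndpoint, connectionTimeout)
	if err != nil {
		return nil, nil, err
	}

	// The image service endpoint defaults to the runtime endpoint unless it
	// is explicitly overridden, mirroring kubelet's behavior.
	imageEndpoint := framework.TestContext.ContainerRuntimeEndpoint
	if framework.TestContext.ImageServiceEndpoint != "" {
		imageEndpoint = framework.TestContext.ImageServiceEndpoint
	}
	i, err := remote.NewRemoteImageService(imageEndpoint, connectionTimeout)
	if err != nil {
		return nil, nil, err
	}
	return r, i, nil
}
```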

@@ -142,6 +142,32 @@ var _ = framework.KubeDescribe("GarbageCollect [Serial]", func() {
// while containers are running, if not constrained by maxPerPodContainer or maxTotalContainers, keep an extra copy of each container
// once pods are killed, all containers are eventually cleaned up
func containerGCTest(f *framework.Framework, test testRun) {
var runtime internalapi.RuntimeService

@yujuhong

yujuhong Jan 8, 2018

Contributor

+@dashpole for reviewing the GC test change

@Random-Liu

Member

Random-Liu commented Jan 8, 2018

> Can we trigger the affected test suites to verify the change?

I've run the updated tests locally, and they all pass. I don't think there is a PR node e2e serial test job. :)

@@ -175,7 +201,7 @@ func containerGCTest(f *framework.Framework, test testRun) {
for i := 0; i < pod.numContainers; i++ {
containerCount := 0
for _, containerName := range containerNames {
-if strings.Contains(containerName, pod.getContainerName(i)) {
+if containerName == pod.getContainerName(i) {

@dashpole

dashpole Jan 8, 2018

Contributor

Have you tested this? I don't remember why I used Contains initially, but if it works the new way, I guess that is better.

@Random-Liu

Random-Liu Jan 8, 2018

Member

Yeah, because the container name in docker ps is pretty long; it's composed of a lot of things.

@dashpole

dashpole Jan 8, 2018

Contributor

Oh, so because this is using Labels[types.KubernetesContainerNameLabel], it should be exactly the name we gave it, rather than an element of the name. I like it.
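
In other words, listing containers through the CRI client lets the test compare the io.kubernetes.container.name label exactly instead of substring-matching docker's generated names; a hedged sketch (the helper name is illustrative):

```go
package e2enode // hypothetical placement for illustration

import (
	internalapi "k8s.io/kubernetes/pkg/kubelet/apis/cri"
	kubelettypes "k8s.io/kubernetes/pkg/kubelet/types"
)

// runningContainerNames returns the Kubernetes container names recorded in
// the CRI labels, which match the names given in the pod spec exactly.
func runningContainerNames(rs internalapi.RuntimeService) ([]string, error) {
	containers, err := rs.ListContainers(nil)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(containers))
	for _, c := range containers {
		names = append(names, c.Labels[kubelettypes.KubernetesContainerNameLabel])
	}
	return names, nil
}
```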

@balajismaniam
Member

balajismaniam left a comment

The changes to CPU manager tests LGTM except for the comment.


setOldKubeletConfig(f, oldCfg)
})
}

// Serial because the test updates kubelet configuration.
-var _ = SIGDescribe("CPU Manager [Feature:CPUManager]", func() {
+var _ = SIGDescribe("CPU Manager [Serial] [Feature:CPUManager]", func() {

@balajismaniam

balajismaniam Jan 8, 2018

Member

The [Serial] tag was removed as the CPU manager tests are run as a separate job in the CI/CD env. We can add it back when we want to run it as a part of the node e2e serial tests.

@Random-Liu

Random-Liu Jan 8, 2018

Member

@balajismaniam Actually, the test does run in the regular node e2e suite today (search "CPU Manager" in https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet/9809/build-log.txt); it is just skipped because the node has fewer than 2 CPUs.
Given that, I'd prefer to run (and skip) it in the Serial test suite.

@balajismaniam

balajismaniam Jan 9, 2018

Member

Alright. Sounds good.
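
For context, the skip Random-Liu describes presumably boils down to a CPU-capacity guard along these lines (helper name and node lookup are assumptions, not the test's exact code):

```go
package e2enode // hypothetical placement for illustration

import (
	"k8s.io/api/core/v1"
	"k8s.io/kubernetes/test/e2e/framework"
)

// skipIfFewerThanTwoCPUs skips the CPU manager test on nodes with fewer than
// two CPUs, which is why the regular node e2e suite tolerates it today.
func skipIfFewerThanTwoCPUs(node *v1.Node) {
	cpuCap := node.Status.Capacity[v1.ResourceCPU]
	if cpuCap.Value() < 2 {
		framework.Skipf("Skipping CPU Manager test: node has %d CPU(s), need at least 2", cpuCap.Value())
	}
}
```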

@Random-Liu Random-Liu force-pushed the Random-Liu:node-e2e-non-docker-specific branch from 707c629 to 21f0bb5 Jan 8, 2018

@Random-Liu

Member

Random-Liu commented Jan 8, 2018

@yujuhong Addressed comments.

@Random-Liu

Member

Random-Liu commented Jan 8, 2018

Note that the new commit includes PR #53865 to make the hack/make-rules/test-e2e-node.sh change cleaner.

Will squash the new commit before merging.

@Random-Liu Random-Liu force-pushed the Random-Liu:node-e2e-non-docker-specific branch from 21f0bb5 to 8ca324f Jan 8, 2018

@yujuhong
Contributor

yujuhong left a comment

Looks good w/ some nits.

//https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go#L517
imageManagerEndpoint = framework.TestContext.ImageServiceEndpoint
}
i, err := remote.NewRemoteImageService(imageManagerEndpoint, connectionTimeout)

@yujuhong

yujuhong Jan 9, 2018

Contributor

Do we use this to pre-pull the images and is 2 min enough?

if framework.TestContext.ImageServiceEndpoint != "" {
//ImageServiceEndpoint is the same as ContainerRuntimeEndpoint if not
//explicitly specified
//https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go#L517

@yujuhong

yujuhong Jan 9, 2018

Contributor

Change the link to point to versioned code, or just remove the link completely.

@Random-Liu

Random-Liu Jan 9, 2018

Member

Will remove it.

@yujuhong
Contributor

yujuhong commented Jan 9, 2018

LGTM.
Squash the commits?

@Random-Liu Random-Liu force-pushed the Random-Liu:node-e2e-non-docker-specific branch from 1e15a00 to a3949bf Jan 9, 2018

@Random-Liu

Member

Random-Liu commented Jan 9, 2018

@yujuhong Done
@yujuhong @dashpole @balajismaniam Thanks for reviewing! :)

@yujuhong

Contributor

yujuhong commented Jan 9, 2018

/lgtm

You still need approvers!

@k8s-ci-robot k8s-ci-robot added the lgtm label Jan 9, 2018

@dashpole

Contributor

dashpole commented Jan 9, 2018

/lgtm

ContainerRuntime string
ContainerRuntimeEndpoint string
ContainerRuntimeProcessName string
ContainerRuntimePidFile string

@miaoyq

miaoyq Jan 9, 2018

Member

Maybe ContainerRuntimeProcessName and ContainerRuntimePidFile need to be lists or a map, since some runtimes consist of several binaries that belong to different cgroups (e.g. containerd and cri-containerd).

@Random-Liu

Random-Liu Jan 9, 2018

Member

@miaoyq You can do that in your benchmark PR if you find that is required. :)
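
For what it's worth, miaoyq's suggestion might look something like the sketch below (type, field, and variable names are hypothetical):

```go
package framework // hypothetical placement for illustration

// RuntimeProcess describes one binary that belongs to the container runtime,
// so a benchmark can monitor, say, both containerd and cri-containerd.
type RuntimeProcess struct {
	Name    string // process name, e.g. "containerd"
	PidFile string // optional pid file; empty means look the process up by name
}

// ContainerRuntimeProcesses would replace the single ContainerRuntimeProcessName
// and ContainerRuntimePidFile fields in the test context.
var ContainerRuntimeProcesses []RuntimeProcess
```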

@dchen1107

Member

dchen1107 commented Jan 9, 2018

/lgtm

Please rebase it.

@k8s-ci-robot

Contributor

k8s-ci-robot commented Jan 9, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, dchen1107, Random-Liu, yujuhong

Associated issue: #57977

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@Random-Liu Random-Liu force-pushed the Random-Liu:node-e2e-non-docker-specific branch from a3949bf to e05a5b9 Jan 9, 2018

@Random-Liu

Member

Random-Liu commented Jan 9, 2018

Just did a rebase; reapplying the LGTM label.

@Random-Liu Random-Liu added the lgtm label Jan 9, 2018

@Random-Liu

Member

Random-Liu commented Jan 9, 2018

/test pull-kubernetes-unit

@k8s-merge-robot

Contributor

k8s-merge-robot commented Jan 10, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@Random-Liu

Member

Random-Liu commented Jan 10, 2018

/test pull-kubernetes-bazel-build

@k8s-merge-robot

Contributor

k8s-merge-robot commented Jan 10, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit ecd525d into kubernetes:master Jan 10, 2018

12 of 13 checks passed

Submit Queue Required Github CI test is not green: pull-kubernetes-verify
cla/linuxfoundation Random-Liu authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-gke-gci Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
pull-kubernetes-kubemark-e2e-gce Job succeeded.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-unit Job succeeded.
pull-kubernetes-verify Job succeeded.

@Random-Liu Random-Liu deleted the Random-Liu:node-e2e-non-docker-specific branch Jan 10, 2018

vdemeester pushed a commit to vdemeester/kubernetes that referenced this pull request Feb 7, 2018

Merge pull request kubernetes#59472 from hanxiaoshuai/fixtodo02072
Automatic merge from submit-queue (batch tested with PRs 59276, 51042, 58973, 59377, 59472). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

clean up unused function GetKubeletDockerContainers

**What this PR does / why we need it**:
fix todo: function GetKubeletDockerContainers is now unused; it was migrated off in test/e2e_node/garbage_collector_test.go in [kubernetes#57976](https://github.com/kubernetes/kubernetes/pull/57976/files)
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```