
fix(test::npd): provide NPD with proper kubeconfig #96262

Merged
merged 1 commit into kubernetes:master from knight42:fix/npd-test on Nov 12, 2020

Conversation

knight42 (Member) commented Nov 5, 2020

Signed-off-by: knight42 <anonymousknight96@gmail.com>

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

The ServiceAccount admission controller is disabled in node e2e tests:

o.Admission.GenericAdmission.DisablePlugins = []string{"ServiceAccount", "TaintNodesByCondition"}

So we have to create a kubeconfig for NPD in the test.
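
For illustration, a minimal sketch of the idea (not the merged diff; the exact TLS and auth fields shown here are assumptions): the test can render a kubeconfig that points NPD at the e2e apiserver host and mount it into the pod.

// Hedged sketch: build a kubeconfig for NPD from the e2e framework's apiserver
// address; insecure-skip-tls-verify and the absence of credentials are illustrative.
kubeConfig := fmt.Sprintf(`apiVersion: v1
kind: Config
clusters:
- cluster:
    insecure-skip-tls-verify: true
    server: %s
  name: local
contexts:
- context:
    cluster: local
  name: local-context
current-context: local-context
`, framework.TestContext.Host)

The rendered file can then be mounted into the NPD pod and passed through the auth parameter of --apiserver-override.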

Which issue(s) this PR fixes:

Fixes #95955

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 5, 2020
knight42 (Member, Author) commented Nov 5, 2020

/priority important-soon
/cc @tosi3k @Random-Liu

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 5, 2020
karan (Contributor) commented Nov 5, 2020

/test pull-kubernetes-bazel-test

Thanks for this fix. Were you able to verify that it actually fixes the test using a local/remote run?

karan (Contributor) commented Nov 5, 2020

/assign @wangzhen127

knight42 (Member, Author) commented Nov 6, 2020

Thanks for this fix. Were you able to verify that it actually fixes the test using a local/remote run?

Unfortunately I was unable to verify due to lack of a testing environment.

wangzhen127 (Member) commented

@karan Are you able to try this PR with the NPD e2e tests? I don't have a setup myself. Thought you may have it already. Do you mind helping verify?

karan (Contributor) commented Nov 6, 2020

Sure - let me see if I can get it to work:

$ make test-e2e-node REMOTE=true FOCUS="\[NodeFeature:NodeProblemDetector\]"


Ran 0 of 331 Specs in 112.196 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 331 Skipped

karan (Contributor) commented Nov 6, 2020

Doesn't seem to work:

$ make test-e2e-node FOCUS="NodeProblemDetector" SKIP="" REMOTE=true CLEANUP=true DELETE_INSTANCES=true EXTRA_ENVS="NODE_PROBLEM_DETECTOR_IMAGE=k8s.gcr.io/node-problem-detector:v0.8.1"

[BeforeEach] [k8s.io] NodeProblemDetector [NodeFeature:NodeProblemDetector] [Serial]
  /usr/local/google/home/karangoel/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:174
STEP: Creating a kubernetes client
STEP: Building a namespace api object, basename node-problem-detector
Nov  6 17:35:27.348: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
Nov  6 17:35:27.348: INFO: Skipping waiting for service account
[BeforeEach] [k8s.io] NodeProblemDetector [NodeFeature:NodeProblemDetector] [Serial]
  _output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:58
STEP: Using node-problem-detector image: k8s.gcr.io/node-problem-detector:v0.8.1
[BeforeEach] [k8s.io] SystemLogMonitor
  _output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:103
STEP: Calculate Lookback duration
STEP: Generate event list options
STEP: Create config map for the node problem detector
STEP: Create the node problem detector
[It] should generate node condition and events for corresponding errors
  _output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:283
STEP: should generate default node condition
STEP: Wait for 0 temp events generated
STEP: Wait for 0 total events generated
STEP: Make sure only 0 total events generated
STEP: Make sure node condition "TestCondition" is set
[AfterEach] [k8s.io] SystemLogMonitor
  _output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:406
STEP: Get node problem detector log
Nov  6 17:36:36.419: INFO: Node Problem Detector logs:

STEP: Delete the node problem detector
STEP: Wait for the node problem detector to disappear
Nov  6 17:36:36.435: INFO: Waiting for pod node-problem-detector-bcda4aff-9b14-4107-bdc0-93175c707618 to disappear
Nov  6 17:36:36.441: INFO: Pod node-problem-detector-bcda4aff-9b14-4107-bdc0-93175c707618 no longer exists
STEP: Delete the config map
STEP: Clean up the events
STEP: Clean up the node condition
[AfterEach] [k8s.io] NodeProblemDetector [NodeFeature:NodeProblemDetector] [Serial]
  /usr/local/google/home/karangoel/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:175
STEP: Collecting events from namespace "node-problem-detector-9364".
STEP: Found 4 events.
Nov  6 17:36:36.454: INFO: At 2020-11-06 17:35:28 +0000 UTC - event for node-problem-detector-bcda4aff-9b14-4107-bdc0-93175c707618: {kubelet test-cos-beta-85-13310-1041-1} Pulled: Container image "k8s.gcr.io/node-problem-detector:v0.8.1" already present on machine
Nov  6 17:36:36.456: INFO: At 2020-11-06 17:35:28 +0000 UTC - event for node-problem-detector-bcda4aff-9b14-4107-bdc0-93175c707618: {kubelet test-cos-beta-85-13310-1041-1} Created: Created container node-problem-detector-bcda4aff-9b14-4107-bdc0-93175c707618
Nov  6 17:36:36.456: INFO: At 2020-11-06 17:35:28 +0000 UTC - event for node-problem-detector-bcda4aff-9b14-4107-bdc0-93175c707618: {kubelet test-cos-beta-85-13310-1041-1} Started: Started container node-problem-detector-bcda4aff-9b14-4107-bdc0-93175c707618
Nov  6 17:36:36.457: INFO: At 2020-11-06 17:35:30 +0000 UTC - event for node-problem-detector-bcda4aff-9b14-4107-bdc0-93175c707618: {kubelet test-cos-beta-85-13310-1041-1} BackOff: Back-off restarting failed container
Nov  6 17:36:36.458: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Nov  6 17:36:36.458: INFO:
Nov  6 17:36:36.467: INFO:
Logging node info for node test-cos-beta-85-13310-1041-1
Nov  6 17:36:36.470: INFO: Node Info: &Node{ObjectMeta:{test-cos-beta-85-13310-1041-1    79c47e1a-ab69-43ff-92f2-3e033f1fda96 63 0 2020-11-06 17:35:16 +0000 UTC <nil> <nil> map[kubernetes.io/arch:amd64 kubernetes.io/hostname:test-cos-beta-85-13310-1041-1 kubernetes.io/os:linux] map[volumes.kubernetes.io/controller-managed-attach-detach:true] [] []  [{kubelet Upd
ate v1 2020-11-06 17:35:27 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:volumes.kubernetes.io/controller-managed-attach-detach":{}},"f:labels":{".":{},"f:kubernetes.io/arch":{},"f:kubernetes.io/hostname":{},"f:kubernetes.io/os":{}}},"f:status":{"f:addresses":{".":{},"k:{\"type\":\"Hostname\"}":{".":{},"f:address":{},"f:type":{}},"k:{\"type\":\"In
ternalIP\"}":{".":{},"f:address":{},"f:type":{}}},"f:allocatable":{".":{},"f:cpu":{},"f:ephemeral-storage":{},"f:hugepages-2Mi":{},"f:memory":{},"f:pods":{}},"f:capacity":{".":{},"f:cpu":{},"f:ephemeral-storage":{},"f:hugepages-2Mi":{},"f:memory":{},"f:pods":{}},"f:conditions":{".":{},"k:{\"type\":\"DiskPressure\"}":{".":{},"f:lastHeartbeatTime":{},"f:lastTransi
tionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}},"k:{\"type\":\"MemoryPressure\"}":{".":{},"f:lastHeartbeatTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}},"k:{\"type\":\"PIDPressure\"}":{".":{},"f:lastHeartbeatTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}},
"k:{\"type\":\"Ready\"}":{".":{},"f:lastHeartbeatTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}}},"f:config":{},"f:daemonEndpoints":{"f:kubeletEndpoint":{"f:Port":{}}},"f:images":{},"f:nodeInfo":{"f:architecture":{},"f:bootID":{},"f:containerRuntimeVersion":{},"f:kernelVersion":{},"f:kubeProxyVersion":{},"f:kubeletVersi
on":{},"f:machineID":{},"f:operatingSystem":{},"f:osImage":{},"f:systemUUID":{}}}}}]},Spec:NodeSpec{PodCIDR:,DoNotUseExternalID:,ProviderID:,Unschedulable:false,Taints:[]Taint{},ConfigSource:nil,PodCIDRs:[],},Status:NodeStatus{Capacity:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},ephemeral-storage: {{16684785664 0} {<nil>}  BinarySI},hugepages-2Mi: {{0 0} {<nil
>} 0 DecimalSI},memory: {{3866849280 0} {<nil>} 3776220Ki BinarySI},pods: {{110 0} {<nil>} 110 DecimalSI},},Allocatable:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},ephemeral-storage: {{15016307073 0} {<nil>} 15016307073 DecimalSI},hugepages-2Mi: {{0 0} {<nil>} 0 DecimalSI},memory: {{3604705280 0} {<nil>} 3520220Ki BinarySI},pods: {{110 0} {<nil>} 110 DecimalSI
},},Phase:,Conditions:[]NodeCondition{NodeCondition{Type:MemoryPressure,Status:False,LastHeartbeatTime:2020-11-06 17:35:26 +0000 UTC,LastTransitionTime:2020-11-06 17:35:15 +0000 UTC,Reason:KubeletHasSufficientMemory,Message:kubelet has sufficient memory available,},NodeCondition{Type:DiskPressure,Status:False,LastHeartbeatTime:2020-11-06 17:35:26 +0000 UTC,LastT
ransitionTime:2020-11-06 17:35:15 +0000 UTC,Reason:KubeletHasNoDiskPressure,Message:kubelet has no disk pressure,},NodeCondition{Type:PIDPressure,Status:False,LastHeartbeatTime:2020-11-06 17:35:26 +0000 UTC,LastTransitionTime:2020-11-06 17:35:15 +0000 UTC,Reason:KubeletHasSufficientPID,Message:kubelet has sufficient PID available,},NodeCondition{Type:Ready,Statu
s:True,LastHeartbeatTime:2020-11-06 17:35:26 +0000 UTC,LastTransitionTime:2020-11-06 17:35:26 +0000 UTC,Reason:KubeletReady,Message:kubelet is posting ready status. AppArmor enabled,},},Addresses:[]NodeAddress{NodeAddress{Type:InternalIP,Address:10.128.0.7,},NodeAddress{Type:Hostname,Address:test-cos-beta-85-13310-1041-1,},},DaemonEndpoints:NodeDaemonEndpoints{K
ubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:82bc503c9002d7b8d865e9702216b51e,SystemUUID:82bc503c-9002-d7b8-d865-e9702216b51e,BootID:1aa7026c-1f21-41f3-9cb3-28d0ff70b1fe,KernelVersion:5.4.49+,OSImage:Container-Optimized OS from Google,ContainerRuntimeVersion:docker://19.3.9,KubeletVersion:v1.20.0-beta.1.113+56f90bebc70753,KubePr
oxyVersion:v1.20.0-beta.1.113+56f90bebc70753,OperatingSystem:linux,Architecture:amd64,},Images:[]ContainerImage{ContainerImage{Names:[perl@sha256:782dcd48bab11e07a017af09c1a6f54d34dbe1e262d82936f43f0e5d21055c38 perl:5.26],SizeBytes:853285759,},ContainerImage{Names:[gcr.io/kubernetes-e2e-test-images/node-perf/tf-wide-deep-amd64@sha256:80d4564d5ab49ecfea3b20f75cc6
76d8dfd8b2aca364ed4c1a8a55fbcaaed7f6 gcr.io/kubernetes-e2e-test-images/node-perf/tf-wide-deep-amd64:1.0],SizeBytes:634170972,},ContainerImage{Names:[gcr.io/kubernetes-e2e-test-images/volume/gluster@sha256:e2d3308b2d27499d59f120ff46dfc6c4cb307a3f207f02894ecab902583761c9 gcr.io/kubernetes-e2e-test-images/volume/gluster:1.0],SizeBytes:332011484,},ContainerImage{Nam
es:[gcr.io/kubernetes-e2e-test-images/volume/nfs@sha256:c2ad734346f608a5f7d69cfded93c4e8094069320657bd372d12ba21dea3ea71 gcr.io/kubernetes-e2e-test-images/volume/nfs:1.0],SizeBytes:225358913,},ContainerImage{Names:[httpd@sha256:eb8ccf084cf3e80eece1add239effefd171eb39adbc154d33c14260d905d4060 httpd:2.4.38-alpine],SizeBytes:123781643,},ContainerImage{Names:[k8s.gc
r.io/e2e-test-images/agnhost@sha256:ab055cd3d45f50b90732c14593a5bf50f210871bb4f91994c756fc22db6d922a k8s.gcr.io/e2e-test-images/agnhost:2.21],SizeBytes:113879107,},ContainerImage{Names:[k8s.gcr.io/node-problem-detector@sha256:d32558aad2dd3fad2d6650a9567ab23c8fd0a9b3cf21615a64b5ad4571861beb k8s.gcr.io/node-problem-detector:v0.8.1],SizeBytes:108876863,},ContainerI
mage{Names:[k8s.gcr.io/node-problem-detector@sha256:6e9b4a4eaa47f120be61f60573a545844de63401661812e2cfb7ae81a28efd19 k8s.gcr.io/node-problem-detector:v0.6.2],SizeBytes:98707739,},ContainerImage{Names:[gcr.io/kubernetes-e2e-test-images/node-perf/npb-is@sha256:9d08dd99565b25af37c990cd4474a4284b27e7ceb3f98328bb481edefedf8aa5 gcr.io/kubernetes-e2e-test-images/node-p
erf/npb-is:1.0],SizeBytes:96288249,},ContainerImage{Names:[gcr.io/kubernetes-e2e-test-images/node-perf/npb-ep@sha256:564314549347619cfcdbe6c7d042a29e133a00e922b37682890fff17ac1a7804 gcr.io/kubernetes-e2e-test-images/node-perf/npb-ep:1.0],SizeBytes:96286449,},ContainerImage{Names:[google/cadvisor@sha256:815386ebbe9a3490f38785ab11bda34ec8dacf4634af77b8912832d4f85d
ca04 google/cadvisor:latest],SizeBytes:69583040,},ContainerImage{Names:[gcr.io/kubernetes-e2e-test-images/nonroot@sha256:4bd7ae247de5c988700233c5a4b55e804ffe90f8c66ae64853f1dae37b847213 gcr.io/kubernetes-e2e-test-images/nonroot:1.0],SizeBytes:42321438,},ContainerImage{Names:[nfvpe/sriov-device-plugin@sha256:518499ed631ff84b43153b8f7624c1aaacb75a721038857509fe690
abdf62ddb nfvpe/sriov-device-plugin:v3.1],SizeBytes:25318421,},ContainerImage{Names:[k8s.gcr.io/nvidia-gpu-device-plugin@sha256:4b036e8844920336fa48f36edeb7d4398f426d6a934ba022848deed2edbf09aa],SizeBytes:18981551,},ContainerImage{Names:[nginx@sha256:485b610fefec7ff6c463ced9623314a04ed67e3945b9c08d7e53a47f6d108dc7 nginx:1.14-alpine],SizeBytes:16032814,},Containe$
Image{Names:[gcr.io/kubernetes-e2e-test-images/ipc-utils@sha256:bb127be3a1ecac0516f672a5e223d94fe6021021534ecb7a02a607a63154c3d8 gcr.io/kubernetes-e2e-test-images/ipc-utils:1.0],SizeBytes:10039224,},ContainerImage{Names:[gcr.io/kubernetes-e2e-test-images/nonewprivs@sha256:10066e9039219449fe3c81f38fe01928f87914150768ab81b62a468e51fa7411 gcr.io/kubernetes-e2e-test
-images/nonewprivs:1.0],SizeBytes:6757579,},ContainerImage{Names:[k8s.gcr.io/stress@sha256:f00aa1ddc963a3164aef741aab0fc05074ea96de6cd7e0d10077cf98dd72d594 k8s.gcr.io/stress:v1],SizeBytes:5494760,},ContainerImage{Names:[busybox@sha256:8ccbac733d19c0dd4d70b4f0c1e12245b5fa3ad24758a11035ee505c629c0796 busybox:1.29],SizeBytes:1154361,},ContainerImage{Names:[k8s.gcr.
io/busybox@sha256:4bdd623e848417d96127e16037743f0cd8b528c026e9175e22a84f639eca58ff],SizeBytes:1113554,},ContainerImage{Names:[k8s.gcr.io/pause@sha256:927d98197ec1141a368550822d18fa1c60bdae27b78b0c004f705f548c07814f k8s.gcr.io/pause:3.2],SizeBytes:682696,},},VolumesInUse:[],VolumesAttached:[]AttachedVolume{},Config:&NodeConfigStatus{Assigned:nil,Active:nil,LastKn
ownGood:nil,Error:,},},}
Nov  6 17:36:36.470: INFO:
Logging kubelet events for node test-cos-beta-85-13310-1041-1
Nov  6 17:36:36.474: INFO:
Logging pods the kubelet thinks is on node test-cos-beta-85-13310-1041-1
W1106 17:36:36.483563   25346 metrics_grabber.go:83] Can't find any pods in namespace kube-system to grab metrics from
W1106 17:36:36.483575   25346 metrics_grabber.go:98] Can't find kube-scheduler pod. Grabbing metrics from kube-scheduler is disabled.
W1106 17:36:36.483579   25346 metrics_grabber.go:102] Can't find kube-controller-manager pod. Grabbing metrics from kube-controller-manager is disabled.
W1106 17:36:36.483594   25346 metrics_grabber.go:105] Did not receive an external client interface. Grabbing metrics from ClusterAutoscaler is disabled.
Nov  6 17:36:36.512: INFO:
Latency metrics for node test-cos-beta-85-13310-1041-1
Nov  6 17:36:36.512: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "node-problem-detector-9364" for this suite.


• Failure [69.263 seconds]
[k8s.io] NodeProblemDetector [NodeFeature:NodeProblemDetector] [Serial]
/usr/local/google/home/karangoel/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:624
  [k8s.io] SystemLogMonitor
  /usr/local/google/home/karangoel/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:624
    should generate node condition and events for corresponding errors [It]
    _output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:283

    Timed out after 60.000s.
    Expected success, but got an error:
        <*errors.errorString | 0xc000a053d0>: {
            s: "node condition \"TestCondition\" not found",
        }
        node condition "TestCondition" not found

    _output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:398
------------------------------
I1106 17:36:36.548029   25339 e2e_node_suite_test.go:224] Stopping node services...
I1106 17:36:36.548078   25339 server.go:257] Kill server "services"
I1106 17:36:36.548096   25339 server.go:294] Killing process 25592 (services) with -TERM
I1106 17:36:36.640761   25339 server.go:257] Kill server "kubelet"
I1106 17:36:36.652352   25339 services.go:156] Fetching log files...
I1106 17:36:36.652395   25339 services.go:165] Get log file "kern.log" with journalctl command [-k].
I1106 17:36:36.661790   25339 services.go:165] Get log file "cloud-init.log" with journalctl command [-u cloud*].
I1106 17:36:36.952476   25339 services.go:165] Get log file "docker.log" with journalctl command [-u docker].
I1106 17:36:36.955956   25339 services.go:165] Get log file "containerd.log" with journalctl command [-u containerd].
I1106 17:36:36.960153   25339 services.go:165] Get log file "kubelet.log" with journalctl command [-u kubelet-20201106T093456.service].
I1106 17:36:36.990554   25339 e2e_node_suite_test.go:229] Tests Finished



Summarizing 1 Failure:

[Fail] [k8s.io] NodeProblemDetector [NodeFeature:NodeProblemDetector] [Serial] [k8s.io] SystemLogMonitor [It] should generate node condition and events for corresponding errors
_output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:398

Ran 1 of 330 Specs in 91.081 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 329 Skipped

karan (Contributor) commented Nov 6, 2020

NPD container is crash looping (no logs or anything to indicate why in NPD itself):

Nov 06 19:07:29 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:29.785802   33552 handler.go:325] Added event &{/kubepods/besteffort/pod9b85f56c-8e2b-466b-9fbe-57a65aa19197/11398c52d1a368c0fbf82a44def1f297e994ea7b2d3f5206c2675d504ea70b59 2020-11-06 19:07:29.645023021 +0000 UTC containerCreation {<nil>}}
Nov 06 19:07:29 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:29.785842   33552 container.go:490] Start housekeeping for container "/kubepods/besteffort/pod9b85f56c-8e2b-466b-9fbe-57a65aa19197/11398c52d1a368c0fbf82a44def1f297e994ea7b2d3f5206c2675d504ea70b59"
Nov 06 19:07:29 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:29.786602   33552 handler.go:125] Unable to get network stats from pid 33873: couldn't read network stats: failure opening /proc/33873/net/dev: open /proc/33873/net/dev: no such file or directory
Nov 06 19:07:29 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:29.786646   33552 handler.go:260] error while listing directory "/proc/33873/limits" to read ulimits: open /proc/33873/limits: no such file or directory
Nov 06 19:07:29 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:29.805266   33552 handler.go:295] error while reading "/proc/33552/fd/44" link: readlink /proc/33552/fd/44: no such file or directory
Nov 06 19:07:29 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:29.824708   33552 manager.go:1044] Destroyed container: "/kubepods/besteffort/pod9b85f56c-8e2b-466b-9fbe-57a65aa19197/11398c52d1a368c0fbf82a44def1f297e994ea7b2d3f5206c2675d504ea70b59" (aliases: [k8s_node-problem-detector-85d2bb05-2de1-4362-8f39-a3e2ae791dd2_node-problem-detector-85d2bb05-2de1-4362-8f39-a3e2ae791dd2_node-problem-detector-6205_9b85f56c-8e2b-466b-9fbe-57a65aa19197_1 11398c52d1a368c0fbf82a44def1f297e994ea7b2d3f5206c2675d504ea70b59], namespace: "docker")
Nov 06 19:07:29 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:29.824767   33552 handler.go:325] Added event &{/kubepods/besteffort/pod9b85f56c-8e2b-466b-9fbe-57a65aa19197/11398c52d1a368c0fbf82a44def1f297e994ea7b2d3f5206c2675d504ea70b59 2020-11-06 19:07:29.824760482 +0000 UTC m=+13.964625926 containerDeletion {<nil>}}
Nov 06 19:07:29 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:29.845465   33552 httplog.go:89] "HTTP" verb="HEAD" URI="/healthz" latency="37.055µs" userAgent="Go-http-client/1.1" srcIP="127.0.0.1:36012" resp=200
Nov 06 19:07:30 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:30.521293   33552 kubelet.go:1965] SyncLoop (housekeeping)
Nov 06 19:07:30 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:30.636269   33552 generic.go:155] GenericPLEG: 9b85f56c-8e2b-466b-9fbe-57a65aa19197/11398c52d1a368c0fbf82a44def1f297e994ea7b2d3f5206c2675d504ea70b59: non-existent -> exited
Nov 06 19:07:30 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:30.637485   33552 kuberuntime_manager.go:944] getSandboxIDByPodUID got sandbox IDs ["0ea75394017103a23748eea3bae5a247aebba21458ef9557bd3dc1a6a9155d0a"] for pod "node-problem-detector-85d2bb05-2de1-4362-8f39-a3e2ae791dd2_node-problem-detector-6205(9b85f56c-8e2b-466b-9fbe-57a65aa19197)"
Nov 06 19:07:30 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:30.643453   33552 generic.go:386] PLEG: Write status for node-problem-detector-85d2bb05-2de1-4362-8f39-a3e2ae791dd2/node-problem-detector-6205: &container.PodStatus{ID:"9b85f56c-8e2b-466b-9fbe-57a65aa19197", Name:"node-problem-detector-85d2bb05-2de1-4362-8f39-a3e2ae791dd2", Namespace:"node-problem-detector-6205", IPs:[]string{}, ContainerStatuses:[]*container.Status{(*container.Status)(0xc000fa4690), (*container.Status)(0xc000fa4780)}, SandboxStatuses:[]*v1alpha2.PodSandboxStatus{(*v1alpha2.PodSandboxStatus)(0xc000aa1080)}} (err: <nil>)
Nov 06 19:07:30 test-cos-beta-85-13310-1041-1 kubelet[33552]: I1106 19:07:30.643522   33552 kubelet.go:1920] SyncLoop (PLEG): "node-problem-detector-85d2bb05-2de1-4362-8f39-a3e2ae791dd2_node-problem-detector-6205(9b85f56c-8e2b-466b-9fbe-57a65aa19197)", event: &pleg.PodLifecycleEvent{ID:"9b85f56c-8e2b-466b-9fbe-57a65aa19197", Type:"ContainerDied", Data:"11398c52d1a368c0fbf82a44def1f297e994ea7b2d3f5206c2675d504ea70b59"}

karan (Contributor) commented Nov 6, 2020

So far I've been unable to figure out why NPD is unable to start up during the test:

$ docker ps -a
CONTAINER ID        IMAGE                  COMMAND                  CREATED              STATUS                          PORTS               NAMES
3a762fdb0e31        6abafd7e83b9           "sh -c 'touch /log/t…"   About a minute ago   Exited (0) About a minute ago                       k8s_node-problem-detector-5de72669-39f4-4c31-afa6-6a4cda98040b_node-problem-detector-5de72669-39f4-4c31-afa6-6a4cda98040b_node-problem-detector-6420_72fe0b9a-6ab4-4af0-9fb4-9bd6fe528445_3
95540e48f578        k8s.gcr.io/pause:3.2   "/pause"                 2 minutes ago        Up 2 minutes                                        k8s_POD_node-problem-detector-5de72669-39f4-4c31-afa6-6a4cda98040b_node-problem-detector-6420_72fe0b9a-6ab4-4af0-9fb4-9bd6fe528445_0

There are no container or pod logs on the host. The various expected configs exist in the right place. I've ruled out resource constraints (using an n1-standard-2 machine instead of a 1-core machine).

I've also ruled out permissions issues by making sure that the /log/test.log file actually exists. So we know the container does begin to start up and creates the test file, but then for some reason NPD does not start.

Finally, I tried running NPD manually:

docker run -a stdin -a stdout -a stderr -i -t -v "/var/lib/kubelet/pods/72fe0b9a-6ab4-4af0-9fb4-9bd6fe528445/volumes/kubernetes.io~configmap/config:/config:ro" -v "/var/lib/kubelet/pods/72fe0b9a-6ab4-4af0-9fb4-9bd6fe528445/volumes/kubernetes.io~empty-dir/:/log" k8s.gcr.io/node-problem-detector:v0.8.1 -- /node-problem-detector --logtostderr --system-log-monitors=/config/testconfig.json --apiserver-override=https://127.0.0.1:6443?inClusterConfig=false&auth=/config/kubeconfig

This fails because of a missing config. Weird because it should be using testconfig.json so maybe my docker run command is incorrect?

$ docker logs aa1e0423b02b
Flag --system-log-monitors has been deprecated, replaced by --config.system-log-monitor. NPD will panic if both --system-log-monitors and --config.system-log-monitor are set.
F1106 21:53:26.682301       1 log_monitor.go:67] Failed to read configuration file "/config/kernel-monitor.json": open /config/kernel-monitor.json: no such file or directory
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc0004be900, 0xc00000ab00, 0xae, 0xfc)
        /usr/local/google/home/lantaol/workspace/src/k8s.io/node-problem-detector/vendor/github.com/golang/glog/glog.go:769 +0xb1
github.com/golang/glog.(*loggingT).output(0x25451e0, 0xc000000003, 0xc0004bd030, 0x24be80a, 0xe, 0x43, 0x0)
        /usr/local/google/home/lantaol/workspace/src/k8s.io/node-problem-detector/vendor/github.com/golang/glog/glog.go:720 +0x2f6
github.com/golang/glog.(*loggingT).printf(0x25451e0, 0x3, 0x1634b3e, 0x28, 0xc00020fa48, 0x2, 0x2)
        /usr/local/google/home/lantaol/workspace/src/k8s.io/node-problem-detector/vendor/github.com/golang/glog/glog.go:655 +0x14e
github.com/golang/glog.Fatalf(...)
        /usr/local/google/home/lantaol/workspace/src/k8s.io/node-problem-detector/vendor/github.com/golang/glog/glog.go:1148
k8s.io/node-problem-detector/pkg/systemlogmonitor.NewLogMonitorOrDie(0xc000043b60, 0x1b, 0x1618549, 0x12)
        /usr/local/google/home/lantaol/workspace/src/k8s.io/node-problem-detector/pkg/systemlogmonitor/log_monitor.go:67 +0x910
k8s.io/node-problem-detector/pkg/problemdaemon.NewProblemDaemons(0xc0004be630, 0xc00003c080, 0x6, 0x6)
        /usr/local/google/home/lantaol/workspace/src/k8s.io/node-problem-detector/pkg/problemdaemon/problem_daemon.go:64 +0x292
main.main()
        /usr/local/google/home/lantaol/workspace/src/k8s.io/node-problem-detector/cmd/nodeproblemdetector/node_problem_detector.go:53 +0xfe

Regardless, at least there's some logging happening. We don't get anything when running in the e2e test.

So I'm out of ideas for now.

knight42 (Member, Author) commented Nov 7, 2020

@karan Thanks for your effort, much appreciated! I think we are very close.

As for the docker run command, I think you have to change the entrypoint since its original entrypoint has been hardcoded as ["/node-problem-detector", "--system-log-monitors=/config/kernel-monitor.json"] :
https://github.com/kubernetes/node-problem-detector/blob/f42281ee2658900bdb0571e1159a43f6ab712a19/Dockerfile.in#L30
This is why NPD complains that kernel-monitor.json is not found.

Could you try modifying the entrypoint and run the command again to see what is going on?

karan (Contributor) commented Nov 9, 2020

It works when manually run.

test-cos-beta-85-13310-1041-1 /var/lib/kubelet/pods/5589ef85-413c-4e20-903a-a2554cd3d21c/volumes # docker run --entrypoint "/node-problem-detector" -a stdin -a stdout -a stderr -i -t -v "/var/lib/kubelet/pods/5589ef85-413c-4e20-903a-a2554cd3d21c/volumes/kubernetes.io~configmap/config:/config:ro" -v "/var/lib/kubelet/pods/5589ef85-413c-4e20-903a-a2554cd3d21c/volumes/kubernetes.io~empty-dir/:/log" k8s.gcr.io/node-problem-detector:v0.8.1  --logtostderr --system-log-monitors=/config/kernel-monitor.json --apiserver-override=https://127.0.0.1:6443?inClusterConfig=false&auth=/config/kubeconfig
[1] 3454
test-cos-beta-85-13310-1041-1 /var/lib/kubelet/pods/5589ef85-413c-4e20-903a-a2554cd3d21c/volumes # docker ps
CONTAINER ID        IMAGE                                     COMMAND                  CREATED             STATUS                  PORTS               NAMES
ad03161dbca3        k8s.gcr.io/node-problem-detector:v0.8.1   "/node-problem-detec…"   2 seconds ago       Up Less than a second                       youthful_maxwell
009e833053a5        k8s.gcr.io/pause:3.2                      "/pause"                 29 minutes ago      Up 29 minutes                               k8s_POD_node-problem-detector-0edd5aa6-1f6f-4d26-8462-e16a68ba64ad_node-problem-detector-8816_5589ef85-413c-4e20-903a-a2554cd3d21c_0
test-cos-beta-85-13310-1041-1 /var/lib/kubelet/pods/5589ef85-413c-4e20-903a-a2554cd3d21c/volumes # docker logs ad03161dbca3 -f
Flag --system-log-monitors has been deprecated, replaced by --config.system-log-monitor. NPD will panic if both --system-log-monitors and --config.system-log-monitor are set.
I1109 16:56:33.685503       1 log_monitor.go:79] Finish parsing log monitor config file /config/kernel-monitor.json: {WatcherConfig:{Plugin:filelog PluginConfig:map[message:kernel: \[.*\] (.*) timestamp:^.{15} timestampFormat:Jan _2 15:04:05] LogPath:/log/test.log Lookback:1h2m29.310535737s Delay:} BufferSize:10 Source:kernel-monitor-0edd5aa6-1f6f-4d26-8462-e16a68ba64ad DefaultConditions:[{Type:TestCondition Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:Default Message:default message}] Rules:[{Type:temporary Condition: Reason:Temporary Pattern:temporary error} {Type:permanent Condition:TestCondition Reason:Permanent1 Pattern:permanent error 1.*} {Type:permanent Condition:TestCondition Reason:Permanent2 Pattern:permanent error 2.*}] EnableMetricsReporting:0x24eb6fc}
I1109 16:56:33.685671       1 log_watchers.go:40] Use log watcher of plugin "filelog"
I1109 16:56:33.696038       1 k8s_exporter.go:54] Waiting for kube-apiserver to be ready (timeout 5m0s)...

Note that I changed the e2e test to write kernel-monitor.json instead of testconfig.json, and commented out the cleanup steps.

So it does work manually, but not when run as a Pod from the e2e test.

karan (Contributor) commented Nov 9, 2020

Okay, I'm a bit closer to figuring out what's up. If I change the PodSpec to this:

Command: []string{"/node-problem-detector"},
Args: []string{
	"--logtostderr",
	fmt.Sprintf("--system-log-monitors=%s", configFile),
	fmt.Sprintf("--apiserver-override=%s?inClusterConfig=false&auth=%s", framework.TestContext.Host, kubeConfigFile),
},

Then at least the container starts (it crashes because /log/test.log doesn't exist, but that's a solvable problem).

So it seems like the issue is with the use of Command. I'll send a fix for this separately.

karan (Contributor) commented Nov 9, 2020

@knight42 #96381 fixes the Pod spec. Once that's submitted, you can rebase your PR and the test will be green:

• [SLOW TEST:92.239 seconds]
[k8s.io] NodeProblemDetector [NodeFeature:NodeProblemDetector] [Serial]
/usr/local/google/home/karangoel/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:624
  [k8s.io] SystemLogMonitor
  /usr/local/google/home/karangoel/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:624
    should generate node condition and events for corresponding errors
    _output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:315
------------------------------
I1109 18:49:04.390509    6755 e2e_node_suite_test.go:224] Stopping node services...
I1109 18:49:04.390540    6755 server.go:257] Kill server "services"
I1109 18:49:04.390558    6755 server.go:294] Killing process 7052 (services) with -TERM
E1109 18:49:04.439423    6755 services.go:95] Failed to stop services: error stopping "services": waitid: no child processes
I1109 18:49:04.439446    6755 server.go:257] Kill server "kubelet"
I1109 18:49:04.450477    6755 services.go:156] Fetching log files...
I1109 18:49:04.450546    6755 services.go:165] Get log file "cloud-init.log" with journalctl command [-u cloud*].
I1109 18:49:04.737151    6755 services.go:165] Get log file "docker.log" with journalctl command [-u docker].
I1109 18:49:04.741245    6755 services.go:165] Get log file "containerd.log" with journalctl command [-u containerd].
I1109 18:49:04.745445    6755 services.go:165] Get log file "kubelet.log" with journalctl command [-u kubelet-20201109T104705.service].
I1109 18:49:04.773150    6755 services.go:165] Get log file "kern.log" with journalctl command [-k].
I1109 18:49:04.780138    6755 e2e_node_suite_test.go:229] Tests Finished


Ran 1 of 330 Specs in 111.176 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 329 Skipped

karan (Contributor) commented Nov 10, 2020

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 10, 2020
"-c",
// `ServiceAccount` admission controller is disabled in node e2e tests, so we could not use
// inClusterConfig here.
fmt.Sprintf("touch %s && /node-problem-detector --logtostderr --system-log-monitors=%s --apiserver-override=%s?inClusterConfig=false&auth=%s",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me some time, but in the end I found the reason why it failed with no errors and no logs.
When you execute the command with sh -c you should be aware of special characters. In your case you have --apiserver-override=%s?inClusterConfig=false&auth=%s, which includes &, and the shell interprets & as a special character, so the final command effectively becomes:

touch %s && \
/node-problem-detector --logtostderr --system-log-monitors=%s --apiserver-override=%s?inClusterConfig=false & \
auth=%s

Since & puts the preceding command in the background, auth=%s is run as a separate command (a plain variable assignment), which always exits with status 0 (at least I think so, I did not check it).

To make it work you just need to escape the & as \&

karan (Contributor) commented:

That's a possibility. I tested it (\& is not a valid escape sequence):

$ git diff
diff --git a/test/e2e_node/node_problem_detector_linux.go b/test/e2e_node/node_problem_detector_linux.go
index 6750a2ceee5..e91046d9434 100644
--- a/test/e2e_node/node_problem_detector_linux.go
+++ b/test/e2e_node/node_problem_detector_linux.go
@@ -240,7 +240,7 @@ current-context: local-context
                                                                "-c",
                                                                // `ServiceAccount` admission controller is disabled in node e2e tests, so we could not use
                                                                // inClusterConfig here.
-                                                               fmt.Sprintf("touch %s && /node-problem-detector --logtostderr --system-log-monitors=%s --apiserver-override=%s?inClusterConfig=false&auth=%s",
+                                                               fmt.Sprintf(`touch %s && /node-problem-detector --logtostderr --system-log-monitors=%s --apiserver-override=%s?inClusterConfig=false&auth=%s`,
                                                                        logFile,
                                                                        configFile,
                                                                        framework.TestContext.Host,

The test still fails - same symptoms - even though the docker entrypoint looks correct:

            "Entrypoint": [
                "sh",
                "-c",
                "touch /log/test.log && /node-problem-detector --logtostderr --system-log-monitors=/config/testconfig.json --apiserver-override=https://127.0.0.1:6443?inClusterConfig=false&auth=/config/kubeconfig"
            ],

Then I tried again with:

$ git diff
diff --git a/test/e2e_node/node_problem_detector_linux.go b/test/e2e_node/node_problem_detector_linux.go
index 6750a2ceee5..3eca64593e9 100644
--- a/test/e2e_node/node_problem_detector_linux.go
+++ b/test/e2e_node/node_problem_detector_linux.go
@@ -231,20 +231,36 @@ current-context: local-context
                                                        },
                                                },
                                        },
-                                       Containers: []v1.Container{
+                                       InitContainers: []v1.Container{
                                                {
-                                                       Name:  name,
-                                                       Image: image,
-                                                       Command: []string{
-                                                               "sh",
+                                                       Name:    "init-log-file",
+                                                       Image:   "debian",
+                                                       Command: []string{"/bin/sh"},
+                                                       Args: []string{
                                                                "-c",
-                                                               // `ServiceAccount` admission controller is disabled in node e2e tests, so we could not use
-                                                               // inClusterConfig here.
-                                                               fmt.Sprintf("touch %s && /node-problem-detector --logtostderr --system-log-monitors=%s --apiserver-override=%s?inClusterConfig=false&auth=%s",
-                                                                       logFile,
-                                                                       configFile,
-                                                                       framework.TestContext.Host,
-                                                                       kubeConfigFile),
+                                                               fmt.Sprintf("touch %s", logFile),
+                                                       },
+                                                       VolumeMounts: []v1.VolumeMount{
+                                                               {
+                                                                       Name:      logVolume,
+                                                                       MountPath: path.Dir(logFile),
+                                                               },
+                                                               {
+                                                                       Name:      localtimeVolume,
+                                                                       MountPath: etcLocaltime,
+                                                               },
+                                                       },
+                                               },
+                                       },
+                                       Containers: []v1.Container{
+                                               {
+                                                       Name:    name,
+                                                       Image:   image,
+                                                       Command: []string{"/node-problem-detector"},
+                                                       Args: []string{
+                                                               "--logtostderr",
+                                                               fmt.Sprintf("--system-log-monitors=%s", configFile),
+                                                               fmt.Sprintf("--apiserver-override=%s?inClusterConfig=false&auth=%s", framework.TestContext.Host, kubeConfigFile),
                                                        },
                                                        Env: []v1.EnvVar{
                                                                {

This works fine:

Ran 1 of 330 Specs in 112.085 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 329 Skipped

The docker command/args look like this:

        "Path": "/node-problem-detector",
        "Args": [
            "--logtostderr",
            "--system-log-monitors=/config/testconfig.json",
            "--apiserver-override=https://127.0.0.1:6443?inClusterConfig=false&auth=/config/kubeconfig"
        ],

So I don't think the escape is the issue (or at least the way I'm escaping).

knight42 (Member, Author) commented:

When you execute the command with sh -c you should be aware of special characters

@cynepco3hahue Ah, I think this makes sense! Because the node-problem-detector was put in the background, the container exited immediately and we got no errors and no logs. Thanks for enlightening us.

knight42 (Member, Author) commented Nov 11, 2020:

@karan I think the point is that sh treats & in a special way, so perhaps you could try:

`touch %s && /node-problem-detector --logtostderr --system-log-monitors=%s --apiserver-override='%s?inClusterConfig=false&auth=%s'`

Note the single quotes around the value of --apiserver-override.
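
For clarity, the full format string would then read roughly like this (a sketch of the quoting fix only, not the merged diff):

// With the URL single-quoted, sh no longer treats & as a control operator.
fmt.Sprintf(`touch %s && /node-problem-detector --logtostderr --system-log-monitors=%s --apiserver-override='%s?inClusterConfig=false&auth=%s'`,
	logFile,
	configFile,
	framework.TestContext.Host,
	kubeConfigFile),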

cynepco3hahue commented:

Yep exactly, I verified your PR with

fmt.Sprintf("touch %s && /node-problem-detector --logtostderr --system-log-monitors=%s --apiserver-override=%s?inClusterConfig=false\\&auth=%s",
									logFile,
									configFile,
									framework.TestContext.Host,
									kubeConfigFile),

and it passed.

Take into account that in the Go string you need a double backslash, \\& :)

karan (Contributor) commented:

Ah, gotcha! @knight42 feel free to undo my changes in your PR and use the escaping here instead.

knight42 (Member, Author) commented:

@karan I think your change is beneficial since it makes the command easier to read, so I'd like to keep it.

karan (Contributor) commented Nov 10, 2020

@knight42 can you please rebase this PR now that #96381 is merged?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 10, 2020
Signed-off-by: knight42 <anonymousknight96@gmail.com>
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 11, 2020
karan (Contributor) commented Nov 11, 2020

• [SLOW TEST:94.401 seconds]
[k8s.io] NodeProblemDetector [NodeFeature:NodeProblemDetector] [Serial]
/usr/local/google/home/karangoel/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:624
  [k8s.io] SystemLogMonitor
  /usr/local/google/home/karangoel/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:624
    should generate node condition and events for corresponding errors
    _output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:301
------------------------------
I1111 17:19:20.774469     719 e2e_node_suite_test.go:224] Stopping node services...
I1111 17:19:20.774495     719 server.go:257] Kill server "services"
I1111 17:19:20.774507     719 server.go:294] Killing process 1651 (services) with -TERM
I1111 17:19:20.856695     719 server.go:257] Kill server "kubelet"
I1111 17:19:20.878618     719 services.go:156] Fetching log files...
I1111 17:19:20.878682     719 services.go:165] Get log file "kern.log" with journalctl command [-k].
I1111 17:19:20.899268     719 services.go:165] Get log file "cloud-init.log" with journalctl command [-u cloud*].
I1111 17:19:21.236770     719 services.go:165] Get log file "docker.log" with journalctl command [-u docker].
I1111 17:19:21.240060     719 services.go:165] Get log file "containerd.log" with journalctl command [-u containerd].
I1111 17:19:21.243727     719 services.go:165] Get log file "kubelet.log" with journalctl command [-u kubelet-20201111T091546.service].
I1111 17:19:21.270396     719 e2e_node_suite_test.go:229] Tests Finished


Ran 1 of 334 Specs in 205.842 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 333 Skipped

/assign @derekwaynecarr

karan (Contributor) commented Nov 11, 2020

/lgtm

/approved

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 11, 2020
wangzhen127 (Member) commented

Thanks for looking into this!
/lgtm

derekwaynecarr (Member) commented

/approve

k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, knight42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 11, 2020
@k8s-ci-robot k8s-ci-robot merged commit 7edf621 into kubernetes:master Nov 12, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Nov 12, 2020
@knight42 knight42 deleted the fix/npd-test branch November 13, 2020 02:40