New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flaking test- TestRotateLogs from kubelet/logs/container_log_manager_test #123414
Comments
/triage accepted |
/cc @harshanarayana |
/priority important-longterm Seems to impact on this arch only? |
/sig windows |
@harshanarayana poke @Rajalakshmi-Girish and @mkumatag on slack please |
@harshanarayana unfortunately, I couldn't reproduce this flake locally. |
I am now able to see the flake,
|
@Rajalakshmi-Girish This is with |
I could reproduce this even on x86_64. The flakiness % is higher when run from a pod rather than a VM. |
I was able to reproduce this on regular machines as well with |
diff --git a/pkg/kubelet/logs/container_log_manager_test.go b/pkg/kubelet/logs/container_log_manager_test.go
index 7d6e7055274..a2c5a097b25 100644
--- a/pkg/kubelet/logs/container_log_manager_test.go
+++ b/pkg/kubelet/logs/container_log_manager_test.go
@@ -157,12 +157,12 @@ func TestRotateLogs(t *testing.T) {
},
}
f.SetFakeContainers(testContainers)
- go c.processQueueItems(ctx, 1)
require.NoError(t, c.rotateLogs(ctx))
+ go c.processQueueItems(ctx, 1)
pollTimeoutCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
- err = wait.PollUntilContextCancel(pollTimeoutCtx, 20*time.Millisecond, false, func(ctx context.Context) (done bool, err error) {
+ err = wait.PollUntilContextCancel(pollTimeoutCtx, 50*time.Millisecond, false, func(ctx context.Context) (done bool, err error) {
return c.queue.Len() == 0, nil
})
require.NoError(t, err)
This seem to take care of the issue. Do you think the poll interval of |
❯ stress ./logs.test -test.run TestRotateLogs -test.v
5s: 0 runs so far, 0 failures
10s: 48 runs so far, 0 failures
15s: 96 runs so far, 0 failures
20s: 146 runs so far, 0 failures
25s: 196 runs so far, 0 failures
30s: 244 runs so far, 0 failures
35s: 292 runs so far, 0 failures
40s: 339 runs so far, 0 failures
45s: 387 runs so far, 0 failures
|
Spoke too early. 1m30s: 812 runs so far, 4 failures (0.49%)
/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/go-stress-20240223T144024-4097897728
=== RUN TestRotateLogs
container_log_manager_test.go:173:
Error Trace: /Users/harshnar/Work/Tesseract/Maglev/mks/k8s/kubernetes/pkg/kubelet/logs/container_log_manager_test.go:173
Error: "[- test-log-1 - test-log-2.20240223-144155 - test-log-3 - test-log-3.00000000-000001 - test-log-3.00000000-000001.tmp - test-log-4]" should have 5 item(s), but has 6
Test: TestRotateLogs
container_log_manager_test.go:176:
Error Trace: /Users/harshnar/Work/Tesseract/Maglev/mks/k8s/kubernetes/pkg/kubelet/logs/container_log_manager_test.go:176
Error: Not equal:
expected: "test-log-3.00000000-000001.gz"
actual : "test-log-3"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-test-log-3.00000000-000001.gz
+test-log-3
Test: TestRotateLogs
container_log_manager_test.go:177:
Error Trace: /Users/harshnar/Work/Tesseract/Maglev/mks/k8s/kubernetes/pkg/kubelet/logs/container_log_manager_test.go:177
Error: Not equal:
expected: "test-log-3.20240223-144155"
actual : "test-log-3.00000000-000001"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-test-log-3.20240223-144155
+test-log-3.00000000-000001
Test: TestRotateLogs
container_log_manager_test.go:178:
Error Trace: /Users/harshnar/Work/Tesseract/Maglev/mks/k8s/kubernetes/pkg/kubelet/logs/container_log_manager_test.go:178
Error: Not equal:
expected: "test-log-4"
actual : "test-log-3.00000000-000001.tmp"
Diff:
--- Expected
…
1m35s: 851 runs so far, 5 failures (0.59%) |
/assign |
/triage accepted |
I0223 15:36:45.203710 20383 container_log_manager.go:242] "Prrocessing container for log rotation" id="container-not-need-rotate" name="" labels=null
I0223 15:36:45.203788 20383 container_log_manager.go:242] "Prrocessing container for log rotation" id="container-need-rotate" name="" labels=null
I0223 15:36:45.203811 20383 container_log_manager.go:242] "Prrocessing container for log rotation" id="container-has-excess-log" name="" labels=null
I0223 15:36:45.203830 20383 container_log_manager.go:242] "Prrocessing container for log rotation" id="container-is-not-running" name="" labels=null
I0223 15:36:45.203964 20383 container_log_manager.go:302] "log file doesn't need to be rotated" worker=1 containerID="container-not-need-rotate" path="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-1" currentSize=5 maxSize=10
I0223 15:36:45.203991 20383 container_log_manager.go:264] "Processing of container log completed" worker=1 containerID="container-not-need-rotate" ok=true
I0223 15:36:45.204294 20383 container_log_manager.go:322] "Found logs to rotate" containerID="container-need-rotate" log="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-2" logs=null
I0223 15:36:45.204323 20383 container_log_manager.go:328] "Cleaned up unused logs" containerID="container-need-rotate" log="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-2" logs=null
I0223 15:36:45.204345 20383 container_log_manager.go:334] "Removed excess logs" containerID="container-need-rotate" log="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-2" logs=null
I0223 15:36:45.204366 20383 container_log_manager.go:464] "Rotating latest log" containerID="container-need-rotate" log="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-2"
I0223 15:36:45.204822 20383 container_log_manager.go:310] "Rotated log for container" worker=1 containerID="container-need-rotate" path="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-2" currentSize=20 maxSize=10
I0223 15:36:45.204849 20383 container_log_manager.go:264] "Processing of container log completed" worker=1 containerID="container-need-rotate" ok=true
I0223 15:36:45.205135 20383 container_log_manager.go:322] "Found logs to rotate" containerID="container-has-excess-log" log="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3" logs=["/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000000.gz","/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001"]
I0223 15:36:45.205183 20383 container_log_manager.go:328] "Cleaned up unused logs" containerID="container-has-excess-log" log="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3" logs=["/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000000.gz","/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001"]
I0223 15:36:45.205521 20383 container_log_manager.go:334] "Removed excess logs" containerID="container-has-excess-log" log="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3" logs=["/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001"]
I0223 15:36:45.205551 20383 container_log_manager.go:420] "Compressing log" log="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001"
I0223 15:36:45.205733 20383 container_log_manager.go:427] "Creating temporary log" log="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001.tmp"
container_log_manager_test.go:174:
Error Trace: /Users/harshnar/Work/Tesseract/Maglev/mks/k8s/kubernetes/pkg/kubelet/logs/container_log_manager_test.go:174
Error: "[- test-log-1 - test-log-2.20240223-153645 - test-log-3 - test-log-3.00000000-000001 - test-log-3.00000000-000001.tmp - test-log-4]" should have 5 item(s), but has 6
Test: TestRotateLogs
container_log_manager_test.go:177:
Error Trace: /Users/harshnar/Work/Tesseract/Maglev/mks/k8s/kubernetes/pkg/kubelet/logs/container_log_manager_test.go:177
Error: Not equal:
expected: "test-log-3.00000000-000001.gz"
actual : "test-log-3"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-test-log-3.00000000-000001.gz
+test-log-3
Test: TestRotateLogs
container_log_manager_test.go:178:
Error Trace: /Users/harshnar/Work/Tesseract/Maglev/mks/k8s/kubernetes/pkg/kubelet/logs/container_log_manager_test.go:178
Error: Not equal:
expected: "test-log-3.20240223-153645"
actual : "test-log-3.00000000-000001"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-test-log-3.20240223-153645
+test-log-3.00000000-000001
Test: TestRotateLogs
container_log_manager_test.go:179:
Error Trace: /Users/harshnar/Work/Tesseract/Maglev/mks/k8s/kubernetes/pkg/kubelet/logs/container_log_manager_test.go:179
Error: Not equal:
expected: "test-log-4"
actual : "test-log-3.00000000-000001.tmp"
Diff:
--- Expected
+++ Actual
@@ -1 +1 @@
-test-log-4
+test-log-3.00000000-000001.tmp
Test: TestRotateLogs
I0223 15:36:45.227321 20383 container_log_manager.go:447] "Renaming temporary log to compressed log" tmpLog="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001.tmp" compressedLog="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001.gz"
E0223 15:36:45.227999 20383 container_log_manager.go:449] "Failed to rename temporary log to compressed log" err="rename /var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001.tmp /var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001.gz: no such file or directory" tmpLog="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001.tmp" compressedLog="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001.gz"
E0223 15:36:45.228164 20383 container_log_manager.go:307] "Failed to rotate log for container" err="failed to compress log \"/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001\": failed to rename \"/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001.tmp\" to \"/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001.gz\": rename /var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001.tmp /var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3.00000000-000001.gz: no such file or directory" worker=1 containerID="container-has-excess-log" path="/var/folders/l3/tf0xj3gj67bfmk13h462668w0000gn/T/test-rotate-logs3784354061/test-log-3" currentSize=20 maxSize=10
I0223 15:36:45.228193 20383 container_log_manager.go:264] "Processing of container log completed" worker=1 containerID="container-has-excess-log" ok=true
I0223 15:36:45.228223 20383 container_log_manager.go:260] "Did not get any new entry to process, terminating worker" workerID=1
--- FAIL: TestRotateLogs (0.03s) Had to add a few additional logs to identify the issue, but has a fix now. PTAL when possible. |
Which jobs are flaking?
https://prow.k8s.io/job-history/gs/kubernetes-jenkins/logs/ci-kubernetes-unit-windows-master
https://prow.k8s.io/job-history/gs/ppc64le-kubernetes/logs/periodic-kubernetes-unit-test-ppc64le
Which tests are flaking?
TestRotateLogs from kubelet/logs/container_log_manager_test
Since when has it been flaking?
Probably since the merge of change #114301
Testgrid link
https://testgrid.k8s.io/sig-windows-signal#windows-unit-master
https://testgrid.k8s.io/ibm-unit-tests-ppc64le#Periodic%20unit%20test%20suite%20on%20ppc64le
Reason for failure (if possible)
Anything else we need to know?
No response
Relevant SIG(s)
/sig node
The text was updated successfully, but these errors were encountered: