New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1873900: install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5 #448
Bug 1873900: install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5 #448
Conversation
To pick up all the .V(5) logging in pkg/cvo/sync_worker.go. I'm especially interested in the main select loop in SyncWorker.Start, because we have one recent cluster where the sync worker loop apparently goes to sleep for no clear reason [1]. [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1873900
@wking: This pull request references Bugzilla bug 1873900, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/bugzilla refresh |
@wking: This pull request references Bugzilla bug 1873900, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jottofar, wking The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@wking: All pull requests linked via external trackers have merged: Bugzilla bug 1873900 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This line is descended from logging we grew way back in 847f71b (pkg/cvo: update to new updatepayload structure, 2018-09-18, openshift#14). But we triggered it recently with 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). Drop the line, because it's noisy spew in the log files, and we can get the manifest content via: $ oc adm release extract --to=manifests $PULLSPEC
This line is descended from logging we grew way back in 847f71b (pkg/cvo: update to new updatepayload structure, 2018-09-18, openshift#14). But we triggered it recently with 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). Drop the line, because it's noisy spew in the log files, and we can get the manifest content via: $ oc adm release extract --to=manifests $PULLSPEC
This line is descended from 961873d (sync: Do config syncing in the background, 2019-01-11, openshift#82). But we triggered it recently with 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). Drop the line, because it's noisy spew in the log files, with all of the small "fraction-completed" bumps: $ grep -c 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log6265 $ grep 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | tail -n42020-10-09T18:21:53.523398340Z I1009 18:21:53.522733 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9936, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.538399862Z I1009 18:21:53.537709 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9952, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.570915921Z I1009 18:21:53.570847 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9984, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.585933980Z I1009 18:21:53.585786 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Fraction:1, Completed:10, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0xbfd8487062ea4a14, ext:2681032283239, loc:(*time.Location)(0x26b0400)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
This line is descended from 961873d (sync: Do config syncing in the background, 2019-01-11, openshift#82). But we triggered it recently with 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). Drop the line, because it's noisy spew in the log files, with all of the small "fraction-completed" bumps: $ grep -c 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log6265 $ grep 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | tail -n4 2020-10-09T18:21:53.523398340Z I1009 18:21:53.522733 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9936, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.538399862Z I1009 18:21:53.537709 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9952, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.570915921Z I1009 18:21:53.570847 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9984, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.585933980Z I1009 18:21:53.585786 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Fraction:1, Completed:10, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0xbfd8487062ea4a14, ext:2681032283239, loc:(*time.Location)(0x26b0400)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
And also the "Status report channel was full" lines. These lines are descended from 961873d (sync: Do config syncing in the background, 2019-01-11, openshift#82). But we triggered it recently with 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). Drop the line, because it's noisy spew in the log files, with all of the small "fraction-completed" bumps: $ grep -c 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log 6265 $ grep -c 'sync_worker.go:.*Status report channel was full' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log 5586 $ grep 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | tail -n4 2020-10-09T18:21:53.523398340Z I1009 18:21:53.522733 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9936, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.538399862Z I1009 18:21:53.537709 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9952, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.570915921Z I1009 18:21:53.570847 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9984, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.585933980Z I1009 18:21:53.585786 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Fraction:1, Completed:10, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0xbfd8487062ea4a14, ext:2681032283239, loc:(*time.Location)(0x26b0400)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
And also the "Status report channel was full" lines. These lines are descended from 961873d (sync: Do config syncing in the background, 2019-01-11, openshift#82). But we triggered it recently with 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). Drop the line, because it's noisy spew in the log files, with all of the small "fraction-completed" bumps: $ grep -c 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log 6265 $ grep -c 'sync_worker.go:.*Status report channel was full' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log 5586 $ grep 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | tail -n4 2020-10-09T18:21:53.523398340Z I1009 18:21:53.522733 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9936, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.538399862Z I1009 18:21:53.537709 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9952, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.570915921Z I1009 18:21:53.570847 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9984, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.585933980Z I1009 18:21:53.585786 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Fraction:1, Completed:10, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0xbfd8487062ea4a14, ext:2681032283239, loc:(*time.Location)(0x26b0400)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
And also the "Status report channel was full" lines. These lines are descended from 961873d (sync: Do config syncing in the background, 2019-01-11, openshift#82). But we triggered it recently with 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). Drop the line, because it's noisy spew in the log files, with all of the small "fraction-completed" bumps: $ grep -c 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log 6265 $ grep -c 'sync_worker.go:.*Status report channel was full' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log 5586 $ grep 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | tail -n4 2020-10-09T18:21:53.523398340Z I1009 18:21:53.522733 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9936, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.538399862Z I1009 18:21:53.537709 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9952, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.570915921Z I1009 18:21:53.570847 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9984, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false} 2020-10-09T18:21:53.585933980Z I1009 18:21:53.585786 1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Fraction:1, Completed:10, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0xbfd8487062ea4a14, ext:2681032283239, loc:(*time.Location)(0x26b0400)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
/cherrypick release-4.5 |
@jottofar: #448 failed to apply on top of branch "release-4.5":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
We've been using --v=5 since 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). But it leads to large quantities of noise from client-side throttling [1], and that throttling is V(3): $ grep -n . vendor/k8s.io/client-go/rest/request.go | grep -B7 -A6 '^597:' 589: switch { 590: case len(retryInfo) > 0: 591: message = fmt.Sprintf("Waited for %v, %s - request: %s:%s", latency, retryInfo, r.verb, r.URL().String()) 592: default: 593: message = fmt.Sprintf("Waited for %v due to client-side throttling, not priority and fairness, request: %s:%s", latency, r.verb, r.URL().String()) 594: } 596: if latency > longThrottleLatency { 597: klog.V(3).Info(message) 598: } 599: if latency > extraLongThrottleLatency { 600: // If the rate limiter latency is very high, the log message should be printed at a higher log level, 601: // but we use a throttled logger to prevent spamming. 602: globalThrottledLogger.Infof("%s", message) 603: } Auditing: $ git --no-pager grep 'klog.V([3-5])' vendor I don't see much that I'd miss. I liked having vendor/github.com/openshift/library-go/pkg/verify's logging back when we made that pivot, but now that that code is old and stable, I'm ok losing it. I've shifted the lib/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V lib) It's mostly hotloop-detection since 40d0a4e (Log object updates and show existing/required diff, 2021-06-03, openshift#561, [2]), and isn't all that noisy since 05e1af7 (Bug 1984414: Log resource diffs on update only in reconcile mode, 2021-07-22, openshift#628, [3]). I've shifted the pkg/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V pkg) It's mostly fairly core stuff, and low-noise except for the per-manifest "Running sync for ..." and "Done syncing for ..." messages [1]. The per-manifest messages can be useful to identify where the CVO is in the sync cycle, so I'm keeping them for now. We may be able to punt them up to higher levels if we get an alternative mechanism for identifying sync cycle positions (e.g. events for task-node completion or task failure). [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2034493#c1 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1984414
We've been using --v=5 since 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). But it leads to large quantities of noise from client-side throttling [1], and that throttling is V(3): $ grep -n . vendor/k8s.io/client-go/rest/request.go | grep -B7 -A6 '^597:' 589: switch { 590: case len(retryInfo) > 0: 591: message = fmt.Sprintf("Waited for %v, %s - request: %s:%s", latency, retryInfo, r.verb, r.URL().String()) 592: default: 593: message = fmt.Sprintf("Waited for %v due to client-side throttling, not priority and fairness, request: %s:%s", latency, r.verb, r.URL().String()) 594: } 596: if latency > longThrottleLatency { 597: klog.V(3).Info(message) 598: } 599: if latency > extraLongThrottleLatency { 600: // If the rate limiter latency is very high, the log message should be printed at a higher log level, 601: // but we use a throttled logger to prevent spamming. 602: globalThrottledLogger.Infof("%s", message) 603: } Auditing: $ git --no-pager grep 'klog.V([3-5])' vendor I don't see much that I'd miss. I liked having vendor/github.com/openshift/library-go/pkg/verify's logging back when we made that pivot, but now that that code is old and stable, I'm ok losing it. I've shifted the lib/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V lib) It's mostly hotloop-detection since 40d0a4e (Log object updates and show existing/required diff, 2021-06-03, openshift#561, [2]), and isn't all that noisy since 05e1af7 (Bug 1984414: Log resource diffs on update only in reconcile mode, 2021-07-22, openshift#628, [3]). I've shifted the pkg/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V pkg) It's mostly fairly core stuff, and low-noise except for the per-manifest "Running sync for ..." and "Done syncing for ..." messages [1]. The per-manifest messages can be useful to identify where the CVO is in the sync cycle, so I'm keeping them for now. We may be able to punt them up to higher levels if we get an alternative mechanism for identifying sync cycle positions (e.g. events for task-node completion or task failure). [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2034493#c1 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1984414
We've been using --v=5 since 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). But it leads to large quantities of noise from client-side throttling [1], and that throttling is V(3): $ grep -n . vendor/k8s.io/client-go/rest/request.go | grep -B7 -A6 '^597:' 589: switch { 590: case len(retryInfo) > 0: 591: message = fmt.Sprintf("Waited for %v, %s - request: %s:%s", latency, retryInfo, r.verb, r.URL().String()) 592: default: 593: message = fmt.Sprintf("Waited for %v due to client-side throttling, not priority and fairness, request: %s:%s", latency, r.verb, r.URL().String()) 594: } 596: if latency > longThrottleLatency { 597: klog.V(3).Info(message) 598: } 599: if latency > extraLongThrottleLatency { 600: // If the rate limiter latency is very high, the log message should be printed at a higher log level, 601: // but we use a throttled logger to prevent spamming. 602: globalThrottledLogger.Infof("%s", message) 603: } Auditing: $ git --no-pager grep 'klog.V([3-5])' vendor I don't see much that I'd miss. I liked having vendor/github.com/openshift/library-go/pkg/verify's logging back when we made that pivot, but now that that code is old and stable, I'm ok losing it. I've shifted the lib/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V lib) It's mostly hotloop-detection since 40d0a4e (Log object updates and show existing/required diff, 2021-06-03, openshift#561, [2]), and isn't all that noisy since 05e1af7 (Bug 1984414: Log resource diffs on update only in reconcile mode, 2021-07-22, openshift#628, [3]). I've shifted the pkg/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V pkg) It's mostly fairly core stuff, and low-noise except for the per-manifest "Running sync for ..." and "Done syncing for ..." messages [1]. The per-manifest messages can be useful to identify where the CVO is in the sync cycle, so I'm keeping them for now. We may be able to punt them up to higher levels if we get an alternative mechanism for identifying sync cycle positions (e.g. events for task-node completion or task failure). [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2034493#c1 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1984414
We've been using --v=5 since 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). But it leads to large quantities of noise from client-side throttling [1], and that throttling is V(3): $ grep -n . vendor/k8s.io/client-go/rest/request.go | grep -B7 -A6 '^597:' 589: switch { 590: case len(retryInfo) > 0: 591: message = fmt.Sprintf("Waited for %v, %s - request: %s:%s", latency, retryInfo, r.verb, r.URL().String()) 592: default: 593: message = fmt.Sprintf("Waited for %v due to client-side throttling, not priority and fairness, request: %s:%s", latency, r.verb, r.URL().String()) 594: } 596: if latency > longThrottleLatency { 597: klog.V(3).Info(message) 598: } 599: if latency > extraLongThrottleLatency { 600: // If the rate limiter latency is very high, the log message should be printed at a higher log level, 601: // but we use a throttled logger to prevent spamming. 602: globalThrottledLogger.Infof("%s", message) 603: } Auditing: $ git --no-pager grep 'klog.V([3-5])' vendor I don't see much that I'd miss. I liked having vendor/github.com/openshift/library-go/pkg/verify's logging back when we made that pivot, but now that that code is old and stable, I'm ok losing it. I've shifted the lib/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V lib) It's mostly hotloop-detection since 40d0a4e (Log object updates and show existing/required diff, 2021-06-03, openshift#561, [2]), and isn't all that noisy since 05e1af7 (Bug 1984414: Log resource diffs on update only in reconcile mode, 2021-07-22, openshift#628, [3]). I've shifted the pkg/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V pkg) It's mostly fairly core stuff, and low-noise except for the per-manifest "Running sync for ..." and "Done syncing for ..." messages [1]. The per-manifest messages can be useful to identify where the CVO is in the sync cycle, so I'm keeping them for now. We may be able to punt them up to higher levels if we get an alternative mechanism for identifying sync cycle positions (e.g. events for task-node completion or task failure). [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2034493#c1 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1984414
We've been using --v=5 since 88c222c (install/0000_00_cluster-version-operator_03_deployment: Bump to --v=5, 2020-08-30, openshift#448). But it leads to large quantities of noise from client-side throttling [1], and that throttling is V(3): $ grep -n . vendor/k8s.io/client-go/rest/request.go | grep -B7 -A6 '^597:' 589: switch { 590: case len(retryInfo) > 0: 591: message = fmt.Sprintf("Waited for %v, %s - request: %s:%s", latency, retryInfo, r.verb, r.URL().String()) 592: default: 593: message = fmt.Sprintf("Waited for %v due to client-side throttling, not priority and fairness, request: %s:%s", latency, r.verb, r.URL().String()) 594: } 596: if latency > longThrottleLatency { 597: klog.V(3).Info(message) 598: } 599: if latency > extraLongThrottleLatency { 600: // If the rate limiter latency is very high, the log message should be printed at a higher log level, 601: // but we use a throttled logger to prevent spamming. 602: globalThrottledLogger.Infof("%s", message) 603: } Auditing: $ git --no-pager grep 'klog.V([3-5])' vendor I don't see much that I'd miss. I liked having vendor/github.com/openshift/library-go/pkg/verify's logging back when we made that pivot, but now that that code is old and stable, I'm ok losing it. I've shifted the lib/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V lib) It's mostly hotloop-detection since 40d0a4e (Log object updates and show existing/required diff, 2021-06-03, openshift#561, [2]), and isn't all that noisy since 05e1af7 (Bug 1984414: Log resource diffs on update only in reconcile mode, 2021-07-22, openshift#628, [3]). I've shifted the pkg/ stuff down to V(2) using: $ sed -i 's/klog[.]V([3-5])/klog.V(2)/g' $(git grep -l klog.V pkg) It's mostly fairly core stuff, and low-noise except for the per-manifest "Running sync for ..." and "Done syncing for ..." messages [1]. The per-manifest messages can be useful to identify where the CVO is in the sync cycle, so I'm keeping them for now. We may be able to punt them up to higher levels if we get an alternative mechanism for identifying sync cycle positions (e.g. events for task-node completion or task failure). [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2034493#c1 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1984414
To pick up all the
.V(5)
logging inpkg/cvo/sync_worker.go
. I'm especially interested in the main select loop inSyncWorker.Start
, because we have one recent cluster where the sync worker loop apparently goes to sleep for no clear reason.