Update to latest klog 0.4.0 #81164

Merged
merged 2 commits into kubernetes:master on Aug 13, 2019

Conversation

@dims (Member) commented Aug 8, 2019

kubernetes/klog@47ffc4e - Add test case for detecting data race
kubernetes/klog@959d342 - Prevent data race in SetOutput* methods
kubernetes/klog@34123a4 - Test with golang 1.12.x
kubernetes/klog@bf4884f - Fix the log duplication issue for --log-file
kubernetes/klog@dc5546c - Backfill integration tests for selecting log destinations
kubernetes/klog@baef93d - fix broken links
kubernetes/klog@07b218b - Add go modules files
kubernetes/klog@b33ae69 - Add flag to include file's dir in logs
kubernetes/klog@7c58910 - correct documentation
kubernetes/klog@a4033db - Code Hygene - glog to klog
kubernetes/klog@941d47b - Revert "Fix the log duplication issue for --log-file."
kubernetes/klog@314f6c4 - Update godoc for the default value of logtostderr
kubernetes/klog@7eb35e2 - Fix the log duplication issue for --log-file.
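
For context on 959d342 above: the race was between concurrent log writes and the SetOutput* setters mutating the logger's writer table. A minimal sketch of the locking pattern, with illustrative type and field names rather than klog's exact internals:

```go
package main

import (
	"io"
	"os"
	"sync"
)

// loggingT is a simplified stand-in for klog's internal logger state.
type loggingT struct {
	mu   sync.Mutex
	file [4]io.Writer // one writer per severity, INFO..FATAL
}

// SetOutput redirects all severities to w. Taking mu is the essence of
// the fix: without it, a concurrent log write could observe l.file
// while it is being rewritten.
func (l *loggingT) SetOutput(w io.Writer) {
	l.mu.Lock()
	defer l.mu.Unlock()
	for s := range l.file {
		l.file[s] = w
	}
}

func main() {
	l := &loggingT{}
	l.SetOutput(os.Stderr)
}
```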

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@dims (Member, author) commented Aug 8, 2019

/milestone v1.16
/priority important-soon
/area code-organization

@dims (Member, author) commented Aug 8, 2019

/sig release

@dims (Member, author) commented Aug 8, 2019

@liggitt the first step I was told to take was to run /test pull-kubernetes-e2e-gce-large-performance, which I triggered here.

@dims (Member, author) commented Aug 8, 2019

/test pull-kubernetes-e2e-gce-large-performance

@liggitt (Member) commented Aug 8, 2019

kubernetes/klog@bf4884f - Fix the log duplication issue for --log-file

was any functional test added to verify kubernetes/klog@bf4884f fixed the duplication issue?

@dims (Member, author) commented Aug 8, 2019

@liggitt tested this manually


Inline review thread on this klog hunk:

```go
if logging.logFile != "" {
	// Since we are using a single log file, all of the items in l.file array
	// will point to the same file, so just use one of them to write data.
```

@liggitt (Member) Aug 8, 2019:

> Since we are using a single log file, all of the items in l.file array
> will point to the same file, so just use one of them to write data.

Is that accurate if SetOutputBySeverity was called?

@dims (Member, author) Aug 8, 2019:

@liggitt I should probably add godoc comments saying that callers using SetOutput* should avoid setting any of the command-line output params.

@liggitt (Member) Aug 8, 2019:

If the CLI flags and SetOutputBySeverity are mutually exclusive, and kube doesn't use SetOutputBySeverity, that's probably OK. It definitely needs documenting, though.
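
The documentation proposed above might read roughly like this (hypothetical wording and stub, not the godoc that eventually landed in klog):

```go
package klogstub // hypothetical stub, not klog itself

import "io"

// SetOutputBySeverity sets the output destination for the given severity
// name (e.g. "INFO").
//
// Proposed caveat, per the discussion above: the SetOutput* functions are
// mutually exclusive with the command-line output flags (--log-file,
// --logtostderr, --alsologtostderr); callers that use SetOutput* should
// leave those flags unset.
func SetOutputBySeverity(name string, w io.Writer) {
	_, _ = name, w // the real implementation lives in k8s.io/klog
}
```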

@liggitt (Member) commented Aug 8, 2019

> @liggitt tested this manually

A follow-up (in klog) that exercises this in a test would be good; this is fragile enough that I could see it silently regressing.
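
Such a check could look roughly like this - a hypothetical sketch against klog's public flags, not the test that was later filed: route output to a single file via log_file, emit one line, and assert it appears exactly once.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"k8s.io/klog"
)

func main() {
	dir, err := os.MkdirTemp("", "klog-dup-test")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	logFile := filepath.Join(dir, "test.log")

	// Route all output to a single file, as --log-file does.
	fs := flag.NewFlagSet("klog", flag.ExitOnError)
	klog.InitFlags(fs)
	fs.Set("log_file", logFile)
	fs.Set("logtostderr", "false")
	fs.Set("alsologtostderr", "false")

	klog.Info("canary line")
	klog.Flush()

	data, err := os.ReadFile(logFile)
	if err != nil {
		panic(err)
	}
	// With the duplication bug, the same line would show up more than once.
	if n := strings.Count(string(data), "canary line"); n != 1 {
		fmt.Printf("expected 1 occurrence of the canary, got %d\n", n)
	}
}
```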

@liggitt (Member) commented Aug 8, 2019

/lgtm
/approve

/hold
for scale test input

@k8s-ci-robot (Contributor) commented Aug 8, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@dims (Member, author) commented Aug 8, 2019

@liggitt @wojtek-t @mborsz as I understand it, this PR and #78466 need to be tested together in a 5k-node environment for us to see the regression, right?

@dims (Member, author) commented Aug 8, 2019

/test pull-kubernetes-e2e-gce-large-performance

@mborsz (Member) commented Aug 9, 2019

In the previous attempt, the klog update itself (without actually switching to the --log-file flag) was the root cause of a significant performance regression.

I think it would be good to test this in a 5k environment before submission.

@dims (Member, author) commented Aug 9, 2019

@mborsz ack, agreed! @wojtek-t mentioned that he can start a 5k run over the next few days.

@wojtek-t (Member) commented Aug 9, 2019

@dims @liggitt - I've kicked off these tests, and after less than an hour (~1 minute after the first log rotation) kube-apiserver started crashlooping.

I will try to run it one more time, but @mborsz and I suspect it may actually be exactly the same problem.

/hold

@dims (Member, author) commented Aug 9, 2019

@wojtek-t did you try just this PR, as @mborsz suggested above (i.e., without introducing #78466)?

@wojtek-t (Member) commented Aug 12, 2019

> @wojtek-t did you try just this PR, as @mborsz suggested above (i.e., without introducing #78466)?

I wanted to run the previous experiment one more time to ensure it wasn't a flake. I just did that, and the result was exactly the same: the apiserver started crashlooping immediately after the first log rotation. So it definitely wasn't a flake.

I haven't yet tried only this PR - let me try that now. But even if it works, it doesn't seem to visibly help with moving toward distroless, right?

@dims (Member, author) commented Aug 12, 2019

@wojtek-t it will give me more information about where the problem could be. "Immediately after the first log rotation" seems like a pretty big hint at this point. I just want to confirm that, without distroless, just this PR works fine.

@liggitt (Member) commented Aug 12, 2019

> But even if it works, it doesn't seem to visibly help with moving toward distroless, right?

That would seem to indicate the issue is not a performance issue, but an actual bug in the log rotation code (possibly when combined with the distroless environment... unexpectedly read-only filesystems, etc.?)

@dims (Member, author) commented Aug 12, 2019

@liggitt exactly!

@wojtek-t (Member) commented Aug 13, 2019

> That would seem to indicate the issue is not a performance issue, but an actual bug in the log rotation code (possibly when combined with the distroless environment... unexpectedly read-only filesystems, etc.?)

I think it could still be a performance issue; with distroless we're just configuring things differently and using a different code path.

That said, I've run scalability tests against just this PR (on a 5k-node cluster) and the results look fine (i.e., they are comparable with head). So from a scalability POV I'm fine with this PR (as long as we don't merge the distroless one).

@dims @liggitt - feel free to /hold cancel.

wojtek-t self-assigned this Aug 13, 2019
@dims (Member, author) commented Aug 13, 2019

@wojtek-t agreed. I'll start looking at the code path for rotation in klog and file follow-up PR(s). Can you point me to the logs from the broken run (with distroless), please?

/hold cancel

k8s-ci-robot merged commit 461b2d8 into kubernetes:master Aug 13, 2019
24 checks passed
cla/linuxfoundation: dims authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-conformance-image-test: Skipped.
pull-kubernetes-cross: Skipped.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-csi-serial: Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gce-iscsi: Skipped.
pull-kubernetes-e2e-gce-iscsi-serial: Skipped.
pull-kubernetes-e2e-gce-large-performance: Job succeeded.
pull-kubernetes-e2e-gce-storage-slow: Skipped.
pull-kubernetes-godeps: Skipped.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-node-e2e-containerd: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
pull-publishing-bot-validate: Skipped.
tide: In merge pool.