Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[e2e failure] Cluster level logging implemented by Stackdriver should ingest logs #52433

Closed
ericchiang opened this issue Sep 13, 2017 · 13 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node.
Milestone

Comments

@ericchiang
Copy link
Contributor

ericchiang commented Sep 13, 2017

@kubernetes/sig-instrumentation-test-failures
@kubernetes/kubernetes-release-managers

test has started to fail on a several e2e tests:

https://k8s-testgrid.appspot.com/release-master-blocking#gci-gce
https://k8s-testgrid.appspot.com/release-master-blocking#gke

Example failure https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/14136

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/stackdrvier/basic.go:145
Expected error:
    <*errors.errorString | 0xc4210d1dc0>: {
        s: "got message without prefix '[Trimmed]': AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n",
    }
    got message without prefix '[Trimmed]': AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/stackdrvier/basic.go:142

#52289 seems suspect cc @crassirostris

@k8s-ci-robot k8s-ci-robot added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Sep 13, 2017
@ericchiang ericchiang added this to the v1.8 milestone Sep 13, 2017
@ericchiang ericchiang added kind/e2e-test-failure priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Sep 13, 2017
@ericchiang ericchiang added the kind/bug Categorizes issue or PR as related to a bug. label Sep 13, 2017
@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Labels Complete

@ericchiang

Issue label settings:

  • sig/instrumentation: Issue will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
Additional instructions available here The commands available for adding these labels are documented here

@crassirostris
Copy link

crassirostris commented Sep 13, 2017

100k chars line is split into smaller chunks on export. There's something broken in the logging mechanism. I'm investigating

/cc @Random-Liu @dashpole

@crassirostris crassirostris added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Sep 13, 2017
@crassirostris
Copy link

crassirostris commented Sep 13, 2017

@dchen1107 Docker splits log messages of length 100k into messages of 16k. Is this expected?

@ericchiang
Copy link
Contributor Author

(Note this isn't blocking the submit queue just the e2e tests, but we still need to get this green for the release)

@crassirostris
Copy link

Reproducible on GCP on COS in head, in versions 1.6.9 and 1.7.5, suspect it might be related to the COS image

@dashpole
Copy link
Contributor

This seems to be the correct behavior now. This comment: moby/moby#34620 (comment) seems to indicate this was a recent change. I suspect that this changed when we upgraded from docker 1.11 to 1.12.

@crassirostris
Copy link

crassirostris commented Sep 13, 2017

That is a problem that is going to break a lot of people. Is there a way to turn this off?

/cc @fgrzadkowski @piosz

@crassirostris
Copy link

@tagomoris @repeatedly @edsiper Could you please help? Is there a plugin already to recombine docker output?

@igorpeshansky
Copy link

/cc @igorpeshansky.

@nickperry
Copy link

nickperry commented Sep 13, 2017

@dashpole the 16k split behavior was introduced in Docker 1.13.0 via moby/moby#22982

@crassirostris the only log drivers that currently handle the partial flag are journald and jsonfile. The behavior of the jsonfile driver is to only add a \n to the end of log message if the partial flag is not set on the message. https://github.com/moby/moby/blob/master/daemon/logger/jsonfilelog/jsonfilelog.go#L121

@dashpole
Copy link
Contributor

Thanks for the correction @nickperry, we did just move to 1.13 (not 1.12)

@crassirostris
Copy link

OK, since this is not a problem with Kuberentes per se, I'll move the offending test out of the blocking suite, while working on the resolution of this problem

@nickperry Thanks for clarification

@spiffxp
Copy link
Member

spiffxp commented Sep 13, 2017

k8s-github-robot pushed a commit that referenced this issue Sep 15, 2017
Automatic merge from submit-queue

[fluentd-gcp addon] Remove some e2e tests out of blocking suites

Fixes #52433

Some Stackdriver Logging e2e tests are broken in release-blocking suites:

- Due to the change in Docker 1.13, on some systems logs are automatically split by 16K chunks. This PR removes an e2e test that assumes otherwise
- In large clusters, it's not possible to ingest system logs from all nodes

Since it's not a Kubernetes problem per se, mitigating this by removing these tests from blocking suites.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

No branches or pull requests

8 participants