
Uploading logs to GCS seems to have stopped working #34446

Closed
wojtek-t opened this issue Oct 10, 2016 · 18 comments · Fixed by #34647
Assignees: gmarek, wojtek-t
Labels: area/test-infra, kind/bug, priority/critical-urgent

Comments

@wojtek-t (Member) commented Oct 10, 2016

It seems that we stopped uploading logs to GCS around Friday/Saturday.

As an example, this is the first kubemark-scale run that doesn't have logs:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-kubemark-gce-scale/1468/?project=kubernetes-jenkins

But we also don't have any logs for the kubernetes-e2e-gce suite (which seems to be our main suite), e.g.:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce/24856/?project=kubernetes-jenkins
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce/24857/?project=kubernetes-jenkins
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce/24858/?project=kubernetes-jenkins

This seems to be the last run for which we have logs:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce/24740/?project=kubernetes-jenkins

which suggests that it broke around 2016-10-08 01:00:00.
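
For reference, the Pantheon browser paths above map directly to gs:// URIs, so a quick way to compare runs (assuming gsutil is installed and you have read access to the kubernetes-jenkins bucket) is:

# Bucket paths taken from the links above; requires gsutil with read access.
gsutil ls gs://kubernetes-jenkins/logs/kubernetes-e2e-gce/24740/   # last run with logs
gsutil ls gs://kubernetes-jenkins/logs/kubernetes-e2e-gce/24856/   # run with logs missing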

@kubernetes/test-infra-admins @kubernetes/sig-testing
@fejta @ixdy @spxtr

@wojtek-t added the kind/bug, priority/critical-urgent, and area/test-infra labels Oct 10, 2016
@wojtek-t (Member, Author) commented Oct 10, 2016

P0 - because it makes any debugging effectively impossible.

@ixdy (Member) commented Oct 10, 2016

@rmmh is working on this

@wojtek-t (Member, Author)

It seems that even after yesterday's fixes, we are still missing logs from all kubemark runs. @gmarek

@wojtek-t (Member, Author)

Lack of kubemark logs is significantly slowing down any scalability-related work, which is our P0 goal for this quarter, so please prioritize it.
@fgrzadkowski - FYI ^^

@wojtek-t (Member, Author)

As an example:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-kubemark-gce-scale/1473/?project=kubernetes-jenkins
is a run from an hour ago, and it contains only the started.json file.

@fejta (Contributor) commented Oct 11, 2016

I'm going to migrate this job so that our code (rather than a Jenkins plugin) captures and uploads logs. We want to do that anyway, and it will hopefully resolve this.

Odd... if we look at job/kubernetes-e2e-gce/configure, we see a bunch of conditional build steps in the post-build actions configuration, which matches what is specified in the YAML:
https://github.com/kubernetes/test-infra/blob/master/jenkins/job-configs/kubernetes-jenkins/kubernetes-e2e-gce.yaml#L46

We also have this specified in the kubemark YAML:
https://github.com/kubernetes/test-infra/blob/master/jenkins/job-configs/kubernetes-jenkins/kubernetes-kubemark.yaml#L26

However, I don't see where this is applied in the actual configuration: /job/kubernetes-kubemark-gce-scale/configure.
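
For context, the step these configs are supposed to wire up boils down to an artifact upload along these lines (a sketch only, not the actual plugin configuration; the bucket layout comes from the run links above, and JOB_NAME and BUILD_NUMBER are the standard Jenkins environment variables):

# Sketch of what the post-build upload does, not the real conditional build step.
gsutil -m cp -r /workspace/_artifacts \
    "gs://kubernetes-jenkins/logs/${JOB_NAME}/${BUILD_NUMBER}/artifacts"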

@ixdy (Member) commented Oct 11, 2016

I suspect that the job updater ran at some point this weekend while the conditional build step plugin wasn't working, so it wasn't able to add that step. I'm not sure why it hasn't rectified it since.

@rmmh (Contributor) commented Oct 11, 2016

That seems to be the problem. I've cleared the job cache and done a full rebuild, so gce-scale and anything else that was updated while the post-build plugin was missing should be fixed now.
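
If the job updater here is Jenkins Job Builder (an assumption; the thread doesn't name the tool), this matches JJB's caching: it records a hash of each job it pushes and skips jobs it believes are unchanged, so a job that Jenkins mangled while the plugin was missing never gets re-pushed until the cache is cleared. Roughly:

# Sketch, assuming Jenkins Job Builder: drop the local cache, then re-push everything.
rm -rf ~/.cache/jenkins_jobs/
jenkins-jobs update jenkins/job-configs/kubernetes-jenkins/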

@rmmh closed this as completed Oct 11, 2016
@wojtek-t (Member, Author)

Hmm - it doesn't seem to be fully fixed.

Basically, we used to have logs from the kubemark master machine, e.g.:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-kubemark-500-gce/6550/artifacts/?project=kubernetes-jenkins
(note the kubemark-500-kubemark-master/ directory)

However, even now (this is the latest run), those logs are missing:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-kubemark-500-gce/6686/artifacts/?project=kubernetes-jenkins

So something is still broken...

@wojtek-t reopened this Oct 11, 2016
@rmmh (Contributor) commented Oct 12, 2016

All the files in _artifacts are being properly uploaded. Something else is failing to copy the logs onto the node.

In gce-scale/1477:

SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSDumping master and node logs to /workspace/_artifacts
Copying 'kern kube-apiserver kube-scheduler kube-controller-manager etcd glbc cluster-autoscaler docker kubelet supervisor/supervisord supervisor/kubelet-stdout supervisor/kubelet-stderr supervisor/docker-stdout supervisor/docker-stderr' from kubemark-2000-kubemark-master
Node SSH not supported for kubemark

In gce-scale/1465, it seems to be copying more:

SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSDumping master and node logs to /workspace/_artifacts
scp: /var/log/glbc.log*: No such file or directory
scp: /var/log/cluster-autoscaler.log*: No such file or directory
scp: /var/log/supervisor/kubelet-stdout.log*: No such file or directory
scp: /var/log/supervisor/kubelet-stderr.log*: No such file or directory
scp: /var/log/supervisor/docker-stdout.log*: No such file or directory
scp: /var/log/supervisor/docker-stderr.log*: No such file or directory
ERROR: (gcloud.compute.copy-files) [/usr/bin/scp] exited with return code [1]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
Node SSH not supported for kubemark

Did something else change in kubemark between these time periods? Is scp from the master broken now? This is a separate issue.
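
For reference, the scp failure in the 1465 output is generic: gcloud compute copy-files shells out to scp, and scp exits nonzero when any remote glob (e.g. /var/log/glbc.log*) matches nothing, even if the other files were copied fine. A tolerant version would copy each pattern separately, roughly like this (a sketch with assumed variable names, not the actual log-dump.sh code):

# Sketch only: MASTER_NAME and ZONE are placeholders; the log list comes from the output above.
for f in kern kube-apiserver kube-scheduler kube-controller-manager etcd docker kubelet; do
  gcloud compute copy-files "${MASTER_NAME}:/var/log/${f}.log*" /workspace/_artifacts/ \
      --zone "${ZONE}" || echo "no ${f} logs on master, skipping"
done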

@rmmh closed this as completed Oct 12, 2016
@wojtek-t (Member, Author)

I'm not aware of any other changes.
@gmarek - we should try to debug it tomorrow.

@wojtek-t reopened this Oct 12, 2016
@wojtek-t assigned gmarek and wojtek-t and unassigned rmmh and fejta Oct 12, 2016
@spxtr (Contributor) commented Oct 12, 2016

We did touch log-dump.sh recently in #34153. Might be related.

@zmerlynn (Member)

Yeah, I was just trying to work out whether it was. I tried to keep the fake kubemark path alive as best I could.

@wojtek-t (Member, Author)

@zmerlynn - if you could figure out whether it was caused by your PR (the timing matches), that would be great...

zmerlynn added a commit to zmerlynn/kubernetes that referenced this issue Oct 12, 2016
@zmerlynn (Member)

@wojtek-t: #34647

@wojtek-t (Member, Author)

I confirm that this is fixed now.
@zmerlynn - thanks a lot for fixing this!
