Uploading logs to GCS seems to have stopped working #34446
Comments
P0 - because it makes any debugging roughly impossible.
@rmmh is working on this
It seems that after the fixes from yesterday, we are still lacking logs from all kubemark runs. @gmarek
Lack of kubemark logs is significantly slowing down any scalability-related work, which is our P0 goal for this quarter, so please prioritize it.
As an example:
I'm going to migrate this job over to where our code (rather than a Jenkins plugin) captures and uploads logs, which we want to do anyway and will hopefully resolve this.

Odd... if we look at job/kubernetes-e2e-gce/configure we see a bunch of conditional build steps in the post-build actions configuration, which matches what is specified in the YAML:

We also have this specified in the kubemark YAML:

However, I don't see where this is applied in the actual configuration: /job/kubernetes-kubemark-gce-scale/configure
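(A rough way to confirm the discrepancy: diff the rendered job configurations via Jenkins' `config.xml` endpoint. This is just a sketch; `JENKINS_URL` is a placeholder, and the grep pattern assumes the conditional-buildstep plugin's usual XML element naming, which may differ.)

```sh
# Sketch: check whether the conditional build step survived in each job's
# rendered config. JENKINS_URL is a hypothetical placeholder.
JENKINS_URL="http://jenkins.example.com"

for job in kubernetes-e2e-gce kubernetes-kubemark-gce-scale; do
  if curl -s "${JENKINS_URL}/job/${job}/config.xml" | grep -q "conditionalbuildstep"; then
    echo "${job}: conditional build step present"
  else
    echo "${job}: conditional build step MISSING"
  fi
done
```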
I suspect that the job updater ran at some point this weekend while the conditional build step plugin wasn't working, so it wasn't able to add that step. I'm not sure why it hasn't rectified it since.
That seems to be the problem. I've cleared the job cache and done a full rebuild, so gce-scale and anything else that was updated while the postbuild plugin was missing should be fixed now.
Hmm - it doesn't seem to be fully fixed. Basically, we used to have logs from the kubemark master machine, e.g.:

However, even now (this is the last run), we no longer have those logs:

So something is still broken...
Any thoughts about it? The logs are still missing, e.g.:
All the files in _artifacts are being properly uploaded. Something else is failing to copy the logs onto the node.
In gce-scale/1465, it seems to be copying more:
Did something else change in kubemark between these time periods? Is scp from the master broken now? This is a separate issue.
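(One quick way to test the scp hypothesis from the Jenkins node, as a sketch; the instance name and zone below are hypothetical placeholders, not the job's real configuration:)

```sh
# Sketch: check whether we can still ssh/scp to the kubemark master at all.
MASTER="kubernetes-kubemark-master"   # hypothetical instance name
ZONE="us-central1-b"                  # hypothetical zone

# Can we reach the master and see its logs?
gcloud compute ssh "${MASTER}" --zone "${ZONE}" \
  --command "ls -l /var/log/kube*.log"

# Can we copy one back, the way log-dump.sh would?
gcloud compute copy-files --zone "${ZONE}" \
  "${MASTER}:/var/log/kube-apiserver.log" /tmp/
```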
I'm not aware of any other changes.
We did touch
Yeah, I was just trying to work out if it was. I tried to keep the fake
@zmerlynn - if you could figure out whether it was because of your PR (the timing matches), that would be great...
Automatic merge from submit-queue

log-dump.sh: Fix kubemark log-dump.sh

**What this PR does / why we need it**: Using `log-dump.sh` with the `kubemark` synthetic provider is broken.

**Which issue this PR fixes**: fixes #34446
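(For context, the gist of the fix is that log dumping has to special-case the `kubemark` synthetic provider, whose nodes are hollow-node pods rather than real VMs. A minimal sketch of that shape, not the actual patch; the helper names here are illustrative:)

```sh
# Sketch of provider-aware log dumping; helper names are illustrative.
case "${KUBERNETES_PROVIDER:-gce}" in
  kubemark)
    # Only the kubemark master is a real GCE VM with on-disk logs;
    # the "nodes" are hollow-node pods, so there is nothing to scp from.
    dump_master_logs "${MASTER_NAME}"
    ;;
  *)
    dump_master_logs "${MASTER_NAME}"
    dump_node_logs   # iterate over the real node VMs
    ;;
esac
```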
I confirm that this is fixed now.
It seems that we stopped uploading logs to GCS around Friday/Saturday.
As examples, this is the first kubemark-scale run that doesn't have logs:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-kubemark-gce-scale/1468/?project=kubernetes-jenkins
But we also don't have any logs for the kubernetes-e2e-gce suite (which seems to be our main suite), e.g.:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce/24856/?project=kubernetes-jenkins
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce/24857/?project=kubernetes-jenkins
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce/24858/?project=kubernetes-jenkins
This seems to be the last run for which we have logs:
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce/24740/?project=kubernetes-jenkins
which suggests that it broke around 2016-10-08 01:00:00.
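(The Pantheon browser URLs above correspond to `gs://kubernetes-jenkins/logs/...` paths, so the regression window can be narrowed with `gsutil` directly. A sketch, using the run numbers from the examples above and assuming the usual `artifacts/` layout:)

```sh
# Sketch: bisect between the last-good run (24740) and the first-bad
# one (24856) by checking which runs still have artifacts in GCS.
for run in $(seq 24740 24856); do
  if gsutil ls "gs://kubernetes-jenkins/logs/kubernetes-e2e-gce/${run}/artifacts/" >/dev/null 2>&1; then
    echo "run ${run}: artifacts present"
  else
    echo "run ${run}: artifacts MISSING"
  fi
done
```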
@kubernetes/test-infra-admins @kubernetes/sig-testing
@fejta @ixdy @spxtr