flake data excludes pod-utils jobs #14643
What data/files is the flake analysis using? Are the pod-utils not uploading something they should? |
I'm not sure I fully understand the pipeline just yet; I took a quick look and it seems to be something like:
I think the issue is that repos is now a JSON blob, so @fejta suggested something like making this a field in the database and then reading that instead of this
|
I'm not actually sure which component is at fault here, or exactly why this doesn't work. Looking at the data: jobs that are fully pod-utils are missing, and jobs that migrated to pod-utils on newer branches show "0 flakes" when they definitely have non-zero flakes. I would tend to suggest the pipeline is a bit hairy and probably at fault, but the results are generally very useful for identifying sources of flakiness. The data looks (?) present in pod-utils to me, but I'm not fully familiar with that format or the BigQuery pipeline...
https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/83519/pull-kubernetes-e2e-gce/1181359333854679040/started.json
https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/83519/pull-kubernetes-e2e-gce/1181359333854679040/finished.json
... maybe it's reading repos from finished.json instead of started? |
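For reference, a minimal sketch of how a consumer could normalize the repos mapping across the two layouts discussed here. The field names are taken from the started.json/finished.json examples in this thread; this is not kettle's actual code.

```python
def get_repos(started, finished):
    """Find the repos mapping wherever a given uploader put it.

    Bootstrap-era jobs carry "repos" in finished.json's "metadata";
    pod-utils jobs carry it at the top level of started.json (field names
    taken from the example payloads in this thread, not a documented API).
    """
    if "repos" in started:
        return started["repos"]
    metadata = (finished or {}).get("metadata") or {}
    return metadata.get("repos", {})

# pod-utils-style payloads (trimmed from the examples above)
started = {"timestamp": 1578564680, "pull": "85282",
           "repos": {"kubernetes/kubernetes": "master:ef69bc,85282:05c8dc"}}
finished = {"timestamp": 1578565960, "passed": False, "result": "FAILURE"}
print(get_repos(started, finished))
```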
Hmm -- not sure, I've never looked at that pipeline myself. Happy to help if we can identify what the utils should be doing to be compliant |
Thanks. I intend to take another look at this pipeline tomorrow to try to understand what is different.
|
This came up again today. It seems like we do BigQuery quer{y,ies} and then pipe through jq? ... these are fairly gnarly. |
Who's an expert on that pipeline? |
Cole is the only person I can remember touching it in the past year or so. |
/assign
Work that may overlap #15469 |
The flakes query looks at fields in the builds table for pr:pull-kubernetes-e2e-kind vs. pr:pull-kubernetes-e2e-gce:
test-infra/kettle/make_json.py Lines 158 to 163 in cdeb7eb
test-infra/kettle/make_json.py Lines 155 to 156 in cdeb7eb
|
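As a rough illustration of what the referenced make_json.py lines have to cope with, here is a sketch of deriving a main repo and base branch from a repos blob like the ones above. Treating the first entry as the main repo is an assumption for illustration, not necessarily make_json.py's actual logic.

```python
def main_repo_and_branch(repos):
    """Pick the job's main repo and base branch from a repos mapping.

    repos maps "org/repo" -> "branch:sha,pr:sha,..."; taking the first
    entry as the main repo is an illustrative assumption, not kettle's
    documented behavior.
    """
    if not repos:
        return None, None
    repo, refs = next(iter(repos.items()))
    branch = refs.split(",")[0].split(":")[0]
    return repo, branch

repos = {"k8s.io/kubernetes":
         "master:aef336d71253d9897f83425e80a231763d1385e8,86450:91a6050b",
         "k8s.io/release": "master"}
print(main_repo_and_branch(repos))
```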
example kind job: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/85282/pull-kubernetes-e2e-kind/1215213961905967104
finished.json has no repos data, only:
{"timestamp":1578565960,"passed":false,"result":"FAILURE","revision":"05c8dce8bcb1874ad57bcdeb391c11fcccff2a58"}
started.json has:
{"timestamp":1578564680,"pull":"85282","repo-version":"49162743c0055b4395dd40bdf910f2c0472973b5","repos":{"kubernetes/kubernetes":"master:ef69bc910f0e47bbe3cf396d4bebf4f678cf6f3a,85282:05c8dce8bcb1874ad57bcdeb391c11fcccff2a58"}}
example gce job: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/86450/pull-kubernetes-e2e-gce/1215395413390004227
finished.json has a {
"timestamp": 1578610945,
"version": "v1.18.0-alpha.1.550+64e0fc900b5b3f",
"result": "FAILURE",
"passed": false,
"job-version": "v1.18.0-alpha.1.550+64e0fc900b5b3f",
"metadata": {
"repo-commit": "64e0fc900b5b3fcd5e5a16cb76ed40b1b900df15",
"node_os_image": "cos-77-12371-89-0",
"repos": {
"k8s.io/kubernetes": "master:aef336d71253d9897f83425e80a231763d1385e8,86450:91a6050b58898d14f48ef893733cff070b17c0db",
"k8s.io/release": "master"
},
"infra-commit": "dd307d2a7",
"repo": "k8s.io/kubernetes",
"master_os_image": "cos-77-12371-89-0",
"job-version": "v1.18.0-alpha.1.550+64e0fc900b5b3f",
"pod": "c130ee54-332c-11ea-9e6e-4a9fb1cbefb2",
"revision": "v1.18.0-alpha.1.550+64e0fc900b5b3f"
}
} |
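Based purely on the two example payloads above, here is a heuristic sketch for telling the two uploader styles apart from finished.json alone. This is an observation about these examples, not a documented contract.

```python
def uploader_style(finished):
    # In the examples above, the gce job's finished.json carries a
    # "metadata" block (including repos), while the pod-utils kind job's
    # finished.json is minimal: timestamp/passed/result/revision only.
    # Illustrative heuristic only; not an official pod-utils contract.
    return "bootstrap" if "metadata" in finished else "pod-utils"

kind_finished = {"timestamp": 1578565960, "passed": False,
                 "result": "FAILURE", "revision": "05c8dce8"}
gce_finished = {"timestamp": 1578610945, "result": "FAILURE", "passed": False,
                "metadata": {"repos": {"k8s.io/kubernetes": "master:aef336d7"}}}
print(uploader_style(kind_finished), uploader_style(gce_finished))
```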
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale
I got as far as writing a google-doc proposal. I was left with the impression that if we need to touch every part of the pipeline, maybe we want to consider rewriting parts of it piecemeal. |
@spiffxp lacking this data seems problematic. can we at least add some snippet we run in the wrapper script to dump this to e.g. metadata.json, or update the pipeline to consume the prowjob.json or ..? |
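One hedged sketch of the wrapper-script idea floated here: merge the repos mapping into the job's metadata.json so downstream consumers can find it in one place. The helper name, file shape, and merge behavior are all assumptions for illustration; this is not an existing pod-utils feature.

```python
import json
import os

def merge_repos_into_metadata(repos, path="metadata.json"):
    """Merge a repos mapping into the job's metadata.json.

    Hypothetical helper for the wrapper-script suggestion above; pod-utils
    does not necessarily write or read metadata this way.
    """
    meta = {}
    if os.path.exists(path):
        with open(path) as f:
            meta = json.load(f)
    meta["repos"] = repos
    with open(path, "w") as f:
        json.dump(meta, f, indent=2)
    return meta
```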
/assign |
…uild This is based on spiffxp's proposal off issue kubernetes#14643
#19666 covers updating queries |
Current status:
We haven't decided whether to swap out the old for the new:
When we decide to swap out old for new, we should also look at updating other queries before calling this done (ref: #20013) |
/milestone v1.21 |
Thanks @spiffxp, I have only done one juxtaposition of job results and had not seen that discrepancy in job data. I will try to look into this soon |
Here are the results I am seeing:
New Query Top 10
|
/close |
@MushuEE: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
See http://storage.googleapis.com/k8s-metrics/flakes-latest.json etc. (metrics/ produced files)
and http://velodrome.k8s.io/dashboard/db/bigquery-metrics?orgId=1
The flake data is very misleading, for example
pull-kubernetes-verify
has "no flakes" which is definitely wrong.What seems to be happening if we only include data from bootstrap.py results, not pod-utils (I think) possibly due to handling of the repos data (per @cjwagner)
We should fix this, not having flake data is a pretty big regression for managing kubernetes presubmits. I didn't realize that jobs I'd migrated were losing this.