Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skip the tests in the thirdparty directory #3049

Closed
wants to merge 2 commits into from

Conversation

mengqiy
Copy link
Contributor

@mengqiy mengqiy commented Apr 21, 2022

No description provided.

@mengqiy mengqiy requested a review from martinmaly April 21, 2022 23:27
@yuwenma
Copy link
Contributor

yuwenma commented Apr 21, 2022

From my PR CI results, it seems like some tests are flakey

@mengqiy
Copy link
Contributor Author

mengqiy commented Apr 21, 2022

From my PR CI results, it seems like some tests are flakey

@yuwenma Do you have a link that I can look at?

@yuwenma
Copy link
Contributor

yuwenma commented Apr 21, 2022

From my PR CI results, it seems like some tests are flakey

@yuwenma Do you have a link that I can look at?

It is the go build action on both #3028 and #3044 . Rerun passes.

@yuwenma
Copy link
Contributor

yuwenma commented Apr 21, 2022

Aha, fortunately I have another open PR:
https://github.com/GoogleContainerTools/kpt/runs/6120859943?check_suite_focus=true

All three PRs have the same failure on go build

@yuwenma
Copy link
Contributor

yuwenma commented Apr 21, 2022

@mengqiy I pasted the flaky test log here https://paste.googleplex.com/6427926082879488 and reran the job.

@mengqiy
Copy link
Contributor Author

mengqiy commented Apr 21, 2022

TestApplier/apply_resource_with_existing_object_belonging_to_different_inventory

It seems this tese case is causing some issue. I found this test case was modified after the version we are using. kubernetes-sigs/cli-utils@7aee3cc#diff-c1319f0853b5d0fa77470af5ed385b3bd933f6e864978c885e7c745d49ecd021L878

@mengqiy
Copy link
Contributor Author

mengqiy commented Apr 21, 2022

@karlkfi Do you have any insight? If this test case was flaky, and kubernetes-sigs/cli-utils@7aee3cc#diff-c1319f0853b5d0fa77470af5ed385b3bd933f6e864978c885e7c745d49ecd021L878 helps mitigated it?

Here is an example failure: https://github.com/GoogleContainerTools/kpt/runs/6120096965?check_suite_focus=true#step:4:339

We are trying to figure out if it is the k8s RC release or something else causing the problem.

@mortent
Copy link
Contributor

mortent commented Apr 22, 2022

@karlkfi
Copy link
Contributor

karlkfi commented Apr 22, 2022

This looks like a problem I ran into recently while working on my watcher PR (kubernetes-sigs/cli-utils#572).

The issue is probably that the ApplyTask takes long enough that the objec tis already reconciled before the WaitTask starts, which means the WaitTask skips the ReconcilePending event and goes strait to the ReconcileSuccessful event.

It occurs to me that cli-utils doesn't have periodic tests on master, so we might not have caught that the test was flaky. On the plus side, the code is probably fine. So you can fix the test or just comment it out.

@karlkfi
Copy link
Contributor

karlkfi commented Apr 22, 2022

oh wait... missing Current StatusEvent is different that missing Pending WaitEvent... nevermind what I said... I don't know what the cause it for this.

@karlkfi
Copy link
Contributor

karlkfi commented Apr 22, 2022

You might want to change the t.Fatalf to t.Errorf in the expectedStatusEvents check. It's blocking us from seeing the other failures that would help debug, specifically the event list comparison, which would include any error events.

@mengqiy
Copy link
Contributor Author

mengqiy commented Apr 22, 2022

You might want to change the t.Fatalf to t.Errorf in the expectedStatusEvents check. It's blocking us from seeing the other failures that would help debug, specifically the event list comparison, which would include any error events.

Sure. #3050

@karlkfi
Copy link
Contributor

karlkfi commented Apr 22, 2022

I was able to repro locally once, after trying many times:

$ go test ./thirdparty/cli-utils/... -count 1
E0421 17:46:38.787255  201274 task.go:263] Unknown object UID from ResourceCache (status: InProgress): default_foo_apps_Deployment
E0421 17:46:38.787340  201274 task.go:263] Unknown object UID from ResourceCache (status: InProgress): default_secret__Secret
E0421 17:46:38.799146  201274 task.go:248] Empty object UID from InventoryManager: default_foo_apps_Deployment
E0421 17:46:38.819250  201274 task.go:248] Empty object UID from InventoryManager: default_foo_apps_Deployment
E0421 17:46:38.825111  201274 task.go:263] Unknown object UID from ResourceCache (status: InProgress): default_foo_apps_Deployment
--- FAIL: TestApplier (10.08s)
    --- FAIL: TestApplier/apply_resource_with_existing_object_belonging_to_different_inventory (0.01s)
        applier_test.go:1499: Expected status event not received: &testutil.ExpStatusEvent{Identifier:object.ObjMetadata{Namespace:"default", Name:"foo", GroupKind:schema.GroupKind{Group:"apps", Kind:"Deployment"}}, Status:"Current", Error:error(nil)}
FAIL
FAIL	github.com/GoogleContainerTools/kpt/thirdparty/cli-utils/apply	14.620s

@karlkfi
Copy link
Contributor

karlkfi commented Apr 22, 2022

Ahh, i see what the problem is.
The test is sending fake status events for InProgress & Current, but the test is supposed to be erroring in the ApplyTask with an InventoryOverlapError. So in a real life situation, the ApplyTask would never have applied the Deployment and the StatusPoller would never have sent InProgress & Current events for the Deployment, except in some edge cases we're not trying to test here. And even if the StatusPoller did send those events, it would still be a race as to whether the applier exits before receiving them, because the WaitTask is skipped, because the ApplyTask errored.

So the fix is probably to just remove the statusEvents & expectedStatusEvents for this test.

@karlkfi
Copy link
Contributor

karlkfi commented Apr 22, 2022

kubernetes-sigs/cli-utils#582

@mengqiy
Copy link
Contributor Author

mengqiy commented Apr 25, 2022

Closing this PR in favor of #3059.

@mengqiy mengqiy closed this Apr 25, 2022
@mengqiy mengqiy deleted the skiptest branch April 25, 2022 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants