Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky E2E test: e2e/TestDomain - create domain with TLS #1873

Closed
dsimansk opened this issue Oct 9, 2023 · 9 comments · Fixed by #1888 or #1919
Closed

Flaky E2E test: e2e/TestDomain - create domain with TLS #1873

dsimansk opened this issue Oct 9, 2023 · 9 comments · Fixed by #1888 or #1919
Assignees
Labels
good first issue Denotes an issue ready for a new contributor.
Milestone

Comments

@dsimansk
Copy link
Contributor

dsimansk commented Oct 9, 2023

The e2e tests are intermittently failing on e2e/TestDomain. In a specific https test case. When the URL should contain https but is empty. It's most likely a timing issue with empty value being checked prematurely.

There's a wait loop to wait for URL to be populated before asserting it. It might be not working as expected.
https://github.com/knative/client/blob/2d0415a7951b99130ccb514fa0e3c5343338225c/test/e2e/domain_mapping_test.go#L136C1-L144C3

Error log:
https://prow.knative.dev/view/gs/knative-prow/logs/continuous_client_main_periodic/1710944161479266304#

/kind good-first-issue

@knative-prow knative-prow bot added the good first issue Denotes an issue ready for a new contributor. label Oct 9, 2023
@dsimansk dsimansk added this to the Knative 1.12 milestone Oct 9, 2023
@xiangpingjiang
Copy link
Contributor

/assign

@dsimansk
Copy link
Contributor Author

The original issue is not fixed. We need to further investigate & verify the wait loop is working as expected and maybe fix ti.

@xiangpingjiang I'll remove your previous assignment. But feel free to assign again and do a second pass if you are still interested.

@xiangpingjiang
Copy link
Contributor

The original issue is not fixed. We need to further investigate & verify the wait loop is working as expected and maybe fix ti.

@xiangpingjiang I'll remove your previous assignment. But feel free to assign again and do a second pass if you are still interested.

@dsimansk
Did you find that e2e/TestDomain not pass after #1888 merged?

@dsimansk
Copy link
Contributor Author

The original issue is not fixed. We need to further investigate & verify the wait loop is working as expected and maybe fix ti.
@xiangpingjiang I'll remove your previous assignment. But feel free to assign again and do a second pass if you are still interested.

@dsimansk Did you find that e2e/TestDomain not pass after #1888 merged?

Yes. E.g.: https://prow.knative.dev/view/gs/knative-prow/pr-logs/pull/knative_client/1901/integration-tests-latest-release_client_main/1729841947746504704

@xiangpingjiang
Copy link
Contributor

/assign

@dsimansk
Copy link
Contributor Author

The suspicious part in test log is the following. See age: field is 1s, URL is still empty, but the test is already failed with missing string error.

There's still possibility to a short timeout (e.g. 3 - 5s) between create and verify operations. But I'd rather check first if the wait loop is working as expected.

domain_mapping_test.go:153: assertion failed: 
        Actual output: Name:       newdomain.com
        Namespace:  kne2etests21
        Age:        1s
        
        URL:  
        
        Reference:     
          APIVersion:  serving.knative.dev/v1
          Kind:        Service
          Name:        hello
        
        Conditions:  
          OK TYPE    AGE REASON
        
        Missing strings: https://newdomain.com/

@Ipppoooo
Copy link

Ipppoooo commented Dec 27, 2023

It could be a timing related issue. This may work better for the for loop

func domainDescribe(r *test.KnRunResultCollector, domainName string, tls bool) {
	k := test.NewKubectl(r.KnTest().Kn().Namespace())

	// Wait for Domain Mapping URL to be populated
	var url string
	timeout := time.After(30 * time.Second) // Adjust the timeout as needed

	for {
		select {
		case <-time.After(time.Second):
			out, err := k.Run("get", "domainmapping", domainName, "-o=jsonpath='{.status.url}'")
			assert.NilError(r.T(), err)

			if len(out) > 0 {
				url = out
				break
			}
		case <-timeout:
			assert.Fail(r.T(), "Timed out waiting for the domain mapping URL to be populated")
			return
		}
	}

	out := r.KnTest().Kn().Run("domain", "describe", domainName)
	r.AssertNoError(out)

	if tls {
		url = "https://" + domainName
	} else {
		url = "http://" + domainName
	}

	assert.Assert(r.T(), util.ContainsAll(out.Stdout, "Name", "Namespace", "URL", "Service", url))
}

By introducing this more robust waiting mechanism, you can mitigate timing issues and reduce the likelihood of intermittent test failures.

@rhuss
Copy link
Contributor

rhuss commented Jan 10, 2024

Thanks @Ipppoooo , i think its worth a try. Maybe don't retry every second as this will cause 30 calls if this is a non-recoverable error. Maybe use e.g. 5seconds or an exponential backoff. Fancy to send in a PR ?

@xiangpingjiang
Copy link
Contributor

hello @rhuss
I think the reason maybe the break condition in for loop is len(out) > 0, but the assert condition is util.ContainsAll(out.Stdout, "Name", "Namespace", "URL", "Service", url).

It's different conditions.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Denotes an issue ready for a new contributor.
Projects
None yet
4 participants