Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration tests using WAF randomly fails after version 7.12.0 #3237

Closed
baltie opened this issue May 26, 2024 · 4 comments · Fixed by #3239
Closed

Integration tests using WAF randomly fails after version 7.12.0 #3237

baltie opened this issue May 26, 2024 · 4 comments · Fixed by #3239

Comments

@baltie
Copy link
Contributor

baltie commented May 26, 2024

We have a large number of integration tests using WebApplicationFramework (WAF) and XUnit and test collections in XUnit. These have worked fine up until version 7.12.0, but after that, they are all failing intermittently, both locally and in our CI/CD workflows. That is, either all fail, or all succeed.

I've tracked down the problem to be the changes introduced in PR #3190 and how _cancellation in ProjectionCoordinator is now handled.

What happens is this:

  1. After all tests in our test collection are done, a teardown procedure is started
  2. Due to a bug/problem (feature?) in WAF, the host is disposed of twice. See Host stopped and disposed twice when WebApplicationFactory<> is used with the minimal api  dotnet/aspnetcore#40271
  3. The ProjectionCoordinator is also then stopped twice, which means StopAsync is called twice.
  4. PauseAsync is then called twice, where the first one succeeds, but the next one fails because the _cancellation CancellationTokenSource has been cancelled and disposed by the first call to PauseAsync.

If the whole test suit fails or not depends on which one of the execution paths fails - if it is the one that XUnit monitors.

It seems to me that Microsoft is not inclined to fix the double dispose. Earlier PRs to fix the matter have been closed and not merged, and the proposed solution is usually to fix the issue in the hosted service so that it handles being stopped twice.

This wasn't a problem with ProjectionCoordinator before, though, until the changes mentioned around pause and resume.

The fix should be simple. There is no need to dispose the CTS right after the cancel in PauseAsync() - you can instead call _cancellation?.Dispose() in StartAsync() before _cancellation = new(). A CTS that is not linked nor uses a timer is not really using any unmanaged resources, so it doesn't really matter if it is left to live until StartAsync() is called again.

@jeremydmiller
Copy link
Member

@baltie I think this is an "I take pull requests" with a relatively quick turnaround

baltie added a commit to baltie/marten that referenced this issue May 27, 2024
… to right before creation

This avoids the problem with exceptions thrown if StopAsync is called more than once.

Closes JasperFxGH-3237
@baltie
Copy link
Contributor Author

baltie commented May 27, 2024

@jeremydmiller I've made a PR for this now. I've tested it, and all our integration tests work after the change.

@jeremydmiller
Copy link
Member

Okay, I'll try to get that out tomorrow maybe

jeremydmiller pushed a commit that referenced this issue May 29, 2024
… to right before creation

This avoids the problem with exceptions thrown if StopAsync is called more than once.

Closes GH-3237
@baltie
Copy link
Contributor Author

baltie commented May 29, 2024

Thank you! 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants