Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Our build script tests hang when run with Mono on Windows #3222

Closed
jnm2 opened this issue Apr 28, 2019 · 6 comments · Fixed by #3225 or #3095
Closed

Our build script tests hang when run with Mono on Windows #3222

jnm2 opened this issue Apr 28, 2019 · 6 comments · Fixed by #3225 or #3095
Assignees
Milestone

Comments

@jnm2
Copy link
Contributor

jnm2 commented Apr 28, 2019

When I run the equivalent of test45 using Mono 5.20 on Windows just like we do on Linux, the test run hangs consistently. It looks similar to what we've seen intermittently in CI.

Steps to reproduce from a PowerShell prompt on Windows:

  • choco install mono -y if you don't have it already

  • .\build.ps1 -t build

  • &"C:\Program Files\Mono\bin\mono.exe" `
      tools/NUnit.ConsoleRunner.3.9.0/tools/nunit3-console.exe `
      bin/Release/net45/nunit.framework.tests.dll `
      --result=test-results/net45/nunit.framework.tests.xml `
      --process=InProcess
    

The test run has never completed for me.

@jnm2
Copy link
Contributor Author

jnm2 commented Apr 28, 2019

By adding --labels=Before, I can see that it's actually hanging at a consistent place after all. It's during NUnit.Framework.Attributes.TimeoutTests.TimeoutWithMessagePumpShouldAbort. This is a test I added in #2023. I'm going to try to fix this right away.

@jnm2 jnm2 self-assigned this Apr 28, 2019
@jnm2
Copy link
Contributor Author

jnm2 commented Apr 28, 2019

The test passes when it is the only test in the test run but not when grouped with all the others.

If I remove the WaitMessage code from it so that it just while (true) { } like the other test next to it, it still deadlocks.

Wild hypothesis, but it's starting to feel like Mono never throws ThreadAbortException again once a thread has called ResetAbort even if it's aborted again. If that's the case, you could never have more than one timed out test per thread or else the next timeout would hang.

I went so far as to remove all the aborting logic I added in #2023, but running one extra normal timeout test still hangs when it should be timing out. Thread.CurrentThread.ThreadState is AbortRequested, but no ThreadAbortException gets raised.

If my hypothesis is correct, that would explain the intermittency. To fail, you have to be on Mono and have more than one of the timeout tests get picked up by the same worker thread.

@CharliePoole
Copy link
Contributor

Crazy. You could write a test for that, of course. @migueldeicaza does that sound possible to you? @jnm2 what version of mono?

@jnm2
Copy link
Contributor Author

jnm2 commented Apr 28, 2019

@CharliePoole I haven't verified that. I'm working on it as we speak. But I just had a thought. In #3221 (comment), a stray ThreadAbortException comes along out of nowhere. Here, none shows up. I wonder if somehow something is delaying it so that it only shows up when it's no longer relevant.

Unfortunately I'll be stopping for the evening in ten minutes due to another commitment. I don't mind the break though; I've been banging my head against the wall for six hours at this point. In light of that, everything I say should probably be taken with a grain of salt now that I think of it. :D

@jnm2
Copy link
Contributor Author

jnm2 commented Apr 29, 2019

Now that I'm coming back with fresh eyes, I'm noticing that a number of our timeout tests have this line:

[Platform(Exclude = "Mono", Reason = "Runner hangs at end when this is run")]

This was introduced ten years ago: fc04253
It looks as though Mono has been consistently problematic since then. What I missed in #2023 was that I should have excluded Mono for my test as well, since it's just a fancier version of:

[Test]
[Platform(Exclude = "Mono", Reason = "Runner hangs at end when this is run")]
public void TestTimesOutAndTearDownIsRun()
{
TimeoutFixture fixture = new TimeoutFixture();
TestSuite suite = TestBuilder.MakeFixture(fixture);
TestMethod testMethod = (TestMethod)TestFinder.Find("InfiniteLoopWith50msTimeout", suite, false);
ITestResult result = TestBuilder.RunTest(testMethod, fixture);
Assert.That(result.ResultState, Is.EqualTo(ResultState.Failure));
Assert.That(result.Message, Does.Contain("50ms"));
Assert.That(fixture.TearDownWasRun, "TearDown was not run");
}

@jnm2 jnm2 changed the title Our build script tests deadlock when run with Mono on Windows Our build script tests hang when run with Mono on Windows Apr 30, 2019
@jnm2 jnm2 modified the milestones: 3.13, 3.12 Apr 30, 2019
@jnm2 jnm2 closed this as completed in #3225 May 1, 2019
@jnm2
Copy link
Contributor Author

jnm2 commented May 12, 2019

I nailed down the actual bug in Mono and worked around it so that none of these tests need to be ignored on Mono anymore!

To bring closure to this issue, there are more details starting at #3095 (comment).

tl;dr: Under some circumstances, if you swallow a ThreadAbortException without calling ResetAbort, Mono does not throw ThreadAbortException when the control flow leaves the catch block. Instead, it resurrects it when control flow leaves the next unrelated catch block that happens to execute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants