Infinite loop in nunit 3.9 #2761
The code you posted doesn't seem to be able to cause an infinite loop, at least not if the value of OTOH, if you meant to say it's not Complete, and no other tests are running, this would definitely lock things up. It's interesting that the non-parallel worker is the one having problems. Do you have any non-parallel tests, or do they all run in parallel? What happens if you add a non-parallel test? Using |
Sorry, I messed things up, I meant that the state is |
I figured. 😄 Yes, try that. My theory is that an unused non-parallel worker may be a special case, since the workers are all created up front, whether needed or not. Otherwise, the info trace will give more info as to what was running immediately before the hangup. |
I added a non-parallel test but could still reproduce the hang, so it makes no difference. However, here's something I noticed in the logs (I emitted a couple of additional messages in the code to help me understand what was going on). When the tests end correctly, the sequence I see regarding the dll is:
So it seems that even if the work item is executed by NonParallelWorker the |
The dispatching of the Assuming that you have no specific It's possible you have discovered a race condition in that logic. When the last actual test completes, that "rollup" would normally start. The one-time teardown for each containing suite, starting with the fixture itself, would need to execute. You should check the sequence of those calls. Did anything happen in the teardowns to hang the process? |
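To make the "rollup" idea concrete, here is a minimal sketch of a suite that completes only when its last child reports completion, and then reports to its own parent. This is not NUnit's implementation: `SuiteItem`, `ChildCompleted`, and the log are hypothetical names invented for illustration, and the one-time teardown is reduced to a log entry.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical model of a suite whose completion "rolls up" to its parent.
// None of these names come from NUnit.
public sealed class SuiteItem
{
    private readonly SuiteItem _parent;
    private int _pendingChildren;

    public static readonly List<string> Log = new List<string>();
    public string Name { get; }
    public bool IsComplete { get; private set; }

    public SuiteItem(string name, SuiteItem parent, int childCount)
    {
        Name = name;
        _parent = parent;
        _pendingChildren = childCount;
    }

    // Each child calls this exactly once when it finishes.
    public void ChildCompleted()
    {
        // Only the caller that brings the count to zero performs the roll-up.
        if (Interlocked.Decrement(ref _pendingChildren) != 0) return;

        lock (Log) Log.Add("teardown " + Name); // stands in for the one-time teardown
        IsComplete = true;
        _parent?.ChildCompleted();               // propagate upward
    }
}

public static class RollupDemo
{
    public static void Main()
    {
        var assembly = new SuiteItem("Tests.dll", null, 1);
        var fixture = new SuiteItem("MyFixture", assembly, 2);

        fixture.ChildCompleted();               // first test finishes
        Console.WriteLine(assembly.IsComplete); // False
        fixture.ChildCompleted();               // last test finishes, roll-up runs
        Console.WriteLine(assembly.IsComplete); // True
    }
}
```

In a model like this, a child that never calls `ChildCompleted` (for example because its thread died) leaves the count above zero forever, so the top-level item never completes. That is the kind of sequence worth checking in the trace.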
Possible duplicate of #2764 |
See also #2810. |
I caught a hang live on my machine that I've seen about 2% of the time on AppVeyor. Saved a minidump and attached VS to it! Going to do my best to leave the process running in case one of you wants me to try something. |
/cc @CharliePoole for insight. |
@jnm2 There's not much specific that we can say from the info posted. In general, it looks like this...
I have only been able to debug situations like this by looking at how we got to this place - that is at what tests were running on what threads before we got to the situation. An Info-level trace usually has the right information. |
I managed to reproduce the issue @algol-fi wrote here. The problem lies in the fact that some of the ParallelWorkers die during execution and before completion, and this seems to block the roll-up process he mentioned.
Looking at the code, I noticed this block: nunit/src/NUnitFramework/framework/Internal/Commands/BeforeAndAfterTestCommand.cs Lines 55 to 73 in 0e3ad2b
According to MSDN, ThreadAbortException is a special exception that always rethrows even if caught, unless So, if a test has a You can find an example reproducing the issue here: https://github.com/dukearena/nunit-issue-2761-confirm |
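The re-throw behaviour can be seen in isolation with a small sketch. It assumes .NET Framework, where `Thread.Abort` is supported (on .NET Core and later it throws `PlatformNotSupportedException`), and it is not the NUnit code referenced above; `AbortDemo` and `RunWorker` are hypothetical names. The worker catches `ThreadAbortException` and optionally calls `Thread.ResetAbort`; only in that case does execution continue past the catch block.

```csharp
using System;
using System.Threading;

public static class AbortDemo
{
    // Returns true if the statement after the catch block ran on the worker.
    // Assumes .NET Framework; Thread.Abort is unsupported on newer runtimes.
    public static bool RunWorker(bool resetAbort)
    {
        bool continued = false;
        var started = new ManualResetEventSlim(false);

        var worker = new Thread(() =>
        {
            try
            {
                started.Set();
                Thread.Sleep(Timeout.Infinite); // wait to be aborted
            }
            catch (ThreadAbortException)
            {
                // Without this call the runtime re-raises the exception
                // automatically at the end of the catch block.
                if (resetAbort) Thread.ResetAbort();
            }
            continued = true; // reached only when the abort was reset
        });
        worker.IsBackground = true;

        worker.Start();
        started.Wait();
        worker.Abort();
        worker.Join();
        return continued;
    }

    public static void Main()
    {
        Console.WriteLine(RunWorker(false)); // False: abort re-raised, thread ends
        Console.WriteLine(RunWorker(true));  // True: abort cancelled, thread continues
    }
}
```

The first case is the one relevant to this issue: a worker that catches the abort without resetting it still terminates, so any bookkeeping placed after the catch never runs.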
One note about parallel workers "dying" - In the course of execution, it is normal for an entire new set of parallel workers to be created and then terminated. For example, when a non-parallel fixture starts executing, NUnit does not yet know whether some of the tests in that fixture will run in parallel. If they do call for parallel execution, we don't want them to run in parallel with tests from other fixtures, since this is a non-parallel fixture. Therefore, we create an entire new set of workers to run the tests within the fixture. When the fixture completes, all those workers are stopped. There's a bit of inefficiency in doing this, but it isn't large. I originally planned to optimize it should the need arise but it never did. So the line in the log reading
is probably due to this. If the thread had merely been killed due to an error, I don't think you would get a message at all. However, I'd need to see other entries in the log around that time to be sure. |
Ah, this sounds similar to #2328! |
@dukearena Regarding abort during teardown, see #352. I created this issue four years ago. It has been deprioritized a few times and nobody has ever taken it on as a task. I continue to feel that high priority was the right call back in 2014. It seems to me, however, that we could rather easily fix the problem of not calling ResetAbort as you called out in this issue. |
@CharliePoole regarding the logs, this is a more detailed explanation: So, nunit/src/NUnitFramework/framework/Internal/Execution/TestWorker.cs Lines 109 to 150 in 0e3ad2b
So, even if the fixture is not completed, it exits the Adding a |
@dukearena OK, I see the point now. In any case, I think your fix takes care of the problem. Thanks! |
I am running a test suite with nunit 3.9 using nunit3-console, that randomly blocks at the end of the run. All of the tests derive from a base class that is declared with the [Parallelizable(ParallelScope.All)] attribute.

I managed to break into nunit and the problem is that sometimes the NonParallel worker is stuck inside the OnEndOfShift() method of ParallelWorkItemDispatcher. The _topLevelWorkItem state is WorkItemState.Complete, and after all of the shifts have been run their queues are empty. No other worker is changing the state of the _topLevelWorkItem. Given this situation, the following code inside OnEndOfShift() is an endless loop:

The name of the _topLevelWorkItem is the name of the dll containing all the tests.

I was not able to recreate the problem with a small set of tests but I can make investigations about variable values, execution stacks, or order of events, if you need info.
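The shape of the hang can be modelled with a short sketch. This is a simplified stand-in, not the actual body of OnEndOfShift(): `ItemState`, `EndOfShiftModel`, `FindNextShift`, and the `maxSpins` guard are all hypothetical. It assumes, as the maintainers' replies suggest, that the top-level item is in fact not Complete. The loop keeps looking for a shift with queued work until the top-level item completes; with every queue empty and nothing changing the state, it would never exit, so the model caps the number of passes to stay runnable.

```csharp
using System;
using System.Collections.Generic;

public enum ItemState { Ready, Running, Complete }

public static class EndOfShiftModel
{
    // Looks for the next shift that has work. The maxSpins cap exists only
    // so the demo terminates; the scenario in the report has no such cap.
    public static string FindNextShift(
        Func<ItemState> topLevelState,
        IList<Queue<string>> shiftQueues,
        int maxSpins)
    {
        for (int spin = 0; spin < maxSpins; spin++)
        {
            if (topLevelState() == ItemState.Complete) return "done";

            for (int i = 0; i < shiftQueues.Count; i++)
                if (shiftQueues[i].Count > 0) return "start shift " + i;
        }
        return "stuck"; // nothing queued and the state never changed
    }

    public static void Main()
    {
        var empty = new List<Queue<string>>
        {
            new Queue<string>(), new Queue<string>(), new Queue<string>()
        };

        // All queues empty, top-level item still running: the loop cannot exit.
        Console.WriteLine(FindNextShift(() => ItemState.Running, empty, 1000));  // stuck

        // Same queues, but the top-level item has completed: the loop exits.
        Console.WriteLine(FindNextShift(() => ItemState.Complete, empty, 1000)); // done
    }
}
```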