NUnit Console hanging with parallel tests #4028

Open
samcook opened this issue Jun 10, 2019 · 9 comments

Comments

@samcook

samcook commented Jun 10, 2019

Hi,

I'm having an issue where, since adding [assembly: Parallelizable(ParallelScope.Fixtures)] to one of our test assemblies, our tests are sporadically hanging in TeamCity (running version 3.10 of the nunit console runner).

I've managed to reproduce the hang (again, sporadically) running nunit console directly on my machine using both v3.10, as well as the latest git master version, however I don't really know what's triggering it, so trying to narrow it down to a simple repro case isn't easy.

Debugging the process at the time of the hang doesn't seem to show any threads waiting in our own code.

image

The command line I'm running the test with (mostly copied from our TeamCity setup, with added logging) is:

C:\Dev\git\github\nunit-console\bin\Debug\net35\nunit3-console.exe xxxx.Tests.dll --where "cat!=UI&&cat!=API&&cat!=Acceptance&&cat!=Integration&&cat!=ExternalDependency&&cat!=NeverRunCategory" --framework=net-4.0 --agents=8 --trace=Verbose --output=testoutput.log --labels=Before

I have noticed, comparing the logs of successful runs vs hanging runs, that the hanging ones stop logging immediately prior to the end of the parallel shift (WorkShift: Parallel shift ending is logged in the successful log, which then proceeds with a few more tests that are marked [NonParallelizable]).

Any assistance with getting to the bottom of this is much appreciated.

@CharliePoole
Contributor

First off, if this is an NUnit problem, it pertains to the framework, not the console runner. All the parallelization takes place in the framework. If we decide it's an NUnit problem, however, we can easily transfer it to that repository.

Regarding your command line, it looks fine except for the --agents option. Since you are running only one test assembly, no more than one agent can ever exist, so limiting agents to 8 has no effect. You would use this, for example, if you had 20 test assemblies but didn't want more than 8 to run at a time. However, it's doing no harm, since it's ignored.

Although some people, including some on the NUnit team, advocate use of the assembly-level [Parallelizable(ParallelScope.Fixtures)] I'm not too keen on it. Experience shows that fixtures not explicitly written to run in parallel often won't. I would remove that and start adding [Parallelizable] to individual fixtures until I found the error. Of course, it's extremely likely that it takes two particular fixtures interfering with one another to cause the problem, so you would need to continue removing and adding the attribute until the right fixtures were identified.
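As a rough illustration of the per-fixture approach, the sketch below drops the assembly-level attribute and opts in one fixture at a time. The fixture and test names are hypothetical and not taken from the assembly under discussion; only the NUnit attributes themselves are real.

```csharp
using NUnit.Framework;

// Hypothetical fixtures, for illustration only.
// There is no [assembly: Parallelizable(...)] attribute in this sketch,
// so a fixture runs in parallel only if it opts in explicitly.

[TestFixture]
[Parallelizable(ParallelScope.Self)] // opted in: may run alongside other parallelizable fixtures
public class AccountManagerTests
{
    [Test]
    public void Adds_Account() => Assert.That(1 + 1, Is.EqualTo(2));
}

[TestFixture] // not opted in yet: stays non-parallel by default
public class LegacyDatabaseTests
{
    [Test]
    public void Reads_Row() => Assert.That("row", Is.Not.Empty);
}
```

Marking fixtures one by one and rerunning after each change should narrow the hang down to the fixture, or pair of fixtures, whose opt-in brings it back.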

Alternatively, you could keep the code as it is and run various subsets of the tests looking for the culprit.

Even if this is not caused by NUnit (as I suspect) I think we could give better information to help you debug this sort of problem.

@samcook
Author

samcook commented Jun 10, 2019

Thanks for the quick reply.

Re the --agents parameter: we do test multiple assemblies in our TC build, so it's applicable there. I'd narrowed the issue down to a single assembly here but hadn't removed the option.

I'll try and narrow it down to some particular combinations of tests, however there are over 8,700 tests in that assembly 😓

The thing that's confusing me is that there are no signs of threads sitting in our code when debugging the hung process though 🤔

If there's any additional information or modifications you think would be useful in tracing this I'm happy to help as I'd really like to get this solved.

@CharliePoole
Contributor

A hang can easily be caused by fixtures that use the same resources, even though you may not be creating any threads yourself. NUnit views marking fixtures as parallelizable as a promise on your part that they are completely independent of one another. It does nothing to make them parallelizable.

With that many tests, I imagine they are spread across various namespaces. Each namespace is a "test", which you can select using the command-line --test option or the --namespace option. The latter is handy sometimes, since it doesn't include nested namespaces. However, there is no doubt that it's too tedious to figure these things out. I wish we had something analogous to the git bisect command for bisecting tests.

@ChrisMaddock
Member

Simple thing to check - as Charlie says, this is likely to be an issue in the framework instead of the console. Are you referencing the latest version of the framework? This issue is giving me a strong sense of deja vu...

If that's not the issue, are you able to share your log files?

@samcook
Author

samcook commented Jun 12, 2019

Yes, we're referencing the latest framework version (3.12). I've also tried building the latest master version of the framework and debugging with that, and I'm able to reproduce it there too.

I've managed to narrow it down further to a subset of test fixtures that seem to (sometimes) trigger this.

One thing I have noticed when debugging the hung nunit-agent is this:

If I examine the thread that's sitting in the NUnitTestAssemblyRunner waiting for completion, and drill down into TopLevelWorkItem, I eventually get to a test fixture that doesn't seem to have been started at all (its state is Running, but no tests in that fixture have been run, and I don't see anything in the output indicating that its OneTimeSetUp has been run).

Additionally, I can see that the TestWorker property lists ParallelWorker#3, however when I look at the threads in the process, ParallelWorker#3 is missing 🤔

image

image

I've attached the logs for the assembly in question - let me know if there's anything else needed.

InternalTrace.21996.xxxx.Managers.Tests.dll.log

@CharliePoole
Contributor

ParallelWorker#3 is running on thread 11. Scanning for [11] in your log, I see that thread #11 starts running AddRelatedAccountAsync_Account_Not_Found_Throws_InvalidDomainOperationException and is never heard from again.

@samcook
Author

samcook commented Jun 12, 2019

So, further digging...

This is probably why that thread disappears.

image

When I stepped on from that I was in Reflect.cs, at the line

return method.Invoke(fixture, args);

with a TargetInvocationException, but on the next step the thread was gone (it didn't continue into the catch block).

Not sure what's causing corrupt memory in the process, although we do use SQLite in some of our unit tests, and I believe that is unmanaged code, so I'm suspicious of that.

However, it would be nice if NUnit could deal with a worker thread disappearing by reporting an error message or something, rather than hanging.

@CharliePoole
Contributor

There are certain exceptions from which you can't recover. I believe this is one of them. The catch block in Reflect.cs wraps and throws the exception so that the test can be recorded as an error later. But as you noted, the catch never executes.

To detect what was happening, it would be necessary for the dispatcher to run a thread that periodically checked whether all the worker threads still exist. Reporting which test caused the problem would require more than that, however. Either the dispatcher or the engine would need to keep track of which tests had started but not yet completed. I've recently come across the need to do this in working on an issue in the GUI.

Right now, I think the easiest thing that could be done would be to terminate the test run with an error plus some indication of which test most likely caused the problem. We used to do something like that in the NUnit V2 GUI but it was easier then because we didn't have parallel execution.
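To make the monitoring idea concrete, here is a minimal sketch of the bookkeeping it would need: record which test each worker thread has started, and report the tests whose worker thread no longer exists. This is not NUnit code. WorkerWatchdog and all of its members are hypothetical names, and it assumes the process survives the loss of the thread, as in the hang described in this issue.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical helper, not part of NUnit: remembers which test each worker
// thread is running and reports workers that died mid-test.
public sealed class WorkerWatchdog
{
    // Maps a worker thread to the test it has started but not yet finished.
    private readonly ConcurrentDictionary<Thread, string> _inFlight =
        new ConcurrentDictionary<Thread, string>();

    // Called by a worker thread just before it invokes a test method.
    public void TestStarted(string testName)
    {
        _inFlight[Thread.CurrentThread] = testName;
    }

    // Called by the same worker thread once the test has completed,
    // whether it passed, failed, or errored.
    public void TestFinished()
    {
        _inFlight.TryRemove(Thread.CurrentThread, out _);
    }

    // Returns the tests whose worker thread is no longer alive.
    public IReadOnlyList<string> FindOrphanedTests()
    {
        return _inFlight
            .Where(pair => !pair.Key.IsAlive)
            .Select(pair => pair.Value)
            .ToList();
    }
}
```

A dispatcher could call FindOrphanedTests() on a timer and, if it returns anything, end the run with an error naming those tests instead of waiting forever for work that will never complete.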

@CharliePoole
Contributor

Moving this to the framework project in case more work is needed.

CharliePoole transferred this issue from nunit/nunit-console on Jan 13, 2022