Agent's process stays in memory when it was failed to unload appdomain #43

Closed
CharliePoole opened this Issue Aug 29, 2016 · 40 comments

@CharliePoole
Member

CharliePoole commented Aug 29, 2016

@NikolayPianikov commented on Mon Jun 27 2016

From my point of view, this could be a critical issue.


@CharliePoole commented on Mon Jun 27 2016

Is this new with 3.4?


@NikolayPianikov commented on Mon Jun 27 2016

Yes, looks like a regression from 3.2.1


@CharliePoole commented on Mon Jun 27 2016

This is a bit strange. The code at https://github.com/nunit/nunit/blob/master/src/NUnitEngine/nunit.engine/Services/DomainManager.cs#L153 throws an exception if the domain unload throws or if it times out. I have run under console and verified that the exception comes back to the console and is displayed. Maybe the Join is timing out and the Thread.Abort is hanging.


@NikolayPianikov commented on Tue Jun 28 2016

Anyway, we should terminate all related agents when the console process has finished.


@CharliePoole commented on Tue Jun 28 2016

Certainly... but we have to figure out how to do that. :-) The code I pointed to is part of the termination and it did change in 3.4.

Ideas:

  • Look at what changed in 3.4. Of course the change was intended to solve a problem, so we will have to make sure that still gets solved.
  • Use code from https://github.com/nunit/nunit/blob/master/src/NUnitFramework/framework/Internal/ThreadUtility.cs#L55 which may do a better job of killing the thread.
  • Agent termination has always been cooperative - that is, we tell it to terminate and it does. We could add some code to TestAgency to actually kill the process if it doesn't terminate within a certain time.

What do you think?
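
The third idea above (cooperative stop first, forced kill after a timeout) could look roughly like the sketch below. It is not NUnit code: `AgentTerminator`, `StopOrKill` and the delegate parameters are hypothetical names chosen for illustration, and only `Process.WaitForExit(int)` and `Process.Kill()` are real framework calls.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of NUnit: try a cooperative stop, then fall back to a kill.
public static class AgentTerminator
{
    // requestStop: asks the agent to exit (for example a remote Stop call).
    // waitForExit: waits up to the given number of milliseconds, true if the process exited.
    // kill:        forcibly ends the process.
    // Returns true when the agent exited on its own, false when it had to be killed.
    public static bool StopOrKill(Action requestStop, Func<int, bool> waitForExit, Action kill, int timeoutMs)
    {
        try
        {
            requestStop();
        }
        catch (Exception)
        {
            // A failed stop request must not prevent the forced kill below.
        }

        if (waitForExit(timeoutMs))
            return true;

        kill();
        return false;
    }

    // Convenience overload that binds the delegates to a System.Diagnostics.Process.
    public static bool StopOrKill(Process agent, Action requestStop, int timeoutMs)
    {
        return StopOrKill(requestStop, ms => agent.WaitForExit(ms), () => agent.Kill(), timeoutMs);
    }
}
```

Passing the wait and kill steps as delegates keeps the timeout logic testable without starting a real agent process.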


@NikolayPianikov commented on Tue Jun 28 2016

OK. I will try to find the cause.


@rprouse commented on Tue Jun 28 2016

I need to look at the code, but a timeout in TestAgency and then killing the process seems like the right place.

Better would be identifying why it is happening and fixing that ;)


@CharliePoole commented on Tue Jun 28 2016

I reviewed the 3.4 changes and don't see anything that should cause this. Moving on to the other bullet points.


@CharliePoole commented on Thu Jun 30 2016

@NikolayPianikov We really need to get the release out, so we are going with the changes we have so far. We can follow up further on this problem if it continues.


@NikolayPianikov commented on Thu Jun 30 2016

Issue was reproduced on "master", see http://win10nik.cloudapp.net/viewLog.html?buildId=94&buildTypeId=NUnit_NUnit3IntegrationTests&tab=buildResultsDiv


@NikolayPianikov commented on Thu Jun 30 2016

The user creates a new AppDomain and starts a thread there that never finishes. We can reproduce it using mocks.zip:

internal class UnloadingDomainUtil
{
    // Creates a second AppDomain and instantiates this class inside it.
    public static void Create()
    {
        var newDomain = System.AppDomain.CreateDomain(System.Guid.NewGuid().ToString(), System.AppDomain.CurrentDomain.Evidence);
        newDomain.CreateInstanceFrom(typeof(UnloadingDomainUtil).Assembly.Location, typeof(UnloadingDomainUtil).FullName);
    }

    // Runs in the new domain: starts a thread that spins forever.
    public UnloadingDomainUtil()
    {
        new System.Threading.Thread(() => { while (true) { } }).Start();
    }
}

// In any test fixture:
[Test]
public void MyTest()
{
    UnloadingDomainUtil.Create();
}

@CharliePoole commented on Thu Jun 30 2016

Can you run it using the release/3.4.1 branch?


@NikolayPianikov commented on Thu Jun 30 2016

One moment


@CharliePoole commented on Thu Jun 30 2016

And you say we were successfully terminating such a case using 3.2.1?

Since this is a user-specific situation, caused by a bad test, it could be postponed. The most critical issue is that 3.4 won't run the TeamCity extension at all! I can only give this a very short timebox before I do the 3.4.1 release without it.


@NikolayPianikov commented on Thu Jun 30 2016

It reproduces on 3.2.1 too. Maybe it is a slightly different case (we get the exception "Error while unloading appdomain"), but it could have a similar solution.


@CharliePoole commented on Thu Jun 30 2016

When I add your test to a console run, the only effect is a brief pause before it terminates. This works if I run in process or in a separate process.

I'm going to move ahead with 3.4.1 as it is. After it's out, we can try to address this if it's still a problem.


@tfabris commented on Tue Aug 23 2016

Regarding Nunit issue #1628, "Agent's process stays in memory when it was failed to unload appdomain":

This issue is a blocking issue for us at the moment, due to the fact that we recently converted our company's tests over from Nunit 2.6.4 to Nunit 3.4.1. There are some details about the issue which were not revealed on the original bug report and discussion. We have some new information about this bug. We would like to request that the bug be re-opened because it is a blocking issue.

New details:

  • Bug definitely did not occur in Nunit 2.6.4.

  • Bug occurs in 3.4.1 but we do not know in which build it was introduced.

  • Bug occurs only on certain test DLLs that induce the situation that causes this message to appear: "Unable to unload AppDomain, Unload thread timed out."

  • Important: Bug only occurs when the output of the console is redirected. For example, in test automation situations where console output is captured to memory, such as when Team City is running the Nunit Console runner to execute tests. This bug causes Team City to hang and not complete the build. For ease of repro, you can also reproduce it in a PowerShell script that redirects console output.

  • Important: Bug does not occur when you run the Nunit Console at a plain console (output is not redirected).

  • Example PowerShell command which does NOT cause a hang (bug still occurs, but the output appears normally and drops back to powershell prompt):

    & nunit3-console.exe MyBadTest.dll --x86 --framework=net-4.0

  • Example PowerShell command which DOES INDUCE the hang because it is redirecting the output (this is the same thing that happens to Team City):

    $result = & nunit3-console.exe MyBadTest.dll --x86 --framework=net-4.0
    write-host "$result"

  • If you watch the Windows Task Manager in the two cases above, you see that there is a launch of Nunit3-Console.exe and one or more Nunit-agent-x86.exe instances. Only the Nunit3-Console.exe closes at the end of the test, the agent instance(s) remain open and do not close. When there is no console redirection, control returns to the console, but when there is redirection, control does not return to the console until the agent process is forcefully terminated.

  • This is a blocking issue for any instances where Nunit Console is being launched by any type of automation, because it causes the automation to hang, and fail to process any more commands. The test runner should close down when it is done, regardless of whether the tests had an issue unloading the appdomain or not.


@tfabris commented on Tue Aug 23 2016

Clarifying my last message:

  • The same bug occurs whether or not you redirect the console output: The nunit agent remains in memory and does not leave.
  • However, you only notice a problem (it's only a blocking issue) in situations where the console output is redirected, such as part of an automation system. That is the situation in which the automation hangs because of the Nunit agent which does not close.

@CharliePoole commented on Tue Aug 23 2016

Some work is ongoing in this area. I'll reopen this and we can test when the changes are in.


@tfabris commented on Tue Aug 23 2016

Sweet. We have an easy repro case, so let me know if you want me to verify a fixed build.


@CharliePoole commented on Sat Aug 27 2016

@tfabris Can you test this against the latest MyGet build? It's at https://www.myget.org/feed/Packages/nunit and the latest version at the moment is 3.5.0-dev-03158.


@tfabris commented on Mon Aug 29 2016

I'm very interested in trying out this build of Nunit against our repro case.

However, I am unable to obtain it at the link you provided. Following the link brought me to a signup page for the MyGet service. I had never used MyGet before, and, even after I set up an account there and created a feed, the link you provided ( https://www.myget.org/feed/Packages/nunit ) still does not result in me obtaining the file; it redirects me to a MyGet help/config page. Even after I made sure my profile was filled out and I created a "Feed" in MyGet and tried to add "Nunit" to my feed, same thing: I can't obtain this new build of Nunit that you're trying to link me to. The version of Nunit that it gives me in MyGet is 3.4.1 and I don't see any option to have it give me work-in-progress builds.

Do you have any tips on how to obtain this 3.5.0 dev build?

Thanks!


@CharliePoole commented on Mon Aug 29 2016

@tfabris Sorry - I guess that's only a link for our account to use. There is a Gallery feature in MyGet, but we have not set that up for nunit yet.

Use the following URL in Visual Studio (under NuGet sources) to get our MyGet feed:
https://www.myget.org/F/nunit/api/v2

Alternatively, you can grab the build from AppVeyor:
https://ci.appveyor.com/project/CharliePoole/nunit-console/build/3.5.0-dev-03158/artifacts


@tfabris commented on Mon Aug 29 2016

Thanks so much, Charlie!!!!

I obtained the package from AppVeyor: package\NUnit.ConsoleRunner.3.5.0-dev-03158.nupkg

I extracted it and ran the console runner contained within that package at the command prompt, pointing it to our repro DLL.

Result: Same problem. Steps:

  • When the tests in our repro test DLL are complete and all tests have passed, there is the final report of all test scores at the console. All tests pass. Then, after the test scores are displayed, the error message "Unable to unload AppDomain, Unload thread timed out" appears on the console (this is the bad condition that the test DLL must induce in order for the problem below to repro).
  • At that point in time, "Nunit3-Console.exe" closes and is no longer in memory, as expected. However "nunit-agent-x86" remains in memory and does not leave memory. (This is the bug.)
  • If you are running directly at the DOS console, then control returns to the console, despite the agent remaining in memory.
  • If you are redirecting the console output of the Nunit runner (for example, if you are running a script which pipes the output somewhere else for capture/logging, or you are running in Team City, which pipes the output into memory for capture/logging), then progress hangs and control is not returned to the calling program. It waits forever and does not return. If your test is running in Team City, then the build hangs forever and never finishes. (This is the most critical failure mode of the bug as it blocks production.)

Details, from an analysis by our escalation developer, using the Nunit 3.4.1 build:

The process nunit-agent-x86.exe is kept alive because its main thread is waiting forever for a manual Event to be set. This is the call stack of the main thread:

OS Thread Id: 0x6ff0 (0)
Child SP IP Call Site
00afede8 7745718c [HelperMethodFrame_1OBJ: 00afede8] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean)
00afeecc 723a49d1 System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean)
00afeee4 723a4998 System.Threading.WaitHandle.WaitOne(Int32, Boolean) <<< Waits forever here…
00afeef8 723a496e System.Threading.WaitHandle.WaitOne()
00afef00 0539707e NUnit.Engine.Agents.RemoteTestAgent.WaitForStop()
00afef08 00d2173e NUnit.Agent.NUnitTestAgent.Main(System.String[]) [C:\Users\xxxxxxxxxxx\AppData\Local\JetBrains\Shared\v04\DecompilerCache\decompiler\9E876AE6-521E-4A30-9583-EE5D668CC2CC\76\160e3076\NUnitTestAgent.cs @ 92]
00aff0e8 73301376 [GCFrame: 00aff0e8]

This is the NUnit source code relative to the methods above:

public void WaitForStop()
{
    this.stopSignal.WaitOne();
}
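
To illustrate why that call never returns when nobody signals the event, here is a minimal stand-alone sketch. `StopSignalDemo` is a hypothetical stand-in for the agent, and the timed `WaitForStop(TimeSpan)` overload is an assumption added for illustration, not NUnit API; only the parameterless `WaitForStop()` mirrors the quoted source.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the agent; only WaitForStop() mirrors the quoted source.
public sealed class StopSignalDemo : IDisposable
{
    private readonly ManualResetEvent stopSignal = new ManualResetEvent(false);

    // Called by whoever owns the agent when it should shut down.
    public void Stop()
    {
        stopSignal.Set();
    }

    // Same shape as the quoted method: blocks until Stop() is called.
    public void WaitForStop()
    {
        stopSignal.WaitOne();
    }

    // Bounded variant (an assumption, not NUnit API): false if Stop() did not arrive in time.
    public bool WaitForStop(TimeSpan timeout)
    {
        return stopSignal.WaitOne(timeout);
    }

    public void Dispose()
    {
        stopSignal.Close();
    }
}
```

If `Stop()` is never called, the parameterless wait blocks the main thread indefinitely, which matches the call stack above; the timed overload is one way a caller could regain control.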


@CharliePoole commented on Mon Aug 29 2016

This issue should have been moved to the new nunit-console repository when we split repos. Doing so now.

Member

CharliePoole commented Aug 29, 2016

I confirmed this in our own build of the engine master branch, which had not been running since we split repositories.

tfabris commented Aug 29, 2016

Thanks so much for looking into this in detail, Charlie!

Question: Is there a chance that this issue is a regression caused by the fix to this earlier issue?

nunit/nunit#329

Member

CharliePoole commented Aug 29, 2016

@tfabris I'll check that. The earlier fix was that we were not reporting the error. AFAIK the process would probably still have hung, but I can run some comparisons to see.

tfabris commented Aug 29, 2016

Thanks! I don't know which build it was introduced in, because we've been on 2.6.4 for a long time and only just switched to Nunit3 in the last week or so. So the issue could have been there for a long time, any time after 2.6.4.

rwencel commented Sep 24, 2016

We're having the same (or similar?) problem: the nunit-agent.exe process hangs after the error "Unable to unload AppDomain, Unload thread timed out" when testing certain assemblies. I suspect some of our test or test object finalizers are hanging and not releasing the finalizer thread, but I'm not sure. Since this hangs our TFS build, we're going to work around the problem for now with a custom NUnit build that calls "_realRunner.Dispose()" in a try/finally block in MasterTestRunner's Dispose method. That seems to be the cause of the hang: the agent is never signalled to stop. Let me know if I can help any more.

davidjward30 commented Oct 2, 2016

In case it helps, we also had the same problem when migrating from 2.6.4 to 3.4.1 with TeamCity. We finally managed to track down the problem to one of our tests creating a child process which was left running. This child process was designed to close automatically when its parent closed, so it would not affect subsequent test runs on the same machine. We now close the process and all is well.

tfabris commented Oct 2, 2016

In our case, we discovered that the child process which wasn't shutting down was the third party library "Serilog". It would eventually shut down on its own but it took too long for Nunit/TeamCity's tastes and thus we would get the hang and the test failure. We have had to temporarily work around the problem by disabling Serilog during the run of our integration tests, but it means that when the tests reveal a bug, we don't have any logging to look at. So we're anxiously awaiting a fix for this in Nunit 3.x so that we have the option to re-enable logging in these cases.

Contributor

NikolayPianikov commented Oct 3, 2016

@CharliePoole I looked into this question and I think I could try to make a fairly robust fix for this issue, but it will use the Job Objects Win32 API. Is this acceptable?

Member

rprouse commented Oct 3, 2016

@NikolayPianikov we cannot use any Win32 APIs because we want to maintain compatibility on all platforms.

If you have a way to wrap the Win32 APIs into an interface and detect if those APIs are available, then return a different no-op interface on other platforms, then we might consider a fix, but please sketch out your plan before you do too much work on it.
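
The shape described here (a platform-specific implementation behind an interface, with a no-op fallback elsewhere) might be sketched as follows. Every name is hypothetical and nothing below exists in NUnit; the Windows implementation is deliberately left to a caller-supplied factory so the sketch contains no Win32 calls.

```csharp
using System;
using System.Diagnostics;

// All names here are hypothetical; none of them exist in NUnit.
public interface IAgentProcessGuard
{
    // Registers a child process that must not outlive the runner.
    void Track(Process agentProcess);
}

// Used wherever no platform-specific mechanism is available.
public sealed class NoOpAgentProcessGuard : IAgentProcessGuard
{
    public void Track(Process agentProcess)
    {
        // Intentionally empty: behaviour is unchanged on this platform.
    }
}

public static class AgentProcessGuardFactory
{
    // windowsFactory would build the Job Object based guard. It is passed in
    // so that this sketch needs no P/Invoke declarations of its own.
    public static IAgentProcessGuard Create(Func<IAgentProcessGuard> windowsFactory)
    {
        bool onWindows = Environment.OSVersion.Platform == PlatformID.Win32NT;
        if (onWindows && windowsFactory != null)
            return windowsFactory();
        return new NoOpAgentProcessGuard();
    }
}
```

Callers only ever see `IAgentProcessGuard`, so the rest of the engine would stay free of platform checks.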

Member

rprouse commented Oct 6, 2016

@CharliePoole I haven't seen anything for this. Are you still planning on working on it before release? I don't think we have reliable repro steps, do we?

tfabris commented Oct 6, 2016

Our repro steps were 100 percent reliable, though we don't have example code for a test that induces the problem, since that's internal company stuff.

Basically the steps were:

  • Write a test that calls an object that opens a thread and never closes it.
  • Run this test under Nunit 2.6.4: Get "Unloading Appdomain" message but there is no hang.
  • Run this test under Nunit 3.x: Get "Unload Appdomain" message and the Nunit-Agent process hangs and never exits.

Member

CharliePoole commented Oct 6, 2016

Although it's hard to replicate, we do understand the source of the problem. Years ago, we decided it didn't matter if an AppDomain didn't unload or a subordinate process never exited. Now we have various cases where it does matter.

I was planning to put something in to kill any processes that we started on exit, provided they are still running. I haven't looked at how easy it will be. TestAgency has the process info and the console runner knows when it's exiting. If it's relatively easy, I'll do it, then we can see whether it solves the problem.
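
A rough sketch of that plan, under the assumption that something records each started process and the runner calls a cleanup method on exit. `StartedProcessRegistry` is a hypothetical name; how TestAgency actually tracks its processes is not shown in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical registry; TestAgency's real bookkeeping is not shown in this thread.
public sealed class StartedProcessRegistry
{
    private readonly List<Process> started = new List<Process>();

    public void Register(Process process)
    {
        lock (started) { started.Add(process); }
    }

    // Kills every registered process that is still running and returns how many were killed.
    public int KillSurvivors()
    {
        int killed = 0;
        lock (started)
        {
            foreach (Process process in started)
            {
                try
                {
                    if (!process.HasExited)
                    {
                        process.Kill();
                        killed++;
                    }
                }
                catch (InvalidOperationException)
                {
                    // The process was never started, or exited between the check and the kill.
                }
                catch (Win32Exception)
                {
                    // The process could not be terminated (for example, access denied).
                }
            }
            started.Clear();
        }
        return killed;
    }
}
```

The console runner would call `KillSurvivors()` on its exit path, after the normal cooperative shutdown has had its chance.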

tfabris commented Oct 6, 2016

Sounds like a reasonable plan!

rwencel commented Oct 7, 2016

Another repro is to put an infinite Thread.Sleep in a test fixture's finalizer. I described a fix for the 3.4.1 release above, but it seems the relevant code has changed since 3.4.1. Basically, the ProcessRunner wasn't getting disposed b/c the MasterTestRunner threw an exception b/c the AppDomain didn't unload. Without the ProcessRunner disposed, the agent is never signalled to stop and hangs forever.

leo90skk commented Oct 7, 2016

Note: In my case I get the error when I am using an external library (NetMQ).
I decided to disable these unit tests because I can't dig so deep into that foreign library. My project is open source - take a look at leo90skk/qdms#57 .

Contributor

NikolayPianikov commented Oct 7, 2016

@rprouse If it were possible to have an extension point that controls the creation of new agent processes, we could create an extension. This extension would just create a Job for the parent process with specific parameters; all the other work would be done by the OS. I could make a prototype.
Of course, we should not forget about Mono on Linux and .NET Core. But as a first step we could do it only for full .NET Framework users on Windows. In the future we could also use platform-specific assemblies within the scope of each extension.
This extension could also control many other things, for example running tests under a specified user account or Windows integrity level, and other useful things.

rwencel commented Oct 7, 2016

@rprouse @CharliePoole The problem still reproduces in last night's latest source code. Using a try/finally in ProcessRunner's Dispose method fixes the problem. This fix signals the agents to stop even if the AppDomain unload throws an error. Previously the agent was never signaled to stop, so the nunit-agent.exe process hung on agent.WaitForStop(). I'm really hoping this makes it into 3.5 since we're using a custom build now to avoid a TFS build hang. Thanks guys!

Repro Command Line
"nunit3-console.exe" "bin\Debug\Repro.dll" --framework=net-4.5 --agents=4

Repro

    [TestFixture]
    public class TestClass
    {
        [Test]
        public void Test()
        {
            Assert.Pass();
        }
        ~TestClass()
        {
            System.Threading.Thread.Sleep(TimeSpan.FromDays(1));
        }
    }

Fix in ProcessRunner.cs

        protected override void Dispose(bool disposing)
        {
            try {
                base.Dispose(disposing);
            } finally {

                try {
                    if (disposing && _agent != null) {
                        log.Debug("Stopping remote agent");
                        _agent.Stop();
                        _agent = null;
                    }
                } catch (Exception e) {
                    log.Error("Failed to stop the remote agent. {0}", e.Message);
                    _agent = null;
                }
            }
        }

Member

CharliePoole commented Oct 7, 2016

No fix to this issue has been pushed so of course the problem persists.

I agree with your diagnosis but am a little nervous with a try / catch inside a finally block. Have you seen this work when an exception is followed by a second exception inside the finally?
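
As a small stand-alone illustration of that question (not NUnit code; `NestedFinallyDemo` and the event strings are made up for the example), the sketch below throws once in the try body and again inside the finally, and records what happens in order. The second exception is handled by the inner catch, and the first one still reaches the caller.

```csharp
using System;
using System.Collections.Generic;

public static class NestedFinallyDemo
{
    // Returns the recorded events in the order they happen.
    public static List<string> Run()
    {
        var events = new List<string>();
        try
        {
            try
            {
                events.Add("unload throws");
                throw new InvalidOperationException("first");
            }
            finally
            {
                try
                {
                    events.Add("stop throws");
                    throw new Exception("second");
                }
                catch (Exception e)
                {
                    // Handled here, so it does not replace the exception already in flight.
                    events.Add("caught in finally: " + e.Message);
                }
            }
        }
        catch (InvalidOperationException e)
        {
            events.Add("caller sees: " + e.Message);
        }
        return events;
    }
}
```

If the inner catch were missing, the second exception would escape the finally block and replace the first, which is the main risk with this pattern.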

Member

CharliePoole commented Oct 7, 2016

Research indicates it's legal to do a try/catch in the finally but almost universally considered bad form. 😄

I'll try to restructure to avoid the nesting, but if I can't come up with something quick, @rwencel 's code will go into the release.

Member

CharliePoole commented Oct 8, 2016

Dispose is doing two things: unloading the runner and stopping the agent. Hence, there are four cases of success/failure...

  1. Both succeed - no problem, no issue here.

  2. Unload fails but stop succeeds. I'm displaying a two line warning, for example...

    Unable to unload AppDomain. Unload thread timed out.
    Agent process was terminated successfully after error.
    
  3. Unload succeeds but stop fails. I'm displaying...

    Failed to stop the remote  agent. Simulated failure in agent stop.
    
  4. Both unload and stop fail. I'm displaying

    Unable to unload AppDomain. Unload thread timed out.
    Failed to stop the remote  agent. Simulated failure in agent stop.
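
One way to get those four outcomes without nesting a try/catch inside a finally is to run each step in its own guarded call and decide on the messages afterwards. The sketch below is only an illustration of that idea, not the code that went into the release: `TwoStepDispose`, `Attempt` and the two delegates are hypothetical, and the message texts are modelled on the examples above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the delegates stand in for the real unload and stop steps.
public static class TwoStepDispose
{
    // Runs one step and returns the exception it threw, or null on success.
    private static Exception Attempt(Action step)
    {
        try { step(); return null; }
        catch (Exception e) { return e; }
    }

    // Runs both steps unconditionally and returns the warning lines to display.
    public static List<string> Run(Action unloadRunner, Action stopAgent)
    {
        var lines = new List<string>();
        Exception unloadError = Attempt(unloadRunner);
        Exception stopError = Attempt(stopAgent);

        if (unloadError != null)
            lines.Add(unloadError.Message);

        if (stopError != null)
            lines.Add("Failed to stop the remote agent. " + stopError.Message);
        else if (unloadError != null)
            lines.Add("Agent process was terminated successfully after error.");

        return lines;
    }
}
```

Because the stop step no longer depends on the unload step succeeding, the agent is always asked to stop, which is the behaviour the try/finally fix was after.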
    
Roemer commented Oct 17, 2016

I have the same problem. As you say, it should be an issue in the test code, but I have one case that I spent hours trying to solve, and now I'm out of ideas and still end up with the AppDomain unload exception.
The problem occurs in one of my projects where I do UI testing. Most tests open an application, do some UI stuff and then exit the application.
I have now created a standalone version which fails 100% of the time. It only contains a WinForms sample app with an empty ListView and an empty TreeView. The test itself only has some initialization code where it selects the tree.
There are various things I have found out already:

  • The test must run at least twice (two TestFixtures with parameters) so that the second one fails
  • If the test is run more than two times, only the last one fails; the others work correctly
  • Deleting the ListView from the sample app solves the issue
  • Moving the "this.Controls.Add" (in the Designer.cs) for the ListView below the one for the TreeView solves the issue (this is very strange to me...)
  • Commenting out var tree = mainWindow.FindFirst(... solves the issue

I am completely out of ideas about what could be going wrong, and especially why some totally unrelated changes fix this issue.
I have a very simple project which shows the problem here: http://flauschig.ch/transfer/TreeHangTest.zip
Just build it, then run runtetst.bat, and you will see the problem.
Would anyone mind helping me find out why the app domain cannot unload?

@CharliePoole

Member

CharliePoole commented Oct 18, 2016

@Roemer Are you saying that the agent still hangs with the fix in the new release?

@Roemer

Roemer commented Oct 18, 2016

@CharliePoole No, it does not hang with 3.5, but I get the warning "Unable to unload AppDomain". I want to fix that, but I have run out of ideas (see my post above). Is there any way to get more information about why it fails?

@tfabris

tfabris commented Oct 24, 2016

I'm glad to hear that it does not hang with version 3.5 any more.

Today I tried to locate the version 3.5 console runner, but I cannot find where to download it. I went to https://github.com/nunit/nunit/releases/tag/3.5 but none of the .zip files located there contain the console runner. The release notes say that the console runner is now a separate build, but I cannot find where to download that build.

@rprouse

Member

rprouse commented Oct 24, 2016

@tfabris The NUnit Console is now released separately, from this repository: https://github.com/nunit/nunit-console/releases

@CharliePoole

Member

CharliePoole commented Oct 24, 2016

@Roemer I created issue #111 to look at how to get more info to you when the AppDomain won't unload.

@tfabris

tfabris commented Oct 24, 2016

Rob, thank you so much for pointing me to the correct download location.

@davidjward30

davidjward30 commented Nov 3, 2016

I am still getting occasional issues with the agent process staying in memory with 3.5.0. This problem appeared when we migrated from 3.2 to 3.4. I'm fairly sure that the AppDomain is being unloaded successfully in our case. Looking at the 'dead' agent processes, they only have nunit assemblies loaded into memory.

I'd like to help but it seems very hard to reproduce. I could try and get a process dump from the agents in the 'dead' state.

@davidjward30

davidjward30 commented Nov 4, 2016

Following on from my previous comment, I'm starting to spot a pattern in our TeamCity runs. I have a feeling that "Cancelling" a build while NUnit is running could be causing the agent processes to be left running. I suspect that TeamCity simply terminates the nunit-console process. Cancelling has always worked reliably for us in NUnit 2.6 and 3.2.

@CharliePoole

Member

CharliePoole commented Nov 4, 2016

@davidjward30 Your last comment makes a certain amount of sense. The agent process runs until it is told to terminate by the main process. We may need to rethink that if we can't properly handle termination of the process. Could be a fairly big thing. 😦

Since this really has nothing to do with the closed issue of being unable to close the AppDomain, could you please file this as a new issue?

@CharliePoole

Member

CharliePoole commented Nov 4, 2016

@NikolayPianikov can you tell us if TC cancels the NUnit process in some circumstances and, if so, exactly what it does to accomplish it?

@davidjward30

davidjward30 commented Nov 7, 2016

These cancellations occur when a user requests a CI build to be cancelled. @CharliePoole - I have created a new issue #128

@davidjward30

davidjward30 commented Nov 7, 2016

@CharliePoole - the way we typically handle this situation in our software is to have the child process open a handle to its caller and then exit early if the caller closes. This seemed simpler than using the "job objects" that were mentioned earlier in this thread.
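
A rough sketch of that idea, written in Python rather than as NUnit code: the child keeps some way of asking whether its caller is still alive and exits as soon as the answer is no. On Windows a handle-based version would wait on the parent's process handle instead of polling. The names `watch_parent`, `parent_alive` and `on_parent_exit` are hypothetical, and the liveness check is injected so the loop can be exercised without real processes.

```python
import threading
import time
from typing import Callable


def watch_parent(parent_alive: Callable[[], bool],
                 on_parent_exit: Callable[[], None],
                 poll_interval: float = 0.5) -> threading.Thread:
    """Start a daemon thread that calls on_parent_exit once the parent is gone.

    parent_alive is a hypothetical probe supplied by the caller; a real agent
    might wait on a process handle opened at startup instead of polling.
    """
    def loop() -> None:
        # Keep checking until the probe reports that the parent has exited.
        while parent_alive():
            time.sleep(poll_interval)
        on_parent_exit()

    thread = threading.Thread(target=loop, name="parent-watchdog", daemon=True)
    thread.start()
    return thread
```

In an agent, `on_parent_exit` would typically shut the process down; the watchdog runs on a daemon thread so it never keeps the agent alive by itself.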

@NikolayPianikov

Contributor

NikolayPianikov commented Nov 7, 2016

TeamCity has a button to stop a build. TeamCity stops the whole tree of processes, starting from the root, but sometimes there is no way to find all the children, for example when console.exe has already stopped but the agents have not. In this case "Jobs" could help, but that approach works only on Windows. We could use some kind of "heartbeat" signal to check that the console is still running, and each agent should exit automatically when the console has closed.
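
The heartbeat idea might look something like the following. This is only a Python sketch under stated assumptions, not an NUnit API: `HeartbeatMonitor`, `beat` and `expired` are hypothetical names, and how the heartbeat actually travels from the console to the agent is left open.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Tracks heartbeats from the console and reports when they stop.

    Hypothetical sketch: the agent calls beat() whenever it hears from the
    console and checks expired() periodically to decide whether to exit.
    """

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        self._last_beat = clock()

    def beat(self) -> None:
        """Record that the console was heard from just now."""
        self._last_beat = self._clock()

    def expired(self) -> bool:
        """Return True when no heartbeat arrived within the timeout."""
        return self._clock() - self._last_beat > self._timeout
```

Because nothing here depends on the operating system, the same check would work on platforms where job objects are not available; the cost is that an orphaned agent lingers for up to one timeout before exiting.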

@tfabris

tfabris commented Nov 7, 2016

Another reason that TeamCity might stop the test is if the test has exceeded its runtime limit and is considered to be "hung". TeamCity will terminate the NUnit console in that situation, but not the agents.

@tfabris

tfabris commented Nov 7, 2016

I'm sorry that I don't know the exact method that TeamCity uses to perform the termination.

@pavzaj

pavzaj commented Nov 9, 2016

Hello, I have a similar problem. When I run my test suite like so:
D:\BuildAgent2\work\c11fa50512543249>D:\NUnit-3.5.0\nunit3-console.exe <testDllsNames> --where="cat!=FunctionalTest&&cat!=IntegrationTest&&cat!=StressTest" --agents=0 --dispose-runners --framework=net-4.5 --work=<workdir> --verbose --teamcity --result=.\CodeAnalysis-Nunit-Result.xml;format=nunit2

I get the error:
Unable to unload AppDomain, Unload thread timed out. Agent Process was terminated successfully after error.
The error occurs both on TeamCity and in a plain Windows cmd.
No nunit-agent process seems to hang after that.
The result file produced after this error is empty. I need its content very badly :).
If I run the same command with NUnit console 3.2.1, some tests throw exceptions, but at the end no NUnit error is thrown and the result file is correct.

@rprouse

Member

rprouse commented Nov 9, 2016

@pavzaj What happens if you run from the command line and add the --inprocess option?

@rprouse

Member

rprouse commented Nov 9, 2016

@CharliePoole Do you think we should open a new issue to track the new reports? It is related, but the issue is different.

@pavzaj

pavzaj commented Nov 10, 2016

With --inprocess (instead of --agents=0) it ends with the message "Exception encountered unloading AppDomain". As I said, the test suite throws some exceptions during execution with console runner 3.2.1. These exceptions do not show during execution with console runner 3.5.0, but the whole run ends with "Exception encountered unloading AppDomain".

The test suite is built with nunit framework 3.2.1.
