Add locking to AgentDatabase #183

ChrisMaddock · 2017-02-25T14:12:40Z

Fixes #168 . This is targeting release/3.6 as I think it may need to be a hotfix - although I don't think that decisions been made yet, so will move it over if not.

This adds a lock object to the AgentDatabase, to allow access in parallel. This was introduced by the just-in-time agent loading fixed in 3.5, and will be required if we parallelise load/explore etc.

Many of the methods in AgentDatabase were no longer used - so I took the liberty of removing them rather than making them safe. The class is now solely additive.

I wanted to add a test to ensure agent creation can be parallelised, but wasn't able to reproduce the error. The test I had (and removed) is below - open to any ideas as to how we might be able to put something in to prevent regression here. (Note the below additionally doesn't work due to #173 - but once I'd hacked around that, I still wasn't able to reproduce the error from #168.)

        [Test]
        public void CanCreateAgentsInParallel()
        {
            const int agents = 10;
            var workerPool = new ParallelTaskWorkerPool(agents);

            for (var i = 0; i < agents; i++)
            {
                var task = new LaunchAgentTask(_testAgency);
                workerPool.Enqueue(task);
            }

            TestDelegate createAgents = () =>
            {
                workerPool.Start();
                workerPool.WaitAll();
            };
            Assert.That(createAgents, Throws.Nothing);
        }

        private class LaunchAgentTask : ITestExecutionTask
        {
            private readonly TestAgency _agency;

            public LaunchAgentTask(TestAgency agency)
            {
                _agency = agency;
            }

            public void Execute()
            {
                _agency.GetAgent(new TestPackage("."), Timeout.Infinite);
            }
        }

CharliePoole

Looking at both the V2 and our current TestAgency, I see that my guess in the line comment was correct: the code contained calls to the AgentDataBase that were commented out. I recall this was because of threading issues, which we are hopefully now solving.

Originally, the agency was notified when a process exited and OnProcessExit was called. At the end of a run, Clear was called.

There was also code to reuse existing agents, which is something the gui made use of in V2. When you reloaded a set of tests, the agency would check to see if the existing process had the correct framework and bitness to allow it to be reused. I don't know if we will do that in the new gui.

I'd say that we should retain Remove and Clear and reinstate their use by TestAgency.

As for testing, I think we need a unit test of AgentDataBase, including hitting it with multiple threads. That would be a much narrower test than what you were trying to do. Of course that means either pulling AgentDataBase out to a separate file or at least making it accessible as a nested class. Can a nested class be internal? Separating it would be a little more work, but cleaner.

CharliePoole · 2017-02-25T17:20:28Z

src/NUnitEngine/nunit.engine/Services/TestAgency.cs

-            public void Remove(Guid agentId)
-            {
-                agentData.Remove(agentId);
-            }


I agree with the removal of unused methods, but I'm very surprised that this is unused. Seems like we should be removing entries when we stop the runner.

We've known about the problem of thread-safety for a long time - before the latest changes made it impossible to ignore. So it's possible that some of these method have calls that were deleted or commented out. Let's take a look at TestAgency to see if that's so.

ChrisMaddock · 2017-02-25T23:22:12Z

Thanks @CharliePoole - will have a look tomorrow.

Which bits of commented code do you think should be be re-enabled? I'm a little wary of blindly making changes to the agent termination code, given the issues we have and had have with remoting exceptions here. Would it perhaps be better if this PR just puts back the relevant bits of the AgentDatabase, and the other commented out code is looked at in a separate issue?

CharliePoole · 2017-02-26T01:15:08Z

I'd be OK with leaving the TestAgency changes re-enabling use of the other methods to a separate PR. For this one, I think we should re-enable Remove and Clear and add the lock to them. Ideally, we should add some tests too, but it depends on how much time you have right now. We shouldn't start to actually use those other methods till we have tested them with multiple clients.

ChrisMaddock · 2017-03-01T18:54:53Z

I didn't manage to get back to this last weekend, and I'm not sure I will this weekend either. If anyone wants to take over this branch, you're welcome to - otherwise I'll get back to it when time allows!

CharliePoole · 2017-03-02T02:10:14Z

@ChrisMaddock I pulled this down to give it a try and it's hanging when I try to run the engine tests. Since it was passing in AppVeyor, it must be something different on my system.

CharliePoole · 2017-03-02T21:16:43Z

@ChrisMaddock Resolved my problem. Writing some tests now.

ChrisMaddock · 2017-03-02T22:47:34Z

Great - thanks @CharliePoole and @thinkbeforecoding!

thinkbeforecoding · 2017-03-03T08:12:36Z

What's still needed to make it ready ?

CharliePoole · 2017-03-03T08:14:58Z

I need to decide whether to keep and use the currently commented methods . I'll look at it in the morning.

CharliePoole · 2017-03-03T23:07:43Z

I think this is now ready to merge. Since it has changed substantially, I'd like a @nunit/framework-team member to (re)review it.

AgentDataBase is no longer an internal class to TestAgency. It's in its own file and has its own tests.

I left some commented-out code in place in TestAgency, because I think we may want to re-activate it to fix some other issues. For the same reason, I left in a few AgentDataBase methods that are not now called by TestAgency.

I'm not completely happy with adding the Snapshot facility to AgentDataBase since it serves only to enable testing more easily. However, it simplified things enormously when I added it and we can always refactor it in another PR if we want.

CharliePoole · 2017-03-04T00:38:58Z

Well, my "clever" use of nameof causes a warning under Mono, which makes the build fail.

I did this...

private static int[] Counts = new int[] { 1, 3, 10 };

const string COUNTS = nameof(Counts);

which allows me to use TestCaseSource(COUNTS).

However, Counts is seen as never used! 😠

CharliePoole · 2017-03-04T21:30:28Z

Tests failed once in travis and passed on re-running. Failure was an issue with how we run multiple TestAgency instances in our own tests, not something that comes up in normal execution of user tests.

ChrisMaddock · 2017-03-05T13:45:57Z

Thanks for picking this up @CharliePoole. Just to note - I started this PR based of the release-3.6 branch, we'll need to make sure this is either released as a hotfix, or pulled into master.

CharliePoole · 2017-03-05T19:06:27Z

@ChrisMaddock @rprouse

Well darn it! I knew it was on release/3.6 but forgot it when I was doing the merge. I restored the branch but didn't do anything else yet. We need to decide what we want before I do more.

First thing would be to decide about a hotfix release. I'm not too keen on it. The latest framework hotfix required a lot of work by Rob and I was planning to send you guys a note suggesting we change our view of hotfixes. I'm thinking we should do one of two things:

Reduce the time between releases so we have less need of hotfixes. Just releasing what's in master is really quite easy compared to hot fixes.
Reduce the time between releases and just stop doing hotfixes. If something comes up that's urgent, just release everything as either a new minor version upgrade or a third-digit update, depending on whether there are any features included.

I prefer the second approach.

If you agree with me, then I think I should back out the merge and redo it into master. I'll wait to see what your opinion is. If we do back it out, I don't think we want an offseting commit. I would just do it locally and force push to the branch since we haven't yet released anything from it beyond 3.6 itself.

I think we need to decide on this rather quickly.

ChrisMaddock · 2017-03-05T21:28:30Z

I have no problem with releasing everything as needed, instead of cherry picking commits.

Does that need to affect the time between main releases?

CharliePoole · 2017-03-05T23:05:19Z

I hope the time gets shorter. Seems to me that if we focus on a single thread of development, we can try to get really good at releasing and automate as much as possible.

@rprouse I think this lies with you to a great extent, since you have been doing most of the work of releasing.

rprouse · 2017-03-06T14:16:29Z

I am okay with releasing early instead of doing a hotfix, but I don't really want to reduce the normal cadence of releases. No matter how much we automate it, it still takes time to do releases and I think a normal cadence of more than quarterly creates too much churn.

That said, this fix is on the 3.6 branch, so I would be fine with doing a 3.6.1 hotfix with just this fix. If there are not other fixes to be cherry-picked into this hotfix, doing a hotfix on this branch is just as easy as doing a full release. At this point, I think I would prefer to do the hotfix.

thinkbeforecoding · 2017-03-06T14:25:57Z

Yes, not waiting too long to release the fix would be great since it makes build randomly fail.
This causes some unnecessary time waste in the CI tool chaine.

CharliePoole · 2017-03-06T14:43:29Z

Let's do the hotfix for this one then and deal with others as they come up.

rprouse · 2017-03-06T14:54:59Z

I will start the release tonight and hopefully get it out tomorrow or Wed.

…ent errors. See nunit/nunit-console#183. Remove this workaround once we update NUnit consoler runner version.

Fixes #168 - add locking to AgentDatabase

6fef1f3

ChrisMaddock requested a review from CharliePoole February 25, 2017 14:12

jnm2 approved these changes Feb 25, 2017

View reviewed changes

CharliePoole suggested changes Feb 25, 2017

View reviewed changes

thinkbeforecoding mentioned this pull request Mar 2, 2017

#168 : Fix race condition in AgentDatabase #185

Closed

CharliePoole added 2 commits March 2, 2017 16:08

Move AgentDataBase to a separate file and add tests

ced9cb3

Remove unused code from TestAgency

c811685

Cleanup and refactoring

3ed03e6

CharliePoole approved these changes Mar 3, 2017

View reviewed changes

CharliePoole force-pushed the issue-168 branch 2 times, most recently from 771d0f9 to 82356f5 Compare March 4, 2017 02:20

Disable warning message

0159b91

CharliePoole force-pushed the issue-168 branch from 82356f5 to 0159b91 Compare March 4, 2017 02:29

CharliePoole merged commit 082e342 into release/3.6 Mar 4, 2017

CharliePoole deleted the issue-168 branch March 4, 2017 21:31

CharliePoole mentioned this pull request Mar 4, 2017

Intermittent errors while running tests after updating to 3.6 #168

Closed

CharliePoole restored the issue-168 branch March 5, 2017 18:56

rprouse mentioned this pull request Mar 7, 2017

Preparing for 3.6.1 Hotfix Release #187

Closed

rprouse deleted the issue-168 branch March 18, 2017 17:18

tritao added a commit to mono/CppSharp that referenced this pull request Apr 25, 2017

[tests] Pass --agents=1 to NUnit console test runner to fix intermitt…

b6f400d

…ent errors. See nunit/nunit-console#183. Remove this workaround once we update NUnit consoler runner version.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add locking to AgentDatabase #183

Add locking to AgentDatabase #183

ChrisMaddock commented Feb 25, 2017

CharliePoole left a comment

CharliePoole Feb 25, 2017

ChrisMaddock commented Feb 25, 2017 •

edited

CharliePoole commented Feb 26, 2017

ChrisMaddock commented Mar 1, 2017

CharliePoole commented Mar 2, 2017

CharliePoole commented Mar 2, 2017

ChrisMaddock commented Mar 2, 2017

thinkbeforecoding commented Mar 3, 2017

CharliePoole commented Mar 3, 2017

CharliePoole commented Mar 3, 2017

CharliePoole commented Mar 4, 2017

CharliePoole commented Mar 4, 2017

ChrisMaddock commented Mar 5, 2017

CharliePoole commented Mar 5, 2017

ChrisMaddock commented Mar 5, 2017

CharliePoole commented Mar 5, 2017

rprouse commented Mar 6, 2017

thinkbeforecoding commented Mar 6, 2017

CharliePoole commented Mar 6, 2017

rprouse commented Mar 6, 2017

Add locking to AgentDatabase #183

Add locking to AgentDatabase #183

Conversation

ChrisMaddock commented Feb 25, 2017

CharliePoole left a comment

Choose a reason for hiding this comment

CharliePoole Feb 25, 2017

Choose a reason for hiding this comment

ChrisMaddock commented Feb 25, 2017 • edited

CharliePoole commented Feb 26, 2017

ChrisMaddock commented Mar 1, 2017

CharliePoole commented Mar 2, 2017

CharliePoole commented Mar 2, 2017

ChrisMaddock commented Mar 2, 2017

thinkbeforecoding commented Mar 3, 2017

CharliePoole commented Mar 3, 2017

CharliePoole commented Mar 3, 2017

CharliePoole commented Mar 4, 2017

CharliePoole commented Mar 4, 2017

ChrisMaddock commented Mar 5, 2017

CharliePoole commented Mar 5, 2017

ChrisMaddock commented Mar 5, 2017

CharliePoole commented Mar 5, 2017

rprouse commented Mar 6, 2017

thinkbeforecoding commented Mar 6, 2017

CharliePoole commented Mar 6, 2017

rprouse commented Mar 6, 2017

ChrisMaddock commented Feb 25, 2017 •

edited