Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add locking to AgentDatabase #183

Merged
merged 5 commits into from Mar 4, 2017
Merged

Add locking to AgentDatabase #183

merged 5 commits into from Mar 4, 2017

Conversation

ChrisMaddock
Copy link
Member

Fixes #168 . This is targeting release/3.6 as I think it may need to be a hotfix - although I don't think that decisions been made yet, so will move it over if not.

This adds a lock object to the AgentDatabase, to allow access in parallel. This was introduced by the just-in-time agent loading fixed in 3.5, and will be required if we parallelise load/explore etc.

Many of the methods in AgentDatabase were no longer used - so I took the liberty of removing them rather than making them safe. The class is now solely additive.

I wanted to add a test to ensure agent creation can be parallelised, but wasn't able to reproduce the error. The test I had (and removed) is below - open to any ideas as to how we might be able to put something in to prevent regression here. (Note the below additionally doesn't work due to #173 - but once I'd hacked around that, I still wasn't able to reproduce the error from #168.)

        [Test]
        public void CanCreateAgentsInParallel()
        {
            const int agents = 10;
            var workerPool = new ParallelTaskWorkerPool(agents);

            for (var i = 0; i < agents; i++)
            {
                var task = new LaunchAgentTask(_testAgency);
                workerPool.Enqueue(task);
            }

            TestDelegate createAgents = () =>
            {
                workerPool.Start();
                workerPool.WaitAll();
            };
            Assert.That(createAgents, Throws.Nothing);
        }

        private class LaunchAgentTask : ITestExecutionTask
        {
            private readonly TestAgency _agency;

            public LaunchAgentTask(TestAgency agency)
            {
                _agency = agency;
            }

            public void Execute()
            {
                _agency.GetAgent(new TestPackage("."), Timeout.Infinite);
            }
        }

Copy link
Contributor

@CharliePoole CharliePoole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at both the V2 and our current TestAgency, I see that my guess in the line comment was correct: the code contained calls to the AgentDataBase that were commented out. I recall this was because of threading issues, which we are hopefully now solving.

Originally, the agency was notified when a process exited and OnProcessExit was called. At the end of a run, Clear was called.

There was also code to reuse existing agents, which is something the gui made use of in V2. When you reloaded a set of tests, the agency would check to see if the existing process had the correct framework and bitness to allow it to be reused. I don't know if we will do that in the new gui.

I'd say that we should retain Remove and Clear and reinstate their use by TestAgency.

As for testing, I think we need a unit test of AgentDataBase, including hitting it with multiple threads. That would be a much narrower test than what you were trying to do. Of course that means either pulling AgentDataBase out to a separate file or at least making it accessible as a nested class. Can a nested class be internal? Separating it would be a little more work, but cleaner.

public void Remove(Guid agentId)
{
agentData.Remove(agentId);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the removal of unused methods, but I'm very surprised that this is unused. Seems like we should be removing entries when we stop the runner.

We've known about the problem of thread-safety for a long time - before the latest changes made it impossible to ignore. So it's possible that some of these method have calls that were deleted or commented out. Let's take a look at TestAgency to see if that's so.

@ChrisMaddock
Copy link
Member Author

ChrisMaddock commented Feb 25, 2017

Thanks @CharliePoole - will have a look tomorrow.

Which bits of commented code do you think should be be re-enabled? I'm a little wary of blindly making changes to the agent termination code, given the issues we have and had have with remoting exceptions here. Would it perhaps be better if this PR just puts back the relevant bits of the AgentDatabase, and the other commented out code is looked at in a separate issue?

@CharliePoole
Copy link
Contributor

I'd be OK with leaving the TestAgency changes re-enabling use of the other methods to a separate PR. For this one, I think we should re-enable Remove and Clear and add the lock to them. Ideally, we should add some tests too, but it depends on how much time you have right now. We shouldn't start to actually use those other methods till we have tested them with multiple clients.

@ChrisMaddock
Copy link
Member Author

I didn't manage to get back to this last weekend, and I'm not sure I will this weekend either. If anyone wants to take over this branch, you're welcome to - otherwise I'll get back to it when time allows!

@CharliePoole
Copy link
Contributor

@ChrisMaddock I pulled this down to give it a try and it's hanging when I try to run the engine tests. Since it was passing in AppVeyor, it must be something different on my system.

@CharliePoole
Copy link
Contributor

@ChrisMaddock Resolved my problem. Writing some tests now.

@ChrisMaddock
Copy link
Member Author

Great - thanks @CharliePoole and @thinkbeforecoding!

@thinkbeforecoding
Copy link

What's still needed to make it ready ?

@CharliePoole
Copy link
Contributor

I need to decide whether to keep and use the currently commented methods . I'll look at it in the morning.

@CharliePoole
Copy link
Contributor

I think this is now ready to merge. Since it has changed substantially, I'd like a @nunit/framework-team member to (re)review it.

AgentDataBase is no longer an internal class to TestAgency. It's in its own file and has its own tests.

I left some commented-out code in place in TestAgency, because I think we may want to re-activate it to fix some other issues. For the same reason, I left in a few AgentDataBase methods that are not now called by TestAgency.

I'm not completely happy with adding the Snapshot facility to AgentDataBase since it serves only to enable testing more easily. However, it simplified things enormously when I added it and we can always refactor it in another PR if we want.

@CharliePoole
Copy link
Contributor

Well, my "clever" use of nameof causes a warning under Mono, which makes the build fail.

I did this...

private static int[] Counts = new int[] { 1, 3, 10 };

const string COUNTS = nameof(Counts);

which allows me to use TestCaseSource(COUNTS).

However, Counts is seen as never used! 😠

@CharliePoole CharliePoole force-pushed the issue-168 branch 2 times, most recently from 771d0f9 to 82356f5 Compare March 4, 2017 02:20
@CharliePoole
Copy link
Contributor

Tests failed once in travis and passed on re-running. Failure was an issue with how we run multiple TestAgency instances in our own tests, not something that comes up in normal execution of user tests.

@ChrisMaddock
Copy link
Member Author

Thanks for picking this up @CharliePoole. Just to note - I started this PR based of the release-3.6 branch, we'll need to make sure this is either released as a hotfix, or pulled into master.

@CharliePoole CharliePoole restored the issue-168 branch March 5, 2017 18:56
@CharliePoole
Copy link
Contributor

@ChrisMaddock @rprouse

Well darn it! I knew it was on release/3.6 but forgot it when I was doing the merge. I restored the branch but didn't do anything else yet. We need to decide what we want before I do more.

First thing would be to decide about a hotfix release. I'm not too keen on it. The latest framework hotfix required a lot of work by Rob and I was planning to send you guys a note suggesting we change our view of hotfixes. I'm thinking we should do one of two things:

  1. Reduce the time between releases so we have less need of hotfixes. Just releasing what's in master is really quite easy compared to hot fixes.

  2. Reduce the time between releases and just stop doing hotfixes. If something comes up that's urgent, just release everything as either a new minor version upgrade or a third-digit update, depending on whether there are any features included.

I prefer the second approach.

If you agree with me, then I think I should back out the merge and redo it into master. I'll wait to see what your opinion is. If we do back it out, I don't think we want an offseting commit. I would just do it locally and force push to the branch since we haven't yet released anything from it beyond 3.6 itself.

I think we need to decide on this rather quickly.

@ChrisMaddock
Copy link
Member Author

I have no problem with releasing everything as needed, instead of cherry picking commits.

Does that need to affect the time between main releases?

@CharliePoole
Copy link
Contributor

I hope the time gets shorter. Seems to me that if we focus on a single thread of development, we can try to get really good at releasing and automate as much as possible.

@rprouse I think this lies with you to a great extent, since you have been doing most of the work of releasing.

@rprouse
Copy link
Member

rprouse commented Mar 6, 2017

I am okay with releasing early instead of doing a hotfix, but I don't really want to reduce the normal cadence of releases. No matter how much we automate it, it still takes time to do releases and I think a normal cadence of more than quarterly creates too much churn.

That said, this fix is on the 3.6 branch, so I would be fine with doing a 3.6.1 hotfix with just this fix. If there are not other fixes to be cherry-picked into this hotfix, doing a hotfix on this branch is just as easy as doing a full release. At this point, I think I would prefer to do the hotfix.

@thinkbeforecoding
Copy link

Yes, not waiting too long to release the fix would be great since it makes build randomly fail.
This causes some unnecessary time waste in the CI tool chaine.

@CharliePoole
Copy link
Contributor

Let's do the hotfix for this one then and deal with others as they come up.

@rprouse
Copy link
Member

rprouse commented Mar 6, 2017

I will start the release tonight and hopefully get it out tomorrow or Wed.

@rprouse rprouse deleted the issue-168 branch March 18, 2017 17:18
tritao added a commit to mono/CppSharp that referenced this pull request Apr 25, 2017
…ent errors.

See nunit/nunit-console#183.

Remove this workaround once we update NUnit consoler runner version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants