Eliminate extra services from agent #251

CharliePoole · 2017-07-02T16:47:17Z

Fixes #248
Fixes #255

Seemed easy... except...

Our tests in some cases depended on the extra services being present.
Our actual remoting operation had come to depend on the TestAgency service's initialization.

ChrisMaddock

Overall looks sensible - but this PR makes a few changes to the remoting process, which I'd like @rprouse or @jnm2 to review, as it seems to conflict with recent changes.

ChrisMaddock · 2017-07-03T12:34:10Z

src/NUnitConsole/nunit3-console.tests/ConsoleRunnerTests.cs

@@ -11,10 +11,15 @@ namespace NUnit.ConsoleRunner.Tests
 {
    class ConsoleRunnerTests
    {
-        [Test]
+        [Test, Ignore("Needs to be rewritten")]


When we merge, can we add an issue for this? Otherwise I think it might be forgotten. 🙂

We haven't done it before, but we haven't gone back and fixed these either. Maybe we need a general issue for cleaning ignores. I think we have one in the framework. Although, if we do, we still haven't fixed them over a long time.

I think @ChrisMaddock's idea is a really good one. It's better to have individual issues that can be triaged and eventually documented and fixed.

It's interesting to consider why this is so. It has also been a problem for me as a coach. In teams, I have highlighted Ignored tests by posting them on the wall. We do highlight them in the console runs, coloring them as warnings, but that does not seem to make us want to fix them. I'm willing to try issues but somebody will have to take a look at the list of warnings and create them. In so doing, they will have to use some judgement as to how ignore is being used in each situation and whether an issue is called for.

but that does not seem to make us want to fix them

The reason is that

they will have to use some judgement as to how ignore is being used in each situation and whether an issue is called for.

I have no idea what the rationale is behind someone else marking a test ignored. On the other hand, an issue marked bug and confirm have no such shades of meaning. Either case, it's a matter of education and establishing practices of regularly sweeping through with an eye to things like this. There are other things in issues as well to sweep for, like abandoned or un/mistriaged issues, so having practices like this all in one place makes issues a good candidate for tracking IMO.

Ideally, they should be telling you. That's why we made the Reason field required. But I agree you may not know. Anyhow... slightly separate from this issue.

I'm going to create an issue for this one - as a start. I'm not sure if anyone will come back to it, but it makes me more comfortable having it logged! 😄

ChrisMaddock · 2017-07-03T12:39:35Z

src/NUnitEngine/nunit-agent/Program.cs

-            // Owns the channel used for communications with the agency and with clients
-            var testAgencyServer = engine.Services.GetService<TestAgency>();
+            // Make sure we can communicate with the agency
+            ServerUtilities.GetTcpChannel("", 0, 2);


Picky - Could this just be ServerUtilities.GetTcpChannel() - so we're not duplicating the default values? The no-parameters overload calls the same as is written here, although I went to go look up what the parameters were in this case!

Why not keep the original lines instead? I don't understand what this gains us. We still have https://github.com/nunit/nunit-console/pull/251/files#diff-c6456741755862ce820b5efbe37974d0R70.

That's in a test, which includes TestAgency in a temporary context. Really, it should be using mocks or be moved to an integration test in my view.

The problem with the original code is that it uses a service (server) that allows the agent to create new agents, which is not supposed to exist in the agent. See my general comment.

ChrisMaddock · 2017-07-03T12:40:37Z

src/NUnitEngine/nunit-agent/Program.cs

-            try
-            {
-                // Unregister the channel
-                testAgencyServer.Stop();


This change means the channel is no longer unregistered, as far as I can see? Should we be calling ServerUtilities.SafeReleaseChannel() at some point?

Actually, the previous ServerBase.Stop() did more than just unregister the TCP channel, including stuff with @jnm2's fix for the agent crashes recently. Is none of this necessary in this case?

@jnm2 - you have a much deeper understanding of this process than I do, would you mind reviewing this one as well?

I wasn't aware of that. We should examine what was fixed and find another way. The agent is the client in this relationship, not the server. Testagency uses Server Base so any fix there that was done for the agent should probably be removed.

Another way we made our own bug into a feature.

We could do that for neatness, but it shouldn't be needed since the agent exits when done.

The agent process will cause a SocketException in the test process when the test process tells the agent process to shut down. As long as the test process is sending a kill signal to the agent process, we need to be running the code added in #223.

@CharliePoole

I wasn't aware of that. We should examine what was fixed and find another way. The agent is the client in this relationship, not the server. Testagency uses Server Base so any fix there that was done for the agent should probably be removed.

Another way we made our own bug into a feature.

I don't follow. Are you saying that TestAgency should not be a ServerBase?

Any ServerBase that needs remoting will be susceptible to the SocketException any time the remote instance of the ServerBase is shutting down, so unless I'm missing something, ServerBase is definitely where the fix belongs.

Maybe the confusion is that the same ServerBase type is used twice- one on the server side, one on the client side. The fix definitely has to be there inside the implementation of Stop so that when Stop executes on the remote side, it doesn't close the socket too soon. When Stop is called on the test process side, none of the code in Stop actually runs in the test process; it just causes the remote process to execute it which is exactly what needs to happen- no more, no less.

@jnm2 Check my general comment and go back to the original issue. The services that are initialized in agent's Main() are the services the agent is supposed to have. It's not supposed to have any class derived from ServerBase, 'cause it's not a server. Due to an accident of how I implemented the agent (reusing the entire engine) the code for the server is present, but we really don't want to start it just for it's side effects. We should instead figure out what a client actually needs and put it here - I suspect we will be putting it here again if we do that.

ChrisMaddock · 2017-07-03T12:44:03Z

src/NUnitEngine/nunit.engine.tests/Runners/TestEngineRunnerTests.cs

@@ -39,7 +39,7 @@ namespace NUnit.Engine.Runners.Tests
    //[TestFixture(typeof(ProcessRunner))]
    [TestFixture(typeof(MultipleTestDomainRunner), 1)]
    [TestFixture(typeof(MultipleTestDomainRunner), 3)]
-    [TestFixture(typeof(MultipleTestProcessRunner), 1)]
+    //[TestFixture(typeof(MultipleTestProcessRunner), 1)]


What's the change that required this commenting out, sorry?

Starting a ProcessRunner of any kind has not been possible for a long time. Really these should be ignored.

I frankly have no idea why MultipleTestProcessRunner was working with the TestAgency included and ProcessRunner was not. MultipleTestProcessRunner actually creates ProcessRunners. This would be a good issue to create.

@CharliePoole - could you create this issue? I feel you have a better idea of the background here than I do!

CharliePoole · 2017-07-03T15:01:24Z

src/NUnitEngine/nunit-agent/Program.cs

-            // Owns the channel used for communications with the agency and with clients
-            var testAgencyServer = engine.Services.GetService<TestAgency>();
+            // Make sure we can communicate with the agency
+            ServerUtilities.GetTcpChannel("", 0, 2);


I'm tempted to put the code itself here and not call server utilities as it may confuse people and you end up checking anyway.

jnm2 · 2017-07-03T15:19:57Z

src/NUnitEngine/nunit.engine/TestEngine.cs

-            Services.Add(new ResultService());
-            Services.Add(new TestFilterService());
+            // If caller added services beforehand, we don't add any
+            if (Services.ServiceCount == 0)


I'd rather do as much information hiding as possible. A bool property Services.Any seems better to me.

How about Services.Added to indicate they have been added but may or may not be initialized?

Sounds fine.

CharliePoole · 2017-07-03T16:14:13Z

General notes on architecture and the overall issue here, to clarify some of my comments.

The engine includes TestAgency, derived from ServerBase. TestAgents deal with TestAgency in a client/server relationship. The agent is the client and the agency is the server. This has always been a confusion to new folks working in this area and in the past I would explain it to each person when they got started. That approach is no longer scaling, I'm afraid, so maybe we need some comments somewhere in the code to explain it. What makes the agency the server is that it advertises itself and can be connected to. Agents OTOH can't be connected to, they must initiate the connection. In simple terms, the agent makes use of the agency to get assignments, which it then carries out, reporting back the results. It's a bit of an inversion of what one might think at first glance but not that uncommon: e.g. X11 has a similar "reversal of roles."

What makes it more confusing is the error trisMaddock found: We were starting up duplicate services as well as extra services that are not used in the agent. One of those was TestAgency. Because of the extra services, some tests worked without fake or mock services being created. I fixed all of those but one - that one required more time than I have right now. Then I discovered that the presence of TestAgency was required in order to make cross-process communication work. I'm pretty sure that was not always the case. I think it may have happened when somebody saw that we were starting TestAgency, didn't understand it was an error and simply made use of it in a fix. And now, of course, we know that @jnm2 made some fixes to ServerBase to fix a problem in the agent.

This may be a long fix and somebody else may have to finish it. Beyond that, there are a number of problems with testing runners. I think most of the runner tests should either be converted into true unit tests, making use of mocks, or we should move them to a new integration test project. For example, when we test a MultipleProcessRunner, we don't need real ProcessRunners to be called or processes started. We can call fake or mock ProcessRunners. I think this work is needed but should be separate from the fix.

The biggest issue of all is that the origin of all the problems in this area is the fact that I reused the engine as a part of the agent.exe. That means that all these extra classes are available in the agent to confuse things. Really, we should have a reduced engine in the agent executables. Ironically, the introduction of a .NET Standard build did much of what we need but also made it harder to work with because it only applies the necessary changes to the .NET standard build and adds changes of its own in order to conform to the standard. 🤕

I'll go back to the line comments now - hopefully this comment will clarify some of the shorter ones.

CharliePoole · 2017-07-03T17:45:51Z

@jnm2 I looked at #223, which I hadn't seen before. Practically everything in the agent code that confused me came from that issue: getting the service, disposing of it, the changes to ServerBase. That's actually a good thing, since we know the age of the introduced changes and it's relatively young.

I think you should look at your changes again and reconsider each item in the light of it's only being used in the testing process, not the agent process. If stuff is not needed, then it should be removed, of course. Maybe we need a ClientUtility or AgentUtility to handle the client side of this or maybe it can be put into some pre-existing class.

Unfortunately, there's a lot of stuff in the engine now that I don't know about along with a bunch of other stuff where I may well be the only person to know about it. That comes from too many years of its being my private preserve. For a few months, I haven't really been able to work on the engine because of changes introduced for VS2017 (I think) but I grabbed this one because I figured I caused the original problem. I do my best within the IDE, but since I can't run the script, I rely on CI to tell me if there are errors in the .NET Standard stuff. So, it may take more than one person and more than one issue to get it all straight.

CharliePoole · 2017-07-04T22:42:48Z

@jnm2 @ChrisMaddock I don't have any more time to work on this issue till after my vacation. Feel free to take it and continue if you have time.

rprouse · 2017-07-05T00:05:04Z

Aside from the comments already made, this looks good. I think the main question remaining is if we still need an equivalent of the Stop() call. @CharliePoole you said that you don't have time to work on this, so one of us can pick it up. I don't mind doing it, but you are welcome to it @jnm2 if you want it.

@CharliePoole you nailed it when you stated that the .NET Standard build is only the code and services needed by the agent side. It doesn't add too much though, it just combines the API and the Engine code into one assembly because an agent doesn't need to load an engine in the same way. I did separate out some of the code to make the least impact on the main engine code, but it would be nice to bring them back together at some point.

The main issue I had with the .NET Standard version was the services and extensions. I really want to add them back in, but loading assemblies in other directories is a lot more limited in the lower versions of .NET Standard so the walking up and searching other NuGet packages wasn't possible, or at least in the time I had.

CharliePoole · 2017-07-05T00:20:25Z

One thought - maybe we should do a separate build of the engine for use with the agent. We would leave out the things that don't belong in the agent under any runtime. Then the .NET Standard build would leave out a few more things. That way it would not be possible to activate anything that is not intended to be included in the agent.

rprouse · 2017-07-05T13:27:03Z

One thought - maybe we should do a separate build of the engine for use with the agent. We would leave out the things that don't belong in the agent under any runtime. Then the .NET Standard build would leave out a few more things. That way it would not be possible to activate anything that is not intended to be included in the agent.

I was thinking along the same lines, probably by taking the .NET Standard build, adding back in what we need and multitargeting it. That would also allow us to create .NET Core agents for the eventual goal of running .NET Core tests from the main console. I think I would prefer to wait until post release for that level of refactor though 😄

jnm2 · 2017-07-08T00:49:15Z

@CharliePoole

Thanks for the explanation. The duplicate services also confused me. It turns out I picked the wrong duplicate to remove when I ran into this: 33e75f2#r117599764

I can't approve any PR that doesn't address the channel stop issue. I'm looking now to see what I would have done if I had chosen the correct duplicate to die.

jnm2 · 2017-07-08T00:52:58Z

I disagree that we should be using ServerUtilities.GetTcpChannel("", 0, 2); unencapsulated, directly in Program.cs.

jnm2 · 2017-07-08T00:56:16Z

Can I put the channel handling in RemoteTestAgent? @CharliePoole @rprouse @ChrisMaddock

CharliePoole · 2017-07-08T00:57:55Z

You mean TestAgency, I take it. There is no duplicate of TestAgency. TestAgency was never supposed to be part of the agent because it's the service that launches agents!

The fact that it was ever initialized was the problem. So if it's not initialized, you don't have to stop it except for the side effect of closing the channel. However, I think you should pull this branch and experiment with it to see if the problem you were fixing is still fixed. If it was caused by failing to stop the agent, then not starting the agent in the first place will most likely eliminate the problem. IN fact, this spurious TestAgency may have caused a lot of our problems.

CharliePoole · 2017-07-08T00:58:32Z

And just think, we never would have known about it without @ChrisMaddock noticing an odd bit of code!

jnm2 · 2017-07-08T00:58:47Z

Aside: it should have bothered me more when I noticed I was doing engine.Services.GetService<TestAgency>() near Activator.GetObject(typeof(ITestAgency), AgencyUrl) as ITestAgency. I was in too much of a hurry getting the fix out.

jnm2 · 2017-07-08T00:59:51Z

@CharliePoole No, I meant the duplicate channel which was hogging my message counter.

jnm2 · 2017-07-08T01:24:53Z

Anyhow. The Activator code is coupled to the channel being opened, so I'll encapsulate that.

jnm2 · 2017-07-08T02:10:52Z

Confirmed that the SocketException fix is still needed simply because the process exists too fast.

I have a strong sense that this architecture is heading in the right direction. More fun than the original fix.
Almost ready for review. Should I push a new branch?

jnm2 · 2017-07-08T02:15:56Z

@CharliePoole @rprouse @ChrisMaddock

This encapsulation makes me happy. What do you say?

https://github.com/nunit/nunit-console/compare/issue-248...issue-248_channel_encapsulation?expand=1
If it's helpful I can provide commentary on the changes, but I'm interested in your impression.

CharliePoole · 2017-07-08T02:21:43Z

Your encapsulation is good. I'd like it better if it didn't call ServerUtilities, but that's more of a naming quibble than anything else.

jnm2 · 2017-07-08T02:28:24Z

@CharliePoole More than happy to rename that thing RemotingUtilities because that's precisely what it is.

CharliePoole · 2017-07-08T02:30:46Z

@jnm2 No reason you can't just take over the branch and PR - that's one reason we all have commit authority.

jnm2 · 2017-07-08T02:33:01Z

@CharliePoole Can I delete the ITestAgent.Agency and TestAgent.Agency property? It's not used anywhere at all. I don't see any reason for code which uses a test agent to be able to reach the agency.

CharliePoole · 2017-07-08T02:36:47Z

It's an internal interface (not in the API assembly) so we can change it without hurting anyone who follows the rules. 😃

jnm2 · 2017-07-08T02:45:55Z

Pushed three commits to this PR.

jnm2 · 2017-07-08T02:56:39Z

Forgot to rename ServerUtilityTests. Force pushed.

…should not have access to the agency

jnm2 · 2017-07-09T17:04:08Z

@ChrisMaddock Thanks. My push seems to have dismissed your review, and I added a commit.

jnm2 · 2017-07-09T17:10:27Z

I'm going to improve TcpChannelTests. Too many asserts per test.

jnm2 · 2017-07-09T17:12:08Z

@nunit/framework-team What's the thought on GetTcpChannel_returns_same_channel_for_same_name_different_port instead of GetTcpChannelReturnsSameChannelForSameNameDifferentPort?

I guess I could always just push and find out. 😈

…to track release responsibility.

ChrisMaddock · 2017-07-10T08:03:05Z

I really don't mind how you name your tests, so long as I don't have to start stretching to the underscore key when I write mine! 😄

CharliePoole · 2017-07-10T09:20:35Z

I don't think one test in a class should use a different naming convention from the others.

I don't think we should change all the named either.

I don't know if the naming standards still apply but IIRC they call for Pascal casing.

IMO Ruby-style underscored names are only useful with a BDD framework that changes them too spaces. I also use them in a minor way, to distinguish special cases.

jnm2 · 2017-07-10T15:46:58Z

Hey, no harm in trying. I think it's more readable because test names tend to be exceptionally long, and underscores to me are exactly as readable as spaces. But I'm happy to stick to the standard which doesn't have a exception for absurdly long names. 😛

CharliePoole · 2017-07-10T15:53:04Z

To me,it's not so much about naming as about the "prime directive" of not changing the style of an existing file. Probably not a big deal in this case, but you did ask. 😄

We do have entire classes that use the style you prefer.

jnm2 · 2017-07-10T18:06:57Z

As I understood it, the prime directive is not that we should preserve existing style at all costs, but that we should not change style for its own sake. In this case, I rewrote the entire file from scratch for non-stylistic reasons.

jnm2 · 2017-07-10T18:11:06Z

Fixed now though =)

rprouse

This looks good to me too. Tests are running and passing, my only reservation is that I hope to get a release out soon. I am worried that a change this large so close to release might be dangerous. Thought?

jnm2 · 2017-07-11T00:25:44Z

I'm happy either way.

rprouse · 2017-07-11T01:23:57Z

@jnm2 I respect your opinion, how thoroughly have you tested these fixes and how confident are you in the changes? I hope to get the release out this week, should we wait to merge? I see the risk as moderate compared to a minor gain, so I am inclined to wait, but if I am misreading the value of the fix, then that might change my calculation.

CharliePoole · 2017-07-11T01:33:22Z

In that case, just use a style that you prefer, consistent with any standards that exist rather than asking for a discussion of style!

jnm2 · 2017-07-11T01:47:59Z

@rprouse I'm sufficiently aware of the difficulty of predicting side effects that I can't justify the risk vs gain either. I too would rather we sat on it then.

@CharliePoole I'm sorry. I'll do less talking next time. =)

CharliePoole · 2017-07-11T04:15:32Z

Thus was a low priority simple fix that got complicated. It's a serious problem with the extra test agency being started and I suspect it's the cause of a few agent problems we have had.

That said, it's a problem that we lived with through several releases.

I'd only merge if we are pretty sure it's at least an improvement.

jnm2 · 2017-07-12T18:18:51Z

We have evidence that my original PR only solves the problem for some people and that this PR solves it in one scenario where mine did not. I'd say we should get this in 3.7.

rprouse · 2017-07-13T12:59:02Z

@jnm2 I read through #255 and I agree with your assessment. Let's include this in the release. I have more confidence now that this has been run against large production test suites and fixes their issues.

CharliePoole added 2 commits July 2, 2017 08:05

Eliminate duplicate services

d4d7041

Fix communications problem by initializing TCP channel

fb8280e

ChrisMaddock reviewed Jul 3, 2017

View reviewed changes

CharliePoole commented Jul 3, 2017

View reviewed changes

jnm2 reviewed Jul 3, 2017

View reviewed changes

Encapsulated channel and agency responsibilities

9f44bc3

jnm2 force-pushed the issue-248 branch from 46b2347 to 60f2c3c Compare July 8, 2017 02:56

ChrisMaddock mentioned this pull request Jul 9, 2017

Rewrite ConsoleRunnerTests.ThrowsNUnitEngineExceptionWhenTestResultsAreNotWriteable() #256

Closed

jnm2 added 2 commits July 9, 2017 13:01

Renamed ServerUtilities to TcpChannelUtilities

7ac1f3d

Removed ITestAgeny.Agency and TestAgent.Agency- code using the agent …

6b54094

…should not have access to the agency

jnm2 dismissed ChrisMaddock’s stale review via 6b54094 July 9, 2017 17:01

jnm2 force-pushed the issue-248 branch from 60f2c3c to 6b54094 Compare July 9, 2017 17:01

jnm2 force-pushed the issue-248 branch from 9a1502f to 3f6d3b8 Compare July 9, 2017 17:46

Removed SafeReleaseChannel because it's unused and it was a poor way …

2834d2e

…to track release responsibility.

jnm2 force-pushed the issue-248 branch from 3f6d3b8 to 09eb395 Compare July 9, 2017 17:51

Refactored TcpChannelUtilsTests

cee4d2f

jnm2 force-pushed the issue-248 branch from 09eb395 to cee4d2f Compare July 10, 2017 18:11

rprouse approved these changes Jul 11, 2017

View reviewed changes

jnm2 mentioned this pull request Jul 11, 2017

SocketException thrown during console run #255

Closed

rprouse merged commit 008e02f into master Jul 13, 2017

rprouse deleted the issue-248 branch July 13, 2017 12:59

Eliminate extra services from agent #251

Eliminate extra services from agent #251

Conversation

CharliePoole commented Jul 2, 2017 • edited by rprouse Loading

ChrisMaddock left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ChrisMaddock Jul 3, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jnm2 Jul 3, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CharliePoole commented Jul 3, 2017

CharliePoole commented Jul 3, 2017

CharliePoole commented Jul 4, 2017

rprouse commented Jul 5, 2017

CharliePoole commented Jul 5, 2017

rprouse commented Jul 5, 2017

jnm2 commented Jul 8, 2017

jnm2 commented Jul 8, 2017

jnm2 commented Jul 8, 2017

CharliePoole commented Jul 8, 2017

CharliePoole commented Jul 8, 2017

jnm2 commented Jul 8, 2017

jnm2 commented Jul 8, 2017 • edited Loading

jnm2 commented Jul 8, 2017

jnm2 commented Jul 8, 2017

jnm2 commented Jul 8, 2017 • edited Loading

CharliePoole commented Jul 8, 2017

jnm2 commented Jul 8, 2017

CharliePoole commented Jul 8, 2017

jnm2 commented Jul 8, 2017 • edited Loading

CharliePoole commented Jul 8, 2017

jnm2 commented Jul 8, 2017

jnm2 commented Jul 8, 2017

jnm2 commented Jul 9, 2017 • edited Loading

jnm2 commented Jul 9, 2017

jnm2 commented Jul 9, 2017

ChrisMaddock commented Jul 10, 2017

CharliePoole commented Jul 10, 2017

jnm2 commented Jul 10, 2017 • edited Loading

CharliePoole commented Jul 10, 2017

jnm2 commented Jul 10, 2017 • edited Loading

jnm2 commented Jul 10, 2017

rprouse left a comment

Choose a reason for hiding this comment

jnm2 commented Jul 11, 2017

rprouse commented Jul 11, 2017

CharliePoole commented Jul 11, 2017

jnm2 commented Jul 11, 2017

CharliePoole commented Jul 11, 2017

jnm2 commented Jul 12, 2017

rprouse commented Jul 13, 2017

CharliePoole commented Jul 2, 2017 •

edited by rprouse

Loading

ChrisMaddock Jul 3, 2017 •

edited

Loading

jnm2 Jul 3, 2017 •

edited

Loading

jnm2 commented Jul 8, 2017 •

edited

Loading

jnm2 commented Jul 8, 2017 •

edited

Loading

jnm2 commented Jul 8, 2017 •

edited

Loading

jnm2 commented Jul 9, 2017 •

edited

Loading

jnm2 commented Jul 10, 2017 •

edited

Loading

jnm2 commented Jul 10, 2017 •

edited

Loading