Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate extra services from agent #251

Merged
merged 7 commits into from
Jul 13, 2017
Merged

Eliminate extra services from agent #251

merged 7 commits into from
Jul 13, 2017

Conversation

CharliePoole
Copy link
Collaborator

@CharliePoole CharliePoole commented Jul 2, 2017

Fixes #248
Fixes #255

Seemed easy... except...

  • Our tests in some cases depended on the extra services being present.
  • Our actual remoting operation had come to depend on the TestAgency service's initialization.

Copy link
Member

@ChrisMaddock ChrisMaddock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks sensible - but this PR makes a few changes to the remoting process, which I'd like @rprouse or @jnm2 to review, as it seems to conflict with recent changes.

@@ -11,10 +11,15 @@ namespace NUnit.ConsoleRunner.Tests
{
class ConsoleRunnerTests
{
[Test]
[Test, Ignore("Needs to be rewritten")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we merge, can we add an issue for this? Otherwise I think it might be forgotten. 🙂

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't done it before, but we haven't gone back and fixed these either. Maybe we need a general issue for cleaning ignores. I think we have one in the framework. Although, if we do, we still haven't fixed them over a long time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @ChrisMaddock's idea is a really good one. It's better to have individual issues that can be triaged and eventually documented and fixed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting to consider why this is so. It has also been a problem for me as a coach. In teams, I have highlighted Ignored tests by posting them on the wall. We do highlight them in the console runs, coloring them as warnings, but that does not seem to make us want to fix them. I'm willing to try issues but somebody will have to take a look at the list of warnings and create them. In so doing, they will have to use some judgement as to how ignore is being used in each situation and whether an issue is called for.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that does not seem to make us want to fix them

The reason is that

they will have to use some judgement as to how ignore is being used in each situation and whether an issue is called for.

I have no idea what the rationale is behind someone else marking a test ignored. On the other hand, an issue marked bug and confirm have no such shades of meaning. Either case, it's a matter of education and establishing practices of regularly sweeping through with an eye to things like this. There are other things in issues as well to sweep for, like abandoned or un/mistriaged issues, so having practices like this all in one place makes issues a good candidate for tracking IMO.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, they should be telling you. That's why we made the Reason field required. But I agree you may not know. Anyhow... slightly separate from this issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to create an issue for this one - as a start. I'm not sure if anyone will come back to it, but it makes me more comfortable having it logged! 😄

// Owns the channel used for communications with the agency and with clients
var testAgencyServer = engine.Services.GetService<TestAgency>();
// Make sure we can communicate with the agency
ServerUtilities.GetTcpChannel("", 0, 2);
Copy link
Member

@ChrisMaddock ChrisMaddock Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Picky - Could this just be ServerUtilities.GetTcpChannel() - so we're not duplicating the default values? The no-parameters overload calls the same as is written here, although I went to go look up what the parameters were in this case!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep the original lines instead? I don't understand what this gains us. We still have https://github.com/nunit/nunit-console/pull/251/files#diff-c6456741755862ce820b5efbe37974d0R70.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's in a test, which includes TestAgency in a temporary context. Really, it should be using mocks or be moved to an integration test in my view.

The problem with the original code is that it uses a service (server) that allows the agent to create new agents, which is not supposed to exist in the agent. See my general comment.

try
{
// Unregister the channel
testAgencyServer.Stop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change means the channel is no longer unregistered, as far as I can see? Should we be calling ServerUtilities.SafeReleaseChannel() at some point?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the previous ServerBase.Stop() did more than just unregister the TCP channel, including stuff with @jnm2's fix for the agent crashes recently. Is none of this necessary in this case?

@jnm2 - you have a much deeper understanding of this process than I do, would you mind reviewing this one as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware of that. We should examine what was fixed and find another way. The agent is the client in this relationship, not the server. Testagency uses Server Base so any fix there that was done for the agent should probably be removed.

Another way we made our own bug into a feature.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do that for neatness, but it shouldn't be needed since the agent exits when done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent process will cause a SocketException in the test process when the test process tells the agent process to shut down. As long as the test process is sending a kill signal to the agent process, we need to be running the code added in #223.

Copy link
Collaborator

@jnm2 jnm2 Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CharliePoole

I wasn't aware of that. We should examine what was fixed and find another way. The agent is the client in this relationship, not the server. Testagency uses Server Base so any fix there that was done for the agent should probably be removed.

Another way we made our own bug into a feature.

I don't follow. Are you saying that TestAgency should not be a ServerBase?

Any ServerBase that needs remoting will be susceptible to the SocketException any time the remote instance of the ServerBase is shutting down, so unless I'm missing something, ServerBase is definitely where the fix belongs.

Maybe the confusion is that the same ServerBase type is used twice- one on the server side, one on the client side. The fix definitely has to be there inside the implementation of Stop so that when Stop executes on the remote side, it doesn't close the socket too soon. When Stop is called on the test process side, none of the code in Stop actually runs in the test process; it just causes the remote process to execute it which is exactly what needs to happen- no more, no less.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnm2 Check my general comment and go back to the original issue. The services that are initialized in agent's Main() are the services the agent is supposed to have. It's not supposed to have any class derived from ServerBase, 'cause it's not a server. Due to an accident of how I implemented the agent (reusing the entire engine) the code for the server is present, but we really don't want to start it just for it's side effects. We should instead figure out what a client actually needs and put it here - I suspect we will be putting it here again if we do that.

@@ -39,7 +39,7 @@ namespace NUnit.Engine.Runners.Tests
//[TestFixture(typeof(ProcessRunner))]
[TestFixture(typeof(MultipleTestDomainRunner), 1)]
[TestFixture(typeof(MultipleTestDomainRunner), 3)]
[TestFixture(typeof(MultipleTestProcessRunner), 1)]
//[TestFixture(typeof(MultipleTestProcessRunner), 1)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the change that required this commenting out, sorry?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting a ProcessRunner of any kind has not been possible for a long time. Really these should be ignored.

I frankly have no idea why MultipleTestProcessRunner was working with the TestAgency included and ProcessRunner was not. MultipleTestProcessRunner actually creates ProcessRunners. This would be a good issue to create.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CharliePoole - could you create this issue? I feel you have a better idea of the background here than I do!

// Owns the channel used for communications with the agency and with clients
var testAgencyServer = engine.Services.GetService<TestAgency>();
// Make sure we can communicate with the agency
ServerUtilities.GetTcpChannel("", 0, 2);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to put the code itself here and not call server utilities as it may confuse people and you end up checking anyway.

Services.Add(new ResultService());
Services.Add(new TestFilterService());
// If caller added services beforehand, we don't add any
if (Services.ServiceCount == 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather do as much information hiding as possible. A bool property Services.Any seems better to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about Services.Added to indicate they have been added but may or may not be initialized?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine.

@CharliePoole
Copy link
Collaborator Author

General notes on architecture and the overall issue here, to clarify some of my comments.

The engine includes TestAgency, derived from ServerBase. TestAgents deal with TestAgency in a client/server relationship. The agent is the client and the agency is the server. This has always been a confusion to new folks working in this area and in the past I would explain it to each person when they got started. That approach is no longer scaling, I'm afraid, so maybe we need some comments somewhere in the code to explain it. What makes the agency the server is that it advertises itself and can be connected to. Agents OTOH can't be connected to, they must initiate the connection. In simple terms, the agent makes use of the agency to get assignments, which it then carries out, reporting back the results. It's a bit of an inversion of what one might think at first glance but not that uncommon: e.g. X11 has a similar "reversal of roles."

What makes it more confusing is the error trisMaddock found: We were starting up duplicate services as well as extra services that are not used in the agent. One of those was TestAgency. Because of the extra services, some tests worked without fake or mock services being created. I fixed all of those but one - that one required more time than I have right now. Then I discovered that the presence of TestAgency was required in order to make cross-process communication work. I'm pretty sure that was not always the case. I think it may have happened when somebody saw that we were starting TestAgency, didn't understand it was an error and simply made use of it in a fix. And now, of course, we know that @jnm2 made some fixes to ServerBase to fix a problem in the agent.

This may be a long fix and somebody else may have to finish it. Beyond that, there are a number of problems with testing runners. I think most of the runner tests should either be converted into true unit tests, making use of mocks, or we should move them to a new integration test project. For example, when we test a MultipleProcessRunner, we don't need real ProcessRunners to be called or processes started. We can call fake or mock ProcessRunners. I think this work is needed but should be separate from the fix.

The biggest issue of all is that the origin of all the problems in this area is the fact that I reused the engine as a part of the agent.exe. That means that all these extra classes are available in the agent to confuse things. Really, we should have a reduced engine in the agent executables. Ironically, the introduction of a .NET Standard build did much of what we need but also made it harder to work with because it only applies the necessary changes to the .NET standard build and adds changes of its own in order to conform to the standard. 🤕

I'll go back to the line comments now - hopefully this comment will clarify some of the shorter ones.

@CharliePoole
Copy link
Collaborator Author

@jnm2 I looked at #223, which I hadn't seen before. Practically everything in the agent code that confused me came from that issue: getting the service, disposing of it, the changes to ServerBase. That's actually a good thing, since we know the age of the introduced changes and it's relatively young.

I think you should look at your changes again and reconsider each item in the light of it's only being used in the testing process, not the agent process. If stuff is not needed, then it should be removed, of course. Maybe we need a ClientUtility or AgentUtility to handle the client side of this or maybe it can be put into some pre-existing class.

Unfortunately, there's a lot of stuff in the engine now that I don't know about along with a bunch of other stuff where I may well be the only person to know about it. That comes from too many years of its being my private preserve. For a few months, I haven't really been able to work on the engine because of changes introduced for VS2017 (I think) but I grabbed this one because I figured I caused the original problem. I do my best within the IDE, but since I can't run the script, I rely on CI to tell me if there are errors in the .NET Standard stuff. So, it may take more than one person and more than one issue to get it all straight.

@CharliePoole
Copy link
Collaborator Author

@jnm2 @ChrisMaddock I don't have any more time to work on this issue till after my vacation. Feel free to take it and continue if you have time.

@rprouse
Copy link
Member

rprouse commented Jul 5, 2017

Aside from the comments already made, this looks good. I think the main question remaining is if we still need an equivalent of the Stop() call. @CharliePoole you said that you don't have time to work on this, so one of us can pick it up. I don't mind doing it, but you are welcome to it @jnm2 if you want it.

@CharliePoole you nailed it when you stated that the .NET Standard build is only the code and services needed by the agent side. It doesn't add too much though, it just combines the API and the Engine code into one assembly because an agent doesn't need to load an engine in the same way. I did separate out some of the code to make the least impact on the main engine code, but it would be nice to bring them back together at some point.

The main issue I had with the .NET Standard version was the services and extensions. I really want to add them back in, but loading assemblies in other directories is a lot more limited in the lower versions of .NET Standard so the walking up and searching other NuGet packages wasn't possible, or at least in the time I had.

@CharliePoole
Copy link
Collaborator Author

One thought - maybe we should do a separate build of the engine for use with the agent. We would leave out the things that don't belong in the agent under any runtime. Then the .NET Standard build would leave out a few more things. That way it would not be possible to activate anything that is not intended to be included in the agent.

@rprouse
Copy link
Member

rprouse commented Jul 5, 2017

One thought - maybe we should do a separate build of the engine for use with the agent. We would leave out the things that don't belong in the agent under any runtime. Then the .NET Standard build would leave out a few more things. That way it would not be possible to activate anything that is not intended to be included in the agent.

I was thinking along the same lines, probably by taking the .NET Standard build, adding back in what we need and multitargeting it. That would also allow us to create .NET Core agents for the eventual goal of running .NET Core tests from the main console. I think I would prefer to wait until post release for that level of refactor though 😄

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

@CharliePoole

Thanks for the explanation. The duplicate services also confused me. It turns out I picked the wrong duplicate to remove when I ran into this: 33e75f2#r117599764

I can't approve any PR that doesn't address the channel stop issue. I'm looking now to see what I would have done if I had chosen the correct duplicate to die.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

I disagree that we should be using ServerUtilities.GetTcpChannel("", 0, 2); unencapsulated, directly in Program.cs.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

Can I put the channel handling in RemoteTestAgent? @CharliePoole @rprouse @ChrisMaddock

@CharliePoole
Copy link
Collaborator Author

You mean TestAgency, I take it. There is no duplicate of TestAgency. TestAgency was never supposed to be part of the agent because it's the service that launches agents!

The fact that it was ever initialized was the problem. So if it's not initialized, you don't have to stop it except for the side effect of closing the channel. However, I think you should pull this branch and experiment with it to see if the problem you were fixing is still fixed. If it was caused by failing to stop the agent, then not starting the agent in the first place will most likely eliminate the problem. IN fact, this spurious TestAgency may have caused a lot of our problems.

@CharliePoole
Copy link
Collaborator Author

And just think, we never would have known about it without @ChrisMaddock noticing an odd bit of code!

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

Aside: it should have bothered me more when I noticed I was doing engine.Services.GetService<TestAgency>() near Activator.GetObject(typeof(ITestAgency), AgencyUrl) as ITestAgency. I was in too much of a hurry getting the fix out.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

@CharliePoole No, I meant the duplicate channel which was hogging my message counter.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

Anyhow. The Activator code is coupled to the channel being opened, so I'll encapsulate that.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

Confirmed that the SocketException fix is still needed simply because the process exists too fast.

I have a strong sense that this architecture is heading in the right direction. More fun than the original fix.
Almost ready for review. Should I push a new branch?

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

@CharliePoole @rprouse @ChrisMaddock

This encapsulation makes me happy. What do you say?

https://github.com/nunit/nunit-console/compare/issue-248...issue-248_channel_encapsulation?expand=1
If it's helpful I can provide commentary on the changes, but I'm interested in your impression.

@CharliePoole
Copy link
Collaborator Author

Your encapsulation is good. I'd like it better if it didn't call ServerUtilities, but that's more of a naming quibble than anything else.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

@CharliePoole More than happy to rename that thing RemotingUtilities because that's precisely what it is.

@CharliePoole
Copy link
Collaborator Author

@jnm2 No reason you can't just take over the branch and PR - that's one reason we all have commit authority.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

@CharliePoole Can I delete the ITestAgent.Agency and TestAgent.Agency property? It's not used anywhere at all. I don't see any reason for code which uses a test agent to be able to reach the agency.

@CharliePoole
Copy link
Collaborator Author

It's an internal interface (not in the API assembly) so we can change it without hurting anyone who follows the rules. 😃

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

Pushed three commits to this PR.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 8, 2017

Forgot to rename ServerUtilityTests. Force pushed.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 9, 2017

@ChrisMaddock Thanks. My push seems to have dismissed your review, and I added a commit.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 9, 2017

I'm going to improve TcpChannelTests. Too many asserts per test.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 9, 2017

@nunit/framework-team What's the thought on GetTcpChannel_returns_same_channel_for_same_name_different_port instead of GetTcpChannelReturnsSameChannelForSameNameDifferentPort?

I guess I could always just push and find out. 😈

@ChrisMaddock
Copy link
Member

I really don't mind how you name your tests, so long as I don't have to start stretching to the underscore key when I write mine! 😄

@CharliePoole
Copy link
Collaborator Author

I don't think one test in a class should use a different naming convention from the others.

I don't think we should change all the named either.

I don't know if the naming standards still apply but IIRC they call for Pascal casing.

IMO Ruby-style underscored names are only useful with a BDD framework that changes them too spaces. I also use them in a minor way, to distinguish special cases.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 10, 2017

Hey, no harm in trying. I think it's more readable because test names tend to be exceptionally long, and underscores to me are exactly as readable as spaces. But I'm happy to stick to the standard which doesn't have a exception for absurdly long names. 😛

@CharliePoole
Copy link
Collaborator Author

To me,it's not so much about naming as about the "prime directive" of not changing the style of an existing file. Probably not a big deal in this case, but you did ask. 😄

We do have entire classes that use the style you prefer.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 10, 2017

As I understood it, the prime directive is not that we should preserve existing style at all costs, but that we should not change style for its own sake. In this case, I rewrote the entire file from scratch for non-stylistic reasons.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 10, 2017

Fixed now though =)

Copy link
Member

@rprouse rprouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me too. Tests are running and passing, my only reservation is that I hope to get a release out soon. I am worried that a change this large so close to release might be dangerous. Thought?

@jnm2
Copy link
Collaborator

jnm2 commented Jul 11, 2017

I'm happy either way.

@rprouse
Copy link
Member

rprouse commented Jul 11, 2017

@jnm2 I respect your opinion, how thoroughly have you tested these fixes and how confident are you in the changes? I hope to get the release out this week, should we wait to merge? I see the risk as moderate compared to a minor gain, so I am inclined to wait, but if I am misreading the value of the fix, then that might change my calculation.

@CharliePoole
Copy link
Collaborator Author

In that case, just use a style that you prefer, consistent with any standards that exist rather than asking for a discussion of style!

@jnm2
Copy link
Collaborator

jnm2 commented Jul 11, 2017

@rprouse I'm sufficiently aware of the difficulty of predicting side effects that I can't justify the risk vs gain either. I too would rather we sat on it then.

@CharliePoole I'm sorry. I'll do less talking next time. =)

@CharliePoole
Copy link
Collaborator Author

Thus was a low priority simple fix that got complicated. It's a serious problem with the extra test agency being started and I suspect it's the cause of a few agent problems we have had.

That said, it's a problem that we lived with through several releases.

I'd only merge if we are pretty sure it's at least an improvement.

@jnm2
Copy link
Collaborator

jnm2 commented Jul 12, 2017

We have evidence that my original PR only solves the problem for some people and that this PR solves it in one scenario where mine did not. I'd say we should get this in 3.7.

@rprouse
Copy link
Member

rprouse commented Jul 13, 2017

@jnm2 I read through #255 and I agree with your assessment. Let's include this in the release. I have more confidence now that this has been run against large production test suites and fixes their issues.

@rprouse rprouse merged commit 008e02f into master Jul 13, 2017
@rprouse rprouse deleted the issue-248 branch July 13, 2017 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants