-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate extra services from agent #251
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -11,10 +11,15 @@ namespace NUnit.ConsoleRunner.Tests | |||
{ | |||
class ConsoleRunnerTests | |||
{ | |||
[Test] | |||
[Test, Ignore("Needs to be rewritten")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we merge, can we add an issue for this? Otherwise I think it might be forgotten. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't done it before, but we haven't gone back and fixed these either. Maybe we need a general issue for cleaning ignores. I think we have one in the framework. Although, if we do, we still haven't fixed them over a long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @ChrisMaddock's idea is a really good one. It's better to have individual issues that can be triaged and eventually documented and fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's interesting to consider why this is so. It has also been a problem for me as a coach. In teams, I have highlighted Ignored tests by posting them on the wall. We do highlight them in the console runs, coloring them as warnings, but that does not seem to make us want to fix them. I'm willing to try issues but somebody will have to take a look at the list of warnings and create them. In so doing, they will have to use some judgement as to how ignore is being used in each situation and whether an issue is called for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but that does not seem to make us want to fix them
The reason is that
they will have to use some judgement as to how ignore is being used in each situation and whether an issue is called for.
I have no idea what the rationale is behind someone else marking a test ignored. On the other hand, an issue marked bug
and confirm
have no such shades of meaning. Either case, it's a matter of education and establishing practices of regularly sweeping through with an eye to things like this. There are other things in issues as well to sweep for, like abandoned or un/mistriaged issues, so having practices like this all in one place makes issues a good candidate for tracking IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, they should be telling you. That's why we made the Reason
field required. But I agree you may not know. Anyhow... slightly separate from this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to create an issue for this one - as a start. I'm not sure if anyone will come back to it, but it makes me more comfortable having it logged! 😄
// Owns the channel used for communications with the agency and with clients | ||
var testAgencyServer = engine.Services.GetService<TestAgency>(); | ||
// Make sure we can communicate with the agency | ||
ServerUtilities.GetTcpChannel("", 0, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Picky - Could this just be ServerUtilities.GetTcpChannel()
- so we're not duplicating the default values? The no-parameters overload calls the same as is written here, although I went to go look up what the parameters were in this case!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not keep the original lines instead? I don't understand what this gains us. We still have https://github.com/nunit/nunit-console/pull/251/files#diff-c6456741755862ce820b5efbe37974d0R70.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's in a test, which includes TestAgency in a temporary context. Really, it should be using mocks or be moved to an integration test in my view.
The problem with the original code is that it uses a service (server) that allows the agent to create new agents, which is not supposed to exist in the agent. See my general comment.
try | ||
{ | ||
// Unregister the channel | ||
testAgencyServer.Stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change means the channel is no longer unregistered, as far as I can see? Should we be calling ServerUtilities.SafeReleaseChannel()
at some point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the previous ServerBase.Stop()
did more than just unregister the TCP channel, including stuff with @jnm2's fix for the agent crashes recently. Is none of this necessary in this case?
@jnm2 - you have a much deeper understanding of this process than I do, would you mind reviewing this one as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware of that. We should examine what was fixed and find another way. The agent is the client in this relationship, not the server. Testagency uses Server Base so any fix there that was done for the agent should probably be removed.
Another way we made our own bug into a feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do that for neatness, but it shouldn't be needed since the agent exits when done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The agent process will cause a SocketException in the test process when the test process tells the agent process to shut down. As long as the test process is sending a kill signal to the agent process, we need to be running the code added in #223.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware of that. We should examine what was fixed and find another way. The agent is the client in this relationship, not the server. Testagency uses Server Base so any fix there that was done for the agent should probably be removed.
Another way we made our own bug into a feature.
I don't follow. Are you saying that TestAgency
should not be a ServerBase
?
Any ServerBase
that needs remoting will be susceptible to the SocketException any time the remote instance of the ServerBase
is shutting down, so unless I'm missing something, ServerBase
is definitely where the fix belongs.
Maybe the confusion is that the same ServerBase
type is used twice- one on the server side, one on the client side. The fix definitely has to be there inside the implementation of Stop
so that when Stop
executes on the remote side, it doesn't close the socket too soon. When Stop
is called on the test process side, none of the code in Stop
actually runs in the test process; it just causes the remote process to execute it which is exactly what needs to happen- no more, no less.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnm2 Check my general comment and go back to the original issue. The services that are initialized in agent's Main() are the services the agent is supposed to have. It's not supposed to have any class derived from ServerBase, 'cause it's not a server. Due to an accident of how I implemented the agent (reusing the entire engine) the code for the server is present, but we really don't want to start it just for it's side effects. We should instead figure out what a client actually needs and put it here - I suspect we will be putting it here again if we do that.
@@ -39,7 +39,7 @@ namespace NUnit.Engine.Runners.Tests | |||
//[TestFixture(typeof(ProcessRunner))] | |||
[TestFixture(typeof(MultipleTestDomainRunner), 1)] | |||
[TestFixture(typeof(MultipleTestDomainRunner), 3)] | |||
[TestFixture(typeof(MultipleTestProcessRunner), 1)] | |||
//[TestFixture(typeof(MultipleTestProcessRunner), 1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the change that required this commenting out, sorry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting a ProcessRunner of any kind has not been possible for a long time. Really these should be ignored.
I frankly have no idea why MultipleTestProcessRunner was working with the TestAgency included and ProcessRunner was not. MultipleTestProcessRunner actually creates ProcessRunners. This would be a good issue to create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CharliePoole - could you create this issue? I feel you have a better idea of the background here than I do!
// Owns the channel used for communications with the agency and with clients | ||
var testAgencyServer = engine.Services.GetService<TestAgency>(); | ||
// Make sure we can communicate with the agency | ||
ServerUtilities.GetTcpChannel("", 0, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tempted to put the code itself here and not call server utilities as it may confuse people and you end up checking anyway.
Services.Add(new ResultService()); | ||
Services.Add(new TestFilterService()); | ||
// If caller added services beforehand, we don't add any | ||
if (Services.ServiceCount == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather do as much information hiding as possible. A bool
property Services.Any
seems better to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about Services.Added
to indicate they have been added but may or may not be initialized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine.
General notes on architecture and the overall issue here, to clarify some of my comments. The engine includes TestAgency, derived from ServerBase. TestAgents deal with TestAgency in a client/server relationship. The agent is the client and the agency is the server. This has always been a confusion to new folks working in this area and in the past I would explain it to each person when they got started. That approach is no longer scaling, I'm afraid, so maybe we need some comments somewhere in the code to explain it. What makes the agency the server is that it advertises itself and can be connected to. Agents OTOH can't be connected to, they must initiate the connection. In simple terms, the agent makes use of the agency to get assignments, which it then carries out, reporting back the results. It's a bit of an inversion of what one might think at first glance but not that uncommon: e.g. X11 has a similar "reversal of roles." What makes it more confusing is the error trisMaddock found: We were starting up duplicate services as well as extra services that are not used in the agent. One of those was TestAgency. Because of the extra services, some tests worked without fake or mock services being created. I fixed all of those but one - that one required more time than I have right now. Then I discovered that the presence of TestAgency was required in order to make cross-process communication work. I'm pretty sure that was not always the case. I think it may have happened when somebody saw that we were starting TestAgency, didn't understand it was an error and simply made use of it in a fix. And now, of course, we know that @jnm2 made some fixes to ServerBase to fix a problem in the agent. This may be a long fix and somebody else may have to finish it. Beyond that, there are a number of problems with testing runners. I think most of the runner tests should either be converted into true unit tests, making use of mocks, or we should move them to a new integration test project. For example, when we test a MultipleProcessRunner, we don't need real ProcessRunners to be called or processes started. We can call fake or mock ProcessRunners. I think this work is needed but should be separate from the fix. The biggest issue of all is that the origin of all the problems in this area is the fact that I reused the engine as a part of the agent.exe. That means that all these extra classes are available in the agent to confuse things. Really, we should have a reduced engine in the agent executables. Ironically, the introduction of a .NET Standard build did much of what we need but also made it harder to work with because it only applies the necessary changes to the .NET standard build and adds changes of its own in order to conform to the standard. 🤕 I'll go back to the line comments now - hopefully this comment will clarify some of the shorter ones. |
@jnm2 I looked at #223, which I hadn't seen before. Practically everything in the agent code that confused me came from that issue: getting the service, disposing of it, the changes to ServerBase. That's actually a good thing, since we know the age of the introduced changes and it's relatively young. I think you should look at your changes again and reconsider each item in the light of it's only being used in the testing process, not the agent process. If stuff is not needed, then it should be removed, of course. Maybe we need a ClientUtility or AgentUtility to handle the client side of this or maybe it can be put into some pre-existing class. Unfortunately, there's a lot of stuff in the engine now that I don't know about along with a bunch of other stuff where I may well be the only person to know about it. That comes from too many years of its being my private preserve. For a few months, I haven't really been able to work on the engine because of changes introduced for VS2017 (I think) but I grabbed this one because I figured I caused the original problem. I do my best within the IDE, but since I can't run the script, I rely on CI to tell me if there are errors in the .NET Standard stuff. So, it may take more than one person and more than one issue to get it all straight. |
@jnm2 @ChrisMaddock I don't have any more time to work on this issue till after my vacation. Feel free to take it and continue if you have time. |
Aside from the comments already made, this looks good. I think the main question remaining is if we still need an equivalent of the Stop() call. @CharliePoole you said that you don't have time to work on this, so one of us can pick it up. I don't mind doing it, but you are welcome to it @jnm2 if you want it. @CharliePoole you nailed it when you stated that the .NET Standard build is only the code and services needed by the agent side. It doesn't add too much though, it just combines the API and the Engine code into one assembly because an agent doesn't need to load an engine in the same way. I did separate out some of the code to make the least impact on the main engine code, but it would be nice to bring them back together at some point. The main issue I had with the .NET Standard version was the services and extensions. I really want to add them back in, but loading assemblies in other directories is a lot more limited in the lower versions of .NET Standard so the walking up and searching other NuGet packages wasn't possible, or at least in the time I had. |
One thought - maybe we should do a separate build of the engine for use with the agent. We would leave out the things that don't belong in the agent under any runtime. Then the .NET Standard build would leave out a few more things. That way it would not be possible to activate anything that is not intended to be included in the agent. |
I was thinking along the same lines, probably by taking the .NET Standard build, adding back in what we need and multitargeting it. That would also allow us to create .NET Core agents for the eventual goal of running .NET Core tests from the main console. I think I would prefer to wait until post release for that level of refactor though 😄 |
Thanks for the explanation. The duplicate services also confused me. It turns out I picked the wrong duplicate to remove when I ran into this: 33e75f2#r117599764 I can't approve any PR that doesn't address the channel stop issue. I'm looking now to see what I would have done if I had chosen the correct duplicate to die. |
I disagree that we should be using |
Can I put the channel handling in |
You mean TestAgency, I take it. There is no duplicate of TestAgency. TestAgency was never supposed to be part of the agent because it's the service that launches agents! The fact that it was ever initialized was the problem. So if it's not initialized, you don't have to stop it except for the side effect of closing the channel. However, I think you should pull this branch and experiment with it to see if the problem you were fixing is still fixed. If it was caused by failing to stop the agent, then not starting the agent in the first place will most likely eliminate the problem. IN fact, this spurious TestAgency may have caused a lot of our problems. |
And just think, we never would have known about it without @ChrisMaddock noticing an odd bit of code! |
Aside: it should have bothered me more when I noticed I was doing |
@CharliePoole No, I meant the duplicate channel which was hogging my message counter. |
Anyhow. The Activator code is coupled to the channel being opened, so I'll encapsulate that. |
Confirmed that the SocketException fix is still needed simply because the process exists too fast. I have a strong sense that this architecture is heading in the right direction. More fun than the original fix. |
@CharliePoole @rprouse @ChrisMaddock This encapsulation makes me happy. What do you say? https://github.com/nunit/nunit-console/compare/issue-248...issue-248_channel_encapsulation?expand=1 |
Your encapsulation is good. I'd like it better if it didn't call ServerUtilities, but that's more of a naming quibble than anything else. |
@CharliePoole More than happy to rename that thing |
@jnm2 No reason you can't just take over the branch and PR - that's one reason we all have commit authority. |
@CharliePoole Can I delete the |
It's an internal interface (not in the API assembly) so we can change it without hurting anyone who follows the rules. 😃 |
Pushed three commits to this PR. |
Forgot to rename |
…should not have access to the agency
@ChrisMaddock Thanks. My push seems to have dismissed your review, and I added a commit. |
I'm going to improve TcpChannelTests. Too many asserts per test. |
@nunit/framework-team What's the thought on I guess I could always just push and find out. 😈 |
…to track release responsibility.
I really don't mind how you name your tests, so long as I don't have to start stretching to the underscore key when I write mine! 😄 |
I don't think one test in a class should use a different naming convention from the others. I don't think we should change all the named either. I don't know if the naming standards still apply but IIRC they call for Pascal casing. IMO Ruby-style underscored names are only useful with a BDD framework that changes them too spaces. I also use them in a minor way, to distinguish special cases. |
Hey, no harm in trying. I think it's more readable because test names tend to be exceptionally long, and underscores to me are exactly as readable as spaces. But I'm happy to stick to the standard which doesn't have a exception for absurdly long names. 😛 |
To me,it's not so much about naming as about the "prime directive" of not changing the style of an existing file. Probably not a big deal in this case, but you did ask. 😄 We do have entire classes that use the style you prefer. |
As I understood it, the prime directive is not that we should preserve existing style at all costs, but that we should not change style for its own sake. In this case, I rewrote the entire file from scratch for non-stylistic reasons. |
Fixed now though =) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me too. Tests are running and passing, my only reservation is that I hope to get a release out soon. I am worried that a change this large so close to release might be dangerous. Thought?
I'm happy either way. |
@jnm2 I respect your opinion, how thoroughly have you tested these fixes and how confident are you in the changes? I hope to get the release out this week, should we wait to merge? I see the risk as moderate compared to a minor gain, so I am inclined to wait, but if I am misreading the value of the fix, then that might change my calculation. |
In that case, just use a style that you prefer, consistent with any standards that exist rather than asking for a discussion of style! |
@rprouse I'm sufficiently aware of the difficulty of predicting side effects that I can't justify the risk vs gain either. I too would rather we sat on it then. @CharliePoole I'm sorry. I'll do less talking next time. =) |
Thus was a low priority simple fix that got complicated. It's a serious problem with the extra test agency being started and I suspect it's the cause of a few agent problems we have had. That said, it's a problem that we lived with through several releases. I'd only merge if we are pretty sure it's at least an improvement. |
We have evidence that my original PR only solves the problem for some people and that this PR solves it in one scenario where mine did not. I'd say we should get this in 3.7. |
Fixes #248
Fixes #255
Seemed easy... except...