SocketException thrown during console run #255

Closed
ChrisMaddock opened this issue Jul 7, 2017 · 105 comments

@ChrisMaddock
Member

@HaomingFu commented on Fri Jul 07 2017

I'm using the NUnit 3.5 console runner to run my unit tests. There is no deterministic repro, but it fails very often with error code -100.

nunit3-console.exe UnitTest.Client.dll --where "cat==MyTests" --output=RBSUnitTest_General_OutputFile.txt --result=RBSUnitTest_General_ResultFile.xml

I see this in the nunit console cmd output:

Server stack trace: 
   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
   at System.Runtime.Remoting.Channels.SocketStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.Runtime.Remoting.Channels.SocketHandler.ReadFromSocket(Byte[] buffer, Int32 offset, Int32 count)
   at System.Runtime.Remoting.Channels.SocketHandler.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.Runtime.Remoting.Channels.Tcp.TcpFixedLengthReadingStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.BinaryReader.ReadBytes(Int32 count)
   at System.Runtime.Serialization.Formatters.Binary.SerializationHeaderRecord.Read(__BinaryParser input)
   at System.Runtime.Serialization.Formatters.Binary.__BinaryParser.ReadSerializationHeaderRecord()
   at System.Runtime.Serialization.Formatters.Binary.__BinaryParser.Run()
   at System.Runtime.Serialization.Formatters.Binary.ObjectReader.Deserialize(HeaderHandler handler, __BinaryParser serParser, Boolean fCheck, Boolean isCrossAppDomain, IMethodCallMessage methodCallMessage)
   at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Deserialize(Stream serializationStream, HeaderHandler handler, Boolean fCheck, Boolean isCrossAppDomain, IMethodCallMessage methodCallMessage)
   at System.Runtime.Remoting.Channels.CoreChannel.DeserializeBinaryResponseMessage(Stream inputStream, IMethodCallMessage reqMsg, Boolean bStrictBinding)
   at System.Runtime.Remoting.Channels.BinaryClientFormatterSink.SyncProcessMessage(IMessage msg)

Exception rethrown at [0]: 
   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
   at NUnit.Engine.ITestAgent.Stop()
   at NUnit.Engine.Runners.ProcessRunner.Dispose(Boolean disposing)
   at NUnit.Engine.Runners.AbstractTestRunner.Dispose()
   at NUnit.Engine.Runners.MasterTestRunner.Dispose(Boolean disposing)
   at NUnit.Engine.Runners.MasterTestRunner.Dispose()
   at NUnit.ConsoleRunner.ConsoleRunner.RunTests(TestPackage package, TestFilter filter)
   at NUnit.ConsoleRunner.Program.Main(String[] args)

Any help would be appreciated.


@ChrisMaddock commented on Fri Jul 07 2017

I'd expect this is the same issue as this, which will be fixed in NUnit Console 3.7.

#225

Try the console from the latest build of master at the link below - and see if that solves the problems you're seeing?
https://ci.appveyor.com/project/CharliePoole/nunit-console/build/3.7.0-dev-03691/artifacts


@ChrisMaddock commented on Fri Jul 07 2017

I'm also going to move this issue over to the NUnit Console repo.

@jnm2
Collaborator

jnm2 commented Jul 8, 2017

The stack trace is identical so I think this is fixed, but please let us know how it goes with 3.7.

@HaomingFu

I have been using the nunit console executable since

  1. I need to automate running my tests on a daily basis, and
  2. I have problems using the nunit nuget package.

I only found nuget packages on the website https://ci.appveyor.com/project/CharliePoole/nunit-console/build/3.7.0-dev-03691/artifacts.
Where can I download the nunit console executable so that I can test it?

@jnm2
Collaborator

jnm2 commented Jul 10, 2017

@CharliePoole
Collaborator

@jnm2 Isn't that console fix merged? Assuming it is, user can take the latest build from MyGet.

@jnm2
Collaborator

jnm2 commented Jul 10, 2017

@CharliePoole Sure. I don't see any link to MyGet.

@jnm2
Collaborator

jnm2 commented Jul 10, 2017

Are we talking https://www.myget.org/F/nunit/api/v2/package/NUnit.Console/3.7.0-dev-03704?
The package names don't match NuGet then?

@HaomingFu

@jnm2 Thanks. Now I have access to the nunit console executable 3.7. I will kick off a test shortly and it may take time to finish. Will update the thread when the test is done.

@CharliePoole
Collaborator

They do, but NUnit.Console is a meta-package. It brings in NUnit.ConsoleRunner and five extensions. Our MyGet feed provides a sort of running beta release as development proceeds. We should have a link to it, I agree, although it should not be so prominent that users who don't really need a dev release start using it.

I think this goes back to my wanting the web site to be product focused. There's an awful lot to say about each product, which would be easier if the second-level pages were all about individual products like the Console Runner.

@HaomingFu

HaomingFu commented Jul 11, 2017

I have finished running my unit tests using nunit console 3.7. Unfortunately error -100 is still hit and I see exactly the same callstack on error. The unit tests are divided into 28 categories. Each time I run my unit tests, for sure some category will hit error -100. However it is unpredictable which category will hit this error.

In addition to the callstack, I also see this message in the nunit console output:

System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
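
For anyone trying to narrow down which runs hit the error, a small wrapper can invoke the console once per category and record the exit codes. The sketch below is only an illustration in Python: the `nunit3-console.exe` invocation and the `--where "cat==..."` filter come from the report above, while the function names, the per-category result file name, and the injected `runner` parameter are hypothetical. It also assumes that a negative exit code may be reported as an unsigned 32-bit value on Windows, so it converts before comparing with -100.

```python
import subprocess
from typing import Callable, Iterable, List


def to_signed32(code: int) -> int:
    """Map an exit code reported as unsigned 32-bit back to a signed value."""
    code &= 0xFFFFFFFF
    return code - 0x100000000 if code >= 0x80000000 else code


def build_command(assembly: str, category: str) -> List[str]:
    """Build one console invocation per category (result file name is hypothetical)."""
    return [
        "nunit3-console.exe",
        assembly,
        "--where", f"cat=={category}",
        f"--result={category}_ResultFile.xml",
    ]


def run_categories(assembly: str,
                   categories: Iterable[str],
                   runner: Callable = subprocess.run) -> List[str]:
    """Run each category once and return those that exited with -100."""
    crashed = []
    for category in categories:
        completed = runner(build_command(assembly, category))
        if to_signed32(completed.returncode) == -100:
            crashed.append(category)
    return crashed
```

Calling `run_categories("UnitTest.Client.dll", ["MyTests"])` would start the real console; passing a stub as `runner` keeps the logic checkable without NUnit installed.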

@jnm2
Collaborator

jnm2 commented Jul 11, 2017

@HaomingFu That's good to know, thanks. Could you try one more thing I think has a chance of success? Can you tell us what happens when you run #251's build https://ci.appveyor.com/api/buildjobs/3ajik15763diqnbd/artifacts/package%2FNUnit.ConsoleRunner.3.7.0-ci-03730-pr-251.nupkg?

@HaomingFu

HaomingFu commented Jul 11, 2017

The new Nunit console runner looks good since my nunit tests didn't hit error -100 any more!

@jnm2
Collaborator

jnm2 commented Jul 12, 2017

@rprouse We have to get #251 in 3.7 now.

@rprouse
Member

rprouse commented Jul 13, 2017

@jnm2 Sounds good. I will mark this as fixed by #251 too and merge. I will be doing the release tonight.

@HaomingFu

I am afraid we have to reopen this case since error -100 was hit again. I still saw exactly the same stack trace.

@jnm2
Collaborator

jnm2 commented Jul 13, 2017

@HaomingFu Thanks for letting us know. I will revisit this tonight.

Since I have not been successful in getting the SocketException on my machine, I might have to resort to really verbose logging and have you try things.

Another thought: I wonder if I need to wrap IClientChannelSink the same as the server sink?
33e75f2#diff-47f68aa1d792b3d290cd51f3b01306dcR69

@jnm2 jnm2 reopened this Jul 13, 2017
@jnm2
Collaborator

jnm2 commented Jul 13, 2017

@rprouse For better or worse, #251 is in. I'd rather work off that than revert. I don't know if I will make any headway tonight, so you probably shouldn't hold the release for very long, since it certainly improved the SocketException situation for many. I'll keep you updated if I make a discovery.

@rprouse
Member

rprouse commented Jul 13, 2017

@jnm2 thanks for the update. Keep me in the loop. It takes a while to pull the changes together, go through the issues, build and test, so you will have some time. If you feel like you are making progress, just ping me. Once the initial setup and admin stuff is done for the build, pulling in a PR and redoing the build isn't hard.

@jnm2
Collaborator

jnm2 commented Jul 13, 2017

I'm still trying to repro. If I can't repro, then the feedback loop is going to be very long while we wait for external help collecting logs and we'll miss the 3.7 deadline anyway.

@jnm2
Collaborator

jnm2 commented Jul 13, 2017

My working machine generates SocketExceptions using ConsoleRunner 3.6.1 like there's no tomorrow, but I can't get even a single failure building from master.

@jnm2
Collaborator

jnm2 commented Jul 13, 2017

Unless you want a very last-minute speculative fix wrapping IClientChannelSink the same as the server sink, which I need to understand better:
33e75f2#diff-47f68aa1d792b3d290cd51f3b01306dcR69

@jnm2
Collaborator

jnm2 commented Dec 20, 2017

@ChrisMaddock The delay only happens if you specify --remoting-wait-after-stop and then it waits for the number of milliseconds specified by --remoting-shutdown-delay=. We could unify the two parameters and add it as a workaround, maybe.
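
One way to picture why a delay after Stop can hide the exception: if the agent goes away before its reply to Stop reaches the runner, the runner's read on the socket fails. The following is a toy model in Python, not NUnit's remoting code. The `call_stop` function and the `STOP`/`OK` messages are invented for illustration, and it assumes the failure is purely a matter of ordering between the reply and the agent's exit.

```python
import socket


def call_stop(agent_exits_before_reply: bool) -> str:
    """Toy model: a runner sends STOP to an agent and waits for the reply."""
    runner, agent = socket.socketpair()
    try:
        runner.sendall(b"STOP")
        agent.recv(4)                    # the agent reads the request
        if not agent_exits_before_reply:
            agent.sendall(b"OK")         # reply is sent before the agent exits
        agent.close()                    # the agent process goes away
        try:
            reply = runner.recv(2)       # the runner waits for the reply
        except ConnectionError:
            reply = b""                  # a reset counts as "no reply" here
        if reply == b"OK":
            return "stopped cleanly"
        return "connection lost before reply"
    finally:
        runner.close()
        agent.close()
```

In this model, a shutdown delay corresponds to the `agent_exits_before_reply=False` path: the agent stays alive long enough for the reply to be delivered. It says nothing about why the real agent exits early.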

@ChrisMaddock
Member Author

I'm not totally keen on this being an option flag - I'd rather have it in the build, or not. The flag feels a bit like a --dont-be-broken option. 😄

My preferences:

  1. Fix the timing issue with no additional delay. (Sounds like you've exhausted that path, Joseph - although I wonder if a new communication architecture would tick this box. That's a much bigger job.)
  2. Have the delay for everyone.
  3. Have a flag to add the delay.

@thegrima - Are you able to provide one/some of the exact stack traces you saw? We've seen that socket exceptions have been thrown for a number of different reasons. Also a bit of information about what you're actually running - do you have multiple assemblies, are you passing in a VS solution, an .nunit file? TeamCity? 😄

@jnm2
Collaborator

jnm2 commented Dec 20, 2017

@ChrisMaddock I have the same preferences, though I'm no longer convinced that the new architecture will have the bugfixing ROI that I first hoped.

@ChrisMaddock
Member Author

I'm hopeful it will fix the race condition shutdown issue that you diagnosed and we know exists - and we can then handle other bug reports separately. 😄

@thegrima

@ChrisMaddock : Sure. We're using our TeamCity server with 2 build agents to build a complete VS solution. The solution has multiple test assemblies. The .nunit file is generated by TeamCity.

I don't know if it's relevant, but our build system very often builds the same solution in parallel using the two agents (in different directories, of course).

Typical command line in logs is
[10:41:37][Step 5/19] Starting: "C:\Program Files (x86)\NUnit.org\nunit-console\nunit3-console.exe" C:\BuildAgent\temp\buildTmp\U4rDkJny6srG4NUZro4KGUKaxIJoNq7T.nunit --result=C:\BuildAgent\temp\buildTmp\U4rDkJny6srG4NUZro4KGUKaxIJoNq7T.nunit.xml --noheader --x86 --framework=net-4.0

For the stacktrace, you'll find some in this file.
nunit-console-error.log

I could provide complete build logs from TeamCity, but they're over 5 MB per log and I don't know if they're relevant.

Hope it helps

@ChrisMaddock
Member Author

ChrisMaddock commented Dec 21, 2017

Likely very relevant information, thanks. 😄

Could you try something out for us? This is a bit of a stab in the dark, but I'm intrigued to find out.

Could you try adding the --process=Multiple flag? There's currently a bug with .nunit files that forces each assembly to run in sequence (#116). Default behaviour should be that the assemblies run in parallel. I wouldn't expect this to be related - but we've seen other issues result from our TestRunner classes not being used as expected (e.g. the AppDomain problems) - and I'm wondering if this could be a similar problem...

@ChrisMaddock
Member Author

ChrisMaddock commented Dec 21, 2017

(This would need to be without @jnm2's --remoting options - which we know fix the problem by adding in additional delay. 🙂)
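
A minimal sketch of preparing that experiment, assuming the command line is held as a list of arguments: it drops any `--remoting-...` options and any existing `--process=` setting, then appends `--process=Multiple`. The helper name and the shortened sample arguments are hypothetical; only the flags themselves come from this thread.

```python
from typing import List


def prepare_process_multiple(command: List[str]) -> List[str]:
    """Return a copy of the command set up for the --process=Multiple experiment."""
    kept = [
        arg for arg in command
        if not arg.startswith("--remoting-")    # run without the delay workaround
        and not arg.startswith("--process=")    # replace any existing process setting
    ]
    return kept + ["--process=Multiple"]


if __name__ == "__main__":
    sample = ["nunit3-console.exe", "tests.nunit", "--noheader", "--x86",
              "--framework=net-4.0", "--remoting-wait-after-stop"]
    print(" ".join(prepare_process_multiple(sample)))
```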

@ChrisMaddock
Member Author

I suggest this, as, am I understanding correctly, you currently only see this problem on TeamCity? When running via an .nunit file?

Of course, there are many other ways your TC machine could be different - but it seems to be a common thread.

@thegrima

Yes, the problem only occurs in TC. We never encountered it when running all the tests from VS.

I could try your suggestion, but since this is our production release environment, I will not be able to do so before the first week of January. I'll post results when available.

@ChrisMaddock
Member Author

No problem. 👍

Another option for diagnosis - you could retrieve the .nunit file from TeamCity, and try running the same command line locally. That would help to work out if it's a problem specific to the machine TC is running on, or the .nunit file set-up that TC is using.

@herebebeasties

We see this from non-.nunit invocations on other CI systems (Bamboo). We're on Windows.

@ChrisMaddock
Member Author

@herebebeasties - same questions as above! 😄

Are you able to provide one/some of the exact stack traces you saw? We've seen that socket exceptions have been thrown for a number of different reasons. Also a bit of information about what you're actually running - do you have multiple assemblies, are you passing in a VS solution, an .nunit file? TeamCity? 😄

@herebebeasties

Single assembly, no VS solution, no .nunit file, no TeamCity, but concurrent builds/runs under Bamboo.
I'll find you a stack trace tomorrow.

@PhilipGullick

Is there an update on the explicit wait being merged into master for a new release? I don't mind applying the fix, however that branch will soon become outdated.

@rprouse
Member

rprouse commented Jan 24, 2018

@PhilipGullick do you mean the command line option --remoting-wait-after-stop? It was intended to diagnose the issue and provide possible workarounds, but does not work for everyone. Have you tried our recent builds of master? Does that fix your problems?

@PhilipGullick

PhilipGullick commented Jan 24, 2018

@rprouse I have applied the fix and for me it does not fix the issue completely:

If I have --inprocess set, it fails with Unhandled Exception: NUnit.Engine.NUnitEngineException: Exception encountered unloading AppDomain

If I don't have --inprocess set, it fails with #117

So from recommendations on #117 I set --agents=2 and then suffer from the AppDomain issue again.

I'm currently investigating the last-run test to identify if there are any tasks running erroneously when the test has finished.
I will also pull down the latest master branch to see if that fixes the issue.

@ChrisMaddock
Member Author

@PhilipGullick - This should be fixed in master, although it will now return a -5 return code. If not, it would be good to understand more details about your scenario.

@PhilipGullick

Awesome, thanks guys - will try it now and let you know the result

@PhilipGullick

Hmm, still affected by #117. I will limit the agents now to see if AppDomain is still working

@ChrisMaddock
Member Author

@PhilipGullick - sorry, it's only the AppDomain issue I'd expect to be fixed in master. I haven't looked into #117.

@PhilipGullick

@ChrisMaddock No worries, running with --inprocess set now. Will let you know what happens.

@PhilipGullick

@ChrisMaddock, with --inprocess set I have not had any issues with AppDomain when using master.

@ChrisMaddock
Member Author

Great - thanks for the update! 👍🏻

@ChrisMaddock
Member Author

All,

We believe this issue will have been fixed by @BlythMeister's temporary fix in #370.

Until the next console release, please try the master build, version 3.9.0-dev-03932 or later. You can find this on our MyGet feed.


If you still experience this issue with the above build of master...

  1. Run your tests with the --inprocess flag - see if this exposes a more useful exception or error message.

  2. Add the --trace=Debug flag to your command line. When you next see the exception, examine the logs for any clues.

Please then open a new issue. The SocketException is often caused by an nunit-agent crash, which can occur in a number of different scenarios. We believe many of these have been fixed in NUnit.Console 3.8, and the fix in #372 - however, there may still be edge cases we have yet to tackle. Please report these if you come across any!
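
The two diagnostic runs suggested above can be derived from an existing command line. This is a small Python sketch; the helper name and the sample command are hypothetical, and only the `--inprocess` and `--trace=Debug` flags are taken from the comment.

```python
from typing import List, Tuple


def diagnostic_commands(command: List[str]) -> Tuple[List[str], List[str]]:
    """Return (in-process variant, debug-trace variant) of a console command."""
    in_process = list(command)
    if "--inprocess" not in in_process:
        in_process.append("--inprocess")     # step 1: may expose the underlying exception
    traced = list(command)
    if "--trace=Debug" not in traced:
        traced.append("--trace=Debug")       # step 2: write debug-level logs to examine
    return in_process, traced


if __name__ == "__main__":
    for variant in diagnostic_commands(["nunit3-console.exe", "UnitTest.Client.dll"]):
        print(" ".join(variant))
```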

@ChrisMaddock ChrisMaddock added this to the 3.9 milestone Feb 15, 2018
@ChrisMaddock ChrisMaddock changed the title NUnit console fails with error code -100 SocketException thrown during console run Feb 15, 2018