AppVeyor Test Failures under .NET 3.5 #2579

Closed · rprouse opened this issue Nov 28, 2017 · 29 comments
Labels: closed:done, is:bug, pri:critical

rprouse (Member) commented Nov 28, 2017

These test failures under .NET 3.5 have become frequent enough that we are getting few passing builds, and we had one build time out at 60 minutes in the .NET 3.5 tests.

In PR #2380 I am going to #if net35 and increase the timeout as a temporary workaround (a rough sketch is below the failure list), but we should look into this. I am curious why we are having so many .NET 3.5 threading issues but none on any other target.

1) Failed : NUnit.Framework.Internal.Execution.EventQueueTests+SetWaitHandle_Enqueue_AsynchronousTest.SetWaitHandle_Enqueue_Asynchronous
Test exceeded Timeout value of 1000ms
1) Failed : NUnit.Framework.Internal.Execution.EventQueueTests+DequeueBlocking_StopTest.DequeueBlocking_Stop
Test exceeded Timeout value of 1000ms
7) Cancelled : NUnit.Framework.Internal.Execution.EventQueueTests+SetWaitHandle_Enqueue_AsynchronousTest.SetWaitHandle_Enqueue_Asynchronous
Test cancelled by user
   at NUnit.Framework.Internal.Reflect.InvokeMethod(MethodInfo method, Object fixture, Object[] args) in C:\projects\nunit\src\NUnitFramework\framework\Internal\Reflect.cs:line 232
   at NUnit.Framework.Internal.Commands.TestMethodCommand.RunNonAsyncTestMethod(TestExecutionContext context) in C:\projects\nunit\src\NUnitFramework\framework\Internal\Commands\TestMethodCommand.cs:line 109
   at NUnit.Framework.Internal.Commands.TestMethodCommand.Execute(TestExecutionContext context) in C:\projects\nunit\src\NUnitFramework\framework\Internal\Commands\TestMethodCommand.cs:line 64
   at NUnit.Framework.Internal.Commands.BeforeTestCommand.Execute(TestExecutionContext context) in C:\projects\nunit\src\NUnitFramework\framework\Internal\Commands\BeforeTestCommand.cs:line 50
   at NUnit.Framework.Internal.Commands.BeforeAndAfterTestCommand.Execute(TestExecutionContext context) in C:\projects\nunit\src\NUnitFramework\framework\Internal\Commands\BeforeAndAfterTestCommand.cs:line 59
1) Failed : NUnit.Framework.Assertions.CollectionAssertTest.PerformanceTests(System.Linq.Enumerable+<RangeIterator>d__b1)
Elapsed time of 137.7538ms exceeds maximum of 100ms
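Roughly, the net35-only workaround would look something like this. This is a sketch only: the NET35 symbol name and the 5000ms value are placeholders rather than the actual change in PR #2380, and [Timeout] is NUnit's standard per-test timeout attribute.

// Sketch only -- illustrative values, not the PR #2380 diff.
#if NET35
[Test, Timeout(5000)]   // looser limit while the net35 flakiness is investigated
#else
[Test, Timeout(1000)]
#endif
public void SetWaitHandle_Enqueue_Asynchronous()
{
    // ... existing test body ...
}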
rprouse (Member, Author) commented Dec 2, 2017

Hit yet another .NET 3.5 test failure today ☹️

  1. Failed : NUnit.Framework.Internal.Execution.EventQueueTests.PumpSynchronousAndAsynchronousEvents
    Test exceeded Timeout value of 1000ms

jnm2 (Contributor) commented Dec 2, 2017

I think we should change all timeouts to a ten second minimum. That will help distinguish actual logic errors from CI being overloaded.

rprouse (Member, Author) commented Dec 2, 2017

If it is the CI being overloaded, why is it always the .NET 3.5 tests that fail and never any of the other targets? I am okay with increasing the timeout for 3.5 since it might be inefficiencies in the .NET 3.5 runtime, but I would prefer to keep the other targets where they are since they never seem to fail.

jnm2 (Contributor) commented Dec 2, 2017

@rprouse That's a good question, but I imagine the same answer would explain either inefficient thread handling or actual logic errors. I guess my question is, is it worth special-casing 3.5? In my line of thought, the non-3.5 tests don't take 1000ms. They take maybe 1ms and would continue to take 1ms even when the timeouts are all changed to 10000ms.

rprouse (Member, Author) commented Dec 2, 2017

Maybe. I would just hate to see slowdowns creep into the code and go unnoticed because of the larger timeout, but I also agree that timeouts shouldn't be used for performance testing, so I am fine either way.

jnm2 (Contributor) commented Dec 2, 2017

I could be wrong but I haven't seen any of our timeouts measure slowness of code, only binary correctness of logic.

It would be nice to make it a standard that Timeout is only for binary logic failures (simply as a backstop so that we don't wait an hour for CI to fail) and MaxTime is for performance measurements (because we'll have actual numbers rather than interrupting the test).

In some cases, you might even use them together but each for a separate purpose.
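Something like this, with both attributes on one test (the values are illustrative): MaxTime measures and reports the actual elapsed time and fails the test if the limit is exceeded, while Timeout only aborts the run if the test hangs outright.

// Sketch of the suggested convention (illustrative values).
// MaxTime fails the test if it completes but takes longer than 100ms;
// Timeout is only a backstop that aborts the test if it hangs.
[Test, MaxTime(100), Timeout(10000)]
public void SomePerformanceSensitiveTest()
{
    // ... code under measurement ...
}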

jnm2 (Contributor) commented Dec 2, 2017

The .NET 3.5 test run timed out at 60 minutes again.

Well, let's not waste time changing test timeouts to 10,000ms then, since NUnit Framework itself is buggy. There's pretty clearly a logic error, since this has happened multiple times now that I think about it.

jnm2 (Contributor) commented Dec 6, 2017

Ran into it again.

jnm2 (Contributor) commented Dec 7, 2017

Ran into it three more times.

jnm2 (Contributor) commented Dec 7, 2017

And again.

rprouse (Member, Author) commented Dec 7, 2017

Here is a new .NET 3.5 failure 😦

1) Failed : NUnit.Framework.Assertions.CollectionAssertTest.PerformanceTests(System.Linq.Enumerable+<RangeIterator>d__b1)
Elapsed time of 137.7538ms exceeds maximum of 100ms

jnm2 (Contributor) commented Dec 7, 2017

I really want to know what's special about net35. I can't think of any way that it's just failing because it's third in the list.

rprouse (Member, Author) commented Dec 7, 2017

I was thinking of using the 3.5 Docker container microsoft/dotnet-framework:3.5 to run, and maybe debug, our 3.5 tests repeatedly to see if the failures reproduce and try to track them down.

mikkelbu (Member) commented Dec 7, 2017

Actually, in https://ci.appveyor.com/project/CharliePoole/nunit/build/3.10.0-ci-04960-pr-2586#L318, 3.5 is failing with

1) Failed : NUnit.Framework.Assertions.CollectionAssertTest.PerformanceTests(System.Linq.Enumerable+<RangeIterator>d__b1)
Elapsed time of 137.7538ms exceeds maximum of 100ms

but 4.5 is also failing (in another test)

1) Failed : NUnit.Framework.Assertions.CollectionAssertTest.PerformanceTests(System.Collections.Generic.List`1[System.Double])
Elapsed time of 118.375ms exceeds maximum of 100ms

jnm2 (Contributor) commented Dec 7, 2017

Came here to say the same as @mikkelbu.

jnm2 (Contributor) commented Dec 8, 2017

Observed a higher time on AppVeyor 4.5:

  1. Failed : NUnit.Framework.Assertions.CollectionAssertTest.PerformanceTests(System.Collections.Generic.List`1[System.Double])
    Elapsed time of 370.5775ms exceeds maximum of 100ms

jnm2 (Contributor) commented Dec 8, 2017

We should probably multiply all our performance test limits by at least ten when running under CI. That was where my Has.MaxTime request came from.
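For example, something along these lines. This is a hypothetical helper: the environment variable check and the factor of ten are assumptions, and since attribute arguments must be compile-time constants, the scaled limit has to be asserted in the test body rather than passed to [MaxTime].

using System;
using System.Diagnostics;
using NUnit.Framework;

[TestFixture]
public class ScaledPerformanceExample
{
    // Hypothetical helper: multiply the limit by ten when running under CI.
    // AppVeyor sets the APPVEYOR (and CI) environment variables.
    private static int Scaled(int milliseconds)
    {
        bool onCi = Environment.GetEnvironmentVariable("APPVEYOR") != null
                 || Environment.GetEnvironmentVariable("CI") != null;
        return onCi ? milliseconds * 10 : milliseconds;
    }

    [Test]
    public void PerformanceTest_Scaled()
    {
        var watch = Stopwatch.StartNew();
        // ... code under measurement ...
        watch.Stop();
        Assert.That(watch.ElapsedMilliseconds, Is.LessThanOrEqualTo(Scaled(100)));
    }
}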

jnm2 (Contributor) commented Dec 11, 2017

We'll be forced to take care of the performance tests next in order to ship a .NET Standard 2.0 DLL: #2555

rprouse (Member, Author) commented Dec 12, 2017

Another .NET 3.5 failure

1) Failed : NUnit.Framework.Internal.Execution.EventQueueTests+SetWaitHandle_Enqueue_AsynchronousTest.SetWaitHandle_Enqueue_Asynchronous
  Expected: null
  But was:  <NUnit.Framework.AssertionException:   Expected: True
  But was:  False
   at NUnit.Framework.Assert.ReportFailure(String message)
   at NUnit.Framework.Assert.That[TActual](TActual actual, IResolveConstraint expression, String message, Object[] args)
   at NUnit.Framework.Internal.Execution.EventQueueTests.SetWaitHandle_Enqueue_AsynchronousTest.Consumer() in C:\projects\nunit\src\NUnitFramework\tests\Internal\EventQueueTests.cs:line 196
   at NUnit.Framework.Internal.Execution.EventQueueTests.ProducerConsumerTest.ConsumerThreadWrapper() in C:\projects\nunit\src\NUnitFramework\tests\Internal\EventQueueTests.cs:line 316>
   at NUnit.Framework.Internal.Execution.EventQueueTests.ProducerConsumerTest.RunProducerConsumer() in C:\projects\nunit\src\NUnitFramework\tests\Internal\EventQueueTests.cs:line 304
   at NUnit.Framework.Internal.Execution.EventQueueTests.SetWaitHandle_Enqueue_AsynchronousTest.SetWaitHandle_Enqueue_Asynchronous() in C:\projects\nunit\src\NUnitFramework\tests\Internal\EventQueueTests.cs:line 181

jnm2 (Contributor) commented Dec 12, 2017

Latest AppVeyor net35 failure. This time it looks like the whole process died:

System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
Server stack trace: 
   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
   at System.Runtime.Remoting.Channels.SocketStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.Runtime.Remoting.Channels.SocketHandler.ReadFromSocket(Byte[] buffer, Int32 offset, Int32 count)
   at System.Runtime.Remoting.Channels.SocketHandler.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.Runtime.Remoting.Channels.Tcp.TcpFixedLengthReadingStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.BinaryReader.ReadBytes(Int32 count)
   at System.Runtime.Serialization.Formatters.Binary.SerializationHeaderRecord.Read(__BinaryParser input)
   at System.Runtime.Serialization.Formatters.Binary.__BinaryParser.ReadSerializationHeaderRecord()
   at System.Runtime.Serialization.Formatters.Binary.__BinaryParser.Run()
   at System.Runtime.Serialization.Formatters.Binary.ObjectReader.Deserialize(HeaderHandler handler, __BinaryParser serParser, Boolean fCheck, Boolean isCrossAppDomain, IMethodCallMessage methodCallMessage)
   at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Deserialize(Stream serializationStream, HeaderHandler handler, Boolean fCheck, Boolean isCrossAppDomain, IMethodCallMessage methodCallMessage)
   at System.Runtime.Remoting.Channels.CoreChannel.DeserializeBinaryResponseMessage(Stream inputStream, IMethodCallMessage reqMsg, Boolean bStrictBinding)
   at System.Runtime.Remoting.Channels.BinaryClientFormatterSink.SyncProcessMessage(IMessage msg)
Exception rethrown at [0]: 
   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
   at NUnit.Engine.ITestAgent.Stop()
   at NUnit.Engine.Runners.ProcessRunner.Dispose(Boolean disposing)
   at NUnit.Engine.Runners.AbstractTestRunner.Dispose()
   at NUnit.Engine.Runners.MasterTestRunner.Dispose(Boolean disposing)
   at NUnit.Engine.Runners.MasterTestRunner.Dispose()
   at NUnit.ConsoleRunner.ConsoleRunner.RunTests(TestPackage package, TestFilter filter)
   at NUnit.ConsoleRunner.Program.Main(String[] args)

jnm2 (Contributor) commented Dec 12, 2017

New AppVeyor/net35 crash:

Unable to acquire remote process agent
   at NUnit.Engine.Runners.ProcessRunner.CreateAgentAndRunner()
   at NUnit.Engine.Runners.ProcessRunner.RunTests(ITestEventListener listener, TestFilter filter)

jnm2 (Contributor) commented Jan 24, 2018

I'm starting to see this every now and then on Travis/netstandard2.0:

3) Failed : NUnit.Framework.Internal.Execution.TestWorkerTests.BusyExecuteIdleEventsCalledInSequence
  Expected: "BusyExecIdle" after 200 milliseconds delay
  But was:  "Busy"

rprouse (Member, Author) commented Feb 27, 2018

I've started looking at this. The first step is to reproduce it locally.

Notes so far

Running .NET 3.5 tests directly on my laptop

I cannot reproduce it just by running the .NET 3.5 tests on my machine. My best guess is that it is related to the number of cores. I'm pretty sure the AppVeyor VMs have one core, so we default to two workers. I've tried locally with one and two workers, but no luck.
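For reference, the worker count can also be pinned explicitly with the console runner's --workers option, e.g.

.\tools\nunit.consolerunner.3.7.0\NUnit.ConsoleRunner\tools\nunit3-console.exe --workers=2 .\bin\Release\net35\nunit.framework.tests.dll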

Running tests in a .NET 3.5 docker container

Next test: creating a docker image with this Dockerfile,

FROM microsoft/dotnet-framework:3.5
WORKDIR \app
COPY tools/nunit.consolerunner.3.7.0/NUnit.ConsoleRunner/tools/* ./
ENTRYPOINT ["nunit3-console.exe", "net35/nunit.framework.tests.dll"]

Which I build with

docker build -t rprouse/nunit .

The first build is slow as it pulls all of the base images needed; subsequent builds are much quicker. I can then run the tests with,

docker run -it -v c:\src\nunit\nunit\bin\Release\net35:c:\app\net35 rprouse/nunit

The number of workers is correct, but 10 runs succeeded 😦

Run multiple docker instances simultaneously

Run the .NET 3.5 tests in three docker containers at the same time to put the system under load?
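Roughly, just launching the same container three times in parallel (same mount path as before; start gives each run its own window):

start docker run -it -v c:\src\nunit\nunit\bin\Release\net35:c:\app\net35 rprouse/nunit
start docker run -it -v c:\src\nunit\nunit\bin\Release\net35:c:\app\net35 rprouse/nunit
start docker run -it -v c:\src\nunit\nunit\bin\Release\net35:c:\app\net35 rprouse/nunit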

Woot, got it! Two of the three test runs failed.

1) Failed : NUnit.Framework.Internal.Execution.EventQueueTests+DequeueBlocking_StopTest.DequeueBlocking_Stop
Test exceeded Timeout value of 2000ms

and

1) Failed : NUnit.Framework.Internal.Execution.EventQueueTests+SetWaitHandle_Enqueue_AsynchronousTest.SetWaitHandle_Enqueue_Asynchronous
Test exceeded Timeout value of 2000ms

It isn't failing every time, but getting closer...

rprouse (Member, Author) commented Feb 27, 2018

Comes back a bit sheepish 🐑

Three docker containers worked, so I decided to see whether it is even core-related and just ran three sets of tests on my laptop at the same time.

.\tools\nunit.consolerunner.3.7.0\NUnit.ConsoleRunner\tools\nunit3-console.exe .\bin\Release\net35\nunit.framework.tests.dll

It took a few more tries, but I've managed to reproduce again,

1) Failed : NUnit.Framework.Internal.Execution.EventQueueTests+SetWaitHandle_Enqueue_AsynchronousTest.SetWaitHandle_Enqueue_Asynchronous
  Expected: null
  But was:  <NUnit.Framework.AssertionException:   Expected: True
  But was:  False

   at NUnit.Framework.Assert.ReportFailure(String message) in C:\src\nunit\nunit\src\NUnitFramework\framework\Assert.cs:line 397
   at NUnit.Framework.Assert.That[TActual](TActual actual, IResolveConstraint expression, String message, Object[] args) in C:\src\nunit\nunit\src\NUnitFramework\framework\Assert.That.cs:line 246
   at NUnit.Framework.Internal.Execution.EventQueueTests.SetWaitHandle_Enqueue_AsynchronousTest.Consumer() in C:\src\nunit\nunit\src\NUnitFramework\tests\Internal\EventQueueTests.cs:line 196
   at NUnit.Framework.Internal.Execution.EventQueueTests.ProducerConsumerTest.ConsumerThreadWrapper() in C:\src\nunit\nunit\src\NUnitFramework\tests\Internal\EventQueueTests.cs:line 316>
   at NUnit.Framework.Internal.Execution.EventQueueTests.ProducerConsumerTest.RunProducerConsumer() in C:\src\nunit\nunit\src\NUnitFramework\tests\Internal\EventQueueTests.cs:line 304
   at NUnit.Framework.Internal.Execution.EventQueueTests.SetWaitHandle_Enqueue_AsynchronousTest.SetWaitHandle_Enqueue_Asynchronous() in C:\src\nunit\nunit\src\NUnitFramework\tests\Internal\EventQueueTests.cs:line 181

Initial conclusion is that it is load based, not the number of cores. Oh well, it was fun playing with docker 😄

rprouse (Member, Author) commented Mar 10, 2018

This isn't fully fixed; we had a test hang in https://ci.appveyor.com/project/CharliePoole/nunit/build/3.10.0-ci-05316-issue-2405. There have been two other failures, but they don't include the new code because the fork wasn't updated. The fix is looking promising for some of the call stacks we've seen.

I am leaving this open and moving it to the 3.11 milestone.

rprouse modified the milestones: 3.10 → 3.11 on Mar 10, 2018
jnm2 (Contributor) commented Mar 10, 2018

Your work so far is impressive! I'm hopeful.

rprouse modified the milestones: 3.11 → 3.12 on Aug 16, 2018
rprouse (Member, Author) commented Apr 22, 2019

@jnm2 I think this is fixed. Have you seen recent occurrences?

jnm2 (Contributor) commented Apr 22, 2019

@rprouse No, but I haven't been triggering builds in this repo recently.
Most recent net35 failure is a hang: https://ci.appveyor.com/project/CharliePoole/nunit/builds/23999000
But then here's netstandard2.0 hanging as well, so I think it's a separate issue: https://ci.appveyor.com/project/CharliePoole/nunit/builds/23162726

rprouse (Member, Author) commented Apr 23, 2019

I am going to close this as fixed. We can open another issue if it crops up again. Thanks.
