
Console runner performance varies wildly depending on environmental characteristics #2217

Closed
mintsoft opened this issue Jun 4, 2017 · 70 comments

Comments

@mintsoft

mintsoft commented Jun 4, 2017

Hiya,

We've just rolled out NUnit 3 for the majority of our build suite, and we've found some interesting (to say the least) performance behaviour. I think this may be a duplicate of: #480

Basically, we noticed that tests "are slow on CI". After eliminating the hardware, the installation, etc., we've found that tests executed through nunit3-console are quicker when Visual Studio is open than when it's closed. This seems like insane behaviour, but it is absolutely what we see (repeatable on multiple machines, including a freshly installed machine):

  • With VS closed: 9.131 seconds
  • With VS open: 2.54 seconds.

It happens with every test project we have, with comparable performance differences (e.g. 1 minute vs 4 minutes on the larger suites). When I say "with Visual Studio open", I don't mean with the relevant sln or csproj loaded; I mean just the application itself being open.

Here the first run was with VS open, the next 2 with it closed and then it open again:

D:\code>NUnit.ConsoleRunner.3.6.1\tools\nunit3-console.exe test\Web\Tests.csproj
NUnit Console Runner 3.6.1
Copyright (C) 2017 Charlie Poole

Runtime Environment
   OS Version: Microsoft Windows NT 6.3.9600.0
  CLR Version: 4.0.30319.42000

Test Files
    test\Web\Tests.csproj


Run Settings
    BasePath: D:\code\test\Web\bin\
    DisposeRunners: True
    WorkDirectory: D:\code
    ImageRuntimeVersion: 4.0.30319
    ImageTargetFrameworkName: .NETFramework,Version=v4.6.1
    ImageRequiresX86: False
    ImageRequiresDefaultAppDomainAssemblyResolver: False
    NumberOfTestWorkers: 8

Test Run Summary
  Overall result: Passed
  Test Count: 175, Passed: 175, Failed: 0, Warnings: 0, Inconclusive: 0, Skipped: 0
  Start time: 2017-06-04 18:37:09Z
    End time: 2017-06-04 18:37:12Z
    Duration: 2.540 seconds

Results (nunit3) saved as TestResult.xml

D:\code>NUnit.ConsoleRunner.3.6.1\tools\nunit3-console.exe test\Web\Tests.csproj
NUnit Console Runner 3.6.1
Copyright (C) 2017 Charlie Poole

Runtime Environment
   OS Version: Microsoft Windows NT 6.3.9600.0
  CLR Version: 4.0.30319.42000

Test Files
    test\Web\Tests.csproj


Run Settings
    BasePath: D:\code\test\Web\bin\
    DisposeRunners: True
    WorkDirectory: D:\code
    ImageRuntimeVersion: 4.0.30319
    ImageTargetFrameworkName: .NETFramework,Version=v4.6.1
    ImageRequiresX86: False
    ImageRequiresDefaultAppDomainAssemblyResolver: False
    NumberOfTestWorkers: 8

Test Run Summary
  Overall result: Passed
  Test Count: 175, Passed: 175, Failed: 0, Warnings: 0, Inconclusive: 0, Skipped: 0
  Start time: 2017-06-04 18:37:20Z
    End time: 2017-06-04 18:37:29Z
    Duration: 9.131 seconds

Results (nunit3) saved as TestResult.xml

D:\code>NUnit.ConsoleRunner.3.6.1\tools\nunit3-console.exe test\Web\Tests.csproj
NUnit Console Runner 3.6.1
Copyright (C) 2017 Charlie Poole

Runtime Environment
   OS Version: Microsoft Windows NT 6.3.9600.0
  CLR Version: 4.0.30319.42000

Test Files
    test\Web\Tests.csproj


Run Settings
    BasePath: D:\code\test\Web\bin\
    DisposeRunners: True
    WorkDirectory: D:\code
    ImageRuntimeVersion: 4.0.30319
    ImageTargetFrameworkName: .NETFramework,Version=v4.6.1
    ImageRequiresX86: False
    ImageRequiresDefaultAppDomainAssemblyResolver: False
    NumberOfTestWorkers: 8

Test Run Summary
  Overall result: Passed
  Test Count: 175, Passed: 175, Failed: 0, Warnings: 0, Inconclusive: 0, Skipped: 0
  Start time: 2017-06-04 18:37:53Z
    End time: 2017-06-04 18:38:02Z
    Duration: 9.157 seconds

Results (nunit3) saved as TestResult.xml

D:\code>NUnit.ConsoleRunner.3.6.1\tools\nunit3-console.exe test\Web\Tests.csproj
NUnit Console Runner 3.6.1
Copyright (C) 2017 Charlie Poole

Runtime Environment
   OS Version: Microsoft Windows NT 6.3.9600.0
  CLR Version: 4.0.30319.42000

Test Files
    test\Web\Tests.csproj


Run Settings
    BasePath: D:\code\test\Web\bin\
    DisposeRunners: True
    WorkDirectory: D:\code
    ImageRuntimeVersion: 4.0.30319
    ImageTargetFrameworkName: .NETFramework,Version=v4.6.1
    ImageRequiresX86: False
    ImageRequiresDefaultAppDomainAssemblyResolver: False
    NumberOfTestWorkers: 8

Test Run Summary
  Overall result: Passed
  Test Count: 175, Passed: 175, Failed: 0, Warnings: 0, Inconclusive: 0, Skipped: 0
  Start time: 2017-06-04 18:43:47Z
    End time: 2017-06-04 18:43:50Z
    Duration: 2.685 seconds

Results (nunit3) saved as TestResult.xml

D:\code>

This is running on:

  • Server 2012R2
  • VS 2015 Pro Update 3 (14.0.25420.01 Update 3)
  • We do have ReSharper 2016.2 installed (it was mentioned extensively in Slow performance of nunit-console #480)
  • The test project targets .NET 4.6.1 against NUnit 3.6.1.0, run with NUnit Console 3.6.1.0

I'm at a loss as to how to explain or debug this further; happy to provide any info or run any tests you like.

@CharliePoole
Contributor

Running from the project file is somewhat unusual. That, combined with the fact that VS is involved, makes me suspicious. Can you see what happens if you just run the assembly?

@mintsoft
Author

mintsoft commented Jun 4, 2017

@CharliePoole That's a good point; however, I've just tried it and see the same behaviour:

Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.

C:\Users\Developer>cd /D D:\code

D:\code>NUnit.ConsoleRunner.3.6.1\tools\nunit3-console.exe D:\code\test\Web\bin\Tests.dll
NUnit Console Runner 3.6.1
Copyright (C) 2017 Charlie Poole

Runtime Environment
   OS Version: Microsoft Windows NT 6.3.9600.0
  CLR Version: 4.0.30319.42000

Test Files
    D:\code\test\Web\bin\Tests.dll


Run Settings
    DisposeRunners: True
    WorkDirectory: D:\code
    ImageRuntimeVersion: 4.0.30319
    ImageTargetFrameworkName: .NETFramework,Version=v4.6.1
    ImageRequiresX86: False
    ImageRequiresDefaultAppDomainAssemblyResolver: False
    NumberOfTestWorkers: 8

Test Run Summary
  Overall result: Passed
  Test Count: 175, Passed: 175, Failed: 0, Warnings: 0, Inconclusive: 0, Skipped: 0
  Start time: 2017-06-04 20:37:00Z
    End time: 2017-06-04 20:37:07Z
    Duration: 7.195 seconds

Results (nunit3) saved as TestResult.xml

D:\code>NUnit.ConsoleRunner.3.6.1\tools\nunit3-console.exe D:\code\test\Web\bin\Tests.dll
NUnit Console Runner 3.6.1
Copyright (C) 2017 Charlie Poole

Runtime Environment
   OS Version: Microsoft Windows NT 6.3.9600.0
  CLR Version: 4.0.30319.42000

Test Files
    D:\code\test\Web\bin\Tests.dll


Run Settings
    DisposeRunners: True
    WorkDirectory: D:\code
    ImageRuntimeVersion: 4.0.30319
    ImageTargetFrameworkName: .NETFramework,Version=v4.6.1
    ImageRequiresX86: False
    ImageRequiresDefaultAppDomainAssemblyResolver: False
    NumberOfTestWorkers: 8

Test Run Summary
  Overall result: Passed
  Test Count: 175, Passed: 175, Failed: 0, Warnings: 0, Inconclusive: 0, Skipped: 0
  Start time: 2017-06-04 20:38:17Z
    End time: 2017-06-04 20:38:20Z
    Duration: 2.540 seconds

Results (nunit3) saved as TestResult.xml

D:\code>

The first run (~7 seconds) was with VS closed, the second (~2 seconds) was with it open.

@CharliePoole
Contributor

Interesting. As developers, we are almost always running NUnit tests locally with VS running, so those are the times we see. I'll do some experiments.

@mintsoft
Author

mintsoft commented Jun 4, 2017

As developers, we are almost always running NUnit tests locally with VS running, so those are the times we see.

That's exactly what we found; it was only on CI that we noticed the run-time difference initially.

@CharliePoole
Contributor

I am unable to duplicate this so I'm marking it for somebody else to try to confirm.

When you did your experiments, did you try it with VS open but no solution loaded? I wonder if it could have to do with some assemblies being in memory already.

I'm assuming that neither your tests nor your SUT uses VS in any way - i.e. you are not loading the VS object model or anything like that.

@mintsoft
Author

mintsoft commented Jun 4, 2017

@CharliePoole Yeah, it was VS open with no projects loaded at all vs. not open at all. The tests are totally in-memory, standalone things; there's no relationship to VS in the tests at all.
What version of Windows are you using? I'm wondering if it's a Server vs Desktop runtime-style difference.

@CharliePoole
Contributor

I'm running on Windows 10.

@rprouse
Member

rprouse commented Jun 5, 2017

@mintsoft just a guess, but could you repeat the test with another .NET program running, even just a simple console app blocked in a Console.ReadLine()? It is a stretch, but I am curious whether it has nothing to do with Visual Studio as such, and is instead down to the fact that a .NET program is running and therefore the CLR is in memory.
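
Something along these lines would do; the file and class names below are just illustrative:

    // KeepClrAlive.cs - a trivial console app whose only job is to keep a .NET
    // process (and therefore the CLR) loaded while the tests run.
    // Build with: csc KeepClrAlive.cs
    using System;

    class KeepClrAlive
    {
        static void Main()
        {
            Console.WriteLine("CLR process running; press Enter to exit.");
            Console.ReadLine();   // block here for the duration of the test run
        }
    }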

Also, are you only able to reproduce this on your CI machine or are you also reproducing this on your development machines?

@mintsoft
Author

mintsoft commented Jun 5, 2017

@rprouse I've just tried that; the tests ran in ~9 seconds (that's the slow time) with a console application running in the background.

I can reproduce this on dev machines easily enough, even with a fresh install of Windows.

I've been experimenting with other applications being open; interestingly enough, it doesn't seem to be limited to Visual Studio. One that was particularly interesting is the ModernUI WPF example application (https://github.com/firstfloorsoftware/mui/tree/master/1.0/FirstFloor.ModernUI/FirstFloor.ModernUI.App). I compiled it, ran it, and found that if it's running when the tests are executed, they again run in about 2.5-3 seconds (the fast speed). Close it, run the tests again and I'm back to 8 seconds. I am very confused now.

@CharliePoole
Contributor

Maybe we are looking at this wrong. Could the problem be in how we report the time?

@mintsoft
Author

mintsoft commented Jun 5, 2017

Maybe we are looking at this wrong. Could the problem be in how we report the time?

@CharliePoole I don't think so; when I'm watching the runs, they physically take longer. I can do a video of the test if you want?

@mintsoft
Author

mintsoft commented Jun 5, 2017

I've done a quick gif showing it: 2017-06-05_23-08-34

There's nothing else open at all other than ShareX, cmd and nunit-console.

There are 3 runs that are slow (8-9 seconds); then I open the ModernUI app I've compiled (based on the one linked to earlier, but with stuff deleted from it) and it's down to <3 seconds. I close the ModernUI app again, and it's back to 8-9 seconds.

I'm currently deleting stuff from the ModernUI application to see if I can remove something that makes the effect go away. Also, debug vs release compilation of the ModernUI app seems to make no difference; it still makes the tests quicker.

@rprouse
Member

rprouse commented Jun 6, 2017

Looking at the pauses in the output, it looks like it is happening while starting up the remote agent. The actual running of the tests appears to be the same. Could you try running the tests with the --inprocess command line switch to see if it makes a difference and help narrow down the problem? I am thinking it is setting up remoting, networking or process launching.

@rprouse
Member

rprouse commented Jun 6, 2017

Sorry, I am wrong; the Run Settings section is output at the end of the test run. It could still be related to the remote agent though, so please try the --inprocess switch and let me know the results.
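
For reference, the invocation would look something like this (same assembly path as in your earlier runs):

D:\code>NUnit.ConsoleRunner.3.6.1\tools\nunit3-console.exe --inprocess D:\code\test\Web\bin\Tests.dll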

@mintsoft
Author

mintsoft commented Jun 6, 2017

@rprouse I've tested that anyway; --inprocess has no effect on either time. It's still ~3 seconds vs ~8 seconds.

@CharliePoole
Contributor

That's expected since the timing is done entirely within the framework code.

@mintsoft
Author

mintsoft commented Jun 6, 2017

@CharliePoole I've managed to reproduce it on my Windows 10 Pro laptop with basically the same behaviour: 11 seconds normally, 3 with the ModernUI application open. So it doesn't look like it's limited to Windows Server 2012 R2.

@mintsoft
Author

mintsoft commented Jun 6, 2017

I've also reproduced it on my personal desktop machine running Windows 10 Pro, although I had to reboot in order to get the "slow" time; after that I got the same behaviour as on 2012 R2 and my laptop.
My old Windows 7 laptop shows the same behaviour too.

@indy-singh
Contributor

indy-singh commented Jun 6, 2017

Hi All,

I can reproduce this behaviour. With the ModernUI application closed, tests take around 10-11 seconds to complete. With it open, tests take around 3-4 seconds.

My tests can be found here. I have attached the verbose output for the slow and fast runs.

slow.txt
fast.txt

*Edit: The above results are from my desktop PC running Windows 10 v1703. I have carried out the exact same scenario on my Surface Pro 3 (which is running Windows 8.1), and I get the same behaviour.

fast-surface.txt
slow-surface.txt

Cheers,
Indy

@mintsoft
Author

mintsoft commented Jun 6, 2017

@singh400 Do you have the output xml files from the runs?

@indy-singh
Contributor

@mintsoft These are from my desktop PC running Windows 10 v1703:-

Slow_TestResult.txt
Fast_TestResult.txt

Cheers,
Indy

@mintsoft
Author

mintsoft commented Jun 6, 2017

Thanks, it's interesting. Looking at a few tests, there doesn't seem to be anything obvious, except that the durations are longer for all the tests on the slower run :/ Strange.

@indy-singh
Contributor

@mintsoft I get the same behaviour if I use AnimationExamples instead of the ModernUI app.

Can you confirm?

Cheers,
Indy

@indy-singh
Contributor

Found a better example: AnimationTiming.

  1. Build AnimationTiming, then run the .exe and keep it open.

  2. Run the test suite. Test execution is slow.

  3. In AnimationTiming, click "Start Animations" under the "RepeatBehavior Example" tab.

  4. Run the test suite. Test execution is fast.

  5. In AnimationTiming, click "Stop" under the "RepeatBehavior Example" tab.

  6. Run the test suite. Test execution is slow.

Can anyone else confirm?

@mintsoft
Author

mintsoft commented Jun 6, 2017

@singh400 Yes, I can confirm that is exactly what I'm seeing. Half of those examples didn't compile for me; however, AnimationTiming behaves exactly as you stated.

This is really odd; it seems like maybe WPF animations are forcing the framework into some sort of high-resolution timing mode, which is having an effect on the way that NUnit's console runner schedules the test execution?

@indy-singh
Contributor

indy-singh commented Jun 6, 2017

@mintsoft To be honest, I have no idea. I think it is GPU/GFX related (see edit). I just played about with stopping/starting this video. I initially picked the 8K version in the hope it would make the tests fast. It didn't. I picked 144p, and I see the slow/fast behaviour as I stop/start the video. I embedded the video in a simple HTML file to ensure nothing else was having an effect.

I think what is happening is that the WPF apps hint/invoke the GPU (see edit), as does playing videos in Chrome. That then has a side effect in NUnit that makes the tests go fast. My knowledge beyond this point is pretty sketchy, so I can only guess and begin to make assumptions.

*Edit: Chrome and WPF change the Windows timer resolution. The next three posts contain more info.

@mintsoft
Author

mintsoft commented Jun 7, 2017

I think what is happening is that the WPF apps hint/invoke the GPU, as does playing videos in Chrome. That then has a side effect in NUnit that makes the tests go fast. My knowledge beyond this point is pretty sketchy, so I can only guess and begin to make assumptions.

Well indeed, this all feels like guesswork; however, I was looking at how NUnit schedules the test runs:
https://github.com/nunit/nunit-console/blob/master/src/NUnitEngine/nunit.engine/Runners/AggregatingTestRunner.cs#L160

I can't see anywhere that LevelOfParallelism is actually set to anything other than 1.

@appel1

appel1 commented Jun 7, 2017

Sounds like this could be related to changes in the Windows timer resolution.

@indy-singh
Contributor

@appel1 So I found this blog post. Using the ClockRes utility, I was able to confirm a change in the "Current timer interval" when starting and stopping the animations in AnimationTiming, which matches the fast and slow behaviour. I also checked while watching a video in Chrome: the same changes to the timer interval.
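
For anyone who wants to check this without ClockRes, the same value can be read with a few lines of C# via ntdll's NtQueryTimerResolution (undocumented but long-standing; it is what ClockRes reports). A rough sketch:

    // TimerResolution.cs - prints the current Windows timer interval, i.e. the
    // value ClockRes shows as "Current timer interval". All values are returned
    // in 100-nanosecond units.
    using System;
    using System.Runtime.InteropServices;

    class TimerResolution
    {
        [DllImport("ntdll.dll")]
        static extern int NtQueryTimerResolution(out uint minimum, out uint maximum, out uint current);

        static void Main()
        {
            uint min, max, cur;
            NtQueryTimerResolution(out min, out max, out cur);
            Console.WriteLine("Coarsest supported: {0:F2} ms", min / 10000.0);  // typically ~15.6 ms
            Console.WriteLine("Finest supported:   {0:F2} ms", max / 10000.0);  // typically ~0.5-1 ms
            Console.WriteLine("Current interval:   {0:F2} ms", cur / 10000.0);
        }
    }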

@oznetmaster
Contributor

oznetmaster commented Jun 8, 2017

We changed to Sleep(1) from Sleep(0) because Microsoft issued a document that basically stated that Sleep(0) should never be used!

It had nothing to do with the CF build.

I think I referenced the document in the original issue. I have not searched for it, but it should be there.

I think that Thread.Yield() could be used instead, but there were issues with that as well (besides the fact that it does not exist in CF).

@mintsoft
Author

mintsoft commented Jun 9, 2017

I think that Thread.Yield() could be used instead, but there were issues with that as well (besides the fact that it does not exist in CF).

IIRC, Thread.Yield will only let threads that are already scheduled for execution on the yielding core be time-sliced in, not a thread from another core; i.e. it doesn't force a context switch.
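
For what it's worth, the timer-resolution theory fits the Sleep(1) change: Sleep(1) only guarantees a sleep of at least 1 ms, and the thread is actually woken on the next clock interrupt, so the real delay tracks the current timer interval (~15.6 ms by default, ~1 ms once something has called timeBeginPeriod(1)). A standalone sketch that makes this visible:

    // SleepResolutionDemo.cs - shows how the effective length of Thread.Sleep(1)
    // depends on the current Windows timer resolution.
    using System;
    using System.Diagnostics;
    using System.Threading;

    class SleepResolutionDemo
    {
        static void Main()
        {
            const int iterations = 100;
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                Thread.Sleep(1);   // wakes on the next clock interrupt, not after exactly 1 ms
            sw.Stop();

            // Roughly 15-16 ms per iteration at the default interval; 1-2 ms while a
            // WPF animation or a Chrome video has raised the timer resolution.
            Console.WriteLine("Average Sleep(1): {0:F2} ms",
                sw.Elapsed.TotalMilliseconds / iterations);
        }
    }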

@appel1

appel1 commented Jun 9, 2017

Is there a reason for the event pump thread to run at the highest priority? The commit comment saying that it fixes a deadlock seems odd to me. It's always tricky when you mess with thread priorities.

@CharliePoole
Contributor

@appel1 Where is this coming from? The event pump thread should not have highest priority at all. It makes blocking calls to unknown event handlers.

@CharliePoole
Contributor

Wow! That has been in there since NUnit 2. Back then, it made a bit more sense. We had control over the event handlers and knew what they did. And using a higher priority made it possible for Sleep(0) to work.

@appel1

appel1 commented Jun 9, 2017

I'm most likely missing something, but I can't observe any issues using nunit-console, nunit-gui or R# if I remove the thread priority change and the sleeps in Dequeue, even when running the tests in-process.

What problem is that code trying to solve? I don't have access to any single-core, non-HT machines to test on, so maybe that is it.

        public void Enqueue(Event e)
        {
            // Add to the collection
            _queue.Enqueue(e);

            // Wake up threads that may have been sleeping
            _mreAdd.Set();
        }

        public Event Dequeue(bool blockWhenEmpty)
        {
            do
            {
                // Dequeue our work item
                Event e;
                if (_queue.TryDequeue(out e))
                {
                    return e;
                }

                if (!blockWhenEmpty || Interlocked.CompareExchange(ref _stopped, 0, 0) != 0)
                    return null;
                
                _mreAdd.Wait(-1);

                if (Interlocked.CompareExchange(ref _stopped, 0, 0) != 0)
                    return null;

            } while (true);
        }

@CharliePoole
Contributor

My inclination is to drop the priority-setting and the Sleep and see what happens over the next month. We can adjust before a release if we have to.

@indy-singh
Contributor

Are we any closer to choosing a solution (#2233 or #2236)? It has been 18 days since the issue was raised, and 14 days since the regression was identified in commit 9086990.

It reads like the majority of people prefer #2236, in which case I can close #2233 and we can just focus on getting #2236 into the next release.

@CharliePoole
Contributor

The rule is that it requires two committers to approve a merge to master. Neither of these has that yet. Usually, the PR submitter simply makes some changes to get the PR approved, but in this case we appear to have devolved into a more general discussion. 😞

Since we have two PRs, I'll summarize here...

#2233 Simply changes the Sleep period. It had one review from @jnm2 who requested changes.

  • He wanted an explanation of why the Sleep was there in the first place, which I think I answered.
  • He wanted a test, which I'm not sure how to write. In any case, the tests of the event queue are fairly extensive.

#2236 Drops the Sleep entirely and eliminates the high priority, which is what I asked for in one of my comments. I think that's the easiest solution. However, it goes on to make a bunch of other changes that don't seem related to this problem. This has one review from @rprouse and comments from me.

  • @rprouse Approved this for merge.
  • I declined to review but made some comments. I'm concerned that the "extra" changes in this PR may break what was fixed a year ago... a general lack of responsiveness in the GUI. Frankly, I would not have known how to do what @oznetmaster did back then, so I don't want to lose it without a good reason. I do wish he had left us more comments though! Since @oznetmaster is not available, I'll go back and do one.

I'm going to add reviews to each PR. My inclination is to use #2233 right away and hold #2236 for more discussion. @oznetmaster is changing continents right now and will eventually become available again. Merging #2233 will make the change available on our MyGet feed for anyone who wants an immediate fix.

@CharliePoole
Contributor

I notice that this is still marked as needing confirmation, although it was confirmed a while back. I'll fix that.

@rprouse
Member

rprouse commented Jun 22, 2017

Since this is fixed by #2233, should we close this issue, or should we keep it open as we discuss #2236?

@CharliePoole
Contributor

I'd keep it open, since it has the most thorough discussion. It started out as a kind of puzzle and it was only gradually turned into a workable bug by folks on the thread.

@rprouse
Member

rprouse commented Aug 22, 2017

@jskeet has confirmed that #2233 has resolved his issue as reported in nunit/dotnet-test-nunit#109. #2236 and this issue have been quiet; should we close?

@CharliePoole
Contributor

I think that the merge of #2233 was intended to close this.

@mintsoft
Author

I think, judging by #2217 (comment), you wanted to keep it open until a decision is made on #2236.

@CharliePoole
Contributor

Ah... too many issues, too many comments. 😄

I'd still say close it. The references from the other issue will still be there.

@mintsoft
Author

That's fine with me

@mintsoft
Author

Fixed by @singh400 in #2233

@jnm2
Contributor

jnm2 commented Nov 1, 2018

By reading @indy-singh's great writeup at https://medium.com/@indy_singh/fixing-nunit-slow-test-execution-a67c284355c4, I became aware that JetBrains opted not to use Sleep(0) because it's not guaranteed to yield, as we've discussed in several threads. Instead, they use timeBeginPeriod/timeEndPeriod to get Windows to switch to a more accurate resolution. (See https://youtrack.jetbrains.com/issue/PROF-715#focus=streamItem-27-2769768-0-0).
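
For completeness, the JetBrains-style approach boils down to a pair of winmm.dll P/Invokes around whatever needs fine-grained waits. A minimal sketch (not NUnit's actual code; RunTests below is a hypothetical placeholder):

    // HighResTimerScope.cs - requests a 1 ms Windows timer resolution for the
    // lifetime of the scope, using the same mechanism WPF animations and Chrome
    // rely on (winmm's timeBeginPeriod/timeEndPeriod).
    using System;
    using System.Runtime.InteropServices;

    sealed class HighResTimerScope : IDisposable
    {
        [DllImport("winmm.dll")]
        static extern uint timeBeginPeriod(uint uMilliseconds);

        [DllImport("winmm.dll")]
        static extern uint timeEndPeriod(uint uMilliseconds);

        public HighResTimerScope() { timeBeginPeriod(1); }

        public void Dispose() { timeEndPeriod(1); }   // each begin call must be paired with an end call
    }

    // Usage:
    // using (new HighResTimerScope())
    // {
    //     RunTests();   // hypothetical placeholder for the work needing fast Sleep(1) wake-ups
    // }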
