Fixing regression that caused NUnit to be sensitive to windows timer resolution #2233

indy-singh · 2017-06-08T20:18:32Z

Issue #2217 covers this regression in detail.

jnm2

Before we fix this I'd like to have a good idea of why it was important. Thread.Sleep(1) forces a context switch and Thread.Sleep(0) does not force a context switch. We may be losing the ability for the event pump to process an event which has complex side effects on something else which wasn't documented here when the code was first written. The original change from 0 to 1 should have come with a test demonstrating the necessity, but I think we should at least try to figure it out and add that test now if possible. There may be a better way to solve the original problem than either Thread.Sleep(1) or Thread.Sleep(0).

I'd also love if you could write a timing test which fails before the change and passes after the change. That will help keep us from getting into the very same kind of scenario again.
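To make the 0-versus-1 difference concrete, here is a minimal measurement sketch. It is not code from this PR; `SleepCost` and `AverageMs` are hypothetical names chosen for illustration. It only times repeated `Thread.Sleep` calls with a `Stopwatch`, so the numbers depend entirely on the machine and its current timer resolution.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper for illustration; not part of NUnit.
public static class SleepCost
{
    // Returns the average wall-clock cost, in milliseconds, of a single
    // Thread.Sleep(milliseconds) call, measured over `iterations` calls.
    public static double AverageMs(int milliseconds, int iterations)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException(nameof(iterations));

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Thread.Sleep(milliseconds);
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Comparing `SleepCost.AverageMs(0, 100)` with `SleepCost.AverageMs(1, 100)` on an otherwise idle Windows machine should show the gap discussed in this thread (roughly 15 ms per `Sleep(1)` call at the default timer interval), while the same comparison run inside Visual Studio may show a much smaller gap. Neither result is guaranteed, which is why this is a diagnostic rather than a pass/fail test.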

CharliePoole · 2017-06-08T21:32:09Z

@jnm2 You're right... that's why we did it. Events were not being sent, which was quite noticeable in the GUI. A better way that doesn't wait 15ms would be a wonderful thing. 😄

mintsoft · 2017-06-08T22:15:36Z

> I'd also love if you could write a timing test which fails before the change and passes after the change. That will help keep us from getting into the very same kind of scenario again.

Hmm, I think that might be quite hard given that VS sets the timer interval to 1; the tests would pass when executed in VS. However, when run on a machine with nothing else running, they would fail.

I guess one option would be to reset the timer interval to default in the setup of the test, it does mean calling an unmanaged, undocumented kernel function in ntdll though, probably not portable to mono!

CharliePoole · 2017-06-08T23:08:52Z

IMO, such a test is not a unit test. If we had one it should be elsewhere.

CharliePoole · 2017-06-09T01:39:50Z

@jnm2 Were you about to offer an alternative to Sleep?

jnm2 · 2017-06-09T01:49:50Z

@mintsoft

> I guess one option would be to reset the timer interval to default in the setup of the test, it does mean calling an unmanaged, undocumented kernel function in ntdll though, probably not portable to mono!

Yes, I see your point. Next best thing: can you please document in a comment above the line exactly why it should not be Thread.Sleep(1)? We should have that, no matter what fix we land on.

jnm2 · 2017-06-09T01:55:06Z

@CharliePoole

I'm still interested in getting a unit test of the original problem, if that's possible.

> Were you about to offer an alternative to Sleep?

Yes, but I want to understand the problem. Now that you said GUI, I only have more questions though.

Thread.Sleep and this event pump were not being run on a GUI thread, were they? I believe Thread.Sleep only pumps a few whitelisted messages. If this code was causing the GUI to lock up, it's a perfect candidate for moving to a background thread.
(If Thread.Sleep was being used as a poor man's Application.DoEvents to synchronously pump messages and then continue, all the more reason to move it to a background thread. DoEvents is dangerous because it causes reentry in GUI logic. I can tell many war stories.)

CharliePoole · 2017-06-09T01:58:57Z

I just refreshed my memory by doing a quick search. Bottom line is that many articles say your best bet is to use Sleep(1) to avoid starving the consumer thread in a situation like this. That's clearly why we made the change, as @oznetmaster points out.

I'm inclined to accept the change back to Sleep(0), which we had clearly been using for a long time before we changed it. It may be that Windows load balancing takes care of the problem for us automatically. We could (later) even experiment with eliminating the sleep entirely. Since we just did a release, we do have some time to play with this change in master.
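For the "eliminate the sleep entirely" idea, one possible shape is a pump thread that blocks on its queue instead of polling it. The sketch below is an assumption-laden illustration, not NUnit's event pump: `BlockingPump<T>` is a hypothetical name, and it relies on `BlockingCollection<T>`, which requires .NET 4.0 or later and so would not fit the .NET 2.0 build mentioned elsewhere in this thread without a polyfill.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch of a pump that never calls Thread.Sleep.
// It is not NUnit's implementation.
public sealed class BlockingPump<T> : IDisposable
{
    private readonly BlockingCollection<T> _queue = new BlockingCollection<T>();
    private readonly Thread _thread;
    private bool _disposed;

    public BlockingPump(Action<T> handler)
    {
        if (handler == null)
            throw new ArgumentNullException(nameof(handler));

        _thread = new Thread(() =>
        {
            // GetConsumingEnumerable blocks while the queue is empty and
            // finishes once CompleteAdding has been called and the queue
            // has been drained, so no polling delay is needed.
            foreach (T item in _queue.GetConsumingEnumerable())
            {
                handler(item);
            }
        });
        _thread.IsBackground = true;
        _thread.Start();
    }

    // Called by producers; wakes the pump thread if it is waiting.
    public void Enqueue(T item)
    {
        _queue.Add(item);
    }

    // Stops accepting events, then waits for queued events to be delivered.
    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;

        _queue.CompleteAdding();
        _thread.Join();
        _queue.Dispose();
    }
}
```

Because the consumer waits on the collection rather than spinning, it uses no CPU while idle and is woken as soon as an event arrives, so the question of `Sleep(0)` versus `Sleep(1)` does not come up. Whether this addresses the original starvation concern on a single core would still need to be checked.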

CharliePoole · 2017-06-09T02:03:27Z

I'm a little suspicious here since the Travis failure is related to timing.

CharliePoole · 2017-06-09T04:03:32Z

No, this is many threads away from any gui. We're in the framework here. The point is only that these events are consumed by runners, including any gui, which use them to show progress. If the event pump thread is starved of time, then it appears to the runner that nothing is happening in the tests.

Of course, all of this was invented back in the world of single-core machines. In most cases today, we have multiple threads running at the same time, not just pretending. However, we may occasionally have to run on a single processor even today.

mintsoft · 2017-06-09T07:27:07Z

> Of course, all of this was invented back in the world of single-core machines. In most cases today, we have multiple threads running at the same time, not just pretending. However, we may occasionally have to run on a single processor even today.

One option would be to only Sleep(1) on a single core machine? If you're using a single core machine, you can kiss parallel test performance goodbye, so the 15ms pause is probably tolerable; the moment you have more than 1 core, it becomes irrelevant.
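The single-core suggestion above could look something like the following. This is only a sketch of the idea as stated in the comment, not code from the PR; `PumpYield` and its members are hypothetical names, and it assumes `Environment.ProcessorCount` is an acceptable proxy for "single core machine".

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration; not part of NUnit.
public static class PumpYield
{
    // Picks the sleep argument from the processor count:
    // 1 ms on a single processor, 0 ms when more than one is available.
    public static int SleepMillisecondsFor(int processorCount)
    {
        if (processorCount < 1)
            throw new ArgumentOutOfRangeException(nameof(processorCount));

        return processorCount == 1 ? 1 : 0;
    }

    // Yields using the value chosen for the current machine.
    public static void Yield()
    {
        Thread.Sleep(SleepMillisecondsFor(Environment.ProcessorCount));
    }
}
```

Keeping the decision in a separate method that takes the count as a parameter makes the rule easy to unit test without needing a single-core machine.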

oznetmaster · 2017-06-09T07:31:57Z

On a single core machine, if any of the tests do blocking I/O, then parallel test performance can be significant.

mintsoft · 2017-06-09T10:16:36Z

> On a single core machine, if any of the tests do blocking I/O, then parallel test performance can be significant.

That's true; however, my point is that it doesn't feel like a use case which should be prioritised. Plus, if the IO is stalled, would you notice a 15ms pause in that execution?

CharliePoole · 2017-06-09T11:42:40Z

@singh400 Have you looked at the failing test?

indy-singh · 2017-06-09T19:32:58Z

@CharliePoole I cannot get them to fail on my machine.

CharliePoole · 2017-06-09T19:54:52Z

Failure is only on Linux and only on the .NET 2.0 build. Is anyone else able to duplicate it?

jnm2

Sorry, I forgot to come back to this. Comments documenting the reason 1 is bad were all I was waiting for.

CharliePoole · 2017-06-22T18:47:10Z

Travis failure is an unrelated intermittent error. Restarted.

CharliePoole · 2017-06-22T19:15:15Z

I'm approving this change. I would actually prefer to see the Sleep removed entirely and the high priority setting on the event pump thread removed. However, this will improve performance and we can change it further later.

@rprouse I don't see this as a hotfix item myself. The problem seems to have existed since 3.4.0. Of course, if you do a hotfix for another reason, this or a follow-on larger fix should go in it.

mintsoft · 2017-06-22T21:14:35Z

Thanks for this @singh400

indy-singh · 2017-06-22T21:25:57Z

Thanks for merging this PR @CharliePoole . How does one match up the correct build on MyGet? I presume this PR was automatically built and pushed to MyGet?

MyGet currently lists (https://www.myget.org/feed/nunit/package/nuget/NUnit) the following builds that occurred today:-

3.8.0-dev-04109
3.8.0-dev-04110

The AppVeyor output suggests that this PR was built as NUnit.3.8.0-ci-04074-pr-2233, but I cannot locate that on MyGet.

Cheers,
Indy

CharliePoole · 2017-06-22T21:34:26Z

Those builds on appveyor with "ci" are all prior to merge. The build after merge is "dev", this one being https://ci.appveyor.com/project/CharliePoole/nunit/build/3.8.0-dev-04110

mintsoft · 2017-07-07T20:50:48Z

@CharliePoole Is there a release schedule for the next version of NUnit? I'm wondering when we can expect to upgrade to NUnit 3 again

CharliePoole · 2017-07-07T22:08:05Z

@rprouse That's a question for you. 😄

rprouse · 2017-07-11T00:19:00Z

@mintsoft we try to release quarterly and the next release is scheduled for end-Aug, https://github.com/nunit/nunit/milestone/30

indy-singh · 2017-07-11T21:01:35Z

@rprouse I'd hope that a regression of this type would be severe enough to issue a hot fix (v3.7.2 maybe?).

indy-singh · 2017-07-23T23:21:03Z

@rprouse Bump - I'd like an answer to whether or not a hot-fix will be released. Currently v3.7.1 is having a non-trivial impact on our CI server. I'd rather not move to a semi-official release from MyGet if that can be helped.

mintsoft · 2017-07-24T08:53:16Z

Same here @singh400 I'd like to upgrade to NUnit3 again 👍

rprouse · 2017-07-24T11:25:53Z

Are there features in 3.7 that you need that would prevent you from using an earlier 3.x release that doesn't have this problem?

3.8 is due out next month. I might be willing to ship a bit early, but I would prefer not to release a hot fix this far out.

mintsoft · 2017-08-15T07:00:40Z

> Are there features in 3.7 that you need that would prevent you from using an earlier 3.x release that doesn't have this problem?

Personally, no there aren't; I just don't really want to incur the pain of upgrading platform wide more than I have to.

indy-singh · 2017-08-17T21:31:40Z

> Are there features in 3.7 that you need that would prevent you from using an earlier 3.x release that doesn't have this problem?

No. I was just hoping that, once the regression was identified and fixed, a hotfix would be released without delay, especially since I showed that this bug has the potential to impact CI flow by adding minutes to test execution.

But given that the next scheduled release is due at the end of this month, we might as well wait for it.

jnm2 · 2018-09-14T01:30:20Z

Related: #3019

See issue nunit#2217

e26e895

jnm2 requested changes Jun 8, 2017

Added comments explain change.

43b83c4

mintsoft mentioned this pull request Jun 18, 2017

Simplified event queue for discussion relating to #2217 #2236

Closed

indy-singh mentioned this pull request Jun 22, 2017

Console runner performance varies wildly depending on environmental characteristics #2217

Closed

jnm2 approved these changes Jun 22, 2017

CharliePoole merged commit 7bdec91 into nunit:master Jun 22, 2017

ChrisMaddock mentioned this pull request Aug 21, 2017

Massive performance discrepancy between net451 and netcoreapp1.0 with large parameterization nunit/dotnet-test-nunit#109

Closed

jnm2 mentioned this pull request Sep 14, 2018

Attempt to fix hangs by using Sleep(1) instead of Sleep(0) for the polyfill's replacement of Thread.Yield #3019

Merged
