-
Notifications
You must be signed in to change notification settings - Fork 729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing regression that caused NUnit to be sensitive to windows timer resolution #2233
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we fix this I'd like to have a good idea of why it was important. Thread.Sleep(1)
forces a context switch and Thread.Sleep(0)
does not force a context switch. We may be losing the ability for the event pump to process an event which has complex side effects on something else which wasn't documented here when the code was first written. The original change from 0
to 1
should have come with a test demonstrating the necessity, but I think we should at least try to figure it out and add that test now if possible. There may be a better way to solve the original problem than either Thread.Sleep(1)
or Thread.Sleep(0)
.
I'd also love if you could write a timing test which fails before the change and passes after the change. That will help keep us from getting into the very same kind of scenario again.
@jnm2 You're right... that's why we did it. Events were not being sent, which was quite noticeable in the GUI. A better way that doesn't wait 15ms would be a wonderful thing. 😄 |
Hmm, I think that might be quite hard given that VS sets the timer interval to 1; the tests would pass when executed in VS. However when ran on a machine with nothing running, they would fail. I guess one option would be to reset the timer interval to default in the setup of the test, it does mean calling an unmanaged, undocumented kernel function in ntdll though, probably not portable to mono! |
IMO, such a test is not a unit test. If we had one it should be elsewhere. |
@jnm2 Were you about to offer an alternative to Sleep? |
Yes, I see your point. Next best thing: can you please document in a comment above the line exactly why it should not be |
I'm still interested in getting a unit test of the original problem, if that's possible.
Yes, but I want to understand the problem. Now that you said GUI, I only have more questions though.
|
I just refreshed my memory by doing a quick search. Bottom line is that many articles say your best bet is to use Sleep(1) to avoid starving the consumer thread in a situation like this. That's clearly why we made the change, as @oznetmaster points out. I'm inclined to accept the change back to Sleep(0), which we had clearly been using for a long time before we changed it. It may be that Windows load balancing takes care of the problem for us automatically. We could (later) even experiment with eliminating the sleep entirely. Since we just did a release, we do have some time to play with this change in master. |
I'm a little suspicious here since the Travis failure is related to timing. |
No, this is many threads away from any gui. We're in the framework here. The point is only that these events are consumed by runners, including any gui, which use them to show progress. If the event pump thread is starved of time, then it appears to the runner that nothing is happening in the tests. Of course, all of this was invented back in the world of single-core machines. In most cases today, we have multiple threads running at the same time, not just pretending. However, we may occasionally have to run on a single processor even today. |
One option, would be to only |
On a single core machine, if any of the tests do blocking I/O, then parallel test performance can be significant. |
That's true, however my point is that it doesn't feel like a use case which should be prioritised? Plus if the IO is stalled, would you notice a 15ms pause in that execution? |
@singh400 Have you looked at the failing test? |
@CharliePoole Can not get them to fail on my machine. |
Failure is only on linux and only on the .NET 2.0 build. Is anyone else able to duplicate it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot to come back to this. Comments documenting the reason 1
is bad were all I was waiting for.
Travis failure is an unrelated intermittent error. Restarted. |
I'm approving this change. I would actually prefer to see the Sleep removed entirely and the high priority setting on the event pump thread removed. However, this will improve performance and we can change it further later. @rprouse I don't see this as a hotfix item myself. The problem seems to have existed since 3.4.0. Of course, if you do a hotfix for another reason, this or a follow-on larger fix should go in it. |
Thanks for this @singh400 |
Thanks for merging this PR @CharliePoole . How does one match up the correct build on MyGet? I presume this PR was automatically built and pushed to MyGet? MyGet currently lists (https://www.myget.org/feed/nunit/package/nuget/NUnit) the following builds that occurred today:-
The AppVeyor out suggests that this PR was built as Cheers, |
Those builds on appveyor with "ci" are all prior to merge. The build after merge is "dev", this one being https://ci.appveyor.com/project/CharliePoole/nunit/build/3.8.0-dev-04110 |
@CharliePoole Is there a release schedule for the next version of NUnit? I'm wondering when we can expect to upgrade to NUnit 3 again |
@rprouse That's a question for you. 😄 |
@mintsoft we try to release quarterly and the next release is scheduled for end-Aug, https://github.com/nunit/nunit/milestone/30 |
@rprouse I'd hope that a regression of this type would be severe enough to issue a hot fix (v3.7.2 maybe?). |
@rprouse Bump - I'd like an answer to whether or not a hot-fix will be released. Currently v3.7.1 is having an non-trivial impact on our CI server. I'd rather not move to a semi-official release from MyGet if that can be helped. |
Same here @singh400 I'd like to upgrade to NUnit3 again 👍 |
Are there features in 3.7 that you need that would prevent you from using an earlier 3.x release that doesn't have this problem? 3.8 is due out next month. I might be willing to ship a bit early, but I would prefer not to release a hot fix this far out. |
Personally, no there aren't; I just don't really want to incur the pain of upgrading platform wide more than I have to. |
No. Just that I was hoping that once the regression was identified and fixed that a hotfix would be released without delay. Especially since I proved that this bug has the potential to impact CI flow by adding minutes to test execution. But given that the next scheduled release is due at the end of this month, we might as well wait for it. |
Related: #3019 |
Issue #2217 covers this regression in detail.