Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert epoll changes #9561

Merged
merged 3 commits into from
Sep 12, 2019
Merged

Revert epoll changes #9561

merged 3 commits into from
Sep 12, 2019

Conversation

normanmaurer
Copy link
Member

No description provided.

…e various problems in different testsuites.

Motivation:

Changes that were done to the EpollEventLoop to optimize some things did break some testsuite and caused timeouts. We need to investigate to see why this is the case but for
now we should just revert so we can do a release.

Modifivations:

- Partly revert 1fa7a5e and a22d4ba

Result:

Testsuites pass again.
@normanmaurer normanmaurer added this to the 4.1.41.Final milestone Sep 12, 2019
@normanmaurer normanmaurer merged commit 7f39142 into 4.1 Sep 12, 2019
@normanmaurer normanmaurer deleted the revert_epoll branch September 12, 2019 10:54
@njhill
Copy link
Member

njhill commented Sep 12, 2019

:-/ Could you share any more details of failures/symptoms? Can I help?

@normanmaurer
Copy link
Member Author

@njhill I will one I was able to cut a release... I still need to better understand what exactly is wrong and need to comeup with a standalone testcase.

@normanmaurer
Copy link
Member Author

basically the symptom is some "stales" that result in timeouts at the end.

@njhill
Copy link
Member

njhill commented Sep 12, 2019

@normanmaurer I noticed #9535 is still listed in the 4.1.41 release notes even though it was reverted

@normanmaurer
Copy link
Member Author

@njhill thanks fixed... should be visible soon.

@njhill
Copy link
Member

njhill commented Sep 13, 2019

The three changes reverted were essentially independent, hopefully the problem can be isolated to just one of them and we could then start by reinstating the other two...

@normanmaurer
Copy link
Member Author

@njhill yes will work on this today hopefully. That said I had different failures and all of them only disappeared when I reverted all these changes. This will be fun to debug 😭

@njhill
Copy link
Member

njhill commented Sep 13, 2019

Thanks @normanmaurer and please let me know if I can help, seems likely that it's my fault one way or another!

I'd be inclined to try #9397 by itself first, would guess that one is least likely to be the problem (but could very well be wrong).

@franz1981
Copy link
Contributor

franz1981 commented Sep 13, 2019

@njhill

seems likely that it's my fault one way or another!

Please let me gently disagree :)
We have code reviewed them and you are bringing innovations in many parts of netty where I see that people (including me) often just accept the code as it is: innovation has its risks and software has bugs, by definition. Thanks for the hard work, instead :)

@njhill
Copy link
Member

njhill commented Sep 19, 2019

@normanmaurer looking again at the changes in question I have an idea what the problem might be. I suspect now that the use of lazySet is bad when resetting wakenUp and nextDeadlineNanos in the event loop prior to entering epollWait, and my justification for it (that you rightly asked me to add) in the adjacent comment is misguided.

Avoiding missed wakeups or timers via the interleaving of reads/writes to these variables and the task queue by the event loop and task-submitting threads depends on global ordering, and full volatile writes for all of them are required to ensure this.

Things probably work fine on x86 prior to certain JIT stages kicking in, which I guess might make it harder to repro, but I can have a go at writing a stress test to trigger.

Here is a commit of this small fix based on the version prior to your reversions: 64cfc8f.

Really hoping that this is the (whole) problem and that we could reinstate all of those changes after verifying!

cc @franz1981

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants