Revert epoll changes #9561
This reverts commit 8739886.
…e various problems in different testsuites.

Motivation:
Changes that were made to the EpollEventLoop to optimize some things broke some test suites and caused timeouts. We need to investigate why this is the case, but for now we should just revert so we can do a release.

Modifications:
- Partly revert 1fa7a5e and a22d4ba

Result:
Test suites pass again.
:-/ Could you share any more details of the failures/symptoms? Can I help?
@njhill I will once I am able to cut a release... I still need to better understand what exactly is wrong and need to come up with a standalone test case.
Basically, the symptom is some "stalls" that result in timeouts at the end.
@normanmaurer I noticed #9535 is still listed in the 4.1.41 release notes even though it was reverted.
@njhill thanks, fixed... should be visible soon.
The three changes reverted were essentially independent. Hopefully the problem can be isolated to just one of them, and we could then start by reinstating the other two...
@njhill yes, I will hopefully work on this today. That said, I had different failures, and all of them only disappeared when I reverted all of these changes. This will be fun to debug 😭
Thanks @normanmaurer, and please let me know if I can help; it seems likely that it's my fault one way or another! I'd be inclined to try #9397 by itself first, since I would guess that one is least likely to be the problem (but I could very well be wrong).
Please let me gently disagree :)
@normanmaurer looking again at the changes in question, I have an idea what the problem might be. I suspect now that the use of […] Avoiding missed wakeups or timers via the interleaving of reads/writes to these variables and the task queue by the event loop and task-submitting threads depends on global ordering, and full volatile writes for all of them are required to ensure this. Things probably work fine on x86 prior to certain JIT stages kicking in, which I guess might make it harder to repro, but I can have a go at writing a stress test to trigger it. Here is a commit with this small fix, based on the version prior to your reversions: 64cfc8f. I'm really hoping that this is the (whole) problem and that we could reinstate all of those changes after verifying! cc @franz1981
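
To make the ordering argument above concrete, here is a minimal sketch of the general pattern, not Netty's `EpollEventLoop` code. The class `WakeupSketch`, the `sleeping` flag, and the use of `LockSupport.park` in place of an epoll wait are all hypothetical stand-ins. It assumes the kind of weaker-than-volatile write at issue behaves like `AtomicBoolean.lazySet`, which the comment above does not confirm. The loop thread publishes "I am about to sleep" and then re-checks the queue; the submitting thread enqueues and then checks the flag. Both sides need the store to be globally ordered before the following load, which a full volatile write provides and an ordered (lazy) store does not.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Hypothetical illustration of the wakeup handshake; not Netty code.
final class WakeupSketch {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean sleeping = new AtomicBoolean();
    private final Thread loopThread = new Thread(this::runLoop, "sketch-loop");
    private volatile boolean running = true;

    void start() {
        loopThread.start();
    }

    // Called from task-submitting threads.
    void submit(Runnable task) {
        tasks.add(task);              // 1. publish the task
        if (sleeping.get()) {         // 2. then check whether the loop may be asleep
            LockSupport.unpark(loopThread);
        }
    }

    void shutdown() throws InterruptedException {
        running = false;
        LockSupport.unpark(loopThread);
        loopThread.join();
    }

    private void runLoop() {
        while (running) {
            Runnable task;
            while ((task = tasks.poll()) != null) {
                task.run();
            }
            // Full volatile write: it must be ordered before the isEmpty() read below.
            // With sleeping.lazySet(true) the read could effectively happen first,
            // so a submitter might see sleeping == false while the loop sees an
            // empty queue, and the wakeup would be missed.
            sleeping.set(true);
            if (tasks.isEmpty() && running) {
                LockSupport.park(this); // stand-in for a blocking epoll wait
            }
            sleeping.set(false);
        }
    }

    // Runs n trivial tasks through the loop and returns how many executed.
    static int demo(int n) {
        WakeupSketch loop = new WakeupSketch();
        AtomicInteger ran = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(n);
        loop.start();
        try {
            for (int i = 0; i < n; i++) {
                loop.submit(() -> {
                    ran.incrementAndGet();
                    done.countDown();
                });
            }
            done.await(10, TimeUnit.SECONDS);
            loop.shutdown();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ran.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(10_000));
    }
}
```

Swapping `sleeping.set(true)` for `sleeping.lazySet(true)` in this sketch would not necessarily fail on any given run, which is consistent with the remark above that such a problem can be hard to reproduce; a dedicated stress test would be needed to trigger it.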