Startup of insiders is hung when opening anything #142786
Comments
I hit this once too. That warning message implies that loading extensions is stuck, cc @deepak1556 in case this is related to the electron update? |
Another occurrence in #142766 |
@bpasero @deepak1556 Do we plan to ship Electron 16 on stable? This might be a blocker. |
@alexdima yup, the current plan is to ship this milestone. Maybe we can add some additional logging to help narrow down where in the extension load code path things are getting stuck; this can help if we get similar reports once shipped to stable. So far, everyone who encountered this issue has seen only a single occurrence, so there is not much progress we could make on diagnosing it. I didn't want to block the update because of that. Also, if we revert to Electron 13, we will be on an EOL release line (https://www.electronjs.org/blog/electron-17-0#end-of-support-for-13xy), and anything > 13 will have render process reuse by default, but the downside will be not having self-hosted with insiders. |
@deepak1556 I was able to reproduce when running @bpasero @deepak1556 Could you please also try to reproduce? |
Oh was not aware of that, will give it a try. Thanks for the steps! |
I tried to reproduce in my Windows 11 ARM VM and was not successful. Tried a few window reloads on a small TypeScript project with an editor open that has a compile error. |
@bpasero I am also on a Windows 11 ARM VM and couldn't repro it at first; the trick to repro it is to have some memory-intensive tasks running in the background (building Electron does the trick for me). I am now able to repro the issue. |
Created a minimal repro (https://gist.github.com/deepak1556/59f981cd8a55cda92beadd6bf97330a9) that shows a ping/pong via Node.js sockets between the renderer process and a child process forked from the main process. After a couple of reloads, the server/renderer will get into a state where the async tasks get queued up and are not processed until another explicit task is executed to flush out the queue. An example state when the event loop is paused:
At this point, executing a |
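The symptom described in the repro comment (queued async work that only runs once some other task flushes the queue) can be illustrated with a toy, single-threaded model. This is only a sketch under stated assumptions: `ToyLoop`, `Post`, and `RunLostWakeDemo` are hypothetical names, the "lost wake-up" is simulated with a boolean, and none of this is Electron or libuv code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy, single-threaded model of an event loop whose wake-up signal can be
// lost. Hypothetical names throughout; not Electron or libuv code.
class ToyLoop {
 public:
  // Queue a task. `wake_delivered` models whether the notification that
  // normally wakes the loop actually arrives.
  void Post(std::function<void()> task, bool wake_delivered) {
    queue_.push_back(std::move(task));
    if (wake_delivered) Drain();
  }

  // Number of tasks that are queued but have not run yet.
  std::size_t pending() const { return queue_.size(); }

 private:
  // Run everything queued so far, in FIFO order.
  void Drain() {
    while (!queue_.empty()) {
      std::function<void()> task = std::move(queue_.front());
      queue_.pop_front();
      task();
    }
  }

  std::deque<std::function<void()>> queue_;
};

// Demo: two posts lose their wake-up, then a third post with a delivered
// wake-up flushes all three. Returns how many tasks had run before the flush.
int RunLostWakeDemo() {
  ToyLoop loop;
  int ran = 0;
  loop.Post([&ran] { ++ran; }, /*wake_delivered=*/false);
  loop.Post([&ran] { ++ran; }, /*wake_delivered=*/false);
  const int ran_before_flush = ran;  // nothing has run; tasks are only queued
  assert(loop.pending() == 2);
  loop.Post([&ran] { ++ran; }, /*wake_delivered=*/true);
  assert(ran == 3 && loop.pending() == 0);
  return ran_before_flush;
}
```

In this model the first two tasks sit in the queue until the third post wakes the loop, which then runs all three in order. The real hang involves threads and an I/O completion port, which the model deliberately leaves out.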
It seems that after reloading the window, the IOCP will get into a weird state and event loop integration will stop functioning, so a workaround is to recreate the IOCP for each reload:
--- a/shell/common/node_bindings.cc
+++ b/shell/common/node_bindings.cc
@@ -576,7 +576,11 @@ void NodeBindings::LoadEnvironment(node::Environment* env) {
 }
 void NodeBindings::PrepareMessageLoop() {
-#if !BUILDFLAG(IS_WIN)
+#if BUILDFLAG(IS_WIN)
+  if (uv_loop_->iocp && uv_loop_->iocp != INVALID_HANDLE_VALUE)
+    CloseHandle(uv_loop_->iocp);
+  uv_loop_->iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 2);
+#else
   int handle = uv_backend_fd(uv_loop_);
   // If the backend fd hasn't changed, don't proceed.
I'm still trying to figure out the root cause. |
Currently testing the above workaround with vscode. |
After a bit of testing, the extension host no longer becomes unresponsive when the IOCP is reinitialized on reload. Although, we now get a prompt that says
@zcbenz instead of reinitializing the IOCP for the same-process reload case, can we instead close the default uv loop and also perform the background thread cleanup, which is usually done only during ~NodeBindings, and reinitialize after reload? Since we destroy the Node environment for reload, I feel this should be safe to do. Thoughts? |
When navigating, Chromium may load the new page before the old page is destroyed, so it is possible for multiple environments to share the same loop in one process, and we probably don't have a good time to close the default uv loop. But reinitializing the IOCP has the same problem too. In theory, destroying and initializing Node.js environments should not affect the uv loop at all; reading the code, I couldn't find anywhere that closes or initializes the uv loop. But somehow having multiple Node.js environments is making our node integration run into weird states. There were some efforts to work around it (electron/electron#25869 and electron/electron#27582), but I think they are not fixing the root cause. |
It seems that the node integration only breaks when the first node environment is destroyed, everything works fine if we just create and destroy subframes or child windows. So I wonder if we can work around the problem by creating a dummy node environment in renderer before any web frame is created and keeping it alive forever. The root cause is still mysterious to me though. |
@zcbenz I looked a bit more into the root issue and have some updates.
a) First, let me walk through the basics of Electron event loop integration on Windows and make sure I haven't misunderstood any of the components.
b) Next a couple of principles when using IOCP stated in https://docs.microsoft.com/en-us/windows/win32/fileio/i-o-completion-ports which are relevant for this bug,
c) Finally what is happening in the process reuse scenario on windows,
At this point the background thread is unable to signal back to the main thread, and we see it as the user-facing bug that none of the Node callbacks get executed, as in the repro. Another test that confirmed this was, expose
The workaround posted in #142786 (comment) helped because on reload, although there are multiple background threads, only the main thread and the newly created background thread are associated with the new IOCP, so they continue to work based on the previously listed principles.
Based on the above results, I now try to maintain a single background thread across reloads into the same process, and this always works fine. I found that there was an attempt in Electron that would have achieved the same, electron/electron#27582, but it was later reverted in electron/electron#28175. The cause given for the hang before the revert is incorrect: the problem was not with IOCP; rather, in electron/electron#27582, although we short-circuited the
The semaphore dance works like this https://github.com/libuv/libuv/blob/4296fec7f50145e2a307f3db7ae22984713976a7/src/win/thread.c#L309-L333,
With electron/electron#27582 there can be a situation with active I/O tasks on reload: the semaphore value on the main thread would already be incremented by 1 to signal action to the background thread and another call to |
I will put up a PR in Electron to further discuss this solution. |
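The semaphore explanation above is cut off in this copy of the thread, so the exact failure sequence is not reproduced here. As a hedged illustration of the one property it relies on (a count that was incremented before a reload is still there afterwards), here is a minimal model of counting-semaphore bookkeeping. `ToySemaphore` and `PassesAfterOneFreshPost` are hypothetical names; the model has no threads and no blocking, and it is not libuv's `uv_sem_t` implementation.

```cpp
#include <cassert>

// Minimal counting-semaphore model: counter semantics only, no blocking and
// no threads. Hypothetical; not libuv's uv_sem_t.
class ToySemaphore {
 public:
  // Signal: add one unit to the count.
  void Post() { ++count_; }

  // Non-blocking wait: consume one unit if available, otherwise fail.
  bool TryWait() {
    if (count_ == 0) return false;
    --count_;
    return true;
  }

  unsigned count() const { return count_; }

 private:
  unsigned count_ = 0;
};

// Returns how many times a waiter gets past TryWait() after a single fresh
// post, given `stale_posts` units left over from before a reload.
unsigned PassesAfterOneFreshPost(unsigned stale_posts) {
  ToySemaphore sem;
  for (unsigned i = 0; i < stale_posts; ++i) sem.Post();  // left-over signals
  sem.Post();  // the one signal that was actually intended
  unsigned passes = 0;
  while (sem.TryWait()) ++passes;
  assert(sem.count() == 0);
  return passes;
}
```

With no stale units the waiter passes once per signal; with one stale unit it passes twice for a single intended signal, so the two sides of the handshake fall out of step. Whether that is exactly the sequence behind the hang is an assumption here, since the original explanation is truncated.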
/verified Definitely working for me, although I haven't repro'd this in a while. |
Does this issue occur when all extensions are disabled?: Can't tell. Went away on restart
Commit: 82a8bec
Date: 2022-02-10T05:16:36.093Z
Electron: 16.0.8
Chromium: 96.0.4664.110
Node.js: 16.9.1
V8: 9.6.180.21-electron.0
OS: Windows_NT x64 10.0.19044
Steps to Reproduce:
I get this: