-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
win, pipe: avoid synchronous pipe deadlocks #1273
Conversation
@saghul could you take a look? |
src/win/pipe.c
Outdated
uv_mutex_unlock(m); | ||
} | ||
restart_readfile: | ||
if (hThreadForInterrupt) { | ||
QueueUserWorkItem(uv__pipe_zero_read_interrupter, | ||
hThreadForInterrupt, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's undefined what Win32 will do with this handle after this function returns (I expect it should reuse it for the next IO call). Provision needs to be made to ensure that the work item scheduled here doesn't try to access this handle asynchronously. This is what the mutex locks in uv__pipe_pause_read
are required for.
29c2844
to
0fa84df
Compare
I've simplified the logic there. Now the "interrupter-thread" is created once and it calls @bnoordhuis @cjihrig could you review this? @vtjnash I'm sorry, I don't really understand what is the issue here. The duplicated thread handle should be perfectly safe to be used in another thread without any synchronization. The mutex in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments. Mostly taking your word for it. I'd like at least one other person to review.
test/test-pipe-open-read-pipe.c
Outdated
/* Give uv_run some time to start */ | ||
uv_sleep(250); | ||
/* Try to access the pipe again, in different loop */ | ||
uv_loop_t test_loop; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to the top of the function.
test/test-pipe-open-read-pipe.c
Outdated
} | ||
|
||
void pipe_read_thread_proc(void* arg) { | ||
uv_pipe_t* pipe = arg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you separate the assignment.
src/win/pipe.c
Outdated
static DWORD WINAPI uv_pipe_zero_readfile_thread_proc(void* parameter) { | ||
int result; | ||
DWORD bytes; | ||
uv_read_t* req = (uv_read_t*) parameter; | ||
uv_pipe_t* handle = (uv_pipe_t*) req->data; | ||
uv_loop_t* loop = handle->loop; | ||
HANDLE hThread = NULL; | ||
HANDLE hThreadForInterrupt = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the assignment should be separate, but clearly the surrounding code isn't doing that.
Why do you assume that? I would be surprised if the thread-pool was guaranteed to create a new thread for every work item submitted – that would be a lot more expensive than the mutex I'm suggesting may be necessary to synchronize the shared state between the two threads.
I don't understand why this causes node to hang, since it doesn't look like your example code is reading from STDIN. Is node starting an active read on the STDIN pipe even without an |
In MSDN Original node issue with repro using 3 files is here: nodejs/node#10836. My test case boils it down to the What happens in node is the following:
|
Yes, that example is fine because it doesn't use the thread pool and explicitly uses the thread object itself as a mutex. This is valid, but very slow and resource intensive.
I was looking at that, but I don't see why node process |
Sorry @vtjnash , I'm lost. What are your concerns exactly? I don't see anything particularly wrong with this code, am I missing something? As for the read: it seems that node starts reading the pipe and then immediately pause it, as described here: nodejs/node#5776 (this was later reverted). In any case the reason for deadlock can be easily verified by attaching debugger to node processes. |
0fa84df
to
dd6c4db
Compare
@cjihrig I've corrected the code. I've also made the duplicated thread handle not inheritable. |
The
This sounds like a bug in nodejs. It shouldn't be initializing and attempting a read when it doesn't want the data (since that wastes resources – and causes the hang observed here). |
Ah, I see now. I was missing the part that As for node - the linked PR tried to fix that and it caused other problems. OTOH, a process could be reading from stdin and create child processes, and we would still get this bug. |
Not necessarily – that could also be considered a bug in the javascript app that is trying to simultaneously consume stdin from two places. (instead of incurring the overhead here of starting an extra thread and waking up both threads twice every second). |
dd6c4db
to
590ba6d
Compare
I've updated the PR code, PTAL. |
src/win/pipe.c
Outdated
uv__pipe_interrupter_param_t* param; | ||
|
||
param = (uv__pipe_interrupter_param_t*) uv__malloc( | ||
sizeof(uv__pipe_interrupter_param_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be able to use alloca
– the application needs to guarantee that this pointer is never accessed after the function returns, to avoid race conditions, anyways.
src/win/pipe.c
Outdated
for (;;) { | ||
r = WaitForSingleObject(interrupterParam->completedEvent, | ||
PIPE_ZERO_READ_INTERRUPT_INTERVAL); | ||
if (r == WAIT_TIMEOUT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is still race-y. getting a wait-timeout previous to here doesn't guarantee that the thread handle is still valid after here.
src/win/pipe.c
Outdated
@@ -46,6 +46,9 @@ struct uv__ipc_queue_item_s { | |||
/* A zero-size buffer for use by uv_pipe_read */ | |||
static char uv_zero_[] = ""; | |||
|
|||
/* To prevent deadlocks we will interrupt synchronous pipe read every 500 ms */ | |||
#define PIPE_ZERO_READ_INTERRUPT_INTERVAL 500 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very frequent - especially for something that will likely be caused by stdio, having timers that need to get woken up at a 500ms period seems like it could have a measurable impact on battery life. For non-nodejs consumers of this library (i.e. programs that don't share the nodejs bug, like my use case of JuliaLang), there should also be a way to disable this code.
590ba6d
to
c94b219
Compare
@vtjnash I've updated the code. I've added a mutex for synchronization, bumped interval to 2.5s and made entire thing opt-in by a call to |
ping? |
pong? |
ping? |
Sorry Bartosz, I missed that you asked for a review. I've only looked at this briefly but it seems like a rather inelegant solution. What would happen if |
I guess this would affect both handles. I'll double check. |
@bnoordhuis each I know this is not an elegant solution, and as such per @vtjnash suggestion it is on-demand. But this is the only way I know of solving this, which is a real-world issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a very aesthetically pleasing solution but if you fix the outstanding issues, I'll sign off on it. Thanks.
src/win/pipe.c
Outdated
typedef struct { | ||
volatile HANDLE thread; | ||
HANDLE threadMutex; | ||
HANDLE completedEvent; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor stylistic issue: snake_case, not camelCase.
src/win/pipe.c
Outdated
uv__readfile_interrupter_t* interrupter; | ||
int r; | ||
|
||
interrupter = (uv__readfile_interrupter_t*) param; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Superfluous cast?
src/win/pipe.c
Outdated
} | ||
ReleaseMutex(interrupter->threadMutex); | ||
} else { | ||
/* ReadFile thread has terminated - cleanup the handles and exit */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: clean up
src/win/pipe.c
Outdated
uv__readfile_interrupter_t* interrupter; | ||
|
||
interrupter = (uv__readfile_interrupter_t*) uv__malloc( | ||
sizeof(uv__readfile_interrupter_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sizeof(*interrupter)
. Also, superfluous cast.
src/win/pipe.c
Outdated
} | ||
|
||
static uv__readfile_interrupter_t* uv__start_readfile_interrupter() { | ||
HANDLE threadHandle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: snake_case, not camelCase.
src/win/pipe.c
Outdated
goto failed_event; | ||
} | ||
if (!QueueUserWorkItem(uv__readfile_interrupter_thread, interrupter, | ||
WT_EXECUTEINLONGTHREAD)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you line up the argument?
src/win/pipe.c
Outdated
interrupter = (uv__readfile_interrupter_t*) uv__malloc( | ||
sizeof(uv__readfile_interrupter_t)); | ||
if (interrupter == NULL) { | ||
goto failed_malloc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just return NULL
here. No need to jump, nothing to clean up.
src/win/pipe.c
Outdated
WT_EXECUTEINLONGTHREAD)) { | ||
goto failed_queue; | ||
} | ||
goto done; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just return interrupter
?
test/test-pipe-open-read-pipe.c
Outdated
HANDLE stdin_read_pipe, stdin_write_pipe; | ||
SECURITY_ATTRIBUTES sa_attr; | ||
|
||
sa_attr.nLength = sizeof(SECURITY_ATTRIBUTES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sizeof(sa_attr)
?
I took a quick look... and it's nasty 😞 Not your fault @bzoz! Is there a reason why we don't do this always for sync pipes? I mean, who wouldn't want to avoid deadlocks? Specially given users don't know if a pipe is synchronous or not, right? |
The operation is a no-op for asynchronous pipes (they don't have the same issue in the kernel). I requested that it not be always enable for synchronous pipes since this is only needed as a workaround for a nodejs bug. Other frontends (for example, my use case is Julia), don't need the extra overhead of spawning and synchronizing yet another thread. |
Can you clarify why it's not needed? If it's a kernel thing, why isn't Julia running into the same problem? |
Julia's standard library doesn't execute a read on a pipe if there's no active reader at the language / user level. That ensures it's only an issue in edge cases (two readers in separate processes trying to consume tokens off of the same pipe). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo nits. Should the interval be configurable?
src/win/pipe.c
Outdated
} | ||
|
||
static uv__readfile_interrupter_t* uv__start_readfile_interrupter() { | ||
HANDLE thread_handle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
() -> (void) - () is a function taking any number of arguments.
src/win/pipe.c
Outdated
@@ -985,6 +1079,8 @@ static DWORD WINAPI uv_pipe_zero_readfile_thread_proc(void* parameter) { | |||
uv_mutex_lock(m); | |||
handle->pipe.conn.readfile_thread = hThread; | |||
uv_mutex_unlock(m); | |||
/* Give some time for other processes to wake before restarting. */ | |||
Sleep(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? What happens if you remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately - yes. Without it the test won't work. Interrupter thread will be stuck at CancelSynchronousIo, zero-read at ReadFile, and the test thread at NtQueryInformationFile. For whatever reason zero-read thread seems to ignore the CancelSynchronousIo
call, unless a Sleep
(or a printf
, or a attached debugger) is present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be conditional on interrupter
.
src/win/pipe.c
Outdated
@@ -1010,6 +1106,9 @@ static DWORD WINAPI uv_pipe_zero_readfile_thread_proc(void* parameter) { | |||
} | |||
|
|||
POST_COMPLETION_FOR_REQ(loop, req); | |||
if (interrupter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (interrupter != NULL) {
ASSERT(r == 0); | ||
|
||
/* Give uv_run some time to start */ | ||
uv_sleep(250); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope this isn't going to be flaky...
@bnoordhuis updated, PTAL |
src/win/pipe.c
Outdated
SwitchToThread(); | ||
} | ||
ReleaseMutex(interrupter->thread_mutex); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (WAIT_OBJECT_0)
@vtjnash updated, PTAL |
If no one objects, I will land this tomorrow. |
I still don't like that one function that is super-specific. Also pretty much Node only AFAWCT. Can we wake this behavior a compile-time define? We can always turn it into a function if it's needed later, but I'd really want to avoid adding this weird to use API. |
Add a thread that will interrupt uv_pipe_zero_readdile_thread_proc every two and half second. This allows other processes to access the pipe without deadlocking Ref: nodejs/node#10836
84ce471
to
c4d887e
Compare
I thought so far that libuv has been avoiding compile-time defines? This one seems like it would be especially inconvenient for system packagers (if there are any on Windows?), as it forces them to choose between bad latency (read monitoring is only active 70% of the time with this PR) for all applications or nodejs/node#10836. |
See also #95 (comment), which this PR implements. |
Using @vtjnash I've added a slight break which happens every 2.5s. Does this really end in the read being active only 70% of the time? I think that this API is suboptimal. Basically each synchronous pipe will be affected by this bug and can cause a deadlock. I think that all synchronous pipes read should have interrupter enabled. I will drop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a slight break
Ah, no, right. I always forget that the units of Sleep are milliseconds not seconds.
Basically each synchronous pipe will be affected by this bug and can cause a deadlock
Only pipes that are inherited or passed to another program can cause deadlock.
|
||
if (!QueueUserWorkItem(uv__readfile_interrupter_thread, | ||
interrupter, | ||
WT_EXECUTEINLONGTHREAD)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use RegisterWaitForSingleObject
to avoid needing to allocate a new thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MSDN says I can't wait on a pipe.
I could use CreateTimerQueueTimer to use the threadpool to wake interrupter every 2.5s, but I think that would complicate things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly true. I spent several hours yesterday trying to work around the NT kernel behavior (and failing), and wishing that it was open source.
But that's not why you're allocating this work item. This thread call can be replaced with a RegisterWaitForSingleObject
callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On what handle should I wait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interrupter->completed_event
Got any twitter followers on the NT Kernel team? :) I'm still hoping that someday that they'll fix this bug. Is it a competition? I've got #1498 with 5 years and counting 😛. You are far more active around here than me though. |
Special problems call for special measures.
It has been broken for a lot longer though 🤷♂️ |
Since I cannot add this to |
Why would anyone ever want that? |
My stance on this, unless there is a new approach which we haven't seen yet is to do it compile time. |
The user-mode symlinks thing was only an example. I'm reluctant to make this compile time. Mostly I think because there is no such mechanism already in place. IMHO a call to BTW, @vtjnash are you sure it should not be default in libuv? Even though in this specific situation is caused by the way Node handles stdio, Julia app could receive a pipe that is already being read as one of its stdio handles. Trying to initialize such pipe with |
I don't think that protecting applications against deadlock caused by other misbehaving applications is really in the purview of libuv. For more context, this was also discussed on the original PR (joyent/libuv#1377 (comment)). I'll admit that it's unsatisfying that the deadlock could be avoided if Microsoft fixed their kernel. However it should be rare to encounter situations that it would occur (an application that spawns nodejs after spawning ReadFile in a thread pool on the same NamedPipe it passes to nodejs to use as stdin – note that this situation also creates an race condition of two processes trying to consume user keystroke input). |
As compile time option: #1525 |
This PR does not address this case. This PR can only help with working around nodejs bugs, there is nothing that can be done in libuv that would work around bugs in other applications. |
The issue in Node.js was solved, this makes this PR obsolete. Closing. |
Add a thread that will interrupt
uv_pipe_zero_readdile_thread_proc
every half second. This allows other processes to access the pipe without deadlocking. The "interrupt" is done by a call toCancelSynchronousIo
, similar to howuv__pipe_getname
pauses the read (see joyent/libuv#1313)On Windows a call to
uv_pipe_open
will deadlock if another thread or process hasReadFile
call active on that pipe.This can happen when a process spawns Node with
STDIN
redirected to pipe. Libuv will callReadFile
in a separate thread (win/pipe.c:971). If this Node app spawns another child Node process with inherited stdio handles, this child process will hang. This is the reason behind nodejs/node#10836.uv_pipe_open
hangs on a call toNtQueryInformationFile
(win/pipe.c:1931). This can be solved by usingNtQueryObject
. Unfortunately, this does not solve the issue - a later call touv_set_pipe_handle
(win/pipe.c:245) usesSetNamedPipeHandleState
,GetNamedPipeHandleState
andNtQueryInformationFile
(with different parameters).This is related to joyent/libuv#1313. But since the
ReadFile
is used in another process thanuv_open_pipe
we cannot synchronize them.Ref: joyent/libuv#1313 (similar issue in libuv)
Ref: nodejs/node#10836 (issue in node fixed by this)