Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

win, pipe: avoid synchronous pipe deadlocks #1273

Closed
wants to merge 1 commit into from

Conversation

bzoz
Copy link
Member

@bzoz bzoz commented Mar 23, 2017

Add a thread that will interrupt uv_pipe_zero_readdile_thread_proc every half second. This allows other processes to access the pipe without deadlocking. The "interrupt" is done by a call to CancelSynchronousIo, similar to how uv__pipe_getname pauses the read (see joyent/libuv#1313)

On Windows a call to uv_pipe_open will deadlock if another thread or process has ReadFile call active on that pipe.

This can happen when a process spawns Node with STDIN redirected to pipe. Libuv will call ReadFile in a separate thread (win/pipe.c:971). If this Node app spawns another child Node process with inherited stdio handles, this child process will hang. This is the reason behind nodejs/node#10836.

uv_pipe_open hangs on a call to NtQueryInformationFile (win/pipe.c:1931). This can be solved by using NtQueryObject. Unfortunately, this does not solve the issue - a later call to uv_set_pipe_handle (win/pipe.c:245) uses SetNamedPipeHandleState, GetNamedPipeHandleState and NtQueryInformationFile (with different parameters).

This is related to joyent/libuv#1313. But since the ReadFile is used in another process than uv_open_pipe we cannot synchronize them.

Ref: joyent/libuv#1313 (similar issue in libuv)
Ref: nodejs/node#10836 (issue in node fixed by this)

@bzoz
Copy link
Member Author

bzoz commented Apr 3, 2017

@saghul could you take a look?

src/win/pipe.c Outdated
uv_mutex_unlock(m);
}
restart_readfile:
if (hThreadForInterrupt) {
QueueUserWorkItem(uv__pipe_zero_read_interrupter,
hThreadForInterrupt,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's undefined what Win32 will do with this handle after this function returns (I expect it should reuse it for the next IO call). Provision needs to be made to ensure that the work item scheduled here doesn't try to access this handle asynchronously. This is what the mutex locks in uv__pipe_pause_read are required for.

@bzoz
Copy link
Member Author

bzoz commented Apr 26, 2017

I've simplified the logic there. Now the "interrupter-thread" is created once and it calls CancelSynchronousIo in a loop. This thread will exit when uv_pipe_zero_readfile_thread_proc exits (WaitForSingleObject returns something different than WAIT_TIMEOUT)

@bnoordhuis @cjihrig could you review this?

@vtjnash I'm sorry, I don't really understand what is the issue here. The duplicated thread handle should be perfectly safe to be used in another thread without any synchronization. The mutex in uv__pipe_pause_read looks more like a way to signal uv_pipe_zero_readfile_thread_proc to resume the read.

Copy link
Contributor

@cjihrig cjihrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments. Mostly taking your word for it. I'd like at least one other person to review.

/* Give uv_run some time to start */
uv_sleep(250);
/* Try to access the pipe again, in different loop */
uv_loop_t test_loop;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to the top of the function.

}

void pipe_read_thread_proc(void* arg) {
uv_pipe_t* pipe = arg;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you separate the assignment.

src/win/pipe.c Outdated
static DWORD WINAPI uv_pipe_zero_readfile_thread_proc(void* parameter) {
int result;
DWORD bytes;
uv_read_t* req = (uv_read_t*) parameter;
uv_pipe_t* handle = (uv_pipe_t*) req->data;
uv_loop_t* loop = handle->loop;
HANDLE hThread = NULL;
HANDLE hThreadForInterrupt = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the assignment should be separate, but clearly the surrounding code isn't doing that.

@vtjnash
Copy link
Member

vtjnash commented Apr 26, 2017

The duplicated thread handle should be perfectly safe to be used in another thread without any synchronization

Why do you assume that? I would be surprised if the thread-pool was guaranteed to create a new thread for every work item submitted – that would be a lot more expensive than the mutex I'm suggesting may be necessary to synchronize the shared state between the two threads.

This can happen when a process spawns Node with STDIN redirected to pipe. Libuv will call ReadFile in a separate thread (win/pipe.c:971). If this Node app spawns another child Node process with inherited stdio handles, this child process will hang. This is the reason behind nodejs/node#10836.

I don't understand why this causes node to hang, since it doesn't look like your example code is reading from STDIN. Is node starting an active read on the STDIN pipe even without an .on("data") handler?

@bzoz
Copy link
Member Author

bzoz commented Apr 26, 2017

In MSDN DuplicateHandle example it is used in similar way as here - a handle to object (in that case it is a mutex) is duplicated and passed on to a worker thread. Worker thread then uses the duplicated handle without any synchronization. Here we do exactly the same thing. Handle from GetCurrentThread() is duplicated into hThreadForInterrupt and passed on to thread-pool job. Worker thread uses this handle exclusively, it is not passed to any other thread or function.

Original node issue with repro using 3 files is here: nodejs/node#10836. My test case boils it down to the uv_pipe_open call on an anonymous pipe that is already read by another thread.

What happens in node is the following:

  • node process A spawns child node process B with {stdio: 'pipe'} option
  • node process B starts reading from stdin - because stdio is a pipe this creates uv_pipe_zero_readfile_thread_proc which calls ReadFile
  • node process B spawns child node process C with the default {stdio: inherit}
  • node process C access process.stdin - this will call uv_pipe_open and in turn NtQueryInformationFile on a pipe read by B and deadlock

@vtjnash
Copy link
Member

vtjnash commented Apr 26, 2017

Yes, that example is fine because it doesn't use the thread pool and explicitly uses the thread object itself as a mutex. This is valid, but very slow and resource intensive.

What happens in node is the following:

I was looking at that, but I don't see why node process B is reading from stdin. I don't see anywhere in the middle.js code that is starting a read on stdin?

@bzoz
Copy link
Member Author

bzoz commented Apr 27, 2017

Sorry @vtjnash , I'm lost. What are your concerns exactly? I don't see anything particularly wrong with this code, am I missing something?

As for the read: it seems that node starts reading the pipe and then immediately pause it, as described here: nodejs/node#5776 (this was later reverted). In any case the reason for deadlock can be easily verified by attaching debugger to node processes.

@bzoz
Copy link
Member Author

bzoz commented Apr 27, 2017

@cjihrig I've corrected the code.

I've also made the duplicated thread handle not inheritable.

@vtjnash
Copy link
Member

vtjnash commented Apr 27, 2017

The uv_pipe_zero_readfile_thread_proc function is called in a thread-pool (QueueUserWorkItem), so it does not own the thread that it is running on. It is undefined behavior to have any copy of this handle live when the uv_pipe_zero_readfile_thread_proc function returns.

As for the read

This sounds like a bug in nodejs. It shouldn't be initializing and attempting a read when it doesn't want the data (since that wastes resources – and causes the hang observed here).

@bzoz
Copy link
Member Author

bzoz commented Apr 27, 2017

Ah, I see now. I was missing the part that uv_pipe_zero_readfile_thread_proc is not a "normal" thread. Thanks for pointing it out, you are right, I'll fix that.

As for node - the linked PR tried to fix that and it caused other problems. OTOH, a process could be reading from stdin and create child processes, and we would still get this bug.

@vtjnash
Copy link
Member

vtjnash commented Apr 27, 2017

OTOH, a process could be reading from stdin and create child processes, and we would still get this bug

Not necessarily – that could also be considered a bug in the javascript app that is trying to simultaneously consume stdin from two places. (instead of incurring the overhead here of starting an extra thread and waking up both threads twice every second).

@bzoz
Copy link
Member Author

bzoz commented May 8, 2017

I've updated the PR code, PTAL.

src/win/pipe.c Outdated
uv__pipe_interrupter_param_t* param;

param = (uv__pipe_interrupter_param_t*) uv__malloc(
sizeof(uv__pipe_interrupter_param_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be able to use alloca – the application needs to guarantee that this pointer is never accessed after the function returns, to avoid race conditions, anyways.

src/win/pipe.c Outdated
for (;;) {
r = WaitForSingleObject(interrupterParam->completedEvent,
PIPE_ZERO_READ_INTERRUPT_INTERVAL);
if (r == WAIT_TIMEOUT) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still race-y. getting a wait-timeout previous to here doesn't guarantee that the thread handle is still valid after here.

src/win/pipe.c Outdated
@@ -46,6 +46,9 @@ struct uv__ipc_queue_item_s {
/* A zero-size buffer for use by uv_pipe_read */
static char uv_zero_[] = "";

/* To prevent deadlocks we will interrupt synchronous pipe read every 500 ms */
#define PIPE_ZERO_READ_INTERRUPT_INTERVAL 500
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very frequent - especially for something that will likely be caused by stdio, having timers that need to get woken up at a 500ms period seems like it could have a measurable impact on battery life. For non-nodejs consumers of this library (i.e. programs that don't share the nodejs bug, like my use case of JuliaLang), there should also be a way to disable this code.

@bzoz
Copy link
Member Author

bzoz commented May 11, 2017

@vtjnash I've updated the code. I've added a mutex for synchronization, bumped interval to 2.5s and made entire thing opt-in by a call to uv_pipe_enable_interrupter.

@bzoz
Copy link
Member Author

bzoz commented Jun 6, 2017

ping?

@bzoz
Copy link
Member Author

bzoz commented Jun 19, 2017

pong?

@bzoz
Copy link
Member Author

bzoz commented Jul 7, 2017

ping?

@bnoordhuis
Copy link
Member

Sorry Bartosz, I missed that you asked for a review. I've only looked at this briefly but it seems like a rather inelegant solution. What would happen if uv_pipe_open() duplicates the handle before using it?

@bzoz
Copy link
Member Author

bzoz commented Jul 7, 2017

I guess this would affect both handles. I'll double check.

@bzoz
Copy link
Member Author

bzoz commented Jul 13, 2017

@bnoordhuis each uv_pipe_t instance needs a separate call to uv_pipe_enable_interrupter.

I know this is not an elegant solution, and as such per @vtjnash suggestion it is on-demand. But this is the only way I know of solving this, which is a real-world issue.

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a very aesthetically pleasing solution but if you fix the outstanding issues, I'll sign off on it. Thanks.

src/win/pipe.c Outdated
typedef struct {
volatile HANDLE thread;
HANDLE threadMutex;
HANDLE completedEvent;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor stylistic issue: snake_case, not camelCase.

src/win/pipe.c Outdated
uv__readfile_interrupter_t* interrupter;
int r;

interrupter = (uv__readfile_interrupter_t*) param;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous cast?

src/win/pipe.c Outdated
}
ReleaseMutex(interrupter->threadMutex);
} else {
/* ReadFile thread has terminated - cleanup the handles and exit */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: clean up

src/win/pipe.c Outdated
uv__readfile_interrupter_t* interrupter;

interrupter = (uv__readfile_interrupter_t*) uv__malloc(
sizeof(uv__readfile_interrupter_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof(*interrupter). Also, superfluous cast.

src/win/pipe.c Outdated
}

static uv__readfile_interrupter_t* uv__start_readfile_interrupter() {
HANDLE threadHandle;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: snake_case, not camelCase.

src/win/pipe.c Outdated
goto failed_event;
}
if (!QueueUserWorkItem(uv__readfile_interrupter_thread, interrupter,
WT_EXECUTEINLONGTHREAD)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you line up the argument?

src/win/pipe.c Outdated
interrupter = (uv__readfile_interrupter_t*) uv__malloc(
sizeof(uv__readfile_interrupter_t));
if (interrupter == NULL) {
goto failed_malloc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just return NULL here. No need to jump, nothing to clean up.

src/win/pipe.c Outdated
WT_EXECUTEINLONGTHREAD)) {
goto failed_queue;
}
goto done;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return interrupter?

HANDLE stdin_read_pipe, stdin_write_pipe;
SECURITY_ATTRIBUTES sa_attr;

sa_attr.nLength = sizeof(SECURITY_ATTRIBUTES);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof(sa_attr)?

@saghul
Copy link
Member

saghul commented Aug 8, 2017

I took a quick look... and it's nasty 😞 Not your fault @bzoz!

Is there a reason why we don't do this always for sync pipes? I mean, who wouldn't want to avoid deadlocks? Specially given users don't know if a pipe is synchronous or not, right?

@vtjnash
Copy link
Member

vtjnash commented Aug 8, 2017

The operation is a no-op for asynchronous pipes (they don't have the same issue in the kernel). I requested that it not be always enable for synchronous pipes since this is only needed as a workaround for a nodejs bug. Other frontends (for example, my use case is Julia), don't need the extra overhead of spawning and synchronizing yet another thread.

@saghul
Copy link
Member

saghul commented Aug 8, 2017

Can you clarify why it's not needed? If it's a kernel thing, why isn't Julia running into the same problem?

@vtjnash
Copy link
Member

vtjnash commented Aug 8, 2017

Julia's standard library doesn't execute a read on a pipe if there's no active reader at the language / user level. That ensures it's only an issue in edge cases (two readers in separate processes trying to consume tokens off of the same pipe).

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo nits. Should the interval be configurable?

src/win/pipe.c Outdated
}

static uv__readfile_interrupter_t* uv__start_readfile_interrupter() {
HANDLE thread_handle;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

() -> (void) - () is a function taking any number of arguments.

src/win/pipe.c Outdated
@@ -985,6 +1079,8 @@ static DWORD WINAPI uv_pipe_zero_readfile_thread_proc(void* parameter) {
uv_mutex_lock(m);
handle->pipe.conn.readfile_thread = hThread;
uv_mutex_unlock(m);
/* Give some time for other processes to wake before restarting. */
Sleep(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? What happens if you remove it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately - yes. Without it the test won't work. Interrupter thread will be stuck at CancelSynchronousIo, zero-read at ReadFile, and the test thread at NtQueryInformationFile. For whatever reason zero-read thread seems to ignore the CancelSynchronousIo call, unless a Sleep (or a printf, or a attached debugger) is present.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be conditional on interrupter.

src/win/pipe.c Outdated
@@ -1010,6 +1106,9 @@ static DWORD WINAPI uv_pipe_zero_readfile_thread_proc(void* parameter) {
}

POST_COMPLETION_FOR_REQ(loop, req);
if (interrupter) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (interrupter != NULL) {

ASSERT(r == 0);

/* Give uv_run some time to start */
uv_sleep(250);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this isn't going to be flaky...

@bzoz
Copy link
Member Author

bzoz commented Aug 9, 2017

@bnoordhuis updated, PTAL

src/win/pipe.c Outdated
SwitchToThread();
}
ReleaseMutex(interrupter->thread_mutex);
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (WAIT_OBJECT_0)

@bzoz
Copy link
Member Author

bzoz commented Aug 10, 2017

@vtjnash updated, PTAL

@bzoz
Copy link
Member Author

bzoz commented Aug 17, 2017

If no one objects, I will land this tomorrow.

@saghul
Copy link
Member

saghul commented Aug 18, 2017

I still don't like that one function that is super-specific. Also pretty much Node only AFAWCT. Can we wake this behavior a compile-time define? We can always turn it into a function if it's needed later, but I'd really want to avoid adding this weird to use API.

Add a thread that will interrupt uv_pipe_zero_readdile_thread_proc every two
and half second. This allows other processes to access the pipe without
deadlocking

Ref: nodejs/node#10836
@vtjnash
Copy link
Member

vtjnash commented Aug 18, 2017

Can we wake this behavior a compile-time define?

I thought so far that libuv has been avoiding compile-time defines? This one seems like it would be especially inconvenient for system packagers (if there are any on Windows?), as it forces them to choose between bad latency (read monitoring is only active 70% of the time with this PR) for all applications or nodejs/node#10836.

@vtjnash
Copy link
Member

vtjnash commented Aug 21, 2017

See also #95 (comment), which this PR implements.

@bzoz
Copy link
Member Author

bzoz commented Aug 22, 2017

Using WaitForSingleObject was the first thing I tried, since I basically want to add a timeout to the ReadFile. I was not able to make it work, it looks like one cannot synchronize on pipes (see MSDN).

@vtjnash I've added a slight break which happens every 2.5s. Does this really end in the read being active only 70% of the time?

I think that this API is suboptimal. Basically each synchronous pipe will be affected by this bug and can cause a deadlock. I think that all synchronous pipes read should have interrupter enabled. I will drop uv_pipe_enable_interrupter and make it opt-in through an option to uv_loop_configure.

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a slight break

Ah, no, right. I always forget that the units of Sleep are milliseconds not seconds.

Basically each synchronous pipe will be affected by this bug and can cause a deadlock

Only pipes that are inherited or passed to another program can cause deadlock.


if (!QueueUserWorkItem(uv__readfile_interrupter_thread,
interrupter,
WT_EXECUTEINLONGTHREAD))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use RegisterWaitForSingleObject to avoid needing to allocate a new thread

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSDN says I can't wait on a pipe.

I could use CreateTimerQueueTimer to use the threadpool to wake interrupter every 2.5s, but I think that would complicate things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly true. I spent several hours yesterday trying to work around the NT kernel behavior (and failing), and wishing that it was open source.

But that's not why you're allocating this work item. This thread call can be replaced with a RegisterWaitForSingleObject callback.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On what handle should I wait?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interrupter->completed_event

@bzoz
Copy link
Member Author

bzoz commented Aug 23, 2017

So, adding an option to uv_loop_configure won't work, because there is no flags or anything similar in uv_loop_t on Windows ([1], [2]).

So again - while this is not a nice API, and implementation is basically a hack, I don't see any other option.

BTW, this has been open for 5 months now 🎂

@vtjnash
Copy link
Member

vtjnash commented Aug 23, 2017

I don't see any other option.

Got any twitter followers on the NT Kernel team? :) I'm still hoping that someday that they'll fix this bug.

Is it a competition? I've got #1498 with 5 years and counting 😛. You are far more active around here than me though.

@saghul
Copy link
Member

saghul commented Aug 24, 2017

I thought so far that libuv has been avoiding compile-time defines?

Special problems call for special measures.

BTW, this has been open for 5 months now 🎂

It has been broken for a lot longer though 🤷‍♂️

@bzoz
Copy link
Member Author

bzoz commented Aug 24, 2017

Since I cannot add this to uv_loop_configure, maybe we should add a global configuration (eg. uv_configure)? Besides opting-in for this, it could maybe be used to opt-out from user-mode symlinks (#1466) ?

@saghul
Copy link
Member

saghul commented Aug 25, 2017

Besides opting-in for this, it could maybe be used to opt-out from user-mode symlinks (#1466) ?

Why would anyone ever want that?

@saghul
Copy link
Member

saghul commented Aug 25, 2017

My stance on this, unless there is a new approach which we haven't seen yet is to do it compile time.

@bzoz
Copy link
Member Author

bzoz commented Aug 29, 2017

The user-mode symlinks thing was only an example.

I'm reluctant to make this compile time. Mostly I think because there is no such mechanism already in place. IMHO a call to uv_configure (or uv_loop_configure with NULL as loop parameter) would be nicer and easier to maintain than a new build feature.

BTW, @vtjnash are you sure it should not be default in libuv? Even though in this specific situation is caused by the way Node handles stdio, Julia app could receive a pipe that is already being read as one of its stdio handles. Trying to initialize such pipe with uv_pipe_open will deadlock any runtime.

@vtjnash
Copy link
Member

vtjnash commented Aug 30, 2017

I don't think that protecting applications against deadlock caused by other misbehaving applications is really in the purview of libuv. For more context, this was also discussed on the original PR (joyent/libuv#1377 (comment)). I'll admit that it's unsatisfying that the deadlock could be avoided if Microsoft fixed their kernel. However it should be rare to encounter situations that it would occur (an application that spawns nodejs after spawning ReadFile in a thread pool on the same NamedPipe it passes to nodejs to use as stdin – note that this situation also creates an race condition of two processes trying to consume user keystroke input).

@bzoz
Copy link
Member Author

bzoz commented Sep 6, 2017

As compile time option: #1525

@vtjnash
Copy link
Member

vtjnash commented Sep 8, 2017

Trying to initialize such pipe with uv_pipe_open will deadlock any runtime

This PR does not address this case. This PR can only help with working around nodejs bugs, there is nothing that can be done in libuv that would work around bugs in other applications.

@bzoz
Copy link
Member Author

bzoz commented Feb 14, 2018

The issue in Node.js was solved, this makes this PR obsolete. Closing.

@bzoz bzoz closed this Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants