Segfault fix for pipes #239

Merged
merged 1 commit into from Aug 20, 2011

Projects

None yet

2 participants

@pbozeman

EventMachine relies on the fact that when close(fd)
is called that the fd is removed from any
epoll event queues.

However, this is not always the behavior of close(fd)

See man 4 epoll Q6/A6 and then consider what happens
when using pipes with eventmachine.
(As is often done when communicating with a subprocess)

The pipes end up looking like:

ls -l /proc//fd
...
lr-x------ 1 root root 64 2011-08-19 21:31 3 -> pipe:[940970]
l-wx------ 1 root root 64 2011-08-19 21:31 4 -> pipe:[940970]

This meets the critera from man 4 epoll Q6/A4 for not
removing fds from epoll event queues until all fds
that reference the underlying file have been removed.

If the EventableDescriptor associated with fd 3 is deleted,
its dtor will call EventableDescriptor::Close(),
which will call ::close(int fd).

However, unless the EventableDescriptor associated with fd 4 is
also deleted before the next call to epoll_wait, events may fire
for fd 3 that were registered with an already deleted
EventableDescriptor.

Therefore, it is necessary to notify EventMachine that
when an EventableDescriptor is closing so that it can
remove it from the event loop.

Patrick Bozeman Segfault fix for pipes
EventMachine relies on the fact that when close(fd)
is called that the fd is removed from any
epoll event queues.

However, this is not *always* the behavior of close(fd)

See man 4 epoll Q6/A6 and then consider what happens
when using pipes with eventmachine.
(As is often done when communicating with a subprocess)

The pipes end up looking like:

ls -l /proc/<pid>/fd
...
lr-x------ 1 root root 64 2011-08-19 21:31 3 -> pipe:[940970]
l-wx------ 1 root root 64 2011-08-19 21:31 4 -> pipe:[940970]

This meets the critera from man 4 epoll Q6/A4 for not
removing fds from epoll event queues until all fds
that reference the underlying file have been removed.

If the EventableDescriptor associated with fd 3 is deleted,
its dtor will call EventableDescriptor::Close(),
which will call ::close(int fd).

However, unless the EventableDescriptor associated with fd 4 is
also deleted before the next call to epoll_wait, events may fire
for fd 3 that were registered with an already deleted
EventableDescriptor.

Therefore, it is necessary to notify EventMachine that
when an EventableDescriptor is closing so that it can
remove it from the event loop.
3db64d9
@pbozeman

This fixes an issue that we could repro fairly consistently in Cloud Foundry. One of our components that involves a lot of subprocess management was segfaulting in production. We repro'd the issue with a stand along test in a QA environment and we would get a segfault about every 5 to 15 minutes. After running with a custom EM build using the commit above, I've run for ~6 hours with no segfaults.

@tmm1

Awesome, thanks for tracking this down. I've seen this behavior as well- the socket is technically still open in any forks of the EM process, so events continue to come through.

@tmm1 tmm1 merged commit 46e3bc2 into eventmachine:master Aug 20, 2011
@pbozeman
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment