Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault fix for pipes #239

Merged
merged 1 commit into from Aug 20, 2011
Merged

Segfault fix for pipes #239

merged 1 commit into from Aug 20, 2011

Conversation

pbozeman
Copy link
Contributor

EventMachine relies on the fact that when close(fd)
is called that the fd is removed from any
epoll event queues.

However, this is not always the behavior of close(fd)

See man 4 epoll Q6/A6 and then consider what happens
when using pipes with eventmachine.
(As is often done when communicating with a subprocess)

The pipes end up looking like:

ls -l /proc//fd
...
lr-x------ 1 root root 64 2011-08-19 21:31 3 -> pipe:[940970]
l-wx------ 1 root root 64 2011-08-19 21:31 4 -> pipe:[940970]

This meets the critera from man 4 epoll Q6/A4 for not
removing fds from epoll event queues until all fds
that reference the underlying file have been removed.

If the EventableDescriptor associated with fd 3 is deleted,
its dtor will call EventableDescriptor::Close(),
which will call ::close(int fd).

However, unless the EventableDescriptor associated with fd 4 is
also deleted before the next call to epoll_wait, events may fire
for fd 3 that were registered with an already deleted
EventableDescriptor.

Therefore, it is necessary to notify EventMachine that
when an EventableDescriptor is closing so that it can
remove it from the event loop.

EventMachine relies on the fact that when close(fd)
is called that the fd is removed from any
epoll event queues.

However, this is not *always* the behavior of close(fd)

See man 4 epoll Q6/A6 and then consider what happens
when using pipes with eventmachine.
(As is often done when communicating with a subprocess)

The pipes end up looking like:

ls -l /proc/<pid>/fd
...
lr-x------ 1 root root 64 2011-08-19 21:31 3 -> pipe:[940970]
l-wx------ 1 root root 64 2011-08-19 21:31 4 -> pipe:[940970]

This meets the critera from man 4 epoll Q6/A4 for not
removing fds from epoll event queues until all fds
that reference the underlying file have been removed.

If the EventableDescriptor associated with fd 3 is deleted,
its dtor will call EventableDescriptor::Close(),
which will call ::close(int fd).

However, unless the EventableDescriptor associated with fd 4 is
also deleted before the next call to epoll_wait, events may fire
for fd 3 that were registered with an already deleted
EventableDescriptor.

Therefore, it is necessary to notify EventMachine that
when an EventableDescriptor is closing so that it can
remove it from the event loop.
@pbozeman
Copy link
Contributor Author

This fixes an issue that we could repro fairly consistently in Cloud Foundry. One of our components that involves a lot of subprocess management was segfaulting in production. We repro'd the issue with a stand along test in a QA environment and we would get a segfault about every 5 to 15 minutes. After running with a custom EM build using the commit above, I've run for ~6 hours with no segfaults.

@tmm1
Copy link
Contributor

tmm1 commented Aug 20, 2011

Awesome, thanks for tracking this down. I've seen this behavior as well- the socket is technically still open in any forks of the EM process, so events continue to come through.

tmm1 added a commit that referenced this pull request Aug 20, 2011
@tmm1 tmm1 merged commit 46e3bc2 into eventmachine:master Aug 20, 2011
@pbozeman
Copy link
Contributor Author

no problem

On Fri, Aug 19, 2011 at 8:58 PM, tmm1 <
reply@reply.github.com>wrote:

Awesome, thanks for tracking this down. I've seen this behavior as well-
the socket is technically still open in any forks of the EM process, so
events continue to come through.

Reply to this email directly or view it on GitHub:
#239 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants