Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An epoll problem in old kernel #309

Closed
unknownzerx opened this issue Jul 20, 2017 · 2 comments
Closed

An epoll problem in old kernel #309

unknownzerx opened this issue Jul 20, 2017 · 2 comments

Comments

@unknownzerx
Copy link

@unknownzerx unknownzerx commented Jul 20, 2017

In old kernel, (I am using 2.6.32-431.el6 on Centos6), when connecting to a invalid ip:port, epoll only gives EPOLLERR|EPOLLHUP instead of EPOLLOUT|EPOLLERR|EPOLLHUP. Since Seastar only concerns about EPOLLOUT|EPOLLIN, engine().net().connect() will hang.

It might be a kernel bug. Should seastar handle this case?

@avikivity

This comment has been minimized.

Copy link
Contributor

@avikivity avikivity commented Jul 20, 2017

It's hard to say if it's a kernel bug or not, since the documentation is silent on the matter. Logically, we should get EPOLLOUT since a write() will not block.

Regardless, I'll be happy to accept a patch that fixes the problem.

@nyh

This comment has been minimized.

Copy link
Contributor

@nyh nyh commented Jul 20, 2017

Very interesting. It would be nice to find a pointer to a Linux commit which changed the poll bits generated on a connect failure, to verify that this behavior existed and later changed (although I guess you are testing, and experimentally verified that...). I agree with Avi that EPOLLOUT should have been there too, but if it's not, it's not...

Neither poll() nor epoll() (see the epoll_ctl() man page for example) require you to add POLLHUP or POLLERR to the requested "events" mask: We will be woken on such events regardless of what we waited for. So let's look at what reactor::posix_connect does:

return pfd->writeable().then([pfd]() mutable {
        auto err = pfd->get_file_desc().getsockopt<int>(SOL_SOCKET, SO_ERROR);
        if (err != 0) {
            throw std::system_error(err, std::system_category());
        }
        return make_ready_future<>();
    });

writable() does

future<> reactor_backend_epoll::writeable(pollable_fd_state& fd) {
    return get_epoll_future(fd, &pollable_fd_state::pollout, EPOLLOUT);
}

The get_epoll_future() call adds the EPOLLOUT to the requested events. This is the correct thing to do - as I explained above we can't add EPOLLHUP or EPOLLERR to there.

I think the problem lies in reactor_backend_epoll::complete_epoll_event() - if we requested just EPOLLOUT we completely ignore the EPOLLERR bit that we get back in the actual event. If we find the EPOLLOUT bit we call complete_epoll_event() to wake up the sleeping promise (the "writable()" call above) - but it should probably also check the EPOLLERR bit, and if found, wake up the promise with an exception. Note that changing complete_epoll_event() is not enough - we would also need to change its caller, reactor_backend_epoll::wait_and_process(), to not not strip EPOLLERR, because right now it does:

        auto events = evt.events & (EPOLLIN | EPOLLOUT);

If think if we make this change, in most cases (or all cases?) the connect code quoted above will get an exception directly from the writable() future - and will not even need to do the getsockopt check.

What I'm afraid though, is the extra EPOLLERR check in the poller hotpath will add unnecessary slowdown on modern Linux, which appears (as you noticed) to always accompany EPOLLERR with EPOLLOUT anyway. But maybe there are additional caseswhere an EPOLLERR could happen without an EPOLLOUT and we're missing those too? In the connect() case we found a workaround to missing this EPOLLERR (using getsockopt()) but maybe there are other cases were it's important we don't ignore an EPOLLERR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.