Use Selectors module for more robust and efficient syscalls #7882
Going to get this file working first, then move on to nailgun_io.py and socket.py.
We no longer need safe_select()! Required tweaking a test mock.
Keeping this as a draft for now because I'm going to push this directly to a branch so that we get wheels. I want to run this against Twitter's sandbox, which covers substantially more Nailgun use cases than our CI does. Otherwise, ready for review.
We can instead use PollSelector where this is an issue.
lgtm!
def test_recv_check_calls(self, mock_select):
  mock_select.return_value = ([1], [], [])
@mock.patch('selectors.DefaultSelector' if PY3 else 'select.select', **PATCH_OPTS)
def test_recv_check_calls(self, mock_selector):
thanks for adding the test!
To clarify, I didn't add any new tests; I only modified the original to work with `selectors` instead of `select`.
Is it at all feasible to reproduce this error reliably? How often does this show up in Twitter CI?
> How often does this show up in Twitter CI?

It causes 87 failing targets in Twitter's sandbox, and it did the same in the prior sandbox run, so it's a consistent issue. See https://docs.google.com/spreadsheets/d/1jhE1rnj5Q34wzlML5fRPtbvl8_qDT6WwHcD4zDbQ330/edit#gid=0 (requires Twitter gmail).

> Is it at all feasible to reproduce this error reliably?

Probably not in Pants, because we don't have enough targets to hit the `filedescriptor out of range` issue here.
Note that before merging anything, I'm going to run this against Twitter's sandbox to confirm it fixes the issue + doesn't introduce new issues.
Confirmed that this fixes Twitter's 87 failures and does not introduce any new failures to Twitter's sandbox! Ready for merge once this gets more reviews.
Looks amazing, thanks!
while 1:
  remaining_time = calculate_remaining_time()
  possibly_raise_timeout(remaining_time)
  events = selector.select(timeout=-1 * remaining_time)
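For readers unfamiliar with the pattern in this snippet, here is a minimal, self-contained sketch of a deadline-bounded read loop that registers the selector once before the loop and then selects inside it. It is not the Pants implementation: `recv_until_deadline` is a hypothetical name, and it computes the remaining time as a positive number of seconds, which may differ from the sign convention of `calculate_remaining_time()` in the snippet.

```python
import selectors
import socket
import time


def recv_until_deadline(sock, deadline, bufsize=4096):
    """Read one chunk from sock, or raise TimeoutError once deadline has passed.

    `deadline` is an absolute time.monotonic() value (an assumption of this sketch).
    """
    with selectors.DefaultSelector() as selector:
        # Register once, before the loop; only the timeout changes per iteration.
        selector.register(sock, selectors.EVENT_READ)
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("deadline passed before data arrived")
            # select() returns a list of (key, events) pairs; empty means timeout.
            if selector.select(timeout=remaining):
                return sock.recv(bufsize)


if __name__ == "__main__":
    left, right = socket.socketpair()
    try:
        right.sendall(b"hello")
        print(recv_until_deadline(left, time.monotonic() + 1.0))
    finally:
        left.close()
        right.close()
```

Recomputing the remaining time on every iteration keeps the overall deadline fixed even if `select()` wakes up early.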
We do this in a couple of places ("select until timeout; if readable, do something"). I wonder if we could extract this into a common helper, which would also hide the `PY3` checks. Something like:
def check_readable(file_descriptor: int, timeout: int, py3_selector=DefaultSelector):
  if PY3:
    # Use py3_selector for whatever
  else:
    # use select.select
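A runnable version of this suggestion might look like the sketch below. It is an illustration under assumptions, not the helper that was eventually merged: the `PY3` flag is derived here from `sys.version_info`, and the function accepts any object with a `fileno()` rather than a bare integer.

```python
import select
import socket
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    import selectors


def check_readable(fileobj, timeout):
    """Return True if fileobj becomes readable within `timeout` seconds."""
    if PY3:
        # DefaultSelector uses the most efficient mechanism on this platform.
        with selectors.DefaultSelector() as selector:
            selector.register(fileobj, selectors.EVENT_READ)
            return bool(selector.select(timeout=timeout))
    # Python 2 fallback: plain select.select().
    readable, _, _ = select.select([fileobj], [], [], timeout)
    return bool(readable)


if __name__ == "__main__":
    left, right = socket.socketpair()
    try:
        print(check_readable(left, 0))  # nothing has been written yet
        right.sendall(b"ping")
        print(check_readable(left, 1))
    finally:
        left.close()
        right.close()
```

Creating and closing the selector inside the helper keeps call-sites to a single line, at the cost of re-registering the file object on every call.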
Good idea! I did this for two of the call-sites and it's a big improvement.
I didn't use the new function here, though, because this section registers the selector before a while loop and then selects inside that loop, which I found too difficult to work around. The control flow is too divergent for rewiring it through `socket.is_readable()` to make sense.
Simplifies some of the call-sites and allows us to add a unit test for this functionality. Also, in the process I realized we were mocking incorrectly so this fixes the recv test to correctly mock.
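As an illustration of what mocking a selector in a unit test can look like, here is a small sketch. It is not the test from this PR: `is_readable` and `run_mocked_check` are hypothetical stand-ins, and the patch target assumes the code under test looks up `selectors.DefaultSelector` at call time.

```python
import selectors
from unittest import mock


def is_readable(fileobj, timeout=0.1):
    """Hypothetical helper under test: True if fileobj is readable in time."""
    selector = selectors.DefaultSelector()
    try:
        selector.register(fileobj, selectors.EVENT_READ)
        return bool(selector.select(timeout=timeout))
    finally:
        selector.close()


def run_mocked_check(ready_events):
    """Call is_readable() with DefaultSelector replaced by a mock."""
    fake_fileobj = object()
    with mock.patch("selectors.DefaultSelector") as mock_selector_cls:
        # Calling the patched class returns this instance mock.
        mock_selector = mock_selector_cls.return_value
        mock_selector.select.return_value = ready_events
        result = is_readable(fake_fileobj, timeout=0.5)
        # Verify the helper drove the selector the way we expect.
        mock_selector.register.assert_called_once_with(
            fake_fileobj, selectors.EVENT_READ
        )
        mock_selector.select.assert_called_once_with(timeout=0.5)
        mock_selector.close.assert_called_once_with()
    return result


if __name__ == "__main__":
    print(run_mocked_check([(mock.Mock(), selectors.EVENT_READ)]))
    print(run_mocked_check([]))
```

The key detail is that the return value is set on the instance (`mock_selector_cls.return_value`), not on the patched class itself, since the code under test calls methods on the instantiated selector.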
LGTM, thanks for the changes!
### Problem
Twitter was running into the error `filedescriptor out of range in select()` when compiling a large number of targets, per #7880.
In the process, we found that it is more efficient and robust to use the more modern syscalls of `epoll()`, `devpoll()`, and `kqueue()` than `select()`. Python 3.4 introduced the Selectors module to choose the best syscall available for us.
This will close #7880.
### Solution
For Python 3 code, use the new selectors library. Python 2 code stays the same as before, since the library does not exist for it and we are soon going to drop Python 2.
In #7882, we started using the much more efficient selectors module with Python 3 instead of select.select(). We may now remove the Python 2 code that was still using the original select.select().
### Problem
Twitter was running into the error `filedescriptor out of range in select()` when compiling a large number of targets, per #7880.
In the process, we found that it is more efficient and robust to use the more modern syscalls of `epoll()`, `devpoll()`, and `kqueue()` than `select()`. Python 3.4 introduced the Selectors module to choose the best syscall available for us.
This will close #7880.
### Solution
For Python 3 code, use the new selectors library. Python 2 code stays the same as before, since the library does not exist for it and we are soon going to drop Python 2.
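To make the approach described above concrete, the following sketch waits on several sockets with `selectors.DefaultSelector`. It is an illustration only, not code from this PR, and `readable_sockets` is a hypothetical name. For background, `select.select()` rejects file descriptors at or above the platform's `FD_SETSIZE` (commonly 1024), which is where the `filedescriptor out of range in select()` error comes from; the epoll- and kqueue-based selectors do not have that limit.

```python
import selectors
import socket


def readable_sockets(socks, timeout):
    """Return the sockets from `socks` that are readable within `timeout` seconds."""
    with selectors.DefaultSelector() as selector:
        for sock in socks:
            selector.register(sock, selectors.EVENT_READ)
        # Each result is a (SelectorKey, events) pair; key.fileobj is the socket.
        return [key.fileobj for key, _events in selector.select(timeout=timeout)]


if __name__ == "__main__":
    # DefaultSelector is an alias for the best implementation on this platform,
    # for example EpollSelector on Linux or KqueueSelector on macOS and BSD.
    print(selectors.DefaultSelector.__name__)

    a, b = socket.socketpair()
    c, d = socket.socketpair()
    try:
        b.sendall(b"x")  # only `a` has data waiting
        print(readable_sockets([a, c], 1) == [a])
    finally:
        for s in (a, b, c, d):
            s.close()
```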