Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Selectors module for more robust and efficient syscalls #7882

Merged
merged 9 commits into from Jun 13, 2019

Conversation

Projects
None yet
4 participants
@Eric-Arellano
Copy link
Contributor

commented Jun 12, 2019

Problem

Twitter was running into the error filedescriptor out of range in select() when compiling a large number of targets, per #7880.

In the process, we found that it is more efficient and robust to use the more modern syscalls of epoll(), devpoll(), and kqueue() than select(). Python 3.4 introduced the Selectors module to choose the best syscall available for us.

This will close #7880.

Solution

For Python 3 code, use the new selectors library. Python 2 code stays the same as before, since the library does not exist for it and we are soon going to drop Python 2.

Eric-Arellano added some commits Jun 12, 2019

Use selectors for NailgunExecutor
Going to get this file working first, then move on to nailgun_io.py and socket.py.
Use selectors for socket.py
We no longer need safe_select()!

Required tweaking a test mock.

@Eric-Arellano Eric-Arellano changed the title WIP: Use Selectors module for more robust and efficient syscalls Use Selectors module for more robust and efficient syscalls Jun 12, 2019

@Eric-Arellano

This comment has been minimized.

Copy link
Contributor Author

commented Jun 12, 2019

Keeping this as a draft for now because I'm going to push this directly to a branch so that we get wheels. I want to run this against Twitter's sandbox, which will cover substantially more Nailgun use cases than our CI covers.

Otherwise, ready for the review.

Punt on nailgun_io.Fix issues with epoll() not supporting normal files
We can instead use PollSelector where this is an issue.
@ity

ity approved these changes Jun 12, 2019

Copy link
Contributor

left a comment

lgtm!

Show resolved Hide resolved src/python/pants/java/nailgun_executor.py Outdated
def test_recv_check_calls(self, mock_select):
mock_select.return_value = ([1], [], [])
@mock.patch('selectors.DefaultSelector' if PY3 else 'select.select', **PATCH_OPTS)
def test_recv_check_calls(self, mock_selector):

This comment has been minimized.

Copy link
@ity

ity Jun 12, 2019

Contributor

thanks for adding the test!

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Jun 12, 2019

Author Contributor

To clarify, I didn't add any new tests—only modified the original to work with selectors instead of select.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2019

Is it at all feasible to reproduce this error reliably? How often does this show up in Twitter CI?

@ity ity marked this pull request as ready for review Jun 12, 2019

@Eric-Arellano
Copy link
Contributor Author

left a comment

How often does this show up in Twitter CI?

It causes 87 failing targets in Twitter's sandbox, and it had done this in the prior sandbox run so it's a consistent issue. See https://docs.google.com/spreadsheets/d/1jhE1rnj5Q34wzlML5fRPtbvl8_qDT6WwHcD4zDbQ330/edit#gid=0 (requires Twitter gmail).

Is it at all feasible to reproduce this error reliably?

Probably not in Pants, because we don't have enough targets to get the filedescriptor out of range issue here.

Note that before merging anything, I'm going to run this against Twitter's sandbox to confirm it fixes the issue + doesn't introduce new issues.

Show resolved Hide resolved src/python/pants/java/nailgun_executor.py Outdated
def test_recv_check_calls(self, mock_select):
mock_select.return_value = ([1], [], [])
@mock.patch('selectors.DefaultSelector' if PY3 else 'select.select', **PATCH_OPTS)
def test_recv_check_calls(self, mock_selector):

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Jun 12, 2019

Author Contributor

To clarify, I didn't add any new tests—only modified the original to work with selectors instead of select.

@Eric-Arellano

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2019

Confirmed that this fixes Twitter's 87 failures and does not introduce any new failures to Twitter's sandbox!

Ready for merge once this gets more reviews.

@blorente
Copy link
Contributor

left a comment

Looks amazing, thanks!

while 1:
remaining_time = calculate_remaining_time()
possibly_raise_timeout(remaining_time)
events = selector.select(timeout=-1 * remaining_time)

This comment has been minimized.

Copy link
@blorente

blorente Jun 13, 2019

Contributor

So we do this in a couple of places ("select until timeout, if readable do something"), I wonder if we could extract this to a common behaviour, which would hide away the PY3 checks as well, something like:

def check_readable(file_descriptor: int, timeout: int, py3_selector=DefaultSelector):
  if PY3:
    # Use py3_selector for whatever
  else:
    # use select.select

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Jun 13, 2019

Author Contributor

Good idea! I did this for two of the call-sites and it's a big improvement.

I didn't use the new function here, though, because I found it too difficult to work around this section using a while loop and registering the selector before that while loop. The control flow is too divergent for it to make sense rewiring socket.is_readable().

Eric-Arellano added some commits Jun 13, 2019

Extract out socket.is_readable() function
Simplifies some of the call-sites and allows us to add a unit test for this functionality.

Also, in the process I realized we were mocking incorrectly so this fixes the recv test to correctly mock.
@blorente
Copy link
Contributor

left a comment

LGTM, thanks for the changes!

@Eric-Arellano Eric-Arellano merged commit 1048ebc into pantsbuild:master Jun 13, 2019

1 of 2 checks passed

continuous-integration/travis-ci/push The Travis CI build could not complete due to an error
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Eric-Arellano Eric-Arellano deleted the Eric-Arellano:selectors branch Jun 13, 2019

Eric-Arellano added a commit that referenced this pull request Jun 13, 2019

Use Selectors module for more robust and efficient syscalls (#7882)
### Problem
Twitter was running into the error `filedescriptor out of range in select()` when compiling a large number of targets, per #7880.

In the process, we found that it is more efficient and robust to use the more modern syscalls of `epoll()`, `devpoll()`, and `kqueue()` than `select()`. Python 3.4 introduced the Selectors module to choose the best syscall available for us.

This will close #7880.

### Solution
For Python 3 code, use the new selectors library. Python 2 code stays the same as before, since the library does not exist for it and we are soon going to drop Python 2.

cattibrie added a commit to cattibrie/pants that referenced this pull request Jun 19, 2019

Use Selectors module for more robust and efficient syscalls (pantsbui…
…ld#7882)

### Problem
Twitter was running into the error `filedescriptor out of range in select()` when compiling a large number of targets, per pantsbuild#7880.

In the process, we found that it is more efficient and robust to use the more modern syscalls of `epoll()`, `devpoll()`, and `kqueue()` than `select()`. Python 3.4 introduced the Selectors module to choose the best syscall available for us.

This will close pantsbuild#7880.

### Solution
For Python 3 code, use the new selectors library. Python 2 code stays the same as before, since the library does not exist for it and we are soon going to drop Python 2.

blorente added a commit that referenced this pull request Jun 28, 2019

Remove Python 2 select.select() code (#7964)
In #7882, we started using the much more efficient selectors module with Python 3 instead of select.select(). We may now remove the Python 2 code that was still using the original select.select().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.