Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add experimental option to run unary-stream RPCs on a single Python thread. #20753

Merged
merged 26 commits into from Oct 23, 2019

Conversation

gnossen
Copy link
Contributor

@gnossen gnossen commented Oct 23, 2019

Benchmarking revealed that context switching and lock contention led to a considerable amount of overhead in executing RPCs taking the "integrated" code path, which used multiple threads.

Overall, this PR has been shown to consistently decrease end-to-end latency of unary-stream RPCs by around 6.5% on Intel x86-64 workstations. Additional testing involving pinned cores and disabling Intel Turbo revealed that, under non-ideal thermal conditions, this win may degrade to around 2.5%.

An experimental channel option consumed only by the Python layer and not passed to core has been added to activate the single-threaded code path, since it cannot fulfill the grpc.Future interface.

This change was tested both in OSS using bazel and in the Google codebase with Blaze, using various sanitizers, including ASAN and MSAN. I was originally concerned that the change to channel.pyx.pxi would result in a memory leak, but this was not born out by testing.

At the moment, testing this code path is manual, since activating it across the entire test suite would require manual instrumentation and parametrization of every existing unit test. I have filed an issue to track this work.

This PR resolves #20741.

Copy link
Contributor

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good! One major thing I think can improve is about the regression of initial_metadata.

The idea is that if the info that the application is asking is not ready yet, then donate the calling
You can change the condition.wait into trying to get an event, and handle it. If response event propagated earlier, the RPCState will cache it. In that way, the _SingleThreadedRendezvous can be truly singled threaded.

src/python/grpcio/grpc/__init__.py Outdated Show resolved Hide resolved
src/python/grpcio/grpc/_channel.py Show resolved Hide resolved
src/python/grpcio/grpc/_channel.py Show resolved Hide resolved
src/python/grpcio/grpc/_channel.py Outdated Show resolved Hide resolved
src/python/grpcio/grpc/_channel.py Show resolved Hide resolved
def _done():
return self._state.initial_metadata is not None

_common.wait(self._state.condition.wait, _done)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to change the condition.wait to do other things that either move the RPC forward or notify application that the info is not ready.

Applies to all condition.wait in _SingleThreadedRendezvous.

Copy link
Contributor Author

@gnossen gnossen Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes some of the current assumptions in the code. At the very least, we'd have to buffer message events dequeued by the attribute methods here so that they can later be produced by the iterator. I don't view this as a blocker for this PR, considering the use cases of our target user. Adding as a TODO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's link the issue here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Don't forget to link issues in comments. Also, this PR may need a cherry-pick.

src/python/grpcio/grpc/__init__.py Outdated Show resolved Hide resolved
def _done():
return self._state.initial_metadata is not None

_common.wait(self._state.condition.wait, _done)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's link the issue here

@gnossen
Copy link
Contributor Author

gnossen commented Oct 23, 2019

The TODO was added just a few lines down.

@gnossen
Copy link
Contributor Author

gnossen commented Oct 23, 2019

@gnossen gnossen merged commit 018580f into grpc:master Oct 23, 2019
@gnossen gnossen deleted the unary_stream branch October 23, 2019 22:19
@lock lock bot locked as resolved and limited conversation to collaborators Jan 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve Python Server-Side Streaming Performance
3 participants