Add wait_for_termination method to grpc.Server #19299

Merged
merged 9 commits into from
Aug 5, 2019

Conversation

lidizheng
Contributor

@lidizheng lidizheng commented Jun 8, 2019

An attempt to add a "run_forever" like method to grpc.Server.

@lidizheng lidizheng added the kind/enhancement, lang/Python, and release notes: yes labels Jun 8, 2019
Contributor

@gnossen gnossen left a comment

Looks nice! This will be a great improvement! Three high-level thoughts:

  1. This needs to be our new canonical form. No more time.sleep(SECONDS_IN_A_DAY). We need to do a sweep of our example code and documentation, both internally and externally.
  2. This is definitely going to require a cherrypick.
  3. A change in the abstract interfaces got my spidey senses tingling. We've seen trouble in the past with opencensus and grpcio-testing when making this sort of change. I did a cursory look through grpcio-testing and this looks like it should be okay. There might be another use that I've missed though.
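
For readers unfamiliar with the idiom in point 1, here is a minimal sketch contrasting the sleep loop with the blocking call this PR proposes. The helper names `serve_with_sleep_loop` and `serve_with_wait` are made up for illustration, and `server` is assumed to be any object exposing `start`, `stop(grace)`, and `wait_for_termination()`; this is not taken from the gRPC examples themselves.

```python
import time

_ONE_DAY_IN_SECONDS = 60 * 60 * 24


def serve_with_sleep_loop(server, sleep=time.sleep):
    """Older idiom: park the main thread in a sleep loop."""
    server.start()
    try:
        while True:
            sleep(_ONE_DAY_IN_SECONDS)
    except KeyboardInterrupt:
        # Ctrl+C arrives as KeyboardInterrupt; stop without a grace period.
        server.stop(0)


def serve_with_wait(server):
    """Idiom proposed in this PR: block until the server has terminated."""
    server.start()
    server.wait_for_termination()
```

The `sleep` parameter exists only so the loop can be exercised without actually sleeping for a day.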

2) The `__del__` of the server object is invoked.

Args:
grace: A duration of time in seconds or None.
Contributor

Can we define more clearly what grace does?

Contributor Author

The grace variable is removed. Now, the wait_for_termination function accepts no argument.

My original design was to replace our internal API, which has both delay and grace. However, the semantics would become much messier, and it is hard to explain that:

  1. The grace variable set here is not necessarily the source of truth, because another thread can call server.Stop as well;
  2. The delay variable only works in the main thread, so what should it do in other threads?
  3. The semantics of delay are hard to define: which signals should it react to?

src/python/grpcio/grpc/_server.py (outdated)
@lidizheng lidizheng marked this pull request as ready for review June 12, 2019 22:24
Contributor

@gnossen gnossen left a comment

Looks good! Just a couple of nits.

time.sleep(_WAIT_FOR_BLOCKING.total_seconds())

# Invoke manually here, in Python 2 it will be invoked by GC sometime.
server.__del__()
Contributor

I think del server is more conventional.

Contributor Author

del server does not reliably trigger __del__ in Python 2, and won't trigger __del__ in Python 3 unless it drops the very last reference.
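
A small sketch of that distinction, assuming CPython's reference counting (other interpreters may finalize later). `Resource` is a made-up class for illustration and has nothing to do with gRPC's server object.

```python
import gc

finalized = []


class Resource:
    """Hypothetical object that records when its finalizer runs."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        finalized.append(self.name)


first = Resource("server")
alias = first  # a second reference to the same object

del first  # only unbinds the name; 'alias' still keeps the object alive
still_alive = list(finalized)

del alias  # last reference dropped; CPython runs __del__ at this point
gc.collect()
after_last = list(finalized)
```

Under these assumptions `still_alive` stays empty while `after_last` contains the name, which is why `del server` alone cannot stand in for an explicit `server.__del__()` call when other references exist.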

This is an EXPERIMENTAL API.

The wait will not consume computational resources during blocking, and it
will block indefinitely. There are two ways to unblock:
Contributor

Nit: It's a bit contradictory to say it will block indefinitely and then define two conditions under which it will stop blocking. How about "...and it will block until one of the two following conditions is met:"

Contributor Author

Thank you for the advice. This is much better!

@mehrdada
Member

mehrdada commented Jun 13, 2019

Why not return the threading.Event and let the user wait on it however they want?

server.termination_event().wait()

Sounds simpler to implement and with less book-keeping (no arrays and locks etc. needed). Simply create a single Event object for each server in __init__ and save it in server state and call set on it when the server shuts down and call it a day. You don't need to raise exception etc. either. Semantics are fully clear because threading.Event has proper docs.

P.S. There's one valid reason not to expose the event object directly, which is that the user can screw up and set or clear the event. I am not too worried, as this is Python and private APIs are just a suggestion anyway, but if you are worried about that, here's my suggestion:

  • Define wait_for_termination(timeout=None) exactly matching the semantics of threading.Event (even directly link to the doc on threading.Event.wait).
  • Implement it the same way as above, waiting on a single Event instance you create at the beginning, and proxy the arguments directly to that instance's wait method.
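
The two bullets above could be sketched roughly as follows. `ToyServer` is a hypothetical stand-in written for this illustration, not gRPC's actual server class; only `threading.Event` and its `wait`/`set` methods are real APIs here.

```python
import threading


class ToyServer:
    """Hypothetical stand-in for a server; not gRPC's implementation."""

    def __init__(self):
        # One event per server, created up front and shared by all waiters.
        self._termination_event = threading.Event()

    def stop(self):
        # Setting the event wakes every thread blocked in wait_for_termination.
        self._termination_event.set()

    def wait_for_termination(self, timeout=None):
        # Same contract as threading.Event.wait: True if the event is set,
        # False if the timeout expired first.
        return self._termination_event.wait(timeout=timeout)


server = ToyServer()
before_stop = server.wait_for_termination(timeout=0.01)
server.stop()
after_stop = server.wait_for_termination()
```

Because the event stays set after `stop`, a wait on an already stopped server returns immediately instead of needing a special case.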

will block indefinitely. There are two ways to unblock:

1) Calling `stop` on the server in another thread;
2) The `__del__` of the server object is invoked.
Member

@mehrdada mehrdada Jun 13, 2019

__del__ should never appear in the docs. It's an implementation detail and not part of the guaranteed public interface of the object. We can choose to kill __del__ should we want to.

Contributor Author

Agreed. Removed.


with self._state.lock:
    if self._state.stage is _ServerStage.STOPPED:
        raise ValueError('Failed to wait for a stopped server.')
Member

Why raise an exception at all. If we are already stopped it should just return.

Contributor Author

Makes sense. It still aligns with the semantics of blocking until the server is stopped.

@lidizheng
Contributor Author

@mehrdada The first change you suggested might be too much. As you said, the user might abuse the new API. I think the second suggestion is valid: adding a timeout parameter to this function to mirror Event.wait.

New tests added. Docstring updated. Changes reflected in the gRFC.

Member

@mehrdada mehrdada left a comment

Thanks. I may not have been clear about my implementation suggestions, so I wrote them down inline with the code.

@@ -764,7 +764,7 @@ def __init__(self, completion_queue, server, generic_handlers,
         self.interceptor_pipeline = interceptor_pipeline
         self.thread_pool = thread_pool
         self.stage = _ServerStage.STOPPED
-        self.shutdown_events = None
+        self.shutdown_events = []
Member

Change to:

self.termination_event = threading.Event()
self.shutdown_events = [self.termination_event]

Contributor Author

What use case do you see for giving applications access to the actual Event object?

Member

@mehrdada mehrdada Jun 13, 2019

This is not giving applications access to Event; it's just simplifying your implementation (no locks, just a single event to keep track of all waiters). All of this belongs to the _state object, which is private, no?

Contributor Author

That makes sense. Changed.

@@ -959,6 +958,17 @@ def add_secure_port(self, address, server_credentials):
     def start(self):
         _start(self._state)

+    def wait_for_termination(self, timeout=None):
+        termination_event = threading.Event()
Member

Change to:

def wait_for_termination(self, timeout=None):
    return self._state.termination_event.wait(timeout=timeout)

Member

The return is also important, because it lets the user tell whether the timeout expired.
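
To illustrate why the return value matters, here is a caller-side sketch using a bare `threading.Event`, whose `wait` returns `True` when the event is set and `False` when the timeout expires. The function name `describe_wait` is invented for this example.

```python
import threading


def describe_wait(event, timeout):
    """Report whether the wait ended by termination or by timeout."""
    if event.wait(timeout=timeout):
        return "terminated"
    return "timed out"


event = threading.Event()
first = describe_wait(event, timeout=0.01)   # nothing has set the event yet
event.set()
second = describe_wait(event, timeout=0.01)  # event is already set
```

Without the `return` in the proxy method, both calls would yield `None` and the caller could not tell the two outcomes apart.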

Contributor Author

Good catch. This is my oversight.

Member

@mehrdada mehrdada left a comment

Thanks, looks good now. Have you manually tested an example under the new API to ensure it works well with Ctrl+C?

@lidizheng
Contributor Author

@mehrdada Interestingly, Event.wait is only interruptible if the timeout is specified, in both Python 2 and 3. I have manually confirmed this behavior.

https://bugs.python.org/issue35935

@mehrdada
Member

If this interferes with signals in production and examples alike, what's the point of this feature?
Sounds like in practice you're kind of screwed if you use it. We'd just be increasing the API surface for unclear benefits.

@lidizheng
Contributor Author

@mehrdada Alternatively, we can set the default timeout to the maximum timestamp.
https://stackoverflow.com/questions/45704243/what-is-the-value-of-c-pytime-t-in-python

@mehrdada
Member

@lidizheng does that help with Ctrl+C? I assume not. The solution is to do it in a loop with small timeouts, but that sounds bad too. I think you should just axe this feature. Is there a concrete ask here?
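
For reference, the "loop with small timeouts" idea mentioned above might look like the sketch below. The function name `wait_interruptibly` and the `poll_seconds` default are assumptions made for this illustration, and whether such a loop restores Ctrl+C handling on every platform is not verified here.

```python
import threading
import time


def wait_interruptibly(event, timeout=None, poll_seconds=0.1):
    """Wait on `event` in short slices instead of one unbounded wait.

    Returns True if the event was set, False if `timeout` expired first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if deadline is None:
            slice_seconds = poll_seconds
        else:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return event.is_set()
            slice_seconds = min(poll_seconds, remaining)
        # Each short wait returns control to Python code between slices.
        if event.wait(slice_seconds):
            return True
```

The cost is a periodic wakeup of the waiting thread, which is the trade-off objected to in the comment above.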

@lidizheng
Contributor Author

lidizheng commented Jun 14, 2019

@gnossen @mehrdada It does help with CTRL+C. With the timeout set, most of the problems go away.

Does CTRL+C work for Event.wait:

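| Platform | Python Version | With timeout | Without timeout |
|----------|----------------|--------------|-----------------|
| Windows  | 2.7            | Yes          | No              |
| Windows  | 3.7.1          | No           | No              |
| macOS    | 2.7            | Yes          | No              |
| macOS    | 3.7            | Yes          | No              |
| Linux    | 2.7            | Yes          | No              |
| Linux    | 3.6            | Yes          | Yes             |
| Linux    | 3.7            | Yes          | Yes             |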

I would say setting the timeout to a big enough number is a valid workaround for the CPython bug.

Development

Successfully merging this pull request may close these issues.

Python equivalent of C++ server's Wait() method
4 participants