Python Grpc client service method does not return when called from httpd on Windows #13050

boumarc1 · 2017-10-18T16:05:34Z

What version of gRPC and what language are you using?

Python gRPC client executed from wsgi module of Apache httpd
I tested grpcio packages from 1.1.0 to 1.4.4 inclusive

What operating system (Linux, Windows, …) and version?

Windows server 2012

What runtime / compiler are you using (e.g. python version or version of gcc)

Tested grpcio packages from 1.1.0 to 1.4.4 inclusive on Python 2.7

What did you do?

When trying to connect to a gRPC server from a simple Python gRPC client launched from httpd with mod_wsgi, the process freezes during initialization. It really looks like a race condition, because enabling all logs seems to slow or lock the process long enough to make it work sometimes. The same code works fine when run standalone.

What did you expect to see?

I would expect it to carry on.

What did you see instead?

The call never returns.

Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).

NotWorking.txt
Working.txt

Anything else we should know about your project / environment?

mehrdada · 2017-10-19T05:54:54Z

I'm not familiar with Apache mod_wsgi. Does it fork the process by any chance? If so, interplay of gRPC and fork is a tricky domain and there are known issues.

boumarc1 · 2017-10-19T14:38:38Z

On Windows, the mod_wsgi module runs inside the httpd process. The "daemon" mode is not available. So in our case, whenever the gRPC client makes the call, the server is blocked.

http://modwsgi.readthedocs.io/en/develop/user-guides/processes-and-threading.html

mehrdada · 2017-10-19T17:19:43Z

Can you please give us a minimal test case that reproduces the problem?

boumarc1 · 2017-10-19T18:36:10Z

I can't paste the company source code here, so I rewrote it. It is almost the simplest case possible, based on the documentation sample. I believe the tedious part is to set up and configure Apache and mod_wsgi and make it call the client code. When the web server invokes the gRPC client (list_devices in this case), the call never returns. The gRPC server code doesn't matter because it is never invoked.

import grpc

import DeviceService_pb2

def get_config_stub(settings):
    channel = grpc.insecure_channel(settings.GRPC_CHANNEL)
    stub = DeviceService_pb2.ConfigurationStub(channel)
    return stub

def list_devices(settings, model):
    request_filter = DeviceService_pb2.Filter()
    request_filter.model = model
    stub = get_config_stub(settings)
    devices = stub.ListDevices(request_filter)
    return devices

syntax = "proto3";

package DeviceService;

message Filter
{
    string model = 1;
}

message Device
{
    string id = 1;
    string model = 2;
}

message Devices
{
    repeated Device items = 1;
}

service Configuration
{
    rpc ListDevices(Filter) returns (Devices);
}

mehrdada · 2017-10-19T19:54:17Z

Thanks. This is great. I'll try setting up a Windows machine to reproduce. Can you point me to an Apache httpd/mod_wsgi setup tutorial on Windows that would get me close to the configuration you have?

boumarc1 · 2017-10-19T20:31:44Z

In fact, I wasn't involved in the web server setup, but I'll try to get what I can. I worked on the gRPC service, which works pretty well when the client reaches it.

mehrdada · 2017-10-20T19:49:37Z

Is there a way for you to set the GRPC_TRACE=api GRPC_VERBOSITY=debug (not =all) environment variables for the process hosting gRPC to get some filtered trace information?
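For readers in the same situation: since httpd hosts the Python interpreter, one way to apply these settings may be to set them from the WSGI entry script before anything imports grpc. The gRPC core variables are spelled GRPC_TRACE and GRPC_VERBOSITY. The helper name below is hypothetical, and the sketch assumes the gRPC core reads the variables when it initializes, which should be confirmed for the grpcio version in use.

```python
import os


def enable_grpc_tracing(environ=os.environ, trace="api", verbosity="debug"):
    """Set the gRPC core logging variables on the given mapping.

    Hypothetical helper: call it before the first `import grpc` so the
    values are already in place when the gRPC core initializes.
    """
    environ["GRPC_TRACE"] = trace
    environ["GRPC_VERBOSITY"] = verbosity
    return environ


# A plain dict is used here so the sketch has no side effects; a WSGI
# script would call enable_grpc_tracing() with no arguments instead.
env = enable_grpc_tracing({})
```

Setting the variables in the httpd service environment instead should have the same effect, assuming the process that loads grpc inherits them.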

mehrdada · 2017-10-20T19:56:19Z

Never mind... I did some fancy grepping to extract the info :)

boumarc1 · 2017-10-23T18:43:59Z

Our setup uses Django with mod_wsgi on Apache httpd. It is based on the Django documentation:
https://docs.djangoproject.com/en/1.11/howto/deployment/wsgi/modwsgi

In case it helps, here are the versions we're using:
Python 2.7.13
Apache Httpd 2.4.27
mod_wsgi 4.5.17
django 1.11.2

boumarc1 · 2017-10-26T20:46:37Z

Looking at the NotWorking log, the service appears to freeze while using the CompletionQueue, which on Windows relies on the I/O Completion Ports API. Both the gRPC and httpd implementations create a completion port through that API, using all available threads. The issue may happen because mod_wsgi runs inside the httpd process, so httpd and our gRPC client both create a completion port in that same process, and both are jammed. If I am right, it should be possible to narrow the problem down to that case.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365198(v=vs.85).aspx
Threads and Concurrency
The most important property of an I/O completion port to consider carefully is the concurrency value. The concurrency value of a completion port is specified when it is created with CreateIoCompletionPort via the NumberOfConcurrentThreads parameter. This value limits the number of runnable threads associated with the completion port. When the total number of runnable threads associated with the completion port reaches the concurrency value, the system blocks the execution of any subsequent threads associated with that completion port until the number of runnable threads drops below the concurrency value.

nicolasnoble · 2017-10-26T23:32:00Z

So, the way I think it's supposed to work is that each Completion Port has its own threadpool, and the value we specify on the creation is the one that is attached specifically for that Completion Port. These threads are created by the kernel, for the purpose of this Completion Port, and will only be used by the kernel to, well, execute the operations you're asking. Then it's possible to max out these threads by filling the completion queue with operations, which is in fact kind of the ideal scenario. If your system is busy enough that all of the threads in the threadpool are doing things, without growing the queue itself, then it's the best optimized scenario.

In all cases, two scenarios can occur then:

- Each completion port has its own threadpool attached, in which case one shouldn't block the other.
- All completion ports share the same threadpool, which can then suffer high contention if there aren't enough threads, but then the worst-case scenario is "things get slow", as the threadpool struggles to keep up with the demand.

I think the most likely scenario, however, is that you see yourself blocked in the Completion Port because there's simply nothing happening.

In fact, I would question the threading model you are using, because by definition, if you have two event loops that are going to block on polling, they can't work with each other, since they are going to prevent each other from working. The "race condition" at that point wouldn't be thread-related, but rather event-related, where you have different sets of events sent to two different completion queues, and sometimes it'll work, sometimes they'll be stuck waiting forever. The smoking gun here would be the "slowness" to start, which would be caused by completion port 1 reaching a timeout state waiting for events, going outside, reaching into completion port 2, which then manages to trigger some events.
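The starvation described above can be illustrated with a small analogy. This is not IOCP code and not how gRPC or httpd are implemented: two Python queues stand in for the two completion ports, and the function name is hypothetical. An event already waiting on the second queue is only seen after the blocking wait on the first one times out.

```python
import queue
import time


def poll_two_loops(loop_a, loop_b, timeout_a):
    """Block on loop_a first, then look at loop_b.

    Returns (event, seconds_waited). Analogy only: the queues stand in
    for two completion ports polled from the same thread.
    """
    start = time.monotonic()
    try:
        event = loop_a.get(timeout=timeout_a)
        return event, time.monotonic() - start
    except queue.Empty:
        pass
    # Only reached once the wait on loop_a has timed out.
    event = loop_b.get_nowait()
    return event, time.monotonic() - start


loop_a = queue.Queue()
loop_b = queue.Queue()
loop_b.put("rpc-completed")  # ready immediately, but on the "other" loop

event, waited = poll_two_loops(loop_a, loop_b, timeout_a=0.2)
print(event, "seen after about", round(waited, 2), "seconds")
```

With no timeout on the first wait, the ready event would never be picked up, which matches the "stuck waiting forever" case.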

mehrdada · 2017-10-30T22:42:32Z

In light of what @nicolasnoble documented above, I'd like to point out that having two "event managers" in the same process (e.g. gRPC + gevent) is not yet supported by gRPC Python and this situation categorically falls under the same sort of issues. We are actively thinking about at least some of these scenarios, like gevent that I mentioned, but have not yet decided on a path forward.

boumarc1 · 2017-10-31T19:34:57Z

Thanks for your support.

@nicolasnoble I agree with you about our problematic threading model. When I first made the C++ gRPC service, I didn't expect the gRPC client (the Python request handlers) would be in the Apache httpd process. Even then, I didn't expect a gRPC client request could interfere with the Apache server. If mod_wsgi could be run in daemon mode, this would not be an issue, but the Windows implementation doesn't support it. I think, however, that the limitation should be mentioned in the documentation.

At this point, I rewrote the request handlers to spawn a Python "gRPC client proxy" that exchanges JSON requests and responses through standard streams, but this is an ugly solution.
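A minimal sketch of what such a proxy could look like, for anyone needing the same workaround. This is not the code described above: the names are hypothetical, the child program returns a canned reply instead of calling gRPC, and it uses subprocess.run, which requires Python 3.5 or later (the setup in this issue is Python 2.7, where subprocess.Popen with communicate() would be the equivalent).

```python
import json
import subprocess
import sys

# Hypothetical child program. A real proxy would import grpc here, call
# stub.ListDevices(...), and convert the reply to plain dicts. This one
# echoes a canned reply so the sketch runs without a gRPC server.
CHILD_SOURCE = r"""
import json, sys
request = json.load(sys.stdin)
reply = {"items": [{"id": "dev-1", "model": request["model"]}]}
json.dump(reply, sys.stdout)
"""


def call_via_proxy(request, timeout=10.0):
    """Run the child in its own process; JSON goes in on stdin, out on stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=json.dumps(request).encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.decode("utf-8", "replace"))
    return json.loads(proc.stdout.decode("utf-8"))


devices = call_via_proxy({"model": "X100"})
```

Because the gRPC client lives in a separate process, its event loop no longer shares a process with httpd. The cost is one process spawn per request.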

Again, thanks for your help.

nathanielmanistaatgoogle added lang/Python platform/Windows labels Oct 18, 2017

nathanielmanistaatgoogle assigned mehrdada Oct 18, 2017

mehrdada added disposition/requires reporter action kind/bug labels Oct 19, 2017

mehrdada removed the disposition/requires reporter action label Oct 19, 2017

mehrdada added kind/enhancement priority/Needs Prioritization and removed kind/bug labels Oct 31, 2017

mehrdada closed this as completed Nov 8, 2017

lock bot locked as resolved and limited conversation to collaborators Oct 1, 2018
