Python Grpc client service method does not return when called from httpd on Windows #13050

Closed
boumarc1 opened this issue Oct 18, 2017 · 13 comments

Comments

@boumarc1

boumarc1 commented Oct 18, 2017

What version of gRPC and what language are you using?

Python gRPC client executed from wsgi module of Apache httpd
I tested grpcio packages from 1.1.0 to 1.4.4 inclusive

What operating system (Linux, Windows, …) and version?

Windows server 2012

What runtime / compiler are you using (e.g. python version or version of gcc)

Tested grpcio packages from 1.1.0 to 1.4.4 inclusive on Python 2.7

What did you do?

When trying to connect to a Grpc server from a simple Python Grpc client launched from httpd with mod_wsgi, the process freezes during initialization. It looks like a race condition, because enabling all logs seems to slow the process down enough to make it work sometimes. The same code works fine when launched on its own.

What did you expect to see?

I would expect it to carry on.

What did you see instead?

It does not return from the call.

Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).

NotWorking.txt
Working.txt

Anything else we should know about your project / environment?

@mehrdada
Member

I'm not familiar with Apache mod_wsgi. Does it fork the process by any chance? If so, interplay of gRPC and fork is a tricky domain and there are known issues.

@boumarc1
Author

The mod_wsgi module runs in httpd process on Windows. The "daemon" mode is not available. So in our case, whenever the call is made from the Grpc Client, the server is blocked.

http://modwsgi.readthedocs.io/en/develop/user-guides/processes-and-threading.html

@mehrdada
Member

Can you please give us a minimal test case that reproduces the problem?

@boumarc1
Author

I can't paste the company source code here, so I rewrote it. It is almost the simplest possible case, based on the documentation sample. I believe the tedious part is setting up and configuring Apache and mod_wsgi and making it call the client code. When the web server calls the Grpc client (list_devices in this case), the call never returns. The Grpc server code doesn't matter because it is never invoked.

import grpc

import DeviceService_pb2

def get_config_stub(settings):
    channel = grpc.insecure_channel(settings.GRPC_CHANNEL)
    stub = DeviceService_pb2.ConfigurationStub(channel)
    return stub

def list_devices(settings, model):
    request_filter = DeviceService_pb2.Filter()
    request_filter.model = model
    stub = get_config_stub(settings)
    devices = stub.ListDevices(request_filter)
    return devices

syntax = "proto3";

package DeviceService;

message Filter
{
    string model = 1;
}

message Device
{
    string id = 1;
    string model = 2;
}

message Devices
{
    repeated Device items = 1;
}

service Configuration
{
    rpc ListDevices(Filter) returns (Devices);
}
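
A hang like this is easier to diagnose when the call cannot block forever. As a minimal sketch, not something tried in this thread: gRPC Python stub methods accept a `timeout` argument in seconds, so the `ListDevices` call from the client above could be given a deadline. `FakeStub` and `list_devices_with_deadline` are hypothetical names used so the snippet runs without a server, and whether the deadline actually fires inside httpd on Windows is an untested assumption.

```python
class FakeStub(object):
    """Hypothetical stand-in for DeviceService_pb2.ConfigurationStub."""

    def __init__(self):
        self.last_timeout = None

    def ListDevices(self, request, timeout=None):
        # A real stub would raise grpc.RpcError with DEADLINE_EXCEEDED
        # if the server did not answer in time.
        self.last_timeout = timeout
        return []


def list_devices_with_deadline(stub, request_filter, timeout_seconds=5.0):
    """Call ListDevices with a deadline instead of waiting indefinitely."""
    return stub.ListDevices(request_filter, timeout=timeout_seconds)


if __name__ == "__main__":
    stub = FakeStub()
    print(list_devices_with_deadline(stub, request_filter=None))
    print(stub.last_timeout)
```

With the real stub, the caller would wrap the call in `try`/`except grpc.RpcError` and log the status code, which at least turns a silent freeze into a visible error if the deadline is honored.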

@mehrdada
Member

Thanks. This is great. I'll try setting up a Windows machine to reproduce this. Can you point me to an Apache httpd/mod_wsgi setup tutorial for Windows that would get me close to your configuration?

@boumarc1
Author

In fact, I wasn't involved in the web server setup, but I'll try to get what I can. I worked on the Grpc service, which works well when the client reaches it.

@mehrdada
Member

mehrdada commented Oct 20, 2017

Is there a way for you to set the GRPC_TRACE=api and GRPC_VERBOSITY=debug (not =all) environment variables for the process hosting gRPC to get some filtered trace information?
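
One possible way to do this from the Python side is sketched below. It assumes the variables can be set at the top of the WSGI script before `grpc` is imported for the first time; `enable_grpc_tracing` is a hypothetical helper name, and setting the variables in the environment of the httpd service itself may be the more reliable route.

```python
import os


def enable_grpc_tracing(environ=os.environ):
    """Request filtered gRPC trace output (API tracer, debug verbosity)."""
    environ["GRPC_VERBOSITY"] = "debug"
    environ["GRPC_TRACE"] = "api"
    return environ


if __name__ == "__main__":
    enable_grpc_tracing()
    # Assumption: grpc must be imported only after the variables are set.
    # import grpc
    print(os.environ["GRPC_TRACE"], os.environ["GRPC_VERBOSITY"])
```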

@mehrdada
Member

Never mind... I did some fancy grepping to extract the info :)

@boumarc1
Author

Our setup uses Django with mod_wsgi on Apache httpd. It is based on the Django documentation:
https://docs.djangoproject.com/en/1.11/howto/deployment/wsgi/modwsgi

In case it helps, here are the versions we're using:
Python 2.7.13
Apache Httpd 2.4.27
mod_wsgi 4.5.17
django 1.11.2
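
For a reproduction without Django, a plain WSGI script is probably the smallest thing mod_wsgi can host. The sketch below is an assumption about what such a script could look like, not the Django setup described above: `make_application` and `fake_list_devices` are hypothetical names, and in a real test the fake would be replaced by the `list_devices` client call shown earlier.

```python
def make_application(list_devices):
    """Build a WSGI callable that runs `list_devices` inside the web server process."""

    def application(environ, start_response):
        # Under mod_wsgi on Windows this runs inside the httpd process,
        # so the gRPC call would be made from there as well.
        body = repr(list_devices()).encode("utf-8")
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]

    return application


def fake_list_devices():
    # Hypothetical placeholder; swap in the real gRPC client call to reproduce.
    return ["device-1"]


# mod_wsgi looks for a module-level callable named `application`.
application = make_application(fake_list_devices)
```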

@boumarc1
Author

Looking at the NotWorking log, the service appears to freeze when using the CompletionQueue, which on Windows is built on the I/O Completion Ports API. Both the Grpc and httpd implementations create a completion port through that API, using all available threads. It is possible that the issue happens because mod_wsgi runs in the httpd process, so httpd and our Grpc client both create a completion port in that same process, and both are jammed. If I am right, it would be possible to narrow the problem down to that case.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365198(v=vs.85).aspx
Threads and Concurrency
The most important property of an I/O completion port to consider carefully is the concurrency value. The concurrency value of a completion port is specified when it is created with CreateIoCompletionPort via the NumberOfConcurrentThreads parameter. This value limits the number of runnable threads associated with the completion port. When the total number of runnable threads associated with the completion port reaches the concurrency value, the system blocks the execution of any subsequent threads associated with that completion port until the number of runnable threads drops below the concurrency value.

@nicolasnoble
Member

So, the way I think it's supposed to work is that each Completion Port has its own threadpool, and the value we specify on the creation is the one that is attached specifically for that Completion Port. These threads are created by the kernel, for the purpose of this Completion Port, and will only be used by the kernel to, well, execute the operations you're asking. Then it's possible to max out these threads by filling the completion queue with operations, which is in fact kind of the ideal scenario. If your system is busy enough that all of the threads in the threadpool are doing things, without growing the queue itself, then it's the best optimized scenario.

In all cases, two scenarios can occur then:

  1. each completion port has its own threadpool attached, in which case, one shouldn't block the other.
  2. all completion ports share the same threadpool, which then can get a high contention if there's not enough threads, but then the worst case scenario is "things get slow", as the threadpool is struggling to keep up with the demand.

I think the most likely scenario, however, is that you appear blocked in the Completion Port because there's simply nothing happening.

In fact, I would question the threading model you are using, because by definition, if you have two event loops that are going to block on polling, they can't work with each other, since they are going to prevent each other from working. The "race condition" at that point wouldn't be thread-related, but rather event-related, where you have different sets of events sent to two different completion queues, and sometimes it'll work, sometimes they'll be stuck waiting forever. The smoking gun here would be the "slowness" to start, which would be caused by completion port 1 reaching a timeout state waiting for events, going outside, reaching into completion port 2, which then manages to trigger some events.
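
The timeout effect described above can be sketched with plain Python queues. This is only an analogy under stated assumptions: `queue.Queue` stands in for a completion port, `poll_two_queues` is a hypothetical name, and nothing here touches the Windows API or gRPC. A single poller blocks on the first queue and only notices an event on the second one after the first wait times out.

```python
import queue
import time


def poll_two_queues(first, second, timeout):
    """Block on `first`, and only look at `second` after `first` times out."""
    start = time.monotonic()
    try:
        return "first", first.get(timeout=timeout), time.monotonic() - start
    except queue.Empty:
        pass
    try:
        # The event may have been sitting here for the whole timeout.
        return "second", second.get_nowait(), time.monotonic() - start
    except queue.Empty:
        return None, None, time.monotonic() - start


if __name__ == "__main__":
    loop_a, loop_b = queue.Queue(), queue.Queue()
    loop_b.put("reply ready")  # event lands on the queue nobody is watching
    source, event, waited = poll_two_queues(loop_a, loop_b, timeout=0.2)
    print(source, event, "after %.2fs" % waited)
```

With an infinite timeout on the first wait, the event on the second queue would never be seen, which corresponds to the "stuck waiting forever" case.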

@mehrdada
Member

In light of what @nicolasnoble documented above, I'd like to point out that having two "event managers" in the same process (e.g. gRPC + gevent) is not yet supported by gRPC Python and this situation categorically falls under the same sort of issues. We are actively thinking about at least some of these scenarios, like gevent that I mentioned, but have not yet decided on a path forward.

@boumarc1
Author

Thanks for your support.

@nicolasnoble I agree with you about our problematic threading model. When I first made the C++ gRPC service, I didn't expect the gRPC client (the Python request handlers) to run inside the Apache httpd process. Even then, I didn't expect that a gRPC client request could interfere with the Apache server. If mod_wsgi could run in daemon mode, this would not be an issue, but the Windows implementation doesn't support it. I do think, however, that the limitation should be mentioned in the documentation.

At this point, I have rewritten the request handlers to spawn a Python "gRPC client proxy" that exchanges JSON requests and responses over standard streams, but this is an ugly solution.

Again, thanks for your help.
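
The proxy workaround mentioned above could look roughly like the sketch below. This is a guess at its shape, not the actual code: `call_proxy` and `FAKE_PROXY_SOURCE` are hypothetical names, and the child here only echoes a canned reply so the snippet runs without gRPC. A real proxy script would import `grpc`, call `stub.ListDevices(...)`, and serialize the result to JSON instead.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the real proxy script. A real proxy would import
# grpc here, call the stub, and serialize the reply instead.
FAKE_PROXY_SOURCE = r"""
import json, sys
request = json.loads(sys.stdin.read())
reply = {"method": request["method"], "items": []}
sys.stdout.write(json.dumps(reply))
"""


def call_proxy(request, proxy_cmd):
    """Send one JSON request to a child process and return its JSON reply."""
    child = subprocess.Popen(
        proxy_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the request, closes stdin, and waits for exit.
    out, err = child.communicate(json.dumps(request).encode("utf-8"))
    if child.returncode != 0:
        raise RuntimeError("proxy failed: %s" % err.decode("utf-8", "replace"))
    return json.loads(out.decode("utf-8"))


if __name__ == "__main__":
    cmd = [sys.executable, "-c", FAKE_PROXY_SOURCE]
    print(call_proxy({"method": "ListDevices", "model": "X100"}, cmd))
```

Because the gRPC client then lives in its own process, its completion queue no longer shares a process with httpd, at the cost of one process spawn per request.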

@mehrdada mehrdada closed this as completed Nov 8, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Oct 1, 2018