Is the Python client library thread safe when using gRPC? #3272

Closed
AdamLazarus opened this issue Apr 5, 2017 · 9 comments
Labels: api: core; type: question (Request for information or clarification. Not an issue.)

Comments

@AdamLazarus

Hello all,

The documentation [1] makes it clear that httplib2 objects aren't thread safe in the Python client library. Are clients that have gRPC support (such as Pubsub) thread safe when using gRPC? This question has been asked about the Java client library before [2], but I'd appreciate a firm answer for Python too.

Thank you.

[1] https://developers.google.com/api-client-library/python/guide/thread_safety
[2] googleapis/google-cloud-java#1320

@lukesneeringer
Contributor

lukesneeringer commented Apr 5, 2017

Hi @AdamLazarus,
Thanks for asking.

The short answer is: We think so. :-)
(Additionally, if you find thread-safety issues, feel free to open them as bugs.)

@dhermes added the "api: core" and "type: question" labels on Apr 19, 2017
@philipperemy

Just for information, it seems that you cannot share your datastore.Client() object across all the threads. You'll get errors that look like this:

E1130 10:54:55.377618000 140736526345152 ssl_transport_security.c:435] Corruption detected.
E1130 10:54:55.377821000 140736526345152 ssl_transport_security.c:411] error:100003fc:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_RECORD_MAC
E1130 10:54:55.377891000 140736526345152 secure_endpoint.c:185]        Decryption error: TSI_DATA_CORRUPTED

@dhermes
Contributor

dhermes commented Nov 30, 2017

@philipperemy I'd love to see an example that reproduces this. I've used Client()-s based on gRPC connections across multiple threads without issue.

@philipperemy

philipperemy commented Dec 1, 2017

Sure! This is roughly the code where I have one datastore.Client() per thread:

from multiprocessing import Pool

from google.cloud import datastore


def get_data(symbol_):
    # Each call builds its own client, so every worker gets a separate one.
    print('Init...')
    data_store_client = datastore.Client()
    print('Done...')
    query = data_store_client.query(kind=symbol_)
    query_iter = query.fetch()
    for entity in query_iter:
        print(entity)


def parallel_function(f, sequence, num_threads=None):
    # Note: multiprocessing.Pool starts worker processes, not threads.
    pool = Pool(processes=num_threads)
    result = pool.map(f, sequence)
    cleaned = [x for x in result if x is not None]
    pool.close()
    pool.join()
    return cleaned


def run_query():
    [...]
    parallel_function(f=get_data, sequence=symbols, num_threads=4)

The other code is very similar, except that I define a global variable DATA_STORE_CLIENT that is visible across all the threads.

Neither version works.

When num_threads=1 it runs smoothly.
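
For readers following along: `multiprocessing.Pool` starts worker processes rather than threads, so the snippet above exercises process creation, not thread sharing. Below is a minimal sketch of how to check where work actually runs, using only the standard library. No Datastore client is involved, and `worker_identity` and `run_in_threads` are hypothetical helper names chosen for illustration.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def worker_identity(_):
    # Report which process and which thread ran this task.
    return os.getpid(), threading.get_ident()


def run_in_threads(num_tasks, num_threads=4):
    # ThreadPoolExecutor runs tasks on threads inside the current process.
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        return list(executor.map(worker_identity, range(num_tasks)))


if __name__ == "__main__":
    results = run_in_threads(8)
    pids = {pid for pid, _ in results}
    print("parent pid:", os.getpid())
    print("pids seen by workers:", pids)  # a single pid, the parent's
```

If the thread pool were swapped for `multiprocessing.Pool`, the workers would report process IDs different from the parent's, which is a different scenario from sharing one client between threads.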

@speedplane

Has this ever been addressed? Creating a new client for each thread can effectively double the number of threads in the system.

@philipperemy

@speedplane if you want something that can run in production, you might want to use something else. Those libs are not very stable unfortunately.

@speedplane

@philipperemy what other options are there for accessing the datastore? Isn't this the official library?

@speedplane

speedplane commented May 14, 2019

I'm looking at the code now, and it's much worse than 1 new thread per client. It seems that when using gRPC, there are 4 threads: a consumption thread, a channel spin thread, a delivering thread, and a polling thread. (I'm not sure what these threads do or whether they're always used.) This seems to be per client, and it can get bad. Take the following example:

  • You have a 4-core server with 4 worker request handlers.
  • Each worker has 20 threaded request handlers (so it can handle 80 simultaneous requests).
  • Each request handler thread needs access to 2 clients: the datastore and cloud storage.
  • Each of those clients spawns 4 gRPC threads.

That results in 720 threads (= 4 * 20 * (1 + 2 * 4)) when 80 would have worked fine.
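
The arithmetic above can be written out as a short sketch. The figure of 4 gRPC threads per client is the estimate from this comment, not a verified property of the library, and the "shared" variant (one client of each kind per worker, used by all of its handler threads) is an assumption added here for comparison. Function names are hypothetical.

```python
def threads_with_client_per_handler(workers, handlers, clients, grpc_threads):
    # Each handler thread counts as 1, plus the gRPC threads of every client it owns.
    return workers * handlers * (1 + clients * grpc_threads)


def threads_with_shared_clients(workers, handlers, clients, grpc_threads):
    # Assumption: each worker creates its clients once and all handlers share them.
    return workers * (handlers + clients * grpc_threads)


if __name__ == "__main__":
    print(threads_with_client_per_handler(4, 20, 2, 4))  # 720
    print(threads_with_shared_clients(4, 20, 2, 4))      # 112
    print(4 * 20)                                        # 80 handler threads alone
```

Under these assumed numbers, sharing clients within a worker would bring the total much closer to the 80 handler threads, which is why the thread-safety question matters.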

@tseaver
Contributor

tseaver commented May 14, 2019

@speedplane We expect gRPC-based clients to be thread safe: the issues we know of have to do with multiprocessing (forking after creating a client).
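
One way to act on this is to create the client lazily inside each process and then share it among that process's threads. The following is a sketch only: `StandInClient` is a hypothetical placeholder so the example runs without credentials, `get_client` is an illustrative helper name, and the pattern has not been tested against the real library.

```python
import os
import threading


class StandInClient:
    """Hypothetical placeholder for a gRPC-based client such as datastore.Client()."""

    def __init__(self):
        # Remember which process created this instance.
        self.created_in_pid = os.getpid()


_lock = threading.Lock()
_client = None


def get_client():
    """Return a client created in the current process, building it on first use."""
    global _client
    with _lock:
        # Rebuild if there is no client yet, or if it was inherited from a parent process.
        if _client is None or _client.created_in_pid != os.getpid():
            _client = StandInClient()
        return _client
```

Threads in the same process all receive the same instance, while a forked child builds its own on first call instead of reusing the parent's. In real code the stand-in would be replaced by the actual client constructor, for example `datastore.Client()` as used earlier in the thread.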
