PubSub: support batching publish requests with asyncio #20

Open
relud opened this issue Jan 9, 2019 · 6 comments
Labels
api: pubsub Issues related to the googleapis/python-pubsub API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@relud
Contributor

relud commented Jan 9, 2019

Is your feature request related to a problem? Please describe.

I have an asyncio application that needs to publish messages to PubSub, but I'm having issues because google.cloud.pubsub.PublisherClient.publish:

  1. returns futures that aren't compatible with await or asyncio.wrap_future
  2. returns futures that never complete if Batch._commit throws an uncaught exception (as in google-cloud-python#7103, "PubSub: RetryError in batch publish causes futures to never complete", and google-cloud-python#7071, "PubSub: Propagate RetryError in PublisherClient.publish")
  3. doesn't enforce a maximum number of threads, which is eating memory
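To illustrate the second point, here is a minimal sketch of a commit routine that resolves every pending future even when sending fails. It is not the library's Batch._commit: `commit`, `failing_send`, and the use of plain concurrent.futures.Future objects are stand-ins assumed only for this example.

```python
import concurrent.futures
import threading


def commit(futures, send):
    """Run `send` on this thread and always resolve every future."""
    try:
        message_ids = send()
    except Exception as exc:
        # Without this branch the thread would die with the exception and
        # the futures would stay pending, so callers would wait forever.
        for future in futures:
            future.set_exception(exc)
    else:
        for future, message_id in zip(futures, message_ids):
            future.set_result(message_id)


def failing_send():
    """Hypothetical send step that fails, e.g. after retries run out."""
    raise RuntimeError("deadline exceeded")


futures = [concurrent.futures.Future() for _ in range(3)]
thread = threading.Thread(target=commit, args=(futures, failing_send))
thread.start()
thread.join()

# Every future is done and carries the error instead of hanging.
errors = [str(future.exception(timeout=1)) for future in futures]
```

With the except branch in place, a caller blocked on any of these futures sees the failure instead of waiting indefinitely.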

Describe the solution you'd like

I wrote a new google.cloud.pubsub_v1.publisher._batch.async.Batch that implements google.cloud.pubsub_v1.publisher._batch.base.Batch. It uses asyncio to provide awaitable futures that automatically propagate exceptions, and it uses a shared concurrent.futures.ThreadPoolExecutor together with asyncio.wrap_future to call Batch.client.publish asynchronously while enforcing a maximum number of workers. I deliberately wrapped only Batch.client.publish in a thread because (if I understand correctly) it only blocks on exclusive access to the gRPC channel, so it shouldn't cause the performance issues seen in the first alternative below.
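The approach described above can be sketched roughly as follows. This is not the linked implementation: `blocking_publish` is a hypothetical stand-in for the blocking publish call, and `MAX_WORKERS`, `publish_batch`, and `worker_threads` are names assumed for illustration. Only ThreadPoolExecutor and asyncio.wrap_future are real standard-library APIs.

```python
import asyncio
import concurrent.futures
import threading
import time

MAX_WORKERS = 4  # assumed limit, chosen only for this sketch
executor = concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)
worker_threads = set()  # records which threads actually ran a publish


def blocking_publish(messages):
    """Hypothetical stand-in for the blocking publish call."""
    worker_threads.add(threading.current_thread().name)
    if not messages:
        raise ValueError("empty batch")
    time.sleep(0.01)  # simulate waiting on the channel
    return [f"id-{i}" for i in range(len(messages))]


async def publish_batch(messages):
    # submit() returns a concurrent.futures.Future, which wrap_future accepts.
    # An exception raised in the worker is re-raised by the await.
    return await asyncio.wrap_future(executor.submit(blocking_publish, messages))


async def demo():
    batches = [[b"a", b"b"], []] + [[b"x"]] * 18
    return await asyncio.gather(
        *(publish_batch(batch) for batch in batches),
        return_exceptions=True,
    )


results = asyncio.run(demo())
```

The executor caps the number of threads regardless of how many batches are in flight, and the failure from the empty batch surfaces as an exception on the awaiting side rather than being lost in a worker thread.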

I would like to submit this as a pull request, but only if it would be useful.

Describe alternatives you've considered

  • I tried patching google.cloud.pubsub_v1.publisher._batch.thread.Batch to use concurrent.futures.ThreadPoolExecutor. Unfortunately, performance suffered: whenever every worker was blocked in a time.sleep, no worker was left to pick up queued tasks that might already be ready to run.
  • I tried patching google.cloud.pubsub_v1.futures.Future to inherit from concurrent.futures.Future. This fixed compatibility with asyncio.wrap_future, but not the uncaught exceptions or the unlimited thread spawning.
  • I tried to patch google.cloud.pubsub_v1.publisher._batch.thread.Batch to join spawned threads, which would propagate uncaught exceptions, but I was unable to figure out a solution.
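
The second alternative can be illustrated with a small sketch. `PublishFuture` and `resolve_later` are hypothetical names assumed for this example, not the library's classes; the sketch only shows that a future inheriting from concurrent.futures.Future is accepted by asyncio.wrap_future and can be awaited.

```python
import asyncio
import concurrent.futures
import threading


class PublishFuture(concurrent.futures.Future):
    """Hypothetical publish future that inherits the standard Future."""


def resolve_later(future, message_id=None, error=None, delay=0.01):
    """Complete `future` from another thread, as a batch thread might."""

    def finish():
        if error is not None:
            future.set_exception(error)
        else:
            future.set_result(message_id)

    threading.Timer(delay, finish).start()


async def demo():
    ok, failed = PublishFuture(), PublishFuture()
    resolve_later(ok, message_id="message-id-1")
    resolve_later(failed, error=RuntimeError("commit failed"))

    # wrap_future only accepts concurrent.futures.Future instances,
    # which is why inheriting from it makes the future awaitable.
    message_id = await asyncio.wrap_future(ok)
    try:
        await asyncio.wrap_future(failed)
        error = None
    except RuntimeError as exc:
        error = str(exc)
    return message_id, error
```

As the bullet notes, this only addresses compatibility: a future that is never resolved still blocks its awaiter, and nothing here limits the number of threads.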
@relud
Contributor Author

relud commented Jan 11, 2019

For the record, this is my current implementation of an asyncio batch: https://github.com/mozilla/gcp-ingestion/blob/24c1cea/ingestion-edge/ingestion_edge/util.py#L7-L95

@anguillanneuf
Contributor

I haven't tried this myself, but there's a proposed solution to make the publish future act like a concurrent Future: googleapis/google-cloud-python#6201 (comment)

@plamut
Contributor

plamut commented Jun 13, 2019

Python 2 support is deprecated, but it still needs to be preserved until the end of the year. Any support for asyncio will thus have to wait for at least another 6 months or so.

@agates4

agates4 commented Oct 17, 2019

Do we have any update on this?

@plamut plamut transferred this issue from googleapis/google-cloud-python Jan 31, 2020
@product-auto-label product-auto-label bot added the api: pubsub Issues related to the googleapis/python-pubsub API. label Jan 31, 2020
@plamut plamut added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Jan 31, 2020
@yoshi-automation yoshi-automation added 🚨 This issue needs some love. triage me I really want to be triaged. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Jan 31, 2020
@matehat

matehat commented Apr 20, 2020

Now that we're past the date when Python 2 support was officially dropped, can we have an update on this? Is there a timeline for official asyncio support?

I'm sure we're not alone in trying to use Google Cloud Pub/Sub with asyncio-based libraries.

@meredithslota
Contributor

About a week ago we released a new version of Pub/Sub that drops Python 2.7 (and 3.5) support: https://github.com/googleapis/python-pubsub/releases/tag/v2.0.0

9 participants