datastore: async/await support #2

Closed
c4rlo opened this issue Jan 15, 2020 · 35 comments
Labels
api: datastore Issues related to the googleapis/python-datastore API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@c4rlo

c4rlo commented Jan 15, 2020

It would be awesome if the datastore library supported async/await style operations, e.g.:

entity = await client.get_async(key)

Something similar was previously suggested in googleapis/google-cloud-python#40. That was folded into googleapis/google-cloud-python#557, which isn't really the same thing, especially since ndb does not support async/await with no plans to add this support (see googleapis/python-ndb#289).

@busunkim96
Contributor

CC @software-dov

@lidizheng

@c4rlo I'm currently working on adding AsyncIO support to gRPC (doc). Suggestions and comments are welcomed. Can you provide more details about your usage? E.g. Are you building for new services, or would like to integrate AsyncIO into existing ones? What's your Python version?

@c4rlo
Author

c4rlo commented Jan 17, 2020

Cool!

Not quite what I was asking for here though; ideally I'd like to use the datastore library rather than gRPC directly.

Once gRPC supports asyncio, is it likely that the datastore library will make await support available to users?

Meanwhile it seems that the gRPC Python API already has a .future() method on the stub methods (that's how ndb works its tasklet-based magic). I wonder whether this could in principle also be used to support await in the datastore library? Maybe not worth it though if improved async support in gRPC is not far off...
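
A minimal sketch of the `.future()` idea above, assuming the RPC future exposes `add_done_callback`, `cancelled`, `exception`, and `result` the way `grpc.Future` does. The helper name `as_awaitable` is hypothetical, and a `concurrent.futures.Future` stands in for the real stub call so the example runs without gRPC:

```python
import asyncio
import concurrent.futures
import threading


def as_awaitable(rpc_future, loop):
    """Bridge a callback-style future into an asyncio future bound to `loop`."""
    aio_future = loop.create_future()

    def _on_done(done):
        # The callback may fire on a non-event-loop thread, so the outcome
        # is copied over via call_soon_threadsafe.
        def _copy_outcome():
            if aio_future.cancelled():
                return
            if done.cancelled():
                aio_future.cancel()
                return
            exc = done.exception()
            if exc is not None:
                aio_future.set_exception(exc)
            else:
                aio_future.set_result(done.result())

        loop.call_soon_threadsafe(_copy_outcome)

    rpc_future.add_done_callback(_on_done)
    return aio_future


async def demo():
    # Stand-in for something like `stub.Lookup.future(request)` (assumption).
    fake_rpc = concurrent.futures.Future()
    threading.Timer(0.01, fake_rpc.set_result, args=("entity",)).start()
    return await as_awaitable(fake_rpc, asyncio.get_running_loop())
```

For a real `concurrent.futures.Future`, `asyncio.wrap_future` already does this; the manual bridge is only shown because a gRPC future is a different type.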

Edit: As for my usage, it's for a hobby project I've got on Google App Engine (https://vimhelp.org) which I'm looking to migrate from Python 2.7 to Python 3.7/3.8. It's currently using ndb, but it's not much code and I'd like to migrate it to whatever is the nicest way of doing things :)

@dannymilsom

Big +1! I recently commented on this on googleapis/google-cloud-python#3329 (comment)

I'm also using the google cloud datastore library on Google App Engine (Python 3).

@lidizheng

@dannymilsom Reviews and comments are welcomed. Also, can you provide more details? What is the most important value of providing an AsyncIO API for you? E.g. improving performance, compatibility with existing AsyncIO applications, or better programming practice...

@paulking86

+1 too.

https://docs.djangoproject.com/en/3.0/topics/async/

Django is planning to support fully async views and middleware soon, and I would very much like to use datastore on app engine!

@dannymilsom

@lidizheng Hi! We build lots of web apps on GCP - particularly Google App Engine with Datastore. Historically this has centred around Django, which has always been WSGI only, but more and more Python web frameworks are now moving to support ASGI (we are actively using FastAPI, and Django even has it on the 3.0 roadmap).

Leveraging the async API in the cloud datastore library would almost definitely enable better performance and lower running costs on cloud services.

@dannymilsom

Just to also add that this seems to be a popular feature request from the community based on older issues - see googleapis/google-cloud-python#3103

@crwilcox crwilcox transferred this issue from googleapis/google-cloud-python Feb 7, 2020
@crwilcox crwilcox added api: datastore Issues related to the googleapis/python-datastore API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Feb 7, 2020
@paulking86

paulking86 commented Mar 18, 2020

django/django@fc0fa72

Django async support has now been implemented. Any update on this? @crwilcox

@nikita-davydov

BUMP for this issue

@faynburd

faynburd commented Sep 3, 2020

+1

@crwilcox
Contributor

crwilcox commented Sep 9, 2020

I expect we will be adding Async Support to datastore in the near future, though development hasn't started yet on datastore.

Currently, we have a dev release for google-cloud-firestore https://pypi.org/project/google-cloud-firestore/2.0.0.dev1/

We are also working to add Async to our REST clients, with the first being google-cloud-storage

@nikita-davydov

nikita-davydov commented Nov 4, 2020

@crwilcox Do you have any news about this feature?

@crwilcox
Contributor

crwilcox commented Nov 4, 2020

@ndavydovdev I do not. The 2.0.0 version of datastore has an async surface, but it is at the generated proto layer, so while you could use it, it isn't necessarily as ergonomic as the sync datastore surface currently. The state of the main branch reflects that today, as does 2.0.0dev1 on pypi.org.

@crwilcox
Contributor

https://pypi.org/project/google-cloud-datastore/2.0.0/ contains async surfaces at the generated layer. You can construct a client using google/cloud/datastore_v1/services/datastore/async_client.py

@dolev-isp

@crwilcox Is the generated client considered stable, both in terms of functionality, as well as interface going forward? I'm asking because I don't see any reference to it in any official doc, and I'd like to know if we can rely on it for the long run.
Also, are there plans to support async in the main client?

@crwilcox
Contributor

crwilcox commented Nov 16, 2020

Hi @dolev-isp this may be leaking a bit of implementation detail :) Happy to elaborate. The short version is you can rely on the generated layer, and we will version using https://semver.org/ to determine the next release number. We have been doing that with this library so far in fact.

As far as plans, we don't currently have a plan to add a handwritten async client. We very recently published Firestore with an async interface, but want to better understand use before implementing in datastore.

More on the library, and generated vs handwritten code. If you look at the current client:

  • google.cloud.datastore: This is handwritten. It leverages the code in the other namespaces.
  • google.cloud.datastore_v1: Generated client from v1 datastore proto.
  • google.cloud.datastore_admin_v1: Generated client from v1 datastore admin proto.

The docs at https://googleapis.dev/python/datastore/latest highlight the handwritten layer as it is tailored/crafted to better suit datastore use cases. This isn't to say the generated layers aren't usable clients. The vast majority of Cloud Client Libraries are generated and are considered stable GA surface. The underlying generator is the same for the different Python client libraries.

You will find the generated client is very similar to the handwritten layer but the surface is different. Though if you do want async today this is the way you can get it. The interface is a bit more verbose to be sure.

from google.cloud import datastore
from google.cloud import datastore_v1

kind = "test"
id = "test_id"
project = "crwilcox-test-project"

# Example using the handwritten client
client = datastore.Client()
key = client.key(kind, id)
entity = client.get(key)

# Example using the generated datastore_v1 surface
# Async client at datastore_v1.services.datastore.DatastoreAsyncClient()
v1_client = datastore_v1.DatastoreClient()
key = datastore.Key(kind, id, project=project)
key_pb = key.to_protobuf()
v1_client.lookup({"keys": [key_pb], "project_id": project})
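
For readers after the async variant mentioned in the comment above, here is a rough sketch under some assumptions: it mirrors the request dictionary from the sync sample, and it takes the client as a parameter so the helper can be tried against a fake. The helper name `lookup_async` and the project ID are hypothetical; `main()` is not called here because it needs the library installed and valid credentials.

```python
import asyncio


async def lookup_async(v1_client, project, key_pbs):
    """Look up keys through an async generated client; return the found results."""
    response = await v1_client.lookup(
        request={"project_id": project, "keys": list(key_pbs)}
    )
    return list(response.found)


async def main():
    # Imported locally so the helper above stays usable without the library.
    from google.cloud import datastore
    from google.cloud.datastore_v1.services.datastore import DatastoreAsyncClient

    project = "my-project"  # hypothetical project ID
    key = datastore.Key("test", "test_id", project=project)
    # The client is created inside the coroutine so it binds to the running loop.
    v1_client = DatastoreAsyncClient()
    for result in await lookup_async(v1_client, project, [key.to_protobuf()]):
        print(result.entity)
```

Running it for real would be `asyncio.run(main())`; the exact protobuf type returned by `key.to_protobuf()` may differ between library versions, so treat this as a starting point and not a verified recipe.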

@dolev-isp

Hi @crwilcox thanks for the in-depth reply, and for adding the async support!

I have to say it would be really awesome if you would be able to add async functions to the handwritten client library, which is really easy and straightforward to use. Our needs are very straightforward too... We would be glad to see the different client methods in an async version, in a similar way to the way they existed in the NDB library, for example. So you'd have client.get_async(), client.get_multi_async(), client.put_async(), etc. An async version of the Query Iterator would be awesome too, although a bit more complicated I guess (and at a lesser priority in our case for now)...

Do you think this is something that can be added to the near-term roadmap?

@ArcLightSlavik

Hi @crwilcox I've been trying to implement the outer async api for my project with the parts that are already made and using firestore as a template, but I'm facing an issue when running in pytest.

If I have two identical tests that just do a put and a get, the first one passes but the second one stops dead.
google/cloud/datastore_v1/services/datastore/async_client.py 'commit method'

        # Wrap the RPC method; this adds retry and timeout information,
        # and friendly error handling.
        rpc = gapic_v1.method_async.wrap_method(
            self._client._transport.commit,
            default_timeout=60.0,
            client_info=DEFAULT_CLIENT_INFO,
        )

        # Send the request.
        response = await rpc(request, retry=retry, timeout=timeout, metadata=metadata,)

The first test passes through this part as expected, the second test freezes on await rpc


@crwilcox
Contributor

crwilcox commented Dec 7, 2020

@ArcLightSlavik are you using pytest-asyncio? I seem to recall that if you don't you can experience some odd behaviors?
https://pypi.org/project/pytest-asyncio/

@ArcLightSlavik

ArcLightSlavik commented Dec 8, 2020

@crwilcox Yeah I am
0.14.0 for pytest-asyncio
Tried both 5.x and 6.x for pytest; it didn't work on either.

google-api-core==1.23.0
google-auth==1.23.0
google-cloud-bigquery==1.24.0
google-cloud-core==1.4.3
google-cloud-datastore==2.0.1
google-cloud-error-reporting==0.30.1
google-cloud-kms==1.4.0
google-cloud-logging==1.15.1
google-cloud-pubsub==1.6.1
google-cloud-storage==1.15.1
google-resumable-media==0.5.1
googleapis-common-protos==1.6.0
grpc-google-iam-v1==0.12.3
grpcio==1.30.0

Python 3.7

@ArcLightSlavik

Fixed by upgrading grpcio to latest version, 1.35.0 as of right now

@gnagel

gnagel commented Feb 22, 2021

@crwilcox 🤞 Will this feature get merged in soon? I'd be thrilled to use it on my app ❤️

@crwilcox
Contributor

@gnagel development for this feature has not started at this time.

@pmlanger

@crwilcox Would you be willing to accept a PR on it?

@crwilcox
Contributor

@pmlanger I would be alright with that, but I want to be clear that this may be a large work item. We recently did this for Firestore and I think it was in the area of 1000-2000 lines of change. A lot of this is in duplicating test coverage and altering it to cover slightly different async surfaces. At a high level, the work to accomplish this:

  1. create base classes containing functionality that can be shared across async and sync implementations.
  2. create _prep_call_name methods that will let us share the request/response payload construction across the implementations.
  3. Creating an async layer using 1 and 2's code.
  4. test porting after all the refactoring and the new surface.
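
Steps 1 to 3 above can be sketched roughly as follows. This is an illustration only: the class names, the `_prep_get` method, and the fake transports are all hypothetical stand-ins, not the library's actual code.

```python
import asyncio


class BaseClient:
    """Shared logic: builds requests and unpacks responses, performs no I/O."""

    def __init__(self, project, transport):
        self.project = project
        self._transport = transport

    def _prep_get(self, key):
        # Request payload construction shared by the sync and async surfaces.
        return {"project_id": self.project, "keys": [key]}

    @staticmethod
    def _first_found(response):
        found = response.get("found", [])
        return found[0] if found else None


class Client(BaseClient):
    def get(self, key):
        return self._first_found(self._transport.lookup(self._prep_get(key)))


class AsyncClient(BaseClient):
    async def get(self, key):
        response = await self._transport.lookup(self._prep_get(key))
        return self._first_found(response)


class FakeSyncTransport:
    """Stand-in for a generated sync client (assumption)."""

    def lookup(self, request):
        return {"found": ["entity-for-" + request["keys"][0]]}


class FakeAsyncTransport:
    """Stand-in for a generated async client (assumption)."""

    async def lookup(self, request):
        return {"found": ["entity-for-" + request["keys"][0]]}
```

Only the one line that performs I/O differs between `Client.get` and `AsyncClient.get`, which is why most of the remaining effort lands in step 4, the duplicated tests.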

https://github.com/googleapis/python-firestore could certainly be referenced to see what we would have in mind for this work.

The datastore library is built as a handwritten layer (https://github.com/googleapis/python-datastore/tree/master/google/cloud/datastore) placed over a generated client. We use this generated layer in almost all of our Google Cloud libraries. Our database products are unusual in that we tend to layer over the generated client to combine multiple RPCs and make a better user experience. That said, some calls aren't as complicated, and you might be able to move some of this into your app by using our generated surface, which is at https://github.com/googleapis/python-datastore/tree/master/google/cloud/datastore_v1.

The large difference between the two layers is that the generated client takes a request dictionary that has the fields of the specific call. I think I have a rough sample above if you are curious what that might look like.

@pmlanger

@crwilcox Thanks for the details and the references! I actually wrote a slightly restricted version of that handwritten 'async datastore client' for a (non-open source) project a few weeks ago that had been using the ('synchronous') google.cloud.datastore.client.Client before. Obviously, I didn't create common base classes for google.cloud.datastore.client.Client and the new client, but I am aware of how the code works and where there are opportunities for reuse. Since I profited a lot from the code in this project, and there seems to be some 'demand' here, I wanted to give back.

I know it's not a small feat, but as long as there is not too much of a rush, I'd like to help.

@crwilcox
Contributor

crwilcox commented Feb 25, 2021

I appreciate the question. I just want to be clear about the scale. I'd hate for that to sneak up on anyone.

The underlying gRPC library has an async path, used by the generated async layer, and using that would be better than wrapping the sync surface behind async.

I don't think anyone from our team is going to get to this, at the soonest, before April, as we have a few other projects we are focused on. Realistically it may be after that even. I do think adding the surface is valuable though, it just isn't "top of the stack" at the moment.

@pmlanger

pmlanger commented Feb 25, 2021

Great - I appreciate your being clear on this. Honestly, even if no one were using it, I wouldn't mind; I'd do this just for fun and to clean up my application :-)

The underlying gRPC library has an async path, used by the generated async layer, and using that would be better than wrapping the sync surface behind async.

I am not 100% sure what you mean by "better than wrapping the sync surface behind async". But if you mean not to import helpers etc. from the sync portion (Client, Batch, Query/Iterator, ...) into the async one, we are on the same page. I do get the idea of factoring out the common portions (e.g., creating protobufs in the correct way for each operation), and using them from both the sync and (to be created) async clients.
And I think having an extra "AsyncClient" is a better approach than amending the existing "Client" with <operation>_async methods.
edit: Basically, as it's done in https://github.com/googleapis/python-firestore/tree/master/google/cloud/firestore_v1

@crwilcox
Contributor

I think we are on the same page. I was trying to say that there is a gain to using gRPC's async support instead of wrapping the sync variety in an async wrapper.
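
To make the distinction concrete, here is a small sketch that contrasts the two approaches with stand-ins: `time.sleep` plays the blocking RPC and `asyncio.sleep` plays a natively async one. All function names are hypothetical.

```python
import asyncio
import time


def blocking_lookup(key):
    time.sleep(0.05)  # stand-in for a blocking (sync) RPC
    return f"entity:{key}"


async def wrapped_lookup(key):
    # Sync call hidden behind an async wrapper: every in-flight call
    # occupies a worker thread from the default executor.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_lookup, key)


async def native_lookup(key):
    # Natively async call: waits on the event loop without holding a thread.
    await asyncio.sleep(0.05)  # stand-in for a non-blocking RPC
    return f"entity:{key}"


async def fetch_all(lookup, keys):
    # Run all lookups concurrently and return results in key order.
    return await asyncio.gather(*(lookup(k) for k in keys))
```

Both variants return the same results; the difference is that the wrapped version is capped by the size of the thread pool, while the native version is limited only by what the event loop and the channel can multiplex.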

@nikita-davydov

Are there any updates about this feature? If you have plans to implement it, I would be ready to help in my free time. I bet that it's a very wanted and important feature for all Python developers who use google cloud datastore in production. So maybe we can break the work for this feature down into tasks and start the development process with people (like me) who are interested in it.

@crwilcox crwilcox removed their assignment Sep 21, 2021
@judahrand

I'd like to leave a plus one here. Given that Firestore in Datastore mode is still a useful and recommended setup for some use cases, it seems less than ideal to have the Datastore client be less functional.

Are there any plans to start on this work?

@akarsh1995

Couldn't wait any longer. Tried aiogcd. Working well!

@meredithslota
Contributor

We released async support in v2.4.0 but forgot to come back and close this issue: https://github.com/googleapis/python-datastore/releases/tag/v2.4.0 Apologies for the delay (but based on the lack of comments on this issue since late 2021, I think folks figured it out, ha). Thanks!

@amitkot

amitkot commented Jun 15, 2023

@meredithslota that release notes the new async DatastoreAdminClient. Is there an async Client implementation provided?
I see DatastoreAsyncClient here but I can't find it on Google Cloud Datastore docs, and it is not clear how to use it for e.g. an async put() operation.
