
Support batch requests #740

Closed
j-rewerts opened this issue May 4, 2017 · 13 comments
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@j-rewerts

I found the update where batch request support was removed. Is there any interest in having it back?

I'm planning on using the Drive API to make tons of permission changes. Having this functionality would be nice.

@wellczech

How many permissions do you want to change (or how many API calls)? With good per-account parallelism you can already do a lot. I'm currently making 1,250,000 Gmail API calls (7 different user accounts), which takes about 6 hours to complete. Drive file metadata for 17K accounts (14M files) can be downloaded in 3 hours.

Batch requests don't help much with quotas. My only issue with such a huge number of requests is that the OS struggles with that many connections. I'd be fine without batch requests; I'd rather see HTTP/2 support in the googleapis Node.js client.

@jelder

jelder commented Jun 5, 2017

I too am working on an app where batch requests would be desirable. We do a large number of Gmail queries, and batching would likely help us avoid some concurrency problems.

@cb-lixar

cb-lixar commented Jun 9, 2017

I don't know about their other APIs, but for their EMM APIs Google has recently added a clause to the public documentation that specifically states they do not want batch requests being made. Right now I'm looking into queuing requests so they aren't made too often, but this client does not seem to offer built-in rate-limiting support.
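
For illustration, a minimal sketch of the kind of queuing I have in mind (not part of this client; the function names, limits, and the wrapped call are placeholders). It starts at most a fixed number of requests per time window and queues the rest:

```js
// Start at most `maxPerWindow` requests per `windowMs`; queue the rest.
function createRateLimiter(maxPerWindow, windowMs) {
  const queue = [];
  let startedThisWindow = 0;

  const timer = setInterval(() => {
    startedThisWindow = 0; // new window: allow another burst
    drain();
  }, windowMs);
  timer.unref(); // don't keep the process alive just for the timer

  function drain() {
    while (startedThisWindow < maxPerWindow && queue.length > 0) {
      startedThisWindow++;
      const { fn, resolve, reject } = queue.shift();
      fn().then(resolve, reject);
    }
  }

  return fn =>
    new Promise((resolve, reject) => {
      queue.push({ fn, resolve, reject });
      drain();
    });
}

// Hypothetical usage: cap calls at roughly 10 per second.
// const limited = createRateLimiter(10, 1000);
// const res = await limited(() => someGoogleApiCall());
```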

@ace-n
Contributor

ace-n commented Jun 9, 2017

cc @erickoledadevrel

@wclr

wclr commented Jun 11, 2017

I need to get info about 20-50 files with drive.files.get, and without batching I have to make 50 requests. With the limit of 1000 requests per 100 seconds, the task takes more than 5 seconds, which is quite a long time for something so simple. Before this I was using drive.newBatchRequest() and it worked like a charm, finishing in about a second.

Batching would be really helpful in simple situations like this.

@wellczech

@whitecolor In my experience the API allows short bursts, so it should be possible to fetch properties quickly. Can you confirm that user authentication is done before making the API requests? I've made the mistake of authenticating users on every request, even though tokens are valid for an hour, so that's the first thing I'd check.

Without knowing much about the context: if I needed to check properties of the same set of files at regular intervals, I'd add custom properties to those files and then use drive.files.list with a query on that property, so I'd get all files of interest in one request.

Mostly I use my own library, which wraps the gapi client in highland.js streams; rate-limiting is fairly easy there. The library is not on GitHub yet, but I'll release it to the general public. How much of a hurry are you in to solve this?

@wclr

wclr commented Jun 11, 2017

Well, I authenticate before each new bunch of requests.

  1. My task is to get full info about the files in a particular Drive folder (the folderId is given).
  2. I have a Service Account that has been given access to the folder (via sharing).
  3. So just before I need the info about the files in the folder, I authorize using a JSON Web Token (email and private key).
  4. Then I get the file ids using drive.children.list({folderId, auth: jwtClient}), and then I have to make a request for each file using drive.files.get({fileId, auth: jwtClient}) (I need size, type, downloadUrl).

I believe I can't get all of this with a single drive.files.list call; any suggestions on how to optimize?

> I use my own library, which wraps the gapi client in highland.js streams; rate-limiting is fairly easy there.

How can this help overcome the platform restriction of 1000 requests per 100 seconds (which I believe works out to roughly 10 per second)?

@wellczech

First of all, I suggest Drive API v3.

Only one request to files.list is needed. Request object: {q: "'[FOLDER_ID]' in parents"}. Good practice is to add the 'fields' param so you get back only the properties you need, e.g. 'files(size,mimeType,webContentLink)' (v2's downloadUrl is not available in v3; webContentLink is the closest equivalent).
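
Roughly like this with the current promise-based client (just a sketch; the scope, key handling, and field list are assumptions, and for larger folders you would also follow nextPageToken):

```js
const { google } = require('googleapis');

async function listFolder(folderId) {
  // Service-account auth (email + private key), as described above.
  const auth = new google.auth.JWT({
    email: 'SERVICE_ACCOUNT_EMAIL',
    key: 'PRIVATE_KEY',
    scopes: ['https://www.googleapis.com/auth/drive.readonly'],
  });
  await auth.authorize();

  const drive = google.drive({ version: 'v3', auth });

  // A single files.list call returns every file in the folder, limited to
  // the properties we actually need. (Follow res.data.nextPageToken if the
  // folder has more files than one page can hold.)
  const res = await drive.files.list({
    q: `'${folderId}' in parents`,
    fields: 'files(id,name,size,mimeType,webContentLink)',
  });
  return res.data.files;
}
```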

@wclr

wclr commented Jun 11, 2017

@wellczech thank you very much for the advice; it works perfectly for me!

@j-rewerts
Author

Sorry for the late reply. If you're pulling metadata for 14 million files in 3 hours with no issues, I think I'll be okay. That amount of use isn't costing you anything? Or did you pay to boost your quota?

@wellczech

Didn't pay anything. There are per-user quotas and a total quota for the whole app. The key is to process multiple accounts in parallel and to have good error handling for failed requests. Listing files can be done a thousand at a time, so it's fast. If you're going to change permissions, it will take more time, because each permission change is one request. I also suggest ordering the accounts by the number of files you're going to update and starting with the accounts that have the most files; I usually process 98% of the accounts in 2/3 of the total time, and the app spends the rest of the time on the last few accounts with hundreds of thousands of files.

I use two approaches: either I buffer data in memory and then save JSON files to disk, which lets me hit the quota limits easily (at 50-60% CPU usage), or, more recently, I store the data in SQLite, which now seems to be the bottleneck (100% CPU, waiting for disk, but safer).

I'm thinking about using protocol buffers and/or HTTP/2 in the hope of making API requests even faster.
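
A rough sketch of the ordering-plus-parallelism approach described above (the account shape, the per-account worker, and the concurrency number are made up for illustration):

```js
// Process the largest accounts first, with a fixed number of accounts in
// flight at once; failures are collected so they can be retried later.
async function processAccounts(accounts, worker, concurrency = 8) {
  const ordered = [...accounts].sort((a, b) => b.fileCount - a.fileCount);
  const failures = [];
  let next = 0;

  async function runWorker() {
    while (next < ordered.length) {
      const account = ordered[next++];
      try {
        await worker(account); // e.g. list files or change permissions
      } catch (err) {
        failures.push({ account, err });
      }
    }
  }

  await Promise.all(Array.from({ length: concurrency }, runWorker));
  return failures;
}
```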

@JustinBeckwith JustinBeckwith added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Apr 15, 2018
@JustinBeckwith
Contributor

Greetings folks! Sadly, the global batch-specific endpoints for googleapis have been turned down:
https://developers.googleblog.com/2018/03/discontinuing-support-for-json-rpc-and.html

Individual per-API batch methods still exist for many APIs, and those will continue to work just fine.

The real win here will be moving to HTTP/2, which we're tracking over in #1130. Thanks!

@jrmdayn

jrmdayn commented Jan 10, 2023

Having hit the issue of batching Google API requests with Node.js myself, I decided to develop a solution that follows the guidelines in Google's batching documentation, implementing the multipart/mixed HTTP request/response protocol.

Please have a look and do not hesitate to raise an issue if you find a bug or would like to discuss improvements.
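
For reference, this is roughly what that multipart/mixed protocol looks like on the wire; a hand-rolled sketch against the Drive v3 per-API batch endpoint (not the library mentioned above; it assumes Node 18+ for global fetch, and the file IDs and access token are placeholders):

```js
// Each part of the multipart/mixed body is one embedded HTTP request.
async function batchGetFiles(fileIds, accessToken) {
  const boundary = 'batch_example_boundary';

  const body =
    fileIds
      .map(
        (id, i) =>
          `--${boundary}\r\n` +
          'Content-Type: application/http\r\n' +
          `Content-ID: <item-${i + 1}>\r\n\r\n` +
          `GET /drive/v3/files/${id}?fields=id,name,size,mimeType HTTP/1.1\r\n\r\n`
      )
      .join('') + `--${boundary}--\r\n`;

  // Per-API batch endpoint (the global batch endpoint was turned down in 2018).
  const res = await fetch('https://www.googleapis.com/batch/drive/v3', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': `multipart/mixed; boundary=${boundary}`,
    },
    body,
  });

  // The response is also multipart/mixed: one embedded HTTP response per part.
  return res.text();
}
```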
