
Storage API has poor performance in Google Cloud Functions #101

Closed
dmho418 opened this issue Dec 5, 2017 · 17 comments

@dmho418 commented Dec 5, 2017

edit by @stephenplusplus

Follow along in the Google issue tracker: https://issuetracker.google.com/issues/70555688


Environment details

  • Node.js version: v6.11.5
  • @google-cloud/storage version: 1.5.0

Steps to reproduce

The API never seems to reuse connections, which gives Cloud Functions that use it poor performance and very quickly exhausts the socket connection and DNS resolution quotas.
In the best practices guide (https://cloud.google.com/functions/docs/bestpractices/networking), the Node.js Pub/Sub client is given as an example: when it is declared in global scope, it avoids unnecessary DNS queries and new connections on every invocation (see the sketch below).
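For illustration, a minimal sketch of that global-scope pattern (not taken from the guide verbatim; the bucket and file names are placeholders):

const Storage = require('@google-cloud/storage')

// Declared in global scope so the client, and its connections, can be reused
// across warm invocations instead of being recreated on every call.
const gcs = new Storage()

exports.readFile = (req, res) => {
  gcs.bucket('my-bucket').file('data.txt').download()
    .then(([contents]) => res.status(200).send(contents.toString()))
    .catch(err => res.status(500).send(err.message))
}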

This could be because the request configuration is hardcoded to:

forever: false,

@stephenplusplus (Contributor)

forever: false, which prevents sending a Keep-Alive header, is only for file downloads. We added that recently (PR) as a fix for this issue. The problem we were having was that, for slower or throttled connections, we weren't receiving the proper end event from the HTTP request stream.

All other API requests made through the Storage API should be using the Keep-Alive header. The default configuration for all requests is here: https://github.com/googleapis/nodejs-common/blob/22383ec89b2866b52850594efab60bd5484a81cc/src/util.js#L30-L37

It's possible for users to change all of our settings with request interceptors. That would look like:

const Storage = require('@google-cloud/storage')
const gcs = new Storage({ ... })

gcs.interceptors.push({
  request: reqOpts => {
    reqOpts.forever = true
    return reqOpts
  }
})

What do you think: should we reverse the forever: false fix and, if users have problems, recommend they set the request interceptor as shown above?

@dmho418 (Author) commented Dec 6, 2017

Thanks for the tip!
Adding the interceptor solved the problem. Now each connection is being reused for 1000+ invocations.
I'm not an expert in Node.js, so I won't suggest any code changes.
Still, it would be worth mentioning the interceptors somewhere in the docs or samples.

@stephenplusplus stephenplusplus added docs priority: p2 Moderately-important priority. Fix may not be included in next release. type: enhancement labels Dec 11, 2017
@joemanfoo

My suggestion is that, if this is a requirement, the API should manage it based on the action being performed. For example, if I download data, do something with it, and then upload the result, I have to keep track in my own code of what the library requires for downloading (turn off keep-alive) and then turn it back on for uploading.

I currently have Cloud Functions that are timing out because of this: uploads to Storage now need the interceptor override. With a previous version (1.1.1, for example) things just worked, but with 1.5.x the reqOpts.forever default causes uploads to time out on about 30% of function invocations.

@stephenplusplus (Contributor)

Thanks for the input.

and then turn it back on for uploading.

The only time we disable keepAlive is for file downloads. If you want it to stay off there, you don't have to manage anything. If you want it back on for downloads, you can assign the override I showed in my last post.

If we did have some methods using keepAlive and others not, you could get specific in the interceptor:

gcs.interceptors.push({
  request: reqOpts => {
    if (reqOpts.method === 'POST' && reqOpts.uri.includes('{somethingFromTheApiUrl}')) {
      reqOpts.forever = true
    }
    return reqOpts
  }
})

And to go even further, you can assign interceptors on all levels of the hierarchy:

gcs.interceptors.push({ request: reqOpts => { /* ... */ return reqOpts } })
bucket.interceptors.push({ request: reqOpts => { /* ... */ return reqOpts } })
file.interceptors.push({ request: reqOpts => { /* ... */ return reqOpts } })
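For concreteness, a file-level override might look like this (a sketch; 'my-bucket' and 'my-file.txt' are placeholder names):

const bucket = gcs.bucket('my-bucket')
const file = bucket.file('my-file.txt')

// Only requests made through this particular File object are affected.
file.interceptors.push({
  request: reqOpts => {
    reqOpts.forever = true
    return reqOpts
  }
})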

@joemanfoo

Thanks @stephenplusplus. I'm having a terrible issue: uploads from my Cloud Functions to Cloud Storage are timing out nearly 90% of the time now. This started around December 6th.
[screenshot: collagebuildertimeouts]

I thought it might be caused by my setting keep-alive to false to work around the download issue, but even after updating the code to create a new storage object, I'm still getting more timeouts than successes.

And I haven't a clue who, what, or where to turn to for help, other than paying $300 for Google support.

@dmho418 (Author) commented Dec 18, 2017

In my case the GCP Console was very helpful for debugging this issue.
If you want to know if it has anything to do with reusing connections, go to
GCP Console > APIs & Services > Google Cloud Functions API > Quotas
Check the plots and compare the number of socket connections and DNS resolutions to the number of function invocations.

@joemanfoo commented Dec 18, 2017

Thanks @dmho418 for the hint. However, my issue is that the Cloud Function I'm using to upload content to GCS simply times out.

I built a very simple test function:

// requires and initialization added for completeness
const functions = require('firebase-functions')
const admin = require('firebase-admin')
admin.initializeApp(functions.config().firebase)

exports.uploadTest = functions.database.ref('/upload-test').onWrite(event => {
  const fbBucket = admin.storage().bucket()
  // download a file
  let file = 'inventory/uploads/127470647675333/1507506068973-leg-tc-1.JPG'
  console.log('downloading test file...')
  return fbBucket.file(file).download({ destination: '/tmp/1507506068973-leg-tc-1.JPG' })
  // upload a file
  .then(() => {
    console.log('uploading test file...')
    return fbBucket.upload('/tmp/1507506068973-leg-tc-1.JPG', { destination: 'inventory/uploads/127470647675333/test-1507506068973-leg-tc-1.JPG' })
  }).catch(error => {
    console.log(error)
    return error
  })
})

If I point this at a file smaller than 5 MB, the function executes and finishes successfully. If I point it at an image larger than 5 MB, the function fails. This behavior first appeared on December 7th, although the change to the Google environment likely took place earlier.

There's an open ticket on the Google issue tracker (https://issuetracker.google.com/issues/70555688) opened by someone else with the same timeout issue, although they say they have been able to download and upload payloads larger than 10 MB.

The firebase.bucket()... code uses the Storage API under the hood, and replacing the bucket code with direct use of the API doesn't change the outcome.

@stephenplusplus (Contributor)

I'm sorry, I was mistaken. I forgot we had a request in April (2017) to not send the Keep-Alive header for Cloud Functions in all of our APIs: (googleapis/google-cloud-node#2254).

This seems to be a unique issue for GCF, so we'll have to see where we get in the Google issue tracker: https://issuetracker.google.com/issues/70555688

For now, I'll call this blocked on our part (the client library). Feel free to provide any extra information, especially if anyone else is having the same problem, or has an idea where we might look for an answer.

@stephenplusplus stephenplusplus added the status: blocked Resolving the issue is dependent on other work. label Dec 18, 2017
@shaibt commented Dec 25, 2017

Having the same issue with Firebase Functions:
Downloading a <1 MB file to a function and then uploading it again after simple processing (same size): about ~10% of my functions time out on the upload (which usually takes ~300 ms).
Intuitively, it seems this happens more often when a function "resource" is recycled, i.e. it's already "warm" (I can deduce this from the function's start-up time).
Problems started around early December.
I never attempted to override the default keep-alive config.
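If I were to try it, I assume the override from earlier in the thread would look roughly like this at the bucket level (assuming the bucket returned by firebase-admin is the same @google-cloud/storage Bucket and exposes the same interceptors array):

const admin = require('firebase-admin')
admin.initializeApp()

const bucket = admin.storage().bucket()

// Mirrors the gcs.interceptors example above, applied at the bucket level.
bucket.interceptors.push({
  request: reqOpts => {
    reqOpts.forever = true
    return reqOpts
  }
})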

@shaibt commented Mar 12, 2018

Hey @stephenplusplus,
Do you know if anything has moved on this, or on googleapis/nodejs-bigquery#41?

@danrasmuson

I can confirm that adding

gcs.interceptors.push({
  request: reqOpts => {
    reqOpts.forever = true
    return reqOpts
  }
})

increased the speed of my uploads from ~20-30 seconds to 2-3 seconds.

@JustinBeckwith JustinBeckwith added the 🚨 This issue needs some love. label Jun 8, 2018
@fhinkel (Contributor) commented Sep 6, 2018

@stephenplusplus Would it make sense to make the option @danrasmuson suggests the default?

@stephenplusplus (Contributor)

@fhinkel I'm not sure. The suggested fix for this issue would reverse the fix for googleapis/google-cloud-node#2254.

@victor5114

Thanks for your help on this issue, @stephenplusplus.

I have the following use case: I read a stream from a remote file stored in Cloud Storage, process it through multiple transforms, and finally write the processed files into two different buckets.

const fieldStream = remoteFileSrc
  .createReadStream()
  .on('error', logErr)
  .pipe(parserStream)
  .pipe(dropStream);

fieldStream
  .on('error', logErr)
  .pipe(csvStream1)
  .pipe(writeStream1);

fieldStream
  .on('error', logErr)
  .pipe(pseudoStream) // This transform slows down the process
  .pipe(csvStream2)
  .pipe(writeStream2);

Locally, the two files are written correctly to both destination buckets, but when I run this code in my Cloud Functions environment I get a timeout, no matter which value I choose for the reqOpts.forever = true/false parameter.

{ Error: ESOCKETTIMEDOUT
    at ClientRequest.<anonymous> (/user_code/node_modules/@google-cloud/storage/node_modules/request/request.js:816:19)
    at ClientRequest.g (events.js:292:16)
    [...]
    at Timer.listOnTimeout (timers.js:214:5) code: 'ESOCKETTIMEDOUT', connect: false }

It looks like the pseudoStream transform slows down the whole process, since it needs to access other remote resources for each chunk processed through the stream, and this causes the timeout. The only workaround I've come up with so far is to process smaller files (~5 MB instead of ~10 MB) to avoid the timeout.

@stephenplusplus (Contributor)

Hey @victor5114, would you mind opening a new issue for that? I don't believe this is related (or at least, there could be an unrelated way to solve it).

@sduskis sduskis removed 🚨 This issue needs some love. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Feb 5, 2019
@JustinBeckwith JustinBeckwith added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: enhancement labels Jun 5, 2019
@stephenplusplus (Contributor)

Our library is now using a lot of new guts, especially in terms of the transport libraries. Is anyone still seeing a performance drop in GCF with a newer version of Storage?

@jkwlui (Member) commented Sep 17, 2019

Closing this stale issue, with the recommendation to try newer versions of the library, which use new transport implementations. @dmho418, please feel free to reopen if you still experience the problem.

@jkwlui jkwlui closed this as completed Sep 17, 2019
@google-cloud-label-sync google-cloud-label-sync bot added the api: storage Issues related to the googleapis/nodejs-storage API. label Jan 31, 2020