Flakey deploy... #73

Open
jtslear opened this issue May 19, 2017 · 8 comments

jtslear commented May 19, 2017

This is a Bug Report

Description

For bug reports:

  • What went wrong?
    After a first-time deploy failed due to a bug in the function I created, Serverless told me I broke things, as expected. I took corrective action and deployed again. The deploy indicated success; however, the function source location was not properly updated, leaving me with a false positive that all was well.

  • What did you expect should have happened?
    The function source location should've been updated.

  • What was the config you used?
    Can be found here

  • What stacktrace or error message from your provider did you see?
    No stack trace; Serverless indicated all is well. Google, on the other hand, thinks differently, as the function remains pointed at the older source location. I've attached a gist below with the gcloud output.

Additional Data

  • 2 buckets were created properly, the first one being the one with the failed code:
 projects % gsutil ls gs://sls-synk-dev-1495197075934/serverless/synk/dev/
gs://sls-synk-dev-1495197075934/serverless/synk/dev/1495197091614-2017-05-19T12:31:31.614Z/
gs://sls-synk-dev-1495197075934/serverless/synk/dev/1495197504769-2017-05-19T12:38:24.769Z/
  • However, when describing the function after a deploy, we can see it's still configured to point to the incorrect source instead of our new location (see the verification sketch at the end of this list).

  • Workaround: kick off another deploy. But the fact that this failed the first time has me worried that something is causing this deploy to fail improperly and, for obvious reasons, future deploys as well.

  • The full log of what I did is here

  • In the above log, the time Serverless spends checking the status of the update is significantly different during the deploy that did not update the source properly (line 53).

  • Serverless Framework Version you're using: 1.13.2

  • Operating System: OSX

  • serverless-google-cloudfunctions Version: 1.0.0
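A quick way to check for the stale pointer described above: the snippet below is a hypothetical verification helper (not part of Serverless or this plugin) that reuses the gcloud and gsutil commands from this issue. The function name and bucket path are assumptions taken from the output above.

// verify-deploy.js -- hypothetical helper, not part of the plugin.
// Compares the function's sourceArchiveUrl with the newest object in the bucket.
const { execSync } = require('child_process');

const FUNCTION = 'http'; // assumed function name
const PREFIX = 'gs://sls-synk-dev-1495197075934/serverless/synk/dev/'; // bucket path from this issue

const fn = JSON.parse(execSync(
  `gcloud beta functions describe ${FUNCTION} --format=json`,
  { encoding: 'utf8' }
));

// Object names begin with a millisecond timestamp, so a lexicographic
// sort puts the most recent upload last.
const objects = execSync(`gsutil ls ${PREFIX}`, { encoding: 'utf8' })
  .trim()
  .split('\n')
  .sort();
const newest = objects[objects.length - 1];

console.log(fn.sourceArchiveUrl.startsWith(newest)
  ? 'OK: function points at the latest upload'
  : `STALE: ${fn.sourceArchiveUrl} (latest is ${newest})`);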

pmuens added the bug label May 19, 2017

pmuens commented May 22, 2017

Hey @jtslear, thanks for the very detailed issue description! 👍

I tried to reproduce it. Here's what I did:

I created a new service based on the template and renamed the function in index.js from http to https to ensure that Google Cloud is not able to find the exported function.
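For reference, a minimal sketch of that rename, assuming the shape of the google-nodejs template's index.js (the handler body here is illustrative):

// index.js -- before: the export name matches the configured handler
exports.http = (request, response) => {
  response.status(200).send('Hello World!');
};

// index.js -- after: renamed to "https", so Google Cloud can no longer
// find the exported function "http" and the update phase fails
exports.https = (request, response) => {
  response.status(200).send('Hello World!');
};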

I deployed the service (note this is a fresh deployment from scratch), and the deployment failed in the update phase with the same error message about a missing export that you've also encountered.

After that I changed the name back from https to http and re-deployed. I also got a super fast deployment (like you got here); however, the deployment succeeded and the function points to the correct location in the bucket.

Furthermore, only one bucket was created for this specific service (even after re-deploying).

It seems like you're using v1.0.0. Did you pull this from npm? It seems this bug is already fixed in the current master. Could you please use the version in master and try again to see if this bug is still around?
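One way to pull the plugin straight from master instead of npm is a Git URL dependency; a package.json sketch (this URL matches the yarn.lock output later in the thread):

{
  "devDependencies": {
    "serverless-google-cloudfunctions": "https://github.com/serverless/serverless-google-cloudfunctions.git#master"
  }
}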

Thanks!

jtslear commented May 22, 2017

So I pulled from master this time; however, I'm still able to recreate the above issue. Please note, I'm only getting one bucket, which I believe is legit: it's the contents inside the bucket that are being updated, but the source value for the function is not. I recreated the issue with the exact steps above and ended up with the output below.

The strange thing about this is that you recreated it nearly identically to how I did.

The first gcloud command below was run after the first deploy failed. I then ran gsutil to print the list of objects in the bucket after the second deploy completed successfully. But gcloud still reports the function is in a goofy state:

● % gcloud beta functions describe http
entryPoint: http
httpsTrigger:
  url: https://us-central1-asdf-165014.cloudfunctions.net/http
latestOperation: operations/YXNkZi0xNjUwMTQvdXMtY2VudHJhbDEvaHR0cC80VW1lbmExa0F5Zw
name: projects/asdf-165014/locations/us-central1/functions/http
serviceAccount: asdf-165014@appspot.gserviceaccount.com
sourceArchiveUrl: gs://sls-synk-dev-1495482334612/serverless/synk/dev/1495482335845-2017-05-22T19:45:35.845Z/synk.zip
status: FAILED
timeout: 60s
updateTime: '2017-05-22T19:45:53Z'
● % gsutil ls gs://sls-synk-dev-1495482334612/serverless/synk/dev/
gs://sls-synk-dev-1495482334612/serverless/synk/dev/1495482335845-2017-05-22T19:45:35.845Z/
gs://sls-synk-dev-1495482334612/serverless/synk/dev/1495482416000-2017-05-22T19:46:56.000Z/
● % gcloud beta functions describe http
availableMemoryMb: 256
entryPoint: http
httpsTrigger:
  url: https://us-central1-asdf-165014.cloudfunctions.net/http
latestOperation: operations/YXNkZi0xNjUwMTQvdXMtY2VudHJhbDEvaHR0cC80VW1lbmExa0F5Zw
name: projects/asdf-165014/locations/us-central1/functions/http
serviceAccount: asdf-165014@appspot.gserviceaccount.com
sourceArchiveUrl: gs://sls-synk-dev-1495482334612/serverless/synk/dev/1495482335845-2017-05-22T19:45:35.845Z/synk.zip
status: FAILED
timeout: 60s
updateTime: '2017-05-22T19:45:53Z'
● % cat yarn.lock | grep serverless
"serverless-google-cloudfunctions@https://github.com/serverless/serverless-google-cloudfunctions.git#master":
  resolved "https://github.com/serverless/serverless-google-cloudfunctions.git#c1b670bc10ce1fda873a91e3c111602

pmuens commented May 23, 2017

Hey @jtslear, thanks for getting back and trying it with master!

It seems the duplicate bucket issue was resolved with the fixes in master. I tried once more to reproduce this bug (following the steps I've written down above), but was still unable to.

However, I'm not using the gcloud CLI tool. I looked into the Deployment Manager's Web UI to see the function and the corresponding sourceArchiveUrl, which is correct in my case. 🤔

pmuens commented May 23, 2017

Ok. That's weird. Here's a quick update on the issue:

This seems to be something on Google's end. It seems like the Deployment Manager doesn't reflect the actual state the function is in. While the Deployment Manager tells me that everything works fine, I get an error in the function's Web UI console telling me that the exported method is not correct.

So I can reproduce this behavior here.

I'm still unsure how to solve this, since we're going through the Deployment Manager, which should take care of that.

So the workaround (as pointed out by @jtslear above) is to simply kick off another deployment 🤔

jtslear commented May 23, 2017

Indeed, kicking off another deploy fixes it for some reason.

The only thing I haven't investigated further is whether there's anything Google's audit logs may be able to tell me. The reason I've not looked is that I don't have a lot of experience with them overall.

Maybe file a case with Google?

pmuens commented May 23, 2017

Thanks for getting back, @jtslear.

The only thing I haven't investigated further is whether there's anything Google's audit logs may be able to tell me.

I'm also not 100% sure about that. Haven't looked into it yet.

Maybe file a case with Google?

That sounds like a good idea!

jtslear commented May 24, 2017

@pmuens I think I have an answer for you. No explanation, just the results of an audit trail.

I'm going through the recreation process that we've outlined above (just with a different function name at this point).

During the first deploy, which purposely fails, we call the operation cloudfunctions.functions.create:

{
 protoPayload: {
  @type: "type.googleapis.com/google.cloud.audit.AuditLog"    
  authenticationInfo: {…}    
  authorizationInfo: [
   0: {
    permission: "cloudfunctions.functions.create"      
    granted: true      
   }
  ]
  resourceName: "projects/asdf-165014/locations/us-central1/functions/bob"    
 }
 insertId: "55049ACC251A1.A033D05.AB09F1EC"   
 resource: {…}  
 timestamp: "2017-05-24T18:57:24.601Z"   
 severity: "NOTICE"   
 logName: "projects/asdf-165014/logs/cloudaudit.googleapis.com%2Factivity"   
 operation: {…}   
 receiveTimestamp: "2017-05-24T18:57:34.307875820Z"   
}

The audit trail also provides other details telling me what went wrong, so I know from this information that Google attempted to do something after the creation of the function:

{
 protoPayload: {
  @type: "type.googleapis.com/google.cloud.audit.AuditLog"    
  status: {
   code: 3     
   message: "Function load error: Node.js module defined by file index.js is expected to export function named bob"     
  }
  serviceName: "cloudfunctions.googleapis.com"    
  methodName: "google.cloud.functions.v1beta2.CloudFunctionsService.CreateFunction"    
  resourceName: "projects/asdf-165014/locations/us-central1/functions/bob"    
 }
 resource: {…}   
 timestamp: "2017-05-24T18:57:53.993Z"   
 severity: "ERROR"   
 logName: "projects/asdf-165014/logs/cloudaudit.googleapis.com%2Factivity"   
 operation: {…}  
 receiveTimestamp: "2017-05-24T18:57:58.498014344Z"   
}

During the second deploy, which is supposed to update the function and be successful (but isn't), we call the same operation.

I also do not see any details from Google stating that it attempted to load and validate the function. This is all I see:

{
 protoPayload: {
  @type: "type.googleapis.com/google.cloud.audit.AuditLog"    
  authenticationInfo: {…}    
  authorizationInfo: [
   0: {
    permission: "cloudfunctions.functions.create"      
    granted: true      
   }
  ]
  resourceName: "projects/asdf-165014/locations/us-central1/functions/bob"    
 }  
 resource: {…}   
 timestamp: "2017-05-24T19:01:24.187Z"   
 severity: "NOTICE"   
 logName: "projects/asdf-165014/logs/cloudaudit.googleapis.com%2Factivity"   
 receiveTimestamp: "2017-05-24T19:01:27.489762696Z"   
}

But this is not correct; the function is already created, and we simply need to update it in this case. So I feel all we did was an idempotent action of creating the function. I think that is why Serverless thinks it was successful, because yes, creating something was indeed processed correctly. But we never call anything that updates the function declaration.

On a third deploy, we appear to call a different operation, cloudfunctions.functions.update:

{
 protoPayload: {
  @type: "type.googleapis.com/google.cloud.audit.AuditLog"    
  authenticationInfo: {…}    
  authorizationInfo: [
   0: {
    permission: "cloudfunctions.functions.update"      
    granted: true      
   }
  ]
  resourceName: "projects/asdf-165014/locations/us-central1/functions/bob"    
 } 
 resource: {…}   
 timestamp: "2017-05-24T19:03:25.598Z"   
 severity: "NOTICE"   
 logName: "projects/asdf-165014/logs/cloudaudit.googleapis.com%2Factivity"   
 operation: {…}   
 receiveTimestamp: "2017-05-24T19:03:27.847282533Z"   
}

So it's as if something, either inside Serverless or this library, is checking the wrong component and not finishing up. Perhaps something related to the readiness of the function? Since the function was not created properly the first time, Serverless may not be doing the correct operation during the second deploy.
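To make that concrete, here is a hypothetical pre-deploy check (illustrative only, not the plugin's actual code); it shells out to the same gcloud command used earlier in this thread:

// Hypothetical sketch -- not the plugin's real implementation.
const { execSync } = require('child_process');

function needsForcedUpdate(functionName) {
  try {
    const out = execSync(
      `gcloud beta functions describe ${functionName} --format=json`,
      { encoding: 'utf8' }
    );
    const fn = JSON.parse(out);
    // "status: FAILED" (as seen in the describe output above) means the
    // function resource exists but never loaded; another "create" is a
    // no-op, so the deploy should call cloudfunctions.functions.update.
    return fn.status === 'FAILED';
  } catch (err) {
    // describe fails when the function doesn't exist yet; a plain create is fine
    return false;
  }
}

// "bob" is the function name from the audit logs above
console.log(needsForcedUpdate('bob'));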

Due to this, filing a support case with Google won't be of much use, so I do not plan to do so.

pmuens commented May 25, 2017

@jtslear Wow! Thank you very much for the in-depth analysis!

That makes sense. I'll look into the code and try to figure out if we can change the way the operations are called when something goes wrong.

Thanks again for the very detailed description. This helps a lot!
