Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Critical Bug] sls deploy does not create API endpoints in Stages section of Amazon API Gateway #1684

Closed
codepreneur opened this issue Jul 27, 2016 · 26 comments
Labels
Milestone

Comments

@codepreneur
Copy link
Member

codepreneur commented Jul 27, 2016

Serverless Framework Version:
1.0.0-alpha.2
Operating System:

image

Expected Behavior:

API endpoints described/configured in serverless.yaml file being deployed and visible in Stages section on AWS console (Amazon API Gateway)

Actual Behavior:

They are not visible and are not being deployed there as you can see below:

image

leads to:

image

which leads to:

image

Cloudformation and sls deploy does not throw any errors

I would consider this as critical bug/issue. @pmuens ? @eahefnawy ?

@pmuens
Copy link
Contributor

pmuens commented Jul 27, 2016

Hey @codepreneur thank you for reporting this!

@eahefnawy I think we had some similar issue where @nikgraf had also problems with the REST API stage. Can you find the corresponding issue?

@wedgybo
Copy link
Contributor

wedgybo commented Jul 27, 2016

The problem is because the APIGateway:Deployment action DependsOn your first APIGateway:Method creation, which will typically only happen on your first deploy. That condition should be removed.

@pmuens
Copy link
Contributor

pmuens commented Jul 27, 2016

Thanks @wedgybo for pointing that out and providing a solution!

@wedgybo
Copy link
Contributor

wedgybo commented Jul 27, 2016

I think it's masking an issue of order. I don't know enough about cloudformation and how the resource object works to provide a proper solution yet.

If I remove the DependsOn for the deploy. It fails completely because no methods are created. Not sure what's going on as I assume the items in the resource file run in order and that seems to be correct when I output it

{ AWSTemplateFormatVersion: '2010-09-09',
  Description: 'The AWS CloudFormation template for this Serverless application\'s resources outside of Lambdas and Api Gateway',
  Resources:
   { ServerlessDeploymentBucket: { Type: 'AWS::S3::Bucket' },
     IamRoleLambda: { Type: 'AWS::IAM::Role', Properties: [Object] },
     IamPolicyLambda: { Type: 'AWS::IAM::Policy', Properties: [Object] },
     hello: { Type: 'AWS::Lambda::Function', Properties: [Object] },
     crawl: { Type: 'AWS::Lambda::Function', Properties: [Object] },
     walk: { Type: 'AWS::Lambda::Function', Properties: [Object] },
     run: { Type: 'AWS::Lambda::Function', Properties: [Object] },
     RestApiApigEvent: { Type: 'AWS::ApiGateway::RestApi', Properties: [Object] },
     ResourceApigEvent0: { Type: 'AWS::ApiGateway::Resource', Properties: [Object] },
     ResourceApigEvent1: { Type: 'AWS::ApiGateway::Resource', Properties: [Object] },
     ResourceApigEvent2: { Type: 'AWS::ApiGateway::Resource', Properties: [Object] },
     ResourceApigEvent3: { Type: 'AWS::ApiGateway::Resource', Properties: [Object] },
     GetMethodApigEvent0: { Type: 'AWS::ApiGateway::Method', Properties: [Object] },
     GetMethodApigEvent1: { Type: 'AWS::ApiGateway::Method', Properties: [Object] },
     PostMethodApigEvent2: { Type: 'AWS::ApiGateway::Method', Properties: [Object] },
     PostMethodApigEvent3: { Type: 'AWS::ApiGateway::Method', Properties: [Object] },
     DeploymentApigEvent: { Type: 'AWS::ApiGateway::Deployment', Properties: [Object] },
     helloApigPermission: { Type: 'AWS::Lambda::Permission', Properties: [Object] },
     crawlApigPermission: { Type: 'AWS::Lambda::Permission', Properties: [Object] },
     walkApigPermission: { Type: 'AWS::Lambda::Permission', Properties: [Object] },
     runApigPermission: { Type: 'AWS::Lambda::Permission', Properties: [Object] } },
  Outputs:
   { IamRoleArnLambda: { Description: 'ARN of the lambda IAM role', Value: [Object] },
     ServerlessDeploymentBucketName: { Value: [Object] } } }

@eahefnawy
Copy link
Member

hmmm this is probably related to #1551 ... It might be because of DependsOn ... or simply because the Deployment resource doesn't really change so it doesnt reflect any updates. Maybe putting some unique token in the template would make sure each deployment is always updated in CF

@wedgybo
Copy link
Contributor

wedgybo commented Jul 27, 2016

Right I've done some more digging and the depends on just masks a race condition that exists if you have many functions. A few thoughts.

There needs to be a WaitCondition triggered when all the methods have been created which then runs the deploy.

There also needs to be something that checks for method removal and deploys.

Deploy keys need to be unique to force a new deployment.

    const deploymentTemplate = `
      {
        "Type" : "AWS::ApiGateway::Deployment",
        "DependsOn" : "${this.methodDep}",
        "Properties" : {
          "RestApiId" : { "Ref": "RestApiApigEvent" },
          "StageName" : "${this.options.stage}",
          "Description" : "${this.options.stage} created at ${new Date()}"
        }
      }
    `;

    var uniqueDeploymentObject = {};

    uniqueDeploymentObject['DeploymentApigEvent' + (new Date()).getTime()] = JSON.parse(deploymentTemplate);

    const newDeploymentObject = uniqueDeploymentObject;

@codepreneur
Copy link
Member Author

@wedgybo If I do sls remove and then sls deploy the problem persists. Out of 3 endpoints, all 3 get deployed to Resources but only two of them to Stages and it seems that they get selected randomly (for example this time, only first and third got deployed) so it proves its a race condition.

image

and

image

@wedgybo
Copy link
Contributor

wedgybo commented Jul 27, 2016

Yup. I've got the fix for that in a branch. Just trying to find a solution for when endpoints are removed as I can't see a way to trigger a deploy after those events. Another solution could be to hook into the CloudFormation::Stack Resource UPDATE_COMPLETE event but I'm not sure if that's possible. Maybe someone in the know could chip in?

@wedgybo
Copy link
Contributor

wedgybo commented Jul 28, 2016

Played some more last night and I've come to the conclusion that the deploy can't be part of the same stack update as there's no way of hooking into when the stack cleanup has complete, otherwise removed resources aren't actually removed from the stage.

So I would propose that the first updateStack call has the ApiGateway::Deployment resource removed. After a successful update, a second update runs with an identical setup, but includes the Deployment. This way CloudFormation will only work on the Deployment diff and it will capture any deleted resources after the cleanup of the first update.

I'll submit a PR with the changes, but I'm not convinced it's the cleanest way to achieve the above.

Thoughts?

@codepreneur
Copy link
Member Author

we need react's virtual dom diffing for cloudformation too (where we treat cloudformation as dom)

yoitsro pushed a commit to mallzee/serverless that referenced this issue Jul 28, 2016
Applys the first updateStack without the Deployment resource. Waits for that to complete successfully then applys a second updateStack the same as the first with the Deployment resource added.

Adds DeletePolicy as Retain on the Deployment so rollbacks can be made in the API gateway interface.

Closes serverless#1684
wedgybo added a commit to mallzee/serverless that referenced this issue Jul 28, 2016
Applys the first updateStack without the Deployment resource. Waits for that to complete successfully then applys a second updateStack the same as the first with the Deployment resource added.

Adds DeletePolicy as Retain on the Deployment so rollbacks can be made in the API gateway interface.

Closes serverless#1684
@eahefnawy eahefnawy added this to the v1.0.0-beta.2 milestone Aug 1, 2016
@eahefnawy eahefnawy added the bug label Aug 1, 2016
@flomotlik
Copy link
Contributor

Hey guys, so I finally talked to @eahefnawy and he walked me through this. The basic issue that there is with resources getting correctly deployed to AWS, but those not showing up in Stages sounds to me like a bug in AWS.

While there are other bugs we've also worked around in AWS, this one can really only be resolved through hacks (e.g. removing resources and readding them on second deploy) it seems.

So I really want to make sure that we're adding this to resolve a massive problem. If it isn't a huge issue though and just a nuisance we can also put that into our docs and talk to AWS about a better way to resolve this that is cleaner and doesn't lead to a hack in our system (which will I'm sure lead to other issues down the road)

Is the resources not showing up in the Stage section something prohibitive for specific workflows, is it limiting or hurting your infrastructure in some ways or is it simply bad but not a huge deal?

Just want to make sure I fully understand what an impact this has.

Especially considering that we're not really dealing with Stages in APIG at all and only create one per Cloudformation deployment.

@wedgybo
Copy link
Contributor

wedgybo commented Aug 2, 2016

I immediately ran into problems once I'd adding more than one function which is what send me down this path.

There is a quick fix to the race condition, where you can add all the method resources to the deployments DependsOn option. This will help when setting up all the functions, but does not solve the problem when removing functions.

The stage still keeps endpoints around for removed functions.

I've had a quick conversation with one of the guys at the APIGateway team and he's passed on the query to the cloudformation team.

https://twitter.com/sapessi/status/760500226247360512

Will see what the chat is

@flomotlik
Copy link
Contributor

Adding @sapessi in here as well from AWS, so he is aware :D

This problem has also lead to an issue with #1703 as setting new CORS parameters on a MOCK doesn't trigger a new deployment

wedgybo added a commit to mallzee/serverless that referenced this issue Aug 3, 2016
Applys the first updateStack without the Deployment resource. Waits for that to complete successfully then applys a second updateStack the same as the first with the Deployment resource added.

Adds DeletePolicy as Retain on the Deployment so rollbacks can be made in the API gateway interface.

Closes serverless#1684
@austencollins
Copy link
Member

austencollins commented Aug 3, 2016

@flomotlik Users cannot deploy endpoints at all because of this. If you cannot create a deployment, then no changes will be made on your public endpoints.

I'm able to create the first 1 API Gateway deployment, but no further ones. Recreated this issue 3 times w/ two functions, each with their own endpoint. All deployments after the first create the resources in API Gateway, but do not create an API Gateway Deployment. This is definitely not a nuisance, this makes the framework unusable for all any REST API use-case.

@domdivakaruni
Copy link

@ac360 @flomotlik are you creating a new deployment resource each time you update your RestAPI and updating the DeploymentId on your stage to point to the new deployment? Lets get on a call first thing tomorrow and talk thru it. I think you've talked to Reto and/or Chris W before? I am going to get your email contact from them or if you can email me at ddivakar@amazon.com, we can coordinate on a call for tomorrow AM.

@austencollins
Copy link
Member

austencollins commented Aug 4, 2016

@domdivakaruni @flomotlik I didn't code this functionality. But, when I create a stack, an API Gateway REST API staged deployment is created. When I update the stack, all paths and methods are updated, but I see no evidence that a new deployment was created. It's not in the stage's deployment history in the dashboard.

Perhaps it's simply a matter of us not creating a new CF API Gateway Deployment Resource in our updateStack logic.

@eahefnawy @pmuens @flomotlik Can you guys confirm we are creating the deployment resource on stack updates, and if so jump on a call with the AWS gents to figure out the real issue?

@domdivakaruni +100 for awesome customer support :)

@eahefnawy
Copy link
Member

@ac360 @domdivakaruni the issue is that the deployment template we're using (and the one you linked) is always the same with updates. The same rest api, and the same stage regardless of endpoint/methods. So doesn't the fact that it's linked to the updated restapi should mean that it reflects the updates there as well?

Updating the deployment template itself is easy, we can just add a unique token to the deployment id which should make a new deployment resource. But I'm not 100% sure this will solve the issue since again, it's the same template linked to the same stage/restapi.

So I'm not sure if it's something on AWS end, or simply a certain pattern that we should be following with deployment templates. A call with the AWS team should clear things up for sure. Thanks a lot for pitching in @domdivakaruni 😊

@flomotlik
Copy link
Contributor

@domdivakaruni Reaching out in an email to get the meeting set up

@domdivakaruni
Copy link

Awesome. Yea, lets talk thru it. Ill look for your email! @flomotlik

@flomotlik
Copy link
Contributor

@codepreneur @wedgybo Now I finally get the whole picture of this. This is definitely a complete Show Stopper for the API Gateway implementation. Thanks for reporting and getting this started. Will fix as quickly as possible.

@wedgybo
Copy link
Contributor

wedgybo commented Aug 4, 2016

No problem. Hopefully a sensible fix will come out instead of the hack fix PR raised. I get the feeling we have to create the stage manually rather than relying on the deployment doing it.

@flomotlik
Copy link
Contributor

Yup, talking to @domdivakaruni from AWS soon to find a way to resolve this. I'm not completely sure yet if there is something we need to do differently in our Cloudformation template, or if its a bug in AWS. Thanks for all your research @wedgybo so far on this, very helpful

@flomotlik
Copy link
Contributor

As this is fixed as far as we can fix it I think we can close this specific issue. We'll still have to talk to AWS to make sure the removal happens in a clean way, but we can't really do anything about that yet.

@doapp-ryanp
Copy link
Contributor

Is there still discussions going on with AWS about not requiring re-name of CloudFormation APIG deployment on updates? This limitation (seems like a bug in CF) prevents us from being able to use serverless per #2530

Was the decision to not use swagger under the covers to manage APIG based on management simplicity? Or is there some feature not available in swagger that requires CloudFormation.

@flomotlik
Copy link
Contributor

Is there still discussions going on with AWS about not requiring re-name of CloudFormation APIG deployment on updates? This limitation (seems like a bug in CF) prevents us from being able to use serverless per #2530

Yup, though the problem is that this seems like a very very deep seated thing in CF that if you don't change attributes or the name it doesn't get redeployed. Don't think this is going away and we can't even work around it, because even if we somehow detect the times when you did a change to your HTTP endpoint and only deploy then (and not every time) it would still be the same problem.

Was the decision to not use swagger under the covers to manage APIG based on management simplicity? Or is there some feature not available in swagger that requires CloudFormation

It was easier to put everything into CF and ignore Swagger. You can deploy Swagger yourself of course, but we decided against it simply because it would mean another technology we have to support directly and swagger doesn't work well with other CF resources. With CF you can easily reference everything else.

@doapp-ryanp
Copy link
Contributor

Thanks for the quick response @flomotlik . Thats too bad. We will try to go the route of not having sls manage APIG.

I know you guys have put alot of thought into going pure CloudFront, however their track record of not keeping up with features combined with some serious limitations (like discussed in #2387) is a risky bet to go all-in on for new AWS services.

Probably OK as-is for a v1.x release, but something that should probably be discussed for future releases so you are not held hostage by what CloudFront provides and how it behaves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants