This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

[plugin/ecs]: deployment pruning in release step deletes the ALB listener if there is only one targetgroup #3384

Closed
pcjun97 opened this issue May 24, 2022 · 8 comments

Comments

@pcjun97

pcjun97 commented May 24, 2022

Describe the bug
If there is only one target group attached to the listener, Waypoint deletes the listener while pruning old deployments, even if that target group does not belong to any of the pruned deployments.

When pruning deployments in the release step, Waypoint normally removes only the target group saved in the pruned deployment's state. This is not the case when the listener has only one target group, as this part of the code shows (specifically L1206, where the target group ARN is never checked):

// If there is only 1 target group, delete the listener
if len(def) == 1 && len(def[0].ForwardConfig.TargetGroups) == 1 {
	log.Debug("only 1 target group, deleting listener")
	s.Update("Deleting ALB listener (ARN: %q)", state.Arn)
	_, err = elbsrv.DeleteListenerWithContext(ctx, &elbv2.DeleteListenerInput{
		ListenerArn: listener.ListenerArn,
	})
	if err != nil {
		return status.Errorf(codes.Internal, "failed to delete ALB listener (ARN %q): %s", *listener.ListenerArn, err)
	}
	s.Update("Deleted ALB Listener")
}

Steps to Reproduce

  1. Deploy and release a working app, pruning all old deployments (making sure ALB, listener and targetgroup are created and attached).
    waypoint up -app=example -prune-retain=0
  2. Run deploy without releasing to create and attach the targetgroup to the listener.
    waypoint deploy -app=example -release=false
  3. Manually edit the listener's forward rule to remove every target group except the one created in step 2, and set that target group's weight to 100.
  4. Release the app. The listener is deleted.
    waypoint release -app=example -prune-retain=0
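The reproduction can be modelled in a few lines of Go. This is a sketch under stated assumptions: `TargetGroupTuple`, `Action` and `currentCheck` are hypothetical, simplified stand-ins (loosely named after the ELBv2 `ForwardConfig.TargetGroups` shape, not the AWS SDK types), the weight of the step 2 target group is assumed to be 0, and `currentCheck` reproduces only the condition from the quoted snippet.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the listener shapes the plugin
// inspects; these are not the AWS SDK for Go types.
type TargetGroupTuple struct {
	Arn    string
	Weight int
}

type Action struct {
	TargetGroups []TargetGroupTuple
}

// currentCheck mirrors the quoted condition: one default action with one
// target group means "delete the listener", whichever target group it is.
func currentCheck(def []Action) bool {
	return len(def) == 1 && len(def[0].TargetGroups) == 1
}

func main() {
	// Step 1: waypoint up leaves the listener forwarding to the release's target group.
	def := []Action{{TargetGroups: []TargetGroupTuple{{Arn: "tg-v1", Weight: 100}}}}

	// Step 2: deploy without release attaches a second target group (weight 0 is an assumption).
	def[0].TargetGroups = append(def[0].TargetGroups, TargetGroupTuple{Arn: "tg-v2", Weight: 0})

	// Step 3: the manual edit keeps only tg-v2 and gives it all the traffic.
	def[0].TargetGroups = []TargetGroupTuple{{Arn: "tg-v2", Weight: 100}}

	// Step 4: release prunes the deployment that owned tg-v1. The condition never
	// looks at the ARN, so it decides to delete the listener serving tg-v2.
	fmt.Println("delete listener:", currentCheck(def))
}
```

Running it prints `delete listener: true`, matching the observed behavior.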

Expected behavior
The release step should not remove the ALB listener if the remaining target group does not belong to any of the pruned deployments.
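A minimal sketch of the expected behavior, assuming the deletion decision is given the pruned deployment's target group ARN. `shouldDeleteListener`, `Action` and `TargetGroupTuple` are hypothetical, simplified stand-ins, not the plugin's code or the AWS SDK types.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the listener's forward config
// (not the AWS SDK for Go types).
type TargetGroupTuple struct {
	Arn    string
	Weight int
}

type Action struct {
	TargetGroups []TargetGroupTuple
}

// shouldDeleteListener deletes the listener only when its single remaining
// target group is the one owned by the deployment being pruned (prunedArn).
// Any other ARN means a live deployment still depends on the listener.
func shouldDeleteListener(def []Action, prunedArn string) bool {
	if len(def) != 1 || len(def[0].TargetGroups) != 1 {
		return false
	}
	return def[0].TargetGroups[0].Arn == prunedArn
}

func main() {
	def := []Action{{TargetGroups: []TargetGroupTuple{{Arn: "tg-live", Weight: 100}}}}
	fmt.Println(shouldDeleteListener(def, "tg-pruned")) // false: the listener still serves tg-live
	fmt.Println(shouldDeleteListener(def, "tg-live"))   // true: the last target group belongs to the pruned deployment
}
```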

Waypoint Platform Versions

  • Waypoint CLI Version: 0.8.2
  • Waypoint Server Platform and Version: docker 0.8.2
  • Waypoint Plugin: aws/ecs

Additional context
I ran into this bug after several failed deployments: the next successful release then removed the listener. Logs below:

» Pruning old deployments...
  Deployment: 01G3GW567BNFB45NXHN44373MW (v259)
-> Running deployment destroy v259
-> Destroying ecs deployment...
-> Gathering deployment resource state
-> Describing load balancer arn:aws:elasticloadbalancing:<masked>
-> Found existing listener (ARN: "arn:aws:elasticloadbalancing:<masked>")
-> Finished gathering resource state
-> Initiating deletion of ALB Listener (ARN: "arn:aws:elasticloadbalancing:<masked>")
-> Describing ALB listener (ARN: "arn:aws:elasticloadbalancing:<masked>")
-> Deregistering this deployment's target group from ALB listener
-> Deregistered this deployment's target group from ALB listener
-> Deleting target group 
-> Deleting service 
-> Deleted service 
-> Finished destroying ECS deployment
  Deployment: 01G3GW4R41RQYWETTTYFCEFY9J (v258)
-> Running deployment destroy v258
  Deployment: 01G3GFWHH9FV2RM5J96D1HXW3J (v255)
-> Running deployment destroy v255
-> Destroying ecs deployment...
-> Gathering deployment resource state
-> Describing load balancer arn:aws:elasticloadbalancing:<masked>
-> Found existing listener (ARN: "arn:aws:elasticloadbalancing:<masked>")
-> Finished gathering resource state
-> Initiating deletion of ALB Listener (ARN: "arn:aws:elasticloadbalancing:<masked>")
-> Describing ALB listener (ARN: "arn:aws:elasticloadbalancing:u<masked>")
-> Deleting ALB listener (ARN: "arn:aws:elasticloadbalancing:<masked>")
-> Deleted ALB Listener
-> Deleting target group 
-> Deleting service 
-> Deleted service 
-> Finished destroying ECS deployment
@pcjun97 pcjun97 added the new label May 24, 2022
@pcjun97 pcjun97 changed the title [plugin/ecs]: release deletes the ALB listener if previous deployment(s) failed to attach a targetgroup to the listener [plugin/ecs]: deployment pruning in release step deletes the ALB listener if there is only one targetgroup May 24, 2022
@evanphx
Contributor

evanphx commented Jun 1, 2022

Hi @pcjun97,

Could you tell us more about what the issue with the current behavior is? You do a good job detailing what it currently does, but we're a little unclear why you believe it's the wrong behavior.

Thanks!

@pcjun97
Author

pcjun97 commented Jun 1, 2022

Hi @evanphx,

I believe Waypoint should not remove the listener if the only target group attached to it does not belong to the deployment being pruned.

I saw Waypoint remove the listener while testing with v0.8.2. I had some failing deployments, and after I fixed the errors in the config, Waypoint deployed and released the app, attaching the target group to the existing listener. It then removed the only listener while pruning deployments in the release stage, leaving the load balancer with no listeners even though the latest deployment and release succeeded.

@evanphx
Contributor

evanphx commented Jun 1, 2022

Ah ok! So you've got target groups on the listener that Waypoint isn't managing, and so it's making the wrong decision. We thought that might be the case. This code currently makes too many assumptions about listener ownership (it assumes every target group on the listener is Waypoint-managed), so we need to do some refactoring to fix this sort of bug in general.
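One way to drop the ownership assumption, sketched here as a hypothetical alternative (not necessarily what #3385 or the eventual fix does): treat only the pruned deployment's own target group as Waypoint's to remove, and delete the listener only when removing it leaves the forward config empty. `TargetGroupTuple` and `removeTargetGroup` are illustrative names, not plugin or SDK APIs.

```go
package main

import "fmt"

// TargetGroupTuple is a hypothetical, simplified stand-in for one entry in a
// listener's forward config (not the AWS SDK for Go type).
type TargetGroupTuple struct {
	Arn    string
	Weight int
}

// removeTargetGroup drops only the pruned deployment's own target group and
// leaves every other entry, managed by Waypoint or not, untouched.
// found reports whether prunedArn was attached at all; deleteListener is true
// only when that target group was attached and nothing else remains.
func removeTargetGroup(tgs []TargetGroupTuple, prunedArn string) (remaining []TargetGroupTuple, found, deleteListener bool) {
	for _, tg := range tgs {
		if tg.Arn == prunedArn {
			found = true
			continue
		}
		remaining = append(remaining, tg)
	}
	return remaining, found, found && len(remaining) == 0
}

func main() {
	// Manual-edit case from the report: the listener only forwards to tg-v2,
	// while the deployment being pruned owned tg-v1.
	rest, found, del := removeTargetGroup([]TargetGroupTuple{{Arn: "tg-v2", Weight: 100}}, "tg-v1")
	fmt.Println(len(rest), found, del) // 1 false false
}
```

Compared with special-casing a single target group, this shape also makes pruning a deployment whose target group was never attached a no-op, since `found` stays false.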

@pcjun97
Copy link
Author

pcjun97 commented Jun 1, 2022

Hmm, the listener is created and managed by waypoint when I bumped into the bug.

I am aware of the assumptions this plugin makes, but for this case I do have a small PR that prevents it from deleting the listener by checking the target group's ARN first: #3385

@evanphx
Contributor

evanphx commented Jun 1, 2022

I saw the PR. I guess the question is: under what circumstances would the target group on the listener not be one that was previously created?

@pcjun97
Author

pcjun97 commented Jun 2, 2022

Hey, thanks for the swift reply! Sorry it took quite a while. After some investigation, aside from manually editing the listener as described in the original report, the other condition I found that triggers the bug is quite specific: it happens when the deploy step creates the target group but fails to attach it to the listener.

This can be replicated by specifying the wrong subnets in the alb block, which triggers the following error:
(We believe this is what caused our first encounter with the bug, as we were upgrading from an older version of Waypoint and omitted the subnets field.)

-> Initiating ALB creation
-> No ALB listener specified - looking for listeners for ALB "waypoint-ecs-httpbin"
-> Using existing ALB Listener (ARN: "arn:aws:elasticloadbalancing:<masked>:listener/app/waypoint-ecs-httpbin/df437a1c701d8952/cf9741838e09cc9a")
-> Modifying ALB Listener to introduce target group
! failed to create deployment resources: 2 errors occurred:
        * rpc error: code = Internal desc = failed to introduce new target group to
  existing ALB listener: InvalidConfigurationRequest: The following target groups
  are in a different VPC than load balancer
  'arn:aws:elasticloadbalancing:<masked>:loadbalancer/app/waypoint-ecs-httpbin/df437a1c701d8952':
  arn:aws:elasticloadbalancing:<masked>:targetgroup/httpbin-01G4J0VJB0TJSW3Z01CKG0DQ/77062e3912af8953
        status code: 400, request id: 1991a38b-0c4e-4851-a936-a5f80a5576f6
        * Error during rollback: 1 error occurred:
        * argument cannot be satisfied: type: ecs.WorkspaceDestroy. This is a bug in
  the go-argmapper library since this shouldn't happen at this point.

During the next successful release, pruning old deployments removes the listener:

» Pruning old deployments...
  Deployment: 01G4J0P6VYR39D60KT8Q0NW9MF (v33)
-> Running deployment destroy v33
-> Destroying ecs deployment...
-> Deleting service httpbin-01G4J0P7FSYZ4NR83GQ35082
-> Deleted service httpbin-01G4J0P7FSYZ4NR83GQ35082
-> Initiating deletion of ALB Listener (ARN: "arn:aws:elasticloadbalancing:<masked>:listener/app/waypoint-ecs-httpbin/df437a1c701d8952/cf9741838e09cc9a")
-> Describing ALB listener (ARN: "arn:aws:elasticloadbalancing:<masked>:listener/app/waypoint-ecs-httpbin/df437a1c701d8952/cf9741838e09cc9a")
-> Deregistering this deployment's target group from ALB listener
-> Deregistered this deployment's target group from ALB listener
-> Deleting target group httpbin-01G4J0P7FSYZ4NR83GQ35082
-> Finished destroying ECS deployment
  Deployment: 01G4J0JGTYE5QZE82JC47R1V1S (v32)
-> Running deployment destroy v32
-> Destroying ecs deployment...
-> Deleting service httpbin-01G4J0JHEF1470Z6XZYMP2W4
-> Deleted service httpbin-01G4J0JHEF1470Z6XZYMP2W4
-> Initiating deletion of ALB Listener (ARN: "arn:aws:elasticloadbalancing:<masked>:listener/app/waypoint-ecs-httpbin/df437a1c701d8952/cf9741838e09cc9a")
-> Describing ALB listener (ARN: "arn:aws:elasticloadbalancing:<masked>:listener/app/waypoint-ecs-httpbin/df437a1c701d8952/cf9741838e09cc9a")
-> Deleting ALB listener (ARN: "arn:aws:elasticloadbalancing:<masked>:listener/app/waypoint-ecs-httpbin/df437a1c701d8952/cf9741838e09cc9a")
-> Deleted ALB Listener
-> Deleting target group httpbin-01G4J0JHEF1470Z6XZYMP2W4
-> Finished destroying ECS deployment
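The log above can be replayed as a sketch. Reading it, v33's destroy deregistered its target group (which suggests at least two were attached), and v32's destroy went straight to deleting the listener (which suggests exactly one remained). The code assumes the listener held v33's target group plus the newly released one and that v32's target group was never attached; `pruneStep` and `replay` are hypothetical helpers, with target groups reduced to ARN strings.

```go
package main

import "fmt"

// pruneStep applies one deployment's destroy to the listener's forward target
// groups (simplified to ARN strings). With checkArn=false it follows the quoted
// plugin condition: a single remaining target group deletes the listener.
// With checkArn=true the listener is deleted only if that target group belongs
// to the deployment being destroyed. It returns the remaining target groups
// and whether the listener was deleted.
func pruneStep(tgs []string, prunedArn string, checkArn bool) ([]string, bool) {
	if len(tgs) == 1 {
		if !checkArn || tgs[0] == prunedArn {
			return nil, true
		}
		return tgs, false
	}
	// More than one target group: deregister only this deployment's entry.
	var remaining []string
	for _, arn := range tgs {
		if arn != prunedArn {
			remaining = append(remaining, arn)
		}
	}
	return remaining, false
}

// replay prunes v33 and then v32, in the order shown in the log, and reports
// whether the listener ends up deleted.
func replay(checkArn bool) bool {
	// Assumed starting state: v33's target group plus the newly released one.
	tgs := []string{"tg-v33", "tg-latest"}
	for _, pruned := range []string{"tg-v33", "tg-v32"} {
		var deleted bool
		tgs, deleted = pruneStep(tgs, pruned, checkArn)
		if deleted {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("quoted condition deletes listener:", replay(false)) // true
	fmt.Println("ARN-aware check deletes listener:", replay(true))   // false
}
```

Under the quoted condition, the routine deregistration of v33's target group is what sets up the failure: v32's destroy then sees a single target group and removes the listener, even though that target group belongs to the live release.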

I understand that this condition is caused by a faulty configuration due to human error (or possibly unexpected interruptions such as network issues or a killed process), not by Waypoint's logic. But while I expect my deployments to fail for arbitrary reasons, I do not expect Waypoint to surprise me by removing a listener that has a working service attached to it.

@evanphx
Contributor

evanphx commented Jun 2, 2022

Thanks for all the info @pcjun97!

This makes much more sense now! Yeah, I agree that Waypoint behaves confusingly under these conditions; we'll look to make it much more predictable around this sort of thing.

@pcjun97
Author

pcjun97 commented Jul 31, 2023

Fixed in #4742

@pcjun97 pcjun97 closed this as completed Jul 31, 2023

3 participants