-
Notifications
You must be signed in to change notification settings - Fork 329
[plugin/ecs]: deployment pruning in release
step deletes the ALB listener if there is only one targetgroup
#3384
Comments
release
deletes the ALB listener if previous deployment(s) failed to attach a targetgroup to the listenerrelease
step deletes the ALB listener if there is only one targetgroup
Hi @pcjun97, Could you tell us more about what the issue with the current behavior is? You do a good job detailing what it currently does, but we're a little unclear why you believe it's the wrong behavior. Thanks! |
Hi @evanphx, I believe Waypoint should not remove the listener if the only target group attached to it does not belongs to the deployment it is pruning. I have seen Waypoint removing the listener when I was testing with v0.8.2. I had some failing deployments, and after fixing the errors in the config, Waypoint was able to deploy and release the app, attaching the target group to the existing listener. It then proceeds to remove the only listener during the pruning of deployments in the release stage, leaving the load balancer with no listeners, despite the latest deployment and release was successful. |
Ah ok! So you've got target groups on the listener that waypoint isn't managing and thusly it's making the wrong decision. We thought perhaps that was the case. This code makes too many assumptions about listener ownership currently (which would mean it knows all target groups are waypoint managed) and so we need to do some refactoring to fix this sort of bug in general. |
Hmm, the listener is created and managed by waypoint when I bumped into the bug. I am aware of the assumptions made by this plugin, but for this case I do have a small PR that prevents it from deleting the listener by checking the target group's arn first: #3385 |
I saw the PR, I guess the question is under what circumstances would the target group in the listener not be one that was previously created? |
Hey, thanks for the swift reply! Sorry it took quite a while. After some investigations, aside from the manual editing of the listener mentioned in the original report, the other condition I found that would trigger the bug is quite specific: it happens when the deploy step was able to create the target group, but was not able to attach it to the listener. This can be replicated by specifying a wrong subnets in the
During the next successful release, purging of deployments will remove the listener:
I understand the mentioned condition is caused by a faulty configuration due to human errors (also possibly, unexpected interruptions like network issues, process getting killed, etc), and not Waypoint's logic. But while I expect my deployments to fail for arbitrary reasons, I do not expect Waypoint to surprise me by removing the listener that has a working service attached to it. |
Thanks for all the info @pcjun97! This makes much more sense now! Yeah, I agree that Waypoint is being confusing based on these conditions, we'll look to make it much more predictable around this sort of thing. |
Fixed in #4742 |
Describe the bug
If there is only one targetgroup attached to the listener, waypoint will delete the listener during the pruning of old deployments, even if the targetgroup does not belong to the pruned deployments.
During the pruning of deployments in the
release
step, Waypoint only removes the targetgroup that is saved in the state. However, this is not the case when there is only one targetgroup in the listener, which can be seen in this part of the code (specificallyL1206
, there is no targetgroup ARN verification):waypoint/builtin/aws/ecs/platform.go
Lines 1205 to 1217 in 02952e2
Steps to Reproduce
100
.Expected behavior
release
step should not remove the ALB listener if the remaining targetgroup does not belong to the pruned deployments.Waypoint Platform Versions
0.8.2
docker 0.8.2
aws/ecs
Additional context
Bumped into this bug when there were several failing deployments, then the next successful release proceeds to remove the listener. Logs attached below:
The text was updated successfully, but these errors were encountered: