ECS service parameter wait_for_steady_state can lead to inconsistent new deployments #16012

maximelenair · 2020-11-04T12:04:56Z

Terraform CLI and Terraform AWS Provider Version

Terraform core version: 0.13.3
Terraform AWS provider version: 3.13.0

Affected Resource(s)

aws_ecs_service

Terraform Configuration Files

resource "aws_ecs_service" "my_service" {
  name          = "my-service"
  cluster       = "my-cluster"
  desired_count = 1

  launch_type             = "FARGATE"
  enable_ecs_managed_tags = true
  propagate_tags          = "SERVICE"

  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 50
  wait_for_steady_state              = true

  network_configuration {
    subnets         = ["subnet-*******", "subnet-*******", "subnet-*******"]
    security_groups = ["sg-********"]
  }

  service_registries {
    registry_arn = "arn:aws:servicediscovery:*******:********:service/srv-*********"
  }

  task_definition = "my-service:10"
  tags            = local.common_tags
}

Expected Behavior

When using the wait_for_steady_state parameter while creating a service, there are three possible service/task outcomes:

1. Both the service and the task are healthy, and terraform apply succeeds.
2. The service has an issue, and terraform apply fails after a timeout (e.g., the specified Docker image does not exist).
3. The service is running, but the task has a persistent issue that prevents it from passing its initial health check, so terraform apply fails after a timeout (e.g., a container fails to start because of a missing environment variable).

Actual Behavior

In cases 1 and 2, the actual behavior matches the expected behavior.
In case 3, the actual behavior is inconsistent across runs of the same Terraform configuration.

Results of 5 runs (deploy, then destroy, with no configuration changes):

Test 1: Success after 2 minutes 30 seconds
Test 2: Success after 7 minutes 40 seconds
Test 3: Success after 1 minute 20 seconds
Test 4: Failure, timeout after 10 minutes
Test 5: Success after 5 minutes 20 seconds

Steps to Reproduce

Build a Docker image whose container fails on startup
Create an ECS service using the wait_for_steady_state parameter
Create and destroy the resource multiple times
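A minimal sketch of the first step, assuming a Fargate task definition whose only container exits with a non-zero status when a required variable is unset (standing in for case 3). The `busybox` image from the ECR Public mirror, the `REQUIRED_VAR` name, and the CPU/memory sizes are illustrative assumptions, not part of the original report.

```hcl
# Hypothetical task definition used only to reproduce case 3:
# the container starts, exits with status 1, and ECS keeps replacing it.
resource "aws_ecs_task_definition" "failing" {
  family                   = "my-service"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc" # Fargate tasks must use awsvpc networking
  cpu                      = "256"
  memory                   = "512"

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = "public.ecr.aws/docker/library/busybox:latest" # assumed public image
      essential = true
      # Simulate a startup failure: exit 1 because REQUIRED_VAR (hypothetical) is not set.
      command = ["sh", "-c", "test -n \"$REQUIRED_VAR\" || exit 1"]
    }
  ])
}
```

The service would then use `task_definition = aws_ecs_task_definition.failing.arn`, and repeating `terraform apply` / `terraform destroy` shows whether the outcome varies between runs.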

References

Add wait_for_steady_state attribute to aws_ecs_service #3485


jacob-israel-turner · 2020-11-23T19:45:42Z

Assuming the provider is using ecs.wait under the hood - I ran into a similar issue (outside of Terraform) on a project. ecs.wait will only wait for 10 minutes before failing out, while occasionally ECS deployments can take up to 15 minutes to reach a steady state. We solved this locally by calling ecs.wait twice in a row, in case the first timed out. We haven't run into this issue since.
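A sketch of the same "wait twice" idea expressed in Terraform, assuming the AWS CLI is installed where Terraform runs, the hashicorp/null provider is available, and `wait_for_steady_state` is set to `false` on the service. `aws ecs wait services-stable` polls the service and gives up after about 10 minutes, so chaining two calls with `||` allows roughly 20. The resource name, the local, and the trigger choice are hypothetical.

```hcl
locals {
  # AWS CLI waiter: polls until the service is stable, fails after ~10 minutes.
  wait_cmd = "aws ecs wait services-stable --cluster my-cluster --services my-service"
}

resource "null_resource" "wait_for_service_stable" {
  # Re-run the wait whenever the service points at a different task definition.
  triggers = {
    task_definition = aws_ecs_service.my_service.task_definition
  }

  provisioner "local-exec" {
    # The second wait only runs if the first one times out or fails.
    command = "${local.wait_cmd} || ${local.wait_cmd}"
  }
}
```

Note that this only extends the waiting window; it does not make a task that keeps crashing become stable.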

straygar · 2021-03-24T10:00:14Z

This is super painful. We have to choose between:

Fire-and-forget with the Terraform update, then verify the deployment in the ECS service
Flaky deployments that take close to 10 minutes or more

Are there any workarounds we can use until this is fixed?

nickfaughey · 2021-05-07T20:39:36Z

If it fits your app's architecture, you can look into lowering the target group's deregistration_delay, which defaults to 5 minutes and may eat up half of this hardcoded 10-minute wait time.
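A minimal sketch of that suggestion, assuming the service is registered with an ALB target group (the original configuration uses Cloud Map `service_registries` instead, where this setting does not apply). The name, port, protocol, masked VPC ID, and the 30-second value are illustrative; a shorter drain is only safe if in-flight requests finish within it.

```hcl
# Hypothetical target group for a Fargate service behind an ALB.
resource "aws_lb_target_group" "my_service" {
  name        = "my-service"
  port        = 80
  protocol    = "HTTP"
  target_type = "ip" # Fargate tasks in awsvpc mode register by IP address
  vpc_id      = "vpc-********" # placeholder

  # Seconds a deregistered target keeps draining (provider default: 300).
  # A shorter drain leaves more of the steady-state wait for the new tasks.
  deregistration_delay = 30
}
```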

We intended to replace wait_for_steady_state with a custom shell script in commit 6a4fac2. We found the shell script was buggy and difficult to maintain across local (macOS) and CI/CD (Ubuntu) environments. Instead, we'll go back to using wait_for_steady_state. If flakiness continues to be an issue, we'll investigate reducing the deregistration_delay on the ALB target group to allow services to reach a steady state more quickly, as suggested here: hashicorp/terraform-provider-aws#16012 (comment)

Co-authored-by: Olly Swanson <olly.swanson95@gmail.com>

anGie44 · 2022-04-19T00:17:08Z

Hi @maximelenair, some changes to the handling of wait_for_steady_state were recently released in v4.10.0 (edit: and more recently in v4.13.0), so if you are able to upgrade to the latest version and give it a go, it's possible this particular issue you are seeing has been addressed as well. If you or anyone following this issue are able to provide feedback after upgrading, it would be much appreciated!

Relates #24223

github-actions · 2022-07-08T01:48:48Z

This functionality has been released in v4.22.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
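For readers picking up the fix from #25641, a minimal sketch (assuming provider v4.22.0 or later) of giving the steady-state wait more room through the service's `timeouts` block. The 30-minute values are illustrative and should be sized to how long your deployments actually take; the remaining service arguments are trimmed to the essentials from the original configuration.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.22.0" # release that added create/update timeouts to aws_ecs_service
    }
  }
}

resource "aws_ecs_service" "my_service" {
  name            = "my-service"
  cluster         = "my-cluster"
  task_definition = "my-service:10"
  desired_count   = 1

  wait_for_steady_state = true

  # Illustrative values: allow longer than the previous fixed 10-minute wait.
  timeouts {
    create = "30m"
    update = "30m"
  }
}
```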

github-actions · 2022-08-07T02:36:51Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

ghost added the service/ecs label Nov 4, 2020

github-actions bot added the needs-triage label Nov 4, 2020

bill-rich added the bug label and removed the needs-triage label Nov 4, 2020

This was referenced Apr 16, 2021:

ECS service wait_for_steady_state doesn't work #18329 (closed)
Introduce custom timeout when waiting for aws_ecs_service to reach a steady state #18868 (closed)

gdavison mentioned this issue Jul 1, 2022 in resource/aws_ecs_service: Add customizable timeouts for Create and Update #25641 (merged)

gdavison closed this as completed in #25641 Jul 4, 2022

github-actions bot added this to the v4.22.0 milestone Jul 4, 2022

github-actions bot locked as resolved and limited conversation to collaborators Aug 7, 2022
