
Destroy aws_ecs_service.service on Fargate gets stuck #3414

Open
varas opened this issue Feb 16, 2018 · 11 comments
Labels
bug (Addresses a defect in current functionality) · service/ecs (Issues and PRs that pertain to the ecs service)

Comments

@varas

varas commented Feb 16, 2018

Destroy gets stuck on resource aws_ecs_service on Fargate until you manually stop all the tasks.

Terraform Version

Terraform v0.11.3

  • provider.aws v1.9.0
  • provider.template v1.0.0

Affected Resource(s)


  • aws_ecs_service

Terraform Configuration Files

resource "aws_ecs_service" "service" {
  name            = "..."
  cluster         = "${aws_ecs_cluster.cluster.id}"
  task_definition = "${aws_ecs_task_definition.task.arn}"
  desired_count   = 1
  health_check_grace_period_seconds = 1

  load_balancer = {
    target_group_arn = "${aws_alb_target_group.main.arn}"
    container_name   = "..."
    container_port   = 5555
  }

  launch_type = "FARGATE"

  network_configuration {
    security_groups = ["${aws_security_group.awsvpc_sg.id}"]
    subnets         = ["${module.vpc.private_subnets}"]
  }

  depends_on = ["aws_alb_listener.main"]
}

Debug Output

aws_ecs_service.service: Still destroying... (ID: arn:aws:ecs:us-east-1:218277271359:service/blink, 10s elapsed)
aws_ecs_service.service: Still destroying... (ID: arn:aws:ecs:us-east-1:218277271359:service/blink, 20s elapsed)
...

Expected Behavior

In order to destroy the Fargate ECS service, Terraform should stop all of the service's tasks.

Actual Behavior

It gets stuck trying to destroy the resource.

Steps to Reproduce

Simply launch a Fargate cluster using launch_type = "FARGATE", then:

  1. terraform apply
  2. terraform destroy
@bflad added the service/ecs and bug labels and removed the service/ecs label on Feb 21, 2018
@rnemec-ng

Is this going to be looked into? Are there any workarounds? (preferably without manual intervention)
Thanks

@marcotesch
Contributor

The ecs_service resource's delete operation still drains the tasks within a service.

This might not be an open issue anymore, @bflad?

@nwade615

This is happening to me, as well. The CLI gets stuck on aws_ecs_service.api: Still destroying.... In the AWS console, the ECS service appears destroyed, but the running tasks remain. Strangely, it only happens with one of my Fargate services, not all of them. I must manually stop the tasks in the console for the destroy to continue.

Terraform v0.11.13
provider.aws v2.2.0

@bavibm

bavibm commented Aug 14, 2019

Hello all, I'm also getting this issue with Terraform v0.12.6 and AWS provider v2.23.0.

This is my ECS configuration, excluding load balancer and other network-related resources (replacing details with "X"):

# ecs.tf

resource "aws_ecs_cluster" "X" {
  name = var.name_prefix
}

resource "aws_ecs_task_definition" "X" {
  family                   = "${var.name_prefix}-X"
  execution_role_arn       = "arn:aws:iam::X:role/ecsTaskExecutionRole"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]

  cpu                   = 1024
  memory                = 2048
  container_definitions = file("${path.module}/task-definitions/X.json")

}

resource "aws_ecs_service" "X" {
  name            = "${var.name_prefix}-X"
  cluster         = aws_ecs_cluster.X.id
  task_definition = aws_ecs_task_definition.X.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    security_groups = [aws_security_group.X.id]
    subnets         = [var.service_subnet_id]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.X.id
    container_name   = "X"
    container_port   = var.X
  }

  depends_on = [aws_lb_listener.X]
}

Once it starts to destroy my ECS resources, it hangs at aws_ecs_service.X: Still destroying...
I have to manually go into the ECS management console and stop the running tasks in this service, cancel my Terraform destroy, and re-issue the command for it to work.

I am currently looking into using the local-exec provisioner to execute AWS CLI commands on the destroy stage for the service resource in order to automatically stop all tasks running in it as a workaround.

@bavibm

bavibm commented Aug 15, 2019

So I managed to get the aforementioned workaround working for my specific case: I created a shell script that Terraform executes to stop the ECS tasks before destroying the service so that it doesn't get stuck. This requires the AWS CLI to be installed and configured on the same machine.

Here is what it looks like:

#!/usr/bin/env bash

if [ -z "${REGION}" ] || [ -z "${CLUSTER}" ] || [ -z "${SERVICE}" ]; then
  echo "Please specify a region, cluster name, and service name..."
  exit 1
fi

if ! [ -x "$(command -v aws)" ]; then
  echo "The AWS CLI not installed..."
  >&2
  exit 1
else
  echo "AWS CLI found!"
fi

aws ecs list-tasks \
  --region ${REGION} \
  --cluster ${CLUSTER} \
  --service-name ${SERVICE} \
  --output text \
  >$(dirname $0)/.tasks

IFS=$'\n'
arns=($(awk '/TASKARNS/ {print $2}' $(dirname $0)/.tasks))

# Stop each task that belongs to the service so the destroy can proceed
for arn in "${arns[@]}"; do
  echo "Stopping task ${arn}..."
  aws ecs stop-task \
    --region ${REGION} \
    --cluster ${CLUSTER} \
    --task ${arn} \
    > /dev/null
done

rm $(dirname $0)/.tasks

Copy the script somewhere in your module or root, such as {module}/scripts/stop-tasks.sh, and inside your ECS service resource add the local-exec provisioner so it looks something like this:

resource "aws_ecs_service" "X" {

  ...

  provisioner "local-exec" {
    when = "destroy"
    command = "${path.module}/scripts/stop-tasks.sh > ${path.module}/scripts/stop-tasks.out"
    environment = {
      REGION = var.region,
      CLUSTER = aws_ecs_cluster.X.name,
      SERVICE = self.name # use self here to avoid a self-reference cycle
    }
  }
}

I haven't tested it in other situations, but feel free to use and modify it at your leisure! I hope this issue gets fixed soon.

@sethhochberg

sethhochberg commented Nov 18, 2020

Following up with another possible workaround, for anyone who needs it. We took inspiration from @bavibm's solution and implemented a destroy provisioner on the cluster resource which stops all tasks, idles each service, and waits for things to reach a state where the cluster itself can be destroyed.

The important part of your script:

SERVICES="$(aws ecs list-services --cluster "${CLUSTER}" | grep "${CLUSTER}" || true | sed -e 's/"//g' -e 's/,//')"
for SERVICE in $SERVICES ; do
  # Idle the service that spawns tasks
  aws ecs update-service --cluster "${CLUSTER}" --service "${SERVICE}" --desired-count 0

  # Stop running tasks
  TASKS="$(aws ecs list-tasks --cluster "${CLUSTER}" --service "${SERVICE}" | grep "${CLUSTER}" || true | sed -e 's/"//g' -e 's/,//')"
  for TASK in $TASKS; do
    aws ecs stop-task --cluster "${CLUSTER}" --task "$TASK"
  done

  # Delete the service, then wait for it to report as inactive
  aws ecs delete-service --cluster "${CLUSTER}" --service "${SERVICE}"
  aws ecs wait services-inactive --cluster "${CLUSTER}" --services "${SERVICE}"
done

Your cluster definition:

resource "aws_ecs_cluster" "whatevername" {
  name = "whatever_cluster_name"

  provisioner "local-exec" {
    when = destroy
    command = "${path.module}/scripts/stop-tasks.sh"
    environment = {
      CLUSTER = self.name
    }
  }
}

Because of hashicorp/terraform#23679, we rely only on self references to pass values into the cleanup script, and discover the rest from the cluster data available via the AWS CLI. Our AWS profile and region are set via other configuration on the host that executes the script.

@bclabs-kylian

This happened to me as well. I had to remove the ECS security group from the RDS security group manually.

@moazzamk

This is happening to me. If I try to delete the security group manually (through the AWS console), it says it is being used by a network interface. If I try to delete the network interface, it says it is being used by the security group.

@davidbudnick

davidbudnick commented Feb 12, 2024

Still happening; is there any solution?

module.ecs.aws_ecs_service.keep_ui_service_staging: Still destroying... [id=arn:aws:ecs:us-east-1:905418292571:serv...luster-staging/keep-ui-service-staging, 1m0s elapsed]
module.ecs.aws_ecs_service.keep_ui_service_staging: Still destroying... [id=arn:aws:ecs:us-east-1:905418292571:serv...luster-staging/keep-ui-service-staging, 1m10s elapsed]

(It hit almost 6 mins before I manually killed the job)

I had to manually run:
terraform state rm module.ecs.aws_ecs_service.keep_ui_service_staging

@davidbudnick

davidbudnick commented Feb 12, 2024

Update:

It seems the team is aware of the issue and has suggested adding a depends_on for the policy:
REF: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ecs_service
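
Concretely, the note in those docs amounts to something like the sketch below; the X names and aws_iam_role_policy.ecs_service are placeholders for your own resources:

resource "aws_ecs_service" "X" {
  name            = "X"
  cluster         = aws_ecs_cluster.X.id
  task_definition = aws_ecs_task_definition.X.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  # Keep the service's IAM role policy alive until the service itself is
  # destroyed; without this ordering the delete can hang while draining.
  depends_on = [aws_iam_role_policy.ecs_service]
}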

I was able to get it working without adding any extra scripts. The destroy finished in around 2m40s. (This could be related to the container's drain timeout.)

Edit: As per the docs:

The following target group attributes are supported. You can modify these attributes only if the target group type is instance or ip. If the target group type is alb, these attributes always use their default values.
RE: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html

Therefore it will be 300 seconds by default, but the bonus is that the resource is deleted without having to manually stop the tasks 🥳

Overall, if you don't add the depends_on for the policy, the destroy will never finish.

Most likely the issue can be closed 📕

@javierguzman

javierguzman commented Apr 22, 2024

I "fixed" this by setting the desired count to zero, similar to what others as previously done:

provisioner "local-exec" {
    when = destroy
    command = <<EOF
    echo "Update service desired count to 0 before destroy."
    REGION=${split(":", self.cluster)[3]}
    aws ecs update-service --region $REGION --cluster ${self.cluster} --service ${self.name} --desired-count 0 --force-new-deployment
    echo "Update service command executed successfully."
    EOF
  }

  timeouts {
    delete = "5m"
  }

I guess this should be done automatically by the provider.
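
For context, both of those blocks sit inside the aws_ecs_service resource itself. A rough sketch with placeholder X names, assuming cluster is set to the cluster ARN (aws_ecs_cluster.X.id) so the region can be derived from self.cluster:

resource "aws_ecs_service" "X" {
  name            = "X"
  cluster         = aws_ecs_cluster.X.id
  task_definition = aws_ecs_task_definition.X.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  # Destroy-time provisioners may only reference self, so the region is
  # derived from the cluster ARN rather than from a variable.
  provisioner "local-exec" {
    when    = destroy
    command = "aws ecs update-service --region ${split(":", self.cluster)[3]} --cluster ${self.cluster} --service ${self.name} --desired-count 0"
  }

  timeouts {
    delete = "5m"
  }
}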
