ecs service creation fails when using newly created iam policy #2869

mvandiest · 2015-07-28T02:55:23Z

I have what appears to be a timing issue when attempting to create an IAM role/security policy and immediately use it as the iam_role of a new ECS service.

I get the following aws error in terraform:

* InvalidParameterException: Unable to assume role and validate the listeners configured on your load balancer.  Please verify the role being passed has the proper permissions.
    status code: 400, request id: []

If I specify a pre-existing iam role with an identical policy everything works fine.

I am using the following config:

provider "aws" {
  region = "${var.aws_region}"
}

resource "aws_ecs_cluster" "cluster" {
  name = "${var.exp_name}-${var.exp_version}"
}

resource "aws_ecs_service" "publicapi" {
  name = "publicapi"
  cluster = "${aws_ecs_cluster.cluster.id}"
  task_definition = "${aws_ecs_task_definition.publicapi.arn}"
  desired_count = 3
  iam_role = "${aws_iam_role.ecs_servicerole.arn}"

  load_balancer {
    elb_name = "${aws_elb.adminapi_elb.id}"
    container_name = "publicapi"
    container_port = 8081
  }
}

resource "template_file" "publicapi_task_definition" {
    filename = "${path.module}/task-definitions/publicapi.json.tpl"

    vars {
        version = "${var.exp_version}"
    }
}

resource "aws_ecs_task_definition" "publicapi" {
  family = "publicapi"
  container_definitions = "${template_file.publicapi_task_definition.rendered}"
}

resource "aws_iam_role_policy" "policy" {
    name = "policy"
    role = "${aws_iam_role.ecs_servicerole.id}"
    policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticloadbalancing:Describe*",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "ec2:Describe*",
        "ec2:AuthorizeSecurityGroupIngress"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
EOF
}

resource "aws_iam_role" "ecs_servicerole" {
    name = "ecs_servicerole"
    assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Create a new load balancer
resource "aws_elb" "adminapi_elb" {
  name = "adminapielb"
  availability_zones = ["us-east-1b", "us-east-1c"]

  listener {
    instance_port = 8081
    instance_protocol = "http"
    lb_port = 80
    lb_protocol = "http"
  }

  health_check {
    healthy_threshold = 2
    unhealthy_threshold = 2
    timeout = 3
    target = "HTTP:8081/"
    interval = 30
  }

  cross_zone_load_balancing = true
  idle_timeout = 400
  connection_draining = true
  connection_draining_timeout = 400

  tags {
    Name = "adminapi_elb"
  }
}


philp · 2015-07-29T09:57:31Z

I'm experiencing almost exactly the same problem detailed here.

catsby · 2015-08-20T20:13:01Z

Hello – I believe you are correct, this is a timing issue. It takes a few seconds for permissions to propagate through AWS:

Important
After you create an IAM role, it may take several seconds for the permissions to propagate. If your first attempt to launch an instance with a role fails, wait a few seconds before trying again. For more information, see Troubleshooting Working with Roles in the Using IAM guide.

source: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html

Unfortunately the API doesn't give us any kind of status to base this on.
Can you confirm for me that a follow-up plan & apply (or just apply) is sufficient for things to go through?

philp · 2015-08-21T07:48:18Z

I can confirm that in my scenario, a new apply does get things working as expected.

mvandiest · 2015-08-21T13:35:15Z

I know it's hacky, but would it make sense to add an artificial delay in execution for the policy dependent step? Seeing that the API gives you no feedback I don't see another option.

This is a pretty big deal for automated deployments. Running a failed step twice is not really something that should be happening in a CI system.
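
For anyone wanting to try the artificial delay from the configuration side, a minimal sketch is a local-exec provisioner on the policy resource. Everything specific here is an assumption: the 10-second sleep is an arbitrary guess rather than a documented propagation time, a `sleep` command must exist on the machine running Terraform, and the policy file path is hypothetical. The delay also only holds back resources that depend on aws_iam_role_policy.policy, explicitly or implicitly.

```hcl
resource "aws_iam_role_policy" "policy" {
  name = "policy"
  role = "${aws_iam_role.ecs_servicerole.id}"

  # Hypothetical path; the original config inlines this JSON in a heredoc.
  policy = "${file("policies/ecs_service.json")}"

  # Runs once, right after the policy is created. Resources that depend on
  # this policy do not start until the command has finished.
  provisioner "local-exec" {
    command = "sleep 10" # arbitrary wait, not a guaranteed propagation time
  }
}
```

This does not remove the race; it just makes losing it less likely.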

philp · 2015-08-21T14:10:36Z

Or would it be possible to poll the API after the role/policy has been created, until it can be described, at which point move on to the next step of the plan?

catsby · 2015-08-21T16:40:58Z

> Or would it be possible to poll the API after the role/policy has been created, until it can be described, at which point move on to the next step of the plan?

I don't think an immediate follow-up describe would result in a failure; IIRC it's in the API, just not propagated to all the other AWS parts. But of course I'll try it out....

> I know it's hacky, but would it make sense to add an artificial delay in execution for the policy dependent step?

Might have to do this... I'll try the describe thing first though

radeksimko · 2015-08-23T17:05:13Z

I did try to call Describe immediately after creating the IAM policy and API replied as I expected (unfortunately) - i.e. IAM policy exists. It was not fully propagated at that time though.

Therefore I submitted #3061 which just retries ECS service create calls. It took about 2 secs when I was testing it (effectively 3 retries after 500ms).

mvandiest · 2015-08-27T04:12:02Z

Thx @radeksimko

cordoval · 2015-12-12T03:33:27Z

This problem still occurs for me.

InvalidParameterException: Unable to assume role ...

radeksimko · 2015-12-15T10:17:53Z

@cordoval This could be caused either by naively low timeout (2 mins atm) or strong inconsistency as described here: #3928

Would you mind creating a new issue & attaching a debug log (minus any secrets, of course)? Then we would at least know at which point the error occurred. The outcome can be either the solution described in the linked issue #3928 or a timeout increase.

cordoval · 2015-12-15T11:11:43Z

I added a depends_on like in another ticket and it seems to behave better now; I get fewer errors. I actually forget the errors once I get past a certain point. I think I will start associating errors with commits so that I can go back and reproduce them.

Thanks though for now.

tiyberius · 2015-12-16T20:35:51Z

@cordoval When you added a depends_on, I'm assuming that you added it to the IAM role that gets assigned to the load balancer?

cordoval · 2015-12-16T20:39:46Z

depends_on = ["aws_iam_role_policy.ecs_service_role_policy"]

on the aws_ecs_service resource block
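
Applied to the configuration from the original report, that would look roughly like the sketch below. The resource names are taken from that report (aws_iam_role_policy.policy, not the ecs_service_role_policy name used above), so treat them as assumptions for any other setup. Note that depends_on only orders resource creation: the service waits until the policy has been created, but nothing here waits for IAM to finish propagating it.

```hcl
resource "aws_ecs_service" "publicapi" {
  name            = "publicapi"
  cluster         = "${aws_ecs_cluster.cluster.id}"
  task_definition = "${aws_ecs_task_definition.publicapi.arn}"
  desired_count   = 3
  iam_role        = "${aws_iam_role.ecs_servicerole.arn}"

  # iam_role only references the role, so without this line the service
  # and the role policy may be created in parallel.
  depends_on = ["aws_iam_role_policy.policy"]

  load_balancer {
    elb_name       = "${aws_elb.adminapi_elb.id}"
    container_name = "publicapi"
    container_port = 8081
  }
}
```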

tiyberius · 2015-12-16T20:41:09Z

Cool, thanks! And you said you were still getting errors, but just less frequently?

cordoval · 2015-12-16T21:02:18Z

not anymore, not of this type at least.

sheeley · 2016-01-14T07:23:36Z

I've been running terraform apply with this setup:

resource "template_file" "iam_elb_role" {
  template = "${file("policies/iam_elb_role.json")}"
  vars = {
    elb1 = "arn:aws:elasticloadbalancing:${var.region}:${var.account_id}:loadbalancer/${aws_elb.api.name}"
    elb2 = "arn:aws:elasticloadbalancing:${var.region}:${var.account_id}:loadbalancer/${aws_elb.ui.name}"
  }
}

resource "aws_iam_role_policy" "elb" {
    name = "test_policy"
    role = "${aws_iam_role.ecs_role.id}"
    policy = "${template_file.iam_elb_role.rendered}"
}

resource "aws_ecs_service" "api" {
  name = "lumen-api"
  cluster = "${aws_ecs_cluster.api.id}"
  task_definition = "${aws_ecs_task_definition.api.arn}"
  desired_count = 3
  iam_role = "${aws_iam_role.ecs_role.arn}"
  /*
  if this says "Unable to assume role and validate the listeners", it is likely a timeout:
  https://github.com/hashicorp/terraform/issues/2869
  not sure why it isn't actually fixed, given the bug was closed so long ago.
  */
  depends_on = ["aws_iam_role_policy.elb", "aws_s3_bucket_object.config"]

  load_balancer {
    elb_name = "${aws_elb.api.id}"
    container_name = "lumen-api"
    container_port = 80
  }
}

I'm running into these timeouts regularly. This configuration also creates other resources, which sometimes delays the service creation long enough for it to work, but I often see failures like the one mentioned above:


* aws_ecs_service.api: InvalidParameterException: Unable to assume role and validate the listeners configured on your load balancer.  Please verify the role being passed has the proper permissions.
    status code: 400, request id: ...

Let me know how I can help continue to debug!

radeksimko · 2016-01-14T08:05:35Z

@sheeley this is (unfortunately) a known issue related to eventually-consistent IAM.
What you're describing is described already in #4375 I believe.

Hopefully #4447 and following PRs will address this.

sheeley · 2016-01-25T19:52:47Z

@radeksimko thanks for the info!

sheeley · 2016-01-25T23:05:47Z

@radeksimko Is it possible there's a separate issue? I've gone ahead and created my IAM role in a previous terraform run, so it already exists. When I run terraform plan, I see it is only trying to create 2 ECS services:

+ aws_ecs_service.api
    cluster:                                 "" => "arn:aws:ecs:us-east-1:{acct-id}:cluster/lumen-api"
    desired_count:                           "" => "3"
    iam_role:                                "" => "arn:aws:iam::{acct-id}:role/lumen_ecs_role"
    load_balancer.#:                         "" => "1"
    load_balancer.3516934612.container_name: "" => "lumen-api"
    load_balancer.3516934612.container_port: "" => "80"
    load_balancer.3516934612.elb_name:       "" => "lumen-api-elb"
    name:                                    "" => "lumen-api"
    task_definition:                         "" => "arn:aws:ecs:us-east-1:{acct-id}:task-definition/lumen-api:5"

+ aws_ecs_service.ui
    cluster:                                 "" => "arn:aws:ecs:us-east-1:{acct-id}:cluster/lumen-ui"
    desired_count:                           "" => "3"
    iam_role:                                "" => "arn:aws:iam::{acct-id}:role/lumen_ecs_role"
    load_balancer.#:                         "" => "1"
    load_balancer.2643330267.container_name: "" => "lumen-ui"
    load_balancer.2643330267.container_port: "" => "80"
    load_balancer.2643330267.elb_name:       "" => "lumen-ui-elb"
    name:                                    "" => "lumen-ui"
    task_definition:                         "" => "arn:aws:ecs:us-east-1:{acct-id}:task-definition/lumen-ui:1"

I continue to get the InvalidParameterException. However, when I simulate the role (lumen_elb_role_policy) through the AWS UI, I see all passing. Could there be some additional issue? Should I follow up with a new GH issue?

sheeley · 2016-01-26T00:25:08Z

totally my fault. figured out a policy issue.

ghost · 2020-04-28T02:18:03Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

radeksimko added the bug and provider/aws labels Jul 28, 2015

philp mentioned this issue Jul 29, 2015

IAM role not recognised when creating aws_ecs_service #2880

Closed

catsby added the waiting-response label Aug 20, 2015

radeksimko removed the waiting-response label Aug 23, 2015

radeksimko mentioned this issue Aug 23, 2015

Various ECS bugfixes (IAM, destroy timeout) #3061

Merged

radeksimko closed this as completed in fad019e Aug 25, 2015

radeksimko mentioned this issue Dec 27, 2015

helper: Add ContinuousTargetOccurence to work around inconsistency #4447

Merged

russmac mentioned this issue Mar 3, 2016

0.6.12 "Unable to assume role and validate the listeners configured on your load balancer. Please verify the role being passed has the proper permissions." #5442

Closed

elodani mentioned this issue Jul 13, 2017

interpolation does not wait for resources to complete creation? #15536

Closed

frncmx mentioned this issue Jul 16, 2017

ECR repository policy creation fails when using newly created IAM policy hashicorp/terraform-provider-aws#1164

Closed

bmcustodio pushed a commit to bmcustodio/terraform that referenced this issue Sep 26, 2017

Add Zyborg.Vault PowerShell module to libs list (hashicorp#2869)

cc22ace

suaaa7 mentioned this issue Oct 28, 2019

エラー原因の調査と改善 (Investigate and fix the cause of the error) suaaa7/aws-ml-batch#12

Closed

ghost locked and limited conversation to collaborators Apr 28, 2020
