[Bug]: Error destroying aws_ssoadmin resources #33337

Closed
novekm opened this issue Sep 6, 2023 · 15 comments · Fixed by #34751
Labels: bug, prioritized, service/ssoadmin

novekm (Contributor) commented Sep 6, 2023

Terraform Core Version

1.5.2

AWS Provider Version

5.15.0

Affected Resource(s)

aws_ssoadmin_permission_set

Expected Behavior

Successful destroy of resources

Actual Behavior

Failure/Error: Error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Permission set provision not found in AWS account 123456789012.

This is related to a fix that was merged in v5.14, but the issue persists. The only way to get past this error is to run terraform destroy a second time.

This led me to consider adding a retry, as it seems Terraform is attempting to destroy a resource that is already destroyed. Adjusting the new timeouts block has no effect, as the error occurs within 60 seconds in my testing.

Looking deeper into the permission_set.go file for the resource, I found this:

Existing Code

if tfawserr.ErrCodeEquals(err, ssoadmin.ErrCodeResourceNotFoundException) {
	return diags
}
if err != nil {
	return sdkdiag.AppendErrorf(diags, "deleting SSO Permission Set (%s): %s", permissionSetARN, err)
}

From the docs about retries, it appears that this could be modified to retry if these errors occur instead of just returning the error message. I believe it could look something like this:

Potential New Code

// Sketch only: retry.RetryableError takes a single error and must be returned
// from the callback passed to retry.RetryContext, not from the delete function itself.
if tfawserr.ErrCodeEquals(err, ssoadmin.ErrCodeResourceNotFoundException) {
	return retry.RetryableError(err)
}
if err != nil {
	return retry.RetryableError(fmt.Errorf("deleting SSO Permission Set (%s): %w", permissionSetARN, err))
}

I'd like to implement the fix and submit the PR for it, as this issue seems to have been open for a while and multiple customers are affected. It is also a blocker for a module I created and am trying to release that manages AWS IAM Identity Center resources. I just haven't worked with retry logic in Terraform before. Happy for any guidance on testing/implementing this fix.

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

resource "aws_ssoadmin_permission_set" "pset" {
  for_each         = var.permission_sets
  name             = each.key
  instance_arn     = local.ssoadmin_instance_arn
  description      = lookup(each.value, "description", null)
  relay_state      = lookup(each.value, "relay_state", null)      // (Optional) URL used to redirect users within the application during the federation authentication process
  session_duration = lookup(each.value, "session_duration", null) // The length of time that the application user sessions are valid in the ISO-8601 standard
  tags             = lookup(each.value, "tags", {})

  timeouts {
    update = "10m"
  }
}

Steps to Reproduce

  1. terraform apply
  2. terraform destroy

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

#23585

Would you like to implement a fix?

Yes

novekm added the bug label Sep 6, 2023
github-actions bot added the service/ssoadmin label Sep 6, 2023
terraform-aws-provider bot added the needs-triage label Sep 6, 2023
justinretzolk added the prioritized label and removed the needs-triage label Sep 7, 2023

novekm (Contributor, Author) commented Sep 9, 2023

@justinretzolk I just submitted a PR for this that I think should fix it. Can someone review it when they get a chance?

jar-b (Member) commented Nov 17, 2023

Hey @novekm - The error message you shared in the issue body is a failure which can only be present during an update operation for the aws_ssoadmin_permission_set resource.

Here is the function in which the waiting for SSO Permission Set message is constructed:

func provisionPermissionSet(ctx context.Context, conn *ssoadmin.SSOAdmin, permissionSetARN, instanceARN string, timeout time.Duration) error {
	input := &ssoadmin.ProvisionPermissionSetInput{
		InstanceArn:      aws.String(instanceARN),
		PermissionSetArn: aws.String(permissionSetARN),
		TargetType:       aws.String(ssoadmin.ProvisionTargetTypeAllProvisionedAccounts),
	}
	output, err := conn.ProvisionPermissionSetWithContext(ctx, input)
	if err != nil {
		return fmt.Errorf("provisioning SSO Permission Set (%s): %w", permissionSetARN, err)
	}
	if _, err := waitPermissionSetProvisioned(ctx, conn, instanceARN, aws.StringValue(output.PermissionSetProvisioningStatus.RequestId), timeout); err != nil {
		return fmt.Errorf("waiting for SSO Permission Set (%s) provision: %w", permissionSetARN, err)
	}
	return nil
}

And this is only referenced once during the update operation here:

// Re-provision ALL accounts after making the above changes
if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutUpdate)); err != nil {
	return sdkdiag.AppendFromErr(diags, err)
}

Grepping through all of the SSO Admin resources, it does look like some others (boundary_attachment, permission_set_inline_policy, customer_managed_policy_attachment, and managed_policy_attachment) call this function during delete operations, so it's possible the fix proposed in #33384 is still valid but needs to be applied to a different resource.

% rg provisionPermissionSet
internal/service/ssoadmin/permissions_boundary_attachment.go
122:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutCreate)); err != nil {
184:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutDelete)); err != nil {

internal/service/ssoadmin/permission_set_inline_policy.go
96:     if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutCreate)); err != nil {
161:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutDelete)); err != nil {

internal/service/ssoadmin/customer_managed_policy_attachment.go
108:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutCreate)); err != nil {
175:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutDelete)); err != nil {

internal/service/ssoadmin/managed_policy_attachment.go
101:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutCreate)); err != nil {
163:    if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutDelete)); err != nil {

internal/service/ssoadmin/permission_set.go
215:            if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutUpdate)); err != nil {
290:func provisionPermissionSet(ctx context.Context, conn *ssoadmin.SSOAdmin, permissionSetARN, instanceARN string, timeout time.Duration) error {

Are you able to provide a more complete configuration and/or logs to determine which resource this is failing on during the destroy step?

novekm (Contributor, Author) commented Nov 17, 2023

Hi @jar-b, thanks for taking a look into this! As mentioned, the error listed above appears when running terraform destroy. I have just reproduced the issue in my account.

For context, I am using a TF module I created, but I believe the issue persists whether or not I use the module. It also appears others are having the same issue. I'm not sure if they are using a module or not, but it is not my module since it is not public yet, so a module-specific issue can likely be ruled out.

Here's my main.tf:

module "aws-iam-identity-center" {
  source = "./modules/aws-iam-identity-center" // local example

  // Create desired GROUPS in IAM Identity Center
  sso_groups = {
    Admin : {
      group_name        = "Admin"
      group_description = "Admin IAM Identity Center Group"
    },
    Dev : {
      group_name        = "Dev"
      group_description = "Dev IAM Identity Center Group"
    },
    QA : {
      group_name        = "QA"
      group_description = "QA IAM Identity Center Group"
    },
    Audit : {
      group_name        = "Audit"
      group_description = "Audit IAM Identity Center Group"
    },
  }

  // Create desired USERS in IAM Identity Center
  sso_users = {
    NarutoUzumaki : {
      group_membership = ["Admin", "Dev", "QA", "Audit"]
      user_name        = "nuzumaki"
      given_name       = "Naruto"
      family_name      = "Uzumaki"
      email            = "nuzumaki@hiddenleaf.village"
    },
    SasukeUchiha : {
      group_membership = ["QA", "Audit"]
      user_name        = "suchiha"
      given_name       = "Sasuke"
      family_name      = "Uchiha"
      email            = "suchiha@hiddenleaf.village"
    },
  }

  // Create permissions sets backed by AWS managed policies
  permission_sets = {
    AdministratorAccess = {
      description          = "Provides AWS full access permissions.",
      session_duration     = "PT4H", // how long until session expires - this means 4 hours. max is 12 hours
      aws_managed_policies = ["arn:aws:iam::aws:policy/AdministratorAccess"]
      tags                 = { ManagedBy = "Terraform" }
    },
    ViewOnlyAccess = {
      description          = "Provides AWS view only permissions.",
      session_duration     = "PT3H", // how long until session expires - this means 3 hours. max is 12 hours
      aws_managed_policies = ["arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"]
      tags                 = { ManagedBy = "Terraform" }
    },
  }

  // Assign users/groups access to accounts with the specified permissions
  account_assignments = {
    Admin : {
      principal_name  = "Admin"                                   // name of the user or group you wish to have access to the account(s)
      principal_type  = "GROUP"                                   // entity type (user or group) you wish to have access to the account(s)
      permission_sets = ["AdministratorAccess", "ViewOnlyAccess"] // permissions the user/group will have in the account(s)
      account_ids = [                                             // account(s) the group will have access to. Permissions they will have in account are above line
        local.account1_account_id,                                // locals are used to allow for global changes to multiple account assignments
        # local.account2_account_id, // if hard coding the account ids, you would need to change them in every place you want to change
        # local.account3_account_id, // these are defined in a locals.tf file, example is in this directory
        # local.account4_account_id,
      ]
    },
    Audit : {
      principal_name  = "Audit"            // name of the user or group you wish to have access to the account(s)
      principal_type  = "GROUP"            // entity type (user or group) you wish to have access to the account(s)
      permission_sets = ["ViewOnlyAccess"] // permissions the user/group will have in the account(s)
      account_ids = [                      // account(s) the group will have access to. Permissions they will have in account are above line
        local.account1_account_id,         // locals are used to allow for global changes to multiple account assignments
        # local.account2_account_id, // if hard coding the account ids, you would need to change them in every place you want to change
        # local.account3_account_id, // these are defined in a locals.tf file, example is in this directory
        # local.account4_account_id,
      ]
    },
  }

}

1. terraform apply - Apply completes successfully.

Apply complete! Resources: 19 added, 0 changed, 0 destroyed.

2. terraform destroy - fails, here is error message:

╷
│ Error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx1) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Assignment not found.
│ 
│ 
╵
╷
│ Error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx2) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Assignment not found.
│ 
│ 

I am not sure why the error mentions a "provision" when the resources are being destroyed. The only way to resolve the error currently is to run terraform destroy a second time. To address this in the PR I submitted, I added retry logic for when this error appears, since running terraform destroy a second time resolves it consistently.

Here is my TF_LOG="ERROR" output; let me know if the "DEBUG" output would be more helpful:

2023-11-17T03:39:01.685-0500 [ERROR] provider.terraform-provider-aws_v5.26.0_x5: Response contains error diagnostic: diagnostic_summary="waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Permission set provision not found in AWS account xxxxxxxxxxxx." tf_proto_version=5.4 tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=xxx-xxx-xxx-xxx-xxx@caller=github.com/hashicorp/terraform-plugin-go@v0.19.1/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_detail= tf_resource_type=aws_ssoadmin_managed_policy_attachment @module=sdk.proto diagnostic_severity=ERROR tf_rpc=ApplyResourceChange timestamp=2023-11-17T03:39:01.685-0500
2023-11-17T03:39:01.693-0500 [ERROR] vertex "module.aws-iam-identity-center.aws_ssoadmin_managed_policy_attachment.pset_aws_managed_policy[\"ViewOnlyAccess.arn:aws:iam::aws:policy/job-function/ViewOnlyAccess\"] (destroy)" error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-72236a571cf03aa7/ps-baff34dc81e0f1c4) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Permission set provision not found in AWS account xxxxxxxxxxxx.
2023-11-17T03:39:01.715-0500 [ERROR] provider.terraform-provider-aws_v5.26.0_x5: Response contains error diagnostic: @module=sdk.proto diagnostic_detail= diagnostic_severity=ERROR tf_proto_version=5.4 tf_resource_type=aws_ssoadmin_managed_policy_attachment @caller=github.com/hashicorp/terraform-plugin-go@v0.19.1/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_summary="waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Assignment not found." tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=xxx-xxx-xxx-xxx-xxx tf_rpc=ApplyResourceChange timestamp=2023-11-17T03:39:01.715-0500
2023-11-17T03:39:01.721-0500 [ERROR] vertex "module.aws-iam-identity-center.aws_ssoadmin_managed_policy_attachment.pset_aws_managed_policy[\"AdministratorAccess.arn:aws:iam::aws:policy/AdministratorAccess\"] (destroy)" error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Assignment not found.

3. After running terraform destroy a second time, it destroys the permission sets successfully:

Plan: 0 to add, 0 to change, 2 to destroy.
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["AdministratorAccess"]: Destroying... [id=arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx,arn:aws:sso:::instance/ssoins-xxx]
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["ViewOnlyAccess"]: Destroying... [id=arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx,arn:aws:sso:::instance/ssoins-xxx]
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["AdministratorAccess"]: Destruction complete after 0s
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["ViewOnlyAccess"]: Destruction complete after 0s

Destroy complete! Resources: 2 destroyed.

As another note, after re-applying and destroying again, this time only a single error message appears, instead of two errors as listed above:

╷
│ Error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Assignment not found.
│ 
│ 
╵

After a second terraform destroy again it succeeds, but mentions that 2 resources were destroyed:

Destroy complete! Resources: 2 destroyed.

Perhaps, in addition to using the retry logic I added, the order in which resources are destroyed could also be modified? Looking at the plan for my destroy, the permission sets are always the last items in the list. Maybe if the permission sets were deleted first, the error would not appear either? Let me know if you need any more detail. Thanks!

jar-b (Member) commented Nov 17, 2023

Thanks for the extra detail. Inspecting the error logs, it looks like the two failing resource types during the first destroy are aws_ssoadmin_managed_policy_attachment (the ViewOnlyAccess and AdministratorAccess items specifically).

Since it's reaching the waiter step, this implies the detachment of the managed policies is successful, but the provisioning step which occurs after is failing.

_, err = conn.DetachManagedPolicyFromPermissionSetWithContext(ctx, input)
if tfawserr.ErrCodeEquals(err, ssoadmin.ErrCodeResourceNotFoundException) {
	return diags
}
if err != nil {
	return sdkdiag.AppendErrorf(diags, "detaching Managed Policy (%s) from SSO Permission Set (%s): %s", managedPolicyARN, permissionSetARN, err)
}
// Provision ALL accounts after detaching the managed policy.
if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutDelete)); err != nil {
	return sdkdiag.AppendFromErr(diags, err)
}

If you inspect the state after the first destroy (do not refresh), I'm guessing you'll see that both the permission set resources AND the managed policy attachment resources are still present. When the second terraform destroy is run, the state is refreshed prior to execution, and the read operation detects the managed policy attachments no longer exist and removes them from state. This happens before presenting the plan, which is why only the two permission set resources remain and go through cleanly.

policy, err := FindManagedPolicy(ctx, conn, managedPolicyARN, permissionSetARN, instanceARN)
if !d.IsNewResource() && tfresource.NotFound(err) {
	log.Printf("[WARN] SSO Managed Policy Attachment (%s) not found, removing from state", d.Id())
	d.SetId("")
	return diags
}

Seeing the full resource definition and terraform state list output at each step could confirm these assumptions. If this is the root cause, I suspect the provisioning step inside managed_policy_attachment.go is what actually needs to be adjusted to properly handle situations where the underlying permission set or instance ARN no longer exists.

novekm (Contributor, Author) commented Nov 17, 2023

Thanks for the additional context, that makes sense. Upon checking terraform.tfstate after the first destroy, you are correct that I still see resources there. The resources I see are:

  • aws_ssoadmin_instance - data source
  • aws_ssoadmin_managed_policy_attachment - ViewOnlyAccess
  • aws_ssoadmin_permission_set AdministratorAccess
  • aws_ssoadmin_permission_set ViewOnlyAccess

Running terraform state list also shows the following:

❯ terraform state list
module.aws-iam-identity-center.data.aws_ssoadmin_instances.sso_instance
module.aws-iam-identity-center.aws_ssoadmin_managed_policy_attachment.pset_aws_managed_policy["ViewOnlyAccess.arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"]
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["AdministratorAccess"]
module.aws-iam-identity-center.aws_ssoadmin_permission_set.pset["ViewOnlyAccess"]

What is odd to me is that the AdministratorAccess managed policy attachment is not present, so I assume it was deleted. However, the other managed policy attachment (ViewOnlyAccess) is still present, as are the permission sets for both.

I will take a look at managed_policy_attachment.go, and I also welcome feedback on updating my PR to address what you have identified as the root cause.

jar-b (Member) commented Nov 17, 2023

Can you share the resource definitions in your module? The managed policy attachment and permission set resources would be most helpful, along with any other resources they reference. Or an equivalent standalone configuration that produces the same result is fine if you prefer not to share the module content at this time.

novekm (Contributor, Author) commented Nov 17, 2023

Sure, here they are:

aws_ssoadmin_managed_policy_attachment:

resource "aws_ssoadmin_managed_policy_attachment" "pset_aws_managed_policy" {
  # iterate over the permission_sets map of maps, and set the result to be pset_name and pset_index
  # ONLY if the policy for each pset_index is valid.
  for_each = { for pset in local.pset_aws_managed_policy_maps : "${pset.pset_name}.${pset.policy_arn}" => pset }

  instance_arn       = local.ssoadmin_instance_arn
  managed_policy_arn = each.value.policy_arn
  permission_set_arn = aws_ssoadmin_permission_set.pset[each.value.pset_name].arn
}

aws_ssoadmin_permission_set:

# - SSO Permission Set -
resource "aws_ssoadmin_permission_set" "pset" {
  for_each = var.permission_sets
  name     = each.key

  # lookup function retrieves the value of a single element from a map, when provided its key.
  # if the given key does not exist, the default value (null) is returned instead

  instance_arn     = local.ssoadmin_instance_arn
  description      = lookup(each.value, "description", null)
  relay_state      = lookup(each.value, "relay_state", null) // (Optional) URL used to redirect users within the application during the federation authentication process
  session_duration = lookup(each.value, "session_duration", null) // The length of time that the application user sessions are valid in the ISO-8601 standard
  tags             = lookup(each.value, "tags", {})
}

locals.tf:

# - Permission Sets and Policies -
locals {
  # - Fetch SSO Instance ARN and SSO Instance ID -
  ssoadmin_instance_arn = tolist(data.aws_ssoadmin_instances.sso_instance.arns)[0]
  sso_instance_id = tolist(data.aws_ssoadmin_instances.sso_instance.identity_store_ids)[0]

  # Iterate over the objects in var.permission_sets, then evaluate the expression's 'pset_name'
  # and 'pset_index' with 'pset_name' and 'pset_index' only if the pset_index.managed_policies (AWS Managed Policy ARN)
  # produces a result without an error (i.e. if the ARN is valid). If any of the ARNs for any of the objects
  # in the map are invalid, the for loop will fail.

  # pset_name is the attribute name for each permission set map/object
  # pset_index is the corresponding index of the map of maps (which is the variable permission_sets)
  aws_managed_permission_sets = { for pset_name, pset_index in var.permission_sets : pset_name => pset_index if can(pset_index.aws_managed_policies) }
  customer_managed_permission_sets = { for pset_name, pset_index in var.permission_sets : pset_name => pset_index if can(pset_index.customer_managed_policies) }

  #  ! NOT CURRENTLY SUPPORTED !
  # inline_policy_permission_sets = { for pset_name, pset_index in var.permission_sets : pset_name => pset_index if can(pset_index.inline_policy) }



  # When using the 'for' expression in Terraform:
  # [ and ] produces a tuple
  # { and } produces an object, and you must provide two result expressions separated by the => symbol
  # The 'flatten' function takes a list and replaces any elements that are lists with a flattened sequence of the list contents

  # create pset_name and managed policy maps list. flatten is needed because the nested for expression yields a list of lists of maps
  # This nested for loop will run only if each of the managed_policies are valid ARNs.

  # - AWS Managed Policies -
  pset_aws_managed_policy_maps = flatten([
    for pset_name, pset_index in local.aws_managed_permission_sets : [
      for policy in pset_index.aws_managed_policies : {
        pset_name  = pset_name
        policy_arn = policy
      } if pset_index.aws_managed_policies != null && can(pset_index.aws_managed_policies)
    ]
  ])

  # - Customer Managed Policies -
  pset_customer_managed_policy_maps = flatten([
    for pset_name, pset_index in local.customer_managed_permission_sets : [
      for policy in pset_index.customer_managed_policies : {
        pset_name  = pset_name
        policy_name = policy
        # path = path
      } if pset_index.customer_managed_policies != null && can(pset_index.customer_managed_policies)
    ]
  ])

  #  ! NOT CURRENTLY SUPPORTED !
  # - Inline Policy -
  #   pset_inline_policy_maps = flatten([
  #     for pset_name, pset_index in local.inline_policy_permission_sets : [
  #       for policy in pset_index.inline_policy : {
  #         pset_name  = pset_name
  #         inline_policy = policy
  #         # path = path
  #       } if pset_index.inline_policy != null && can(pset_index.inline_policy)
  #     ]
  #   ])

}

I can also create a new standalone configuration and post that here if needed

jar-b (Member) commented Nov 17, 2023

Thanks - A minimal configuration would be helpful as it can be re-used for an acceptance test.

novekm (Contributor, Author) commented Nov 17, 2023

Minimal configuration:

# Fetch existing SSO Instance
data "aws_ssoadmin_instances" "sso_instance" {}

locals {
  # - Fetch SSO Instance ARN and SSO Instance ID -
  ssoadmin_instance_arn = tolist(data.aws_ssoadmin_instances.sso_instance.arns)[0]
  sso_instance_id       = tolist(data.aws_ssoadmin_instances.sso_instance.identity_store_ids)[0]
}

#  Create IAM IDC Group
resource "aws_identitystore_group" "example" {

  identity_store_id = local.sso_instance_id
  display_name      = "Admin"
  description       = "Admin Group"
}

# Create IAM IDC User
resource "aws_identitystore_user" "example" {
  identity_store_id = local.sso_instance_id
  display_name      = "Naruto Uzumaki"
  user_name         = "nuzumaki"
  name {
    given_name  = "Naruto"
    family_name = "Uzumaki"
  }
  emails {
    value   = "nuzumaki@hokage.village"
    primary = true
  }
}

# Create IAM IDC Group Membership
resource "aws_identitystore_group_membership" "sso_group_membership" {
  identity_store_id = local.sso_instance_id
  group_id  = aws_identitystore_group.example.group_id
  member_id = aws_identitystore_user.example.user_id
}

# Create Permission Set
resource "aws_ssoadmin_permission_set" "example" {
  name = "ExamplePermissionSet"
  instance_arn     = local.ssoadmin_instance_arn
  description      = "ExamplePermissionSet"
  session_duration = "PT3H"
}

# Create Managed Policy Attachment
resource "aws_ssoadmin_managed_policy_attachment" "pset_aws_managed_policy" {
  instance_arn       = local.ssoadmin_instance_arn
  managed_policy_arn = "arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"
  permission_set_arn = aws_ssoadmin_permission_set.example.arn
}

# Create Account Assignment
resource "aws_ssoadmin_account_assignment" "account_assignment" {
  instance_arn       = local.ssoadmin_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.example.arn

  principal_id   = aws_identitystore_group.example.group_id
  principal_type = "GROUP"

  target_id   = "000000000000"
  target_type = "AWS_ACCOUNT"
}

1. terraform apply:

Apply complete! Resources: 6 added, 0 changed, 0 destroyed.

2. terraform destroy error:

╷
│ Error: waiting for SSO Permission Set (arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx) provision: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: Received a 404 status error: Permission set provision not found in AWS account 000000000000.
│ 
│ 
╵

3. re-run terraform destroy:

Plan: 0 to add, 0 to change, 1 to destroy.
aws_ssoadmin_permission_set.example: Destroying... [id=arn:aws:sso:::permissionSet/ssoins-xxx/ps-xxx,arn:aws:sso:::instance/ssoins-xxx]
aws_ssoadmin_permission_set.example: Destruction complete after 0s

Destroy complete! Resources: 1 destroyed.

The same issue occurs with the simplified configuration as well.

jar-b self-assigned this Nov 27, 2023

jar-b (Member) commented Nov 27, 2023

Thanks @novekm - I was able to reproduce with the configuration above.

Reproduction and Cause

My current understanding of the issue is that the deletion of both the managed policy attachment and account assignment simultaneously causes problems when the Delete operation of the policy attachment attempts to re-provision the permission set:

// Provision ALL accounts after detaching the managed policy.
if err := provisionPermissionSet(ctx, conn, permissionSetARN, instanceARN, d.Timeout(schema.TimeoutDelete)); err != nil {
	return sdkdiag.AppendFromErr(diags, err)
}

Because the account assignment no longer exists, the provision step fails with an error like:

Received a 404 status error: Permission set provision not found in AWS account 012345678901.

Solution

I was able to resolve this by creating an explicit dependency between the two resources using the depends_on meta-argument. You can add this to either resource (but not both) and destroy should complete in one pass. Here is an example of the modified managed policy attachment resource:

resource "aws_ssoadmin_managed_policy_attachment" "pset_aws_managed_policy" {
  depends_on = [aws_ssoadmin_account_assignment.account_assignment]

  instance_arn       = local.ssoadmin_instance_arn
  managed_policy_arn = "arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"
  permission_set_arn = aws_ssoadmin_permission_set.example.arn
}

Because of this explicit dependency, the destroy operation will completely destroy aws_ssoadmin_managed_policy_attachment before beginning destruction of the account assignment (the inverse of the apply order). This allows the re-provisioning of the permission set to complete successfully before the account assignment is removed. More importantly, this means a clean destroy in one pass 👍 .
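
For comparison, the alternative placement mentioned above (the dependency declared on the account assignment rather than on the policy attachment) might look like the sketch below. It reuses the resource names from the minimal configuration earlier in this thread, and the target_id value is that configuration's placeholder. Treat it as an illustration of the "either resource" remark, not as a configuration whose destroy output was posted here.

```hcl
# Alternative placement: declare the dependency on the account assignment.
# Do not combine this with the depends_on shown on the policy attachment,
# because declaring both would create a dependency cycle.
resource "aws_ssoadmin_account_assignment" "account_assignment" {
  depends_on = [aws_ssoadmin_managed_policy_attachment.pset_aws_managed_policy]

  instance_arn       = local.ssoadmin_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.example.arn

  principal_id   = aws_identitystore_group.example.group_id
  principal_type = "GROUP"

  target_id   = "000000000000" # placeholder account ID
  target_type = "AWS_ACCOUNT"
}
```

With this variant Terraform destroys the account assignment first and only then detaches the managed policy, so the two deletions still do not run concurrently.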

Provider Impact

At this time I'd propose not making provider-side changes to ignore or retry this particular error. This appears to be a function of the relationship between the account assignment and managed policy attachment when destruction of both is triggered simultaneously. The meaning of this error could change depending on the combination of resources being destroyed, so suppressing it could result in incorrect behavior under other conditions. Resolution of the issue with an explicit depends_on argument also factors in, as it allows the impacted configuration to function correctly with no provider changes.

Please let us know if you have any concerns resolving the original issue with this approach.

novekm (Contributor, Author) commented Nov 27, 2023

Thanks @jar-b for the detailed response! I will try adding depends_on and see if it also works on my end, and will keep you posted. If all works, I'll submit a PR with an update to the docs for the resource that explicitly lists this current limitation and its resolution.

jar-b added the waiting-response label Nov 30, 2023

novekm (Contributor, Author) commented Dec 5, 2023

Hi @jar-b! Sorry for the delay, last week was quite busy with re:Invent :) I have tested your recommendation and can confirm it resolves the error for me. I have tested both with the simplified configuration I posted above, and also within a module I created. The two affected resources are indeed aws_ssoadmin_managed_policy_attachment and aws_ssoadmin_account_assignment - the other policy attachments seemed to work fine without adding the depends_on meta-argument.
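
For anyone applying this inside a module that builds the attachments with for_each, a sketch based on the module resource shared earlier in this thread follows. The address aws_ssoadmin_account_assignment.account_assignment is an assumption borrowed from the minimal configuration; the module's actual account assignment resource was not posted here, so substitute its real address.

```hcl
resource "aws_ssoadmin_managed_policy_attachment" "pset_aws_managed_policy" {
  for_each = { for pset in local.pset_aws_managed_policy_maps : "${pset.pset_name}.${pset.policy_arn}" => pset }

  # Assumed resource address. depends_on refers to the resource as a whole,
  # so it covers every instance if the account assignment also uses for_each.
  depends_on = [aws_ssoadmin_account_assignment.account_assignment]

  instance_arn       = local.ssoadmin_instance_arn
  managed_policy_arn = each.value.policy_arn
  permission_set_arn = aws_ssoadmin_permission_set.pset[each.value.pset_name].arn
}
```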

I have created a PR, #34751, that updates the public docs for these resources, adding clear documentation on the error and resolution. I have submitted many docs for the AWSCC provider, but this is my first for the AWS provider. It seems it uses a different format/structure in the repo. Let me know if the PR needs to be updated. Thanks again for the help resolving this! This will help many customers.

github-actions bot removed the waiting-response label Dec 5, 2023
github-actions bot added this to the v5.30.0 milestone Dec 6, 2023
github-actions bot removed the bug label Dec 7, 2023

github-actions bot commented Dec 7, 2023

This functionality has been released in v5.30.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

github-actions bot commented Jan 7, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Jan 7, 2024
justinretzolk added the bug label Feb 10, 2024