Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destroy aws_ram_resource_share_accepter from member account when share contains some resource types #19718

Merged

Conversation

kbalk
Copy link
Contributor

@kbalk kbalk commented Jun 8, 2021

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for pull request followers and do not help prioritize the request

Closes #19319

Output from acceptance testing:

$ TF_ACC=1 go test ./aws -v -count 1 -parallel 1 -run=TestAccAwsRamResourceShareAccept
=== RUN   TestAccAwsRamResourceShareAccepter_basic
=== PAUSE TestAccAwsRamResourceShareAccepter_basic
=== RUN   TestAccAwsRamResourceShareAccepter_disappears
=== PAUSE TestAccAwsRamResourceShareAccepter_disappears
=== RUN   TestAccAwsRamResourceShareAccepter_resource_association
=== PAUSE TestAccAwsRamResourceShareAccepter_resource_association
=== CONT  TestAccAwsRamResourceShareAccepter_basic
--- PASS: TestAccAwsRamResourceShareAccepter_basic (80.79s)
=== CONT  TestAccAwsRamResourceShareAccepter_resource_association
--- PASS: TestAccAwsRamResourceShareAccepter_resource_association (90.39s)
=== CONT  TestAccAwsRamResourceShareAccepter_disappears
--- PASS: TestAccAwsRamResourceShareAccepter_disappears (257.06s)
PASS
ok  	github.com/terraform-providers/terraform-provider-aws/aws	428.329s

I just realized that the comments associated with this PR are probably confusing: The original unit test submitted with this code change brought out a timing issue. As this timing issue was not the result of the code change, a different unit test was created to demonstrate that the code change worked. The timing issue still needs to be addressed and Terraform will need to decide how they want to handle it. The original unit test can be retrieved from the commits and the comments provide more information on the timing problem.

@kbalk kbalk requested a review from a team as a code owner June 8, 2021 19:30
@ghost ghost added size/M Managed by automation to categorize the size of a PR. labels Jun 8, 2021
@github-actions github-actions bot added service/ram Issues and PRs that pertain to the ram service. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. needs-triage Waiting for first response or review from a maintainer. labels Jun 8, 2021
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome @kbalk 👋

It looks like this is your first Pull Request submission to the Terraform AWS Provider! If you haven’t already done so please make sure you have checked out our CONTRIBUTING guide and FAQ to make sure your contribution is adhering to best practice and has all the necessary elements in place for a successful approval.

Also take a look at our FAQ which details how we prioritize Pull Requests for inclusion.

Thanks again, and welcome to the community! 😃

@kbalk
Copy link
Contributor Author

kbalk commented Jun 8, 2021

I've made the source code change and modified the unit test for this issue, but the unit test will give the error:

    testing_new.go:63: Error running post-test destroy, there may be dangling resources: exit status 1
        
        Error: error deleting Route53 Resolver Query Log Config (rqlc-9b7a9c6be6d1464e): InvalidRequestException: 
[RSLVR-01203] You cannot delete a query logging configuration that is currently shared. Trace Id: 
"1-60bedf08-4114d75f2a1ac4025cd07549"

A unit test has the ability to check that the "resource of interest" (in this case aws_ram_resource_share_accepter) has been destroyed, but I can't squelch the error message related to the destruction of other related resources. There's an ExpectError() function, but the error doesn't occur where I could catch it. There something about the timing of the destroy that can affect the success or failure of the unit test.

I can silence the above error in resource_aws_route53_resolver_query_log_config_association.go, resourceAwsRoute53ResolverQueryLogConfigDelete(), but I would then have to make the change for each resource in the list provided by AWS (as mentioned in the original issue). I'm not sure those changes would be acceptable. Here's the code snippet I would add (I tested this and it works):

    if isAWSErr(err, "InvalidRequestException", "cannot delete a query logging configuration that is currently shared") {
        return nil
    }     

Basically, I'm not sure how to work around the destroy error.

@kbalk
Copy link
Contributor Author

kbalk commented Jun 11, 2021

@shuheiktgw I was wondering about a change you made in your commit of April 27, 2021 to Fix aws_ram_resource_share_accepter eventual consistency problems. One of those changes involved the creation of the function ResourceShareOwnedBySelfDisassociated() in terraform-provider-aws/aws/internal/service/ram/waiter/waiter.go. The function calls WaitForState() and the StateChangeConf value for "Target" is an empty list.

In my test case for this PR, I'm deleting aws_ram_resource_share_accepter with associated resources, but sometimes waiter.ResourceShareOwnedBySelfDisassociated() returns before those resource are destroyed. I'm thinking it's because the waiter doesn't have Targets to wait on and returns too soon. I could use some advise or help to correct this problem and was wondering if you'd be willing to help?

@kbalk
Copy link
Contributor Author

kbalk commented Jun 11, 2021

If I set the test option "parallel" to 20 instead of 1, the unit test will pass. If I set TF_LOG, the unit test will pass.

@lorengordon
Copy link
Contributor

Hmm, looking like there is some kind of race condition in the aws_ram_resource_association waiter when the query log configuration is shared. Fairly sure that issue with the waiter is behind the error:

        Error: error deleting Route53 Resolver Query Log Config (rqlc-9b7a9c6be6d1464e): InvalidRequestException: 
[RSLVR-01203] You cannot delete a query logging configuration that is currently shared. Trace Id: 
"1-60bedf08-4114d75f2a1ac4025cd07549"

In order to separate that race condition from the issue that this patch is fixing, recommend sharing a Codebuild job in the test instead. Here is an updated config that is working for me when I exercise this patch:

provider "aws" {
  profile = "resource-member"
}

provider "aws" {
  alias   = "owner"
  profile = "resource-owner"
}

resource "aws_ram_resource_share_accepter" "member" {
  share_arn = aws_ram_principal_association.invite.resource_share_arn
}

resource "aws_ram_principal_association" "invite" {
  provider = aws.owner

  principal          = data.aws_caller_identity.member.account_id
  resource_share_arn = aws_ram_resource_share.owner.arn
}

resource "aws_ram_resource_share" "owner" {
  provider = aws.owner

  name                      = local.name
  allow_external_principals = true
}

resource "aws_ram_resource_association" "codebuild" {
  provider = aws.owner

  resource_arn       = aws_codebuild_project.this.arn
  resource_share_arn = aws_ram_resource_share.owner.arn
}

resource "aws_codebuild_project" "this" {
  provider = aws.owner

  name          = local.name
  service_role  = aws_iam_role.codebuild.arn

  source {
    type = "NO_SOURCE"

    buildspec = <<-BUILDSPEC
      version: 0.2
      phases: {}
      BUILDSPEC
  }

  artifacts {
    type = "NO_ARTIFACTS"
  }

  environment {
    compute_type                = "BUILD_GENERAL1_SMALL"
    image                       = "aws/codebuild/standard:5.0"
    type                        = "LINUX_CONTAINER"
  }
}

resource "aws_iam_role" "codebuild" {
  provider = aws.owner

  name = local.name

  assume_role_policy = <<-EOF
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "codebuild.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    EOF
}

resource "aws_iam_role_policy" "codebuild" {
  provider = aws.owner

  role = aws_iam_role.codebuild.name

  policy = <<-POLICY
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Resource": [
            "*"
          ],
          "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ]
        }
      ]
    }
    POLICY
}

resource "random_string" "this" {
  length  = 6
  upper   = false
  special = false
  number  = false
}

locals {
  name = "tf-test-resource-share-${random_string.this.result}"
}

data "aws_caller_identity" "member" {}

output "ram_share_arn" {
  value = aws_ram_resource_share.owner.arn
}

@lorengordon
Copy link
Contributor

lorengordon commented Jun 14, 2021

Alternatively, putting a simple 10s sleep on destroy between the assocation and query log config appears to stabilize the test when sharing the query log config:

provider "aws" {
  profile = "resource-member"
}

provider "aws" {
  alias   = "owner"
  profile = "resource-owner"
}

resource "aws_ram_resource_share_accepter" "member" {
  share_arn = aws_ram_principal_association.invite.resource_share_arn
}

resource "aws_ram_principal_association" "invite" {
  provider = aws.owner

  principal          = data.aws_caller_identity.member.account_id
  resource_share_arn = aws_ram_resource_share.owner.arn
}

resource "aws_ram_resource_share" "owner" {
  provider = aws.owner

  name                      = local.name
  allow_external_principals = true
}

resource "aws_ram_resource_association" "query_log" {
  provider = aws.owner

  resource_arn       = time_sleep.wait.triggers.query_log_config_arn
  resource_share_arn = aws_ram_resource_share.owner.arn
}

resource "time_sleep" "wait" {
  destroy_duration = "10s"

  triggers = {
    query_log_config_arn = aws_route53_resolver_query_log_config.this.arn
  }
}

resource "aws_route53_resolver_query_log_config" "this" {
  provider = aws.owner

  name            = local.name
  destination_arn = aws_s3_bucket.query_log.arn
}

resource "aws_s3_bucket" "query_log" {
  provider = aws.owner

  force_destroy = true
}

resource "random_string" "this" {
  length  = 6
  upper   = false
  special = false
  number  = false
}

locals {
  name = "tf-test-resource-share-${random_string.this.result}"
}

data "aws_caller_identity" "member" {}

output "ram_share_arn" {
  value = aws_ram_resource_share.owner.arn
}

@kbalk
Copy link
Contributor Author

kbalk commented Jun 15, 2021

Replaced the unit test that demonstrates the timing problem with the destruction of RAM shared resources and instead used a unit test that is not affected by the timing problem.

@kbalk kbalk changed the title [WIP] Destroy aws_ram_resource_share_accepter from member account when share contains some resource types Destroy aws_ram_resource_share_accepter from member account when share contains some resource types Jun 15, 2021
@lorengordon
Copy link
Contributor

I ran the test in a loop to try to confirm this test is not impacted by eventual consistency issues:

$ let errors=0; for i in {1..20}; do echo "Loop = $i"; echo "Errors = $errors"; AWS_PROFILE=aws AWS_ALTERNATE_PROFILE=awsalternate TF_ACC=1 go test ./aws -v -count 1 -parallel 1 -run='TestAccAwsRamResourceShareAccepter_res' -timeout 180m; let errors+=$?; done
...
Loop = 20
Count = 0
=== RUN   TestAccAwsRamResourceShareAccepter_resource_association
=== PAUSE TestAccAwsRamResourceShareAccepter_resource_association
=== CONT  TestAccAwsRamResourceShareAccepter_resource_association
--- PASS: TestAccAwsRamResourceShareAccepter_resource_association (121.61s)
PASS
ok      github.com/terraform-providers/terraform-provider-aws/aws       121.719s

@YakDriver YakDriver self-assigned this Jul 8, 2021
@YakDriver YakDriver removed the needs-triage Waiting for first response or review from a maintainer. label Jul 8, 2021
@YakDriver YakDriver force-pushed the b-aws_ram_resource_share_acceptor-disassociate branch from 3fe840f to db357b0 Compare July 8, 2021 12:41
@YakDriver YakDriver modified the milestones: v3.48.0, v3.49.0 Jul 8, 2021
@YakDriver YakDriver modified the milestones: v3.48.0, v3.49.0 Jul 8, 2021
@YakDriver
Copy link
Member

@kbalk Thanks for your contribution! I made a few minor tweaks. Hopefully we'll get this merged today.

Copy link
Member

@YakDriver YakDriver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 🎉

Check my minor changes to see some preferred ways.

Output from acceptance tests:

--- PASS: TestAccAwsRamResourceShareAccepter_basic (89.59s)
--- PASS: TestAccAwsRamResourceShareAccepter_resourceAssociation (97.37s)
--- PASS: TestAccAwsRamResourceShareAccepter_disappears (264.03s)

@YakDriver YakDriver merged commit d52bc1d into hashicorp:main Jul 8, 2021
@ewbankkit ewbankkit modified the milestones: v3.49.0, v3.48.0 Jul 8, 2021
@github-actions
Copy link

github-actions bot commented Jul 8, 2021

This functionality has been released in v3.49.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions
Copy link

github-actions bot commented Aug 8, 2021

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
service/ram Issues and PRs that pertain to the ram service. size/M Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot destroy aws_ram_resource_share_accepter from member account when share contains some resource types
4 participants