
Re-creating aws_elasticsearch_domain loses aws_elasticsearch_domain_policy #11188

Open
Ghazgkull opened this issue Dec 6, 2019 · 8 comments
Labels
service/elasticsearch Issues and PRs that pertain to the elasticsearch service.

Comments

@Ghazgkull

Ghazgkull commented Dec 6, 2019

I recently made changes to an aws_elasticsearch_domain which triggered Terraform to recreate the domain. The domain deletion and creation were successful, but as part of this process the associated aws_elasticsearch_domain_policy was emptied out and the policy was lost. This resulted in the new domain being created in an unusable state.

Applying TF a second time successfully re-created the aws_elasticsearch_domain_policy, but this is a major problem when I push changes to a CI/CD pipeline that expects to run Terraform once and end up in a functional state.

@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Dec 6, 2019
@Ghazgkull
Author

Ghazgkull commented Dec 6, 2019

Some more details on what happened in my case...

The changes I made to the aws_elasticsearch_domain which triggered the domain to be re-created were:

  • I changed the machine type from m4.large to t2.small
  • I changed the setting for encrypt_at_rest from true to false
  • I changed the number of worker nodes in the cluster from 3 to 1

The Terraform output for the domain itself correctly indicated that it needed to re-create the domain, but it also deleted the access_policies and didn't restore them:

  # aws_elasticsearch_domain.redacted must be replaced
-/+ resource "aws_elasticsearch_domain" "redacted" {
      ~ access_policies       = jsonencode(
            {
              - Statement = [
                  - {
                      - Action    = "es:*"
                      - Effect    = "Allow"
                      - Principal = {
                          - AWS = "*"
                        }
                      - Resource  = "redacted"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> (known after apply)
...

The Terraform output also indicated that it was blowing away my access policies:

  # aws_elasticsearch_domain_policy.redacted will be updated in-place
  ~ resource "aws_elasticsearch_domain_policy" "redacted" {
      ~ access_policies = jsonencode(
            {
              - Statement = [
                  - {
                      - Action    = "es:*"
                      - Effect    = "Allow"
                      - Principal = {
                          - AWS = "*"
                        }
                      - Resource  = "redacted"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> (known after apply)
        domain_name     = "redacted"
        id              = "esd-policy-redacted"
    }

  # aws_elasticsearch_domain_policy.main will be updated in-place
  ~ resource "aws_elasticsearch_domain_policy" "main" {
      ~ access_policies = jsonencode(
            {
              - Statement = [
                  - {
                      - Action    = "es:*"
                      - Effect    = "Allow"
                      - Principal = {
                          - AWS = "*"
                        }
                      - Resource  = "redacted"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> (known after apply)
        domain_name     = "redacted"
        id              = "esd-policy-redacted"
    }

@Ghazgkull
Author

Update: I've been able to consistently reproduce this problem by simply changing the encrypt_at_rest setting of my aws_elasticsearch_domain. Doing this triggers Terraform to destroy and re-create the domain, and the domain consistently gets created with an empty access policy.
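
For reference, a minimal sketch of the kind of configuration involved (the domain name, version, and sizes here are illustrative placeholders, not the redacted originals):

resource "aws_elasticsearch_domain" "example" {
  domain_name           = "example"
  elasticsearch_version = "7.1"

  cluster_config {
    instance_type  = "t2.small.elasticsearch"
    instance_count = 1
  }

  ebs_options {
    ebs_enabled = true
    volume_size = 10
  }

  # Encryption at rest cannot be changed in place, so toggling this
  # forces Terraform to destroy and re-create the domain, which is
  # enough to reproduce the empty-policy behavior.
  encrypt_at_rest {
    enabled = true
  }
}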

@Ghazgkull
Author

I found a workaround, for anyone else who might encounter this problem. By removing the separate aws_elasticsearch_domain_policy and instead defining the policy inline in the aws_elasticsearch_domain resource, I observe that my domain gets re-created with the correct access policy.
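
Roughly, the inline version looks like this (a sketch only; the ARN and names are placeholders). One trade-off: the inline access_policies argument cannot reference the domain's own ARN, so the Resource ARN has to be constructed by hand:

resource "aws_elasticsearch_domain" "example" {
  domain_name = "example"

  # The policy lives on the domain itself rather than in a separate
  # aws_elasticsearch_domain_policy resource, so it is applied as part
  # of domain creation and survives a destroy/re-create.
  access_policies = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action    = "es:*"
        Effect    = "Allow"
        Principal = { AWS = "*" }
        Resource  = "arn:aws:es:us-east-1:123456789012:domain/example/*"
      },
    ]
  })

  # ... other domain settings (cluster_config, ebs_options, etc.) as before
}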

@Ghazgkull
Author

There's a flip-side defect associated with my workaround, in the case where I'm not destroying/re-creating my elasticsearch domain. I observe that when I simply move the policy from an external resource to inline, the policy is again completely lost from the domain. So there are two bugs here:

  1. Destroying/re-creating an elasticsearch domain with an external policy resource results in an empty policy.
  2. Updating an elasticsearch domain from an external policy resource to an inline policy results in an empty policy.

@DrFaust92 DrFaust92 added the service/elasticsearch Issues and PRs that pertain to the elasticsearch service. label May 21, 2020
@evandam

evandam commented Oct 14, 2020

I'm seeing the same behavior. Are there any updates on this, by any chance?

I'd much prefer sticking to the aws_elasticsearch_domain_policy resource, since it allows us to reference the aws_elasticsearch_domain in the policy (e.g. ${aws_elasticsearch_domain.example.arn}/*).

Thanks!


Small update: I have another workaround for this. Adding a depends_on seems to work; I had to add it to my aws_iam_policy_document.

data "aws_iam_policy_document" "domain_policy" {
  depends_on = [aws_elasticsearch_domain.elasticache]
  statement {
    ...
  }
}

resource "aws_elasticsearch_domain_policy" "domain_policy" {
  domain_name     = aws_elasticsearch_domain.elasticache.domain_name
  access_policies = data.aws_iam_policy_document.domain_policy.json
}

resource "aws_elasticsearch_domain" "elasticache" {
  ...
}

Maybe this would be better off documented, or perhaps the provider could handle the ordering of these dependencies?

@justinretzolk
Member

Hey y'all 👋 Thank you for taking the time to open this issue and for the additional discussion around it. Given that there's been a number of AWS provider releases since the last update, can anyone confirm whether you're still experiencing this behavior?

@justinretzolk justinretzolk added waiting-response Maintainers are waiting on response from community or contributor. and removed needs-triage Waiting for first response or review from a maintainer. labels Nov 18, 2021
@micgrivas

micgrivas commented Oct 4, 2023

The problem persists.
In our case (using a provider version from late 2022), it appears both with new or re-created clusters/domains and when the policy changes without any other changes to the domain.
The Terraform plan output indicated that it would drop the previous access policy and apply the new one, but the AWS console showed that none was applied.
In our case, we have two statements in the policy, if that makes any difference.

 # module.supporting-services.module.opensearch["redacted"].aws_elasticsearch_domain_policy.main will be created
+ resource "aws_elasticsearch_domain_policy" "main" {
      + access_policies = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "es:ESHttp*"
                      + Effect    = "Allow"
                      + Principal = {
                          + AWS = "*"
                        }
                      + Resource  = "redacted/*"
                      + Sid       = "http-access"
                    },
                  + {
                      + Action    = "es:ESCrossClusterGet"
                      + Effect    = "Allow"
                      + Principal = {
                          + AWS = "*"
                        }
                      + Resource  = "redacted"
                      + Sid       = "crosscluster"
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + domain_name     = "redacted"
      + id              = (known after apply)
    }

@github-actions github-actions bot removed the waiting-response Maintainers are waiting on response from community or contributor. label Oct 4, 2023
@micgrivas

micgrivas commented Oct 5, 2023

The depends_on alleviated the problem but did not totally solve it.
Now, in most cases it works fine, but when the whole cluster is re-created (delete and create), the policy remains empty.
It seems, IMHO, that there is an ordering issue in how the changes are applied: in some cases, the policy is applied before the cluster is ready.
That would explain why the depends_on helps. A probable timeout while deleting/creating the whole cluster would explain why it does not work when the cluster is re-created.
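
If the ordering theory is right, one possible mitigation (a sketch only, untested here; it requires Terraform 1.2 or later and reuses the resource names from the earlier workaround) is to force the policy resource itself to be replaced whenever the domain is replaced:

resource "aws_elasticsearch_domain_policy" "domain_policy" {
  domain_name     = aws_elasticsearch_domain.elasticache.domain_name
  access_policies = data.aws_iam_policy_document.domain_policy.json

  # Re-create this policy resource whenever the domain's id changes,
  # i.e. whenever the domain itself is destroyed and re-created, so
  # the policy is re-applied to the new domain.
  lifecycle {
    replace_triggered_by = [aws_elasticsearch_domain.elasticache.id]
  }
}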
