Failed to destroy azurerm_key_vault and associated azurerm_key_vault_access_policy (30 minute timeout) #10707

Closed
mikegouldthorp opened this issue Feb 23, 2021 · 17 comments · Fixed by #10931

@mikegouldthorp

mikegouldthorp commented Feb 23, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform = 0.12.29
azurerm = 2.48.0

Affected Resource(s)

  • resource "azurerm_key_vault"
  • resource "azurerm_key_vault_access_policy"

Terraform Configuration Files

resource "azurerm_key_vault" "main" {
  name                        = "keyvaultname"
  location                    = var.location
  enabled_for_disk_encryption = true
  soft_delete_retention_days  = 90
  purge_protection_enabled    = null
  tenant_id                   = data.azurerm_client_config.current.tenant_id

  sku_name = "standard"

  network_acls {
    default_action = "Allow"
    bypass         = "AzureServices"
  }
}

resource "azurerm_key_vault_access_policy" "kv_readers_policy" {
  key_vault_id = azurerm_key_vault.main.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = data.azuread_group.key_vault_readers.id

  key_permissions = [
    "get",
    "list"
  ]

  secret_permissions = [
    "get",
    "list"
  ]

  storage_permissions = [
    "get",
    "list"
  ]

  certificate_permissions = [
    "get",
    "getissuers",
    "list",
    "listissuers"
  ]
}

Debug Output

Panic Output

Expected Behavior

When destroying a Key Vault the associated Key Vault Access Policies are destroyed as well.

Actual Behavior

When destroying a Key Vault the (de)provisioning process fails after 30 minutes.

Error: failed waiting for Key Vault Access Policy (Object ID: "XXXXXXXXXXXXXXXXXXXXXXXX") to apply: timeout while waiting for state to become 'notfound' (last state: 'found', timeout: 30m0s)

Error: Error updating Access Policy (Object ID "XXXXXXXXXXXXXXXXXXXXXXXX" / Application ID "") for Key Vault "key_vault_name" (Resource Group "resource_group_name"): keyvault.VaultsClient#UpdateAccessPolicy: Failure sending request: StatusCode=0 -- Original Error: context deadline exceeded

Steps to Reproduce

  1. terraform destroy

Important Factoids

This issue appears to be relatively new. The destroy process on the Key Vault and associated Key Vault Access Policies worked with the following Terraform and Azurerm versions...

Terraform = 0.12.29
azurerm = 2.46.0

References

  • #0000
@mikegouldthorp
Author

I found that when re-running terraform destroy a second time, the process still fails with the same error as before, but on this attempt the Key Vault and the Key Vault Access Policy resources were removed from Azure.

A subsequent terraform plan confirms that there are "no infrastructure changes".

@cdobinsky

cdobinsky commented Feb 25, 2021

I experience the same issue but only since azurerm 2.48.0

I tested it with 2.46.1 and 2.47.0 where it destroys the policy just fine but when I try to use 2.48.0 the timeout occurs for the policy.

Everything was tested with Terraform 0.13.6 and 0.14.7; creation and deletion were done with the same azurerm version.

@muellermatthias

muellermatthias commented Feb 25, 2021

If you update the permissions according to the changes in #10593, then the example will work with 2.48.0.

Creating the access policy with a version prior to 2.48.0 and then trying to delete it with 2.48.0 and updated permissions will still fail.

@surlypants

surlypants commented Feb 25, 2021

cross referenced here:
https://discuss.hashicorp.com/t/destroying-azure-key-vault-with-policies-and-or-secrets/21251/4

this is a significant breaking regression

@cdobinsky

cdobinsky commented Feb 26, 2021

Thanks muellermatthias for the info. I changed the permissions in my access policy to camel case (before it was all lowercase) and it works just fine now. (Tested with terraform 0.13.6 / azurerm 2.48.0)

I also tested the creation of the policy with 2.44.0 (all lower case) (our project is currently on 2.44) and deletion with 2.48.0 and I can also confirm that this will fail. I also tried the creation with 2.44.0 and then changing everything to camel case and applying with 2.48.0 again, it will change the case in the state but the destroy still fails.

If the policy was already created with camel case in 2.44.0, the destroy completes fine in 2.48.0, unless it has storage_permissions: these can only be created in lowercase in <=2.47.0, and after upgrading to 2.48.0 the destroy fails.

I read the changelog of 2.48.0 about the normalizing but I wasn't aware that I have to change all my access policy resources to camel case. I wonder if this is even intended since it worked before without camel case and it is only a problem on destroy.

So maybe there should be an error if the permission is not camel case or it should not be case sensitive at all.
Also the backward compatibility needs to be fixed somehow, that a deletion will complete even if it was created with all lower case before.

This was the tested policy which worked with <=2.47.0
resource "azurerm_key_vault_access_policy" "admin-access-policy" {
  key_vault_id = azurerm_key_vault.vault.id

  tenant_id = var.tenant_id
  object_id = var.admin_group_id

  key_permissions = [
    "backup",
    "create",
    "decrypt",
    "delete",
    "encrypt",
    "get",
    "import",
    "list",
    "purge",
    "recover",
    "restore",
    "sign",
    "unwrapKey",
    "update",
    "verify",
    "wrapKey"
  ]

  secret_permissions = [
    "backup",
    "delete",
    "get",
    "list",
    "purge",
    "recover",
    "restore",
    "set"
  ]

  certificate_permissions = [
    "backup",
    "create",
    "delete",
    "deleteissuers",
    "get",
    "getIssuers",
    "import",
    "list",
    "listissuers",
    "managecontacts",
    "manageissuers",
    "purge",
    "recover",
    "restore",
    "setissuers",
    "update"
  ]

  storage_permissions = [
    "backup",
    "delete",
    "deletesas",
    "get",
    "getsas",
    "list",
    "listsas",
    "purge",
    "recover",
    "regeneratekey",
    "restore",
    "set",
    "setsas",
    "update"
  ]
}
And with camel case which works with 2.48.0+
resource "azurerm_key_vault_access_policy" "admin-access-policy" {
  key_vault_id = azurerm_key_vault.vault.id

  tenant_id = var.tenant_id
  object_id = var.admin_group_id

  key_permissions = [
    "Backup",
    "Create",
    "Decrypt",
    "Delete",
    "Encrypt",
    "Get",
    "Import",
    "List",
    "Purge",
    "Recover",
    "Restore",
    "Sign",
    "UnwrapKey",
    "Update",
    "Verify",
    "WrapKey"
  ]

  secret_permissions = [
    "Backup",
    "Delete",
    "Get",
    "List",
    "Purge",
    "Recover",
    "Restore",
    "Set"
  ]

  certificate_permissions = [
    "Backup",
    "Create",
    "Delete",
    "DeleteIssuers",
    "Get",
    "GetIssuers",
    "Import",
    "List",
    "ListIssuers",
    "ManageContacts",
    "ManageIssuers",
    "Purge",
    "Recover",
    "Restore",
    "SetIssuers",
    "Update"
  ]

  storage_permissions = [
    "Backup",
    "Delete",
    "DeleteSAS",
    "Get",
    "GetSAS",
    "List",
    "ListSAS",
    "Purge",
    "Recover",
    "RegenerateKey",
    "Restore",
    "Set",
    "SetSAS",
    "Update"
  ]
}
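
The suggestion above that permissions "should not be case sensitive at all" can be approximated on the configuration side. The following is only a sketch: the local names (secret_permission_casing, requested_secret_permissions) are hypothetical and not part of the azurerm provider, and the target casing is simply the one reported to work with 2.48.0 in this thread, which at least one later commenter could not reproduce on 2.49.0.

```hcl
locals {
  # Hypothetical lookup table: lower-cased permission name mapped to the
  # casing reported to work with azurerm 2.48.0 in this thread.
  secret_permission_casing = {
    backup  = "Backup"
    delete  = "Delete"
    get     = "Get"
    list    = "List"
    purge   = "Purge"
    recover = "Recover"
    restore = "Restore"
    set     = "Set"
  }

  # Assumed input, written in whatever case the existing configuration uses.
  requested_secret_permissions = ["get", "list"]

  # Match each entry case-insensitively; unknown names pass through unchanged.
  secret_permissions = [
    for p in local.requested_secret_permissions :
    lookup(local.secret_permission_casing, lower(p), p)
  ]
}

resource "azurerm_key_vault_access_policy" "admin-access-policy" {
  key_vault_id = azurerm_key_vault.vault.id

  tenant_id = var.tenant_id
  object_id = var.admin_group_id

  secret_permissions = local.secret_permissions
}
```

This only keeps the configuration consistent. It does not address the backward-compatibility problem described above, where a policy created in lowercase by an older provider version still fails to destroy.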

@TamasSzerb

@cdobinsky unfortunately it didn't work for me, with terraform v0.14.6 linux amd64, azurerm v2.49.0.

Also, https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/key_vault_access_policy
uses non-camel-case values, and the API does not use camel case either: https://docs.microsoft.com/en-us/rest/api/keyvault/vaults/update (see: accessPolicies)

@ceciliasharp
Contributor

Sometimes it works, but mostly not. In the portal I can see that the policy is still there and if I remove it manually while terraform is running it will work otherwise it will continue for 30 min and fail.

@peteneville

peteneville commented Mar 3, 2021

This ticket is for destroying the keyvault, but I'm seeing a similar issue on a seemingly unrelated app_config update where it's trying to update secret permissions based on case difference and I get a time out.

I've created a completely new environment with TF 0.13.6 and azurerm 2.48.0, with camel case "Get", "List" in secret_permissions. All was good until I changed an app_setting on a function app (which also has a secreturi value for another app_setting, unchanged) by adding a simple key/value pair, and now my apply times out:

module.webapp_web.azurerm_key_vault_access_policy.appservice_identity: Still destroying... 25m10s elapsed]

and for some reason it's trying to update secret_permissions:

  ~ secret_permissions      = [
      - "Get",
      - "List",
      + "get",
      + "list"

Looking in the portal, the access policy was removed early on but no replacement was added, and the apply shows as failed with this output:

Error: failed waiting for Key Vault Access Policy (Object ID: "9fd...f2") to apply: timeout while waiting for state to become 'notfound' (last state: 'found', timeout: 30m0s)

@peteneville

...and now a second test of removing the single test app_setting shows secret permission changes again:
  ~ secret_permissions = [
      - "Get",
      - "List",
      + "get",
      + "list",
    ]
and it looks like this is going to be another timeout, but again the policy appears to have been replaced correctly.

@peteneville

I have a mixture of cases for secret_permissions in my state file. The above indicates that they'll be changed to lowercase. Should the state be camel or lower? Should the terraform code be camel or lower for these values? Terraform examples at https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/key_vault_secret show they should be lower.
I will update my state and code to make both lower and see if I can prevent these changes. I doubt this will help with the timeout, though, when we do want to update a policy.

e.g.
"secret_permissions": [
  "Get",
  "List"
],
and
"secret_permissions": [
  "get",
  "list"
],

@peteneville

peteneville commented Mar 4, 2021

Following more testing, I have determined that 2.48.0 has an issue with updating the Key Vault access policy: the update times out after 30 minutes. The same code with 2.47.0 works, and the policy is updated within a few seconds. I will check 2.49.0 again just to see if that also has the same issue.
Checked 2.49.0, same issue.
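
Since 2.47.0 is reported above as working, one interim option is to pin the provider to that exact version. This is a minimal sketch assuming Terraform 0.13 or later, where the required_providers block takes a source and version; on Terraform 0.12 the version constraint would instead go in the provider block. Whether state already written by 2.48.0 is accepted after pinning back is not covered in this thread and is an open assumption.

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # Exact pin to the last version reported as working in this thread.
      version = "= 2.47.0"
    }
  }
}

provider "azurerm" {
  # The features block is required by the 2.x provider, even when empty.
  features {}
}
```

After changing the constraint, terraform init has to be re-run so the pinned version is downloaded.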

@BrendanThompson
Contributor

@katbyte given that you've just updated the CHANGELOG.md for the v2.50.0 release (b74f30f) and there are no PRs against this issue I can safely assume that it won't hit that milestone. Is there any ETA as to which milestone this guy is going to get resolved in?

@katbyte katbyte modified the milestones: v2.50.0, v2.51.0 Mar 5, 2021
@Alboroni

Alboroni commented Mar 5, 2021

Also suffering from this so would be good to get ETA for sorting this regression

@mattduguid

mattduguid commented Mar 10, 2021

Sometimes it works, but mostly not. In the portal I can see that the policy is still there and if I remove it manually while terraform is running it will work otherwise it will continue for 30 min and fail.

Same workaround for us: if we manually delete the two Key Vault policies that it is looping on, a few seconds later it detects the state change and proceeds:

module.<OUR_MODULE>.azurerm_key_vault_access_policy.: Still destroying... [id=/subscriptions//, 13m50s elapsed]
module.<OUR_MODULE>.azurerm_key_vault_access_policy.: Still destroying... [id=/subscriptions//, 13m50s elapsed]
...MANUAL DELETE FROM AZURE PORTAL...
module.<OUR_MODULE>.azurerm_key_vault_access_policy.: Destruction complete after 14m42s
module.<OUR_MODULE>.azurerm_key_vault_access_policy.: Destruction complete after 14m48s
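
If waiting the full 30 minutes before a failed destroy surfaces is the main pain point, a shorter delete timeout might reduce the wait. This is an untested sketch: it assumes the resource's standard timeouts block governs the same wait that reports "timeout: 30m0s" in the errors above, and that the new timeout is applied to the resource before the destroy is attempted. The "5m" value is arbitrary. It does not fix the underlying problem; the policy would still have to be removed manually or the destroy re-run.

```hcl
resource "azurerm_key_vault_access_policy" "kv_readers_policy" {
  key_vault_id = azurerm_key_vault.main.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = data.azuread_group.key_vault_readers.id

  secret_permissions = [
    "get",
    "list"
  ]

  # Assumption: a shorter delete timeout makes a stuck destroy fail sooner
  # than the default 30 minutes seen in the error messages.
  timeouts {
    delete = "5m"
  }
}
```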

@ghost

ghost commented Mar 12, 2021

This has been released in version 2.51.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 2.51.0"
}
# ... other configuration ...

@ghost

ghost commented Apr 10, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Apr 10, 2021