[BUG] Terraform returning an error during deployments on Azure ("A retryable error occurred.") #1424

Closed
przemyslavic opened this issue Jul 3, 2020 · 1 comment

@przemyslavic
Collaborator

Describe the bug
Azure cluster deployments often fail with the Terraform error "A retryable error occurred."

To Reproduce
Steps to reproduce the behavior:

  1. Execute epicli apply -f test.yml (configuration given below).

Expected behavior
The cluster is deployed successfully.

Config files

---
kind: epiphany-cluster
name: test
provider: azure
specification:
  admin_user:
    key_path: /path/to/id_rsa
    name: operations
  cloud:
    region: xxx
    subscription_name: xxx
    use_public_ips: true
    use_service_principal: false
  components:
    kafka:
      count: 2
    kubernetes_master:
      count: 1
    kubernetes_node:
      count: 3
    load_balancer:
      count: 1
    logging:
      count: 2
    monitoring:
      count: 1
    postgresql:
      count: 2
    rabbitmq:
      count: 2
    ignite:
      count: 2
    opendistro_for_elasticsearch:
      count: 2
  name: test
  prefix: 'qa'
title: Epiphany cluster Config

OS (please complete the following information):

  • OS: [all]

Cloud Environment (please complete the following information):

  • Cloud Provider: [MS Azure]

Additional context
Log:

13:08:18 INFO cli.engine.terraform.TerraformCommand - azurerm_virtual_machine.ci-retryable-monitoring-vm-0: Still creating... [1m40s elapsed]
13:08:22 INFO cli.engine.terraform.TerraformCommand - azurerm_virtual_machine.ci-retryable-ignite-vm-1: Still creating... [1m40s elapsed]
13:08:25 INFO cli.engine.terraform.TerraformCommand - azurerm_virtual_machine.ci-retryable-kubernetes-node-vm-0: Still creating... [1m40s elapsed]
13:08:28 INFO cli.engine.terraform.TerraformCommand - azurerm_virtual_machine.ci-retryable-kubernetes-node-vm-1: Still creating... [1m50s elapsed]
13:08:28 INFO cli.engine.terraform.TerraformCommand - azurerm_virtual_machine.ci-retryable-monitoring-vm-0: Still creating... [1m50s elapsed]
13:08:31 INFO cli.engine.terraform.TerraformCommand - azurerm_virtual_machine.ci-retryable-monitoring-vm-0: Creation complete after 1m53s [id=/subscriptions/xxx-xxx-xxx-xxx/resourceGroups/ci-retryable-rg/providers/Microsoft.Compute/virtualMachines/ci-retryable-monitoring-vm-0]
13:08:31 INFO cli.engine.terraform.TerraformCommand - azurerm_virtual_machine.ci-retryable-ignite-vm-1: Creation complete after 1m49s [id=/subscriptions/xxx-xxx-xxx-xxx/resourceGroups/ci-retryable-rg/providers/Microsoft.Compute/virtualMachines/ci-retryable-ignite-vm-1]
13:08:31 INFO cli.engine.terraform.TerraformCommand - azurerm_virtual_machine.ci-retryable-kubernetes-node-vm-0: Creation complete after 1m46s [id=/subscriptions/xxx-xxx-xxx-xxx/resourceGroups/ci-retryable-rg/providers/Microsoft.Compute/virtualMachines/ci-retryable-kubernetes-node-vm-0]
13:08:31 INFO cli.engine.terraform.TerraformCommand - azurerm_virtual_machine.ci-retryable-kubernetes-node-vm-1: Creation complete after 1m53s [id=/subscriptions/xxx-xxx-xxx-xxx/resourceGroups/ci-retryable-rg/providers/Microsoft.Compute/virtualMachines/ci-retryable-kubernetes-node-vm-1]
13:08:31 INFO cli.engine.terraform.TerraformCommand -
13:08:31 INFO cli.engine.terraform.TerraformCommand - Warning: "resource_group_name": [DEPRECATED] This field has been deprecated and is no longer used - will be removed in 2.0 of the Azure Provider
13:08:31 INFO cli.engine.terraform.TerraformCommand -
13:08:31 INFO cli.engine.terraform.TerraformCommand -   on build/retryable/terraform/003_ci-retryable-k8s-ss.tf line 22, in resource "azurerm_storage_share" "ci-retryable-k8s-ss":
13:08:31 INFO cli.engine.terraform.TerraformCommand -   22: resource "azurerm_storage_share" "ci-retryable-k8s-ss" {
13:08:31 INFO cli.engine.terraform.TerraformCommand -
13:08:31 INFO cli.engine.terraform.TerraformCommand -
13:08:31 INFO cli.engine.terraform.TerraformCommand -
13:08:31 INFO cli.engine.terraform.TerraformCommand - Error: Code="RetryableError" Message="A retryable error occurred."
13:08:31 INFO cli.engine.terraform.TerraformCommand -
13:08:31 INFO cli.engine.terraform.TerraformCommand -   on build/retryable/terraform/015_ci-retryable-kubernetes-master-vm-2.tf line 13, in resource "azurerm_virtual_machine" "ci-retryable-kubernetes-master-vm-2":
13:08:31 INFO cli.engine.terraform.TerraformCommand -   13: resource "azurerm_virtual_machine" "ci-retryable-kubernetes-master-vm-2" {
13:08:31 INFO cli.engine.terraform.TerraformCommand -
13:08:31 INFO cli.engine.terraform.TerraformCommand -
13:08:31 INFO cli.engine.terraform.TerraformRunner - Run done in 286062ms
13:08:31 ERROR epicli - Error running: "terraform apply --auto-approve -state=/shared/build/retryable/terraform//terraform.tfstate /shared/build/retryable/terraform/"
13:08:34 INFO dump_debug_info - Error dump has been written to: /shared/epicli_error_20200703-130831.dump
13:08:34 WARNING dump_debug_info - This dump might contain sensitive information. Check before sharing.

This has a large impact on CI builds: on average, one in three builds fails because of this error.

@przemyslavic przemyslavic added this to the 0.7.1 milestone Jul 3, 2020
@przemyslavic przemyslavic added this to Needs triage in Bugs via automation Jul 3, 2020
@seriva seriva self-assigned this Jul 9, 2020
seriva added a commit that referenced this issue Jul 14, 2020
Fix: Terraform returning an error during deployments on Azure "A retryable error occurred." (#1424)
@przemyslavic przemyslavic self-assigned this Jul 15, 2020
@przemyslavic
Collaborator Author

Fix tested. When Terraform fails with a "RetryableError", another attempt is made. The builds were successful. The log from the second retry is shown below.

2020-07-15T07:27:55.5529021Z 07:27:55 ERROR cli.engine.terraform.TerraformCommand - Error: Code="RetryableError" Message="A retryable error occurred."
2020-07-15T07:27:55.5529867Z 07:27:55 INFO cli.engine.terraform.TerraformCommand - 
2020-07-15T07:27:55.5535961Z 07:27:55 INFO cli.engine.terraform.TerraformCommand -   on ../../shared/build/devazurrhelflannel/terraform/024_ci-devazurrhelflannel-logging-vm-0.tf line 13, in resource "azurerm_virtual_machine" "ci-devazurrhelflannel-logging-vm-0":
2020-07-15T07:27:55.5538539Z 07:27:55 INFO cli.engine.terraform.TerraformCommand -   13: resource "azurerm_virtual_machine" "ci-devazurrhelflannel-logging-vm-0" {
2020-07-15T07:27:55.5539509Z 07:27:55 INFO cli.engine.terraform.TerraformCommand - 
2020-07-15T07:27:55.5540200Z 07:27:55 INFO cli.engine.terraform.TerraformCommand - 
2020-07-15T07:27:55.5653968Z 07:27:55 WARNING cli.engine.terraform.TerraformCommand - Terraform failed with "RetryableError" error. Retry: 2/3
.
.
.
2020-07-15T07:29:22.4699527Z 07:29:22 INFO cli.engine.terraform.TerraformCommand - Apply complete! Resources: 1 added, 26 changed, 0 destroyed.
2020-07-15T07:29:22.4789739Z 07:29:22 INFO cli.engine.terraform.TerraformCommand - Done running "terraform apply --auto-approve -state=/shared/build/devazurrhelflannel/terraform//terraform.tfstate -no-color /shared/build/devazurrhelflannel/terraform/"
2020-07-15T07:29:22.4795974Z 07:29:22 INFO cli.engine.terraform.TerraformRunner - Run done in 389254ms

Another cluster:

2020-07-15T07:27:38.9428083Z 07:27:38 ERROR cli.engine.terraform.TerraformCommand - Error: Code="RetryableError" Message="A retryable error occurred."
2020-07-15T07:27:38.9515490Z 07:27:38 INFO cli.engine.terraform.TerraformCommand - 
2020-07-15T07:27:38.9517314Z 07:27:38 INFO cli.engine.terraform.TerraformCommand -   on ../../shared/build/devazurubuflannel/terraform/042_ci-devazurubuflannel-kafka-vm-1.tf line 13, in resource "azurerm_virtual_machine" "ci-devazurubuflannel-kafka-vm-1":
2020-07-15T07:27:38.9518617Z 07:27:38 INFO cli.engine.terraform.TerraformCommand -   13: resource "azurerm_virtual_machine" "ci-devazurubuflannel-kafka-vm-1" {
2020-07-15T07:27:38.9519443Z 07:27:38 INFO cli.engine.terraform.TerraformCommand - 
2020-07-15T07:27:38.9520066Z 07:27:38 INFO cli.engine.terraform.TerraformCommand - 
2020-07-15T07:27:38.9520706Z 07:27:38 INFO cli.engine.terraform.TerraformCommand - 
2020-07-15T07:27:38.9521541Z 07:27:38 ERROR cli.engine.terraform.TerraformCommand - Error: Code="RetryableError" Message="A retryable error occurred."
2020-07-15T07:27:38.9522327Z 07:27:38 INFO cli.engine.terraform.TerraformCommand - 
2020-07-15T07:27:38.9523662Z 07:27:38 INFO cli.engine.terraform.TerraformCommand -   on ../../shared/build/devazurubuflannel/terraform/081_ci-devazurubuflannel-opendistro-for-elasticsearch-vm-0.tf line 13, in resource "azurerm_virtual_machine" "ci-devazurubuflannel-opendistro-for-elasticsearch-vm-0":
2020-07-15T07:27:38.9525067Z 07:27:38 INFO cli.engine.terraform.TerraformCommand -   13: resource "azurerm_virtual_machine" "ci-devazurubuflannel-opendistro-for-elasticsearch-vm-0" {
2020-07-15T07:27:38.9525925Z 07:27:38 INFO cli.engine.terraform.TerraformCommand - 
2020-07-15T07:27:38.9526563Z 07:27:38 INFO cli.engine.terraform.TerraformCommand - 
2020-07-15T07:27:38.9527399Z 07:27:38 WARNING cli.engine.terraform.TerraformCommand - Terraform failed with "RetryableError" error. Retry: 2/3
.
.
.
2020-07-15T07:29:05.4632836Z 07:29:05 INFO cli.engine.terraform.TerraformCommand - Apply complete! Resources: 2 added, 25 changed, 0 destroyed.
2020-07-15T07:29:05.4751878Z 07:29:05 INFO cli.engine.terraform.TerraformCommand - Done running "terraform apply --auto-approve -state=/shared/build/devazurubuflannel/terraform//terraform.tfstate -no-color /shared/build/devazurubuflannel/terraform/"
2020-07-15T07:29:05.4754542Z 07:29:05 INFO cli.engine.terraform.TerraformRunner - Run done in 342329ms
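
For readers who want to reproduce the behaviour, here is a minimal sketch of the kind of retry loop the fix describes. It is not the actual epicli code: the names `run_with_retries` and `terraform_apply`, the limit of three attempts, and the wording of the warning are assumptions modelled on the log lines above.

```python
import subprocess
import time

# Substring that marks the transient Azure failure in the logs above.
RETRYABLE_MARKER = "RetryableError"


def run_with_retries(run_once, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call run_once() until it succeeds or the attempts are exhausted.

    run_once must return a (return_code, output_text) tuple.
    Only failures whose output contains RETRYABLE_MARKER are retried;
    any other failure, or a failure on the last attempt, raises RuntimeError.
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        return_code, output = run_once()
        if return_code == 0:
            return attempt
        if RETRYABLE_MARKER not in output or attempt == max_attempts:
            raise RuntimeError(
                f"terraform failed on attempt {attempt}/{max_attempts}"
            )
        # Hypothetical message format, assumed from the warning in the log.
        print(
            f'Terraform failed with "{RETRYABLE_MARKER}" error. '
            f"Retry: {attempt + 1}/{max_attempts}"
        )
        sleep(delay_seconds)


def terraform_apply(command):
    """Run a command and return (return_code, combined stdout and stderr)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr
```

A caller could wrap the real command as `run_with_retries(lambda: terraform_apply(cmd))`, where `cmd` is the argument list for the `terraform apply` invocation shown in the log. Retrying only on the marker keeps real configuration errors failing fast.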

@mkyc mkyc closed this as completed Jul 21, 2020
Bugs automation moved this from Needs triage to Closed Jul 21, 2020