Handling services that Consul has deregistered #146

Closed
randomswdev opened this issue Sep 5, 2019 · 7 comments · Fixed by #147

Comments

@randomswdev
Contributor

Terraform Version

0.12

Affected Resource(s)

consul_service

Terraform Configuration Files

resource "consul_service" "redis" {
  name = "redis"
  node = "redis"
  port = 6379

  check {
    name                              = "Redis health check"
    interval                          = "5s"
    timeout                           = "1s"
    deregister_critical_service_after = "30s"
  }
}

Error Output

...
redis: Refreshing state... [id=redis]

Error: Failed to retrieve service: 'redis', services: 1

Actual Behavior

If the service enters the critical state, Consul deregisters it once the period set in deregister_critical_service_after has elapsed. The next terraform run then fails with the error above, because the service is still recorded in the state but no longer exists in Consul and so cannot be refreshed.

Proposal

I would like to discuss options for avoiding this error, for example removing the service from the state when the refresh fails. This could be the default behavior, or it could be controlled by a switch at the resource or provider level. I don't know whether there are other design alternatives; any suggestion is welcome.
If we can agree on a solution, I can implement it and contribute the patch.

@remilapeyre
Collaborator

Hi @randomswdev, can you run terraform version to see what version of the provider you are using?

@randomswdev
Contributor Author

We use a Consul provider built from the master branch because we need some features/fixes that have been included in the master branch but are not available in version v2.5.0.

Terraform v0.12.7
+ provider.consul (unversioned)

We built the provider a week or two ago from the source code available on the master branch at that time.

@remilapeyre
Collaborator

I just tried with master (64de0bb) and it seems to work:

➜  terraform-provider-consul git:(master) ✗ terraform apply

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # consul_service.redis will be created
  + resource "consul_service" "redis" {
      + address    = (known after apply)
      + datacenter = (known after apply)
      + id         = (known after apply)
      + name       = "redis"
      + node       = "MacBook-Pro-de-Remi.local"
      + port       = 6379
      + service_id = (known after apply)

      + check {
          + check_id                          = (known after apply)
          + deregister_critical_service_after = "30s"
          + interval                          = "5s"
          + method                            = "GET"
          + name                              = "Redis health check"
          + status                            = "critical"
          + timeout                           = "1s"
          + tls_skip_verify                   = false
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

consul_service.redis: Creating...
consul_service.redis: Creation complete after 0s [id=redis]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
➜  terraform-provider-consul git:(master) ✗ terraform apply
consul_service.redis: Refreshing state... [id=redis]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # consul_service.redis will be created
  + resource "consul_service" "redis" {
      + address    = (known after apply)
      + datacenter = (known after apply)
      + id         = (known after apply)
      + name       = "redis"
      + node       = "MacBook-Pro-de-Remi.local"
      + port       = 6379
      + service_id = (known after apply)

      + check {
          + check_id                          = (known after apply)
          + deregister_critical_service_after = "30s"
          + interval                          = "5s"
          + method                            = "GET"
          + name                              = "Redis health check"
          + status                            = "critical"
          + timeout                           = "1s"
          + tls_skip_verify                   = false
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

consul_service.redis: Creating...
consul_service.redis: Creation complete after 0s [id=redis]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
➜  terraform-provider-consul git:(master) ✗ g rev-parse HEAD
64de0bb68fde35178b2a2f23db38da6ad7a5212e

Did I miss something to reproduce the issue? We used to have a bug like this but it's supposed to have been solved in 194fff3.

@randomswdev
Contributor Author

I provided an example that was too simple 😄

I finally reproduced the issue using the following Terraform fragment:

provider "consul" {
}
resource "consul_node" "redis1" {
  name = "redis/redis1"
  address = "hostname1"
}
resource "consul_node" "redis2" {
  name = "redis/redis2"
  address = "hostname2"
}

resource "consul_service" "redis1" {
  name = "redis"
  node = consul_node.redis1.name
  port = 6379

  check {
    check_id                          = "service:redis1"
    name                              = "Redis health check"
    tcp                               = "127.0.0.1:6379"
    interval                          = "5s"
    timeout                           = "1s"
    deregister_critical_service_after = "30s"
  }
}

resource "consul_service" "redis2" {
  name = "redis"
  node = consul_node.redis2.name
  port = 6379

  check {
    check_id                          = "service:redis1"
    name                              = "Redis health check"
    tcp                               = "127.0.0.1:6379"
    interval                          = "5s"
    timeout                           = "1s"
    deregister_critical_service_after = "30s"
  }
}

If you deregister one node and run the terraform command again, Terraform will complain that it is unable to refresh the service.

The difference here is that the same service is defined on two nodes: if you delete a node, the service still exists but is no longer present on the deleted node, and I think this confuses the provider.
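
The distinction between "a service with this name exists" and "this service exists on this node" can be sketched in Go as follows. The `catalogEntry` type and `findOnNode` helper are assumptions made for illustration, not the provider's actual data structures; the node names come from the configuration above.

```go
package main

import "fmt"

// catalogEntry is a hypothetical, trimmed-down view of one row in Consul's
// service catalog: one instance of a service on one node.
type catalogEntry struct {
	Node        string
	ServiceName string
}

// findOnNode returns the instance of the named service registered on node.
// Matching on the name alone would also match instances on other nodes.
func findOnNode(entries []catalogEntry, name, node string) (catalogEntry, bool) {
	for _, e := range entries {
		if e.ServiceName == name && e.Node == node {
			return e, true
		}
	}
	return catalogEntry{}, false
}

func main() {
	// redis/redis1 has been deregistered; only the instance on redis/redis2 remains.
	entries := []catalogEntry{{Node: "redis/redis2", ServiceName: "redis"}}

	_, onNode1 := findOnNode(entries, "redis", "redis/redis1")
	_, onNode2 := findOnNode(entries, "redis", "redis/redis2")
	fmt.Printf("entries named redis: %d, on redis1: %v, on redis2: %v\n",
		len(entries), onNode1, onNode2)
}
```

A lookup by name still returns one entry here, which could be what the "services: 1" in the error message reflects, although that reading is a guess. A refresh that treats "no instance on this node" as "resource gone" rather than as an error would avoid the failure.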

@remilapeyre
Collaborator

Yes, this is a mistake. Could you confirm that #147 fixes the issue for you?

@randomswdev
Contributor Author

The fix worked for me. Thank you very much @remilapeyre

@remilapeyre
Collaborator

Thanks for the info :)
