
Consul fails to register vault-sealed-check for vault #5439

Open
adragoset opened this issue Mar 7, 2019 · 23 comments
Labels
needs-investigation The issue described is detailed and complex. theme/consul-vault Relating to Consul & Vault interactions

Comments


adragoset commented Mar 7, 2019

Overview of the Issue

I am currently running a Vault server on the same hosts as my Consul servers. After upgrading from Consul 1.4.0 to Consul 1.4.3, Vault fails to register its sealed check with Consul, so seal status is never reported to Consul, and I'm getting the following log messages piling up in my Consul and Vault logs.

Reproduction Steps

  1. Install Consul 1.4.3 on a host, running in server mode.
  2. Configure and install Vault server 1.0.2 on the same host.
  3. Set up the Consul storage backend for Vault.
  4. Unseal Vault.
  5. The seal-status checks start failing, trying to update a check that Consul says does not exist.
  6. Roll back to Consul 1.4.0 and the seal-status checks work correctly again.

Operating system and Environment details

Official Docker containers for Consul 1.4.3 and Vault 1.0.2

Log Fragments

Vault Logs


             Api Address: https://192.168.28.63:8200
                     Cgo: disabled
         Cluster Address: https://192.168.28.63:8201
              Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "192.168.28.63:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
               Log Level: (not set)
                   Mlock: supported: true, enabled: true
                 Storage: consul (HA available)
                 Version: Vault v1.0.2
             Version Sha: 37a1dc9c477c1c68c022d2084550f25bf20cac33

==> Vault server started! Log data will stream in below:

2019-03-07T15:26:05.882Z [WARN]  no `api_addr` value specified in config or in VAULT_API_ADDR; falling back to detection if possible, but this value should be manually set
2019-03-07T15:26:08.434Z [INFO]  core: vault is unsealed
2019-03-07T15:26:08.434Z [INFO]  core: entering standby mode
2019-03-07T15:26:27.479Z [WARN]  storage.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (Unknown check "vault:192.168.28.63:8200:vault-sealed-check")"

Consul Logs

 2019/03/07 15:25:58 [INFO] agent: Deregistered service "vault:192.168.28.63:8200"
    2019/03/07 15:25:58 [INFO] agent: Deregistered check "e408baff1455ac4cab95892718cd7494f61693ff"
    2019/03/07 15:25:58 [INFO] agent: Deregistered check "mem-util"
    2019/03/07 15:25:58 [INFO] agent: Deregistered check "dsk-util"
    2019/03/07 15:25:58 [INFO] agent: Deregistered check "a75809917d97ead0eaebb52cfeabe012dc47abc7"
    2019/03/07 15:25:58 [INFO] agent: Deregistered check "vault:192.168.28.63:8200:vault-sealed-check"
    2019/03/07 15:26:05 [INFO] agent: Synced service "vault:192.168.28.63:8200"
    2019/03/07 15:26:05 [INFO] agent: Synced check "vault:192.168.28.63:8200:vault-sealed-check"
    2019/03/07 15:26:08 [INFO] agent: Synced check "vault:192.168.28.63:8200:vault-sealed-check"
    2019/03/07 15:26:20 [INFO] agent: Synced check "e408baff1455ac4cab95892718cd7494f61693ff"
    2019/03/07 15:26:20 [INFO] agent: Synced check "a75809917d97ead0eaebb52cfeabe012dc47abc7"
    2019/03/07 15:26:22 [INFO] agent: Deregistered service "_nomad-server-acfqsqjxbgap2cvwlfedkvyx55twlbsl"
    2019/03/07 15:26:22 [INFO] agent: Deregistered check "vault:192.168.28.63:8200:vault-sealed-check"
    2019/03/07 15:26:22 [INFO] agent: Deregistered check "e408baff1455ac4cab95892718cd7494f61693ff"
    2019/03/07 15:26:22 [INFO] agent: Deregistered check "a75809917d97ead0eaebb52cfeabe012dc47abc7"
    2019/03/07 15:26:22 [INFO] agent: Deregistered service "_nomad-server-fwtmi5j5ajxekyuajonqk4pwlauj3dko"
    2019/03/07 15:26:22 [INFO] agent: Deregistered service "_nomad-server-p4pxc2srsurolx43mc7lztdb7fwnwbh3"
    2019/03/07 15:26:22 [ERR] http: Request PUT /v1/agent/check/deregister/a75809917d97ead0eaebb52cfeabe012dc47abc7, error: Unknown check "a75809917d97ead0eaebb52cfeabe012dc47abc7" from=127.0.0.1:51522
    2019/03/07 15:26:22 [ERR] http: Request PUT /v1/agent/check/deregister/e408baff1455ac4cab95892718cd7494f61693ff, error: Unknown check "e408baff1455ac4cab95892718cd7494f61693ff" from=127.0.0.1:51522
    2019/03/07 15:26:25 [WARN] agent: Check "e408baff1455ac4cab95892718cd7494f61693ff" socket connection failed: dial tcp 0.0.0.0:4648: connect: connection refused
    2019/03/07 15:26:27 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
    2019/03/07 15:26:27 [WARN] agent: Check "vault:192.168.28.63:8200:vault-sealed-check" missed TTL, is now critical
    2019/03/07 15:26:28 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
    2019/03/07 15:26:29 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
    2019/03/07 15:26:30 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
    2019/03/07 15:26:31 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
    2019/03/07 15:26:32 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
    2019/03/07 15:26:33 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
    2019/03/07 15:26:34 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
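
For anyone trying to narrow this down, the agent's check list and the TTL update Vault performs can both be exercised by hand against the local Consul HTTP API. A rough sketch, assuming the agent's HTTP API is listening on the default 127.0.0.1:8500 and using the check ID from the logs above:

# List the checks the local agent currently knows about; the
# vault-sealed-check should appear here while Vault has it registered.
curl -s http://127.0.0.1:8500/v1/agent/checks | jq 'keys'

# The same TTL update Vault sends every second; this is the call that
# returns 500 "Unknown check" in the logs above.
curl -s -X PUT \
  "http://127.0.0.1:8500/v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed"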

adragoset changed the title from "Consul fails to register vault sealed check for vault" to "Consul fails to register vault-sealed-check for vault" on Mar 7, 2019

bva commented Mar 19, 2019

Same issue here. It happens right after stopping/restarting the Nomad agent connected to the same Consul node. Not sure which app is to blame, though: Nomad, Consul, or Vault.

pearkes added the needs-investigation label on Apr 4, 2019

pearkes (Contributor) commented Apr 4, 2019

Vault manages the various checks here via its Consul storage backend integration, but this could be a bug introduced by a change to Consul's APIs. If it is proven to be a bug, the fix would likely end up in Vault.


bva commented Apr 9, 2019

The issue is fixed in 1.4.4; see GH-5456.

@acarsercan

I'm using Vault 1.1.2 and Consul 1.5.3.

Still seeing these errors; what could I have done wrong?

:8200:vault-sealed-check" missed TTL, is now critical


ab-fuze commented Oct 22, 2019

I have the same issue with Consul 1.6.0 and Vault 1.1.3.


Serrvosky commented Oct 31, 2019

Hello everyone,

I'm facing the same issue. I already have a Consul cluster deployed in Kubernetes (with ACLs enabled), and now I'm trying to deploy Vault in the same cluster.

This is my Vault config:

      storage "consul" {
        address = "<CONSUL_SERVICE_NAME>:8500"
        token = "<CONSUL_TOKEN>"
        scheme = "http"
        path = "vault/"
      }
        
      listener "tcp" {
        address          = "0.0.0.0:8200"
        tls_disable      = "true"
      }

      ui = true
      log_level = "Info"

      api_addr = "https://<CONSUL_POD_IP>:8200"
      cluster_addr = "https://<CONSUL_POD_IP>:8201"

And this is my Vault logs:

2019-10-31T11:06:21.317Z [WARN]  storage.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (Unknown check "vault:10.244.1.165:8200:vault-sealed-check")"
2019-10-31T11:06:22.321Z [WARN]  storage.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (Unknown check "vault:10.244.1.165:8200:vault-sealed-check")"
2019-10-31T11:06:23.325Z [WARN]  storage.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (Unknown check "vault:10.244.1.165:8200:vault-sealed-check")"
2019-10-31T11:06:24.336Z [WARN]  storage.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (Unknown check "vault:10.244.1.165:8200:vault-sealed-check")"

And below are my Consul logs:

    2019/10/31 11:07:22 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.244.1.165:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.244.1.165:8200:vault-sealed-check" from=10.244.1.77:43020
    2019/10/31 11:07:23 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.244.1.165:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.244.1.165:8200:vault-sealed-check" from=10.244.1.77:43020
    2019/10/31 11:07:24 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.244.1.165:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.244.1.165:8200:vault-sealed-check" from=10.244.1.77:43020

What am I missing?

//Vault version: 1.2.3
//Consul version: 1.6.0

--- UPDATE ---
Ignoring those logs, everything seems to work fine. I just created a test KV entry, and it works:

~/infrastructure/kubernetes/vault master ⇡2 !6 ?1 ❯ vault list cubbyhole/                                                                                        
Keys
----
first

~/infrastructure/kubernetes/vault master ⇡2 !6 ?1 ❯ vault kv get cubbyhole/first/                                                                               
====== Data ======
Key         Value
---         -----


Strum355 commented Oct 31, 2019

Consul 1.6.1
Vault 1.2.3
I have a similar issue in Docker Swarm that appears after some time, with a Consul cluster of 3 server agents and 0 client agents and a single Vault instance. The health check passes for at least a day before going critical. Pasting the logs and docker-compose snippets below:

Vault logs:

...
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:07.108749Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:08.110733Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:09.117084Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:10.119469Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:11.121813Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:12.123746Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:13.125820Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
...

Consul logs:

...
2019/10/31 14:21:30 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
2019/10/31 14:21:31 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
2019/10/31 14:21:32 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
2019/10/31 14:21:33 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
2019/10/31 14:21:34 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
2019/10/31 14:21:35 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
...

Vault:

compose:
- version: '3.7'
  secrets:
    vault_config.hcl:
      external: true      
  networks: 
    consul:
      external: true
    traefik:
      external: true
    vault:
      external: true
  services:
    server:
      image: vault:1.2.3
      command: server -config=/run/secrets/vault_config.hcl
      secrets:
        - vault_config.hcl
      networks:
        - consul
        - traefik
        - vault

Consul:

compose: 
- version: '3.7'
  secrets:
    consul_config.hcl:
      external: true
  networks:
    consul:
      external: true
    traefik:
      external: true
  services:
    server:
      image: consul:1.6.1
      networks:
        traefik:
        consul:
          aliases:
            - consul
      command: 'agent -config-file=/run/secrets/consul_config.hcl -rejoin'
      hostname: '{% raw %}{{ .Node.Hostname }}.consul.netsoc.co{% endraw %}'
      volumes:
        - /netsoc-neo/docker-data/consul:/consul/data
      environment:
        - CONSUL_BIND_INTERFACE=eth0
      secrets:
        - consul_config.hcl
      deploy:
        endpoint_mode: dnsrr # Needed to get cluster to not rely on pre-known IPs
        mode: global

Vault config:

ui = true
log_format = "json"
cluster_name = "main"

listener "tcp" {
    address = "0.0.0.0:8200"
    tls_disable = 1
}

storage "consul" {
    address = "consul:8500"
    path = "hashicorp-vault/"
    token = "{{ consul_vault_token }}"
}

@vprasanna80

I have the same issue with Consul 1.6.1 and Vault 1.2.3.

2019/11/11 22:47:11 [WARN] agent: Check "vault:127.0.0.1:8200:vault-sealed-check" missed TTL, is now critical
2019/11/11 22:47:30 [WARN] agent: Check "vault:127.0.0.1:8200:vault-sealed-check" missed TTL, is now critical
2019/11/11 22:48:18 [WARN] agent: Check "vault:127.0.0.1:8200:vault-sealed-check" missed TTL, is now critical


vinay-g commented Nov 18, 2019

Same issue here with Consul 1.5 and Vault 1.2.

@Strum355

I solved my problem by deploying a single Consul client agent outside of Swarm, one per host, and keeping the cluster of Consul server agents inside Swarm. I have two networks(ish): one for the Consul server instances and one for the Consul clients (one per host, so in effect n+1 networks, where n of them share the same name). Services are attached to the local Consul client network and register with Consul through it rather than being attached to the Consul server network.
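
For anyone wanting to reproduce that layout, a rough sketch of the per-host client agent is below (the image tag, interface name, and server address are placeholders, not my exact setup):

# Hypothetical per-host Consul client agent, joined to the server cluster.
# Services on the host then register via the local agent (127.0.0.1:8500)
# instead of talking to the servers directly.
docker run -d --name consul-client --network host \
  -e CONSUL_BIND_INTERFACE=eth0 \
  consul:1.6.1 agent -retry-join=<consul-server-address> -client=0.0.0.0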


Aracki commented Nov 29, 2019

Any updates on why this is happening? (consul 1.6.2 here)


invad0r commented Jan 21, 2020

Can confirm with Consul v1.4.0 and Vault 1.3.1.

@etiennejournet

same issue


sub6as commented Jan 29, 2020

I'm also seeing this issue when using the Vault and Consul helm charts from the Hashicorp repo.


judahrand commented Feb 3, 2020

Yup, me too.
Vault: 1.3.2
Consul: 1.6.2

@unittolabs

> Yup, me too.
> Vault: 1.3.2
> Consul: 1.6.2

the same =(

@mshivanna

The warning logs appear only on the standby Vault pod; the active Vault pod does not have these warnings. Vault 1.3.1 and Consul 1.5.3.

@bhechinger

Vault 1.3.2 and Consul 1.6.2, and it's happening on all three nodes (one active and two standby).

@gkarthiks

I am also having the same issue. Any updates on this?


ndobbs (Contributor) commented Feb 25, 2020

I ended up finding a solution for my case. I had 12 dead checks, but the checks for the currently active nodes were passing. I decided to take an outage window and completely deregister the Vault service from Consul (if you are using the Consul K/V store for Vault data, it stays intact).

Steps to solve the problem in my situation:

  1. Stop Vault on all nodes.
  2. Deregister the Vault services (see script below; I had to run it several times).
  3. Start Vault and unseal it.

script:

#!/usr/bin/env bash

# List every Vault service instance registered in the Consul catalog
consul_url="https://consul.service.aws.prd:8501/v1/catalog/service/vault"
vault_service_ids=$(curl -s -k "$consul_url" | jq -r '.[] | .ServiceID')
consul_deregister_command="consul services deregister -id="

for id in $vault_service_ids
do
    # Dry run: print the deregister command for each Vault service ID
    echo "$consul_deregister_command$id"
    # UNCOMMENT THE NEXT LINE IF YOU REALLY WANT TO DEREGISTER THE VAULT SERVICES
    # $consul_deregister_command$id
done
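
If some of the stale entries are orphaned checks rather than services (like the hashed check IDs in the Consul logs earlier in this thread), they may also need to be removed through the agent API on the node that owns them. A hypothetical one-off call (the check ID is a placeholder):

# Deregister a single leftover check from the local agent.
curl -s -X PUT "http://127.0.0.1:8500/v1/agent/check/deregister/<check-id>"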


@gkarthiks

But do we know the reason for these error messages? The services are also listed under failing service checks.


jdeprin commented Feb 28, 2020

I've encountered this issue in Kubernetes with Consul and Vault and believe I have a working solution. The documentation suggests that Vault should always communicate with a local Consul agent and not directly with the server. I think the issue is that Vault is looking for a Consul agent locally (local to the node) and not finding one, which would explain the sporadic nature of the error. If a Vault pod landed on a node with Consul, great; if not, the issue would appear.

To fix this I added affinity rules to both Vault and Consul: node affinity and pod affinity so that my Vault and Consul pods always land on the same nodes.

In the Vault chart this is a working configuration, depending on your specific environment labeling:

  affinity: |
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/node-role
            operator: In
            values:
            - management
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: {{ template "vault.name" . }}
              app.kubernetes.io/instance: "{{ .Release.Name }}"
              component: server
          topologyKey: kubernetes.io/hostname
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              component: consul
          topologyKey: kubernetes.io/hostname

EDIT:
Following up on this comment: if you happen to be running Vault with the Consul chart from the helm/stable repo, you'll likely run into this error. The helm/stable chart is a simple standalone server deployment of Consul; it does not include configuration for running Consul in client-agent mode.
The official HashiCorp Consul chart does include an agent DaemonSet.
The full fix in my case was to re-deploy Consul with the server StatefulSet and the agent DaemonSet. Once that is working, deploy Vault with its connection config pointing at the local agent from the DaemonSet:

storage "consul" {
      path = "vault"
      address = "HOST_IP:8500"
    }
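
HOST_IP isn't something Kubernetes defines by itself; if your chart doesn't already inject it, one way to expose the node's IP to the Vault pod is the downward API. A minimal sketch of the container env fragment, assuming you are templating the Vault pod spec yourself:

env:
  # Hypothetical: expose the node's IP so the storage stanza above can
  # reach the Consul client agent (DaemonSet) running on the same host.
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP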


invad0r commented Feb 28, 2020

Can confirm @jdeprin

After changing the Vault config to use the local Consul agent, I stopped getting the vault-sealed-check errors and the seal state is displayed correctly in Consul.
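
For reference, a minimal sketch of what that change looks like in the Vault storage stanza, assuming the client agent listens on the node's loopback on the default port:

storage "consul" {
  # Point Vault at the local Consul client agent, not at a remote server.
  address = "127.0.0.1:8500"
  path    = "vault/"
}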
