Consul fails to register vault-sealed-check for vault #5439
Comments
Same issue here. It happens right after stopping/restarting a nomad agent connected to the same consul node. Not sure, though, which app is to blame: nomad, consul, or vault.
Vault manages the various checks here via its Consul storage-backend integration. However, this could be a bug introduced by a change to Consul's APIs, so a fix, if this proves to be a bug, would likely end up in Vault.
Issue is fixed in 1.4.4, GH-5456
I'm using vault 1.1.2 and consul 1.5.3 and am still seeing these errors. What could I have done wrong?
I have the same issue with consul 1.6.0 and vault 1.1.3
Hello everyone, I'm facing the same issue. I already have a Consul cluster deployed in Kubernetes (with ACLs), and now I'm trying to deploy Vault in the same cluster. This is my Vault config:
And these are my Vault logs:
And below you can find my Consul logs:
What am I missing? Vault version: 1.2.3 --- UPDATE ---
Consul 1.6.1 Vault logs:
Consul logs:
Vault compose:

```yaml
version: '3.7'
secrets:
  vault_config.hcl:
    external: true
networks:
  consul:
    external: true
  traefik:
    external: true
  vault:
    external: true
services:
  server:
    image: vault:1.2.3
    command: server -config=/run/secrets/vault_config.hcl
    secrets:
      - vault_config.hcl
    networks:
      - consul
      - traefik
      - vault
```

Consul compose:

```yaml
version: '3.7'
secrets:
  consul_config.hcl:
    external: true
networks:
  consul:
    external: true
  traefik:
    external: true
services:
  server:
    image: consul:1.6.1
    networks:
      traefik:
      consul:
        aliases:
          - consul
    command: 'agent -config-file=/run/secrets/consul_config.hcl -rejoin'
    hostname: '{% raw %}{{ .Node.Hostname }}.consul.netsoc.co{% endraw %}'
    volumes:
      - /netsoc-neo/docker-data/consul:/consul/data
    environment:
      - CONSUL_BIND_INTERFACE=eth0
    secrets:
      - consul_config.hcl
    deploy:
      endpoint_mode: dnsrr # Needed to get cluster to not rely on pre-known IPs
      mode: global
```

Vault config:

```hcl
ui = true
log_format = "json"
cluster_name = "main"

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = 1
}

storage "consul" {
  address = "consul:8500"
  path    = "hashicorp-vault/"
  token   = "{{ consul_vault_token }}"
}
```
I have the same issue with Consul 1.6.1 and Vault 1.2.3.

```
2019/11/11 22:47:11 [WARN] agent: Check "vault:127.0.0.1:8200:vault-sealed-check" missed TTL, is now critical
```
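For context on that warning: `vault-sealed-check` is a Consul TTL check, which the agent flips to critical whenever its owner fails to heartbeat it within the TTL window. A minimal sketch against Consul's agent HTTP API (the endpoint paths are real Consul APIs; the check ID and TTL shown are illustrative, not Vault's exact registration):

```shell
#!/bin/sh
# Sketch: how a Consul TTL check is registered and kept alive.
# CONSUL_ADDR defaults to the conventional local agent address.
CONSUL_ADDR="${CONSUL_ADDR:-http://127.0.0.1:8500}"

# URL that registers a check; the PUT body would be JSON such as
# {"ID":"vault-sealed-check","Name":"Vault Sealed Status","TTL":"5s"}
check_register_url() {
  echo "$CONSUL_ADDR/v1/agent/check/register"
}

# URL the owning process must PUT to before each TTL window closes;
# if it misses the window, Consul logs the "missed TTL" warning.
check_pass_url() {
  echo "$CONSUL_ADDR/v1/agent/check/pass/$1"
}

# e.g. curl -s -X PUT "$(check_pass_url 'vault:127.0.0.1:8200:vault-sealed-check')"
```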
Same issue here, consul 1.5 and vault 1.2
Solved my problem by deploying a single consul client agent outside of swarm (one per host) and a cluster of consul server agents inside swarm. I have two networks(ish): one for the consul server instances and one for the consul clients (one per host, so in effect n+1 networks, where n share the same name). Services are attached to the local consul client network and register with consul through it rather than being attached to the consul server network.
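A host-local client agent of the kind described above might be configured roughly like this. This is only a sketch: the datacenter name, data directory, and server hostnames are assumptions, not values from the thread.

```hcl
# Hypothetical Consul client agent config, one per host.
# All names and addresses below are examples.
server     = false
datacenter = "dc1"
data_dir   = "/consul/data"
bind_addr  = "0.0.0.0"

# Join the server agents running inside swarm.
retry_join = ["consul-server-1", "consul-server-2", "consul-server-3"]
```

With this layout, Vault (and other services) talk to the agent on their own host, matching Consul's recommended local-agent topology.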
Any updates on why this is happening? (consul
Can confirm with consul
Same issue
I'm also seeing this issue when using the Vault and Consul helm charts from the Hashicorp repo.
Yup, me too.
The same =(
The warning logs appear only on the standby vault pod; the active vault pod does not have these warnings. vault 1.3.1 and consul 1.5.3
vault 1.3.2 and consul 1.6.2, and it's happening on all three nodes (one active and two standby)
I am also having the same issue. Any updates on this?
I ended up finding a solution for my case. I had 12 dead checks, but the currently active nodes were passing. I decided to take an outage window and completely de-register the vault service from consul (if you are using Consul's K/V store as Vault's storage backend, that data stays intact). Steps to solve the problem in my situation:
script:
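The de-registration step above can be done through Consul's agent HTTP API. A hedged sketch, assuming a local agent on the default address; the service ID shown is an example, so list your actual IDs first via `/v1/agent/services`:

```shell
#!/bin/sh
# Sketch of de-registering a stale Vault service instance from Consul.
CONSUL_ADDR="${CONSUL_ADDR:-http://127.0.0.1:8500}"

# Build the agent API URL that de-registers one service instance by ID.
deregister_url() {
  echo "$CONSUL_ADDR/v1/agent/service/deregister/$1"
}

# Against a live agent you would run, for each dead instance:
#   curl -s "$CONSUL_ADDR/v1/agent/services"          # find service IDs
#   curl -s -X PUT "$(deregister_url 'vault:127.0.0.1:8200')"
# then restart/unseal Vault, which re-registers the service and its
# vault-sealed-check.
```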
But do we know the reason for these error messages? Because the services are also listed under failed service checks.
I've encountered this issue in k8s with consul and vault and believe I have a working solution. The documentation suggests that Vault should always communicate with a local consul agent, not directly with the server. I think the issue is that vault looks for a consul agent local to the node and doesn't find one, which would explain the sporadic nature of the error: if a Vault pod landed on a node with consul, great; if not, the issue appeared. To fix this I added affinity to both vault and consul, node affinity plus pod affinity, such that my vault and consul pods always land on the same nodes. In the vault chart this is a working configuration, depending on your specific environment labeling:

```yaml
affinity: |
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role.kubernetes.io/node-role
          operator: In
          values:
          - management
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/name: {{ template "vault.name" . }}
          app.kubernetes.io/instance: "{{ .Release.Name }}"
          component: server
      topologyKey: kubernetes.io/hostname
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          component: consul
      topologyKey: kubernetes.io/hostname
```

EDIT:
Can confirm @jdeprin. After changing the config for
Overview of the Issue
I am currently running the vault server on the same hosts as my consul servers. After upgrading from consul 1.4.0 to consul 1.4.3, vault fails to register its sealed check with consul 1.4.3, so sealed status is never reported to consul, and I'm getting the following log messages piling up in my consul and vault logs.
Reproduction Steps
Operating system and Environment details
Official Docker container for consul 1.4.3 and vault 1.0.2
Log Fragments
Vault Logs
Consul Logs