Vault cluster with PostgreSQL storage backend fails on database reboot #17643

Open
johanneswuerbach opened this issue Oct 24, 2022 · 0 comments · May be fixed by #17924
Labels
bug Used to indicate a potential bug core/storage storage/postgresql

Describe the bug

A Vault cluster using the PostgreSQL storage backend becomes unhealthy when the backing PostgreSQL instance is restarted, and it requires a manual pod restart to recover.

To Reproduce
Steps to reproduce the behavior:

  1. Create a 3-node Vault cluster using https://github.com/hashicorp/vault-helm with a PostgreSQL backend.
  2. Restart the PostgreSQL instance.
  3. See error
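
One way to make step 3 observable is to poll each pod's `sys/health` endpoint and translate the HTTP status code into a node state. The sketch below is only an illustration: the function names and the base URL passed to `fetch_status` are hypothetical, and the mapping assumes Vault's default status codes for `/v1/sys/health` (no query parameters that override them).

```python
import urllib.error
import urllib.request

# Default status codes of Vault's /v1/sys/health endpoint (assumed unmodified).
HEALTH_STATES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code: int) -> str:
    """Translate a sys/health status code into a human-readable state."""
    return HEALTH_STATES.get(status_code, f"unexpected status {status_code}")


def needs_attention(status_code: int) -> bool:
    """A pod that is neither active nor some kind of standby needs a closer look."""
    return status_code not in (200, 429, 473)


def fetch_status(base_url: str, timeout: float = 2.0) -> int:
    """Return the sys/health status code for one pod (base_url is hypothetical)."""
    request = urllib.request.Request(f"{base_url}/v1/sys/health", method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx responses; the code is still the answer we want.
        return err.code
```

For example, `describe_health(fetch_status("http://vault-1.vault-internal:8200"))` would report `sealed` for a pod in the state shown by the `vault is sealed` log line, assuming that hostname resolves in your cluster.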

Expected behavior

The cluster recovers automatically once PostgreSQL is up and running again.
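
As a rough illustration of the expected recovery behavior, the sketch below retries lock acquisition with exponential backoff and discards a broken connection so that the next attempt reconnects. This is not Vault's implementation and makes no claim about the root cause of this issue; `connect`, `try_lock`, and all parameters are hypothetical stand-ins.

```python
import time
from typing import Any, Callable


def acquire_with_retry(
    connect: Callable[[], Any],
    try_lock: Callable[[Any], bool],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to take the HA lock, reconnecting after connection failures.

    connect() returns a fresh connection; try_lock(conn) returns True once the
    lock is held. Both may raise ConnectionError while the database is down.
    """
    conn = None
    delay = base_delay
    for attempt in range(max_attempts):
        try:
            if conn is None:
                conn = connect()
            if try_lock(conn):
                return True
        except ConnectionError:
            # Drop the broken connection so the next attempt dials again
            # instead of reusing a socket the server has already closed.
            conn = None
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

The `sleep` parameter exists only so the loop can be exercised without real waiting; a caller would normally leave it at the default.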

Environment:

  • Vault Server Version (retrieve with vault status): 1.11.3
  • Vault CLI Version (retrieve with vault version): Vault v1.11.3
  • Server Operating System/Architecture: linux/amd64

Vault server configuration file(s):

disable_mlock = true
ui = true

listener "tcp" {
  tls_disable = 1
  address = "[::]:8200"
  cluster_address = "[::]:8201"

  telemetry {
    unauthenticated_metrics_access = true
  }
}

telemetry {
  disable_hostname = true
  prometheus_retention_time = "12h"
}

storage "postgresql" {
  # connection configured using env vars
  ha_enabled = true
  max_idle_connections = 64
  max_parallel = 32
}

seal "gcpckms" {
  # configured using env vars
}

Additional context

Example logs
Date Service display_container_name Message
2022-10-20T07:30:20.300Z "vault" "vault_vault-0" revoked lease: lease_id=auth/kubernetes/login/h4c2658292fb8dfe45786ce9be5ea1a5755fd8dbfbf9a40c3f9e4221ab689f067
2022-10-20T07:30:25.787Z "vault" "vault_vault-0" revoked lease: lease_id=auth/kubernetes/login/h8b0f5261d51fe5b8e120c2c93eef59f20047a17e9aa9b9bb0b1ff2e00f9b4320
2022-10-20T07:30:26.413Z "vault" "vault_vault-0" revoked lease: lease_id=auth/kubernetes/login/h1d68cc27026b3462efcdd8274d89620e9c56532b7fe40012d974fa1dfdfbf776
2022-10-20T07:30:26.521Z "vault" "vault_vault-0" revoked lease: lease_id=auth/kubernetes/login/h7b8e182cff853548e2c9dbd170ab4a0e4622f8690d175e3c6d2abb2ca5ae570d
2022-10-20T07:30:42.466Z "vault" "vault_vault-0" revoked lease: lease_id=auth/kubernetes/login/h8f25f81eeca76fa3f9099d2e973b3ec52134d514f9d2197c5fc729c215098093
2022-10-20T07:30:48.522Z "vault" "vault_vault-0" revoked lease: lease_id=auth/kubernetes/login/h0086e3714a140a915004a261303a53282104398b21b0e11088afa1a6a5c5a812
2022-10-20T07:30:54.419Z "vault" "vault_vault-1" failed to acquire lock: error="FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
2022-10-20T07:30:54.667Z "vault" "vault_vault-1" key rotation periodic upgrade check failed: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:30:54.810Z "vault" "vault_vault-2" failed to acquire lock: error="FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
2022-10-20T07:30:54.933Z "vault" "vault_vault-0" leadership lost, stopping active operation
2022-10-20T07:30:54.934Z "vault" "vault_vault-0" pre-seal teardown starting
2022-10-20T07:30:55.436Z "vault" "vault_vault-0" stopping rollback manager
2022-10-20T07:30:55.605Z "vault" "vault_vault-0" pre-seal teardown complete
2022-10-20T07:30:55.606Z "vault" "vault_vault-0" clearing leader advertisement failed: error="FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
2022-10-20T07:30:55.606Z "vault" "vault_vault-0" unlocking HA lock failed: error="unexpected EOF"
2022-10-20T07:30:55.952Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:30:57.608Z "vault" "vault_vault-0" failed to acquire lock: error="unexpected EOF"
2022-10-20T07:30:58.346Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:00.949Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:01.111Z "vault" "vault_vault-2" key rotation periodic upgrade check failed: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:31:02.035Z "vault" "vault_vault-0" key rotation periodic upgrade check failed: error="FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
2022-10-20T07:31:03.340Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:04.675Z "vault" "vault_vault-1" key rotation periodic upgrade check failed: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:31:05.430Z "vault" "vault_vault-1" failed to acquire lock: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:31:05.821Z "vault" "vault_vault-2" failed to acquire lock: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:31:05.951Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:08.340Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:08.609Z "vault" "vault_vault-0" failed to acquire lock: error="unexpected EOF"
2022-10-20T07:31:10.949Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:12.036Z "vault" "vault_vault-0" key rotation periodic upgrade check failed: error="unexpected EOF"
2022-10-20T07:31:13.340Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:15.949Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:18.342Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:19.611Z "vault" "vault_vault-0" failed to acquire lock: error="unexpected EOF"
2022-10-20T07:31:25.950Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:28.341Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:35.951Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:38.343Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:55.949Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:31:58.341Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:32:20.953Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:32:28.341Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:33:10.951Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:33:13.342Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:33:24.842Z "vault" "vault_vault-1" key rotation periodic upgrade check failed: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:33:24.843Z "vault" "vault_vault-1" failed to acquire lock: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:33:28.987Z "vault"   failed to acquire lock: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:33:32.335Z "vault" "vault_vault-0" key rotation periodic upgrade check failed: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:33:35.878Z "vault" "vault_vault-1" acquired lock, enabling active operation
2022-10-20T07:33:40.528Z "vault" "vault_vault-0" failed to acquire lock: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:34:23.350Z "vault" "vault_vault-0" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:34:26.524Z "vault" "vault_vault-1" post-unseal setup starting
2022-10-20T07:34:26.563Z "vault" "vault_vault-1" loaded wrapping token key
2022-10-20T07:34:26.639Z "vault" "vault_vault-1" successfully setup plugin catalog: plugin-directory=""
2022-10-20T07:34:26.694Z "vault" "vault_vault-1" successfully mounted backend: type=system path=sys/
2022-10-20T07:34:26.708Z "vault" "vault_vault-1" successfully mounted backend: type=identity path=identity/
2022-10-20T07:34:26.734Z "vault" "vault_vault-1" successfully mounted backend: type=kv path=secret/
2022-10-20T07:34:26.734Z "vault" "vault_vault-1" successfully mounted backend: type=cubbyhole path=cubbyhole/
2022-10-20T07:34:26.833Z "vault" "vault_vault-1" successfully enabled credential backend: type=token path=token/ namespace="ID: root. Path: "
2022-10-20T07:34:26.834Z "vault" "vault_vault-1" successfully enabled credential backend: type=kubernetes path=kubernetes/ namespace="ID: root. Path: "
2022-10-20T07:34:26.882Z "vault" "vault_vault-1" starting rollback manager
2022-10-20T07:34:26.885Z "vault" "vault_vault-1" restoring leases
2022-10-20T07:34:26.985Z "vault" "vault_vault-1" entities restored
2022-10-20T07:34:27.004Z "vault" "vault_vault-1" groups restored
2022-10-20T07:34:27.278Z "vault" "vault_vault-1" upgrading recovery key
2022-10-20T07:34:27.380Z "vault" "vault_vault-1" upgrading stored keys
2022-10-20T07:34:27.519Z "vault" "vault_vault-1" post-unseal setup complete
2022-10-20T07:34:30.530Z "vault" "vault_vault-0" failed to acquire lock: error="FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
2022-10-20T07:34:30.989Z "vault" "vault_vault-2" failed to acquire lock: error="FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
2022-10-20T07:34:31.097Z "vault" "vault_vault-2" key rotation periodic upgrade check failed: error="FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
2022-10-20T07:34:32.050Z "vault" "vault_vault-0" key rotation periodic upgrade check failed: error="FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
2022-10-20T07:34:32.063Z "vault" "vault_vault-1" leadership lost, stopping active operation
2022-10-20T07:34:32.064Z "vault" "vault_vault-1" pre-seal teardown starting
2022-10-20T07:34:32.068Z "vault" "vault_vault-1" error restoring leases: error="failed to read lease entry auth/kubernetes/login/he0f3194c1d788d56f128452f61362e901706518d8e63938b9ea7a6e4dd387fe6: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
2022-10-20T07:34:32.069Z "vault" "vault_vault-1" shutting down
2022-10-20T07:34:32.069Z "vault" "vault_vault-1" marked as sealed
2022-10-20T07:34:32.566Z "vault" "vault_vault-1" stopping rollback manager
2022-10-20T07:34:32.566Z "vault" "vault_vault-1" pre-seal teardown complete
2022-10-20T07:34:32.602Z "vault" "vault_vault-1" clearing leader advertisement failed: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:34:32.602Z "vault" "vault_vault-1" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:34:32.643Z "vault" "vault_vault-1" unlocking HA lock failed: error="failed to connect to host=localhost user=user_XXX database=db_XXX: dial error (dial tcp [::1]:5432: connect: cannot assign requested address)"
2022-10-20T07:34:32.730Z "vault" "vault_vault-1" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:34:33.644Z "vault" "vault_vault-1" stopping cluster listeners
2022-10-20T07:34:33.644Z "vault" "vault_vault-1" forwarding rpc listeners stopped
2022-10-20T07:34:33.732Z "vault" "vault_vault-1" rpc listeners successfully shut down
2022-10-20T07:34:33.732Z "vault" "vault_vault-1" cluster listeners successfully shut down
2022-10-20T07:34:33.732Z "vault" "vault_vault-1" vault is sealed
2022-10-20T07:34:41.531Z "vault" "vault_vault-0" failed to acquire lock: error="FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
2022-10-20T07:34:42.053Z "vault" "vault_vault-2" acquired lock, enabling active operation
2022-10-20T07:34:42.277Z "vault" "vault_vault-2" post-unseal setup starting
2022-10-20T07:34:42.314Z "vault" "vault_vault-2" loaded wrapping token key
2022-10-20T07:34:42.405Z "vault" "vault_vault-2" successfully setup plugin catalog: plugin-directory=""
2022-10-20T07:34:42.458Z "vault" "vault_vault-2" successfully mounted backend: type=system path=sys/
2022-10-20T07:34:42.481Z "vault" "vault_vault-2" successfully mounted backend: type=identity path=identity/
2022-10-20T07:34:42.508Z "vault" "vault_vault-2" successfully mounted backend: type=kv path=secret/
2022-10-20T07:34:42.508Z "vault" "vault_vault-2" successfully mounted backend: type=cubbyhole path=cubbyhole/
2022-10-20T07:34:42.589Z "vault" "vault_vault-2" successfully enabled credential backend: type=token path=token/ namespace="ID: root. Path: "
2022-10-20T07:34:42.589Z "vault" "vault_vault-2" successfully enabled credential backend: type=kubernetes path=kubernetes/ namespace="ID: root. Path: "
2022-10-20T07:34:42.634Z "vault" "vault_vault-2" starting rollback manager
2022-10-20T07:34:42.635Z "vault" "vault_vault-2" restoring leases
2022-10-20T07:34:42.709Z "vault" "vault_vault-2" no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2022-10-20T07:34:43.370Z "vault" "vault_vault-2" entities restored
2022-10-20T07:34:43.398Z "vault" "vault_vault-2" groups restored
2022-10-20T07:34:43.666Z "vault" "vault_vault-2" lease restore complete
2022-10-20T07:34:43.882Z "vault" "vault_vault-2" post-unseal setup complete
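
To get an overview of which pod reported which failure, the log export above can be summarized with a short script. This is a minimal sketch that assumes the exact line layout shown here (timestamp, quoted service, optional quoted container name, message); the function names are hypothetical, and lines without a container name are counted under `unknown`. For context, SQLSTATE 57P01 is PostgreSQL's `admin_shutdown` error code.

```python
import re
from collections import Counter
from typing import Iterable

# timestamp "service" ["container"] message
LOG_LINE = re.compile(
    r'^(?P<ts>\S+)\s+"(?P<service>[^"]*)"\s+(?:"(?P<pod>[^"]*)"\s+)?(?P<msg>.*)$'
)


def summarize_errors(lines: Iterable[str]) -> Counter:
    """Count failed operations per (container, operation) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or "error=" not in match["msg"]:
            continue  # skip the header row and informational messages
        pod = match["pod"] or "unknown"
        # The operation name is the text before the ': error=' suffix.
        operation = match["msg"].split(": error=", 1)[0]
        counts[(pod, operation)] += 1
    return counts


def print_summary(lines: Iterable[str]) -> None:
    """Print the most frequent failures first."""
    for (pod, operation), count in summarize_errors(lines).most_common():
        print(f"{count:4d}  {pod}  {operation}")
```

Calling `print_summary(open("vault.log"))` on a saved copy of the export (the file name is an assumption) lists each pod's failed operations by frequency.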