Vault stopped responding after concurrent iteration and write on a map #24022

Closed
kirby0025 opened this issue Nov 6, 2023 · 1 comment · Fixed by #24027

Describe the bug

The Vault instance stopped working after a concurrent iteration and write on a map.
We found these lines in the log:

vault[3649760] fatal error: concurrent map iteration and map write
 vault[3649760] goroutine 59827 [running]
 vault[3649760] sync.(*Map).Range(0xc0107b0160, 0xc012275c00)
 vault[3649760]         /opt/hostedtoolcache/go/1.21.3/x64/src/sync/map.go
 vault[3649760] github.com/hashicorp/vault/vault.(*ExpirationManager).WalkTokens(0xc002ccd880, 0xc012275cc0?)
 vault[3649760]         /home/runner/work/vault/vault/vault/expiration.go
 vault[3649760] github.com/hashicorp/vault/vault.(*TokenStore).gaugeCollector(0xc000179b30, {0xb8c7980, 0xc013890930})
 vault[3649760]         /home/runner/work/vault/vault/vault/token_store.go
 vault[3649760] github.com/hashicorp/vault/vault.(*Core).tokenGaugeCollector(0xc00177a000, {0xb8c7980, 0xc013890930})
 vault[3649760]         /home/runner/work/vault/vault/vault/core_metrics.go
 vault[3649760] github.com/hashicorp/vault/helper/metricsutil.(*GaugeCollectionProcess).collectAndFilterGauges(0xc0029fedc0)
 vault[3649760]         /home/runner/work/vault/vault/helper/metricsutil/gauge_process.go
 vault[3649760] github.com/hashicorp/vault/helper/metricsutil.(*GaugeCollectionProcess).Run(0xc0029fedc0)
 vault[3649760]         /home/runner/work/vault/vault/helper/metricsutil/gauge_process.go
 vault[3649760] created by github.com/hashicorp/vault/vault.(*Core).emitMetricsActiveNode in goroutine 59601
 vault[3649760]         /home/runner/work/vault/vault/vault/core_metrics.go
 vault[3649760] goroutine 1 [select, 2188 minutes]
 vault[3649760] github.com/hashicorp/vault/command.(*ServerCommand).Run(0xc00017f200, {0xc0000b4950, 0x1, 0x1})
 vault[3649760]         /home/runner/work/vault/vault/command/server.go
 vault[3649760] github.com/mitchellh/cli.(*CLI).Run(0xc002ecf2c0)
 vault[3649760]         /home/runner/go/pkg/mod/github.com/mitchellh/cli@v1.1.5/cli.go
 vault[3649760] github.com/hashicorp/vault/command.RunCustom({0xc0000b4940?, 0x2?, 0x2?}, 0xc0000061a0?)
 vault[3649760]         /home/runner/work/vault/vault/command/main.go
 vault[3649760] github.com/hashicorp/vault/command.Run(...)
 vault[3649760]         /home/runner/work/vault/vault/command/main.go
 vault[3649760] main.main()
 vault[3649760]         /home/runner/work/vault/vault/main.go
 vault[3649760] goroutine 17 [select]
 vault[3649760] go.opencensus.io/stats/view.(*worker).start(0xc00293ad80)
 vault[3649760]         /home/runner/go/pkg/mod/go.opencensus.io@v0.24.0/stats/view/worker.go
 vault[3649760] created by go.opencensus.io/stats/view.init.0 in goroutine 1
 vault[3649760]         /home/runner/go/pkg/mod/go.opencensus.io@v0.24.0/stats/view/worker.go
 vault[3649760] goroutine 10 [syscall, 2188 minutes]
 vault[3649760] os/signal.signal_recv()
 vault[3649760]         /opt/hostedtoolcache/go/1.21.3/x64/src/runtime/sigqueue.go
 vault[3649760] os/signal.loop()
 vault[3649760]         /opt/hostedtoolcache/go/1.21.3/x64/src/os/signal/signal_unix.go
 vault[3649760] created by os/signal.Notify.func1.1 in goroutine 1
 vault[3649760]         /opt/hostedtoolcache/go/1.21.3/x64/src/os/signal/signal.go
 vault[3649760] goroutine 19 [chan receive, 2188 minutes]
 vault[3649760] github.com/hashicorp/vault/command.MakeShutdownCh.func1()
 vault[3649760]         /home/runner/work/vault/vault/command/commands.go
 vault[3649760] created by github.com/hashicorp/vault/command.MakeShutdownCh in goroutine 1
 vault[3649760]         /home/runner/work/vault/vault/command/commands.go
 vault[3649760] goroutine 20 [chan receive, 2188 minutes]
 vault[3649760] github.com/hashicorp/vault/command.MakeSighupCh.func1()
 vault[3649760]         /home/runner/work/vault/vault/command/commands.go
 vault[3649760] created by github.com/hashicorp/vault/command.MakeSighupCh in goroutine 1
 vault[3649760]         /home/runner/work/vault/vault/command/commands.go
vault[3649760]: github.com/hashicorp/vault/command.MakeSigUSR2Ch.func1()
vault[3649760]:         /home/runner/work/vault/vault/command/commands_nonwindows.go:24 +0x27
vault[3649760]: created by github.com/hashicorp/vault/command.MakeSigUSR2Ch in goroutine 1
vault[3649760]:         /home/runner/work/vault/vault/command/commands_nonwindows.go:22 +0xc5

For information: systemd was able to restart it properly.

To Reproduce
I have no steps to reproduce this bug, as this is the first time it has happened, but I thought it was severe enough to share.
Feel free to close this issue if you think it's not relevant enough.

Expected behavior
Vault should not crash because of concurrent iteration and write on a map.

Environment:

  • Vault Server Version (retrieve with vault status): 1.15.1-1 2023-10-20T19:16:11Z
  • Vault CLI Version (retrieve with vault version): Vault v1.15.1 (b94e275), built 2023-10-20T19:16:11Z
  • Server Operating System/Architecture: Ubuntu 20.04

Vault server configuration file(s):

ui = true
disable_mlock = true
storage "raft" {
    path = "/opt/vault/raft/data"
    node_id = "vault_1"
}
cluster_addr = "http://127.0.0.1:8201"
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/ssl/certs/<redacted>.pem"
  tls_key_file = "/etc/ssl/private/<redacted>"
}
api_addr = "http://0.0.0.0:8200"
telemetry {
  disable_hostname = true
  prometheus_retention_time = "72h"
}

Additional context
We use the approle, kubernetes, token, userpass and ldap authentication methods. The process runs on a dedicated VM with sufficient resources. Memory was not full at the time and CPU load was low.

daoz1026 commented Nov 6, 2023

Hello,

We recently upgraded from 1.11 to 1.15.1 and are having the same issue. It happens at random times.

Output of "journalctl -u vault":

Nov 03 23:28:31  systemd[1]: vault.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 03 23:28:31  systemd[1]: Unit vault.service entered failed state.
Nov 03 23:28:31  systemd[1]: vault.service failed.
Nov 03 23:28:36  systemd[1]: vault.service holdoff time over, scheduling restart.
Nov 03 23:28:36  systemd[1]: Stopped "HashiCorp Vault - A tool for managing secrets".
Nov 03 23:28:36  systemd[1]: Started "HashiCorp Vault - A tool for managing secrets".
Nov 04 03:49:39  systemd[1]: Reloading "HashiCorp Vault - A tool for managing secrets".
Nov 04 03:49:39  systemd[1]: Reloaded "HashiCorp Vault - A tool for managing secrets".
Nov 04 16:26:21  systemd[1]: vault.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 04 16:26:21  systemd[1]: Unit vault.service entered failed state.
Nov 04 16:26:21  systemd[1]: vault.service failed.
Nov 04 16:26:26  systemd[1]: vault.service holdoff time over, scheduling restart.
Nov 04 16:26:26  systemd[1]: Stopped "HashiCorp Vault - A tool for managing secrets".
Nov 04 16:26:26  systemd[1]: Started "HashiCorp Vault - A tool for managing secrets".
Nov 05 03:26:16  systemd[1]: Reloading "HashiCorp Vault - A tool for managing secrets".
Nov 05 03:26:16  systemd[1]: Reloaded "HashiCorp Vault - A tool for managing secrets".
Nov 05 03:28:57  systemd[1]: Reloading "HashiCorp Vault - A tool for managing secrets".
Nov 05 03:28:57  systemd[1]: Reloaded "HashiCorp Vault - A tool for managing secrets".
Nov 05 23:22:40  systemd[1]: vault.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 05 23:22:40  systemd[1]: Unit vault.service entered failed state.
Nov 05 23:22:40  systemd[1]: vault.service failed.
Nov 05 23:22:45  systemd[1]: vault.service holdoff time over, scheduling restart.
Nov 05 23:22:45  systemd[1]: Stopped "HashiCorp Vault - A tool for managing secrets".
Nov 05 23:22:45  systemd[1]: Started "HashiCorp Vault - A tool for managing secrets".
Nov 05 23:23:06  systemd[1]: Stopping "HashiCorp Vault - A tool for managing secrets"...
Nov 05 23:23:08  systemd[1]: Stopped "HashiCorp Vault - A tool for managing secrets".
Nov 05 23:23:08  systemd[1]: Started "HashiCorp Vault - A tool for managing secrets".
Nov 06 03:15:13  systemd[1]: Reloading "HashiCorp Vault - A tool for managing secrets".
Nov 06 03:15:13  systemd[1]: Reloaded "HashiCorp Vault - A tool for managing secrets".
Nov 06 03:17:56  systemd[1]: Reloading "HashiCorp Vault - A tool for managing secrets".
Nov 06 03:17:56  systemd[1]: Reloaded "HashiCorp Vault - A tool for managing secrets".
Nov 06 05:42:03  systemd[1]: vault.service: main process exited, code=exited, status=2/INVALIDARGUMENT

Each time Vault exits with status=2/INVALIDARGUMENT, the log shows "fatal error: concurrent map iteration and map write" just before Vault restarts:

2023-11-06T05:42:03.374Z [INFO]  core: ...
2023-11-06T05:42:03.391Z [INFO]  core: ...
**fatal error: concurrent map iteration and map write**
==> Vault server configuration:

Administrative Namespace:
             Api Address: https://172.30.26.183:8200
                     Cgo: disabled
         Cluster Address: https://172.30.26.183:8201
   Environment Variables: GODEBUG, GOTRACEBACK, HOME, LANG, LOGNAME, PATH, PWD, SHELL, SHLVL, USER
              Go Version: go1.21.3
              Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
               Log Level: info
                   Mlock: supported: true, enabled: true
           Recovery Mode: false
                 Storage: postgresql (HA available)
                 Version: Vault v1.15.1, built 2023-10-20T19:16:11Z
             Version Sha: b94e275f25ccd9011146d14c00ea9e49fd5032dc

==> Vault server started! Log data will stream in below:

2023-11-06T05:42:08.789Z [WARN]  unknown or unsupported field tls_prefer_server_cipher_suites found in configuration at /etc/vault.d/vault_main.hcl:19:5
2023-11-06T05:42:08.789Z [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""

Regards
