Crash at startup with local mount #14750

Closed
pbriet opened this issue Mar 29, 2022 · 6 comments

@pbriet

pbriet commented Mar 29, 2022

Describe the bug

Vault crashes after being unsealed:

http2: panic serving 127.0.0.1:37738: runtime error: invalid memory address or nil pointer dereference
goroutine 803 [running]:
net/http.(*http2serverConn).runHandler.func1()
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/h2_bundle.go:5842 +0x125
panic({0x45020e0, 0x8c520d0})
	/opt/hostedtoolcache/go/1.17.7/x64/src/runtime/panic.go:1038 +0x215
github.com/hashicorp/vault/vault.(*Core).loadMounts(0xc000d2ce00, {0x60ba1a8, 0xc000de65c0})
	/home/runner/work/vault/vault/vault/mount.go:1038 +0x58f
github.com/hashicorp/vault/vault.standardUnsealStrategy.unseal({}, {0x60ba1a8, 0xc000de65c0}, {0x100, 0xc000d62be8}, 0xc000d2ce00)
	/home/runner/work/vault/vault/vault/core.go:2077 +0x36c
github.com/hashicorp/vault/vault.(*Core).postUnseal(0xc000d2ce00, {0x60ba1a8, 0xc000de65c0}, 0xc00138c320, {0x6029080, 0x8ceee50})
	/home/runner/work/vault/vault/vault/core.go:2209 +0x391
github.com/hashicorp/vault/vault.(*Core).unsealInternal(0xc000d2ce00, {0x60ba1e0, 0xc00007c070}, {0xc000c2e150, 0x618d248, 0xc0008819a0})
	/home/runner/work/vault/vault/vault/core.go:1670 +0x55e
github.com/hashicorp/vault/vault.(*Core).unsealFragment(0xc000d2ce00, {0xc0009a26e0, 0x21, 0x50}, 0x0)
	/home/runner/work/vault/vault/vault/core.go:1316 +0x68a
github.com/hashicorp/vault/vault.(*Core).Unseal(...)
	/home/runner/work/vault/vault/vault/core.go:1211
github.com/hashicorp/vault/http.handleSysUnseal.func1({0x6069c78, 0xc0011760c0}, 0xc00066ae00)
	/home/runner/work/vault/vault/http/sys_seal.go:130 +0x2cc
net/http.HandlerFunc.ServeHTTP(0xc0010d9598, {0x6069c78, 0xc0011760c0}, 0x0)
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/server.go:2047 +0x2f
net/http.(*ServeMux).ServeHTTP(0x605c9b8, {0x6069c78, 0xc0011760c0}, 0xc00066ae00)
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/server.go:2425 +0x149
github.com/hashicorp/vault/http.wrapHelpHandler.func1({0x6069c78, 0xc0011760c0}, 0xc00066ae00)
	/home/runner/work/vault/vault/http/help.go:23 +0x129
net/http.HandlerFunc.ServeHTTP(0xc00067c060, {0x6069c78, 0xc0011760c0}, 0xc0010d9648)
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/server.go:2047 +0x2f
github.com/hashicorp/vault/http.wrapCORSHandler.func1({0x6069c78, 0xc0011760c0}, 0xc0010d9708)
	/home/runner/work/vault/vault/http/cors.go:29 +0x6e4
net/http.HandlerFunc.ServeHTTP(0xc000d2ce00, {0x6069c78, 0xc0011760c0}, 0xc0007fc500)
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/server.go:2047 +0x2f
github.com/hashicorp/vault/http.rateLimitQuotaWrapping.func1({0x6069c78, 0xc0011760c0}, 0xc00066ae00)
	/home/runner/work/vault/vault/http/util.go:97 +0x9d0
net/http.HandlerFunc.ServeHTTP(0xc000c4cd80, {0x6069c78, 0xc0011760c0}, 0xc00066c290)
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/server.go:2047 +0x2f
github.com/hashicorp/vault/http.wrapGenericHandler.func1({0x6076fc8, 0xc00011e118}, 0xc0014a7000)
	/home/runner/work/vault/vault/http/handler.go:422 +0x119c
net/http.HandlerFunc.ServeHTTP(0xc00066c2e0, {0x6076fc8, 0xc00011e118}, 0x1)
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/server.go:2047 +0x2f
github.com/hashicorp/go-cleanhttp.PrintablePathCheckHandler.func1({0x6076fc8, 0xc00011e118}, 0xc0014a7000)
	/home/runner/go/pkg/mod/github.com/hashicorp/go-cleanhttp@v0.5.2/handlers.go:42 +0x98
net/http.HandlerFunc.ServeHTTP(0x0, {0x6076fc8, 0xc00011e118}, 0x0)
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/server.go:2047 +0x2f
net/http.serverHandler.ServeHTTP({0x0}, {0x6076fc8, 0xc00011e118}, 0xc0014a7000)
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/server.go:2879 +0x43b
net/http.initALPNRequest.ServeHTTP({{0x60ba250, 0xc000c4c840}, 0xc000e10700, {0xc0009cf0a0}}, {0x6076fc8, 0xc00011e118}, 0xc0014a7000)
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/server.go:3480 +0x245
net/http.(*http2serverConn).runHandler(0x113d585, 0xc000e66100, 0x0, 0xc000e66100)
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/h2_bundle.go:5849 +0x78
created by net/http.(*http2serverConn).processHeaders
	/opt/hostedtoolcache/go/1.17.7/x64/src/net/http/h2_bundle.go:5579 +0x510

To Reproduce

Vault has been installed through the Banzai Cloud operator on Openshift.
It works correctly, but at some point, when restarting the pod, it fails to start. The crash occurs once the unsealing process is finalized.

It is configured with local file storage (non-replicated).

Code Analysis

Something looks odd to me. In loadMounts (https://github.com/hashicorp/vault/blob/main/vault/mount.go#L995), it seems that raw and rawLocal can both be nil.

In our case, raw is probably nil here: https://github.com/hashicorp/vault/blob/main/vault/mount.go#L1038

Not sure what this means, though.
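
To make the suspicion concrete, here is a minimal sketch of how a nil raw could produce this kind of panic. It assumes the code reads raw.Value inside a branch that only checked rawLocal; the type and function names below are hypothetical simplifications, not Vault's actual code.

```go
package main

import "fmt"

// storageEntry is a simplified, hypothetical stand-in for the entry a
// storage backend returns; it is not Vault's actual type.
type storageEntry struct {
	Value []byte
}

// localTableSizeBuggy mirrors the suspected pattern: the branch is guarded
// by rawLocal != nil, but it reads raw.Value, which panics when raw is nil.
func localTableSizeBuggy(raw, rawLocal *storageEntry) int {
	if rawLocal != nil {
		return len(raw.Value) // nil pointer dereference if raw == nil
	}
	return 0
}

// localTableSizeFixed reads from the entry that was actually nil-checked.
func localTableSizeFixed(raw, rawLocal *storageEntry) int {
	if rawLocal != nil {
		return len(rawLocal.Value)
	}
	return 0
}

// panics reports whether calling f triggers a panic.
func panics(f func()) (p bool) {
	defer func() {
		if r := recover(); r != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	local := &storageEntry{Value: []byte(`{"entries":[]}`)}
	fmt.Println("buggy panics:", panics(func() { localTableSizeBuggy(nil, local) }))
	fmt.Println("fixed size:", localTableSizeFixed(nil, local))
}
```

With raw set to nil and rawLocal populated, the first variant panics while the second returns the size of the local entry.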

Environment:

  • Vault 1.10
  • Banzai Cloud operator 1.15.2
  • Openshift 4.8

vault-config.json

{"listener":{"tcp":{"address":"0.0.0.0:8200","telemetry":{"unauthenticated_metrics_access":true},"tls_cert_file":"/vault/tls/server.crt","tls_key_file":"/vault/tls/server.key"}},"storage":{"file":{"path":"${ .Env.VAULT_STORAGE_FILE }"}},"telemetry":{"disable_hostname":true,"prometheus_retention_time":"24h"},"ui":true}
@stevendpclark added the bug label Mar 29, 2022
stevendpclark added a commit that referenced this issue Mar 29, 2022
 - Reported within issue #14750 as a panic, it was identified that
   we were using the wrong value for local mounts within the table metrics.
@stevendpclark
Contributor

Hi @pbriet! Thanks for the bug report, I agree with your conclusion and have submitted a fix for the table metric which should address the panic.

What is still unclear to me, though, is how your environment ended up with a nil value for the raw variable. That shouldn't ever happen from my understanding, so additional insight on your setup might be useful. The only way I managed to reproduce the issue was to zero out the data/core/_mounts file, but again, that should never happen on an initialized cluster.
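
One way to check for the zeroed-out condition described above is to look at the file size directly. This is only a sketch: the default path is an assumption about where the file storage backend keeps the entry, so pass the real location under your storage path as the first argument.

```go
package main

import (
	"fmt"
	"os"
)

// isZeroLength reports whether the file at path exists and is empty.
// It returns an error if the file cannot be inspected (for example, if it is missing).
func isZeroLength(path string) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	return info.Size() == 0, nil
}

func main() {
	// Hypothetical default; override it with the actual path on your volume.
	path := "data/core/_mounts"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	empty, err := isZeroLength(path)
	if err != nil {
		fmt.Println("cannot inspect file:", err)
		return
	}
	fmt.Printf("%s zero-length: %v\n", path, empty)
}
```

A missing file is reported as an error rather than as "empty", since the two cases point to different problems.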

@stevendpclark self-assigned this Mar 29, 2022
@pbriet
Author

pbriet commented Mar 30, 2022

Hi @stevendpclark. Thanks for the fix.

What do raw and rawLocal stand for exactly?
We are in a non-replicated environment (simple PoC).

Maybe this crash hides another error that will occur once the patch is released, let's see.

What information could I provide to help you dig into this?
Here are a few things:

  • We installed Vault with the Banzai Cloud operator: a single, non-HA install with file storage (mounted PVC).
  • We also use the RedHat Vault Config Operator (https://github.com/redhat-cop/vault-config-operator) to configure roles, policies, secret engines, and random secrets.
  • Everything works fine for a while. Then one day, the pod fails to restart with this bug (we haven't identified why or when). This has happened twice (the first time, we reinstalled everything from scratch).

Thanks,

@pbriet
Author

pbriet commented Mar 30, 2022

Aren't we in this case? (no write to coreMountConfigPath)
https://github.com/hashicorp/vault/blob/main/vault/mount.go#L1225

@stevendpclark
Contributor

Hi @pbriet,

So the mount points under the raw variable are core to Vault and/or are marked for replication (no matter whether you are set up for replication or not). By default, paths go into raw (replicated), which should always contain at least the /sys and /identity mount points. These two are critical to Vault, and the only time they shouldn't exist, to my knowledge, is before vault operator init has been performed.

rawLocal will contain mount paths that were enabled with the local argument set when creating the mount path; see https://www.vaultproject.io/docs/commands/secrets/enable#local

As an example, on an initial setup of Vault after vault operator init, the Vault instance will have /sys and /identity within raw and the /cubbyhole mount within rawLocal.
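
The split described above can be sketched roughly as follows. The mountEntry type and splitMounts function are hypothetical simplifications for illustration, not Vault's actual types or code; the only assumption carried over is that a per-mount local flag decides which table an entry lands in.

```go
package main

import "fmt"

// mountEntry is a hypothetical, simplified mount record; Vault's real
// mount entries carry many more fields.
type mountEntry struct {
	Path  string
	Local bool
}

// splitMounts partitions entries into replicated and local groups
// based on the Local flag.
func splitMounts(entries []mountEntry) (replicated, local []mountEntry) {
	for _, e := range entries {
		if e.Local {
			local = append(local, e)
		} else {
			replicated = append(replicated, e)
		}
	}
	return replicated, local
}

func main() {
	// Mirrors the freshly initialized instance from the example above.
	entries := []mountEntry{
		{Path: "sys/"},
		{Path: "identity/"},
		{Path: "cubbyhole/", Local: true},
	}
	replicated, local := splitMounts(entries)
	fmt.Println("raw (replicated):", replicated)
	fmt.Println("rawLocal:", local)
}
```

In this picture, an empty replicated group on an initialized instance would be the abnormal case, since sys/ and identity/ should always be present there.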

If raw is nil, that basically means that something wiped out the data/core/_mounts file in the storage backend you are currently using. That is pretty horrible, and I don't believe you will recover from it even with the bugfix. At this point I can't see what could possibly do that, but I'm really not familiar with OpenShift and mounted PVCs.

@pbriet
Author

pbriet commented Mar 30, 2022

Thanks for your feedback.

Maybe one operator is messing with the data, or the storage is getting corrupted (unlikely, but...).
When your fix is released, we'll see whether Vault recovers or not. I will get back to you.

Best regards,

stevendpclark added a commit that referenced this issue Mar 30, 2022
* Address incorrect table metric value for local mounts

 - Reported within issue #14750 as a panic, it was identified that
   we were using the wrong value for local mounts within the table metrics.

* Add changelog
@stevendpclark
Contributor

Hi @pbriet

Thanks again for the bug report. I'll close this issue out for now; the fix should be included in the next minor releases of Vault 1.9 and 1.10.
