Vault HTTP2 no cached connection #6433

Closed
mtneug opened this issue Oct 7, 2019 · 8 comments · Fixed by #7673

Comments

@mtneug

mtneug commented Oct 7, 2019

Nomad version

Nomad v0.9.5 ('0.9.5')

Operating system and Environment details

Nomad is running on Alpine Linux with the Docker, exec, and raw_exec task drivers enabled. It is connected to a Consul cluster and a Vault cluster, each with three nodes. The issue occurred while testing the Vault integration (I have not yet tested Consul with consul-template, but the Nomad instances find each other via Consul, so I assume this is working). Both Consul and Vault use server certificates and require client certificates for HTTPS. Nomad was compiled with Go 1.13 (Alpine package 1.13-r0).

$ uname -a
Linux ... 4.19.76-0-virt #1-Alpine SMP Tue Oct 1 09:34:00 UTC 2019 x86_64 Linux

Issue

Any interaction with Vault fails after some time (e.g. renewing tokens, deleting tokens, creating tokens for allocations). The root cause is the Go HTTP2 connection pool (see logs). Searching for the error suggests that it was fixed at some point in the past. Due to time constraints, I have not checked whether the vendored golang.org/x/net/http2 simply needs to be updated to fix the issue.

Nomad Server logs (if appropriate)

2019-10-07T16:23:08.637+0200 [WARN ] nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "c3ee58b4-3548-4eed-28a7-0b35534bed19", node: "9c66dbd2-3dc1-f859-7bee-7e6516bf5a36", task: "..."): Post https://active.vault.service.consul:8200/v1/auth/token/revoke-accessor: http2: no cached connection was available"

2019-10-07T16:38:44.047+0200 [ERROR] nomad.client: Vault token creation for alloc failed: alloc_id=fe0b827d-7594-eab2-8e4f-3b9334b2b473 error="failed to create an alloc vault token: Post https://active.vault.service.consul:8200/v1/auth/token/create/nomad-cluster: http2: no cached connection was available"

Maybe related issues

golang/go#16582
kubernetes/kubernetes#74412

@cgbaker
Contributor

cgbaker commented Oct 7, 2019

Thanks for the report, @mtneug!

@cgbaker cgbaker modified the milestones: near-term, 0.10.1 Oct 7, 2019
@cgbaker cgbaker added this to Needs Triage in Nomad - Community Issues Triage via automation Oct 29, 2019
@schmichael schmichael modified the milestones: 0.10.1, 0.10.2 Nov 5, 2019
@schmichael schmichael modified the milestones: 0.10.2, 0.10.3 Nov 19, 2019
@cgbaker cgbaker removed their assignment Nov 20, 2019
@tgross tgross removed this from Needs Triage in Nomad - Community Issues Triage Nov 25, 2019
@nvx
Contributor

nvx commented Nov 26, 2019

A quick workaround seems to be setting the following environment variable on the nomad servers:

GODEBUG=http2client=0

@mtneug
Author

mtneug commented Nov 27, 2019

@nvx thanks! I thought I tried that out, but I will try again and report back.

@preetapan preetapan removed this from the 0.10.3 milestone Jan 29, 2020
@nvx
Contributor

nvx commented Feb 18, 2020

I played around with this a bit more, and it looks like the GODEBUG trick didn't work after all.

However, updating the golang.org/x/net/http2 dependency and rebuilding (against the 0.10.3 tag) did the trick. I used the following command to update the dependency:

govendor fetch golang.org/x/net/http2

Looks like it should be a pretty easy fix.

@thepeak99

I can confirm it happens with Nomad 0.10.4 too and, as @nvx said, updating golang.org/x/net/http2 did the trick.

@tsarna

tsarna commented Apr 7, 2020

This is a complete showstopper for me for using Vault and Nomad together. Nomad consistently fails to renew its token.

Given that this is a serious bug that prevents two of your flagship products from working together, and that it is apparently a low-effort fix, can we please get the change integrated?

@notnoop
Contributor

notnoop commented Apr 9, 2020

Thank you, folks! Sorry that this slipped our attention for so long. The fix here will be out in 0.11.1!

@schmichael schmichael added this to the 0.11.1 milestone Apr 9, 2020
@github-actions

github-actions bot commented Nov 9, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 9, 2022