Vault root token revoked after consul leader election #8323

Open
daktari opened this issue Feb 10, 2020 · 1 comment
Labels: bug, core/token

Comments


daktari commented Feb 10, 2020

Describe the bug
Hi,

One of our root tokens was revoked after Consul went through multiple leader elections caused by network problems.

A token lookup now shows num_uses = -1, and the token can no longer generate orphan tokens for our Nomad infrastructure. We had to generate a new root token because of the revocation, and we are now working on moving to a role-based integration instead of a root token.

Do you have any thoughts on how this could happen?

vault token lookup <My-token>

Key                 Value
---                 -----
accessor            <My-accessor>
creation_time       1504770201
creation_ttl        0s
display_name        root
entity_id           n/a
expire_time         <nil>
explicit_max_ttl    0s
id                  <My-token>
meta                <nil>
num_uses            -1
orphan              true
path                auth/token/root
policies            [root]
ttl                 0s
type                service

We are not able to reproduce it.
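
For anyone who wants to check for this state in a script, here is a minimal Go sketch that reads the JSON form of the lookup (for example from `vault token lookup -format=json`) and flags a negative num_uses. The `lookupResponse` type, the `exhausted` helper, and the sample JSON are assumptions made for illustration: the sample is hand-built from the table above and is not a captured response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lookupResponse models only the fields of the lookup output used here.
// It is an illustrative type, not one taken from the Vault code base.
type lookupResponse struct {
	Data struct {
		DisplayName string   `json:"display_name"`
		NumUses     int      `json:"num_uses"`
		Orphan      bool     `json:"orphan"`
		Policies    []string `json:"policies"`
	} `json:"data"`
}

// exhausted reports whether the token's num_uses is negative, which is the
// state shown in this issue (num_uses = -1).
func exhausted(raw []byte) (bool, error) {
	var resp lookupResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return false, err
	}
	return resp.Data.NumUses < 0, nil
}

func main() {
	// Hand-built sample mirroring the lookup table above (assumption).
	sample := []byte(`{"data":{"display_name":"root","num_uses":-1,"orphan":true,"policies":["root"]}}`)

	bad, err := exhausted(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("token exhausted or tainted:", bad)
}
```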

Expected behavior
A root token should never expire or be revoked unless an operator explicitly requests it.

Environment:

  • Vault Server Version: 1.2.4
  • Server Operating System/Architecture:
  • 3 Vault servers.
  • 3 Consul servers

Vault server configuration file(s):

max_lease_ttl = "87600h"
disable_mlock = false
ui = true
plugin_directory = "/opt/vault/plugins"
api_addr = "https://<ip>:8200"

storage "consul" {
  token = "<consul-token>"
}

listener "tcp" {
  address = "0.0.0.0:8200"
  tls_cert_file = "/etc/certs/server.crt"
  tls_key_file = "/etc/certs/server.key"
}

telemetry {
  statsd_address = "localhost:8125"
}

Logs:

Feb  8 08:55:40 nomad-server-1 nomad: {"@level":"error","@message":"Vault token creation for alloc failed","@module":"nomad.client","@timestamp":"2020-02-08T08:55:40.215247Z","alloc_id":"alloc-id","error":"failed to create an alloc vault token: Error making API request.\n\nURL: POST https://vault-addr:8200/v1/auth/token/create\nCode: 400. Errors:\n\n* failed to persist accessor index entry: Unexpected response code: 500 (rpc error making call: rpc error making call: node is not the leader)"}
Feb  8 08:55:42 nomad-server-1 nomad: {"@level":"warn","@message":"failed to revoke tokens. Will reattempt until TTL","@module":"nomad.vault","@timestamp":"2020-02-08T08:55:42.154423Z","error":"failed to revoke token (alloc: \"alloc-id\", node: \"node-id\", task: \"goxy\"): Error making API request.\n\nURL: POST https://vault-addr:8200/v1/auth/token/revoke-accessor\nCode: 403. Errors:\n\n* permission denied"}
Feb  8 08:55:44 nomad-server-1 nomad: {"@level":"error","@message":"Vault token creation for alloc failed","@module":"nomad.client","@timestamp":"2020-02-08T08:55:44.948960Z","alloc_id":"alloc-id","error":"failed to create an alloc vault token: Error making API request.\n\nURL: POST https://vault-addr:8200/v1/auth/token/create\nCode: 403. Errors:\n\n* permission denied"}
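
The log lines above show the Vault responses going from a 400 (the storage write failed because the Consul node was not the leader) to 403 (permission denied). A rough Go sketch for pulling that status code out of such syslog-prefixed Nomad lines follows. The `logEntry` type and the `vaultStatus` helper are hypothetical names, and the sample line is an abbreviated copy of the last entry above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// logEntry holds the JSON fields of a Nomad log line that are used here.
type logEntry struct {
	Level   string `json:"@level"`
	Message string `json:"@message"`
	Error   string `json:"error"`
}

// codeRe matches the "Code: NNN" fragment inside the error text.
var codeRe = regexp.MustCompile(`Code: (\d+)`)

// vaultStatus skips the syslog prefix, decodes the JSON payload, and returns
// the HTTP status code found in the "error" field.
func vaultStatus(line string) (int, error) {
	i := strings.Index(line, "{")
	if i < 0 {
		return 0, fmt.Errorf("no JSON payload in line")
	}
	var e logEntry
	if err := json.Unmarshal([]byte(line[i:]), &e); err != nil {
		return 0, err
	}
	m := codeRe.FindStringSubmatch(e.Error)
	if m == nil {
		return 0, fmt.Errorf("no status code in error field")
	}
	return strconv.Atoi(m[1])
}

func main() {
	// Abbreviated copy of the last log line above.
	line := `Feb  8 08:55:44 nomad-server-1 nomad: {"@level":"error","@message":"Vault token creation for alloc failed","error":"Error making API request.\n\nURL: POST https://vault-addr:8200/v1/auth/token/create\nCode: 403. Errors:\n\n* permission denied"}`

	code, err := vaultStatus(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("Vault status code:", code)
}
```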

catsby added the core/token, storage/consul, version/1.2.x, and bug labels on Feb 13, 2020

ncabatoff (Collaborator) commented

Hi @daktari,

For num_uses to get to -1, it must have had an explicit non-zero value before. In other words, when you created this root token, you specified that it could only be used N times, and now those N uses have been exhausted. From our code:

// Used to restrict the number of uses (zero is unlimited). This is to
// support one-time-tokens (generalized). There are a few special values:
// if it's -1 it has run through its use counts and is executing its final
// use; if it's -2 it is tainted, which means revocation is currently
// running on it; and if it's -3 it's also tainted but revocation
// previously ran and failed, so this hints the tidy function to try it
// again.
NumUses int `json:"num_uses" mapstructure:"num_uses" structs:"num_uses"`

The fact that it stuck around hints that maybe we're not properly revoking these exhausted root tokens. This shouldn't cause any problems unless you were generating a tremendous number of these, but I shall follow up on principle.
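
The special values described in the quoted comment can be summarized in a small Go sketch. `describeNumUses` is a hypothetical helper written for illustration and is not part of Vault. It only restates the comment's meanings for each value.

```go
package main

import "fmt"

// describeNumUses maps a num_uses value to the meaning given in the quoted
// Vault comment. Hypothetical helper, not taken from the Vault code base.
func describeNumUses(n int) string {
	switch {
	case n == 0:
		return "unlimited uses"
	case n > 0:
		return "restricted to a limited number of uses"
	case n == -1:
		return "use count exhausted; executing its final use"
	case n == -2:
		return "tainted; revocation is currently running"
	case n == -3:
		return "tainted; revocation previously failed, tidy should retry"
	default:
		return "value not described in the comment"
	}
}

func main() {
	for _, n := range []int{0, 3, -1, -2, -3} {
		fmt.Printf("num_uses=%d: %s\n", n, describeNumUses(n))
	}
}
```

Read this way, the lookup in the report (num_uses = -1) corresponds to a token that was created with an explicit use limit and has run through it.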
