Set agent tokens on an agent other than itself #7972

Open
matthiasng opened this issue May 28, 2020 · 6 comments
Labels
theme/acls ACL and token generation type/enhancement Proposed improvement or new feature type/umbrella-☂️ Makes issue the "source of truth" for multiple requests relating to the same topic

Comments

@matthiasng

Feature Description

Bootstrapping ACLs is not an easy task. You must call consul acl bootstrap, create a token for each of your nodes, and set the agent token on each of them.
This means you have to call the v1/agent/token endpoint (or consul acl set-agent-token) on each node.
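
To make the per-node work concrete, here is a minimal sketch of what a glue script has to do today. It assumes the agent token is set with a PUT to v1/agent/token/agent carrying a JSON body with a Token field, authorized through the X-Consul-Token header. The node addresses and token values are made-up placeholders, and the requests are only built, not sent, so the sketch runs offline.

```python
import json
import urllib.request


def build_set_agent_token_request(agent_addr, acl_token, agent_token):
    """Build (but do not send) the request that sets the agent token on one agent.

    agent_addr  -- base URL of that agent's HTTP(S) API (placeholder values below)
    acl_token   -- a token allowed to update the agent (placeholder)
    agent_token -- the token the agent should use from now on (placeholder)
    """
    body = json.dumps({"Token": agent_token}).encode("utf-8")
    return urllib.request.Request(
        url=f"{agent_addr}/v1/agent/token/agent",
        data=body,
        method="PUT",
        headers={
            "X-Consul-Token": acl_token,
            "Content-Type": "application/json",
        },
    )


# Hypothetical inventory: one API address and one token per node.
NODE_TOKENS = {
    "https://10.0.0.11:8501": "token-for-node-1",
    "https://10.0.0.12:8501": "token-for-node-2",
}


def build_all(node_tokens, acl_token):
    """One request per node: the caller needs network access to every agent."""
    return [
        build_set_agent_token_request(addr, acl_token, token)
        for addr, token in node_tokens.items()
    ]


if __name__ == "__main__":
    for req in build_all(NODE_TOKENS, "operator-token-placeholder"):
        # urllib.request.urlopen(req) would send it; omitted so this runs offline.
        print(req.get_method(), req.full_url)
```

The loop over NODE_TOKENS is the part this feature request wants to get rid of: the caller must be able to reach the API of every single node.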

It would simplify things if v1/agent/token (or maybe a new endpoint, for example v1/operator/agent/token) optionally supported specifying the node on which to set the token.
Consul would then forward the request to the target agent.

Use Case(s)

  • Possibility to improve terraform-provider-consul ACL support. See [Feature] resource "consul_node_token" terraform-provider-consul#197.
    That's why I created this feature request. I think with this feature and [Feature] resource "consul_node_token" terraform-provider-consul#197, bootstrapping (and managing) your node tokens gets much easier, without the need for manual work or glue scripts.
    Extending the provider with node token support is already possible, but if you provision your whole cluster with Terraform (for example on AWS), Terraform probably cannot access the API on all nodes.

  • Allow setting agent tokens without needing a direct connection to the desired node.

  • Further prospects: set agent tokens through the UI.

Research

I already did some research and came across the following possible ways to implement it:

  1. Forward the HTTP request to the leader via RPC and call the HTTP API of the desired node from there.
    I think this is the way with the least impact, but on the downside the feature would then depend on the HTTP API for internal use.

  2. Serf event/query. The documentation says delivery is not guaranteed. Needs further research.

I hope someone with more knowledge about Consul's internals can give me some feedback about this.

If you think this would be useful and we find a good way to implement it, I am willing to contribute a patch.

@mkeeler
Member

mkeeler commented May 28, 2020

@matthiasng Thanks for the report. Token distribution is pretty difficult right now due to the scenario you have highlighted. There is no way to centrally push tokens out to every Consul agent.

Before diving into some of our ideas around making this easier I want to address the possibilities you brought up.

You correctly identified one big issue with serf events: they have no guaranteed delivery. There are also potential security concerns, because serf events are broadcast to every node in the cluster. If we sent ACL tokens to clients in this manner, we would be exposing those tokens to all agents in the cluster.

As for the first idea of using the existing RPC routing, there is a possibility that it could work in some deployments. Only the Consul servers run RPC servers, so this could end up looking like:

  1. Agent receives token update request for another agent.
  2. Agent makes an RPC to one of the servers in its datacenter requesting a token to be updated.
  3. The server potentially must forward the RPC to a server in another datacenter.
  4. The last-hop server then needs to translate the RPC back to an HTTP request.
  5. The HTTP request is made to the client in the destination datacenter.

Steps 1-3 are how most of Consul’s API works today. Everything not under v1/agent/* typically just gets translated into an RPC and forwarded to the servers.

The biggest problems are around how servers can communicate with clients. In a typical setup the only inbound communication that needs to be allowed from servers to clients is gossip (and even that can be problematic in some environments where clients sit in different networks with overlapping IP space). Even assuming all firewalls and networking are set up to allow servers to make HTTP requests to clients, we still have the issue of knowing what address and port to reach the clients on. Today, Consul agents do not register their own services (HTTP, HTTPS, DNS, gRPC) in the Catalog, so the only information the servers would have is the serf address. Is that the right IP? Does the client have HTTPS enabled? (Hopefully it does, or else we would be transmitting tokens in plain text.) What port is HTTPS running on? Basically there is a general service discovery problem here. This is a little odd because service discovery is one of Consul’s core use cases, and regardless of how we solve the token distribution issue, Consul should be updated to register its own services in the Catalog.
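
The discovery gap can be illustrated with a small sketch. Everything in it is a hypothetical model, not Consul code: KnownAgent stands for what a last-hop server might know about a client, and token_update_url is an invented helper that tries to turn that knowledge into the HTTP request target from steps 4 and 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownAgent:
    """Hypothetical view a server has of one client agent."""
    node: str
    serf_addr: str                    # learned through gossip
    https_port: Optional[int] = None  # unknown unless the agent registers it somewhere


class EndpointUnknown(Exception):
    """Raised when the server cannot tell where the client's HTTPS API listens."""


def token_update_url(agent: KnownAgent) -> str:
    """Return the URL a last-hop server would call to update the agent token.

    Assumption: the serf address is also the right address for the HTTP API,
    which is one of the open questions raised above.
    """
    if agent.https_port is None:
        raise EndpointUnknown(
            f"no HTTPS port known for {agent.node}; only serf address {agent.serf_addr}"
        )
    return f"https://{agent.serf_addr}:{agent.https_port}/v1/agent/token/agent"


if __name__ == "__main__":
    # With only gossip information the server cannot build the request target.
    try:
        token_update_url(KnownAgent(node="web-1", serf_addr="10.0.0.11"))
    except EndpointUnknown as err:
        print("cannot forward:", err)

    # If agents registered their own API endpoints, the lookup would succeed.
    print(token_update_url(KnownAgent("web-1", "10.0.0.11", https_port=8501)))
```

The second call only works because the port was supplied by hand, which is the information that registering agent services in the Catalog would provide.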

While it wouldn’t help with existing clusters, I have been working on a new feature for Consul to facilitate setting up a secure Consul cluster. The feature involves client agents being able to request a configuration package from the servers when they start up for the first time. This request would be authorized using a JWT instead of a Consul ACL token. The servers would then be configured to validate those JWTs using configuration that is mostly the same as the JWT Auth Method being introduced in Consul 1.8.0. After validating the JWT signature and claims, the Consul server would generate an appropriate ACL token and TLS certificates and send them back with other configuration, including the gossip encryption key. This should make the process of starting new agents a little simpler, as you won’t have to push any of those Consul-specific secrets out of band. This still leaves the burden of JWT generation up to users. However, it doesn’t have to be Consul generating them. It could be Terraform or some other orchestrator bringing up new agents that signs the JWT.
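
As a rough illustration of the "orchestrator signs, server validates" idea, here is a standard-library-only sketch of issuing and checking a short-lived JWT. It is not the feature's actual format: the claim names (sub, exp) and both helper functions are assumptions made for the example, and HS256 with a shared secret is used only to avoid third-party dependencies. A real setup would more likely use an asymmetric algorithm so the servers hold only a public key.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_provisioning_jwt(claims: dict, secret: bytes) -> str:
    """Orchestrator side (hypothetical): sign claims for one new agent."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_provisioning_jwt(token: str, secret: bytes) -> dict:
    """Server side (hypothetical): check signature, algorithm and expiry."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    signing_input = parts[0] + "." + parts[1]
    expected = _b64url(
        hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    )
    if not hmac.compare_digest(expected, parts[2]):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(parts[0])).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(parts[1]))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims


if __name__ == "__main__":
    secret = b"shared-secret-placeholder"
    token = sign_provisioning_jwt(
        {"sub": "node-1", "exp": int(time.time()) + 300}, secret
    )
    print(verify_provisioning_jwt(token, secret)["sub"])
```

After a successful check the server would know which node is asking and could hand back its ACL token and other secrets, so nothing Consul-specific has to be pushed to the node out of band.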

@banks
Member

banks commented May 28, 2020

It could be terraform or some other orchestrator bringing up new agents that signs the JWT.

Just to elaborate a bit, this is our near-term goal for making ACL (and other secrets/config) distribution much easier for agents. Once Matt's feature is done, it will be possible (though will need some extra work) to build the terraform provider so that it can generate the "provisioning tokens" using keys available in the terraform runner environment (which is already trusted to build your whole infrastructure) without needing to build any new communication paths into Consul.

CC @remilapeyre FYI, we'll let you know when this is closer to ready, but it should be in the next couple of months at the latest.

For now I'd like to (if it's OK with you @matthiasng) treat this issue as a placeholder for "make Consul ACL bootstrapping better". I can update the title and add a note to the description to help with discovery. Once we have released the ACL improvements Matt is working on we can then work with you and/or @remilapeyre to get these integrated and usable from Terraform in the cleanest way possible.

As Matt said though - thanks for opening this, it's very timely as we are literally discussing and working on this problem this week!

@matthiasng
Author

This is pretty good news, and thanks for your fast feedback.

@banks feel free to use the issue as a placeholder.
@remilapeyre of course I'm happy to help with the Terraform integration as soon as the new feature is ready. Just CC me.

@jsosulska jsosulska added type/umbrella-☂️ Makes issue the "source of truth" for multiple requests relating to the same topic type/enhancement Proposed improvement or new feature labels May 29, 2020
@remilapeyre
Contributor

This looks really good and will make the deployment of ACLs easier. As far as I can see, only the creation of the master token will require a manual step. The signing of the JWT could even leverage cloud features like AWS IAM and KMS to fully automate the introduction of new nodes.

Thanks @matthiasng for giving a hand with the Terraform integration :)

@sebastianreloaded

What happened to this proposal?

@marco-m-pix4d

marco-m-pix4d commented Feb 9, 2023
