
Anonymous client agents could join the cluster despite ACL enabled on server agents #7915

Closed
nixikanius opened this issue May 18, 2020 · 3 comments

Comments

@nixikanius

nixikanius commented May 18, 2020

Overview of the Issue

While securing my Consul cluster, I was surprised to find that anonymous client agents (agents without ACLs enabled) can join a cluster whose servers have ACLs enabled. In my case, anonymous agents could reliably join the cluster only at the Serf (gossip) level and fortunately could not register any service.

I hoped I had simply misconfigured something, but many checks and re-readings of the documentation suggest this is a security bug. It seems very strange that anyone can join the cluster even though ACLs are enabled on it.

Reproduction Steps

  1. Bootstrap the cluster (mine has only one server).
  2. Enable ACLs with a default-deny policy on the server agent(s).
  3. Join the cluster from a client agent (it does not matter whether ACLs are enabled on it).

Consul info for both Client and Server

Client info
agent:
	check_monitors = 1
	check_ttls = 0
	checks = 1
	services = 1
build:
	prerelease =
	revision =
	version = 1.7.3
consul:
	acl = disabled
	known_servers = 1
	server = false
runtime:
	arch = amd64
	cpu_count = 1
	goroutines = 47
	max_procs = 1
	os = linux
	version = go1.14.3
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 45
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 216
	members = 2
	query_queue = 0
	query_time = 10
Server info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease =
        revision = 8b4a3d95
        version = 1.7.3
consul:
        acl = enabled
        bootstrap = true
        known_datacenters = 1
        leader = true
        leader_addr = 192.168.0.254:8300
        server = true
raft:
        applied_index = 159180
        commit_index = 159180
        fsm_pending = 0
        last_contact = 0
        last_log_index = 159180
        last_log_term = 46
        last_snapshot_index = 147480
        last_snapshot_term = 38
        latest_configuration = [{Suffrage:Voter ID:6508cb69-ace8-a047-eefd-f40828ab425e Address:172.18.0.2:8300}]
        latest_configuration_index = 0
        num_peers = 0
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Leader
        term = 46
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 94
        max_procs = 2
        os = linux
        version = go1.13.7
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 46
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 216
        members = 2
        query_queue = 0
        query_time = 10
serf_wan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1
        members = 1
        query_queue = 0
        query_time = 1

Operating system and Environment details

The cluster with one server agent in docker (with host network) on Debian 10. Client agent running natively on Debian 10.

Server config (demo-dir):

{
  "server": true,
  "bootstrap_expect": 1,
  "ui": true,
  "bind_addr": "{{GetPrivateInterfaces | include \"network\" \"192.168.0.0/16\" | attr \"address\"}}",
  "addresses": {
    "http": "0.0.0.0"
  },
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true
  },
  "log_level": "TRACE",
  "log_file": "/consul/logs/",
  "watches": [
    {
      "type": "checks",
      "service": "hydra_demo",
      "handler_type": "http",
      "http_handler_config": {
        "path":"https://stackstorm.local:8443/api/v1/webhooks/hydra_demos/consul/service_change",
        "method": "POST",
        "header": {"St2-Api-Key":["CENSORED"]},
        "tls_skip_verify": true
      }
    }
  ]
}

Client config (demo-ref-en):

{
  "bind_addr": "{{GetPrivateInterfaces | include \"network\" \"192.168.0.0/24\" | attr \"address\"}}",
  "retry_join": ["demo-dir.local"],
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true
  },
  "enable_local_script_checks": true,
  "services": [
    {
      "name": "hydra-demo",
      "check": {
        "args": ["/usr/local/scripts/file_exists.sh", "/var/run/hydra-demo.prov"],
        "interval": "10s"
      }
    }
  ]
}

Log Fragments

Client log:

May 18 23:54:14 demo-ref-en consul[2532]: ==> Starting Consul agent...
May 18 23:54:14 demo-ref-en consul[2532]:            Version: '1.7.3-dev'
May 18 23:54:14 demo-ref-en consul[2532]:            Node ID: '47925609-a608-eff6-f97c-f915c60ce1a5'
May 18 23:54:14 demo-ref-en consul[2532]:          Node name: 'demo-ref-en'
May 18 23:54:14 demo-ref-en consul[2532]:         Datacenter: 'dc1' (Segment: '')
May 18 23:54:14 demo-ref-en consul[2532]:             Server: false (Bootstrap: false)
May 18 23:54:14 demo-ref-en consul[2532]:        Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
May 18 23:54:14 demo-ref-en consul[2532]:       Cluster Addr: 192.168.0.2 (LAN: 8301, WAN: 8302)
May 18 23:54:14 demo-ref-en consul[2532]:            Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
May 18 23:54:14 demo-ref-en consul[2532]: ==> Log data will now stream in as it occurs:
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.068+0300 [DEBUG] agent.tlsutil: Update: version=1
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.069+0300 [INFO]  agent.client.serf.lan: serf: EventMemberJoin: demo-ref-en 192.168.0.2
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.069+0300 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.070+0300 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.070+0300 [INFO]  agent: Started HTTP server: address=127.0.0.1:8500 network=tcp
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.071+0300 [INFO]  agent: started state syncer
May 18 23:54:14 demo-ref-en consul[2532]: ==> Consul agent running!
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.071+0300 [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws gce mdns os packet"
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.071+0300 [INFO]  agent: Joining cluster...: cluster=LAN
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.072+0300 [INFO]  agent: (LAN) joining: lan_addresses=[demo-dir.local]
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.072+0300 [WARN]  agent.client.manager: No servers available
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.072+0300 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.075+0300 [DEBUG] agent.client.memberlist.lan: memberlist: Initiating push/pull sync with: 192.168.0.254:8301
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.077+0300 [INFO]  agent.client.serf.lan: serf: EventMemberJoin: demo-dir 192.168.0.254
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.077+0300 [INFO]  agent: (LAN) joined: number_of_nodes=1
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.077+0300 [DEBUG] agent: systemd notify failed: error="No socket"
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.078+0300 [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.078+0300 [INFO]  agent.client: adding server: server="demo-dir (Addr: tcp/192.168.0.254:8300) (DC: dc1)"
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.120+0300 [DEBUG] agent.client: transitioned out of legacy ACL mode
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.121+0300 [INFO]  agent.client.serf.lan: serf: EventMemberUpdate: demo-ref-en
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.136+0300 [DEBUG] agent.client.serf.lan: serf: messageUserEventType: consul:new-leader
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.336+0300 [DEBUG] agent.client.serf.lan: serf: messageUserEventType: consul:new-leader
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.337+0300 [DEBUG] agent.client.serf.lan: serf: messageJoinType: demo-ref-en
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.457+0300 [DEBUG] agent.tlsutil: OutgoingRPCWrapper: version=1
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.462+0300 [ERROR] agent.client: RPC failed to server: method=Catalog.Register server=192.168.0.254:8300 error="rpc error making call: Permission denied"
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.463+0300 [WARN]  agent: Node info update blocked by ACLs: node=47925609-a608-eff6-f97c-f915c60ce1a5 accessorID=
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.464+0300 [ERROR] agent.client: RPC failed to server: method=Catalog.Register server=192.168.0.254:8300 error="rpc error making call: Permission denied"
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.464+0300 [WARN]  agent: Service registration blocked by ACLs: service=hydra-demo accessorID=
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.465+0300 [DEBUG] agent: Check in sync: check=service:hydra-demo
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.465+0300 [DEBUG] agent: Node info in sync
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.465+0300 [DEBUG] agent: Service in sync: service=hydra-demo
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.465+0300 [DEBUG] agent: Check in sync: check=service:hydra-demo
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.536+0300 [DEBUG] agent.client.serf.lan: serf: messageUserEventType: consul:new-leader
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.537+0300 [DEBUG] agent.client.serf.lan: serf: messageJoinType: demo-ref-en
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.736+0300 [DEBUG] agent.client.serf.lan: serf: messageUserEventType: consul:new-leader
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.736+0300 [DEBUG] agent.client.serf.lan: serf: messageJoinType: demo-ref-en
May 18 23:54:14 demo-ref-en consul[2532]:     2020-05-18T23:54:14.936+0300 [DEBUG] agent.client.serf.lan: serf: messageJoinType: demo-ref-en
May 18 23:54:16 demo-ref-en consul[2532]:     2020-05-18T23:54:16.336+0300 [DEBUG] agent.client.memberlist.lan: memberlist: Stream connection from=192.168.0.254:40820

Server log (partial):

consul    |     2020-05-18T23:53:56.990+0300 [WARN]  agent: Coordinate update blocked by ACLs: accessorID=
consul    |     2020-05-18T23:54:06.252+0300 [DEBUG] agent.server.autopilot: Failed to remove dead servers: error="denied, because removing the majority of servers 1/1 is not safe"
consul    |     2020-05-18T23:54:08.135+0300 [DEBUG] agent.server.router.manager: Rebalanced servers, new active server: number_of_servers=1 active_server="demo-dir.dc1 (Addr: tcp/192.168.0.254:8300) (DC: dc1)"
consul    |     2020-05-18T23:54:14.074+0300 [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=192.168.0.2:55562
consul    |     2020-05-18T23:54:14.076+0300 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: demo-ref-en 192.168.0.2
consul    |     2020-05-18T23:54:14.076+0300 [INFO]  agent.server: member joined, marking health alive: member=demo-ref-en
consul    |     2020-05-18T23:54:14.270+0300 [INFO]  agent.server.serf.lan: serf: EventMemberUpdate: demo-ref-en
consul    |     2020-05-18T23:54:14.270+0300 [DEBUG] agent.server.serf.lan: serf: messageJoinType: demo-ref-en
consul    |     2020-05-18T23:54:14.460+0300 [DEBUG] agent.acl: dropping check from result due to ACLs: check=serfHealth
consul    |     2020-05-18T23:54:14.469+0300 [DEBUG] agent.server.serf.lan: serf: messageJoinType: demo-ref-en
consul    |     2020-05-18T23:54:14.669+0300 [DEBUG] agent.server.serf.lan: serf: messageJoinType: demo-ref-en
consul    |     2020-05-18T23:54:14.869+0300 [DEBUG] agent.server.serf.lan: serf: messageJoinType: demo-ref-en
consul    |     2020-05-18T23:54:15.623+0300 [DEBUG] agent.server: Skipping self join check for node since the cluster is too small: node=demo-dir
consul    |     2020-05-18T23:54:16.252+0300 [DEBUG] agent.server.autopilot: Failed to remove dead servers: error="denied, because removing the majority of servers 1/1 is not safe"
consul    |     2020-05-18T23:54:16.334+0300 [DEBUG] agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: demo-ref-en 192.168.0.2:8301
consul    |     2020-05-18T23:54:25.863+0300 [WARN]  agent: Coordinate update blocked by ACLs: accessorID=
consul    |     2020-05-18T23:54:26.252+0300 [DEBUG] agent.server.autopilot: Failed to remove dead servers: error="denied, because removing the majority of servers 1/1 is not safe"
consul    |     2020-05-18T23:54:36.252+0300 [DEBUG] agent.server.autopilot: Failed to remove dead servers: error="denied, because removing the majority of servers 1/1 is not safe"
consul    |     2020-05-18T23:54:46.252+0300 [DEBUG] agent.server.autopilot: Failed to remove dead servers: error="denied, because removing the majority of servers 1/1 is not safe"
consul    |     2020-05-18T23:54:46.341+0300 [DEBUG] agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: demo-ref-en 192.168.0.2:8301
consul    |     2020-05-18T23:54:46.602+0300 [WARN]  agent: Coordinate update blocked by ACLs: accessorID=
consul    |     2020-05-18T23:54:56.252+0300 [DEBUG] agent.server.autopilot: Failed to remove dead servers: error="denied, because removing the majority of servers 1/1 is not safe"
consul    |     2020-05-18T23:55:04.950+0300 [WARN]  agent: Coordinate update blocked by ACLs: accessorID=
consul    |     2020-05-18T23:55:06.252+0300 [DEBUG] agent.server.autopilot: Failed to remove dead servers: error="denied, because removing the majority of servers 1/1 is not safe"
consul    |     2020-05-18T23:55:06.361+0300 [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=192.168.0.2:55564

Cluster nodes (on server agent):

# consul catalog nodes
Node         ID        Address        DC
demo-dir     6508cb69  192.168.0.254  dc1
demo-ref-en  47925609  192.168.0.2    dc1
@slackpad
Contributor

Hi - you'll need to enable gossip encryption to prevent that - https://learn.hashicorp.com/consul/security-networking/agent-encryption?utm_source=consul.io&utm_medium=docs. That's turned off in the configuration you posted.
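A minimal sketch of that change, to be added to every agent's configuration (server and client). The key is generated once with `consul keygen`; the value below is a placeholder, not a real key:

```json
{
  "encrypt": "REPLACE_WITH_OUTPUT_OF_consul_keygen"
}
```

Once all agents share the key, `consul info` reports `encrypted = true` under `serf_lan`, and an agent without the key cannot complete the gossip handshake, so it cannot join at the Serf level as described in this issue.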

@nixikanius
Author

@slackpad, thank you. It seems I was wrong in thinking that node-level ACLs control the registration of clients. As I understand now, these ACLs only control the registration of additional information about clients (node metadata, tagged addresses).
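Relatedly, the `Node info update blocked by ACLs` and `Service registration blocked by ACLs` warnings in the client log above occur because the client agent has no token. A hedged sketch of the fix, assuming a token has already been created against a suitable node/service policy (the value below is a placeholder):

```json
{
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "agent": "REPLACE_WITH_AGENT_TOKEN"
    }
  }
}
```

With a valid agent token, the `Catalog.Register` RPCs from the anti-entropy sync are authorized instead of returning "Permission denied".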

@preetapan
Member

@nixikanius besides gossip encryption, we also recommend securing agent->agent communication with TLS, more details here - https://learn.hashicorp.com/consul/security-networking/certificates
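A rough sketch of what that looks like on a server agent, assuming certificates created with `consul tls ca create` and `consul tls cert create` (the file paths below are assumptions; adjust to where you store your certificates):

```json
{
  "verify_incoming": true,
  "verify_outgoing": true,
  "verify_server_hostname": true,
  "ca_file": "/consul/tls/consul-agent-ca.pem",
  "cert_file": "/consul/tls/dc1-server-consul-0.pem",
  "key_file": "/consul/tls/dc1-server-consul-0-key.pem"
}
```

This secures the RPC layer (port 8300 here), complementing the gossip encryption that protects the Serf layer.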
