Consul Swarm mode network configuration #66
Comments
@seafoodbuffet Here's what I am doing. Server config:
{
  "advertise_addr": "{{ GetInterfaceIP \"eth2\" }}",
  "addresses": {
    "https": "0.0.0.0"
  },
  "bind_addr": "{{ GetInterfaceIP \"eth2\" }}",
  "check_update_interval": "1m",
  "client_addr": "0.0.0.0",
  "data_dir": "/tmp/consul",
  "datacenter": "docker_dc",
  "disable_host_node_id": true,
  "disable_remote_exec": true,
  "disable_update_check": true,
  "ca_file": "/run/secrets/consul_ca_file.cer",
  "cert_file": "/run/secrets/consul_cert_file.cer",
  "key_file": "/run/secrets/consul_key_file.key",
  "verify_outgoing": true,
  "verify_incoming_https": false,
  "verify_incoming_rpc": true,
  "verify_server_hostname": true,
  "encrypt_verify_incoming": true,
  "encrypt_verify_outgoing": true,
  "http_config": {
    "response_headers": {
      "Access-Control-Allow-Origin": "*"
    }
  },
  "leave_on_terminate": true,
  "retry_interval": "10s",
  "retry_join": [
    "server.consul.swarm.container:8301",
    "server.consul.swarm.container:8301",
    "server.consul.swarm.container:8301"
  ],
  "server_name": "server.docker_dc.consul",
  "skip_leave_on_interrupt": true,
  "bootstrap_expect": 3,
  "node_meta": {
    "instance_type": "Docker container"
  },
  "ports": {
    "https": 8700
  },
  "server": true,
  "ui": true
}
Compose config (this is part of a HashiCorp Vault config, so you'll notice some of that here, as well as some verbiage around a Consul agent which I didn't include in the config here):
---
version: '3.3'

configs:
  consul_server_config:
    file: ./consul/data/server_config.json
  consul_agent_config:
    file: ./consul/data/agent_config.json
  common_config:
    file: ./consul/data/common.json

secrets:
  consul_ca_file.cer:
    file: ./consul/data/certificates/consul-root.cer
  consul_cert_file.cer:
    file: ./consul/data/certificates/consul-server.cer
  consul_key_file.key:
    file: ./consul/data/certificates/consul-server.key
  consul_common_secrets_config.json:
    file: ./consul/data/common_secrets_config.json
  consul_server_secrets_config.json:
    file: ./consul/data/server_secrets_config.json
  consul_agent_secrets_config.json:
    file: ./consul/data/agent_secrets_config.json

networks:
  vault-network:

services:
  consul_server:
    image: consul:0.9.3
    networks:
      vault-network:
        aliases:
          - server.consul.swarm.container
    command: "consul agent -config-dir=/data/config -config-file=/run/secrets/consul_server_secrets_config.json -config-file=/run/secrets/consul_common_secrets_config.json"
    ports:
      - "8700:8700"
    deploy:
      mode: replicated
      replicas: 3
      update_config:
        parallelism: 1
        failure_action: pause
        delay: 10s
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.role == worker
    configs:
      - source: common_config
        target: /data/config/node_swarm_config.json
      - source: consul_server_config
        target: /data/config/config.json
    secrets:
      - consul_ca_file.cer
      - consul_cert_file.cer
      - consul_key_file.key
      - consul_common_secrets_config.json
      - consul_server_secrets_config.json
Then I can simply
I think the trick here is to have the same well-known address in retry_join. It will launch 3 nodes and join them up as expected. Afterwards, I can also scale via
Does this make sense to you? I was also running into the same split-brain issue, and I didn't want to spend the energy creating a seeding swarm service, as that also requires maintenance: bringing it up, bringing it down once a quorum is reached, and then re-joining it to the swarm. Output:
|
I am running into this issue as well. My service configuration is identical to @seafoodbuffet's compose above. @isuftin what is the reason for specifying the same retry-join value three times?
|
@seafoodbuffet I tried your solution with the latest version and the consul swarm wasn't able to elect a leader.
So I tried it myself and got a working, testable compose file inside Play with Docker.
But it doesn't solve the fixed IPs problem you mentioned before. |
So does someone have a robust Consul compose file that will be able to handle containers being potentially rescheduled to a newer host? Because we're also running into this problem of fixed IPs. |
This configuration seems to work for me to spin up consul servers.
Consul Configuration
|
@bhavikkumar does it handle re-elections if the leader node is lost? That has been the major issue I have had with almost all of these configurations |
@Gabology it seems to work fine during my testing. I terminated the leader EC2 instance, which caused a graceful exit. When the ASG brought up another instance and it joined the swarm, the consul server container joined without any issues. I also then ran
|
@bhavikkumar
|
I am also looking for a good configuration which handles reconnections when the quorum is lost. I have a kind of working solution, but it involves hard-coding the hostnames, which isn't ideal. I would like to set a number of replicas and have it sort itself out. Does anyone have any suggestions other than what's listed here? Here are my configs below
This is really not ideal, but it does work. I would prefer to set replicas: X and control it that way. Can anyone think of a workaround? TIA |
@Gabology What is your setup? And how are you terminating nodes? I will try to replicate the problem and see if I can resolve it. |
To make this complete, I have finally got a perfectly working config, which I hope will assist others and save them the nasty headache I have had. This will set up consul on all your manager nodes but will restrict it to exactly one per manager node (which is what I wanted), as I have three managers. You can replace this with replicas without causing a problem. This also uses local volumes for persistent storage, but you can replace them with Portworx or whatever volumes you like, which I will do now that this is working. The key really is the alias on the network together with dnsrr (DNS Round Robin), which took a while to find in the Docker docs. With this combination, each node will find another node to connect to, which fixes the initial connection via DNS. I have tested this thoroughly by rebooting each node, and they recover perfectly. This also has ACL support, so enjoy. HTH someone who is fighting to get a reliable config.
|
@soakes Any particular reason that you are creating the network externally? Anyhow, happy to say that this config works well for us as well. I'm just surprised about the dnsrr endpoint mode, because I thought that only mattered for clients outside of the Docker network that were connecting. |
@Gabology Yes, there is a reason. I have several VPN connections which contain a fair few routes, and sometimes when Docker Swarm creates a network on its own, it collides with a range which is used elsewhere. Sadly, I can't change the other networks as they are not under my control, so the solution is to give specific ranges to Docker so it doesn't happen. Apart from that, there's no reason. With regards to the dnsrr mode, it surprised me a bit too; it was only after testing inside a container that I figured it out. The only thing I don't currently have added is the SSL bits, which I plan to do soon. The config below is my full current configuration, including auto-discovery and several worker nodes configured as consul clients. This runs consul in client mode on anything other than manager nodes. If anyone can think of some other useful tweaks or improvements, please let me know. Thank you.
|
@soakes Does /consul/data have to be mounted? I posted a config earlier which looks extremely similar, but @Gabology could not get it to work, and this is the only difference I can see. |
@bhavikkumar I think the issue for me was that I hadn't set the |
@bhavikkumar I haven't tried without mounting because IMO you want persistent data. However, I can't think of a reason why it won't work without it, as long as the config with the ACL keys etc. is present. It's also very similar to others posted here because I was testing all the configs here trying to find a good solution, so I kept the good parts that I found and added/removed bits to get it perfect. |
Sorry to bother everyone, but while doing some testing I seem to have hit a bug or something. I am having a problem where the DNS is blank, so after looking further I found the API commands to look it up, but they are also showing up blank and I have no idea why. What's interesting is that it can see the services as a list, but that's all; you can't find out any further info. Does anyone have a clue what I've done wrong? My config is above. It can't be an API key issue right now because I'm using the master key for testing. This is also set as an ENV, and this is within the consul server container. I have also confirmed that I can pull info out of the KV store fine, but the DNS/port info seems to be missing, which would explain why I can't get Traefik playing ball correctly. It works with certs using the KV store, but I'm having to assign labels :( There must be some config I'm missing. Anyone got any ideas? I would be really grateful. Thank you. I used the docs here to find out how to test:
TIA |
@askulkarni2 Sorry about the very delayed response in regards to #66 (comment) I supply multiple retry-joins because this way, each node will attempt to retry the same address a few times before considering the current join attempt a failure and moving on to the next attempt or failing out completely. This seems to work for me. |
Also, there is another problem with the interface name if you have multiple networks.
/ # ip route list
default via 172.24.0.1 dev eth3
10.0.0.0/24 dev eth1 scope link src 10.0.0.64
---> 10.111.111.0/24 dev eth2 scope link src 10.111.111.77
10.255.0.0/16 dev eth0 scope link src 10.255.0.207
172.24.0.0/16 dev eth3 scope link src 172.24.0.8
Other 2 containers:
/ # ip route list
default via 172.24.0.1 dev eth3
10.0.0.0/24 dev eth2 scope link src 10.0.0.63
---> 10.111.111.0/24 dev eth1 scope link src 10.111.111.76
10.255.0.0/16 dev eth0 scope link src 10.255.0.206
172.24.0.0/16 dev eth3 scope link src 172.24.0.10
I have different interface names for the same network. So it may be useful to add
if [ -n "$CONSUL_BIND_SUBNET" ]; then
  CONSUL_BIND_INTERFACE=$(ip route list | grep "$CONSUL_BIND_SUBNET" | cut -d' ' -f3)
fi
and provide an extra env:
environment:
  - CONSUL_BIND_SUBNET=10.111.111.0/24 |
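The subnet lookup suggested above can be tried outside a container with a small sketch like the one below. It is only an illustration: `iface_for_subnet` is a made-up helper name, not something shipped in the Consul image, and it reads `ip route list`-style text on stdin rather than calling `ip` itself. It uses awk with an exact match on the first field instead of grep, on the assumption that treating the dots in the subnet as regex wildcards is undesirable. The sample routing tables are the two listed in the comment above, without the `--->` markers.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Consul image): print the interface
# attached to a given subnet, reading `ip route list`-style text on stdin.
iface_for_subnet() {
    # Compare the first field exactly, so the dots in the subnet are not
    # treated as regex wildcards the way a plain grep pattern would.
    awk -v subnet="$1" '$1 == subnet && $2 == "dev" { print $3; exit }'
}

# Sample routing tables taken from the two containers in the comment above.
routes_a='default via 172.24.0.1 dev eth3
10.0.0.0/24 dev eth1 scope link src 10.0.0.64
10.111.111.0/24 dev eth2 scope link src 10.111.111.77
10.255.0.0/16 dev eth0 scope link src 10.255.0.207
172.24.0.0/16 dev eth3 scope link src 172.24.0.8'

routes_b='default via 172.24.0.1 dev eth3
10.0.0.0/24 dev eth2 scope link src 10.0.0.63
10.111.111.0/24 dev eth1 scope link src 10.111.111.76
10.255.0.0/16 dev eth0 scope link src 10.255.0.206
172.24.0.0/16 dev eth3 scope link src 172.24.0.10'

# The same subnet resolves to a different interface name in each container.
printf '%s\n' "$routes_a" | iface_for_subnet 10.111.111.0/24
printf '%s\n' "$routes_b" | iface_for_subnet 10.111.111.0/24
```

Inside a container the same function could presumably be fed live data with `ip route list | iface_for_subnet "$CONSUL_BIND_SUBNET"`, and the result assigned to CONSUL_BIND_INTERFACE.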
Hoo boy I've been scratching my head on this one all day. I'm trying to deploy a Consul KV store for Traefik on my Swarm, and it's being difficult. My compose file:
The service logs appear to indicate that it's flapping: looking at stderr I see this spam: I've tried declaring ports or not, host/ingress combinations, adding config via json, standing on one leg. It's a persistent beast! |
Following @soakes' config, I finally got something that works for me. As he said, the DNS Round Robin and network alias were key. For me, removing the CONSUL_CLIENT_INTERFACE variable and adding -client 0.0.0.0 to the command was the final puzzle piece that sorted it out. |
@soakes I am using the docker file that you posted on 23rd Dec with consul server, client and registrator. It works perfectly. Now suppose I am running a service which needs to communicate to consul, but I want to make sure that it talks to the consul running on the same docker host as the service (regardless of whether consul is running in server or client mode on that host), how can I achieve that? Thanks a lot. |
@aerohit The way I managed to get that working is by using the Gateway IP of a bridge network. This is generally 172.17.0.1 for the default one, but you can check by running
The documentation for this can be found at https://docs.docker.com/network/bridge/ |
@soakes How did you bootstrap |
Ok I found how to generate tokens. Now my issue is:
Here is my docker compose:
---
version: "3.4"

networks:
  consul:
    # external: true

volumes:
  consul:

services:
  server:
    image: consul
    volumes:
      - consul:/consul
      - ./config/consul.multi-node.server.json:/consul/config/consul.json
    ports:
      - target: 8500
        published: 8500
        mode: host
    networks:
      consul:
        aliases:
          - consul.cluster
    environment:
      - CONSUL_BIND_INTERFACE=eth0
      - CONSUL_HTTP_TOKEN=32e3ed4d-93ba-44f9-a444-5a010b512528
    command: "agent -client 0.0.0.0 -config-file /consul/config/consul.json"
    deploy:
      endpoint_mode: dnsrr
      mode: global
      placement:
        constraints: [node.role == manager]
  client:
    image: consul
    volumes:
      - consul:/consul
      - ./config/consul.multi-node.client.json:/consul/config/consul.json
    networks:
      consul:
        aliases:
          - consul.client.cluster
    environment:
      - CONSUL_BIND_INTERFACE=eth0
      - CONSUL_HTTP_TOKEN=32e3ed4d-93ba-44f9-a444-5a010b512528
    command: "agent -client 0.0.0.0 -config-file /consul/config/consul.json"
    deploy:
      endpoint_mode: dnsrr
      mode: global
      placement:
        constraints: [node.role != manager]
  registrator:
    image: gliderlabs/registrator:master
    command: -internal consul://consul.cluster:8500
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock
    networks:
      - consul
    environment:
      - CONSUL_HTTP_TOKEN=32e3ed4d-93ba-44f9-a444-5a010b512528
    deploy:
      mode: global
With consul config server:
{
  "server": true,
  "skip_leave_on_interrupt": true,
  "acl_down_policy": "allow",
  "acl_master_token": "8b8cbb0c-1c88-11e8-accf-0ed5f89f718b",
  "acl_agent_token": "8b8cbf26-1c88-11e8-accf-0ed5f89f718b",
  "acl_datacenter": "dc1",
  "acl_default_policy": "deny",
  "datacenter": "dc1",
  "encrypt": "7jGmTVfQ6WXmUUDVQS2yFQ==",
  "data_dir": "/consul/data",
  "ui": true,
  "bootstrap_expect": 3,
  "retry_join": ["consul.cluster"]
}
And client
|
It's OK. I forgot to update the number of expected servers. |
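As background on why the expected server count matters: with bootstrap_expect set, the servers wait until that many of them have joined before bootstrapping, and afterwards a leader can only be elected while a majority (quorum) of servers is reachable. The sketch below only does that majority arithmetic in shell; the function names `quorum_size` and `failure_tolerance` are invented for illustration and it does not talk to Consul.

```shell
#!/bin/sh
# Hypothetical helpers: majority arithmetic for an n-server cluster.

# Smallest majority of n servers: floor(n / 2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# How many servers can be lost while a majority still remains.
failure_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the numbers for a few common cluster sizes.
for n in 1 3 5; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerates=$(failure_tolerance "$n")"
done
```

For the three-server setups discussed in this thread, that gives a quorum of two, so one server can be rescheduled or lost without losing the leader.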
For the record, here is a simple stack for a 3-server Docker Swarm that worked for me:
|
@dperetti As an aside, that's the first time I've seen someone use YAML merging in a compose file. Fascinating |
The Stackfile by @soakes works great in a Docker Swarm cluster. Thanks! |
Is it possible with this setup to have consul agents outside the swarm connecting to the consul servers in the swarm? |
With the help of this post and other posts that I have found on the internet, I would like to share my solution to the various problems that I have encountered. Maybe this can help some people. I have tested this by deploying it in a stack, having created a network called consul beforehand:
|
@nicholasamorim Yes, that's possible, and that's what I've done. I've got consul client agents outside the swarm, one per host, and then a recommended number of consul server agents in the swarm |
I'm using the official consul docker image version 0.9.3 with the following compose file:
This appears to work okay for me in a Docker Swarm with 3 nodes when deploying this as a stack using docker stack deploy
My question is this: without the -retry-joins the cluster can't bootstrap. Per the bootstrap documentation I believe this is expected to prevent split-brain, etc. So what's the best way to bootstrap a consul cluster running as a Docker Swarm Service?
I am only able to make the cluster bootstrap after having added the -retry-join statements to the container command. It seems non-ideal to have to specify these hard-coded IPs. For example, what if another container started up in the consul network first? Presumably it would interfere with the resulting IPs of the consul server containers. Is there a recommendation on how to deal with this? The only other thing I can think of would be this: