[BUG] Example Docker Compose fails in Docker Swarm #113
I neglected to mention I changed all references of
The nodes are failing to form a cluster:
I haven't used Docker Swarm, but a brief investigation shows multiple configuration options, such as scaling or secure communications, that I think could possibly conflict with the OpenSearch cluster model. Can you give a few more details about your configuration, compose file, etc.? It seems like we're trying multiple different ways of running multiple containers and having them talk to each other.
Here's a slightly modified compose file; I went right back to basics and made another discovery:

```yml
---
version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:2.5.0
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size; recommend setting both to 50% of system RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user; set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      # - 9600:9600 # required for Performance Analyzer
    networks:
      - opensearch-net
  opensearch-node2:
    image: opensearchproject/opensearch:2.5.0
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net
  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2.5.0
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    environment:
      OPENSEARCH_HOSTS: '["https://opensearch-node1:9200","https://opensearch-node2:9200"]' # must be a string with no spaces when specified as an environment variable
    networks:
      - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:
```

I have discovered something exciting. If I don't bind the ports on the
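One avenue worth trying for the port-binding discovery above: Docker's long port syntax supports `mode: host`, which publishes a port directly on the node running the task and keeps it out of Swarm's ingress routing mesh, while an attachable overlay network carries node-to-node traffic. This is an untested sketch, not a confirmed fix; the service and network names are the ones from the compose file above:

```yml
# Sketch only: bypass the ingress mesh for the REST port and use a
# dedicated overlay network for inter-node communication.
services:
  opensearch-node1:
    ports:
      - target: 9200
        published: 9200
        protocol: tcp
        mode: host   # bind on the host running the task, not via ingress
networks:
  opensearch-net:
    driver: overlay
    attachable: true
```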
I've been experimenting to see if it is docker networking that is causing the issue, but it isn't, as other services in swarm mode can map ports perfectly well and can communicate with other containers on multiple swarm networks. It's definitely something in the configuration of opensearch. I will do a little more investigating today, but I know little about how this all works.
It seems that the first node joins the docker swarm ingress network, while any others always join the service network. I have no idea what to do here. For reference: https://stackoverflow.com/questions/70141442/elasticsearch-cluster-doesnt-work-on-docker-swarm
After some experimentation, here are my findings:
This ensures that the instances communicate on the same networks and therefore can see each other. This is highly unreliable, of course, and I'm wondering if there may be a better way of discovering the right network and port configuration inside the containers.
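One possibility for pinning down the "right" address inside a container is to set the address each node advertises to its peers explicitly, so discovery doesn't pick an interface on the wrong overlay network. This is a sketch only; it assumes the OpenSearch image maps dotted environment variables like `network.publish_host` to node settings (an Elasticsearch-inherited convention), which has not been verified against this setup:

```yml
# Sketch: make each node advertise a predictable, DNS-resolvable address.
services:
  opensearch-node1:
    environment:
      - network.bind_host=0.0.0.0
      - network.publish_host=opensearch-node1  # resolves via the service's DNS name
```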
@bbarani can you please help on this? |
Hi @designermonkey, we only test the docker-compose file for a simple setup. If you are doing a more complicated setup, you can try our Helm charts repo, which deploys to Kube and has been actively contributed to and tested by the community. Adding @jeffh-aws to take a look at the possible options with docker swarm.
@designermonkey I was having fun getting OpenSearch to work on Docker Swarm today. Your tip about setting the

I also ran into issues when I had set the memory limit too low, and when my host VMs had

I have 2 of my instances connected. The third is hitting some weird Java exception errors that seem to only happen on that specific worker node... A topic for a different location though.

@peterzhuamazon It would be awesome to get more support for Docker Swarm out there. I get that the big companies all use Kube, but Kube is not exactly friendly for smaller organizations. I almost went into a whole spiel on this topic, but this really isn't the place for it. If anyone is interested, feel free to email me. :)
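On the memory-limit issues mentioned above: two host-level settings commonly bite OpenSearch in containers. The documented `vm.max_map_count` minimum of 262144 must be set on the host itself, and `docker stack deploy` may not honor the `ulimits` section of a compose file on older Docker versions, so the memlock limit may silently not apply. A host configuration fragment for Linux swarm nodes (illustrative, to be run on each node):

```shell
# Raise the mmap count OpenSearch requires (documented minimum: 262144)
sudo sysctl -w vm.max_map_count=262144

# Persist the setting across reboots
echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-opensearch.conf
```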
Let's move this to opensearch-devops.
I'll throw my hand in here; I too am getting this same issue. I made a post on the forum about it with my compose file, as well as log outputs. https://forum.opensearch.org/t/multi-node-docker-setup-not-working/15235/3
Looping in @pallavipr and @bbarani for comments on supporting docker swarm. Thanks.
I suppose there hasn't been any movement here? I'm seeing the exact same issue with docker compose, locally, so I don't think it's related to swarm, nor is it fixed, at least. My config:

```yml
services:
  opensearch-node1:
    image: opensearchproject/opensearch:latest
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      # - plugins.security.disabled=true
      # - cluster.routing.allocation.enable=all
      - 'DISABLE_INSTALL_DEMO_CONFIG=true'
      - 'DISABLE_SECURITY_PLUGIN=true'
      - 'OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m'
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=teSt!1
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200 # REST API
      - 9600:9600 # Performance Analyzer
    networks:
      - opensearch-net
      - otel-net
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
  opensearch-node2:
    image: opensearchproject/opensearch:latest
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      # - plugins.security.disabled=true
      # - cluster.routing.allocation.enable=all
      - 'DISABLE_INSTALL_DEMO_CONFIG=true'
      - 'DISABLE_SECURITY_PLUGIN=true'
      - 'OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m'
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=teSt!1
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
  # opensearch-dashboard:
  #   image: opensearchproject/opensearch-dashboards:latest
  #   ports:
  #     - 5601:5601
  #   expose:
  #     - '5601'
  #   environment:
  #     DISABLE_SECURITY_DASHBOARDS_PLUGIN: 'true'
  #     OPENSEARCH_HOSTS: '["http://opensearch-node1:9200","http://opensearch-node2:9200"]'
  #   networks:
  #     - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:
```
Seeing errors such as:
Describe the bug
I have proven that I can get the example docker compose file to work locally, yet when I try the exact same file in docker swarm mode, it will not bring the cluster up.
The first node always tries to connect to itself and fails.
To Reproduce
Steps to reproduce the behavior:
```shell
docker stack deploy --prune --with-registry-auth --compose-file docker-compose.yml
```
This continues forever.
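A quick way to check whether the nodes ever actually formed a cluster is to compare the node count reported by the `_cluster/health` endpoint against the expected cluster size. A minimal sketch; the sample response below is illustrative, not captured from this setup:

```python
import json

def cluster_formed(health_json: str, expected_nodes: int) -> bool:
    """True when _cluster/health reports the expected node count and a usable status."""
    health = json.loads(health_json)
    return (health.get("number_of_nodes") == expected_nodes
            and health.get("status") in ("green", "yellow"))

# Illustrative response; in practice fetch it with
#   curl -s http://localhost:9200/_cluster/health
sample = '{"cluster_name":"opensearch-cluster","status":"green","number_of_nodes":2}'
print(cluster_formed(sample, expected_nodes=2))  # True
```

If only one node ever reports in, discovery never crossed the swarm network boundary, which matches the behavior described above.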
Expected behavior
I would expect the nodes to join the cluster and elect a manager node exactly as they are capable of doing under docker compose.
Plugins
Nothing but default.
Host/Environment (please complete the following information):
Additional context
As far as I can tell, there is nothing wrong with the docker networking.