
[BUG] Example Docker Compose fails in Docker Swarm #113

Open
designermonkey opened this issue Feb 16, 2023 · 14 comments
Labels: bug (Something isn't working)

Comments

@designermonkey

Describe the bug

I have proven that I can get the example docker compose file to work locally, yet when I deploy the exact same file in Docker Swarm mode, it will not bring the cluster up.

The first node always tries to connect to itself and fails.

To Reproduce
Steps to reproduce the behavior:

  1. Use the example compose file
  2. Run docker stack deploy --prune --with-registry-auth --compose-file docker-compose.yml
  3. View the docker service logs for the first node
  4. See error:
Enabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** https://github.com/opensearch-project/security/issues/1755           **
**************************************************************************
OpenSearch Security Demo Installer
 ** Warning: Do not use on production or public reachable systems **
Basedir: /usr/share/opensearch
OpenSearch install type: rpm/deb on NAME="Amazon Linux"
OpenSearch config dir: /usr/share/opensearch/config
OpenSearch config file: /usr/share/opensearch/config/opensearch.yml
OpenSearch bin dir: /usr/share/opensearch/bin
OpenSearch plugins dir: /usr/share/opensearch/plugins
OpenSearch lib dir: /usr/share/opensearch/lib
Detected OpenSearch Version: x-content-2.5.0
Detected OpenSearch Security Version: 2.5.0.0

### Success
### Execute this script now on all your nodes and then start all nodes
### OpenSearch Security will be automatically initialized.
### If you like to change the runtime configuration 
### change the files in ../../../config/opensearch-security and execute: 
"/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh" -cd "/usr/share/opensearch/config/opensearch-security" -icl -key "/usr/share/opensearch/config/kirk-key.pem" -cert "/usr/share/opensearch/config/kirk.pem" -cacert "/usr/share/opensearch/config/root-ca.pem" -nhnv
### or run ./securityadmin_demo.sh
### To use the Security Plugin ConfigurationGUI
### To access your secured cluster open https://<hostname>:<HTTP port> and log in with admin/admin.
### (Ignore the SSL certificate warning because we installed self-signed demo certificates)
Enabling OpenSearch Security Plugin
Enabling execution of OPENSEARCH_HOME/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
[2023-02-16T16:23:42,004][INFO ][o.o.n.Node               ] [node01] version[2.5.0], pid[103], build[tar/b8a8b6c4d7fc7a7e32eb2cb68ecad8057a4636ad/2023-01-18T23:49:00.584806002Z], OS[Linux/5.10.104-linuxkit/aarch64], JVM[Eclipse Adoptium/OpenJDK 64-Bit Server VM/17.0.5/17.0.5+8]
[2023-02-16T16:23:42,005][INFO ][o.o.n.Node               ] [node01] JVM home [/usr/share/opensearch/jdk], using bundled JDK [true]
[2023-02-16T16:23:42,005][INFO ][o.o.n.Node               ] [node01] JVM arguments [-Xshare:auto, -Dopensearch.networkaddress.cache.ttl=60, -Dopensearch.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms1g, -Xmx1g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/tmp/opensearch-7599599729676416352, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Dclk.tck=100, -Djdk.attach.allowAttachSelf=true, -Djava.security.policy=/usr/share/opensearch/config/opensearch-performance-analyzer/opensearch_security.policy, --add-opens=jdk.attach/sun.tools.attach=ALL-UNNAMED, -Dopensearch.cgroups.hierarchy.override=/, -Xms2048m, -Xmx2048m, -XX:MaxDirectMemorySize=1073741824, -Dopensearch.path.home=/usr/share/opensearch, -Dopensearch.path.conf=/usr/share/opensearch/config, -Dopensearch.distribution.type=tar, -Dopensearch.bundled_jdk=true]
[2023-02-16T16:23:43,095][WARN ][stderr                   ] [node01] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
[2023-02-16T16:23:43,095][WARN ][stderr                   ] [node01] SLF4J: Defaulting to no-operation (NOP) logger implementation
[2023-02-16T16:23:43,095][WARN ][stderr                   ] [node01] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[2023-02-16T16:23:43,109][INFO ][o.o.s.s.t.SSLConfig      ] [node01] SSL dual mode is disabled
[2023-02-16T16:23:43,109][INFO ][o.o.s.OpenSearchSecurityPlugin] [node01] OpenSearch Config path is /usr/share/opensearch/config
[2023-02-16T16:23:43,648][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node01] JVM supports TLSv1.3
[2023-02-16T16:23:43,649][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node01] Config directory is /usr/share/opensearch/config/, from there the key- and truststore files are resolved relatively
[2023-02-16T16:23:44,113][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node01] TLS Transport Client Provider : JDK
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.OpenSearch (file:/usr/share/opensearch/lib/opensearch-2.5.0.jar)
WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.OpenSearch
WARNING: System::setSecurityManager will be removed in a future release
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.Security (file:/usr/share/opensearch/lib/opensearch-2.5.0.jar)
WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.Security
WARNING: System::setSecurityManager will be removed in a future release
[2023-02-16T16:23:44,113][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node01] TLS Transport Server Provider : JDK
[2023-02-16T16:23:44,113][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node01] TLS HTTP Provider             : JDK
[2023-02-16T16:23:44,114][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node01] Enabled TLS protocols for transport layer : [TLSv1.3, TLSv1.2]
[2023-02-16T16:23:44,114][INFO ][o.o.s.s.DefaultSecurityKeyStore] [node01] Enabled TLS protocols for HTTP layer      : [TLSv1.3, TLSv1.2]
[2023-02-16T16:23:44,119][INFO ][o.o.s.OpenSearchSecurityPlugin] [node01] Clustername: cluster
[2023-02-16T16:23:44,123][WARN ][o.o.s.OpenSearchSecurityPlugin] [node01] Directory /usr/share/opensearch/config has insecure file permissions (should be 0700)
[2023-02-16T16:23:44,123][WARN ][o.o.s.OpenSearchSecurityPlugin] [node01] File /usr/share/opensearch/config/esnode-key.pem has insecure file permissions (should be 0600)
[2023-02-16T16:23:44,123][WARN ][o.o.s.OpenSearchSecurityPlugin] [node01] File /usr/share/opensearch/config/kirk.pem has insecure file permissions (should be 0600)
[2023-02-16T16:23:44,123][WARN ][o.o.s.OpenSearchSecurityPlugin] [node01] File /usr/share/opensearch/config/root-ca.pem has insecure file permissions (should be 0600)
[2023-02-16T16:23:44,124][WARN ][o.o.s.OpenSearchSecurityPlugin] [node01] File /usr/share/opensearch/config/esnode.pem has insecure file permissions (should be 0600)
[2023-02-16T16:23:44,124][WARN ][o.o.s.OpenSearchSecurityPlugin] [node01] File /usr/share/opensearch/config/kirk-key.pem has insecure file permissions (should be 0600)
[2023-02-16T16:23:44,453][INFO ][o.o.p.c.PluginSettings   ] [node01] Config: metricsLocation: /dev/shm/performanceanalyzer/, metricsDeletionInterval: 1, httpsEnabled: false, cleanup-metrics-db-files: true, batch-metrics-retention-period-minutes: 7, rpc-port: 9650, webservice-port 9600
[2023-02-16T16:23:44,761][INFO ][o.o.i.r.ReindexPlugin    ] [node01] ReindexPlugin reloadSPI called
[2023-02-16T16:23:44,761][INFO ][o.o.i.r.ReindexPlugin    ] [node01] Unable to find any implementation for RemoteReindexExtension
[2023-02-16T16:23:44,786][INFO ][o.o.j.JobSchedulerPlugin ] [node01] Loaded scheduler extension: reports-scheduler, index: .opendistro-reports-definitions
[2023-02-16T16:23:44,788][INFO ][o.o.j.JobSchedulerPlugin ] [node01] Loaded scheduler extension: opendistro_anomaly_detector, index: .opendistro-anomaly-detector-jobs
[2023-02-16T16:23:44,789][INFO ][o.o.j.JobSchedulerPlugin ] [node01] Loaded scheduler extension: opendistro-index-management, index: .opendistro-ism-config
[2023-02-16T16:23:44,798][INFO ][o.o.j.JobSchedulerPlugin ] [node01] Loaded scheduler extension: observability, index: .opensearch-observability-job
[2023-02-16T16:23:44,801][INFO ][o.o.p.PluginsService     ] [node01] loaded module [aggs-matrix-stats]
[2023-02-16T16:23:44,801][INFO ][o.o.p.PluginsService     ] [node01] loaded module [analysis-common]
[2023-02-16T16:23:44,801][INFO ][o.o.p.PluginsService     ] [node01] loaded module [geo]
[2023-02-16T16:23:44,801][INFO ][o.o.p.PluginsService     ] [node01] loaded module [ingest-common]
[2023-02-16T16:23:44,801][INFO ][o.o.p.PluginsService     ] [node01] loaded module [ingest-geoip]
[2023-02-16T16:23:44,801][INFO ][o.o.p.PluginsService     ] [node01] loaded module [ingest-user-agent]
[2023-02-16T16:23:44,801][INFO ][o.o.p.PluginsService     ] [node01] loaded module [lang-expression]
[2023-02-16T16:23:44,801][INFO ][o.o.p.PluginsService     ] [node01] loaded module [lang-mustache]
[2023-02-16T16:23:44,801][INFO ][o.o.p.PluginsService     ] [node01] loaded module [lang-painless]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded module [mapper-extras]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded module [opensearch-dashboards]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded module [parent-join]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded module [percolator]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded module [rank-eval]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded module [reindex]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded module [repository-url]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded module [systemd]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded module [transport-netty4]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-alerting]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-anomaly-detection]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-asynchronous-search]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-cross-cluster-replication]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-geospatial]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-index-management]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-job-scheduler]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-knn]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-ml]
[2023-02-16T16:23:44,802][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-neural-search]
[2023-02-16T16:23:44,803][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-notifications]
[2023-02-16T16:23:44,803][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-notifications-core]
[2023-02-16T16:23:44,803][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-observability]
[2023-02-16T16:23:44,803][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-performance-analyzer]
[2023-02-16T16:23:44,803][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-reports-scheduler]
[2023-02-16T16:23:44,803][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-security]
[2023-02-16T16:23:44,803][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-security-analytics]
[2023-02-16T16:23:44,803][INFO ][o.o.p.PluginsService     ] [node01] loaded plugin [opensearch-sql]
[2023-02-16T16:23:44,811][INFO ][o.o.s.OpenSearchSecurityPlugin] [node01] Disabled https compression by default to mitigate BREACH attacks. You can enable it by setting 'http.compression: true' in opensearch.yml
[2023-02-16T16:23:44,832][DEPRECATION][o.o.d.c.s.Settings       ] [node01] [node.max_local_storage_nodes] setting was deprecated in OpenSearch and will be removed in a future release! See the breaking changes documentation for the next major version.
[2023-02-16T16:23:44,839][INFO ][o.o.e.NodeEnvironment    ] [node01] using [1] data paths, mounts [[/usr/share/opensearch/data (virtiofs0)]], net usable_space [240gb], net total_space [460.4gb], types [virtiofs]
[2023-02-16T16:23:44,839][INFO ][o.o.e.NodeEnvironment    ] [node01] heap size [2gb], compressed ordinary object pointers [true]
[2023-02-16T16:23:44,953][INFO ][o.o.n.Node               ] [node01] node name [node01], node ID [Bqk8Khh8R5GjoDQaF-C-Cg], cluster name [cluster], roles [ingest, remote_cluster_client, data, cluster_manager]
[2023-02-16T16:23:47,055][WARN ][o.o.s.c.Salt             ] [node01] If you plan to use field masking pls configure compliance salt e1ukloTsQlOgPquJ to be a random string of 16 chars length identical on all nodes
[2023-02-16T16:23:47,076][INFO ][o.o.s.a.i.AuditLogImpl   ] [node01] Message routing enabled: true
[2023-02-16T16:23:47,098][INFO ][o.o.s.f.SecurityFilter   ] [node01] <NONE> indices are made immutable.
[2023-02-16T16:23:47,289][INFO ][o.o.a.b.ADCircuitBreakerService] [node01] Registered memory breaker.
[2023-02-16T16:23:47,498][INFO ][o.o.m.b.MLCircuitBreakerService] [node01] Registered ML memory breaker.
[2023-02-16T16:23:47,499][INFO ][o.o.m.b.MLCircuitBreakerService] [node01] Registered ML disk breaker.
[2023-02-16T16:23:47,499][INFO ][o.o.m.b.MLCircuitBreakerService] [node01] Registered ML native memory breaker.
[2023-02-16T16:23:47,574][INFO ][o.r.Reflections          ] [node01] Reflections took 31 ms to scan 1 urls, producing 12 keys and 32 values 
[2023-02-16T16:23:48,095][INFO ][o.o.t.NettyAllocator     ] [node01] creating NettyAllocator with the following configs: [name=opensearch_configured, chunk_size=256kb, suggested_max_allocation_size=256kb, factors={opensearch.unsafe.use_netty_default_chunk_and_page_size=false, g1gc_enabled=true, g1gc_region_size=1mb}]
[2023-02-16T16:23:48,133][INFO ][o.o.d.DiscoveryModule    ] [node01] using discovery type [zen] and seed hosts providers [settings]
[2023-02-16T16:23:48,416][WARN ][o.o.g.DanglingIndicesState] [node01] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
[2023-02-16T16:23:48,663][INFO ][o.o.p.h.c.PerformanceAnalyzerConfigAction] [node01] PerformanceAnalyzer Enabled: false
[2023-02-16T16:23:48,681][INFO ][o.o.n.Node               ] [node01] initialized
[2023-02-16T16:23:48,681][INFO ][o.o.n.Node               ] [node01] starting ...
[2023-02-16T16:23:48,823][INFO ][o.o.t.TransportService   ] [node01] publish_address {10.0.0.212:9300}, bound_addresses {0.0.0.0:9300}
[2023-02-16T16:23:48,981][INFO ][o.o.b.BootstrapChecks    ] [node01] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2023-02-16T16:23:48,984][INFO ][o.o.c.c.Coordinator      ] [node01] cluster UUID [b0uSEvAVTFSEd_HDG0NTjw]
[2023-02-16T16:23:58,998][WARN ][o.o.c.c.ClusterFormationFailureHelper] [node01] cluster-manager not discovered or elected yet, an election requires a node with id [enQl9djoRA24TYJYOUGMnw], have discovered [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, {node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true}] which is a quorum; discovery will continue using [10.0.24.2:9300, 10.0.24.5:9300] from hosts providers and [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 70, last-accepted version 18 in term 68
[2023-02-16T16:24:09,013][WARN ][o.o.c.c.ClusterFormationFailureHelper] [node01] cluster-manager not discovered or elected yet, an election requires a node with id [enQl9djoRA24TYJYOUGMnw], have discovered [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, {node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true}] which is a quorum; discovery will continue using [10.0.24.2:9300, 10.0.24.5:9300] from hosts providers and [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 70, last-accepted version 18 in term 68
[2023-02-16T16:24:18,996][WARN ][o.o.n.Node               ] [node01] timed out while waiting for initial discovery state - timeout: 30s
[2023-02-16T16:24:19,010][INFO ][o.o.h.AbstractHttpServerTransport] [node01] publish_address {10.0.0.212:9200}, bound_addresses {0.0.0.0:9200}
[2023-02-16T16:24:19,010][INFO ][o.o.n.Node               ] [node01] started
[2023-02-16T16:24:19,011][INFO ][o.o.s.OpenSearchSecurityPlugin] [node01] Node started
[2023-02-16T16:24:19,011][INFO ][o.o.s.c.ConfigurationRepository] [node01] Will attempt to create index .opendistro_security and default configs if they are absent
[2023-02-16T16:24:19,012][INFO ][o.o.s.OpenSearchSecurityPlugin] [node01] 0 OpenSearch Security modules loaded so far: []
[2023-02-16T16:24:19,013][INFO ][o.o.s.c.ConfigurationRepository] [node01] Background init thread started. Install default config?: true
[2023-02-16T16:24:19,018][WARN ][o.o.c.c.ClusterFormationFailureHelper] [node01] cluster-manager not discovered or elected yet, an election requires a node with id [enQl9djoRA24TYJYOUGMnw], have discovered [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, {node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true}] which is a quorum; discovery will continue using [10.0.24.2:9300, 10.0.24.5:9300] from hosts providers and [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 70, last-accepted version 18 in term 68
[2023-02-16T16:24:20,416][INFO ][o.o.c.c.JoinHelper       ] [node01] failed to join {node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, minimumTerm=70, optionalJoin=Optional[Join{term=70, lastAcceptedTerm=68, lastAcceptedVersion=18, sourceNode={node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, targetNode={node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true}}]}
org.opensearch.transport.RemoteTransportException: [node02][10.0.24.6:9300][internal:cluster/coordination/join]
Caused by: org.opensearch.transport.ConnectTransportException: [node01][10.0.0.212:9300] connect_timeout[30s]
        at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:1082) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) ~[opensearch-2.5.0.jar:2.5.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-02-16T16:24:20,429][INFO ][o.o.c.c.JoinHelper       ] [node01] failed to join {node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, minimumTerm=70, optionalJoin=Optional[Join{term=70, lastAcceptedTerm=68, lastAcceptedVersion=18, sourceNode={node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, targetNode={node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true}}]}
org.opensearch.transport.RemoteTransportException: [node02][10.0.24.6:9300][internal:cluster/coordination/join]
Caused by: org.opensearch.transport.ConnectTransportException: [node01][10.0.0.212:9300] connect_timeout[30s]
        at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:1082) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) ~[opensearch-2.5.0.jar:2.5.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-02-16T16:24:29,023][WARN ][o.o.c.c.JoinHelper       ] [node01] last failed join attempt was 8.5s ago, failed to join {node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, minimumTerm=70, optionalJoin=Optional[Join{term=70, lastAcceptedTerm=68, lastAcceptedVersion=18, sourceNode={node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, targetNode={node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true}}]}
org.opensearch.transport.RemoteTransportException: [node02][10.0.24.6:9300][internal:cluster/coordination/join]
Caused by: org.opensearch.transport.ConnectTransportException: [node01][10.0.0.212:9300] connect_timeout[30s]
        at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:1082) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) ~[opensearch-2.5.0.jar:2.5.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-02-16T16:24:29,025][WARN ][o.o.c.c.ClusterFormationFailureHelper] [node01] cluster-manager not discovered or elected yet, an election requires a node with id [enQl9djoRA24TYJYOUGMnw], have discovered [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, {node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true}] which is a quorum; discovery will continue using [10.0.24.2:9300, 10.0.24.5:9300] from hosts providers and [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 70, last-accepted version 18 in term 68
[2023-02-16T16:24:39,034][WARN ][o.o.c.c.ClusterFormationFailureHelper] [node01] cluster-manager not discovered or elected yet, an election requires a node with id [enQl9djoRA24TYJYOUGMnw], have discovered [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, {node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true}] which is a quorum; discovery will continue using [10.0.24.2:9300, 10.0.24.5:9300] from hosts providers and [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 70, last-accepted version 18 in term 68
[2023-02-16T16:24:49,025][ERROR][o.o.s.c.ConfigurationRepository] [node01] Cannot apply default config (this is maybe not an error!)
org.opensearch.discovery.ClusterManagerNotDiscoveredException: null
        at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction$2.onTimeout(TransportClusterManagerNodeAction.java:348) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:394) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:294) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:707) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) ~[opensearch-2.5.0.jar:2.5.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-02-16T16:24:49,032][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [node01] Exception while retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
org.opensearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
        at org.opensearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:205) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:191) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:81) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:58) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218) [opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118) [opensearch-index-management-2.5.0.0.jar:2.5.0.0]
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:232) [opensearch-security-2.5.0.0.jar:2.5.0.0]
        at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:149) [opensearch-security-2.5.0.0.jar:2.5.0.0]
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:78) [opensearch-performance-analyzer-2.5.0.0.jar:2.5.0.0]
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188) [opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.action.support.TransportAction.execute(TransportAction.java:107) [opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110) [opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97) [opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:461) [opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.client.support.AbstractClient.multiGet(AbstractClient.java:577) [opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.loadAsync(ConfigurationLoaderSecurity7.java:208) [opensearch-security-2.5.0.0.jar:2.5.0.0]
        at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.load(ConfigurationLoaderSecurity7.java:99) [opensearch-security-2.5.0.0.jar:2.5.0.0]
        at org.opensearch.security.configuration.ConfigurationRepository.getConfigurationsFromIndex(ConfigurationRepository.java:372) [opensearch-security-2.5.0.0.jar:2.5.0.0]
        at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:318) [opensearch-security-2.5.0.0.jar:2.5.0.0]
        at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:303) [opensearch-security-2.5.0.0.jar:2.5.0.0]
        at org.opensearch.security.configuration.ConfigurationRepository$1.run(ConfigurationRepository.java:163) [opensearch-security-2.5.0.0.jar:2.5.0.0]
        at java.lang.Thread.run(Thread.java:833) [?:?]
[2023-02-16T16:24:49,035][WARN ][o.o.c.c.ClusterFormationFailureHelper] [node01] cluster-manager not discovered or elected yet, an election requires a node with id [enQl9djoRA24TYJYOUGMnw], have discovered [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, {node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true}] which is a quorum; discovery will continue using [10.0.24.2:9300, 10.0.24.5:9300] from hosts providers and [{node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 70, last-accepted version 18 in term 68
[2023-02-16T16:24:51,126][INFO ][o.o.c.c.JoinHelper       ] [node01] failed to join {node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, minimumTerm=70, optionalJoin=Optional[Join{term=70, lastAcceptedTerm=68, lastAcceptedVersion=18, sourceNode={node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, targetNode={node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true}}]}
org.opensearch.transport.RemoteTransportException: [node02][10.0.24.6:9300][internal:cluster/coordination/join]
Caused by: org.opensearch.transport.ConnectTransportException: [node01][10.0.0.212:9300] connect_exception
        at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1076) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:215) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:55) ~[opensearch-core-2.5.0.jar:2.5.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
        at org.opensearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:70) ~[opensearch-core-2.5.0.jar:2.5.0]
        at org.opensearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:81) ~[transport-netty4-client-2.5.0.jar:2.5.0]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:262) ~[netty-transport-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) [netty-transport-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.86.Final.jar:4.1.86.Final]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: connection timed out: 10.0.0.212/10.0.0.212:9300
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[netty-transport-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) [netty-transport-4.1.86.Final.jar:4.1.86.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]

This continues forever.

Expected behavior
I would expect the node to join the cluster and elect a manager node, exactly as it does under docker compose.

Plugins
Nothing but default.

Host/Environment (please complete the following information):

  • OS: macOS Monterey
  • Version 12.6.3

Additional context

As far as I can tell, there is nothing wrong with the docker networking.

@designermonkey designermonkey added the bug and untriaged labels Feb 16, 2023
@designermonkey (Author)

I neglected to mention that I changed all references of opensearch-node1 to node01 and all references of opensearch-node2 to node02.

@dbwiddis (Member)

The nodes are failing to form a cluster:

[2023-02-16T16:24:20,416][INFO ][o.o.c.c.JoinHelper       ] [node01] failed to join {node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, minimumTerm=70, optionalJoin=Optional[Join{term=70, lastAcceptedTerm=68, lastAcceptedVersion=18, sourceNode={node01}{Bqk8Khh8R5GjoDQaF-C-Cg}{brqDhviuRpKIrMGzzWX7Xw}{10.0.0.212}{10.0.0.212:9300}{dimr}{shard_indexing_pressure_enabled=true}, targetNode={node02}{enQl9djoRA24TYJYOUGMnw}{L3OlmhvHRY2Y1HxH_wibhQ}{10.0.24.6}{10.0.24.6:9300}{dimr}{shard_indexing_pressure_enabled=true}}]}
org.opensearch.transport.RemoteTransportException: [node02][10.0.24.6:9300][internal:cluster/coordination/join]
Caused by: org.opensearch.transport.ConnectTransportException: [node01][10.0.0.212:9300] connect_timeout[30s]
        at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:1082) ~[opensearch-2.5.0.jar:2.5.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) ~[opensearch-2.5.0.jar:2.5.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]

I haven't used Docker Swarm, but a brief investigation shows multiple configuration options, such as scaling and secure communications, that could conflict with the OpenSearch cluster model. Can you give a few more details about your configuration, compose file, etc.? It seems like we're trying multiple different ways of running multiple containers and having them talk to each other.

@designermonkey (Author)

Here's a slightly modified compose file; I went right back to basics and made another discovery:

---
version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:2.5.0
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
    #   - 9600:9600 # required for Performance Analyzer
    networks:
      - opensearch-net
  opensearch-node2:
    image: opensearchproject/opensearch:2.5.0
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net
  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2.5.0
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    environment:
      OPENSEARCH_HOSTS: '["https://opensearch-node1:9200","https://opensearch-node2:9200"]' # must be a string with no spaces when specified as an environment variable
    networks:
      - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:

I have discovered something exciting. If I don't bind the ports on opensearch-node1, everything works fine, but with the port binding in place the node is never able to join the cluster.
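
If the ingress routing mesh is what drags the published service onto the extra network, one possible workaround (a sketch, untested here) is to publish the port in host mode via the long port syntax, which should keep the task off the ingress network:

    ports:
      - target: 9200      # port inside the container
        published: 9200   # port exposed on the swarm node itself
        protocol: tcp
        mode: host        # bypass the ingress routing mesh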

@designermonkey (Author)

I've been experimenting to see whether Docker networking is causing the issue, but it isn't: other services in swarm mode can map ports perfectly well and can communicate with other containers across multiple swarm networks.

It's definitely something in the configuration of OpenSearch. I will do a little more investigating today, but I know little about how this all works.

@designermonkey (Author) commented Feb 18, 2023

It seems that the first node joins the docker swarm ingress network, while any others always join the service network. I have no idea what to do here.

For reference" https://stackoverflow.com/questions/70141442/elasticsearch-cluster-doesnt-work-on-docker-swarm

@designermonkey (Author)

After some experimentation, here are my findings:

  1. If a docker port is mapped, then network.publish_host: _eth1_ must be added.
  2. If no docker port is mapped, then network.publish_host: _eth0_ must be added.

This ensures that the instances communicate on the same networks and can therefore see each other. This is highly unreliable, of course, and I'm wondering whether there is a better way of discovering the right network and port configuration inside the containers.
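
In compose terms the finding translates to something like this (a sketch, untested; the interface names depend on which networks each task actually joins):

  opensearch-node1:
    environment:
      # ports are published for this service, so the overlay network is eth1
      - network.publish_host=_eth1_
  opensearch-node2:
    environment:
      # no published ports, so the overlay network is eth0
      - network.publish_host=_eth0_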

@kartg kartg removed the untriaged label Feb 24, 2023
@anasalkouz anasalkouz added the Build Libraries & Interfaces and untriaged labels Feb 28, 2023
@minalsha

@bbarani can you please help on this?

@peterzhuamazon (Member)

Hi @designermonkey, we only test the docker-compose file with plain docker-compose, and have not tested it on Docker Swarm.

If you are doing a more complicated setup, you can try our Helm charts repo, which deploys to Kubernetes and has been actively contributed to and tested by the community:
https://github.com/opensearch-project/helm-charts

Adding @jeffh-aws to take a look at the possible options with Docker Swarm.
Thanks.

@minalsha minalsha removed the untriaged label Mar 2, 2023
@jerrac commented Mar 4, 2023

@designermonkey I was having fun getting OpenSearch to work on Docker Swarm today. Your tip about the network.publish_host setting helped, though I ended up with network.host: "_eth0_", where eth0 was the NIC with the IP of the Docker overlay network I was putting my containers on. So thanks for posting this issue!

I also ran into issues when I had set the memory limit too low, and when my host VMs had vm.max_map_count set too low.
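
For anyone else who hits the vm.max_map_count problem: it has to be raised on each swarm host itself, not in the container. Something like the following (262144 is the value the OpenSearch docs recommend):

# check the current value on the host
sysctl vm.max_map_count

# raise it for the running system
sudo sysctl -w vm.max_map_count=262144

# persist it across reboots
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf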

I have 2 of my instances connected. The third is hitting some weird Java exception errors that seem to happen only on that specific worker node... A topic for a different location though.

@peterzhuamazon It would be awesome to get more support for Docker Swarm out there. I get that the big companies all use Kube. But Kube is not exactly friendly for smaller organizations.

I almost went into a whole spiel on this topic, but this really isn't the place for it. If anyone is interested, feel free to email me. :)

@dblock (Member) commented Mar 8, 2023

Let's move this to opensearch-devops.

@dblock dblock transferred this issue from opensearch-project/OpenSearch Mar 8, 2023
@github-actions github-actions bot added the untriaged label Mar 8, 2023
@gaiksaya (Member)

[Triage] @CEHENKLE @elfisher Any thoughts about onboarding to docker swarm?

@gaiksaya gaiksaya removed the untriaged label Mar 14, 2023
@jbates58

I'll throw my hand in here: I too am getting this same issue. I made a post on the forum about it with my compose file, as well as log outputs.

https://forum.opensearch.org/t/multi-node-docker-setup-not-working/15235/3

@jordarlu (Contributor) commented Dec 6, 2023

Looping in @pallavipr and @bbarani for comments on supporting Docker Swarm. Thanks.

@perry-mitchell commented Jun 19, 2024

I suppose there hasn't been any movement here? I'm seeing the exact same issue with docker compose locally, so I don't think it's related to swarm; in any case, it isn't fixed. My config:

services:
  opensearch-node1:
    image: opensearchproject/opensearch:latest
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      # - plugins.security.disabled=true
      # - cluster.routing.allocation.enable=all
      - 'DISABLE_INSTALL_DEMO_CONFIG=true'
      - 'DISABLE_SECURITY_PLUGIN=true'
      - 'OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m'
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=teSt!1
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200 # REST API
      - 9600:9600 # Performance Analyzer
    networks:
      - opensearch-net
      - otel-net
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
  opensearch-node2:
    image: opensearchproject/opensearch:latest
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      # - plugins.security.disabled=true
      # - cluster.routing.allocation.enable=all
      - 'DISABLE_INSTALL_DEMO_CONFIG=true'
      - 'DISABLE_SECURITY_PLUGIN=true'
      - 'OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m'
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=teSt!1
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
  # opensearch-dashboard:
  #   image: opensearchproject/opensearch-dashboards:latest
  #   ports:
  #     - 5601:5601
  #   expose:
  #     - '5601'
  #   environment:
  #     DISABLE_SECURITY_DASHBOARDS_PLUGIN: 'true'
  #     OPENSEARCH_HOSTS: '["http://opensearch-node1:9200","http://opensearch-node2:9200"]'
  #   networks:
  #     - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:

Seeing errors such as:

opensearch-node2-1  | [2024-06-19T08:28:03,609][INFO ][o.o.c.c.JoinHelper       ] [opensearch-node2] failed to join {opensearch-node1}{hNhhBK9MR4q5jugObtxpRw}{edTUBzEaR1qsvz0GM4gDBg}{172.18.0.2}{172.18.0.2:9300}{dimr}{shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={opensearch-node2}{p80y6oPlSuG2MKrQltpAzA}{gtBnsp2ASKS8LPyVmM_xFA}{172.19.0.3}{172.19.0.3:9300}{dimr}{shard_indexing_pressure_enabled=true}, minimumTerm=2, optionalJoin=Optional[Join{term=3, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={opensearch-node2}{p80y6oPlSuG2MKrQltpAzA}{gtBnsp2ASKS8LPyVmM_xFA}{172.19.0.3}{172.19.0.3:9300}{dimr}{shard_indexing_pressure_enabled=true}, targetNode={opensearch-node1}{hNhhBK9MR4q5jugObtxpRw}{edTUBzEaR1qsvz0GM4gDBg}{172.18.0.2}{172.18.0.2:9300}{dimr}{shard_indexing_pressure_enabled=true}}]}
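
One thing stands out in that log line (an observation, not a confirmed diagnosis): opensearch-node1 publishes 172.18.0.2 while opensearch-node2 publishes 172.19.0.3, addresses on two different networks. opensearch-node1 is attached to both opensearch-net and otel-net, so it may be publishing its transport address on the interface node2 cannot reach. Pinning the publish host, as found earlier in this thread, might help (a sketch; the right interface name needs checking inside the container):

  opensearch-node1:
    environment:
      # force the transport address onto the network shared with node2
      - network.publish_host=_eth0_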
