Improve out of the box Docker experience #10

Closed
eminn opened this issue Jul 20, 2016 · 20 comments · Fixed by #78

Comments

@eminn
Contributor

eminn commented Jul 20, 2016

Currently, users have to set up the tcp-ip configuration for each Hazelcast instance in Docker.

To overcome this obstacle, we should utilize the Discovery SPI mechanisms in our Docker images.

Users should be able to run a Hazelcast cluster by just configuring the URL of the discovery mechanism.

@mesutcelik
Contributor

Dear @eminn ,

That means we should distribute some discovery service libraries and ask users to provide the right configuration in their hazelcast.xml. Which discovery services do you think we should support?

@eminn
Contributor Author

eminn commented Oct 6, 2016

Dear @mesutcelik ,

I don't have usage statistics for the discovery services across the ecosystem, but I think we can start with ZooKeeper. What do you think?

@mesutcelik
Contributor

Dear @eminn ,
Thanks for the answer!

If we make sure that existing or future Hazelcast discovery service libraries, e.g. hazelcast-zookeeper, are available on the classpath of the Hazelcast Docker image and have the user provide the correct hazelcast.xml configuration, would you be happy to use it that way?

I am just trying to make up my mind about how to deliver this feature request through a single Hazelcast Docker image. I am happy to listen to others' opinions cc: @noctarius @jerrinot @bitsofinfo @bilalyasar

@jerrinot
Contributor

jerrinot commented Oct 7, 2016

Zookeeper appears to be the only discovery mechanism supported directly by Hazelcast while Consul and Etcd are community supported? From this perspective it makes sense to use Zookeeper.

However I'm not sure how popular Zookeeper is in the cloud world.

@bitsofinfo

In the next month or so I'll be revisiting hazelcast and re-attempting configuring it to work in a swarm with nodes coming up/down randomly.... so I'm glad this issue was referenced above, as I dread revisiting it: hazelcast/hazelcast#4537

That said... being included in this thread just alerted me that this hazelcast-docker project even exists, and I'm curious as to how it would be used (i.e. my usage of hz is embedding it in another app that itself is already containerized).

Overall though? I wouldn't ship anything that forces the user to use any one particular discovery mechanism.

@jerrinot
Contributor

@bitsofinfo: Many thanks for your insight!

I am not a Docker expert, so please forgive my ignorance here. As I understand it, you are recommending Docker Swarm instead of Zookeeper? Isn't it for all practical intents and purposes just another discovery mechanism? I understand it's meant to decouple Hazelcast from the actual discovery mechanism (Zookeeper, Consul, etc.), but doesn't it just couple Hazelcast to Swarm instead? How commonly is Swarm used? @bilalyasar: What's your view?

@bitsofinfo

bitsofinfo commented Oct 10, 2016

No.

Swarm and zookeeper are entirely different categories of technologies. Zookeeper/consul/etcd would be considered in the service discovery/registry/meta-data realm of things.

Traditional swarm (< 1.12) is purely a native clustering technology provided by Docker for containers. In 1.12 they introduced swarm services, which do have some service discovery aspect to them, BUT only in that all swarm nodes provide a consistent set of LB entry points across all participating nodes. Regardless, it's not really in the same class at all as consul/ZK/etcd etc.

@mesutcelik
Contributor

I think having discovery service jar files, e.g. hazelcast-zookeeper, in the Hazelcast Docker image would be a good step forward. That way, people would not need to extend the Hazelcast image just to add discovery service jars; instead, they could use the discovery service libs directly from the image by enabling them in hazelcast.xml.

Currently I am planning to add only the hazelcast-zookeeper libraries, but we can add more as Hazelcast officially supports more discovery services.

@jeacott1

We're in a mesos/dns discovery env and would like to be able to just assign domain names and let DNS resolution work. Unfortunately, whilst Hazelcast seems to work fine with members declared as IP address lists, handing it the same list as domain names fails.

@bitsofinfo

bitsofinfo commented Nov 4, 2016

All, well, it's been months and we are now revisiting this. We ran into it again, perhaps summarized here: hazelcast/hazelcast#9219

@markvr

markvr commented Mar 19, 2017

hi, just wondering if there are any plans / work-in-progress to add a Docker swarm discovery plugin (similar to the AWS and Azure ones) for running Hazelcast in an overlay network? I imagine this could work by querying DNS for "tasks.<servicename>" which returns the swarm IPs of all the containers running the service. Having looked at the Azure plugin, it doesn't look too hard to write a DiscoveryStrategy for this. We're still evaluating Hazelcast, but just wondered if there was any way to run it in a swarm yet?
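
For anyone reading along, the DNS lookup described above could look roughly like the sketch below. It uses only the JDK. `SwarmTaskResolver` and the service name `hazelcast` are made-up names for illustration, and nothing here is the actual Hazelcast DiscoveryStrategy API; a real plugin would presumably wrap a lookup like this inside that interface.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Hazelcast or Docker.
final class SwarmTaskResolver {

    // Member port seen in the logs in this thread; assumed to be the same on every task.
    static final int DEFAULT_PORT = 5701;

    // Swarm's DNS answers "tasks.<servicename>" with the IPs of the containers running the service.
    static String tasksName(String serviceName) {
        return "tasks." + serviceName;
    }

    // Resolves every address behind a DNS name and formats each one as "ip:port".
    // An unresolvable name yields an empty list, i.e. "no members found yet".
    static List<String> resolveMembers(String dnsName, int port) {
        List<String> members = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(dnsName)) {
                String host = address.getHostAddress();
                // Bracket IPv6 literals so the port separator stays unambiguous.
                if (address instanceof Inet6Address) {
                    host = "[" + host + "]";
                }
                members.add(host + ":" + port);
            }
        } catch (UnknownHostException e) {
            // The service may not be registered in DNS yet; report no members.
        }
        return members;
    }
}
```

Inside a swarm overlay network, `SwarmTaskResolver.resolveMembers(SwarmTaskResolver.tasksName("hazelcast"), SwarmTaskResolver.DEFAULT_PORT)` would then be the candidate member list to hand to whatever join mechanism is in use.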

@noctarius
Contributor

There's no discovery plugin as of now but feel free to give it a first shot (just implementing two interfaces) and we're happy to help out on any questions on the way :)

@markvr

markvr commented Mar 19, 2017

If we decide to go with Hazelcast I'll have a go at this. In the meantime if anyone else comes across this, the Kubernetes plugin is probably a good place to start as that also uses DNS for discovery, along with the Docker docs.

@noctarius
Contributor

Yep it is, wrote it ;) Anyhow, in Kubernetes, DNS discovery is somewhat less reliable than using the REST API services. I still don't know exactly why, but technically it should work the same way for any DNS service; only the port might be discoverable in a different way.

@markvr

markvr commented Mar 23, 2017

I had a go at hacking something together, and it appears to work, i.e. the nodes in the swarm on the overlay network discover each other by resolving the "tasks." DNS name, but then I get this log output:

INFO: [172.18.0.4]:5701 [dev] [3.8] Connection[id=31, /10.0.0.4:5701->/10.0.0.3:34502, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.18.0.3]:5701! This node is not requested endpoint: [10.0.0.4]:5701

You can see them find each other but then they seem to drop the connection. There isn't any NATting going on (afaik) in the swarm, so they should only be communicating on the 10.x IP range.

I then tried setting the <interface> in the XML to 10.*.*.* and got the output below.
10.0.0.2 is the "virtual service" IP; 10.0.0.3 and 10.0.0.4 are the actual IPs of the containers. I'm not sure whether the cluster is actually connected; I need to read the docs and see if I can get some insight into it.

INFO: [10.0.0.2]:5701 [dev] [3.8] Found: 2
Mar 23, 2017 1:05:11 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [10.0.0.2]:5701 [dev] [3.8] Found node service with address: [10.0.0.3]:5701
Mar 23, 2017 1:05:11 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [10.0.0.2]:5701 [dev] [3.8] Found node service with address: [10.0.0.4]:5701
Mar 23, 2017 1:05:11 AM com.hazelcast.internal.cluster.impl.DiscoveryJoiner
FINE: [10.0.0.2]:5701 [dev] [3.8] Will send master question to each address in: [[10.0.0.3]:5701, [10.0.0.4]:5701]
Mar 23, 2017 1:05:11 AM com.hazelcast.internal.cluster.impl.DiscoveryJoiner
FINE: [10.0.0.2]:5701 [dev] [3.8] NOT sending master question to blacklisted endpoints: {}
Mar 23, 2017 1:05:11 AM com.hazelcast.internal.cluster.impl.DiscoveryJoiner
FINE: [10.0.0.2]:5701 [dev] [3.8] Sending master question to [10.0.0.3]:5701
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.SocketAcceptorThread
INFO: [10.0.0.2]:5701 [dev] [3.8] Accepting socket connection from /10.0.0.3:49913
Mar 23, 2017 1:05:11 AM com.hazelcast.internal.cluster.impl.DiscoveryJoiner
FINE: [10.0.0.2]:5701 [dev] [3.8] Sending master question to [10.0.0.4]:5701
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.InitConnectionTask
INFO: [10.0.0.2]:5701 [dev] [3.8] Connecting to /10.0.0.4:5701, timeout: 0, bind-any: true
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.InitConnectionTask
INFO: [10.0.0.2]:5701 [dev] [3.8] Connecting to /10.0.0.3:5701, timeout: 0, bind-any: true
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.SocketAcceptorThread
INFO: [10.0.0.2]:5701 [dev] [3.8] Accepting socket connection from /10.0.0.4:54804
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFO: [10.0.0.2]:5701 [dev] [3.8] Established socket connection between /10.0.0.4:5701 and /10.0.0.3:49913
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFO: [10.0.0.2]:5701 [dev] [3.8] Established socket connection between /10.0.0.4:54804 and /10.0.0.4:5701
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.SocketWriterInitializerImpl
FINE: [10.0.0.2]:5701 [dev] [3.8] Initializing SocketWriter WriteHandler with Cluster Protocol
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFO: [10.0.0.2]:5701 [dev] [3.8] Established socket connection between /10.0.0.4:41259 and /10.0.0.3:5701
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.TcpIpConnectionManager
INFO: [10.0.0.2]:5701 [dev] [3.8] Established socket connection between /10.0.0.4:5701 and /10.0.0.4:54804
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.SocketWriterInitializerImpl
FINE: [10.0.0.2]:5701 [dev] [3.8] Initializing SocketWriter WriteHandler with Cluster Protocol
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.SocketWriterInitializerImpl
FINE: [10.0.0.2]:5701 [dev] [3.8] Initializing SocketWriter WriteHandler with Cluster Protocol
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.TcpIpConnectionManager
WARNING: [10.0.0.2]:5701 [dev] [3.8] Wrong bind request from [10.0.0.2]:5701! This node is not requested endpoint: [10.0.0.4]:5701
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.TcpIpConnection
INFO: [10.0.0.2]:5701 [dev] [3.8] Connection[id=1, /10.0.0.4:5701->/10.0.0.3:49913, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [10.0.0.2]:5701! This node is not requested endpoint: [10.0.0.4]:5701
Mar 23, 2017 1:05:11 AM com.hazelcast.nio.tcp.SocketWriterInitializerImpl
FINE: [10.0.0.2]:5701 [dev] [3.8] Initializing SocketWriter WriteHandler with Cluster Protocol

@mmedenjak

mmedenjak commented Nov 22, 2017

@markvr The root causes of these docker issues are:

  • the DefaultAddressPicker implementation
  • the connection checks that disallow establishing a connection

The connection checks can be disabled, but the connection still won't be established because of another issue: hazelcast/hazelcast#11256
So your options are either setting the Hazelcast instance's public address via the Hazelcast properties or a JVM param, or overriding the DefaultAddressPicker implementation, which picks the wrong bind and public addresses.
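
To make the first option a bit more concrete, here is a rough JDK-only sketch of choosing the overlay address and publishing it as a system property before the instance starts. `OverlayAddressPicker` and the `10.0.0.` prefix are illustrative assumptions, and `hazelcast.local.publicAddress` is, as far as I know, the property in question, so please check it against the docs for your version. Note that a subnet match alone may land on the virtual service IP (as with 10.0.0.2 in the log above), so a real selection would need something stricter.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Hazelcast.
final class OverlayAddressPicker {

    // Assumed property name; verify against the Hazelcast version in use.
    static final String PUBLIC_ADDRESS_PROPERTY = "hazelcast.local.publicAddress";

    // Lists the textual form of every address bound to a local network interface.
    static List<String> localAddresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no interfaces reported
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            for (InetAddress address : Collections.list(nic.getInetAddresses())) {
                result.add(address.getHostAddress());
            }
        }
        return result;
    }

    // Returns the first candidate that starts with the given prefix, e.g. "10.0.0.".
    static Optional<String> pickByPrefix(List<String> candidates, String prefix) {
        return candidates.stream().filter(a -> a.startsWith(prefix)).findFirst();
    }

    // Sets the public-address property if a candidate matches; returns whether it did.
    static boolean publish(List<String> candidates, String prefix) {
        Optional<String> picked = pickByPrefix(candidates, prefix);
        picked.ifPresent(a -> System.setProperty(PUBLIC_ADDRESS_PROPERTY, a));
        return picked.isPresent();
    }
}
```

Calling `OverlayAddressPicker.publish(OverlayAddressPicker.localAddresses(), "10.0.0.")` before creating the instance would be the intended use; the same value could instead be passed as a `-D` JVM param.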

@eminn @markvr
In Hazelcast 3.9, a new SPI was added to allow you to plug in custom address picker implementations. For now, you will have to use the SPI yourself and write an implementation that fixes your issue, but we are planning to release implementations of our own, bundled into plugins such as the Docker or AWS plugin, for easier deployment.

Please check out the new SPI:
https://github.com/hazelcast/hazelcast/blob/3cede71cad1fe87312f0901ff77f903ed2d4383d/hazelcast/src/main/java/com/hazelcast/spi/MemberAddressProvider.java

Please create a new issue or reopen this one if this does not suit your use case.

@mesutcelik mesutcelik added this to the 3.10.1 milestone Apr 25, 2018
@mesutcelik
Contributor

I see that we have to revisit all the comments here and create separate issues if a new enhancement is needed.
The initial request was to include a discovery service in the Hazelcast base Docker image. Eureka is a good candidate here right now. cc: @googlielmo @leszko

@bitsofinfo

are you suggesting embedding a eureka server in the base image?

@mesutcelik
Contributor

mesutcelik commented Apr 25, 2018

Nope, but putting https://github.com/hazelcast/hazelcast-eureka and its dependencies into the image might be an easy-to-use option for dynamic environments... Users would have to manage their own Eureka Server deployment.
