JGroups discovery protocol for Kubernetes
Java
Clone or download
Bela Ban
Latest commit 34dbad1 Jul 12, 2018

README.adoc

Kubernetes discovery protocol for JGroups

KUBE_PING is a discovery protocol for JGroups cluster nodes managed by Kubernetes.

Since Kubernetes is in charge of launching nodes, it knows the IP addresses of all pods it started, and is therefore the best place to ask for cluster discovery.

Discovery is therefore done by asking Kubernetes for a list of IP addresses of all cluster nodes.

Combined with bind_port / port_range, the protocol will then send a discovery request to all instances and wait for the responses.

A sample configuration looks like this:

Sample KUBE_PING config
  <TCP
     bind_addr="loopback,match-interface:eth0"
     bind_port="7800"
     ...
  />
  <org.jgroups.protocols.kubernetes.KUBE_PING
     port_range="1"
     namespace="${KUBE_NAMESPACE:production}"
     labels="${KUBE_LABEL:cluster=nyc}"
  />
  ...

When a discovery is started, KUBE_PING asks Kubernetes for a list of the IP addresses of all pods which it launched, matching the given namespace and labels (see below).

Let’s say Kubernetes launched a cluster of 3 pods with IP addresses 172.17.0.2, 172.17.0.3 and 172.17.0.5 (all launched into the same namespace and without any (or the same) labels).

On a discovery request, Kubernetes returns list of 3 IP addresses. JGroups now knows that the ports have been allocated from range [7800 .. 7801] (bind_port .. (bind_port + port_range) in TCP).

KUBE_PING therefore sends discovery requests to members at addresses 172.17.0.2:7800, 172.17.0.2:7801, 172.17.0.3:7800, 172.17.0.3:7801, 172.17.0.5:7800 and 172.17.0.5:7801.

Separating different clusters

If pods with containers in different clusters are launched, we’d get a list of IP addresses of all nodes, not just the ones in the same cluster, as Kubernetes knows nothing about clusters.

If we start multiple clusters, we need to separate them using namespaces and/or labels. If no namespaces or labels were used, things would still work, but we’d see warning messages in the logs about messages from different clusters that were discarded.

Namespaces

Namespaces can be used to separate deployments, services and pods. The example below shows namespaces default (the default namespace) and kube-system (the name space used by Kubernetes for its own deployments, pods etc):

[belasmac] /Users/bela/IspnPerfTest$ kubectl get services,deployments,pods --all-namespaces
NAMESPACE     NAME                       CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
default       svc/ispn-perf-test         10.0.0.137   <nodes>       8090:32700/TCP   8h
default       svc/kubernetes             10.0.0.1     <none>        443/TCP          53d
kube-system   svc/kube-dns               10.0.0.10    <none>        53/UDP,53/TCP    53d
kube-system   svc/kubernetes-dashboard   10.0.0.30    <nodes>       80:30000/TCP     53d

NAMESPACE   NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default     deploy/ispn-perf-test2   3         3         3            3           8h

NAMESPACE     NAME                                  READY     STATUS    RESTARTS   AGE
default       po/ispn-perf-test2-2831437780-g07c5   1/1       Running   1          8h
default       po/ispn-perf-test2-2831437780-np83d   1/1       Running   0          6h
default       po/ispn-perf-test2-2831437780-szl60   1/1       Running   1          8h
kube-system   po/kube-addon-manager-minikube        1/1       Running   12         53d
kube-system   po/kube-dns-v20-b63lk                 3/3       Running   34         53d
kube-system   po/kubernetes-dashboard-73c7v         1/1       Running   11         53d

The last section shows 6 pods that have been started, 3 in the default namespace and 3 by Kubernetes itself in the kube-system namespace. The following 3 namespaces exist:

[belasmac] /Users/bela$ kubectl get namespaces
NAME          STATUS    AGE
belaban       Active    2d
default       Active    53d
kube-system   Active    53d

To create a new namespace foo, run kubectl create namespace foo.

Namespaces have to be created and deleted manually via kubectl.

When launching a new deployment in a pod, the namespace can be given with -n <name space> or --namespace=<name space>. Alternatively, the namespace can be defined in the YAML or JSON config passed to kubectl create -f <config>.

In the JGroups configuration, attribute namespace is used to define the namespace to be used for discovery. The example above used namespace="${KUBE_NAMESPACE:production}". This means that the namespace is

  • the value of system property KUBE_NAMESPACE, if set

  • the value of environment variable KUBE_NAMESPACE if set, or

  • "production" if neither system property nor env var are set

In the example above, Kubernetes will return only IP addresses of pods created in namespace "production".

Labels

Labels are similar to namespaces; different labels can separate clusters running inside the same namespace.

Similarly to namespaces, labels also have to be defined when launching a pod, either via --labels=<label> passed to kubectl run, or in the YAML or JSON configuration file passed to kubectl create -f <config>.

In the sample configuration above, labels were defined as labels="${KUBE_LABEL:cluster=nyc}". This means that Kubernetes will only return IP addresses of pods started with label cluster=nyc.

Namespaces and labels can be both set; in this case, Kubernetes will return the IP addresses of all pods started in the given namespace matching the given label(s).

If neither namespace nor labels are set, then the IP addresses of all pods created by Kubernetes in the default namespace "default" will be returned.

KUBE_PING configuration

Attribute name Description

port_range

Number of additional ports to be probed for membership. A port_range of 0 does not probe additional ports. Example: initial_hosts=A[7800] port_range=0 probes A:7800, port_range=1 probes A:7800 and A:7801

connectTimeout

Max time (in millis) to wait for a connection to the Kubernetes server. If exceeded, an exception will be thrown

readTimeout

Max time (in millis) to wait for a response from the Kubernetes server

operationAttempts

Max number of attempts to send discovery requests

operationSleep

Time (in millis) between operation attempts

masterProtocol

http (default) or https. Used to send the initial discovery request to the Kubernetes server

masterHost

The URL of the Kubernetes server

masterPort

The port on which the Kubernetes server is listening

apiVersion

The version of the protocol to the Kubernetes server

namespace

The namespace to be used (leaving this undefined uses "default")

labels

The labels to use in the discovery request to the Kubernetes server

clientCertFile

Certificate to access the Kubernetes server

clientKeyFile

Client key file (store)

clientKeyPassword

The password to access the client key store

clientKeyAlgo

The algorithm used by the client

caCertFile

Client CA certificate

saTokenFile

Token file

dump_requests

Dumps all discovery requests and responses to the Kubernetes server to stdout when true

split_clusters_during_rolling_update

During the Rolling Update, prevents from putting all Pods into a single cluster

Demo

In this demo, we’re going to let Kubernetes start 3 instances of IspnPerfTest via a YAML configuration. Then we’ll run a separate instance interactively and confirm that the instances have formed a cluster of 4. All instances are created in the default namespace and no labels are used.

The YAML used is:

ispn-perf-test.yaml
apiVersion: v1
items:
- apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    annotations:
    name: ispn-perf-test
    namespace: default
  spec:
    replicas: 3
    selector:
    template:
      metadata:
        labels:
          run: ispn-perf-test
      spec:
        containers:
        - args:
          - /opt/jgroups/IspnPerfTest/bin/kube.sh
          - -nohup
          env:
          - name: KUBERNETES_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          image: belaban/ispn_perf_test
          name: ispn-perf-test
          resources: {}
          terminationMessagePath: /dev/termination-log
kind: List
metadata: {}

The image is belaban/ispn_perf_test which contains the IspnPerfTest project plus some scripts to start nodes. 3 instances are started and the start command is kube-debug.sh -nohup; this launches the programs without the loop which reads commands from stdin.

kubectl get pods confirms that 3 instances have been created:

belasmac] /Users/bela/kubetest$ kubectl get pods
NAME                              READY     STATUS    RESTARTS   AGE
ispn-perf-test-2224433472-6l456   1/1       Running   0          29s
ispn-perf-test-2224433472-ksh58   1/1       Running   0          29s
ispn-perf-test-2224433472-rlr0m   1/1       Running   0          29s

We can now run a shell in one of the nodes and confirm that a cluster of 3 has formed. First, we have to exec a bash shell in one of the 3 nodes:

[belasmac] /Users/bela/kubetest$ kubectl exec -it ispn-perf-test-2224433472-rlr0m bash
bash-4.3$

Now probe can be used to list all cluster members:

bash-4.3$ cd IspnPerfTest/
bash-4.3$ bin/probe.sh
-- sending probe request to /224.0.75.75:7500

#1 (300 bytes):
local_addr=ispn-perf-test-2224433472-rlr0m-12151
physical_addr=172.17.0.5:7800
view=[ispn-perf-test-2224433472-ksh58-1200|2] (3) [ispn-perf-test-2224433472-ksh58-1200, ispn-perf-test-2224433472-6l456-41832, ispn-perf-test-2224433472-rlr0m-12151]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

#2 (299 bytes):
local_addr=ispn-perf-test-2224433472-ksh58-1200
physical_addr=172.17.0.6:7800
view=[ispn-perf-test-2224433472-ksh58-1200|2] (3) [ispn-perf-test-2224433472-ksh58-1200, ispn-perf-test-2224433472-6l456-41832, ispn-perf-test-2224433472-rlr0m-12151]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

#3 (300 bytes):
local_addr=ispn-perf-test-2224433472-6l456-41832
physical_addr=172.17.0.7:7800
view=[ispn-perf-test-2224433472-ksh58-1200|2] (3) [ispn-perf-test-2224433472-ksh58-1200, ispn-perf-test-2224433472-6l456-41832, ispn-perf-test-2224433472-rlr0m-12151]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

3 responses (3 matches, 0 non matches)

As can be seen, every member has the same view ispn-perf-test-2224433472-ksh58-1200|2] (3) containing 3 members, so the cluster has formed correctly.

Now a fourth instance can be created, but this time we’ll enable the event loop reading from stdin. To this end, we have to use kubectl run -it (-it for interactively):

[belasmac] /Users/bela/kubetest$ kubectl run ispn -it --rm=true --image=belaban/ispn_perf_test kube.sh
Waiting for pod default/ispn-3105267510-nr9dp to be running, status is Pending, pod ready: false
If you don't see a command prompt, try pressing enter.

-------------------------------------------------------------------
GMS: address=ispn-3105267510-nr9dp-29942, cluster=default, physical address=172.17.0.8:7800
-------------------------------------------------------------------

-------------------------------------------------------------------
GMS: address=ispn-3105267510-nr9dp-43008, cluster=cfg, physical address=172.17.0.8:7900
-------------------------------------------------------------------
created 100,000 keys: [1-100,000], old key set size: 0
Fetched config from ispn-perf-test-2224433472-ksh58-51617: {print_details=true, num_threads=100, print_invokers=false, num_keys=100000, time_secs=60, msg_size=1000, read_percentage=1.0}
created 100,000 keys: [1-100,000]
[1] Start test [2] View [3] Cache size [4] Threads (100)
[5] Keys (100,000) [6] Time (secs) (60) [7] Value size (1.00KB) [8] Validate
[p] Populate cache [c] Clear cache [v] Versions
[r] Read percentage (1.00)
[d] Details (true)  [i] Invokers (false) [l] dump local cache
[q] Quit [X] Quit all

This starts the instance and it should have joined the cluster, which should now have 4 nodes. This can be confirmed by running probe.sh again in the other shell, or by pressing [2] View):

2

-- local: ispn-3105267510-nr9dp-43008
-- view: [ispn-perf-test-2224433472-ksh58-51617|3] (4) [ispn-perf-test-2224433472-ksh58-51617, ispn-perf-test-2224433472-rlr0m-11878, ispn-perf-test-2224433472-6l456-28251, ispn-3105267510-nr9dp-43008]

We can see that the view is now ispn-perf-test-2224433472-ksh58-51617|3] (4), and the cluster has correctly added the fourth member.

Running on Google Container Engine

The commands for running on Google Container Engine (GKE) are the same as when running locally in minikube.

The only difference is that on GKE, contrary to minikube, IP multicasting is not available. This means that the probe.sh command has to be run as probe.sh -addr localhost instead of simply running probe.sh.