Merge pull request #278 from wallrj/232-remove-cql
Automatic merge from submit-queue.

Remove the loadbalanced CQL service and document an alternative way to connect.

In no particular order:

* Removed the CQL service.
* Added a headless `nodes` service.
* Renamed the `seedprovider` service to `seeds`.
* Added documentation explaining the two headless services and how to connect a CQL client.
* Refactored the two headless services' control loops into a single, generic service control, parameterized by a function that generates the desired service for the cluster.
* Removed the service update code, which was not tested and which added unnecessary complication.
* Navigator will now only touch the services if they are missing.
* Updated the E2E tests so that `cql_connect` always attempts to connect to the `nodes` service.
* Removed the ServiceName from the NodePool StatefulSets, because it no longer made sense with multiple NodePools / StatefulSets. The StatefulSet's ServiceName is supposed to match a service dedicated to that StatefulSet, not a single seedprovider service. We should probably create a service for each NodePool dynamically.
* Removed the (also broken) CASSANDRA_SEEDS configuration, which was pointing at `seedProviderServiceName` rather than at a service name matching the name of the StatefulSet.
* In the E2E tests, reverted to a better mechanism (SIGSTOP) for simulating node failure, since decommissioning leaves the node in a decommissioned state, causing the C* process to exit immediately.
* Added CONSISTENCY checks to the E2E CQL queries to verify that both C* nodes are reachable and have the test data (a driver-level sketch of the same check follows this list).
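
For reference, a driver-level equivalent of that consistency check could look like the sketch below. This is illustrative only — the tests actually run `CONSISTENCY ALL` through `cql_connect`/cqlsh, and the `cass-demo-nodes` contact point here is a placeholder:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['cass-demo-nodes'], port=9042)
session = cluster.connect()

# A read at CONSISTENCY ALL only succeeds if every replica responds,
# so a successful query implies both C* nodes hold the test data.
query = SimpleStatement(
    'SELECT * FROM space1.testtable1',
    consistency_level=ConsistencyLevel.ALL,
)
rows = session.execute(query)
print(list(rows))
```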

Fixes: #232

**Release note**:
```release-note
NONE
```
jetstack-ci-bot committed Mar 13, 2018
2 parents 429be02 + 7215ca5 commit 3a7e381
Showing 27 changed files with 372 additions and 479 deletions.
61 changes: 61 additions & 0 deletions docs/cassandra.rst
@@ -116,3 +116,64 @@ The ``resources`` field follows exactly the same specification as the Kubernetes
(``pod.spec.containers[].resources``).

See `Managing Compute Resources for Containers <https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/>`_ for more information.


Connecting to Cassandra
-----------------------

If you apply the YAML manifest from the example above,
Navigator will create a Cassandra cluster with three C* nodes running in three pods.
The IP addresses assigned to each C* node may change when pods are rescheduled or restarted, but there are stable DNS names which allow you to connect to the cluster.

Services and DNS Names
~~~~~~~~~~~~~~~~~~~~~~

Navigator creates two `headless services <https://kubernetes.io/docs/concepts/services-networking/service/#headless-services>`_ for every Cassandra cluster that it manages.
Each service has a corresponding DNS domain name:

#. The *nodes* service (e.g. ``cass-demo-nodes``) has a DNS domain name which resolves to the IP addresses of **all** the C* nodes in the cluster (nodes 0, 1, and 2 in this example).
#. The *seeds* service (e.g. ``cass-demo-seeds``) has a DNS domain name which resolves to the IP addresses of **only** the `seed nodes <http://cassandra.apache.org/doc/latest/faq/index.html#what-are-seeds>`_ (node 0 in this example).

Each of these DNS names has multiple `A` (host) records, one for the IP address of each **healthy** C* node.

.. note::
   The DNS server only includes `healthy <https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/>`_ nodes when answering requests for these two services.

The DNS names can be resolved from any pod in the Kubernetes cluster:

* If the pod is in the same namespace as the Cassandra cluster, you need only use the leftmost label of the DNS name, e.g. ``cass-demo-nodes``.
* If the pod is in a different namespace, you must use the fully qualified DNS name, e.g. ``cass-demo-nodes.my-namespace.svc.cluster.local``.
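
For example, you can compare the addresses behind the two services from inside a pod.
The following is a minimal sketch, assuming the ``cass-demo`` cluster from the example above and a pod in the same namespace:

.. code-block:: python

   import socket

   # Resolve the A records behind each headless service. The nodes
   # service should list every healthy C* node; the seeds service
   # should list only the seed nodes.
   for service in ('cass-demo-nodes', 'cass-demo-seeds'):
       infos = socket.getaddrinfo(service, None, proto=socket.IPPROTO_TCP)
       addresses = sorted({info[4][0] for info in infos})
       print(service, addresses)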

.. note::
   Read `DNS for Services and Pods <https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/>`_ for more information about DNS in Kubernetes.

TCP Ports
~~~~~~~~~

The C* nodes all listen on the following TCP ports:

#. **9042**: For CQL client connections.
#. **8080**: For Prometheus metrics scraping.
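
For example, you can verify that a node's metric exporter is responding by fetching from port 8080.
This is a sketch, assuming the exporter serves its metrics at the root path (as this project's E2E test implies with its plain ``curl`` against port 8080):

.. code-block:: python

   from urllib.request import urlopen

   # Fetch metrics from whichever healthy node the service name
   # resolves to; assumes the exporter responds at the root path.
   with urlopen('http://cass-demo-nodes:8080', timeout=5) as response:
       print(response.read().decode()[:400])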

Connect using a CQL Client
~~~~~~~~~~~~~~~~~~~~~~~~~~

Navigator configures all the nodes in a Cassandra cluster to listen on TCP port 9042 for `CQL client connections <http://cassandra.apache.org/doc/latest/cql/>`_,
and there are `CQL drivers for most popular programming languages <http://cassandra.apache.org/doc/latest/getting_started/drivers.html>`_.
Most drivers are able to connect to a single node and then discover all the other nodes in the cluster.

For example, you could use the `Datastax Python driver <http://datastax.github.io/python-driver/>`_ to connect to the Cassandra cluster as follows:

.. code-block:: python

   from cassandra.cluster import Cluster

   cluster = Cluster(['cass-demo-nodes'], port=9042)
   session = cluster.connect()
   rows = session.execute('SELECT ... FROM ...')
   for row in rows:
       print(row)

.. note::
   The IP address to which the driver makes the initial connection
   depends on the DNS server and operating system configuration.
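
If you would rather not depend on which address the resolver happens to return first,
you can resolve the service name yourself and pass every address as a contact point.
A sketch, under the same assumptions as the example above:

.. code-block:: python

   import socket

   from cassandra.cluster import Cluster

   # Resolve all healthy nodes up front and hand every address to
   # the driver, rather than relying on a single A record.
   infos = socket.getaddrinfo('cass-demo-nodes', 9042, proto=socket.IPPROTO_TCP)
   contact_points = sorted({info[4][0] for info in infos})
   cluster = Cluster(contact_points, port=9042)
   session = cluster.connect()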
1 change: 0 additions & 1 deletion docs/quick-start/cassandra-cluster.yaml
@@ -4,7 +4,6 @@ metadata:
name: "demo"
spec:
version: "3.11.1"
cqlPort: 9042
sysctls:
- "vm.max_map_count=0"
nodePools:
77 changes: 38 additions & 39 deletions hack/e2e.sh
@@ -218,7 +218,7 @@ function test_cassandracluster() {
--namespace "${namespace}" \
--filename \
<(envsubst \
'$NAVIGATOR_IMAGE_REPOSITORY:$NAVIGATOR_IMAGE_TAG:$NAVIGATOR_IMAGE_PULLPOLICY:$CASS_NAME:$CASS_REPLICAS:$CASS_CQL_PORT:$CASS_VERSION' \
'$NAVIGATOR_IMAGE_REPOSITORY:$NAVIGATOR_IMAGE_TAG:$NAVIGATOR_IMAGE_PULLPOLICY:$CASS_NAME:$CASS_REPLICAS:$CASS_VERSION' \
< "${SCRIPT_DIR}/testdata/cass-cluster-test.template.yaml")
then
fail_test "Failed to create cassandracluster"
@@ -246,31 +246,31 @@ function test_cassandracluster() {
# Wait 5 minutes for cassandra to start and listen for CQL queries.
if ! retry TIMEOUT=300 cql_connect \
"${namespace}" \
"cass-${CASS_NAME}-cql" \
9042; then
"cass-${CASS_NAME}-nodes" \
"${CASS_CQL_PORT}"; then
fail_test "Navigator controller failed to create cassandracluster service"
fi

if ! retry TIMEOUT=300 in_cluster_command \
"${namespace}" \
"alpine:3.6" \
/bin/sh -c "apk add --no-cache curl && curl -vv http://cass-${CASS_NAME}-ringnodes-0.cass-${CASS_NAME}-seedprovider:8080"; then
/bin/sh -c "apk add --no-cache curl && curl -vv http://cass-${CASS_NAME}-nodes:8080"; then
fail_test "Pilot did not start Prometheus metric exporter"
fi

# Create a database
cql_connect \
"${namespace}" \
"cass-${CASS_NAME}-cql" \
9042 \
"cass-${CASS_NAME}-nodes" \
"${CASS_CQL_PORT}" \
--debug \
< "${SCRIPT_DIR}/testdata/cassandra_test_database1.cql"

# Insert a record
cql_connect \
"${namespace}" \
"cass-${CASS_NAME}-cql" \
9042 \
"cass-${CASS_NAME}-nodes" \
"${CASS_CQL_PORT}" \
--debug \
--execute="INSERT INTO space1.testtable1(key, value) VALUES('testkey1', 'testvalue1')"

@@ -282,8 +282,8 @@ function test_cassandracluster() {
not \
cql_connect \
"${namespace}" \
"cass-${CASS_NAME}-cql" \
9042 \
"cass-${CASS_NAME}-nodes" \
"${CASS_CQL_PORT}" \
--debug
# Kill the cassandra process gracefully which allows it to flush its data to disk.
# kill_cassandra_process \
@@ -303,38 +303,21 @@ function test_cassandracluster() {
stdout_matches "testvalue1" \
cql_connect \
"${namespace}" \
"cass-${CASS_NAME}-cql" \
9042 \
"cass-${CASS_NAME}-nodes" \
"${CASS_CQL_PORT}" \
--debug \
--execute='SELECT * FROM space1.testtable1'
then
fail_test "Cassandra data was lost"
fi

# Change the CQL port
export CASS_CQL_PORT=9043
kubectl apply \
--namespace "${namespace}" \
--filename \
<(envsubst \
'$NAVIGATOR_IMAGE_REPOSITORY:$NAVIGATOR_IMAGE_TAG:$NAVIGATOR_IMAGE_PULLPOLICY:$CASS_NAME:$CASS_REPLICAS:$CASS_CQL_PORT:$CASS_VERSION' \
< "${SCRIPT_DIR}/testdata/cass-cluster-test.template.yaml")

# Wait 60s for cassandra CQL port to change
if ! retry TIMEOUT=60 cql_connect \
"${namespace}" \
"cass-${CASS_NAME}-cql" \
9043; then
fail_test "Navigator controller failed to update cassandracluster service"
fi

# Increment the replica count
export CASS_REPLICAS=2
kubectl apply \
--namespace "${namespace}" \
--filename \
<(envsubst \
'$NAVIGATOR_IMAGE_REPOSITORY:$NAVIGATOR_IMAGE_TAG:$NAVIGATOR_IMAGE_PULLPOLICY:$CASS_NAME:$CASS_REPLICAS:$CASS_CQL_PORT:$CASS_VERSION' \
'$NAVIGATOR_IMAGE_REPOSITORY:$NAVIGATOR_IMAGE_TAG:$NAVIGATOR_IMAGE_PULLPOLICY:$CASS_NAME:$CASS_REPLICAS:$CASS_VERSION' \
< "${SCRIPT_DIR}/testdata/cass-cluster-test.template.yaml")

if ! retry TIMEOUT=300 stdout_equals 2 kubectl \
@@ -348,7 +331,7 @@ function test_cassandracluster() {

# TODO: A better test would be to query the endpoints and check that only
# the `-0` pods are included. E.g.
# kubectl -n test-cassandra-1519754828-19864 get ep cass-cassandra-1519754828-19864-cassandra-seedprovider -o "jsonpath={.subsets[*].addresses[*].hostname}"
# kubectl -n test-cassandra-1519754828-19864 get ep cass-cassandra-1519754828-19864-cassandra-seeds -o "jsonpath={.subsets[*].addresses[*].hostname}"
if ! stdout_equals "cass-${CASS_NAME}-ringnodes-0" \
kubectl get pods --namespace "${namespace}" \
--selector=navigator.jetstack.io/cassandra-seed=true \
@@ -357,16 +340,32 @@ function test_cassandracluster() {
fail_test "First cassandra node not marked as seed"
fi

if ! retry \
stdout_matches "testvalue1" \
cql_connect \
"${namespace}" \
"cass-${CASS_NAME}-nodes" \
"${CASS_CQL_PORT}" \
--debug \
--execute='CONSISTENCY ALL; SELECT * FROM space1.testtable1'
then
fail_test "Data was not replicated to second node"
fi

simulate_unresponsive_cassandra_process \
"${namespace}" \
"cass-${CASS_NAME}-ringnodes-0" \
"cassandra"

if ! retry cql_connect \
"${namespace}" \
"cass-${CASS_NAME}-cql" \
9043; then
fail_test "Cassandra readiness probe failed to bypass dead node"
"cass-${CASS_NAME}-ringnodes-0"

if ! retry TIMEOUT=600 \
stdout_matches "testvalue1" \
cql_connect \
"${namespace}" \
"cass-${CASS_NAME}-nodes" \
"${CASS_CQL_PORT}" \
--debug \
--execute='CONSISTENCY ALL; SELECT * FROM space1.testtable1'
then
fail_test "Cassandra liveness probe failed to restart dead node"
fi
}

25 changes: 14 additions & 11 deletions hack/libe2e.sh
@@ -116,30 +116,33 @@ function kube_event_exists() {
return 1
}

function simulate_unresponsive_cassandra_process() {
local namespace=$1
local pod=$2
local container=$3
# Decommission causes cassandra to stop accepting CQL connections.
function decommission_cassandra_node() {
local namespace="${1}"
local pod="${2}"
kubectl \
--namespace="${namespace}" \
exec "${pod}" --container="${container}" -- \
exec "${pod}" -- \
/bin/sh -c 'JVM_OPTS="" exec nodetool decommission'
}

function signal_cassandra_process() {
local namespace=$1
local pod=$2
local container=$3
local signal=$4
local namespace="${1}"
local pod="${2}"
local signal="${3}"

# Send STOP signal to all the cassandra user's processes
kubectl \
--namespace="${namespace}" \
exec "${pod}" --container="${container}" -- \
exec "${pod}" -- \
bash -c "kill -${signal}"' -- $(ps -u cassandra -o pid=) && ps faux'
}

function simulate_unresponsive_cassandra_process() {
local namespace="${1}"
local pod="${2}"
signal_cassandra_process "${namespace}" "${pod}" "SIGSTOP"
}

function stdout_equals() {
local expected="${1}"
shift
1 change: 0 additions & 1 deletion hack/testdata/cass-cluster-test.template.yaml
@@ -4,7 +4,6 @@ metadata:
name: "${CASS_NAME}"
spec:
version: "${CASS_VERSION}"
cqlPort: ${CASS_CQL_PORT}
sysctls:
- "vm.max_map_count=0"
nodePools:
1 change: 1 addition & 0 deletions internal/test/unit/framework/state_fixture.go
@@ -60,6 +60,7 @@ func (s *StateFixture) Start() {
ConfigMapLister: s.kubeSharedInformerFactory.Core().V1().ConfigMaps().Lister(),
PilotLister: s.navigatorSharedInformerFactory.Navigator().V1alpha1().Pilots().Lister(),
PodLister: s.kubeSharedInformerFactory.Core().V1().Pods().Lister(),
ServiceLister: s.kubeSharedInformerFactory.Core().V1().Services().Lister(),
}
s.stopCh = make(chan struct{})
s.kubeSharedInformerFactory.Start(s.stopCh)
1 change: 0 additions & 1 deletion pkg/apis/navigator/types.go
@@ -32,7 +32,6 @@ type CassandraClusterSpec struct {
NodePools []CassandraClusterNodePool
Version version.Version
Image *ImageSpec
CqlPort int32
}

type CassandraClusterNodePool struct {
2 changes: 0 additions & 2 deletions pkg/apis/navigator/v1alpha1/types.go
@@ -37,8 +37,6 @@ type CassandraClusterSpec struct {
// Image describes the database image to use
Image *ImageSpec `json:"image"`

CqlPort int32 `json:"cqlPort"`

// The version of the database to be used for nodes in the cluster.
Version version.Version `json:"version"`
}
2 changes: 0 additions & 2 deletions pkg/apis/navigator/v1alpha1/zz_generated.conversion.go
@@ -224,7 +224,6 @@ func autoConvert_v1alpha1_CassandraClusterSpec_To_navigator_CassandraClusterSpec
}
out.NodePools = *(*[]navigator.CassandraClusterNodePool)(unsafe.Pointer(&in.NodePools))
out.Image = (*navigator.ImageSpec)(unsafe.Pointer(in.Image))
out.CqlPort = in.CqlPort
out.Version = in.Version
return nil
}
@@ -241,7 +240,6 @@ func autoConvert_navigator_CassandraClusterSpec_To_v1alpha1_CassandraClusterSpec
out.NodePools = *(*[]CassandraClusterNodePool)(unsafe.Pointer(&in.NodePools))
out.Version = in.Version
out.Image = (*ImageSpec)(unsafe.Pointer(in.Image))
out.CqlPort = in.CqlPort
return nil
}

9 changes: 5 additions & 4 deletions pkg/controllers/cassandra/cassandra.go
@@ -29,8 +29,7 @@ import (
"github.com/jetstack/navigator/pkg/controllers/cassandra/role"
"github.com/jetstack/navigator/pkg/controllers/cassandra/rolebinding"
"github.com/jetstack/navigator/pkg/controllers/cassandra/seedlabeller"
servicecql "github.com/jetstack/navigator/pkg/controllers/cassandra/service/cql"
serviceseedprovider "github.com/jetstack/navigator/pkg/controllers/cassandra/service/seedprovider"
"github.com/jetstack/navigator/pkg/controllers/cassandra/service"
"github.com/jetstack/navigator/pkg/controllers/cassandra/serviceaccount"
)

@@ -98,15 +97,17 @@ func NewCassandra(
cc.rolesListerSynced = roles.Informer().HasSynced
cc.roleBindingsListerSynced = roleBindings.Informer().HasSynced
cc.control = NewControl(
serviceseedprovider.NewControl(
service.NewControl(
kubeClient,
services.Lister(),
recorder,
service.NodesServiceForCluster,
),
servicecql.NewControl(
service.NewControl(
kubeClient,
services.Lister(),
recorder,
service.SeedsServiceForCluster,
),
nodepool.NewControl(
kubeClient,
14 changes: 6 additions & 8 deletions pkg/controllers/cassandra/cluster_control.go
@@ -11,8 +11,6 @@ import (
"github.com/jetstack/navigator/pkg/controllers/cassandra/role"
"github.com/jetstack/navigator/pkg/controllers/cassandra/rolebinding"
"github.com/jetstack/navigator/pkg/controllers/cassandra/seedlabeller"
servicecql "github.com/jetstack/navigator/pkg/controllers/cassandra/service/cql"
serviceseedprovider "github.com/jetstack/navigator/pkg/controllers/cassandra/service/seedprovider"
"github.com/jetstack/navigator/pkg/controllers/cassandra/serviceaccount"
)

@@ -39,8 +37,8 @@ type ControlInterface interface {
var _ ControlInterface = &defaultCassandraClusterControl{}

type defaultCassandraClusterControl struct {
seedProviderServiceControl serviceseedprovider.Interface
cqlServiceControl servicecql.Interface
seedProviderServiceControl ControlInterface
nodesServiceControl ControlInterface
nodepoolControl nodepool.Interface
pilotControl pilot.Interface
serviceAccountControl serviceaccount.Interface
@@ -51,8 +49,8 @@ type defaultCassandraClusterControl struct {
}

func NewControl(
seedProviderServiceControl serviceseedprovider.Interface,
cqlServiceControl servicecql.Interface,
seedProviderServiceControl ControlInterface,
nodesServiceControl ControlInterface,
nodepoolControl nodepool.Interface,
pilotControl pilot.Interface,
serviceAccountControl serviceaccount.Interface,
@@ -63,7 +61,7 @@ ) ControlInterface {
) ControlInterface {
return &defaultCassandraClusterControl{
seedProviderServiceControl: seedProviderServiceControl,
cqlServiceControl: cqlServiceControl,
nodesServiceControl: nodesServiceControl,
nodepoolControl: nodepoolControl,
pilotControl: pilotControl,
serviceAccountControl: serviceAccountControl,
@@ -87,7 +85,7 @@ func (e *defaultCassandraClusterControl) Sync(c *v1alpha1.CassandraCluster) error {
)
return err
}
err = e.cqlServiceControl.Sync(c)
err = e.nodesServiceControl.Sync(c)
if err != nil {
e.recorder.Eventf(
c,
