Add example of a cassandra daemonset. #16004

Merged 1 commit on Dec 3, 2015
95 changes: 95 additions & 0 deletions examples/cassandra/README.md
@@ -305,6 +305,98 @@ UN 10.244.0.5 74.09 KB 256 49.7% 86feda0f-f070-4a5b-bda1-2ee
UN 10.244.3.3 51.28 KB 256 51.0% dafe3154-1d67-42e1-ac1d-78e7e80dce2b rack1
```

### Using a DaemonSet

In Kubernetes a _[Daemon Set](../../docs/admin/daemons.md)_ can distribute pods onto Kubernetes nodes, one-to-one. Like a _ReplicationController_, it has a selector query which identifies the members of its set. Unlike a _ReplicationController_, it has a node selector to limit which nodes are scheduled with the templated pods, and it replicates not toward a target number of pods, but by assigning a single pod to each targeted node.
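
For contrast, here is an abridged, hypothetical pair of `spec` fragments (not complete manifests; the full daemon set manifest appears below) showing the structural difference:

```yaml
# ReplicationController (abridged): the set size is an explicit target count,
# and the scheduler is free to co-locate several replicas on one node.
spec:
  replicas: 3
  selector:
    name: cassandra
  template: {}   # pod template omitted; see the full manifests in this example
---
# DaemonSet (abridged): no replicas field; one pod is run on every node that
# matches the (optional) nodeSelector.
spec:
  template:
    spec:
      nodeSelector:
        app: cassandra   # assumes nodes have been labeled app=cassandra
```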

An example use case: when deploying to the cloud, the expectation is that instances are ephemeral and might die at any time. Cassandra is built to replicate data across the cluster to provide redundancy, so that if an instance dies, the data stored on it does not, and the cluster can react by re-replicating the data to other running nodes.

Member: I think the above sentence applies whether you are using a replication controller or a daemonset, so it doesn't need to be here.

Contributor Author: I don't think it's quite the same - with an rc you could get multiple Cassandra nodes on a single Kubernetes node, which wouldn't give you data redundancy. I'm trying to stress the utility of using a daemonset in terms of redundancy and best practice.

Member: We don't want to encourage people to use a DaemonSet solely as a way to get at most one pod per node.

If that is all you need, then the short-term fix is to add a nodePort to your Pod (you don't have to use it, just pick one and that will force max one per node). Longer term, we plan to add less hacky support for expressing your spreading requirements.
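
A minimal sketch of the workaround described above, assuming the intended field is the container-level `hostPort` (the scheduler will not place two pods that claim the same host port on one node, so this caps such pods at one per node without a DaemonSet):

```yaml
# Hypothetical pod-template fragment for the ReplicationController variant.
# Claiming a fixed hostPort means at most one of these pods can be scheduled
# onto any given node, whether or not anything actually uses that port.
containers:
  - name: cassandra
    image: "gcr.io/google_containers/cassandra:v6"
    command:
      - /run.sh
    ports:
      - name: cql
        containerPort: 9042
        hostPort: 9042   # forces at most one such pod per node
```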

Contributor Author: @erictune It's not just about enforcing at most one pod per node. It's about having exactly one pod on every selected node, no more, no less. So, to achieve this by setting nodePort, I'd have to constantly update the replicationController by hand so that replicas == number of nodes. With a DaemonSet this is automatic. I don't see what the problem is with encouraging this - the purpose of a DaemonSet is to place a single pod on each selected node, correct? So, if you want data replication over your entire cluster, and want Cassandra allocated onto newly created instances automatically, is this not exactly what DaemonSet is designed to do?


A DaemonSet is designed to place a single pod on each node in the Kubernetes cluster. For data redundancy with Cassandra, let's create a daemonset to start our storage cluster:

Member: The case where I would use this is if I were running cassandra on bare metal, storing the data in a hostDir instead of an emptyDir, and wanted to ensure that a cassandra daemon started on all nodes (or all matching nodes) and reused any existing files in /var/lib/cassandra from the previous pod, thus saving the network cost of reconstruction when perfectly good data is still there. This gives you less downtime due to reconstruction after a node reboot/power-down.

If you want to change it to use a hostDir, and can verify that a cassandra node will come back after a reboot without reconstruction, then I will be happy to take this PR. If you don't have access to bare metal, simulating with PD is fine too.

Contributor Author: Actually, this is how I'm using it in production right now. I forgot to update the example to use hostDir.

In my case I use aws spot instances, which quite frequently die, so running Cassandra this way is the only way I get data redundancy and prevent losses due to unexpected instance combustion.

I'll update the PR.

Member: Cool use case with spot instances that I had not considered.

I guess there is still a possibility that all your spot instances go away at once, but that is uncommon enough that you live with that risk?

Contributor Author: @erictune I presume you could create a few "stable" instances, mark those as being in a different datacenter inside Cassandra, and then ask Cassandra to have two-datacenter reliability. Then the data would be replicated on both a spot instance and a stable instance at all times.


<!-- BEGIN MUNGE: EXAMPLE cassandra-daemonset.yaml -->

```yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    name: cassandra
  name: cassandra
spec:
  template:
    metadata:
      labels:
        name: cassandra
    spec:
      # Filter to specific nodes:
      # nodeSelector:
      #  app: cassandra
      containers:
        - command:
            - /run.sh
          env:
            - name: MAX_HEAP_SIZE
              value: 512M
            - name: HEAP_NEWSIZE
              value: 100M
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          image: "gcr.io/google_containers/cassandra:v6"
          name: cassandra
          ports:
            - containerPort: 9042
              name: cql
            - containerPort: 9160
              name: thrift
          resources:
            requests:
              cpu: 0.1
          volumeMounts:
            - mountPath: /cassandra_data
              name: data
      volumes:
        - name: data
          hostPath:
            path: /var/lib/cassandra
```

[Download example](cassandra-daemonset.yaml?raw=true)
<!-- END MUNGE: EXAMPLE cassandra-daemonset.yaml -->

Most of this daemon set definition is identical to the Cassandra pod and ReplicationController definitions above; it simply gives the daemon set a recipe to use when it creates new Cassandra pods, and targets all Cassandra nodes in the cluster. The parts that differ from a ReplicationController are the ```nodeSelector``` attribute, which allows the daemon set to target a specific subset of nodes, and the absence of a ```replicas``` attribute, due to the 1-to-1 node-to-pod relationship.
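
If only some of your nodes should run Cassandra, here is a sketch of the relevant change, assuming those nodes carry an `app=cassandra` label (applied with, for example, `kubectl label nodes <node-name> app=cassandra`):

```yaml
# Hypothetical variant: uncommenting the nodeSelector in the manifest above
# restricts the daemon set to nodes labeled app=cassandra.
spec:
  template:
    spec:
      nodeSelector:
        app: cassandra
```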

Create this daemonset:

```console
$ kubectl create -f examples/cassandra/cassandra-daemonset.yaml
```

Now if you list the pods in your cluster, and filter to the label ```name=cassandra```, you should see one cassandra pod for each node in your network:

```console
$ kubectl get pods -l="name=cassandra"
NAME READY STATUS RESTARTS AGE
cassandra-af6h5 1/1 Running 0 28s
cassandra-2jq1b 1/1 Running 0 32s
cassandra-34j2a 1/1 Running 0 29s
```

To prove that this all works, you can use the ```nodetool``` command to examine the status of the cluster. To do this, use the ```kubectl exec``` command to run ```nodetool``` in one of your Cassandra pods.

```console
$ kubectl exec -ti cassandra-af6h5 -- nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.244.0.5 74.09 KB 256 100.0% 86feda0f-f070-4a5b-bda1-2eeb0ad08b77 rack1
UN 10.244.4.2 32.45 KB 256 100.0% 0b1be71a-6ffb-4895-ac3e-b9791299c141 rack1
UN 10.244.3.3 51.28 KB 256 100.0% dafe3154-1d67-42e1-ac1d-78e7e80dce2b rack1
```

### tl; dr;

For those of you who are impatient, here is the summary of the commands we ran in this tutorial.
@@ -327,6 +419,9 @@ kubectl exec -ti cassandra -- nodetool status

# scale up to 4 nodes
kubectl scale rc cassandra --replicas=4

# create a daemonset to place a cassandra node on each kubernetes node
kubectl create -f examples/cassandra/cassandra-daemonset.yaml
```

### Seed Provider Source
44 changes: 44 additions & 0 deletions examples/cassandra/cassandra-daemonset.yaml
@@ -0,0 +1,44 @@
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    name: cassandra
  name: cassandra
spec:
  template:
    metadata:
      labels:
        name: cassandra
    spec:
      # Filter to specific nodes:
      # nodeSelector:
      #  app: cassandra
      containers:
        - command:
            - /run.sh
          env:
            - name: MAX_HEAP_SIZE
              value: 512M
            - name: HEAP_NEWSIZE
              value: 100M
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          image: "gcr.io/google_containers/cassandra:v6"
          name: cassandra
          ports:
            - containerPort: 9042
              name: cql
            - containerPort: 9160
              name: thrift
          resources:
            requests:
              cpu: 0.1
          volumeMounts:
            - mountPath: /cassandra_data
              name: data
      volumes:
        - name: data
          hostPath:
            path: /var/lib/cassandra
1 change: 1 addition & 0 deletions examples/examples_test.go
@@ -244,6 +244,7 @@ func TestExampleObjectSchemas(t *testing.T) {
"rbd-with-secret": &api.Pod{},
},
"../examples/cassandra": {
"cassandra-daemonset": &extensions.DaemonSet{},
"cassandra-controller": &api.ReplicationController{},
"cassandra-service": &api.Service{},
"cassandra": &api.Pod{},