# Add example of a cassandra daemonset. #16004
### Using a DaemonSet

In Kubernetes, a _[Daemon Set](../../docs/admin/daemons.md)_ distributes pods onto Kubernetes nodes, one-to-one. Like a _ReplicationController_, it has a selector query that identifies the members of its set. Unlike a _ReplicationController_, it has a node selector to limit which nodes are scheduled with the templated pods, and it replicates not toward a target number of pods, but by assigning a single pod to each targeted node.
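The node selector matches against ordinary node labels. As a sketch only (the `app=cassandra` label and the node names below are illustrative choices, not something this example requires), you could label a subset of nodes and then uncomment the `nodeSelector` in the manifest so the daemon set schedules pods onto those nodes alone:

```console
# Label the nodes that should run a Cassandra pod (node names are illustrative).
$ kubectl label nodes kubernetes-node-1 app=cassandra
$ kubectl label nodes kubernetes-node-2 app=cassandra

# Confirm which nodes carry the label.
$ kubectl get nodes -l app=cassandra
```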
An example use case: when deploying to the cloud, the expectation is that instances are ephemeral and might die at any time. Cassandra is built to replicate data across the cluster for redundancy, so that if an instance dies, the data stored on it does not, and the cluster can react by re-replicating the data to other running nodes.
A DaemonSet is designed to place a single pod on each node in the Kubernetes cluster, which makes it a natural fit for data redundancy with Cassandra. Let's create a daemon set to start our storage cluster:
> **Review comment (@erictune):** The case where I would use this is if I was using Cassandra on bare metal, and I was storing the data in a hostDir instead of an emptyDir, and I wanted to ensure that a Cassandra daemon started on all nodes (or all matching nodes), and that it used any existing files in /var/lib/cassandra from the previous pod, thus saving the network cost of reconstruction when perfectly good data is still there. This allows you to have less downtime due to reconstruction after a node reboot/power-down. If you want to change it to use a hostDir, and can verify that a Cassandra node will come back after reboot without reconstruction, then I will be happy to take this PR. If you don't have access to bare metal, simulating with PD is fine too.

> **Author reply:** Actually this is how I'm using it in production right now. I didn't remember to update the example to use hostDir. In my case I use AWS spot instances, which quite frequently die, so using Cassandra in this way is the only way I have data redundancy and prevent losses due to unexpected instance combustion. I'll update the PR.

> **Review comment (@erictune):** Cool use case with spot instances that I had not considered. I guess there is still a possibility that all your spot instances go away at once, but that is uncommon enough that you live with that risk?

> **Reply:** @erictune I presume you could create a few "stable" instances, mark those as being in a different datacenter inside Cassandra, and then ask Cassandra to have two-datacenter reliability. Then the data would be replicated on both a spot instance and a stable instance at all times.
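For context on the two-datacenter idea raised above: Cassandra's `NetworkTopologyStrategy` lets a keyspace pin a replica count per logical datacenter, so data always lands in both groups of nodes. This is a sketch only; the keyspace name and the `stable`/`spot` datacenter names are hypothetical and would have to match what your cluster's snitch actually reports:

```console
cqlsh> CREATE KEYSPACE app_data
   ... WITH replication = {'class': 'NetworkTopologyStrategy',
   ...                     'stable': 1, 'spot': 2};
```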
<!-- BEGIN MUNGE: EXAMPLE cassandra-daemonset.yaml -->
```yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    name: cassandra
  name: cassandra
spec:
  template:
    metadata:
      labels:
        name: cassandra
    spec:
      # Filter to specific nodes:
      # nodeSelector:
      #   app: cassandra
      containers:
        - command:
            - /run.sh
          env:
            - name: MAX_HEAP_SIZE
              value: 512M
            - name: HEAP_NEWSIZE
              value: 100M
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          image: "gcr.io/google_containers/cassandra:v6"
          name: cassandra
          ports:
            - containerPort: 9042
              name: cql
            - containerPort: 9160
              name: thrift
          resources:
            requests:
              cpu: 0.1
          volumeMounts:
            - mountPath: /cassandra_data
              name: data
      volumes:
        - name: data
          hostPath:
            path: /var/lib/cassandra
```
[Download example](cassandra-daemonset.yaml?raw=true)
<!-- END MUNGE: EXAMPLE cassandra-daemonset.yaml -->
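One point from the review discussion is worth spelling out: because the data volume is a `hostPath` rather than an `emptyDir`, the files under `/var/lib/cassandra` live on the node itself and survive pod deletion. A rough way to convince yourself (the pod name is taken from the sample output later in this document; yours will differ):

```console
# The data directory is backed by the node's /var/lib/cassandra via the hostPath volume.
$ kubectl exec cassandra-af6h5 -- ls /cassandra_data

# Delete the pod; the daemon set replaces it on the same node...
$ kubectl delete pod cassandra-af6h5

# ...and the replacement pod sees the same files, avoiding a full re-stream of the data.
$ kubectl get pods -l name=cassandra
```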
Most of this daemon set definition is identical to the Cassandra pod and ReplicationController definitions above; it simply gives the daemon set a recipe to use when it creates new Cassandra pods, and targets all Cassandra nodes in the cluster. The other parts that differentiate it from a ReplicationController are the ```nodeSelector``` attribute, which allows the daemon set to target a specific subset of nodes, and the lack of a ```replicas``` attribute, due to the one-to-one node-pod relationship.
Create this daemonset:

```console
$ kubectl create -f examples/cassandra/cassandra-daemonset.yaml
```
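Beyond listing pods, you can inspect the DaemonSet object itself, assuming the DaemonSet resource is enabled in your API server (`extensions/v1beta1` at the time of writing); `kubectl get` shows how many pods it has scheduled, and `kubectl describe` surfaces scheduling events. Output is omitted here since it varies by cluster:

```console
$ kubectl get daemonsets
$ kubectl describe daemonset cassandra
```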
Now, if you list the pods in your cluster and filter to the label ```name=cassandra```, you should see one Cassandra pod for each node in your cluster:
```console
$ kubectl get pods -l="name=cassandra"
NAME              READY     STATUS    RESTARTS   AGE
cassandra-af6h5   1/1       Running   0          28s
cassandra-2jq1b   1/1       Running   0          32s
cassandra-34j2a   1/1       Running   0          29s
```
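To confirm the one-pod-per-node placement directly, you can add `-o wide` to the same query, which appends a column showing the node each pod landed on (node names will vary by cluster):

```console
$ kubectl get pods -l name=cassandra -o wide
```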
To prove that this all works, you can use the ```nodetool``` command to examine the status of the cluster. To do this, use the ```kubectl exec``` command to run ```nodetool``` in one of your Cassandra pods.
```console
$ kubectl exec -ti cassandra-af6h5 -- nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.244.0.5   74.09 KB   256     100.0%            86feda0f-f070-4a5b-bda1-2eeb0ad08b77  rack1
UN  10.244.4.2   32.45 KB   256     100.0%            0b1be71a-6ffb-4895-ac3e-b9791299c141  rack1
UN  10.244.3.3   51.28 KB   256     100.0%            dafe3154-1d67-42e1-ac1d-78e7e80dce2b  rack1
```
### tl; dr;

For those of you who are impatient, here is the summary of the commands we ran in this tutorial.
```console
kubectl exec -ti cassandra -- nodetool status

# scale up to 4 nodes
kubectl scale rc cassandra --replicas=4

# create a daemonset to place a cassandra node on each kubernetes node
kubectl create -f examples/cassandra/cassandra-daemonset.yaml
```
### Seed Provider Source
The new file added by this PR, `examples/cassandra/cassandra-daemonset.yaml`:
```yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    name: cassandra
  name: cassandra
spec:
  template:
    metadata:
      labels:
        name: cassandra
    spec:
      # Filter to specific nodes:
      # nodeSelector:
      #   app: cassandra
      containers:
        - command:
            - /run.sh
          env:
            - name: MAX_HEAP_SIZE
              value: 512M
            - name: HEAP_NEWSIZE
              value: 100M
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          image: "gcr.io/google_containers/cassandra:v6"
          name: cassandra
          ports:
            - containerPort: 9042
              name: cql
            - containerPort: 9160
              name: thrift
          resources:
            requests:
              cpu: 0.1
          volumeMounts:
            - mountPath: /cassandra_data
              name: data
      volumes:
        - name: data
          hostPath:
            path: /var/lib/cassandra
```
> **Review comment:** I think the above sentence applies whether or not you are using a replication controller or a daemonset, so it doesn't need to be here.

> **Author reply:** I don't think it's quite the same: using an rc you could get multiple Cassandra nodes on a single Kubernetes node, which wouldn't give you data redundancy. I'm trying to stress the utility of using a daemonset in terms of redundancy and best practice.

> **Review comment (@erictune):** We don't want to encourage people to use a DaemonSet solely as a way to get at most one pod per node. If that is all you need, then the short-term fix is to add a nodePort to your Pod (you don't have to use it, just pick one, and that will force max one per node). Longer term, we plan to add less hacky support for expressing your spreading requirements.

> **Author reply:** @erictune It's not just about enforcing at most one pod per node. It's about having exactly one pod on each node: no more, no less. So, to achieve this by setting nodePort, I'd have to constantly and manually update the replication controller so that replicas == number of nodes. With a DaemonSet this is automatic. I don't see what the problem is with encouraging this: the purpose of DaemonSet is to place a single pod on each selected node, correct? So, if you want to have data replication over your entire cluster, and allocate Cassandra onto newly created instances automatically, is this not what DaemonSet is designed to do?