Merge pull request #930 from rax-maas/k8s-resource-rollout

roll out k8s resources - elasticsearch
rax-maas · Sep 15, 2022 · 839f41c
2 parents ec8e50c + 0e062c9
Showing 3 changed files with 218 additions and 0 deletions.
diff --git a/contrib/blueflood-k8s/README.md b/contrib/blueflood-k8s/README.md
@@ -0,0 +1,53 @@
+# Blueflood for Kubernetes
+
+Building on lessons learned from [blueflood-minikube](../blueflood-minikube), this project provides a fully deployable
+set of Kubernetes descriptors for Blueflood.
+
+Start by getting your [kubectl](https://kubernetes.io/docs/tasks/tools/) connected to the cluster you want to deploy to.
+
+This project uses [Kustomize](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/) as a light
+layer of management to reduce duplication in the plain K8s resources and to manage ConfigMaps. The resource files can
+be applied as they are or customized through Kustomize overlays.
+
+## General organization
+
+Following Kustomize's recommended layout, `base` contains the main set of resources. For organizational purposes,
+resources are grouped into files according to their service, such as `cassandra.yaml` or `elasticsearch.yaml`.
+
+All resources in a given yaml file have a label named `component` whose value is equal to the file base name. Therefore,
+all Cassandra resources are labeled with `component=cassandra`. This makes managing resources in the k8s cluster much
+simpler. This bears repeating: *all* resources here are assigned to a component. If a resource doesn't have a
+`component` label, it shouldn't exist.
+
+The general setup of a yaml file is as follows.
+
+- There's a group of pods to run the actual service, organized as either a [Deployment](
+  https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) or a
+  [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/). Both ensure a certain number of
+  pods stay running with your desired configuration. A StatefulSet also ensures that if a pod dies, its replacement will
+  be assigned the same persistent volume that the old one was using, which is very important for data stores.
+
+- There's a [Service](https://kubernetes.io/docs/concepts/services-networking/service/) with a
+  [ClusterIP](https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types) that
+  makes the set of pods available to other workloads in the cluster via DNS.
+
+- Pods expect a [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) to be present in the
+  cluster with all necessary config files. By convention, the ConfigMap should be named `<component>-config`. As an
+  example, the ConfigMap for Elasticsearch is named `elasticsearch-config`. The files from the ConfigMap are mounted as a
+  directory in the pods. A good way to find what config files go in the ConfigMap is to start an instance of the pod's
+  image and copy the default config files out of it. ConfigMaps are easy to manage with [Kustomize's configMapGenerator](
+  https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/#configmapgenerator).
+
+- Kubernetes supports mounting a ConfigMap directly into a container's file system as either a file or a directory of
+  files. Often, though, the ConfigMap isn't mounted directly into the service container due to ownership or access
+  issues. Instead, pods have a small [Volume](https://kubernetes.io/docs/concepts/storage/volumes/) dedicated to config
+  files. An [InitContainer](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) prepares the volume by
+  copying the files to it from the ConfigMap and setting appropriate ownership, file mode, etc.
+
+- If necessary, pods use a [PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) to
+  request persistent storage that will endure a pod restart. This is for pods that need non-ephemeral data, which
+  usually means they belong to a StatefulSet.
+
+- For groups of pods that form a cluster (Cassandra and Elasticsearch), there's a headless Service. This doesn't provide
+  a ClusterIP. Instead, it resolves in DNS to the IPs of all the cluster members, making it easier to do cluster
+  discovery.
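+
+As a minimal sketch of the ConfigMap convention above, an overlay could generate `elasticsearch-config` with Kustomize.
+The overlay path, the local `config/` directory, and the three file names are assumptions for illustration and are not
+part of this commit; substitute whatever files you copied out of the image. `disableNameSuffixHash` is one possible
+choice that keeps the generated name exactly `elasticsearch-config`.
+
+```yaml
+# overlays/example/kustomization.yaml (hypothetical location, not part of this commit)
+resources:
+  - ../../base
+configMapGenerator:
+  # Name follows the <component>-config convention that the pods expect.
+  - name: elasticsearch-config
+    files:
+      # Hypothetical local copies of the image's default config files.
+      - config/elasticsearch.yml
+      - config/jvm.options
+      - config/log4j2.properties
+generatorOptions:
+  # Assumption: keep the literal name instead of appending a content hash.
+  disableNameSuffixHash: true
+  # Give the generated ConfigMap the same component label as every other resource.
+  labels:
+    component: elasticsearch
+```
+
+With such an overlay, `kubectl apply -k overlays/example` would build and apply everything, and
+`kubectl get all -l component=elasticsearch` would list the Elasticsearch pods, Services, and StatefulSet by their
+`component` label.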
diff --git a/contrib/blueflood-k8s/base/elasticsearch.yaml b/contrib/blueflood-k8s/base/elasticsearch.yaml
@@ -0,0 +1,163 @@
+---
+# A headless service that just returns the IPs of the master Elasticsearch pods. This is for the master nodes to
+# discover each other at bootstrap time.
+apiVersion: v1
+kind: Service
+metadata:
+  name: es-seed-discovery
+  labels:
+    component: elasticsearch
+    role: master
+spec:
+  selector:
+    component: elasticsearch
+    role: master
+  ports:
+    - name: es-transport
+      port: 9300
+      protocol: TCP
+  clusterIP: None
+---
+# The StatefulSet of Elasticsearch master nodes. These are the nodes eligible to be master at any time and also the seed
+# nodes for other nodes to use when joining the cluster. For the moment, this cluster is just this set of nodes, any of
+# which can be elected master. We don't differentiate other node types:
+# https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: es-master
+  labels:
+    component: elasticsearch
+    role: master
+spec:
+  selector:
+    matchLabels:
+      component: elasticsearch
+      role: master
+  serviceName: es-seed-discovery
+  replicas: 3
+  template:
+    metadata:
+      labels:
+        component: elasticsearch
+        role: master
+    spec:
+      initContainers:
+        # Set kernel param required by ES
+        - name: set-max-map-count
+          image: busybox:1.27.2
+          command: ['sysctl', '-w', 'vm.max_map_count=262144']
+          securityContext:
+            privileged: true
+        # Prep data and config volumes. ES runs as user 1000, and it needs to write to its config directory, so the
+        # ConfigMap has to be copied into a volume.
+        - name: prep-volumes
+          image: busybox:1.27.2
+          command:
+            - sh
+            - -c
+            - |
+              cp -rL /elasticsearch-config-source/* /elasticsearch-config-final
+              chown -R 1000:1000 /elasticsearch-config-final
+              chown -R 1000:1000 /elasticsearch-data-pv
+              # ES 1.7 seems to require wide-open permissions on this dir; not sure why. Do away with this once we're
+              # sure we don't need 1.7 anymore.
+              chmod -R 777 /elasticsearch-data-pv
+          volumeMounts:
+            - name: elasticsearch-data
+              mountPath: /elasticsearch-data-pv
+            - name: config-source
+              mountPath: /elasticsearch-config-source
+            - name: elasticsearch-config
+              mountPath: /elasticsearch-config-final
+      containers:
+        - name: elasticsearch
+          # ES 1.7 requires mode 777 on the data mount and keeps logging errors about the disk high watermark and moving
+          # shards. Worse, the initial cluster can't form correctly because each pod only waits 30 seconds for
+          # discovery, then never tries again.
+          #image: elasticsearch:1.7
+          # Seems to be the right balance between upgrading and keeping things working with minimal code change.
+          image: elasticsearch:6.8.23
+          # Works for ingest, but querying doesn't work because of types; might be able to work around this in code with
+          # a config setting?
+          #image: elasticsearch:7.17.5
+          # Works great, but removed index types, so current Blueflood code doesn't work with it.
+          #image: elasticsearch:8.3.3
+          env:
+            # Tells Elasticsearch the directory to look for config files in. This path already exists in the image, so
+            # it's convenient to use.
+            - name: ES_PATH_CONF
+              value: "/usr/share/elasticsearch/config"
+          ports:
+            - containerPort: 9200
+              name: http
+            - containerPort: 9300
+              name: es-transport
+          startupProbe:
+            tcpSocket:
+              port: http
+            # ES seems to take well over a minute to start up. I'm not sure if there's something we can do to make that
+            # faster. The probe allows up to 18 attempts, 10 seconds apart (180 seconds in total).
+            periodSeconds: 10
+            failureThreshold: 18
+          livenessProbe:
+            tcpSocket:
+              port: http
+          readinessProbe:
+            tcpSocket:
+              port: http
+          resources:
+            limits:
+              cpu: "1"
+              memory: 4Gi
+            requests:
+              cpu: "0.5"
+              memory: 2Gi
+          volumeMounts:
+            # Persistent volume for long-term data storage
+            - name: elasticsearch-data
+              mountPath: /elasticsearch-data-pv
+            # Writable config volume, filled from the ConfigMap by the prep-volumes init container.
+            - name: elasticsearch-config
+              mountPath: /usr/share/elasticsearch/config
+      volumes:
+        - name: config-source
+          configMap:
+            name: elasticsearch-config
+  volumeClaimTemplates:
+    - metadata:
+        name: elasticsearch-data
+        labels:
+          component: elasticsearch
+      spec:
+        accessModes: [ "ReadWriteOnce" ]
+        resources:
+          requests:
+            storage: 1Gi
+    - metadata:
+        name: elasticsearch-config
+        labels:
+          component: elasticsearch
+      spec:
+        accessModes: [ "ReadWriteOnce" ]
+        resources:
+          requests:
+            storage: 10Mi
+---
+# And finally, the main service that makes Elasticsearch visible to other parts of the application.
+apiVersion: v1
+kind: Service
+metadata:
+  name: elasticsearch
+  labels:
+    component: elasticsearch
+spec:
+  selector:
+    component: elasticsearch
+  ports:
+    - name: http
+      port: 9200
+      protocol: TCP
+    - name: es-transport
+      port: 9300
+      protocol: TCP
diff --git a/contrib/blueflood-k8s/base/kustomization.yaml b/contrib/blueflood-k8s/base/kustomization.yaml
@@ -0,0 +1,2 @@
+resources:
+  - elasticsearch.yaml