add user guide (pingcap#187)
* add user guide

* split user guide into tidb-operator setup, tidb-cluster operation and troubleshooting

* add deployment recommendation for TiDB cluster

* update README for document links
tennix committed Nov 23, 2018
1 parent 907ed75 commit 9bd7590
Showing 5 changed files with 271 additions and 2 deletions.
14 changes: 12 additions & 2 deletions README.md
@@ -38,9 +38,19 @@ Read the [Roadmap](./ROADMAP.md).
 
 ## Quick start
 
-Read the [Deploy TiDB using Kubernetes on Your Laptop for development and testing](./docs/local-dind-tutorial.md), or follow a [tutorial](./docs/google-kubernetes-tutorial.md) to launch in Google Kubernetes Engine:
-
-[![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/pingcap/tidb-operator&tutorial=docs/google-kubernetes-tutorial.md)
+Choose one of the following tutorials:
+
+* [Deploy TiDB using Kubernetes on Your Laptop for development and testing](./docs/local-dind-tutorial.md)
+
+* [Deploy TiDB by launching a Google Kubernetes Engine cluster](./docs/google-kubernetes-tutorial.md):
+
+  [![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/pingcap/tidb-operator&tutorial=docs/google-kubernetes-tutorial.md)
+
+* [Deploy TiDB by launching an AWS EKS cluster](./docs/aws-eks-tutorial.md)
+
+## User guide
+
+Read the [user guide](./docs/user-guide.md).
 
 ## Contributing

139 changes: 139 additions & 0 deletions docs/operation-guide.md
@@ -0,0 +1,139 @@
# TiDB Cluster Operation Guide

TiDB Operator can manage multiple clusters in the same Kubernetes cluster. A cluster is identified by its `namespace` and `clusterName`: two clusters may share the same `namespace` or the same `clusterName`, but not both.

The default `clusterName` is `demo`, as defined in `charts/tidb-cluster/values.yaml`. The following variables are used in the rest of this document:

```shell
$ releaseName="tidb-cluster"
$ namespace="tidb"
$ clusterName="demo" # Make sure this matches the clusterName defined in charts/tidb-cluster/values.yaml
```

> **Note:** The rest of this document uses `values.yaml` to refer to `charts/tidb-cluster/values.yaml`.

## Deploy TiDB cluster

After TiDB Operator and Helm are deployed correctly, a TiDB cluster can be deployed with the following commands:

```shell
$ helm install charts/tidb-cluster --name=${releaseName} --namespace=${namespace}
$ kubectl get po -n ${namespace} -l app.kubernetes.io/name=tidb-operator
```

The default deployment doesn't set CPU and memory requests or limits for any of the pods, and the storage used is `local-storage` with minimal size. These settings allow the TiDB cluster to run on a small Kubernetes cluster, such as a DinD or default GKE cluster, for testing. For a production deployment, however, you should adjust the CPU, memory and storage resources according to the [recommendations](https://github.com/pingcap/docs/blob/master/op-guide/recommendation.md).

The resource limits should be equal to or greater than the resource requests. It is suggested to set limits and requests to the same values to obtain the [`Guaranteed` QoS class](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed).

For other settings, the variables in `values.yaml` are self-explanatory with comments. You can modify them according to your needs before installing the chart.
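
As an illustration, the same resource settings can be applied with `--set` overrides at install time instead of editing `values.yaml`. This is only a sketch: the value keys below (`tikv.resources.*`) are assumptions based on typical chart layouts, so check `charts/tidb-cluster/values.yaml` for the authoritative names.

```shell
# A hedged sketch: equal requests and limits for TiKV to obtain Guaranteed QoS.
# The key names are assumptions -- verify them against charts/tidb-cluster/values.yaml.
$ helm install charts/tidb-cluster --name=${releaseName} --namespace=${namespace} \
    --set tikv.resources.requests.cpu=8 \
    --set tikv.resources.requests.memory=16Gi \
    --set tikv.resources.limits.cpu=8 \
    --set tikv.resources.limits.memory=16Gi
```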

## Access TiDB cluster

By default the TiDB service is exposed as a [`NodePort`](https://kubernetes.io/docs/concepts/services-networking/service/#nodeport) service. You can change it to `ClusterIP`, which disables access from outside the cluster, or to [`LoadBalancer`](https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer) if the underlying Kubernetes cluster supports this kind of service.

By default the TiDB cluster is deployed with a randomly generated password. You can specify a password by setting `tidb.password` in `values.yaml` before deploying. Whether or not you specify the password, you can retrieve it from the `Secret`:

```shell
$ PASSWORD=$(kubectl get secret -n ${namespace} ${clusterName}-tidb -ojsonpath="{.data.password}" | base64 --decode)
$ echo ${PASSWORD}
$ kubectl get svc -n ${namespace} # check the available services
```

* Access inside of the Kubernetes cluster

When your application is deployed in the same Kubernetes cluster, you can access TiDB via the domain name `demo-tidb.tidb.svc` on port `4000`. Here `demo` is the `clusterName`, which can be modified in `values.yaml`, and the second `tidb` is the namespace you specified when running `helm install` to deploy the TiDB cluster.

* Access outside of the Kubernetes cluster

* Using kubectl port-forward

```shell
$ kubectl port-forward -n ${namespace} svc/${clusterName}-tidb 4000:4000 &>/tmp/portforward-tidb.log
$ mysql -h 127.0.0.1 -P 4000 -u root -p
```

* Using LoadBalancer

When you set `tidb.service.type` to `LoadBalancer` and the underlying Kubernetes cluster supports LoadBalancer services, a LoadBalancer is created for the TiDB service. You can access it via the external IP on port `4000`. Some cloud platforms support internal load balancers via service annotations; for example, you can add the annotation `cloud.google.com/load-balancer-type: Internal` to `tidb.service.annotations` to create an internal load balancer for TiDB on GKE.

* Using NodePort

You can access TiDB via any node's IP on the TiDB service's node port. The node port is the one mapped to `4000` in the `kubectl get svc` output, usually in the 30000-32767 range.
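
The sketch below pulls the access paths together, assuming the shell variables defined earlier and the default service port `4000`; the node IP lookup is only an illustration, so use any node address that is reachable from your client.

```shell
# Inside the Kubernetes cluster: connect through the service domain name.
$ mysql -h ${clusterName}-tidb.${namespace}.svc -P 4000 -u root -p

# Outside the cluster via NodePort: find the node port mapped to 4000,
# then connect through a reachable node IP.
$ nodePort=$(kubectl get svc -n ${namespace} ${clusterName}-tidb \
    -ojsonpath='{.spec.ports[?(@.port==4000)].nodePort}')
$ nodeIP=$(kubectl get nodes \
    -ojsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
$ mysql -h ${nodeIP} -P ${nodePort} -u root -p
```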

## Scale TiDB cluster

TiDB Operator fully supports horizontal scaling. For vertical scaling, however, if you're using local volumes for PD and TiKV, scaling up may leave pods pending when the node doesn't have enough resources, so vertical scaling is not recommended.

To scale the TiDB cluster in or out, modify the `replicas` of PD, TiKV and TiDB in the `values.yaml` file, and then run the following command:

```shell
$ helm upgrade ${releaseName} charts/tidb-cluster
```

To scale the TiDB cluster up or down, modify the CPU/memory limits and requests of PD, TiKV and TiDB in the `values.yaml` file, and then run the same command as above. (Note: this may fail when using local volumes.)
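
For example, horizontal scaling can also be done with `--set` overrides instead of editing `values.yaml`. This is a sketch only: the key names (`pd.replicas`, `tikv.replicas`, `tidb.replicas`) are assumptions, so confirm them in `charts/tidb-cluster/values.yaml`.

```shell
# A hedged sketch: scale out PD/TiKV/TiDB by overriding the replica counts.
$ helm upgrade ${releaseName} charts/tidb-cluster \
    --set pd.replicas=5 \
    --set tikv.replicas=5 \
    --set tidb.replicas=3
```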

## Upgrade TiDB cluster

Upgrading the TiDB cluster is similar to scaling it: change the `image` of PD, TiKV and TiDB to the desired versions in `values.yaml`, and then run the following command:

```shell
$ helm upgrade ${releaseName} charts/tidb-cluster
```
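
To follow the rolling upgrade, you can watch the cluster pods as they are recreated. The label selector below reuses `app.kubernetes.io/instance=${releaseName}` as in the cleanup commands later in this document; adjust it if your chart version labels pods differently.

```shell
# Watch pods being recreated with the new image during the rolling upgrade.
$ kubectl get po -n ${namespace} -l app.kubernetes.io/instance=${releaseName} -o wide --watch
```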

## Destroy TiDB cluster

To destroy the TiDB cluster, run the following command:

```shell
$ helm delete ${releaseName} --purge
```

The above command only deletes the running pods; the data is persisted. If you do not need the data anymore, run the following commands to clean it up:

```shell
$ kubectl delete pvc -n ${namespace} -l app.kubernetes.io/instance=${releaseName},app.kubernetes.io/managed-by=tidb-operator
$ kubectl get pv -l app.kubernetes.io/namespace=${namespace},app.kubernetes.io/managed-by=tidb-operator,app.kubernetes.io/instance=${releaseName} -o name | xargs -I {} kubectl patch {} -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
```

> **Note:** The above commands delete the data permanently. Think twice before executing them.

## Monitor

The TiDB cluster is monitored with Prometheus and Grafana. When the TiDB cluster is created, Prometheus and Grafana pods are created and configured to scrape and visualize its metrics.

By default the monitoring data is not persisted; if the monitor pod is killed for some reason, the data is lost. This can be avoided by setting `monitor.persistent` to `true` in the `values.yaml` file.

You can view the dashboard using `kubectl port-forward`:

```shell
$ kubectl port-forward -n ${namespace} svc/${clusterName}-grafana 3000:3000 &>/tmp/portforward-grafana.log
```

Then open your browser at http://localhost:3000. The default username and password are both `admin`.

The Grafana service is exposed as `NodePort` by default. You can change it to `LoadBalancer` if the underlying Kubernetes cluster has load balancer support, and then view the dashboard via the load balancer endpoint.
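
As a sketch of the LoadBalancer case, assuming the Grafana service is named `${clusterName}-grafana` as in the port-forward command above, you can look up the external endpoint with:

```shell
# The address appears under either `ip` or `hostname`, depending on the platform.
$ kubectl get svc -n ${namespace} ${clusterName}-grafana \
    -ojsonpath='{.status.loadBalancer.ingress[0].ip}{"\n"}{.status.loadBalancer.ingress[0].hostname}{"\n"}'
```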

## Backup

Currently, TiDB Operator supports two kinds of backup: full backup via [Mydumper](https://github.com/maxbube/mydumper) and incremental backup via binlog.

### Full backup

Full backups can be run periodically, like a crontab job. Currently, a full backup requires a PersistentVolume; the backup job creates a PVC to store the backup data.

To create a full backup job, modify the `fullbackup` section in the `values.yaml` file:

* `create` must be set to `true`
* Set `storageClassName` to the PV storage class name used for backup data
* `schedule` takes the [Cron](https://en.wikipedia.org/wiki/Cron) format
* `user` and `password` must be set to a user that has permission to read the databases to be backed up

If the TiDB cluster is running on GKE, the backup data can be uploaded to a GCS bucket. A bucket name and a base64-encoded service account credential with bucket read/write access must be provided. The comments in `values.yaml` are self-explanatory for GCP backup.
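
As an illustration, the same settings can be applied with `--set` overrides. This is a sketch: the key paths under `fullbackup` are assumptions derived from the option names listed above, so check the comments in `values.yaml` for the authoritative names.

```shell
# A hedged sketch: enable a daily full backup at midnight.
$ helm upgrade ${releaseName} charts/tidb-cluster \
    --set fullbackup.create=true \
    --set fullbackup.storageClassName=local-storage \
    --set fullbackup.schedule="0 0 * * *" \
    --set fullbackup.user=root \
    --set fullbackup.password=${PASSWORD}
```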

### Incremental backup

To enable incremental backup, set `binlog.pump.create` and `binlog.drainer.create` to `true`. By default the incremental backup data is stored in protobuf format in a PV. You can change `binlog.drainer.destDBType` from `pb` to `mysql` or `kafka` and configure the corresponding downstream.
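
For example, a sketch that enables incremental backup with a MySQL downstream via `--set` overrides; the downstream connection settings beyond `destDBType` are omitted here because their key names depend on the chart, so consult the `binlog.drainer` comments in `values.yaml`.

```shell
# A hedged sketch: turn on Pump and Drainer and replicate binlog to MySQL.
$ helm upgrade ${releaseName} charts/tidb-cluster \
    --set binlog.pump.create=true \
    --set binlog.drainer.create=true \
    --set binlog.drainer.destDBType=mysql
```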

## Restore

Currently, tidb-operator only supports restoring from a full backup stored in a GCS bucket. The comments in the `restore` section of `values.yaml` should be sufficient documentation.
81 changes: 81 additions & 0 deletions docs/setup.md
@@ -0,0 +1,81 @@
# TiDB Operator Setup

## Requirements

Before deploying the TiDB Operator, make sure the following requirements are satisfied:

* Kubernetes v1.10 or later
* [DNS addons](https://kubernetes.io/docs/tasks/access-application-cluster/configure-dns-cluster/)
* [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)
* [RBAC](https://kubernetes.io/docs/admin/authorization/rbac) enabled (optional)
* [Helm](https://helm.sh) v2.8.2 or later

> **Note:** Although TiDB Operator can use network volumes to persist TiDB data, it is highly recommended to set up [local volumes](https://kubernetes.io/docs/concepts/storage/volumes/#local) for better performance. Because TiDB already replicates its data, network volumes would add extra replicas, which is redundant.

## Kubernetes

TiDB Operator runs on top of a Kubernetes cluster. You can use one of the methods listed [here](https://kubernetes.io/docs/setup/pick-right-solution/) to set up a Kubernetes cluster, as long as its version is v1.10 or later. If you want to use AWS, GKE or your local machine, there are quick start tutorials:

* [Local DinD tutorial](./local-dind-tutorial.md)
* [Google GKE tutorial](./google-kubernetes-tutorial.md)
* [AWS EKS tutorial](./aws-eks-tutorial.md)

If you want to use a different environment, a proper DNS addon must be installed in the Kubernetes cluster. You can follow the [official documentation](https://kubernetes.io/docs/tasks/access-application-cluster/configure-dns-cluster/) to set up a DNS addon.

TiDB Operator uses [PersistentVolumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) to persist TiDB cluster data (including the database, monitoring data and backup data), so the Kubernetes cluster must provide at least one kind of persistent volume. For better performance, local SSD persistent volumes are recommended. You can follow [these steps](#local-persistent-volume) to automatically provision local persistent volumes.

It is suggested to enable [RBAC](https://kubernetes.io/docs/admin/authorization/rbac) in the Kubernetes cluster. Otherwise, you may want to set `rbac.create` to `false` in the `values.yaml` of both the tidb-operator and tidb-cluster charts.

Because TiDB by default uses up to 40960 file descriptors, the ulimit of the [worker nodes](https://access.redhat.com/solutions/61334) and their [Docker daemons](https://docs.docker.com/engine/reference/commandline/dockerd/#default-ulimit-settings) must be configured to a value greater than 40960. Otherwise you have to change TiKV's `max-open-files` to match your worker node's `ulimit -n` in the configuration file `charts/tidb-cluster/templates/config/_tikv-config.tpl`, which impacts TiDB performance.
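
A quick sanity check of both limits, run on a worker node (a sketch; how to raise the limits depends on your distribution and Docker configuration):

```shell
$ ulimit -n                                   # limit of the current shell on the worker node
$ docker run --rm busybox sh -c 'ulimit -n'   # default limit the Docker daemon gives to containers
# Both values should be greater than 40960.
```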

## Helm

You can follow the official Helm [documentation](https://helm.sh) to install Helm in your Kubernetes cluster. The following instructions are listed here for quick reference:

1. Install the Helm client

```shell
$ curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
```

Or, on macOS, you can install Helm with Homebrew: `brew install kubernetes-helm`.

2. Install the Helm server (Tiller)

```shell
$ kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/master/manifests/tiller-rbac.yaml
$ helm init --service-account=tiller --upgrade
$ kubectl get po -n kube-system -l name=tiller # make sure tiller pod is running
```

If `RBAC` is not enabled for the Kubernetes cluster, then `helm init --upgrade` should be enough.

## Local Persistent Volume

It is recommended to format local disks with the ext4 filesystem.

Mount the local SSD disks of your Kubernetes nodes under subdirectories of `/mnt/disks`. For example, if your data disk is `/dev/nvme0n1`, you can format and mount it with the following commands:

```shell
$ sudo mkdir -p /mnt/disks/disk0
$ sudo mkfs.ext4 /dev/nvme0n1
$ sudo mount -t ext4 -o nodelalloc /dev/nvme0n1 /mnt/disks/disk0
```

To mount the disks automatically when the operating system boots, add the corresponding mount entries to `/etc/fstab`.
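
For example, a sketch of an `/etc/fstab` entry for the disk mounted above; the UUID is looked up with `blkid`, and the mount options mirror the `mount` command:

```shell
$ echo "UUID=$(sudo blkid -s UUID -o value /dev/nvme0n1) /mnt/disks/disk0 ext4 nodelalloc,defaults 0 2" \
    | sudo tee -a /etc/fstab
```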

After mounting all data disks on the Kubernetes nodes, you can deploy [local-volume-provisioner](https://github.com/kubernetes-incubator/external-storage/tree/master/local-volume) to automatically provision the mounted disks as local PersistentVolumes:

```shell
$ kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/master/manifests/local-dind/local-volume-provisioner.yaml
$ kubectl get po -n kube-system -l app=local-volume-provisioner
$ kubectl get pv | grep local-storage
```

## Install TiDB Operator

```shell
$ kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/master/manifests/crd.yaml
$ helm install charts/tidb-operator --name=tidb-operator --namespace=tidb-admin
$ kubectl get po -n tidb-admin -l app.kubernetes.io/name=tidb-operator
```
26 changes: 26 additions & 0 deletions docs/troubleshooting.md
@@ -0,0 +1,26 @@
# Troubleshooting

## Some pods are pending for a long time

When a pod is pending, its required resources cannot be satisfied. The most common causes are:

* Insufficient CPU, memory or storage

Check the detailed info of the pod with:

```shell
$ kubectl describe po -n <ns> <pod-name>
```

When this happens, reduce the resource requests of the TiDB cluster and then use `helm` to upgrade the cluster. If the storage request is larger than any of the available volumes, you also have to delete the pending pod and the corresponding pending PVC (see the sketch at the end of this document).

* The storage class does not exist or no PV is available

You can check this by:

```shell
$ kubectl get pvc -n <ns>
$ kubectl get pv | grep <storage-class-name> | grep Available
```

When this happens, change the `storageClassName` and then use `helm` to upgrade the cluster. After that, delete the pending pods and the corresponding pending PVCs, and wait for the new pods and PVCs to be created.
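
For either case above, a sketch of the cleanup step; `<ns>`, `<pod-name>` and `<pvc-name>` are placeholders taken from the `kubectl get` output:

```shell
$ kubectl delete po -n <ns> <pod-name>
$ kubectl delete pvc -n <ns> <pvc-name>
```
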
13 changes: 13 additions & 0 deletions docs/user-guide.md
@@ -0,0 +1,13 @@
# TiDB Operator User Guide

For a quick start, please refer to one of the following tutorials:

* [Local DinD tutorial](./local-dind-tutorial.md)
* [Google GKE tutorial](./google-kubernetes-tutorial.md)
* [AWS EKS tutorial](./aws-eks-tutorial.md)

If you are already familiar with [Kubernetes](https://kubernetes.io) and [TiDB](https://pingcap.com/docs), the following docs can be helpful for managing TiDB clusters with TiDB Operator:

* [TiDB Operator Setup](./setup.md)
* [TiDB Cluster Operation Guide](./operation-guide.md)
* [Troubleshooting](./troubleshooting.md)
