|
| 1 | +# Deploying Multi-Cluster-App-Wrapper Controller |
| 2 | +Follow the instructions below to deploy the Multi-Cluster-App-Wrapper controller in an existing Kubernetes cluster: |
| 3 | + |
| 4 | +## Pre-Reqs |
| 5 | +### - Cluster running Kubernetes v1.10 or higher. |
| 6 | +``` |
| 7 | +kubectl version |
| 8 | +``` |
| 9 | +### - Access to the `kube-system` namespace. |
| 10 | +``` |
| 11 | +kubectl get pods -n kube-system |
| 12 | +``` |
| 13 | +### - Install the Helm Package Manager |
| 14 | +Install the Helm Client on your local machine and the Helm Server on your Kubernetes cluster. Helm installation documentation is [here](https://docs.helm.sh/using_helm/#installing-helm). |
| 15 | +After you install Helm, you can list the installed Helm packages with the following command: |
| 16 | +``` |
| 17 | +helm list |
| 18 | +``` |
| 19 | + |
| 20 | +### Determine if the cluster has enough resources for installing the Helm chart for the Multi-Cluster-App-Dispatcher. |
| 21 | + |
| 22 | +The default memory resource demand for the multi-cluster-app-dispatcher controller is `2G`. If your cluster is a small installation such as Minikube, you will want to adjust the Helm installation resource requests accordingly. |
| 23 | + |
| 24 | + |
| 25 | +To list available compute nodes on your cluster enter the following command: |
| 26 | +``` |
| 27 | +kubectl get nodes |
| 28 | +``` |
| 29 | +For example: |
| 30 | +``` |
| 31 | +$ kubectl get nodes |
| 32 | + NAME STATUS ROLES AGE VERSION |
| 33 | + minikube Ready master 91d v1.10.0 |
| 34 | +``` |
| 35 | + |
| 36 | +To find the available resources in your cluster, inspect each node listed in the command output above with the following command: |
| 37 | +``` |
| 38 | +$ kubectl describe node <node_name> |
| 39 | +``` |
| 40 | +For example: |
| 41 | +``` |
| 42 | +$ kubectl describe node minikube |
| 43 | +... |
| 44 | +Name: minikube |
| 45 | +Roles: master |
| 46 | +Labels: beta.kubernetes.io/arch=amd64 |
| 47 | + beta.kubernetes.io/os=linux |
| 48 | +... |
| 49 | +Capacity: |
| 50 | + cpu: 2 |
| 51 | + ephemeral-storage: 16888216Ki |
| 52 | + hugepages-2Mi: 0 |
| 53 | + memory: 2038624Ki |
| 54 | + pods: 110 |
| 55 | +Allocatable: |
| 56 | + cpu: 2 |
| 57 | + ephemeral-storage: 15564179840 |
| 58 | + hugepages-2Mi: 0 |
| 59 | + memory: 1936224Ki |
| 60 | + pods: 110 |
| 61 | +... |
| 62 | +Allocated resources: |
| 63 | + (Total limits may be over 100 percent, i.e., overcommitted.) |
| 64 | + Resource Requests Limits |
| 65 | + -------- -------- ------ |
| 66 | + cpu 1915m (95%) 1 (50%) |
| 67 | + memory 1254Mi (66%) 1364Mi (72%) |
| 68 | +Events: <none> |
| 69 | +
|
| 70 | +``` |
| 71 | +In the example above, there is only one node (`minikube`) in the cluster, and most of its memory is already requested (`1,254Mi` requested out of `1936224Ki`, roughly `1,890Mi`, of allocatable capacity). This leaves less than `700Mi` available for new pod deployments in the cluster. Since the default memory demand for the Enhanced QueueJob Controller pod is `2G`, the cluster has insufficient memory to deploy the controller. The instruction notes below show how to override the defaults according to the available capacity in your cluster. |
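
The capacity check described above can be sketched as plain shell arithmetic. This is only an illustration: the three numbers are copied by hand from the `kubectl describe node minikube` example, the variable names are made up for this sketch, and `2048Mi` is taken from the default memory request listed in the chart configuration table.

```shell
# Values copied from the `kubectl describe node minikube` example above.
allocatable_ki=1936224   # Allocatable memory, in Ki
requested_mi=1254        # Memory already requested by running pods, in Mi
needed_mi=2048           # Default controller memory request (2048Mi)

# Convert Ki to Mi (integer division rounds down), then subtract current requests.
allocatable_mi=$(( allocatable_ki / 1024 ))
free_mi=$(( allocatable_mi - requested_mi ))

if [ "$free_mi" -ge "$needed_mi" ]; then
  verdict="fits"
else
  verdict="does not fit"
fi
echo "allocatable=${allocatable_mi}Mi free=${free_mi}Mi needed=${needed_mi}Mi: ${verdict}"
```

With the example values, the free memory comes out at `636Mi`, well below the `2048Mi` default, so the resource requests need to be lowered before installing.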
| 72 | + |
| 73 | +## Installation Instructions |
| 74 | +### 1. Download the GitHub project. |
| 75 | +Download this GitHub project to your local machine. |
| 76 | +``` |
| 77 | +git clone -b queuejob-dispatcher --single-branch git@github.ibm.com:ARMS/extended-queuejob.git |
| 78 | +``` |
| 79 | +### 2. Navigate to the Helm deployment directory. |
| 80 | +``` |
| 81 | +cd extended-queuejob/contrib/DLaaS/deployment |
| 82 | +``` |
| 83 | + |
| 84 | +### 3. Run the installation using Helm. |
| 85 | +Install the Multi-Cluster-App-Dispatcher Controller using the commands below. The `--wait` parameter makes the Helm command block until all pods of the Helm chart are in the `Running` state or the timeout expires (*typically 300 seconds by default*). |
| 86 | + |
| 87 | + |
| 88 | +Before submitting the command below you should ensure you have enough resources in your cluster to deploy the helm chart (*see Pre-Reqs section above*). If you do not have enough compute resources in your cluster you can adjust the resource request via the command line. See an example in the `Note` below. |
| 89 | + |
| 90 | +All Helm parameters are described in the table below. |
| 91 | +#### 3.a Start the Multi-Cluster-App-Dispatcher Controller on All Target Deployment Clusters (*Agent Mode*). |
| 92 | +__Agent Mode__: Using Helm, install and set up the Multi-Cluster-App-Dispatcher Controller (XQJ) in *Agent Mode* on each cluster that will orchestrate the resources defined within an XQJ. *Agent Mode* is the default mode when deploying the XQJ controller. |
| 93 | +``` |
| 94 | +helm install kube-arbitrator --namespace kube-system --wait --set image.repository=<image repository and name> --set image.tag=<image tag> --set imagePullSecret.name=<Name of image pull kubernetes secret> --set imagePullSecret.password=<REPLACE_WITH_REGISTRY_TOKEN_GENERATED_IN_PREREQs_STAGE1_REGISTRY.d)> --set localConfigName=<Local Kubernetes Config File for Current Cluster> --set volumes.hostPath=<Host_Path_location_of_local_Kubernetes_config_file> |
| 95 | +``` |
| 96 | + |
| 97 | +For example (*Assuming the default for `image.repository`, `image.tag`*): |
| 98 | +``` |
| 99 | +helm install kube-arbitrator --namespace kube-system |
| 100 | +``` |
| 101 | +or |
| 102 | +``` |
| 103 | +helm install kube-arbitrator --namespace kube-system --wait --set imagePullSecret.name=extended-queuejob-controller-registry-secret --set imagePullSecret.password=eyJhbGc...y8gJNcpnipUu0 --set image.pullPolicy=Always --set localConfigName=config_110 --set volumes.hostPath=/etc/kubernetes |
| 104 | +``` |
| 105 | +NOTE: You can adjust the cpu and memory demands of the deployment with command line overrides. For example: |
| 106 | + |
| 107 | +``` |
| 108 | +helm install kube-arbitrator --namespace kube-system --wait --set resources.requests.cpu=1000m --set resources.requests.memory=1024Mi --set resources.limits.cpu=1000m --set resources.limits.memory=1024Mi --set image.repository=k8s-spark-mcm-dispatcher-master-1:8443/xqueuejob-controller --set image.tag=v1.11 --set image.pullPolicy=Always |
| 109 | +``` |
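
As an alternative to a long list of `--set` flags, the same overrides can be kept in a values file and passed with Helm's `-f` option. The sketch below only writes the file. The file name `override-values.yaml` is an arbitrary choice for this example, and the keys mirror the `resources.*` parameters from the command above.

```shell
# Write the resource overrides from the NOTE above to a values file.
# The file name override-values.yaml is an arbitrary choice for this sketch.
cat > override-values.yaml <<'EOF'
resources:
  requests:
    cpu: 1000m
    memory: 1024Mi
  limits:
    cpu: 1000m
    memory: 1024Mi
EOF

# Then pass the file to Helm instead of repeating --set flags (not run here):
#   helm install kube-arbitrator --namespace kube-system --wait -f override-values.yaml
```

Keeping the overrides in a file makes it easier to reuse the same settings on every cluster and to track them in version control.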
| 110 | +#### 3.b Start the Multi-Cluster-App-Dispatcher Controller on the Controller Cluster (*Dispatcher Mode*). |
| 111 | +__Dispatcher Mode__: Using Helm, install and set up the Multi-Cluster-App-Dispatcher Controller (XQJ) in *Dispatcher Mode* on the control cluster that will dispatch the XQJ to an *Agent* cluster. |
| 112 | + |
| 113 | + |
| 114 | +Install the Multi-Cluster-App-Dispatcher Controller in *Dispatcher Mode* with the following command: |
| 115 | +``` |
| 116 | +helm install kube-arbitrator --namespace kube-system --wait --set image.repository=<image repository and name> --set image.tag=<image tag> --set configMap.name=<Config> --set configMap.dispatcherMode='"true"' --set configMap.agentConfigs=agent101config:uncordon --set volumes.hostPath=<Host_Path_location_of_all_agent_Kubernetes_config_files> |
| 117 | +``` |
| 118 | + |
| 119 | +For example: |
| 120 | +``` |
| 121 | +helm install kube-arbitrator --namespace kube-system --wait --set image.repository=tonghoon --set image.tag=both --set configMap.name=xqj-deployer --set configMap.dispatcherMode='"true"' --set configMap.agentConfigs=agent101config:uncordon --set volumes.hostPath=/etc/kubernetes |
| 122 | +``` |
| 123 | +### Chart configuration |
| 124 | + |
| 125 | +The following table lists the configurable parameters of the helm chart and their default values. |
| 126 | + |
| 127 | +| Parameter | Description | Default | Sample values | |
| 128 | +| ----------------------- | ------------------------------------ | ------------- | ------------------------------------------------ | |
| 129 | +| `configMap.agentConfigs` | *For every Agent Cluster, separated by commas (,):* Name of the *agent* config file _:_ dispatching mode for the _*Agent Cluster*_. Note: only the dispatching mode `uncordon`, which indicates that the XQJ controller is allowed to dispatch jobs to the _*Agent Cluster*_, is supported. | <_No default for agent config file_>:`uncordon` | `agent101config:uncordon,agent110config:uncordon` | |
| 130 | +| `configMap.dispatcherMode` | Whether the XQJ Controller should be launched in Dispatcher mode or not | `false` | `true` | |
| 131 | +| `configMap.name` | Name of the Kubernetes *ConfigMap* resource to configure the Enhanced QueueJob Controller | | `xqj-deployer` | |
| 132 | +| `deploymentName` | Name of XQJ Controller Deployment Object | `xqueuejob-controller` | `my-xqj-controller` | |
| 133 | +| `image.pullPolicy` | Policy that dictates when the specified image is pulled | `Always` | `Never` | |
| 134 | +| `imagePullSecret.name` | Kubernetes secret name to store password for image registry | | `extended-queuejob-controller-registry-secret` | |
| 135 | +| `imagePullSecret.password` | Image registry pull secret password | | `eyJhbGc...y8gJNcpnipUu0` | |
| 136 | +| `imagePullSecret.username` | Image registry pull user name | `iamapikey` | `token` | |
| 137 | +| `image.repository` | Name of repository containing XQueueJob Controller image | `registry.stage1.ng.bluemix.net/ibm/kube-arbitrator` | `my-repository` | |
| 138 | +| `image.tag` | Tag of desired image within repository | `latest` | `my-image` | |
| 139 | +| `namespace` | Namespace in which XQJ Controller Deployment is created | `kube-system` | `my-namespace` | |
| 140 | +| `nodeSelector.hostname` | Host Name field for XQJ Controller Pod Node Selector | | `example-host` | |
| 141 | +| `replicaCount` | Number of replicas of XQJ Controller Deployment | 1 | 2 | |
| 142 | +| `resources.limits.cpu` | CPU Limit for XQJ Controller Deployment | `2000m` | `1000m` | |
| 143 | +| `resources.limits.memory` | Memory Limit for XQJ Controller Deployment | `2048Mi` | `1024Mi` | |
| 144 | +| `resources.requests.cpu` | CPU Request for XQJ Controller Deployment (must be less than or equal to CPU Limit) | `2000m` | `1000m` | |
| 145 | +| `resources.requests.memory` | Memory Request for XQJ Controller Deployment (must be less than or equal to Memory Limit) | `2048Mi` | `1024Mi` | |
| 146 | +| `serviceAccount` | Name of service account of XQJ Controller | `xqueuejob-controller` | `my-service-account` | |
| 147 | +| `volumes.hostPath` | Full path on the host location where the `localConfigName` file is stored | | `/etc/kubernetes` | |
| 148 | + |
| 149 | + |
| 150 | +### 5. Verify the installation. |
| 151 | +List the Helm installation. The `STATUS` should be `DEPLOYED`. |
| 152 | + |
| 153 | +NOTE: The `--wait` parameter in the helm installation command from *step #3* above ensures all resources are deployed and running if the `STATUS` indicates `DEPLOYED`. Installing the Helm Chart without the `--wait` parameter does not ensure all resources are successfully running but may still show a `Status` of `Deployed`. |
| 154 | + |
| 155 | +A `STATUS` value of `FAILED` indicates that not all resources were created and running before the timeout occurred. Usually this means that pod creation failed because the cluster has insufficient resources to create the Multi-Cluster-App-Dispatcher Controller pod. Example instructions on how to adjust the resources requested for the Helm chart are described in the `NOTE` comment of *step #3* above. |
| 156 | +``` |
| 157 | +$ helm list |
| 158 | +NAME REVISION UPDATED STATUS CHART NAMESPACE |
| 159 | +opinionated-antelope 1 Mon Jan 21 00:52:39 2019 DEPLOYED kube-arbitrator-0.1.0 kube-system |
| 160 | +
|
| 161 | +``` |
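
The status check can also be scripted. The sketch below is a rough illustration and not part of the chart: `release_status` is a hypothetical helper that scans `helm list` text for the row of a given release and prints its `DEPLOYED` or `FAILED` status word. It is run here against the sample output above rather than a live cluster.

```shell
# Sample `helm list` output copied from the example above.
sample_output='NAME                  REVISION  UPDATED                   STATUS    CHART                  NAMESPACE
opinionated-antelope  1         Mon Jan 21 00:52:39 2019  DEPLOYED  kube-arbitrator-0.1.0  kube-system'

# release_status NAME: read `helm list` text on stdin and print the STATUS
# word (DEPLOYED or FAILED) found on the row for the given release name.
release_status() {
  awk -v name="$1" '$1 == name {
    for (i = 2; i <= NF; i++) if ($i == "DEPLOYED" || $i == "FAILED") { print $i; exit }
  }'
}

status=$(printf '%s\n' "$sample_output" | release_status opinionated-antelope)
if [ "$status" = "DEPLOYED" ]; then
  echo "release is deployed"
else
  echo "release is not deployed (status: ${status:-unknown})"
fi
```

Against a real cluster, the same helper could be fed from `helm list` directly, for example `helm list | release_status <deployment_name>`.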
| 162 | + |
| 163 | +Verify that the new resource type is available by listing the Extended QueueJobs. |
| 164 | +```bash |
| 165 | +kubectl get xqueuejobs |
| 166 | +``` |
| 167 | + |
| 168 | +Since no `xqueuejobs` have been deployed to your cluster yet, you should receive the message `No resources found.` for `xqueuejobs`, which confirms that your cluster now has `xqueuejobs` enabled. Use the [tutorial](../doc/usage/tutorial.md) to deploy an example `xqueuejob`. |
| 169 | + |
| 170 | +### 6. Remove the Multi-Cluster-App-Dispatcher Controller from your cluster. |
| 171 | + |
| 172 | +List the deployed Helm charts and identify the name of the Multi-Cluster-App-Dispatcher Controller installation. |
| 173 | +```bash |
| 174 | +helm list |
| 175 | +``` |
| 176 | +For example: |
| 177 | +``` |
| 178 | +$ helm list |
| 179 | +NAME REVISION UPDATED STATUS CHART NAMESPACE |
| 180 | +opinionated-antelope 1 Mon Jan 21 00:52:39 2019 DEPLOYED kube-arbitrator-0.1.0 kube-system |
| 181 | +
|
| 182 | +``` |
| 183 | +Delete the Helm deployment. |
| 184 | +``` |
| 185 | +helm delete <deployment_name> |
| 186 | +``` |
| 187 | +For example: |
| 188 | +```bash |
| 189 | +helm delete opinionated-antelope |
| 190 | +``` |