# CodeFlare on OpenShift Container Platform (OCP)

A few installation deployment targets are provided below.

- [Ray Cluster Using Operator on OpenShift](#openshift-ray-cluster-operator)
- [Ray Cluster on OpenShift](#openshift-cluster)
- [Ray Cluster on OpenShift for Jupyter](#jupyter)
## OpenShift Ray Cluster Operator

To deploy a Ray cluster via the Kubernetes operator, follow the [Ray Operator documentation](https://docs.ray.io/en/master/cluster/kubernetes.html?highlight=operator#the-ray-kubernetes-operator).
## OpenShift Cluster

### Dispatch Ray Cluster on OpenShift

#### Prerequisites

- Access to an OpenShift cluster
- Python 3.8+

We recommend installing Python 3.8.7 using
[pyenv](https://github.com/pyenv/pyenv).

<p> </p>

#### Setup

1. Install CodeFlare

Install from PyPI:
```bash
pip3 install --upgrade codeflare
```

Alternatively, you can build and install locally with:
```bash
git clone https://github.com/project-codeflare/codeflare.git
cd codeflare
pip3 install --upgrade pip
pip3 install -r requirements.txt
pip3 install .
```

<p> </p>

2. Create a Ray cluster (see the [Ray on Kubernetes docs](https://docs.ray.io/en/master/cluster/cloud.html#kubernetes))

    Assuming OpenShift cluster access from the prerequisites.

    a) Create a namespace

    ```
    $ oc create namespace codeflare
    namespace/codeflare created
    $
    ```

    b) Bring up the Ray cluster

    ```
    $ ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml
    Cluster: default

    Checking Kubernetes environment settings
    2021-02-09 06:40:09,612 INFO config.py:169 -- KubernetesNodeProvider: using existing namespace 'ray'
    2021-02-09 06:40:09,671 INFO config.py:202 -- KubernetesNodeProvider: autoscaler_service_account 'autoscaler' not found, attempting to create it
    2021-02-09 06:40:09,738 INFO config.py:204 -- KubernetesNodeProvider: successfully created autoscaler_service_account 'autoscaler'
    2021-02-09 06:40:10,196 INFO config.py:228 -- KubernetesNodeProvider: autoscaler_role 'autoscaler' not found, attempting to create it
    2021-02-09 06:40:10,265 INFO config.py:230 -- KubernetesNodeProvider: successfully created autoscaler_role 'autoscaler'
    2021-02-09 06:40:10,573 INFO config.py:261 -- KubernetesNodeProvider: autoscaler_role_binding 'autoscaler' not found, attempting to create it
    2021-02-09 06:40:10,646 INFO config.py:263 -- KubernetesNodeProvider: successfully created autoscaler_role_binding 'autoscaler'
    2021-02-09 06:40:10,704 INFO config.py:294 -- KubernetesNodeProvider: service 'ray-head' not found, attempting to create it
    2021-02-09 06:40:10,788 INFO config.py:296 -- KubernetesNodeProvider: successfully created service 'ray-head'
    2021-02-09 06:40:11,098 INFO config.py:294 -- KubernetesNodeProvider: service 'ray-workers' not found, attempting to create it
    2021-02-09 06:40:11,185 INFO config.py:296 -- KubernetesNodeProvider: successfully created service 'ray-workers'
    No head node found. Launching a new cluster. Confirm [y/N]: y

    Acquiring an up-to-date head node
    2021-02-09 06:40:14,396 INFO node_provider.py:113 -- KubernetesNodeProvider: calling create_namespaced_pod (count=1).
    Launched a new head node
    Fetching the new head node

    <1/1> Setting up head node
    Prepared bootstrap config
    New status: waiting-for-ssh
    [1/7] Waiting for SSH to become available
    Running `uptime` as a test.
    2021-02-09 06:40:15,296 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
    error: unable to upgrade connection: container not found ("ray-node")
    SSH still not available (Exit Status 1): kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
    2021-02-09 06:40:22,197 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'

     03:41:41 up 81 days, 14:25, 0 users, load average: 1.42, 0.87, 0.63
    Success.
    Updating cluster configuration. [hash=16487b5e0285fc46d5f1fd6da0370b2f489a6e5f]
    New status: syncing-files
    [2/7] Processing file mounts
    2021-02-09 06:41:42,330 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (mkdir -p ~)'
    [3/7] No worker file mounts to sync
    New status: setting-up
    [4/7] No initialization commands to run.
    [5/7] Initalizing command runner
    [6/7] No setup commands to run.
    [7/7] Starting the Ray runtime
    2021-02-09 06:42:10,643 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":1,"GPU":0}'"'"';ray stop)'
    Did not find any active Ray processes.
    2021-02-09 06:42:13,845 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":1,"GPU":0}'"'"';ulimit -n 65536; ray start --head --num-cpus=$MY_CPU_REQUEST --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host 0.0.0.0)'
    Local node IP: 172.30.236.163
    2021-02-09 03:42:17,373 INFO services.py:1195 -- View the Ray dashboard at http://172.30.236.163:8265

    --------------------
    Ray runtime started.
    --------------------

    Next steps
      To connect to this Ray runtime from another node, run
        ray start --address='172.30.236.163:6379' --redis-password='5241590000000000'

      Alternatively, use the following Python code:
        import ray
        ray.init(address='auto', _redis_password='5241590000000000')

      If connection fails, check your firewall settings and network configuration.

      To terminate the Ray runtime, run
        ray stop
    New status: up-to-date

    Useful commands
      Monitor autoscaling with
        ray exec /Users/darroyo/git_workspaces/github.com/ray-project/ray/python/ray/autoscaler/kubernetes/example-full.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
      Connect to a terminal on the cluster head:
        ray attach /Users/darroyo/git_workspaces/github.com/ray-project/ray/python/ray/autoscaler/kubernetes/example-full.yaml
      Get a remote shell to the cluster manually:
        kubectl -n ray exec -it ray-head-ql46b -- bash
    ```
<p> </p>

3. Verify

    a) Check for the head node

    ```
    $ oc get pods
    NAME             READY   STATUS    RESTARTS   AGE
    ray-head-ql46b   1/1     Running   0          118m
    $
    ```
    b) Run an example test

    ```
    ray submit python/ray/autoscaler/kubernetes/example-full.yaml x.py
    Loaded cached provider configuration
    If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
    2021-02-09 08:50:51,028 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (python ~/x.py)'
    2021-02-09 05:52:10,538 INFO worker.py:655 -- Connecting to existing Ray cluster at address: 172.30.236.163:6379
    [0, 1, 4, 9]
    ```

## Jupyter

For a Jupyter setup demo, see this [reference repository](https://github.com/erikerlandson/ray-odh-demo).

### Running examples

Once in a Jupyter environment, refer to the [notebooks](../../notebooks) for an example pipeline. Documentation for reference use cases can be found in [Examples](https://codeflare.readthedocs.io/en/latest/).