### Start the Ray cluster - NVIDIA GPUs

> **Note**: Follow these instructions only if you are running this experiment on a node with NVIDIA GPUs.

For the Ray experiment, you must use a node with two GPUs. Run

``` bash
# run on node-mltrain
nvidia-smi
```

and confirm that you see two GPUs.
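If you prefer a scripted check over reading the table by eye, the following is a minimal sketch that counts the devices listed by `nvidia-smi --list-gpus`. The helper name `count_gpus` is hypothetical, and the sketch assumes each device is printed on its own line beginning with `GPU <index>:`.

```bash
# run on node-mltrain
# count_gpus (hypothetical helper): count the "GPU N: ..." lines read from stdin
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:' || true
}

# only attempt the check where nvidia-smi is installed
if command -v nvidia-smi > /dev/null 2>&1; then
    n=$(nvidia-smi --list-gpus | count_gpus)
    if [ "$n" -eq 2 ]; then
        echo "OK: found 2 GPUs"
    else
        echo "Expected 2 GPUs, found $n"
    fi
fi
```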

We’ll bring up our Ray cluster with Docker Compose. Run:

``` bash
# run on node-mltrain
export HOST_IP=$(curl --silent http://169.254.169.254/latest/meta-data/public-ipv4 )
docker compose -f LLM_LegalDocSummarization/docker/docker-compose-ray-cuda.yaml up -d
```

You can see this Docker Compose YAML here: [docker-compose-ray-cuda.yaml](https://github.com/teaching-on-testbeds/mltrain-chi/blob/main/docker/docker-compose-ray-cuda.yaml).

When it is finished, the output of

``` bash
# run on node-mltrain
docker ps
```

should show that the `ray-head`, `ray-worker-0`, and `ray-worker-1` containers are running.
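The same check can be scripted. This is a sketch, not part of the lab materials: the helper `missing_containers` is a hypothetical name, and it relies on `docker ps --format '{{.Names}}'` printing one running container name per line. It prints the name of every expected container that is not running, so empty output means all three are up.

```bash
# run on node-mltrain
# missing_containers (hypothetical helper): read running container names
# from stdin, one per line, and print each expected name that is absent
missing_containers() {
    running=$(cat)
    for name in "$@"; do
        printf '%s\n' "$running" | grep -qxF -- "$name" || echo "$name"
    done
}

# only attempt the check where docker is installed
if command -v docker > /dev/null 2>&1; then
    docker ps --format '{{.Names}}' | missing_containers ray-head ray-worker-0 ray-worker-1
fi
```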

Although the host has two GPUs, we passed only one to each worker container. Run

``` bash
# run on node-mltrain
docker exec -it ray-worker-0 nvidia-smi --list-gpus
```

and

``` bash
# run on node-mltrain
docker exec -it ray-worker-1 nvidia-smi --list-gpus
```

and confirm that each worker lists exactly one GPU, and that the two workers list different GPUs (different UUIDs).
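To compare the UUIDs without reading them by eye, a sketch like the following may help. The helper `gpu_uuids` is a hypothetical name, and it assumes the `(UUID: ...)` suffix that `nvidia-smi --list-gpus` prints for each device. The `-it` flags are dropped here because the output is captured rather than shown in a terminal.

```bash
# run on node-mltrain
# gpu_uuids (hypothetical helper): extract the UUID field from
# `nvidia-smi --list-gpus` output read from stdin, one UUID per line
gpu_uuids() {
    sed -n 's/.*(UUID: \([^)]*\)).*/\1/p'
}

# only attempt the check where docker is installed
if command -v docker > /dev/null 2>&1; then
    uuid0=$(docker exec ray-worker-0 nvidia-smi --list-gpus | gpu_uuids)
    uuid1=$(docker exec ray-worker-1 nvidia-smi --list-gpus | gpu_uuids)
    # count how many UUIDs each worker reported
    n0=$(printf '%s\n' "$uuid0" | grep -c .)
    n1=$(printf '%s\n' "$uuid1" | grep -c .)
    if [ "$n0" -eq 1 ] && [ "$n1" -eq 1 ] && [ "$uuid0" != "$uuid1" ]; then
        echo "OK: each worker sees one GPU, and they differ"
    else
        echo "Check GPU assignment: worker-0=[$uuid0] worker-1=[$uuid1]"
    fi
fi
```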

### Start a Jupyter container

Next, let’s start a Jupyter notebook container that does *not* have any GPUs attached. We’ll use this container to submit jobs to the Ray cluster. First, build the container image:

``` bash
# run on node-mltrain
docker build -t jupyter-ray -f LLM_LegalDocSummarization/docker/Dockerfile.jupyter-ray .
```

Once the image is built, start the container:

``` bash
# run on node-mltrain
HOST_IP=$(curl --silent http://169.254.169.254/latest/meta-data/public-ipv4 )
docker run -d --rm -p 8888:8888 \
    -v ~/LLM_LegalDocSummarization:/home/jovyan/work/ \
    -e RAY_ADDRESS=http://${HOST_IP}:8265/ \
    -e MERGED_DATA_DIR=/mnt/LLMData \
    --mount type=bind,source=/mnt/LLMData,target=/mnt/LLMData,readonly \
    --name jupyter \
    jupyter-ray
```

Then, run

``` bash
# run on node-mltrain
docker logs jupyter
```

and look for a line like

    http://127.0.0.1:8888/lab?token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

To open the Jupyter notebook interface, paste this URL into a browser tab, substituting the floating IP assigned to your instance in place of `127.0.0.1`.
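The substitution can also be done on the command line. The sketch below assumes the log line has exactly the form shown above (port 8888, an alphanumeric token) and that the metadata endpoint used earlier returns your floating IP. The helper name `jupyter_url` is hypothetical.

```bash
# run on node-mltrain
# jupyter_url (hypothetical helper): find the first tokenized 127.0.0.1 URL
# on stdin and replace the loopback address with the IP given as $1
jupyter_url() {
    grep -o 'http://127\.0\.0\.1:8888/lab?token=[[:alnum:]]*' \
        | head -n 1 \
        | sed "s/127\.0\.0\.1/$1/"
}

# only attempt this where docker is installed
if command -v docker > /dev/null 2>&1; then
    HOST_IP=$(curl --silent --max-time 5 http://169.254.169.254/latest/meta-data/public-ipv4 )
    # docker logs replays the container's stderr on stderr, so merge both streams
    docker logs jupyter 2>&1 | jupyter_url "$HOST_IP"
fi
```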

In the file browser on the left side, open the `work` directory.

Open a terminal (“File \> New \> Terminal”) inside the Jupyter server environment, and in this terminal, run

``` bash
# run in the jupyter container on node-mltrain
env
```

to list the environment variables. Confirm that `RAY_ADDRESS` is set and contains the correct floating IP address.
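One failure mode worth checking for: if the metadata lookup returned nothing when the container was started, `RAY_ADDRESS` would be set to `http://:8265/`, with no IP address in it. The following sketch tests only the shape of the value (an IPv4 address followed by port 8265), not whether the address is actually your floating IP. The helper name `ray_address_ok` is hypothetical.

```bash
# run in the jupyter container on node-mltrain
# ray_address_ok (hypothetical helper): succeed if the argument
# looks like http://<IPv4 address>:8265/
ray_address_ok() {
    printf '%s\n' "$1" | grep -Eq '^http://([0-9]{1,3}\.){3}[0-9]{1,3}:8265/?$'
}

if ray_address_ok "${RAY_ADDRESS:-}"; then
    echo "RAY_ADDRESS looks valid: $RAY_ADDRESS"
else
    echo "RAY_ADDRESS is missing or malformed: '${RAY_ADDRESS:-}'"
fi
```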

### Access Ray cluster dashboard

The Ray head node serves a dashboard on port 8265. In a browser, open

    http://A.B.C.D:8265

substituting the floating IP associated with your server in place of `A.B.C.D`.

Click on the “Cluster” tab and verify that you see your head node and two worker nodes.