## Create Lease

Follow the guide provided by the professor and reserve a lease.

### Naming Convention

Name the lease `NodeType_project##`, for example `gpu_rtx_6000_project46`.
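
The convention can also be applied in code, so that the name used for the reservation matches the `LEASE_NAME` set later in the notebook. This is a minimal sketch; `make_lease_name` is a hypothetical helper and not part of `python-chi`.

```python
import re

def make_lease_name(node_type: str, project_number: int) -> str:
    """Build a lease name of the form NodeType_project## (hypothetical helper)."""
    # Assumption: node types only use lowercase letters, digits and underscores.
    if not re.fullmatch(r"[a-z0-9_]+", node_type):
        raise ValueError(f"unexpected node type: {node_type!r}")
    return f"{node_type}_project{project_number}"

print(make_lease_name("gpu_rtx_6000", 46))
```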

### Available Resources

Bare Metal GPU Resources (4/12 - 5/1+)

- **CHI@UC**
  - **gpu_rtx6000**: RTX6000 GPU (variable number)
    - General model training/inference
  - **compute_gigaio**: A100 80GB GPU (variable number)
    - Large model training/inference
  - **gpu_a100_pcie** (J0BG3Q3 or 307G3Q3): 4× A100 80GB GPUs
    - Reserved 4/12–5/6
    - Distributed training for very large models

- **CHI@TACC**
  - **compute_liqid**: A100 40GB GPU (4 nodes)
    - General training/inference, NVIDIA Triton service
  - **gpu_mi100**: 2× MI100 AMD GPUs (2 nodes)
    - General model training/inference

## Launch and set up Chameleon server - with python-chi

Once the lease for your bare metal server starts, bring up the GPU instance. We use `python-chi`, the Python API for Chameleon, to provision the server.

We will execute the cells in this notebook inside the **Chameleon Jupyter environment**.

Run the following cell, and make sure the correct project is selected. Also, **change the site to CHI@TACC or CHI@UC**, depending on where your reservation is, and **edit your lease name**.

In [None]:
CHAMELEON_SITE="CHI@UC"  # EDIT THIS
LEASE_NAME="compute_gigaio_project46"  # EDIT THIS

GITHUB_USERNAME=""
GITHUB_EMAIL=""

In [None]:
from chi import server, context, lease
import os

context.version = "1.0" 
context.choose_project()
context.choose_site(default=CHAMELEON_SITE)

In [None]:
if CHAMELEON_SITE != "KVM@TACC":  # the KVM@TACC branch below launches without a lease
    l = lease.get_lease(LEASE_NAME)
    l.show()

The status should show as “ACTIVE” now that we are past the lease start time.
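
If the lease has not started yet, the status will be something other than ACTIVE. The sketch below shows one way to poll until it changes. `wait_until` is a hypothetical helper, and `get_status` is a stand-in callable: how the status string is read from the lease object is not shown in this guide and may depend on your `python-chi` version.

```python
import time
from typing import Callable

def wait_until(get_status: Callable[[], str], target: str = "ACTIVE",
               timeout: float = 600.0, interval: float = 15.0) -> bool:
    """Poll get_status() until it returns target or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Stand-in status source for illustration only; replace with a real lookup.
statuses = iter(["PENDING", "STARTING", "ACTIVE"])
print(wait_until(lambda: next(statuses), interval=0.0))
```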

We will use the lease to bring up a server with the `CC-Ubuntu24.04-CUDA` disk image.

> **Note**: the following cell brings up a server only if you don’t already have one with the same name! (Regardless of its error state.) If you have a server in ERROR state already, delete it first in the Horizon GUI before you run this cell.

In [None]:
username = os.getenv('USER') # all exp resources will have this prefix

if CHAMELEON_SITE == "KVM@TACC":
    s = server.Server(
        f"node-{LEASE_NAME}-{username}",
        image_name="CC-Ubuntu24.04",
        flavor_name="m1.xlarge"
    )
    s.submit(idempotent=True)
else:
    s = server.Server(
        f"node-{LEASE_NAME}-{username}",
        reservation_id=l.node_reservations[0]["id"],
        image_name="CC-Ubuntu24.04-CUDA"
    )
    s.submit(idempotent=True)



Note: security groups are not used at Chameleon bare metal sites, so we do not have to configure any security groups on this instance.

Then, we’ll associate a floating IP with the instance, so that we can access it over SSH.

In [None]:
s.associate_floating_ip()

In [None]:
s.refresh()
s.check_connectivity()

In the output below, make a note of the floating IP that has been assigned to your instance (in the “Addresses” row).

In [None]:
s.refresh()
s.show(type="widget")

### Set up SSH Key and Hugging Face Token from your local terminal

#### (One time) Prepare your SSH key on your local machine, if you have not already

Before interacting with the remote server through Jupyter Notebook, it is recommended to prepare an SSH key locally, instead of generating it inside the Notebook, to keep your private key secure.

Step-by-Step:

1. Generate an SSH key locally (replace the filename with your preferred name, e.g., `id_ed25519_chameleon_git`):

    ssh-keygen -t ed25519 -C "your_email@example.com" -f ~/.ssh/id_ed25519_chameleon_git

   This command generates two files:
   - Private key: `~/.ssh/id_ed25519_chameleon_git`
   - Public key: `~/.ssh/id_ed25519_chameleon_git.pub`

2. Add the public key to your GitHub account:
   - Go to GitHub → Settings → SSH and GPG keys → New SSH key.
   - Copy the contents of `id_ed25519_chameleon_git.pub` and paste them there.
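
Before continuing, it can help to confirm that both key files exist and that the private key is not readable by other users. This is a minimal sketch for macOS or Linux; `check_key_pair` is a hypothetical helper, and the key path assumes the example filename used above.

```python
import stat
from pathlib import Path

def check_key_pair(private_key: Path) -> list[str]:
    """Return a list of problems found with an SSH key pair (empty if none)."""
    problems = []
    public_key = private_key.with_name(private_key.name + ".pub")
    if not private_key.is_file():
        problems.append(f"missing private key: {private_key}")
    elif stat.S_IMODE(private_key.stat().st_mode) & 0o077:
        # Any group/other permission bit on the private key is flagged.
        problems.append(f"private key is accessible by others: {private_key}")
    if not public_key.is_file():
        problems.append(f"missing public key: {public_key}")
    return problems

# Assumed path: the example key name from the step above.
key = Path.home() / ".ssh" / "id_ed25519_chameleon_git"
print(check_key_pair(key) or "key pair looks fine")
```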


#### (Optional) Set Hugging Face Token as a Local Environment Variable

If you want the Notebook to automatically access your Hugging Face account without manually pasting your token each time, you can set the HUGGINGFACE_TOKEN as a local environment variable on your machine.

Step-by-Step:

**For macOS or Linux (Bash or Zsh)**
1. Open your terminal.
2. Determine your shell by running:

    echo $SHELL

   - If you see `/bin/bash`, you are using Bash.
   - If you see `/bin/zsh`, you are using Zsh.

3. Edit the appropriate shell configuration file:
   - For Bash: `~/.bashrc`
   - For Zsh: `~/.zshrc` or `~/.zshenv`

4. Add the following line at the end of the file (replace the placeholder with your own token):

    export HUGGINGFACE_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

5. Reload your shell configuration:

    source ~/.bashrc   # for Bash

    source ~/.zshrc    # for Zsh (or: source ~/.zshenv)

**For Windows (Command Prompt or PowerShell)**
1. Open Command Prompt or PowerShell.
2. Temporarily set the token (it is lost when you close the window). In Command Prompt:

    set HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

   In PowerShell:

    $env:HUGGINGFACE_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

3. To set it permanently (recommended):
   - Open System Properties → Advanced → Environment Variables.
   - Add a new **User variable** named `HUGGINGFACE_TOKEN` with your token as the value.
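
To confirm that the variable is visible to new processes, you can read it back from Python. This is a minimal sketch; `read_hf_token` and `mask` are hypothetical helpers, and masking is used so that the full token is never printed.

```python
import os
from typing import Mapping, Optional

def read_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the token from HUGGINGFACE_TOKEN, or None if unset or blank."""
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    return token or None

def mask(token: str) -> str:
    """Keep the first five characters and replace the rest with asterisks."""
    return token[:5] + "*" * max(len(token) - 5, 0)

token = read_hf_token()
print(mask(token) if token else "HUGGINGFACE_TOKEN is not set")
```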

#### Copy Key, Set Up Permissions and Config

Edit the lines marked `*EDIT*` below, then run the script in your **local terminal**.


```bash
    REMOTE_USER="cc"
    REMOTE_HOST="192.5.86.181"   # *EDIT* floating IP address
    REMOTE_SSH_DIR="~/.ssh"
    KEY_NAME="id_ed25519_chameleon_git" # *EDIT* your key name
    SSH_OPTS=(-i ~/.ssh/id_rsa_chameleon -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null)  # *EDIT* path to the key registered with Chameleon

    # Your Authentication key
    LOCAL_PRIVATE_KEY="$HOME/.ssh/${KEY_NAME}"
    LOCAL_PUBLIC_KEY="$HOME/.ssh/${KEY_NAME}.pub"

    # Your Hugging Face token
    HUGGINGFACE_TOKEN=${HUGGINGFACE_TOKEN}

    # Precheck: Verify SSH connection
    echo "Checking SSH connection to ${REMOTE_USER}@${REMOTE_HOST}..."
    ssh "${SSH_OPTS[@]}" ${REMOTE_USER}@${REMOTE_HOST} "echo 'SSH connection successful.'" || { echo "Error: SSH connection failed. Aborting."; return 1; }

    # Step 1: mkdir + chmod .ssh
    echo "Step 1: Creating .ssh directory and setting permission on remote..."
    ssh "${SSH_OPTS[@]}" ${REMOTE_USER}@${REMOTE_HOST} "mkdir -p ${REMOTE_SSH_DIR} && chmod 700 ${REMOTE_SSH_DIR}"

    # Step 2: Copy SSH private/public key
    echo "Step 2: Copying SSH private key to remote..."
    scp "${SSH_OPTS[@]}" "$LOCAL_PRIVATE_KEY" ${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_SSH_DIR}/${KEY_NAME}

    echo "Step 2: Copying SSH public key to remote..."
    scp "${SSH_OPTS[@]}" "$LOCAL_PUBLIC_KEY" ${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_SSH_DIR}/${KEY_NAME}.pub

    # Step 3: Setup permission
    echo "Step 3: Setting SSH key permissions on remote..."
    ssh "${SSH_OPTS[@]}" ${REMOTE_USER}@${REMOTE_HOST} "chmod 600 ${REMOTE_SSH_DIR}/${KEY_NAME} && chmod 644 ${REMOTE_SSH_DIR}/${KEY_NAME}.pub"

    # Step 4: Create SSH config
    echo "Step 4: Creating SSH config file for GitHub on remote..."
    ssh "${SSH_OPTS[@]}" ${REMOTE_USER}@${REMOTE_HOST} "echo -e '\
    Host github.com\n\
        HostName github.com\n\
        User git\n\
        IdentityFile ~/.ssh/${KEY_NAME}\n\
        StrictHostKeyChecking no\n\
    ' >> ${REMOTE_SSH_DIR}/config && chmod 600 ${REMOTE_SSH_DIR}/config"

    # Step 5: Export HUGGINGFACE_TOKEN to remote .bashrc (or create .env)
    if [ -z "$HUGGINGFACE_TOKEN" ]; then
      echo "Step 5: HUGGINGFACE_TOKEN not set locally. Skipping token transfer."
    else
      echo "Step 5: Uploading HUGGINGFACE_TOKEN to remote..."
      ssh "${SSH_OPTS[@]}" ${REMOTE_USER}@${REMOTE_HOST} "echo 'export HUGGINGFACE_TOKEN=${HUGGINGFACE_TOKEN}' >> ~/.bashrc"
    fi

    echo "All steps completed successfully."
```
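
For reference, the stanza that Step 4 appends to the remote `~/.ssh/config` looks roughly like the output of the sketch below (indentation aside). `github_ssh_config` is a hypothetical helper written only to make the resulting file contents visible.

```python
def github_ssh_config(key_name: str) -> str:
    """Return the ~/.ssh/config stanza that Step 4 appends on the remote host."""
    return (
        "Host github.com\n"
        "    HostName github.com\n"
        "    User git\n"
        f"    IdentityFile ~/.ssh/{key_name}\n"
        "    StrictHostKeyChecking no\n"
    )

print(github_ssh_config("id_ed25519_chameleon_git"))
```

Because Step 4 appends with `>>`, running the script a second time leaves two identical stanzas in the file.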


### Retrieve the project on the instance

Now, we can use `python-chi` to execute commands on the instance, to set it up. We’ll start by retrieving the code and other materials on the instance.

In [None]:
s.execute("git clone -b eval-close-loop git@github.com:LawrenceLu0904/Fine-Tuning-Taiwanese-Hokkien-LLM-for-Medical-Advising.git")

# Use HTTPS instead if you did not copy your SSH key to the server
# s.execute("git clone -b eval-close-loop https://github.com/LawrenceLu0904/Fine-Tuning-Taiwanese-Hokkien-LLM-for-Medical-Advising.git")

# Quote the values so that a name containing spaces is passed as one argument
s.execute(f'git config --global user.name "{GITHUB_USERNAME}"')
s.execute(f'git config --global user.email "{GITHUB_EMAIL}"')
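
Values that contain spaces or shell metacharacters (a full name, for example) need quoting before they are embedded in a command string. This is a minimal sketch using the standard library's `shlex.quote`; `git_config_commands` is a hypothetical helper, and the name and email are placeholders.

```python
import shlex

def git_config_commands(username: str, email: str) -> list[str]:
    """Build git config commands with values quoted for the remote shell."""
    return [
        f"git config --global user.name {shlex.quote(username)}",
        f"git config --global user.email {shlex.quote(email)}",
    ]

for cmd in git_config_commands("Ada Lovelace", "ada@example.com"):
    print(cmd)  # in the notebook, each string would be passed to s.execute(cmd)
```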

### Set up Docker

To use common deep learning frameworks like TensorFlow or PyTorch, and ML training platforms like MLflow and Ray, we can run containers that include all the libraries these frameworks require. Here, we will set up the container framework.

In [None]:
s.execute("curl -sSL https://get.docker.com/ | sudo sh")
s.execute("sudo groupadd -f docker; sudo usermod -aG docker $USER")

### Set up the NVIDIA container toolkit

We will also install the NVIDIA container toolkit, with which we can access GPUs from inside our containers.

In [None]:
s.execute("curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list")
s.execute("sudo apt update")
s.execute("sudo apt-get install -y nvidia-container-toolkit")
s.execute("sudo nvidia-ctk runtime configure --runtime=docker")
# for https://github.com/NVIDIA/nvidia-container-toolkit/issues/48
s.execute("sudo jq 'if has(\"exec-opts\") then . else . + {\"exec-opts\": [\"native.cgroupdriver=cgroupfs\"]} end' /etc/docker/daemon.json | sudo tee /etc/docker/daemon.json.tmp > /dev/null && sudo mv /etc/docker/daemon.json.tmp /etc/docker/daemon.json")
s.execute("sudo systemctl restart docker")
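
The `jq` filter in the cell above adds an `exec-opts` entry to `/etc/docker/daemon.json` only when the file does not already have one. The same logic is sketched in Python below to make the intent easier to read; `ensure_cgroupfs` is a hypothetical helper, and the sample input is illustrative rather than the exact file written by `nvidia-ctk`.

```python
import json

def ensure_cgroupfs(daemon_config: dict) -> dict:
    """Add the cgroupfs exec-opts entry unless exec-opts is already present."""
    if "exec-opts" in daemon_config:
        return daemon_config
    return {**daemon_config, "exec-opts": ["native.cgroupdriver=cgroupfs"]}

# Illustrative input only; the real file contents may differ.
before = {"runtimes": {"nvidia": {"path": "nvidia-container-runtime", "args": []}}}
print(json.dumps(ensure_cgroupfs(before), indent=2))
```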

### Set Up Project Dependencies for Inference (Optional)

The following steps are provided for convenience and are essentially the same as Instruction Steps 1 to 5.

In [None]:
project_dir = "~/Fine-Tuning-Taiwanese-Hokkien-LLM-for-Medical-Advising"

# Create a virtual environment (taigi-env, or any name you prefer)
s.execute(f"cd {project_dir} && python3 -m venv taigi-env")

# Each s.execute call runs in a separate shell, so activating the environment
# would not carry over; the commands below call the environment's pip by path.
s.execute(f"cd {project_dir} && ./taigi-env/bin/pip -V")


# Install the required packages in your virtual environment
s.execute(f"cd {project_dir} && ./taigi-env/bin/pip install -r requirements.txt")

# Some packages need to be installed separately:
s.execute(f"cd {project_dir} && ./taigi-env/bin/pip install sentencepiece --prefer-binary")
s.execute(f"cd {project_dir} && ./taigi-env/bin/pip install datasets")

# Reinstall bitsandbytes to prevent RuntimeError
s.execute(f"cd {project_dir} && ./taigi-env/bin/pip uninstall bitsandbytes -y")
s.execute(f"cd {project_dir} && ./taigi-env/bin/pip install bitsandbytes --no-cache-dir")
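
The commands in this cell repeat the same `cd ... && ./taigi-env/bin/...` prefix, since the environment's tools are called by path instead of relying on activation. This is a minimal sketch of a helper that builds such command strings; `venv_command` is hypothetical and assumes the environment lives inside the project directory.

```python
def venv_command(project_dir: str, tool: str, *args: str,
                 env_name: str = "taigi-env") -> str:
    """Build a shell command that runs a tool from the virtual environment by path."""
    parts = [f"./{env_name}/bin/{tool}", *args]
    return f"cd {project_dir} && " + " ".join(parts)

project_dir = "~/Fine-Tuning-Taiwanese-Hokkien-LLM-for-Medical-Advising"
print(venv_command(project_dir, "pip", "install", "-r", "requirements.txt"))
```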

### Hugging Face Token Setup

- This script **automatically** checks if the remote server already has `HUGGINGFACE_TOKEN` in `~/.bashrc`.
- If found, it writes the token to `~/.cache/huggingface/token`.
- If not found, it uses the manually provided `manual_token` to run `huggingface-cli login`.

**➔ If your remote server has HUGGINGFACE_TOKEN set, you don't need to change anything.**

**➔ If not, pass your own Hugging Face token as `manual_token` (the `"YOUR_huggingface_token"` placeholder in the call at the end of this section).**

In [None]:
def setup_huggingface_token(server, project_dir, manual_token):
    """
    Make a Hugging Face token available on the remote server.

    Checks whether `HUGGINGFACE_TOKEN` appears in the remote `~/.bashrc`.
    If found, writes the token to `~/.cache/huggingface/token`.
    If not found, runs `huggingface-cli login` with `manual_token`.
    """

    # Safely try to store token
    result = server.execute('''
if grep -q HUGGINGFACE_TOKEN ~/.bashrc; then
    token=$(grep HUGGINGFACE_TOKEN ~/.bashrc | tail -n 1 | sed 's/.*=//g' | tr -d '"')
    mkdir -p ~/.cache/huggingface
    echo "$token" > ~/.cache/huggingface/token
    echo "[TOKEN STORED]"
else
    echo "[NO TOKEN]"
fi
''')

    remote_status = result.stdout.strip()

    if "[TOKEN STORED]" in remote_status:
        print("✅ A HUGGINGFACE_TOKEN is set on the remote machine; automatically writing it into .cache/huggingface/token.")
    else:
        print("⚠️ No HUGGINGFACE_TOKEN found on the remote machine; falling back to manually provided token for huggingface-cli login.")
        server.execute(f'''
cd {project_dir} && ./taigi-env/bin/huggingface-cli login --token "{manual_token}"
''')

In [None]:
setup_huggingface_token(s, project_dir, "YOUR_huggingface_token")
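
The remote shell pipeline inside `setup_huggingface_token` takes the last line of `~/.bashrc` that mentions `HUGGINGFACE_TOKEN`, keeps the text after the final `=`, and strips double quotes. The sketch below mirrors that logic locally so it can be checked against sample input; `extract_token` is a hypothetical helper and the sample text is made up.

```python
from typing import Optional

def extract_token(bashrc_text: str) -> Optional[str]:
    """Mirror the grep | tail | sed | tr pipeline on a string of .bashrc contents."""
    lines = [ln for ln in bashrc_text.splitlines() if "HUGGINGFACE_TOKEN" in ln]
    if not lines:
        return None
    # Keep everything after the last '=' and drop double quotes.
    return lines[-1].rsplit("=", 1)[-1].replace('"', "")

sample = 'export PATH=$PATH:/opt/bin\nexport HUGGINGFACE_TOKEN="hf_example"\n'
print(extract_token(sample))
```

Like the shell version, this matches any line containing the variable name, including a commented-out one, so keep only one such line in `~/.bashrc`.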

## Open an SSH session

Finally, open an SSH session on your server. From your local terminal, run:

    ssh -i ~/.ssh/id_rsa_chameleon cc@A.B.C.D

where

-   in place of `~/.ssh/id_rsa_chameleon`, substitute the path to your own key that you uploaded to the Chameleon site you are using (CHI@TACC or CHI@UC)
-   in place of `A.B.C.D`, use the floating IP address you just associated to your instance.
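
If you prefer to assemble the command from the notebook, the sketch below builds the same `ssh` line and rejects a malformed address before you paste it. `ssh_command` is a hypothetical helper, and the IP address is the example value used earlier in this guide.

```python
import ipaddress

def ssh_command(floating_ip: str, key_path: str = "~/.ssh/id_rsa_chameleon",
                user: str = "cc") -> str:
    """Return the ssh command line for the instance, validating the IP first."""
    ipaddress.ip_address(floating_ip)  # raises ValueError on a malformed address
    return f"ssh -i {key_path} {user}@{floating_ip}"

print(ssh_command("192.5.86.181"))
```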