## Set Up Google Colab

Clone the repository. The working directory is set in a later step.

In [None]:
!git clone https://github.com/omarelbeltagy/clarity

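Now set the runtime type to GPU. Changing the runtime type restarts the session and discards files under /content, so re-run the clone cell above afterwards:
1. Click on "Runtime" in the top menu.
2. Select "Change runtime type".
3. In the popup window, choose a GPU option (for example "T4 GPU") under "Hardware accelerator".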
4. Click "Save".

## Set Working Directory

Set the current working directory to the clarity-models folder.

In [None]:
%cd /content/clarity/clarity-models
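Before continuing, it can help to confirm that the directory change worked. The sketch below is only a sanity check: the helper name `missing_entries` is hypothetical, and the sole file it looks for is `app.py`, which the later cells run from this folder.

```python
from pathlib import Path


def missing_entries(root, expected=("app.py",)):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


# Check the current working directory of the notebook kernel.
missing = missing_entries(Path.cwd())
if missing:
    print(f"Missing in {Path.cwd()}: {missing}. Re-run the clone and %cd cells.")
else:
    print(f"Working directory looks right: {Path.cwd()}")
```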

## Get Dependencies

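Install pinned versions of the required packages. The first command removes Colab's preinstalled frameworks, presumably so that the pinned versions install without conflicts.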

In [None]:
!pip uninstall -y torch torchvision torchaudio tensorflow jax jaxlib cupy-cuda12x cudf-cu12 dask-cudf-cu12 pylibcudf-cu12
!pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0
!pip install "transformers[torch]==4.57.1" accelerate peft bitsandbytes
!pip install datasets fastapi uvicorn loguru PyYAML sentencepiece tensorboard==2.19.0 pandas==2.2.2 requests==2.32.4 pillow==11.1.0 pydantic==2.11.3 protobuf==5.29.1 numpy==2.1.0
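After the install finishes, a quick comparison of installed versions against the pins can catch a partially failed install. This is a minimal sketch using the standard library's `importlib.metadata`; the helper names `check_pins` and `mismatches` are hypothetical, and the pin table simply repeats a subset of the versions from the cell above. If Colab asks for a runtime restart after the install, run the check after restarting.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell above (subset).
PINNED = {
    "torch": "2.8.0",
    "torchvision": "0.23.0",
    "torchaudio": "2.8.0",
    "transformers": "4.57.1",
    "tensorboard": "2.19.0",
    "pandas": "2.2.2",
    "numpy": "2.1.0",
}


def check_pins(pins, get_version=version):
    """Map each package to (expected, installed); installed is None if absent."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


def mismatches(report):
    """Keep entries that are missing or whose public version differs from the pin."""
    # A local suffix such as "+cu128" is ignored in the comparison.
    return {
        name: pair
        for name, pair in report.items()
        if pair[1] is None or pair[1].split("+")[0] != pair[0]
    }


for name, (expected, installed) in mismatches(check_pins(PINNED)).items():
    print(f"{name}: expected {expected}, found {installed}")
```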

## Check Environment

Print the Python and PyTorch versions and check for GPU availability.

In [None]:
import sys

import torch

print(f"Python: {sys.version.split()[0]}")
print(f"PyTorch: {torch.__version__}")
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

## Configure Model

Set up a custom configuration for training a LoRA model based on Facebook's OPT-6.7B, loaded in 8-bit, with three target labels.

In [None]:
config = """
models:
  - name: "opt-6.7b"
    type: "lora"
    enabled: true

    model_config:
      model_name: "facebook/opt-6.7b"
      use_8bit: true

    training_config:
      max_length: 256
      batch_size: 2
      gradient_accumulation_steps: 8
      learning_rate: 3e-4
      num_epochs: 3

    label_config:
      labels:
        - "Clear Reply"
        - "Clear Non-Reply"
        - "Ambivalent"
"""

# Write the config to disk
with open('ipynb-config.yaml', 'w') as f:
    f.write(config)

print("Configuration saved to ipynb-config.yaml")
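Two numbers implied by this configuration are worth working out by hand. The sketch below assumes that `app.py` applies gradient accumulation in the usual way (one optimizer step per `batch_size * gradient_accumulation_steps` samples on a single GPU) and treats OPT-6.7B as roughly 6.7 billion parameters. The memory figure covers the base weights only; activations, LoRA parameters, and optimizer state come on top. Both helper names are hypothetical.

```python
def effective_batch_size(batch_size, gradient_accumulation_steps, num_devices=1):
    """Samples that contribute to one optimizer step."""
    return batch_size * gradient_accumulation_steps * num_devices


def weight_memory_gb(num_params, bits_per_param):
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Values taken from the training_config above.
print(effective_batch_size(batch_size=2, gradient_accumulation_steps=8))  # 16

# Nominal parameter count; the true count differs slightly.
print(f"{weight_memory_gb(6.7e9, 8):.1f} GB in 8-bit")    # 6.7 GB in 8-bit
print(f"{weight_memory_gb(6.7e9, 16):.1f} GB in 16-bit")  # 13.4 GB in 16-bit
```

Comparing the 8-bit figure with the GPU memory printed in the environment check gives a rough idea of how much headroom is left for training.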

## List Available Models

List the models defined in the configuration file to verify the setup.

In [None]:
!python app.py --config ipynb-config.yaml list

## Train the Model

Start training the model using the specified configuration.

In [None]:
!python app.py --config ipynb-config.yaml train --tensorboard

## Test the Model

Test the trained model with a sample question and answer.

In [None]:
# Single prediction
!python app.py --config ipynb-config.yaml test --question "Will you invite them to the White House?" --answer "We are ready if they are serious."
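To try several question and answer pairs, the same command can be driven from Python. The sketch below uses only the flags shown in the cell above (`--config`, `test`, `--question`, `--answer`); the helper names and the example list are hypothetical. Each call starts a new process, so the model is likely reloaded every time, which makes this suitable for a handful of pairs at most.

```python
import subprocess
import sys
from pathlib import Path


def build_test_command(question, answer, config="ipynb-config.yaml"):
    """Build the argument list for one `app.py ... test` invocation."""
    return [
        sys.executable, "app.py",
        "--config", config,
        "test",
        "--question", question,
        "--answer", answer,
    ]


def run_pairs(pairs, config="ipynb-config.yaml"):
    """Run the test command once per (question, answer) pair and print its output."""
    for question, answer in pairs:
        result = subprocess.run(
            build_test_command(question, answer, config),
            capture_output=True,
            text=True,
        )
        print(f"Q: {question}\nA: {answer}")
        print(result.stdout or result.stderr)


# Example pairs; replace with your own.
pairs = [
    ("Will you invite them to the White House?", "We are ready if they are serious."),
]

# Only run when executed from the clarity-models folder.
if Path("app.py").exists():
    run_pairs(pairs)
```

Passing the arguments as a list avoids shell quoting problems when a question or answer contains quotes.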

## Save Model to Drive

Save the trained model to Google Drive for later use.

In [None]:
from google.colab import drive

drive.mount('/content/drive')

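# Create the target folder first so the copy always lands in clarity-models-trained/.artifacts
!mkdir -p /content/drive/MyDrive/clarity-models-trained
!cp -r ./.artifacts /content/drive/MyDrive/clarity-models-trained/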

print("Model saved to Google Drive!")
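The copy can also be done in Python, which makes it possible to fail loudly when there is nothing to copy. This is a sketch, not the project's own tooling: `copy_artifacts` is a hypothetical helper, and it copies the contents of `.artifacts` directly into the target folder, which may differ from the layout that `cp -r` produces.

```python
import shutil
from pathlib import Path


def copy_artifacts(src, dest):
    """Copy the contents of src into dest and return the number of files in dest."""
    src, dest = Path(src), Path(dest)
    if not src.is_dir():
        raise FileNotFoundError(f"No artifacts directory at {src}")
    # dirs_exist_ok=True lets repeated runs overwrite an earlier copy.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return sum(1 for path in dest.rglob("*") if path.is_file())


# Only run when Drive is mounted and training has produced artifacts.
drive_root = Path("/content/drive/MyDrive")
if drive_root.is_dir() and Path(".artifacts").is_dir():
    count = copy_artifacts(".artifacts", drive_root / "clarity-models-trained")
    print(f"{count} files now in Drive")
```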

## Download Model

Download the trained model as a zip file to your local machine.

In [None]:
# Create zip of trained model
!zip -r trained_model.zip ./.artifacts/

# Download
from google.colab import files

files.download('trained_model.zip')
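The archive can also be built and checked from Python before downloading. The sketch below uses the standard library's `shutil.make_archive` and `zipfile`; the helper name `zip_directory` is hypothetical. It keeps the `.artifacts/` prefix inside the archive, as the `zip -r` command above does.

```python
import shutil
import zipfile
from pathlib import Path


def zip_directory(src_dir, archive_stem="trained_model"):
    """Zip src_dir (keeping its folder name) and return (archive_path, entry_names)."""
    src = Path(src_dir).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"No directory at {src}")
    archive = shutil.make_archive(
        str(archive_stem), "zip", root_dir=str(src.parent), base_dir=src.name
    )
    with zipfile.ZipFile(archive) as zf:
        # testzip() returns the first corrupt member, or None if all are readable.
        bad = zf.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"Corrupt entry in archive: {bad}")
        names = zf.namelist()
    return archive, names


# Only run when training has produced artifacts.
if Path(".artifacts").is_dir():
    archive, names = zip_directory(".artifacts")
    print(f"{archive}: {len(names)} entries")
    # In Colab, files.download(archive) then starts the browser download.
```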