# LLM Fine-tuning for Question-Answering on Google Colab

This guide will help you run the LLM fine-tuning pipeline on Google Colab. The pipeline fine-tunes the Llama-3.2-3B-Instruct model for domain-specific question-answering tasks.

## Step 1: Set Up GPU and Install Dependencies

In [None]:
# Check GPU availability
!nvidia-smi

# Install required packages
!pip install -q torch transformers datasets peft huggingface-hub tqdm wandb evaluate rouge-score nlp-metrics bitsandbytes sentencepiece protobuf sacrebleu typing-extensions numpy pandas scikit-learn
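`nvidia-smi` shows the hardware, but it does not confirm that PyTorch itself can reach the GPU. The following is a minimal sketch of that check from Python; the helper name `describe_gpu` is hypothetical and not part of the repository.

```python
def describe_gpu() -> str:
    """Return a one-line description of the GPU PyTorch can see, if any."""
    try:
        import torch  # installed by the pip command above
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA GPU visible to PyTorch"
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024**3
    return f"{props.name} with {total_gib:.1f} GiB of memory"


print(describe_gpu())
```

If this reports that no GPU is visible, switch the Colab runtime type to a GPU before continuing.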

## Step 2: Clone the Repository

In [None]:
# Clone the repository
!git clone https://github.com/srivastavanik/model-training.git

%cd model-training

# List the contents
!ls

## Step 3: Set Up Hugging Face Access

In [None]:
from getpass import getpass

from huggingface_hub import login

# Get your token from https://huggingface.co/settings/tokens
# getpass hides the input so the token is not echoed into the notebook output
token = getpass('Enter your Hugging Face token: ')
login(token=token)
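When the notebook is re-run often, the token can also be read from the `HF_TOKEN` environment variable, which `huggingface_hub` recognizes, with a hidden prompt as a fallback. This is a sketch under that assumption; `resolve_hf_token` is a hypothetical helper, not something the repository provides.

```python
import os
from getpass import getpass


def resolve_hf_token(env=None, prompt=getpass):
    """Return a token from HF_TOKEN if set, otherwise ask for it without echo."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # Fall back to an interactive prompt that hides the typed value
        token = prompt("Enter your Hugging Face token: ").strip()
    if not token:
        raise ValueError("No Hugging Face token provided")
    return token


# Usage in the notebook:
#   from huggingface_hub import login
#   login(token=resolve_hf_token())
```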

## Step 4: Run the Fine-tuning Pipeline

Let's run the complete fine-tuning pipeline. This will:
1. Load the preprocessed data
2. Fine-tune the model using LoRA
3. Evaluate the model's performance

Note: This process may take several hours depending on your GPU and dataset size.

In [None]:
# Run the fine-tuning pipeline
!python run_with_gpu.py --use_8bit
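To give a sense of why LoRA makes this feasible on a Colab GPU: instead of updating a full weight matrix, LoRA trains two small low-rank factors. The sketch below only does the arithmetic for a single layer. The layer size and rank are illustrative assumptions, not values read from the repository or the model config.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out = d_in = 3072  # illustrative layer size, not read from the model config
rank = 16            # illustrative LoRA rank, not taken from the repository

full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
```

With these assumed numbers the LoRA factors hold about 1% of the parameters of the layer they adapt, which is why the optimizer state stays small.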

## Step 5: Save and Download the Model

After training is complete, you can download the fine-tuned model.

In [None]:
# Create a zip file of the fine-tuned model
!zip -r fine_tuned_model.zip fine_tuned_model/

# Download the zip file
from google.colab import files
files.download('fine_tuned_model.zip')
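If the `zip` command is unavailable, the same archive can be built with the Python standard library. This is a sketch, assuming the model was written to `fine_tuned_model/` as in the cell above; `zip_model_dir` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def zip_model_dir(model_dir: str, archive_base: str = "fine_tuned_model") -> Path:
    """Zip model_dir (keeping its top-level folder) and return the archive path."""
    src = Path(model_dir).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"Model directory not found: {src}")
    # An absolute base name keeps the output location independent of root_dir
    base = str(Path(archive_base).resolve())
    archive = shutil.make_archive(base, "zip", root_dir=src.parent, base_dir=src.name)
    return Path(archive)
```

In Colab, the result can then be passed to `files.download(str(zip_model_dir("fine_tuned_model")))`.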

## Optional: Use WandB for Monitoring

To monitor training in real time, run the pipeline with Weights & Biases (WandB) logging enabled instead of the plain command from Step 4.

In [None]:
# Install WandB (already included in the Step 1 install; safe to re-run)
!pip install -q wandb

# Login to WandB
import wandb
wandb.login()

# Run with WandB monitoring
!python run_with_gpu.py --use_8bit --wandb

## Troubleshooting Tips

1. If you run out of GPU memory:
   - Add the `--use_8bit` flag to load the model with 8-bit quantization
   - Reduce the batch size in `config.py`

2. If the training takes too long:
   - Consider using a smaller subset of your data
   - Reduce the number of epochs

3. If you encounter CUDA or model-loading errors:
   - Restart the Colab runtime
   - Ensure you're using a GPU runtime
   - Check that your Hugging Face token is valid, since an invalid token prevents the model from downloading
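For the out-of-memory tip above, a back-of-the-envelope estimate shows what 8-bit loading saves. The sketch counts weights only (activations, gradients, and optimizer state add more) and assumes roughly 3 billion parameters, as the "3B" in the model name suggests.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


n_params = 3e9  # rough size suggested by the "3B" in the model name
for bits in (16, 8):
    print(f"{bits}-bit weights: ~{weight_memory_gib(n_params, bits):.1f} GiB")
```

Under these assumptions the weights drop from about 5.6 GiB at 16-bit to about 2.8 GiB at 8-bit.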

## Next Steps

After training is complete:
1. Download the fine-tuned model
2. Test the model's performance
3. Deploy the model for inference

You can find the evaluation results in the `evaluation_results` directory.
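The file names and formats inside `evaluation_results` are not specified here, so a first step is simply to list what the pipeline wrote. A minimal sketch, with `list_result_files` as a hypothetical helper:

```python
from pathlib import Path


def list_result_files(results_dir: str = "evaluation_results") -> list[str]:
    """Return the relative paths of all files under results_dir, sorted."""
    root = Path(results_dir)
    if not root.is_dir():
        return []  # directory not created yet, e.g. training has not finished
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


for name in list_result_files():
    print(name)
```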