# Visual Question Answering Training

This notebook runs the VQA training pipeline on Google Colab with GPU acceleration.

## 1. Setup Environment

First, clone the repository and install the required dependencies.

In [None]:
# Clone repository
!git clone https://github.com/YOUR_USERNAME/ML-4.git
%cd ML-4

# Install dependencies (Colab usually ships with PyTorch preinstalled; the next line pins the CUDA 11.8 build)
!pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
!pip install transformers timm stable-baselines3 opencv-python pillow sentence-transformers wandb tqdm numpy matplotlib

## 2. Verify GPU Availability

Make sure GPU acceleration is enabled in Colab (Runtime > Change runtime type > GPU).

In [None]:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
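
The check above depends on the installed PyTorch build matching the runtime's CUDA driver. As a cross-check that does not involve PyTorch, the sketch below asks the `nvidia-smi` command-line tool which GPUs the runtime exposes. The helper name `list_gpus` is hypothetical, and the sketch assumes `nvidia-smi` is on the `PATH` whenever a GPU runtime is attached.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, total memory' string per GPU reported by nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on this runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []  # tool is present but no usable GPU was found
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No GPU detected; enable one under Runtime > Change runtime type.")
```

If this lists a GPU while `torch.cuda.is_available()` prints `False`, the likely cause is a PyTorch build that does not match the runtime rather than a missing accelerator.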

## 3. Setup Data

Upload your data to Google Drive, then mount the drive and link the image folder into the repository.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

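# Create a symbolic link to your data (replace path_to_your_data with your own folder).
# -sfn replaces an existing link, so re-running this cell is safe.
!ln -sfn /content/gdrive/MyDrive/path_to_your_data/visual7w-images ITM_Classifier-baselines/visual7w-images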
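
A wrong Drive path produces a dangling link that only fails once training starts reading images. The sketch below does the same linking step in Python and checks the source folder first. The helper name `link_dataset` is hypothetical, and the paths in the usage comment are the placeholders from the cell above.

```python
import os
from pathlib import Path


def link_dataset(source, link):
    """Point `link` at `source`, replacing a stale symlink if one exists."""
    source, link = Path(source), Path(link)
    if not source.is_dir():
        raise FileNotFoundError(f"Data folder not found: {source}")
    if link.is_symlink():
        link.unlink()  # remove the old link so re-running the cell is safe
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(source, link, target_is_directory=True)
    # Count the entries visible through the link as a quick sanity check.
    return sum(1 for _ in link.iterdir())


# Hypothetical usage with the placeholder paths from this notebook:
# n = link_dataset(
#     "/content/gdrive/MyDrive/path_to_your_data/visual7w-images",
#     "ITM_Classifier-baselines/visual7w-images",
# )
# print(f"{n} entries visible through the link")
```

A count of zero usually means the folder exists but the upload to Drive has not finished or went to a different location.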

## 4. Configure WandB

Set up Weights & Biases for experiment tracking.

In [None]:
import wandb

# Prompts for an API key on first use
wandb.login()
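
For unattended runs where an interactive prompt is impractical, one alternative to the login cell above is to configure the client through environment variables. The wandb client reads `WANDB_API_KEY` and `WANDB_MODE`; the helper `choose_wandb_mode` below is hypothetical. Use this sketch instead of the login cell, not after it, because a key stored by an interactive login is not visible in the environment.

```python
import os


def choose_wandb_mode(env=os.environ):
    """Return 'online' when an API key is set in `env`, otherwise 'offline'."""
    return "online" if env.get("WANDB_API_KEY") else "offline"


# Respect a mode that was already set; otherwise pick one from the key's presence.
os.environ.setdefault("WANDB_MODE", choose_wandb_mode())
print(f"WANDB_MODE={os.environ['WANDB_MODE']}")
```

Runs recorded in offline mode stay on local disk and can be uploaded later with the `wandb sync` command.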

## 5. Run Training

Execute the training script.

In [None]:
!python ITM_Classifier-baselines/train_vqa_colab.py
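
Colab sessions can disconnect during long runs, and the cell output is lost with them. As a sketch of one way around this, the helper below runs a command while copying its output to both the cell and a log file. The name `run_and_log` and the log path in the usage comment are assumptions, not part of the repository.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run `cmd`, echoing its output to the cell and appending it to `log_path`."""
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the file current in case the runtime disconnects
        return proc.wait()  # exit code of the command


# Hypothetical usage; "-u" turns off output buffering in the child Python process:
# code = run_and_log(
#     [sys.executable, "-u", "ITM_Classifier-baselines/train_vqa_colab.py"],
#     "/content/gdrive/MyDrive/train_log.txt",
# )
```

Output is forwarded line by line, so progress bars that redraw in place with carriage returns may appear only when a newline is printed.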

## 6. Monitor Results

Training progress can be monitored in two places:
1. The cell output above
2. The WandB dashboard

Saved models are written to your Google Drive under 'ML4_models/'.
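
To confirm that models are actually being written, a sketch like the following lists the files in the output folder, newest first. It assumes 'ML4_models/' sits at the top level of the Drive mounted at `/content/gdrive`; adjust the path if the training script saves elsewhere. The helper name `list_checkpoints` is hypothetical.

```python
from pathlib import Path


def list_checkpoints(model_dir):
    """Return files under `model_dir`, newest first, as (name, size_in_MB) pairs."""
    root = Path(model_dir)
    if not root.is_dir():
        return []  # nothing saved yet, or the path is wrong
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return [(str(p.relative_to(root)), round(p.stat().st_size / 1e6, 2)) for p in files]


# Assumed location of the saved models on the mounted Drive.
for name, size_mb in list_checkpoints("/content/gdrive/MyDrive/ML4_models"):
    print(f"{name}  {size_mb} MB")
```

An empty listing after training has run for a while suggests the script is saving to a different directory than the one assumed here.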