# DCASE 2022 VQ-VAE + PixelSNAIL: Training on Colab

Runs the project training pipeline on Google Colab with **more epochs** per training phase.

**Setup:** Mount Google Drive, then clone this repo into a folder under `/content/drive/MyDrive/`. The Colab kernel needs the project files on the VM; cloning into Drive keeps the code, checkpoints, and logs in one place and persists them across sessions. Then set `DATA_ROOT` in the config cell (for example, a dataset folder on the same Drive).

## 1. Mount Google Drive
Required so we can clone the project under `/content/drive/MyDrive/` and (optionally) keep dataset and checkpoints on Drive.

In [10]:
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## 2. Clone project into Drive
Clone the repo under your Drive so the Colab kernel has the full project (including `src/`). If the folder already exists, we pull the latest; otherwise we clone. Change `REPO_DIR` if you want a different path.

In [17]:
import os

# Clone into this folder under your Drive (My Drive = /content/drive/MyDrive)
REPO_DIR = "/content/drive/MyDrive/semcom_asd_vqar"
REPO_URL = "https://github.com/raidantimosquitos/semcom_asd_vqar.git"

if os.path.isdir(REPO_DIR):
    !cd "{REPO_DIR}" && git pull
else:
    os.makedirs(os.path.dirname(REPO_DIR), exist_ok=True)
    !git clone {REPO_URL} {REPO_DIR}

Cloning into '/content/drive/MyDrive/semcom_asd_vqar'...
remote: Enumerating objects: 29, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 29 (delta 1), reused 29 (delta 1), pack-reused 0 (from 0)
Receiving objects: 100% (29/29), 22.54 KiB | 607.00 KiB/s, done.
Resolving deltas: 100% (1/1), done.


## 3. Project and dependencies

Uses `PROJECT_ROOT` from the clone path above, adds it to `sys.path`, and installs deps so `src` can be imported.

In [18]:
import sys
import os

# Use the clone path from the cell above (or set manually if you cloned elsewhere)
try:
    PROJECT_ROOT = REPO_DIR
except NameError:
    PROJECT_ROOT = "/content/drive/MyDrive/semcom_asd_vqar"

if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)
os.chdir(PROJECT_ROOT)

# Install dependencies (Colab usually has torch/numpy/sklearn; add PyYAML if missing)
!pip install -q PyYAML
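
The install comment above assumes that Colab already ships most dependencies. A minimal sketch of how that assumption could be checked before training, using only the standard library. The helper name `missing_modules` and the list of module names are illustrative assumptions, not part of the project:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not pip names: PyYAML is imported as `yaml`, scikit-learn as `sklearn`.
# This list is an assumption based on the comment in the cell above.
REQUIRED = ["torch", "numpy", "sklearn", "yaml"]

missing = missing_modules(REQUIRED)
print("Missing modules:", missing if missing else "none")
```

`find_spec` only locates a module without importing it, so the check is cheap and has no side effects on the kernel.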

## 4. Config and paths
Set `DATA_ROOT` to your dataset (e.g. on Drive: `/content/drive/MyDrive/datasets/dcase2020-task2-dev-dataset`). Checkpoints and logs are under the cloned project on Drive.

In [19]:
# Dataset path: on Colab use Drive path after mount, e.g. "/content/drive/MyDrive/datasets/dcase2020-task2-dev-dataset"
DATA_ROOT = "/content/drive/MyDrive/datasets/dcase2020-task2-dev-dataset"

# Checkpoints and logs under the cloned project on Drive
CHECKPOINT_DIR = "./checkpoints"
LOG_DIR = "./logs"

# Overrides: more epochs per phase (merged into the base config)
OVERRIDES = {
    "data": {"root_dir": DATA_ROOT},
    "phase1": {
        "num_epochs": 50,
        "checkpoint": f"{CHECKPOINT_DIR}/mobilenetv2_8x_vqvae.pth",
    },
    "phase2": {
        "num_epochs": 80,
        "checkpoint": f"{CHECKPOINT_DIR}/pixelsnail_prior.pth",
    },
    "eval": {
        "vqvae_checkpoint": f"{CHECKPOINT_DIR}/mobilenetv2_8x_vqvae.pth",
        "prior_checkpoint": f"{CHECKPOINT_DIR}/pixelsnail_prior.pth",
    },
    "logging": {"log_dir": LOG_DIR},
}

os.makedirs(CHECKPOINT_DIR, exist_ok=True)
os.makedirs(LOG_DIR, exist_ok=True)
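
`OVERRIDES` is nested, so merging it into a base config has to recurse into sub-dictionaries rather than replace whole sections. The sketch below shows one common way to do that. `deep_merge` and the sample `base` values are hypothetical, and the project's own `run` function may merge differently:

```python
import copy


def deep_merge(base, overrides):
    """Return a new dict with `overrides` merged recursively into `base`."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing the section.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys simply take the override value.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base config; the real values live in configs/colab.yaml.
base = {"phase1": {"num_epochs": 10, "batch_size": 32}}
overrides = {"phase1": {"num_epochs": 50}}

merged = deep_merge(base, overrides)
print(merged)
```

With a recursive merge, keys that the overrides do not mention (here `batch_size`) keep their base values, and the original `base` dict is left unmodified.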

## 5. Run training
Uses `configs/colab.yaml` as the base config and applies the overrides above. Log output goes to both the console and a log file.

In [20]:
from src.main import run

run(
    config_path=os.path.join(PROJECT_ROOT, "configs", "colab.yaml"),
    overrides=OVERRIDES,
    mode="train",
    log_dir=LOG_DIR,
)

ModuleNotFoundError: No module named 'src.data'
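
The error above means Python found a `src` package but no `data` submodule inside it. A minimal diagnostic sketch, assuming the cause lies on the file-system side (for example, a missing `src/data/` folder or `src/data.py` in the clone). The helper `module_candidates` is hypothetical; it only inspects paths and does not import anything:

```python
import os


def module_candidates(project_root, dotted_name):
    """Map a dotted module name to the package dir and .py file it could load from."""
    relative = os.path.join(*dotted_name.split("."))
    package_dir = os.path.join(project_root, relative)
    module_file = package_dir + ".py"
    return {
        "package_dir": package_dir,
        "package_dir_exists": os.path.isdir(package_dir),
        "module_file": module_file,
        "module_file_exists": os.path.isfile(module_file),
    }


# Assumed clone location from the cells above; adjust if you cloned elsewhere.
report = module_candidates("/content/drive/MyDrive/semcom_asd_vqar", "src.data")
for key, value in report.items():
    print(f"{key}: {value}")
```

If neither path exists, the module is absent from the clone itself, so the fix belongs in the repository rather than in this notebook. If one of them exists, a different `src` package earlier on `sys.path` may be shadowing the project's.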