# ETSP - Run on Google Colab

This notebook sets up the environment to run the project on Google Colab.
It handles dependency installation, data retrieval, and execution.


## 1. Environment Setup

Mount Google Drive and install conda via condacolab.

**Note**: After you run this cell, the runtime restarts automatically. This is expected; continue with the next cell.


In [None]:
%pip install -q condacolab gdown
from google.colab import drive
import condacolab

drive.mount("/content/drive", force_remount=True)

condacolab.install()  # This will restart the runtime automatically
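After the runtime restarts, a quick check like the one below can confirm that the Drive mount and the `conda` executable are both visible before you move on. This is a minimal sketch. `check_colab_setup` is a hypothetical helper, and its default path mirrors the mount point used above.

```python
import shutil
from pathlib import Path


def check_colab_setup(drive_root: str = "/content/drive/MyDrive") -> dict:
    """Hypothetical helper: report whether the Drive mount and conda are available."""
    return {
        # Matches the mount point used by drive.mount("/content/drive") above
        "drive_mounted": Path(drive_root).is_dir(),
        # After condacolab.install(), `conda` is expected to be on PATH
        "conda_on_path": shutil.which("conda") is not None,
    }


for name, ok in check_colab_setup().items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```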


## 2. Clone Repository and Setup Conda Environment

**This step may take 5 to 10 minutes and can disconnect the runtime. Colab should reconnect automatically.**

Clone the source code from GitHub and install dependencies via conda.


In [None]:
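import shutil
from pathlib import Path

import condacolab

condacolab.check()

repo_url = "https://github.com/gdoda/ETSP.git"
repo_dir = "/content/etsp-github"

# Clone only if the repository is not already present, so the cell can be re-run
if not Path(repo_dir).exists():
    !git clone --quiet $repo_url $repo_dir
%cd $repo_dir

# Create the conda environment defined in environment.yml
!conda env create -f environment.yml --quiet

# Set up the Google Drive folder for model checkpoints and reports
drive_checkpoints = Path("/content/drive/MyDrive/ETSP_checkpoints")
drive_checkpoints.mkdir(parents=True, exist_ok=True)

# Replace models/ with a symlink to the Drive folder so checkpoints persist.
# Check is_symlink() first: exists() follows links, and shutil.rmtree()
# raises an error on a symlink (e.g. when this cell is re-run).
local_models = Path("models")
if local_models.is_symlink():
    local_models.unlink()
elif local_models.exists():
    shutil.rmtree(local_models)
local_models.symlink_to(drive_checkpoints, target_is_directory=True)

print(f"Models directory linked to: {drive_checkpoints}")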
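The symlink is what makes checkpoints persist: anything the pipeline writes under `models/` is physically stored in the Drive folder, which outlives the Colab runtime. The sketch below demonstrates this with temporary directories standing in for the Drive folder and the cloned repository. `demo_symlinked_models` and the file name `checkpoint.pt` are illustrative only.

```python
import tempfile
from pathlib import Path


def demo_symlinked_models() -> list:
    """Show that files saved under a symlinked models/ land in the link target."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        drive_dir = root / "drive" / "ETSP_checkpoints"  # stand-in for the Drive folder
        drive_dir.mkdir(parents=True)
        repo_dir = root / "repo"  # stand-in for the cloned repository
        repo_dir.mkdir()

        models = repo_dir / "models"
        models.symlink_to(drive_dir, target_is_directory=True)

        # Write through the link, then list the *target* directory
        (models / "checkpoint.pt").write_bytes(b"dummy weights")
        return sorted(p.name for p in drive_dir.iterdir())


print(demo_symlinked_models())  # ['checkpoint.pt']
```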

## 3. Prepare Dataset

This cell retrieves both required files:

- **flac.zip** - Audio files
- **trial_metadata.txt** - Ground truth labels

It first looks for the files in Google Drive (for team members) and, if they are not found there, downloads them from public links (for TAs).

Adjust `MAX_FILES` to limit how many files are extracted (set it to `None` to extract all FLAC files).
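Before choosing a value for `MAX_FILES`, it can help to see how many FLAC files the archive holds and how much disk space a given cap would need once extracted. This is a minimal sketch using only the standard `zipfile` module with the same filtering as the extraction cell. `flac_summary` is a hypothetical helper, and the sizes are the uncompressed sizes recorded in the ZIP directory.

```python
import zipfile
from pathlib import Path
from typing import Optional, Tuple


def flac_summary(zip_path: Path, max_files: Optional[int] = None) -> Tuple[int, int, float]:
    """Return (selected, total, selected_megabytes) for the .flac entries of a ZIP."""
    with zipfile.ZipFile(zip_path) as zf:
        # Same filter as the extraction cell: .flac files, no macOS metadata
        flacs = [
            info for info in zf.infolist()
            if info.filename.endswith(".flac") and "__MACOSX" not in info.filename
        ]
    # Same semantics as the cell: a falsy max_files (e.g. None) means "all files"
    selected = flacs[:max_files] if max_files else flacs
    selected_mb = sum(info.file_size for info in selected) / 1e6
    return len(selected), len(flacs), selected_mb


# Example (in Colab, once flac.zip is available):
# n, total, mb = flac_summary(Path("flac.zip"), max_files=10000)
# print(f"{n}/{total} files, ~{mb:.0f} MB uncompressed")
```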


In [None]:
import shutil
import zipfile
from pathlib import Path

from src.config import config

MAX_FILES = 10000  # Set to None to extract all files

# Source paths (Google Drive, with public download links as fallback)
DRIVE = Path("/content/drive/MyDrive/ASVspoof21")
FLAC_URL = "https://drive.google.com/uc?id=1E26Zptq_Uh_zVl17nJkcrUAAO-BqNLCm"
META_URL = "https://drive.google.com/uc?id=1293N5dDYwhxBTDtOzSEalEm_ud8Np55s"

# Destination paths (from config.py)
DEST_AUDIO = Path(config.raw_audio_dir)
DEST_METADATA = Path(config.protocol_file)

DEST_AUDIO.mkdir(parents=True, exist_ok=True)
DEST_METADATA.parent.mkdir(parents=True, exist_ok=True)

# 1. Metadata: copy from Drive if available, otherwise download
if not DEST_METADATA.exists():
    meta_src = DRIVE / "trial_metadata.txt"
    if meta_src.exists():
        shutil.copy(meta_src, DEST_METADATA)
    else:
        !gdown -q {META_URL} -O {DEST_METADATA}

# 2. Audio files: extract only if no .flac files are present yet
if not any(DEST_AUDIO.rglob("*.flac")):
    zip_src = DRIVE / "flac.zip" if (DRIVE / "flac.zip").exists() else Path("flac.zip")
    if not zip_src.exists():
        !gdown {FLAC_URL} -O {zip_src}

    with zipfile.ZipFile(zip_src) as z:
        # Keep only .flac entries and skip macOS metadata folders
        flacs = [f for f in z.namelist() if f.endswith(".flac") and "__MACOSX" not in f]
        selected = flacs[:MAX_FILES] if MAX_FILES else flacs
        print(f"Extracting {len(selected)}/{len(flacs)} files...")

        # Write every file directly into DEST_AUDIO (folder structure is flattened)
        for member in selected:
            target_path = DEST_AUDIO / Path(member).name
            with z.open(member) as src, open(target_path, "wb") as dst:
                # Stream in chunks instead of reading the whole file into memory
                shutil.copyfileobj(src, dst)

print(f"Audio: {len(list(DEST_AUDIO.rglob('*.flac')))} files")
print(f"Labels: {'OK' if DEST_METADATA.exists() else 'MISSING'}")
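Because `MAX_FILES` keeps only the first entries of the archive, the extracted subset may not have the same class balance as the full dataset. The sketch below counts labels for the extracted files only. It assumes, without checking against the repository's own parser, that each metadata line is whitespace-separated, with the trial ID (the FLAC file name without extension) in the second column and a `bonafide` or `spoof` key elsewhere on the line. `label_coverage` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path


def label_coverage(metadata_file: Path, audio_dir: Path) -> Counter:
    """Count bonafide/spoof labels among the extracted .flac files.

    ASSUMED format: whitespace-separated columns, trial ID in column 2,
    and a 'bonafide' or 'spoof' token somewhere on the line.
    """
    extracted = {p.stem for p in audio_dir.rglob("*.flac")}
    matched = set()
    counts = Counter()
    for line in metadata_file.read_text().splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[1] not in extracted or fields[1] in matched:
            continue
        matched.add(fields[1])
        label = next((f for f in fields if f in ("bonafide", "spoof")), "unknown")
        counts[label] += 1
    # Extracted files that have no metadata line at all
    counts["unlabeled"] = len(extracted - matched)
    return counts


# Usage in the notebook, after the dataset cell has run:
# print(label_coverage(DEST_METADATA, DEST_AUDIO))
```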

## 4. Configuration

Configuration parameters and paths are defined in `src/config.py`. Review them before running the next cell.


## 5. Run Pipeline

Execute the main pipeline to process data and train models.
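Before launching the run, it can help to print the configuration values the pipeline will actually use. This is a minimal sketch that lists the public, non-callable attributes of any object. `describe_config` is a hypothetical helper, and the `SimpleNamespace` is only a stand-in with made-up values; in the notebook, pass the `config` object imported from `src.config` instead.

```python
from types import SimpleNamespace


def describe_config(cfg) -> dict:
    """Return the public, non-callable attributes of a config object."""
    values = {}
    for name in dir(cfg):  # dir() also covers class attributes and properties
        if name.startswith("_"):
            continue
        value = getattr(cfg, name)
        if not callable(value):
            values[name] = value
    return values


# Stand-in object with hypothetical values; in the notebook use:
#   from src.config import config
#   describe_config(config)
demo_cfg = SimpleNamespace(raw_audio_dir="data/raw", protocol_file="data/trial_metadata.txt")
for key, value in describe_config(demo_cfg).items():
    print(f"{key} = {value}")
```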


In [None]:
!conda run --no-capture-output -n etsp python main.py
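The `!` shell escape does not stop the notebook when the command fails. If you prefer a failed run to raise an error, the same command can be launched from Python with `subprocess.run(..., check=True)`. This is a minimal sketch. `conda_run_command` and `run_pipeline` are hypothetical helpers, and the environment name `etsp` and the script `main.py` are taken from the cell above.

```python
import shlex
import subprocess


def conda_run_command(env_name: str, script: str) -> list:
    """Build the argument list for `conda run --no-capture-output -n ENV python SCRIPT`."""
    return ["conda", "run", "--no-capture-output", "-n", env_name, "python", script]


def run_pipeline(env_name: str = "etsp", script: str = "main.py") -> None:
    cmd = conda_run_command(env_name, script)
    print("Running:", shlex.join(cmd))
    # check=True raises subprocess.CalledProcessError on a non-zero exit code
    subprocess.run(cmd, check=True)


# In the notebook (from the repository directory):
# run_pipeline()
```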
