# ETSP - Run on Google Colab

This notebook sets up the environment to run the project on Google Colab.
It handles dependency installation, data retrieval, and execution.


## 1. Environment Setup

Mount Google Drive to access the dataset shortcut.
Team members can use the shortcut shared on Drive; TAs will need to download the data from the web instead.


In [None]:
from google.colab import drive
import os

%pip install -q gdown
drive.mount("/content/drive", force_remount=True)

# Install uv
!curl -LsSf https://astral.sh/uv/install.sh | sh

# Add uv to PATH for this session
os.environ["PATH"] += ":/root/.local/bin"

# Install Python 3.10.19
!uv python install 3.10.19
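
Re-running the cell above appends `/root/.local/bin` to `PATH` again each time. A minimal sketch of an idempotent alternative, assuming the same install location; the helper name `add_to_path` is hypothetical and not part of the project:

```python
import os
import shutil


def add_to_path(directory: str) -> bool:
    """Append `directory` to PATH unless it is already listed.

    Returns True if PATH was changed, False if the entry was already present.
    """
    entries = [e for e in os.environ.get("PATH", "").split(os.pathsep) if e]
    if directory in entries:
        return False
    os.environ["PATH"] = os.pathsep.join(entries + [directory])
    return True


# Directory used by the cell above; safe to call on every re-run.
add_to_path("/root/.local/bin")

# shutil.which returns None if the uv install step did not succeed.
print("uv found at:", shutil.which("uv"))
```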

## 2. Clone Repository

Clone the source code from GitHub and create the virtual environment with all dependencies.


In [None]:
repo_url = "https://github.com/gdoda/ETSP.git"
repo_dir = "/content/etsp-github"
!rm -rf $repo_dir
!git clone $repo_url $repo_dir
%cd $repo_dir
# Install dependencies
!uv sync --python 3.10.19

## 3. Prepare Dataset

This cell automatically downloads both required files:

- **flac.zip** - Audio files
- **trial_metadata.txt** - Ground truth labels

The cell first looks for the files on Google Drive (team members) and falls back to the public download links if they are not found there (TAs).
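
The Drive-first lookup can be sketched as a small helper. This is only an illustration of the idea; `pick_source` is a hypothetical name and the cell below inlines the same logic rather than calling it:

```python
from pathlib import Path
from typing import Optional


def pick_source(drive_copy: Path, local_copy: Path) -> Optional[Path]:
    """Return the first existing copy, preferring the Drive one.

    None means neither copy exists and the file has to be downloaded.
    """
    for candidate in (drive_copy, local_copy):
        if candidate.exists():
            return candidate
    return None


src = pick_source(Path("/content/drive/MyDrive/ASVspoof21/flac.zip"), Path("flac.zip"))
print("download needed" if src is None else f"using {src}")
```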

Adjust `MAX_FILES` to limit extraction (set to `None` for all flac files).
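
As a rough illustration of how the limit is applied, the sketch below mirrors the filtering and slicing done in the cell. The function name `select_members` and the sample archive listing are made up for the example:

```python
from typing import List, Optional


def select_members(names: List[str], max_files: Optional[int]) -> List[str]:
    """Keep real .flac entries (skipping macOS resource forks), then apply the limit."""
    flacs = [n for n in names if n.endswith(".flac") and "__MACOSX" not in n]
    # Same rule as the cell: a falsy limit (None or 0) means "extract everything".
    return flacs[:max_files] if max_files else flacs


# Hypothetical archive listing, for illustration only.
names = ["flac/a.flac", "__MACOSX/flac/._a.flac", "flac/b.flac", "README.txt"]
print(select_members(names, 1))     # ['flac/a.flac']
print(select_members(names, None))  # ['flac/a.flac', 'flac/b.flac']
```

The first `MAX_FILES` entries are taken in archive order, so the subset is deterministic but not a random sample.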


In [None]:
import zipfile
import shutil
from pathlib import Path
from src.config import config

MAX_FILES = 50000  # Set to None for all files

# Source paths (Google Drive or fallback URLs)
DRIVE = Path("/content/drive/MyDrive/ASVspoof21")
FLAC_URL = "https://drive.google.com/uc?id=1E26Zptq_Uh_zVl17nJkcrUAAO-BqNLCm"
META_URL = "https://drive.google.com/uc?id=1293N5dDYwhxBTDtOzSEalEm_ud8Np55s"

# Destination paths (from config.py)
DEST_AUDIO = Path(config.raw_audio_dir)
DEST_METADATA = Path(config.protocol_file)

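DEST_AUDIO.mkdir(parents=True, exist_ok=True)
# Make sure the metadata file's parent folder exists before copying or downloading into it
DEST_METADATA.parent.mkdir(parents=True, exist_ok=True)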

# 1. Metadata
if not DEST_METADATA.exists():
    src = DRIVE / "trial_metadata.txt"
    if src.exists():
        shutil.copy(src, DEST_METADATA)
    else:
        !gdown -q {META_URL} -O {DEST_METADATA}

# 2. Audio files
if not any(DEST_AUDIO.rglob("*.flac")):
    zip_src = DRIVE / "flac.zip" if (DRIVE / "flac.zip").exists() else Path("flac.zip")
    if not zip_src.exists():
        !gdown {FLAC_URL} -O {zip_src}

    with zipfile.ZipFile(zip_src) as z:
        # Keep only real audio entries; skip macOS resource-fork copies
        flacs = [f for f in z.namelist() if f.endswith(".flac") and "__MACOSX" not in f]
        selected = flacs[:MAX_FILES] if MAX_FILES else flacs
        print(f"Extracting {len(selected)}/{len(flacs)} files...")
        z.extractall(DEST_AUDIO, selected)

print(f"Audio: {len(list(DEST_AUDIO.rglob('*.flac')))} files")
print(f"Labels: {'OK' if DEST_METADATA.exists() else 'MISSING'}")

## 4. Configuration

The configuration parameters and paths are defined in `src/config.py`. Review them before running the next cell.


## 5. Run Pipeline

Execute the main pipeline to process data and train models.
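
Before launching a long run, it can help to confirm that the dataset step actually produced its outputs. The following is a sketch only: `preflight` is a hypothetical helper, not part of the repository, and it assumes the two destinations are the ones read from `config.raw_audio_dir` and `config.protocol_file` in the dataset cell.

```python
from pathlib import Path
from typing import List


def preflight(audio_dir: Path, protocol_file: Path) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    audio_dir, protocol_file = Path(audio_dir), Path(protocol_file)
    if not audio_dir.is_dir() or not any(audio_dir.rglob("*.flac")):
        problems.append(f"no .flac files under {audio_dir}")
    if not protocol_file.is_file():
        problems.append(f"missing metadata file {protocol_file}")
    return problems


# In the notebook this could be called as (assumed usage):
#   preflight(Path(config.raw_audio_dir), Path(config.protocol_file))
```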


In [None]:
!uv run --python 3.10.19 python main.py