# MASt3R Face Authentication - Colab Template

**Purpose**: This notebook provides a template for team members to work with MASt3R on Google Colab.

**Who should use this**:
- CS-2: For testing visualization with real point cloud data
- DS-1: For developing matching algorithms
- DS-2: For evaluation and anti-spoofing analysis

**What you'll need**:
- A Google account with Google Drive access
- The shared `face-auth-data` folder on Google Drive

---

## How to use this notebook

1. **First time**: Go to `File > Save a copy in Drive` to create your own copy
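2. **Every session**: Run Cells 1-3 to set up the environment, then Cells 4-5 if you need to run MASt3R inference yourself (if you only work with pre-computed `.npz` files, you can skip to Cell 6)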
3. **Before closing**: Push your changes to GitHub (see final cell)

**Important**: Colab sessions are temporary! Always push your code changes to GitHub before closing.

---

*Author: CS-1 | Last updated: 2026-02*

## Cell 1: GPU Check & Google Drive Mount

This cell verifies that a GPU is available and mounts your Google Drive.

**What is Google Drive Mount?**
- It makes your Drive files accessible as if they were local files
- Your Drive will appear at `/content/drive/MyDrive/`
- This is where we store large files (model weights, datasets)

In [None]:
# ============================================================
# Cell 1: GPU Check + Google Drive Mount
# ============================================================
# This cell must be run at the START of every Colab session.
# It checks for GPU availability and mounts Google Drive.
# ============================================================

import torch
import os

# ----- GPU Check -----
# MASt3R requires a GPU for reasonable performance.
# If no GPU is detected, go to: Runtime > Change runtime type > T4 GPU

if not torch.cuda.is_available():
    raise RuntimeError(
        "\n" + "="*60 + "\n" +
        "ERROR: No GPU detected!\n" +
        "Please enable GPU: Runtime > Change runtime type > T4 GPU\n" +
        "="*60
    )

# Print GPU information
gpu_name = torch.cuda.get_device_name(0)
vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
print(f"✅ GPU detected: {gpu_name}")
print(f"   VRAM: {vram_gb:.1f} GB")

# ----- Mount Google Drive -----
# This allows access to shared data files stored on Drive.
# You'll be prompted to authorize access the first time.

from google.colab import drive
drive.mount('/content/drive')

# ----- Set up shared data directory -----
# SHARED_DIR is where all team data lives:
#   - checkpoints/: MASt3R model weights (~1GB)
#   - raw_captures/: Webcam images captured by CS-1
#   - mast3r_outputs/: Pre-computed point clouds & descriptors
#   - evaluation_results/: DS team experiment outputs

SHARED_DIR = "/content/drive/MyDrive/face-auth-data"

# Create subdirectories if they don't exist
subdirs = ["checkpoints", "raw_captures", "mast3r_outputs", "evaluation_results"]
for subdir in subdirs:
    os.makedirs(f"{SHARED_DIR}/{subdir}", exist_ok=True)

print(f"\n✅ Google Drive mounted")
print(f"   Shared data directory: {SHARED_DIR}")

# List what's in the shared directory
print(f"\nüìÅ Contents of shared directory:")
for item in os.listdir(SHARED_DIR):
    item_path = os.path.join(SHARED_DIR, item)
    if os.path.isdir(item_path):
        n_files = len(os.listdir(item_path))
        print(f"   üìÇ {item}/ ({n_files} items)")
    else:
        print(f"   üìÑ {item}")

## Cell 2: Clone GitHub Repository

This cell clones our team's code repository from GitHub.

**Why clone every session?**
- Colab's filesystem is temporary (erased when session ends)
- GitHub is our "source of truth" for all code
- This ensures you always have the latest code

**Important**: Before running, update `REPO_URL`, `GIT_EMAIL`, and `GIT_NAME` in the configuration block with your team's repository URL and your own git identity!
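Forgetting to edit the placeholders is an easy mistake, so you could guard against it before cloning. A minimal sketch, assuming the placeholder strings used in Cell 2 (`YOUR-TEAM`, `your-email@example.com`, `Your Name`); `check_git_config` is a hypothetical helper, not part of the repository:

```python
# Hypothetical helper: flag settings that still contain Cell 2's placeholder text.
PLACEHOLDERS = ("YOUR-TEAM", "your-email@example.com", "Your Name")


def check_git_config(repo_url: str, git_email: str, git_name: str) -> list:
    """Return the names of settings that still look like placeholders."""
    settings = {"REPO_URL": repo_url, "GIT_EMAIL": git_email, "GIT_NAME": git_name}
    return [
        name
        for name, value in settings.items()
        if any(marker in value for marker in PLACEHOLDERS)
    ]


# Usage (e.g., right after the CONFIGURATION block in Cell 2):
# stale = check_git_config(REPO_URL, GIT_EMAIL, GIT_NAME)
# if stale:
#     raise ValueError(f"Please update these settings in Cell 2: {stale}")
```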

In [None]:
# ============================================================
# Cell 2: Clone GitHub Repository
# ============================================================
# Clone the team's repository from GitHub.
# This gives you access to all source code.
# ============================================================

# ----- CONFIGURATION (UPDATE THESE!) -----
# Replace with your team's actual repository URL
REPO_URL = "https://github.com/YOUR-TEAM/face-auth-mast3r.git"  # <-- UPDATE THIS!

# Which branch to check out (usually 'develop' for development work)
BRANCH = "develop"

# Your git identity (needed for commits)
GIT_EMAIL = "your-email@example.com"  # <-- UPDATE THIS!
GIT_NAME = "Your Name"                # <-- UPDATE THIS!
# ------------------------------------------

import os

REPO_DIR = "/content/repo"

# Clone if not already cloned
if not os.path.exists(REPO_DIR):
    print(f"Cloning repository from {REPO_URL}...")
    !git clone {REPO_URL} {REPO_DIR}
else:
    print(f"Repository already exists at {REPO_DIR}")

# Change to repo directory
%cd {REPO_DIR}

# Fetch latest changes and checkout the specified branch
!git fetch origin
!git checkout {BRANCH}
!git pull origin {BRANCH}

# Configure git identity for commits
!git config user.email "{GIT_EMAIL}"
!git config user.name "{GIT_NAME}"

print(f"\n✅ Repository ready at {REPO_DIR}")
print(f"   Branch: {BRANCH}")

# Show recent commits
print(f"\nüìú Recent commits:")
!git log --oneline -5

## Cell 3: Symlink Large Files from Drive

This cell creates symbolic links from Google Drive to the repository.

**Why symlinks?**
- Large files (model weights, datasets) live on Google Drive
- They're too big for GitHub (100MB limit)
- Symlinks let the code access them as if they were in the repo

**What is a symlink?**
- A "shortcut" that points to another file/folder
- Like a Windows shortcut, but for Linux
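To see how a symlink behaves, here is a small self-contained demo that works only inside a temporary directory (nothing on Drive is touched). It also shows a subtlety worth knowing: `os.path.exists` follows the link and returns `False` for a *broken* symlink, while `os.path.lexists` checks the link itself. The file names are made up for the demo.

```python
import os
import tempfile


def demo_symlink() -> dict:
    """Create a file and a symlink to it in a temp dir, then break the link."""
    with tempfile.TemporaryDirectory() as tmp:
        # A small file standing in for something large stored on Drive
        target = os.path.join(tmp, "weights.pth")
        with open(target, "w") as f:
            f.write("fake model weights")

        # The symlink: a second path that points at the same file
        link = os.path.join(tmp, "checkpoint_link.pth")
        os.symlink(target, link)

        with open(link) as f:
            contents = f.read()  # reading through the link reads the target
        result = {
            "is_link": os.path.islink(link),
            "points_to_target": os.readlink(link) == target,
            "contents": contents,
        }

        # Delete the target (like an unmounted Drive): the link is now "broken"
        os.remove(target)
        result["exists_after_break"] = os.path.exists(link)    # follows the link
        result["lexists_after_break"] = os.path.lexists(link)  # checks the link itself
    return result


print(demo_symlink())
```

Expect `is_link`, `points_to_target`, and `lexists_after_break` to be `True`, and `exists_after_break` to be `False`.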

In [None]:
# ============================================================
# Cell 3: Symlink Large Files from Google Drive
# ============================================================
# Create symbolic links so code can access Drive data.
# These symlinks are gitignored - they won't be committed.
# ============================================================

import os

REPO_DIR = "/content/repo"
SHARED_DIR = "/content/drive/MyDrive/face-auth-data"

# ----- Symlink 1: Model checkpoints -----
# The MASt3R model weights file is ~1GB
checkpoints_dir = f"{REPO_DIR}/checkpoints"
os.makedirs(checkpoints_dir, exist_ok=True)

checkpoint_file = "MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth"
checkpoint_src = f"{SHARED_DIR}/checkpoints/{checkpoint_file}"
checkpoint_dst = f"{checkpoints_dir}/{checkpoint_file}"

if os.path.exists(checkpoint_src):
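    # lexists() is True even for a broken symlink (e.g., left over after a
    # Drive remount), so os.symlink() won't fail with FileExistsError
    if not os.path.lexists(checkpoint_dst):
        os.symlink(checkpoint_src, checkpoint_dst)
    print(f"✅ Checkpoint linked: {checkpoint_file}")
else:
    print(f"⚠️  Checkpoint not found on Drive: {checkpoint_src}")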
    print(f"   CS-1 needs to upload it to the shared folder.")

# ----- Symlink 2: Pre-computed MASt3R outputs -----
# These .npz files contain point clouds and descriptors
# DS team uses these to develop matching algorithms
data_shared_link = f"{REPO_DIR}/data_shared"
mast3r_outputs = f"{SHARED_DIR}/mast3r_outputs"

if os.path.exists(mast3r_outputs):
    if not os.path.lexists(data_shared_link):  # lexists() also catches broken symlinks
        os.symlink(mast3r_outputs, data_shared_link)
    print(f"✅ MASt3R outputs linked to: data_shared/")

    # List available .npz files
    npz_files = [f for f in os.listdir(mast3r_outputs) if f.endswith('.npz')]
    if npz_files:
        print(f"   Available templates:")
        for f in npz_files[:10]:  # Show first 10
            print(f"     - {f}")
        if len(npz_files) > 10:
            print(f"     ... and {len(npz_files) - 10} more")
    else:
        print(f"   No .npz files yet (CS-1 will generate these)")
else:
    print(f"⚠️  MASt3R outputs folder not found: {mast3r_outputs}")

print(f"\n✅ Drive data symlinked into repository")

## Cell 4: Install MASt3R Dependencies

This cell installs the Python packages that MASt3R needs.

**Note**: This takes 3-5 minutes and must be run every session.

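**What gets installed?**
- The packages listed in MASt3R's and DUSt3R's `requirements.txt` files, including PyTorch (deep learning framework) and various scientific Python packages
- Optionally, compiled CUDA kernels for RoPE (`curope`) for faster inference
- Open3D (point cloud processing) is not guaranteed to be among these; if you need it, install it separately (see the Appendix)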
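After Cell 4 finishes, a quick way to confirm that key packages are importable is to look them up with `importlib.util.find_spec`, which checks availability without actually importing them. A minimal sketch; the package list is an illustrative assumption, so adjust it to what your own code needs:

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Illustrative list (an assumption, not MASt3R's full requirements)
status = check_packages(["torch", "numpy", "scipy", "plotly", "open3d"])
for name, ok in status.items():
    print(f"{'OK     ' if ok else 'MISSING'} {name}")
```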

In [None]:
%%bash
# ============================================================
# Cell 4: Install MASt3R Dependencies
# ============================================================
# Install Python packages required by MASt3R.
# This takes 3-5 minutes and must be done every session.
# NOTE: %%bash must stay on the first line of the cell; a cell
# magic placed after other lines is not recognized.
# ============================================================

# Check if MASt3R is available as submodule or standalone
if [ -d "/content/repo/third_party/mast3r" ]; then
    MAST3R_DIR="/content/repo/third_party/mast3r"
    echo "Using MASt3R from repository submodule"
else
    # Clone MASt3R standalone if submodule not initialized
    if [ ! -d "/content/mast3r" ]; then
        echo "Cloning MASt3R repository..."
        git clone --recursive https://github.com/naver/mast3r.git /content/mast3r
    fi
    MAST3R_DIR="/content/mast3r"
    echo "Using standalone MASt3R clone"
fi

cd $MAST3R_DIR

# Install MASt3R requirements
echo "Installing MASt3R requirements..."
pip install -q -r requirements.txt

# Install DUSt3R requirements (MASt3R depends on DUSt3R)
echo "Installing DUSt3R requirements..."
pip install -q -r dust3r/requirements.txt

# Optional: Compile CUDA kernels for faster inference
# This may fail on some Colab instances, but it's not critical
echo "Attempting to compile CUDA kernels (optional)..."
cd dust3r/croco/models/curope/
python setup.py build_ext --inplace 2>/dev/null || echo "CUDA kernel compilation skipped (non-critical)"

echo ""
echo "✅ Dependencies installed!"

## Cell 5: Configure Python Path & Load MASt3R Model

This cell sets up Python import paths and loads the MASt3R model.

**What does this do?**
- Adds MASt3R and our project to Python's import path
- Loads the pre-trained model into GPU memory

**After this cell**: You can use `model` for inference!
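Why does adding a directory to `sys.path` make its packages importable? Python searches the directories in `sys.path` in order, so inserting at position 0 (as Cell 5 does) gives that directory priority. A self-contained sketch using a throwaway package in a temporary directory (`demo_pkg` is a made-up name):

```python
import importlib
import os
import sys
import tempfile


def import_from_temp_dir() -> str:
    """Create a throwaway package, import it via sys.path, and return a value from it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Create a tiny package: <tmp>/demo_pkg/__init__.py
        pkg_dir = os.path.join(tmp, "demo_pkg")
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write("GREETING = 'imported via sys.path'\n")

        sys.path.insert(0, tmp)  # like Cell 5: search this directory first
        try:
            importlib.invalidate_caches()  # let the import system see the new files
            module = importlib.import_module("demo_pkg")
            return module.GREETING
        finally:
            sys.path.remove(tmp)               # undo the path change
            sys.modules.pop("demo_pkg", None)  # forget the throwaway module


print(import_from_temp_dir())
```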

In [None]:
# ============================================================
# Cell 5: Configure Python Path & Load MASt3R Model
# ============================================================
# Set up Python imports and load the MASt3R model.
# After this cell, you can use 'model' for inference.
# ============================================================

import sys
import os
import torch

# ----- Configure Python path -----
# This tells Python where to find MASt3R and our project code

# Check for MASt3R location (submodule or standalone)
if os.path.exists("/content/repo/third_party/mast3r"):
    MAST3R_PATH = "/content/repo/third_party/mast3r"
else:
    MAST3R_PATH = "/content/mast3r"

# Add paths to Python's import search path
paths_to_add = [
    "/content/repo",           # Our project
    MAST3R_PATH,                # MASt3R
    f"{MAST3R_PATH}/dust3r",   # DUSt3R (MASt3R dependency)
]

for path in paths_to_add:
    if path not in sys.path:
        sys.path.insert(0, path)

print(f"✅ Python paths configured")
print(f"   MASt3R location: {MAST3R_PATH}")

# ----- Import MASt3R -----
try:
    from mast3r.model import AsymmetricMASt3R
    print(f"✅ MASt3R module imported")
except ImportError as e:
    print(f"❌ Failed to import MASt3R: {e}")
    print(f"   Make sure Cell 4 (dependencies) ran successfully.")
    raise  # stop here: loading the model below would fail anyway

# ----- Load the model -----
print(f"\n⏳ Loading MASt3R model (this takes 30-60 seconds)...")

# Load from Hugging Face Hub (auto-downloads if not cached)
model = AsymmetricMASt3R.from_pretrained(
    "naver/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric"
)

# Move model to GPU
device = torch.device("cuda")
model = model.to(device)
model.eval()  # Set to evaluation mode (disables dropout, etc.)

# Print model info
n_params = sum(p.numel() for p in model.parameters()) / 1e6
print(f"\n✅ MASt3R model loaded!")
print(f"   Parameters: {n_params:.1f}M")
print(f"   Device: {device}")

# Check GPU memory usage
allocated = torch.cuda.memory_allocated(0) / 1e9
total = torch.cuda.get_device_properties(0).total_memory / 1e9
print(f"   GPU Memory: {allocated:.1f}GB / {total:.1f}GB used")

---

## Cell 6: Load Pre-computed Data (For DS Team)

**Who needs this?**: DS-1 and DS-2

This cell shows how to load `.npz` files that CS-1 has pre-computed.
These files contain point clouds and descriptors that you can use
to develop matching algorithms **without running MASt3R yourself**.

**Contents of each .npz file:**
- `point_cloud`: (N, 3) - 3D coordinates of face surface
- `descriptors`: (N, D) - Feature vectors for matching
- `confidence`: (N,) - How confident each point is
- `colors`: (N, 3) - RGB color per point
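If no real templates are on Drive yet, you can still develop against the same layout. A minimal sketch that writes a synthetic `.npz` with the four keys listed above and checks their shapes; the random values, the descriptor size, and the helper names are illustrative assumptions, not CS-1's actual output beyond the keys and shapes listed:

```python
import os
import tempfile

import numpy as np

EXPECTED_KEYS = ("point_cloud", "descriptors", "confidence", "colors")


def make_fake_template(path: str, n_points: int = 1000, desc_dim: int = 24, seed: int = 0) -> None:
    """Write a synthetic template with the same keys/shapes as the real .npz files."""
    rng = np.random.default_rng(seed)
    np.savez(
        path,
        point_cloud=rng.normal(size=(n_points, 3)).astype(np.float32),
        descriptors=rng.normal(size=(n_points, desc_dim)).astype(np.float32),
        confidence=rng.uniform(1.0, 10.0, size=n_points).astype(np.float32),
        colors=rng.integers(0, 256, size=(n_points, 3), dtype=np.uint8),
    )


def check_template_shapes(data) -> int:
    """Verify keys and shapes (works on a dict or an opened .npz); return N."""
    missing = [k for k in EXPECTED_KEYS if k not in data]
    if missing:
        raise KeyError(f"Missing keys: {missing}")
    n = data["point_cloud"].shape[0]
    checks = {
        "point_cloud": data["point_cloud"].shape == (n, 3),
        "descriptors": data["descriptors"].ndim == 2 and data["descriptors"].shape[0] == n,
        "confidence": data["confidence"].shape == (n,),
        "colors": data["colors"].shape == (n, 3),
    }
    bad = [k for k, ok in checks.items() if not ok]
    if bad:
        raise ValueError(f"Unexpected shapes for: {bad}")
    return n


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fake_enrollment.npz")
    make_fake_template(path, n_points=500)
    with np.load(path) as data:
        print(check_template_shapes(data))  # 500
```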

In [None]:
# ============================================================
# Cell 6: Load Pre-computed Data (For DS Team)
# ============================================================
# Load .npz files containing point clouds and descriptors.
# DS team uses these to develop matching algorithms.
# No MASt3R inference needed!
# ============================================================

import numpy as np
import os

SHARED_DIR = "/content/drive/MyDrive/face-auth-data/mast3r_outputs"

def load_template(filename: str) -> dict:
    """
    Load a pre-computed face template from the shared Drive folder.

    Args:
        filename: Name of the .npz file (e.g., "alice_enrollment.npz")

    Returns:
        Dictionary containing:
        - point_cloud: (N, 3) - 3D face surface coordinates
        - descriptors: (N, D) - Dense feature vectors per point
        - confidence: (N,) - Confidence value per point
        - colors: (N, 3) - RGB color per point

    Example:
        >>> template = load_template("alice_enrollment.npz")
        >>> print(template["point_cloud"].shape)  # (N, 3)
    """
    filepath = os.path.join(SHARED_DIR, filename)

    if not os.path.exists(filepath):
        raise FileNotFoundError(
            f"Template not found: {filepath}\n"
            f"Available files: {os.listdir(SHARED_DIR) if os.path.exists(SHARED_DIR) else 'Directory not found'}"
        )

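    # allow_pickle=False: the arrays are plain numeric data, and unpickling
    # objects from a file can execute arbitrary code.
    # The `with` block closes the file; indexing reads each array into memory.
    with np.load(filepath, allow_pickle=False) as data:
        return {
            "point_cloud": data["point_cloud"],   # (N, 3) float32
            "descriptors": data["descriptors"],   # (N, D) float32
            "confidence": data["confidence"],     # (N,) float32
            "colors": data["colors"],             # (N, 3) uint8
        }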


# ----- Example usage -----
# List available templates
if os.path.exists(SHARED_DIR):
    npz_files = [f for f in os.listdir(SHARED_DIR) if f.endswith('.npz')]
    print(f"üìÇ Available templates in {SHARED_DIR}:")
    for f in npz_files:
        filepath = os.path.join(SHARED_DIR, f)
        size_mb = os.path.getsize(filepath) / 1e6
        print(f"   {f} ({size_mb:.1f} MB)")

    # Load the first available template as an example
    if npz_files:
        print(f"\nüì• Loading example: {npz_files[0]}")
        example = load_template(npz_files[0])
        print(f"   Point cloud shape: {example['point_cloud'].shape}")
        print(f"   Descriptor shape: {example['descriptors'].shape}")
        print(f"   Total points: {len(example['point_cloud'])}")
else:
    print(f"⚠️  Shared directory not found: {SHARED_DIR}")
    print(f"   CS-1 needs to upload pre-computed templates to Drive.")

## Cell 7: Visualize a Point Cloud

**Recommended first step**: See what the data looks like!

This cell creates an interactive 3D visualization of a face point cloud.
You can rotate, zoom, and explore the 3D structure.
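Point clouds may include low-confidence points that clutter the view. A minimal sketch of filtering by the `confidence` array before plotting; `filter_by_confidence` is a hypothetical helper, and the quantile-based cutoff with `keep_fraction=0.8` is an illustrative assumption, not a tuned setting:

```python
import numpy as np


def filter_by_confidence(points, confidence, colors=None, keep_fraction=0.8):
    """Keep roughly the most confident `keep_fraction` of points (ties at the cutoff are kept)."""
    threshold = np.quantile(confidence, 1.0 - keep_fraction)
    mask = confidence >= threshold
    kept_colors = colors[mask] if colors is not None else None
    return points[mask], kept_colors


# Usage, assuming Cell 6 (load_template) and Cell 7 (visualize_point_cloud) have run:
# pts, cols = filter_by_confidence(template["point_cloud"], template["confidence"], template["colors"])
# visualize_point_cloud(pts, cols, title="High-confidence points").show()
```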

In [None]:
# ============================================================
# Cell 7: Visualize a Point Cloud
# ============================================================
# Create an interactive 3D visualization of the face.
# This helps you understand the data before writing algorithms.
# ============================================================

import os

import numpy as np
import plotly.graph_objects as go

def visualize_point_cloud(points: np.ndarray, colors: np.ndarray = None,
                         title: str = "Point Cloud", max_points: int = 10000):
    """
    Create an interactive 3D visualization of a point cloud.

    Args:
        points: (N, 3) array of 3D coordinates
        colors: (N, 3) array of RGB colors (optional)
        title: Title for the plot
        max_points: Subsample to this many points for performance

    Returns:
        plotly Figure object (displayed automatically in Colab)
    """
    # Subsample if too many points (for rendering performance)
    if len(points) > max_points:
        idx = np.random.choice(len(points), max_points, replace=False)
        points = points[idx]
        if colors is not None:
            colors = colors[idx]
        print(f"Subsampled to {max_points} points for visualization")

    # Prepare colors
    if colors is not None:
        # Convert RGB to plotly color format
        color_strings = [f'rgb({r},{g},{b})' for r, g, b in colors]
    else:
        # Default: color by Z-depth
        color_strings = points[:, 2]

    # Create the 3D scatter plot
    fig = go.Figure(data=[go.Scatter3d(
        x=points[:, 0],
        y=points[:, 1],
        z=points[:, 2],
        mode='markers',
        marker=dict(
            size=1.5,
            color=color_strings,
            opacity=0.8
        )
    )])

    # Configure layout
    fig.update_layout(
        title=title,
        scene=dict(
            aspectmode='data',  # Equal aspect ratio
            xaxis_title='X',
            yaxis_title='Y',
            zaxis_title='Z (depth)',
        ),
        width=800,
        height=600,
    )

    return fig


# ----- Example: Visualize a loaded template -----
# (Requires Cell 6 to have been run: it defines load_template)

SHARED_DIR = "/content/drive/MyDrive/face-auth-data/mast3r_outputs"

if os.path.exists(SHARED_DIR):
    npz_files = [f for f in os.listdir(SHARED_DIR) if f.endswith('.npz')]

    if npz_files:
        # Load and visualize the first available template
        template = load_template(npz_files[0])
        fig = visualize_point_cloud(
            template["point_cloud"],
            template["colors"],
            title=f"Face Point Cloud: {npz_files[0]}"
        )
        fig.show()
    else:
        print("No .npz files available. Creating demo with random data...")
        # Create a demo sphere point cloud
        theta = np.random.uniform(0, np.pi, 5000)
        phi = np.random.uniform(0, 2*np.pi, 5000)
        r = 1 + 0.1 * np.random.randn(5000)
        demo_points = np.stack([
            r * np.sin(theta) * np.cos(phi),
            r * np.sin(theta) * np.sin(phi),
            r * np.cos(theta)
        ], axis=1)
        fig = visualize_point_cloud(demo_points, title="Demo: Random Sphere")
        fig.show()

---

## Your Work Goes Here!

Add your own cells below for:
- **DS-1**: Matching algorithm experiments
- **DS-2**: Evaluation and anti-spoofing analysis
- **CS-2**: Visualization testing

Remember to save your work to GitHub before the session ends!
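As a starting point for DS-1, here is one possible shape for the `your_matching_function` placeholder in the cell below: a simple baseline score equal to the fraction of mutual nearest-neighbor descriptor matches between two templates. This is an illustrative sketch, assuming descriptors from both templates are comparable with Euclidean distance; it is not the team's chosen algorithm, and any accept/reject threshold would still need to be calibrated.

```python
import numpy as np
from scipy.spatial import cKDTree


def mutual_nn_score(template_a: dict, template_b: dict) -> float:
    """Fraction of descriptors (relative to the smaller template) with a mutual NN match."""
    desc_a = np.asarray(template_a["descriptors"], dtype=np.float32)
    desc_b = np.asarray(template_b["descriptors"], dtype=np.float32)

    _, idx_a2b = cKDTree(desc_b).query(desc_a)  # nearest B descriptor for each A descriptor
    _, idx_b2a = cKDTree(desc_a).query(desc_b)  # nearest A descriptor for each B descriptor

    # i in A and j = idx_a2b[i] in B match mutually if B's nearest neighbor of j is i
    mutual = idx_b2a[idx_a2b] == np.arange(len(desc_a))
    # Mutual matches are one-to-one, so the score lies in [0, 1]
    return float(mutual.sum()) / min(len(desc_a), len(desc_b))
```

Usage: `score = mutual_nn_score(load_template("alice_enrollment.npz"), load_template("bob_enrollment.npz"))`.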

In [None]:
# ============================================================
# Your experiments go here!
# ============================================================

# Example: Load two templates and compare them
# alice = load_template("alice_enrollment.npz")
# bob = load_template("bob_enrollment.npz")
#
# # Your matching algorithm here...
# score = your_matching_function(alice, bob)
# print(f"Match score: {score}")

---

## ⚠️ Before Closing: Push to GitHub!

**CRITICAL**: Colab sessions are temporary. Any code you write will be lost when the session ends unless you push it to GitHub.

Run this cell to commit and push your changes.

In [None]:
# ============================================================
# BEFORE CLOSING: Push Changes to GitHub
# ============================================================
# Run this cell to save your work before the session ends!
# ============================================================

# Uncomment the lines below when you're ready to push

# %cd /content/repo

# # Stage all changes
# !git add -A

# # Show what will be committed
# !git status

# # Create a commit (update the message!)
# !git commit -m "feat: describe your changes here"

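# # Push to your branch. GitHub does not accept account passwords for HTTPS
# # pushes; you'll need a personal access token (or another credential).
# !git push origin {BRANCH}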

---

## Appendix: Quick Reference

### Common Operations

```python
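import numpy as np
from scipy.spatial import cKDTree

# Load two templates (requires load_template from Cell 6; filenames are examples)
template_a = load_template("alice_enrollment.npz")
template_b = load_template("bob_enrollment.npz")
cloud_a, desc_a = template_a["point_cloud"], template_a["descriptors"]  # (N, 3), (N, D)
cloud_b, desc_b = template_b["point_cloud"], template_b["descriptors"]  # (M, 3), (M, D)

# Center a point cloud (subtract its centroid)
centered_a = cloud_a - cloud_a.mean(axis=0)

# Chamfer distance between two point clouds (mean of both directed NN distances)
d_ab, _ = cKDTree(cloud_b).query(cloud_a)  # each point of A -> nearest point of B
d_ba, _ = cKDTree(cloud_a).query(cloud_b)  # each point of B -> nearest point of A
chamfer = (d_ab.mean() + d_ba.mean()) / 2

# Nearest neighbor matching (descriptors)
_, idx_a2b = cKDTree(desc_b).query(desc_a)
_, idx_b2a = cKDTree(desc_a).query(desc_b)
# Reciprocal match: idx_a2b[i] == j AND idx_b2a[j] == i
mutual = idx_b2a[idx_a2b] == np.arange(len(desc_a))
matches = np.column_stack([np.flatnonzero(mutual), idx_a2b[mutual]])  # (K, 2) pairs (i, j)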
```

### Useful Imports

```python
import numpy as np
import torch
from scipy.spatial import cKDTree
import plotly.graph_objects as go
import open3d as o3d  # pip install open3d
```