# 🖼️ ImageSpace: Interactive Image Collection Visualization

**ImageSpace** transforms any folder of images into an interactive web visualization using CLIP embeddings, t-SNE dimensionality reduction, and HDBSCAN clustering.

## What this notebook does:
1. **Installs dependencies** (including opentsne, hdbscan, and onnxruntime)
2. **Uploads your images** (or uses a sample dataset)
3. **Runs the ImageSpace pipeline**: generates atlas textures, CLIP embeddings, t-SNE layout, and cluster assignments
4. **Downloads the output** as a ZIP file you can host anywhere

## Requirements:
- A folder of images (JPG, PNG, WebP, etc.)
- Optional: a metadata CSV with a `filename` column
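Before running the pipeline, it can help to confirm that the metadata CSV actually lines up with the image folder. The sketch below is a hypothetical helper, not part of ImageSpace; it assumes the `filename` column holds bare file names without directories, which may differ from how the pipeline matches rows.

```python
import csv
from pathlib import Path


def check_metadata(csv_path, image_dir):
    """Return (matched, missing) lists of file names listed in the CSV.

    Assumption: the `filename` column contains bare file names
    (no directory components).
    """
    # Collect the names of all files under the image folder.
    image_names = {p.name for p in Path(image_dir).rglob("*") if p.is_file()}

    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if "filename" not in (reader.fieldnames or []):
            raise ValueError("metadata CSV needs a 'filename' column")
        listed = [row["filename"] for row in reader]

    matched = [name for name in listed if name in image_names]
    missing = [name for name in listed if name not in image_names]
    return matched, missing
```

Calling `check_metadata(METADATA_CSV, INPUT_DIR)` after Step 2 returns the rows that have a matching image and the rows that do not.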

---

## Step 1: Install Dependencies

In [None]:
!pip install -q pillow numpy scikit-learn opentsne hdbscan onnxruntime huggingface_hub scipy

## Step 2: Upload Images

**Option A:** Upload a ZIP file of images  
**Option B:** Mount Google Drive and point to a folder  
**Option C:** Use the sample dataset (200 randomly generated images)

In [None]:
import os
from pathlib import Path

# === CONFIGURE YOUR INPUT HERE ===
# Option A: Upload a ZIP (uncomment next 3 lines)
# from google.colab import files
# uploaded = files.upload()  # Upload your ZIP
# !unzip -q *.zip -d /content/images

# Option B: Mount Google Drive (uncomment next 3 lines)
# from google.colab import drive
# drive.mount('/content/drive')
# INPUT_DIR = '/content/drive/MyDrive/my_images'  # Change this path

# Options A and C (default): unzipped uploads or the sample dataset.
# Comment out the next line if you set INPUT_DIR under Option B;
# otherwise it overwrites your Drive path.
INPUT_DIR = '/content/images'
os.makedirs(INPUT_DIR, exist_ok=True)

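# Generate sample images if no images exist.
# Match extensions case-insensitively so .JPG, .jpeg, and .webp uploads are counted too.
IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.webp'}
existing = [p for p in Path(INPUT_DIR).rglob('*') if p.suffix.lower() in IMAGE_EXTS]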
if len(existing) == 0:
    print('No images found. Generating 200 sample images...')
    import numpy as np
    from PIL import Image
    np.random.seed(42)
    for i in range(200):
        color = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
        Image.fromarray(color).save(f'{INPUT_DIR}/sample_{i:04d}.jpg')
    print(f'Created 200 sample images in {INPUT_DIR}')
else:
    print(f'Found {len(existing)} images in {INPUT_DIR}')

# Optional: path to metadata CSV (set to None if none)
METADATA_CSV = None  # e.g., '/content/drive/MyDrive/metadata.csv'

## Step 3: Download the ImageSpace Pipeline

In [None]:
# Download the pipeline script from GitHub
!wget -q https://raw.githubusercontent.com/nabsiddiqui/modern-pixplot/master/scripts/imagespace.py -O /content/imagespace.py

# If the above fails (e.g., the repo is not yet public), wget -O leaves an empty file behind,
# so check the file size as well as its existence.
if not os.path.exists('/content/imagespace.py') or os.path.getsize('/content/imagespace.py') < 100:
    print('Downloading from GitHub failed. Upload imagespace.py to /content before running Step 4.')

## Step 4: Run the Pipeline

This processes all your images through:
1. **Atlas generation**: packs thumbnails into WebP sprite sheets
2. **CLIP embeddings**: extracts visual features via ONNX Runtime
3. **PCA + openTSNE**: creates 2D layout from high-dimensional embeddings
4. **HDBSCAN clustering**: discovers natural groupings
5. **Metadata extraction**: colors, timestamps, filenames
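To make the atlas step concrete, here is a minimal sketch of how thumbnails could be assigned to slots in fixed-size sprite sheets. It illustrates the general technique only: the function name, the simple row-major grid, and the `thumb_size` and `atlas_size` values are assumptions, not the layout or settings that `imagespace.py` uses.

```python
def atlas_layout(n_images, thumb_size=64, atlas_size=2048):
    """Map each image index to (atlas_index, x, y) in pixel coordinates.

    Thumbnails fill each square atlas row by row; once an atlas is full,
    the next image starts a new one.
    """
    per_row = atlas_size // thumb_size   # thumbnails per atlas row
    per_atlas = per_row * per_row        # thumbnails per atlas
    layout = []
    for i in range(n_images):
        atlas_index, slot = divmod(i, per_atlas)
        row, col = divmod(slot, per_row)
        layout.append((atlas_index, col * thumb_size, row * thumb_size))
    return layout


# Illustrative values: 200 thumbnails of 64 px in 512 px atlases (8 x 8 slots each).
layout = atlas_layout(200, thumb_size=64, atlas_size=512)
print(layout[0])    # first slot of the first atlas
print(layout[64])   # first slot of the second atlas
```

A viewer can then draw any image by looking up its atlas and cropping a `thumb_size` square at `(x, y)`, which keeps the number of texture files small.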

In [None]:
OUTPUT_DIR = '/content/output/data'
os.makedirs(OUTPUT_DIR, exist_ok=True)

cmd = f'python /content/imagespace.py "{INPUT_DIR}" --output "{OUTPUT_DIR}" --gpu'
if METADATA_CSV:
    cmd += f' --metadata "{METADATA_CSV}"'

print(f'Running: {cmd}')
!{cmd}
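The cell above builds the command as one string, which depends on manual quoting of each path. As an alternative sketch, the same invocation can be assembled as an argument list so that paths containing spaces or quotes need no escaping. The helper name `build_command` is hypothetical; the `--output`, `--metadata`, and `--gpu` flags are the ones used in the cell above.

```python
import shlex


def build_command(input_dir, output_dir, metadata_csv=None, use_gpu=True,
                  script="/content/imagespace.py"):
    """Return the pipeline invocation as a list of arguments."""
    args = ["python", script, input_dir, "--output", output_dir]
    if use_gpu:
        args.append("--gpu")
    if metadata_csv:
        args += ["--metadata", metadata_csv]
    return args


args = build_command("/content/my images", "/content/output/data")
# shlex.join renders a copy-pasteable, safely quoted version for logging.
print(shlex.join(args))
```

Passing the list to `subprocess.run(args, check=True)` runs the pipeline and raises an exception if it exits with a non-zero status, so a failed run does not go unnoticed before Step 5.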

## Step 5: Download the Viewer + Data

Downloads the pre-built ImageSpace viewer along with your processed data as a ZIP.

In [None]:
# Download the pre-built viewer
!wget -q https://github.com/nabsiddiqui/modern-pixplot/releases/latest/download/viewer.zip -O /content/viewer.zip 2>/dev/null || true

import zipfile, shutil

SITE_DIR = '/content/output'

# If viewer zip exists, extract it
if os.path.exists('/content/viewer.zip') and os.path.getsize('/content/viewer.zip') > 100:
    with zipfile.ZipFile('/content/viewer.zip', 'r') as z:
        z.extractall(SITE_DIR)
    print('Viewer extracted successfully')
else:
    # Create a minimal index.html that loads data
    html = '''<!DOCTYPE html>
<html><head><title>ImageSpace</title><meta charset="utf-8">
<style>body{margin:0;font-family:system-ui;background:#faf4ed;display:flex;align-items:center;justify-content:center;height:100vh}
.msg{text-align:center;color:#575279}h1{color:#286983}</style></head>
<body><div class="msg"><h1>ImageSpace</h1>
<p>Data files generated. To view, build the viewer from source:</p>
<code>git clone https://github.com/nabsiddiqui/modern-pixplot && cd modern-pixplot/frontend-pixi && npm install && npm run build</code>
<p>Then copy the data/ folder into the dist/ folder and serve.</p></div></body></html>'''
    with open(f'{SITE_DIR}/index.html', 'w') as f:
        f.write(html)
    print('Created placeholder index.html (build viewer from source for full experience)')

# Package everything as a ZIP
output_zip = '/content/imagespace_output.zip'
shutil.make_archive('/content/imagespace_output', 'zip', SITE_DIR)
print(f'\nOutput packaged: {output_zip}')
print(f'Size: {os.path.getsize(output_zip) / 1024 / 1024:.1f} MB')

# Auto-download
from google.colab import files
files.download(output_zip)
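Before hosting the archive, it may be worth checking what ended up inside it. The sketch below is a hypothetical helper that reports whether an `index.html` sits at the top level and which files are under `data/`. It assumes the viewer places `index.html` at the archive root, as the placeholder page above does; the real viewer ZIP may be laid out differently.

```python
import zipfile


def summarize_zip(zip_path):
    """Return (has_index, data_files) for a packaged site archive."""
    with zipfile.ZipFile(zip_path) as z:
        names = z.namelist()
    has_index = "index.html" in names
    # Skip directory entries, which end with a slash.
    data_files = [n for n in names if n.startswith("data/") and not n.endswith("/")]
    return has_index, data_files
```

For example, `summarize_zip('/content/imagespace_output.zip')` returning `(True, [])` would suggest the pipeline in Step 4 produced no data files.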

## Step 6: Preview (Optional)

List the generated data files and their sizes to confirm that the pipeline produced output.

In [None]:
# List generated files
import glob
print('Generated files:')
for f in sorted(glob.glob(f'{OUTPUT_DIR}/*')):
    size = os.path.getsize(f)
    unit = 'KB' if size < 1024*1024 else 'MB'
    val = size/1024 if unit == 'KB' else size/1024/1024
    print(f'  {os.path.basename(f):30s} {val:8.1f} {unit}')
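The listing above chooses between KB and MB by hand. A small helper, sketched here under the hypothetical name `format_size`, extends the same idea to bytes and gigabytes so that very small or very large files are also reported sensibly.

```python
def format_size(num_bytes):
    """Format a byte count using 1024-based units, one decimal place."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        # Stop at the first unit that keeps the value below 1024,
        # or at GB, the largest unit handled here.
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"
        size /= 1024


print(format_size(2048))
```

In the loop above, `format_size(os.path.getsize(f))` could replace the separate `unit` and `val` calculations.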

---

## How to Host Your Visualization

1. **GitHub Pages**: Upload the ZIP contents to a GitHub repo and enable Pages
2. **Netlify/Vercel**: Drag and drop the ZIP contents
3. **Local**: `python -m http.server 8080` in the unzipped directory

The output is a **fully static site**: no server-side code required!
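For local previews from a script rather than the command line, the standard library can serve the unzipped folder directly. This is a minimal sketch of the same idea as `python -m http.server`; the function name `make_server` and the default port are illustrative choices.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, port=8080):
    """Create a static file server for `directory`, bound to localhost only."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
# make_server("imagespace_output").serve_forever()
```

Binding to `127.0.0.1` keeps the preview private to your machine; a hosting service from the list above is the better fit for sharing the visualization.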

---

*ImageSpace is developed by [Nabeel Siddiqui](https://github.com/nabsiddiqui).*