
# Bimaminobolonana — Quickstart Notebook

This notebook walks through the usage shown in the repo's README, as executable cells.  
All imports come from the **`encoder`** package (singular).



## 0) Environment setup

If you're running on NYU HPC (or locally) and need a clean environment, here are two options.  
**You don't need to execute these cells inside the notebook** if your environment is already prepared.



### Option A — venv + CPU PyTorch (works everywhere)
```bash
python3.10 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install --index-url https://download.pytorch.org/whl/cpu torch==2.4.1 torchvision==0.19.1
pip install -r requirements.txt
pytest -q
```



### Option B — Conda (if you prefer)
```bash
conda env create -f environment.yaml
conda activate dev
pip install -r requirements.txt
```
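
Whichever option you pick, it can help to confirm that the modules this notebook imports are visible to the interpreter before running the cells. The following is a minimal sketch using only the standard library; `check_modules` is a hypothetical helper, not part of the repo, and the module list is taken from the import cell in section 1.

```python
import importlib.util
import sys


def check_modules(names):
    """Return {name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Top-level modules imported by the cells in this notebook.
status = check_modules(["torch", "yaml", "PIL", "encoder"])

print("Python:", sys.version.split()[0])
for name, found in status.items():
    print(f"{name:8s} {'ok' if found else 'MISSING'}")
```

A `MISSING` entry for `encoder` usually means the notebook was not started from the repo root, or the package is not on `sys.path`.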


## 1) Imports

In [None]:

from pathlib import Path  # used by the config checks in sections 3.1 and 4.1

import torch
import yaml
from PIL import Image
from encoder import build_encoder
from encoder.transforms import build_image_transform, prepare_batch

print("Torch:", torch.__version__)



## 2) Transforms (input preprocessing)

`build_image_transform` returns a transform that resizes and normalizes images to match what each encoder expects:
- `kind="clip"` for CLIP
- `kind="imagenet"` for Pri3D


In [None]:

# CLIP-style preprocessing (use for clip_vit)
tfm_clip = build_image_transform(kind="clip", size=224)
# ImageNet-style preprocessing (use for pri3d)
tfm_im = build_image_transform(kind="imagenet", size=224)
print("Transforms ready.")
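
To make the difference between the two `kind` values concrete, here is a small sketch of per-channel normalization in plain Python. The mean and standard deviation values are the ones commonly published for CLIP and for ImageNet-trained models; that `build_image_transform` uses exactly these values is an assumption, so check `encoder/transforms.py` if it matters. `NORM_STATS` and `normalize_pixel` are hypothetical names used only for illustration.

```python
# Per-channel (R, G, B) statistics commonly paired with each preprocessing style.
# Assumption: the repo's transforms use these or very similar values.
NORM_STATS = {
    "clip": {
        "mean": (0.48145466, 0.4578275, 0.40821073),
        "std": (0.26862954, 0.26130258, 0.27577711),
    },
    "imagenet": {
        "mean": (0.485, 0.456, 0.406),
        "std": (0.229, 0.224, 0.225),
    },
}


def normalize_pixel(rgb, kind):
    """Scale an 8-bit RGB pixel to [0, 1], then apply (x - mean) / std per channel."""
    stats = NORM_STATS[kind]
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, stats["mean"], stats["std"])
    )


red = (255, 0, 0)
print("clip    :", normalize_pixel(red, "clip"))
print("imagenet:", normalize_pixel(red, "imagenet"))
```

The same pixel maps to slightly different values under each style, which is why a transform should be paired with the encoder it was built for.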



## 3) Encoder skeleton quickstart

`build_encoder(cfg)` returns an encoder object. Its `encode((left, right))` method returns a dict  
with `left`, `right`, and `fused` features, each of shape `B×512` by default (the configs here set `out_dim: 512`).


In [None]:

# Minimal CLIP stub config (no pretrained weights).
# Uses the repo's YAML if present, otherwise the inline fallback below.
try:
    with open("configs/encoder_clip_b32.yaml", "r") as f:
        cfg_clip = yaml.safe_load(f)
except FileNotFoundError:
    cfg_clip = {
        "name": "clip_vit",
        "model_name": "ViT-B-32",
        "out_dim": 512,
        "freeze": True,
        "fuse": "mean",
    }

enc_clip = build_encoder(cfg_clip)
# Create two dummy RGB images and batch them
left_img  = Image.new("RGB", (320, 240), color=(255, 0, 0))
right_img = Image.new("RGB", (240, 320), color=(0, 255, 0))
x_left  = prepare_batch(left_img,  transform=tfm_clip)
x_right = prepare_batch(right_img, transform=tfm_clip)
out = enc_clip.encode((x_left, x_right))
{k: v.shape for k, v in out.items()}
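
The configs in this notebook set `fuse` to `"mean"`. As a rough illustration of what a mean fusion could compute, the sketch below averages the left and right feature vectors element-wise using plain Python lists. This is an assumption about the behavior, not the repo's implementation; `fuse_mean` is a hypothetical name, and the real encoder operates on tensors.

```python
def fuse_mean(left, right):
    """Element-wise average of two batches of feature vectors (lists of lists)."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same batch size")
    return [
        [(a + b) / 2.0 for a, b in zip(left_vec, right_vec)]
        for left_vec, right_vec in zip(left, right)
    ]


# One sample (B = 1) with three feature dimensions, for readability.
left_feats = [[1.0, 2.0, 3.0]]
right_feats = [[3.0, 2.0, 1.0]]

demo = {
    "left": left_feats,
    "right": right_feats,
    "fused": fuse_mean(left_feats, right_feats),
}
print(demo["fused"])  # [[2.0, 2.0, 2.0]]
```

With mean fusion the `fused` entry keeps the same shape as `left` and `right`, which matches the shapes printed by the cell above.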



### 3.1) CLIP (pretrained via open-clip)

If you have `configs/encoder_clip_b32_openai.yaml`, this cell builds the pretrained encoder from it; otherwise it prints a message and skips the demo.


In [None]:

from pathlib import Path
cfg_path = Path("configs/encoder_clip_b32_openai.yaml")
if cfg_path.exists():
    with open(cfg_path, "r") as f:
        cfg_clip_openai = yaml.safe_load(f)
    enc_clip_openai = build_encoder(cfg_clip_openai)
    out_openai = enc_clip_openai.encode((x_left, x_right))
    print({k: v.shape for k, v in out_openai.items()})
else:
    print("configs/encoder_clip_b32_openai.yaml not found — skipping pretrained CLIP demo.")



## 4) Pri3D encoder (random-init ResNet)

This is a capacity-matched control: a torchvision ResNet (18, 34, or 50) with random initialization and ImageNet-style preprocessing.


In [None]:

try:
    with open("configs/encoder_pri3d_random.yaml", "r") as f:
        cfg_pri = yaml.safe_load(f)
except FileNotFoundError:
    cfg_pri = {
        "name": "pri3d",
        "variant": "resnet50",
        "pretrained": False,
        "freeze": False,
        "out_dim": 512,
        "fuse": "mean",
    }

enc_pri = build_encoder(cfg_pri)
# New dummy images, preprocessed with the ImageNet transform (tfm_im) from section 2
left_img2  = Image.new("RGB", (320, 240), color=(30, 30, 200))
right_img2 = Image.new("RGB", (240, 320), color=(30, 200, 30))
x_left2  = prepare_batch(left_img2,  transform=tfm_im)
x_right2 = prepare_batch(right_img2, transform=tfm_im)
out2 = enc_pri.encode((x_left2, x_right2))
{k: v.shape for k, v in out2.items()}
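
Both fallback dicts in this notebook share the keys `name`, `out_dim`, `freeze`, and `fuse`. A small pre-flight check can catch a malformed YAML before it reaches `build_encoder`. The helper below is hypothetical (`validate_encoder_cfg` is not part of the repo), and treating those four keys as required is an assumption drawn from the fallback dicts, not from the encoder code.

```python
# Keys that appear in both fallback dicts in this notebook (assumed to be required).
REQUIRED_KEYS = ("name", "out_dim", "freeze", "fuse")


def validate_encoder_cfg(cfg):
    """Return a list of problems found in cfg; an empty list means the checks passed."""
    if not isinstance(cfg, dict):
        return [f"config must be a mapping, got {type(cfg).__name__}"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    out_dim = cfg.get("out_dim")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if "out_dim" in cfg and (
        isinstance(out_dim, bool) or not isinstance(out_dim, int) or out_dim <= 0
    ):
        problems.append(f"out_dim must be a positive integer, got {out_dim!r}")
    if "freeze" in cfg and not isinstance(cfg["freeze"], bool):
        problems.append(f"freeze must be true or false, got {cfg['freeze']!r}")
    return problems


good = {"name": "pri3d", "variant": "resnet50", "pretrained": False,
        "freeze": False, "out_dim": 512, "fuse": "mean"}
bad = {"name": "pri3d", "out_dim": "512"}  # quoted number, two keys missing

print(validate_encoder_cfg(good))
for problem in validate_encoder_cfg(bad):
    print("-", problem)
```

Returning a list instead of raising on the first error lets a CI job report every problem in a config at once.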



### 4.1) Pri3D (pretrained)

If you have a Pri3D checkpoint (e.g., the authors' **ResNet-50 Pri3D (View+Geo)**), point the config to it:
```yaml
name: pri3d
variant: resnet50
pretrained: true
ckpt_path: /path/to/ScanNet_Combine_BatchSize64_LearningRate01_Epoch5_ImageSize240x320_ResNet50.pth
freeze: true
out_dim: 512
fuse: mean
```

The cell below loads `configs/encoder_pri3d_pretrained.yaml` if it exists, and builds the encoder only when `ckpt_path` points to an existing file.


In [None]:

cfg_pre = Path("configs/encoder_pri3d_pretrained.yaml")
if cfg_pre.exists():
    cfg = yaml.safe_load(cfg_pre.read_text()) or {}  # an empty YAML file loads as None
    if cfg.get("ckpt_path") and Path(cfg["ckpt_path"]).expanduser().exists():
        enc_pri_pre = build_encoder(cfg)
        outp = enc_pri_pre.encode((x_left2, x_right2))
        print({k: v.shape for k, v in outp.items()})
    else:
        print("configs/encoder_pri3d_pretrained.yaml found but ckpt_path missing/not found — skipping.")
else:
    print("configs/encoder_pri3d_pretrained.yaml not found — skipping Pri3D pretrained demo.")
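
The cell above expands `~` only for the existence check, so the config still carries the unexpanded string. If the checkpoint loader does not expand `~` itself (an assumption, since the loader is not shown here), one option is to resolve the path once and write it back into the config. `resolve_ckpt` is a hypothetical helper and the path in the example is a placeholder.

```python
from pathlib import Path


def resolve_ckpt(cfg):
    """Return the expanded checkpoint path if cfg names an existing file, else None."""
    raw = (cfg or {}).get("ckpt_path")
    if not raw:
        return None
    path = Path(raw).expanduser()
    return path if path.is_file() else None


# Placeholder path for illustration; replace with a real checkpoint location.
cfg_demo = {"name": "pri3d", "ckpt_path": "~/checkpoints/does_not_exist.pth"}

ckpt = resolve_ckpt(cfg_demo)
if ckpt is None:
    print("checkpoint not found, skipping pretrained demo")
else:
    cfg_demo["ckpt_path"] = str(ckpt)  # hand the expanded path to the loader
    print("using checkpoint:", ckpt)
```

Using `is_file()` instead of `exists()` also rejects the case where `ckpt_path` points at a directory.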



## 5) Notes
- With `out_dim: 512`, every encoder returns `B×512` features for `left` and `right`, plus a `fused` vector when `fuse` is set.
- Use `build_image_transform(kind=...)` to match the encoder's expected normalization.
- For reproducible CI, prefer CPU Torch + the stub configs; for experiments, use pretrained CLIP and Pri3D.
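
For CI, the shape claim in the first note can be turned into an assertion. The sketch below is one way to do that; `check_encoder_output` is a hypothetical helper, and it uses stand-in objects with a `.shape` attribute so it runs without torch. With real encoder outputs you would pass the dict returned by `encode(...)` directly.

```python
from types import SimpleNamespace


def check_encoder_output(out, batch_size, out_dim=512, keys=("left", "right", "fused")):
    """Raise AssertionError unless each key is present with shape (batch_size, out_dim)."""
    expected = (batch_size, out_dim)
    for key in keys:
        assert key in out, f"missing output key: {key}"
        shape = tuple(out[key].shape)
        assert shape == expected, f"{key}: expected {expected}, got {shape}"


# Stand-ins for tensors: anything with a .shape attribute works here.
fake_out = {key: SimpleNamespace(shape=(1, 512)) for key in ("left", "right", "fused")}

check_encoder_output(fake_out, batch_size=1)
print("shapes ok")
```

If a config disables fusion, pass `keys=("left", "right")` so the check does not require a `fused` entry.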
