# Q2 — Text-Driven Image Segmentation with SAM 2 (Pipeline)
This notebook sets up a Colab pipeline for text-prompted segmentation. A grounding model (e.g., GroundingDINO or CLIPSeg) converts the text prompt into region seeds (boxes, points, or a coarse heatmap), and Segment Anything (SAM) turns those seeds into masks.
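
A minimal sketch of the "region seed" step for a heatmap-style grounding model such as CLIPSeg. In `transformers`, `CLIPSegForImageSegmentation` returns low-resolution logits that you would pass through a sigmoid and resize to the image size before calling this. The helper name `heatmap_to_prompts` and the threshold/top-k strategy are illustrative assumptions, not part of either library. The outputs follow the formats `SamPredictor.predict` accepts: `point_coords` as (x, y) pixels, `point_labels` with 1 = foreground, and `box` as XYXY pixels.

```python
import numpy as np


def heatmap_to_prompts(heatmap, threshold=0.5, num_points=3):
    """Hypothetical helper: turn an (H, W) text-relevance map in [0, 1] into SAM prompts.

    Returns (point_coords, point_labels, box):
      point_coords: (N, 2) float32 array of (x, y) pixel positions
      point_labels: (N,) int array, 1 = foreground
      box: (4,) float32 array [x0, y0, x1, y1], or None if no pixel passes the threshold
    """
    heatmap = np.asarray(heatmap, dtype=np.float32)
    ys, xs = np.nonzero(heatmap >= threshold)
    if len(xs) == 0:
        return np.zeros((0, 2), np.float32), np.zeros((0,), np.int64), None
    # Tight box around every above-threshold pixel
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()], dtype=np.float32)
    # The highest-scoring pixels become positive point prompts
    order = np.argsort(heatmap[ys, xs])[::-1][:num_points]
    point_coords = np.stack([xs[order], ys[order]], axis=1).astype(np.float32)
    point_labels = np.ones(len(order), dtype=np.int64)
    return point_coords, point_labels, box


# Toy heatmap: one rectangular "object" with a single strongest pixel
hm = np.zeros((100, 200), dtype=np.float32)
hm[20:40, 50:90] = 0.8
hm[30, 70] = 0.95
coords, labels, box = heatmap_to_prompts(hm, threshold=0.5, num_points=2)
print(coords[0], labels, box)
```

With a loaded predictor, the seeds would be passed as `predictor.predict(point_coords=coords, point_labels=labels, box=box, multimask_output=False)`. Taking the top-k pixels tends to cluster points next to each other; picking one peak per connected region is a reasonable refinement.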

**Important:** SAM and the grounding models need pretrained weights, which are downloaded separately from the pip packages. Use a Colab GPU runtime, and mount Google Drive if you want to cache the weights across sessions.
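
One way to follow the caching advice, sketched under assumptions: the helper `ensure_checkpoint` and the Drive folder name are hypothetical, while the URL is the SAM ViT-H checkpoint link listed in the segment-anything README. In Colab you would first mount Drive with `from google.colab import drive; drive.mount('/content/drive')` and then point `cache_dir` at a folder inside it, so later sessions reuse the file instead of downloading it again.

```python
import os
import urllib.request

# SAM ViT-H checkpoint, as linked from the segment-anything README
SAM_VIT_H_URL = 'https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth'


def ensure_checkpoint(url, cache_dir):
    """Hypothetical helper: return a local path for `url`, downloading into `cache_dir` only if missing."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(path):
        tmp_path = path + '.part'
        urllib.request.urlretrieve(url, tmp_path)
        # Rename only after the download completes, so a partial file is never mistaken for a cached one
        os.replace(tmp_path, path)
    return path


# In Colab (after drive.mount('/content/drive')); the folder name is an arbitrary example:
# sam_checkpoint = ensure_checkpoint(SAM_VIT_H_URL, '/content/drive/MyDrive/sam_weights')
```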

The notebook is written for Colab and includes install cells and an example image pipeline. The model checkpoint paths are placeholders that must be filled in before it runs end-to-end.


In [ ]:
# Install dependencies (Colab). These installs may take several minutes.
!pip install -q git+https://github.com/facebookresearch/segment-anything.git
!pip install -q git+https://github.com/IDEA-Research/GroundingDINO.git  # GroundingDINO (text -> boxes)
!pip install -q transformers timm torchvision  # transformers also provides CLIPSeg (text -> heatmap)
print('Installation finished.')

In [ ]:
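# Example pipeline (high-level). Replace model load lines to match available checkpoints in Colab.
import io
import os

import numpy as np
import requests
import torch
from IPython.display import display
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Helper: download an example image (Wikimedia's User-Agent policy asks scripts to identify themselves)
img_url = 'https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png'
resp = requests.get(img_url, headers={'User-Agent': 'sam-text-seg-notebook/0.1'}, timeout=30)
resp.raise_for_status()
img = Image.open(io.BytesIO(resp.content)).convert('RGB')
display(img.resize((512, 512)))  # preview only; SAM receives the full-resolution image

# NOTE: You must provide a SAM checkpoint path, e.g. sam_vit_h_4b8939.pth from the segment-anything repo.
sam_checkpoint = '/path/to/sam_vit_h_4b8939.pth'
predictor = None
if os.path.exists(sam_checkpoint):
    sam = sam_model_registry['vit_h'](checkpoint=sam_checkpoint).to(device)
    predictor = SamPredictor(sam)
    predictor.set_image(np.array(img))  # computes the image embedding once; prompting is cheap afterwards
else:
    print(f'SAM checkpoint not found at {sam_checkpoint}; set sam_checkpoint before running SAM.')

print('This cell demonstrates the high-level flow.\n')
print('1) Convert the text prompt to region proposals using a grounding model (GroundingDINO / CLIPSeg).')
print('2) Convert proposals to SAM input (points/boxes) and run predictor.predict(...) to get masks.')
print('\nPlease replace placeholders with actual checkpoint paths in Colab and run the cells.')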

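Step 2 can be sketched for box proposals as follows. This assumes the grounding model returns boxes as normalized (cx, cy, w, h) in [0, 1], which is how we understand GroundingDINO's `groundingdino.util.inference.predict` helper reports them; verify this against the version you install. The helper names are hypothetical. `SamPredictor.predict` takes one XYXY pixel box at a time and, with `multimask_output=True`, returns three candidate masks plus predicted IoU scores, so the sketch keeps the best candidate per box and unions the results into a single mask for the text prompt.

```python
import numpy as np


def cxcywh_norm_to_xyxy(boxes, width, height):
    """Hypothetical helper: convert normalized (cx, cy, w, h) boxes to absolute [x0, y0, x1, y1] pixels."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    cx, cy, w, h = boxes.T
    xyxy = np.stack([(cx - w / 2) * width, (cy - h / 2) * height,
                     (cx + w / 2) * width, (cy + h / 2) * height], axis=1)
    # Clip to the image so SAM never receives out-of-bounds boxes
    xyxy[:, [0, 2]] = xyxy[:, [0, 2]].clip(0, width)
    xyxy[:, [1, 3]] = xyxy[:, [1, 3]].clip(0, height)
    return xyxy


def best_mask(masks, scores):
    """Pick the highest-scoring mask from SAM's multimask output (masks: (K, H, W), scores: (K,))."""
    return masks[int(np.argmax(scores))]


def union_masks(masks):
    """Merge per-box (H, W) boolean masks into one mask for the whole text prompt."""
    return np.logical_or.reduce(np.stack(masks), axis=0)


# Usage with a loaded predictor (predictor.set_image(np.array(img)) already called);
# `dino_boxes` stands for the grounding model's normalized boxes:
# masks_per_box = []
# for box in cxcywh_norm_to_xyxy(dino_boxes, img.width, img.height):
#     masks, scores, _ = predictor.predict(box=box, multimask_output=True)
#     masks_per_box.append(best_mask(masks, scores))
# text_mask = union_masks(masks_per_box)
```
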
## Limitations & Notes
- SAM 2 weights are publicly released (https://github.com/facebookresearch/sam2), but this notebook uses the original `segment-anything` package with SAM v1 checkpoints; switching to SAM 2 requires its own package and predictor class.
- GroundingDINO needs its config file and checkpoint from the IDEA-Research repository; CLIPSeg weights (e.g., `CIDAS/clipseg-rd64-refined`) are downloaded automatically by `transformers` on first use.
- For the video bonus: after obtaining masks for one frame, propagate them to later frames with optical flow or a lightweight tracker (e.g., estimate flow with RAFT and warp the mask along it).
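
The optical-flow option can be sketched as nearest-neighbour backward warping. It assumes a dense backward flow field from frame t+1 to frame t in (H, W, 2) layout. torchvision's `raft_large` (in `torchvision.models.optical_flow`) produces flow as (N, 2, H, W), so its output would need transposing first, and the order in which the two frames are passed decides the flow direction. The function name is hypothetical, and no occlusion handling is attempted, so propagated masks will tend to drift over long clips unless they are periodically re-seeded with SAM.

```python
import numpy as np


def warp_mask_backward(mask, flow):
    """Hypothetical helper: propagate a binary mask from frame t to frame t+1 by backward warping.

    mask: (H, W) bool mask for frame t.
    flow: (H, W, 2) backward flow from frame t+1 to frame t; flow[y, x] = (dx, dy) means
          pixel (x, y) in frame t+1 came from (x + dx, y + dy) in frame t.
    Uses nearest-neighbour sampling; pixels that map outside frame t become background.
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(mask, dtype=bool)
    warped[inside] = mask[src_y[inside], src_x[inside]]
    return warped


# Toy check: a one-pixel object moves 2 px to the right between frames
mask_t = np.zeros((5, 5), dtype=bool)
mask_t[1, 1] = True
flow = np.zeros((5, 5, 2))
flow[..., 0] = -2.0  # every pixel in frame t+1 came from 2 px to its left in frame t
mask_next = warp_mask_backward(mask_t, flow)
```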


## References
- SAM (Segment Anything): https://github.com/facebookresearch/segment-anything
- Grounding DINO (text prompt to boxes): https://github.com/IDEA-Research/GroundingDINO
- CLIPSeg (text prompt to segmentation heatmap): https://github.com/timojl/clipseg