# PS-03 Visual Search - Team AIGR-S47377

**Geospatial Object Detection & Retrieval for Satellite Imagery**

---

## Overview
- **Task**: Visual search and detection in multispectral satellite imagery
- **Classes**: 8 query classes (Solar Panel, Brick Kiln, Playground, Sheds, and combined classes such as Pond-1 & Pond-2)
- **Method**: CNN Embedder + FAISS + NMS
- **Team**: AIGR-S47377
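
As a rough illustration of the retrieval step in the method above, the sketch below ranks tile embeddings by cosine similarity to a query embedding using plain NumPy. It stands in for the FAISS lookup: an exact inner-product index such as `faiss.IndexFlatIP` over L2-normalized vectors gives the same ranking. The random vectors and the `top_k_cosine` helper are hypothetical and are not taken from the repository scripts.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def top_k_cosine(query, database, k):
    """Return (scores, indices) of the k database rows most similar to query."""
    q = l2_normalize(query.reshape(1, -1))
    db = l2_normalize(database)
    scores = (db @ q.T).ravel()
    order = np.argsort(-scores)[:k]  # highest similarity first
    return scores[order], order

# Hypothetical data: 100 tile embeddings of dimension 16.
rng = np.random.default_rng(0)
tile_embeddings = rng.normal(size=(100, 16)).astype(np.float32)
# A query that is a slightly perturbed copy of tile 42.
noise = 0.01 * rng.normal(size=16).astype(np.float32)
query_embedding = tile_embeddings[42] + noise

scores, indices = top_k_cosine(query_embedding, tile_embeddings, k=5)
print(indices, scores)
```

The top-ranked index should be the tile the query was derived from; the remaining hits are the candidates that NMS would later deduplicate.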

## Setup Environment

In [None]:
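# Install required packages
# Note: the PyPI `faiss-gpu` wheel may be unavailable for newer Python versions;
# if the install fails, `faiss-cpu` is a fallback (slower, same `import faiss`)
!pip install -q rasterio faiss-gpu opencv-python-headless scikit-image pyyaml omegaconf tqdm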
!pip install -q torch torchvision --index-url https://download.pytorch.org/whl/cu118

In [None]:
# Import libraries
import os
import sys
import json
import numpy as np
import torch
import faiss
from pathlib import Path
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## Clone Repository

In [None]:
# Clone your GitHub repo (update with your repo URL)
!git clone https://github.com/YOUR_USERNAME/ps03-visual-search.git
%cd ps03-visual-search

## Upload Datasets

**Option 1**: Upload as Kaggle Dataset (recommended)
- Go to https://www.kaggle.com/datasets
- Click "New Dataset"
- Upload your `data/sample_set`, `data/training_set`, `data/testing_set`
- Add dataset to this notebook

**Option 2**: Upload directly to notebook (slower)

In [None]:
# If using a Kaggle Dataset, symlink each set into data/
# Replace the /kaggle/input/... paths below with your dataset's actual paths

# !mkdir -p data
# !ln -s /kaggle/input/ps03-sample-set data/sample_set
# !ln -s /kaggle/input/ps03-training-set data/training_set
# !ln -s /kaggle/input/ps03-testing-set data/testing_set

# OR if uploaded directly:
# !mkdir -p data
# !cp -r /kaggle/input/your-data/* data/

In [None]:
# Verify data structure
!echo "Sample set:"
!ls -la data/sample_set/ | head -10
!printf "\nTraining set:\n"
!ls data/training_set/*.tif | wc -l
!printf "\nTesting set:\n"
!ls data/testing_set/*.tif | wc -l

## Extract Query Chips

In [None]:
# Extract chips from all classes
!python scripts/batch_extract_chips.py \
    --sample-dir data/sample_set \
    --out-dir chips \
    --max-chips 5 \
    --padding 20

In [None]:
# Verify chips extracted
!echo "Extracted chips:"
!find chips -name '*.tif' | wc -l
!printf "\nChips by class:\n"
!for class_dir in chips/*/; do echo "$(basename "$class_dir"): $(ls "$class_dir"/*.tif 2>/dev/null | wc -l)"; done
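
How `batch_extract_chips.py` applies `--padding 20` is not shown in this notebook. One plausible reading is that each annotated box is grown by 20 pixels on every side before cropping, so the chip carries some surrounding context. The sketch below illustrates that idea only; `pad_box` and the image size are hypothetical.

```python
def pad_box(xmin, ymin, xmax, ymax, padding, width, height):
    """Grow a pixel bounding box by `padding` on every side, clipped to the image."""
    return (
        max(0, xmin - padding),
        max(0, ymin - padding),
        min(width, xmax + padding),
        min(height, ymax + padding),
    )

# Hypothetical 1000x800 image with one annotation near the left edge.
padded = pad_box(10, 50, 110, 150, padding=20, width=1000, height=800)
print(padded)  # (0, 30, 130, 170)
```

Clipping matters for objects near the image border, where the padded box would otherwise extend outside the raster.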

## (Optional) Train Embedder

**For higher accuracy, train the embedder on your training set.**

Skip this if you want baseline results first.

In [None]:
# Train embedder (takes 2-4 hours with GPU)
# !python scripts/train_embedder.py \
#     --data data/training_set \
#     --config configs/default.yaml \
#     --epochs 50 \
#     --batch-size 32 \
#     --device cuda \
#     --output models/checkpoints

## Build FAISS Index

In [None]:
# Build index from testing set
!python scripts/build_index.py \
    --targets data/testing_set \
    --out cache/indexes \
    --config configs/default.yaml \
    --device cuda \
    --tile-size 512 \
    --stride 256

# If you trained the embedder, append this flag to the command above:
# --checkpoint models/checkpoints/best.pth

In [None]:
# Verify index built
!ls -lh cache/indexes/
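
With `--tile-size 512` and `--stride 256`, neighboring tiles overlap by half, so an object cut by one tile boundary sits closer to the middle of an adjacent tile. The sketch below shows one way such a sliding window could be enumerated. It is an assumption about the general technique, not the actual logic of `build_index.py`; `tile_origins` and `tile_windows` are hypothetical helpers.

```python
def tile_origins(length, tile_size, stride):
    """Start offsets of tiles along one axis, with a final tile flush to the edge."""
    if length <= tile_size:
        return [0]
    starts = list(range(0, length - tile_size + 1, stride))
    if starts[-1] + tile_size < length:
        starts.append(length - tile_size)  # cover the leftover strip
    return starts

def tile_windows(width, height, tile_size=512, stride=256):
    """Yield (x, y, w, h) windows that together cover a width x height image."""
    for y in tile_origins(height, tile_size, stride):
        for x in tile_origins(width, tile_size, stride):
            yield x, y, min(tile_size, width), min(tile_size, height)

# Hypothetical 1200x600 scene.
windows = list(tile_windows(1200, 600))
print(len(windows), windows[0], windows[-1])
```

Halving the stride roughly quadruples the number of tiles, which is the main cost to weigh against the better coverage.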

## Run Visual Search for All Classes

In [None]:
# Create output directory
!mkdir -p outputs

# Define classes
classes = [
    "Solar Panel",
    "Brick Kiln",
    "Pond-1 & Pond-2",
    "Pond-1,Pond-2 & Playground",
    "Pond-2,STP & Sheds",
    "MetroShed,STP & Sheds",
    "Playground",
    "Sheds"
]

print(f"Total classes: {len(classes)}")

In [None]:
# Search for each class
import glob

for class_name in classes:
    print(f"\n{'='*60}")
    print(f"Searching for: {class_name}")
    print(f"{'='*60}")
    
    # Find chips for this class
    chip_dir = f"chips/{class_name}"
    chips = sorted(glob.glob(f"{chip_dir}/*.tif"))  # sorted so chips[:5] is reproducible
    
    if not chips:
        print(f"WARNING: No chips found for {class_name}")
        continue
    
    # Use up to 5 chips
    chips = chips[:5]
    chip_args = ' '.join([f'\"{c}\"' for c in chips])
    
    # Output filename (safe for filesystem)
    safe_name = class_name.replace(" ", "_").replace(",", "").replace("&", "and")
    output_file = f"outputs/temp_{safe_name}.txt"
    
    # Run search
    !python scripts/run_search.py \
        --chips {chip_args} \
        --index cache/indexes \
        --name "{class_name}" \
        --out "{output_file}" \
        --team AIGR-S47377 \
        --config configs/default.yaml \
        --device cuda \
        --top-k 1000 \
        --nms-threshold 0.3
    
    print(f"✓ {class_name} complete")
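
The `--nms-threshold 0.3` flag suggests that overlapping detections are pruned by non-maximum suppression. Below is a minimal sketch of greedy IoU-based NMS, assuming boxes are `[xmin, ymin, xmax, ymax]` in pixels and that a box is dropped when its IoU with a higher-scoring kept box exceeds the threshold. `run_search.py` may use a different box format or overlap rule.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one [xmin, ymin, xmax, ymax] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best box, drop the boxes that overlap it too much, repeat."""
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

boxes = np.array([
    [0, 0, 100, 100],
    [10, 10, 110, 110],    # IoU of about 0.68 with the first box
    [200, 200, 300, 300],  # no overlap with the others
], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores, iou_threshold=0.3)
print(kept)  # [0, 2]
```

A lower threshold removes more near-duplicates but can also suppress distinct objects that sit close together, such as adjacent sheds.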

## Combine All Results

In [None]:
# Combine all temp files into one submission
import datetime
import glob  # also imported in the search cell; repeated so this cell runs on its own

# Get current date
date_str = datetime.datetime.now().strftime("%d-%b-%Y")
submission_file = f"outputs/GC_PS03_{date_str}_AIGR-S47377.txt"

# Find all temp files
temp_files = glob.glob("outputs/temp_*.txt")

print(f"Combining {len(temp_files)} result files...")

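# Combine, making sure each file's last line ends with a newline
with open(submission_file, 'w') as outfile:
    for temp_file in sorted(temp_files):
        with open(temp_file, 'r') as infile:
            content = infile.read()
        if content and not content.endswith('\n'):
            content += '\n'
        outfile.write(content)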

print(f"\n✓ Combined submission file created: {submission_file}")

## Summary Statistics

In [None]:
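# Count detections
with open(submission_file, 'r') as f:
    lines = [line for line in f if line.strip()]

# Some class names contain others (e.g. "Sheds" is part of "Pond-2,STP & Sheds"),
# so credit each line to the longest class name it contains.
counts = dict.fromkeys(classes, 0)
for line in lines:
    matches = [c for c in classes if c in line]
    if matches:
        counts[max(matches, key=len)] += 1

print(f"\n{'='*60}")
print("SUBMISSION SUMMARY - Team AIGR-S47377")
print(f"{'='*60}")
print(f"\nTotal Detections: {len(lines)}")
print("\nDetections by Class:")

for class_name in classes:
    print(f"  {class_name:35s}: {counts[class_name]:4d}")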

print(f"\nOutput File: {submission_file}")
print(f"Format: PS-03 Standard (space-delimited)")
print(f"{'='*60}")

## Preview Results

In [None]:
# Show first 20 lines
print("\nFirst 20 detections:")
print("-" * 100)
with open(submission_file, 'r') as f:
    for i, line in enumerate(f):
        if i >= 20:
            break
        print(line.strip())

## Download Submission File

In [None]:
# Copy to output for download
from shutil import copy
copy(submission_file, '/kaggle/working/')

print(f"\n✓ Submission file ready for download!")
print(f"\nFile: {submission_file}")
print(f"\nDownload from: Files → {os.path.basename(submission_file)}")

## Clean Up (Optional)

In [None]:
# Remove temp files
!rm -f outputs/temp_*.txt
print("✓ Temp files cleaned")

---

## Next Steps

1. **Download** the submission file
2. **Verify** format matches PS-03 requirements
3. **Submit** to hackathon portal

## For Higher Accuracy

1. **Train embedder** (uncomment training cell)
2. **Tune per-class thresholds**
3. **Use ensemble** (multiple models)
4. **Add augmentation** during search
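
As one way to approach per-class threshold tuning, the sketch below filters detections with a separate minimum score for each class. The detection dictionaries, the threshold values, and the `filter_by_class_threshold` helper are all hypothetical; real thresholds would need to be chosen from validation results on the sample set.

```python
def filter_by_class_threshold(detections, thresholds, default=0.5):
    """Keep detections whose score meets the threshold configured for their class."""
    return [
        d for d in detections
        if d["score"] >= thresholds.get(d["class"], default)
    ]

# Hypothetical detections and thresholds, for illustration only.
detections = [
    {"class": "Solar Panel", "score": 0.62},
    {"class": "Solar Panel", "score": 0.41},
    {"class": "Sheds", "score": 0.41},
]
thresholds = {"Solar Panel": 0.5, "Sheds": 0.4}

kept_detections = filter_by_class_threshold(detections, thresholds)
print(kept_detections)
```

The same score of 0.41 is rejected for one class and accepted for another, which is the point of tuning thresholds per class rather than globally.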

---

**Team AIGR-S47377 | PS-03 Visual Search**