# detr-geo Quickstart: Vehicle Detection from Satellite Imagery

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gpriceless/detr-geo/blob/main/notebooks/quickstart.ipynb)
[![GitHub](https://img.shields.io/badge/github-detr--geo-blue?logo=github)](https://github.com/gpriceless/detr-geo)

**Detect vehicles in satellite and aerial imagery, export georeferenced vector data.**

[detr-geo](https://github.com/gpriceless/detr-geo) wraps RF-DETR and adds everything a geospatial workflow needs: automatic tiling, CRS handling, multispectral band mapping, and export to GeoJSON/GeoPackage/Shapefile.

This notebook walks through the full workflow:

1. Install detr-geo and dependencies
2. Download a real overhead image (NAIP aerial imagery from USDA)
3. Run the **COCO-pretrained model** -- see how it struggles with overhead imagery
4. Run the **xView fine-tuned model** -- see accurate vehicle detection from above
5. Visualize results and export georeferenced data

**Runtime**: ~5 minutes on a free Colab GPU (T4)

---

## 1. Install detr-geo

This installs the library with all dependencies: RF-DETR, PyTorch, rasterio, geopandas, matplotlib, and leafmap.

In [None]:
%%capture
# Install detr-geo with all optional dependencies
!pip install "detr-geo[all] @ git+https://github.com/gpriceless/detr-geo.git"
!pip install pystac-client  # For searching satellite imagery catalogs

In [None]:
import detr_geo
print(f"detr-geo {detr_geo.__version__} installed successfully")

## 2. Download Sample Satellite Imagery

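We will fetch a crop of **NAIP** (National Agriculture Imagery Program) data -- public domain aerial imagery covering the contiguous US, typically at **0.6m resolution**. At that resolution a typical passenger car spans only about 7 x 3 pixels, which is small but workable for vehicle detection.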

We use Microsoft's [Planetary Computer](https://planetarycomputer.microsoft.com/) STAC catalog to find a scene over a large parking lot, then read a small window with rasterio. No API key required.

In [None]:
import warnings
from pathlib import Path

import numpy as np
import rasterio
from pystac_client import Client
from rasterio.windows import from_bounds
from shapely.geometry import box

warnings.filterwarnings("ignore", category=rasterio.errors.NotGeoreferencedWarning)

# Target: a Costco parking lot in Fresno, CA
# Large, well-organized lot with a mix of cars, trucks, and empty spaces.
TARGET_LON, TARGET_LAT = -119.7895, 36.8385

# Search for NAIP imagery at this location
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search = catalog.search(
    collections=["naip"],
    intersects={"type": "Point", "coordinates": [TARGET_LON, TARGET_LAT]},
    datetime="2020-01-01/2023-12-31",
    max_items=5,
)

items = list(search.items())
print(f"Found {len(items)} NAIP scenes")
if not items:
    raise RuntimeError("No NAIP scenes found; check the coordinates and date range.")

# Pick the most recent scene
item = max(items, key=lambda it: it.datetime)
print(f"Using: {item.id} ({item.datetime:%Y-%m-%d})")
print(f"GSD: {item.properties.get('gsd', 'unknown')} m")

In [None]:
# Read a small crop around the parking lot
# NAIP scenes are huge (10k+ pixels), so we window-read an area of roughly 1000x1000 pixels
from rasterio.warp import transform_bounds

# Define a lon/lat bounding box centered on the parking lot
HALF_SIZE_DEG = 0.003  # ~330m north-south, ~270m east-west at this latitude
bbox = box(
    TARGET_LON - HALF_SIZE_DEG,
    TARGET_LAT - HALF_SIZE_DEG,
    TARGET_LON + HALF_SIZE_DEG,
    TARGET_LAT + HALF_SIZE_DEG,
)

# Get the image URL from the STAC item
image_url = item.assets["image"].href
print(f"Reading from: {image_url[:80]}...")

# Read the windowed region and save as a local GeoTIFF
output_path = "sample_parking_lot.tif"

with rasterio.open(image_url) as src:
    # bbox is in lon/lat (EPSG:4326), but the raster uses a projected CRS,
    # so reproject the bounds before converting them to a pixel window
    left, bottom, right, top = transform_bounds("EPSG:4326", src.crs, *bbox.bounds)
    window = from_bounds(left, bottom, right, top, transform=src.transform)
    
    # Read RGB bands (NAIP has 4 bands: R, G, B, NIR)
    data = src.read([1, 2, 3], window=window)
    transform = src.window_transform(window)
    
    # Write cropped GeoTIFF
    profile = src.profile.copy()
    profile.update(
        width=data.shape[2],
        height=data.shape[1],
        count=3,
        transform=transform,
        driver="GTiff",
        compress="deflate",
    )
    with rasterio.open(output_path, "w", **profile) as dst:
        dst.write(data)

print(f"Saved {output_path}: {data.shape[2]}x{data.shape[1]} pixels, {data.shape[0]} bands")
print(f"File size: {Path(output_path).stat().st_size / 1024:.0f} KB")
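
The size of the crop in meters follows from the half-width in degrees, and it is easy to check by hand. The sketch below uses a spherical-Earth approximation (an assumption, accurate to well under 1% here) to convert the 0.006-degree box into ground distances and an approximate pixel count at 0.6 m per pixel. The helper name `degree_extent_m` is made up for this example.

```python
import math

# Mean Earth radius; a spherical approximation is enough for a sanity check.
EARTH_RADIUS_M = 6_371_000.0


def degree_extent_m(lat_deg: float, delta_deg: float) -> tuple[float, float]:
    """Return (east_west_m, north_south_m) spanned by delta_deg at latitude lat_deg."""
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180.0  # about 111 km
    north_south = delta_deg * meters_per_degree
    # Lines of longitude converge toward the poles, so scale by cos(latitude).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south


# Full box width is twice the 0.003-degree half size used in the crop.
ew, ns = degree_extent_m(36.8385, 2 * 0.003)
print(f"Crop is about {ew:.0f} m east-west by {ns:.0f} m north-south")
print(f"At 0.6 m/pixel: about {ew / 0.6:.0f} x {ns / 0.6:.0f} pixels")
```

Because a degree of longitude is shorter than a degree of latitude away from the equator, the crop is taller than it is wide on the ground.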

## 3. View the Satellite Image

Let's see what we're working with. This is a parking lot as seen from above at 0.6m resolution -- each pixel covers about 2 feet on the ground.

In [None]:
import matplotlib.pyplot as plt

with rasterio.open(output_path) as src:
    img = src.read([1, 2, 3])  # (3, H, W)

# Display as RGB
fig, ax = plt.subplots(figsize=(12, 10))
ax.imshow(np.transpose(img, (1, 2, 0)))  # (H, W, 3)
ax.set_title("NAIP Aerial Imagery -- Parking Lot (0.6m GSD)", fontsize=14)
ax.set_xlabel(f"{img.shape[2]} x {img.shape[1]} pixels")
ax.set_axis_off()
plt.tight_layout()
plt.show()
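
To get a feel for how small vehicles are at this resolution, the sketch below converts rough vehicle footprints into pixel sizes at 0.6 m (NAIP) and 0.3 m (xView). The vehicle dimensions are illustrative round numbers chosen for this example, not values taken from detr-geo or the xView dataset.

```python
# Approximate (length, width) in meters; illustrative assumptions only.
TYPICAL_SIZES_M = {
    "Car": (4.5, 1.8),
    "Pickup Truck": (5.8, 2.0),
    "Bus": (12.0, 2.5),
}


def footprint_px(size_m, gsd_m):
    """Convert a (length, width) footprint in meters to pixels at a given GSD."""
    length_m, width_m = size_m
    return length_m / gsd_m, width_m / gsd_m


for gsd in (0.6, 0.3):
    print(f"GSD {gsd} m:")
    for name, size in TYPICAL_SIZES_M.items():
        length_px, width_px = footprint_px(size, gsd)
        print(f"  {name:<13s} ~{length_px:.1f} x {width_px:.1f} px")
```

Halving the GSD doubles each dimension in pixels, so objects in 0.6 m imagery appear at about half the size a model trained on 0.3 m imagery would expect.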

## 4. COCO-Pretrained Model (The Problem)

The default RF-DETR model was trained on **COCO** -- a dataset of ground-level photographs. It has never seen a car from above.

Watch what happens when we point it at overhead imagery: vehicles tend to get labeled as *motorcycles*, *skateboards*, *boats*, and other nonsensical classes, or are missed entirely. The model sees vaguely rectangular blobs and guesses the closest COCO category it knows.

In [None]:
from detr_geo import DetrGeo

# Load the standard COCO-pretrained model
dg_coco = DetrGeo(model_size="base", confidence_threshold=0.3)
dg_coco.set_image(output_path, suppress_gsd_warning=True)

# Run detection
coco_detections = dg_coco.detect(threshold=0.3)
print(f"COCO model found {len(coco_detections)} objects")

if len(coco_detections) > 0:
    print("\nClass distribution (COCO labels):")
    print(coco_detections["class_name"].value_counts().to_string())

In [None]:
# Visualize the confused COCO detections
if len(coco_detections) > 0:
    fig, ax = dg_coco.show_detections(figsize=(14, 12))
    ax.set_title(
        "COCO-Pretrained Model: Confused by Overhead Perspective\n"
        "(motorcycles? skateboards? boats? These are cars.)",
        fontsize=13,
    )
    plt.show()
else:
    print("No detections -- the COCO model cannot recognize overhead vehicles.")

The COCO model sees top-down vehicle shapes and tries to match them to ground-level categories. It is confidently wrong. This is the fundamental problem that fine-tuning solves.
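
One way to put a number on "confidently wrong" is to measure what fraction of the predicted labels are at least plausible for a parking lot. The sketch below does this with plain Python on a list of label strings. The helper `label_breakdown` and the choice of `car`, `truck`, and `bus` as the plausible set are assumptions made for this example, and labels are lowercased because the exact strings detr-geo returns are not confirmed here.

```python
from collections import Counter

# COCO category names treated as plausible for a parking lot (an assumption).
PLAUSIBLE_LABELS = {"car", "truck", "bus"}


def label_breakdown(class_names):
    """Return (Counter of labels, fraction of labels in PLAUSIBLE_LABELS)."""
    counts = Counter(str(name).lower() for name in class_names)
    total = sum(counts.values())
    plausible = sum(n for label, n in counts.items() if label in PLAUSIBLE_LABELS)
    fraction = plausible / total if total else 0.0
    return counts, fraction


# Made-up labels standing in for coco_detections["class_name"].
counts, fraction = label_breakdown(["boat", "car", "skateboard", "car", "motorcycle"])
print(counts.most_common())
print(f"{fraction:.0%} of detections carry a plausible vehicle label")
```

In the notebook, the same function could be called as `label_breakdown(coco_detections["class_name"])`.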

---

## 5. xView Fine-Tuned Model (The Solution)

The **xView fine-tuned model** was trained on the [xView dataset](http://xviewdataset.org/) -- satellite imagery at 0.3m GSD with labeled overhead vehicles. The model detects 5 vehicle classes:

| Class | Examples |
|---|---|
| Car | Sedans, SUVs, hatchbacks |
| Pickup Truck | Pickup trucks, utility pickups |
| Truck | Semi trucks, cargo trucks, tankers |
| Bus | Transit buses, school buses |
| Other Vehicle | Construction equipment, specialty vehicles |

First, download the fine-tuned weights from HuggingFace:

In [None]:
# Download xView fine-tuned weights (~100 MB)
!pip install -q huggingface_hub
!huggingface-cli download gpriceless/detr-geo-xview checkpoint_best_ema.pth --local-dir checkpoints/

In [None]:
# xView vehicle class mapping (must match training configuration)
XVIEW_CLASSES = {
    0: "Car",
    1: "Pickup Truck",
    2: "Truck",
    3: "Bus",
    4: "Other Vehicle",
}

# Load the xView fine-tuned model
dg = DetrGeo(
    model_size="medium",
    pretrain_weights="checkpoints/checkpoint_best_ema.pth",
    custom_class_names=XVIEW_CLASSES,
    confidence_threshold=0.3,
)

dg.set_image(output_path, suppress_gsd_warning=True)

# Run detection
detections = dg.detect(threshold=0.3)
print(f"xView model found {len(detections)} vehicles")

## 6. Visualize xView Detections

The fine-tuned model correctly identifies vehicle types from above. Compare this to the COCO results above.

In [None]:
# Visualize the xView detections with bounding boxes
fig, ax = dg.show_detections(figsize=(14, 12))
ax.set_title(
    f"xView Fine-Tuned Model: {len(detections)} Vehicles Detected\n"
    "Correct classes: Car, Pickup Truck, Truck, Bus, Other Vehicle",
    fontsize=13,
)
plt.show()

## 7. Before vs. After Comparison

Scroll up to compare the COCO model (Section 4) with the xView model (Section 6). The difference is dramatic:

- **COCO model**: Labels overhead vehicles as motorcycles, skateboards, boats, and other nonsensical ground-level categories
- **xView model**: Correctly identifies Car, Pickup Truck, Truck, Bus, and Other Vehicle from the overhead perspective

The table below summarizes the difference:

In [None]:
# Quick comparison table
print("=" * 55)
print("  MODEL COMPARISON")
print("=" * 55)
print(f"  {'Metric':<25s} {'COCO':>12s} {'xView':>12s}")
print("-" * 55)
print(f"  {'Total detections':<25s} {len(coco_detections):>12d} {len(detections):>12d}")

if len(coco_detections) > 0:
    n_coco_classes = coco_detections["class_name"].nunique()
    coco_top = coco_detections["class_name"].value_counts().index[0]
else:
    n_coco_classes = 0
    coco_top = "N/A"

if len(detections) > 0:
    n_xview_classes = detections["class_name"].nunique()
    xview_top = detections["class_name"].value_counts().index[0]
else:
    n_xview_classes = 0
    xview_top = "N/A"

print(f"  {'Unique classes':<25s} {n_coco_classes:>12d} {n_xview_classes:>12d}")
print(f"  {'Top class':<25s} {coco_top:>12s} {xview_top:>12s}")
print(f"  {'Trained on overhead?':<25s} {'No':>12s} {'Yes':>12s}")
print("=" * 55)

if len(coco_detections) > 0:
    print(f"\nCOCO classes found: {', '.join(coco_detections['class_name'].unique())}")
if len(detections) > 0:
    print(f"xView classes found: {', '.join(detections['class_name'].unique())}")

## 8. Per-Class Vehicle Counts

A summary of what the xView model detected, broken down by vehicle type.

In [None]:
if len(detections) > 0:
    # Per-class summary
    counts = detections["class_name"].value_counts()
    
    print("=" * 40)
    print("  VEHICLE DETECTION SUMMARY")
    print("=" * 40)
    for cls, count in counts.items():
        cls_data = detections[detections["class_name"] == cls]
        avg_conf = cls_data["confidence"].mean()
        print(f"  {cls:<15s}  {count:>4d}  (avg conf: {avg_conf:.2f})")
    print("-" * 40)
    print(f"  {'TOTAL':<15s}  {len(detections):>4d}")
    print("=" * 40)
    
    # Confidence distribution
    scores = detections["confidence"]
    print("\nConfidence stats:")
    print(f"  Min:  {scores.min():.3f}")
    print(f"  Mean: {scores.mean():.3f}")
    print(f"  Max:  {scores.max():.3f}")
else:
    print("No vehicles detected. Try lowering the threshold.")
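
Before changing the threshold, it can help to see how the detection count responds to it. The sketch below counts how many scores survive at several cutoffs, using made-up scores. The helper name `count_at_thresholds` is hypothetical.

```python
def count_at_thresholds(scores, thresholds=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Return {threshold: number of scores greater than or equal to it}."""
    scores = list(scores)
    return {t: sum(s >= t for s in scores) for t in thresholds}


# Made-up confidence values standing in for detections["confidence"].
example_scores = [0.31, 0.45, 0.52, 0.58, 0.74, 0.91]
for threshold, kept in count_at_thresholds(example_scores).items():
    print(f"threshold {threshold:.1f}: {kept} detections kept")
```

In the notebook this could be called as `count_at_thresholds(detections["confidence"])`. The stored detections were already filtered at 0.3, so the sweep is only informative for higher cutoffs. Looking below 0.3 means re-running `dg.detect()` with a lower `threshold`.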

In [None]:
# Bar chart of vehicle counts by class
if len(detections) > 0:
    counts = detections["class_name"].value_counts()
    
    fig, ax = plt.subplots(figsize=(8, 4))
    colors = plt.cm.Set2(np.linspace(0, 1, len(counts)))
    bars = ax.barh(counts.index, counts.values, color=colors, edgecolor="#333333", linewidth=0.5)
    
    # Add count labels on bars
    for bar, val in zip(bars, counts.values):
        ax.text(bar.get_width() + 0.5, bar.get_y() + bar.get_height() / 2,
                str(val), va="center", fontweight="bold", fontsize=11)
    
    ax.set_xlabel("Count", fontsize=12)
    ax.set_title("Detected Vehicles by Class", fontsize=13, fontweight="bold")
    ax.invert_yaxis()
    plt.tight_layout()
    plt.show()

## 9. Export Georeferenced Results

Detection results are a **GeoDataFrame** -- each bounding box is a polygon with real-world coordinates in the raster's CRS. Export to GeoPackage or GeoJSON for use in QGIS, ArcGIS, PostGIS, or any GIS tool.
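
The link between a pixel-space bounding box and a polygon in the raster's CRS is the raster's affine transform. The sketch below illustrates the idea in plain Python with a made-up transform. It is not detr-geo's internal implementation. The coefficient order `(a, b, c, d, e, f)` matches rasterio's `Affine`, where `src.transform * (col, row)` gives `(x, y)`.

```python
def pixel_to_map(col, row, transform):
    """Apply a six-coefficient affine transform (a, b, c, d, e, f) to a pixel."""
    a, b, c, d, e, f = transform
    return a * col + b * row + c, d * col + e * row + f


def bbox_to_map_ring(xmin, ymin, xmax, ymax, transform):
    """Return the closed ring of map-space corners for a pixel-space box."""
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]
    return [pixel_to_map(col, row, transform) for col, row in corners]


# Hypothetical north-up raster: 0.6 m pixels, upper-left corner at (250000, 4080000).
# The negative e coefficient reflects that row numbers grow downward (southward).
example_transform = (0.6, 0.0, 250000.0, 0.0, -0.6, 4080000.0)
ring = bbox_to_map_ring(100, 200, 110, 205, example_transform)
for x, y in ring:
    print(f"({x:.1f}, {y:.1f})")
```

A 10 x 5 pixel box becomes a 6 m x 3 m rectangle in map units, which is what makes the exported polygons line up with other layers in a GIS.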

In [None]:
# The detection results are a standard GeoDataFrame
print(f"CRS: {detections.crs}")
print(f"Columns: {list(detections.columns)}")
print()
detections.head(10)

In [None]:
# Export to GeoPackage (recommended -- preserves CRS, supports layers)
dg.to_gpkg("vehicle_detections.gpkg")
print(f"Exported: vehicle_detections.gpkg ({Path('vehicle_detections.gpkg').stat().st_size / 1024:.0f} KB)")

# Export to GeoJSON (auto-reprojects to WGS84 per the GeoJSON spec)
dg.to_geojson("vehicle_detections.geojson")
print(f"Exported: vehicle_detections.geojson ({Path('vehicle_detections.geojson').stat().st_size / 1024:.0f} KB)")

print("\nThese files can be opened directly in QGIS, ArcGIS, or loaded into PostGIS.")
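
An exported GeoJSON file can be checked without any GIS software, since it is plain JSON. The sketch below counts features per class using only the standard library. It assumes the feature properties carry a `class_name` key mirroring the GeoDataFrame column, which has not been verified against detr-geo's actual output. The helper `summarize_geojson` is made up for this example, and the demo runs on a tiny synthetic file rather than the real export.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_geojson(path, class_field="class_name"):
    """Count features per class in a GeoJSON FeatureCollection."""
    collection = json.loads(Path(path).read_text(encoding="utf-8"))
    counts = Counter()
    for feature in collection.get("features", []):
        properties = feature.get("properties") or {}  # properties may be null
        counts[properties.get(class_field, "unknown")] += 1
    return counts


# Self-contained demo on a small, made-up FeatureCollection.
demo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"class_name": "Car"}},
        {"type": "Feature", "geometry": None, "properties": {"class_name": "Car"}},
        {"type": "Feature", "geometry": None, "properties": {"class_name": "Bus"}},
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.geojson"
    demo_path.write_text(json.dumps(demo), encoding="utf-8")
    demo_counts = summarize_geojson(demo_path)

print(demo_counts)
```

In the notebook, `summarize_geojson("vehicle_detections.geojson")` should agree with the per-class counts from Section 8 if the property name assumption holds.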

## 10. Interactive Map

View detections on an interactive satellite basemap. Click on any detection to see its class and confidence score.

In [None]:
# Interactive map with satellite basemap
m = dg.show_map(basemap="SATELLITE")
m

---

## Summary

In this notebook we demonstrated the full detr-geo workflow:

| Step | Code | What it does |
|------|------|--------------|
| Load model | `DetrGeo(model_size="medium", pretrain_weights=..., custom_class_names=...)` | Initialize with xView fine-tuned weights |
| Load image | `dg.set_image("scene.tif")` | Read CRS and transform from GeoTIFF |
| Detect | `dg.detect(threshold=0.3)` | Run inference, return GeoDataFrame |
| Visualize | `dg.show_detections()` | Matplotlib bounding boxes on imagery |
| Export | `dg.to_gpkg("out.gpkg")` | Georeferenced vector data for GIS |
| Map | `dg.show_map()` | Interactive leafmap with satellite basemap |

For **large rasters** (orthomosaics, full satellite scenes), use `detect_tiled()` instead of `detect()`. It automatically tiles the image, runs detection on each tile, and merges results with cross-tile NMS:

```python
detections = dg.detect_tiled(overlap=0.2, nms_threshold=0.5, threshold=0.3)
```
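
The general idea behind tiled detection can be sketched in plain Python: lay out overlapping tiles, then suppress duplicate boxes that arise where tiles overlap. This is an illustration of the technique under stated assumptions, not detr-geo's implementation. The tile size of 512 and the helper names are hypothetical, the NMS here ignores class labels, and only the `overlap=0.2` and `nms_threshold=0.5` values come from the call above.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of tiles of size `tile` covering `length` pixels."""
    if length <= tile:
        return [0]
    stride = max(1, int(tile * (1 - overlap)))
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs, highest score first."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Tile layout along one axis of a 1000-pixel raster.
print(tile_origins(1000, tile=512, overlap=0.2))

# Two near-identical boxes (same car seen in two tiles) plus one distinct box.
merged = nms([
    ((100, 100, 120, 110), 0.9),
    ((101, 100, 121, 110), 0.8),
    ((300, 300, 320, 310), 0.7),
])
print(merged)
```

The overlap is what lets a vehicle cut by one tile edge appear whole in a neighboring tile, and NMS then removes the duplicate detection that results.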

### Next Steps

- **Fine-tune on your own data**: See the [Fine-Tuning Guide](https://github.com/gpriceless/detr-geo/blob/main/docs/fine-tuning-guide.md)
- **More examples**: See the [examples/](https://github.com/gpriceless/detr-geo/tree/main/examples) directory
- **API Reference**: See the [full docs](https://github.com/gpriceless/detr-geo/blob/main/docs/api-reference.md)

---

*detr-geo is MIT licensed. xView fine-tuned weights are CC BY-NC-SA 4.0 (following the xView dataset license).*