# Export Training Data in Multiple Formats (PASCAL VOC, COCO, YOLO)

[![image](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/opengeos/geoai/blob/main/docs/examples/export_training_data_formats.ipynb)

This notebook demonstrates how to export geospatial training data in three popular object detection formats:

- **PASCAL VOC**: XML-based format, widely used in computer vision
- **COCO**: JSON-based format, standard for object detection benchmarks
- **YOLO**: Text-based format with normalized coordinates, optimized for YOLO models

## Install packages

Ensure the required packages are installed.

In [None]:
# %pip install geoai-py

## Import libraries

In [1]:
import geoai
import json
from pathlib import Path

## Download sample data

We'll use the same building detection dataset from the segmentation example.

In [2]:
train_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_rgb_train.tif"
)
train_vector_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_train_buildings.geojson"

In [3]:
train_raster_path = geoai.download_file(train_raster_url)
train_vector_path = geoai.download_file(train_vector_url)

File already exists: naip_rgb_train.tif
File already exists: naip_train_buildings.geojson


## Visualize sample data

In [4]:
geoai.get_raster_info(train_raster_path)

{'driver': 'GTiff',
 'width': 2503,
 'height': 1126,
 'count': 3,
 'dtype': 'uint8',
 'crs': 'EPSG:26911',
 'transform': Affine(0.6000000000000046, 0.0, 454780.8,
        0.0, -0.6, 5278242.6),
 'bounds': BoundingBox(left=454780.8, bottom=5277567.0, right=456282.6, top=5278242.6),
 'resolution': (0.6000000000000046, 0.6),
 'nodata': None,
 'band_stats': [{'band': 1,
   'min': 12.0,
   'max': 251.0,
   'mean': 150.6730747259594,
   'std': 48.01908734374099},
  {'band': 2,
   'min': 49.0,
   'max': 251.0,
   'mean': 141.92468895229808,
   'std': 43.46595463573498},
  {'band': 3,
   'min': 53.0,
   'max': 251.0,
   'mean': 120.89909373405554,
   'std': 41.78086244480776}]}

In [5]:
geoai.view_vector_interactive(train_vector_path, tiles=train_raster_path)

## Format 1: PASCAL VOC (XML)

PASCAL VOC format stores annotations in XML files with bounding boxes and class labels. This is the default format and is widely used in traditional object detection frameworks.

**Output structure:**
```
buildings_pascal_voc/
├── images/          # GeoTIFF tiles
├── labels/          # Label masks (GeoTIFF)
└── annotations/     # XML annotation files
```

In [6]:
pascal_output = "buildings_pascal_voc"

stats = geoai.export_geotiff_tiles(
    in_raster=train_raster_path,
    out_folder=pascal_output,
    in_class_data=train_vector_path,
    tile_size=512,
    stride=256,
    buffer_radius=0,
    max_tiles=10,  # Limit for demo purposes
    metadata_format="PASCAL_VOC",
)


Raster info for naip_rgb_train.tif:
  CRS: EPSG:26911
  Dimensions: 2503 x 1126
  Resolution: (0.6000000000000046, 0.6)
  Bands: 3
  Bounds: BoundingBox(left=454780.8, bottom=5277567.0, right=456282.6, top=5278242.6)
Loaded 735 features from naip_train_buildings.geojson
Vector CRS: EPSG:4326
Reprojecting features from EPSG:4326 to EPSG:26911
Found 1 unique classes: ['building']


Generated: 10, With features: 10: 100%|██████████| 10/10 [00:01<00:00,  8.49it/s]


------- Export Summary -------
Total tiles exported: 10
Tiles with features: 10 (100.0%)
Average feature pixels per tile: 36778.6
Output saved to: buildings_pascal_voc

------- Georeference Verification -------





### Examine PASCAL VOC output

In [7]:
# List annotation files
# Sort so the first file shown is deterministic (glob order is not guaranteed)
xml_files = sorted(Path(f"{pascal_output}/annotations").glob("*.xml"))
print(f"Found {len(xml_files)} XML annotation files")

# Display first annotation file
if xml_files:
    with open(xml_files[0], 'r') as f:
        print(f"\nSample annotation ({xml_files[0].name}):\n")
        print(f.read())

Found 10 XML annotation files

Sample annotation (tile_000000.xml):

<annotation><folder>images</folder><filename>tile_000000.tif</filename><size><width>512</width><height>512</height><depth>3</depth></size><georeference><crs>EPSG:26911</crs><transform>| 0.60, 0.00, 454780.80|| 0.00,-0.60, 5278242.60|| 0.00, 0.00, 1.00|</transform><bounds>454780.8, 5277935.399999999, 455088.0, 5278242.6</bounds></georeference><object><name>building</name><difficult>0</difficult><bndbox><xmin>47</xmin><ymin>417</ymin><xmax>66</xmax><ymax>456</ymax></bndbox></object><object><name>building</name><difficult>0</difficult><bndbox><xmin>123</xmin><ymin>418</ymin><xmax>142</xmax><ymax>457</ymax></bndbox></object><object><name>building</name><difficult>0</difficult><bndbox><xmin>182</xmin><ymin>435</ymin><xmax>199</xmax><ymax>453</ymax></bndbox></object><object><name>building</name><difficult>0</difficult><bndbox><xmin>47</xmin><ymin>356</ymin><xmax>66</xmax><ymax>395</ymax></bndbox></object><object><name>build
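
To work with these annotations programmatically, the XML can be read with Python's standard `xml.etree.ElementTree` module. The following is a minimal sketch, assuming the element layout shown in the sample above. The `read_voc_boxes` helper and the `SAMPLE_XML` string (a shortened, well-formed copy of the first two objects) are illustrative and not part of geoai.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the sample annotation above (first two objects only).
SAMPLE_XML = """<annotation><folder>images</folder><filename>tile_000000.tif</filename>
<size><width>512</width><height>512</height><depth>3</depth></size>
<object><name>building</name><difficult>0</difficult>
<bndbox><xmin>47</xmin><ymin>417</ymin><xmax>66</xmax><ymax>456</ymax></bndbox></object>
<object><name>building</name><difficult>0</difficult>
<bndbox><xmin>123</xmin><ymin>418</ymin><xmax>142</xmax><ymax>457</ymax></bndbox></object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return (filename, [(class_name, xmin, ymin, xmax, ymax), ...]) from VOC XML text."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        # Pixel coordinates are stored as integer text in the four child elements
        coords = tuple(int(bnd.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), *coords))
    return filename, boxes


voc_filename, voc_boxes = read_voc_boxes(SAMPLE_XML)
print(voc_filename, voc_boxes)
```

For a file on disk, the same helper could be called as `read_voc_boxes(Path(some_xml_path).read_text())`.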

## Format 2: COCO (JSON)

COCO format uses a single JSON file containing all annotations, images, and categories. This is the standard format for modern object detection benchmarks.

**Output structure:**
```
buildings_coco/
├── images/              # GeoTIFF tiles
├── labels/              # Label masks (GeoTIFF)
└── annotations/
    └── instances.json   # COCO annotations
```

**COCO JSON structure:**
```json
{
  "images": [{"id": 0, "file_name": "tile_000000.tif", "width": 512, "height": 512}],
  "annotations": [{"id": 1, "image_id": 0, "category_id": 1, "bbox": [x, y, w, h]}],
  "categories": [{"id": 1, "name": "building", "supercategory": "object"}]
}
```

In [8]:
coco_output = "buildings_coco"

stats = geoai.export_geotiff_tiles(
    in_raster=train_raster_path,
    out_folder=coco_output,
    in_class_data=train_vector_path,
    tile_size=512,
    stride=256,
    buffer_radius=0,
    max_tiles=10,
    metadata_format="COCO",
)


Raster info for naip_rgb_train.tif:
  CRS: EPSG:26911
  Dimensions: 2503 x 1126
  Resolution: (0.6000000000000046, 0.6)
  Bands: 3
  Bounds: BoundingBox(left=454780.8, bottom=5277567.0, right=456282.6, top=5278242.6)
Loaded 735 features from naip_train_buildings.geojson
Vector CRS: EPSG:4326
Reprojecting features from EPSG:4326 to EPSG:26911
Found 1 unique classes: ['building']


Generated: 10, With features: 10: 100%|██████████| 10/10 [00:01<00:00,  9.48it/s]

Saved COCO annotations: 10 images, 643 annotations, 1 categories

------- Export Summary -------
Total tiles exported: 10
Tiles with features: 10 (100.0%)
Average feature pixels per tile: 36778.6
Output saved to: buildings_coco

------- Georeference Verification -------





### Examine COCO output

In [9]:
# Load COCO annotations
coco_file = f"{coco_output}/annotations/instances.json"
with open(coco_file, 'r') as f:
    coco_data = json.load(f)

print("COCO Dataset Summary:")
print(f"  Images: {len(coco_data['images'])}")
print(f"  Annotations: {len(coco_data['annotations'])}")
print(f"  Categories: {len(coco_data['categories'])}")

# Display categories
print("\nCategories:")
for cat in coco_data['categories']:
    print(f"  {cat}")

# Display first image
if coco_data['images']:
    print("\nFirst image:")
    print(f"  {coco_data['images'][0]}")

# Display first annotation
if coco_data['annotations']:
    print("\nFirst annotation:")
    print(f"  {coco_data['annotations'][0]}")

COCO Dataset Summary:
  Images: 10
  Annotations: 643
  Categories: 1

Categories:
  {'id': 1, 'name': 'building', 'supercategory': 'object'}

First image:
  {'id': 0, 'file_name': 'tile_000000.tif', 'width': 512, 'height': 512, 'crs': 'EPSG:26911', 'transform': '| 0.60, 0.00, 454780.80|\n| 0.00,-0.60, 5278242.60|\n| 0.00, 0.00, 1.00|'}

First annotation:
  {'id': 1, 'image_id': 0, 'category_id': 1, 'bbox': [47, 417, 19, 39], 'area': 741, 'iscrowd': 0}
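
Because COCO keeps every annotation in one flat list, a common first step is to group annotations by image. The sketch below assumes only the fields printed above. `coco_sample` is a tiny stand-in for `instances.json` (its second annotation is derived from the second box in the PASCAL VOC sample, for illustration), and `annotations_by_image` is a hypothetical helper name.

```python
from collections import defaultdict

# Tiny stand-in for instances.json, using the fields shown in the output above.
coco_sample = {
    "images": [{"id": 0, "file_name": "tile_000000.tif", "width": 512, "height": 512}],
    "annotations": [
        {"id": 1, "image_id": 0, "category_id": 1, "bbox": [47, 417, 19, 39], "area": 741, "iscrowd": 0},
        # Illustrative second entry, not copied from the real file
        {"id": 2, "image_id": 0, "category_id": 1, "bbox": [123, 418, 19, 39], "area": 741, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "building", "supercategory": "object"}],
}


def annotations_by_image(coco):
    """Map each image file name to a list of (category_name, bbox) pairs."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[files[ann["image_id"]]].append((names[ann["category_id"]], ann["bbox"]))
    return dict(grouped)


per_image = annotations_by_image(coco_sample)
print(per_image)
```

Passing the loaded `coco_data` dictionary instead of `coco_sample` should give the per-tile grouping for the exported dataset.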


## Format 3: YOLO (Text)

YOLO format uses text files with normalized bounding box coordinates. Each image has a corresponding `.txt` file with one line per object.

**Output structure:**
```
buildings_yolo/
├── images/           # GeoTIFF tiles
├── labels/           # Label masks (GeoTIFF) + YOLO .txt files
└── classes.txt       # Class names (one per line)
```

**YOLO annotation format (normalized coordinates 0-1):**
```
<class_id> <x_center> <y_center> <width> <height>
0 0.5 0.5 0.3 0.2
```

In [10]:
yolo_output = "buildings_yolo"

stats = geoai.export_geotiff_tiles(
    in_raster=train_raster_path,
    out_folder=yolo_output,
    in_class_data=train_vector_path,
    tile_size=512,
    stride=256,
    buffer_radius=0,
    max_tiles=10,
    metadata_format="YOLO",
)


Raster info for naip_rgb_train.tif:
  CRS: EPSG:26911
  Dimensions: 2503 x 1126
  Resolution: (0.6000000000000046, 0.6)
  Bands: 3
  Bounds: BoundingBox(left=454780.8, bottom=5277567.0, right=456282.6, top=5278242.6)
Loaded 735 features from naip_train_buildings.geojson
Vector CRS: EPSG:4326
Reprojecting features from EPSG:4326 to EPSG:26911
Found 1 unique classes: ['building']


Generated: 10, With features: 10: 100%|██████████| 10/10 [00:01<00:00,  8.23it/s]

Saved YOLO classes file with 1 classes

------- Export Summary -------
Total tiles exported: 10
Tiles with features: 10 (100.0%)
Average feature pixels per tile: 36778.6
Output saved to: buildings_yolo

------- Georeference Verification -------





### Examine YOLO output

In [11]:
# Load classes
classes_file = f"{yolo_output}/classes.txt"
with open(classes_file, 'r') as f:
    classes = f.read().strip().split('\n')

print(f"Classes ({len(classes)}):")
for i, cls in enumerate(classes):
    print(f"  {i}: {cls}")

# List annotation files
# (sorted so the first file shown is deterministic; glob order is not guaranteed)
txt_files = sorted(Path(f"{yolo_output}/labels").glob("*.txt"))
print(f"\nFound {len(txt_files)} YOLO annotation files")

# Display first annotation file
if txt_files:
    with open(txt_files[0], 'r') as f:
        lines = f.readlines()
    print(f"\nSample annotation ({txt_files[0].name}):")
    print("  Format: <class_id> <x_center> <y_center> <width> <height>")
    for line in lines[:5]:  # Show first 5 objects
        print(f"  {line.strip()}")
    if len(lines) > 5:
        print(f"  ... and {len(lines) - 5} more objects")

Classes (1):
  0: building

Found 10 YOLO annotation files

Sample annotation (tile_000000.txt):
  Format: <class_id> <x_center> <y_center> <width> <height>
  0 0.110675 0.854126 0.037752 0.076194
  0 0.259439 0.854983 0.037751 0.076193
  0 0.373169 0.868586 0.033545 0.036001
  0 0.111744 0.735265 0.037751 0.076230
  0 0.260361 0.735994 0.037751 0.076193
  ... and 23 more objects
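
To check a label line against the image, the normalized values can be scaled back to pixels. This is a minimal sketch, assuming the five-field layout printed above and a 512 x 512 tile. `yolo_line_to_pixels` is a hypothetical helper, not a geoai function.

```python
def yolo_line_to_pixels(line, img_w, img_h, class_names):
    """Convert one YOLO label line to (class_name, xmin, ymin, xmax, ymax) in pixels."""
    class_id, xc, yc, w, h = line.split()
    # Scale normalized center and size back to pixel units
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return (class_names[int(class_id)], xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


# First line of the sample annotation above
name, xmin, ymin, xmax, ymax = yolo_line_to_pixels(
    "0 0.110675 0.854126 0.037752 0.076194", 512, 512, ["building"]
)
print(name, round(xmin, 1), round(ymin, 1), round(xmax, 1), round(ymax, 1))
```

The result lands within about one pixel of the first PASCAL VOC box (`47, 417, 66, 456`), which is a quick consistency check between the two exports.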


## Format Comparison

### When to Use Each Format

| Format | Best For | Pros | Cons |
|--------|----------|------|------|
| **PASCAL VOC** | Traditional CV frameworks, quick inspection | Human-readable XML, one file per image | Verbose, not ideal for large datasets |
| **COCO** | Modern object detection, benchmarking, complex datasets | Efficient JSON, supports multiple annotation types | Single file can be large, requires parsing |
| **YOLO** | YOLO models (v3-v8), real-time detection | Compact, fast to parse, normalized coordinates | Less human-readable, limited metadata |

### Coordinate Systems

- **PASCAL VOC**: Absolute pixel coordinates `[xmin, ymin, xmax, ymax]`
- **COCO**: Absolute pixel coordinates `[x, y, width, height]` (top-left corner)
- **YOLO**: Normalized coordinates `[x_center, y_center, width, height]` (0-1 range)
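
The three conventions describe the same box, so converting between them is simple arithmetic. The sketch below uses two hypothetical helpers (`voc_to_coco`, `coco_to_yolo`) and the first building from the PASCAL VOC sample. It illustrates the coordinate conventions and is not geoai's internal conversion code.

```python
def voc_to_coco(xmin, ymin, xmax, ymax):
    """[xmin, ymin, xmax, ymax] -> [x, y, width, height], all in pixels."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def coco_to_yolo(x, y, w, h, img_w, img_h):
    """[x, y, width, height] in pixels -> normalized [x_center, y_center, width, height]."""
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]


coco_box = voc_to_coco(47, 417, 66, 456)
yolo_box = coco_to_yolo(*coco_box, img_w=512, img_h=512)
print(coco_box)  # [47, 417, 19, 39], matching the first COCO annotation
print([round(v, 4) for v in yolo_box])
```

The YOLO values computed this way differ slightly from the exported ones (for example about `0.1104` versus `0.110675` for `x_center`), presumably because the export works from unrounded geometry rather than integer pixel boxes.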

### GeoAI Extensions

All formats preserve geospatial information:
- **PASCAL VOC**: CRS, transform, and bounds in `<georeference>` element
- **COCO**: CRS and transform as custom fields in image metadata
- **YOLO**: The `.txt` labels carry no georeference fields; spatial context comes from the georeferenced GeoTIFF tiles they accompany
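
With the stored transform, a pixel bounding box can be mapped back to coordinates in the tile's CRS. The following sketch hard-codes the affine coefficients printed for `tile_000000` and assumes a north-up tile (no rotation terms). `pixel_to_map` and `bbox_to_map_bounds` are hypothetical helper names.

```python
# Affine coefficients of tile_000000, copied from the transform shown earlier:
# x_map = A * col + B * row + C ;  y_map = D * col + E * row + F
A, B, C = 0.6, 0.0, 454780.8
D, E, F = 0.0, -0.6, 5278242.6


def pixel_to_map(col, row):
    """Map a pixel (col, row) position to (x, y) in the tile's CRS."""
    return (A * col + B * row + C, D * col + E * row + F)


def bbox_to_map_bounds(xmin, ymin, xmax, ymax):
    """Pixel bbox -> (left, bottom, right, top) in the tile's CRS (EPSG:26911 here).

    Assumes a north-up raster, so the top pixel row has the largest y value.
    """
    left, top = pixel_to_map(xmin, ymin)
    right, bottom = pixel_to_map(xmax, ymax)
    return (left, bottom, right, top)


map_bounds = bbox_to_map_bounds(47, 417, 66, 456)
print(map_bounds)
```

When the transform is available as a rasterio `Affine` object, the equivalent of `pixel_to_map` is `transform * (col, row)`.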

## Multi-Class Example

The formats also support multi-class datasets. Here's how class information is stored:

**PASCAL VOC:**
```xml
<object>
  <name>building</name>
  <bndbox>...</bndbox>
</object>
<object>
  <name>road</name>
  <bndbox>...</bndbox>
</object>
```

**COCO:**
```json
{
  "categories": [
    {"id": 1, "name": "building", "supercategory": "object"},
    {"id": 2, "name": "road", "supercategory": "object"}
  ]
}
```

**YOLO:**
```
classes.txt:
building
road

annotations:
0 0.5 0.5 0.3 0.2  # class_id 0 = building
1 0.7 0.3 0.2 0.1  # class_id 1 = road
```
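
Note that the COCO category ids above start at 1 while YOLO class ids start at 0. When moving annotations between the two, an explicit id mapping avoids off-by-one mistakes. This sketch assumes YOLO ids follow ascending COCO category id, which matches the samples in this notebook but is not confirmed as a general rule for geoai.

```python
categories = [
    {"id": 1, "name": "building", "supercategory": "object"},
    {"id": 2, "name": "road", "supercategory": "object"},
]

# Assumption: YOLO class ids follow ascending COCO category id, starting at 0.
ordered = sorted(categories, key=lambda cat: cat["id"])
coco_to_yolo_id = {cat["id"]: idx for idx, cat in enumerate(ordered)}

# Contents of classes.txt: one class name per line, in YOLO id order
classes_txt = "\n".join(cat["name"] for cat in ordered)

print(coco_to_yolo_id)
print(classes_txt)
```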

## Summary

The `export_geotiff_tiles` function now supports three popular annotation formats:

- ✅ **PASCAL VOC** (XML) - Traditional, human-readable
- ✅ **COCO** (JSON) - Modern benchmark standard
- ✅ **YOLO** (TXT) - Lightweight, optimized for YOLO

All formats maintain geospatial context through georeferenced GeoTIFF tiles, which makes them well suited to training object detection models on remote sensing imagery.

Choose the format that best fits your model training framework:
- Use **COCO** for Detectron2, MMDetection, or benchmark comparisons
- Use **YOLO** for YOLOv5, YOLOv8, or the Ultralytics package
- Use **PASCAL VOC** for TensorFlow Object Detection API or legacy frameworks