# GCP Support - Interactive Tutorial with Mock Data

This notebook demonstrates how to use the GCP Support library to find and export Ground Control Points (GCPs) for drone imagery processing.

## Overview

This tutorial covers:
1. **Parsing H3 cells from manifest files** - Extract H3 cell identifiers from input manifests
2. **Generating mock GCPs** - Create sample GCPs for testing
3. **Finding GCPs using GCPFinder** - Use the main finder class to search for GCPs
4. **Filtering GCPs** - Apply quality filters to GCPs
5. **Exporting GCPs** - Export in formats compatible with MetaShape and ArcGIS Pro

## Prerequisites

- Python 3.8+
- All required packages will be installed in the next cell


In [None]:
# Install required packages
# Note: If you get import errors after running this, you may need to restart the kernel
!pip install -q h3>=3.7.0 requests>=2.31.0 geopandas>=0.14.0 shapely>=2.0.0 pandas>=2.0.0 numpy>=1.24.0 scipy>=1.10.0

print("✓ Packages installed successfully!")
print("⚠️  If you get import errors, restart the kernel: Kernel -> Restart Kernel")


zsh:1: 3.7.0 not found
✓ Packages installed successfully!


In [None]:
# Import necessary libraries
import os
import sys
from datetime import datetime

# Add the parent directory to the path to import gcp_support
# This assumes the notebook is in the gcp_support directory or its parent
notebook_dir = os.getcwd()
script_dir = os.path.dirname(os.path.abspath('__file__' if '__file__' in globals() else '.'))

# Try multiple paths to find gcp_support
possible_paths = [
    notebook_dir,  # Current directory
    os.path.dirname(notebook_dir),  # Parent directory
    script_dir,  # Script directory
    os.path.dirname(script_dir),  # Parent of script directory
]

for path in possible_paths:
    if os.path.exists(os.path.join(path, 'gcp_support')):
        if path not in sys.path:
            sys.path.insert(0, path)
        break

# Verify scipy is installed (required for spatial distribution analysis)
try:
    import scipy
    print(f"✓ scipy version {scipy.__version__} is installed")
except ImportError:
    print("❌ scipy is not installed!")
    print("Please run the installation cell above (cell with !pip install)")
    print("Or run: !pip install scipy>=1.10.0")
    raise

try:
    from gcp_support import GCPFinder
    from gcp_support.mock_gcp import MockGCPGenerator
    from gcp_support.h3_utils import h3_cells_to_bbox
    from gcp_support.manifest_parser import parse_manifest, get_h3_cells_from_manifest
    from gcp_support.gcp_filter import calculate_spatial_distribution_score, GCPFilter
    print("✓ Imports successful!")
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("\nIf you see this error, make sure:")
    print("1. The gcp_support library is installed: pip install -e .")
    print("2. All required packages are installed (run the installation cell above)")
    print("3. Or adjust the sys.path in this cell to point to the library location")
    raise


❌ Import error: No module named 'scipy'

If you see this error, make sure:
1. The gcp_support library is installed: pip install -e .
2. Or adjust the sys.path in this cell to point to the library location


ModuleNotFoundError: No module named 'scipy'

## Step 1: Setup Output Directory

Create a directory to store all generated GCP files.


In [3]:
# Create output directory
output_dir = './gcps_output'
os.makedirs(output_dir, exist_ok=True)

print(f"✓ Output directory created: {output_dir}")


✓ Output directory created: ./gcps_output


## Step 2: Parse Manifest File

The manifest file contains information about drone imagery, including H3 cell identifiers. We'll extract the H3 cells to determine the area of interest.


In [4]:
# Path to the manifest file
# For this notebook, we'll create a sample manifest inline
# In production, you would load this from a file

manifest_content = '''[{"prefix":"s3://spexi-data-domain-assets-production-ca-central-1/standardized-images/89121ab6d23ffff/36247/"},"89121ab6d23ffff_36247_0018.jpg","89121ab6d23ffff_36247_0019.jpg","89121ab6d23ffff_36247_0020.jpg","89121ab6d23ffff_36247_0021.jpg","89121ab6d23ffff_36247_0022.jpg","89121ab6d23ffff_36247_0023.jpg","89121ab6d23ffff_36247_0024.jpg","89121ab6d23ffff_36247_0026.jpg","89121ab6d23ffff_36247_0025.jpg","89121ab6d23ffff_36247_0027.jpg","89121ab6d23ffff_36247_0028.jpg","89121ab6d23ffff_36247_0029.jpg","89121ab6d23ffff_36247_0030.jpg","89121ab6d23ffff_36247_0031.jpg","89121ab6d23ffff_36247_0032.jpg","89121ab6d23ffff_36247_0033.jpg","89121ab6d23ffff_36247_0034.jpg"]'''

# Save manifest to a temporary file
manifest_path = os.path.join(output_dir, 'input-file.manifest')
with open(manifest_path, 'w') as f:
    f.write(manifest_content)

# Parse the manifest
h3_cells, prefix = parse_manifest(manifest_path)

print(f"✓ Parsed manifest: {len(h3_cells)} H3 cell(s) found")
print(f"  H3 cells: {h3_cells}")
if prefix:
    print(f"  S3 prefix: {prefix}")


✓ Parsed manifest: 1 H3 cell(s) found
  H3 cells: ['89121ab6d23ffff']
  S3 prefix: s3://spexi-data-domain-assets-production-ca-central-1/standardized-images/89121ab6d23ffff/36247/


## Step 3: Convert H3 Cells to Bounding Box

Convert the H3 cell identifiers to a latitude/longitude bounding box that defines our area of interest.


In [5]:
# Get bounding box from H3 cells
bbox = h3_cells_to_bbox(h3_cells)

print(f"✓ Bounding box calculated:")
print(f"  Min Latitude: {bbox[0]:.6f}")
print(f"  Min Longitude: {bbox[1]:.6f}")
print(f"  Max Latitude: {bbox[2]:.6f}")
print(f"  Max Longitude: {bbox[3]:.6f}")
print(f"\n  Full bbox: {bbox}")


✓ Bounding box calculated:
  Min Latitude: 55.759442
  Min Longitude: -120.236556
  Max Latitude: 55.762814
  Max Longitude: -120.230586

  Full bbox: (55.759442008791275, -120.23655581102841, 55.76281378955122, -120.23058648289093)


## Step 7: Example 4 - Test Spatial Distribution Filtering

Demonstrate how to detect and filter out GCP sets with poor spatial distribution (clustered points).


In [None]:
print("Example 4: Test spatial distribution filtering")
print("-" * 70)

# Import if not already imported (in case cells are run out of order)
try:
    calculate_spatial_distribution_score
except NameError:
    from gcp_support.gcp_filter import calculate_spatial_distribution_score

# Generate clustered GCPs (simulate poor distribution)
# Create a small bbox in the center to simulate clustering
center_lat = (bbox[0] + bbox[2]) / 2
center_lon = (bbox[1] + bbox[3]) / 2
small_bbox = (
    center_lat - (bbox[2] - bbox[0]) * 0.1,
    center_lon - (bbox[3] - bbox[1]) * 0.1,
    center_lat + (bbox[2] - bbox[0]) * 0.1,
    center_lon + (bbox[3] - bbox[1]) * 0.1
)
clustered_gcps = MockGCPGenerator.generate_gcps_in_bbox(small_bbox, count=15)

# Check spatial distribution
spatial_metrics_clustered = calculate_spatial_distribution_score(clustered_gcps, bbox)
print(f"Clustered GCPs distribution:")
print(f"  Spread score: {spatial_metrics_clustered.get('spread_score', 0):.3f}")
print(f"  Confidence score: {spatial_metrics_clustered.get('confidence_score', 0):.3f}")
print(f"  Convex hull ratio: {spatial_metrics_clustered.get('convex_hull_ratio', 0):.3f}")

if spatial_metrics_clustered.get('confidence_score', 0) < 0.5:
    print("  ⚠️  Warning: Poor spatial distribution detected!")
    print("  This could affect bundle adjustment quality in MetaShape")

# Test with spatial distribution filter (would reject if below threshold)
finder_with_filter = GCPFinder(min_confidence_score=0.5)
print(f"\nTesting with min_confidence_score=0.5 filter...")
# Note: This will use mock data, so it may not actually filter
# In real usage, this would reject GCP sets with confidence < 0.5


## Step 4: Example 1 - Generate Mock GCPs from Bounding Box

Generate sample GCPs within the bounding box. These mock GCPs simulate what you would get from real GCP sources.


In [6]:
print("Example 1: Generate mock GCPs from bounding box (from manifest H3 cells)")
print("-" * 70)

# Generate mock GCPs
mock_gcps = MockGCPGenerator.generate_gcps_in_bbox(bbox, count=20)
print(f"✓ Generated {len(mock_gcps)} mock GCPs")

# Display first few GCPs
print("\nSample GCPs:")
for i, gcp in enumerate(mock_gcps[:3]):
    print(f"  GCP {i+1}: {gcp['id']} at ({gcp['lat']:.6f}, {gcp['lon']:.6f}), accuracy: {gcp['accuracy']:.2f}m")


Example 1: Generate mock GCPs from bounding box (from manifest H3 cells)
----------------------------------------------------------------------
✓ Generated 20 mock GCPs

Sample GCPs:
  GCP 1: USGS_GCP_0001 at (55.760561, -120.230893), accuracy: 1.60m
  GCP 2: USGS_GCP_0002 at (55.762807, -120.231950), accuracy: 0.99m
  GCP 3: USGS_GCP_0003 at (55.761157, -120.232609), accuracy: 0.39m


## Step 5: Export GCPs for MetaShape and ArcGIS Pro

Export the GCPs in formats compatible with both MetaShape and ArcGIS Pro.


In [7]:
# Initialize GCPFinder
finder = GCPFinder()

# Export all formats
finder.export_all(mock_gcps, output_dir, 'mock_bbox')

print(f"✓ Exported all formats to {output_dir}/mock_bbox_*")
print("\nGenerated files:")
files = [f for f in os.listdir(output_dir) if f.startswith('mock_bbox')]
for file in sorted(files):
    filepath = os.path.join(output_dir, file)
    size = os.path.getsize(filepath)
    print(f"  - {file} ({size} bytes)")


✓ Exported all formats to ./gcps_output/mock_bbox_*

Generated files:
  - mock_bbox_arcgis.cpg (5 bytes)
  - mock_bbox_arcgis.csv (2803 bytes)
  - mock_bbox_arcgis.dbf (4342 bytes)
  - mock_bbox_arcgis.geojson (5750 bytes)
  - mock_bbox_arcgis.prj (145 bytes)
  - mock_bbox_arcgis.shp (660 bytes)
  - mock_bbox_arcgis.shx (260 bytes)
  - mock_bbox_metashape.txt (1880 bytes)
  - mock_bbox_metashape.xml (5292 bytes)


  gdf.to_file(output_path, driver='ESRI Shapefile')
  ogr_write(


## Step 6: Example 2 - Find GCPs Using GCPFinder

Use the GCPFinder class to find GCPs using the H3 cells. This demonstrates the full workflow including USGS/NOAA search and filtering.


In [None]:
print("Example 2: Find GCPs using GCPFinder with H3 cells from manifest")
print("-" * 70)
print(f"Using H3 cells from manifest: {h3_cells}")

# Find GCPs using the H3 cells
# Note: This uses mock data since we don't have API credentials configured
gcps_from_finder = finder.find_gcps(h3_cells=h3_cells, max_results=20)
print(f"✓ Found {len(gcps_from_finder)} GCPs using GCPFinder")

# Display spatial distribution metrics if available
if finder.last_spatial_metrics and len(gcps_from_finder) >= 2:
    metrics = finder.last_spatial_metrics
    print(f"\n  Spatial distribution metrics:")
    print(f"    Spread score: {metrics.get('spread_score', 0):.3f} (0-1, higher is better)")
    print(f"    Confidence score: {metrics.get('confidence_score', 0):.3f} (0-1, higher is better)")
    print(f"    Convex hull ratio: {metrics.get('convex_hull_ratio', 0):.3f}")
    print(f"    Grid coverage: {metrics.get('grid_coverage', 0):.3f}")

# Export
finder.export_all(gcps_from_finder, output_dir, 'mock_h3')
print(f"✓ Exported all formats to {output_dir}/mock_h3_*")


Example 2: Find GCPs using GCPFinder with H3 cells from manifest
----------------------------------------------------------------------
Using H3 cells from manifest: ['89121ab6d23ffff']
Searching USGS for GCPs...
  Searching 6 WRS-2 Path/Row combinations...
Note: USGS GCP API integration requires specific endpoint configuration.
Please configure the actual USGS API endpoint in usgs_gcp.py
For testing, you can use MockGCPGenerator from mock_gcp.py
  Found 20 GCPs from USGS
  USGS results (20) meet threshold (10), skipping NOAA search
  Final filtered GCPs: 0
✓ Found 0 GCPs using GCPFinder
✓ Exported all formats to ./gcps_output/mock_h3_*


## Step 7: Example 3 - Test GCP Filtering

Demonstrate how to filter GCPs based on quality criteria such as accuracy.


In [None]:
from gcp_support.gcp_filter import GCPFilter, calculate_spatial_distribution_score

print("Example 3: Test GCP filtering and spatial distribution")
print("-" * 70)

# Generate GCPs with varying accuracy
all_gcps = MockGCPGenerator.generate_gcps_in_bbox(
    bbox, 
    count=30, 
    accuracy_range=(0.1, 3.0)  # Accuracy from 0.1m to 3.0m
)
print(f"✓ Generated {len(all_gcps)} GCPs with varying accuracy")

# Filter for high accuracy (<= 1.0m)
filter_obj = GCPFilter(min_accuracy=1.0)
filtered_gcps = filter_obj.filter_gcps(all_gcps, bbox=bbox)
print(f"✓ Filtered to {len(filtered_gcps)} GCPs with accuracy <= 1.0m")

# Display spatial distribution metrics
if len(filtered_gcps) >= 2:
    spatial_metrics = calculate_spatial_distribution_score(filtered_gcps, bbox)
    print(f"\n  Spatial distribution metrics:")
    print(f"    Spread score: {spatial_metrics.get('spread_score', 0):.3f} (0-1, higher is better)")
    print(f"    Confidence score: {spatial_metrics.get('confidence_score', 0):.3f} (0-1, higher is better)")
    print(f"    Convex hull ratio: {spatial_metrics.get('convex_hull_ratio', 0):.3f}")
    print(f"    Grid coverage: {spatial_metrics.get('grid_coverage', 0):.3f}")

# Export filtered results
finder.export_all(filtered_gcps, output_dir, 'mock_filtered')
print(f"✓ Exported filtered GCPs to {output_dir}/mock_filtered_*")


Example 3: Test GCP filtering
----------------------------------------------------------------------
✓ Generated 30 GCPs with varying accuracy
✓ Filtered to 8 GCPs with accuracy <= 1.0m
✓ Exported filtered GCPs to ./gcps_output/mock_filtered_*


  gdf.to_file(output_path, driver='ESRI Shapefile')
  ogr_write(


## Step 8: Example 4 - Test Spatial Distribution Filtering

Demonstrate how to detect and filter out GCP sets with poor spatial distribution (clustered points).


In [None]:
print("Example 4: Test spatial distribution filtering")
print("-" * 70)

# Generate clustered GCPs (simulate poor distribution)
# Create a small bbox in the center to simulate clustering
center_lat = (bbox[0] + bbox[2]) / 2
center_lon = (bbox[1] + bbox[3]) / 2
small_bbox = (
    center_lat - (bbox[2] - bbox[0]) * 0.1,
    center_lon - (bbox[3] - bbox[1]) * 0.1,
    center_lat + (bbox[2] - bbox[0]) * 0.1,
    center_lon + (bbox[3] - bbox[1]) * 0.1
)
clustered_gcps = MockGCPGenerator.generate_gcps_in_bbox(small_bbox, count=15)

# Check spatial distribution
spatial_metrics_clustered = calculate_spatial_distribution_score(clustered_gcps, bbox)
print(f"Clustered GCPs distribution:")
print(f"  Spread score: {spatial_metrics_clustered.get('spread_score', 0):.3f}")
print(f"  Confidence score: {spatial_metrics_clustered.get('confidence_score', 0):.3f}")
print(f"  Convex hull ratio: {spatial_metrics_clustered.get('convex_hull_ratio', 0):.3f}")

if spatial_metrics_clustered.get('confidence_score', 0) < 0.5:
    print("  ⚠️  Warning: Poor spatial distribution detected!")
    print("  This could affect bundle adjustment quality in MetaShape")

# Test with spatial distribution filter (would reject if below threshold)
finder_with_filter = GCPFinder(min_confidence_score=0.5)
print(f"\nTesting with min_confidence_score=0.5 filter...")
# Note: This will use mock data, so it may not actually filter
# In real usage, this would reject GCP sets with confidence < 0.5


## Step 9: Summary - View All Generated Files

Display all the files that were created during this tutorial.


In [None]:
print("Example 5: Generated files")
print("-" * 70)
files = [f for f in os.listdir(output_dir) if f.startswith('mock_')]
print(f"Total files generated: {len(files)}\n")

for file in sorted(files):
    filepath = os.path.join(output_dir, file)
    size = os.path.getsize(filepath)
    print(f"  {file} ({size} bytes)")

print("\n" + "=" * 70)
print("✓ All examples completed successfully!")
print(f"✓ Check the '{output_dir}' directory for exported files")
print("=" * 70)


Example 4: Generated files
----------------------------------------------------------------------
Total files generated: 27

  mock_bbox_arcgis.cpg (5 bytes)
  mock_bbox_arcgis.csv (2803 bytes)
  mock_bbox_arcgis.dbf (4342 bytes)
  mock_bbox_arcgis.geojson (5750 bytes)
  mock_bbox_arcgis.prj (145 bytes)
  mock_bbox_arcgis.shp (660 bytes)
  mock_bbox_arcgis.shx (260 bytes)
  mock_bbox_metashape.txt (1880 bytes)
  mock_bbox_metashape.xml (5292 bytes)
  mock_filtered_arcgis.cpg (5 bytes)
  mock_filtered_arcgis.csv (1124 bytes)
  mock_filtered_arcgis.dbf (1834 bytes)
  mock_filtered_arcgis.geojson (2387 bytes)
  mock_filtered_arcgis.prj (145 bytes)
  mock_filtered_arcgis.shp (324 bytes)
  mock_filtered_arcgis.shx (164 bytes)
  mock_filtered_metashape.txt (768 bytes)
  mock_filtered_metashape.xml (2204 bytes)
  mock_h3_arcgis.cpg (5 bytes)
  mock_h3_arcgis.csv (31 bytes)
  mock_h3_arcgis.dbf (34 bytes)
  mock_h3_arcgis.geojson (162 bytes)
  mock_h3_arcgis.prj (145 bytes)
  mock_h3_arcgis.sh

## Next Steps

- **For MetaShape**: Import the `.txt` or `.xml` files into Agisoft MetaShape
- **For ArcGIS Pro**: Import the `.csv`, `.geojson`, or `.shp` files into ArcGIS Pro
- **Configure API access**: See `USGS_API_NOTES.md` for instructions on configuring real USGS/NOAA API access
- **Customize filtering**: Adjust the `min_accuracy` and other filter parameters based on your needs
- **Monitor spatial distribution**: Check the confidence and spread scores to ensure good GCP coverage

## File Formats

- **MetaShape**: `.txt` (CSV format) and `.xml` (marker file format)
- **ArcGIS Pro**: `.csv` (simple CSV), `.geojson` (GeoJSON format), `.shp` (shapefile with supporting files)

## Spatial Distribution Metrics

The library automatically calculates spatial distribution metrics to help ensure GCPs are well-distributed:
- **Spread Score** (0-1): Overall measure of spatial distribution
- **Confidence Score** (0-1): Overall confidence in GCP set quality
- **Convex Hull Ratio**: How much of the area is covered
- **Grid Coverage**: Distribution across a 3x3 grid

Low scores (< 0.5) indicate clustered GCPs that may not provide good geometric control for bundle adjustment.
