# Westminster Ground Truth Analysis with MetaShape (Spexi Data)

This notebook processes drone imagery from Spexi data to create orthomosaics using **Agisoft MetaShape**:

1. **KMZ Processing**: Load KMZ file and extract H3 cells
2. **Image Download**: Download images from 6 manifest files (with caching to avoid re-downloading)
3. **GCP Loading**: Load GCPs from CSV (UTM) and convert to WGS84 for MetaShape
4. **Orthomosaic Creation**: Generate orthomosaics with and without GCPs using MetaShape
5. **Intermediate File Saving**: All MetaShape intermediate files are saved to avoid recomputation

## Data Sources:
- **KMZ File**: NewWest_AOI.kmz (contains H3 cell information)
- **Manifest Files**: 6 manifest files in Cells/ortho/ directory
- **GCPs**: 25-3288-CONTROL-NAD83-UTM10N-EGM2008.csv (UTM Zone 10N)

We create two orthomosaics:
- Orthomosaic **without** GCPs (using only image matching)
- Orthomosaic **with** GCPs (using image matching + ground control points)

**Note**: GeoTIFF orthomosaics are exported with LZW lossless compression, which reduces file size without changing any pixel values.

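## Setup: Install Dependencies

The installation cell appears at the end of this notebook. Run it first: it installs the required packages and defines `METASHAPE_AVAILABLE`, which Steps 5 and 6 check before processing.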

## Imports


In [1]:
import sys
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
import warnings
import logging
import csv
import utm
warnings.filterwarnings('ignore')

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)

# Add qualicum package to path (if available)
package_dir = Path.cwd()
sys.path.insert(0, str(package_dir))

# Try to import from qualicum_beach_gcp_analysis package
try:
    from qualicum_beach_gcp_analysis import (
        load_gcps_from_kmz,
        calculate_gcp_bbox,
        bbox_to_h3_cells,
        download_all_images_from_input_dir,
        export_to_metashape_csv,
        export_to_metashape_xml,
        process_orthomosaic,
        PhotoMatchQuality,
        DepthMapQuality,
    )
    USE_QUALICUM_PACKAGE = True
    print("✓ Using qualicum_beach_gcp_analysis package")
except ImportError:
    USE_QUALICUM_PACKAGE = False
    print("⚠️  qualicum_beach_gcp_analysis package not available")
    print("   Some functionality may be limited")

print("✓ Imports successful!")


✓ Using qualicum_beach_gcp_analysis package
✓ Imports successful!


## Step 1: Load KMZ File and Extract H3 Cells


In [2]:
# Path to the KMZ file
kmz_path = Path("/Users/mauriciohessflores/Documents/Code/Data/New Westminster Oct _25/NewWest_AOI.kmz")

if not kmz_path.exists():
    raise FileNotFoundError(f"KMZ file not found: {kmz_path}")

print(f"Loading KMZ file: {kmz_path}")

if USE_QUALICUM_PACKAGE:
    # Load GCPs from KMZ (these contain H3 cell IDs)
    gcps_from_kmz = load_gcps_from_kmz(str(kmz_path))
    print(f"\n✓ Loaded {len(gcps_from_kmz)} placemarks from KMZ")
    
    # Use each placemark ID as an H3 cell ID. Check the printed IDs: a value such as
    # "GCP_0000" is a generated placeholder, not a valid H3 index.
    h3_cells = [gcp.get('id', '') for gcp in gcps_from_kmz if gcp.get('id')]
    print(f"\n✓ Extracted {len(h3_cells)} H3 cells from KMZ")
    
    # Also calculate H3 cells from bounding box for verification
    bbox = calculate_gcp_bbox(gcps_from_kmz, padding=0.01)
    h3_cells_from_bbox = bbox_to_h3_cells(bbox, resolution=12)
    print(f"\n✓ Calculated {len(h3_cells_from_bbox)} H3 cells from bounding box (resolution 12)")
    
    # Display first few H3 cells
    if h3_cells:
        print("\nFirst few H3 cells from KMZ:")
        for i, cell_id in enumerate(h3_cells[:10]):
            print(f"  {i+1}. {cell_id}")
else:
    print("⚠️  Cannot load KMZ without qualicum_beach_gcp_analysis package")
    h3_cells = []


Loading KMZ file: /Users/mauriciohessflores/Documents/Code/Data/New Westminster Oct _25/NewWest_AOI.kmz
Loading GCPs from: /Users/mauriciohessflores/Documents/Code/Data/New Westminster Oct _25/NewWest_AOI.kmz
Found 1 KML file(s) in KMZ
Found 1 placemarks in KMZ file (namespace: http://www.opengis.net/kml/2.2)
Successfully parsed 1 GCPs from KMZ file

✓ Loaded 1 placemarks from KMZ

✓ Extracted 1 H3 cells from KMZ

✓ Calculated 36 H3 cells from bounding box (resolution 12)

First few H3 cells from KMZ:
  1. GCP_0000
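
For readers without the `qualicum_beach_gcp_analysis` package, the KMZ step can be approximated with the standard library alone. The sketch below is not the package's implementation: `read_placemarks` and `_demo_kmz` are hypothetical names, and the demo coordinates are made up. It relies only on the facts that a KMZ is a ZIP archive containing KML and that the KML namespace is the one reported in the log above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace taken from the log output of the cell above.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def read_placemarks(kmz_bytes):
    """Return one {'id', 'lon', 'lat'} dict per placemark in a KMZ archive.

    For polygons and lines, only the first coordinate tuple is kept.
    """
    placemarks = []
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".kml"):
                continue
            root = ET.fromstring(archive.read(member))
            for pm in root.iterfind(".//kml:Placemark", KML_NS):
                name = pm.findtext("kml:name", default="", namespaces=KML_NS).strip()
                coords = pm.findtext(".//kml:coordinates", default="", namespaces=KML_NS).split()
                if not coords:
                    continue  # placemark without geometry
                lon, lat = (float(v) for v in coords[0].split(",")[:2])
                placemarks.append({"id": name, "lon": lon, "lat": lat})
    return placemarks


def _demo_kmz():
    """Build a tiny in-memory KMZ with one made-up placemark (illustration only)."""
    kml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        "<Placemark><name>AOI</name><Point>"
        "<coordinates>-122.91,49.21,0</coordinates>"
        "</Point></Placemark></Document></kml>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("doc.kml", kml)
    return buffer.getvalue()


demo_placemarks = read_placemarks(_demo_kmz())
print(demo_placemarks)
```

Printing the parsed IDs this way makes it easy to see whether the placemark names are real H3 indexes or generated labels before they are used as cell IDs.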


## Step 2: Download Images from Manifest Files


In [3]:
# Setup paths
data_dir = Path("/Users/mauriciohessflores/Documents/Code/Data/New Westminster Oct _25")
manifest_dir = data_dir / "Cells" / "ortho"
photos_dir = Path("input/images")

if not manifest_dir.exists():
    raise FileNotFoundError(f"Manifest directory not found: {manifest_dir}")

print(f"Manifest directory: {manifest_dir}")
print(f"Photos will be downloaded to: {photos_dir.absolute()}")

if USE_QUALICUM_PACKAGE:
    # Download all images from manifest files
    # skip_existing=True ensures files are not re-downloaded if they already exist
    print("\nDownloading images from S3...")
    print("=" * 60)
    download_stats = download_all_images_from_input_dir(
        input_dir=manifest_dir,
        photos_dir=photos_dir,
        skip_existing=True  # Don't re-download if images already exist
    )
    print("=" * 60)
    print("✓ Image download complete")
    
    # Count total images downloaded
    total_images = sum(s.get('total', 0) for s in download_stats.values())
    total_downloaded = sum(s.get('downloaded', 0) for s in download_stats.values())
    total_skipped = sum(s.get('skipped', 0) for s in download_stats.values())
    
    print(f"\nSummary:")
    print(f"  Total images: {total_images}")
    print(f"  Downloaded: {total_downloaded}")
    print(f"  Skipped (already exist): {total_skipped}")
else:
    print("⚠️  Cannot download images without qualicum_beach_gcp_analysis package")
    print("   Please install the package or manually download images to:")
    print(f"   {photos_dir.absolute()}")


2025-12-03 13:55:08,258 - qualicum_beach_gcp_analysis.s3_downloader - INFO - Found 6 manifest files
2025-12-03 13:55:08,395 - qualicum_beach_gcp_analysis.s3_downloader - INFO - Processing manifest: input-file_8928de89117ffff.txt
2025-12-03 13:55:08,396 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   Bucket: spexi-data-domain-assets-production-ca-central-1
2025-12-03 13:55:08,396 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   S3 prefix: standardized-images/8928de89117ffff/134449/
2025-12-03 13:55:08,397 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   Total images: 155
2025-12-03 13:55:08,431 - botocore.tokens - INFO - Loading cached SSO token for spexi


Manifest directory: /Users/mauriciohessflores/Documents/Code/Data/New Westminster Oct _25/Cells/ortho
Photos will be downloaded to: /Users/mauriciohessflores/Documents/Code/MyCode/research-westminster_ground_truth_analysis/input/images

Downloading images from S3...


2025-12-03 13:55:13,873 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   Downloaded 10/155 images...
2025-12-03 13:55:17,792 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   Downloaded 20/155 images...
2025-12-03 13:55:21,788 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   Downloaded 30/155 images...
2025-12-03 13:55:26,469 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   Downloaded 40/155 images...
2025-12-03 13:55:30,522 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   Downloaded 50/155 images...
2025-12-03 13:55:35,415 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   Downloaded 60/155 images...
2025-12-03 13:55:39,575 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   Downloaded 70/155 images...
2025-12-03 13:55:44,126 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   Downloaded 80/155 images...
2025-12-03 13:55:48,288 - qualicum_beach_gcp_analysis.s3_downloader - INFO -   Downloaded 90/155 images...
2025-12-03 13:55:52,431 - qualicum_be

✓ Image download complete

Summary:
  Total images: 952
  Downloaded: 949
  Skipped (already exist): 3
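
The `skip_existing=True` behaviour and the total/downloaded/skipped summary can be illustrated with a small stand-alone sketch. This is an assumption about how such caching might work, not the code inside `download_all_images_from_input_dir`: `download_with_cache` and the `fetch` callable are hypothetical, and the demo writes dummy bytes instead of contacting S3.

```python
import tempfile
from pathlib import Path


def download_with_cache(keys, dest_dir, fetch, skip_existing=True):
    """Fetch each key into dest_dir, skipping files that are already on disk.

    `fetch(key)` is a hypothetical callable that returns the file's bytes.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stats = {"total": len(keys), "downloaded": 0, "skipped": 0}
    for key in keys:
        target = dest_dir / Path(key).name
        if skip_existing and target.exists() and target.stat().st_size > 0:
            stats["skipped"] += 1
            continue
        # Write under a temporary name first, so an interrupted download
        # never leaves a partial file that a later run would skip.
        partial = target.with_name(target.name + ".part")
        partial.write_bytes(fetch(key))
        partial.replace(target)
        stats["downloaded"] += 1
    return stats


with tempfile.TemporaryDirectory() as tmp:
    demo_keys = ["prefix/img_001.jpg", "prefix/img_002.jpg"]  # made-up keys
    first_run = download_with_cache(demo_keys, tmp, lambda key: b"dummy")
    second_run = download_with_cache(demo_keys, tmp, lambda key: b"dummy")

print(first_run)
print(second_run)
```

The second call finds both files already present and reports them as skipped, which mirrors the summary printed above.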


## Step 3: Load and Convert GCPs to WGS84


In [None]:
# Re-import Path so this cell can run on its own
from pathlib import Path

# Define data_dir if not already defined (from Step 2)
try:
    _ = data_dir
except NameError:
    data_dir = Path("/Users/mauriciohessflores/Documents/Code/Data/New Westminster Oct _25")
    print(f"ℹ️  data_dir not defined, using default: {data_dir}")

# Path to GCP CSV file (UTM coordinates)
gcp_file = data_dir / "25-3288-CONTROL-NAD83-UTM10N-EGM2008.csv"

if not gcp_file.exists():
    raise FileNotFoundError(f"GCP file not found: {gcp_file}")

print(f"Loading GCPs from: {gcp_file}")

# Parse GCP file (UTM coordinates)
gcps_utm = []
with open(gcp_file, 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        try:
            name = row.get('Name', row.get('name', ''))
            x = float(row.get('X', row.get('x', 0)))
            y = float(row.get('Y', row.get('y', 0)))
            z = float(row.get('Z', row.get('z', row.get('Elevation', row.get('elevation', 0)))))
            
            gcps_utm.append({
                'name': name,
                'x': x,  # Raw X column; easting vs. northing is determined below
                'y': y,  # Raw Y column; easting vs. northing is determined below
                'z': z
            })
        except (ValueError, KeyError) as e:
            print(f"⚠️  Skipping invalid row: {e}")
            continue

print(f"\n✓ Loaded {len(gcps_utm)} GCPs from CSV (UTM Zone 10N)")

# Convert UTM to WGS84 lat/lon for MetaShape
# NOTE: UTM coordinates need to be in valid ranges:
# - Easting: 100,000 to 999,999 m
# - Northing: 0 to 10,000,000 m
# We need to determine which column is easting and which is northing
gcps_wgs84 = []

# First, check the values to determine correct ordering
print("\nChecking coordinate ranges...")
first_gcp = gcps_utm[0] if gcps_utm else None
if first_gcp:
    print(f"  First GCP values: X={first_gcp['x']:.2f}, Y={first_gcp['y']:.2f}")
    print(f"  X range: {min(g['x'] for g in gcps_utm):.2f} to {max(g['x'] for g in gcps_utm):.2f}")
    print(f"  Y range: {min(g['y'] for g in gcps_utm):.2f} to {max(g['y'] for g in gcps_utm):.2f}")

# Determine which is easting and which is northing
# Easting should be 100,000-999,999, Northing should be 0-10,000,000
x_min, x_max = min(g['x'] for g in gcps_utm), max(g['x'] for g in gcps_utm)
y_min, y_max = min(g['y'] for g in gcps_utm), max(g['y'] for g in gcps_utm)

# Check if X is in easting range (100k-999k) or northing range (0-10M)
x_is_easting = 100000 <= x_min <= 999999 and 100000 <= x_max <= 999999
x_is_northing = 0 <= x_min <= 10000000 and 0 <= x_max <= 10000000
y_is_easting = 100000 <= y_min <= 999999 and 100000 <= y_max <= 999999
y_is_northing = 0 <= y_min <= 10000000 and 0 <= y_max <= 10000000

print(f"\nCoordinate analysis:")
print(f"  X is easting: {x_is_easting}, X is northing: {x_is_northing}")
print(f"  Y is easting: {y_is_easting}, Y is northing: {y_is_northing}")

# Determine correct mapping
if x_is_easting and y_is_northing:
    # X is easting, Y is northing (standard UTM)
    easting_col = 'x'
    northing_col = 'y'
    print("\n✓ Using X as easting, Y as northing (standard UTM)")
elif y_is_easting and x_is_northing:
    # Y is easting, X is northing (swapped)
    easting_col = 'y'
    northing_col = 'x'
    print("\n✓ Using Y as easting, X as northing (swapped)")
else:
    # The full-set ranges were inconclusive; fall back to checking the first GCP
    print("\n⚠️  Cannot determine automatically, trying both orderings...")
    first_x, first_y = gcps_utm[0]['x'], gcps_utm[0]['y']
    if 100000 <= first_x <= 999999 and 0 <= first_y <= 10000000:
        easting_col, northing_col = 'x', 'y'
        print("  Trying X as easting, Y as northing...")
    elif 100000 <= first_y <= 999999 and 0 <= first_x <= 10000000:
        easting_col, northing_col = 'y', 'x'
        print("  Trying Y as easting, X as northing...")
    else:
        raise ValueError(f"Cannot determine easting/northing. X range: {x_min:.2f}-{x_max:.2f}, Y range: {y_min:.2f}-{y_max:.2f}")

# Convert UTM to WGS84 lat/lon
for gcp in gcps_utm:
    try:
        easting = gcp[easting_col]
        northing = gcp[northing_col]
        
        # Validate ranges before conversion
        if not (100000 <= easting <= 999999):
            raise ValueError(f"Easting {easting:.2f} out of range (100,000-999,999)")
        if not (0 <= northing <= 10000000):
            raise ValueError(f"Northing {northing:.2f} out of range (0-10,000,000)")
        
        # Convert UTM to lat/lon (UTM Zone 10N)
        lat, lon = utm.to_latlon(easting, northing, 10, 'N')
        
        gcp_dict = {
            'id': gcp['name'],
            'label': gcp['name'],
            'lat': lat,
            'lon': lon,
            'z': gcp['z'],
            'accuracy': 0.01  # Very high accuracy (1cm) for high weight in bundle adjustment
        }
        gcps_wgs84.append(gcp_dict)
    except Exception as e:
        print(f"⚠️  Error converting GCP {gcp['name']}: {e}")
        print(f"   X={gcp['x']:.2f}, Y={gcp['y']:.2f}, Easting={gcp[easting_col]:.2f}, Northing={gcp[northing_col]:.2f}")
        raise

print(f"\n✓ Converted {len(gcps_wgs84)} GCPs to WGS84 lat/lon")
print("\nFirst few GCPs (WGS84):")
for gcp in gcps_wgs84[:5]:
    print(f"  {gcp['id']}: ({gcp['lat']:.6f}, {gcp['lon']:.6f}, z={gcp['z']:.2f})")


Loading GCPs from: /Users/mauriciohessflores/Documents/Code/Data/New Westminster Oct _25/25-3288-CONTROL-NAD83-UTM10N-EGM2008.csv

✓ Loaded 22 GCPs from CSV (UTM Zone 10N)


OutOfRangeError: easting out of range (must be between 100,000 m and 999,999 m)
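
One possible cause of the `OutOfRangeError` above, not confirmed by the source because the CSV header is not shown, is the `row.get(..., 0)` fallback in the parsing loop: if the file names its columns anything other than `X`/`Y`, every coordinate silently becomes 0 and fails the easting check. The sketch below fails loudly instead. `pick_column` and `load_utm_points` are hypothetical helpers, the candidate header names are guesses, and the sample row is made up.

```python
import csv
import io


def pick_column(fieldnames, candidates):
    """Return the first header matching one of `candidates`, ignoring case."""
    lookup = {name.strip().lower(): name for name in fieldnames}
    for candidate in candidates:
        if candidate.lower() in lookup:
            return lookup[candidate.lower()]
    raise KeyError(f"None of {list(candidates)} found in header {list(fieldnames)}")


def load_utm_points(text):
    """Parse (easting, northing) pairs, raising on missing or implausible values."""
    reader = csv.DictReader(io.StringIO(text))
    # Candidate header names are assumptions; adjust them to the real file.
    e_col = pick_column(reader.fieldnames, ("Easting", "E", "X"))
    n_col = pick_column(reader.fieldnames, ("Northing", "N", "Y"))
    points = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        easting, northing = float(row[e_col]), float(row[n_col])
        if not 100_000 <= easting <= 999_999:
            raise ValueError(f"line {line_no}: easting {easting} outside 100,000-999,999 m")
        if not 0 <= northing <= 10_000_000:
            raise ValueError(f"line {line_no}: northing {northing} outside 0-10,000,000 m")
        points.append((easting, northing))
    return points


# Made-up sample row, for illustration only.
sample = "Name,Easting,Northing,Elevation\nGCP1,507000.0,5450000.0,12.3\n"
demo_points = load_utm_points(sample)
print(demo_points)
```

With this approach an unexpected header raises a `KeyError` that names the columns actually present, which is usually quicker to diagnose than an out-of-range error from the `utm` library.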

## Step 4: Export GCPs for MetaShape


In [None]:
# Create output directory
output_dir = Path("outputs")
output_dir.mkdir(exist_ok=True)
gcp_output_dir = output_dir / "gcps"
gcp_output_dir.mkdir(exist_ok=True)

if USE_QUALICUM_PACKAGE:
    # Export GCPs to MetaShape XML format (preferred by MetaShape)
    gcp_xml_path = gcp_output_dir / "gcps_metashape.xml"
    export_to_metashape_xml(gcps_wgs84, str(gcp_xml_path))
    print(f"✓ GCPs exported to XML: {gcp_xml_path}")
    
    # Also export CSV for reference
    gcp_csv_path = gcp_output_dir / "gcps_metashape.csv"
    export_to_metashape_csv(gcps_wgs84, str(gcp_csv_path))
    print(f"✓ GCPs also exported to CSV: {gcp_csv_path}")
    
    # Use CSV file for processing (more reliable than XML)
    gcp_file_for_processing = gcp_csv_path
else:
    print("⚠️  Cannot export GCPs without qualicum_beach_gcp_analysis package")
    print("   GCPs are already in WGS84 format and can be used directly")
    gcp_file_for_processing = None
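
If the package is unavailable, the WGS84 GCP dictionaries built in Step 3 can still be written to a plain CSV by hand. The sketch below is an assumption, not the output format of `export_to_metashape_csv`: `write_reference_csv` is a hypothetical helper and the column order (label, lon, lat, z, accuracy) is illustrative, so it must be matched to whatever column mapping is chosen when importing the reference file. The demo record is made up.

```python
import csv
import io


def write_reference_csv(gcps, handle):
    """Write GCP dicts with 'label', 'lon', 'lat', 'z', 'accuracy' keys as CSV rows."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["label", "lon", "lat", "z", "accuracy"])
    for gcp in gcps:
        writer.writerow([
            gcp["label"],
            f"{gcp['lon']:.8f}",  # 8 decimal places of a degree is about 1 mm
            f"{gcp['lat']:.8f}",
            f"{gcp['z']:.3f}",
            gcp["accuracy"],
        ])


# Made-up GCP for illustration only.
demo_gcps = [{"label": "P1", "lon": -122.91, "lat": 49.21, "z": 12.3, "accuracy": 0.01}]
buffer = io.StringIO()
write_reference_csv(demo_gcps, buffer)
demo_csv = buffer.getvalue()
print(demo_csv)
```

Longitude is written before latitude here because it plays the role of the X coordinate; swapping the two is a common source of misplaced markers.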


## Step 5: Process Orthomosaic WITHOUT GCPs


In [None]:
# Check if METASHAPE_AVAILABLE is defined
try:
    _ = METASHAPE_AVAILABLE
except NameError:
    METASHAPE_AVAILABLE = False
    print("⚠️  METASHAPE_AVAILABLE not defined. Assuming MetaShape is not available.")

if not METASHAPE_AVAILABLE:
    print("⚠️  MetaShape not available. Skipping processing.")
else:
    # Setup paths for processing
    intermediate_dir = output_dir / "intermediate"
    ortho_output_dir = output_dir / "orthomosaics"
    
    # Process orthomosaic WITHOUT GCPs
    # Note: clean_intermediate_files=False will reuse existing processing steps
    # Set to True to start fresh and delete previous work
    print("=" * 60)
    print("Processing orthomosaic WITHOUT GCPs...")
    print("=" * 60)
    
    project_path_no_gcps = intermediate_dir / "orthomosaic_no_gcps.psx"
    
    if USE_QUALICUM_PACKAGE:
        stats_no_gcps = process_orthomosaic(
            photos_dir=photos_dir,
            output_path=ortho_output_dir,
            project_path=project_path_no_gcps,
            product_id="orthomosaic_no_gcps",
            clean_intermediate_files=False,  # Reuse existing processing if available
            photo_match_quality=PhotoMatchQuality.MediumQuality,
            depth_map_quality=DepthMapQuality.MediumQuality,
            tiepoint_limit=10000,
            use_gcps=False
        )
    else:
        print("⚠️  Cannot process orthomosaic without qualicum_beach_gcp_analysis package")
        stats_no_gcps = None
    
    if stats_no_gcps:
        print("\n✓ Orthomosaic processing (without GCPs) complete!")
        print(f"  Number of photos: {stats_no_gcps['num_photos']}")
        print(f"\n📁 Output Files:")
        ortho_path_no_gcps = Path(stats_no_gcps['ortho_path'])
        if ortho_path_no_gcps.exists():
            file_size_mb = ortho_path_no_gcps.stat().st_size / (1024 * 1024)
            print(f"  ✓ Orthomosaic GeoTIFF: {ortho_path_no_gcps.absolute()}")
            print(f"    Size: {file_size_mb:.2f} MB (LZW compressed, lossless)")
        if 'log_file_path' in stats_no_gcps:
            print(f"  📝 Log file: {stats_no_gcps['log_file_path']}")


## Step 6: Process Orthomosaic WITH GCPs


In [None]:
# Check if METASHAPE_AVAILABLE is defined
try:
    _ = METASHAPE_AVAILABLE
except NameError:
    METASHAPE_AVAILABLE = False
    print("⚠️  METASHAPE_AVAILABLE not defined. Assuming MetaShape is not available.")

if not METASHAPE_AVAILABLE:
    print("⚠️  MetaShape not available. Skipping processing.")
else:
    # Setup paths for processing
    intermediate_dir = output_dir / "intermediate"
    ortho_output_dir = output_dir / "orthomosaics"
    
    # Process orthomosaic WITH GCPs
    # Note: clean_intermediate_files=False will reuse existing processing steps
    # Set to True to start fresh and delete previous work
    print("=" * 60)
    print("Processing orthomosaic WITH GCPs...")
    print("=" * 60)
    
    project_path_with_gcps = intermediate_dir / "orthomosaic_with_gcps.psx"
    
    if USE_QUALICUM_PACKAGE and gcp_file_for_processing:
        stats_with_gcps = process_orthomosaic(
            photos_dir=photos_dir,
            output_path=ortho_output_dir,
            project_path=project_path_with_gcps,
            gcp_file=gcp_file_for_processing,
            product_id="orthomosaic_with_gcps",
            clean_intermediate_files=False,  # Reuse existing processing if available
            photo_match_quality=PhotoMatchQuality.MediumQuality,
            depth_map_quality=DepthMapQuality.MediumQuality,
            tiepoint_limit=10000,
            use_gcps=True,
            gcp_accuracy=0.01  # High accuracy (1 cm) gives the GCPs a high weight in bundle adjustment
        )
    else:
        print("⚠️  Cannot process orthomosaic with GCPs: the qualicum_beach_gcp_analysis package or the exported GCP file is missing")
        stats_with_gcps = None
    
    if stats_with_gcps:
        print("\n✓ Orthomosaic processing (with GCPs) complete!")
        print(f"  Number of photos: {stats_with_gcps['num_photos']}")
        print(f"  Number of markers: {stats_with_gcps.get('num_markers', 0)}")
        print(f"\n📁 Output Files:")
        ortho_path_with_gcps = Path(stats_with_gcps['ortho_path'])
        if ortho_path_with_gcps.exists():
            file_size_mb = ortho_path_with_gcps.stat().st_size / (1024 * 1024)
            print(f"  ✓ Orthomosaic GeoTIFF: {ortho_path_with_gcps.absolute()}")
            print(f"    Size: {file_size_mb:.2f} MB (LZW compressed, lossless)")
        if 'log_file_path' in stats_with_gcps:
            print(f"  📝 Log file: {stats_with_gcps['log_file_path']}")
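
The `gcp_accuracy=0.01` setting matters because a smaller stated uncertainty means a larger weight in a weighted least-squares adjustment. The sketch below uses the standard inverse-variance convention; whether MetaShape weights observations in exactly this way is an assumption, and the 10 m camera-position accuracy is a hypothetical value chosen only for comparison.

```python
def observation_weight(sigma_m):
    """Inverse-variance weight for an observation with standard deviation sigma_m (metres)."""
    if sigma_m <= 0:
        raise ValueError("sigma must be positive")
    return 1.0 / sigma_m ** 2


gcp_weight = observation_weight(0.01)     # 1 cm GCP accuracy, as set in the cell above
camera_weight = observation_weight(10.0)  # hypothetical 10 m onboard-GPS accuracy
weight_ratio = gcp_weight / camera_weight

print(f"GCP weight is about {weight_ratio:.0f} times the camera-position weight")
```

Under these assumptions a 1 cm GCP outweighs a 10 m camera position by roughly a factor of one million, so the solution is effectively pinned to the control points.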


## Summary

This notebook has:
1. ✓ Loaded KMZ file and extracted H3 cells
2. ✓ Downloaded images from 6 manifest files (with caching to avoid re-downloading)
3. ✓ Loaded and converted GCPs from UTM to WGS84
4. ✓ Created orthomosaics with and without GCPs using MetaShape
5. ✓ Saved all intermediate files to avoid recomputation

**Note**: 
- All MetaShape processing steps check for existing intermediate results and skip recomputation if they already exist. This means you can safely re-run cells without losing progress.
- GeoTIFF orthomosaics are exported with **LZW lossless compression**, which typically reduces file size by 30-50% (the exact saving depends on image content) without changing any pixel values.
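
The skip-if-already-done behaviour described in the first note can be sketched with marker files. This is only an illustration of the idea, not how `process_orthomosaic` tracks its progress: `run_step` and the `.done` marker convention are hypothetical, and the `clean` flag loosely mirrors `clean_intermediate_files`.

```python
import tempfile
from pathlib import Path


def run_step(name, work_dir, action, clean=False):
    """Run `action()` unless a marker file shows the step already finished.

    Returns True if the step ran and False if it was skipped.
    """
    marker = Path(work_dir) / f"{name}.done"
    if clean and marker.exists():
        marker.unlink()  # forget previous work and start fresh
    if marker.exists():
        return False
    action()
    # The marker is written only after the action succeeds, so a crash
    # midway causes the step to be retried on the next run.
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True


calls = []
with tempfile.TemporaryDirectory() as tmp:
    ran_first = run_step("align_photos", tmp, lambda: calls.append("align"))
    ran_second = run_step("align_photos", tmp, lambda: calls.append("align"))
    ran_clean = run_step("align_photos", tmp, lambda: calls.append("align"), clean=True)

print(ran_first, ran_second, ran_clean)
```

The second call is skipped because the marker exists, while the third runs again because `clean=True` removes the marker first.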


In [None]:
# Install required packages
import subprocess
import sys
from pathlib import Path

# Try to install from requirements.txt first
requirements_file = Path("requirements.txt")
if requirements_file.exists():
    print("Installing packages from requirements.txt...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "-r", str(requirements_file)])
    print("✓ Packages installed from requirements.txt")
else:
    # Fallback: install packages individually
    print("requirements.txt not found. Installing packages individually...")
    packages = [
        "numpy>=1.24.0",
        "rasterio>=1.3.0",
        "pillow>=10.0.0",
        "matplotlib>=3.7.0",
        "pandas>=2.0.0",
        "pyproj>=3.6.0",
        "requests>=2.31.0",
        "utm>=0.7.0",
        "h3>=3.7.0",
        "boto3>=1.28.0",
    ]
    for package in packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])
    print("✓ All packages installed")

# Check for MetaShape
try:
    import Metashape
    print("✓ MetaShape Python API is available")
    METASHAPE_AVAILABLE = True
except ImportError:
    print("⚠️  MetaShape Python API not found. Please install Agisoft MetaShape and its Python API.")
    METASHAPE_AVAILABLE = False

print("\nSetup complete!")