# Satellite Imagery Extraction Demo

This notebook demonstrates how to download **Sentinel-2 RGB satellite imagery** from Google Earth Engine for user-specified polygons.

**Output Location**: All data is saved to `../../data/satellite/`

## Workflow
```
┌─────────────────────────────────────┐
│  Define AOI (Area of Interest)      │
│  - GeoJSON, Shapefile, or GeoPackage│
│  - Or use batch mode (AfricaPolis)  │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│  Configure Extraction               │
│  - Set date range                   │
│  - Set cloud threshold              │
│  - Set output resolution (10m)      │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│  Download from Earth Engine         │
│  - Query Sentinel-2 collection      │
│  - Apply cloud masking (SCL band)   │
│  - Create median composite          │
│  - Export as GeoTIFF                │
└─────────────────────────────────────┘
```

## Data Source
- **Collection**: Sentinel-2 MSI Level-2A (Surface Reflectance)
- **Resolution**: 10 meters (RGB bands)
- **Coverage**: Global, 5-day revisit time
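- **Archive**: 2015 - present for the mission overall; the Level-2A collection in Earth Engine begins in 2017, with global coverage from late 2018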
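
To make the cloud-masking and compositing steps of the workflow concrete, here is a toy, pure-Python sketch for a single pixel observed on several dates. It is not the code path used by `SatelliteImageryProcessor`; the function name is hypothetical, and the set of masked SCL classes is an assumption about what a typical mask excludes.

```python
from statistics import median

# Scene Classification Layer (SCL) classes treated as contaminated.
# Assumption: 3 = cloud shadow, 8/9 = medium/high cloud probability,
# 10 = thin cirrus. The processor's real mask may use a different set.
MASKED_SCL_CLASSES = {3, 8, 9, 10}


def median_composite(observations):
    """Return the median reflectance of the clear observations of one pixel.

    `observations` is a list of (reflectance, scl_class) pairs, one per date.
    Returns None when every observation is masked.
    """
    clear = [value for value, scl in observations if scl not in MASKED_SCL_CLASSES]
    return median(clear) if clear else None


# One pixel seen on five dates: two cloudy (bright) and three clear.
pixel_series = [(0.08, 4), (0.55, 9), (0.09, 5), (0.61, 8), (0.07, 4)]
composite_value = median_composite(pixel_series)
print(composite_value)
```

The two bright, cloud-flagged observations are dropped before the median is taken, so the composite reflects only the clear dates. A pixel with no clear observation in the date range has no value, which is why short date ranges can leave gaps in cloudy regions.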

---
## Setup

In [None]:
import sys
from pathlib import Path
import logging

# Visualization imports
import rasterio
from rasterio.plot import show
import matplotlib.pyplot as plt
import numpy as np

# Add src to path
project_root = Path.cwd().parent
src_path = project_root / "src"
sys.path.insert(0, str(src_path))

# GeoWorkflow imports
from geoworkflow.schemas import SatelliteImageryConfig
from geoworkflow.processors.extraction import SatelliteImageryProcessor

# Setup logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

# Define base output directory (in parent/data, not inside geoworkflow)
BASE_OUTPUT_DIR = project_root.parent / "data" / "satellite"
BASE_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

print("✓ Imports successful")
print(f"Project root: {project_root}")
print(f"Output directory: {BASE_OUTPUT_DIR}")

---
## Example 1: Single Polygon (Accra, Ghana)

Download satellite imagery for a custom polygon defined in a GeoJSON file.

### Step 1: Create AOI File

In [None]:
import json

# Create a polygon for central Accra (roughly 5.5 km x 4.5 km)
accra_geojson = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "Accra_Central"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [-0.22, 5.54],
                [-0.17, 5.54],
                [-0.17, 5.58],
                [-0.22, 5.58],
                [-0.22, 5.54]
            ]]
        }
    }]
}

# Save to file
aoi_dir = project_root.parent / "data" / "aoi"
aoi_dir.mkdir(parents=True, exist_ok=True)
aoi_file = aoi_dir / "accra_central.geojson"

with open(aoi_file, 'w') as f:
    json.dump(accra_geojson, f, indent=2)

print(f"✓ AOI file created: {aoi_file}")
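
As a quick sanity check on an AOI, the sketch below verifies that a GeoJSON ring is closed and estimates its ground extent from the bounding box. The helper names are hypothetical, the ring repeats the Accra coordinates, and the degrees-to-kilometres factor is an approximation that is good enough for a rough size estimate near the equator.

```python
import math

# Outer ring of the Accra polygon, as [longitude, latitude] pairs.
ring = [[-0.22, 5.54], [-0.17, 5.54], [-0.17, 5.58], [-0.22, 5.58], [-0.22, 5.54]]

KM_PER_DEGREE = 111.32  # approximate length of one degree; varies slightly


def ring_is_closed(ring):
    """GeoJSON linear rings need at least 4 positions and must end where they start."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def bbox_size_km(ring):
    """Approximate east-west and north-south extent of the ring's bounding box."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    mid_lat = math.radians((min(lats) + max(lats)) / 2)
    # A degree of longitude shrinks with the cosine of the latitude.
    width_km = (max(lons) - min(lons)) * KM_PER_DEGREE * math.cos(mid_lat)
    height_km = (max(lats) - min(lats)) * KM_PER_DEGREE
    return width_km, height_km


width_km, height_km = bbox_size_km(ring)
print(f"closed: {ring_is_closed(ring)}, extent: {width_km:.1f} km x {height_km:.1f} km")
```

For this polygon the estimate comes out at roughly 5.5 km by 4.5 km, which at 10 m resolution is on the order of 550 x 450 pixels.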

### Step 2: Configure Extraction

The configuration below shows the options available in single-polygon mode, with **defaults marked**:

In [None]:
config = SatelliteImageryConfig(
    # ==================== REQUIRED PARAMETERS ====================
    aoi_file=aoi_file,                              # Path to polygon file
    output_dir=BASE_OUTPUT_DIR,                     # Output directory
    start_date="2024-01-01",                        # Start of date range
    end_date="2024-06-30",                          # End of date range
    
    # ==================== EARTH ENGINE AUTH ====================
    project_id="africa-cities-jdf277",              # GCP Project ID
    # service_account_key=None,                     # DEFAULT (use user credentials)
    # service_account_email=None,                   # DEFAULT
    
    # ==================== IMAGERY SETTINGS ====================
    resolution_m=10,                                # DEFAULT (10m for Sentinel-2)
    max_cloud_probability=20.0,                     # DEFAULT (scene-level filter)
    apply_cloud_mask=True,                          # DEFAULT (per-pixel masking)
    
    # ==================== EXPORT OPTIONS ====================
    scale_to_uint8=True,                            # DEFAULT (0-255 RGB)
    clip_to_aoi=True,                               # DEFAULT
    output_crs="EPSG:4326",                         # DEFAULT (or "ESRI:102022")
    compression="lzw",                              # DEFAULT (or "deflate", "none")
    overwrite_existing=True,                        # Changed for demo
    buffer_aoi_m=0.0,                               # DEFAULT (no buffer)
    
    # ==================== PERFORMANCE ====================
    num_workers=4,                                  # DEFAULT (parallel threads)
    tile_size_m=10000                               # DEFAULT (for large AOIs)
)

print("Configuration:")
print(f"  AOI: {config.aoi_file}")
print(f"  Date range: {config.start_date} to {config.end_date}")
print(f"  Resolution: {config.resolution_m}m")
print(f"  Cloud threshold: {config.max_cloud_probability}%")
print(f"  Output: {config.output_dir}")

### Step 3: Run Extraction

In [None]:
print("Starting satellite imagery download...")
print("This may take 1-2 minutes...\n")

processor = SatelliteImageryProcessor(config)
result = processor.process()

print("\n" + "="*50)
print("RESULTS")
print("="*50)
print(f"Success: {result.success}")
print(f"Processed: {result.processed_count}")

if result.output_paths:
    output_file = Path(result.output_paths[0])  # works whether paths are str or Path
    print(f"Output: {output_file}")
    if output_file.exists():
        size_kb = output_file.stat().st_size / 1024
        print(f"File size: {size_kb:.1f} KB")

### Step 4: Visualize Results

In [None]:
if result.success and result.output_paths:
    output_file = result.output_paths[0]
    
    with rasterio.open(output_file) as src:
        print("Raster Info:")
        print(f"  Size: {src.width} x {src.height} pixels")
        print(f"  Bands: {src.count}")
        print(f"  CRS: {src.crs}")
        print(f"  Bounds: {src.bounds}")
        
        # Read RGB bands
        r = src.read(1)
        g = src.read(2)
        b = src.read(3)
        
        # Stack into RGB array
        rgb = np.dstack((r, g, b))
        
        # Plot
        fig, ax = plt.subplots(figsize=(12, 10))
        ax.imshow(rgb)
        ax.set_title(f'Sentinel-2 RGB: Accra, Ghana\n{config.start_date} to {config.end_date}', fontsize=14)
        ax.axis('off')
        plt.tight_layout()
        plt.show()
else:
    print("No output file to visualize")

---
## Example 2: Batch Mode (AfricaPolis)

Download imagery for multiple cities automatically using AfricaPolis urban agglomeration boundaries.

**Note**: This requires the AfricaPolis dataset to be configured.

In [None]:
# Batch configuration for Ghana cities
batch_config = SatelliteImageryConfig(
    # ==================== BATCH MODE ====================
    aoi_file="africapolis",                         # Keyword for batch mode
    country=["GHA"],                                # Ghana (ISO3 code)
    # city=["Accra", "Kumasi"],                     # Optional: filter specific cities
    
    # ==================== OTHER SETTINGS ====================
    output_dir=BASE_OUTPUT_DIR / "ghana_cities",
    start_date="2024-01-01",
    end_date="2024-06-30",
    project_id="africa-cities-jdf277",
    
    # For batch mode, consider:
    max_cloud_probability=30.0,                     # Slightly higher tolerance
    num_workers=2                                   # Lower to avoid rate limits
)

print("Batch Configuration:")
print("  Mode: AfricaPolis")
print(f"  Country: {batch_config.country}")
print(f"  Output: {batch_config.output_dir}")

In [None]:
# Uncomment to run batch processing
# WARNING: This will download imagery for ALL cities in Ghana

# print("Starting batch download...")
# print("This may take 10-30 minutes depending on number of cities...\n")
#
# batch_processor = SatelliteImageryProcessor(batch_config)
# batch_result = batch_processor.process()
#
# print(f"\nBatch Results:")
# print(f"  Total: {batch_result.total_count}")
# print(f"  Succeeded: {batch_result.succeeded_count}")
# print(f"  Failed: {batch_result.failed_count}")
#
# if batch_result.failed:
#     print(f"\nFailed cities: {list(batch_result.failed.keys())}")

---
## Configuration Reference

### All Parameters with Defaults

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| **Required** |
| `aoi_file` | Path/str | - | Path to polygon file, or `"africapolis"`/`"countries"` |
| `output_dir` | Path | - | Output directory for GeoTIFF files |
| `start_date` | str | - | Start date (YYYY-MM-DD) |
| `end_date` | str | - | End date (YYYY-MM-DD) |
| **Batch Filtering** |
| `country` | list/str | `None` | ISO3 codes or `"all"` (required for batch) |
| `city` | list | `None` | City names filter (AfricaPolis only) |
| **Earth Engine** |
| `project_id` | str | `None` | GCP Project ID |
| `service_account_key` | Path | `None` | Service account JSON |
| `service_account_email` | str | `None` | Service account email |
| **Imagery** |
| `resolution_m` | int | `10` | Output resolution (meters) |
| `max_cloud_probability` | float | `20.0` | Max cloud % for scene filter |
| `apply_cloud_mask` | bool | `True` | Per-pixel cloud masking |
| **Export** |
| `scale_to_uint8` | bool | `True` | Scale to 0-255 RGB |
| `clip_to_aoi` | bool | `True` | Clip to AOI boundary |
| `output_crs` | str | `"EPSG:4326"` | Output CRS |
| `compression` | str | `"lzw"` | `"lzw"`, `"deflate"`, `"none"` |
| `overwrite_existing` | bool | `False` | Overwrite existing files |
| `buffer_aoi_m` | float | `0.0` | Buffer AOI (meters) |
| **Performance** |
| `num_workers` | int | `4` | Parallel workers (1-16) |
| `tile_size_m` | int | `10000` | Tile size for large AOIs |
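
The constraints in the table can be checked before any Earth Engine call is made. The sketch below is a hypothetical helper, not part of `geoworkflow` (whose own validation may differ); it covers only the date format, the date order, the `num_workers` range, and the `compression` choices listed above.

```python
from datetime import date

ALLOWED_COMPRESSION = {"lzw", "deflate", "none"}


def check_settings(start_date, end_date, num_workers=4, compression="lzw"):
    """Return a list of problems found in the given settings (empty if none).

    Hypothetical helper for illustration; not part of geoworkflow.
    """
    problems = []
    try:
        start = date.fromisoformat(start_date)
        end = date.fromisoformat(end_date)
        if end <= start:
            problems.append("end_date must be after start_date")
    except ValueError:
        problems.append("dates must use the YYYY-MM-DD format")
    if not 1 <= num_workers <= 16:
        problems.append("num_workers must be between 1 and 16")
    if compression not in ALLOWED_COMPRESSION:
        problems.append(f"compression must be one of {sorted(ALLOWED_COMPRESSION)}")
    return problems


print(check_settings("2024-01-01", "2024-06-30"))
print(check_settings("2024-06-30", "2024-01-01", num_workers=32, compression="zip"))
```

The first call returns an empty list; the second reports three problems, one for each setting that is out of range.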

---
## Tips for Cloudy Areas (Tropical Regions)

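Sub-Saharan Africa, especially coastal and equatorial areas, can have persistent cloud cover. A longer date range gives the median composite more cloud-free observations per pixel, and a higher scene-level threshold admits more scenes into it.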

### Recommended Settings for Cloudy Areas

```python
config = SatelliteImageryConfig(
    aoi_file=aoi_file,
    output_dir=BASE_OUTPUT_DIR,

    # Use a longer date range (6-12 months)
    start_date="2024-01-01",
    end_date="2024-12-31",

    # Raise the scene-level cloud threshold (default is 20.0)
    max_cloud_probability=50.0,
    
    # Keep cloud masking enabled
    apply_cloud_mask=True,
    
    project_id="your-project-id"
)
```

### Dry Season Windows (Better Imagery)

| Region | Dry Season | Best Months |
|--------|------------|-------------|
| West Africa (Sahel) | Nov - Apr | Dec - Feb |
| West Africa (Coast) | Nov - Mar | Dec - Jan |
| East Africa | Jun - Oct | Jul - Sep |
| Southern Africa | May - Oct | Jun - Aug |
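
The "Best Months" column of the table above can be turned into `start_date` and `end_date` strings. The sketch below is a hypothetical helper; it handles windows that wrap over New Year (such as Dec - Feb) and returns the last calendar day of the final month. Whether the processor treats `end_date` as inclusive is not stated here, so adjust by a day if needed.

```python
from datetime import date, timedelta

# Start and end months taken from the "Best Months" column of the table.
BEST_MONTHS = {
    "West Africa (Sahel)": (12, 2),
    "West Africa (Coast)": (12, 1),
    "East Africa": (7, 9),
    "Southern Africa": (6, 8),
}


def best_window(region, end_year):
    """Return (start_date, end_date) ISO strings for the window ending in `end_year`.

    A window such as Dec - Feb wraps over New Year, so it starts in the
    previous calendar year.
    """
    start_month, end_month = BEST_MONTHS[region]
    start_year = end_year - 1 if start_month > end_month else end_year
    start = date(start_year, start_month, 1)
    # Last day of the end month = the day before the first of the next month.
    if end_month == 12:
        first_of_next = date(end_year + 1, 1, 1)
    else:
        first_of_next = date(end_year, end_month + 1, 1)
    end = first_of_next - timedelta(days=1)
    return start.isoformat(), end.isoformat()


print(best_window("West Africa (Coast)", 2024))
print(best_window("East Africa", 2024))
```

For coastal West Africa with `end_year=2024`, the window runs from 1 December 2023 to 31 January 2024.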

---
## Troubleshooting

### "No cloud-free imagery found"
- **Cause**: Date range too short or cloud threshold too strict
- **Fix**: Extend date range to 6-12 months, increase `max_cloud_probability` to 40-50%
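
One way to apply this fix systematically is to try progressively more permissive settings until a run succeeds. The sketch below only generates the sequence of settings; the function name, the threshold steps, and the 45-day widening per step are all illustrative assumptions, not `geoworkflow` behavior.

```python
from datetime import date, timedelta


def relaxed_attempts(start_date, end_date,
                     thresholds=(20.0, 30.0, 40.0, 50.0), widen_days=45):
    """Yield (start_date, end_date, cloud_threshold) tuples, strictest first.

    Each later attempt raises the cloud threshold and pads the date range
    by a further `widen_days` on both sides.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    for step, threshold in enumerate(thresholds):
        pad = timedelta(days=widen_days * step)
        yield (start - pad).isoformat(), (end + pad).isoformat(), threshold


attempts = list(relaxed_attempts("2024-03-01", "2024-04-30"))
for attempt in attempts:
    print(attempt)
```

Each tuple can be passed as `start_date`, `end_date`, and `max_cloud_probability` to a new `SatelliteImageryConfig`, stopping at the first run that reports success.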

### "Please authorize access to Earth Engine"
- **Cause**: Not authenticated
- **Fix**: Run `earthengine authenticate` in terminal

### "Project not registered to use Earth Engine"
- **Cause**: GCP project not set up for Earth Engine
- **Fix**: Visit https://console.cloud.google.com/earth-engine and register your project

### "ee.Initialize: no project found"
- **Cause**: Missing `project_id` parameter
- **Fix**: Add `project_id="your-project-id"` to config

### Slow downloads
- **Cause**: Large AOI being tiled
- **Fix**: This is normal for large areas. Increase `resolution_m` (for example to 20 or 30, giving coarser pixels) for faster downloads
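
The effect of resolution on download size is simple arithmetic: a coarser resolution (a larger `resolution_m` value) means fewer pixels. The sketch below uses a hypothetical 40 km x 30 km AOI and assumes tiles are laid out on a simple square grid, which may not match how the processor actually tiles.

```python
import math


def pixel_count(width_m, height_m, resolution_m):
    """Number of pixels needed to cover a width x height area at a resolution."""
    return math.ceil(width_m / resolution_m) * math.ceil(height_m / resolution_m)


def tile_count(width_m, height_m, tile_size_m=10000):
    """Number of square tiles needed to cover the area (assumes a simple grid)."""
    return math.ceil(width_m / tile_size_m) * math.ceil(height_m / tile_size_m)


# A hypothetical 40 km x 30 km AOI.
for resolution_m in (10, 20, 30):
    print(resolution_m, pixel_count(40_000, 30_000, resolution_m))
print("tiles:", tile_count(40_000, 30_000))
```

Going from 10 m to 20 m cuts the pixel count from 12,000,000 to 3,000,000, a factor of four, while the number of 10 km tiles (12 here) depends only on the AOI size and `tile_size_m`.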

### Import errors
- **Cause**: Missing Python dependencies
- **Fix**: Install the required packages:
```bash
pip install earthengine-api geopandas rasterio requests s2sphere gcsfs
```

---
## Summary

This notebook demonstrated:

1. **Single polygon mode**: Download imagery for a custom GeoJSON polygon
2. **Batch mode**: Download imagery for multiple cities using AfricaPolis
3. **Visualization**: Display the downloaded RGB imagery
4. **Configuration options**: All available parameters and defaults

### Output Files

Outputs are saved under `../../data/satellite/` (the batch example writes to its `ghana_cities/` subdirectory) with the naming convention:
```
{ISO3}_{location}_S2_{start_date}_{end_date}.tif
```

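Example (single-polygon mode, where the prefix is `AOI` rather than an ISO3 code, and dates are written as YYYYMMDD): `AOI_accra_central_S2_20240101_20240630.tif`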
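
To locate or predict an output file, the naming convention can be reproduced in a few lines. This is a hypothetical helper written from the pattern and example above, not the function `geoworkflow` uses internally.

```python
from datetime import date


def build_output_name(prefix, location, start_date, end_date):
    """Build a file name of the form {prefix}_{location}_S2_{start}_{end}.tif.

    Hypothetical helper: dates are given as YYYY-MM-DD and written as
    YYYYMMDD, matching the example file name.
    """
    start = date.fromisoformat(start_date).strftime("%Y%m%d")
    end = date.fromisoformat(end_date).strftime("%Y%m%d")
    return f"{prefix}_{location}_S2_{start}_{end}.tif"


output_name = build_output_name("AOI", "accra_central", "2024-01-01", "2024-06-30")
print(output_name)  # AOI_accra_central_S2_20240101_20240630.tif
```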