# Open Buildings Extraction via GCS (Fast Method)

This notebook demonstrates the **GCS-based extraction method**, which downloads Open Buildings data directly from Google Cloud Storage (GCS) and is roughly 3-5x faster than the Earth Engine workflow.

```
┌─────────────────────────────────────┐
│  Create AOI (Area of Interest)      │
│  - Load AFRICAPOLIS2020.geojson     │
│  - Filter for Accra agglomeration   │
│  - Output: accra_aoi.geojson        │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│  Configure GCS Extraction           │
│  - Set confidence threshold (0.75)  │
│  - Set area filters (10-100000 m²)  │
│  - Set parallel workers (4)         │
│  - Choose output format (GeoJSON)   │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│  Extract Buildings from GCS         │
│  - NO authentication required!      │
│  - Direct download from GCS         │
│  - Parallel S2 cell processing      │
│  - Filter by confidence & area      │
│  - Apply spatial intersection       │
│  - Export: accra_buildings.geojson  │
└─────────────────────────────────────┘
```

## Key Advantages over Earth Engine Method

**Speed:**
- Small area (10 km²): 30-60 seconds vs 2-5 minutes
- Medium city (100 km²): 2-5 minutes vs 10-30 minutes
- Large city (1000 km²): 10-20 minutes vs 1-2 hours

**Simplicity:**
- No authentication needed (public data)
- No API quotas or timeouts
- Windows-compatible

**Input Data:**
- `AFRICAPOLIS2020.geojson` → AOI creation
- No service account needed!

**Output:**
- `accra_aoi.geojson` (area boundary)
- `accra_buildings.geojson` (building polygons)

In [None]:
from pathlib import Path
import logging

# GeoWorkflow imports
from geoworkflow.schemas.config_models import AOIConfig
from geoworkflow.processors.aoi.processor import AOIProcessor
from geoworkflow.schemas.open_buildings_gcs_config import OpenBuildingsGCSConfig
from geoworkflow.processors.extraction.open_buildings_gcs import OpenBuildingsGCSProcessor

## Optional: Setup Logging

Configure Python's `logging` module so that progress messages emitted during processing are displayed. This is **OPTIONAL**: the cells below report their own results with `print`, so you can skip this cell if preferred.

In [None]:
# Setup logging (OPTIONAL)
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

## Step 1: Create Area of Interest (AOI)

Extract the Accra boundary from the AFRICAPOLIS data. This step is identical to the one in the Earth Engine workflow.

In [None]:
# Define AOI output path
aoi_file = Path("../data/aoi/accra_aoi.geojson")

# Create AOI configuration for Accra
aoi_config = AOIConfig(
    input_file=Path("../data/01_extracted/AFRICAPOLIS2020.geojson"),
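    # Despite the "country" parameter names, this filters on the
    # AFRICAPOLIS agglomeration name column
    country_name_column="agglosName",
    countries=["Accra"],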
    buffer_km=0,
    dissolve_boundaries=False,
    output_file=aoi_file
)

# Create and run the processor
aoi_processor = AOIProcessor(aoi_config)
aoi_result = aoi_processor.process()

# Check results
if aoi_result.success:
    print(f"✅ {aoi_result.message}")
    print(f"Processing time: {aoi_result.elapsed_time:.2f}s")
    print(f"Output: {aoi_file}")
else:
    print(f"❌ Failed: {aoi_result.message}")
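Conceptually, the AOI step selects the features whose `agglosName` property matches the requested name. The sketch below illustrates that selection with only the standard library. It is not `AOIProcessor`'s implementation: `filter_features` is a hypothetical helper, and the tiny in-memory FeatureCollection is a stand-in for the real AFRICAPOLIS file.

```python
import json


def filter_features(collection: dict, column: str, values: list) -> dict:
    """Keep only the features whose property `column` is one of `values`."""
    wanted = set(values)
    kept = [
        feat
        for feat in collection.get("features", [])
        if (feat.get("properties") or {}).get(column) in wanted
    ]
    return {"type": "FeatureCollection", "features": kept}


# Hypothetical miniature stand-in for AFRICAPOLIS2020.geojson
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"agglosName": "Accra"},
         "geometry": {"type": "Point", "coordinates": [-0.19, 5.60]}},
        {"type": "Feature", "properties": {"agglosName": "Kumasi"},
         "geometry": {"type": "Point", "coordinates": [-1.62, 6.69]}},
    ],
}

aoi = filter_features(sample, column="agglosName", values=["Accra"])
print(json.dumps([f["properties"]["agglosName"] for f in aoi["features"]]))
```

With a real file, `sample` would come from `json.load(...)` on the AFRICAPOLIS GeoJSON instead of the inline dictionary.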

## Step 2: Configure the GCS Extraction

Choose the AOI and set the extraction parameters. No authentication is needed because the data is public; the extraction itself runs in Step 3.

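**For testing:** Use the smaller sample AOI (faster, ~1-2 minutes)  
**For production:** Use the full Accra AOI (complete data, ~5-10 minutes)

Note that Step 1 writes only `accra_aoi.geojson`, so the sample AOI (`accra_sample_aoi.geojson`) must already exist in `../data/aoi/`.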

In [None]:
# Choose your AOI and output
# Option 1: Small sample for testing (RECOMMENDED FOR FIRST RUN)
input_aoi = Path("../data/aoi/accra_sample_aoi.geojson")
output_file = Path("../data/02_clipped/accra_buildings_sample.geojson")

# Option 2: Full Accra area (uncomment to use)
# input_aoi = Path("../data/aoi/accra_aoi.geojson")
# output_file = Path("../data/02_clipped/accra_buildings.geojson")

# Configure extraction
gcs_config = OpenBuildingsGCSConfig(
    aoi_file=input_aoi,
    output_dir=output_file.parent,
    
    # Quality filters
    confidence_threshold=0.75,  # Min confidence (0.5-1.0)
    min_area_m2=10.0,           # Min building size
    max_area_m2=100000.0,       # Max building size
    
    # Output settings
    export_format="geojson",    # Options: geojson, shapefile, csv
    overwrite_existing=True,     # Overwrite if exists
    
    # Performance
    num_workers=4                # Parallel workers (adjust based on CPU)
)

# Update output file to match config
output_file = gcs_config.get_output_file_path()

print("📋 Configuration:")
print(f"  Input AOI: {input_aoi}")
print(f"  Output: {output_file}")
print(f"  Confidence: ≥{gcs_config.confidence_threshold}")
print(f"  Area range: {gcs_config.min_area_m2}-{gcs_config.max_area_m2} m²")
print(f"  Workers: {gcs_config.num_workers}")
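The quality filters amount to a simple per-building predicate. The following is a minimal sketch of that predicate, assuming each building record carries `confidence` and `area_in_meters` values (the column names inspected in Step 4). `passes_filters` is a hypothetical helper, not part of `geoworkflow`, and treating the bounds as inclusive is an assumption.

```python
def passes_filters(record, confidence_threshold=0.75,
                   min_area_m2=10.0, max_area_m2=100000.0):
    """Return True if a building record meets the confidence and area limits."""
    return (
        record["confidence"] >= confidence_threshold
        and min_area_m2 <= record["area_in_meters"] <= max_area_m2
    )


# Made-up records for illustration only
records = [
    {"confidence": 0.91, "area_in_meters": 120.5},  # kept
    {"confidence": 0.62, "area_in_meters": 85.0},   # dropped: low confidence
    {"confidence": 0.80, "area_in_meters": 4.2},    # dropped: below minimum area
]

kept = [r for r in records if passes_filters(r)]
print(f"{len(kept)} of {len(records)} records pass the filters")
```

Lowering `confidence_threshold` or widening the area range lets more records through, which is why both appear in the troubleshooting tips for empty results.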

## Step 3: Run the Extraction

This cell runs the extraction configured in the previous step. Progress is shown in real time.

**Expected time:**
- Sample area: ~1-2 minutes
- Full Accra: ~5-10 minutes

In [None]:
print("🚀 Starting building extraction...\n")

try:
    # Create processor
    processor = OpenBuildingsGCSProcessor(gcs_config)
    
    # Run extraction
    result = processor.process()
    
    # Display results
    if result.success:
        print(f"\n✅ {result.message}")
        print(f"\n📊 Summary:")
        print(f"  Buildings extracted: {result.processed_count:,}")
        print(f"  Processing time: {result.elapsed_time:.1f}s")
        print(f"  Output file: {result.output_paths[0]}")
        
        # File size
        if result.output_paths[0].exists():
            file_size_mb = result.output_paths[0].stat().st_size / (1024 * 1024)
            print(f"  File size: {file_size_mb:.2f} MB")
        
        # Show metrics
        if hasattr(processor, 'get_metric'):
            s2_cells = processor.get_metric('s2_cells_processed')
            if s2_cells:
                print(f"  S2 cells processed: {s2_cells}")
        
        print("\n🎉 Extraction completed successfully!")
        
    else:
        print(f"\n❌ Extraction failed: {result.message}")
        
except Exception as e:
    print(f"\n❌ Error: {e}")
    import traceback
    traceback.print_exc()
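The overview diagram lists parallel S2 cell processing as part of the extraction. As a rough illustration of that pattern (not the processor's actual code), the sketch below fans hypothetical cell tokens out to a thread pool. `fetch_cell` is a stub that returns a fake building count instead of downloading anything, and the tokens are placeholders rather than real S2 cell IDs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_cell(token: str) -> int:
    """Stub: stands in for downloading one cell; returns a fake building count."""
    return len(token) * 100


def process_cells(tokens, num_workers=4):
    """Run fetch_cell for every token in parallel and collect counts per token."""
    counts = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = {pool.submit(fetch_cell, token): token for token in tokens}
        for future in as_completed(futures):
            token = futures[future]
            try:
                counts[token] = future.result()
            except Exception as exc:  # one failed cell should not stop the rest
                print(f"cell {token} failed: {exc}")
    return counts


cell_counts = process_cells(["cell_a", "cell_bb", "cell_ccc"], num_workers=4)
total_buildings = sum(cell_counts.values())
print(f"{len(cell_counts)} cells processed, {total_buildings} buildings in total")
```

Threads are a natural fit here because downloads spend most of their time waiting on the network, which is also why `num_workers` has a direct effect on extraction time.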

## Step 4: Verify Results

Load and inspect the extracted buildings.

In [None]:
import geopandas as gpd

if output_file.exists():
    # Load buildings
    buildings = gpd.read_file(output_file)
    
    print(f"📊 Building Statistics:")
    print(f"  Total buildings: {len(buildings):,}")
    print(f"  Average area: {buildings['area_in_meters'].mean():.1f} m²")
    print(f"  Median area: {buildings['area_in_meters'].median():.1f} m²")
    print(f"  Average confidence: {buildings['confidence'].mean():.3f}")
    print(f"\n  Area range: {buildings['area_in_meters'].min():.1f} - {buildings['area_in_meters'].max():.1f} m²")
    print(f"  Confidence range: {buildings['confidence'].min():.3f} - {buildings['confidence'].max():.3f}")
    
    # Show first few records
    print(f"\n🔍 Sample records:")
    print(buildings[['confidence', 'area_in_meters']].head())
    
else:
    print("❌ Output file not found")

## Comparison: GCS vs Earth Engine

### GCS Method (This Notebook)
**Pros:**
- ⚡ 3-5x faster
- 🔓 No authentication required
- 💻 Windows compatible
- 🚀 No API quotas/timeouts
- 🔄 Parallel processing

**Cons:**
- Single data version (v3)
- No temporal filtering

### Earth Engine Method (Original Notebook)
**Pros:**
- 🔗 Integrated with other EE datasets
- 📅 Temporal analysis possible
- 🛠️ More processing options

**Cons:**
- 🐌 Slower (3-5x)
- 🔐 Requires authentication
- ⏱️ API quotas and timeouts
- 📊 More complex setup

### Recommendation
Use the **GCS method** (this notebook) for:
- Standard building extraction
- Large areas
- Quick results
- Working without an EE account

Use the **Earth Engine method** for:
- Temporal analysis
- Integration with other EE datasets
- Custom EE processing

## Next Steps

1. **Analyze buildings:** Use the extracted GeoJSON in QGIS or Python
2. **Enrich with stats:** Add population, emissions, or other data
3. **Visualize:** Create maps and charts
4. **Scale up:** Extract buildings for other cities

### Example: Load in QGIS
```
1. Open QGIS
2. Layer → Add Layer → Add Vector Layer
3. Browse to: data/02_clipped/accra_buildings.geojson
4. Analyze and visualize!
```

### Example: Batch Processing
```python
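# Extract buildings for multiple cities
# (assumes an AOI file already exists for each city, e.g. created as in Step 1)
cities = ['Accra', 'Lagos', 'Nairobi', 'Kampala']

for city in cities:
    config = OpenBuildingsGCSConfig(
        aoi_file=Path(f"../data/aoi/{city.lower()}_aoi.geojson"),
        output_dir=Path(f"../data/buildings/{city.lower()}/"),
        confidence_threshold=0.75,
        num_workers=4
    )

    processor = OpenBuildingsGCSProcessor(config)
    result = processor.process()

    # Report failures explicitly instead of printing a misleading count
    if result.success:
        print(f"{city}: {result.processed_count:,} buildings")
    else:
        print(f"{city}: failed ({result.message})")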
```

## Troubleshooting

### No buildings extracted?
- Check the AOI location (it must fall within the area covered by Open Buildings)
- Lower confidence threshold: `confidence_threshold=0.5`
- Remove area filters temporarily
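One quick way to check the AOI location is to print its bounding box and compare it with the region you expect. The sketch below does this with the standard library only. `geojson_bounds` and `iter_positions` are hypothetical helpers, the inline polygon is an illustrative rectangle rather than the real Accra boundary, and coordinates are assumed to be longitude/latitude (the GeoJSON default).

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def geojson_bounds(collection):
    """Return (min_x, min_y, max_x, max_y) over all features in a FeatureCollection."""
    xs, ys = [], []
    for feat in collection["features"]:
        for x, y in iter_positions(feat["geometry"]["coordinates"]):
            xs.append(x)
            ys.append(y)
    return min(xs), min(ys), max(xs), max(ys)


# Illustrative rectangle only; load your own AOI with json.load() instead
aoi_geojson = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[-0.35, 5.5], [0.0, 5.5], [0.0, 5.75],
                             [-0.35, 5.75], [-0.35, 5.5]]],
        },
    }],
}

bounds = geojson_bounds(aoi_geojson)
print(f"AOI bounds (min_lon, min_lat, max_lon, max_lat): {bounds}")
```

If the printed bounds look swapped or fall far from the intended city, the AOI file is the likely cause of an empty result.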

### Slow extraction?
- Increase workers: `num_workers=8`
- Use CSV format (faster): `export_format="csv"`
- Check network speed
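A starting value for `num_workers` can be derived from the machine's CPU count. This is only a heuristic sketch: `suggest_workers` and its default cap of 8 are assumptions, and because the downloads are network-bound the best value on your connection may differ.

```python
import os


def suggest_workers(cap: int = 8) -> int:
    """Leave one CPU free, stay within the cap, and never go below one worker."""
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(cap, cpus - 1))


num_workers = suggest_workers()
print(f"Suggested num_workers: {num_workers}")
```

The result can be passed as `num_workers=suggest_workers()` when building the configuration.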

### Memory issues?
- Reduce workers: `num_workers=2`
- Use smaller AOI
- Process in batches
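To process in batches, one option is to split the AOI's bounding box into a grid of smaller tiles and run the extraction once per tile. The sketch below only computes the tile bounds. `split_bounds` is a hypothetical helper, the input numbers are illustrative, and turning each tile into an AOI file for `OpenBuildingsGCSConfig` is left out.

```python
def split_bounds(min_x, min_y, max_x, max_y, nx=2, ny=2):
    """Split a bounding box into nx * ny equal tiles, returned as bound tuples."""
    step_x = (max_x - min_x) / nx
    step_y = (max_y - min_y) / ny
    tiles = []
    for i in range(nx):
        for j in range(ny):
            tiles.append((
                min_x + i * step_x,
                min_y + j * step_y,
                min_x + (i + 1) * step_x,
                min_y + (j + 1) * step_y,
            ))
    return tiles


# Illustrative bounds, split into a 2 x 2 grid
tiles = split_bounds(0.0, 0.0, 4.0, 2.0, nx=2, ny=2)
for tile in tiles:
    print(tile)
```

Buildings that straddle a tile edge may show up in more than one batch, so it is worth de-duplicating the combined output before analysis.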

### Import errors?
```bash
# Quotes keep shells such as zsh from interpreting the square brackets
pip install "geoworkflow[extraction]"
```