# ODIAC CO2 Emissions Extraction Demo

This notebook demonstrates the two batch processing modes for extracting ODIAC fossil fuel CO2 emissions data.

**Output Location**: All data saved to `../../data/emissions/odiac/`

## Two Processing Modes:
1. **AfricaPolis Mode**: City/agglomeration-level extraction
2. **Countries Mode**: Country-level extraction

The worked example below uses Countries mode; AfricaPolis mode appears only in the usage snippets at the end. The example sets **all configuration options** explicitly, with defaults clearly marked.

## Setup

In [None]:
import sys
from pathlib import Path
import pandas as pd
import rasterio
import matplotlib.pyplot as plt
from rasterio.plot import show

# Add src to path
project_root = Path.cwd().parent
src_path = project_root / "src"
sys.path.insert(0, str(src_path))

from geoworkflow.schemas.odiac_config import ODIACConfig
from geoworkflow.processors.extraction.odiac import ODIACProcessor

# Define base output directory
BASE_OUTPUT_DIR = project_root.parent / "data" / "emissions" / "odiac"

print("✓ Imports successful")
print(f"Project root: {project_root}")
print(f"Base output directory: {BASE_OUTPUT_DIR}")
print(f"Output directory exists: {BASE_OUTPUT_DIR.exists()}")



✓ Imports successful
Project root: /Users/juancheeto/Library/CloudStorage/Box-Box/UrbanStructureStudies/AfricaProject/geoworkflow
Base output directory: /Users/juancheeto/Library/CloudStorage/Box-Box/UrbanStructureStudies/AfricaProject/data/ghg/odiac
Output directory exists: True


---

## Example: Countries Mode (Country-Level Download)

Extract emissions for Togo, Ghana, Kenya, and Tanzania (2019 and 2022) using national boundaries.

### Data Source
- **File**: `data/00_source/boundaries/africa_boundaries.gpkg` (36 MB)
- **Features**: ~54 African countries
- **Output**: `data/emissions/odiac/{year}/countries/{ISO3}/`

### Configuration
All available options are shown below, with **defaults marked as `# DEFAULT`**:

In [4]:
countries = ["TGO","GHA","KEN","TZA"]
years = [2019,2022]
for country in countries:
    for year in years:
        config_countries = ODIACConfig(
            # ==================== REQUIRED PARAMETERS ====================
            aoi_file="countries",                                     # Keyword for Countries mode
            output_dir=BASE_OUTPUT_DIR / str(year) / "countries" / country,     # Output directory: one folder per year and country
            year=year,                                                # Year (2000-2023 available)
            
            # ==================== BATCH FILTERING ====================
            country=[country],                                          # Required; accepts several ISO3 codes. The loop passes one at a time so each country is saved in its own folder.
            # city parameter is IGNORED in countries mode
            
            # ==================== DATA SOURCE ====================
            stac_api_url="https://earth.gov/ghgcenter/api/stac",     # DEFAULT
            raster_api_url="https://earth.gov/ghgcenter/api/raster", # DEFAULT
            collection_name="odiac-ffco2-monthgrid-v2023",           # DEFAULT
            asset_name="co2-emissions",                              # DEFAULT
            api_timeout=60,                                           # DEFAULT (seconds)
            
            # ==================== SPATIAL PROCESSING ====================
            buffer_aoi_meters=0.0,                                    # DEFAULT (no buffer)
            
            # ==================== EXPORT OPTIONS ====================
            export_format="geotiff",                                 # DEFAULT (or "cog")
            output_crs="ESRI:102022",                                # DEFAULT (Africa Albers Equal Area)
            export_monthly=True,                                      # DEFAULT (12 monthly TIFFs)
            export_annual=True,                                       # DEFAULT (annual average TIFF)
            export_statistics=True,                                   # DEFAULT (statistics CSV)
            overwrite_existing=True,                                  # Changed from False: overwrite for demo
            compression="lzw",                                       # DEFAULT (or "deflate", "none")
            
            # ==================== PERFORMANCE ====================
            num_workers=4                                             # DEFAULT (parallel threads: 1-16)
        )

        print(f"Starting ODIAC extraction for {country}, year {year}")
        print("This could take 3-5 minutes...\n")

        processor_countries = ODIACProcessor(config_countries)
        result_countries = processor_countries.process()

        print("\n" + "="*70)
        print("RESULTS")
        print("="*70)
        print(f"Success: {result_countries.success}")
        print(f"Total: {result_countries.total_count}")
        print(f"Succeeded: {result_countries.succeeded_count}")
        print(f"Failed: {result_countries.failed_count}")

        if result_countries.succeeded:
            print(f"\n✓ Successfully processed: {', '.join(result_countries.succeeded)}")
        print(" ")
        

Starting ODIAC extraction for TGO, year 2019
This could take 3-5 minutes...


RESULTS
Success: True
Total: 1
Succeeded: 1
Failed: 0

✓ Successfully processed: TGO
 
Starting ODIAC extraction for TGO, year 2022
This could take 3-5 minutes...


RESULTS
Success: True
Total: 1
Succeeded: 1
Failed: 0

✓ Successfully processed: TGO
 
Starting ODIAC extraction for GHA, year 2019
This could take 3-5 minutes...


RESULTS
Success: True
Total: 1
Succeeded: 1
Failed: 0

✓ Successfully processed: GHA
 
Starting ODIAC extraction for GHA, year 2022
This could take 3-5 minutes...


RESULTS
Success: True
Total: 1
Succeeded: 1
Failed: 0

✓ Successfully processed: GHA
 
Starting ODIAC extraction for KEN, year 2019
This could take 3-5 minutes...



Geometry still has 323 coordinates after max simplification. Using convex hull for API compatibility.


RESULTS
Success: True
Total: 1
Succeeded: 1
Failed: 0

✓ Successfully processed: KEN
 
Starting ODIAC extraction for KEN, year 2022
This could take 3-5 minutes...



Geometry still has 323 coordinates after max simplification. Using convex hull for API compatibility.


RESULTS
Success: True
Total: 1
Succeeded: 1
Failed: 0

✓ Successfully processed: KEN
 
Starting ODIAC extraction for TZA, year 2019
This could take 3-5 minutes...



Geometry still has 582 coordinates after max simplification. Using convex hull for API compatibility.


RESULTS
Success: True
Total: 1
Succeeded: 1
Failed: 0

✓ Successfully processed: TZA
 
Starting ODIAC extraction for TZA, year 2022
This could take 3-5 minutes...



Geometry still has 582 coordinates after max simplification. Using convex hull for API compatibility.


RESULTS
Success: True
Total: 1
Succeeded: 1
Failed: 0

✓ Successfully processed: TZA
 


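### Run Extraction

The configuration cell above also runs the extraction: each loop iteration passes its config to `ODIACProcessor` and calls `process()`.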
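
The "Using convex hull for API compatibility" warnings in the Kenya and Tanzania output indicate that the boundary had too many coordinates for the request and was replaced by its convex hull. The processor's own logic is not shown in this notebook, so the following is only a minimal pure-Python sketch of that fallback idea. The names `convex_hull` and `fit_for_api` and the `max_coords` threshold are hypothetical, and the sketch skips the simplification step that the warning says is tried first.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def fit_for_api(ring: List[Point], max_coords: int) -> List[Point]:
    """Hypothetical helper: keep the ring if it is small enough, else use its hull."""
    if len(ring) <= max_coords:
        return ring
    return convex_hull(ring)


# A square with one interior vertex: the hull drops the interior point.
ring = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
print(fit_for_api(ring, max_coords=4))
```

A convex hull always covers at least the area of the original polygon, so it is worth checking whether the exported rasters are clipped back to the true boundary or include pixels from neighbouring countries.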

### Inspect Results

In [None]:
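# List output files from the last loop iteration (TZA, 2022);
# config_countries still holds the final config built in the loop above.
output_dir_countries = config_countries.output_dir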
print(f"Output files in: {output_dir_countries}\n")

if output_dir_countries.exists():
    tif_files = sorted(output_dir_countries.glob("*.tif"))
    csv_files = sorted(output_dir_countries.glob("*.csv"))
    
    print(f"TIF files ({len(tif_files)}):")
    for f in tif_files[:5]:  # Show first 5
        size_mb = f.stat().st_size / (1024*1024)
        print(f"  {f.name} ({size_mb:.2f} MB)")
    if len(tif_files) > 5:
        print(f"  ... and {len(tif_files)-5} more")
    
    print(f"\nCSV files ({len(csv_files)}):")
    for f in csv_files:
        print(f"  {f.name}")
else:
    print("Output directory not found")

In [None]:
# Load and display statistics for the last country/year processed above.
# The loop variables `country` and `year` keep their final values (TZA, 2022).
stats_file = output_dir_countries / f"{country}_{country}_odiac_{year}_stats.csv"

if stats_file.exists():
    df_stats = pd.read_csv(stats_file)
    print(f"{country} CO2 Emissions Statistics ({year})")
    print("="*70)
    print(df_stats.to_string(index=False))
    
    # Plot monthly trend (the last two characters of `period` are read as the month)
    monthly_data = df_stats[df_stats['period'] != 'annual'].copy()
    monthly_data['month'] = monthly_data['period'].str[-2:].astype(int)
    
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(monthly_data['month'], monthly_data['mean'], marker='o', linewidth=2, color='forestgreen')
    plt.xlabel('Month')
    plt.ylabel('Mean CO2 Emissions (tonne C/km²/month)')
    plt.title(f'{country}: Monthly Mean Emissions')
    plt.grid(True, alpha=0.3)
    
    plt.subplot(1, 2, 2)
    plt.bar(monthly_data['month'], monthly_data['sum'], color='forestgreen')
    plt.xlabel('Month')
    plt.ylabel('Total CO2 Emissions (tonne C/month)')
    plt.title(f'{country}: Monthly Total Emissions')
    plt.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    plt.show()
else:
    print("Statistics file not found")

In [None]:
# Visualize the annual average raster for the last country/year processed above
annual_tif = output_dir_countries / f"{country}_{country}_odiac_{year}_annual.tif"

if annual_tif.exists():
    with rasterio.open(annual_tif) as src:
        print("Raster Info:")
        print(f"  Shape: {src.shape}")
        print(f"  CRS: {src.crs}")
        print(f"  Bounds: {src.bounds}")
        print(f"  Resolution: {src.res}")
        
        # Plot
        fig, ax = plt.subplots(figsize=(10, 8))
        show(src, ax=ax, cmap='YlOrRd', title=f'{country}: Annual Average CO2 Emissions ({year})')
        plt.colorbar(ax.images[0], ax=ax, label='tonne C/km²/month')
        plt.tight_layout()
        plt.show()
else:
    print("Annual TIFF not found")
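
The axis and colorbar labels above give values in tonnes of carbon (tonne C), not tonnes of CO2. If CO2 mass is needed for comparison with other inventories, the usual conversion multiplies by the molar mass ratio of CO2 to carbon. The helper below is a small sketch of that arithmetic; `tonnes_c_to_co2` is a name introduced here for illustration and is not part of `geoworkflow`.

```python
# Molar masses in g/mol: CO2 = 44.01, C = 12.011
C_TO_CO2 = 44.01 / 12.011


def tonnes_c_to_co2(tonnes_c: float) -> float:
    """Convert a mass of carbon to the equivalent mass of CO2."""
    return tonnes_c * C_TO_CO2


print(f"Conversion factor: {C_TO_CO2:.3f}")
print(f"100 tonne C = {tonnes_c_to_co2(100):.1f} tonne CO2")
```

The same factor could be applied to the `mean` and `sum` columns of the statistics CSV, for example `df_stats['sum'] * C_TO_CO2`.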

---

## Summary of Downloaded Data

In [None]:
import os

def get_dir_size(path):
    """Calculate total size of directory in MB."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            filepath = os.path.join(dirpath, filename)
            if os.path.exists(filepath):
                total += os.path.getsize(filepath)
    return total / (1024 * 1024)

print("="*70)
print("DATA DOWNLOAD SUMMARY")
print("="*70)
print(f"\nBase directory: {BASE_OUTPUT_DIR}\n")

if BASE_OUTPUT_DIR.exists():
    all_tifs = list(BASE_OUTPUT_DIR.rglob("*.tif"))
    all_csvs = list(BASE_OUTPUT_DIR.rglob("*.csv"))
    total_size = get_dir_size(BASE_OUTPUT_DIR)
    
    print(f"Total files:")
    print(f"  TIF files: {len(all_tifs)}")
    print(f"  CSV files: {len(all_csvs)}")
    print(f"  Total size: {total_size:.2f} MB")
    
    print(f"\nDirectory structure:")
    for subdir in sorted(BASE_OUTPUT_DIR.rglob("*")):
        if subdir.is_dir():
            rel_path = subdir.relative_to(BASE_OUTPUT_DIR)
            num_files = len(list(subdir.glob("*.tif"))) + len(list(subdir.glob("*.csv")))
            if num_files > 0:
                print(f"  {rel_path}/  ({num_files} files)")
else:
    print("No data downloaded yet.")

---

## Configuration Reference

### All Parameters with Defaults

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| **Required** |
| `aoi_file` | str/Path | - | `"africapolis"`, `"countries"`, or Path to file |
| `output_dir` | Path | - | Output directory |
| `year` | int | - | Year of data (2000-2023) |
| **Batch Filtering** |
| `country` | list/str | `None` | ISO3 codes or `"all"` (required for batch) |
| `city` | list | `None` | City names (AfricaPolis only) |
| **Data Source** |
| `stac_api_url` | str | `"https://earth.gov/ghgcenter/api/stac"` | STAC API |
| `raster_api_url` | str | `"https://earth.gov/ghgcenter/api/raster"` | Raster API |
| `collection_name` | str | `"odiac-ffco2-monthgrid-v2023"` | Collection |
| `asset_name` | str | `"co2-emissions"` | Asset name |
| `api_timeout` | int | `60` | Timeout (seconds) |
| **Spatial** |
| `buffer_aoi_meters` | float | `0.0` | Buffer (meters) |
| **Export** |
| `export_format` | str | `"geotiff"` | `"geotiff"` or `"cog"` |
| `output_crs` | str | `"ESRI:102022"` | Output CRS |
| `export_monthly` | bool | `True` | Export monthly TIFFs |
| `export_annual` | bool | `True` | Export annual TIFF |
| `export_statistics` | bool | `True` | Export CSV |
| `overwrite_existing` | bool | `False` | Overwrite files |
| `compression` | str | `"lzw"` | `"lzw"`, `"deflate"`, `"none"` |
| **Performance** |
| `num_workers` | int | `4` | Parallel workers (1-16) |
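
The ranges and allowed values in the table can be checked before a long batch run. The function below is a hedged sketch that mirrors only what the table states; `check_odiac_options` is a hypothetical helper, and `ODIACConfig` may well perform its own validation with different rules or messages.

```python
from typing import List

# Allowed values as listed in the parameter table above
ALLOWED_FORMATS = {"geotiff", "cog"}
ALLOWED_COMPRESSION = {"lzw", "deflate", "none"}


def check_odiac_options(
    year: int,
    num_workers: int = 4,
    export_format: str = "geotiff",
    compression: str = "lzw",
) -> List[str]:
    """Return a list of problems; an empty list means the values match the table."""
    problems = []
    if not 2000 <= year <= 2023:
        problems.append(f"year {year} is outside 2000-2023")
    if not 1 <= num_workers <= 16:
        problems.append(f"num_workers {num_workers} is outside 1-16")
    if export_format not in ALLOWED_FORMATS:
        problems.append(f"export_format {export_format!r} is not one of {sorted(ALLOWED_FORMATS)}")
    if compression not in ALLOWED_COMPRESSION:
        problems.append(f"compression {compression!r} is not one of {sorted(ALLOWED_COMPRESSION)}")
    return problems


print(check_odiac_options(2022))
print(check_odiac_options(1999, num_workers=32))
```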

### Mode Comparison

| Feature | AfricaPolis | Countries |
|---------|-------------|----------|
| **Keyword** | `"africapolis"` | `"countries"` |
| **Data File** | agglomerations.gpkg (771 MB) | africa_boundaries.gpkg (36 MB) |
| **Features** | ~7,000 cities | ~54 countries |
| **country param** | Required | Required |
| **city param** | Optional filter | Ignored |
| **Output naming** | `{ISO3}_{City}_...` | `{ISO3}_{ISO3}_...` |
| **Use case** | Urban emissions | National emissions |
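
The output naming row, together with the file names used in the inspection cells (`..._odiac_2022_stats.csv` and `..._odiac_2022_annual.tif`), suggests a predictable pattern. The sketch below builds the expected names under that assumption. `expected_output_name` is a hypothetical helper, only the statistics and annual files are covered because the monthly file names are not shown in this notebook, and how city names with spaces are written to disk is not confirmed here.

```python
from typing import Optional

# Suffixes seen in the inspection cells of this notebook
SUFFIXES = {"stats": "stats.csv", "annual": "annual.tif"}


def expected_output_name(iso3: str, year: int, kind: str, city: Optional[str] = None) -> str:
    """Build a file name following {ISO3}_{City or ISO3}_odiac_{year}_{suffix}."""
    label = city if city is not None else iso3  # Countries mode repeats the ISO3 code
    return f"{iso3}_{label}_odiac_{year}_{SUFFIXES[kind]}"


print(expected_output_name("TZA", 2022, "stats"))
print(expected_output_name("KEN", 2022, "annual", city="Nairobi"))
```

Building the name this way avoids hard-coding a country code when looking up results for whichever country was processed last.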

---

## Additional Usage Examples

### Process Multiple Cities
```python
config = ODIACConfig(
    aoi_file="africapolis",
    country=["KEN", "TZA"],
    city=["Nairobi", "Mombasa", "Dar es Salaam"],
    year=2022,
    output_dir=BASE_OUTPUT_DIR / "africapolis" / "east_africa"
)
```

### Process Multiple Countries
```python
config = ODIACConfig(
    aoi_file="countries",
    country=["KEN", "TZA", "UGA", "RWA", "BDI"],
    year=2022,
    output_dir=BASE_OUTPUT_DIR / "countries" / "eac",
    export_monthly=False,  # Annual only for speed
    num_workers=5
)
```

### Process All Cities in a Country
```python
config = ODIACConfig(
    aoi_file="africapolis",
    country=["KEN"],
    # city=None means all cities
    year=2022,
    output_dir=BASE_OUTPUT_DIR / "africapolis" / "kenya_all"
)
```

### Process All African Countries
```python
config = ODIACConfig(
    aoi_file="countries",
    country="all",
    year=2022,
    output_dir=BASE_OUTPUT_DIR / "countries" / "africa",
    export_monthly=False,
    num_workers=8
)
```