# GridFIA - Forest Biomass & Species Diversity Analysis

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mihiarc/gridfia/blob/main/examples/gridfia_colab_tutorial.ipynb)

This notebook demonstrates the GridFIA Python API for analyzing USDA Forest Service FIA BIGMAP 2018 forest biomass data at 30m resolution.

**What you'll learn:**
- How to download forest biomass data for any US location
- Creating cloud-optimized Zarr stores from GeoTIFF files
- Calculating diversity metrics (richness, Shannon, Simpson)
- Creating publication-quality visualizations

## Two Modes Available

1. **Quick Demo Mode** (30 seconds): Load pre-processed sample data from cloud storage
2. **Full Analysis Mode** (10-15 minutes): Download all species data from FIA API

**Part of the FIA Python Ecosystem:**
- **GridFIA**: Spatial raster analysis (this package)
- **PyFIA**: Survey/plot data analysis
- **PyFVS**: Growth/yield simulation
- **AskFIA**: AI conversational interface

---
## 1. Installation & Setup

First, let's install GridFIA and its dependencies. This may take a minute on Colab.

In [None]:
# Install GridFIA from GitHub (latest with cloud features)
# Use --force-reinstall to ensure we get the latest version
!pip install --force-reinstall git+https://github.com/mihiarc/gridfia.git -q

# Verify installation - should show 0.4.0+
import gridfia
print(f"GridFIA version: {gridfia.__version__}")

In [None]:
# Import required libraries
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
import shutil

from gridfia import GridFIA
from gridfia.config import GridFIASettings, CalculationConfig

# Set up plotting style
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['figure.dpi'] = 100

# Initialize the GridFIA API
api = GridFIA()
print("GridFIA API initialized!")
print(f"Available calculations: {api.list_calculations()}")

---
## Choose Your Mode

Select how you want to run this notebook:

- **QUICK_DEMO = True**: Load pre-processed Durham County data from cloud (~30 seconds)
- **QUICK_DEMO = False**: Download all species from FIA API (~10-15 minutes)

Both modes produce valid diversity metrics since they include all species.

In [None]:
# ============================================================
# CHOOSE YOUR MODE
# ============================================================
# - True: Load pre-hosted Durham County data from cloud (~30 seconds)
# - False: Download all species from FIA API (~10-15 minutes)

QUICK_DEMO = True  # Set to False for full download experience

# ============================================================

# Cloud-hosted sample data URL
SAMPLE_URL = "https://pub-da6f67cd8f9147418258ed71cc130443.r2.dev/samples/durham_nc.zarr"

if QUICK_DEMO:
    print("=" * 60)
    print("QUICK DEMO MODE")
    print("=" * 60)
    print("\nLoading pre-hosted sample data from cloud storage...")
    print("This takes ~30 seconds and includes all species for valid diversity metrics.\n")
    
    print(f"Source: {SAMPLE_URL}\n")
    
    # Load from cloud using the API
    print("Loading Durham County sample...")
    cloud_store = api.load_from_cloud(url=SAMPLE_URL)
    
    print(f"\nData loaded successfully!")
    print(f"  Shape: {cloud_store.shape}")
    print(f"  Species: {cloud_store.num_species}")
    
    # Download to local using the API's download_sample method
    print("\nSaving to local Zarr store for analysis...")
    zarr_path = api.download_sample("durham_nc", "quickstart_data/forest.zarr")
    print(f"  Saved to: {zarr_path}")
    
    # Set flag to skip download cells
    DATA_LOADED = True
    
else:
    print("=" * 60)
    print("FULL ANALYSIS MODE")
    print("=" * 60)
    print("\nWill download all species from FIA API.")
    print("This takes 10-15 minutes but gives you complete control.\n")
    DATA_LOADED = False

---
## 2. Quickstart - Your First Analysis

Let's start with a complete workflow: download ALL species data, create a Zarr store, and calculate diversity metrics.

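In Full Analysis Mode we use a small area (a subset of Wake County, NC) to keep download times reasonable while still getting valid results. In Quick Demo Mode, the download and Zarr-creation steps are skipped and the Durham County sample loaded above is used instead.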

### 2.1 Predefined Location Bounding Boxes

GridFIA works with bounding boxes in Web Mercator (EPSG:3857) or WGS84 (EPSG:4326) coordinates.

Here are some predefined locations you can use:

In [None]:
# Predefined location bounding boxes
LOCATIONS = {
    "wake_nc": {
        "name": "Wake County, NC (subset)",
        "bbox": (-8765000, 4280000, -8740000, 4305000),  # Web Mercator
        "crs": "3857",
        "description": "Central Wake County - good balance of size and download time"
    },
    "durham_nc": {
        "name": "Durham County, NC",
        "bbox": (-8796055, 4281816, -8760768, 4333602),  # Full county
        "crs": "3857",
        "description": "Full Durham County - compact size ideal for demos"
    },
    "harris_tx": {
        "name": "Harris County, TX",
        "bbox": (-10688000, 3450000, -10575000, 3537000),
        "crs": "3857",
        "description": "Houston metropolitan area"
    },
    "mt_hood": {
        "name": "Mt. Hood National Forest",
        "bbox": (-122.0, 45.2, -121.4, 45.6),  # WGS84
        "crs": "4326",
        "description": "Oregon forest area"
    }
}

def get_location(key):
    """Get bounding box and CRS for a predefined location."""
    loc = LOCATIONS[key]
    return loc["bbox"], loc["crs"]

# Display available locations
print("Available predefined locations:\n")
for key, info in LOCATIONS.items():
    print(f"  {key:12} - {info['name']}")
    print(f"                {info['description']}\n")
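`LOCATIONS` mixes Web Mercator and WGS84 boxes. If you want to express a WGS84 box in EPSG:3857, the spherical Mercator forward formulas are short enough to write out. This is a minimal sketch, not part of the GridFIA API: `wgs84_to_web_mercator` and `bbox_to_web_mercator` are hypothetical helpers, and a projection library such as pyproj is the safer choice for production work.

```python
import math

# Web Mercator (EPSG:3857) projects onto a sphere with the WGS84 semi-major axis
EARTH_RADIUS_M = 6378137.0
# Web Mercator is only defined up to about +/-85.0511 degrees latitude
MAX_LAT = 85.05112878


def wgs84_to_web_mercator(lon, lat):
    """Project a (lon, lat) point in degrees to Web Mercator metres."""
    if abs(lat) > MAX_LAT:
        raise ValueError(f"latitude {lat} is outside the Web Mercator range")
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def bbox_to_web_mercator(bbox):
    """Convert a (min_lon, min_lat, max_lon, max_lat) box to Web Mercator metres."""
    min_lon, min_lat, max_lon, max_lat = bbox
    min_x, min_y = wgs84_to_web_mercator(min_lon, min_lat)
    max_x, max_y = wgs84_to_web_mercator(max_lon, max_lat)
    return (min_x, min_y, max_x, max_y)


# Example: the Mt. Hood box from LOCATIONS, which is given in WGS84
mt_hood_3857 = bbox_to_web_mercator((-122.0, 45.2, -121.4, 45.6))
print(tuple(round(v) for v in mt_hood_3857))
```

Because the projection increases monotonically in both longitude and latitude, the min/max order of the box is preserved after conversion.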

### 2.2 Download ALL Species Data

Download forest biomass data for ALL species; this is required for valid diversity calculations.

**In Full Analysis Mode this takes 10-15 minutes**, but it is necessary for ecologically meaningful results. In Quick Demo Mode the cell below skips the download.

In [None]:
# Download ALL species data (required for valid diversity metrics)
# Skip if already loaded from cloud in Quick Demo mode

if DATA_LOADED:
    print("Data already loaded from cloud - skipping download.")
    print(f"Using: {zarr_path}")
else:
    bbox, crs = get_location("wake_nc")

    print("Downloading forest data...")
    print(f"  Location: Wake County, NC (subset)")
    print(f"  Bbox: {bbox}")
    print(f"  CRS: EPSG:{crs}")
    print("\n  Downloading ALL species (required for diversity metrics)")
    print("  This takes 10-15 minutes...\n")

    files = api.download_species(
        bbox=bbox,
        crs=crs,
        species_codes=None,  # Download ALL species
        output_dir="quickstart_data"
    )

    print(f"\nDownloaded {len(files)} species files")

### 2.3 Create Zarr Store

Convert the downloaded GeoTIFF files to a cloud-optimized Zarr array format.

In [None]:
# Create Zarr store from downloaded rasters
# Skip if already loaded from cloud in Quick Demo mode

if DATA_LOADED:
    print("Zarr store already created from cloud data - skipping.")
else:
    print("Creating Zarr store...")
    zarr_path = api.create_zarr(
        input_dir="quickstart_data",
        output_path="quickstart_data/forest.zarr"
    )

# Validate and display info
info = api.validate_zarr(zarr_path)

print(f"\nZarr Store Info:")
print(f"  Path: {info['path']}")
print(f"  Shape: {info['shape']} (species, height, width)")
print(f"  Species count: {info['num_species']}")
print(f"  CRS: {info['crs']}")
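The analysis cells below treat the store as a (species, height, width) stack in which layer 0 is total biomass and layers 1..N are individual species (they read `data[0]` as the total and `data[1:]` as species). A minimal NumPy sketch of that layout, assuming the species rasters are already aligned on the same 30 m grid; `build_biomass_stack` is a hypothetical helper and not necessarily how `api.create_zarr` builds the store.

```python
import numpy as np


def build_biomass_stack(species_rasters):
    """Stack aligned species rasters and prepend a total-biomass layer.

    species_rasters: list of 2-D arrays (Mg/ha) that share the same shape.
    Returns an array of shape (n_species + 1, height, width).
    """
    species = np.stack([np.asarray(r, dtype=np.float32) for r in species_rasters])
    # Layer 0: per-pixel sum of all species layers
    total = species.sum(axis=0, keepdims=True)
    return np.concatenate([total, species], axis=0)


# Toy example: two species on a 2 x 2 grid
pine = np.array([[10.0, 0.0], [5.0, 0.0]])
oak = np.array([[30.0, 0.0], [5.0, 2.0]])
stack = build_biomass_stack([pine, oak])
print(stack.shape)  # (3, 2, 2)
print(stack[0])
```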

### 2.4 Calculate Diversity Metrics

Now we can calculate ecologically valid diversity metrics since we have all species.

In [None]:
# Calculate diversity metrics (valid because we have all species)
print("Calculating forest metrics...")

results = api.calculate_metrics(
    zarr_path=zarr_path,
    calculations=[
        "total_biomass",
        "species_richness",
        "shannon_diversity",
        "simpson_diversity",
        "evenness"
    ],
    output_dir="quickstart_results"
)

print(f"\nCompleted {len(results)} calculations:")
for result in results:
    print(f"  - {result.name}: {result.output_path}")
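To see what these metrics compute per pixel, here is a minimal vectorized NumPy sketch based on biomass proportions p_i = species biomass / total biomass. It assumes the layer layout used in this notebook (layer 0 = total, layers 1..N = species) and counts a species as present when its biomass is > 0; GridFIA's own implementation and default threshold may differ. `diversity_metrics` is a hypothetical helper.

```python
import numpy as np


def diversity_metrics(stack):
    """Compute richness, Shannon H', Simpson 1-D and Pielou's J per pixel.

    stack: array of shape (n_species + 1, height, width); layer 0 is total
    biomass, layers 1..N are individual species (Mg/ha).
    Non-forest pixels (total == 0) get 0 for every metric.
    """
    total = stack[0].astype(np.float64)
    species = stack[1:].astype(np.float64)
    forest = total > 0

    # Proportions p_i, left at 0 wherever total biomass is 0
    p = np.divide(species, total, out=np.zeros_like(species), where=forest)

    richness = np.sum(species > 0, axis=0)
    # p * ln(p) is taken as 0 where p == 0 (log of 1.0 avoids log(0) warnings)
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    shannon = -plogp.sum(axis=0)
    simpson = np.where(forest, 1.0 - np.sum(p ** 2, axis=0), 0.0)

    # Pielou's J is undefined for fewer than 2 species; report 0 there
    ln_s = np.log(np.maximum(richness, 1))
    evenness = np.divide(shannon, ln_s, out=np.zeros_like(shannon), where=richness > 1)
    return richness, shannon, simpson, evenness


# Toy stack: pixel (0, 0) has two equal species, (0, 1) one species, row 1 no forest
species = np.array([
    [[5.0, 8.0], [0.0, 0.0]],
    [[5.0, 0.0], [0.0, 0.0]],
])
stack = np.concatenate([species.sum(axis=0, keepdims=True), species])
richness, shannon, simpson, evenness = diversity_metrics(stack)
# Pixel (0, 0) should give richness 2, H' = ln 2, 1-D = 0.5 and J = 1
print(richness[0, 0], shannon[0, 0], simpson[0, 0], evenness[0, 0])
```

Working on whole arrays instead of looping per species keeps the code short; for very large stores you would apply it to one spatial block at a time.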

### 2.5 Visualize Results

In [None]:
# Visualization of diversity metrics
import zarr

# Open the Zarr store read-only
root = zarr.open(str(zarr_path), mode='r')

# Newer stores keep the stack under a 'biomass' key; older stores are the array itself
z = root['biomass'] if 'biomass' in root else root
species_names = list(root.attrs.get('species_names', []))

# Load a 500 x 500 pixel window from the top-left corner of the stack
h, w = min(500, z.shape[1]), min(500, z.shape[2])
data = z[:, :h, :w]

# Layer 0 is total biomass; layers 1..N are individual species.
# Use float64 so proportions and logarithms are not truncated.
total = data[0].astype(np.float64)
forest_mask = total > 0

# Species richness: number of species with nonzero biomass in each pixel
richness = np.sum(data[1:] > 0, axis=0)

# Shannon (H') and Simpson (D = sum of p_i^2), accumulated over species layers
shannon = np.zeros_like(total)
simpson_d = np.zeros_like(total)
for i in range(1, len(data)):
    # p_i: this species' share of total biomass (0 outside forest)
    p = np.zeros_like(total)
    p[forest_mask] = data[i][forest_mask] / total[forest_mask]
    present = p > 0
    shannon[present] -= p[present] * np.log(p[present])
    simpson_d += p ** 2

# Simpson diversity (1 - D); non-forest pixels are set to 0 instead of 1
simpson = np.where(forest_mask, 1 - simpson_d, 0.0)

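# Create 2x2 figure, labelled with the area that was actually loaded
location_label = "Durham County, NC" if QUICK_DEMO else "Wake County, NC (subset)"
fig, axes = plt.subplots(2, 2, figsize=(14, 12))
fig.suptitle(f'Forest Diversity Analysis - {location_label}\n(All Species Included)', fontsize=16, fontweight='bold')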

# Total biomass
ax = axes[0, 0]
vmax = np.percentile(total[total > 0], 98) if np.any(total > 0) else 1
im = ax.imshow(total, cmap='YlGn', vmin=0, vmax=vmax)
ax.set_title('Total Biomass', fontsize=12, fontweight='bold')
ax.axis('off')
plt.colorbar(im, ax=ax, label='Mg/ha', fraction=0.046)

# Species richness
ax = axes[0, 1]
vmax = max(richness.max(), 1)
im = ax.imshow(richness, cmap='Spectral_r', vmin=0, vmax=vmax)
ax.set_title('Species Richness', fontsize=12, fontweight='bold')
ax.axis('off')
plt.colorbar(im, ax=ax, label='Count', fraction=0.046)

# Shannon diversity
ax = axes[1, 0]
vmax = max(shannon.max(), 0.1)
im = ax.imshow(shannon, cmap='viridis', vmin=0, vmax=vmax)
ax.set_title("Shannon Diversity (H')", fontsize=12, fontweight='bold')
ax.axis('off')
plt.colorbar(im, ax=ax, label="H'", fraction=0.046)

# Simpson diversity
ax = axes[1, 1]
im = ax.imshow(simpson, cmap='plasma', vmin=0, vmax=1)
ax.set_title('Simpson Diversity (1-D)', fontsize=12, fontweight='bold')
ax.axis('off')
plt.colorbar(im, ax=ax, label='1-D', fraction=0.046)

plt.tight_layout()
plt.savefig('diversity_analysis.png', dpi=150, bbox_inches='tight', facecolor='white')
plt.show()

print("\nFigure saved to diversity_analysis.png")
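In the figure above, non-forest pixels are drawn with the colormap's lowest color, which can be confused with genuinely low diversity. One option is to mask them with `numpy.ma` so Matplotlib renders them in a separate "bad" color. This is a sketch, not part of GridFIA; `mask_non_forest` and `plot_masked_metric` are hypothetical helpers.

```python
import matplotlib
import matplotlib.pyplot as plt
import numpy as np


def mask_non_forest(metric, total):
    """Return a masked array that hides pixels with zero total biomass."""
    return np.ma.masked_where(total <= 0, metric)


def plot_masked_metric(ax, metric, total, cmap_name="viridis", title=""):
    """Draw a metric map with non-forest pixels shown in light gray."""
    # colormaps[...] returns a copy, so the registered colormap is not modified
    cmap = matplotlib.colormaps[cmap_name].with_extremes(bad="lightgray")
    im = ax.imshow(mask_non_forest(metric, total), cmap=cmap)
    ax.set_title(title)
    ax.axis("off")
    return im


# Toy example: two forest pixels and two non-forest pixels
toy_total = np.array([[0.0, 10.0], [5.0, 0.0]])
toy_metric = np.array([[0.0, 0.8], [0.3, 0.0]])
masked = mask_non_forest(toy_metric, toy_total)
fig, ax = plt.subplots(figsize=(4, 4))
im = plot_masked_metric(ax, toy_metric, toy_total, "plasma", "Toy Simpson map")
```

With the notebook's arrays, a call such as `plot_masked_metric(axes[1, 1], simpson, total, "plasma", "Simpson Diversity (1-D)")` would replace the corresponding `imshow` call.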

---
## 3. List Available Species

The BIGMAP database contains biomass estimates for hundreds of tree species across the US.

In [None]:
# List all available species
species = api.list_species()

print(f"Total species available: {len(species)}")
print("\nFirst 10 species:")
print("-" * 70)

for s in species[:10]:
    print(f"  {s.species_code}: {s.common_name:25} ({s.scientific_name})")

In [None]:
# Search for specific species types
pine_species = [s for s in species if "pine" in s.common_name.lower()]
oak_species = [s for s in species if "oak" in s.common_name.lower()]
maple_species = [s for s in species if "maple" in s.common_name.lower()]

print(f"Pine species: {len(pine_species)}")
print(f"Oak species: {len(oak_species)}")
print(f"Maple species: {len(maple_species)}")

# Display pine species
print("\nPine species:")
for s in pine_species[:8]:
    print(f"  {s.species_code}: {s.common_name}")
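The list comprehensions above can be wrapped in a small reusable filter that also matches scientific names. This sketch relies only on the `species_code`, `common_name` and `scientific_name` attributes used above; the `Species` dataclass is a hypothetical stand-in so the example runs without calling the API. In the notebook you would pass `species` from `api.list_species()` instead.

```python
from dataclasses import dataclass


@dataclass
class Species:
    """Hypothetical stand-in for the objects returned by api.list_species()."""
    species_code: str
    common_name: str
    scientific_name: str


def search_species(species_list, keyword):
    """Return species whose common or scientific name contains keyword (case-insensitive)."""
    key = keyword.lower()
    return [
        s for s in species_list
        if key in s.common_name.lower() or key in s.scientific_name.lower()
    ]


sample = [
    Species("0131", "loblolly pine", "Pinus taeda"),
    Species("0316", "red maple", "Acer rubrum"),
    Species("0802", "white oak", "Quercus alba"),
]
print([s.species_code for s in search_species(sample, "pinus")])
```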

---
## 4. Working with Different Locations

GridFIA can analyze any location within the BIGMAP coverage area, specified via:
- Predefined bounding boxes
- Custom bounding boxes in WGS84 or Web Mercator
- State and county configurations

In [None]:
# Example: Get location config programmatically
config = api.get_location_config(
    state="North Carolina",
    county="Wake"
)

print("Location Configuration:")
print(f"  Name: {config.location_name}")
print(f"  WGS84 Bbox: {config.wgs84_bbox}")
print(f"  Web Mercator Bbox: {config.web_mercator_bbox}")

In [None]:
# Custom bounding box example
# Find your bbox at: https://boundingbox.klokantech.com/

custom_areas = [
    {
        "name": "Yellowstone Region",
        "bbox": (-111.2, 44.0, -109.8, 45.2),
        "crs": "4326"  # WGS84
    },
    {
        "name": "Great Smoky Mountains",
        "bbox": (-84.0, 35.4, -83.0, 36.0),
        "crs": "4326"
    },
    {
        "name": "Olympic Peninsula",
        "bbox": (-125.0, 47.5, -123.0, 48.5),
        "crs": "4326"
    }
]

print("Custom area examples:")
for area in custom_areas:
    print(f"\n  {area['name']}:")
    print(f"    bbox = {area['bbox']}")
    print(f"    crs = \"{area['crs']}\"")
    print(f"    # api.download_species(bbox={area['bbox']}, crs=\"{area['crs']}\", species_codes=None)")
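Download time grows with the area requested, so a rough size estimate before calling `api.download_species` can help. A minimal sketch, assuming a spherical Earth (mean radius 6,371.0088 km) and 30 m pixels, so the numbers are approximate; `bbox_area_km2` is a hypothetical helper, not part of GridFIA. The area of a latitude/longitude box on a sphere is R² · Δλ · (sin φ₂ − sin φ₁), with Δλ in radians.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)
PIXEL_AREA_KM2 = 0.03 * 0.03      # one 30 m x 30 m pixel = 0.0009 km²


def bbox_area_km2(bbox):
    """Approximate area of a WGS84 (min_lon, min_lat, max_lon, max_lat) box in km²."""
    min_lon, min_lat, max_lon, max_lat = bbox
    d_lon = math.radians(max_lon - min_lon)
    return EARTH_MEAN_RADIUS_KM ** 2 * d_lon * (
        math.sin(math.radians(max_lat)) - math.sin(math.radians(min_lat))
    )


# Two of the custom areas listed above
for area in [
    {"name": "Great Smoky Mountains", "bbox": (-84.0, 35.4, -83.0, 36.0)},
    {"name": "Olympic Peninsula", "bbox": (-125.0, 47.5, -123.0, 48.5)},
]:
    km2 = bbox_area_km2(area["bbox"])
    pixels_millions = km2 / PIXEL_AREA_KM2 / 1e6
    print(f"{area['name']}: ~{km2:,.0f} km², ~{pixels_millions:.1f} million pixels per species layer")
```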

---
## 5. Forest Calculations Framework

GridFIA includes a flexible calculation framework with built-in diversity and biomass metrics.

In [None]:
# Reference table of built-in calculations (api.list_calculations() returns the live list)
calculations_info = [
    ("species_richness", "Count of species with biomass > threshold", "count"),
    ("shannon_diversity", "Shannon diversity index (H')", "index"),
    ("simpson_diversity", "Simpson diversity index (1-D)", "index"),
    ("evenness", "Pielou's evenness (J)", "ratio"),
    ("total_biomass", "Sum of all species biomass", "Mg/ha"),
    ("dominant_species", "ID of species with highest biomass", "species_id"),
    ("species_proportion", "Proportion of specific species", "ratio"),
    ("species_percentage", "Percentage of specific species", "percent"),
]

print("Available Forest Calculations:")
print("=" * 80)
print(f"{'Name':25} {'Description':42} {'Units'}")
print("-" * 80)
for name, desc, units in calculations_info:
    print(f"{name:25} {desc:42} {units}")

print("\nNOTE: Diversity metrics (richness, Shannon, Simpson, evenness) require")
print("      ALL species to be downloaded for ecologically valid results!")
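The `dominant_species` and `species_proportion` entries can be sketched in a few NumPy lines, again assuming layer 0 is total biomass and layers 1..N are species. This is an illustration, not GridFIA's implementation: here the dominant-species ID is simply the layer index (1..N) and non-forest pixels get 0. `dominant_species` and `species_proportion` are hypothetical helper names.

```python
import numpy as np


def dominant_species(stack):
    """Layer index (1..N) of the species with the most biomass; 0 where no forest."""
    # argmax returns the first layer on ties; +1 because layer 0 is the total
    dominant = np.argmax(stack[1:], axis=0) + 1
    return np.where(stack[0] > 0, dominant, 0)


def species_proportion(stack, layer):
    """Share of total biomass held by one species layer (0 where no forest)."""
    total = stack[0].astype(np.float64)
    part = stack[layer].astype(np.float64)
    return np.divide(part, total, out=np.zeros_like(total), where=total > 0)


species = np.array([
    [[2.0, 0.0], [9.0, 0.0]],
    [[6.0, 4.0], [1.0, 0.0]],
])
stack = np.concatenate([species.sum(axis=0, keepdims=True), species])
print(dominant_species(stack))
print(species_proportion(stack, 2) * 100)  # a percentage is the proportion times 100
```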

### 5.1 Custom Configuration

Customize calculations with specific parameters.

In [None]:
# Custom configuration example
settings = GridFIASettings(
    output_dir=Path("custom_output"),
    calculations=[
        CalculationConfig(
            name="species_richness",
            parameters={"biomass_threshold": 2.0},  # Count a species only where its biomass exceeds 2.0
            output_format="geotiff"
        ),
        CalculationConfig(
            name="total_biomass",
            output_format="geotiff"
        )
    ]
)

# Create API with custom settings
api_custom = GridFIA(config=settings)
results = api_custom.calculate_metrics(zarr_path=zarr_path)

print(f"Custom calculations completed: {len(results)} metrics")
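The `biomass_threshold` parameter changes which species count as present. A minimal sketch of threshold-based richness, assuming a species counts when its biomass is strictly greater than the threshold (matching "biomass > threshold" in the calculations table) and that the threshold uses the same units as the layers (Mg/ha); `richness_above` is a hypothetical helper.

```python
import numpy as np


def richness_above(stack, threshold=0.0):
    """Count species layers (1..N) whose biomass exceeds threshold in each pixel."""
    return np.sum(stack[1:] > threshold, axis=0)


# One pixel with four species: 0.5, 1.5, 2.5 and 10 Mg/ha
species = np.array([0.5, 1.5, 2.5, 10.0]).reshape(4, 1, 1)
stack = np.concatenate([species.sum(axis=0, keepdims=True), species])
for t in (0.0, 2.0):
    print(f"threshold {t}: richness {richness_above(stack, t)[0, 0]}")
```

Raising the threshold drops species with only trace biomass, so richness maps become less sensitive to small modelled values.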

---
## 6. Visualization

Create publication-quality maps and figures.

In [None]:
# Create various map types using the API
print("Creating visualizations...")

# Diversity maps
maps = api.create_maps(
    zarr_path=zarr_path,
    map_type="diversity",
    output_dir="maps_diversity",
    dpi=150
)
print(f"  Diversity maps: {len(maps)} created")

# Richness map
maps = api.create_maps(
    zarr_path=zarr_path,
    map_type="richness",
    output_dir="maps_richness",
    dpi=150
)
print(f"  Richness maps: {len(maps)} created")

---
## 7. Understanding Diversity Metrics

Here's what each diversity metric tells us:

In [None]:
diversity_info = """
DIVERSITY METRICS EXPLAINED
===========================

1. SPECIES RICHNESS
   - Definition: Count of species present per pixel
   - Range: 0 to total species count
   - Interpretation: Higher = more species in that location
   - Limitation: Doesn't account for abundance

2. SHANNON DIVERSITY (H')
   - Definition: Information entropy of biomass proportions
   - Formula: H' = -Σ(p_i × ln(p_i)), where p_i = species i's share of total biomass
   - Range: 0 to ln(S) where S = number of species present
   - Interpretation: 
     * H' = 0: Only one species present
     * Higher H': More diverse community
   - Accounts for: Both richness and evenness

3. SIMPSON DIVERSITY (1-D)
   - Definition: Probability that two randomly drawn units of biomass
     (drawn with replacement) belong to different species
   - Formula: 1 - Σ(p_i²)
   - Range: 0 to 1 - 1/S
   - Interpretation:
     * 0: Single species
     * 1 - 1/S: Maximum for S species (equal proportions); approaches 1 as S grows
   - More sensitive to: Dominant species

4. EVENNESS (Pielou's J)
   - Definition: How equally biomass is distributed
   - Formula: J = H' / ln(S)
   - Range: 0 to 1 (undefined when S = 1)
   - Interpretation:
     * Near 0: One species holds almost all biomass
     * 1: All species have equal biomass

IMPORTANT: All diversity metrics require ALL species data to be valid!
           Downloading only 2-3 species produces meaningless results.
"""

print(diversity_info)
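A worked example makes these ranges concrete. For a pixel where two species hold 75% and 25% of the biomass, H' = -(0.75 ln 0.75 + 0.25 ln 0.25) ≈ 0.562, 1-D = 1 - (0.75² + 0.25²) = 0.375 and J = H'/ln 2 ≈ 0.811. The short check below reproduces these numbers from the formulas above and shows that Simpson's 1-D for S equally abundant species is 1 - 1/S, so it approaches but never reaches 1. The helper names are illustrative only.

```python
import math


def shannon(p):
    """Shannon H' = -sum(p_i ln p_i), skipping zero proportions."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0)


def simpson(p):
    """Simpson diversity 1 - sum(p_i^2)."""
    return 1 - sum(p_i ** 2 for p_i in p)


def pielou(p):
    """Pielou's J = H' / ln(S), with S = species present (requires S >= 2)."""
    s = sum(1 for p_i in p if p_i > 0)
    return shannon(p) / math.log(s)


p = [0.75, 0.25]
print(round(shannon(p), 3), round(simpson(p), 3), round(pielou(p), 3))  # 0.562 0.375 0.811

# Equal proportions: 1 - D reaches its maximum of 1 - 1/S, not 1
for s in (2, 4, 10):
    print(s, round(simpson([1 / s] * s), 3))
```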

---
## 8. Species Group Analysis

Analyze specific groups of species (e.g., hardwoods vs. softwoods, Southern Yellow Pines).

In [None]:
# Species group definitions (example codes)
SPECIES_GROUPS = {
    "southern_yellow_pine": {
        "name": "Southern Yellow Pine",
        "codes": ["0110", "0111", "0121", "0131"],
        "species": ["Shortleaf Pine", "Slash Pine", "Longleaf Pine", "Loblolly Pine"],
        "description": "Important commercial timber species in the Southeast"
    },
    "oaks": {
        "name": "Oak Species",
        "codes": ["0802", "0833", "0812", "0837"],
        "species": ["White Oak", "Northern Red Oak", "Southern Red Oak", "Black Oak"],
        "description": "Hardwood species valuable for wildlife and timber"
    },
    "maples": {
        "name": "Maple Species",
        "codes": ["0316", "0318", "0317"],
        "species": ["Red Maple", "Sugar Maple", "Silver Maple"],
        "description": "Common hardwood species across eastern US"
    }
}

print("Species Group Definitions:")
print("=" * 60)
for key, group in SPECIES_GROUPS.items():
    print(f"\n{group['name']}:")
    print(f"  {group['description']}")
    print(f"  Species:")
    for code, name in zip(group['codes'], group['species']):
        print(f"    {code}: {name}")
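To turn a group definition into a map, one approach is to sum the layers whose species codes belong to the group. This sketch assumes you have a list giving the FIA species code of each species layer, in layer order; this notebook does not show whether or how a GridFIA Zarr store records those codes, so `layer_codes` is hypothetical, as is `group_biomass`.

```python
import numpy as np


def group_biomass(stack, layer_codes, group_codes):
    """Sum biomass of the species layers whose code is in group_codes.

    stack: (n_species + 1, height, width), layer 0 = total biomass.
    layer_codes: FIA species code for each species layer 1..N, in order.
    """
    wanted = set(group_codes)
    idx = [i + 1 for i, code in enumerate(layer_codes) if code in wanted]
    if not idx:
        return np.zeros(stack.shape[1:], dtype=np.float64)
    return stack[idx].sum(axis=0)


layer_codes = ["0131", "0316", "0110"]  # loblolly pine, red maple, shortleaf pine
species = np.array([
    [[20.0, 5.0]],
    [[3.0, 7.0]],
    [[4.0, 0.0]],
])
stack = np.concatenate([species.sum(axis=0, keepdims=True), species])
syp = group_biomass(stack, layer_codes, ["0110", "0111", "0121", "0131"])
print(syp)             # southern yellow pine biomass per pixel
print(syp / stack[0])  # share of total biomass
```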

---
## 9. Statistical Summary

Calculate comprehensive statistics from Zarr data.

In [None]:
def calculate_forest_statistics(zarr_path):
    """Calculate comprehensive forest statistics from Zarr data."""
    import zarr
    
    root = zarr.open(str(zarr_path), mode='r')
    
    # Newer stores keep the stack under a 'biomass' key; older stores are the array itself
    z = root['biomass'] if 'biomass' in root else root
    species_names = list(root.attrs.get('species_names', []))
    
    # Load data (sample if large)
    if z.shape[1] * z.shape[2] > 1e6:
        data = z[:, :1000, :1000]
        print("Note: Using 1000x1000 sample for statistics")
    else:
        data = z[:]
    
    total = data[0]
    forest_mask = total > 0
    
    stats = {
        "total_pixels": int(total.size),
        "forest_pixels": int(np.sum(forest_mask)),
        "forest_coverage_pct": float(100 * np.sum(forest_mask) / total.size),
        "forest_area_ha": float(np.sum(forest_mask) * 900 / 10000),  # 30m pixels
        "mean_biomass": float(np.mean(total[forest_mask])) if np.any(forest_mask) else 0,
        "max_biomass": float(np.max(total)),
        "total_biomass_mg": float(np.sum(total) * 900 / 10000),  # Convert to Mg
        "num_species": int(data.shape[0]) - 1,  # All layers minus the total-biomass layer 0
    }
    
    # Richness stats
    richness = np.sum(data[1:] > 0, axis=0)
    stats["mean_richness"] = float(np.mean(richness[forest_mask])) if np.any(forest_mask) else 0
    stats["max_richness"] = int(np.max(richness))
    
    return stats

# Calculate statistics
stats = calculate_forest_statistics(zarr_path)

print("Forest Statistics Summary")
print("=" * 50)
print(f"\nCoverage:")
print(f"  Total area: {stats['total_pixels'] * 900 / 1e6:.2f} km²")
print(f"  Forest area: {stats['forest_area_ha']:.1f} hectares")
print(f"  Forest coverage: {stats['forest_coverage_pct']:.1f}%")
print(f"\nBiomass:")
print(f"  Mean: {stats['mean_biomass']:.1f} Mg/ha")
print(f"  Maximum: {stats['max_biomass']:.1f} Mg/ha")
print(f"  Total: {stats['total_biomass_mg']/1e6:.3f} million Mg")
print(f"\nDiversity (valid - all {stats['num_species']} species included):")
print(f"  Mean richness: {stats['mean_richness']:.2f} species/pixel")
print(f"  Max richness: {stats['max_richness']} species/pixel")
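The function above falls back to a 1000 x 1000 sample for large stores, so its numbers then describe only that corner. One way to cover the full extent without loading everything at once is to read the total-biomass layer in row blocks and accumulate sums. A sketch, assuming layer 0 is total biomass in Mg/ha on 30 m pixels; it works on any array that supports NumPy-style slicing (a zarr array or a NumPy array). `blockwise_forest_stats` is a hypothetical helper.

```python
import numpy as np

PIXEL_HA = 30 * 30 / 10_000  # one 30 m pixel = 0.09 ha


def blockwise_forest_stats(arr, block_rows=1000):
    """Accumulate forest statistics from layer 0 of arr, one row block at a time."""
    height = arr.shape[1]
    n_pixels = n_forest = 0
    biomass_sum = 0.0
    biomass_max = 0.0
    for start in range(0, height, block_rows):
        # Only this block of the total-biomass layer is held in memory
        block = np.asarray(arr[0, start:start + block_rows, :], dtype=np.float64)
        forest = block > 0
        n_pixels += block.size
        n_forest += int(forest.sum())
        biomass_sum += float(block[forest].sum())
        if block.size:
            biomass_max = max(biomass_max, float(block.max()))
    return {
        "total_pixels": n_pixels,
        "forest_pixels": n_forest,
        "forest_area_ha": n_forest * PIXEL_HA,
        "mean_biomass": biomass_sum / n_forest if n_forest else 0.0,
        "max_biomass": biomass_max,
        "total_biomass_mg": biomass_sum * PIXEL_HA,  # Mg/ha x ha per pixel
    }


# Synthetic stack: 1,000 forest pixels at 50 Mg/ha in a 2500 x 40 grid
demo = np.zeros((3, 2500, 40), dtype=np.float32)
demo[0, :100, :10] = 50.0
stats_full = blockwise_forest_stats(demo, block_rows=1000)
print(stats_full)
```

With the notebook's store, you could pass the array opened from `zarr_path` (for example `root['biomass']`) instead of `demo`.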

---
## 10. Cleanup

Remove downloaded data and results.

In [None]:
# Cleanup downloaded files and results
CLEANUP = False  # Set to True to remove all downloaded data

if CLEANUP:
    directories_to_remove = [
        "quickstart_data",
        "quickstart_results",
        "custom_output",
        "maps_diversity",
        "maps_richness",
    ]
    
    for dir_name in directories_to_remove:
        dir_path = Path(dir_name)
        if dir_path.exists():
            shutil.rmtree(dir_path)
            print(f"Removed: {dir_name}")
    
    # Remove individual files
    for file_name in ["diversity_analysis.png"]:
        if Path(file_name).exists():
            Path(file_name).unlink()
            print(f"Removed: {file_name}")
    
    print("\nCleanup complete!")
else:
    print("Cleanup skipped. Set CLEANUP = True to remove downloaded data.")

---
## Summary & Next Steps

In this notebook, you learned how to:

1. **Install & configure** GridFIA
2. **Download ALL species** for valid diversity analysis
3. **Create** cloud-optimized Zarr stores
4. **Calculate** diversity metrics (richness, Shannon, Simpson, evenness)
5. **Visualize** forest data with publication-quality maps
6. **Analyze** species groups and forest statistics

### Key Takeaways

- **Always download ALL species** (`species_codes=None`) for diversity analysis
- **Custom locations**: Use bounding boxes from https://boundingbox.klokantech.com/
- **Smaller areas** download faster and still give valid results, as long as all species are included

### Resources

- **Documentation**: https://grid.fiatools.org
- **GitHub**: https://github.com/mihiarc/gridfia
- **PyPI**: https://pypi.org/project/gridfia/

### Related FIA Tools

- **PyFIA**: Survey/plot data analysis
- **PyFVS**: Growth/yield simulation
- **AskFIA**: AI conversational interface