# Getting Started with FIA Data in Python

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mihiarc/fiatools/blob/main/tutorials/01_getting_started_with_fia_data.ipynb)
[![FIAtools](https://img.shields.io/badge/FIAtools-Ecosystem-2E7D32)](https://fiatools.org)

This tutorial introduces you to working with USDA Forest Service Forest Inventory and Analysis (FIA) data using Python and the **pyFIA** library.

## What You'll Learn

- What FIA data is and why it's useful
- How to install pyFIA and set up your environment
- How to query forest area, timber volume, biomass, and tree counts
- How to filter data by state, species, and other criteria
- How to interpret the statistical output

## Prerequisites

- Basic Python knowledge
- Python 3.11 or higher

---

## 1. What is FIA Data?

The [Forest Inventory and Analysis (FIA)](https://www.fia.fs.usda.gov/) program is the nation's forest census. The USDA Forest Service collects data on:

- **Forest area** - How much land is forested?
- **Timber volume** - How much merchantable wood is available?
- **Biomass & Carbon** - How much carbon is stored in forests?
- **Tree counts** - How many trees per acre by species?
- **Forest health** - Mortality rates, growth, removals

FIA data is collected from ~300,000 permanent sample plots across the US, with each plot revisited every 5-10 years.

### The Problem with EVALIDator

The official tool for querying FIA data is [EVALIDator](https://apps.fs.usda.gov/fiadb-api/evalidator), a web-based interface. While useful, it has limitations:

- Manual point-and-click workflow
- Limited options for custom filtering and grouping
- Queries are hard to script, rerun, or version-control
- Slow for large or repeated queries

### Why pyFIA?

**pyFIA** provides programmatic access to FIA data with:

- Python API for scripting and automation
- DuckDB backend for fast queries
- EVALIDator-compatible statistical methods
- Full control over filtering and grouping

## 2. Installation

Install pyFIA from PyPI:

```python
# Install pyFIA (uncomment to run in a notebook cell)
# !pip install pyfia
```

For spatial analysis support (optional; the quotes stop shells such as zsh from interpreting the brackets):

```bash
pip install "pyfia[spatial]"
```

## 3. Getting the FIA Database

pyFIA works with a local DuckDB database containing FIA data. You have two options:

### Option A: Download Pre-built Database (Recommended)

Download a pre-built database from the pyFIA releases or use the downloader:

```python
from pyfia.downloader import FIADownloader

# Download FIA data for a specific state (uncomment to run)
# downloader = FIADownloader()
# downloader.download_state("NC")  # North Carolina
```

### Option B: Use Existing Database

If you already have a DuckDB file with FIA data, pass its path to `FIA` (see the next section).
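Before pointing pyFIA at an existing file, it can help to confirm the file actually contains FIA tables. The sketch below lists tables through DuckDB's `information_schema` and compares them against a set of core FIADB tables used in design-based estimation. The table set is an assumption: pyFIA's exact requirements may differ, so adjust `REQUIRED_TABLES` as needed.

```python
# Core FIADB tables typically used for design-based estimation.
# ASSUMPTION: pyFIA's exact table requirements may differ; adjust as needed.
REQUIRED_TABLES = {
    "PLOT", "COND", "TREE",
    "POP_EVAL", "POP_EVAL_TYP", "POP_ESTN_UNIT",
    "POP_STRATUM", "POP_PLOT_STRATUM_ASSGN",
}


def missing_fia_tables(table_names) -> set[str]:
    """Return required tables absent from `table_names` (case-insensitive)."""
    present = {name.upper() for name in table_names}
    return REQUIRED_TABLES - present


def list_duckdb_tables(db_path: str) -> list[str]:
    """List table names in a DuckDB file, opened read-only."""
    import duckdb  # imported here so missing_fia_tables works without duckdb

    con = duckdb.connect(db_path, read_only=True)
    try:
        rows = con.execute(
            "SELECT table_name FROM information_schema.tables"
        ).fetchall()
    finally:
        con.close()
    return [row[0] for row in rows]
```

For example, `missing_fia_tables(list_duckdb_tables("path/to/your/fia.duckdb"))` returns an empty set when every listed table is present.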

## 4. Connecting to the Database

Use the `FIA` context manager to connect to your database:

```python
from pyfia import FIA

# Connect to the FIA database
# Replace with your actual database path
DB_PATH = "path/to/your/fia.duckdb"

# Example: connect and filter to North Carolina (uncomment to run)
# with FIA(DB_PATH) as db:
#     db.clip_by_state(37)  # 37 = North Carolina FIPS code
#     db.clip_most_recent(eval_type="EXPVOL")
#     print("Connected to FIA database")
```

### State FIPS Codes

FIA uses FIPS codes to identify states. Common codes:

| State | FIPS | State | FIPS |
|-------|------|-------|------|
| Alabama | 1 | Montana | 30 |
| California | 6 | North Carolina | 37 |
| Colorado | 8 | Oregon | 41 |
| Florida | 12 | Texas | 48 |
| Georgia | 13 | Virginia | 51 |
| Maine | 23 | Washington | 53 |

See the full list at [FIPS state codes](https://www.census.gov/library/reference/code-lists/ansi.html).
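If you prefer to work with state abbreviations, a small lookup built from the table above keeps FIPS codes out of your analysis code. This is a sketch; `STATE_FIPS` and `state_fips` are hypothetical names and cover only the states listed in the table.

```python
# State FIPS codes from the table above, keyed by USPS abbreviation.
STATE_FIPS = {
    "AL": 1, "CA": 6, "CO": 8, "FL": 12, "GA": 13, "ME": 23,
    "MT": 30, "NC": 37, "OR": 41, "TX": 48, "VA": 51, "WA": 53,
}


def state_fips(abbrev: str) -> int:
    """Look up a state's FIPS code by its two-letter abbreviation."""
    try:
        return STATE_FIPS[abbrev.strip().upper()]
    except KeyError:
        raise KeyError(
            f"No FIPS code listed for {abbrev!r}; see the full census list"
        ) from None
```

Usage: `db.clip_by_state(state_fips("NC"))` is equivalent to `db.clip_by_state(37)`.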

## 5. Core Queries

pyFIA provides functions for common forest metrics. Let's explore each one.

### 5.1 Forest Area

Query the total forest land area:

```python
from pyfia import FIA, area

# Example: forest area for North Carolina (uncomment to run)
# with FIA(DB_PATH) as db:
#     db.clip_by_state(37)
#     db.clip_most_recent(eval_type="EXPVOL")
#
#     # Total forest area
#     forest_area = area(db, land_type="forest")
#     print(forest_area)
#
#     # Forest area by forest type
#     area_by_type = area(db, land_type="forest", grp_by="FORTYPCD")
#     print(area_by_type)
```

**Parameters:**
- `land_type`: "forest", "timber", "all"
- `grp_by`: Group results by column(s) like "FORTYPCD" (forest type), "OWNGRPCD" (ownership)
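Grouped results come back keyed by numeric codes. The sketch below attaches readable names for `OWNGRPCD` using the ownership-group codes documented in the FIADB User Guide. It works on plain row dictionaries, such as those returned by Polars' `DataFrame.to_dicts()`; the helper and the `OWNGRP_NAME` field are hypothetical, not pyFIA output.

```python
# OWNGRPCD ownership-group codes as documented in the FIADB User Guide (COND table).
OWNGRPCD_LABELS = {
    10: "Forest Service",
    20: "Other federal",
    30: "State and local government",
    40: "Private and Native American",
}


def add_ownership_labels(rows: list[dict]) -> list[dict]:
    """Return copies of result rows with a hypothetical OWNGRP_NAME field added."""
    return [
        # Unrecognized or missing codes are labeled "Unknown"
        {**row, "OWNGRP_NAME": OWNGRPCD_LABELS.get(row.get("OWNGRPCD"), "Unknown")}
        for row in rows
    ]
```

Usage: `add_ownership_labels(area(db, land_type="forest", grp_by="OWNGRPCD").to_dicts())`.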

### 5.2 Timber Volume

Query merchantable timber volume:

```python
from pyfia import FIA, volume

# Example: timber volume for North Carolina (uncomment to run)
# with FIA(DB_PATH) as db:
#     db.clip_by_state(37)
#     db.clip_most_recent(eval_type="EXPVOL")
#
#     # Total merchantable volume on timberland
#     timber_vol = volume(db, land_type="timber", tree_type="gs")
#     print(timber_vol)
#
#     # Volume by species
#     vol_by_species = volume(db, land_type="timber", by_species=True)
#     print(vol_by_species)
#
#     # Volume by diameter class
#     vol_by_size = volume(db, land_type="timber", by_size_class=True)
#     print(vol_by_size)
```

**Parameters:**
- `land_type`: "forest", "timber"
- `tree_type`: "all", "gs" (growing stock)
- `vol_type`: "net", "gross", "sawlog"
- `by_species`: Group by species code
- `by_size_class`: Group by diameter class
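Volume estimates are reported in cubic feet per acre. If you need metric units, the conversion is pure arithmetic using the exact definitions of the international foot and the acre (43,560 ft²). A minimal sketch:

```python
# Exact conversion constants (international foot; acre = 43,560 square feet).
CUBIC_METERS_PER_CUBIC_FOOT = 0.028316846592
HECTARES_PER_ACRE = 0.40468564224


def cuft_per_acre_to_m3_per_ha(value: float) -> float:
    """Convert a per-acre volume in cubic feet to cubic meters per hectare."""
    return value * CUBIC_METERS_PER_CUBIC_FOOT / HECTARES_PER_ACRE


print(round(cuft_per_acre_to_m3_per_ha(1000), 2))  # 69.97
```

The combined factor is about 0.06997, so 1,000 ft³/acre is roughly 70 m³/ha.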

### 5.3 Biomass and Carbon

Query above-ground and below-ground biomass:

```python
from pyfia import FIA, biomass

# Example: biomass for North Carolina (uncomment to run)
# with FIA(DB_PATH) as db:
#     db.clip_by_state(37)
#     db.clip_most_recent(eval_type="EXPVOL")
#
#     # Total biomass
#     total_biomass = biomass(db)
#     print(total_biomass)
#
#     # Biomass by species, highest above-ground biomass per acre first
#     biomass_by_species = biomass(db, by_species=True)
#     print(biomass_by_species.sort("DRYBIO_AG_ACRE", descending=True).head(10))
```

**Output columns:**
- `DRYBIO_AG_ACRE`: Above-ground dry biomass (tons/acre)
- `DRYBIO_BG_ACRE`: Below-ground dry biomass (tons/acre)
- `CARBON_AG_ACRE`: Above-ground carbon (tons/acre)
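For reporting, biomass is often converted to metric units, carbon, or CO₂-equivalent. The sketch below assumes "tons" means US short tons (2,000 lb) and uses a fixed carbon fraction of 0.5 as a common rule of thumb. That fraction is an assumption: FIA's own carbon estimates may use species-specific fractions, so prefer the `CARBON_AG_ACRE` column when it is available.

```python
# Exact conversion constants.
TONNES_PER_SHORT_TON = 0.90718474   # 2,000 lb expressed in metric tonnes
HECTARES_PER_ACRE = 0.40468564224
CO2_PER_C = 44.0 / 12.0             # molar mass ratio of CO2 to carbon


def short_tons_per_acre_to_tonnes_per_ha(value: float) -> float:
    """Convert short tons per acre to metric tonnes per hectare."""
    return value * TONNES_PER_SHORT_TON / HECTARES_PER_ACRE


def carbon_from_biomass(dry_biomass: float, carbon_fraction: float = 0.5) -> float:
    """Approximate carbon from dry biomass (ASSUMPTION: fixed carbon fraction)."""
    return dry_biomass * carbon_fraction


def co2e_from_carbon(carbon: float) -> float:
    """Convert a carbon mass to CO2-equivalent mass in the same units."""
    return carbon * CO2_PER_C
```

For example, 1 short ton/acre is about 2.24 tonnes/ha, and each unit of carbon corresponds to about 3.67 units of CO₂.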

### 5.4 Trees Per Acre

Query tree density:

```python
from pyfia import FIA, tpa

# Example: trees per acre for North Carolina (uncomment to run)
# with FIA(DB_PATH) as db:
#     db.clip_by_state(37)
#     db.clip_most_recent(eval_type="EXPVOL")
#
#     # All live trees
#     live_trees = tpa(db, tree_domain="STATUSCD == 1")
#     print(live_trees)
#
#     # Live trees >= 5 inches DBH
#     large_trees = tpa(db, tree_domain="STATUSCD == 1 AND DIA >= 5.0")
#     print(large_trees)
#
#     # Trees by species
#     tpa_by_species = tpa(db, tree_domain="STATUSCD == 1", by_species=True)
#     print(tpa_by_species.sort("TPA_UNADJ", descending=True).head(10))
```

**Tree Domain Filters** (combine conditions with `AND`):
- `STATUSCD == 1`: Live trees only
- `STATUSCD == 2`: Dead trees only
- `DIA >= 5.0`: Trees 5.0 inches diameter at breast height (DBH) or larger
- `SPCD == 131`: Loblolly pine only (species code 131)
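When filters vary between runs, building the domain string from parameters avoids typos in hand-written expressions. This is a hypothetical helper that only formats text in the same style as the filters above; it does not validate column names against the database.

```python
def build_tree_domain(status=None, min_dia=None, spcd=None) -> str:
    """Combine optional tree filters into one domain string joined with AND.

    Returns an empty string when no filter is given; callers should then
    omit the tree_domain argument rather than pass "".
    """
    clauses = []
    if status is not None:
        clauses.append(f"STATUSCD == {int(status)}")   # 1 = live, 2 = dead
    if min_dia is not None:
        clauses.append(f"DIA >= {float(min_dia)}")     # DBH in inches
    if spcd is not None:
        clauses.append(f"SPCD == {int(spcd)}")         # FIA species code
    return " AND ".join(clauses)


print(build_tree_domain(status=1, min_dia=5, spcd=131))
# STATUSCD == 1 AND DIA >= 5.0 AND SPCD == 131
```

Usage: `tpa(db, tree_domain=build_tree_domain(status=1, min_dia=5))`.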

## 6. Understanding the Output

pyFIA returns Polars DataFrames with statistical estimates. Key columns:

| Column | Description |
|--------|-------------|
| `*_ACRE` | Per-acre estimate (e.g., `VOLCFNET_ACRE`) |
| `*_SE` | Standard error of the estimate |
| `*_TOTAL` | Total estimate for the area |
| `nPlots_TREE` | Number of plots in the estimate |

### Interpreting Standard Error

The standard error (SE) measures the sampling uncertainty of an estimate. An approximate 95% confidence interval is:

```
Estimate ± (1.96 × SE)
```

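A lower SE means a more precise estimate. SE generally shrinks as more plots contribute to an estimate, so estimates for small areas or narrow domains (such as a single rare species) tend to be less precise.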
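The interval formula above, plus SE expressed as a percentage of the estimate (handy for comparing precision across estimates of different size), can be computed directly from any `*_ACRE`/`*_SE` or `*_TOTAL`/`*_SE` pair. A minimal sketch using made-up numbers:

```python
def confidence_interval(estimate: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval: estimate ± z × SE (z = 1.96 gives ~95%)."""
    half_width = z * se
    return estimate - half_width, estimate + half_width


def percent_se(estimate: float, se: float) -> float:
    """SE as a percentage of the estimate, for comparing relative precision."""
    if estimate == 0:
        raise ValueError("percent SE is undefined for a zero estimate")
    return 100.0 * se / abs(estimate)


low, high = confidence_interval(1200.0, 50.0)
print(f"95% CI: {low:,.0f} to {high:,.0f}")                     # 95% CI: 1,102 to 1,298
print(f"SE: {percent_se(1200.0, 50.0):.1f}% of the estimate")   # SE: 4.2% of the estimate
```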

## 7. Putting It All Together

Here's a complete example that generates a forest summary report:

```python
from pyfia import FIA, area, volume, biomass, tpa


def first_value(df, column):
    """Return `column` from the first row of `df`, or None if it is unavailable."""
    if df.is_empty() or column not in df.columns:
        return None
    return df.row(0, named=True)[column]


def print_metric(label, value, fmt, unit):
    """Print one metric, showing 'N/A' instead of failing on a missing value."""
    text = f"{value:{fmt}} {unit}" if value is not None else "N/A"
    print(f"{label}: {text}")


def generate_forest_summary(db_path: str, state_fips: int, state_name: str) -> None:
    """Print a summary of forest resources for a state."""
    with FIA(db_path) as db:
        # Filter to state and most recent inventory
        db.clip_by_state(state_fips)
        db.clip_most_recent(eval_type="EXPVOL")

        # Get forest metrics
        forest_area_result = area(db, land_type="forest")
        timber_vol_result = volume(db, land_type="timber", tree_type="gs")
        biomass_result = biomass(db)
        tpa_result = tpa(db, tree_domain="STATUSCD == 1")

        # Print summary; missing columns or empty results show as N/A
        print(f"\n{'=' * 50}")
        print(f"Forest Summary: {state_name}")
        print(f"{'=' * 50}\n")

        print_metric("Forest Area", first_value(forest_area_result, "AREA_TOTAL"), ",.0f", "acres")
        print_metric("Timber Volume", first_value(timber_vol_result, "VOLCFNET_ACRE"), ",.1f", "ft³/acre")
        print_metric("Above-ground Biomass", first_value(biomass_result, "DRYBIO_AG_ACRE"), ",.1f", "tons/acre")
        print_metric("Live Trees", first_value(tpa_result, "TPA_UNADJ"), ",.0f", "trees/acre")

        print(f"\n{'=' * 50}")


# Uncomment to run with your database:
# generate_forest_summary("path/to/fia.duckdb", 37, "North Carolina")
```

## 8. Next Steps

Now that you understand the basics, explore more:

### More pyFIA Features
- **Mortality analysis**: `mortality(db)` for tree death rates
- **Growth estimation**: `growth(db)` for net growth
- **Custom grouping**: Use `grp_by` parameter for any FIA column

### Other FIAtools
- **[gridFIA](https://fiatools.org/tools/gridfia/)**: 30m resolution biomass maps
- **[pyFVS](https://fiatools.org/tools/pyfvs/)**: Forest growth simulation
- **[askFIA](https://fiatools.org/tools/askfia/)**: Natural language queries

### Resources
- [pyFIA Documentation](https://mihiarc.github.io/pyfia/)
- [FIAtools Website](https://fiatools.org)
- [FIA DataMart](https://apps.fs.usda.gov/fia/datamart/datamart.html)
- [EVALIDator](https://apps.fs.usda.gov/fiadb-api/evalidator) (for comparison)

---

**Questions or feedback?** Open an issue on [GitHub](https://github.com/mihiarc/pyfia/issues).