# AmeriFlux Data Processing with `converter.py`
This tutorial notebook shows how to use the two key classes in `converter.py`, **`AmerifluxDataProcessor`** and **`Reformatter`**, to:
1. Parse raw Campbell Scientific **TOA5** or **AmeriFlux Level‑2** CSV files.
2. Clean, standardize, and resample them for downstream analysis.

Run each code cell sequentially, adjusting the file paths to your own data.

## Prerequisites
```bash
pip install pandas numpy pyyaml
```
Make sure the *micromet* module (or the individual `converter.py` file plus its helpers) is on your `PYTHONPATH`, or is located in the same directory as this notebook.
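
Before running the cells below, you can confirm that the package is importable by asking Python's import system for its module spec. The following is a minimal sketch using the standard-library `importlib.util.find_spec`. `module_available` is a hypothetical helper name and is not part of micromet. If you rely on the `sys.path.append(...)` line in the next cell, run this check after that line.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be found on the current sys.path."""
    # find_spec returns None for a top-level module it cannot locate
    return importlib.util.find_spec(name) is not None


print("micromet importable:", module_available("micromet"))
```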

```python
import logging
import pathlib
import sys

import pandas as pd

# Make the local micromet source tree importable (adjust to your checkout)
sys.path.append("../../src/")
from micromet import AmerifluxDataProcessor, Reformatter

# Show informational messages from the helper classes
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
```

## 1. Parse a raw datalogger file
The processor detects whether the file is **TOA5** (four‑row header) or already in AmeriFlux Level‑2 format (single header row).

```python
# ➡️  Update this path to point to a real file on your system
example_file = pathlib.Path('station_data/US-CdM/21020_Flux_AmeriFluxFormat_2.dat')

proc = AmerifluxDataProcessor()
raw_df = proc.to_dataframe(example_file)
raw_df.head()
```
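
The header detection described above can be sketched with the standard library. This is a hypothetical helper (`detect_header_format` and `header_rows_to_skip` are illustrative names, not the code inside `converter.py`). It relies on the fact that a Campbell Scientific TOA5 file begins with a metadata row whose first field is `TOA5`, followed by the column-name, units, and processing rows. Any other first row is treated here as a single-header AmeriFlux-style file.

```python
import csv
import pathlib


def detect_header_format(path: pathlib.Path) -> str:
    """Hypothetical helper: classify a logger file by its first row."""
    with open(path, newline="") as fh:
        # csv.reader strips the quotes around "TOA5"
        first_row = next(csv.reader(fh), [])
    if first_row and first_row[0].strip() == "TOA5":
        return "TOA5"
    return "AmeriFlux"


def header_rows_to_skip(fmt: str) -> list[int]:
    # For TOA5, keep row 1 (column names) and skip the metadata (0),
    # units (2) and processing (3) rows; single-header files skip nothing.
    return [0, 2, 3] if fmt == "TOA5" else []
```

With pandas, the result could be used as `pd.read_csv(path, skiprows=header_rows_to_skip(fmt), na_values=["NAN", -9999])`, since TOA5 files write missing values as `NAN` and AmeriFlux files use `-9999`.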

## 2. Clean & resample with `Reformatter`
This will:
- Convert timestamps to `datetime` and enforce 30‑min spacing
- Rename columns to a consistent schema
- Remove outliers that fall outside the hard min/max limits in `variable_limits.py`
- Apply several variable‑specific fixes (e.g., Tau zeros, SWC percent‑to‑fraction)
- (Optionally) drop redundant soil columns

```python
rf = Reformatter(drop_soil=False)  # set to True to drop extra soil channels
clean_df = rf.prepare(raw_df)
clean_df.head()
```
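
Two of the steps listed above, enforcing 30-min spacing and limit-based outlier removal, can be sketched in plain pandas. This is an illustrative sketch, not `Reformatter`'s implementation. It assumes AmeriFlux-style `TIMESTAMP_START` stamps stored as integers or strings in `YYYYMMDDHHMM` form, and `EXAMPLE_LIMITS` holds hypothetical placeholder bounds; the real values live in `variable_limits.py`.

```python
import pandas as pd

# Hypothetical placeholder limits; see variable_limits.py for the real ones.
EXAMPLE_LIMITS = {"NETRAD": (-250.0, 1000.0), "TA_1_1_1": (-50.0, 50.0)}


def to_half_hourly(df: pd.DataFrame, ts_col: str = "TIMESTAMP_START") -> pd.DataFrame:
    """Parse YYYYMMDDHHMM stamps and force a regular 30-min index."""
    out = df.copy()
    # Assumes integer or string stamps; float stamps would need casting first
    stamps = pd.to_datetime(out.pop(ts_col).astype(str), format="%Y%m%d%H%M")
    out.index = pd.DatetimeIndex(stamps, name=ts_col)
    # Drop repeated timestamps (keep the first), then sort chronologically
    out = out[~out.index.duplicated(keep="first")].sort_index()
    # asfreq inserts all-NaN rows for missing half-hours; it does not interpolate
    return out.asfreq("30min")


def apply_limits(df: pd.DataFrame, limits: dict) -> pd.DataFrame:
    """Replace values outside [lo, hi] with NaN (this also catches -9999)."""
    out = df.copy()
    for col, (lo, hi) in limits.items():
        if col in out.columns:  # skip variables this station does not record
            out[col] = out[col].where(out[col].between(lo, hi))
    return out
```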

### Quick diagnostics
Use standard pandas tools such as `describe()` to check that the cleaned values fall in plausible ranges.

```python
# Basic stats for a few key variables
# (column names depend on your station's sensors; edit the list as needed)
clean_df[['NETRAD', 'SW_IN_1_1_2', 'SWC_3_1_1']].describe().T
```
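
Beyond `describe()`, two quick checks are the fraction of missing values per column and the number of time steps that break the expected 30-min spacing. The helpers below are hypothetical and assume the timestamps are in a `DatetimeIndex`. If your cleaned frame stores them in a column instead, call `set_index` on that column first.

```python
import pandas as pd


def missing_fraction(df: pd.DataFrame) -> pd.Series:
    """Fraction of NaN values per column, worst columns first."""
    return df.isna().mean().sort_values(ascending=False)


def irregular_steps(df: pd.DataFrame, expected: str = "30min") -> int:
    """Count consecutive index differences that are not exactly `expected`."""
    steps = df.index.to_series().diff().dropna()
    return int((steps != pd.Timedelta(expected)).sum())
```

For example, `missing_fraction(clean_df).head(10)` lists the ten sparsest variables, and `irregular_steps(clean_df)` should return `0` if the 30-min spacing was enforced.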

## 3. Compile multiple raw files
If your logger writes many daily files, `raw_file_compile` can merge them into a single DataFrame.

```python
compiled = proc.raw_file_compile(
    main_dir='station_data',
    station_folder_name='Cedar_mesa',
    search_str='*Flux_AmeriFluxFormat*.dat'
)
compiled.shape
```
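
Conceptually, compiling raw files means globbing the matching files, stacking them, and removing timestamps that appear in more than one file. The sketch below is a hypothetical stand-in (`compile_files` is not part of micromet). It assumes single-header, AmeriFlux-style files with a `TIMESTAMP_START` column; TOA5 files would also need their extra header rows skipped.

```python
import pathlib

import pandas as pd


def compile_files(folder, pattern="*.dat", ts_col="TIMESTAMP_START"):
    """Hypothetical stand-in for raw_file_compile on single-header files."""
    files = sorted(pathlib.Path(folder).glob(pattern))
    frames = [pd.read_csv(f) for f in files]
    if not frames:
        return pd.DataFrame()
    combined = pd.concat(frames, ignore_index=True)
    # When overlapping files repeat a timestamp, keep the copy from the
    # file whose name sorts last, then restore chronological order.
    return (combined.drop_duplicates(subset=ts_col, keep="last")
                    .sort_values(ts_col)
                    .reset_index(drop=True))
```

Keeping the last copy assumes newer files sort later by name. If your logger's file naming does not guarantee this, sort by modification time instead.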

## 4. Batch processing for many stations
`proc.iterate_through_stations()` loops through a hard‑coded dictionary of station IDs and compiles data for each. Modify the dictionary inside `converter.py` or supply your own loop for full control.

```python
# proc.iterate_through_stations()  # Uncomment to run (may take a while)
```
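
If you would rather supply your own loop than edit the hard-coded dictionary, one option is to pass the compile method in as a function and collect results per station. This is a hypothetical sketch: `compile_all_stations` and the `STATIONS` mapping are illustrative, and the keyword arguments mirror the `raw_file_compile` call shown in section 3. A failure at one station is recorded instead of stopping the whole batch.

```python
# Hypothetical mapping: your label for the station -> folder under main_dir
STATIONS = {"cedar_mesa": "Cedar_mesa"}


def compile_all_stations(compile_fn, stations, main_dir="station_data",
                         search_str="*Flux_AmeriFluxFormat*.dat"):
    """Call compile_fn once per station; return (results, failures) dicts."""
    results, failures = {}, {}
    for label, folder in stations.items():
        try:
            results[label] = compile_fn(main_dir=main_dir,
                                        station_folder_name=folder,
                                        search_str=search_str)
        except Exception as exc:  # keep going if one station's files are bad
            failures[label] = repr(exc)
    return results, failures
```

Usage with the processor from section 1: `results, failures = compile_all_stations(proc.raw_file_compile, STATIONS)`.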

## Appendix – Configuration files
- `reformatter_vars.py` – column rename maps, soil sensor groupings, etc.
- `variable_limits.py` – hard min/max limits for QC.

Feel free to adjust these to match your particular station setup.
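
If you prefer not to edit the shared module files, one pattern is to layer station-specific overrides on top of the defaults with dict unpacking. The names and values below (`DEFAULT_RENAMES`, `DEFAULT_LIMITS`, the raw column names, and the bounds) are hypothetical placeholders, not the actual contents of `reformatter_vars.py` or `variable_limits.py`. How such overrides would reach `Reformatter` depends on its constructor, so check `converter.py` before relying on this.

```python
import pandas as pd

# Hypothetical defaults standing in for the real config modules
DEFAULT_RENAMES = {"NR_Wm2_Avg": "NETRAD"}
DEFAULT_LIMITS = {"NETRAD": (-250.0, 1000.0)}

# Station-specific overrides: later keys win, defaults stay untouched
STATION_RENAMES = {**DEFAULT_RENAMES, "SWC_1": "SWC_1_1_1"}
STATION_LIMITS = {**DEFAULT_LIMITS, "NETRAD": (-300.0, 1100.0)}


def apply_renames(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    # DataFrame.rename ignores mapping keys that are not columns of df
    return df.rename(columns=mapping)
```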