# Fuel Field Observations from Oklahoma

The purpose of this notebook is to clean and format data received from JD Carlson (via Derek Vanderkamp) on fuel moisture field observations conducted in Oklahoma in 1996-1997.

## Background

- The data were part of a publication in 2007.
- The data were used to calibrate the Nelson fuel moisture model, which is used by many agencies.

## Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import warnings  # used below to flag rename keys missing from the DataFrame
from src.utils import time_intp, read_yml

In [None]:
df = pd.read_excel("data/oklahoma_Carlson_data.xlsx")
nlist = read_yml("etc/nlists/carlson_fielddata.yaml")
output_dir = "data/processed_data"

In [None]:
df

## Process

Standardize column names and convert temperature from °C to K.

In [None]:
# Rename columns to standardize
# Warn about keys not present in DataFrame
missing = set(nlist.keys()) - set(df.columns)
if missing:
    warnings.warn(f"The following old names were not found in DataFrame columns: {missing}")

df = df.rename(columns=nlist)
# Units
df.temp = df.temp+273.15

## Explore

Carlson Data from Derek Vanderkamp:

- Includes weather data and fuel moisture data.
- Weather and fuel moisture observations are not exactly aligned in time.
- When the two were not recorded at the same time, they appear in separate rows, with either the weather or the fuel moisture values missing.

GOAL (run separately for the 1h, 10h, 100h, and 1000h fuel classes):

- Separate weather from FMC data
- Sort by time
- Write separately
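
The separation described above can be sketched on a toy table. This is a minimal illustration with made-up values, not the Carlson data; the names `toy`, `toy_weather`, and `toy_fm10` are hypothetical and only the column names follow the notebook.

```python
import numpy as np
import pandas as pd

# Toy table (made-up values): weather and FMC are partly logged in separate rows
toy = pd.DataFrame({
    "date": pd.to_datetime(["1996-05-01 13:00", "1996-05-01 12:00",
                            "1996-05-01 12:20", "1996-05-01 14:00"]),
    "rh":   [40.0, 45.0, np.nan, 38.0],
    "temp": [300.0, 298.0, np.nan, 301.0],
    "fm10": [np.nan, np.nan, 11.2, 10.5],
})

# Weather rows: keep rows where the weather variables are present, then sort by time
toy_weather = (toy.dropna(subset=["rh", "temp"])[["date", "rh", "temp"]]
                  .sort_values("date").reset_index(drop=True))

# FMC rows: keep rows where the fuel moisture observation is present, then sort by time
toy_fm10 = (toy.dropna(subset=["fm10"])[["date", "fm10"]]
               .sort_values("date").reset_index(drop=True))

print(toy_weather)
print(toy_fm10)
```

A row that carries both weather and fuel moisture (the 14:00 row here) ends up in both tables, which is the intended behavior when the two are later written separately.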

In [None]:
print(f"Unique sites: {len(df.site.unique())}")
print(f"Unique subsites: {len(df.subsite.unique())}")
print(f"Unique res: {len(df.res.unique())}")

In [None]:
df.columns

In [None]:
# Define Variable Sets
tvars = ["year", "month", "doy", "mday", "hod", "min", "date"]
wvars = ["solar", "rain", "rh", "temp", "vap.press", "vpd",
         "wind", "vap.den"]
fvars = ["fm1", "fm10", "fm100", "fm1000"] # 1h, 10h, 100h, and 1000h

### Fix Date
The date column as received in the spreadsheet has a couple of missing dates, and the dates at hour 0 are read in oddly. Check both.

In [None]:
# Construct date
dates = pd.to_datetime(dict(
    year=df['year'],
    month=df['month'],
    day=df['mday'],
    hour=df['hod'],
    minute=df['min']
))

print(f"Number of NA Dates: {np.sum(dates.isna())}")

In [None]:
# Check 0 hour dates
df[df.hod == 0][tvars]

In [None]:
# Confirm that hour and minute info is in timestamp
print(df[df.hod == 0][tvars].iloc[0])

In [None]:
print(df[df.hod == 0]['date'].dt.hour.unique())
print(df[df.hod == 0]['date'].dt.minute.unique())

In [None]:
# Compare to date column in data frame; missing dates (NaT) also count as mismatches
inds = np.where(dates != df.date)[0]
print(f"Number of Date Mismatches: {len(inds)}")
print(f"Number of Missing Dates: {np.sum(df.date.isna())}")

In [None]:
# Manually Investigate
print(dates.iloc[inds])
df.iloc[inds][tvars]

**NOTE:** The manually constructed dates agree with the spreadsheet's date column except for a couple of rows where the spreadsheet date is NA. We replace the date column with the manually constructed one, which fills in the two missing dates.
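
As a small illustration of why missing dates show up in the mismatch count, the sketch below compares a date column containing one `NaT` against dates assembled from components. The values and the names `sheet_dates` and `built_dates` are made up for the example.

```python
import pandas as pd

# Made-up example: the spreadsheet date is missing (NaT) in the second row
sheet_dates = pd.Series(pd.to_datetime(["1996-05-01 12:00", None, "1996-05-01 14:00"]))

# Dates assembled from separate year/month/day/hour/minute components
built_dates = pd.to_datetime(dict(
    year=[1996, 1996, 1996],
    month=[5, 5, 5],
    day=[1, 1, 1],
    hour=[12, 13, 14],
    minute=[0, 0, 0],
))

# NaT never compares equal to anything, so the missing row counts as a mismatch
mismatch = built_dates != sheet_dates
print(mismatch.tolist())

# The assembled dates have no gaps, so overwriting the column removes the NaT
print(f"Missing after overwrite: {built_dates.isna().sum()}")
```

If the mismatch count equals the number of missing spreadsheet dates, the two columns agree wherever both exist.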

In [None]:
df.date = dates

print(f"Number of Missing Dates: {np.sum(df.date.isna())}")

### Separate Datasets

Note: FMC data are filtered by fuel class, keeping only the rows that have an observation for that class.

In [None]:
def get_fm_class(df0, fuel_class,
                 tvars = ["year", "month", "doy", "mday", "hod", "min", "date"]):

    # Map each fuel class label to its FMC column
    fm_cols = {"1h": "fm1", "10h": "fm10", "100h": "fm100", "1000h": "fm1000"}
    if fuel_class not in fm_cols:
        raise ValueError(f"Unknown fuel_class {fuel_class!r}; expected one of {list(fm_cols)}")
    col = fm_cols[fuel_class]

    # Extract time columns plus the requested fuel class from the input frame
    # (df0, not the global df), keeping only rows with an observation
    fm = df0[tvars + [col]]
    fm = fm[~(fm[col].isna())]

    # Sort by time
    fm = fm.sort_values("date").reset_index(drop=True)

    return fm

In [None]:
fm1 = get_fm_class(df, fuel_class = "1h")
fm10 = get_fm_class(df, fuel_class = "10h")
fm100 = get_fm_class(df, fuel_class = "100h")
fm1000 = get_fm_class(df, fuel_class = "1000h")

In [None]:
# Extract weather data
weather = df[tvars + wvars]
weather = weather[~(weather.rh.isna()) & ~(weather.temp.isna())]
weather = weather.sort_values("date").reset_index(drop=True)
weather = weather[['date'] + wvars]

In [None]:
# Explore Time
wlag = weather.date.diff()

u = wlag.dropna().unique()
print(f"Weather Time Range:\n    {weather.date.min()} to {weather.date.max()}")
print(f"Weather time increments: {u}")

In [None]:
# Increment statistics over all consecutive observations
flag = fm1.date.diff().dropna()
print(f"FM 1h Time Range:\n    {fm1.date.min()} to {fm1.date.max()}")
print("FM 1h time increments: ")
print(f"    Min increment: {flag.min()}")
print(f"    Max increment: {flag.max()}")
print(f"    Mean increment: {flag.mean()}")

In [None]:
# Increment statistics over all consecutive observations
flag = fm10.date.diff().dropna()
print(f"FM 10h Time Range:\n    {fm10.date.min()} to {fm10.date.max()}")
print("FM 10h time increments: ")
print(f"    Min increment: {flag.min()}")
print(f"    Max increment: {flag.max()}")
print(f"    Mean increment: {flag.mean()}")

In [None]:
# Increment statistics over all consecutive observations
flag = fm100.date.diff().dropna()
print(f"FM 100h Time Range:\n    {fm100.date.min()} to {fm100.date.max()}")
print("FM 100h time increments: ")
print(f"    Min increment: {flag.min()}")
print(f"    Max increment: {flag.max()}")
print(f"    Mean increment: {flag.mean()}")

In [None]:
# Increment statistics over all consecutive observations
flag = fm1000.date.diff().dropna()
print(f"FM 1000h Time Range:\n    {fm1000.date.min()} to {fm1000.date.max()}")
print("FM 1000h time increments: ")
print(f"    Min increment: {flag.min()}")
print(f"    Max increment: {flag.max()}")
print(f"    Mean increment: {flag.mean()}")

## Calculate Equilibria in Weather

In [None]:
# To confirm Kelvin
weather.temp.head()

In [None]:
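# Drying (Ed) and wetting (Ew) equilibrium moisture contents
# temp is in K, so the 21.1 C reference temperature is shifted by 273.15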
rh = weather.rh
temp = weather.temp
Ed = 0.924 * rh**0.679 + 0.000499 * np.exp(0.1 * rh) + 0.18 * (21.1 + 273.15 - temp) * (1 - np.exp(-0.115 * rh))
Ew = 0.618 * rh**0.753 + 0.000454 * np.exp(0.1 * rh) + 0.18 * (21.1 + 273.15 - temp) * (1 - np.exp(-0.115 * rh))

weather["Ed"] = Ed
weather["Ew"] = Ew
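
The two expressions above share the same temperature term, so they can be wrapped in one function for a quick sanity check. The sketch below is a hypothetical helper (`equilibrium_moisture` is not part of `src.utils`) that repeats the coefficients from the cell above and assumes relative humidity in percent and temperature in Kelvin.

```python
import numpy as np

def equilibrium_moisture(rh, temp_k):
    """Return (Ed, Ew) for relative humidity rh (%) and temperature temp_k (K).

    Hypothetical helper; it repeats the expressions used in the notebook cell.
    """
    rh = np.asarray(rh, dtype=float)
    temp_k = np.asarray(temp_k, dtype=float)
    # Temperature correction shared by both equilibria; 21.1 C expressed in Kelvin
    t_term = 0.18 * (21.1 + 273.15 - temp_k) * (1 - np.exp(-0.115 * rh))
    Ed = 0.924 * rh**0.679 + 0.000499 * np.exp(0.1 * rh) + t_term
    Ew = 0.618 * rh**0.753 + 0.000454 * np.exp(0.1 * rh) + t_term
    return Ed, Ew

# Made-up inputs at the reference temperature (21.1 C = 294.25 K)
Ed, Ew = equilibrium_moisture([20.0, 50.0, 80.0], [294.25, 294.25, 294.25])
print(Ed)
print(Ew)
```

For these inputs the drying equilibrium sits above the wetting equilibrium at each humidity, and passing a temperature above 294.25 K lowers both values. If the Celsius-to-Kelvin conversion had been skipped upstream, the temperature term would be far too large, which is what the Kelvin check above guards against.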

## Save

In [None]:
# Write Out
os.makedirs(output_dir, exist_ok=True)
# Build paths from output_dir so the files land in the directory created above
fm1.to_excel(os.path.join(output_dir, "ok_1h.xlsx"), index=False)
fm10.to_excel(os.path.join(output_dir, "ok_10h.xlsx"), index=False)
fm100.to_excel(os.path.join(output_dir, "ok_100h.xlsx"), index=False)
fm1000.to_excel(os.path.join(output_dir, "ok_1000h.xlsx"), index=False)

weather.to_excel(os.path.join(output_dir, "dvdk_weather.xlsx"), index=False)