# Data Ingest of 10-h Fuel Moisture Content

This notebook demonstrates retrieval and filtering of 10-h dead fuel moisture content (FMC) data from Remote Automatic Weather Stations (RAWS). The 10-h FMC observations are retrieved with the software package `SynopticPy` and from a stash of RAWS data maintained by the broader OpenWFM community. The notebook uses `SynopticPy` with a free token, which limits the number of sensor hours that can be requested.

For more info, see Brian Blaylock's `SynopticPy` [Python package](https://github.com/blaylockbk/SynopticPy).

The main steps in the retrieval are:
* Use `synoptic.Metadata` to determine the RAWS with FMC data in the given spatial domain and time frame
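* Use `synoptic.TimeSeries` to retrieve the observations for each of those stations, then format and clean the results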

## Setup

In [None]:
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import synoptic
import json
import sys
import polars as pl
sys.path.append('../src')
from utils import Dict

A configuration file controls data ingest. Automated processes use the file `training_data_config.json` or `forecast_config.json`. In this tutorial, we build the configuration manually.

In [None]:
config = Dict({
    'start_time': '2024-01-01_00:00:00',
    'end_time': '2024-01-01_02:00:00',
    'bbox': [37, -111, 46, -95],
    # 'raws_vars': ["fuel_moisture"]
    'raws_vars': ["air_temp", "relative_humidity", "precip_accum", "fuel_moisture", "wind_speed", "solar_radiation", "soil_moisture"]
})

config
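
For comparison with the manual config above, here is a minimal sketch of reading the same settings from JSON text. `AttrDict` is a hypothetical stand-in for `utils.Dict`, whose implementation is not shown in this notebook, and the keys are assumed to match the ones used in the cell above; the real `training_data_config.json` may be laid out differently.

```python
import json


class AttrDict(dict):
    """Hypothetical stand-in for utils.Dict: dict keys readable as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as err:
            raise AttributeError(name) from err


# Assumed file contents; the keys mirror the manual config above.
config_text = """
{
    "start_time": "2024-01-01_00:00:00",
    "end_time": "2024-01-01_02:00:00",
    "bbox": [37, -111, 46, -95],
    "raws_vars": ["fuel_moisture"]
}
"""

# With a file on disk, json.load(file_handle) would replace json.loads here.
file_config = AttrDict(json.loads(config_text))
print(file_config.bbox)
print(file_config.start_time)
```

The result is bound to `file_config` so that it does not overwrite the `config` object used in the rest of the notebook.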

## Station Metadata

*Note*: the bounding box format used in `wrfxpy` is `[min_lat, min_lon, max_lat, max_lon]`, but the format used by Synoptic is `[min_lon, min_lat, max_lon, max_lat]`, so the config values are reordered before the request.

In [None]:
bbox = config.bbox
bbox_reordered = [bbox[1], bbox[0], bbox[3], bbox[2]]
start = datetime.strptime(config.start_time, "%Y-%m-%d_%H:%M:%S")
end = datetime.strptime(config.end_time, "%Y-%m-%d_%H:%M:%S")
raws_vars = config.raws_vars
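
The reordering can also be wrapped in a small helper that checks the input first. This is a sketch only: `to_synoptic_bbox` is a hypothetical name that does not exist in `wrfxpy` or `SynopticPy`, and the range checks assume a box that does not cross the antimeridian.

```python
def to_synoptic_bbox(bbox):
    """Reorder [min_lat, min_lon, max_lat, max_lon] to [min_lon, min_lat, max_lon, max_lat]."""
    if len(bbox) != 4:
        raise ValueError(f"Expected 4 bbox values, got {len(bbox)}")
    min_lat, min_lon, max_lat, max_lon = bbox
    # Catch swapped or out-of-range coordinates before sending a request.
    if not (-90 <= min_lat <= max_lat <= 90):
        raise ValueError(f"Invalid latitude range: {min_lat}, {max_lat}")
    if not (-180 <= min_lon <= max_lon <= 180):
        raise ValueError(f"Invalid longitude range: {min_lon}, {max_lon}")
    return [min_lon, min_lat, max_lon, max_lat]


print(to_synoptic_bbox([37, -111, 46, -95]))  # [-111, 37, -95, 46]
```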

In [None]:
sts = synoptic.Metadata(
    bbox=bbox_reordered,
    vars=["fuel_moisture"], # Note we only want to include stations with FMC. Other "raws_vars" are bonus
    obrange=(start, end),
).df()

In [None]:
sts

## Station Time Series

We loop over the station IDs found in the previous step, retrieve all available data for each station, and then format and clean it.

*NOTE*: this process is not parallelized. Every request comes from the same IP address, so parallelization may result in issues.
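
One way to keep the requests strictly sequential is sketched below. The helper `fetch_sequentially` and its `pause_seconds` argument are hypothetical and not part of `SynopticPy`; the pause length is an assumption, not a documented Synoptic limit. The demo uses a fake fetch function so it runs without a token.

```python
import time


def fetch_sequentially(station_ids, fetch_one, pause_seconds=1.0):
    """Call fetch_one(stid) for each station in turn, pausing between requests."""
    results = {}
    failures = {}
    for i, stid in enumerate(station_ids):
        if i > 0:
            time.sleep(pause_seconds)  # space out requests from the same IP address
        try:
            results[stid] = fetch_one(stid)
        except Exception as err:
            # Record the failure and continue with the remaining stations.
            failures[stid] = str(err)
    return results, failures


def fake_fetch(stid):
    """Stand-in for a real request; fails for one made-up station ID."""
    if stid == "BAD01":
        raise RuntimeError("no data")
    return f"data for {stid}"


results, failures = fetch_sequentially(["CPTC2", "BAD01"], fake_fetch, pause_seconds=0)
print(results)
print(failures)
```

In the notebook, `fetch_one` could wrap the `synoptic.TimeSeries(...).df()` call for a single station.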

In [None]:
# name_mapping = {
#     "air_temp":"temp", 
#     "fuel_moisture":"fm", 
#     "relative_humidity":"rh", 
#     "precip_accum":"rain",
#     "solar_radiation":"solar", 
#     "wind_speed":"wind", 
#     "precip_accum":"precip_accum", 
#     "soil_moisture":"soil_moisture"
# }

In [None]:
def format_raws(df, tstart, tend, 
                static_cols = ["stid", "latitude", "longitude", "elevation", "name", "state", "id"], 
                weather_vars = ["air_temp", "relative_humidity", "precip_accum", "fuel_moisture", "wind_speed", "solar_radiation", "soil_moisture"],
                verbose=True):
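    # Given input dataframe (the output of synoptic.TimeSeries(...).df()), return formatted station data
    # Inputs:
    # df: (polars dataframe) long-format observations, one row per variable and time
    # tstart: (datetime) start of the requested period, currently unused
    # tend: (datetime) end of the requested period, currently unused
    # static_cols: (list) columns expected to hold a single value for the station
    # weather_vars: (list) variable names to keep from the "variable" column
    # Returns: tuple (loc, units, dat): dict of static location data,
    #          dict of units, and dataframe of weather data pivoted to one column per variable
    ## NOTE: assumes you can join by date_time in the dataframe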

    ## Set up return dictionaries
    loc = {} # static, physical features of the location
    weather = {} # time-dynamic weather variables
    units = {} # stores units for variables
    
    ## Extract Static Info into dictionary, raise error if any static vars missing
    for col in static_cols:
        if col not in df.columns:
            raise ValueError(f"Column '{col}' does not exist in the input dataframe.")
        unique_vals = df[col].unique()
        if len(unique_vals) != 1:
            raise ValueError(f"Column '{col}' must have exactly one unique value, found {len(unique_vals)}: {unique_vals}")
        loc[col] = unique_vals[0]     
    if 'elevation' in static_cols: # convert ft to meters
        if verbose:
            print("Converting elevation from ft to meters")
        loc['elevation'] = loc['elevation'] * 0.3048
        units['elevation'] = "m"

    ## Extract weather data into dictionary, allow for missing data except fuel moisture
    ## Extract value and associated time
    assert "fuel_moisture" in df["variable"], "fuel_moisture not detected in input dataframe"
    for var in weather_vars:
        if var in df['variable']:
            df_temp = df.filter(df['variable'] == var)
            unit = df_temp['units'].unique()
            if len(unit) != 1:
                raise ValueError(f"Variable {var} has multiple values for units")
            units[var] = unit[0]
    
    dat = df.filter(pl.col("variable").is_in(weather_vars))
    dat = dat.pivot(
        values="value",
        index=["date_time", "stid", "longitude", "latitude"],
        on="variable"
    )

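    if "air_temp" in dat.columns and units['air_temp'] == "Celsius":
        if verbose:
            print("Converting RAWS air temp from C to K")
        # Apply the conversion to the data as well as to the units label
        dat = dat.with_columns(pl.col("air_temp") + 273.15)
        units['air_temp'] = "K"

    return loc, units, dat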
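
The two unit conversions that `format_raws` refers to (elevation from feet to meters, air temperature from Celsius to Kelvin) can be checked in isolation. The helper names below are hypothetical and only illustrate the arithmetic.

```python
FT_TO_M = 0.3048          # meters per foot, same factor as in format_raws
C_TO_K_OFFSET = 273.15    # 0 degrees Celsius expressed in Kelvin


def feet_to_meters(feet):
    """Convert a length in feet to meters."""
    return feet * FT_TO_M


def celsius_to_kelvin(celsius):
    """Convert a temperature in degrees Celsius to Kelvin."""
    return celsius + C_TO_K_OFFSET


print(feet_to_meters(1000))
print(celsius_to_kelvin(0))
```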

In [None]:
df_temp = synoptic.TimeSeries(
    stid="CPTC2",
    start=start,
    end=end,
    vars=config.raws_vars
).df()

In [None]:
df_temp

In [None]:
format_raws(df_temp, start, end, weather_vars = config.raws_vars)

In [None]:
weather_vars = ["air_temp", "relative_humidity", "precip_accum", "fuel_moisture", "wind_speed", "solar_radiation", "soil_moisture"]

In [None]:
df = df_temp

In [None]:
weather_df = df.filter(pl.col("variable").is_in(weather_vars))

In [None]:
weather_df

In [None]:
result_df = weather_df.pivot(
    values="value",
    index="date_time",
    on="variable"
)
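
To make the long-to-wide reshaping concrete, here is a pure-Python sketch of what a pivot on `variable` does. It does not use polars; `pivot_long_to_wide` is a hypothetical helper, and the rows are made-up illustrative values, not real RAWS observations.

```python
def pivot_long_to_wide(rows, index="date_time", on="variable", values="value"):
    """Collect one output row per index value, with one key per variable."""
    wide = {}
    for row in rows:
        out = wide.setdefault(row[index], {index: row[index]})
        out[row[on]] = row[values]
    # Return rows ordered by the index value.
    return [wide[key] for key in sorted(wide)]


# Illustrative long-format rows: one row per (time, variable) pair.
long_rows = [
    {"date_time": "2024-01-01T00:00", "variable": "fuel_moisture", "value": 11.2},
    {"date_time": "2024-01-01T00:00", "variable": "air_temp", "value": -3.0},
    {"date_time": "2024-01-01T01:00", "variable": "fuel_moisture", "value": 11.4},
]

wide_rows = pivot_long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

In this sketch a variable with no observation at a given time is simply absent from that row, whereas a dataframe pivot would hold a null in that cell.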

In [None]:
print(f"Attempting retrieval of RAWS from {start} to {end} within {bbox}")
print("~"*75)

raws_dict = {}

for st in sts['stid']:
    print("~"*50)
    print(f"Attempting retrieval of station {st}")
    # Request the current station in the loop rather than a fixed station ID
    df_temp = synoptic.TimeSeries(
        stid=st,
        start=start,
        end=end,
        vars=["fuel_moisture"]
    ).df()

    if df_temp.shape[0] > 0:
        print(f"Found {df_temp.shape[0]} FMC records. Saving to data dictionary")
        raws_dict[st] = df_temp