# Data Ingest of 10-h Fuel Moisture Content

This notebook demonstrates retrieval and filtering of 10-h dead fuel moisture content (FMC) data from Remote Automatic Weather Stations (RAWS). The 10-h FMC observations are retrieved with the software package `SynopticPy` and from a stash of RAWS data kept and maintained by the broader OpenWFM community. This notebook uses `SynopticPy` with a free token, which limits the number of sensor hours that can be requested. Only records from the past year are freely available.

The module `ingest/retrieve_raws_api.py` has an executable section and is meant to be run from the command line within this project. Here, its functions are called individually to demonstrate what each one does.

In automated processes, the time frame and spatial domain for data ingest are controlled by the configuration files `training_data_config.json` or `forecast_config.json`.
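As a rough illustration of how such a configuration might be consumed, the sketch below parses a config with the same `start_time`, `end_time`, and `bbox` keys that appear in the config printed later in this notebook. It uses only the standard library. The helper `parse_utc` is hypothetical; the project uses its own `str2time` from `utils`, whose exact behavior is not shown here.

```python
import json
from datetime import datetime, timezone

# Example config text with the same keys used in this notebook.
CONFIG_TEXT = """
{
  "start_time": "2024-01-01T00:00:00Z",
  "end_time": "2024-01-01T05:00:00Z",
  "bbox": [40, -105, 45, -100]
}
"""

def parse_utc(s):
    """Hypothetical helper: parse 'YYYY-MM-DDTHH:MM:SSZ' into an aware UTC datetime."""
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

config = json.loads(CONFIG_TEXT)
start = parse_utc(config["start_time"])
end = parse_utc(config["end_time"])

# Number of whole hours covered by the requested time frame
n_hours = int((end - start).total_seconds() // 3600)

print(start.isoformat(), end.isoformat(), n_hours)
```

Parsing the times into timezone-aware objects up front makes later arithmetic, such as shifting the start time, less error prone than working with strings.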

The main steps in the retrieval are:
* Use `synoptic.Metadata` to determine the RAWS with FMC data in the given spatial domain and time frame.
* Use `synoptic.TimeSeries` to retrieve all available data that may be relevant to FMC modeling. *NOTE:* stations are selected only if they have FMC data, and any other available variables are then collected as a bonus. These data are used for exploratory purposes and quality control checks, but predictors for final modeling come from HRRR.
* Format data and convert units.
* Identify missing data and fill it with linear interpolation from numpy.

The module has a main wrapper function `build_raws_dict` that puts all the steps together. In this notebook, we demonstrate the individual steps with the module functions and compare against the output of the main wrapper function to check that the results are the same.

## References

For more information on the Python library API, see Brian Blaylock's `SynopticPy` [Python package](https://github.com/blaylockbk/SynopticPy).

For more information on available Synoptic RAWS variables, see the [Synoptic Data](https://demos.synopticdata.com/variables/index.html) documentation.

## Setup

In [1]:
# import matplotlib.pyplot as plt
from datetime import datetime, timezone
from dateutil.relativedelta import relativedelta
import synoptic
import json
import sys
import numpy as np
import polars as pl
import pandas as pd
sys.path.append('../src')
from utils import Dict, read_yml, read_pkl, str2time
from data_funcs import rename_dict
import ingest.retrieve_raws_api as rfuncs

In [2]:
raws_meta = read_yml("../etc/variable_metadata/raws_metadata.yaml")

with open("../etc/training_data_config.json", "r") as json_file:
    config = json.load(json_file)   
    config = Dict(config)

In [3]:
config

{'start_time': '2024-01-01T00:00:00Z',
 'end_time': '2024-01-01T05:00:00Z',
 'bbox': [40, -105, 45, -100],
 'forecast_step': 3,
 'training_data_filename': 'train.pkl',
 'data_params_path': 'etc/params_data.yaml'}

In [4]:
# End result should be the same as this...
raws_dict = rfuncs.build_raws_dict(config)

Start Date of RAWS retrieval: 2024-01-01T00:00:00Z
End Date of retrieval: 2024-01-01T05:00:00Z
Spatial Domain: [40, -105, 45, -100]
🚚💨 Speedy delivery from Synoptic's metadata service.
📦 Received data from 33 stations.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Attempting retrival of station BRLW4
🚚💨 Speedy delivery from Synoptic's timeseries service.
📦 Received data from 1 stations.
Found 6 FMC records
Converting RAWS air temp from C to K
Converting RAWS elevation from ft to meters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Attempting retrival of station HSYN1
🚚💨 Speedy delivery from Synoptic's timeseries service.
📦 Received data from 1 stations.
Found 6 FMC records
Converting RAWS air temp from C to K
Converting RAWS elevation from ft to meters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Attempting retrival of station HRSN1
🚚💨 Speedy delivery from Synoptic's timeseries service.
📦 Received data from 1 stations.
Found 6 FMC rec

## Station Metadata

We use `SynopticPy` to get a list of all RAWS within the bounding box that have fuel moisture data available in the given time period.

*Note*: the bounding box format used in `wrfxpy` is `[min_lat, min_lon, max_lat, max_lon]`, but the bounding box format used by Synoptic is `[min_lon, min_lat, max_lon, max_lat]`. The code assumes the `wrfxpy` format and converts internally.
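The reordering between the two bounding box conventions can be sketched as a small function. The name `wrfxpy_to_synoptic_bbox` and the range checks are assumptions added for illustration; they are not part of the `retrieve_raws_api` module as shown in this notebook.

```python
def wrfxpy_to_synoptic_bbox(bbox):
    """Hypothetical helper: reorder [min_lat, min_lon, max_lat, max_lon]
    into [min_lon, min_lat, max_lon, max_lat]."""
    min_lat, min_lon, max_lat, max_lon = bbox
    # Basic sanity checks; these also catch a box passed in the wrong order
    # whenever a longitude lands outside the valid latitude range.
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError(f"Invalid latitude range: {min_lat}, {max_lat}")
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError(f"Invalid longitude range: {min_lon}, {max_lon}")
    return [min_lon, min_lat, max_lon, max_lat]

bbox = [40, -105, 45, -100]
bbox_synoptic = wrfxpy_to_synoptic_bbox(bbox)
print(bbox_synoptic)  # [-105, 40, -100, 45]
```

Unpacking into named variables rather than indexing makes the intended order explicit at the point of conversion.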

In [5]:
start = str2time(config.start_time)
end = str2time(config.end_time)
bbox = config.bbox
bbox_reordered = [bbox[1], bbox[0], bbox[3], bbox[2]]

In [6]:
sts = rfuncs.get_stations(bbox_reordered, start, end)

print(sts["stid"])

🚚💨 Speedy delivery from Synoptic's metadata service.
📦 Received data from 33 stations.
shape: (33,)
Series: 'stid' [str]
[
	"BRLW4"
	"HSYN1"
	"HRSN1"
	"SBFN1"
	"DOHS2"
	…
	"MKVN1"
	"TT562"
	"TT567"
	"SFRS2"
	"TT591"
]


## Station Weather Data Time Series

Time series of observations are drawn for a single RAWS using the `SynopticPy` package. Then, the data are formatted by custom functions in the `retrieve_raws_api` module. We subtract one hour from the start time because most stations report some number of minutes after the hour, so a request starting at 1:00 returns only data after that time. Without the shift, the temporal interpolation procedure described below would have to extrapolate at the start of the time window. Shifting the start time by 1 hour accounts for this, but if the start time is more than 1 year in the past the API will truncate to 1 year. The module has a metadata file with a list of all RAWS weather variables relevant to FMC modeling.
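The effect of the one-hour shift can be illustrated with a simulated station. The sketch below assumes a station that reports once per hour at 23 minutes past the hour; the functions `observation_times` and `is_bracketed` are hypothetical and only mimic the timing of the API response, not its contents.

```python
from datetime import datetime, timedelta, timezone

start = datetime(2024, 1, 1, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 5, tzinfo=timezone.utc)

def observation_times(request_start, request_end, minute=23):
    """Simulated station reporting once per hour at a fixed minute past the hour."""
    t = request_start.replace(minute=minute, second=0, microsecond=0)
    if t < request_start:
        t += timedelta(hours=1)
    out = []
    while t <= request_end:
        out.append(t)
        t += timedelta(hours=1)
    return out

def is_bracketed(target, obs):
    """True if target lies between the first and last observation times."""
    return obs[0] <= target <= obs[-1]

obs_unshifted = observation_times(start, end)
obs_shifted = observation_times(start - timedelta(hours=1), end)

print(is_bracketed(start, obs_unshifted))  # False: first report is at 00:23
print(is_bracketed(start, obs_shifted))    # True: a 23:23 report precedes 00:00
```

In this simulation the final target hour still falls after the last report, so the shift only protects the start of the window.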

The data are returned in "long" format, where each observation of a weather variable has its own row. We restructure the data into "wide" format with the module function `format_raws` so that a single row corresponds to one time, and the columns correspond to different data variables. Additionally, this function converts units and returns a dictionary of all units for the variables.
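The long-to-wide restructuring and unit conversion can be sketched in plain Python. This is not the implementation of `format_raws`; the functions `pivot_wide`, `c_to_k`, and `ft_to_m` are hypothetical, and the input rows are toy values chosen for illustration.

```python
from collections import defaultdict

# Long format: one row per (time, variable) pair. Toy values for illustration.
long_rows = [
    {"date_time": "2023-12-31 23:23", "variable": "air_temp", "value": -5.556},
    {"date_time": "2023-12-31 23:23", "variable": "fuel_moisture", "value": 14.7},
    {"date_time": "2024-01-01 00:23", "variable": "air_temp", "value": -6.667},
    {"date_time": "2024-01-01 00:23", "variable": "fuel_moisture", "value": 15.9},
]

def pivot_wide(rows):
    """Collect all variables observed at the same time into a single row."""
    wide = defaultdict(dict)
    for r in rows:
        wide[r["date_time"]][r["variable"]] = r["value"]
    return [{"date_time": t, **vals} for t, vals in sorted(wide.items())]

def c_to_k(temp_c):
    """Convert degrees Celsius to Kelvin."""
    return temp_c + 273.15

def ft_to_m(length_ft):
    """Convert feet to meters."""
    return length_ft * 0.3048

wide_rows = pivot_wide(long_rows)
for row in wide_rows:
    row["air_temp"] = round(c_to_k(row["air_temp"]), 3)

elevation_m = round(ft_to_m(2873.0), 4)
print(wide_rows)
print(elevation_m)
```

Each wide row now holds every variable for one timestamp, which is the shape needed for the hourly interpolation later on.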

In [7]:
weather_vars = rfuncs.raws_meta["raws_weather_vars"]
df_temp = synoptic.TimeSeries(
        stid="HSYN1",
        start=start-relativedelta(hours=1),
        end=end,
        vars=weather_vars,
        units = "metric"
    ).df()

df_temp

🚚💨 Speedy delivery from Synoptic's timeseries service.
📦 Received data from 1 stations.


| date_time | variable | sensor_index | is_derived | value | units | id | stid | name | elevation | latitude | longitude | mnet_id | state | timezone | elev_dem | period_of_record_start | period_of_record_end | is_restricted | restricted_metadata | is_active |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| datetime[μs, UTC] | str | u32 | bool | f64 | str | u32 | str | str | f64 | f64 | f64 | u32 | str | str | f64 | datetime[μs, UTC] | datetime[μs, UTC] | bool | bool | bool |
| 2023-12-31 23:23:00 UTC | "precip_accum" | 1 | false | 525.526 | "Millimeters" | 3807 | "HSYN1" | "BESSEY" | 2873.0 | 41.89722 | -100.31056 | 2 | "NE" | "America/Chicago" | 2841.2 | 2002-04-18 00:00:00 UTC | 2024-12-30 20:23:00 UTC | false | false | true |
| 2024-01-01 00:23:00 UTC | "precip_accum" | 1 | false | 0.0 | "Millimeters" | 3807 | "HSYN1" | "BESSEY" | 2873.0 | 41.89722 | -100.31056 | 2 | "NE" | "America/Chicago" | 2841.2 | 2002-04-18 00:00:00 UTC | 2024-12-30 20:23:00 UTC | false | false | true |
| 2024-01-01 01:23:00 UTC | "precip_accum" | 1 | false | 0.0 | "Millimeters" | 3807 | "HSYN1" | "BESSEY" | 2873.0 | 41.89722 | -100.31056 | 2 | "NE" | "America/Chicago" | 2841.2 | 2002-04-18 00:00:00 UTC | 2024-12-30 20:23:00 UTC | false | false | true |
| 2024-01-01 02:23:00 UTC | "precip_accum" | 1 | false | 0.0 | "Millimeters" | 3807 | "HSYN1" | "BESSEY" | 2873.0 | 41.89722 | -100.31056 | 2 | "NE" | "America/Chicago" | 2841.2 | 2002-04-18 00:00:00 UTC | 2024-12-30 20:23:00 UTC | false | false | true |
| 2024-01-01 03:23:00 UTC | "precip_accum" | 1 | false | 0.0 | "Millimeters" | 3807 | "HSYN1" | "BESSEY" | 2873.0 | 41.89722 | -100.31056 | 2 | "NE" | "America/Chicago" | 2841.2 | 2002-04-18 00:00:00 UTC | 2024-12-30 20:23:00 UTC | false | false | true |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| 2024-01-01 00:23:00 UTC | "wind_speed" | 1 | false | 0.0 | "m/s" | 3807 | "HSYN1" | "BESSEY" | 2873.0 | 41.89722 | -100.31056 | 2 | "NE" | "America/Chicago" | 2841.2 | 2002-04-18 00:00:00 UTC | 2024-12-30 20:23:00 UTC | false | false | true |
| 2024-01-01 01:23:00 UTC | "wind_speed" | 1 | false | 0.895 | "m/s" | 3807 | "HSYN1" | "BESSEY" | 2873.0 | 41.89722 | -100.31056 | 2 | "NE" | "America/Chicago" | 2841.2 | 2002-04-18 00:00:00 UTC | 2024-12-30 20:23:00 UTC | false | false | true |
| 2024-01-01 02:23:00 UTC | "wind_speed" | 1 | false | 0.895 | "m/s" | 3807 | "HSYN1" | "BESSEY" | 2873.0 | 41.89722 | -100.31056 | 2 | "NE" | "America/Chicago" | 2841.2 | 2002-04-18 00:00:00 UTC | 2024-12-30 20:23:00 UTC | false | false | true |
| 2024-01-01 03:23:00 UTC | "wind_speed" | 1 | false | 1.343 | "m/s" | 3807 | "HSYN1" | "BESSEY" | 2873.0 | 41.89722 | -100.31056 | 2 | "NE" | "America/Chicago" | 2841.2 | 2002-04-18 00:00:00 UTC | 2024-12-30 20:23:00 UTC | false | false | true |


In [8]:
dat, units = rfuncs.format_raws(df_temp)

Found 6 FMC records
Converting RAWS air temp from C to K
Converting RAWS elevation from ft to meters


In [9]:
units

{'air_temp': 'Kelvin',
 'relative_humidity': '%',
 'precip_accum': 'Millimeters',
 'fuel_moisture': 'gm',
 'wind_speed': 'm/s',
 'solar_radiation': 'W/m**2',
 'wind_direction': 'Degrees',
 'elevation': 'm'}

In [10]:
dat

| date_time | stid | latitude | longitude | elevation | name | state | id | precip_accum | solar_radiation | air_temp | wind_direction | fuel_moisture | relative_humidity | wind_speed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| datetime[μs, UTC] | str | f64 | f64 | f64 | str | str | u32 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 2023-12-31 23:23:00 UTC | "HSYN1" | 41.89722 | -100.31056 | 875.6904 | "BESSEY" | "NE" | 3807 | 525.526 | 40.0 | 267.594 | 132.0 | 14.7 | 86.0 | 1.343 |
| 2024-01-01 00:23:00 UTC | "HSYN1" | 41.89722 | -100.31056 | 875.6904 | "BESSEY" | "NE" | 3807 | 0.0 | 0.0 | 266.483 |  | 15.9 | 92.0 | 0.0 |
| 2024-01-01 01:23:00 UTC | "HSYN1" | 41.89722 | -100.31056 | 875.6904 | "BESSEY" | "NE" | 3807 | 0.0 | 0.0 | 265.372 | 112.0 | 17.1 | 94.0 | 0.895 |
| 2024-01-01 02:23:00 UTC | "HSYN1" | 41.89722 | -100.31056 | 875.6904 | "BESSEY" | "NE" | 3807 | 0.0 | 0.0 | 264.817 | 97.0 | 17.5 | 95.0 | 0.895 |
| 2024-01-01 03:23:00 UTC | "HSYN1" | 41.89722 | -100.31056 | 875.6904 | "BESSEY" | "NE" | 3807 | 0.0 | 0.0 | 264.261 | 110.0 | 18.8 | 94.0 | 1.343 |
| 2024-01-01 04:23:00 UTC | "HSYN1" | 41.89722 | -100.31056 | 875.6904 | "BESSEY" | "NE" | 3807 | 0.0 | 0.0 | 264.261 | 109.0 | 18.7 | 94.0 | 1.343 |


We then loop over the station IDs found in the previous step, retrieve all available data for each one, and rename and pivot from long to wide format. The loop generates a dictionary for each RAWS with keys for weather data and other metadata.

*NOTE*: this process is not parallelized, because the same IP address is used for each request and parallelization may result in issues.

In [11]:
print(f"Attempting retrieval of RAWS from {start} to {end} within {bbox}")
print("~"*75)

raws_dict = {}

for st in sts["stid"]:
    print("~"*50)
    print(f"Attempting retrival of station {st}")
    try:
        df = synoptic.TimeSeries(
            stid=st,
            start=start-relativedelta(hours=1),
            end=end,
            vars=weather_vars,
            units = "metric"
        ).df()
    
        dat, units = rfuncs.format_raws(df)
        loc = rfuncs.get_static(sts, st)
        raws_dict[st] = {
            'RAWS': dat,
            'units': units,
            'loc': loc,
            'misc': "Data retrieved using `synoptic.TimeSeries` and formatted with custom functions within `ml_fmda` project."
        }
    except Exception as e:
        print(f"An error occurred: {e}")

Attempting retrieval of RAWS from 2024-01-01 00:00:00+00:00 to 2024-01-01 05:00:00+00:00 within [40, -105, 45, -100]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Attempting retrival of station BRLW4
🚚💨 Speedy delivery from Synoptic's timeseries service.
📦 Received data from 1 stations.
Found 6 FMC records
Converting RAWS air temp from C to K
Converting RAWS elevation from ft to meters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Attempting retrival of station HSYN1
🚚💨 Speedy delivery from Synoptic's timeseries service.
📦 Received data from 1 stations.
Found 6 FMC records
Converting RAWS air temp from C to K
Converting RAWS elevation from ft to meters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Attempting retrival of station HRSN1
🚚💨 Speedy delivery from Synoptic's timeseries service.
📦 Received data from 1 stations.
Found 6 FMC records
Converting RAWS air temp from 

In [12]:
raws_dict.keys()

dict_keys(['BRLW4', 'HSYN1', 'HRSN1', 'SBFN1', 'DOHS2', 'BKFS2', 'CRRS2', 'NMOS2', 'RDCS2', 'RESN1', 'VRFN1', 'PINS2', 'DVLW4', 'WCAS2', 'RHRS2', 'TS485', 'WPKS2', 'CSPS2', 'SDSS2', 'RWES2', 'MTRN1', 'MKVN1', 'TT562', 'TT567', 'SFRS2', 'TT591'])

In [13]:
st = [*raws_dict.keys()][0]
raws_dict[st].keys()

dict_keys(['RAWS', 'units', 'loc', 'misc'])

## Fix Time, Interpolate, and Calculate Rain

Synoptic may return RAWS data with missing hours or with observations that do not fall exactly on the hour. Missing hours are simply absent from the data returned by Synoptic, not marked by NaN. We fix that by filling in NaN for missing hours and interpolating to the exact hour. The resulting data should have regular hourly observations for every RAWS.
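A minimal sketch of interpolating irregular observations onto an exact hourly grid with `numpy.interp` follows. It is not the module's `time_intp_df`, whose internals are not shown here; the function `interp_to_grid` and the toy observation values are assumptions for illustration. Note that `numpy.interp` holds the first and last values constant outside the observed range rather than extrapolating.

```python
import numpy as np
from datetime import datetime, timedelta, timezone

def interp_to_grid(obs_times, obs_values, grid_times):
    """Linearly interpolate observations onto grid_times, ignoring NaN values."""
    x = np.array([t.timestamp() for t in obs_times], dtype=float)
    y = np.asarray(obs_values, dtype=float)
    ok = ~np.isnan(y)  # drop any sensor values recorded as NaN
    xg = np.array([t.timestamp() for t in grid_times], dtype=float)
    return np.interp(xg, x[ok], y[ok])

base = datetime(2024, 1, 1, 0, tzinfo=timezone.utc)

# Toy reports at 23 minutes past the hour; the 02:23 report is absent entirely
obs_times = [base + timedelta(hours=h, minutes=23) for h in (-1, 0, 1, 3)]
obs_values = [10.0, 12.0, 14.0, 18.0]

# Target grid: exact hours 00:00 through 03:00
grid = [base + timedelta(hours=h) for h in range(4)]

fm_hourly = interp_to_grid(obs_times, obs_values, grid)
print(fm_hourly)
```

Because the missing 02:23 report is simply absent, the value at 02:00 is interpolated between the 01:23 and 03:23 reports without any special handling.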

Also, this is a good place in the code to rename variables. Different data sources use different variable names, so we standardize using the naming conventions from the metadata files.
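The renaming step amounts to applying a key mapping to a dictionary. The sketch below uses a hypothetical `rename_keys` function as a stand-in for the project's `rename_dict`, and an assumed subset of the rename mapping inferred from the variable names that appear before and after renaming in this notebook.

```python
def rename_keys(d, mapping):
    """Hypothetical stand-in for `rename_dict`: rename keys found in mapping,
    leaving all other keys unchanged."""
    return {mapping.get(k, k): v for k, v in d.items()}

# Assumed subset of the Synoptic rename mapping (inferred, not read from the metadata file)
rename_synoptic = {
    "air_temp": "temp",
    "relative_humidity": "rh",
    "fuel_moisture": "fm",
    "wind_speed": "wind",
    "solar_radiation": "solar",
    "elevation": "elev",
}

units = {
    "air_temp": "Kelvin",
    "relative_humidity": "%",
    "precip_accum": "Millimeters",
    "fuel_moisture": "gm",
    "wind_speed": "m/s",
    "solar_radiation": "W/m**2",
    "wind_direction": "Degrees",
    "elevation": "m",
}

renamed = rename_keys(units, rename_synoptic)
print(renamed)
```

Keys without an entry in the mapping, such as `precip_accum` and `wind_direction`, pass through unchanged.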

In [14]:
times = pl.datetime_range(
    start=start,
    end=end,
    interval="1h",
    time_zone = "UTC",
    eager=True
).alias("time")
# times = np.array([dt.strftime("%Y-%m-%dT%H:%M:%SZ") for dt in times.to_list()])
times = np.array(times.to_list())

In [15]:
df2 = rfuncs.time_intp_df(raws_dict["BRLW4"]["RAWS"], times)
df2

| date_time | stid | latitude | longitude | elevation | name | state | id | precip_accum | solar_radiation | fuel_moisture | air_temp | relative_humidity | wind_direction | wind_speed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| object | str | f64 | f64 | f64 | str | str | u32 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 2024-01-01 00:00:00+00:00 | "BRLW4" | 44.59722 | -104.42806 | 1609.344 | "BEAR LODGE" | "WY" | 2438 | 755.0785 | 5.5 | 7.808333 | 275.372 | 43.25 | 127.416667 | 1.38025 |
| 2024-01-01 01:00:00+00:00 | "BRLW4" | 44.59722 | -104.42806 | 1609.344 | "BEAR LODGE" | "WY" | 2438 | 0.0 | 0.0 | 7.925 | 275.279417 | 46.333333 | 142.5 | 1.678167 |
| 2024-01-01 02:00:00+00:00 | "BRLW4" | 44.59722 | -104.42806 | 1609.344 | "BEAR LODGE" | "WY" | 2438 | 0.0 | 0.0 | 8.258333 | 274.353583 | 49.666667 | 247.416667 | 0.559833 |
| 2024-01-01 03:00:00+00:00 | "BRLW4" | 44.59722 | -104.42806 | 1609.344 | "BEAR LODGE" | "WY" | 2438 | 0.0 | 0.0 | 8.875 | 275.372 | 46.083333 | 127.0 | 1.75275 |
| 2024-01-01 04:00:00+00:00 | "BRLW4" | 44.59722 | -104.42806 | 1609.344 | "BEAR LODGE" | "WY" | 2438 | 0.0 | 0.0 | 8.675 | 275.418333 | 46.666667 | 86.666667 | 1.343 |
| 2024-01-01 05:00:00+00:00 | "BRLW4" | 44.59722 | -104.42806 | 1609.344 | "BEAR LODGE" | "WY" | 2438 | 0.0 | 0.0 | 9.5 | 275.928 | 43.0 | 127.0 | 1.343 |


We now loop over all stations and run temporal interpolation. We also convert to pandas for easier pickle write.

In [16]:
print(f"Interpolating dataframe in time from {times.min()} to {times.max()}")
rename=True
if rename:
    print("Renaming RAWS columns based on raws_metadata file")
for st in raws_dict:
    print("~"*75)
    print(st)
    nsteps = raws_dict[st]["RAWS"].shape[0]
    raws_dict[st]["RAWS"] = rfuncs.time_intp_df(raws_dict[st]["RAWS"], times)
    raws_dict[st]["RAWS"] = pd.DataFrame(raws_dict[st]["RAWS"], columns = raws_dict[st]["RAWS"].columns)
    raws_dict[st]["times"] = times
    # Store the interpolated row count once; this also avoids nesting double
    # quotes inside f-strings, which is a syntax error before Python 3.12.
    n_intp = raws_dict[st]["RAWS"].shape[0]
    if n_intp != nsteps:
        raws_dict[st]["misc"] += " Interpolated data with numpy linear interpolation."
        print(f"    Original Dataframe time steps: {nsteps}")
        print(f"    Interpolated DataFrame time steps: {n_intp}")
        print(f"        interpolated {n_intp - nsteps} time steps")
    if rename:
        raws_dict[st]["units"] = rename_dict(raws_dict[st]["units"], raws_meta["rename_synoptic"])
        raws_dict[st]["RAWS"] = raws_dict[st]["RAWS"].rename(columns = raws_meta["rename_synoptic"])
        raws_dict[st]["loc"] = rename_dict(raws_dict[st]["loc"], raws_meta["rename_synoptic"])

Interpolating dataframe in time from 2024-01-01 00:00:00+00:00 to 2024-01-01 05:00:00+00:00
Renaming RAWS columns based on raws_metadata file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BRLW4
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HSYN1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HRSN1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SBFN1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DOHS2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BKFS2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CRRS2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NMOS2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RDCS2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RESN1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In [17]:
raws_dict[st]["units"]

{'temp': 'Kelvin',
 'rh': '%',
 'precip_accum': 'Millimeters',
 'fm': 'gm',
 'wind': 'm/s',
 'solar': 'W/m**2',
 'soilm': '%',
 'soilt': 'Celsius',
 'wind_direction': 'Degrees',
 'elev': 'm'}

In [18]:
raws_dict["BRLW4"].keys()

dict_keys(['RAWS', 'units', 'loc', 'misc', 'times'])

In [19]:
raws_dict["BRLW4"]["RAWS"]

|  | date_time | stid | lat | lon | elev | name | state | id | precip_accum | solar | fm | temp | rh | wind_direction | wind |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2024-01-01 00:00:00+00:00 | BRLW4 | 44.59722 | -104.42806 | 1609.344 | BEAR LODGE | WY | 2438 | 755.0785 | 5.5 | 7.808333 | 275.372 | 43.25 | 127.416667 | 1.38025 |
| 1 | 2024-01-01 01:00:00+00:00 | BRLW4 | 44.59722 | -104.42806 | 1609.344 | BEAR LODGE | WY | 2438 | 0.0 | 0.0 | 7.925 | 275.279417 | 46.333333 | 142.5 | 1.678167 |
| 2 | 2024-01-01 02:00:00+00:00 | BRLW4 | 44.59722 | -104.42806 | 1609.344 | BEAR LODGE | WY | 2438 | 0.0 | 0.0 | 8.258333 | 274.353583 | 49.666667 | 247.416667 | 0.559833 |
| 3 | 2024-01-01 03:00:00+00:00 | BRLW4 | 44.59722 | -104.42806 | 1609.344 | BEAR LODGE | WY | 2438 | 0.0 | 0.0 | 8.875 | 275.372 | 46.083333 | 127.0 | 1.75275 |
| 4 | 2024-01-01 04:00:00+00:00 | BRLW4 | 44.59722 | -104.42806 | 1609.344 | BEAR LODGE | WY | 2438 | 0.0 | 0.0 | 8.675 | 275.418333 | 46.666667 | 86.666667 | 1.343 |
| 5 | 2024-01-01 05:00:00+00:00 | BRLW4 | 44.59722 | -104.42806 | 1609.344 | BEAR LODGE | WY | 2438 | 0.0 | 0.0 | 9.5 | 275.928 | 43.0 | 127.0 | 1.343 |


In [20]:
raws_dict["BRLW4"]["loc"]

{'stid': 'BRLW4',
 'lat': 44.59722,
 'lon': -104.42806,
 'elev': 1609.344,
 'name': 'BEAR LODGE',
 'state': 'WY',
 'id': 2438}

In [21]:
raws_dict["BRLW4"]["units"]

{'temp': 'Kelvin',
 'rh': '%',
 'precip_accum': 'Millimeters',
 'fm': 'gm',
 'wind': 'm/s',
 'solar': 'W/m**2',
 'wind_direction': 'Degrees',
 'elev': 'm'}