# Retrieve RAWS Station Data

This notebook collects 10-h fuel moisture data from RAWS stations in Colorado for a given analysis period. We also filter the RAWS stations to those with the complete set of atmospheric sensors needed to run fuel moisture models. The atmospheric observations are collected as a quality-assurance check on the atmospheric satellite data.

* **Spatial Domain**: All RAWS stations in CO with complete atmospheric data.
* **Time Period**: All data from May-September 2023

These five months correspond to Colorado's traditional fire season, although climate change has lengthened that season.

## Environment Setup

In [None]:
import sys
import os.path as osp
import os
import pickle
from MesoPy import Meso
import pandas as pd
import numpy as np
from datetime import datetime

# Custom modules
sys.path.append(osp.join(os.getcwd(),"src")) # Add src subdirectory to python path
from data_funcs import format_raws

# Define output path
outpath = "./data"

# Setup Mesowest data query
meso_token="4192c18707b848299783d59a9317c6e1" # Get your own token...
m=Meso(meso_token)
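
Hard-coding a token in a shared notebook exposes it to anyone who can read the file. One possible alternative is to read the token at run time. The sketch below assumes the token is stored in an environment variable named `MESO_TOKEN`; that name and the helper `read_meso_token` are hypothetical choices for illustration, not something MesoPy requires.

```python
import os

def read_meso_token(env_var="MESO_TOKEN"):
    """Return the API token stored in an environment variable.

    MESO_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token")
    return token
```

With the variable set, the query object could then be created with `m = Meso(read_meso_token())`.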

## Find Data Availability for Stations

Find which RAWS stations in Colorado have data available for the variables the fuel moisture model needs. Variable names come from [Synoptic](https://developers.synopticdata.com/mesonet/v2/api-variables/).

*Note:* with a free token, the Mesowest API only provides recent data (the most recent year, based on testing).

In [None]:
# Set up data query params
time_start = "202305010000"  # May 1 2023 00:00 in format yyyymmddHHMM
time_s2    = "202305010100"  # small time increment used to get station ids
time_end   = "202309302300"  # Sept 30 2023 23:00 in format yyyymmddHHMM
state = "CO"

# Variable names needed to run fmda
vars='air_temp,relative_humidity,precip_accum,fuel_moisture,wind_speed,solar_radiation'

We next query data for a small period of time to view which stations have complete observations of the variables listed above.

In [None]:
# Check whether the data already exists in outputs
if osp.exists(osp.join(outpath, "station_df_co.csv")):
    station_df=pd.read_csv(osp.join(outpath, "station_df_co.csv"))
    ids = station_df['STID'].tolist()
    print('Number of RAWS Stations: ',station_df.shape[0])
else:
    # Get one hour of data
    meso_obss = m.timeseries(start=time_start,end=time_s2, state=state, 
                                 showemptystations = '0', vars=vars)
    # Set up DF to view data availability
    var_names = vars.split(',')
    station_df = pd.DataFrame(columns=['STID'] + var_names,
                      index=range(0, len(meso_obss["STATION"])))
    # Loop through stations in returned data and add indicator of whether variable is present
    # Assign with .loc; chained indexing (df[col][i] = ...) may not write back to the DataFrame
    for i in range(0, station_df.shape[0]):
        station = meso_obss["STATION"][i]
        station_df.loc[i, "STID"] = station["STID"]
        for v in var_names:
            station_df.loc[i, v] = int(v in station["SENSOR_VARIABLES"])

    # Filter to stations that report every required variable
    station_df = station_df[
        (station_df["fuel_moisture"]==1) & 
        (station_df["relative_humidity"]==1) &
        (station_df["precip_accum"]==1) &
        (station_df["air_temp"]==1) &
        (station_df["wind_speed"]==1) &
        (station_df["solar_radiation"]==1)
    ]
    # Extract station IDs
    ids = station_df['STID'].tolist()
    # Print number of stations
    print('Number of RAWS Stations: ',station_df.shape[0])
    print(station_df.head())  # Preview; a bare expression inside an if/else block is not displayed

    # write output
    station_df.to_csv(osp.join(outpath, 'station_df_co.csv'), index=False)
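
The availability check above reads only the `STID` and `SENSOR_VARIABLES` fields of each station record. As a minimal sketch, the same filter can be tried on a hand-made response without calling the API. The station IDs and sensor lists below are invented for illustration, and `REQUIRED_VARS` and `stations_with_all_sensors` are hypothetical names.

```python
REQUIRED_VARS = ["air_temp", "relative_humidity", "precip_accum",
                 "fuel_moisture", "wind_speed", "solar_radiation"]

def stations_with_all_sensors(response, required=REQUIRED_VARS):
    """Return IDs of stations whose SENSOR_VARIABLES include every required name."""
    return [
        station["STID"]
        for station in response["STATION"]
        if all(v in station["SENSOR_VARIABLES"] for v in required)
    ]

# Invented response with the same nesting the notebook reads from
mock_response = {
    "STATION": [
        {"STID": "AAA01", "SENSOR_VARIABLES": {v: {} for v in REQUIRED_VARS}},
        {"STID": "BBB02", "SENSOR_VARIABLES": {"air_temp": {}, "wind_speed": {}}},
    ]
}
complete_ids = stations_with_all_sensors(mock_response)
print(complete_ids)
```

Only the first invented station carries all six sensors, so it is the only ID kept.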

## Get RAWS Observations

For the station IDs found above, retrieve data for the entire time period. *Note:* the time period needs to be broken into smaller chunks for the API to work.

In [None]:
dates = pd.date_range(time_start, time_end, freq="MS") # Break into months
dates = dates.append(pd.date_range(time_end, time_end))
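
To make the chunking concrete, the sketch below rebuilds the same month edges with the standard library and pairs consecutive edges into query windows. The helper `month_edges` is a hypothetical name, and the sketch assumes the start time falls on the first instant of a month, as it does here. Unlike the cell above, it skips appending the end time when that would duplicate the last edge.

```python
from datetime import datetime

def month_edges(start, end):
    """Return each month start in [start, end], followed by `end` itself."""
    edges = []
    current = datetime(start.year, start.month, 1)
    while current <= end:
        if current >= start:
            edges.append(current)
        # Step to the first day of the next month
        if current.month == 12:
            current = datetime(current.year + 1, 1, 1)
        else:
            current = datetime(current.year, current.month + 1, 1)
    if not edges or edges[-1] != end:
        edges.append(end)
    return edges

edges = month_edges(datetime(2023, 5, 1), datetime(2023, 9, 30, 23))
# Pair consecutive edges into (window start, window end) tuples
windows = list(zip(edges[:-1], edges[1:]))
for d1, d2 in windows:
    print(d1, "to", d2)
```

Adjacent windows share an edge, so an observation stamped exactly on a month boundary may be returned in two chunks if the API treats both endpoints as inclusive. That behavior is not checked here.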

In [None]:
# Define helper function to retrieve RAWS data between two timestamps
def get_raws(d1, d2):
    print('Gathering data from '+str(d1)+' to '+str(d2))
    # Query the API for all selected stations over this time window
    meso_ts = m.timeseries(d1.strftime("%Y%m%d%H%M"), d2.strftime("%Y%m%d%H%M"),
                           stid=ids, showemptystations='0', vars=vars)
    # Dictionary of formatted station data, keyed by station ID and window start date
    raws_dict = {}
    for i in range(0, len(meso_ts['STATION'])):
        raws1 = format_raws(meso_ts['STATION'][i])
        # Exclude stations with fewer than 28 days of hourly data (24*28 observations)
        if len(raws1['fm']) < int(24*28):
            print(f"Excluding {raws1['STID']}, nobs = {len(raws1['fm'])}")
        else:
            raws_dict[raws1['STID']+"_"+d1.strftime("%Y-%m-%d")] = raws1 # save to test dictionary
    print('Number of Stations: '+str(len(raws_dict)))
    return raws_dict
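
The completeness filter and key format in `get_raws` can be exercised on invented data. This sketch assumes only what the function above reads from each formatted station: an `STID` string and an `fm` sequence. The helper `keep_complete` and the mock stations are hypothetical.

```python
MIN_HOURS = 24 * 28  # 672 hourly observations, the cutoff used in get_raws

def keep_complete(stations, start_label, min_obs=MIN_HOURS):
    """Keep stations with at least `min_obs` fuel moisture values.

    Keys combine the station ID with the window start date.
    """
    kept = {}
    for st in stations:
        if len(st["fm"]) >= min_obs:
            kept[f"{st['STID']}_{start_label}"] = st
    return kept

# Invented stations: one above the cutoff, one below it
mock = [
    {"STID": "AAA01", "fm": [10.0] * 700},
    {"STID": "BBB02", "fm": [10.0] * 500},
]
kept = keep_complete(mock, "2023-05-01")
print(sorted(kept))
```

A 30-day month holds 720 hourly values, so the 672-value cutoff tolerates up to 48 missing hours in such a month.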

In [None]:
# Check whether the data already exists in outputs
if osp.exists(osp.join(outpath, "raws_dat.pickle")):
    raws_dict = pd.read_pickle(osp.join(outpath, "raws_dat.pickle"))  # same name as the else branch
else:
    # Get first time period, then join to it after
    raws_dict = get_raws(dates[0], dates[1])
    # Loop through other time periods and join
    for i in range(1, len(dates)-1):
        rtemp = get_raws(dates[i], dates[i+1])
        # Merge this period's stations into the running dictionary (dict |= requires Python 3.9+)
        raws_dict |= rtemp
        
    # Write Output
    with open(osp.join(outpath, 'raws_dat.pickle'), 'wb') as handle:
        pickle.dump(raws_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
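
Because each key combines a station ID with a window start date, the saved dictionary can be regrouped by station after loading. A minimal sketch follows, using an invented dictionary written to a temporary directory rather than the real output file. The helper `group_by_station` is a hypothetical name.

```python
import os
import pickle
import tempfile
from collections import defaultdict

def group_by_station(raws):
    """Map each station ID to the sorted window start dates found in the keys."""
    grouped = defaultdict(list)
    for key in raws:
        # Keys look like "<STID>_<YYYY-MM-DD>"; split on the last underscore
        stid, start = key.rsplit("_", 1)
        grouped[stid].append(start)
    return {stid: sorted(starts) for stid, starts in grouped.items()}

# Round-trip an invented dictionary through pickle
mock = {"AAA01_2023-05-01": {}, "AAA01_2023-06-01": {}, "BBB02_2023-05-01": {}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raws_dat.pickle")
    with open(path, "wb") as handle:
        pickle.dump(mock, handle, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "rb") as handle:
        loaded = pickle.load(handle)

by_station = group_by_station(loaded)
print(by_station)
```

A station that was excluded in some months simply has fewer start dates in its list, which gives a quick view of coverage per station.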