# Data Ingest of HRRR weather model data

### Intro

Weather predictors for the machine learning (ML) models of fuel moisture content (FMC) in this project are retrieved from the HRRR weather model. We use the HRRR 3D pressure-level product (`prs`), since it contains a larger set of variables than the other products and it is used internally in other areas of the `wrfxpy` project. Because the models also require rainfall, we use the 3-hour HRRR forecast and take the difference in accumulated precipitation between the 2-hour and 3-hour forecasts, which gives the rain that fell during that one hour.
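The rain predictor is a difference of run-total accumulations. The following is a minimal numpy sketch, assuming both accumulated-precipitation grids have already been read into arrays. The function name `hourly_rain_from_accum` and the clipping of small negative differences to zero are illustrative choices, not part of `retrieve_hrrr_api`.

```python
import numpy as np


def hourly_rain_from_accum(apcp_f02, apcp_f03):
    """Rain over forecast hours 2-3 as the difference of run-total accumulations.

    apcp_f02, apcp_f03: precipitation accumulated from initialization (f00) to
    forecast hours 2 and 3 (kg m^-2, i.e. mm of water), on the same grid.
    """
    rain = np.asarray(apcp_f03, dtype=float) - np.asarray(apcp_f02, dtype=float)
    # Assumption: tiny negative differences (e.g. from GRIB packing) mean no rain
    return np.clip(rain, 0.0, None)


# Toy 2x2 grids of accumulated precipitation (made-up values, mm)
acc_f02 = np.array([[0.0, 1.2], [3.5, 0.4]])
acc_f03 = np.array([[0.0, 2.0], [3.5, 0.39]])
print(hourly_rain_from_accum(acc_f02, acc_f03))
```

Later in this notebook the `:APCP:surface:2-3 hour acc fcst` field is read directly from the f03 file, which should hold the same one-hour total without computing the difference by hand.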

There are 2 main uses for the HRRR weather data:

1. For constructing training data sets
2. For forecasting with a trained model over a spatial domain

This notebook demonstrates reading HRRR data and calculating a set of predictors from it over a spatial bounding box.

### Code

A configuration file controls data ingest. For automated processes, the code looks for a JSON configuration file that depends on the use case:

* For building training data, `../etc/training_data_config.json`
* For deploying the model on a grid, `../etc/forecast_config.json`
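As an illustration of what such a file might hold, here is a small sketch that parses a JSON config with the standard library. The key names are hypothetical: they mirror the variables defined in the Setup section of this notebook, not necessarily the schema the project's config files actually use.

```python
import json
from datetime import datetime

# Hypothetical config contents; key names mirror this notebook's Setup variables
EXAMPLE_CONFIG = """
{
  "bbox": [40, -105, 45, -100],
  "start": "2024-11-01T20:00",
  "end": "2024-11-02T01:00",
  "features_list": ["Ed", "Ew", "rain", "wind", "solar", "elev", "lat", "lon"]
}
"""


def parse_ingest_config(text):
    """Parse a JSON ingest config and convert the ISO date strings to datetimes."""
    cfg = json.loads(text)
    cfg["start"] = datetime.fromisoformat(cfg["start"])
    cfg["end"] = datetime.fromisoformat(cfg["end"])
    if cfg["end"] < cfg["start"]:
        raise ValueError("end must not precede start")
    return cfg


cfg = parse_ingest_config(EXAMPLE_CONFIG)
print(cfg["start"], cfg["end"], cfg["bbox"])
```

For a real file, the text would be read from disk first (for example with `open(path).read()`), or the file object passed to `json.load` before the same date conversion.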

Atmospheric weather predictors are retrieved with the software package `Herbie`. The module `retrieve_hrrr_api.py` contains functions and metadata that direct data ingest. A list of predictors is provided to control which data are downloaded. Some of these predictors are derived features, such as equilibrium moisture content, which is calculated from relative humidity and air temperature. The module also contains hard-coded objects with the related metadata, such as the regex-formatted search string used for each variable.

## References

For more info on HRRR variables and their definitions, see the [HRRR inventory](https://www.nco.ncep.noaa.gov/pmb/products/hrrr/hrrr.t00z.wrfprsf02.grib2.shtml) for the pressure-level product, forecast hours f02-f38.

For more info on the Python package, see Brian Blaylock's `Herbie` [Python package](https://github.com/blaylockbk/Herbie).

## Setup

User definitions. In other areas of this project, these values come from config files.

In [None]:
import sys
from datetime import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
from herbie import FastHerbie, Herbie, paint, wgrib2
from herbie.toolbox import EasyMap, ccrs, pc

# Make the project's source modules (utils, ingest) importable
sys.path.append("../src")
from utils import Dict, read_yml
import ingest.retrieve_hrrr_api as ih

In [None]:
bbox = [40, -105, 45, -100]  # [min_lat, min_lon, max_lat, max_lon]
start = datetime(2024, 11, 1, 20)
end = datetime(2024, 11, 2, 1)
forecast_step = 3  # Do not change for now, code depends on it
features_list = ['Ed', 'Ew', 'rain', 'wind', 'solar', 'elev', 'lat', 'lon']

print(f"Start Date of retrieval: {start}")
print(f"End Date of retrieval: {end}")
print(f"Spatial Domain: {bbox}")
print(f"Required Features: {features_list}")

### Read Data

`FastHerbie` from `herbie` sets up access to the HRRR files for every requested time, but data are only downloaded later, when specific fields are requested. The available fields can be listed with the `inventory()` method. *Note:* the inventory lists each field once per requested time step, so its rows repeat for every date in the range.

In [None]:
# Create a range of dates
dates = pd.date_range(
    start = start,
    end = end,
    freq="1h"
)

In [None]:
dates.shape

In [None]:
FH = FastHerbie(
    dates,
    model="hrrr",
    product="prs",  # 3D pressure-level product
    fxx=[forecast_step],  # 3-hour forecast only
)
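Each entry in `dates` is a model initialization time, and a 3-hour forecast describes conditions 3 hours after it. A small standard-library sketch of that bookkeeping follows; `valid_times` is a hypothetical helper, not part of Herbie. Datasets read through cfgrib also carry a `valid_time` coordinate holding the same information.

```python
from datetime import datetime, timedelta


def valid_times(init_times, fxx):
    """Pair each model initialization time with the time its fxx-hour forecast is valid."""
    return [(t, t + timedelta(hours=fxx)) for t in init_times]


# Same hourly initialization times as `dates`, built without pandas
first_init = datetime(2024, 11, 1, 20)
inits = [first_init + timedelta(hours=h) for h in range(6)]
for init, valid in valid_times(inits, fxx=3):
    print(f"init {init:%Y-%m-%d %H}z -> valid {valid:%Y-%m-%d %H}z")
```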

In [None]:
inv = FH.inventory()
inv

In [None]:
import importlib

# Reload the module so edits to retrieve_hrrr_api.py take effect without restarting the kernel
ih = importlib.reload(ih)

In [None]:
search_strings = ih.features_to_searchstr(features_list)
search_strings
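The `search_strings` result is indexed by level (e.g. `'2m'`), so one regex can fetch every variable on that level in a single request. The sketch below shows one way such grouping could work. The feature-to-pattern table and the function `features_to_searchstr_sketch` are hypothetical; the project's real mapping lives in `retrieve_hrrr_api` and may use different patterns and level keys.

```python
# Hypothetical feature -> [(level key, GRIB search pattern)] table, for illustration only
FEATURE_PATTERNS = {
    "Ed": [("2m", ":TMP:2 m"), ("2m", ":RH:2 m")],
    "Ew": [("2m", ":TMP:2 m"), ("2m", ":RH:2 m")],
    "wind": [("10m", ":UGRD:10 m"), ("10m", ":VGRD:10 m")],
    "rain": [("surface", ":APCP:surface:2-3 hour acc")],
    "solar": [("surface", ":DSWRF:surface")],
}


def features_to_searchstr_sketch(features):
    """Group search patterns by level and join each group into one regex alternation."""
    groups = {}
    for feat in features:
        # Features absent from the table (here elev, lat, lon) are skipped
        for level, pattern in FEATURE_PATTERNS.get(feat, []):
            patterns = groups.setdefault(level, [])
            if pattern not in patterns:  # Ed and Ew share the same inputs
                patterns.append(pattern)
    return {level: "|".join(pats) for level, pats in groups.items()}


searches = features_to_searchstr_sketch(
    ["Ed", "Ew", "rain", "wind", "solar", "elev", "lat", "lon"]
)
print(searches)
```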

### Spatial Subset

Brian Blaylock recommends downloading the data and spatially subsetting it with Herbie's wrapper for `wgrib2`, then reading the subsetted files into memory.
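The easy step to get wrong in this workflow is the coordinate order: this notebook writes `bbox` as `[min_lat, min_lon, max_lat, max_lon]`, while the extent handed to `wgrib2.region` is `(min_lon, max_lon, min_lat, max_lat)`. A small validating helper makes the conversion explicit; `bbox_to_extent` is hypothetical and not part of `retrieve_hrrr_api`.

```python
def bbox_to_extent(bbox):
    """Convert [min_lat, min_lon, max_lat, max_lon] to (min_lon, max_lon, min_lat, max_lat)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    # Catch swapped corners before they silently produce an empty subset
    if not (min_lat < max_lat and min_lon < max_lon):
        raise ValueError(f"bbox corners out of order: {bbox}")
    if not (-90 <= min_lat and max_lat <= 90):
        raise ValueError(f"latitudes out of range: {bbox}")
    return (min_lon, max_lon, min_lat, max_lat)


print(bbox_to_extent([40, -105, 45, -100]))  # (-105, -100, 40, 45)
```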

In [None]:
bbox

In [None]:
def get_fh_layer(FH, search_string, remove_grib=True, bbox=None, subset_naming="myRegion"):
    """
    Get HRRR data from a FastHerbie object given a regex search string.
    The search string groups variables by layer/level.
    An optional bounding box spatially subsets the data.

    Arguments:
        - FH: FastHerbie object, defined with start and stop times
        - search_string: str, based on regex. see utility function features_to_searchstr
        - remove_grib: bool, whether to delete the downloaded grib files after reading them
        - bbox: list, optional bounding box [min_lat, min_lon, max_lat, max_lon] to subset region
        - subset_naming: str, name passed to wgrib2.region for the subsetted files

    Notes: As of Dec 18, 2024, Brian Blaylock recommends downloading data and using
        wgrib2 to spatially subset the data

    Returns:
        xarray Dataset, optionally subsetted to a bounding box
    """

    if bbox is None:
        print("Returning data for entire CONUS")
        ds = FH.xarray(search_string, remove_grib=remove_grib)
    else:
        print(f"Subsetting data to region within bbox: {bbox}")
        print("Downloading data to run wgrib2")

        files = FH.download(search_string)
        # Deterministic file order; the final time order is set by ds.sortby("time") below
        files = sorted(files)

        # Reorder bbox to match format (min_lon, max_lon, min_lat, max_lat)
        extent = (bbox[1], bbox[3], bbox[0], bbox[2])
        subset_files = [wgrib2.region(file, extent, name=subset_naming) for file in files]

        # Open the subsetted files (PosixPath converted to str) as one combined dataset
        ds = xr.open_mfdataset(
            [str(path) for path in subset_files],
            engine="cfgrib",
            concat_dim="time",  # one file per model initialization time
            combine="nested",
        )
        ds = ds.sortby("time")

        if remove_grib:
            # open_mfdataset reads lazily, so load into memory before deleting the source files
            ds = ds.load()
            for file in list(files) + list(subset_files):
                if file.exists():  # Check that the file exists before deleting it
                    file.unlink()

    return ds

In [None]:
ss = search_strings['2m']

ds1 = get_fh_layer(FH, ss)

In [None]:
ds2 = get_fh_layer(FH, ss, remove_grib=False, bbox=bbox)

In [None]:
# Get the HRRR map projection (CRS) and surface terrain from a single surface file,
# assuming the grid does not change over time
H = Herbie("2023-08-01", product="sfc")
ds_hgt = H.xarray("(?:HGT|LAND):surface")
crs = ds_hgt.herbie.crs

In [None]:
# Extent (min_lon, max_lon, min_lat, max_lat) of the subset region, used to draw its outline
extent = (bbox[1], bbox[3], bbox[0], bbox[2])

ax = EasyMap(crs=crs).STATES(color="k").ax
ax.pcolormesh(ds_hgt.longitude, ds_hgt.latitude, ds_hgt.orog, cmap=paint.LandGreen.cmap, alpha=0.5, transform=pc)
ax.pcolormesh(ds2.longitude, ds2.latitude, ds2.t2m.isel(time=0), transform=pc)

ax.gridlines(xlocs=extent[:2], ylocs=extent[2:], color="k", ls="--", draw_labels=True)

Data fields are accessed through the `.xarray()` method. By default (`remove_grib=True`) it downloads the requested GRIB messages, reads them into memory as an xarray object, and then deletes the downloaded files. Variables are selected with regex search strings that specify the variable name (e.g. air temperature, `TMP`), the level (e.g. `2 m` or `surface`), and, where needed, the forecast hour relative to the model initialization time f00 (e.g. hour 3, as used here). The `retrieve_hrrr_api` module in this project stores a dataframe with the names and metadata of the variables considered for modeling FMC.
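To see why a search string such as `RH:2 m|TMP:2 m` selects exactly the intended fields, here is a sketch of regex matching against inventory entries. The entries are illustrative lines written in the `:VARIABLE:level:forecast` style of a GRIB inventory, not copied from a real HRRR file, and `select_fields` is a hypothetical stand-in for Herbie's internal search.

```python
import re

# Illustrative inventory entries (not taken from a real HRRR file)
INVENTORY = [
    ":TMP:2 m above ground:3 hour fcst",
    ":SPFH:2 m above ground:3 hour fcst",
    ":RH:2 m above ground:3 hour fcst",
    ":UGRD:10 m above ground:3 hour fcst",
    ":APCP:surface:0-3 hour acc fcst",
    ":APCP:surface:2-3 hour acc fcst",
]


def select_fields(search, inventory):
    """Return the inventory entries that the regex `search` matches anywhere in the line."""
    pattern = re.compile(search)
    return [line for line in inventory if pattern.search(line)]


print(select_fields("RH:2 m|TMP:2 m", INVENTORY))
print(select_fields(":APCP:surface:2-3 hour acc fcst", INVENTORY))
```

Because the match can occur anywhere in a line, a bare pattern like `TMP` would pick up temperature at every level in the pressure-level file, which is why the level is part of each search string.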

In [None]:
# Show HRRR naming dataframe
ih.hrrr_name_df

## Getting a Set of Predictors

We will demonstrate retrieval of a restricted set of predictors.

Equilibrium moisture content is calculated from 2 m relative humidity (`RH`) and air temperature (`TMP`).
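For reference, one widely used pair of empirical formulas for the drying (`Ed`) and wetting (`Ew`) equilibrium moisture contents comes from the Canadian Fine Fuel Moisture Code (Van Wagner 1987). The sketch below implements those equations. It is an assumption that this matches `calc_eq`; the project's function may use a different variant or different constants.

```python
import math


def equilibrium_moisture(rh, temp_k):
    """Drying (Ed) and wetting (Ew) equilibrium moisture content, in percent.

    Canadian FFMC equilibrium equations. rh: relative humidity in percent;
    temp_k: air temperature in kelvin (the unit of HRRR 2 m temperature).
    """
    t_c = temp_k - 273.15
    # Temperature correction term shared by both equations
    temp_term = 0.18 * (21.1 - t_c) * (1.0 - math.exp(-0.115 * rh))
    ed = 0.942 * rh**0.679 + 11.0 * math.exp((rh - 100.0) / 10.0) + temp_term
    ew = 0.618 * rh**0.753 + 10.0 * math.exp((rh - 100.0) / 10.0) + temp_term
    return ed, ew


ed, ew = equilibrium_moisture(rh=50.0, temp_k=293.15)
print(f"Ed = {ed:.2f}%, Ew = {ew:.2f}%")
```

Drying equilibrium is above wetting equilibrium at the same conditions, so a fuel with moisture between the two is assumed to neither gain nor lose water.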

In [None]:
features_list = ["Ed", "rain", "wind"]

In [None]:
ds = FH.xarray("RH:2 m|TMP:2 m", remove_grib=False)
ds

In [None]:
from ingest.retrieve_hrrr_api import calc_eq
calc_eq(ds)

In [None]:
'time' in ds.dims

In [None]:
inv = FH.inventory()

In [None]:
inv[inv.variable == "WIND"]

In [None]:
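# Canopy water, snowfall, and snow depth; a separate name keeps the RH/TMP dataset `ds` available for plotting
ds_snow = FH.xarray("CNWAT:surface|ASNOW:surface|:SNOD:surface:3 hour fcst")

In [None]:
ds_snow

In [None]:
# Extract values at the grid points nearest to the given coordinates
ds_snow.herbie.pick_points(
    pd.DataFrame({"latitude": [40.76, 40], "longitude": [-111.876183, -111]})
)

In [None]:
type(ds_snow)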
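`pick_points` returns values at grid cells near the requested coordinates. On HRRR's curvilinear grid, where latitude and longitude are 2D arrays, the nearest-point idea can be sketched as a brute-force great-circle (haversine) search. `nearest_grid_index` is a hypothetical helper; Herbie's own implementation may use a different nearest-neighbor method and offers its own options.

```python
import numpy as np


def nearest_grid_index(lat2d, lon2d, lat, lon):
    """(row, col) of the grid point closest to (lat, lon) by great-circle distance."""
    lat1, lon1 = np.radians(lat2d), np.radians(lon2d)
    lat0, lon0 = np.radians(lat), np.radians(lon)
    # Haversine formula; the Earth radius is omitted since only the argmin matters
    a = (np.sin((lat1 - lat0) / 2) ** 2
         + np.cos(lat0) * np.cos(lat1) * np.sin((lon1 - lon0) / 2) ** 2)
    dist = 2 * np.arcsin(np.sqrt(a))  # central angle in radians
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    return int(row), int(col)


# Toy 3x3 grid standing in for HRRR's 2D latitude/longitude arrays
lats, lons = np.meshgrid([40.0, 40.5, 41.0], [-112.0, -111.5, -111.0], indexing="ij")
print(nearest_grid_index(lats, lons, 40.6, -111.9))  # (1, 0)
```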

In [None]:
ax = EasyMap("110m", figsize=[15, 9], crs=ds.herbie.crs).STATES().ax

# Equilibrium moisture content; plot a single time step if the data have a time dimension
x = ds["Ed"]
if 'time' in x.dims:
    x = x.isel(time=4)

p = ax.pcolormesh(
    ds.longitude,
    ds.latitude,
    x,
    transform=pc,
    cmap=paint.NWSRelativeHumidity.cmap,
)

plt.colorbar(
    p,
    ax=ax,
    orientation="horizontal",
    pad=0.01,
    shrink=0.8,
    label="Equilibrium Moisture Content",
)
ax.set_title("Equilibrium Moisture Content (Ed)", size=18)

In [None]:
paint.NWSWindSpeed

In [None]:
ax = EasyMap("50m", figsize=[15, 9], crs=ds.herbie.crs).STATES().ax
p = ax.pcolormesh(
    ds.longitude,
    ds.latitude,
    ds.t2m.isel(time=0),
    transform=pc,
    cmap=None,
)

plt.colorbar(
    p,
    ax=ax,
    orientation="horizontal",
    pad=0.01,
    shrink=0.8,
    label="2 m air temperature (K)",
)

In [None]:
ds = FH.xarray(":TSOIL:")
ds

In [None]:
ds = FH.xarray("CNWAT:surface|:TSOIL:")

In [None]:
ds

In [None]:
ds2 = FH.xarray(":APCP:surface:2-3 hour acc fcst")
ds2

In [None]:
ax = EasyMap("50m", figsize=[15, 9], crs=ds2.herbie.crs).STATES().ax
p = ax.pcolormesh(
    ds2.longitude,
    ds2.latitude,
    ds2.tp.isel(time=0),
    transform=pc,
    cmap=paint.NWSPrecipitation.cmap,
)

plt.colorbar(
    p,
    ax=ax,
    orientation="horizontal",
    pad=0.01,
    shrink=0.8,
    label="Precipitation, forecast hours 2-3 (kg m$^{-2}$)",
)