# Data Ingest of HRRR weather model data

Retrieval of HRRR weather model data is done with the software package `Herbie`.

A configuration file controls data ingest. For automated processes, the code looks for a JSON configuration file that depends on the use case:

* For building training data, `../etc/training_data_config.json`
* For deploying the model on a grid, `../etc/forecast_config.json`
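
As a minimal sketch of how such a configuration file might be read, the example below writes a small JSON file and loads it back. The key names (`bbox`, `start`, `end`, `features_list`) and the helper `load_ingest_config` are assumptions made for illustration; the actual schema of the project's config files may differ.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def load_ingest_config(path):
    """Hypothetical helper: read a JSON config and parse its date strings."""
    with open(path) as f:
        cfg = json.load(f)
    # Assumed keys: "start" and "end" are stored as ISO 8601 strings
    cfg["start"] = datetime.fromisoformat(cfg["start"])
    cfg["end"] = datetime.fromisoformat(cfg["end"])
    return cfg


# Write an example config to a temporary file so the sketch runs standalone
example = {
    "bbox": [40, -105, 45, -100],
    "start": "2024-06-01T00:00:00",
    "end": "2024-06-01T05:00:00",
    "features_list": ["Ed", "Ew", "rain", "wind", "solar", "elev", "lat", "lon"],
}
config_path = Path(tempfile.mkdtemp()) / "training_data_config.json"
config_path.write_text(json.dumps(example, indent=2))

cfg = load_ingest_config(config_path)
print(cfg["start"], cfg["end"], cfg["bbox"])
```

Storing dates as ISO strings keeps the file readable and lets the loader hand back `datetime` objects.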

To provide a complete set of predictors that could be useful for FMC modeling, and for compatibility with other areas of `wrfxpy`, we use the 3D pressure model product (`prs`) from HRRR. Because modeling requires rainfall, we use the 3-hour forecast from HRRR and take rainfall as the difference in accumulated precipitation between the 2-hour and 3-hour forecasts.
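
The arithmetic behind the rainfall predictor can be sketched as follows. The arrays are made-up values, not HRRR output, and the clipping step is an added safeguard rather than something the source prescribes.

```python
import numpy as np

# Illustrative accumulated precipitation [kg/m^2] since model initialization
accum_2h = np.array([[0.0, 1.2], [0.5, 3.0]])  # total through forecast hour 2
accum_3h = np.array([[0.0, 1.5], [0.5, 4.1]])  # total through forecast hour 3

# Rain that fell during the final hour of the 3-hour forecast
hourly_rain = accum_3h - accum_2h

# Guard against tiny negative values from rounding in the source data
hourly_rain = np.clip(hourly_rain, 0.0, None)
print(hourly_rain)
```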

The module `retrieve_hrrr_api.py` contains functions and metadata for directing data ingest. A list of predictors is provided to control which data are downloaded. Some of these predictors are derived features, such as equilibrium moisture content, which is calculated from relative humidity and air temperature. The module includes some hard-coded objects that hold the related metadata.
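
To illustrate what a derived feature looks like, the sketch below computes drying (`Ed`) and wetting (`Ew`) equilibrium moisture content from relative humidity and air temperature. The coefficients follow an empirical formulation commonly used in fuel moisture modeling. That the module uses exactly these equations is an assumption, and the function name `equilibrium_moisture` is hypothetical.

```python
import numpy as np


def equilibrium_moisture(rh, temp_k):
    """Return (Ed, Ew) in percent from RH [%] and air temperature [K].

    Assumed empirical formulation; the project's module may differ.
    """
    rh = np.asarray(rh, dtype=float)
    temp_k = np.asarray(temp_k, dtype=float)
    # Temperature correction shared by both curves (reference 21.1 C)
    temp_term = 0.18 * (21.1 + 273.15 - temp_k) * (1.0 - np.exp(-0.115 * rh))
    ed = 0.924 * rh**0.679 + 0.000499 * np.exp(0.1 * rh) + temp_term
    ew = 0.618 * rh**0.753 + 0.000454 * np.exp(0.1 * rh) + temp_term
    return ed, ew


# Example inputs: 50% relative humidity at 300 K
ed, ew = equilibrium_moisture(50.0, 300.0)
print(float(ed), float(ew))
```

With these equations the drying equilibrium is higher than the wetting equilibrium for the example inputs.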

## References

For more info on HRRR data bands and definitions, see [HRRR inventory](https://www.nco.ncep.noaa.gov/pmb/products/hrrr/hrrr.t00z.wrfprsf02.grib2.shtml) for pressure model f02-f38 forecast hours.

For more info on the Python package, see Brian Blaylock's `Herbie` [Python package](https://github.com/blaylockbk/Herbie).

## Setup

User definitions. In other areas of this project, these come from config files.

In [None]:
import matplotlib.pyplot as plt
import herbie
from herbie import FastHerbie
from datetime import datetime
import sys
import pandas as pd
import numpy as np
sys.path.append("../src")
import ingest.retrieve_hrrr_api as ih

In [None]:
bbox = [40, -105, 45, -100]
start = datetime(2024, 6, 1, 0)
end = datetime(2024, 6, 1, 5)
forecast_step = 3 # Do not change for now, code depends on it
features_list = ['Ed', 'Ew', 'rain', 'wind', 'solar', 'elev', 'lat', 'lon']

print(f"Start Date of retrieval: {start}")
print(f"End Date of retrieval: {end}")
print(f"Spatial Domain: {bbox}")
print(f"Required Features: {features_list}")

In [None]:
# Create a range of dates
dates = pd.date_range(
    start = start,
    end = end,
    freq="1h"
)

In [None]:
ih.feature_df

### Read Data

`FastHerbie` from `herbie` sets up a connection for reading; only the data requested later is actually downloaded.

In [None]:
FH = FastHerbie(
    dates, 
    model="hrrr", 
    product="prs",
    fxx=range(3, 4)
)

In [None]:
inv = FH.inventory()
inv

In [None]:
inv[(inv.variable == "APCP")]
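
The inventory is a pandas DataFrame, so it can be narrowed with ordinary boolean masks or string matching. The sketch below uses a small mock table to show how a search string selects one record. The rows are invented, and the `search_this` column is built here as an assumption about how the real inventory text is laid out.

```python
import pandas as pd

# Mock rows imitating a GRIB inventory; values are illustrative only
mock_inv = pd.DataFrame({
    "variable": ["TMP", "APCP", "APCP", "RH"],
    "level": ["2 m above ground", "surface", "surface", "2 m above ground"],
    "forecast_time": ["3 hour fcst", "0-3 hour acc fcst",
                      "2-3 hour acc fcst", "3 hour fcst"],
})

# Join the fields with ":" the way a search string is written
mock_inv["search_this"] = (
    ":" + mock_inv["variable"]
    + ":" + mock_inv["level"]
    + ":" + mock_inv["forecast_time"]
)

# Keep only the 2-3 hour accumulated precipitation record
search = ":APCP:surface:2-3 hour acc"
match = mock_inv[mock_inv["search_this"].str.contains(search, regex=True)]
print(match)
```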

In [None]:
name_df_hrrr = pd.DataFrame({
    'band_prs': [616, 620, 624, 629, 661, (561, 563, 565, 567, 569, 571, 573, 575, 577), (560, 562, 564, 566, 568, 570, 572, 574, 576), 612, 643, 610, 615, 613, 607, 639, 640],
    'hrrr_name': ['TMP', 'RH', "WIND", 'APCP',
                  'DSWRF', 'SOILW', "TSOIL", 'CNWAT', 'GFLUX', "ASNOW", "SNOD", "WEASD", "PRES", "SFCR", "FRICV"],
    'herbie_str': ["TMP:2 m", "RH:2 m", "WIND:10 m", ":APCP:surface:2-3 hour acc", "DSWRF:surface", ":SOILW:", 
                   ":TSOIL:", "CNWAT:surface", "GFLUX:surface", "ASNOW:surface", ":SNOD:surface:3 hour fcst", ":WEASD:surface:2-3 hour acc", 
                   ":PRES:surface:3 hour fcst", ":SFCR:surface:3 hour fcst", ":FRICV:surface:3 hour fcst"],
    'xarray_name': ["t2m", "r2", "si10", "tp", "dswrf", "soilw", "tsoil", "cnwat", "gflux", "unknown", "sde", "sdwe", "sp", "fsr", "fricv"],
    'fmda_name': ["temp", "rh", "wind", "precip_accum",
                 "solar", "soilm", "soilt", "canopyw", "groundflux", "asnow", "snod", "weasd", "pres", "rough", "fricv"],
    'descr': ['2m Temperature [K]', 
              '2m Relative Humidity [%]', 
              '10m Wind Speed [m/s]',
              'surface Total Precipitation [kg/m^2]',
              'surface Downward Short-Wave Radiation Flux [W/m^2]',
              'Volumetric Soil Moisture Content [Fraction]',
              'Soil Temperature [K]',
              'Plant Canopy Surface Water [kg/m^2]',
              'surface Ground Heat Flux [W/m^2]',
              'Total Snowfall [m]',
              'Snow Depth [m]',
              'Water Equivalent of Accumulated Snow Depth [kg/m^2]',
              'Surface air pressure [Pa]',
              'Surface Roughness [m]',
              'Frictional Velocity [m/s]'
             ],
    'notes': ["", "", "", "", "", "9 different depths, from 0-3m below ground", "9 different depths, from 0-3m below ground", "", "", "0-3 hr accumulated", "", 
              "0-3 hr accumulated, listed as `deprecated` in gribs", "", "", ""]
})
name_df_hrrr
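
A lookup table like this can turn a list of requested predictors into a single search string. The sketch below works on a four-row stand-in table. The helper `build_search_string` is hypothetical, and joining with `|` assumes that search strings are treated as regular expressions.

```python
import pandas as pd

# Small stand-in for the lookup table, with a subset of its rows
lookup = pd.DataFrame({
    "fmda_name": ["temp", "rh", "wind", "solar"],
    "herbie_str": ["TMP:2 m", "RH:2 m", "WIND:10 m", "DSWRF:surface"],
})


def build_search_string(names, table):
    """Hypothetical helper: join search strings for the requested names."""
    rows = table[table["fmda_name"].isin(names)]
    missing = set(names) - set(rows["fmda_name"])
    if missing:
        raise KeyError(f"No search string for: {sorted(missing)}")
    # Alternation lets one request match any of the selected fields
    return "|".join(rows["herbie_str"])


search = build_search_string(["temp", "rh"], lookup)
print(search)
```

Raising on unknown names surfaces typos in a feature list before any download is attempted.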

In [None]:
ds = FH.xarray(":APCP:surface:2-3 hour acc")

In [None]:
ds

In [None]:
ds.tp.max()

In [None]:
from utils import hash_ndarray

In [None]:
hash_ndarray(ds.tp.values)
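
Hashing the array values gives a compact fingerprint for checking that a retrieval is reproducible. How the project's `hash_ndarray` works is not shown here; the sketch below is one plausible approach using `hashlib`, with the hypothetical name `hash_array_sketch`.

```python
import hashlib

import numpy as np


def hash_array_sketch(arr):
    """Hypothetical helper: hex digest of an array's dtype, shape, and bytes."""
    arr = np.ascontiguousarray(arr)
    h = hashlib.sha256()
    # Include dtype and shape so reshaped or recast data hashes differently
    h.update(str(arr.dtype).encode())
    h.update(str(arr.shape).encode())
    h.update(arr.tobytes())
    return h.hexdigest()


a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = a.copy()
c = a.copy()
c[0, 0] = 1.0  # change a single value

print(hash_array_sketch(a) == hash_array_sketch(b))
print(hash_array_sketch(a) == hash_array_sketch(c))
```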

In [None]:
ds.tp.min()