# GRIB2 Reference Recipe for HRRR (High-Resolution Rapid Refresh)

This example notebook uses the {class}`pangeo_forge_recipes.recipes.ReferenceRecipe` to create a reference index of the HRRR dataset. Because it is a kerchunk-based reference recipe, none of the source data files are transferred; only the kerchunk `.json` index is written to the target.

For more background, see [this blog post](https://medium.com/pangeo/fake-it-until-you-make-it-reading-goes-netcdf4-data-on-aws-s3-as-zarr-for-rapid-data-access-61e33f8fe685).

HRRR is an atmospheric model produced by NOAA in near real time. Its output is stored in GRIB2, a format widely used in weather forecasting and modeling. By using the kerchunk-based {class}`pangeo_forge_recipes.recipes.ReferenceRecipe`, we can read this dataset as if it were Zarr, without converting or copying the GRIB2 files.
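
To make the idea concrete: a kerchunk reference file is essentially a JSON mapping from Zarr chunk keys to `[url, offset, length]` byte ranges in the original files. The sketch below is illustrative only. The URL, the offsets, and the `resolve` helper are made up, and an in-memory dictionary stands in for S3.

```python
import json

# Stand-in for remote storage: one "GRIB2 file" held in memory.
# The URL and its contents are invented for illustration.
fake_store = {
    "s3://example-bucket/example.grib2": b"HEADER" + b"AAAA" + b"BBBB" + b"TRAILER",
}

# A version-1 style reference set: inline metadata is stored as strings,
# chunks are stored as [url, offset, length] triples.
references = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "2t/0.0": ["s3://example-bucket/example.grib2", 6, 4],
        "2t/0.1": ["s3://example-bucket/example.grib2", 10, 4],
    },
}


def resolve(key):
    """Return the bytes a reader would receive for `key` (hypothetical helper)."""
    ref = references["refs"][key]
    if isinstance(ref, str):
        # Inline content, e.g. Zarr metadata
        return ref.encode()
    url, offset, length = ref
    # A real reader would issue a byte-range request here instead of slicing
    return fake_store[url][offset:offset + length]


chunk_a = resolve("2t/0.0")
chunk_b = resolve("2t/0.1")
group_meta = json.loads(resolve(".zgroup"))
```

Because only these small triples are stored, the index stays tiny compared with the data it points to.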


## Define the FilePattern

Here we will select some files from the HRRR data archive on the [AWS open data registry](https://registry.opendata.aws/noaa-hrrr-pds/). 

In [None]:

import fsspec 

# Create an anonymous S3 filesystem for reading the public HRRR bucket
fs = fsspec.filesystem('s3', anon=True, skip_instance_cache=True)

# retrieve list of available days in archive
days_available = fs.glob('s3://noaa-hrrr-bdp-pds/hrrr.*')

# Read HRRR GRIB2 files from latest day
files = fs.glob(f's3://{days_available[-1]}/conus/*wrfsfcf01.grib2')

# Prepend the s3:// prefix to each path and sort the file list
files = sorted(['s3://'+f for f in files])
files
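
The glob above discovers files dynamically. If you already know which run you want, the same paths could be built directly. This sketch assumes the archive follows the layout `hrrr.YYYYMMDD/conus/hrrr.tHHz.wrfsfcfFF.grib2`, which is an assumption about the naming convention that you should confirm against the output of the glob. The `hrrr_url` helper and the example date are hypothetical.

```python
from datetime import date


def hrrr_url(day, cycle_hour, forecast_hour=1, product="wrfsfc"):
    """Build an S3 URL for one HRRR CONUS file (hypothetical helper)."""
    return (
        f"s3://noaa-hrrr-bdp-pds/hrrr.{day:%Y%m%d}/conus/"
        f"hrrr.t{cycle_hour:02d}z.{product}f{forecast_hour:02d}.grib2"
    )


# One 1-hour forecast per hourly model cycle for an arbitrary example day
example_day = date(2022, 1, 1)
urls = [hrrr_url(example_day, hour) for hour in range(24)]
print(urls[0])
```

Zero-padding the cycle and forecast hours keeps the list in chronological order when sorted as strings.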

Examine one of the files with xarray.


In [None]:
%%time
import fsspec
import xarray as xr

ex_file = fsspec.open_local("simplecache::"+files[0], s3={'anon': True}, filecache={'cache_storage':'/tmp/files'})
ds = xr.open_dataset(ex_file, engine="cfgrib", filter_by_keys={'stepType': 'instant','typeOfLevel': 'heightAboveGround'})
ds

Opening a single GRIB2 file this way took over 1.5 minutes, since the whole file has to be downloaded and cached locally before it can be read.


## Define the Recipe


As a first step in our recipe, we create a {doc}`File Pattern <../../recipe_user_guide/file_patterns>` to represent the input files.
Since we already have a list of inputs, we can use the `pattern_from_file_sequence` convenience function.


In [None]:
from pangeo_forge_recipes.patterns import pattern_from_file_sequence
pattern = pattern_from_file_sequence(files, 'step', file_type='grib')


In [None]:
pattern
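
Conceptually, a file-sequence pattern assigns each file a position along the concatenation dimension (here `step`). The plain-Python sketch below mimics that mapping. It does not use or reproduce the `pangeo_forge_recipes` implementation, and the file names are placeholders.

```python
# Placeholder inputs standing in for the `files` list built earlier
example_files = [
    "s3://example-bucket/hrrr.t00z.wrfsfcf01.grib2",
    "s3://example-bucket/hrrr.t01z.wrfsfcf01.grib2",
    "s3://example-bucket/hrrr.t02z.wrfsfcf01.grib2",
]

concat_dim = "step"

# Map (dimension name, position) -> file URL
index_to_file = {
    (concat_dim, position): url for position, url in enumerate(example_files)
}

for index, url in index_to_file.items():
    print(index, url)
```

The order of the input list therefore determines the order of the data along the concatenation dimension, which is why the file list is sorted before the pattern is built.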

When constructing the `ReferenceRecipe`, we can pass keyword arguments such as `storage_options` (used to access the source files) and `grib_filters` (used to select which GRIB messages to index).

In [None]:
from pangeo_forge_recipes.recipes import ReferenceRecipe

data_filter = {'typeOfLevel': 'heightAboveGround', 'level': [2, 10]}
storage_options = {"anon": True}

recipe = ReferenceRecipe(pattern, storage_options=storage_options, grib_filters=data_filter)

recipe
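
The `grib_filters` dictionary selects which GRIB messages end up in the reference set. One plausible reading, assumed here rather than taken from the library source, is that a scalar value must match exactly while a list matches any of its members. The message metadata and the `matches` helper below are invented for illustration.

```python
def matches(message, filters):
    """Return True if `message` satisfies every key in `filters` (hypothetical helper)."""
    for key, wanted in filters.items():
        value = message.get(key)
        if isinstance(wanted, (list, tuple, set)):
            # A collection means "any of these values"
            if value not in wanted:
                return False
        elif value != wanted:
            return False
    return True


# Invented metadata for four GRIB messages
messages = [
    {"shortName": "2t", "typeOfLevel": "heightAboveGround", "level": 2},
    {"shortName": "10u", "typeOfLevel": "heightAboveGround", "level": 10},
    {"shortName": "u", "typeOfLevel": "heightAboveGround", "level": 80},
    {"shortName": "t", "typeOfLevel": "isobaricInhPa", "level": 500},
]

data_filter = {"typeOfLevel": "heightAboveGround", "level": [2, 10]}
selected = [m["shortName"] for m in messages if matches(m, data_filter)]
print(selected)
```

Under that assumption, only the messages at 2 m and 10 m above ground pass the filter.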

## Storage

If the recipe execution occurs in a Bakery, cloud storage will be assigned automatically.

For this example, we use the recipe's default storage, which is a temporary local directory.

## Execute recipe

For testing, we will use the `copy_pruned()` utility, which will create a subset of the recipe for testing.

In [None]:
recipe_pruned = recipe.copy_pruned()

Next, we convert the recipe to a Python function for debugging.

In [None]:
rp = recipe_pruned.to_function()

In [None]:
rp()

## Examine the Result

### Load with Intake

The easiest way to load the dataset created by kerchunk (formerly `fsspec_reference_maker`) is via Intake.
An Intake catalog is automatically created in the target.

In [None]:
cat_url = f"{recipe_pruned.target}/reference.yaml"
cat_url

In [None]:
import intake
cat = intake.open_catalog(cat_url)
cat

To load the data lazily:

In [None]:
%time ds = cat.data.to_dask()
ds

### Manual Loading

It is also possible to load the reference dataset directly with xarray, bypassing intake.

In [None]:
ref_url = f"{recipe_pruned.target}/reference.json"
ref_url

In [None]:
import fsspec
import xarray as xr
m = fsspec.get_mapper(
    "reference://",
    fo=ref_url,
    target_protocol="file",
    remote_protocol="s3",
    remote_options=dict(anon=True),
    skip_instance_cache=True,
)
ds = xr.open_dataset(
    m,
    engine='zarr',
    backend_kwargs={'consolidated': False},
    chunks={},
    decode_coords="all"
)
ds
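
Because the reference file is plain JSON, it can also be inspected without xarray. The sketch below writes a tiny made-up reference file to a temporary directory and lists the variable names found in its keys. With a real run you would point `path` at the `reference.json` in the recipe target instead. The keys and byte ranges shown are invented.

```python
import json
import tempfile
from pathlib import Path

# A tiny, invented reference set standing in for the recipe output
example_refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "2t/.zarray": "{}",
        "2t/0.0.0": ["s3://example-bucket/a.grib2", 0, 100],
        "10u/.zarray": "{}",
        "10u/0.0.0": ["s3://example-bucket/a.grib2", 100, 120],
    },
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "reference.json"
    path.write_text(json.dumps(example_refs))

    # Load the file back, as you would with a real reference.json
    refs = json.loads(path.read_text())["refs"]

# Keys look like "<variable>/<chunk or metadata>"; group-level keys have no slash
variables = sorted({key.split("/")[0] for key in refs if "/" in key})
# Chunk references are lists; inline metadata entries are strings
n_chunks = sum(isinstance(value, list) for value in refs.values())
print(variables, n_chunks)
```

A quick check like this can confirm that the expected variables were indexed before opening the dataset.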

### Make a Map

Let's verify that we can read and visualize the data by plotting the last entry of the `2t` variable (2 m temperature).

In [None]:
ds['2t'][-1].plot()