# GRIB2 Reference Recipe for HRRR (High-Resolution Rapid Refresh)

In this notebook, we will demonstrate how to create a reference recipe using GRIB2 files. As with all reference recipes, the original data is not duplicated. Instead, a reference (an index of byte ranges into the original files) is built so that the dataset can be read as if it were a Zarr store.
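
To make the idea of a reference concrete, here is a minimal standard-library sketch. It assumes a layout in the spirit of Kerchunk's version 1 reference format, where each chunk key maps to `[path, offset, length]`. The file name, the chunk keys `var/0` and `var/1`, and the helper `read_chunk` are hypothetical and only serve the illustration; real references for GRIB2 also carry Zarr metadata and codec information.

```python
import json
import os
import tempfile

# Stand-in for a large remote file: 26 bytes that are never copied.
tmpdir = tempfile.mkdtemp()
data_path = os.path.join(tmpdir, "original.bin")
with open(data_path, "wb") as f:
    f.write(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")

# Each (hypothetical) chunk key maps to [path, byte offset, byte length].
references = {
    "version": 1,
    "refs": {
        "var/0": [data_path, 0, 5],
        "var/1": [data_path, 10, 4],
    },
}


def read_chunk(refs: dict, key: str) -> bytes:
    """Resolve a chunk key to bytes by seeking into the original file."""
    path, offset, length = refs["refs"][key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Only this small JSON document would need to be stored, not the data.
reference_json = json.dumps(references)

print(read_chunk(references, "var/0"))  # b'ABCDE'
print(read_chunk(references, "var/1"))  # b'KLMN'
```

The index stays small regardless of how large the referenced file is, which is why a reference recipe can avoid duplicating the source data.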

The input files for this recipe are GRIB2 files provided by NOAA and stored in Amazon S3 ([HRRR AWS Open Data Page](https://registry.opendata.aws/noaa-hrrr-pds/)).

This Pangeo-Forge tutorial is an adaptation of the [Kerchunk GRIB2 Project Pythia Cookbook](https://projectpythia.org/kerchunk-cookbook/notebooks/case_studies/HRRR.html). 

## Define the FilePattern



In [None]:
import fsspec
import xarray as xr
from pangeo_forge_recipes.patterns import pattern_from_file_sequence

fs = fsspec.filesystem("s3", anon=True, skip_instance_cache=True)

# retrieve list of available days in archive
days_available = fs.glob("s3://noaa-hrrr-bdp-pds/hrrr.*")

# List HRRR GRIB2 files from the latest day, then select the first 2
files = fs.glob(f"s3://{days_available[-1]}/conus/*wrfsfcf01.grib2")[0:2]

# Create a filepattern object from input file paths
pattern = pattern_from_file_sequence(['s3://' + path for path in files], 'time', file_type='grib')
pattern


In [None]:
pattern
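
As an alternative to globbing the bucket, the input URLs could be built directly for a known date. The sketch below assumes the naming convention `hrrr.YYYYMMDD/conus/hrrr.tHHz.wrfsfcfFF.grib2`, which is consistent with the glob used above but should be checked against the bucket listing. The helper `hrrr_surface_url` and the example date are hypothetical.

```python
from datetime import datetime

BUCKET = "noaa-hrrr-bdp-pds"  # bucket name taken from the glob above


def hrrr_surface_url(cycle: datetime, forecast_hour: int = 1) -> str:
    """Build the assumed S3 URL of one CONUS surface forecast file."""
    return (
        f"s3://{BUCKET}/hrrr.{cycle:%Y%m%d}/conus/"
        f"hrrr.t{cycle:%H}z.wrfsfcf{forecast_hour:02d}.grib2"
    )


# Two consecutive model cycles on an arbitrary example date.
urls = [hrrr_surface_url(datetime(2023, 1, 1, hour)) for hour in range(2)]
for url in urls:
    print(url)
```

A list like `urls` could be passed to `pattern_from_file_sequence` in place of the globbed paths, provided the files exist for the chosen date.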


### Optional: Examine an input file

In [None]:
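# Uncomment to cache the first file locally and open it with cfgrib.
# Note: `grib_filters` is defined in the "Specify additional args" cell below; run that cell first.
# url = f'simplecache::s3://{files[0]}'
# file = fsspec.open_local(url, s3={'anon': True}, simplecache={'cache_storage': '/tmp/files'})

# ds = xr.open_dataset(file, engine="cfgrib", backend_kwargs={'filter_by_keys': grib_filters})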

## Write the Recipe

Now that we have created our `FilePattern`, we can build our `beam` pipeline. A Beam pipeline is a chain of [Apache Beam transformations](https://beam.apache.org/documentation/programming-guide/#transforms), each one consuming the output of the previous one.
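
The chaining idea can be illustrated without Beam itself. The following is a toy sketch, not Apache Beam's implementation: a hypothetical `Transform` class overloads `|` so that transforms compose left to right, which mirrors how the recipe is written. The step names and file names are made up for the illustration.

```python
class Transform:
    """Toy stand-in for a pipeline step; composes with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Chain: feed this step's output into the next step.
        return Transform(lambda items: other.func(self.func(items)))

    def run(self, items=None):
        return list(self.func(items))


# Hypothetical steps loosely named after the stages of this recipe.
create = Transform(lambda _: ["a.grib2", "b.grib2"])
open_files = Transform(lambda paths: (f"opened:{p}" for p in paths))
index = Transform(lambda handles: (f"refs({h})" for h in handles))

pipeline = create | open_files | index
result = pipeline.run()
print(result)  # ['refs(opened:a.grib2)', 'refs(opened:b.grib2)']
```

In real Beam the chained object is only a description of the work; nothing executes until it is applied to a `beam.Pipeline`.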


### Specify where our target data should be written
Here, we are creating a temporary directory to store the written reference files. If we wanted these reference files to persist locally, we would want to specify another file path. 


In [None]:
import os
from tempfile import TemporaryDirectory
td = TemporaryDirectory()
target_root = td.name
store_name = "output.json"
target_store = os.path.join(target_root, store_name)
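
If the reference files should outlive the notebook session, one option is to use a fixed directory instead of a `TemporaryDirectory`. This is a minimal sketch; the helper `persistent_target_root` and the directory name `hrrr_references` are hypothetical choices, not part of the original recipe.

```python
import os


def persistent_target_root(base_dir: str, name: str = "hrrr_references") -> str:
    """Create (if needed) and return a directory that is not cleaned up automatically."""
    path = os.path.join(os.path.abspath(base_dir), name)
    os.makedirs(path, exist_ok=True)
    return path


# Place the output next to the notebook; any writable location would do.
persistent_root = persistent_target_root(".")
print(persistent_root)
```

Assigning `target_root = persistent_root` before building the pipeline would then direct the written references to that directory.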

### Specify additional args


In [None]:
grib_filters = {"typeOfLevel": "heightAboveGround", "level": [2, 10]}
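
To show what a filter like `grib_filters` is meant to select, here is a toy sketch of the matching logic. It assumes that a scalar value requires equality and a list value means "any of these", which is how such filters are commonly interpreted; the exact behavior depends on the library consuming the filter. The `matches` helper and the message dictionaries are hypothetical.

```python
def matches(message: dict, filters: dict) -> bool:
    """Return True if a message's metadata satisfies every filter key."""
    for key, wanted in filters.items():
        if key not in message:
            return False
        if isinstance(wanted, (list, tuple, set)):
            if message[key] not in wanted:
                return False
        elif message[key] != wanted:
            return False
    return True


grib_filters = {"typeOfLevel": "heightAboveGround", "level": [2, 10]}

# Hypothetical metadata for four GRIB messages.
messages = [
    {"shortName": "2t", "typeOfLevel": "heightAboveGround", "level": 2},
    {"shortName": "10u", "typeOfLevel": "heightAboveGround", "level": 10},
    {"shortName": "t", "typeOfLevel": "isobaricInhPa", "level": 500},
    {"shortName": "u", "typeOfLevel": "heightAboveGround", "level": 80},
]

kept = [m["shortName"] for m in messages if matches(m, grib_filters)]
print(kept)  # ['2t', '10u']
```

Under these assumptions, only variables at 2 m and 10 m above ground survive the filter.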
storage_options = {"anon": True}

## Construct a Pipeline
Next, we will construct a beam pipeline. This should look similar to the other standard Zarr examples, but will involve a few different transforms. 

In [None]:
import apache_beam as beam
from pangeo_forge_recipes.transforms import OpenURLWithFSSpec, OpenWithKerchunk, DropKeys, CombineReferences, WriteCombinedReference

store_name = "GRIB2_reference"
output_json_fname = "reference.json"
remote_protocol = "s3"
transforms = (
        # Create a beam PCollection from our input file pattern
        beam.Create(pattern.items())
        # Pass our file inputs to fsspec
        | OpenURLWithFSSpec(open_kwargs={'anon':True})
        # Pass our fsspec-opened files to Kerchunk to create references for each file
        | OpenWithKerchunk(file_type=pattern.file_type, remote_protocol=remote_protocol)
        # Minor transform (REQUIRED) to drop keys from the PCollection prior to combining
        | DropKeys()
        # Use Kerchunk's `MultiZarrToZarr` functionality to combine the reference files into a single reference file
        # Note: Setting the correct concat_dims and identical_dims is important.
        | CombineReferences(
            concat_dims=["valid_time"],
            identical_dims=["latitude", "longitude", "heightAboveGround", "step"],
            mzz_kwargs={"remote_protocol": remote_protocol},
        )
        # Write the combined Kerchunk reference to file storage
        | WriteCombinedReference(
            target_root=target_root,
            store_name=store_name,
            output_json_fname=output_json_fname,
        )
    )

## Execute the Recipe

In [None]:
with beam.Pipeline() as p:
    p | transforms

## Examine the Result

Here we are creating an fsspec mapper of the reference file and then passing it to Xarray's `open_dataset` to be read as if it were a Zarr store.

In [None]:
# open dataset as zarr object using fsspec reference file system and Xarray
fpath = os.path.join(target_root, store_name, output_json_fname)
fs = fsspec.filesystem(
    "reference", fo=fpath
)
ds = xr.open_dataset(
    fs.get_mapper(""), engine="zarr", backend_kwargs=dict(consolidated=False), chunks={"valid_time": 1}
)
ds


## Make a Map

In [None]:
ds["t2m"][-1].plot()