# Using the Herbie module to extract data from a gridded forecast file

This notebook uses the Herbie module to extract point data from a gridded forecast file and is based on the [Herbie documentation](https://herbie.readthedocs.io/en/latest/). Herbie is a Python library that provides a simple interface for downloading and reading gridded forecast data from several sources. It supports a variety of forecast models, including GFS, GEFS (used here), and ECMWF.

## Install Herbie

To install the Herbie module, you can use the following command:

```bash
conda install -c conda-forge herbie-data matplotlib metpy
```


In [86]:
from herbie import Herbie
from metpy.units import units
import matplotlib.pyplot as plt
from herbie.toolbox import EasyMap, pc, ccrs
import numpy as np
import pandas as pd
import xarray as xr
import itertools


In [79]:
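# Create a Herbie object for the GEFS control member (c00).
# fxx is the forecast lead time in hours and should be an integer; the original
# string "18" resolved to F00, as the recorded "Found" line shows.
H = Herbie("2022-05-01 06:00", model="gefs", product="atmos.25", member="c00", fxx=18)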
H.inventory()["variable"]

✅ Found ┊ model=gefs ┊ product=atmos.25 ┊ 2022-May-01 06:00 UTC F00 ┊ GRIB2 @ aws ┊ IDX @ aws


0      GUST
1     MSLET
2      PRES
3       HGT
4     TSOIL
5     SOILW
6     WEASD
7      SNOD
8     ICETK
9       TMP
10      DPT
11       RH
12     TMAX
13     TMIN
14     UGRD
15     VGRD
16     CAPE
17      CIN
18     PWAT
19     HLCY
20     CAPE
21      CIN
22    PRMSL
Name: variable, dtype: object
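
The search string passed to `H.xarray` in the next cell is a regular expression. As a rough illustration of how such a pattern selects fields, the sketch below applies one to the variable names listed above using plain pandas. This is a simplification: Herbie matches against a longer description of each inventory line, not just the variable name, and the `variables` Series here is typed in by hand. Note that inside square brackets `|` is a literal character, so `[UV]GRD` is the cleaner spelling of `[U|V]GRD`.

```python
import pandas as pd

# Variable names copied by hand from the inventory output above.
variables = pd.Series(
    ["GUST", "MSLET", "PRES", "HGT", "TSOIL", "SOILW", "WEASD", "SNOD",
     "ICETK", "TMP", "DPT", "RH", "TMAX", "TMIN", "UGRD", "VGRD", "CAPE",
     "CIN", "PWAT", "HLCY", "CAPE", "CIN", "PRMSL"],
    name="variable",
)

# Alternatives are separated by "|"; "[UV]" matches a single U or V.
pattern = r"TMP|[UV]GRD|GUST|TMAX|TMIN|D[SL]WRF"

# Keep the names that contain a match anywhere in the string.
selected = variables[variables.str.contains(pattern, regex=True)]
print(selected.tolist())
```

No radiation field (`DSWRF`, `DLWRF`) appears in this particular listing, so those two alternatives simply match nothing here.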

In [53]:
# The search string is a regular expression matched against the GRIB inventory.
ds = H.xarray("TMP|[UV]GRD|GUST|TMAX|TMIN|D[SL]WRF")

Note: Returning a list of [3] xarray.Datasets because cfgrib opened with multiple hypercubes.


[<xarray.Dataset> Size: 8MB
 Dimensions:              (latitude: 721, longitude: 1440)
 Coordinates:
     number               int64 8B 1
     time                 datetime64[ns] 8B 2022-05-01T06:00:00
     step                 timedelta64[ns] 8B 18:00:00
     heightAboveGround    float64 8B 10.0
   * latitude             (latitude) float64 6kB 90.0 89.75 89.5 ... -89.75 -90.0
   * longitude            (longitude) float64 12kB 0.0 0.25 0.5 ... 359.5 359.8
     valid_time           datetime64[ns] 8B 2022-05-02
     gribfile_projection  object 8B None
 Data variables:
     u10                  (latitude, longitude) float32 4MB -2.088 ... -0.6976
     v10                  (latitude, longitude) float32 4MB -2.99 -2.98 ... -4.63
 Attributes:
     GRIB_edition:            2
     GRIB_centre:             kwbc
     GRIB_centreDescription:  US National Weather Service - NCEP
     GRIB_subCentre:          2
     Conventions:             CF-1.7
     institution:             US National Weather Se
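
Because the call returned a list of Datasets rather than one, a possible alternative to handling them one by one is to merge them into a single Dataset. The sketch below is not the approach used in this notebook. It runs on small synthetic Datasets (the `fake_hypercube` helper, the grid, and the values are made up for illustration) and assumes the only thing that conflicts between the pieces is the scalar level coordinate, which it drops before merging.

```python
import numpy as np
import xarray as xr

# Tiny made-up grid standing in for the 721 x 1440 global grid.
lat = np.array([40.5, 40.25, 40.0])
lon = np.array([254.0, 254.25, 254.5])

def fake_hypercube(var_names, level_name, level_value):
    """Build a synthetic Dataset shaped like one element of the returned list."""
    data = {
        name: (("latitude", "longitude"), np.zeros((3, 3), dtype="float32"))
        for name in var_names
    }
    coords = {"latitude": lat, "longitude": lon, level_name: level_value}
    return xr.Dataset(data, coords=coords)

cubes = [
    fake_hypercube(["u10", "v10"], "heightAboveGround", 10.0),
    fake_hypercube(["t2m", "tmax", "tmin"], "heightAboveGround", 2.0),
    fake_hypercube(["gust", "dswrf", "dlwrf"], "surface", 0.0),
]

# The scalar level coordinates differ between cubes (10 m vs 2 m), so drop
# them first; errors="ignore" skips names a given cube does not have.
level_coords = ["heightAboveGround", "surface"]
merged = xr.merge([c.drop_vars(level_coords, errors="ignore") for c in cubes])
print(sorted(merged.data_vars))
```

Dropping the level coordinate loses the height information, so this only makes sense when the variable names already imply the level, as `u10` and `t2m` do.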

In [76]:
# Station to extract: a one-row DataFrame with longitude, latitude, and a station id.
point = pd.DataFrame({"longitude": [-105.85], "latitude": [40.22], "stid": ["smr"]})

def get_data(dataset, point):
    """Return the grid cell nearest to `point` from one xarray.Dataset as a DataFrame."""
    extracted = dataset.herbie.pick_points(points=point, method="nearest")
    return extracted.to_dataframe()

# H.xarray returned a list of Datasets, so extract from each and join column-wise.
result = pd.concat([get_data(d, point) for d in ds], axis=1)

# Shared coordinates (time, step, latitude, ...) repeat in every Dataset;
# keep only the first occurrence of each column name.
result = result.loc[:, ~result.columns.duplicated(keep="first")]

result

| point | u10 | v10 | number | time | step | heightAboveGround | latitude | longitude | valid_time | gribfile_projection | ... | point_longitude | point_latitude | point_stid | t2m | tmax | tmin | gust | dswrf | dlwrf | surface |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -3.267647 | 1.960347 | 1 | 2022-05-01 06:00:00 | 0 days 18:00:00 | 10.0 | 40.25 | 254.25 | 2022-05-02 | | ... | -105.85 | 40.22 | smr | 276.188568 | 281.705566 | 275.763977 | 5.760063 | 477.0 | 294.190674 | 0.0 |
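
The extracted row stores wind as `u10`/`v10` components, temperature in kelvin, and the grid longitude on a 0 to 360 scale (254.25 for a requested -105.85). A minimal sketch of some common follow-up conversions is shown below, using values copied by hand from the row above. The derived column names (`wind_speed`, `t2m_c`, `longitude_180`) are made up for this example.

```python
import numpy as np
import pandas as pd

# Values copied from the extracted row above.
row = pd.DataFrame(
    {"u10": [-3.267647], "v10": [1.960347], "t2m": [276.188568], "longitude": [254.25]}
)

derived = pd.DataFrame(
    {
        # Wind speed from its eastward (u) and northward (v) components.
        "wind_speed": np.hypot(row["u10"], row["v10"]),
        # Kelvin to degrees Celsius.
        "t2m_c": row["t2m"] - 273.15,
        # Wrap 0..360 longitudes into the -180..180 range.
        "longitude_180": ((row["longitude"] + 180) % 360) - 180,
    }
)
print(derived)
```

The wrapped longitude is the center of the nearest grid cell, so it differs slightly from the requested station longitude.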


## Apply this as a function

Here, we extract the same variables for every ensemble member and forecast hour of a single day's 06:00 UTC run.
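
Before running the full loop, it can help to see how many requests it implies and which valid times it covers. The sketch below rebuilds the member and lead-time lists used in the next cell with comprehensions (an equivalent spelling, not the notebook's own) and computes each valid time as the initialization time plus the lead time.

```python
import itertools
import pandas as pd

init_time = pd.Timestamp("2022-05-01 06:00")

# Control member plus 30 perturbed members: c00, p01, ..., p30.
member_list = ["c00"] + [f"p{i:02d}" for i in range(1, 31)]

# Lead times in hours: 12, 36, ..., 228 (one per day at the same clock time).
fxx_list = list(range(12, 229, 24))

# Each (member, fxx) pair is one Herbie request.
combos = list(itertools.product(member_list, fxx_list))

# Valid time = initialization time + lead time.
valid_times = [init_time + pd.Timedelta(hours=f) for f in fxx_list]

print(len(combos), "requests")
print(valid_times[0], "to", valid_times[-1])
```

Every pair triggers its own download and GRIB read, so it may be worth testing the function on a couple of members first.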

In [91]:
# Inputs: one initialization date, all 31 GEFS members, and ten lead times (hours)
date = "2022-05-01"
member_list = ("c00", "p01", "p02", "p03", "p04", "p05", "p06",
  "p07", "p08", "p09", "p10", "p11", "p12", "p13",
  "p14", "p15", "p16", "p17", "p18", "p19", "p20",
  "p21", "p22", "p23", "p24", "p25", "p26", "p27",
  "p28", "p29", "p30")
fxx_list = (12, 36, 60, 84, 108, 132, 156, 180, 204, 228)

def get_gefs(date, member, fxx):
    """Extract the selected variables at `point` for one member and lead time."""
    H = Herbie(f"{date} 06:00", model="gefs", product="atmos.25", member=member, fxx=fxx)
    ds = H.xarray("TMP|[UV]GRD|GUST|TMAX|TMIN|D[SL]WRF")
    # One DataFrame per returned Dataset, joined column-wise
    result = pd.concat([get_data(d, point) for d in ds], axis=1)
    # Drop the coordinate columns that repeat across Datasets
    return result.loc[:, ~result.columns.duplicated(keep="first")]


# Apply the function to every (member, fxx) combination
combos = list(itertools.product(member_list, fxx_list))
results = [get_gefs(date, member, fxx) for member, fxx in combos]

# Combine all results into a single DataFrame indexed by (member, fxx)
final_result = pd.concat(results, keys=combos)
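
Once the per-member frames are combined, a typical next step is to summarize the ensemble at each lead time. The sketch below shows one way to do that on a small synthetic frame built the same way as `final_result`. The temperatures are invented, only three members and two lead times are used, and the index level names `member` and `fxx` are assigned here for convenience.

```python
import itertools
import pandas as pd

members = ("c00", "p01", "p02")
fxx_values = (12, 36)
combos = list(itertools.product(members, fxx_values))

# One single-row frame per (member, fxx), mimicking the output of get_gefs.
frames = [
    pd.DataFrame(
        {"t2m": [275.0 + 0.5 * i + 0.01 * fxx]},  # invented values
        index=pd.Index([0], name="point"),
    )
    for i, (member, fxx) in enumerate(combos)
]

# Tuple keys become the two outer index levels, as in the cell above.
fake_result = pd.concat(frames, keys=combos)
fake_result.index = fake_result.index.set_names(["member", "fxx", "point"])

# Ensemble mean, spread, and member count for each lead time.
summary = fake_result.groupby(level="fxx")["t2m"].agg(["mean", "std", "count"])
print(summary)
```

Grouping by the `fxx` level collapses the member dimension, which gives one row per lead time.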


✅ Found ┊ model=gefs ┊ product=atmos.25 ┊ 2022-May-01 06:00 UTC F12 ┊ GRIB2 @ aws ┊ IDX @ aws
Note: Returning a list of [3] xarray.Datasets because cfgrib opened with multiple hypercubes.
✅ Found ┊ model=gefs ┊ product=atmos.25 ┊ 2022-May-01 06:00 UTC F36 ┊ GRIB2 @ aws ┊ IDX @ aws
Note: Returning a list of [3] xarray.Datasets because cfgrib opened with multiple hypercubes.
✅ Found ┊ model=gefs ┊ product=atmos.25 ┊ 2022-May-01 06:00 UTC F60 ┊ GRIB2 @ aws ┊ IDX @ aws
Note: Returning a list of [3] xarray.Datasets because cfgrib opened with multiple hypercubes.
✅ Found ┊ model=gefs ┊ product=atmos.25 ┊ 2022-May-01 06:00 UTC F84 ┊ GRIB2 @ aws ┊ IDX @ aws
Note: 