## <span style="color: blue;"> Fetching Temperature Difference (Surface - 850mb) With HRRR </span>

This notebook walks through fetching the temperature difference (surface minus 850 mb) over any of the Finger Lakes. To skip the walkthrough, run all cells and go to the bottom block of code. That block will include the specification for fetching batches of data.

#### Step 0: Imports

If you haven't done so yet, refer to the directions in 'environment.yml' and create the appropriate conda environment for this project. Then, run the imports.

In [57]:
from herbie import Herbie
from herbie.toolbox import EasyMap

import numpy as np
import pandas as pd
import xarray as xr

import geopandas as gpd
from shapely.geometry import Point
import fiona  # used for listlayers / FileGDB access

from scipy.interpolate import griddata

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

import urllib.parse  # urlencode lives in the parse submodule, which a bare "import urllib" does not guarantee
from pathlib import Path
import requests

#### Step 1: Creating Herbie Objects for the Appropriate Date/Time

To begin, we'll fetch the surface temperature and 850 mb temperature fields from the HRRR GRIB files with Herbie. GRIB files contain many parameters, such as relative humidity, temperature, precipitation, and more. This code subsets the GRIB files, retrieving only the two temperature fields and loading them into xarray datasets, so we don't download more data than we need. The function deltaTGrid() in Step 4 then returns a GeoDataFrame with the surface temperatures minus the 850 mb temperatures.

In [58]:
#Helper that returns a timezone-naive UTC timestamp: the given dateTime, or the current hour minus 2 hours if dateTime is unspecified
def herbieDateTime(dateTime=None):
    if dateTime is None:
        dt = pd.Timestamp.utcnow().floor("h") - pd.Timedelta(hours=2)
    else:
        dt = pd.Timestamp(dateTime)

    if dt.tzinfo is not None:
        dt = dt.tz_convert("UTC").tz_localize(None)
    return dt
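To make the normalization concrete, here is a minimal sketch of the same idea using only the standard library. The name `to_naive_utc` and the `lag_hours` parameter are hypothetical and not part of the notebook; the notebook's own helper uses pandas instead.

```python
from datetime import datetime, timedelta, timezone

def to_naive_utc(dt=None, lag_hours=2):
    """Standard-library analogue of herbieDateTime (hypothetical helper)."""
    if dt is None:
        # Floor the current UTC time to the hour, then step back lag_hours
        now = datetime.now(timezone.utc)
        dt = now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=lag_hours)
    if dt.tzinfo is not None:
        # Convert to UTC, then drop the timezone so the result is naive
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt

# 7 PM on Dec 24 at a fixed UTC-5 offset is midnight UTC on Dec 25
eastern = timezone(timedelta(hours=-5))
local = datetime(2025, 12, 24, 19, 0, tzinfo=eastern)
print(to_naive_utc(local))  # 2025-12-25 00:00:00
```

Timezone-aware inputs are shifted to UTC before the timezone is dropped, so a local evening time can land on the next UTC day.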

In [59]:
#This function returns xarray datasets for the [Surface Temperature] and [850mb Temperature]
#dateTime must be provided in the format "Year-Month-Day Hour:Minute" (e.g. "2025-12-25 13:00")
#Time must be provided in 24h format (range from "0:00" to "23:00")
#If unspecified, this method will fetch data from two hours prior.

def surfaceAnd850(dateTime=None):
    #Create the Herbie object for the corresponding date/time
    dateTime = herbieDateTime(dateTime)
    H = Herbie(
        dateTime, 
        model = "hrrr",
        product = "sfc",
        fxx = 0,
        save_dir=Path("herbie_cache"),
        overwrite=False
    )
    #Only sample the fields we need
    var_regex = r":TMP:surface|:TMP:850 mb"
    H.download(var_regex, verbose=False)
    #Use Regex to filter for the temperature field at the surface and 850mb
    dMixed = H.xarray(var_regex, remove_grib=True)
    d850mb = next(ds for ds in dMixed if "isobaricInhPa" in ds.coords)
    dSurface = next(ds for ds in dMixed if "surface" in ds.coords)
    return dSurface, d850mb
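As a rough illustration of how the search pattern narrows the download, the sketch below applies the same regular expression to a few inventory-style lines. The sample lines and byte offsets are made up for illustration, and the matching is done with plain `re` rather than through Herbie, so this shows the pattern's behavior and not Herbie's internals.

```python
import re

# Same pattern as in surfaceAnd850()
var_regex = r":TMP:surface|:TMP:850 mb"

# Illustrative inventory lines (message number, byte offset, date, variable, level);
# the offsets are invented for this example
sample_idx = [
    "1:0:d=2026010100:REFC:entire atmosphere:anl:",
    "2:100:d=2026010100:TMP:850 mb:anl:",
    "3:200:d=2026010100:TMP:2 m above ground:anl:",
    "4:300:d=2026010100:TMP:surface:anl:",
    "5:400:d=2026010100:RH:850 mb:anl:",
]

# Keep only the lines where the pattern is found
matches = [line for line in sample_idx if re.search(var_regex, line)]
for line in matches:
    print(line)
```

Only the two temperature messages at the surface and at 850 mb are kept; the 2 m temperature and the 850 mb relative humidity are skipped because neither alternative in the pattern matches them.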

In [60]:
surfaceAnd850("2026-1-1 0:00")

✅ Found ┊ model=hrrr ┊ product=sfc ┊ 2026-Jan-01 00:00 UTC F00 ┊ GRIB2 @ aws ┊ IDX @ local

(<xarray.Dataset> Size: 38MB
 Dimensions:              (y: 1059, x: 1799)
 Coordinates:
     time                 datetime64[ns] 8B 2026-01-01
     step                 timedelta64[ns] 8B 00:00:00
     surface              float64 8B 0.0
     latitude             (y, x) float64 15MB ...
     longitude            (y, x) float64 15MB ...
     valid_time           datetime64[ns] 8B ...
     gribfile_projection  object 8B None
 Dimensions without coordinates: y, x
 Data variables:
     t                    (y, x) float32 8MB ...
 Attributes:
     GRIB_edition:            2
     GRIB_centre:             kwbc
     GRIB_centreDescription:  US National Weather Service - NCEP
     GRIB_subCentre:          0
     Conventions:             CF-1.7
     institution:             US National Weather Service - NCEP
     model:                   hrrr
     product:                 sfc
     description:             High-Resolution Rapid Refresh - CONUS
     remote_grib:             https://noaa-hrrr-bdp-p

#### Step 2: Fetch the Geometry of Lakes

To generate interior grids for the lakes, we need to fetch the geometry of our desired lakes from the USGS NHD. The following code block defines a function that downloads the polygon of any lake and saves it as a GeoJSON file.

In [61]:
#Fetch a lake polygon from USGS NHD and save as GeoJSON
#Lake_name should be provided as a string (e.g. "Cayuga Lake")

def fetch_lake_geojson(lake_name="Cayuga Lake"):
    overwrite=False
    timeout=60
    out_dir = Path("finger_lakes_geojson")
    out_dir.mkdir(parents=True, exist_ok=True)

    fname = lake_name.lower().replace(" ", "_") + ".geojson"
    out_path = out_dir / fname

    if out_path.exists() and not overwrite:
        return out_path

    base = "https://hydro.nationalmap.gov/arcgis/rest/services/nhd/MapServer/12/query"

    where = f"GNIS_NAME = '{lake_name}' AND FTYPE = 390"
    params = {
        "where": where,
        "outFields": "GNIS_NAME,FTYPE",
        "returnGeometry": "true",
        "f": "geojson",
        "outSR": "4326",
    }

    url = base + "?" + urllib.parse.urlencode(params)
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()
    geojson = r.json()

    features = geojson.get("features", [])
    if len(features) == 0:
        raise ValueError(f"No features returned for lake '{lake_name}'")
    if len(features) > 1:
        raise ValueError(
            f"Multiple features returned for lake '{lake_name}' "
            f"({len(features)} features). Refine query."
        )
    out_path.write_bytes(r.content)

    print(
        f"Saved {lake_name}: "
        f"{len(r.content):,} bytes → {out_path}"
    )

    return out_path
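To see what the request looks like on the wire, the following sketch builds the same query string with the standard library and decodes it again. It makes no network call; the parameter values are copied from the function above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base = "https://hydro.nationalmap.gov/arcgis/rest/services/nhd/MapServer/12/query"
lake_name = "Cayuga Lake"

params = {
    "where": f"GNIS_NAME = '{lake_name}' AND FTYPE = 390",
    "outFields": "GNIS_NAME,FTYPE",
    "returnGeometry": "true",
    "f": "geojson",
    "outSR": "4326",
}

# urlencode percent-escapes quotes, commas, and equals signs, and turns spaces into "+"
url = base + "?" + urlencode(params)
print(url)

# Decoding the query string recovers the original parameters
decoded = parse_qs(urlsplit(url).query)
print(decoded["where"][0])
```

The round trip is a quick way to check that the `where` clause survives encoding unchanged. `requests.get` also accepts a `params` dictionary directly and performs the encoding itself, which would be an alternative to concatenating the URL by hand.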

#### Step 3: Generate a Grid Within the Lake

The following code block takes the polygon fetched above and creates a uniform grid within the lake. It returns a list of interior points as (lat, lon) pairs. The grid resolution can be set through the spacing argument, which is given in degrees; the default of 0.02 corresponds to roughly 2 km.

This code works by drawing a rectangle around the lake, scattering points to establish a rectangular grid, and then checking whether each grid point is within the lake.

In [62]:
def getLakeGrid(lake_name="Cayuga Lake", spacing=0.02):
    lake_geoJson = gpd.read_file(fetch_lake_geojson(lake_name))
    lakeGeom = lake_geoJson.geometry.iloc[0]
    minLon, minLat, maxLon, maxLat = lakeGeom.bounds #Define bounds for the rectangle surrounding the lake

    candidate_lons = np.arange(minLon, maxLon, spacing)
    candidate_lats = np.arange(minLat, maxLat, spacing)

    interior_points = []

    for lat in candidate_lats:
        for lon in candidate_lons:
            p = Point(lon, lat)
            if lakeGeom.contains(p):
                interior_points.append((lat, lon))
    return interior_points
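The bounding-box-then-filter approach can be tried on a toy shape without any geospatial libraries. In the sketch below, a hand-written ray-casting test stands in for shapely's `contains`, and the diamond-shaped "lake" is invented for illustration; the function names are hypothetical.

```python
import math

def point_in_polygon(lon, lat, ring):
    """Ray-casting test: count how many polygon edges a ray going east crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def grid_inside(ring, spacing):
    """Lay a regular grid over the bounding box and keep the interior points."""
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)
    n_lon = math.ceil((max_lon - min_lon) / spacing)
    n_lat = math.ceil((max_lat - min_lat) / spacing)
    points = []
    for j in range(n_lat):
        for i in range(n_lon):
            lon = min_lon + i * spacing
            lat = min_lat + j * spacing
            if point_in_polygon(lon, lat, ring):
                points.append((lat, lon))  # same (lat, lon) order as getLakeGrid
    return points

# Toy diamond given as (lon, lat) vertices, not a real lake outline
diamond = [(2.5, 0.0), (5.0, 2.5), (2.5, 5.0), (0.0, 2.5)]
interior = grid_inside(diamond, spacing=1.0)
print(len(interior))  # 12 of the 25 candidate points fall inside
```

Half of the bounding box lies outside the diamond, which is why about half of the candidates are discarded. Long, narrow lakes discard an even larger share of their bounding box.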

#### Step 4: Fetch the Temperature Difference at Desired Interior Points

Now that we have the surface temperature and 850 mb temperature datasets, and the locations of the interior points, we can extract the temperature at each of the points using Herbie's built-in pick_points() function.

In short, HRRR data sits on a curvilinear grid (the model grid isn't a regular latitude/longitude mesh, so every grid cell carries its own latitude/longitude pair). To save effort, pick_points() handles the lookup for us: by default it returns the value at the nearest grid point, and it can instead take a distance-weighted average of the surrounding grid points when called with method="weighted".
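The difference between a nearest-point lookup and a distance-weighted average can be sketched in a few lines. This is an illustration of the general idea on four invented grid cells, using great-circle distances and inverse-distance weights; it is not Herbie's implementation, and `pick_value` is a hypothetical name.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def pick_value(cells, lat, lon, k=1):
    """k=1: value of the nearest cell; k>1: inverse-distance average of the k nearest."""
    ranked = sorted(
        (haversine_km(lat, lon, c_lat, c_lon), value)
        for c_lat, c_lon, value in cells
    )
    nearest = ranked[:k]
    if k == 1 or nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

# Four made-up grid cells: (latitude, longitude, temperature in K)
cells = [
    (42.00, -76.00, 270.0),
    (42.00, -76.04, 271.0),
    (42.02, -76.00, 272.0),
    (42.02, -76.04, 273.0),
]

print(pick_value(cells, 42.005, -76.01))       # nearest cell only
print(pick_value(cells, 42.005, -76.01, k=4))  # weighted blend of all four cells
```

The weighted result is pulled toward the closest cell but still reflects its neighbors, which smooths out jumps when a sample point sits near a cell boundary.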

In [63]:
#Fetch the temperature difference grid (Surface - 850mb), given the lake name and the date/time. Specify the spacing if desired.

def deltaTGrid(lake_name="Cayuga Lake", dateTime=None, spacing=0.02):
    dateTime = herbieDateTime(dateTime)
    interior_points = getLakeGrid(lake_name, spacing)  #Pass spacing through so the argument actually takes effect
    lakeSurface, lake850mb = surfaceAnd850(dateTime)
    inLats = [p[0] for p in interior_points]
    inLons = [p[1] for p in interior_points]
    #Specify the points to be passed to Herbie's pick_points() function
    points = pd.DataFrame(
        {
            "latitude" : inLats,
            "longitude" : inLons,
        }
    )
    lakeSurfaceTemp = lakeSurface.herbie.pick_points(points)
    lakeSurfaceTemp = lakeSurfaceTemp["t"].values #Extract the temperatures from the xarray dataset as a NumPy array
    lake850mbTemp = lake850mb.herbie.pick_points(points)
    lake850mbTemp = lake850mbTemp["t"].values
    lakeTempDifference = lakeSurfaceTemp - lake850mbTemp

    projectedLakePoints = gpd.GeoDataFrame(
        {
            "lake" : lake_name,
            "timestamp" : pd.Timestamp(dateTime),
            "latitude" : inLats,
            "longitude" : inLons,
            "delta_t" : lakeTempDifference,
        },
        geometry=gpd.points_from_xy(inLons, inLats),
        crs="EPSG:4326",
    )
    
    return projectedLakePoints


##### <span style="color: purple;"> Parameter Specification </span>

<span style="color: green;"> avgDeltaTemp(<span style="color: pink;">lake_name, dateTime, spacing=0.02 </span>) </span>

<span style="color: pink;">
- lake_name: String of the desired lake (e.g. "Cayuga Lake") <br>
- dateTime: a pd.Timestamp object detailing the day/time (e.g. <span style="color: green;"> pd.Timestamp("2025-12-24 00:00", tz="UTC") </span>) <br>
- spacing: the distance in degrees between the sampled grid points on the lake (default 0.02, roughly 2 km)
</span>
 <br> <br>
<span style= "color: yellow;">
Return: Dataframe with:
- lake: String <br>
- dateTime: pd.Timestamp <br>
- meanDeltaT: float
</span>
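Because spacing is given in degrees, its size on the ground depends on direction and latitude. The sketch below gives a rough conversion, assuming a spherical Earth with about 111.2 km per degree of latitude and an approximate latitude of 42.7°N for Cayuga Lake. Both numbers are assumptions made for illustration, and `spacing_km` is a hypothetical helper.

```python
import math

KM_PER_DEG_LAT = 111.2  # approximate, spherical-Earth assumption

def spacing_km(spacing_deg, lat_deg):
    """Return the (north-south, east-west) ground distance for a spacing in degrees."""
    north_south = spacing_deg * KM_PER_DEG_LAT
    # Meridians converge toward the poles, so a degree of longitude shrinks with cos(latitude)
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west

ns, ew = spacing_km(0.02, 42.7)  # 42.7 is an assumed latitude, not taken from the notebook
print(f"{ns:.1f} km north-south, {ew:.1f} km east-west")
```

At this latitude the default grid is therefore somewhat denser east-west than north-south, and "about 2 km" is best read as an approximate figure.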

In [64]:
def avgDeltaTemp(lake_name="Cayuga Lake", dateTime = None, spacing = 0.02):
    dateTime = herbieDateTime(dateTime)
    tempGrid = deltaTGrid(lake_name, dateTime, spacing)
    if tempGrid.empty:
        return pd.DataFrame(columns=["lake", "dateTime", "meanDeltaT"])
    return pd.DataFrame(
        {
            "lake": [tempGrid["lake"].iloc[0]],
            "dateTime" : [tempGrid["timestamp"].iloc[0]],
            "meanDeltaT" : [tempGrid["delta_t"].mean()],
        }
    )

In [None]:
print(avgDeltaTemp("Cayuga Lake"))

✅ Found ┊ model=hrrr ┊ product=sfc ┊ 2026-Jan-03 05:00 UTC F00 ┊ GRIB2 @ aws ┊ IDX @ aws
Downloading inventory file from self.idx='https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20260103/conus/hrrr.t05z.wrfsfcf00.grib2.idx'

          lake            dateTime  meanDeltaT
0  Cayuga Lake 2026-01-03 05:00:00   13.663022
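The batch fetching mentioned at the top of the notebook could be structured along the following lines. This is a sketch under stated assumptions, not the notebook's own batch code: `hourly_range`, `fetch_batch`, and `fake_fetch` are hypothetical names, and a stub stands in for avgDeltaTemp so the example runs without downloading anything.

```python
from datetime import datetime, timedelta

def hourly_range(start, end):
    """Yield datetimes from start to end (inclusive), one hour apart."""
    current = start
    while current <= end:
        yield current
        current += timedelta(hours=1)

def fetch_batch(fetch_one, lake_names, start, end):
    """Call fetch_one(lake, timestamp) for every lake and hour, collecting failures."""
    rows, failures = [], []
    for lake in lake_names:
        for ts in hourly_range(start, end):
            try:
                rows.append(fetch_one(lake, ts))
            except Exception as exc:
                # Record the failure and keep going so one bad hour does not stop the batch
                failures.append((lake, ts, repr(exc)))
    return rows, failures

def fake_fetch(lake, ts):
    """Stub standing in for avgDeltaTemp; pretends the 01:00 run is unavailable."""
    if ts.hour == 1:
        raise ValueError("no data for this hour")
    return {"lake": lake, "dateTime": ts, "meanDeltaT": 0.0}

rows, failures = fetch_batch(
    fake_fetch,
    ["Cayuga Lake", "Seneca Lake"],
    datetime(2026, 1, 1, 0),
    datetime(2026, 1, 1, 3),
)
print(len(rows), "rows,", len(failures), "failures")
```

In the notebook, avgDeltaTemp could be passed as `fetch_one`, and the resulting one-row dataframes combined with `pd.concat(rows, ignore_index=True)`.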


In [None]:
startTime = herbieDateTime()  #Defaults to the current UTC hour minus two hours