# GDPTools and Conus404 processing 
This tutorial demonstrates the use of gdptools, a Python package for area-weighted interpolation of *source* gridded datasets, such as conus404, to *target* polygonal geospatial fabrics.  Source datasets can be any gridded dataset that can be opened in XArray.  However, it's important to note that gdptools generally operates on XArray Datasets or DataArrays with dimensions of (Y, X, Time).  As such, climate datasets that have ensemble dimensions will require subsetting by ensemble to obtain a dataset with the proper dimensions.  The target dataset can be any polygonal dataset that can be read by GeoPandas.  GDPTools can also interpolate gridded data to lines, but our focus here is interpolating to polygons.
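
To make the idea of area-weighted interpolation concrete, here is a minimal pure-Python sketch for a single target polygon. It is not gdptools code: the function name `area_weighted_mean` and the overlap areas and cell values are hypothetical, chosen only to illustrate how intersection areas are normalized into weights and applied to grid-cell values.

```python
def area_weighted_mean(overlaps):
    """Area-weighted mean for one polygon.

    overlaps: list of (intersection_area, cell_value) pairs, one per
    source grid cell that the polygon intersects (hypothetical inputs).
    """
    total_area = sum(area for area, _ in overlaps)
    if total_area <= 0:
        raise ValueError("polygon does not intersect any grid cell")
    # Normalize each intersection area so that the weights sum to 1
    weights = [area / total_area for area, _ in overlaps]
    return sum(w * value for w, (_, value) in zip(weights, overlaps))


# One polygon overlapping three grid cells (areas in m^2, values arbitrary)
overlaps = [(2.0e6, 10.0), (1.0e6, 12.0), (1.0e6, 16.0)]
polygon_mean = area_weighted_mean(overlaps)
print(polygon_mean)
```

Here the weights are 0.5, 0.25, and 0.25, so the cell covering half of the polygon contributes half of the result.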

This is the second in a series of tutorials illustrating how to use gdptools to process conus404 datasets to geospatial fabrics.  In this workflow, conus404 is aggregated to [**GeoSpatialFabric v1.1**](https://www.sciencebase.gov/catalog/item/5e29d1a0e4b0a79317cf7f63) (GFv1.1).  This is a CONUS-scale spatial fabric with ~115,000 polygons.  In this tutorial we take advantage of the Hovenweep onprem version of conus404, and use the Jupyter interactive app on Hovenweep to process our workflow co-located with the conus404 data, eliminating the overhead of downloading the data.

We use the HyTest intake catalog to access the `conus404-daily-diagnostic-onprem-hw` version of conus404 on Hovenweep.  In addition we access GFv1.1 via the `geofabric_v1_1_nhru_v1_1_simp-osn` entry in the HyTest catalog.  Compared to the `Part 1 - Delaware River Basin` tutorial, the main difference is that to manage file size and memory overhead we process conus404 by year, generating 43 annual netcdf files of the interpolated data.
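
The year-by-year strategy can be sketched as a list of time windows, each paired with its own output file. This is only an illustration under stated assumptions: the helper `annual_windows`, the calendar-year boundaries, the file-name pattern, and the 1980-2022 range are all hypothetical. The range was picked solely because it yields 43 windows, so check the dataset's time coordinate for the actual span.

```python
from datetime import date


def annual_windows(first_year, last_year):
    """Return (start, end, filename) tuples, one per calendar year."""
    windows = []
    for year in range(first_year, last_year + 1):
        start = date(year, 1, 1).isoformat()
        end = date(year, 12, 31).isoformat()
        # Hypothetical naming pattern for the per-year NetCDF output
        windows.append((start, end, f"c404_gfv11_{year}.nc"))
    return windows


# Placeholder year range (assumption), giving 43 annual windows
windows = annual_windows(1980, 2022)
print(len(windows), windows[0], windows[-1])
```

Each `(start, end)` pair can then be used to restrict the source data to a single year before aggregation, so that only one year of interpolated output is held in memory and written at a time.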

## Part 2 - Geospatial Fabric v1.1

In [1]:
# Common python packages
import xarray as xr
import hvplot.xarray
import hvplot.pandas
import hvplot.dask
import intake
import warnings
import intake_xarray
import intake_parquet
import intake_geopandas
import datetime
import holoviews as hv
import numpy as np
import pandas as pd
import geopandas as gpd

# HyRiver packages
from pynhd import NLDI, WaterData
import pygeohydro as gh
# GDPTools packages
from gdptools import AggGen, UserCatData, WeightGen
import os
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

hv.extension("bokeh")
warnings.filterwarnings('ignore')

Here we set up a variable that sets our local context: working on the HPC or working locally on your Desktop.  This just modifies the access point of the conus404 data, using the Hovenweep access for HPC and the OSN pod access for the Desktop.

In [2]:
t_sys = "HPC"  # "HPC" or "Desktop"


### Access data with HyTest intake catalog.  

- Use the `geofabric_v1_1_nhru_v1_1_simp-osn` to read the Geospatial Fabric v1.1
- Use the `conus404-daily-diagnostic-onprem-hw` to read conus404

In [3]:
# open the hytest data intake catalog
# hytest_cat = intake.open_catalog("../dataset_catalog/hytest_intake_catalog.yml")
hytest_cat = intake.open_catalog("https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml")
list(hytest_cat)

['conus404-catalog',
 'benchmarks-catalog',
 'conus404-drb-eval-tutorial-catalog',
 'nhm-v1.0-daymet-catalog',
 'nhm-v1.1-c404-bc-catalog',
 'nhm-v1.1-gridmet-catalog',
 'trends-and-drivers-catalog',
 'nhm-prms-v1.1-gridmet-format-testing-catalog',
 'nwis-streamflow-usgs-gages-onprem',
 'nwis-streamflow-usgs-gages-osn',
 'nwm21-streamflow-usgs-gages-onprem',
 'nwm21-streamflow-usgs-gages-osn',
 'nwm21-streamflow-cloud',
 'geofabric_v1_1-zip-osn',
 'geofabric_v1_1_POIs_v1_1-osn',
 'geofabric_v1_1_TBtoGFv1_POIs-osn',
 'geofabric_v1_1_nhru_v1_1-osn',
 'geofabric_v1_1_nhru_v1_1_simp-osn',
 'geofabric_v1_1_nsegment_v1_1-osn',
 'gages2_nndar-osn',
 'wbd-zip-osn',
 'huc12-geoparquet-osn',
 'huc12-gpkg-osn',
 'nwm21-scores',
 'lcmap-cloud',
 'rechunking-tutorial-osn',
 'pointsample-tutorial-sites-osn',
 'pointsample-tutorial-output-osn']

In [4]:
# open the gfv1.1_simp file
gfv11_file = hytest_cat['geofabric_v1_1_nhru_v1_1_simp-osn']
gfv11 = gfv11_file.read()
gfv11

| | nhru_v1_1 | hru_segment_v1_1 | nhm_id | hru_id_nat | Version | Shape_Length | Shape_Area | Change | geometry |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 76127 | 40038 | 76128 | 76128 | 1 | 80441.423292 | 1.881188e+08 | -0.017302 | MULTIPOLYGON (((-105544.567 804074.976, -10541... |
| 1 | 76147 | 40038 | 76148 | 76148 | 1 | 53413.506290 | 4.418597e+07 | 0.054540 | MULTIPOLYGON (((-97185.217 806355.005, -97154.... |
| 2 | 76170 | 40021 | 76171 | 76171 | 1 | 54988.828358 | 7.338919e+07 | 0.018316 | MULTIPOLYGON (((-105894.643 815045.861, -10570... |
| 3 | 76172 | 40019 | 76173 | 76173 | 1 | 31915.213081 | 3.629082e+07 | 0.049861 | MULTIPOLYGON (((-106762.641 815925.02, -106754... |
| 4 | 76181 | 40019 | 76182 | 76182 | 1 | 27390.023934 | 1.567346e+07 | 0.154732 | MULTIPOLYGON (((-109785.311 816675.047, -10978... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 114953 | 57964 | 31028 | 57965 | 57965 | 1 | 117117.944712 | 2.550804e+08 | 0.002392 | MULTIPOLYGON (((-294975.252 2728035.068, -2950... |
| 114954 | 64080 | 28886 | 64081 | 64081 | 1 | 67362.397583 | 1.102741e+08 | 0.002877 | MULTIPOLYGON (((-1284135 2658485, -1284035 265... |
| 114955 | 64150 | 28866 | 64151 | 64151 | 1 | 80355.338777 | 1.884838e+08 | 0.022769 | MULTIPOLYGON (((-1347645.097 2651834.819, -134... |
| 114956 | 65633 | 31412 | 65634 | 65634 | 1 | 71022.681331 | 7.386700e+07 | -0.005068 | MULTIPOLYGON (((-1017265 2869285, -1017185 286... |


### Generate a quick plot of GFv11

In [23]:
# Plot using hvplot with datashading
plot = gfv11.hvplot(
    datashade=True,  # Enable datashading for large datasets
    aspect='equal'
)
# Display the plot
plot

### Load the conus404 dataset using the HyTest catalog

- In this case we are running this notebook on Hovenweep.

In [24]:
# open the hytest data intake catalog
hytest_cat = intake.open_catalog("https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml")
list(hytest_cat)

In [25]:
# open the conus404 sub-catalog
cat = hytest_cat['conus404-catalog']
list(cat)

['conus404-hourly-onprem-hw',
 'conus404-hourly-cloud',
 'conus404-hourly-osn',
 'conus404-daily-diagnostic-onprem-hw',
 'conus404-daily-diagnostic-cloud',
 'conus404-daily-diagnostic-osn',
 'conus404-daily-onprem-hw',
 'conus404-daily-cloud',
 'conus404-daily-osn',
 'conus404-monthly-onprem-hw',
 'conus404-monthly-cloud',
 'conus404-monthly-osn',
 'conus404-hourly-ba-onprem-hw',
 'conus404-hourly-ba-osn',
 'conus404-daily-ba-onprem',
 'conus404-daily-ba-osn',
 'conus404-pgw-hourly-onprem-hw',
 'conus404-pgw-hourly-osn',
 'conus404-pgw-daily-diagnostic-onprem-hw',
 'conus404-pgw-daily-diagnostic-osn']

There are a couple of options we explore for accessing conus404.  If `t_sys` is set to `HPC`, the assumption is that the notebook is run on the USGS HPC Hovenweep, where we access the onprem version of the data.  This co-locates the workflow with the data and eliminates the need to download it, which significantly improves processing time.  However, not all workflows require the use of an HPC, and it can be helpful to develop your workflow locally before applying it on the HPC.  In this case, setting `t_sys` to `Desktop` accesses the conus404 data via the OSN pod, which provides a fast connection to the data.

In [26]:
## Select the dataset you want to read into your notebook and preview its metadata
if t_sys == "HPC":
    dataset = 'conus404-daily-diagnostic-onprem-hw'
elif t_sys == "Desktop":
    dataset = 'conus404-daily-diagnostic-osn'
else:
    # Fail early: continuing would raise a NameError on the undefined `dataset`
    raise ValueError("Please set the variable t_sys above to one of 'HPC' or 'Desktop'")
cat[dataset]

conus404-daily-diagnostic-onprem-hw:
  args:
    consolidated: true
    urlpath: /caldera/hovenweep/projects/usgs/water/impd/hytest/conus404/conus404_daily_xtrm.zarr
  description: 'CONUS404 daily diagnostic output (maximum, minimum, mean, and standard
    deviation) for water vapor (Q2), grid-scale precipitation (RAINNC), skin temperature
    (SKINTEMP), wind speed at 10 meter height (SPDUV10), temperature at 2 meter height
    (T2), and U- and V-component of wind at 10 meters with respect to model grid (U10,
    V10). These files were created wrfxtrm model output files (see ScienceBase data
    release for more details: https://doi.org/10.5066/P9PHPK4F). This dataset is stored
    on USGS on-premise Caldera storage for Hovenweep and is only accessible via the
    USGS Hovenweep supercomputer.'
  driver: intake_xarray.xzarr.ZarrSource
  metadata:
    catalog_dir: https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/subcatalogs


In [28]:
# read in the dataset and use metpy to parse the crs information on the dataset
import metpy  # importing metpy registers the .metpy accessor on xarray objects
print(f"Reading {dataset} metadata...", end='')
ds = cat[dataset].to_dask().metpy.parse_cf()
ds

Reading conus404-daily-diagnostic-onprem-hw metadata...

Output (xarray Dataset HTML repr, condensed): the repr lists float32 Dask arrays of two kinds. The 2-D arrays have shape (1015, 1367), chunks of (350, 350), and occupy 5.29 MiB each (478.52 kiB per chunk, 12 chunks). The 3-D arrays have shape (15707, 1015, 1367), chunks of (24, 350, 350), and occupy 81.19 GiB each (11.22 MiB per chunk, 7860 chunks).


### GDPTools Background

In this section, we utilize three data classes from the `gdptools` package: `UserCatData`, `WeightGen`, and `AggGen`.

* [**UserCatData**](https://gdptools.readthedocs.io/en/develop/user_input_data_classes.html):  
  Serves as a data container for both the source and target datasets, along with their associated metadata. The instantiated object `user_data` is employed by both the `WeightGen` and `AggGen` classes.

* [**WeightGen**](https://gdptools.readthedocs.io/en/develop/weight_gen_classes.html):  
  Responsible for calculating the intersected areas between the source and target datasets. It generates normalized area-weights, which are subsequently used by the `AggGen` class to compute interpolated values between the datasets.

* [**AggGen**](https://gdptools.readthedocs.io/en/develop/agg_gen_classes.html):  
  Interpolates the source data to the target polygons using the areal weights calculated by `WeightGen`. This process is conducted over the time period specified in the `UserCatData` object.

### Instantiation of the `UserCatData` class.

In [31]:
# Coordinate Reference System (CRS) of the conus404 dataset
source_crs = ds.crs.crs_wkt

# Coordinate names of the conus404 dataset
x_coord = "x"
y_coord = "y"
t_coord = "time"

# Time period of interest for areal interpolation of conus404 to GFv1.1
# using the AggGen class below. Note: The dates follow the same format as the
# time values in the conus404 dataset.
sdate = "1979-10-01T00:00:00.000000000"
edate = "2022-10-01T00:00:00.000000000"

# Variables from the conus404 dataset used for areal interpolation
variables = ["T2MIN", "T2MAX", "RAINNCVMEAN"]

# CRS of the GFv1.1 polygons
target_crs = 5070

# Column name for the unique identifier associated with target polygons.
# This ID is used in both the generated weights file and the areal interpolated output.
target_poly_idx = "nhru_v1_1"

# Common equal-area CRS for reprojecting both source and target data.
# This CRS is used for calculating areal weights in the WeightGen class.
weight_gen_crs = 5070

# Instantiate the UserCatData class, which serves as a container for both
# source and target datasets, along with associated metadata. The UserCatData
# object provides methods used by the WeightGen and AggGen classes to subset
# and reproject the data.
user_data = UserCatData(
    ds=ds,  # conus404 read from the intake catalog
    proj_ds=source_crs,
    x_coord=x_coord,
    y_coord=y_coord,
    t_coord=t_coord,
    var=variables,
    f_feature=gfv11,  # GFv1.1 read above from the intake catalog
    proj_feature=target_crs,
    id_feature=target_poly_idx,
    period=[sdate, edate],
)


bounds:  <class 'numpy.ndarray'> [-2186317.47246268 -1586110.25380368  2412768.595732    1700290.56017174]


### Weight Generation with `WeightGen`

In this section, we utilize the `WeightGen` class from the `gdptools` package to calculate the normalized areal weights necessary for interpolating gridded data (`conus404`) to polygonal boundaries (the GFv1.1 polygons). The areal weights represent the proportion of each grid cell that overlaps with each polygon, facilitating accurate **areal interpolation** of the data. These weights are calculated using the `calculate_weights()` method.

**Note:** The `method` parameter can be set to one of `"serial"`, `"parallel"`, or `"dask"`. For a small target such as the DRB HUC12s in Part 1, `"serial"` is the most efficient choice given the scale of the gridded `conus404` data (4 km × 4 km). For a CONUS-wide target such as GFv1.1, the `"parallel"` method can speed up the computation of weights, as the timings later in this section show.
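To make the idea of a normalized areal weight concrete, here is a minimal sketch using axis-aligned rectangles in place of real geometries. It is not the `gdptools` implementation: the helper names, the grid cells, and the target polygon are all hypothetical, and the sketch assumes each weight is the intersection area divided by the target polygon's area.

```python
# Toy illustration of normalized area weights (not the gdptools implementation).
# Rectangles are (xmin, ymin, xmax, ymax) in an equal-area projection, in metres.

def overlap_area(a, b):
    """Area of the intersection of two axis-aligned rectangles."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def normalized_weights(cells, polygon):
    """Map cell id -> intersection area divided by the polygon's area."""
    poly_area = (polygon[2] - polygon[0]) * (polygon[3] - polygon[1])
    weights = {}
    for cell_id, cell in cells.items():
        area = overlap_area(cell, polygon)
        if area > 0:
            weights[cell_id] = area / poly_area
    return weights


# Four hypothetical 4 km x 4 km grid cells.
cells = {
    "c00": (0, 0, 4000, 4000),
    "c10": (4000, 0, 8000, 4000),
    "c01": (0, 4000, 4000, 8000),
    "c11": (4000, 4000, 8000, 8000),
}
# Hypothetical target polygon that straddles all four cells.
polygon = (1000, 3000, 6000, 5000)

weights = normalized_weights(cells, polygon)
for cell_id, weight in weights.items():
    print(cell_id, round(weight, 3))
print("sum of weights:", round(sum(weights.values()), 3))
```

Because the toy polygon is fully covered by grid cells, its weights sum to one. A polygon that extends beyond the grid would have weights summing to less than one.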

### Parallel and Dask Methods

For small domains, such as the Delaware River Basin in Part 1, neither the parallel nor the dask method is necessary, although they can still provide a speedup. The parallel and dask engines used by the `AggGen` object operate in a similar manner, using `multiprocessing` and `dask.bag` respectively. The `jobs` parameter sets the number of processes to run. The target data is split into one chunk per process, and each process receives a chunked GeoDataFrame along with a copy of the subsetted source data. This copying creates overhead that can limit how efficiently the parallel processing runs.

The tradeoff in using parallel processing lies in the balance between the number of processors and the overhead of copying data. While increasing the number of processors can significantly reduce computation time by dividing the workload, it also increases the amount of memory used for duplicate datasets and the coordination time between processes. There is a 'sweet spot' where the number of processors maximizes performance but beyond this point, additional processors may slow down the operation due to the overhead of managing more processes. The optimal number of processors depends on the size of the data, available memory, and system architecture, and can typically be found through experimentation.
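The tradeoff described above can be illustrated with a toy cost model. This is only a sketch: the function name and both cost constants are hypothetical, and the model simply assumes that the divisible work shrinks in proportion to the number of jobs while the copy and coordination overhead grows linearly with it. Real behavior should be measured by timing the actual workload.

```python
# Toy cost model for the parallel "sweet spot" (all numbers are hypothetical).

def estimated_runtime(jobs, work_s=120.0, overhead_per_job_s=8.0):
    """Divisible work shrinks with more jobs; copy/coordination cost grows."""
    return work_s / jobs + overhead_per_job_s * jobs


# Estimate the runtime for 1 through 12 jobs and pick the fastest.
runtimes = {jobs: estimated_runtime(jobs) for jobs in range(1, 13)}
best_jobs = min(runtimes, key=runtimes.get)

for jobs, seconds in runtimes.items():
    marker = "  <- fastest" if jobs == best_jobs else ""
    print(f"jobs={jobs:2d}  estimated {seconds:6.1f} s{marker}")
```

With these made-up constants the estimate is lowest at a small number of jobs and rises again as more are added. That is the qualitative pattern described above, not a prediction for any particular machine.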

For small domains, processing time is often dominated by downloading the data, so the speedup from parallel processing is relatively small. For larger domains, computation makes up a larger share of the total time and the speedup should be more pronounced. We explore that here in the CONUS-scale processing of conus404 on Hovenweep.

The large spatial footprint of GFv1.1 means there will be many intersections to calculate when generating the weights. We can gain some performance by using the parallel method for weight generation. We'll calculate the weights with both the serial and parallel engines to demonstrate. NOTE: I am working with the Jupyter App on Hovenweep with 4 cores. Generally, 2-6 cores is the sweet spot when using the parallel engine.

In [33]:
%%time
wght_gen = WeightGen(
    user_data=user_data,
    method="serial",
    output_file="wghts_gfv11_c404daily.csv",
    weight_gen_crs=weight_gen_crs
)

wdf = wght_gen.calculate_weights()

Using serial engine
Generating grid-cell polygons finished in 72.82 second(s)
Data preparation finished in 73.4683 seconds
     - validating target polygons
     - fixing 0 invalid polygons.
     - validating source polygons
     - fixing 0 invalid polygons.
Validate polygons finished in 5.9417 seconds
     - reprojecting and validating source polygons
     - checking the source polygons for invalid polygons
     - checking source for empty polygons
     - reprojecting and validating target polygons
     - checking the target polygons for invalid polygons
     - checking target for empty polygons
Reprojecting to: EPSG:5070 and validating polygons finished in 8.07 seconds
Intersections finished in 90.7082 seconds
Weight gen finished in 90.7974 seconds
CPU times: user 2min 59s, sys: 1.46 s, total: 3min
Wall time: 3min


In [35]:
%%time
wght_gen = WeightGen(
    user_data=user_data,
    method="parallel",
    output_file="wghts_gfv11_c404daily_p.csv",
    weight_gen_crs=weight_gen_crs,
    jobs=4
)

wdf = wght_gen.calculate_weights()

Using parallel engine
Generating grid-cell polygons finished in 73.04 second(s)
Data preparation finished in 73.1781 seconds
     - validating target polygons
     - fixing 0 invalid polygons.
     - validating source polygons
     - fixing 0 invalid polygons.
Validate polygons finished in 5.9489 seconds
     - reprojecting and validating source polygons
     - checking the source polygons for invalid polygons
     - checking source for empty polygons
     - reprojecting and validating target polygons
     - checking the target polygons for invalid polygons
     - checking target for empty polygons
Reprojecting to: EPSG:5070 and validating polygons finished in 8.09 seconds
Weight gen finished in 57.5779 seconds
CPU times: user 1min 48s, sys: 2.1 s, total: 1min 50s
Wall time: 2min 27s
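
The timings reported in the two runs above can be compared directly. The short sketch below copies the logged values for the weight-generation step and the wall time, then computes the speedup of each. It is based on a single run per engine, and the serial wall time is only reported to the nearest minute, so the ratios are approximate.

```python
# Timings copied from the serial and parallel WeightGen logs above (seconds).
serial = {"weight_gen": 90.7974, "wall": 3 * 60}          # wall time logged as "3min"
parallel = {"weight_gen": 57.5779, "wall": 2 * 60 + 27}   # wall time logged as "2min 27s"

# Speedup = serial time divided by parallel time.
step_speedup = serial["weight_gen"] / parallel["weight_gen"]
wall_speedup = serial["wall"] / parallel["wall"]

print(f"weight-generation step speedup: {step_speedup:.2f}x")
print(f"overall wall-clock speedup:     {wall_speedup:.2f}x")
```

The overall gain is smaller than the gain for the weight-generation step. The logs show that grid-cell polygon generation (about 73 seconds), validation, and reprojection take essentially the same time under both engines.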


### Compute the areal weighted spatial interpolation

Because the result will be rather large, we process by year to manage file size and memory requirements. Additionally, the conus404 data starts and ends on water-year dates, so we chose to process by water year in this case. The code below generates lists of start dates, end dates, and years that we iterate over to process the data by year.

In [36]:
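# Build 43 water years (1 October to 30 September), WY1980 through WY2022.
t_start_series = pd.date_range(pd.to_datetime("1979-10-01"), periods=43, freq="YS-OCT")
# pandas >= 2.2 prefers the "YE-SEP" / "YE" spellings of the year-end aliases below.
t_end_series = pd.date_range(pd.to_datetime("1980-09-30"), periods=43, freq="Y-SEP")
f_time_series = pd.date_range(pd.to_datetime("1980"), periods=43, freq="Y")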

time_start = [t.strftime("%Y-%m-%dT%H:%M:%S.%f") for t in t_start_series]
time_end = [t.strftime("%Y-%m-%dT%H:%M:%S.%f") for t in t_end_series]
file_time = [t.strftime("%Y") for t in f_time_series]
time_start[:4], time_end[:4]

(['1979-10-01T00:00:00.000000',
  '1980-10-01T00:00:00.000000',
  '1981-10-01T00:00:00.000000',
  '1982-10-01T00:00:00.000000'],
 ['1980-09-30T00:00:00.000000',
  '1981-09-30T00:00:00.000000',
  '1982-09-30T00:00:00.000000',
  '1983-09-30T00:00:00.000000'])

### Areal Interpolation with the `AggGen` Class

In this section, we demonstrate the use of the `AggGen` class and its `calculate_agg()` method from the `gdptools` package to perform areal interpolation. We will explore all three `agg_engine` options: `"serial"`, `"parallel"`, and `"dask"`. The following links provide detailed documentation on the available parameter options:

* [**agg_engines**](https://gdptools.readthedocs.io/en/develop/agg_gen_classes.html#gdptools.agg_gen.AGGENGINES)
* [**agg_writers**](https://gdptools.readthedocs.io/en/develop/agg_gen_classes.html#gdptools.agg_gen.AGGWRITERS)
* [**stat_methods**](https://gdptools.readthedocs.io/en/develop/agg_gen_classes.html#gdptools.agg_gen.STATSMETHODS)

When using `AggGen` and the `calculate_agg()` method, it is important to consider the overlap between the source and target data when selecting the `stat_method` parameter value. All statistical methods have a masked variant in addition to the standard method; for example, `"mean"` and `"masked_mean"`. Where the source data only partially overlaps a target polygon, the `"mean"` method returns a missing value for the polygon, whereas the `"masked_mean"` method calculates the statistic from the available overlapping source cells. These considerations help users decide whether a masked statistic is desirable or whether a missing value is preferable, allowing missing values to be handled in post-processing (e.g., using nearest-neighbor or other approaches to handle the lack of overlap). Here, conus404 completely covers the footprint of the GFv1.1 polygons, so the `"mean"` method is sufficient.
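
The difference between a standard and a masked statistic can be sketched in a few lines of plain Python. This is an illustration of the behavior described above, not the `gdptools` code: the two helper functions, the cell values, and the weights are hypothetical, and a missing source cell is represented by NaN.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean that returns NaN if any contributing value is missing."""
    if any(math.isnan(v) for v in values):
        return math.nan
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def masked_weighted_mean(values, weights):
    """Weighted mean over non-missing values only, renormalizing their weights."""
    pairs = [(v, w) for v, w in zip(values, weights) if not math.isnan(v)]
    if not pairs:
        return math.nan
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


# Hypothetical polygon overlapped by three grid cells; the third has no data.
values = [280.0, 282.0, math.nan]
weights = [0.5, 0.3, 0.2]

standard = weighted_mean(values, weights)
masked = masked_weighted_mean(values, weights)
print("standard:", standard)
print("masked:  ", round(masked, 2))
```

The standard variant propagates the missing value, while the masked variant averages the two valid cells using their weights renormalized to sum to one.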

Because we are processing by year, we have to create a new UserCatData object for each year processed.  
 

In [None]:
%%time
# Process one water year per iteration; a new UserCatData object is needed
# for each period.
for sdate, edate, year in zip(time_start, time_end, file_time):
    print(sdate, edate)
    user_data = UserCatData(
        ds=ds,  # conus404 read from the intake catalog
        proj_ds=source_crs,
        x_coord=x_coord,
        y_coord=y_coord,
        t_coord=t_coord,
        var=variables,
        f_feature=gfv11,  # GFv1.1 read above from the intake catalog
        proj_feature=target_crs,
        id_feature=target_poly_idx,
        period=[sdate, edate],
    )

    # Aggregate using the weights file written by the parallel WeightGen run.
    agg_gen = AggGen(
        user_data=user_data,
        stat_method="mean",
        agg_engine="parallel",
        agg_writer="netcdf",
        weights="wghts_gfv11_c404daily_p.csv",
        out_path=".",
        file_prefix=f"{year}_gfv11_c404_daily_diagnostic",
        jobs=4,
    )
    ngdf, ds_out = agg_gen.calculate_agg()

1979-10-01T00:00:00.000000 1980-09-30T00:00:00.000000
bounds:  <class 'numpy.ndarray'> [-2186317.47246268 -1586110.25380368  2412768.595732    1700290.56017174]
Processing: T2MIN
    Data prepped for aggregation in 0.0027 seconds
    Data aggregated in 37.5621 seconds
Processing: T2MAX
    Data prepped for aggregation in 0.0042 seconds
    Data aggregated in 34.9255 seconds
Processing: RAINNCVMEAN
    Data prepped for aggregation in 0.0033 seconds
    Data aggregated in 34.1668 seconds
Saving netcdf file to 1980_gfv11_c404_daily_diagnostic.nc
1980-10-01T00:00:00.000000 1981-09-30T00:00:00.000000
bounds:  <class 'numpy.ndarray'> [-2186317.47246268 -1586110.25380368  2412768.595732    1700290.56017174]
Processing: T2MIN
    Data prepped for aggregation in 0.0028 seconds
