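# Access GridMET as a virtual dataset
The combination of Xarray, Zarr, fsspec, and Kerchunk allows a collection of NetCDF files on S3 to be accessed as a single virtual dataset. Kerchunk records where each chunk lives inside the original files, and Xarray reads those byte ranges through fsspec as if they were a Zarr store.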

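A Kerchunk reference set is essentially a mapping from Zarr keys to byte ranges inside existing files. The sketch below, assuming a recent `fsspec`, builds a tiny hand-written reference set in Kerchunk's version-1 layout that points into a local file, then reads it through fsspec's `reference` filesystem. The file name, its contents, and the keys (`var/0`, `var/1`) are made up for illustration and are not the GridMET references.

```python
import pathlib
import tempfile

import fsspec

# Write a small stand-in for an "original" data file (made-up bytes).
path = pathlib.Path(tempfile.mkdtemp()) / "original.bin"
path.write_bytes(b"HEADERchunk-A-byteschunk-B-bytes")
url = path.as_uri()  # e.g. file:///tmp/.../original.bin

# Hand-written references in Kerchunk's version-1 layout: each key maps
# either to inline content or to [url, offset, length] in another file.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',  # inline metadata
        "var/0": [url, 6, 13],            # bytes 6-18  -> b"chunk-A-bytes"
        "var/1": [url, 19, 13],           # bytes 19-31 -> b"chunk-B-bytes"
    },
}

# fsspec's "reference" filesystem resolves each key to those byte ranges.
fs = fsspec.filesystem("reference", fo=refs)
print(fs.cat_file("var/0"))    # b'chunk-A-bytes'
print(fs.cat_file("var/1"))    # b'chunk-B-bytes'
print(fs.cat_file(".zgroup"))  # b'{"zarr_format": 2}'
```

The GridMET references work the same way, except that the URLs point at NetCDF4 files on S3 and the keys follow Zarr's chunk naming.
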
Here we demonstrate reading GridMET data (backed by 3 NetCDF4 files on S3, each with one variable). 

Parker Norton (USGS) created these files with chunking intended to suit a variety of use cases.

This should run anywhere you have AWS credentials configured, because it reads from requester-pays S3 buckets (the requester, not the bucket owner, is billed for data access).

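Requester-pays access is typically switched on through the storage options handed to `s3fs`. The sketch below only builds the keyword arguments for the `xr.open_dataset("reference://", engine="zarr", ...)` pattern used in Kerchunk's documentation, so it runs without credentials. We have not checked how the gist catalog encodes these options; the helper name `kerchunk_open_kwargs` and the reference URL are hypothetical, and newer Xarray/Zarr releases may expect `storage_options` in a different place.

```python
def kerchunk_open_kwargs(refs_url, requester_pays=True):
    """Keyword arguments for xr.open_dataset("reference://", ...).

    Hypothetical helper sketching the usual Kerchunk pattern; refs_url is a placeholder.
    """
    # s3fs option: send RequestPayer=requester so the caller's account is billed
    s3_opts = {"requester_pays": requester_pays}
    return {
        "engine": "zarr",
        "chunks": {},  # let Dask follow the backend's preferred (on-disk) chunks
        "backend_kwargs": {
            "consolidated": False,
            "storage_options": {
                "fo": refs_url,             # where the reference JSON lives
                "target_options": s3_opts,  # options for reading that JSON
                "remote_protocol": "s3",    # where the referenced bytes live
                "remote_options": s3_opts,  # options for reading the data
            },
        },
    }


kwargs = kerchunk_open_kwargs("s3://example-bucket/gridmet-refs.json")  # hypothetical URL
# With AWS credentials and a real reference file, one would then call:
#   import xarray as xr
#   ds = xr.open_dataset("reference://", **kwargs)
print(kwargs["backend_kwargs"]["storage_options"]["remote_options"])
```
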
In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
import xarray as xr
import fsspec
import intake
import hvplot.xarray

#### Open Intake Catalog
We load the Intake catalog from the raw URL of a GitHub gist.

In [None]:
# url = 'nhgf_intake.yml'  # local copy of the catalog, if you have one
url = 'https://gist.githubusercontent.com/rsignell-usgs/4a33aa39cb377134538c0a2b46bafd93/raw/f1aaf97ac5b1e24266ffb391da7d969470ad17f0/nhgf_intake.yml'

#### Open GridMET Dataset into Xarray
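Here we compare how long it takes to extract a time series at a single point from three versions of the GridMET data:
* NetCDF file with 10 MB chunks, read with the native chunking in Xarray (`gridmet-kerchunk-esip`)
* NetCDF file with 10 MB chunks, read with 100 MB chunks in Xarray (`gridmet-kerchunk-esip-10x`)
* NetCDF file with 100 MB chunks, read with the native chunking in Xarray (`gridmet-kerchunk-esip-100mb`)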

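Why does chunking matter here? Extracting the full time series at one grid point has to read every chunk along the time axis that contains that point, and each chunk is fetched and decompressed whole. The back-of-the-envelope sketch below counts chunks and uncompressed bytes for that access pattern. The day count, the chunk shapes, and 4-byte values are assumptions for illustration (the 585 x 1386 grid is roughly GridMET-sized); they are not the actual layout of these files.

```python
import math


def point_series_cost(shape, chunks, itemsize=4):
    """Chunks touched, and uncompressed bytes read, when extracting the full
    time series at one (lat, lon) point. shape/chunks are (time, lat, lon)."""
    # A single point lies in exactly one chunk along lat and along lon,
    # but every chunk along the time axis must be read.
    n_chunks = math.ceil(shape[0] / chunks[0])
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, n_chunks * chunk_bytes


shape = (16000, 585, 1386)  # assumed: ~44 years of daily data on a GridMET-sized grid
layouts = {  # hypothetical chunk shapes
    "one map per chunk": (1, 585, 1386),
    "10 MB, time-oriented": (1000, 50, 50),
    "100 MB, time-oriented": (10000, 50, 50),
}
for name, chunks in layouts.items():
    n, nbytes = point_series_cost(shape, chunks)
    print(f"{name}: {n} chunks, {nbytes / 1e6:,.0f} MB read")
```

With these assumed numbers, the map-oriented layout touches 16,000 chunks (about 52 GB), the 10 MB layout 16 chunks (160 MB), and the 100 MB layout 2 chunks (200 MB): larger chunks mean fewer requests but more unneeded bytes per request.
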
In [None]:
cat = intake.open_catalog(url)
list(cat)

In [None]:
var = 'daily_maximum_temperature'

In [None]:
ds1 = cat['gridmet-kerchunk-esip'].to_dask()

In [None]:
ds1[var]

In [None]:
ds1[var].encoding

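The `encoding` dict shown above records how the variable is stored on disk. The small helper below (hypothetical, not part of Xarray) turns its `chunks` entry into an uncompressed chunk size in MB. The example dict is made up; it only mimics the `chunks` and `dtype` keys that Zarr-backed variables usually carry.

```python
import math

import numpy as np


def chunk_mb(encoding, dtype=None):
    """Uncompressed size in MB of one on-disk chunk, from an Xarray encoding
    dict. Uses encoding['chunks'] and, unless dtype is given, encoding['dtype']."""
    dt = np.dtype(dtype if dtype is not None else encoding["dtype"])
    return math.prod(encoding["chunks"]) * dt.itemsize / 1e6


# Hypothetical encoding dict; the real one is whatever ds1[var].encoding shows.
example = {"chunks": (1000, 50, 50), "dtype": np.dtype("float32")}
print(chunk_mb(example))  # 10.0
```

On the live dataset, `chunk_mb(ds1[var].encoding)` should report the on-disk chunk size, assuming the encoding carries both keys; pass `dtype=` explicitly if it does not.
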
In [None]:
%%timeit
ds1[var].sel(lon=-105.1352977, lat=39.7633285, method='nearest').load()

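`.sel(..., method='nearest')` snaps the requested longitude and latitude to the closest grid cell rather than interpolating, so the time series comes from a single cell. A synthetic check, with a made-up 1/24-degree grid near the same point, shows what is returned:

```python
import numpy as np
import xarray as xr

# Synthetic 1/24-degree grid near Denver; coordinates and values are made up.
lon = -106 + np.arange(48) / 24
lat = 39 + np.arange(48) / 24
day = np.arange(5)
da = xr.DataArray(
    np.zeros((day.size, lat.size, lon.size), dtype="float32"),
    coords={"day": day, "lat": lat, "lon": lon},
    dims=("day", "lat", "lon"),
)

# Same call as above: snap to the closest grid cell, no interpolation.
pt = da.sel(lon=-105.1352977, lat=39.7633285, method="nearest")
print(pt.dims)                       # ('day',)
print(float(pt.lon), float(pt.lat))  # -105.125 39.75
```
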
In [None]:
ds2 = cat['gridmet-kerchunk-esip-10x'].to_dask()

In [None]:
ds2[var]

In [None]:
ds2[var].encoding

In [None]:
%%timeit
ds2[var].sel(lon=-105.1352977, lat=39.7633285, method='nearest').load()

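The `-10x` entry reads the same 10 MB-chunked file but asks Dask for larger chunks. We have not seen how the catalog entry configures this, so the sketch below simply calls `Dataset.chunk` on a small synthetic array (all sizes hypothetical, Dask required) to show the effect: ten times fewer Dask chunks along `day`, hence fewer tasks to schedule for a point time series.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in (sizes are hypothetical).
data = np.random.default_rng(0).random((120, 6, 8), dtype=np.float32)
ds = xr.Dataset({"tmax": (("day", "lat", "lon"), data)})

native = {"day": 6, "lat": 6, "lon": 8}                   # pretend on-disk chunks
ds_native = ds.chunk(native)
ds_10x = ds.chunk({**native, "day": 10 * native["day"]})  # 10x larger along time

# Number of Dask chunks along "day" = tasks needed for a point time series.
print(len(ds_native["tmax"].chunks[0]))  # 20
print(len(ds_10x["tmax"].chunks[0]))     # 2
```

Keeping Dask chunks whole multiples of the on-disk chunks means no disk chunk is split across tasks, so none has to be fetched twice.
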
In [None]:
ds3 = cat['gridmet-kerchunk-esip-100mb'].to_dask()

In [None]:
ds3[var]

In [None]:
ds3[var].encoding

In [None]:
%%timeit
ds3[var].sel(lon=-105.1352977, lat=39.7633285, method='nearest').load()

#### We can select a specific time to plot...

In [None]:
%%time
date = '2017-08-26'
# one day of data from ds1, the dataset opened above
tmax = ds1[var].sel(day=date).load()

In [None]:
tmax.hvplot.image(x='lon', y='lat', geo=True, colormap='turbo',
                  rasterize=True, tiles='OSM', title=f'{var}:{date}')

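Passing a date string such as `'2017-08-26'` to `.sel(day=...)` works because `day` is a datetime coordinate: the string is parsed as a date, and for daily data it matches exactly one step, so the `day` dimension is dropped and a 2-D map remains. A synthetic check (dates, grid, and values made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Ten made-up days of data on a tiny 3 x 4 grid.
days = pd.date_range("2017-08-20", periods=10, freq="D")
da = xr.DataArray(
    np.arange(10 * 3 * 4, dtype="float32").reshape(10, 3, 4),
    coords={"day": days},
    dims=("day", "lat", "lon"),
)

snapshot = da.sel(day="2017-08-26")  # string is parsed as a date
print(snapshot.dims)                 # ('lat', 'lon')
print(float(snapshot[0, 0]))         # 72.0 (the 7th day, index 6)
```
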
#### Or extract the entire time series at a point...

In [None]:
%%time
ds1[var].sel(lon=-105.1352977, lat=39.7633285, method='nearest').hvplot(grid=True)