# Explore Cloud-Optimized CONUS404 Dataset
This dataset was created by extracting selected variables from a collection of wrf2d output files, rechunking them to better support data extraction for a variety of use cases, and adding CF conventions to make analysis, visualization, and data extraction easier with Xarray and HoloViz.
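
Why does chunking matter for extraction? A chunked (Zarr-style) store is read one chunk at a time, so the cost of a request grows with the number of chunks it touches. The sketch below counts the chunks touched by the two access patterns used later in this notebook: one full-domain time step, and one grid cell's time series. The array shape and all three chunk layouts are hypothetical, chosen only for illustration; they are not the actual CONUS404 layout.

```python
import math

def chunks_touched(start, stop, chunk):
    """Number of chunks along one dimension overlapped by the index range [start, stop)."""
    first = start // chunk
    last = (stop - 1) // chunk
    return last - first + 1

def total_chunks(selection, chunk_shape):
    """Product of per-dimension chunk counts for a selection of (start, stop) pairs."""
    return math.prod(
        chunks_touched(s, e, c) for (s, e), c in zip(selection, chunk_shape)
    )

# Hypothetical array shape (time, y, x): one year of hourly data on a 1000 x 1400 grid.
shape = (8760, 1000, 1400)

# Hypothetical chunk layouts, from "good for maps" to "good for time series".
layouts = {
    "space-friendly (1, 1000, 1400)": (1, 1000, 1400),
    "time-friendly (8760, 10, 10)": (8760, 10, 10),
    "compromise (144, 175, 175)": (144, 175, 175),
}

one_time_step = ((0, 1), (0, shape[1]), (0, shape[2]))   # Use Case 1 pattern
one_cell_series = ((0, shape[0]), (500, 501), (700, 701))  # Use Case 2 pattern

for name, chunk_shape in layouts.items():
    print(
        f"{name}: time step -> {total_chunks(one_time_step, chunk_shape)} chunks, "
        f"cell series -> {total_chunks(one_cell_series, chunk_shape)} chunks"
    )
```

Under these assumptions, each extreme layout serves one pattern with a single chunk but forces thousands of chunk reads for the other, while the balanced layout keeps both counts moderate.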

In [None]:
import os
# Have geopandas (if installed) use Shapely 2.0 rather than PyGEOS, avoiding a warning
os.environ['USE_PYGEOS'] = '0'

import fsspec
import xarray as xr
import hvplot.xarray
import intake
import metpy
import cartopy.crs as ccrs


## Open Dataset

### 1) intake
For this demonstration notebook, we will open a cloud-native dataset. The details
needed to access it are stored in an `intake` catalog.

In [None]:
cat = intake.open_catalog(
    r"https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml"
)
## NOTE: we happen to know this dataset's handle/name.
dataset = 'conus404-hourly-cloud' 
## If you did not know this name, you could list the datasets in the catalog with
## the command `list(cat)`

## But since we do know the name, let's see its metadata
cat[dataset]

**NOTE** This particular dataset has the `requester_pays` option set to true. This means
we have to set up our AWS credentials; otherwise we won't be able to load the data from S3
object storage.

In [None]:
os.environ['AWS_PROFILE'] = 'default'
%run ../environment_set_up/Help_AWS_Credentials.ipynb

### 2) Start parallel cluster
Some of the steps we will take can run in parallel on a clustered compute environment
using `dask`. We start a cluster now so that later steps can take advantage of it.

This step is optional, but it can speed up data loading significantly, especially
when accessing data from the cloud.

In [None]:
%run ../environment_set_up/Start_Dask_Cluster_Nebari.ipynb
## If this notebook is not being run on Nebari/ESIP, replace the above 
## path name with a helper appropriate to your compute environment.  Examples:
# %run ../environment_set_up/Start_Dask_Cluster_Denali.ipynb
# %run ../environment_set_up/Start_Dask_Cluster_Tallgrass.ipynb

### 3) Explore and verify the dataset

In [None]:
print(f"Reading {dataset} metadata...", end='')
ds = cat[dataset].to_dask().metpy.parse_cf()
print("done")
# Examine the grid data structure for SNOW: 
ds.SNOW

This dataset is organized along three dimensions (x, y, and time). MetPy's `parse_cf()` has
also attached a `metpy_crs` coordinate, from which we can get the corresponding Cartopy CRS:

In [None]:
crs = ds['SNOW'].metpy.cartopy_crs
crs

## Use Case 1: Load the full domain at a specific time step

In [None]:
%%time
da = ds.SNOW.sel(time='2014-03-01 00:00').load()
### NOTE: `load()` is dask-aware, so it will read chunks in parallel if
### a cluster has been started.

In [None]:
da.hvplot.quadmesh(
    x='lon', 
    y='lat', 
    rasterize=True, 
    geo=True, 
    tiles='OSM', 
    alpha=0.66, 
    cmap='plasma'
)

## Use Case 2: Load the full time series at a specific grid cell

In [None]:
ds.PREC_ACC_NC

**SIDE NOTE**
To identify a point, we start with its latitude/longitude coordinates. But the
data's x/y coordinates are in a Lambert Conformal Conic (LCC) projection, so we need to
transform the point using the built-in `crs` we examined earlier:

In [None]:
lat, lon = 39.978322, -105.2772194
# Transform from geographic lon/lat (PlateCarree) to the dataset's LCC projection
x, y = crs.transform_point(lon, lat, src_crs=ccrs.PlateCarree())
print(x, y)  # projected x/y coordinates in LCC
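
For intuition about what `transform_point` computed above, here is a minimal pure-Python sketch of the forward Lambert Conformal Conic projection on a sphere, using the standard spherical formulas (as given in Snyder's USGS *Map Projections: A Working Manual*). It is not a replacement for Cartopy: the standard parallels, origin, central meridian, and earth radius below are illustrative assumptions, not values read from the dataset's `crs`, so its numbers will not necessarily match the cell above.

```python
import math

def lcc_forward(lon, lat, lon0, lat0, lat1, lat2, radius=6_370_000.0):
    """Forward spherical Lambert Conformal Conic projection.

    Angles are in degrees. Returns (x, y) in the units of `radius`
    (meters by default), relative to the projection origin (lon0, lat0).
    Assumes lon - lon0 lies within [-180, 180] degrees.
    """
    phi, lam = math.radians(lat), math.radians(lon)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)

    def t(p):
        # tan(pi/4 + phi/2) appears in every term of the projection
        return math.tan(math.pi / 4 + p / 2)

    # Cone constant n: tangent cone for one standard parallel, secant cone for two
    if math.isclose(phi1, phi2):
        n = math.sin(phi1)
    else:
        n = math.log(math.cos(phi1) / math.cos(phi2)) / math.log(t(phi2) / t(phi1))
    F = math.cos(phi1) * t(phi1) ** n / n
    rho = radius * F / t(phi) ** n     # distance from the cone apex at latitude phi
    rho0 = radius * F / t(phi0) ** n   # same distance at the origin latitude
    theta = n * (lam - lam0)           # angle around the apex
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)

# Illustrative (assumed) parameters: standard parallels 30N and 50N, central
# meridian -97.9, origin latitude 39.1, WRF-style sphere of radius 6370 km.
x_demo, y_demo = lcc_forward(
    -105.2772194, 39.978322, lon0=-97.9, lat0=39.1, lat1=30.0, lat2=50.0
)
print(x_demo, y_demo)
```

With these assumed parameters the point lies west of the central meridian and north of the origin latitude, so `x_demo` comes out negative and `y_demo` positive.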

In [None]:
%%time
# Nearest grid cell to (x, y), then every time step in calendar year 2013
da = ds.PREC_ACC_NC.sel(x=x, y=y, method='nearest').sel(time=slice('2013-01-01', '2013-12-31')).load()
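
The selection above uses `method='nearest'` because the projected point almost never falls exactly on a grid node. On a sorted 1-D coordinate, the idea can be sketched as a binary search followed by a comparison of the two neighboring values. This is a minimal standard-library illustration with a made-up coordinate; xarray delegates the real lookup to pandas indexes, whose handling of edge cases such as ties may differ.

```python
from bisect import bisect_left

def nearest_index(coord, value):
    """Index of the element of an ascending sequence closest to `value`."""
    i = bisect_left(coord, value)  # first position where coord[i] >= value
    if i == 0:
        return 0
    if i == len(coord):
        return len(coord) - 1
    # Choose the closer neighbor; in this sketch ties go to the lower index
    before, after = coord[i - 1], coord[i]
    return i - 1 if value - before <= after - value else i

# Hypothetical 4 km-spaced projected coordinate (meters), not the real CONUS404 grid
x_coord = [-12_000.0 + 4_000.0 * k for k in range(7)]  # -12000, -8000, ..., 12000
print(nearest_index(x_coord, 1_500.0))  # 3 (the value 0.0 is closest)
```

Applying the same lookup independently to `x` and `y` picks out a single grid cell.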

In [None]:
da.hvplot(x='time', grid=True)
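
A note on the time bounds in the selection above: label-based slices are inclusive at both ends. A bound written to the minute, such as `'2013-12-31 00:00'`, ends at midnight at the start of December 31, whereas a date-only bound such as `'2013-12-31'` is treated by pandas' partial-string indexing as covering that whole day. Assuming an hourly time step, a quick standard-library count shows how many steps each end bound covers.

```python
from datetime import datetime, timedelta

def hourly_steps(start, end):
    """Count hourly timestamps t with start <= t <= end (both ends inclusive)."""
    return (end - start) // timedelta(hours=1) + 1

start = datetime(2013, 1, 1, 0)
print(hourly_steps(start, datetime(2013, 12, 31, 0)))   # 8737: ends at 00:00 on Dec 31
print(hourly_steps(start, datetime(2013, 12, 31, 23)))  # 8760: ends at 23:00 on Dec 31
```

Since 2013 is not a leap year, a full year of hourly data has 365 * 24 = 8760 steps; the 23 steps missing from the first count are the remaining hours of December 31.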

## Stop cluster

In [None]:
client.close()
cluster.shutdown()