# Visualizing CONUS404 and reference data 
 
 Author: Hannah Podzorski, USGS

 Date: 2024-04-03
 
<img src='../../../doc/assets/Eval_Viz.svg' width=600>

The purpose of visualization notebooks is to present data graphically so that patterns and differences are easier to see.

<details>
  <summary>Guide to pre-requisites and learning outcomes...&lt;click to expand&gt;</summary>
  
  <table>
    <tr>
      <td>Pre-Requisites
      <td>To get the most out of this notebook, you should already have an understanding of these topics: 
        <ul>
        <li>pre-req one
        <li>pre-req two
        </ul>
    <tr>
      <td>Expected Results
      <td>At the end of this notebook, you should be able to: 
        <ul>
        <li>outcome one
        <li>outcome two
        </ul>
  </table>
</details>

## Using the **DRY** principle

This visualization notebook was developed with the "**D**on't **R**epeat **Y**ourself" (**DRY**) principle for software development in mind. The DRY principle promotes minimizing redundancy by creating reusable components, such as functions or modules, that can be used multiple times within a codebase. 

Reducing redundancy minimizes errors while improving readability, consistency, maintainability, and collaboration. 

- **Errors** are minimized, especially those relating to copy and pasting, by encouraging the development of reusable components. 

- **Readability** improves because a shorter codebase is easier to navigate.

- **Consistency** and **maintainability** are improved because specific functionality exists only in one place within the codebase and any changes to that functionality will permeate throughout the codebase. 

- **Collaboration** improves because effort is not duplicated across collaborators and the modular structure keeps collaborators from interfering with each other’s work. 

For this notebook we use the `HoloViz` Python package for visualization. `HoloViz` is designed to help reduce redundancy by allowing components to be reused across charts. For example, input from a date slider that provides start and end times, or from a drop-down that lets the user pick a parameter, can be used to filter the data in multiple charts. See if you can identify where the DRY principle is used in the code below.
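Before turning to the notebook code, here is a minimal, standalone sketch of the DRY idea described above. It does not use `HoloViz`; the function name `filter_by_date_range` and the two small data sets are made up for illustration. The point is only that one shared selection (the kind of value a date slider would supply) and one reusable function can serve several data sets.

```python
from datetime import date


def filter_by_date_range(records, start, end):
    """Return the records whose "time" falls within [start, end] (inclusive)."""
    return [r for r in records if start <= r["time"] <= end]


# Two small, made-up data sets standing in for a model and a reference data set.
model = [
    {"time": date(1980, 1, 1), "value": 1.0},
    {"time": date(1980, 2, 1), "value": 2.0},
    {"time": date(1980, 3, 1), "value": 3.0},
]
reference = [
    {"time": date(1980, 2, 1), "value": 2.5},
    {"time": date(1980, 4, 1), "value": 4.5},
]

# One shared selection drives both filters, so the logic is written only once.
start, end = date(1980, 2, 1), date(1980, 3, 1)
model_subset = filter_by_date_range(model, start, end)
reference_subset = filter_by_date_range(reference, start, end)

print(len(model_subset), len(reference_subset))
```

If the filtering rule ever changes (for example, to an exclusive end date), only `filter_by_date_range` needs to be edited and every chart that relies on it picks up the change.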

In [1]:
# library imports
import os
import cf_xarray
import dask
from dask.distributed import LocalCluster, Client
import fsspec 
import geopandas as gpd
# import hvplot.pandas
import hvplot.xarray
import intake
import math
import numpy as np
import pandas as pd
import pygeohydro
import sparse 
import warnings
import xarray as xr

from shapely.geometry import Polygon

warnings.filterwarnings('ignore')


## **Start a Dask client using an appropriate Dask Cluster** 
This is an optional step, but can speed up data loading significantly, especially when accessing data from the cloud.

### Set up your client on your local PC or on HPC like this:

In [2]:
# check for existing Dask cluster
if "client" in locals():
    print("Shutting down existing Dask cluster.")
    # close the client before the cluster it is connected to
    client.close()
    cluster.close()

cluster = LocalCluster()
client = Client(cluster)

print(f"The link to the Dask dashboard is {client.dashboard_link}. If on HPC, this may not be available.")

The link to the Dask dashboard is http://127.0.0.1:8787/status. If on HPC, this may not be available.


## Accessing tutorial CONUS404 data from OSN using `intake`

First, we will instantiate a connection to the HyTEST `intake` catalog YML, and then we will access the forcings tutorial sub-catalog. See below.

In [3]:
# connect to HyTEST catalog
url = 'https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml'
cat = intake.open_catalog(url)

# access tutorial catalog
conus404_drb_cat = cat["conus404-drb-eval-tutorial-catalog"]

Let's get a description of each data set and what type of data it contains (gridded or tabular). 

In [7]:
# print data sets and descriptions
for item in list(conus404_drb_cat):
    descr = conus404_drb_cat[item].description
    if conus404_drb_cat[item].metadata.get("gridded") == True:
        data_type = "Gridded"
    else:
        data_type = "Tabular"
    print(f"{item} ({data_type}): {descr}\n")

c404-ceres-drb-desc-stats-OSN (Tabular): Descriptive statistics for the comparison of CONUS404 to CERES-EBAF

c404-crn-drb-desc-stats-OSN (Tabular): Descriptive statistics for the comparison of CONUS404 to CRN

c404-drb-zonal-OSN (Tabular): CONUS404 zonal statistics of Delware River Basin

c404-hcn-drb-desc-stats-OSN (Tabular): Descriptive statistics for the comparison of CONUS404 to HCN

c404-prism-drb-desc-stats-OSN (Tabular): Descriptive statistics for the comparison of CONUS404 to PRISM

ceres-drb-OSN (Gridded): CERES-EBAF Delaware River Basin subset, 40 years of monthly data for CONUS404 forcings evaluation

ceres-drb-zonal-OSN (Tabular): CERES-EBAF zonal statistics of Delware River Basin

conus404-drb-OSN (Gridded): CONUS404 Delaware River Basin subset, 40 years of monthly data for CONUS404 forcings evaluation

crn-drb-OSN (Tabular): Climate Reference Network subset, 40 years of monthly data for CONUS404 forcings evaluation

crn-drb-point-OSN (Tabular): CRN and CONUS404 point sta
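The loop above looks up `conus404_drb_cat[item]` three times per entry. As a small exercise in the DRY principle, the labeling logic could be pulled into a helper, sketched below. The name `describe_entry` and the stand-in entries are hypothetical; the sketch only assumes that an entry exposes `description` and `metadata`, as used in the cell above. Note that it tests the `gridded` flag for truthiness rather than `== True`, which behaves the same for boolean flags.

```python
from types import SimpleNamespace


def describe_entry(name, entry):
    """Format one catalog entry as 'name (Gridded|Tabular): description'."""
    data_type = "Gridded" if entry.metadata.get("gridded") else "Tabular"
    return f"{name} ({data_type}): {entry.description}"


# Hypothetical stand-ins for catalog entries, so the sketch runs without network access.
fake_cat = {
    "gridded-example": SimpleNamespace(
        description="a gridded data set", metadata={"gridded": True}
    ),
    "tabular-example": SimpleNamespace(
        description="a tabular data set", metadata={}
    ),
}

lines = [describe_entry(name, entry) for name, entry in fake_cat.items()]
print("\n".join(lines))
```

With the real catalog, the same helper could be called as `describe_entry(item, conus404_drb_cat[item])` inside the loop.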

Next let's read in the data sets. Below are examples of how to read in tabular and gridded data sets.
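The cells in this section call `.read()` on a tabular entry and `.to_dask()` on a gridded one. If many entries need to be loaded, that choice could be wrapped in one place. The sketch below is an assumption-laden illustration, not part of the HyTEST workflow: `load_entry` and `FakeEntry` are hypothetical names, and it assumes the `gridded` metadata flag is a reliable guide to which method to call.

```python
def load_entry(entry):
    """Load a catalog entry with the method that suits its data type."""
    if entry.metadata.get("gridded"):
        return entry.to_dask()  # gridded: same call as the gridded example cell
    return entry.read()         # tabular: same call as the tabular example cell


class FakeEntry:
    """Hypothetical stand-in for a catalog entry so the sketch runs offline."""

    def __init__(self, gridded):
        self.metadata = {"gridded": gridded}

    def read(self):
        return "loaded with read()"

    def to_dask(self):
        return "loaded with to_dask()"


tabular_result = load_entry(FakeEntry(gridded=False))
gridded_result = load_entry(FakeEntry(gridded=True))
print(tabular_result)
print(gridded_result)
```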

In [56]:
# Example of tabular data
conus404_drb_zonal = conus404_drb_cat['c404-drb-zonal-OSN'].read()
conus404_drb_zonal

|      | huc6   | time    | PREC_NC_ACC | RNET       | TK         |
|------|--------|---------|-------------|------------|------------|
| 0    | 020401 | 1980-01 | 51.166572   | 8.634619   | 267.390965 |
| 1    | 020401 | 1980-02 | 39.551063   | 37.497082  | 265.023723 |
| 2    | 020401 | 1980-03 | 180.614316  | 71.697241  | 271.726642 |
| 3    | 020401 | 1980-04 | 133.649421  | 117.075991 | 279.958756 |
| 4    | 020401 | 1980-05 | 50.195687   | 167.817963 | 287.487071 |
| ...  | ...    | ...     | ...         | ...        | ...        |
| 1023 | 020402 | 2022-06 | 98.177848   | 191.787350 | 295.226971 |
| 1024 | 020402 | 2022-07 | 78.151746   | 180.768256 | 299.089277 |
| 1025 | 020402 | 2022-08 | 96.719828   | 159.768490 | 298.274521 |
| 1026 | 020402 | 2022-09 | 63.366276   | 102.287688 | 293.287686 |


In [13]:
# Example of gridded data
prism_drb = conus404_drb_cat['prism-drb-OSN'].to_dask()
prism_drb

Unnamed: 0,Array,Chunk
Bytes,17.37 MiB,16.58 MiB
Shape,"(495, 92, 50)","(492, 92, 48)"
Dask graph,4 chunks in 2 graph layers,4 chunks in 2 graph layers
Data type,float64 numpy.ndarray,float64 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,17.37 MiB,16.58 MiB
Shape,"(495, 92, 50)","(492, 92, 48)"
Dask graph,4 chunks in 2 graph layers,4 chunks in 2 graph layers
Data type,float64 numpy.ndarray,float64 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,8.69 MiB,8.29 MiB
Shape,"(495, 92, 50)","(492, 92, 48)"
Dask graph,4 chunks in 2 graph layers,4 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,8.69 MiB,8.29 MiB
Shape,"(495, 92, 50)","(492, 92, 48)"
Dask graph,4 chunks in 2 graph layers,4 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
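The numbers in the array summaries above can be checked with a little arithmetic. The sketch below reproduces the `Bytes` and chunk-count values for the `float64` arrays from the reported shapes. It assumes the reported chunk shape is the largest chunk and that each dimension is split into chunks of that size plus one smaller remainder chunk; the variable names are illustrative only.

```python
import math

shape = (495, 92, 50)        # array shape from the summary
chunk_shape = (492, 92, 48)  # chunk shape from the summary (assumed to be the largest chunk)
itemsize = 8                 # bytes per float64 value

MIB = 2**20  # bytes in one mebibyte

array_mib = math.prod(shape) * itemsize / MIB
chunk_mib = math.prod(chunk_shape) * itemsize / MIB

# Number of chunks along each dimension, rounding up to count the partial chunks.
chunks_per_dim = [math.ceil(n / c) for n, c in zip(shape, chunk_shape)]
n_chunks = math.prod(chunks_per_dim)

print(f"array: {array_mib:.2f} MiB, chunk: {chunk_mib:.2f} MiB, chunks: {n_chunks}")
```

Setting `itemsize = 4` gives the corresponding sizes for the `float32` arrays, which are half as large because each value takes half the bytes.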
