# Visualizing CONUS404 and reference data 
 
 Author: Hannah Podzorski, USGS

 Date: 2024-04-03
 
<img src='../../../doc/assets/Eval_Viz.svg' width=600>

The purpose of visualization notebooks is to present data in clear, visually informative ways.

<details>
  <summary>Guide to pre-requisites and learning outcomes...&lt;click to expand&gt;</summary>
  
  <table>
    <tr>
      <td>Pre-Requisites
      <td>To get the most out of this notebook, you should already have an understanding of these topics: 
        <ul>
        <li>pre-req one
        <li>pre-req two
        </ul>
    <tr>
      <td>Expected Results
      <td>At the end of this notebook, you should be able to: 
        <ul>
        <li>outcome one
        <li>outcome two
        </ul>
  </table>
</details>

## Using the **DRY** principle

This visualization notebook was developed with the "**D**on't **R**epeat **Y**ourself" (**DRY**) principle for software development in mind. The DRY principle promotes minimizing redundancy by creating reusable components, such as functions or modules, that can be used multiple times within a codebase. 

Reducing redundancy minimizes errors while improving readability, consistency, maintainability, and collaboration. 

- **Errors**, especially those caused by copying and pasting, are minimized because functionality lives in reusable components. 

- **Readability** improves because a shorter codebase is easier to navigate.

- **Consistency** and **maintainability** are improved because specific functionality exists only in one place within the codebase and any changes to that functionality will permeate throughout the codebase. 

- **Collaboration** improves because efforts are not duplicated across collaborators, and the modular structure keeps collaborators from interfering with each other’s work. 

For this notebook, we use the `HoloViz` ecosystem of Python packages (here, `panel` and `hvplot`) for visualization. `HoloViz` is designed to help reduce redundancy by allowing components to be reused across charts. For example, the inputs from a date slider that provides start and end times, or from a drop-down menu that lets the user pick a parameter, can be used to filter the data in multiple charts. See if you can identify where the DRY principle is applied in the code below.
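As a minimal sketch of the DRY idea described above (not this notebook's actual code), the example below uses only the Python standard library. The names `filter_by_date_range`, `total`, `mean`, and the sample `records` are hypothetical; in a `HoloViz` application, a date slider would supply the `(start, end)` pair instead of the hard-coded dates.

```python
from datetime import date

# Hypothetical sample data: (observation date, value) pairs.
records = [
    (date(2020, 1, 1), 2.0),
    (date(2020, 1, 2), 4.0),
    (date(2020, 1, 3), 6.0),
    (date(2020, 1, 4), 8.0),
]


def filter_by_date_range(rows, start, end):
    """Return the values whose date falls within [start, end], inclusive."""
    return [value for day, value in rows if start <= day <= end]


def total(values):
    """Summary used by a first (hypothetical) chart."""
    return sum(values)


def mean(values):
    """Summary used by a second (hypothetical) chart."""
    return sum(values) / len(values) if values else float("nan")


# A single selection, standing in for the value of a date slider.
selected_range = (date(2020, 1, 2), date(2020, 1, 3))

# The filtering logic is written once and reused by both summaries.
selected = filter_by_date_range(records, *selected_range)
print(total(selected), mean(selected))
```

Because both summaries read from the same filtered selection, a change to the filtering rule (for example, making the end date exclusive) only has to be made in one place.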

In [6]:
# library imports
import os
import cf_xarray
import dask
from dask.distributed import LocalCluster, Client
import fsspec 
import geopandas as gpd
import hvplot.xarray
import intake
import math
import numpy as np
import pandas as pd
import pygeohydro
import sparse 
import warnings
import xarray as xr

import panel as pn
import datetime as dt

from shapely.geometry import Polygon

warnings.filterwarnings('ignore')

# data
# connect to HyTEST catalog
url = 'https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml'
cat = intake.open_catalog(url)

# access tutorial catalog
conus404_drb_cat = cat["conus404-drb-eval-tutorial-catalog"]
list(conus404_drb_cat)

['c404-ceres-drb-desc-stats-OSN',
 'c404-crn-drb-desc-stats-OSN',
 'c404-drb-zonal-OSN',
 'c404-hcn-drb-desc-stats-OSN',
 'c404-prism-drb-desc-stats-OSN',
 'ceres-drb-OSN',
 'ceres-drb-zonal-OSN',
 'conus404-drb-OSN',
 'crn-drb-OSN',
 'crn-drb-point-OSN',
 'hcn-drb-OSN',
 'hcn-drb-point-OSN',
 'prism-drb-OSN',
 'prism-drb-zonal-OSN']

## **Start a Dask client using an appropriate Dask Cluster** 
This step is optional, but it can speed up data loading significantly, especially when accessing data from the cloud.

### Set up your client on your local PC or on HPC like this:

In [7]:
# check for existing Dask cluster
if "client" in locals():
    print("Shutting down existing Dask cluster.")
    cluster.close()
    client.close()

cluster = LocalCluster()
client = Client(cluster)

print(f"The link to the Dask dashboard is {client.dashboard_link}. If on HPC, this may not be available.")

Shutting down existing Dask cluster.
The link to the Dask dashboard is http://127.0.0.1:8787/status. If on HPC, this may not be available.


Instructions for setting up a Dask cluster in other environments will be added later. 

## Accessing already prepared CONUS404 data from OSN using `intake`

Datasets are brought into the notebook through the `intake` catalog. The entry (`prism-drb-OSN`) is selected from the catalog (`conus404_drb_cat`) by name, and its `to_dask` method opens the dataset as a Dask-backed object, so the data values are read lazily rather than all at once. See below.

In [None]:
prism_drb = conus404_drb_cat['prism-drb-OSN'].to_dask()
prism_drb

## Gridded Maps

In [8]:
select_data_source = pn.widgets.Select(name = 'Select Data Source', options = list(conus404_drb_cat))

select_data_source


In [18]:
# load in selected data
data = conus404_drb_cat[select_data_source.value].to_dask()
data

| Data type | Array shape | Chunk shape | Array bytes | Chunk bytes | Dask graph |
|---|---|---|---|---|---|
| float64 numpy.ndarray | (495, 92, 50) | (492, 92, 48) | 17.37 MiB | 16.58 MiB | 4 chunks in 2 graph layers |
| float32 numpy.ndarray | (495, 92, 50) | (492, 92, 48) | 8.69 MiB | 8.29 MiB | 4 chunks in 2 graph layers |


In [41]:
# Create selection of variable long names
var_names = [getattr(data[var], 'long_name') for var in list(data.keys())]

select_data_variable = pn.widgets.Select(name = 'Select Data Variable', options = var_names)
select_data_variable


In [62]:
# Subset the data to the selected variable (matched by its long name)
var = [name for name in list(data.keys()) if getattr(data[name], 'long_name') == select_data_variable.value]
data_var = data[var]

['PREC_ACC_NC']
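
The lookup above scans every variable each time the selection changes. One alternative, sketched here under assumptions, is to build a `long_name`-to-key mapping once and reuse it for both the widget options and the lookup. The dictionary `attrs_by_var` is a hypothetical stand-in for the attributes of the dataset's variables; every key other than `PREC_ACC_NC`, and all of the `long_name` strings, are made up for illustration.

```python
# Hypothetical stand-in for {name: data[name].attrs for name in data.data_vars}.
attrs_by_var = {
    "PREC_ACC_NC": {"long_name": "Example precipitation long name"},
    "EXAMPLE_VAR": {"long_name": "Example second long name"},
    "NO_LONG_NAME": {},  # variables without a long_name are skipped
}

# Build the mapping once: long name -> variable key.
key_by_long_name = {
    attrs["long_name"]: name
    for name, attrs in attrs_by_var.items()
    if "long_name" in attrs
}

# The same mapping supplies the widget options and the reverse lookup.
options = list(key_by_long_name)
selected_long_name = options[0]  # stands in for select_data_variable.value
selected_key = key_by_long_name[selected_long_name]
print(selected_key)
```

`pn.widgets.Select` also accepts a dictionary for `options`, in which case the widget's value is the dictionary value rather than the displayed label, so a mapping like this could be passed to the widget directly.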

In [88]:
# Create date slider based on min/max dates of the selected variable
min_date = data_var['time'][0].values
max_date = data_var['time'][-1].values

date_range_slider = pn.widgets.DateRangeSlider(
    name='Date Range Slider',
    start = min_date, end = max_date,
    value = (min_date, max_date),
    step = 2
)

date_range_slider
