# Visualizing CONUS404 and reference data 
 
 Author: Hannah Podzorski, USGS

 Date: 2024-04-03
 
<img src='../../../doc/assets/Eval_Viz.svg' width=600>

Visualization notebooks present data graphically so that patterns and differences between datasets are easier to see.

<details>
  <summary>Guide to pre-requisites and learning outcomes...&lt;click to expand&gt;</summary>
  
  <table>
    <tr>
      <td>Pre-Requisites
      <td>To get the most out of this notebook, you should already have an understanding of these topics: 
        <ul>
        <li>pre-req one
        <li>pre-req two
        </ul>
    <tr>
      <td>Expected Results
      <td>At the end of this notebook, you should be able to: 
        <ul>
        <li>outcome one
        <li>outcome two
        </ul>
  </table>
</details>

## Using the **DRY** principle

This visualization notebook was developed with the "**D**on't **R**epeat **Y**ourself" (**DRY**) principle for software development in mind. The DRY principle promotes minimizing redundancy by creating reusable components, such as functions or modules, that can be used multiple times within a codebase. 

Reducing redundancy minimizes errors while improving readability, consistency, maintainability, and collaboration. 

- **Errors** are minimized, especially those relating to copy and pasting, by encouraging the development of reusable components. 

- **Readability** improves because the codebase is shorter, which makes it easier to navigate.

- **Consistency** and **maintainability** are improved because specific functionality exists only in one place within the codebase and any changes to that functionality will permeate throughout the codebase. 

- **Collaboration** improves because efforts are not duplicated across collaborators, and the modular structure keeps collaborators from interfering with each other's work.

For this notebook we use the `HoloViz` Python package for visualization. `HoloViz` is designed to help reduce redundancy by allowing components to be shared across charts. For example, the inputs from a date slider that provides start and end times, or from a drop-down that lets the user pick a parameter, can be used to filter the data in multiple charts. See if you can identify where the DRY principle is used in the code below.
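
The idea of sharing one set of inputs across several charts can be sketched without any visualization library. The example below is only an illustration, not code from this notebook: the `Selection` class, the sample records, and the function names are all hypothetical. The filtering logic lives in one place (`apply_selection`) and two different "chart" preparations reuse it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Selection:
    """Hypothetical stand-in for widget state: one parameter and one date range."""
    parameter: str
    start: date
    end: date


# Hypothetical records: (date, parameter name, value)
records = [
    (date(2019, 12, 1), "precipitation", 4.0),
    (date(2019, 12, 2), "precipitation", 0.0),
    (date(2019, 12, 2), "temperature", 3.5),
    (date(2019, 12, 3), "precipitation", 6.0),
]


def apply_selection(rows, selection):
    """Single place where the filtering logic lives."""
    return [
        (day, value)
        for day, name, value in rows
        if name == selection.parameter and selection.start <= day <= selection.end
    ]


def time_series_points(rows, selection):
    """Data for a hypothetical line chart: (date, value) pairs."""
    return apply_selection(rows, selection)


def period_total(rows, selection):
    """Data for a hypothetical summary tile: sum over the selected period."""
    return sum(value for _, value in apply_selection(rows, selection))


selection = Selection("precipitation", date(2019, 12, 2), date(2019, 12, 3))
print(time_series_points(records, selection))
print(period_total(records, selection))
```

If the filtering rule ever changes (for example, to make the end date exclusive), only `apply_selection` needs to be edited and both outputs stay consistent.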

In [45]:
# library imports
import os
import cf_xarray
import dask
from dask.distributed import LocalCluster, Client
import fsspec 
import geopandas as gpd
import hvplot.xarray
import intake
import math
import numpy as np
import pandas as pd
import pygeohydro
import sparse 
import warnings
import xarray as xr

import panel as pn
import datetime as dt
import geoviews as gv
import holoviews as hv
import metpy

from shapely.geometry import Polygon

warnings.filterwarnings('ignore')

# data
# connect to HyTEST catalog
url = 'https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml'
cat = intake.open_catalog(url)

# access tutorial catalog
conus404_drb_cat = cat["conus404-drb-eval-tutorial-catalog"]
list(conus404_drb_cat)

['c404-ceres-drb-desc-stats-OSN',
 'c404-crn-drb-desc-stats-OSN',
 'c404-drb-zonal-OSN',
 'c404-hcn-drb-desc-stats-OSN',
 'c404-prism-drb-desc-stats-OSN',
 'ceres-drb-OSN',
 'ceres-drb-zonal-OSN',
 'conus404-drb-OSN',
 'crn-drb-OSN',
 'crn-drb-point-OSN',
 'hcn-drb-OSN',
 'hcn-drb-point-OSN',
 'prism-drb-OSN',
 'prism-drb-zonal-OSN']

## **Start a Dask client using an appropriate Dask Cluster** 
This is an optional step, but can speed up data loading significantly, especially when accessing data from the cloud.

### Set up your client on your local PC or on HPC like this:

In [7]:
# check for existing Dask cluster
if "client" in locals():
    print("Shutting down existing Dask cluster.")
    # close the client before the cluster it is connected to
    client.close()
    cluster.close()

cluster = LocalCluster()
client = Client(cluster)

print(f"The link to the Dask dashboard is {client.dashboard_link}. If on HPC, this may not be available.")

Shutting down existing Dask cluster.
The link to the Dask dashboard is http://127.0.0.1:8787/status. If on HPC, this may not be available.


Setting up a Dask cluster for other environments will be added later. 

## Accessing already prepared CONUS404 data from OSN using `intake`

Datasets are brought into the notebook as Dask-backed arrays in two steps.

First, the catalog (`conus404_drb_cat`) is indexed by entry name (`prism-drb-OSN`). Second, calling the entry's `to_dask` method opens the data described by that entry as lazily evaluated Dask arrays. See below.

In [8]:
prism_drb = conus404_drb_cat['prism-drb-OSN'].to_dask()
prism_drb

[Output: dataset summary. Dask-backed arrays of shape (495, 92, 50) in chunks of (492, 92, 48), 4 chunks in 2 graph layers each. The float64 arrays are 17.37 MiB (16.58 MiB per chunk); the float32 arrays are 8.69 MiB (8.29 MiB per chunk).]

## Gridded Maps

In [9]:
# keep only the catalog entries whose metadata flags them as gridded
dataset_options = list(conus404_drb_cat)
gridded_options = [option for option in dataset_options if conus404_drb_cat[option].metadata.get("gridded")]
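
The filter in the cell above can be tried without a catalog connection. The sketch below uses a plain dictionary as a stand-in for the catalog metadata: the entry names come from the catalog listing, but the `gridded` values are assumed for illustration and were not read from the catalog.

```python
# Stand-in for catalog metadata; the "gridded" flags are assumed, not read from the catalog
example_metadata = {
    "prism-drb-OSN": {"gridded": True},
    "ceres-drb-OSN": {"gridded": True},
    "crn-drb-point-OSN": {"gridded": False},
    "c404-drb-zonal-OSN": {},  # no "gridded" key at all
}

# .get() returns None for a missing key, so entries without the flag are skipped
gridded_only = [
    name for name, metadata in example_metadata.items() if metadata.get("gridded")
]
print(gridded_only)
```

Using `.get("gridded")` rather than `metadata["gridded"]` avoids a `KeyError` for entries that do not define the flag.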

In [60]:
select_data_source = pn.widgets.Select(name = 'Select Data Source', options = list(gridded_options))
select_data_source

In [69]:
# load in selected data
def load_data(data_source):
    _data = conus404_drb_cat[data_source].to_dask()
    return _data

data = pn.bind(load_data, select_data_source)

In [71]:
# Create selection of variable 
def get_var_names(data_source):
    var_names = list(data_source.data_vars)
    return var_names

var_names = pn.bind(get_var_names, data)

select_data_variable = pn.widgets.Select(name = 'Select Data Variable', options = var_names)
select_data_variable

2024-08-27 11:32:08,881 ERROR: panel.reactive - Callback failed for object named "Select Data Variable" changing property {'value': 'PREC_ACC_NC'} 
Traceback (most recent call last):
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/panel/reactive.py", line 388, in _process_events
    self.param.update(**self_events)
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/param/parameterized.py", line 2318, in update
    restore = dict(self_._update(arg, **kwargs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/param/parameterized.py", line 2351, in _update
    self_._batch_call_watchers()
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/param/parameterized.py", line 2545, in _batch_call_watchers
    self_._execute_watcher(watcher, events)
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/param/parameterized

AttributeError: 'function' object has no attribute 'owner'

In [108]:
# Subsetting data for selected variable
def get_data_array(dataset, variable):
    da = dataset[variable]
    return da

data_var = pn.bind(get_data_array, data, select_data_variable)
data_var()

[Output: data array summary. Dask-backed float32 array of shape (495, 92, 50) in chunks of (492, 92, 48), 8.69 MiB in total (8.29 MiB per chunk), 4 chunks in 2 graph layers.]


In [53]:
# Create date slider based on min/max dates of the variable selected
def create_date_range_slider(data_array):
    # the first time step is at index 0 (assumes the time axis is sorted in ascending order)
    min_date = data_array['time'][0].values
    max_date = data_array['time'][-1].values

    _date_range_slider = pn.widgets.DateRangeSlider(
        name='Date Range Slider',
        start = min_date, end = max_date,
        value = (min_date, max_date),
        step = 2
    )

    return _date_range_slider

date_range_slider = pn.bind(create_date_range_slider, data_var)
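
The slider above takes its bounds from the first and last entries of the time axis and holds its selection as a `(start, end)` pair. The library-free sketch below illustrates both ideas with a hypothetical list of daily timestamps standing in for `data_array['time']`; the helper `select_range` is an assumption made for the example, not part of the notebook.

```python
from datetime import datetime, timedelta

# Hypothetical daily time axis standing in for data_array['time']
times = [datetime(2019, 12, 1) + timedelta(days=i) for i in range(5)]

# First and last entries give the bounds only if the axis is sorted;
# min() and max() make no such assumption
bounds_by_position = (times[0], times[-1])
bounds_by_value = (min(times), max(times))


def select_range(values, date_range):
    """Keep the values that fall within the (start, end) pair, inclusive."""
    start, end = date_range
    return [value for value in values if start <= value <= end]


selected = select_range(times, (datetime(2019, 12, 2), datetime(2019, 12, 4)))
print(bounds_by_position == bounds_by_value)
print(len(selected))
```

Indexing the time axis at position 1 instead of 0 would silently drop the first time step from the slider's range, which is why the bounds are worth checking against `min()` and `max()`.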

In [107]:
# Subsetting data for time selected
def get_data_array_time(dataset, time):
    data_time = dataset.sel(time = time[1])
    return data_time

data_var_time = pn.bind(get_data_array_time, data_var(), date_range_slider)
data_var().sel(time = dt.date(2019, 12, 4))
dt.date(2019, 12, 4)

datetime.date(2019, 12, 4)

In [75]:
# Add selector for base map
base_map_options = {
    'OpenStreetMap': gv.tile_sources.OSM,
    'ESRI Imagery': gv.tile_sources.EsriImagery,
    'ESRI World Street Map': gv.tile_sources.EsriWorldStreetMap,
}

map_selector = pn.widgets.Select(
    description="Use to select Base Map",
    name="Select a Base Map",
    options=list(base_map_options.keys()),
    value = 'OpenStreetMap'
)
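
Because the selector lists the dictionary keys, the selected label can be turned back into the matching tile source with a dictionary lookup. The sketch below shows that lookup in isolation: strings stand in for the tile-source objects, and `resolve_base_map` is a hypothetical helper, not a function from this notebook or from any library.

```python
# Strings stand in for the tile-source objects used in the notebook
example_base_maps = {
    "OpenStreetMap": "osm-tiles",
    "ESRI Imagery": "esri-imagery-tiles",
}


def resolve_base_map(label, options, default="OpenStreetMap"):
    """Return the entry for label, falling back to the default label if it is unknown."""
    return options.get(label, options[default])


print(resolve_base_map("ESRI Imagery", example_base_maps))
print(resolve_base_map("Unknown label", example_base_maps))
```

Keeping the labels and the objects in one dictionary means a new base map only has to be added in one place to appear in the selector and be available to the plot.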

In [114]:
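# map of the selected variable
def plot(data, base_map):
    # NOTE: base_map is accepted but not yet applied, so the selector does not change the plotted output
    return data.hvplot(x='x', y='y', rasterize=True)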

bound_plot = pn.bind(plot, data = data_var(), base_map = map_selector)

In [117]:
col = pn.Column(select_data_source, select_data_variable, date_range_slider, map_selector, bound_plot)

In [118]:
pn.Row(col).show('CONUS404 Dashboard')

Launching server at http://localhost:39525


<panel.io.server.Server at 0x7f00ebd96410>

2024-08-27 11:36:45,594 ERROR: panel.reactive - Callback failed for object named "Select Data Source" changing property {'value': 'ceres-drb-OSN'} 
Traceback (most recent call last):
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/panel/reactive.py", line 388, in _process_events
    self.param.update(**self_events)
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/param/parameterized.py", line 2318, in update
    restore = dict(self_._update(arg, **kwargs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/param/parameterized.py", line 2351, in _update
    self_._batch_call_watchers()
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/param/parameterized.py", line 2545, in _batch_call_watchers
    self_._execute_watcher(watcher, events)
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/param/parameterized

2024-08-27 11:36:45,597 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7f01d6591c10>>, <Task finished name='Task-2204028' coro=<ServerSession.with_document_locked() done, defined at /home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/bokeh/server/session.py:77> exception=AttributeError("'function' object has no attribute 'owner'")>)
Traceback (most recent call last):
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/tornado/ioloop.py", line 750, in _run_callback
    ret = callback()
          ^^^^^^^^^^
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/tornado/ioloop.py", line 774, in _discard_future_result
    future.result()
  File "/home/hpodzorski/miniforge3/envs/pangeo/lib/python3.11/site-packages/bokeh/server/session.py", line 98, in _needs_document_lock_wrapper
    result = await resul