# Visualizing CONUS404 and reference data 
 
 Author: Hannah Podzorski, USGS
 
<img src='../../../doc/assets/Eval_Viz.svg' width=600>

The purpose of this notebook is to visualize both gridded and tabular data sets. This notebook uses [`HoloViz`](https://holoviz.org/) and [`Panel`](https://panel.holoviz.org/index.html) to create interactive visuals that allow you to compare two datasets.  

## Using the **DRY** principle

This visualization notebook was developed with the "**D**on't **R**epeat **Y**ourself" (**DRY**) principle for software development in mind. The DRY principle promotes minimizing redundancy by creating reusable components, such as functions or modules, that can be used multiple times within a codebase. 

Reducing redundancy minimizes errors while improving readability, consistency, maintainability, and collaboration. 

- **Errors** are minimized, especially those relating to copying and pasting, by encouraging the development of reusable components. 

- **Readability** improves by shortening the codebase, making it easier to navigate.

- **Consistency** and **maintainability** are improved because specific functionality exists only in one place within the codebase and any changes to that functionality will permeate throughout the codebase. 

- **Collaboration** improves because effort is not duplicated across collaborators, and the modular structure keeps collaborators from interfering with each other’s work. 

For this notebook we use the `HoloViz` Python package for visualization. `HoloViz` is designed to help reduce redundancy by allowing components to be reused across charts. For example, inputs from a date slider that provides start and end times, or from a dropdown that lets the user pick a parameter, can be used to filter the data in multiple charts. See if you can identify where the DRY principle is applied in the code below and how its use could be improved.
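Before the notebook's own code, a minimal library-free sketch of this reuse pattern may help. It does not use `HoloViz`; the records and the names `filter_by_date`, `total_precip`, and `station_count` are hypothetical and exist only for illustration. The point is that the date filter is written once and feeds more than one summary, much as one slider can feed several charts.

```python
from datetime import date

# Hypothetical records standing in for a tabular dataset.
records = [
    {"station": "A", "date": date(2020, 1, 1), "precip": 3.0},
    {"station": "B", "date": date(2020, 2, 1), "precip": 1.5},
    {"station": "A", "date": date(2020, 3, 1), "precip": 2.0},
]


def filter_by_date(rows, start, end):
    """Reusable component: keep rows whose date falls within [start, end]."""
    return [row for row in rows if start <= row["date"] <= end]


def total_precip(rows):
    """First consumer of the filtered rows."""
    return sum(row["precip"] for row in rows)


def station_count(rows):
    """Second consumer of the same filtered rows."""
    return len({row["station"] for row in rows})


# One pair of inputs (as a date slider would supply) drives both summaries.
start, end = date(2020, 1, 1), date(2020, 2, 28)
selected = filter_by_date(records, start, end)
summary = {
    "total_precip": total_precip(selected),
    "stations": station_count(selected),
}
print(summary)
```

If the filtering rule changes, only `filter_by_date` needs editing, and both summaries pick up the change.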

These are the steps this notebook follows:

- Step 0: Importing Libraries
- Step 1: Accessing the Data
- Step 2: Visualize and Compare Gridded Data
- Step 3: Visualize and Compare Tabular Data

## **Step 0: Importing Libraries**

In [1]:
# library imports
import os
import cf_xarray
import dask
from dask.distributed import LocalCluster, Client
import fsspec 
import geopandas as gpd
import hvplot.xarray
import intake
import math
import numpy as np
import pandas as pd
import pygeohydro
import sparse 
import warnings
import xarray as xr

import panel as pn
import datetime as dt
import geoviews as gv
import holoviews as hv
import param
import hvplot.pandas

from shapely.geometry import Polygon

pn.extension(loading_indicator = True, defer_load = True)
warnings.filterwarnings('ignore')

## **Step 1: Accessing the Data**

First, we will instantiate a connection to the HyTEST intake catalog YML and then we will access the forcings tutorial sub catalog.

In [2]:
# connect to HyTEST catalog
url = 'https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml'
cat = intake.open_catalog(url)

# access tutorial catalog
conus404_drb_cat = cat["conus404-drb-eval-tutorial-catalog"]

Let's get a description of each data set and what type of data it contains (gridded or tabular).

In [3]:
# print dataset names and descriptions
for item in list(conus404_drb_cat):
    descr = conus404_drb_cat[item].description
    if conus404_drb_cat[item].metadata.get("gridded") == True:
        data_type = "Gridded"
    else:
        data_type = "Tabular"
    print(f"{item} ({data_type}): {descr}\n")

c404-ceres-drb-desc-stats-OSN (Tabular): Descriptive statistics for the comparison of CONUS404 to CERES-EBAF

c404-crn-drb-desc-stats-OSN (Tabular): Descriptive statistics for the comparison of CONUS404 to CRN

c404-drb-zonal-OSN (Tabular): CONUS404 zonal statistics of Delware River Basin

c404-hcn-drb-desc-stats-OSN (Tabular): Descriptive statistics for the comparison of CONUS404 to HCN

c404-prism-drb-desc-stats-OSN (Tabular): Descriptive statistics for the comparison of CONUS404 to PRISM

ceres-drb-OSN (Gridded): CERES-EBAF Delaware River Basin subset, 40 years of monthly data for CONUS404 forcings evaluation

ceres-drb-zonal-OSN (Tabular): CERES-EBAF zonal statistics of Delware River Basin

conus404-drb-OSN (Gridded): CONUS404 Delaware River Basin subset, 40 years of monthly data for CONUS404 forcings evaluation

crn-drb-OSN (Tabular): Climate Reference Network subset, 40 years of monthly data for CONUS404 forcings evaluation

crn-drb-point-OSN (Tabular): CRN and CONUS404 point sta

Next we may want to start a dask client using an appropriate Dask Cluster. This is an optional step, but can speed up data loading significantly, especially when accessing data from the cloud.

Set up your client on your local PC or on HPC like this:

In [4]:
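# shut down any existing Dask client and cluster before starting new ones
if "client" in locals():
    print("Shutting down existing Dask cluster.")
    client.close()   # disconnect the client first
    cluster.close()  # then stop the cluster it was attached to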

cluster = LocalCluster()
client = Client(cluster)

print(f"The link to the Dask dashboard is {client.dashboard_link}. If on HPC, this may not be available.")

The link to the Dask dashboard is http://127.0.0.1:8787/status. If on HPC, this may not be available.


Datasets are brought into the notebook through Dask in two steps. First, the catalog (`conus404_drb_cat`) is indexed by the entry name (`prism-drb-OSN`). Then the entry's `to_dask` method opens the data as Dask-backed arrays, so values are read in chunks when they are needed rather than all at once. See below.

In [5]:
# example of how data is loaded using dask
prism_drb = conus404_drb_cat['prism-drb-OSN'].to_dask()
prism_drb

| Data type | Shape | Chunk shape | Array bytes | Chunk bytes | Dask graph |
| --- | --- | --- | --- | --- | --- |
| float64 numpy.ndarray | (495, 92, 50) | (492, 92, 48) | 17.37 MiB | 16.58 MiB | 4 chunks in 2 graph layers |
| float64 numpy.ndarray | (495, 92, 50) | (492, 92, 48) | 17.37 MiB | 16.58 MiB | 4 chunks in 2 graph layers |
| float32 numpy.ndarray | (495, 92, 50) | (492, 92, 48) | 8.69 MiB | 8.29 MiB | 4 chunks in 2 graph layers |
| float32 numpy.ndarray | (495, 92, 50) | (492, 92, 48) | 8.69 MiB | 8.29 MiB | 4 chunks in 2 graph layers |


## **Step 2: Visualize and Compare Gridded Data**

Now we're going to create interactive maps of two different gridded datasets. The two different datasets we will be using are the CONUS404 Delaware River Basin subset ('conus404-drb-OSN') and the PRISM Delaware River Basin subset ('prism-drb-OSN'). For the purposes of this notebook, we will only visualize the most recent data in each dataset.

Let's start by loading in our two datasets and combining them into a list.

In [6]:
# gridded datasets
gridded_options = ['conus404-drb-OSN', 'prism-drb-OSN']

In [7]:
# load in selected data
def load_data(data_sources):
    _data = conus404_drb_cat[data_sources].to_dask()
    date = _data.time.max().values # select only the maximum time in the dataset
    return _data.sel(time = date)

def multi_load(data_source):
    _data = []
    for data in data_source:
        _data.append(load_data(data))
    return _data

data = multi_load(gridded_options)

Next, we will start creating the components of our panel app. We need to create a selector that will allow us to select the variable we would like to visualize in our maps. First we need to determine which variables are present in both datasets. Then we can create a widget that will allow us to change between those variables.

In [8]:
# Create a list of variables that are in both data sets
def get_var(data): 
    _var = [value for value in data[0].data_vars if value in data[1].data_vars]
    return _var

var = get_var(data) 

# Create variable selector
var_selector = pn.widgets.Select(
    name = "Select Variable",
    description = "Only variables available in both data sources are shown.",
    options = var)
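The intersection logic in `get_var` can be sketched without xarray. The snippet below is a minimal illustration, assuming each dataset is represented only by a list of its variable names. `common_variables` is a hypothetical helper, and `RNET` is a made-up variable name used to show a variable being dropped. The sketch keeps the order of the first list and extends to any number of datasets.

```python
def common_variables(*variable_lists):
    """Return the names present in every list, in the order of the first list."""
    if not variable_lists:
        return []
    first, *rest = variable_lists
    rest_sets = [set(names) for names in rest]  # sets give fast membership tests
    return [name for name in first if all(name in s for s in rest_sets)]


# Hypothetical variable lists; "RNET" is invented for this example.
conus404_vars = ["PREC_ACC_NC", "RNET", "TK"]
prism_vars = ["TK", "PREC_ACC_NC"]

shared = common_variables(conus404_vars, prism_vars)
print(shared)
```

Preserving the order of the first dataset keeps the selector's option order stable, which a plain `set` intersection would not guarantee.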

We can also create a selector that will allow us to change the base map for our interactive maps.

In [9]:
# Create dictionary of base map options.
base_map_options = {
    'OpenStreetMap': gv.tile_sources.OSM,
    'ESRI Imagery': gv.tile_sources.EsriImagery,
    'ESRI World Street Map': gv.tile_sources.EsriWorldStreetMap,
}

# Create selector for base map
map_selector = pn.widgets.Select(
    name="Select a Base Map",
    options=list(base_map_options.keys()),
    value = 'OpenStreetMap'
)

We will also need to create a function that we can use to map our data. Below is a simple function that will create a map based on the dataset, variable, and basemap provided.

In [10]:
# Plot gridded data on map
def map(data, var, base_map):
    _base_map = base_map_options.get(base_map)
    _map = data[var].hvplot(
        x = 'x', y = 'y', geo = True, 
        rasterize = True, tiles = _base_map) 
    return _map
pn.rx(map)(data[0], var_selector, map_selector)


Finally, we can put all our components together in a panel app. We will do this by creating a function that will create our two maps and put them into a panel layout. We will then use the function `pn.rx()` to bind that function to its inputs, including the reactive inputs that we use to change variables and base maps. 

In [11]:
# Create Panel App Layout
def layout(data, source, var, base_map):
    map_0 = map(data[0], var, base_map)
    map_1 = map(data[1], var, base_map)

    layout = pn.Row(pn.Column(pn.pane.Markdown(f"## *{source[0]}*"), map_0), 
                    pn.Column(pn.pane.Markdown(f"## *{source[1]}*"), map_1))
    return layout

map_layout = pn.rx(layout)(data, gridded_options, var_selector, map_selector)
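To make the idea of binding a function to reactive inputs more concrete, here is a library-free sketch. It is not how Panel implements `pn.rx()`. `FakeWidget`, `bind`, and `describe` are hypothetical names used only to illustrate that the bound function is re-evaluated with each widget's current value, while plain arguments are passed through unchanged.

```python
class FakeWidget:
    """Hypothetical stand-in for a widget that holds a current value."""

    def __init__(self, value):
        self.value = value


def bind(func, *args):
    """Return a callable that calls func with each widget's current value."""

    def evaluate():
        # Widgets are resolved at call time; other arguments are used as-is.
        resolved = [a.value if isinstance(a, FakeWidget) else a for a in args]
        return func(*resolved)

    return evaluate


def describe(source, var, base_map):
    return f"{source}: {var} on {base_map}"


var_widget = FakeWidget("TK")
map_widget = FakeWidget("OpenStreetMap")
bound = bind(describe, "prism-drb-OSN", var_widget, map_widget)

first = bound()
var_widget.value = "PREC_ACC_NC"  # simulate the user changing the selector
second = bound()
print(first)
print(second)
```

In this sketch the update happens only when `bound()` is called again; in the Panel app the layout is refreshed automatically when a selector changes.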

In [12]:
gridded_app = pn.Column("# Gridded Data Comparison", map_layout) # add an app title
gridded_app.servable() # display components


Above we displayed the app within this notebook, but you can also use the `.show()` method to open the app in a browser. Uncomment and run the code below if you would like to do so.

In [13]:
# gridded_app.show('CONUS404 Gridded Dashboard')

## **Step 3: Visualize and Compare Tabular Data**

Now we're going to create interactive maps and timeseries plots of two different tabular datasets. The two datasets we will use are the Climate Reference Network subset ('crn-drb-OSN') and the Historical Climatology Network subset ('hcn-drb-OSN'). 

Let's start, once again, by loading in our two datasets and combining them into a list.

In [14]:
# tabular datasets
tabular_options = ['crn-drb-OSN', 'hcn-drb-OSN']

In [15]:
def load_data_tabular(data_sources):
    _data = conus404_drb_cat[data_sources].read()
    _data[["LATITUDE", "LONGITUDE"]] = _data[["LATITUDE", "LONGITUDE"]].astype(float)
    return _data

def multi_load(data_source):
    _data = []
    for data in data_source:
        _data.append(load_data_tabular(data))
    return _data

data = multi_load(tabular_options)
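Note that `multi_load` has now been defined twice, once for each loader. One possible way to apply the DRY principle here, offered as a sketch rather than as the notebook's own approach, is to pass the loader in as an argument. The two loaders below are hypothetical stand-ins that return strings instead of reading from the catalog.

```python
def multi_load(data_sources, loader):
    """Apply `loader` to every source name and return the results as a list."""
    return [loader(source) for source in data_sources]


# Hypothetical stand-ins for load_data and load_data_tabular.
def fake_load_gridded(name):
    return f"gridded:{name}"


def fake_load_tabular(name):
    return f"tabular:{name}"


gridded = multi_load(["conus404-drb-OSN", "prism-drb-OSN"], fake_load_gridded)
tabular = multi_load(["crn-drb-OSN", "hcn-drb-OSN"], fake_load_tabular)
print(gridded)
print(tabular)
```

With this shape, the real calls would look like `multi_load(gridded_options, load_data)` and `multi_load(tabular_options, load_data_tabular)`, and the looping logic would live in one place.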

The data we loaded includes timeseries data for each station within the dataset. We will want to create additional datasets of just the points we would like to plot on our maps. For this notebook, we will pull out the most recent data to plot on our maps.

In [16]:
# Pull out most recent data for mapping
def data_recent(data):
    _data = []
    for df in data:
        df_max = df[df['DATE'] == df['DATE'].max()]
        _data.append(df_max)
    return _data

df_recent = data_recent(data)
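Filtering on the overall maximum date keeps only the stations that reported on that date. The sketch below uses plain Python records with hypothetical values (not taken from the CRN or HCN data) to contrast that behavior with a per-station alternative. `latest_overall` and `latest_per_station` are illustrative names, and whether the per-station variant is preferable depends on what the map is meant to show.

```python
from datetime import date

# Hypothetical records: station "B" stopped reporting before station "A".
rows = [
    {"ID": "A", "DATE": date(2020, 11, 1), "TK": 280.1},
    {"ID": "A", "DATE": date(2020, 12, 1), "TK": 275.4},
    {"ID": "B", "DATE": date(2020, 10, 1), "TK": 282.0},
]


def latest_overall(rows):
    """Keep rows dated at the single newest date in the whole table."""
    newest = max(row["DATE"] for row in rows)
    return [row for row in rows if row["DATE"] == newest]


def latest_per_station(rows):
    """Keep the newest row for each station ID."""
    latest = {}
    for row in rows:
        current = latest.get(row["ID"])
        if current is None or row["DATE"] > current["DATE"]:
            latest[row["ID"]] = row
    return list(latest.values())


overall_ids = [row["ID"] for row in latest_overall(rows)]
per_station_ids = sorted(row["ID"] for row in latest_per_station(rows))
print(overall_ids, per_station_ids)
```

Here the overall filter drops station "B" from the map, while the per-station version keeps it with its last available observation.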

Now we can get started on our components for our panel app. This will be very similar to what we did for the gridded data. We will create a widget to select the variable we would like to visualize and the base map we would like to use. In addition, we will also need widgets to select the stations we would like to view on our timeseries plot from each of the datasets. 

In [17]:
# Create selector of variables
var_selector_tabular = pn.widgets.Select(
    name = "Select Variable",
    options = ['PREC_ACC_NC', 'TK'])

In [18]:
# Add selector for base map
base_map_options = {
    'OpenStreetMap': gv.tile_sources.OSM,
    'ESRI Imagery': gv.tile_sources.EsriImagery,
    'ESRI World Street Map': gv.tile_sources.EsriWorldStreetMap,
}

map_selector_tabular = pn.widgets.Select(
    description="Use to select Base Map",
    name="Select a Base Map",
    options=list(base_map_options.keys()),
    value = 'OpenStreetMap'
)

In [19]:
# Create a station selector for each dataset
ID_selector_0 = pn.widgets.Select(
    name = "Select a Climate Reference Network Station",
    options = list(df_recent[0]["ID"].unique())) # pull out unique stations from the crn dataset

ID_selector_1 = pn.widgets.Select(
    name = "Select a Historical Climatology Network Station",
    options = list(df_recent[1]["ID"].unique())) # pull out unique stations from the hcn dataset

We will also need functions that will plot our maps and our timeseries plot based on our selected variables. 

In [20]:
# Create Map
def map(data, var, base_map):
    map = data.hvplot.points(
        x = 'LONGITUDE', y = 'LATITUDE', 
        geo = True, tiles = base_map, 
        color = var, hover_cols=['ID', 'DATE', var])
    return map

# Create timeseries plots
def timeseries(data, ID_0, ID_1, var):
    _data0 = data[0][data[0]["ID"] == ID_0]
    _data1 = data[1][data[1]["ID"] == ID_1]
    _data = pd.concat([_data0, _data1], ignore_index=True)
    plot = _data.hvplot.line(x = "DATE", y = var, by = 'ID')
    return plot

Finally, we can put everything together in our panel app. Once again, we will create a function that creates the layout for our panel app and then bind it to the reactive variables using the `pn.rx()` function. 

In [21]:
# Create Layout
def plot_layout(source, data_recent, data, var, base_map, ID_0, ID_1):
    _base_map = base_map_options.get(base_map)
    map_0 = map(data_recent[0], var, _base_map)
    map_1 = map(data_recent[1], var, _base_map)
    plot = timeseries(data, ID_0, ID_1, var)
    row_1 = pn.Row(pn.Column(pn.pane.Markdown(f"## *{source[0]}*"), map_0), 
                   pn.Column(pn.pane.Markdown(f"## *{source[1]}*"), map_1))
    row_2 = pn.Row(plot)
    return pn.Column(row_1, row_2)

map_layout = pn.rx(plot_layout)(tabular_options, df_recent, data, var_selector_tabular, map_selector_tabular, ID_selector_0, ID_selector_1)

In [22]:
tabular_app = pn.Column("# Tabular Data Comparison", map_layout) # add a title
tabular_app.servable() # display components


Above we displayed the app within this notebook, but you can also use the `.show()` method to open the app in a browser. Uncomment and run the code below if you would like to do so.

In [23]:
# tabular_app.show('CONUS404 Tabular Dashboard')