# Visualizing CONUS404 and reference data 
 
 Author: Hannah Podzorski, USGS
 
<img src='../../../doc/assets/Eval_Viz.svg' width=600>

This notebook visualizes both gridded and tabular datasets. It uses [`HoloViz`](https://holoviz.org/) and [`Panel`](https://panel.holoviz.org/index.html) to build interactive visuals for comparing two datasets. `HoloViz` helps reduce redundancy by letting components, such as widgets, be reused across charts.

These are the steps this notebook follows:

- Step 0: Importing Libraries
- Step 1: Accessing the Data
- Step 2: Visualize and Compare Gridded Data
- Step 3: Visualize and Compare Tabular Data
- Step 4: Examine Comparisons

## **Step 0: Importing Libraries**

In [None]:
# library imports
import os
import cf_xarray
import dask
from dask.distributed import LocalCluster, Client
import fsspec 
import geopandas as gpd
import hvplot.xarray
import intake
import math
import numpy as np
import pandas as pd
import pygeohydro
import sparse 
import warnings
import xarray as xr

import panel as pn
import datetime as dt
import geoviews as gv
import holoviews as hv
import param
import hvplot.pandas

from shapely.geometry import Polygon

pn.extension(loading_indicator = True, defer_load = True)
warnings.filterwarnings('ignore')

## **Step 1: Accessing the Data**

First, we will open a connection to the HyTEST intake catalog YAML file, and then we will access the CONUS404 Delaware River Basin evaluation tutorial sub-catalog.

In [None]:
# connect to HyTEST catalog
url = 'https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml'
cat = intake.open_catalog(url)

# access tutorial catalog
conus404_drb_cat = cat["conus404-drb-eval-tutorial-catalog"]

Let's get a description of each data set and what type of data it contains (gridded or tabular).

In [None]:
# print data sets names and descriptions
for item in list(conus404_drb_cat):
    descr = conus404_drb_cat[item].description
    if conus404_drb_cat[item].metadata.get("gridded") == True:
        data_type = "Gridded"
    else:
        data_type = "Tabular"
    print(f"{item} ({data_type}): {descr}\n")

Next, we can start a Dask client on an appropriate Dask cluster. This step is optional, but it can speed up data loading significantly, especially when accessing data from the cloud.

Set up your client on your local PC or on HPC like this:

In [None]:
# check for existing Dask cluster
if "client" in locals():
    print("Shutting down existing Dask cluster.")
    cluster.close()
    client.close()

cluster = LocalCluster()
client = Client(cluster)

print(f"The link to the Dask dashboard is {client.dashboard_link}. If on HPC, this may not be available.")

Datasets are brought into the notebook with Dask in two steps. First, the entry (`prism-drb-OSN`) is looked up in the catalog (`conus404_drb_cat`). Then the `to_dask` method loads the data from that catalog entry. See below.

In [None]:
# example of how data is loaded using dask
prism_drb = conus404_drb_cat['prism-drb-OSN'].to_dask()
prism_drb

## **Step 2: Visualize and Compare Gridded Data**

Now we're going to create interactive maps of two different gridded datasets. The two different datasets we will be using are the CONUS404 Delaware River Basin subset (_'conus404-drb-OSN'_) and the PRISM Delaware River Basin subset (_'prism-drb-OSN'_).

Let's start by loading our two datasets and combining them into a list. Both datasets have the same (monthly) time step but different periods of record, so for the comparison we will use the first time step of each dataset.

In [None]:
# gridded datasets
gridded_options = ['conus404-drb-OSN', 'prism-drb-OSN']

In [None]:
# load the first time step of one catalog entry
def load_data(data_source):
    _data = conus404_drb_cat[data_source].to_dask()
    date = _data.time.min().values # select only the minimum time in the dataset
    return _data.sel(time = date)

# load every catalog entry in a list of sources
def multi_load(data_sources):
    _data = []
    for source in data_sources:
        _data.append(load_data(source))
    return _data

data = multi_load(gridded_options)
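
Because each dataset is reduced to its own earliest time step, the two maps may show different dates when the periods of record differ. Below is a minimal sketch of one way to find the first time step that two records share, using `pandas` index intersection. The date ranges are made-up placeholders, not the actual CONUS404 or PRISM periods of record, and `first_common_time` is a hypothetical helper that is not part of this notebook.

```python
import pandas as pd

# Placeholder monthly time axes (assumed dates, for illustration only);
# in the notebook these would come from each dataset's `time` coordinate.
times_a = pd.date_range("1980-01-01", "1989-12-01", freq="MS")
times_b = pd.date_range("1985-06-01", "1995-05-01", freq="MS")

def first_common_time(a, b):
    """Return the earliest timestamp present in both indexes."""
    common = a.intersection(b)
    if common.empty:
        raise ValueError("The two time axes do not overlap.")
    return common.min()

common_start = first_common_time(times_a, times_b)
print(common_start)
```

With the real data, a shared timestamp found this way could be passed to `.sel(time = ...)` on both datasets so that the two maps show the same month.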

Next, we will start creating the components of our panel app. We need to create a selector that will allow us to select the variable we would like to visualize in our maps. First, we need to determine which variables are present in both datasets. Then we can create a widget that will allow us to change between those variables.

In [None]:
# Create a list of variables that are in both data sets
def get_var(data):
    _var = [value for value in data[0].data_vars if value in data[1].data_vars]
    return _var

var = get_var(data)

# Create variable selector
var_selector = pn.widgets.Select(
    name = "Select Variable",
    description = "Only variables available in both data sources are shown.",
    options = var)
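
The shared-variable lookup above compares exactly two datasets. As a sketch of how the same idea could extend to any number of datasets, the hypothetical helper below keeps the names found in every list while preserving the order of the first one. Plain lists stand in for `data_vars`, and the variable names other than `PREC_ACC_NC` and `TK` are placeholders.

```python
def shared_variables(*variable_lists):
    """Return names present in every list, in the order of the first list."""
    if not variable_lists:
        return []
    first, *rest = variable_lists
    rest_sets = [set(names) for names in rest]
    return [name for name in first if all(name in s for s in rest_sets)]

# Placeholder variable names (illustrative, not the actual catalog contents)
vars_a = ["PREC_ACC_NC", "TK", "RNET"]
vars_b = ["TK", "PREC_ACC_NC"]
vars_c = ["TK", "ELEV", "PREC_ACC_NC"]

print(shared_variables(vars_a, vars_b))
print(shared_variables(vars_a, vars_b, vars_c))
```

Keeping the order of the first list means the selector options appear in a stable order from run to run, which a plain set intersection would not guarantee.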

We can also create a selector that will allow us to change the base map for our interactive maps.

In [None]:
# Create dictionary of base map options.
base_map_options = {
    'OpenStreetMap': gv.tile_sources.OSM,
    'ESRI Imagery': gv.tile_sources.EsriImagery,
    'ESRI World Street Map': gv.tile_sources.EsriWorldStreetMap,
}

# Create selector for base map
map_selector = pn.widgets.Select(
    name="Select a Base Map",
    options=list(base_map_options.keys()),
    value = 'OpenStreetMap'
)

We will also need to create a function that we can use to map our data. Below is a simple function that will create a map based on the dataset, variable, and basemap provided.

In [None]:
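# Plot gridded data on map
# Note: naming this function `map` shadows Python's built-in `map` for the rest of the notebook
def map(data, var, base_map):
    _base_map = base_map_options.get(base_map)
    _map = data[var].hvplot(
        x = 'x', y = 'y', geo = True,
        rasterize = True, tiles = _base_map)
    return _map

# Preview a reactive map of the first dataset
pn.rx(map)(data[0], var_selector, map_selector)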

Finally, we can put all of our components together in a Panel app. We will write a function that creates the two maps and places them in a Panel layout. We will then use `pn.rx()` to bind that function to its inputs, including the reactive widgets that switch variables and base maps.

In [None]:
# Create Panel App Layout
def layout(data, source, var, base_map):
    map_0 = map(data[0], var, base_map)
    map_1 = map(data[1], var, base_map)

    layout = pn.Row(pn.Column(pn.pane.Markdown(f"## *{source[0]}*"), map_0), 
                    pn.Column(pn.pane.Markdown(f"## *{source[1]}*"), map_1))
    return layout

map_layout = pn.rx(layout)(data, gridded_options, var_selector, map_selector)

In [None]:
gridded_app = pn.Column("# Gridded Data Comparison", map_layout) # add an app title
gridded_app.servable() # display components

Above we displayed the app within this notebook, but you can also call the `.show()` method to open the app in a browser. Uncomment and run the code below if you would like to do so.

In [None]:
#gridded_app.show('CONUS404 Gridded Dashboard')

## **Step 3: Visualize and Compare Tabular Data**

Now we're going to create interactive maps and timeseries plots of two different tabular datasets. The two different data sets we will use are the Climate Reference Network subset ('crn-drb-OSN') and the Historical Climate Network subset ('hcn-drb-OSN'). 

Let's start by loading in our two data sets and combining them into a list.

In [None]:
# tabular datasets
tabular_options = ['crn-drb-OSN', 'hcn-drb-OSN']

In [None]:
# To preview the CRN or HCN tabular data, uncomment the corresponding line below.
#print('CRN', conus404_drb_cat['crn-drb-OSN'].read().head())
#print('HCN', conus404_drb_cat['hcn-drb-OSN'].read().head())

In [None]:
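# read one tabular catalog entry into a DataFrame
def load_data_tabular(data_source):
    _data = conus404_drb_cat[data_source].read()
    # make sure the coordinates are numeric so they can be plotted
    _data[["LATITUDE", "LONGITUDE"]] = _data[["LATITUDE", "LONGITUDE"]].astype(float)
    return _data

# read every catalog entry in a list of sources (replaces the gridded multi_load defined earlier)
def multi_load(data_sources):
    _data = []
    for source in data_sources:
        _data.append(load_data_tabular(source))
    return _data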

data = multi_load(tabular_options)
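
The cast to `float` in the loader raises an error if any coordinate cannot be parsed. As a hedged alternative, the sketch below uses `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into `NaN` so those rows can be dropped before mapping. The station rows are invented for illustration and do not come from the CRN or HCN datasets.

```python
import pandas as pd

# Invented station rows; coordinates stored as text, one of them malformed
stations = pd.DataFrame({
    "ID": ["A", "B", "C"],
    "LATITUDE": ["39.86", "40.10", "n/a"],
    "LONGITUDE": ["-75.79", "-74.90", "-75.00"],
})

coord_cols = ["LATITUDE", "LONGITUDE"]

# Convert each coordinate column; unparseable strings become NaN
stations[coord_cols] = stations[coord_cols].apply(pd.to_numeric, errors="coerce")

# Drop rows that lack a usable coordinate before plotting
clean = stations.dropna(subset=coord_cols)
print(clean)
```

Whether silently dropping rows is acceptable depends on the analysis; failing loudly, as `astype(float)` does, is the safer choice when every station is expected to have valid coordinates.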

Now we can get started on our components for our panel app. This will be very similar to what we did for the gridded data. We will create a widget to select the variable we would like to visualize and the base map we would like to use. In addition, we will also need widgets to select the stations we would like to view on our timeseries plot from each of the datasets. 

In [None]:
# Create selector of variables
var_selector_tabular = pn.widgets.Select(
    name = "Select Variable",
    options = ['PREC_ACC_NC', 'TK'])

In [None]:
# Add selector for base map
base_map_options = {
    'OpenStreetMap': gv.tile_sources.OSM,
    'ESRI Imagery': gv.tile_sources.EsriImagery,
    'ESRI World Street Map': gv.tile_sources.EsriWorldStreetMap,
}

map_selector_tabular = pn.widgets.Select(
    description="Use to select Base Map",
    name="Select a Base Map",
    options=list(base_map_options.keys()),
    value = 'OpenStreetMap'
)

In [None]:
# Create a station selector for each dataset
ID_selector_0 = pn.widgets.Select(
    name = "Select a Climate Reference Network Station (blue)",
    options = list(data[0]["ID"].unique())) # pull out unique stations from the crn dataset

ID_selector_1 = pn.widgets.Select(
    name = "Select a Historical Climate Network Station (red)",
    options = list(data[1]["ID"].unique())) # pull out unique stations from the hcn dataset

We will also need functions that will plot our maps and our timeseries comparisons based on our selected variables. 

In [None]:
# Create a map of the CRN and HCN locations so the proximity of the
# measurement locations can be examined alongside the timeseries.
def create_map(data, base_map, ID_0, ID_1):
    # Combine the data from both datasets
    combined_data = pd.concat(data, ignore_index=True)

    # Create the base map layer
    base_map_layer = base_map_options.get(base_map)

    # Create some larger bounds around lat/long locations
    xlim = (combined_data['LONGITUDE'].min()-(0.5), combined_data['LONGITUDE'].max()+(0.5))
    ylim = (combined_data['LATITUDE'].min()-(0.5), combined_data['LATITUDE'].max()+(0.5))

    # Create points for all locations in black
    all_points = combined_data.hvplot.points(
        x='LONGITUDE', y='LATITUDE',
        geo=True, tiles=base_map_layer,
        color='black', size=50, hover_cols=['ID'],
        xlim = xlim, ylim = ylim,
        frame_width=150
    )

    # Highlight selected points
    highlighted_points_0 = combined_data[combined_data["ID"] == ID_0].hvplot.points(
        x='LONGITUDE', y='LATITUDE',
        geo=True, color='blue', size=100, hover_cols=['ID'],
        xlim = xlim, ylim = ylim
    ) if ID_0 else hv.Points([])

    highlighted_points_1 = combined_data[combined_data["ID"] == ID_1].hvplot.points(
        x='LONGITUDE', y='LATITUDE',
        geo=True, color='red', size=100, hover_cols=['ID'],
        xlim = xlim, ylim = ylim
    ) if ID_1 else hv.Points([])

    # Combine all layers
    combined_map = base_map_layer * all_points * highlighted_points_0 * highlighted_points_1
    return combined_map
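
The map limits in `create_map` are the coordinate extremes padded by half a degree on each side. A small sketch of that calculation as a reusable helper is shown below; `padded_bounds` is a hypothetical name, and the coordinates are invented values in decimal degrees.

```python
def padded_bounds(values, pad=0.5):
    """Return (min - pad, max + pad) for an iterable of coordinates."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return (min(values) - pad, max(values) + pad)

# Invented longitudes and latitudes in decimal degrees
lons = [-75.5, -74.0, -75.0]
lats = [39.5, 41.0, 40.25]

xlim = padded_bounds(lons)
ylim = padded_bounds(lats, pad=0.25)
print(xlim, ylim)
```

A pandas column such as `combined_data['LONGITUDE']` could be passed in the same way, since the helper accepts any iterable of numbers.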

In [None]:
# Create timeseries plots
def timeseries(data, ID_0, ID_1, var):
    _data0 = data[0][data[0]["ID"] == ID_0]
    _data1 = data[1][data[1]["ID"] == ID_1]
    _data = pd.concat([_data0, _data1], ignore_index=True)
    plot = _data.hvplot.line(x = "DATE", y = var, by = 'ID')
    return plot

Finally, we can put everything together in our panel app. Once again, we will create a function that creates the layout for our panel app and then bind it to the reactive variables using the `pn.rx()` function. 

In [None]:
# Create Layout
def plot_layout(source, data, var, base_map, ID_0, ID_1):
    # map
    map_0 = create_map(data, base_map, ID_0, ID_1)
    
    # timeseries
    plot = timeseries(data, ID_0, ID_1, var)

    # identify rows in layout
    row_1 = map_0
    row_2 = pn.Row(plot)
    
    return pn.Column(row_1, row_2)

map_layout = pn.rx(plot_layout)(tabular_options, data, var_selector_tabular, map_selector_tabular, ID_selector_0, ID_selector_1)

In [None]:
tabular_app = pn.Column("# Tabular Data Comparison", map_layout) # add a title
tabular_app.servable() # display components

Above we displayed the app within this notebook, but you can also call the `.show()` method to open the app in a browser. Uncomment and run the code below if you would like to do so.

In [None]:
#tabular_app.show('CONUS404 Tabular Dashboard')

## **Step 4: Examine Comparisons**

Several tabular datasets containing evaluation results are available to explore. For this example, we will examine how the station data compares to CONUS404.

#### Examine CONUS404 comparisons to CRN & HCN stations
There is only one Climate Reference Network station within the Delaware River Basin ("Avondale") that we can compare to CONUS404, whereas there are 14 Historical Climate Network stations. Let's combine the evaluation results for the station data and create a bar graph with an option to select the statistic of interest.

In [None]:
# Temperature & Precipitation (CONUS404 to HCN) Evaluation Results
hcn_eval = conus404_drb_cat['c404-hcn-drb-desc-stats-OSN'].to_dask()

# This will load the entire DataFrame into memory
hcn_eval = hcn_eval.compute()

In [None]:
# Temperature & Precipitation (CONUS404 to CRN) Evaluation Results
crn_eval = conus404_drb_cat['c404-crn-drb-desc-stats-OSN'].to_dask()

# This will load the entire DataFrame into memory
crn_eval = crn_eval.compute()

# Add an ID column for the single CRN station
crn_eval['ID'] = 'Avondale'

In [None]:
# Keep only the columns that are common to both datasets
columns = ['ID','stat','PREC_ACC_NC_c404','TK_c404']
crn_eval = crn_eval[columns]
hcn_eval = hcn_eval[columns]

Let's combine the HCN and CRN comparison statistics against CONUS404 into a single table.

In [None]:
# Combine the datasets
df = pd.concat([crn_eval, hcn_eval], ignore_index=True)

# Remove summary statistics
df = df[~df['stat'].isin(['annual_mean', 'mean', 'median','stdev'])]
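
The filter above relies on a boolean mask: `isin` marks the rows whose `stat` is one of the summary statistics, and `~` inverts the mask so only the remaining rows are kept. Here is a small sketch on an invented table; the numbers are made up, and `bias` and `rmse` are placeholder statistic names that may not match the real evaluation results.

```python
import pandas as pd

# Invented rows; "bias" and "rmse" are placeholder statistic names
df_demo = pd.DataFrame({
    "ID": ["Avondale"] * 4,
    "stat": ["mean", "stdev", "bias", "rmse"],
    "TK_c404": [284.1, 9.8, 0.4, 1.2],
})

summary_stats = ["annual_mean", "mean", "median", "stdev"]

# isin() flags rows whose stat is in the list; ~ inverts the mask
is_summary = df_demo["stat"].isin(summary_stats)
comparison_only = df_demo[~is_summary]
print(comparison_only["stat"].tolist())
```

Names in `summary_stats` that never occur in the table, such as `annual_mean` here, are simply ignored by `isin`.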

Now let's create our interactive bar plot.

In [None]:
# Create a selector for the statistics
stat_selector = pn.widgets.Select(name='Select Statistic', options=df['stat'].unique().tolist())

# Create a selector for the variable (precipitation or temperature)
var_selector = pn.widgets.Select(name='Select Variable', options=['PREC_ACC_NC_c404', 'TK_c404'])

# Function to update the bar graph based on the selected statistic and variable
def update_bar_graph(stat, var_type):
    filt_df = df[df['stat'] == stat]
    return filt_df.hvplot.bar(x='ID', y=var_type, 
                              title=f'{var_type}', 
                              xlabel='ID', ylabel = stat,
                              rot = 45)

# Bind the bar graph function to the two selectors so it redraws when either changes
interactive_plot = pn.bind(update_bar_graph, stat=stat_selector, var_type = var_selector)

# Layout
layout = pn.Column(stat_selector, var_selector, interactive_plot)

# Serve the Panel
eval_app = pn.Column("# CONUS404 versus station data", layout)
eval_app.servable() # display