# Hyperspectral data exploration - Water 

## I.L. 21.10.2024

## Goal

In this notebook, we’ll walk through creating a basic interactive application using NumPy, Pandas, and hvPlot. If you haven’t installed hvPlot yet, you can do so with pip install hvplot or conda install -c conda-forge hvplot.

Let’s envision what our app will look like:

![](./GLORIA/GLORIA_plot.png)

## Tools

We will using [*Panel*](https://panel.holoviz.org/index.html), a Python library designed to streamline the development of robust tools, dashboards, and complex applications. Panel integrates seamlessly with the PyData ecosystem, offering powerful, interactive data tables, visualizations, and much more, to unlock, visualize, share, and collaborate on your data for efficient workflows.

Panel is a component of the [*HoloViz*](https://holoviz.org/) ecosystem, providing a gateway to a cohesive suite of data exploration tools.

### Fetching the data

First, let’s import the necessary dependencies and define some variables:

In [1]:
import holoviews as hv
import hvplot.pandas
import numpy as np
import pandas as pd
import panel as pn

PRIMARY_COLOR = "#0072B5"
SECONDARY_COLOR = "#B54300"
TERCIARY_COLOR = "#50C878" 

Next, we’ll import the Panel JavaScript dependencies using pn.extension(...). For a visually appealing and responsive user experience, we’ll set the design to "material" and the sizing_mode to stretch_width:

In [2]:
pn.extension(design="material", sizing_mode="stretch_width")

### GLORIA

The GLObal Reflectance community dataset for Imaging and optical sensing of Aquatic environments (GLORIA) includes 7,572 curated hyperspectral remote sensing reflectance measurements at 1 nm intervals within the 350 to 900 nm wavelength range. In addition, at least one co-located water quality measurement of chlorophyll a, total suspended solids, absorption by dissolved substances, and Secchi depth, is provided.

![](./GLORIA/GLORIA.png)


Now, let’s load the GLORIA dataset that measured global water quality and hyperpectral signatures. We’ll speed up our application by caching (@pn.cache) the data across users:

In [11]:
CSV_FILE2 = (
    "./GLORIA/GLORIA_Rrs.csv")

In [12]:
data = pd.read_csv(CSV_FILE2)

In [13]:
data.tail()

Unnamed: 0,GLORIA_ID,Rrs_350,Rrs_351,Rrs_352,Rrs_353,Rrs_354,Rrs_355,Rrs_356,Rrs_357,Rrs_358,...,Rrs_891,Rrs_892,Rrs_893,Rrs_894,Rrs_895,Rrs_896,Rrs_897,Rrs_898,Rrs_899,Rrs_900
7567,GID_7769,0.001315,0.001301,0.001291,0.001302,0.001313,0.001324,0.001336,0.001348,0.00136,...,,,,,,,,,,
7568,GID_7770,0.001621,0.001602,0.001662,0.001684,0.001705,0.001727,0.001796,0.001791,0.001785,...,,,,,,,,,,
7569,GID_7771,0.000992,0.000979,0.001013,0.001026,0.001038,0.00105,0.001091,0.001087,0.001084,...,,,,,,,,,,
7570,GID_7772,0.000538,0.000531,0.000549,0.000555,0.00056,0.000565,0.000584,0.000582,0.000579,...,,,,,,,,,,
7571,GID_7773,0.000426,0.000419,0.000431,0.000434,0.000437,0.00044,0.000452,0.00045,0.000447,...,,,,,,,,,,


In [14]:
data.columns = data.columns.str.replace('Rrs_', '')

In [15]:
data.tail()

Unnamed: 0,GLORIA_ID,350,351,352,353,354,355,356,357,358,...,891,892,893,894,895,896,897,898,899,900
7567,GID_7769,0.001315,0.001301,0.001291,0.001302,0.001313,0.001324,0.001336,0.001348,0.00136,...,,,,,,,,,,
7568,GID_7770,0.001621,0.001602,0.001662,0.001684,0.001705,0.001727,0.001796,0.001791,0.001785,...,,,,,,,,,,
7569,GID_7771,0.000992,0.000979,0.001013,0.001026,0.001038,0.00105,0.001091,0.001087,0.001084,...,,,,,,,,,,
7570,GID_7772,0.000538,0.000531,0.000549,0.000555,0.00056,0.000565,0.000584,0.000582,0.000579,...,,,,,,,,,,
7571,GID_7773,0.000426,0.000419,0.000431,0.000434,0.000437,0.00044,0.000452,0.00045,0.000447,...,,,,,,,,,,


In [16]:
##### Analyzing GLORIA data #######
metadata = pd.read_csv("./GLORIA/GLORIA_meta_and_lab.csv", index_col="GLORIA_ID")

In [17]:
metadata.head()

Unnamed: 0_level_0,Organization_ID,Dataset_ID,Sample_ID,LIMNADES_ID,LIMNADES_UID,SeaBASS_ID,Data_collection_purpose,Special_event_flag,Site_name,Country,...,TSS_method,aCDOM_method,Chla,Chla_plus_phaeo,TSS,aCDOM440,Turbidity,Secchi_depth,AOT,Comments
GLORIA_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
GID_1,UT-TO,AlikasK_EE_UT-TO,53,,,,3.0,,Lake Peipsi,Estonia,...,ESS Method 340.2,NASA TM 2003-211621,,8.16,2.67,2.532844,,1.85,,Lake Peipsi
GID_2,UT-TO,AlikasK_EE_UT-TO,54,,,,3.0,,Lake Peipsi,Estonia,...,ESS Method 340.2,NASA TM 2003-211621,,8.6,5.67,2.624947,,1.8,,Lake Peipsi
GID_3,UT-TO,AlikasK_EE_UT-TO,55,,,,3.0,,Lake Peipsi,Estonia,...,ESS Method 340.2,NASA TM 2003-211621,,7.27,8.0,2.578895,,1.8,,Lake Peipsi
GID_4,UT-TO,AlikasK_EE_UT-TO,56,,,,3.0,,Lake Peipsi,Estonia,...,ESS Method 340.2,NASA TM 2003-211621,,7.3,7.0,3.085464,,1.15,,Lake Peipsi
GID_5,UT-TO,AlikasK_EE_UT-TO,57,,,,3.0,,Lake Peipsi,Estonia,...,ESS Method 340.2,NASA TM 2003-211621,,13.05,8.67,3.039412,,1.0,,Lake Peipsi


In [18]:
metadata.shape

(7572, 63)

## Total Suspended Solids (TSS)

![](./GLORIA/TSS_TSD.png)

In [19]:
# selecting rows based on condition
a = metadata.index[metadata['TSS'] > 1000].tolist()

In [20]:
# selecting rows based on condition
#df[(df['age'] < 25) & df['name'].str.endswith('e')]
b = metadata.index[(metadata['TSS'] > 500) & (metadata['TSS'] < 1000)].tolist()

In [21]:
c = metadata.index[metadata['TSS'] < 1000].tolist()

In [22]:
@pn.cache
def get_data2():
  ##return pd.read_csv(CSV_FILE, parse_dates=["date"], index_col="date")
  data = pd.read_csv(CSV_FILE2,  index_col="GLORIA_ID")
  data.columns = data.columns.str.replace('Rrs_', '')
  return data

In [23]:
data3 = get_data2()

data3.tail()

Unnamed: 0_level_0,350,351,352,353,354,355,356,357,358,359,...,891,892,893,894,895,896,897,898,899,900
GLORIA_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
GID_7769,0.001315,0.001301,0.001291,0.001302,0.001313,0.001324,0.001336,0.001348,0.00136,0.001365,...,,,,,,,,,,
GID_7770,0.001621,0.001602,0.001662,0.001684,0.001705,0.001727,0.001796,0.001791,0.001785,0.001881,...,,,,,,,,,,
GID_7771,0.000992,0.000979,0.001013,0.001026,0.001038,0.00105,0.001091,0.001087,0.001084,0.001141,...,,,,,,,,,,
GID_7772,0.000538,0.000531,0.000549,0.000555,0.00056,0.000565,0.000584,0.000582,0.000579,0.000615,...,,,,,,,,,,
GID_7773,0.000426,0.000419,0.000431,0.000434,0.000437,0.00044,0.000452,0.00045,0.000447,0.000474,...,,,,,,,,,,


## Visualizing a Subset of the Data

Before diving into Panel, let’s create a function that smooths one of our time series and identifies outliers. Then, we’ll plot the result using hvPlot:

In [24]:
data4 = data3[data3.index.isin(a)]

In [25]:
data5 = data4.mean(axis=0)

In [26]:
type(data5)

pandas.core.series.Series

In [27]:
def transform_data(variable, middle, high):
    """Calculates the rolling average and identifies outliers"""
    a = metadata.index[metadata[variable] > middle].tolist()
    b = metadata.index[(metadata[variable] > middle) & (metadata['TSS'] < high)].tolist()
    c = metadata.index[metadata[variable] < middle].tolist()
    data4 = data3[data3.index.isin(a)]
    avg1 = data4.mean(axis=0)
    data4 = data3[data3.index.isin(b)]
    avg2 = data4.mean(axis=0)
    data4 = data3[data3.index.isin(c)]
    avg3 = data4.mean(axis=0)
    return avg1, avg2, avg3

In [28]:
avg1, avg2, avg3 = transform_data('TSS', 500, 1000)

In [29]:
avg1

350    0.011894
351    0.012078
352    0.012158
353    0.012166
354    0.012240
         ...   
896    0.047805
897    0.047753
898    0.047698
899    0.047634
900    0.047575
Length: 551, dtype: float64

In [30]:
lista = ['400', '500', '600', '700', '800']

In [31]:
ttt = avg1[avg1.index.isin(lista)]

In [32]:
ttt['400']

0.013451264793444337

In [33]:
def get_plot(variable, middle,high):
    """Plots the average for each range"""
    avg1, avg2, avg3 = transform_data(variable, middle, high)
    lista = ['400', '500', '600', '700']
    #data = avg1[avg1.index.isin(lista)]
    #vlines = hv.VLines(data, lista)
    return avg1.hvplot(
        height=300, legend=True, color=PRIMARY_COLOR, line_width=3, label= variable + ' > ' + str(high)
    ) * avg2.hvplot(color=SECONDARY_COLOR, legend=True, label= str(middle) + ' < ' + variable + ' < ' + str(high)
                           ) * avg3.hvplot(color=TERCIARY_COLOR, legend=True, label= variable + ' < ' + str(middle)
                                          ).opts(title="Average reflectance")

Now, we can call our get_plot function with specific parameters to obtain a plot with a single set of parameters:

In [34]:
get_plot('TSS', 500,1000)

Great! Now, let’s explore how different values for window and sigma affect the plot. Instead of reevaluating the above cell multiple times, let’s use Panel to add interactive controls and quickly visualize the impact of different parameter values.

## Exploring the Parameter Space

Let’s create some Panel slider widgets to explore the range of parameter values:

In [35]:
variable_widget = pn.widgets.Select(name="variable", value="TSS", options=['TSS'])
high_widget = pn.widgets.IntSlider(name="high", value=1000, start=800, end=1200)
middle_widget = pn.widgets.IntSlider(name="middle", value=500, start=100, end=500)

Now, let’s link these widgets to our plotting function so that updates to the widgets rerun the function. We can achieve this easily in Panel using pn.bind:

In [36]:
bound_plot = pn.bind(
    get_plot, variable=variable_widget, high=high_widget, middle=middle_widget
)

Once we’ve bound the widgets to the function’s arguments, we can layout the resulting bound_plot component along with the widgets using a Panel layout such as Column:

In [37]:
widgets = pn.Column(variable_widget, high_widget, middle_widget, sizing_mode="fixed", width=300)
pn.Column(widgets, bound_plot)

As long as you have a live Python process running, dragging these widgets will trigger a call to the get_plot callback function, evaluating it for whatever combination of parameter values you select and displaying the results.

## Optional - Serving the Notebook

You may be interested in serving the notebook. In such a case, uncomment the next  chunk of code and run it. Then, write the code to serve the notebook.   

In [2]:
# Uncomment if needed
#pn.template.MaterialTemplate(
#   site="Panel",
#  title="GLORIA App",
#    sidebar=[variable_widget, high_widget, middle_widget],
#    main=[bound_plot],
#).servable(); # The ; is needed in the notebook to not display the template. Its not needed in a script

In [3]:
# Write the code based on # See https://panel.holoviz.org/tutorials/intermediate/serve.html