# Hyperspectral data exploration - Water 

## I.L. 21.10.2024

## Goal

In this notebook, we’ll walk through creating a basic interactive application using NumPy, Pandas, and hvPlot. If you haven’t installed hvPlot yet, you can do so with pip install hvplot or conda install -c conda-forge hvplot.

Let’s envision what our app will look like:

![](./GLORIA/GLORIA_plot.png)

## Tools

We will use [*Panel*](https://panel.holoviz.org/index.html), a Python library for building tools, dashboards, and complex applications. Panel integrates with the PyData ecosystem and provides interactive data tables, visualizations, and other components for exploring and sharing your data.

Panel is a component of the [*HoloViz*](https://holoviz.org/) ecosystem, providing a gateway to a cohesive suite of data exploration tools.

### Fetching the data

First, let’s import the necessary dependencies and define some variables:

In [1]:
import holoviews as hv
import hvplot.pandas
import numpy as np
import pandas as pd
import panel as pn

PRIMARY_COLOR = "#0072B5"
SECONDARY_COLOR = "#B54300"
TERCIARY_COLOR = "#50C878" 


Next, we’ll import the Panel JavaScript dependencies using pn.extension(...). For a visually appealing and responsive user experience, we’ll set the design to "material" and the sizing_mode to stretch_width:

In [3]:
pn.extension(design="material", sizing_mode="stretch_width")

### GLORIA

The GLObal Reflectance community dataset for Imaging and optical sensing of Aquatic environments (GLORIA) includes 7,572 curated hyperspectral remote sensing reflectance measurements at 1 nm intervals within the 350 to 900 nm wavelength range. In addition, at least one co-located water quality measurement of chlorophyll a, total suspended solids, absorption by dissolved substances, and Secchi depth, is provided.

The GLORIA dataset is publicly available from [PANGAEA](https://doi.pangaea.de/10.1594/PANGAEA.948492).

![](./GLORIA/GLORIA.png)


Now, let’s load the GLORIA dataset, which pairs water quality measurements with hyperspectral signatures from around the world. Note that, in my case, the reflectance data is split across three files; yours may come as a single file.

In [11]:
# Defining the path to three datasets
CSVFILE1 = ("../../GLORIA1_Rrs.csv")
CSVFILE2 = ("../../GLORIA2_Rrs.csv")
CSVFILE3 = ("../../GLORIA3_Rrs.csv")

In [8]:
# Uncomment if you have a single dataset to read
#CSVFILE = "../../GLORIA_Rrs.csv"
#data = pd.read_csv(CSVFILE, index_col="GLORIA_ID")

In [14]:
# Reading three datasets
dt1 = pd.read_csv(CSVFILE1, index_col="GLORIA_ID")
dt2 = pd.read_csv(CSVFILE2, index_col="GLORIA_ID")
dt3 = pd.read_csv(CSVFILE3, index_col="GLORIA_ID")

In [16]:
# Ignore if you read a single dataset
# Concatenating three data frames
frames = [dt1, dt2, dt3]
data = pd.concat(frames, ignore_index=False)

In [17]:
data.shape

(7572, 552)
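
A quick sanity check after concatenating is to confirm that no GLORIA_ID appears in more than one file. The following is a minimal sketch on made-up frames (the names part1 and part2 and all values are placeholders, not GLORIA data); it relies on the verify_integrity option of pd.concat, which raises a ValueError when an index label is repeated.

```python
import pandas as pd

# Toy stand-ins for dt1, dt2, dt3 (values are made up for illustration)
part1 = pd.DataFrame(
    {"Rrs_350": [0.0012, 0.0011]},
    index=pd.Index(["GID_1", "GID_2"], name="GLORIA_ID"),
)
part2 = pd.DataFrame(
    {"Rrs_350": [0.0013]},
    index=pd.Index(["GID_3"], name="GLORIA_ID"),
)

# verify_integrity=True raises ValueError if any index label is repeated
combined = pd.concat([part1, part2], verify_integrity=True)

print(combined.shape)
print(combined.index.is_unique)
```

If the check raises, the duplicated IDs can be inspected before deciding which rows to keep.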

In [18]:
data.head()

GLORIA_ID,Unnamed: 0,Rrs_350,Rrs_351,Rrs_352,Rrs_353,Rrs_354,Rrs_355,Rrs_356,Rrs_357,Rrs_358,...,Rrs_891,Rrs_892,Rrs_893,Rrs_894,Rrs_895,Rrs_896,Rrs_897,Rrs_898,Rrs_899,Rrs_900
GID_1,0,0.001231,0.001214,0.001214,0.001225,0.001215,0.001219,0.001224,0.001246,0.001261,...,0.000274,0.000272,0.000272,0.00027,0.000268,0.000266,0.000265,0.000264,0.000259,0.000256
GID_2,1,0.001054,0.00104,0.001043,0.001055,0.001046,0.001052,0.00106,0.001086,0.001106,...,0.000334,0.000332,0.000331,0.000329,0.000327,0.000324,0.000322,0.000319,0.000311,0.000308
GID_3,2,0.00124,0.001224,0.001226,0.001239,0.00123,0.001236,0.001244,0.001271,0.00129,...,0.000451,0.000448,0.000447,0.000444,0.000441,0.000437,0.000435,0.000433,0.000423,0.000419
GID_4,3,0.001011,0.000997,0.001001,0.001014,0.001005,0.001012,0.001022,0.00105,0.001071,...,0.000516,0.000513,0.000512,0.000509,0.000505,0.0005,0.000499,0.000497,0.000487,0.000483
GID_5,4,0.001081,0.001067,0.001071,0.001084,0.001076,0.001083,0.001093,0.001122,0.001143,...,0.00059,0.000588,0.000587,0.000583,0.00058,0.000574,0.000572,0.00057,0.000558,0.000552


Let's remove the Unnamed column:

In [19]:
# Removing unnamed columns using drop function
data.drop(data.columns[data.columns.str.contains(
    'unnamed', case=False)], axis=1, inplace=True)

In [73]:
#Uncomment in case you need to replace column names
#data.columns = data.columns.str.replace('Rrs_', '')
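
If you do strip the prefix, it can help to convert the remaining labels to integers so that the columns behave as a numeric wavelength axis when plotting. The snippet below is a small sketch on a toy frame that mimics the Rrs column naming; the variable name spectra, the axis name wavelength_nm, and the values are assumptions made for illustration.

```python
import pandas as pd

# Toy frame with the same column naming scheme as the Rrs table (values are made up)
spectra = pd.DataFrame(
    [[0.0012, 0.0011, 0.0010]],
    index=pd.Index(["GID_1"], name="GLORIA_ID"),
    columns=["Rrs_350", "Rrs_351", "Rrs_352"],
)

# Strip the 'Rrs_' prefix and convert the remainder to integer wavelengths (nm)
spectra.columns = spectra.columns.str.replace("Rrs_", "", regex=False).astype(int)
spectra.columns.name = "wavelength_nm"  # hypothetical axis name

print(list(spectra.columns))
```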

In [20]:
data.head()

GLORIA_ID,Rrs_350,Rrs_351,Rrs_352,Rrs_353,Rrs_354,Rrs_355,Rrs_356,Rrs_357,Rrs_358,Rrs_359,...,Rrs_891,Rrs_892,Rrs_893,Rrs_894,Rrs_895,Rrs_896,Rrs_897,Rrs_898,Rrs_899,Rrs_900
GID_1,0.001231,0.001214,0.001214,0.001225,0.001215,0.001219,0.001224,0.001246,0.001261,0.001262,...,0.000274,0.000272,0.000272,0.00027,0.000268,0.000266,0.000265,0.000264,0.000259,0.000256
GID_2,0.001054,0.00104,0.001043,0.001055,0.001046,0.001052,0.00106,0.001086,0.001106,0.001109,...,0.000334,0.000332,0.000331,0.000329,0.000327,0.000324,0.000322,0.000319,0.000311,0.000308
GID_3,0.00124,0.001224,0.001226,0.001239,0.00123,0.001236,0.001244,0.001271,0.00129,0.001294,...,0.000451,0.000448,0.000447,0.000444,0.000441,0.000437,0.000435,0.000433,0.000423,0.000419
GID_4,0.001011,0.000997,0.001001,0.001014,0.001005,0.001012,0.001022,0.00105,0.001071,0.001075,...,0.000516,0.000513,0.000512,0.000509,0.000505,0.0005,0.000499,0.000497,0.000487,0.000483
GID_5,0.001081,0.001067,0.001071,0.001084,0.001076,0.001083,0.001093,0.001122,0.001143,0.001147,...,0.00059,0.000588,0.000587,0.000583,0.00058,0.000574,0.000572,0.00057,0.000558,0.000552


## Checking the metadata

In [21]:
##### Analyzing GLORIA metadata #######
metadata = pd.read_csv("./GLORIA/GLORIA_meta_and_lab.csv", index_col=0)

In [22]:
metadata

GLORIA_ID,Organization_ID,Dataset_ID,Sample_ID,LIMNADES_ID,LIMNADES_UID,SeaBASS_ID,Data_collection_purpose,Special_event_flag,Site_name,Country,...,TSS_method,aCDOM_method,Chla,Chla_plus_phaeo,TSS,aCDOM440,Turbidity,Secchi_depth,AOT,Comments
GID_1,UT-TO,AlikasK_EE_UT-TO,53,,,,3.0,,Lake Peipsi,Estonia,...,ESS Method 340.2,NASA TM 2003-211621,,8.16,2.67,2.532844,,1.85,,Lake Peipsi
GID_2,UT-TO,AlikasK_EE_UT-TO,54,,,,3.0,,Lake Peipsi,Estonia,...,ESS Method 340.2,NASA TM 2003-211621,,8.60,5.67,2.624947,,1.80,,Lake Peipsi
GID_3,UT-TO,AlikasK_EE_UT-TO,55,,,,3.0,,Lake Peipsi,Estonia,...,ESS Method 340.2,NASA TM 2003-211621,,7.27,8.00,2.578895,,1.80,,Lake Peipsi
GID_4,UT-TO,AlikasK_EE_UT-TO,56,,,,3.0,,Lake Peipsi,Estonia,...,ESS Method 340.2,NASA TM 2003-211621,,7.30,7.00,3.085464,,1.15,,Lake Peipsi
GID_5,UT-TO,AlikasK_EE_UT-TO,57,,,,3.0,,Lake Peipsi,Estonia,...,ESS Method 340.2,NASA TM 2003-211621,,13.05,8.67,3.039412,,1.00,,Lake Peipsi
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
GID_7769,UiB,KristoffersenA_NO_UiB,L1,,,,2.0,,Lurefjorden,Norway,...,UiB TSS,UiB CDOM,3.957,,13.35,0.348420,,7.50,,Threshold marine inland fjord
GID_7770,UiB,KristoffersenA_NO_UiB,L2,,,,2.0,,Lurefjorden,Norway,...,UiB TSS,UiB CDOM,4.400,,18.67,0.228306,,8.30,,CDOM are salinity corrected
GID_7771,UiB,KristoffersenA_NO_UiB,L3,,,,2.0,,Lurefjorden,Norway,...,UiB TSS,UiB CDOM,3.643,,10.80,0.310476,,7.70,,
GID_7772,UiB,KristoffersenA_NO_UiB,L4,,,,2.0,2.0,Lurefjorden,Norway,...,UiB TSS,UiB CDOM,4.353,,11.06,0.345544,,7.60,,Special_event_flag: rain for last two stations...


## Total Suspended Solids (TSS)

![](./GLORIA/TSS_TSD.png)

In [23]:
# IDs of samples with TSS above 1000
a = metadata.index[metadata['TSS'] > 1000].tolist()

In [25]:
# IDs of samples with TSS between 500 and 1000
b = metadata.index[(metadata['TSS'] > 500) & (metadata['TSS'] < 1000)].tolist()

In [26]:
# IDs of samples with TSS below 500 (so the three ranges do not overlap)
c = metadata.index[metadata['TSS'] < 500].tolist()
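
As an alternative to three separate boolean masks, the same split can be expressed as a single labelled column with pd.cut. This is only a sketch on made-up TSS values (the sample IDs and the labels low, middle, and high are placeholders). Note one difference from the strict inequalities above: pd.cut bins are right-closed by default, so a value of exactly 500 falls in the lower range.

```python
import numpy as np
import pandas as pd

# Made-up TSS values; NaN stands for a sample without a TSS measurement
tss = pd.Series(
    [2.7, 650.0, 1500.0, np.nan, 480.0],
    index=["GID_a", "GID_b", "GID_c", "GID_d", "GID_e"],
    name="TSS",
)

# Bins are right-closed by default: (-inf, 500], (500, 1000], (1000, inf)
tss_range = pd.cut(
    tss,
    bins=[-np.inf, 500, 1000, np.inf],
    labels=["low", "middle", "high"],
)

# Number of samples per range; samples with missing TSS are not counted
counts = tss_range.value_counts()
print(counts)
```

Samples without a TSS value stay unlabelled (NaN), which mirrors the boolean masks: a comparison against a missing value is always False.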

## Visualizing a Subset of the Data

Before diving into Panel, let’s create a function that averages the reflectance spectra of the samples falling in each range of a water quality variable. Then, we’ll plot the result using hvPlot:

In [27]:
def transform_data(variable, middle, high):
    """Returns the mean spectrum of the samples above, between, and below the two thresholds"""
    values = metadata[variable]
    # IDs of samples above `high`, between `middle` and `high`, and below `middle`
    a = metadata.index[values > high].tolist()
    b = metadata.index[(values > middle) & (values < high)].tolist()
    c = metadata.index[values < middle].tolist()
    # Average the spectra of each group, wavelength by wavelength
    avg1 = data[data.index.isin(a)].mean(axis=0)
    avg2 = data[data.index.isin(b)].mean(axis=0)
    avg3 = data[data.index.isin(c)].mean(axis=0)
    return avg1, avg2, avg3
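
To check the grouping logic without loading the full dataset, the same idea can be tried on a few made-up samples. The sketch below is not the notebook's function: mean_spectra, the sample IDs, and all values are hypothetical, and it returns one DataFrame with a column per range instead of three separate Series.

```python
import numpy as np
import pandas as pd

# Made-up samples: one per TSS range, plus one without a TSS measurement
ids = ["s1", "s2", "s3", "s4"]
meta = pd.DataFrame({"TSS": [50.0, 300.0, 700.0, np.nan]}, index=ids)
spectra = pd.DataFrame(
    {"Rrs_350": [0.001, 0.002, 0.004, 0.008],
     "Rrs_351": [0.002, 0.003, 0.005, 0.009]},
    index=ids,
)

def mean_spectra(spectra, meta, variable, middle, high):
    """Mean spectrum per range of `variable`, one column per range."""
    # Align the variable with the spectra so the boolean masks share one index
    values = meta[variable].reindex(spectra.index)
    masks = {
        "high": values > high,
        "middle": (values > middle) & (values < high),
        "low": values < middle,
    }
    # Comparisons with NaN are False, so unmeasured samples fall in no group
    return pd.DataFrame(
        {name: spectra[mask].mean(axis=0) for name, mask in masks.items()}
    )

result = mean_spectra(spectra, meta, "TSS", middle=100, high=500)
print(result)
```

Here each range contains exactly one sample, so each column simply reproduces that sample's spectrum, and the sample with missing TSS contributes to none of them.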

Now, let's define a function to plot the average spectrum of each of the three ranges:

In [28]:
def get_plot(variable, middle, high):
    """Plots the average for each range"""
    avg1, avg2, avg3 = transform_data(variable, middle, high)
    high_plot = avg1.hvplot(
        height=300, legend=True, color=PRIMARY_COLOR, line_width=3,
        label=f"{variable} > {high}",
    )
    middle_plot = avg2.hvplot(
        color=SECONDARY_COLOR, legend=True,
        label=f"{middle} < {variable} < {high}",
    )
    low_plot = avg3.hvplot(
        color=TERCIARY_COLOR, legend=True, label=f"{variable} < {middle}"
    )
    # Overlay the three curves and set the title on the combined plot
    return (high_plot * middle_plot * low_plot).opts(title="Average reflectance")

Now, we can call our get_plot function with a single set of parameters to obtain a plot:

In [29]:
get_plot('TSS', 100, 500)

Great! Now, let’s explore how different values for middle and high affect the plot. Instead of reevaluating the above cell multiple times, let’s use Panel to add interactive controls and quickly visualize the impact of different parameter values.

## Exploring the Parameter Space

Let’s create some Panel widgets (a dropdown for the variable and two sliders for the thresholds) to explore the range of parameter values:

In [30]:
variable_widget = pn.widgets.Select(name="variable", value="TSS", options=['TSS'])
high_widget = pn.widgets.IntSlider(name="high", value=1000, start=800, end=1200)
middle_widget = pn.widgets.IntSlider(name="middle", value=500, start=100, end=500)
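
The dropdown above offers only TSS. If you want to offer other water quality variables, one possible way to build the options list is to keep the candidate columns that exist in the metadata and hold numeric values. This is a sketch on a toy frame: the values are made up, and the candidates list is an assumption based on the column names visible in the metadata table above. The slider ranges are tuned for TSS and would need adjusting for other variables.

```python
import pandas as pd

# Toy metadata frame (values are made up for illustration)
meta = pd.DataFrame({
    "Site_name": ["Lake A", "Lake B"],
    "Chla": [3.9, 4.4],
    "TSS": [13.35, 18.67],
    "Secchi_depth": [7.5, 8.3],
})

# Candidate water quality columns, taken from the metadata header
candidates = ["Chla", "TSS", "aCDOM440", "Turbidity", "Secchi_depth"]

# Keep only the candidates that are present and numeric
options = [
    name for name in candidates
    if name in meta.columns and pd.api.types.is_numeric_dtype(meta[name])
]
print(options)
```

The resulting list could then be passed as the options argument of the Select widget.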

Now, let’s link these widgets to our plotting function so that updates to the widgets rerun the function. We can achieve this easily in Panel using pn.bind:

In [31]:
bound_plot = pn.bind(
    get_plot, variable=variable_widget, high=high_widget, middle=middle_widget
)

Once we’ve bound the widgets to the function’s arguments, we can lay out the resulting bound_plot component along with the widgets using a Panel layout such as Column:

In [32]:
widgets = pn.Column(variable_widget, high_widget, middle_widget, sizing_mode="fixed", width=300)
pn.Column(widgets, bound_plot)


As long as you have a live Python process running, dragging these widgets will trigger a call to the get_plot callback function, evaluating it for whatever combination of parameter values you select and displaying the results.

## Optional - Serving the Notebook

You may want to serve the notebook as a standalone app. In that case, uncomment the next chunk of code and run it. Then, serve the notebook from the command line.

In [75]:
# Uncomment if needed
#pn.template.MaterialTemplate(
#    site="Panel",
#    title="GLORIA App",
#    sidebar=[variable_widget, high_widget, middle_widget],
#    main=[bound_plot],
#).servable(); # The ; is needed in the notebook to not display the template. It's not needed in a script

In [33]:
# To serve the app, run `panel serve` on this notebook file from a terminal.
# See https://panel.holoviz.org/tutorials/intermediate/serve.html

That's all. We have achieved our goal.