<a href="https://colab.research.google.com/github/nneibaue/etsp_explorer/blob/master/explorer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>ETSP Data Explorer</h1>

This notebook provides some basic visualization tools in Python using Colab's output features. It is fairly basic, but should serve as a decent example of how Python and Colab can be useful for this kind of analysis. Google provides a free runtime in the cloud, so there is no need to install Python or set anything up locally. The free version of Colab has more than enough features, memory, and drive space for our purposes here.


## Research Context

Samples are collected at different depths at a given location in the ocean (i.e. a latitude/longitude pair). Each sample is measured for concentrations of various elements via a 2D scan, yielding a concentration value for each element at each pixel. A given pixel may contain a non-trivial concentration of one or more elements.

It is of particular interest how a given element (Cu in this case) is distributed among different element groups for a given scan. For example, one pixel could contain non-trivial concentrations of Cu, Mg, Br, and Zn, whereas another pixel might only contain Fe and Mg. 


## Problem Statement

* Given a dataset for a single location, how does the distribution of an element vary with depth? Assumptions include:
  * There can be many scans at a given depth
  * No two scans overlap in space
  * Concentration values ($[x]$) at a pixel are only considered non-trivial if:
  $$
  [x] > \bar{[x]} + 2 \cdot \sigma_x 
  $$
  where $\bar{[x]}$ is the average concentration value and $\sigma_x$ is the standard deviation of the concentration values
  * Concentration values filtered by an element are only considered non-trivial if the element in question satisfies the above condition
    * E.g. a pixel may contain non-trivial amounts of Ca and Mg, but not Cu. If we are filtering by Cu, then this pixel is rejected
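
The thresholding and filtering rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `nontrivial_mask` and `groups_filtered_by` and the ten-pixel data are made up, the scan is flattened to a plain list per element, and the standard library's `statistics.pstdev` stands in for `np.std` (both compute the population standard deviation by default). It is not the code used by the `etsp` module.

```python
from statistics import fmean, pstdev

def nontrivial_mask(values, n=2):
    """Flag values strictly above mean + n * (population) standard deviation."""
    cutoff = fmean(values) + n * pstdev(values)
    return [v > cutoff for v in values]

def groups_filtered_by(scan, element, n=2):
    """Map pixel index -> tuple of non-trivial elements at that pixel,
    keeping only pixels where `element` itself is non-trivial."""
    masks = {el: nontrivial_mask(vals, n) for el, vals in scan.items()}
    groups = {}
    for i, keep in enumerate(masks[element]):
        if keep:
            groups[i] = tuple(sorted(el for el in masks if masks[el][i]))
    return groups

# Made-up, flattened 10-pixel scan: one bright pixel per element
scan = {
    'Cu': [1.0] * 9 + [20.0],  # bright at pixel 9
    'Mg': [1.0] * 9 + [20.0],  # bright at pixel 9
    'Ca': [20.0] + [1.0] * 9,  # bright at pixel 0
}

groups = groups_filtered_by(scan, 'Cu')
print(groups)
```

Pixel 0 contains a non-trivial amount of Ca but not Cu, so it is rejected when filtering by Cu. Only pixel 9 survives, with the group `('Cu', 'Mg')`.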


**Please don't edit this notebook directly. To make changes, first make a copy of the notebook.**

# Setup

The following cell clones the GitHub repo (or pulls the latest changes if a local copy already exists) so private libraries can be imported.

In [1]:
#@title Clone github repo
import os
import sys
import shutil

ROOT = '/content'
REPO_NAME = 'etsp_explorer'
REPO_PATH = os.path.join(ROOT, REPO_NAME)

# Get latest changes if repo already exists
if REPO_NAME in os.listdir(ROOT):
  print('Local repo found. Updating....')
  os.chdir(REPO_PATH)
  !git pull
else:
  print('No local repo found. Cloning from github...')
  !git clone https://github.com/nneibaue/etsp_explorer

if REPO_PATH not in sys.path:
  print(f'Adding {REPO_PATH} to path')
  sys.path.append(REPO_PATH)

os.chdir(ROOT)

Local repo found. Updating....
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 6 (delta 4), reused 5 (delta 3), pack-reused 0
Unpacking objects: 100% (6/6), done.
From https://github.com/nneibaue/etsp_explorer
   cc7c659..8b03d84  master     -> origin/master
Updating cc7c659..8b03d84
Fast-forward
 etsp.py     |   2 ++
 plotting.py | 105 ++++++++++++++++++++++++++++++++++++++++--------------------
 2 files changed, 72 insertions(+), 35 deletions(-)
Adding /content/etsp_explorer to path


In [0]:
#@title Imports

# etsp stuff
from etsp import Detsum, Scan, CombinedScan, Depth
from plotting import ribbon_plot

# Colab output stuff
from google.colab import drive
from google.colab import widgets
from IPython.display import display, HTML
import ipywidgets

# General
import numpy as np
import random
import re
import pandas as pd

# Plotting
from cycler import cycler
import altair as alt
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# Namespace class to keep things organized
class Namespace:
  def __init__(self, **kwargs):
    self.__dict__.update(**kwargs)

In [3]:
#@title Connect Google Drive

drive.mount('/content/gdrive')
DRIVE_BASE = '/content/gdrive/My Drive'

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


# Analysis

In [4]:
#@title Data Import

#@markdown Enter drive path to data folder (do not include 'My Drive/'):
data_path = "software_development/etsp/XRF data deglitch/" #@param{type:"string"}
#depth_path = "software_development/etsp/XRF data/25m" #@param {type:"string"}
#@markdown Enter elements separated by comma
ELEMENTS_OF_INTEREST = "Br,Ca,Cu,Fe,K,Cl,Mn,S,Si,Zn" #@param {type:"string"}
ELEMENTS_OF_INTEREST=ELEMENTS_OF_INTEREST.split(',')
ORBITALS = "K" #@param {type:"string"}

def import_data(data_path):
  '''Builds a Depth object for every entry in the Drive folder `data_path`.'''
  depths = []
  base = os.path.join(DRIVE_BASE, data_path)
  for name in os.listdir(base):
    try:
      depth = Depth(os.path.join(base, name),
                    ELEMENTS_OF_INTEREST,
                    orbitals=ORBITALS.split(','),
                    normalized=True)
      depths.append(depth)
      print(f"Successfully imported data for {depth.depth}")
    except NameError as e:
      # Report the problem and continue with the remaining folders
      print(e)
  return depths

depths = import_data(data_path)
PROP_DICT = {} # For storing properties of ribbon plots

Successfully imported data for 65m
Successfully imported data for 900m
Successfully imported data for 250m
Successfully imported data for 40m
Successfully imported data for 25m
Successfully imported data for 165m
Successfully imported data for 10m
Successfully imported data for 50m


In [0]:
#@title Element Filter

In [0]:
# Threshold factory: mean_n_std(n) returns a function that computes mean + n * std of its input
mean_n_std = lambda n: (lambda x: np.mean(x) + np.std(x) * n)

ELEMENT_FILTER = {
    'Br': mean_n_std(2),
    'Ca': mean_n_std(2),
    'Cu': mean_n_std(2),
    'Fe': mean_n_std(2),
    'K': mean_n_std(2),
    'Cl': mean_n_std(2),
    'Mn': mean_n_std(2),
    'S': mean_n_std(2),
    'Si': mean_n_std(2),
    'Zn': mean_n_std(2),
}

del(mean_n_std)
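
Each entry in `ELEMENT_FILTER` is a function that takes the concentration values for one element and returns a cutoff. The sketch below shows the same factory pattern with a per-element override. It is illustrative only: the `multipliers` dictionary and the sample data are hypothetical, and `statistics.fmean` / `statistics.pstdev` stand in for `np.mean` / `np.std` so that the block runs without numpy.

```python
from statistics import fmean, pstdev

def mean_n_std(n):
    """Return a function computing mean + n * std (stand-in for the lambda above)."""
    return lambda x: fmean(x) + n * pstdev(x)

elements = ['Cu', 'Fe', 'Zn']
multipliers = {'Cu': 3}  # hypothetical override; all other elements use 2

# Build the filter with a comprehension instead of one line per element
element_filter = {e: mean_n_std(multipliers.get(e, 2)) for e in elements}

data = [0.0, 0.0, 4.0, 4.0]  # mean 2.0, population std 2.0
fe_cutoff = element_filter['Fe'](data)  # 2.0 + 2 * 2.0
cu_cutoff = element_filter['Cu'](data)  # 2.0 + 3 * 2.0
print(fe_cutoff, cu_cutoff)
```

Because `mean_n_std(n)` is called once per element, each returned function keeps its own value of `n`, which is what allows the thresholds to differ between elements.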

## Looking for Glitches

Check the `show_plots` box and run the following cell to plot all `Detsums` from all depths.

In [0]:
#@title Plot All Detsums

elements_to_plot = 'Br,Ca,Cu,Fe,K,Cl,Mn,S,Si,Zn' #@param {type:"string"}
sort_by = "element" #@param ["element", "depth"]

#@markdown To show plots, check box below and run cell
show_plots = False #@param {type:"boolean"}

def plot_all_detsums(depths, elements=None, sort_by='element'):
  '''Plots the raw data from all detsums of the given elements.

  Args:
    depths: list of Depth objects
    elements: optional list of elements. E.g. ['Cu', 'Fe']. If this 
      is `None`, then all elements will be plotted
    sort_by: string. Can either be 'element' or 'depth'. This will
      determine how the detsums are sorted before they are rendered
      to the screen. This is set to 'element' by default
    
  Returns: raw detsums plotted in a grid
  '''

  # Triple looping to get detsums from all depths
  detsums = []
  for d in depths:
    for s in d.scans:
      for detsum in s.detsums:
        if elements is not None:
          if detsum.element not in elements:
            continue # skip to next iteration
        detsums.append(detsum)

  # Determine sorting function 
  if sort_by == 'element':
    sort_func = lambda d: d.element
  elif sort_by == 'depth':
    sort_func = lambda d: int(d.depth.split('m')[0]) # Turn depth into integer for sorting
  else:
    raise ValueError("`sort_by` must be 'element' or 'depth'")
  
  # Sort detsums
  detsums = sorted(detsums, key=sort_func)

  # Build grid
  ncols = 4
  nrows = max(1, -(-len(detsums) // ncols))  # ceiling division, at least one row
  g = widgets.Grid(nrows, ncols)
  for i, detsum in enumerate(detsums):
    row, col = divmod(i, ncols)  # fill the grid left to right, top to bottom
    with g.output_to(row, col):
      print(f'    {detsum.element}    |    {detsum.depth}    |    {detsum.scan_name}')
      detsum.plot(raw=True)

## Example Usage
#=====================================
# To plot all detsums from iron and copper, call e.g.:
# plot_all_detsums(depths, elements=['Fe', 'Cu'])

# Plot the elements selected above when the `show_plots` box is checked:
if show_plots:
  plot_all_detsums(depths,
                  elements=elements_to_plot.split(','),
                  sort_by=sort_by)

del(elements_to_plot, sort_by, show_plots)
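
Two details of the cell above are easy to get wrong: depth labels such as `'165m'` must be converted to integers before sorting, and each plot needs a row and column in a fixed-width grid. The sketch below illustrates both using the depth labels printed by the data import cell. The helper name `depth_in_meters` is hypothetical, and the grid is represented by plain `(row, col)` tuples rather than a Colab `widgets.Grid`.

```python
def depth_in_meters(label):
    """Turn a label like '165m' into the integer 165."""
    return int(label.split('m')[0])

labels = ['65m', '900m', '250m', '40m', '25m', '165m', '10m', '50m']

by_depth = sorted(labels, key=depth_in_meters)  # numeric order
by_text = sorted(labels)                        # string order, usually not what is wanted

# Place item i in a grid with a fixed number of columns
ncols = 4
nrows = -(-len(by_depth) // ncols)  # ceiling division
positions = [divmod(i, ncols) for i in range(len(by_depth))]

print(by_depth)
print(by_text)
print(nrows, positions)
```

Sorting the raw strings places `'165m'` before `'25m'`, which is why the integer key is needed.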

## **Plotting**

In [9]:
#@title RibbonPlotUI
graph_output = ipywidgets.Output()
element_inputs = {}
element_filter = {}
test = {}
smalltextbox = ipywidgets.Layout(width='50px', height='25px')
filter_func = lambda n: lambda x: np.mean(x) + np.std(x)*n

for e in ELEMENTS_OF_INTEREST:
  element_inputs[e] = ipywidgets.Textarea(value='2', layout=smalltextbox)
  element_filter[e] = filter_func(2)



# filter_row = lambda e: ipywidgets.VBox([ipywidgets.HTML(f'{e}'),
#                                         ipywidgets.HBox(
#                                             [ipywidgets.HTML('mean + '),
#                                              element_sliders[e],
#                                              ipywidgets.HTML(' std')])
#                                       ])

element_filter_input = ipywidgets.HBox(
    [ipywidgets.VBox([ipywidgets.HTML(f'<h3>{e}</h3>'), element_inputs[e]]) for e in ELEMENTS_OF_INTEREST]
)

filter_by_control = ipywidgets.HBox(
    [ipywidgets.HTML('Filter by: '), ipywidgets.Dropdown(options=ELEMENTS_OF_INTEREST, value='Cu')])

combine_scans_checkbox = ipywidgets.HBox([ipywidgets.HTML('Combine Scans: '),
                                          ipywidgets.Checkbox(value=True)])

combine_detsums_checkbox = ipywidgets.HBox([ipywidgets.HTML('Combine detsums: '),
                                          ipywidgets.Checkbox(value=False)])

normalize_by_control = ipywidgets.HBox(
    [ipywidgets.HTML('Normalize by: '), ipywidgets.Dropdown(options=['counts', 'pixels'], value='counts')]
)

N_input = ipywidgets.HBox([ipywidgets.HTML('N: '),
                          ipywidgets.Textarea(value='8', layout=smalltextbox)])
update_button = ipywidgets.Button(description='Update Plot')                          


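def set_filter(e, val):
  '''Updates the threshold for element `e` from the textbox value.'''
  try:
    val = float(val)
  except ValueError:
    return  # ignore empty or partially typed input
  element_filter[e] = filter_func(val)
  test[e] = val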

def update_plot(b):
  graph_output.clear_output()
  with graph_output:
    plt.close()
    ribbon_plot(depths, element_filter=element_filter,
                filter_by=filter_by_control.children[1].value,
                combine_detsums=combine_detsums_checkbox.children[1].value,
                combine_scans=combine_scans_checkbox.children[1].value,
                N=int(N_input.children[1].value),
                normalize_by=normalize_by_control.children[1].value)

for e in element_filter:
  element_inputs[e].observe(
      lambda change, e=e: set_filter(e, change['new']), names='value')

update_button.on_click(update_plot)

with graph_output:
    ribbon_plot(
        depths,
        element_filter=element_filter,
        filter_by='Cu',
        combine_scans=True,
        combine_detsums=False,
        N=8,
        normalize_by='counts',
        prop_dict=PROP_DICT,
    )


# top = ipywidgets.VBox([filter_row(e) for e in ELEMENTS_OF_INTEREST])
top = element_filter_input
controls = ipywidgets.HBox([update_button, filter_by_control, combine_scans_checkbox,
                            combine_detsums_checkbox, N_input, normalize_by_control],
                          layout=ipywidgets.Layout(
                              padding='0px',
                              border='1px solid black',
                          ))

app = ipywidgets.VBox([top, graph_output, controls])
  
display(app)

VBox(children=(HBox(children=(VBox(children=(HTML(value='<h3>Br</h3>'), Textarea(value='2', layout=Layout(heig…
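
The `lambda change, e=e: ...` in the observer loop above uses a default argument to capture the current element. The sketch below shows why this matters. The `FakeInput` class is a hypothetical stand-in for a widget, used only so the example runs without `ipywidgets`. It is not the `ipywidgets` API.

```python
class FakeInput:
    """Hypothetical stand-in for a widget: stores callbacks and fires them on change."""
    def __init__(self):
        self._callbacks = []

    def observe(self, callback):
        self._callbacks.append(callback)

    def set_value(self, new):
        for callback in self._callbacks:
            callback({'new': new})

elements = ['Cu', 'Fe']
inputs = {e: FakeInput() for e in elements}
bound, unbound = {}, {}

for e in elements:
    # The default argument freezes the value of `e` for this callback
    inputs[e].observe(lambda change, e=e: bound.__setitem__(e, change['new']))
    # Without it, `e` is looked up when the callback runs, after the loop has ended
    inputs[e].observe(lambda change: unbound.__setitem__(e, change['new']))

inputs['Cu'].set_value('3')
print(bound)    # the change is recorded under 'Cu'
print(unbound)  # the change is recorded under the last loop value, 'Fe'
```

Without the `e=e` default, every textbox would update the filter for the last element in the loop.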

### Ribbon Plots

In [0]:
# Filter by Cu, take the top 8 groups. Separate Scans, Combined Detsums
ribbon_plot(depths, element_filter=ELEMENT_FILTER,
            filter_by='Cu',
            combine_scans=False,
            combine_detsums=True,
            N=8,
            normalize_by='counts',
            prop_dict=PROP_DICT)
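
The internals of `ribbon_plot` are not shown in this notebook, so the following is only a guess at what "take the top 8 groups" could mean: count how often each element group occurs among the filtered pixels, keep the `N` most common, and express each as a fraction of the total. The function name `top_groups` and the pixel data are hypothetical, and the actual implementation in `plotting.py` may differ.

```python
from collections import Counter

def top_groups(pixel_groups, n):
    """Return the n most common element groups with their share of all pixels."""
    counts = Counter(pixel_groups)
    total = sum(counts.values())
    return [(group, count / total) for group, count in counts.most_common(n)]

# Made-up element groups for six Cu-containing pixels
pixel_groups = [
    ('Cu', 'Fe'), ('Cu',), ('Cu', 'Fe'),
    ('Cu', 'Zn'), ('Cu', 'Fe'), ('Cu',),
]

top = top_groups(pixel_groups, 2)
for group, share in top:
    print(group, round(share, 2))
```

Here the share is a fraction of pixels. Weighting each pixel by its measured counts instead would be one possible reading of the `normalize_by='counts'` option, but that is an assumption.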