# Introduction

Ian Guinn, UNC

Presented at [LEGEND Software Tutorial, Nov. 2021](https://indico.legend-exp.org/event/561/)

**"Have you tried looking at the waveforms from those events?"  - David Radford**

This tutorial demonstrates several ways to use the waveform browser to examine data from LEGEND. It consists of multiple examples of increasing complexity and uses data from the PGT. The waveform browser (pygama.vis.waveform_browser.WaveformBrowser) is a pygama utility for accessing waveforms from raw files interactively, enabling you to access, draw, or even process waveforms. Use cases for this utility include investigating a population of waveforms and debugging waveform processors.

Why do we need a waveform browser when we can access data via pandas dataframes?
Pandas dataframes work extremely well for reading tables of simple values from multiple HDF5 files. However, they are a poor fit for waveforms, because they require holding every waveform in memory at once. If we want to look at waveforms spread across multiple files, this can take up GBs of memory, which will cause problems! To get around this, we want to load only a small part of the files into memory at a time and pull out only what we need. Since doing this by hand is inconvenient, the WaveformBrowser does it for you while hiding the details as much as possible.
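
The idea of loading only part of the data at a time can be illustrated with a minimal NumPy sketch. This is not how the WaveformBrowser is implemented; the "files" here are hypothetical in-memory arrays standing in for on-disk tables, and `iter_chunks` and `select_waveforms` are names made up for this illustration.

```python
import numpy as np

def iter_chunks(files, chunk_size):
    """Yield (global_start, chunk) pairs, one chunk at a time.

    `files` is a hypothetical stand-in for on-disk tables: each "file"
    is a 2D array with one waveform per row.
    """
    start = 0
    for table in files:
        for lo in range(0, len(table), chunk_size):
            yield start + lo, table[lo:lo + chunk_size]
        start += len(table)

def select_waveforms(files, entries, chunk_size=4):
    """Collect only the requested global entry numbers, chunk by chunk."""
    entries = np.sort(np.asarray(entries, dtype=int))
    picked = []
    for start, chunk in iter_chunks(files, chunk_size):
        # Find the requested entries that fall inside this chunk and
        # convert them to row indices local to the chunk
        lo, hi = np.searchsorted(entries, [start, start + len(chunk)])
        picked.append(chunk[entries[lo:hi] - start])
    return np.concatenate(picked)

# Two synthetic "files" with 10 and 6 waveforms of 8 samples each.
# Every sample of waveform n is set to n so the result is easy to check.
files = [np.repeat(np.arange(10)[:, None], 8, axis=1),
         np.repeat(np.arange(10, 16)[:, None], 8, axis=1)]

selected = select_waveforms(files, entries=[3, 9, 12])
print(selected[:, 0])
```

Only one chunk is examined at a time, and only the selected rows are kept, so memory use scales with the number of waveforms requested rather than with the total size of the files.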

## Contents:
**Example 1:** Minimal usage of the waveform browser  
**Example 2:** Draw waveforms using a data cut to investigate a population; fill a legend  
**Example 3:** Draw processed waveforms from a DSP config file to inspect processors  
**Example 4:** Draw waveforms from multiple populations with different cuts for comparison, using more advanced formatting options  
**Example 5:** Access waveforms without drawing

In [None]:
# First, import necessary modules and set some input values for use later
%matplotlib inline
import pygama.lgdo.lh5_store as lh5
from pygama.vis.waveform_browser import WaveformBrowser
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import os, json

# Set input values for where to find our data. The wildcards below match all calibration files from run 30

# pgt_dir = '$LEGENDDATADIR/lngs/pgt/'
pgt_dir = '/global/cfs/cdirs/m2676/data/lngs/pgt/'
raw_files = pgt_dir + 'raw/geds/LPGTA_r0030_*_calib_geds_raw.lh5'
dsp_files = pgt_dir + 'dsp/geds/LPGTA_r0030_*_calib_geds_dsp.lh5'
channel = 'g040'

# Set defaults for figures
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 16

## Example 1

First, a minimal example that simply draws waveforms from the raw file.

In [None]:
# Create a minimal waveform browser; a file or list of files is required
browser = WaveformBrowser(raw_files, channel+'/raw')

# Draw the waveform at entry 100 in the file
browser.draw_entry(100)

# To draw multiple figures in a single cell, you must explicitly create a new one:
browser.new_figure()
browser.draw_entry([200, 300, 400])

In [None]:
# Draw the next waveform in the file. You can run this cell multiple times to scroll through many WFs
browser.draw_next()

## Example 2
Ok, that was nice, but how often do we just want to scroll through all of our waveforms?

For our next example, we will select a population of waveforms from within the files and draw several at once. Selecting a population of events to draw uses the same syntax as numpy and pandas, and can be done either with a list of entries or with a boolean numpy array. This selection can be made using data from a dsp or hit file.

We will also learn how to set a few other properties of the figure.
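
As a minimal sketch of the two selection styles mentioned above, the snippet below builds a boolean mask and the equivalent list of entry numbers with plain NumPy. The energy values are made up for illustration; a pandas Series supports the same comparison and `&` syntax.

```python
import numpy as np

# Synthetic stand-in for the trapEmax column of a DSP file (made-up values)
trapE = np.array([500.0, 13250.0, 7200.0, 13390.0, 13050.0, 13150.0])

# Style 1: a boolean mask with one element per entry
mask = (trapE > 13100) & (trapE < 13400)

# Style 2: the equivalent explicit list of entry numbers
entries = np.flatnonzero(mask)

print(mask)
print(entries)

# Both styles pick out the same values when used as an index
same = np.array_equal(trapE[mask], trapE[entries])
print(same)
```

Note the parentheses around each comparison: `&` binds more tightly than `>` and `<`, so leaving them out changes the meaning of the expression.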

In [None]:
# First, load a dataframe from a DSP file that we can use to make our selection:
print(dsp_files)
df = lh5.load_dfs(dsp_files, ['trapEmax', 'AoE'], channel+'/dsp', verbose=True)

In [None]:
# Create a selection mask around the 2614 keV peak
trapE = df['trapEmax']
energy_selection = (trapE>13100) & (trapE<13400)

trapE.hist(bins=1000, range=(0, 30000))
trapE[energy_selection].hist(bins=1000, range=(0, 30000))
plt.yscale('log')

In [None]:
# Now construct a WaveformBrowser with this cut
browser = WaveformBrowser(raw_files, channel+'/raw',
                          verbosity   = 0,                  # Silence output on construction
                          aux_values  = df,
                          legend      = 'energy={trapEmax}',       # Values to put in the legend
                          x_lim       = (22000, 30000),     # Range for time-axis
                          entry_mask  = energy_selection ,  # Apply cut
                          n_drawn     = 10                  # number to draw for draw_next
                         )
# Draw the next 5 batches of 10 waveforms, one figure per batch
for entries, i in zip(browser, range(5)):
    print("Entries:", entries)
    browser.new_figure()
    # With interactive plots (i.e. not on NERSC), comment out the line above
    # and uncomment the one below to draw a slideshow instead:
    #plt.pause(1)


## Example 3
Now, we'll shift from drawing populations of waveforms to drawing waveform transforms. We can draw any waveform that is defined in a DSP JSON configuration file. This is useful for debugging and for developing processors. We will draw the baseline-subtracted WF, the pole-zero corrected WF, and the trapezoidal filter WF. We will also draw horizontal and vertical lines for trapEmax (the max of the trapezoid) and tp_0 (our estimate of the start of the waveform's rise). The browser determines whether each of these lines should be horizontal or vertical based on its unit.

In [None]:
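# Path to the LPGTA DSP config file; defined here because this is the first example that needs it
dsp_config_file = os.path.expandvars("./metadata/LPGTA_dsp.json")

browser = WaveformBrowser(raw_files, channel+'/raw',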
                          dsp_config=dsp_config_file, # Need to include a dsp config file!
                          database={"pz_const":'396.9*us'}, # TODO: use metadata instead of manually defining...
                          lines=['wf_blsub', 'wf_pz', 'wf_trap', 'trapEmax', 'tp_0'], # names of waveforms and values from dsp config file
                          styles=[{'ls':['-'], 'c':['orange']},
                                  {'ls':[':'], 'c':['green']},
                                  {'ls':['--'], 'c':['blue']},
                                  {'lw':[0.5], 'c':['black']},
                                  {'lw':[0.5], 'c':['red']}],
                          legend=['Waveform', 'PZ Corrected', "Trap Filter", 'Trap Max={trapEmax}', 't0={tp_0}'],
                          legend_opts={'loc':"upper left"},
                          x_lim=('15*us', '50*us') # x axis range
                         )

In [None]:
browser.draw_next()
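
To make the three transforms drawn in this example more concrete, here is a minimal NumPy sketch of baseline subtraction, pole-zero correction, and a trapezoidal filter applied to a synthetic pulse. These are textbook versions written for illustration, not pygama's processors; the function names, the pulse parameters, and the choice to express the decay constant in samples rather than in microseconds are all assumptions.

```python
import numpy as np

def baseline_subtract(wf, n_baseline):
    """Subtract the mean of the first n_baseline samples."""
    return wf - wf[:n_baseline].mean()

def pole_zero(wf, tau):
    """Undo an exponential decay with time constant tau (in samples)."""
    decay = np.exp(-1.0 / tau)
    out = np.empty(len(wf), dtype=float)
    out[0] = wf[0]
    for i in range(1, len(wf)):
        out[i] = out[i - 1] + wf[i] - decay * wf[i - 1]
    return out

def trap_filter(wf, rise, flat):
    """Difference of two moving sums of length rise, separated by flat."""
    csum = np.concatenate(([0.0], np.cumsum(wf)))
    out = np.zeros(len(wf))
    for i in range(2 * rise + flat - 1, len(wf)):
        late = csum[i + 1] - csum[i + 1 - rise]
        early = csum[i + 1 - rise - flat] - csum[i + 1 - 2 * rise - flat]
        out[i] = late - early
    return out

# Synthetic pulse: flat baseline, then a step that decays exponentially
n, t0, amplitude, baseline, tau = 400, 150, 1000.0, 200.0, 80.0
t = np.arange(n)
wf = np.full(n, baseline)
wf[t0:] += amplitude * np.exp(-(t[t0:] - t0) / tau)

rise, flat = 40, 20
wf_blsub = baseline_subtract(wf, n_baseline=100)
wf_pz = pole_zero(wf_blsub, tau)
wf_trap = trap_filter(wf_pz, rise, flat)

# Dividing the trapezoid maximum by the rise length recovers the step height
energy = wf_trap.max() / rise
print(round(energy, 3))
```

After the pole-zero correction the decaying tail becomes a flat step, which is why the trapezoid has a flat top whose height is proportional to the pulse amplitude.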

## Example 4
Here's a more advanced example that combines the previous two. We will draw waveforms from multiple populations for the sake of comparison. This requires creating two separate waveform browsers and drawing them onto the same axes. We'll also baseline-subtract the waveforms and normalize them using parameters from a DSP file. Finally, we'll add some formatting options to the lines and legend.

In [None]:
AoE = df['AoE']
aoe_cut = (AoE<0.045) & energy_selection
aoe_accept = (AoE>0.045) & energy_selection

AoE[aoe_accept].hist(bins=200, range=(0, 0.1))
AoE[aoe_cut].hist(bins=200, range=(0, 0.1))
# Use the lpgta dsp json file. TODO: get this from DataGroup
# dsp_config_file = os.path.expandvars("$HOME/pygama/experiments/lpgta/LPGTA_dsp.json")
dsp_config_file = os.path.expandvars("./metadata/LPGTA_dsp.json")


In [None]:
browser1 = WaveformBrowser(raw_files, channel+'/raw',
                           dsp_config  = dsp_config_file, # include so we can do bl subtraction
                           lines       = 'wf_blsub',
                           norm        = 'trapEmax',        # normalize wfs
                           verbosity   = 0,                 # Silence output on construction
                           styles      = {'color':['red', 'orange', 'salmon', 'magenta']}, # set a color cycle for this
                           legend      = "E={trapEmax} ADC, A/E={AoE:~.3f}", # Formatted values to put in the legend
                           entry_mask  = aoe_cut,           # Apply cut
                           n_drawn     = 4                  # number to draw for draw_next
                          )

browser2 = WaveformBrowser(raw_files, channel+'/raw',
                           dsp_config  = dsp_config_file, # include so we can do bl subtraction
                           lines       = 'wf_blsub',
                           norm        = 'trapEmax',        # normalize wfs
                           verbosity   = 0,                 # Silence output on construction
                           styles      = {'color':['blue', 'navy', 'cyan', 'teal']}, # set a color cycle for this
                           legend      = "E={trapEmax} ADC, A/E={AoE:~.3f}", # Formatted values to put in the legend
                           legend_opts = {'loc':"center",'bbox_to_anchor':(1,0.35)}, # set options for drawing the legend
                           x_lim       = (26500, 28000),    # Range for time-axis
                           entry_mask  = aoe_accept,           # Apply cut
                           n_drawn     = 4                  # number to draw for draw_next
                          )

In [None]:
browser1.draw_next()
browser2.set_figure(browser1) # use the same figure/axis as the other browser
browser2.draw_next(clear=False) # Set clear to false to draw on the same axis!
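
Why baseline-subtract and normalize before overlaying populations? The sketch below, using synthetic step pulses and a hypothetical `normalize` helper, shows that two pulses with different baselines and amplitudes collapse onto the same shape once both steps are applied. The order of operations used here is an assumption for illustration, not a statement about the WaveformBrowser internals.

```python
import numpy as np

def normalize(wf, energy, n_baseline=50):
    """Baseline-subtract a waveform, then scale it by its energy estimate."""
    return (wf - wf[:n_baseline].mean()) / energy

# Two synthetic step pulses with different baselines and amplitudes
t = np.arange(200)
step = (t >= 100).astype(float)
wf_low = 150.0 + 800.0 * step
wf_high = 90.0 + 2600.0 * step

norm_low = normalize(wf_low, energy=800.0)
norm_high = normalize(wf_high, energy=2600.0)

# After normalization both pulses rise from 0 to 1 and can be compared by shape
shapes_match = np.allclose(norm_low, norm_high)
print(shapes_match)
```

With the amplitude dependence removed, any remaining difference between the two populations reflects pulse shape, which is what an A/E cut is sensitive to.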

## Example 5

The waveforms, lines, and legend entries are all stored inside the waveform browser. Sometimes you want to access these directly; maybe you want the raw data, or want to control the lines in a way not enabled by the WaveformBrowser interface. It is possible to access them quickly and easily. Waveforms and legend values are each stored as a dict mapping the parameter name to a list of stored values.
- The waveforms are stored as a list of matplotlib Line2D artists
- Horizontal and vertical lines are also stored as Line2D artists
- Legend entries are stored as pint Quantities

When accessing waveforms in this way, you can also do the same things previously shown, such as applying a data cut and grabbing processed waveforms. For this example, we are going to get waveforms, trap-waveforms and trap energies, after applying an A/E cut. We will simply print them, but the possibility exists to do more!

In [None]:
browser = WaveformBrowser(raw_files, channel+'/raw',
                          dsp_config = dsp_config_file,                # Need to include a dsp config file!
                          database   = {"pz_const":'396.9*us'},        # TODO: use metadata instead of manually defining...
                          lines      = ['waveform', 'wf_trap'],        # names of waveforms from dsp config file
                          legend     = ['{trapEmax}'],
                          entry_mask = aoe_accept,                     # apply A/E cut
                          n_drawn    = 5                               # get five at a time
                         )

In [None]:
# Load the next batch of waveforms without drawing them
browser.find_next()
waveforms = browser.lines['waveform']
traps = browser.lines['wf_trap']
energies = browser.legend_vals['trapEmax']
for wf, trap, en in zip(waveforms, traps, energies):
    print("Raw waveform:", wf.get_ydata())
    print("Trap-filtered waveform:", trap.get_ydata())
    print("TrapEmax:", en)
    print()