# Data Processing and Access 101

Ian Guinn, UNC. Presented at [LEGEND Software Tutorial, Nov. 2021](https://indico.legend-exp.org/event/561/)

In [None]:
# Set up python environment
from pygama.io.daq_to_raw import daq_to_raw
from pygama.io.raw_to_dsp import raw_to_dsp
from pygama.lh5.store import *

daq_file = '/global/cfs/cdirs/m2676/data/legend-testdata/data/cage/2021-1-16-CAGERun1250'
raw_file = "metadata/CAGERun1250_raw.lh5"
dsp_file = "metadata/CAGERun1250_dsp.lh5"

dsp_config = "./metadata/dsp_config.json"

## Running daq_to_raw

Our first step in processing is to run daq_to_raw, which decodes the binary file produced by our DAQ system and writes an HDF5 file following LEGEND's LH5 file specification. It requires an input file, an output file name, and a dictionary of settings. Because the CAGE file we are using comes from ORCA, the config is extremely simple; for other DAQ systems, it can get more complex.

In [None]:
# Config would normally be provided as a JSON file
d2r_config = {
    'daq':'ORCA'
}

daq_to_raw(daq_file,
           raw_file,
           config=d2r_config)
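As the comment in the cell above notes, the config would normally live in a JSON file. The following is a minimal sketch of that round trip using only the standard library. The file name `d2r_config.json` and the temporary directory are assumptions made for illustration; the tutorial does not specify where such a file is kept.

```python
import json
import os
import tempfile

# Hypothetical location for the config file (not specified in the tutorial).
config_path = os.path.join(tempfile.mkdtemp(), "d2r_config.json")

# Write the same settings used above to disk as JSON.
with open(config_path, "w") as f:
    json.dump({"daq": "ORCA"}, f, indent=4)

# Read the file back into a dictionary with the same contents as d2r_config.
with open(config_path) as f:
    d2r_config = json.load(f)

print(d2r_config)
```

The loaded dictionary can then be passed to daq_to_raw through the `config` argument exactly as in the cell above.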

## Inspecting the raw file

Next, we'll look at the file output from daq_to_raw. The file is output using the LH5 specification, and can be accessed using the pygama.lh5.store module.

First, we'll create a Store object and call ls to list the contents of the HDF5 group containing our data. Then, we'll call load_dfs to load a selection of parameters into a pandas dataframe. Note that the dataframe will not contain the waveforms.

In [None]:
lh5_st = Store()
print("List of raw file elements:")
print(lh5_st.ls(raw_file, 'ORSIS3302DecoderForEnergy/raw/'))

print()
print("Data from file:")
raw_df = load_dfs(raw_file,
                  par_list = ['card', 'channel', 'crate', 'energy', 'energy_first',
                              'ievt', 'packet_id', 'timestamp'],
                  lh5_group = 'ORSIS3302DecoderForEnergy/raw/')
print(raw_df)

## Running raw_to_dsp

The next step in our processing is to run raw_to_dsp, which runs a sequence of digital signal processors and writes the results to another LH5 file. These processors are set up using a dictionary, which is provided by a JSON file. For more details about how to set up this file, see other tutorials.

Many processors benefit from having parameters optimized for each channel and run range. This optimization process is not covered by this tutorial, but the resulting parameters are stored in the metadata database and are provided to raw_to_dsp using a JSON file or a dict. In this example, we will provide the pole-zero correction time constant in this way.

In [None]:
db_dict = {'ORSIS3302DecoderForEnergy': {
    'pz': { 'tau':"48*us" }
    }
}

raw_to_dsp(raw_file, dsp_file, dsp_config,
           database = db_dict)
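To give a feel for what the `tau` parameter controls, here is a rough, pure-Python sketch of a pole-zero correction applied to an idealized exponentially decaying pulse. It is not pygama's implementation: the function `pole_zero`, the 10 ns sampling period, and the pulse amplitude are all assumptions chosen for illustration. Only the 48 us time constant comes from the cell above.

```python
import math

def pole_zero(wf, tau, dt):
    """Undo a single exponential decay with time constant tau.

    Illustrative helper (hypothetical, not part of pygama).
    tau and dt must be given in the same units.
    """
    a = math.exp(-dt / tau)  # decay factor per sample
    out = []
    prev_in, prev_out = 0.0, 0.0
    for x in wf:
        # Recursive filter: y[n] = y[n-1] + x[n] - a * x[n-1]
        y = prev_out + x - a * prev_in
        out.append(y)
        prev_in, prev_out = x, y
    return out

tau_us = 48.0       # time constant from db_dict above
dt_us = 0.01        # assumed sampling period of 10 ns
amplitude = 1000.0  # arbitrary pulse height in ADC units

# Idealized pulse: instantaneous rise followed by an exponential decay.
decaying = [amplitude * math.exp(-n * dt_us / tau_us) for n in range(2000)]
corrected = pole_zero(decaying, tau_us, dt_us)

# Compare the last sample before and after the correction.
print(decaying[-1], corrected[-1])
```

With a matching time constant, the corrected pulse stays flat at its initial amplitude instead of decaying, which is why a well-chosen `tau` for each channel matters for the downstream energy estimate.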

## Inspecting the dsp file

In [None]:
print("List of dsp file elements:")
print(lh5_st.ls(dsp_file, 'ORSIS3302DecoderForEnergy/dsp/'))

print()
print("Data from file:")
# Load a few of the computed DSP parameters into a pandas dataframe
dsp_df = load_dfs(dsp_file,
                  par_list = ['trapEmax', 'bl_mean', 'timestamp', 'AoE'],
                  lh5_group = 'ORSIS3302DecoderForEnergy/dsp/')
print(dsp_df)

# Histogram the uncalibrated energy estimate
dsp_df.hist('trapEmax', bins=2000)