# DAQ Analysis

This notebook provides an analysis of the hardware-triggered data of a detector module. We calculate standard events, noise power spectra, optimum filter, resolutions and trigger thresholds.

In [1]:
print("Let's start!")

Let's start!


## Introduction

In the first part of the notebook, we build a data set from hardware-triggered files and use it to create a standard event, a noise power spectrum and an optimum filter. Afterwards, we trigger the corresponding stream files and extract descriptive features. Finally, we apply cuts and an energy calibration to the triggered and processed events and produce a histogram of the recoil energies and a light yield plot.

First we import packages.

In [None]:
import cait as ai
import matplotlib.pyplot as plt
import numpy as np
import tracemalloc
from tqdm.auto import tqdm
import time
%config InlineBackend.figure_formats = ['svg']  # we need this for a suitable resolution of the plots

And we define a set of constants and paths.

In [None]:
RUN = ...  # put a string with the number of the experiment's run, e.g. '34'
MODULE = ...  # put a name for the detector, e.g. 'DetA'
PATH_HW_DATA = ...  # path to the directory in which the RDT and CON files are stored
PATH_PROC_DATA = ...  # path to where you want to store the HDF5 files
FILE_NMBRS = []  # a list of strings with the file numbers you want to analyse, e.g. ['001', '002', '003']
RDT_CHANNELS = []  # a list of the channel numbers, e.g. [0, 1] (written in the PAR file; note that the PAR file counts from 1, Cait from 0)
RECORD_LENGTH = 16384  # the number of samples within a record window  (read in PAR file)
SAMPLE_FREQUENCY = 25000  # the sample frequency of the measurement (read in PAR file)
DOWN_SEF = 4  # the downsample rate for the standard event fit
DOWN_BLF = 16  # the downsample rate for the baseline fit
PROCESSES = 8  # the number of processes for parallelization
PCA_COMPONENTS = 2  # the number of pca components to calculate
SKIP_FNMR = []    # in case the loop crashed at some point and you want to start from a specific file number, write here the numbers to ignore, e.g. ['001', '002']
ALLOWED_NOISE_TRIGGERS = 1  # the number of noise triggers we allow for in threshold calculations

# typically you need not change the values below this line!

FNAME_RESOLUTION = 'resolution_001'  # anticipated file name for the resolution data set
FNAME_EFFICIENCY = 'efficiency_001'  # anticipated file name for the efficiency data set
FNAME_TRAINING = 'training_001'  # anticipated file name for the training data set
FNAME_HW = 'hw_{:03d}'.format(len(FILE_NMBRS) - 1)
H5_CHANNELS = list(range(len(RDT_CHANNELS)))
SEF_APP = '_down{}'.format(DOWN_SEF) if DOWN_SEF > 1 else ''

Here we assemble values that are calculated further down in the notebook. Fill them in as you go.

In [None]:
DETECTOR_MASS = ... # the detector mass in kg
print('Detector mass in kg: ', DETECTOR_MASS)
BL_RESOLUTION_OF = []  # list of the baseline resolutions, calculated with the superposition method, in mV
THRESHOLDS = [(6.5 * r) * 1e-3 for r in BL_RESOLUTION_OF]
print('OF resolution in mV: ', BL_RESOLUTION_OF)
print('OF thresholds in V: ', THRESHOLDS)
FIT_BL_SIGMAS = []  # list of the baseline resolutions, calculated with the noise fit model, in V
FIT_THRESHOLDS = []  # list of the trigger thresholds, calculated with the noise fit model, in V
print('Fit thresholds in V: ', FIT_THRESHOLDS)
TRUNCATION_LEVELS = []  # list of the truncation levels

In Cait, most data processing happens with the DataHandler class, of which we create an instance.

In [None]:
dh_hw = ai.DataHandler(run=RUN,
                       module=MODULE,
                       channels=RDT_CHANNELS,
                       sample_frequency=SAMPLE_FREQUENCY,
                       record_length=RECORD_LENGTH)

dh_hw.set_filepath(path_h5=PATH_PROC_DATA,
                fname=FNAME_HW,
                appendix=False)

## DAQ Data

We start with several analysis steps on the hardware data, to extract a proper filter and a threshold for filtering of the continuous stream.

### View Events

Let's have a look at the raw data events.

In [None]:
ei = ai.EventInterface(module=MODULE,
                       run=RUN,
                       nmbr_channels=len(H5_CHANNELS),
                       sample_frequency=SAMPLE_FREQUENCY,
                       record_length=RECORD_LENGTH)
ei.load_h5(path=PATH_PROC_DATA, 
           fname=FNAME_HW, 
           channels=RDT_CHANNELS, 
           appendix=False, 
           which_to_label=['events'])

Besides viewing events, we can also create a labels CSV file to store labels for the raw data events.

In [None]:
# ei.create_labels_csv(path=PATH_PROC_DATA)
# ei.load_labels_csv(path=PATH_PROC_DATA)

In [None]:
# ei.load_of()  # this is only possible once the OF was calculated!

You can now view events. Press 'o' for the options menu and a variety of additional plotting features. Some of them can only be used after the corresponding properties have been calculated. You can come back to these cells later to use the additional features.

In [None]:
ei.start(start_from_idx=0, print_label_list=False)

### SEV, NPS, OF

The first thing to do is calculating the main parameters.

In [None]:
dh_hw.calc_mp(type='events')
dh_hw.calc_mp(type='testpulses')
dh_hw.calc_mp(type='noise')

We can use routines to inspect the main parameter distributions or access the values directly.

In [None]:
dh_hw.content()

In [None]:
dh_hw.get('events', 'decay_time')  # returns array of the decay times of all channels and events

In [None]:
ranges = [(0, 40), (0, 40)]
for c in H5_CHANNELS:
    dh_hw.show_values(group='events', key='decay_time', bins=200, idx0=c,  range=ranges[c],
                       xlabel='Decay Time (ms)', ylabel='Counts', title='Channel {}'.format(c))

We want to proceed with calculating a noise power spectrum. For the calculation of noise power spectra and the simulation of events we need clean baselines. We select them with the fit error of a polynomial fit to the baselines.

In [None]:
dh_hw.calc_bl_coefficients(down=DOWN_BLF)
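
To illustrate what the fit error measures, here is a minimal NumPy sketch. It is not Cait's implementation: the cubic polynomial order, the block-average downsampling, the helper name `baseline_fit_rms` and the toy traces are all assumptions made for this example.

```python
import numpy as np

def baseline_fit_rms(trace, down=16, order=3):
    """RMS of the residuals of a polynomial fit to a downsampled trace."""
    # block-average downsampling (assumed here; trace length must be a multiple of `down`)
    reduced = trace.reshape(-1, down).mean(axis=1)
    x = np.linspace(0, 1, reduced.size)
    coeffs = np.polyfit(x, reduced, deg=order)
    residuals = reduced - np.polyval(coeffs, x)
    return np.sqrt(np.mean(residuals ** 2))

rng = np.random.default_rng(0)
n = 16384
# toy baseline: white noise on top of a slow linear drift
clean = 1e-3 * rng.standard_normal(n) + np.linspace(0, 0.01, n)
# the same baseline polluted by a pulse starting at a quarter of the window
with_pulse = clean.copy()
with_pulse[n // 4:] += 0.5 * np.exp(-np.arange(n - n // 4) / 500.0)

rms_clean = baseline_fit_rms(clean)
rms_pulse = baseline_fit_rms(with_pulse)
print('fit RMS clean: ', rms_clean)
print('fit RMS with pulse: ', rms_pulse)
```

A polynomial follows a slow drift well but cannot follow a pulse, so polluted baselines show up in the tail of the fit RMS histogram.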

In the histogram of the fit error, we can see which upper limit on the error is suitable for each of the channels.

In [None]:
for c in H5_CHANNELS:
    dh_hw.show_values(group='noise', key='fit_rms', bins=200, idx0=c, range=(0,1e-5),
                   xlabel='Baseline Fit RMS', ylabel='Counts', title='Channel {}'.format(c))

Now we can create the noise power spectra.

In [None]:
dh_hw.calc_nps(rms_cutoff=[3.5e-6, 7e-6], window=True)
for c in H5_CHANNELS:
    dh_hw.show_nps(channel=c, title='Channel {} NPS'.format(c), yran=(1e-2, 1e3))
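
As a rough illustration of the idea, the following NumPy sketch averages the squared Fourier amplitudes of baselines that pass an RMS cutoff. The Hann window, the missing normalization, the helper name `noise_power_spectrum` and the random toy baselines are assumptions of this sketch and may differ from what `calc_nps` does internally.

```python
import numpy as np

def noise_power_spectrum(baselines, sample_frequency, window=True):
    """Average one-sided power spectrum of a set of baselines (one per row)."""
    n = baselines.shape[1]
    w = np.hanning(n) if window else np.ones(n)  # Hann window, an assumption of this sketch
    spectra = np.abs(np.fft.rfft(baselines * w, axis=1)) ** 2
    freq = np.fft.rfftfreq(n, d=1.0 / sample_frequency)
    return freq, spectra.mean(axis=0)

rng = np.random.default_rng(1)
baselines = rng.standard_normal((200, 1024))   # toy baselines, white noise
fit_rms = baselines.std(axis=1)                # stand-in for the baseline fit RMS
clean = baselines[fit_rms < 1.05]              # keep only baselines below the RMS cutoff
freq, nps = noise_power_spectrum(clean, sample_frequency=25000)
print('baselines used: ', clean.shape[0])
print('frequency bins: ', freq.size)
```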

For cuts we will need some knowledge about the pulse heights, so plot a first histogram.

In [None]:
for c in H5_CHANNELS:
    dh_hw.show_values(group='events', key='mainpar', bins=250, idx0=c, idx2=0, range=(0,2.4), yran=(0,300),
                   xlabel='Pulse Height (V)', ylabel='Counts', title='Spectrum PH Channel {}'.format(c))

To avoid SQUID jumps in the SEV, we want to cut on the slope of the events. For this, we plot the linear slope to determine cut values. The slope is defined as the right minus the left baseline level, where the left and right baseline levels are the averages of the first and last eighth of the record window.

In [None]:
for c in H5_CHANNELS:
    dh_hw.show_values(group='events', key='slope', bins=200, idx0=c, yran=(0,100), range=(-1, 1),
                   xlabel='$A_{R}-A_{L}$ (V)', ylabel='Counts', title='Linear Slope Channel {}'.format(c))
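
The slope definition from the text can be written down in a few lines. This is a sketch of the definition only; the helper name `linear_slope` and the toy records are hypothetical.

```python
import numpy as np

def linear_slope(event):
    """Right minus left baseline level, each averaged over an eighth of the record window."""
    n = event.size // 8
    left = event[:n].mean()
    right = event[-n:].mean()
    return right - left

record = np.zeros(16384)       # flat toy record
jump = record.copy()
jump[8000:] += 0.4             # toy record with a step in the baseline level

print('slope flat record: ', linear_slope(record))
print('slope with jump: ', linear_slope(jump))
```

A record with a baseline jump gets a slope close to the step height, which is why a cut on the absolute slope removes such events.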

To make an informed decision about the cuts for the standard event calculation, let's also plot the rise time and the onset.

In [None]:
for c in H5_CHANNELS:
    dh_hw.show_values(group='events', key='rise_time', bins=200, idx0=c, yran=(0,20), #range=(-1, 1),
                   xlabel='Rise Time (ms)', ylabel='Counts', title='Channel {}'.format(c))

In [None]:
for c in H5_CHANNELS:
    dh_hw.show_values(group='events', key='onset', bins=200, idx0=c, yran=(0,100), #range=(-1, 1),
                   xlabel='Onset (ms)', ylabel='Counts', title='Channel {}'.format(c))

Now we create the standard events with suitable cut values.

In [None]:
dh_hw.calc_sev(pulse_height_interval=[[0.3, 0.5], [0.2, 0.65]],
                left_right_cutoff=[0.5, 0.5],  # in V
                rise_time_interval=[(0, 30), (0, 30)],  # in ms
                decay_time_interval=[(6, 14), (2, 14)],  # in ms
                onset_interval=[(-20, 20), (-5, 10)],  # in ms
                t0_start=None,
                opt_start=True,  # better fit, but much slower (~ minutes)
                )
for c in H5_CHANNELS:
    dh_hw.show_sev(channel=c)
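
Conceptually, a standard event is the average of the events that survive the cuts, normalized to height one. The sketch below shows this with toy data and only two of the cuts. The helper `standard_event`, the pulse shape and all numbers are assumptions for illustration; the parametric fit that `calc_sev` performs in addition is not included.

```python
import numpy as np

def standard_event(events, pulse_heights, slopes, ph_interval, slope_cutoff):
    """Average the events that pass the cuts and normalise the result to height 1."""
    keep = (pulse_heights > ph_interval[0]) & (pulse_heights < ph_interval[1])
    keep &= np.abs(slopes) < slope_cutoff
    sev = events[keep].mean(axis=0)
    sev = sev - sev[: sev.size // 8].mean()   # subtract the pre-trigger baseline level
    return sev / sev.max(), int(keep.sum())

rng = np.random.default_rng(4)
n = 1024
dt = np.clip(np.arange(n) - n // 4, 0, None)  # samples since the pulse onset
template = np.exp(-dt / 100.0) - np.exp(-dt / 10.0)
template /= template.max()

# toy events: scaled template plus white noise, a few of them with a baseline jump
true_heights = rng.uniform(0.1, 1.0, size=500)
events = true_heights[:, None] * template + 0.005 * rng.standard_normal((500, n))
events[:20, 600:] += 1.0

pulse_heights = events.max(axis=1)
slopes = events[:, -n // 8:].mean(axis=1) - events[:, : n // 8].mean(axis=1)
sev, n_used = standard_event(events, pulse_heights, slopes,
                             ph_interval=(0.3, 0.5), slope_cutoff=0.5)
print('events used for the SEV: ', n_used)
```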

For a test pulse standard event we need to know the typical pulse heights of test pulses, which we look up in the histogram.

In [None]:
ranges = [(0, 1.6), (0, 0.3)]
for c, r in zip(H5_CHANNELS, ranges):
    dh_hw.show_values(group='testpulses', key='mainpar', bins=400, idx0=c, idx2=0, range=r,
                   xlabel='Pulse Height (V)', ylabel='Counts', title='Testpulses PH {}'.format(c))

With these cut values we create test pulse standard events.

In [None]:
dh_hw.calc_sev(type='testpulses',
            pulse_height_interval=[[0.3, 1], [0.05, 0.2]],
            left_right_cutoff=[0.5, 0.5],
            rise_time_interval=None,
            decay_time_interval=None,
            onset_interval=[[-10, 10], [-10, 10]],
            #t0_start=(-1, 0),
            opt_start=True)
for c in H5_CHANNELS:
    dh_hw.show_sev(name_appendix='_tp', channel=c, show_fit=True)

We need a filter to trigger the stream. The filter is the ratio of the time-reversed SEV and the NPS in Fourier space. We now have all the ingredients and can create the filters for events and test pulses.

In [None]:
dh_hw.calc_of()
for c in H5_CHANNELS:
    dh_hw.show_of(channel=c, yran=(1e-10, 10))

In [None]:
dh_hw.calc_of(name_appendix='_tp')
for c in H5_CHANNELS:
    dh_hw.show_of(channel=c, group_name_appendix='_tp', yran=(1e-10, 10))

In [None]:
dh_hw.apply_of()
dh_hw.apply_of(type='testpulses', name_appendix_group='_tp')
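
The construction of the filter can be sketched with NumPy as follows. Time reversal corresponds to complex conjugation in Fourier space, so the transfer function is the conjugated SEV spectrum divided by the NPS. The normalization chosen here (the filtered SEV keeps its height), the helper names and the toy noise spectrum are assumptions of this sketch, not necessarily what `calc_of` does.

```python
import numpy as np

def optimum_filter(sev, nps):
    """Transfer function: conjugated SEV spectrum divided by the one-sided NPS."""
    sev_fft = np.fft.rfft(sev)
    h = np.conj(sev_fft) / nps
    # normalise such that filtering the SEV itself reproduces its height
    filtered_sev = np.fft.irfft(sev_fft * h, n=sev.size)
    return h * sev.max() / filtered_sev.max()

def apply_filter(event, h):
    """Filter an event in Fourier space; in this sketch the peak ends up at lag zero."""
    return np.fft.irfft(np.fft.rfft(event) * h, n=event.size)

n = 1024
dt = np.clip(np.arange(n) - n // 4, 0, None)        # samples since the pulse onset
sev = np.exp(-dt / 100.0) - np.exp(-dt / 10.0)      # toy pulse shape
sev /= sev.max()
nps = 1.0 + 100.0 / (1.0 + np.arange(n // 2 + 1))   # toy 1/f-like spectrum, strictly positive

h = optimum_filter(sev, nps)
event = 2.5 * sev                                   # noise-free toy event of height 2.5
of_ph = apply_filter(event, h).max()
print('reconstructed height: ', of_ph)
```

The maximum of the filtered trace serves as the pulse height estimate. Frequencies with much noise power are suppressed, frequencies that carry much of the pulse shape are enhanced.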

We can now drop the raw data events of the test pulses.

In [None]:
# dh_hw.drop_raw_data(type='testpulses')

Write the SEVs of the particle events and the test pulses, as well as the NPS, to xy files.

In [None]:
# time axis in ms, matching the axis label; named time_ms to avoid shadowing the imported time module
time_ms = np.arange(-RECORD_LENGTH/4, RECORD_LENGTH*3/4, 1) / SAMPLE_FREQUENCY * 1000

for r, c in zip(RDT_CHANNELS, H5_CHANNELS):

    ai.data.write_xy_file(filepath=PATH_PROC_DATA + 'xy_files/Channel_{}_SEV_Particle.xy'.format(r),
                         data=[time_ms, 
                               dh_hw.get('stdevent', 'event')[c]],
                         title='Run {} Channel {} SEV Particle'.format(RUN, r),
                         axis=['Time (ms)', 
                               'Amplitude (V)'])      

    ai.data.write_xy_file(filepath=PATH_PROC_DATA + 'xy_files/Channel_{}_SEV_TP.xy'.format(r),
                         data=[time_ms, 
                               dh_hw.get('stdevent_tp', 'event')[c]],
                         title='Run {} Channel {} SEV TP'.format(RUN, r),
                         axis=['Time (ms)', 
                               'Amplitude (V)'])
                               
    ai.data.write_xy_file(filepath=PATH_PROC_DATA + 'xy_files/Channel_{}_NPS.xy'.format(r),
                         data=[dh_hw.get('noise', 'freq'), 
                               dh_hw.get('noise', 'nps')[c]],
                         title='Run {} Channel {} NPS'.format(RUN, r),
                         axis=['Frequency (Hz)', 
                               'Amplitude (a.u.)'])

### Baseline Resolution

To set a trigger threshold, we need the baseline resolution. We determine it by superposing the standard event onto empty noise baselines and measuring the sigma of the reconstructed pulse heights, which are roughly Gaussian distributed. Before we start the simulation, we find the number of empty noise baselines in the data set.

In [None]:
dh_hw.get('noise', 'hours').shape

In [None]:
dh_hw.simulate_pulses(path_sim=PATH_PROC_DATA + FNAME_RESOLUTION + '.h5',
                      size_events=3000,  # should be below Nmbr of clean baselines, otherwise activate reuse_bl
                      reuse_bl=True,
                      ev_discrete_phs=[[1], [1]],
                      t0_interval=[-20, 20],  # in ms
                      rms_thresholds=[4e-6, 8e-6],
                      fake_noise=False)
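
The superposition method can be illustrated with a self-contained toy example. Here the noise baselines are white Gaussian noise and the height estimator is a plain least-squares template fit; both are assumptions of this sketch and stand in for the measured baselines and the OF, SEV fit and raw pulse height estimators used in the notebook.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma_noise = 1024, 0.01
dt = np.clip(np.arange(n) - n // 4, 0, None)   # samples since the pulse onset
sev = np.exp(-dt / 100.0) - np.exp(-dt / 10.0)
sev /= sev.max()

# stand-ins for measured, empty noise baselines
baselines = sigma_noise * rng.standard_normal((3000, n))
simulated = baselines + 1.0 * sev              # superpose a pulse of height 1 V

# least-squares template fit as a simple height estimator
heights = simulated @ sev / (sev @ sev)

mu, resolution = heights.mean(), heights.std(ddof=1)
expected = sigma_noise / np.sqrt(sev @ sev)    # analytic value for white noise
print('mean reconstructed height: ', mu)
print('resolution (sigma): ', resolution)
print('expected for white noise: ', expected)
```

The spread of the reconstructed heights around the simulated height is the baseline resolution of the estimator. A mean that deviates from the simulated height indicates a biased estimator.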

For the simulated resolution data set, we determine the pulse height with the OF, the SEF and the raw pulse height.

In [None]:
dh_res = ai.DataHandler(nmbr_channels=2)
dh_res.set_filepath(path_h5=PATH_PROC_DATA, fname=FNAME_RESOLUTION, appendix=False)

dh_res.apply_of()
dh_res.calc_mp(type='events')
dh_res.apply_sev_fit(type='events', down=DOWN_SEF, verb=True, t0_bounds=(-25, 25), processes=PROCESSES)

Let's have a look at the reconstructed pulse height histograms, to check if they are Gaussian.

In [None]:
for c in H5_CHANNELS:
    dh_res.show_values(group='events', key='of_ph', bins=250, idx0=c, 
                   xlabel='Pulse Height (V)', ylabel='Counts', title='Channel {} Resolution OF'.format(c))
    dh_res.show_values(group='events', key='mainpar', bins=250, idx0=c, idx2=0,
                   xlabel='Pulse Height (V)', ylabel='Counts', title='Channel {} Resolution PH'.format(c))
    dh_res.show_values(group='events', key='sev_fit_par', bins=250, idx0=c, idx2=0,
                   xlabel='Pulse Height (V)', ylabel='Counts', title='Channel {} Resolution SEV Fit'.format(c))

Now we are ready to calculate the resolutions, which are the sigmas of the Gaussians above.

In [None]:
resolutions_of, mus_of = dh_res.calc_resolution(pec_factors=None, ph_intervals=[(0,2), (0,2)], 
                                      use_tp=False, of_filter=True, sev_fit=False, fit_gauss=True)
resolutions_ph, mus_ph = dh_res.calc_resolution(pec_factors=None, ph_intervals=[(0,2), (0,2)], 
                                      use_tp=False, of_filter=False, sev_fit=False, fit_gauss=True)
resolutions_fit, mus_fit = dh_res.calc_resolution(pec_factors=None, ph_intervals=[(0,2), (0,2)], 
                                      use_tp=False, of_filter=False, sev_fit=True, fit_gauss=True)

We notice that the optimum filter, as an estimator for the pulse height, is biased. We calculate the bias-correction factor as one over the mean of the Gaussians above.

In [None]:
OF_CORRECTION = [1/0.991, 1/0.975]
print('These factors should be multiplied to OF outputs: ', OF_CORRECTION)

### Noise Trigger Rate

We find the threshold that corresponds to 1 noise trigger per kg day with a fit.

In [None]:
dh_hw.apply_of(type='noise')

In [None]:
for c in H5_CHANNELS:
    cut = ai.cuts.LogicalCut(dh_hw.get('noise', 'of_ph')[c] < 0.02)

    dh_hw.estimate_trigger_threshold(channel=c,
                                  detector_mass=DETECTOR_MASS,
                                  allowed_noise_triggers=ALLOWED_NOISE_TRIGGERS,
                                  cut_flag=cut.get_flag(),
                                  ll=0,
                                  ul=20,
                                  yran=(0.1, 3e7),
                                  xran_hist=(2, 9),
                                  xran=(2, 14),
                                  bins=250,
                                  model='pollution_exponential'
                                  )
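
For orientation, the order of magnitude of such a threshold can be estimated with a much simpler model than the fit above: assume purely Gaussian noise and a fixed number of independent filter samples per kg day, then solve for the threshold at which the expected number of noise triggers equals the allowed number. This is explicitly not the model used by `estimate_trigger_threshold`; the helper name `gaussian_threshold` and the numbers are hypothetical.

```python
from statistics import NormalDist

def gaussian_threshold(sigma, n_trials_per_kg_day, allowed_noise_triggers=1):
    """Threshold at which Gaussian noise fires `allowed_noise_triggers` times per kg day."""
    p_tail = allowed_noise_triggers / n_trials_per_kg_day
    # the lower-tail quantile is negative, by symmetry the upper-tail threshold is its negative
    return -sigma * NormalDist().inv_cdf(p_tail)

sigma = 1.0e-3     # hypothetical baseline resolution in V
trials = 1.0e8     # hypothetical number of independent filter samples per kg day
threshold = gaussian_threshold(sigma, trials)
print('threshold in V: ', threshold)
print('threshold in units of sigma: ', threshold / sigma)
```

Because the Gaussian tail falls off steeply, the threshold in units of sigma depends only weakly on the assumed number of trials.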

### Truncation Level

In stream triggering, we want to apply a truncated standard event fit and therefore need to determine the truncation level. We do this with the reconstruction error of a principal component analysis (PCA), i.e. a singular value decomposition. First we apply several cuts to get only clean events for the PCA calculation.

In [None]:
clean_events = ai.cuts.LogicalCut(initial_condition=np.abs(dh_hw.get('events', 'slope')[0]) < 0.2)
clean_events.add_condition(np.abs(dh_hw.get('events', 'slope')[1]) < 0.2)
clean_events.add_condition(dh_hw.get('events', 'pulse_height')[0] < 1) 
clean_events.add_condition(dh_hw.get('events', 'pulse_height')[1] < 1.5) 
clean_events.add_condition(dh_hw.get('events', 'onset')[0] < 20) 
clean_events.add_condition(dh_hw.get('events', 'onset')[0] > -20)
clean_events.add_condition(dh_hw.get('events', 'onset')[1] < 20) 
clean_events.add_condition(dh_hw.get('events', 'onset')[1] > -20)
print('Nmbr clean events: ', len(clean_events))

We choose a suitable number of components for the PCA reconstruction and apply it only to the clean events.

In [None]:
dh_hw.apply_pca(nmbr_components=PCA_COMPONENTS, down=DOWN_SEF, fit_idx=clean_events.get_idx())
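
How a PCA reconstruction error separates well-described pulses from distorted ones can be sketched with NumPy's SVD. The helper names, the mean squared residual as error measure and the toy pulse shapes are assumptions of this sketch; `apply_pca` may define the error differently.

```python
import numpy as np

def fit_pca(events, n_components):
    """Mean and leading principal components of the events (one event per row)."""
    mean = events.mean(axis=0)
    _, _, vt = np.linalg.svd(events - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(events, mean, components):
    """Mean squared residual after projecting onto the principal components."""
    centered = events - mean
    projected = centered @ components.T @ components
    return np.mean((centered - projected) ** 2, axis=1)

rng = np.random.default_rng(3)
n = 512
dt = np.clip(np.arange(n) - n // 4, 0, None)         # samples since the pulse onset
fast = np.exp(-dt / 40.0) - np.exp(-dt / 5.0)        # two toy pulse shapes
slow = np.exp(-dt / 150.0) - np.exp(-dt / 5.0)
coeffs = rng.uniform(0.1, 1.0, size=(300, 2))
clean_toy_events = coeffs @ np.vstack([fast, slow])  # noise-free mixtures of both shapes

mean, comps = fit_pca(clean_toy_events, n_components=2)
saturated = np.minimum(3.0 * fast, 1.0)[None, :]     # a pulse clipped at 1 V
err_clean = reconstruction_error(clean_toy_events, mean, comps)
err_saturated = reconstruction_error(saturated, mean, comps)
print('largest error of the clean events: ', err_clean.max())
print('error of the clipped pulse: ', err_saturated[0])
```

Events that are linear combinations of the fitted templates are reconstructed almost perfectly, while a clipped pulse is not. The same effect produces the rise of the reconstruction error at large pulse heights.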

The first and second principal components, i.e. eigenvectors, are the two mutually orthogonal templates that explain most of the variance in the events.

In [None]:
components = dh_hw.get('events', 'pca_components')

for c in H5_CHANNELS:
    plt.close()
    ai.styles.use_cait_style()
    for i, comp in enumerate(components[c]):
        plt.plot(comp, label='Component {}'.format(i+1))
    plt.title('Principal Components Channel {}'.format(c))
    ai.styles.make_grid()
    plt.xlabel('Sample Index')
    plt.ylabel('Amplitude (V)')
    plt.xlim(3000, 7000)
    plt.legend()
    plt.show()

To determine the truncation levels, we plot the PCA reconstruction error versus the pulse height. We typically see a pulse height above which the error increases strongly. This is our truncation level for the channel.

In [None]:
x_ranges = [(0, 2), (0, 0.7)]
y_ranges = [(0, 0.0001), (0, 0.0001)]

for c, xr, yr in zip(H5_CHANNELS, x_ranges, y_ranges):
    dh_hw.show_scatter(groups=['events', 'events'],
                    keys=['mainpar', 'pca_error'],
                    title=None,
                    idx0s=[c, c],  # 0 is the phonon channel
                    idx2s=[0, None],
                    xlabel='Pulse Height (V)',
                    ylabel='PCA Reconstruction Error',
                    marker='.',
                    xran=xr,
                    yran=yr,
                    )

### Efficiency Data Set

To determine the cut efficiency later on, we now simulate a data set of pulses that are continuously distributed throughout the pulse height range.

In [None]:
dh_hw.simulate_pulses(path_sim=PATH_PROC_DATA + FNAME_EFFICIENCY + '.h5',
                      size_events=10000,  # should be below Nmbr of clean baselines, otherwise activate reuse_bl
                      reuse_bl=True,
                      ev_ph_intervals=[[0, 1.6], [0, 0.3]],
                      t0_interval=[-15, 15],  # in ms
                      rms_thresholds=[100, 100],
                      fake_noise=False)

To have all pulse height estimation methods at hand, we calculate our usual estimators.

In [None]:
dh_eff = ai.DataHandler(run=RUN,
                    module=MODULE,
                    channels=RDT_CHANNELS)

dh_eff.set_filepath(path_h5=PATH_PROC_DATA,
                fname=FNAME_EFFICIENCY,
                appendix=False)

In [None]:
dh_eff.apply_of()
dh_eff.calc_mp(type='events')
dh_eff.apply_sev_fit(type='events', down=DOWN_SEF, verb=True, t0_bounds=(-25, 25), processes=PROCESSES)

### Training Data Set

In case we want to use anomaly detection methods later on, we build a data set of clean pulses. For these, we simulate baselines instead of using measured ones, to save the measured baselines for the efficiency calculation.

In [None]:
dh_hw.simulate_pulses(path_sim=PATH_PROC_DATA + FNAME_TRAINING + '.h5',
                      size_events=5000,  # should be below Nmbr of clean baselines, otherwise activate reuse_bl
                      reuse_bl=True,
                      ev_ph_intervals=[[0, 1.6], [0, 0.3]],
                      t0_interval=[-20, 20],  # in ms
                      rms_thresholds=[4e-6, 8e-6],
                      fake_noise=True)

To have our usual features ready, we calculate main parameters, fits and filters.

In [None]:
dh_train = ai.DataHandler(run=RUN,
                    module=MODULE,
                    channels=RDT_CHANNELS)

dh_train.set_filepath(path_h5=PATH_PROC_DATA,
                fname=FNAME_TRAINING,
                appendix=False)

In [None]:
dh_train.apply_of()
dh_train.calc_mp(type='events')
dh_train.calc_additional_mp(type='events')
dh_train.apply_sev_fit(type='events', down=DOWN_SEF, name_appendix='_down{}'.format(DOWN_SEF), 
                       verb=True, t0_bounds=(-25, 25), processes=PROCESSES)

This is all we do on the hardware data. Proceed with the trigger script, then do the quality cuts and the energy calibration in the stream notebook.