# Creating A WaveformReducer

The `WaveformReducer` functionality in CHECLabPy (and extract_dl1.py) is facilitated by two additional utilities: `column` and `WaveformReducerChain`. This tutorial will describe these objects, and show how to create your own simple `WaveformReducer`.

## WaveformReducer

An example of a `WaveformReducer` is shown below:

In [None]:
from CHECLabPy.core.reducer import WaveformReducer, column

class WaveformMaxReducer(WaveformReducer):
    @column
    def waveform_max(self):
        return self.waveforms.max(axis=1)

As you can see, it can be very simple to create a WaveformReducer.

There are 3 stages to a `WaveformReducer`:
1. When the `WaveformReducer` is initialised, the arguments passed to it dictate which of its columns are activated or disabled. For example `reducer = WaveformMaxReducer(waveform_max=False)` would disable the column for the above reducer.


In [None]:
reducer = WaveformMaxReducer(n_pixels=2048, n_samples=128, waveform_max=True)
print(reducer.active_columns)

reducer = WaveformMaxReducer(n_pixels=2048, n_samples=128, waveform_max=False)
print(reducer.active_columns)

2. To process an event, the `process` method is called. The first thing `process` does is call the `_prepare` method, which calculates any values required by more than one column and stores them as attributes of the `WaveformReducer`. By default, the `_prepare` method simply attaches the waveforms for the current event to the reducer, ready to be processed by the columns:

In [None]:
import numpy as np

reducer = WaveformMaxReducer(n_pixels=2048, n_samples=128, waveform_max=True)
waveforms = np.random.rand(2048, 128)
reducer._prepare(waveforms)
print((waveforms == reducer.waveforms).all())

It is important to note that it is not necessary to call the `_prepare` method yourself; it is called automatically by `process`.

3. The active columns are looped through, and a dict containing the extracted values per pixel for each column is produced and returned:

In [None]:
import numpy as np

reducer = WaveformMaxReducer(n_pixels=2048, n_samples=128, waveform_max=True)
waveforms = np.random.rand(2048, 128)
params = reducer.process(waveforms)
print(params)
print(params['waveform_max'].shape)
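The three stages above can also be mimicked in plain Python. The sketch below is a hypothetical stand-in that does not import CHECLabPy: the class `ToyBaselineReducer`, its column names and the 10-sample baseline window are all assumptions made for illustration. It shows how a `_prepare` step might store a value that two columns share.

```python
import numpy as np


class ToyBaselineReducer:
    """Hypothetical stand-in; mimics the three stages without CHECLabPy."""

    column_names = ("baseline_mean", "baseline_subtracted_max")

    def __init__(self, **kwargs):
        # Stage 1: keyword arguments switch columns on or off (default: on)
        self.active_columns = [
            name for name in self.column_names if kwargs.get(name, True)
        ]

    def _prepare(self, waveforms):
        # Stage 2: attach the waveforms and compute values shared by columns
        self.waveforms = waveforms
        self.baseline = waveforms[:, :10].mean(axis=1)

    def baseline_mean(self):
        return self.baseline

    def baseline_subtracted_max(self):
        return self.waveforms.max(axis=1) - self.baseline

    def process(self, waveforms):
        # Stage 3: loop over the active columns and collect their values
        self._prepare(waveforms)
        return {name: getattr(self, name)() for name in self.active_columns}


toy = ToyBaselineReducer(baseline_mean=False)
toy_params = toy.process(np.random.rand(2048, 128))
print(list(toy_params))  # ['baseline_subtracted_max']
```

Because the baseline is computed once in `_prepare`, each column that needs it reads the stored attribute instead of recalculating it.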

## column

The purpose of the `column` decorator is to identify the items that are to be included as columns in the extracted dl1 file.

It is expected that a `column` returns a numpy array of size n_pixels, and uses the `self.waveforms` attribute (or other attributes pre-calculated in the `_prepare` method) to perform the calculation.
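As a quick check of the expected return shape, the sketch below uses only NumPy, with the array sizes used elsewhere in this tutorial. The variable names are illustrative and `argmax` is just a second example of a per-pixel reduction.

```python
import numpy as np

n_pixels, n_samples = 2048, 128
rng = np.random.default_rng(seed=0)
waveforms = rng.random((n_pixels, n_samples))

# Reducing along axis=1 (the sample axis) leaves one value per pixel
waveform_max = waveforms.max(axis=1)
waveform_argmax = waveforms.argmax(axis=1)

print(waveform_max.shape)     # (2048,)
print(waveform_argmax.shape)  # (2048,)
```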

No two columns can have the same name, even if they are in different `WaveformReducers`, ensuring that columns are unique. If a column with a duplicate name is defined in a different `WaveformReducer`, an error is raised:

In [None]:
from CHECLabPy.core.reducer import WaveformReducer, column

class WaveformMaxReducer2(WaveformReducer):
    @column
    def waveform_max(self):
        return self.waveforms.max(axis=1)

## Chain

The purpose of the chain (`WaveformReducerChain`) is to loop over all defined `WaveformReducers`, and accumulate the column results for all activated columns. If a `WaveformReducer` has no active columns, it is skipped. This means that multiple `WaveformReducers` can contribute to the same dl1 file.

The `WaveformReducerChain` class also defines which columns are active by default, and can read a yaml configuration file, allowing the user to select the active columns from the command line by specifying a path to a config file. This config file path can be specified as an argument to `extract_dl1.py`.


In [None]:
from CHECLabPy.data import get_file
config_path = get_file("extractor_config.yml")

In [None]:
!echo "waveform_max: True\ncharge_averagewf: True" > $config_path
!cat $config_path
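Whether `echo` interprets `\n` depends on the shell, so the same two lines can alternatively be written from Python. In this sketch a temporary file stands in for `config_path` (an assumption for illustration); in the notebook you would open `config_path` instead.

```python
import os
import tempfile

config_text = "waveform_max: True\ncharge_averagewf: True\n"

# A temporary file stands in for config_path here (assumption for illustration)
with tempfile.NamedTemporaryFile("w", suffix=".yml", delete=False) as f:
    f.write(config_text)
    toy_config_path = f.name

# Read the file back to confirm that each column sits on its own line
with open(toy_config_path) as f:
    written = f.read()
print(written)

os.remove(toy_config_path)
```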

In [None]:
import numpy as np
from CHECLabPy.core.chain import WaveformReducerChain

chain = WaveformReducerChain(n_pixels=2048, n_samples=128, config_path=config_path)
waveforms = np.random.rand(2048, 128)
params = chain.process(waveforms)
print("\n", params)

As you can see from the print statement, the chain was correctly configured to contain both the `AverageWF.charge_averagewf` and `WaveformMaxReducer.waveform_max` columns. The resulting dict from the `chain.process` method contains two items, each with a name corresponding to the column, and an array of length n_pixels as the value.
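The accumulation loop described in this section can be pictured with a small stand-alone sketch. It does not use CHECLabPy: `ToyReducer`, `toy_chain_process` and the column functions are hypothetical names, and the real `WaveformReducerChain` may be organised differently.

```python
import numpy as np


class ToyReducer:
    """Hypothetical stand-in for a WaveformReducer (not the CHECLabPy class)."""

    def __init__(self, columns, active):
        # columns: dict mapping column name -> function of the waveforms
        # active: names of the columns that are switched on
        self.active_columns = {
            name: func for name, func in columns.items() if name in active
        }

    def process(self, waveforms):
        return {
            name: func(waveforms) for name, func in self.active_columns.items()
        }


def toy_chain_process(reducers, waveforms):
    """Accumulate the active columns of every reducer into one dict."""
    params = {}
    for reducer in reducers:
        if not reducer.active_columns:
            continue  # a reducer with no active columns is skipped
        params.update(reducer.process(waveforms))
    return params


active = {"waveform_max", "waveform_mean"}
reducers = [
    ToyReducer({"waveform_max": lambda w: w.max(axis=1)}, active),
    ToyReducer({"waveform_min": lambda w: w.min(axis=1)}, active),  # skipped
    ToyReducer({"waveform_mean": lambda w: w.mean(axis=1)}, active),
]
waveforms = np.random.rand(2048, 128)
params = toy_chain_process(reducers, waveforms)
print(sorted(params))  # ['waveform_max', 'waveform_mean']
```

Merging into a single dict is only safe because column names are unique across reducers; otherwise one reducer's result would silently overwrite another's.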

## scripts/generate_config.py

As described in the "2_Reducing_R1_to_DL1.ipynb" tutorial, this script produces a config file that can be used as input to `extract_dl1.py`. It also includes the docstring of each `WaveformReducer` and `column`, providing insight into what is stored in each column.