# GERDADeepLearning.jl
This tutorial explains how to load and work with GERDA data.

Load the Julia package and initialize the environment you want to work in.

```julia
using GERDADeepLearning
env = DLEnv();
```


This loads the configuration from `config.json`, located in the same directory as the notebook.
If they are not already present, it also creates the folders `data/`, `models/` and `plots/` inside that directory.
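The folder setup can be sketched with Julia's built-in `mkpath`, which creates a folder together with any missing parents. This is a minimal illustration under assumptions: the helper `ensure_env_folders` is hypothetical and not part of GERDADeepLearning.jl, and `DLEnv` may implement this differently.

```julia
# Hypothetical helper (not part of GERDADeepLearning.jl): create data/, models/
# and plots/ inside `dir` if they do not exist yet, and report what was created.
function ensure_env_folders(dir::AbstractString)
    created = String[]
    for name in ("data", "models", "plots")
        path = joinpath(dir, name)
        if !isdir(path)
            mkpath(path)          # creates the folder and any missing parents
            push!(created, name)
        end
    end
    return created
end

# Usage: calling it twice in a temporary directory creates the folders only once.
dir = mktempdir()
first_run = ensure_env_folders(dir)    # ["data", "models", "plots"]
second_run = ensure_env_folders(dir)   # String[]
```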

Now let's have a look at the JSON configuration file.

The configuration file defines the path of the GERDA data and the names of keylist files located in the same directory as the configuration file.
The `verbosity` setting controls how detailed the console output is. A value of 0 hides all output, 2 is the default, and higher values produce additional information that can be useful for debugging.
The `cache` field determines whether the data should be cached in HDF5 and should be set to `true` in most cases.
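To make the top-level settings concrete, here is a sketch of a parsed configuration as a Julia `Dict`, together with a small check of the two settings described above. It is an illustration only. The key names `path` and `keylists`, the file names and the path value are hypothetical placeholders. Treating `cache` as a JSON boolean is an assumption. The JSON layout that GERDADeepLearning.jl actually expects may differ.

```julia
# Hypothetical mirror of a parsed config.json. Only "verbosity" and "cache"
# are field names taken from the tutorial; "path", "keylists" and all values
# are placeholders.
config = Dict{String,Any}(
    "path"      => "/path/to/gerda/data",
    "keylists"  => ["keylist1.txt", "keylist2.txt"],
    "verbosity" => 2,      # 0 = no output, 2 = default, >2 = debug output
    "cache"     => true,   # assumed to be a JSON boolean
)

# Check the two settings; a missing verbosity falls back to the documented
# default of 2, while cache must be present and boolean.
function check_settings(cfg::Dict{String,Any})
    v = get(cfg, "verbosity", 2)
    (v isa Integer && v >= 0) || error("verbosity must be a non-negative integer")
    get(cfg, "cache", nothing) isa Bool || error("cache must be true or false")
    return v
end

check_settings(config)   # returns 2
```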

The `pulses` section defines the selection of data, the preprocessing and the splitting into data sets.
The array of `detectors` can hold detector names or channel numbers (0-39). If empty, all detectors are processed.
The flags `test-pulses`, `baseline-events`, `unphysical-events`, `low-energy-events` and `failed-preprocessing` can be used to select a subgroup of events and take the values `exclude`, `include` or `only`. Events discarded in this step cannot be accessed later on. If in doubt, include the events; they can easily be removed later.
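Each of the three flag values can be read as a rule applied to a per-event boolean, for example "is a test pulse":

- `exclude` drops the flagged events.
- `include` keeps everything.
- `only` keeps just the flagged events.

The sketch below applies that reading to plain vectors. The function name `select_events` and the example mask are hypothetical and are not taken from the package.

```julia
# Hypothetical: apply one selection flag to a per-event boolean mask and
# return the indices of the events that survive the selection.
function select_events(is_flagged::AbstractVector{Bool}, mode::AbstractString)
    if mode == "exclude"
        return findall(!, is_flagged)            # keep events where the flag is false
    elseif mode == "include"
        return collect(eachindex(is_flagged))    # keep every event
    elseif mode == "only"
        return findall(is_flagged)               # keep events where the flag is true
    else
        error("unknown mode \"$mode\", expected exclude, include or only")
    end
end

# Example: five events, of which events 2 and 5 are test pulses.
is_test_pulse = [false, true, false, false, true]
select_events(is_test_pulse, "exclude")   # [1, 3, 4]
select_events(is_test_pulse, "only")      # [2, 5]
```

When several flags are set, the surviving events would be the intersection of the individual selections, e.g. via `intersect`.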

The processing steps are listed under `preprocessing` and are executed in order. Each step is translated to a function call in `signal_processing.jl`.
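Running named steps in the listed order can be sketched as a lookup from step names to functions. This is a hypothetical illustration. The step names `baseline` and `normalize` and their implementations are invented for the example; they are not the functions defined in `signal_processing.jl`.

```julia
# Hypothetical preprocessing steps on a waveform matrix (one column per event).
# These are NOT the package's real steps.
subtract_baseline(w) = w .- sum(w[1:2, :]; dims=1) ./ 2   # subtract mean of first 2 samples
normalize_peak(w) = w ./ maximum(abs.(w); dims=1)          # scale each column to peak 1

const STEPS = Dict{String,Function}(
    "baseline"  => subtract_baseline,
    "normalize" => normalize_peak,
)

# Apply the steps one after another, in the order in which they are listed.
function preprocess(w::AbstractMatrix, steps::Vector{String})
    for name in steps
        haskey(STEPS, name) || error("unknown preprocessing step: $name")
        w = STEPS[name](w)
    end
    return w
end

raw = Float32[1 2; 1 2; 3 6; 5 10]   # 4 samples, 2 events
out = preprocess(raw, ["baseline", "normalize"])
```

The order matters here: normalizing before subtracting the baseline would give a different result.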

The entries inside `sets` define how the data is split into multiple sets. The numbers give the fraction of the total data of each keylist that is put into the respective set, so one number per keylist is required. The events for each set are chosen pseudo-randomly but deterministically, so two preprocessing definitions with the same sets will contain the same events in the same order.
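A deterministic pseudo-random split can be sketched with a seeded random number generator. Shuffling the event indices with a fixed seed gives the same permutation every time, and that permutation is then cut according to the fractions. Several details are assumptions made for this illustration:

- the function name `split_sets`
- the set names `xval` and `test`
- the seed value
- the rule that the last set receives any events left over by rounding

The package's actual procedure may differ.

```julia
using Random

# Hypothetical: split the n events of one keylist into named sets according
# to fractions. The same seed always yields the same assignment (within a
# given Julia version).
function split_sets(n::Integer, fractions::Vector{Pair{String,Float64}}; seed=42)
    perm = randperm(MersenneTwister(seed), n)   # reproducible shuffle of 1:n
    sets = Dict{String,Vector{Int}}()
    start = 1
    for (i, (name, frac)) in enumerate(fractions)
        # the last set takes all remaining events so that none are lost to rounding
        stop = i == length(fractions) ? n : min(n, start + floor(Int, frac * n) - 1)
        sets[name] = perm[start:stop]
        start = stop + 1
    end
    return sets
end

a = split_sets(100, ["train" => 0.7, "xval" => 0.2, "test" => 0.1])
b = split_sets(100, ["train" => 0.7, "xval" => 0.2, "test" => 0.1])
```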

Once the configuration is set up, loading the data is trivial.

```julia
pulses = getdata(env; preprocessing="pulses")
```

```
INFO: Retrieving 'pulses' from cache.

DLData (120 subsets)
```

This step first checks the cache and only performs the preprocessing if it has not been done before.
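The cache-or-compute behaviour can be sketched with Julia's `Serialization` standard library standing in for HDF5. This is a simplified illustration, not the package's implementation. The function name `cached`, the file name and the stand-in `expensive` computation are all hypothetical.

```julia
using Serialization

# Hypothetical: return the result stored at `path` if it exists; otherwise run
# `compute`, store its result at `path` and return it.
function cached(compute::Function, path::AbstractString)
    if isfile(path)
        return deserialize(path)   # cache hit: skip the expensive work
    end
    result = compute()             # cache miss: do the work once
    serialize(path, result)
    return result
end

calls = Ref(0)
expensive() = (calls[] += 1; Float32[1.0, 2.0, 3.0])   # stand-in for preprocessing

path = joinpath(mktempdir(), "pulses.jls")
first_result = cached(expensive, path)    # computes and writes the file
second_result = cached(expensive, path)   # reads the file; expensive() is not called
```

Because `compute` is the first argument, `cached` can also be called with Julia's `do`-block syntax.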

The returned `pulses` object represents all of the preprocessed data. It consists of multiple libraries, one for each detector and data set.

We can get the number of events using `eventcount`:

```julia
eventcount(pulses)
```
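Conceptually, an object like `pulses` can be pictured as a list of libraries, each tagged with its detector and data set. The total event count is then the sum over all libraries. The toy types below (`SketchLibrary`, `SketchData`, `sketch_eventcount`) are hypothetical stand-ins that only illustrate this structure; they are not the package's `DLData` type, and the energies are made-up values.

```julia
# Hypothetical stand-ins for the package's data types (illustration only).
struct SketchLibrary
    detector_name::String
    set::String
    energies::Vector{Float32}   # one entry per event
end

struct SketchData
    libraries::Vector{SketchLibrary}   # one library per detector and data set
end

# Total number of events = sum of the event counts of all libraries.
sketch_eventcount(d::SketchData) = sum(length(lib.energies) for lib in d.libraries; init=0)

d = SketchData([
    SketchLibrary("GD00A", "train", Float32[2615, 1593]),
    SketchLibrary("GD00A", "test",  Float32[1621]),
    SketchLibrary("GD00B", "train", Float32[]),
])
sketch_eventcount(d)   # 3
```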

Let's only work with the training set of one detector:

```julia
# Only use data set "train"
data = filter(pulses, :set, "train") # creates a new data instance
data = pulses[:set=>"train"]         # equivalent

# Only keep one detector
filter!(data, :detector_name, "GD00A") # modifies the given instance

# Only keep events above a certain energy
filter!(data, :E, E->E>1500);
```
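The difference between `filter` (returns a new object) and `filter!` (modifies its argument) follows the usual Julia convention for functions whose names end in `!`. The sketch below shows that convention with Base's own `filter` and `filter!` on a hypothetical vector of per-library named tuples. The record layout and energies are assumptions for illustration, not how GERDADeepLearning.jl stores its data.

```julia
# Hypothetical per-library records with made-up energies (illustration only).
libs = [
    (detector_name = "GD00A", set = "train", E = Float32[2615, 1400, 1700]),
    (detector_name = "GD00A", set = "test",  E = Float32[1800]),
    (detector_name = "GD00B", set = "train", E = Float32[1550]),
]

# filter returns a new vector; `libs` itself is left unchanged.
train = filter(lib -> lib.set == "train", libs)

# filter! removes elements from `train` in place.
filter!(lib -> lib.detector_name == "GD00A", train)

# An event-level cut such as E > 1500 keeps only the matching events of each library.
above = [merge(lib, (E = filter(E -> E > 1500, lib.E),)) for lib in train]
```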

Let's check whether it worked:

```julia
println(detectors(data))
println(data[:set])
```

```
String["GD00A"]
train
```


For performance reasons, the actual waveforms are not loaded into memory until they are needed, which is why all the steps so far ran so quickly.
The data is loaded into memory when properties or waveforms are accessed for the first time.
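The deferred loading described above can be sketched as an object that records filters and only loads the data and applies them when it is first accessed. The type `LazyEvents` and the functions `lazy_filter!` and `materialize!` are hypothetical. They illustrate the idea, not the package's internals.

```julia
# Hypothetical lazy container: the loader only runs on first access.
mutable struct LazyEvents
    loader::Function                          # reads the events when called
    pending::Vector{Function}                 # cuts recorded but not yet applied
    events::Union{Nothing,Vector{Float32}}    # nothing until loaded
end

LazyEvents(loader::Function) = LazyEvents(loader, Function[], nothing)

# Recording a cut is cheap: it only stores the predicate.
lazy_filter!(d::LazyEvents, pred::Function) = (push!(d.pending, pred); d)

# Load the data once if needed, then apply all pending cuts in order.
function materialize!(d::LazyEvents)
    if d.events === nothing
        d.events = d.loader()   # the expensive read happens only here
    end
    for pred in d.pending
        d.events = filter(pred, d.events)
    end
    empty!(d.pending)
    return d.events
end

loads = Ref(0)
d = LazyEvents(() -> (loads[] += 1; Float32[1200, 1600, 2615]))
lazy_filter!(d, E -> E > 1500)   # nothing is loaded yet
before = loads[]                 # 0
n = length(materialize!(d))      # loads the data and applies the cut
materialize!(d)                  # second access does not reload
```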

```julia
eventcount(data) # now the energy filter is applied
```

We can access the waveforms and properties of the data like this:

```julia
typeof(waveforms(data))
```

```
Array{Float32,2}
```

```julia
keys(data)
```

```
23-element Array{Symbol,1}:
 :top_level
 :AoE
 :baseline_level
 :timestamp
 :isTP
 :multiplicity
 :isMuVetoed
 :ANN_mse_class
 :isBL
 :E
 :baseline_std
 :keylist
 :AoE_class
 :ANN_alpha_class
 :isLArVetoed
 :FailedPreprocessing
 :preprocessing
 :detector_id
 :name
 :waveform_type
 :sampling_rate
 :detector_name
 :set
```

```julia
data[:E] # Energy
```

```
146673-element Array{Float32,1}:
 2616.52
 2494.26
 2321.9
 2353.51
 2614.68
 1651.39
 2487.44
 2611.43
 2309.75
 1993.03
 2386.22
 2187.7
 2298.59
    ⋮
 1957.24
 2347.6
 1593.03
 2152.11
 2195.94
 1743.25
 1856.58
 2282.93
 2456.12
 1620.75
 2053.42
 2107.45
```