# Part 5: Detector Level

## Preparations

Let us first load all the Python libraries again.

In [1]:
# __future__ imports have to be the first statement in the cell
from __future__ import absolute_import, division, print_function, unicode_literals

import sys
import os
madminer_src_path = "/Users/felixkling/Documents/GitHub/madminer"
sys.path.append(madminer_src_path)

import numpy as np
import matplotlib
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
%matplotlib inline

from madminer.delphes import DelphesProcessor
from madminer.sampling import combine_and_shuffle
from madminer.utils.particle import MadMinerParticle

from madminer.fisherinformation import FisherInformation
from madminer.fisherinformation import project_information, profile_information

from madminer.plotting import plot_fisher_information_contours_2d

from madminer.sampling import SampleAugmenter
from madminer.sampling import constant_benchmark_theta, multiple_benchmark_thetas
from madminer.sampling import constant_morphing_theta, multiple_morphing_thetas, random_morphing_thetas
from madminer.ml import MLForge, EnsembleForge

from sklearn.metrics import mean_squared_error

Please enter the path to your MG5 root directory here. This notebook assumes that you installed Delphes and Pythia through MG5.

**This needs to be updated by the user**

In [2]:
mg_dir = '/Users/felixkling/work/MG5_aMC_v2_6_2'
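Since the Delphes step later in this notebook is pointed at `mg_dir + '/Delphes'`, it can be worth checking that this directory exists before running anything expensive. The following is a minimal sketch: the helper `find_delphes_dir` and the placeholder path are made up for illustration and are not part of MadMiner.

```python
import os


def find_delphes_dir(mg_dir):
    """Return the Delphes directory inside an MG5 root directory, or None if it is missing."""
    candidate = os.path.join(mg_dir, "Delphes")
    return candidate if os.path.isdir(candidate) else None


# Hypothetical placeholder path, replace it with your own MG5 root directory
example_mg_dir = "/path/to/MG5_aMC"

delphes_dir = find_delphes_dir(example_mg_dir)
if delphes_dir is None:
    print("No Delphes directory found under", example_mg_dir)
else:
    print("Found Delphes at", delphes_dir)
```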

## 9. Run detector simulation and extract observables

The `madminer.delphes` module wraps around Delphes, a popular fast detector simulation. In addition to simulating the detector, it allows for the fast extraction of observables, which are saved in the MadMiner HDF5 file. The central object is an instance of the `DelphesProcessor` class, which has to be initialized with a MadMiner file:

In [3]:
dp = DelphesProcessor('data/madminer_example.h5')

20:32  
20:32  ------------------------------------------------------------
20:32  |                                                          |
20:32  |  MadMiner v0.1.0                                         |
20:32  |                                                          |
20:32  |           Johann Brehmer, Kyle Cranmer, and Felix Kling  |
20:32  |                                                          |
20:32  ------------------------------------------------------------
20:32  


After creating the `DelphesProcessor` object, one can add a number of HepMC event samples (the output of running MadGraph and Pythia) and have it run Delphes:

In [4]:
dp.add_hepmc_sample(
    'mg_processes/wgamma/Events/run_01/tag_1_pythia8_events.hepmc.gz',
    sampled_from_benchmark='sm'
)
#dp.add_hepmc_sample(
#    'mg_processes/background/Events/run_01/tag_1_pythia8_events.hepmc.gz',
#    sampled_from_benchmark='sm'
#)

dp.run_delphes(
    delphes_directory=mg_dir + '/Delphes',
    delphes_card='cards/delphes_card.dat',
    log_file='logs/wgamma/log_delphes.log',
    initial_command='source ~/.bashrc'
)

20:32  Running Delphes (/Users/felixkling/work/MG5_aMC_v2_6_2/Delphes) on event sample at mg_processes/wgamma/Events/run_01/tag_1_pythia8_events.hepmc.gz


The next step is the definition of observables through a name and a Python expression. For the latter, you can use the objects `j[i]`, `e[i]`, `mu[i]`, `a[i]`, and `met` (jets, electrons, muons, photons, and missing transverse energy), where the index `i` refers to an ordering by transverse momentum. All of these objects inherit from scikit-hep [LorentzVectors](http://scikit-hep.org/api/math.html#vector-classes); see the link for documentation of their properties. In addition, they have `charge` and `pdg_id` properties.

There is an optional keyword `required`. If `required=True`, we will only keep events where the observable can be parsed, i.e. all involved particles have been detected. If `required=False`, un-parseable observables will be filled with the value of another keyword `default`.
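To make the `required` / `default` logic concrete, here is a small stand-alone sketch of the behaviour described above. It is not MadMiner's implementation: the function `evaluate_observable` and the toy events are hypothetical, and particles are mimicked with `SimpleNamespace` objects instead of real LorentzVectors.

```python
from types import SimpleNamespace


def evaluate_observable(expression, objects, required=False, default=None):
    """Evaluate an observable expression for one event.

    Returns (value, keep_event). If the expression cannot be evaluated,
    the event is dropped when required=True, otherwise `default` is used.
    """
    try:
        value = eval(expression, {"__builtins__": {}}, objects)
        return value, True
    except (IndexError, AttributeError, KeyError, TypeError):
        if required:
            return None, False  # event is discarded
        return default, True  # event is kept with the fallback value


# Toy events: one with a reconstructed muon, one without
event_with_muon = {"mu": [SimpleNamespace(pt=42.0)], "a": []}
event_without_muon = {"mu": [], "a": []}

print(evaluate_observable("mu[0].pt", event_with_muon, required=True))
print(evaluate_observable("mu[0].pt", event_without_muon, required=True))
print(evaluate_observable("mu[0].pt", event_without_muon, required=False, default=0.0))
```

In this sketch, only the second call leads to a discarded event; the third keeps the event and fills in the default value.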

In [5]:
dp.add_observable('px_l', 'mu[0].px', required=True)
dp.add_observable('px_v', 'met.px', required=True)
dp.add_observable('px_a', 'a[0].px', required=True)

dp.add_observable('py_l', 'mu[0].py', required=True)
dp.add_observable('py_v', 'met.py', required=True)
dp.add_observable('py_a', 'a[0].py', required=True)

dp.add_observable('pz_l', 'mu[0].pz', required=True)
dp.add_observable('pz_a', 'a[0].pz', required=True)

dp.add_observable('e_l', 'mu[0].e', required=True)
dp.add_observable('e_a', 'a[0].e', required=True)

dp.add_observable('pt_l', 'mu[0].pt', required=True)
dp.add_observable('pt_v', 'met.pt', required=True)
dp.add_observable('pt_a', 'a[0].pt', required=True)

dp.add_observable('eta_l', 'mu[0].eta', required=True)
dp.add_observable('eta_a', 'a[0].eta', required=True)

dp.add_observable('dphi_lv', 'mu[0].deltaphi(met)', required=True)
dp.add_observable('dphi_la', 'mu[0].deltaphi(a[0])', required=True)
dp.add_observable('dphi_va', 'met.deltaphi(a[0])', required=True)

dp.add_observable('m_la', '(mu[0] + a[0]).m', required=True)
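For readers less familiar with these kinematic quantities, the following sketch spells out what an azimuthal angle difference and an invariant mass such as `m_la` correspond to. It uses plain tuples `(E, px, py, pz)` instead of LorentzVectors, and the function names are our own. The sign and range convention of `deltaphi` in the vector library may differ from the one assumed here.

```python
import math


def delta_phi(phi1, phi2):
    """Difference of two azimuthal angles, wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def invariant_mass(p1, p2):
    """Invariant mass of the sum of two four-vectors given as (E, px, py, pz)."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    m_squared = e ** 2 - px ** 2 - py ** 2 - pz ** 2
    # Guard against tiny negative values caused by rounding
    return math.sqrt(max(m_squared, 0.0))


# Two massless, back-to-back particles with 5 GeV each (toy numbers)
lepton = (5.0, 3.0, 0.0, 4.0)
photon = (5.0, -3.0, 0.0, -4.0)

print(invariant_mass(lepton, photon))
print(delta_phi(3.0, -3.0))
```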

We can also add cuts, again as parseable strings. In addition to the objects discussed above, cut expressions can refer to the observables defined earlier:

In [6]:
dp.add_cut('pt_a > 250.')
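As an illustration of how a cut string acts on the observables, here is a toy sketch with made-up events. The helper `passes_cuts` is hypothetical and only mimics the idea; MadMiner's own parsing is not shown in this notebook.

```python
def passes_cuts(observables, cuts):
    """Return True if an event (a dict of observable values) satisfies all cut expressions."""
    return all(bool(eval(cut, {"__builtins__": {}}, observables)) for cut in cuts)


# Made-up photon transverse momenta in GeV
events = [{"pt_a": 310.0}, {"pt_a": 120.0}, {"pt_a": 255.5}]
cuts = ["pt_a > 250."]

selected = [event for event in events if passes_cuts(event, cuts)]
efficiency = len(selected) / float(len(events))

print(len(selected), "of", len(events), "events pass the cuts")
```

The cut efficiency computed this way is the kind of number that determines how many events remain available further down the analysis chain.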

The function `analyse_delphes_samples` then calculates all observables from the Delphes ROOT file(s) generated before and applies the cuts:

In [None]:
dp.analyse_delphes_samples()

20:36  Analysing Delphes sample mg_processes/wgamma/Events/run_01/tag_1_pythia8_events_delphes.root


The values of the observables and the weights are then saved in the HDF5 file. It is possible to overwrite the same file, or to leave the original file intact and save all the data into a new file as follows:

In [None]:
dp.save('data/madminer_detector_with_data.h5')

One side remark: For the detector simulation and calculation of observables, different users might have very different requirements. While a phenomenologist might be content with the fast detector simulation from Delphes, an experimental analysis might require the full simulation through Geant4. We therefore intend this part to be interchangeable.

To reduce disk usage, you can generate several small event samples with the steps given above, and combine them now. Note that (for now) it is essential that all of them are generated with the same setup, including the same benchmark points / morphing basis!

In our case we only have one sample, so this is not strictly necessary, but we still include it for completeness.

In [None]:
combine_and_shuffle(
    ['data/madminer_detector_with_data.h5'],
    'data/madminer_detector_shuffled.h5'
)
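Conceptually, combining and shuffling means concatenating the events of all samples and applying one common random permutation to observables and weights, so that each event keeps its own weight. The sketch below shows this idea with plain lists. The function `combine_and_shuffle_lists` is hypothetical, and any reweighting that may be needed when merging several samples is deliberately left out.

```python
import random


def combine_and_shuffle_lists(samples, seed=None):
    """Concatenate (observations, weights) samples and shuffle them with one shared permutation."""
    observations = [x for obs, _ in samples for x in obs]
    weights = [w for _, wts in samples for w in wts]

    # One permutation for both lists keeps each event paired with its weight
    order = list(range(len(observations)))
    random.Random(seed).shuffle(order)

    return [observations[i] for i in order], [weights[i] for i in order]


# Toy samples: event labels and event weights
sample_a = (["a1", "a2"], [0.1, 0.2])
sample_b = (["b1"], [0.3])

combined_obs, combined_weights = combine_and_shuffle_lists([sample_a, sample_b], seed=0)
print(list(zip(combined_obs, combined_weights)))
```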

## 10. Run MadMiner at Detector Level

Let's first define the input file, the number of samples, and the *effective* number of samples:

In [None]:
inputfile = 'data/madminer_detector_shuffled.h5'  # output of combine_and_shuffle above
nsamples = 100000
nsampleseff = 50000

### 10a) Run the Data Augmentation and Machine Learning part

First, we once again run the data augmentation step. Here `n_samples` should be chosen similar to the effective number of events, which also depends on the cuts chosen earlier.

In [None]:
sa = SampleAugmenter(inputfile, debug=False)

n_estimators = 5

for i in range(n_estimators):
    x, theta, t_xz = sa.extract_samples_train_local(
        theta=constant_benchmark_theta('sm'),
        n_samples=nsampleseff,
        folder='./data/samples_detector/',
        filename='train{}'.format(i)
    )

x, theta, t_xz = sa.extract_samples_train_local(
    theta=constant_benchmark_theta('sm'),
    n_samples=nsampleseff,
    folder='./data/samples_detector/',
    filename='test',
    switch_train_test_events=True
)

Next, we perform the ML part:

In [None]:
ensemble = EnsembleForge(estimators=n_estimators)
ensemble.train_all(
    method='sally',
    x_filename=['data/samples_detector/x_train{}.npy'.format(i) for i in range(n_estimators)],
    t_xz0_filename=['data/samples_detector/t_xz_train{}.npy'.format(i) for i in range(n_estimators)]
)

ensemble.save('models/samples_detector')
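The point of training several estimators is that their spread gives a handle on the uncertainty of the prediction. As a generic sketch of this idea (not of how `EnsembleForge` combines its estimators internally, which this notebook does not spell out), the following computes the mean and the sample variance over a set of toy predictions. The function name and numbers are made up.

```python
def ensemble_mean_and_variance(predictions):
    """Mean and unbiased sample variance over estimators.

    predictions: one list of predicted values per estimator (at least two estimators).
    """
    n_estimators = len(predictions)
    columns = list(zip(*predictions))  # regroup values by predicted quantity

    means = [sum(column) / n_estimators for column in columns]
    variances = [
        sum((value - mean) ** 2 for value in column) / (n_estimators - 1)
        for column, mean in zip(columns, means)
    ]
    return means, variances


# Toy predictions of two quantities from three estimators
toy_predictions = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
means, variances = ensemble_mean_and_variance(toy_predictions)
print(means, variances)
```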

### 10b) Obtain the Fisher Info

Now let's evaluate the Fisher information again:

In [None]:
fisher_parton = FisherInformation('data/madminer_example_shuffled.h5', debug=False)

fi_ml_mean, fi_ml_covariance = fisher_parton.calculate_fisher_information_full_detector(
    theta=[0.,0.],
    model_file='models/samples_ensemble',
    unweighted_x_sample_file='data/samples_ensemble/x_test.npy',
    luminosity=300*1000./nsamples
)

fi_metonly_mean, fi_metonly_covariance = fisher_parton.calculate_fisher_information_full_detector(
    theta=[0.,0.],
    model_file='models/samples_metonly',
    unweighted_x_sample_file='data/samples_ensemble/x_test.npy',
    luminosity=300*1000./nsamples
)

fi_truth_mean, fi_truth_covariance = fisher_parton.calculate_fisher_information_full_truth(
    theta=[0.,0.],
    luminosity=300*1000./nsamples
)

fisher_detector = FisherInformation(inputfile, debug=False)

fi_detector_mean, fi_detector_covariance = fisher_detector.calculate_fisher_information_full_detector(
    theta=[0.,0.],
    model_file='models/samples_detector',
    unweighted_x_sample_file='data/samples_detector/x_test.npy',
    luminosity=300*1000.
)
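For orientation, the estimate behind a score-based Fisher information can be sketched as $I_{ij} \approx L \, \sigma \, \frac{1}{n} \sum_k t_i(x_k) \, t_j(x_k)$, where $t(x_k)$ is the estimated score for an unweighted event $x_k$. The code below implements this formula with plain lists. It is a simplified sketch, not the MadMiner code: the function name, the toy scores, and the cross section are made up, and we assume that the luminosity is given in inverse picobarn, so that `300*1000.` corresponds to 300 fb$^{-1}$.

```python
def fisher_information_from_scores(scores, cross_section_pb, luminosity_inv_pb):
    """Estimate I_ij = L * sigma * mean_k[t_i(x_k) * t_j(x_k)] from per-event scores."""
    n_events = len(scores)
    n_params = len(scores[0])
    expected_events = cross_section_pb * luminosity_inv_pb

    info = [[0.0] * n_params for _ in range(n_params)]
    for score in scores:
        for i in range(n_params):
            for j in range(n_params):
                info[i][j] += score[i] * score[j]

    return [[expected_events * entry / n_events for entry in row] for row in info]


# Toy scores for four events and two parameters (made-up numbers)
toy_scores = [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0], [-1.0, 0.0]]

toy_info = fisher_information_from_scores(
    toy_scores,
    cross_section_pb=0.001,  # hypothetical cross section in pb
    luminosity_inv_pb=300 * 1000.0,  # 300 fb^-1 expressed in pb^-1
)
print(toy_info)
```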

Finally, let's plot the results:

In [None]:
_ = plot_fisher_information_contours_2d(
    [fi_ml_mean, fi_metonly_mean, fi_detector_mean, fi_truth_mean],
    [fi_ml_covariance, fi_metonly_covariance, fi_detector_covariance, fi_truth_covariance],
    colors=[u'C0',u'C1',u'C2',"black"],
    linestyles=["solid","solid","solid","dashed"],
    inline_labels=["ML-all","ML-MET","ML-Detector","truth"],
    xrange=(-15,15),
    yrange=(-5,5)
)
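To help read such a plot: a contour of a Fisher information matrix $I$ around a reference point can be understood as the set of parameter shifts $\Delta\theta$ with constant local distance $d = \sqrt{\Delta\theta^T I \, \Delta\theta}$. The sketch below evaluates this distance for a made-up $2 \times 2$ matrix. The function name and numbers are ours, and which distance value the plotting function draws is not specified in this notebook.

```python
import math


def fisher_distance(info, delta_theta):
    """Local distance sqrt(delta_theta^T I delta_theta) for a Fisher information matrix I."""
    n_params = len(delta_theta)
    squared = sum(
        delta_theta[i] * info[i][j] * delta_theta[j]
        for i in range(n_params)
        for j in range(n_params)
    )
    return math.sqrt(squared)


# Made-up diagonal Fisher information: more information along the first parameter
toy_information = [[4.0, 0.0], [0.0, 1.0]]

# Both shifts lie on the d = 1 contour, so it extends twice as far along the second parameter
print(fisher_distance(toy_information, [0.5, 0.0]))
print(fisher_distance(toy_information, [0.0, 1.0]))
```

A larger Fisher information therefore corresponds to a tighter contour in the respective direction.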