# A MadMiner Example Analysis -  Analyzing dim6 operators in $W\gamma$ 

## Preparations

Let us first load all the python libraries again

In [1]:
import sys
import os
madminer_src_path = "/Users/felixkling/Documents/GitHub/madminer"
sys.path.append(madminer_src_path)

from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
import matplotlib
import math
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
% matplotlib inline

from madminer.delphes import DelphesProcessor
from madminer.sampling import combine_and_shuffle
from madminer.utils.particle import MadMinerParticle

from madminer.sampling import SampleAugmenter
from madminer.sampling import constant_benchmark_theta, multiple_benchmark_thetas
from madminer.sampling import constant_morphing_theta, multiple_morphing_thetas, random_morphing_thetas
from madminer.ml import MLForge, EnsembleForge

from sklearn.metrics import mean_squared_error

Please enter here the path to your MG5 root directory. This notebook assumes that you installed Delphes and Pythia through MG5. 

**This needs to be updated by the user**

In [2]:
mg_dir = '/Users/felixkling/work/MG5_aMC_v2_6_2'

## 4.  Run detector simulation and extract observables

### 4a) Initialize DelphesProcessor 

The `madminer.delphes` wraps around Delphes, a popular fast detector simulation. In addition to simulating the detector, it allows for the fast extraction of observables, which are saved in the MadMiner HDF5 file. The central object is an instance of the `DelphesProcessor` class, which has to be initialized with a MadMiner file:

In [3]:
dp = DelphesProcessor('data/madminer_example.h5')

09:36  
09:36  ------------------------------------------------------------
09:36  |                                                          |
09:36  |  MadMiner v0.1.1                                         |
09:36  |                                                          |
09:36  |           Johann Brehmer, Kyle Cranmer, and Felix Kling  |
09:36  |                                                          |
09:36  ------------------------------------------------------------
09:36  


### 4b) Run Delphes

After creating the DelphesProcessor object, one can add a number of HepMC event samples (the output of running MadGraph and Pythia) and have it run Delphes (the hepmc sample is mainly needed to ensure that the the benchmark points are assigned correctly):

In [4]:
dp.add_hepmc_sample(
    'mg_processes/wgamma/Events/run_01/tag_1_pythia8_events.hepmc.gz',
    sampled_from_benchmark='sm'
)

dp.run_delphes(
    delphes_directory=mg_dir + '/Delphes',
    delphes_card='cards/delphes_card.dat',
    log_file='logs/wgamma/log_delphes.log',
    initial_command='source ~/.bashrc'
)

09:36  Running Delphes (/Users/felixkling/work/MG5_aMC_v2_6_2/Delphes) on event sample at mg_processes/wgamma/Events/run_01/tag_1_pythia8_events.hepmc.gz


The next step is the definition of observables through a name and a python expression. For the latter, you can use the objects `j[i]`, `e[i]`, `mu[i]`, `a[i]`, `met`, where the indices `i` refer to a ordering by the transverse momentum. All of these objects inherit from scikit-hep [LorentzVectors](http://scikit-hep.org/api/math.html#vector-classes), see the link for a documentation of their properties. In addition, they have `charge` and `pdg_id` properties.

There is an optional keyword `required`. If `required=True`, we will only keep events where the observable can be parsed, i.e. all involved particles have been detected. If `required=False`, un-parseable observables will be filled with the value of another keyword `default`.

### 4c) Extract Detector Level Data

Let's first define again some functions for observables

In [5]:
def calculate_mt(leptons, photons, jets, met):
    # Particles
    if len(leptons) < 1:
        raise RuntimeError()
    l = leptons[0]
    
    # Transverse mass and Delta
    cos_delta_phi = np.cos(l.phi() - met.phi())
    mt = (2 * l.pt * met.pt * (1. - cos_delta_phi))**0.5
    
    return mt

In [6]:
def calculate_phi(leptons, photons, jets, met):
    # Parameters
    mw = 80.4
    
    # Particles
    if len(leptons) < 1 or len(photons) < 1:
        raise RuntimeError()
    
    l = leptons[0]
    a = photons[0]
    
    # Transverse mass and Delta
    mt = calculate_mt(leptons, photons, jets, met)
    deltasq = 0.
    if met.pt > 0. and l.pt > 0.:
        deltasq = (mw**2 - mt**2) / (2. * met.pt * l.pt)
    
    # v reconstruction, "normal" case
    if deltasq > 0.:
        # Two solutions
        temp = np.log(1 + deltasq**0.5 * (2 + deltasq)**0.5 + deltasq)
        eta_v_plus = l.eta + temp
        eta_v_minus = l.eta - temp
        
        # Randomly select one of them
        dice = np.random.rand()
        if dice > 0.5:
            eta_v = eta_v_plus
        else:
            eta_v = eta_v_minus
            
    # v reconstruction, "other" case
    else:
        eta_v = l.eta
        
    # v particle
    v = MadMinerParticle()
    v.setptetaphim(met.pt, eta_v, met.phi(), 0.)
    
    # W and Wgamma reconstruction
    w = l + v
    vv = w + a
    
    # Boost into VV frame
    v_ = v.boost(vv.boostvector)
    l_ = l.boost(vv.boostvector)
    a_ = a.boost(vv.boostvector)
    w_ = w.boost(vv.boostvector)
    r_ = vv # vv.boost(vv.boostvector)

    # Calculate axes of "special frame" (1708.07823)
    z_ = w_.vector.unit()
    x_ = (r_.vector - z_ * r_.vector.dot(z_)).unit()
    y_ = z_.cross(x_)
    
    # Calculate x and y components of lepton wrt special x_, y_, z_ system
    lx_ = l_.vector.dot(x_)
    ly_ = l_.vector.dot(y_)
    
    # Calculate phi
    phi = math.atan2(ly_, lx_)
    
    return phi
    

In [7]:
dp.add_observable('px_l', 'mu[0].px',required=True)
dp.add_observable('px_v', 'met.px',required=True)
dp.add_observable('px_a', 'a[0].px',required=True)

dp.add_observable('py_l', 'mu[0].py',required=True)
dp.add_observable('py_v', 'met.py',required=True)
dp.add_observable('py_a', 'a[0].py',required=True)

dp.add_observable('pz_l', 'mu[0].pz',required=True)
dp.add_observable('pz_a', 'a[0].pz',required=True)

dp.add_observable('e_l', 'mu[0].e',required=True)
dp.add_observable('e_a', 'a[0].e',required=True)

dp.add_observable('pt_l', 'mu[0].pt',required=True)
dp.add_observable('pt_v', 'met.pt',required=True)
dp.add_observable('pt_a', 'a[0].pt',required=True)

dp.add_observable('eta_l', 'mu[0].eta',required=True)
dp.add_observable('eta_a', 'a[0].eta',required=True)

dp.add_observable('dphi_lv', 'mu[0].deltaphi(met)',required=True)
dp.add_observable('dphi_la', 'mu[0].deltaphi(a[0])',required=True)
dp.add_observable('dphi_va', 'met.deltaphi(a[0])',required=True)

dp.add_observable('m_la', '(mu[0] + a[0]).m',required=True)

dp.add_observable_from_function('mt',calculate_mt,required=True)
dp.add_observable_from_function('phi_resurrection',calculate_phi,required=True)

We can also add cuts, again in parse-able strings. In addition to the objects discussed above, they can contain the observables:

In [8]:
dp.add_cut('pt_a > 250.')

The function `analyse_delphes_samples` then calculates all observables from the Delphes ROOT file(s) generated before and applies the cuts:

In [9]:
dp.analyse_delphes_samples()

09:40  Analysing Delphes sample mg_processes/wgamma/Events/run_01/tag_1_pythia8_events_delphes.root
09:43    77552 / 100000 events pass everything


The values of the observables and the weights are then saved in the HDF5 file. It is possible to overwrite the same file, or to leave the original file intact and save all the data into a new file as follows:

In [10]:
dp.save('data/madminer_detectordata_0.h5')

One side remark: For the detector simulation and calculation of observables, different users might have very different requirements. While a phenomenologist might be content with the fast detector simulation from Delphes, an experimental analysis might require the full simulation through Geant4. We therefore intend this part to be interchangeable.

To reduce disk usage, you can generate several small event samples with the steps given above, and combine them now. Note that (for now) it is essential that all of them are generated with the same setup, including the same benchmark points / morphing basis!

In our case we only have one sample, so this is not strictly necessary, but we still include it for completeness.

In [11]:
combine_and_shuffle(
    ['data/madminer_detectordata_0.h5'],
    'data/madminer_detectordata.h5'
)

09:43  Copying setup from data/madminer_detectordata_0.h5 to data/madminer_detectordata.h5
09:43  Loading samples from file 1 / 1 at data/madminer_detectordata_0.h5


### 4d) Extract Pythia Level Data

We can also use the pythia level data from the hepmc file. This can be done with the option `generator_truth=False` in `DelphesProcessor.analyse_delphes_samples()` 

In [12]:
dp.analyse_delphes_samples()
dp.save('data/madminer_pythiadata_0.h5')
combine_and_shuffle(
    ['data/madminer_pythiadata_0.h5'],
    'data/madminer_pythiadata.h5'
)

09:43  Analysing Delphes sample mg_processes/wgamma/Events/run_01/tag_1_pythia8_events_delphes.root
09:46    77552 / 100000 events pass everything
09:46  Copying setup from data/madminer_pythiadata_0.h5 to data/madminer_pythiadata.h5
09:46  Loading samples from file 1 / 1 at data/madminer_pythiadata_0.h5
