# MadMiner particle physics tutorial

# Appendix 1: Adding systematic uncertainties

Johann Brehmer, Felix Kling, Irina Espejo, and Kyle Cranmer 2018-2019

In this tutorial we'll explain how to add systematic uncertainties to the MadMiner workflow. Note that the treatment of systematic uncertainties changed substantially with `MadMiner v0.6`, including changes to the MadMiner file specification. Please don't use files from older MadMiner versions with systematic uncertainties.

## Preparations

Before you execute this notebook, make sure you have working installations of MadGraph, Pythia, and Delphes.

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

import logging
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline

from madminer.core import MadMiner
from madminer.lhe import LHEReader
from madminer.sampling import SampleAugmenter
from madminer import sampling
from madminer.plotting import plot_systematics


Please enter the path to your MG5 root directory here.

In [2]:
mg_dir = '/Users/johannbrehmer/work/projects/madminer/MG5_aMC_v2_6_4'

MadMiner uses the Python `logging` module to provide additional information and debugging output. You can choose how much of this output you want to see by switching the level in the following lines to `logging.DEBUG` or `logging.WARNING`.

In [3]:
# MadMiner output
logging.basicConfig(
    format='%(asctime)-5.5s %(name)-20.20s %(levelname)-7.7s %(message)s',
    datefmt='%H:%M',
    level=logging.DEBUG
)

# Output of all other modules (e.g. matplotlib)
for key in logging.Logger.manager.loggerDict:
    if "madminer" not in key:
        logging.getLogger(key).setLevel(logging.WARNING)

## 1. Parameters and benchmarks

We'll just load the MadMiner setup from the first part of this tutorial:

In [4]:
miner = MadMiner()
miner.load('data/setup.h5')

15:34 madminer.core        INFO    Found 2 parameters:
15:34 madminer.core        INFO       CWL2 (LHA: dim6 2, maximal power in squared ME: (2,), range: (-20.0, 20.0))
15:34 madminer.core        INFO       CPWL2 (LHA: dim6 5, maximal power in squared ME: (2,), range: (-20.0, 20.0))
15:34 madminer.core        INFO    Found 6 benchmarks:
15:34 madminer.core        INFO       sm: CWL2 = 0.00e+00, CPWL2 = 0.00e+00
15:34 madminer.core        INFO       w: CWL2 = 15.20, CPWL2 = 0.10
15:34 madminer.core        INFO       neg_w: CWL2 = -1.54e+01, CPWL2 = 0.20
15:34 madminer.core        INFO       ww: CWL2 = 0.30, CPWL2 = 15.10
15:34 madminer.core        INFO       neg_ww: CWL2 = 0.40, CPWL2 = -1.53e+01
15:34 madminer.core        INFO       morphing_basis_vector_5: CWL2 = -1.68e+01, CPWL2 = -1.72e+01
15:34 madminer.core        INFO    Found morphing setup with 6 components
15:34 madminer.core        INFO    Did not find systematics setup.


## 2. Set up systematics, save settings

This is where things become interesting: we want to model systematic uncertainties. The main function is `add_systematics()`. Its keyword `effect` determines how the effect of a nuisance parameter on the event weights is calculated. For `effect="norm"`, the nuisance parameter rescales the overall cross section of one or multiple samples. For `effect="pdf"`, the effect is calculated from PDF variations. Finally, for `effect="scale"`, scale variations are used.

Here we consider three nuisance parameters: one for the signal normalization, one for the background normalization, and one for scale uncertainties (which we assume here to be correlated between signal and background).

In [6]:
miner.add_systematics(effect="norm", systematic_name="signal_norm", norm_variation=1.1)
miner.add_systematics(effect="norm", systematic_name="bkg_norm", norm_variation=1.2)
miner.add_systematics(effect="scale", systematic_name="scales", scale="mu")
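
To make the `norm_variation` values concrete, here is a minimal sketch of how a normalization nuisance parameter might rescale event weights. It assumes an exponential interpolation in which a nuisance parameter value `nu = 0` reproduces the nominal weights and `nu = 1` multiplies them by `norm_variation`. The helper `rescale_weights` and the weights are hypothetical and not part of MadMiner, whose internal parameterization may differ.

```python
def rescale_weights(weights, norm_variation, nu):
    """Hypothetical helper: multiply each weight by norm_variation ** nu.

    Assumption: nu = 0 is the nominal prediction and nu = 1 corresponds to
    a rescaling of the cross section by the factor norm_variation.
    """
    return [w * norm_variation ** nu for w in weights]


nominal = [0.5, 1.0, 2.0]  # made-up event weights, for illustration only

signal_up = rescale_weights(nominal, norm_variation=1.1, nu=1.0)
signal_down = rescale_weights(nominal, norm_variation=1.1, nu=-1.0)
unchanged = rescale_weights(nominal, norm_variation=1.2, nu=0.0)

print(signal_up)
print(signal_down)
print(unchanged)
```

In this picture the kinematic distributions keep their shape and only the total rate of the affected sample changes.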

Again, we save our setup:

In [7]:
miner.save('data/setup_systematics.h5')

15:34 madminer.core        INFO    Saving setup (including morphing) to data/setup_systematics.h5


## 3. Run MadGraph

Now it's time to run MadGraph. MadMiner will instruct MadGraph to use its built-in `systematics` tool to calculate how the event weights change under the scale variation.

In [10]:
miner.run(
    sample_benchmark='sm',
    mg_directory=mg_dir,
    mg_process_directory='./mg_processes/signal_systematics',
    proc_card_file='cards/proc_card_signal.dat',
    param_card_template_file='cards/param_card_template.dat',
    run_card_file='cards/run_card_signal_small.dat',
    log_directory='logs/signal',
    python2_override=True,
    systematics=["signal_norm", "scales"],
)

15:43 madminer.utils.inter INFO    Generating MadGraph process folder from cards/proc_card_signal.dat at ./mg_processes/signal_systematics
15:43 madminer.core        INFO    Run 0
15:43 madminer.core        INFO      Sampling from benchmark: sm
15:43 madminer.core        INFO      Original run card:       cards/run_card_signal_small.dat
15:43 madminer.core        INFO      Original Pythia8 card:   None
15:43 madminer.core        INFO      Copied run card:         /madminer/cards/run_card_0.dat
15:43 madminer.core        INFO      Copied Pythia8 card:     None
15:43 madminer.core        INFO      Param card:              /madminer/cards/param_card_0.dat
15:43 madminer.core        INFO      Reweight card:           /madminer/cards/reweight_card_0.dat
15:43 madminer.core        INFO      Log file:                run_0.log
15:43 madminer.core        INFO    Creating param and reweight cards in ./mg_processes/signal_systematics//madminer/cards/param_card_0.dat, ./mg_processes/signal_systema

In [11]:
miner.run(
    sample_benchmark='sm',
    mg_directory=mg_dir,
    mg_process_directory='./mg_processes/bkg_systematics',
    proc_card_file='cards/proc_card_background.dat',
    param_card_template_file='cards/param_card_template.dat',
    run_card_file='cards/run_card_background.dat',
    log_directory='logs/background',
    python2_override=True,
    systematics=["bkg_norm", "scales"],
)

15:49 madminer.utils.inter INFO    Generating MadGraph process folder from cards/proc_card_background.dat at ./mg_processes/bkg_systematics
15:49 madminer.core        INFO    Run 0
15:49 madminer.core        INFO      Sampling from benchmark: sm
15:49 madminer.core        INFO      Original run card:       cards/run_card_background.dat
15:49 madminer.core        INFO      Original Pythia8 card:   None
15:49 madminer.core        INFO      Copied run card:         /madminer/cards/run_card_0.dat
15:49 madminer.core        INFO      Copied Pythia8 card:     None
15:49 madminer.core        INFO      Param card:              /madminer/cards/param_card_0.dat
15:49 madminer.core        INFO      Reweight card:           /madminer/cards/reweight_card_0.dat
15:49 madminer.core        INFO      Log file:                run_0.log
15:49 madminer.core        INFO    Creating param and reweight cards in ./mg_processes/bkg_systematics//madminer/cards/param_card_0.dat, ./mg_processes/bkg_systematics//m
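
As a rough picture of what the scale variation provides, the sketch below takes per-event weights evaluated at 0.5, 1 and 2 times the nominal scale (the factors that appear in the systematics setup parsed in the next section) and computes their ratio to the nominal weight. The weights are made-up numbers, and `relative_variations` is a hypothetical helper, not a MadMiner or MadGraph function.

```python
def relative_variations(weights_by_scale, nominal_factor=1.0):
    """Hypothetical helper: ratio of each varied weight to the nominal one."""
    nominal = weights_by_scale[nominal_factor]
    return {
        factor: [w / w0 for w, w0 in zip(weights, nominal)]
        for factor, weights in weights_by_scale.items()
        if factor != nominal_factor
    }


# Made-up weights for two events at 0.5, 1.0 and 2.0 times the nominal scale
weights_by_scale = {
    0.5: [1.2, 2.2],
    1.0: [1.0, 2.0],
    2.0: [0.9, 1.8],
}

ratios = relative_variations(weights_by_scale)
for factor, values in sorted(ratios.items()):
    print(factor, values)
```

Unlike a pure normalization uncertainty, these ratios can differ from event to event, so a scale variation can change the shape of a distribution as well as its rate.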

## 4. Load events from LHE file

When adding LHE or Delphes files, use the `systematics` keyword to list which systematic uncertainties apply to which sample:

In [4]:
lhe = LHEReader('data/setup_systematics.h5')

lhe.add_sample(
    lhe_filename='mg_processes/signal_systematics/Events/run_01/unweighted_events.lhe.gz',
    sampled_from_benchmark='sm',
    is_background=False,
    k_factor=1.1,
    systematics=["signal_norm", "scales"]
)

lhe.add_sample(
    lhe_filename='mg_processes/bkg_systematics/Events/run_01/unweighted_events.lhe.gz',
    sampled_from_benchmark='sm',
    is_background=True,
    k_factor=1.1,
    systematics=["bkg_norm", "scales"]
)

10:24 madminer.utils.inter DEBUG   HDF5 file does not contain is_reference field.
10:24 madminer.lhe         DEBUG   Adding event sample mg_processes/signal_systematics/Events/run_01/unweighted_events.lhe.gz
10:24 madminer.lhe         DEBUG   Adding event sample mg_processes/bkg_systematics/Events/run_01/unweighted_events.lhe.gz
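
Since the `systematics` lists are plain strings, a typo in a name is easy to miss. The following sketch shows one way to check, before adding samples, that every name attached to a sample was defined earlier. The dictionary layout and the helper `undefined_systematics` are assumptions made for this illustration and are not part of the MadMiner API.

```python
# Names passed to add_systematics() in Section 2
defined = {"signal_norm", "bkg_norm", "scales"}

# Systematics attached to each sample (the labels are illustrative)
sample_systematics = {
    "signal": ["signal_norm", "scales"],
    "background": ["bkg_norm", "scales"],
}


def undefined_systematics(sample_systematics, defined):
    """Hypothetical helper: map each sample to names that were never defined."""
    problems = {}
    for sample, names in sample_systematics.items():
        missing = sorted(set(names) - set(defined))
        if missing:
            problems[sample] = missing
    return problems


print(undefined_systematics(sample_systematics, defined))
```

An empty dictionary means that all names are known. A misspelled entry such as `"bkg_nrom"` would be reported under its sample.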


The next steps are unaffected by systematics.

In [5]:
lhe.set_smearing(
    pdgids=[1,2,3,4,5,6,9,21,-1,-2,-3,-4,-5,-6],   # Partons giving rise to jets (quarks and gluons)
    energy_resolution_abs=0.,
    energy_resolution_rel=0.1,
    pt_resolution_abs=None,
    pt_resolution_rel=None,
    eta_resolution_abs=0.1,
    eta_resolution_rel=0.,
    phi_resolution_abs=0.1,
    phi_resolution_rel=0.,
)

lhe.add_observable(
    'pt_j1',
    'j[0].pt',
    required=False,
    default=0.,
)
lhe.add_observable(
    'delta_phi_jj',
    'j[0].deltaphi(j[1]) * (-1. + 2.*float(j[0].eta > j[1].eta))',
    required=True,
)
lhe.add_observable(
    'met',
    'met.pt',
    required=True,
)

lhe.add_cut('(a[0] + a[1]).m > 124.')
lhe.add_cut('(a[0] + a[1]).m < 126.')
lhe.add_cut('pt_j1 > 30.')

10:24 madminer.lhe         DEBUG   Adding optional observable pt_j1 = j[0].pt with default 0.0
10:24 madminer.lhe         DEBUG   Adding required observable delta_phi_jj = j[0].deltaphi(j[1]) * (-1. + 2.*float(j[0].eta > j[1].eta))
10:24 madminer.lhe         DEBUG   Adding required observable met = met.pt
10:24 madminer.lhe         DEBUG   Adding cut (a[0] + a[1]).m > 124.
10:24 madminer.lhe         DEBUG   Adding cut (a[0] + a[1]).m < 126.
10:24 madminer.lhe         DEBUG   Adding cut pt_j1 > 30.


In [6]:
lhe.analyse_samples()
lhe.save('data/lhe_data_systematics.h5')

10:24 madminer.lhe         INFO    Analysing LHE sample mg_processes/signal_systematics/Events/run_01/unweighted_events.lhe.gz: Calculating 3 observables, requiring 3 selection cuts, using 0 efficiency factors, associated with systematics signal_norm, scales
10:24 madminer.lhe         DEBUG   Extracting nuisance parameter definitions from LHE file
10:24 madminer.utils.inter DEBUG   Parsing nuisance parameter setup from LHE file at mg_processes/signal_systematics/Events/run_01/unweighted_events.lhe.gz
10:24 madminer.utils.inter DEBUG   Systematics setup: OrderedDict([(u'signal_norm', (u'norm', 1.1)), (u'scales', (u'scale', 'mu', '0.5,1.0,2.0'))])
10:24 madminer.utils.inter DEBUG   3 weight groups
10:24 madminer.utils.inter DEBUG   Extracting nuisance parameter information for systematic signal_norm
10:24 madminer.utils.inter DEBUG   Extracting nuisance parameter information for systematic scales
10:24 madminer.utils.inter DEBUG   Weight group: <Element 'weightgroup' at 0x115c09150>
10:2

10:25 madminer.utils.inter DEBUG   HDF5 file does not contain is_reference field.
10:25 madminer.utils.inter DEBUG   Adding nuisance benchmark None
10:25 madminer.utils.inter DEBUG   Benchmark morphing_basis_vector_5 already in benchmark_names_phys
10:25 madminer.utils.inter DEBUG   Benchmark neg_w already in benchmark_names_phys
10:25 madminer.utils.inter DEBUG   Benchmark neg_ww already in benchmark_names_phys
10:25 madminer.utils.inter DEBUG   Adding nuisance benchmark scales_nuisance_param_0_benchmark_0
10:25 madminer.utils.inter DEBUG   Adding nuisance benchmark scales_nuisance_param_0_benchmark_1
10:25 madminer.utils.inter DEBUG   Benchmark sm already in benchmark_names_phys
10:25 madminer.utils.inter DEBUG   Benchmark w already in benchmark_names_phys
10:25 madminer.utils.inter DEBUG   Benchmark ww already in benchmark_names_phys


AttributeError: 'NoneType' object has no attribute 'encode'

In [8]:
lhe.nuisance_parameters

OrderedDict([(u'signal_norm_nuisance_param_0', (u'signal_norm', None, None)),
             (u'scales_nuisance_param_0',
              (u'scales',
               u'scales_nuisance_param_0_benchmark_0',
               u'scales_nuisance_param_0_benchmark_1')),
             (u'bkg_norm_nuisance_param_0', (u'bkg_norm', None, None))])
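
The structure of this dictionary can be inspected with a few lines of plain Python. The sketch below works on a literal copy of the output above and reads each value as a tuple of the systematic name followed by two benchmark names, which is an interpretation of the printed output rather than a documented guarantee. The helper `split_by_benchmarks` is hypothetical.

```python
from collections import OrderedDict

# Literal copy of the output shown above
nuisance_parameters = OrderedDict([
    ("signal_norm_nuisance_param_0", ("signal_norm", None, None)),
    ("scales_nuisance_param_0", (
        "scales",
        "scales_nuisance_param_0_benchmark_0",
        "scales_nuisance_param_0_benchmark_1",
    )),
    ("bkg_norm_nuisance_param_0", ("bkg_norm", None, None)),
])


def split_by_benchmarks(nuisance_parameters):
    """Hypothetical helper: separate entries with and without benchmark names."""
    with_benchmarks, without_benchmarks = [], []
    for name, (systematic, bench_a, bench_b) in nuisance_parameters.items():
        if bench_a is None and bench_b is None:
            without_benchmarks.append(name)
        else:
            with_benchmarks.append(name)
    return with_benchmarks, without_benchmarks


with_benchmarks, without_benchmarks = split_by_benchmarks(nuisance_parameters)
print("with benchmarks:", with_benchmarks)
print("without benchmarks:", without_benchmarks)
```

In this output only the scale uncertainty carries benchmark names, while the two normalization uncertainties have none.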

### A look at distributions

The function `plot_systematics()` makes it easy to check the effect of the various nuisance parameters on a distribution:

In [None]:
_ = plot_systematics(
    filename='data/lhe_data_systematics.h5',
    theta=np.array([0.,0.]),
    observable="pt_j1",
    obs_label="$p_{T,j}$",
    obs_range=(20.,400.),
)

## 5. Sampling

To be continued...

## 6. Training

## 7. Fisher information

### Calculate Fisher information

### Plot Fisher contours

### "Profiled score"