## Preparations

Let us first load all the Python libraries again.

In [1]:
# __future__ imports belong at the top (required if this cell is run as a plain script)
from __future__ import absolute_import, division, print_function, unicode_literals

import sys
import os

# Local checkout of MadMiner; adjust this path to your own setup
madminer_src_path = "/Users/felixkling/Documents/GitHub/madminer"
sys.path.append(madminer_src_path)

import numpy as np
import matplotlib
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
%matplotlib inline

from madminer.delphes import DelphesProcessor
from madminer.sampling import combine_and_shuffle
from madminer.utils.particle import MadMinerParticle

from madminer.fisherinformation import FisherInformation
from madminer.fisherinformation import project_information, profile_information

from madminer.plotting import plot_fisher_information_contours_2d

from madminer.sampling import SampleAugmenter
from madminer.sampling import constant_benchmark_theta, multiple_benchmark_thetas
from madminer.sampling import constant_morphing_theta, multiple_morphing_thetas, random_morphing_thetas
from madminer.ml import MLForge, EnsembleForge

from sklearn.metrics import mean_squared_error

## 10. Run MadMiner at Detector Level

Let's first define the input files and the number of samples, which should reflect the *effective* number of samples.

In [2]:
lhedatafile = 'data/madminer_lhedata.h5'
detectordatafile = 'data/madminer_detectordata.h5'
pythiadatafile = 'data/madminer_pythiadata.h5'

nsamples = 100000

### 10a) Run the Data Augmentation and Machine Learning part with Detector Level Data

First, we once again run the data augmentation and machine learning steps. Here, `n_samples` should be chosen close to the effective number of events, which also depends on the cuts chosen earlier.
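
The notebook does not say how the effective number of events is defined. One common choice for a weighted sample is the Kish effective sample size, $(\sum_i w_i)^2 / \sum_i w_i^2$. The sketch below assumes that definition; the helper `effective_sample_size` and the example weights are hypothetical and not part of MadMiner.

```python
import numpy as np


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


# Hypothetical per-event weights, for illustration only
uniform_weights = np.ones(1000)
skewed_weights = np.concatenate([np.ones(990), 100.0 * np.ones(10)])

n_eff_uniform = effective_sample_size(uniform_weights)
n_eff_skewed = effective_sample_size(skewed_weights)

print("uniform weights:", n_eff_uniform)
print("skewed weights: ", n_eff_skewed)
```

With equal weights the effective size equals the number of events. A few very large weights reduce it sharply, which is one reason to keep `n_samples` from exceeding what the weighted sample can support.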

In [3]:
sa = SampleAugmenter(detectordatafile, debug=False)

n_estimators = 5

for i in range(n_estimators):
    x, theta, t_xz = sa.extract_samples_train_local(
        theta=constant_benchmark_theta('sm'),
        n_samples=int(nsamples/2),
        folder='./data/samples_detector/',
        filename='train{}'.format(i)
    )

x, theta, t_xz = sa.extract_samples_train_local(
    theta=constant_benchmark_theta('sm'),
    n_samples=int(nsamples/2),
    folder='./data/samples_detector/',
    filename='test',
    switch_train_test_events=True
)

10:01  
10:01  ------------------------------------------------------------
10:01  |                                                          |
10:01  |  MadMiner v0.1.1                                         |
10:01  |                                                          |
10:01  |           Johann Brehmer, Kyle Cranmer, and Felix Kling  |
10:01  |                                                          |
10:01  ------------------------------------------------------------
10:01  
10:01  Loading data from data/madminer_detectordata.h5
10:01  Found 2 parameters:
10:01     CWL2 (LHA: dim6 2, maximal power in squared ME: (2,), range: (-50.0, 50.0))
10:01     CPWL2 (LHA: dim6 5, maximal power in squared ME: (2,), range: (-50.0, 50.0))
10:01  Found 6 benchmarks:
10:01     sm: CWL2 = 0.00e+00, CPWL2 = 0.00e+00
10:01     w: CWL2 = 20.00, CPWL2 = 0.00e+00
10:01     morphing_basis_vector_2: CWL2 = -3.26e+01, CPWL2 = -4.46e+01
10:01     morphing_basis_vector_3: CWL2 = 8.14, CPWL2 = -3.50e+
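
Before training, it can help to check that the extracted observables `x` and joint scores `t_xz` are consistent with each other. The sketch below uses a hypothetical helper, `check_training_arrays`, which is not part of MadMiner, and runs it on synthetic arrays rather than on the files written above.

```python
import numpy as np


def check_training_arrays(x, t_xz, n_parameters):
    """Hypothetical helper: consistency checks on observables x and joint scores t_xz."""
    x = np.asarray(x)
    t_xz = np.asarray(t_xz)
    if x.ndim != 2 or t_xz.ndim != 2:
        raise ValueError("x and t_xz are expected to be 2D arrays")
    if x.shape[0] != t_xz.shape[0]:
        raise ValueError("x and t_xz have different numbers of events")
    if t_xz.shape[1] != n_parameters:
        raise ValueError("t_xz should have one column per parameter")
    if not (np.all(np.isfinite(x)) and np.all(np.isfinite(t_xz))):
        raise ValueError("found NaN or infinite entries")
    # Return (number of events, number of observables)
    return x.shape[0], x.shape[1]


# Synthetic stand-ins for the saved arrays (2 parameters, 21 observables)
rng = np.random.RandomState(0)
x_demo = rng.normal(size=(500, 21))
t_xz_demo = rng.normal(size=(500, 2))

n_events, n_observables = check_training_arrays(x_demo, t_xz_demo, n_parameters=2)
print(n_events, n_observables)
```

To apply it to the real samples, one would load the arrays with `np.load`, for instance from `data/samples_detector/x_train0.npy` and `data/samples_detector/t_xz_train0.npy`, the file names used in the training cell.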

Next, we perform the ML part.

In [4]:
ensemble = EnsembleForge(estimators=n_estimators)
ensemble.train_all(
    method='sally',
    x_filename=['data/samples_detector/x_train{}.npy'.format(i) for i in range(n_estimators)],
    t_xz0_filename=['data/samples_detector/t_xz_train{}.npy'.format(i) for i in range(n_estimators)],
    
    n_epochs=100,
    n_hidden=(100,100,100,100),
    activation='tanh',
    initial_lr=0.001,
    final_lr=0.0001
)

ensemble.save('models/samples_detector')

10:01  Training 5 estimators in ensemble
10:01  Training estimator 1 / 5 in ensemble
10:01  Starting training
10:01    Method:                 sally
10:01    Training data: x at data/samples_detector/x_train0.npy
10:01                   t_xz (theta0) at  data/samples_detector/t_xz_train0.npy
10:01    Features:               all
10:01    Method:                 sally
10:01    Hidden layers:          (100, 100, 100, 100)
10:01    Activation function:    tanh
10:01    Batch size:             128
10:01    Trainer:                amsgrad
10:01    Epochs:                 100
10:01    Learning rate:          0.001 initially, decaying to 0.0001
10:01    Validation split:       None
10:01    Early stopping:         True
10:01    Scale inputs:           True
10:01    Shuffle labels          False
10:01    Regularization:         None
10:01  Loading training data
10:01  Found 50000 samples with 2 parameters and 21 observables
10:01  Rescaling inputs
10:01  Creating model for method sally
10:01  T
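
The log reports a learning rate of "0.001 initially, decaying to 0.0001" over 100 epochs, but does not show the shape of the decay. The sketch below assumes an exponential interpolation between `initial_lr` and `final_lr`; this is an assumption made for illustration, and `exponential_lr_schedule` is a hypothetical helper, not a MadMiner function.

```python
import numpy as np


def exponential_lr_schedule(initial_lr, final_lr, n_epochs):
    """Assumed schedule: geometric interpolation from initial_lr to final_lr."""
    if n_epochs == 1:
        return np.array([initial_lr])
    # Fraction of training completed at the start of each epoch, from 0 to 1
    fraction = np.arange(n_epochs) / float(n_epochs - 1)
    return initial_lr * (final_lr / initial_lr) ** fraction


lrs = exponential_lr_schedule(initial_lr=0.001, final_lr=0.0001, n_epochs=100)
print(lrs[0], lrs[49], lrs[-1])
```

Under this assumption the learning rate shrinks by the same factor every epoch, so most of the training runs at rates between the two endpoints rather than close to either one.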

### 10b) Run the Data Augmentation and Machine Learning part with Pythia Level Data

Now we repeat the same procedure, but with the Pythia-level data.

In [5]:
sa = SampleAugmenter(pythiadatafile, debug=False)

n_estimators = 5

for i in range(n_estimators):
    x, theta, t_xz = sa.extract_samples_train_local(
        theta=constant_benchmark_theta('sm'),
        n_samples=int(nsamples/2),
        folder='./data/samples_pythia/',
        filename='train{}'.format(i)
    )

x, theta, t_xz = sa.extract_samples_train_local(
    theta=constant_benchmark_theta('sm'),
    n_samples=int(nsamples/2),
    folder='./data/samples_pythia/',
    filename='test',
    switch_train_test_events=True
)

10:13  Loading data from data/madminer_pythiadata.h5
10:13  Found 2 parameters:
10:13     CWL2 (LHA: dim6 2, maximal power in squared ME: (2,), range: (-50.0, 50.0))
10:13     CPWL2 (LHA: dim6 5, maximal power in squared ME: (2,), range: (-50.0, 50.0))
10:13  Found 6 benchmarks:
10:13     sm: CWL2 = 0.00e+00, CPWL2 = 0.00e+00
10:13     w: CWL2 = 20.00, CPWL2 = 0.00e+00
10:13     morphing_basis_vector_2: CWL2 = -3.26e+01, CPWL2 = -4.46e+01
10:13     morphing_basis_vector_3: CWL2 = 8.14, CPWL2 = -3.50e+01
10:13     morphing_basis_vector_4: CWL2 = -3.51e+01, CPWL2 = 32.41
10:13     morphing_basis_vector_5: CWL2 = 5.86, CPWL2 = 39.09
10:13  Found 21 observables: px_l, px_v, px_a, py_l, py_v, py_a, pz_l, pz_a, e_l, e_a, pt_l, pt_v, pt_a, eta_l, eta_a, dphi_lv, dphi_la, dphi_va, m_la, mt, phi_resurrection
10:13  Found 77552 events
10:13  Found morphing setup with 6 components
10:13  Extracting training sample for local score regression. Sampling and score evaluation according to (u'benchmark
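
The Pythia-level file reports 21 observables, while the parton-level file loaded in 10c reports 26. The sketch below compares the two lists, copied from the log output in this notebook, to show which observables differ. It uses plain Python and no MadMiner calls.

```python
# Observable names copied from the Pythia-level log output
pythia_observables = [
    "px_l", "px_v", "px_a", "py_l", "py_v", "py_a", "pz_l", "pz_a",
    "e_l", "e_a", "pt_l", "pt_v", "pt_a", "eta_l", "eta_a",
    "dphi_lv", "dphi_la", "dphi_va", "m_la", "mt", "phi_resurrection",
]

# Observable names copied from the parton-level log output in 10c
parton_observables = [
    "px_l", "px_v", "px_a", "py_l", "py_v", "py_a", "pz_l", "pz_v", "pz_a",
    "e_l", "e_v", "e_a", "pt_l", "pt_v", "pt_a", "eta_l", "eta_v", "eta_a",
    "dphi_lv", "dphi_la", "dphi_va", "m_lv", "m_lva", "m_la", "mtrans",
    "phi_resurrection",
]

# Keep the original ordering when computing the differences
only_parton = [name for name in parton_observables if name not in pythia_observables]
only_pythia = [name for name in pythia_observables if name not in parton_observables]

print("Only at parton level:", only_parton)
print("Only at Pythia level:", only_pythia)
```

The observables that appear only at parton level all involve the neutrino's longitudinal momentum or energy, presumably because these are not reconstructed after showering. The transverse mass seems to be stored under a different name (`mtrans` versus `mt`) in the two files.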

In [6]:
ensemble = EnsembleForge(estimators=n_estimators)
ensemble.train_all(
    method='sally',
    x_filename=['data/samples_pythia/x_train{}.npy'.format(i) for i in range(n_estimators)],
    t_xz0_filename=['data/samples_pythia/t_xz_train{}.npy'.format(i) for i in range(n_estimators)],
    
    n_epochs=100,
    n_hidden=(100,100,100,100),
    activation='tanh',
    initial_lr=0.001,
    final_lr=0.0001
)

ensemble.save('models/samples_pythia')

10:13  Training 5 estimators in ensemble
10:13  Training estimator 1 / 5 in ensemble
10:13  Starting training
10:13    Method:                 sally
10:13    Training data: x at data/samples_pythia/x_train0.npy
10:13                   t_xz (theta0) at  data/samples_pythia/t_xz_train0.npy
10:13    Features:               all
10:13    Method:                 sally
10:13    Hidden layers:          (100, 100, 100, 100)
10:13    Activation function:    tanh
10:13    Batch size:             128
10:13    Trainer:                amsgrad
10:13    Epochs:                 100
10:13    Learning rate:          0.001 initially, decaying to 0.0001
10:13    Validation split:       None
10:13    Early stopping:         True
10:13    Scale inputs:           True
10:13    Shuffle labels          False
10:13    Regularization:         None
10:13  Loading training data
10:13  Found 50000 samples with 2 parameters and 21 observables
10:13  Rescaling inputs
10:13  Creating model for method sally
10:13  Train

### 10c) Obtain the Fisher Info

Now let's evaluate the Fisher information again.

In [None]:
fisher_parton = FisherInformation(lhedatafile, debug=False)

fi_ml_mean, fi_ml_covariance = fisher_parton.calculate_fisher_information_full_detector(
    theta=[0.,0.],
    model_file='models/samples_ensemble',
    unweighted_x_sample_file='data/samples_ensemble/x_test.npy',
    luminosity=300*1000.  # 300 fb^-1, assuming the argument is given in pb^-1
)

fi_metonly_mean, fi_metonly_covariance = fisher_parton.calculate_fisher_information_full_detector(
    theta=[0.,0.],
    model_file='models/samples_metonly',
    unweighted_x_sample_file='data/samples_ensemble/x_test.npy',
    luminosity=300*1000.
)

fi_truth_mean, fi_truth_covariance = fisher_parton.calculate_fisher_information_full_truth(
    theta=[0.,0.],
    luminosity=300*1000.
)

fisher_detector = FisherInformation(detectordatafile, debug=False)

fi_detector_mean, fi_detector_covariance = fisher_detector.calculate_fisher_information_full_detector(
    theta=[0.,0.],
    model_file='models/samples_detector',
    unweighted_x_sample_file='data/samples_detector/x_test.npy',
    luminosity=300*1000.
)

fisher_pythia = FisherInformation(pythiadatafile, debug=False)

fi_pythia_mean, fi_pythia_covariance = fisher_pythia.calculate_fisher_information_full_detector(
    theta=[0.,0.],
    model_file='models/samples_pythia',
    unweighted_x_sample_file='data/samples_pythia/x_test.npy',
    luminosity=300*1000.
)

10:23  Loading data from data/madminer_lhedata.h5
10:23  Found 2 parameters:
10:23     CWL2 (LHA: dim6 2, maximal power in squared ME: (2,), range: (-50.0, 50.0))
10:23     CPWL2 (LHA: dim6 5, maximal power in squared ME: (2,), range: (-50.0, 50.0))
10:23  Found 6 benchmarks:
10:23     sm: CWL2 = 0.00e+00, CPWL2 = 0.00e+00
10:23     w: CWL2 = 20.00, CPWL2 = 0.00e+00
10:23     morphing_basis_vector_2: CWL2 = -3.26e+01, CPWL2 = -4.46e+01
10:23     morphing_basis_vector_3: CWL2 = 8.14, CPWL2 = -3.50e+01
10:23     morphing_basis_vector_4: CWL2 = -3.51e+01, CPWL2 = 32.41
10:23     morphing_basis_vector_5: CWL2 = 5.86, CPWL2 = 39.09
10:23  Found 26 observables: px_l, px_v, px_a, py_l, py_v, py_a, pz_l, pz_v, pz_a, e_l, e_v, e_a, pt_l, pt_v, pt_a, eta_l, eta_v, eta_a, dphi_lv, dphi_la, dphi_va, m_lv, m_lva, m_la, mtrans, phi_resurrection
10:23  Found 100000 events
10:23  Found morphing setup with 6 components
10:23  Evaluating rate Fisher information
10:23  Found ensemble with 5 estimators an
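
Each ensemble-based call above returns a mean and a covariance. One way to picture this is that every estimator in the ensemble yields its own Fisher information matrix, and the spread across estimators measures the uncertainty of the ML estimate. The sketch below illustrates that idea with NumPy only. How MadMiner normalizes or combines the ensemble internally is not stated in this notebook, so `ensemble_mean_and_covariance` and the example matrices are hypothetical.

```python
import numpy as np


def ensemble_mean_and_covariance(matrices):
    """Mean and element-wise covariance of a stack of Fisher information matrices.

    matrices: array of shape (n_estimators, n_params, n_params)
    returns:  mean of shape (n_params, n_params) and
              covariance of shape (n_params, n_params, n_params, n_params)
    """
    matrices = np.asarray(matrices, dtype=float)
    n_estimators = matrices.shape[0]
    mean = matrices.mean(axis=0)
    # Treat each flattened matrix as one observation of n_params^2 variables
    flat = matrices.reshape(n_estimators, -1)
    covariance = np.cov(flat, rowvar=False).reshape(mean.shape + mean.shape)
    return mean, covariance


# Five hypothetical 2x2 Fisher information matrices, one per estimator
rng = np.random.RandomState(1)
base = np.array([[4.0, 0.5], [0.5, 1.0]])
demo_matrices = np.array([base + 0.1 * rng.normal(size=(2, 2)) for _ in range(5)])

demo_mean, demo_covariance = ensemble_mean_and_covariance(demo_matrices)
print(demo_mean)
print(demo_covariance.shape)
```

The entry `demo_covariance[i, j, k, l]` is the sample covariance between elements `(i, j)` and `(k, l)` of the Fisher information across the ensemble.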

And let's plot the result.

In [None]:
_ = plot_fisher_information_contours_2d(
    [fi_ml_mean, fi_metonly_mean, fi_detector_mean, fi_pythia_mean, fi_truth_mean],
    [fi_ml_covariance, fi_metonly_covariance, fi_detector_covariance, fi_pythia_covariance, fi_truth_covariance],
    colors=[u'C0', u'C1', u'C2', u'C3', "black"],
    linestyles=["solid", "solid", "solid", "solid", "dashed"],
    inline_labels=["PL: all", "PL: MET", "Detector", "Pythia", "PL: truth"],
    xrange=(-15, 15),
    yrange=(-5, 5)
)
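
To read such contours, recall that a Fisher information matrix $I$ defines a local distance $d^2 = \Delta\theta^T I \, \Delta\theta$ around the reference point, so a contour of constant $d$ is an ellipse. The sketch below computes the ellipse semi-axes and the per-parameter uncertainties from the inverse matrix for a hypothetical 2x2 example. Which distance level `plot_fisher_information_contours_2d` draws is not stated in this notebook, so `distance=1.0` is an assumption, and both helpers are illustrative rather than MadMiner functions.

```python
import numpy as np


def fisher_ellipse(fisher_info, distance=1.0):
    """Semi-axes and principal directions of the contour d^2 = theta^T I theta."""
    eigenvalues, eigenvectors = np.linalg.eigh(np.asarray(fisher_info, dtype=float))
    # Larger information along a direction means a shorter semi-axis
    semi_axes = distance / np.sqrt(eigenvalues)
    return semi_axes, eigenvectors


def marginal_uncertainties(fisher_info):
    """Square roots of the diagonal of the inverse Fisher information."""
    covariance = np.linalg.inv(np.asarray(fisher_info, dtype=float))
    return np.sqrt(np.diag(covariance))


# Hypothetical Fisher information, for illustration only
demo_info = np.array([[4.0, 0.0], [0.0, 1.0]])

semi_axes, directions = fisher_ellipse(demo_info)
sigmas = marginal_uncertainties(demo_info)

print("semi-axes:", semi_axes)
print("per-parameter uncertainties:", sigmas)
```

`np.linalg.eigh` returns eigenvalues in ascending order, so the first semi-axis is the longest one. A tighter contour in the plot therefore corresponds to more information about the parameters.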