# MadMiner tutorial 2: SALLY, Fisher information, and ensemble methods

Johann Brehmer, Felix Kling, Kyle Cranmer 2018

In the first tutorial you saw how to set up a process with MadMiner, generate events, turn them into training samples, and construct likelihood ratio estimators from them. Here we take an alternative route after the event generation: we train an estimator for the score at a reference point (SALLY) and use it to calculate the expected Fisher information. Along the way, we'll introduce some powerful ensemble methods.

If you're not familiar with SALLY, please have a look at ["Constraining Effective Field Theories With Machine Learning"](https://arxiv.org/abs/1805.00013)
or, for more details, ["A Guide to Constraining Effective Field Theories With Machine Learning"](https://arxiv.org/abs/1805.00020). For the Fisher information part, see the original publication ["Better Higgs Measurements Through Information Geometry"](https://arxiv.org/abs/1612.05261) or a more detailed, pedagogical introduction in Chapter 4 of ["New Ideas for Effective Higgs Measurements"](https://inspirehep.net/record/1624219).

## Preparations

Make sure you've run the first tutorial before executing this notebook!

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

import logging
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline

from madminer.sampling import SampleAugmenter, constant_benchmark_theta
from madminer.ml import MLForge, EnsembleForge
from madminer.fisherinformation import FisherInformation
from madminer.plotting import plot_fisher_information_contours_2d


Please enter the path to your MG5 root directory here. This notebook assumes that you installed Delphes and Pythia through MG5.

In [2]:
mg_dir = '/Users/johannbrehmer/work/projects/madminer/MG5_aMC_v2_6_4'

MadMiner uses the Python `logging` module to provide additional information and debugging output. You can choose how much of this output you want to see by switching the level in the following lines to `logging.DEBUG` or `logging.WARNING`.

In [3]:
# MadMiner output
logging.basicConfig(
    format='%(asctime)-5.5s %(name)-20.20s %(levelname)-7.7s %(message)s',
    datefmt='%H:%M',
    level=logging.INFO
)

# Output of all other modules (e.g. matplotlib)
for key in logging.Logger.manager.loggerDict:
    if "madminer" not in key:
        logging.getLogger(key).setLevel(logging.WARNING)

## 1. to 5.: see tutorial 1.

We assume that you have run steps 1 through 5 of the first tutorial, and thus have a MadMiner file with observables and event weights ready to go!

## 6. Make (unweighted) training and test samples with augmented data

Again, the next step is handled by the MadMiner class `SampleAugmenter`.

In [4]:
sa = SampleAugmenter('data/madminer_example_shuffled.h5')

16:13 madminer.sampling    INFO    Loading data from data/madminer_example_shuffled.h5
16:13 madminer.sampling    INFO    Found 2 parameters
16:13 madminer.sampling    INFO    Did not find nuisance parameters
16:13 madminer.sampling    INFO    Found 6 benchmarks, of which 6 physical
16:13 madminer.sampling    INFO    Found 2 observables
16:13 madminer.sampling    INFO    Found 1018 events
16:13 madminer.sampling    INFO    Found morphing setup with 6 components


The relevant `SampleAugmenter` function for local score estimators is `extract_samples_train_local()`. As before, for the argument `theta` you can use the helper functions `constant_benchmark_theta()`, `multiple_benchmark_thetas()`, `constant_morphing_theta()`, `multiple_morphing_thetas()`, and `random_morphing_thetas()`.

In [5]:
x, theta, t_xz = sa.extract_samples_train_local(
    theta=constant_benchmark_theta('sm'),
    n_samples=100000,
    folder='./data/samples',
    filename='train0'
)

16:13 madminer.sampling    INFO    Extracting training sample for local score regression. Sampling and score evaluation according to ('benchmark', 'sm')
16:13 madminer.sampling    INFO    Effective number of samples: 44.0585179822994
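The joint score `t_xz` in this training sample is the regression target for SALLY. As a toy illustration of how such a quantity can be computed from event weights (a hypothetical one-parameter setup, not MadMiner's implementation): if each event's weight is quadratic in a parameter θ, as for the interference and squared terms of a single EFT operator, then the joint score is the θ-derivative of the log event weight minus that of the log total cross section. Weighted by the event weights at θ, it averages to zero, as a score should.

```python
import numpy as np

def joint_score_quadratic(a, b, c, theta):
    """Toy joint score for events whose weights are quadratic in one parameter.

    Hypothetical setup: event i has weight w_i(theta) = a_i + b_i*theta + c_i*theta**2,
    and the total cross section is approximated by the sum of all event weights.
    Joint score: t_i = d/dtheta log w_i(theta) - d/dtheta log sigma(theta).
    """
    w = a + b * theta + c * theta**2      # per-event weights at theta
    dw = b + 2.0 * c * theta              # their derivatives with respect to theta
    sigma = w.sum()                       # total cross section as sum of weights
    dsigma = dw.sum()
    return dw / w - dsigma / sigma

rng = np.random.default_rng(0)
n_events = 1000
a = rng.uniform(1.0, 2.0, n_events)   # SM-like part, kept positive
b = rng.normal(0.0, 0.5, n_events)    # interference term
c = rng.uniform(0.0, 0.3, n_events)   # squared EFT term

theta_ref = 0.0
t_xz = joint_score_quadratic(a, b, c, theta_ref)

# The weighted mean of the joint score at theta_ref vanishes (up to rounding)
w_ref = a + b * theta_ref + c * theta_ref**2
print(np.sum(w_ref * t_xz) / np.sum(w_ref))
```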


## 7a. Train one network to estimate score and Fisher information

It's now time to build a neural network. This time, instead of the likelihood ratio itself, we will estimate the score: the gradient of the log likelihood with respect to the theory parameters. To be precise, the output of the neural network is an estimate of the score at some reference parameter point, for instance the Standard Model. A neural network that estimates this "local" score can be used to calculate the Fisher information at that point. The estimated score can also serve as a machine-learning version of Optimal Observables, and likelihoods can be estimated with density estimation in the space of estimated scores. This method for likelihood ratio estimation is called SALLY, and there is a closely related version called SALLINO. Both are explained in ["Constraining Effective Field Theories With Machine Learning"](https://arxiv.org/abs/1805.00013) and ["A Guide to Constraining Effective Field Theories With Machine Learning"](https://arxiv.org/abs/1805.00020).
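For reference, here is the idea behind SALLY in formulas, summarizing the papers cited above. The score at the reference point is

$$ t(x|\theta_\text{ref}) = \nabla_\theta \log p(x|\theta) \Big|_{\theta = \theta_\text{ref}} . $$

It is intractable, but for each simulated event we can compute the joint score $t(x,z|\theta_\text{ref}) = \nabla_\theta \log p(x,z|\theta) \big|_{\theta_\text{ref}}$, which also depends on the latent variables $z$ of the simulation. SALLY trains a network $\hat{t}(x)$ by minimizing the mean squared error

$$ L[\hat{t}] = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{t}(x_i) - t(x_i, z_i|\theta_\text{ref}) \right|^2 , $$

whose minimizer (given enough data and a flexible enough network) is the conditional expectation $\mathbb{E}\left[t(x,z|\theta_\text{ref}) \,\middle|\, x\right] = t(x|\theta_\text{ref})$.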

Again, the central object for this is the `madminer.ml.MLForge` class:

In [6]:
forge = MLForge(debug=False)

### Training

In [7]:
forge.train(
    method='sally',
    x_filename='data/samples/x_train0.npy',
    t_xz0_filename='data/samples/t_xz_train0.npy',
    n_epochs=50,
    batch_size=256,
    validation_split=None
)

forge.save('models/sally')

16:13 madminer.ml          INFO    Starting training
16:13 madminer.ml          INFO      Method:                 sally
16:13 madminer.ml          INFO      Training data: x at data/samples/x_train0.npy
16:13 madminer.ml          INFO                     t_xz (theta0) at  data/samples/t_xz_train0.npy
16:13 madminer.ml          INFO      Features:               all
16:13 madminer.ml          INFO      Method:                 sally
16:13 madminer.ml          INFO      Hidden layers:          (100, 100)
16:13 madminer.ml          INFO      Activation function:    tanh
16:13 madminer.ml          INFO      Batch size:             256
16:13 madminer.ml          INFO      Trainer:                amsgrad
16:13 madminer.ml          INFO      Epochs:                 50
16:13 madminer.ml          INFO      Learning rate:          0.001 initially, decaying to 0.0001
16:13 madminer.ml          INFO      Validation split:       None
16:13 madminer.ml          INFO      Early stopping:         True

### Evaluation

Let's evaluate the estimated score at the SM reference point on the test data:

In [8]:
forge.load('models/sally')

# Estimate the score t(x | theta_ref) for every test event;
# the path to the test observables is passed as the first argument
t_hat = forge.evaluate('data/samples/x_test.npy')

Let's have a look at the estimated score and how it is related to the observables:

In [None]:
x = np.load('data/samples/x_test.npy')
skip = 10  # only plot every 10th event to keep the scatter plot readable

fig = plt.figure(figsize=(10, 4))

# One panel per score component, i.e. per theory parameter
for i in range(2):

    ax = plt.subplot(1, 2, i + 1)

    # Observables on the axes, estimated score component as color
    sc = plt.scatter(x[::skip, 0], x[::skip, 1], c=t_hat[::skip, i], s=10., cmap='viridis', vmin=-0.05, vmax=0.02)
    cbar = plt.colorbar(sc)

    cbar.set_label(r'$\hat{t}_' + str(i) + r'(x | \theta_{ref})$')
    plt.xlabel(r'$p_{T,j1}$ [GeV]')
    plt.ylabel(r'$\Delta \phi_{jj}$')
    plt.xlim(10., 400.)
    plt.ylim(0., 6.2)

plt.tight_layout()
plt.show()


### Fisher information

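In addition, we can use SALLY estimators to estimate the expected Fisher information. Since the network estimates the score at the reference point it was trained at, the Fisher information should be evaluated at that same point, here `theta=[0.,0.]`, the SM benchmark from tutorial 1.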

In [None]:
fisher = FisherInformation('data/madminer_example_shuffled.h5', debug=False)

In [None]:
fisher_information, _ = fisher.calculate_fisher_information_full_detector(
    theta=[0.,0.],  # SM reference point, at which the score estimator was trained
    model_file='models/sally',
    unweighted_x_sample_file='data/samples/x_test.npy',
    luminosity=300000.  # in pb^-1, i.e. 300 fb^-1
)

print('Kinematic Fisher information after 300 ifb:\n{}'.format(fisher_information))
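To see what such a calculation involves, here is a minimal NumPy sketch (our own helper, not MadMiner's implementation). For an extended likelihood with a Poisson-distributed number of events, the expected Fisher information at θ_ref is the sum of a kinematic term L σ E[t tᵀ], with the expectation taken over events drawn at θ_ref, and a rate term L (∂σ)(∂σ)ᵀ / σ. We assume σ in pb and L in pb⁻¹, consistent with `luminosity=300000.` corresponding to 300 fb⁻¹. Whether and how MadMiner includes the rate term is not shown here.

```python
import numpy as np

def fisher_information_from_scores(t_hat, xsec, dxsec, luminosity):
    """Expected Fisher information from estimated scores (illustrative sketch).

    t_hat:      (n_events, n_params) estimated scores t(x|theta_ref) for an
                unweighted sample drawn at theta_ref
    xsec:       total cross section sigma(theta_ref), assumed in pb
    dxsec:      (n_params,) gradient of sigma at theta_ref, assumed in pb
    luminosity: integrated luminosity, assumed in pb^-1
    """
    n_events = t_hat.shape[0]
    # Kinematic term: L * sigma * E[t_i t_j], averaged over the sample
    info_kin = luminosity * xsec * (t_hat.T @ t_hat) / n_events
    # Rate term from the Poisson factor: L * (d_i sigma)(d_j sigma) / sigma
    info_rate = luminosity * np.outer(dxsec, dxsec) / xsec
    return info_kin + info_rate

# Usage with hypothetical numbers
rng = np.random.default_rng(1)
t_hat_toy = rng.normal(size=(10000, 2))
info = fisher_information_from_scores(t_hat_toy, xsec=1.0, dxsec=np.array([0.1, -0.2]), luminosity=300000.)
print(info)
```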

We can visualize the Fisher information as contours of constant Fisher distance from the reference point in the parameter plane:

In [None]:
_ = plot_fisher_information_contours_2d(
    [fisher_information],
    xrange=(-2.,2.),
    yrange=(-2.,2.)
)
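Locally, the Fisher distance between the reference point θ_ref and a nearby point θ is approximately d ≈ √(Δθᵀ I Δθ) with Δθ = θ − θ_ref, so contours of constant distance are ellipses. As a cross-check of such a plot, here is a minimal NumPy sketch that computes the ellipse for a 2×2 Fisher matrix, for example `fisher_information` from above. The helper name and the default `distance=1.0` are our own choices and need not match what `plot_fisher_information_contours_2d` draws.

```python
import numpy as np

def fisher_contour(info, theta_ref=(0.0, 0.0), distance=1.0, n_points=200):
    """Points (n_points, 2) on the ellipse (theta - theta_ref)^T I (theta - theta_ref) = distance**2."""
    info = np.asarray(info, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(info)   # I = V diag(lambda) V^T
    if np.any(eigvals <= 0.0):
        raise ValueError("the Fisher information must be positive definite to give a closed contour")
    phi = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.stack([np.cos(phi), np.sin(phi)])   # unit circle, shape (2, n_points)
    # Delta theta = distance * V diag(1/sqrt(lambda)) u satisfies Delta theta^T I Delta theta = distance^2
    delta = distance * eigvecs @ (circle / np.sqrt(eigvals)[:, None])
    return delta.T + np.asarray(theta_ref, dtype=float)

points = fisher_contour([[4.0, 1.0], [1.0, 2.0]])
```

The resulting points can be drawn with `plt.plot(points[:, 0], points[:, 1])`. Since the Fisher information grows linearly with the luminosity, the contour at fixed distance shrinks like 1/√L.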

## 7b. Ensemble methods

Instead of using a single neural network to estimate the likelihood ratio, score, or Fisher information, we can use an ensemble of such estimators. That provides us with a more reliable mean prediction as well as a measure of the uncertainty. The class `madminer.ml.EnsembleForge` automates this process. Currently, it only supports SALLY estimators:

In [None]:
ensemble = EnsembleForge(estimators=5)

### Training

The `EnsembleForge` object offers functions very similar to those of `MLForge`. In particular, we can train all estimators with a single call to `train_all()` and save the ensemble to files:

In [None]:
ensemble.train_all(
    method='sally',
    x_filename='data/samples/x_train0.npy',
    t_xz0_filename='data/samples/t_xz_train0.npy',
    n_epochs=20,
    batch_size=256,
    validation_split=None
)

ensemble.save('models/sally_ensemble')

### Evaluation

We can evaluate the ensemble similarly to the individual networks. Let's stick to the estimation of the Fisher information. There are two different ways to take the ensemble average:

- `mode='information'`: Calculate the Fisher information separately for each estimator in the ensemble, then take the mean and the covariance over the ensemble. The advantage is a direct measure of the uncertainty of the prediction.
- `mode='score'`: For each event, average the score estimates over the ensemble, then calculate the Fisher information from these mean scores. This is expected to be more precise: the averaged score estimates are more precise, and since the Fisher information is quadratic in the score, any error in the score estimates is amplified in the Fisher information. However, calculating the covariance is computationally not feasible in this approach, so there are no error bands.
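The difference between the two modes can be made concrete with a few lines of NumPy. This sketch uses only the kinematic term L σ E[t tᵀ] of the Fisher information, hypothetical array shapes, and a helper name of our own; it is not MadMiner's implementation. Because the Fisher information is quadratic in the score, averaging per-estimator Fisher matrices (as in `mode='information'`) gives the Fisher matrix of the mean scores (as in `mode='score'`) plus L σ times the event-averaged ensemble variance of the score estimates, a positive semi-definite term. Noise in the individual score estimates therefore inflates the 'information'-mode result.

```python
import numpy as np

def ensemble_fisher(t_hats, xsec, luminosity):
    """Kinematic Fisher information from an ensemble of score estimators (sketch).

    t_hats: (n_estimators, n_events, n_params) estimated scores
    Returns (info_mean, info_cov, info_from_mean_scores).
    """
    n_estimators, n_events, n_params = t_hats.shape
    # 'information'-style: one Fisher matrix per estimator, then mean and covariance
    infos = luminosity * xsec * np.einsum("kni,knj->kij", t_hats, t_hats) / n_events
    info_mean = infos.mean(axis=0)
    flat = infos.reshape(n_estimators, n_params * n_params)
    # info_cov[i, j, k, l] = Cov(I_ij, I_kl) over the ensemble
    info_cov = np.cov(flat, rowvar=False).reshape(n_params, n_params, n_params, n_params)
    # 'score'-style: average the scores of each event first, then one Fisher matrix
    t_mean = t_hats.mean(axis=0)
    info_from_mean_scores = luminosity * xsec * (t_mean.T @ t_mean) / n_events
    return info_mean, info_cov, info_from_mean_scores

# Usage with hypothetical estimates: 5 estimators, 1000 events, 2 parameters
rng = np.random.default_rng(2)
t_hats = rng.normal(size=(5, 1000, 2))
info_mean, info_cov, info_score = ensemble_fisher(t_hats, xsec=1.0, luminosity=1.0)
print(info_mean - info_score)  # positive semi-definite: the ensemble spread of the scores
```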

By default, MadMiner uses `mode='score'`. Here we use `mode='information'` to show the uncertainty bands it provides.

In [None]:
fisher = FisherInformation('data/madminer_example_shuffled.h5', debug=False)

fisher_information_mean, fisher_information_covariance = fisher.calculate_fisher_information_full_detector(
    theta=[0.,0.],
    model_file='models/sally_ensemble',
    unweighted_x_sample_file='data/samples/x_test.npy',
    luminosity=300000.,
    mode='information'
)

The covariance can easily be propagated into the Fisher distance contour plot by passing it as the second argument:

In [None]:
_ = plot_fisher_information_contours_2d(
    [fisher_information_mean],
    [fisher_information_covariance],
    xrange=(-1.,1.),
    yrange=(-1.,1.)
)
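One simple way to turn an ensemble covariance into an uncertainty band is sketched below under our own assumptions; it is not necessarily what `plot_fisher_information_contours_2d` does internally. We draw Fisher matrices from a multivariate Gaussian with the ensemble mean and covariance, and for each direction u in the parameter plane record the radius r = 1/√(uᵀ I u) of the unit-distance contour. We assume the covariance array is indexed as cov[i, j, k, l] = Cov(I_ij, I_kl). The function name and the 16/50/84 percentile choice are hypothetical.

```python
import numpy as np

def contour_radius_band(info_mean, info_cov, n_directions=90, n_samples=1000, seed=0):
    """Percentiles of the unit-distance contour radius from sampled 2x2 Fisher matrices.

    Assumes info_cov[i, j, k, l] = Cov(I_ij, I_kl).
    Returns the angles and the 16th, 50th, and 84th percentiles of the radius per angle.
    """
    info_mean = np.asarray(info_mean, dtype=float)
    if info_mean.shape != (2, 2):
        raise ValueError("this sketch only handles 2x2 Fisher matrices")
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(info_mean.ravel(), np.asarray(info_cov).reshape(4, 4), size=n_samples)
    samples = samples.reshape(n_samples, 2, 2)
    samples = 0.5 * (samples + np.transpose(samples, (0, 2, 1)))   # enforce symmetry
    phi = np.linspace(0.0, 2.0 * np.pi, n_directions, endpoint=False)
    u = np.stack([np.cos(phi), np.sin(phi)], axis=1)                # unit directions, (n_directions, 2)
    quad = np.einsum("di,sij,dj->sd", u, samples, u)                 # u^T I u per sample and direction
    # Radius of the distance-1 contour; undefined (NaN) where u^T I u <= 0
    radius = np.where(quad > 0.0, 1.0 / np.sqrt(np.where(quad > 0.0, quad, 1.0)), np.nan)
    lo, mid, hi = np.nanpercentile(radius, [16.0, 50.0, 84.0], axis=0)
    return phi, lo, mid, hi
```

If the covariance has the assumed layout, `contour_radius_band(fisher_information_mean, fisher_information_covariance)` would give a band in polar coordinates around the reference point.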

That's the end of our second tutorial. If you have questions, please have a look at the papers, the module documentation, or drop us an email.