# MadMiner particle physics tutorial

# Part 4: Limit setting

Johann Brehmer, Felix Kling, Irina Espejo, and Kyle Cranmer 2018-2019

In part 4 of this tutorial, we use the networks trained in parts 3a and 3b to calculate expected limits on our theory parameters.

## Preparations

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

import logging
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline

from madminer.limits import AsymptoticLimits


In [2]:
# MadMiner output
logging.basicConfig(
    format='%(asctime)-5.5s %(name)-20.20s %(levelname)-7.7s %(message)s',
    datefmt='%H:%M',
    level=logging.INFO
)

# Output of all other modules (e.g. matplotlib)
for key in logging.Logger.manager.loggerDict:
    if "madminer" not in key:
        logging.getLogger(key).setLevel(logging.WARNING)

## 1. Preparations

In the end, what we care about are not plots of the log likelihood ratio, but limits on parameters. Under some asymptotic assumptions, however, the two are directly related. MadMiner makes it easy to calculate p-values in the asymptotic limit with the `AsymptoticLimits` class in the `madminer.limits` module:

In [3]:
limits = AsymptoticLimits('data/lhe_data_shuffled.h5')
# limits = AsymptoticLimits('data/delphes_data_shuffled.h5')

13:45 madminer.analysis    INFO    Loading data from data/lhe_data_shuffled.h5
13:45 madminer.analysis    INFO    Found 2 parameters
13:45 madminer.analysis    INFO    Did not find nuisance parameters
13:45 madminer.analysis    INFO    Found 6 benchmarks, of which 6 physical
13:45 madminer.analysis    INFO    Found 3 observables
13:45 madminer.analysis    INFO    Found 14839 events
13:45 madminer.analysis    INFO    Found morphing setup with 6 components


This class provides two high-level functions:
- `AsymptoticLimits.observed_limits()` lets us calculate p-values on a parameter grid for some observed events, and
- `AsymptoticLimits.expected_limits()` lets us calculate expected p-values on a parameter grid based on all data in the MadMiner file.
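
The asymptotic relation behind these p-values can be illustrated with a short sketch. Under Wilks' theorem, the test statistic q = -2 log r, where r is the likelihood ratio relative to the best fit, follows a chi-squared distribution whose number of degrees of freedom equals the number of parameters. For the two parameters used here, the chi-squared survival function has the closed form exp(-q/2). The helper name `asymptotic_p_value_2d` and the input numbers are illustrative assumptions, not MadMiner's internal implementation.

```python
import numpy as np


def asymptotic_p_value_2d(log_likelihood_ratio):
    """Asymptotic p-value for two parameters (hypothetical helper).

    log_likelihood_ratio is log L(theta) - log L(best fit), so it is <= 0.
    """
    q = -2.0 * np.asarray(log_likelihood_ratio, dtype=float)
    q = np.clip(q, 0.0, None)  # guard against tiny negative values from rounding
    # Survival function of a chi-squared distribution with 2 degrees of freedom
    return np.exp(-0.5 * q)


# Illustrative log likelihood ratios, not taken from the MadMiner file
example_llr = np.array([0.0, -1.0, -3.0, -6.0])
example_p = asymptotic_p_value_2d(example_llr)
print(example_p)
```

A parameter point is excluded at 95% confidence level when its p-value falls below 0.05, which for two parameters corresponds to q of about 5.99.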

First we have to define the parameter grid on which we evaluate the p-values.

In [12]:
theta_ranges = ((-20., 20.), (-20., 20.))
resolutions = (25, 25)
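
As a rough illustration of what these two settings describe, the sketch below builds a regular grid with 25 points per parameter between -20 and 20, which gives the 625 parameter points that appear in the sampling log later in this notebook. The ordering (first parameter varying fastest) is inferred from that log, and the variable names `grid_axes` and `grid_sketch` are assumptions made for this example. MadMiner constructs its own grid internally.

```python
import numpy as np

theta_ranges = ((-20., 20.), (-20., 20.))
resolutions = (25, 25)

# One evenly spaced axis per parameter, end points included
grid_axes = [
    np.linspace(theta_min, theta_max, n_points)
    for (theta_min, theta_max), n_points in zip(theta_ranges, resolutions)
]

# Default 'xy' indexing: after flattening, the first parameter varies fastest
theta0, theta1 = np.meshgrid(*grid_axes)
grid_sketch = np.vstack([theta0.flatten(), theta1.flatten()]).T

print(grid_sketch.shape)  # one row per grid point, one column per parameter
print(grid_sketch[:3])
```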

In [13]:
p_values = {}

## 2. Expected limits based on rate or simple histograms

First, with `mode="rate"`, we can calculate expected limits based only on rate information:

In [14]:
_, p_values_expected_xsec, best_fit_expected_xsec = limits.expected_limits(
    mode="rate",
    theta_true=[0.,0.],
    theta_ranges=theta_ranges,
    resolutions=resolutions,
    luminosity=300000.0
)

13:47 madminer.limits      INFO    Calculating rate log likelihood
13:47 madminer.limits      INFO    Calculating p-values
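
To make the rate-only mode more concrete, here is a toy sketch of a Poisson counting likelihood evaluated on the same kind of grid. The cross section function `toy_xsec` is entirely hypothetical and stands in for the morphing-based cross section that MadMiner computes from the event file. The luminosity is assumed to be in inverse picobarn, and the "observed" count is the expectation at the true parameter point (an Asimov-style choice made for this example).

```python
import numpy as np


def toy_xsec(theta):
    """Hypothetical cross section in pb as a quadratic function of theta."""
    theta = np.asarray(theta, dtype=float)
    t0, t1 = theta[..., 0], theta[..., 1]
    return 0.001 * (1.0 + 0.01 * t0 + 0.002 * (t0 ** 2 + t1 ** 2))


luminosity = 300000.0  # assumed to be in pb^-1
theta_true = np.array([0.0, 0.0])

# Same 25 x 25 grid as in this notebook
axis = np.linspace(-20.0, 20.0, 25)
t0, t1 = np.meshgrid(axis, axis)
grid = np.vstack([t0.flatten(), t1.flatten()]).T

n_predicted = luminosity * toy_xsec(grid)     # expected count at each grid point
n_asimov = luminosity * toy_xsec(theta_true)  # stand-in for the observed count

# Poisson log likelihood, dropping the theta-independent log(n!) term
log_l = n_asimov * np.log(n_predicted) - n_predicted

# Likelihood ratio relative to the best grid point, then the 2-dof asymptotic p-value
q = -2.0 * (log_l - log_l.max())
p_rate = np.exp(-0.5 * q)

print("Grid points allowed at 95% CL:", int(np.sum(p_rate >= 0.05)))
```

Because only the total rate enters, every parameter point that predicts the same number of events is equally compatible with the data, which is why rate-only limits tend to leave degenerate directions in parameter space.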


`mode="histo"` calculates limits based on histograms. For now, there is not a lot of freedom in this step: the histogram binning is determined automatically.

In [15]:
_, p_values_expected_histo, best_fit_expected_histo = limits.expected_limits(
    mode="histo",
    hist_vars=["pt_j1"],
    include_xsec=False,
    theta_true=[0.,0.],
    theta_ranges=theta_ranges,
    resolutions=resolutions,
    luminosity=300000.0
)

13:47 madminer.limits      INFO    Setting up standard summary statistics
13:47 madminer.limits      INFO    Creating histogram with 20 bins for the summary statistics
13:47 madminer.limits      INFO    Building histogram with %s bins per parameter and %s bins per observable
13:47 madminer.analysis    INFO    Loading data from data/lhe_data_shuffled.h5
13:47 madminer.analysis    INFO    Found 2 parameters
13:47 madminer.analysis    INFO    Did not find nuisance parameters
13:47 madminer.analysis    INFO    Found 6 benchmarks, of which 6 physical
13:47 madminer.analysis    INFO    Found 3 observables
13:47 madminer.analysis    INFO    Found 14839 events
13:47 madminer.analysis    INFO    Found morphing setup with 6 components
13:47 madminer.sampling    INFO    Extracting plain training sample. Sampling according to ('morphing_points', [array([-20., -20.]), array([-18.33333333, -20.        ]), array([-16.66666667, -20.        ]), array([-15., -20.]), array([-13.33333333, -20.        ]), 

13:47 madminer.sampling    INFO    Starting sampling serially
13:47 madminer.sampling    INFO    Sampling from parameter point 31 / 625
13:47 madminer.sampling    INFO    Sampling from parameter point 62 / 625
13:47 madminer.sampling    INFO    Sampling from parameter point 93 / 625
13:47 madminer.sampling    INFO    Sampling from parameter point 124 / 625
13:47 madminer.sampling    INFO    Sampling from parameter point 155 / 625
13:47 madminer.sampling    INFO    Sampling from parameter point 186 / 625
13:47 madminer.sampling    INFO    Sampling from parameter point 217 / 625
13:47 madminer.sampling    INFO    Sampling from parameter point 248 / 625
13:47 madminer.sampling    INFO    Sampling from parameter point 279 / 625
13:47 madminer.sampling    INFO    Sampling from parameter point 310 / 625
13:47 madminer.sampling    INFO    Sampling from parameter point 341 / 625
13:47 madminer.sampling    INFO    Sampling from parameter point 372 / 625
13:47 madminer.sampling    INFO    Sampli

TypeError: can only concatenate tuple (not "list") to tuple
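
Independently of the cell above, the idea behind a histogram-based limit can be sketched with toy data. The example fills a 20-bin histogram of a single observable for each parameter value, turns it into bin probabilities, and compares them to the histogram at the true parameter point. Everything here is an assumption made for illustration: `toy_sample` is a hypothetical one-parameter stand-in for simulated `pt_j1` values, the bin edges are fixed by hand rather than chosen automatically, and only shape information is used (as with `include_xsec=False`).

```python
import numpy as np

rng = np.random.default_rng(42)


def toy_sample(theta, n):
    """Hypothetical stand-in for simulated observable values at parameter theta."""
    return rng.exponential(scale=100.0 + 2.0 * theta, size=n)


edges = np.linspace(0.0, 600.0, 21)  # 20 bins; overflow events are ignored


def bin_probabilities(theta, n=200000):
    """Estimate the probability of each bin from a toy sample."""
    counts, _ = np.histogram(toy_sample(theta, n), bins=edges)
    # Add one pseudo-count per bin so that no probability is exactly zero
    return (counts + 1.0) / (counts.sum() + len(counts))


thetas = np.linspace(-20.0, 20.0, 25)
p_true = bin_probabilities(0.0)
n_events = 1000  # assumed number of expected events

# Expected log likelihood: n_events * sum_b p_true[b] * log p_theta[b]
log_l = np.array(
    [n_events * np.sum(p_true * np.log(bin_probabilities(t))) for t in thetas]
)
q = -2.0 * (log_l - log_l.max())

print("Best-fit theta on the grid:", thetas[np.argmax(log_l)])
```

The test statistic `q` plays the same role as in the rate-only case and can be converted into p-values in the same asymptotic way.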

## 3. Expected limits based on ratio estimators

Finally, and perhaps most importantly, `mode="ml"` allows us to calculate limits based on any `ParameterizedRatioEstimator` instance, such as the ALICES estimator trained earlier in this tutorial:

In [None]:
theta_grid, p_values_expected_ml, best_fit_expected_ml = limits.expected_limits(
    mode="ml",
    model_file='models/alices',
    include_xsec=False,
    theta_true=[0.,0.],
    # Reuse the grid defined above so that all modes are evaluated on the same points
    theta_ranges=theta_ranges,
    resolutions=resolutions,
    luminosity=300000.0
)

## 4. Expected limits based on score estimators

## 5. Toy signal

Observed limits take actual data as input, which we generate here on the fly:

In [None]:
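from madminer import sampling
from madminer.sampling import SampleAugmenter

# Draw a handful of toy "observed" events at theta = (0, 0),
# using the same MadMiner file that was loaded into `limits` above
sampler = SampleAugmenter('data/lhe_data_shuffled.h5')
x_observed, _ = sampler.extract_samples_test(
    theta=sampling.morphing_point([0.,0.]),
    n_samples=5,
    folder=None,
    filename=None
)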

In [None]:
_, p_values_observed, best_fit_observed = limits.observed_limits(
    x_observed=x_observed,
    mode="ml",
    model_file='models/alices',
    include_xsec=True,
    # Same grid as for the expected limits
    theta_ranges=theta_ranges,
    resolutions=resolutions,
    luminosity=300000.0,
)

## 6. Plot

Let's plot the results:

In [None]:
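# The grid is identical along both axes, so one set of edges and centers suffices
theta_min, theta_max = theta_ranges[0]
resolution = resolutions[0]

# Bin edges are shifted by half a bin so that each grid point sits at a bin center
bin_size = (theta_max - theta_min)/(resolution - 1)
edges = np.linspace(theta_min - bin_size/2, theta_max + bin_size/2, resolution + 1)
centers = np.linspace(theta_min, theta_max, resolution)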

fig = plt.figure(figsize=(6,5))
ax = plt.gca()

cmin, cmax = 1.e-3, 1.
    
pcm = ax.pcolormesh(
    edges, edges, p_values_expected_ml.reshape((resolution, resolution)),
    norm=matplotlib.colors.LogNorm(vmin=cmin, vmax=cmax),
    cmap='Greys_r'
)
cbar = fig.colorbar(pcm, ax=ax, extend='both')

plt.contour(
    centers, centers, p_values_expected_xsec.reshape((resolution, resolution)),
    levels=[0.05],
    linestyles='-', colors='darkgreen'
)
plt.contour(
    centers, centers, p_values_expected_ml.reshape((resolution, resolution)),
    levels=[0.05],
    linestyles='-', colors='#CC002E'
)
plt.contour(
    centers, centers, p_values_expected_histo.reshape((resolution, resolution)),
    levels=[0.05],
    linestyles='-', colors='C1'
)
plt.contour(
    centers, centers, p_values_observed.reshape((resolution, resolution)),
    levels=[0.05],
    linestyles='--', colors='black'
)

plt.scatter(
    theta_grid[best_fit_expected_xsec][0], theta_grid[best_fit_expected_xsec][1],
    s=80., color='darkgreen', marker='*',
    label="xsec"
)
plt.scatter(
    theta_grid[best_fit_expected_ml][0], theta_grid[best_fit_expected_ml][1],
    s=80., color='#CC002E', marker='*',
    label="ALICES"
)
plt.scatter(
    theta_grid[best_fit_expected_histo][0], theta_grid[best_fit_expected_histo][1],
    s=80., color='C1', marker='*',
    label="Histo"
)
plt.scatter(
    theta_grid[best_fit_observed][0], theta_grid[best_fit_observed][1],
    s=80., color='black', marker='*',
    label="Observed"
)

plt.legend()

plt.xlabel(r'$\theta_0$')
plt.ylabel(r'$\theta_1$')
cbar.set_label('Expected p-value (ALICES)')

plt.tight_layout()
plt.show()
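
Beyond the plot, it can be handy to read off numerical limits from the p-value arrays. The sketch below collects all grid points with a p-value of at least 0.05 and reports the smallest and largest allowed value of each parameter. The helper `allowed_ranges` is a hypothetical name introduced here, and the toy grid and p-values only stand in for arrays such as `theta_grid` and `p_values_expected_ml`, which are assumed to have shapes `(n_points, n_parameters)` and `(n_points,)`.

```python
import numpy as np


def allowed_ranges(theta_grid, p_values, level=0.05):
    """Per-parameter min and max over all grid points with p-value >= level."""
    theta_grid = np.asarray(theta_grid, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    allowed = theta_grid[p_values >= level]
    if allowed.size == 0:
        raise ValueError("No grid point is allowed at this level")
    return allowed.min(axis=0), allowed.max(axis=0)


# Toy input: a 25 x 25 grid with an elliptical p-value profile
axis = np.linspace(-20.0, 20.0, 25)
t0, t1 = np.meshgrid(axis, axis)
toy_grid = np.vstack([t0.flatten(), t1.flatten()]).T
toy_p = np.exp(-0.5 * ((toy_grid[:, 0] / 4.0) ** 2 + (toy_grid[:, 1] / 8.0) ** 2))

lower, upper = allowed_ranges(toy_grid, toy_p)
print("theta_0 in [%.2f, %.2f]" % (lower[0], upper[0]))
print("theta_1 in [%.2f, %.2f]" % (lower[1], upper[1]))
```

These ranges are the bounding box of the allowed region on the grid, so their precision is limited by the grid spacing, and they do not capture correlations between the two parameters.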
