In [19]:
import numpy as np
import matplotlib.pyplot as plt
from helpers import folding, unfold, percentage_difference

# Systematic Variations and Unfolding

In this notebook, we investigate the effect of systematic variations on reco-level and unfolded distributions. We work through a simple two-bin example to understand the following:
  * The effect of a detector uncertainty
  * The effect of a detector uncertainty that differs from the nominal only in acceptance and efficiency, i.e. the migration matrix is the same
  * The effect of a signal modelling uncertainty
    * evaluating the uncertainty with two methods

The unfolding will be simple matrix inversion. 

First we set up a nominal sample from a truth distribution, $t_\text{nominal}$. We then define a migration matrix, $M_\text{nominal}$, an acceptance, $a_\text{nominal}$, and an efficiency, $\epsilon_\text{nominal}$. Using these, the truth is folded to give a reco distribution, $r_\text{nominal}$, and data, $N$, are generated by Poisson-fluctuating the reco.
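The `helpers` module is not shown here. A minimal sketch of what `folding` is assumed to compute is below, written as an explicit sum so the indices are visible: $r_j = \frac{1}{a_j}\sum_i M_{ji}\,\epsilon_i t_i$. This formula is inferred from the printed `r_nominal` and is the inverse of the unfolding equation used later; since $M_\text{nominal}$ is symmetric, the outputs cannot distinguish $M_{ji}$ from $M_{ij}$, so that convention is also an assumption. Treat the function as a hypothetical stand-in, not the original implementation.

```python
import numpy as np

def folding(t, M, e, a):
    """Hypothetical stand-in for helpers.folding (the real module is not shown).

    r_j = (1 / a_j) * sum_i M[j][i] * e_i * t_i
    """
    n = len(t)
    r = np.zeros(n)
    for j in range(n):
        # Truth events passing the efficiency in each truth bin i, migrated into reco bin j
        migrated = sum(M[j][i] * e[i] * t[i] for i in range(n))
        # Assumption: a_j is the fraction of reco events in bin j with a matching truth
        # event, so dividing by it adds back the reco events with no truth match
        r[j] = migrated / a[j]
    return r

M_nominal = np.array([[0.80, 0.20],
                      [0.20, 0.80]])
r = folding([500, 300], M_nominal, [0.92, 0.90], [0.75, 0.78])
print(r)  # ≈ [562.667, 394.872], the r_nominal printed below
```

The explicit loop only mirrors the index structure; `(M @ (e * t)) / a` is the vectorised equivalent.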

In [10]:
t_nominal = [500, 300] 
M_nominal = np.array([[0.80, 0.20],
                      [0.20, 0.80]])  
a_nominal = [0.75, 0.78]
e_nominal = [0.92, 0.90]
r_nominal = folding(t_nominal, M_nominal, e_nominal, a_nominal)
print("r_nominal = %s" % str(r_nominal))

data = np.random.poisson(r_nominal)
print("data =  %s" % str(data))

r_nominal = [562.66666666666674, 394.87179487179492]
data =  [588 376]


Now we can unfold the data to give us our unfolded distribution using the following equation:

$t^{\text{unfold}}_i = \frac{1}{\epsilon_i}\sum_{j}M_{ij}^{-1}a_j N_j$
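A matching sketch of `unfold` (again a hypothetical stand-in for the unshown helper, keeping its `(N, M, a, e)` argument order) implements the equation above with `np.linalg.solve`, which applies $M^{-1}$ by solving the linear system rather than forming the inverse explicitly. Unfolding the noiseless reco is a closure test: it should return $t_\text{nominal}$ exactly, so any offset in the unfolded data comes from the Poisson fluctuation alone.

```python
import numpy as np

def folding(t, M, e, a):
    # Hypothetical stand-in for helpers.folding: r_j = (1/a_j) * sum_i M[j, i] * e_i * t_i
    t, e, a = (np.asarray(x, dtype=float) for x in (t, e, a))
    return (M @ (e * t)) / a

def unfold(N, M, a, e):
    # Hypothetical stand-in for helpers.unfold: t_i = (1/e_i) * sum_j Minv[i, j] * a_j * N_j.
    # np.linalg.solve(M, y) returns M^{-1} y without building the inverse.
    N, a, e = (np.asarray(x, dtype=float) for x in (N, a, e))
    return np.linalg.solve(M, a * N) / e

t_nominal = [500, 300]
M_nominal = np.array([[0.80, 0.20],
                      [0.20, 0.80]])
a_nominal = [0.75, 0.78]
e_nominal = [0.92, 0.90]

# Closure test: unfolding the noiseless reco with the nominal inputs returns the truth
closure = unfold(folding(t_nominal, M_nominal, e_nominal, a_nominal), M_nominal, a_nominal, e_nominal)
print(closure)  # ≈ [500, 300]

# Data values copied from the output below (np.random.poisson is unseeded, so a rerun differs)
t_unfold = unfold([588, 376], M_nominal, a_nominal, e_nominal)
print(t_unfold)  # ≈ [532.870, 271.156]
```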

In [11]:
t_unfold = unfold(data, M_nominal, a_nominal, e_nominal)
print(" %s" % str(t_unfold))

 [532.86956521739125, 271.15555555555557]


We now invent a detector systematic variation. This sample has the same truth distribution but a different reco, $r_{\text{s}1}$. To begin with, we use a variation with a different migration matrix, $M_{\text{s}1}$, but the same acceptance and efficiency.

In [27]:
M_s1 = np.array([[0.85, 0.15],
                 [0.15, 0.85]])  

r_s1 = folding(t_nominal, M_s1, e_nominal, a_nominal)
print("systematic_1 =  %s" % str(r_s1))

systematic_1 =  [575.33333333333337, 382.69230769230768]


The systematic variation in the reco is:
    

In [28]:
print("Systematic variation difference = %s %%" % str(percentage_difference(r_nominal, r_s1)))

Systematic variation difference = [-2.2511848341232157, 3.0844155844155994] %
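`percentage_difference` is also imported from `helpers` without being shown. From the printed numbers it appears to compute $100\,(x_\text{nominal} - x_\text{varied})/x_\text{nominal}$ per bin, so a negative value means the variation lies above the nominal. A hypothetical stand-in that reproduces the output above:

```python
import numpy as np

def percentage_difference(nominal, varied):
    # Hypothetical stand-in for helpers.percentage_difference, inferred from the
    # printed outputs: 100 * (nominal - varied) / nominal, bin by bin
    nominal = np.asarray(nominal, dtype=float)
    varied = np.asarray(varied, dtype=float)
    return 100.0 * (nominal - varied) / nominal

r_nominal = [562.6666666666667, 394.8717948717949]
r_s1 = [575.3333333333334, 382.6923076923077]
pd_s1 = percentage_difference(r_nominal, r_s1)
print(pd_s1)  # ≈ [-2.251, 3.084]
```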


Now we want to see the systematic variation in the unfolded regime. $r_{\text{s}1}$ is unfolded with $M_{\text{nominal}}$, $a_{\text{nominal}}$ and $\epsilon_{\text{nominal}}$, then compared to $t_{\text{nominal}}$ to assess the systematic uncertainty.

In [29]:
t_s1_unfolded = unfold(r_s1, M_nominal, a_nominal, e_nominal)
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_nominal, t_s1_unfolded)))

Systematic variation difference unfolded = [-3.4420289855072497, 5.8641975308642031] %
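Because $a_\text{nominal}$ is used both to fold and to unfold $r_{\text{s}1}$, it cancels, and the unfolded result is $t' = \epsilon^{-1} M_\text{nominal}^{-1} M_{\text{s}1}\,(\epsilon\, t)$, with $\epsilon$ applied bin by bin. The sketch below (variable names such as `R` are illustrative) computes the residual matrix $M_\text{nominal}^{-1} M_{\text{s}1}$; its deviation from the identity is what drives the unfolded shift, since $t' - t = \epsilon^{-1}(R - I)(\epsilon\, t)$.

```python
import numpy as np

t = np.array([500.0, 300.0])
e = np.array([0.92, 0.90])
M_nominal = np.array([[0.80, 0.20], [0.20, 0.80]])
M_s1 = np.array([[0.85, 0.15], [0.15, 0.85]])

# Residual matrix: the identity if the systematic sample had the nominal migrations.
# Its off-diagonal entries are negative because the nominal unfolding corrects for
# 20% bin-to-bin migration while the s1 sample only has 15%.
R = np.linalg.solve(M_nominal, M_s1)
print(R)  # ≈ [[1.083, -0.083], [-0.083, 1.083]]

# Unfolded s1 (acceptance already cancelled): t' = (1/e) * R @ (e * t)
t_s1_unfolded = (R @ (e * t)) / e
shift = 100 * (t - t_s1_unfolded) / t
print(shift)  # ≈ [-3.442, 5.864] %, as printed above
```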


In this case we find that the unfolded systematic uncertainty is larger than the reco-level one in both bins (-3.4% vs -2.3% and 5.9% vs 3.1%). We now try a second example where the migration matrix and efficiency are the same as the nominal but the acceptance is different, $a_{\text{s}2}$.

In [33]:
a_s2 = [0.80, 0.85]
r_s2 = folding(t_nominal, M_nominal, e_nominal, a_s2)
t_s2_unfolded = unfold(r_s2, M_nominal, a_nominal, e_nominal)
print("Systematic variation difference at reco = %s %%" % str(percentage_difference(r_nominal, r_s2)))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_nominal, t_s2_unfolded)))

Systematic variation difference at reco = [6.2500000000000124, 8.2352941176470509] %
Systematic variation difference unfolded = [5.8069053708440039, 9.2696078431372175] %
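The reco-level shifts printed above are exactly $1 - a_{\text{nominal},j}/a_{\text{s}2,j}$, because the acceptance enters only as a per-bin divisor after migration. In the unfolding, that per-bin ratio is sandwiched between $M$ and $M^{-1}$, so it is redistributed between bins. The sketch below (names such as `c` and `c_flat` are illustrative) reproduces both printed results and shows that a bin-independent ratio would pass through the unfolding unchanged.

```python
import numpy as np

t = np.array([500.0, 300.0])
e = np.array([0.92, 0.90])
M = np.array([[0.80, 0.20], [0.20, 0.80]])
a_nominal = np.array([0.75, 0.78])
a_s2 = np.array([0.80, 0.85])

c = a_nominal / a_s2            # per-bin acceptance ratio, applied at reco level
reco_shift = 100 * (1 - c)      # ≈ [6.25, 8.235] %, the reco-level difference

# Unfolding with a_nominal: t' = (1/e) M^{-1} diag(c) M (e t)
t_s2_unfolded = np.linalg.solve(M, c * (M @ (e * t))) / e
unfolded_shift = 100 * (t - t_s2_unfolded) / t   # ≈ [5.807, 9.270] %

# With the same ratio in every bin, diag(c) commutes with M and the shift passes through unchanged
c_flat = np.full(2, c.mean())
t_flat = np.linalg.solve(M, c_flat * (M @ (e * t))) / e
flat_shift = 100 * (1 - t_flat / t)
print(reco_shift, unfolded_shift, flat_shift)
```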


In this case the unfolded variation is not uniformly larger: it is smaller than at reco level in the first bin (5.8% vs 6.3%) but larger in the second (9.3% vs 8.2%). We now try a third example where the migration matrix and acceptance are the same as the nominal but the efficiency is different, $\epsilon_{\text{s}3}$.

In [34]:
e_s3 = [0.88, 0.87]
r_s3 = folding(t_nominal, M_nominal, e_s3, a_nominal)
t_s3_unfolded = unfold(r_s3, M_nominal, a_nominal, e_nominal)
print("Systematic variation difference at reco = %s %%" % str(percentage_difference(r_nominal, r_s3)))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_nominal, t_s3_unfolded)))

Systematic variation difference at reco = [4.2180094786729878, 3.6363636363636522] %
Systematic variation difference unfolded = [4.3478260869565135, 3.3333333333333335] %
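Here the roles reverse: the efficiency is applied at truth level, before migration, so $M_\text{nominal}^{-1}$ undoes the migration exactly and the unfolded result is $t' = (\epsilon_{\text{s}3}/\epsilon_\text{nominal})\,t$ bin by bin. The unfolded shift is therefore $1 - \epsilon_{\text{s}3,i}/\epsilon_{\text{nominal},i}$, independent of $M$, while the reco-level shift mixes the bins. A short check with the notebook's numbers (variable names are illustrative):

```python
import numpy as np

t = np.array([500.0, 300.0])
M = np.array([[0.80, 0.20], [0.20, 0.80]])
a = np.array([0.75, 0.78])
e_nominal = np.array([0.92, 0.90])
e_s3 = np.array([0.88, 0.87])

r_nominal = (M @ (e_nominal * t)) / a
r_s3 = (M @ (e_s3 * t)) / a
reco_shift = 100 * (r_nominal - r_s3) / r_nominal    # mixes bins through M

# Unfold the s3 reco with the nominal inputs
t_s3_unfolded = np.linalg.solve(M, a * r_s3) / e_nominal
unfolded_shift = 100 * (t - t_s3_unfolded) / t
expected = 100 * (1 - e_s3 / e_nominal)              # pure bin-by-bin efficiency ratio
print(reco_shift, unfolded_shift, expected)
```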


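From these three simple tests, the unfolded systematic variation is not always larger than the reco-level one: it is larger in both bins only for the migration-matrix variation, while for the acceptance and efficiency variations it is larger in one bin and smaller in the other. The greatest difference between reco and unfolded level seems to come from a change in the migration matrix. A different efficiency and acceptance has a smaller difference between 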