In [19]:
import numpy as np
import matplotlib.pyplot as plt
from helpers import folding, unfold, percentage_difference

# Systematic Variations and Unfolding

In this notebook, the effect of systematic variations on reco or unfolded distributions is investigated. We will work through a simple 2-bin example to understand the following:
  * The effect of a detector uncertainty
  * The effect of a detector uncertainty which differs from the nominal only in acceptance and efficiency, i.e. the migration matrix is the same
  * The effect of a signal modelling uncertainty
    * evaluating the uncertainty with two methods

The unfolding will be simple matrix inversion. 

#### Detector uncertainty
First we set up a nominal sample from a truth distribution, $t_\text{nominal}$. We then define a migration matrix, $M_\text{nominal}$, an acceptance, $a_\text{nominal}$, and an efficiency, $\epsilon_\text{nominal}$. Using these, the truth is folded to give a reco distribution, $r_\text{nominal}$, and data, $N$, is generated by Poisson-varying the reco.
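
The `folding` helper is not shown in this notebook. Assuming it is the exact inverse of the matrix-inversion unfolding used here, the folding step would read:

$r_j = \frac{1}{a_j}\sum_{i}M_{ji}\,\epsilon_i\,t_i$

As a check of this assumption, the first bin with the nominal inputs gives $r_1 = (0.80 \times 0.92 \times 500 + 0.20 \times 0.90 \times 300)/0.75 = 422/0.75 \approx 562.67$, which agrees with the value of `r_nominal` printed by the cell that follows.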

In [85]:
t_nominal = [500, 300] 
M_nominal = np.array([[0.80, 0.20],
                      [0.20, 0.80]])  
a_nominal = [0.75, 0.78]
e_nominal = [0.92, 0.90]
r_nominal = folding(t_nominal, M_nominal, e_nominal, a_nominal)
print("r_nominal = %s" % str(r_nominal))

data = np.random.poisson(r_nominal)
print("data =  %s" % str(data))

r_nominal = [562.66666666666674, 394.87179487179492]
data =  [601 424]


Now we can unfold the data to give us our unfolded distribution using the following equation:

$t^{\text{unfold}}_i = \frac{1}{\epsilon_i}\sum_{j}M_{ij}^{-1}a_j N_j$
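
The `helpers` module is not shown, so the following is only a minimal sketch of what `folding` and `unfold` might look like, assuming folding is the exact inverse of the unfolding equation above. The names `fold_sketch` and `unfold_sketch` are hypothetical; the argument order mirrors the calls used in this notebook.

```python
import numpy as np


def fold_sketch(truth, M, eff, acc):
    """Fold truth to reco, assuming r_j = (1 / a_j) * sum_i M_ji * eps_i * t_i."""
    truth, eff, acc = (np.asarray(x, dtype=float) for x in (truth, eff, acc))
    return (np.asarray(M, dtype=float) @ (eff * truth)) / acc


def unfold_sketch(reco, M, acc, eff):
    """Unfold by matrix inversion: t_i = (1 / eps_i) * sum_j M^-1_ij * a_j * N_j."""
    reco, eff, acc = (np.asarray(x, dtype=float) for x in (reco, eff, acc))
    # Solving M x = a * N is equivalent to applying M^-1 without forming the inverse.
    return np.linalg.solve(np.asarray(M, dtype=float), acc * reco) / eff


M = np.array([[0.80, 0.20],
              [0.20, 0.80]])
a = [0.75, 0.78]
e = [0.92, 0.90]

r = fold_sketch([500, 300], M, e, a)
print(r)                          # approximately [562.67, 394.87]
print(unfold_sketch(r, M, a, e))  # closure: approximately [500, 300]
```

Under this assumption the sketch reproduces the `r_nominal` values printed earlier in this section.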

In [86]:
t_unfold = unfold(data, M_nominal, a_nominal, e_nominal)
print(" %s" % str(t_unfold))

 [533.43478260869563, 323.01111111111112]


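We want to check closure: unfolding the unfluctuated $r_\text{nominal}$ with the nominal migration matrix, acceptance and efficiency should return $t_\text{nominal}$.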

In [87]:
t_closure_check = unfold(r_nominal, M_nominal, a_nominal, e_nominal)
print(" %s" % str(t_closure_check))

 [500.00000000000006, 300.00000000000006]


We now invent a detector systematic variation. This sample has the same truth distribution but a different reco, $r_{\text{s}1}$. To begin with we will use one with a different migration matrix $M_{\text{s}1}$ but the same acceptance and efficiency.

In [88]:
M_s1 = np.array([[0.85, 0.15],
                 [0.15, 0.85]])  

r_s1 = folding(t_nominal, M_s1, e_nominal, a_nominal)
print("systematic_1 =  %s" % str(r_s1))

systematic_1 =  [575.33333333333337, 382.69230769230768]


The systematic variation in the reco is:
    

In [89]:
print("Systematic variation difference = %s %%" % str(percentage_difference(r_nominal, r_s1)))

Systematic variation difference = [-2.2511848341232157, 3.0844155844155994] %
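
The `percentage_difference` helper is not shown either. The printed values are consistent with a per-bin difference relative to the first argument, so a sketch under that assumption (the name `percentage_difference_sketch` is hypothetical) could be:

```python
import numpy as np


def percentage_difference_sketch(reference, varied):
    """Per-bin 100 * (reference - varied) / reference (assumed definition)."""
    reference = np.asarray(reference, dtype=float)
    varied = np.asarray(varied, dtype=float)
    return 100.0 * (reference - varied) / reference


# Reco-level values copied from the outputs above.
r_nominal = [562.66666666666674, 394.87179487179492]
r_s1 = [575.33333333333337, 382.69230769230768]

diff = percentage_difference_sketch(r_nominal, r_s1)
print(diff)  # approximately [-2.25, 3.08]
```

With this sign convention, a negative value means the varied sample has more events in that bin than the nominal.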


Now we want to see the systematic variation in the unfolded regime. The $r_{\text{s}1}$ is unfolded with $M_{\text{nominal}}$, $a_{\text{nominal}}$ and $\epsilon_{\text{nominal}}$, then compared to $t_{\text{nominal}}$ to assess the systematic uncertainty.

In [90]:
t_s1_unfolded = unfold(r_s1, M_nominal, a_nominal, e_nominal)
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_nominal, t_s1_unfolded)))

Systematic variation difference unfolded = [-3.4420289855072497, 5.8641975308642031] %
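
To see where the unfolded shift comes from, it can help to look at the matrix that is left over when $M_{\text{s}1}$ is undone with $M_{\text{nominal}}$. The sketch below assumes the folding has the form $r = M(\epsilon t)/a$, which is inferred from the unfolding equation rather than taken from the `helpers` module, and that the percentage difference is taken relative to the first argument.

```python
import numpy as np

M_nominal = np.array([[0.80, 0.20],
                      [0.20, 0.80]])
M_s1 = np.array([[0.85, 0.15],
                 [0.15, 0.85]])
e_nominal = np.array([0.92, 0.90])
t_nominal = np.array([500.0, 300.0])

# Residual matrix M_nominal^-1 @ M_s1; it would be the identity if the two matched.
residual = np.linalg.solve(M_nominal, M_s1)
print(residual)  # diagonal above 1, negative off-diagonal entries

# The acceptance cancels because it is identical in the folding and the unfolding.
t_s1_unfolded = residual @ (e_nominal * t_nominal) / e_nominal
shift = 100.0 * (t_nominal - t_s1_unfolded) / t_nominal
print(shift)  # approximately [-3.44, 5.86]
```

The nominal matrix corrects for more migration than is present in the varied sample, which pushes events from the less populated bin into the more populated one.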


In this case we find that the unfolded systematic uncertainty is larger in both bins. We now try a second example where the migration matrix and efficiency are the same as the nominal but the acceptance is different, $a_{\text{s}2}$.

In [91]:
a_s2 = [0.80, 0.85]
r_s2 = folding(t_nominal, M_nominal, e_nominal, a_s2)
t_s2_unfolded = unfold(r_s2, M_nominal, a_nominal, e_nominal)
print("Systematic variation difference at reco = %s %%" % str(percentage_difference(r_nominal, r_s2)))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_nominal, t_s2_unfolded)))

Systematic variation difference at reco = [6.2500000000000124, 8.2352941176470509] %
Systematic variation difference unfolded = [5.8069053708440039, 9.2696078431372175] %


In this case the unfolded systematic uncertainty is slightly smaller than the reco-level one in the first bin and larger in the second. We now try a third example where the migration matrix and acceptance are the same as the nominal but the efficiency is different, $\epsilon_{\text{s}3}$.

In [92]:
e_s3 = [0.88, 0.87]
r_s3 = folding(t_nominal, M_nominal, e_s3, a_nominal)
t_s3_unfolded = unfold(r_s3, M_nominal, a_nominal, e_nominal)
print("Systematic variation difference at reco = %s %%" % str(percentage_difference(r_nominal, r_s3)))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_nominal, t_s3_unfolded)))

Systematic variation difference at reco = [4.2180094786729878, 3.6363636363636522] %
Systematic variation difference unfolded = [4.3478260869565135, 3.3333333333333335] %


From these three simple tests, the unfolded systematic variation is larger than the reco-level one in both bins only when the migration matrix differs, and that case also shows the greatest difference between the two. With a different acceptance or efficiency, the unfolded variation is larger in one bin and smaller in the other, and the difference between reco and unfolded systematic variations is smaller.
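
The acceptance-only and efficiency-only cases can be cross-checked by hand. Assuming the folding has the form $r = M(\epsilon t)/a$ (inferred from the unfolding equation, not taken from the `helpers` module), a change in acceptance is a pure per-bin scaling at reco level, and a change in efficiency is a pure per-bin scaling at unfolded level. A short sketch:

```python
import numpy as np

a_nominal = np.array([0.75, 0.78])
e_nominal = np.array([0.92, 0.90])
a_s2 = np.array([0.80, 0.85])
e_s3 = np.array([0.88, 0.87])

# Acceptance only: r_s2 / r_nominal = a_nominal / a_s2 in each reco bin.
reco_shift_s2 = 100.0 * (1.0 - a_nominal / a_s2)
print(reco_shift_s2)      # approximately [6.25, 8.24]

# Efficiency only: t_s3_unfolded / t_nominal = e_s3 / e_nominal in each truth bin.
unfolded_shift_s3 = 100.0 * (1.0 - e_s3 / e_nominal)
print(unfolded_shift_s3)  # approximately [4.35, 3.33]
```

The other two numbers (the unfolded shift for the acceptance change and the reco shift for the efficiency change) pass through the migration matrix, which mixes the bins and is why they differ slightly from these ratios.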

#### Signal modelling uncertainty

We now use the same original nominal sample and invent a signal modelling sample with a different truth distribution $t_{\text{sm}1}$, reco $r_{\text{sm}1}$, migration matrix $M_{\text{sm}1}$, acceptance $a_{\text{sm}1}$ and efficiency $\epsilon_{\text{sm}1}$.

In [93]:
t_sm1 = [520, 310]
M_sm1 = np.array([[0.81, 0.19],
                  [0.19, 0.81]])
a_sm1 = [0.74, 0.77]
e_sm1 = [0.92, 0.91]
r_sm1 = folding(t_sm1, M_sm1, e_sm1, a_sm1)
print("r_sm1 = %s" % str(r_sm1))


r_sm1 = [596.08513513513515, 414.80129870129878]


At reco level, the uncertainty is:

In [94]:
print("Systematic variation difference at reco = %s %%" % str(percentage_difference(r_nominal, r_sm1)))

Systematic variation difference at reco = [-5.9393012680927253, -5.0470821386405875] %


We now have two ways to assess the systematic uncertainty from the signal modelling variation in the unfolded regime:
  1. Unfold $r_{\text{sm}1}$ with the nominal migration matrix, acceptance and efficiency and then compare this to $t_{\text{sm}1}$.
  2. Unfold $r_\text{nominal}$ with the signal modelling migration matrix, acceptance and efficiency and compare this to $t_\text{nominal}$.
  
These should be equivalent.
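
The two methods can be sketched side by side as follows. The three `*_sketch` functions are hypothetical stand-ins for the `helpers` functions, which are not shown; they assume the folding is the inverse of the matrix-inversion unfolding and that the percentage difference is taken relative to the first argument.

```python
import numpy as np


def fold_sketch(t, M, e, a):
    # Assumed folding: r = M (e * t) / a
    return (M @ (e * t)) / a


def unfold_sketch(r, M, a, e):
    # Matrix-inversion unfolding: t = M^-1 (a * r) / e
    return np.linalg.solve(M, a * r) / e


def pct_diff_sketch(reference, varied):
    # Assumed definition: 100 * (reference - varied) / reference
    return 100.0 * (reference - varied) / reference


# Nominal and signal modelling inputs, copied from this notebook.
t_nom, t_sm = np.array([500.0, 300.0]), np.array([520.0, 310.0])
M_nom = np.array([[0.80, 0.20], [0.20, 0.80]])
M_sm = np.array([[0.81, 0.19], [0.19, 0.81]])
a_nom, a_sm = np.array([0.75, 0.78]), np.array([0.74, 0.77])
e_nom, e_sm = np.array([0.92, 0.90]), np.array([0.92, 0.91])

r_nom = fold_sketch(t_nom, M_nom, e_nom, a_nom)
r_sm = fold_sketch(t_sm, M_sm, e_sm, a_sm)

# Method 1: unfold the varied reco with the nominal response, compare to the varied truth.
method_1 = pct_diff_sketch(t_sm, unfold_sketch(r_sm, M_nom, a_nom, e_nom))
# Method 2: unfold the nominal reco with the varied response, compare to the nominal truth.
method_2 = pct_diff_sketch(t_nom, unfold_sketch(r_nom, M_sm, a_sm, e_sm))

print(method_1)  # approximately [-2.06, -1.21]
print(method_2)  # approximately [ 2.00,  1.23]
```

Method 1 applies the nominal response to the varied sample and Method 2 applies the varied response to the nominal sample, so the two estimates carry opposite signs.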

##### Method 1

In [95]:
t_sm1_unfold = unfold(r_sm1, M_nominal, a_nominal, e_nominal)
print("t_sm1_unfold = %s" % str(t_sm1_unfold))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_sm1, t_sm1_unfold)))

t_sm1_unfold = [530.69217116608411, 313.74674096174101]
Systematic variation difference unfolded = [-2.0561867627084824, -1.2086261166906487] %


##### Method 2

In [96]:
t_nominal_unfold = unfold(r_nominal, M_sm1, a_sm1, e_sm1)
print("t_nominal_unfold = %s" % str(t_nominal_unfold))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_nominal, t_nominal_unfold)))

t_nominal_unfold = [489.99413816664855, 296.29671238604243]
Systematic variation difference unfolded = [2.0011723666702892, 1.2344292046525234] %


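Both methods give similar answers but with a sign flip per bin. Now we look at a signal modelling uncertainty which has the same acceptance and efficiency as the nominal and differs only in the migration matrix, $M_{\text{sm}2}$ (the truth distribution is kept the same as $t_{\text{sm}1}$).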

In [101]:
t_sm2 = [520, 310]
M_sm2 = np.array([[0.78, 0.22],
                  [0.22, 0.78]])
a_sm2 = a_nominal
e_sm2 = e_nominal
r_sm2 = folding(t_sm2, M_sm2, e_sm2, a_sm2)
print("r_sm2 = %s" % str(r_sm2))
print("Systematic variation difference at reco = %s %%" % str(percentage_difference(r_nominal, r_sm2)))

print("Method 1")
t_sm2_unfold = unfold(r_sm2, M_nominal, a_nominal, e_nominal)
print("t_sm2_unfold = %s" % str(t_sm2_unfold))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_sm2, t_sm2_unfold)))
print("Method 2")
t_nominal_unfold_sm2 = unfold(r_nominal, M_sm2, a_sm2, e_sm2)
print("t_nominal_unfold_sm2 = %s" % str(t_nominal_unfold_sm2))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_nominal, t_nominal_unfold_sm2)))


r_sm2 = [579.37599999999998, 413.93333333333339]
Systematic variation difference at reco = [-2.9696682464454796, -4.8272727272727298] %
Method 1
t_sm2_unfold = [512.77536231884051, 317.38518518518526]
Systematic variation difference unfolded = [1.3893534002229793, -2.382317801672666] %
Method 2
t_nominal_unfold_sm2 = [507.37577639751566, 292.46031746031753]
Systematic variation difference unfolded = [-1.4751552795031331, 2.5132275132274913] %


Now we look at a signal modelling uncertainty which only has a different acceptance, $a_{\text{sm}3}$.

In [104]:
t_sm3 = [520, 310]
M_sm3 = M_nominal
e_sm3 = e_nominal
a_sm3 = [0.70, 0.65]
r_sm3 = folding(t_sm3, M_sm3, e_sm3, a_sm3)

print("r_sm3 = %s" % str(r_sm3))
print("Systematic variation difference at reco = %s %%" % str(percentage_difference(r_nominal, r_sm3)))

print("Method 1")
t_sm3_unfold = unfold(r_sm3, M_nominal, a_nominal, e_nominal)
print("t_sm3_unfold = %s" % str(t_sm3_unfold))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_sm3, t_sm3_unfold)))
print("Method 2")
t_nominal_unfold_sm3 = unfold(r_nominal, M_sm3, a_sm3, e_sm3)
print("t_nominal_unfold_sm3 = %s" % str(t_nominal_unfold_sm3))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_nominal, t_nominal_unfold_sm3)))

r_sm3 = [626.45714285714291, 490.58461538461535]
Systematic variation difference at reco = [-11.337169939065667, -24.238961038961016] %
Method 1
t_sm3_unfold = [542.28819875776401, 392.88190476190465]
Systematic variation difference unfolded = [-4.2861920688007702, -26.736098310291823] %
Method 2
t_nominal_unfold_sm3 = [477.82608695652175, 234.37037037037044]
Systematic variation difference unfolded = [4.4347826086956506, 21.87654320987652] %


Now we look at a signal modelling uncertainty which only has a different efficiency, $\epsilon_{\text{sm}4}$.

In [109]:
t_sm4 = [520, 310]
M_sm4 = M_nominal
e_sm4 = [0.85, 0.88]
a_sm4 = a_nominal
r_sm4 = folding(t_sm4, M_sm4, e_sm4, a_sm4)

print("r_sm4 = %s" % str(r_sm4))
print("Systematic variation difference at reco = %s %%" % str(percentage_difference(r_nominal, r_sm4)))

print("Method 1")
t_sm4_unfold = unfold(r_sm4, M_nominal, a_nominal, e_nominal)
print("t_sm4_unfold = %s" % str(t_sm4_unfold))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_sm4, t_sm4_unfold)))
print("Method 2")
t_nominal_unfold_sm4 = unfold(r_nominal, M_sm4, a_sm4, e_sm4)
print("t_nominal_unfold_sm4 = %s" % str(t_nominal_unfold_sm4))
print("Systematic variation difference unfolded = %s %%" % str(percentage_difference(t_nominal, t_nominal_unfold_sm4)))

r_sm4 = [544.21333333333337, 393.12820512820508]
Systematic variation difference at reco = [3.2796208530805759, 0.44155844155846513] %
Method 1
t_sm4_unfold = [480.43478260869568, 303.11111111111103]
Systematic variation difference unfolded = [7.6086956522, 2.2222222222] %
Method 2
t_nominal_unfold_sm4 = [541.17647058823536, 306.81818181818187]
Systematic variation difference unfolded = [-8.2352941176470722, -2.2727272727272898] %


With a change in efficiency only, the two methods again give similar magnitudes with a sign flip per bin, as in the previous cases. I think this makes sense: with the migration matrix and acceptance unchanged, each method reduces to a per-bin ratio of the two efficiencies, $1 - \epsilon_{\text{sm}4}/\epsilon_{\text{nominal}}$ for Method 1 and $1 - \epsilon_{\text{nominal}}/\epsilon_{\text{sm}4}$ for Method 2.