In [2]:
import numpy as np
import matplotlib.pyplot as plt
from helpers import percentage_difference, compare_method

Transfer function vs Folding - simple 2 bin example
----------------------------------------------------

We will work through a simple 2 bin example and look at whether using a transfer function or full folding gives a more "*correct*" answer.

##### Question: 
**System 1** has a reco-level distribution `reco_1`, a truth-level distribution `truth_1`, a migration matrix `M_1` and a transfer function `TF_1`.

**System 2** has the equivalent `reco_2`, `truth_2`, migration matrix `M_2` and transfer function `TF_2`.

We want to know whether `M_1` and `TF_1` from **system 1** can be used to get the correct `reco_2` from `truth_2`.
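
The two methods being compared can be sketched as follows. This is only an illustration under stated assumptions: the function names `fold` and `transfer_function` and the toy numbers are hypothetical and are not taken from `helpers`. Folding multiplies the truth vector by the migration matrix, while the transfer function is the bin-by-bin ratio reco/truth.

```python
import numpy as np

def fold(M, truth):
    """Fold a truth spectrum through a migration matrix: reco = M @ truth."""
    return np.matmul(M, truth)

def transfer_function(reco, truth):
    """Bin-by-bin ratio reco / truth."""
    return np.asarray(reco, dtype=float) / np.asarray(truth, dtype=float)

# Toy numbers for illustration only
truth_a = np.array([60.0, 40.0])
M_a = np.array([[0.75, 0.25],
                [0.25, 0.75]])
reco_a = fold(M_a, truth_a)
tf_a = transfer_function(reco_a, truth_a)

# Two ways of predicting the reco spectrum of a second truth spectrum
truth_b = np.array([50.0, 50.0])
folded_b = fold(M_a, truth_b)    # uses the full migration matrix
transfer_b = truth_b * tf_a      # uses only the per-bin ratio from system a
print(folded_b, transfer_b)
```

The folded prediction uses the bin-to-bin migrations explicitly, whereas the transfer function bakes the shape of `truth_a` into a single number per bin.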

First we will look at two simple cases: `M_1=M_2` and `M_1!=M_2`


`M_1 = M_2`
----------------------------------------------------

In [3]:
truth_1 = np.array([50, 20])
M_1 = np.array([[0.8, 0.2],
                [0.2, 0.8]])
reco_1 = np.matmul(M_1, truth_1)
TF_1 = reco_1/truth_1
print("Our truth_1 = %s and our reco_1 = %s" % (str(truth_1), str(reco_1)))
print("The migration matrix M_1 = ")
print(np.matrix(M_1))
print("The transfer function TF_1 = %s" % str(TF_1))

Our truth_1 = [50 20] and our reco_1 = [ 44.  26.]
The migration matrix M_1 = 
[[ 0.8  0.2]
 [ 0.2  0.8]]
The transfer function TF_1 = [ 0.88  1.3 ]


In [4]:
truth_2 = np.array([45, 25])
M_2 = M_1
reco_2 = np.matmul(M_2, truth_2)
TF_2 = reco_2/truth_2
print("Our truth_2 = %s and our reco_2 = %s" % (str(truth_2), str(reco_2)))
print("The migration matrix M_2 = ")
print(np.matrix(M_2))
print("The transfer function TF_2 = %s" % str(TF_2))

Our truth_2 = [45 25] and our reco_2 = [ 41.  29.]
The migration matrix M_2 = 
[[ 0.8  0.2]
 [ 0.2  0.8]]
The transfer function TF_2 = [ 0.91111111  1.16      ]


##### If we now apply `M_1` and `TF_1` to `truth_2`, which method is closest to the actual `reco_2`?

In [5]:
reco_2_folded = np.matmul(M_1, truth_2)
reco_2_transfer = truth_2 * TF_1
print(reco_2_transfer)
print("Difference between folded and actual reco_2 = %s %%" % str(percentage_difference(reco_2, reco_2_folded)))
print("Difference between transfer function and actual reco_2 = %s %%" % str(percentage_difference(reco_2, reco_2_transfer)))
reco_2_folded_80 = reco_2_folded
reco_2_transfer_80  = reco_2_transfer

[ 39.6  32.5]
Difference between folded and actual reco_2 = [0.0, 0.0] %
Difference between transfer function and actual reco_2 = [3.4146341463414602, -12.068965517241379] %


With both migration matrices being equal, the folding method gives exact closure, as expected. The transfer function, however, gives quite a different result. This suggests that if the migration matrices of the two systems are the same, the folding method will be more accurate.
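
The non-closure of the transfer function can be made explicit. Writing $t_1$, $t_2$ for `truth_1`, `truth_2` and $M$ for the shared migration matrix (this shorthand is introduced here for illustration and is not used elsewhere in the notebook), the two estimates of bin $i$ of `reco_2` are

$$
r^{\mathrm{fold}}_{2,i} = \sum_j M_{ij}\, t_{2,j}, \qquad
r^{\mathrm{TF}}_{2,i} = t_{2,i}\,\frac{\sum_j M_{ij}\, t_{1,j}}{t_{1,i}}
$$

so that

$$
r^{\mathrm{fold}}_{2,i} - r^{\mathrm{TF}}_{2,i} = \sum_{j \neq i} M_{ij}\left(t_{2,j} - t_{1,j}\,\frac{t_{2,i}}{t_{1,i}}\right)
$$

For bin 1 of the example above this gives $0.2 \times (25 - 20 \times 45/50) = 1.4$, which is the gap between 41 and 39.6. The difference involves only the off-diagonal elements of $M$, and each term vanishes when $t_{2,j}/t_{2,i} = t_{1,j}/t_{1,i}$, that is, when the two truth spectra have the same shape.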

We now look at how this changes with the diagonality of the migration matrix.

##### 70% diagonal

In [6]:
M_1 = np.array([[0.7, 0.3],
                [0.3, 0.7]])
compare_method(truth_1, M_1, truth_2, M_1)


Our truth_1 = [50 20] and our reco_1 = [ 41.  29.]
The migration matrix M_1 = 
[[ 0.7  0.3]
 [ 0.3  0.7]]
The transfer function TF_1 = [ 0.82  1.45]
Our truth_2 = [45 25] and our reco_2 = [ 39.  31.]
The migration matrix M_2 = 
[[ 0.7  0.3]
 [ 0.3  0.7]]
The transfer function TF_2 = [ 0.86666667  1.24      ]
[ 36.9   36.25]
Difference between folded and actual reco_2 = [0.0, 0.0] %
Difference between transfer function and actual reco_2 = [5.3846153846153886, -16.93548387096774] %


##### 90% diagonal

In [7]:
M_1 = np.array([[0.9, 0.1],
                [0.1, 0.9]])
compare_method(truth_1, M_1, truth_2, M_1)

Our truth_1 = [50 20] and our reco_1 = [ 47.  23.]
The migration matrix M_1 = 
[[ 0.9  0.1]
 [ 0.1  0.9]]
The transfer function TF_1 = [ 0.94  1.15]
Our truth_2 = [45 25] and our reco_2 = [ 43.  27.]
The migration matrix M_2 = 
[[ 0.9  0.1]
 [ 0.1  0.9]]
The transfer function TF_2 = [ 0.95555556  1.08      ]
[ 42.3   28.75]
Difference between folded and actual reco_2 = [0.0, 0.0] %
Difference between transfer function and actual reco_2 = [1.6279069767441927, -6.4814814814814685] %


As expected, changing the diagonality of the migration matrix makes no difference to the folding approach - it always closes. 

For the transfer function, we find that the more diagonal the migration matrix, the smaller the error. This is summarised in the table below:

| % diagonality | Bin 1 % difference | Bin 2 % difference |
| ------------: | -----------------: | -----------------: |
| 70%           | 5.4                | -16.9              |
| 80%           | 3.4                | -12.1              |
| 90%           | 1.6                | -6.5               |
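
The table can be reproduced with a short sweep over the diagonal element. This is a sketch only: `percentage_difference` below is a stand-in for the helper of the same name, whose sign convention (actual minus estimate, relative to actual) is an assumption inferred from the printed outputs.

```python
import numpy as np

def percentage_difference(actual, estimate):
    """Signed difference relative to the actual value, in percent (assumed convention)."""
    actual = np.asarray(actual, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return 100.0 * (actual - estimate) / actual

truth_1 = np.array([50.0, 20.0])
truth_2 = np.array([45.0, 25.0])

rows = []
for d in (0.7, 0.8, 0.9):
    # Symmetric 2x2 migration matrix with diagonal element d, shared by both systems
    M = np.array([[d, 1.0 - d],
                  [1.0 - d, d]])
    tf_1 = np.matmul(M, truth_1) / truth_1   # transfer function from system 1
    reco_2 = np.matmul(M, truth_2)           # actual reco_2 (folding closes exactly)
    diff = percentage_difference(reco_2, truth_2 * tf_1)
    rows.append((d, diff))
    print("%.0f%% diagonal: bin 1 = %.1f %%, bin 2 = %.1f %%" % (100 * d, diff[0], diff[1]))
```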

There is one special case where the transfer function is the same for both systems: when the two truth distributions differ only by an overall scale, i.e.:

In [8]:
truth_1 = np.array([50, 20])
M_1 = np.array([[0.8, 0.2],
                [0.2, 0.8]])
truth_2 = 1.1 * truth_1

compare_method(truth_1, M_1, truth_2, M_1)

Our truth_1 = [50 20] and our reco_1 = [ 44.  26.]
The migration matrix M_1 = 
[[ 0.8  0.2]
 [ 0.2  0.8]]
The transfer function TF_1 = [ 0.88  1.3 ]
Our truth_2 = [ 55.  22.] and our reco_2 = [ 48.4  28.6]
The migration matrix M_2 = 
[[ 0.8  0.2]
 [ 0.2  0.8]]
The transfer function TF_2 = [ 0.88  1.3 ]
[ 48.4  28.6]
Difference between folded and actual reco_2 = [0.0, 0.0] %
Difference between transfer function and actual reco_2 = [0.0, 0.0] %
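
The closure in the scaled case does not depend on the particular factor 1.1. Folding is linear, so scaling the truth scales the reco by the same factor and leaves the reco/truth ratio unchanged. A minimal check, where the scale factors are arbitrary choices made for illustration:

```python
import numpy as np

truth_1 = np.array([50.0, 20.0])
M = np.array([[0.8, 0.2],
              [0.2, 0.8]])
tf_1 = np.matmul(M, truth_1) / truth_1

max_abs_diff = 0.0
for scale in (0.5, 1.1, 3.0):
    truth_2 = scale * truth_1
    reco_2 = np.matmul(M, truth_2)      # actual reco_2
    reco_2_transfer = truth_2 * tf_1    # transfer function estimate
    gap = float(np.max(np.abs(reco_2 - reco_2_transfer)))
    max_abs_diff = max(max_abs_diff, gap)
    print(scale, np.allclose(reco_2, reco_2_transfer))
```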


`M_1 != M_2`
----------------------------------------------------
Now we will look at the case where the two migration matrices are not equal. In this case we will keep **system 1** the same but change **system 2**.

In [9]:
truth_1 = np.array([50, 20])
M_1 = np.array([[0.8, 0.2],
                [0.2, 0.8]])

truth_2 = np.array([45, 25])
M_2 = np.array([[0.9, 0.1],
                [0.1, 0.9]])

compare_method(truth_1, M_1, truth_2, M_2)

Our truth_1 = [50 20] and our reco_1 = [ 44.  26.]
The migration matrix M_1 = 
[[ 0.8  0.2]
 [ 0.2  0.8]]
The transfer function TF_1 = [ 0.88  1.3 ]
Our truth_2 = [45 25] and our reco_2 = [ 43.  27.]
The migration matrix M_2 = 
[[ 0.9  0.1]
 [ 0.1  0.9]]
The transfer function TF_2 = [ 0.95555556  1.08      ]
[ 39.6  32.5]
Difference between folded and actual reco_2 = [4.6511627906976747, -7.4074074074074066] %
Difference between transfer function and actual reco_2 = [7.9069767441860437, -20.37037037037037] %


The closure seen in the scaled example (`truth_2 = 1.1 * truth_1`) makes sense: there is only a scale difference between the two systems, so the reco/truth ratio stays constant in each bin and the transfer function closes. This is a very unlikely case and can therefore be ignored for practical purposes.
##### If we now apply `M_1` and `TF_1` to `truth_2`, which method is closest to the actual `reco_2`?
With `M_1 != M_2`, neither the folded case nor the transfer function case closes any more. In this particular case, where `M_1` is less diagonal than `M_2`, the folding method is more accurate. In the opposite situation, where `M_1` is more diagonal than `M_2`:

In [10]:
truth_1 = np.array([50, 20])
M_1 = np.array([[0.9, 0.1],
                [0.1, 0.9]])

truth_2 = np.array([45, 25])
M_2 = np.array([[0.8, 0.2],
                [0.2, 0.8]])
compare_method(truth_1, M_1, truth_2, M_2)


Our truth_1 = [50 20] and our reco_1 = [ 47.  23.]
The migration matrix M_1 = 
[[ 0.9  0.1]
 [ 0.1  0.9]]
The transfer function TF_1 = [ 0.94  1.15]
Our truth_2 = [45 25] and our reco_2 = [ 41.  29.]
The migration matrix M_2 = 
[[ 0.8  0.2]
 [ 0.2  0.8]]
The transfer function TF_2 = [ 0.91111111  1.16      ]
[ 42.3   28.75]
Difference between folded and actual reco_2 = [-4.8780487804878048, 6.8965517241379306] %
Difference between transfer function and actual reco_2 = [-3.1707317073170662, 0.86206896551725365] %


We find that the opposite is true: the transfer function is more accurate. Here we are using significantly different migration matrices, when in fact for our situation they are likely to be much closer. Trying with more similar migration matrices:

In [11]:
truth_1 = np.array([50, 20])
M_1 = np.array([[0.82, 0.18],
                [0.18, 0.82]])

truth_2 = np.array([45, 25])
M_2 = np.array([[0.8, 0.2],
                [0.2, 0.8]])

compare_method(truth_1, M_1, truth_2, M_2)

Our truth_1 = [50 20] and our reco_1 = [ 44.6  25.4]
The migration matrix M_1 = 
[[ 0.82  0.18]
 [ 0.18  0.82]]
The transfer function TF_1 = [ 0.892  1.27 ]
Our truth_2 = [45 25] and our reco_2 = [ 41.  29.]
The migration matrix M_2 = 
[[ 0.8  0.2]
 [ 0.2  0.8]]
The transfer function TF_2 = [ 0.91111111  1.16      ]
[ 40.14  31.75]
Difference between folded and actual reco_2 = [-0.97560975609755751, 1.3793103448275814] %
Difference between transfer function and actual reco_2 = [2.0975609756097549, -9.4827586206896548] %


In this case, where the migration matrices are much closer, the folding method is more accurate. The same case with `M_1` and `M_2` swapped:

In [12]:
truth_1 = np.array([50, 20])
M_1 = np.array([[0.8, 0.2],
                [0.2, 0.8]])

truth_2 = np.array([45, 25])
M_2 = np.array([[0.82, 0.18],
                [0.18, 0.82]])

compare_method(truth_1, M_1, truth_2, M_2)

Our truth_1 = [50 20] and our reco_1 = [ 44.  26.]
The migration matrix M_1 = 
[[ 0.8  0.2]
 [ 0.2  0.8]]
The transfer function TF_1 = [ 0.88  1.3 ]
Our truth_2 = [45 25] and our reco_2 = [ 41.4  28.6]
The migration matrix M_2 = 
[[ 0.82  0.18]
 [ 0.18  0.82]]
The transfer function TF_2 = [ 0.92   1.144]
[ 39.6  32.5]
Difference between folded and actual reco_2 = [0.96618357487922368, -1.3986013986013937] %
Difference between transfer function and actual reco_2 = [4.3478260869565144, -13.63636363636363] %


Again the folding method is more accurate. It seems that the closer the two migration matrices are, the better the folding method will be compared to the transfer function. This makes sense, as we have already shown that when the migration matrices are the same, the folding closes whereas the transfer function has an error.
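
The dependence on how close `M_1` is to `M_2` can be scanned directly. The sketch below keeps `M_2` at 80% diagonal and varies the diagonal element of `M_1`; the choice of the largest absolute per-bin percentage difference as the figure of merit, and the helper name `max_abs_pct_diff`, are assumptions made for illustration.

```python
import numpy as np

def max_abs_pct_diff(actual, estimate):
    """Largest absolute per-bin percentage difference (illustrative metric)."""
    return float(np.max(np.abs(100.0 * (actual - estimate) / actual)))

truth_1 = np.array([50.0, 20.0])
truth_2 = np.array([45.0, 25.0])
M_2 = np.array([[0.8, 0.2],
                [0.2, 0.8]])
reco_2 = np.matmul(M_2, truth_2)   # actual reco_2

results = {}
for d in (0.70, 0.75, 0.80, 0.82, 0.85, 0.90):
    M_1 = np.array([[d, 1.0 - d],
                    [1.0 - d, d]])
    tf_1 = np.matmul(M_1, truth_1) / truth_1
    fold_err = max_abs_pct_diff(reco_2, np.matmul(M_1, truth_2))
    tf_err = max_abs_pct_diff(reco_2, truth_2 * tf_1)
    results[d] = (fold_err, tf_err)
    print("d = %.2f: folding %.2f %%, transfer function %.2f %%" % (d, fold_err, tf_err))
```

For these toy numbers the folding error is the smaller of the two for `M_1` diagonals of 0.85 and below, while at 0.90 the transfer function error is smaller, in line with the individual cases worked through above.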
# Conclusions
The takeaway messages from this are:
* If the two migration matrices are the same, the folding method closes exactly, so it is always at least as accurate as the transfer function.

  * The transfer function method becomes more accurate the more diagonal the migration matrix is.
  * In the special case where the two truth distributions differ only by a constant scale, the transfer function also closes, but this is an unlikely case.

* If the two migration matrices are different and the difference between them is quite large, the best option depends on the numbers involved.

  * In the case where the two migration matrices are similar (as in our scenario), the folding method is better. This follows from the first point.

The next step is to add the efficiency and acceptance terms, but the overall conclusions are expected to be the same.




# Inclusion of acceptance and efficiency 
The same process will be investigated, but with the inclusion of acceptance and efficiency. The values used for acceptance and efficiency are similar to those found in our case.

In [26]:
truth_1 = np.array([50, 20])
M_1 = np.array([[0.8, 0.2],
                [0.2, 0.8]])
acceptance_1 = [0.76, 0.68]
efficiency_1 = [0.97, 0.98]


truth_2 = np.array([45, 25])
M_2 = M_1
acceptance_2 = acceptance_1
efficiency_2 = efficiency_1

compare_method(truth_1, M_1, truth_2, M_2, acceptance_1, efficiency_1, acceptance_2, efficiency_2)


Our truth_1 = [50 20] and our reco_1 = [56.21052631578948, 37.32352941176471]
The migration matrix M_1 = 
[[ 0.8  0.2]
 [ 0.2  0.8]]
The transfer function TF_1 = [ 1.12421053  1.86617647]
Our truth_2 = [45 25] and our reco_2 = [52.39473684210526, 41.661764705882355]
The migration matrix M_2 = 
[[ 0.8  0.2]
 [ 0.2  0.8]]
The transfer function TF_2 = [ 1.16432749  1.66647059]
[ 50.58947368  46.65441176]
Difference between folded and actual reco_2 = [0.0, 0.0] %
Difference between transfer function and actual reco_2 = [3.4455047714716067, -11.983762795623031] %


We again see perfect closure for the folding and a fairly large difference for the transfer function. We now look at how this changes with the diagonality of the migration matrix.
##### 70%

In [29]:
M_1 = np.array([[0.7, 0.3],
                [0.3, 0.7]])
M_2 = M_1
compare_method(truth_1, M_1, truth_2, M_2, acceptance_1, efficiency_1, acceptance_2, efficiency_2)


Our truth_1 = [50 20] and our reco_1 = [52.407894736842103, 41.573529411764696]
The migration matrix M_1 = 
[[ 0.7  0.3]
 [ 0.3  0.7]]
The transfer function TF_1 = [ 1.04815789  2.07867647]
Our truth_2 = [45 25] and our reco_2 = [49.874999999999993, 44.47794117647058]
The migration matrix M_2 = 
[[ 0.7  0.3]
 [ 0.3  0.7]]
The transfer function TF_2 = [ 1.10833333  1.77911765]
[ 47.16710526  51.96691176]
Difference between folded and actual reco_2 = [0.0, 0.0] %
Difference between transfer function and actual reco_2 = [5.4293628808864307, -16.837493800628199] %


##### 90%

In [28]:
M_1 = np.array([[0.9, 0.1],
                [0.1, 0.9]])
M_2 = M_1
compare_method(truth_1, M_1, truth_2, M_2, acceptance_1, efficiency_1, acceptance_2, efficiency_2)


Our truth_1 = [50 20] and our reco_1 = [60.013157894736842, 33.073529411764703]
The migration matrix M_1 = 
[[ 0.9  0.1]
 [ 0.1  0.9]]
The transfer function TF_1 = [ 1.20026316  1.65367647]
Our truth_2 = [45 25] and our reco_2 = [54.914473684210527, 38.845588235294116]
The migration matrix M_2 = 
[[ 0.9  0.1]
 [ 0.1  0.9]]
The transfer function TF_2 = [ 1.22032164  1.55382353]
[ 54.01184211  41.34191176]
Difference between folded and actual reco_2 = [0.0, 0.0] %
Difference between transfer function and actual reco_2 = [1.6437043249071446, -6.4262729509748153] %


This can be summarised in the table below:

| % diagonality | Bin 1 % difference | Bin 2 % difference |
| ------------: | -----------------: | -----------------: |
| 70%           | 5.4                | -16.8              |
| 80%           | 3.4                | -12.0              |
| 90%           | 1.6                | -6.4               |

We see almost identical results to those obtained using only the migration matrix.
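
`compare_method` is not shown in this notebook, but the printed `reco` values are reproduced by folding the efficiency-weighted truth and then dividing by the acceptance. The sketch below assumes that form, which is inferred from the outputs rather than taken from `helpers`; `fold_with_corrections` is a hypothetical name.

```python
import numpy as np

def fold_with_corrections(M, truth, acceptance, efficiency):
    """Assumed form: reco = M @ (truth * efficiency) / acceptance."""
    truth = np.asarray(truth, dtype=float)
    weighted_truth = truth * np.asarray(efficiency, dtype=float)
    return np.matmul(M, weighted_truth) / np.asarray(acceptance, dtype=float)

truth_1 = np.array([50, 20])
truth_2 = np.array([45, 25])
M = np.array([[0.8, 0.2],
              [0.2, 0.8]])
acceptance = [0.76, 0.68]
efficiency = [0.97, 0.98]

reco_1 = fold_with_corrections(M, truth_1, acceptance, efficiency)
reco_2 = fold_with_corrections(M, truth_2, acceptance, efficiency)
reco_2_transfer = truth_2 * (reco_1 / truth_1)   # transfer function estimate
pct_diff = 100.0 * (reco_2 - reco_2_transfer) / reco_2
print(reco_1, reco_2, pct_diff)
```

Under this assumed form, when both systems share the same acceptance, the per-bin 1/acceptance factor multiplies both the actual `reco_2` and the transfer function estimate, so it cancels in the percentage difference. Only the nearly flat efficiency changes the numbers, which would explain why the table barely moves.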


`M_1 != M_2`
----------------------------------------------------
Now we will look at the case where the two migration matrices are not equal and the acceptance and efficiency are also changed. In this case we will keep **system 1** the same but change **system 2**.

In [35]:
truth_1 = np.array([50, 20])
M_1 = np.array([[0.8, 0.2],
                [0.2, 0.8]])
acceptance_1 = [0.76, 0.68]
efficiency_1 = [0.97, 0.98]

truth_2 = np.array([45, 25])
M_2 = np.array([[0.82, 0.18],
                [0.18, 0.82]])
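# NOTE: these two lines rebind acceptance_1/efficiency_1, so system 1 uses the
# values below, while acceptance_2/efficiency_2 keep the values set in the
# earlier cells ([0.76, 0.68] and [0.97, 0.98]); the output below reflects this.
acceptance_1 = [0.77, 0.67]
efficiency_1 = [0.96, 0.97]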

compare_method(truth_1, M_1, truth_2, M_2, acceptance_1, efficiency_1, acceptance_2, efficiency_2)



Our truth_1 = [50 20] and our reco_1 = [54.909090909090907, 37.492537313432827]
The migration matrix M_1 = 
[[ 0.8  0.2]
 [ 0.2  0.8]]
The transfer function TF_1 = [ 1.09818182  1.87462687]
Our truth_2 = [45 25] and our reco_2 = [52.898684210526298, 41.098529411764702]
The migration matrix M_2 = 
[[ 0.82  0.18]
 [ 0.18  0.82]]
The transfer function TF_2 = [ 1.17552632  1.64394118]
[ 49.41818182  46.86567164]
Difference between folded and actual reco_2 = [3.2455741656547148, -1.8302768192885577] %
Difference between transfer function and actual reco_2 = [6.5795632619003772, -14.03247832117188] %


For this example, realistic values for the migration matrix, acceptance and efficiency terms are used, chosen so that the difference between the two systems is reasonably small. Overall we again see that the folding method produces a more accurate result.