# Exercise 1: Optimize lepton selection

* First, plot the distributions of the relevant variables for *all* the Monte Carlo samples (i.e. all the *channels* of the $Z$-boson decay to be studied). Which variables are these? Choose sensible ranges that include all the events in the samples (both MC and OPAL data).
* Do the same for **one** of the OPAL data samples (your lab assistant will decide which one).
* Describe the results.
* Optimize the object selection by applying cuts. Devise a strategy for finding the optimal selection. Which information do you need?
* Determine the efficiency and the amount of background for each $Z$ decay channel. Use the simulated $e^+e^-$, $\mu^+\mu^-$, $\tau^+\tau^-$ and hadron ($qq$) events. Represent the result in matrix form and think carefully about how the measured rates have to be corrected. Don't forget to calculate the errors!
* How do we estimate the statistical fluctuations per bin?
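For a histogram filled with independent events, the count $N_i$ in bin $i$ is Poisson-distributed, so its statistical fluctuation is commonly estimated as $\sqrt{N_i}$. A minimal NumPy sketch, using randomly generated toy values in place of a real branch such as `Pcharged` (the toy data and the helper name `hist_with_errors` are illustrative assumptions, not part of the exercise code):

```python
import numpy as np

def hist_with_errors(values, bins):
    """Histogram `values` and return (counts, edges, errors) with Poisson errors per bin."""
    counts, edges = np.histogram(values, bins=bins)
    errors = np.sqrt(counts)  # Poisson approximation: sigma_i = sqrt(N_i)
    return counts, edges, errors

# Hypothetical toy data standing in for a real variable
rng = np.random.default_rng(seed=1)
toy = rng.normal(loc=90.0, scale=5.0, size=10_000)

counts, edges, errors = hist_with_errors(toy, bins=np.arange(60, 121, 5))
centers = 0.5 * (edges[:-1] + edges[1:])
for c, n, e in zip(centers, counts, errors):
    print(f"bin centre {c:6.1f}: {int(n):5d} +/- {e:.1f}")
```

The $\sqrt{N_i}$ estimate becomes unreliable for bins with very few entries (it even gives zero error for an empty bin), so fine binnings with sparse bins deserve extra care.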

#### Import libraries

In [1]:
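%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import awkward as ak  # jagged arrays, as returned by uproot
import mplhep         # HEP plotting styles, e.g. mplhep.style.ATLAS
import uproot         # reads ROOT files without a ROOT installation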

#### Load the data

In [2]:
### Specify the folder path for MC samples
path_data = 'data/'

### Open the ROOT file of each MC sample
mc_data_ee = uproot.open(path_data+'ee.root')
mc_data_mm = uproot.open(path_data+'mm.root')
mc_data_qq = uproot.open(path_data+'qq.root')
mc_data_tt = uproot.open(path_data+'tt.root')
ttree_name = 'myTTree'

### Print list of 'branches' of the TTree (i.e. list of variable names)
print(mc_data_ee[ttree_name].keys())

### Load branches
branches_ee = mc_data_ee[ttree_name].arrays()
branches_mm = mc_data_mm[ttree_name].arrays()
branches_qq = mc_data_qq[ttree_name].arrays()
branches_tt = mc_data_tt[ttree_name].arrays()


### List of variables (of interest)
variables=['Pcharged', 'Ncharged', 'E_ecal', 'E_hcal']
channels=['ee','mm','qq','tt'] 

### Store the branches in a dictionary keyed by channel (used later to compute the efficiency matrix)
mc_data = {'ee' : branches_ee, 'mm' : branches_mm, 'qq' : branches_qq, 'tt' : branches_tt}

### Read in the variables for all MC data samples
Pchar=ak.Array([branches_ee[variables[0]],branches_mm[variables[0]],branches_qq[variables[0]],branches_tt[variables[0]]])
Nchar=ak.Array([branches_ee[variables[1]],branches_mm[variables[1]],branches_qq[variables[1]],branches_tt[variables[1]]])
E_ecal=ak.Array([branches_ee[variables[2]],branches_mm[variables[2]],branches_qq[variables[2]],branches_tt[variables[2]]])
E_hcal=ak.Array([branches_ee[variables[3]],branches_mm[variables[3]],branches_qq[variables[3]],branches_tt[variables[3]]])

['run', 'event', 'Ncharged', 'Pcharged', 'E_ecal', 'E_hcal', 'E_lep', 'cos_thru', 'cos_thet']


#### Plot the MC distributions as histograms

In [3]:
labels=[r'$Z^0$ $\to$ $e^+e^-$',r'$Z^0$ $\to$ $\mu^+\mu^-$',r'$Z^0$ $\to$ hadrons',r'$Z^0$ $\to$ $\tau^+\tau^-$']
plt.style.use(mplhep.style.ATLAS) # load ATLAS plot style

plt.title(r'PCharged: total sum of charged momenta')
for i, label in enumerate(labels):
    plt.hist(ak.to_numpy(Pchar[i]),bins=np.arange(0,120,0.1),label=label,alpha=0.7)
plt.xlim(0,120)
plt.ylim(0,2200)
plt.xlabel(r'PCharged [GeV]')
plt.legend()
plt.savefig('pchar_hist.png')
plt.show()

In [4]:
plt.title(r'NCharged: charged multiplicity')
for i, label in enumerate(labels):
    plt.hist(ak.to_numpy(Nchar[i]),bins=np.arange(0,60,1),label=label,alpha=0.7)
plt.xlim(0,45)
plt.ylim(0,100000)
plt.xlabel(r'NCharged')
plt.legend()
plt.savefig('nchar_hist.png')
plt.show()

In [5]:
plt.title(r'E_ECal: total energy in the ECal')
for i, label in enumerate(labels):
    plt.hist(ak.to_numpy(E_ecal[i]),bins=np.arange(0,120,0.1),label=label,alpha=0.7)
plt.xlim(0,120)
plt.ylim(0,5000)
plt.xlabel(r'E_ECal [GeV]')
plt.legend()
plt.savefig('e_ecal_hist.png')
plt.show()

In [6]:
plt.title(r'E_HCal: total energy in the HCal')
for i, label in enumerate(labels):
    plt.hist(ak.to_numpy(E_hcal[i]),bins=np.arange(0,80,0.08),label=label,alpha=0.7)
plt.xlim(0,80)
plt.ylim(0,3000)
plt.xlabel(r'E_HCal [GeV]')
plt.legend()
plt.savefig('e_hcal_hist.png')
plt.show()

#### Cut strategy
1. $N_\text{charged} \begin{cases} > 7 &\Rightarrow Z^0 \to \text{hadrons; stop.}\\ < 7 &\Rightarrow Z^0 \to e^+e^-,\ \mu^+\mu^-,\ \tau^+\tau^-\text{; go to 2.}\end{cases}$
2. $E_\text{ecal} \begin{cases} > 70\ \text{GeV} &\Rightarrow Z^0 \to e^+e^-\text{; stop.}\\ < 70\ \text{GeV} &\Rightarrow Z^0 \to \mu^+\mu^-,\ \tau^+\tau^-\text{; go to 3.}\end{cases}$
3. $P_\text{charged} \begin{cases} > 63\ \text{GeV} &\Rightarrow Z^0 \to \mu^+\mu^-\text{; stop.}\\ < 63\ \text{GeV} &\Rightarrow Z^0 \to \tau^+\tau^-\text{; stop.}\end{cases}$
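The cut strategy is a sequential decision tree: each event is assigned to the first category whose condition it satisfies. A minimal NumPy sketch, assuming the three variables are available as plain NumPy arrays; the function name `classify` and the toy events are hypothetical. The strategy does not say where events lying exactly on a threshold belong; in this sketch they continue down the tree (so an event with exactly 7 charged tracks is treated as leptonic).

```python
import numpy as np

def classify(ncharged, e_ecal, pcharged, n_cut=7, e_cut=70.0, p_cut=63.0):
    """Assign each event to 'qq', 'ee', 'mm' or 'tt' following the sequential cuts.

    np.select returns the choice of the first condition that is True, which
    reproduces the 'stop at the first matching step' logic of the tree.
    """
    conditions = [
        ncharged > n_cut,   # step 1: high multiplicity -> hadrons
        e_ecal > e_cut,     # step 2: large ECAL energy -> electrons
        pcharged > p_cut,   # step 3: large charged momentum -> muons
    ]
    choices = ['qq', 'ee', 'mm']
    return np.select(conditions, choices, default='tt')  # everything else -> taus

# Hypothetical toy events: Ncharged, E_ecal [GeV], Pcharged [GeV]
ncharged = np.array([20, 2, 2, 4])
e_ecal = np.array([50.0, 90.0, 2.0, 30.0])
pcharged = np.array([40.0, 88.0, 90.0, 30.0])

print(classify(ncharged, e_ecal, pcharged))  # ['qq' 'ee' 'mm' 'tt']
```

Writing the thresholds as keyword arguments makes it easy to scan them when optimizing the selection.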

In [7]:
### Initial try for the cuts: a (lower, upper) window for each variable, per channel
cuts = {
    'ee' : {'Ncharged' : (0, 7), 'E_ecal' : (70, 120)},
    'mm' : {'Pcharged' : (63, 120), 'Ncharged' : (0, 7), 'E_ecal' : (0, 70)},
    'tt' : {'Pcharged' : (0, 63), 'Ncharged' : (0, 7), 'E_ecal' : (0, 70)},
    'qq' : {'Ncharged' : (7, 100)},
}
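Each entry of `cuts` is a (lower, upper) window per variable, and an event should be selected for a channel only if it passes all windows of that channel (logical AND). A minimal sketch of turning one such dictionary into a boolean mask; the helper name `selection_mask` is hypothetical, and treating the lower bound as inclusive and the upper bound as exclusive is an assumption that may differ from what `get_efficiency` does.

```python
import numpy as np

def selection_mask(events, channel_cuts):
    """Boolean mask of the events passing every (lower, upper) window in channel_cuts.

    `events` only needs to support events[variable_name], e.g. a dict of NumPy
    arrays or an awkward record array as returned by uproot. Returns None if
    channel_cuts is empty.
    """
    mask = None
    for var, (low, high) in channel_cuts.items():
        values = np.asarray(events[var])
        passed = (values >= low) & (values < high)  # lower inclusive, upper exclusive (assumption)
        mask = passed if mask is None else mask & passed
    return mask

# Hypothetical toy sample with three events
events = {
    'Ncharged': np.array([2, 2, 15]),
    'E_ecal': np.array([90.0, 10.0, 60.0]),
    'Pcharged': np.array([85.0, 80.0, 45.0]),
}
ee_cuts = {'Ncharged': (0, 7), 'E_ecal': (70, 120)}
print(selection_mask(events, ee_cuts))  # [ True False False]
```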

In [8]:
from functions import get_efficiency

efficiency_matrix = get_efficiency(mc_data,channels,variables,cuts)
print(efficiency_matrix)

[[9.90869065e-01 5.00005298e-01 5.15406390e-02 4.99280430e-01]
 [4.75338124e-01 9.75076198e-01 3.69641752e-01 6.78255527e-01]
 [2.66518838e-04 0.00000000e+00 9.94724187e-01 1.23210544e-02]
 [5.34515966e-01 6.89022861e-01 5.65232389e-01 9.72050395e-01]]
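If element $(i, j)$ of the efficiency matrix $\varepsilon$ is the fraction of simulated events of true channel $j$ that pass the selection for channel $i$, then each entry is a binomial proportion with uncertainty $\sqrt{\varepsilon_{ij}(1-\varepsilon_{ij})/N_j}$, the off-diagonal entries describe the background from other channels, and the true numbers of events follow from the observed ones via $N_\text{true} = \varepsilon^{-1} N_\text{obs}$. Because the cut sets above are mutually exclusive, each column of such a matrix should sum to at most 1, which is a useful sanity check. A sketch with made-up counts (not results of this analysis); the rows = selected, columns = true convention is an assumption and may differ from the one used by `get_efficiency`.

```python
import numpy as np

def efficiency_with_errors(n_pass, n_total):
    """eps[i, j] = n_pass[i, j] / n_total[j] and its binomial uncertainty."""
    n_pass = np.asarray(n_pass, dtype=float)
    n_total = np.asarray(n_total, dtype=float)
    eps = n_pass / n_total                      # broadcasting divides column j by n_total[j]
    err = np.sqrt(eps * (1.0 - eps) / n_total)  # binomial error on a proportion
    return eps, err

# Hypothetical counts in channel order ee, mm, qq, tt:
# n_pass[i, j] = events of true channel j selected as channel i
n_pass = np.array([
    [980,   0,   0,  20],
    [  0, 970,   0,  25],
    [  0,   0, 990,  10],
    [ 10,  20,   5, 900],
])
n_total = np.array([1000, 1000, 1000, 1000])  # simulated events per true channel

eps, err = efficiency_with_errors(n_pass, n_total)
assert np.all(eps.sum(axis=0) <= 1.0)  # exclusive selections: each column sums to at most 1

# Hypothetical observed counts per selected channel, corrected for efficiency and background
n_obs = np.array([500.0, 480.0, 9000.0, 450.0])
n_true = np.linalg.solve(eps, n_obs)  # solves eps @ n_true = n_obs, i.e. n_true = eps^-1 n_obs
print(np.round(n_true, 1))
```

`np.linalg.solve` avoids forming the inverse explicitly. The uncertainties `err` still have to be propagated to `n_true`, for example by Gaussian error propagation or by repeating the solve with matrices smeared within `err`.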
