# Helicity as a discriminator in a search for charged Higgs bosons

The goal of this exercise is to use theoretical knowledge of helicity to help us in experimental analysis. 

The Helsinki Institute of Physics is involved in a search for charged Higgs bosons in the CMS collaboration. If a second scalar doublet is added to the Standard Model, there are five physical Higgs bosons in total: three are neutral and two carry electric charge. Models predicting these bosons are called *two-Higgs-doublet models*. They are a first step towards, for instance, supersymmetry, so an observation of a charged Higgs boson would be clear evidence of physics beyond the Standard Model. 

We will use two simulated data sets in our analysis: 

- Charged Higgs boson signal sample
- W + jet background sample 

With these we will study the decay of charged Higgs bosons into a tau lepton and a tau neutrino.


When searching for new particles, we impose cuts on the data to increase the signal-over-background ratio, that is, the number of signal events divided by the number of background events that survive the cut. A cut usually removes background but also some signal, so we want each cut to remove only a small amount of signal and a large amount of background. 
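
As an illustration with invented numbers (they are not taken from the data sets used here), suppose a sample contains 1000 signal events and 10000 background events before a cut, and the cut keeps 80% of the signal but only 20% of the background. The signal-over-background ratio $S/B$ would then change as

$$
\left.\frac{S}{B}\right|_{\text{before}} = \frac{1000}{10000} = 0.1, \qquad \left.\frac{S}{B}\right|_{\text{after}} = \frac{0.8 \cdot 1000}{0.2 \cdot 10000} = \frac{800}{2000} = 0.4 .
$$

In this hypothetical case the cut costs a fifth of the signal but improves the ratio by a factor of four.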

The helicity of tau leptons carries information about their origin, so it provides a useful handle for discriminating between tau leptons originating from charged Higgs boson decays and those originating from W boson decays. Therefore, we will use a variable called $R_{\tau}$ to discriminate between the signal and the background in this exercise. 

Neutrinos are always left-handed in the Standard Model. Tau leptons produced in the decay of a charged Higgs boson are therefore always right-handed, since the charged Higgs boson is a scalar particle. On the other hand (pun intended), when a W boson, which is a vector particle, decays into a tau lepton, the tau lepton is forced to be left-handed because the neutrino is left-handed. It turns out that this influences the kinematics of the tau lepton decay: for a right-handed tau lepton, the leading-track particle is more likely to be emitted along the tau lepton's direction of motion. As a result, in the laboratory frame the leading tracks from tau leptons produced in charged Higgs boson decays have, on average, a larger momentum than those from tau leptons produced in W boson decays.

$R_{\tau}$ is defined as the leading track $p_T$ of a tau lepton divided by the $p_T$ of the tau lepton. In the data files, the momentum variables are called _LeadingTrackPt_ and _TauPt_. $R_{\tau}$ can then be calculated by

$$
R_{\tau} = \frac{\text{LeadingTrackPt}}{\text{TauPt}}
$$
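
To make the definition concrete, here is a minimal sketch that applies it to a small toy table. The three events and their values are invented for illustration; only the column names `LeadingTrackPt` and `TauPt` come from the data files, and the names `toy` and `mean_r_tau` are hypothetical.

```python
import pandas as pd

# Toy events with made-up values; only the column names match the data files.
toy = pd.DataFrame({
    "LeadingTrackPt": [30.0, 12.0, 45.0],
    "TauPt": [40.0, 48.0, 50.0],
})

# Element-wise division gives one R_tau value per event.
toy["R_tau"] = toy["LeadingTrackPt"] / toy["TauPt"]

# Average R_tau over all events in the toy table.
mean_r_tau = toy["R_tau"].mean()

print(toy)
print("mean R_tau:", mean_r_tau)
```

Because pandas divides the two columns row by row, the same pattern should carry over to a table loaded from a file, assuming the columns are named as above.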

## Part 1 - Calculate $R_{\tau}$

This week's task is to find an optimal $R_{\tau}$ cut value that removes background without taking too much of the signal away. Use the files Hplus1.csv and WJet1.csv in this task.

We begin by reading the data sets. Your first task is to calculate the $R_{\tau}$ values for both the Hplus and WJet data sets. Then calculate the average value of $R_{\tau}$ for each data set.

<br>

$\color{red}{\text{Write the code below}}$

In [None]:
import pandas as pd
import numpy as np

# Read the data sets
# Calculate the values of R_tau and return the average values of R_tau for both data sets.

## Part 2 - Plotting the data

It's always a good idea to plot the data to understand what is going on. Use the $R_{\tau}$ distributions you obtained in the previous part and plot both data sets as histograms in the same figure so that they can be compared.
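
One possible way to overlay two distributions is sketched below. The arrays are synthetic stand-ins drawn from arbitrary beta distributions, not the real samples, and the names `r_tau_signal` and `r_tau_background` are hypothetical. Shared bin edges and `density=True` are design choices made here so that the shapes stay comparable even if the two samples differ in size.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)

# Synthetic stand-ins (NOT the real samples): arbitrary shapes on [0, 1].
r_tau_signal = rng.beta(4.0, 2.0, size=5000)
r_tau_background = rng.beta(2.0, 4.0, size=5000)

# 50 equal-width bins on [0, 1], shared by both histograms.
bins = np.linspace(0.0, 1.0, 51)

fig, ax = plt.subplots()
# density=True normalises each histogram to unit area.
sig_counts, _, _ = ax.hist(r_tau_signal, bins=bins, histtype="step",
                           density=True, label="signal (synthetic)")
bkg_counts, _, _ = ax.hist(r_tau_background, bins=bins, histtype="step",
                           density=True, label="background (synthetic)")
ax.set_xlabel(r"$R_{\tau}$")
ax.set_ylabel("normalised events")
ax.legend()
```

With the real data, the two arrays would be replaced by the $R_{\tau}$ values calculated in Part 1.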

<br>

$\color{red}{\text{Write the code below}}$


In [None]:
import matplotlib.pyplot as plt

# Plot the R_tau histograms for both datasets and compare them

## Part 3 - Finding the optimal cut

Finally, find the $R_{\tau}$ cut that gives the best purity.
In this exercise the purity $P$ is defined as the signal-to-background ratio after the cut:

$$
P = \frac{\text{(number of signal events after the cut)}}{\text{(number of background events after the cut)}}.
$$

Iterate over different $R_{\tau}$ cut values to find the optimal purity. Always calculate the purity using only the events that have a higher $R_{\tau}$ value than the cut. To find the optimal cut, try 1000 cut values evenly spaced in the interval [0, 1]. If the cut is above 1, there aren't enough events left for further analysis. Report the optimal cut and the purity obtained with it.
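
The scan described above could look roughly like the following sketch. It again runs on synthetic stand-in arrays rather than the real samples, and all variable names are hypothetical. Treating cuts that leave no background events as undefined (NaN) is an assumption made here to avoid dividing by zero; the exercise text does not prescribe how to handle that case.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic stand-ins (NOT the real samples): arbitrary shapes on [0, 1].
r_tau_signal = rng.beta(4.0, 2.0, size=5000)
r_tau_background = rng.beta(2.0, 4.0, size=5000)

# 1000 evenly spaced cut values, both endpoints included.
cuts = np.linspace(0.0, 1.0, 1000)

# Number of events with R_tau strictly above each cut.
n_signal = np.array([(r_tau_signal > c).sum() for c in cuts])
n_background = np.array([(r_tau_background > c).sum() for c in cuts])

# Purity P = signal / background; left as NaN where no background survives.
purity = np.full(cuts.shape, np.nan)
valid = n_background > 0
purity[valid] = n_signal[valid] / n_background[valid]

# Index of the largest defined purity.
best = np.nanargmax(purity)
best_cut, best_purity = cuts[best], purity[best]

print("best cut:", best_cut)
print("purity at best cut:", best_purity)
print("events left (signal, background):", n_signal[best], n_background[best])
```

With this definition the maximum can land at a cut where very few events survive, so it is worth checking the remaining event counts alongside the purity.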

<br>

$\color{red}{\text{Write the code below}}$


In [None]:
# Find the optimal R_tau cut