This notebook uses ATLAS Open Data (https://opendata.atlas.cern) to walk you through the steps to rediscover the Higgs boson yourself!

# What is the Higgs boson?

The Higgs boson is a fundamental particle predicted by the Standard Model. It is a manifestation of the Higgs field, which gives mass to the fundamental particles. 
However, it is incredibly hard to produce. At the LHC, a Higgs boson is produced about once every 10 billion collisions! 
Such a tiny fraction makes it very difficult to detect. 
Nevertheless, after years of data collection, the Higgs boson was finally discovered in 2012 by the ATLAS and CMS experiments at CERN. In this tutorial, we will follow their example.

# Detecting the Higgs

The Higgs boson can be produced in many different ways.  In particle physics, we describe these production modes using Feynman diagrams.  
These diagrams allow us to visualize particle processes while also acting as powerful tools for calculations.  

There are four main production modes of the Higgs boson, each shown in a Feynman diagram below:

- **Gluon-gluon fusion** (top left)  
- **Vector boson fusion** (top right)  
- **Vector boson bremsstrahlung** (bottom left)  
- **Top-antitop fusion** (bottom right)  

<CENTER>
<img src="ggH.png" style="width:20%"> 
    
<img src="VBFH.png" style="width:15%">
</CENTER>



<CENTER>
<img src="WH.png" style="width:20%"> 
    
<img src="ttbarfusion.png" style="width:15%">
</CENTER>


## Higgs Boson Decay

The Higgs boson has a very short lifetime, on the order of $10^{-22} \, \text{s}$. It decays almost immediately after production, so it cannot be detected directly. Nevertheless, we can use the Standard Model to predict its decay products (photons, $Z$ bosons, quarks, etc.), each with a different probability. These **decay channels** can be used to identify the Higgs boson. 
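
As a rough cross-check of that lifetime, the mean lifetime of an unstable particle follows from its total decay width via $\tau = \hbar / \Gamma$. Assuming the Standard Model prediction $\Gamma_H \approx 4.1 \, \text{MeV}$ for a $125 \, \text{GeV}$ Higgs boson (a value not quoted elsewhere in this notebook), this gives:

$$
\tau_H = \frac{\hbar}{\Gamma_H} \approx \frac{6.58 \times 10^{-22} \, \text{MeV s}}{4.1 \, \text{MeV}} \approx 1.6 \times 10^{-22} \, \text{s}
$$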

In this notebook, we'll focus on one particular decay channel: $H \rightarrow ZZ^* \rightarrow \ell\ell\ell\ell$

<CENTER><img src="HZZ_feynman_1.png" style="width:20%"></CENTER>

We refer to this as our desired **signal**. Ideally, we would search for collisions that yield four leptons as products to identify the Higgs boson.  



## Background Processes

In addition to our signal, many other **background processes** can produce four leptons in the final state. The main background is: $ZZ^* \rightarrow \ell\ell\ell\ell$

Here, the decay products mimic those in the Higgs decay. This is known as an **irreducible background**.

<CENTER><img src="ZZllll.png" style="width:20%"></CENTER>

We can distinguish the Higgs boson signal by examining the total **invariant mass** of the lepton products. By conservation of energy and momentum, the invariant mass of the products should equal the Higgs mass $(125 \, \text{GeV})$, while background processes will have different invariant masses.
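
As an illustration of the invariant-mass calculation, here is a minimal sketch. It assumes each lepton is described by its transverse momentum, pseudorapidity, azimuthal angle, and energy, all in GeV where applicable. The function name `invariant_mass` and its argument names are hypothetical and are not taken from the ATLAS Open Data files.

```python
import numpy as np


def invariant_mass(pt, eta, phi, energy):
    """Invariant mass (GeV) of a system of particles.

    pt, eta, phi, energy: one entry per particle (hypothetical input layout).
    """
    pt, eta, phi, energy = (np.asarray(a, dtype=float) for a in (pt, eta, phi, energy))

    # Cartesian momentum components of each particle
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)

    # m^2 = E_tot^2 - |p_tot|^2 (natural units, c = 1)
    m2 = energy.sum() ** 2 - px.sum() ** 2 - py.sum() ** 2 - pz.sum() ** 2

    # Guard against tiny negative values from floating-point rounding
    return float(np.sqrt(max(m2, 0.0)))


# Toy example: four massless leptons in the transverse plane whose momenta cancel,
# so the invariant mass equals the total energy.
toy_pt = [31.25, 31.25, 31.25, 31.25]
toy_eta = [0.0, 0.0, 0.0, 0.0]
toy_phi = [0.0, np.pi, np.pi / 2, 3 * np.pi / 2]
toy_energy = [31.25, 31.25, 31.25, 31.25]  # E = pT * cosh(eta) for massless particles

m4l = invariant_mass(toy_pt, toy_eta, toy_phi, toy_energy)
print(f"m4l = {m4l:.2f} GeV")
```

In a real analysis the same calculation would be applied to the four selected leptons of every event, and the resulting distribution would be histogrammed to look for a peak near the Higgs mass.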

Other backgrounds include:
1. **Z + jets**: Additional leptons arise from misidentified jets.
2. **Top-antitop processes**: Leptons come from the semi-leptonic decay of heavy flavor.

<CENTER>
<img src="Zllll.png" style="width:20%"> 
<img src="ttbar.png" style="width:20%">
</CENTER>

---



Note: $Z^*$ refers to a $Z$ boson that is off its mass shell. This means that its mass is not fixed to the $91 \, \text{GeV}$ of a typical $Z$ boson.


# Let's Start!

- Learn to process large data sets.
- Understand some general principles of a particle physics analysis.
- Discover the Higgs boson!


In [11]:
# Data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import uproot as up  # reads ROOT files

# Preprocessing and evaluation utilities
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import accuracy_score, precision_score, roc_curve, auc
import joblib  # object persistence (e.g. saving a fitted scaler)

# Neural network
from tensorflow import keras
from tensorflow.keras import layers



1) Load the signal and background training files.
2) Convert them into DataFrames.
3) Label the signal class as 1 and the background class as 0.
4) Print the number of signal and background events.
5) Concatenate the signal and background DataFrames as df, and store the Class column as y_df.
6) Choose the variables you are going to use for training and plot them.
7) Plot the correlation between your variables.
8) Define X_train and X_test using the sklearn train_test_split function.
9) Preprocess using StandardScaler.
10) Calculate scale_pos_weight.
11) Train the DNN.
12) Print the accuracy and precision.
13) Plot the ROC curve.
14) Plot the DNN score for the train and test samples.
15) Evaluation: load the test signal, background, and data validation files and convert each into a DataFrame.
16) Make two DataFrames for each sample: one with the variables used in training, and another with the variables used in training plus total_weight.
17) Preprocess, e.g. y_s_1 = scaler.transform(v_df_s)
18) Load the pre-trained DNN model and calculate the DNN score for each sample.
19) Save the score as a column in the corresponding sample DataFrame.
20) Plot a stacked histogram of signal and background, and overlay the data points with error bars.
21) Choose the bins where the signal is concentrated (usually the last few bins) and calculate the significance.
22) Report the significance.
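
The data-preparation steps (3 to 5 and 8 to 10) and the final significance estimate (steps 21 and 22) can be sketched on toy data as follows. This is only an illustration under stated assumptions. The column names `var_a` and `var_b`, the toy Gaussian samples, and the helper `significance_last_bins` are hypothetical. The significance formula $s/\sqrt{b}$ is a common simple approximation chosen here for the example, since the steps above do not specify one. The weight is computed as the background-to-signal ratio in the training set, which is one common definition of scale_pos_weight.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=0)

# Toy stand-ins for the signal and background DataFrames (steps 1-2).
# Real inputs would come from the training files; these columns are hypothetical.
features = ["var_a", "var_b"]
df_sig = pd.DataFrame(rng.normal(1.0, 1.0, size=(200, 2)), columns=features)
df_bkg = pd.DataFrame(rng.normal(0.0, 1.0, size=(800, 2)), columns=features)

# Step 3: label signal as 1 and background as 0
df_sig["Class"] = 1
df_bkg["Class"] = 0

# Step 4: print the number of events in each class
print(f"Signal events: {len(df_sig)}, background events: {len(df_bkg)}")

# Step 5: concatenate into one DataFrame and keep the labels separately
df = pd.concat([df_sig, df_bkg], ignore_index=True)
y_df = df["Class"]

# Step 8: split into training and test sets, preserving the class proportions
X_train, X_test, y_train, y_test = train_test_split(
    df[features], y_df, test_size=0.3, stratify=y_df, random_state=42
)

# Step 9: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 10: background-to-signal ratio in the training set (assumed definition)
scale_pos_weight = float((y_train == 0).sum() / (y_train == 1).sum())
class_weight = {0: 1.0, 1: scale_pos_weight}  # could be passed to Keras model.fit
print(f"scale_pos_weight = {scale_pos_weight:.2f}")


def significance_last_bins(sig_hist, bkg_hist, n_last):
    """Approximate significance s / sqrt(b) summed over the last n_last bins.

    sig_hist, bkg_hist: weighted event counts per DNN-score bin.
    """
    s = float(np.sum(np.asarray(sig_hist)[-n_last:]))
    b = float(np.sum(np.asarray(bkg_hist)[-n_last:]))
    return s / np.sqrt(b) if b > 0 else float("nan")


# Steps 21-22: toy histograms of the DNN score, signal concentrated at high score
toy_sig_hist = np.array([1.0, 2.0, 4.0, 6.0])
toy_bkg_hist = np.array([50.0, 20.0, 16.0, 9.0])
z = significance_last_bins(toy_sig_hist, toy_bkg_hist, n_last=2)
print(f"Significance in the last 2 bins: {z:.2f}")
```

Fitting the scaler on the training set alone keeps information from the test set out of the preprocessing, and the same fitted scaler is then reused on the evaluation samples in step 17.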