# Higgs Classification

In this lab we work with particle physics data containing 100k jets recorded by the Large Hadron Collider (LHC). The LHC is the world's largest and highest-energy particle collider and was built by CERN through international collaboration from 1998 to 2008. The machine, which lies up to 175 meters below the surface of the earth near Geneva and is 27 kilometers in circumference, was built to test many different theories of particle physics. The collider has four crossing points with seven detectors placed around them, each designed for certain kinds of research, and it was most notably used to discover the Higgs boson.

The Higgs boson was a theoretical elementary particle of the Standard Model of particle physics and came out of necessity from the theory of the Higgs field. This quantum field was the most popular explanation at the time for what gives particles their mass. Because of wave-particle duality, every quantum field has an associated fundamental particle, and for the Higgs field that particle is the Higgs boson. Because the Higgs field is responsible for the mass of particles, it was thought that if the Higgs boson did exist, its mass could be predicted from its effects on the properties of other particles. The Higgs boson is often called the "God Particle" because it is regarded as the final missing piece of the Standard Model, which explains three pivotal phenomena: electromagnetic interactions, strong interactions, and the weak nuclear force.

The LHC works essentially by firing two beams of particles at each other at speeds close to the speed of light. Its 27 km circumference is lined with nearly 9000 superconducting magnets, along with accelerating structures that boost the energy of the particles passing through them. The strong magnetic fields created by these magnets steer the two beams in opposite directions around the ring until they are brought together at a collision point. More specialized magnets are used close to the collision point to squeeze the particles closer together and increase the probability that they collide. Because the particles are so small, making two of them collide is similar to making two needles collide at nearly the speed of light. The collisions of high-energy particles can produce jets of particles, which are then sectioned off and analyzed.

The idea behind proving the existence of the Higgs boson with the LHC was as follows. If the Higgs field is real, then colliding two particles at high energy should make the field ripple and occasionally shoot off a particle, which would then be identified as the Higgs boson. A further difficulty is that the Higgs boson was calculated to be unstable and to disintegrate in a fraction of a second. Scientists were therefore tasked with searching for the fingerprint of the Higgs boson in the decay products it leaves behind.

- pt: Transverse Momentum = sqrt((Px)^2 + (Py)^2)
	* This is the component of momentum that is transverse, or perpendicular, to the beam line. It is important because momentum along the beam line can easily come from leftover particles, whereas the transverse momentum gives a strong description of what happens at the collision vertex. With the z-axis along the beam line, the transverse momentum depends only on Px and Py, the momentum components perpendicular to the beam. The equation above follows from this definition: it takes the magnitude of the vector sum of the x and y components using the Pythagorean theorem.

- eta: Pseudorapidity = -ln(tan(theta/2)) = (1/2) ln((|P| + Pl)/(|P| - Pl))
	* Pseudorapidity is a way of measuring the angle at which secondary particles emerge relative to the longitudinal axis of the collision. Here theta is the angle with respect to the axis of the colliding beams in the LHC, |P| is the magnitude of the momentum, and Pl is its component along the beam. Eta is 0 for particles emitted perpendicular to the beam and grows in magnitude as they approach the beam direction. The two expressions above are equivalent: one uses the angle of the secondary particle and the other uses its momentum.

- phi: Azimuthal angle = cos^-1(x/r), with r = sqrt(x^2 + y^2)
	* The azimuthal angle is the angle measured around the beam (z) axis, in the xy plane perpendicular to it. For example, a point on a disk lying in the xy plane is located by its azimuthal angle, measured from the x-axis as the point rotates about the z-axis. The equation above gives the size of the angle; its sign follows the sign of y.

- mass: Invariant mass, from E^2 = P^2 + m^2, so m = sqrt(E^2 - P^2) (units with c = 1)
	* Invariant mass is the mass of a system that is independent of the motion of the system. More specifically, it is the quantity that stays the same in all frames of reference.

- ee2: 2-point ECF ratio
	* Energy correlation functions (ECFs) probe jet substructure and are built from the energies and pairwise angles of the particles within a jet. The 2-point ECF ratio is based on 2-point correlators (pairs of particles) and is used for quark/gluon discrimination.

- ee3: 3-point ECF ratio
	* ECFs are as described above; the 3-point ECF ratio is based on 3-point correlators (triplets of particles). These correlators are best suited to identifying boosted W/Z/Higgs bosons.

- d2: 3-to-2 point ECF ratio

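- angularity: Jet angularity
	* Angularity is a jet-shape observable that weights the energy of each jet constituent by a function of its angle to the jet axis, so it describes how the jet's energy is spread out in angle.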

- t1: 1-subjettiness
	* N-subjettiness sums the pt-weighted angular distances of the jet constituents to the nearest of N candidate subjet axes, where N is the number of subjet axes assumed in the jet. A small value means the jet is well described by N or fewer subjets. At a higher level, subjettiness is used to identify boosted hadronic objects such as top quarks. In this case N = 1.

- t2: 2-subjettiness
	* Subjettiness definition maintained from t1. In this case N = 2.

- t3: 3-subjettiness
	* Subjettiness definition maintained from t1. In this case N = 3.

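- t21: N-subjettiness ratio t2/t1
	* Subjettiness definition maintained from t1. This is the ratio of the N = 2 value to the N = 1 value; a small ratio means the jet looks more like two subjets than one.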

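- t32: N-subjettiness ratio t3/t2
	* Subjettiness definition maintained from t1. This is the ratio of the N = 3 value to the N = 2 value; a small ratio means the jet looks more like three subjets than two.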

- KtDeltaR: Cluster sequence
	* Delta R = sqrt((delta eta)^2 + (delta phi)^2), the angular separation between the two subjets within the large-R jet.
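
The kinematic definitions in the list above (pt, eta, phi, mass) can be sketched for a single four-momentum as follows. This is a minimal illustration rather than part of the lab code: the function name `jet_kinematics` and its inputs `px`, `py`, `pz`, `energy` are assumptions, natural units (c = 1) are assumed, and `atan2` is used for phi so that the full range of angles is covered.

```python
import math


def jet_kinematics(px, py, pz, energy):
    """Return (pt, eta, phi, mass) for one four-momentum, in units with c = 1.

    Hypothetical helper for illustration; assumes pt > 0.
    """
    # Transverse momentum: magnitude of the (px, py) vector.
    pt = math.hypot(px, py)
    # Pseudorapidity: asinh(pz / pt) is equivalent to -ln(tan(theta / 2)).
    eta = math.asinh(pz / pt)
    # Azimuthal angle around the beam (z) axis, in (-pi, pi].
    phi = math.atan2(py, px)
    # Invariant mass from E^2 = |p|^2 + m^2; clamp tiny negative rounding errors.
    p_squared = px * px + py * py + pz * pz
    mass = math.sqrt(max(energy * energy - p_squared, 0.0))
    return pt, eta, phi, mass
```

For example, `jet_kinematics(3.0, 4.0, 0.0, 13.0)` describes a particle moving perpendicular to the beam, so its pseudorapidity is 0.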

## 1) Download the datasets (signal & background)

```python
# import library (the pickles hold pandas DataFrames, so pandas must be installed)
import pickle

# open each file of interest and load it with pickle;
# the with-blocks close the files once loading is done
with open("lab5/qcd_100000_pt_250_500.pkl", "rb") as infile:
    qcd = pickle.load(infile)
with open("lab5/higgs_100000_pt_250_500.pkl", "rb") as infile:
    higgs = pickle.load(infile)

# list all keys (column names) of the files
print(qcd.keys())
print(higgs.keys())
# number of jets in the background sample
print(len(qcd))
```

```
Index(['pt', 'eta', 'phi', 'mass', 'ee2', 'ee3', 'd2', 'angularity', 't1',
       't2', 't3', 't21', 't32', 'KtDeltaR'],
      dtype='object')
Index(['pt', 'eta', 'phi', 'mass', 'ee2', 'ee3', 'd2', 'angularity', 't1',
       't2', 't3', 't21', 't32', 'KtDeltaR'],
      dtype='object')
100000
```
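
The column names suggest that `t21` and `t32` are the N-subjettiness ratios t2/t1 and t3/t2, which is the usual convention. A quick check on the loaded frames can confirm that reading for this dataset. The sketch below is an assumption-laden helper, not part of the lab: `max_ratio_mismatch` is a hypothetical name, and it accepts any three equally long sequences (pandas columns included).

```python
def max_ratio_mismatch(numerator, denominator, ratio):
    """Largest |numerator / denominator - ratio| over entries with a nonzero denominator.

    Hypothetical helper: a result near zero supports ratio == numerator / denominator.
    """
    worst = 0.0
    for num, den, rat in zip(numerator, denominator, ratio):
        if den == 0:
            continue  # the ratio is undefined for this jet, so skip it
        worst = max(worst, abs(num / den - rat))
    return worst
```

With the frames loaded above, `max_ratio_mismatch(higgs["t2"], higgs["t1"], higgs["t21"])` and the matching call for `t3`, `t2`, `t32` should both be close to zero if the assumption holds.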


## 2) Explore training data

### a. Do all features provide discrimination power between signal and background?
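
One way to approach this question is to score each feature by the area under its ROC curve (AUC): the probability that a randomly chosen signal jet has a larger value than a randomly chosen background jet. The sketch below is only one possible measure, not the method prescribed by the lab; `roc_auc` is a hypothetical name, and it assumes the inputs contain no missing values.

```python
from bisect import bisect_left, bisect_right


def roc_auc(signal, background):
    """Probability that a random signal value exceeds a random background value.

    Ties count as one half. A value of 0.5 means no single-feature separation;
    values near 0 or 1 mean strong separation.
    """
    bkg = sorted(background)
    sig = list(signal)
    wins = 0.0
    for value in sig:
        below = bisect_left(bkg, value)           # background entries strictly below
        ties = bisect_right(bkg, value) - below   # background entries equal to value
        wins += below + 0.5 * ties
    return wins / (len(sig) * len(bkg))
```

Applied to the loaded samples, `{name: roc_auc(higgs[name], qcd[name]) for name in higgs.keys()}` gives one score per feature; features whose score stays close to 0.5 carry little discrimination power on their own, although they may still help in combination with others.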

### b. Are there correlations among these features?
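
A simple starting point is the Pearson correlation coefficient between every pair of features, computed separately for signal and background. The sketch below writes it out in plain Python so the arithmetic is visible; `pearson` and `correlation_matrix` are hypothetical names. For pandas data the built-in `DataFrame.corr()` computes the same quantity. Note that Pearson correlation only captures linear relationships.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    x, y = list(x), list(y)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a mapping of name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Pairs with a coefficient close to +1 or -1, such as a ratio and the quantities it is built from, carry largely redundant information.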

### c. Compute expected discovery sensitivity by normalizing each sample appropriately.
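
Both samples contain 100k jets, so each must be scaled to its expected yield before signal and background counts can be compared. The sketch below assumes the expected yields are supplied by the lab instructions; no values are given in this section, so any numbers passed in are placeholders. It uses the Asimov approximation for the significance of a counting experiment, which is a swapped-in standard formula and not necessarily the one the lab intends; for a signal much smaller than the background it reduces to s/sqrt(b). The function names are hypothetical.

```python
import math


def scaled_yield(n_pass, n_generated, n_expected):
    """Expected number of events passing a selection.

    n_pass / n_generated is the selection efficiency measured on the sample,
    and n_expected is the yield before any selection (supplied by the user).
    """
    return n_expected * n_pass / n_generated


def expected_significance(s, b):
    """Asimov approximation of the discovery significance in sigma.

    s and b are the expected signal and background yields; requires b > 0.
    """
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))
```

With no selection applied, `n_pass` equals `n_generated` for both samples, and the significance is simply that of the two expected yields.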

### d. Develop a plan to optimize the discovery sensitivity.
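
One possible plan is to pick the most discriminating feature, scan a selection window on it, and keep the window with the highest expected significance. The sketch below is a brute-force version of that idea under stated assumptions: `best_window` and `asimov_z` are hypothetical names, the expected yields are placeholders supplied by the caller, and `min_bkg_events` guards against windows whose background estimate rests on too few sample jets.

```python
import math
from bisect import bisect_left, bisect_right


def asimov_z(s, b):
    """Asimov approximation of the discovery significance; requires b > 0."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def best_window(signal, background, n_sig_expected, n_bkg_expected,
                edges, min_bkg_events=10):
    """Return (z, low, high) for the window low <= x <= high with the largest z.

    Candidate windows use every ordered pair of values from edges.
    Returns (0.0, None, None) if no window qualifies.
    """
    sig, bkg = sorted(signal), sorted(background)
    edges = sorted(edges)
    best = (0.0, None, None)
    for i, low in enumerate(edges):
        for high in edges[i + 1:]:
            # Raw sample counts inside the window (both ends inclusive).
            n_sig = bisect_right(sig, high) - bisect_left(sig, low)
            n_bkg = bisect_right(bkg, high) - bisect_left(bkg, low)
            if n_sig == 0 or n_bkg < max(min_bkg_events, 1):
                continue
            # Scale the sample counts to expected yields.
            s = n_sig_expected * n_sig / len(sig)
            b = n_bkg_expected * n_bkg / len(bkg)
            z = asimov_z(s, b)
            if z > best[0]:
                best = (z, low, high)
    return best
```

Because the window is chosen on the same jets used to evaluate it, the quoted significance is optimistic; repeating the scan on one half of the sample and evaluating on the other half gives a fairer estimate. The same scan can then be applied in sequence to further features.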