# Studying the $Z$ boson using data from CMS

## Introduction

The Compact Muon Solenoid (CMS) is a general-purpose detector at the Large Hadron Collider (LHC). It has a broad physics programme ranging from studying the Standard Model (including the Higgs boson) to searching for extra dimensions and particles that could make up dark matter. Although it has the same scientific goals as the ATLAS experiment, it uses different technical solutions and a different magnet-system design.

The CMS detector is built around a huge solenoid magnet. This takes the form of a cylindrical coil of superconducting cable that generates a field of 4 tesla, about 100,000 times the magnetic field of the Earth. The field is confined by a steel “yoke” that forms the bulk of the detector’s 14,000-tonne weight.

The CMS detector is shaped like a cylindrical onion, with several concentric layers of components. The silicon tracker is the innermost layer, surrounded by the electromagnetic and hadronic calorimeters. The superconducting solenoid surrounds the calorimeters and lies within the muon system.

<div style="width: 500px;margin: auto" align="center">
    <img src="1920px-CMS_160312_06.png">
    Figure 1. The CMS Detector
</div>


## Goals

Study the properties of the $Z$ boson using dimuon and dielectron data from CMS. Key properties include the cross-section, mass and width. You will need to estimate the backgrounds to each sample from the data. 

The cross-section can be calculated using the following formula:

$$ \sigma = \frac{S - B}{A \times E \times L}$$

where
* S is the number of selected events in the signal region (signal candidates, before background subtraction)
* B is the estimated number of background events among them
* A is the acceptance
* E is the efficiency
* L is the luminosity

For the dataset that you are analysing, you are given the following values:
* The luminosity is 20 pb$^{-1}$.
* The acceptance is 40%.
* The efficiency is 90% for the dimuons and 75% for the dielectrons.

Note that you can expect to reproduce the cross-section to within 20%.
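As a minimal sketch, the formula above can be written as a small Python function. The event counts in the example call are made up for illustration only; the acceptance, efficiency and luminosity are the dimuon values quoted above. With the luminosity in pb$^{-1}$, the result is in pb.

```python
def cross_section(s_events, b_events, acceptance, efficiency, luminosity):
    """Return sigma = (S - B) / (A * E * L), with S and B as defined above.

    If the luminosity is given in pb^-1, the result is in pb.
    """
    return (s_events - b_events) / (acceptance * efficiency * luminosity)


# Hypothetical counts, NOT taken from the data: replace them with your own results.
sigma_mumu = cross_section(
    s_events=7000,
    b_events=100,
    acceptance=0.40,
    efficiency=0.90,
    luminosity=20.0,  # pb^-1
)
print(f"sigma = {sigma_mumu:.1f} pb")
```

For the dielectron channel, the same function can be called with an efficiency of 0.75.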

References: 
* CMS paper: https://arxiv.org/pdf/1107.4789.pdf
* PDG : http://pdg.lbl.gov/2019/listings/rpp2019-list-z-boson.pdf

<div style="width: 500px;margin: auto" align="center">
    <img src="Zmumu.png">
    Invariant mass distributions for the $Z \rightarrow \mu\mu$ process
</div>

<div style="width: 500px;margin: auto" align="center">
    <img src="Zee.png">
    Invariant mass distributions for the $Z \rightarrow ee$ process
</div>

## The Dataset

A set of selection cuts has been applied to the data collected by the CMS experiment.

In the muon dataset, only events with two muons were kept. The following selection requirements (or cuts) were applied to the muons:


* Both muons are "global" muons
* $60 < M_{\mu\mu} < 120$ GeV
* $|\eta| < 2.1$ for both muons
* $p_{t} > 20$ GeV

These requirements ensure that the muons are of high quality and that there is little background. The luminosity of the dataset corresponds to $2.1$ fb$^{-1}$.

In the electron dataset, only events with two electrons were kept. The following selection requirements (or cuts) were applied to the electrons:

* $60 < M_{ee} < 120$ GeV
* $p_{t} > 25$ GeV

The following code will read in the data and print out the available variables in the dataset. It can be applied to either the muon or the electron data files.


In [63]:
import pandas as pd #pandas is a convenient tool to analyse the data; however you don't have to use it if you don't want to
import numpy as np
import matplotlib.pyplot as plt

# Jupyter Notebook uses "magic functions". This one makes plots (such as
# histograms) appear directly in the notebook.
%matplotlib inline

# Create a new DataFrame structure from the file "Zmumu_Run2011A.csv"
dataset = pd.read_csv('Zmumu_Run2011A.csv')
#dataset = pd.read_csv('Zee_Run2011A.csv')

# This will list the available variables in the dataset
for element in dataset:
    print(element)



Run
Event
pt1
eta1
phi1
Q1
type1
sigmaEtaEta1
HoverE1
isoTrack1
isoEcal1
isoHcal1
pt2
eta2
phi2
Q2
type2
sigmaEtaEta2
HoverE2
isoTrack2
isoEcal2
isoHcal2


Each element in the dataset is simply a variable.

For the muons, the variables are as follows:
* Run, Event are the run and event numbers, respectively
* pt is the transverse momentum $p_{t}$ of the muon
* eta is the pseudorapidity of the muon: $\eta$
* phi is the $\phi$ angle of the muon direction
* Q is the charge of the muon
* dxy is the impact parameter in the transverse plane: $d_{xy}$
* iso is the track isolation: $I_{track}$

For the electrons, the variables are as follows:
* Run, Event are the run and event numbers, respectively
* pt is the transverse momentum $p_{t}$ of the electron
* eta is the pseudorapidity of the electron: $\eta$
* phi is the $\phi$ angle of the electron direction
* Q is the charge of the electron
* type is either EB or EE: whether the electron is in the barrel or in the endcap
* sigmaEtaEta is the weighted cluster rms along $\eta$: $\sigma_{\eta\eta}$
* HoverE is the HCAL energy / ECAL energy
* isoTrack is the isolation variable for tracks
* isoEcal is the isolation variable for the ECAL
* isoHcal is the isolation variable for the HCAL

In [41]:
#  Suggested starting point (you'll want to do this for both the muon and electron data)

Become familiar with the contents of the dataset by plotting out different variables.
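One possible way to start is a small helper that histograms several columns side by side. This is only a sketch: the helper name `plot_variables` is hypothetical, and the toy DataFrame below stands in for the real `dataset` so that the example runs on its own. The column names `pt1`, `eta1` and `phi1` are taken from the variable listing above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_variables(df, columns, bins=50):
    """Draw one histogram per column, side by side, and return the figure."""
    fig, axes = plt.subplots(1, len(columns),
                             figsize=(4 * len(columns), 3), squeeze=False)
    for ax, col in zip(axes[0], columns):
        ax.hist(df[col], bins=bins, histtype="step")
        ax.set_xlabel(col)
        ax.set_ylabel("Events")
    fig.tight_layout()
    return fig


# Toy data standing in for the real dataset (random numbers, not physics).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "pt1": 20.0 + rng.exponential(20.0, 1000),
    "eta1": rng.uniform(-2.1, 2.1, 1000),
    "phi1": rng.uniform(-np.pi, np.pi, 1000),
})
fig = plot_variables(toy, ["pt1", "eta1", "phi1"])
```

With the real data, the call would be `plot_variables(dataset, ["pt1", "eta1", "phi1"])`, and likewise for the second lepton.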







Calculate the invariant mass of the pair of muons in each event. Plot this mass and identify the $Z$ boson peak.
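A minimal sketch of the invariant-mass calculation is given below. It assumes the leptons can be treated as massless, which is a good approximation at these momenta, so that $M^2 = 2\,p_{t1}\,p_{t2}\,[\cosh(\eta_1 - \eta_2) - \cos(\phi_1 - \phi_2)]$. The function name is hypothetical; the inputs correspond to the `pt`, `eta` and `phi` columns of the dataset.

```python
import numpy as np


def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two particles in the massless approximation.

    Inputs may be scalars, NumPy arrays or pandas Series (pt in GeV).
    """
    pt1, eta1, phi1 = np.asarray(pt1), np.asarray(eta1), np.asarray(phi1)
    pt2, eta2, phi2 = np.asarray(pt2), np.asarray(eta2), np.asarray(phi2)
    m_squared = 2.0 * pt1 * pt2 * (np.cosh(eta1 - eta2) - np.cos(phi1 - phi2))
    # Guard against tiny negative values caused by rounding.
    return np.sqrt(np.maximum(m_squared, 0.0))


# Sanity check: two back-to-back leptons with pt = 45.6 GeV at eta = 0.
m_check = invariant_mass(45.6, 0.0, 0.0, 45.6, 0.0, np.pi)
```

With the real data, something like `dataset["M"] = invariant_mass(dataset.pt1, dataset.eta1, dataset.phi1, dataset.pt2, dataset.eta2, dataset.phi2)` adds the mass as a new column (the column name `M` is an arbitrary choice).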


There is little background in this analysis. However, we will check whether any of the selection cuts should be tightened. Define a range in the invariant mass around the $Z$ boson where there is signal but very little background. Select the events within that window and plot the distributions of the other variables, e.g. the momentum, impact parameter and isolation. Next, select the events below and above that window (we will call these the sidebands) and make the same set of plots. Do you observe any significant differences in the distributions between the signal and the background? Determine and apply any additional selection cuts.
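The signal window and sidebands can be expressed as boolean masks, as in the sketch below. The window edges of 86 and 96 GeV are placeholders chosen for illustration, not recommended values; choose your own from the mass plot. The function name is hypothetical.

```python
import numpy as np


def signal_and_sideband_masks(mass, window=(86.0, 96.0), full_range=(60.0, 120.0)):
    """Return boolean masks (signal, sideband) for an array of invariant masses.

    Events outside full_range belong to neither mask.
    """
    mass = np.asarray(mass, dtype=float)
    in_range = (mass > full_range[0]) & (mass < full_range[1])
    signal = in_range & (mass >= window[0]) & (mass <= window[1])
    sideband = in_range & ~signal
    return signal, sideband


# Toy masses in GeV, for illustration only.
toy_mass = np.array([65.0, 88.0, 91.2, 95.0, 110.0, 130.0])
sig, side = signal_and_sideband_masks(toy_mass)
```

The masks can then be used to select rows, e.g. `dataset[sig]` and `dataset[side]`, and the same variables plotted for each selection.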

Fit the data to determine the properties of the $Z$ boson and estimate the background. Choose appropriate functions for the signal and the background, then fit the data and estimate:
* Number of signal events
* Number of background events
* Mass
* Width
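One possible choice of functions, not prescribed by this exercise, is a Breit-Wigner (Lorentzian) peak on top of a falling exponential background, fitted with `scipy.optimize.curve_fit`. The sketch below assumes SciPy is available and fits a noiseless toy histogram so that it runs on its own; all function names and toy parameter values are hypothetical. Note that the fitted width includes the detector resolution, so it is generally larger than the natural width of the $Z$ boson.

```python
import numpy as np
from scipy.optimize import curve_fit


def signal_shape(m, n_sig, mass, width):
    """Breit-Wigner (Lorentzian) peak; n_sig is its area over all m."""
    return n_sig * (width / (2.0 * np.pi)) / ((m - mass) ** 2 + (width / 2.0) ** 2)


def background_shape(m, b0, slope):
    """Falling exponential; b0 is the background density (events/GeV) at 60 GeV."""
    return b0 * np.exp(-slope * (m - 60.0))


def model(m, n_sig, mass, width, b0, slope):
    return signal_shape(m, n_sig, mass, width) + background_shape(m, b0, slope)


def fit_mass_histogram(centres, counts):
    """Fit the model to histogram counts; return (parameters, uncertainties)."""
    bin_width = centres[1] - centres[0]
    density = counts / bin_width                           # events per GeV
    errors = np.sqrt(np.maximum(counts, 1.0)) / bin_width  # approximate Poisson errors
    p0 = [counts.sum(), centres[np.argmax(counts)], 3.0, max(density[0], 1.0), 0.01]
    popt, pcov = curve_fit(model, centres, density, p0=p0,
                           sigma=errors, absolute_sigma=True)
    return popt, np.sqrt(np.diag(pcov))


# Noiseless toy histogram generated from the model itself (not real data).
centres = np.arange(60.5, 120.0, 1.0)
toy_counts = model(centres, 5000.0, 91.2, 2.5, 50.0, 0.03)
popt, perr = fit_mass_histogram(centres, toy_counts)
n_sig, mass, width, b0, slope = popt
```

With real data, `centres` and `counts` would come from `np.histogram` applied to the invariant mass. The number of background events in a given mass range follows from integrating `background_shape` over that range.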

## Extensions



Here are some ideas for exploring the data further, but you are welcome to take your project in a different direction:

* Make some event displays for several of your candidate events.
* Calculate the statistical uncertainty on your cross-section.
* Identify the main systematic uncertainties on your cross-section measurement.
* Develop a tag-and-probe method to measure the efficiency of your selection cuts.
* Divide the electron data into 4 categories depending on whether each of the electrons is located in the barrel or the endcap. How do the resolution and background rate compare between the categories?
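For the last idea, the four barrel/endcap categories can be built from the `type1` and `type2` columns, assuming they hold the strings `EB` and `EE` as described in the variable list above. The helper name and the `category` column are hypothetical, and the toy DataFrame stands in for the electron dataset.

```python
import pandas as pd


def add_category(df):
    """Return a copy of df with a 'category' column such as 'EB-EE'."""
    out = df.copy()
    out["category"] = out["type1"].str.strip() + "-" + out["type2"].str.strip()
    return out


# Toy stand-in for the electron dataset (only the two columns needed here).
toy = pd.DataFrame({
    "type1": ["EB", "EB", "EE", "EE", "EB"],
    "type2": ["EB", "EE", "EB", "EE", "EB"],
})
category_counts = add_category(toy)["category"].value_counts()
print(category_counts)
```

Each category can then be selected with, for example, `df[df.category == "EB-EB"]` and fitted separately to compare the peak width and background level.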