<CENTER>
    <a href="http://opendata.atlas.cern/release/2020/documentation/notebooks/intro.html" class="icons"><img src="./images/ATLASOD.gif" style="width:40%"></a>
</CENTER>

<CENTER><h1>Intro to HEP histograms</h1></CENTER>


This notebook will walk you through the basic computing techniques commonly used in high energy physics (HEP) analyses. You will learn how to:

- Interact with HEP data files
- Create a histogram for displaying HEP data
- Fill your histogram
- Draw your histogram
    
    
By the end, you will have a plot of the number of leptons per event in the dataset.

### Getting started: What to load

The software we will use to analyse our ATLAS data is called __ROOT__. Using ROOT, we are able to process large datasets, do statistical analyses, and visualise our data (we will accomplish this using __histograms__). ROOT also has its own format for __storing__ data - we'll come back to this later.

In [1]:
#Import the ROOT library
import ROOT

#Here you could also import any other python libraries you would like to use

Welcome to JupyROOT 6.18/04


ROOT lets us interact with the histograms we make through __jsroot__ (JavaScript ROOT). Let's turn this on.


In [2]:
%jsroot on

_(If this gives you an error, don't worry, it could be that your ROOT version is too old - all this means is that your plots will not be interactive)._

### The following analysis searches for events where [Z bosons](https://en.wikipedia.org/wiki/W_and_Z_bosons) decay to two leptons of the same flavour and opposite charge (as shown, for example, in the [Feynman diagram](https://en.wikipedia.org/wiki/Feynman_diagram) below).

<CENTER><img src="./images/Z_ElectronPositron.png" style="width:30%"></CENTER>
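The "same flavour, opposite charge" requirement in the heading above can be written as a simple check over the leptons in one event. The sketch below is plain Python rather than ROOT, and assumes for illustration that each lepton is described by a flavour code (electrons as 11, muons as 13, in the style of PDG particle codes) and a charge of +1 or -1. The helper `has_sfos_pair` and the example events are hypothetical.

```python
from itertools import combinations

def has_sfos_pair(lep_type, lep_charge):
    """Return True if any two leptons share a flavour and have opposite charge.

    lep_type   -- per-lepton flavour codes (assumed: 11 = electron, 13 = muon)
    lep_charge -- per-lepton electric charges (+1 or -1)
    """
    for i, j in combinations(range(len(lep_type)), 2):
        same_flavour = abs(lep_type[i]) == abs(lep_type[j])
        opposite_charge = lep_charge[i] * lep_charge[j] < 0
        if same_flavour and opposite_charge:
            return True
    return False

# Hypothetical events, each given as (lep_type, lep_charge)
print(has_sfos_pair([11, 11], [+1, -1]))  # e+ e-   : True
print(has_sfos_pair([11, 13], [+1, -1]))  # e+ mu-  : False
print(has_sfos_pair([13, 13], [-1, -1]))  # mu- mu- : False
```

Checking every pair, rather than only the first two leptons, means events with three or more leptons are still accepted if any pair qualifies.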

### Working with .root files

Next we have to open the data files that we want to analyze. 

As mentioned above, ROOT has its own format for storing high energy physics data: the _[something].root_ file. For each event in the dataset we could have many particles, and for each particle there are several __variables__ we measure (for example energy, momentum, charge). The structure of a _*.root_ file is as follows:


- A _.root_ file stores and keeps track of all this information in a container called a __TTree__. 
- Inside the TTree, each variable that we measure is stored separately in containers called __branches__. 
- Inside each branch, the measurement of that variable for each event is stored.

<CENTER><img src="./images/root_struct.png" style="width:70%"></CENTER>

You can find more information about ROOT's data structure __here__.
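To make the TTree, branch, and per-event value hierarchy more concrete, here is a rough plain-Python analogy (it is not how ROOT actually stores data on disk). The tree is modelled as a dictionary whose keys are branch names and whose values hold one entry per event. The branch names `lep_n` and `jet_n` appear in the datasets used in this notebook; the numbers are invented.

```python
# A toy stand-in for a TTree: one list per branch, one entry per event.
# The values are made up; only the branch names come from the real datasets.
toy_tree = {
    "lep_n": [2, 1, 3, 2],   # number of leptons in each event
    "jet_n": [0, 2, 1, 4],   # number of jets in each event
}

n_events = len(toy_tree["lep_n"])   # analogous to tree.GetEntries()

# Reading "event by event" means taking the i-th entry of every branch.
for i in range(n_events):
    event = {name: values[i] for name, values in toy_tree.items()}
    print(i, event)
```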

Let's load our _*.root_ file using ROOT's `TFile.Open()` function. The __argument__ inside the brackets tells ROOT where to look for the file.

In [3]:
f = ROOT.TFile.Open("https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/1largeRjet1lep/MC/mc_361106.Zee.1largeRjet1lep.root") ## 13 TeV sample
#f = ROOT.TFile.Open("http://opendata.atlas.cern/release/samples/MC/mc_105987.WZ.root") ## 8 TeV sample
#f = ROOT.TFile.Open("/home/student/datasets/MC/mc_105987.WZ.root") ## local file example

_You could uncomment one of the other lines to repeat the analysis we will do below for a different dataset (remember to comment out the top line first)._

Next, to load our data from the _*.root_ file, we retrieve the TTree using the `.Get()` function, which takes the name of the TTree as its argument.

In [4]:
#In this .root file, the TTree is called "mini"
tree = f.Get("mini")

We can see how many events are stored in the tree using the `.GetEntries()` function.

In [5]:
print('Number of events in this tree: %d' %tree.GetEntries())

Number of events in this tree: 53653


### Getting ready to display histograms

We're now almost ready to begin working with our data.

Just as a physical drawing needs a surface, our histogram needs a structure on which to be drawn, so after the data is opened we create a __canvas__ to hold our histogram. Without a canvas, we cannot see our histogram at the end. 

We create our canvas using the `ROOT.TCanvas()` function. Its arguments are:

- Its name: `"Canvas"`
- Its title: `"a first way to plot a variable"`. 
- The last two arguments define the width and the height of the canvas (in pixels).

In [6]:
canvas = ROOT.TCanvas("Canvas","a first way to plot a variable",800,600)

Now we define a histogram that will later be placed on this canvas using the `ROOT.TH1F()` function. 

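Its name is `"variable"` and its title is `"Example plot: Number of leptons"`; the text after the semicolons in the title string sets the x-axis label (`Number of leptons`) and the y-axis label (`Events`). The next three arguments give the number of __bins__ (5) and the lower and upper edges of the range (-0.5 and 4.5), so there is one bin for each lepton count from 0 to 4.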

In [7]:
hist = ROOT.TH1F("variable","Example plot: Number of leptons; Number of leptons ; Events ",5,-0.5,4.5)

_(Note the offset of 0.5 in the range arguments. This shifts the bins so they are __centred__ on 0,1,2,3,4 rather than having their leftmost edges on those values, as is the default.)_
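The effect of the 0.5 offset can be checked with a short plain-Python sketch of fixed-width binning. It follows ROOT's bin-numbering convention for a histogram with `nbins` bins (bin 0 is the underflow, bins 1 to `nbins` are the regular bins, and bin `nbins + 1` is the overflow), but `bin_centres` and `find_bin` are hypothetical helpers written for illustration, not ROOT functions.

```python
import math

def bin_centres(nbins, xlow, xup):
    """Centres of nbins equal-width bins spanning [xlow, xup)."""
    width = (xup - xlow) / nbins
    return [xlow + (i + 0.5) * width for i in range(nbins)]

def find_bin(x, nbins, xlow, xup):
    """Bin number of x, using 0 for underflow and nbins + 1 for overflow."""
    if x < xlow:
        return 0
    if x >= xup:
        return nbins + 1
    return 1 + math.floor(nbins * (x - xlow) / (xup - xlow))

# Same binning as the histogram above: 5 bins from -0.5 to 4.5
print(bin_centres(5, -0.5, 4.5))                      # [0.0, 1.0, 2.0, 3.0, 4.0]
print([find_bin(n, 5, -0.5, 4.5) for n in range(6)])  # [1, 2, 3, 4, 5, 6]
```

Each integer from 0 to 4 lands in the middle of its own regular bin, while a value of 5 would go to the overflow bin (number 6) and would not appear in the drawn plot.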

### Filling histograms

It's now time to fill our histogram! In this example, we want to display the number of leptons in each event in our tree. This information is stored in a branch called `lep_n`.

To do this, we need to loop through all the events in the tree and fill the histogram `hist` (which we already defined) with the value stored in `lep_n` for each event. This is done with the `.Fill()` function, which takes the value to be filled as its argument.

After the program has looped over all the data, it prints the word __Done!__.

In later examples, we'll be more particular about the events we put in our histogram, so we'll be skipping some events in the tree if they don't meet our criteria. This is called making __cuts__.

In [8]:
#Example of a loop
for event in tree:
    hist.Fill(tree.lep_n)
    
print("Done!")

Done!
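The loop in the cell above, and the kind of cut described before it, can be imitated in plain Python by keeping one counter per bin. This is only an analogy for what `hist.Fill()` does: the list of `lep_n` values is invented, and the cut `lep_n >= 2` is a hypothetical example rather than a selection used in this notebook.

```python
import math

NBINS, XLOW, XUP = 5, -0.5, 4.5   # same binning as `hist` above

def fill(contents, x):
    """Add one entry for value x; contents[0] is underflow, contents[-1] overflow."""
    if x < XLOW:
        b = 0
    elif x >= XUP:
        b = NBINS + 1
    else:
        b = 1 + math.floor(NBINS * (x - XLOW) / (XUP - XLOW))
    contents[b] += 1

lep_n_values = [2, 1, 3, 2, 0, 2, 4, 1]   # invented lep_n for 8 events

all_events = [0] * (NBINS + 2)
with_cut = [0] * (NBINS + 2)
for lep_n in lep_n_values:
    fill(all_events, lep_n)
    if lep_n >= 2:                 # hypothetical cut: keep events with at least 2 leptons
        fill(with_cut, lep_n)

print(all_events[1:-1])   # regular bins only: [1, 2, 3, 1, 1]
print(with_cut[1:-1])     # regular bins only: [0, 0, 3, 1, 1]
```

Events that fail the cut are simply never passed to `fill`, which is the same idea as skipping events before calling `.Fill()` in the ROOT loop.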


### Drawing histograms

After filling the histogram, we want to see the results of the analysis. ROOT has a very specific order for drawing histograms:

1. Declare any formatting you would like. In this example, we want to fill the histogram in a solid colour using the `.SetFillColor()` function.
2. Draw the histogram onto the canvas using the `.Draw()` function.
3. Draw the canvas on which the histogram is now "mounted". This is also done with the `.Draw()` function.

In [9]:
hist.SetFillColor(2) #.SetFillColor() takes whole numbers, which represent colours, as arguments
#Try giving .SetFillColor() different numbers as arguments

hist.Draw()
canvas.Draw()

Thanks to `jsroot`, this plot should be interactive - try hovering your mouse over the bins, or zooming in and out.

### Normalising histograms

Often, we are more interested in the __proportions__ of our histogram than the absolute number of events it contains (which can change depending on what dataset you use).

Our final step will be to rescale the y-axis of our histogram so that the histogram's total is equal to 1. This is called __normalisation__.

In [10]:
#Find the number of entries in the histogram
#In this particular case (where no cuts are made), this should be equal to the number of events in the tree

scale = hist.Integral() 
print('Number of entries in histogram: %0.1f' %scale)


#Divide the number of entries in each bin by the total number of entries in the histogram
#This will indicate what fraction of the total is held in each bin

hist.Scale(1/scale)

Number of entries in histogram: 53653.0
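The arithmetic in the cell above can be reproduced in plain Python on a list of bin contents: each bin content n_i becomes n_i / N, where N is the sum over all bins, so the new contents add up to 1. The bin contents below are invented, and only regular bins are included, since `Integral()` called with no arguments sums the regular bins and leaves out the underflow and overflow.

```python
counts = [100, 300, 800, 300, 100]   # invented bin contents

scale = sum(counts)                        # plays the role of hist.Integral()
fractions = [c / scale for c in counts]    # plays the role of hist.Scale(1/scale)

print(scale)           # 1600
print(fractions)       # [0.0625, 0.1875, 0.5, 0.1875, 0.0625]
print(sum(fractions))  # 1.0
```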


Finally, we format and draw the normalised histogram onto our canvas, then draw the canvas as we did above.

In [11]:
hist.SetFillColor(2)
hist.Draw("hist") #"hist" draws the bin contents as bars, without error bars
canvas.Draw()

## Now, it's your turn

Next, we want to load a different data file from the 8 TeV dataset and plot the number of jets per event. 

__1)__ Open the _*.root_ data file `"http://opendata.atlas.cern/release/samples/MC/mc_105987.WZ.root"`

_Relevant functions:_ `ROOT.TFile.Open()`

In [12]:
f = ROOT.TFile.Open("http://opendata.atlas.cern/release/samples/MC/mc_105987.WZ.root")

__2)__ Create a canvas to display your plot. Name your canvas `"8TeV_Canvas"`, give it the title `"2nd time plotting a variable"` and dimensions `700`x`500` pixels.

_Relevant functions:_ `ROOT.TCanvas()`

In [13]:
canvas = ROOT.TCanvas("8TeV_Canvas","2nd time plotting a variable",700,500)

__3)__ Load the tree named "mini" stored in the _*.root_ data file. Print the number of events in this tree.

_Relevant functions:_ `.Get()` `.GetEntries()`

In [14]:
tree = f.Get("mini")
print('Number of events in this tree: %d' %tree.GetEntries())

Number of events in this tree: 500000


__Also__ create variables for the maximum and the minimum number of jets in a single event in this dataset using:

In [15]:
minimum = int(tree.GetMinimum('jet_n'))
maximum = int(tree.GetMaximum('jet_n'))

Print out the values of `maximum` and `minimum`.

In [16]:
print('Max no. jets: %d' %maximum)
print('Min no. jets: %d' %minimum)

Max no. jets: 8
Min no. jets: 0


__4)__ Set up an (empty for now) histogram, which will contain the number of __jets__ in each event. Name your plot `"8TeV_variable"`, title it `"Example plot: Number of jets"` with axes `"Number of jets"` and `"Events"`. 

This time, instead of explicitly giving the bin range and number, do so in terms of the `maximum` and `minimum`.

_Relevant functions:_ `ROOT.TH1F()`

In [17]:
#One bin per integer number of jets from minimum to maximum, each centred on its integer
hist = ROOT.TH1F("8TeV_variable","Example plot: Number of jets; Number of jets ; Events ",maximum-minimum+1,minimum-0.5,maximum+0.5)
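For an integer-valued variable like `jet_n`, one bin per possible value needs `maximum - minimum + 1` bins, with both edges shifted outwards by 0.5 so that each bin is centred on an integer. The sketch below is a hypothetical plain-Python helper (`integer_binning` is not a ROOT function) that returns the three numbers to pass to `ROOT.TH1F()` after the name and title, using the minimum of 0 and maximum of 8 printed above.

```python
def integer_binning(minimum, maximum):
    """Return (nbins, xlow, xup) giving one bin centred on each integer in [minimum, maximum]."""
    nbins = maximum - minimum + 1
    return nbins, minimum - 0.5, maximum + 0.5

nbins, xlow, xup = integer_binning(0, 8)
print(nbins, xlow, xup)   # 9 -0.5 8.5

# The bin centres land on 0, 1, ..., 8
width = (xup - xlow) / nbins
centres = [xlow + (i + 0.5) * width for i in range(nbins)]
print(centres)
```

With an upper edge of `maximum - 0.5` instead, events with the largest number of jets would fall into the overflow bin and would not be shown on the plot.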

__5)__ Loop through each event in the tree, filling your histogram with the number of jets (contained in the branch `jet_n`) in each. When the loop is finished, print a message to indicate that it is done.

_Relevant functions:_ `.Fill()`

In [18]:
for event in tree:
    hist.Fill(tree.jet_n)
    
print("Done!")

Done!


__6)__ Draw your histogram onto the canvas, then draw your canvas. Fill your histogram with a colour other than red.

_Relevant functions:_ `.SetFillColor()` `.Draw()`

In [19]:
hist.SetFillColor(3)
hist.Draw("hist")

canvas.Draw()

__7)__ Normalize your histogram. Redraw it onto the canvas then redraw the canvas.

_Relevant functions:_ `.Integral()` `.Scale()` and functions from __6)__

In [20]:
scale = hist.Integral()

hist.Scale(1/scale)
hist.SetFillColor(4)

hist.Draw("hist") #"hist" draws the bin contents as bars, without error bars
canvas.Draw()

### Optional Extra

The `.GetListOfBranches()` function will extract the set of branches from the tree. Looping through each branch in this set, the `.GetName()` function can be used to access the branch names.

In [21]:
branches = tree.GetListOfBranches()

print('TTree branch names:\n')
for branch in branches:
    print(branch.GetName())

TTree branch names:

runNumber
eventNumber
channelNumber
mcWeight
pvxp_n
vxp_z
scaleFactor_PILEUP
scaleFactor_ELE
scaleFactor_MUON
scaleFactor_BTAG
scaleFactor_TRIGGER
scaleFactor_JVFSF
scaleFactor_ZVERTEX
trigE
trigM
passGRL
hasGoodVertex
lep_n
lep_truthMatched
lep_trigMatched
lep_pt
lep_eta
lep_phi
lep_E
lep_z0
lep_charge
lep_type
lep_flag
lep_ptcone30
lep_etcone20
lep_trackd0pvunbiased
lep_tracksigd0pvunbiased
met_et
met_phi
jet_n
alljet_n
jet_pt
jet_eta
jet_phi
jet_E
jet_m
jet_jvf
jet_trueflav
jet_truthMatched
jet_SV0
jet_MV1
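The branch names follow a prefix pattern (`lep_`, `jet_`, `scaleFactor_`, ...), so grouping them by prefix can help when choosing a branch for the exercise below. This plain-Python sketch works on a hand-copied subset of the names printed above; in PyROOT the full list could be built with `[b.GetName() for b in tree.GetListOfBranches()]`, as in the cell. Splitting on the first underscore is an assumption that happens to fit these names.

```python
from collections import defaultdict

# A hand-copied subset of the branch names printed above
names = ["runNumber", "mcWeight", "scaleFactor_ELE", "scaleFactor_MUON",
         "lep_n", "lep_pt", "lep_eta", "met_et", "jet_n", "jet_pt"]

groups = defaultdict(list)
for name in names:
    prefix = name.split("_", 1)[0]   # text before the first underscore (whole name if none)
    groups[prefix].append(name)

for prefix, members in groups.items():
    print(prefix, members)
```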


Choose one of these branches and repeat the steps __3-7__ above to show the distribution of that branch's variable over the dataset.