# Getting started with *uproot4*

In this notebook, we will see how ROOT files can be viewed using the [uproot4](https://github.com/scikit-hep/uproot4) Python package.

More help on how to use the package can be found at https://uproot.readthedocs.io/en/latest/index.html. Here we give a few simple examples that will be used during the project.

If needed, install the package using:
```
!python -m pip install uproot4
```


In [None]:
# import uproot  # for uproot >= 4.0 the same package is also available under the name `uproot`
import uproot4

Explore the content of the file:

In [None]:
# replace the next line with the path to your file
path='data'
#path='/eos/cms/store/user/jjhollar/CERNSummerStudentProject2021/'
filename=path+'/gammagammaMuMu_FPMC_pT25_PU140_NTUPLE_1_version2.root'

In [None]:
#open the file, using uproot
root_=uproot4.open( filename )

Now inspect the content of the file:

In [None]:
root_.keys()

In the selected file there is a directory `myana` containing a tree named `mytree`. Let's get the tree and look at its branches:

In [None]:
tree_ = root_["myana/mytree"]

In [None]:
tree_.show()

A shorter option to read a tree from a file is:
```
tree_ = uproot4.open(filename+":myana/mytree")
```

To see the content of a single variable (here, the number of vertices in each event):

In [None]:
vtx_size = tree_['vtx_size'].array()
print(vtx_size)

## Convert ROOT to pandas dataframe

Export the ROOT file content into pandas dataframes.

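The tricky part is that ROOT files usually store vectors (a variable number of entries per event), so the conversion to dataframes is not always trivial. The simplest way is to group branches of the same kind (e.g. all the `muon_*` branches, which have the same length in a given event) into one dataframe, and to use separate dataframes for different kinds of vectors: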

In [None]:
#variables_to_save=[tree_.keys()] # this will not work, due to vectors in the dataset
muons_df = tree_.arrays(['muon_size','muon_pt','muon_eta','muon_phi'], library="pd")
genproton_df = tree_.arrays(['genproton_size','genproton_xi','genproton_pz','genproton_vz','genproton_ispu'], library="pd")
floats_df = tree_.arrays(['genvtx_t0'], library="pd")
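
Why separate dataframes? A vector branch can only be laid out as a table by giving each element an (event, position-within-event) pair, and branches with different per-event lengths (muons vs. protons) do not share those rows. A minimal numpy sketch of that flattening, using made-up toy values rather than data from the file (uproot 4 indexes its pandas output for vector branches in a similar entry/subentry way, but check the version you use):

```python
import numpy as np

# Toy vector branch: muon pT per event (hypothetical values, in GeV)
muon_pt_per_event = [[52.1, 30.4], [], [41.0, 27.3, 12.9]]

# Number of muons in each event (what a *_size branch stores)
counts = np.array([len(ev) for ev in muon_pt_per_event])

# All values concatenated into a single 1D array
flat_pt = np.concatenate([np.asarray(ev, dtype=float) for ev in muon_pt_per_event])

# Event index of every element: event i repeated counts[i] times
entry = np.repeat(np.arange(len(counts)), counts)

# Position of every element inside its own event
offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))
subentry = np.arange(len(flat_pt)) - np.repeat(offsets, counts)

for e, s, pt in zip(entry, subentry, flat_pt):
    print(e, s, pt)
```

Event 1 has no muons and therefore gets no rows. A proton branch would produce a different set of (entry, subentry) rows, which is why the notebook keeps one dataframe per kind of object.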

In [None]:
muons_df.head()

In [None]:
genproton_df.head()

In [None]:
floats_df.head()

### Analyze the tree

Now let's do a simple analysis: we will compute the invariant mass of the pair of signal protons and compare it to the invariant mass of the two muons with the highest $p_T$.

Muon kinematics are stored as vectors, so the easy way is to loop over the events in the tree, compute one float per event, and store it in an array.
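
If the muons are treated as massless (the muon mass, about 0.106 GeV, is negligible for muons with $p_T$ of tens of GeV), the four-momentum sum used in the muon loop reduces to the closed form $m^2 = 2\,p_{T,1}\,p_{T,2}\,(\cosh\Delta\eta - \cos\Delta\phi)$. A small sketch checking that both expressions agree; the function names are illustrative only:

```python
import numpy as np

def mass_from_sum(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass from the summed four-momenta of two massless particles."""
    E = pt1 * np.cosh(eta1) + pt2 * np.cosh(eta2)
    px = pt1 * np.cos(phi1) + pt2 * np.cos(phi2)
    py = pt1 * np.sin(phi1) + pt2 * np.sin(phi2)
    pz = pt1 * np.sinh(eta1) + pt2 * np.sinh(eta2)
    # Clip tiny negative values caused by floating-point rounding
    return np.sqrt(max(E**2 - px**2 - py**2 - pz**2, 0.0))

def mass_closed_form(pt1, eta1, phi1, pt2, eta2, phi2):
    """Same quantity via m^2 = 2 pT1 pT2 (cosh(d_eta) - cos(d_phi))."""
    return np.sqrt(2.0 * pt1 * pt2 * (np.cosh(eta1 - eta2) - np.cos(phi1 - phi2)))

# Back-to-back muons at eta = 0 with pT = 50 GeV each give m = 100 GeV
print(mass_from_sum(50.0, 0.0, 0.0, 50.0, 0.0, np.pi))
print(mass_closed_form(50.0, 0.0, 0.0, 50.0, 0.0, np.pi))
```

The closed form avoids building the summed four-vector and is less sensitive to rounding when the two muons are nearly collinear.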

In [None]:
import pandas as pd
import numpy as np

Create arrays to hold one value per event:

In [None]:
n_events = tree_.num_entries  # number of events in the tree
mpp=np.zeros(n_events)  # proton-pair mass per event
mll=np.zeros(n_events)  # dimuon mass per event

Read the proton branches from the file:

In [None]:
N_protons=tree_['genproton_size'].array()
genproton_xi=tree_['genproton_xi'].array()
genproton_pz=tree_['genproton_pz'].array()
genproton_vz=tree_['genproton_vz'].array()
genproton_ispu=tree_['genproton_ispu'].array()

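Compute the invariant mass of the two signal protons, $m_{pp} = \sqrt{s}\,\sqrt{\xi_+\xi_-}$, where $\xi_\pm$ is the fractional momentum loss of the proton moving in the $\pm z$ direction and $\sqrt{s} = 14$ TeV: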

In [None]:
for ev, nprotons in enumerate(N_protons):
    xi_pos=0; xi_neg=0; n_protons=0
    for i in range(nprotons):
        # skip pile-up protons, keep only the signal ones
        if genproton_ispu[ev][i]: continue
        n_protons=n_protons+1
        if genproton_pz[ev][i] > 0: xi_pos=genproton_xi[ev][i]
        else: xi_neg=genproton_xi[ev][i]
    if n_protons != 2:
        print('Error: found '+str(n_protons)+' signal protons, skipping event')
        mpp[ev]=-1.
        continue  # do not overwrite the -1 flag below
    # sqrt(s) = 14000 GeV, so mpp is in GeV
    mpp[ev]=14000.*np.sqrt(xi_pos*xi_neg)
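
The event loop is easy to read but slow in pure Python. A vectorized sketch with numpy, assuming the vector proton branches have already been flattened into one array per branch plus a per-event `counts` array (for example with `ak.flatten` and `ak.num` from the awkward package; both that step and the toy values below are assumptions, not part of the original notebook). It keeps events with exactly one signal proton on each side and flags the rest with -1, which is slightly stricter than the loop, where only the total number of signal protons is checked:

```python
import numpy as np

SQRT_S = 14000.0  # centre-of-mass energy in GeV, as in the loop

def mpp_vectorized(counts, xi, pz, ispu):
    """Per-event proton-pair mass from flattened branches (-1 unless exactly 1+1 signal protons)."""
    counts = np.asarray(counts)
    xi = np.asarray(xi, dtype=float)
    pz = np.asarray(pz, dtype=float)
    n_events = len(counts)
    # Event index of every proton in the flattened arrays
    event = np.repeat(np.arange(n_events), counts)
    signal = ~np.asarray(ispu, dtype=bool)
    pos = signal & (pz > 0)
    neg = signal & ~(pz > 0)  # same split as the if/else in the loop
    # Number of signal protons per side and per event
    n_pos = np.bincount(event[pos], minlength=n_events)
    n_neg = np.bincount(event[neg], minlength=n_events)
    # Sum of xi per side; with exactly one proton per side this is that proton's xi
    xi_pos = np.bincount(event[pos], weights=xi[pos], minlength=n_events)
    xi_neg = np.bincount(event[neg], weights=xi[neg], minlength=n_events)
    good = (n_pos == 1) & (n_neg == 1)
    return np.where(good, SQRT_S * np.sqrt(xi_pos * xi_neg), -1.0)

# Toy input (hypothetical): event 0 has one signal proton per side plus one
# pile-up proton; event 1 has two signal protons on the same side
result = mpp_vectorized(
    counts=[3, 2],
    xi=[0.01, 0.02, 0.05, 0.01, 0.03],
    pz=[6900.0, -6860.0, 6650.0, 6930.0, 6790.0],
    ispu=[0, 0, 1, 0, 0],
)
print(result)
```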


Read the muon branches from the file:

In [None]:
N_muons=tree_['muon_size'].array()
mu_pt=tree_['muon_pt'].array()
mu_eta=tree_['muon_eta'].array()
mu_phi=tree_['muon_phi'].array()

Compute the invariant mass of the pair of leptons:

In [None]:
for ev, nmuons in enumerate(N_muons):
    if nmuons < 2:
        # fewer than two muons: no pair mass can be computed
        mll[ev]=-1.
        continue
    # find the two muons with the highest pT
    mu1_pt=0; mu1_eta=0; mu1_phi=0
    mu2_pt=0; mu2_eta=0; mu2_phi=0
    for i in range(nmuons):
        if mu_pt[ev][i]>mu1_pt:
            # current leading muon becomes the subleading one
            mu2_pt=mu1_pt
            mu2_eta=mu1_eta
            mu2_phi=mu1_phi

            mu1_pt=mu_pt[ev][i]
            mu1_eta=mu_eta[ev][i]
            mu1_phi=mu_phi[ev][i]
        elif mu_pt[ev][i]>mu2_pt:
            mu2_pt=mu_pt[ev][i]
            mu2_eta=mu_eta[ev][i]
            mu2_phi=mu_phi[ev][i]

    # compute invariant mass of lepton pair (muons treated as massless, E = |p|)
    sumE=mu1_pt*np.cosh(mu1_eta)+mu2_pt*np.cosh(mu2_eta)
    sumPx=mu1_pt*np.cos(mu1_phi)+mu2_pt*np.cos(mu2_phi)
    sumPy=mu1_pt*np.sin(mu1_phi)+mu2_pt*np.sin(mu2_phi)
    sumPz=mu1_pt*np.sinh(mu1_eta)+mu2_pt*np.sinh(mu2_eta)

    mll2 = sumE**2 - sumPx**2 - sumPy**2 - sumPz**2
    # clip small negative values caused by floating-point rounding
    mll[ev]=np.sqrt(max(mll2, 0.))
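
The manual bookkeeping of the two highest-$p_T$ muons can also be written with numpy's `argsort`. A minimal per-event sketch (the function name and toy values are illustrative) that uses the massless-particle relation $m^2 = 2\,p_{T,1}\,p_{T,2}\,(\cosh\Delta\eta - \cos\Delta\phi)$, equivalent to the four-momentum sum in the cell above, and returns -1 when an event has fewer than two muons:

```python
import numpy as np

def leading_pair_mass(pt, eta, phi):
    """Invariant mass of the two highest-pT (massless) muons in one event, or -1."""
    pt, eta, phi = (np.asarray(a, dtype=float) for a in (pt, eta, phi))
    if len(pt) < 2:
        return -1.0
    # Indices of the two largest pT values, highest first
    i1, i2 = np.argsort(pt)[::-1][:2]
    m2 = 2.0 * pt[i1] * pt[i2] * (np.cosh(eta[i1] - eta[i2]) - np.cos(phi[i1] - phi[i2]))
    # Clip tiny negative values caused by floating-point rounding
    return float(np.sqrt(max(m2, 0.0)))

# The 10 GeV muon is ignored; the two 50 GeV muons are back to back at eta = 0
print(leading_pair_mass([50.0, 10.0, 50.0], [0.0, 1.0, 0.0], [0.0, 0.5, np.pi]))
```

Inside the event loop it would be called as `leading_pair_mass(mu_pt[ev], mu_eta[ev], mu_phi[ev])`, which should work as long as `np.asarray` can convert the per-event awkward arrays to plain numpy arrays.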

    

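To end this exercise, plot the correlation between $m_{ll}$ and $m_{pp}$. For signal events, where both protons come from the same $\gamma\gamma \to \mu\mu$ interaction, the points are expected to lie close to the diagonal $m_{pp} \approx m_{ll}$: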

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# keep only events with a positive value for both masses (drops flagged events)
sel = (mpp > 0) & (mll > 0)

plt.scatter(mpp[sel], mll[sel])
plt.xlabel("$m_{pp}$ [GeV]")
plt.ylabel("$m_{ll}$ [GeV]")
plt.show()
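
Besides the scatter plot, the correlation can be summarised by a single number. A sketch using `np.corrcoef` (a Pearson coefficient, an illustrative choice rather than part of the original exercise) that skips events flagged with a non-positive or NaN value; the toy arrays below only stand in for `mpp` and `mll`:

```python
import numpy as np

def mass_correlation(mpp, mll):
    """Pearson correlation of m_pp and m_ll over events where both are valid (> 0)."""
    mpp = np.asarray(mpp, dtype=float)
    mll = np.asarray(mll, dtype=float)
    valid = (mpp > 0) & (mll > 0)  # NaN compares as False, so NaN entries are dropped too
    r = np.corrcoef(mpp[valid], mll[valid])[0, 1]
    return r, int(valid.sum())

# Toy values only: three consistent events and one flagged with -1
r, n_used = mass_correlation([100.0, 200.0, 300.0, -1.0], [101.0, 198.0, 305.0, 150.0])
print(r, n_used)
```

With the real arrays this would be `mass_correlation(mpp, mll)`; the coefficient needs at least two valid events to be defined.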