# 1- Introduction

In this notebook, we demonstrate how to use awkward arrays and how to build selectors efficiently using Python lambdas

In [2]:
import uproot
import awkward as ak
import numpy as np
import math
import hist
import matplotlib.pyplot as plt
import os
import subprocess
import vector
import gc

First we need to register the vector behaviors with awkward, so that awkward arrays tagged as vectors gain their kinematic properties

In [3]:
vector.register_awkward() 

We then collect the data (either MC or real data)

In [4]:
DATATYPE="mc"
assert((DATATYPE=="mc") or (DATATYPE=="data"))
BASEDIR="/pbs/throng/training/nantes-m2-rps-exp/data" # basedir where to look for runXXX.DATATYPE.root files
IS_MC=(DATATYPE=="mc")

In [5]:
def data_file_path(run,is_mc=IS_MC,dest=BASEDIR):
    datatype="mc" if is_mc else "data"
    path=f"{dest}/run{run}.{datatype}.root"
    print(path)  # show which file is about to be opened
    return path

In [6]:
SAMPLE_RUNS=[291694,291399]

Let's now open the ROOT file and print the content of its two trees

In [7]:
file = uproot.open(data_file_path(SAMPLE_RUNS[0],IS_MC))
events = file["eventsTree"]
eventsGen = file["genTree"]
events.show()
eventsGen.show()

/pbs/throng/training/nantes-m2-rps-exp/data/run291694.mc.root
name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
runNumber            | int32_t                  | AsDtype('>i4')
xVtx                 | double                   | AsDtype('>f8')
yVtx                 | double                   | AsDtype('>f8')
zVtx                 | double                   | AsDtype('>f8')
isCINT               | bool                     | AsDtype('bool')
isCMSL               | bool                     | AsDtype('bool')
isCMSH               | bool                     | AsDtype('bool')
isCMLL               | bool                     | AsDtype('bool')
isCMUL               | bool                     | AsDtype('bool')
nMuons               | int32_t                  | AsDtype('>i4')
Muon_E               | std::vector<float>       | AsJagged(AsDtype('>f4'), he...
Muon_Px              | st

For the record, we print here the number of entries (events) in each tree of the file

In [8]:
print(events.num_entries)
print(eventsGen.num_entries)

40000
40000


# 2- 4-momentum vectors with awkward and selectors

It is possible to build an awkward array that behaves as a 4-momentum vector. To do that, use the "zip" function with fields named px, py, pz and E, and give the resulting records the specific name "Momentum4D" (through the with_name argument).

This allows you to get the kinematics of your tracks quite easily by calling:
- .p, .px, .py, .pz, .pt, ... for the momentum
- .eta for the pseudo-rapidity
- ...

You can also add more fields to the awkward array besides these four (as you can see below).

Here we implement a "getTracks" function that will be used in the following to select tracks:

In [9]:
def getTracks(events):
    return ak.zip({"px":events["Muon_Px"],
                   "py":events["Muon_Py"],
                   "pz":events["Muon_Pz"],
                   "E":events["Muon_E"],
                   "charge":events["Muon_Charge"],
                   "thetaAbs":events["Muon_thetaAbs"],
                   "matched":events["Muon_matchedTrgThreshold"],
                   "n":events["nMuons"]},
                  with_name='Momentum4D')

def getTracksGen(events):
    return ak.zip({"px":events["Muon_GenPx"],
                   "py":events["Muon_GenPy"],
                   "pz":events["Muon_GenPz"],
                   "E":events["Muon_GenE"],
                   "n":events["nMuonsGen"],
                   "label":events["Muon_GenLabel"],
                   "mother":events["Muon_GenMotherPDGCode"]},
                  with_name='Momentum4D')

Now we will design a function to run over the data with uproot. 
In the call to uproot.iterate, we list all the variables we want to access for each event

We first select good events and tracks using Python lambdas. What counts as a good event or a good track is defined later, when we call the "scan" function

In [10]:
#print(dir(vector.backends.awkward.MomentumArray4D))

In [11]:
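import pandas
# Run numbers listed in the offline counters file
l1 = pandas.read_csv('../data/counters.offline.csv')['run']
# Run numbers taken from characters 4-9 of the MC file names in the data directory;
# the [0:-2] slice drops the trailing empty string and the last listed entry
l2 = [int(e) for e in os.popen("ls /pbs/throng/training/nantes-m2-rps-exp/data/ | grep 'mc' | cut -c 4-9").read().split('\n')[0:-2]]
#list(set(l1) & set(l2))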

In [12]:
def invariant_mass(part1,part2):
    ''' Not used anymore '''
    px = part1.px+part2.px
    py = part1.py+part2.py
    pz = part1.pz+part2.pz 
    E1 = part1.e
    E2 = part2.e
    return np.sqrt((E1+E2)**2-(px**2+py**2+pz**2))

def Momentum4D(events):
    return ak.zip({
        "px": events["0"].px + events["1"].px,
        "py": events["0"].py + events["1"].py,
        "pz": events["0"].pz + events["1"].pz,
        "E" : events["0"].E  + events["1"].E},
        with_name="Momentum4D")
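Both functions above rely on the same relation: the pair 4-momentum is the sum of the two 4-momenta, and the invariant mass is sqrt(E^2 - |p|^2) of that sum. A minimal numpy sketch with toy numbers follows. The function name `pair_mass` is hypothetical, the muons are taken as massless for simplicity, and the clipping of negative values is an addition that is not in the cell above.

```python
import numpy as np

def pair_mass(p1, p2):
    """Invariant mass of two particles given as (px, py, pz, E) tuples."""
    # Sum the two 4-momenta component by component
    px, py, pz, e = (a + b for a, b in zip(p1, p2))
    m2 = e**2 - (px**2 + py**2 + pz**2)
    # Clip tiny negative values caused by rounding (assumption, not in the notebook)
    return np.sqrt(np.maximum(m2, 0.0))

# Two back-to-back massless toy muons, each with an energy of 1.55
mu_plus = (1.55, 0.0, 0.0, 1.55)
mu_minus = (-1.55, 0.0, 0.0, 1.55)
mass = pair_mass(mu_plus, mu_minus)
print(mass)
```

The momenta cancel and the energies add, so the pair mass is the total energy, 3.1 in the units of the inputs.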

In [13]:
def scanGen(dataDescription, 
              #hMag:hist.Hist, hPhi:hist.Hist, hMinv:hist.Hist,
              eventSelector=lambda x:[True]*len(x),
              trackSelector=lambda x:[True]*len(x), 
              verbose:bool=False):
    """ Loop over the generated (MC truth) tree and count the generated J/psi.
        
        :param: dataDescription: is anything uproot.iterate can take.
                typically something like run*.mc.root:genTree in our case
        :param: eventSelector: returns an array of bool from an array of events
        :param: trackSelector: returns an array of bool from an array of tracks
        :return: number of generated J/psi per selected event, over all batches read
    """
    
    nJPsi = 0    # running number of generated J/psi
    nEvents = 0  # running number of selected events
    
    for batch in uproot.iterate(dataDescription,
                                ["nMuonsGen","Muon_GenPx","Muon_GenPy","Muon_GenPz","Muon_GenE","Muon_GenLabel","Muon_GenMotherPDGCode"],                                
                                 report=True):

        events=batch[0] # batch[1] is the report info
        if len(events) < 1000:
            print("something is wrong",batch[1]) # this is a protection for some corrupted input data files 
            break
            
        goodEvents = events[eventSelector(events)] 
        tracks = getTracksGen(goodEvents)  # build tracks from the selected events only
        goodTracks=tracks[trackSelector(tracks)]
        t = goodTracks  # notation
         
        # PDG code 443 is the J/psi; each J/psi gives two muons, hence the division by 2
        nJPsi += ak.sum(t.mother==443)/2
        nEvents += len(t)
        
        if verbose:
            print(batch[1])
        gc.collect()

    # return after the loop, so that every batch contributes to the result
    return nJPsi/nEvents if nEvents > 0 else 0.0



In [14]:
def scan(dataDescription,
            fminv,
            eventSelector=lambda x:[True]*len(x),
            trackSelector=lambda x:[True]*len(x),
            pairSelector=lambda x:[True]*len(x),
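            verbose:bool=False):
    """ Loop over data and save the invariant masses of opposite-sign muon pairs.
        
        :param: dataDescription: is anything uproot.iterate can take.
                typically something like run*.data.root:eventsTree in our case
        :param: fminv: binary file object opened for writing; one array is saved per batch
        :param: eventSelector: returns an array of bool from an array of events
        :param: trackSelector: returns an array of bool from an array of tracks
        :param: pairSelector: returns an array of bool from an array of muon pairs
    """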
    

    for batch in uproot.iterate(dataDescription,
                                ["isCINT","isCMUL","isCMSL","nMuons","Muon_Px","Muon_Py","Muon_Pz","Muon_E","Muon_Charge","Muon_thetaAbs","Muon_matchedTrgThreshold"],
                                report=True):
    #for batchGen, batchEvents in zip(uproot.iterate(dataDescription+"genTree",
    #                                                ["isCINT","isCMUL","isCMSL","nMuons","Muon_Px","Muon_Py","Muon_Pz","Muon_E","Muon_Charge","Muon_thetaAbs","Muon_matchedTrgThreshold"],
    #                                                report=True),
    #                                 uproot.iterate(dataDescription+"eventsTree",
    #                                                ["nMuonsGen","Muon_GenPx","Muon_GenPy","Muon_GenPz","Muon_GenE","Muon_GenLabel","Muon_GenMotherPDGCode"],
    #                                                report=True)
                                    
        events=batch[0] # batch[1] is the report info
        if len(events) < 1000:
            print("something is wrong",batch[1]) # this is a protection for some corrupted input data files 
            break
        
        
        goodEvents = events[eventSelector(events)] 
        tracks = getTracks(goodEvents)  # build tracks from the selected events only
        
        goodTracks=tracks[trackSelector(tracks)]
        t = goodTracks  # notation
         
        #hMag.fill(ak.flatten(t.p))
        #hPhi.fill(ak.flatten(t.phi))

        # Keep events with exactly two selected tracks
        n = ak.num(t.charge,axis=1)
        tsel = t[n==2]
        
        # Combinations & Keep opposite charges only 
        C = ak.combinations(tsel,2)
        C_os = C[(C["0"].charge+C["1"].charge)==0]
                
        P = Momentum4D(C_os)
        
        # Compute invariant mass
        
        #minv = invariant_mass(C_os["0"],C_os["1"])
        minv = P[pairSelector(P)].mass
        #minv = P.mass
        #print(ak.flatten(minv).to_numpy())
        
        # Save all minv
        np.save(fminv,ak.flatten(minv).to_numpy())
              
        
        
        if verbose:
            print(batch[1])
        gc.collect()


In [15]:
%%time
## SINGLE MUON TRACK PLOTS
#No cuts

#scan(dataDescription=f"{BASEDIR}/run*.{DATATYPE}.root:eventsTree",
#          hMag=vhMagRaw, hPhi=vhPhiRaw)

if 0:
    #runid = "2904[1-3]*"
    runid = "*"

    NgenN = scanGen(dataDescription=f"{BASEDIR}/run{runid}.mc.root:genTree",
              #hMag=vhMagEvSel, hPhi=vhPhiEvSel, hMinv=vhMinvEvSel,
              #eventSelector=lambda x: x["isCMUL"]==True,
              #trackSelector=lambda x: (x.p>5) & (x.eta>-4),
             verbose=True
        )

    # "wb" truncates any previous file; scan then writes one array per batch
    f = open("minv.npy","wb")
    scan(dataDescription=f"{BASEDIR}/run{runid}.data.root:eventsTree",
            fminv=f,
            eventSelector=lambda x: x["isCMUL"]==True,
            verbose=True
        )
    f.close()
    
#print(NrawN/NgenN)

CPU times: user 4 µs, sys: 1e+03 ns, total: 5 µs
Wall time: 8.58 µs
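Since scan calls np.save once per batch on the same open file, the resulting file holds several arrays stored back to back, and a single np.load only returns the first one. A possible way to read everything back is sketched below. The helper name `load_all` and the demo file are hypothetical and not part of the notebook.

```python
import os
import tempfile
import numpy as np

def load_all(path):
    """Concatenate every array that np.save wrote back to back into `path`."""
    chunks = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Each np.load reads exactly one array and leaves the file
        # position at the start of the next one
        while f.tell() < size:
            chunks.append(np.load(f))
    return np.concatenate(chunks) if chunks else np.empty(0)

# Toy round trip mimicking two batches written by scan
path = os.path.join(tempfile.mkdtemp(), "minv_demo.npy")
with open(path, "wb") as f:
    np.save(f, np.array([3.09, 3.10]))
    np.save(f, np.array([3.11]))

masses = load_all(path)
print(masses)
```

Checking the file position against the file size avoids relying on the exception raised at the end of the file, which differs between numpy versions.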


# Cuts
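The cell that follows loops over consecutive pT bin edges and defines the pair selector as a lambda inside the loop. This is safe there because each lambda is used before the edges change. If the selectors were stored and called later, they would all see the last bin, so a sketch of a variant that freezes the edges with default arguments is given here. The names `pt_edges`, `selectors` and the toy values are assumptions for illustration.

```python
import numpy as np

pt_edges = [0, 1, 2, 3]                    # toy bin edges
pair_pt = np.array([0.4, 1.5, 2.2, 1.1])  # toy pair pT values

selectors = {}
for low, high in zip(pt_edges[:-1], pt_edges[1:]):
    # Default arguments capture the current values of low and high,
    # so each stored lambda keeps its own bin
    selectors[(low, high)] = lambda pt, low=low, high=high: (low < pt) & (pt < high)

# The selectors are called only after the loop has finished
counts = {edges: int(np.sum(sel(pair_pt))) for edges, sel in selectors.items()}
print(counts)
```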

In [20]:
%%time
runid = "{290323,290327,290848,291361,291360,291362,290853,290860,291373,290374,290375,291399,291400,290894,290895,290404,291943,291944,291948,291953,290932,290423,291447,290935,290425,290427,291451,291453,291976,291982,290456,290458,290459,291482,291485,290975,290980,290469,292012,291002,291003,291004,291005,290501,292040,292060,292061,292062,291041,290539,290540,292075,292077,292080,290549,290553,291590,292106,292108,292109,292115,290590,291618,291622,291624,292140,290612,292160,292162,292163,292164,292166,290632,291657,292168,292192,290658,290660,291690,291692,291694,291698,291706,290687,290692,290696,290699,292242,292265,291755,292269,292270,291760,292273,292274,290742,291769,291263,290764,290766,291283,291284,291285,291795,291796,290776,291803,290787}"
#runid = "{290323,290327,290848,291361,291360,291362,290853,290860,291373,290374,290375,291399,291400,290894,290895,290404,291943,291944,291948,291953}"
#runid = "290853"
os.system(f"ls {BASEDIR}/run{runid}.data.root | wc -l")
if 1:
    # full binning: pTcuts = [0, 1, 2, 3, 4, 5, 6, 8]
    pTcuts = [1,2]  # only the 1-2 bin is processed here
    #pTcuts = [0, 1000]
    for i in range(len(pTcuts)-1):
        pTl = pTcuts[i]
        pTu = pTcuts[i+1]
        suffix = f"pT{pTl}-{pTu}.npy"
        fiddata = "mass/data."+suffix
        fidmc = "mass/mc."+suffix
        # "wb" truncates any previous file; scan then writes one array per batch
        fdata = open(fiddata,"wb")
        fmc = open(fidmc,"wb")
        cuttrack = lambda x: (x.thetaAbs < 10) & (x.thetaAbs > 2) & (x.pt > 0.5) & (x.eta < -2.5) & (x.eta > -4)
        cutpair = lambda x: (pTl<x.pt)&(x.pt<pTu)
        scan(dataDescription=f"{BASEDIR}/run{runid}.data.root:eventsTree",
                fminv=fdata,
                eventSelector=lambda x: x["isCMUL"]==True,
                trackSelector=cuttrack,
                pairSelector=cutpair,
                verbose=True
            )
        scan(dataDescription=f"{BASEDIR}/run{runid}.mc.root:eventsTree",
                fminv=fmc,
                eventSelector=lambda x: x["isCMUL"]==True,
                trackSelector=cuttrack,
                pairSelector=cutpair,
                verbose=True
            )
        fdata.close()
        fmc.close()


108
<Report start=0 stop=2139140 source='/pbs/throng/training/nantes-m2-rps-exp/data/run290323.data.root:/eventsTree;1'>
<Report start=2139140 stop=2329535 source='/pbs/throng/training/nantes-m2-rps-exp/data/run290327.data.root:/eventsTree;1'>
<Report start=2329535 stop=4902978 source='/pbs/throng/training/nantes-m2-rps-exp/data/run290848.data.root:/eventsTree;1'>
<Report start=4902978 stop=5125248 source='/pbs/throng/training/nantes-m2-rps-exp/data/run290848.data.root:/eventsTree;1'>
<Report start=5125248 stop=5682939 source='/pbs/throng/training/nantes-m2-rps-exp/data/run291361.data.root:/eventsTree;1'>
<Report start=5682939 stop=8151330 source='/pbs/throng/training/nantes-m2-rps-exp/data/run291360.data.root:/eventsTree;1'>
<Report start=8151330 stop=8740215 source='/pbs/throng/training/nantes-m2-rps-exp/data/run291360.data.root:/eventsTree;1'>
<Report start=8740215 stop=10927537 source='/pbs/throng/training/nantes-m2-rps-exp/data/run291362.data.root:/eventsTree;1'>
<Report start=109