# Exploring tautau signal files with coffea

In this notebook we explore how to make selections on our signal file and match the reconstructed objects (jets, electrons, muons, etc.) to our generator-level information (from the simulation).

We first download one file from a simulated signal dataset (Higgs to tau-tau):

```
mkdir data/
scp -r cmslpc-sl7.fnal.gov:/eos/uscms/store/user/lpcdihiggsboost/cmantill/PFNano/2017_UL_ak15/GluGluHToTauTau_M125_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL17Jun23-106X_mc2017_realistic_v9-v3/210629_192831/0000/nano_mc2017_93.root data/nano_mc2017_93.root 
```

Then, we import some libraries:

In [1]:
import awkward as ak
from coffea.nanoevents import NanoEventsFactory, NanoAODSchema
from coffea.nanoevents.methods import candidate, vector

# we suppress ROOT warnings where our input ROOT tree has duplicate branches - these are handled correctly.
import warnings
warnings.filterwarnings("ignore", message="Found duplicate branch ")

And we open the file with the coffea `NanoEventsFactory`:

In [2]:
fname = "data/nano_mc2017_93.root"
events = NanoEventsFactory.from_root(fname, schemaclass=NanoAODSchema, entry_stop=10000).events()

Now, let's define some requirements on the leptons:

In [3]:
# leptons
goodmuon = (
    (events.Muon.pt > 25)
    & (abs(events.Muon.eta) < 2.4)
    & events.Muon.mediumId
)
nmuons = ak.sum(goodmuon, axis=1)
lowptmuon = (
    (events.Muon.pt > 10)
    & (abs(events.Muon.eta) < 2.4)
    & events.Muon.looseId
)
nlowptmuons = ak.sum(lowptmuon, axis=1)
            
goodelectron = (
    (events.Electron.pt > 25)
    & (abs(events.Electron.eta) < 2.5)
    & (events.Electron.mvaFall17V2noIso_WP80)
)
nelectrons = ak.sum(goodelectron, axis=1)
lowptelectron = (
    (events.Electron.pt > 10)
    & (abs(events.Electron.eta) < 2.5)
    & (events.Electron.cutBased >= events.Electron.LOOSE)
)
nlowptelectrons = ak.sum(lowptelectron, axis=1)

# since events can have more than one lepton (e.g. one high-pT electron and a lower-pT muon),
# we concatenate electrons and muons in the same array, sort them by descending pT,
# and select the highest-pT lepton (with ak.firsts)
goodleptons = ak.concatenate([events.Muon[goodmuon], events.Electron[goodelectron]], axis=1)
candidatelep = ak.firsts(goodleptons[ak.argsort(goodleptons.pt, ascending=False)])

# when we concatenate we lose the vector properties, so let's build another vector for our candidate lepton
candidatelep_p4 = ak.zip(
    {
        "pt": candidatelep.pt,
        "eta": candidatelep.eta,
        "phi": candidatelep.phi,
        "mass": candidatelep.mass,
        "charge": candidatelep.charge,
    },
    with_name="PtEtaPhiMCandidate",
    behavior=candidate.behavior,
)

Now, let's take a simple look at the fat-jets (large radius jets). 

Here, we define a variable `jet_arbitration` which will be used later. This indicates how we order and select our jet. 

Usually we want the highest-pT jet in the collection (`pt` arbitration). However, for our Higgs to tau-tau signal, it may be useful to look at the jet closest to the lepton (`lep` arbitration) or at the jet closest to the missing energy in the event, which represents the energy carried away by the neutrinos (`met` arbitration).

In [4]:
jet_arbitration = 'met'

# let's define a collection of jets (with a pT threshold of 200 GeV)
fatjets = events.FatJet
candidatefj = fatjets[
    (fatjets.pt > 200)
]

# we take the Missing Transverse Energy MET from the event, and define the angular distance (delta_phi) of the MET with the jets
met = events.MET
dphi_met_fj = abs(candidatefj.delta_phi(met))

# here we define the angular distance (delta_r, i.e. the distance in the eta-phi plane) between the jets and the candidate lepton we chose above
dr_lep_fj = candidatefj.delta_r(candidatelep_p4)

# then we take the first jet according to the arbitration (pT, dR with the lepton, dphi with the MET, ...)
# we use ak.argmin, which returns the index of the object with the minimum value (ak.argmax also exists)
# once we have that index, we use it to select our candidate jet
# (keepdims=True keeps the array dimensions, so the index can be used to slice the jet collection)
if jet_arbitration == 'pt':
    candidatefj = ak.firsts(candidatefj)
elif jet_arbitration == 'met':
    candidatefj = ak.firsts(candidatefj[ak.argmin(dphi_met_fj,axis=1,keepdims=True)])
elif jet_arbitration == 'lep':
    candidatefj = ak.firsts(candidatefj[ak.argmin(dr_lep_fj,axis=1,keepdims=True)])
else:
    raise RuntimeError("Unknown candidate jet arbitration")

Note that some of these variables will be `None` for many events, e.g. when an event has no high-pT jets (which happens a lot in this dataset) or no selected leptons.

To check that we are doing the correct thing, we can look at a single event by selecting it with brackets (events are always on the first axis of an array), i.e.:

In [5]:
# first we print the fatjets pt (note that this is the whole collection)
fatjets.pt

<Array [[], [], [], [], ... [], [214, 197], []] type='10000 * var * float32[para...'>

In [6]:
# you notice that for many of those events the array is empty
# so we choose event 3767 (which we know has fatjets; to see this you can print the uproot tree above and look at the columns)
evtid = 3767
fatjets.pt[evtid]

<Array [736, 376] type='2 * float32[parameters={"__doc__": "pt"}]'>

In [7]:
# we have two fatjets apparently, now we can see which of those our jet arbitration chose:
candidatefj.pt[evtid]

376.25

In [8]:
# with the `met` arbitration it chose the lower-pT jet (376 GeV), i.e. the one closest in phi to the MET
# out of curiosity, let's print the dR between the lepton and the fatjets and the dphi between the MET and the fatjets
# note that if we have two things to print from one cell, we need to use the print function
print(dphi_met_fj[evtid])
print(dr_lep_fj[evtid])

[2.71, 0.437]
None
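
For reference, the two distances printed above follow the usual collider definitions: the azimuthal difference is wrapped into one turn, and dR combines it in quadrature with the difference in eta. The sketch below is a plain-Python illustration of those formulas; the helper names `delta_phi` and `delta_r` are chosen here to mirror the coffea methods, and this is not coffea's implementation.

```python
import math


def delta_phi(phi1, phi2):
    """Signed azimuthal difference phi1 - phi2, wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane: sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


# Two directions on either side of the phi = +-pi boundary are close to each other
print(abs(delta_phi(3.0, -3.0)))

# A 0.3 step in eta and a 0.4 step in phi
print(delta_r(0.0, 0.0, 0.3, 0.4))
```

The `None` printed for `dr_lep_fj` simply reflects that this event has no selected lepton, so there is nothing to measure the distance to.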


Now, let's play with another collection in the event, the `GenParticles`.
These can be obtained with the `events.GenPart` collection.
First, we need a function that will select given particles according to their particle ID (pdgID), or the flags of the process:

In [9]:
# this function will return us the particle objects that satisfy a low id, high id condition and have certain flags
def getParticles(genparticles,lowid=22,highid=25,flags=['fromHardProcess', 'isLastCopy']):
    absid = abs(genparticles.pdgId)
    return genparticles[
        ((absid >= lowid) & (absid <= highid))
        & genparticles.hasFlags(flags)
    ]
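
To make the two conditions in `getParticles` concrete, here is a plain-Python sketch of the same logic on a toy list of particles. It assumes that the flags are stored as bits of an integer `statusFlags`, and that the bit positions below match the NanoAOD `GenPart_statusFlags` description; both are assumptions to check against your file. The `STATUS_BITS`, `has_flags` and `select` names are hypothetical helpers, not coffea functions.

```python
# Assumed bit positions (to be checked against the GenPart_statusFlags branch documentation)
STATUS_BITS = {"isDirectTauDecayProduct": 4, "fromHardProcess": 8, "isLastCopy": 13}


def has_flags(status_flags, names):
    """True if every named flag bit is set in the integer status_flags."""
    return all((status_flags >> STATUS_BITS[name]) & 1 for name in names)


def select(particles, lowid=22, highid=25, flags=("fromHardProcess", "isLastCopy")):
    """Keep particles whose |pdgId| lies in [lowid, highid] and that carry all flags."""
    return [
        p for p in particles
        if lowid <= abs(p["pdgId"]) <= highid and has_flags(p["statusFlags"], flags)
    ]


toy = [
    {"pdgId": 25, "statusFlags": (1 << 8) | (1 << 13)},   # Higgs, last copy
    {"pdgId": 25, "statusFlags": (1 << 8)},               # Higgs, intermediate copy
    {"pdgId": -15, "statusFlags": (1 << 8) | (1 << 13)},  # anti-tau, last copy
]

higgs_like = select(toy, 25, 25)
print(len(higgs_like))
```

Passing the same value for `lowid` and `highid` selects a single particle species, and `abs` makes the selection cover particles and antiparticles alike.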

In [10]:
# first let's get all the higgs bosons in the event (pdgID=25)
higgs = getParticles(events.GenPart,25)

# make a mask to select all Higgs bosons that decay into taus (pdgID=15) by looking at the children.pdgId
is_htt = ak.all(abs(higgs.children.pdgId)==15,axis=2)

# now let's select our higgs to be all tau-tau decays
higgs = higgs[is_htt]


In [11]:
# now let's look at its children
# we will have two tau-s as children
print(higgs.children.pdgId)

[[[15, -15]], [[15, -15]], [[15, -15]], ... [[15, -15]], [[15, -15]], [[15, -15]]]


In [12]:
# let's look at electrons, muons coming from taus
fromtau_electron = getParticles(events.GenPart,11,11,['isDirectTauDecayProduct'])
fromtau_muon = getParticles(events.GenPart,13,13,['isDirectTauDecayProduct'])

# now we count the number of gen particles 
n_electrons_fromtaus = ak.sum(fromtau_electron.pt>0,axis=1)
n_muons_fromtaus = ak.sum(fromtau_muon.pt>0,axis=1)

We can take a look at what these variables look like for our event:

In [13]:
print('n electrons ',n_electrons_fromtaus[evtid])
print('n muons ',n_muons_fromtaus[evtid])

n electrons  0
n muons  0


In [14]:
# now let's pick the visible gen taus
tau_visible = events.GenVisTau
n_visibletaus = ak.sum(tau_visible.pt>0,axis=1)
print('n visible taus',n_visibletaus[evtid])

n visible taus 2


In [16]:
# we can distinguish the tau-tau decay by:
# 1: had - had: 2 visible taus, 0 electrons 0 muons
# 8: ele - had: 1 visible tau, 1 electron 0 muons
# 10: muon - had: 1 visible tau, 1 muon 0 electrons

htt_flavor = (n_visibletaus==2)*1 + (n_visibletaus==1)*3 + (n_electrons_fromtaus==1)*5 + (n_muons_fromtaus==1)*7

In [17]:
htt_flavor[evtid]

1
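
The integer code is just a sum of weighted conditions, so it can be checked by hand for each decay mode listed in the comments (1, 8 and 10). The sketch below redoes the arithmetic for single events in plain Python; `htt_flavor_code`, `htt_flavor_label` and the `"other"` label are hypothetical names introduced here for illustration.

```python
# Codes taken from the comments in the cell above
FLAVOR_LABELS = {1: "had-had", 8: "ele-had", 10: "muon-had"}


def htt_flavor_code(n_vis_taus, n_electrons, n_muons):
    """Sum of weighted conditions; Python booleans count as 0 or 1."""
    return (
        (n_vis_taus == 2) * 1
        + (n_vis_taus == 1) * 3
        + (n_electrons == 1) * 5
        + (n_muons == 1) * 7
    )


def htt_flavor_label(code):
    """Translate a code into a label; anything unlisted is reported as 'other'."""
    return FLAVOR_LABELS.get(code, "other")


for counts in [(2, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]:
    code = htt_flavor_code(*counts)
    print(counts, code, htt_flavor_label(code))
```

All terms must be combined with `+`: a bitwise `&` binds more loosely than `+`, so mixing the two would AND the whole partial sum with the last term.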

Now, we also need to check that both taus are inside the jet cone:

In [18]:
# since our jet has a cone size of 0.8, we use 0.8 as a dR threshold
matchedH = candidatefj.nearest(higgs, axis=1, threshold=0.8)

dr_fj_visibletaus = candidatefj.delta_r(tau_visible)
dr_fj_electrons = candidatefj.delta_r(fromtau_electron)
dr_fj_muons = candidatefj.delta_r(fromtau_muon)

dr_daughters = ak.concatenate([dr_fj_visibletaus,dr_fj_electrons,dr_fj_muons],axis=1)

Let's see how these arrays look for our event:

In [19]:
print('dr taus ',dr_fj_visibletaus[evtid])
print('dr electrons ',dr_fj_electrons[evtid])
print('dr muons ',dr_fj_muons[evtid])
print('dr daus ',dr_daughters[evtid])
print('number of daus matched ',ak.sum(dr_daughters<0.8,axis=1)[evtid])

dr taus  [0.219, 0.642]
dr electrons  []
dr muons  []
dr daus  [0.219, 0.642]
number of daus matched  2


Finally, let's define a last matching variable: the number of visible daughters that are matched to the jet:

In [21]:
htt_nprongs = ak.sum(dr_daughters<0.8,axis=1)

In [22]:
htt_nprongs[evtid]

2

Now let's define a function you can use in the processor

In [24]:
def match_Htt(genparticles,candidatefj,tau_visible):
    # genparticles: events.GenPart, tau_visible: events.GenVisTau
    # (passed in explicitly so the function does not depend on the global `events`)

    # Higgs bosons whose children are all taus
    higgs = getParticles(genparticles,25)
    is_htt = ak.all(abs(higgs.children.pdgId)==15,axis=2)
    higgs = higgs[is_htt]

    # electrons and muons coming directly from tau decays
    fromtau_electron = getParticles(genparticles,11,11,['isDirectTauDecayProduct'])
    fromtau_muon = getParticles(genparticles,13,13,['isDirectTauDecayProduct'])

    n_visibletaus = ak.sum(tau_visible.pt>0,axis=1)
    n_electrons_fromtaus = ak.sum(fromtau_electron.pt>0,axis=1)
    n_muons_fromtaus = ak.sum(fromtau_muon.pt>0,axis=1)
    # 1 (had-had), 8 (ele-had), 10 (muon-had)
    htt_flavor = (n_visibletaus==2)*1 + (n_visibletaus==1)*3 + (n_electrons_fromtaus==1)*5 + (n_muons_fromtaus==1)*7

    matchedH = candidatefj.nearest(higgs, axis=1, threshold=0.8)
    dr_fj_visibletaus = candidatefj.delta_r(tau_visible)
    dr_fj_electrons = candidatefj.delta_r(fromtau_electron)
    dr_fj_muons = candidatefj.delta_r(fromtau_muon)
    dr_daughters = ak.concatenate([dr_fj_visibletaus,dr_fj_electrons,dr_fj_muons],axis=1)
    # 1 (H only), 4 (H and one visible daughter: tau, electron or muon), 6 (H and two visible daughters)
    htt_matched = (ak.sum(matchedH.pt>0,axis=1)==1)*1 + (ak.sum(dr_daughters<0.8,axis=1)==1)*3 + (ak.sum(dr_daughters<0.8,axis=1)==2)*5

    return htt_flavor,htt_matched