# Event Reconstruction

In this notebook, we will take some output files from `MLTree` as input, and use its cluster trees -- trees containing lists of topo-clusters -- to produce a `ROOT` file containing an "event tree".

Note that the `MLTree` utility can already produce an event tree, which provides indices to reference the cluster tree. However, this may not be so helpful when combining multiple outputs from `MLTree`. This is because the clusters in each cluster tree are grouped by event (stored sequentially *within* an event number), but the order of events is random and may differ between outputs (for example, when multiple output files are generated for one run, corresponding to different particle species).

Our event tree will be a literal regrouping of the cluster tree(s) data, so that each entry in our event tree will provide a set of all clusters from that event. This will allow for quick reading and event analysis.
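
As a rough illustration of this regrouping, here is a minimal pure-Python sketch. The `cluster_files` records and the `regroup_by_event` helper are hypothetical stand-ins for the real cluster trees, not part of `MLTree` or this notebook's code.

```python
from collections import defaultdict

# Hypothetical stand-ins for per-file cluster trees: one dict per topo-cluster.
cluster_files = {
    "piplus": [
        {"runNumber": 1, "eventNumber": 7, "clusterE": 12.5},
        {"runNumber": 1, "eventNumber": 3, "clusterE": 4.0},
    ],
    "pi0": [
        {"runNumber": 1, "eventNumber": 3, "clusterE": 1.5},
        {"runNumber": 1, "eventNumber": 7, "clusterE": 8.0},
        {"runNumber": 1, "eventNumber": 7, "clusterE": 2.0},
    ],
}

def regroup_by_event(files):
    """Collect clusters from all files under a (runNumber, eventNumber) key."""
    events = defaultdict(list)
    for label, clusters in files.items():
        for cluster in clusters:
            key = (cluster["runNumber"], cluster["eventNumber"])
            # Remember which input file each cluster came from.
            events[key].append({"source": label, **cluster})
    return dict(events)

events = regroup_by_event(cluster_files)
for key in sorted(events):
    print(key, "->", len(events[key]), "clusters")
```

Each key of `events` plays the role of one entry in the event tree, holding every cluster of that event regardless of which input file it came from.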

#### 1) Setup

First, let's import a bunch of packages we know we'll need right off the bat.

Note that as we've set up our environment with `conda`, our `ROOT` installation has all the bells and whistles. This includes the `pythia8` library and its associated `ROOT` wrapper, `TPythia8`. We can optionally use this for jet-clustering, as it comes with `fj-core`.
Alternatively, we could use the Pythonic interface for `fastjet` or [pyjet](https://github.com/scikit-hep/pyjet), but the latter requires linking an external fastjet build for speed, and this doesn't seem to work when following their documentation.

In [1]:
import numpy as np
import ROOT as rt
import sys

Welcome to JupyROOT 6.22/02


In [2]:
# some extra setup
path_prefix = '/workspace/LCStudies/'

#### 2) Fetching the data

Now we get our data. For now, our classifiers are being trained to distinguish between $\pi^+$ and $\pi^0$. Assuming that all charged pions behave the same way, we can really treat this as a $\pi^\pm$ vs. $\pi^0$ classifier. **For our toy workflow, we'll say that we only want to cluster $\pi^\pm$ topo-clusters into jets.** We will treat $\pi^0$ as a background.

For our input data, we have `ROOT` files containing a tree called `ClusterTree`. In each tree, each entry corresponds with one topo-cluster, and the different files correspond with different topo-cluster parent particles (e.g. $3$ files for $\pi^+$,$\pi^-$ and $\pi^0$). Each topo-cluster entry contains information on the event from which it came ("runNumber" and "eventNumber"), and many topo-clusters (within and across files) share the same event. Our ultimate goal is to regroup this data into one file where each entry corresponds with one *event*. This is a sensible way to arrange the data before performing any jet clustering (which is performed by event), and writing to a file will allow us to skip this whole process after doing it once.

In [3]:
#TODO: Some of this meta-data is unused.
# ----- Meta-data for our dataset -----
layers = ["EMB1", "EMB2", "EMB3", "TileBar0", "TileBar1", "TileBar2"]
nlayers = len(layers)
cell_size_phi = [0.098, 0.0245, 0.0245, 0.1, 0.1, 0.1]
cell_size_eta = [0.0031, 0.025, 0.05, 0.1, 0.1, 0.2]
len_phi = [4, 16, 16, 4, 4, 4]
len_eta = [128, 16, 8, 4, 4, 2]
assert(len(len_phi) == nlayers)
assert(len(len_eta) == nlayers)
meta_data = {
    layers[i]:{
        'cell_size':(cell_size_eta[i],cell_size_phi[i]),
        'dimensions':(len_eta[i],len_phi[i])
    }
    for i in range(nlayers)
}

#### 3) Saving signal flags to clusters

We will add a signal flag for each cluster -- what we ultimately consider "signal" and "background" depends on the task at hand, and we might want some more complicated labeling (e.g. particle ID).
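
A minimal sketch of how a file key could be turned into a signal flag, assuming the same dictionary layout as `sig_definition` in the cell below. The `signal_flag` helper is hypothetical. Unlike the notebook's cell, which assigns 0 to anything not listed as signal, it raises on unknown keys so that a misnamed file is noticed.

```python
# Same layout as the notebook's sig_definition; the helper name is hypothetical.
sig_definition = {"signal": ["piminus", "piplus"], "background": ["pi0"]}

def signal_flag(file_key, definition):
    """Return 1 for signal files, 0 for background files, and raise otherwise."""
    if file_key in definition["signal"]:
        return 1
    if file_key in definition["background"]:
        return 0
    raise KeyError(f"{file_key!r} is neither signal nor background")

flags = {key: signal_flag(key, sig_definition) for key in ("piplus", "piminus", "pi0")}
print(flags)
```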

In [4]:
import glob
data_dir = '/workspace/LCStudies/data'
data_files = glob.glob(data_dir + '/*.root')
data_files = {x.split('/')[-1].replace('.root',''):x for x in data_files}
tree_name = 'ClusterTree'

sig_definition = {'signal':['piminus','piplus'],'background':['pi0']} # defining signal and background

if(path_prefix not in sys.path): sys.path.append(path_prefix)
from util import qol_util as qu
from pathlib import Path
jet_data_dir = path_prefix + 'jets/data'
Path(jet_data_dir).mkdir(parents=True, exist_ok=True)

# Get our original data files.
files = {key:rt.TFile(file,'READ') for key, file in data_files.items()}
trees = {key:file.Get(tree_name) for key, file in files.items()}

# Now we want to effectively add some new columns. We accomplish this with "friend" trees.
# We're not actually making these trees friends yet. Instead we will form TChains and friend those.

# Creating our branch buffer.
data = {
    'signal':np.zeros(1,dtype=np.dtype('i2')),
    'file_index'   :np.zeros(1,dtype=np.dtype('i2')) # TODO: Is this needed any longer?
}

friend_tree_name = tree_name + '_friend'
friend_data_files = {}

file_index = 0
stride = 2000

for key in sorted(trees.keys()):
    
    friend_filename = data_files[key].split('/')[-1]
    friend_filename = jet_data_dir + '/' + friend_filename
    friend_file = rt.TFile(friend_filename,'RECREATE')
    friend_data_files[key] = friend_filename
    
    friend_tree = rt.TTree(friend_tree_name,friend_tree_name)
    branches = {}

    # --- Setup the branches using our buffer. This is a rather general/flexible code block. ---
    for bname, val in data.items():
        descriptor = bname
        bshape = val.shape
        if(bshape != (1,)):
            for i in range(len(bshape)):
                descriptor += '[' + str(bshape[i]) + ']'
        descriptor += '/'
        if(val.dtype == np.dtype('i2')): descriptor += 'S'
        elif(val.dtype == np.dtype('i4')): descriptor += 'I'
        elif(val.dtype == np.dtype('i8')): descriptor += 'L'
        elif(val.dtype == np.dtype('f4')): descriptor += 'F'
        elif(val.dtype == np.dtype('f8')): descriptor += 'D'
        else:
            print('Warning, setup issue for branch: ', bname, '. Skipping.')
            continue
        branches[bname] = friend_tree.Branch(bname,val,descriptor) # key by branch name, not by file key
    # --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
    
    # Now we fill the friend tree.
    nentries = trees[key].GetEntries()

    # Signal flag and file index will be constant per file since input trees are divided by particle identity.
    sig = 0
    if(key in sig_definition['signal']): sig = 1
    
    prefix = 'Filling friend tree for ' + key + ':'
    if(len(prefix) < 32): prefix = prefix + ' ' * (32 - len(prefix))
    
    for i in range(nentries):
        data['signal'][0] = sig
        data['file_index'][0] = file_index
        friend_tree.Fill()
    
    friend_tree.Write()
    friend_file.Close()
    file_index += 1
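
The descriptor-building block in the cell above can be isolated as a small function. This is a sketch only: the `leaflist` helper and `LEAF_CODES` table are hypothetical names, and type codes are passed as short strings (`'i2'`, `'f4'`, ...) rather than NumPy dtypes to keep the example dependency-free.

```python
# Map short dtype strings to ROOT leaflist type codes (S, I, L, F, D).
LEAF_CODES = {"i2": "S", "i4": "I", "i8": "L", "f4": "F", "f8": "D"}

def leaflist(name, shape, type_code):
    """Build a leaflist string such as 'EMB1[128][4]/F' from a name, shape and type."""
    if type_code not in LEAF_CODES:
        raise ValueError(f"Unsupported type code: {type_code!r}")
    # Scalars are stored in buffers of shape (1,) and get no dimension suffix.
    dims = "" if shape == (1,) else "".join(f"[{n}]" for n in shape)
    return f"{name}{dims}/{LEAF_CODES[type_code]}"

print(leaflist("signal", (1,), "i2"))
print(leaflist("EMB1", (128, 4), "f4"))
```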

#### 4) Merging topo-cluster data, writing to a file and creating an eventNumber index

Our friend trees containing the signal flag and file index are now saved to disk. Now let's make `TChain`s and make them friends, so that we've effectively tacked new columns onto our original data. We will then save these `TChain`s as `TTree`s to an uncompressed file, and use the `TTreeIndex` functionality to sort our events. We do this conversion and saving because it appears to greatly speed up our reading when using the `TTreeIndex`. I assume this has something to do with the entries -- from both the main chain and its friend -- all being saved in the same file as opposed to being scattered across multiple ones. ([See here](https://root-forum.cern.ch/t/usage-of-tchainindex/19074/4) for a discussion of `TChainIndex` versus `TTreeIndex`.)
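
Conceptually, making two trees friends amounts to lining up two column sets row by row. The sketch below mimics that with plain dictionaries of lists. The `add_friend` helper and the sample columns are hypothetical and do not reproduce `ROOT`'s actual implementation.

```python
# Hypothetical column stores: each key is a branch name, each list has one value per cluster.
main_columns = {"eventNumber": [7, 3, 3], "clusterE": [12.5, 4.0, 1.5]}
friend_columns = {"signal": [1, 1, 0], "file_index": [1, 1, 0]}

def add_friend(main, friend):
    """Return a combined view; rows line up by position, so lengths must match."""
    lengths = {len(values) for values in list(main.values()) + list(friend.values())}
    if len(lengths) != 1:
        raise ValueError("Main and friend columns must all have the same number of entries.")
    overlap = set(main) & set(friend)
    if overlap:
        raise ValueError(f"Duplicate column names: {sorted(overlap)}")
    return {**main, **friend}

combined = add_friend(main_columns, friend_columns)
row = {name: values[2] for name, values in combined.items()}
print(row)
```

The length check here corresponds to the `assert` on `GetEntries()` in the cell below: if the entry counts differ, rows no longer correspond to the same cluster.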


In [5]:
chain = rt.TChain(tree_name)
friend_chain = rt.TChain(friend_tree_name)
for key in data_files.keys(): 
    chain.AddFile(data_files[key],-1)
    friend_chain.AddFile(friend_data_files[key],-1)
    
chain_filename = jet_data_dir + '/' + 'clusters.root'
chain_file = rt.TFile(chain_filename,'RECREATE','',0) # uncompressed file
clone = chain.CloneTree(-1,'FAST')
friend_clone = friend_chain.CloneTree(-1,'FAST')
clone.Write()
friend_clone.Write()
chain_file.Close()

chain_file = rt.TFile(chain_filename,'READ')
chain = chain_file.Get(tree_name)
friend_chain = chain_file.Get(friend_tree_name)
assert(chain.GetEntries() == friend_chain.GetEntries()) # number of entries must match, otherwise something has gone very wrong
nentries = chain.GetEntries()
chain.AddFriend(friend_chain)
print('The chains are now friends.')

The chains are now friends.


#### 5) Preparing to create an event TTree

Let's make a function to quickly get the maximum length of a vector branch in our tree. We only have one vector branch for now but there may be others with a future dataset.

This will let us turn the branch from one of type vector to one of type array. It's useful if we're adding an extra dimension, as I'm currently having issues with making branches like vectors of arrays on the Python side of things.

In [6]:
def GetMaxVectorLength(chain, branchname):
    draw_string = branchname + '@.size()'
    chain.Draw(draw_string)
    h = rt.gPad.GetPrimitive('htemp') # TTree::Draw stores its result in a histogram named 'htemp' by default
    max_length = int(h.GetXaxis().GetBinCenter(h.FindLastBinAbove(0)))
    return max_length

Now we will create our `TTreeIndex`. We will use the branch `eventNumber` as our `majornumber`, so that the index effectively sorts our tree by `eventNumber` and we have our events grouped together. Since there are many duplicate `eventNumber` entries, and we do not use any `minornumber`, this is *not* a unique index. But this is fine, because we do not care about the sorting within any single `eventNumber` value. It just means that we *cannot* access every single entry with a call to `TTreeIndex::GetEntryNumberWithIndex()`, but rather we'll have to loop through the elements of `TTreeIndex::GetIndex()` sequentially.

**TODO:** As a consequence of our indexing, we do not protect against the possibility of two clusters having the same `eventNumber` but different `runNumber`s. We should add this at some point to avoid the possibility of mixing clusters from what are really separate events.
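
To make the idea of a non-unique index concrete, here is a pure-Python analogue: a list of entry numbers ordered by a sort key. It is only a sketch of the concept, not of how `TTreeIndex` works internally. It also uses `(runNumber, eventNumber)` as the key, which is one possible way to address the TODO above. The sample values and the `build_index` helper are hypothetical.

```python
# Hypothetical per-cluster values, in the order the clusters appear in the input.
run_numbers   = [1, 1, 2, 1, 2]
event_numbers = [7, 3, 7, 7, 3]

def build_index(runs, events):
    """Return entry numbers ordered by (run, event); ties keep their input order."""
    return sorted(range(len(events)), key=lambda i: (runs[i], events[i]))

index = build_index(run_numbers, event_numbers)
print(index)  # -> [1, 0, 3, 4, 2]

# Walking the index visits all clusters of one (run, event) pair consecutively.
for entry in index:
    print(entry, (run_numbers[entry], event_numbers[entry]))
```

Entries 0 and 3 share `(1, 7)` and sit next to each other in the index, while entry 2 with `(2, 7)` is kept apart even though its `eventNumber` is the same.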

In [7]:
# setting minornumber to 0 effectively gets rid of it
chain.SetBranchStatus('*',0)
chain.SetBranchStatus('eventNumber',1)
chain_idx = rt.TTreeIndex(chain,'eventNumber','0') # TODO: consider changing to majornumber=runNumber, minorNumber=eventNumber. In this particular case runNumber is always the same.
n_idx = chain_idx.GetN()
assert(n_idx == chain.GetEntriesFast()) # ensure our TTreeIndex is of the right length, otherwise something is wrong
chain_indices = chain_idx.GetIndex() # a C++-style array of (ROOT) type Long64_t...
chain_indices = np.array([chain_indices[i] for i in range(n_idx)],dtype=np.dtype('i8')) #... now a numpy array

Let's determine the maximum number of topo-clusters per event. We can set a naïve upper bound with `nfiles * max(nCluster)`, but the actual upper bound is probably lower than this.
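
The relationship between the naïve bound and the true maximum can be checked on a toy example. The per-file event numbers below are made up for illustration.

```python
from collections import Counter

# Hypothetical eventNumber values, one per cluster, for each input file.
per_file_events = {
    "piplus": [7, 3, 7],
    "pi0": [3, 7, 5],
}

# Naive bound: number of files times the largest per-file cluster count for any event.
per_file_max = max(max(Counter(ev).values()) for ev in per_file_events.values())
naive_bound = len(per_file_events) * per_file_max

# True maximum: count clusters per event after combining all files.
combined = Counter()
for ev in per_file_events.values():
    combined.update(ev)
n_clusters_max = max(combined.values())

print("naive bound:", naive_bound, "| actual max:", n_clusters_max)
```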

In [8]:
chain.SetBranchStatus('*',0)
chain.SetBranchStatus('eventNumber',1)
chain.SetBranchStatus('nCluster',1)

n_clusters_max = 0
max_tmp = 0

chain.GetEntry(0)
eN_prev = chain.eventNumber

nentries = chain.GetEntries()
stride = max(1, int(nentries/100)) # guard against a zero stride for very small inputs
l = int(nentries/stride)
bar_length = 120
prefix = 'Finding max nClusters'
qu.printProgressBar(0, l, prefix=prefix, suffix='Complete', length=bar_length)
for i in range(nentries):
    idx = chain_indices[i]
    chain.GetEntry(idx)
    if(chain.eventNumber != eN_prev):
        if(max_tmp > n_clusters_max): n_clusters_max = max_tmp
        max_tmp = 0
    max_tmp += 1
    eN_prev = chain.eventNumber
    if(i%stride == 0): qu.printProgressBar(int(i/stride), l, prefix=prefix, suffix='Complete', length=bar_length)
    
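# The final event never triggers the comparison inside the loop, so check it here.
if(max_tmp > n_clusters_max): n_clusters_max = max_tmp
print('max(nClusters) per event =',n_clusters_max)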

Finding max nClusters |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0% Complete
max(nClusters) per event = 34


Now we want to copy our data to a new `ROOT` file, where each entry corresponds with an **event**. Our `chain_indices` lets us loop through the existing data in a sensible way.

Certain variables are *per-event* variables, such as `runNumber`. These will remain as scalars. Other variables are *per-cluster* variables, such as `clusterE` (scalar) or `EMB1` (2D array). These will become *arrays* of whatever their previous type was. The branch `nCluster` will keep track of their length for each event. Note that this means we will be rewriting the contents of the `nCluster` branch, rather than copying it over -- it currently only keeps track of the number of clusters per event per file, not the total number of clusters per event.
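
The fixed-size buffer plus length counter can be sketched in plain Python as follows. The `pack_event` and `unpack_event` helpers are hypothetical, and the zero padding is an assumption made for illustration.

```python
def pack_event(cluster_energies, n_clusters_max):
    """Place per-cluster values in a fixed-length buffer and return (nCluster, buffer)."""
    n_cluster = len(cluster_energies)
    if n_cluster > n_clusters_max:
        raise ValueError("Event has more clusters than the buffer can hold.")
    padded = list(cluster_energies) + [0.0] * (n_clusters_max - n_cluster)
    return n_cluster, padded

def unpack_event(n_cluster, padded):
    """Recover the real per-cluster values using the stored length."""
    return padded[:n_cluster]

n_cluster, buffer = pack_event([12.5, 8.0, 2.0], n_clusters_max=5)
print(n_cluster, buffer)
print(unpack_event(n_cluster, buffer))
```

Only the first `nCluster` slots of the buffer carry real data, which is why the length branch has to be recomputed per event rather than copied from the input.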

As one last note, we will have to be a little careful about looping through our `TChain` of input trees for the sake of speed. We're using a `TTreeIndex` that we built, but this will amount to hopping around a lot (reading entries in a very non-sequential order w.r.t. the chain/trees). [This can slow things down with a lot of file I/O](https://root-forum.cern.ch/t/ttree-getentry-with-a-ttreeindex-is-too-slow/17370/5).
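
One rough way to gauge how non-sequential an index is would be to count how often consecutive reads are not adjacent entries. The `count_jumps` helper below is a hypothetical diagnostic, not something the notebook or `ROOT` provides.

```python
def count_jumps(indices):
    """Count reads that do not directly follow the previously read entry."""
    return sum(1 for prev, cur in zip(indices, indices[1:]) if cur != prev + 1)

sequential = [0, 1, 2, 3, 4]
by_event = [1, 0, 3, 4, 2]  # hypothetical order produced by an event-sorted index

print(count_jumps(sequential))  # -> 0
print(count_jumps(by_event))    # -> 3
```

A jump count close to the number of entries would suggest that nearly every read lands in a different part of the file.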

In [9]:
# Converting from ROOT type names to leaflist decorators.
# Vector decorator will not work, but gives a sensible string
# telling us the depth (how many vectors).
def RTypeConversion(type_string):
    if(type_string == 'Short_t' or type_string == 'short'):    return 'S'
    elif(type_string == 'Int_t' or type_string == 'int'):    return 'I'
    elif(type_string == 'Long64_t'):  return 'L' # matches the 'L' case handled by RType2NType below
    elif(type_string == 'Float_t' or type_string == 'float'):  return 'F'
    elif(type_string == 'Double_t' or type_string == 'double'): return 'D'
    elif('vector' in type_string): # special case
#         type_substring = '<'.join(type_string.split('<')[1:])
#         type_substring = '>'.join(type_substring.split('>')[:-1])
#         type_substring = RTypeConversion(type_substring)
#         return 'v_' + type_substring
        return type_string
    else: return '?'

def GetShape(shape_string):
    dims = shape_string.replace('[',' ').replace(']', ' ').split()
    return tuple([int(x) for x in dims])

def RType2NType(type_string):
    if(type_string == 'S'):   return np.dtype('i2')
    elif(type_string == 'I'): return np.dtype('i4')
    elif(type_string == 'L'): return np.dtype('i8')
    elif(type_string == 'F'): return np.dtype('f4')
    elif(type_string == 'D'): return np.dtype('f8')
    else: raise ValueError('Input not understood.')

In [10]:
# Explicitly turn on all branches.
chain.SetBranchStatus('*',1)
friend_chain.SetBranchStatus('*',1)

# n_clusters_max = int(len(data_files.keys()) * chain.GetMaximum('nCluster')) # safe upper limit, but might be unnecessarily high

# Building our branch buffer for our new trees. This time we'll add leaflists to the buffer as well.
# Slightly hacky but this should be pretty flexible for any basic-type branches.

branch_info = [x.GetListOfLeaves()[0] for x in chain.GetListOfBranches()]
branch_info = [(x.GetTitle(),x.GetTypeName()) for x in branch_info]
branch_names = [x[0].split('[')[0] for x in branch_info]

friend_branch_info = [x.GetListOfLeaves()[0] for x in friend_chain.GetListOfBranches()]
friend_branch_info = [(x.GetTitle(),x.GetTypeName()) for x in friend_branch_info]
friend_branch_names = [x[0].split('[')[0] for x in friend_branch_info]

branch_names = branch_names + friend_branch_names
branch_info = branch_info + friend_branch_info

# Now let's consider removing some branches that we don't think we'll need for our event dataset.
# This can potentially speed things up a lot. Especially true for branches of type std::vector at the moment,
# as I am probably not handling them in the smartest way.
branch_names_remove = ['cluster_cellE_norm']

# In some cases we might want to change a branch's type when copying it over. One should be careful with this,
# but an obvious case is nCluster -- which we are explicitly overwriting anyway, so we know its new type.
rtypes_forced = {'nCluster':'S'}

perEvent = ['runNumber','eventNumber','nCluster','file_index'] # keep track of which branches only need one entry per event
vector_branches = [] # keep track of any branches that are of (C++) type std::vector

# We must also keep track of the original shapes of any array branches that we read, as they will be read out as 1D cppyy arrays
# and will need to be reshaped before being placed in our buffer.
input_shapes = {}

branch_buffer = {}
for entry in branch_info: 
    name = entry[0]
    shape = (1,)
    shape_string = ''
    if('[' in name): 
        shape_string = '[' + '['.join(name.split('[')[1:])
        shape = GetShape(shape_string)
    name = name.split('[')[0]
    
    rtype = RTypeConversion(entry[1])    
    if(name in rtypes_forced.keys()): rtype = rtypes_forced[name]
    
    if(name in branch_names_remove): continue
    
    # save the original shapes of non-scalar branches
    if(shape != (1,)): input_shapes[name] = shape # save the original shape

    if(name not in perEvent):
        if(shape == (1,)): 
            shape = (n_clusters_max,)
            shape_string = '[' + 'nCluster' + ']'
        else:
            shape = tuple([n_clusters_max] + list(shape))
            shape_string = '[' + 'nCluster' + ']' + shape_string
      
    if('vector' in rtype):
        continue
        # We will make the vector into an array. Assuming vector is of some basic type! (not vector of vectors, etc.)
        rsubtype = '<'.join(rtype.split('<')[1:])
        rsubtype = '>'.join(rsubtype.split('>')[:-1])
        rtype = RTypeConversion(rsubtype)
        n_max = GetMaxVectorLength(chain,name)
        input_shapes[name]=(n_max,)
        shape = tuple(list(shape) + [n_max])
        shape_string += '[' + str(n_max) + ']'
        vector_branches.append(name)
        #branch_buffer[name] = [rt.vector(rsubtype)(),0]
        #TODO: Add a branch for the vector length
        
    branch_buffer[name] = [np.zeros(shape,dtype=RType2NType(rtype)),name + shape_string + '/' + rtype]   

#### 6) Creating event TTree

Now we're really ready to make our event TTree -- each entry will correspond with a full event, and hold all of its topo-clusters.
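
The write loop below follows a "flush on change" pattern: accumulate clusters while the event number stays the same, write the event when it changes, and write once more at the very end. A minimal pure-Python sketch of that pattern, with a hypothetical `iter_events` generator and made-up event numbers:

```python
def iter_events(event_numbers, index):
    """Yield (eventNumber, [entry numbers]) for each run of equal event numbers."""
    current, members = None, []
    for entry in index:
        number = event_numbers[entry]
        if members and number != current:
            # The event number changed: emit the finished event and start a new one.
            yield current, members
            members = []
        current = number
        members.append(entry)
    if members:
        # Final flush: the last event is never followed by a change.
        yield current, members

event_numbers = [7, 3, 3, 7, 5]
index = sorted(range(len(event_numbers)), key=lambda i: event_numbers[i])
events = list(iter_events(event_numbers, index))
print(events)  # -> [(3, [1, 2]), (5, [4]), (7, [0, 3])]
```

Forgetting the final flush silently drops the last event, which is why the loop in this section calls `Fill()` once more on the last entry.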

In [11]:
# Determining how many events to write. By default we want them all, but for debugging we might only want some subset.
nevents = -1
nentries = chain_indices.shape[0]
if(nevents > 0):
    # determine how many entries we need to get this many events
    chain.SetBranchStatus('*',0)
    chain.SetBranchStatus('eventNumber',1)
    nevents_tally = 0
    chain.GetEntry(chain_indices[0])
    eN_prev = chain.eventNumber
    for i in range(nentries):
        chain.GetEntry(chain_indices[i])
        if(chain.eventNumber != eN_prev): nevents_tally += 1
        eN_prev = chain.eventNumber
        if(nevents_tally == nevents):
            nentries = i # entries 0..i-1 belong to the requested events; entry i starts the next one
            break
            
if(nevents <= 0): nevents = 'all'
report_string = 'Preparing to write {nev} events, corresponding to {nen} input topo-clusters.'.format(nev=nevents,nen=nentries)
print(report_string)

Preparing to write 5000 events, corresponding to 21929 input topo-clusters.


In [12]:
# Activate the branches we need, keep any ignored ones deactivated
chain.SetBranchStatus('*',1)
for name in branch_names_remove: chain.SetBranchStatus(name,0)
    
# Strategies for speeding things up more. 
strategy = 3

if(strategy == 1 or strategy == 3):
    filesize = chain_file.GetSize() # in bytes
    chain.SetMaxVirtualSize(int(1.5 * filesize))
    for name in branch_names:
        if(name not in branch_names_remove): chain.GetBranch(name).LoadBaskets()
    print('Employing strategy 1 (load full input tree into memory).')

if(strategy == 2 or strategy == 3):
    # Increasing basket_size will help a little. But because entry access is somewhat random,
    # with each event's clusters generally spread out into multiple small groups of adjacent entries,
    # performance will quickly plateau and big increases may not give any real gains.
    basket_size_multiplier = 3
    basket_size = 16000 * basket_size_multiplier
    for name in branch_names:
        if(name in branch_names_remove): continue
        if(name in friend_branch_names): friend_chain.SetBasketSize(name,basket_size)
        else: chain.SetBasketSize(name,basket_size)
    print('Employing strategy 2 (increasing basket size for input tree).')

Employing strategy 1 (load full input tree into memory).
Employing strategy 2 (increasing basket size for input tree).


In [13]:
#TODO: Deal with read/write bugging out at 99% when trying to write the full dataset. Works fine for subsets.

import time
from pathlib import Path
Path(jet_data_dir).mkdir(parents=True, exist_ok=True)

# Make the new TFile and TTree.
event_filename = path_prefix + 'jets/data' + '/' + 'events.root'
event_treename = 'eventTree'
event_file = rt.TFile(event_filename,'RECREATE','',0) # TODO: Making this uncompressed for debugging purposes
event_tree = rt.TTree(event_treename,event_treename)
#event_tree.SetDirectory(0) # tree will start in memory!

# Set up the branches. Note that we must add branches specifying lengths *before*
# any branches whose lengths they specify. For now, that means nCluster must go first.
# TODO: Make this less hacky.
name = 'nCluster'
assert(name in branch_buffer.keys())
buffer   = branch_buffer[name][0]
leaflist = branch_buffer[name][1]
event_tree.Branch(name,buffer,leaflist)

for key, value in branch_buffer.items():
    if(key == 'nCluster'): continue
    name = key
    buffer = value[0]
    leaflist = value[-1]
    if(leaflist == 0): event_tree.Branch(name,value[0])
    else: event_tree.Branch(name,buffer,leaflist)

chain.GetEntry(chain_indices[0])
eN_prev = chain.eventNumber
cluster_idx = 0

stride = max(1, int(nentries/100)) # guard against a zero stride when writing a small subset
l = int(nentries/stride)
bar_length = 120
prefix = 'Writing event tree:'
qu.printProgressBar(0, l, prefix=prefix, suffix='Complete', length=bar_length)

start = time.time()
# dt = np.zeros(5)
# Now loop through the input chain and write to our new tree.
for i in range(nentries):

#     t0 = time.time()
    chain.GetEntry(chain_indices[i])
    eN = chain.eventNumber
#     dt[0] += time.time() - t0
    
    if(eN != eN_prev):
        # We've finished an event and have just entered a new one.
        # Write everything that's in the buffer. Corresponds to the previous event.
#         t0 = time.time()
        event_tree.Fill()
        cluster_idx = 0 # reset cluster_idx
#         dt[1] += time.time() - t0
    
    # Fill info from the current event.
    for name in branch_buffer.keys():
        shape = branch_buffer[name][0].shape
        
        # Our per-event branches
        if(shape == (1,)):
#             t0 = time.time()
            if(name == 'nCluster'): branch_buffer[name][0][0] = cluster_idx + 1 # will reach max value before write
            else: branch_buffer[name][0][0] = getattr(chain,name) # a more C-like way of getting branches
#             dt[2] += time.time() - t0
        # Our per-cluster branches
        else:
            ndim = branch_buffer[name][0].ndim
            if(ndim == 1): 
#                 t0 = time.time()
                branch_buffer[name][0][cluster_idx] = getattr(chain,name) # per-cluster scalar
#                 dt[3] += time.time() - t0
            else:
                # Multi-dim branches.
#                 t0 = time.time()
#                 if(name in vector_branches): # Our one vector branch seems to be a culprit for slowdowns.
#                     n = len(getattr(chain,name))
#                     branch_buffer[name][0][cluster_idx,:n] = np.array(getattr(chain,name))[:]
#                     continue
                branch_buffer[name][0][cluster_idx,:] = np.array(getattr(chain,name)).reshape(input_shapes[name])[:] # per-cluster array
#                 dt[4] += time.time() - t0
    
    cluster_idx += 1
    eN_prev = eN
    if(i%stride == 0): qu.printProgressBarColor(int(i/stride), l, prefix=prefix, suffix='Complete', length=bar_length)
    if(i == nentries-1): # Make sure to call a Fill() if we're at the end of the chain
        event_tree.Fill()
        #qu.printProgressBar(l, l, prefix=prefix, suffix='Complete', length=bar_length)
    
event_file.cd()
event_tree.Write('',rt.TObject.kOverwrite)
event_file.Close()
end = time.time()
dt_tot = end - start
rate = (nentries / dt_tot)
print('{val:.1f} seconds. (input data rate = {rate:.1f} Hz)'.format(val=dt_tot,rate=rate))


The above method is still a little slow; most of the slowdown comes from reading entries from our combined `ClusterTree` in a non-sequential fashion.

#### 7) Cleanup

Now we clean up some of the intermediate files we created.
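
A `pathlib`-based variant of this cleanup might look like the sketch below. The `remove_intermediates` helper and the file names are hypothetical, and the example runs in a temporary directory so that no real data is touched.

```python
import tempfile
from pathlib import Path

def remove_intermediates(directory, keep):
    """Delete every *.root file in `directory` whose name is not in `keep`."""
    removed = []
    for path in sorted(Path(directory).glob("*.root")):
        if path.name in keep:
            continue
        path.unlink()
        removed.append(path.name)
    return removed

with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files standing in for the real outputs.
    for name in ("events.root", "clusters.root", "pi0.root"):
        (Path(tmp) / name).touch()
    removed = remove_intermediates(tmp, keep={"events.root"})
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print("removed:", removed)
print("remaining:", remaining)
```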

In [14]:
cleanup = True
import glob, os
rfiles = glob.glob(jet_data_dir + '/*.root')
if(cleanup):
    for rfile in rfiles:
        if(rfile == event_filename): continue
        print('Deleting',rfile,'.')
        os.remove(rfile)

Deleting /workspace/LCStudies/jets/data/piplus.root .
Deleting /workspace/LCStudies/jets/data/pi0.root .
Deleting /workspace/LCStudies/jets/data/piminus.root .
Deleting /workspace/LCStudies/jets/data/clusters.root .
