# Anatomy of the zinv-analysis package

This notebook demonstrates what the zinv-analysis package does, as a starting point if you need to alter modules in the zinv-analysis sequence or add new ones.

Start off by using the standard sequence defined under `reprocessing/`.

In [1]:
import glob
import oyaml as yaml
import numpy as np
import pandas as pd
import dftools
import zinv

Welcome to JupyROOT 6.18/00


In [8]:
zinv.modules.analyse(
    "reprocessing/datasets.yaml",
    "reprocessing/module_sequence.yaml",
    "reprocessing/event_selection.yaml",
    "reprocessing/object_selection.yaml",
    "reprocessing/trigger_selection.yaml",
    "reprocessing/hdf_output.yaml",
    outdir="test_output",
    tempdir="_ccsp_temp",
    #mode="sge",
    #batch_opts="-q hep.q -l h_rt=3:0:0 -l h_vmem=24G",
    mode="multiprocessing",
    ncores=0,
    nblocks_per_process=1,
    nblocks_per_dataset=2,
    blocksize=1_000,
    sample="DYJetsToLL_Pt-250To400",
)

HBox(children=(IntProgress(value=0, max=376), HTML(value='')))

File already exists /vols/build/cms/sdb15/phd/ZinvWidth/zinv-notebooks/notebooks/0_generate_tables/test_output/result.h5


[CustomReaderComposite([<zinv.modules.readers.EventTools.EventTools object at 0x7fb2aa681198>, <zinv.modules.readers.CollectionCreator.CollectionCreator object at 0x7fb2aa681c50>, <zinv.modules.readers.GenBosonProducer.GenBosonProducer object at 0x7fb2aa681240>, <zinv.modules.readers.LHEPartAssigner.LHEPartAssigner object at 0x7fb2aa681278>, <zinv.modules.readers.LHEPartAssigner.GenPartAssigner object at 0x7fb2aa681470>, <zinv.modules.readers.JecVariations.JecVariations object at 0x7fb2aa6817f0>, <zinv.modules.readers.ObjectFunctions.ObjectFunctions object at 0x7fb2aa87f438>, <zinv.modules.readers.SkimCollections.SkimCollections object at 0x7fb2aa87f390>, <zinv.modules.readers.ObjectCrossCleaning.ObjectCrossCleaning object at 0x7fb2aa882e48>, <zinv.modules.readers.ObjectCrossCleaning.ObjectCrossCleaning object at 0x7fb2aa882dd8>, <zinv.modules.readers.EventFunctions.EventFunctions object at 0x7fb2aa882ba8>, <zinv.modules.readers.CertifiedLumiChecker.CertifiedLumiChecker object at 0x7fb

It may seem to take a long time to run over the 2,000 events here (two blocks of 1,000 events). However, scaling this up to 1 million events would not take significantly longer. The run time does not scale linearly with the number of events because the code exploits the higher performance of numpy arrays held in large contiguous memory blocks. Take care not to ask for too many events per block, though, or you will run out of memory. On Imperial's batch system you can change the requested memory with the `-l h_vmem=XXG` option, as in the commented-out `batch_opts` above, up to a maximum of 24 GB, although not all cores will be available. For standard analysis running, I suggest keeping the 24G option and the number of events per block at about O(100,000).

The resources required differ between data, MC and the various MC systematic variations. Typically, data is the least demanding, MC is the most demanding on memory, and MC systematic variations are the most demanding on run time and may hit the batch wall time.
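To illustrate why larger blocks are cheap per event, here is a minimal sketch that counts values above a threshold, once with a per-event Python loop and once on the whole block with numpy. It is not zinv code: the function names, the toy exponential spectrum and the 200.0 threshold are all assumptions made for the illustration, and the timings will vary by machine.

```python
import time
import numpy as np

def select_loop(pt, threshold):
    """Count values above threshold one event at a time (pure-Python loop)."""
    count = 0
    for value in pt:
        if value > threshold:
            count += 1
    return count

def select_vectorised(pt, threshold):
    """Same count, computed on the whole contiguous block at once."""
    return int(np.count_nonzero(pt > threshold))

rng = np.random.default_rng(seed=0)
for n_events in (1_000, 100_000):
    # Toy spectrum standing in for one block of a per-event variable
    pt = rng.exponential(scale=100.0, size=n_events).astype(np.float32)

    t0 = time.perf_counter()
    n_loop = select_loop(pt, 200.0)
    t1 = time.perf_counter()
    n_vec = select_vectorised(pt, 200.0)
    t2 = time.perf_counter()

    assert n_loop == n_vec  # both approaches agree
    print(f"{n_events:>7} events: loop {t1 - t0:.4f} s, "
          f"vectorised {t2 - t1:.6f} s, array size {pt.nbytes / 1e3:.0f} kB")
```

The loop time grows roughly in proportion to the number of events, while the vectorised call is dominated by a fixed overhead until the block becomes large. The printed array size also shows the other side of the trade-off: memory grows with the block size, for every array held at once.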

The progress bar that loops over event variables comes from the hdf_output module, which runs the predefined cached functions and stores the results to a file. The other parts of the code really only set up the functions to be used here. Without this module nothing would really happen.
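The pattern described above (earlier modules only define functions, the output module evaluates them) can be sketched as follows. This is a simplified illustration under stated assumptions, not the zinv implementation: `Event`, `cached`, `DefineVariables`, `Output` and `MET_shifted` are hypothetical names, and only the three-argument call convention `(event, source, nsigma)` is taken from this notebook.

```python
import numpy as np

class Event:
    """Hypothetical stand-in for the event object: one block of events."""
    def __init__(self, **arrays):
        self.__dict__.update(arrays)
        self._cache = {}

n_calls = {"MET_shifted": 0}  # counts real evaluations, to show the caching

def cached(func):
    """Evaluate func at most once per (name, source, nsigma) for an event block."""
    def wrapper(event, source, nsigma):
        key = (func.__name__, source, nsigma)
        if key not in event._cache:
            event._cache[key] = func(event, source, nsigma)
        return event._cache[key]
    return wrapper

class DefineVariables:
    """Setup module: attaches a function to the event, computes nothing."""
    def event(self, event):
        @cached
        def MET_shifted(ev, source, nsigma):
            n_calls["MET_shifted"] += 1
            # Hypothetical 10% uncertainty applied for the "scale" variation
            shift = 0.1 * nsigma if source == "scale" else 0.0
            return ev.MET_pt * (1.0 + shift)
        event.MET_shifted = MET_shifted

class Output:
    """Output module: the only place where the functions are evaluated."""
    def __init__(self, variables):
        self.variables = variables
        self.results = {}

    def event(self, event):
        for name in self.variables:
            self.results[name] = getattr(event, name)(event, '', 0.)

event = Event(MET_pt=np.array([100.0, 250.0, 30.0]))
sequence = [DefineVariables(), Output(["MET_shifted"])]
for module in sequence:
    module.event(event)

nominal = sequence[-1].results["MET_shifted"]
again = event.MET_shifted(event, '', 0.)  # served from the cache
print(nominal, n_calls)
```

If the `Output` module is dropped from `sequence`, `n_calls` stays at zero: the function is defined but never run, which is the sense in which nothing happens without the output step.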

Now let's add a custom module to the sequence.

In [23]:
with open("reprocessing/module_sequence.yaml", 'r') as f:
    # safe_load avoids the missing-Loader warning/error of a bare yaml.load(f)
    standard_sequence = yaml.safe_load(f)

class Test(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    
    def begin(self, event):
        pass
    
    def event(self, event):
        # Object in nanoAOD
        print("event.Jet_pt: {}".format(event.Jet_pt))

        # Derived objects are typically defined as functions that take three
        # arguments: the event, the systematic variation name ('' = nominal),
        # and nsigma
        print("event.METnoX_pt: {}".format(event.METnoX_pt(event, '', 0.)))
    
# Add the module just before the end (before the hdf_output module)
standard_sequence["sequence"].insert(-1, {
    "name": "test",  # give it some name
    # The class is loaded dynamically, so this must be its import path.
    # Classes defined in this notebook are found inside __main__.
    "module": "__main__.Test",
    "args": dict(),  # passed as arguments to __main__.Test.__init__
})
        
with open("test_sequence.yaml", "w") as f:
    yaml.dump(standard_sequence, f)
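Two details of the cell above are worth spelling out: `list.insert(-1, x)` places `x` immediately before the last element, and a dotted string such as `"__main__.Test"` has to be resolved to a class at run time. How zinv does this internally is not shown in this notebook; the sketch below uses `importlib`, one common way to do it. `load_class` and `build_modules` are hypothetical helpers, and standard-library classes stand in for real modules so that the example runs anywhere.

```python
import importlib

def load_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object."""
    module_name, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

def build_modules(sequence):
    """Instantiate every entry, passing its 'args' as keyword arguments."""
    return [load_class(cfg["module"])(**cfg.get("args", {})) for cfg in sequence]

# Stand-in sequence; the last entry plays the role of the output module
sequence = [
    {"name": "first", "module": "collections.OrderedDict", "args": {}},
    {"name": "output", "module": "collections.Counter", "args": {}},
]

# insert(-1, ...) puts the new entry just before the last one
sequence.insert(-1, {
    "name": "test",
    "module": "types.SimpleNamespace",  # stores its kwargs as attributes
    "args": {"threshold": 40.0},        # hypothetical argument
})

modules = build_modules(sequence)
print([cfg["name"] for cfg in sequence])
print(modules[1])
```

`types.SimpleNamespace` is used here because, like the `Test` class above, it simply stores the keyword arguments it receives as attributes.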

In [24]:
zinv.modules.analyse(
    "reprocessing/datasets.yaml",
    "test_sequence.yaml",
    "reprocessing/event_selection.yaml",
    "reprocessing/object_selection.yaml",
    "reprocessing/trigger_selection.yaml",
    "reprocessing/hdf_output.yaml",
    outdir="test_output",
    tempdir="_ccsp_temp",
    #mode="sge",
    #batch_opts="-q hep.q -l h_rt=3:0:0 -l h_vmem=24G",
    mode="multiprocessing",
    ncores=0,
    nblocks_per_process=1,
    nblocks_per_dataset=2,
    blocksize=1_000,
    sample="DYJetsToLL_Pt-250To400",
)


event.Jet_pt: [[218.86497 182.56685 94.12358 ... 34.892673 31.408762 15.594383] [244.2655 185.72528 95.68754 19.167973] [731.8863 582.10284 183.61334 ... 20.338747 20.77747 17.474726] ... [276.82138 225.51639 89.757774 ... 19.132902 18.777334 17.227882] [431.0058 81.593155 63.719055 ... 15.11278 15.719304 33.90681] [271.86877 218.17874 69.882454 55.793247 11.893495 16.074953]]
event.METnoX_pt: [264.94498   205.63593   128.68419   226.3038     31.866251  196.57057
 279.38022   283.93756   213.74533   289.02292   371.57718   400.531
 161.2286    130.14455   189.71674   318.4553    307.37164   322.21777
 243.9563    231.41388   243.88512   208.62251   289.55582   332.25586
 227.76152   175.69148   273.74628   300.79028   204.5453    296.1787
 244.45958   335.96527    70.01221   270.77496    64.9858     95.50695
 180.65292   273.63882   332.87158   217.47166   232.77406   257.24643
 365.8115    203.72266   221.05083   140.85063   295.3788    277.99054
 383.46133   293.1614    356.13058   2

HBox(children=(IntProgress(value=0, max=376), HTML(value='')))

File already exists /vols/build/cms/sdb15/phd/ZinvWidth/zinv-notebooks/notebooks/0_generate_tables/test_output/result.h5
event.Jet_pt: [[440.68213 180.12769 151.54207 ... 25.5282 19.956903 13.789705] [299.1675 217.22906 44.20339 ... 31.512815 0.0 23.069016] [179.16168 150.65381 112.485054 ... 14.794558 14.765489 15.369316] ... [290.39185 229.12956 108.214165 78.26603 19.800709] [303.35217 219.66817 164.78125 ... 32.67132 18.388023 25.609327] [310.8727 70.245895 26.708385 21.150078 15.43932]]
event.METnoX_pt: [224.61565   220.18597   180.34123   291.48947   297.46927   240.56845
  87.2958    225.0933    235.90097    97.11346   162.98474    12.885859
 173.40443   249.4964     56.291397  283.288     366.409      31.558643
 266.83435   202.64032   284.63144    12.653613  330.78433   272.79752
 217.45175   258.24573   328.20612   247.15295   393.11603   349.82324
  54.594387   46.543613   18.167099  300.84354   258.94855   400.21796
 270.60947    26.327492   41.511745  285.00894    39.84386

[CustomReaderComposite([<zinv.modules.readers.EventTools.EventTools object at 0x7fb27fcb3b38>, <zinv.modules.readers.CollectionCreator.CollectionCreator object at 0x7fb27fcb3160>, <zinv.modules.readers.GenBosonProducer.GenBosonProducer object at 0x7fb27fcb3588>, <zinv.modules.readers.LHEPartAssigner.LHEPartAssigner object at 0x7fb27fcb37f0>, <zinv.modules.readers.LHEPartAssigner.GenPartAssigner object at 0x7fb27fcb34a8>, <zinv.modules.readers.JecVariations.JecVariations object at 0x7fb27fcb3438>, <zinv.modules.readers.ObjectFunctions.ObjectFunctions object at 0x7fb27fcb3198>, <zinv.modules.readers.SkimCollections.SkimCollections object at 0x7fb27fcb3550>, <zinv.modules.readers.ObjectCrossCleaning.ObjectCrossCleaning object at 0x7fb27fcb3d68>, <zinv.modules.readers.ObjectCrossCleaning.ObjectCrossCleaning object at 0x7fb27fcb3c18>, <zinv.modules.readers.EventFunctions.EventFunctions object at 0x7fb2806682e8>, <zinv.modules.readers.CertifiedLumiChecker.CertifiedLumiChecker object at 0x7fb