# Generate tables

This notebook converts NanoAOD files to the dataframe format for the $Z+\mathrm{jets}/\gamma+\mathrm{jets}$ analysis. No selection is applied here.

The desired configuration is passed to the `zinv-analysis` repository through YAML config files found under the `reprocessing` directory. These can be edited as needed.

Import the relevant packages:

In [1]:
import glob
import oyaml as yaml
import numpy as np
import pandas as pd
import dftools

Welcome to JupyROOT 6.18/00


In [2]:
import zinv
help(zinv.modules.analyse)

Help on function analyse in module zinv.modules.analyse:

analyse(dataset_cfg, sequence_cfg, event_selection_cfg, physics_object_cfg, trigger_cfg, hdf_cfg, name='zinv', outdir='output', tempdir='_ccsp_temp', mode='multiprocessing', batch_opts='-q hep.q', ncores=0, nblocks_per_dataset=-1, nblocks_per_process=-1, nfiles_per_dataset=-1, nfiles_per_process=1, blocksize=1000000, cachesize=8, quiet=False, dryrun=False, sample=None)



In [3]:
help(zinv.modules.resume)

Help on function resume in module zinv.modules.resume:

resume(path, batch_opts='-q hep.q', sleep=5, request_resubmission_options=True)



In [13]:
with open("configs/datasets.yaml", "r") as f:
    datasets = yaml.safe_load(f)  # yaml.load without a Loader fails on recent PyYAML
print(datasets["default"])
print(datasets["datasets"][0])

{'energy': 13000, 'lumi': 35860}
{'name': 'SingleMuon_Run2016B_v1', 'parent': 'SingleMuon', 'isdata': True, 'nevents': 2789243, 'sumweights': 2789243.0, 'files': ['root://xrootd-cms.infn.it///store/data/Run2016B_ver1/SingleMuon/NANOAOD/Nano14Dec2018_ver1-v1/90000/4074F613-50E6-5545-8347-CCF58D02E64C.root', 'root://xrootd-cms.infn.it///store/data/Run2016B_ver1/SingleMuon/NANOAOD/Nano14Dec2018_ver1-v1/90000/87B8F064-C966-FD4F-BD32-E1FCB470AC7B.root', 'root://xrootd-cms.infn.it///store/data/Run2016B_ver1/SingleMuon/NANOAOD/Nano14Dec2018_ver1-v1/90000/BD130B6E-3ACD-0743-9F08-D0D6BDCD372D.root'], 'file_nevents': [1514174, 142544, 1132525], 'DAS': '/SingleMuon/Run2016B_ver1-Nano14Dec2018_ver1-v1/NANOAOD', 'tree': 'Events', 'xsection': None}
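
A quick consistency check on a dataset entry can catch a stale config early. The sketch below assumes that `file_nevents` holds one count per entry in `files` and that the counts should add up to `nevents`, which holds for the entry printed above. `check_dataset` is a hypothetical helper, not part of `zinv-analysis`, and the file names are placeholders standing in for the full xrootd paths.

```python
def check_dataset(dataset):
    """Return True if the per-file event counts are consistent with the totals."""
    same_length = len(dataset["files"]) == len(dataset["file_nevents"])
    same_total = sum(dataset["file_nevents"]) == dataset["nevents"]
    return same_length and same_total


# Abbreviated copy of the entry printed above; file names are placeholders.
example = {
    "name": "SingleMuon_Run2016B_v1",
    "nevents": 2789243,
    "files": ["file_a.root", "file_b.root", "file_c.root"],
    "file_nevents": [1514174, 142544, 1132525],
}
print(check_dataset(example))
```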


In [22]:
with open("configs/module_sequence.yaml", "r") as f:
    sequence = yaml.safe_load(f)  # yaml.load without a Loader fails on recent PyYAML
sequence["sequence"][5]

{'name': 'jec_variations',
 'module': 'zinv.modules.readers.JecVariations',
 'args': {'jes_unc_file': 'http://www.hep.ph.ic.ac.uk/~sdb15/Analysis/ZinvWidth/data/jecs/legacy/Summer16_07Aug2017_V11_MC_UncertaintySources_AK4PFchs.csv',
  'jer_sf_file': 'http://www.hep.ph.ic.ac.uk/~sdb15/Analysis/ZinvWidth/data//jecs/legacy/Summer16_25nsV1_MC_SF_AK4PFchs.csv',
  'jer_file': 'http://www.hep.ph.ic.ac.uk/~sdb15/Analysis/ZinvWidth/data/jecs/legacy/Summer16_25nsV1_MC_PtResolution_AK4PFchs.csv',
  'apply_jer_corrections': True,
  'jes_regex': 'jes(?P<source>.*)',
  'unclust_threshold': 15.0,
  'maxdr_jets_with_genjets': 0.2,
  'ndpt_jets_with_genjets': 3.0,
  'data': False}}
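
The `jes_regex` argument uses a named group to pull the uncertainty source out of a variation name. How `JecVariations` applies the pattern internally is not shown here, so the sketch below simply assumes a match anchored at the start of the string. The names `jesTotal` and `jerSF` are hypothetical and used for illustration only.

```python
import re

# Pattern copied from the `jes_regex` argument above.
jes_regex = re.compile(r"jes(?P<source>.*)")


def jes_source(variation_name):
    """Return the captured source for a variation name, or None if it does not match."""
    match = jes_regex.match(variation_name)
    return match.group("source") if match else None


# Hypothetical variation names: the first matches, the second does not.
print(jes_source("jesTotal"))
print(jes_source("jerSF"))
```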

In [23]:
with open("configs/event_selection.yaml", "r") as f:
    event_selection = yaml.safe_load(f)  # yaml.load without a Loader fails on recent PyYAML
event_selection

{'selections': {}, 'grouped_selections': {}, 'cutflows': {}}

In [25]:
with open("configs/object_selection.yaml", "r") as f:
    object_selection = yaml.safe_load(f)  # yaml.load without a Loader fails on recent PyYAML
object_selection["MuonSelection"]

{'original': 'Muon',
 'selections': ['ev, source, nsig: ev.Muon_ptShift(ev, source, nsig)>30.',
  'ev, source, nsig: np.abs(ev.Muon.eta)<2.4',
  'ev, source, nsig: np.abs(ev.Muon.pfRelIso04_all)<0.15',
  'ev, source, nsig: ev.Muon.tightId>=1']}
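
Each selection string reads like the argument list and body of a Python lambda. As a minimal sketch, assuming that the strings are turned into callables by prefixing `lambda` and evaluating them (the actual mechanism inside `zinv-analysis` may differ), the cuts can be combined into a per-muon mask. The `ev` object below is a toy stand-in with invented values, and the `Muon_ptShift` cut is left out because it needs a module from the real sequence. Only evaluate strings from config files you trust.

```python
import numpy as np
from types import SimpleNamespace


def compile_selection(text):
    """Turn 'ev, source, nsig: <expr>' into a callable (assumed convention)."""
    return eval("lambda " + text, {"np": np})


# Three of the four cuts printed above.
selections = [
    "ev, source, nsig: np.abs(ev.Muon.eta)<2.4",
    "ev, source, nsig: np.abs(ev.Muon.pfRelIso04_all)<0.15",
    "ev, source, nsig: ev.Muon.tightId>=1",
]

# Toy stand-in for an event block: three muons stored as flat arrays.
ev = SimpleNamespace(Muon=SimpleNamespace(
    eta=np.array([0.5, -2.6, 1.1]),
    pfRelIso04_all=np.array([0.05, 0.10, 0.30]),
    tightId=np.array([1, 1, 1]),
))

# A muon passes only if it passes every cut.
mask = np.ones(3, dtype=bool)
for text in selections:
    mask &= compile_selection(text)(ev, "", 0.0)
print(mask)
```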

In [26]:
with open("configs/trigger_selection.yaml", "r") as f:
    trigger = yaml.safe_load(f)  # yaml.load without a Loader fails on recent PyYAML
trigger

{'Data': {}, 'MC': {}}

In [36]:
with open("configs/hdf_output.yaml", "r") as f:
    hdf_output = yaml.safe_load(f)  # yaml.load without a Loader fails on recent PyYAML
print(hdf_output["attributes"]["Both"]['METnoX_pt'])
print(hdf_output["dtypes"]["METnoX_pt"])

ev, source, nsig: ev.METnoX_pt(ev, source, nsig)
float64
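
To illustrate how an attribute expression and its dtype could fit together, here is a rough sketch that evaluates the expression on a toy event block and casts the resulting column with pandas. It assumes the same `lambda`-style convention as the config strings. The `ev` object and its values are invented for illustration, and the real output code in `zinv-analysis` may work differently.

```python
import numpy as np
import pandas as pd
from types import SimpleNamespace

# Same shape as the config excerpt above: attribute expressions and dtypes.
attributes = {"METnoX_pt": "ev, source, nsig: ev.METnoX_pt(ev, source, nsig)"}
dtypes = {"METnoX_pt": "float64"}

# Toy event block: METnoX_pt is a callable returning one value per event.
ev = SimpleNamespace(
    METnoX_pt=lambda ev, source, nsig: np.array([210, 305, 180], dtype=np.float32),
)

# Evaluate every attribute, then cast each column to its configured dtype.
columns = {
    name: eval("lambda " + text)(ev, "", 0.0)
    for name, text in attributes.items()
}
df = pd.DataFrame(columns).astype(dtypes)
print(df.dtypes["METnoX_pt"])
```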


## Run the table generator

Note that the following block is commented out. Although it can be run within this notebook, the results are typically lost if the connection drops or the browser runs into any issue. For longer-running blocks it is therefore advisable to run them in a terminal `ipython` session, where the code blocks that have been run are saved in the session's history for easy access.

In [4]:
#zinv.modules.analyse(
#    "configs/datasets.yaml",
#    "configs/module_sequence.yaml",
#    "configs/event_selection.yaml",
#    "configs/object_selection.yaml",
#    "configs/trigger_selection.yaml",
#    "configs/hdf_output.yaml",
#    outdir="/vols/cms/sdb15/Analysis/ZinvWidth/databases/2019/08_Aug/28_Legacy/Data",
#    tempdir="/vols/cms/sdb15/_ccsp_temp/",
#    mode="sge",
#    batch_opts="-q hep.q -l h_rt=3:0:0 -l h_vmem=24G",
#    #mode="multiprocessing",
#    #ncores=0,
#    nblocks_per_process=4,
#    blocksize=1_000_000,
#    sample="data",
#)

The options provided are:

* `datasets.yaml` - records where the relevant NanoAOD files are located, along with important information and naming conventions
* `module_sequence.yaml` - the sequence of modules to run on the NanoAOD files. These modules are defined inside the `zinv-analysis` package, but can also be defined outside it
* `event_selection.yaml` - can be used to attach an event selection flag to each event. Currently this does nothing with the modules defined. Event selection is applied elsewhere.
* `object_selection.yaml` - the cuts defining the analysis-level physics objects
* `trigger_selection.yaml` - the triggers to use. Currently this is not used with the modules defined. Trigger selection is applied elsewhere, along with the event selection.
* `hdf_output.yaml` - the event attributes to save into the output dataframe. Each column can only have one value per event.

The other options are hopefully self-explanatory.
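
Since the data and MC calls differ only in `outdir` and `sample`, the arguments can be assembled in one place. The sketch below is a hypothetical helper (`build_analyse_call` is not part of `zinv-analysis`); the keyword names come from the `help` output above and the values from the commented-out call. The actual call is left commented out, so nothing is submitted.

```python
# Config files in the positional order expected by zinv.modules.analyse.
CONFIG_NAMES = [
    "datasets", "module_sequence", "event_selection",
    "object_selection", "trigger_selection", "hdf_output",
]


def build_analyse_call(sample, outdir, tempdir, local=False, config_dir="configs"):
    """Return (args, kwargs) for zinv.modules.analyse (hypothetical helper)."""
    args = [f"{config_dir}/{name}.yaml" for name in CONFIG_NAMES]
    kwargs = {
        "outdir": outdir,
        "tempdir": tempdir,
        "nblocks_per_process": 4,
        "blocksize": 1_000_000,
        "sample": sample,
    }
    if local:
        # Local run, using the values from the commented-out alternative.
        kwargs.update(mode="multiprocessing", ncores=0)
    else:
        # Batch submission, using the options from the call above.
        kwargs.update(mode="sge", batch_opts="-q hep.q -l h_rt=3:0:0 -l h_vmem=24G")
    return args, kwargs


args, kwargs = build_analyse_call("data", outdir="output", tempdir="_ccsp_temp")
print(args)
print(kwargs["mode"])
# zinv.modules.analyse(*args, **kwargs)  # uncomment to actually run
```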

If the command above was running and stopped for some reason, it can be resumed (after ensuring that all jobs are killed) with the following:

In [5]:
#zinv.modules.resume(
#    "/vols/cms/sdb15/_ccsp_temp/tpd_20190828_211305_2ursnd44",
#    batch_opts="-q hep.q -pe hep.pe 2 -l h_rt=3:0:0 -l h_vmem=24G",
#    request_resubmission_options=False,
#)
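
When several temporary directories have accumulated, it helps to pick out the most recent one to resume. The sketch below assumes the directory names follow the layout `tpd_YYYYMMDD_HHMMSS_<suffix>`, inferred only from the names that appear in this notebook. Both helpers are hypothetical.

```python
from datetime import datetime


def submission_time(dirname):
    """Parse the timestamp from a name like tpd_YYYYMMDD_HHMMSS_<suffix> (assumed layout)."""
    _, date, time, _ = dirname.split("_", 3)
    return datetime.strptime(date + time, "%Y%m%d%H%M%S")


def latest_tempdir(dirnames):
    """Return the directory name with the most recent timestamp."""
    return max(dirnames, key=submission_time)


# The two temporary directory names that appear in this notebook.
names = ["tpd_20190815_142352_3g0w1x_t", "tpd_20190828_211305_2ursnd44"]
print(latest_tempdir(names))
```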

The same is done for MC:

In [6]:
#zinv.modules.analyse(
#    "configs/datasets.yaml",
#    "configs/module_sequence.yaml",
#    "configs/event_selection.yaml",
#    "configs/object_selection.yaml",
#    "configs/trigger_selection.yaml",
#    "configs/hdf_output.yaml",
#    outdir="/vols/cms/sdb15/Analysis/ZinvWidth/databases/2019/08_Aug/28_Legacy/MC",
#    tempdir="/vols/cms/sdb15/_ccsp_temp/",
#    mode="sge",
#    batch_opts="-q hep.q -l h_rt=3:0:0 -l h_vmem=24G",
#    #mode="multiprocessing",
#    #ncores=0,
#    nblocks_per_process=4,
#    blocksize=1_000_000,
#    sample="MC",
#)

In [7]:
#zinv.modules.resume(
#    "/vols/cms/sdb15/_ccsp_temp/tpd_20190815_142352_3g0w1x_t",
#    batch_opts="-q hep.q -l h_rt=3:0:0 -l h_vmem=24G",
#)

## TODO

Not all aspects of the $\mathrm{Z}/\gamma$ analysis are implemented yet. The TODO list looks something like this:

* Cross-check the object definitions in `configs/object_selection.yaml`
    * Add a module to calculate the relevant high pt ID isolation variable and apply this to the high pt muons as needed
* Cross-check variables saved in `configs/hdf_output.yaml`
* Include necessary variables inside the JEC and lepton scale variation HDF output yaml configs
* Add a module to find the relevant gen photon and store this in the output
* Determine NNLO QCD+nNLO EW correction for $\gamma+\mathrm{jets}$
    * Need to apply the dynamic photon isolation requirement on possible gen photons