# Generate tables

This notebook converts NanoAOD files to the dataframe format required for the Z invisible width analysis and adds the derived variables. No selection is applied at this stage; convenient, loosely limiting skims are applied to these results later.

The desired configuration is communicated to `zinv-analysis` through YAML config files found under the `reprocessing` directory. These can be edited as needed.
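Before submitting a long job it can help to confirm that every config file is in place. The sketch below is not part of `zinv-analysis`: `missing_configs` is a hypothetical helper, and the six file names are the ones passed to `zinv.modules.analyse` later in this notebook.

```python
from pathlib import Path

# File names as passed to zinv.modules.analyse later in this notebook.
CONFIG_NAMES = [
    "datasets.yaml",
    "module_sequence.yaml",
    "event_selection.yaml",
    "object_selection.yaml",
    "trigger_selection.yaml",
    "hdf_output.yaml",
]


def missing_configs(config_dir):
    """Return the expected config files that do not exist in config_dir.

    Hypothetical helper for illustration, not part of zinv-analysis.
    """
    config_dir = Path(config_dir)
    return [name for name in CONFIG_NAMES if not (config_dir / name).is_file()]


if __name__ == "__main__":
    # "configs" is the directory used for the data configuration in this notebook.
    missing = missing_configs("configs")
    if missing:
        print("Missing config files:", ", ".join(missing))
    else:
        print("All config files found.")
```

The same check can be pointed at `configs/nominal`, the directory used for the MC configuration.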

Import the relevant packages:

In [1]:
import glob
import oyaml as yaml
import numpy as np
import pandas as pd
import dftools

Welcome to JupyROOT 6.18/00


In [2]:
import zinv
help(zinv.modules.analyse)

Help on function analyse in module zinv.modules.analyse:

analyse(dataset_cfg, sequence_cfg, event_selection_cfg, physics_object_cfg, trigger_cfg, hdf_cfg, name='zinv', outdir='output', tempdir='_ccsp_temp', mode='multiprocessing', batch_opts='-q hep.q', ncores=0, nblocks_per_dataset=-1, nblocks_per_process=-1, nfiles_per_dataset=-1, nfiles_per_process=1, blocksize=1000000, cachesize=8, quiet=False, dryrun=False, sample=None)



In [3]:
help(zinv.modules.resume)

Help on function resume in module zinv.modules.resume:

resume(path, batch_opts='-q hep.q', sleep=5, request_resubmission_options=True)



## Run the table generator

Note that the following block is commented out. Although it can be run within this notebook, the results are typically lost if the connection drops or the browser runs into problems. For longer-running blocks it is therefore advisable to run the code in a terminal, in an `ipython` session (where the blocks of code that were run are saved in the session's history for easy access).

In [1]:
#zinv.modules.analyse(
#    "configs/datasets.yaml",
#    "configs/module_sequence.yaml",
#    "configs/event_selection.yaml",
#    "configs/object_selection.yaml",
#    "configs/trigger_selection.yaml",
#    "configs/hdf_output.yaml",
#    outdir="/vols/cms/sdb15/Analysis/ZinvWidth/databases/2019/08_Aug/28_Legacy/Data",
#    tempdir="/vols/cms/sdb15/_ccsp_temp/",
#    mode="sge",
#    batch_opts="-q hep.q -l h_rt=3:0:0 -l h_vmem=24G",
#    #mode="multiprocessing",
#    #ncores=0,
#    nblocks_per_process=4,
#    blocksize=1_000_000,
#    sample="data",
#)

The options provided are:

* `datasets.yaml` - contains the locations of the relevant NanoAOD files, along with important information and naming conventions
* `module_sequence.yaml` - the sequence of modules to run on the NanoAOD files. These modules are defined inside the `zinv-analysis` package, but can also be defined outside it
* `event_selection.yaml` - can be used to define an event selection flag for each event. Currently this has no effect with the modules defined. Event selection is applied elsewhere.
* `object_selection.yaml` - the cuts defining the analysis-level physics objects
* `trigger_selection.yaml` - the triggers to use. Currently this is not used with the modules defined. Trigger selection is applied elsewhere, along with the event selection.
* `hdf_output.yaml` - the event attributes to save into the output dataframe. Each column can only have one value per event.

The other options are hopefully self-explanatory.
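As a rough illustration of how `blocksize` and `nblocks_per_process` might combine, the sketch below estimates the number of processes needed for one dataset. It rests on an assumption taken from the option names, not from the `zinv-analysis` documentation: events are split into blocks of at most `blocksize` events, and each process handles up to `nblocks_per_process` blocks. File boundaries are ignored, and `estimate_processes` is a hypothetical helper.

```python
import math


def estimate_processes(n_events, blocksize=1_000_000, nblocks_per_process=4):
    """Rough number of processes needed for one dataset.

    Assumption (not confirmed by zinv-analysis): events are split into
    blocks of at most `blocksize` events, and each process handles up to
    `nblocks_per_process` blocks. File boundaries are ignored.
    """
    if nblocks_per_process < 1:
        raise ValueError("this sketch only handles nblocks_per_process >= 1")
    n_blocks = math.ceil(n_events / blocksize)
    return math.ceil(n_blocks / nblocks_per_process)


# Hypothetical dataset of 10.5 million events, with the settings used above:
# 11 blocks of up to 1,000,000 events, 4 blocks per process.
print(estimate_processes(10_500_000))  # prints 3
```

Under the same assumption, raising `nblocks_per_process` gives fewer, longer jobs, which has to be weighed against the `h_rt` limit in `batch_opts`.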

If the command above was running and stopped for some reason, it can be resumed (after ensuring that all jobs have been killed) with the following:

In [None]:
#zinv.modules.resume(
#    "/vols/cms/sdb15/_ccsp_temp/tpd_20190828_211305_2ursnd44",
#    batch_opts="-q hep.q -pe hep.pe 2 -l h_rt=3:0:0 -l h_vmem=24G",
#    request_resubmission_options=False,
#)
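The path passed to `resume` is a directory created under `tempdir`. The sketch below shows one way to locate the most recent one. It assumes the naming pattern seen in this notebook, `tpd_<YYYYMMDD>_<HHMMSS>_<random suffix>`, so that sorting by name also sorts by time; `latest_temp_dir` is a hypothetical helper, not part of `zinv-analysis`.

```python
import glob
import os


def latest_temp_dir(tempdir):
    """Return the most recent 'tpd_*' directory under tempdir, or None.

    Assumption: names follow tpd_<YYYYMMDD>_<HHMMSS>_<random suffix>, as in
    the examples in this notebook, so the alphabetically last name is the
    most recent one.
    """
    candidates = [
        path
        for path in glob.glob(os.path.join(tempdir, "tpd_*"))
        if os.path.isdir(path)
    ]
    return max(candidates, key=os.path.basename, default=None)


# Prints None if the directory does not exist or holds no 'tpd_*' entries.
print(latest_temp_dir("/vols/cms/sdb15/_ccsp_temp/"))
```

It is worth checking that the directory found really belongs to the job to be resumed, since data and MC runs share the same `tempdir` here.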

The same is done for MC, using the config files under `configs/nominal`:

In [2]:
#zinv.modules.analyse(
#    "configs/nominal/datasets.yaml",
#    "configs/nominal/module_sequence.yaml",
#    "configs/nominal/event_selection.yaml",
#    "configs/nominal/object_selection.yaml",
#    "configs/nominal/trigger_selection.yaml",
#    "configs/nominal/hdf_output.yaml",
#    outdir="/vols/cms/sdb15/Analysis/ZinvWidth/databases/2019/08_Aug/28_Legacy/MC",
#    tempdir="/vols/cms/sdb15/_ccsp_temp/",
#    mode="sge",
#    batch_opts="-q hep.q -l h_rt=3:0:0 -l h_vmem=24G",
#    #mode="multiprocessing",
#    #ncores=0,
#    nblocks_per_process=4,
#    blocksize=1_000_000,
#    sample="MC",
#)

In [3]:
#zinv.modules.resume(
#    "/vols/cms/sdb15/_ccsp_temp/tpd_20190815_142352_3g0w1x_t",
#    batch_opts="-q hep.q -l h_rt=3:0:0 -l h_vmem=24G",
#)