# Skim the processed NanoAOD results

Here the processed NanoAOD results are skimmed to remove unnecessary events; no other processing is performed. Skimming avoids repeatedly processing events that the analysis never uses.

The `zdb-analysis` repository is used here mainly for its skimming module, which regenerates the dataframe rows corresponding to events.

In [1]:
import zdb
import glob
import os
import oyaml as yaml

In [2]:
help(zdb.modules.skim)

Help on function skim in module zdb.modules.skim:

skim(config, mode='multiprocessing', ncores=0, nfiles=-1, batch_opts='', output=None, chunksize=250000)



## Functions

In [3]:
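def generate_config(outpath, selection, tables, filepaths):
    """Write a skim config: AND-ed selection, table names and matching input files."""
    cfg = {
        # Wrap each cut in parentheses and AND them together
        "selection": "(" + ") & (".join(selection) + ")",
        "tables": tables,
        # Expand the glob pattern into a sorted list of input files
        "files": sorted(glob.glob(filepaths)),
    }
    with open(outpath, "w") as f:
        yaml.dump(cfg, f, indent=4)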
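
To make the `selection` entry concrete, the sketch below reproduces only the string-joining step of `generate_config`. The helper name `join_selection` and the shortened cut list are illustrative and not part of `zdb`.

```python
# Illustrative only: mirrors the string-joining step used in generate_config.
def join_selection(cuts):
    """Combine a list of cut expressions into a single AND-ed expression."""
    return "(" + ") & (".join(cuts) + ")"

# A shortened cut list taken from the data selection
example_cuts = ["nJetSelection>0", "LeadJetSelection_pt>200.", "nBJetVeto==0"]
expression = join_selection(example_cuts)
print(expression)
# (nJetSelection>0) & (LeadJetSelection_pt>200.) & (nBJetVeto==0)
```

Wrapping every cut in its own parentheses keeps each comparison intact when the cuts are combined with `&`. Note that an empty list produces the string `()`, so the selection list should contain at least one cut.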

## Configs

Create the YAML config files from within this notebook. Each selection is the logical AND of the listed cuts.

In [4]:
!mkdir -p skims/

In [8]:
# Data
generate_config(
    "skims/data.yaml",
    ["IsCertified", "Flag_goodVertices", "Flag_globalSuperTightHalo2016Filter", "Flag_HBHENoiseFilter", "Flag_HBHENoiseIsoFilter", "Flag_EcalDeadCellTriggerPrimitiveFilter", "Flag_BadPFMuonFilter", "Flag_eeBadScFilter", "MET_dCaloMET<0.6", "nJetSelection>0", "nJetSelection==nJetVeto", "LeadJetSelection_chHEF>0.1", "LeadJetSelection_neHEF<0.8", "LeadJetSelection_pt>200.", "nPhotonVeto==0", "nBJetVeto==0"],
    ["Events"],
    "/vols/cms/sdb15/Analysis/ZinvWidth/databases/full/2020/02_Feb/10_SingleTable_FixObjectWeights/Data/*.h5",
)

# MC
generate_config(
    "skims/mc.yaml",
    ["(parent!='EWKV2Jets' | nGenBosonSelection==1)", "Flag_goodVertices", "Flag_globalSuperTightHalo2016Filter", "Flag_HBHENoiseFilter", "Flag_HBHENoiseIsoFilter", "Flag_EcalDeadCellTriggerPrimitiveFilter", "Flag_BadPFMuonFilter", "MET_dCaloMET<0.6", "nJetSelection>0", "nJetSelection==nJetVeto", "LeadJetSelection_chHEF>0.1", "LeadJetSelection_neHEF<0.8", "LeadJetSelection_pt>200."],
    ["Events"],
    "/vols/cms/sdb15/Analysis/ZinvWidth/databases/full/2020/02_Feb/10_SingleTable_FixObjectWeights/MC/*.h5",
)

# MC jec
generate_config(
    "skims/mc_jec.yaml",
    ["(parent!='EWKV2Jets' | nGenBosonSelection==1)", "Flag_goodVertices", "Flag_globalSuperTightHalo2016Filter", "Flag_HBHENoiseFilter", "Flag_HBHENoiseIsoFilter", "Flag_EcalDeadCellTriggerPrimitiveFilter", "Flag_BadPFMuonFilter", "MET_dCaloMET<0.6", "nJetSelection>0", "nJetSelection==nJetVeto", "LeadJetSelection_chHEF>0.1", "LeadJetSelection_neHEF<0.8", "LeadJetSelection_pt>200."],
    ["Events_jesTotalup", "Events_jesTotaldown", "Events_jerSFup", "Events_jerSFdown", "Events_unclustup", "Events_unclustdown"],
    "/vols/cms/sdb15/Analysis/ZinvWidth/databases/full/2020/02_Feb/10_SingleTable_FixObjectWeights/MC_JEC/*.h5",
)

# MC lepscales
generate_config(
    "skims/mc_lep.yaml",
    ["(parent!='EWKV2Jets' | nGenBosonSelection==1)", "Flag_goodVertices", "Flag_globalSuperTightHalo2016Filter", "Flag_HBHENoiseFilter", "Flag_HBHENoiseIsoFilter", "Flag_EcalDeadCellTriggerPrimitiveFilter", "Flag_BadPFMuonFilter", "MET_dCaloMET<0.6", "nJetSelection>0", "nJetSelection==nJetVeto", "LeadJetSelection_chHEF>0.1", "LeadJetSelection_neHEF<0.8", "LeadJetSelection_pt>200."],
    ["Events_eleEnergyScaleup", "Events_eleEnergyScaledown", "Events_muonPtScaleup", "Events_muonPtScaledown", "Events_photonEnergyScaleup", "Events_photonEnergyScaledown", "Events_tauPtScaleup", "Events_tauPtScaledown"],
    "/vols/cms/sdb15/Analysis/ZinvWidth/databases/full/2020/02_Feb/10_SingleTable_FixObjectWeights/MC_LEP/result_*.h5",
)
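
Since `glob.glob` returns an empty list when a pattern matches nothing, `generate_config` would write a config with no input files without any warning. A minimal check, assuming you want a hard failure in that case, might look like the sketch below. The helper `matched_files` is hypothetical, and the demonstration uses a temporary directory rather than the real database paths.

```python
import glob
import os
import tempfile

def matched_files(pattern):
    """Return the sorted paths matching a glob pattern; raise if there are none."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files match pattern: {pattern}")
    return paths

# Demonstration on a throwaway directory holding two .h5 files and one other file
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("result_00001.h5", "result_00000.h5", "notes.txt"):
        open(os.path.join(tmpdir, name), "w").close()
    found = [os.path.basename(p) for p in matched_files(os.path.join(tmpdir, "*.h5"))]

print(found)
```

Calling such a check on each file pattern before generating the configs would catch a mistyped directory or file prefix early.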

In [6]:
skim_dir = "/vols/cms/sdb15/Analysis/ZinvWidth/databases/skims/2020/02_Feb/10_SingleTable_FixObjectWeights/"
os.makedirs(skim_dir, exist_ok=True)

As with the table generation code, run this outside the notebook to avoid losing the job to a dropped connection or a browser crash. For convenience, the `multi_skim` function runs multiple config files under a single master process.

In [7]:
#zdb.modules.multi_skim(
#    ["skims/data.yaml", "skims/mc.yaml", "skims/mc_jec.yaml", "skims/mc_lep.yaml"],
#    outputs=[
#        os.path.join(skim_dir, "data/result_{:05d}.h5"),
#        os.path.join(skim_dir, "mc/result_{:05d}.h5"),
#        os.path.join(skim_dir, "mc_jec/result_{:05d}.h5"),
#        os.path.join(skim_dir, "mc_lep/result_{:05d}.h5"),
#    ],
#    mode='sge',
#    ncores=100,
#    batch_opts="-q hep.q -l h_rt=3:0:0 -l h_vmem=12G",
#    chunksize=250_000,
#)
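
The output paths above contain a `{:05d}` placeholder. The sketch below shows what such a template expands to, assuming (this is not confirmed by the `zdb` help text shown earlier) that `multi_skim` fills it with an integer index via `str.format`. The directory name `skims_out` is a placeholder.

```python
import os

# Placeholder output directory; the notebook uses skim_dir instead
example_dir = "skims_out"
template = os.path.join(example_dir, "data", "result_{:05d}.h5")

# Assumption: the placeholder is filled with a running integer index
paths = [template.format(i) for i in range(3)]
names = [os.path.basename(p) for p in paths]
print(names)
# ['result_00000.h5', 'result_00001.h5', 'result_00002.h5']
```

The zero-padded index keeps the output files in numerical order under a plain alphabetical sort, which matches the `sorted(glob.glob(...))` pattern used when building the configs.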