This repository has been archived by the owner on Nov 23, 2021. It is now read-only.

A ROOT::RDataFrame-friendly implementation of jet and MET variations for the CMS experiment NanoAOD files

pieterdavid/CMSJMECalculators

CMSJMECalculators - moved to cp3-cms/CMSJMECalculators

This package provides an efficient ROOT::RDataFrame-friendly implementation of the recipes for jet and MET variations for the CMS experiment, for use with samples in the NanoAOD format. The code was adopted from the bamboo analysis framework.

NOTE: This is a preview to gather feedback (please open an issue with yours), without any guarantees of stability (including in naming) for now

Update: This package was moved to cp3-cms/CMSJMECalculators (on the CERN gitlab instance); development continues there.

Installation

To use these helpers from python, the recommended solution is to install the package (in a virtual or conda environment) with:

pip install git+https://github.com/pieterdavid/CMSJMECalculators.git

scikit-build is used to compile the C++ components against the available ROOT distribution.

Inside a CMSSW environment, the install_cmssw.sh script can be used:

wget -q https://raw.githubusercontent.com/pieterdavid/CMSJMECalculators/main/install_cmssw.sh
source ./install_cmssw.sh

If a specific version is needed, the $VERSION variable can be set, e.g.

VERSION=0.1.0 source ./install_cmssw.sh

From C++ the package can be installed directly with CMake, using the standard commands (after cloning the repository):

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=<your-prefix> [other-options] <source-clone>
make
make install

This will also install the python modules in <your-prefix>/lib/pythonX.Y/site-packages/CMSJMECalculators/.

Usage

When installed as a python package or directly with CMake, the necessary components can be loaded with:

from CMSJMECalculators import loadJMESystematicsCalculators
loadJMESystematicsCalculators()

Note that this will load the shared library and headers or dictionary in cling, the ROOT interpreter, so they can from then on also be used in JITted code, e.g. from RDataFrame.

The variations are calculated by C++ classes: JetVariationsCalculator and FatJetVariationsCalculator for the JER and JES variations of AK4 and AK8 jets, respectively, and Type1METVariationsCalculator and FixEE2017Type1METVariationsCalculator for the Type-1 MET variations, using the standard procedure and the special recipe for 2017, respectively (Type-1 smeared or standard MET is a configuration option). To use these, an instance should be created (with the C++ interpreter, to make it available from JITted code), and additional configuration passed by calling setter methods, e.g. in PyROOT:

import ROOT as gbl
calc = gbl.JetVariationsCalculator()
# redo JEC, push_back corrector parameters for different levels
jecParams = getattr(gbl, "std::vector<JetCorrectorParameters>")()
jecParams.push_back(gbl.JetCorrectorParameters(textfilepath))
calc.setJEC(jecParams)
# calculate JES uncertainties (repeat for all sources)
jcp_unc = gbl.JetCorrectorParameters(textfilepath_UncertaintySources)
calc.addJESUncertainty("Total", jcp_unc)
# Smear jets, with JER uncertainty
calc.setSmearing(textfilepath_PtResolution, textfilepath_SF,
    splitJER,       # decorrelate for different regions
    True, 0.2, 3.)  # use hybrid recipe, matching parameters
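
The "repeat for all sources" comment above can be written as a loop. The following is a minimal sketch, not part of the package: the helper name add_jes_sources and its make_params argument are hypothetical, and it assumes that each source is a section of the uncertainty sources text file and that JetCorrectorParameters accepts the section name as a second constructor argument. The constructor is passed in, so the helper itself does not need to import ROOT.

```python
def add_jes_sources(calc, unc_sources_path, sources, make_params):
    """Register one JES uncertainty per source name on the calculator.

    Hypothetical helper. make_params is expected to be
    ROOT.JetCorrectorParameters, called here as make_params(path, section);
    the two-argument form is an assumption.
    """
    registered = []
    for source in sources:
        # one set of corrector parameters per uncertainty source
        params = make_params(unc_sources_path, source)
        calc.addJESUncertainty(source, params)
        registered.append(source)
    return registered
```

With the objects from the example above, this could be called as add_jes_sources(calc, textfilepath_UncertaintySources, ["Total"], gbl.JetCorrectorParameters).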

The varied jet pt's and masses can be obtained by calling the produce method with the per-event quantities, converted to ROOT::VecOps::RVec:

from CMSJMECalculators.utils import toRVecFloat, toRVecInt
jetVars = calc.produce(toRVecFloat(tree.Jet_pt), toRVecFloat(tree.Jet_eta), ...)

Since the full list of arguments can be long and depends on a few parameters (for data the MC branches are not there, and not needed, and MET needs a few additional inputs), a helper function is provided, which can be used as follows:

from CMSJMECalculators.utils import getJetMETArgs
jetVars = calc.produce(*getJetMETArgs(tree, isMC=True, forMET=False))

This will return an object that contains all the variations, e.g. jetVars.pt(0) will return the RVec with new nominal jet PTs. The corresponding names of the variations, which depend on the configuration, can be retrieved from the calculator by calling its available() method.
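
To pair each variation with its name, the result can be unpacked into a dictionary. This is a minimal sketch under two assumptions: that index i of the result corresponds to the i-th name returned by available(), and that the result has a mass(i) accessor analogous to pt(i). The helper name variations_by_name is hypothetical.

```python
def variations_by_name(calc, jet_vars):
    """Map each variation name to its (pt, mass) pair.

    Hypothetical helper. Assumes jet_vars.pt(i) and jet_vars.mass(i)
    belong to the i-th name in calc.available().
    """
    names = [str(name) for name in calc.available()]
    return {
        name: (jet_vars.pt(i), jet_vars.mass(i))
        for i, name in enumerate(names)
    }
```

For example, variations_by_name(calc, jetVars) could then be indexed with one of the names listed by calc.available().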

From (JITted) RDataFrame

When constructing the RDataFrame graph from python, the calculator needs to be constructed directly from the cling interpreter, such that it is available in the global C++ namespace for JITted code:

gbl.gROOT.ProcessLine("JetVariationsCalculator myJetVarCalc{};")
calc = getattr(gbl, "myJetVarCalc")

The second line retrieves a reference from PyROOT, such that the configuration methods can be called as above.

Inside the RDataFrame graph the varied jet pt's and masses can be defined as a new column:

df.Define("ak4JetVars", "myJetVarCalc.produce(Jet_pt, Jet_eta, Jet_phi, ...)")

(The full set of arguments is not reproduced here, but can be found from the utils.getJetMETArgs method; since RDataFrame uses RVec internally, no conversion is needed.)
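
Since the Define expressions are plain strings, they can be assembled from a list of column names. The sketch below only does the string formatting; both function names are hypothetical, and the column list passed to produce_expression must be the complete argument list expected by produce, which is not reproduced here.

```python
def produce_expression(calc_name, columns):
    """Build the JITted call string for the calculator's produce method."""
    return "{0}.produce({1})".format(calc_name, ", ".join(columns))


def variation_expression(result_column, attribute, index):
    """Build the string that extracts one variation from a result column."""
    return "{0}.{1}({2})".format(result_column, attribute, index)
```

For instance, variation_expression("ak4JetVars", "pt", 0) gives the string "ak4JetVars.pt(0)", which could be passed to a second df.Define call to obtain the nominal jet pt's as a separate column.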

From C++

The PyROOT example above relies on the automatically generated bindings, so the C++ equivalent is almost identical, and straightforward to obtain. When calling the produce method outside RDataFrame, most of the arguments may need to be converted to RVec, which fortunately supports all common kinds of array interfaces.

TODO expand C++ examples

Caching the text files

Since the JEC and JER parameter text files need to be downloaded from the corresponding repositories, which are quite big, a helper is provided that downloads only the files that are used, and caches them locally. It can be used like this (see the tests for more examples):

from CMSJMECalculators.jetdatabasecache import JetDatabaseCache
jecDBCache = JetDatabaseCache("JECDatabase", repository="cms-jet/JECDatabase")
jrDBCache = JetDatabaseCache("JRDatabase", repository="cms-jet/JRDatabase")
# usage example, returns the local path
pl = jecDBCache.getPayload("Summer16_07Aug2017_V11_MC", "L1FastJet", "AK4PFchs")
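
When the JEC is redone with several correction levels, the local paths can be collected in one go. This is a minimal sketch built only on the getPayload call shown above; the helper name jec_payload_paths is hypothetical, and which levels exist for a given tag and jet type is not checked here.

```python
def jec_payload_paths(cache, tag, levels, jet_type):
    """Return the local path of each requested JEC level, in order.

    Hypothetical helper. cache is expected to behave like JetDatabaseCache,
    i.e. provide getPayload(tag, level, jet_type).
    """
    return [cache.getPayload(tag, level, jet_type) for level in levels]
```

Each returned path could then be wrapped in a JetCorrectorParameters object and appended to the vector passed to setJEC, as in the configuration example in the Usage section.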

The cache can also be checked and updated with the checkCMSJMEDatabaseCaches script, which has an interactive mode (-i flag) that will start an IPython shell after constructing the two database cache helpers.

Testing and development

A set of pytest-based tests is included, to make sure the implementation stays consistent with the POG-provided python version in nanoAOD-tools. The tests compare the contents of the pt and mass branches for all variations. They can be run with

pytest tests

or, inside a CMSSW environment where python2 is the default, with

python3 -m pytest tests

TODO make tests python2-compatible, expand, scripts for larger tests samples?
