# Including systematic variations in your analysis

At some point in the development of a HEP data analysis workflow, the treatment of systematic variations in the observed quantities becomes necessary. From a physicist's standpoint, the study of systematic variations involves many different, often conceptually complex cases. From the standpoint of pure numerical computation, however, the application typically has to produce multiple results instead of a single one, each computed in a "universe" in which certain inputs take modified values.
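To make the "universe" picture concrete before turning to RDataFrame, here is a minimal plain-Python sketch. It does not use ROOT and is not how RDataFrame is implemented. The toy events, the scale factors and the helper `mean_pt` are assumptions made up for illustration.

```python
import math

# Toy events: each event is a (px, py) pair. The values are invented.
events = [(3.0, 4.0), (6.0, 8.0), (5.0, 12.0)]

# Each universe rescales pt by a factor; "nominal" leaves it unchanged.
universes = {"nominal": 1.0, "down": 0.95, "up": 1.05}

def mean_pt(events, scale):
    """Hypothetical helper: mean transverse momentum with pt scaled by `scale`."""
    pts = [scale * math.hypot(px, py) for px, py in events]
    return sum(pts) / len(pts)

# One result per universe instead of a single result.
results = {name: mean_pt(events, scale) for name, scale in universes.items()}

for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

The same computation runs once per universe, and the results are collected under a label per universe, which is the shape of output the rest of this notebook obtains from RDataFrame.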

In this notebook, we explore the RDataFrame API devoted to helping the user include systematic variations in their analysis code.

In [2]:
import ROOT

treename = "myDataset"
filename = "../../data/collections_dataset.root"
df = ROOT.RDataFrame(treename, filename)

## Registering variations for one observable

As a basic example, let's see how to register two variations for the column `pt` and give them custom labels. Note that in the call to `Vary` we can use values from columns available in the dataset, including the column we are currently registering variations for. In this example, we book the nominal histogram via the usual `Histo1D` method.

Note the following in the example:

* Custom names for variations can be passed in a list via the `variationTags` parameter.
* The name of the input column is used as the default name for the variation, unless the `variationName` parameter is used as in this example.
* The full variation name is composed of the variation name and the variation tag, separated by a colon (e.g. "mypt:down", "mypt:up" in this example).
* The histogram is filled with values of the `good_pt` column, defined after the `Vary` call. The presence of systematic variations for certain columns is automatically propagated through filters, defines and actions, and RDataFrame will take these dependencies into account when producing varied results.

In [3]:
# Define the transverse momentum from the px and py collections
df = df.Define("pt", "sqrt(px*px + py*py)")

nominal_histo = (
    df.Vary(
        colName="pt",
        expression="ROOT::RVec<ROOT::RVecD>{pt*0.95, pt*1.05}",
        variationTags=["down", "up"],
        variationName="mypt")
      .Define("good_pt", "pt[E>100]")
      .Histo1D("good_pt")
)

To retrieve the varied histograms as well, we pass the result of the action just booked to the `VariationsFor` function, as shown below. This returns a dictionary-like object, keyed by variation name, that contains the nominal histogram together with all the varied histograms.

In [None]:
all_histos = ROOT.RDF.Experimental.VariationsFor(nominal_histo)

c = ROOT.TCanvas()

all_histos["nominal"].SetLineColor(ROOT.kBlue)
all_histos["nominal"].Draw()

all_histos["mypt:down"].SetLineColor(ROOT.kRed)
all_histos["mypt:down"].Draw("SAME")

all_histos["mypt:up"].SetLineColor(ROOT.kGreen)
all_histos["mypt:up"].Draw("SAME")

c.Draw()
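The keys used above follow the naming rule described earlier: a `"nominal"` entry plus one `name:tag` entry per variation tag. As a plain-Python sketch of that rule (the helper `variation_keys` is hypothetical and not part of ROOT, and the order of the returned list is chosen only for readability):

```python
def variation_keys(variation_name, variation_tags):
    """Hypothetical helper: list the keys expected in the map of varied results."""
    return ["nominal"] + [f"{variation_name}:{tag}" for tag in variation_tags]

# With an explicit variation name, as in the example above.
keys = variation_keys("mypt", ["down", "up"])
print(keys)

# Without an explicit variation name, the column name is used instead.
default_keys = variation_keys("pt", ["down", "up"])
print(default_keys)
```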

## Registering variations for multiple columns simultaneously

The `Vary` function can also vary multiple columns simultaneously (in "lockstep"). In this case the expression must return an RVec of RVecs, one per column: each inner vector contains the varied values for one column, and the inner vectors follow the same ordering as the column names passed as the first argument. Besides the variation tags, we also have to pass a variation name explicitly, since there is no single column name that could serve as the default.

In [7]:
# Start again from the original dataset, without the earlier Define and Vary calls
df = ROOT.RDataFrame(treename, filename)

nominal_histo_lockstep = (
    df.Vary(
        colNames=["px", "py"],
        expression="ROOT::RVec<ROOT::RVec<ROOT::RVecD>>{{px*0.95, px*1.05}, {py*0.95, py*1.05}}",
        variationTags=["down", "up"],
        variationName="pxAndpy")
      .Define("pt_lockstep", "sqrt(px*px + py*py)[E>100]")
      .Histo1D("pt_lockstep")
)
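To illustrate what "lockstep" means, here is a minimal plain-Python sketch for a single toy particle. It does not use ROOT, and the numbers are invented. The layout of `varied` mirrors the expression above: one inner list per column, in the order of the column names, with one entry per variation tag.

```python
import math

# Toy nominal values for one particle (invented for illustration).
px, py = 3.0, 4.0
tags = ["down", "up"]

# One inner list per column, ordered like the column names ["px", "py"];
# entry i of each inner list belongs to tags[i].
varied = [[px * 0.95, px * 1.05], [py * 0.95, py * 1.05]]

pt = {"nominal": math.hypot(px, py)}
for i, tag in enumerate(tags):
    # Lockstep: take the i-th varied value of every column at the same time.
    varied_px, varied_py = (column_values[i] for column_values in varied)
    pt[f"pxAndpy:{tag}"] = math.hypot(varied_px, varied_py)

for name, value in pt.items():
    print(f"{name}: {value:.2f}")
```

Two tags applied to two columns give two varied results, not four: "down" always pairs the down value of `px` with the down value of `py`, and likewise for "up".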

In [None]:
all_histos = ROOT.RDF.Experimental.VariationsFor(nominal_histo_lockstep)

c = ROOT.TCanvas()

all_histos["nominal"].SetLineColor(ROOT.kBlue)
all_histos["nominal"].Draw()

all_histos["pxAndpy:down"].SetLineColor(ROOT.kRed)
all_histos["pxAndpy:down"].Draw("SAME")

all_histos["pxAndpy:up"].SetLineColor(ROOT.kGreen)
all_histos["pxAndpy:up"].Draw("SAME")

c.Draw()