# Inspecting a RooWorkspace
_Valerio Ippolito - INFN Sezione di Roma_

What's inside a RooWorkspace created with HistFactory? Let's find out in two different ways.


## With ROOT
Is there a better way to understand what's inside a ROOT file than opening it? Let's see what has been created by `create_data/create_workspace.ipynb`:

In [None]:
!ls ../ws

The output folder of a HistFactory run usually contains many files. The most useful one is the one with `combined` in its name.

In [None]:
f = new TFile("../ws/ATLASIT_prova_combined_ATLASIT_prova_model.root")

In [None]:
f->ls()

The file contains:
- the `RooWorkspace` object, i.e. our likelihood model and associated datasets
- another object we shall neglect
- two directories containing the input histograms we used to build the workspace, one per channel (statistically independent set of data, cf. previous steps of this tutorial)
- the `Measurement` object representing what we created with `create_workspace.ipynb`, so effectively the specification of what is inside the workspace

In case you ever panic and don't remember what's inside a workspace, you can run

In [None]:
ATLASIT_prova->PrintXML("blabla")

and then the `blabla` folder (which is created if needed) contains the XML files which specify, in some convoluted version of the English language, what's inside the workspace:

In [None]:
!ls blabla

In [None]:
!cat blabla/ATLASIT_prova.xml

Now, back to the workspace, which is usually called `combined`. What's inside it? We can check with

In [None]:
w = dynamic_cast<RooWorkspace*>(f->Get("combined"));
w->Print("")

The workspace is actually a collection of stuff:
- variables (which can be parameters, observables, constants... anything!)
- probability density functions (PDFs), i.e. mathematical functions of those variables (technically there is no distinction between observables and parameters at this stage)
- functions, which - unlike PDFs - aren't supposed to be normalized to 1 (for example, a histogram can be seen as a function of an observable, whose integral is equal to a given number of events)
- datasets, i.e. real data (the `Data` input histograms we gave when creating the workspace) and so-called Asimov data, which represent the overall _nominal_ expectation
- embedded datasets, i.e. internal representations of histograms
- parameter snapshots, i.e. "named copies" of the value (e.g. `9.888`) and settings (e.g. up and down error, `isConstant=kTRUE`) of a set of parameters
- named sets, i.e. labelled groups of variables which share the same role - e.g. "what's or what are the POI[s]", "which are the observables", etc.
- generic objects, like the notable `ModelConfig` object (a `RooStats::ModelConfig`, created by HistFactory), which represents how we should be putting together the overall likelihood model (PDF, data, and the notion of which is the POI and which are the observables, the nuisance parameters and the constant parameters in our statistical task)
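The idea of a parameter snapshot listed above can be illustrated with a small, ROOT-free sketch. The names `ParamState`, `ParamSet` and `SnapshotStore` are hypothetical and are not the RooWorkspace interface; the sketch only shows what "a named copy of values and settings" means.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a parameter: its value plus its settings.
struct ParamState {
    double value;
    double errorLo;
    double errorHi;
    bool isConstant;
};

using ParamSet = std::map<std::string, ParamState>;

// Hypothetical container of named snapshots (not the RooWorkspace API).
class SnapshotStore {
public:
    // Store a copy of the current state of the parameters under a name.
    void save(const std::string& name, const ParamSet& params) {
        snapshots_[name] = params;
    }

    // Copy the stored state back onto the parameters.
    void load(const std::string& name, ParamSet& params) const {
        const auto it = snapshots_.find(name);
        assert(it != snapshots_.end());  // the snapshot must exist
        params = it->second;
    }

private:
    std::map<std::string, ParamSet> snapshots_;
};
```

Saving a snapshot, changing a parameter (say, fixing it to another value) and loading the snapshot again brings both the value and the `isConstant` flag back to what they were when the snapshot was taken.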

Please note that the meaning of _nominal_ depends strongly on the default parameter values which were set when the workspace was created. More specifically: normalization parameters (such as the signal strength, which is usually the POI, and the background normalization factors) are taken at their nominal value as declared in the workspace creation. Typical mistake: expecting the Asimov dataset to represent the _background-only hypothesis_ when you set the signal strength default value to `1`.
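To make this pitfall concrete, here is a minimal, ROOT-free sketch of how the nominal expectation in each bin depends on the default signal strength. The function name `nominalExpectation` and the yields used below are made up for illustration, and all nuisance parameters (which a real HistFactory model would include) are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of RooFit/HistFactory): per-bin nominal
// expectation mu * S + B, ignoring all nuisance parameters.
std::vector<double> nominalExpectation(double mu,
                                       const std::vector<double>& signal,
                                       const std::vector<double>& background) {
    assert(signal.size() == background.size());  // same binning assumed
    std::vector<double> expected(background.size());
    for (std::size_t i = 0; i < background.size(); ++i) {
        expected[i] = mu * signal[i] + background[i];
    }
    return expected;
}
```

With made-up yields `S = {2, 5}` and `B = {100, 40}`, calling `nominalExpectation(1.0, S, B)` gives `{102, 45}`, while only `nominalExpectation(0.0, S, B)` reproduces the background-only `{100, 40}`.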

We can, for example, identify the POI, together with its default value and bounds:

In [None]:
w->set("ModelConfig_POI")->Print("v")

or find out what the observables are,

In [None]:
w->set("observables")->Print("v")

or we may do that via the ModelConfig object,

In [None]:
mc = dynamic_cast<RooStats::ModelConfig*>(w->obj("ModelConfig"));
mc->GetParametersOfInterest()->Print();
mc->GetObservables()->Print();

and inspect the variable info via

In [None]:
w->var("mu_ttH")->Print()

We can also perform some simple plotting, provided we choose the observables to look at. HistFactory workspaces always have two special ones:
- `channelCat` represents the category the events fall in (i.e. it's like an enum representing which of the various channels events belong to - for two channels, you'll see two _Possible states_ this discrete variable can take)
- `weightVar` represents instead the weight which may be associated with each event

In [None]:
w->cat("channelCat")->Print("v")
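The role of these two special variables can be sketched without ROOT. In the toy model below, each entry of a combined dataset carries a channel label (playing the role of `channelCat`) and a weight (playing the role of `weightVar`); for binned data one can think of an entry as a bin and of its weight as the bin content. The names `Entry` and `yieldPerChannel` are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the combined dataset: the channel
// label plays the role of channelCat, the weight that of weightVar.
struct Entry {
    std::string channel;  // which channel the entry belongs to
    double observable;    // value of that channel's observable
    double weight;        // weight attached to the entry
};

// Sum of weights per channel, i.e. the yield seen in each category state.
std::map<std::string, double> yieldPerChannel(const std::vector<Entry>& entries) {
    std::map<std::string, double> yields;
    for (const auto& e : entries) {
        assert(!e.channel.empty());  // every entry must carry a channel label
        yields[e.channel] += e.weight;
    }
    return yields;
}
```

With two channels, the returned map has two keys, mirroring the two _Possible states_ of the category.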

Verbose printouts (`Print("v")`) and tree printouts (`Print("t")`) help nerds understand what's going on - how a variable or a function or in general any RooFit object is defined, which functions use it, and so on and so forth.

## With CommonStatTools

The CommonStatTools package provides a simple tool to extract histograms from the folder structure of the file containing the RooWorkspace, and write them to a single ROOT file.

In [None]:
!python ../CommonStatTools/obtainHistosFromWS.py -i ../ws/ATLASIT_prova_combined_ATLASIT_prova_model.root -o validation_histos.root

In [None]:
!ls

In [None]:
f_histo = new TFile("validation_histos.root");
f_histo->ls();

In [None]:
c = new TCanvas("c2", "c", 800, 400);
c->Divide(2, 1);

// first channel
c->cd(1);
h_data = dynamic_cast<TH1D*>(f_histo->Get("ljets_HThad_5j3b_Data_regBin"));
h_ttH = dynamic_cast<TH1D*>(f_histo->Get("ljets_HThad_5j3b_ttH_regBin"));
h_ttbar = dynamic_cast<TH1D*>(f_histo->Get("ljets_HThad_5j3b_ttbar_regBin"));
h_singleTop = dynamic_cast<TH1D*>(f_histo->Get("ljets_HThad_5j3b_singleTop_regBin"));

h_ttH->SetLineColor(kRed);
h_ttbar->SetLineColor(kBlue+1);
h_singleTop->SetLineColor(kGreen-8);
h_ttH->SetTitle("ttH");
h_ttbar->SetTitle("ttbar");
h_singleTop->SetTitle("single top");
h_data->SetTitle("data");

h_data->Draw("pe");
auto S_plus_B = new THStack("S_plus_B", "S plus B");
S_plus_B->Add(h_singleTop);
S_plus_B->Add(h_ttbar);
S_plus_B->Add(h_ttH);
S_plus_B->Draw("hist same");


// second channel
c->cd(2);
h_data = dynamic_cast<TH1D*>(f_histo->Get("ljets_Mbb_ge6jge4b_Data_regBin"));
h_ttH = dynamic_cast<TH1D*>(f_histo->Get("ljets_Mbb_ge6jge4b_ttH_regBin"));
h_ttbar = dynamic_cast<TH1D*>(f_histo->Get("ljets_Mbb_ge6jge4b_ttbar_regBin"));
h_singleTop = dynamic_cast<TH1D*>(f_histo->Get("ljets_Mbb_ge6jge4b_singleTop_regBin"));

h_ttH->SetLineColor(kRed);
h_ttbar->SetLineColor(kBlue+1);
h_singleTop->SetLineColor(kGreen-8);
h_ttH->SetTitle("ttH");
h_ttbar->SetTitle("ttbar");
h_singleTop->SetTitle("single top");
h_data->SetTitle("data");

h_data->Draw("pe");
auto S_plus_B2 = new THStack("S_plus_B2", "S plus B"); // distinct name from the first stack
S_plus_B2->Add(h_singleTop);
S_plus_B2->Add(h_ttbar);
S_plus_B2->Add(h_ttH);
S_plus_B2->Draw("hist same");


c->cd(2)->BuildLegend();
c->Draw();

You may sometimes need to compare different versions of the same workspace. In this case, a convenient way to identify changes in histograms is provided by CommonStatTools: let's first create a second, mock workspace,

In [None]:
!cp ../ws/ATLASIT_prova_combined_ATLASIT_prova_model.root another_mysterious_ws.root

and then we can visualize the differences between the histograms contained in the "old" (`-o`) and "new" (`-n`) files:

In [None]:
!python ../CommonStatTools/compareHistos.py -o ../ws/ATLASIT_prova_combined_ATLASIT_prova_model.root -n another_mysterious_ws.root

Of course there may be residual differences due to different settings for normalization factors or normalization uncertainties, but those can be checked, for example, by comparing the XML files which can be extracted a posteriori from the `Measurement` object in the file (`vimdiff` them!).
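The idea behind a histogram comparison can itself be sketched in a few lines of plain C++. This is not how `compareHistos.py` is implemented; the helper `differingBins` and its default tolerance are assumptions made for the sake of the example, with histograms reduced to vectors of bin contents.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: indices of the bins whose contents differ by more
// than a relative tolerance between an "old" and a "new" histogram.
std::vector<std::size_t> differingBins(const std::vector<double>& oldBins,
                                       const std::vector<double>& newBins,
                                       double relTol = 1e-6) {
    assert(oldBins.size() == newBins.size());  // same binning assumed
    std::vector<std::size_t> changed;
    for (std::size_t i = 0; i < oldBins.size(); ++i) {
        // Scale the tolerance by the larger of the two bin contents.
        const double scale = std::max(std::fabs(oldBins[i]), std::fabs(newBins[i]));
        if (std::fabs(oldBins[i] - newBins[i]) > relTol * scale) {
            changed.push_back(i);
        }
    }
    return changed;
}
```

For a plain copy of the workspace, as in the mock example above, such a check would be expected to report no differing bins.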