# P-values

_Valerio Ippolito - INFN Sezione di Roma_

This is the part in which we run the p-value calculation.

## Local p-values

Let's first make sure CommonStatTools is compiled:

In [None]:
!cd ../CommonStatTools; mkdir -p build; cd build; cmake ..; make

We then load the compiled library and the header of the class that handles the p-value calculation:

In [None]:
#include "../CommonStatTools/SignificanceCalculator.h"

In [None]:
R__ADD_LIBRARY_PATH(../CommonStatTools/build)

In [None]:
R__LOAD_LIBRARY(libCommonStatTools.so)

P-values are computed on a given workspace, stored in some input file. The workspace is expected to contain the ModelConfig, which specifies how the content of the workspace should be used to perform a statistical analysis. The calculation also needs a dataset from the workspace, which is treated as the data.

In [None]:
inputFile = TString("../ws/ATLASIT_prova_combined_ATLASIT_prova_model.root");
workspaceName = TString("combined");
modelConfigName = TString("ModelConfig");
dataName = TString("obsData");

Let's retrieve them:

In [None]:
input_f = new TFile(inputFile);
w = dynamic_cast<RooWorkspace*>(input_f->Get(workspaceName));
mc = dynamic_cast<RooStats::ModelConfig*>(w->obj(modelConfigName));
dataset = dynamic_cast<RooDataSet*>(w->data(dataName));

We will use the simple `SignificanceCalculator` class, which actually runs the significance calculation for us.

In [None]:
CommonStatTools::SignificanceCalculator calculator;
calculator.SetCPU(1);
calculator.CalculateSignificance(mc, dataset);

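The p-value calculation is very simple: it is based on the test statistic (https://arxiv.org/pdf/1007.1727.pdf)
$$q_0 = 2(NLL_0 - NLL)$$
where $NLL_0$ is the negative log-likelihood calculated when the POI is set to zero (background-only hypothesis), and $NLL$ is the value obtained when the POI is also free to float. In the asymptotic regime described in that paper, the significance is $Z = \sqrt{q_0}$ and the p-value is $p_0 = 1 - \Phi(Z)$, with $\Phi$ the cumulative distribution of the standard Gaussian.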
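
As a rough illustration of how $q_0$ maps to a significance and a p-value, here is a minimal standalone sketch using only the C++ standard library. The function names `q0FromNLL`, `significanceFromQ0` and `pValueFromSignificance` are hypothetical and are not part of CommonStatTools. The sketch assumes the asymptotic one-sided Gaussian conversion, and whether `SignificanceCalculator` performs exactly these steps internally is an assumption.

```cpp
#include <algorithm>
#include <cmath>

// Test statistic from the two minimised NLL values (hypothetical helper).
// Clamped at zero to absorb small negative values caused by fit noise.
double q0FromNLL(double nllBkgOnly, double nllFree) {
    return std::max(0.0, 2.0 * (nllBkgOnly - nllFree));
}

// Asymptotic significance Z = sqrt(q0).
double significanceFromQ0(double q0) {
    return std::sqrt(std::max(0.0, q0));
}

// One-sided Gaussian tail probability p = 1 - Phi(Z).
double pValueFromSignificance(double z) {
    return 0.5 * std::erfc(z / std::sqrt(2.0));
}
```

For example, `pValueFromSignificance(significanceFromQ0(q0FromNLL(nll0, nll)))` chains the three steps for a pair of NLL values `nll0` and `nll`.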

In [None]:
std::cout << "Significance is: " << calculator.GetSignificance()
          << ", p-value is: " << calculator.GetPvalue() << "\n";

Toys, which repeat the calculation over many variations of the global observables, may be used to check how likely it is to observe a fluctuation larger than the observed one (as in the concept of _global p-value_), and can be run easily:

In [None]:
N_toys = 1000;

calculator.SetSeed(1337); // fix the seed; when running in batch, use a different one per job so that merged outputs are independent
calculator.SetPrintoutFrequency(10); // -1 will disable the printout
calculator.CalculateSignificanceToys(w, mc, dataset, N_toys);

pValues = calculator.GetToysPvalues();
significances = calculator.GetToysSignificances();
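
Given the toy results, the fraction of toys that fluctuate at least as much as the observation can be estimated by simple counting. The sketch below is standalone C++ with a hypothetical helper `fractionAbove`. It assumes that each entry of the input vector is the significance obtained in one background-only toy, which is an assumption about what `GetToysSignificances()` returns.

```cpp
#include <cstddef>
#include <vector>

// Fraction of toys whose significance is at least as large as the observed one.
// Returns 0 for an empty input (hypothetical convention).
double fractionAbove(const std::vector<double>& toySignificances, double observed) {
    if (toySignificances.empty()) return 0.0;
    std::size_t nAbove = 0;
    for (double z : toySignificances) {
        if (z >= observed) ++nAbove;
    }
    return static_cast<double>(nAbove) / static_cast<double>(toySignificances.size());
}
```

With 1000 toys, the smallest non-zero fraction this can resolve is 1/1000, so very small p-values need correspondingly more toys.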

Let's visualize the output:

In [None]:
h_pval = new TH1F("pval", "pval", 100, 0, 1);
for (size_t i = 0; i < pValues.size(); i++) { // size_t avoids a signed/unsigned comparison
    std::cout << "toy " << i << ": pval " << pValues[i] << " sign " << significances[i] << std::endl;
    h_pval->Fill(pValues[i]);
}

c = new TCanvas("c", "c", 600, 600);
h_pval->Draw();
c->Draw();

The output may also be persisted to a ROOT file:

In [None]:
output_f = new TFile("my_pvalues.root", "RECREATE");
calculator.WriteResultsToROOTfile(output_f, "p0");
calculator.WriteToysToROOTfile(output_f, "toys");
output_f->Write();
delete output_f;

which can in turn be read back easily:

In [None]:
output_f = new TFile("my_pvalues.root");
output_f->ls();

In [None]:
t = dynamic_cast<TTree*>(output_f->Get("p0"));
t->Show(0);

In [None]:
c = new TCanvas("c", "c", 600, 600);
t = dynamic_cast<TTree*>(output_f->Get("toys"));
t->Draw("significance");
c->Draw();

## Global p-values based on crossings

CommonStatTools also provides another basic implementation of the global p-value calculation, based on the crossing method (ATL-PHYS-PUB-2011-011).

You need to provide:
- the maximum local significance, in number of Gaussian sigmas
- the number of crossings

In [None]:
!../CommonStatTools/build/getGlobalP0 3.4 2
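
For orientation, the crossing method bounds the global p-value as $p_{global} \approx p_{local} + \langle N \rangle \, e^{-(Z_{max}^2 - Z_{ref}^2)/2}$, where $\langle N \rangle$ is the number of up-crossings counted at a reference level $Z_{ref}$. The standalone sketch below implements this formula with hypothetical function names. Taking $Z_{ref} = 0$ by default is an assumption, and it is not confirmed here that `getGlobalP0` uses exactly this formula or reference level.

```cpp
#include <algorithm>
#include <cmath>

// One-sided Gaussian tail probability for a local significance z.
double localPValue(double z) {
    return 0.5 * std::erfc(z / std::sqrt(2.0));
}

// Upper bound on the global p-value from the crossing method (hypothetical helper).
// zMax: maximum local significance; nCrossings: up-crossings counted at level zRef.
double globalPValueFromCrossings(double zMax, double nCrossings, double zRef = 0.0) {
    const double bound =
        localPValue(zMax) + nCrossings * std::exp(-0.5 * (zMax * zMax - zRef * zRef));
    return std::min(1.0, bound); // a probability cannot exceed 1
}
```

Calling `globalPValueFromCrossings(3.4, 2)` mirrors the inputs of the command above under these assumptions.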