## Hypothesis Test Example

Our data consist of two components, each modelled by a Gaussian with its own mean and sigma. Our aim is to separate the two components (e.g. signal and background).

We perform a hypothesis test to separate the two, and we study the efficiency and power of the test.

In [1]:
int nsig = 10000; 
int nbkg = 10000; 
double mean_sig = 25; 
double sigma_sig = 4; 
double mean_bkg = 20; 
double sigma_bkg = 6; 
double xcut_min = 15; 
double xcut_max = 40;
std::vector<double> sigData(nsig); 
std::vector<double> bkgData(nbkg); 
TRandom3 rndm(0);
// histograms for signal and background
auto hsig = new TH1D("hsig","Data",100,0.,40);
auto hbkg = new TH1D("hbkg","Background data",100,0.,40);

#### 1. Generation of Data Sample

In [2]:
// we generate first the signal events
for (int i = 0; i < nsig; ++i) {
    sigData[i]= rndm.Gaus(mean_sig,sigma_sig);
    hsig->Fill(sigData[i]);
}

In [3]:
// now we generate the background events
for (int i = 0; i < nbkg; ++i) {
    bkgData[i]= rndm.Gaus(mean_bkg,sigma_bkg);
    hbkg->Fill(bkgData[i]);
}

In [None]:
hsig->SetFillColor(kYellow); 
hsig->Draw();
hbkg->SetLineColor(kBlue);
hbkg->Draw("SAME");
gPad->Draw();    

#### Data Selection as a Simple Hypothesis Test

We now apply a cut, selecting an event as signal if `x > x_cut`.
We then compute the signal efficiency and the background contamination.

In this case we have: 

- null hypothesis, $H_0$ : x is distributed as signal events
- alternative hypothesis, $H_1$ : x is distributed as background events

And

- Error of the first type: probability of rejecting $H_0$ when $H_0$ is true, i.e. 1 - signal efficiency
- Error of the second type: probability of accepting $H_0$ when $H_1$ is true, i.e. background mis-identification probability
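
Because both components are Gaussian, the two error probabilities for a cut of the form x > x_cut can also be estimated analytically from the Gaussian tail integral. The following is a minimal sketch in plain C++ (no ROOT needed); the helper names `gaussian_tail`, `expected_type1_error` and `expected_type2_error` are hypothetical and not part of the notebook.

```cpp
#include <cmath>

// Hypothetical helper: P(x > x_cut) for a Gaussian with the given mean and sigma.
double gaussian_tail(double x_cut, double mean, double sigma) {
    return 0.5 * std::erfc((x_cut - mean) / (sigma * std::sqrt(2.0)));
}

// Expected error of the first type: a signal event fails the cut x > x_cut.
double expected_type1_error(double x_cut, double mean_sig, double sigma_sig) {
    return 1.0 - gaussian_tail(x_cut, mean_sig, sigma_sig);
}

// Expected error of the second type: a background event passes the cut x > x_cut.
double expected_type2_error(double x_cut, double mean_bkg, double sigma_bkg) {
    return gaussian_tail(x_cut, mean_bkg, sigma_bkg);
}
```

For a cut placed at the signal mean, `expected_type1_error(25, 25, 4)` is 0.5, and `expected_type2_error(25, 20, 6)` is roughly 0.20 for the background parameters used here. The values measured on the generated samples should agree with these up to statistical fluctuations.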

In [5]:
double x_cut = 25; 

In [6]:
auto selection_function = [&](double x){return x > x_cut;};

In [7]:
int nsig_selected  = std::count_if(sigData.begin(), sigData.end(), selection_function);

In [8]:
int nbkg_selected = std::count_if(bkgData.begin(), bkgData.end(), selection_function);

In [9]:
std::cout << "number of selected events is " << nsig_selected + nbkg_selected << std::endl;

number of selected events is 7044


In [10]:
std::cout << " signal efficiency          = " << double(nsig_selected)/nsig << std::endl; 
std::cout << " 1 - background efficiency  = " << 1. - double(nbkg_selected)/nbkg << std::endl; 

 signal efficiency          = 0.499
 1 - background efficiency  = 0.7946
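
Since each efficiency is a fraction of a finite sample, it carries a statistical uncertainty. A rough estimate, assuming the simple binomial (normal approximation) formula, could look like the sketch below; `efficiency_error` is a hypothetical helper, not something defined in the notebook.

```cpp
#include <cmath>

// Hypothetical helper: binomial standard error on an efficiency n_selected / n_total,
// using the normal approximation sqrt(eff * (1 - eff) / n_total).
double efficiency_error(int n_selected, int n_total) {
    if (n_total <= 0) return 0.0;  // no events, no meaningful estimate
    double eff = static_cast<double>(n_selected) / n_total;
    return std::sqrt(eff * (1.0 - eff) / n_total);
}
```

With 10000 events per sample, the efficiencies printed above have uncertainties of about 0.005 (signal) and about 0.004 (background). The approximation becomes unreliable when the efficiency is very close to 0 or 1.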


### Receiver Operating Characteristic (ROC) curve

We now build a ROC curve by plotting the signal efficiency against 1 - background efficiency for different values of the cut.

In [11]:
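// use a TAxis to define 100 equally spaced cut values between xcut_min and xcut_max
auto axis = new TAxis(100,xcut_min,xcut_max); 
int npoints = axis->GetNbins();
// TAxis bins are numbered from 1 to GetNbins()
int ifirst = 1; int ilast = axis->GetNbins()+1;
auto graph = new TGraph(npoints);
for (int i = ifirst; i < ilast; ++i) { 
    // selection_function captures x_cut by reference, so updating x_cut moves the cut
    x_cut = axis->GetBinLowEdge(i); 
    double sig_eff = double( std::count_if(sigData.begin(), sigData.end(), selection_function) ) / nsig;
    double bkg_eff = double( std::count_if(bkgData.begin(), bkgData.end(), selection_function) ) / nbkg;
    // graph points are indexed from 0
    graph->SetPoint(i-ifirst,sig_eff, 1.-bkg_eff);
}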
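
The scan above works because `selection_function` captures `x_cut` by reference (`[&]`): assigning a new value to `x_cut` changes what the lambda selects without redefining it. The same pattern can be sketched in plain C++ without ROOT; `scan_efficiency` is a hypothetical helper introduced only for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: fraction of `data` passing x > cut, for each cut in `cuts`.
std::vector<double> scan_efficiency(const std::vector<double>& data,
                                    const std::vector<double>& cuts) {
    double x_cut = 0.0;
    // Capture by reference: the lambda reads the current value of x_cut on every call.
    auto selection_function = [&x_cut](double x) { return x > x_cut; };

    std::vector<double> eff;
    eff.reserve(cuts.size());
    for (double c : cuts) {
        x_cut = c;  // moving the threshold changes what the lambda selects
        auto n = std::count_if(data.begin(), data.end(), selection_function);
        eff.push_back(data.empty() ? 0.0 : static_cast<double>(n) / data.size());
    }
    return eff;
}
```

Had the lambda captured `x_cut` by value (`[=]`), every iteration would reuse the threshold that was in place when the lambda was created, and all points of the curve would coincide.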

In [None]:
graph->SetTitle("ROC Curve;signal efficiency;1.-background efficiency");
graph->Draw("ACP");
gPad->Draw();
    