##### Example of Kolmogorov-Smirnov and Anderson-Darling GoF Test

In this example we perform a goodness-of-fit (GoF) test of a data sample against a theoretical distribution.
The theoretical distribution is a Gaussian with $\mu=0$ and $\sigma=1$.

The data sample, however, consists of a large fraction (e.g. 96%) of data generated from the theoretical distribution (Gaussian with $\mu=0$ and $\sigma=1$), while the remaining few percent are contamination drawn from a different Gaussian with $\mu=3$ and $\sigma=0.5$.

We perform both the Kolmogorov-Smirnov and the Anderson-Darling tests.

At the end we also try the Baker-Cousins $\chi^2$.

In [None]:
%jsroot on

In [None]:
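int n = 1000;                                // total sample size
double frac1 = 0.96;                         // fraction drawn from the theoretical distribution
int n1 = static_cast<int>(frac1 * n + 0.5);  // rounded, so floating-point error cannot drop an event
int n2 = n - n1;                             // size of the contamination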

#### Data Generation

In [None]:
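TRandomMixMax r(1);   // fixed seed for reproducibility
std::vector<double> x(n);
// first n1 events: theoretical distribution, Gaus(0,1)
for (int i = 0; i < n1; ++i) {
    x[i] = r.Gaus(0,1);
}
// remaining n2 events: contamination, Gaus(3,0.5)
for (int i = n1; i < n; ++i) {
    x[i] = r.Gaus(3,0.5);
}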

In [None]:
// fill histogram
auto h1 = new TH1D("h1","Data Sample distribution",100,-5,5);
h1->FillN(x.size(),x.data(),nullptr);

In [None]:
h1->Draw();
gPad->Draw();

We also try to fit the data with a Gaussian distribution. As you can see, the fit is not very good.

In [None]:
h1->Fit("gaus","L");
gStyle->SetOptFit(1111);
gPad->Draw();

In [None]:
//auto f1 =  h1->GetFunction("gaus");
// theoretical pdf; the arguments of normal_pdf are (x, sigma, x0), so this is sigma=1, mu=0
auto pdf  = new TF1("pdf","ROOT::Math::normal_pdf(x,1,0)");

#### Make the GoF Tests

We now run the GoF tests against the theoretical function. Note that we do not use the fitted function, but the theoretical one.
We use the `ROOT::Math::GoFTest` class of ROOT to perform the goodness-of-fit tests.

In [None]:
ROOT::Math::GoFTest gof(x.size(),x.data(),*pdf);

##### Anderson-Darling Test

In [None]:
double pvalue = gof.AndersonDarlingTest();
std::cout << "Anderson-Darling p-value = " << pvalue << std::endl;
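
To see what the Anderson-Darling test measures, the statistic $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\,[\ln F(x_{(i)}) + \ln(1-F(x_{(n+1-i)}))]$ can be sketched in plain C++ for the standard normal hypothesis. This is only an illustration and not the `GoFTest` implementation: the helper names `normalCdf` and `andersonDarlingA2` are hypothetical, and the conversion of $A^2$ into a p-value is not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard normal CDF, i.e. the theoretical distribution of this example.
double normalCdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Anderson-Darling A^2 of a sample with respect to the standard normal
// (hypothetical helper, written for illustration only).
double andersonDarlingA2(std::vector<double> sample) {
    assert(!sample.empty());
    std::sort(sample.begin(), sample.end());
    const std::size_t n = sample.size();
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double logF = std::log(normalCdf(sample[i]));
        // 1 - F(x) = F(-x) for a symmetric distribution; more stable in the tail.
        const double logOneMinusF = std::log(normalCdf(-sample[n - 1 - i]));
        // With 0-based i the weight (2i - 1) of the formula becomes (2i + 1).
        sum += (2.0 * i + 1.0) * (logF + logOneMinusF);
    }
    return -static_cast<double>(n) - sum / static_cast<double>(n);
}
```

The logarithms give a large weight to points in the tails of $F$, which is why this test tends to react strongly to a contamination located far from the core of the distribution.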

##### Kolmogorov-Smirnov Test

In [None]:
pvalue = gof.KolmogorovSmirnovTest();
std::cout << "Kolmogorov-Smirnov p-value = " << pvalue << std::endl;
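
For comparison, the Kolmogorov-Smirnov statistic is the largest distance between the empirical CDF of the sample and the theoretical CDF, $D_n = \sup_x |F_n(x) - F(x)|$. A minimal sketch in plain C++, assuming again the standard normal hypothesis (the names `normalCdf` and `ksDistance` are hypothetical and this is not the code used by `GoFTest`):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard normal CDF, i.e. the theoretical distribution of this example.
double normalCdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Two-sided KS distance of a sample with respect to the standard normal
// (hypothetical helper, written for illustration only).
double ksDistance(std::vector<double> sample) {
    assert(!sample.empty());
    std::sort(sample.begin(), sample.end());
    const double n = static_cast<double>(sample.size());
    double d = 0.0;
    for (std::size_t i = 0; i < sample.size(); ++i) {
        const double f = normalCdf(sample[i]);
        // The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted point,
        // so the distance has to be checked on both sides of the step.
        d = std::max(d, std::max((i + 1) / n - f, f - i / n));
    }
    return d;
}
```

Since only the single largest deviation enters, a small contamination in the tail moves $D_n$ much less than it moves a tail-weighted statistic.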

##### GoF Test using Baker-Cousins $\chi^2$

To compute the Baker-Cousins $\chi^2$ for the data with respect to our assumed theoretical function, we fit the function to the data while keeping $\mu$ and $\sigma$ fixed, so that only the normalization constant is left free.

In [None]:
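// same shape as the theoretical pdf; only the normalization [0] is free
auto pdf2 = new TF1("gaus_pdf","[0]*ROOT::Math::normal_pdf(x,1,0)");
pdf2->SetParameter(0,10);               // non-zero starting value for the normalization
auto fitResult = h1->Fit(pdf2,"L S");   // "L": binned likelihood fit, "S": return the fit result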
double chi2LR = 2 * fitResult->MinFcnValue();
std::cout << "Baker-Cousins chi-squared = " << chi2LR << std::endl;
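
The quantity obtained above is the Poisson likelihood-ratio $\chi^2$ of Baker and Cousins, $\chi^2_\lambda = 2\sum_i \left[\nu_i - n_i + n_i \ln(n_i/\nu_i)\right]$, where $n_i$ is the observed content of bin $i$ and $\nu_i$ the expected one. As a sketch of what that sum looks like, here is a plain C++ version; the function name `bakerCousinsChi2` is hypothetical, and details such as how the fitter treats empty bins may differ in ROOT.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Baker-Cousins chi2 from observed and expected bin contents
// (hypothetical helper, written for illustration only).
double bakerCousinsChi2(const std::vector<double>& observed,
                        const std::vector<double>& expected) {
    assert(observed.size() == expected.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        const double n  = observed[i];
        const double nu = expected[i];
        assert(nu > 0.0);   // assumption: every bin has a positive expectation
        sum += nu - n;
        // n * ln(n/nu) is taken as 0 for an empty bin (its limit for n -> 0).
        if (n > 0.0) sum += n * std::log(n / nu);
    }
    return 2.0 * sum;
}
```

The result is zero when the expectation reproduces the data exactly and grows with the disagreement, bin by bin.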

In order to compute the p-value in this case, we need to calibrate the test statistic using pseudo-experiments.

In [None]:
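// generate pseudo-experiments under the null hypothesis to get the correct p-values
auto hchi2 = new TH1D("hchi2","chi2 distribution",100,0,200);
auto h = new TH1D("h","gaussian experiment",100,-5,5);
for (int iexp = 0; iexp < 1000; ++iexp) {
    pdf2->SetParameter(0,10);           // reset the starting value of the normalization
    h->FillRandom(pdf2->GetName(),n);   // toy sample of n events from the theoretical shape
    auto res = h->Fit(pdf2,"L S Q");    // named res to avoid shadowing the generator r
    hchi2->Fill(2. * res->MinFcnValue() );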
    h->Reset();
}
hchi2->Draw();
gPad->Draw();
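
Once the test statistic has been evaluated on the pseudo-experiments, the most direct p-value estimate is the fraction of pseudo-experiments whose statistic is at least as large as the observed one. A minimal sketch in plain C++, assuming the toy values have been collected in a vector (the name `empiricalPValue` is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Fraction of pseudo-experiments with a test statistic >= the observed value
// (hypothetical helper, written for illustration only).
double empiricalPValue(const std::vector<double>& toyStats, double observed) {
    assert(!toyStats.empty());
    const auto nAbove = std::count_if(toyStats.begin(), toyStats.end(),
                                      [observed](double t) { return t >= observed; });
    return static_cast<double>(nAbove) / static_cast<double>(toyStats.size());
}
```

The resolution of this estimate is limited to one over the number of pseudo-experiments, so with 1000 of them a p-value below $10^{-3}$ cannot be resolved. Describing the distribution with a fitted function is one way around that limit.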

After getting the test statistic distribution from the pseudo-experiments, we fit it with a $\chi^2$ distribution in which the free parameter is the number of degrees of freedom, as we did in the exercise yesterday.

In [None]:
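// chi2 pdf with free normalization and free number of degrees of freedom
auto fchi2 = new TF1("fchi2","[Constant]*ROOT::Math::chisquared_pdf(x,[ndf])",0,200);
// starting values: histogram area for the constant, histogram mean for ndf
fchi2->SetParameters(hchi2->GetEntries()*hchi2->GetBinWidth(1), hchi2->GetMean());
hchi2->Fit(fchi2,"LS");
double ndf = fchi2->GetParameter("ndf");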
std::cout << "Fitted ndf is " << ndf << std::endl;
gStyle->SetOptFit(1111);
gPad->Draw();

In [None]:
pvalue = ROOT::Math::chisquared_cdf_c(chi2LR, ndf);
std::cout << "Computed p-value for Baker-Cousins chi-squared = " << pvalue << std::endl;

The example can be extended by computing the p-value distributions using pseudo-experiments for the KS and AD tests.
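
As a starting point for that extension, the sketch below generates pseudo-experiments from the theoretical Gaussian and evaluates an arbitrary test statistic on each of them. It is written in plain C++ with the standard random generators instead of `TRandomMixMax`; the names `Statistic` and `toyStatistics` are hypothetical, and the statistic itself (a KS distance, an AD $A^2$, or a call into `GoFTest`) has to be supplied by the caller.

```cpp
#include <cassert>
#include <functional>
#include <random>
#include <vector>

// Any test statistic computed from a sample, e.g. a KS distance or an AD A^2.
using Statistic = std::function<double(const std::vector<double>&)>;

// Distribution of `stat` under the null hypothesis Gaus(0,1)
// (hypothetical helper, written for illustration only).
std::vector<double> toyStatistics(const Statistic& stat, int nToys, int n, unsigned seed) {
    assert(nToys > 0 && n > 0);
    std::mt19937 gen(seed);
    std::normal_distribution<double> gaus(0.0, 1.0);
    std::vector<double> sample(n);
    std::vector<double> values;
    values.reserve(nToys);
    for (int t = 0; t < nToys; ++t) {
        for (double& v : sample) v = gaus(gen);   // one pseudo-experiment of n events
        values.push_back(stat(sample));           // test statistic of this pseudo-experiment
    }
    return values;
}
```

If the statistic passed in is itself a p-value, the returned values should be roughly uniform between 0 and 1 when the p-value is well calibrated.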