# Bias of Least-Squares (Chi-Square) Fit

The aim of this notebook is to show the bias that arises when a histogram is fitted with the least-squares method (chi-square histogram fit).

We generate a set of *n* data points (e.g. *n* = 1000) uniformly distributed in an arbitrary interval (e.g. [0,100]), fill them into a histogram, and fit the histogram with a constant function. The fitted constant (the function offset parameter) is directly related to the number of generated events, so its true value is known.

#### 1. Generation of events

We generate *n* events (*n* = 1000) in the range [0,100] and use them to fill a histogram with nbins = 100.

In [None]:
auto h1 = new TH1D("h1","A Constant Distribution",100,0,100);

In [None]:
int n = 1000;
// use seed = 0 to get different numbers every time
TRandom3 r(0); 
for (int i = 0; i < n; ++i) { 
    h1->Fill(r.Uniform(0,100));
}
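For readers without ROOT, the generation step above can be sketched in plain C++ with the standard `<random>` header. This is a minimal illustration, not part of the notebook: `fill_uniform_histogram` is a hypothetical helper, `std::mt19937` stands in for `TRandom3`, and a fixed seed is used for reproducibility (unlike `TRandom3 r(0)`).

```cpp
#include <random>
#include <vector>

// Hypothetical helper: fill `nbins` equal-width bins on [xmin, xmax)
// with `n` uniformly distributed values and return the bin contents.
std::vector<int> fill_uniform_histogram(int n, int nbins, double xmin, double xmax,
                                        unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> uniform(xmin, xmax);  // values in [xmin, xmax)
    std::vector<int> counts(nbins, 0);
    const double width = (xmax - xmin) / nbins;
    for (int i = 0; i < n; ++i) {
        const double x = uniform(gen);
        int bin = static_cast<int>((x - xmin) / width);
        if (bin >= nbins) bin = nbins - 1;  // guard against rounding at the upper edge
        counts[bin] += 1;
    }
    return counts;
}
```

Calling `fill_uniform_histogram(1000, 100, 0.0, 100.0, 42u)` returns 100 bin contents that sum to 1000, the analogue of `h1` after the loop above.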

#### 2. Fit the events with a constant function using the Neyman chi-square

We first create the fit function, in this case a constant function:

$$f(x) = A $$ 

If we have generated *n* events, the true value of the constant is $A_{true} = \frac{n}{N_{bins}}$, where $N_{bins}$ is the number of bins of the histogram.
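With the numbers used in this notebook, the expected bin content and the approximate size of its statistical fluctuation (each bin content is approximately Poisson distributed with mean $A_{true}$) are:

$$A_{true} = \frac{1000}{100} = 10, \qquad \sigma_{bin} \approx \sqrt{A_{true}} = \sqrt{10} \approx 3.2$$

so a shift of one entry in the fitted constant corresponds to a 10% change in $A$.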

In [None]:
auto f1 = new TF1("f1","[A]");
double trueValue = double(n)/h1->GetNbinsX();   

In [None]:
ROOT::Math::MinimizerOptions::SetDefaultMinimizer("Minuit2"); 
TFitResultPtr result_neyman, result_pearson, result_likelihood;


We first perform a chi-square fit. The default in ROOT is the Neyman chi-square, which uses the observed errors, i.e. $\sqrt{n_i}$ for a bin with $n_i$ entries.
Note that we use the fit option **S** to save the result of the fit in the `TFitResult` object.
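For a constant function the Neyman chi-square can be minimized in closed form, which shows where the bias comes from. Summing over the non-empty bins (bins with zero content have zero error and, to our understanding, are excluded by ROOT's default chi-square fit), with $N'_{bins}$ the number of such bins:

$$\chi^2_N(A) = \sum_{i:\,n_i>0} \frac{(n_i - A)^2}{n_i}, \qquad \frac{d\chi^2_N}{dA} = 0 \;\Rightarrow\; \hat{A}_N = \frac{N'_{bins}}{\sum_{i:\,n_i>0} 1/n_i}$$

This is the harmonic mean of the bin contents, which is never larger than their arithmetic mean, hence the downward bias. A minimal plain-C++ sketch (not ROOT code; `neyman_constant_fit` is a hypothetical name):

```cpp
#include <vector>

// Closed-form minimum of the Neyman chi-square for f(x) = A.
// Empty bins (zero error) are skipped; the result is the harmonic mean
// of the non-empty bin contents.
double neyman_constant_fit(const std::vector<int>& counts) {
    double sum_inv = 0.0;
    int used = 0;
    for (int c : counts) {
        if (c <= 0) continue;  // zero observed error: bin excluded
        sum_inv += 1.0 / c;
        ++used;
    }
    return used > 0 ? used / sum_inv : 0.0;
}
```

For two bins with 5 and 20 entries this gives $2/(1/5 + 1/20) = 8$, well below the arithmetic mean of 12.5.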

In [None]:
%jsroot on
auto canvas = new TCanvas();
result_neyman = h1->Fit(f1,"S");
canvas->Draw();

#### 3. Fit the events again using the Pearson chi-square

Now we perform a Pearson chi-square fit (fit option "P"). In this case the expected error, computed from the value of the fit function in each bin, is used.
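The Pearson case also has a closed-form minimum for a constant function. Using the expected error $\sqrt{A}$ in every bin (empty bins included):

$$\chi^2_P(A) = \sum_i \frac{(n_i - A)^2}{A} = \frac{\sum_i n_i^2}{A} - 2\sum_i n_i + N_{bins} A, \qquad \frac{d\chi^2_P}{dA} = 0 \;\Rightarrow\; \hat{A}_P = \sqrt{\frac{1}{N_{bins}}\sum_i n_i^2}$$

This is the root mean square of the bin contents, which is never smaller than their arithmetic mean, hence the upward bias. A minimal plain-C++ sketch (not ROOT code; `pearson_constant_fit` is a hypothetical name):

```cpp
#include <cmath>
#include <vector>

// Closed-form minimum of the Pearson chi-square for f(x) = A, where the
// expected error sqrt(A) is used in every bin. Setting the derivative of
// sum (n_i - A)^2 / A to zero gives A^2 = mean(n_i^2).
double pearson_constant_fit(const std::vector<int>& counts) {
    if (counts.empty()) return 0.0;
    double sum_sq = 0.0;
    for (int c : counts) sum_sq += static_cast<double>(c) * c;
    return std::sqrt(sum_sq / counts.size());
}
```

For bins with 5 and 20 entries this gives $\sqrt{212.5} \approx 14.6$, above the arithmetic mean of 12.5.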

In [None]:
result_pearson = h1->Fit(f1,"S P");

#### 4. Fit events with the binned likelihood method

Now we perform a binned likelihood fit. The fit option to use is "L".
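For comparison, maximizing the binned Poisson likelihood for a constant function also has a closed form:

$$\ln L(A) = \sum_i \left( n_i \ln A - A - \ln n_i! \right), \qquad \frac{d \ln L}{dA} = 0 \;\Rightarrow\; \hat{A}_L = \frac{1}{N_{bins}}\sum_i n_i = \frac{n}{N_{bins}}$$

where the last equality holds because all events fall inside the histogram range. The estimate is exactly $A_{true}$. A minimal plain-C++ sketch (not ROOT code; `likelihood_constant_fit` is a hypothetical name):

```cpp
#include <vector>

// Closed-form maximum of the binned Poisson likelihood for f(x) = A:
// d ln L / dA = sum_i n_i / A - N_bins = 0 gives A = mean bin content.
double likelihood_constant_fit(const std::vector<int>& counts) {
    if (counts.empty()) return 0.0;
    long long total = 0;
    for (int c : counts) total += c;
    return static_cast<double>(total) / counts.size();
}
```

For bins with 5 and 20 entries this returns 12.5, the arithmetic mean.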

In [None]:
result_likelihood = h1->Fit(f1,"S L");

In [None]:
auto g = new TGraphErrors(3);
g->SetMarkerStyle(20);
g->SetPoint(0,1,result_neyman->Value(0));
g->SetPointError(0,0,result_neyman->Error(0));
//
g->SetPoint(1,2,result_pearson->Value(0));
g->SetPointError(1,0,result_pearson->Error(0));
//
g->SetPoint(2,3,result_likelihood->Value(0));
g->SetPointError(2,0,result_likelihood->Error(0));

g->Draw("A EP");

auto line = new TLine(0.8,trueValue,3.2,trueValue);
line->Draw();
line->SetLineColor(kRed);

gPad->Draw();

In [None]:
std::cout << "Neyman chi2 fit bias  = " << result_neyman->Value(0)-trueValue<< std::endl;
std::cout << "Pearson chi2 fit bias = " << result_pearson->Value(0)-trueValue << std::endl;
std::cout << "Likelihood fit bias   = " << result_likelihood->Value(0)-trueValue << std::endl;

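This is as expected. For the Neyman chi-square the bias is $\approx -\chi^2/N_{bins}$, while for the Pearson chi-square it is $\approx +\frac{1}{2}\chi^2/N_{bins}$, where $\chi^2$ is the minimum chi-square value of the corresponding fit. Since $\chi^2/N_{bins} \approx 1$ for a good fit, the Neyman fit underestimates $A$ by about one entry per bin and the Pearson fit overestimates it by about half an entry per bin, while the binned likelihood fit is unbiased.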
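For a constant function the two bias relations are in fact exact, provided no bin is empty and all events fall inside the histogram range (so that $A_{true}$ equals the mean bin content $\bar{n}$). Substituting the closed-form minima $\hat{A}_N$ (harmonic mean) and $\hat{A}_P$ (root mean square) back into the chi-square gives

$$\chi^2_N = \sum_i n_i - N_{bins}\hat{A}_N \;\Rightarrow\; \hat{A}_N - \bar{n} = -\frac{\chi^2_N}{N_{bins}}, \qquad \chi^2_P = 2N_{bins}\,(\hat{A}_P - \bar{n}) \;\Rightarrow\; \hat{A}_P - \bar{n} = +\frac{\chi^2_P}{2N_{bins}}$$

The plain-C++ sketch below (not ROOT code; `BiasCheck` and `check_bias_relations` are hypothetical names) computes both biases and minimum chi-square values so the relations can be checked numerically.

```cpp
#include <cmath>
#include <vector>

struct BiasCheck {
    double neyman_bias, neyman_chi2;
    double pearson_bias, pearson_chi2;
};

// Assumes every bin is non-empty. Biases are taken with respect to the
// mean bin content, which equals the true value when no event is lost.
BiasCheck check_bias_relations(const std::vector<int>& counts) {
    const double nb = static_cast<double>(counts.size());
    double total = 0.0, sum_inv = 0.0, sum_sq = 0.0;
    for (int c : counts) {
        total += c;
        sum_inv += 1.0 / c;
        sum_sq += static_cast<double>(c) * c;
    }
    const double mean = total / nb;
    const double a_neyman = nb / sum_inv;              // harmonic mean
    const double a_pearson = std::sqrt(sum_sq / nb);   // root mean square
    double chi2_n = 0.0, chi2_p = 0.0;
    for (int c : counts) {
        chi2_n += (c - a_neyman) * (c - a_neyman) / c;          // observed error
        chi2_p += (c - a_pearson) * (c - a_pearson) / a_pearson; // expected error
    }
    return {a_neyman - mean, chi2_n, a_pearson - mean, chi2_p};
}
```

For bins with 5 and 20 entries, the Neyman fit gives $\hat{A}_N = 8$ and $\chi^2_N = 9/5 + 144/20 = 9$, so the bias $8 - 12.5 = -4.5$ equals $-\chi^2_N/2$.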

You can repeat this exercise with a different function (e.g. a Gaussian distribution) and, optionally, use pseudo-experiments (Monte Carlo) to study the distribution of the fitted parameters.
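As a possible starting point for the pseudo-experiment study, here is a minimal toy Monte Carlo sketch in plain C++ (not ROOT). It assumes the constant-function case of this notebook and replaces the three ROOT fits with their closed-form optima: the harmonic mean of the non-empty bins (Neyman), the root mean square of the bin contents (Pearson) and the arithmetic mean (likelihood). `MeanBiases` and `run_pseudo_experiments` are hypothetical names, and drawing a uniform bin index is used as a shortcut equivalent to filling uniform values into equal-width bins.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

struct MeanBiases { double neyman, pearson, likelihood; };

// Run `ntoys` pseudo-experiments of `n` events in `nbins` equal bins and
// return the average bias of each estimator with respect to n / nbins.
// Assumes n is large enough that at least one bin is always non-empty.
MeanBiases run_pseudo_experiments(int n, int nbins, int ntoys, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> pick_bin(0, nbins - 1);  // uniform x -> uniform bin
    const double true_value = static_cast<double>(n) / nbins;
    MeanBiases sum{0.0, 0.0, 0.0};
    std::vector<int> counts(nbins);
    for (int t = 0; t < ntoys; ++t) {
        std::fill(counts.begin(), counts.end(), 0);
        for (int i = 0; i < n; ++i) ++counts[pick_bin(gen)];
        double total = 0.0, sum_inv = 0.0, sum_sq = 0.0;
        int nonempty = 0;
        for (int c : counts) {
            total += c;
            sum_sq += static_cast<double>(c) * c;
            if (c > 0) { sum_inv += 1.0 / c; ++nonempty; }  // Neyman skips empty bins
        }
        sum.neyman += nonempty / sum_inv - true_value;
        sum.pearson += std::sqrt(sum_sq / nbins) - true_value;
        sum.likelihood += total / nbins - true_value;
    }
    return {sum.neyman / ntoys, sum.pearson / ntoys, sum.likelihood / ntoys};
}
```

With `run_pseudo_experiments(1000, 100, 200, seed)` one should find a negative Neyman bias of order one entry per bin, a positive Pearson bias of roughly half that size, and a likelihood bias of zero. Histogramming the per-experiment estimates instead of averaging them gives the distribution of the fitted parameter.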