Execute the following cell to render the README.md inside the notebook:

In [8]:
from IPython.display import display, Markdown
with open('README.md', 'r') as fh:
    content = fh.read()
display(Markdown(content))

# Hands-on binfit tutorial notebook

Tutorial for the [binfit](https://stash.desy.de/users/sutclw/repos/binfit)
package developed for template fits in Belle II analyses.

## Background information on `binfit` and relation to `TemplateFitter`

`binfit` is a Python package for performing template fits in pure Python,
developed and maintained by [William
Sutcliffe](mailto:william.sutcliffe08@gmail.com).

Its code is largely based on [Maximillian Welsch](mailto:mwelsch@uni-bonn.de)'s `TemplateFitter` package,
which is also openly [available on GitHub](https://github.com/welschma/TemplateFitter).

Another fork of the `TemplateFitter` package is being actively developed by [Felix Metzner](mailto:felix.metzner@kit.edu), also [on GitHub](https://github.com/FelixMetzner/TemplateFitter). As far as I understand, he extends and generalizes the template fitter, e.g. with support for arbitrary dimensions and adaptive binning.

## Other references / tutorials

An example notebook already exists in the [binfit/examples/](https://stash.desy.de/users/sutclw/repos/binfit/browse/binfit/examples) directory of the package. It requires you to clone binfit (see the *Installation* section below). I will take inspiration from it.

There is a `docs/` directory meant to contain Sphinx package documentation (the tool basf2 also uses) in reStructuredText (rst) format, but the installation instructions there are outdated. Maybe that will change; feel free to contribute if you want.

Max already gave a nice tutorial for his package at the October 2019 B2GM.
In his talk he gave a very nice overview of the theory of template fitting, so I recommend looking at his [slides](https://indico.belle2.org/event/1158/contributions/4726/attachments/2809/4241/b2gm_templatefitter.pdf). If you want to try out Max's `TemplateFitter`, e.g. to compare it to `binfit`, there are already good [tutorials](https://github.com/welschma/TemplateFitter/blob/master/examples/basic_example.ipynb) in his package's `examples` folder. His package also has nice Sphinx [online documentation on readthedocs](https://templatefitter.readthedocs.io/en/latest/index.html).

## Tutorial author

[Michael Eliachevitch](mailto:meliache@uni-bonn.de "email")


## Installation

### Variant 1: pip install via single command
The fastest way is to install it via `pip` with a single command (if you haven't installed it yet, you can simply execute the notebook cell below):

In [None]:
!python3 -m pip install --user --upgrade "git+ssh://git@stash.desy.de:7999/~sutclw/binfit.git"

### Variant 2: Clone the repository (recommended)

I recommend this option, because it makes it easier to navigate the source code locally.
Also, you will be able to browse the documentation and examples which come with the package.

```bash
git clone ssh://git@stash.desy.de:7999/~sutclw/binfit.git
cd binfit
python3 -m pip install --user --upgrade --editable .
```

The last command installs the package and its requirements. The `--editable` flag installs it in development mode: instead of copying the package files into `site-packages`, pip makes Python import them directly from your clone. As a result, any change you make to the source code immediately affects the installed version.
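To check that the editable install really points to your clone, you can ask Python where it would import the package from. The helper below (`package_location` is my own name, not part of binfit) is a minimal sketch that only uses `importlib.util.find_spec` from the standard library:

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_location(name: str) -> Optional[Path]:
    """Return the file Python would import for `name`, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


# With an editable install, this should point into your local clone
# (e.g. .../binfit/binfit/__init__.py) rather than into site-packages.
print(package_location("binfit"))
```

If it prints `None`, the package is not importable from the current kernel, e.g. because it was installed for a different Python interpreter.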

## Usage
As soon as the package is installed, you should be able to successfully import it:

In [124]:
import binfit
import pandas as pd
import numpy as np

### Load dataframes

In [100]:
df_umatch = pd.read_pickle('data/ulnu.pickle')
df_D = pd.read_pickle('data/D.pickle')
df_Dst = pd.read_pickle('data/Dst.pickle')
df_Dstst = pd.read_pickle('data/Dstst.pickle')

df_tot = pd.concat([df_umatch, df_D, df_Dst, df_Dstst])
df_tot.T

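| | 3 | 4 | 13 | 18 | 33 | 35 | 42 | 81 | 94 | 99 | ... | 407937 | 408075 | 408741 | 408830 | 408899 | 409074 | 409201 | 409242 | 409271 | 409299 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dgam_FF_downweight0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| Dgam_FF_downweight1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| Dgam_FF_downweight2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| Dgam_FF_downweight3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| Dgam_FF_downweight4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| veto_slowNeuPi_q2 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | ... | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 | -50.000000 |
| BDT_prediction | 0.861051 | 0.923874 | 0.855147 | 0.895396 | 0.868223 | 0.867276 | 0.881392 | 0.911360 | 0.895396 | 0.869359 | ... | 0.869381 | 0.893322 | 0.876564 | 0.885773 | 0.860214 | 0.900854 | 0.880726 | 0.877784 | 0.869857 | 0.869244 |
| final_weight | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| pp | 0.374950 | 0.878394 | 0.530645 | 0.511303 | 0.887353 | 0.642553 | 0.674591 | 0.534917 | 0.390654 | 0.443020 | ... | 1.507919 | 0.944721 | 0.686002 | 2.069228 | 1.670192 | 1.212945 | 1.006661 | 1.130347 | 1.545020 | 1.919404 |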
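One thing to keep in mind: after `pd.concat`, nothing in `df_tot` by itself tells you which component a row came from (unless such a column already exists in the pickles). If you need that, e.g. for plotting the components stacked, `pd.concat` can add the label through its `keys` argument. A minimal sketch with made-up toy frames; the names `toy_components`, `toy_tot` and the labels are only for illustration, so they don't overwrite the real `df_tot`:

```python
import pandas as pd

# Hypothetical toy frames standing in for df_umatch, df_D, ... (the real ones come from the pickles)
toy_components = {
    "signal": pd.DataFrame({"gx_m": [1.7, 2.4], "tot_w_0": [0.5, 0.7]}),
    "D": pd.DataFrame({"gx_m": [2.0], "tot_w_0": [1.1]}),
}

# keys= adds an outer index level holding the component name of each row
toy_tot = pd.concat(
    list(toy_components.values()),
    keys=list(toy_components.keys()),
    names=["component", "row"],
)

# Alternatively, move the label into an ordinary column
toy_tot_flat = toy_tot.reset_index(level="component")

print(toy_tot.loc["D"])
print(toy_tot_flat["component"].value_counts())
```

For the real data, this would correspond to passing e.g. `keys=['signal', 'D', 'Dst', 'Dstst']` (labels of your choice) to the `pd.concat` call above.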


Inspect the combined dataframe:

In [105]:
df_tot.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 35916 entries, 3 to 409299
Columns: 557 entries, Dgam_FF_downweight0 to FinalWeight
dtypes: float32(195), float64(325), int32(33), int64(1), uint32(3)
memory usage: 122.5 MB


In [106]:
df_tot.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Dgam_FF_downweight0 | 10322.0 | 478.578506 | 2132.512677 | 6.162058e-01 | 1.000000 | 1.000000 | 1.000000 | 10000.000000 |
| Dgam_FF_downweight1 | 10322.0 | 478.574170 | 2132.513648 | 6.845127e-01 | 1.000000 | 1.000000 | 1.000000 | 10000.000000 |
| Dgam_FF_downweight2 | 10322.0 | 478.576607 | 2132.513102 | 6.170432e-01 | 1.000000 | 1.000000 | 1.000000 | 10000.000000 |
| Dgam_FF_downweight3 | 10322.0 | 478.582045 | 2132.511884 | 6.759266e-01 | 1.000000 | 1.000000 | 1.000000 | 10000.000000 |
| Dgam_FF_downweight4 | 10322.0 | 478.577504 | 2132.512901 | 6.884878e-01 | 1.000000 | 1.000000 | 1.000000 | 10000.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| veto_slowNeuPi_q2 | 35916.0 | -28.350130 | 430.470734 | -5.000000e+01 | -50.000000 | -50.000000 | -50.000000 | 10000.000000 |
| BDT_prediction | 35916.0 | 0.889762 | 0.023481 | 8.500130e-01 | 0.869359 | 0.888817 | 0.907894 | 0.986653 |
| final_weight | 11394.0 | 0.140711 | 0.039741 | 0.000000e+00 | 0.116893 | 0.170653 | 0.171431 | 0.171431 |
| pp | 35916.0 | 0.785941 | 0.522171 | -1.192093e-07 | 0.394626 | 0.708361 | 1.082804 | 5.128110 |
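The `count` column above shows that several variables are not filled for every row: the `Dgam_FF_downweight*` columns have 10322 entries and `final_weight` only 11394, out of 35916. Missing values (NaN) in a column you use as weights would typically propagate into the bin contents, so it's worth checking before histogramming. A sketch on a made-up toy frame (`toy` is hypothetical):

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the pattern above: one column is only filled for some rows
toy = pd.DataFrame({
    "pp": [0.37, 0.88, 0.53, 0.51],
    "final_weight": [np.nan, 0.12, np.nan, 0.17],
})

# Number of missing entries per column, listing only columns that have any
n_missing = toy.isna().sum()
print(n_missing[n_missing > 0])
```

On the real data, replace `toy` by `df_tot`, and check in particular the weight column you pass to the histograms.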


### Create histograms

Binfit provides the `Hist1d` and `Hist2d` histogram classes, which were first introduced in `TemplateFitter`. The code was copied over unchanged, so you can switch between the two fitters without worrying about your histogram code.
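For cross-checks, it helps to know what a weighted histogram usually contains: per bin, the sum of the event weights and, as statistical uncertainty, the square root of the sum of squared weights. I haven't verified that binfit stores exactly these quantities, so treat the following as an independent NumPy sketch; the helper `weighted_hist` and the toy numbers are made up for illustration:

```python
import numpy as np


def weighted_hist(data, bins, weights):
    """Return (sum of weights, sqrt(sum of squared weights), bin edges) per bin."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    counts, edges = np.histogram(data, bins=bins, weights=weights)
    sumw2, _ = np.histogram(data, bins=bins, weights=weights**2)
    return counts, np.sqrt(sumw2), edges


# Same variable-width edges as the gx_m binning used further below
toy_edges = np.array([0., 1.6, 1.9, 2.3, 2.5, 2.8])
toy_data = np.array([0.5, 1.7, 1.8, 2.4, 2.8])
toy_weights = np.array([1.0, 0.5, 0.5, 2.0, 1.0])

counts, errors, edges = weighted_hist(toy_data, toy_edges, toy_weights)
print(counts)  # [1. 1. 0. 2. 1.]
print(errors)
```

Note that `numpy.histogram` treats all bins as half-open except the last one, which also includes its right edge (here the value 2.8).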

Let's look at their signatures and docstrings, as we should whenever we use functions or classes we haven't seen before, to see how to use them:

In [129]:
binfit.Hist1d?
# binfit.Hist2d?

Init signature: binfit.Hist1d(bins, range=None, data=None, weights=None)
Docstring:
A 1 dimensional histogram.

File:           ~/.local/lib/python3.6/site-packages/binfit/histograms/hist1d.py
Type:           ABCMeta
Subclasses:


#### Create 1D histograms in variable `gx_m` for different components

In [127]:
var = 'gx_m'
# Variable-width bin edges; the outermost edges define the histogram range
var_binning = np.array([0., 1.6, 1.9, 2.3, 2.5, 2.8])
bin_range = (var_binning[0], var_binning[-1])

# One weighted histogram per component, plus one for the sum of all components
hsig = binfit.Hist1d(bins=var_binning, range=bin_range, data=df_umatch[var], weights=df_umatch['tot_w_0'])
hD = binfit.Hist1d(bins=var_binning, range=bin_range, data=df_D[var], weights=df_D['tot_w_0'])
hDst = binfit.Hist1d(bins=var_binning, range=bin_range, data=df_Dst[var], weights=df_Dst['tot_w_0'])
hDstst = binfit.Hist1d(bins=var_binning, range=bin_range, data=df_Dstst[var], weights=df_Dstst['tot_w_0'])
htot = binfit.Hist1d(bins=var_binning, range=bin_range, data=df_tot[var], weights=df_tot['tot_w_0'])
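Assuming binfit behaves like `numpy.histogram` here, entries outside the outermost bin edges don't end up in any bin. With `var_binning` ending at 2.8, it's worth checking how much weight is lost. The helper `weight_fraction_in_range` below is hypothetical, not part of binfit, and the toy values are made up:

```python
import numpy as np


def weight_fraction_in_range(data, weights, edges):
    """Fraction of the total weight that falls inside [edges[0], edges[-1]]."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # NaN values compare False and therefore count as outside
    inside = (data >= edges[0]) & (data <= edges[-1])
    return weights[inside].sum() / weights.sum()


edges = np.array([0., 1.6, 1.9, 2.3, 2.5, 2.8])
data = np.array([-0.1, 1.0, 2.0, 2.9])
weights = np.array([1.0, 1.0, 1.0, 1.0])
print(weight_fraction_in_range(data, weights, edges))  # 0.5
```

On the real data you could call e.g. `weight_fraction_in_range(df_umatch[var], df_umatch['tot_w_0'], var_binning)` for each component.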

#### Create 2D histograms in `gx_m` and `event_q2`

In [128]:
var2 = 'event_q2'
# Alternative q2 binning extending to higher values:
# var2_binning = np.array([0., 2, 4, 6, 8, 10, 12, 14, 26])
var2_binning = np.array([0., 2, 4, 6, 8])

# bins = [gx_m edges, event_q2 edges], data = [gx_m values, event_q2 values].
# A 1D range tuple doesn't describe a 2D histogram, so (as for hsig2d) bin_range is not passed.
hsig2d = binfit.Hist2d(bins=[var_binning, var2_binning], data=[df_umatch[var], df_umatch[var2]], weights=df_umatch['tot_w_0'])
hD2d = binfit.Hist2d(bins=[var_binning, var2_binning], data=[df_D[var], df_D[var2]], weights=df_D['tot_w_0'])
hDst2d = binfit.Hist2d(bins=[var_binning, var2_binning], data=[df_Dst[var], df_Dst[var2]], weights=df_Dst['tot_w_0'])
hDstst2d = binfit.Hist2d(bins=[var_binning, var2_binning], data=[df_Dstst[var], df_Dstst[var2]], weights=df_Dstst['tot_w_0'])
htot2d = binfit.Hist2d(bins=[var_binning, var2_binning], data=[df_tot[var], df_tot[var2]], weights=df_tot['tot_w_0'])
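To cross-check a 2D histogram, `numpy.histogram2d` computes the same kind of weighted bin contents. I assume, without having checked binfit's code, that `Hist2d` uses the same orientation: the first variable (`gx_m`) along axis 0 and the second (`event_q2`) along axis 1. Note also that with `var2_binning` ending at 8, events with larger `event_q2` are dropped. The toy values below are made up:

```python
import numpy as np

x_edges = np.array([0., 1.6, 1.9, 2.3, 2.5, 2.8])  # gx_m binning
y_edges = np.array([0., 2, 4, 6, 8])                # event_q2 binning

x = np.array([1.0, 1.0, 2.4, 2.6, 2.6])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # the last entry lies above 8 and is dropped
w = np.array([1.0, 2.0, 0.5, 1.5, 4.0])

# Sum of weights per (gx_m, event_q2) bin; axis 0 follows x, axis 1 follows y
counts, xe, ye = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=w)
# Statistical uncertainty per bin: sqrt of the sum of squared weights
sumw2, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=w**2)
errors = np.sqrt(sumw2)

print(counts.shape)  # (5, 4)
print(counts.sum())  # 5.0
```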