# pyBumpHunter: A model-agnostic bump hunting tool in Python

**Presenter: Louis VASLIN - Université Clermont Auvergne (FR)**

![pyBH_logo](img/pyBH_logo.png)


[Link to GitHub](https://github.com/scikit-hep/pyBumpHunter)  
[Link to PyPI](https://pypi.org/project/pyBumpHunter/)

```python3
import numpy as np
import pyBumpHunter as BH
```

## The BumpHunter algorithm

BumpHunter is a useful algorithm, used in the HEP community, to look for an **excess** (or a deficit) in a **data** distribution with respect to a **reference background**.

It computes the **local** and **global p-values** of the most significant **localized deviation** (bump) in the data.
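A minimal sketch of the local p-value of a single window, assuming the background content of the window is a known Poisson expectation and that a window deviating in the opposite direction gets p = 1 (a common convention, assumed here). The helper `local_pvalue` and its `mode` argument are hypothetical and are not pyBumpHunter's API; only `scipy.stats.poisson` is a real library call.

```python3
from scipy.stats import poisson


def local_pvalue(n_data, n_bkg, mode="excess"):
    """Poisson p-value of a single window (hypothetical helper).

    n_data: observed number of data events in the window.
    n_bkg: expected number of background events in the window.
    mode="excess": probability to observe at least n_data events.
    mode="deficit": probability to observe at most n_data events.
    A window deviating in the other direction gets p = 1 (assumed convention).
    """
    if n_bkg <= 0:
        return 1.0  # no background expectation: the window is ignored
    if mode == "excess":
        if n_data <= n_bkg:
            return 1.0
        return float(poisson.sf(n_data - 1, n_bkg))  # P(N >= n_data)
    if mode == "deficit":
        if n_data >= n_bkg:
            return 1.0
        return float(poisson.cdf(n_data, n_bkg))  # P(N <= n_data)
    raise ValueError("mode must be 'excess' or 'deficit'")


# 20 events observed where 10 are expected: an excess with a small p-value
print(local_pvalue(20, 10))
# 2 events observed where 10 are expected: a deficit
print(local_pvalue(2, 10, mode="deficit"))
```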

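The global p-value is computed by **generating toys** (pseudo-data) from the reference background and scanning them like real data.  
The global p-value is then *the p-value of the local p-values*: the fraction of toys whose most significant local p-value is at least as small as the one found in the data.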
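The global p-value step can be illustrated with a hypothetical helper that takes the smallest local p-value found in data and the smallest local p-value found in each background-only toy. The numbers below are made up for illustration; this is a sketch of the definition, not pyBumpHunter's code.

```python3
import numpy as np
from scipy.stats import norm


def global_pvalue(p_local_data, p_local_toys):
    """Fraction of background-only toys whose smallest local p-value
    is at least as small as the one found in data."""
    p_local_toys = np.asarray(p_local_toys)
    return np.count_nonzero(p_local_toys <= p_local_data) / p_local_toys.size


def significance(p_value):
    """One-sided Gaussian significance (in sigma) of a p-value."""
    return float(norm.isf(p_value))


# Hypothetical numbers: smallest local p-value in data and in 5 toys
p_data = 0.01
p_toys = [0.5, 0.2, 0.01, 0.3, 0.001]
p_glob = global_pvalue(p_data, p_toys)  # 2 toys out of 5 are as extreme
print(p_glob, significance(p_glob))
```

With N toys, the smallest non-zero global p-value is 1/N, so the number of toys limits the global significance that can be quoted.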

### Why pyBumpHunter?

Public implementation in pure Python (numpy/scipy).  
Doesn't depend on the ROOT software.  
Recently integrated into Scikit-HEP.  
Still under active development.

## 1D scan

The basic scan compares a data distribution to a reference background.  
It gives the position and width of the bump, as well as its local and global p-values.  
It also gives a *rough* estimate of the signal content of the bump (data - ref).
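A self-contained numpy/scipy sketch of what a 1D scan does: every contiguous window of bins is tested, and the one with the smallest local p-value (excess only) is kept, together with a rough signal estimate (data - ref). The functions `scan_1d` and `local_pvalue`, the parameters `width_min`/`width_max`, and the toy distributions are all hypothetical; pyBumpHunter's `BumpHunter1D` has its own options and also computes the global p-value from toys, which this sketch leaves out.

```python3
import numpy as np
from scipy.stats import norm, poisson


def local_pvalue(n_data, n_bkg):
    """Excess-only Poisson p-value of one window (p = 1 if there is no excess)."""
    if n_bkg <= 0 or n_data <= n_bkg:
        return 1.0
    return float(poisson.sf(n_data - 1, n_bkg))  # P(N >= n_data | n_bkg)


def scan_1d(hist_data, hist_bkg, width_min=1, width_max=None):
    """Try every contiguous window of width_min..width_max bins and
    keep the one with the smallest local p-value."""
    hist_data = np.asarray(hist_data)
    hist_bkg = np.asarray(hist_bkg, dtype=float)
    nbins = hist_data.size
    if width_max is None:
        width_max = nbins // 2
    best = {"pos": None, "width": None, "p_local": 1.0, "signal": 0.0}
    for width in range(width_min, width_max + 1):
        for pos in range(nbins - width + 1):
            d = hist_data[pos:pos + width].sum()
            b = hist_bkg[pos:pos + width].sum()
            p = local_pvalue(d, b)
            if p < best["p_local"]:
                # Rough signal estimate: data minus reference in the window
                best = {"pos": pos, "width": width, "p_local": p, "signal": d - b}
    return best


# Toy example: exponential background (expected counts) and a Gaussian bump at 5.5
rng = np.random.default_rng(42)
edges = np.linspace(0.0, 20.0, 41)
hist_bkg = 10_000 * np.diff(1.0 - np.exp(-edges / 4.0))
hist_sig, _ = np.histogram(rng.normal(5.5, 0.35, size=300), bins=edges)
hist_data = rng.poisson(hist_bkg) + hist_sig

result = scan_1d(hist_data, hist_bkg, width_min=1, width_max=6)
lo, hi = edges[result["pos"]], edges[result["pos"] + result["width"]]
print(f"bump in [{lo}, {hi}], local p-value {result['p_local']:.2e}, "
      f"local significance {norm.isf(result['p_local']):.1f} sigma, "
      f"signal ~ {result['signal']:.0f} events")
```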

```python3
# Generate some background and data

# Create a BumpHunter1D instance

# Call the scan method

# Display results
```

## 2D scan

This is the extension of the BumpHunter algorithm to 2D distributions.  
It works the same way as in 1D.
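The same idea in 2D, as a hypothetical sketch: rectangular windows of a few fixed sizes are slid over the 2D histogram and the one with the smallest local p-value is kept. The function names, the list of window sizes and the toy histograms are assumptions made for the example, not pyBumpHunter's `BumpHunter2D` implementation.

```python3
import numpy as np
from scipy.stats import poisson


def local_pvalue(n_data, n_bkg):
    """Excess-only Poisson p-value of one window (p = 1 if there is no excess)."""
    if n_bkg <= 0 or n_data <= n_bkg:
        return 1.0
    return float(poisson.sf(n_data - 1, n_bkg))


def scan_2d(hist_data, hist_bkg, widths=((1, 1), (2, 2), (3, 3))):
    """Slide rectangular windows of each (width_x, width_y) over a 2D histogram
    and keep the window with the smallest local p-value."""
    hist_data = np.asarray(hist_data)
    hist_bkg = np.asarray(hist_bkg, dtype=float)
    nx, ny = hist_data.shape
    best = {"pos": None, "width": None, "p_local": 1.0}
    for wx, wy in widths:
        for ix in range(nx - wx + 1):
            for iy in range(ny - wy + 1):
                d = hist_data[ix:ix + wx, iy:iy + wy].sum()
                b = hist_bkg[ix:ix + wx, iy:iy + wy].sum()
                p = local_pvalue(d, b)
                if p < best["p_local"]:
                    best = {"pos": (ix, iy), "width": (wx, wy), "p_local": p}
    return best


# Flat background of 50 expected events per bin, with an excess in a 2x2 block
rng = np.random.default_rng(1)
hist_bkg = np.full((20, 20), 50.0)
hist_data = rng.poisson(hist_bkg)
hist_data[8:10, 12:14] += 60

result = scan_2d(hist_data, hist_bkg)
print(result)
```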

```python3
# Generate some background and data

# Create a BumpHunter2D instance

# Call the scan method

# Display results
```


## Signal injection

pyBumpHunter can also perform signal injection tests with the BumpHunter algorithm, to evaluate the sensitivity to a given signal model.  
The injection is performed using the reference background distribution and a signal distribution.  
The injection stops when the required sensitivity is reached (in terms of global significance).
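A simplified, hypothetical sketch of an injection test: the signal template is scaled by an increasing strength `mu`, Poisson toys are drawn from background plus signal, and the median of their smallest local p-values is converted to a global significance using background-only toys. Stopping on the median significance, the strength grid and all names are assumptions of this sketch; pyBumpHunter's own injection method may use different criteria and options.

```python3
import numpy as np
from scipy.stats import norm, poisson


def min_local_pvalue(hist_data, hist_bkg, widths=(1, 2, 3, 4)):
    """Smallest excess-only local p-value over all contiguous windows."""
    cd = np.concatenate([[0.0], np.cumsum(hist_data)])
    cb = np.concatenate([[0.0], np.cumsum(hist_bkg)])
    best = 1.0
    for w in widths:
        d = cd[w:] - cd[:-w]  # data counts in every window of w bins
        b = cb[w:] - cb[:-w]  # background counts in the same windows
        p = np.where(d > b, poisson.sf(d - 1, b), 1.0)
        best = min(best, float(p.min()))
    return best


def inject(hist_bkg, hist_sig, target_sigma=2.0, strengths=None,
           n_bkg_toys=1000, n_sig_toys=50, seed=0):
    """Increase the signal strength until the median global significance
    of the injected toys reaches target_sigma (hypothetical procedure)."""
    if strengths is None:
        strengths = np.arange(0.5, 10.01, 0.5)
    rng = np.random.default_rng(seed)
    # Reference: smallest local p-values of background-only toys
    p_bkg = np.array([min_local_pvalue(rng.poisson(hist_bkg), hist_bkg)
                      for _ in range(n_bkg_toys)])
    for mu in strengths:
        expected = hist_bkg + mu * hist_sig
        p_sig = [min_local_pvalue(rng.poisson(expected), hist_bkg)
                 for _ in range(n_sig_toys)]
        p_global = np.count_nonzero(p_bkg <= np.median(p_sig)) / n_bkg_toys
        # If no toy is as extreme, only p_global < 1/n_bkg_toys is known,
        # so 1/n_bkg_toys is used instead (a conservative choice).
        sigma = float(norm.isf(max(p_global, 1.0 / n_bkg_toys)))
        if sigma >= target_sigma:
            return float(mu), sigma
    return None, None  # target not reached in the tested range


# Exponential background (1000 events) and a Gaussian signal at x = 6
edges = np.linspace(0.0, 20.0, 21)
hist_bkg = 1000 * np.diff(1.0 - np.exp(-edges / 4.0))
hist_sig = 10 * np.diff(norm.cdf(edges, loc=6.0, scale=0.5))  # 10 events at mu = 1
mu, sigma = inject(hist_bkg, hist_sig)
print(mu, sigma)
```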

```python3
# Generate some background and signal

# Create a BumpHunter1D instance

# Call the injection method

# Display results
```

## Side-band normalization

This option corrects the number of background events used in the calculation of the local p-value.  
The normalization scale is computed as the data/background ratio outside of the bump window.  
![rescale](img/rescale.png) 
This normalization is done automatically during the scan.  
This method can also penalize the p-value when data and background disagree outside of the bump window.
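A minimal sketch of the side-band normalization for one window, assuming excess-only Poisson p-values; in a real scan the ratio would be recomputed for every tested window. The function names and the toy histograms are hypothetical, and the p-value penalty for side-band disagreement is not modeled here.

```python3
import numpy as np
from scipy.stats import poisson


def sideband_scale(hist_data, hist_bkg, pos, width):
    """Data/background ratio computed outside the window [pos, pos + width)."""
    outside = np.ones(len(hist_data), dtype=bool)
    outside[pos:pos + width] = False
    return hist_data[outside].sum() / hist_bkg[outside].sum()


def local_pvalue_sideband(hist_data, hist_bkg, pos, width):
    """Excess-only local p-value with the background rescaled by the side-band ratio."""
    scale = sideband_scale(hist_data, hist_bkg, pos, width)
    d = hist_data[pos:pos + width].sum()
    b = scale * hist_bkg[pos:pos + width].sum()
    if b <= 0 or d <= b:
        return 1.0, scale
    return float(poisson.sf(d - 1, b)), scale


# Background shape is right but its normalization is 10% too low, plus a bump in bins 8-9
hist_bkg = np.full(20, 100.0)
hist_data = np.full(20, 110)
hist_data[8:10] += 40

p_norm, scale = local_pvalue_sideband(hist_data, hist_bkg, pos=8, width=2)
p_raw = float(poisson.sf(hist_data[8:10].sum() - 1, hist_bkg[8:10].sum()))
print(scale, p_raw, p_norm)  # scale is 1.1 and p_norm > p_raw
```

Without the rescaling, the overall 10% normalization offset would be counted as part of the bump and make it look more significant.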

```python3
# Generate some background and data

# Create a BumpHunter1D instance

# Call the scan method with side-band normalization on

# Display results
```

## Multi-channel combination (under development)

This option combines multiple channels in order to obtain a combined global p-value and significance.  
There are two combination methods:

* **multiply**  
    Applicable only if all channels have the same binning.  
    For each position and width, the combined local p-value is defined as the product of the local p-values of the individual channels.  
    The bump window is then defined as usual.

* **exclude**  
    Applicable even if the channels have different binnings.  
    The combined bump window is defined as the intersection of the bump windows of each channel.  
    If there is no intersection, the local p-value is set to 1.  
    Otherwise, the combined local p-value is the product of the local p-values of the individual channels.
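Both combination rules can be sketched in a few lines, assuming the per-channel local p-values (for `multiply`) or the per-channel bump windows and their local p-values (for `exclude`) have already been computed. The function names and input layouts are hypothetical and are not the planned pyBumpHunter syntax.

```python3
import numpy as np


def combine_multiply(p_local):
    """'multiply' mode: p_local has shape (n_channels, n_windows), with the same
    windows in every channel. Returns the best window index and its combined p-value."""
    p_comb = np.prod(np.asarray(p_local), axis=0)
    best = int(np.argmin(p_comb))
    return best, float(p_comb[best])


def combine_exclude(windows, p_locals):
    """'exclude' mode: windows is a list of (low, high) bump windows, one per channel,
    in units of the observable. Returns the intersection and the combined p-value."""
    low = max(w[0] for w in windows)
    high = min(w[1] for w in windows)
    if low >= high:
        return None, 1.0  # no overlap: combined local p-value set to 1
    return (low, high), float(np.prod(p_locals))


print(combine_multiply([[0.5, 0.01, 0.2], [0.4, 0.05, 0.01]]))
print(combine_exclude([(4.0, 6.0), (5.0, 7.0)], [0.01, 0.02]))
print(combine_exclude([(1.0, 2.0), (3.0, 4.0)], [0.01, 0.02]))
```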

### Planned syntax

To define multiple channels with different binnings, we pass a *list of binnings* to the `bins` option.

```python3
bh = BH.BumpHunter1D()
bh.bins=[40,50] # 40 bins for ch0 and 50 bins for ch1
```

To call a scan method in one of the two multi-channel modes, we pass lists of data and background distributions as arguments.  
The combination method is selected with the `comb` argument.
```python3
bh.bump_scan(data, bkg, comb='exclude')
```

## Plan and future development

### For the next release

* Multi-channel combination

### Major additions planned

* API refactoring, with a DataHandler class that manages the histogramming separately

* Add the possibility to treat systematic uncertainties

* Other features that might be interesting (***your* ideas are welcome!**)