# PyUnfold Examples

The notebook containing these examples can be found on the [PyUnfold GitHub repository](https://github.com/jrbourbeau/pyunfold/tree/master/docs/examples.ipynb).

In [None]:
from pyunfold import iterative_unfold

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(2)
%matplotlib inline

## Example 1: Two-bin case

The simplest unfolding case consists of observing a number of counts for two classes of observations, class 1 and class 2. 

### Observed distribution

These observations should be formatted in an array-like object (e.g. `list`, `tuple`, `numpy.ndarray`, etc.). In this example, there are 100 observations of class 1 and 150 observations of class 2.

In [None]:
data = [100, 150]

Note that here, the first item in `data` corresponds to class 1 while the second item corresponds to class 2. This ordering is important and should be consistent throughout. 

The error for these observations is also needed for unfolding. Here, we'll assume the errors are simply Poisson counting errors (i.e. $\mathrm{error}_i = \sqrt{\mathrm{counts}_i}$).

In [None]:
data_err = np.sqrt(data)

In [None]:
labels = ['Class 1', 'Class 2']
sns.barplot(x=labels, y=data, palette="viridis")
plt.ylabel('Observed distribution')
plt.show()

### Response matrix

However, the quality of these observations is not perfect. Sometimes they are incorrect (i.e. a particular observation of class 1 gets misidentified as belonging to class 2, and vice versa). These misidentification probabilities are encapsulated in the response matrix.

In [None]:
response = [[0.9, 0.15],
            [0.1, 0.85]]
response_err = [[0.01, 0.01],
                [0.01, 0.01]]

In [None]:
sns.heatmap(response, annot=True, cmap='viridis',
            xticklabels=labels, yticklabels=labels,
            vmin=0, vmax=1)
plt.show()
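To make the role of the response matrix concrete, the following minimal sketch folds a set of true counts through it to obtain the expected observed counts. It assumes that element `[i][j]` is the probability of observing class `i` given true class `j`, so that each column sums to 1. The `true_counts` values are hypothetical and chosen purely for illustration.

```python
import numpy as np

# Response matrix from this example; rows index the observed class,
# columns index the true class (assumed convention)
response = np.array([[0.9, 0.15],
                     [0.1, 0.85]])

# Hypothetical true counts, used only for illustration
true_counts = np.array([120.0, 130.0])

# Forward folding: expected observed counts given the true counts
expected_observed = response @ true_counts

# Each column sums to 1 because every true event is observed in some class
column_sums = response.sum(axis=0)
```

Unfolding addresses the inverse problem: estimating the true counts given the observed counts and the response matrix.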

### Detection efficiencies

In [None]:
efficiencies = [1.0, 1.0]
efficiencies_err = [0.01, 0.01]

### Perform iterative unfolding

In [None]:
unfolded_results = iterative_unfold(data, data_err,
                                    response, response_err,
                                    efficiencies, efficiencies_err)

In [None]:
unfolded_results
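For intuition about what an iterative Bayesian unfolding does, here is a minimal sketch of a single Bayes update for the two-bin inputs above. It is not PyUnfold's implementation: the uniform starting prior, the variable names, and the single-step structure are all assumptions made for illustration, and error propagation and the stopping criterion are omitted.

```python
import numpy as np

data = np.array([100.0, 150.0])
response = np.array([[0.9, 0.15],
                     [0.1, 0.85]])
efficiencies = np.array([1.0, 1.0])

# Assumed starting point: a uniform prior over the two true classes
prior = np.full(2, 0.5)

# Bayes' theorem: P(C_j | E_i) = P(E_i | C_j) P(C_j) / sum_k P(E_i | C_k) P(C_k)
joint = response * prior  # scales column j by prior[j]
posterior = joint / joint.sum(axis=1, keepdims=True)

# Reassign each observed count to the true classes, then correct for efficiency
unfolded_once = (data[:, np.newaxis] * posterior).sum(axis=0) / efficiencies
```

In an iterative scheme, the normalized result of one update would serve as the prior for the next, and the procedure would stop once successive estimates no longer change appreciably.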

In [None]:
unfolded_dist = unfolded_results['unfolded']
unfolded_dist

In [None]:
sns.barplot(x=labels, y=unfolded_dist, palette="viridis")
plt.ylabel('Unfolded distribution')
plt.show()

Next, compare the observed and unfolded count distributions for each class.

In [None]:
df = pd.DataFrame({'label': 2*labels,
                   'dist': list(data) + list(unfolded_dist),
                   'type': ['Observed', 'Observed', 'Unfolded', 'Unfolded']})
# factorplot was renamed to catplot in seaborn 0.9
sns.catplot(x='label', y='dist', data=df,
            hue='type', kind='bar', palette='viridis',
            legend=False)
plt.xlabel('')
plt.ylabel('Distributions')
plt.legend()
plt.show()

## Example 2: Multi-bin case

### True and observed distributions

In [None]:
num_samples = int(1e5)
true_samples = np.random.normal(loc=0.0, scale=1.0, size=num_samples)

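For this example, the observed (measured) samples are the true samples with random Gaussian noise added. The noise has a mean of 0.3 and a standard deviation of 0.5, so the observed distribution is both shifted and broadened relative to the true one.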

In [None]:
random_noise = np.random.normal(loc=0.3, scale=0.5, size=num_samples)
observed_samples = true_samples + random_noise
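As a quick sanity check on this smearing, the sum of two independent Gaussians is itself Gaussian, with the means added and the variances added. The sketch below verifies this numerically; it uses its own random generator with an arbitrary seed, so its samples are not the ones used in the rest of this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # local generator; the seed is arbitrary
num_samples = int(1e5)

true_samples = rng.normal(loc=0.0, scale=1.0, size=num_samples)
noise = rng.normal(loc=0.3, scale=0.5, size=num_samples)
observed_samples = true_samples + noise

# Means add, and variances add for independent variables
expected_mean = 0.0 + 0.3
expected_std = np.sqrt(1.0**2 + 0.5**2)

sample_mean = observed_samples.mean()
sample_std = observed_samples.std()
```

With this many samples, `sample_mean` and `sample_std` should agree with the expected values to within a few parts in a thousand.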

In [None]:
bins = np.linspace(-3, 3, 21)

In [None]:
plt.hist(true_samples, bins=bins, histtype='step', lw=3,
         alpha=0.7, label='True distribution')
plt.hist(observed_samples, bins=bins, histtype='step', lw=3,
         alpha=0.7, label='Observed distribution')
plt.xlabel('X')
plt.ylabel('Counts')
plt.legend()
plt.show()

In [None]:
data_true, _ = np.histogram(true_samples, bins=bins)
data_true = np.array(data_true, dtype=float)
data_true

In [None]:
data_observed, _ = np.histogram(observed_samples, bins=bins)
data_observed

As before, we assume Poisson counting errors on the observed counts.

In [None]:
data_observed_err = np.sqrt(data_observed)
data_observed_err

### Response matrix

In [None]:
response, _, _ = np.histogram2d(observed_samples, true_samples, bins=bins)
response_err = np.sqrt(response)

Column-normalize the response matrix so that each element gives $P(E|C)$, the probability of observing effect $E$ given true cause $C$.

In [None]:
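# Compute column sums before normalizing so counts and errors share the same scale
column_sums = response.sum(axis=0)
response = response / column_sums
response_err = response_err / column_sums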
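As a small worked illustration of column normalization, the sketch below uses a hypothetical 2x2 matrix of counts (rows are observed bins, columns are true bins). The `toy_` names and the count values are made up for this illustration. The counts and their Poisson errors are divided by the same column sums, computed from the raw counts.

```python
import numpy as np

# Hypothetical counts: rows are observed bins, columns are true bins
counts = np.array([[8.0, 2.0],
                   [2.0, 6.0]])
counts_err = np.sqrt(counts)

# Column sums of the raw counts: total events generated in each true bin
column_sums = counts.sum(axis=0)

# Scale counts and errors by the same factors
toy_response = counts / column_sums
toy_response_err = counts_err / column_sums
```

After normalization, each column of `toy_response` sums to 1, and each error keeps the same relative size it had before normalization.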

In [None]:
plt.imshow(response)
plt.colorbar()
plt.show()

### Efficiencies

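For now, we'll assume uniform efficiencies of 1. Because each column of the normalized response matrix sums to 1, the column sums give exactly these efficiencies.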

In [None]:
efficiencies = response.sum(axis=0)
efficiencies

In [None]:
efficiencies_err = np.full_like(efficiencies, 0.1, dtype=float)
efficiencies_err

### Perform iterative unfolding

In [None]:
unfolded_results = iterative_unfold(data=data_observed,
                                    data_err=data_observed_err,
                                    response=response,
                                    response_err=response_err,
                                    efficiencies=efficiencies,
                                    efficiencies_err=efficiencies_err,
                                    ts='ks',
                                    ts_stopping=0.001)

In [None]:
unfolded_results

Finally, compare the true, observed, and unfolded distributions.

In [None]:
bin_midpoints = (bins[1:] + bins[:-1]) / 2
plt.hist(true_samples, bins=bins, histtype='step', lw=3,
         alpha=0.7,
         label='True distribution')
plt.hist(observed_samples, bins=bins, histtype='step', lw=3,
         alpha=0.7,
         label='Observed distribution')

plt.errorbar(bin_midpoints, unfolded_results['unfolded'],
             yerr=unfolded_results['sys_err'],
             alpha=0.8,
             elinewidth=2,
             capsize=3,
             ls='None', marker='o', ms=5, 
             label='Unfolded distribution')

plt.xlabel('X')
plt.ylabel('Counts')
plt.legend()
plt.show()