In [2]:
import boost_histogram as bh
import numpy as np
import matplotlib.pyplot as plt

## 5: Generalized histograms and Accumulators

Boost-Histogram offers more than ordinary histograms: it is a generalized histogram library that supports many so-called binned statistics.

These binned statistics are represented by accumulators: classes that accept samples and compute something from them. For example, the arithmetic mean:

In [3]:
mean = bh.accumulators.Mean()
mean.fill([.3, .4, .5])

Mean(count=3, value=0.4, variance=0.01)

Interesting properties of the accumulator can be accessed as attributes.

In [5]:
print(f"mean.count={mean.count} mean.value={mean.value:g} mean.variance={mean.variance:g}")

# Python 3.8+:
# print(f"{mean.count=} {mean.value=} {mean.variance=}")

mean.count=3.0 mean.value=0.4 mean.variance=0.01
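
To make the idea of an accumulator concrete, here is a minimal pure-Python sketch of a running-mean accumulator based on Welford's online algorithm. The class name `RunningMean` is hypothetical, and this is not necessarily how `bh.accumulators.Mean` is implemented; it only reproduces the same three attributes for the samples used above.

```python
class RunningMean:
    """Hypothetical stand-in for illustration; not the library's class."""

    def __init__(self):
        self.count = 0
        self.value = 0.0  # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def fill(self, samples):
        # Welford update: one pass, no need to store the samples
        for x in samples:
            self.count += 1
            delta = x - self.value
            self.value += delta / self.count
            self._m2 += delta * (x - self.value)
        return self

    @property
    def variance(self):
        # Sample variance (divides by count - 1), consistent with the 0.01 above
        return self._m2 / (self.count - 1) if self.count > 1 else float("nan")


acc = RunningMean().fill([0.3, 0.4, 0.5])
print(f"count={acc.count} value={acc.value:g} variance={acc.variance:g}")
```

The point of the design is that the accumulator only keeps a few numbers of state, so it can be updated sample by sample.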


Here are the other currently supported accumulators. The list is growing and more suggestions from users are welcome!

In [8]:
[x for x in dir(bh.accumulators) if not x.startswith("__")]

['Mean', 'Sum', 'WeightedMean', 'WeightedSum']

Accumulator | Description | Attributes
----------- | :----------- | :-----------
Mean        | Computes the arithmetic mean of samples | count, value, variance
WeightedMean | Computes the mean of weighted samples | sum_of_weights, sum_of_weights_squared, value, variance
Sum         | Computes sum of real weights more accurately | value
WeightedSum | Like Sum, but also keeps track of weight variance | value, variance

Most of these should be pretty clear, but let's talk a bit about computing sums on a computer.

### Spotlight: the Sum accumulator

Why we have a special accumulator for computing sums is best demonstrated in action.

In [23]:
# summing floats that differ a lot in magnitude leads to roundoff errors
values = [1e100, 1, -1e100]

print(f"np.sum             : {np.sum(values)}") 

# Sum accumulator keeps track of these round-off errors
s = bh.accumulators.Sum()
s.fill(values)
print(f"bh.accumulators.Sum: {s.value}")

np.sum             : 0.0
bh.accumulators.Sum: 1.0


The Sum accumulator produces the correct result, while `np.sum` does not. This is not a failure of `np.sum`; it is just how floating-point arithmetic on a computer works.

In [24]:
1e16 + 1 == 1e16

True
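
The reason is that double-precision numbers near `1e16` are spaced 2 apart, so `1e16 + 1` falls between two representable values and is rounded back to `1e16`. A short check with `np.spacing`, which returns the gap to the next representable number:

```python
import numpy as np

gap = np.spacing(1e16)      # distance from 1e16 to the next larger double
print(gap)
print(1e16 + 1 == 1e16)     # the increment is smaller than the gap
print(1e16 + gap == 1e16)   # a full gap is representable, so this differs
```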

The Sum accumulator uses a special algorithm (the [Kahan-Babuška-Neumeier algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm)) to compensate for this loss in precision at the cost of doing 4x as many computations and using 2x as much memory compared to ordinary summation. Because of the performance penalty, it is not used by default.
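
A minimal pure-Python sketch of Neumaier's variant of compensated summation shows the idea. The function names `naive_sum` and `neumaier_sum` are hypothetical, and the library's own implementation may differ in detail; the sketch only illustrates how a second variable collects the low-order bits that ordinary addition drops.

```python
def naive_sum(values):
    """Ordinary left-to-right floating-point summation."""
    total = 0.0
    for x in values:
        total += x
    return total


def neumaier_sum(values):
    """Compensated summation (Neumaier's variant of Kahan summation)."""
    total = 0.0
    compensation = 0.0  # accumulates the rounding errors of each addition
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            # low-order bits of x were lost in t; recover them
            compensation += (total - t) + x
        else:
            # low-order bits of total were lost in t; recover them
            compensation += (x - t) + total
        total = t
    return total + compensation


values = [1e100, 1.0, -1e100]
print(naive_sum(values))
print(neumaier_sum(values))
```

The extra variable and the extra additions per sample are where the additional memory and computation come from.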

As a general rule, in Boost-Histogram we give you **choices and reasonable defaults**. If you need extra accuracy, use `Sum`. By default, you get normal summation, which is less precise but faster.

## 6: Changing the storage

While you can import and use these accumulators directly, their intended use is in your *histogram storage*. The storage of a histogram holds its accumulators, one per cell.

A remark on terminology:
* Bin: an element of a single axis (an interval, for a continuous axis)
* Cell: an element of the histogram, addressed by one bin from each axis

![](histogram_layout.svg)
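
The bin/cell distinction only becomes visible with more than one axis. The following sketch uses plain NumPy rather than Boost-Histogram, purely to show the counting: a histogram with 3 bins on one axis and 2 bins on the other has 3 × 2 = 6 cells. The data values are made up for illustration.

```python
import numpy as np

x = np.array([0.5, 1.5, 2.5, 2.5])
y = np.array([0.5, 0.5, 1.5, 1.5])

# 3 bins on the x axis, 2 bins on the y axis
counts, x_edges, y_edges = np.histogram2d(
    x, y, bins=(3, 2), range=((0, 3), (0, 2))
)

n_x_bins = len(x_edges) - 1
n_y_bins = len(y_edges) - 1
print(n_x_bins, n_y_bins)  # bins per axis
print(counts.shape)        # one cell per (x bin, y bin) pair
print(counts[2, 1])        # the cell addressed by x bin 2 and y bin 1
```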

In [None]:
hist6 = bh.Histogram(bh.axis.Regular(10, 0, 10), storage=bh.storage.Mean())

In [None]:
hist6.fill([0.5]*3, sample=[.3, .4, .5])

In [None]:
hist6[0]

In [None]:
hist6.view()

In [None]:
hist6.view().value

In [None]:
hist6.view().variance
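
Conceptually, a `Mean` storage keeps one mean accumulator per cell and routes each sample to the cell selected by the fill coordinate. Here is a rough NumPy sketch of that bookkeeping, using the same fill values as above. It tracks only the count and the mean, ignores the variance and the flow bins, and is an illustration rather than the library's implementation.

```python
import numpy as np

edges = np.linspace(0, 10, 11)       # 10 regular bins on [0, 10)
x = np.array([0.5, 0.5, 0.5])        # coordinates that select the cell
sample = np.array([0.3, 0.4, 0.5])   # values to be averaged in that cell

idx = np.digitize(x, edges) - 1      # bin index of each coordinate

counts = np.zeros(10)
sums = np.zeros(10)
np.add.at(counts, idx, 1)            # number of samples per cell
np.add.at(sums, idx, sample)         # sum of samples per cell

# mean per cell; cells that were never filled stay NaN
means = np.divide(sums, counts, out=np.full(10, np.nan), where=counts > 0)
print(counts)
print(means)
```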

## 7: Making a density histogram

Let's try to make a density histogram like NumPy's.

In [None]:
bins = [-10, -7, -4, -3, -2, -1, -.75, -.5, -.25, 0, .25, .5, .75, 1, 2, 3, 4, 7, 10]
d7, e7 = np.histogram(data1 - 3.5, bins=bins, density=True)
plt.hist(data1 - 3.5, bins=bins, density=True);

Yes, it's ugly. Don't judge.

We don't have a `.density`! What do we do? (Note: `density=True` is supported if you do not return a `bh.Histogram` object.)

In [None]:
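import functools
import operator

hist7 = bh.numpy.histogram(data1 - 3.5, bins=bins, histogram=bh.Histogram)

# axes.widths holds one broadcastable array of bin widths per axis;
# multiplying them together gives the area (or volume) of every cell
widths = hist7.axes.widths
area = functools.reduce(operator.mul, widths)

area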

Yes, this is more complicated than it needs to be for 1D, but it's general.

In [None]:
factor = np.sum(hist7.view())
view = hist7.view() / (factor * area)

In [None]:
plt.bar(hist7.axes[0].centers, view, width=hist7.axes[0].widths);
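
As a sanity check on the normalization, the same formula (counts divided by total count times bin width) can be compared against NumPy's own `density=True` result. This sketch uses only NumPy, and the random `data` array is a hypothetical stand-in for `data1 - 3.5`.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # hypothetical stand-in for `data1 - 3.5`
bins = [-10, -7, -4, -3, -2, -1, -.75, -.5, -.25, 0, .25, .5, .75, 1, 2, 3, 4, 7, 10]

counts, edges = np.histogram(data, bins=bins)
widths = np.diff(edges)
density = counts / (counts.sum() * widths)

reference, _ = np.histogram(data, bins=bins, density=True)
print(np.allclose(density, reference))
print(np.sum(density * widths))  # a density integrates to 1 over the axis
```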

## 8: Axis types

There are more axis types, and they all provide the same API in histograms, so they just work without changes:

In [None]:
hist8 = bh.Histogram(
    bh.axis.Regular(30, 1, 10, transform=bh.axis.transform.log),
    bh.axis.Regular(30, 1, 10, transform=bh.axis.transform.sqrt)
)
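
A transformed `Regular` axis is regular in the transformed coordinate: the bin edges are evenly spaced after applying the transform and are then mapped back. Assuming that behavior, the edges of the two axes above can be sketched with plain NumPy:

```python
import numpy as np

n, start, stop = 30, 1, 10

# log transform: evenly spaced in log(x), mapped back with exp
log_edges = np.exp(np.linspace(np.log(start), np.log(stop), n + 1))

# sqrt transform: evenly spaced in sqrt(x), mapped back by squaring
sqrt_edges = np.linspace(np.sqrt(start), np.sqrt(stop), n + 1) ** 2

print(log_edges[:3])
print(sqrt_edges[:3])
```

In both cases the bins get wider toward the upper end of the range. The arrays could be compared against `hist8.axes[0].edges` and `hist8.axes[1].edges`.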

In [None]:
hist8.reset()
hist8.fill(*make_2D_data(mean=(5, 5), widths=(5, 5)))

In [None]:
plothist2d(hist8);

## 9: And, circular, too!

In [None]:
hist9 = bh.Histogram(bh.axis.Regular(30, 0, 2*np.pi, circular=True))
hist9.fill(np.random.uniform(0, np.pi*4, size=300))
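
On a circular axis, values outside the range wrap around, which is why filling with angles up to 4π still lands every entry in a regular bin. Here is a NumPy sketch of the same wrapping, assuming plain modular arithmetic with period 2π. The seeded generator is added here only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)
angles = rng.uniform(0, 4 * np.pi, size=300)

# map every angle back into [0, 2*pi) before binning
wrapped = np.mod(angles, 2 * np.pi)
counts, edges = np.histogram(wrapped, bins=30, range=(0, 2 * np.pi))

print(counts.sum())  # every sample ends up in one of the 30 bins
```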

Now, the really complicated part: making a circular histogram.


In [None]:
ax = plt.subplot(111, polar=True)
plothist(hist9);