## Fitting distributions to data with `paramnormal`

In addition to explicitly creating distributions from known parameters, `paramnormal.fit` provides a similar, though even less complete, interface to the maximum-likelihood estimation methods in `scipy.stats`.

Again, we'll demonstrate with a lognormal distribution and compare the parameter estimates with scipy's.

In [None]:
%matplotlib inline

import warnings
warnings.simplefilter('ignore')

import numpy as np
import matplotlib.pyplot as plt
import seaborn
seaborn.set(style='ticks')

import paramnormal

Let's start by generating a reasonably-sized random dataset and plotting a histogram.

In [None]:
np.random.seed(0)
x = paramnormal.lognormal(mu=1.75, sigma=0.75).rvs(370)

bins = np.logspace(-0.5, 1.75, num=25)
fig, ax = plt.subplots()
_ = ax.hist(x, bins=bins, density=True)  # `normed` was removed in newer matplotlib
ax.set_xscale('log')
ax.set_xlabel('$X$')
ax.set_ylabel('Probability')
seaborn.despine()

Pretending for a moment that we didn't generate this dataset with explicit distribution parameters, how would we go about estimating them?

Scipy provides maximum-likelihood estimation of distribution parameters:

In [None]:
from scipy import stats
stats.lognorm.fit(x)

Unfortunately, those parameters don't really make sense based on what we know about our artificial dataset.
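Part of the mismatch is likely a difference in parametrization: `scipy.stats.lognorm` is described by a shape, a location, and a scale, where the shape plays the role of sigma and the scale equals exp(mu), and by default `fit` estimates all three. The sketch below is one way to recover mu and sigma from scipy alone. It assumes a zero location is appropriate and, to stay self-contained, draws its own sample with NumPy (the name `sample` is illustrative) rather than reusing `x`.

```python
import numpy as np
from scipy import stats

# Synthetic data drawn with NumPy directly (an assumption for this sketch;
# the notebook draws its sample with paramnormal instead).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=1.75, sigma=0.75, size=370)

# Pin the location at zero so only the shape and scale are estimated.
shape, loc, scale = stats.lognorm.fit(sample, floc=0)

# Translate scipy's parametrization into the more familiar one.
sigma_hat = shape        # shape parameter corresponds to sigma
mu_hat = np.log(scale)   # scale parameter corresponds to exp(mu)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

Having to remember `floc=0` and the log transform of the scale is exactly the bookkeeping that a more conventional parametrization avoids.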

That's where paramnormal comes in:

In [None]:
params = paramnormal.fit.lognormal(x)
params

This matches well with our understanding of the distribution.
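As an independent sanity check, the maximum-likelihood estimates for a two-parameter lognormal have a closed form: mu is the mean of the log-transformed data and sigma is its standard deviation (computed with `ddof=0`). The sketch below uses only NumPy and its own synthetic sample (the names `sample`, `mu_check`, and `sigma_check` are illustrative); it is not a description of how `paramnormal.fit` works internally.

```python
import numpy as np

# Synthetic lognormal sample (an assumption for this sketch; generated
# independently of the notebook's `x`).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=1.75, sigma=0.75, size=370)

# Work in log space, where the data are normally distributed.
log_sample = np.log(sample)

mu_check = log_sample.mean()
sigma_check = log_sample.std(ddof=0)  # the MLE uses the biased (ddof=0) estimator
print(f"mu = {mu_check:.3f}, sigma = {sigma_check:.3f}")
```

With a few hundred observations, both values should land close to the parameters used to generate the sample.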

The returned `params` variable is a `namedtuple` that we can easily unpack to create a distribution of our own and plot it over our histogram.

In [None]:
dist = paramnormal.lognormal(*params)

# theoretical PDF
x_hat = np.logspace(-0.5, 1.75, num=100)
y_hat = dist.pdf(x_hat)

bins = np.logspace(-0.5, 1.75, num=25)
fig, ax = plt.subplots()
_ = ax.hist(x, bins=bins, density=True, alpha=0.375)  # `normed` was removed in newer matplotlib
ax.plot(x_hat, y_hat, zorder=2, color='g')
ax.set_xscale('log')
ax.set_xlabel('$X$')
ax.set_ylabel('Probability')
seaborn.despine()