# Introduction to Input Modelling for Simulation

In this lab you will learn:

* How to use `auto_fit()` to test the fit of distributions
* How to interpret the Chi-Squared and KS-Test results

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Using `auto_fit`

`auto_fit` is code written by **Dr Michael Allen**, University of Exeter Medical School. The source code can be found on pythonhealthcare.org: https://pythonhealthcare.org/2018/05/03/81-distribution-fitting-to-data/

The program:

* Fits a number of distributions to the data and conducts both a Chi-Squared test and a KS-Test on each fit. The fitted distributions are ranked by the Chi-Squared statistic.
* Produces histograms, p-p and q-q plots of the data versus theoretical distributions
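
To make the two tests concrete, here is a minimal sketch that uses `scipy.stats` directly. It is not the `auto_fit` source code: the helper name `fit_and_test`, the choice of 50 equal-probability bins and the three candidate distributions are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.exponential(scale=32, size=2_000)


def fit_and_test(data, dist_name, n_bins=50):
    """Hypothetical helper: fit one scipy.stats distribution and test the fit.

    Returns (chi_squared_statistic, ks_p_value).
    """
    dist = getattr(stats, dist_name)
    params = dist.fit(data)  # maximum likelihood estimates

    # KS test: compares the empirical CDF with the fitted CDF.
    _, ks_p = stats.kstest(data, dist_name, args=params)

    # Chi-Squared statistic: compare observed and expected bin counts.
    # Bin edges are data percentiles (an assumption), so each bin holds
    # roughly the same number of observations.
    edges = np.percentile(data, np.linspace(0, 100, n_bins + 1))
    observed, _ = np.histogram(data, bins=edges)
    expected = len(data) * np.diff(dist.cdf(edges, *params))
    chi_sq = np.sum((observed - expected) ** 2 / expected)
    return chi_sq, ks_p


candidates = ["expon", "norm", "uniform"]
results = {name: fit_and_test(data, name) for name in candidates}

# Rank candidates: a lower Chi-Squared statistic means a closer fit.
ranked = sorted(results, key=lambda name: results[name][0])
for name in ranked:
    chi_sq, ks_p = results[name]
    print(f"{name:8s} chi_sq={chi_sq:12.1f} ks_p={ks_p:.4f}")
```

Ranking by the Chi-Squared statistic puts the closest fit first, while the KS p-value gives a separate check on whether each candidate can be rejected.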

There is an implementation of this code in this repo in `input_modelling.fitting`.  Use the following code to import.

In [None]:
from input_modelling.fitting import auto_fit

Let's test `auto_fit` on some synthetic data.  We will use exponentially distributed data.  We will need to convert this to a `pd.DataFrame` as that is the datatype the function `auto_fit` accepts as a parameter.

In [None]:
rng = np.random.default_rng(42)
samples = rng.exponential(scale=32, size=10_000)
samples = pd.DataFrame(samples)
samples.head()

In [None]:
auto_fit(samples)

The exponential distribution is ranked the highest as it has the lowest Chi-Squared statistic.  The p-value comes from the KS-Test.  It is not significant, which means we do not reject the exponential distribution as a candidate for this data.  Note that this is not proof that the data are exponential: a number of other distributions are also a good fit for this data!
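
One reason several distributions fit well is that some of them contain the exponential as a special case. For example, a gamma distribution with shape parameter 1 is an exponential distribution. The sketch below illustrates this with `scipy.stats` rather than `auto_fit`; fixing the location at zero (`floc=0`) and the 5% significance level are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.exponential(scale=32, size=10_000)

# Fit a gamma distribution with the location fixed at zero (assumption).
# fit() returns (shape, loc, scale).
shape, loc, scale = stats.gamma.fit(data, floc=0)

# KS test of the data against the fitted gamma distribution.
ks_stat, p_value = stats.kstest(data, "gamma", args=(shape, loc, scale))

alpha = 0.05  # assumed significance level
decision = "reject" if p_value < alpha else "do not reject"

print(f"gamma shape={shape:.3f}, scale={scale:.2f}")
print(f"KS p-value={p_value:.3f}: {decision} the gamma distribution")
```

A fitted shape close to 1 means the fitted gamma is almost identical to an exponential, so the goodness-of-fit tests have little basis for separating the two.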

Let's plot the distributions as well using some extra options in `auto_fit`. Here `dist_names` restricts the fit to the listed distributions.

In [None]:
dists_to_test = ['expon', 'gamma']

auto_fit(samples, hist=True, pp=True, dist_names=dists_to_test)
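
To help read the P-P and Q-Q plots, the sketch below builds both by hand for a fitted exponential distribution. It is an illustration only and not how `auto_fit` draws its plots; the plotting position `(i - 0.5) / n` is one common convention and is an assumption here.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(42)
data = np.sort(rng.exponential(scale=32, size=2_000))
n = len(data)

# Fit an exponential distribution and freeze it with the fitted parameters.
loc, scale = stats.expon.fit(data)
fitted = stats.expon(loc=loc, scale=scale)

# Empirical cumulative probabilities for the sorted data (assumed convention).
empirical_p = (np.arange(1, n + 1) - 0.5) / n

# P-P plot: fitted CDF at each observation versus empirical probability.
theoretical_p = fitted.cdf(data)

# Q-Q plot: fitted quantiles versus the observed (sorted) values.
theoretical_q = fitted.ppf(empirical_p)

fig, (ax_pp, ax_qq) = plt.subplots(1, 2, figsize=(10, 4))

ax_pp.scatter(theoretical_p, empirical_p, s=4)
ax_pp.plot([0, 1], [0, 1], color="red")  # line of perfect agreement
ax_pp.set(title="P-P plot", xlabel="Theoretical probability",
          ylabel="Empirical probability")

ax_qq.scatter(theoretical_q, data, s=4)
ax_qq.plot([0, data[-1]], [0, data[-1]], color="red")
ax_qq.set(title="Q-Q plot", xlabel="Theoretical quantile",
          ylabel="Observed value")

fig.tight_layout()
```

In both plots, points lying close to the red diagonal indicate a good fit. The Q-Q plot is more sensitive to disagreement in the tails, and the P-P plot to disagreement in the centre of the distribution.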