We can examine a column of data by calculating the fraction of rows that hold each distinct value:
```
df.col.value_counts(normalize=True)
```
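
As a minimal sketch of what this returns, assume a hypothetical DataFrame `df` whose column `col` holds two labels. The data below is made up for illustration only.

```python
import pandas as pd

# Hypothetical data: 'col' holds one of two labels per row.
df = pd.DataFrame({'col': ['a', 'a', 'b', 'a']})

# normalize=True returns fractions that sum to 1 instead of raw counts.
fractions = df.col.value_counts(normalize=True)
print(fractions)
```

Multiply the result by 100 if percentages are preferred over fractions.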

## Generating Confidence Intervals by Simulating a Sampling Distribution

Use the following steps to generate a confidence interval by simulating a sampling distribution from a dataset that has only two possible values:
1. Create a function that samples from the distribution.
```
import numpy as np
import pandas as pd
def sample(value_a_percentage, n=1000):
    return pd.DataFrame({'new_col_name_a': np.where(np.random.rand(n) < value_a_percentage, 'value_a_name', 'value_b_name')})
```
2. Using the `sample` function, store the results of a sufficient number of simulated experiments. For example:
```
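def samplingdist(value_a_percentage, n=1000, num_experiments=1000):
    # num_experiments is an assumed default; one row per simulated experiment.
    return pd.DataFrame([sample(value_a_percentage, n).new_col_name_a.value_counts(normalize=True) for i in range(num_experiments)])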
```
3. Create a histogram of the sampling distribution:
```
dist = samplingdist(0.51)  # 0.51 is an example value_a_percentage
dist.value_a_name.hist(histtype='step', bins=20)
```
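4. Create a function that returns the two quantiles bounding the central `confidence_interval_percent` of the sampling distribution:
```
def quantiles(value_a_percentage, confidence_interval_percent, n=1000):
    dist = samplingdist(value_a_percentage, n)
    # For a 95% interval the offset is 0.025, giving the 2.5% and 97.5% quantiles.
    offset = (100 - confidence_interval_percent)/200.0
    return dist.value_a_name.quantile(offset), dist.value_a_name.quantile(1-offset)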
```
5. To determine the confidence interval, call `quantiles` over a range of candidate `value_a_percentage` inputs and compare the results with the percentage observed in the data:
    1. The lower bound of the confidence interval is the input for which the second returned value (the upper quantile) is as close as possible to the observed percentage.
    2. The upper bound of the confidence interval is the input for which the first returned value (the lower quantile) is as close as possible to the observed percentage.
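
The search in step 5 can be sketched as a scan over candidate inputs. This is only an illustration under stated assumptions: `simulated_fraction`, `num_experiments`, the candidate grid, and the observed value of 0.51 are hypothetical choices, the helper skips the DataFrame and keeps just the fraction of `value_a`, and the offset is computed so that a 95% interval uses the 2.5% and 97.5% quantiles.

```python
import numpy as np

def simulated_fraction(value_a_percentage, n=1000):
    # Fraction of value_a in one simulated experiment of n observations.
    return (np.random.rand(n) < value_a_percentage).mean()

def quantiles(value_a_percentage, n=1000, confidence_interval_percent=95,
              num_experiments=500):
    # Sampling distribution of the fraction, then its two tail quantiles.
    dist = [simulated_fraction(value_a_percentage, n) for _ in range(num_experiments)]
    offset = (100 - confidence_interval_percent) / 200.0
    return np.quantile(dist, offset), np.quantile(dist, 1 - offset)

np.random.seed(0)  # only to make the sketch repeatable
observed = 0.51    # fraction seen in the data (example value)
candidates = np.linspace(0.45, 0.57, 25)  # hypothetical search grid, step 0.005
results = {p: quantiles(p) for p in candidates}

# Lower bound: candidate whose upper quantile lands closest to the observed value.
lower = min(candidates, key=lambda p: abs(results[p][1] - observed))
# Upper bound: candidate whose lower quantile lands closest to the observed value.
upper = min(candidates, key=lambda p: abs(results[p][0] - observed))
print(lower, upper)
```

A finer grid or more experiments per candidate tightens the estimate at the cost of run time.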

## Generating a Confidence Interval with Bootstrapping

1. Generate the bootstrap distribution by sampling from the original data with replacement, drawing n values per sample (typically as many as the original data contains). Repeat this a sufficient number of times, N.
```
bootstrap = pd.DataFrame({'new_col_name_mean': [df.sample(n, replace=True).col_name.mean() for i in range(N)]})
```
2. Create a histogram of the bootstrap means and mark the mean of the original data:
```
import matplotlib.pyplot as plt
bootstrap.new_col_name_mean.hist(histtype='step')
plt.axvline(df.col_name.mean(), color='C1')
```
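3. Determine the confidence interval from the quantiles of the bootstrap distribution:
```
def quantiles(df_col, confidence_interval_percent):
    # For a 95% interval the offset is 0.025, giving the 2.5% and 97.5% quantiles.
    offset = (100 - confidence_interval_percent)/200.0
    return df_col.quantile(offset), df_col.quantile(1-offset)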
```
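
Put together, the three bootstrap steps might look like the sketch below. The data is randomly generated as a stand-in for a real `df.col_name`, and the values of `N`, `n`, and the 95% default are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for df.col_name.
rng = np.random.default_rng(0)
df = pd.DataFrame({'col_name': rng.normal(loc=50, scale=10, size=200)})

N = 1000     # number of bootstrap resamples
n = len(df)  # each resample is as large as the original data

# Mean of each resample drawn with replacement; random_state=i keeps runs repeatable.
bootstrap = pd.DataFrame({
    'new_col_name_mean': [
        df.sample(n, replace=True, random_state=i).col_name.mean() for i in range(N)
    ]
})

def quantiles(df_col, confidence_interval_percent=95):
    # 95 -> offset 0.025, i.e. the 2.5% and 97.5% quantiles.
    offset = (100 - confidence_interval_percent) / 200.0
    return df_col.quantile(offset), df_col.quantile(1 - offset)

low, high = quantiles(bootstrap.new_col_name_mean)
print(low, high)
```

The interval `(low, high)` approximates the range in which the mean would fall if the data were collected again.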

## Statistical Modeling
Statistical models capture how variation in the response variables depends on variation in the explanatory variables. The model's parameters are adjusted until the differences (residuals) between predicted and observed values are minimized.
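
To make the idea of minimizing residuals concrete, here is a small sketch with a straight-line model. The data is synthetic, and `sum_squared_residuals` is a hypothetical helper. The sum of squared residuals is one common choice of quantity to minimize, not the only one.

```python
import numpy as np

# Hypothetical data: response y depends roughly linearly on explanatory x.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

def sum_squared_residuals(slope, intercept):
    residuals = y - (slope * x + intercept)  # observed minus predicted
    return np.sum(residuals ** 2)

# np.polyfit with degree 1 returns the least-squares slope and intercept.
slope, intercept = np.polyfit(x, y, 1)
best = sum_squared_residuals(slope, intercept)

# Moving a parameter away from the fitted value can only increase the total.
print(best, sum_squared_residuals(slope + 0.1, intercept))
```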

Models can be used to:
* Reveal qualities or trends about the population
* Predict future behavior



## Fitting Models to Data

One common method is ordinary least squares (OLS).

```
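import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf

# formula is a string in tilde notation; data is the input DataFrame (both are placeholders here).
# A formula such as 'response ~ 1' fits only a constant, the grand mean.
model = smf.ols(formula='response ~ 1', data=df)
grandmean = model.fit()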
```
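
A small end-to-end sketch of the same calls follows. The DataFrame and its column names `x` and `y` are made up for illustration. The first model fits only a constant and the second adds one explanatory variable.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data; column names are placeholders.
df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0, 5.0],
                   'y': [2.1, 3.9, 6.2, 7.8, 10.1]})

# 'y ~ 1' fits only a constant, so the intercept is the mean of y.
grandmean = smf.ols(formula='y ~ 1', data=df).fit()

# 'y ~ x' adds x as an explanatory variable (an intercept is included by default).
linear = smf.ols(formula='y ~ x', data=df).fit()

print(grandmean.params)
print(linear.params)
```

The fitted results expose `params` for the estimated coefficients and `resid` for the residuals, which makes it easy to compare how well each model describes the data.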