# The Wild and Wonderful World of Monte Python!
### A computational approach to statistical inference with Monte Carlo simulation and resampling methods using Python
by Henry Bechtel

With inspiration from:
- "Statistics for Hackers", by Jake VanderPlas 
- "There is only one test", by Allen Downey



##### Some definitions:
Monte Carlo Simulation: Uses random sampling to solve problems (usually includes an assumed data-generating process, or DGP)
- Direct Simulation

Resampling: Could be considered a subset of Monte Carlo, but usually involves sampling from the observed data rather than from an assumed data-generating process (no DGP)
- Shuffling
- Bootstrapping 
- Cross Validation


Why care about Monte Carlo and resampling?

- Experiment with statistical assumptions and learn how they affect statistical inference
- Conduct hypothesis testing more transparently and intuitively 

##### Computational Approach to Hypothesis Testing

- Model the null hypothesis
- Choose a test statistic (one that reflects the effect you are testing) 
- Simulate it
- Count
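
The four steps above can be sketched as one small helper. This is only an illustrative sketch: the function name `simulated_p_value`, its parameters, and the die-rolling numbers (60 rolls, 16 sixes observed) are hypothetical and chosen just to show the pattern.

```python
import numpy as np

def simulated_p_value(simulate_null, test_statistic, observed_stat, n_sims=10_000):
    """Fraction of null simulations with a statistic at least as large as the observed one."""
    count = 0
    for _ in range(n_sims):
        sample = simulate_null()          # step 1: draw data under the null model
        stat = test_statistic(sample)     # step 2: compute the test statistic
        if stat >= observed_stat:         # step 4: count results as extreme as observed
            count += 1
    return count / n_sims                 # step 3 is the loop itself

# Hypothetical example: is a die that showed 16 sixes in 60 rolls fair?
rng = np.random.default_rng(0)
p_value = simulated_p_value(
    simulate_null=lambda: rng.integers(1, 7, size=60),    # fair die, faces 1..6
    test_statistic=lambda rolls: np.sum(rolls == 6),      # number of sixes
    observed_stat=16,
)
print(p_value)
```

Passing the null model and the test statistic in as functions keeps the counting logic identical from one test to the next; only the two callables change.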

Importing dependencies

In [131]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

In [358]:
plt.rcParams['figure.figsize'] = [8.0, 4.0]

### Monte Carlo Simulation 
- Experimentation on modeling assumptions
- Hypothesis testing with an assumed model (DGP)

#### Experimentation on modeling assumptions

#### Hypothesis testing with an assumed model (DGP)
Let's suppose you come across a coin that has been flipped 30 times and has landed on heads 21 of those times. Is it a fair coin?

First, we need to state the null hypothesis: "The coin is fair and the 21 heads are the result of random chance."

Our test statistic will be the sum (count) of all the heads flipped.

Here we have a DGP model to simulate the conditions of the null hypothesis: a binary random variable with equal probability of landing heads or tails.

One coin flip can be coded as:

In [158]:
np.random.randint(2)

1

A trial of 30 flips can be coded as:

In [148]:
np.random.randint(2, size = 30)

array([1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       1, 1, 1, 0, 1, 1, 1])

Simulating a large number of 30-flip trials with a for-loop:

In [147]:
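M = 0  # number of trials with at least 21 heads
numTrials = 100000
for i in range(numTrials):
    trial = np.random.randint(2, size=30)  # 30 fair flips: 1 = heads, 0 = tails
    if trial.sum() >= 21:
        M += 1
p = M / numTrials  # share of trials at least as extreme as the observed 21 heads
print(p)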

0.0216
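
The same simulation can also be written without an explicit Python loop. The sketch below is an alternative, not the notebook's original cell: it assumes NumPy's newer `Generator` interface (`np.random.default_rng`) and an arbitrary seed of 42, and draws all trials at once as a 2-D array.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily, for reproducibility only
num_trials = 100_000

# One row per trial, one column per flip: 1 = heads, 0 = tails
flips = rng.integers(0, 2, size=(num_trials, 30))

# Test statistic for every trial: number of heads in that row
heads_per_trial = flips.sum(axis=1)

# Share of trials with at least 21 heads
p = np.mean(heads_per_trial >= 21)
print(p)
```

The estimate should land close to the loop version's, with small differences from one seed to the next.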


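Only about 2% of the simulated fair-coin trials produce 21 or more heads, so the observed result would be unusual if the coin were fair.

In general, ANALYTICALLY calculating the sampling distribution is hard. But SIMULATING the sampling distribution is easy.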
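
The coin example happens to be one of the cases where the analytic answer is tractable, because the number of heads under the null follows a Binomial(30, 0.5) distribution. As a cross-check on the simulation, a minimal sketch of the exact tail probability using only the standard library (the variable names are illustrative):

```python
from math import comb

n_flips = 30
observed_heads = 21

# P(X >= 21) for X ~ Binomial(30, 0.5):
# number of flip sequences with 21..30 heads, divided by all 2**30 equally likely sequences
exact_p = sum(comb(n_flips, k) for k in range(observed_heads, n_flips + 1)) / 2 ** n_flips
print(round(exact_p, 4))  # → 0.0214
```

The simulated estimate agrees with this exact value to within simulation noise.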

### Resampling Methods

Hypothesis testing withOUT a generative model

- Permutation Testing
- Jackknife
- Bootstrap
- Cross Validation
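
As an illustration of the first method in the list, here is a minimal sketch of a permutation (shuffling) test for a difference in means. The function name `permutation_test` and both data sets are made up for this example. Under the null hypothesis the group labels are interchangeable, so the pooled values are shuffled and split again many times.

```python
import numpy as np

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(group_a) - mean(group_b)."""
    rng = np.random.default_rng(seed)
    group_a = np.asarray(group_a, dtype=float)
    group_b = np.asarray(group_b, dtype=float)
    observed = group_a.mean() - group_b.mean()

    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)                     # relabel at random
        diff = shuffled[:n_a].mean() - shuffled[n_a:].mean()   # statistic under the null
        if diff >= observed:
            count += 1
    return observed, count / n_permutations

# Made-up measurements for two groups (not real data)
group_a = [12.1, 14.3, 13.8, 15.2, 12.9, 14.7]
group_b = [11.4, 12.0, 13.1, 11.8, 12.5, 12.2, 11.9]

observed_diff, p_value = permutation_test(group_a, group_b)
print(observed_diff, p_value)
```

No distributional model is assumed here; the only assumption is that, if there were no effect, any assignment of values to groups would be equally likely.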

### Bootstrap 
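
A minimal sketch of the idea, assuming a percentile bootstrap for the mean: resample the observed data with replacement, recompute the statistic each time, and read a confidence interval off the resulting distribution. The helper name `bootstrap_ci` and the sample values are hypothetical.

```python
import numpy as np

def bootstrap_ci(data, statistic=np.mean, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        resample = rng.choice(data, size=n, replace=True)  # sample WITH replacement
        stats[i] = statistic(resample)
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

# Made-up sample (not real data)
sample = [48, 24, 51, 12, 21, 41, 25, 23, 32, 61, 19, 24, 29, 21, 23, 13, 32, 18, 42, 18]

lower, upper = bootstrap_ci(sample)
print(lower, upper)
```

The resamples stand in for repeated draws from the population, which is reasonable only when the observed sample is representative of it.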

### Cross Validation