# Monte Python [under construction]
### A computational approach to statistical inference with Monte Carlo simulation and resampling methods using Python
by Henry Bechtel

With inspiration from:
- "Statistics for Hackers", by Jake VanderPlas 
- "There is only one test", by Allen Downey



##### Some definitions:
Monte Carlo Simulation: Uses random sampling to solve problems, usually under an assumed data-generating process (DGP)
- Direct Simulation

Resampling: Could be considered a subset of Monte Carlo, but usually samples from the observed data instead of an assumed DGP
- Shuffling
- Bootstrapping 
- Cross Validation
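
To make the difference between these resampling moves concrete, here is a minimal sketch using only the standard library. The data values, the seed, and the fold count `k` are made up for illustration and are not from any real experiment.

```python
import random

rng = random.Random(42)  # seeded only so the sketch is repeatable

# Hypothetical observations, purely illustrative.
observed = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]

# Shuffling: a random permutation of the observed values.
# Every value appears exactly once; only the order changes.
shuffled = observed[:]
rng.shuffle(shuffled)

# Bootstrapping: draw as many values as were observed, WITH replacement.
# Some values may repeat and others may be left out.
bootstrap_sample = rng.choices(observed, k=len(observed))

# Cross validation: partition the indices into k disjoint folds.
# Each fold is held out once while the rest is used for fitting.
k = 3  # assumed fold count
indices = list(range(len(observed)))
rng.shuffle(indices)
folds = [indices[i::k] for i in range(k)]

print("shuffled:        ", shuffled)
print("bootstrap sample:", bootstrap_sample)
print("folds (indices): ", folds)
```

None of these steps needs a model of how the data were generated; the observed values themselves stand in for the DGP.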


Why care about Monte Carlo and resampling?

- Experiment with statistical assumptions and see how they affect statistical inference
- Conduct hypothesis testing more transparently and intuitively 

##### Computational Approach to Hypothesis Testing

- Model the null hypothesis
- Choose a test statistic (one that reflects the effect you are testing) 
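- Simulate the test statistic many times under the null hypothesis
- Count how often the simulated statistic is at least as extreme as the observed one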
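
The four steps above can be sketched as a small reusable helper. This is one possible arrangement, not a fixed recipe: the names `simulated_p_value` and `heads_in_30_fair_flips` are hypothetical, and the numbers (30 flips, 21 heads) are borrowed from the coin example later in this notebook.

```python
import random

def simulated_p_value(simulate_null_statistic, observed_statistic,
                      trials=10_000, seed=None):
    """Estimate a one-sided p-value by simulation.

    simulate_null_statistic: function taking a random.Random instance and
        returning one test statistic drawn under the null hypothesis.
    observed_statistic: the value of the test statistic actually observed.
    """
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        # Steps 1-3: the callable models the null and computes the statistic.
        if simulate_null_statistic(rng) >= observed_statistic:
            # Step 4: count outcomes at least as extreme as the observation.
            count += 1
    return count / trials

def heads_in_30_fair_flips(rng):
    """Null model: number of heads in 30 flips of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(30))

p = simulated_p_value(heads_in_30_fair_flips, observed_statistic=21,
                      trials=20_000, seed=0)
print(f"estimated p-value: {p}")
```

Swapping in a different null model or test statistic only requires passing a different function; the counting logic stays the same.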

Importing dependencies

In [1]:
import random
# import numpy as np
# import pandas as pd
# import seaborn as sns
# import matplotlib.pyplot as plt

### Monte Carlo Simulation 
- Experimentation on modeling assumptions
- Hypothesis testing with an assumed model (DGP)

#### Experimentation on modeling assumptions

#### Hypothesis testing with an assumed model (DGP)
Let's suppose you come across a coin that has been flipped 30 times and has landed on heads 21 of those times. Is it a fair coin?

First, we need to state the null hypothesis: "The coin is fair and the 21 heads are the result of random chance."

Our test statistic will be the sum (count) of all the heads flipped.

Here we have a DGP model to simulate the conditions of the null hypothesis: a binary random variable with equal probability of landing heads or tails.

Let's code the null hypothesis! Here we define what we mean by a fair coin flip.

In [11]:
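def coin_flip(null_hyp_prob=0.5):
    ''' Function returns 1 for heads and 0 for tails.
    null_hyp_prob is the probability of heads under the null hypothesis. '''
    # random.random() is uniform on [0, 1), so this is heads with probability null_hyp_prob
    result = 1 if random.random() < null_hyp_prob else 0
    return result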

Let's record what we observed.

In [12]:
number_of_flips = 30
observed_heads = 21

Now, if we could set up this experiment and repeat it over many simulated trials, this is what we'd expect to see.

In [13]:
trials = 1_000_000

count = 0
for i in range(trials):
    trial_heads = 0
    for flip in range(number_of_flips):
        trial_heads += coin_flip()

    # Count the trials at least as extreme as what we observed (one-sided: this many heads or more)
    if trial_heads >= observed_heads:
        count += 1

p = count/trials
print(f"p-value: {p}")

# Interpret against the conventional (but arbitrary) 0.05 threshold:
if p < 0.05:
    print("The coin is likely biased. Null Hypothesis rejected.")
else:
    print("The Null Hypothesis couldn't be rejected.")

p-value: 0.021361
The coin is likely biased. Null Hypothesis rejected.


In general, ANALYTICALLY calculating the sampling distribution is hard. But SIMULATING the sampling distribution is easy.
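
The coin example happens to be one of the cases where the exact answer is tractable, which gives a sanity check on the simulation. The sketch below sums the binomial tail directly with `math.comb` (Python 3.8+); the variable names mirror the cells above, and `p_heads` is an assumed name for the null probability of heads.

```python
from math import comb

number_of_flips = 30
observed_heads = 21
p_heads = 0.5  # probability of heads under the null hypothesis

# P(X >= observed_heads) for X ~ Binomial(number_of_flips, p_heads)
exact_p = sum(
    comb(number_of_flips, k) * p_heads**k * (1 - p_heads)**(number_of_flips - k)
    for k in range(observed_heads, number_of_flips + 1)
)
print(f"exact one-sided p-value: {exact_p:.6f}")
```

The exact tail probability is about 0.0214, in line with the simulated estimate above. For messier test statistics or null models no such closed form may be available, and simulation is often the practical route.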

### Resampling Methods

Hypothesis testing withOUT a generative model

- Permutation Testing
- Jackknife
- Bootstrap
- Cross Validation

### Bootstrap 

### Cross Validation