A quick intro to Google Colab: we'll be using it to run Jupyter notebooks in class.

In [None]:
print("Hello world!")

See detailed introductions at: 
https://colab.research.google.com/

Some of the most useful features for our class: 
- linear algebra
    - Recommended reference: https://cheatsheets.quantecon.org/
- data management
    - pandas 
- plotting 
    - Recommended libraries: Seaborn, matplotlib
- statistics packages 
    - scipy.stats: test statistics, reference distributions
    - statsmodels: classical statistics, e.g. regression, ANOVA
    - scikit-learn: statistical learning software for clustering, classification, regression, etc.

# Exercise: Exponential Hypothesis Testing
A firm produces engine parts that wear out over time. 
The lifetime of an engine part can be modeled by an exponential distribution; that is, for rate parameter $\lambda > 0$ (so that the average lifetime of the part is $1/\lambda$), the probability density of lifetime $x$ is: $f(x; \lambda) = \begin{cases} \lambda e^{- \lambda x} \qquad & x \geq 0 \\ 0 & \text{ else} \end{cases}$. 
The firm has developed a new model, has conducted tests to find the lifetimes of the new parts, and wants you to determine if the new part is a significant improvement. 
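One detail worth checking is the parameterization: in this density $\lambda$ is a rate, so the mean lifetime is $1/\lambda$. SciPy's `scipy.stats.expon` is parameterized by `scale`, which corresponds to $1/\lambda$. The sketch below, using an arbitrary illustrative rate and a hypothetical helper `exp_pdf`, compares a hand-written density with SciPy's and checks the implied mean.

```python
import numpy as np
from scipy.stats import expon

def exp_pdf(x, lam):
    """Exponential density f(x; lam) = lam * exp(-lam * x) for x >= 0, else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, lam * np.exp(-lam * x), 0.0)

lam = 0.25  # illustrative rate; the implied mean lifetime is 1 / lam = 4.0
xs = np.linspace(-1.0, 20.0, 50)

# SciPy uses scale = 1 / lam rather than the rate itself.
scipy_pdf = expon.pdf(xs, scale=1 / lam)
print(np.allclose(exp_pdf(xs, lam), scipy_pdf))  # True
print(expon.mean(scale=1 / lam))                 # 4.0
```

Passing `scale=lam` instead of `scale=1 / lam` is an easy mistake that silently swaps the rate and the mean.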

### Part 1:
First, find the maximum likelihood estimator (MLE) for the parameter $\lambda$. Then use this estimator to compute the MLE of $\lambda$ for the data in the files `old_part.csv` and `new_part.csv`. 

In [None]:
import numpy as np
import pandas as pd

# Load the observed part lifetimes and flatten each file's values into a 1-D array.
path_base = 'https://raw.githubusercontent.com/maxoboe/6419_recitations/main/R1/'
X_old_part = pd.read_csv(path_base + 'old_part.csv').values.reshape(-1,)
X_new_part = pd.read_csv(path_base + 'new_part.csv').values.reshape(-1,)
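A minimal sketch of the derivation, assuming the observations $x_1, \dots, x_n$ are i.i.d. exponential: the log-likelihood is $\ell(\lambda) = n \log \lambda - \lambda \sum_i x_i$, and setting $\ell'(\lambda) = n/\lambda - \sum_i x_i = 0$ gives $\hat\lambda = n / \sum_i x_i = 1/\bar{x}$. The code below checks this closed form against a numerical maximizer on simulated data. The true rate `2.0`, the sample size, the seed, and the helper names `mle_rate` and `neg_log_likelihood` are illustrative assumptions, not part of the exercise.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mle_rate(X):
    """Closed-form MLE of the exponential rate: n / sum(x) = 1 / mean(x)."""
    X = np.asarray(X, dtype=float)
    return 1.0 / X.mean()

def neg_log_likelihood(lam, X):
    """Negative exponential log-likelihood: -(n * log(lam) - lam * sum(x))."""
    return -(len(X) * np.log(lam) - lam * np.sum(X))

# Simulated data with a hypothetical true rate of 2.0 (mean lifetime 0.5).
rng = np.random.default_rng(0)
X_sim = rng.exponential(scale=1 / 2.0, size=10_000)

lam_closed = mle_rate(X_sim)
# Numerical check: minimize the negative log-likelihood over a bounded interval.
res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0),
                      args=(X_sim,), method="bounded")
lam_numeric = res.x
print(lam_closed, lam_numeric)
```

The same `mle_rate` function could then be applied to `X_old_part` and `X_new_part`.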

### Part 2: 
Write down a null hypothesis and an alternative hypothesis for the question of whether the new part lasts longer than the old part.

$H_0$: 

$H_A$:
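One possible formulation is sketched below, stated in terms of the rate parameters $\lambda_{\text{old}}$ and $\lambda_{\text{new}}$ (subscripts introduced here as notation). Because the mean lifetime is $1/\lambda$, a longer-lasting part corresponds to a *smaller* rate; a two-sided alternative $\lambda_{\text{new}} \neq \lambda_{\text{old}}$ is another common choice when setting up a likelihood ratio test.

$$
H_0: \lambda_{\text{new}} = \lambda_{\text{old}} \qquad \text{vs.} \qquad H_A: \lambda_{\text{new}} < \lambda_{\text{old}}
$$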

### Part 3: 
Perform a likelihood ratio test of the specified null hypothesis. Find the value of the test statistic and the appropriate degrees-of-freedom parameter for the corresponding $\chi^2$ distribution.

A helper function is provided that computes the likelihood of a vector of data given a guess for the rate parameter.

In [None]:
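def likelihood(X, param):
    """Likelihood of i.i.d. exponential data X under rate `param`."""
    def indiv_likelihood(x):
        # Exponential density: 0 for negative x, param * exp(-param * x) otherwise.
        if x < 0: return 0
        return param * np.exp(-param * x)
    # Product of the individual densities; can underflow to 0.0 for large samples.
    return np.prod([indiv_likelihood(x) for x in X])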
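A minimal sketch of the likelihood ratio statistic, assuming $H_0$ restricts the two samples to share a single rate while the alternative lets each sample have its own. It works with log-likelihoods rather than the product in `likelihood`, since multiplying many densities can underflow to `0.0`. Under $H_0$ one rate is estimated instead of two, so the difference in free parameters, and hence the $\chi^2$ degrees of freedom suggested by Wilks' theorem, is 1. The function names and the toy data are illustrative.

```python
import numpy as np

def exp_log_likelihood(X, lam):
    """Exponential log-likelihood n * log(lam) - lam * sum(x), assuming all x >= 0."""
    X = np.asarray(X, dtype=float)
    return len(X) * np.log(lam) - lam * X.sum()

def lrt_statistic(X_old, X_new):
    """Return (statistic, df) for H0: common rate vs. H_A: separate rates."""
    X_old = np.asarray(X_old, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    pooled = np.concatenate([X_old, X_new])

    # MLEs are 1 / sample mean, under the null (pooled) and the alternative (separate).
    lam_pooled = 1.0 / pooled.mean()
    lam_old = 1.0 / X_old.mean()
    lam_new = 1.0 / X_new.mean()

    ll_null = exp_log_likelihood(pooled, lam_pooled)
    ll_alt = exp_log_likelihood(X_old, lam_old) + exp_log_likelihood(X_new, lam_new)

    stat = 2.0 * (ll_alt - ll_null)
    df = 2 - 1  # two free rates under H_A, one under H_0
    return stat, df

stat, df = lrt_statistic([1.0], [3.0])
print(round(stat, 4), df)  # 0.5754 1
```

For the exercise, `lrt_statistic(X_old_part, X_new_part)` would compute the same quantity on the real data.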


### Part 4:
At a significance level of $\alpha = 0.05$, do you reject the null hypothesis? 
(Assume the sample size is large enough that you can apply Wilks' theorem.)
Hint: use the package `scipy.stats` and refer to [this link](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html) to find the CDF of the chi-squared distribution. 

Alternatively, use an online lookup table like [this](https://www.socscistatistics.com/pvalues/chidistribution.aspx).

In [None]:
from scipy.stats import chi2
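A sketch of the decision rule, assuming a likelihood ratio statistic with 1 degree of freedom has already been computed; the value `3.0` below is a placeholder, not a result from the exercise data, and `lrt_decision` is a hypothetical helper. The p-value is the upper-tail probability `chi2.sf(stat, df)` (equivalently `1 - chi2.cdf(stat, df)`), and the null is rejected when it falls below $\alpha$, or equivalently when the statistic exceeds the critical value `chi2.ppf(1 - alpha, df)`.

```python
from scipy.stats import chi2

def lrt_decision(stat, df, alpha=0.05):
    """Return (p_value, critical_value, reject) for a chi-squared likelihood ratio test."""
    p_value = chi2.sf(stat, df)         # P(chi2_df >= stat), same as 1 - chi2.cdf(stat, df)
    critical = chi2.ppf(1 - alpha, df)  # rejection threshold for the statistic
    return p_value, critical, stat > critical

stat_placeholder = 3.0  # hypothetical value; substitute the statistic from Part 3
p_value, critical, reject = lrt_decision(stat_placeholder, df=1)
print(f"p = {p_value:.4f}, critical value = {critical:.4f}, reject H0: {reject}")
```

Using `chi2.sf` instead of `1 - chi2.cdf` gives the same answer here but stays accurate for very large statistics, where the CDF rounds to 1.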

### Part 5: 
Now consider some alternative tests. For each, find the desired test statistic and discuss the result. 

Hint: use the package `scipy.stats` and refer to [this link](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests) for reference. 

1. Assuming that durations are normally distributed, evaluate a test for the hypothesis that the two distributions have the same mean. 
2. Without making any distributional assumptions, test the hypothesis that the two distributions have the same mean.
3. Test the hypothesis that the two distributions are the same.
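A sketch of how each of the three tests might be run with `scipy.stats`, using simulated exponential samples in place of the exercise data (the scales, sample sizes, and seeds are arbitrary assumptions). Welch's t-test (`ttest_ind` with `equal_var=False`) covers the normal-theory comparison of means. For the distribution-free comparison of means, the sketch uses a hand-written permutation test on the difference in sample means (`permutation_mean_test` is a hypothetical helper); the Mann-Whitney U test (`mannwhitneyu`) is another common nonparametric choice, though strictly it tests whether one distribution tends to produce larger values rather than equality of means. The two-sample Kolmogorov-Smirnov test (`ks_2samp`) tests whether both samples come from the same distribution.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu, ks_2samp

rng = np.random.default_rng(42)
# Hypothetical stand-ins for X_old_part and X_new_part (mean lifetimes 1.0 and 3.0).
X_old = rng.exponential(scale=1.0, size=500)
X_new = rng.exponential(scale=3.0, size=500)

# 1. Normal-theory test of equal means (Welch's t-test, no equal-variance assumption).
t_res = ttest_ind(X_new, X_old, equal_var=False)

# 2a. Distribution-free test of equal means: permutation test on the mean difference.
def permutation_mean_test(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation p-value for the difference in sample means."""
    perm_rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_resamples):
        shuffled = perm_rng.permutation(pooled)
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (count + 1) / (n_resamples + 1)

perm_p = permutation_mean_test(X_new, X_old)

# 2b. Mann-Whitney U test (rank-based; compares stochastic ordering, not means directly).
mw_res = mannwhitneyu(X_new, X_old, alternative="two-sided")

# 3. Two-sample Kolmogorov-Smirnov test of identical distributions.
ks_res = ks_2samp(X_new, X_old)

print(f"Welch t-test:     t = {t_res.statistic:.3f}, p = {t_res.pvalue:.3g}")
print(f"Permutation test: p = {perm_p:.3g}")
print(f"Mann-Whitney U:   U = {mw_res.statistic:.1f}, p = {mw_res.pvalue:.3g}")
print(f"KS test:          D = {ks_res.statistic:.3f}, p = {ks_res.pvalue:.3g}")
```

Replacing `X_old` and `X_new` with `X_old_part` and `X_new_part` applies the same tests to the exercise data.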