# Assignment 11

In this assignment we'll examine a data set called `failure_counts.csv`. 

## Instructions

Please complete this Jupyter notebook and **don't** convert it to a `.py` file. Upload this notebook, along with any `.stan` files and any data sets, as a single `zip` file to Gradescope. Your work will be manually graded by our TA. 

Protip: if you write your `.stan` file generally enough, it will work with most of the models below, and you won't need to keep recompiling the model object!


In [3]:
import pandas as pd
import numpy as np
import os
from cmdstanpy import CmdStanModel
import matplotlib.pyplot as plt

## Description of The Data Set

We are interested in estimating the probability of failure of a new type of rocket. We have little to no data on this particular type of rocket (it's new, and rocket launches are expensive). However, the data we *do* have is on *related* rockets. 

This data set is taken from a GitHub repository belonging to Alexandre Bouchard-Côté. Each row is a separate type of rocket. For each of those rockets, the data shows how many times that rocket has been launched, as well as how many of those launches have been considered "failures."

Here are the first few rows.



In [28]:
rockets = pd.read_csv("failure_counts.csv")
rockets.head()

| Unnamed: 0 | LV.Type | numberOfLaunches | numberOfFailures |
|---|---|---|---|
| 0 | Aerobee | 1 | 0 |
| 1 | Angara A5 | 1 | 0 |
| 2 | Antares 110 | 2 | 0 |
| 3 | Antares 120 | 2 | 0 |
| 4 | Antares 130 | 1 | 1 |
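
To get a feel for the columns, here is a minimal sketch that retypes the five displayed rows by hand (so it runs without the CSV) and computes two summaries: each rocket's raw failure fraction and the fraction obtained by lumping the five rockets together. The names `head`, `rawFailureRate`, and `pooled_rate` are made up for this illustration and are not part of the data set.

```python
import pandas as pd

# The five rows displayed above, retyped by hand so this cell runs without the CSV.
head = pd.DataFrame({
    "LV.Type": ["Aerobee", "Angara A5", "Antares 110", "Antares 120", "Antares 130"],
    "numberOfLaunches": [1, 1, 2, 2, 1],
    "numberOfFailures": [0, 0, 0, 0, 1],
})

# Raw failure fraction for each rocket taken on its own.
# With only one or two launches it can only be 0, 0.5, or 1.
head["rawFailureRate"] = head["numberOfFailures"] / head["numberOfLaunches"]

# Failure fraction when these five rockets are lumped together.
pooled_rate = head["numberOfFailures"].sum() / head["numberOfLaunches"].sum()

print(head)
print("pooled rate over these five rows:", pooled_rate)
```

Both summaries are crude on purpose. They are only meant to show what the launch and failure counts look like before any modeling is done.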


## Problem 1

Please answer the following general questions. Try to place each subquestion answer in a separate cell to keep everything organized. 

1. Why are the overly simplistic and the overly flexible models inappropriate for this data set? Explain why hierarchical models are an essential tool for estimating the quantity of interest.
2. Write out a model to conduct inference on this data set. Describe all parts of the complete-data likelihood. Use mathematical notation. Describe how you chose your priors.


Estimate the parameters of your hierarchical model. Attach your modified `.stan` file to your submission so that we may run it when grading your work.

Please be sure to address the following questions about $\theta$ inference:

3. Are the convergence diagnostics satisfactory?
4. Report point and interval estimates for global/top level parameter estimates and *interpret* these parameter estimates. Describe why they make sense.
5. Provide histograms and scatterplots for the posterior $\theta$. Describe why the relationship between all elements of $\theta$ "makes sense."
6. Use a `generated quantities` block to come up with a histogram for the new (completely unobserved) rocket's probability of failure on its first launch. What are the chances that this probability is greater than $0.01$?


Regarding inference on $z$:

7. Provide a plot that shows how the rockets' failure probabilities relate to one another.
8. Explain why inference on $z$ is not your primary goal.
9. Explain a real-world situation when inference on $z$ *could* be your primary goal. 

Finally:

10. Describe one weakness of the model's assumptions. 
11. How do you think a frequentist would approach this problem differently? 




## Hints:

For question 6, we assume $y_j \mid z_j \sim \text{Binomial}(n_j, z_j)$ for all $j$. *This is true even for $j$s that are from out-of-sample data.* Let's call the out-of-sample new rocket $j=368$ because there are $367$ rows in the in-sample data.

Now, be careful what you condition on. If you want to be a full Bayesian **you're only allowed to condition on what you know: the observed data.**

By the rules of probability and the assumptions of our model, writing $\theta = (\alpha, \beta)$ for the top-level parameters and taking $n_{368} = 1$ for the first launch:

$$
p( \text{new rocket fails} \mid y_{1:367}) = \iint p(y_{368} = 1 \mid z_{368}) \, p(z_{368} \mid \alpha, \beta) \, \pi(\alpha, \beta \mid y_{1:367}) \, \text{d}z_{368} \, \text{d}\theta = \iint z_{368} \, \text{Beta}(z_{368} \mid \alpha, \beta) \, \pi(\alpha, \beta \mid y_{1:367}) \, \text{d}z_{368} \, \text{d}\theta 
$$

**Therefore, for each $\alpha, \beta$ sample generated by your MCMC algorithm, use those parameters to sample one $z_{368}$ from a $\text{Beta}(\alpha, \beta)$ distribution. All of those samples of the quantity $z_{368}$ can be plotted as a histogram, or used to estimate the center of the histogram, or tail probabilities of the histogram.**
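
The recipe in bold can be sketched in NumPy as follows. This is only an illustration: `alpha_draws` and `beta_draws` are placeholder values generated on the spot, not a fit to `failure_counts.csv`, and the variable names are assumptions. In your own notebook you would replace them with the $\alpha$ and $\beta$ draws from your fitted model (or do the same sampling inside the `generated quantities` block).

```python
import numpy as np

rng = np.random.default_rng(0)

# PLACEHOLDER draws standing in for the alpha and beta columns of your MCMC output.
# They are NOT posterior draws for the rocket data; swap in your own, for example
# fit.stan_variable("alpha") in cmdstanpy, assuming your Stan parameter is named alpha.
num_draws = 4000
alpha_draws = rng.gamma(shape=20.0, scale=0.05, size=num_draws)
beta_draws = rng.gamma(shape=20.0, scale=0.5, size=num_draws)

# One z_368 per (alpha, beta) draw, sampled from Beta(alpha, beta).
z_new = rng.beta(alpha_draws, beta_draws)

# Center of the histogram: estimates p(new rocket fails | observed data).
post_mean = z_new.mean()

# Tail probability: share of draws with failure probability above 0.01.
tail_prob = np.mean(z_new > 0.01)

print("estimated failure probability:", post_mean)
print("estimated P(z_368 > 0.01):", tail_prob)
```

Passing `z_new` to `plt.hist` gives the histogram. The printed numbers here reflect only the placeholder draws and say nothing about the actual rockets.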

This "works" because the expression above is a theoretical average (an expectation), and *sample* averages converge to *theoretical* averages by the law of large numbers! 
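
Written out, with $S$ denoting the number of MCMC draws and $z_{368}^{(s)}$ the value sampled on draw $s$ (both symbols are introduced here for illustration), the approximation reads roughly as follows, assuming the chains have converged:

$$
\frac{1}{S} \sum_{s=1}^{S} z_{368}^{(s)} \;\longrightarrow\; E\left[ z_{368} \mid y_{1:367} \right] = p( \text{new rocket fails} \mid y_{1:367}) \quad \text{as } S \to \infty
$$

The same argument applies to tail probabilities, because the proportion of draws above a threshold is itself a sample average:

$$
\frac{1}{S} \sum_{s=1}^{S} \mathbb{1}\left\{ z_{368}^{(s)} > 0.01 \right\} \;\longrightarrow\; P\left( z_{368} > 0.01 \mid y_{1:367} \right) \quad \text{as } S \to \infty
$$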
