In [47]:
import numpy as np
import pandas as pd
import scipy 
import matplotlib.pyplot as plt 
from scipy.stats import norm, binom, uniform
from scipy.special import logsumexp

%run ../tools.py

# Exercise 1 (Statistical Rethinking, McElreath, Chapter 5)


In [3]:
data = pd.read_csv('../milk_clean.csv',sep=',')
print(data.columns)

Index(['Unnamed: 0', 'clade', 'species', 'kcal.per.g', 'perc.fat',
       'perc.protein', 'perc.lactose', 'mass', 'neocortex.perc'],
      dtype='object')



**1) Analyze the relationship of milk energy with respect to percentage fat and percentage lactose through two independent linear regressions. Comment.**

**2) What happens if we regress kcal.per.g on both perc.fat and perc.lactose?**

**3) Can you explain the differences observed between the results of questions 1 and 2?**

**4) Study the effect of correlation between predictors**

To answer this question, create a dummy variable whose correlation with perc.fat you can control. For each correlation level, fit many linear regressions (say 500) using perc.fat and the dummy variable as predictors, and observe how the mean standard deviation of b_fat (the coefficient of perc.fat) changes with the correlation.
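
The procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins (not the milk dataset), the helper names `slope_sd` and `mean_slope_sd` are hypothetical, and the ordinary least squares standard error is used in place of a posterior standard deviation.

```python
import numpy as np

def slope_sd(x1, x2, y):
    """OLS standard error of the coefficient of x1 in y ~ 1 + x1 + x2."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)           # covariance of the estimates
    return np.sqrt(cov[1, 1])

def mean_slope_sd(x1, y, rho, n_rep=500, seed=0):
    """Average slope_sd over n_rep dummy variables correlated with x1 (about rho)."""
    rng = np.random.default_rng(seed)
    z = (x1 - x1.mean()) / x1.std()                 # standardized x1
    sds = []
    for _ in range(n_rep):
        noise = rng.standard_normal(len(x1))
        x2 = rho * z + np.sqrt(1 - rho**2) * noise  # correlation with x1 is close to rho
        sds.append(slope_sd(x1, x2, y))
    return float(np.mean(sds))

# Synthetic stand-in data (assumption: NOT the milk dataset)
rng = np.random.default_rng(1)
fat = rng.normal(35, 10, size=29)
kcal = 0.3 + 0.01 * fat + rng.normal(0, 0.05, size=29)

rhos = [0.0, 0.5, 0.9, 0.99]
results = {rho: mean_slope_sd(fat, kcal, rho) for rho in rhos}
for rho, sd in results.items():
    print(f"rho = {rho:4.2f}   mean sd of slope = {sd:.5f}")
```

With the real data, `fat` and `kcal` would be replaced by the perc.fat and kcal.per.g columns, and the regression by whichever fitting routine the course tools provide.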

# Exercise 2 (Statistical Rethinking, McElreath, Chapter 6)

**1) Using the *Milk* dataset of the previous exercise, fit four different models describing the kcal.per.g data:**

- One with both neocortex and log_mass
- One with neocortex
- One with log_mass
- One with no predictor (an intercept only)

**For each of these models, compute the WAIC with its standard error (SE), the dWAIC with its standard error (dSE), the pWAIC, and the weight criterion. What can you conclude by analyzing these numbers? Present the results in a table.**

We denote by neocortex the variable neocortex.perc / 100.

dWAIC is the difference between each WAIC and the lowest WAIC among the models.
 
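The standard error (SE) of a score S is given by:

$$
SE = \sqrt{N_{points} \cdot \mathrm{var}(S)}
$$

where $N_{points}$ is the number of observations and the variance is taken over the pointwise values of S.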
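
As a hedged illustration of how these quantities fit together, the sketch below computes WAIC, pWAIC and SE from a matrix of pointwise log-likelihoods (one row per posterior sample, one column per observation). It assumes the usual definition WAIC = -2 (lppd - pWAIC) on the deviance scale; the function name `waic_summary` and the synthetic data are hypothetical and only serve the example.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def waic_summary(log_lik):
    """WAIC summary from log_lik of shape (n_samples, n_points)."""
    n_samples, n_points = log_lik.shape
    # log pointwise predictive density: log of the mean likelihood over samples
    lppd_i = logsumexp(log_lik, axis=0) - np.log(n_samples)
    # effective number of parameters: variance of the log-likelihood over samples
    p_waic_i = np.var(log_lik, axis=0, ddof=1)
    waic_i = -2.0 * (lppd_i - p_waic_i)
    return {
        "WAIC": waic_i.sum(),
        "pWAIC": p_waic_i.sum(),
        "SE": np.sqrt(n_points * np.var(waic_i, ddof=1)),
        "pointwise": waic_i,
    }

# Synthetic example (assumption: NOT the milk data): intercept-only Gaussian model
rng = np.random.default_rng(0)
y = rng.normal(0.6, 0.15, size=17)
mu_samples = rng.normal(y.mean(), 0.15 / np.sqrt(len(y)), size=2000)
log_lik = norm.logpdf(y[None, :], loc=mu_samples[:, None], scale=0.15)

out = waic_summary(log_lik)
print({key: out[key] for key in ("WAIC", "pWAIC", "SE")})
```

One way to obtain dSE for a pair of models is to apply the same square-root formula to the difference of their `pointwise` vectors.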

The weight criterion is the *Akaike weight*, which is given for model i by the formula:

$$w_i = \frac{\exp(-0.5\,\mathrm{dWAIC}_i)}{\sum_j \exp(-0.5\,\mathrm{dWAIC}_j)}$$
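
A minimal sketch of this formula in NumPy is given below. The function name `akaike_weights` is hypothetical, and the WAIC values are made-up numbers used only to exercise the code; they are not results from the milk models.

```python
import numpy as np

def akaike_weights(waic):
    """Return (dWAIC, weights) for an array of WAIC values, one per model."""
    waic = np.asarray(waic, dtype=float)
    d_waic = waic - waic.min()       # difference to the best (lowest) WAIC
    unnorm = np.exp(-0.5 * d_waic)   # unnormalized weights
    return d_waic, unnorm / unnorm.sum()

# Hypothetical WAIC values for four models (illustration only)
waic_values = np.array([-15.0, -8.5, -8.0, -7.5])
d_waic, weights = akaike_weights(waic_values)
print("dWAIC  :", d_waic)
print("weights:", np.round(weights, 3))
```

Subtracting the minimum first keeps every exponent non-positive, so the computation cannot overflow even when the WAIC values are large in magnitude.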

In [10]:
data['log_mass'] = np.log(data['mass'])
data['neocortex'] = data['neocortex.perc'] / 100

**2) To better interpret the results of question 1, analyze the posterior estimates of the slope parameters for the different models. Present the results using box plots.**

# Exercise 3

Consider a random variable $\theta$ following a uniform distribution on [0, 1], and a dummy experiment with two possible outcomes (success and failure). The experiment is repeated n times, independently, and yields k successes. The probability of success in each trial is $\theta$. We denote by y the random variable describing the number of successes.

**1) Write the data likelihood and the posterior distribution $p(\theta|y)$ (up to a constant).**

**2) Estimate the posterior distribution $p(\theta|y)$ using the Laplace approximation. You will derive the computations yourself.**

For any probability density function (pdf) that is smooth and well peaked around its maximum, Laplace proposed to approximate it by a normal pdf. To do so, he used a second-order Taylor expansion of the log-pdf around its local maximum. Let us denote $g(\theta) = \log p(\theta|y)$ and by $\theta_0$ the point where it reaches its maximum.

Truncating the Taylor expansion at second order, we can write:
$$
g(\theta) \approx g(\theta_0) + g'(\theta_0)(\theta - \theta_0) + \frac{1}{2}g''(\theta_0)(\theta - \theta_0)^2.
$$

We know that g reaches a local maximum at $\theta_0$, therefore $g'(\theta_0) = 0$ and:

$$
g(\theta) \approx g(\theta_0) + \frac{1}{2}g''(\theta_0)(\theta - \theta_0)^2.
$$

If we exponentiate this expression we obtain:

$$
p(\theta|y) = \exp(g(\theta)) \approx \exp(g(\theta_0)) \exp\left(\frac{1}{2}g''(\theta_0)(\theta - \theta_0)^2\right).
$$

Since $g''(\theta_0) < 0$ at a maximum, this can be identified, up to a normalizing constant, with a Gaussian of mean $\theta_0$ and variance $-\frac{1}{g''(\theta_0)}$.
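
The recipe above can also be carried out numerically, which is a useful check on a hand derivation. The sketch below is one possible implementation, not a reference solution: the helper `laplace_approx` is hypothetical, the mode is found with `scipy.optimize.minimize_scalar`, and $g''(\theta_0)$ is estimated by a central finite difference. The values n = 20 and k = 6 are arbitrary. With a uniform prior and a binomial likelihood, the log-posterior is $k\log\theta + (n-k)\log(1-\theta)$ up to a constant, and the exact posterior is a Beta(k+1, n-k+1) distribution.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, beta

def laplace_approx(log_post, bounds, eps=1e-4):
    """Return (mode, sd) of the Gaussian approximating exp(log_post)."""
    res = minimize_scalar(lambda t: -log_post(t), bounds=bounds, method="bounded")
    theta0 = res.x
    # central finite-difference estimate of the second derivative at the mode
    g2 = (log_post(theta0 + eps) - 2 * log_post(theta0) + log_post(theta0 - eps)) / eps**2
    return theta0, np.sqrt(-1.0 / g2)

# Arbitrary example values (assumption, for illustration only)
n, k = 20, 6

def log_post(t):
    """Unnormalized log-posterior for a uniform prior and binomial likelihood."""
    return k * np.log(t) + (n - k) * np.log1p(-t)

theta0, sd = laplace_approx(log_post, bounds=(1e-6, 1 - 1e-6))

grid = np.linspace(0.001, 0.999, 500)
approx_pdf = norm.pdf(grid, loc=theta0, scale=sd)  # Laplace approximation
exact_pdf = beta.pdf(grid, k + 1, n - k + 1)       # exact posterior
print(f"mode = {theta0:.4f}, sd = {sd:.4f}")
print(f"max abs pdf difference = {np.max(np.abs(approx_pdf - exact_pdf)):.4f}")
```

The arrays `grid`, `approx_pdf` and `exact_pdf` can be passed directly to `plt.plot` to compare the two curves.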

**3) Plot the posterior pdf of $\theta$ obtained with the Laplace approximation and the true posterior.**
