# Exercises

## Problem 1: Heart Attacks in Rabbits

Here, we will study whether the damage caused by heart attacks in rabbits can be mitigated by cooling down the heart.

In an [experiment](https://link.springer.com/content/pdf/10.1007/BF00788947), researchers induced artery occlusion in $n=32$ rabbits. They studied whether cooling the heart before (group 1) or shortly after (group 2) inducing the heart attack reduces the damage to the heart, compared with a control group (group 3) that received no cooling.

We’ll assume the following linear model,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$$
where
- $y$ (“Infarc”): size of the damaged area in grams
- $x_1$ (“Area”): size of the area at risk in grams
- $x_2$ (“X2”): 1 if the rabbit is in group 1, 0 otherwise
- $x_3$ (“X3”): 1 if the rabbit is in group 2, 0 otherwise

Note: $x_2$ and $x_3$ are indicator (dummy) variables that together encode the categorical group variable. Group 3 is the control group and has $x_2 = x_3 = 0$.
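As a small illustration of this encoding, the sketch below maps a vector of group labels (1, 2, 3) to the two indicator columns. The function name `encode_groups` and the label vector are hypothetical examples, not part of the dataset, which already ships with the `X2` and `X3` columns.

```python
import numpy as np

def encode_groups(group):
    """Map group labels {1, 2, 3} to the indicator columns (x2, x3).

    Group 3 (control) is the reference level and gets x2 = x3 = 0.
    """
    group = np.asarray(group)
    x2 = (group == 1).astype(float)  # 1 for group 1 (cooled before the attack)
    x3 = (group == 2).astype(float)  # 1 for group 2 (cooled shortly after)
    return x2, x3

# hypothetical labels for five rabbits
x2, x3 = encode_groups([1, 2, 3, 3, 1])
print(x2)  # [1. 0. 0. 0. 1.]
print(x3)  # [0. 1. 0. 0. 0.]
```

Two indicators suffice for three groups. A third indicator for the control group would be linearly dependent on the intercept column, because the three indicators always sum to 1.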

(a) Make a scatter plot of the size of the damaged area vs. the area at risk. Plot the $3$ different groups in different colors and make sure to include a legend.
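A possible starting point for the plot is sketched below. It assumes a DataFrame with the columns `Infarc`, `Area`, `X2` and `X3`, as in the data file. The `demo` rows are made-up values used only to show the call, and the helper name `plot_groups` is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_groups(df, ax=None):
    """Scatter damaged area (Infarc) vs. area at risk (Area), one color per group."""
    if ax is None:
        _, ax = plt.subplots()
    masks = {
        "group 1 (cooled before)": df["X2"] == 1,
        "group 2 (cooled after)": df["X3"] == 1,
        "group 3 (control)": (df["X2"] == 0) & (df["X3"] == 0),
    }
    for label, mask in masks.items():
        ax.scatter(df.loc[mask, "Area"], df.loc[mask, "Infarc"], label=label)
    ax.set_xlabel("Area at risk [g]")
    ax.set_ylabel("Damaged area [g]")
    ax.legend()
    return ax

# made-up rows with the same columns as coolhearts.txt (not the real data)
demo = pd.DataFrame({
    "Infarc": [0.12, 0.30, 0.25, 0.45, 0.40, 0.70],
    "Area":   [0.50, 0.90, 0.60, 1.00, 0.70, 1.10],
    "X2":     [1, 1, 0, 0, 0, 0],
    "X3":     [0, 0, 1, 1, 0, 0],
})
ax = plot_groups(demo)
```

Returning the axes lets the fitted lines from a later part be drawn onto the same plot.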

(b) Write down the simplified equations for the expected damaged area for the 3 groups, i.e., plug in the values for $x_2$ and $x_3$ for these groups. What is the physical interpretation of $\beta_2$ and $\beta_3$?
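Substituting the indicator values into the model gives one line per group. All three lines share the slope $\beta_1$ and differ only in their intercepts:

$$
\begin{aligned}
\text{group 1 } (x_2 = 1,\ x_3 = 0)&: \quad E[y] = (\beta_0 + \beta_2) + \beta_1 x_1 \\
\text{group 2 } (x_2 = 0,\ x_3 = 1)&: \quad E[y] = (\beta_0 + \beta_3) + \beta_1 x_1 \\
\text{group 3 } (x_2 = 0,\ x_3 = 0)&: \quad E[y] = \beta_0 + \beta_1 x_1
\end{aligned}
$$

Read this way, under the assumed model $\beta_2$ is the expected difference in damaged area between group 1 and the control group for rabbits with the same area at risk, and $\beta_3$ is the corresponding difference for group 2.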

(c) Create the design matrix for the linear model described above.
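The design matrix has one row per rabbit and one column per coefficient: a column of ones for $\beta_0$, followed by $x_1$, $x_2$ and $x_3$. The sketch below uses six hypothetical rabbits (not the real data) to show the shape. It also checks the rank, which illustrates why the control group gets no indicator column of its own.

```python
import numpy as np

# hypothetical values for six rabbits, two per group (not the real data)
area = np.array([0.5, 0.9, 0.6, 1.0, 0.7, 1.1])  # x1: area at risk [g]
x2 = np.array([1, 1, 0, 0, 0, 0], dtype=float)   # group 1 indicator
x3 = np.array([0, 0, 1, 1, 0, 0], dtype=float)   # group 2 indicator

# columns: intercept, x1, x2, x3 -> one row per rabbit
X = np.column_stack([np.ones_like(area), area, x2, x3])
print(X.shape)                   # (6, 4)
print(np.linalg.matrix_rank(X))  # 4: all four coefficients are identifiable

# an extra indicator for the control group makes the columns linearly
# dependent, since x2 + x3 + x4 equals the intercept column
x4 = 1.0 - x2 - x3
X_bad = np.column_stack([X, x4])
print(np.linalg.matrix_rank(X_bad))  # still 4, although X_bad has 5 columns
```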

(d) Determine the best-fit predictions using `sklearn.linear_model.LinearRegression`.
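One detail to watch with `LinearRegression` is that it fits its own intercept by default. If the design matrix already contains the column of ones, pass `fit_intercept=False`. Alternatively, drop that column and keep the default. The sketch below checks both options on synthetic, noise-free data generated from hypothetical coefficients, so the fit should recover them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# synthetic, noise-free data from hypothetical coefficients (not the real data)
true_beta = np.array([0.1, 0.6, -0.25, -0.1])  # beta0, beta1, beta2, beta3
area = np.array([0.5, 0.9, 0.6, 1.0, 0.7, 1.1, 0.8, 1.2])
x2 = np.array([1, 1, 0, 0, 0, 0, 1, 0], dtype=float)
x3 = np.array([0, 0, 1, 1, 0, 0, 0, 1], dtype=float)
X = np.column_stack([np.ones_like(area), area, x2, x3])
y = X @ true_beta

# X already contains the ones column, so sklearn must not add another intercept
model = LinearRegression(fit_intercept=False).fit(X, y)
y_pred = model.predict(X)  # best-fit predictions
print(model.coef_)         # matches true_beta up to floating-point rounding

# equivalent: drop the ones column and let sklearn fit the intercept itself
model2 = LinearRegression().fit(X[:, 1:], y)
print(model2.intercept_, model2.coef_)
```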

(e) Add the best-fit predictions for each of the 3 groups onto the scatter plot from exercise (a).
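Because the three groups share the slope $\beta_1$, their best-fit predictions are three parallel lines. The sketch below evaluates them on a grid of `Area` values. The helper name `group_lines` and the coefficient values are hypothetical; in practice `beta` would come from the fitted model.

```python
import numpy as np

def group_lines(beta, area_min, area_max, num=50):
    """Return an x-grid and the fitted line of each group for the model
    E[y] = beta0 + beta1*x1 + beta2*x2 + beta3*x3."""
    b0, b1, b2, b3 = beta
    x = np.linspace(area_min, area_max, num)
    lines = {
        "group 1": (b0 + b2) + b1 * x,  # x2 = 1, x3 = 0
        "group 2": (b0 + b3) + b1 * x,  # x2 = 0, x3 = 1
        "group 3": b0 + b1 * x,         # control: x2 = x3 = 0
    }
    return x, lines

# hypothetical coefficients standing in for a fitted model
x, lines = group_lines([0.1, 0.6, -0.25, -0.1], 0.5, 1.2)
# with an existing Matplotlib axes `ax` from part (a):
# for label, y_line in lines.items():
#     ax.plot(x, y_line, label=f"fit, {label}")
```

Drawing each line in the same color as the corresponding scatter points makes the group offsets easy to compare.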

(f) Do the results suggest that cooling down the heart mitigates heart attack damage?
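A negative $\beta_2$ or $\beta_3$ means less damage than in the control group at the same area at risk. To judge whether an estimate is distinguishable from zero, one common approach (not required by the exercise) is to compare it with its standard error using the classical OLS formula $\widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma^2 (X^\top X)^{-1}$. The sketch below applies this to synthetic data with a built-in negative group 1 effect. The helper name `ols_summary` and all data values are hypothetical.

```python
import numpy as np

def ols_summary(X, y):
    """OLS estimates, standard errors and t-statistics, assuming
    homoscedastic errors: Var(beta_hat) = sigma^2 (X^T X)^{-1}."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)       # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # estimated covariance of beta_hat
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# synthetic data with a clearly negative group 1 effect (not the real data)
rng = np.random.default_rng(0)
n = 30
area = rng.uniform(0.5, 1.5, n)
group = np.repeat([1, 2, 3], n // 3)
x2, x3 = (group == 1).astype(float), (group == 2).astype(float)
X = np.column_stack([np.ones(n), area, x2, x3])
y = 0.1 + 0.6 * area - 0.25 * x2 - 0.02 * x3 + rng.normal(0, 0.05, n)

beta, se, t = ols_summary(X, y)
print(np.round(t[2:], 1))  # t-statistics for beta2 and beta3
```

A large negative t-statistic (roughly below -2) is evidence that the corresponding cooling treatment reduces damage relative to the control group.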

(This exercise was partially inspired by https://online.stat.psu.edu/stat462/node/134/.)

# download and read the data
import numpy as np
import pandas as pd
from urllib.request import urlretrieve

def design_matrix(df):
    """Return the n x 4 design matrix with columns [1, x1, x2, x3]
    for the model y = b0 + b1*x1 + b2*x2 + b3*x3 + eps."""
    return np.column_stack([
        np.ones(len(df)),                  # intercept column (beta_0)
        df["Area"].to_numpy(dtype=float),  # x1: area at risk in grams
        df["X2"].to_numpy(dtype=float),    # x2: 1 if group 1, else 0
        df["X3"].to_numpy(dtype=float),    # x3: 1 if group 2, else 0
    ])

urlretrieve("https://online.stat.psu.edu/stat462/sites/onlinecourses.science.psu.edu.stat462/files/data/coolhearts/index.txt", 'coolhearts.txt')
df = pd.read_csv('coolhearts.txt', sep='\t')

# split the rabbits by treatment group (useful for plotting)
group1 = df[df['X2'] == 1]
group2 = df[df['X3'] == 1]
group3 = df[(df['X2'] == 0) & (df['X3'] == 0)]  # control group

# fit one model to all 32 rabbits rather than to one group at a time
X = design_matrix(df)
y = df["Infarc"].to_numpy(dtype=float)

# least-squares estimate of beta; lstsq avoids explicitly inverting X.T @ X
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_pred = X @ beta  # best-fit predictions for every rabbit
print(beta)
