<a href="https://colab.research.google.com/github/francji1/01ZLMA/blob/main/code/01ZLMA_ex03_GLM_statistical_inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 01ZLMA - Exercise 03
Exercise 03 of the course 01ZLMA.

## Contents

* Statistical Inference
* Testing




#  Necessary theory recap from Lecture 04

Under regularity conditions, the following hold asymptotically:

1. $U(\beta) \sim N_{p}(0,I(\beta)) \Rightarrow  I^{-\frac{1}{2}}(\beta)\, U(\beta) {\stackrel{D}{\longrightarrow}} N_{p}(0, \mathbb{I}_p)$, where $\mathbb{I}_p$ is the $p \times p$ identity matrix
2. $U(\beta)^T I^{-1}(\beta)U(\beta)  {\stackrel{D}{\longrightarrow}} \chi^{2}(p)$
3. Consistency of $\hat{\beta}$ and the Wald statistic:
 $\hat{\beta}\sim N_{p}(\beta,I^{-1}(\beta)) \Rightarrow
(\hat{\beta}-\beta)^T I(\beta)(\hat{\beta}-\beta) {\stackrel{D}{\longrightarrow}} \chi^{2}(p)$
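
The second statement can be checked by simulation. The sketch below is only an illustration under assumed settings: the Poisson log-link design, the coefficients, the seed, and the number of simulations are choices made here, not part of the lecture. It evaluates the score at the true $\beta$ and compares the statistic $U^T I^{-1} U$ with the $\chi^2(p)$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)          # fixed seed, chosen only for reproducibility

# Assumed illustrative design: intercept + log(x), x = 1..20 replicated twice
x = np.tile(np.arange(1, 21), 2)
X = np.column_stack([np.ones(x.size), np.log(x)])
beta = np.array([0.9, 1.3])             # assumed true coefficients
mu = np.exp(X @ beta)

info = X.T @ (X * mu[:, None])          # Fisher information I(beta) = X^T W X
info_inv = np.linalg.inv(info)

n_sim = 5000
score_stats = np.empty(n_sim)
for s in range(n_sim):
    y = rng.poisson(mu)
    U = X.T @ (y - mu)                  # score evaluated at the true beta
    score_stats[s] = U @ info_inv @ U   # U^T I^{-1} U

p = X.shape[1]
print("mean of the score statistic:", score_stats.mean())   # chi2(p) has mean p
print("empirical 95% quantile     :", np.quantile(score_stats, 0.95))
print("chi2(p) 95% quantile       :", stats.chi2.ppf(0.95, df=p))
```

The empirical mean should be close to $p$, and the empirical quantile close to the theoretical one, up to Monte Carlo error.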



Saturated and null model

* Null model: $\mu_i = \mu, \forall i \in \{1, \ldots , n\}$ \\
The null model assumes a single common mean for all data points, so only 1 parameter is estimated.
* Saturated model: $\hat{\mu}_i = Y_i, \forall i \in \{1, \ldots , n\}$ \\
The saturated model gives each data point its own parameter, so there are $n$ parameters to estimate.
* Proposed model: a model that explains the data with $p$ parameters plus an intercept term, i.e. $p+1$ parameters in total, where $1 \leq p \leq n$.

Questions:
* What is the difference between null and saturated model?
* Which of the two models has the greater log-likelihood value?
* Which model has the highest log-likelihood value?
* What can you say about asymptotic distributions of $\hat{\beta}$ and $U(\hat{\beta})$ for saturated model?

# Download custom library from GitHub
(using `wget` library)

In [None]:
# Please note that this cell may not work in environments other than Google Colab
!pip install wget
import wget
url = "https://github.com/francji1/01ZLMA/raw/main/code/helpers.py"
wget.download(url, '../content/helpers.py')  # path where Colab can find libraries

## Let's code

In [None]:
import numpy as np
import scipy
from scipy import stats

import statsmodels.api as sm
import statsmodels.formula.api as smf
import sklearn

import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.graphics.api import abline_plot

import pandas as pd

from helpers import DiagnosticPlots, Anova

sns.set_theme()

## Exercise 1: IWLS for Poisson Regression from the last week

Generate synthetic data from a Poisson generalized linear model (GLM) using a canonical **log link**.

- Set the number of observations: $N = n \cdot m = 40$, with $n = 20$ distinct covariate values ($i = 1,\ldots,20$), each replicated $m=2$ times
- Design matrix:
$$
X = \begin{bmatrix}
1 & \log(x_{2,1}) \\
1 & \log(x_{2,2}) \\
\vdots & \vdots \\
1 & \log(x_{2,n})
\end{bmatrix},\quad x_{2,i}=i
$$
- Choose regression coefficients:
$$
\beta = \begin{bmatrix} 0.5 \\ 1.2 \end{bmatrix}
$$

Generate response variable $Y$ from:
$$
\lambda_i = e^{X_i\beta}, \quad Y_i \sim \text{Poisson}(\lambda_i)
$$


- Manually implement the IWLS algorithm in Python (or R):

  - Derive and compute the weights ($W$).
  - Derive and calculate the adjusted response ($Z$).
  - Derive and write the IWLS update.
  - Update the regression coefficients iteratively until convergence.
  - Compare your IWLS estimates with a standard GLM package.
  - Discuss convergence, correctness, and interpretability of the results.
---



---

###  Theoretical Derivation

Explicitly derive step-by-step:

**Step 1: Model and Link Function**
$$
Y_i \sim \text{Poisson}(\lambda_i), \quad i = 1,2,\dots,n \times m
$$


- Canonical link function:
$$
g(\lambda_i) = \log(\lambda_i) = X_i\beta = \beta_0 + \beta_1 \log(x_{2,i})
$$

 - The mean parameter:

$$
\lambda_i =  e^{X_i\beta} = e^{\beta_0}(x_{2,i})^{\beta_1}
$$


**Step 2: Log-Likelihood**
- The Poisson probability mass function for each observation $Y_i$:

$$
f(Y_i|\lambda_i) = \frac{\lambda_i^{Y_i} e^{-\lambda_i}}{Y_i!},\quad i=1,2,\dots,n\times m
$$
- Log-Likelihood
$$
\ell(\beta) = \sum_{i=1}^{n\times m} \left[ Y_i(X_i\beta) - e^{X_i\beta} - \log(Y_i!) \right]
$$

**Step 3: Score Function**

$$
U(\beta) = \frac{\partial \ell(\beta)}{\partial \beta} = X^T(Y - \mu), \quad \mu = e^{X\beta}
$$

**Step 4: Fisher Information Matrix**

$$
I(\beta) = X^T W X, \quad W = \text{diag}(\mu)
$$
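
The score and information formulas from Steps 3 and 4 can be sanity-checked numerically. The following is a minimal sketch with assumed data (the design mirrors the exercise, the coefficients and seed are arbitrary, and the helper names `loglik`, `score`, `fisher_info` are introduced here for illustration only). It compares the analytic expressions with central finite differences of the log-likelihood.

```python
import numpy as np
from scipy.special import gammaln

def loglik(beta, X, y):
    """Poisson log-likelihood with log link."""
    eta = X @ beta
    return np.sum(y * eta - np.exp(eta) - gammaln(y + 1))

def score(beta, X, y):
    """Analytic score U(beta) = X^T (y - mu)."""
    return X.T @ (y - np.exp(X @ beta))

def fisher_info(beta, X):
    """Analytic Fisher information I(beta) = X^T W X with W = diag(mu)."""
    mu = np.exp(X @ beta)
    return X.T @ (X * mu[:, None])

rng = np.random.default_rng(1)
x = np.tile(np.arange(1, 21), 2)
X = np.column_stack([np.ones(x.size), np.log(x)])
beta = np.array([0.9, 1.3])              # assumed coefficients
y = rng.poisson(np.exp(X @ beta))

h = 1e-5
eye = np.eye(beta.size)
# Central differences of the log-likelihood approximate the score
num_score = np.array([(loglik(beta + h * e, X, y) - loglik(beta - h * e, X, y)) / (2 * h)
                      for e in eye])
# For the canonical link the observed and expected information coincide,
# so minus the derivative of the score approximates X^T W X
num_info = -np.column_stack([(score(beta + h * e, X, y) - score(beta - h * e, X, y)) / (2 * h)
                             for e in eye])

print("score matches      :", np.allclose(num_score, score(beta, X, y), rtol=1e-5, atol=1e-4))
print("information matches:", np.allclose(num_info, fisher_info(beta, X), rtol=1e-5))
```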

**Step 5: IWLS Update Equations**

$$
\beta^{(t+1)} = \beta^{(t)} + \left[X^T W X\right]^{-1} X^T W (Z - X\beta^{(t)})
$$
with the adjusted response:
$$
Z = X\beta^{(t)} + \frac{Y - \mu}{\mu}
$$

Each iteration involves solving the weighted least squares equation:
$$
(X^T W X)\beta^{(t+1)} = X^T W Z
$$
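
The two forms of the update in Step 5 are algebraically identical, because $W(Z - X\beta^{(t)}) = Y - \mu$. A small numerical sketch of one iteration, using assumed data and an arbitrary current iterate `beta_t`, illustrates this:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.tile(np.arange(1, 21), 2)
X = np.column_stack([np.ones(x.size), np.log(x)])
y = rng.poisson(np.exp(X @ np.array([0.9, 1.3])))   # assumed true coefficients

beta_t = np.array([1.0, 1.0])            # current iterate beta^(t), chosen arbitrarily
mu = np.exp(X @ beta_t)
W = np.diag(mu)                          # working weights
Z = X @ beta_t + (y - mu) / mu           # adjusted response

# Form 1: Fisher scoring, beta + I^{-1} U
step_scoring = beta_t + np.linalg.solve(X.T @ W @ X, X.T @ (y - mu))
# Form 2: weighted least squares on the adjusted response
step_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ Z)

print(step_scoring)
print(step_wls)
```

Both printed vectors should agree up to floating-point error, which is why the algorithm can be implemented as a sequence of weighted least squares fits.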

---


### Solution
Solution based on Example 2 from Exercise 02. Note that the code below uses $\beta = (0.9, 1.3)^T$ rather than the values stated in the task.

In [None]:
n  = 20 # n distinct values of the regressor
m  = 2 # m replicates of each regressor value (n*m observations in total)
X1 = np.ones((n*m,))  # Intercept
X2 = np.array([i for i in range(1, n+1)] * m) # Regressors
X = np.vstack([X1, np.log(X2)]).T # design matrix
beta = np.array([0.9, 1.3]) # Regression coefficients
lamdas = np.exp(X @ beta) # Means
Y = np.random.poisson(lamdas, n*m) # Response variable with Poisson distribution

d = pd.DataFrame(data={'Y': Y, 'X1': X1, 'X2':X2})
d.head()

In [None]:
model = smf.glm(formula='Y~np.log(X2)', data=d, family=sm.families.Poisson()).fit()
print(model.summary())


In [None]:
# standard api requires specifying endog (response) and exog (explanatory) design matrices
model = sm.GLM(endog=Y, exog=X, family=sm.families.Poisson()).fit()
print(model.summary())


In [None]:
# Plot data
beta_e = model.params; print(f'estimated params are:{beta_e}')
y_hat = model.predict(); print(f'fitted values are:{y_hat}')

fig, ax = plt.subplots()
ax.scatter(X2, Y, color='red', marker='o')
ax.plot(X2[:n], y_hat[:n], color='blue')  # fitted means plotted against the regressor, not the array index
ax.set_title('Poisson model')
ax.set_xlabel('Time Index')
ax.set_ylabel('Number of cases')
plt.show()

The same fit repeated with custom functions:

In [None]:
# function to calculate the weight matrix W = diag(mu) (despite its name, it returns W, not its inverse)
def calc_W_inv(X, beta):
    return np.diag(np.exp(X @ beta))

In [None]:
# function to calculate the adjusted response Z
def calc_Z(X,Y,beta):
    return X@beta + (Y - np.exp(X@beta)) / np.exp(X@beta)

In [None]:
# IWLS for example 2

def IWLS(X, Y, beta_init, maxiter, epsilon):
    res = {'FM': None, 'SV': None, 'betas': None}
    # Fisher-scoring algorithm
    beta_i = beta_init

    for _ in range(maxiter):
        W = calc_W_inv(X, beta_i)
        Z = calc_Z(X, Y, beta_i)
        beta_pred = beta_i
        # weighted least squares step: (X^T W X) beta = X^T W Z
        beta_i = np.linalg.solve(X.T@W@X, X.T@W@Z)
        if np.max(np.abs(beta_i - beta_pred)) < epsilon:
            break

    # store the results at the final estimate (also when convergence is reached in the first iteration)
    W = calc_W_inv(X, beta_i)
    Z = calc_Z(X, Y, beta_i)
    res['SV'] = X.T@W@Z      # right-hand side of the weighted normal equations
    res['FM'] = X.T@W@X      # estimated Fisher information matrix
    res['betas'] = beta_i
    return res

In [None]:
# Estimation of betas
# note: in Python `10^(-6)` is a bitwise XOR (equal to -16), not a power; use 1e-6 instead
result1 = IWLS(X,Y,np.ones(2),100,1e-6)
print(f'Estimation of parameters: {result1["betas"]}')                # Estimation of parameters
print(f'Estimated Fisher information matrix: {result1["FM"]}')        # Estimated Fisher information matrix
print(f'Estimated covariance matrix: {np.linalg.inv(result1["FM"])}') # Estimated covariance matrix  = Inverse of estimated Fisher information matrix


Comparison of our custom solution with the built-in GLM function:

In [None]:
print(model.summary())
FIM1 = model.cov_params()
print(f'estimated covariance matrix {FIM1}')

In [None]:
# list the public attributes and methods of the `model` object
for attr in dir(model):
    if not attr.startswith('_'):
        print(attr)

Asymptotics:

* $(\hat{\beta} - \beta) \stackrel{\text{approx.}}{\sim} N_{p}(0, I^{-1}(\beta))$
* Estimated Fisher information matrix: $\hat{I}(\hat{\beta}) = X^T \hat{W} X$
* Estimated covariance matrix: $\hat{V} (\hat{\beta}) = (X^T \hat{W} X)^{-1}$
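
One consequence of these formulas is that replicating the design $m$ times multiplies the information by $m$, so the asymptotic standard errors shrink like $1/\sqrt{m}$. The sketch below illustrates this at assumed true coefficients; the helper `asymptotic_se` and its default values are introduced here for illustration only.

```python
import numpy as np

def asymptotic_se(m, beta=(0.9, 1.3), n=10):
    """Asymptotic standard errors sqrt(diag((X^T W X)^{-1})) for m replicates of x = 1..n."""
    beta = np.asarray(beta)
    x = np.tile(np.arange(1, n + 1), m)
    X = np.column_stack([np.ones(x.size), np.log(x)])
    mu = np.exp(X @ beta)                  # Poisson means under the log link
    info = X.T @ (X * mu[:, None])         # Fisher information X^T W X
    return np.sqrt(np.diag(np.linalg.inv(info)))

for m in (1, 4, 16, 100):
    print(f"m = {m:3d}: SE = {asymptotic_se(m)}")
```

Quadrupling the number of replicates should halve both standard errors.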

In [None]:
n = 10
repet = 50
n_observ = np.array([1,2,5,10,100, 500])  # number of replicates of each regressor value
betas_hat = np.zeros((len(n_observ), repet, 2))
beta = np.array([0.9, 1.3]) # Regression coefficients

for k, reps in enumerate(n_observ):
    # the design matrix and the means depend only on the number of replicates
    X1 = np.ones((n*reps,))
    X2 = np.tile(np.arange(1, n+1), reps)
    X  = np.vstack([X1, np.log(X2)]).T
    lamdas = np.exp(X @ beta) # Means
    for j in range(repet):
        Y = np.random.poisson(lamdas, n*reps)
        betas_hat[k, j] = sm.GLM(endog=Y, exog=X, family=sm.families.Poisson()).fit().params


In [None]:

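for i in range(len(n_observ)):
    print(f"Number of observations: {n_observ[i]*n}")
    print(np.cov((betas_hat[i] - beta).T))         # empirical covariance matrix of the estimates
    print(np.mean(betas_hat[i] - beta, axis=0))    # empirical bias of each coefficient separately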

## Hypothesis testing

Use the model from the beginning again.

In [None]:
# Data generation
#np.random.seed(0)
#n = 30
#X1 = np.ones(n)
#X2 = np.log(np.arange(1, n + 1))
#X = np.column_stack([X1, X2])
#beta = np.array([0.5, 1.2])
#mu = np.exp(X @ beta)
#Y = np.random.poisson(mu)


In [None]:
# Data generation
n  = 20
m  = 2

X1 = np.ones((n*m,))
X2 = np.array([i for i in range(1, n+1)]*m)
X  = np.vstack([X1, np.log(X2)]).T
beta = np.array([0.9, 1.3]) # Regression coefficients (defined before they are used)
mu = np.exp(X @ beta)
lamdas = np.exp(X @ beta) # Means
Y = np.random.poisson(lamdas, n*m)


In [None]:
from scipy.stats import chi2


# Fit full and reduced models
model_full = sm.GLM(Y, X, family=sm.families.Poisson()).fit()
X_reduced = X[:, [0]]
model_reduced = sm.GLM(Y, X_reduced, family=sm.families.Poisson()).fit()

# Wald test (individual parameters)
wald_stats = (model_full.params / model_full.bse)**2
wald_pvalues = chi2.sf(wald_stats, 1)

# Joint Wald test
wald_stat_joint = model_full.params.T @ np.linalg.inv(model_full.cov_params()) @ model_full.params
wald_pvalue_joint = chi2.sf(wald_stat_joint, len(model_full.params))

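# Score test (omit X2 manually): information evaluated under the reduced model
# and adjusted for the estimated intercept (partitioned Fisher information)
mu_reduced = model_reduced.mu
score_vector = X[:, 1].T @ (Y - mu_reduced)
I_all = X.T @ (X * mu_reduced[:, None])
I_reduced = I_all[1, 1] - I_all[1, 0]**2 / I_all[0, 0]
score_stat = (score_vector**2) / I_reduced
score_pvalue = chi2.sf(score_stat, 1)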

# Deviances and LRT (manual)
deviance_full = model_full.deviance
deviance_reduced = model_reduced.deviance
lrt_stat = deviance_reduced - deviance_full
lrt_df = int(model_reduced.df_resid - model_full.df_resid)  # number of omitted parameters
lrt_pvalue = chi2.sf(lrt_stat, abs(lrt_df))

# Built-in tests
wald_test_sm = model_full.wald_test(np.eye(len(beta)), scalar=True)
score_test_sm = model_reduced.score_test(X[:, [1]])

# --- Results ---
print("Manual Computations:\n")
print(f"Wald test Intercept: Statistic = {wald_stats[0]:.4f}, p-value = {wald_pvalues[0]:.4f}")
print(f"Wald test X2: Statistic = {wald_stats[1]:.4f}, p-value = {wald_pvalues[1]:.4f}")

print(f"\nJoint Wald test: Statistic = {wald_stat_joint:.4f}, p-value = {wald_pvalue_joint:.4f}")

print(f"\nScore test (omit X2): Statistic = {score_stat:.4f}, p-value = {score_pvalue:.4f}")

print(f"\nDeviance (Full): {deviance_full:.4f}")
print(f"Deviance (Reduced): {deviance_reduced:.4f}")

print(f"\nLikelihood Ratio Test (omit X2): Statistic = {lrt_stat:.4f}, p-value = {lrt_pvalue:.4f}, df = {abs(lrt_df)}")

print("\nBuilt-in Tests from Statsmodels:\n")
print(f"Wald test (built-in): Statistic = {wald_test_sm.statistic:.4f}, p-value = {wald_test_sm.pvalue:.4f}")

# Correct handling of score_test_sm output
score_stat_sm_scalar = score_test_sm[0][0][0] if score_test_sm[0].size > 1 else score_test_sm[0].item()
score_pvalue_sm_scalar = score_test_sm[1].item()

print(f"\nScore test (built-in): Statistic = {score_stat_sm_scalar:.4f}, p-value = {score_pvalue_sm_scalar:.4f}")


In [None]:
from scipy.special import gammaln

# `model_full` is used here because it was fitted on the current Y
# (`model` from the first part was fitted on earlier data)

# Log-likelihood manually (matching model_full.llf)
loglike_manual = np.sum(Y * np.log(model_full.mu) - model_full.mu - gammaln(Y + 1))
print("Manual Log-Likelihood:", loglike_manual)
print("Built-in Log-Likelihood:", model_full.llf)

# Pearson chi2 manually
pearson_chi2 = np.sum(((Y - model_full.mu)**2) / model_full.mu)
print("Manual Pearson chi2:", pearson_chi2)
print("Built-in Pearson chi2:", model_full.pearson_chi2)




In [None]:
model = sm.GLM(endog=Y, exog=X, family=sm.families.Poisson()).fit()
print(model.summary())

# the unscaled (dispersion = 1) estimated covariance matrix of the estimated coefficients.
FIM1 = model.cov_params()
print(f'estimated covariance matrix {FIM1}')


**Calculation** of the Z value
 $$Z_i = \frac{\hat{\beta}_i}{\sqrt{(I^{-1}(\hat{\beta}))_{ii}}}$$

In [None]:
# Testing statistics from summary table
print(model.summary())



In [None]:
# By definition

z_stat = model.params / np.sqrt(np.diag(model.cov_params()))
print(z_stat)
print(np.allclose(z_stat, model.tvalues))  # compare with a tolerance instead of exact float equality

In [None]:
# p-values of the test
p_val = 2*scipy.stats.norm.sf(np.abs(z_stat), loc=0, scale=1)  # two-sided: use |z|
print(f'pval: {p_val}')
print(np.allclose(p_val, model.pvalues))

In [None]:
### 100(1-alpha) confidence interval
alpha = 0.05
u = scipy.stats.norm.ppf(1-alpha/2,0,1)
CI_LB = model.params[1] - u * np.sqrt(np.diag(model.cov_params())[1])
CI_UB = model.params[1] + u * np.sqrt(np.diag(model.cov_params())[1])

print(f"2.5% CI = {CI_LB},ESTIM = {model.params[1]}, 97.5% CI = {CI_UB}")


# built in function
print(model.conf_int())

Question:

* Compare hypothesis testing in LM vs. GLM

## Comparison of inference in Linear Regression (LR) and Generalized Linear Models (GLM)

### Dimension notation:

- $n$: number of observations (rows in dataset)
- $p$: number of parameters (including intercept, columns in $X$)
- $q$: number of parameters tested simultaneously in multiple-parameter tests (difference between full and reduced model parameters)

---

### Hypothesis tests summary table

| Model Type       | Test type                      | Distribution         | Name(s) used       |
|------------------|--------------------------------|----------------------|--------------------|
| **Linear (OLS)** | Individual parameter           | $t_{n-p}$            | t-test             |
| **Linear (OLS)** | Multiple parameters            | $F_{q,n-p}$          | F-test (ANOVA)     |
| **GLM (MLE)**    | Individual parameter           | $N(0,1)$             | Wald test, z-test  |
| **GLM (MLE)**    | Multiple parameters (Wald)     | $\chi^2_q$           | Wald test          |
| **GLM (MLE)**    | Likelihood Ratio Test (nested) | $\chi^2_q$           | LRT, Chi-squared   |
| **GLM (MLE)**    | Score (Rao) test               | $\chi^2_q$           | Score test         |
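
To see how the exact LM reference distributions relate to the asymptotic GLM ones, the critical values can be compared directly. This is a small sketch; the significance level, the degrees of freedom, and the value of `q` are arbitrary illustrative choices.

```python
from scipy import stats

alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)

# t critical values approach the normal one as the residual df grows
for df in (5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    print(f"df = {df:5d}: t = {t_crit:.4f}, z = {z_crit:.4f}")

# Likewise q * F_{q, n-p} approaches chi2_q for a large denominator df
q = 3
f_scaled = q * stats.f.ppf(1 - alpha, q, 10**6)
chi2_crit = stats.chi2.ppf(1 - alpha, q)
print(f"q * F critical value = {f_scaled:.4f}, chi2 critical value = {chi2_crit:.4f}")
```

For small residual degrees of freedom the $t$ critical value is noticeably larger than the normal one, which is why the normal approximation in GLMs should be treated with care in small samples.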

---



### Aspect-by-aspect comparison

| Aspect                      | Linear Regression (LR)                                 | Generalized Linear Models (GLM)                           |
|-----------------------------|--------------------------------------------------------|------------------------------------------------------------|
| **Model specification**     | $$Y = X\beta + \varepsilon,\quad \varepsilon \sim N(0,\sigma^2 I)$$ | $$g(\mu) = X\beta,\quad Y\sim\text{Exponential family}$$  |
| **Estimator type**          | OLS (also MLE under normality)                         | MLE                                                        |
| **Variance estimation**     | Explicitly estimated: $$\hat{\sigma}^2 = \frac{\|Y - X\hat{\beta}\|^2}{n-p}$$ | Implicitly determined by mean-variance relationship (no separate parameter) |
| **Estimator distribution**  | Exact finite-sample: $$\frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} \sim t_{n-p}$$ | Asymptotic (large-sample): $$\frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} \xrightarrow{d} N(0,1)$$ |
| **Test for single parameter** | $$t = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \sim t_{n-p}$$ | Wald test (z-test): $$Z = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}\sim N(0,1)$$ |
| **Test for multiple parameters** | F-test: $$F = \frac{(RSS_0 - RSS_1)/q}{RSS_1/(n-p)}\sim F_{q,n-p}$$ | Likelihood Ratio or Wald test: $$\chi^2 = 2(l_{\text{full}}-l_{\text{reduced}})\sim\chi^2_q$$ |
| **Small sample inference**  | Exact (t and F-distributions)                          | Approximate (not exact, relies on large-sample assumptions) |
| **Large sample inference**  | t-distribution converges to normal (z-test)            | Normal (z-test, Wald test) approximation                   |


---



# Inference and Hypothesis Tests in GLM (statsmodels)

GLM inference typically relies on Maximum Likelihood Estimation (MLE) using iteratively reweighted least squares (IRLS).


### Wald Test
- **Formula:**
  $$
  W = (\hat{\beta} - \beta_0)^T I(\hat{\beta}) (\hat{\beta} - \beta_0)
  $$
- Evaluated at the unrestricted estimate $\hat{\beta}$; $I^{-1}(\hat{\beta})$ is the estimated covariance matrix (Hessian-based).
- Assumes large-sample normality of parameter estimates.

### Likelihood Ratio Test (LRT)
- **Formula:**
  $$
  LR = 2(l_{full} - l_{reduced})
  $$
- Compares log-likelihoods of nested models.
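- Requires fitting both models; under $H_0$ it asymptotically follows a $\chi^2_q$ distribution, where $q$ is the difference in the number of parameters. It is often preferred over the Wald test in small samples.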

### Score (Rao) Test
- **Formula:**
  $$
  R = s(\hat{\beta_0})^T [I(\hat{\beta_0})]^{-1}s(\hat{\beta_0})
  $$
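  - Where $s$ is the gradient of the log-likelihood (the score function, denoted $U$ in the recap), $I$ is the Fisher information, and $\hat{\beta_0}$ is the estimate under the restricted model.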
- Only requires the restricted model (does not fit the full model).
- Computationally efficient for large models.
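
For an i.i.d. Poisson sample and the hypothesis $H_0: \lambda = \lambda_0$, all three statistics have closed forms, which makes their differences easy to see: the Wald statistic uses the information at the MLE $\bar{y}$, the score statistic uses the information under $H_0$, and the LRT compares the two log-likelihoods. The sketch below uses simulated data; the function name `poisson_mean_tests`, the sample size, and both means are assumptions made for illustration.

```python
import numpy as np
from scipy import stats
from scipy.special import xlogy

def poisson_mean_tests(y, lam0):
    """Wald, score and LRT statistics for H0: lambda = lam0 in an i.i.d. Poisson sample."""
    y = np.asarray(y, dtype=float)
    n, ybar = y.size, y.mean()                 # the MLE of lambda is the sample mean
    wald = n * (ybar - lam0) ** 2 / ybar       # (ybar - lam0)^2 * I(ybar), with I(lambda) = n / lambda
    score = n * (ybar - lam0) ** 2 / lam0      # U(lam0)^2 / I(lam0)
    lrt = 2 * n * (xlogy(ybar, ybar / lam0) - (ybar - lam0))
    return {"Wald": wald, "Score": score, "LRT": lrt}

rng = np.random.default_rng(3)
y = rng.poisson(4.5, size=50)                  # data generated with mean 4.5 (assumed)
results = poisson_mean_tests(y, lam0=4.0)      # H0: lambda = 4
for name, stat in results.items():
    print(f"{name:5s}: statistic = {stat:.4f}, p-value = {stats.chi2.sf(stat, df=1):.4f}")
```

The three statistics are asymptotically equivalent under $H_0$, but in finite samples they generally differ, as the printed values show.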

## Deviance
- **Definition:**
  $$
  D = 2(l_{sat} - l_{model})
  $$
  - Measures discrepancy between fitted model and the saturated (fully parameterized) model.
- Used primarily to assess overall model fit, smaller deviance indicates better fit.



# Deviance

Deviance is a measure of goodness of fit of a GLM.


The log-likelihood of the saturated model is the highest achievable for the given data; it is attained at $\tilde{\mu}_i = y_i$ and $\tilde{\theta}_i = \theta(y_i) = (b')^{-1}(y_i)$.
$$l(\tilde{\mu},\phi;y)=\sum_{i=1}^{n}\frac{y_{i}\tilde{\theta}_{i}-b(\tilde{\theta}_{i})}{a_{i}(\phi)}+\sum_{i=1}^{n}c(y_i,\phi)$$

Scaled deviance statistic:
$${S(y,\hat{\mu},\phi)}=2\left[l(\tilde{\mu},\phi;y)-l(\hat{\mu},\phi;y)\right]
=2\sum_{i=1}^{n}\frac{y_{i}(\tilde{\theta}_{i}-\hat{\theta}_{i})
-\left(b(\tilde{\theta}_{i})-b(\hat{\theta}_{i})\right)}{a_{i}(\phi)}.
$$

Deviance:
Let $a_{i}(\phi)=a_{i}\phi$, then
$$S(y,\hat{\mu},\phi)=\frac{D(y,\hat{\mu})}{\phi},
$$
and
$$
D(y,\hat{\mu})=2\sum_{i=1}^{n}\frac{y_{i}(\tilde{\theta}_{i}-\hat{\theta}_{i})
-\left(b(\tilde{\theta}_{i})-b(\hat{\theta}_{i})\right)}{a_{i}}
$$
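
For the Poisson family, $\theta = \log\mu$, $b(\theta) = e^{\theta}$ and $a_i = 1$, so the general formula can be evaluated directly. The sketch below uses made-up counts and hypothetical fitted means `mu_hat`, and the helper names are introduced here. It checks that the general formula agrees with twice the log-likelihood difference between the saturated and the fitted model.

```python
import numpy as np
from scipy.special import xlogy, gammaln

def poisson_loglik(y, mu):
    """Poisson log-likelihood; xlogy makes the y = 0, mu = 0 case contribute 0."""
    return np.sum(xlogy(y, mu) - mu - gammaln(y + 1))

def poisson_deviance_general(y, mu_hat):
    """Deviance from the exponential-family formula with theta = log(mu), b(theta) = exp(theta)."""
    theta_term = xlogy(y, y) - xlogy(y, mu_hat)   # y * (theta_tilde - theta_hat)
    b_term = y - mu_hat                           # b(theta_tilde) - b(theta_hat)
    return 2 * np.sum(theta_term - b_term)

y = np.array([0, 2, 5, 1, 7, 3])                      # made-up counts, including a zero
mu_hat = np.array([0.8, 2.5, 4.0, 1.5, 6.0, 3.2])     # hypothetical fitted means

D_formula = poisson_deviance_general(y, mu_hat)
D_definition = 2 * (poisson_loglik(y, y) - poisson_loglik(y, mu_hat))
print(D_formula, D_definition)
```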

### Comparison of two models

Assume a model with $p_0$ parameters and deviance $D_0$, and its sub-model with $p_1 < p_0$ parameters and deviance $D_1 \geq D_0$. Then, if the sub-model holds, asymptotically
$$ \frac{1}{\phi} (D_1 - D_0) \sim \chi^2_{(p_0 - p_1)} $$

Question:
* Can we take deviance as a measure of the model quality?
* Can we use deviance as a measure of the saturated model quality?
* Complete the sentence: Comparing two GLMs using deviance is like comparing two LMs using ...

In [None]:
# Add random variable to the previous model
Z = scipy.stats.uniform.rvs(loc=0, scale=1, size=n*m)
model_0 = sm.GLM(endog=Y, exog=np.hstack([X, Z[:, None]]), family=sm.families.Poisson()).fit()
print(model_0.summary())

In [None]:
# Proposed model
m1 = sm.GLM(endog=Y, exog=X, family=sm.families.Poisson())
model_1 = m1.fit()
print(model_1.summary())

In [None]:
# Null model

model_n = sm.GLM(endog=Y, exog=X[:, 0], family=sm.families.Poisson()).fit()
print(model_n.summary())

In [None]:
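# The saturated model cannot be obtained from statsmodels by default, because it guards against division by zero;
# as a workaround, fit one parameter per observation using an identity design matrix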

I = np.diag(np.ones((m*n,)))

model_s = sm.GLM(endog=Y, exog=I, family=sm.families.Poisson()).fit()
print(model_s.summary())
print(f'Residual deviance is: {model_s.deviance}')

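For the Poisson model:
$$D = 2 \sum_{i=1}^n \left[ y_i \log\left( \frac{y_i}{\hat{\mu}_i}\right) - (y_i - \hat{\mu}_i) \right]$$
With the log link and an intercept in the model, $\sum_{i}(y_i - \hat{\mu}_i) = 0$, so this reduces to $D = 2 \sum_{i=1}^n y_i \log( y_i / \hat{\mu}_i)$, where terms with $y_i = 0$ are taken as 0.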

In [None]:
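from scipy.special import xlogy  # xlogy(0, 0) = 0, so observations with Y = 0 contribute 0 instead of NaN

mu_est_0 = model_0.predict()
mu_est_1 = model_1.predict()

Dev_0 = 2*np.sum(xlogy(Y, Y/mu_est_0))
print(Dev_0)
Dev_1 = 2*np.sum(xlogy(Y, Y/mu_est_1))
print(Dev_1)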


In [None]:
anova = Anova()
anova(model_1)

## Anova testing


In [None]:
display(anova(model_1))
display(anova(model_1, test = "Cp"))
display(anova(model_1, test = "Chisq"))

display(anova(model_1, model_0, test = "Rao"))
print(anova(model_1, model_0, test = "LRT"))


In [None]:
# p-value of the deviance test
# H0: the model fits the data
p_dev = scipy.stats.chi2.sf(model_1.deviance, df=model_1.df_resid)

print(p_dev)

# critical value
C_val = scipy.stats.chi2.isf(0.05, model_1.df_resid)
print(C_val)

print(model_1.summary())

display(anova(model_1,model_s, test = "LRT"))   # saturated vs. final model



#### Rao score statistics (for Poisson GLM)


In [None]:
rao = np.sum((Y-model_1.predict())**2/model_1.predict())

print(f'rao score statistic: {rao}')
print(f'p-val of rao test: {scipy.stats.chi2.sf(rao, df=model_1.df_resid)}')

######  By saturated model

anova(model_1,model_s, test = "Rao")

# Your turn:
1. Generate data with the following parameters:
 * $Y_i \sim Poi(\mu_i)$, where $E[Y_i] = \mu_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} = x_i^T \beta \  \Rightarrow \ g(\mu_i) = \mu_i =  x_i^T \beta  = \eta_i$
 * $X_{i1} \sim N(50,10)$
 * $X_{i2} \sim U(10,60)$
 * $X_{i3} \sim Ber(0.45)$
 * $n = 40$
2. Compute $\hat{\mu}_i$ for the saturated, null, "full", and "best" models.
3. Compute the deviance, Rao, and Wald statistics for your model and compare the final model with the saturated and "full" ones.
4. Generate 100 datasets for each $n \in \{20,40,60,80,100 \}$ and plot $(\hat{\beta}_i -\beta_i)$ against $n$.
