### 2.3 Generalized linear models

Consider a family of densities with respect to a $\sigma$-finite
measure of the form

$$
f(y;\mu,\sigma^{2})=a(\sigma^{2},\,y)\exp\left\{ \frac{1}{\sigma^{2}}\left\{ \theta(\mu)y-K(\theta(\mu))\right\} \right\}
$$

for $y\in E\subseteq\mathbb{R}$, $\mu\in\mathcal{M}\subseteq\mathbb{R}$ and $\sigma^{2}\in\Phi\subseteq\left(0,\infty\right)$,

where $a(\sigma^{2},y)$ is a known, positive function. Such a family
is called an *exponential dispersion family*, and $\sigma^{2}$
is called the *dispersion parameter*. 

If $Y$ has a density of exponential dispersion family form, then its cumulant generating
function (i.e. the logarithm of its moment generating function) is
$$
K(t;\mu,\sigma^{2})=\frac{1}{\sigma^{2}}\left\{ K\left(\sigma^{2}t+\theta(\mu)\right)-K(\theta(\mu))\right\}
$$

so, differentiating at $t=0$,

$$\mathbb{E}_{\mu,\sigma^{2}}\left(Y\right)=K'(\theta(\mu))$$
and
$$\mathrm{Var}_{\mu,\sigma^{2}}(Y)=\sigma^{2}K''(\theta(\mu))\equiv\sigma^{2}V(\mu)$$

(see example sheet)
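For reference, here is a sketch of the calculation behind the cumulant generating function, obtained by completing the exponent. It writes $\theta=\theta(\mu)$ and $\nu$ for the dominating $\sigma$-finite measure (a symbol introduced here), and assumes $\theta+\sigma^{2}t$ remains an admissible parameter value for $t$ near $0$, so that the last integrand is itself a density:

$$
\begin{aligned}
\mathbb{E}\,e^{tY}
&=\int a(\sigma^{2},y)\exp\left\{ \frac{\theta y-K(\theta)}{\sigma^{2}}+ty\right\} \,d\nu(y)\\
&=\exp\left\{ \frac{K(\theta+\sigma^{2}t)-K(\theta)}{\sigma^{2}}\right\} \int a(\sigma^{2},y)\exp\left\{ \frac{(\theta+\sigma^{2}t)y-K(\theta+\sigma^{2}t)}{\sigma^{2}}\right\} \,d\nu(y)\\
&=\exp\left\{ \frac{K(\theta+\sigma^{2}t)-K(\theta)}{\sigma^{2}}\right\} .
\end{aligned}
$$

Taking logarithms and differentiating once and twice at $t=0$ gives $K'(\theta)$ and $\sigma^{2}K''(\theta)$ as the first two cumulants.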

The function $V(\mu)$ is the **variance function**, and it turns
out that an exponential dispersion family (and hence every cumulant $\kappa_{r}$
in the expansion $K(t;\mu,\sigma^{2})=\sum_{r=1}^{\infty}\kappa_{r}\frac{t^{r}}{r!}$ of each of its members)
is completely characterized by $\left(V(\cdot),\mathcal{M},\Phi\right)$. We
may therefore write $Y\sim ED(\mu,\sigma^{2}V(\mu))$, $\mu\in\mathcal{M}$,
$\sigma^{2}\in\Phi$, to mean that $Y$ is of exponential dispersion
family form, with mean $\mu$ and variance $\sigma^{2}V(\mu)$.

Examples: $N(\mu,\sigma^{2})$, $Poi(\mu)$, $\frac{1}{n}Binomial(n,\mu)$,
$\Gamma(\upsilon,\varphi)$
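As a numerical sanity check of $\mathbb{E}Y=K'(\theta(\mu))$ and $V(\mu)=K''(\theta(\mu))$ for these examples, the sketch below differentiates each cumulant function $K$ by finite differences. The choices of $\theta(\mu)$ and $K$ are the standard ones; for the gamma case the parametrization by mean $\mu$ and shape $\nu=1/\sigma^{2}$ is an assumption, since the notes do not spell out what $\Gamma(\upsilon,\varphi)$ means. The helper names are hypothetical.

```python
import math

def first_derivative(f, x, h=1e-5):
    # central-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    # central-difference approximation to f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

# family -> (theta(mu), cumulant function K(theta), claimed variance function V(mu))
FAMILIES = {
    # N(mu, sigma^2): theta = mu, K(theta) = theta^2 / 2
    "normal": (lambda mu: mu, lambda th: th ** 2 / 2, lambda mu: 1.0),
    # Poi(mu): theta = log(mu), K(theta) = exp(theta), sigma^2 = 1
    "poisson": (math.log, math.exp, lambda mu: mu),
    # (1/n) Bin(n, mu): theta = logit(mu), K(theta) = log(1 + e^theta), sigma^2 = 1/n
    "binomial_prop": (lambda mu: math.log(mu / (1 - mu)),
                      lambda th: math.log1p(math.exp(th)),
                      lambda mu: mu * (1 - mu)),
    # gamma with mean mu and shape nu (assumed parametrization): theta = -1/mu,
    # K(theta) = -log(-theta), sigma^2 = 1/nu
    "gamma": (lambda mu: -1 / mu, lambda th: -math.log(-th), lambda mu: mu ** 2),
}

def mean_and_variance_function(family, mu):
    """Return (K'(theta(mu)), K''(theta(mu)), V(mu)) for a family in FAMILIES."""
    theta, K, V = FAMILIES[family]
    th = theta(mu)
    return first_derivative(K, th), second_derivative(K, th), V(mu)

for name, mu in [("normal", 1.5), ("poisson", 3.0), ("binomial_prop", 0.3), ("gamma", 2.0)]:
    mean, k2, v = mean_and_variance_function(name, mu)
    print(f"{name:14s} mu={mu}: K'={mean:.5f}, K''={k2:.5f}, V(mu)={v:.5f}")
```

In each row $K'$ should reproduce $\mu$ and $K''$ should agree with the listed $V(\mu)$ up to finite-difference error.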

A **generalized linear model** (GLM) is a model for independent
responses $Y_{1},...,Y_{n}$ in which:

i) $Y_{i}\sim ED\left(\mu_{i},\sigma_{i}^{2}V(\mu_{i})\right)$, $\mu_{i}\in\mathcal{M}$,
where $\sigma_{i}^{2}=\sigma^{2}a_{i}$, $\sigma^{2}$ is an
unknown dispersion parameter and $a_{1},\ldots,a_{n}$ are known constants.
Thus each $Y_{i}$ comes from the same exponential dispersion family,
and $\mathbb{E}Y_{i}=\mu_{i}$. When $Y_{i}\sim\frac{1}{n_{i}}Bin(n_{i},\mu_{i})$,
we have $\sigma_{i}^{2}=\frac{1}{n_{i}}$, so we can take $a_{i}=\frac{1}{n_{i}}$
and $\sigma^{2}=1$. (In the Challenger shuttle example, the O-rings had been
tested, but not at the launch temperature; in general the number of trials may
vary between observations, which is why we write $n_{i}$ rather than $n$.)

ii) The mean $\mu_{i}$ is related to the $i$-th component $\eta_{i}=x_{i}^{T}\beta$
of the **linear predictor** through $g(\mu_{i})=\eta_{i}$, $i=1,\ldots,n$,
where $g$ is a strictly increasing, twice-differentiable function
called the **link function**. Here, $x_{i}^{T}=\left(x_{i1},\ldots,x_{ip}\right)$
is a vector of known explanatory variables, and $\beta=\left(\beta_{1},\ldots,\beta_{p}\right)^{T}$
is an unknown vector of regression coefficients.
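To make the roles of $\eta_{i}$, $g$ and $\mu_{i}$ concrete, the sketch below uses the logit link $g(\mu)=\log\{\mu/(1-\mu)\}$, which suits binomial proportions. The design matrix and coefficients are made-up values for illustration only.

```python
import numpy as np

def logit(mu):
    # link function g(mu) = log(mu / (1 - mu)), strictly increasing on (0, 1)
    return np.log(mu / (1 - mu))

def inv_logit(eta):
    # inverse link g^{-1}(eta) = 1 / (1 + exp(-eta)), mapping R onto (0, 1)
    return 1 / (1 + np.exp(-eta))

# hypothetical design matrix: rows are x_i^T (an intercept plus one covariate)
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.5]])
beta = np.array([-0.5, 0.8])   # hypothetical regression coefficients

eta = X @ beta                 # linear predictor eta_i = x_i^T beta
mu = inv_logit(eta)            # means mu_i = g^{-1}(eta_i)
print(eta, mu)
```

Because $g$ is strictly increasing, $g^{-1}$ exists, and applying `logit` to `mu` recovers `eta`.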

The choice $g(\mu)=\theta(\mu)$ is called the
**canonical link function**, and it simplifies the calculations
in certain cases. For instance, if $Y=(Y_{1},\ldots,Y_{n})^{T}$ is a
vector of responses from a GLM, then its density is
$$
f_{y}(y;\mu,\sigma^{2})=\left\{ \prod_{i=1}^{n}a(\sigma_{i}^{2},\,y_{i})\right\} \exp\left\{ \sum_{i=1}^{n}\frac{\theta(\mu_{i})y_{i}}{\sigma_{i}^{2}}-\sum_{i=1}^{n}\frac{K\left(\theta(\mu_{i})\right)}{\sigma_{i}^{2}}\right\}
$$

where $\mu_{i}=g^{-1}(x_{i}^{T}\beta)$. Since $\theta(\mu_{i})=\theta\big(g^{-1}(x_{i}^{T}\beta)\big)$
is not in general linear in $\beta$, there is, in general, no reduction in
dimensionality from sufficiency. However, for the canonical link function
we have $\theta(\mu_{i})=x_{i}^{T}\beta$, so the density is
$$
f_{y}(y;\beta,\sigma^{2})=\left\{ \prod_{i=1}^{n}a(\sigma_{i}^{2},\,y_{i})\right\} \exp\left\{ \beta^{T}\sum_{i=1}^{n}\frac{x_{i}y_{i}}{\sigma_{i}^{2}}-\sum_{i=1}^{n}\frac{K\left(x_{i}^{T}\beta\right)}{\sigma_{i}^{2}}\right\}
$$

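from which, since $\sigma_{i}^{2}=\sigma^{2}a_{i}$ and the factor $\prod_{i}a(\sigma_{i}^{2},y_{i})$ does not involve $\beta$, the factorization criterion shows that the vector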
$$
\sum_{i=1}^{n}\frac{x_{i}y_{i}}{a_{i}}=\left(\sum_{i=1}^{n}\frac{x_{i1}y_{i}}{a_{i}},...,\sum_{i=1}^{n}\frac{x_{ip}y_{i}}{a_{i}}\right)^{T}
$$
is sufficient for $\beta$, for each fixed value of $\sigma^{2}$. 
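Sufficiency can be seen numerically: two response vectors with the same value of $\sum_{i}x_{i}y_{i}/a_{i}$ give log-likelihoods that differ by a constant not depending on $\beta$. The sketch assumes a Poisson GLM with the canonical log link, so $a_{i}=1$, $\sigma^{2}=1$ and $a(\sigma_{i}^{2},y_{i})=1/y_{i}!$; the data are hypothetical and were chosen so that $X^{T}y$ agrees.

```python
import math
import numpy as np

def poisson_loglik(beta, X, y):
    # Poisson GLM, canonical (log) link, a_i = 1, sigma^2 = 1:
    # sum_i { y_i x_i^T beta - exp(x_i^T beta) - log(y_i!) }
    eta = X @ beta
    return float(y @ eta - np.exp(eta).sum() - sum(math.lgamma(v + 1) for v in y))

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y1 = np.array([1, 2, 3])
y2 = np.array([2, 0, 4])
print(X.T @ y1, X.T @ y2)   # the sufficient statistic sum_i x_i y_i is the same for both

# the log-likelihood difference is the same at every beta tried
rng = np.random.default_rng(0)
diffs = [poisson_loglik(b, X, y1) - poisson_loglik(b, X, y2) for b in rng.normal(size=(5, 2))]
print(diffs)
```

Here the constant difference is $\log(4!/3!\,2!)+\log 2!=\log 4$, which comes entirely from the $\beta$-free factor $\prod_{i}a(\sigma_{i}^{2},y_{i})$.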

In general, there is no closed-form expression for the MLE $\hat{\beta}$,
but we can use a Newton-Raphson type algorithm (Fisher scoring is
a slight variant, in which the observed information is replaced by the
expected information) to find a sequence converging to $\hat{\beta}$.
Moreover, under mild conditions on $x_{1},\ldots,x_{n}$, we can apply
the results of section 2.2 to deduce that $n^{1/2}(\hat{\beta}-\beta)\overset{d}{\to}N_{p}\left(0,\,i^{(1)}(\beta)^{-1}\right)$.
This result can be used to estimate the standard errors of the components
of $\hat{\beta}$, or to test hypotheses about $\beta$.
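A minimal sketch of Fisher scoring, assuming binomial proportions $Y_{i}\sim\frac{1}{n_{i}}Bin(n_{i},\mu_{i})$ with the canonical logit link, so $a_{i}=1/n_{i}$ and $\sigma^{2}=1$. With a canonical link the score is $\sum_{i}x_{i}(y_{i}-\mu_{i})/\sigma_{i}^{2}$ and the Fisher information is $I(\beta)=\sum_{i}x_{i}x_{i}^{T}V(\mu_{i})/\sigma_{i}^{2}$, which equals the observed information, so here Fisher scoring and Newton-Raphson coincide. The standard errors use the usual reading of the asymptotic result, $\hat{\beta}\approx N_{p}\left(\beta,I(\hat{\beta})^{-1}\right)$, and the data and function names are hypothetical.

```python
import numpy as np

def inv_logit(eta):
    # inverse of the canonical (logit) link for binomial proportions
    return 1 / (1 + np.exp(-eta))

def fisher_scoring_logistic(X, y, n, tol=1e-10, max_iter=50):
    """Fit Y_i ~ (1/n_i) Bin(n_i, mu_i) with logit(mu_i) = x_i^T beta by Fisher scoring.

    a_i = 1/n_i and sigma^2 = 1. Because the logit link is canonical, expected and
    observed information coincide, so each step is also a Newton-Raphson step.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = inv_logit(X @ beta)
        score = X.T @ (n * (y - mu))                      # sum_i x_i (y_i - mu_i) / a_i
        info = X.T @ ((n * mu * (1 - mu))[:, None] * X)   # sum_i x_i x_i^T V(mu_i) / a_i
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = inv_logit(X @ beta)
    info = X.T @ ((n * mu * (1 - mu))[:, None] * X)       # Fisher information at beta_hat
    return beta, info

# hypothetical grouped data: n_i trials at covariate x_i, y_i = observed proportion
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
n = np.full(5, 10.0)
y = np.array([1, 3, 5, 7, 9]) / n
X = np.column_stack([np.ones_like(x), x])

beta_hat, info = fisher_scoring_logistic(X, y, n)
se = np.sqrt(np.diag(np.linalg.inv(info)))   # approximate standard errors
z = beta_hat / se                            # Wald statistics for H0: beta_j = 0
print(beta_hat, se, z)
```

The data are symmetric about $x=0$, so the fitted intercept is (numerically) zero; the Wald statistic $z_{j}$ would be compared with $N(0,1)$ quantiles.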

Alternatively, tests can be based on the **deviance**, which is
closely related to the likelihood ratio statistic.
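The notes do not define the deviance here. A common definition, assumed in this sketch, is twice the log-likelihood ratio of the saturated model ($\hat{\mu}_{i}=y_{i}$) against the fitted model, scaled by $\sigma^{2}$. For a Poisson model $\sigma^{2}=1$, and this reduces to the closed form $2\sum_{i}\{y_{i}\log(y_{i}/\hat{\mu}_{i})-(y_{i}-\hat{\mu}_{i})\}$. The sketch computes it both ways for hypothetical counts under an intercept-only log-link model, whose MLE is $\hat{\mu}_{i}=\bar{y}$.

```python
import math
import numpy as np

def poisson_loglik(mu, y):
    # sum_i { y_i log(mu_i) - mu_i - log(y_i!) }, using the convention 0 * log 0 = 0
    return sum((yi * math.log(mi) if yi > 0 else 0.0) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    # closed form: 2 * sum_i { y_i log(y_i / mu_i) - (y_i - mu_i) }
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))

y = np.array([0, 2, 3, 1, 4, 6])       # hypothetical counts
mu_null = np.full(len(y), y.mean())     # MLE under the intercept-only model

# deviance as a likelihood ratio statistic: saturated model vs fitted model
D_lr = 2 * (poisson_loglik(y, y) - poisson_loglik(mu_null, y))
D_closed = poisson_deviance(y, mu_null)
print(D_lr, D_closed)                   # the two expressions agree
```

The difference in deviance between two nested models is then exactly their likelihood ratio statistic, which is the sense in which the two are closely related.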