\section{Simple introduction}
\textbf{Generalized linear model (GLM).}
A framework for modelling response variables that are bounded or discrete.
\begin{enumerate}
    \item Poisson regression: count data
    \item Logistic regression: binary data
    \item Multinomial logistic regression: categorical data
    \item Ordered logit: ordinal data
\end{enumerate}

Three important assumptions of the general linear model:
\begin{enumerate}
    \item Normality of ``error'': the model residuals need to be normally distributed (not the raw data)
    \item Homogeneity of variance: the variance of the model residuals should be constant over the range of the predicted response
    \item Linearity: the mean response is a straight-line (linear) function of the explanatory variables
\end{enumerate}

GLMs are used to analyse Poisson (count) and binomial (binary or proportion) distributed data.

A GLM uses two functions, one for the mean and one for the variance. The link function relates the mean of the response to the explanatory variables. The variance function describes the variance of the response as a function of its mean.

An assumption of the Poisson GLM is that the data are neither overdispersed nor underdispersed.

Dispersion parameter = variance/mean. As a rule of thumb, an acceptable range is $0.5 <$ dispersion parameter $< 1.5$.

If there is overdispersion, use the quasi-Poisson model:

\begin{minted}[breaklines]{R}
glm(infected ~ species, family=quasipoisson)
\end{minted}

\section{Introduction to GLM}
Assumptions of the standard linear model
\begin{enumerate}
\item The $i$th observation $y_i$ is normally distributed, $y_i \sim N(\mu_i, \sigma_i^2)$
\item There is a common variance $\sigma_i^2 = \sigma^2$ for all $i$.
\item $\textbf{x}_i^T \mathbf{\beta} = \mu_i$    
\end{enumerate}

The theory of GLMs was introduced by Nelder and Wedderburn (1972). It generalizes the linear regression model by assuming that each observation $y_i$ has a distribution in the exponential dispersion family
\[
f(y_i; \theta_i, \phi) = \exp \Bigg( \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \Bigg)
\]for given functions $a$, $b$ and $c$. It can be shown that 
\[
E(y_i) = \mu_i = b'(\theta_i)
\]\[
\textrm{Var}(y_i) = \sigma_i^2 = b''(\theta_i) a (\phi)
\]
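
As a worked check of these identities (a sketch, using the Poisson distribution as the example), the Poisson mass function can be written in exponential dispersion form:
\[
p(y_i; \mu_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} = \exp \big( y_i \log \mu_i - \mu_i - \log y_i! \big)
\]
so that $\theta_i = \log \mu_i$, $b(\theta_i) = e^{\theta_i}$, $a(\phi) = 1$ and $c(y_i, \phi) = -\log y_i!$. Then
\[
b'(\theta_i) = e^{\theta_i} = \mu_i, \qquad b''(\theta_i) a(\phi) = e^{\theta_i} = \mu_i
\]
which recovers $E(y_i) = \textrm{Var}(y_i) = \mu_i$ for the Poisson.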

\subsection{Components of a generalized linear model}
GLMs are composed of three components
\begin{enumerate}
\item The random component: the observations $y_i$ are independent random variables with expected value $\mu_i$ and a distribution in the exponential dispersion family
\item The systematic component: $\eta_i = \mathbf{x}_i^T \mathbf{\beta}$
\item The link between the random and systematic parts: $\eta_i = g(\mu_i)$ or $\mu_i = h(\eta_i)$, where $g$ is the so-called link function and $h=g^{-1}$ is the inverse link function. 
\end{enumerate}

Some of the most common link functions
\begin{enumerate}
\item Log link $g(\mu) = \log(\mu)$ (Poisson response)
\item Logit link $g(\mu) = \log \Bigg( \frac{\mu}{1-\mu} \Bigg)$ (Binomial response)
\item Probit link $g(\mu) = \Phi ^{-1} (\mu)$ (Binomial response)
\item Complementary log-log link $g(\mu) = \log(-\log(1-\mu))$ (Binomial response)  
\end{enumerate}
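
A minimal R sketch of how a link function and its inverse behave, using \texttt{make.link()} from the \texttt{stats} package. The values in \texttt{mu} are made-up illustrative means, not taken from any data set.

\begin{minted}[breaklines]{R}
# Illustrative mean values in (0, 1); not from any data set
mu <- c(0.1, 0.5, 0.9)

logit <- make.link("logit")     # list holding linkfun, linkinv, mu.eta, ...
eta <- logit$linkfun(mu)        # g(mu) = log(mu / (1 - mu))
mu.back <- logit$linkinv(eta)   # h(eta), the inverse link

all.equal(mu.back, mu)          # the inverse link recovers mu

# The other binomial links listed above are available the same way
probit.eta  <- make.link("probit")$linkfun(mu)    # qnorm(mu)
cloglog.eta <- make.link("cloglog")$linkfun(mu)   # log(-log(1 - mu))
\end{minted}

The same objects are what a family such as \texttt{binomial(link = "logit")} carries internally, so this is a convenient way to inspect a link before fitting.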

The choice of link function depends on the nature of the dependence, the measurement scale, and computability.

\subsubsection{Random component of a GLM}
The random component of a GLM consists of a response variable $y$ with independent observations $(y_1, \dots, y_n)$ having probability density or mass function for a distribution in the exponential family.

\subsubsection{Linear predictor of a GLM}
The linear predictor of a GLM relates parameters $\{\eta_i\}$ pertaining to $\{E(y_i)\}$ to the explanatory variables $x_1, \dots, x_p$ using a linear combination of them
\[
\eta_i = \sum_{j=1}^p \beta_j x_{ij}, i = 1, \dots, n
\]
\[
\mathbf{\eta} = \mathbf{X \beta}
\]

\subsubsection{Link function of a GLM}
The link function connects the random component with the linear predictor. The GLM links $\eta_i$ to $\mu_i$ by $\eta_i = g(\mu_i)$, where the link function is a monotonic, differentiable function. 

A GLM with the identity link function is a linear model.

\subsubsection{GLMs for normal, binomial, and Poisson responses}
Model fitting for linear models does not require the normality assumption.

Many response variables are binary. We represent ``success'' and ``failure'' by 1 and 0. A Bernoulli trial for observation $i$ has probabilities $P(y_i = 1) = \pi_i$ and $P(y_i = 0) = 1 - \pi_i$, for which $\mu_i = \pi_i$. GLMs using the logit link have the form:
\[
\log \bigg( \frac{\mu_i}{1 - \mu_i} \bigg) = \sum_{j=1}^p \beta_j x_{ij}, i = 1, \dots, n
\]
They are called logistic regression models or logit models. 

Some response variables have counts as their possible outcomes. The simplest probability distribution for count data is the Poisson.
\[
\log \mu_i = \sum_{j=1}^p \beta_j x_{ij}
\]
This is called a Poisson loglinear model.

\subsection{Advantages of GLMs versus transforming the data}
It is challenging to find a transformation that provides both approximate normality and constant variance.

GLMs provide a unified theory of modeling that encompasses the most important models for continuous and discrete response variables. 

\subsection{Quantitative/qualitative explanatory variables and interpreting effects}
\subsubsection{Quantitative and qualitative variables in linear predictors}
Explanatory variables in a GLM can be
\begin{enumerate}
    \item Quantitative: simple linear regression
    \item Qualitative factors: ANOVA
    \item Mixed
\end{enumerate}

For example, suppose observation $i$ measures an individual's annual income $y_i$, number of years of job experience $x_{i1}$, and gender $x_{i2}$. Then
\[
\mu_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \beta_3 x_{i1}x_{i2}
\]
With $x_{i2} = 0$ for males and $x_{i2} = 1$ for females, this model corresponds to straight lines $\mu_i = \beta_0 + \beta_1 x_{i1}$ for males and $\mu_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)x_{i1}$ for females. 

A qualitative explanatory variable having $c$ categories can be represented by $c-1$ indicator variables, giving $c-1$ terms in the linear predictor and $c-1$ columns in the model matrix $\mathbf{X}$. The R software uses as default the ``first-category-baseline'' parameterization, which constructs indicators for categories $2, \dots, c$.
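
A small R sketch of this coding, using \texttt{model.matrix()} on a hypothetical factor with $c = 3$ categories (the factor and its level names are invented for illustration):

\begin{minted}[breaklines]{R}
# Hypothetical factor with c = 3 categories
color <- factor(c("light", "medium", "dark", "medium", "light"),
                levels = c("light", "medium", "dark"))

X <- model.matrix(~ color)   # default treatment (first-category-baseline) coding
colnames(X)                  # intercept plus indicators for categories 2 and 3
ncol(X)                      # 1 + (c - 1) columns
\end{minted}

The first level, \texttt{light}, is absorbed into the intercept, so the remaining coefficients are contrasts with that baseline category.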

\subsubsection{Interval, nominal and ordinal values}
Ordinal explanatory variables can be treated as qualitative by ignoring the ordering and using a set of indicator variables. Alternatively, they can be treated as quantitative by assigning monotone scores to the categories and using a single $\beta x$ term in the linear predictor. 

\subsubsection{Interpreting effects in linear models}
Consider the linear model for a student's math achievement test score, where $x_{i1}$ is the student's number of years of math education, $x_{i2}$ is the age of the student and $x_{i3}$ is the mother's number of years of math education:
\[
\mu_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}
\]
The difference between the mean math achievement test score of a subpopulation of students having a certain number of years of math education and a subpopulation having one fewer year equals $\beta_1$, when both subpopulations have the same value for $\beta_2 x_{i2} + \beta_3 x_{i3}$. 

\subsection{Model matrices and model vector spaces}
For the data vector $\mathbf{y}$ with $\mathbf{\mu} = E(\mathbf{y})$, consider the GLM $\mathbf{\eta} = \mathbf{X \beta}$ with link function $g$ and transformed mean values $\mathbf{\eta} = g(\mathbf{\mu})$.

\subsubsection{Model matrices induce model vector spaces}
The model space is the column space of $\mathbf{X}$, which we denote by $C(\mathbf{X})$.

\subsubsection{Dimension of model space equals rank of model matrix}
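The rank of the matrix $\mathbf{X}$ is the number of vectors in a basis for $C(\mathbf{X})$. $\mathbf{X}$ has full rank when $\textrm{rank}(\mathbf{X}) = p$.

Collinearity among the columns of $\mathbf{X}$ means that $\mathbf{X}$ does not have full rank. A common remedy is to drop a redundant predictor.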

\subsection{Identifiability and estimability}
When the model matrix is not of full rank, $\mathbf{\beta}$ is not identifiable.
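
A minimal R sketch of a rank-deficient model matrix, with made-up predictors in which \texttt{x2} is an exact multiple of \texttt{x1} (all values are hypothetical):

\begin{minted}[breaklines]{R}
# Hypothetical predictors: x2 is an exact multiple of x1
x1 <- c(1, 2, 3, 4, 5)
x2 <- 2 * x1
y  <- c(1.1, 1.9, 3.2, 3.9, 5.1)

X <- cbind(1, x1, x2)
qr(X)$rank      # smaller than ncol(X), so X is not of full rank
ncol(X)

fit <- lm(y ~ x1 + x2)
coef(fit)       # lm() reports NA for the aliased coefficient
\end{minted}

Here \texttt{lm()} effectively drops the redundant column, which is one way of restoring identifiability.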

\subsection{Example: using software to fit a GLM}
Crabs data: $y$ is the number of satellite males, $C$ female color, $S$ spine condition, $W$ weight and $Wt$ width. 

\begin{minted}[breaklines]{R}
fit.pois <- glm(y ~ 1, family = poisson(link = identity), data=Crabs)
summary(fit.pois)
\end{minted}

Fitting the Poisson distribution with a GLM containing only an intercept and using the identity link function gives an estimated Poisson mean that equals the sample mean. However, the sample variance of 9.92 is much larger than that mean, which suggests that a Poisson assumption is inappropriate for the marginal distribution of $y$.

\subsubsection{Linear model using weight to predict satellite counts}
\begin{minted}[breaklines]{R}
fit.weight <- lm(y ~ weight, data=Crabs)

fit.weight2 <- glm(y ~ weight, family=gaussian(link=identity), data=Crabs)
\end{minted}
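
Both calls fit the same normal linear model by least squares. A sketch checking this on R's built-in \texttt{cars} data, which is an assumption made here only so that the example is self-contained (it is not the Crabs data):

\begin{minted}[breaklines]{R}
fit.lm  <- lm(dist ~ speed, data = cars)
fit.glm <- glm(dist ~ speed, family = gaussian(link = identity), data = cars)

all.equal(coef(fit.lm), coef(fit.glm))   # same least squares estimates
\end{minted}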

\subsubsection{Comparing mean numbers of satellites by crab color}
\begin{minted}[breaklines]{R}
fit.color <- glm(y ~ factor(color), data = Crabs) # normal dist. is default
summary(fit.color)

fit.color2 <- glm(y ~ factor(color), family = poisson(link = identity), data = Crabs)
summary(fit.color2)
\end{minted}

The estimates are the same in both fits, but under the Poisson assumption the standard errors are much smaller than under the normal assumption. 

\section{Linear models: least squares theory}
Fitting and inference for the linear model. For $n$ independent observations $\mathbf{y} = (y_1, \dots, y_n)^T$ with $\mu_i = E(y_i)$ and $\mathbf{\mu} = (\mu_1, \dots, \mu_n)^T$, denote the covariance matrix by
\[
\mathbf{V} = var(\mathbf{y}) = E[(\mathbf{y} - \mathbf{\mu})(\mathbf{y} - \mathbf{\mu})^T]
\]

Let $\mathbf{X} = (x_{ij})$ denote the $n \times p$ model matrix, where $x_{ij}$ is the value of explanatory variable $j$ for observation $i$. The linear model assumes
\[
\mathbf{\mu} = \mathbf{X\beta}, \quad \mathbf{V} = \sigma^2 \mathbf{I}
\]
where $\beta$ is a $p \times 1$ parameter vector with $p \leq n$ and $\mathbf{I}$ is the $n \times n$ identity matrix. 

The ordinary linear model is
\[
\mathbf{y} = \mathbf{X\beta} + \epsilon
\]

\subsection{Least squares model fitting}
Having formed a model matrix $\mathbf{X}$ and observed $\mathbf{y}$, how do we obtain parameter estimates $\mathbf{\hat{\beta}}$ and fitted values $\mathbf{\hat{\mu}} = \mathbf{X \hat{\beta}}$ that best satisfy the linear model? We determine the value of $\mathbf{\hat{\beta}}$ that minimizes 
\[
\|\mathbf{y} - \mathbf{\hat{\mu}} \|^2 = \sum_i(y_i - \hat{\mu}_i)^2 = \sum_{i=1}^n \bigg(y_i - \sum_{j=1}^p \hat{\beta}_j x_{ij} \bigg)^2
\]
Using least squares corresponds to maximum likelihood when we add a normality assumption to the model. To maximize the log-likelihood function, we must minimize $\sum_i (y_i - \mu_i)^2$.
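
To see why, a short sketch under the added assumption that the $y_i$ are independent with $y_i \sim N(\mu_i, \sigma^2)$: the log-likelihood is
\[
\ell(\mathbf{\beta}, \sigma^2) = -\frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu_i)^2, \qquad \mu_i = \sum_{j=1}^p \beta_j x_{ij}
\]
For any fixed $\sigma^2$, the first term does not depend on $\mathbf{\beta}$ and the second enters with a negative sign, so maximizing $\ell$ over $\mathbf{\beta}$ is the same as minimizing $\sum_i (y_i - \mu_i)^2$.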

\subsubsection{The normal equations and least squares solution}
The least squares criterion in matrix form:
\[
L(\mathbf{\beta}) = 
\|\mathbf{y} - \mathbf{X \beta} \|^2 = 
(\mathbf{y} - \mathbf{X\beta})^T(\mathbf{y} - \mathbf{X \beta}) = \mathbf{y}^T\mathbf{y} -2 \mathbf{y}^T \mathbf{X \beta} + \mathbf{\beta}^T \mathbf{X}^T \mathbf{X \beta}
\]
If we take the derivative $\partial L(\beta)/\partial \beta = -2 \mathbf{X}^T(\mathbf{y} - \mathbf{X \beta})$ and let it be zero, we get
\[
\mathbf{X}^T\mathbf{y} = \mathbf{X}^T\mathbf{X \hat{\beta}}
\]
\[
\mathbf{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
\]
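
A minimal R sketch that solves the normal equations directly and compares the result with \texttt{lm()}. The built-in \texttt{cars} data is used purely as a convenient example and is an assumption of this sketch.

\begin{minted}[breaklines]{R}
# Built-in cars data, used here only as a convenient example
X <- cbind(1, cars$speed)   # n x p model matrix with an intercept column
y <- cars$dist

# Solve X^T X beta = X^T y without forming the inverse explicitly
beta.hat <- solve(crossprod(X), crossprod(X, y))

fit <- lm(dist ~ speed, data = cars)
all.equal(as.vector(beta.hat), unname(coef(fit)))
\end{minted}

Solving the linear system is usually preferred numerically to computing $(\mathbf{X}^T\mathbf{X})^{-1}$ and multiplying. \texttt{lm()} itself uses a QR decomposition.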

\subsubsection{Moments of estimators}
Recall that for a matrix of constants $\mathbf{A}$, $E(\mathbf{Ay}) = \mathbf{A}E(\mathbf{y})$ and $var(\mathbf{Ay}) = \mathbf{A}var(\mathbf{y})\mathbf{A}^T$. So the mean and variance of the least squares estimator are
\[
E(\mathbf{\hat{\beta}}) = E[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}] = 
(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^TE(\mathbf{y}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X\beta} = \beta
\]
\[
var(\hat{\beta}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\sigma^2\mathbf{I})\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^T \mathbf{X})^{-1}
\]
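
A sketch checking the variance formula numerically, with $\sigma^2$ replaced by the usual unbiased estimate $s^2 = \sum_i (y_i - \hat{\mu}_i)^2/(n-p)$. The built-in \texttt{cars} data is an assumption used only for illustration.

\begin{minted}[breaklines]{R}
fit <- lm(dist ~ speed, data = cars)   # built-in data, for illustration only
X <- model.matrix(fit)
n <- nrow(X)
p <- ncol(X)

s2 <- sum(residuals(fit)^2) / (n - p)  # unbiased estimate of sigma^2
V  <- s2 * solve(crossprod(X))         # s^2 (X^T X)^{-1}

# Matches the estimated covariance matrix reported by lm
all.equal(V, vcov(fit), check.attributes = FALSE)
\end{minted}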

\subsubsection{Least squares solutions when X does not have full rank}
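When $\mathbf{X}$ does not have full rank, $\mathbf{X}^T\mathbf{X}$ is singular, so the inverse in $\mathbf{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ does not exist. The normal equations still have solutions, but the least squares estimate is not unique, reflecting that $\mathbf{\beta}$ is not identifiable. 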

\subsubsection{Orthogonal subspaces and residuals}
Now suppose $\mathbf{W} = C(\mathbf{X})$, the model space spanned by the columns of a model matrix $\mathbf{X}$. Residuals are prediction errors $\mathbf{e} = (\mathbf{y} - \mathbf{X\hat{\beta}})$. $\mathbf{e}$ is in the orthogonal complement to the model space $C(\mathbf{X})$, that is $C(\mathbf{X})^{\perp} = N(\mathbf{X}^T)$. 

\subsubsection{Alternatives to least squares}
Regularization methods add an additional term to the function minimized. 

\subsection{Projections of data onto model spaces}
Let $\mathbf{P}_X$ denote the projection matrix onto the model space $C(\mathbf{X})$ corresponding to a model matrix $\mathbf{X}$ for a linear model. If $\mathbf{X}$ has full rank, then $\mathbf{P}_X$ is the hat matrix $\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$.

The projection matrix plays a key role for linear models. The first important result is that the projection matrix projects the data vector $\mathbf{y}$ to the fitted value vector $\mathbf{\hat{\mu}}$ that is the unique point in the model space $C(\mathbf{X})$ that is closest to $\mathbf{y}$. 

Data projection gives a unique least squares fit: for each $\mathbf{y} \in \R^n$ and its projection $\mathbf{P}_X\mathbf{y} = \mathbf{\hat{\mu}}$ onto the model space $C(\mathbf{X})$ for a linear model $\mathbf{\mu} = \mathbf{X \beta}$,
\[
\|\mathbf{y} - \mathbf{P}_X \mathbf{y}\| \leq \|\mathbf{y} - \mathbf{z} \|
\]
for all $\mathbf{z} \in C(\mathbf{X})$, with equality if and only if $\mathbf{z} = \mathbf{P}_X \mathbf{y}$.

Data = fit + residuals: for fitted values $\mathbf{\hat{\mu}}$ of a linear model $\mathbf{\mu} = \mathbf{X \beta}$ obtained by least squares
\[
\|\mathbf{y}\|^2 = \|\mathbf{\hat{\mu}}\|^2 + \|\mathbf{y} - \mathbf{\hat{\mu}}\|^2
\]
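
A short R sketch of these projection results, building the hat matrix explicitly for a full-rank model matrix. The built-in \texttt{cars} data is an assumption used only to have concrete numbers.

\begin{minted}[breaklines]{R}
X <- cbind(1, cars$speed)   # built-in cars data, for illustration only
y <- cars$dist

P <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix X (X^T X)^{-1} X^T
mu.hat <- as.vector(P %*% y)              # fitted values = projection of y

all.equal(P %*% P, P)                     # projection matrices are idempotent
all.equal(mu.hat, unname(fitted(lm(dist ~ speed, data = cars))))

# Data = fit + residuals, in squared lengths
all.equal(sum(y^2), sum(mu.hat^2) + sum((y - mu.hat)^2))
\end{minted}

Forming the $n \times n$ matrix $\mathbf{P}_X$ is only sensible for small $n$. It is done here to mirror the algebra.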

\subsection{Linear model examples: projections and SS decompositions}
\subsubsection{Model for the one-way layout}
We next extend the null model to the linear model for the one-way layout. This is the model for comparing means $\{\mu_i\}$ for $c$ groups. Let $y_{ij}$ denote observation $j$ in group $i$, for $j = 1, \dots, n_i$, with $n = \sum_i n_i$. With independent observations, an important case having this data format is the completely randomized experimental design. The model for $\mu_i = E(y_{ij})$ has linear predictor
\[
E(y_{ij}) = \beta_0 + \beta_i
\]
We express the linear predictor as $\mathbf{\mu} = \mathbf{X \beta}$ with 
\[
\mathbf{X \beta} = \begin{bmatrix}
\mathbf{1}_{n_1} & \mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \dots & \mathbf{0}_{n_1} \\
\mathbf{1}_{n_2} & \mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \dots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{1}_{n_c} & \mathbf{0}_{n_c} & \mathbf{0}_{n_c} & \dots & \mathbf{1}_{n_c} \\
\end{bmatrix}
\begin{bmatrix}
\beta_0 \\
\beta_1 \\
\vdots \\
\beta_c \\
\end{bmatrix}
\]

\subsubsection{Sums of squares and ANOVA table for one-way layout}



\section{Continuous response}
t

\section{Dichotomous response}
t

\section{Models for count data}
The first subsection presents models that assume a Poisson distribution for a count response variable. The loglinear model, using a log link to connect the mean with the linear predictor, is most common. For the Poisson distribution, the variance must equal the mean, and data often exhibit greater variability than this. Negative binomial distributions handle overdispersion. 

Zero-inflated data contain more zeros than expected under the assumed count distribution.

\subsection{Poisson GLMs for counts and rates}
The simplest distribution for count data is the Poisson.

\subsubsection{Poisson distribution}
The Poisson probability mass function is $p(y;\mu) = e^{-\mu}\mu^y/y!$ for $y = 0, 1, 2, \dots$, with $E(y) = var(y) = \mu$.

The Poisson also applies as an approximation for the binomial when the number of trials $n$ is large and $\pi$ is very small, with $\mu = n \pi$. 
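
A quick numerical sketch of this approximation in R. The values of \texttt{n} and \texttt{p} are illustrative choices, not taken from the source.

\begin{minted}[breaklines]{R}
n <- 1000        # illustrative number of trials
p <- 0.003       # illustrative small success probability
y <- 0:10

binom <- dbinom(y, size = n, prob = p)
pois  <- dpois(y, lambda = n * p)

max(abs(binom - pois))   # largest pointwise gap between the two pmfs
\end{minted}

Increasing \texttt{p} while holding \texttt{n * p} fixed makes the gap grow, which is a simple way to see where the approximation starts to degrade.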

\subsubsection{Poisson GLMs and loglinear models}
Since $var(y_i) = \mu_i$, the GLM likelihood equations for $n$ independent observations simplify for a Poisson response with linear predictor $\eta_i = g(\mu_i) = \sum_j \beta_j x_{ij}$ having link function $g$ to
\[
\sum_{i=1}^n \frac{(y_i - \mu_i)x_{ij}}{var(y_i)}\bigg( \frac{\partial \mu_i}{\partial \eta_i} \bigg) = 
\sum_{i=1}^n \frac{(y_i - \mu_i)x_{ij}}{\mu_i} \bigg( \frac{\partial \mu_i}{\partial \eta_i} \bigg) = 0
\]
The Poisson loglinear model is
\[
\log \mu_i = \sum_{j=1}^p \beta_j x_{ij}
\]

A 1-unit increase in $x_{ij}$ has a multiplicative impact of $e^{\beta_j}$ on $\mu_i$, with the other explanatory variables held fixed.
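
This follows by comparing the mean at $x_{ij} + 1$ with the mean at $x_{ij}$, holding the other explanatory variables fixed (a short sketch, writing $\mu_i(x)$ for the mean as a function of the $j$th explanatory variable):
\[
\frac{\mu_i(x_{ij} + 1)}{\mu_i(x_{ij})} = \frac{\exp \big( \beta_j (x_{ij} + 1) + \sum_{k \neq j} \beta_k x_{ik} \big)}{\exp \big( \beta_j x_{ij} + \sum_{k \neq j} \beta_k x_{ik} \big)} = e^{\beta_j}
\]
For instance, a hypothetical estimate $\hat{\beta}_j = 0.1$ would correspond to multiplying the mean by $e^{0.1} \approx 1.105$, that is, an increase of about 10.5\% per unit of $x_{ij}$.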

\subsubsection{Model fitting and goodness of fit}
The estimated covariance matrix of $\hat{\beta}$ is
\[
\widehat{var}(\hat{\mathbf{\beta}}) = (\mathbf{X}^T \hat{\mathbf{W}} \mathbf{X})^{-1}
\]
where, with the log link, $\mathbf{W}$ is the diagonal matrix with elements $w_i = \mu_i$.

The deviance of a Poisson GLM is
\[
D(\mathbf{y}, \hat{\mathbf{\mu}}) = 2 \sum_{i=1}^n \bigg[y_i \log \bigg(\frac{y_i}{\hat{\mu_i}}\bigg) - y_i + \hat{\mu_i} \bigg]
\]
and the corresponding Pearson statistic is
\[
X^2 = \sum_{i=1}^n \frac{(y_i - \hat{\mu_i})^2}{\hat{\mu}_i}
\]
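
A sketch computing both statistics directly from their formulas and comparing them with what R reports. The built-in \texttt{warpbreaks} data and the particular model formula are assumptions chosen only to have a count response to work with.

\begin{minted}[breaklines]{R}
# Built-in warpbreaks data, used only as a convenient count response
fit <- glm(breaks ~ wool + tension, family = poisson(link = "log"),
           data = warpbreaks)
y  <- warpbreaks$breaks
mu <- fitted(fit)

# The term y * log(y / mu) is taken as 0 when y = 0
D  <- 2 * sum(ifelse(y == 0, 0, y * log(y / mu)) - y + mu)
X2 <- sum((y - mu)^2 / mu)

all.equal(D, deviance(fit))
all.equal(X2, sum(residuals(fit, type = "pearson")^2))
\end{minted}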

\subsubsection{Example: one-way layout comparing Poisson means}
For the one-way layout for a count response, let $y_{ij}$ be observation $j$ of a count variable for group $i$, $i = 1, \dots, c$, $j = 1, \dots, n_i$. The model has the form $\log \mathbf{\mu} = \mathbf{X \beta}$ with
\[
\mathbf{\mu} = 
\begin{bmatrix}
\mu_1 \mathbf{1}_{n_1} \\
\mu_2 \mathbf{1}_{n_2} \\
\vdots \\
\mu_c \mathbf{1}_{n_c} \\
\end{bmatrix},
\mathbf{X \beta} = 
\begin{bmatrix}
\mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \dots & \mathbf{0}_{n_1} \\
\mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \dots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0}_{n_c} & \mathbf{0}_{n_c} & \dots & \mathbf{1}_{n_c} \\
\end{bmatrix}
\begin{bmatrix}
\beta_1 \\
\beta_2 \\
\vdots \\
\beta_c \\
\end{bmatrix}
\]
Analogous to the one-way ANOVA for a normal response, we can test $H_0: \mu_1 = \dots = \mu_c$. 

\subsubsection{Modeling rates: including an offset in the model}
Often the expected value of a response count $y_i$ is proportional to an index $t_i$. For instance, $t_i$ might be an amount of time and/or a population size, such as in modeling crime counts for various cities.

A loglinear model for the expected rate has the form
\[
\log(\mu_i/t_i) = \sum_{j=1}^p \beta_j x_{ij}
\]
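
Rearranging shows how the index enters the model for the mean (a short sketch of the algebra):
\[
\log \mu_i = \log t_i + \sum_{j=1}^p \beta_j x_{ij}, \qquad \mu_i = t_i \exp \bigg( \sum_{j=1}^p \beta_j x_{ij} \bigg)
\]
The term $\log t_i$ enters the linear predictor with a known coefficient of 1 rather than an estimated one. Such a term is called an offset.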

\subsubsection{Example: Lung cancer survival}
Let $\mu_{ijk}$ denote the expected number of deaths and $t_{ijk}$ the total time at risk for histology $i$ and stage $j$, in follow-up time interval $k$. The Poisson GLM for the death rate is
\[
\log(\mu_{ijk}/t_{ijk}) = \beta_0 + \beta_i^H + \beta_j^S + \beta_k^T
\]

\begin{minted}[breaklines]{R}
logrisktime <- log(risktime)  # log of time at risk, used as the offset
fit <- glm(count ~ factor(histology) + factor(stage) + factor(time), family = poisson(link = log), offset = logrisktime)
\end{minted}

\subsection{Poisson/multinomial models for contingency tables}
The Poisson model generates the multinomial model after we condition on an overall sample size.

\subsubsection{Connection between Poisson and multinomial distributions}
For independent Poisson counts $\{y_i\}$ with means $\{\mu_i\}$, the conditional probability of a set of counts satisfying $\sum_j y_j = n$ is
\[
\frac{P(y_1 = n_1, y_2 = n_2, \dots, y_c = n_c)}{P(\sum_j y_j = n)}
= 
\frac{\prod_i (e^{-\mu_i}\mu_i^{n_i}/n_i!)}{\exp(-\sum_j \mu_j)(\sum_j \mu_j)^n/n!} = \bigg( \frac{n!}{\prod_i n_i!} \bigg) \prod_{i=1}^c \pi_i^{n_i}
\]
which is the multinomial distribution characterized by the sample size $n$ and the probabilities $\{\pi_i = \mu_i/\sum_j \mu_j\}$.
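
A numerical sketch of this identity in R, using made-up Poisson means and counts (both vectors are illustrative assumptions):

\begin{minted}[breaklines]{R}
mu  <- c(2, 3, 5)   # illustrative Poisson means
n.i <- c(1, 4, 3)   # illustrative observed counts
n   <- sum(n.i)

# Joint Poisson probability divided by the probability of the total
lhs <- prod(dpois(n.i, mu)) / dpois(n, sum(mu))

# Multinomial probability with pi_i = mu_i / sum_j mu_j
rhs <- dmultinom(n.i, size = n, prob = mu / sum(mu))

all.equal(lhs, rhs)
\end{minted}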

\subsubsection{GLM of independence in two-way contingency tables}

\subsection{Negative binomial GLMs}
Negative binomial GLMs handle overdispersion in count data.
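
A minimal sketch of fitting such a model in R with \texttt{glm.nb()} from the \texttt{MASS} package. The built-in \texttt{warpbreaks} data and the model formula are assumptions chosen for illustration, not an example from the source.

\begin{minted}[breaklines]{R}
library(MASS)   # provides glm.nb()

# Built-in warpbreaks data, used only for illustration
fit.nb <- glm.nb(breaks ~ wool + tension, data = warpbreaks)
summary(fit.nb)

fit.nb$theta    # estimated shape; the variance is mu + mu^2 / theta
\end{minted}

In this parameterization a large \texttt{theta} means the variance is close to the mean, so the fit approaches the Poisson model.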


\subsection{Poisson regression in R}
Poisson regression models are best used for modelling events where the outcomes are counts. Count data can also be expressed as rate data, since a raw count is the number of times an event occurs within a given timeframe. 
Generalized linear models are models in which response variables follow a distribution other than the normal distribution. The general mathematical form of the Poisson regression model for the mean $\mu = E(y)$ is:
\[
\log(\mu) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
\]

In Poisson regression models, the predictor or explanatory variables can be a mixture of numeric and categorical variables.

Equidispersion: mean and variance of the distribution are equal

Check for equidispersion by comparing the sample mean and variance of the response:
\begin{minted}[breaklines]{R}
data <- warpbreaks  # assumption: the variables match R's built-in warpbreaks data
mean(data$breaks)
var(data$breaks)
\end{minted}

Fit the Poisson model with \texttt{glm()}:

\begin{minted}[breaklines]{R}
poisson.model <- glm(breaks ~ wool + tension, data, family= poisson(link = "log"))
\end{minted}

Following is the interpretation for the parameter estimates:
\begin{enumerate}
    \item $\exp(\alpha)$ is the mean $\mu$ when $X = 0$.
    \item $\exp(\beta)$: every unit increase in $X$ has a multiplicative effect of $\exp(\beta)$ on the mean of $Y$, that is $\mu$.
    \item If $\beta = 0$, then $\exp(\beta) = 1$, the expected count is $\exp(\alpha)$, and $Y$ and $X$ are not related.
    \item If $\beta > 0$, then $\exp(\beta) > 1$, and the expected count is multiplied by $\exp(\beta)$, so it increases, for each unit increase in $X$.
    \item If $\beta < 0$, then $\exp(\beta) < 1$, and the expected count is multiplied by $\exp(\beta)$, so it decreases, for each unit increase in $X$.
\end{enumerate}

If the p-value of a coefficient is less than 0.05, the corresponding variable has a statistically significant effect on the response at the 5\% level.

If the residual deviance is much greater than the residual degrees of freedom, this suggests overdispersion. 

So, to obtain more appropriate standard errors we can use a quasi-Poisson model:
\begin{minted}[breaklines]{R}
# Name the arguments: the second positional argument of glm() is family, not data
poisson.model2 <- glm(breaks ~ wool + tension, data = data, family = quasipoisson(link = "log"))
\end{minted}
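
A sketch of what the quasi-Poisson fit changes, assuming (as the variable names suggest) that \texttt{data} is R's built-in \texttt{warpbreaks} data. The names \texttt{fit.p}, \texttt{fit.qp} and \texttt{phi.hat} are introduced here only for illustration.

\begin{minted}[breaklines]{R}
data <- warpbreaks   # assumption: the notes use R's built-in warpbreaks data

fit.p  <- glm(breaks ~ wool + tension, data = data, family = poisson(link = "log"))
fit.qp <- glm(breaks ~ wool + tension, data = data, family = quasipoisson(link = "log"))

# Pearson estimate of the dispersion parameter
phi.hat <- sum(residuals(fit.p, type = "pearson")^2) / df.residual(fit.p)

all.equal(phi.hat, summary(fit.qp)$dispersion)
all.equal(coef(fit.p), coef(fit.qp))   # the coefficient estimates are unchanged

# Quasi-Poisson standard errors are the Poisson ones scaled by sqrt(phi.hat)
se.p  <- coef(summary(fit.p))[, "Std. Error"]
se.qp <- coef(summary(fit.qp))[, "Std. Error"]
all.equal(se.qp, se.p * sqrt(phi.hat))
\end{minted}

So the quasi-Poisson model leaves the fitted means alone and only widens (or narrows) the standard errors according to the estimated dispersion.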

Once the model is fitted, we can use \texttt{predict} to predict outcomes for new data frames containing data other than the training data. 

\begin{minted}[breaklines]{R}
newdata <- data.frame(wool = "B", tension = "M")

predict(poisson.model2, newdata = newdata, type = "response")
\end{minted}

\subsubsection{Poisson regression modeling using rate data}



\section{Numerical maximization of a function}
t

\section{Delta method}
t

\section{Generalized linear mixed models}
t