# Part 3: Mixed Effects Regression Models
<b>Author</b>: Sterling Cutler
<br>
<b>Date</b>: March 22, 2018

## Mixed Effects

One important consideration in using GLMs is that the model assumes the observations are independent. In our auto bodily injury dataset, we are assuming that no claimant is represented in more than one claim. If a claimant had filed two different claims, then we should ask ourselves, "How might the first claim have affected the likelihood of the second? How independent are these events?"

Some possible explanations for why these events may not be independent could include:
- first accident left claimant with impaired driving abilities (physical and/or mental)
- claimant is continuing poor driving habits that resulted in first accident
- claimant is committing multiple cases of insurance fraud

These explanations are simplified, but they show why a model should capture non-independence wherever it may be present. Mixed effects models do this by estimating random effects (also called variance components) in addition to fixed effects.
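To make the idea of non-independence concrete, here is a minimal simulation sketch in base R. All numbers (claimant count, base rate, spread of the random effect) are made-up assumptions, not values from the insurance data. Each simulated claimant gets their own random effect on the log scale, so their claim counts in two periods share a common rate and end up correlated.

```r
# Minimal simulation with hypothetical values: two claim periods per claimant,
# linked by a claimant-level random effect on the log scale.
set.seed(42)
n_claimants <- 5000
base_rate <- 1                                            # fixed effect: baseline claim rate
claimant_effect <- rnorm(n_claimants, mean = 0, sd = 0.5) # random effect per claimant

rate <- base_rate * exp(claimant_effect)   # each claimant's own expected claim count
claims_p1 <- rpois(n_claimants, rate)      # period 1 claims
claims_p2 <- rpois(n_claimants, rate)      # period 2 claims, same claimant, same rate

# Benchmark: same set of rates, but shuffled so periods come from different claimants
claims_p2_indep <- rpois(n_claimants, sample(rate))

cor_dependent <- cor(claims_p1, claims_p2)
cor_independent <- cor(claims_p1, claims_p2_indep)
cat("Same claimant:", round(cor_dependent, 3),
    " Shuffled claimants:", round(cor_independent, 3), "\n")
```

The same-claimant correlation should be clearly positive while the shuffled benchmark should sit near zero. A model that ignores the claimant-level effect treats both cases identically.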

Because the `glmm` package currently only supports Poisson and binomial distributions for the target variable, let's use a different dataset that records claim counts for simulated auto collisions over three years. We'll filter down to one year and use claim count as our target variable.

In [3]:
# Load and preview raw data
# If the package is missing, install it first: install.packages("insuranceData")
library(insuranceData)

data("ClaimsLong")
df <- subset(ClaimsLong, period == 1)  # keep only the first period
rm(ClaimsLong)

cat("Dataset shape:", dim(df), "\n")
head(df)


In [None]:
# View distribution of categorical and target variables
table(df$agecat)
table(df$valuecat)
hist(df$numclaims, xlab="Number of Claims", main="Auto Claim Count")

## Generalized Linear Mixed Model (GLMM)

Doc: https://cran.r-project.org/web/packages/glmm/vignettes/intro.pdf

GLMMs extend generalized linear models (GLMs) to include both fixed and random effects [1]. In matrix notation, the general form of the model is: $y = X\beta + Z\mu + \epsilon$

- $y$ is an $N \times 1$ column vector, the outcome variable
- $X$ is an $N \times p$ matrix of the $p$ predictor variables
- $\beta$ is a $p \times 1$ column vector of the fixed-effects regression coefficients
- $Z$ is the $N \times q$ design matrix for the $q$ random effects (the random complement to the fixed $X$)
- $\mu$ is a $q \times 1$ vector of the random effects (the random complement to the fixed $\beta$)
- $\epsilon$ is an $N \times 1$ column vector of the residuals, the part of $y$ that is not explained by the model $X\beta + Z\mu$
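The dimensions in the list above can be checked on a small example. The sketch below uses a made-up six-row data frame and made-up coefficient values (the names `toy`, `beta`, and `mu` are illustrative only). It builds $X$ and $Z$ with `model.matrix()` and computes $X\beta + Z\mu$. For a Poisson model with a log link, this quantity would be the linear predictor, and exponentiating it gives the expected count.

```r
# Toy data: 6 observations, a 2-level fixed factor and a 3-level grouping factor
toy <- data.frame(
  valuecat = factor(c("low", "high", "low", "high", "low", "high")),
  agecat   = factor(c("a", "a", "b", "b", "c", "c"))
)

X <- model.matrix(~ 0 + valuecat, data = toy)  # N x p indicator matrix, here 6 x 2
Z <- model.matrix(~ 0 + agecat, data = toy)    # N x q indicator matrix, here 6 x 3

beta <- c(0.5, -0.2)       # p x 1 fixed effects (made up), ordered as colnames(X): high, low
mu   <- c(0.1, -0.3, 0.2)  # q x 1 random effects (made up), ordered as colnames(Z): a, b, c

eta <- X %*% beta + Z %*% mu   # N x 1 linear predictor
expected_count <- exp(eta)     # mean under a log link (Poisson case)

dim(X)
dim(Z)
dim(eta)
```

In a fitted model the random effects are not fixed numbers like `mu` here. They are treated as draws from a distribution whose variance the model estimates.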

Here we'll use the vehicle value category as our fixed effect and the driver's age category as our random effect.

In [None]:
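library(glmm)

# Treat both categories as factors so each level gets its own indicator column
df$valuecat <- factor(df$valuecat)
df$agecat <- factor(df$agecat)

set.seed(1)  # the fit relies on Monte Carlo sampling, so fix the seed
# Store the fit under a new name so it does not shadow the glmm() function
glmm_fit <- glmm(fixed = numclaims ~ 0 + valuecat,
                 random = list(numclaims ~ 0 + agecat),
                 varcomps.names = c("agecat"),   # label for the variance component
                 data = df,
                 m = 100,                        # Monte Carlo sample size, small here for speed
                 family.glmm = poisson.glmm)     # family object, not a string

summary(glmm_fit)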

## HGLM
Doc: https://cran.r-project.org/web/packages/hglm/hglm.pdf

Link: https://www.diva-portal.org/smash/get/diva2:685966/FULLTEXT02.pdf

In [None]:
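library(hglm)

# Same structure as the GLMM section: vehicle value as fixed effect, age category as random effect
X <- model.matrix(~ 0 + factor(valuecat), data = df)  # design matrix for fixed effects
Z <- model.matrix(~ 0 + factor(agecat), data = df)    # design matrix for random effects

# Store the fit under a new name so it does not shadow the hglm() function
hglm_fit <- hglm(X = X,
                 y = df$numclaims,                        # dependent variable
                 Z = Z,
                 family = poisson(link = log),            # claim counts
                 rand.family = gaussian(link = identity),
                 conv = 1e-6,
                 maxit = 50)

summary(hglm_fit)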

## LME
Doc: https://cran.r-project.org/web/packages/nlme/nlme.pdf

Link: http://www.bodowinter.com/tutorial/bw_LME_tutorial2.pdf

In [None]:
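library(nlme)
library(insuranceData)

data("AutoBi")

# Illustrative variable choices only: log loss explained by claimant age,
# with a random intercept for each marital status group
lme_fit <- lme(fixed = log(LOSS) ~ CLMAGE,  # two-sided linear formula of Y ~ X form
               data = AutoBi,
               random = ~ 1 | MARITAL,      # one-sided formula of form ~ x1 + ... + xn | g1/.../gm
               na.action = na.omit)         # drop rows with missing values

summary(lme_fit)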
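Linear mixed models are most natural when the same unit is measured repeatedly, which is the non-independence situation described at the start of this part. As a hedged illustration on a different dataset, the sketch below fits a random-intercept model to `Orthodont`, a small repeated-measures dataset that ships with `nlme`. The object name `ortho_fit` is arbitrary.

```r
library(nlme)

# Orthodont ships with nlme: repeated distance measurements for each Subject
data("Orthodont", package = "nlme")

ortho_fit <- lme(fixed = distance ~ age,   # fixed effect: linear change with age
                 data = Orthodont,
                 random = ~ 1 | Subject)   # random intercept for each child

fixef(ortho_fit)    # population-level intercept and slope
VarCorr(ortho_fit)  # between-subject and residual variance components
```

`fixef()` returns the fixed-effect estimates and `VarCorr()` reports how much of the variance is attributed to differences between subjects versus residual noise.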

## NLME
Doc: https://cran.r-project.org/web/packages/nlme/nlme.pdf

Link: http://lme4.r-forge.r-project.org/slides/2011-01-11-Madison/6NLMMH.pdf

In [None]:
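library(nlme)

# Template only: replace each quoted placeholder with an actual formula or vector before running.
# Store the fit under a new name so it does not shadow the nlme() function.
nlme_fit <- nlme(model = "nonlinear model formula of Y ~ f(X, parameters) form",
                 data = AutoBi,
                 fixed = "two-sided linear formula of form f1 + ... + fn ~ x1 + ... + xm",
                 random = "two-sided formula of form r1 + ... + rn ~ x1 + ... + xm | g1/.../gQ",
                 start = "starting values for the fixed effects")

summary(nlme_fit)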
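For a runnable instance of the `nlme()` call, the sketch below follows the example from the function's own help page rather than the insurance data. It fits an asymptotic growth curve to the built-in `Loblolly` pine dataset, with a random effect on the asymptote for each seed source. The starting values are the ones used in that help example, and the object name `lob_fit` is arbitrary.

```r
library(nlme)

# Loblolly (built into R): tree height measured at several ages for each Seed source
lob_fit <- nlme(model = height ~ SSasymp(age, Asym, R0, lrc),  # self-starting asymptotic curve
                data = Loblolly,
                fixed = Asym + R0 + lrc ~ 1,    # each parameter has a population-level mean
                random = Asym ~ 1,              # only the asymptote varies by Seed
                start = c(Asym = 103, R0 = -8.5, lrc = -3.3))

fixef(lob_fit)    # population-level parameter estimates
summary(lob_fit)
```

Because `Loblolly` is stored as grouped data, the grouping factor (`Seed`) does not need to be written out in the `random` formula.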

## Sources
[1] https://stats.idre.ucla.edu/other/mult-pkg/introduction-to-generalized-linear-mixed-models/