# Log-Linear Model

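This is a demo of fitting a log-linear model with `statsmodels` to a three-way contingency table: death-penalty verdicts (`DeathPen`) cross-classified by the race of the defendant (`Defendant`) and the race of the victim (`Victim`).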

## Code and Results

```python
# Packages.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import glm
```

```python
# Load the data (whitespace-separated text file; raw string avoids an invalid-escape warning).
data = pd.read_table("deathpen.txt", sep=r"\s+")
data
```

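| Unnamed: 0 | DeathPen | Defendant | Victim | freq |
|---:|---|---|---|---:|
| 0 | Yes | White | White | 53 |
| 1 | No | White | White | 414 |
| 2 | Yes | Black | White | 11 |
| 3 | No | Black | White | 37 |
| 4 | Yes | White | Black | 0 |
| 5 | No | White | Black | 16 |
| 6 | Yes | Black | Black | 4 |
| 7 | No | Black | Black | 139 |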
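
If `deathpen.txt` is not available, one way to reproduce this notebook is to build `data` inline from the counts printed above. This is a hypothetical substitute for the file, not part of the original analysis, and it leaves out the `Unnamed: 0` column, which none of the models use.

```python
import pandas as pd

# Hypothetical inline substitute for deathpen.txt, using the counts printed above.
data = pd.DataFrame({
    "DeathPen":  ["Yes", "No"] * 4,
    "Defendant": ["White", "White", "Black", "Black"] * 2,
    "Victim":    ["White"] * 4 + ["Black"] * 4,
    "freq":      [53, 414, 11, 37, 0, 16, 4, 139],
})

# Quick sanity checks on the table.
print(data["freq"].sum())                    # total number of cases: 674
print(data.groupby("Victim")["freq"].sum())  # Victim margin: Black 159, White 515
```

The `glm` formulas below refer only to `DeathPen`, `Defendant`, `Victim` and `freq`, so they should run unchanged on this frame.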


Assume that the verdict is conditionally independent of the defendant's race given the victim's race: $$\text{DeathPen} \perp \text{Defendant} \mid \text{Victim}.$$
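
In the $\lambda_W$ notation used later for log-linear parameters, with $A=\text{DeathPen}$, $B=\text{Defendant}$, $C=\text{Victim}$ and $\mu_{ijk}$ the expected count in cell $(i,j,k)$, a sketch of the model this assumption gives (it has the same terms as the `glm` formula below) is

$$\log \mu_{ijk} = \lambda_{\emptyset} + \lambda_{A}(i) + \lambda_{B}(j) + \lambda_{C}(k) + \lambda_{AC}(i,k) + \lambda_{BC}(j,k).$$

Conditional independence corresponds to leaving out $\lambda_{AB}$ and $\lambda_{ABC}$.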

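```python
# Fit the conditional-independence model: DeathPen*Victim + Defendant*Victim,
# with no DeathPen:Defendant term. Poisson() uses the canonical log link by default.
mod1 = glm(formula="freq~C(DeathPen)*C(Victim)+C(Defendant)*C(Victim)", data=data,
           family=sm.families.Poisson())
mod1.fit().params
```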

```
Intercept                                   4.937366
C(DeathPen)[T.Yes]                         -3.657131
C(Victim)[T.White]                         -1.198864
C(Defendant)[T.White]                      -2.190256
C(DeathPen)[T.Yes]:C(Victim)[T.White]       1.704546
C(Defendant)[T.White]:C(Victim)[T.White]    4.465384
dtype: float64
```
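
Because this model is decomposable, its maximum-likelihood fitted counts have a closed form, $\hat\mu_{ijk} = n_{i+k}\,n_{+jk}/n_{++k}$, where $+$ marks summation over an index (a standard result, not something `statsmodels` reports here). A minimal NumPy-only sketch, assuming treatment coding with reference levels No/Black/Black as in the output above, that rebuilds the `mod1` coefficients from the table; the array layout and variable names are illustrative choices.

```python
import numpy as np

# Counts indexed as n[deathpen, defendant, victim];
# level 0 is the reference (No / Black / Black), level 1 is Yes / White / White.
n = np.zeros((2, 2, 2))
n[1, 1, 1], n[0, 1, 1] = 53, 414   # Defendant White, Victim White
n[1, 0, 1], n[0, 0, 1] = 11, 37    # Defendant Black, Victim White
n[1, 1, 0], n[0, 1, 0] = 0, 16     # Defendant White, Victim Black
n[1, 0, 0], n[0, 0, 0] = 4, 139    # Defendant Black, Victim Black

# Closed-form MLE under DeathPen independent of Defendant given Victim:
# mu[i, j, k] = n[i, +, k] * n[+, j, k] / n[+, +, k]
n_ik = n.sum(axis=1)          # DeathPen x Victim margin, shape (2, 2)
n_jk = n.sum(axis=0)          # Defendant x Victim margin, shape (2, 2)
n_k = n.sum(axis=(0, 1))      # Victim margin, shape (2,)
mu = n_ik[:, None, :] * n_jk[None, :, :] / n_k[None, None, :]
log_mu = np.log(mu)           # all fitted counts are positive, even where n == 0

# Treatment-coded parameters relative to the reference cell (No, Black, Black).
params = {
    "Intercept": log_mu[0, 0, 0],
    "C(DeathPen)[T.Yes]": log_mu[1, 0, 0] - log_mu[0, 0, 0],
    "C(Victim)[T.White]": log_mu[0, 0, 1] - log_mu[0, 0, 0],
    "C(Defendant)[T.White]": log_mu[0, 1, 0] - log_mu[0, 0, 0],
    "C(DeathPen)[T.Yes]:C(Victim)[T.White]":
        log_mu[1, 0, 1] - log_mu[0, 0, 1] - log_mu[1, 0, 0] + log_mu[0, 0, 0],
    "C(Defendant)[T.White]:C(Victim)[T.White]":
        log_mu[0, 1, 1] - log_mu[0, 0, 1] - log_mu[0, 1, 0] + log_mu[0, 0, 0],
}
for name, value in params.items():
    print(f"{name:42s} {value: .6f}")
```

Up to rounding, the printed values should agree with the `glm` output, since the iterative fit and the closed form target the same maximum.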

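If we fit only the Defendant × Victim terms, i.e. the model for the Victim × Defendant margin, the parameters involving Defendant (`C(Defendant)[T.White]` and `C(Defendant)[T.White]:C(Victim)[T.White]`) are identical to those from `mod1`.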

```python
# Model for the Defendant x Victim margin (no DeathPen terms).
mod2 = glm(formula="freq~C(Defendant)*C(Victim)", data=data,
           family=sm.families.Poisson())
mod2.fit().params
```

```
Intercept                                   4.269697
C(Defendant)[T.White]                      -2.190256
C(Victim)[T.White]                         -1.091644
C(Defendant)[T.White]:C(Victim)[T.White]    4.465384
dtype: float64
```
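
`mod2` is fitted to all eight rows of `data`, so each Defendant × Victim count is split evenly between the two DeathPen rows. A minimal pandas/NumPy sketch (the inline frame copies the counts above; variable names are illustrative) that collapses the table to the Defendant × Victim margin and reads the saturated-model parameters off as log ratios:

```python
import numpy as np
import pandas as pd

# Counts from the table above (inline copy, so the block runs on its own).
data = pd.DataFrame({
    "DeathPen":  ["Yes", "No"] * 4,
    "Defendant": ["White", "White", "Black", "Black"] * 2,
    "Victim":    ["White"] * 4 + ["Black"] * 4,
    "freq":      [53, 414, 11, 37, 0, 16, 4, 139],
})

# Collapse over DeathPen: Defendant x Victim margin (rows and columns sorted: Black, White).
m = data.pivot_table(index="Defendant", columns="Victim", values="freq", aggfunc="sum")

# Saturated model on the 2 x 2 margin, treatment coding with Black as reference:
# its fitted counts equal the observed margin, so each parameter is a log ratio.
log_m = np.log(m.to_numpy(dtype=float))
marginal_params = {
    "Intercept": log_m[0, 0],
    "C(Defendant)[T.White]": log_m[1, 0] - log_m[0, 0],
    "C(Victim)[T.White]": log_m[0, 1] - log_m[0, 0],
    "C(Defendant)[T.White]:C(Victim)[T.White]":
        log_m[1, 1] - log_m[1, 0] - log_m[0, 1] + log_m[0, 0],
}
for name, value in marginal_params.items():
    print(f"{name:42s} {value: .6f}")

# Fitting the same terms to the 8-row table halves every fitted count,
# so mod2's intercept should equal this intercept minus log(2).
print(marginal_params["Intercept"] - np.log(2))
```

The Defendant terms and the Victim main effect match `mod2`, and only the intercept differs, by $\log 2$, reflecting the even split over the two DeathPen levels.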

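Write $A=\text{DeathPen}$, $B=\text{Defendant}$ and $C=\text{Victim}$, and let $\lambda_W^{S}$ be the parameter for term $W$ in the model fitted to the margin $S$ (so `mod2` gives $\lambda^{BC}$). The terms that involve only $C=\{\text{Victim}\}$, namely the intercept ($W=\emptyset$) and the Victim main effect ($W=\{\text{Victim}\}$), are not inherited from a single margin; instead they satisfy $$\lambda_W=\lambda_W^{AC}+\lambda_W^{BC}-\lambda_W^C \quad \text{for } W \subseteq C.$$ The next cells fit the DeathPen × Victim margin (`mod3`) and the Victim margin (`mod4`) to check this.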
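
A short derivation of why this identity holds, assuming the standard closed-form maximum-likelihood estimate for this decomposable model, $\hat\mu_{ijk} = n_{i+k}\,n_{+jk}/n_{++k}$, with $A=\text{DeathPen}$ (index $i$), $B=\text{Defendant}$ ($j$), $C=\text{Victim}$ ($k$) and $+$ denoting summation over an index:

$$\log\hat\mu_{ijk} = \underbrace{\log n_{i+k}}_{AC\text{ margin}} + \underbrace{\log n_{+jk}}_{BC\text{ margin}} - \underbrace{\log n_{++k}}_{C\text{ margin}}.$$

Each piece on the right is exactly the fitted log count of the saturated model for that margin, so expanding every piece in the same treatment coding and matching coefficients gives $\lambda_W=\lambda_W^{AC}+\lambda_W^{BC}-\lambda_W^{C}$ for $W\subseteq C$. A term containing $B$ appears only in the $BC$ piece, which is why `mod1` and `mod2` share their Defendant parameters. When the margin models are fitted to the 8-row table, as in these cells, their intercepts are shifted by $-\log 2$, $-\log 2$ and $-\log 4$ respectively, which cancel in the combination.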

```python
# Model for the DeathPen x Victim margin (no Defendant terms).
mod3 = glm(formula="freq~C(DeathPen)*C(Victim)", data=data,
           family=sm.families.Poisson())
mod3.fit().params
```

```
Intercept                                4.350278
C(DeathPen)[T.Yes]                      -3.657131
C(Victim)[T.White]                       1.068042
C(DeathPen)[T.Yes]:C(Victim)[T.White]    1.704546
dtype: float64
```

```python
# Model for the Victim margin alone.
mod4 = glm(formula="freq~C(Victim)", data=data,
           family=sm.families.Poisson())
mod4.fit().params
```

```
Intercept             3.682610
C(Victim)[T.White]    1.175263
dtype: float64
```

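```python
# Combine the margin fits for the terms that are subsets of {Victim}:
# the intercept and the Victim main effect. Selecting by label avoids
# pandas' deprecated positional indexing with [] on a Series.
terms = ["Intercept", "C(Victim)[T.White]"]
mod2.fit().params[terms] + mod3.fit().params[terms] - mod4.fit().params[terms]
```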

```
Intercept             4.937366
C(Victim)[T.White]   -1.198864
dtype: float64
```

These values match the `Intercept` and `C(Victim)[T.White]` coefficients of `mod1`.