This notebook illustrates variance components analysis of a two-level
simulated dataset.  In our discussion below, "Group 2" is nested within
"Group 1".  As a concrete example, "Group 1" might be school districts,
with "Group 2" being individual schools.
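
As a compact statement of the two-level structure, the model can be written as below. The index and symbol names ($i$, $j$, $k$, $\mu$, $\sigma_1$, $\sigma_2$, $\sigma_e$) are notation introduced here for illustration, with $i$ indexing Group 1, $j$ indexing Group 2 within Group 1, and $k$ indexing replicates; the simulated data in this notebook have $\mu = 0$.

$$
y_{ijk} = \mu + u_i + v_{ij} + \epsilon_{ijk}, \qquad
u_i \sim N(0, \sigma_1^2), \quad
v_{ij} \sim N(0, \sigma_2^2), \quad
\epsilon_{ijk} \sim N(0, \sigma_e^2)
$$

With all terms independent, the variance of a single observation decomposes as

$$
\mathrm{Var}(y_{ijk}) = \sigma_1^2 + \sigma_2^2 + \sigma_e^2 .
$$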

In [None]:
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.mixed_linear_model import VCSpec
import pandas as pd
from scipy import sparse

Seed the random number generator so that the notebook is reproducible.

In [None]:
np.random.seed(3123)

In [None]:
def generate_nested(n_group1=200, n_group2=20, n_rep=10, group1_sd=2, group2_sd=3, unexplained_sd=4):
    # Simulate a balanced nested design: n_group2 subgroups within each of
    # n_group1 groups, with n_rep observations per subgroup.

    # Group 1 indicators
    group1 = np.kron(np.arange(n_group1), np.ones(n_group2 * n_rep))

    # Group 1 effects
    u = group1_sd * np.random.normal(size=n_group1)
    effects1 = np.kron(u, np.ones(n_group2 * n_rep))

    # Group 2 indicators
    group2 = np.kron(np.ones(n_group1), np.kron(np.arange(n_group2), np.ones(n_rep)))

    # Group 2 effects
    u = group2_sd * np.random.normal(size=n_group1*n_group2)
    effects2 = np.kron(u, np.ones(n_rep))

    # Unexplained (residual) variation
    e = unexplained_sd * np.random.normal(size=n_group1 * n_group2 * n_rep)
    y = effects1 + effects2 + e

    df = pd.DataFrame({"y":y, "group1": group1, "group2": group2})

    return df
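
To see how the Kronecker products above lay out the nested indicators, here is a minimal sketch on a toy design with 2 levels of Group 1, 3 levels of Group 2 within each, and 2 replicates. The sizes are illustrative only and are not the ones used in the analysis.

```python
import numpy as np

# Toy sizes, chosen only so the output is short enough to read
n_group1, n_group2, n_rep = 2, 3, 2

# Each group1 label is repeated for all of its group2 levels and replicates
group1 = np.kron(np.arange(n_group1), np.ones(n_group2 * n_rep)).astype(int)

# The group2 labels 0..n_group2-1 restart within every group1 level
group2 = np.kron(np.ones(n_group1),
                 np.kron(np.arange(n_group2), np.ones(n_rep))).astype(int)

print(group1)  # [0 0 0 0 0 0 1 1 1 1 1 1]
print(group2)  # [0 0 1 1 2 2 0 0 1 1 2 2]
```

Note that the Group 2 labels are only unique within a level of Group 1, which is why Group 2 has to be treated as nested rather than crossed.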

Generate a dataset to analyze.

In [None]:
df = generate_nested()

The population values of "group1 Var" and "group2 Var" are 2^2=4 and 3^2=9,
respectively.  The unexplained variance, listed as "Scale" at the top
of the summary table, has population value 4^2=16.

In [None]:
# Random intercept for group1, plus a variance component for group2 nested within group1
model1 = sm.MixedLM.from_formula("y ~ 1", re_formula="1",
                                 vc_formula={"group2": "0 + C(group2)"},
                                 groups="group1", data=df)
result1 = model1.fit()
print(result1.summary())
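
As a rough cross-check on the fitted variance components, a balanced nested design also admits simple moment-based (ANOVA-type) estimates. The sketch below is not what `MixedLM` does internally; it is a swapped-in technique shown for comparison only. It regenerates data with the same population values so that it runs on its own, and the names `demo`, `a`, `b`, `n` and the `est_*` variables are introduced here for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a, b, n = 200, 20, 10          # group1 levels, group2 levels per group1, replicates
sd1, sd2, sde = 2.0, 3.0, 4.0  # population standard deviations

group1 = np.repeat(np.arange(a), b * n)
group2 = np.tile(np.repeat(np.arange(b), n), a)
y = (np.repeat(sd1 * rng.normal(size=a), b * n)
     + np.repeat(sd2 * rng.normal(size=a * b), n)
     + sde * rng.normal(size=a * b * n))
demo = pd.DataFrame({"y": y, "group1": group1, "group2": group2})

cells = demo.groupby(["group1", "group2"])["y"]

# Average within-cell variance estimates the unexplained variance
within_cell_var = cells.var(ddof=1).mean()

# Variance of cell means within a group1 level estimates sd2^2 + sde^2 / n
cell_means = cells.mean()
cell_mean_var = cell_means.groupby(level="group1").var(ddof=1).mean()

# Variance of group1 means estimates sd1^2 + sd2^2 / b + sde^2 / (b * n)
group1_mean_var = cell_means.groupby(level="group1").mean().var(ddof=1)

est_unexplained = within_cell_var
est_group2 = cell_mean_var - within_cell_var / n
est_group1 = group1_mean_var - cell_mean_var / b

print("group1 variance:     ", est_group1)
print("group2 variance:     ", est_group2)
print("unexplained variance:", est_unexplained)
```

These estimates should land near 4, 9 and 16, up to sampling error. Unlike the mixed model fit, the moment-based estimates rely on the design being balanced and can turn out negative in small samples.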

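If we wish to avoid the formula interface, we can build
the design matrices manually.  The function below takes the rows
belonging to one level of "Group 1" and returns the indicator matrix
for its "Group 2" levels, together with the column names.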

In [None]:
def f(x):
    # Build the indicator (dummy) matrix for the group2 levels in one group1 block
    g2 = x["group2"].to_numpy()
    n = len(g2)
    # u holds the sorted distinct levels; jj gives each row's column position
    u, jj = np.unique(g2, return_inverse=True)
    ii = np.arange(n)
    oo = np.ones(n)
    q = len(u)
    mat = sparse.coo_matrix((oo, (ii, jj)), shape=(n, q))
    mat = mat.toarray()
    colnames = ["%d" % z for z in u]
    return mat, colnames
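
To make the shape of these matrices concrete, here is a small sketch that builds the same kind of indicator matrix for a single toy block of six rows. It uses plain NumPy indexing rather than `scipy.sparse`, and the names `labels`, `levels`, `col` and `indicator` are illustrative only.

```python
import numpy as np

# group2 labels for the six rows of one hypothetical group1 block
labels = np.array([2, 0, 1, 0, 2, 1])

# Sorted distinct levels, and for each row the position of its level
levels, col = np.unique(labels, return_inverse=True)

# Row r has a single 1 in the column of its group2 level
indicator = np.eye(len(levels))[col]
colnames = ["%d" % z for z in levels]

print(colnames)
print(indicator)
```

Each row contains exactly one 1, so the column sums count how many observations fall in each Group 2 level.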

Set up the variance components by building one indicator matrix per level of "Group 1".

In [None]:
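# One (matrix, column names) pair per level of group1, in sorted group order
vcm = [f(x) for _, x in df.groupby("group1")]
mats = [x[0] for x in vcm]
colnames = [x[1] for x in vcm]
names = ["group2"]
# VCSpec takes, for each variance component, a list of column names and a list of matrices
vcs = VCSpec(names, [colnames], [mats])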

Fit the model.

In [None]:
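# Column of ones, used for both the fixed intercept and the group1 random intercept
oo = np.ones(df.shape[0])
model2 = sm.MixedLM(df.y, oo, exog_re=oo, groups=df.group1, exog_vc=vcs)
result2 = model2.fit()
print(result2.summary())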