## Theory

Suppose that a group of $P$ students is given a test with $I$ multiple-choice questions. Let $Y_{pi}=1$ if student $p \in \{1, \dots, P\}$ answers item $i \in \{1, \dots, I\}$ correctly, and $Y_{pi}=0$ otherwise. Assuming that all responses are conditionally independent given the parameters $a, b, \theta$, we have:

$$Y_{pi} \mid a,b,\theta \sim \mathrm{Ber}\left(\frac{e^{a_i\theta_p - b_i}}{1 + e^{a_i\theta_p - b_i}}\right)$$

where $\theta_p$ measures the student's learning ability, $a_i$ measures the item's discriminatory power, and $b_i$ measures the item's difficulty. Assume the following independent priors on the parameters: $a_i \sim \mathcal{N}(0, \sigma_a^2)$, $b_i \sim \mathcal{N}(0, \sigma_b^2)$, $\theta_p \sim \mathcal{N}(0,1)$. Then the posterior distribution of $(a, b, \theta)$ is given, up to a normalizing constant, by
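To make the model concrete, the following is a minimal sketch that draws parameters from the priors and then simulates a response matrix from the Bernoulli likelihood. The function name `simulate_responses`, the seed, and the default prior scales of 1 are illustrative assumptions, not part of the original notebook.

```python
import torch

def simulate_responses(P, I, sigma_a=1.0, sigma_b=1.0, seed=0):
    """Draw (a, b, theta) from the priors, then y from the Bernoulli model."""
    gen = torch.Generator().manual_seed(seed)
    a = sigma_a * torch.randn(I, generator=gen)   # a_i ~ N(0, sigma_a^2)
    b = sigma_b * torch.randn(I, generator=gen)   # b_i ~ N(0, sigma_b^2)
    theta = torch.randn(P, generator=gen)         # theta_p ~ N(0, 1)

    # logits[p, i] = a_i * theta_p - b_i, shape (P, I)
    logits = theta[:, None] * a[None, :] - b[None, :]
    probs = torch.sigmoid(logits)                 # e^x / (1 + e^x)
    y = torch.bernoulli(probs, generator=gen)     # y[p, i] in {0., 1.}
    return a, b, theta, y

a, b, theta, y = simulate_responses(P=5, I=3)
```

Simulated data of this kind is handy for checking a sampler, since the generating parameters are known.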

$$\pi(a,b,\theta \mid y) \propto \exp\Bigl\{ -\frac{1}{2\sigma_a^2}\|a\|^2 - \frac{1}{2\sigma_b^2}\|b\|^2 - \frac{1}{2}\|\theta\|^2 + \sum_{p,i} \bigl[ y_{pi}(a_i\theta_p - b_i) - \log(1 + e^{a_i\theta_p - b_i}) \bigr] \Bigr\}$$
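As a rough sketch, the unnormalized log-posterior can be evaluated in PyTorch as follows, with Gaussian priors entering as negative quadratic terms. The name `log_posterior` and the default values of `sigma_a` and `sigma_b` are assumptions for illustration. `softplus` is used because it computes $\log(1 + e^x)$ in a numerically stable way.

```python
import torch
import torch.nn.functional as F

def log_posterior(a, b, theta, y, sigma_a=1.0, sigma_b=1.0):
    """Unnormalized log pi(a, b, theta | y); y has shape (P, I)."""
    # logits[p, i] = a_i * theta_p - b_i
    logits = theta[:, None] * a[None, :] - b[None, :]

    # Bernoulli log-likelihood: sum of y * x - log(1 + e^x)
    log_lik = (y * logits - F.softplus(logits)).sum()

    # Gaussian log-priors, dropping additive constants
    log_prior = (-(a ** 2).sum() / (2 * sigma_a ** 2)
                 - (b ** 2).sum() / (2 * sigma_b ** 2)
                 - (theta ** 2).sum() / 2)
    return log_lik + log_prior
```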

and the full conditionals (the conditional distribution of one variable given all others) are given by

$$\pi(a_i \mid b,\theta,y) \propto \exp\Bigl\{ -\frac{a_i^2}{2\sigma_a^2} + \sum_{p=1}^P \bigl[ a_i y_{pi} \theta_p - \log(1+e^{a_i\theta_p - b_i}) \bigr] \Bigr\}$$

$$\pi(b_i \mid a, \theta, y) \propto \exp\Bigl\{ -\frac{b_i^2}{2\sigma_b^2} - \sum_{p=1}^P \bigl[ y_{pi}b_i + \log(1+e^{a_i\theta_p - b_i}) \bigr] \Bigr\}$$

$$\pi(\theta_p \mid a, b, y) \propto \exp\Bigl\{ -\frac{\theta_p^2}{2} + \sum_{i=1}^I \bigl[ a_iy_{pi}\theta_p - \log(1+e^{a_i\theta_p - b_i}) \bigr] \Bigr\}$$
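The three full conditionals share the same likelihood terms and differ only in the prior term and in the index that is summed over. A possible vectorized sketch is shown below. The function names are hypothetical, and each function returns unnormalized log densities, one per item or per student. Terms that are constant in the variable being updated (for example $-y_{pi} b_i$ in the conditional of $a_i$) are left in for simplicity, since they cancel in any ratio of the same conditional at two points.

```python
import torch
import torch.nn.functional as F

def log_lik_terms(a, b, theta, y):
    """Matrix of per-response log-likelihood terms, shape (P, I)."""
    logits = theta[:, None] * a[None, :] - b[None, :]
    return y * logits - F.softplus(logits)

def log_cond_a(a, b, theta, y, sigma_a=1.0):
    """Unnormalized log pi(a_i | b, theta, y) for all i; shape (I,)."""
    return -a ** 2 / (2 * sigma_a ** 2) + log_lik_terms(a, b, theta, y).sum(dim=0)

def log_cond_b(a, b, theta, y, sigma_b=1.0):
    """Unnormalized log pi(b_i | a, theta, y) for all i; shape (I,)."""
    return -b ** 2 / (2 * sigma_b ** 2) + log_lik_terms(a, b, theta, y).sum(dim=0)

def log_cond_theta(a, b, theta, y):
    """Unnormalized log pi(theta_p | a, b, y) for all p; shape (P,)."""
    return -theta ** 2 / 2 + log_lik_terms(a, b, theta, y).sum(dim=1)
```

Because the conditional of $a_i$ involves no other $a_j$, changing one entry of `a` changes only the matching entry of the output. The same holds for `b` and `theta`.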

These full conditionals are not standard distributions and cannot be sampled from directly. The code below therefore drafts a Metropolis-within-Gibbs sampler, which replaces each exact conditional draw with a Metropolis accept/reject step.
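One common way to carry out the accept/reject step is a random-walk Metropolis update: perturb the current value with Gaussian noise and accept the candidate with probability $\min(1, \pi(x')/\pi(x))$. The sketch below is one possible version, not necessarily the proposal this notebook will end up using. The name `metropolis_step` and the `step_size` parameter are assumptions. The update is applied to every entry independently, which is valid only when the target factorizes across entries, as each block's full conditional does here.

```python
import torch

def metropolis_step(x, log_target, step_size=0.5, generator=None):
    """One random-walk Metropolis update per entry of x.

    log_target(x) must return one unnormalized log density per entry.
    Returns the updated tensor and a boolean mask of accepted entries.
    """
    # Symmetric Gaussian proposal, so the proposal densities cancel
    proposal = x + step_size * torch.randn(x.shape, generator=generator)

    # log of the acceptance ratio pi(x') / pi(x)
    log_ratio = log_target(proposal) - log_target(x)

    # Accept with probability min(1, ratio): compare log U with log ratio
    log_u = torch.log(torch.rand(x.shape, generator=generator))
    accept = log_u <= log_ratio
    return torch.where(accept, proposal, x), accept
```

Working on the log scale avoids overflow in the ratio, and the explicit `min(1, ...)` becomes unnecessary because $\log U \le 0$ always holds.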

In [1]:
import torch

In [None]:
def theta_conditional(a, b, y):  # outputs a 1xP row
    pass

def a_conditional(b, theta, y):  # outputs a 1xI row
    pass

def b_conditional(a, theta, y):  # outputs a 1xI row
    pass

In [None]:
def compute_acceptance():
    pass    

In [None]:
def gibbs_sampler(init_a, init_b, init_theta, y, niter=10000):
    
    assert len(init_a) == len(init_b)
    
    I = len(init_a)
    P = len(init_theta)
    
    AS = torch.empty(size=(niter, I))
    BS = torch.empty(size=(niter, I))
    THETAS = torch.empty(size=(niter, P))
    
    AS[0] = init_a
    BS[0] = init_b
    THETAS[0] = init_theta
    
    for s in range(1, niter):
        
        # Sample next potential a from the full conditional
        a_s = a_conditional(BS[s-1], THETAS[s-1], y)
        
        # Compute acceptance probability & sample from uniform
        A_a = min(1.0, compute_acceptance())
        U_a = torch.rand(())  # scalar draw from Uniform(0, 1)
        
        # Keep sample with computed acceptance probability
        if U_a <= A_a:
            AS[s] = a_s
        else:
            AS[s] = AS[s-1]
        
        # Sample next potential b from the full conditional
        b_s = b_conditional(AS[s], THETAS[s-1], y)
        
        # Compute acceptance probability & sample from uniform
        A_b = min(1.0, compute_acceptance())
        U_b = torch.rand(())  # scalar draw from Uniform(0, 1)
        
        # Keep sample with computed acceptance probability
        if U_b <= A_b:
            BS[s] = b_s
        else:
            BS[s] = BS[s-1]
        
        # Sample next potential theta from the full conditional
        theta_s = theta_conditional(AS[s], BS[s], y)
        
        # Compute acceptance probability & sample from uniform
        A_t = min(1.0, compute_acceptance())
        U_t = torch.rand(())  # scalar draw from Uniform(0, 1)
        
        # Keep sample with computed acceptance probability
        if U_t <= A_t:
            THETAS[s] = theta_s
        else:
            THETAS[s] = THETAS[s-1]
    
    # Return the full chains, one row per iteration
    return AS, BS, THETAS

In [None]:
# TODO:
#  1. Implement full conditionals
#  2. Implement acceptance probability
#  3. Modularize the sampling step so that less code is repeated
#  4. Test & debug
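Regarding item 3, one possible way to remove the repetition is to factor the propose/accept logic into a single helper and call it once per parameter block. The sketch below assumes a random-walk proposal with a shared, untuned `step` size. The names `rw_update` and `mwg_sampler` are hypothetical, and the whole block is an illustration under those assumptions rather than the finished implementation.

```python
import torch
import torch.nn.functional as F

def log_lik_terms(a, b, theta, y):
    """Per-response log-likelihood terms, shape (P, I)."""
    logits = theta[:, None] * a[None, :] - b[None, :]
    return y * logits - F.softplus(logits)

def rw_update(x, log_cond, step):
    """Entrywise random-walk Metropolis update shared by all blocks."""
    prop = x + step * torch.randn_like(x)
    accept = torch.log(torch.rand_like(x)) <= log_cond(prop) - log_cond(x)
    return torch.where(accept, prop, x)

def mwg_sampler(init_a, init_b, init_theta, y, niter=1000,
                sigma_a=1.0, sigma_b=1.0, step=0.3):
    a, b, theta = init_a.clone(), init_b.clone(), init_theta.clone()
    AS = torch.empty(niter, len(a))
    BS = torch.empty(niter, len(b))
    THETAS = torch.empty(niter, len(theta))
    AS[0], BS[0], THETAS[0] = a, b, theta

    for s in range(1, niter):
        # Each block is updated given the latest values of the other two
        a = rw_update(a, lambda v: -v ** 2 / (2 * sigma_a ** 2)
                      + log_lik_terms(v, b, theta, y).sum(dim=0), step)
        b = rw_update(b, lambda v: -v ** 2 / (2 * sigma_b ** 2)
                      + log_lik_terms(a, v, theta, y).sum(dim=0), step)
        theta = rw_update(theta, lambda v: -v ** 2 / 2
                          + log_lik_terms(a, b, v, y).sum(dim=1), step)
        AS[s], BS[s], THETAS[s] = a, b, theta

    return AS, BS, THETAS
```

With this layout, the three blocks differ only in the log conditional passed to `rw_update`, so changing the proposal or the acceptance rule means editing a single function.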