## Item Response Theory

**Item response theory (IRT)** is a branch of psychometrics that models how both **person** and **item** characteristics influence the probability of discrete responses to items.

- **Item parameters:** characteristics of the item itself (indexed by $i$).

- **Person parameter:** a latent trait ($\theta$) that varies across individuals (indexed by $j$).



::: {layout-ncol="2"}

:::col

</br>

</br>
For example, the two-parameter logistic (2PL) model:


$$P(Y^j_{i} = 1| \theta_j, a_i, b_i) =  \frac{\exp[a_i(\theta_j - b_i)]}{1 + \exp[a_i(\theta_j - b_i)]}$$
</br>

<center>
$a$ = discrimination

$b$ = difficulty

$\theta$ = latent trait

</center>

:::

:::col

</br>
</br>


![](images/two_pl.png)

:::
:::
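As a minimal sketch (in Python with NumPy, which the deck itself does not use), the 2PL formula above can be evaluated directly. The function name `irf_2pl` is a hypothetical helper for illustration. It shows the roles of the two item parameters: at $\theta = b$ the probability is exactly 0.5 whatever the value of $a$, and a larger $a$ makes the curve rise more steeply around $b$.

```python
import numpy as np

def irf_2pl(theta, a, b):
    """2PL item response function P(Y = 1 | theta, a, b).

    Hypothetical helper written for illustration; it follows the slide's formula.
    """
    z = a * (np.asarray(theta, dtype=float) - b)
    return np.exp(z) / (1.0 + np.exp(z))

# At theta = b the probability is 0.5 regardless of a (b locates the curve).
print(irf_2pl(0.7, a=1.3, b=0.7))
# One unit above b, the more discriminating item (a = 2) gives a higher probability.
print(irf_2pl(1.7, a=2.0, b=0.7), irf_2pl(1.7, a=1.0, b=0.7))
```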


## Adding parameters

::: {.panel-tabset}

## 1PL


::: {layout-ncol="2"}

:::col


</br>
</br>
</br>
</br>
</br>

<center>

$$P(Y = 1| \theta, b) =  \frac{\exp[(\theta - b)]}{1 + \exp[(\theta - b)]}$$


$\theta$ = latent trait

$b$ = difficulty

</center>

:::



:::col

</br>
</br>
</br>


![](images/one_pl.png)

:::
:::

## 2PL


::: {layout-ncol="2"}

:::col


</br>
</br>
</br>
</br>
</br>

<center>

$$P(Y = 1| \theta, b, a) =  \frac{\exp[a(\theta - b)]}{1 + \exp[a(\theta - b)]}$$

$\theta$ = latent trait

$b$ = difficulty

$a$ =  discrimination

</center>

:::



:::col

</br>
</br>
</br>


![](images/two_pl.png)

:::
:::


## 3PL



::: {layout-ncol="2"}

:::col


</br>
</br>
</br>
</br>
</br>

<center>

$$P(Y = 1| \theta, b, a, c) =  c + (1-c) \frac{\exp[a(\theta - b)]}{1 + \exp[a(\theta - b)]}$$

$\theta$ = latent trait

$b$ = difficulty

$a$ =  discrimination

$c$ = guessing

</center>

:::



:::col

</br>
</br>
</br>


![](images/three_pl.png)

:::
:::




## 4PL

::: {layout-ncol="2"}

:::col


</br>
</br>
</br>
</br>


<center>

$$P(Y = 1| \theta, b, a, c, d) =  c + (d-c) \frac{\exp[a(\theta - b)]}{1 + \exp[a(\theta - b)]}$$

$\theta$ = latent trait

$b$ = difficulty

$a$ = discrimination

$c$ = guessing

$d$ = upper asymptote ($1 - d$ = slipping)

</center>

:::



:::col

</br>
</br>
</br>


![](images/four_pl.png)

:::
:::
:::
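The four models in the tabs above are nested. The sketch below (Python/NumPy; `irf_4pl` is a hypothetical name, not from any IRT package) writes a single 4PL function whose default arguments recover the simpler models: $d = 1$ gives the 3PL, additionally setting $c = 0$ gives the 2PL, and additionally setting $a = 1$ gives the 1PL.

```python
import numpy as np

def irf_4pl(theta, a=1.0, b=0.0, c=0.0, d=1.0):
    """4PL IRF: c + (d - c) * logistic(a * (theta - b)).

    The defaults nest the simpler models:
      d = 1                 -> 3PL
      d = 1, c = 0          -> 2PL
      d = 1, c = 0, a = 1   -> 1PL
    """
    z = a * (np.asarray(theta, dtype=float) - b)
    logistic = 1.0 / (1.0 + np.exp(-z))
    return c + (d - c) * logistic

theta = np.linspace(-4, 4, 9)
p_1pl = irf_4pl(theta, b=0.5)                        # a = 1, c = 0, d = 1
p_2pl = irf_4pl(theta, a=1.8, b=0.5)                 # c = 0, d = 1
p_3pl = irf_4pl(theta, a=1.8, b=0.5, c=0.2)          # lower asymptote at 0.2
p_4pl = irf_4pl(theta, a=1.8, b=0.5, c=0.2, d=0.9)   # upper asymptote at 0.9
```

Note that once $c$ or $d$ is freed, $P(Y = 1 \mid \theta = b)$ equals the midpoint $(c + d)/2$ rather than 0.5, so $b$ no longer marks the 50% point of the curve.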

## Estimation and Sample Size 

The more item parameters a model includes, the more flexible its item response functions (IRFs) become.


However, the additional parameters of the **3PL** and **4PL** tend to require large samples ($N \geq 1000$) to be estimated stably.

</br>

<center>  What about **1PL** and **2PL** models? </center>

</br>


::: {layout-ncol="2"}

:::col

<center> **Maximum Likelihood** </center>

</br>

<i class="fa fa-solid fa-thumbs-up" style="color: #9B3922;"></i> 1PL models seem to be stably estimable with sample sizes as low as $N = 100$ (Finch & French, 2019).


</br>

<i class="fa fa-solid fa-thumbs-down" style="color: #9B3922;"></i> 2PL models seem to require a sample size of $N = 200$ or more (Drasgow, 1989; Liu & Yang, 2018).

:::


:::col
<center> **Markov Chain Monte Carlo** </center>


</br>

<i class="fa fa-solid fa-thumbs-up" style="color: #9B3922;"></i> 1PL models showed good coverage when $N = 100$ and generally outperformed maximum likelihood (Finch & French, 2019).


</br>

<i class="fa fa-solid fa-thumbs-up" style="color: #9B3922;"></i> 2PL models seem to require sample sizes of $N = 100$ or more (Drasgow, 1989; Liu & Yang, 2018).

:::
:::




<style>
.vl {
  border-left: 3px solid #9B3922;
  height: 400px;
  position: absolute;
  left: 49%;
  margin-left: 10px;
  top: 43%;
}
</style>

<div class="vl"></div>



## Asymmetric IRT Models

::: {layout-ncol="2"}

:::col

All the IRT models presented so far generate **symmetric** IRFs.

</br>
</br>

![](images/symmetric_irf.png)


:::

:::col

Samejima (2000) was the first scholar to propose **asymmetric** IRT models, with her logistic positive exponent (LPE) model.


</br>

![](images/asymmetric_irf.png)


:::
:::


## Simple Asymmetric IRT Models

The LPE has been shown to have identification issues in simulation studies (Lee & Bolt, 2018), especially with small samples. Two recently proposed asymmetric IRT models (Shim et al., 2023) may help address this issue:

::: {layout-ncol="2"}

:::col
<center> 

**Complementary Log-Log (CLL)**

$$ P(Y = 1| \theta) = 1 - \exp[-\exp[a(\theta - b)]]$$


![](images/CLL_vs_3pl.png)

</center>

:::




:::col
<center> 

**Negative Log-Log (NLL)**

$$ P(Y = 1| \theta) =  \exp[-\exp[-a(\theta - b)]]$$
![](images/CLL_vs_3plU.png)


</center>
:::

:::
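A minimal Python sketch of the two asymmetric IRFs above, next to the 2PL for contrast; the helper names are hypothetical. For an IRF that is symmetric about $(b, 0.5)$, such as the 2PL, $P(b + \delta) + P(b - \delta) = 1$ for every $\delta$, and the CLL and NLL break this identity. At $\theta = b$ the CLL already equals $1 - e^{-1} \approx 0.632$ and the NLL equals $e^{-1} \approx 0.368$. The NLL is also the mirror image of the CLL: $P_{NLL}(\theta; a, b) = 1 - P_{CLL}(-\theta; a, -b)$.

```python
import numpy as np

def irf_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def irf_cll(theta, a, b):
    # Complementary log-log: 1 - exp(-exp(a(theta - b)))
    return 1.0 - np.exp(-np.exp(a * (theta - b)))

def irf_nll(theta, a, b):
    # Negative log-log: exp(-exp(-a(theta - b)))
    return np.exp(-np.exp(-a * (theta - b)))

def symmetry_gap(irf, a, b, delta):
    """P(b + delta) + P(b - delta) - 1; zero for IRFs symmetric about (b, 0.5)."""
    return irf(b + delta, a, b) + irf(b - delta, a, b) - 1.0

a, b = 1.2, 0.3
for name, irf in [("2PL", irf_2pl), ("CLL", irf_cll), ("NLL", irf_nll)]:
    print(name, "P(theta = b):", round(float(irf(b, a, b)), 3),
          "symmetry gap at delta = 1:", round(float(symmetry_gap(irf, a, b, 1.0)), 3))
```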


## What to do about Small Sample Sizes? 

Although the NLL and CLL may approximate more complex models, complex IRFs remain hard to approximate with a single model when samples are small ($N \leq 250$).

</br>

<center> Can we do better with **Bayesian model averaging (BMA)**? </center>

</br>

**Model averaging** takes into account model uncertainty by weighting model predictions according to their relative plausibility.

In the **BMA** approach used here, model weights are based on the *expected log pointwise predictive density* (ELPD), estimated for each candidate model $M_k$ by leave-one-out cross-validation ($LOO_{CV}$): $\widehat{ELPD}_k = \sum_{i=1}^{n} \log p(y_i \mid y_{-i}, M_k)$.

</br>

::: {layout-ncol="2"}

:::col

**Two types of weights:**

- BMA weights with Bayesian bootstrapping (BMA+)

- Stacking weights

:::

:::col



<center>
**Idea:** How much better can BMA of simple symmetric and asymmetric models perform than model selection (MS) in IRT?
</center>


:::
:::
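The slide names the two weighting schemes without showing how they are computed. The sketch below is a minimal NumPy version that mirrors the usual definitions (the same ones behind the `"pseudobma"` and `"stacking"` methods of the R `loo` package's `loo_model_weights()`); it is not the presenters' code, and all function names are hypothetical. Both take an $n \times K$ matrix of pointwise LOO log predictive densities $\log p(y_i \mid y_{-i}, M_k)$, which in practice would come from a LOO approximation such as PSIS-LOO. BMA+ averages softmax(ELPD) weights over Bayesian-bootstrap (Dirichlet) resamples of the observations; stacking chooses simplex weights that maximize $\sum_i \log \sum_k w_k\, p(y_i \mid y_{-i}, M_k)$. Here that maximization uses the standard EM update for mixture proportions, which is our own choice of solver.

```python
import numpy as np

def _softmax(z):
    """Numerically stable softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pseudo_bma_plus_weights(lpd, n_boot=1000, seed=0):
    """BMA+ (pseudo-BMA with Bayesian bootstrap). lpd: (n_obs, n_models)."""
    rng = np.random.default_rng(seed)
    n, _ = lpd.shape
    alpha = rng.dirichlet(np.ones(n), size=n_boot)   # (n_boot, n) bootstrap weights
    elpd_boot = n * alpha @ lpd                        # (n_boot, n_models) resampled ELPDs
    return _softmax(elpd_boot).mean(axis=0)            # average the per-replicate weights

def stacking_weights(lpd, n_iter=2000, tol=1e-10):
    """Stacking weights via EM for mixture proportions. lpd: (n_obs, n_models)."""
    n, k = lpd.shape
    # Rescaling each row leaves the EM responsibilities unchanged and avoids underflow.
    p = np.exp(lpd - lpd.max(axis=1, keepdims=True))
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        mix = p @ w                                    # (n,) mixture density per observation
        w_new = (p * w / mix[:, None]).mean(axis=0)    # EM update of the weights
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w

# Hypothetical pointwise LOO log densities for three candidate models
rng = np.random.default_rng(1)
lpd = rng.normal(loc=[-0.60, -0.55, -0.58], scale=0.2, size=(200, 3))
print(pseudo_bma_plus_weights(lpd))
print(stacking_weights(lpd))
```

The two schemes answer different questions: BMA+ rewards the model with the best overall ELPD, while stacking can give substantial weight to several models when each predicts a different subset of responses well.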

## Averaging Predicted Probability: The Scale of $\theta$ 

::: {layout-ncol="2"}


:::col
<center> **Ideal Scenario** </center>

Ideally, one would obtain the best possible estimate of $P(Y = 1|\theta)$ by averaging the candidate models' IRFs at the same points along the $\theta$ continuum.
![](images/avg_plot.png)


:::

:::col
<center> **Reality** </center>

However, the same person may receive a different $\theta$ estimate depending on which model is fit to the data.

</br>
</br>

![](images/theta_plot.png)


:::
:::


## Empirical and Theoretical Quantiles of $\theta$ 

Although the same person may be assigned a different $\theta$ depending on the model, the **relative rank** of participants should be scale invariant.


This means that, instead of averaging IRFs at the same $\theta$ values, it is more defensible to average them at the same $\theta$ **quantile**.


These quantiles will be estimated empirically for each of the models to be averaged. Thus, the averaged probability of a keyed response at each **empirical quantile** will be 

$$\overline{IRF}_{q} = \sum_{m=1}^{M} W_{m}\,P_{m}(\theta_{mq})$$
where $q$ is a specific quantile, $m$ indexes the $M$ candidate models, $W_m$ is the weight assigned to model $m$, and $\theta_{mq}$ is the $q$-th empirical quantile of $\theta$ under model $m$.

</br>

For comparison, IRFs will also be averaged at the **theoretical quantiles** of the standard normal distribution.
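A minimal Python sketch of the weighted average $\overline{IRF}_q$ for one item, under both options: each model evaluated at its own empirical $\theta$ quantiles, or all models at the standard normal quantiles. The function and variable names are hypothetical. The usage example is deliberately contrived: model B places the same persons, in the same order, on a stretched scale ($\theta_B = 2\theta_A$) with a halved discrimination, so both models make identical person-level predictions. Averaging at empirical quantiles recovers model A's curve exactly, whereas averaging at theoretical quantiles mixes two mismatched scales.

```python
import numpy as np
from statistics import NormalDist

def averaged_irf(irfs, weights, qs, theta_hats=None):
    """Weighted average of candidate-model IRFs evaluated at theta quantiles.

    irfs       : list of callables, irfs[m](theta) -> P_m(Y = 1 | theta) for one item
    weights    : model weights W_m (assumed to sum to 1)
    qs         : quantile levels in (0, 1)
    theta_hats : optional list of per-model theta estimates; if given, model m is
                 evaluated at its own empirical quantiles theta_{mq}, otherwise all
                 models are evaluated at the theoretical N(0, 1) quantiles.
    """
    qs = np.asarray(qs, dtype=float)
    std_normal = NormalDist()
    out = np.zeros_like(qs)
    for m, (irf, w) in enumerate(zip(irfs, weights)):
        if theta_hats is None:
            theta_q = np.array([std_normal.inv_cdf(q) for q in qs])
        else:
            theta_q = np.quantile(theta_hats[m], qs)
        out += w * irf(theta_q)
    return out

def irf_a(t):
    return 1.0 / (1.0 + np.exp(-1.5 * t))    # hypothetical item on model A's scale

def irf_b(t):
    return 1.0 / (1.0 + np.exp(-0.75 * t))   # same item on model B's stretched scale

rng = np.random.default_rng(0)
theta_a = rng.normal(size=500)   # hypothetical theta estimates under model A
theta_b = 2.0 * theta_a          # same ranking, stretched scale under model B
qs = np.linspace(0.05, 0.95, 19)
emp = averaged_irf([irf_a, irf_b], [0.5, 0.5], qs, theta_hats=[theta_a, theta_b])
theo = averaged_irf([irf_a, irf_b], [0.5, 0.5], qs)
```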

# {}

- Is it possible to leverage simple symmetric and asymmetric IRT models (1PL, 2PL, CLL, NLL) to recover complex IRFs in **small sample sizes**?

- Which method among **BMA**, model selection (**MS**), and kernel smoothing IRT (**KS**) will produce better IRF recovery?

- Will averaging at **empirical $\theta$ quantiles** achieve better IRF recovery than averaging at **theoretical $\theta$ quantiles**?

- How will **stacking weights** and **BMA+** weights behave? 

## Simulation: Data Generation

Most data-generating conditions were designed to be "realistic", meaning that the true data-generating model is not among the candidate models. There were **4 data-generating conditions**:

::: {layout-ncol="2"}

:::col


- **2PL:** $\frac{\exp[a(\theta - b)]}{1 + \exp[a(\theta - b)]}$

This serves as a "control condition", because the 2PL is one of the candidate models.

- **2MPL:** $\frac{1}{1 +\exp[-(a_{1}\theta_{1} + a_{2}\theta_{2} + d)]}$

This model simulates a violation of the unidimensionality assumption for $\theta$.
 
- **GLL~ua~** and **GLL~la~**

Relatively complex IRT models that allow for both symmetric and asymmetric IRFs (Zhang et al., 2023).



:::



:::col



::: r-stack


::: {.fragment .fade-in-then-out fragment-index="1"}
</br>

![](images/Stukel_fig.png)
:::


```{tex}

\begin{flushleft} 
\begin{tabular}{cccccccc}
\toprule
  \multicolumn{2}{c}{\textbf{2PL}} & 
    \multicolumn{2}{c}{\textbf{2MPL}} & 
    \multicolumn{2}{c}{\textbf{GLL\textsubscript{la}}} & 
    \multicolumn{2}{c}{\textbf{GLL\textsubscript{ua}}} \\
  \cmidrule(lr){1-2}\cmidrule(lr){3-4}\cmidrule(lr){5-6} \cmidrule(lr){7-8}
   Par & Dist & Par & Dist & Par & Dist & Par & Dist\\
  \cmidrule{1-8}
   \(\theta\) & \(\mathcal{N} (0, 1)\) &  \(\theta_{1,2}\) &  \(\mathcal{MVN} (0, 1), \rho = .3\) & \(\theta\) & \(\mathcal{N} (0, 1)\) & \(\theta\) & \(\mathcal{N} (0, 1)\)  \\
   \textit{a} & \(\mathcal{N} (1.5, 0.5)\) & \(a_1\) & \(\mathcal{N} (1.5, 0.5)\) & \textit{a} & \(\mathcal{N} (1.5, 0.5)\) & \textit{a} & \(\mathcal{N} (1.5, 0.5)\)  \\
   \textit{b} & \(\mathcal{N} (0, 1)\) & \(a_2\) & \(\mathcal{N} (0.5, 0.25)\) & \textit{b} & \(\mathcal{N} (0, 1)\) & \textit{b} & \(\mathcal{N} (0, 1)\) \\
    &  & \( d \) &  \(\mathcal{N} (0, 1)\) & \(\alpha_2\) & \(\mathcal{N} (0, 0.5)\)  & \(\alpha_1\) & \(\mathcal{N} (0, 0.5)\)  \\
    
   \bottomrule
\end{tabular}
\begin{singlespace}
  \footnotesize{\textit{Note.} The numbers in parentheses represent means and standard deviations respectively. 2PL = two-parameter logistic model; 2MPL = two-parameter multi-trait logistic model; GLL\textsubscript{la} = generalized logistic link with upper asymptote parameter fixed at 0; GLL\textsubscript{ua} = generalized logistic link with lower asymptote parameter fixed at 0.} 
\end{singlespace}
\end{flushleft}

```


:::



:::
:::
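The first two generating conditions can be sketched in a few lines of NumPy, drawing parameters from the distributions in the table (numbers in parentheses are means and standard deviations). This is a hypothetical illustration, not the study's simulation code: the function names, $N = 250$, and 20 items are assumptions, and because the table does not say whether $a \sim \mathcal{N}(1.5, 0.5)$ was truncated, occasional negative discriminations are left as drawn. The GLL conditions are omitted because their link function is not given on this slide.

```python
import numpy as np

def simulate_2pl(n_persons, n_items, rng):
    """Binary responses from the 2PL condition."""
    theta = rng.normal(0.0, 1.0, size=n_persons)
    a = rng.normal(1.5, 0.5, size=n_items)
    b = rng.normal(0.0, 1.0, size=n_items)
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))    # (n_persons, n_items)
    return (rng.uniform(size=p.shape) < p).astype(int)

def simulate_2mpl(n_persons, n_items, rng, rho=0.3):
    """Binary responses from the two-trait 2MPL condition (correlated traits)."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    theta = rng.multivariate_normal(np.zeros(2), cov, size=n_persons)   # (n, 2)
    a1 = rng.normal(1.5, 0.5, size=n_items)
    a2 = rng.normal(0.5, 0.25, size=n_items)
    d = rng.normal(0.0, 1.0, size=n_items)
    logit = theta[:, [0]] * a1 + theta[:, [1]] * a2 + d     # (n, n_items)
    p = 1.0 / (1.0 + np.exp(-logit))
    return (rng.uniform(size=p.shape) < p).astype(int)

rng = np.random.default_rng(2024)
y_2pl = simulate_2pl(250, 20, rng)     # hypothetical N = 250 persons, 20 items
y_2mpl = simulate_2mpl(250, 20, rng)
```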
