# Test Generator (Girth)

Read in the candidate and item data and generate a randomised test from them using the
[Girth](https://eribean.github.io/girth/) package.

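We assume the 1PL model, in which the probability of a correct response depends only on the candidate's ability $\theta$ and the item's difficulty $b$:

$$
\Pr(X=1) = \frac{\exp(\theta-b)}{1 + \exp(\theta-b)}
$$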
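
As a rough illustration of the 1PL formula, the sketch below evaluates the response probability for a single candidate and item. The function name `p_correct` is made up for this example and is not part of Girth.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Probability of a correct response under the 1PL model.

    Algebraically the same as exp(theta - b) / (1 + exp(theta - b)).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A candidate whose ability equals the item difficulty has an even chance.
print(p_correct(0.0, 0.0))   # 0.5
# One logit above the item difficulty gives roughly a 73% chance.
print(p_correct(1.0, 0.0))
```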

The benefit of using the Girth library is that, in addition to generating synthetic test data, it can also be used to estimate the IRT parameters.

## Data Ingest

There are two files in the `data` folder that we need: `items.csv` and `candidates.csv`.
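
The loading code in this notebook appears to expect one header row in each file, with an id in the first column followed by numeric columns: `theta` for candidates, and `a` then `b` for items. The sketch below parses two small in-memory samples laid out that way. The header names and all values are invented purely to illustrate the assumed layout; they are not the contents of the real files.

```python
import io
from csv import reader

# Made-up sample rows, only to illustrate the assumed column layout.
ITEMS_CSV = "uiid,a,b\nI001,1.0,-1.0\nI002,1.0,0.5\n"
CANDIDATES_CSV = "ucid,theta\nC001,0.3\nC002,-1.2\n"


def parse_rows(text: str):
    """Parse CSV text into a list of tuples, skipping the header row."""
    rows = list(map(tuple, reader(io.StringIO(text))))
    return rows[1:]


# Items become (id, discrimination, difficulty) triples.
items_demo = [(r[0], float(r[1]), float(r[2])) for r in parse_rows(ITEMS_CSV)]
# Candidates become (id, ability) pairs.
candidates_demo = [(r[0], float(r[1])) for r in parse_rows(CANDIDATES_CSV)]

print(items_demo)
print(candidates_demo)
```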

In [23]:
import numpy as np
from numpy.random import seed
from typing import List, Tuple
from csv import reader
import pandas as pd
from girth.synthetic import create_synthetic_irt_dichotomous
from girth import onepl_mml


def getDataAsList(datafile: str) -> List[Tuple]:
    """Turn a CSV datafile into a list of tuples

    :param datafile: the CSV file to load data from
    :return: a list of rows (tuples)
    """
    with open(datafile, 'r', encoding='utf-8-sig') as fs:
        csv_reader = reader(fs)
        row_list = list(map(tuple, csv_reader))
        return row_list[1:]    # ignore the header row
    

# convert the raw data into a simple duple of ( ucid, theta )
def getCandidates() -> List[Tuple]:
    candidates = getDataAsList('data/candidates.csv')
    new_list = [(c[0], float(c[1])) for c in candidates]
    return new_list
    

# convert the raw data into a simple triple of ( uiid, a, b )
def getItems() -> List[Tuple]:
    items = getDataAsList('data/items.csv')
    new_list = [(i[0], float(i[1]), float(i[2])) for i in items]
    return new_list

In [24]:
items = getItems()
candidates = getCandidates()

## Test Generation
We use the `create_synthetic_irt_dichotomous()` function from Girth to create the random test data. The `generateTest()` function reads the item and candidate data into the numpy arrays that Girth requires and generates a randomised test; `convertTestToDataframe()` then converts the result into a pandas data frame for display.
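
To give an idea of what generating dichotomous test data involves, here is a minimal numpy-only sketch: compute the logistic probability for every item and candidate pair, then draw a 0/1 response from it. The function `simulate_dichotomous` is a hypothetical stand-in written for this note; Girth's own implementation may differ in its details.

```python
import numpy as np


def simulate_dichotomous(difficulty, discrimination, theta, seed=None):
    """Draw 0/1 responses from a logistic IRT model.

    Returns an array with one row per item and one column per candidate.
    """
    rng = np.random.default_rng(seed)
    difficulty = np.asarray(difficulty, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Accept either a single discrimination or one value per item.
    discrimination = np.broadcast_to(
        np.asarray(discrimination, dtype=float), difficulty.shape
    )
    # Logit of a correct response for every (item, candidate) pair.
    logits = discrimination[:, None] * (theta[None, :] - difficulty[:, None])
    prob = 1.0 / (1.0 + np.exp(-logits))
    # A response is correct when a uniform draw falls below its probability.
    return (rng.random(prob.shape) < prob).astype(int)


responses = simulate_dichotomous([-1.0, 0.0, 1.0], 1.0, [-0.5, 0.5, 1.5, 2.0], seed=0)
print(responses.shape)   # (3, 4): items by candidates
```

Passing a fixed `seed` makes the draw repeatable, which can be convenient when comparing estimation runs.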

In [25]:
def convertTupleListToArray(tl: List[Tuple], index: int):
    """Collect element `index` of every tuple into a numpy array."""
    return np.array([t[index] for t in tl])


def generateTest(itemList: List[Tuple], candidateList: List[Tuple]):
    # use the function arguments rather than the notebook-level globals
    discrimination = convertTupleListToArray(itemList, 1)
    difficulty = convertTupleListToArray(itemList, 2)
    theta = convertTupleListToArray(candidateList, 1)
    # the result has one row per item and one column per candidate
    return create_synthetic_irt_dichotomous(difficulty, discrimination, theta)


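def convertTestToDataframe(test, itemList: List[Tuple], candidateList: List[Tuple]):
    # use the function arguments rather than the notebook-level globals
    header = [i[0] for i in itemList]          # item ids become the column names
    rownames = [c[0] for c in candidateList]   # candidate ids become the index
    # transpose so that rows are candidates and columns are items
    return pd.DataFrame(test.T, index=rownames, columns=header)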

In [47]:
synthetic_test = generateTest(items, candidates)
testDf = convertTestToDataframe(synthetic_test, items, candidates)

In [48]:
(testDf)

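|      | I001 | I002 | I003 | I004 | I005 | I006 |
|------|------|------|------|------|------|------|
| C001 | 1    | 1    | 1    | 0    | 0    | 1    |
| C002 | 0    | 0    | 0    | 0    | 0    | 0    |
| C003 | 1    | 1    | 1    | 1    | 1    | 1    |
| C004 | 1    | 1    | 1    | 1    | 0    | 0    |
| C005 | 1    | 0    | 0    | 0    | 0    | 0    |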


In [49]:
# Solve for parameters
estimates = onepl_mml(synthetic_test)

# Unpack estimates
a = estimates['Discrimination']
bs = estimates['Difficulty']

In [50]:
(a)

5.888360477462128

In [51]:
(bs)

array([-0.87995839, -0.26497006, -0.26497006,  0.26495167,  0.87993295,
        0.26495167])
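
One way to judge the estimates is to compare them with the difficulties that were used to generate the test. The sketch below computes the root-mean-square error and the correlation between two difficulty vectors. The helper `compare_estimates` is a hypothetical name introduced here, not a Girth function. With only five candidates and six items, as in the output above, the estimates are likely to be noisy, so the figures should be read with care.

```python
import numpy as np


def compare_estimates(true_b, estimated_b):
    """Return (rmse, correlation) between true and estimated difficulties."""
    true_b = np.asarray(true_b, dtype=float)
    estimated_b = np.asarray(estimated_b, dtype=float)
    rmse = float(np.sqrt(np.mean((estimated_b - true_b) ** 2)))
    corr = float(np.corrcoef(true_b, estimated_b)[0, 1])
    return rmse, corr


# In this notebook the call might look like:
#   compare_estimates(convertTupleListToArray(items, 2), bs)
```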