# Test Generator (Girth)

Read in the candidate and item data and generate a randomised test from them using the
[Girth](https://eribean.github.io/girth/) package.

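We assume that the 1PL model is used, in which the probability of a correct response depends on the candidate's ability $\theta$ and the item difficulty $b$: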

$$
\Pr(X=1) = \frac{\exp(\theta-b)}{1 + \exp(\theta-b)}
$$
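As a quick illustration of the formula above, the following is a minimal sketch of the 1PL response probability in NumPy. The function name `onepl_probability` is hypothetical and is not part of Girth; it only evaluates the equation directly.

```python
import numpy as np


def onepl_probability(theta, b):
    """Probability of a correct response under the 1PL model.

    Algebraically the same as exp(theta - b) / (1 + exp(theta - b)).
    """
    return 1.0 / (1.0 + np.exp(-(theta - b)))


# When ability equals difficulty the probability is exactly 0.5
print(onepl_probability(0.0, 0.0))
# A more able candidate has a higher chance of success on the same item
print(onepl_probability(1.0, 0.0))
```

The probability rises with $\theta - b$, so a candidate whose ability is well above the item difficulty is very likely to answer correctly.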

The benefit of using the Girth library is that, in addition to generating synthetic test data, it can also be used to estimate the IRT parameters.

## Data Ingest

There are two files in the `data` folder that we need: `items.csv` and `candidates.csv`. If you want to generate a randomised set of candidates, then run the `generateCandidates` notebook first. Note: this will overwrite the `candidates.csv` file.

In [None]:
import numpy as np
from numpy.random import seed
from typing import List, Tuple
from csv import reader
import pandas as pd
from girth.synthetic import create_synthetic_irt_dichotomous, create_synthetic_irt_polytomous
from girth import onepl_mml

TEST_DICHOTOMOUS = 1
TEST_SCALAR = 2

CANDIDATE_ID = 0
CANDIDATE_FNAME = 1
CANDIDATE_SNAME = 2
CANDIDATE_THETA = 3

ITEM_ID = 0
ITEM_A = 1
ITEM_B = 2
ITEM_K = 3


def getDataAsList(datafile: str) -> List[Tuple]:
    """Turn a CSV datafile into a list of tuples

    :param datafile: the CSV file to load data from
    :return: a list of rows (tuples)
    """
    with open(datafile, 'r', encoding='utf-8-sig') as fs:
        csv_reader = reader(fs)
        row_list = list(map(tuple, csv_reader))
        return row_list[1:]    # ignore the header row
    

# convert the raw data into a simple 4-tuple of ( systemname, givenName, familyName, theta )
def getCandidates() -> List[Tuple]:
    candidates = getDataAsList('data/candidates.csv')
    new_list = [(c[0], c[1], c[2], float(c[3])) for c in candidates]
    return new_list
    

# convert the raw data into a simple 4-tuple of ( uiid, a, b, k ); k is read from column 5 of the CSV
def getItems() -> List[Tuple]:
    items = getDataAsList('data/items.csv')
    new_list = [(i[0], float(i[1]), float(i[2]), int(i[5])) for i in items]
    return new_list

In [None]:
items = getItems()
# split the items into two sets: dichotomous and scalar items
dichotomous_items = [i for i in items if i[ITEM_K] == 1]
scalar_items = [i for i in items if i[ITEM_K] > 1]
max_level = max([i[ITEM_K] for i in scalar_items])

candidates = getCandidates()

## Test Generation
We use the `create_synthetic_irt_dichotomous()` and `create_synthetic_irt_polytomous()` functions from Girth to create the random test data. The `generateTest()` function reads data into the NumPy arrays that Girth requires, and then generates a randomised test. The result is later converted into a pandas data frame for display.
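To make the idea of a synthetic dichotomous test concrete, here is a minimal NumPy-only sketch of what such a generator does conceptually: compute each candidate's success probability per item and draw a 0/1 response from it. The function `simulate_dichotomous` is hypothetical and is not Girth's implementation, so its output will not match Girth's for the same seed.

```python
import numpy as np


def simulate_dichotomous(difficulty, discrimination, theta, seed=None):
    """Draw 0/1 responses with shape (n_items, n_candidates)."""
    rng = np.random.default_rng(seed)
    difficulty = np.asarray(difficulty, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Accept either one shared discrimination or one value per item
    discrimination = np.broadcast_to(
        np.asarray(discrimination, dtype=float), difficulty.shape
    )
    # Rows are items, columns are candidates
    logit = discrimination[:, None] * (theta[None, :] - difficulty[:, None])
    prob = 1.0 / (1.0 + np.exp(-logit))
    # A response is correct when a uniform draw falls below the probability
    return (rng.random(prob.shape) < prob).astype(int)


responses = simulate_dichotomous(
    difficulty=[-1.0, 0.0, 1.0],
    discrimination=1.0,
    theta=[-0.5, 0.5, 1.5, 2.0],
    seed=42,
)
print(responses.shape)  # (3, 4)
```

Passing the same integer seed reproduces the same response matrix, which mirrors the role of the `seed` argument in the notebook's own functions.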

In [None]:
def convertTupleListToArray(tl: List[Tuple], arity: int):
    l = [i[arity] for i in tl]
    return np.array(l)


def generateTest(itemList: List[Tuple], candidateList: List[Tuple], 
                 testType: int = TEST_DICHOTOMOUS, seed: int = None):
    discrimination = convertTupleListToArray(itemList, ITEM_A)
    theta = convertTupleListToArray(candidateList, CANDIDATE_THETA)
    if testType == TEST_SCALAR:
        # difficulty must be a [2D array (items x n_levels-1)] of difficulty parameters
        difficulty = np.random.randn(len(itemList), max_level-1)
        difficulty = np.sort(difficulty, 1)       
        t = create_synthetic_irt_polytomous(difficulty, discrimination, theta, seed=seed)
    else:
        difficulty = convertTupleListToArray(itemList, ITEM_B)
        t = create_synthetic_irt_dichotomous(difficulty, discrimination, theta, seed=seed)
    return t


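def generateScalarTest(itemList: List[Tuple], candidateList: List[Tuple], seed: int = None):
    synthetic_tests = []
    for n, i in enumerate(itemList):
        # generateTest() expects a list of items, so wrap the single item in a list;
        # passing the bare tuple raised "TypeError: 'float' object is not subscriptable"
        # offset the seed per item so that items do not share the same random draws
        item_seed = None if seed is None else seed + n
        t = generateTest([i], candidateList, TEST_SCALAR, item_seed)
        synthetic_tests.append(t)
    # stack the per-item results into a single (items x candidates) array
    return np.vstack(synthetic_tests)


def convertTestToDataframe(test, itemList: List[Tuple], candidateList: List[Tuple]):
    # use the function arguments rather than the global items/candidates lists
    header = [i[ITEM_ID] for i in itemList]
    rownames = [c[CANDIDATE_ID] for c in candidateList]
    df = pd.DataFrame(test.T, index=rownames, columns=header)
    return df

In [None]:
synthetic_test1 = generateTest(dichotomous_items, candidates, TEST_DICHOTOMOUS)
synthetic_test2 = generateScalarTest(scalar_items, candidates)
# combine both sets of responses, in the same order as the combined item list below
synthetic_test = np.vstack((synthetic_test1, synthetic_test2))

In [None]:
items = dichotomous_items + scalar_items
testDf = convertTestToDataframe(synthetic_test, items, candidates)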

In [None]:
(testDf)

### GenerateRandomTests function
We also include a `GenerateRandomTests()` function that is the same as the one in the `generateTest.ipynb` notebook. You can then call the `GenerateRandomTests()` function as many times as you want to re-generate a test. It will generate different results every time (unless you pass in an integer seed value).

Add items and candidates to the data files to generate larger tests.

When you are happy with the results you can write out to a results CSV file.

In [None]:
def GenerateRandomTests(seed: int = None):
    # pass the seed by keyword; positionally it would be taken as testType
    synthetic_test = generateTest(items, candidates, seed=seed)
    testDf = convertTestToDataframe(synthetic_test, items, candidates)
    return testDf

In [None]:
df = GenerateRandomTests(89)

(df)

In [None]:
df.to_csv('data/results.csv', index=True, index_label='systemname')
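If the results file needs to be loaded again later, for example to run estimation in a separate session, a round trip along the following lines should work. This is a sketch under assumptions: the candidate and item identifiers are made up, and a temporary directory stands in for the `data` folder.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical identifiers and responses, for illustration only
df = pd.DataFrame(
    np.array([[1, 0, 1], [0, 0, 1]]),
    index=["cand-001", "cand-002"],
    columns=["item-A", "item-B", "item-C"],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.csv")
    # Same call shape as the notebook: the index becomes the 'systemname' column
    df.to_csv(path, index=True, index_label="systemname")
    restored = pd.read_csv(path, index_col="systemname")

# Transpose back to the (items x candidates) layout used for the raw test array
responses = restored.to_numpy().T
print(responses.shape)  # (3, 2)
```

The transpose matters because the data frame stores one row per candidate, whereas the raw test arrays in this notebook store one row per item.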

### Solving using Standard Estimation
You can use either maximum marginal likelihood (MML) or joint maximum likelihood (JML) estimation methods with the Girth library. Here we use MML estimation for the 1PL model (`onepl_mml`) to separately estimate the item parameters (`a` is the discrimination parameter and `bs` is an array of item difficulties) using univariate optimization methods. 
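Because the test data are synthetic, the estimated difficulties can be compared with the difficulties used to generate them. The helper below is a hypothetical sketch, not part of Girth, that summarises the agreement with a root-mean-square error and a correlation; the numbers in the example are invented for illustration.

```python
import numpy as np


def recovery_stats(true_b, estimated_b):
    """Return (rmse, correlation) between true and estimated difficulties."""
    true_b = np.asarray(true_b, dtype=float)
    estimated_b = np.asarray(estimated_b, dtype=float)
    rmse = float(np.sqrt(np.mean((estimated_b - true_b) ** 2)))
    corr = float(np.corrcoef(true_b, estimated_b)[0, 1])
    return rmse, corr


# Made-up values; in the notebook the true values would come from the item data
# and the estimates from the 'Difficulty' entry of the estimation result
rmse, corr = recovery_stats([-1.0, 0.0, 1.0], [-0.9, 0.1, 1.2])
print(round(rmse, 3), round(corr, 3))
```

A small error and a correlation close to 1 suggest that the estimation has recovered the generating parameters reasonably well, although with few candidates the estimates can be noisy.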

In [None]:
# onepl_mml expects dichotomous (0/1) responses, so use the dichotomous test
estimates = onepl_mml(synthetic_test1)

a = estimates['Discrimination']
bs = estimates['Difficulty']

In [None]:
(a)

In [None]:
(bs)