# Test Generator

Read in the candidate and item data and generate a randomised set of test responses from them.

We assume the one-parameter logistic (1PL) model, where $\theta$ is the candidate's latent ability and $b$ is the item's difficulty:

$$
\Pr(X=1) = \frac{\exp(\theta-b)}{1 + \exp(\theta-b)}
$$
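
As a quick numerical illustration of the formula, the sketch below evaluates it for a few ability/difficulty pairs. The helper name `p_correct` is introduced here purely for illustration and is not part of the notebook's code.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """1PL probability of a correct response (hypothetical helper)."""
    # Algebraically the same as exp(theta - b) / (1 + exp(theta - b))
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# When ability equals difficulty the probability is exactly one half.
print(p_correct(0.0, 0.0))             # 0.5
# A candidate one logit above the item difficulty.
print(round(p_correct(1.0, 0.0), 3))   # 0.731
# A candidate one logit below the item difficulty.
print(round(p_correct(-1.0, 0.0), 3))  # 0.269
```

Only the difference $\theta - b$ matters, so shifting both values by the same amount leaves the probability unchanged.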

## Data Ingest

There are two files in the `data` folder that we need: `items.csv` and `candidates.csv`. From these we generate a randomised test.

In [1]:
import numpy as np
from typing import List, Tuple
from csv import reader
import pandas as pd


def getDataAsList(datafile: str) -> List[Tuple]:
    """Turn a CSV datafile into a list of tuples

    :param datafile: the CSV file to load data from
    :return: a list of rows (tuples)
    """
    with open(datafile, 'r', encoding='utf-8-sig') as fs:
        csv_reader = reader(fs)
        row_list = list(map(tuple, csv_reader))
        return row_list[1:]    # ignore the header row
    

# convert the raw data from the candidates.csv file into
# a simple quadruple of ( systemname, givenName, familyName, theta )
def getCandidates() -> List[Tuple]:
    candidates = getDataAsList('data/candidates.csv')
    new_list = [(c[0], c[1], c[2], float(c[3])) for c in candidates]
    return new_list
    

# convert the raw data from the items.csv file into
# a simple quadruple of ( uiid, a, b, maxValue ), where maxValue
# is read from the sixth column of the file
def getItems() -> List[Tuple]:
    items = getDataAsList('data/items.csv')
    new_list = [(i[0], float(i[1]), float(i[2]), int(i[5])) for i in items]
    return new_list

In [2]:
items = getItems()
candidates = getCandidates()
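
The loaders above read columns by position, so the layout of the two files matters more than their header names. The sketch below parses two small in-memory samples the same way. Every header name and value in it is made up for illustration; only the column positions (0 to 3 for candidates; 0, 1, 2 and 5 for items) are taken from the code above.

```python
import csv
import io

# Hypothetical file contents: header names and values are invented.
sample_candidates_csv = """systemname,givenName,familyName,theta
X0001,Ada,Example,0.75
X0002,Ben,Sample,-1.20
"""

# Columns 3 and 4 are not used by the loader, so they are placeholders here.
sample_items_csv = """uiid,a,b,unused1,unused2,maxValue
ITEM_01,1.0,-0.50,x,y,1
ITEM_02,1.0,1.25,x,y,5
"""


def rows(text: str):
    """Parse CSV text into tuples, skipping the header row."""
    return [tuple(r) for r in csv.reader(io.StringIO(text))][1:]


sample_candidates = [(c[0], c[1], c[2], float(c[3])) for c in rows(sample_candidates_csv)]
sample_items = [(i[0], float(i[1]), float(i[2]), int(i[5])) for i in rows(sample_items_csv)]

print(sample_candidates[0])  # ('X0001', 'Ada', 'Example', 0.75)
print(sample_items[1])       # ('ITEM_02', 1.0, 1.25, 5)
```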

## Item Response Generation

The `getItemResponse()` function is used to generate a randomised response: correct (1) or incorrect (0) for a given candidate taking an item.

In [3]:
def getItemResponse(b: float, theta: float, seed: int = None) -> str:
    """Gets a randomised item response for a given candidate
    according to the 1PL model:

    P(X=1) = e^(theta-b) / (1 + e^(theta-b))

    :param b: the difficulty parameter for the item
    :param theta: the latent ability of the candidate
    :param seed: optional seed for a reproducible response
    :return: '0' = incorrect, '1' = correct
    """
    # default_rng(None) draws fresh entropy, so no special case is needed
    rng = np.random.default_rng(seed)
    rv = rng.random()

    p1 = np.exp(theta - b) / (1 + np.exp(theta - b))
    p0 = 1 - p1

    assert p0 <= 1.0
    assert p1 <= 1.0

    rLookup = {
        '0': [0.00, p0],
        '1': [p0, 1.00]
    }
    r = {k: v for (k, v) in rLookup.items() if v[0] <= rv <= v[1]}
    rKey = list(r.keys())

    assert rKey[0] == '1' or rKey[0] == '0'

    return rKey[0]


def getScalarResponse(b: float, theta: float, maxValue: int = 1, seed: int = None) -> int:
    """Gets a randomised scalar item response for a given candidate.

    When maxValue is 1 the response follows the 1PL model:

    P(X=1) = e^(theta-b) / (1 + e^(theta-b))

    Otherwise the response is drawn uniformly from 0..maxValue and
    does not depend on b or theta.

    :param b: the difficulty parameter for the item
    :param theta: the latent ability of the candidate
    :param maxValue: the maximum possible value of the scalar response
    :param seed: optional seed for a reproducible response
    :return: an integer response between 0 and maxValue inclusive
    """
    if maxValue == 1:
        return int(getItemResponse(b, theta, seed))

    # use the seeded generator so that the seed is actually honoured
    rng = np.random.default_rng(seed)
    # integers() excludes the upper bound, so add 1 to include maxValue
    return int(rng.integers(maxValue + 1))
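
A dichotomous response can also be drawn by comparing one uniform random number with the success probability. The sketch below is a compact alternative rather than the notebook's implementation: `draw_response` is a hypothetical helper, the seed is arbitrary, and a single generator is shared across draws. It then checks that the simulated proportion of correct answers sits close to the model probability.

```python
import numpy as np


def draw_response(b: float, theta: float, rng: np.random.Generator) -> int:
    """Draw a 0/1 response under the 1PL model (hypothetical helper)."""
    p1 = 1.0 / (1.0 + np.exp(-(theta - b)))
    # rng.random() is uniform on [0, 1), so this is 1 with probability p1
    return int(rng.random() < p1)


rng = np.random.default_rng(12345)  # arbitrary seed, for repeatability only
draws = [draw_response(b=0.0, theta=1.0, rng=rng) for _ in range(20_000)]

observed = sum(draws) / len(draws)
expected = 1.0 / (1.0 + np.exp(-1.0))
print(f"observed={observed:.3f} expected={expected:.3f}")
```

Sharing one generator means successive draws are independent even when a seed is fixed, which is not the case when the same seed is used to build a new generator for every draw.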


We iterate through the data and generate item responses for each candidate. Each candidate takes a test comprising every item, with a simulated response generated for each one.

In [4]:
def GenerateRandomTests(seed: int = None):
    test_responses = [] # a list of lists

    # one generator for the whole run; it supplies a distinct seed for each
    # response so that a fixed seed does not repeat the same random draw
    rng = np.random.default_rng(seed)

    # generate a header row for the results
    header = ['systemname'] + [i[0] for i in items]

    # now create the simulated test responses
    for c in candidates:
        test = [c[0]]
        for i in items:
            response_seed = int(rng.integers(2**32))
            if i[3] > 1:
                # provide a polytomous response
                r = int(getScalarResponse(i[2], c[3], i[3], response_seed))
            else:
                # provide a dichotomous response
                r = int(getItemResponse(i[2], c[3], response_seed))
            test.append(r)
        test_responses.append(test)

    df = pd.DataFrame(test_responses, columns=header)
    return df
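
For the dichotomous items, the candidate-by-item loop can also be expressed as a single array operation. The following is a sketch under stated assumptions, not a replacement for the function above: `simulate_dichotomous` is a hypothetical name, the ability and difficulty values are invented, and polytomous items are not handled.

```python
import numpy as np


def simulate_dichotomous(thetas, bs, seed=None) -> np.ndarray:
    """Simulate 0/1 responses for every candidate/item pair (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    thetas = np.asarray(thetas, dtype=float)
    bs = np.asarray(bs, dtype=float)
    # Broadcasting gives one probability per pair: rows are candidates,
    # columns are items.
    p = 1.0 / (1.0 + np.exp(-(thetas[:, None] - bs[None, :])))
    # One independent uniform draw per cell, compared with its probability.
    return (rng.random(p.shape) < p).astype(int)


# Invented abilities (3 candidates) and difficulties (4 items).
responses = simulate_dichotomous([-1.0, 0.0, 1.5], [-0.5, 0.0, 0.5, 1.0], seed=42)
print(responses.shape)  # (3, 4)
print(responses)
```

Because all draws come from one generator, passing the same `seed` reproduces the whole response matrix.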

In [5]:
df = GenerateRandomTests()

(df)

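| | systemname | A1L_7616_01#6789 | A1L_20679_02#6790 | A1L_5480_03#6791 | A2L_5483_04#6792 | A2L_24442_05#6793 | A2L_7620_06#6794 | A2L_7627_07#6795 | B1L_20849_08#6796 | B1L_4287_09#6797 | ... | S3B | S4 | S5 | B2R_4464_WA_01 | B2R_4161_WA_02 | C1R_4135_WA_03 | C1R_4421_WA_04 | C1R_4136_WA_05 | W1 | W2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DT0001 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | ... | 5 | 2 | 10 | 0 | 0 | 0 | 0 | 0 | 18 | 3 |
| 1 | DT0002 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 15 | 17 |
| 2 | DT0003 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | ... | 5 | 2 | 12 | 0 | 0 | 1 | 0 | 1 | 10 | 9 |
| 3 | DT0004 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 6 | 3 | 15 | 0 | 0 | 0 | 0 | 0 | 5 | 26 |
| 4 | DT0005 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 8 | 0 | 0 | 0 | 1 | 1 | 1 | 26 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 395 | DT0396 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 3 | 11 | 8 | 1 | 1 | 0 | 0 | 0 | 3 | 35 |
| 396 | DT0397 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 6 | 4 | 10 | 0 | 0 | 0 | 0 | 0 | 12 | 36 |
| 397 | DT0398 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | ... | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 11 | 31 |
| 398 | DT0399 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | ... | 4 | 8 | 11 | 1 | 0 | 1 | 0 | 1 | 1 | 5 |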


## Next Steps

Call `GenerateRandomTests()` as many times as you like to regenerate the test. Each call produces different results unless you pass an integer `seed` value, in which case the same seed gives the same results.

Add items and candidates to the data files to generate larger tests.

When you are happy with the results, write them out to a CSV file like this:

In [6]:
df.to_csv('data/results.csv', index=False)
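
The `index=False` argument keeps the DataFrame's row index out of the file. The sketch below illustrates the difference with a tiny, invented DataFrame and in-memory buffers instead of real files. The names `small`, `with_index` and `without_index` are hypothetical.

```python
import io

import pandas as pd

# A tiny invented results table standing in for the real one.
small = pd.DataFrame(
    [["X0001", 1, 0], ["X0002", 0, 1]],
    columns=["systemname", "item_1", "item_2"],
)

# Written with the default index=True, the row index becomes an extra,
# header-less column, which read_csv names "Unnamed: 0".
buf = io.StringIO()
small.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)
print(list(with_index.columns))

# Written with index=False, the columns round-trip unchanged.
buf = io.StringIO()
small.to_csv(buf, index=False)
buf.seek(0)
without_index = pd.read_csv(buf)
print(list(without_index.columns))
```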