The purpose of this analysis is to observe trends in question difficulty as assessed by our LM-KT model. Specifically, we want to see which types of questions vary in difficulty across students, and which types are consistently easy or hard. Does this align with our knowledge of which language-learning concepts are difficult?

To study this, we first load the training data for our question generation model, which, as our paper describes, contains difficulty values predicted by our LM-KT models for the last question in each sequence.

In [22]:
import hashlib
from collections import defaultdict
import numpy as np

# Read the raw training lines as bytes; get_data_tuple converts them to strings.
with open("../../data/generation_train_data/french_train", "rb") as f:
    french_data = f.readlines()
with open("../../data/generation_train_data/spanish_train", "rb") as f:
    spanish_data = f.readlines()

We now build dictionaries mapping the different target questions to their estimated difficulty for different students. 

In [80]:
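def get_data_tuple(data_line):
    # str() of a bytes object yields its repr (b'...'); the fixed slices
    # [8:] and [:-9] below drop the prefix and suffix characters around the fields.
    data_line = str(data_line)
    # Difficulty is the number before the first <QU> marker.
    difficulty = float(data_line.split("<QU>")[0][8:])
    # The text between the first <QU> and the first <G> identifies the student;
    # hash it to get a compact identifier.
    student = "<QU>".join(data_line.split("<G>")[0].split("<QU>")[1:])
    student = hashlib.md5(student.encode()).hexdigest()
    # The target question is the text after the last <G> marker.
    target = data_line.split("<G>")[-1][:-9]
    return student, difficulty, target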

def build_dict(data):
    # Map each target question to its difficulties, one value per student.
    data_dict = defaultdict(list)
    seen_students = defaultdict(set)
    for line in data:
        student, difficulty, target = get_data_tuple(line)
        # Keep only the first difficulty seen for each (student, target) pair.
        if student in seen_students[target]:
            continue
        seen_students[target].add(student)
        data_dict[target].append(difficulty)
    return data_dict

french_dict = build_dict(french_data)
spanish_dict = build_dict(spanish_data)
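
To make the deduplication step concrete, here is a minimal sketch on hand-made records that are already parsed into `(student, difficulty, target)` tuples. The helper `group_by_target` and the toy records are assumptions introduced for illustration and are not taken from the training data. The sketch mirrors the grouping logic of `build_dict` without the line parsing.

```python
from collections import defaultdict

def group_by_target(records):
    """Group difficulties by target question, keeping one value per student."""
    difficulties = defaultdict(list)
    seen = defaultdict(set)
    for student, difficulty, target in records:
        if student in seen[target]:
            continue  # this student already contributed a value for this target
        seen[target].add(student)
        difficulties[target].append(difficulty)
    return dict(difficulties)

# Made-up records, for illustration only.
toy_records = [
    ("student_a", 0.9, " no!"),
    ("student_a", 0.8, " no!"),      # repeated (student, target) pair: skipped
    ("student_b", 0.7, " no!"),
    ("student_a", 0.2, " the man"),
]
toy_dict = group_by_target(toy_records)
print(toy_dict)
```

Only the first value per student survives for each target, so each student contributes equally to the per-question statistics.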

Finally, we can analyze the average difficulty of each target question across students, as well as the spread in difficulty (standard deviation across students). Together these indicate which questions our LM-KT model consistently deems "difficult" or "easy", and which depend heavily on the specific student's knowledge state.

In [81]:
phrases = [" no!", " the man", " we eat nineteen crepes.", " happy new year!", " agreed, thank you very much!", " tonight, we are eating outside."]
for phrase in phrases:
    print(phrase)
    print(np.mean(french_dict[phrase]), np.std(french_dict[phrase]))

 no!
0.9651052631578946 0.02255028238087193
 the man
0.8512666666666667 0.09027055629236665
 we eat nineteen crepes.
0.27428125000000003 0.12117864765889037
 happy new year!
0.633547619047619 0.14932645642875597
 agreed, thank you very much!
0.006076923076923079 0.0039117483492588055
 tonight, we are eating outside.
0.24366666666666667 0.011897712198383165


In [82]:
phrases = [" no, thanks.", " in the kitchen", " a lemon", " mom, come in, please.", " why don't you touch the turtle?", " we eat fish at lunchtime."]
for phrase in phrases:
    print(phrase)
    print(np.mean(spanish_dict[phrase]), np.std(spanish_dict[phrase]))

 no, thanks.
0.9565999999999998 0.010104784345381462
 in the kitchen
0.8918372093023257 0.06505397767098225
 a lemon
0.8304 0.12645489314376096
 mom, come in, please.
0.6554727272727272 0.15391129139811252
 why don't you touch the turtle?
0.10790909090909091 0.0936438655423782
 we eat fish at lunchtime.
0.06927272727272728 0.036514233610995414
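
The phrases above were picked by hand. As a rough sketch, the same statistics could be ranked over every target question. The names `summarize` and `min_students` and the toy dictionary below are hypothetical and introduced only for illustration; in the notebook one would pass `french_dict` or `spanish_dict` instead. `np.std` computes the population standard deviation (`ddof=0`) by default, which matches the cells above.

```python
import numpy as np

def summarize(data_dict, min_students=2):
    """Return (target, mean, std, n) rows for targets with enough students."""
    rows = []
    for target, values in data_dict.items():
        if len(values) < min_students:
            continue  # a spread over a single student is uninformative
        rows.append((target, float(np.mean(values)), float(np.std(values)), len(values)))
    return rows

# Toy stand-in for french_dict / spanish_dict (values are made up).
toy_dict = {
    " q1": [0.95, 0.97, 0.96],
    " q2": [0.20, 0.80, 0.50],
    " q3": [0.05, 0.06],
    " q4": [0.40],
}

rows = summarize(toy_dict)
by_mean = sorted(rows, key=lambda r: r[1])               # lowest mean first
by_std = sorted(rows, key=lambda r: r[2], reverse=True)  # most student-dependent first

print("lowest mean:", by_mean[0][0])
print("highest std:", by_std[0][0])
```

Sorting by the standard deviation surfaces the questions whose predicted value depends most on the student, while sorting by the mean surfaces those the model rates consistently at one extreme.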
