<h1> DKT Model for ASSISTments </h1>
<p> This notebook builds an RNN (DKT) model whose output vector is used as a feature for predicting student properties. </p>

<i>
DKT Model Code for this project is based on: https://github.com/davidoj/deepknowledgetracingTF
</i>

In [3]:
import pandas as pd
import numpy as np
import os
import sys
import csv
import collections
import tensorflow as tf
from sklearn.metrics import roc_auc_score

<h1> Data Processing Block </h1>
<p> Student logs, covering both the training and validation sets, are loaded into the notebook as pandas dataframes. After loading, the useful columns are extracted and stored in the following:</p>
<h3>Variables</h3>
<ul>
    <li> 
        <b>training_label_df, validation_test_label_df, student_df</b>: Include all columns from the original files 
    </li>
    <li>
        <b>student_df_dkt</b>: Columns <i>"ITEST_id", "skill", "actionId" and "correct"</i> are extracted. All records with the same ITEST_id are grouped together and sorted in ascending order of actionId.
    </li>
    <li> 
        <b>skill_list</b>: All skills that appear in the student records. It <b>MUST</b> be used for one-hot encoding later to maintain encoding consistency. <b><i>[!IMPORTANT]</i></b>
    </li>
    <li>
        <b>id_entrynum</b>: A dictionary mapping ITEST_id (key) to the number of entries (value), sorted by ITEST_id
    </li>
</ul>
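
<p> To illustrate why a single <b>skill_list</b> has to be shared, here is a minimal sketch of one-hot encoding against a fixed skill order. The helper name <i>one_hot_skills</i> and the toy skill names are assumptions made for illustration; they are not taken from the project code. </p>

```python
import numpy as np

def one_hot_skills(skills, skill_list):
    """Encode each skill name as a one-hot row, using the fixed order of skill_list."""
    # Hypothetical helper: map every skill to its position in the shared skill_list
    skill_to_index = {skill: i for i, skill in enumerate(skill_list)}
    encoded = np.zeros((len(skills), len(skill_list)), dtype=np.float32)
    for row, skill in enumerate(skills):
        encoded[row, skill_to_index[skill]] = 1.0
    return encoded

# Toy data: the same skill_list is reused for both sets,
# so "reading-graph" lands in the same column each time
skill_list = ["area", "square-root", "reading-graph"]
train_encoded = one_hot_skills(["area", "reading-graph"], skill_list)
valid_encoded = one_hot_skills(["reading-graph"], skill_list)
```

<p> If the validation set were encoded with its own skill order, the same skill could map to a different column, and the model input would no longer be comparable across sets. </p>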

<h3>Functions</h3>

<b>student_dkt_data(data, skill_list=skill_list, id_entrynum=id_entrynum)</b>
<ul>
    <li> data: student_df_dkt as a pandas dataframe sorted by ITEST_id and actionId </li>
    <li> skill_list: The list of all skills, stored in a fixed order </li>
    <li> id_entrynum: a dictionary sorted by ITEST_id whose values are the number of entries per ITEST_id </li>
    <li> Output[i]: [number of questions answered by student i, (skills of the questions, correctness of the answers)] </li>
</ul>
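
<p> The function body is not shown in this section, so the following is only a sketch of what the description above could look like, assuming each output entry is <i>[number of questions, list of (skill index, correct) pairs]</i>. The toy dataframe and the <i>toy_</i> names are made up for illustration, and the arguments are passed explicitly rather than through the default values in the signature. </p>

```python
import collections
import pandas as pd

def student_dkt_data(data, skill_list, id_entrynum):
    """Return one [n_questions, [(skill_index, correct), ...]] entry per student."""
    # Assumption: skills are represented by their position in skill_list
    skill_to_index = {skill: i for i, skill in enumerate(skill_list)}
    output = []
    for itest_id, n_entries in id_entrynum.items():
        # data is expected to be sorted by ITEST_id and actionId already
        rows = data[data["ITEST_id"] == itest_id]
        pairs = [(skill_to_index[s], int(c)) for s, c in zip(rows["skill"], rows["correct"])]
        output.append([n_entries, pairs])
    return output

# Toy input with two students
toy = pd.DataFrame({
    "ITEST_id": [8, 8, 35],
    "skill": ["area", "square-root", "area"],
    "actionId": [9967, 9979, 100],
    "correct": [0, 1, 1],
})
toy_skill_list = toy["skill"].unique()
toy_id_entrynum = collections.OrderedDict(
    sorted(toy.groupby("ITEST_id")["actionId"].nunique().to_dict().items())
)
result = student_dkt_data(toy, toy_skill_list, toy_id_entrynum)
```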

<i>Last update: 31/10/2017 </i>

In [4]:
# Data preparation
# Merge student logs into pandas data frame
data_dir = './data/'

student_log_paths = [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.startswith('student_log')]
training_label_path = os.path.join(data_dir, 'training_label.csv')
validation_test_label = os.path.join(data_dir, 'validation_test_label.csv')

dfs = []
for path in student_log_paths:
    temp = pd.read_csv(path)
    dfs.append(temp)
student_df = pd.concat(dfs)

training_label_df = pd.read_csv(training_label_path)
validation_test_label_df = pd.read_csv(validation_test_label)
student_df = student_df[student_df.scaffold == 0]

student_df_dkt_static = student_df[['ITEST_id', 'AveKnow', 'AveCarelessness', 'AveCorrect', 'AveResBored', 'AveResEngcon', 'AveResConf' , 'AveResFrust', 'AveResOfftask', 'AveResGaming']]
student_df_dkt_dynamic = student_df[['ITEST_id', 'skill', 'actionId', 'correct', 'hint', 'hintCount', 'hintTotal']]
student_df_dkt_static = student_df_dkt_static.drop_duplicates()
# Please do not modify this one. This will be the skill_list used throughout the whole training
skill_list = student_df_dkt_dynamic.skill.unique()

# Sort by ITEST_id, then actionId (reassign rather than sorting a column slice in place)
student_df_dkt_dynamic = student_df_dkt_dynamic.sort_values(["ITEST_id", "actionId"], ascending=True)

# Create a dictionary mapping ITEST_id (key) to number of entries (value), sorted by ITEST_id
id_entrynum = student_df_dkt_dynamic.groupby('ITEST_id')['actionId'].nunique().to_dict()
id_entrynum = collections.OrderedDict(sorted(id_entrynum.items()))

ValueError: No objects to concatenate
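
<p> <i>pd.concat</i> raises this error when it receives an empty list, which happens here when no file in <i>data_dir</i> starts with <i>student_log</i>. A minimal sketch of a loader that fails with a clearer message is shown below; the helper name <i>load_student_logs</i> and the temporary directory are assumptions for illustration only. </p>

```python
import os
import tempfile
import pandas as pd

def load_student_logs(data_dir, prefix="student_log"):
    """Concatenate every CSV in data_dir whose file name starts with prefix."""
    paths = sorted(
        os.path.join(data_dir, f)
        for f in os.listdir(data_dir)
        if f.startswith(prefix)
    )
    if not paths:
        # Fail early instead of passing an empty list to pd.concat
        raise FileNotFoundError(f"No files starting with {prefix!r} found in {data_dir!r}")
    return pd.concat([pd.read_csv(p) for p in paths])

# Demonstration on a throwaway directory (not the project's ./data/ folder)
with tempfile.TemporaryDirectory() as tmp_dir:
    try:
        load_student_logs(tmp_dir)
        empty_dir_message = None
    except FileNotFoundError as err:
        empty_dir_message = str(err)

    toy_log = pd.DataFrame({"ITEST_id": [8], "correct": [0]})
    toy_log.to_csv(os.path.join(tmp_dir, "student_log_1.csv"), index=False)
    logs = load_student_logs(tmp_dir)
```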

In [31]:
print(student_df_dkt_static)
skill_list_index= dict(zip(range(len(skill_list)),skill_list))
print(skill_list_index)

       ITEST_id   AveKnow  AveCarelessness  AveCorrect  AveResBored  \
0             8  0.352416         0.183276    0.483902     0.208389   
1056         35  0.255164         0.158848    0.379658     0.222796   
2049         39  0.281693         0.152227    0.454545     0.274700   
2467         64  0.157938         0.098357    0.334038     0.198394   
3886         77  0.191948         0.094195    0.413249     0.261455   
4203        126  0.250838         0.111159    0.500000     0.273188   
4609        134  0.183801         0.113211    0.323420     0.267901   
4878        156  0.271432         0.183643    0.384766     0.255130   
5390        160  0.222439         0.146000    0.387009     0.234373   
6129        164  0.117598         0.081440    0.256983     0.252204   
6308        205  0.229988         0.126401    0.445407     0.242691   
6885        215  0.313221         0.184542    0.475655     0.257214   
7419        243  0.388355         0.218368    0.566243     0.251760   
7970  

In [5]:
import pickle

# Pickling: save the processed data
with open("assisstment_dynamic.pkl", "wb") as fp:
    pickle.dump(student_df_dkt_dynamic, fp)
with open("assisstment_static.pkl", "wb") as fp:
    pickle.dump(student_df_dkt_static, fp)
with open("assisstment_skill.pkl", "wb") as fp:
    # Assumes skill_list_index, the index-to-skill mapping built in an earlier cell
    pickle.dump(skill_list_index, fp)

# Unpickling: read the files back to check them
with open("assisstment_dynamic.pkl", "rb") as fp:
    b = pickle.load(fp)
with open("assisstment_static.pkl", "rb") as fp:
    c = pickle.load(fp)
with open("assisstment_skill.pkl", "rb") as fp:
    d = pickle.load(fp)

print(b)

NameError: name 'student_df_dkt_dynamic' is not defined

In [8]:
with open("data/assisstment_dynamic.pkl", "rb") as fp:   # Unpickling
    b = pickle.load(fp)
with open("data/assisstment_static.pkl", "rb") as fp:   # Unpickling
    c = pickle.load(fp)
with open("data/assisstment_skill.pkl", "rb") as fp:   # Unpickling
    d = pickle.load(fp)
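
<p> As a quick check that this save and load pattern preserves a dataframe, the round trip can be sketched on toy data. The file name <i>toy_dynamic.pkl</i> and the dataframe contents are made up for illustration; note the binary modes <i>"wb"</i> for writing and <i>"rb"</i> for reading. </p>

```python
import os
import pickle
import tempfile
import pandas as pd

# Toy stand-in for the dynamic student dataframe
toy_dynamic = pd.DataFrame({
    "ITEST_id": [8, 8],
    "skill": ["area", "square-root"],
    "actionId": [9967, 9979],
    "correct": [0, 0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_dynamic.pkl")
    with open(path, "wb") as fp:   # Pickling: binary write mode
        pickle.dump(toy_dynamic, fp)
    with open(path, "rb") as fp:   # Unpickling: binary read mode
        restored = pickle.load(fp)

# True when values, index, and dtypes all match
round_trip_ok = restored.equals(toy_dynamic)
```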

In [9]:
print(b)

       ITEST_id                                      skill  actionId  correct  \
0             8            properties-of-geometric-figures      9950        0   
2             8   sum-of-interior-angles-more-than-3-sides      9952        0   
3             8   sum-of-interior-angles-more-than-3-sides      9953        0   
4             8   sum-of-interior-angles-more-than-3-sides      9954        1   
5             8   sum-of-interior-angles-more-than-3-sides      9955        0   
6             8   sum-of-interior-angles-more-than-3-sides      9956        1   
7             8                             point-plotting      9957        0   
16            8                              reading-graph      9966        1   
17            8                                       area      9967        0   
22            8                                       area      9972        0   
29            8                                square-root      9979        0   
39            8             