
This dataset captures student interactions with **VARA**, a digital learning platform where educational materials are organized as structured **learning trajectories**. These trajectories consist of:

1. **Sequential Episodes**  
   - Arranged in dependent order  
   - Each targets a specific **concept** or skill  

2. **Flexible Activities**  
   - Student-directed exploration within episodes  
   - Multiple attempts allowed  
   - Built-in **hints** and supplementary materials  

Students' interactions with the learning trajectories are recorded and processed to extract behavioral features such as the number of attempts, time spent, and task completions. In addition, students' metacognitive indicators are collected through self-reports.



#### Dataset samples
The dataset contains interaction records of **108 students** across **14 episodes**. The snapshots below show samples of the dataset.

In [40]:
import pandas as pd

# load data
df = pd.read_csv('../Eduflex.xlsx - Pankajile.csv')

# columns containing knowledge-related data
knowledge_cols = ['pre_knowl','post_knowl','knowledge gain']

# columns containing self-reported data
self_report_cols = ['T5_learning difficulty',
                     'T2_Post_effectiveness',
                     'T4_Post_needforhelp',
                     'T7_shared_control_student',
                     'T8_shared_teacher',
                     'T11__cogn_load',
                     'T13_own_effort',
                     'T14_TAM_easy']

# columns containing episode data: every remaining column, including Student_ID
excluded_cols = set(knowledge_cols + self_report_cols + ['Gender'])
episodes_cols = [col for col in df.columns if col not in excluded_cols]

# knowledge data
df_knowledge = df[knowledge_cols]

# self-reported data
df_self_report = df[self_report_cols]

# episodes data
df_episodes = df[episodes_cols]

df_knowledge.head()


Unnamed: 0,pre_knowl,post_knowl,knowledge gain
0,0.49,0.6,0.11
1,0.86,0.97,0.11
2,,,
3,0.93,0.97,0.04
4,0.88,0.87,-0.01


In [42]:
df.shape

(108, 133)

In [38]:
df_self_report.head()

Unnamed: 0,T5_learning difficulty,T2_Post_effectiveness,T4_Post_needforhelp,T7_shared_control_student,T8_shared_teacher,T11__cogn_load,T13_own_effort,T14_TAM_easy
0,,4.67,4.5,5.0,3.0,1.0,5.0,3.8
1,3.4,1.67,3.0,5.0,5.0,5.0,5.0,4.2
2,3.6,3.0,3.0,3.25,3.0,3.0,3.0,3.6
3,2.8,3.0,2.0,3.25,4.75,2.63,3.75,3.8
4,2.6,2.67,3.0,2.75,3.75,2.88,2.75,4.6


In [36]:
df_episodes.head()

Unnamed: 0,Student_ID,E1,E1_total_tasks,E1_tasks_completed,E1_task_complexity,E1_total_hints,E1_hints_used,E1_total_materials,E1_addMat_used,E1total_activity_count,...,E13_total_activity_count,E14,E14_total_tasks,E14_tasks_completed,E14_task_complexity,E14_total hints,E14_total_hints_used,E14_total materials,E14__addMat_used,E14_total_activity
0,keila1,0.22,8,87.5,1.43,3,0.0,2,0,42,...,46,284,8,62.5,1.8,0,0,4,0,82
1,keila2,0.79,8,87.5,1.43,3,66.67,2,0,75,...,46,284,8,62.5,1.8,0,0,4,0,79
2,keila3,75.0,8,37.5,1.33,3,0.0,2,0,28,...,62,284,8,12.5,1.0,0,0,4,0,39
3,keila4,75.0,8,50.0,1.5,3,66.67,2,0,73,...,55,284,8,62.5,1.8,0,0,4,0,138
4,keila5,75.0,8,50.0,1.5,3,33.33,2,0,41,...,94,284,8,62.5,1.8,0,0,4,0,66
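The episode column names in the preview above are not fully consistent (for example `E1total_activity_count`, `E14_total hints`, and `E14__addMat_used`). The following is a minimal sketch of one way to tidy them before analysis. The helper `normalize_column` is hypothetical and not part of the original notebook, and the names passed to it are copied from the preview.

```python
import re


def normalize_column(name: str) -> str:
    """Hypothetical helper: make separators in a column name consistent."""
    name = name.strip()
    # Replace runs of whitespace with a single underscore.
    name = re.sub(r"\s+", "_", name)
    # Insert the missing underscore after an episode prefix, e.g. E1total -> E1_total.
    name = re.sub(r"^(E\d+)(?=[A-Za-z])", r"\1_", name)
    # Collapse repeated underscores, e.g. E14__addMat -> E14_addMat.
    name = re.sub(r"_+", "_", name)
    return name


# Column names copied from the df_episodes preview.
raw_names = [
    "E1total_activity_count",
    "E14_total hints",
    "E14_total materials",
    "E14__addMat_used",
    "E13_total_activity_count",
]
clean_names = [normalize_column(n) for n in raw_names]
print(clean_names)
```

This only fixes separators. Semantic differences between episodes, such as `E1_hints_used` versus `E14_total_hints_used`, would still need a manual mapping. Applying it to the whole frame with `df.rename(columns=normalize_column)` would also rename `knowledge gain` and some self-report columns, so the column lists defined earlier would have to be updated to match.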




::: {.callout-warning style="border-color: #d9534f; background-color: #fdf7f7;"}
## Missing values
Some columns in the dataset **contain missing values** (for example, row 2 of the knowledge data has no scores), which must be handled before analysis.
:::
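As a starting point for handling these gaps, the sketch below counts missing values per column and drops rows with no scores at all. It uses a small frame copied from the `df_knowledge` preview rather than the full file. Dropping fully empty rows is only one possible strategy and is an assumption here, not necessarily the approach used in the analysis.

```python
import numpy as np
import pandas as pd

# Values copied from the df_knowledge preview; row 2 has no scores.
sample = pd.DataFrame({
    "pre_knowl": [0.49, 0.86, np.nan, 0.93, 0.88],
    "post_knowl": [0.60, 0.97, np.nan, 0.97, 0.87],
    "knowledge gain": [0.11, 0.11, np.nan, 0.04, -0.01],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Assumed strategy: drop rows in which every knowledge score is missing.
cleaned = sample.dropna(how="all")
print(cleaned.shape)
```

For the self-reported columns, where a student may be missing a single item rather than a whole row, `dropna(subset=[...])` or an imputation step may be more appropriate, depending on the analysis.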