Add files via upload

sharmaroshan committed Apr 10, 2019
1 parent a4a6a79 commit 9dff2fe77ed72b7259421d6f682495445a9478cf
@@ -0,0 +1,83 @@
Educational Process Mining (EPM): A Learning Analytics Data Set

Version 1.0
Mehrnoosh Vahdat(1,2), Luca Oneto(1), Davide Anguita(1), Mathias Funk (2), and Matthias Rauterberg (2)

1 - Smartlab - Non-Linear Complex Systems Laboratory
DITEN - Università degli Studi di Genova, Genoa (I-16145), Italy.
2 - Department of Industrial Design, Eindhoven University of Technology, 5612AZ Eindhoven, The Netherlands


The experiments were carried out with a group of 115 first-year undergraduate Engineering students at the University of Genoa.

We carried out this study using a simulation environment named Deeds (Digital Electronics Education and Design Suite), which is used for e-learning in digital electronics. The environment provides learning materials to the students through specialized browsers and asks them to solve problems of varying difficulty. For more information about the Deeds simulator used for this course, look at:

and to know more about the exercise contents of each session, see 'exercises_info.txt'.

Our data set contains the students' time series of activities during six laboratory sessions of the digital electronics course. There are 6 folders containing the students' data per session. Each 'Session' folder contains up to 99 CSV files, each dedicated to a specific student's log during that session. The number of files in each folder varies with the number of students present in each session. Each file contains 13 features. See 'features_info.txt' for more details.
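
As a rough illustration of the layout described above, the following Python sketch reads every student log in one session folder. It assumes (this README does not confirm it) that the files are comma-separated and have no header row, so the 13 column names are taken from 'features.txt'. The function name load_session and the folder path passed to it are hypothetical.

```python
import csv
from pathlib import Path

# The 13 feature names, in the order given in 'features.txt'.
FEATURES = [
    "session", "student_Id", "exercise", "activity", "start_time",
    "end_time", "idle_time", "mouse_wheel", "mouse_wheel_click",
    "mouse_click_left", "mouse_click_right", "mouse_movement", "keystroke",
]


def load_session(session_dir):
    """Read every student log found in one session folder.

    Assumptions: comma-separated files, no header row, one student per file.
    Returns a dict mapping the file name (without extension) to a list of
    row dicts keyed by feature name. All values are kept as strings.
    """
    logs = {}
    for path in sorted(Path(session_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open(newline="") as handle:
            reader = csv.reader(handle, skipinitialspace=True)
            rows = [dict(zip(FEATURES, row)) for row in reader if row]
        logs[path.stem] = rows
    return logs
```

Calling load_session on one of the six session folders returns one entry per student present in that session. Values are left as strings so that no type conversion is assumed.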

For the details of activities performed by the students during the course, see 'activities_info.txt'.

The data set includes the following files:

- 'README.txt'

- 'features_info.txt': contains information about the variables used in the feature vector.

- 'features.txt': list of all features.

- 'activities_info.txt': contains information about the variable 'activity'.

- 'activities.txt': list of all activities.

- 'exercises_info.txt': contains information about the variable 'exercise'.

- 'grades_info.txt': contains information about the grade data.


- 'Processes': contains the data files from Session 1 to 6.

- 'logs.txt': shows information about the log data per student Id. It shows whether a student has a log in each session (0: has no log, 1: has log).

- 'final_grades.xlsx': contains the results of the final exam in two sheets.

- 'intermediate_grades.xlsx': contains the grades for the students' assignments per session.

- 'final_exam.pdf': shows the content of the final exam (original in Italian).

- 'final_exam_ENG.pdf': shows the content of the final exam translated into English.


For more information about this data set please look at:
la '@'

Use of this data set in publications must be acknowledged by referencing the following publication [1]

[1] M. Vahdat, L. Oneto, D. Anguita, M. Funk, M. Rauterberg.: A learning analytics approach to correlate the academic achievements of students with interaction data from an educational simulator. In: G. Conole et al. (eds.): EC-TEL 2015, LNCS 9307, pp. 352-366. Springer (2015).
DOI: 10.1007/978-3-319-24258-3_26

This data set is distributed AS-IS and no responsibility implied or explicit can be addressed to the authors or their institutions for its use or misuse. Any commercial use is prohibited.

Other Related Publications:
[2] M. Vahdat, L. Oneto, A. Ghio, G. Donzellini, D. Anguita, M. Funk, M. Rauterberg.: A learning analytics methodology to profile students behavior and explore interactions with a digital electronics simulator. In: de Freitas, S., Rensing, C., Ley, T., Munoz-Merino, P.J. (eds.) EC-TEL 2014. LNCS, vol. 8719, pp. 596-597. Springer (2014).

[3] M. Vahdat, A. Ghio, L. Oneto, D. Anguita, M. Funk, M. Rauterberg, Advances in learning analytics and educational data mining, in: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2015).

Mehrnoosh Vahdat, Luca Oneto, Davide Anguita, Mathias Funk, and Matthias Rauterberg. September 2015.
@@ -0,0 +1,15 @@
1 Study_Es_# of session_# of exercise
2 Deeds_Es_# of session_# of exercise
3 Deeds_Es
4 Deeds
5 TextEditor_Es_# of session_# of exercise
6 TextEditor_Es
7 TextEditor
8 Diagram
9 Properties
10 Study_Materials
11 FSM_Es_# of session_# of exercise
12 FSM_Related
13 Aulaweb
14 Blank
15 Other
@@ -0,0 +1,117 @@
Activity Selection

The details about the activities and their meanings are as follows:

Keywords are matched and activity names are assigned in the priority order given in 'activities.txt'. We looked for all the keywords of interest that are relevant to the course and to the educational activities of the students; if none was found, we assigned "Other".
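
The priority rule above can be sketched in Python as a first-match lookup over an ordered rule list. The keywords below are placeholders invented for illustration; the keywords actually used are not published. The function name label_activity is also hypothetical.

```python
# Hypothetical keyword rules, listed in the priority order of
# 'activities.txt'. The real keywords are not published.
RULES = [
    ("Deeds", ["deeds"]),
    ("TextEditor", ["word", ".doc"]),
    ("Diagram", ["timing diagram"]),
    ("Aulaweb", ["aulaweb"]),
]


def label_activity(window_title, rules=RULES):
    """Return the first label whose keywords match the title.

    Falls back to 'Blank' when no title was recorded and to 'Other'
    when a title exists but matches none of the rules.
    """
    title = window_title.strip().lower()
    for label, keywords in rules:
        if any(keyword in title for keyword in keywords):
            return label
    if not title:
        return "Blank"
    return "Other"
```

Because the first matching rule wins, a title that contains keywords of two activities receives the label that appears earlier in the list.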

Abbreviation of activities:

Es: Exercise
#: Number
Deeds: Digital Electronics Education and Design Suite
Diagram: Simulation Timing Diagram
FSM: Finite State Machine Simulator


Description of activities:

Study_Es_# of session_# of exercise

It indicates that a student is studying / viewing the content of a specific exercise (e.g. Study_Es_6_1). To know more about the content of exercises, see 'exercises_info.txt'.


Deeds_Es_# of session_# of exercise

It indicates that the student is working on a specific exercise inside the Deeds simulator (Digital Circuit Simulator) (e.g. Deeds_Es_6_1). To get more familiar with the simulator and its components, see 'README.txt'.



Deeds_Es

This shows that the student is in the Deeds simulator, but it is not clear which exercise he is working on.

Suggestion: Since the 'exercise' feature (the third feature) spans from the moment the student studies the content of an exercise to the moment he changes to another exercise, it can be used to estimate the exercise number for these Deeds activities as well.
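
One way to apply this suggestion is sketched below in Python: a generic label is completed with the session and exercise numbers taken from the 'exercise' feature of the same row. The function name assign_exercise is hypothetical, and including the generic 'TextEditor_Es' label from 'activities.txt' by default is an assumption. Rows whose exercise is the bare 'Es' (the student has not started an exercise yet) are left unchanged.

```python
def assign_exercise(rows, generic=("Deeds_Es", "TextEditor_Es")):
    """Append the exercise number to generic activity labels.

    `rows` is a list of dicts with 'exercise' and 'activity' keys.
    Returns new dicts; the input rows are not modified.
    """
    resolved = []
    for row in rows:
        activity, exercise = row["activity"], row["exercise"]
        if activity in generic and exercise.startswith("Es_"):
            # 'Deeds_Es' + '_6_1' (from 'Es_6_1') -> 'Deeds_Es_6_1'
            activity = activity + exercise[len("Es"):]
        resolved.append({**row, "activity": activity})
    return resolved
```

The result is only an estimate, since the student may have been working on a different exercise than the one last studied.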



Deeds

It contains other activities related to Deeds, for instance when the students save a circuit image or export VHDL.


TextEditor_Es_# of session_# of exercise

It indicates that the student is writing up the results of his work to submit later to the instructor. The students use a text editor (Word, Office, etc.) to answer the questions and explain the solution they found through the Deeds simulator (e.g. TextEditor_Es_6_1). To know more about the structure of exercises in the text editor, see 'exercises_info.txt'.



TextEditor_Es

It indicates that the student is working on an exercise in the text editor, but it is not clear which exercise it is. This happens when the student changes the file names, so we cannot automatically recognize which exercise he is working on. Again, the suggestion given above for Deeds_Es holds.



TextEditor

It shows that the student is using the text editor, but not on exercises. This can include other activities related to the text editor, for instance when the student just opens it.



Diagram

The student is using the 'Simulation Timing Diagram' to test the timing simulation of the logic networks while using the Deeds simulator. It also covers these components: "Input Test Sequence" and "Timing Diagram View Manager ToolBar".



Properties

The Deeds simulator, the Simulation Timing Diagram, and the FSM each contain a properties window, which allows setting all the required parameters of the component under construction. For instance, the Properties can contain: "Switch Input", "Push-Button", "Clock properties", "Output properties", "textbox properties". We label all of them as 'Properties'.

Suggestion: to understand whether 'Properties' refers to the Deeds simulator or the Simulation Timing Diagram, you can look at the previous activity.
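
A minimal Python sketch of this look-back, assuming the activities of one student are available as a chronologically ordered list of labels. The function name properties_context is hypothetical.

```python
def properties_context(activities):
    """For each 'Properties' entry, report the activity that preceded it.

    Walks the sequence in order and remembers the most recent activity
    that is not 'Properties'. Returns a list aligned with the input:
    the remembered activity for 'Properties' entries, None otherwise.
    """
    context = []
    previous = None
    for activity in activities:
        if activity == "Properties":
            context.append(previous)
        else:
            previous = activity
            context.append(None)
    return context
```

Consecutive 'Properties' entries all inherit the same preceding activity. A 'Properties' entry at the very start of a log has no context and yields None.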



Study_Materials

The student is viewing some materials relevant to the course (provided by the instructor).


FSM_Es_# of session_# of exercise

When the student is working on a specific exercise on 'Finite State Machine Simulator' (e.g. FSM_Es_6_1).



FSM_Related

The student is handling the components of the Finite State Machine Simulator.



Aulaweb

The student is using Aulaweb, the learning management system (based on Moodle) used for the digital electronics course at the University of Genoa. In Aulaweb, the students might access the exercises, download them, upload their work, check the forum news, etc.



Blank

The title of a visited page is not recorded.



Other

When the student is not viewing any of the pages described above, we assigned 'Other' to the activity. In the majority of cases, this covers student activity that is irrelevant to the course (e.g. the student is on Facebook).

@@ -0,0 +1,46 @@

The content of exercises for the laboratory sessions can be retrieved from:

To match the content of an exercise with the values of the 'exercise' feature, use the code assigned to it on the website. Each code identifies a zip file containing the files to be used in a text editor as well as the format to be used in the Deeds simulator:

Es_1_1 = "001002"
Es_1_2 = "005030"
Es_1_3 = "005040"
Es_1_4 = "005050"

Es_2_1 = "015090"
Es_2_2 = "015095"
Es_2_3 = "015065"
Es_2_4 = "015100"
Es_2_5 = "015070"
Es_2_6 = "015080"

Es_3_1 = "020045"
Es_3_2 = "020120"
Es_3_3 = "020055"
Es_3_4 = "025130"

Es_4_1 = "030140"
Es_4_2 = "030144"
Es_4_3 = "030160"
Es_4_4 = "030164"
Es_4_5 = "035220"

Es_5_1 = "030180"
Es_5_2 = "035200"
Es_5_3 = "035210"
Es_5_4 = "035230"

Es_6_1 = "045270"
Es_6_2 = "045280"
Es_6_3 = "045290"
Es_6_4 = "050300"
Es_6_5 = "050310"
Es_6_6 = "050425"
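
If the mapping above is needed programmatically, it can be parsed into a dictionary. The Python sketch below is illustrative only: LISTING holds just two of the lines above as a sample, and the names PATTERN and parse_codes are hypothetical. Codes are kept as strings so that leading zeros are preserved.

```python
import re

# Two entries copied from the list above; extend with the remaining lines.
LISTING = '''
Es_1_1 = "001002"
Es_6_6 = "050425"
'''

# Matches lines of the form: Es_<session>_<exercise> = "<code>"
PATTERN = re.compile(r'^(Es_\d+_\d+)\s*=\s*"(\d+)"$')


def parse_codes(text):
    """Map each exercise Id (e.g. 'Es_1_1') to its website code."""
    codes = {}
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            codes[match.group(1)] = match.group(2)
    return codes


codes = parse_codes(LISTING)
```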

@@ -0,0 +1,13 @@
1 session
2 student_Id
3 exercise
4 activity
5 start_time
6 end_time
7 idle_time
8 mouse_wheel
9 mouse_wheel_click
10 mouse_click_left
11 mouse_click_right
12 mouse_movement
13 keystroke
@@ -0,0 +1,99 @@
Feature Selection

The features selected for this data set come from pre-processing of data collected through a logging program.

Due to ethical reasons and to ensure the anonymity of our users, we cannot share the original log files; instead, we share the data transformed and cleaned into an appropriate format.

The original logs record the client system's data approximately once per second, while the features are calculated so that they are allocated to a particular activity.

The features are selected and presented in a suitable format for Process Mining. In this sense, the data is presented per session, per student, and per exercise. Each CSV file belongs to a specific session and a specific student (named by the student Id). Each file contains several exercises of that session, given in the 'exercise' feature. Each 'exercise' contains activities, and the start time, end time, and other features are allocated to each activity.

Process Mining is a process management technique that allows valuable knowledge to be extracted from event logs. In Process Mining, events / activities are normally linked together in a process instance or case. The potential cases in our data set are: session, student_Id, and exercise.
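
To illustrate how the case notion shapes the data, the Python sketch below groups activity rows into traces, one per case. It assumes the rows are dicts keyed by the feature names in 'features.txt' and are already in chronological order. The function name build_traces is hypothetical, and combining all three candidate fields into one case Id is just one possible choice.

```python
from collections import defaultdict


def build_traces(rows, case_keys=("session", "student_Id", "exercise")):
    """Group activities into traces, one per case.

    The case Id is the tuple of the values of `case_keys`. Each trace is
    the ordered list of activity names observed for that case.
    """
    traces = defaultdict(list)
    for row in rows:
        case_id = tuple(row[key] for key in case_keys)
        traces[case_id].append(row["activity"])
    return dict(traces)
```

Passing case_keys=("student_Id",) instead yields one trace per student across all the rows supplied, which shows how the choice of case changes the process view.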


Here is the list of features with more details:


session

It shows the number of the laboratory session, from 1 to 6.



student_Id

It shows the Id of the student, from 1 to 115.



exercise

It shows the Id of the exercise the student is working on. Each session contains 4 to 6 exercises, shown as 'Es_# of the session_# of the exercise' (e.g. Es_1_2: exercise 2 of session 1).
'Es' with no number means the student has not started the exercise yet.



activity

The activities are labeled based on the titles of the web pages that are in focus / in the view of the student. To ensure anonymity, we did not publish the exact names of the pages visited by the students; instead, we renamed and augmented the pages into 'activity' names. To read about the details of activity labels, see 'activities_info.txt'.



start_time

It shows the start date and time of a specific activity with the format: hh:mm:ss



end_time

It shows the end date and time of a specific activity with the format: hh:mm:ss



idle_time

It shows the duration of idle time between the start and end time of an activity, in milliseconds.
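
The three timing features can be combined to estimate how long the student was actually active during an activity. The Python sketch below assumes the hh:mm:ss format stated above; if the files also carry a date, the format string has to be adjusted. The function name active_seconds is hypothetical, and activities that cross midnight are not handled.

```python
from datetime import datetime

# Format as documented above; adjust if the files also include a date.
TIME_FORMAT = "%H:%M:%S"


def active_seconds(start_time, end_time, idle_time_ms, time_format=TIME_FORMAT):
    """Return (duration, active) in seconds for one activity row.

    duration = end_time - start_time
    active   = duration minus the idle time, converted from milliseconds
    """
    start = datetime.strptime(start_time.strip(), time_format)
    end = datetime.strptime(end_time.strip(), time_format)
    duration = (end - start).total_seconds()
    active = duration - float(idle_time_ms) / 1000.0
    return duration, active
```

For example, an activity from 11:25:33 to 11:25:43 with an idle time of 2500 ms gives a duration of 10 seconds, of which 7.5 seconds are active.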



mouse_wheel

It shows the amount of mouse wheel scrolling during an activity.



mouse_wheel_click

It shows the number of mouse wheel clicks during an activity.



mouse_click_left

It shows the number of left mouse clicks during an activity.



mouse_click_right

It shows the number of right mouse clicks during an activity.



mouse_movement

It shows the distance covered by the mouse movements during an activity.



keystroke

It shows the number of keystrokes during an activity.


BIN +962 KB final_exam.pdf
Binary file not shown.
BIN +925 KB final_exam_ ENG.pdf
Binary file not shown.
BIN +20 KB final_grades.xlsx
Binary file not shown.
