cns-iu/edx-learnertrajectorynetpipeline

edX Learner and Course Analytics and Visualization Pipeline

The edX Learner and Course Analytics and Visualization Pipeline is an R script processing pipeline for working with course- and student-level data from an edX course database or edX Data Package. The pipeline was designed to:

Protocol

A generalized protocol is available at Protocols.io: edX Learner and Course Analytics and Visualization Pipeline, V.3, dx.doi.org/10.17504/protocols.io.zfhf3j6.

The protocol covers how to use Scripts 1-7 in the pipeline, and was written prior to the development of Script 0. Script 0, the newest script, is a redundant process that anonymizes data. It should be applied to data from an edX course after setting up the project directory and extracting data from tarballs, and before processing, analysis and visualization.

Data Processing and Analysis Pipeline

Each script in the data processing and analysis pipeline is briefly described below:

  1. edX-0-userDataAnonymizer.R script loads datasets with potentially user-identifying data (e.g. names, emails and personal addresses) and processes these datasets to ensure that user and student privacy is maintained for analysis.
  2. edX-1-courseStructureMeta.R script extracts the course structure from the edX Data Package files. The course structure is used in processing log files and creating the node lists in learner trajectory networks.
  3. edX-2-studentUserList.R script processes user profile datasets from the edX Data Package to identify active students in the course, and exclude instructors, teaching assistants and beta testers from the user log datasets. The script generates a list of learners’ edX user IDs.
  4. edX-3-eventLogExtractor.R script processes the edX course’s daily event tracking logs (which use streaming JSON format) for active students in the course. Logs are collected for each day of the course, combining all students’ actions in one file. The script loops through the known learner user identifiers generated by edX-2-studentUserList.R to extract a raw event log for each student in the course. The logs are saved as individual CSV files. The processing speed of this script depends on the number of students and their volume of recorded activity.
  5. edX-4-eventLogFormatter.R script processes the individual students’ event logs extracted by the edX-3-eventLogExtractor.R script. The script uses the course structure dataset generated by the edX-1-courseStructureMeta.R script as part of the log processing. The script allows a researcher to select the types of events that are retained in a student's final event log for analysis. It aligns all events in the log to the lowest level of the course structural hierarchy, and provides temporal ordering, event period calculations and outlier estimates. The script loops through the identified list of learners, and sorts students into further groups based on the size of their processed log files (for example, the script separates students with fewer than 10 events to remove them from the analysis). The script creates two new lists of student users based on analysis of the processed event logs: active and inactive students who were not excluded by the edX-2-studentUserList.R script.
  6. edX-5-learnerTrajectoryNet.R script creates a learner trajectory network for each student in the course based on the individual’s processed event logs and the user list generated by edX-4-eventLogFormatter.R. The script first creates an edge list to document transitions between modules in a course, and then creates a node list that describes a student’s interaction with each low-level module in the course. The script exports the node and edge lists for each student as: 1) two CSV files, and 2) a JSON-formatted learner trajectory network that combines the node and edge lists into a single file.
  7. edX-6-moduleUseAnalysis.R script uses the node lists generated by edX-5-learnerTrajectoryNet.R and the lists of student IDs generated by edX-4-eventLogFormatter.R. The script aggregates the node lists from individual students’ learner trajectory networks to provide an analysis centered on the course structure and on overall student interactions and activity. Analysis is completed for the lowest-level modules in the course hierarchy. The results are saved as a CSV dataset that can be joined to the course structure dataset produced by the edX-1-courseStructureMeta.R script.
  8. edX-7-studentFeatureExtraction.R script uses the course metadata generated by the edX-1-courseStructureMeta.R script, and the processed student logs and list of student IDs generated by the edX-4-eventLogFormatter.R script. The script loops through the list of processed student event logs to create a set of frequency statistics of student activity in an edX course (e.g. number of sessions, events, unique modules, event_types), and calculations of temporal use of content (overall, and by relevant module and event types).
  9. edx-8-cohortLearnerPathwaysNet.R script uses a list of student user IDs, the course structure data, and the students' processed edX event logs (CSV) to create a learner transition network for a group of students using the igraph library. The network represents students' transitions between edX block learning modules, based on each student's event log of click actions in an edX course.
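
The edge list and node list construction described for edX-5-learnerTrajectoryNet.R can be illustrated with a minimal base R sketch. This is not the script's actual code: the `events` data frame, its column names (`time`, `module_id`), and the choice to drop transitions within the same module are all assumptions made for illustration.

```r
# Hypothetical processed event log for one student. The column names are
# illustrative and may not match the pipeline's actual output.
events <- data.frame(
  time = as.POSIXct(c("2019-01-07 10:00:00", "2019-01-07 10:02:00",
                      "2019-01-07 10:05:00", "2019-01-07 10:09:00",
                      "2019-01-07 10:10:00"), tz = "UTC"),
  module_id = c("m1", "m1", "m2", "m1", "m2"),
  stringsAsFactors = FALSE
)

# Order events in time so that consecutive rows are consecutive actions.
events <- events[order(events$time), ]

# Pair each event's module with the module of the next event.
n <- nrow(events)
transitions <- data.frame(
  from = events$module_id[-n],
  to   = events$module_id[-1],
  stringsAsFactors = FALSE
)

# Assumption: repeated events inside the same module are not transitions.
transitions <- transitions[transitions$from != transitions$to, ]

# Edge list: how often each from -> to transition occurs.
edge_list <- aggregate(weight ~ from + to,
                       data = cbind(transitions, weight = 1L),
                       FUN = sum)

# Node list: number of events recorded in each module.
node_list <- as.data.frame(table(module_id = events$module_id),
                           responseName = "n_events",
                           stringsAsFactors = FALSE)

print(edge_list)
print(node_list)
```

Both data frames could then be written out with `write.csv()`, one file per student, in line with the two-CSV export described above.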

Sample Data

Sample datasets created by the processing and analysis scripts described above are provided in the Data directory. A short index of these files is available on the R Markdown documentation site, Sample Data Index.

Visualization Documentation

The visualizations generated from the data processing and analysis pipeline were created using R and are available for replication as R Markdown websites.

  1. Figure 2.B
  2. Figures 4 and 5
  3. Figure 6

A Note on Using the Learner Trajectory Network Pipeline

The processing scripts are provided under the Apache License 2.0. Contributors provide permission for commercial use, modification, distribution, patent use, and private use. Licensed works, modifications, and larger works may be distributed under different terms and without source code. The scripts are provided with limitations on liability and warranty; use these data processing scripts at your own discretion, and make preservation copies of any source data prior to use.

Additional modifications are likely needed to use this pipeline with other course datasets that follow the edX Data Package format specification. Organizational implementations of the edX learning management system may use customized event log tracking systems, courses may use different types of edX block modules, and logs may include types of events that were not encountered in this project (e.g. error events, or edX discussion forums). An exploratory analysis of the course structure and event logs is advisable at the outset of a project.
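
One way to begin such an exploratory analysis is to tabulate the event types in a course's logs and flag any that the downstream scripts may not expect. The following base R sketch assumes the log records have already been parsed into a data frame. The `log_events` data, its column names, and the `expected_types` vector are hypothetical and should be replaced with values from your own course.

```r
# Hypothetical extract of parsed tracking-log records. The field names
# are assumptions and should be checked against your own data.
log_events <- data.frame(
  username   = c("u1", "u1", "u2", "u2", "u3"),
  event_type = c("play_video", "problem_check", "play_video",
                 "edx.forum.thread.created", "pause_video"),
  stringsAsFactors = FALSE
)

# Event types the downstream processing is assumed to handle (illustrative).
expected_types <- c("play_video", "pause_video", "problem_check")

# Frequency of each event type, most common first.
type_counts <- sort(table(log_events$event_type), decreasing = TRUE)

# Event types present in the logs but missing from the expected set.
unexpected_types <- setdiff(names(type_counts), expected_types)

print(type_counts)
print(unexpected_types)
```

Any event type that appears in `unexpected_types` is a candidate for closer inspection before running the log formatting step.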

About

Exploratory data analysis pipeline using edX event logs
