We examine a dataset from the Open University in the UK. The dataset has granular activity data for students in 6 classes over a period of 2 years (2013-2014).
- We find statistically significant correlations between learning outcomes (class grade) and both specific types of activity and activity patterns.
This is our third Flatiron School project (NYC Data Science), for module 4.
See the presentation and conclusions on Google Slides, or view the PDF in our repository.
There are two Jupyter notebooks, feature-engineering.ipynb and analysis.ipynb.
- feature-engineering assembles the dataset and engineers a number of features related to activity and attention metrics. To use it, download the original dataset (about 500MB) from the Open University (link above) and place the directory of CSV files at the top level of the repository as "anonymisedData".
- analysis performs the analysis for the presentation. It works with the 5MB dataset in the repository.
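
Before running feature-engineering, it can help to confirm that the downloaded CSVs are where the notebook expects them. The following is a minimal sketch, not part of the repository: the function name `find_dataset_csvs` is hypothetical, and it assumes only that the CSV files sit directly inside "anonymisedData" at the top level.

```python
from pathlib import Path


def find_dataset_csvs(repo_root=".", data_dir="anonymisedData"):
    """Return the sorted CSV paths found directly under repo_root/data_dir.

    Hypothetical helper: raises FileNotFoundError if the directory is
    missing or holds no CSV files.
    """
    data_path = Path(repo_root) / data_dir
    if not data_path.is_dir():
        raise FileNotFoundError(
            f"Expected the Open University CSVs in '{data_path}'."
        )
    csv_files = sorted(data_path.glob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in '{data_path}'.")
    return csv_files
```

Calling `find_dataset_csvs()` from the top level of the repository returns the list of CSV paths, or raises an error that names the directory it looked in.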
There is also a utilities.py file with helper functions used by the notebooks.
Software required: statsmodels, pandas, scipy, and seaborn, as well as their dependencies.
We ran this code with:
- Python 3.6
- statsmodels 0.10.1
- pandas 0.25.1
- scipy 1.3.1
- seaborn 0.9.0
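
To compare an environment against the versions listed above, something like the following sketch could be used. It is an illustration rather than part of the repository: the names `TESTED_VERSIONS`, `installed_versions`, and `version_mismatches` are hypothetical, and it assumes each package exposes a `__version__` attribute.

```python
import importlib

# Versions this README reports the code was run with.
TESTED_VERSIONS = {
    "statsmodels": "0.10.1",
    "pandas": "0.25.1",
    "scipy": "1.3.1",
    "seaborn": "0.9.0",
}


def installed_versions(packages):
    """Map each package name to its __version__, or None if unavailable."""
    found = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None  # package is not installed
        else:
            found[name] = getattr(module, "__version__", None)
    return found


def version_mismatches(installed, tested=TESTED_VERSIONS):
    """Return {package: (installed, tested)} for each package that differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in tested.items()
        if installed.get(name) != wanted
    }
```

Running `version_mismatches(installed_versions(TESTED_VERSIONS))` returns an empty dictionary when the environment matches the list above. A mismatch does not necessarily mean the notebooks will fail, only that the environment differs from the one we ran.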