This is the future! No one goes to physical schools any more and MOOCs rule the world.
Harvard and MIT have released a great dataset of engagement statistics for their MOOCs.
One of the biggest issues with MOOCs is engagement. We will try to predict the probability of 'engagement' of a student given all the other columns. We will define engagement here as either: explored == 1 OR certified == 1.
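As a minimal sketch of that definition, the label can be computed with a boolean OR over the two columns. The toy DataFrame below is made up for illustration. Only the column names `explored` and `certified` come from the text, and the real dataset's values and dtypes are an assumption here.

```python
import pandas as pd

# Toy rows standing in for the real dataset (hypothetical values);
# only the two columns named in the definition are assumed to exist.
df = pd.DataFrame({
    "explored":  [0, 1, 0, 1],
    "certified": [0, 0, 1, 1],
})

# engagement is 1 when explored == 1 OR certified == 1, otherwise 0.
# A missing value compares as False, so it counts as "not engaged" here.
df["engagement"] = ((df["explored"] == 1) | (df["certified"] == 1)).astype(int)
```

Only the row where both flags are 0 ends up with `engagement == 0`.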
- Build a column labeled 'engagement' that represents this definition.
- Try to predict the engagement of a student. Remember that we do not want to use any columns that would not be available before a student has completed a course in a live scenario (e.g., grade, viewed, explored, certified).
- Which features are most important for engagement?
- What's the confusion matrix for the selected predictors?
- What are the best accuracy, recall, and precision?
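
For the last two questions, it may help to see how accuracy, precision, and recall fall out of the four cells of a binary confusion matrix. The sketch below uses plain Python and made-up labels. The function names `confusion_counts` and `metrics` are hypothetical helpers, not part of any library. In practice, scikit-learn's `confusion_matrix`, `accuracy_score`, `precision_score`, and `recall_score` compute the same quantities.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 means engaged."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # engaged, predicted engaged
        elif actual == 0 and predicted == 0:
            tn += 1  # not engaged, predicted not engaged
        elif actual == 0 and predicted == 1:
            fp += 1  # not engaged, but predicted engaged
        else:
            fn += 1  # engaged, but predicted not engaged
    return tn, fp, fn, tp


def metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only, not results from the dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
scores = metrics(y_true, y_pred)
```

Precision answers "of the students predicted as engaged, how many were?", while recall answers "of the engaged students, how many did the model find?". Which one to favor depends on whether false alarms or missed students are the costlier error.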