# Example 1: Student modelling & prediction

Here we attempt to identify students at risk based on their activity in Moodle. We consider a student to be "at risk" if their final grade in the course is lower than 3 (0 is a fail, 5 is excellent).
## Data
We provide a file with the students' grades and a file with the number of times each student has accessed components of each type.
For example, if you print the data you will see that the student with user id (uid) 10020 has accessed a URL component 22 times.
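
To make the layout of the two files concrete, here is a minimal sketch with small made-up data frames. Only uid 10020 and its 22 URL accesses come from the description above; every other uid and value is invented, and the column names are assumed to resemble those of the real files. It shows how merging on `uid` gives one row per student.

```r
# Toy stand-in for components_stats.csv: one row per student,
# one column per component type (values are invented).
toy_components <- data.frame(
  uid  = c(10020, 10021, 10022),
  URL  = c(22, 5, 0),
  Page = c(14, 3, 9)
)

# Toy stand-in for grades.csv (grades are invented).
toy_grades <- data.frame(
  uid         = c(10020, 10021, 10023),
  Final.Grade = c(4, 1, 3)
)

# By default merge() keeps only uids present in both data frames,
# so 10022 (no grade) and 10023 (no activity) are dropped.
toy_merged <- merge(toy_components, toy_grades, by = "uid")
print(toy_merged)
```

If students missing from one file should be kept (with `NA` in the missing columns), `merge(..., all = TRUE)` would do that instead.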

First, we load the libraries that we will need.

In [None]:
library("corrplot")
library("DAAG")

In [None]:
# read the datasets containing information about component accesses and grades
component_stats = read.csv("components_stats.csv")
grades = read.csv("grades.csv")

# merge these two datasets on the student id (uid).
# In this way, we create a unified dataset that associates each student's online activity with performance (grades)
comp_grade = merge(component_stats, grades, by = "uid")


# let's have a look at the first 10 rows of our dataset
comp_grade[1:10,]

## Students at risk

In [None]:
# now let's create a variable "success" that we will use to flag students at risk
# For this example, students with success = 0 are considered to be at risk
comp_grade <- transform(comp_grade, success = ifelse(Final.Grade >= 3, 1, 0))

# let's see how it looks!
comp_grade[1:10,]

## Step 1: Identify predictors

Next, we run a correlation analysis to identify which metrics relate to students' performance, and we plot the results.

In [None]:
compcor = cor(comp_grade)
corrplot(compcor, method = "circle")

Let's look at the correlation between some specific metrics and the final grade, to get a rough idea of what we're talking about!

In [None]:
cor.test(comp_grade$Final.Grade,comp_grade$Page)
cor.test(comp_grade$Final.Grade,comp_grade$System)
cor.test(comp_grade$Final.Grade,comp_grade$HotPot.module)
cor.test(comp_grade$Final.Grade,comp_grade$Book)
cor.test(comp_grade$Final.Grade,comp_grade$Mindmap)
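
When there are many metrics, it can be handy to collect the correlation coefficient and p-value of each test in one table instead of reading five separate printouts. The following is a minimal sketch on invented toy data (the data frame `toy` and its values are assumptions standing in for `comp_grade`); it relies only on the `estimate` and `p.value` fields that `cor.test()` returns.

```r
# Toy data standing in for comp_grade (all values invented).
toy <- data.frame(
  Final.Grade = c(0, 1, 2, 3, 3, 4, 5, 5),
  Page        = c(2, 4, 5, 9, 8, 12, 15, 14),
  Book        = c(7, 1, 6, 2, 8, 3, 5, 4)
)

predictors <- c("Page", "Book")

# Run cor.test() once per predictor and keep the estimate and p-value.
cor_table <- do.call(rbind, lapply(predictors, function(p) {
  res <- cor.test(toy$Final.Grade, toy[[p]])
  data.frame(metric  = p,
             r       = unname(res$estimate),
             p.value = res$p.value)
}))

# Strongest correlations (by absolute value) first.
print(cor_table[order(-abs(cor_table$r)), ])
```

With the real data, `predictors` would list the component columns of interest, such as `Page`, `System`, `HotPot.module`, `Book` and `Mindmap`.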

####################################################

## Step 2: Build a student model

Here we fit a logistic regression model to predict students at risk.
We use the metrics we explored above (component accesses) as predictors and attempt to predict the value of the variable "success".

Our model is a binary classifier: 1 means the student is predicted to complete the course successfully and 0 means the student is predicted to be at risk.
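
As a rough illustration of how a logistic regression turns activity counts into a 0/1 prediction, the sketch below uses hypothetical coefficients (the intercept and slope are invented, not fitted from our data). The model computes a linear score (the log-odds), maps it to a probability with the logistic function, and then applies a threshold.

```r
# Hypothetical coefficients, chosen only for illustration.
intercept <- -2.0
beta_page <- 0.25

page_views <- c(0, 4, 8, 16)

# Linear predictor (log-odds); the logistic function maps it into (0, 1).
log_odds     <- intercept + beta_page * page_views
prob_success <- plogis(log_odds)   # same as 1 / (1 + exp(-log_odds))

# Classify with a 0.5 probability threshold: 1 = success, 0 = at risk.
predicted <- ifelse(prob_success > 0.5, 1, 0)
print(data.frame(page_views, prob_success = round(prob_success, 3), predicted))
```

The 0.5 threshold is a convention rather than a requirement; lowering or raising it trades missed at-risk students against false alarms.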

####################################################

In [None]:
# we train the student model on our dataset using logistic regression, where success is the variable
# we want to predict and the metrics (Page, System, etc.) are the inputs
predictSuccess <- glm(success ~ Page + System + HotPot.module + Book + Mindmap, data = comp_grade, family = binomial())

#here we provide an overview of the model we just trained
summary(predictSuccess)

#print out the cross validation results for the training phase
CVbinary(predictSuccess)
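
To give an intuition for what a cross-validated accuracy estimate means, here is a minimal hand-written k-fold sketch in base R. It is not DAAG's implementation of `CVbinary`, only an illustration of the general idea; the data are simulated and the single predictor `Page` and the choice of 5 folds are assumptions made for the example.

```r
set.seed(42)  # make the simulated data and fold assignment reproducible

# Simulated stand-in for comp_grade (invented values).
n <- 60
toy <- data.frame(Page = rpois(n, 10))
toy$success <- rbinom(n, 1, plogis(-2 + 0.2 * toy$Page))

k <- 5
fold <- sample(rep(1:k, length.out = n))  # random fold label for each row

# For each fold: fit on the other folds, then predict the held-out rows.
predicted <- numeric(n)
for (i in 1:k) {
  held_out <- fold == i
  fit  <- glm(success ~ Page, data = toy[!held_out, ], family = binomial())
  prob <- predict(fit, newdata = toy[held_out, ], type = "response")
  predicted[held_out] <- ifelse(prob > 0.5, 1, 0)
}

cv_accuracy <- mean(predicted == toy$success)
print(paste("Cross-validated accuracy:", round(cv_accuracy, 3)))
```

Because every row is predicted by a model that never saw it during fitting, this estimate is usually less optimistic than the accuracy measured on the training data itself.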

## Step 3: Prediction

In [None]:
# we split our dataset into a training set and a test set, in order to try out the prediction
# (R indexes rows from 1, so the training set is rows 1 to 27)
trainset = comp_grade[1:27,]
testset = comp_grade[28:31,]

#we retrain the model using the trainset
predictSuccess <- glm(success ~ Page + System + HotPot.module + Book + Mindmap, data = trainset, family = binomial())

# we predict the probability of success for the test set, turn it into a 0/1 label
# with a 0.5 threshold, and compute the accuracy of our prediction
fitted.results <- predict(predictSuccess, newdata = testset, type = 'response')
fitted.results <- ifelse(fitted.results > 0.5, 1, 0)
misClassificationError <- mean(fitted.results != testset$success)
print(paste('Accuracy', 1 - misClassificationError))
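
A single accuracy figure hides which kind of mistake the model makes, and for at-risk detection a missed at-risk student is usually the costlier error. The sketch below, on simulated data with an assumed random 80/20 split (rather than the fixed row ranges used above), builds a confusion matrix with base R's `table()`.

```r
set.seed(7)  # reproducible simulation and split

# Simulated stand-in for comp_grade (invented values).
n <- 80
toy <- data.frame(Page = rpois(n, 10))
toy$success <- rbinom(n, 1, plogis(-2 + 0.2 * toy$Page))

# Random split: 80% of the rows for training, the rest for testing.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

fit  <- glm(success ~ Page, data = toy_train, family = binomial())
prob <- predict(fit, newdata = toy_test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# Fix the factor levels so the table is always 2 x 2,
# even if one class is never predicted.
confusion <- table(
  actual    = factor(toy_test$success, levels = c(0, 1)),
  predicted = factor(pred, levels = c(0, 1))
)
print(confusion)

# Accuracy is the share of cases on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(paste("Accuracy:", round(accuracy, 3)))
```

The cell in row `actual = 0`, column `predicted = 1` counts at-risk students the model would have missed.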


## Conclusion


