# Predicting Grades
Let's predict students' grades from their habits outside of class, using the dataset provided in Chapter 5 of "*Artificial Intelligence: A Non-Technical Introduction*" by Tad Gonsalves.

## Import relevant modules

In [14]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

## Load Dataset

In [6]:
data = pd.read_csv('grades.csv')
data

,ID,Attendance,Study,Hobbies,Drinks,Grades
0,A20170,7,5,5,5,D
1,A20171,15,10,4,2,A
2,A20172,12,9,5,0,B
3,A20173,13,8,6,0,B
4,A20174,5,4,10,1,F
5,A20175,9,6,6,4,C
6,A20176,14,7,5,1,A


## Preparing Dataset
Split the dataset into the input features (X) and the output labels (y).

In [7]:
X = data.drop(columns=['Grades','ID'])
X

,Attendance,Study,Hobbies,Drinks
0,7,5,5,5
1,15,10,4,2
2,12,9,5,0
3,13,8,6,0
4,5,4,10,1
5,9,6,6,4
6,14,7,5,1


In [8]:
y = data['Grades']
y

0    D
1    A
2    B
3    B
4    F
5    C
6    A
Name: Grades, dtype: object

## Learning and Prediction
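Let's use a **decision tree** to predict the grades of two hypothetical students. Each input row lists the features in the same order as the columns of X: Attendance, Study, Hobbies, Drinks.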

In [12]:
model = DecisionTreeClassifier()
model.fit(X,y)
predictions = model.predict([[8,3,6,3], [1,8,6,1]])
predictions

array(['F', 'B'], dtype=object)
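
Recent scikit-learn versions warn when a model that was fit on a DataFrame is asked to predict from a plain list, because the feature names are missing. The sketch below shows one way to avoid that by wrapping the new students in a DataFrame with the same columns. It re-types the seven rows from the table above so that it runs on its own, leaves out the `ID` column, and assumes a fixed `random_state` (not in the original cell) so repeated runs build the same tree.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The seven rows shown earlier, re-typed so this sketch is self-contained.
data = pd.DataFrame({
    'Attendance': [7, 15, 12, 13, 5, 9, 14],
    'Study':      [5, 10, 9, 8, 4, 6, 7],
    'Hobbies':    [5, 4, 5, 6, 10, 6, 5],
    'Drinks':     [5, 2, 0, 0, 1, 4, 1],
    'Grades':     ['D', 'A', 'B', 'B', 'F', 'C', 'A'],
})
X = data.drop(columns=['Grades'])
y = data['Grades']

# Assumption: random_state=0 is added here only for repeatability.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Give the new samples the same column names the model was trained with.
new_students = pd.DataFrame([[8, 3, 6, 3], [1, 8, 6, 1]], columns=X.columns)
predictions = model.predict(new_students)
print(predictions)
```

With so few training rows, the predicted grades may differ from the output shown above depending on how the tree breaks ties.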

## Calculating the Model's Accuracy
We can't measure the accuracy of the previous model because we used the entire dataset for training. This time, let's split the dataset into *training* and *test* sets so we can validate the model's performance on data it hasn't seen. As a rule of thumb, let's set aside 20% of the data for testing and use the rest for training.

In [31]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model.fit(X_train, y_train)
predictions = model.predict(X_test)

score = accuracy_score(y_test, predictions)
score

0.5

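**The best accuracy observed with the decision tree over multiple runs was 50%.**
Why does the accuracy change from run to run? `train_test_split` shuffles the rows randomly, so each run trains and tests on a different set of students. With only seven rows, a 20% test split holds just two students, so the score can only be 0%, 50%, or 100%, and the five remaining rows are too few for the tree to learn a reliable pattern.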
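
To see this run-to-run variation concretely, the following sketch repeats the split with several fixed seeds and collects the accuracy of each run. The seed values and the loop are illustrative assumptions, not part of the original notebook, and the seven rows are re-typed from the table above so the block runs on its own.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# The seven rows shown earlier, re-typed so this sketch is self-contained.
data = pd.DataFrame({
    'Attendance': [7, 15, 12, 13, 5, 9, 14],
    'Study':      [5, 10, 9, 8, 4, 6, 7],
    'Hobbies':    [5, 4, 5, 6, 10, 6, 5],
    'Drinks':     [5, 2, 0, 0, 1, 4, 1],
    'Grades':     ['D', 'A', 'B', 'B', 'F', 'C', 'A'],
})
X = data.drop(columns=['Grades'])
y = data['Grades']

scores = []
for seed in range(10):  # Assumption: ten arbitrary seeds, chosen for illustration.
    # Fixing random_state makes each individual split reproducible.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, model.predict(X_test)))

print(scores)                     # one accuracy value per split
print(sum(scores) / len(scores))  # average accuracy across the splits
```

Averaging over several splits gives a steadier estimate than any single run, although seven rows remain too few to draw firm conclusions.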

## Saving and Loading a Trained Model
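We can save and load a trained model with the **Joblib** library, which is installed alongside scikit-learn as one of its dependencies.

In [33]:
# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib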

# Saving model
joblib.dump(model, 'grade-prediction.joblib')

['grade-prediction.joblib']

In [34]:
# Loading model
gradePred = joblib.load('grade-prediction.joblib')
y_hat = gradePred.predict([[14, 9, 5, 1]])
y_hat

array(['A'], dtype=object)
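
As a quick sanity check, the sketch below saves a model, loads it back, and confirms that both copies return the same prediction. The temporary directory, the fixed `random_state`, and the variable names are assumptions made for illustration, and the seven rows are re-typed from the table above so the block runs on its own.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The seven rows shown earlier, re-typed so this sketch is self-contained.
data = pd.DataFrame({
    'Attendance': [7, 15, 12, 13, 5, 9, 14],
    'Study':      [5, 10, 9, 8, 4, 6, 7],
    'Hobbies':    [5, 4, 5, 6, 10, 6, 5],
    'Drinks':     [5, 2, 0, 0, 1, 4, 1],
    'Grades':     ['D', 'A', 'B', 'B', 'F', 'C', 'A'],
})
X = data.drop(columns=['Grades'])
y = data['Grades']

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Write the model to a throwaway directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'grade-prediction.joblib')
    joblib.dump(model, path)
    restored = joblib.load(path)

# Both copies should agree on any input.
sample = pd.DataFrame([[14, 9, 5, 1]], columns=X.columns)
same = (model.predict(sample) == restored.predict(sample)).all()
print(same)
```

Only load Joblib files from sources you trust, since loading one can execute arbitrary code.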

## Visualizing a Trained Model
Decision trees are among the easiest ML models to interpret. Let's visualize how our model makes decisions by exporting it to the **Graphviz** DOT format.

In [36]:
from sklearn import tree

X = data.drop(columns=['Grades','ID'])
y = data['Grades']

model = DecisionTreeClassifier()
model.fit(X, y)

tree.export_graphviz(model, out_file='grade-prediction.dot',
                     feature_names=['Attendance', 'Study', 'Hobbies', 'Drinks'],
                     class_names=sorted(y.unique()),
                     label='all',
                     rounded=True,
                     filled=True)

**How to view the model:**
1. Open the *.dot* file in VS Code
2. If you don't have it yet, install the "Graphviz (dot) language support for Visual Studio Code" extension by Joao Pinto, then close and re-open VS Code
3. Re-open the *.dot* file
4. Click the three-dot (...) menu in the upper-right corner and select "Open Preview to the side"
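
If VS Code or the Graphviz extension is not available, one alternative is to print the tree's rules as plain text with scikit-learn's `export_text`. This is a minimal sketch of a different technique from the steps above, not a replacement for the DOT preview. It assumes a fixed `random_state` and re-types the seven rows from the table above so that it runs on its own.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The seven rows shown earlier, re-typed so this sketch is self-contained.
data = pd.DataFrame({
    'Attendance': [7, 15, 12, 13, 5, 9, 14],
    'Study':      [5, 10, 9, 8, 4, 6, 7],
    'Hobbies':    [5, 4, 5, 6, 10, 6, 5],
    'Drinks':     [5, 2, 0, 0, 1, 4, 1],
    'Grades':     ['D', 'A', 'B', 'B', 'F', 'C', 'A'],
})
X = data.drop(columns=['Grades'])
y = data['Grades']

# Assumption: random_state=0 is added here only for repeatability.
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each indented line is a threshold test; leaf lines show the predicted class.
rules = export_text(model, feature_names=list(X.columns))
print(rules)
```

The text view shows the same thresholds and leaf classes as the graphical preview, without the node colouring.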

### References
1. Python Machine Learning Tutorial (Data Science) by Programming with Mosh (https://youtu.be/7eh4d6sabA0?list=PLELfgDZTP2qWjcpG5dLXNAkxEkaudYaDX)