# Course Enrollment

## Authors: Isaac, Conor, and Quoc

The purpose of this notebook is to teach you, the user, how to use the code we have spent the past year creating. That code predicts course enrollment for given quarters. You supply a large amount of historical data (we explain below how to format it and where to put it). The code then learns the patterns in that data and predicts how many students will take each class in the quarters you want to forecast.

In [1]:
# This code block contains all necessary libraries needed to run the code
# Consult the README file to learn how to install the libraries on your machine
import pandas as pd
import random
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score, root_mean_squared_error
import matplotlib.pyplot as plt


# These are not third-party libraries but modules from this project; each one wraps a step of the pipeline so it can be run in a single line
from curation import Curate
from create_datasets import CreateData
from kfold_test import kfold
from dataset import EnrollmentDataset
from sklearn_model import sklearnModel
from sklearn_trainer import Trainer

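The training step reports two metrics imported above. Root mean squared error (RMSE) is measured in the same units as the target (students), so the validation RMSE of 6.12 reported further down means predictions miss by roughly six students in a root-mean-square sense. R² is 1 - SS_res / SS_tot, so 0.58 means the model accounts for about 58% of the variance in enrollment on the validation data. The sketch below computes both metrics by hand on toy numbers (not real data) and checks them against scikit-learn. It also includes a fallback, which is our own addition: `root_mean_squared_error` only exists in scikit-learn 1.4 and later, so on older versions it is rebuilt from `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import r2_score

try:
    # Added in scikit-learn 1.4
    from sklearn.metrics import root_mean_squared_error
except ImportError:
    from sklearn.metrics import mean_squared_error

    def root_mean_squared_error(y_true, y_pred):
        """Fallback for older scikit-learn: square root of the mean squared error."""
        return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Toy enrollments, not real data
y_true = np.array([20, 30, 40])
y_pred = np.array([22, 27, 41])

# RMSE by hand: errors are -2, 3, -1, so the squared errors sum to 14
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))  # sqrt(14 / 3)

# R^2 = 1 - SS_res / SS_tot, where SS_tot = (20-30)^2 + 0 + (40-30)^2 = 200
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot  # 1 - 14/200 = 0.93

print(rmse_manual, root_mean_squared_error(y_true, y_pred))
print(r2_manual, r2_score(y_true, y_pred))
```

A lower RMSE and a higher R² are better; R² can also go negative when a model does worse than always predicting the mean.
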
In [2]:
# This is the data we use to train our model; it came from the WWU Registrar's office
# Make sure your data follows the same format, and replace data/WWU_course_info.csv with the path to your file
base_data = pd.read_csv('data/WWU_course_info.csv', encoding='ISO-8859-1')
base_data.head()

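| Unnamed: 0 | TERM | CRN | SUBJECT | COURSE_NUMBER | TITLE | ACTUAL_ENROLL | CAPENROLL | PRIMARY_BEGIN_TIME | PRIMARY_END_TIME | U | M | T | W | R | F | S | PRIMARY_INSTRUCTOR_TENURE_CODE | CAMPUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 201510 | 10046 | COMM | 442 | Video Workshop | 12 | 16 | 1700.0 | 1820.0 |  |  | T |  | R |  |  | NT | M |
| 1 | 201510 | 10049 | ENG | 101 | Writing and Critical Inquiry | 24 | 24 | 830.0 | 950.0 |  | M |  |  |  | F |  |  | M |
| 2 | 201510 | 10052 | ENG | 101 | Writing and Critical Inquiry | 24 | 24 | 830.0 | 950.0 |  | M |  | W |  |  |  |  | M |
| 3 | 201510 | 10055 | ECON | 206 | Intro to Microeconomics | 60 | 60 | 1130.0 | 1250.0 |  | M |  | W |  | F |  | NT | M |
| 4 | 201510 | 10060 | ENG | 101 | Writing and Critical Inquiry | 24 | 24 | 1000.0 | 1120.0 |  | M |  |  |  | F |  |  | M |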

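Before pointing the notebook at your own file, it can help to check that the file has the same columns as the Registrar extract above. The sketch below uses hypothetical helpers (`missing_columns`, `hhmm_to_minutes`, `add_schedule_features`) that are not part of our code. It lists missing columns and derives two schedule features under two assumptions. First, begin and end times are 24-hour HHMM numbers, so 1700.0 to 1820.0 is an 80-minute meeting. Second, each day column (U, M, T, W, R, F, S, which by common registrar convention run Sunday through Saturday with R for Thursday) holds its letter when the class meets that day and is empty otherwise. The `Unnamed: 0` column is most likely a row index left over from saving the CSV, so it is not checked.

```python
import pandas as pd

# Columns in the Registrar extract shown above (Unnamed: 0 left out on purpose)
EXPECTED_COLUMNS = [
    "TERM", "CRN", "SUBJECT", "COURSE_NUMBER", "TITLE", "ACTUAL_ENROLL", "CAPENROLL",
    "PRIMARY_BEGIN_TIME", "PRIMARY_END_TIME", "U", "M", "T", "W", "R", "F", "S",
    "PRIMARY_INSTRUCTOR_TENURE_CODE", "CAMPUS",
]
DAY_COLUMNS = ["U", "M", "T", "W", "R", "F", "S"]


def missing_columns(df):
    """Hypothetical helper: expected columns that are absent from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def hhmm_to_minutes(value):
    """Convert an assumed 24-hour HHMM value such as 1820.0 to minutes after midnight."""
    if pd.isna(value):  # missing time: return NaN instead of failing
        return float("nan")
    value = int(value)
    return (value // 100) * 60 + value % 100


def add_schedule_features(df):
    """Add meeting days per week and minutes per meeting (hypothetical features)."""
    out = df.copy()
    # A day column holds its letter on meeting days and is empty (NaN) otherwise
    out["DAYS_PER_WEEK"] = out[DAY_COLUMNS].notna().sum(axis=1)
    out["MINUTES_PER_MEETING"] = (
        out["PRIMARY_END_TIME"].map(hhmm_to_minutes)
        - out["PRIMARY_BEGIN_TIME"].map(hhmm_to_minutes)
    )
    return out


# First two rows of the extract above
sample = pd.DataFrame({
    "TERM": [201510, 201510], "CRN": [10046, 10049], "SUBJECT": ["COMM", "ENG"],
    "COURSE_NUMBER": [442, 101], "TITLE": ["Video Workshop", "Writing and Critical Inquiry"],
    "ACTUAL_ENROLL": [12, 24], "CAPENROLL": [16, 24],
    "PRIMARY_BEGIN_TIME": [1700.0, 830.0], "PRIMARY_END_TIME": [1820.0, 950.0],
    "U": [None, None], "M": [None, "M"], "T": ["T", None], "W": [None, None],
    "R": ["R", None], "F": [None, "F"], "S": [None, None],
    "PRIMARY_INSTRUCTOR_TENURE_CODE": ["NT", None], "CAMPUS": ["M", "M"],
})

print(missing_columns(sample))
features = add_schedule_features(sample)
print(features[["CRN", "DAYS_PER_WEEK", "MINUTES_PER_MEETING"]])
```

Running `missing_columns(base_data)` on your own file before the curation step is a cheap way to catch a renamed or missing column early.
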

In [3]:
# This class creates two separate files, machine_learning_data.csv and visualization.csv
# As the names suggest, we use the first to train our models and the second to create
# graphs and explore the data
Curate.main(base_data)

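A neural network such as `MLPRegressor` only accepts numeric input, so text columns like SUBJECT and the day letters have to become numbers somewhere before training. We do not reproduce what `Curate.main` actually does here; the hypothetical `to_model_features` function below illustrates one common approach. It turns day letters into 0/1 flags, one-hot encodes categorical columns with `pandas.get_dummies`, and separates ACTUAL_ENROLL out as the target. Which columns it drops (`Unnamed: 0`, CRN, and TITLE) is an assumption.

```python
import pandas as pd

DAY_COLUMNS = ["U", "M", "T", "W", "R", "F", "S"]
CATEGORICAL_COLUMNS = ["SUBJECT", "CAMPUS", "PRIMARY_INSTRUCTOR_TENURE_CODE"]


def to_model_features(df):
    """Hypothetical curation step: split raw rows into numeric features X and target y.

    Illustration only; this is not the logic inside Curate.main.
    """
    # Drop identifiers and free text (an assumption about what is uninformative)
    out = df.drop(columns=["Unnamed: 0", "CRN", "TITLE"], errors="ignore")
    y = out.pop("ACTUAL_ENROLL")  # enrollment is the prediction target
    # Day columns hold a letter on meeting days and are empty otherwise -> 1/0 flags
    for day in DAY_COLUMNS:
        out[day] = out[day].notna().astype(int)
    # One-hot encode categories; dummy_na adds a column for missing values
    X = pd.get_dummies(out, columns=CATEGORICAL_COLUMNS, dummy_na=True, dtype=int)
    return X, y


# Two rows adapted from the extract above
sample = pd.DataFrame({
    "SUBJECT": ["COMM", "ECON"], "COURSE_NUMBER": [442, 206],
    "TITLE": ["Video Workshop", "Intro to Microeconomics"],
    "ACTUAL_ENROLL": [12, 60], "CAPENROLL": [16, 60],
    "U": [None, None], "M": [None, "M"], "T": ["T", None], "W": [None, "W"],
    "R": ["R", None], "F": [None, "F"], "S": [None, None],
    "PRIMARY_INSTRUCTOR_TENURE_CODE": ["NT", "NT"], "CAMPUS": ["M", "M"],
})

X, y = to_model_features(sample)
print(X.columns.tolist())
print(y.tolist())
```

One-hot encoding avoids implying an order between subjects (COMM is not "less than" ECON), which a single integer code per subject would do.
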
In [4]:
# This class splits the machine-learning data created in the previous step into smaller chunks (folds)
# During training, the quarters can then be shuffled and trained and tested on in random order,
# which performed better for us than processing them in chronological order
CreateData.main()

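The idea behind the fold files can be sketched as a split by quarter: shuffle the distinct TERM values, deal them round-robin into k folds, then train on every fold but one and test on the held-out fold. This is a minimal sketch assuming folds are grouped by TERM; the function names, `k`, and the seed are illustrative, and `CreateData.main` may split the data differently. Keeping every section of a quarter in the same fold means the model is never tested on a quarter it already saw during training.

```python
import random

import pandas as pd


def split_terms_into_folds(df, k=5, seed=0):
    """Group rows by quarter (TERM), shuffle the quarters, and deal them into k folds."""
    terms = sorted(df["TERM"].unique())
    random.Random(seed).shuffle(terms)  # reproducible random order of quarters
    # Round-robin assignment: fold i gets every k-th shuffled quarter, starting at i
    return [df[df["TERM"].isin(terms[i::k])].reset_index(drop=True) for i in range(k)]


def train_test_pairs(folds):
    """Yield (train, test) pairs, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = pd.concat([f for j, f in enumerate(folds) if j != i], ignore_index=True)
        yield train, test


# Toy data: six made-up term codes with two course sections each
demo = pd.DataFrame({
    "TERM": [term for term in range(1, 7) for _ in range(2)],
    "ACTUAL_ENROLL": [10, 20, 30, 40, 50, 60, 10, 20, 30, 40, 50, 60],
})

folds = split_terms_into_folds(demo, k=3, seed=42)
for train, test in train_test_pairs(folds):
    print(sorted(test["TERM"].unique().tolist()), len(train), len(test))
```

If `k` is larger than the number of distinct quarters, some folds come out empty, so `k` should stay at or below the number of quarters in the data.
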
In [5]:
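# Using several hyperparameter-search methods, we found that the configuration below is one of our best-performing models
# Note: scikit-learn only uses learning_rate when solver='sgd', so 'adaptive' has no effect with solver='adam'
model = MLPRegressor(
    hidden_layer_sizes=(100,),  # one hidden layer with 100 neurons
    activation='tanh',
    solver='adam',
    learning_rate='adaptive',
    max_iter=500,               # maximum number of training epochs
    alpha=0.0001,               # L2 regularization strength
    verbose=False
)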

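The comment above does not say which search methods were used. As one example of this kind of search, the sketch below runs scikit-learn's `GridSearchCV` over a few `MLPRegressor` settings on synthetic data (not enrollment data), scoring by RMSE. Wrapping the model in a `StandardScaler` pipeline is our own assumption, since MLPs tend to train more reliably on standardized features, and `learning_rate='adaptive'` is left out because it only applies when `solver='sgd'`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for enrollment features and targets (not real data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 20 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=1.0, size=200)

# Scale features, then fit the MLP; make_pipeline names the steps
# 'standardscaler' and 'mlpregressor'
pipe = make_pipeline(
    StandardScaler(),
    MLPRegressor(solver="adam", max_iter=500, random_state=0),
)

# Candidate settings; the step-name prefix routes each parameter to the MLP
param_grid = {
    "mlpregressor__hidden_layer_sizes": [(50,), (100,)],
    "mlpregressor__activation": ["relu", "tanh"],
    "mlpregressor__alpha": [1e-4, 1e-3],
}

# 3-fold cross-validation; scikit-learn maximizes scores, so RMSE is negated
search = GridSearchCV(pipe, param_grid, cv=3, scoring="neg_root_mean_squared_error")
search.fit(X, y)

print("Best settings:", search.best_params_)
print("Cross-validated RMSE:", -search.best_score_)
```

With real data, `X` and `y` would come from the curated machine-learning file. When the grid grows large, `RandomizedSearchCV` samples a fixed number of combinations instead of trying all of them.
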
In [6]:
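# Train the model and report validation metrics; the returned tuple appears to hold the fold file,
# the actual enrollments, and the model's predictions
Trainer.main()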

Validation RMSE: 6.12
R² Score: 0.58


('data/fold_1.csv',
 0      18
 1      35
 2      33
 3      28
 4      13
        ..
 183    16
 184    26
 185    20
 186    24
 187    35
 Name: ACTUAL_ENROLL, Length: 188, dtype: int64,
 array([17.62174007, 32.29785176, 28.8919625 , 19.70403695, 21.16444817,
        21.16444817, 48.56792646, 34.16256005, 21.19502117, 16.68745252,
        20.39511723, 19.54838641, 20.63149352, 28.54941419, 20.94158016,
        20.39511723, 36.88513671, 16.68745252, 20.39511723, 20.39511723,
        19.54838641, 28.24077531, 23.39198164, 21.35384605, 20.94158016,
        14.55878146, 17.62174007, 31.09242495, 21.16444817, 39.62862517,
        21.19502117, 33.44599798, 17.14707743, 21.16444817, 36.40232933,
        22.18737511, 21.59770577, 33.62834413, 26.50444396, 29.64291852,
        20.63149352, 28.22635001, 21.19502117, 27.1282813 , 19.35516235,
        21.46625456, 51.32003066, 22.91804195, 21.86174652, 16.64847187,
        27.30116273, 19.35516235, 28.01375969, 21.34195269, 35.39844686,
       