### CS/ECE/ISyE 524 &mdash; Introduction to Optimization &mdash; Spring 2020 ###

# Grade Boosters #

### Algorithms that recommend the best courses for CS students based on their interests, GPA, and other factors ###

#### Student 1 (email address), Student 2 (email address), Student 3 (email address), and Student 4 (email address)

*****

### Table of Contents

1. [Introduction](#1.-Introduction)
2. [Mathematical Model](#2.-Mathematical-model)
  1. [Data sets and Optimization algorithms used](#4.A.-Results)
  1. [Final Mathematical Model](#4.A.-Correctness)
3. [Solution](#3.-Solution)
4. [Results and Discussion](#4.-Results-and-discussion)
  1. [Results](#4.A.-Results)
  1. [Correctness](#4.A.-Correctness)
  1. [Things not taken into account](#4.A.-Things-not-taken-into-account)
  1. [Future work and application](#4.A.-Future-work-and-application)
5. [Conclusion](#5.-Conclusion)
6. [Author Contributions](#6.-Author-Contributions)

In [1]:
## RUN ME NOW ONLY ONCE TO INSTALL NECESSARY LIBRARIES
import Pkg; Pkg.add(["PyPlot", "Cbc", "JuMP", "CSV", "DataFrames"])

  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Installed StaticArrays ─ v0.12.2
  Updating `~/.julia/environments/v1.3/Project.toml`
 [no changes]
  Updating `~/.julia/environments/v1.3/Manifest.toml`
  [90137ffa] ↑ StaticArrays v0.12.1 ⇒ v0.12.2
 Resolving package versions...
  Updating `~/.julia/environments/v1.3/Project.toml`
 [no changes]
  Updating `~/.julia/environments/v1.3/Manifest.toml`
 [no changes]


## 1. Introduction ##

This project develops an algorithm that helps students in the Computer Sciences department at the University of Wisconsin-Madison build an optimal course schedule based on their interests. In addition to the schedule itself, the algorithm offers insight into different course types and the careers those courses can lead to. For example, courses in computer architecture may open different career options than courses in information security.

The algorithm takes into account course prerequisites, each student's areas of interest within computer science, course difficulty (measured by average GPA), and the student's intended pursuit (industry or academia). The project also explores the trade-off between the weights placed on areas of interest, grades, and intended pursuit when finding each student's optimal schedule.

The purposes of this project are to:  
1. find an algorithm that best serves students as they go through a computer science program  
2. give Computer Science programs across the country ways to analyze how well their program fits industry and/or academia  
3. give advisors in Computer Science programs better insight so they can better serve their undergraduate students  
4. provide social scientists and other social, administrative, and political workers with insight into computer science trends within American higher education.

This project does not use any statistical samples; it solely seeks an algorithm that solves the optimization problem. See Future Work and Application in section 4 for how this algorithm can be applied.

## 2. Mathematical model (model 1) ##


### 1. Parameters  ###

#### 1.1 Courses ####

73 courses are used in our model; most are computer science courses, and the rest are mathematics, statistics, or ECE courses. They are represented by an array named $courses$.

#### 1.2 GPA ####

We obtained a dataset from Kaggle that contains every class offered at UW-Madison, along with the number of As, ABs, Bs, Cs, Ds, and Fs given in each semester from 2006 to 2017. The dataset can be downloaded at https://www.kaggle.com/Madgrades/uw-madison-courses#grade_distributions.csv. The GPA of each course is calculated from this dataset and stored in courses_and_GPA.csv.

GPA is represented by an array named $gpa$, where $gpa_i$ is the GPA of course $i$.
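As a rough illustration of how a course GPA can be derived from grade counts, the sketch below averages grade points weighted by the number of students who received each grade. The grade-point values (including the BC grade), the name `course_gpa`, and the sample counts are assumptions made for this example; the preprocessing that actually produced courses_and_GPA.csv may differ.

```julia
# Grade points per letter grade (assumed 4.0 scale; not taken from the dataset).
GRADE_POINTS = Dict("A" => 4.0, "AB" => 3.5, "B" => 3.0, "BC" => 2.5,
                    "C" => 2.0, "D" => 1.0, "F" => 0.0)

# Average grade points over all letter grades awarded in a course.
# `counts` maps a letter grade to the number of students who received it.
function course_gpa(counts::Dict{String,Int})
    total_students = isempty(counts) ? 0 : sum(values(counts))
    total_students == 0 && return NaN   # no graded students: GPA is undefined
    total_points = sum(GRADE_POINTS[g] * n for (g, n) in counts)
    return total_points / total_students
end

# Hypothetical counts for one course (illustrative only).
example_counts = Dict("A" => 10, "AB" => 5, "B" => 5)
println(course_gpa(example_counts))
```

Averaging over several semesters would amount to summing the counts per grade before calling the function.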

#### 1.3 Prerequisites & Graduation Requirement List ####

We obtained the prerequisites and graduation requirements from the UW-Madison CS major requirements webpage (https://guide.wisc.edu/undergraduate/letters-science/computer-sciences/computer-sciences-bs/#requirementstext).

Let the following arrays hold the indices of the courses that satisfy each graduation requirement:

$basic\_cs$: basic computer sciences  
$basic\_math$: basic mathematics  
$add\_math$: additional mathematics (beyond calculus)  
$theory$: theory of computer science  
$soft\_hard$: software and hardware  
$app$: application  
$elect$: elective



Let the array $pred$ record the prerequisites of each course: $pred_i$ holds the indices of the prerequisite courses for course $i$. In the code, this array is named `pre`.

#### 1.4 Number of Courses ####

$NUM\_COURSES$ is the minimum number of courses needed to satisfy the degree requirements. It is set to 15, the sum of the per-category minimums (5 + 2 + 2 + 1 + 2 + 1 + 2).

### 2. Decision Variables ###

Let $i$ index the classes listed in the class table, and let $x_i$ be a binary variable for each class. The vector $x$ has length 73; $x_i=1$ when class $i$ is taken and $x_i=0$ otherwise.

### 3. Constraints ###

1) Graduation Constraint: For each requirement category, a specific number of courses must be taken:

$$\sum_{i \in basic\_cs} x_i = 5,$$
$$\sum_{i \in basic\_math} x_i = 2,$$
$$\sum_{i \in add\_math} x_i \geq 2,$$
$$\sum_{i \in theory} x_i \geq 1,$$
$$\sum_{i \in soft\_hard} x_i \geq 2,$$
$$\sum_{i \in app} x_i \geq 1,$$
$$\sum_{i \in elect} x_i \geq 2.$$

For example, in order to graduate, a student must take at least one class in the application area.
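The graduation constraints can also be checked for any candidate selection without a solver. The sketch below is a minimal illustration using toy index sets; the names `requirements` and `meets_requirements` are hypothetical and do not appear in the project code.

```julia
# Each requirement: (course indices in the category, comparison, required count).
# Toy index sets for illustration only; the real sets are basic_cs, basic_math, etc.
requirements = [
    ([1, 2],    ==, 2),   # a category where exactly two listed courses are required
    ([3, 4, 5], >=, 1),   # a category where at least one listed course is required
]

# x is a 0/1 vector; x[i] == 1 means course i is taken.
function meets_requirements(x, requirements)
    return all(op(sum(x[i] for i in idx), k) for (idx, op, k) in requirements)
end

println(meets_requirements([1, 1, 0, 1, 0], requirements))  # both categories satisfied
println(meets_requirements([1, 0, 0, 1, 0], requirements))  # first category is one course short
```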

2) Prerequisite Constraint: A class $i$ can be taken only if each of its prerequisites is also taken:

$$ x_j \geq x_i \quad \text{for all } j \in pred_i $$
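To make the prerequisite constraint concrete, the following sketch tests it on a toy prerequisite list. The data and the function name `prerequisites_ok` are assumptions for illustration. Note that the constraint only requires prerequisites to be part of the selection; it does not order courses in time.

```julia
# pred[i] lists the indices of the prerequisites of course i (toy data).
pred = [Int[], [1], [1, 2], Int[]]

# True when every selected course has all of its prerequisites selected,
# i.e. x[j] >= x[i] for every j in pred[i].
function prerequisites_ok(x, pred)
    return all(x[j] >= x[i] for i in eachindex(pred) for j in pred[i])
end

println(prerequisites_ok([1, 1, 0, 0], pred))  # course 2 is taken together with its prerequisite 1
println(prerequisites_ok([0, 1, 0, 0], pred))  # course 2 is taken without course 1
```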

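3) A class cannot count for more than one degree requirement. This is enforced through the total: the number of courses taken is fixed at $NUM\_COURSES$, the sum of the category minimums, so double-counting a class cannot reduce the course load:

$$\sum_{i=1}^{73} x_i = NUM\_COURSES$$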

### 4. Objective ###

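The objective maximizes the total GPA of the selected courses:

$$\max_x \sum_{i = 1}^{73} gpa_i x_i$$

The implementation divides this sum by $NUM\_COURSES$ to report an average GPA; because $NUM\_COURSES$ is a constant, the optimal course selection is unchanged.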
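For intuition, the whole model can be solved by brute force on a tiny instance. The sketch below enumerates every 0/1 vector for five hypothetical courses and keeps the feasible one with the largest objective. All data and names here (`gpa_toy`, `pred_toy`, `required`, `best_selection`) are made up for illustration, and enumeration is only practical for very small instances; the project itself uses an integer programming solver.

```julia
# Toy instance (hypothetical data): 5 courses, pick exactly 3.
gpa_toy  = [3.2, 3.6, 2.9, 3.8, 3.4]
pred_toy = [Int[], [1], Int[], [3], Int[]]   # course 2 needs 1, course 4 needs 3
required = [1]                                # course 1 is mandatory
num_courses = 3

function best_selection(gpa, pred, required, num_courses)
    n = length(gpa)
    best_x, best_val = nothing, -Inf
    for mask in 0:(2^n - 1)
        x = [(mask >> (i - 1)) & 1 for i in 1:n]       # decode bitmask into a 0/1 vector
        sum(x) == num_courses || continue               # total course count
        all(x[i] == 1 for i in required) || continue    # mandatory courses
        all(x[j] >= x[i] for i in 1:n for j in pred[i]) || continue  # prerequisites
        val = sum(gpa[i] * x[i] for i in 1:n)           # objective: total GPA points
        if val > best_val
            best_x, best_val = x, val
        end
    end
    return best_x, best_val / num_courses               # report the average GPA
end

x_best, avg_gpa = best_selection(gpa_toy, pred_toy, required, num_courses)
println("Selection: ", x_best, ", average GPA: ", avg_gpa)
```

In this toy instance, course 4 has the highest GPA but is only feasible together with its lower-GPA prerequisite, so the best selection leaves it out.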

## 3. Solution (model 1) ##


In [15]:
using CSV, DataFrames

# Load the course list and the average GPA of each course
raw = CSV.read("courses_and_GPA.csv", DataFrame)
(nrows, ncols) = size(raw)

courses = raw[:,1]                        # course names
gpa = Matrix{Float64}(raw[:,2:ncols])     # average GPA of each course

# pre[i] lists the indices of the prerequisites of course i
pre = [[],[],[],[2],[2,3],[3],[3],[69,70,3],[51],[51],[49],[51],[6,7],[55,3],[51,59,3],[70,6],[3,51],[51],[18,3],[69,1],[45,51],[3,69],[5,6],[5,6],[5,6],[1],[3,68],[23],[47,3,5],[4,5],[6,47],[69,6],[5,6],[55,73],[1],[3,69],[70,6],[],[3,51],[],[24],[24],[32],[],[44],[44],[69],[47],[71,73],[69],[69],[58],[52],[47],[69],[47],[],[57],[47],[47],[51],[47],[53],[56,57],[64],[64],[56],[],[68],[],[47],[48],[68]]

# Indices of the courses that satisfy each graduation requirement
basic_cs = [2,3,5,6,70]
basic_math = [68,69]
add_math = [55,73,8,10,14,15,18,19,46,47,50,51,52,53,54,56,58,60,61,62,12,63,64,65,66,67,48,49,71,72]
theory = [37,16]
soft_hard = [7,13,23,24,30,33,41,42]
app = [8,9,14,15,17,18,22,27,28,29,32,35]
elect = [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,41,42,43,40]
len = length(gpa);  # number of courses

In [16]:
NUM_COURSES = 15

using JuMP, Cbc

m = Model(optimizer_with_attributes(Cbc.Optimizer, "logLevel" => 0))

@variable(m, x[1:len], Bin)  # whether or not a course is taken

# Must satisfy degree requirements
@constraint(m, sum(x[i] for i in basic_cs) == 5)
@constraint(m, sum(x[i] for i in basic_math) == 2)
@constraint(m, sum(x[i] for i in add_math) >= 2)
@constraint(m, sum(x[i] for i in theory) >= 1)
@constraint(m, sum(x[i] for i in soft_hard) >= 2)
@constraint(m, sum(x[i] for i in app) >= 1)
@constraint(m, sum(x[i] for i in elect) >= 2)

# A class can not count for more than one degree requirement
@constraint(m, sum(x[i] for i in 1:len) == NUM_COURSES)

# Must satisfy prerequisite requirements: course i needs every course j in pre[i]
for i in 1:len
    for j in pre[i]
        @constraint(m, x[i] <= x[j])
    end
end

# Maximize average GPA
@objective(m, Max, sum(x[i]*gpa[i] for i in 1:len) / NUM_COURSES);

optimize!(m)

# Indices of the selected courses (the 0.5 threshold guards against solver round-off)
selected = [i for i in 1:len if value(x[i]) > 0.5]

println("The list of courses to take: ")
println()
for i in selected
    print(courses[i],", ")
end
println()
println()
opt_gpa = sum(gpa[i] for i in selected) / NUM_COURSES
println("The optimal GPA will be ",opt_gpa)

The list of courses to take: 

COMP SCI 252, COMP SCI 300, COMP SCI 354, COMP SCI 400, COMP SCI 407, COMP SCI 506, COMP SCI 547, COMP SCI 558, COMP SCI 577, COMP SCI 579, MATH 234, MATH 461, MATH 221, MATH 222, MATH 240, 

The optimal GPA will be 3.2599699242


## 4. Results and discussion ##

#### Discussion ####

Given the prerequisite and graduation requirements, model 1 gives a CS student the easiest course load that leads to graduation. The recommended list includes some less popular courses that the student can explore further. With the highest possible GPA in CS courses, the student may find it easier to find a job or apply to graduate programs. By avoiding the most challenging courses, the student will also face less pressure while studying.

However, this model has some limitations. Course credits are not reflected in the GPA calculation because each course is weighted equally, so the optimal GPA is not precise. Including credits would make the objective a ratio of two expressions that both depend on $x$ (credit-weighted grade points divided by total credits), turning the model into a non-linear program that can be hard to solve. The loss of precision is small, since most courses carry three or four credits, so course credit is left out of the model. A major limitation is that some courses in the generated list are not useful, and in reality a student would not take them even if they boost GPA.
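The difference between the two GPA definitions can be seen on a small example. The GPAs and credit values below are hypothetical and only serve to compare the equal-weighted average used in model 1 with a credit-weighted average.

```julia
# Hypothetical course GPAs and credit values for a selected set of courses.
gpas    = [3.0, 3.6, 2.8]
credits = [3, 4, 5]

# Equal weighting, as in model 1: total GPA divided by a constant course count.
equal_weighted = sum(gpas) / length(gpas)

# Credit weighting: the denominator depends on which courses are selected,
# so as a function of the selection this is a ratio of two linear expressions.
credit_weighted = sum(credits .* gpas) / sum(credits)

println("Equal-weighted GPA:  ", equal_weighted)
println("Credit-weighted GPA: ", credit_weighted)
```

When all courses carry similar credit values, the two averages stay close, which is the argument made above for leaving credits out.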

As next steps, we plan to print the course list as a rough schedule so that the student knows in what order to take these courses. We will also complete model 2 and compare it with model 1.
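One possible way to derive such a rough schedule, offered only as a sketch and not as the project's planned method, is to place each selected course in the earliest term after all of its prerequisites. The function `term_of` and the toy data are hypothetical. The sketch assumes the prerequisite lists contain no cycles and ignores per-term course limits and course offerings.

```julia
# Earliest term for a course: 1 if it has no prerequisites, otherwise one term
# after its latest prerequisite. `memo` caches results across calls.
function term_of(course, pred, memo = Dict{Int,Int}())
    haskey(memo, course) && return memo[course]
    t = isempty(pred[course]) ? 1 :
        1 + maximum(term_of(p, pred, memo) for p in pred[course])
    memo[course] = t
    return t
end

# Toy data: course 2 needs course 1; course 3 needs courses 1 and 2.
pred_toy = [Int[], [1], [1, 2], Int[]]
selected_toy = [1, 2, 3, 4]

# Group the selected courses by their earliest term.
memo = Dict{Int,Int}()
schedule = Dict{Int,Vector{Int}}()
for c in selected_toy
    push!(get!(schedule, term_of(c, pred_toy, memo), Int[]), c)
end

for t in sort(collect(keys(schedule)))
    println("Term ", t, ": ", schedule[t])
end
```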


## 5. Conclusion ##

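Model 1 formulates course selection as a binary integer program: it selects 15 of the 73 candidate courses so that the graduation and prerequisite requirements are met while the average course GPA is maximized. Solved with Cbc, it returns a course list with an average GPA of about 3.26. As a follow-up, we plan to turn the course list into a rough schedule and to complete model 2 and compare it with model 1.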

## 6. Author Contributions

Note: The contributions in each category must sum to 100%. See Canvas for more details on what type of work belongs in each category.

#### 1. Modelling  
Student A: Percentage contribution %  
Student B: Percentage contribution %  
Student C: Percentage contribution %  

  
#### 2. Analysis  
Student A: Percentage contribution %  
Student B: Percentage contribution %  
Student C: Percentage contribution %  


#### 3. Data Gathering  
Student A: Percentage contribution %  
Student B: Percentage contribution %  
Student C: Percentage contribution %  


#### 4. Software Implementation  
Student A: Percentage contribution %  
Student B: Percentage contribution %  
Student C: Percentage contribution %  


#### 5. Report writing and poster presentation   
Student A: Percentage contribution %  
Student B: Percentage contribution %  
Student C: Percentage contribution %  

#### 6. Other Outside Attributions
Kunttu et al., "How Doctoral Students and Graduates Can Facilitate Boundary Spanning between Academia and Industry," TIM Review, June 2018, p. 52 (Kunttu_et_al_TIMReview_June2018.pdf).