<img src="https://news.illinois.edu/files/6367/543635/116641.jpg" alt="University of Illinois" width="250"/>

# Project 22: UIUC GPA

## Team Members
* Yiping Li - [yipingl4@illinois.edu](mailto:yipingl4@illinois.edu)
* Leo Yang - [junjiey3@illinois.edu](mailto:junjiey3@illinois.edu)
* Shijie Sun - [shijies5@illinois.edu](mailto:shijies5@illinois.edu)
* Richwell Perez - [richwell@illinois.edu](mailto:richwell@illinois.edu)

## Problem Summary
The purpose of this project is to apply deep learning concepts and techniques to a real dataset: UIUC GPA. The central question is whether the GPA or grade distribution of UIUC courses can be predicted for future terms. The project provides visualizations and descriptive statistics of the data, and implements linear or logistic regression as well as recurrent neural networks.

## License
The dataset is obtained from Professor Ulmschneider's uiuc-gpa-dataset. The project was curated by Jared Canty (Summer 2022 Blackwell Program). All rights are reserved.


Dataset on UIUC GPA is available at
https://github.com/wadefagen/datasets/tree/master/gpa (“uiuc-gpa-dataset.csv”)



In [1]:
import numpy as np
import pandas as pd
import time
import random
import matplotlib
#%matplotlib notebook
import matplotlib.pyplot as plt
import scipy.stats
import matplotlib.offsetbox as offsetbox
from matplotlib.ticker import StrMethodFormatter

In [2]:
#for some reason, this needs to be in a separate cell
params={
    "font.size":15,
    "lines.linewidth":5,
}
plt.rcParams.update(params)

## **10/30 Milestone:**

### **Dataset to pandas**

In [3]:
file_url = "https://raw.githubusercontent.com/wadefagen/datasets/master/gpa/uiuc-gpa-dataset.csv"

In [4]:
gpa_data = pd.read_csv(file_url, header=0)
gpa_data

Unnamed: 0,Year,Term,YearTerm,Subject,Number,Course Title,Sched Type,A+,A,A-,...,B-,C+,C,C-,D+,D,D-,F,W,Primary Instructor
0,2022,Spring,2022-sp,AAS,100,Intro Asian American Studies,LCD,6,13,0,...,1,0,3,0,1,1,0,0,0,"Lee, Sang S"
1,2022,Spring,2022-sp,AAS,100,Intro Asian American Studies,DIS,0,11,5,...,2,1,0,1,1,0,0,0,0,"Zheng, Reanne"
2,2022,Spring,2022-sp,AAS,100,Intro Asian American Studies,DIS,0,10,7,...,1,0,0,0,0,0,0,2,0,"Zheng, Reanne"
3,2022,Spring,2022-sp,AAS,100,Intro Asian American Studies,DIS,17,8,1,...,0,0,0,0,0,0,0,0,0,"Rosado-Torres, Alexander"
4,2022,Spring,2022-sp,AAS,100,Intro Asian American Studies,OD,0,8,4,...,2,1,0,0,0,0,1,3,1,"Wang, Yu"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64043,2010,Summer,2010-su,STAT,410,Statistics and Probability II,LEC,5,10,2,...,1,0,1,3,0,0,0,2,1,"Stepanov, Alexei G"
64044,2010,Summer,2010-su,STAT,440,Statistical Data Management,LEC,4,12,8,...,0,0,0,0,0,0,0,0,0,"Unger, David"
64045,2010,Summer,2010-su,TAM,212,Introductory Dynamics,LEC,0,1,3,...,7,5,1,1,0,2,0,1,0,"Morgan, William T"
64046,2010,Summer,2010-su,TAM,251,Introductory Solid Mechanics,LCD,1,2,2,...,0,3,3,2,0,0,1,1,0,"Ott-Monsivais, Stephanie"


In [5]:
gpa_scale = {
  'A+' : 4.0,
  'A' : 4.0,
  'A-' : 3.67,
  'B+' : 3.33,
  'B' : 3.0,
  'B-' : 2.67,
  'C+' : 2.33,
  'C' : 2.0,
  'C-' : 1.67,
  'D+' : 1.33,
  'D' : 1.0,
  'D-' : 0.67,
  'F' : 0.0,
} # defined from https://registrar.illinois.edu/courses-grades/explanation-of-grades/

letterGrades = list(gpa_scale.keys())
gpa_data['Students_Completed'] = gpa_data[letterGrades].sum(axis=1) # Student pop. per class without W

for l in gpa_scale:
  gpa_data[l + 'asNum'] = gpa_data[l] * gpa_scale[l]

newLetterGrades = [l + 'asNum' for l in letterGrades]
gpa_data['GPA'] = gpa_data[newLetterGrades].sum(axis=1) / gpa_data['Students_Completed'] # Label

letterGrades.append('W')
gpa_data['Students'] = gpa_data[letterGrades].sum(axis=1) # Student pop. per class including with W

gpa_data

Unnamed: 0,Year,Term,YearTerm,Subject,Number,Course Title,Sched Type,A+,A,A-,...,B-asNum,C+asNum,CasNum,C-asNum,D+asNum,DasNum,D-asNum,FasNum,GPA,Students
0,2022,Spring,2022-sp,AAS,100,Intro Asian American Studies,LCD,6,13,0,...,2.67,0.00,6.0,0.00,1.33,1.0,0.00,0.0,3.413793,29
1,2022,Spring,2022-sp,AAS,100,Intro Asian American Studies,DIS,0,11,5,...,5.34,2.33,0.0,1.67,1.33,0.0,0.00,0.0,3.440400,25
2,2022,Spring,2022-sp,AAS,100,Intro Asian American Studies,DIS,0,10,7,...,2.67,0.00,0.0,0.00,0.00,0.0,0.00,0.0,3.358519,27
3,2022,Spring,2022-sp,AAS,100,Intro Asian American Studies,DIS,17,8,1,...,0.00,0.00,0.0,0.00,0.00,0.0,0.00,0.0,3.928571,28
4,2022,Spring,2022-sp,AAS,100,Intro Asian American Studies,OD,0,8,4,...,5.34,2.33,0.0,0.00,0.00,0.0,0.67,0.0,2.921429,22
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64043,2010,Summer,2010-su,STAT,410,Statistics and Probability II,LEC,5,10,2,...,2.67,0.00,2.0,5.01,0.00,0.0,0.00,0.0,3.183226,32
64044,2010,Summer,2010-su,STAT,440,Statistical Data Management,LEC,4,12,8,...,0.00,0.00,0.0,0.00,0.00,0.0,0.00,0.0,3.774643,28
64045,2010,Summer,2010-su,TAM,212,Introductory Dynamics,LEC,0,1,3,...,18.69,11.65,2.0,1.67,0.00,2.0,0.00,0.0,2.595714,28
64046,2010,Summer,2010-su,TAM,251,Introductory Solid Mechanics,LCD,1,2,2,...,0.00,6.99,6.0,3.34,0.00,0.0,0.67,0.0,2.603333,21
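
As a sanity check on the weighted-average computation above, the GPA of the first displayed row (AAS 100, LCD) can be reproduced by hand. The sketch below is a standalone illustration rather than part of the notebook: the helper name `section_gpa` is hypothetical, and the grade counts are read off the displayed row (6 A+, 13 A, 4 B, 1 B-, 3 C, 1 D+, 1 D, no withdrawals).

```python
# Grade-point values, same scale as gpa_scale in the notebook.
GPA_SCALE = {
    'A+': 4.0, 'A': 4.0, 'A-': 3.67,
    'B+': 3.33, 'B': 3.0, 'B-': 2.67,
    'C+': 2.33, 'C': 2.0, 'C-': 1.67,
    'D+': 1.33, 'D': 1.0, 'D-': 0.67,
    'F': 0.0,
}

def section_gpa(counts):
    """Hypothetical helper: weighted mean grade point for one section.

    `counts` maps letter grades to student counts. 'W' is ignored
    because withdrawals carry no grade points.
    """
    graded = {g: n for g, n in counts.items() if g in GPA_SCALE}
    completed = sum(graded.values())
    if completed == 0:
        return float('nan')  # no graded students, so the mean is undefined
    points = sum(GPA_SCALE[g] * n for g, n in graded.items())
    return points / completed

# Counts for the first displayed row; grades with zero students are omitted.
first_row = {'A+': 6, 'A': 13, 'B': 4, 'B-': 1, 'C': 3, 'D+': 1, 'D': 1, 'W': 0}
print(round(section_gpa(first_row), 6))
```

The 29 graded students earn 99 grade points in total, and 99 / 29 agrees with the 3.413793 shown in the `GPA` column.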


In [6]:
gpa_data.columns

Index(['Year', 'Term', 'YearTerm', 'Subject', 'Number', 'Course Title',
       'Sched Type', 'A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+',
       'D', 'D-', 'F', 'W', 'Primary Instructor', 'Students_Completed',
       'A+asNum', 'AasNum', 'A-asNum', 'B+asNum', 'BasNum', 'B-asNum',
       'C+asNum', 'CasNum', 'C-asNum', 'D+asNum', 'DasNum', 'D-asNum',
       'FasNum', 'GPA', 'Students'],
      dtype='object')

In [7]:
# drop unused columns
gpa_data = gpa_data.drop(columns=newLetterGrades)
gpa_data = gpa_data.drop(columns=['YearTerm', 'Sched Type', 'Students_Completed']) # keeping number of W's for reversible computation

In [8]:
gpa_data.columns

Index(['Year', 'Term', 'Subject', 'Number', 'Course Title', 'A+', 'A', 'A-',
       'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D', 'D-', 'F', 'W',
       'Primary Instructor', 'GPA', 'Students'],
      dtype='object')

In [9]:
gpa_data = gpa_data.reindex(columns=['Term', 'Year', 'Students', 'Subject', 'Number', 'A+', 'A', 'A-',
       'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D', 'D-', 'F', 'W', 'Course Title',
       'Primary Instructor', 'GPA'])
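
One caveat when reordering columns with `reindex`: a label that does not exist in the frame is not an error. pandas adds it as an all-NaN column, and a later `dropna()` would then remove every row. The toy frame and the misspelled label `'Yaer'` below are made up purely to illustrate this. Selecting with a list of labels is a stricter alternative.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2022, 2021],
    'Term': ['Spring', 'Fall'],
    'GPA': [3.4, 3.1],
})

# A misspelled label does not raise; reindex creates an all-NaN column.
reordered = df.reindex(columns=['Term', 'Yaer', 'GPA'])
print(reordered['Yaer'].isna().all())

# Selecting with a list of labels raises KeyError on the same typo.
try:
    df[['Term', 'Yaer', 'GPA']]
except KeyError as err:
    print('KeyError:', err)

# With correct labels, list selection reorders the columns as well.
strict = df[['Term', 'Year', 'GPA']]
print(list(strict.columns))
```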

In [10]:
gpa_data

Unnamed: 0,Term,Year,Students,Subject,Number,A+,A,A-,B+,B,...,C,C-,D+,D,D-,F,W,Course Title,Primary Instructor,GPA
0,Spring,2022,29,AAS,100,6,13,0,0,4,...,3,0,1,1,0,0,0,Intro Asian American Studies,"Lee, Sang S",3.413793
1,Spring,2022,25,AAS,100,0,11,5,3,1,...,0,1,1,0,0,0,0,Intro Asian American Studies,"Zheng, Reanne",3.440400
2,Spring,2022,27,AAS,100,0,10,7,4,3,...,0,0,0,0,0,2,0,Intro Asian American Studies,"Zheng, Reanne",3.358519
3,Spring,2022,28,AAS,100,17,8,1,1,1,...,0,0,0,0,0,0,0,Intro Asian American Studies,"Rosado-Torres, Alexander",3.928571
4,Spring,2022,22,AAS,100,0,8,4,1,1,...,0,0,0,0,1,3,1,Intro Asian American Studies,"Wang, Yu",2.921429
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64043,Summer,2010,32,STAT,410,5,10,2,2,5,...,1,3,0,0,0,2,1,Statistics and Probability II,"Stepanov, Alexei G",3.183226
64044,Summer,2010,28,STAT,440,4,12,8,1,3,...,0,0,0,0,0,0,0,Statistical Data Management,"Unger, David",3.774643
64045,Summer,2010,28,TAM,212,0,1,3,2,5,...,1,1,0,2,0,1,0,Introductory Dynamics,"Morgan, William T",2.595714
64046,Summer,2010,21,TAM,251,1,2,2,1,5,...,3,2,0,0,1,1,0,Introductory Solid Mechanics,"Ott-Monsivais, Stephanie",2.603333


In [11]:
subject = gpa_data['Subject'].unique()
len(subject)

170

In [12]:
course = gpa_data['Course Title'].unique()
len(course)

5574

In [13]:
instructor = gpa_data['Primary Instructor'].unique()
len(instructor)

8867

In [14]:
term = gpa_data['Term'].unique()
print(len(term))
subject = gpa_data['Subject'].unique()
print(len(subject))
course = gpa_data['Course Title'].unique()
print(len(course))
instructor = gpa_data['Primary Instructor'].unique()
print(len(instructor))

4
170
5574
8867
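
The same counts can be obtained in one call with `DataFrame.nunique()`. One difference is worth knowing: `len(series.unique())` counts a missing value as its own entry, while `nunique()` skips missing values by default. The toy frame below is an assumption for illustration only. Whether the real dataset has missing values in these columns is not shown in the output above.

```python
import pandas as pd

toy = pd.DataFrame({
    'Term': ['Spring', 'Fall', 'Spring'],
    'Subject': ['AAS', 'STAT', 'STAT'],
    'Primary Instructor': ['Lee, Sang S', None, 'Unger, David'],
})

# Distinct values per column in a single call (missing values excluded).
counts = toy[['Term', 'Subject', 'Primary Instructor']].nunique()
print(counts)

# unique() keeps the missing value, so its length is one larger here.
n_with_missing = len(toy['Primary Instructor'].unique())
n_without_missing = toy['Primary Instructor'].nunique()
print(n_with_missing, n_without_missing)
```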


In [15]:
# drop rows containing NaN
letterGrades.append('GPA')
print(letterGrades)
gpa_data = gpa_data.dropna().reset_index(drop=True)
display(gpa_data)

['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D', 'D-', 'F', 'W', 'GPA']


Unnamed: 0,Term,Year,Students,Subject,Number,A+,A,A-,B+,B,...,C,C-,D+,D,D-,F,W,Course Title,Primary Instructor,GPA
0,Spring,2022,29,AAS,100,6,13,0,0,4,...,3,0,1,1,0,0,0,Intro Asian American Studies,"Lee, Sang S",3.413793
1,Spring,2022,25,AAS,100,0,11,5,3,1,...,0,1,1,0,0,0,0,Intro Asian American Studies,"Zheng, Reanne",3.440400
2,Spring,2022,27,AAS,100,0,10,7,4,3,...,0,0,0,0,0,2,0,Intro Asian American Studies,"Zheng, Reanne",3.358519
3,Spring,2022,28,AAS,100,17,8,1,1,1,...,0,0,0,0,0,0,0,Intro Asian American Studies,"Rosado-Torres, Alexander",3.928571
4,Spring,2022,22,AAS,100,0,8,4,1,1,...,0,0,0,0,1,3,1,Intro Asian American Studies,"Wang, Yu",2.921429
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63867,Summer,2010,32,STAT,410,5,10,2,2,5,...,1,3,0,0,0,2,1,Statistics and Probability II,"Stepanov, Alexei G",3.183226
63868,Summer,2010,28,STAT,440,4,12,8,1,3,...,0,0,0,0,0,0,0,Statistical Data Management,"Unger, David",3.774643
63869,Summer,2010,28,TAM,212,0,1,3,2,5,...,1,1,0,2,0,1,0,Introductory Dynamics,"Morgan, William T",2.595714
63870,Summer,2010,21,TAM,251,1,2,2,1,5,...,3,2,0,0,1,1,0,Introductory Solid Mechanics,"Ott-Monsivais, Stephanie",2.603333
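
The last index falls from 64046 to 63870, so `dropna()` removed 176 rows. With no arguments it drops a row if any column is missing, not only `GPA`. A quick way to see which columns are responsible is to count missing values per column before dropping. The sketch below uses a made-up three-row frame. The `subset=['GPA']` variant is an alternative, not what the notebook does.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Subject': ['AAS', 'STAT', 'TAM'],
    'Primary Instructor': ['Lee, Sang S', None, 'Morgan, William T'],
    'GPA': [3.41, 3.77, np.nan],
})

# Missing values per column: shows where the dropped rows come from.
print(toy.isna().sum())

# Default: a row is dropped if ANY column is missing.
all_clean = toy.dropna().reset_index(drop=True)

# Alternative: only require the label column to be present.
gpa_clean = toy.dropna(subset=['GPA']).reset_index(drop=True)

print(len(all_clean), len(gpa_clean))
```

Which variant is appropriate depends on whether the instructor column is used as a model feature.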


### **Download debugging and working dataset**

In [16]:
debugging_data = gpa_data.sample(6000) # ~10% of data
debugging_data.to_csv("debugging-dataset.csv", index=False)

In [17]:
working_data = gpa_data.sample(30000) # ~50% of data
working_data.to_csv("working-dataset.csv", index=False)
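
The two `sample` calls above draw independently and without a seed, so the files differ on every run and the two subsets may share rows. If reproducible and disjoint subsets are wanted, one possible approach is sketched below on a toy frame. The seed value and the choice to draw the working set from the remaining rows are assumptions, not the notebook's method.

```python
import pandas as pd

toy = pd.DataFrame({'GPA': [i / 10 for i in range(100)]})

SEED = 0  # assumed value; any fixed integer makes the draw repeatable

# About 10% of the rows for debugging.
debugging = toy.sample(frac=0.10, random_state=SEED)

# Draw the working set only from rows not already used for debugging.
remaining = toy.drop(debugging.index)
working = remaining.sample(frac=0.50, random_state=SEED)

print(len(debugging), len(working))
print(debugging.index.intersection(working.index).empty)
```

Note that `frac=0.50` here is half of the remaining rows, not half of the full frame. Passing `n=` instead fixes the absolute size, as in the cells above.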