# Predicting Student Wait Time at LaIR for Stanford’s Introductory Computer Science Courses

## Project Group:
Sachin Allums (sachino)\
Justin Blumencranz (jmb25)\
Andrew Hong (amhong)\
Mahathi Mangipudi (mahathim)

Stanford enrolls over 2500 students each year in its two introductory computer science courses, CS106A and CS106B. These students can use LaIR, a space where they receive one-on-one help with their code for a given assignment from a section leader. Section leaders are currently advised to spend 15 minutes on each help request to better manage the flow of assistance. The purpose of our project is to develop a model that predicts how long a student will wait to receive help based on the assignment they are completing, the time they arrive at LaIR, and the number of days remaining before the assignment deadline, among other features.

# Model Selection

We have chosen to implement a linear regression model, which takes in a variety of features describing the context of a single LaIR request and outputs an estimated wait time for the student to receive help.
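For reference, these are the model and cost that `compute_cost`, `compute_gradient`, and `gradient_descent` below implement, writing $\mathbf{x}^{(i)}$ for the feature vector of the $i$-th of $m$ requests, $y^{(i)}$ for its recorded `waitTime`, and $\alpha$ for the learning rate:

$$
f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b,
\qquad
J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2
$$

Each gradient descent step then updates every weight $w_j$ and the intercept $b$ simultaneously:

$$
w_j \leftarrow w_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)},
\qquad
b \leftarrow b - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)
$$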

***

First, let's import all of the necessary packages for modeling and analysis.

In [1]:
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import copy, math
from sklearn.model_selection import train_test_split

# Make NumPy printouts easier to read.
np.set_printoptions(precision=3, suppress=True)

### Model Control Panel
Use the cell below to tune the split sizes and model hyperparameters.

In [None]:
# Define split sizes
TRAIN_SIZE = 0.8
CV_SIZE = 0.1
TEST_SIZE = 0.1

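# Model hyperparameters
LEARNING_RATE = 5.0e-3   # step size (alpha) for gradient descent
ITERATIONS = 1000        # number of gradient descent steps
LAMBDA = 0.0             # regularization strength; not used by the unregularized model below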
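The control panel defines `LAMBDA`, but none of the functions in this notebook use it. If the name is meant as a regularization strength (our assumption), one common option is an L2 (ridge) penalty on the weights. The sketch below is hypothetical: `compute_cost_ridge` and `compute_gradient_ridge` are illustrative names, and the intercept `b` is deliberately left unpenalized.

```python
import numpy as np

def compute_cost_ridge(X, y, w, b, lambda_):
    """Squared-error cost plus an L2 penalty on w; b is not penalized.

    Hypothetical helper: one possible use for LAMBDA, which the notebook defines but never uses.
    """
    m = X.shape[0]
    err = X @ w + b - y                                # (m,) residuals
    data_term = np.sum(err ** 2) / (2 * m)             # same scaling as compute_cost
    penalty = (lambda_ / (2 * m)) * np.sum(w ** 2)     # L2 (ridge) penalty
    return data_term + penalty

def compute_gradient_ridge(X, y, w, b, lambda_):
    """Gradient of compute_cost_ridge, returned as (dj_db, dj_dw) like compute_gradient."""
    m = X.shape[0]
    err = X @ w + b - y
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w        # penalty adds (lambda/m) * w
    dj_db = np.sum(err) / m                            # intercept gradient is unchanged
    return dj_db, dj_dw

# Toy data: with lambda_ = 0 the ridge cost reduces to the plain squared-error cost.
X = np.array([[3.0, 2.0], [5.0, 1.0], [8.0, 0.5]])
y = np.array([9.0, 12.0, 15.0])
w = np.array([1.0, -0.5])
b = 2.0
cost_plain = compute_cost_ridge(X, y, w, b, 0.0)
cost_reg = compute_cost_ridge(X, y, w, b, 1.0)
print(cost_plain, cost_reg)
```

Because `gradient_descent` calls `cost_function(X, y, w, b)` with four arguments, these would need a small wrapper such as `lambda X, y, w, b: compute_cost_ridge(X, y, w, b, LAMBDA)` before being passed in.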



Next, we load the dataset and split it into TRAIN, CV (cross-validation), and TEST sets using the handy `train_test_split` function from `sklearn`.

In [2]:
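# Column types for the fields used in modeling
dtype = {"waitTime": int, "daysLeftClean": float, "numInQueue": float}
# Load the dataset (the CSV is expected in the notebook's working directory)
dataset = pd.read_csv('master_database_March6_forModeling - master_database_March6 (1).csv', dtype=dtype)

# Ensure splits add up to 100% (isclose avoids an exact floating-point comparison)
assert math.isclose(TRAIN_SIZE + CV_SIZE + TEST_SIZE, 1.0)

# Split the data: first hold out (1 - TRAIN_SIZE) of the rows,
# then divide the held-out rows between TEST and CV
train, test = train_test_split(dataset, test_size=1-TRAIN_SIZE)
test, crossValidation = train_test_split(test, test_size=CV_SIZE/(1-TRAIN_SIZE))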

# Print Set Sizes
print(f"Total number of examples: {len(dataset)}")
print(f"Sizes of TRAIN, CV, TEST: [{len(train)},{len(crossValidation)},{len(test)}]")


Total number of examples: 20237
Sizes of TRAIN, CV, TEST: [16189,2025,2023]
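The CV and TEST sets above differ by two rows even though both are configured as 10%. The sketch below reproduces those sizes, assuming (from our reading of scikit-learn's behavior) that `train_test_split` rounds a fractional `test_size` up with `ceil` and gives the remainder to the other side. In Python, `1 - 0.8` evaluates to `0.19999999999999996`, so the second-stage fraction comes out slightly above 0.5 and CV gets the extra row. `split_sizes` is a hypothetical helper, not part of the notebook.

```python
import math

TRAIN_SIZE = 0.8
CV_SIZE = 0.1
TEST_SIZE = 0.1

def split_sizes(n_samples):
    """Predict TRAIN/CV/TEST sizes for the two-stage split (assumes ceil rounding)."""
    # Stage 1: hold out (1 - TRAIN_SIZE) of the rows for CV + TEST.
    n_held = math.ceil((1 - TRAIN_SIZE) * n_samples)
    n_train = n_samples - n_held
    # Stage 2: CV_SIZE / (1 - TRAIN_SIZE) of the held-out rows go to CV.
    n_cv = math.ceil(CV_SIZE / (1 - TRAIN_SIZE) * n_held)
    n_test = n_held - n_cv
    return n_train, n_cv, n_test

print(split_sizes(20237))  # (16189, 2025, 2023)
```

If an exact, reproducible split matters, `train_test_split` also accepts an integer `test_size` (an absolute row count) and a `random_state` seed.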




Next, we extract the two features used by this model (`numInQueue` and `daysLeftClean`) and the label (`waitTime`) for each split.

In [3]:
# Features: queue length and days left before the deadline; label: wait time
X_train = np.array(train[["numInQueue", "daysLeftClean"]])
y_train = np.array(train['waitTime'])
X_cross = np.array(crossValidation[["numInQueue", "daysLeftClean"]])
# Double brackets select a DataFrame, so Y_cross is 2-D (m, 1), unlike y_train and y_test
Y_cross = np.array(crossValidation[["waitTime"]])
X_test = np.array(test[["numInQueue", "daysLeftClean"]])
y_test = np.array(test['waitTime'])

print(f"X Shape: {X_train.shape}, X Type: {type(X_train)}")
print(f"y Shape: {y_train.shape}, y Type: {type(y_train)}")
print(f"X CV Shape: {X_cross.shape}, X CV Type: {type(X_cross)}")
print(f"y CV Shape: {Y_cross.shape}, y CV Type: {type(Y_cross)}")

X Shape: (16189, 2), X Type: <class 'numpy.ndarray'>
y Shape: (16189,), y Type: <class 'numpy.ndarray'>
X CV Shape: (2025, 2), X CV Type: <class 'numpy.ndarray'>
y CV Shape: (2025, 1), y CV Type: <class 'numpy.ndarray'>
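The CV labels have shape `(2025, 1)` while the training labels have shape `(16189,)`. The loop-based functions below index `y[i]`, so with an `(m, 1)` target `compute_cost` returns a length-1 array instead of a scalar. A vectorized version would fail more quietly, as this small NumPy sketch with made-up numbers shows: subtracting an `(m, 1)` array from an `(m,)` array broadcasts to `(m, m)`.

```python
import numpy as np

y_1d = np.array([21.0, 12.0, 9.0, 5.0])   # shape (4,), like np.array(df['waitTime'])
y_2d = y_1d.reshape(-1, 1)                 # shape (4, 1), like np.array(df[['waitTime']])
pred = np.array([9.4, 14.6, 17.1, 7.0])    # shape (4,), e.g. X @ w + b

diff_1d = pred - y_1d             # (4,): one residual per example, as intended
diff_2d = pred - y_2d             # (4, 4): broadcasting pairs every prediction with every target
diff_fixed = pred - y_2d.ravel()  # (4,): flattening the target restores one residual per example

print(diff_1d.shape, diff_2d.shape, diff_fixed.shape)  # (4,) (4, 4) (4,)
```

Selecting the label with single brackets (`crossValidation['waitTime']`) or calling `.ravel()` keeps all label arrays one-dimensional.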


In [4]:
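def compute_error_bins(X, y, w, b):
    """
    Count predictions by absolute error |f_wb(x) - y|
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter

    Returns:
      small (int):  examples with error <= 2
      medium (int): examples with 2 < error < 10
      large (int):  examples with error >= 10
    """
    m = X.shape[0]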
    small = 0
    medium = 0
    large = 0
    for i in range(m):                                
        f_wb_i = np.dot(X[i], w) + b           #(n,)(n,) = scalar (see np.dot)
        diff = abs(f_wb_i - y[i])
        if diff <= 2:
            small += 1
        elif diff >= 10:
            large += 1    
        else:
            medium += 1   
    return small, medium, large
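The same binning can be done without a Python loop. This is a sketch (the name `compute_error_bins_vectorized` is ours) that keeps the thresholds of `compute_error_bins` and also accepts `(m, 1)` targets.

```python
import numpy as np

def compute_error_bins_vectorized(X, y, w, b):
    """Vectorized sketch of compute_error_bins with the same thresholds.

    small: |error| <= 2, large: |error| >= 10, medium: everything in between.
    """
    y = np.ravel(y)                        # accept (m,) or (m, 1) targets
    abs_err = np.abs(X @ w + b - y)        # (m,) absolute errors
    small = int(np.sum(abs_err <= 2))
    large = int(np.sum(abs_err >= 10))
    medium = abs_err.size - small - large  # the two bins above never overlap
    return small, medium, large

# Toy check: predictions are [1, 2, 3, 4]; absolute errors are [0, 10, 5, 2]
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 12.0, 8.0, 6.0])
print(compute_error_bins_vectorized(X, y, np.array([1.0]), 0.0))  # (2, 1, 1)
```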

In [6]:
def predict(x, w, b): 
    """
    single predict using linear regression
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters   
      b (scalar):             model parameter 
      
    Returns:
      p (scalar):  prediction
    """
    p = np.dot(x, w) + b     
    return p    
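To make `predict` concrete, the sketch below plugs in the rounded parameters that the training run later in this notebook prints (`b = 4.48`, `w = [1.263, -0.019]`) for a hypothetical request with 5 students already in the queue and 2 days left before the deadline. The rounded values make this illustrative only.

```python
import numpy as np

# Rounded parameters from the training output in this notebook (illustrative only)
w = np.array([1.263, -0.019])   # weights for [numInQueue, daysLeftClean]
b = 4.48

# Hypothetical request: 5 students already in the queue, 2 days before the deadline
x = np.array([5.0, 2.0])
p = np.dot(x, w) + b            # same formula as predict(x, w, b)
print(f"{p:.3f}")               # 10.757
```

With these parameters, each additional student in the queue adds about 1.26 to the predicted wait, while each extra day before the deadline lowers it by only about 0.02.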

In [10]:
def compute_cost(X, y, w, b): 
    """
    Compute the squared-error cost J(w,b) = (1/(2m)) * sum_i (f_wb(x_i) - y_i)^2
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):                                
        f_wb_i = np.dot(X[i], w) + b           #(n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i])**2       #scalar
    cost = cost / (2 * m)                      #scalar    
    return cost
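`compute_cost` loops over all 16189 training rows on every call, and `gradient_descent` calls it once per iteration. A vectorized sketch (our own `compute_cost_vectorized`) computes the same value with array operations; the check below compares it against a copy of the loop version on random data.

```python
import numpy as np

def compute_cost_vectorized(X, y, w, b):
    """Same value as compute_cost, computed without a Python loop."""
    m = X.shape[0]
    err = X @ w + b - np.ravel(y)   # (m,) residuals; ravel guards against (m, 1) targets
    return float(err @ err) / (2 * m)

def compute_cost_loop(X, y, w, b):
    # Copy of the notebook's loop-based compute_cost, for comparison.
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b
        cost = cost + (f_wb_i - y[i]) ** 2
    return cost / (2 * m)

rng = np.random.default_rng(0)
X = rng.uniform(0, 20, size=(50, 2))
y = rng.uniform(0, 40, size=50)
w = np.array([1.2, -0.1])
b = 4.0
vec = compute_cost_vectorized(X, y, w, b)
loop = compute_cost_loop(X, y, w, b)
print(np.isclose(vec, loop))  # True
```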

In [7]:
def compute_gradient(X, y, w, b): 
    """
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
    """
    m,n = X.shape           #(number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):                             
        err = (np.dot(X[i], w) + b) - y[i]   
        for j in range(n):                         
            dj_dw[j] = dj_dw[j] + err * X[i, j]    
        dj_db = dj_db + err                        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m                                
        
    return dj_db, dj_dw
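The gradient can be vectorized the same way, and a central finite-difference check is a quick way to confirm that an analytic gradient matches its cost. This sketch uses synthetic data and hypothetical helper names; it returns `(dj_db, dj_dw)` in the same order as `compute_gradient`.

```python
import numpy as np

def cost(X, y, w, b):
    err = X @ w + b - y
    return (err @ err) / (2 * X.shape[0])

def gradient_vectorized(X, y, w, b):
    """Vectorized equivalent of compute_gradient; returns (dj_db, dj_dw)."""
    m = X.shape[0]
    err = X @ w + b - y          # (m,) residuals
    dj_dw = (X.T @ err) / m      # (n,)
    dj_db = err.sum() / m        # scalar
    return dj_db, dj_dw

def numerical_gradient(X, y, w, b, eps=1e-4):
    """Central finite differences of cost, used only as a correctness check."""
    dj_dw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        dj_dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * eps)
    dj_db = (cost(X, y, w, b + eps) - cost(X, y, w, b - eps)) / (2 * eps)
    return dj_db, dj_dw

rng = np.random.default_rng(1)
X = rng.uniform(0, 20, size=(30, 2))
y = rng.uniform(0, 40, size=30)
w = np.array([1.0, 1.0])
b = 0.01                          # same starting point as the training cell
db_a, dw_a = gradient_vectorized(X, y, w, b)
db_n, dw_n = numerical_gradient(X, y, w, b)
print(np.allclose(dw_a, dw_n, rtol=1e-6, atol=1e-6), abs(db_a - db_n) < 1e-6)
```

Because the cost is quadratic in `w` and `b`, central differences are exact up to floating-point rounding, so tight tolerances are reasonable here.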

In [8]:
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
      J_history (list) : Cost after each iteration (first 100000 iterations only)
      """
    
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  #avoid modifying global w within function
    b = b_in
    
    for i in range(num_iters):

        # Calculate the gradient at the current parameters
        dj_db, dj_dw = gradient_function(X, y, w, b)

        # Update parameters using w, b, alpha and the gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        # Save cost J at each iteration
        if i < 100000:      # prevent resource exhaustion
            J_history.append(cost_function(X, y, w, b))

        # Print the cost 10 times over the run (every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
        
    return w, b, J_history #return final w,b and J history for graphing
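In the training output below, the cost is still falling between iterations 800 and 900 (90.74 to 90.70), which suggests 1000 steps may not fully reach the minimum. For this unregularized cost the minimizer can also be computed directly with `np.linalg.lstsq`, which makes a useful sanity check. The sketch below uses synthetic data shaped loosely like the two LaIR features (the true parameters are made up) and compares the closed-form answer with plain batch gradient descent at a learning rate of 5.0e-3, the value that appears in the control panel, run for more iterations.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 200
# Synthetic features: an integer queue length and a fractional "days left" value
X = np.column_stack([rng.integers(0, 25, size=m), rng.uniform(0, 7, size=m)])
true_w, true_b = np.array([1.3, -0.2]), 4.5
y = X @ true_w + true_b + rng.normal(0, 1.0, size=m)

# Closed form: append a column of ones so the last coefficient is the intercept b.
A = np.column_stack([X, np.ones(m)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w_ls, b_ls = coef[:2], coef[2]

# Plain batch gradient descent on the same cost J = sum(err^2) / (2m).
w, b, alpha = np.zeros(2), 0.0, 5.0e-3
for _ in range(20000):
    err = X @ w + b - y
    w -= alpha * (X.T @ err) / m
    b -= alpha * err.mean()

print(w_ls, b_ls)
print(w, b)
```

Unscaled features make the cost surface elongated, so gradient descent needs many small steps to converge along the intercept direction; the closed form avoids that entirely for a model this small.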

In [11]:
b_init = 0.01
w_init = np.array([1, 1])

print(f"Type of w: {type(w_init)}, and type of b: {type(b_init)}")

# # Compute and display cost at the initial parameters
# cost = compute_cost(X_train, y_train, w_init, b_init)
# print(f'Cost at initial w : {cost}')

#Compute and display gradient 
tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)
print(f'dj_db at initial w,b: {tmp_dj_db}')
print(f'dj_dw at initial w,b: \n {tmp_dj_dw}')

# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.01
iterations = ITERATIONS
alpha = LEARNING_RATE

# run gradient descent 
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                                compute_cost, compute_gradient, 
                                                alpha, iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
m,_ = X_train.shape
for i in range(5):
    print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")

# # plot cost versus iteration  
# fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))
# ax1.plot(J_hist)
# ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])
# ax1.set_title("Cost vs. iteration");  ax2.set_title("Cost vs. iteration (tail)")
# ax1.set_ylabel('Cost')             ;  ax2.set_ylabel('Cost') 
# ax1.set_xlabel('iteration step')   ;  ax2.set_xlabel('iteration step') 
# plt.show()

# Compute the cost on the cross-validation set (Y_cross is (m, 1), so this prints a length-1 array)
print(compute_cost(X_cross, Y_cross, w_final, b_final))
small, medium, large = compute_error_bins(X_cross, Y_cross, w_final, b_final)
print(f"small error: {small}, medium error: {medium}, and large error: {large}")

# # Plot numInQueue (column 0) against wait time; the red line is the fitted
# # model with daysLeftClean held at 0
# plt.scatter(X_train[:, 0], y_train)
# plt.xlabel('Num in Queue')
# plt.ylabel('Wait Time')
# plt.title("Num In Queue vs Wait Time")
# plt.axline((0, b_final), slope=w_final[0], linewidth=4, color='r')
# plt.show()

Type of w: <class 'numpy.ndarray'>, and type of b: <class 'float'>
dj_db at initial w,b: -5.892288591018763
dj_dw at initial w,b: 
 [-96.153  -1.735]
Iteration    0: Cost   112.61   
Iteration  100: Cost    94.08   
Iteration  200: Cost    92.74   
Iteration  300: Cost    91.91   
Iteration  400: Cost    91.41   
Iteration  500: Cost    91.11   
Iteration  600: Cost    90.92   
Iteration  700: Cost    90.81   
Iteration  800: Cost    90.74   
Iteration  900: Cost    90.70   
b,w found by gradient descent: 4.48,[ 1.263 -0.019] 
prediction: 9.43, target value: 21
prediction: 14.59, target value: 12
prediction: 17.10, target value: 9
prediction: 7.00, target value: 5
prediction: 8.16, target value: 7
[72.092]
small error: 331, medium error: 1238, and large error: 456
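The cross-validation numbers above are easier to read as an error in the label's own units and as bin percentages. Because `compute_cost` divides by `2m`, the mean squared error is twice the printed cost. This sketch copies the printed values (cost 72.092, bins 331/1238/456); reading the result in minutes assumes `waitTime` is recorded in minutes.

```python
import math

# Values copied from the cross-validation output above
cv_cost = 72.092                 # J = sum(err^2) / (2m) on the CV set
small, medium, large = 331, 1238, 456

n_cv = small + medium + large    # 2025 CV examples
mse = 2 * cv_cost                # undo the 1/2 factor in compute_cost
rmse = math.sqrt(mse)            # root-mean-squared error, in waitTime units

print(f"CV examples: {n_cv}")    # CV examples: 2025
print(f"RMSE: {rmse:.2f}")       # RMSE: 12.01
for name, count in [("small (<= 2)", small), ("medium", medium), ("large (>= 10)", large)]:
    print(f"{name}: {count / n_cv:.1%}")
```

The bins work out to roughly 16% small, 61% medium, and 23% large errors, and an RMSE of about 12 is of the same order as the 15-minute help window mentioned in the introduction.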
