We analyze the results of the best models obtained by cross-validation.

# Imports

In [1]:
import statsmodels.api as sm
import pandas as pd
from pickle import load

# Load data and models

In [2]:
from src.utils import get_data

X_train, X_test, y_train, y_test = get_data()

In [3]:
with open("../results/regression/gb_reg.pkl", "rb") as f:
    gb_reg = load(f)

In [4]:
with open("../results/regression/lin_reg.pkl", "rb") as f:
    lin_reg = load(f)

In [5]:
# Check scores

for model, name in [(lin_reg, "lin_reg"), (gb_reg, "gb_reg")]:
    print("Model:", name)
    print(f"Score on training set = {model.score(X_train, y_train):.3f}")
    print(f"Score on test set = {model.score(X_test, y_test):.3f}\n")

Model: lin_reg
Score on training set = 0.588
Score on test set = 0.640

Model: gb_reg
Score on training set = 0.639
Score on test set = 0.625



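The linear model scores slightly higher on the test set than on the training set (0.640 vs. 0.588). This discrepancy is probably due to chance: other random seeds for the split give different (but similar) results.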
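The effect of the split seed can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are synthetic (not the course data), the model is a one-predictor least-squares line fitted in closed form with the standard library, and the names `fit_line`, `r_squared`, and `train_test_scores` are hypothetical helpers introduced here.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one predictor (closed form)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on the given points."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data (hypothetical, NOT the course data): a noisy linear relation.
data_rng = random.Random(0)
points = []
for _ in range(800):
    x = data_rng.uniform(40, 100)
    points.append((x, 0.7 * x + data_rng.gauss(0, 13)))


def train_test_scores(seed, test_fraction=0.25):
    """Shuffle with the given seed, split, fit on train, score both parts."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    x_tr, y_tr = zip(*train)
    x_te, y_te = zip(*test)
    slope, intercept = fit_line(x_tr, y_tr)
    return (
        r_squared(x_tr, y_tr, slope, intercept),
        r_squared(x_te, y_te, slope, intercept),
    )


scores = [train_test_scores(seed) for seed in range(10)]
for seed, (train_r2, test_r2) in enumerate(scores):
    print(f"seed={seed}: train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

If the gap between the two scores changes sign from one seed to the next, that supports reading a small gap as split noise rather than as a property of the model.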

# Feature importance

In [6]:
for feature, weight in zip(X_train.columns, gb_reg.feature_importances_):
    print(f"{feature}: {weight:.3f}")

Quiz: 0.030
HW: 0.047
Participation: 0.000
Midterm: 0.923


Clearly, the midterm is what matters most for the final exam score. This makes sense: the midterm and the final are both taken in class, whereas the other assignments are done at home.

# Statistical analysis

Let us perform a simple statistical analysis for the (essentially optimal) linear regression model.

### Linear regression weights

In [7]:
# Check coefficients

print(f"Intercept: {lin_reg.intercept_:.3f}")
for feature, weight in zip(X_train.columns, lin_reg.coef_):
    print(f"{feature} weight: {weight:.3f}")

Intercept: 0.280
Quiz weight: 0.124
HW weight: 1.041
Participation weight: 0.024
Midterm weight: 0.720


The intercept is only 0.28, so there are essentially no "free" points: the predicted grade comes almost entirely from the weighted grades.
- The participation grade is essentially irrelevant. Getting full marks (5) would only increase the final exam grade by about 0.1, on average. *This does not mean that participation does not matter!*
- The same can be said for quizzes (full marks 20), which add at most 20*0.124 ~ 2.5 points.
- The HW grade seems to have a massive effect: a student with 20/20 on the HW can expect a final exam grade 20*1.04 ~ 21 points higher than a student with 0/20. Nonetheless, this is mostly because most students get very high HW grades, which can be explained by the fact that it is an at-home assignment with no time limit, by lax grading, or by academic dishonesty. The HW contribution therefore acts almost like an intercept of about 21 points, and only students who really slack get low HW grades.
- As expected, the midterm is what matters the most: each point on the midterm translates to 0.72 points on the final, on average. For instance, a student with a 90 on the midterm and 20/20 on the HW can expect about 90*0.72 + 21 + 0.28 ~ 86 on the final, before the small quiz and participation contributions.
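The arithmetic behind these readings can be checked by hand. The sketch below copies the rounded coefficients from the printed output and applies them to one hypothetical student; `predict_final` is an illustrative helper, not part of the notebook.

```python
# Coefficients copied from the printed output above (rounded to 3 decimals).
intercept = 0.280
weights = {"Quiz": 0.124, "HW": 1.041, "Participation": 0.024, "Midterm": 0.720}


def predict_final(student):
    """Linear prediction: intercept plus the weighted sum of the four grades."""
    return intercept + sum(weights[name] * student[name] for name in weights)


# Hypothetical student chosen for illustration.
student = {"Quiz": 18, "HW": 19, "Participation": 5, "Midterm": 95}

# Contribution of each grade to the predicted final exam grade.
for name in weights:
    print(f"{name}: {weights[name] * student[name]:.2f} points")
print(f"Predicted final: {predict_final(student):.2f}")
```

With the rounded coefficients the prediction is about 90.81. The midterm and HW terms account for nearly all of it, while the intercept, quizzes, and participation together add under 3 points.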

### p-values

In [8]:
X_train_sm = sm.add_constant(X_train)
lin_reg_sm = sm.OLS(y_train, X_train_sm).fit()
print(lin_reg_sm.summary())

                            OLS Regression Results                            
Dep. Variable:                  Final   R-squared:                       0.588
Model:                            OLS   Adj. R-squared:                  0.586
Method:                 Least Squares   F-statistic:                     213.6
Date:                Mon, 07 Jul 2025   Prob (F-statistic):          1.00e-113
Time:                        14:31:13   Log-Likelihood:                -2405.4
No. Observations:                 603   AIC:                             4821.
Df Residuals:                     598   BIC:                             4843.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
const             0.2796      4.436      0.063

The p-values confirm the previous findings.

### Prediction

We predict the final exam grade of a few students, including confidence and prediction intervals. *Note that we need to include 1 for the intercept.*

We consider the following students.
- One who does excellently on all assignments.
- One who does poorly on all assignments, but still shows up to class.
- One who does well on at-home assignments, does not participate much, and does poorly on the midterm. Maybe a student who is not totally honest.
- A student who only does quick and important assignments (quizzes and midterm) but does not do HW or come to class. Maybe a student who studied the material before and may be a bit overconfident.
- A student who does poorly on timed assignments, but tries their best by doing their homework earnestly and coming to class.

In [9]:
X_pred = pd.DataFrame(
    {
        "Intercept": [1, 1, 1, 1, 1],
        "Quiz": [18, 8, 19, 17, 12],
        "HW": [19, 11, 20, 6, 19],
        "Participation": [5, 5, 3, 0, 5],
        "Midterm": [95, 60, 50, 85, 65],
    }
)

In [10]:
predictions = lin_reg_sm.get_prediction(X_pred)

In [11]:
results = X_pred.copy().drop(["Intercept"], axis=1)
results["Predicted final exam grade"] = predictions.predicted
# Confidence interval
results[["CI lower", "CI upper"]] = predictions.conf_int()
# Prediction interval
results[["PI lower", "PI upper"]] = predictions.summary_frame()[
    ["obs_ci_lower", "obs_ci_upper"]
]

In [12]:
results

| | Quiz | HW | Participation | Midterm | Predicted final exam grade | CI lower | CI upper | PI lower | PI upper |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18 | 19 | 5 | 95 | 90.85376 | 89.250008 | 92.457512 | 65.034567 | 116.672954 |
| 1 | 8 | 11 | 5 | 60 | 56.068672 | 51.876381 | 60.260962 | 29.96055 | 82.176794 |
| 2 | 19 | 20 | 3 | 50 | 59.553331 | 56.284938 | 62.821723 | 33.577551 | 85.52911 |
| 3 | 17 | 6 | 0 | 85 | 69.872185 | 61.558073 | 78.186297 | 42.794829 | 96.94954 |
| 4 | 12 | 19 | 5 | 65 | 68.496361 | 65.6639 | 71.328822 | 42.571825 | 94.420897 |


Interestingly, the last two students have essentially equal predicted grades, but there is more uncertainty for the "good but slacking" student than for the "hardworking but not-so-successful" student: the confidence interval of the former is roughly three times as wide.
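One plausible reading of this difference is leverage: in least squares, the standard error of the mean prediction grows as the input moves away from the bulk of the training data, and a student with very low HW and no participation is presumably such an unusual profile. The sketch below illustrates the idea under simplifying assumptions: synthetic data (not the course data), a single predictor, and the textbook closed-form standard errors. In the multiple regression used here, the leverage term would be x0' (X'X)^-1 x0 instead.

```python
import math
import random

# Synthetic data (hypothetical, NOT the course data): most x values are high,
# so a low x is an unusual, high-leverage input.
rng = random.Random(1)
xs = [rng.uniform(14, 20) for _ in range(200)]
ys = [30 + 2.0 * x + rng.gauss(0, 5) for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
intercept = mean_y - slope * mean_x

# Residual standard deviation with n - 2 degrees of freedom.
residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(residual_ss / (n - 2))


def standard_errors(x0):
    """Return (SE of the mean prediction, SE of a new observation) at x0."""
    leverage = 1 / n + (x0 - mean_x) ** 2 / sxx
    return s * math.sqrt(leverage), s * math.sqrt(1 + leverage)


for label, x0 in [("typical", 17.0), ("unusual", 6.0)]:
    se_mean, se_obs = standard_errors(x0)
    print(f"{label} x0={x0}: SE(mean) = {se_mean:.2f}, SE(new obs) = {se_obs:.2f}")
```

Each interval is the prediction plus or minus a t quantile times the corresponding standard error. The standard error of a new observation also contains the residual variance, which is why the prediction intervals in the table are wide for every student and differ much less between students than the confidence intervals do.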

In [13]:
# Save results

results.to_csv('../results/regression/predictions.csv')