# **Linear Regression on Study Hours vs GPA**

This report demonstrates how to apply simple linear regression to predict a student's GPA from the number of hours they study. The process includes importing libraries, loading data, preparing inputs, solving for the coefficients with the normal equation, fitting the same model with scikit-learn, and making a prediction.

# **1. Importing Required Libraries**

We import libraries for numerical computation, data handling, and visualization.

In [None]:
# Importing libraries for numerical operations, data handling, and plotting
import numpy as np             # For numerical operations
import pandas as pd            # For data manipulation and analysis
import matplotlib              # For 2D plotting and visualization
from matplotlib import pyplot as plt  # For plotting graphs
import seaborn as sns          # For statistical data visualization

# **2. Loading the Dataset**

We read the dataset study-hours.csv into a pandas DataFrame.

In [None]:
# Load data from a CSV file into a DataFrame
df = pd.read_csv("datasets/study-hours.csv")  # Read CSV file into a pandas DataFrame named df
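
Before going further, it can help to confirm that the file loaded as expected. The sketch below assumes the CSV has the two columns used later in this report, study_hours and gpa. The in-memory sample and its values are made up for illustration and stand in for datasets/study-hours.csv; the names sample_csv, sample_df, required, and missing are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample standing in for datasets/study-hours.csv (values are made up)
sample_csv = io.StringIO(
    "study_hours,gpa\n"
    "2.0,2.5\n"
    "4.0,3.0\n"
    "6.0,3.5\n"
)
sample_df = pd.read_csv(sample_csv)

# Check that the columns used later in the report are present
required = {"study_hours", "gpa"}
missing = required - set(sample_df.columns)
if missing:
    raise ValueError(f"Missing columns: {sorted(missing)}")

# Count missing values per column; NaN entries would propagate into the fit
n_missing = sample_df[["study_hours", "gpa"]].isna().sum()
print(sample_df.shape)
print(n_missing.to_dict())
```

The same checks could be run on df itself once the real file is loaded.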

# **3. Extracting Input and Output Data**

We extract the study hours and GPA as NumPy arrays for processing.

In [None]:
# Extract 'study_hours' and 'gpa' columns as NumPy arrays
sh = np.array(df['study_hours'])  # Convert 'study_hours' column to NumPy array
y = np.array(df['gpa'])           # Convert 'gpa' column to NumPy array

# **4. Reshaping the Input Feature**

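The input sh is reshaped from a 1D array of length n into a 2D array of shape (n, 1), one row per student and one column for the single feature. scikit-learn expects its input in this (n_samples, n_features) layout.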

In [None]:
# Reshape 'sh' array to a 2D column vector
sh = sh.reshape(sh.size, 1)  # Change shape from (n,) to (n,1) for model input
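
As a small illustration of the shape change, the sketch below reshapes a made-up array. It uses reshape(-1, 1), which is equivalent to reshape(sh.size, 1) for a 1D array; the names hours and hours_col and the values are hypothetical.

```python
import numpy as np

# Hypothetical study-hours values (made up for illustration)
hours = np.array([2.0, 4.0, 6.0])
print(hours.shape)       # 1D array

# -1 lets NumPy infer the number of rows from the array length
hours_col = hours.reshape(-1, 1)
print(hours_col.shape)   # 2D column vector
```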

# **5. Creating the Design Matrix**

We prepend a column of ones to sh so that the intercept (bias) term is estimated together with the slope. Each row of the design matrix then has the form [1, study_hours].

In [None]:
# Add a column of ones to 'sh' for the bias term in linear regression
Z = np.hstack((np.ones((sh.size,1)), sh))  # Horizontally stack ones with 'sh' to form design matrix
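
To see why the column of ones carries the intercept, the sketch below multiplies a small design matrix by a coefficient vector. The inputs and the coefficients in beta_demo are made up for illustration and are not fitted values from the dataset.

```python
import numpy as np

# Hypothetical inputs, already shaped as a column vector (3, 1)
hours_col = np.array([[2.0], [4.0], [6.0]])

# Same construction as in the report: ones in column 0, study hours in column 1
Z_demo = np.hstack((np.ones((hours_col.shape[0], 1)), hours_col))
print(Z_demo)

# Hypothetical [intercept, slope], chosen only to illustrate the arithmetic
beta_demo = np.array([2.0, 0.25])

# Each row [1, x] dotted with [b0, b1] gives b0 + b1 * x
pred = Z_demo @ beta_demo
print(pred)
```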

# **6. Solving for Parameters Using the Normal Equation**

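We compute the least-squares coefficients in closed form using the normal equation, β = (XᵀX)⁻¹Xᵀy. The result beta holds the intercept in beta[0] and the slope in beta[1].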

In [None]:
# Compute regression coefficients using the Normal Equation
X = Z  # Assign design matrix to X for clarity
beta = np.linalg.inv(X.T @ X) @ X.T @ y  # beta = (XᵀX)⁻¹Xᵀy, gives best-fit line parameters

# Alternative: use pseudoinverse for stability
# beta = np.linalg.pinv(X) @ y  # More stable way to compute beta when XᵀX is not invertible
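
Explicitly inverting XᵀX is usually unnecessary. As a hedged alternative, the sketch below solves the same system with np.linalg.solve and with np.linalg.lstsq on synthetic data that lies exactly on a known line, so the recovered coefficients can be checked. The data and the names hours, gpa, X_demo, beta_solve, and beta_lstsq are made up for illustration.

```python
import numpy as np

# Synthetic data lying exactly on gpa = 2.0 + 0.25 * hours (made up)
hours = np.array([1.0, 2.0, 4.0, 6.0, 8.0])
gpa = 2.0 + 0.25 * hours

# Design matrix with a leading column of ones
X_demo = np.column_stack((np.ones_like(hours), hours))

# Solve (X^T X) beta = X^T y directly instead of forming the inverse
beta_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ gpa)

# Least-squares solver; rcond=None selects NumPy's default cutoff
beta_lstsq, residuals, rank, sv = np.linalg.lstsq(X_demo, gpa, rcond=None)

print(beta_solve)
print(beta_lstsq)
```

Both calls should recover an intercept close to 2.0 and a slope close to 0.25, up to floating-point error.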

# **7. Fitting a Linear Regression Model with Scikit-learn**

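We fit the same model using LinearRegression from scikit-learn and extract the learned parameters. LinearRegression fits an intercept by default, so we pass sh directly rather than the design matrix with its column of ones.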

In [None]:
# Train a linear regression model and print the intercept and slope
from sklearn.linear_model import LinearRegression        # Import LinearRegression class
model = LinearRegression()                               # Create a LinearRegression model object
model.fit(sh, y)                                          # Fit the model to input features 'sh' and target 'y'
print("Y-intercept (β₀): ", model.intercept_)            # Print the learned intercept (β₀)
print("Slope (β₁): ", model.coef_[0])                    # Print the learned slope (β₁); coef_ is an array with one entry per feature
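
The intercept and slope reported by scikit-learn should agree with the coefficients from the normal equation, since both solve the same least-squares problem. The sketch below checks this on made-up data; the values and the names hours, gpa, X_demo, beta_demo, and model_demo are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only
hours = np.array([1.0, 2.0, 4.0, 6.0, 8.0]).reshape(-1, 1)
gpa = np.array([2.2, 2.6, 2.9, 3.6, 3.9])

# Closed-form fit via the pseudoinverse of the design matrix
X_demo = np.hstack((np.ones((hours.shape[0], 1)), hours))
beta_demo = np.linalg.pinv(X_demo) @ gpa

# scikit-learn fit; the intercept is handled internally
model_demo = LinearRegression().fit(hours, gpa)

# Compare [intercept, slope] from both approaches
same = np.allclose(beta_demo, [model_demo.intercept_, model_demo.coef_[0]])
print(same)
```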

# **8. Making a Prediction**

We use the coefficients computed with the normal equation (beta) to predict GPA for a new value of study hours: predicted GPA = beta[0] + beta[1] × study hours.

In [None]:
# Predict GPA for 5.0 study hours using the regression equation
studyhours = 5.0                                               # Set new input value for study hours
predicted_gpa = beta[0] + beta[1] * studyhours                 # Compute predicted GPA using beta coefficients
print("Predicted GPA value for new study hours: {:.4f}".format(predicted_gpa))  # Print predicted GPA rounded to 4 decimals
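
The same equation extends to several inputs at once by building a design matrix for the new values. In the sketch below, the coefficients in beta_demo and the inputs in new_hours are made up for illustration and are not the fitted values from the dataset.

```python
import numpy as np

# Hypothetical [intercept, slope]; in the report these would come from beta
beta_demo = np.array([2.0, 0.25])

# Hypothetical new study-hours values
new_hours = np.array([3.0, 5.0, 7.5])

# Build [1, x] rows and apply the regression equation to all of them at once
X_new = np.column_stack((np.ones_like(new_hours), new_hours))
predictions = X_new @ beta_demo

for h, p in zip(new_hours, predictions):
    print("{:.1f} hours -> predicted GPA {:.4f}".format(h, p))
```

Because the model is a straight line, predictions for study hours far outside the range seen in the data should be treated with caution.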