 # Linear Regression with Normal Equation (From Scratch)

 ## Introduction

 This notebook implements Linear Regression from scratch on the student dataset. It solves for the weights in closed form with the Normal Equation rather than by iterative optimization.
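 For reference, the Normal Equation comes from minimizing the sum of squared errors. Here $X \in \mathbb{R}^{n \times (d+1)}$ is assumed to be the feature matrix with a leading column of ones for the intercept, $y \in \mathbb{R}^n$ the target vector, and $\theta$ the weight vector:

$$
\begin{aligned}
J(\theta) &= \lVert y - X\theta \rVert_2^2 \\
\nabla_\theta J(\theta) &= -2X^\top (y - X\theta) = 0 \\
X^\top X\,\theta &= X^\top y \\
\theta &= (X^\top X)^{-1} X^\top y
\end{aligned}
$$

 The last step assumes $X^\top X$ is invertible (i.e. $X$ has full column rank). When it is not, a pseudo-inverse or a least-squares solver is typically used instead.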

 ## Data Loading

 Load the preprocessed train/test splits of the student dataset, stored as NumPy `.npy` arrays under `data/processed/`.
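 The loaded feature and target arrays must agree in shape before fitting. Below is a minimal sketch of a hypothetical helper, `check_split_shapes` (not part of the project's `src` package), that fails fast if the splits are inconsistent:

```python
import numpy as np


def check_split_shapes(X_train, X_test, y_train, y_test):
    """Raise ValueError if the train/test arrays are mutually inconsistent.

    Returns the number of feature columns when all checks pass.
    """
    # Feature matrices must be 2-D: (n_samples, n_features)
    if X_train.ndim != 2 or X_test.ndim != 2:
        raise ValueError("feature arrays must be 2-D")
    # Train and test must share the same feature columns
    if X_train.shape[1] != X_test.shape[1]:
        raise ValueError("train and test have different numbers of features")
    # Every feature row needs exactly one target value
    if X_train.shape[0] != y_train.shape[0] or X_test.shape[0] != y_test.shape[0]:
        raise ValueError("feature and target row counts differ")
    return X_train.shape[1]
```

 Calling `check_split_shapes(X_train, X_test, y_train, y_test)` right after loading surfaces a mismatched file immediately instead of as a cryptic matrix-shape error inside `fit`.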

In [None]:
import os
import sys

import numpy as np

# Resolve the project root (three levels above this notebook) and make `src` importable
project_root = os.path.abspath(os.path.join(os.getcwd(), "..", "..", ".."))
if project_root not in sys.path:
    sys.path.append(project_root)

from src.scratch.utils.viz_utils import plot_scatter_for_regression

# Build data paths from the project root instead of repeating relative paths
data_dir = os.path.join(project_root, "data", "processed")
X_train = np.load(os.path.join(data_dir, "student_X_train.npy"))
X_test = np.load(os.path.join(data_dir, "student_X_test.npy"))
y_train = np.load(os.path.join(data_dir, "student_y_train.npy"))
y_test = np.load(os.path.join(data_dir, "student_y_test.npy"))

print("Training features shape:", X_train.shape)
print("Test features shape:", X_test.shape)
print("Training target shape:", y_train.shape)
print("Test target shape:", y_test.shape)



 ## Exploratory Data Analysis

 Plot the first feature against the target on the training set to check for an approximately linear relationship.
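 The scatter plot shows one feature at a time. As a numeric complement, here is a minimal sketch (the helper name `feature_target_correlations` is hypothetical) that computes the Pearson correlation between every feature column and the target:

```python
import numpy as np


def feature_target_correlations(X, y):
    """Pearson correlation between each column of X and the target y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    # Center features and target so the dot products become covariances
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # r_j = sum(xc_j * yc) / sqrt(sum(xc_j^2) * sum(yc^2))
    numerator = Xc.T @ yc
    denominator = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return numerator / denominator
```

 For example, `feature_target_correlations(X_train, y_train)` returns one coefficient per column. Values near ±1 indicate a strong linear relationship, which is the case where a linear model fits well. A constant column yields `nan` because its variance is zero.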

In [None]:
plot_scatter_for_regression(X_train, y_train, feature_index=0, title="Feature 1 vs Target", filename="feature1_vs_target_normal_scratch.png")


 ## Model Initialization

 Create the from-scratch `LinearRegression` model with `method='normal'`, which selects the closed-form Normal Equation solver.
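 The project's `LinearRegression` class is not shown in this notebook. For intuition, here is a minimal sketch of what a Normal Equation fit can look like, assuming the intercept is handled with a bias column of ones. `NormalEquationRegressor` is a hypothetical name, not the `src.scratch.models` API:

```python
import numpy as np


class NormalEquationRegressor:
    """Hypothetical minimal linear regression fitted with the Normal Equation."""

    def __init__(self):
        self.intercept_ = 0.0
        self.coef_ = None

    @staticmethod
    def _add_bias(X):
        # Prepend a column of ones so theta[0] acts as the intercept
        X = np.asarray(X, dtype=float)
        return np.hstack([np.ones((X.shape[0], 1)), X])

    def fit(self, X, y):
        Xb = self._add_bias(X)
        y = np.asarray(y, dtype=float).ravel()
        # Solve (X^T X) theta = X^T y directly instead of forming the inverse
        theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
        self.intercept_, self.coef_ = theta[0], theta[1:]
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return X @ self.coef_ + self.intercept_
```

 `np.linalg.solve` raises `LinAlgError` when $X^\top X$ is singular (for example, with perfectly collinear features). Swapping in `np.linalg.lstsq(Xb, y, rcond=None)[0]` or `np.linalg.pinv(Xb) @ y` is a common fallback.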

In [None]:
from src.scratch.models.linear_regression import LinearRegression

model = LinearRegression(method='normal')


 ## Training

 Fit the model on the training set and record the wall-clock training time.
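 A single timing of a fast closed-form fit can be dominated by noise. Below is a hedged sketch of a hypothetical helper, `time_fit`, that repeats the fit and keeps the fastest run, in line with the advice in Python's `timeit` documentation to treat the minimum as the most representative value:

```python
import time


def time_fit(fit_fn, n_repeats=5):
    """Return the fastest of n_repeats wall-clock timings of fit_fn()."""
    timings = []
    for _ in range(n_repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fit_fn()
        timings.append(time.perf_counter() - start)
    # The minimum is the run least affected by background load on the machine
    return min(timings)
```

 Usage: `time_fit(lambda: model.fit(X_train, y_train))`. Refitting is harmless here because the Normal Equation produces the same weights on every run.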

In [None]:
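import time

# perf_counter is a monotonic, high-resolution clock suited to measuring short intervals
start_time = time.perf_counter()
model.fit(X_train, y_train)
training_time = time.perf_counter() - start_time
print(f"Training Time: {training_time:.4f} seconds")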


 ## Evaluation

 Compute the mean squared error (MSE) and the coefficient of determination (R²) on the held-out test set.
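 The project's `metrics` module is not shown. Assuming it follows the standard definitions, $\text{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$ and $R^2 = 1 - \text{SS}_\text{res} / \text{SS}_\text{tot}$, a NumPy sketch looks like this. The names `mse_score` and `r2_from_scratch` are hypothetical and chosen so they do not shadow the imported functions:

```python
import numpy as np


def mse_score(y_true, y_pred):
    """Mean squared error: the average of squared residuals."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return np.mean((y_true - y_pred) ** 2)


def r2_from_scratch(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (1 is a perfect fit, 0 matches predicting the mean)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```

 R² can be negative on a test set when the model does worse than always predicting the test-set mean.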

In [None]:
from src.scratch.utils.metrics import mean_squared_error, r2_score

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
print(f"R² Score: {r2:.4f}")


 ## Visualizations

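 Plot the test-set predictions and residual diagnostics. There is no learning curve: the Normal Equation computes the weights in a single closed-form step, so there are no iterations to track.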
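 The four plots are visual checks on the residuals. Below is a minimal sketch of a hypothetical helper, `residual_summary`, that reports the same properties numerically. It assumes residuals are defined as actual minus predicted, $y - \hat{y}$:

```python
import numpy as np


def residual_summary(y_true, y_pred):
    """Numeric companion to the residual plots: center, spread, skewness, excess kurtosis."""
    residuals = np.asarray(y_true, dtype=float).ravel() - np.asarray(y_pred, dtype=float).ravel()
    mean = residuals.mean()
    std = residuals.std()  # population standard deviation
    z = (residuals - mean) / std
    return {
        "mean": mean,                              # near 0 if predictions are unbiased
        "std": std,                                # typical size of an error
        "skewness": np.mean(z ** 3),               # near 0 for symmetric residuals
        "excess_kurtosis": np.mean(z ** 4) - 3.0,  # near 0 for normally distributed residuals
    }
```

 Usage: `residual_summary(y_test, y_pred)`. Large skewness or excess kurtosis corresponds to a skewed histogram or curved tails in the Q-Q plot. If every residual is identical, `std` is zero and the standardized moments are undefined.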

In [None]:
from src.scratch.utils.viz_utils import plot_actual_vs_predicted, plot_residuals_vs_predicted, plot_qq_residuals, plot_residual_histogram

plot_actual_vs_predicted(y_test, y_pred, title="Actual vs Predicted (Normal Eq Scratch)", filename="actual_vs_predicted_normal_scratch.png")
plot_residuals_vs_predicted(y_test, y_pred, title="Residuals vs Predicted (Normal Eq Scratch)", filename="residuals_vs_predicted_normal_scratch.png")
plot_qq_residuals(y_test, y_pred, title="Q-Q Plot of Residuals (Normal Eq Scratch)", filename="qq_residuals_normal_scratch.png")
plot_residual_histogram(y_test, y_pred, title="Residual Histogram (Normal Eq Scratch)", filename="residual_histogram_normal_scratch.png")


 ## Conclusion

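 The Normal Equation model's test-set MSE and R² are printed by the evaluation cell above. Because the weights are computed in closed form, there is no convergence behavior to inspect. The diagnostic plots instead show how closely the predictions track the actual values and whether the residuals look centered, evenly spread, and approximately normal.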
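 Markdown cells do not evaluate Python expressions. To place the computed numbers in the write-up, one option is to build the sentence in a code cell. `format_summary` is a hypothetical helper:

```python
def format_summary(mse, r2):
    """Build the conclusion sentence from the computed test-set metrics."""
    return (
        f"The Normal Equation model achieved a test MSE of {mse:.4f} "
        f"and an R² of {r2:.4f}."
    )
```

 Usage: `print(format_summary(mse, r2))`, run after the evaluation cell so that `mse` and `r2` are defined.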