
cpp-linear-regression

This project implements multiple linear regression in C++ from scratch, featuring both gradient descent and the analytical normal equation solution.


What This Project Does

This project implements linear regression two ways:

  • Gradient Descent — iteratively nudges weights toward lower loss
  • Normal Equation — computes the exact optimal weights in one shot using linear algebra

Both methods are implemented without any ML libraries. Everything from matrix operations to normalization is written from scratch in C++.


The Math

Model

Given a sample with features $x_1, x_2, \ldots, x_k$, the model predicts:

$$\hat{y} = w_1 x_1 + w_2 x_2 + \cdots + w_k x_k + b$$

Loss Function (MSE)

$$\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Our goal is to minimize this loss function.

Gradient Descent

Weights are updated each epoch by computing the gradient of the loss:

$$w_j \leftarrow w_j + \frac{\alpha}{n} \sum_{i=1}^{n} x_{ij}(y_i - \hat{y}_i)$$

where $\alpha$ is the learning rate; the constant factor of 2 from the derivative is absorbed into $\alpha$.

This update is repeated for a fixed number of epochs, or until the loss converges. Each step moves the weights in the direction of the negative gradient, steadily reducing the loss.

Normal Equation (Analytical Solution)

Instead of iterating, we can solve for the exact optimal weights directly using the normal equation.

Derivation

To find the optimal weights $\mathbf{w}$ that minimize the Mean Squared Error (MSE) loss, we take the derivative of the matrix loss function with respect to $\mathbf{w}$ and set it to zero.

The MSE loss function in matrix form is: $$\mathcal{L}(\mathbf{w}) = \frac{1}{n} (\mathbf{X}\mathbf{w} - \mathbf{y})^T (\mathbf{X}\mathbf{w} - \mathbf{y})$$

Expanding the expression: $$\mathcal{L}(\mathbf{w}) = \frac{1}{n} (\mathbf{w}^T\mathbf{X}^T\mathbf{X}\mathbf{w} - 2\mathbf{w}^T\mathbf{X}^T\mathbf{y} + \mathbf{y}^T\mathbf{y})$$

Taking the gradient with respect to $\mathbf{w}$ and setting it to zero: $$\nabla_{\mathbf{w}} \mathcal{L} = \frac{2}{n} (\mathbf{X}^T\mathbf{X}\mathbf{w} - \mathbf{X}^T\mathbf{y}) = 0$$

$$\mathbf{X}^T\mathbf{X}\mathbf{w} = \mathbf{X}^T\mathbf{y}$$

Multiplying both sides by the inverse of $(\mathbf{X}^T\mathbf{X})$ yields the Normal Equation: $$\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

where $\mathbf{X}$ is the data matrix with a prepended column of 1s (for the bias term). Computing this directly requires a matrix inverse using Gauss-Jordan elimination, which is $O(F^3)$ in time complexity for $F$ features.

Evaluation: R-Squared

$$R^2 = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$$

An $R^2$ close to 1.0 means the model explains nearly all variance in the data.


Project Structure

```
cpp-linear-regression/
├── data/
├── includes/
│   ├── regression.hpp      ← LinearRegression class
│   ├── normalizer.hpp      ← Z-score normalization
│   └── matrix.hpp          ← Matrix class with inverse
├── src/
│   ├── regression.cc
│   ├── normalizer.cc
│   ├── matrix.cc           ← Gauss-Jordan elimination
│   └── driver.cc           ← Training and evaluation
└── Makefile
```

Classes

LinearRegression

| Method | Description |
| --- | --- |
| predict(x) | Dot product of weights and input + bias |
| computeLoss(X, y) | Mean squared error |
| train(X, y, lr, epochs) | Gradient descent training loop |
| trainAnalytical(X, y) | Normal equation — exact closed-form solution |
| computeR2(X, y) | R-squared metric |
| getWeights(), getBias() | Accessors |

Normalizer

Implements the fit/transform pattern to prevent data leakage.

| Method | Description |
| --- | --- |
| fit(X) | Compute mean and std from training data only |
| transform(X) | Z-score normalize a full dataset |
| transformSingle(x) | Normalize a single sample at inference time |

Z-score formula: $x_{norm} = \frac{x - \mu}{\sigma}$

Matrix

Full matrix class supporting the operations needed for the normal equation.

| Method | Description |
| --- | --- |
| multiply(B) | Matrix multiplication |
| transpose() | Flip rows and columns |
| inverse() | Gauss-Jordan elimination on augmented matrix $[A \mid I]$ |
| add(B) | Element-wise addition |
| scalarMultiply(s) | Scale all elements |

Results

Synthetic Data

Trained on $y = 3x_1 + 2x_2 + 1 + \epsilon$ (Gaussian noise), 80/20 train/test split:

| Method | Test MSE | Test R² |
| --- | --- | --- |
| Gradient Descent | ~0.19 | 0.996 |
| Normal Equation | ~0.19 | 0.996 |

Both methods converge to the same solution — the normal equation just reaches it in a single solve instead of many epochs.


Build & Run

```sh
make
./regression
```

Requires g++ with C++17 support.


Key Concepts Learned

  • Multiple linear regression with vectorized operations
  • Feature normalization and the fit/transform pattern to avoid data leakage
  • Gradient descent with batch updates
  • Matrix inverse via Gauss-Jordan elimination
  • Closed-form solution vs iterative optimization — same result, different paths
  • Train/test split and R² evaluation

Part of a Series

This project is part of a series building ML components from scratch in C++:

  • cpp-perceptron — single-layer perceptron trained on UCI Iris
  • cpp-linear-regression ← you are here
  • cpp-logistic-regression — logistic regression with gradient descent and normal equation
  • cpp-neural-network — neural network with backpropagation and gradient descent

