Linear regression is the starting point for learning machine learning from zero by re-implementing fundamental machine learning algorithms.
The purpose of this repository is to understand how classic ML models work internally.
- From Zero → no external ML frameworks are used; everything is implemented from scratch.
- Educational → designed as a step-by-step learning exercise.
- Modularity → designed as encapsulated classes that can be reused in different projects.
These projects are built mainly with Rust, with Python used for visualization, so a few dependencies are required.
Each project includes a venv.sh and requirements.txt to help you set up a virtual environment and install the necessary packages:
bash venv.sh
source venv/bin/activate
Cargo is needed to build and run the Rust programs.
The goal of this project is to predict car prices based on their mileage, as these two factors have a linear correlation. As an introduction to machine learning, gradient descent is used to minimize the loss function (MSE) in order to find the appropriate values of θ₀ and θ₁. These two parameters are updated simultaneously during the process, allowing the loss function to descend in the steepest direction.
Linear regression is implemented as three programs: a predictor and a trainer written in Rust, and a visualizer written in Python.
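Linear regression is one of the most fundamental algorithms in machine learning. The idea is to model the relationship between an input variable (mileage) and an output variable (price) with a straight line:
$$\hat{y} = \theta_0 + \theta_1 x$$
where $x$ is the mileage, $\hat{y}$ is the predicted price, $\theta_0$ is the intercept, and $\theta_1$ is the slope.
The regression problem can be seen as finding the best-fitting line that minimizes the difference between the predicted values $\hat{y}$ and the actual prices $y$.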
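As a minimal sketch of the straight-line model, the prediction is a single multiply-add. The function name `estimate_price` and the choice of `f64` are assumptions made for illustration and may not match the repository's code.

```rust
/// Predicts a price from a mileage using the line
/// price = theta0 + theta1 * mileage.
/// `estimate_price` is a hypothetical name chosen for this sketch.
fn estimate_price(theta0: f64, theta1: f64, mileage: f64) -> f64 {
    theta0 + theta1 * mileage
}
```

With both parameters at zero the function returns zero for any mileage, which is what an untrained model would predict.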
Since features may vary in scale, normalization ensures that all input data are on a comparable range. This speeds up convergence during gradient descent and prevents one feature from dominating the others.
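The text does not say which normalization scheme is used, so the sketch below assumes min-max scaling into the range [0, 1]. The function name `min_max_normalize` is hypothetical.

```rust
/// Scales every value into [0, 1] with min-max normalization (an assumed
/// scheme; the text does not name one). Returns the scaled values together
/// with the minimum and maximum that were used.
/// If all values are equal, the range is zero and every scaled value is 0.0.
fn min_max_normalize(values: &[f64]) -> (Vec<f64>, f64, f64) {
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    let scaled = values
        .iter()
        .map(|v| if range > 0.0 { (v - min) / range } else { 0.0 })
        .collect();
    (scaled, min, max)
}
```

The minimum and maximum are returned because a later prediction on a raw mileage has to be scaled with the same two values that were used during training.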
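The loss function measures how well our line fits the data:
$$J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right)^2$$
where $m$ is the number of training samples, $\hat{y}_i = \theta_0 + \theta_1 x_i$ is the predicted price for sample $i$, and $y_i$ is its actual price.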
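The MSE can be sketched as a short function over paired samples. This assumes the loss is the plain mean of squared errors (a 1/m factor, with no extra 1/2); the name `mean_squared_error` is hypothetical.

```rust
/// Mean squared error of the line theta0 + theta1 * x over paired samples.
/// Assumes the loss is averaged with a 1/m factor.
fn mean_squared_error(theta0: f64, theta1: f64, xs: &[f64], ys: &[f64]) -> f64 {
    assert_eq!(xs.len(), ys.len(), "inputs and targets must have the same length");
    assert!(!xs.is_empty(), "at least one sample is required");
    let sum: f64 = xs
        .iter()
        .zip(ys)
        .map(|(x, y)| {
            let error = theta0 + theta1 * x - y;
            error * error
        })
        .sum();
    sum / xs.len() as f64
}
```

A line that passes through every point gives a loss of exactly zero, and any other line gives a positive value.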
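We want to compute the gradient of the loss with respect to each parameter.
Start from the loss function for a single sample:
$$J_i = \left( \hat{y}_i - y_i \right)^2$$
Differentiate with respect to $\theta_j$ using the chain rule:
$$\frac{\partial J_i}{\partial \theta_j} = 2 \left( \hat{y}_i - y_i \right) \frac{\partial \hat{y}_i}{\partial \theta_j}$$
Since $\hat{y}_i = \theta_0 + \theta_1 x_i$, the inner derivative depends on the parameter.
For $\theta_0$:
$$\frac{\partial \hat{y}_i}{\partial \theta_0} = 1 \quad \Rightarrow \quad \frac{\partial J_i}{\partial \theta_0} = 2 \left( \hat{y}_i - y_i \right)$$
For $\theta_1$:
$$\frac{\partial \hat{y}_i}{\partial \theta_1} = x_i \quad \Rightarrow \quad \frac{\partial J_i}{\partial \theta_1} = 2 \left( \hat{y}_i - y_i \right) x_i$$
Extending to all $m$ samples by averaging:
$$\frac{\partial J}{\partial \theta_0} = \frac{2}{m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right), \qquad \frac{\partial J}{\partial \theta_1} = \frac{2}{m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right) x_i$$
Thus, the gradient is derived directly from the chain rule.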
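The derivation can be sketched as a function that accumulates the two sums in one pass. It assumes the MSE is averaged with a 1/m factor, so each derivative carries a factor of 2; the name `gradients` is hypothetical.

```rust
/// Returns (dJ/dtheta0, dJ/dtheta1) for
/// J = (1/m) * sum((theta0 + theta1 * x - y)^2).
/// The 1/m convention (and the resulting factor of 2) is an assumption.
fn gradients(theta0: f64, theta1: f64, xs: &[f64], ys: &[f64]) -> (f64, f64) {
    assert_eq!(xs.len(), ys.len(), "inputs and targets must have the same length");
    assert!(!xs.is_empty(), "at least one sample is required");
    let m = xs.len() as f64;
    let (mut sum_error, mut sum_error_x) = (0.0, 0.0);
    for (x, y) in xs.iter().zip(ys) {
        let error = theta0 + theta1 * x - y;
        sum_error += error; // inner derivative w.r.t. theta0 is 1
        sum_error_x += error * x; // inner derivative w.r.t. theta1 is x
    }
    (2.0 * sum_error / m, 2.0 * sum_error_x / m)
}
```

If a different constant is used in the loss (for example 1/(2m)), the gradients only change by a constant factor, which can be absorbed into the learning rate.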
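The parameters are updated iteratively using:
$$\theta_j := \theta_j - \alpha \frac{\partial J}{\partial \theta_j}, \quad j \in \{0, 1\}$$
where $\alpha$ is the learning rate, which controls the step size, and $\frac{\partial J}{\partial \theta_j}$ is the gradient of the loss with respect to $\theta_j$. Both parameters are updated simultaneously, using gradients computed from the same old values.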
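A minimal sketch of the update loop follows. It assumes the MSE uses a 1/m factor, starts from (0, 0), and runs for a fixed number of iterations; `gradient_step`, `train`, and their parameters are hypothetical names, not the repository's API.

```rust
/// Performs one gradient-descent step. Both gradients are computed from the
/// old parameters before either one changes (simultaneous update).
fn gradient_step(theta: (f64, f64), xs: &[f64], ys: &[f64], learning_rate: f64) -> (f64, f64) {
    let m = xs.len() as f64;
    let (mut sum_error, mut sum_error_x) = (0.0, 0.0);
    for (x, y) in xs.iter().zip(ys) {
        let error = theta.0 + theta.1 * x - y;
        sum_error += error;
        sum_error_x += error * x;
    }
    (
        theta.0 - learning_rate * 2.0 * sum_error / m,
        theta.1 - learning_rate * 2.0 * sum_error_x / m,
    )
}

/// Repeats the step a fixed number of times, starting from (0, 0).
fn train(xs: &[f64], ys: &[f64], learning_rate: f64, iterations: usize) -> (f64, f64) {
    let mut theta = (0.0, 0.0);
    for _ in 0..iterations {
        theta = gradient_step(theta, xs, ys, learning_rate);
    }
    theta
}
```

Returning a new tuple from `gradient_step` rather than mutating the parameters in place makes the simultaneous update hard to get wrong, since the old values stay untouched until both new ones are computed.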
- Predictor
  - Usage: cargo run <weight.txt>
  - Reads model parameters from weight.txt.
  - If no weight.txt is available, it uses the default weights (0, 0).
  - Outputs the predicted price for the given km value.
- Trainer
  - Usage: cargo run <data.csv>
  - Takes a CSV file containing training data (km, price).
  - Trains the linear regression model using gradient descent.
  - Saves the final model parameters into weight.txt.
- Visualizer
  - Usage: python visualize.py <path_to_training_program> <path_to_data_csv>
  - Runs the Rust training program, collects the results, and visualizes the training process and regression line using Python (matplotlib).
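The predictor's fallback to default weights could be sketched as below. The layout of weight.txt is not described here, so the sketch assumes a single line of the form `theta0,theta1`; the function names are hypothetical, and treating an unparsable file the same as a missing one is an additional assumption.

```rust
use std::fs;

/// Parses "theta0,theta1" (an assumed file format).
/// Returns None if either number is missing or malformed.
fn parse_weights(contents: &str) -> Option<(f64, f64)> {
    let mut parts = contents.trim().split(',');
    let theta0 = parts.next()?.trim().parse::<f64>().ok()?;
    let theta1 = parts.next()?.trim().parse::<f64>().ok()?;
    Some((theta0, theta1))
}

/// Reads the weights from `path`, falling back to the default (0, 0)
/// when the file is missing or cannot be parsed.
fn load_weights(path: &str) -> (f64, f64) {
    fs::read_to_string(path)
        .ok()
        .and_then(|contents| parse_weights(&contents))
        .unwrap_or((0.0, 0.0))
}
```

Keeping the parsing separate from the file access makes the format easy to check without touching the disk.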
This project is licensed under the MIT License - see the LICENSE file for details.