A production-quality, educational implementation of linear regression algorithms built from scratch using NumPy. This library provides clean, well-documented implementations for learning the mathematical foundations of linear regression while maintaining professional-grade code quality.
This project implements linear regression algorithms from first principles without using high-level ML libraries like scikit-learn. It's designed as both an educational tool and a functional library that demonstrates professional Python package development practices.
- Educational Focus: Understand the mathematics behind linear regression
- Production Quality: Professional package structure ready for PyPI
- From Scratch: Only NumPy used for mathematical operations
- Fully Tested: Comprehensive test suite with edge case handling
- Complete Package: Installable via pip with proper dependency management
See the full project architecture in DEVELOPMENT.md.
For a detailed explanation of the mathematical foundations behind linear regression, see mathematical_background.md.
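In brief, both solvers minimize the mean squared error. One standard way to write the model, cost, gradient-descent update, and closed-form solution is shown below; the exact cost scaling used in the code may differ by a constant factor.

```math
\begin{aligned}
\hat{\mathbf{y}} &= X\mathbf{w}
  && \text{(design matrix with a leading column of ones for the intercept)} \\
J(\mathbf{w}) &= \tfrac{1}{2n}\,\lVert X\mathbf{w} - \mathbf{y} \rVert_2^2
  && \text{(mean squared error cost)} \\
\mathbf{w} &\leftarrow \mathbf{w} - \alpha\,\tfrac{1}{n}\,X^{\top}(X\mathbf{w} - \mathbf{y})
  && \text{(gradient-descent update with learning rate alpha)} \\
\mathbf{w}^{*} &= (X^{\top}X)^{-1}X^{\top}\mathbf{y}
  && \text{(normal equation, closed form)}
\end{aligned}
```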
- Python 3.8+
- pip package manager
Option 1: Install from PyPI (recommended)

```bash
pip install linreg-from-scratch
```

Option 2: Clone & Setup for development

```bash
git clone https://github.com/illoonego/linear-regression-from-scratch.git
cd linear-regression-from-scratch

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install all dependencies using pyproject.toml (PEP 621)
pip install -e .[dev]

# For optional dependencies (notebooks, docs):
pip install -e ".[notebooks,docs]"
```

Note: All dependencies are managed via `pyproject.toml`. The legacy `requirements.txt` file has been removed for clarity and to follow modern Python packaging best practices.
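To verify the install, a minimal smoke test (it only uses the import path shown in the examples below):

```python
# Quick smoke test: these imports should resolve after installation
from linear_regression import LinearRegression, StandardScaler, r2_score

print(LinearRegression, StandardScaler, r2_score)
```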
```bash
# Run all examples
python examples/basic_linear_regression.py

# Run specific examples
python examples/basic_linear_regression.py 1d  # Simple regression
python examples/basic_linear_regression.py 2d  # Multiple regression
```

```python
import numpy as np
from linear_regression import LinearRegression, StandardScaler, r2_score
# Create sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2.1, 3.9, 6.1, 8.0, 9.9])  # y ≈ 2x with noise
# Option 1: Direct usage
model = LinearRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X, y, method='gradient_descent')
predictions = model.predict(X)
print(f"Weights: {model.weights_}")
print(f"R² Score: {r2_score(y, predictions):.4f}")
# Option 2: With preprocessing
scaler = StandardScaler()
X_scaled = scaler.fit(X).transform(X)
model.fit(X_scaled, y)
predictions_scaled = model.predict(X_scaled)
print(f"Weights (scaled): {model.weights_}")
print(f"R² Score (scaled): {r2_score(y, predictions_scaled):.4f}")
```

```python
import numpy as np
from linear_regression import LinearRegression, r2_score
# House price prediction example
np.random.seed(42)
size_sqft = np.random.uniform(800, 2500, 100)
bedrooms = np.random.randint(1, 5, 100)
X = np.column_stack((size_sqft, bedrooms))
# True relationship: price = 150*size + 10000*bedrooms + 20000 + noise
price = 150 * size_sqft + 10000 * bedrooms + 20000 + np.random.randn(100) * 10000
model = LinearRegression(learning_rate=1e-7, n_iterations=5000)
model.fit(X, price)
predictions = model.predict(X)
print(f"Learned coefficients: {model.weights_[1:]}") # [size_coef, bedroom_coef]
print(f"Intercept: {model.weights_[0]}")
print(f"R² Score: {r2_score(price, predictions):.4f}")
```
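The tiny learning rate (1e-7) above is needed because the raw features span very different ranges. A sketch of the same fit with standardized features, using only the API shown in this README (the learning rate and the exact scores are illustrative):

```python
import numpy as np
from linear_regression import LinearRegression, StandardScaler, r2_score

# Same synthetic house-price data as above
np.random.seed(42)
size_sqft = np.random.uniform(800, 2500, 100)
bedrooms = np.random.randint(1, 5, 100)
X = np.column_stack((size_sqft, bedrooms))
price = 150 * size_sqft + 10000 * bedrooms + 20000 + np.random.randn(100) * 10000

# Standardize both features to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit(X).transform(X)

# With comparable feature scales, a much larger learning rate converges quickly
model = LinearRegression(learning_rate=0.1, n_iterations=2000)
model.fit(X_scaled, price, method='gradient_descent')
predictions = model.predict(X_scaled)
print(f"R² Score (scaled features): {r2_score(price, predictions):.4f}")
```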
The library currently provides:

- LinearRegression: Complete implementation with both gradient descent and the normal equation (closed-form solution)
- StandardScaler: Feature standardization with robust validation
- Examples: Working 1D and 2D regression demonstrations
- Error Handling: Comprehensive input validation and edge case management
- Verbose Training Output: Control progress printing with the `verbose` flag
- Professional Structure: PyPI-ready package with proper metadata
See DEVELOPMENT.md for the full roadmap and planned features.
```bash
# Run all tests
pytest tests/
# Run with coverage (see missing lines in terminal)
pytest --cov=src/linear_regression --cov-report=term-missing
# Run specific test file
pytest tests/test_linear_regression.py -v
```
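The tests exercise the public API shown in the examples above. A hypothetical test in that style (illustrative only, not copied from the actual suite):

```python
import numpy as np
from linear_regression import LinearRegression


def test_normal_equation_recovers_exact_line():
    # y = 2x + 1 with no noise, so the closed-form solution should recover it exactly
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = 2.0 * X[:, 0] + 1.0

    model = LinearRegression()
    model.fit(X, y, method='normal_equation')

    # weights_[0] is the intercept, weights_[1:] are the feature coefficients
    np.testing.assert_allclose(model.weights_, [1.0, 2.0], atol=1e-8)
```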
This project uses GitHub Actions for:

- CI: Automatic tests, linting (ruff), formatting checks (black), and coverage reporting on every push and pull request. See `.github/workflows/python-ci.yml`.
- CD: Automated publishing to PyPI on new version tags. See `.github/workflows/python-cd.yml`.
How releases work:
- When a new version tag (e.g., `v1.0.0`) is pushed, the CD workflow builds and publishes the package to PyPI using secure repository secrets.
- See DEVELOPMENT.md for more on the release workflow.
```bash
# Format code
black src/ tests/ examples/

# Sort imports
isort src/ tests/ examples/

# Lint code
flake8 src/ tests/ examples/
```

```bash
# Install with development dependencies
pip install -e ".[dev,notebooks,docs]"
```

Example output:

```text
$ python examples/basic_linear_regression.py 2d
2D Multiple Linear Regression Example
----------------------------------------
Generating synthetic data...
Data points: 100
True weights: size coefficient=150, bedroom coefficient=10000, intercept=20000
Training model with Gradient Descent...
Iteration 0: Cost = 1250000000.0000
Iteration 500: Cost = 125678923.4567
Iteration 1000: Cost = 89234567.1234
Training completed!
Results:
Learned weights: size coefficient=149.87, bedroom coefficient=9989.23, intercept=20145.67
R² Score: 0.9234
MSE: 89234567.12
Comparison with True Values:
True:    size=150.00, bedroom=10000.00, intercept=20000.00
Learned: size=149.87, bedroom=9989.23, intercept=20145.67
Error:   size=0.13, bedroom=10.77, intercept=145.67
```

Both training methods are available through the same `fit` interface:

```python
# Option 1: Gradient Descent (iterative)
model_gd = LinearRegression(learning_rate=0.01, n_iterations=1000, verbose=True)
model_gd.fit(X, y, method='gradient_descent')
predictions_gd = model_gd.predict(X)
print(f"GD Weights: {model_gd.weights_}")
print(f"GD R² Score: {r2_score(y, predictions_gd):.4f}")

# Option 2: Normal Equation (closed-form)
model_ne = LinearRegression(verbose=False)
model_ne.fit(X, y, method='normal_equation')
predictions_ne = model_ne.predict(X)
print(f"NE Weights: {model_ne.weights_}")
print(f"NE R² Score: {r2_score(y, predictions_ne):.4f}")
```

This project demonstrates:
- Mathematical Understanding: Implement algorithms from equations
- Software Engineering: Professional Python package development
- Machine Learning: Core concepts without library abstractions
- Numerical Computing: Efficient NumPy vectorized operations (see the sketch after this list)
- Testing: Comprehensive test coverage with edge cases
- Documentation: Clear code documentation and user guides
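As an illustration of the vectorization point, the kind of fully vectorized gradient-descent update such an implementation relies on looks roughly like this (a sketch only; the library's internal code may differ in details such as bias handling and cost scaling):

```python
import numpy as np

def gradient_step(X, y, w, learning_rate):
    """One vectorized gradient-descent update for mean-squared-error linear regression.

    w[0] is the intercept, w[1:] are the feature weights (illustrative layout).
    """
    n = X.shape[0]
    Xb = np.column_stack((np.ones(n), X))  # prepend a bias column of ones
    residuals = Xb @ w - y                 # (n,) prediction errors
    gradient = Xb.T @ residuals / n        # (n_features + 1,) mean gradient
    return w - learning_rate * gradient

# Tiny demo on y = 2x + 1: the weights approach [1.0, 2.0]
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
w = np.zeros(2)
for _ in range(5000):
    w = gradient_step(X, y, w, learning_rate=0.1)
print(w)
```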
We welcome contributions! Please see:
- CONTRIBUTING.md for guidelines and onboarding
- DEVELOPMENT.md for development workflow
- Issues for bug reports and feature requests
- Built for educational purposes to understand ML fundamentals
- Mathematical foundations from "The Elements of Statistical Learning"
- Inspired by the need for transparent, understandable ML implementations
Note: This is primarily an educational project. For production ML workflows, consider established libraries such as scikit-learn; that said, this implementation is written to production-quality standards and can be used in real applications.