A linear regression implementation from scratch to predict car prices based on mileage.
ft_linear_regression/
├── data/
│   └── data.csv                      # Training data (mileage vs price)
├── graphs/                           # Generated visualization plots
│   ├── complete_demonstration.png    # Complete demo visualization
│   └── precision_summary.png         # Precision analysis summary
├── src/
│   ├── const.py                      # Learned parameters and normalization stats
│   ├── demo.py                       # Complete demonstration script
│   ├── estimatePrice.py              # Price estimation script
│   ├── linear_regression.py          # Core regression implementation
│   └── utils/
│       ├── load_csv.py               # CSV loading utility
│       └── update_constants.py       # Constants update utility
├── Makefile                          # Build automation
├── README.md                         # Project documentation
└── requirements.txt                  # Python dependencies
- Linear regression from scratch using gradient descent
- Data normalization for better convergence
- Complete data visualization, including:
  - Data distribution plots
  - Scatter plot with regression line
  - Cost function evolution during training
  - Residual analysis
  - Comparison of original vs normalized data
- Comprehensive precision calculation with metrics (MSE, RMSE, MAE, R², MAPE)
- Interactive testing mode
- Command-line price estimation
- Complete demonstration combining all features
Run the complete demonstration to see all features:
make demo
This will:
- Plot data distribution - Shows how the data points are spread
- Plot regression line - Shows the result of your linear regression
- Calculate precision - Comprehensive algorithm accuracy analysis
- Load data from data/data.csv
- Perform linear regression training
- Display data visualization
- Calculate precision metrics
- Generate demonstration plots
Use the trained model to estimate prices:
make estimate KM=<mileage_in_km>
Example:
make estimate KM=50000
# Output: Estimated price for 50000.0 km: 7427.10
Or run directly:
cd src
python estimatePrice.py <mileage_in_km>
You can run the core linear regression implementation directly:
cd src
python linear_regression.py
Remove generated files and cache:
make clean
For easier project management, use the Makefile:
# Complete demonstration (recommended)
make demo
# Individual components
make train # Train the linear regression model
make estimate KM=50000 # Estimate price for specific mileage
# Project management
make install # Install dependencies
make clean # Clean generated files
make help # Show all available commands
This project fulfills all the specified requirements:
- File: src/demo.py
- Command: make demo
- Output: Data distribution plots showing:
  - Scatter plot with color-coded prices
  - Data point distribution analysis
  - Statistical information display
- Graphs: complete_demonstration.png
- File: src/demo.py
- Command: make demo
- Output: Regression visualization showing:
  - Original data points
  - Linear regression line
  - Model equation and parameters
  - Correlation information
- Graphs: complete_demonstration.png
- File: src/demo.py
- Command: make demo
- Output: Comprehensive precision metrics:
  - Mean Absolute Error (MAE)
  - Root Mean Squared Error (RMSE)
  - Mean Absolute Percentage Error (MAPE)
  - R-squared coefficient
  - Quality assessment
- Graphs: precision_summary.png
- Linear Regression with gradient descent optimization
- Feature normalization using z-score standardization
- Cost function: Mean Squared Error (MSE)
- Learning rate (α): 0.01
- Iterations: 1000
- Features: Mileage (km) with bias term
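The settings listed above can be sketched in plain Python as follows. This is a minimal illustration, not the code in src/linear_regression.py: the function names `zscore` and `train` are hypothetical, the use of the population standard deviation is an assumption, and the three data rows are the sample rows shown in the data format section of this README.

```python
def zscore(values):
    """Return (standardized values, mean, population std). Hypothetical helper."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values], mean, std


def train(xs, ys, learning_rate=0.01, iterations=1000):
    """Batch gradient descent on the MSE cost for y = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction error for every sample with the current parameters
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to theta0 and theta1
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Sample rows from data/data.csv as shown in this README
km = [240000, 139800, 150500]
price = [3650, 3800, 4400]

km_norm, mean_km, std_km = zscore(km)
price_norm, mean_price, std_price = zscore(price)
theta0, theta1 = train(km_norm, price_norm)
print(f"theta0={theta0:.6f}, theta1={theta1:.6f}")
```

Because both variables are standardized in this sketch, theta0 stays at essentially zero and theta1 moves toward the correlation coefficient between mileage and price, which is negative for this data.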
After training, the model predicts prices using:
mileage_normalized = (mileage − mean_mileage) / std_mileage
price_normalized = θ₀ + θ₁ × mileage_normalized
price = price_normalized × std_price + mean_price
Where:
- θ₀ (theta0): Intercept parameter
- θ₁ (theta1): Slope parameter
- Normalization statistics are stored in const.py
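As a rough sketch of how these formulas combine, the snippet below hard-codes the parameters and statistics printed in the sample `make train` output in this README. The constant names and the `estimate_price` function are hypothetical; the actual layout of const.py and estimatePrice.py may differ.

```python
# Values copied from the sample `make train` output in this README
THETA0 = 0.0
THETA1 = -0.856102
MEAN_KM, STD_KM = 101066.25, 51565.19
MEAN_PRICE, STD_PRICE = 6331.83, 1291.87


def estimate_price(mileage_km):
    """Standardize the mileage, apply the linear model, then undo the price scaling."""
    mileage_normalized = (mileage_km - MEAN_KM) / STD_KM
    price_normalized = THETA0 + THETA1 * mileage_normalized
    return price_normalized * STD_PRICE + MEAN_PRICE


print(f"{estimate_price(50000):.2f}")   # prints 7427.10
print(f"{estimate_price(100000):.2f}")  # prints 6354.70
```

With these rounded statistics the sketch reproduces the estimates shown in the usage examples for 50000 km and 100000 km.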
The training data (data/data.csv) should have the following format:
km,price
240000,3650
139800,3800
150500,4400
...
- km: Mileage in kilometers
- price: Car price in euros
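A file in this format can be read with the standard library alone. The sketch below is one possible approach and is not the project's utils/load_csv.py, which may use Pandas instead; the function name `load_rows` is hypothetical, and an in-memory string stands in for the real file.

```python
import csv
import io


def load_rows(csv_file):
    """Read the km and price columns from an open CSV file into two float lists."""
    reader = csv.DictReader(csv_file)
    kms, prices = [], []
    for row in reader:
        kms.append(float(row["km"]))
        prices.append(float(row["price"]))
    return kms, prices


# In-memory stand-in for data/data.csv, using the sample rows shown above
sample = io.StringIO("km,price\n240000,3650\n139800,3800\n150500,4400\n")
kms, prices = load_rows(sample)
print(kms, prices)
```

For the real dataset, the same function could be called on `open("data/data.csv", newline="")`.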
The current model achieves:
- R² Score: ~0.73 (73% of variance explained)
- RMSE: ~668 euros
- MAE: ~558 euros
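For reference, the metrics reported here follow standard definitions, which can be sketched as below. The function name `regression_metrics` and the toy values are hypothetical and only illustrate the formulas; demo.py may compute or round them differently.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, MAE, MAPE (in percent) and R-squared."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE assumes no actual value is zero
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - (mse * n) / ss_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "mape": mape, "r2": r2}


# Toy values chosen only to make the arithmetic easy to follow
metrics = regression_metrics([100, 200, 300], [110, 190, 300])
print(metrics)
```

In the toy example the errors are -10, 10 and 0, giving a MAPE of 5% and an R-squared of 0.99.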
- Python 3.x
- NumPy
- Pandas
- Matplotlib
Install dependencies:
make install
Or manually:
pip install -r requirements.txt
- demo.py: Complete demonstration script (all requirements in one)
- estimatePrice.py: Command-line price estimation tool
- linear_regression.py: Core gradient descent implementation
- const.py: Stores learned parameters and normalization statistics

- utils/load_csv.py: CSV file loading utility
- utils/update_constants.py: Constants update utility

- data/data.csv: Training dataset
- graphs/*.png: Generated visualization plots:
  - complete_demonstration.png: Main demo output
  - precision_summary.png: Precision metrics summary

- Makefile: Build automation and project commands
- requirements.txt: Python dependencies
- README.md: Project documentation
$ make demo
🚀 DÉMONSTRATION COMPLÈTE DU PROJET ft_linear_regression
📊 Données chargées: 24 points
✅ 1. Répartition des données visualisée
✅ 2. Ligne de régression linéaire tracée
✅ 3. Précision de l'algorithme calculée
🏆 Qualité du modèle: 🟡 BONNE (R² = 0.733)
🎯 Erreur moyenne: 558€ (9.6%)
$ make estimate KM=100000
Estimating price for 100000 km...
Estimated price for 100000.0 km: 6354.70
$ make train
Training linear regression model...
=== Training Linear Regression Model ===
Loaded 24 data points
Data statistics:
Mileage: mean=101066.25, std=51565.19
Price: mean=6331.83, std=1291.87
Training Results:
Final parameters: θ₀=0.000000, θ₁=-0.856102
Mean Squared Error: 445645.25
Root Mean Squared Error: 667.57
Mean Absolute Error: 557.83
R-squared: 0.7330
✅ Training completed successfully!
This is an educational project implementing linear regression from scratch. Feel free to experiment with different:
- Learning rates
- Number of iterations
- Feature engineering approaches
- Visualization styles
Educational project - feel free to use and modify.