This project demonstrates a simple, yet robust, multiple linear regression model built with Python and scikit-learn to predict median house values in California.
- Multiple Features: The model uses multiple features (Median Income, House Age, and Average Rooms) for more accurate predictions.
- Data Preprocessing: It includes a machine learning pipeline to handle data scaling, a crucial step for many models.
- Model Persistence: The trained model is automatically saved to disk (linear_regression_model.joblib), allowing for easy reuse without retraining.
- Comprehensive Evaluation: The script calculates and prints key metrics (Mean Squared Error and R-squared) to evaluate the model's performance.
- Data Visualization: It generates and saves multiple plots (housing_prices_plot.png and housing_prices_residual_plot.png) for visual analysis.
- Prediction Functionality: The script includes a practical example of how to use the trained model to make a prediction on new, unseen data.

- Python: The core programming language for the project.
- scikit-learn: A powerful machine learning library used for building the model, data splitting, and evaluation.
- NumPy: A fundamental library for numerical operations and handling the dataset arrays.
- Matplotlib: Used for creating the data visualizations, including the scatter and residual plots.
- joblib: A library for saving and loading the trained machine learning model.
The core of this project is a LinearRegression model, which is a fundamental algorithm in supervised machine learning. The model is implemented within a scikit-learn pipeline. This pipeline's architecture consists of two main stages:
- Data Preprocessing: The StandardScaler scales the features to have a mean of 0 and a standard deviation of 1. This is crucial for linear models to perform well, as it prevents features with larger values from disproportionately influencing the model.
- Regression Model: The LinearRegression estimator fits a linear model to the preprocessed data, finding the best-fit line (or hyperplane in this case) that minimizes the sum of squared errors between the predicted and actual values (see the sketch below).
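The following minimal sketch shows how such a two-stage pipeline can be assembled; the step names (`scaler`, `regressor`) and the variable name `model_pipeline` are illustrative assumptions, not necessarily those used in the script.

```python
# A minimal sketch of the two-stage pipeline: scaling followed by linear regression.
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

model_pipeline = Pipeline([
    ("scaler", StandardScaler()),       # stage 1: zero mean, unit variance
    ("regressor", LinearRegression()),  # stage 2: ordinary least-squares fit
])
```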
The project performs the following data processing steps:
- Data Splitting: The dataset is divided into a training set (80%) and a testing set (20%) so that the model's performance is evaluated on unseen data.
- Feature Scaling: A StandardScaler transforms the input features to zero mean and unit variance, preventing features with a larger magnitude from dominating the learning process (see the sketch below).
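As a rough illustration, assuming the three features come from scikit-learn's built-in California housing dataset, the split and the pipeline-internal scaling could look like this; the column indices and `random_state` are assumptions, not taken from the script:

```python
# Illustrative 80/20 split; reuses model_pipeline from the sketch above.
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

housing = fetch_california_housing()
X = housing.data[:, [0, 1, 2]]  # MedInc, HouseAge, AveRooms
y = housing.target              # median house value (in units of $100,000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The StandardScaler inside the pipeline is fitted on the training data only,
# so the test set never leaks into the scaling statistics.
model_pipeline.fit(X_train, y_train)
```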
This project performs data analysis through both quantitative metrics and visual inspection:
- Quantitative Metrics: The model's performance is evaluated using two standard metrics (computed as shown in the sketch below):
  - Mean Squared Error (MSE): Measures the average squared difference between the predicted and actual values. A lower MSE indicates a better fit.
  - R-squared (R²): Represents the proportion of the variance in the dependent variable that can be predicted from the independent variables. A score closer to 1.0 indicates a stronger fit.
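Assuming the fitted pipeline and held-out test set from the sketches above, the two metrics can be computed along these lines:

```python
# Evaluate the fitted pipeline on the test set.
from sklearn.metrics import mean_squared_error, r2_score

y_pred = model_pipeline.predict(X_test)
mse = mean_squared_error(y_test, y_pred)  # lower is better
r2 = r2_score(y_test, y_pred)             # closer to 1.0 is better
print(f"Mean Squared Error: {mse:.3f}")
print(f"R-squared: {r2:.3f}")
```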
The model training process is managed to be efficient and reproducible:
- Training: The fit() method is called on the machine learning pipeline, which first scales the training data and then trains the LinearRegression model.
- Persistence: Once trained, the entire pipeline is saved to a .joblib file. This common practice "persists" the model, allowing it to be loaded directly for making predictions without retraining. The script checks for the existence of this file and either loads the existing model or trains a new one (see the sketch below).
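A minimal sketch of this load-or-train logic follows; the filename matches the one mentioned earlier, but the exact control flow and the example input values are assumptions:

```python
# Load the persisted pipeline if it exists; otherwise train and save it.
import os
import joblib

MODEL_PATH = "linear_regression_model.joblib"

if os.path.exists(MODEL_PATH):
    model_pipeline = joblib.load(MODEL_PATH)
else:
    model_pipeline.fit(X_train, y_train)     # scale, then fit LinearRegression
    joblib.dump(model_pipeline, MODEL_PATH)  # persist the entire pipeline

# Example prediction on new, unseen data (made-up values):
# [median income (tens of thousands of $), house age (years), average rooms]
new_sample = [[5.0, 25.0, 6.0]]
predicted_value = model_pipeline.predict(new_sample)
print(f"Predicted median house value: {predicted_value[0]:.2f} (x $100,000)")
```

Persisting the whole pipeline (rather than just the regressor) keeps the fitted scaler and the model together, so new inputs are scaled exactly as the training data was.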
- Python 3.11+
- Required packages (install via pip): scikit-learn, NumPy, Matplotlib, and joblib
- Clone this repository to your local machine and change into the project directory:
  - `git clone https://github.com/sjain2580/simple-linear-regression.git`
  - `cd simple-linear-regression`
- Create and activate a virtual environment (optional but recommended): `python -m venv venv`
  - On Windows: `.\venv\Scripts\activate`
  - On macOS/Linux: `source venv/bin/activate`
- Install the required libraries: `pip install -r requirements.txt`
- To Run the Script: Execute the main Python script from your terminal: `python simple_linear_regression.py`
The script saves two plots for visual analysis:

- Prediction Plot (housing_prices_plot.png): Compares the model's predicted house values against the actual values to show how well the linear relationship is captured.
- Residual Plot (housing_prices_residual_plot.png): Plots the difference between the actual and predicted values. A good residual plot shows a random scatter of points around the zero line, indicating that the model's assumptions are met and it is not systematically under- or over-predicting (see the sketch below).

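For reference, a residual plot along these lines can be produced with Matplotlib; the exact styling and figure layout in the script may differ:

```python
# Minimal residual-plot sketch; reuses y_test and y_pred from the evaluation step.
import matplotlib.pyplot as plt

residuals = y_test - y_pred
plt.scatter(y_pred, residuals, alpha=0.5)
plt.axhline(0, color="red", linestyle="--")  # points should scatter randomly around this line
plt.xlabel("Predicted median house value")
plt.ylabel("Residual (actual - predicted)")
plt.title("Residual Plot")
plt.savefig("housing_prices_residual_plot.png")
```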
Feel free to fork this repository and submit issues or pull requests to improve the project. Suggestions for model enhancement or additional visualizations are welcome!

GitHub: https://github.com/sjain2580
Feel free to reach out if you have any questions or just want to connect!