
## Multivariate Linear Regression
This notebook demonstrates how to build a multivariate linear regression model to predict house prices using features such as square footage, number of bedrooms, and age of the home. The objective is to show a clear, step-by-step workflow from data loading to making predictions.

### Loading libraries and dataset
We begin by importing pandas for tabular data, NumPy for numerical arrays, and scikit-learn's `LinearRegression` estimator. We then load the housing dataset from the GitHub repository and preview its first rows.

In [None]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

url = 'https://raw.githubusercontent.com/primriq/ML-Apex-Univ/refs/heads/main/Datasets/homeprices%20(1).csv'
df = pd.read_csv(url)
df.head()

   area  bedrooms  age   price
0  2600       3.0   20  550000
1  3000       4.0   15  565000
2  3200       NaN   18  610000
3  3600       3.0   30  595000
4  4000       5.0    8  760000


### Understanding the dataset
The dataset contains the following columns:
- **area**: size of the house in square feet
- **bedrooms**: number of bedrooms (missing for the row at index 2 in the preview above)
- **age**: age of the property in years
- **price**: target variable we want to predict

Our goal is to fit a regression model that best explains the relationship between these input features and the price.
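Before modeling, it can help to confirm the column types and check for missing values. The sketch below rebuilds only the five rows displayed in the preview (the full CSV may contain more) under the illustrative name `sample`, and counts the missing entries per column.

```python
import numpy as np
import pandas as pd

# Only the five rows shown by df.head(); the full dataset may hold more.
sample = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0],
    "age": [20, 15, 18, 30, 8],
    "price": [550000, 565000, 610000, 595000, 760000],
})

# Count missing values per column; in these rows only 'bedrooms' has a gap.
missing = sample.isna().sum()
print(missing)

# 'bedrooms' is float64 because NaN cannot be stored in a plain NumPy integer column.
print(sample.dtypes)
```

Running the same two calls on the full `df` would show whether any other column needs attention before fitting.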

### Preparing the feature matrix and target vector
We separate the input variables (area, bedrooms, age) from the target variable (price). These will be fed into the linear regression model.

In [None]:
X = df[['area', 'bedrooms', 'age']]
y = df['price']
X.head(), y.head()

(   area  bedrooms  age
 0  2600       3.0   20
 1  3000       4.0   15
 2  3200       NaN   18
 3  3600       3.0   30
 4  4000       5.0    8,
 0    550000
 1    565000
 2    610000
 3    595000
 4    760000
 Name: price, dtype: int64)

### Training the Linear Regression model
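We create an instance of `LinearRegression` and fit it using the features and target. `LinearRegression` does not accept missing values, so we first fill the missing `bedrooms` entry with the mean of that column. The model then learns one coefficient per input variable plus an intercept term.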

In [None]:
reg = LinearRegression()
# Impute missing 'bedrooms' with the column mean; other columns are left untouched
X_filled = X.fillna({'bedrooms': X['bedrooms'].mean()})
reg.fit(X_filled, y)  # Fit with the imputed data
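As an alternative to filling the gap by hand, scikit-learn's `SimpleImputer` can be chained with the regressor so that the same imputation is applied at both fit and predict time. This is a sketch rather than the method used in this notebook. It uses only the five rows shown in the preview, so its coefficients will differ from those of a model trained on the full dataset. The names `sample` and `model` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Only the five previewed rows; the full dataset may hold more.
sample = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0],
    "age": [20, 15, 18, 30, 8],
    "price": [550000, 565000, 610000, 595000, 760000],
})
X_s = sample[["area", "bedrooms", "age"]]
y_s = sample["price"]

# Each column's NaNs are replaced by that column's own mean, learned during fit.
model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
model.fit(X_s, y_s)

# Per-column means the imputer will use for filling (area, bedrooms, age).
imputer = model.named_steps["simpleimputer"]
print(imputer.statistics_)
```

With a pipeline, new rows that contain a missing `bedrooms` value can be passed straight to `model.predict`, because the imputer fills them using the means learned from the training data.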

### Model parameters
The learned model can be interpreted mathematically as:

price = (coef_area × area) + (coef_bedrooms × bedrooms) + (coef_age × age) + intercept

We can inspect the coefficients below. The values in `reg.coef_` follow the column order of the feature matrix: area, bedrooms, age.

In [None]:
coef = reg.coef_
intercept = reg.intercept_
coef, intercept

(array([  116.66950551, 18756.28806982, -3675.75111708]),
 np.float64(231586.00639409182))
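To make the formula concrete, the sketch below applies it by hand to one example home (3000 sq ft, 3 bedrooms, 40 years old), using the coefficient and intercept values copied from the output above. The variable names `house` and `manual_price` are illustrative.

```python
import numpy as np

# Values copied from the printed output, in the order area, bedrooms, age.
coef = np.array([116.66950551, 18756.28806982, -3675.75111708])
intercept = 231586.00639409182

house = np.array([3000, 3, 40])  # 3000 sq ft, 3 bedrooms, 40 years old

# price = coef . features + intercept
manual_price = float(np.dot(coef, house) + intercept)
print(round(manual_price, 2))  # 490833.34
```

Because the printed coefficients are rounded to eight decimal places, a hand calculation like this can differ from `reg.predict` in the last few digits.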

### Making predictions
To predict the price of a home, we pass its area, number of bedrooms, and age as one row of a two-dimensional input, with the values in the same order as the training columns. Below, we compute predictions for two sample houses.

In [None]:
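# Predict price for a 3000 sq ft home with 3 bedrooms, 40 years old
# (a one-row DataFrame keeps the feature names the model was fitted with)
reg.predict(pd.DataFrame([[3000, 3, 40]], columns=X.columns))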



array([490833.34243748])

In [None]:
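# Predict price for a 2500 sq ft home with 4 bedrooms, 5 years old
# (a one-row DataFrame keeps the feature names the model was fitted with)
reg.predict(pd.DataFrame([[2500, 4, 5]], columns=X.columns))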



array([579906.16685223])
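Both houses can also be scored in a single call, since `predict` accepts any number of rows. The sketch below is self-contained and therefore refits a model on only the five previewed rows, so its predictions will not match the values above. The names `sample`, `reg_s`, and `new_homes` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Only the five previewed rows; the full dataset may hold more.
sample = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0],
    "age": [20, 15, 18, 30, 8],
    "price": [550000, 565000, 610000, 595000, 760000],
})
features = ["area", "bedrooms", "age"]

# Same imputation idea as in the notebook: fill 'bedrooms' with its mean.
X_s = sample[features].fillna({"bedrooms": sample["bedrooms"].mean()})
reg_s = LinearRegression().fit(X_s, sample["price"])

# One row per house, with the same column names used for fitting.
new_homes = pd.DataFrame([[3000, 3, 40], [2500, 4, 5]], columns=features)
preds = reg_s.predict(new_homes)
print(preds)  # one predicted price per row
```

Passing a DataFrame with matching column names also avoids the feature-name warning that scikit-learn emits when a model fitted on a DataFrame is given a plain list.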

### Summary
In this notebook, we:
- Loaded a housing dataset
- Prepared the data for modeling, including imputing the missing `bedrooms` value
- Trained a multivariate linear regression model
- Examined model coefficients
- Used the model to make predictions

Multivariate linear regression is useful when several features jointly influence the outcome. The same sequence of steps (load, prepare, fit, inspect, predict) carries over to other regression tasks.