# Predictive Analytics Workshop

This lab introduces a basic use case of a machine learning algorithm applied to real-world data, specifically the California Housing Prices dataset: https://www.kaggle.com/code/ahmedmahmoud16/california-housing-prices/input

## California Housing Price Prediction

> The data pertains to the houses found in a given California district and some summary statistics about them, based on the 1990 census. Be warned: the data aren't cleaned, so some preprocessing steps are required! The columns are as follows; their names are fairly self-explanatory:

- Target Variable:
    - **medianHouseValue**: Median house value for households within a block (measured in US Dollars)
- Predictor Variables:
    - **longitude**: A measure of how far west a house is; values are negative (western hemisphere), so a lower (more negative) value is farther west
    - **latitude**: A measure of how far north a house is; a higher value is farther north
    - **housingMedianAge**: Median age of a house within a block; a lower number is a newer building
    - **totalRooms**: Total number of rooms within a block
    - **totalBedrooms**: Total number of bedrooms within a block
    - **population**: Total number of people residing within a block
    - **households**: Total number of households (groups of people residing within a home unit) within a block
    - **medianIncome**: Median income for households within a block of houses (measured in tens of thousands of US Dollars)
    - **oceanProximity**: Location of the house with respect to the ocean/sea (a categorical text column)

### Predicting House Prices
We'll now build a predictive analytics workflow in which you will:

- Visualize relationships between features and house prices
- Split the dataset into training/testing sets
- Train a linear regression model
- Evaluate model performance
- Save and use the model for predictions

### Importing essential libraries

### Load the California Housing dataset
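A minimal loading sketch, assuming the Kaggle download was saved as `housing.csv` in the working directory (the file name is an assumption; adjust the path). It also prints a per-column count of missing values, which is what motivates the next step.

```python
import pandas as pd


def load_housing(path: str = "housing.csv") -> pd.DataFrame:
    """Load the housing CSV and print a quick overview of its size and missing values."""
    df = pd.read_csv(path)  # "housing.csv" is an assumed file name
    print(df.shape)         # (rows, columns)
    print(df.isna().sum())  # number of missing values in each column
    return df


# In the notebook: df = load_housing("housing.csv")
```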

## 0️⃣ Handling Missing Values
### Fill missing values in 'total_bedrooms' with the median value
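A sketch of the fill, assuming the CSV column is named `total_bedrooms` as in the heading. The median is a common choice for count columns like this because, unlike the mean, it is not pulled upward by a few very large blocks. The four-row frame below is made up purely to show the effect.

```python
import numpy as np
import pandas as pd


def fill_total_bedrooms(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with missing 'total_bedrooms' values replaced by the column median."""
    out = df.copy()
    median = out["total_bedrooms"].median()  # median() skips NaN by default
    out["total_bedrooms"] = out["total_bedrooms"].fillna(median)
    return out


# Illustrative frame (not real data) with one missing value.
demo = pd.DataFrame({"total_bedrooms": [100.0, np.nan, 300.0, 500.0]})
filled = fill_total_bedrooms(demo)
print(filled["total_bedrooms"].tolist())  # [100.0, 300.0, 300.0, 500.0]
```

Returning a copy keeps the raw frame untouched, so you can compare before and after.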

## 1️⃣ Visualizing the Data
### Scatter Plot: Relationship between Rooms & Price

- This helps identify trends and correlations.
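A possible version of the scatter plot, assuming snake_case column names `total_rooms` and `median_house_value`. The random DataFrame is a synthetic stand-in (not census values) so the snippet runs on its own; in the notebook, pass your cleaned `df` instead. The `Agg` backend line is only there so the sketch also runs without a display; in Jupyter you can drop it and call `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_rooms_vs_price(df: pd.DataFrame):
    """Scatter total_rooms against median_house_value and return the Axes."""
    fig, ax = plt.subplots(figsize=(8, 5))
    # Small, semi-transparent markers make dense regions visible
    ax.scatter(df["total_rooms"], df["median_house_value"], alpha=0.3, s=10)
    ax.set_xlabel("total_rooms")
    ax.set_ylabel("median_house_value (USD)")
    ax.set_title("Total rooms vs. median house value")
    return ax


# Synthetic stand-in data (random, not real housing values)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "total_rooms": rng.integers(500, 8000, size=200),
    "median_house_value": rng.normal(200_000, 50_000, size=200),
})
ax = plot_rooms_vs_price(demo)
```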

### Pair Plot: Correlations Among Features
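A pair plot draws every pairwise scatter plot among a set of columns, with each column's distribution on the diagonal. The sketch below uses `pandas.plotting.scatter_matrix` to avoid an extra dependency; seaborn's `sns.pairplot(df[columns])` produces a similar grid if the workshop uses seaborn. The column subset is an assumption, and the data are a random stand-in.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Assumed subset of columns; plotting all of them makes a very crowded grid
COLUMNS = ["median_income", "total_rooms", "housing_median_age", "median_house_value"]


def pair_plot(df: pd.DataFrame, columns=COLUMNS):
    """Draw pairwise scatter plots with histograms on the diagonal; return the grid of Axes."""
    return scatter_matrix(df[columns], figsize=(10, 10), alpha=0.3, diagonal="hist")


# Synthetic stand-in data (random, not real housing values)
rng = np.random.default_rng(1)
demo = pd.DataFrame({c: rng.normal(size=100) for c in COLUMNS})
axes = pair_plot(demo)
print(axes.shape)                     # one Axes per column pair: (4, 4)
print(demo[COLUMNS].corr().round(2))  # numeric correlations to read alongside the plot
```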

## 2️⃣ Splitting Data into Training & Testing
### Define features (X) and target (Y)

- Splitting the data does not prevent overfitting by itself, but it lets us detect it: the model is evaluated on data it did not see during training.
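A sketch of the split with scikit-learn's `train_test_split`. The feature list is assumed from the training step (income, rooms, bedrooms, population), and the 20% test size and `random_state=42` are illustrative choices, not values taken from the workshop. The random frame stands in for the cleaned dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed column names (snake_case, as in the Kaggle CSV)
FEATURES = ["median_income", "total_rooms", "total_bedrooms", "population"]
TARGET = "median_house_value"

# Synthetic stand-in for the cleaned housing DataFrame
rng = np.random.default_rng(42)
demo = pd.DataFrame({c: rng.uniform(1, 10, size=100) for c in FEATURES + [TARGET]})

X = demo[FEATURES]  # feature matrix
y = demo[TARGET]    # target vector

# Hold out 20% of rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (80, 4) (20, 4)
```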

## 3️⃣ Training the Linear Regression Model
### Train the model

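- The model learns one coefficient per feature (income, rooms, bedrooms, population), describing how the predicted house price changes when that feature increases by one unit while the others are held fixed.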
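To see what "learning a coefficient per feature" means, this sketch fits `LinearRegression` on synthetic data generated from a known linear rule (the coefficients and intercept are invented for illustration, not real housing relationships) and prints what the model recovers. With the real data you would call `model.fit(X_train, y_train)` on the split from the previous step.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["median_income", "total_rooms", "total_bedrooms", "population"]

# Synthetic data built from a known linear rule plus small noise (not real housing data)
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(500, 4)), columns=FEATURES)
true_coef = np.array([40_000.0, 5.0, -10.0, -2.0])
y_train = X_train.to_numpy() @ true_coef + 150_000 + rng.normal(scale=10, size=500)

model = LinearRegression()
model.fit(X_train, y_train)

# Each coefficient: change in predicted price per one-unit change in that feature
for name, coef in zip(FEATURES, model.coef_):
    print(f"{name:>15}: {coef:,.2f}")
print(f"intercept: {model.intercept_:,.2f}")
```

The fitted values land close to `true_coef` and `150_000`, which is a quick sanity check that the model is doing what the bullet above describes.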

## 4️⃣ Evaluating the Model
### Predict house prices on the test set
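A self-contained sketch of the prediction step. `make_regression` generates synthetic data with four features (standing in for the workshop's `X` and `y`); the point is that `predict` is called only on `X_test`, the rows held out from fitting.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 4 features
X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)  # fit() returns the fitted estimator

# Predict only for rows the model never saw during fitting
y_pred = model.predict(X_test)
print(np.c_[y_test[:5], y_pred[:5]])  # first five rows: actual vs predicted
```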

## Regression Metrics & Interpretation

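✅ **MAE** (Mean Absolute Error) → Average absolute difference between predicted and actual values, in the same units as the target (US dollars here).

✅ **MSE** (Mean Squared Error) → Average squared difference; squaring penalizes larger errors more heavily, and the result is in squared units.

✅ **RMSE** (Root MSE) → Square root of the MSE, which brings the error back to the same units as the target.

✅ **R² Score** → Proportion of the target's variance explained by the model (best = 1.0; 0 means no better than always predicting the mean; it can be negative for a worse model).

✅ **Explained Variance Score** → Measures the proportion of the variance in the target variable that is captured by the model. It is similar to R², but it ignores any constant offset (bias) in the predictions: R² is penalized when the residuals have a non-zero mean, while explained variance is not, so the two scores coincide when the mean residual is zero. A score closer to 1 indicates that most of the variance is explained by the model.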
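The five metrics can be computed straight from their definitions with NumPy, which makes the difference between R² and explained variance concrete. The toy arrays are small hand-checkable values, not workshop results; the second call shifts every prediction up by 1 to show that a constant bias lowers R² but leaves explained variance at 1.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, R² and explained variance from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)
    var_y = np.var(y_true)                             # variance of the target
    r2 = 1.0 - mse / var_y                             # 1 - SS_res / SS_tot
    explained_var = 1.0 - np.var(residuals) / var_y    # ignores the mean residual (bias)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "ExplainedVariance": explained_var}


m = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(m)  # MAE 0.5, MSE 0.375

# Every prediction is exactly 1 too high: a pure bias
biased = regression_metrics([3, -0.5, 2, 7], [4, 0.5, 3, 8])
print(biased["ExplainedVariance"], biased["R2"])  # explained variance 1.0, R² about 0.86
```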

### Now, we test the model on an unseen portion of the data.

### Calculate Mean Squared Error (MSE) and R² Score

### Display evaluation metrics
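A sketch of computing and displaying the metrics with `sklearn.metrics`. RMSE is taken as the square root of the MSE by hand, which avoids depending on a version-specific helper (the `squared=False` option of `mean_squared_error` has been deprecated in recent scikit-learn releases). The toy values in the example call are illustrative; in the notebook you would pass `y_test` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)


def report(y_test, y_pred):
    """Print and return regression metrics for a set of predictions."""
    mse = mean_squared_error(y_test, y_pred)
    metrics = {
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),  # back in the target's units
        "R2": r2_score(y_test, y_pred),
        "Explained variance": explained_variance_score(y_test, y_pred),
    }
    for name, value in metrics.items():
        print(f"{name:>20}: {value:,.4f}")
    return metrics


# In the notebook: report(y_test, y_pred)
metrics = report([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
```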

## 5️⃣ Visualizing Predictions vs. Actual Prices

- The closer the points lie to the red line (where predicted price equals actual price), the better the model.
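A possible version of this plot, assuming the red line is the usual y = x diagonal that marks perfect predictions. The "actual" and "predicted" arrays are synthetic (random values plus noise), standing in for `y_test` and `y_pred`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_pred_vs_actual(y_test, y_pred):
    """Scatter predicted against actual prices with a red y = x reference line."""
    y_test = np.asarray(y_test)
    y_pred = np.asarray(y_pred)
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(y_test, y_pred, alpha=0.4, s=10)
    lo = min(y_test.min(), y_pred.min())
    hi = max(y_test.max(), y_pred.max())
    # Points on this line were predicted exactly
    ax.plot([lo, hi], [lo, hi], color="red", linestyle="--", label="perfect prediction (y = x)")
    ax.set_xlabel("Actual price (USD)")
    ax.set_ylabel("Predicted price (USD)")
    ax.legend()
    return ax


# Synthetic stand-in values (not model output)
rng = np.random.default_rng(0)
actual = rng.uniform(50_000, 500_000, size=100)
predicted = actual + rng.normal(scale=30_000, size=100)
ax = plot_pred_vs_actual(actual, predicted)
```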

## 6️⃣ Saving and Using the Model
### Save trained model

- The model is saved and can be reused for future house price predictions.
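One way to save and reload the model is `joblib`, one of the options described in scikit-learn's model persistence documentation; the workshop may use `pickle` instead. The tiny model and the file name `house_price_model.joblib` are hypothetical, chosen only so the sketch runs on its own.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny model on synthetic data (exactly y = 2x + 1), just so there is something to save
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
model = LinearRegression().fit(X, y)

# Hypothetical file name, written to the system temp directory
path = os.path.join(tempfile.gettempdir(), "house_price_model.joblib")
joblib.dump(model, path)

# Later (or in another script): reload and predict
loaded = joblib.load(path)
print(loaded.predict(np.array([[5.0]])))  # approximately 11.0
```

Because joblib files are pickle-based, only load files you trust, and reload them with the same scikit-learn version that saved them.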