# 🏠 P2.1.1.7 – Machine Learning Foundations

## Topic: House Price Prediction Example


## 🎯 Learning Objectives

By the end of this notebook, you will be able to:

- Understand the steps in a regression ML pipeline
- Prepare data for regression tasks
- Build and evaluate a house price prediction model using Linear Regression
- Interpret model results and performance metrics


## 📝 Problem Statement

We want to build a program that predicts house prices from features such as size, number of bedrooms, and age. The goal is to automate price estimation for new houses.


**Why is this important?**

- Helps buyers and sellers make informed decisions
- Automates price prediction for real estate platforms

## 🔍 Choosing the ML Type

For this problem, we use **Supervised Learning** because:
- We have labeled examples (house features and prices)
- The model learns from past data to predict new prices

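**Why not Unsupervised or Reinforcement Learning?**
- Unsupervised learning finds patterns in data that has no labels; here we already have the prices we want to predict
- Reinforcement learning trains an agent to make decisions in an environment through rewards; predicting a price from fixed examples is not that kind of problem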
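
To make "labeled examples" concrete, here is a minimal sketch that pairs feature rows with their known prices. The three rows are taken from the dataset used later in this notebook; the variable names `features` and `prices` are chosen here for illustration.

```python
# Each labeled example pairs a feature row with its known price (the label).
features = [
    [1000, 2, 10],  # size (sqft), bedrooms, age
    [1200, 3, 5],
    [1500, 3, 8],
]
prices = [200, 250, 300]  # price in thousands

for row, price in zip(features, prices):
    print(f"features={row} -> label={price}")
```

A supervised model learns the mapping from each feature row to its label, then applies that mapping to rows whose price is unknown.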

## 🤖 Choosing the Model & Why

We use the **Linear Regression** model because:
- It predicts continuous values (house prices)
- It is simple and interpretable
- It shows how features affect price

**Why not other models?**
- Decision Trees, Random Forests, etc. can be used, but Linear Regression is a classic choice for regression tasks and is easy to understand
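
For the three features in this problem, the model that Linear Regression fits can be written as the equation below. The symbols $\hat{y}$ (predicted price) and $\beta_0, \dots, \beta_3$ (intercept and coefficients) are notation introduced here for illustration.

$$
\hat{y} = \beta_0 + \beta_1 \cdot \text{size} + \beta_2 \cdot \text{bedrooms} + \beta_3 \cdot \text{age}
$$

Each $\beta_j$ is the change in predicted price for a one-unit increase in that feature while the other features stay fixed, which is what makes the model easy to interpret.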

## 🛠️ Example: House Price Prediction Pipeline

This example shows the steps:
1. Prepare features and target
2. Split data into train/test
3. Train Linear Regression model
4. Predict and evaluate

```python
"""
House Price Prediction using Scikit-learn
-----------------------------------------
This program predicts house prices using
Linear Regression and real ML workflow.

Author: AI Course
"""

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score


def main():
    print("HOUSE PRICE PREDICTION MODEL")
    print("-----------------------------")

    # Dataset
    # Features: [Size (sqft), Bedrooms, Age]
    X = np.array([
        [1000, 2, 10],
        [1200, 3, 5],
        [1500, 3, 8],
        [1800, 4, 3],
        [2000, 4, 2],
        [2300, 5, 1]
    ])

    # Target: Price in thousands
    y = np.array([200, 250, 300, 360, 400, 450])

    # Train-Test Split
    # With 6 rows, test_size=0.3 holds out 2 rows for testing and trains on 4
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    # Model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Predictions
    predictions = model.predict(X_test)

    # Evaluation (only 2 test rows, so treat these scores as a rough illustration)
    rmse = np.sqrt(mean_squared_error(y_test, predictions))
    r2 = r2_score(y_test, predictions)

    print("Predictions:", predictions)
    print("RMSE:", round(rmse, 2))
    print("R2 Score:", round(r2, 2))

    # Coefficients: change in predicted price (thousands) per one-unit increase in each feature
    print("\nFeature Coefficients:")
    for feature, coef in zip(["Size", "Bedrooms", "Age"], model.coef_):
        print(f"{feature}: {round(coef, 2)}")

    # New Prediction
    new_house = np.array([[1600, 3, 4]])
    predicted_price = model.predict(new_house)
    print("\nPredicted price for new house:", round(predicted_price[0], 2))


if __name__ == "__main__":
    main()
```

## 📊 Understanding RMSE, R2 & Feature Importance

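- **RMSE (Root Mean Squared Error):** The square root of the average squared difference between predicted and actual prices. It is in the same units as the target (thousands here). Lower is better.
- **R2 Score:** The fraction of the variation in prices that the model explains. A score of 1 is a perfect fit, 0 is no better than always predicting the mean price, and the score can be negative on test data.
- **Feature Coefficients:** Each coefficient is the change in predicted price for a one-unit increase in that feature (size, bedrooms, age) while the others stay fixed. Because the features use different units, raw coefficient sizes are not directly comparable.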
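
The two metrics above can be computed by hand, which shows what `mean_squared_error` (followed by a square root) and `r2_score` are calculating. This is a minimal sketch with NumPy; the values in `y_true` and `y_pred` are illustrative and are not output from the model in this notebook.

```python
import numpy as np

# Illustrative values only (not output from the model above), in thousands
y_true = np.array([250.0, 400.0])
y_pred = np.array([245.0, 410.0])

# RMSE: square root of the mean squared error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# R2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("RMSE:", round(rmse, 2))    # 7.91
print("R2 Score:", round(r2, 2))  # 0.99
```

Here the errors are 5 and 10, so the mean squared error is 62.5 and the RMSE is about 7.91 (thousand).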

**Why do we need these?**
- To measure how well the model works
- To understand which features matter most
- To ensure the model is reliable before using it in real-world scenarios
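
One way to see what the coefficients mean is to rebuild a prediction from them by hand. The sketch below uses the same six houses as the example above and checks that the intercept plus the sum of coefficient times feature value matches `model.predict`. As a simplifying assumption, it fits on all six rows rather than on a training split, so its coefficients will differ from those printed by the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same six houses as the example above: [Size (sqft), Bedrooms, Age]
X = np.array([
    [1000, 2, 10],
    [1200, 3, 5],
    [1500, 3, 8],
    [1800, 4, 3],
    [2000, 4, 2],
    [2300, 5, 1],
])
y = np.array([200, 250, 300, 360, 400, 450])  # price in thousands

# Assumption: fit on all six rows for simplicity (no train/test split here)
model = LinearRegression().fit(X, y)

new_house = np.array([1600, 3, 4])

# A linear model's prediction is intercept + sum(coefficient * feature value)
manual = model.intercept_ + np.dot(model.coef_, new_house)
library = model.predict(new_house.reshape(1, -1))[0]

print("Manual and library predictions match:", np.isclose(manual, library))
```

If the two values agree, every part of the prediction can be traced back to one feature and its coefficient.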