# House Price Prediction

- [1 - Introduction](#Introduction)
    - [1.1 - Project Overview](#Project-Overview)
    - [1.2 - Problem Statement](#Problem-Statement)
    - [1.3 - Dataset Description](#Dataset-Description)

- [2 - Import Libraries](#Import-Libraries)

- [3 - Data Loading and Exploration](#Data-Loading-and-Exploration)
    - [3.1 - Load the Dataset](#Load-the-Dataset)
    - [3.2 - Display Basic Information](#Display-Basic-Information)
    - [3.3 - Show Summary Statistics](#Show-Summary-Statistics)
    
- [4 - Data Preprocessing](#Data-Preprocessing)
    - [4.1 - Handle Missing Values](#Handle-Missing-Values)
    - [4.2 - Encode Categorical Variables](#Encode-Categorical-Variables)
    - [4.3 - Feature Scaling](#Feature-Scaling)

- [5 - Data Splitting](#Data-Splitting)
    - [5.1 - Split into Train, Validation, and Test Sets](#Split-into-Train-Validation-and-Test-Sets)
    - [5.2 - Split Data into Features (X) and Target (y)](#Split-Data-into-Features-X-and-Target-y)


- [6 - Model Definition and Training](#Model-Definition)
    - [6.1 - Define the Logistic Regression Model using Sklearn](#Define-the-Logistic-Regression-Model-using-Sklearn)
    - [6.2 - Train the Model](#Train-the-Model)
    - [6.3 - Evaluating Model on Validation Set](#Validation-During-Training)

- [7 - Hypertuning of Model](#Hypertuning-of-Model)

- [8 - Model Evaluation](#Model-Evaluation)
    - [8.1 - Confusion Matrix and Scores](#Confusion-Matrix-and-Scores)
    - [8.2 - ROC Curve](#ROC-Curve)

- [9 - Conclusion](#Conclusion)

# 1 - Introduction

## 1.1 - Project Overview
The goal of this project is to develop a predictive model that can estimate the prices of houses in Bengaluru, India. Accurately predicting house prices is crucial for real estate agents, buyers, and sellers to make informed decisions. By analyzing various factors such as the size of the property, location, and available amenities, we aim to build a machine learning model that can effectively predict house prices based on historical data.

## 1.2 - Problem Statement
The real estate market in Bengaluru is dynamic and influenced by multiple factors, making it challenging to estimate property prices accurately. The primary objective of this project is to address the following questions:

- Can we build an accurate model to predict house prices using historical real estate data from Bengaluru?
- How can we interpret the model's predictions to provide actionable insights for real estate professionals and potential buyers?

By answering these questions, we aim to create a tool that can assist in making more accurate and informed real estate decisions.

## 1.3 - Dataset Description
The dataset used in this project is sourced from Kaggle and contains detailed information on various properties in Bengaluru, India.

### Bengaluru House Data
Each row in the dataset represents a property listing, and each column provides different attributes about the properties.

- **Number of Rows:** 13,320 (properties)
- **Number of Columns:** 9 (8 features plus the target)
- **Target Column:** "price"
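With nine columns and `price` as the target, eight columns remain as candidate input features. The sketch below shows that split. It uses the five rows printed by `df.head()` in Section 3.1 as a small stand-in for the full CSV. The helper name `split_features_target` is hypothetical and is not taken from the notebook.

```python
import pandas as pd

# The five rows printed by df.head() in Section 3.1, used as a stand-in for the full file
sample = pd.DataFrame({
    "area_type": ["Super built-up Area", "Plot Area", "Built-up Area",
                  "Super built-up Area", "Super built-up Area"],
    "availability": ["19-Dec", "Ready To Move", "Ready To Move", "Ready To Move", "Ready To Move"],
    "location": ["Electronic City Phase II", "Chikka Tirupathi", "Uttarahalli",
                 "Lingadheeranahalli", "Kothanur"],
    "size": ["2 BHK", "4 Bedroom", "3 BHK", "3 BHK", "2 BHK"],
    "society": ["Coomee", "Theanmp", None, "Soiewre", None],
    "total_sqft": ["1056", "2600", "1440", "1521", "1200"],
    "bath": [2.0, 5.0, 2.0, 3.0, 2.0],
    "balcony": [1.0, 3.0, 3.0, 1.0, 1.0],
    "price": [39.07, 120.0, 62.0, 95.0, 51.0],
})

def split_features_target(df, target="price"):
    """Hypothetical helper: return (X, y), where X is every column except the target."""
    X = df.drop(columns=[target])
    y = df[target]
    return X, y

X, y = split_features_target(sample)
print(X.shape)  # (5, 8): five rows, eight feature columns
print(list(X.columns))
```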

### Data Composition
The dataset includes the following information:

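- **Area Type (`area_type`):**
  - The type of area (e.g., Super built-up Area, Plot Area, Built-up Area).

- **Availability (`availability`):**
  - The availability status of the property (e.g., Ready To Move, or the date from which the property becomes available, such as 19-Dec).

- **Location (`location`):**
  - The location of the property within Bengaluru.

- **Size (`size`):**
  - The size of the property as a number of bedrooms, stored as text (e.g., 2 BHK, 3 Bedroom).

- **Society (`society`):**
  - The housing society or project the property belongs to. This field is empty for some listings.

- **Total Area (`total_sqft`):**
  - The total area of the property in square feet.

- **Number of Bathrooms (`bath`):**
  - The number of bathrooms in the property.

- **Number of Balconies (`balcony`):**
  - The number of balconies in the property.

- **Price (`price`, target):**
  - The listed price of the property, in lakh INR.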

Together, these columns describe each property's type, location, size, and availability. This lets us analyze and model the factors that influence house prices in Bengaluru.
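Two of these attributes are stored as text, so they need parsing before a model can use them. `size` mixes labels such as "2 BHK" and "4 Bedroom". `total_sqft` is assumed here to sometimes contain a range or a non-numeric entry instead of a plain number. The following is a minimal sketch under those assumptions. The helper names `parse_bedrooms` and `parse_sqft` are hypothetical. Taking the midpoint of a range is one illustrative choice, not the notebook's documented method.

```python
import numpy as np
import pandas as pd

def parse_bedrooms(size):
    """'2 BHK' or '4 Bedroom' -> 2 or 4; missing or unparseable text -> NaN."""
    tokens = size.split() if isinstance(size, str) else []
    return int(tokens[0]) if tokens and tokens[0].isdigit() else np.nan

def parse_sqft(value):
    """'1056' -> 1056.0; an assumed range like '2100 - 2850' -> its midpoint; other text -> NaN."""
    if isinstance(value, (int, float)):
        return float(value)
    if not isinstance(value, str):
        return np.nan
    parts = [p.strip() for p in value.split("-")]
    try:
        numbers = [float(p) for p in parts]
    except ValueError:
        return np.nan  # e.g. text with a unit suffix, which this sketch does not convert
    return sum(numbers) / len(numbers) if len(numbers) in (1, 2) else np.nan

# Hypothetical inputs illustrating each case
sizes = pd.Series(["2 BHK", "4 Bedroom", "3 BHK", None])
sqft = pd.Series(["1056", "2600", "2100 - 2850", "34.46Sq. Meter"])
print(sizes.map(parse_bedrooms).tolist())  # [2.0, 4.0, 3.0, nan]
print(sqft.map(parse_sqft).tolist())       # [1056.0, 2600.0, 2475.0, nan]
```

Returning `NaN` for values that cannot be parsed leaves the decision about those rows to the missing-value handling step, rather than silently guessing a number.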


# [2 - Import Libraries](#Import-Libraries)

In this section, we import the libraries needed for data manipulation, visualization, and building a machine learning model with scikit-learn.


```python
# Basic libraries for data manipulation and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# scikit-learn: preprocessing, model building and training, tuning, and evaluation
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, roc_curve, auc
```
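`LogisticRegression`, `confusion_matrix`, and `roc_curve` are classification tools that expect discrete class labels. `price`, by contrast, is a continuous value, and scikit-learn raises a `ValueError` when a classifier is fitted on a continuous float target. The price must therefore be converted into classes before these imports can be applied. The sketch below shows one way to do this. It assumes a hypothetical binary label meaning "priced above the median" and uses only the numeric columns from the five `df.head()` rows in Section 3.1. The threshold is an illustrative assumption, not the rule this notebook necessarily uses.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Numeric columns from the five df.head() rows in Section 3.1 (stand-in data, not the full dataset)
sample = pd.DataFrame({
    "total_sqft": [1056, 2600, 1440, 1521, 1200],
    "bath": [2.0, 5.0, 2.0, 3.0, 2.0],
    "balcony": [1.0, 3.0, 3.0, 1.0, 1.0],
    "price": [39.07, 120.0, 62.0, 95.0, 51.0],
})
X_scaled = StandardScaler().fit_transform(sample[["total_sqft", "bath", "balcony"]])

# A classifier rejects a continuous target such as the raw price
try:
    LogisticRegression().fit(X_scaled, sample["price"])
    continuous_rejected = False
except ValueError:
    continuous_rejected = True

# Hypothetical labelling rule: 1 if the price is above the median price, else 0
labels = (sample["price"] > sample["price"].median()).astype(int)

model = LogisticRegression()
model.fit(X_scaled, labels)
print(continuous_rejected)  # True
print(labels.tolist())      # [0, 1, 0, 1, 0]
print(model.predict(X_scaled))
```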

# [3 - Data Loading and Exploration](#Data-Loading-and-Exploration)

## [3.1 - Load the Dataset](#Load-the-Dataset)

In this section, we will load the Bengaluru House dataset into a pandas DataFrame for further exploration and analysis.

```python
# Load the dataset into a pandas DataFrame
data_path = './Bengaluru_House_Data.csv'
df = pd.read_csv(data_path)

# Display the first few rows of the dataset to verify loading
df.head()
```

|   | area_type | availability | location | size | society | total_sqft | bath | balcony | price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Super built-up Area | 19-Dec | Electronic City Phase II | 2 BHK | Coomee | 1056 | 2.0 | 1.0 | 39.07 |
| 1 | Plot Area | Ready To Move | Chikka Tirupathi | 4 Bedroom | Theanmp | 2600 | 5.0 | 3.0 | 120.0 |
| 2 | Built-up Area | Ready To Move | Uttarahalli | 3 BHK | NaN | 1440 | 2.0 | 3.0 | 62.0 |
| 3 | Super built-up Area | Ready To Move | Lingadheeranahalli | 3 BHK | Soiewre | 1521 | 3.0 | 1.0 | 95.0 |
| 4 | Super built-up Area | Ready To Move | Kothanur | 2 BHK | NaN | 1200 | 2.0 | 1.0 | 51.0 |
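After `df.head()`, a natural next step is to check column types and count missing values. The rows above already show gaps in `society`. The sketch below runs these checks on the same five rows, re-entered as CSV text so it can run without the full file. The name `df_sample` is hypothetical. On the real data, the same calls apply directly to `df`. In this small sample `total_sqft` parses as an integer column, but in the full file it may be read as text if any entry is not a plain number.

```python
import io
import pandas as pd

# The five rows shown above, re-typed as CSV text so the example is self-contained
csv_text = """area_type,availability,location,size,society,total_sqft,bath,balcony,price
Super built-up Area,19-Dec,Electronic City Phase II,2 BHK,Coomee,1056,2.0,1.0,39.07
Plot Area,Ready To Move,Chikka Tirupathi,4 Bedroom,Theanmp,2600,5.0,3.0,120.0
Built-up Area,Ready To Move,Uttarahalli,3 BHK,,1440,2.0,3.0,62.0
Super built-up Area,Ready To Move,Lingadheeranahalli,3 BHK,Soiewre,1521,3.0,1.0,95.0
Super built-up Area,Ready To Move,Kothanur,2 BHK,,1200,2.0,1.0,51.0
"""
df_sample = pd.read_csv(io.StringIO(csv_text))

# Column types inferred by read_csv (text columns become object)
print(df_sample.dtypes)

# Missing values per column, largest count first; empty CSV fields are read as NaN
missing = df_sample.isna().sum().sort_values(ascending=False)
print(missing)  # society: 2, every other column: 0
```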
