
# 🏠 House Price Prediction — Advanced Regression Techniques

This project predicts house sale prices from the **Kaggle House Prices dataset** using **machine learning regression techniques**.

## 📌 Objective
Predict the **final sale price of a home** from features such as lot size, neighborhood, year built, and number of rooms.

## 📊 Dataset
Source: [Kaggle — House Prices: Advanced Regression Techniques](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques)

- **Train Data**: Includes features and the target variable (`SalePrice`).
- **Test Data**: Includes only features (no `SalePrice`), used for submission.
- **Features**: 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa.

## 🛠 Steps to Perform
1. **Load and Explore Data**
   - Import CSV files using `pandas`.
   - View dataset shape, info, and basic statistics.
   - Check for missing values.

2. **Data Cleaning**
   - Handle missing values (mean/median for numerical, mode for categorical).
   - Remove or replace outliers.
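   - Encode categorical variables (`OneHotEncoder` for nominal features, `OrdinalEncoder` for ordered ones; scikit-learn intends `LabelEncoder` for target labels rather than input features).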

3. **Feature Engineering**
   - Create new features that combine existing ones (for example, total floor area from `TotalBsmtSF`, `1stFlrSF`, and `2ndFlrSF`).
   - Scale numerical values (`StandardScaler`, `MinMaxScaler`).

4. **Model Selection**
   - Try different regression models:
     - Linear Regression
     - Decision Tree Regressor
     - Random Forest Regressor
     - Gradient Boosting (XGBoost, LightGBM)
   - Perform **cross-validation**.

5. **Model Evaluation**
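   - Metrics: RMSE (Root Mean Squared Error) and R² score. Kaggle scores this competition by RMSE between the logarithms of the predicted and actual prices.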
   - Tune hyperparameters.

6. **Final Prediction & Submission**
   - Predict prices for the test dataset.
   - Save results as a `.csv` file with `Id` and `SalePrice` columns for Kaggle submission.
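
The cleaning, scaling, modelling, and evaluation steps above can be sketched as a single scikit-learn pipeline. This is a minimal sketch rather than the project's actual implementation. The helper `build_model`, the toy data, the placeholder `Id` values, and the choice of a random forest with `log1p`/`expm1` on the target are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_model(numeric_cols, categorical_cols):
    """Return an unfitted preprocessing + regression pipeline (hypothetical helper)."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),      # median for numerical gaps
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # mode for categorical gaps
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
    ])


# Toy stand-in for train.csv: column names match the dataset, values are made up.
rng = np.random.default_rng(0)
n = 60
toy = pd.DataFrame({
    "LotArea": rng.integers(5000, 15000, n).astype(float),
    "OverallQual": rng.integers(1, 11, n),
    "Neighborhood": rng.choice(["CollgCr", "Veenker", "NoRidge"], n),
})
toy.loc[::10, "LotArea"] = np.nan  # simulate missing numeric values
toy["SalePrice"] = 50000 + 20000 * toy["OverallQual"] + rng.normal(0, 5000, n)

X = toy.drop(columns="SalePrice")
y_log = np.log1p(toy["SalePrice"])  # train on log prices to mirror the log-based metric

model = build_model(["LotArea", "OverallQual"], ["Neighborhood"])

# 5-fold cross-validation; scikit-learn returns negated RMSE, so flip the sign.
scores = cross_val_score(model, X, y_log, cv=5, scoring="neg_root_mean_squared_error")
cv_rmse = -scores.mean()
print(f"CV RMSE on log(SalePrice): {cv_rmse:.3f}")

# Fit on all rows, predict, and undo the log transform for the submission frame.
model.fit(X, y_log)
preds = np.expm1(model.predict(X.head(3)))
submission = pd.DataFrame({"Id": [1, 2, 3], "SalePrice": preds})  # placeholder Ids
# submission.to_csv("submission.csv", index=False)
```

Keeping imputation, encoding, and scaling inside the pipeline means each cross-validation fold fits them on its own training split, which avoids leaking statistics from the held-out rows.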

## 📦 Requirements
- Python 3.x
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- XGBoost / LightGBM

## 🚀 Future Improvements
- Implement deep learning models (TensorFlow / PyTorch).
- Use advanced feature selection techniques.
- Try stacking and blending models.
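
As a rough illustration of the stacking idea, scikit-learn's `StackingRegressor` trains a meta-model on out-of-fold predictions from several base models. The sketch below uses synthetic data in place of the preprocessed housing features, and the choice of base models and the `RidgeCV` meta-model are assumptions, not a tested recommendation for this dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic regression data stands in for the preprocessed housing features.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=RidgeCV(),  # meta-model fitted on out-of-fold base predictions
    cv=5,
)
stack.fit(X_train, y_train)

# score() reports R^2 on the held-out split.
r2 = stack.score(X_valid, y_valid)
print(f"Validation R^2: {r2:.3f}")
```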

## Loading the Data

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Paths are relative to the notebook's working directory.
train_df = pd.read_csv('house-prices-advanced-regression-techniques/train.csv')
test_df = pd.read_csv('house-prices-advanced-regression-techniques/test.csv')

# Preview the first three rows of the training set.
print(train_df.head(3))



   Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
0   1          60       RL         65.0     8450   Pave   NaN      Reg   
1   2          20       RL         80.0     9600   Pave   NaN      Reg   
2   3          60       RL         68.0    11250   Pave   NaN      IR1   

  LandContour Utilities  ... PoolArea PoolQC Fence MiscFeature MiscVal MoSold  \
0         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      2   
1         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      5   
2         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      9   

  YrSold  SaleType  SaleCondition  SalePrice  
0   2008        WD         Normal     208500  
1   2007        WD         Normal     181500  
2   2008        WD         Normal     223500  

[3 rows x 81 columns]
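
The preview already shows `NaN` in columns such as `Alley` and `PoolQC`, so a per-column count of missing values is a natural next check. The sketch below assumes a hypothetical helper, `missing_summary`, and runs it on a made-up toy frame; in the notebook it would be called as `missing_summary(train_df)`. In this dataset's data description, `NA` in `Alley` means "no alley access", so not every `NaN` is a gap that needs imputing.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return missing counts and percentages per column, largest first (hypothetical helper)."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (100 * counts / len(df)).round(1),
    })
    # Keep only columns that actually have missing values.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame standing in for train_df; values are made up.
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "Alley": [np.nan, np.nan, np.nan, "Grvl"],
    "LotArea": [8450, 9600, 11250, 9550],
})
report = missing_summary(toy)
print(report)
```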


## Data Preprocessing and Analysis

In [7]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallC