<a href="https://colab.research.google.com/github/silentfortin/ai-portfolio/blob/main/03-ml-housing-prediction/RealEstateAI_PricePrediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RealEstateAI Solutions - predictive model for real estate pricing

> 👨‍💻 Developed as part of the **AI Engineering Master – Week 2**

This project builds a predictive model for real estate prices using regularized linear regression: Ridge, Lasso, and Elastic Net. Regularization is used to reduce overfitting and improve generalization, so that price estimates remain robust on unseen data.

## Key Goals:
- Load and preprocess the housing dataset
- Handle missing values and categorical variables
- Scale features for regularization methods
- Train Ridge, Lasso, and Elastic Net models
- Evaluate models using Cross-Validation and MSE
- Compare performance and model sparsity
- Visualize results and residuals
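
The goals above can be sketched end to end on synthetic data. This is a minimal illustration rather than the notebook's actual pipeline: the data comes from scikit-learn's `make_regression`, and the `alpha` and `l1_ratio` values are arbitrary assumptions chosen only for demonstration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (assumption: 200 rows, 10 features,
# only 4 of which actually drive the target).
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=10.0, random_state=0
)

# Hypothetical hyperparameters, picked only for illustration.
models = {
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=1.0),
    "ElasticNet": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

results = {}
for name, model in models.items():
    # Scaling inside the pipeline is refit on each training fold,
    # so no information leaks from the validation fold.
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(
        pipeline, X, y, cv=5, scoring="neg_mean_squared_error"
    )
    pipeline.fit(X, y)
    coefficients = pipeline[-1].coef_
    results[name] = {
        "cv_mse": -scores.mean(),  # the scorer returns negative MSE
        "zero_coefs": int(np.sum(coefficients == 0)),  # sparsity measure
    }

for name, stats in results.items():
    print(f"{name}: CV MSE={stats['cv_mse']:.1f}, "
          f"zero coefficients={stats['zero_coefs']}")
```

Counting exactly-zero coefficients is one simple way to compare sparsity: Ridge only shrinks coefficients, while Lasso and Elastic Net can set some of them to zero.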

🔗 GitHub Repository:
[📁 ai-portfolio]()


## Load & Explore Dataset

- Load the housing data from Kaggle with `kagglehub`
- Display dataset info and initial rows

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [None]:
import kagglehub
from kagglehub import KaggleDatasetAdapter

In [None]:
# Set the path to the file you'd like to load
file_path = "Housing.csv"

# Load the latest version
df = kagglehub.load_dataset(
    KaggleDatasetAdapter.PANDAS,
    "yasserh/housing-prices-dataset",
    file_path,
)

df.head(2)

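|   | price | area | bedrooms | bathrooms | stories | mainroad | guestroom | basement | hotwaterheating | airconditioning | parking | prefarea | furnishingstatus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13300000 | 7420 | 4 | 2 | 3 | yes | no | no | no | yes | 2 | yes | furnished |
| 1 | 12250000 | 8960 | 4 | 4 | 4 | yes | no | no | no | yes | 3 | no | furnished |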


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   price             545 non-null    int64 
 1   area              545 non-null    int64 
 2   bedrooms          545 non-null    int64 
 3   bathrooms         545 non-null    int64 
 4   stories           545 non-null    int64 
 5   mainroad          545 non-null    object
 6   guestroom         545 non-null    object
 7   basement          545 non-null    object
 8   hotwaterheating   545 non-null    object
 9   airconditioning   545 non-null    object
 10  parking           545 non-null    int64 
 11  prefarea          545 non-null    object
 12  furnishingstatus  545 non-null    object
dtypes: int64(6), object(7)
memory usage: 55.5+ KB


In [None]:
df.isna().sum()

| Column | Missing values |
|---|---|
| price | 0 |
| area | 0 |
| bedrooms | 0 |
| bathrooms | 0 |
| stories | 0 |
| mainroad | 0 |
| guestroom | 0 |
| basement | 0 |
| hotwaterheating | 0 |
| airconditioning | 0 |


## Useful information

*   The dataset contains 545 rows and 13 columns.
*   There are two data types: `int64` for the six numerical columns and `object` for the seven categorical columns.
*   `df.isna().sum()` returns 0 for every column shown, and `df.info()` reports 545 non-null values in all 13 columns, so the dataset has no missing values.
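
The observations above can also be checked programmatically. The sketch below rebuilds the two preview rows shown earlier as a stand-in `sample` frame (an assumption, so the snippet runs without a Kaggle download), separates numeric from categorical columns, and shows one possible encoding. The `encoded` name and the use of `pd.get_dummies` are illustrative choices, not necessarily the approach taken later in the notebook.

```python
import pandas as pd

# Stand-in for `df`: the two rows displayed by df.head(2).
sample = pd.DataFrame({
    "price": [13300000, 12250000],
    "area": [7420, 8960],
    "bedrooms": [4, 4],
    "bathrooms": [2, 4],
    "stories": [3, 4],
    "mainroad": ["yes", "yes"],
    "guestroom": ["no", "no"],
    "basement": ["no", "no"],
    "hotwaterheating": ["no", "no"],
    "airconditioning": ["yes", "yes"],
    "parking": [2, 3],
    "prefarea": ["yes", "no"],
    "furnishingstatus": ["furnished", "furnished"],
})

# Split columns by type instead of listing them by hand.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Total number of missing cells across the whole frame.
total_missing = int(sample.isna().sum().sum())

# One possible encoding (assumption): yes/no columns become 1/0,
# and the multi-valued furnishingstatus column is one-hot encoded.
binary_cols = [col for col in categorical_cols if col != "furnishingstatus"]
encoded = sample.copy()
for col in binary_cols:
    encoded[col] = (encoded[col] == "yes").astype(int)
encoded = pd.get_dummies(encoded, columns=["furnishingstatus"], dtype=int)

print(f"{len(numeric_cols)} numeric, {len(categorical_cols)} categorical, "
      f"{total_missing} missing values")
```

Running the same column split on the full `df` should reproduce the six numeric and seven categorical columns reported by `df.info()`.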

