Housing Price Prediction Model

This repository contains an exploration and implementation of various machine learning models to predict housing prices. The project was developed as part of the Machine Learning course at the Carlos III University of Madrid.

Overview

The project centered on understanding the relationship between various input variables and housing prices. The primary goal was to build a model that predicts the price of a house from these input features.

Data Preprocessing

  • One-Hot Encoding: Applied to all categorical variables; variables judged irrelevant were discarded.
  • Normalization: Z-score normalization was applied to real (float) and integer features. The price was treated separately because of its particular distribution.
  • Visualizations: Helper functions were used to plot the distributions of, and relations between, variables.
  • Correlation Matrix: Computed to visualize relationships among variables.
  • Data Splitting: The dataset was split into training (80%) and test (20%) sets, as outlined in the sketch below.
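
A minimal sketch of this preprocessing pipeline, assuming a pandas DataFrame loaded from a hypothetical houses.csv with a price target; the column names are illustrative, not the project's actual schema:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("houses.csv")  # hypothetical file name

# One-hot encode categorical columns and drop variables judged irrelevant (names are placeholders)
df = pd.get_dummies(df, columns=["neighbourhood", "house_type"], drop_first=True)
df = df.drop(columns=["id"])

# Z-score normalization of the numeric inputs; the price target is deliberately left out,
# since it received special treatment in the project
numeric_cols = ["surface_m2", "bedrooms", "bathrooms", "rating"]  # placeholders
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])

# 80/20 train/test split
X = df.drop(columns=["price"])
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)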

Feature Engineering

A key component that set our model apart was the exploration of non-linear relations between features in our dataset. This exploration led to a distinctive set of input features; in particular, a combination of bathrooms, bedrooms, and rating (aseos + hab*rating) showed a linear correlation with the price. A sketch of the idea follows.
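
A short sketch of that engineered feature, assuming the DataFrame from the preprocessing sketch exposes columns named aseos (bathrooms), hab (bedrooms), and rating:

# Combine bathrooms, bedrooms, and rating into a single feature (column names are assumptions)
df["aseos_hab_rating"] = df["aseos"] + df["hab"] * df["rating"]

# Quick check of its correlation with the price target
print(df[["aseos_hab_rating", "price"]].corr())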

Model Exploration

  • Baseline Model: Linear regression from scikit-learn was used as the baseline, and its performance was surprisingly competitive.
  • Other Models Tested: RandomForestRegressor, ElasticNet, Lasso, Ridge, DecisionTreeRegressor, KNeighborsRegressor, GradientBoostingRegressor, AdaBoostRegressor, and CatBoostRegressor. A comparison loop is sketched below.
  • Model Optimization: Hyperparameters were tuned most extensively for GradientBoostingRegressor and CatBoostRegressor.
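
A sketch of how such a comparison could look, reusing X_train/X_test/y_train/y_test from the preprocessing sketch above; the hyperparameters shown are library defaults, not the tuned values used in the project:

from sklearn.linear_model import LinearRegression, ElasticNet, Lasso, Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_percentage_error
from catboost import CatBoostRegressor

models = {
    "linear (baseline)": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=42),
    "elastic_net": ElasticNet(),
    "lasso": Lasso(),
    "ridge": Ridge(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "knn": KNeighborsRegressor(),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
    "adaboost": AdaBoostRegressor(random_state=42),
    "catboost": CatBoostRegressor(verbose=0, random_state=42),
}

# Fit each candidate and report its test-set MAPE
for name, model in models.items():
    model.fit(X_train, y_train)
    mape = mean_absolute_percentage_error(y_test, model.predict(X_test))
    print(f"{name}: MAPE = {mape:.2%}")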

Metrics

  • MAE (Mean Absolute Error): Measures the average absolute difference between predicted and actual prices.
  • MAPE (Mean Absolute Percentage Error): Expresses the error as a percentage, which is more interpretable and scale-independent. Both are computed as in the snippet below.
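
Both metrics are available in scikit-learn; the values below are toy numbers for illustration only:

from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

y_true = [200_000, 350_000, 150_000]  # toy actual prices
y_pred = [210_000, 330_000, 160_000]  # toy predictions

print("MAE :", mean_absolute_error(y_true, y_pred))             # average absolute error, in price units
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))  # fraction, e.g. 0.058 == 5.8%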

Results

The standout model in terms of performance was CatBoostRegressor, achieving MAPE values of around 10-12% on our test sample. In the competition held in class, its MAPE was 12.91%, the best result among all groups and 0.64 percentage points better than the second-best group.

Usage

You can test the model yourself by running the following commands:

cd validator
python validator.py

References