- Project Statement: Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
In this project I built regression models capable of predicting the sale price of a given house from a wide variety of independent variables.
Model building started with a tree-based algorithm, the Random Forest Regressor, followed by SVR, Linear Regression, an XGBoost Regressor, and finally a Decision Tree Regressor. In total, five models were built and compared.
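Below is a minimal sketch of how these five models can be trained and compared with scikit-learn and xgboost. The hyperparameters and the synthetic placeholder data are illustrative only, standing in for the prepared Ames features; they are not the project's actual settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

# Synthetic stand-in for the prepared training features and SalePrice target.
X_train, y_train = make_regression(n_samples=500, n_features=20, noise=10, random_state=42)

models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "SVR": SVR(kernel="rbf"),
    "Linear Regression": LinearRegression(),
    "XGBoost": XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=42),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
}

# 5-fold cross-validated RMSE for each candidate model.
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train,
                             scoring="neg_root_mean_squared_error", cv=5)
    print(f"{name}: RMSE = {-scores.mean():.3f}")
```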
The dataset for this project is from Kaggle's House Prices: Advanced Regression Techniques competition. It can be found at: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data
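Assuming the competition files have been downloaded from the link above into a local `data/` folder (the path is an assumption, adjust it to your setup), they can be loaded with pandas:

```python
import pandas as pd

# Paths assume the Kaggle files were downloaded into a local "data/" folder.
train = pd.read_csv("data/train.csv")
test = pd.read_csv("data/test.csv")

# The training set contains the 79 explanatory variables plus Id and SalePrice;
# the test set has no SalePrice column.
print(train.shape, test.shape)
```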
The following tools are used in the notebooks. They are popular tools that machine learning engineers and data scientists rely on day to day.
Python is a high-level programming language that has gained wide popularity in the data community; with its rapidly growing ecosystem of libraries and frameworks, it is a natural choice for machine learning.
NumPy is a scientific computing tool used for array or matrix operations.
Pandas is a great and simple tool for analyzing and manipulating data from a variety of different sources.
Matplotlib is a comprehensive data visualization tool used to create static, animated, and interactive visualizations in Python.
Scikit-Learn: Instead of building machine learning models from scratch, Scikit-Learn makes it easy to use classical models in a few lines of code. It is adopted across almost the entire ML community and industry, from startups to big tech.
- By taking a look at the data description, I noticed that for most of these variables an NA value actually means the feature is absent from that observation. For example, an observation whose PoolQC variable is NA simply has no pool. Hence, I replaced these NA values with None.
- Some categorical variables, such as MSSubClass, are encoded as numbers, which most machine learning models would misinterpret as continuous values, so I converted these to strings as well. Both wrangling steps are sketched in the snippet after these notes.
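A sketch of both wrangling steps, assuming the `train` and `test` DataFrames loaded above. The list of "NA means absent" columns below is a representative subset taken from the data description, not the full set used in the notebooks.

```python
# Columns where NA in the raw data means the feature is absent (subset shown).
none_cols = [
    "PoolQC", "MiscFeature", "Alley", "Fence", "FireplaceQu",
    "GarageType", "GarageFinish", "GarageQual", "GarageCond",
    "BsmtQual", "BsmtCond", "BsmtExposure", "BsmtFinType1", "BsmtFinType2",
]

for df in (train, test):
    # Replace NA with the explicit category "None" rather than leaving it missing.
    df[none_cols] = df[none_cols].fillna("None")
    # MSSubClass is a numeric code for the dwelling type, so treat it as categorical.
    df["MSSubClass"] = df["MSSubClass"].astype(str)
```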
- Loading training and test data
- Data Wrangling
- Exploratory Data Analysis
- Train-Test Split
- Feature Selection
- Model Selection And Evaluation
- Final model building
- Inference
- Submission for competition
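The outline above roughly corresponds to the end-to-end sketch below. It is illustrative only: it assumes the `train` and `test` DataFrames from the earlier snippets, assumes the remaining categorical columns have already been numerically encoded, and skips the EDA and explicit feature-selection details. XGBoost is shown as one example choice of final model.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Train/validation split on the wrangled training data.
X = train.drop(columns=["Id", "SalePrice"])
y = train["SalePrice"]
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a final model and check validation performance.
model = XGBRegressor(n_estimators=500, learning_rate=0.05, random_state=42)
model.fit(X_tr, y_tr)
print("Validation R^2:", model.score(X_val, y_val))

# Inference on the test set and the competition submission file.
preds = model.predict(test.drop(columns=["Id"]))
submission = pd.DataFrame({"Id": test["Id"], "SalePrice": preds})
submission.to_csv("submission.csv", index=False)
```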