Assignment topic: House_Prices

PROJECT

IBM Exploratory Data Analysis for Machine Learning

Course Project

Hannah_Reber

20.10.2020

project:

https://www.coursera.org/learn/ibm-exploratory-data-analysis-for-machine-learning/home/welcome

Assignment topic: House_Prices

Dataset download and description

https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data "With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home."

Dataset summary

The original dataset needed a lot of cleaning, but it is very diverse and well suited for cleaning practice.

Initial plan for data exploration: Predict House_Prices

aiming to predict the sales prices

"Predict sales prices and practice feature engineering, RFs, and gradient boosting"

Actions taken for data cleaning and feature engineering

all actions taken: see sub/nb1-4

NA handling
Assigning numeric values to all columns based on value count ranks
Merging repetitive columns
Assigning constant
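
The cleaning steps listed above could look roughly like the following. This is a minimal sketch, not the code from the sub notebooks: the toy values are made up, `GarageType` and `LotFrontage` are used only as example columns, `rank_encode` is a hypothetical helper, and reading "constant" as an intercept column for a later OLS fit is an assumption.

```python
import pandas as pd

# Toy frame standing in for the Kaggle data; values are made up for illustration.
df = pd.DataFrame({
    "GarageType": ["Attchd", "Detchd", None, "Attchd", "Attchd", "Detchd"],
    "LotFrontage": [65.0, None, 80.0, 70.0, None, 60.0],
})

# NA handling: a label for missing categories, the median for missing numbers.
df["GarageType"] = df["GarageType"].fillna("None")
df["LotFrontage"] = df["LotFrontage"].fillna(df["LotFrontage"].median())


def rank_encode(col: pd.Series) -> pd.Series:
    """Map each category to its value-count rank (1 = most frequent)."""
    counts = col.value_counts()
    ranks = counts.rank(method="first", ascending=False).astype(int)
    return col.map(ranks)


# Numeric encoding based on value count ranks.
df["GarageType_rank"] = rank_encode(df["GarageType"])

# Constant column (assumed here to serve as the intercept term of an OLS model).
df["const"] = 1.0
```

Rank encoding keeps the frame fully numeric, but it imposes an ordering by frequency that may not reflect any real ordering of the categories.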

Key Findings and Insights

all key findings in statistical-analysis.ipynb

Correlations Top 5:

1. RANK: 'OverallQual'
2. RANK: 'GrLivArea'
3. RANK: 'GarageCars'
4. RANK: 'GarageArea'
5. RANK: 'TotalBsmtSF'
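
A ranking like this can be obtained by sorting the absolute correlations with the target. The sketch below runs on synthetic data, so its numbers say nothing about the real dataset; it assumes the target column is named `SalePrice`, and the `Noise` column is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: two informative features and one pure-noise column.
rng = np.random.default_rng(0)
n = 200
qual = rng.integers(1, 11, size=n)
area = 800 + 150 * qual + rng.normal(0, 200, size=n)
df = pd.DataFrame({
    "OverallQual": qual,
    "GrLivArea": area,
    "Noise": rng.normal(size=n),
    "SalePrice": 20000 * qual + 50 * area + rng.normal(0, 20000, size=n),
})

# Pearson correlation of every feature with the target, ranked by magnitude.
corr = df.corr()["SalePrice"].drop("SalePrice")
top = corr.abs().sort_values(ascending=False).head(5)
print(top)
```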

Time dependency (seasonality):

density (= number of houses sold) varies over time: peaks in summers
price range: most houses between 10K and 30K, only a handful of prices <10K or >30K
relation: prices also seem to increase during the selling season (= summer)
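
One way to check such a seasonal pattern is to aggregate sales by month. The following is a sketch on made-up rows; it assumes the month of sale is stored in a column named `MoSold` and the price in `SalePrice`, which may differ from the columns used in the notebooks.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
df = pd.DataFrame({
    "MoSold": [1, 6, 6, 7, 7, 7, 12],
    "SalePrice": [120000, 180000, 200000, 210000, 190000, 230000, 130000],
})

# Number of sales (density) and median price per month of sale.
monthly = df.groupby("MoSold")["SalePrice"].agg(
    n_sold="count", median_price="median"
)
peak_month = monthly["n_sold"].idxmax()
print(monthly)
print("month with most sales:", peak_month)
```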

Hypotheses

3 main hypotheses used in analysis

1. H1: Sales prices depend on seasonality
2. H2: Summer is the best selling season
3. H3: Overall quality is the most important real estate property

Significance testing

full OLS models in sub4_STEP_4_OLS.ipynb

Significance was tested with OLS models and visualized with scatter plots, box plots, correlations, and time series.
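
To illustrate what an OLS significance check computes, here is a minimal NumPy sketch on synthetic data. It is not the model from sub4_STEP_4_OLS.ipynb: the variable names and the data-generating numbers are assumptions, and a statistics library would normally also report p-values alongside the t-statistics.

```python
import numpy as np

# Synthetic data: price depends on quality, not on the noise feature.
rng = np.random.default_rng(42)
n = 300
quality = rng.integers(1, 11, size=n).astype(float)
noise_feature = rng.normal(size=n)
price = 50000 + 30000 * quality + rng.normal(0, 25000, size=n)

# Design matrix with a constant column for the intercept.
X = np.column_stack([np.ones(n), quality, noise_feature])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, price, rcond=None)

# Standard errors and t-statistics of the coefficients.
resid = price - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se

for name, b, t in zip(["const", "quality", "noise_feature"], beta, t_stats):
    print(f"{name:>14}: coef={b:12.1f}  t={t:8.2f}")
```

A coefficient with a large absolute t-statistic is unlikely to be zero, which is the sense in which a feature is called significant.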

Next steps

ML model

Suggested next step: use the cleaned data to train a deep learning model and compare its predictions.

Contents_of_this_Repo

1/ analysis_notebook=summary+_overview_analysis

main_notebook = statistical-analysis.ipynb

2/ files_g1:9

png_images_generated_in_sub_notebooks_and_integrated_in_main_notebook = statistical-analysis.ipynb

3/ data_folder

input_and_output_csv = data

4/ subs_folder

all_sub notebooks_with_detailed_analysis_steps = subs

hannahaih/Project_House_Prices
