# Predicting the Most Valuable Home Improvement Projects In King County

## Introduction

The King County Housing Data Set contains information about the size, location, condition, and other features of houses in King County. A full description of the dataset's columns can be found below. The aim of this project is to develop a linear regression model that can predict which home improvement projects will add to the sale value of homes.

## Business Problem

A client in King County, WA, wants to advise homeowners on home improvement projects that will add to the sale value of their homes.

## Analysis Questions

This analysis will seek to answer three questions about the data:

Question 1: Will enclosing a porch increase the sale price of a home?

Question 2: Is converting a garage to a bedroom a good way to increase the sale price of a home?

Question 3: Will upgrading to a forced-air heating system increase the sale price of a home?
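
Each of these questions can be framed as the coefficient on a yes/no (0/1) feature in a linear regression: with the other features held fixed, the coefficient estimates the price difference associated with that feature. The block below is a minimal sketch on made-up numbers, not the King County data. The variable names (`sqft`, `has_enclosed_porch`) and every dollar amount are assumptions chosen for illustration.

```python
import numpy as np

# Toy data invented for illustration only (not taken from the King County files).
sqft = np.array([1000.0, 1500.0, 2000.0, 1200.0, 1800.0, 2200.0])
has_enclosed_porch = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])

# Prices built without noise: base price + per-square-foot rate + porch premium.
price = 100_000 + 150 * sqft + 20_000 * has_enclosed_porch

# Design matrix: intercept column, living area, and the 0/1 porch indicator.
X = np.column_stack([np.ones_like(sqft), sqft, has_enclosed_porch])

# Ordinary least squares fit; coef holds [intercept, per_sqft, porch_premium].
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, per_sqft, porch_premium = coef

print(f"Estimated porch premium: {porch_premium:,.0f}")
```

Because the toy prices contain no noise, the fit recovers the premium that was built in. On real sales data the estimate would carry uncertainty and would depend on which other features the model controls for.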

## Previewing the Data

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
saleprice = pd.read_csv('../../data/raw/EXTR_RPSale.csv')
saleprice.head(2)

Unnamed: 0,ExciseTaxNbr,Major,Minor,DocumentDate,SalePrice,RecordingNbr,Volume,Page,PlatNbr,PlatType,...,PropertyType,PrincipalUse,SaleInstrument,AFForestLand,AFCurrentUseLand,AFNonProfitUse,AFHistoricProperty,SaleReason,PropertyClass,SaleWarning
0,2857854,198920,1430,03/28/2017,0,20170410000541,,,,,...,3,7,15,N,N,N,N,16,2,20 31
1,2743355,638580,110,07/14/2015,190000,20150715002686,,,,,...,3,6,3,N,N,N,N,1,8,15
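
The first previewed row has a `SalePrice` of 0, and `DocumentDate` is stored as text in `MM/DD/YYYY` form. One possible cleaning step, sketched here on the two previewed rows, is to parse the dates and keep only rows with a positive price. Dropping zero-price rows is an assumption about how such records should be treated, not a step the source confirms, and `clean_sales` is a hypothetical helper name.

```python
import pandas as pd

# The two rows from the preview above, restricted to the columns used here.
preview = pd.DataFrame({
    "Major": [198920, 638580],
    "Minor": [1430, 110],
    "DocumentDate": ["03/28/2017", "07/14/2015"],
    "SalePrice": [0, 190000],
})

def clean_sales(df):
    """Hypothetical helper: parse document dates and keep positive-price rows."""
    out = df.copy()
    # DocumentDate appears as MM/DD/YYYY strings in the preview.
    out["DocumentDate"] = pd.to_datetime(out["DocumentDate"], format="%m/%d/%Y")
    # Assumption: a price of 0 does not represent a market sale.
    return out[out["SalePrice"] > 0].reset_index(drop=True)

cleaned = clean_sales(preview)
print(cleaned)
```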


In [4]:
saleprice.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 351067 entries, 0 to 351066
Data columns (total 24 columns):
 #   Column              Non-Null Count   Dtype 
---  ------              --------------   ----- 
 0   ExciseTaxNbr        351067 non-null  int64 
 1   Major               351067 non-null  int64 
 2   Minor               351067 non-null  int64 
 3   DocumentDate        351067 non-null  object
 4   SalePrice           351067 non-null  int64 
 5   RecordingNbr        351067 non-null  object
 6   Volume              351067 non-null  object
 7   Page                351067 non-null  object
 8   PlatNbr             351067 non-null  object
 9   PlatType            351067 non-null  object
 10  PlatLot             351067 non-null  object
 11  PlatBlock           351067 non-null  object
 12  SellerName          351067 non-null  object
 13  BuyerName           351067 non-null  object
 14  PropertyType        351067 non-null  int64 
 15  PrincipalUse        351067 non-null  int64 
 16  Sa

In [6]:
housing_data = pd.read_csv('../../data/raw/EXTR_ResBldg.csv')
housing_data.head(2)

Unnamed: 0,Major,Minor,BldgNbr,NbrLivingUnits,Address,BuildingNumber,Fraction,DirectionPrefix,StreetName,StreetType,...,FpMultiStory,FpFreestanding,FpAdditional,YrBuilt,YrRenovated,PcntComplete,Obsolescence,PcntNetCondition,Condition,AddnlCost
0,9800,720,1,1,27719 SE 26TH WAY 98075,27719,,SE,26TH,WAY,...,0,0,0,2001,0,0,0,0,3,0
1,9802,140,1,1,2829 277TH TER SE 98075,2829,,,277TH,TER,...,0,0,0,2004,0,0,0,0,3,0


In [7]:
housing_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181510 entries, 0 to 181509
Data columns (total 50 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   Major               181510 non-null  int64  
 1   Minor               181510 non-null  int64  
 2   BldgNbr             181510 non-null  int64  
 3   NbrLivingUnits      181510 non-null  int64  
 4   Address             181510 non-null  object 
 5   BuildingNumber      181510 non-null  object 
 6   Fraction            181510 non-null  object 
 7   DirectionPrefix     181146 non-null  object 
 8   StreetName          181510 non-null  object 
 9   StreetType          181510 non-null  object 
 10  DirectionSuffix     181146 non-null  object 
 11  ZipCode             154594 non-null  object 
 12  Stories             181510 non-null  float64
 13  BldgGrade           181510 non-null  int64  
 14  BldgGradeVar        181510 non-null  int64  
 15  SqFt1stFloor        181510 non-nul

In [8]:
parcel = pd.read_csv('../../data/raw/EXTR_Parcel.csv')
parcel.head(2)

Unnamed: 0.1,Unnamed: 0,Major,Minor,PropName,PlatName,PlatLot,PlatBlock,Range,Township,Section,...,SeismicHazard,LandslideHazard,SteepSlopeHazard,Stream,Wetland,SpeciesOfConcern,SensitiveAreaTract,WaterProblems,TranspConcurrency,OtherProblems
0,0,807841,410,,SUMMER RIDGE DIV NO. 02,41,,6,25,22,...,N,N,N,N,N,N,N,N,N,N
1,2,755080,15,,SANDER'S TO GILMAN PK & SALMON BAY,3,1.0,3,25,11,...,N,N,N,N,N,N,N,N,N,N
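
All three previews include `Major` and `Minor` columns, so one plausible way to combine the tables is to merge on that pair. The sketch below uses small made-up frames. Only the column names `Major`, `Minor`, `SalePrice`, and `SqFt1stFloor` come from the previews, and treating `Major` plus `Minor` as a unique key for one building is an assumption.

```python
import pandas as pd

# Toy frames with invented values; only the column names come from the previews.
sales = pd.DataFrame({
    "Major": [1, 1, 2],
    "Minor": [10, 20, 10],
    "SalePrice": [300000, 450000, 500000],
})
buildings = pd.DataFrame({
    "Major": [1, 2, 3],
    "Minor": [10, 10, 10],
    "SqFt1stFloor": [900, 1200, 1500],
})

# Inner merge keeps only sales that have a matching building record.
# validate="many_to_one" raises an error if the right-hand keys are not unique.
merged = sales.merge(
    buildings, on=["Major", "Minor"], how="inner", validate="many_to_one"
)
print(merged)
```

On the real files, the `validate` check may fail if a parcel has more than one building record (the building table carries a `BldgNbr` column). In that case the duplicate keys would need to be resolved before merging.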


In [9]:
parcel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205199 entries, 0 to 205198
Data columns (total 82 columns):
 #   Column                  Non-Null Count   Dtype  
---  ------                  --------------   -----  
 0   Unnamed: 0              205199 non-null  int64  
 1   Major                   205199 non-null  int64  
 2   Minor                   205199 non-null  int64  
 3   PropName                196088 non-null  object 
 4   PlatName                176654 non-null  object 
 5   PlatLot                 205199 non-null  object 
 6   PlatBlock               205199 non-null  object 
 7   Range                   205199 non-null  int64  
 8   Township                205199 non-null  int64  
 9   Section                 205199 non-null  int64  
 10  QuarterSection          205199 non-null  object 
 11  PropType                205199 non-null  object 
 12  Area                    205193 non-null  float64
 13  SubArea                 205193 non-null  float64
 14  SpecArea            