# Example EDA on the King County Housing Dataset
## Abstract
This notebook provides an example of an exploratory data analysis (EDA) of a dataset of house sales in King County, Washington, USA. First, a (fictitious) stakeholder who wants to sell her houses is introduced to define the scope and guide the investigation into the data. Her main priorities, the definitions and assumptions, and the questions she needs answered are discussed.

Second, the investigation itself is presented, structured and commented so that it can be easily followed by, e.g., other data scientists.
Finally, the results of the inquiry are summarized and actionable recommendations for the stakeholder are formulated.

## Table of Contents
1. [The King County Housing Price Dataset](#the-king-county-housing-price-dataset)
2. [Stakeholder Perspective](#Stakeholder-Perspective)
3. EDA
4. Summary and Conclusions

## The King County Housing Price Dataset
The [King County Housing Price Dataset](https://www.kaggle.com/harlfoxem/housesalesprediction) provides data on over 20,000 house sales in King County, Washington, located in the center of the Seattle metropolitan area. The data covers the period from May 2014 to May 2015 and comprises details such as price, year of construction, number of bedrooms, and geographical location. Our client is a prospective seller looking for new market insights. She has tasked us with an exploratory data analysis of the dataset, tailored to her individual situation.

The dataset has 21,597 rows and 20 columns. Each row represents one transaction for a house that was sold between May 2, 2014 and May 27, 2015. The columns provide details of the houses, such as price, living space, or location. The column descriptions below are borrowed from [GeoDa Data and Lab](https://geodacenter.github.io/data-and-lab/KingCounty-HouseSales2015/).

| Variable | Description |
|----------|-------------|
| id | Identification |
| date | Date sold |
| price | Sale price |
| bedrooms | Number of bedrooms |
| bathrooms | Number of bathrooms |
| sqft_liv | Size of living area in square feet |
| sqft_lot | Size of the lot in square feet |
| floors | Number of floors |
| waterfront | ‘1’ if the property has a waterfront, ‘0’ if not |
| view | An index from 0 to 4 of how good the view of the property was |
| condition | Condition of the house, ranked from 1 to 5 |
| grade | Classification by construction quality, which refers to the types of materials used and the quality of workmanship. Buildings of better quality (higher grade) cost more to build per unit of measure and command higher value. Additional information in: KingCounty |
| sqft_above | Square feet above ground |
| sqft_basmt | Square feet below ground |
| yr_built | Year built |
| yr_renov | Year renovated. ‘0’ if never renovated |
| zipcode | 5-digit zip code |
| lat | Latitude |
| long | Longitude |
| squft_liv15 | Average size of interior housing living space for the closest 15 houses, in square feet |
| squft_lot15 | Average size of land lots for the closest 15 houses, in square feet |
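
As a quick sanity check on the figures above, the data can be loaded with pandas and its shape and date range inspected. The following is a minimal sketch: the helper `load_sales` is hypothetical, the in-memory CSV is made-up demonstration data standing in for the real file, and only the `date` column name from the table above is relied upon.

```python
import io

import pandas as pd


def load_sales(source):
    """Read house sales from a CSV path or file-like object (hypothetical helper)."""
    df = pd.read_csv(source)
    # Parse the sale date so it can be used for time-based grouping later
    df["date"] = pd.to_datetime(df["date"])
    return df


# Tiny made-up stand-in for the real file, for demonstration only
sample_csv = io.StringIO(
    "id,date,price,bedrooms\n"
    "1,2014-05-02,250000,2\n"
    "2,2015-05-27,410000,3\n"
)
sample = load_sales(sample_csv)

print(sample.shape)                               # (rows, columns)
print(sample["date"].min(), sample["date"].max())  # first and last sale date
```

Applied to the actual file (for example `load_sales("kc_house_data.csv")`, where the file name is an assumption), the same two `print` calls would show whether the row count, column count, and date range match the description above. Depending on the copy of the data, column names may differ slightly from those in the table.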

## Stakeholder Perspective
* Stakeholder: Bonnie Williams
* Age: 29 years
* Occupation: Entrepreneur

Bonnie owns several houses across King County, mostly in "bad" neighborhoods, and is looking to sell, preferably with large returns. She is open to renovating the houses before selling if it increases her bottom line. Bonnie inherited a few houses, exclusively of grade 3 to 7, from her aunt Marie, i.e., her buildings are of lower construction quality. Bonnie wants to sell in order to raise enough money to start her own data science venture in King County. After discussing Bonnie's priorities, we can note the goals and assumptions that will guide our analysis:

* Scope: Bonnie has limited time because she wants to start her new venture as soon as possible. She therefore wants to **sell within the year**. She needs first results of the EDA within **two days**.
* Bonnie wants to:
  * learn about the **neighborhoods that have similar houses** and where they are in King County
  * understand the **pricing for comparable houses** to those she wants to sell
  * learn whether **renovating could increase her revenue**
  * learn **when to best sell within the year**

  
We'll also make the following assumptions:

* Bonnie's houses typically have 1-2 bedrooms, a grade of 3-7, no more than 2 bathrooms, and a lot size of less than 10,000 sqft
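
These assumptions translate directly into a row filter on the dataset. The sketch below shows one way to express it, assuming the column names `bedrooms`, `grade`, `bathrooms` and `sqft_lot` from the table above. The helper `comparable_houses` and the demonstration rows are hypothetical.

```python
import pandas as pd


def comparable_houses(df):
    """Keep rows matching the assumed profile of Bonnie's houses."""
    mask = (
        df["bedrooms"].between(1, 2)      # 1-2 bedrooms (inclusive)
        & df["grade"].between(3, 7)       # grade 3-7 (inclusive)
        & (df["bathrooms"] <= 2)          # no more than 2 bathrooms
        & (df["sqft_lot"] < 10_000)       # lot smaller than 10,000 sqft
    )
    return df[mask]


# Made-up rows for demonstration only
demo = pd.DataFrame({
    "bedrooms": [2, 4, 1, 2],
    "grade": [6, 7, 8, 3],
    "bathrooms": [1.0, 2.5, 1.0, 2.0],
    "sqft_lot": [5000, 8000, 4000, 12000],
})
similar = comparable_houses(demo)
print(similar)
```

In the demonstration data only the first row passes all four conditions: the second has too many bedrooms, the third has too high a grade, and the fourth has too large a lot.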

Based on these assumptions, Bonnie asks us to find at least three useful insights into the data (at least one of them geographical), as well as actionable recommendations on how to best achieve her goals in the current market.
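
Two of Bonnie's questions, when to sell and whether renovating pays off, can be approached with simple grouped summaries. The following is a rough sketch rather than the analysis itself: the helper names are hypothetical, the rows are made up, and the columns `date`, `price` and `yr_renov` are assumed to be named as in the table above. A difference in medians between renovated and non-renovated houses is only a first indication, not proof that renovating raises the sale price.

```python
import pandas as pd


def median_price_by_month(df):
    """Median sale price per calendar month of sale (1 = January)."""
    month = pd.to_datetime(df["date"]).dt.month
    return df.groupby(month)["price"].median()


def median_price_by_renovation(df):
    """Median sale price split by whether the house was ever renovated."""
    # yr_renov is 0 if the house was never renovated
    status = (df["yr_renov"] > 0).map({True: "renovated", False: "not renovated"})
    return df.groupby(status)["price"].median()


# Made-up rows for demonstration only
demo_sales = pd.DataFrame({
    "date": ["2014-05-02", "2014-05-20", "2014-11-03", "2015-04-15"],
    "price": [300000, 340000, 280000, 360000],
    "yr_renov": [0, 1995, 0, 2005],
})

by_month = median_price_by_month(demo_sales)
by_renovation = median_price_by_renovation(demo_sales)
print(by_month)
print(by_renovation)
```

In practice both summaries would be computed on the subset of houses comparable to Bonnie's rather than on the full dataset, so that the comparison reflects her segment of the market.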


## EDA