# Analysis on Entering the Real Estate Market in King County
The goal of this repository is to provide business recommendations to a group of real estate brokers who are considering starting a new real estate firm in King County. This analysis examines thousands of house sales over the past 100 years to provide these real estate brokers key insights to keep in mind as they enter the real estate industry.
![kings%20county.jpg](attachment:kings%20county.jpg)

## Business Problem
This group of realtors wants to enter the real estate market in King County but is unsure what type of houses the new firm should focus on. There is a plethora of options, since houses can be categorized by price, number of bedrooms and bathrooms, square footage, and location. This analysis was conducted to give the new company insight into the trends, performance, history, and predicted prices of houses in each of these categories.

## Data
This analysis was conducted using data from one main source:
- King County House Sales dataset.

The raw csv file of this data can be found in the ```zipped_data``` folder.

The metrics analyzed dealt primarily with each house's descriptive factors such as price, square footage, bedrooms, bathrooms, etc.

The key metrics analyzed were ```log_price```, ```sqft_living```, ```grade_bins```, and ```zipcode_bins```.
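
As a rough illustration of how features like these might be derived, the sketch below log-transforms price and bins grade with pandas. The column names `price` and `grade` follow the public King County dataset, but the sample rows, bin edges, and labels are hypothetical and are not the ones used in this analysis. A zipcode grouping such as ```zipcode_bins``` could be built in a similar way.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; the real data comes from the King County House Sales CSV.
df = pd.DataFrame({
    "price": [250000.0, 520000.0, 1200000.0],
    "grade": [5, 8, 11],
})

# Log-transform price to reduce the right skew typical of sale prices.
df["log_price"] = np.log(df["price"])

# Hypothetical grade bins: edges and labels are assumptions for illustration.
df["grade_bins"] = pd.cut(
    df["grade"],
    bins=[0, 6, 8, 10, 13],
    labels=["Grade 1", "Grade 2", "Grade 3", "Grade 4"],
)

print(df)
```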

This analysis was performed using Python and the code can be found in ```Data Cleaning.ipynb```, ```Data Understanding.ipynb```, and ```Data Analysis.ipynb```.

## Results

### 1. The average house price considering the key metrics analyzed is $519,063.82
![Hist%20Price-2.png](attachment:Hist%20Price-2.png)

### 2. The root mean squared error (RMSE) is $166,301.08.

This means the model's predictions typically miss a home's actual sale price by about $166,301.08 in either direction. Because it is the square root of the mean squared error, this figure is expressed in dollars. I recognize that this is a considerable margin of error, but the R-squared score of 0.654 indicates that the model explains about 65.4% of the variance in the target variable, so I do not consider the error large enough to discount the model as a whole.
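
For readers who want to see how these two scores are defined, here is a minimal sketch using NumPy. The prices are made up for illustration, and the helper names `rmse` and `r_squared` are hypothetical. If the model predicts ```log_price```, its predictions would need to be converted back to dollars (for example with `np.exp`) before computing a dollar-valued error; whether the figure above was computed that way is not stated here.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return float(1.0 - ss_res / ss_tot)

# Made-up sale prices in dollars, for illustration only.
actual = [300000, 450000, 600000, 750000]
predicted = [320000, 430000, 640000, 700000]

print(rmse(actual, predicted))
print(r_squared(actual, predicted))
```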

### 3. Given the average house price and the model's error, the following shows the expected value of a home for each key variable examined:

![%25%20change%20in%20price.png](attachment:%25%20change%20in%20price.png)

```python
import pandas as pd

# Load the final OLS model summary and drop the leftover index column.
df1 = pd.read_csv('C:/Users/User/Documents/Flatiron/Phase2/Final_Model_OLS_df.csv')
df1 = df1.drop(columns=['Unnamed: 0'])
df1
```

| Key Variables | Coefficient | Percentage Change (%) | Dollar Change ($) | Total ($) |
|---|---:|---:|---:|---:|
| Square Foot Living | 0.0003 | 0.03 | 155.72 | 519219.54 |
| Grade 2: Average | 0.2013 | 20.13 | 104487.55 | 623551.37 |
| Grade 3: Above Average | 0.42 | 42.0 | 218006.8 | 737070.62 |
| Grade 4: Excellent | 0.4512 | 45.12 | 234201.6 | 753265.42 |
| Zipcode: Quad 2 | -0.0778 | -7.78 | -40383.17 | 478680.65 |
| Zipcode: Quad 3 | -0.3378 | -33.78 | -175339.76 | 343724.06 |
| Zipcode: Quad 4 | -0.0377 | -3.77 | -19568.71 | 499495.11 |
| Zipcode: Quad 5 | -0.0355 | -3.55 | -18426.77 | 500637.05 |
| Zipcode: Quad 6 | -0.2265 | -22.65 | -117567.96 | 401495.86 |
| Zipcode: Quad 7 | -0.4331 | -43.31 | -224806.54 | 294257.28 |
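
The arithmetic behind these values can be reproduced with a short sketch: each coefficient is multiplied by 100 to get the percentage change, multiplied by the average price to get the dollar change, and the dollar change is added to the average price to get the total. The helper names `table_row` and `exact_pct` are hypothetical. Assuming the model's target is ```log_price```, the exact percentage effect of a coefficient b is (exp(b) - 1) × 100, which drifts away from b × 100 as coefficients grow; `exact_pct` is included only for comparison and is not part of the original results.

```python
import math

AVERAGE_PRICE = 519063.82  # average house price reported in Result 1

def table_row(coefficient, average_price=AVERAGE_PRICE):
    """Return (percentage change, dollar change, total) for one coefficient."""
    pct = round(coefficient * 100, 2)
    dollar = round(coefficient * average_price, 2)
    total = round(average_price + dollar, 2)
    return pct, dollar, total

def exact_pct(coefficient):
    """Exact percentage effect under a log-price target (an assumption)."""
    return (math.exp(coefficient) - 1) * 100

# Coefficient for "Grade 2: Average"; the result matches that row's values.
print(table_row(0.2013))

# Exact log-linear interpretation of the same coefficient, for comparison.
print(round(exact_pct(0.2013), 2))
```

For small coefficients such as the one for square footage, the two interpretations are nearly identical; for larger ones, such as the grade and zipcode effects, the exact figure differs noticeably from the linear approximation.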


## Conclusions

This analysis provides insight into three key variables: ```Square Footage```, ```Grade```, and ```Zipcode```.

The expected value of homes in King County has been analyzed and presented for each of these key variables.

## Next Steps

I advise the members of the new real estate firm to assess their limitations so they can determine what types of investments they can make given the results of my analysis.
Once the firm has established its core limitations, I will be able to provide more exact recommendations.

Specifically, if the new firm wants to target certain zipcodes, I can clean the data further to provide more relevant pricing by zipcode.