# Washington DC Housing Market Analysis 

# Exploratory Data Analysis

## Project Goal

The goal of this analysis is to explore Washington DC housing market data and gather initial findings. Based on these findings, we will regroup the Washington DC regions according to various housing statistics, so that the regions within each group show similar sale prices and other housing characteristics. This analysis will serve as the basis for future work, including building models to predict home sale prices and other notable housing market variables.

## Summary of Data

This analysis uses housing market data from February 2012 to October 2019. The data cover prices (median sale price, percentage of homes sold above list price, percentage of homes that had a price drop, etc.), inventory (number of homes on the market, new listings, months of supply, etc.), and sales (number of homes sold, median days on market, etc.).
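As a quick check on the date range, the sketch below counts the monthly periods from February 2012 to October 2019, both endpoints included. It assumes the data contain one observation per month (an assumption, although the month-over-month columns used later suggest it); the variable names are illustrative only.

```python
import pandas as pd

# Monthly periods covering the stated window, both endpoints included
months = pd.period_range(start="2012-02", end="2019-10", freq="M")
n_months = len(months)

print(n_months)  # 93
```

Under that assumption, each region contributes 93 rows, which matches the `[0:93]` slices used later in this notebook.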

#### Data Source: https://www.redfin.com/blog/data-center

### Library Import

In [1]:
#Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Data Import and Data Cleaning
### Importing data and creating dataframes for each Washington DC region
#### All required datasets are downloaded in our folder titled "data"

In [2]:
# Build the path to each CSV dataset: data/data_crosstab (1).csv ... data/data_crosstab (82).csv
lst = ['data/data_crosstab ({}).csv'.format(i) for i in range(1, 83)]

# Read each dataset from the data folder and convert formatted strings
# such as "$1,200K" and "1.5%" from object to float.
# df_list collects one cleaned dataframe per dataset.
df_list = []
for i, location in enumerate(lst):
    df = pd.read_csv(location, encoding='utf-16', sep='\t')
    # Suffix every column with the dataset index so column names stay unique
    df = df.add_suffix('_' + str(i))
    # Leave the first two columns alone and convert every remaining text column
    for col in df.columns[2:]:
        if df[col].dtype == 'object':
            # regex=False makes each replacement a literal match ("$" is a regex anchor otherwise).
            # Replacing "K" with "000" assumes whole-number thousands such as "$550K".
            df[col] = (df[col].str.replace("$", "", regex=False)
                              .str.replace(",", "", regex=False)
                              .str.replace("K", "000", regex=False)
                              .str.replace("%", "", regex=False)
                              .astype(float))
    df_list.append(df)

#Overview of Washington DC housing market data
df_list[0].head()
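To illustrate the string cleaning on its own, here is a minimal sketch that runs on a few made-up values rather than the Redfin files. The helper name `to_number` and the sample values are hypothetical. Unlike the textual `"K"` to `"000"` replacement, this variant multiplies by 1,000, so a value with a decimal such as `"$1.5K"` would also convert correctly, should one ever appear.

```python
import pandas as pd

def to_number(s: pd.Series) -> pd.Series:
    """Hypothetical helper: strip $, commas and %, and scale values ending in K by 1,000."""
    s = s.astype(str).str.strip()
    is_k = s.str.endswith("K")
    cleaned = (s.str.replace("$", "", regex=False)
                .str.replace(",", "", regex=False)
                .str.replace("%", "", regex=False)
                .str.replace("K", "", regex=False))
    values = pd.to_numeric(cleaned, errors="coerce")
    # Keep plain values as they are; multiply the K-suffixed ones by 1,000
    return values.where(~is_k, values * 1000)

# Made-up sample values in the formats mentioned above
sample = pd.Series(["$1,200K", "$550K", "1.5%", "-3.2%", "$1.5K"])
result = to_number(sample)
print(result.tolist())
```

Values that cannot be parsed become `NaN` here instead of raising an error, which may or may not be the preferred behaviour for this dataset.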

In [None]:
# Review the column names of the last dataset loaded
df.columns

In [None]:
# Ensure the data types are appropriate for analysis
df.info()
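Reading the `df.info()` output for every dataset by eye is tedious, so a programmatic check can help. The sketch below lists any column after the first two that is still non-numeric. The helper name `unconverted_columns` and the toy frame, including its first two column names, are hypothetical and not taken from the Redfin files.

```python
import pandas as pd

def unconverted_columns(frame: pd.DataFrame, skip: int = 2) -> list:
    """Hypothetical helper: names of columns after the first `skip` that are not numeric."""
    candidates = frame.iloc[:, skip:]
    return [col for col in candidates.columns
            if not pd.api.types.is_numeric_dtype(candidates[col])]

# Toy frame; column names are made up for illustration
toy = pd.DataFrame({
    "Region_0": ["A", "A"],
    "Month_0": ["Feb-12", "Mar-12"],
    "Median Sale Price_0": [550000.0, 560000.0],
    "Homes Sold MoM _0": ["1.5%", "-2.0%"],  # deliberately left as text
})
bad = unconverted_columns(toy)
print(bad)
```

Applied to each frame in `df_list`, an empty list would indicate that every value column was converted.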

### Creating dataframes containing Median Sale Price, Homes Sold MoM, and Inventory MoM of each Washington DC region from Feb. 2012 to Oct. 2019

In [5]:
# Create datasets for Median Sale Price, Homes Sold MoM, and Inventory MoM and save them in the data folder

# Median Sale Price of each Washington DC region from Feb. 2012 to Oct. 2019
# (the first 93 rows, assuming one row per month)
final_lst = []
for i, df in enumerate(df_list):
    final_lst.append(df["Median Sale Price_" + str(i)][0:93])
# Place the regions side by side, one column per region
median_sale_price = pd.concat(final_lst, axis=1)
median_sale_price.to_csv('data/Median Sale Price.csv')

#Overview of median_sale_price data
#median_sale_price

# Homes Sold MoM of each Washington DC region from Feb. 2012 to Oct. 2019
# Note: the column name as written has a trailing space before the "_<index>" suffix
final_lst_2 = []
for i, df in enumerate(df_list):
    final_lst_2.append(df["Homes Sold MoM _" + str(i)][0:93])
homes_sold_mom = pd.concat(final_lst_2, axis=1)
homes_sold_mom.to_csv('data/Homes Sold MoM.csv')

#Overview of homes_sold_mom data
#homes_sold_mom

# Inventory MoM of each Washington DC region from Feb. 2012 to Oct. 2019
# Note: the column name as written has a trailing space before the "_<index>" suffix
final_lst_3 = []
for i, df in enumerate(df_list):
    final_lst_3.append(df["Inventory MoM _" + str(i)][0:93])
inventory_mom = pd.concat(final_lst_3, axis=1)
inventory_mom.to_csv('data/Inventory MoM.csv')

#Overview of inventory_mom data
#inventory_mom
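The three extractions in this cell follow the same pattern: take one suffixed column from each region's dataframe, then join the columns side by side. One possible way to express that pattern once is sketched below. The helper name `collect_metric` and the toy frames are hypothetical; this is not the notebook's own code.

```python
import pandas as pd

def collect_metric(frames, column, n_rows=93):
    """Hypothetical helper: take `column` + "_<i>" from the i-th frame and join them column-wise."""
    series = [frame[f"{column}_{i}"][0:n_rows] for i, frame in enumerate(frames)]
    return pd.concat(series, axis=1)

# Two toy "regions" with made-up prices, suffixed the same way as df_list
frames = [
    pd.DataFrame({"Median Sale Price": [500000.0, 510000.0, 520000.0]}).add_suffix("_0"),
    pd.DataFrame({"Median Sale Price": [300000.0, 305000.0, 310000.0]}).add_suffix("_1"),
]
wide = collect_metric(frames, "Median Sale Price", n_rows=2)
print(wide)
```

With the real data, a call such as `collect_metric(df_list, "Homes Sold MoM ")` would need to keep the trailing space that the column names above carry.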