# Overview

This project analyzes home values by zip code and state, focusing on 1997 through 2017 (the dataset itself spans 1996 to 2018). The challenge and dataset can be found [here](https://discuss.codecademy.com/t/data-science-independent-project-4-home-value-trends/419948?_gl=1*1t2z3fs*_ga*ODUyNDAwMzMyMC4xNjk4NDk1NzIw*_ga_3LRZM6TM9L*MTY5ODYyNTYwNy41LjEuMTY5ODYyNjA2Ni40LjAuMA..).

## Dataset

To download the dataset for this project, run the next cell. Note that it requires `wget` and that the file is about 270 MB.

In [1]:
!wget -O home_value_data.csv "https://static-assets.codecademy.com/community/datasets_forum_projects/home_value_data.csv?_gl=1*e7jqre*_ga*ODUyNDAwMzMyMC4xNjk4NDk1NzIw*_ga_3LRZM6TM9L*MTY5ODc5ODEzMS43LjAuMTY5ODc5ODEzMS42MC4wLjA."

--2023-11-02 18:37:25--  https://static-assets.codecademy.com/community/datasets_forum_projects/home_value_data.csv?_gl=1*e7jqre*_ga*ODUyNDAwMzMyMC4xNjk4NDk1NzIw*_ga_3LRZM6TM9L*MTY5ODc5ODEzMS43LjAuMTY5ODc5ODEzMS42MC4wLjA.
Resolving static-assets.codecademy.com (static-assets.codecademy.com)... 104.18.199.63, 104.17.212.81, 2606:4700::6812:c73f, ...
Connecting to static-assets.codecademy.com (static-assets.codecademy.com)|104.18.199.63|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 283223829 (270M) [text/csv]
Saving to: ‘home_value_data.csv’


2023-11-02 18:37:49 (11.5 MB/s) - ‘home_value_data.csv’ saved [283223829/283223829]



In [2]:
import sqlite3

conn = sqlite3.connect("home_value.db")
cur = conn.cursor()

cur.execute('''CREATE TABLE home_value (
            zip_code INT,
            city TEXT,
            state TEXT,
            metro TEXT,
            county TEXT,
            date DATE,
            value REAL)''')

<sqlite3.Cursor at 0x7f6e0ba694c0>

In [3]:
import pandas as pd

data = pd.read_csv('home_value_data.csv')
data.to_sql('home_value', conn, if_exists='append', index=False)

4202944
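
For a file of this size, loading in chunks keeps memory use bounded. The following is a minimal sketch rather than the method used above: it reads a tiny in-memory CSV whose rows are invented for illustration, and it assumes the real file's columns match the `home_value` table. The `chunksize` value is arbitrary.

```python
import io
import sqlite3

import pandas as pd

# Invented sample rows standing in for home_value_data.csv (assumption:
# the real CSV has these same column names).
csv_text = (
    "zip_code,city,state,metro,county,date,value\n"
    "10001,Town A,AA,Metro A,County A,1996-04,100000.0\n"
    "10002,Town B,BB,Metro B,County B,1996-04,150000.0\n"
    "10003,Town C,CC,Metro C,County C,1996-05,125000.0\n"
)

conn = sqlite3.connect(":memory:")

total = 0
# With chunksize set, read_csv yields DataFrames of at most that many rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk.to_sql("home_value", conn, if_exists="append", index=False)
    total += len(chunk)

row_count = conn.execute("SELECT COUNT(*) FROM home_value").fetchone()[0]
print(total, row_count)
```

For the real file, `io.StringIO(csv_text)` would be replaced by the CSV path and `chunksize` raised to something like a few hundred thousand rows.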

In [4]:
# Load the SQL magic extension into IPython (enables %sql and %%sql)

%load_ext sql

In [5]:
%sql sqlite:///home_value.db

## Execution

1. How many distinct zip codes are in this dataset?

In [6]:
%%sql

SELECT count(DISTINCT zip_code)
FROM home_value;

 * sqlite:///home_value.db
Done.


count(DISTINCT zip_code)
15452


2. How many zip codes are from each state?

In [7]:
%%sql

SELECT state, COUNT(DISTINCT zip_code)
FROM home_value
GROUP BY state;


 * sqlite:///home_value.db
Done.


state,COUNT(DISTINCT zip_code)
AK,28
AL,221
AR,119
AZ,233
CA,1230
CO,261
CT,122
DC,18
DE,39
FL,795


3. What range of years are represented in the data?

In [8]:
%%sql

SELECT
    MIN(substr(date, 1, 4)) AS MinYear,
    MAX(substr(date, 1, 4)) AS MaxYear
FROM home_value;

 * sqlite:///home_value.db
Done.


MinYear,MaxYear
1996,2018


4. Using the most recent month of data available, what is the range of estimated home values across the nation?

In [9]:
%%sql

WITH recent AS (
    SELECT value 
    FROM home_value
    WHERE date IN (
        SELECT MAX(date)
        FROM home_value
    )
)
SELECT 
MIN(recent.value) AS 'MinValue',
MAX(recent.value) AS 'MaxValue'
FROM recent

 * sqlite:///home_value.db
Done.


MinValue,MaxValue
21600.0,17757800.0


## Analysis

1. Explore how home values differ by region and change over time. Using the most recent month of data available, which states have the highest average home values? Which have the lowest?

In [10]:
%%sql

WITH
recent AS (
    SELECT state, value
    FROM home_value
    WHERE date IN (
        SELECT MAX(date)
        FROM home_value
    )
),
state_avg AS (
    SELECT recent.state, ROUND(AVG(recent.value), 0) AS 'avg_value'
    FROM recent
    GROUP BY state
)
SELECT state_avg.state, state_avg.avg_value
FROM state_avg
ORDER BY state_avg.avg_value DESC


 * sqlite:///home_value.db
Done.


state,avg_value
DC,826572.0
CA,750965.0
HI,711085.0
MA,475927.0
CO,442713.0
WA,414105.0
NJ,403476.0
NY,378121.0
NV,349835.0
UT,343777.0
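
The output above lists only the highest averages. The lowest can be obtained by flipping the sort direction. Below is a minimal sketch using Python's `sqlite3` module on a toy table; the rows, the state codes `AA`/`BB`/`CC`, the date format, and the helper name `state_averages` are all invented for illustration.

```python
import sqlite3

# Toy stand-in for the home_value table; every row here is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE home_value (zip_code INT, state TEXT, date DATE, value REAL)"
)
conn.executemany(
    "INSERT INTO home_value VALUES (?, ?, ?, ?)",
    [
        (1, "AA", "2018-05", 100.0),  # older month, ignored by the query
        (2, "AA", "2018-06", 120.0),
        (3, "AA", "2018-06", 180.0),
        (4, "BB", "2018-06", 300.0),
        (5, "CC", "2018-06", 90.0),
    ],
)


def state_averages(conn, ascending, limit=10):
    """Average value per state in the latest month, sorted by that average."""
    direction = "ASC" if ascending else "DESC"
    query = f"""
        SELECT state, ROUND(AVG(value), 0) AS avg_value
        FROM home_value
        WHERE date = (SELECT MAX(date) FROM home_value)
        GROUP BY state
        ORDER BY avg_value {direction}
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


lowest = state_averages(conn, ascending=True, limit=2)
highest = state_averages(conn, ascending=False, limit=2)
print(lowest)   # [('CC', 90.0), ('AA', 150.0)]
print(highest)  # [('BB', 300.0), ('AA', 150.0)]
```

Pointing `sqlite3.connect` at `home_value.db` instead of `:memory:` (and skipping the toy inserts) should run the same helper against the real table.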


2. Which states have the highest / lowest average home values for the year 2017? What about for the year 2007? 1997?

In [11]:
%%sql

WITH
avgs AS (
    SELECT 
        substr(date, 1, 4) AS 'Year', 
        state AS 'State', 
        ROUND(AVG(value), 2) as 'AvgValue'
    FROM home_value
    WHERE substr(date, 1, 4) IN ('1997', '2007', '2017')
    GROUP BY year, state
),
expensive AS (
    SELECT year, state, MAX(avgs.AvgValue)
    FROM avgs
    GROUP BY year
),
cheap AS (
    SELECT year, state, MIN(avgs.AvgValue) 
    FROM avgs
    GROUP BY year
)
SELECT expensive.year, expensive.state AS "Most Expensive State", cheap.state AS "Least Expensive State"
FROM expensive
JOIN cheap ON expensive.year = cheap.year

 * sqlite:///home_value.db
Done.


year,Most Expensive State,Least Expensive State
1997,HI,OK
2007,HI,OK
2017,DC,OK
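
The query above selects `state` next to `MAX(...)` or `MIN(...)` without grouping by it, which relies on SQLite's handling of bare columns in aggregate queries. A more portable variant joins each year's extreme averages back to the per-state averages. The sketch below runs on invented rows (states `AA`, `BB`, `CC`), so its output says nothing about the real data.

```python
import sqlite3

# Toy data; all rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE home_value (state TEXT, date DATE, value REAL)")
conn.executemany(
    "INSERT INTO home_value VALUES (?, ?, ?)",
    [
        ("AA", "1997-01", 100.0), ("AA", "1997-02", 110.0),
        ("BB", "1997-01", 50.0),
        ("CC", "1997-01", 200.0),
        ("AA", "2017-01", 400.0),
        ("BB", "2017-01", 60.0),
        ("CC", "2017-01", 300.0),
    ],
)

query = """
WITH avgs AS (
    SELECT substr(date, 1, 4) AS year, state, AVG(value) AS avg_value
    FROM home_value
    WHERE substr(date, 1, 4) IN ('1997', '2017')
    GROUP BY substr(date, 1, 4), state
),
extremes AS (
    SELECT year, MAX(avg_value) AS max_avg, MIN(avg_value) AS min_avg
    FROM avgs
    GROUP BY year
)
SELECT e.year, hi.state, lo.state
FROM extremes AS e
JOIN avgs AS hi ON hi.year = e.year AND hi.avg_value = e.max_avg
JOIN avgs AS lo ON lo.year = e.year AND lo.avg_value = e.min_avg
ORDER BY e.year
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('1997', 'CC', 'BB'), ('2017', 'AA', 'BB')]
```

If two states tie for an extreme, this form returns one row per tied state instead of picking one arbitrarily.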


3. What is the percent change in the average home values from 2007 to 2017 by state? How about from 1997 to 2017?

In [38]:
%%sql

WITH 
query_1 AS (SELECT state, ROUND(AVG(value), 0) AS value FROM home_value WHERE substr(date, 1, 4) = '2017' GROUP BY state),
query_2 AS (SELECT state, ROUND(AVG(value), 0) AS value FROM home_value WHERE substr(date, 1, 4) = '2007' GROUP BY state)
SELECT query_1.state, ROUND((query_1.value - query_2.value) / query_2.value * 100, 1) AS 'Avg % Change' FROM query_1
JOIN query_2 ON query_1.state = query_2.state

 * sqlite:///home_value.db
Done.


state,Avg % Change
AK,15.3
AL,-6.4
AR,6.7
AZ,-13.4
CA,15.1
CO,26.0
CT,-18.0
DC,29.8
DE,-18.4
FL,-9.1
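
The cell above covers 2007 to 2017 only. The 1997 to 2017 comparison asked for in the question follows the same pattern with a different start year. Here is a minimal sketch with the years passed as parameters; the helper name `percent_change` and the toy rows are assumptions for illustration. Unlike the cell above, it does not round the averages before computing the change.

```python
import sqlite3

# Toy data; all rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE home_value (state TEXT, date DATE, value REAL)")
conn.executemany(
    "INSERT INTO home_value VALUES (?, ?, ?)",
    [
        ("AA", "1997-01", 100.0), ("AA", "2007-01", 200.0), ("AA", "2017-01", 250.0),
        ("BB", "1997-01", 80.0), ("BB", "2007-01", 100.0), ("BB", "2017-01", 90.0),
    ],
)


def percent_change(conn, start_year, end_year):
    """Percent change in average value per state between two years."""
    query = """
        WITH start_avg AS (
            SELECT state, AVG(value) AS value
            FROM home_value
            WHERE substr(date, 1, 4) = ?
            GROUP BY state
        ),
        end_avg AS (
            SELECT state, AVG(value) AS value
            FROM home_value
            WHERE substr(date, 1, 4) = ?
            GROUP BY state
        )
        SELECT e.state, ROUND((e.value - s.value) / s.value * 100, 1)
        FROM end_avg AS e
        JOIN start_avg AS s ON s.state = e.state
        ORDER BY e.state
    """
    # The placeholders are filled in order: start year first, then end year.
    return conn.execute(query, (str(start_year), str(end_year))).fetchall()


change_1997 = percent_change(conn, 1997, 2017)
change_2007 = percent_change(conn, 2007, 2017)
print(change_1997)  # [('AA', 150.0), ('BB', 12.5)]
print(change_2007)  # [('AA', 25.0), ('BB', -10.0)]
```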


In [33]:
%%sql

SELECT ROUND(AVG(value), 0) AS 'Value'
FROM home_value 
WHERE substr(date, 1, 4) = '2017'

 * sqlite:///home_value.db
Done.


Value
266514.0


4. How would you describe the trend in home values for each state from 1997 to 2017? How about from 2007 to 2017? Which states would you recommend for making real estate investments?
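
One possible starting point for this question is to compute each state's average per year and then the year-over-year percent change. The sketch below is only an illustration on invented rows; the helper name `yearly_changes` is hypothetical, and no conclusion about real states should be drawn from its output.

```python
import sqlite3
from collections import defaultdict

# Toy data; all rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE home_value (state TEXT, date DATE, value REAL)")
conn.executemany(
    "INSERT INTO home_value VALUES (?, ?, ?)",
    [
        ("AA", "2015-01", 100.0), ("AA", "2016-01", 110.0), ("AA", "2017-01", 130.0),
        ("BB", "2015-01", 200.0), ("BB", "2016-01", 180.0), ("BB", "2017-01", 190.0),
    ],
)

query = """
    SELECT state, substr(date, 1, 4) AS year, AVG(value) AS avg_value
    FROM home_value
    GROUP BY state, substr(date, 1, 4)
    ORDER BY state, year
"""

# Collect (year, average) points per state, already in chronological order.
series = defaultdict(list)
for state, year, avg_value in conn.execute(query):
    series[state].append((year, avg_value))


def yearly_changes(points):
    """Percent change between consecutive yearly averages."""
    return [
        (curr_year, round((curr - prev) / prev * 100, 1))
        for (_, prev), (curr_year, curr) in zip(points, points[1:])
    ]


trends = {state: yearly_changes(points) for state, points in series.items()}
print(trends)
```

A state with mostly positive yearly changes is trending upward over the period, while large swings in sign suggest volatility worth checking before drawing any investment conclusion.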