## Zillow Single Family Home Values for NYC Neighborhoods:

# Buying a Home in NYC: What Neighborhoods are the Best Value?
### Applying Data Science Tools to Understand NYC's Residential Real Estate Fundamentals

    Josh Grasso | joshgrasso@gmail.com

This project seeks to understand the fundamental factors that explain differences in residential real estate prices across NYC. 

### Neighborhood-level Zillow Home Value Index (ZHVI) for Single-Family Homes (SFH)
Zillow has graciously made several datasets available. This project uses the Zillow Home Value Index (ZHVI) for Single-Family Homes (SFH), a monthly time series that goes back as far as 1996 in some cases, with detail down to the “neighborhood” level.

The neighborhood definitions used throughout this project are those of the NYC Department of City Planning, which divides NYC’s 5 boroughs into 306 neighborhoods. The Zillow SFH data has a series for 235 neighborhoods, all of which were mapped to the neighborhood definitions used in this analysis. Further, the Zillow data has 424 total neighborhoods within the “New York-Newark-Jersey City” metro area, providing an additional 189 neighborhoods in surrounding areas across the Hudson River in New Jersey, on Long Island, and north into Westchester County for possible future analysis. Across the entire US, Zillow provides a data series for over 16,000 neighborhoods, an impressive level of granularity that speaks to the new capabilities of big data.

My initial inspiration for the project came from exploring NYC real estate on the Zillow app, so it’s great to have access to their huge dataset, even if it’s not at the hyper-granular level of each individual listing. The NYC Department of Finance (DoF) single-family residence data provides a good complement to the Zillow data, since the NYC DoF dataset is at the transaction level.

Zillow describes this dataset as being built on top of estimates for over 100 million homes in the US, including new construction homes and/or homes that have not traded on the open market in many years. The data is an index: the most recent, present-day value of the time series is defined as the “typical home value” for the property universe, and the value of the index going back in time is engineered to reflect “the market’s total appreciation. In other words, the ZHVI appreciation can now be viewed as the theoretical financial return that could be gained from buying all homes in a given subset (by geography and/or home type) in one period and selling them in the next period.”
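
To make the "appreciation as a return" reading concrete, here is a minimal sketch on made-up index levels (the numbers are illustrative, not Zillow data, and the variable names are mine). It treats the month-over-month percentage change in the index as the return from buying in one period and selling in the next.

```python
import pandas as pd

# Hypothetical index levels for one neighborhood (illustrative values, not Zillow data)
zhvi = pd.Series(
    [500_000.0, 505_000.0, 499_950.0],
    index=pd.to_datetime(["2019-01-31", "2019-02-28", "2019-03-31"]),
)

# Month-over-month appreciation: the return from buying at t-1 and selling at t.
# The first entry is NaN because there is no prior month to buy in.
monthly_return = zhvi.pct_change()

# Cumulative appreciation from the first month to the last
total_return = zhvi.iloc[-1] / zhvi.iloc[0] - 1

print(monthly_return)
print(total_return)
```

With these toy numbers the index gains 1% in February and loses 1% in March, so the cumulative return is slightly negative rather than zero.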

To match the NYC DoF data and analysis exactly, I restrict my initial focus in the Zillow data to the period from 2005 to 2019. However, a deeper historical analysis is possible, since 178 of the neighborhoods have data going all the way back to January 1996. Further, the mean price for each neighborhood over the full 2005 to 2019 period is used as the average price in the analysis/regression. This convention was adopted for the NYC DoF dataset because transactions are sparse in some neighborhoods in certain years and, in some cases, across all years; it is carried over to the Zillow data for uniformity. Finally, the average annual growth in prices is calculated by fitting a linear regression to the full monthly series for each neighborhood, then converting the best-fit slope ($/month for 2005 to 2019) into an annual dollar increase (slope × 12) and dividing it by the neighborhood's average price over the period to get an annual percentage increase. This growth rate will be used to build a "momentum" metric for each neighborhood, to be used alongside a measure of "value" in determining which neighborhoods look most compelling from an investment perspective. The analysis in the main notebook compares the Zillow data to the NYC DoF data.
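
As a quick sanity check of the growth calculation described above, the sketch below runs it on a synthetic series with a known slope. The series and the variable names are illustrative assumptions, not part of the Zillow data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series covering 180 months (15 years): starts at $400,000
# and rises by exactly $1,000 per month. Illustrative values only.
prices = pd.Series(400_000.0 + 1_000.0 * np.arange(180))

# Best-fit slope in $/month from a degree-1 polynomial fit
slope = np.polyfit(np.arange(len(prices)), prices.values, 1)[0]

# Annualize the slope and scale by the average price over the period
annual_growth = slope * 12
avg_price = prices.mean()
growth_pct = annual_growth / avg_price

print(round(slope, 2), round(avg_price, 2), round(growth_pct, 4))
```

Because the denominator is the period average rather than the starting price, the resulting percentage is a linear-trend growth rate relative to the typical price level, not a compound annual growth rate.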

### Resources: 
* https://www.zillow.com/research/data/
* Zillow API: https://documenter.getpostman.com/view/9197254/SzRuZCCj?version=latest
* Single Family & Condo/Co-op: https://www.zillow.com/new-york-ny/home-values/

In [1]:
import numpy as np
import pandas as pd
import requests

%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
#plt.style.use('seaborn')  
sns.set()

from datetime import datetime
today = datetime.now()
month,day,year = today.month,today.day,today.year

In [2]:
from pathlib import Path
home_path = Path.home() / 'Jupyter' / 'Real_Estate' # / 'Zillow'

In [3]:
# Zillow Home Value Index (ZHVI)
# Source: https://www.zillow.com/research/data/

# Transition to Zillow API: 
# Source: https://documenter.getpostman.com/view/9197254/SzRuZCCj?version=latest

# "ZHVI All Homes (SFR, Condo/Co-op) Time Series, Smoothed, Seasonally Adjusted($)"
# "ZHVI All Homes (SFR, Condo/Co-op) Time Series, Raw, Mid-Tier ($)"
# "ZHVI All Homes- Top Tier Time Series ($)"
# "ZHVI All Homes- Bottom Tier Time Series ($)"
# "ZHVI Single-Family Homes Time Series ($)"

# "ZHVI Condo/Co-op Time Series ($)"
# "ZHVI 1-Bedroom Time Series ($)"
# "ZHVI 2-Bedroom Time Series ($)"
# "ZHVI 3-Bedroom Time Series ($)"
# "ZHVI 4-Bedroom Time Series ($)"
# "ZHVI 5+ Bedroom Time Series ($)"

ZHVI_SFR_Smoothed_Neighborhood_url = 'https://files.zillowstatic.com/research/public_v2/zhvi/Neighborhood_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_mon.csv'
ZHVI_SFR_Raw_Metro_url = 'https://files.zillowstatic.com/research/public_v2/zhvi/Metro_zhvi_uc_sfrcondo_tier_0.33_0.67_raw_mon.csv' 
ZHVI_SFR_Top_City_url = 'https://files.zillowstatic.com/research/public_v2/zhvi/City_zhvi_uc_sfrcondo_tier_0.67_1.0_sm_sa_mon.csv'
ZHVI_SFR_Bottom_City_url = 'https://files.zillowstatic.com/research/public_v2/zhvi/City_zhvi_uc_sfrcondo_tier_0.0_0.33_sm_sa_mon.csv'
ZHVI_SFR_Neighborhood_url = 'https://files.zillowstatic.com/research/public_v2/zhvi/Neighborhood_zhvi_uc_sfr_sm_sa_mon.csv'

ZHVI_Condo_Coop_Neighborhood_url = 'https://files.zillowstatic.com/research/public_v2/zhvi/Neighborhood_zhvi_uc_condo_tier_0.33_0.67_sm_sa_mon.csv'
ZHVI_1Br_Neighborhood_url = 'https://files.zillowstatic.com/research/public_v2/zhvi/Neighborhood_zhvi_bdrmcnt_1_uc_sfrcondo_tier_0.33_0.67_sm_sa_mon.csv'
ZHVI_2Br_Neighborhood_url = 'https://files.zillowstatic.com/research/public_v2/zhvi/Neighborhood_zhvi_bdrmcnt_2_uc_sfrcondo_tier_0.33_0.67_sm_sa_mon.csv'
ZHVI_3Br_Neighborhood_url = 'https://files.zillowstatic.com/research/public_v2/zhvi/Neighborhood_zhvi_bdrmcnt_3_uc_sfrcondo_tier_0.33_0.67_sm_sa_mon.csv'
ZHVI_4Br_Neighborhood_url = 'https://files.zillowstatic.com/research/public_v2/zhvi/Neighborhood_zhvi_bdrmcnt_4_uc_sfrcondo_tier_0.33_0.67_sm_sa_mon.csv'
ZHVI_5Br_Neighborhood_url = 'https://files.zillowstatic.com/research/public_v2/zhvi/Neighborhood_zhvi_bdrmcnt_5_uc_sfrcondo_tier_0.33_0.67_sm_sa_mon.csv'


In [4]:
resp = requests.get(ZHVI_SFR_Neighborhood_url)
resp.raise_for_status()  # fail early if the download did not succeed
local_path = home_path / 'ZHVI_SFR_Neighborhood.csv'
with open(local_path, 'wb') as output:
    output.write(resp.content)

In [5]:
zhvi_sfr_neighborhood_df = pd.read_csv(local_path)

In [6]:
zhvi_sfr_neighborhood_df

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,City,Metro,CountyName,1996-01-31,...,2020-05-31,2020-06-30,2020-07-31,2020-08-31,2020-09-30,2020-10-31,2020-11-30,2020-12-31,2021-01-31,2021-02-28
0,274772,0,Northeast Dallas,Neighborhood,TX,TX,Dallas,Dallas-Fort Worth-Arlington,Dallas County,158625.0,...,379764.0,382243.0,385078.0,387416.0,390320.0,394910.0,399899.0,404948.0,407098.0,409926.0
1,112345,1,Maryvale,Neighborhood,AZ,AZ,Phoenix,Phoenix-Mesa-Scottsdale,Maricopa County,,...,200818.0,203652.0,206570.0,209856.0,213596.0,217240.0,221544.0,225433.0,229784.0,233085.0
2,192689,2,Paradise,Neighborhood,NV,NV,Las Vegas,Las Vegas-Henderson-Paradise,Clark County,152438.0,...,309044.0,309668.0,311213.0,314117.0,317991.0,320998.0,323665.0,326041.0,328666.0,331573.0
3,270958,3,Upper West Side,Neighborhood,NY,NY,New York,New York-Newark-Jersey City,New York County,,...,3409659.0,3426785.0,3427352.0,3450461.0,3466617.0,3494627.0,3509855.0,3530960.0,3536208.0,3558627.0
4,118208,4,South Los Angeles,Neighborhood,CA,CA,Los Angeles,Los Angeles-Long Beach-Anaheim,Los Angeles County,,...,553276.0,556167.0,561551.0,569037.0,576432.0,582789.0,587732.0,591469.0,595466.0,600356.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16092,243401,17008,Fairland,Neighborhood,VA,VA,Roanoke,Roanoke,Roanoke City,,...,108778.0,109632.0,109825.0,109954.0,111093.0,112587.0,114246.0,115394.0,117525.0,119689.0
16093,229478,17008,Royalty Acres,Neighborhood,TN,TN,Clarksville,Clarksville,Montgomery County,99507.0,...,199063.0,201412.0,204483.0,208357.0,210314.0,212784.0,214439.0,216940.0,219022.0,221116.0
16094,117010,17008,Oakmont,Neighborhood,CA,CA,Santa Rosa,Santa Rosa,Sonoma County,240091.0,...,688000.0,683276.0,681140.0,682586.0,687900.0,693340.0,698736.0,701249.0,698771.0,695700.0
16095,122353,17008,Glendale,Neighborhood,DE,DE,Newark,Philadelphia-Camden-Wilmington,New Castle County,98404.0,...,195056.0,196893.0,198588.0,200317.0,202469.0,205278.0,207382.0,209140.0,211030.0,214532.0


In [7]:
# Brooklyn Heights is incorrectly labeled as New York County (Manhattan); it should be Kings County (Brooklyn):
zhvi_sfr_neighborhood_df[zhvi_sfr_neighborhood_df['RegionName'] == 'Brooklyn Heights']

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,City,Metro,CountyName,1996-01-31,...,2020-05-31,2020-06-30,2020-07-31,2020-08-31,2020-09-30,2020-10-31,2020-11-30,2020-12-31,2021-01-31,2021-02-28
551,403122,574,Brooklyn Heights,Neighborhood,NY,NY,New York,New York-Newark-Jersey City,New York County,,...,4357508.0,4352318.0,4356894.0,4344569.0,4341933.0,4323261.0,4335442.0,4359806.0,4412016.0,4490970.0


In [8]:
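# Select the row by name instead of the hard-coded index 551, which can shift when the file is refreshed:
zhvi_sfr_neighborhood_df.loc[zhvi_sfr_neighborhood_df['RegionName'] == 'Brooklyn Heights', 'CountyName'] = 'Kings County'
zhvi_sfr_neighborhood_df[zhvi_sfr_neighborhood_df['RegionName'] == 'Brooklyn Heights']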

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,City,Metro,CountyName,1996-01-31,...,2020-05-31,2020-06-30,2020-07-31,2020-08-31,2020-09-30,2020-10-31,2020-11-30,2020-12-31,2021-01-31,2021-02-28
551,403122,574,Brooklyn Heights,Neighborhood,NY,NY,New York,New York-Newark-Jersey City,Kings County,,...,4357508.0,4352318.0,4356894.0,4344569.0,4341933.0,4323261.0,4335442.0,4359806.0,4412016.0,4490970.0


In [10]:
zhvi_sfr_neighborhood_df.columns  
# ['RegionID', 'SizeRank', 'RegionName', 'RegionType', 'StateName', 'State', 'City', 'Metro', 'CountyName',
#  '1996-01-31' through '2021-02-28']

Index(['RegionID', 'SizeRank', 'RegionName', 'RegionType', 'StateName',
       'State', 'City', 'Metro', 'CountyName', '1996-01-31',
       ...
       '2020-05-31', '2020-06-30', '2020-07-31', '2020-08-31', '2020-09-30',
       '2020-10-31', '2020-11-30', '2020-12-31', '2021-01-31', '2021-02-28'],
      dtype='object', length=311)

In [18]:
#len(zhvi_sfr_neighborhood_df['Metro'].unique())  # 265
# zhvi_sfr_neighborhood_df[zhvi_sfr_neighborhood_df['Metro'] == 'New York-Newark-Jersey City']  # 424 rows

In [19]:
ny_df = zhvi_sfr_neighborhood_df[zhvi_sfr_neighborhood_df['State'] == 'NY'].reset_index(drop=True)

In [20]:
# ny_df['City'].unique()
# Commuting Areas: ['Croton-on-Hudson', 'Pelham', 'New Rochelle', 'Scarsdale', 'Mount Vernon', 'Yonkers',
# 'Town of Mamaroneck', 'Manhasset', 'Massapequa', 'Great Neck']
# Far, but Interesting: ['Town Of Cornwall', 'Hyde Park']

In [21]:
nyc_df = ny_df[ny_df['City'] == 'New York']
nyc_df = nyc_df.reset_index(drop=True)

In [22]:
# Map County Names to Borough Names:
# nyc_df['CountyName'].unique()  
# ['New York County', 'Kings County', 'Queens County', 'Bronx County', 'Richmond County']
borough_list = ['Brooklyn', 'Queens', 'Bronx', 'Manhattan', 'Staten_Island']

map_county_borough_dict = {'New York County': 'Manhattan', 
                           'Kings County': 'Brooklyn', 
                           'Queens County': 'Queens', 
                           'Bronx County': 'Bronx',
                           'Richmond County': 'Staten_Island'}
nyc_df['Borough'] = nyc_df['CountyName'].replace(map_county_borough_dict)


In [23]:
# Map Zillow Neighborhood Names to NYC Neighborhood Names:


In [24]:
# Neighborhoods Names
# https://www1.nyc.gov/site/planning/data-maps/open-data.page

neighborhood_url = 'https://services5.arcgis.com/GfwWNkhOj9bNBqoJ/arcgis/rest/services/Neighborhood_Names/FeatureServer/0/query?where=1=1&outFields=*&outSR=4326&f=pgeojson'
resp = requests.get(neighborhood_url)
resp.raise_for_status()  # fail early if the request did not succeed
neighborhood_json = resp.json()

neighborhood_ids_list = []
neighborhood_details_list = []

for neighborhood_dict in neighborhood_json['features']:
    neighborhood_ids_list.append(neighborhood_dict['id']) 
    
    d = {}
    d['ID'] = neighborhood_dict['id']
    # Neighborhood instead of name? 
    d['Name'] = neighborhood_dict['properties']['Name']
    d['Borough'] = neighborhood_dict['properties']['Borough']
    d['Lat'] = neighborhood_dict['geometry']['coordinates'][1]
    d['Long'] = neighborhood_dict['geometry']['coordinates'][0]
    
    neighborhood_details_list.append(d)

neighborhood_df = pd.DataFrame(neighborhood_details_list)
neighborhood_df['Borough'] = neighborhood_df['Borough'].replace({'Staten Island': 'Staten_Island'})


In [25]:
neighborhood_df

Unnamed: 0,ID,Name,Borough,Lat,Long
0,1,Wakefield,Bronx,40.894713,-73.847202
1,2,Co-op City,Bronx,40.874302,-73.829941
2,3,Eastchester,Bronx,40.887564,-73.827808
3,4,Fieldston,Bronx,40.895446,-73.905644
4,5,Riverdale,Bronx,40.890843,-73.912587
...,...,...,...,...,...
301,302,Stapleton,Staten_Island,40.626936,-74.077903
302,303,Rosebank,Staten_Island,40.615313,-74.069807
303,304,West Brighton,Staten_Island,40.631887,-74.107183
304,305,Grymes Hill,Staten_Island,40.624193,-74.087250


In [26]:
# Matching Neighborhoods
# set(nyc_df['RegionName'].to_list()).intersection(neighborhood_df['Name'].to_list())
#set(nyc_df.set_index(['Borough', 'RegionName']).index.to_list()).intersection(
#    neighborhood_df.set_index(['Borough', 'Name']).index.to_list())

In [27]:
# In Zillow, but not in NYC
# set(nyc_df['RegionName'].to_list()).difference(neighborhood_df['Name'].to_list())
#set(nyc_df.set_index(['Borough', 'RegionName']).index.to_list()).difference(
#    neighborhood_df.set_index(['Borough', 'Name']).index.to_list())

In [28]:
# In NYC but not in Zillow
# set(neighborhood_df['Name'].to_list()).difference(nyc_df['RegionName'].to_list())
#set(neighborhood_df.set_index(['Borough', 'Name']).index.to_list()).difference(
#    nyc_df.set_index(['Borough', 'RegionName']).index.to_list())
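
The three set comparisons above can also be done in one pass with an outer merge and `indicator=True`. This is a minimal sketch on toy stand-ins for `nyc_df` and `neighborhood_df`; the frames and their contents are illustrative assumptions, and it is an alternative to the set-based approach in the commented-out cells rather than what the notebook actually ran.

```python
import pandas as pd

# Toy stand-ins for the Zillow and NYC neighborhood tables (illustrative rows only)
zillow = pd.DataFrame({
    "Borough": ["Manhattan", "Manhattan", "Queens"],
    "RegionName": ["SoHo", "Chelsea", "Astoria"],
})
nyc = pd.DataFrame({
    "Borough": ["Manhattan", "Manhattan", "Queens"],
    "Name": ["Soho", "Chelsea", "Astoria"],
})

# Align the key column names, then outer-merge with an indicator column
merged = zillow.merge(
    nyc.rename(columns={"Name": "RegionName"}),
    on=["Borough", "RegionName"],
    how="outer",
    indicator=True,
)

matched = merged[merged["_merge"] == "both"]            # in both sources
zillow_only = merged[merged["_merge"] == "left_only"]   # in Zillow, not in NYC
nyc_only = merged[merged["_merge"] == "right_only"]     # in NYC, not in Zillow

print(zillow_only[["Borough", "RegionName"]])
print(nyc_only[["Borough", "RegionName"]])
```

Names that differ only in capitalization, such as "SoHo" and "Soho" here, show up once in each unmatched list, which is the kind of mismatch the manual mapping dictionary resolves.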

In [29]:
map_zillow_neighborhoods_dict = {'Battery Park': 'Battery Park City',
 'Bronx Park': np.nan,
 'Chelsea-Travis': 'Travis',
 'Clove Lake': np.nan,
 'Columbia Street Waterfront District': np.nan,  # 'Cobble Hill',
 'DUMBO': 'Dumbo',
 'Douglaston-Little Neck': ['Douglaston', 'Little Neck'],
 'Flatiron District': 'Flatiron',
 'Floral park': 'Floral Park',
 'Flushing Meadows Corona Park': np.nan,
 'Fort Wadsworth': np.nan,
 'Garment District': 'Midtown South',
 'Grasmere - Concord': ['Grasmere', 'Concord'],
 'Greenwood': np.nan,  # 'Sunset Park',
 'Harlem': 'Central Harlem',
 'Highbridge': 'High  Bridge',
 'Jamaica': ['Jamaica Center', 'South Jamaica'],
 'John F. Kennedy International Airport': np.nan,
 'Meiers Corners': np.nan,  # 'Castleton Corners',
 'Navy Yard': 'Vinegar Hill',
 'New Utrecht': np.nan,  # 'Bensonhurst',
 'NoHo': 'Noho',
 'Pelham Bay Park': 'Pelham Parkway', 
 'SoHo': 'Soho',
 'South Bronx': np.nan,  # 'Melrose',
 'Throggs Neck': 'Throgs Neck',
 'Tremont': np.nan,  # 'East Tremont',
 'Westchester Heights': 'Westchester Square'}


In [30]:
update_nyc_df = nyc_df.copy()
update_nyc_df['Neighborhood'] = update_nyc_df['RegionName']
update_nyc_df['Neighborhood'] = [map_zillow_neighborhoods_dict.get(key,key) for key in update_nyc_df['Neighborhood']]
update_nyc_df = update_nyc_df.explode('Neighborhood')

update_nyc_df = update_nyc_df[update_nyc_df['Neighborhood'].notna()]
update_nyc_df = update_nyc_df.set_index(['Borough', 'Neighborhood'])
update_nyc_df = update_nyc_df.drop(columns= ['RegionID', 'SizeRank', 'RegionName', 'RegionType', 
                                             'StateName', 'State', 'City', 'Metro', 'CountyName'])
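
To see what the mapping step does to each kind of entry, here is a small sketch on a toy frame. The three mapping entries are taken from the dictionary above; the price column and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# A three-entry excerpt of the mapping: a rename, a one-to-many split, and a drop
mapping = {
    "DUMBO": "Dumbo",
    "Jamaica": ["Jamaica Center", "South Jamaica"],
    "Bronx Park": np.nan,
}

toy = pd.DataFrame({
    "RegionName": ["DUMBO", "Jamaica", "Bronx Park", "Astoria"],
    "2021-02-28": [1.0, 2.0, 3.0, 4.0],  # illustrative values
})

# Names missing from the mapping fall through unchanged via dict.get(key, key)
toy["Neighborhood"] = [mapping.get(name, name) for name in toy["RegionName"]]

# explode() turns each list-valued entry into one row per element,
# duplicating the other columns; scalar and NaN entries are left as they are
toy = toy.explode("Neighborhood")

# Drop the Zillow regions that have no NYC counterpart
toy = toy[toy["Neighborhood"].notna()]

print(toy)
```

One consequence of the split is that both "Jamaica Center" and "South Jamaica" inherit the same Zillow series, so they are not independent observations in any later regression.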

In [31]:
update_nyc_df

Borough,Neighborhood,1996-01-31,1996-02-29,1996-03-31,1996-04-30,1996-05-31,1996-06-30,1996-07-31,1996-08-31,1996-09-30,1996-10-31,...,2020-05-31,2020-06-30,2020-07-31,2020-08-31,2020-09-30,2020-10-31,2020-11-30,2020-12-31,2021-01-31,2021-02-28
Manhattan,Upper West Side,,,,,,,,,,,...,3409659.0,3426785.0,3427352.0,3450461.0,3466617.0,3494627.0,3509855.0,3530960.0,3536208.0,3558627.0
Manhattan,Upper East Side,,,,,,,,,,,...,6438819.0,6422202.0,6335594.0,6279095.0,6256637.0,6269719.0,6295604.0,6304847.0,6307959.0,6325276.0
Brooklyn,East New York,167953.0,167569.0,167057.0,166573.0,166910.0,167878.0,169047.0,168403.0,168039.0,166944.0,...,576731.0,582072.0,586671.0,592246.0,596740.0,601190.0,605672.0,608902.0,612572.0,615717.0
Manhattan,Washington Heights,,,,,,,,,,,...,1275424.0,1262666.0,1237399.0,1225851.0,1219155.0,1220856.0,1214242.0,1214514.0,1198243.0,1179422.0
Queens,Astoria,206866.0,206581.0,206346.0,206505.0,206674.0,206987.0,206935.0,207546.0,208388.0,209399.0,...,1083437.0,1086431.0,1092357.0,1094523.0,1096391.0,1096364.0,1098097.0,1098079.0,1092429.0,1085762.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Queens,Neponsit,419025.0,421803.0,425363.0,431369.0,435464.0,435273.0,435251.0,435824.0,438124.0,437587.0,...,1292404.0,1293950.0,1299044.0,1301646.0,1302773.0,1307017.0,1308694.0,1311078.0,1308230.0,1304628.0
Staten_Island,Shore Acres,,181374.0,179680.0,177002.0,172998.0,169572.0,169157.0,169113.0,170072.0,170466.0,...,577816.0,579446.0,584100.0,591835.0,595304.0,598876.0,602360.0,606933.0,612260.0,615880.0
Staten_Island,Charleston,222807.0,227862.0,230960.0,236589.0,240553.0,243938.0,246620.0,247642.0,248258.0,245753.0,...,627570.0,627394.0,626461.0,627332.0,627252.0,628920.0,630376.0,633705.0,636222.0,637987.0
Queens,Roxbury,,,,,,,,,,,...,525771.0,515589.0,506930.0,500005.0,494823.0,491362.0,489604.0,489543.0,491188.0,496719.0


In [44]:
print(len(update_nyc_df.columns))

302


In [47]:
# How far back do the neighborhoods go? 

# Count the non-missing months per neighborhood, then tally how many neighborhoods share each count:
update_nyc_df.notna().sum(axis=1).value_counts().sort_index(ascending=True).tail()


265      1
276      1
289      1
301      3
302    178
dtype: int64
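
The counts above show how many months each neighborhood has, but not when each series starts. A minimal sketch of how the first available month could be found per row, using a toy frame shaped like `update_nyc_df` (the row labels and values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Toy frame: one row per neighborhood, one column per month-end date
dates = pd.to_datetime(["1996-01-31", "1996-02-29", "1996-03-31"])
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0], [np.nan, 5.0, 6.0], [np.nan, np.nan, 9.0]],
    index=["A", "B", "C"],
    columns=dates,
)

# Number of non-missing months per row (the same count as in the cell above)
months_available = toy.notna().sum(axis=1)

# First column label with a non-missing value, row by row
first_month = toy.apply(lambda row: row.first_valid_index(), axis=1)

print(months_available)
print(first_month)
```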

In [41]:
# NYC Neighborhood Sales Summary is for 2005 through 2019

update_nyc_df.columns = pd.to_datetime(update_nyc_df.columns)

zillow_2005_2019_df = update_nyc_df.T[update_nyc_df.T.index.year.isin(np.arange(2005,2019+1))]

# Average:
avg_zillow_2005_2019_df = (zillow_2005_2019_df.mean().to_frame(name='Avg_Price_2005_2019')
                           .sort_values('Avg_Price_2005_2019', ascending=False))


In [42]:
# Slope:

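def zillow_price_trajectory(series):
    """Return the best-fit slope ($/month) of a monthly price series."""
    # Dropping NaNs and re-indexing with arange assumes any missing months are
    # leading (the series starts late), so the remaining months are consecutive.
    _regression_values = series.dropna().values
    if len(_regression_values) >= 2:
        return np.polyfit(np.arange(len(_regression_values)), _regression_values, 1)[0]
    else:
        # Too few observations to fit a line; report zero growth.
        return 0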

growth_zillow_2005_2019_df = (zillow_2005_2019_df.apply(lambda x: zillow_price_trajectory(x) * 12)
                              .to_frame(name='Annual_Growth_2005_2019'))


In [43]:
growth_zillow_2005_2019_df

Borough,Neighborhood,Annual_Growth_2005_2019
Manhattan,Upper West Side,142382.802988
Manhattan,Upper East Side,73246.291750
Brooklyn,East New York,10962.549721
Manhattan,Washington Heights,20329.776904
Queens,Astoria,37379.828032
...,...,...
Queens,Neponsit,9722.528745
Staten_Island,Shore Acres,7909.504071
Staten_Island,Charleston,9455.466156
Queens,Roxbury,-58344.209790


In [44]:
growth_zillow_2005_2019_df = avg_zillow_2005_2019_df.join(growth_zillow_2005_2019_df)
growth_zillow_2005_2019_df['Growth_%_2005_2019'] = (growth_zillow_2005_2019_df['Annual_Growth_2005_2019']
                                                    / growth_zillow_2005_2019_df['Avg_Price_2005_2019'])
growth_zillow_2005_2019_df.sort_values(by='Growth_%_2005_2019', inplace=True, ascending=False)


In [45]:
growth_zillow_2005_2019_df

Borough,Neighborhood,Avg_Price_2005_2019,Annual_Growth_2005_2019,Growth_%_2005_2019
Queens,Hunters Point,3.332755e+05,29088.598191,0.087281
Manhattan,Manhattanville,5.030461e+05,42159.332200,0.083808
Brooklyn,Crown Heights,8.010362e+05,66981.387734,0.083618
Brooklyn,Greenpoint,1.136574e+06,94784.523436,0.083395
Brooklyn,Bedford Stuyvesant,7.782242e+05,61284.251218,0.078749
...,...,...,...,...
Brooklyn,Vinegar Hill,1.509302e+06,-37586.260571,-0.024903
Manhattan,Little Italy,5.957593e+06,-360353.683626,-0.060486
Queens,Breezy Point,6.229787e+05,-48111.455639,-0.077228
Queens,Roxbury,6.491009e+05,-58344.209790,-0.089885


In [50]:
growth_zillow_2005_2019_df.loc[('Brooklyn', 'Brooklyn Heights')]

Avg_Price_2005_2019        3.536400e+06
Annual_Growth_2005_2019    1.582122e+05
Growth_%_2005_2019         4.473820e-02
Name: (Brooklyn, Brooklyn Heights), dtype: float64

In [46]:
# Save to CSV: 
growth_zillow_2005_2019_df.to_csv(path_or_buf= home_path / 'Zillow_NYC_SFR_2005_2019.csv')


In [47]:
# Calculate Growth over Full Timeline Available: 

growth_zillow_all_years_df = (update_nyc_df.T.apply(lambda x: zillow_price_trajectory(x) * 12)
                              .to_frame(name='Annual_Growth_All_Years'))
growth_zillow_all_years_df.sort_values(by='Annual_Growth_All_Years', inplace=True, ascending=False)


In [48]:
growth_zillow_all_years_df

Borough,Neighborhood,Annual_Growth_All_Years
Manhattan,Greenwich Village,168487.998468
Manhattan,West Village,149886.888760
Manhattan,Noho,137760.297516
Brooklyn,Cobble Hill,131026.082855
Manhattan,Upper West Side,126165.385605
...,...,...
Brooklyn,Vinegar Hill,-36377.009885
Queens,Breezy Point,-36991.643697
Queens,Roxbury,-109280.057436
Manhattan,Battery Park City,-132414.252714


In [None]:
# Compare SFR and Condo/Co-op?
# Are these overlapping datasets? 

# Condos/Co-ops Are Different than Single Family 
# See: https://www.zillow.com/new-york-ny/home-values/


In [None]:
# NYC Metro-Area (Surrounding Neighborhoods: Newark, Jersey City, etc. )

#(zhvi_sfr_neighborhood_df[zhvi_sfr_neighborhood_df['Metro'] == 'New York-Newark-Jersey City']
# .reset_index(drop=True))


In [21]:
# New Jersey
nj_df = zhvi_sfr_neighborhood_df[zhvi_sfr_neighborhood_df['State'] == 'NJ'].reset_index(drop=True)
nj_df['City'].unique()
# ['Jersey City', 'Newark', 

array(['Jersey City', 'Newark', 'Hamilton Township',
       'Woodbridge Township', 'Paterson', 'Cherry Hill Township',
       'Trenton', 'Plainsboro Township', 'East Orange',
       'Bridgewater Township', 'Berkeley Township', 'Voorhees Township',
       'Sparta Township', 'Woodbine', 'Camden', 'Rockaway Township',
       'North Bergen Township', 'Middletown Township', 'Matawan',
       'Mt Laurel Township', 'South Brunswick Township', 'Princeton',
       'Piscataway', 'Monroe Township', 'Neptune Township', 'Union',
       'Dayton', 'Monmouth Junction', 'Lavallette', 'South Orange',
       'Metuchen', 'Cape May Court House', 'Long Beach Township',
       'Montgomery Township', 'North Brunswick Township',
       'East Brunswick', 'Little Falls'], dtype=object)