# Capstone Project - The Battle of Neighborhoods

## Finding Ideal Neighborhood After Relocating to New City

### Introduction

Relocating to a new city and deciding on a neighborhood to live in can be challenging, especially if you don't know the new city well. You may wonder whether you can find a neighborhood similar to your current one. And if you are looking to buy a property, you will want to know which type of housing is the better investment in the region. Luckily, we can rely on publicly available data from the internet to aid in this process.

In this project, we will assume our client is relocating from **San Francisco, CA to Cincinnati, OH** due to a change in job assignment. The client currently lives in the **Sunset District** and really enjoys the area, so the client would like to find a similar neighborhood in Cincinnati. The client is also looking to buy a property and would like to understand whether a condo or a single-family house would be the better option in terms of price growth potential.

### Data

In order to determine a similar neighborhood in Cincinnati, we will need:
* A list of all neighborhoods in Cincinnati, which can be obtained from https://www.zillow.com/cincinnati-oh/home-values/
* Types of venues in each neighborhood in Cincinnati, which can be obtained using the **Foursquare API**
* Types of venues in the client's current neighborhood (Sunset District, San Francisco), which can also be obtained using the **Foursquare API**

And to understand what type of housing to buy, we will need:
* Housing price trends by housing type for the neighborhoods of interest in Cincinnati, which can be obtained from https://www.zillow.com/research/data/

Once all of the data are downloaded, we can start the data cleaning process. 

In [1]:
import pandas as pd

Importing data files downloaded from the websites:

In [2]:
# The code was removed by Watson Studio for sharing.
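
Since the loading cell is hidden, here is a minimal sketch of what it might look like, assuming the three downloads are plain CSV files. The function name `load_housing_data` and the tiny in-memory stand-ins are hypothetical; the real cell would pass the paths of the downloaded files instead.

```python
import io
import pandas as pd

def load_housing_data(neighborhood_src, condo_src, sfh_src):
    """Read the three downloaded CSV files into dataframes.

    Each argument may be a file path or a file-like object.
    """
    neighborhood = pd.read_csv(neighborhood_src)
    condo = pd.read_csv(condo_src)
    sfh = pd.read_csv(sfh_src)
    return neighborhood, condo, sfh

# Tiny in-memory stand-ins so the sketch runs without the real downloads.
demo_neighborhood = io.StringIO(
    "Region Name,Region Type\nCincinnati,city\nAvondale,neighborhood\n"
)
demo_condo = io.StringIO("RegionName,City,2020-07-31\nOakley,Cincinnati,373058.0\n")
demo_sfh = io.StringIO("RegionName,City,2020-07-31\nOakley,Cincinnati,297939.0\n")

neighborhood_demo, condo_demo, sfh_demo = load_housing_data(
    demo_neighborhood, demo_condo, demo_sfh
)
print(neighborhood_demo.shape, condo_demo.shape, sfh_demo.shape)
```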

Let's first look at the dataframe containing all the neighborhood information for Cincinnati.

In [3]:
neighborhood

Unnamed: 0,Region Name,Region Type,Type,Current,Month Over Month,Quarter Over Quarter,Year Over Year,5 Year Annualized,10 Year Annualized,Current.1,...,Quarter Over Quarter.8,Year Over Year.8,Current.9,Month Over Month.9,Quarter Over Quarter.9,Year Over Year.9,Current.10,Month Over Month.10,Quarter Over Quarter.10,Year Over Year.10
0,Cincinnati,city,All Homes,165600,0.00495155,0.0141641,0.0490533,0.067,0.0226,0.1379,...,-0.027474,0.0776418,---,---,---,---,1338,0.00224719,0.0237184,0.0686901
1,Avondale,neighborhood,All Homes,81500,0.00422596,0.0251934,0.171883,0.0837,---,---,...,---,---,0,---,-0.1116,-0.1116,1123,0.00447227,---,0.0209091
2,Bond Hill,neighborhood,All Homes,112700,-0.00577137,0.00923562,0.0749664,0.1134,0.0077,---,...,---,---,---,---,---,---,1098,0.00919118,-0.00991885,0.0309859
3,CUF,neighborhood,All Homes,162800,0.00782469,0.0215473,0.0518413,0.0941,0.0582,---,...,---,---,---,---,---,---,1565,0.0228758,0.083795,0.0653506
4,California,neighborhood,All Homes,128200,-0.00345999,-0.000405553,0.111007,0.0143,---,---,...,---,---,---,---,---,---,---,---,---,---
5,Camp Washington,neighborhood,All Homes,59400,-0.0187894,-0.0562805,-0.166968,---,---,---,...,---,---,---,---,---,---,---,---,---,---
6,Carthage,neighborhood,All Homes,69600,0.0145988,0.0543863,0.142768,0.1252,---,---,...,---,---,---,---,---,---,---,---,---,---
7,Central Business District,neighborhood,All Homes,344200,-0.0012418,-0.00350274,-0.0140517,0.0179,0.0114,---,...,---,---,---,---,---,---,---,---,---,---
8,Clifton,neighborhood,All Homes,305900,-0.000205929,0.000192932,-0.0189181,0.0275,0.0195,---,...,---,---,---,---,---,---,1934,0.0227393,0.0369973,0.00992167
9,College Hill,neighborhood,All Homes,132300,0.00702006,0.0169778,0.0523636,0.0823,0.0265,0.1163,...,---,---,---,---,---,---,1194,0.00420521,0.0101523,0.0492091


We only need the list of neighborhood names from this dataset, so we will drop the first row (the citywide figures) and keep only the `Region Name` column.

In [4]:
neighborhood = neighborhood.iloc[1:53, 0:1]
neighborhood=neighborhood.reset_index(drop=True)
neighborhood

Unnamed: 0,Region Name
0,Avondale
1,Bond Hill
2,CUF
3,California
4,Camp Washington
5,Carthage
6,Central Business District
7,Clifton
8,College Hill
9,Columbia-Tusculum
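
The positional slice above depends on the city row being first and on the table having exactly 52 neighborhood rows. As an alternative, here is a small sketch that filters on the `Region Type` column instead. It assumes every neighborhood row is labelled `neighborhood`, as in the rows displayed earlier; `keep_neighborhoods` and the stand-in dataframe are hypothetical.

```python
import pandas as pd

# Small stand-in with the same two columns as the Zillow table.
demo = pd.DataFrame({
    "Region Name": ["Cincinnati", "Avondale", "Bond Hill", "CUF"],
    "Region Type": ["city", "neighborhood", "neighborhood", "neighborhood"],
})

def keep_neighborhoods(df):
    """Return only the rows labelled 'neighborhood', keeping the name column."""
    mask = df["Region Type"] == "neighborhood"
    return df.loc[mask, ["Region Name"]].reset_index(drop=True)

names_only = keep_neighborhoods(demo)
print(names_only)
```

Filtering by label keeps working if Zillow adds or removes a neighborhood, whereas `iloc[1:53, 0:1]` would need to be updated by hand.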


Based on this data, there are 52 neighborhoods in Cincinnati. Let's plot them on a map to check that everything is correct. We will first use **geopy** to obtain the latitude and longitude of each neighborhood and then plot them using **folium**.

In [5]:
from geopy.geocoders import Nominatim

In [6]:
# Create the geocoder once instead of once per neighborhood
geolocator = Nominatim(user_agent="cincy_explorer")

for i, region in zip(neighborhood.index, neighborhood['Region Name']):
    # Build a full address such as "Avondale, Cincinnati, OH"
    address = ', '.join([region, 'Cincinnati, OH'])
    print(address)
    location = geolocator.geocode(address)
    neighborhood.loc[i, 'Latitude'] = location.latitude
    neighborhood.loc[i, 'Longitude'] = location.longitude

Avondale, Cincinnati, OH
Bond Hill, Cincinnati, OH
CUF, Cincinnati, OH
California, Cincinnati, OH
Camp Washington, Cincinnati, OH
Carthage, Cincinnati, OH
Central Business District, Cincinnati, OH
Clifton, Cincinnati, OH
College Hill, Cincinnati, OH
Columbia-Tusculum, Cincinnati, OH
Corryville, Cincinnati, OH
East End, Cincinnati, OH
East Price Hill, Cincinnati, OH
East Walnut Hills, Cincinnati, OH
East Westwood, Cincinnati, OH
English Woods, Cincinnati, OH
Evanston, Cincinnati, OH
Forestville, Cincinnati, OH
Fruit Hill, Cincinnati, OH
Hartwell, Cincinnati, OH
Hyde Park, Cincinnati, OH
Kennedy Heights, Cincinnati, OH
Linwood, Cincinnati, OH
Lower Price Hill, Cincinnati, OH
Madisonville, Cincinnati, OH
Mariemont, Cincinnati, OH
Millvale, Cincinnati, OH
Mt. Adams, Cincinnati, OH
Mt. Airy, Cincinnati, OH
Mt. Auburn, Cincinnati, OH
Mt. Lookout, Cincinnati, OH
Mt. Washington, Cincinnati, OH
North Avondale, Cincinnati, OH
North Fairmount, Cincinnati, OH
Northside, Cincinnati, OH
Oakley, Cinc
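
geopy's `Nominatim.geocode` returns `None` when it finds no match, in which case reading `.latitude` raises an `AttributeError` and the loop stops. Below is a minimal sketch of a more defensive lookup. The helper `geocode_regions` and the offline stand-in geocoder are hypothetical; with the real service we would pass `Nominatim(user_agent="cincy_explorer").geocode` as the `geocode` argument.

```python
from collections import namedtuple

def geocode_regions(regions, geocode, suffix="Cincinnati, OH"):
    """Look up each region once and collect coordinates.

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None when nothing is found.
    """
    found, missing = {}, []
    for region in regions:
        location = geocode("{}, {}".format(region, suffix))
        if location is None:
            missing.append(region)  # record the miss instead of raising
        else:
            found[region] = (location.latitude, location.longitude)
    return found, missing

# Offline stand-in for the geocoder so the sketch runs without network access.
Point = namedtuple("Point", ["latitude", "longitude"])
fake_index = {"Avondale, Cincinnati, OH": Point(39.147837, -84.494943)}
fake_geocode = fake_index.get  # returns None for unknown addresses

found, missing = geocode_regions(["Avondale", "Nowhere"], fake_geocode)
print(found)
print(missing)
```

For many lookups against the public Nominatim service, geopy also provides `geopy.extra.rate_limiter.RateLimiter` to space out requests.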

Let's look at the dataframe to ensure all coordinates were added correctly:

In [7]:
neighborhood

Unnamed: 0,Region Name,Latitude,Longitude
0,Avondale,39.147837,-84.494943
1,Bond Hill,39.174781,-84.467164
2,CUF,39.13035,-84.529351
3,California,39.065206,-84.427239
4,Camp Washington,39.136982,-84.537168
5,Carthage,39.196028,-84.478618
6,Central Business District,39.101663,-84.508125
7,Clifton,39.144952,-84.520226
8,College Hill,39.20228,-84.547167
9,Columbia-Tusculum,39.115435,-84.439646


Everything looks okay. Now we can plot the neighborhoods on a map!

In [8]:
import numpy as np

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium # map rendering library

In [9]:
# obtain starting point of map, which is Cincinnati
address = 'Cincinnati, OH'

geolocator = Nominatim(user_agent="cincy_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Cincinnati are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Cincinnati are 39.1014537, -84.5124602.


In [10]:
# create map of Cincinnati using latitude and longitude values
map_cincy = folium.Map(location=[latitude, longitude], zoom_start=10)

# add neighborhoods as markers to map
for lat, lng, region in zip(neighborhood['Latitude'], neighborhood['Longitude'], neighborhood['Region Name']):
    label = region
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cincy)  
    
map_cincy
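
The visual check on the map can be complemented with a numeric one. The sketch below computes the great-circle (haversine) distance from the city center to a few of the geocoded neighborhoods and flags any that fall outside a threshold. The 25 km threshold and the helper name `haversine_km` are assumptions made for illustration, not values from the original analysis.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# City center and two neighborhoods, using coordinates shown earlier.
city = (39.1014537, -84.5124602)
points = {
    "Avondale": (39.147837, -84.494943),
    "College Hill": (39.20228, -84.547167),
}

MAX_KM = 25.0  # assumed threshold for "within the city area"
far_away = [
    name for name, (lat, lon) in points.items()
    if haversine_km(city[0], city[1], lat, lon) > MAX_KM
]
print(far_away)
```

A geocoder that matched a neighborhood name to a place in another state would show up in `far_away` immediately, even if the marker fell outside the visible part of the map.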

All neighborhoods are indeed within Cincinnati, so we have confirmed the list of neighborhoods is correct. Before we proceed to obtain venue information from Foursquare, let's add the geospatial information for Sunset District to the end of the dataframe.

In [12]:
geolocator = Nominatim(user_agent="cincy_explorer")
location = geolocator.geocode('Sunset District, San Francisco, CA')

# Append Sunset District as row 52, right after the 52 Cincinnati neighborhoods (rows 0-51)
neighborhood.loc[52, 'Region Name'] = 'Sunset District(SF)'
neighborhood.loc[52, 'Latitude'] = location.latitude
neighborhood.loc[52, 'Longitude'] = location.longitude
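
The row label 52 is hard-coded because the dataframe holds exactly 52 Cincinnati neighborhoods. A small sketch of an alternative that does not depend on the row count is shown below. The helper `append_place` is hypothetical, and the Sunset District coordinates are rough values for illustration only, not the geocoder's output.

```python
import pandas as pd

def append_place(df, name, latitude, longitude):
    """Return a copy of df with one extra row appended at the end."""
    new_row = pd.DataFrame({
        "Region Name": [name],
        "Latitude": [latitude],
        "Longitude": [longitude],
    })
    return pd.concat([df, new_row], ignore_index=True)

# Two rows taken from the table shown earlier.
demo = pd.DataFrame({
    "Region Name": ["Avondale", "Bond Hill"],
    "Latitude": [39.147837, 39.174781],
    "Longitude": [-84.494943, -84.467164],
})

# Approximate coordinates, used here only to make the sketch runnable.
extended = append_place(demo, "Sunset District(SF)", 37.75, -122.49)
print(extended.tail(1))
```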

In [13]:
neighborhood

Unnamed: 0,Region Name,Latitude,Longitude
0,Avondale,39.147837,-84.494943
1,Bond Hill,39.174781,-84.467164
2,CUF,39.13035,-84.529351
3,California,39.065206,-84.427239
4,Camp Washington,39.136982,-84.537168
5,Carthage,39.196028,-84.478618
6,Central Business District,39.101663,-84.508125
7,Clifton,39.144952,-84.520226
8,College Hill,39.20228,-84.547167
9,Columbia-Tusculum,39.115435,-84.439646


We can see that Sunset District shows up in the last row of the dataframe. We now have all the neighborhoods that we want to investigate in one dataframe, so we can use Foursquare to obtain nearby venue information.

In [14]:
# The code was removed by Watson Studio for sharing.
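
The hidden cell presumably defines `CLIENT_ID`, `CLIENT_SECRET` and `VERSION`, since the function below uses them. One way to keep such credentials out of a shared notebook is to read them from environment variables, as in this sketch. The variable names and the default version date are placeholders, not the values used in the original notebook.

```python
import os

def load_foursquare_settings(env=os.environ):
    """Read API settings from environment variables (names are hypothetical)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first"
        )
    # The API version is a date string (YYYYMMDD); this default is a placeholder.
    version = env.get("FOURSQUARE_VERSION", "20180605")
    return client_id, client_secret, version

# Demonstration with a plain dict standing in for the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "id", "FOURSQUARE_CLIENT_SECRET": "secret"}
CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_settings(demo_env)
print(VERSION)
```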

In [15]:
# Create a function for getting nearby venues by inputting neighborhood information
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
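
The function above indexes straight into the response, so a reply without any groups, or a venue without categories, would raise an error. Below is a minimal sketch of a more defensive parsing step that can be tested offline. The helper `extract_venues` and the sample payload are hypothetical; the payload only mirrors the keys that `getNearbyVenues` reads.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore-style response into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue has no category attached.
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Made-up payload with one categorized and one uncategorized venue.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe A", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Place B", "location": {"lat": 3.0, "lng": 4.0},
               "categories": []}},
]}]}}

rows = extract_venues("Demo", 0.0, 0.0, sample)
empty = extract_venues("Demo", 0.0, 0.0, {"response": {}})
print(rows)
print(empty)
```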

In [17]:
import requests

In [18]:
# Obtain venue information for each neighborhood
LIMIT = 100   # maximum number of venues returned per request
radius = 500  # search radius in meters
cincinnati_venues = getNearbyVenues(names=neighborhood['Region Name'],
                                   latitudes=neighborhood['Latitude'],
                                   longitudes=neighborhood['Longitude'],
                                   radius=radius)

Avondale
Bond Hill
CUF
California
Camp Washington
Carthage
Central Business District
Clifton
College Hill
Columbia-Tusculum
Corryville
East End
East Price Hill
East Walnut Hills
East Westwood
English Woods
Evanston
Forestville
Fruit Hill
Hartwell
Hyde Park
Kennedy Heights
Linwood
Lower Price Hill
Madisonville
Mariemont
Millvale
Mt. Adams
Mt. Airy
Mt. Auburn
Mt. Lookout
Mt. Washington
North Avondale
North Fairmount
Northside
Oakley
Over-The-Rhine
Paddock Hills
Pendleton
Pleasant Ridge
Riverside
Roselawn
Sayler Park
South Cumminsville
South Fairmount
Villages at Roll Hill
Walnut Hills
West End
West Price HIll
Westwood
Winton Hills
Winton Place
Sunset District(SF)


Let's examine the dataframe with venue information:

In [19]:
cincinnati_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bond Hill,39.174781,-84.467164,Hook Fish & Chicken,39.177210,-84.466174,Food
1,Bond Hill,39.174781,-84.467164,Richie's Restaurant,39.175322,-84.467506,Fried Chicken Joint
2,Bond Hill,39.174781,-84.467164,Bond Hill Quick-Stop,39.175709,-84.467273,Grocery Store
3,Bond Hill,39.174781,-84.467164,Pappys Construction,39.173477,-84.470863,Construction & Landscaping
4,CUF,39.130350,-84.529351,JBM ELITE TRAINING STUDIO,39.131596,-84.530455,Gym / Fitness Center
5,CUF,39.130350,-84.529351,China Food,39.128753,-84.525591,Chinese Restaurant
6,CUF,39.130350,-84.529351,BoneKrushers,39.129394,-84.534390,Gym / Fitness Center
7,CUF,39.130350,-84.529351,Warner Street Steps,39.126329,-84.530300,Trail
8,California,39.065206,-84.427239,River City Sports Complex,39.061732,-84.424841,Athletics & Sports
9,California,39.065206,-84.427239,California Golf Course,39.064940,-84.421508,Golf Course


In [20]:
cincinnati_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Bond Hill,4,4,4,4,4,4
CUF,4,4,4,4,4,4
California,11,11,11,11,11,11
Camp Washington,8,8,8,8,8,8
Carthage,6,6,6,6,6,6
Central Business District,83,83,83,83,83,83
Clifton,30,30,30,30,30,30
College Hill,13,13,13,13,13,13
Columbia-Tusculum,25,25,25,25,25,25
Corryville,25,25,25,25,25,25
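
A neighborhood for which Foursquare returned no venues contributes no rows, so it drops out of the grouped counts altogether; Avondale, for example, was queried first but does not appear in the rows shown. The sketch below is one way to list such neighborhoods explicitly. The stand-in data is made up for illustration.

```python
import pandas as pd

# Names that were queried, and a stand-in for the venue results.
queried = ["Avondale", "Bond Hill", "CUF"]
venues = pd.DataFrame({
    "Neighborhood": ["Bond Hill", "Bond Hill", "CUF"],
    "Venue": ["Venue 1", "Venue 2", "Venue 3"],
})

# Count venues per neighborhood, then add the queried names with no rows.
counts = venues.groupby("Neighborhood")["Venue"].count()
counts = counts.reindex(queried, fill_value=0)

no_venues = counts[counts == 0].index.tolist()
print(counts)
print(no_venues)
```

Neighborhoods with zero or very few venues give little signal for any similarity comparison, so it helps to know which ones they are before going further.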


Now let's examine the data containing historical housing prices for all U.S. neighborhoods, downloaded from Zillow. There are two types of housing that we will review: condos and single-family houses.

In [21]:
condo.head()

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,City,Metro,CountyName,1996-01-31,...,2019-10-31,2019-11-30,2019-12-31,2020-01-31,2020-02-29,2020-03-31,2020-04-30,2020-05-31,2020-06-30,2020-07-31
0,274772,0,Northeast Dallas,Neighborhood,TX,TX,Dallas,Dallas-Fort Worth-Arlington,Dallas County,,...,121032.0,122871.0,122745.0,122504.0,122166.0,122644.0,123414.0,124485.0,124859.0,124767.0
1,112345,1,Maryvale,Neighborhood,AZ,AZ,Phoenix,Phoenix-Mesa-Scottsdale,Maricopa County,,...,96994.0,97888.0,98980.0,100184.0,101197.0,102312.0,103437.0,104224.0,104589.0,104948.0
2,192689,2,Paradise,Neighborhood,NV,NV,Las Vegas,Las Vegas-Henderson-Paradise,Clark County,80872.0,...,137853.0,138318.0,138653.0,139089.0,139257.0,139871.0,140382.0,140581.0,140355.0,140437.0
3,270958,3,Upper West Side,Neighborhood,NY,NY,New York,New York-Newark-Jersey City,New York County,281290.0,...,1216487.0,1217569.0,1224463.0,1223837.0,1222385.0,1216017.0,1221558.0,1223896.0,1227676.0,1240187.0
4,118208,4,South Los Angeles,Neighborhood,CA,CA,Los Angeles,Los Angeles-Long Beach-Anaheim,Los Angeles County,95808.0,...,461995.0,465084.0,468825.0,475578.0,483509.0,490766.0,495757.0,499479.0,503441.0,511399.0


Because this project is only interested in looking at housing opportunities in Cincinnati, we will filter the dataset to Cincinnati only.

In [22]:
condo = condo[condo['City']=='Cincinnati']
condo

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,City,Metro,CountyName,1996-01-31,...,2019-10-31,2019-11-30,2019-12-31,2020-01-31,2020-02-29,2020-03-31,2020-04-30,2020-05-31,2020-06-30,2020-07-31
334,204284,366,Westwood,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,42223.0,...,58149.0,58880.0,59382.0,59924.0,60529.0,61259.0,61719.0,61884.0,62054.0,62187.0
669,275981,759,West Price HIll,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,116533.0,...,96745.0,97643.0,98103.0,98531.0,99072.0,99955.0,101169.0,102430.0,103949.0,105627.0
736,273417,839,CUF,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,53926.0,...,86969.0,87317.0,87108.0,87094.0,87158.0,87537.0,88373.0,89427.0,90502.0,90714.0
869,201414,995,Hyde Park,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,109484.0,...,233164.0,234633.0,235414.0,235768.0,235039.0,235186.0,235966.0,237447.0,238642.0,239997.0
992,200100,1147,College Hill,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,70756.0,...,124276.0,124641.0,124673.0,124722.0,124760.0,125311.0,126516.0,128625.0,130709.0,132097.0
1227,274624,1441,Mt. Washington,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,76404.0,...,114331.0,114233.0,114432.0,114953.0,115455.0,116072.0,117441.0,119720.0,122610.0,124913.0
1246,273561,1462,East Price Hill,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,,...,107068.0,107444.0,107759.0,109105.0,109804.0,110221.0,110366.0,110807.0,111257.0,111447.0
1263,200781,1481,Forestville,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,105119.0,...,150996.0,151143.0,151446.0,151950.0,151975.0,152196.0,152790.0,153788.0,154675.0,155053.0
1507,274620,1785,Mt. Airy,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,,...,138595.0,138108.0,137409.0,136743.0,136708.0,137071.0,137766.0,138906.0,139790.0,140223.0
1553,202657,1839,Oakley,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,194024.0,...,359068.0,362007.0,363237.0,364278.0,365701.0,366603.0,367242.0,368948.0,370919.0,373058.0


We will do the same for the single-family house dataset. First, let's check the structure of the dataframe.

In [23]:
sfh.head()

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,City,Metro,CountyName,1996-01-31,...,2019-10-31,2019-11-30,2019-12-31,2020-01-31,2020-02-29,2020-03-31,2020-04-30,2020-05-31,2020-06-30,2020-07-31
0,274772,0,Northeast Dallas,Neighborhood,TX,TX,Dallas,Dallas-Fort Worth-Arlington,Dallas County,157646.0,...,373045.0,374109.0,373431.0,373170.0,372750.0,372883.0,374234.0,376067.0,378153.0,380582.0
1,112345,1,Maryvale,Neighborhood,AZ,AZ,Phoenix,Phoenix-Mesa-Scottsdale,Maricopa County,,...,189827.0,190873.0,192300.0,194167.0,196070.0,198417.0,201249.0,204028.0,206833.0,209779.0
2,192689,2,Paradise,Neighborhood,NV,NV,Las Vegas,Las Vegas-Henderson-Paradise,Clark County,151146.0,...,296613.0,297588.0,298331.0,299509.0,300467.0,302692.0,304572.0,305765.0,306149.0,307488.0
3,270958,3,Upper West Side,Neighborhood,NY,NY,New York,New York-Newark-Jersey City,New York County,,...,3668229.0,3634977.0,3624311.0,3631332.0,3629466.0,3607803.0,3587931.0,3571308.0,3583153.0,3577278.0
4,118208,4,South Los Angeles,Neighborhood,CA,CA,Los Angeles,Los Angeles-Long Beach-Anaheim,Los Angeles County,,...,515448.0,518341.0,522066.0,526257.0,531599.0,537202.0,541961.0,544744.0,547841.0,553626.0


Then let's filter the dataset:

In [24]:
sfh = sfh[sfh['City']=='Cincinnati']
sfh

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,City,Metro,CountyName,1996-01-31,...,2019-10-31,2019-11-30,2019-12-31,2020-01-31,2020-02-29,2020-03-31,2020-04-30,2020-05-31,2020-06-30,2020-07-31
355,204284,366,Westwood,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,88864.0,...,112246.0,113435.0,114342.0,115188.0,115926.0,117003.0,118480.0,120191.0,122299.0,124126.0
727,275981,759,West Price HIll,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,79293.0,...,94124.0,95550.0,96965.0,98051.0,99017.0,100064.0,101204.0,102232.0,103527.0,105002.0
806,273417,839,CUF,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,68944.0,...,165752.0,166978.0,167933.0,168773.0,169838.0,171172.0,172947.0,174711.0,176869.0,178726.0
956,201414,995,Hyde Park,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,225909.0,...,546189.0,548102.0,548268.0,548694.0,549415.0,551760.0,554385.0,558001.0,561841.0,565625.0
1107,200100,1147,College Hill,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,95216.0,...,134879.0,135940.0,136644.0,137483.0,138348.0,139472.0,141007.0,142611.0,144017.0,145063.0
1395,274624,1441,Mt. Washington,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,109272.0,...,182783.0,183981.0,184733.0,185517.0,185949.0,186732.0,187948.0,189649.0,191163.0,192627.0
1416,273561,1462,East Price Hill,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,,...,64822.0,66546.0,68278.0,69762.0,71329.0,72069.0,73137.0,73814.0,75279.0,75416.0
1435,200781,1481,Forestville,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,158701.0,...,281206.0,281998.0,282393.0,282741.0,282824.0,283515.0,284441.0,285805.0,287126.0,288598.0
1732,274620,1785,Mt. Airy,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,90499.0,...,143374.0,144552.0,145719.0,146927.0,148153.0,149236.0,150600.0,151919.0,153337.0,154379.0
1785,202657,1839,Oakley,Neighborhood,OH,OH,Cincinnati,Cincinnati,Hamilton County,98739.0,...,282802.0,283939.0,284793.0,285317.0,286240.0,287640.0,289619.0,292017.0,294797.0,297939.0
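
Because the same filter is applied to both datasets, it could be wrapped in a small helper. The sketch below also allows an optional check on the `State` column, in case another state has a city with the same name. `filter_city` is a hypothetical helper, and the third row of the stand-in data is made up for illustration.

```python
import pandas as pd

def filter_city(df, city, state=None):
    """Keep rows for one city, optionally also matching the state code."""
    mask = df["City"] == city
    if state is not None:
        mask = mask & (df["State"] == state)
    return df[mask]

# Stand-in rows; the last one is invented to show the state check.
demo = pd.DataFrame({
    "RegionName": ["Oakley", "Hyde Park", "Made-up Place"],
    "State": ["OH", "OH", "XX"],
    "City": ["Cincinnati", "Cincinnati", "Cincinnati"],
    "2020-07-31": [373058.0, 239997.0, 1.0],
})

ohio_only = filter_city(demo, "Cincinnati", state="OH")
print(ohio_only)
```

With this helper, the two cells above would become `condo = filter_city(condo, 'Cincinnati')` and `sfh = filter_city(sfh, 'Cincinnati')`.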


We have now cleaned all of the datasets and are ready for further analysis.