## Patrick's "Battle of the Neighborhoods" Coursera Capstone Project Notebook!

### Problem Set:
Comparing neighborhoods in the cities of Cincinnati and Dayton, Ohio.

### Criteria: 
We are going to look at the number of nearby breweries and coffee shops, as well as the average home cost, to decide which neighborhoods (and which city) are the most fun but still affordable to live in.

### Use case: 
This type of analysis would be useful for home buyers looking for the best neighborhood to live in, or for investors looking for the best neighborhood to invest in (or where the market is saturated).

### Data:
The data used in this notebook is a table of city neighborhood data and locations with Zillow real estate average home cost added.
This data will be combined with a venue category search (brewery and coffee shop) to get three data points per neighborhood to complete the cluster analysis on.

Why breweries and coffee shops? Clusters of these businesses tend to open in trendy, up-and-coming neighborhoods where real estate values are still cheap but young professionals are congregating.

...

#### Prepare the environment by importing all necessary modules

In [43]:
import pandas as pd
import numpy as np
import json                                # library to handle JSON files
from geopy.geocoders import Nominatim      # convert an address into latitude and longitude values
import requests                            # library to handle requests
from pandas.io.json import json_normalize  # transform JSON file into a pandas dataframe
import matplotlib.cm as cm                 # Matplotlib and associated plotting modules
import matplotlib.colors as colors
from sklearn.cluster import KMeans         # import k-means from clustering stage
import folium
from bs4 import BeautifulSoup              # Import BeautifulSoup package to parse wiki website
import geocoder
import fastkml
from fastkml import  kml
import quandl                              # Setup API for neighborhood housing data
quandl.ApiConfig.api_key = "wYmDtm_wythZw4HbXAKr"

...

## Part 1: Data Wrangling - Gather City / Neighborhood Information

#### Read in the Dayton, OH Neighborhood Data with Latitude, Longitude data as well as real estate sales information from Zillow

In [80]:
filename="Dayton_hoods2.csv"
Dayton_data=pd.read_csv(filename)
print(Dayton_data.shape)
Dayton_data.head(10)

(62, 7)


Unnamed: 0,Latitude,Longitude,Neighborhood,City,State,County,Home Value
0,39.747656,-84.245002,Arlington Heights,Dayton,OH,Montgomery,30823
1,39.709226,-84.063269,Beavercreek,Dayton,OH,Greene,217489
2,39.733461,-84.140875,Belmont,Dayton,OH,Montgomery,87032
3,39.759672,-84.151428,Burkhardt,Dayton,OH,Montgomery,61238
4,39.741796,-84.200474,Carillon,Dayton,OH,Montgomery,31161
5,39.783338,-84.239734,College Hill,Dayton,OH,Montgomery,49168
6,39.77307,-84.239734,Cornell Heights,Dayton,OH,Montgomery,34983
7,39.780248,-84.229197,Dayton View Triangle,Dayton,OH,Montgomery,69062
8,39.797985,-84.198892,DeWeese,Dayton,OH,Montgomery,85244
9,39.760571,-84.194938,Downtown,Dayton,OH,Montgomery,169512


#### Now, read in the Cincinnati, OH Neighborhood Data with Latitude, Longitude data as well as real estate sales information from Zillow

In [81]:
filename="Cincinnati_hoods2.csv"
Cincinnati_data=pd.read_csv(filename)
print(Cincinnati_data.shape)
Cincinnati_data.head(10)

(62, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,City,State,County,House Value
0,Avondale,39.144963,-84.497811,Cincinnati,OH,Hamilton,81719
1,Bond Hill,39.177785,-84.477659,Cincinnati,OH,Hamilton,111614
2,California,39.065338,-84.419893,Cincinnati,OH,Hamilton,128577
3,Camp Washington,39.13795,-84.537609,Cincinnati,OH,Hamilton,58310
4,Carthage,39.195869,-84.485014,Cincinnati,OH,Hamilton,68406
5,Clifton Heights,39.125934,-84.520908,Cincinnati,OH,Hamilton,298535
6,College Hill,39.198536,-84.548428,Cincinnati,OH,Hamilton,130138
7,Columbia-Tusculum,39.115193,-84.43614,Cincinnati,OH,Hamilton,325009
8,Corryville,39.136807,-84.503866,Cincinnati,OH,Hamilton,158269
9,CUF,39.125115,-84.525842,Cincinnati,OH,Hamilton,161286
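
The two tables label the price column differently (`Home Value` for Dayton, `House Value` for Cincinnati). If the cities are ever compared in a single frame, one option is to rename to a shared label before stacking them. The sketch below uses two-row stand-ins copied from the tables above; the helper name `combine_cities` and the choice of `Home Value` as the shared label are assumptions, not something the notebook defines.

```python
import pandas as pd

# Two-row stand-ins copied from the tables above (the real frames come from the CSV files).
dayton_sample = pd.DataFrame({
    "Neighborhood": ["Arlington Heights", "Beavercreek"],
    "City": ["Dayton", "Dayton"],
    "Home Value": [30823, 217489],
})
cincinnati_sample = pd.DataFrame({
    "Neighborhood": ["Avondale", "Bond Hill"],
    "City": ["Cincinnati", "Cincinnati"],
    "House Value": [81719, 111614],
})

def combine_cities(dayton, cincinnati):
    """Hypothetical helper: give both frames the same price column, then stack them."""
    cincinnati = cincinnati.rename(columns={"House Value": "Home Value"})
    return pd.concat([dayton, cincinnati], ignore_index=True)

combined = combine_cities(dayton_sample, cincinnati_sample)
print(combined)
```

Without the rename, `pd.concat` would keep both price columns and fill the gaps with NaN, which is easy to miss in a 124-row table.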


### Now, with the neighborhood data, we can visualize it on a map using folium 

In [136]:
address = 'Dayton, OH'

geolocator = Nominatim(user_agent="OH_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Dayton, OH are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Dayton, OH are 39.7589478, -84.1916069.


In [137]:
# create map of Dayton, OH using latitude and longitude values
map_Dayton = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(Dayton_data['Latitude'], Dayton_data['Longitude'], Dayton_data['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Dayton)  
    
map_Dayton

In [134]:
address = 'Cincinnati, OH'

geolocator = Nominatim(user_agent="OH_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Cincinnati, OH are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Cincinnati, OH are 39.1014537, -84.5124602.


In [135]:
# create map of Cincinnati, OH using latitude and longitude values
map_Cincinnati = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(Cincinnati_data['Latitude'], Cincinnati_data['Longitude'], Cincinnati_data['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Cincinnati)  
    
map_Cincinnati

## Part 2: Use FourSquare to Gather Venues for the Two Cities

#### Set up FourSquare Client information

In [53]:
CLIENT_ID = 'G4XYV2OWUJSPBPVI00SJXGVWFAP21E3ZNJXHWTSAODQPKIVG' # your Foursquare ID
CLIENT_SECRET = 'J3TXUHL4AZBXEYYRU2OF4OW33BXUR1KKSCMWHLJOKYQUJK2I' # your Foursquare Secret
#VERSION = '20180605' # Foursquare API version
VERSION = '20200101' # Foursquare API version
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: G4XYV2OWUJSPBPVI00SJXGVWFAP21E3ZNJXHWTSAODQPKIVG
CLIENT_SECRET: J3TXUHL4AZBXEYYRU2OF4OW33BXUR1KKSCMWHLJOKYQUJK2I
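
Because the Foursquare credentials (and the Quandl key in the import cell) are typed directly into the notebook, anyone who sees the notebook also sees the keys. A common alternative is to read them from environment variables and print only a masked form. The following is a minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `read_secret` and `mask` are assumptions made for illustration.

```python
import os

def read_secret(name):
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running the notebook")
    return value

def mask(secret, visible=4):
    """Show only the first `visible` characters so the full key never reaches the output."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demo values so the sketch runs on its own; in practice, export the real
# values in the shell that launches Jupyter and delete these two lines.
os.environ.setdefault("FOURSQUARE_CLIENT_ID", "demo-client-id")
os.environ.setdefault("FOURSQUARE_CLIENT_SECRET", "demo-client-secret")

CLIENT_ID = read_secret("FOURSQUARE_CLIENT_ID")
CLIENT_SECRET = read_secret("FOURSQUARE_CLIENT_SECRET")
print("CLIENT_ID:", mask(CLIENT_ID))
print("CLIENT_SECRET:", mask(CLIENT_SECRET))
```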


In [86]:
# Brewery search for Dayton (getNearbyVenues is defined at the bottom of the notebook)
Dayton_venues = getNearbyVenues(names=Dayton_data['Neighborhood'],
                                latitudes=Dayton_data['Latitude'],
                                longitudes=Dayton_data['Longitude']
                               )

In [96]:
# Coffee shop search for Dayton (getNearbyVenues2 is defined at the bottom of the notebook)
Dayton_venues2 = getNearbyVenues2(names=Dayton_data['Neighborhood'],
                                  latitudes=Dayton_data['Latitude'],
                                  longitudes=Dayton_data['Longitude']
                                 )

In [56]:
# Brewery search for Cincinnati (getNearbyVenues is defined at the bottom of the notebook)
Cincinnati_venues = getNearbyVenues(names=Cincinnati_data['Neighborhood'],
                                    latitudes=Cincinnati_data['Latitude'],
                                    longitudes=Cincinnati_data['Longitude']
                                   )

In [57]:
# Coffee shop search for Cincinnati (getNearbyVenues2 is defined at the bottom of the notebook)
Cincinnati_venues2 = getNearbyVenues2(names=Cincinnati_data['Neighborhood'],
                                      latitudes=Cincinnati_data['Latitude'],
                                      longitudes=Cincinnati_data['Longitude']
                                     )

In [98]:
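# Coffee shop results are in the *_venues2 frames; the output recorded below repeats the brewery shapes for the coffee lines
print('Dayton Brew Shape:',Dayton_venues.shape)
print('Dayton Coffee Shape:',Dayton_venues2.shape)
print('Cincinnati Brew Shape:',Cincinnati_venues.shape)
print('Cincinnati Coffee Shape:',Cincinnati_venues2.shape)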

Dayton Brew Shape: (69, 7)
Dayton Coffee Shape: (69, 7)
Cincinnati Brew Shape: (207, 7)
Cincinnati Coffee Shape: (207, 7)


In [100]:
print(Dayton_venues.shape)
Dayton_venues.sort_values("Neighborhood", inplace = True)
new=Dayton_venues.groupby('Neighborhood').count()
new.sort_values("Neighborhood", inplace = True)

(69, 7)


### The FourSquare search returned "Café" venues as well as coffee shops. I want to focus on coffee shops in particular, so I drop the rows with the "Café" category. Similarly, drop "Bar" and "Pub" from the brewery category.

In [122]:
Dayton_venues2.drop(Dayton_venues2[ Dayton_venues2['Coffeeshop Category'] == 'Café' ].index , inplace=True)
Cincinnati_venues2.drop(Cincinnati_venues2[Cincinnati_venues2['Coffeeshop Category'] == 'Café' ].index , inplace=True)
Cincinnati_venues.drop(Cincinnati_venues[Cincinnati_venues['Brewery Category'] == 'Bar' ].index , inplace=True)
Cincinnati_venues.drop(Cincinnati_venues[Cincinnati_venues['Brewery Category'] == 'Pub' ].index , inplace=True)
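
The category filtering above can also be written as a single boolean mask, which avoids `inplace=True` and handles several unwanted labels at once. This is only a sketch: `drop_categories` is a hypothetical helper, and the rows are made up for illustration rather than taken from the Foursquare results.

```python
import pandas as pd

def drop_categories(venues, column, unwanted):
    """Return a copy of `venues` without rows whose `column` value is in `unwanted`."""
    keep = ~venues[column].isin(unwanted)
    return venues[keep].reset_index(drop=True)

# Made-up rows for illustration only; they are not Foursquare results.
sample = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B"],
    "Brewery": ["Venue 1", "Venue 2", "Venue 3", "Venue 4"],
    "Brewery Category": ["Brewery", "Bar", "Pub", "Brewery"],
})

breweries_only = drop_categories(sample, "Brewery Category", ["Bar", "Pub"])
print(breweries_only)
```

Applied to the notebook's frames, the same call would take `'Coffeeshop Category'` with `['Café']`, or `'Brewery Category'` with `['Bar', 'Pub']`.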

### Now, group each search by "Neighborhood" and count the breweries / coffee shops

In [102]:
# For Dayton
Dayton_Coffee = Dayton_venues2.groupby('Neighborhood').count()  # Provide a count of Coffeeshops per neighborhood
Dayton_Brew = Dayton_venues.groupby('Neighborhood').count()     # Provide a count of Breweries per neighborhood
# For Cincinnati
Cincinnati_Coffee = Cincinnati_venues2.groupby('Neighborhood').count()  # Provide a count of Coffeeshops per neighborhood
Cincinnati_Brew = Cincinnati_venues.groupby('Neighborhood').count()     # Provide a count of Breweries per neighborhood
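
Note that `groupby('Neighborhood').count()` returns a count for every remaining column, which is why the later cells select a single column from the result. `groupby(...).size()` gives one number per neighborhood directly. The comparison below is a sketch on made-up rows, not on the actual search results.

```python
import pandas as pd

# Made-up rows for illustration only.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Coffeeshop": ["Shop 1", "Shop 2", "Shop 3"],
    "Coffeeshop Category": ["Coffee Shop", "Coffee Shop", "Coffee Shop"],
})

# count(): non-null values per column, so every column repeats the same number here.
per_column = venues.groupby("Neighborhood").count()

# size(): one row count per neighborhood, returned as a Series and renamed for merging.
per_group = venues.groupby("Neighborhood").size().rename("Coffeeshop").reset_index()

print(per_column)
print(per_group)
```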

### Now merge all the search results back together, first for Dayton, then repeat for Cincinnati

In [103]:
Dayton_Brew2=Dayton_Brew[['Brewery']]
Dayton_Brew2 = Dayton_Brew2.reset_index()

Dayton_Coffee2=Dayton_Coffee[['Coffeeshop']]
Dayton_Coffee2 = Dayton_Coffee2.reset_index()

Dayton_merged = Dayton_Brew2.join(Dayton_Coffee2.set_index('Neighborhood'), on='Neighborhood')
Dayton_merged =Dayton_merged.fillna(0)
Dayton_merged =Dayton_merged.astype({'Coffeeshop': 'int32'})
print(Dayton_merged.shape)
Dayton_merged.head()

(31, 3)


Unnamed: 0,Neighborhood,Brewery,Coffeeshop
0,Belmont,3,4
1,Burkhardt,1,1
2,Carillon,1,1
3,College Hill,1,0
4,Cornell Heights,1,0
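
Because the join starts from the brewery counts, a neighborhood that has coffee shops but no brewery would not appear in the merged table. If such neighborhoods should be kept, an outer merge is one option. The sketch below uses made-up counts to show the difference; it is not the notebook's actual data.

```python
import pandas as pd

# Made-up counts: "C" has coffee shops but no brewery.
brew = pd.DataFrame({"Neighborhood": ["A", "B"], "Brewery": [3, 1]})
coffee = pd.DataFrame({"Neighborhood": ["A", "C"], "Coffeeshop": [4, 2]})

merged = (
    brew.merge(coffee, on="Neighborhood", how="outer")   # keep rows from both sides
        .fillna(0)                                        # missing count means zero venues
        .astype({"Brewery": "int64", "Coffeeshop": "int64"})
        .sort_values("Neighborhood")
        .reset_index(drop=True)
)
print(merged)
```

With `how="left"` (the behaviour of the `join` call above), neighborhood "C" would be dropped.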


In [115]:
#pd.set_option('display.max_rows', None)
Dayton_all = Dayton_data.join(Dayton_merged.set_index('Neighborhood'), on='Neighborhood')
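# Neighborhoods with no matches get a numeric 0 (not the string '0') so the counts stay numeric for clustering
Dayton_all = Dayton_all.fillna({'Brewery': 0, 'Coffeeshop': 0})
Dayton_all = Dayton_all.astype({'Brewery': 'int64', 'Coffeeshop': 'int64'})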
print(Dayton_all.shape)
Dayton_all.head()


(62, 9)


Unnamed: 0,Latitude,Longitude,Neighborhood,City,State,County,Home Value,Brewery,Coffeeshop
0,39.747656,-84.245002,Arlington Heights,Dayton,OH,Montgomery,30823,0,0
1,39.709226,-84.063269,Beavercreek,Dayton,OH,Greene,217489,0,0
2,39.733461,-84.140875,Belmont,Dayton,OH,Montgomery,87032,3,4
3,39.759672,-84.151428,Burkhardt,Dayton,OH,Montgomery,61238,1,1
4,39.741796,-84.200474,Carillon,Dayton,OH,Montgomery,31161,1,1


### Add in a "normalized" column for home value for the clustering step

In [131]:
Dayton_all['HV_Norm']=Dayton_all['Home Value']/100000
Dayton_all.head()

Unnamed: 0,Latitude,Longitude,Neighborhood,City,State,County,Home Value,Brewery,Coffeeshop,HV_Norm
0,39.747656,-84.245002,Arlington Heights,Dayton,OH,Montgomery,30823,0,0,0.30823
1,39.709226,-84.063269,Beavercreek,Dayton,OH,Greene,217489,0,0,2.17489
2,39.733461,-84.140875,Belmont,Dayton,OH,Montgomery,87032,3,4,0.87032
3,39.759672,-84.151428,Burkhardt,Dayton,OH,Montgomery,61238,1,1,0.61238
4,39.741796,-84.200474,Carillon,Dayton,OH,Montgomery,31161,1,1,0.31161
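
Dividing by 100,000 brings home values into roughly the same range as the venue counts, but it does not give the three features the same spread. A common alternative before k-means is z-score scaling, where each column is centred on 0 and divided by its standard deviation. The sketch below applies it to the five Dayton rows displayed above; the helper name `zscore` is an assumption.

```python
import pandas as pd

# The five Dayton rows displayed above.
features = pd.DataFrame({
    "Home Value": [30823, 217489, 87032, 61238, 31161],
    "Brewery": [0, 0, 3, 1, 1],
    "Coffeeshop": [0, 0, 4, 1, 1],
})

def zscore(frame):
    """Centre each column on 0 and scale it to unit (population) standard deviation."""
    return (frame - frame.mean()) / frame.std(ddof=0)

scaled = zscore(features)
print(scaled.round(2))
```

Whether this is preferable to the simple division depends on how much weight home value should carry relative to the venue counts in the clustering.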


### Now repeat steps for Cincinnati data

In [104]:
Cincinnati_Brew2=Cincinnati_Brew[['Brewery']]
Cincinnati_Brew2 = Cincinnati_Brew2.reset_index()

Cincinnati_Coffee2=Cincinnati_Coffee[['Coffeeshop']]
Cincinnati_Coffee2 = Cincinnati_Coffee2.reset_index()

Cincinnati_merged = Cincinnati_Brew2.join(Cincinnati_Coffee2.set_index('Neighborhood'), on='Neighborhood')
Cincinnati_merged =Cincinnati_merged.fillna(0)
Cincinnati_merged =Cincinnati_merged.astype({'Coffeeshop': 'int64'})
print(Cincinnati_merged.shape)
Cincinnati_merged.head()

(50, 3)


Unnamed: 0,Neighborhood,Brewery,Coffeeshop
0,Amberley,2,0
1,Bridgetown North,3,0
2,CUF,8,3
3,California,1,0
4,Camp Washington,2,2


In [124]:
pd.set_option('display.max_rows', None)
Cincinnati_all = Cincinnati_data.join(Cincinnati_merged.set_index('Neighborhood'), on='Neighborhood')
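# Neighborhoods with no matches get a numeric 0 (not the string '0') so the counts stay numeric for clustering
Cincinnati_all = Cincinnati_all.fillna({'Brewery': 0, 'Coffeeshop': 0})
Cincinnati_all = Cincinnati_all.astype({'Brewery': 'int64', 'Coffeeshop': 'int64'})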
print(Cincinnati_all.shape)
Cincinnati_all.head()

(62, 9)


Unnamed: 0,Neighborhood,Latitude,Longitude,City,State,County,House Value,Brewery,Coffeeshop
0,Avondale,39.144963,-84.497811,Cincinnati,OH,Hamilton,81719,0,0
1,Bond Hill,39.177785,-84.477659,Cincinnati,OH,Hamilton,111614,0,0
2,California,39.065338,-84.419893,Cincinnati,OH,Hamilton,128577,1,0
3,Camp Washington,39.13795,-84.537609,Cincinnati,OH,Hamilton,58310,2,2
4,Carthage,39.195869,-84.485014,Cincinnati,OH,Hamilton,68406,0,0


In [133]:
Cincinnati_all['HV_Norm']=Cincinnati_all['House Value']/100000
Cincinnati_all.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,City,State,County,House Value,Brewery,Coffeeshop,HV_Norm
0,Avondale,39.144963,-84.497811,Cincinnati,OH,Hamilton,81719,0,0,0.81719
1,Bond Hill,39.177785,-84.477659,Cincinnati,OH,Hamilton,111614,0,0,1.11614
2,California,39.065338,-84.419893,Cincinnati,OH,Hamilton,128577,1,0,1.28577
3,Camp Washington,39.13795,-84.537609,Cincinnati,OH,Hamilton,58310,2,2,0.5831
4,Carthage,39.195869,-84.485014,Cincinnati,OH,Hamilton,68406,0,0,0.68406
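
The Cincinnati cells repeat the Dayton steps with only the frame names and the price column changed. One way to avoid the duplication is a single function that takes those as arguments. This is a sketch under stated assumptions: `build_city_table` and its parameters are hypothetical, and it merges each count table directly onto the city table rather than joining coffee onto brewery counts first. The sample rows are taken from the Cincinnati tables above.

```python
import pandas as pd

def build_city_table(city, brew_counts, coffee_counts, value_col):
    """Hypothetical helper: attach venue counts and a scaled home value to `city`."""
    table = city.merge(brew_counts, on="Neighborhood", how="left")
    table = table.merge(coffee_counts, on="Neighborhood", how="left")
    table = table.fillna({"Brewery": 0, "Coffeeshop": 0})
    table = table.astype({"Brewery": "int64", "Coffeeshop": "int64"})
    table["HV_Norm"] = table[value_col] / 100000
    return table

# Three Cincinnati rows and their counts, taken from the tables above.
city = pd.DataFrame({
    "Neighborhood": ["Avondale", "California", "Camp Washington"],
    "House Value": [81719, 128577, 58310],
})
brew_counts = pd.DataFrame({"Neighborhood": ["California", "Camp Washington"], "Brewery": [1, 2]})
coffee_counts = pd.DataFrame({"Neighborhood": ["Camp Washington"], "Coffeeshop": [2]})

cincinnati_table = build_city_table(city, brew_counts, coffee_counts, value_col="House Value")
print(cincinnati_table)
```

The Dayton table would use the same call with `value_col="Home Value"`.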


## Part 2 will begin here!

...

## Function definitions, saved at the bottom of the page to de-clutter the notebook

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=1800):
    # Search Foursquare for breweries within `radius` metres of each neighborhood centre
    LIMIT = 100 # limit of number of venues returned by Foursquare API (not currently sent with the request)
    venues_list=[]
    query="Brewery"
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&query={}&radius={}'.format(   
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            query,
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Brewery', 
                  'Brewery Latitude', 
                  'Brewery Longitude', 
                  'Brewery Category']
    
    return(nearby_venues)

In [15]:
def getNearbyVenues2(names, latitudes, longitudes, radius=1000):
    # Search Foursquare for coffee shops within `radius` metres of each neighborhood centre
    LIMIT = 100 # limit of number of venues returned by Foursquare API (not currently sent with the request)
    venues_list=[]
    query="Coffee shop"
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&query={}&radius={}'.format(   
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            query,
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Coffeeshop', 
                  'Coffeeshop Latitude', 
                  'Coffeeshop Longitude', 
                  'Coffeeshop Category']
    
    return(nearby_venues)
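
The two functions above differ only in the search term, the radius, and the column labels, so they could be folded into one. The version below is a sketch, not the notebook's code: `get_nearby_venues`, `fetch_items`, and the `credentials`, `label`, and `fetch` parameters are hypothetical names. It passes the query through `requests`' `params` argument so the search term is URL-encoded, and it accepts a stand-in fetch function so the parsing can be tried without calling the API.

```python
import pandas as pd

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"  # endpoint used in the functions above

def fetch_items(params):
    """Call the explore endpoint and return the list of venue items."""
    import requests  # imported here so the parsing logic can be tested offline
    response = requests.get(EXPLORE_URL, params=params, timeout=10)
    response.raise_for_status()
    return response.json()["response"]["groups"][0]["items"]

def get_nearby_venues(names, latitudes, longitudes, query, radius, label,
                      credentials, fetch=fetch_items):
    """Hypothetical merged version of getNearbyVenues / getNearbyVenues2."""
    rows = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        params = dict(credentials, ll=f"{lat},{lng}", query=query, radius=radius)
        for item in fetch(params):
            venue = item["venue"]
            # Guard against venues that come back without a category.
            categories = venue.get("categories") or [{"name": None}]
            rows.append((name, lat, lng, venue["name"],
                         venue["location"]["lat"], venue["location"]["lng"],
                         categories[0]["name"]))
    columns = ["Neighborhood", "Neighborhood Latitude", "Neighborhood Longitude",
               label, f"{label} Latitude", f"{label} Longitude", f"{label} Category"]
    return pd.DataFrame(rows, columns=columns)

# Offline demonstration with a stubbed fetch function and a made-up venue.
def fake_fetch(params):
    return [{"venue": {"name": "Example Brewing",
                       "location": {"lat": 39.76, "lng": -84.19},
                       "categories": [{"name": "Brewery"}]}}]

demo = get_nearby_venues(["Downtown"], [39.760571], [-84.194938],
                         query="Brewery", radius=1800, label="Brewery",
                         credentials={"client_id": "...", "client_secret": "...", "v": "20200101"},
                         fetch=fake_fetch)
print(demo)
```

Passing the column names to the `DataFrame` constructor also means an empty search result yields an empty frame with the expected columns instead of an error when the columns are assigned.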