# Coursera Capstone Final Notebook
## Introduction
The purpose of this notebook is to determine which types of locations are popular in two regions and to identify which type of location (restaurant or attraction) to recommend opening in the target region.

## Data
Venue data will be collected from Foursquare to find the top restaurants and businesses in the target city. K-means clustering will then group the neighborhoods so that the groups can be viewed on a map and analyzed. The same technique will be applied to a second city of similar population in the same regional area (no more than two states away) to find any similarities. Comparing the clusters of the two cities then indicates which restaurant and/or business to recommend opening in the target city. The recommendation is based on the most popular restaurants and businesses in each city.

In [1]:
# The code was removed by Watson Studio for sharing.

## Gather Hampton Roads Data
### Hampton Roads Neighborhood Data Websites
- https://www.neighborhoodscout.com/va/yorktown
- https://www.neighborhoodscout.com/va/newport-news
- https://www.neighborhoodscout.com/va/poquoson
- https://www.neighborhoodscout.com/va/hampton

In [2]:
# Install folium, if required
!pip install folium

Collecting folium
  Downloading https://files.pythonhosted.org/packages/fd/a0/ccb3094026649cda4acd55bf2c3822bb8c277eb11446d13d384e5be35257/folium-0.10.1-py2.py3-none-any.whl (91kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/63/36/1c93318e9653f4e414a2e0c3b98fc898b4970e939afeedeee6075dd3b703/branca-0.3.1-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.3.1 folium-0.10.1


In [3]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib, folium, lxml, requests, json, urllib.request
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup
from pandas.io.json import json_normalize
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
print('Packages loaded')

Packages loaded


In [4]:
yt_url = 'https://www.neighborhoodscout.com/va/yorktown'
nn_url = 'https://www.neighborhoodscout.com/va/newport-news'
po_url = 'https://www.neighborhoodscout.com/va/poquoson'
ha_url = 'https://www.neighborhoodscout.com/va/hampton'
hr=[]

In [5]:
# Set up beautiful soup parser
yt_soup = BeautifulSoup(requests.get(yt_url).text,'html.parser')
nn_soup = BeautifulSoup(requests.get(nn_url).text,'html.parser')
po_soup = BeautifulSoup(requests.get(po_url).text,'html.parser')
ha_soup = BeautifulSoup(requests.get(ha_url).text,'html.parser')

In [6]:
for i in yt_soup.find_all('a')[71:89]:
    if ' / ' in i.text:
        nei = i.text.split(' / ')
        for j in nei:
            hr.append((j,'Yorktown'))
    else:
        hr.append((i.text,'Yorktown'))
#hr

In [7]:
for i in nn_soup.find_all('a')[71:89]:
    if ' / ' in i.text:
        nei = i.text.split(' / ')
        for j in nei:
            hr.append((j,'Newport News'))
    else:
        hr.append((i.text,'Newport News'))

In [8]:
for i in po_soup.find_all('a')[71:74]:
    if ' / ' in i.text:
        nei = i.text.split(' / ')
        for j in nei:
            hr.append((j,'Poquoson'))
    else:
        hr.append((i.text,'Poquoson'))

In [9]:
for i in ha_soup.find_all('a')[71:91]:
    if ' / ' in i.text:
        nei = i.text.split(' / ')
        for j in nei:
            hr.append((j,'Hampton'))
    else:
        hr.append((i.text,'Hampton'))

In [10]:
#hr
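
The four loops above differ only in the soup object, the slice bounds, and the city name. A minimal sketch of a shared helper follows, assuming the link texts have already been pulled out of the page. The name `split_neighborhoods` is hypothetical and not part of the original notebook, and unlike the original loops it also strips surrounding whitespace and skips empty names.

```python
def split_neighborhoods(link_texts, city, sep=' / '):
    """Return (neighborhood, city) tuples, splitting combined names on `sep`."""
    pairs = []
    for text in link_texts:
        # 'A / B' becomes two entries; a plain name yields a single entry
        for name in text.split(sep):
            name = name.strip()
            if name:
                pairs.append((name, city))
    return pairs


# Toy input standing in for [a.text for a in yt_soup.find_all('a')[71:89]]
sample = ['Tabb', 'Grafton / Acree Acres']
pairs = split_neighborhoods(sample, 'Yorktown')
print(pairs)
```

With a helper like this, each city would need one call, for example `hr += split_neighborhoods([a.text for a in po_soup.find_all('a')[71:74]], 'Poquoson')`. The hard-coded slice bounds still depend on the page layout at the time of scraping.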

In [11]:
# Convert list into DataFrame and drop duplicates
hr_df=pd.DataFrame(hr,columns=['Neighborhood','City'])
hr_df.drop_duplicates(subset='Neighborhood',keep='first',inplace=True)
hr_df.reset_index(drop=True,inplace=True)
hr_df['Latitude'] = 'N/A'
hr_df['Longitude'] = 'N/A'
hr_df.shape

In [12]:
geolocator = Nominatim(user_agent="foursquare_agent")
for i in hr_df.index:
    address = hr_df.loc[i, 'Neighborhood'] + ', ' + hr_df.loc[i, 'City'] + ', VA'
    try:
        location = geolocator.geocode(address)
        hr_df.loc[i, 'Latitude'] = location.latitude
        hr_df.loc[i, 'Longitude'] = location.longitude
    except AttributeError:
        hr_df.drop(i, inplace=True)      # Drop rows that Nominatim cannot geocode (removes 12 rows)
        continue
hr_df.shape
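
The loop above relies on `geocode` returning `None` for an unknown address, which is what triggers the `AttributeError`. A minimal sketch of the same step with an explicit `None` check is given here. The geocoder is passed in as a callable so the example runs offline; `add_coordinates`, `FakeLocation`, and `fake_geocode` are hypothetical names introduced for illustration only.

```python
from collections import namedtuple

import pandas as pd


def add_coordinates(df, geocode, state):
    """Return a new DataFrame with Latitude/Longitude; rows without a match are dropped."""
    rows = []
    for _, row in df.iterrows():
        address = '{}, {}, {}'.format(row['Neighborhood'], row['City'], state)
        location = geocode(address)
        if location is None:  # no match: skip the row instead of catching AttributeError
            continue
        rows.append({'Neighborhood': row['Neighborhood'], 'City': row['City'],
                     'Latitude': location.latitude, 'Longitude': location.longitude})
    return pd.DataFrame(rows, columns=['Neighborhood', 'City', 'Latitude', 'Longitude'])


# Offline stand-in for geolocator.geocode (an assumption, for illustration only)
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def fake_geocode(address):
    known = {'Tabb, Yorktown, VA': FakeLocation(37.1242, -76.4349)}
    return known.get(address)


toy = pd.DataFrame({'Neighborhood': ['Tabb', 'Nowhere'], 'City': ['Yorktown', 'Yorktown']})
located = add_coordinates(toy, fake_geocode, 'VA')
print(located)
```

With the real geocoder, `geolocator.geocode` could be passed in place of `fake_geocode`, optionally wrapped in geopy's `RateLimiter` to space out requests. Building the coordinate columns from numbers only also keeps them numeric rather than starting from the `'N/A'` string placeholder.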

In [13]:
# create map of Hampton Roads using latitude and longitude values
hr_map = folium.Map(location=[37.1181, -76.4614], zoom_start=11)

# add markers to map
for lat, lng, city, neighborhood in zip(hr_df['Latitude'], hr_df['Longitude'], hr_df['City'], hr_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(hr_map)  
    
hr_map

## Gather Philadelphia Data
### Philadelphia Neighborhood Website:
- https://en.wikipedia.org/wiki/List_of_Philadelphia_neighborhoods

In [14]:
ph_url = 'https://en.wikipedia.org/wiki/List_of_Philadelphia_neighborhoods'
ph_soup = BeautifulSoup(requests.get(ph_url).text,'html.parser')
ph=[]

In [15]:
for i in ph_soup.find_all('li')[14:654]:
    nei=[]
    if ':' in i.text:
        nei = i.text.split(':')
        ph.append((nei[0],'Philadelphia'))
    elif ' -' in i.text:
        nei = i.text.split(' -')
        ph.append((nei[0],'Philadelphia'))
    elif '^ ' in i.text:
        ph.pop()
        break
    else:
        ph.append((i.text,'Philadelphia'))
#ph

In [16]:
# Convert list into DataFrame and drop duplicates
ph_df=pd.DataFrame(ph,columns=['Neighborhood','City'])
ph_df.drop_duplicates(subset='Neighborhood',keep='first',inplace=True)
ph_df.reset_index(drop=True,inplace=True)
ph_df['Latitude'] = 'N/A'
ph_df['Longitude'] = 'N/A'
ph_df.shape

(165, 4)

In [17]:
for i in ph_df.index:
    address = ph_df.loc[i, 'Neighborhood'] + ', ' + ph_df.loc[i, 'City'] + ', PA'
    try:
        location = geolocator.geocode(address)
        ph_df.loc[i, 'Latitude'] = location.latitude
        ph_df.loc[i, 'Longitude'] = location.longitude
    except AttributeError:
        ph_df.drop(i, inplace=True)      # Drop rows that Nominatim cannot geocode (drops 23 rows)
        continue
ph_df.shape

(142, 4)

In [18]:
# create map of Philadelphia using latitude and longitude values
ph_map = folium.Map(location=[39.9944, -75.1678], zoom_start=11)

# add markers to map
for lat, lng, city, neighborhood in zip(ph_df['Latitude'], ph_df['Longitude'], ph_df['City'], ph_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(ph_map)  
    
ph_map

## Gather Venue Data for Hampton Roads and Philadelphia

In [19]:
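def getNearbyVenues(names, latitudes, longitudes, radius):
    """Query the Foursquare explore endpoint around each neighborhood (up to 30 venues each) and return one row per venue."""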
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)   
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            30)   
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues)
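
`getNearbyVenues` indexes `['response']['groups'][0]['items']` and `['categories'][0]` directly, so a response that lacks any of those keys would raise an exception partway through the loop. The sketch below shows a more defensive way to flatten one response. It assumes the same payload shape that the function above reads; `extract_venues` and the toy payload are hypothetical and not part of the original notebook.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore-style response into rows, tolerating missing groups or categories."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Venues without a category get None instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Toy payload mimicking the fields the notebook reads (values taken from the output shown later)
toy_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Yorktown Beach',
               'location': {'lat': 37.237357, 'lng': -76.506987},
               'categories': [{'name': 'Beach'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 37.23, 'lng': -76.51},
               'categories': []}},
]}]}}

rows = extract_venues('George Washington Memorial Hwy', 37.23245, -76.513601, toy_payload)
empty = extract_venues('Nowhere', 0.0, 0.0, {})
print(rows)
print(empty)
```

An empty or malformed payload yields an empty list here, so one failed request would not stop the collection for the remaining neighborhoods.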

In [20]:
# Hampton Roads is more spread out and requires a broader search range
hr_venues = getNearbyVenues(names=hr_df['Neighborhood'],
                                   latitudes=hr_df['Latitude'],
                                   longitudes=hr_df['Longitude'],radius=5000
                                  )

In [21]:
print(hr_venues.shape)
hr_venues.head()

(1840, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,George Washington Memorial Hwy,37.23245,-76.513601,Yorktown Victory Monument,37.232834,-76.50544,Monument / Landmark
1,George Washington Memorial Hwy,37.23245,-76.513601,Historic Yorktown,37.237603,-76.508856,Historic Site
2,George Washington Memorial Hwy,37.23245,-76.513601,Yorktown Beach,37.237357,-76.506987,Beach
3,George Washington Memorial Hwy,37.23245,-76.513601,Yorktown Battlefield,37.230559,-76.503125,National Park
4,George Washington Memorial Hwy,37.23245,-76.513601,American Revolution Museum at Yorktown,37.239153,-76.518638,History Museum


In [22]:
# Philadelphia is denser, so a smaller search radius is sufficient
ph_venues = getNearbyVenues(names=ph_df['Neighborhood'],
                                   latitudes=ph_df['Latitude'],
                                   longitudes=ph_df['Longitude'],radius=500
                                  )

In [23]:
print(ph_venues.shape)
ph_venues.head()

(2383, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Callowhill,39.957441,-75.14404,Radicchio Cafe,39.956637,-75.146095,Italian Restaurant
1,Callowhill,39.957441,-75.14404,Painted Bride Art Center,39.955569,-75.143901,Performing Arts Venue
2,Callowhill,39.957441,-75.14404,Pierre's Costumes,39.954381,-75.144361,Costume Shop
3,Callowhill,39.957441,-75.14404,Stripp’d Cold Pressed Juice,39.955763,-75.144186,Juice Bar
4,Callowhill,39.957441,-75.14404,Torch-Wood Market,39.95597,-75.144386,Food & Drink Shop


## Analyze Data for Hampton Roads

In [24]:
# Hampton Roads one hot encoding
hr_onehot = pd.get_dummies(hr_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
hr_onehot['Neighborhood'] = hr_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [hr_onehot.columns[-1]] + list(hr_onehot.columns[:-1])
hr_onehot = hr_onehot[fixed_columns]
hr_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Baseball Field,Basketball Stadium,Beach,...,Theater,Thrift / Vintage Store,Tourist Information Center,Trail,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint
0,George Washington Memorial Hwy,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,George Washington Memorial Hwy,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,George Washington Memorial Hwy,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
3,George Washington Memorial Hwy,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,George Washington Memorial Hwy,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [25]:
hr_onehot.shape

(1840, 126)

In [26]:
hr_grouped = hr_onehot.groupby('Neighborhood').mean().reset_index()
hr_grouped

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Baseball Field,Basketball Stadium,Beach,...,Theater,Thrift / Vintage Store,Tourist Information Center,Trail,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint
0,36th St,0.033333,0.000000,0.000000,0.000000,0.000000,0.000000,0.033333,0.000000,0.000000,...,0.000000,0.033333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,48th St,0.066667,0.000000,0.000000,0.000000,0.000000,0.000000,0.033333,0.000000,0.000000,...,0.000000,0.033333,0.000000,0.000000,0.000000,0.000000,0.033333,0.000000,0.000000,0.000000
2,Acree Acres,0.033333,0.000000,0.033333,0.000000,0.000000,0.033333,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,Battle Park,0.033333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.066667,...,0.000000,0.000000,0.033333,0.033333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,Big Bethel,0.033333,0.000000,0.000000,0.033333,0.000000,0.033333,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.033333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
5,Big Bethel Rd,0.033333,0.000000,0.000000,0.000000,0.033333,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.033333,0.000000,0.033333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
6,Bloxoms Corner,0.000000,0.000000,0.000000,0.038462,0.000000,0.038462,0.000000,0.000000,0.115385,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
7,Buckroe Beach,0.033333,0.000000,0.000000,0.033333,0.033333,0.066667,0.000000,0.000000,0.133333,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8,Cherry Acres,0.100000,0.000000,0.000000,0.066667,0.033333,0.033333,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
9,City Center,0.000000,0.000000,0.000000,0.033333,0.000000,0.066667,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


In [27]:
hr_grouped.shape

(62, 126)
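
The grouped values are category frequencies: each row of the one-hot table is a single venue, so the per-neighborhood mean is the share of that neighborhood's venues in each category. With the request limit of 30 venues, one venue contributes 1/30 = 0.033333, which matches the values above. A toy illustration with made-up data (the neighborhood names `A` and `B` are placeholders):

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Beach', 'Bar', 'Beach', 'Bar'],
})

# One column per category, one row per venue; dtype=float keeps the mean numeric
onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix='', prefix_sep='', dtype=float)
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# Mean of the indicator columns = share of each category within the neighborhood
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Neighborhood `A` has two beaches out of three venues, so its `Beach` frequency is 2/3, and every row sums to 1 across the category columns.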

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [29]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
hr_venues_sorted = pd.DataFrame(columns=columns)
hr_venues_sorted['Neighborhood'] = hr_grouped['Neighborhood']
for ind in np.arange(hr_grouped.shape[0]):
    hr_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hr_grouped.iloc[ind, :], num_top_venues)
hr_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,36th St,Convenience Store,Seafood Restaurant,Fried Chicken Joint,Discount Store,Park,American Restaurant,Concert Hall,Sandwich Place,Restaurant,History Museum
1,48th St,American Restaurant,Fried Chicken Joint,Convenience Store,Park,Sandwich Place,Pizza Place,Grocery Store,Korean Restaurant,Donut Shop,Burger Joint
2,Acree Acres,Pizza Place,Donut Shop,Coffee Shop,Mexican Restaurant,American Restaurant,Sandwich Place,Historic Site,Convenience Store,Cosmetics Shop,Dessert Shop
3,Battle Park,Historic Site,History Museum,Sandwich Place,National Park,Seafood Restaurant,Beach,American Restaurant,Steakhouse,Convenience Store,Plaza
4,Big Bethel,Coffee Shop,Thai Restaurant,Movie Theater,Brewery,Gym,Plaza,Sandwich Place,Hotel,Ice Cream Shop,Japanese Restaurant
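
The column names above are built by catching the exception raised when the `indicators` list runs out, which gives correct suffixes up to the 20th column. A small sketch of an explicit alternative follows; `ordinal` is a hypothetical helper, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns)
```

This produces the same ten column names as the loop in the notebook and stays correct if `num_top_venues` is raised past 20.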


## Analyze Data for Philadelphia

In [30]:
# Philadelphia one hot encoding
ph_onehot = pd.get_dummies(ph_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
ph_onehot['Neighborhood'] = ph_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [ph_onehot.columns[-1]] + list(ph_onehot.columns[:-1])
ph_onehot = ph_onehot[fixed_columns]
ph_onehot.head()

Unnamed: 0,Neighborhood,Advertising Agency,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,...,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Callowhill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Callowhill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Callowhill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Callowhill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Callowhill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [31]:
ph_onehot.shape

(2383, 281)

In [32]:
ph_grouped = ph_onehot.groupby('Neighborhood').mean().reset_index()
ph_grouped

Unnamed: 0,Neighborhood,Advertising Agency,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,...,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Academy Gardens,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,...,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,Allegheny West,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,...,0.000000,0.000000,0.0,0.0,0.000000,0.033333,0.000000,0.000000,0.000000,0.000000
2,Andorra,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,...,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,Angora,0.0,0.000000,0.0,0.076923,0.0,0.000000,0.0,0.000000,0.000000,...,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,Ashton-Woodenbridge,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,...,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
5,Bella Vista,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.033333,0.000000,...,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
6,Belmont District,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.033333,...,0.000000,0.033333,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
7,Belmont Village,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.033333,...,0.000000,0.033333,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8,Brewerytown,0.0,0.000000,0.0,0.058824,0.0,0.000000,0.0,0.000000,0.000000,...,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
9,Bridesburg,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,...,0.111111,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


In [33]:
ph_grouped.shape

(142, 281)

In [34]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
ph_venues_sorted = pd.DataFrame(columns=columns)
ph_venues_sorted['Neighborhood'] = ph_grouped['Neighborhood']
for ind in np.arange(ph_grouped.shape[0]):
    ph_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ph_grouped.iloc[ind, :], num_top_venues)
ph_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Academy Gardens,Donut Shop,Garden,Farm,Zoo Exhibit,Farmers Market,Eastern European Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Fast Food Restaurant
1,Allegheny West,Intersection,Fast Food Restaurant,Sandwich Place,Grocery Store,Gym / Fitness Center,Shopping Plaza,Dry Cleaner,Shoe Store,Donut Shop,Liquor Store
2,Andorra,Tennis Court,Playground,Zoo Exhibit,Dry Cleaner,Eastern European Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Farm,Farmers Market
3,Angora,Park,Chinese Restaurant,Discount Store,Breakfast Spot,Light Rail Station,American Restaurant,Grocery Store,Intersection,Road,Supermarket
4,Ashton-Woodenbridge,Gym,Garden,Farmers Market,Dutch Restaurant,Eastern European Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Farm,Fast Food Restaurant


## Cluster Hampton Roads Neighborhoods

In [35]:
# set number of clusters
kclusters = 5
hr_grouped_clustering = hr_grouped.drop('Neighborhood', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hr_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 2, 3, 2, 2, 1, 1, 1, 2], dtype=int32)
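
The notebook fixes `kclusters = 5` without showing how that value was chosen. One common check, sketched here on synthetic data, is to compare silhouette scores across several values of k. This is an illustration under stated assumptions rather than the method used in this analysis; `silhouette_by_k` and the toy points are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores


# Toy data: three tight, well-separated groups of 2-D points
rng = np.random.RandomState(0)
toy = np.vstack([rng.normal(loc=c, scale=0.05, size=(20, 2)) for c in (0.0, 1.0, 2.0)])

scores = silhouette_by_k(toy, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

On the real data the call would take `hr_grouped_clustering` in place of `toy`. Higher scores indicate tighter, better-separated clusters, so the k with the highest score is a reasonable candidate.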

In [36]:
# add clustering labels
hr_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
hr_merged = hr_df
# merge hr_grouped with hr_df to add latitude/longitude for each neighborhood
hr_merged = hr_merged.join(hr_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
hr_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,George Washington Memorial Hwy,Yorktown,37.2325,-76.5136,3,Historic Site,History Museum,Seafood Restaurant,National Park,Beach,Sandwich Place,American Restaurant,Ice Cream Shop,Italian Restaurant,Mexican Restaurant
3,Dare,Yorktown,36.0387,-75.6817,1,Seafood Restaurant,Beach,Italian Restaurant,Coffee Shop,New American Restaurant,Burger Joint,Restaurant,City,Department Store,Donut Shop
9,Tabb,Yorktown,37.1242,-76.4349,1,American Restaurant,Pizza Place,Italian Restaurant,Brewery,Spa,Hotel,Japanese Restaurant,Fast Food Restaurant,Mexican Restaurant,Dessert Shop
10,Acree Acres,Yorktown,37.1815,-76.4784,2,Pizza Place,Donut Shop,Coffee Shop,Mexican Restaurant,American Restaurant,Sandwich Place,Historic Site,Convenience Store,Cosmetics Shop,Dessert Shop
12,Town Center,Yorktown,37.2379,-76.5081,3,Historic Site,History Museum,Sandwich Place,Seafood Restaurant,National Park,Beach,Discount Store,Gym,Plaza,Pizza Place


In [37]:
# create Hampton Roads cluster map
hr_clusters = folium.Map(location=[37.1181, -76.4614], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(hr_merged['Latitude'], hr_merged['Longitude'], hr_merged['Neighborhood'], hr_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(hr_clusters)     
hr_clusters
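
In the map cell above, `rainbow[cluster-1]` sends label 0 to the last color through negative indexing. Each cluster still gets a distinct color, but indexing by the label itself is easier to follow. The join could also leave `Cluster Labels` as NaN for a neighborhood that returned no venues, which would make the lookup fail. A minimal sketch that handles both points, with hypothetical helper names and an arbitrary fallback color:

```python
import math

import numpy as np
from matplotlib import cm, colors


def cluster_palette(k):
    """One hex color per cluster label 0..k-1."""
    return [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, k))]


def marker_color(label, palette, fallback='#808080'):
    """Color for a cluster label; a missing label (None or NaN) gets the fallback gray."""
    if label is None or (isinstance(label, float) and math.isnan(label)):
        return fallback
    return palette[int(label)]


palette = cluster_palette(5)
print(palette)
print(marker_color(0, palette), marker_color(float('nan'), palette))
```

Inside the marker loop, `color=marker_color(cluster, palette)` would then replace `color=rainbow[cluster-1]`.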

## Examine Hampton Roads Clusters

### Cluster 1

In [38]:
hr_merged.loc[hr_merged['Cluster Labels'] == 0, hr_merged.columns[[0] + [1] + list(range(5, 10))]]

Unnamed: 0,Neighborhood,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
34,Madison Ave,Newport News,Convenience Store,Discount Store,Seafood Restaurant,Fried Chicken Joint,Supplement Shop
40,Orcutt Ave,Newport News,Convenience Store,Fried Chicken Joint,Discount Store,Restaurant,Sandwich Place
41,36th St,Newport News,Convenience Store,Seafood Restaurant,Fried Chicken Joint,Discount Store,Park
42,Poquoson Shores,Poquoson,Pizza Place,Convenience Store,Discount Store,Seafood Restaurant,American Restaurant
43,York Haven Anchorage,Poquoson,Convenience Store,Fast Food Restaurant,Pizza Place,Seafood Restaurant,Discount Store
44,Little Florida Rd,Poquoson,Convenience Store,Fast Food Restaurant,Pizza Place,American Restaurant,Supermarket
45,Poquoson Ave,Poquoson,Convenience Store,Fast Food Restaurant,Pizza Place,Supermarket,Discount Store
54,Fox Hill,Hampton,Convenience Store,Pizza Place,Discount Store,Park,Beach


### Cluster 2

In [39]:
hr_merged.loc[hr_merged['Cluster Labels'] == 1, hr_merged.columns[[0] + [1] + list(range(5, 10))]]

Unnamed: 0,Neighborhood,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Dare,Yorktown,Seafood Restaurant,Beach,Italian Restaurant,Coffee Shop,New American Restaurant
9,Tabb,Yorktown,American Restaurant,Pizza Place,Italian Restaurant,Brewery,Spa
49,Merrimac Shores,Hampton,American Restaurant,Brewery,BBQ Joint,Science Museum,Drugstore
50,Raleigh Terrace,Hampton,American Restaurant,BBQ Joint,Brewery,Outdoors & Recreation,Supermarket
51,East Hampton,Hampton,American Restaurant,Brewery,Pizza Place,Convenience Store,Italian Restaurant
52,Hiland Park,Hampton,American Restaurant,Brewery,Italian Restaurant,Convenience Store,Fast Food Restaurant
55,Bloxoms Corner,Hampton,Beach,Pizza Place,Fishing Spot,Drugstore,Italian Restaurant
60,Cherry Acres,Hampton,American Restaurant,BBQ Joint,Convenience Store,Italian Restaurant,Brewery
64,Pine Grove Terrace,Hampton,American Restaurant,Thai Restaurant,Convenience Store,Fast Food Restaurant,Brewery
65,Hampton U,Hampton,Convenience Store,Pizza Place,Beach,Bar,Drugstore


### Cluster 3

In [40]:
hr_merged.loc[hr_merged['Cluster Labels'] == 2, hr_merged.columns[[0] + [1] + list(range(5, 10))]]

Unnamed: 0,Neighborhood,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
10,Acree Acres,Yorktown,Pizza Place,Donut Shop,Coffee Shop,Mexican Restaurant,American Restaurant
13,Grafton,Yorktown,Pizza Place,Donut Shop,Coffee Shop,Mexican Restaurant,American Restaurant
15,Hidenwood,Newport News,Italian Restaurant,Coffee Shop,Indian Restaurant,Pizza Place,Brewery
17,Deer Park,Newport News,Pizza Place,Coffee Shop,Seafood Restaurant,Brewery,Indian Restaurant
25,Hilton Village,Newport News,History Museum,Convenience Store,Sandwich Place,Comedy Club,Donut Shop
27,Palmerton Dr,Newport News,Italian Restaurant,Fast Food Restaurant,Pizza Place,Wings Joint,Discount Store
28,Shields Rd,Newport News,Italian Restaurant,Discount Store,Wings Joint,Donut Shop,Sandwich Place
29,City Center,Newport News,Coffee Shop,Pizza Place,Bar,Sushi Restaurant,Brewery
30,Mercury Blvd,Newport News,Convenience Store,Sandwich Place,American Restaurant,Discount Store,Coffee Shop
31,Woodbridge Dr,Newport News,Discount Store,Pizza Place,Caribbean Restaurant,Chinese Restaurant,Rental Service


### Cluster 4

In [41]:
hr_merged.loc[hr_merged['Cluster Labels'] == 3, hr_merged.columns[[0] + [1] + list(range(5, 10))]]

Unnamed: 0,Neighborhood,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,George Washington Memorial Hwy,Yorktown,Historic Site,History Museum,Seafood Restaurant,National Park,Beach
12,Town Center,Yorktown,Historic Site,History Museum,Sandwich Place,Seafood Restaurant,National Park
14,Battle Park,Yorktown,Historic Site,History Museum,Sandwich Place,National Park,Seafood Restaurant


### Cluster 5

In [42]:
hr_merged.loc[hr_merged['Cluster Labels'] == 4, hr_merged.columns[[0] + [1]  + list(range(5, 10))]]

Unnamed: 0,Neighborhood,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
16,Oyster Point,Newport News,Brewery,Hotel,Coffee Shop,Pizza Place,Grocery Store
19,Jefferson Ave,Newport News,Pizza Place,Grocery Store,Coffee Shop,Fast Food Restaurant,Burger Joint
20,Shore Park,Newport News,Italian Restaurant,Pizza Place,Discount Store,Grocery Store,Coffee Shop
21,Old Orchard,Newport News,Pizza Place,Grocery Store,Coffee Shop,Italian Restaurant,Korean Restaurant
22,Deep Creek,Newport News,Pizza Place,Grocery Store,Fast Food Restaurant,Seafood Restaurant,Coffee Shop
23,Windbrook Cir,Newport News,Pizza Place,Grocery Store,Fast Food Restaurant,Burger Joint,Laser Tag
24,Oak Springs Ct,Newport News,Pizza Place,Grocery Store,Fast Food Restaurant,Bakery,Burger Joint
26,Menchville,Newport News,Pizza Place,Grocery Store,Fast Food Restaurant,Coffee Shop,Gym / Fitness Center
33,Old Oyster Point Rd,Newport News,American Restaurant,Plaza,Fast Food Restaurant,Grocery Store,Sushi Restaurant


## Cluster Philadelphia Neighborhoods

In [43]:
# set number of clusters
kclusters = 5
ph_grouped_clustering = ph_grouped.drop('Neighborhood', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(ph_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 1, 2, 0, 0, 0, 0, 0, 2], dtype=int32)

In [44]:
# add clustering labels
ph_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
ph_merged = ph_df
# merge ph_grouped with ph_df to add latitude/longitude for each neighborhood
ph_merged = ph_merged.join(ph_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
ph_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Callowhill,Philadelphia,39.9574,-75.144,0,Coffee Shop,Sushi Restaurant,Art Gallery,Performing Arts Venue,Italian Restaurant,Diner,Spa,Costume Shop,Furniture / Home Store,Boutique
2,Chinatown,Philadelphia,39.9534,-75.1546,0,Chinese Restaurant,Bubble Tea Shop,Bakery,Shanghai Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Ramen Restaurant,Noodle House,Creperie
3,Elfreth's Alley,Philadelphia,39.9527,-75.142,0,Art Gallery,Boutique,Italian Restaurant,Coffee Shop,Gym,Plaza,Spa,Café,Candy Store,Sushi Restaurant
5,Logan Square,Philadelphia,39.9581,-75.1706,0,Hotel,Coffee Shop,Science Museum,Plaza,Sushi Restaurant,Steakhouse,Sandwich Place,Gastropub,Seafood Restaurant,New American Restaurant
6,Naval Square,Philadelphia,39.944,-75.1844,0,Playground,Coffee Shop,Hot Dog Joint,Frozen Yogurt Shop,Thai Restaurant,Liquor Store,Beer Store,Supermarket,Market,Food Truck


In [45]:
# create Philadelphia cluster map
ph_clusters = folium.Map(location=[39.9944, -75.1678], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(ph_merged['Latitude'], ph_merged['Longitude'], ph_merged['Neighborhood'], ph_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(ph_clusters)     
ph_clusters

## Examine Philadelphia Clusters

### Cluster 1

In [46]:
ph_merged.loc[ph_merged['Cluster Labels'] == 0, ph_merged.columns[[0] + [1] + list(range(5, 10))]]

Unnamed: 0,Neighborhood,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Callowhill,Philadelphia,Coffee Shop,Sushi Restaurant,Art Gallery,Performing Arts Venue,Italian Restaurant
2,Chinatown,Philadelphia,Chinese Restaurant,Bubble Tea Shop,Bakery,Shanghai Restaurant,Department Store
3,Elfreth's Alley,Philadelphia,Art Gallery,Boutique,Italian Restaurant,Coffee Shop,Gym
5,Logan Square,Philadelphia,Hotel,Coffee Shop,Science Museum,Plaza,Sushi Restaurant
6,Naval Square,Philadelphia,Playground,Coffee Shop,Hot Dog Joint,Frozen Yogurt Shop,Thai Restaurant
8,Market East,Philadelphia,Chinese Restaurant,Pizza Place,Convenience Store,Shopping Mall,Dry Cleaner
12,Rittenhouse Square,Philadelphia,Coffee Shop,American Restaurant,Bookstore,Café,Vegetarian / Vegan Restaurant
13,Fitler Square,Philadelphia,Breakfast Spot,Coffee Shop,Trail,Café,Park
15,Society Hill,Philadelphia,Italian Restaurant,Café,Pizza Place,History Museum,Sandwich Place
16,South Street,Philadelphia,Coffee Shop,Pizza Place,Italian Restaurant,Breakfast Spot,Bookstore


### Cluster 2

In [47]:
ph_merged.loc[ph_merged['Cluster Labels'] == 1, ph_merged.columns[[0] + [1] + list(range(5, 10))]]

Unnamed: 0,Neighborhood,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
50,Clearview,Philadelphia,Playground,Pharmacy,Intersection,Beach,Bar
112,Andorra,Philadelphia,Tennis Court,Playground,Zoo Exhibit,Dry Cleaner,Eastern European Restaurant
135,Fox Chase,Philadelphia,Brewery,Playground,Bus Station,Sandwich Place,Salon / Barbershop


### Cluster 3

In [48]:
ph_merged.loc[ph_merged['Cluster Labels'] == 2, ph_merged.columns[[0] + [1] + list(range(5, 10))]]

Unnamed: 0,Neighborhood,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
19,Central South Philadelphia,Philadelphia,Insurance Office,Intersection,Restaurant,Cosmetics Shop,Pizza Place
23,Franklin Delano Roosevelt Park,Philadelphia,Park,Golf Course,History Museum,Lake,Intersection
26,Grays Ferry,Philadelphia,Drugstore,Fried Chicken Joint,Chinese Restaurant,Pizza Place,Discount Store
44,Wilson Park,Philadelphia,Pizza Place,Bar,Art Gallery,Bank,Gym / Fitness Center
48,Angora,Philadelphia,Park,Chinese Restaurant,Discount Store,Breakfast Spot,Light Rail Station
51,Kingsessing,Philadelphia,Park,Chinese Restaurant,American Restaurant,Pizza Place,Bus Station
53,Elmwood Park,Philadelphia,Bus Station,Rental Car Location,Pizza Place,Chinese Restaurant,Park
56,Paschall,Philadelphia,African Restaurant,Intersection,Pharmacy,Zoo Exhibit,Farmers Market
63,Carroll Park,Philadelphia,Breakfast Spot,Pizza Place,Southern / Soul Food Restaurant,Park,Pharmacy
70,Haverford North,Philadelphia,Park,Deli / Bodega,Intersection,Pier,Pizza Place


### Cluster 4

In [49]:
ph_merged.loc[ph_merged['Cluster Labels'] == 3, ph_merged.columns[[0] + [1] + list(range(5, 10))]]

Unnamed: 0,Neighborhood,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
153,Crestmont Farms,Philadelphia,Café,Zoo Exhibit,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant
157,Morrell Park,Philadelphia,Burger Joint,Café,Zoo Exhibit,Fast Food Restaurant,English Restaurant


### Cluster 5

In [50]:
ph_merged.loc[ph_merged['Cluster Labels'] == 4, ph_merged.columns[[0] + [1] + list(range(5, 10))]]

Unnamed: 0,Neighborhood,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
45,West Passyunk,Philadelphia,Art Gallery,Intersection,Coffee Shop,Zoo Exhibit,Fast Food Restaurant


## Inter-city Analysis

### Hampton Roads Cluster Names:
1. Convenience Store/Fast Food (Medium)
2. Restaurant/Beach/Brewery (Medium)
3. Misc (Large)
4. Historic Site/Museum (Small)
5. Pizza (Medium)

### Philadelphia Cluster Names:
1. Misc (Large)
2. Playground (Small)
3. Parks/Sports (Large)
4. Cafe/Fast Food (Small)
5. Art Gallery (Small)
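
The cluster names above are assigned by inspection. As a complement, the two venue tables could be compared directly by looking for categories that make up a larger share of venues in the reference city than in the target city. The sketch below is one possible approach under that assumption, not the method used in this notebook; `category_gap` and the toy tables are hypothetical.

```python
import pandas as pd


def category_gap(target_venues, reference_venues, top_n=5):
    """Categories whose share of venues is highest in the reference city relative to the target."""
    target_share = target_venues['Venue Category'].value_counts(normalize=True)
    reference_share = reference_venues['Venue Category'].value_counts(normalize=True)
    # Categories absent from one city count as a share of 0 there
    gap = reference_share.sub(target_share, fill_value=0.0)
    return gap.sort_values(ascending=False).head(top_n)


# Toy tables standing in for hr_venues (target) and ph_venues (reference)
toy_target = pd.DataFrame({'Venue Category': ['Pizza Place', 'Pizza Place', 'Beach', 'Bar']})
toy_reference = pd.DataFrame({'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Pizza Place', 'Bar']})

gap = category_gap(toy_target, toy_reference, top_n=2)
print(gap)
```

A large positive gap marks a category that is common in the reference city and rare in the target city. Because the two cities were searched with different radii (5000 m versus 500 m), shares rather than raw counts are compared here, and the result should be read as a rough indicator only.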