# Pittsburgh Restaurant Relocator

The goal of this notebook is to provide a list of desirable destinations for a Chinese restaurant looking to move out of downtown Pittsburgh due to COVID-19 shutdowns.

### Import Data

This section imports two data sets from public sources:
 * The first is https://www.zipdatamaps.com/allegheny-pa-county-zipcodes
      * Contains the ZIP Code, ZIP Code Name, Population and ZIP Type for each ZIP Code in Allegheny County

 * The second is https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/
      * Contains the latitude and longitude of every US ZIP Code
      * This was uploaded to the IBM environment as an exported CSV file

The ZIP Code data is cleaned for the desired information and merged with the latitude and longitude data to create one list of potential relocation destinations.
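
The cleaning and merging steps described above can be sketched on a toy table. This is a minimal illustration, not the notebook's own pipeline: the rows are copied from the outputs shown in this notebook, and the frame names `zips`, `coords`, `cleaned` and `merged` are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the scraped ZIP table (rows copied from the outputs in this notebook).
zips = pd.DataFrame({
    "ZIP Code": [15003.0, 15028.0, 15279.0, 15282.0],
    "ZIP Code Name": ["Ambridge", "Coulters", "Pittsburgh", "Pittsburgh"],
    "Population": [11861.0, 142.0, None, 1.0],
    "ZIP Type": ["Non-Unique", "PO Box", "Unique", "Non-Unique"],
})

# Toy stand-in for the latitude/longitude table.
coords = pd.DataFrame({
    "Zip": [15003, 15102],
    "Latitude": [40.593917, 40.324535],
    "Longitude": [-80.22181, -80.03864],
})

# Drop rows with no population, then convert the float columns to integers.
cleaned = zips.dropna(subset=["Population"]).astype({"ZIP Code": "int", "Population": "int"})
# Drop PO Boxes and rename the name column.
cleaned = cleaned[cleaned["ZIP Type"] != "PO Box"]
cleaned = cleaned.rename(columns={"ZIP Code Name": "Neighborhood"})
# Drop ZIP Codes inside the city proper.
cleaned = cleaned[~cleaned["Neighborhood"].str.contains("Pittsburgh")]

# A left merge keeps every cleaned ZIP Code, even one with no coordinate match.
merged = cleaned.merge(coords, how="left", left_on="ZIP Code", right_on="Zip").drop(columns="Zip")
print(merged)
```

Passing `subset=["Population"]` limits the drop to rows missing a population, whereas a bare `dropna()` would also drop rows with a gap in any other column.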

In [1]:
#read in list of Pittsburgh postal codes from url using pandas

import pandas as pd
df = pd.read_html('https://www.zipdatamaps.com/allegheny-pa-county-zipcodes', header=1)[1]
df

Unnamed: 0,ZIP Code,ZIP Code Name,Population,ZIP Type
0,15003.0,Ambridge,11861.0,Non-Unique
1,15005.0,Baden,9450.0,Non-Unique
2,15007.0,Bakerstown,323.0,Non-Unique
3,15014.0,Brackenridge,3184.0,Non-Unique
4,15015.0,Bradfordwoods,1175.0,Non-Unique
...,...,...,...,...
150,15279.0,Pittsburgh,,Unique
151,15281.0,Pittsburgh,,Unique
152,15283.0,Pittsburgh,,Unique
153,15286.0,Pittsburgh,,Unique


In [2]:
# drop rows where Population is NaN since these are likely PO Boxes or other less useful data
df.dropna(inplace=True)
df

Unnamed: 0,ZIP Code,ZIP Code Name,Population,ZIP Type
0,15003.0,Ambridge,11861.0,Non-Unique
1,15005.0,Baden,9450.0,Non-Unique
2,15007.0,Bakerstown,323.0,Non-Unique
3,15014.0,Brackenridge,3184.0,Non-Unique
4,15015.0,Bradfordwoods,1175.0,Non-Unique
...,...,...,...,...
117,15028.0,Coulters,142.0,PO Box
119,15047.0,Greenock,151.0,PO Box
120,15075.0,Rural Ridge,128.0,PO Box
121,15082.0,Sturgeon,350.0,PO Box


In [3]:
# check df size
df.shape

(114, 4)

In [4]:
# check df types
df.dtypes

ZIP Code         float64
ZIP Code Name     object
Population       float64
ZIP Type          object
dtype: object

In [5]:
# convert floats to ints
df = df.astype({'ZIP Code': 'int', 'Population':'int'})
df.dtypes

ZIP Code          int64
ZIP Code Name    object
Population        int64
ZIP Type         object
dtype: object

In [6]:
# confirm conversion
df

Unnamed: 0,ZIP Code,ZIP Code Name,Population,ZIP Type
0,15003,Ambridge,11861,Non-Unique
1,15005,Baden,9450,Non-Unique
2,15007,Bakerstown,323,Non-Unique
3,15014,Brackenridge,3184,Non-Unique
4,15015,Bradfordwoods,1175,Non-Unique
...,...,...,...,...
117,15028,Coulters,142,PO Box
119,15047,Greenock,151,PO Box
120,15075,Rural Ridge,128,PO Box
121,15082,Sturgeon,350,PO Box


In [7]:
# drop Zip Types that are PO Boxes
index_names = df[df['ZIP Type'] == 'PO Box'].index 
  
df.drop(index_names, inplace = True) 
df.reset_index(drop=True, inplace=True)
df 

Unnamed: 0,ZIP Code,ZIP Code Name,Population,ZIP Type
0,15003,Ambridge,11861,Non-Unique
1,15005,Baden,9450,Non-Unique
2,15007,Bakerstown,323,Non-Unique
3,15014,Brackenridge,3184,Non-Unique
4,15015,Bradfordwoods,1175,Non-Unique
...,...,...,...,...
102,15282,Pittsburgh,1,Non-Unique
103,15332,Finleyville,8148,Non-Unique
104,15642,Irwin,45286,Non-Unique
105,16046,Mars,14396,Non-Unique


In [8]:
# rename ZIP Code Name column to Neighborhood
df.rename(columns={"ZIP Code Name": "Neighborhood"},inplace=True)
df

Unnamed: 0,ZIP Code,Neighborhood,Population,ZIP Type
0,15003,Ambridge,11861,Non-Unique
1,15005,Baden,9450,Non-Unique
2,15007,Bakerstown,323,Non-Unique
3,15014,Brackenridge,3184,Non-Unique
4,15015,Bradfordwoods,1175,Non-Unique
...,...,...,...,...
102,15282,Pittsburgh,1,Non-Unique
103,15332,Finleyville,8148,Non-Unique
104,15642,Irwin,45286,Non-Unique
105,16046,Mars,14396,Non-Unique


In [9]:
# drop ZIP Codes for Pittsburgh since the goal is to move restaurants out of the city proper
burgh_data = df[~df['Neighborhood'].str.contains("Pittsburgh")].reset_index(drop=True)
burgh_data

Unnamed: 0,ZIP Code,Neighborhood,Population,ZIP Type
0,15003,Ambridge,11861,Non-Unique
1,15005,Baden,9450,Non-Unique
2,15007,Bakerstown,323,Non-Unique
3,15014,Brackenridge,3184,Non-Unique
4,15015,Bradfordwoods,1175,Non-Unique
...,...,...,...,...
59,15148,Wilmerding,2814,Non-Unique
60,15332,Finleyville,8148,Non-Unique
61,15642,Irwin,45286,Non-Unique
62,16046,Mars,14396,Non-Unique


In [10]:
# confirm cleaned up data frame size
burgh_data.shape

(64, 4)

In [11]:
# pull in latitude and longitude for each zip code based on data from public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_26acae5bf4274cda9b2e23248e1e91ad = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<YOUR_IBM_API_KEY>',
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_26acae5bf4274cda9b2e23248e1e91ad.get_object(Bucket='pythonbasicsfordatascienceproject-donotdelete-pr-unurqe1bdqwejj',Key='us-zip-code-latitude-and-longitude.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

lat_lon_df = pd.read_csv(body,sep=';')
lat_lon_df.head()


Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,87040,Paguate,NM,35.132004,-107.36564,-7,1,"35.132004,-107.36564"
1,90275,Rancho Palos Verdes,CA,33.758216,-118.36425,-8,1,"33.758216,-118.36425"
2,33780,Pinellas Park,FL,27.891809,-82.724763,-5,1,"27.891809,-82.724763"
3,18451,Paupack,PA,41.404263,-75.23826,-5,1,"41.404263,-75.23826"
4,15102,Bethel Park,PA,40.324535,-80.03864,-5,1,"40.324535,-80.03864"
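
For readers without access to the IBM Cloud Object Storage bucket, the same semicolon-delimited file can be read from a local copy. The sketch below assumes the export has the column layout shown in the output above; an in-memory `sample` stands in for the file, and the names `sample`, `lat_lon_local` and `pa_only` are illustrative.

```python
import io
import pandas as pd

# Two sample rows copied from the output above, in a semicolon-delimited layout
# (the exact layout of the exported file is an assumption).
sample = io.StringIO(
    "Zip;City;State;Latitude;Longitude;Timezone;Daylight savings time flag;geopoint\n"
    "18451;Paupack;PA;41.404263;-75.23826;-5;1;41.404263,-75.23826\n"
    "15102;Bethel Park;PA;40.324535;-80.03864;-5;1;40.324535,-80.03864\n"
)

# For a local copy, pass the path of the exported CSV instead of `sample`.
lat_lon_local = pd.read_csv(
    sample, sep=";", usecols=["Zip", "City", "State", "Latitude", "Longitude"]
)

# Keep only Pennsylvania rows and index by ZIP Code for fast lookups.
pa_only = lat_lon_local[lat_lon_local["State"] == "PA"].set_index("Zip")
print(pa_only.loc[15102, ["City", "Latitude", "Longitude"]])
```

Restricting the columns with `usecols` keeps only the fields the later cells need.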


In [12]:
#add latitude and longitude data to Pittsburgh ZIP code list
burgh_data['Latitude'] = burgh_data['ZIP Code'].map(lat_lon_df.set_index('Zip')['Latitude'])
burgh_data['Longitude'] = burgh_data['ZIP Code'].map(lat_lon_df.set_index('Zip')['Longitude'])
burgh_data

Unnamed: 0,ZIP Code,Neighborhood,Population,ZIP Type,Latitude,Longitude
0,15003,Ambridge,11861,Non-Unique,40.593917,-80.22181
1,15005,Baden,9450,Non-Unique,40.641066,-80.20599
2,15007,Bakerstown,323,Non-Unique,40.652311,-79.93303
3,15014,Brackenridge,3184,Non-Unique,40.608403,-79.74234
4,15015,Bradfordwoods,1175,Non-Unique,40.635147,-80.08369
...,...,...,...,...,...,...
59,15148,Wilmerding,2814,Non-Unique,40.394268,-79.80286
60,15332,Finleyville,8148,Non-Unique,40.250299,-79.99436
61,15642,Irwin,45286,Non-Unique,40.325902,-79.71324
62,16046,Mars,14396,Non-Unique,40.695658,-80.03359
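
Because `Series.map` returns NaN for any ZIP Code missing from the lookup table, it may be worth checking for unmatched rows before plotting. A minimal sketch on toy data follows; `burgh_toy`, `lookup` and the ZIP Code 99999 are made up for the illustration.

```python
import pandas as pd

# Toy data: ZIP 99999 is a made-up code with no entry in the lookup table.
burgh_toy = pd.DataFrame({
    "ZIP Code": [15003, 15005, 99999],
    "Neighborhood": ["Ambridge", "Baden", "Nowhere"],
})
lookup = pd.DataFrame({
    "Zip": [15003, 15005],
    "Latitude": [40.593917, 40.641066],
    "Longitude": [-80.22181, -80.20599],
}).set_index("Zip")

# Build the indexed lookup once and reuse it for both columns.
burgh_toy["Latitude"] = burgh_toy["ZIP Code"].map(lookup["Latitude"])
burgh_toy["Longitude"] = burgh_toy["ZIP Code"].map(lookup["Longitude"])

# Rows with no match come back as NaN; list them before plotting or calling an API.
missing = burgh_toy[burgh_toy[["Latitude", "Longitude"]].isna().any(axis=1)]
print(missing["ZIP Code"].tolist())
```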


### Create Visualization Map
This section uses folium to plot the Neighborhood locations in Allegheny County.

In [13]:
#import needed functions

import numpy as np

import json

!pip install geocoder==1.5.0
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests

from pandas.io.json import json_normalize

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!pip install folium==0.5
import folium # map rendering library

print('Libraries imported.')

Collecting geocoder==1.5.0
  Downloading geocoder-1.5.0-py2.py3-none-any.whl (50 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.5.0 ratelim-0.1.6
Collecting folium==0.5
  Downloading folium-0.5.0.tar.gz (79 kB)
Collecting branca
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Building wheels for collected packages: folium
  Building wheel for folium (setup.py) ... done
  Created wheel for folium: filename=folium-0.5.0-py3-none-any.whl size=76240 sha256=3891a114bc43d63a7a836e13ac397be008a84010d4e32107b4fb2f1c405ef411
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/b2/2f/2c/109e446b990d663ea5ce9b078b5e7c1a9c45cca91f377080f8
Successfully built folium
Installing collected packages: branca, folium
Successfully

In [14]:
# Pittsburgh Latitude and Longitude
latitude = 40.440624
longitude = -79.995888

# create map of Pittsburgh using latitude and longitude values
map_pitt = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, zips, neighborhood in zip(burgh_data['Latitude'], burgh_data['Longitude'], burgh_data['ZIP Code'], burgh_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, zips)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_pitt)  
    
map_pitt

### Import Foursquare API Data and Process Results

In [15]:
# @hidden_cell
CLIENT_ID = '<YOUR_FOURSQUARE_CLIENT_ID>' # your Foursquare ID
CLIENT_SECRET = '<YOUR_FOURSQUARE_CLIENT_SECRET>' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [16]:
# define getNearbyVenues; the radius is relatively large because some ZIP Codes are far apart

def getNearbyVenues(names, latitudes, longitudes, radius=3500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
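
As an alternative to formatting the query string by hand, the request parameters can be collected in a dictionary and encoded by the standard library. This is only a sketch: `build_explore_params` and `EXPLORE_URL` are hypothetical names, and the ID and secret are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"  # same endpoint as in getNearbyVenues


def build_explore_params(lat, lng, client_id, client_secret,
                         version="20180605", radius=3500, limit=100):
    """Return the query parameters used by getNearbyVenues as a dict."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }


# Placeholders stand in for real credentials.
params = build_explore_params(40.593917, -80.22181, "MY_ID", "MY_SECRET")
print(EXPLORE_URL + "?" + urlencode(params))
```

With the `requests` library, the same dictionary could be passed as `requests.get(EXPLORE_URL, params=params, timeout=10)`, followed by `raise_for_status()` on the response, so that a failed call surfaces as an HTTP error rather than a `KeyError` on the JSON.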

In [17]:
# get Pittsburgh suburb venues

burgh_venues = getNearbyVenues(names=burgh_data['Neighborhood'],
                                 latitudes=burgh_data['Latitude'],
                                 longitudes=burgh_data['Longitude'])

Ambridge
Baden
Bakerstown
Brackenridge
Bradfordwoods
Bridgeville
Buena Vista
Cheswick
Clairton
Clinton
Creighton
Cuddy
Dravosburg
East Mckeesport
Elizabeth
Gibsonia
Glassport
Crescent
Harwick
Indianola
Leetsdale
McDonald
Monongahela
Morgan
Natrona Heights
New Kensington
Oakdale
Russellton
Sutersville
Tarentum
Trafford
Warrendale
West Newton
Wexford
Allison Park
Bethel Park
Braddock
Carnegie
Coraopolis
Duquesne
Glenshaw
Homestead
West Mifflin
Imperial
South Park
McKeesport
McKeesport
McKeesport
McKeesport
McKees Rocks
North Versailles
Oakmont
Pitcairn
Presto
Sewickley
Springdale
Turtle Creek
Monroeville
Verona
Wilmerding
Finleyville
Irwin
Mars
Valencia
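
With a 3500 m search radius, the circles around neighboring ZIP Code centroids can overlap, so the same venue may be counted for more than one neighborhood. The sketch below estimates the distance between two centroids from `burgh_data` with the haversine formula; `haversine_m` is a helper written for this illustration, not part of the notebook.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Centroids of Ambridge and Baden, copied from burgh_data above.
d = haversine_m(40.593917, -80.22181, 40.641066, -80.20599)
radius = 3500
print(round(d), "m apart; search circles overlap:", d < 2 * radius)
```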


In [18]:
# confirm venues shape and show sample of data

print(burgh_venues.shape)
burgh_venues.head()

(3496, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ambridge,40.593917,-80.22181,Pizza House,40.591848,-80.230139,Pizza Place
1,Ambridge,40.593917,-80.22181,FireHouse Lounge,40.595797,-80.230684,Food
2,Ambridge,40.593917,-80.22181,Bridgetown Taphouse,40.591351,-80.230126,Bar
3,Ambridge,40.593917,-80.22181,Talericos,40.604526,-80.226113,Dive Bar
4,Ambridge,40.593917,-80.22181,Franks's Pizzeria,40.588183,-80.224762,Pizza Place


In [19]:
#create neighborhood venue count
burgh_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Allison Park,56,56,56,56,56,56
Ambridge,51,51,51,51,51,51
Baden,26,26,26,26,26,26
Bakerstown,45,45,45,45,45,45
Bethel Park,100,100,100,100,100,100
...,...,...,...,...,...,...
Warrendale,100,100,100,100,100,100
West Mifflin,86,86,86,86,86,86
West Newton,14,14,14,14,14,14
Wexford,76,76,76,76,76,76


In [20]:
# one hot encoding
burgh_onehot = pd.get_dummies(burgh_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
burgh_onehot['Neighborhood'] = burgh_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [burgh_onehot.columns[-1]] + list(burgh_onehot.columns[:-1])
burgh_onehot = burgh_onehot[fixed_columns]

burgh_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Arcade,Arts & Crafts Store,...,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Weight Loss Center,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Ambridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Ambridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Ambridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Ambridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ambridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [21]:
# confirm onehot shape
burgh_onehot.shape

(3496, 248)
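
Averaging the one-hot columns per neighborhood gives each category's share of that neighborhood's venues. The toy example below, with made-up neighborhoods and venues, shows that this matches a row-normalized cross-tabulation.

```python
import numpy as np
import pandas as pd

# Toy venue list; the neighborhoods and categories are illustrative, not Foursquare results.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Pizza Place", "Pizza Place", "Bar", "Park", "Bar", "Chinese Restaurant"],
})

# One-hot encode, then average per neighborhood (the approach used in this notebook).
onehot = pd.get_dummies(venues["Venue Category"]).astype(float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
by_mean = onehot.groupby("Neighborhood").mean()

# The same table computed directly as row-normalized counts.
by_crosstab = pd.crosstab(venues["Neighborhood"], venues["Venue Category"], normalize="index")

print(by_mean)
print(np.allclose(by_mean.values, by_crosstab[by_mean.columns].values))
```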

In [22]:
# group neighborhoods

burgh_grouped = burgh_onehot.groupby('Neighborhood').mean().reset_index()
burgh_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Arcade,Arts & Crafts Store,...,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Weight Loss Center,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Allison Park,0.0,0.00,0.000000,0.000000,0.053571,0.0,0.0,0.017857,0.0,...,0.017857,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,0.0,0.00
1,Ambridge,0.0,0.00,0.000000,0.000000,0.078431,0.0,0.0,0.019608,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.019608,0.00,0.0,0.00
2,Baden,0.0,0.00,0.000000,0.000000,0.076923,0.0,0.0,0.000000,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.038462,0.00,0.0,0.00
3,Bakerstown,0.0,0.00,0.000000,0.000000,0.066667,0.0,0.0,0.000000,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.022222,0.00,0.0,0.00
4,Bethel Park,0.0,0.01,0.000000,0.000000,0.030000,0.0,0.0,0.000000,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.01,0.0,0.01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
56,Warrendale,0.0,0.00,0.000000,0.000000,0.060000,0.0,0.0,0.000000,0.0,...,0.010000,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,0.0,0.00
57,West Mifflin,0.0,0.00,0.023256,0.011628,0.023256,0.0,0.0,0.000000,0.0,...,0.011628,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,0.0,0.00
58,West Newton,0.0,0.00,0.000000,0.000000,0.071429,0.0,0.0,0.000000,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,0.0,0.00
59,Wexford,0.0,0.00,0.000000,0.000000,0.013158,0.0,0.0,0.000000,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,0.0,0.00


In [23]:
# confirm shape of grouped data
burgh_grouped.shape

(61, 248)
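
The grouped frame has 61 rows while `burgh_data` has 64. The fetch log above lists McKeesport four times, and grouping by name merges those ZIP Codes into one row. If per-ZIP rows are wanted, one option is to group on a label that stays unique. The sketch below uses toy data in which the two McKeesport ZIP values are placeholders and the `Area` column is a hypothetical name.

```python
import pandas as pd

# Toy frame: one name shared by several ZIP Codes, as with McKeesport in burgh_data.
# The ZIP values 11111 and 22222 are placeholders for illustration only.
toy = pd.DataFrame({
    "ZIP Code": [15003, 11111, 22222],
    "Neighborhood": ["Ambridge", "McKeesport", "McKeesport"],
})

# Count how many ZIP Codes share each name.
name_counts = toy["Neighborhood"].value_counts()
duplicated_names = name_counts[name_counts > 1].index.tolist()

# One option: build a label that stays unique per ZIP Code and group on it instead.
toy["Area"] = toy["Neighborhood"] + " (" + toy["ZIP Code"].astype(str) + ")"
print(duplicated_names)
print(toy["Area"].tolist())
```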

In [24]:
# show top 10 venues in each neighborhood for examination

num_top_venues = 10

for hood in burgh_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = burgh_grouped[burgh_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allison Park----
                 venue  freq
0       Baseball Field  0.07
1  American Restaurant  0.05
2          Pizza Place  0.05
3                 Pool  0.04
4       Sandwich Place  0.04
5                  Bar  0.04
6                 Bank  0.04
7          Coffee Shop  0.04
8        Grocery Store  0.04
9      Automotive Shop  0.04


----Ambridge----
                  venue  freq
0        Ice Cream Shop  0.10
1           Pizza Place  0.08
2   American Restaurant  0.08
3        Sandwich Place  0.04
4                  Bank  0.04
5    Italian Restaurant  0.04
6                  Park  0.04
7  Fast Food Restaurant  0.04
8        Discount Store  0.04
9              Pharmacy  0.04


----Baden----
                  venue  freq
0        Discount Store  0.12
1  Fast Food Restaurant  0.12
2           Pizza Place  0.08
3   American Restaurant  0.08
4        Ice Cream Shop  0.04
5                  Bank  0.04
6              Pharmacy  0.04
7            Restaurant  0.04
8        Sandwich Place  

                 venue  freq
0       Sandwich Place  0.09
1       Discount Store  0.09
2        Bowling Alley  0.06
3         Liquor Store  0.06
4                 Bank  0.06
5          Gas Station  0.06
6             Pharmacy  0.06
7          Coffee Shop  0.06
8          Pizza Place  0.06
9  American Restaurant  0.06


----Mars----
                 venue  freq
0  American Restaurant  0.13
1             Pharmacy  0.06
2        Grocery Store  0.06
3          Coffee Shop  0.06
4   Salon / Barbershop  0.06
5          Pizza Place  0.03
6         Soccer Field  0.03
7   Chinese Restaurant  0.03
8    Convenience Store  0.03
9       Cosmetics Shop  0.03


----McDonald----
              venue  freq
0       Pizza Place  0.13
1    Discount Store  0.13
2       Supermarket  0.07
3              Bank  0.07
4       Flower Shop  0.07
5             Trail  0.07
6    Sandwich Place  0.07
7       Gas Station  0.07
8        Restaurant  0.07
9  Business Service  0.07


----McKees Rocks----
                 ve

In [25]:
# define return_most_common_venues function

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [26]:
# run function on Pittsburgh suburb data

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = burgh_grouped['Neighborhood']

for ind in np.arange(burgh_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(burgh_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allison Park,Baseball Field,American Restaurant,Pizza Place,Pharmacy,Coffee Shop,Automotive Shop,Pool,Bar,Trail,Golf Course
1,Ambridge,Ice Cream Shop,Pizza Place,American Restaurant,Italian Restaurant,Park,Pharmacy,Bank,Fast Food Restaurant,Sandwich Place,Discount Store
2,Baden,Discount Store,Fast Food Restaurant,American Restaurant,Pizza Place,Italian Restaurant,Bank,Park,Bowling Alley,Chinese Restaurant,Supermarket
3,Bakerstown,American Restaurant,Sandwich Place,Salon / Barbershop,Fast Food Restaurant,Gas Station,Bank,Big Box Store,Gym / Fitness Center,Coffee Shop,Playground
4,Bethel Park,Clothing Store,Coffee Shop,Grocery Store,Ice Cream Shop,Sporting Goods Shop,American Restaurant,Cosmetics Shop,Trail,Lingerie Store,Bookstore
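
The `indicators` list with a `try`/`except` fallback labels the ten columns correctly, but it would produce labels such as "21th" if `num_top_venues` were raised above 20. A small helper, sketched here under the hypothetical name `ordinal`, handles the general case.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


# Build the same column names used for neighborhoods_venues_sorted.
columns = ["Neighborhood"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns[1], "...", columns[-1])
```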


In [27]:
### start Neighborhood clustering

# set number of clusters
kclusters = 12

burgh_grouped_clustering = burgh_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(burgh_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 5, 5, 8, 8, 5, 3, 0, 8, 2], dtype=int32)
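
The value `kclusters = 12` is fixed by hand, and several clusters in the Analyze Results section appear to contain a single neighborhood. One common way to compare candidate values of k is the silhouette score. The sketch below runs on synthetic data generated with `make_blobs`; in the notebook, `X` would be `burgh_grouped_clustering`, and the range of k values is an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (a stand-in for burgh_grouped_clustering).
X, _ = make_blobs(n_samples=150, centers=[(-5, -5), (0, 5), (5, -5)],
                  cluster_std=0.5, random_state=0)

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher score indicates tighter, better separated clusters.
best_k = max(scores, key=scores.get)
print(best_k, {k: round(float(s), 3) for k, s in scores.items()})
```

A silhouette score is only one heuristic; on sparse frequency data like the venue table it may not show a clear peak.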

In [28]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

burgh_merged = burgh_data

# merge data to add latitude/longitude for each neighborhood
burgh_merged = burgh_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

burgh_merged.head()

Unnamed: 0,ZIP Code,Neighborhood,Population,ZIP Type,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,15003,Ambridge,11861,Non-Unique,40.593917,-80.22181,5,Ice Cream Shop,Pizza Place,American Restaurant,Italian Restaurant,Park,Pharmacy,Bank,Fast Food Restaurant,Sandwich Place,Discount Store
1,15005,Baden,9450,Non-Unique,40.641066,-80.20599,5,Discount Store,Fast Food Restaurant,American Restaurant,Pizza Place,Italian Restaurant,Bank,Park,Bowling Alley,Chinese Restaurant,Supermarket
2,15007,Bakerstown,323,Non-Unique,40.652311,-79.93303,8,American Restaurant,Sandwich Place,Salon / Barbershop,Fast Food Restaurant,Gas Station,Bank,Big Box Store,Gym / Fitness Center,Coffee Shop,Playground
3,15014,Brackenridge,3184,Non-Unique,40.608403,-79.74234,5,Pizza Place,American Restaurant,Bar,Fast Food Restaurant,Pharmacy,Grocery Store,Discount Store,Sandwich Place,Coffee Shop,Liquor Store
4,15015,Bradfordwoods,1175,Non-Unique,40.635147,-80.08369,0,Italian Restaurant,Coffee Shop,Bakery,Grocery Store,Pizza Place,Salon / Barbershop,Gym / Fitness Center,Convenience Store,Deli / Bodega,Pharmacy


In [29]:
### create map of clustered data

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(burgh_merged['Latitude'], burgh_merged['Longitude'], burgh_merged['Neighborhood'], burgh_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
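
Indexing with `rainbow[cluster-1]` shifts every label by one, so label 0 takes the last color through negative indexing. The colors remain distinct, but a direct label-to-color mapping is easier to read. The sketch below builds such a palette with the standard library only; `cluster_palette` is a hypothetical helper, not part of folium or matplotlib.

```python
import colorsys


def cluster_palette(k):
    """Return k evenly spaced hues as hex strings, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.9, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(12)
# Index directly by label, so that label 0 takes the first color.
print(palette[0], palette[11])
```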

### Analyze Results

#### Cluster 1 (label 0)

In [30]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 0, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Bradfordwoods,-80.08369,0,Italian Restaurant,Coffee Shop,Bakery,Grocery Store,Pizza Place,Salon / Barbershop,Gym / Fitness Center,Convenience Store,Deli / Bodega,Pharmacy
14,Elizabeth,-79.85946,0,Café,Pizza Place,Golf Course,American Restaurant,Discount Store,Dessert Shop,Restaurant,Convenience Store,Soccer Field,Chinese Restaurant
29,Tarentum,-79.78095,0,Pizza Place,Bar,Sandwich Place,Disc Golf,American Restaurant,Park,Fishing Spot,Dive Bar,Video Store,Hotel
33,Wexford,-80.05916,0,Bakery,Italian Restaurant,Grocery Store,Salon / Barbershop,Pizza Place,Ice Cream Shop,Coffee Shop,Gas Station,Auto Dealership,Deli / Bodega
34,Allison Park,-79.96033,0,Baseball Field,American Restaurant,Pizza Place,Pharmacy,Coffee Shop,Automotive Shop,Pool,Bar,Trail,Golf Course
38,Coraopolis,-80.18464,0,Hotel,Italian Restaurant,Golf Course,American Restaurant,Gym,Bank,Fast Food Restaurant,Coffee Shop,Trail,Wings Joint
41,Homestead,-79.90635,0,Bar,Bank,Bakery,Theme Park Ride / Attraction,Sandwich Place,Pizza Place,Brewery,Discount Store,Pharmacy,Liquor Store
44,South Park,-80.00756,0,Park,Bar,American Restaurant,Trail,Ice Cream Shop,Sandwich Place,Pool,Playground,Event Space,Pizza Place
51,Oakmont,-79.83762,0,Pizza Place,Gas Station,Bar,American Restaurant,Sandwich Place,Italian Restaurant,Chinese Restaurant,Grocery Store,Bank,Bakery
54,Sewickley,-80.15554,0,Café,Restaurant,Italian Restaurant,Sushi Restaurant,Coffee Shop,Bank,Park,Comic Shop,Cosmetics Shop,Pizza Place
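
The per-cluster cells in this section repeat the same selection with a different label. A single loop over the groups gives the same overview. The sketch below uses a toy stand-in for `burgh_merged`, with rows copied from the tables in this section; `merged_toy` and `summary` are illustrative names.

```python
import pandas as pd

# Toy stand-in for burgh_merged with only a few columns.
merged_toy = pd.DataFrame({
    "Neighborhood": ["Bradfordwoods", "Clinton", "Elizabeth", "Cuddy"],
    "Cluster Labels": [0, 1, 0, 4],
    "1st Most Common Venue": ["Italian Restaurant", "Trail", "Café", "Rest Area"],
})

# One pass over the groups replaces a separate cell per label.
summary = {}
for label, group in merged_toy.groupby("Cluster Labels"):
    summary[int(label)] = group["Neighborhood"].tolist()
    print("Cluster label", label, "->", summary[int(label)])
```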


#### Cluster 2 (label 1)

In [31]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 1, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Clinton,-80.35953,1,Trail,ATM,Grocery Store,Dive Bar,Restaurant,Farm,Farmers Market,Pizza Place,Ice Cream Shop,Gym


#### Cluster 3 (label 2)

In [32]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 2, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Buena Vista,-79.79325,2,Restaurant,Pharmacy,Ice Cream Shop,Bike Trail,Other Repair Shop,Golf Course,Trail,Food & Drink Shop,Post Office,American Restaurant
28,Sutersville,-79.80038,2,Restaurant,Ice Cream Shop,Pharmacy,American Restaurant,Soccer Field,Trail,Financial or Legal Service,Farm,Farmers Market,Fast Food Restaurant


#### Cluster 4 (label 3)

In [33]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 3, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,Braddock,-79.86439,3,Theme Park Ride / Attraction,Discount Store,Bank,Pizza Place,Sandwich Place,Pharmacy,Fast Food Restaurant,Gas Station,Grocery Store,Italian Restaurant
39,Duquesne,-79.85095,3,Theme Park Ride / Attraction,Discount Store,Pizza Place,Bank,Grocery Store,Gas Station,Pharmacy,Sandwich Place,Gym,Fast Food Restaurant
42,West Mifflin,-79.89418,3,Theme Park Ride / Attraction,Discount Store,Pizza Place,Sandwich Place,Bar,Bank,Pharmacy,Diner,Gas Station,Convenience Store


#### Cluster 5 (label 4)

In [34]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 4, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Cuddy,-80.1656,4,Rest Area,Pub,Korean Restaurant,Golf Course,Paintball Field,Baseball Field,Deli / Bodega,Pizza Place,Breakfast Spot,Fishing Spot


#### Cluster 6 (label 5)

In [35]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 5, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ambridge,-80.22181,5,Ice Cream Shop,Pizza Place,American Restaurant,Italian Restaurant,Park,Pharmacy,Bank,Fast Food Restaurant,Sandwich Place,Discount Store
1,Baden,-80.20599,5,Discount Store,Fast Food Restaurant,American Restaurant,Pizza Place,Italian Restaurant,Bank,Park,Bowling Alley,Chinese Restaurant,Supermarket
3,Brackenridge,-79.74234,5,Pizza Place,American Restaurant,Bar,Fast Food Restaurant,Pharmacy,Grocery Store,Discount Store,Sandwich Place,Coffee Shop,Liquor Store
8,Clairton,-79.90179,5,Discount Store,Pharmacy,Post Office,Bar,Pizza Place,Convenience Store,American Restaurant,Café,Dessert Shop,Sandwich Place
12,Dravosburg,-79.89008,5,Discount Store,Bank,American Restaurant,Pharmacy,Convenience Store,Pizza Place,Coffee Shop,Airport,Sandwich Place,Bar
13,East Mckeesport,-79.80792,5,Discount Store,Pizza Place,Grocery Store,Sandwich Place,Gym,Bank,Ice Cream Shop,Home Service,Fast Food Restaurant,Restaurant
16,Glassport,-79.88869,5,Discount Store,Pizza Place,Pharmacy,American Restaurant,Sandwich Place,Bar,Trail,Fire Station,Gas Station,Baseball Field
17,Crescent,-80.2261,5,Discount Store,Sandwich Place,American Restaurant,Coffee Shop,Pizza Place,Bank,Pharmacy,Fast Food Restaurant,Gas Station,Bowling Alley
20,Leetsdale,-80.20977,5,Sandwich Place,Discount Store,Gas Station,Bowling Alley,Coffee Shop,Liquor Store,American Restaurant,Pizza Place,Pharmacy,Bank
22,Monongahela,-79.92642,5,Pharmacy,Convenience Store,Discount Store,Ice Cream Shop,Miscellaneous Shop,Supermarket,Sandwich Place,Garden Center,Gas Station,Italian Restaurant


#### Cluster 7 (label 6)

In [36]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 6, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,West Newton,-79.75483,6,Trail,Discount Store,Pizza Place,Sporting Goods Shop,Pharmacy,Bank,American Restaurant,Gym / Fitness Center,Post Office,Food


#### Cluster 8 (label 7)

In [37]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 7, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
60,Finleyville,-79.99436,7,Ice Cream Shop,Golf Course,Breakfast Spot,Discount Store,Pool,Water Park,Bar,Sandwich Place,Construction & Landscaping,Beach Bar


#### Cluster 9 (label 8)

In [38]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 8, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Bakerstown,-79.93303,8,American Restaurant,Sandwich Place,Salon / Barbershop,Fast Food Restaurant,Gas Station,Bank,Big Box Store,Gym / Fitness Center,Coffee Shop,Playground
5,Bridgeville,-80.11534,8,Park,Gym,Sandwich Place,American Restaurant,Italian Restaurant,Ice Cream Shop,Mexican Restaurant,Gas Station,Rest Area,Coffee Shop
7,Cheswick,-79.83242,8,Sandwich Place,Clothing Store,Sporting Goods Shop,Shoe Store,Fast Food Restaurant,Hotel,Ice Cream Shop,Italian Restaurant,Pizza Place,Burger Joint
10,Creighton,-79.77947,8,Sandwich Place,Clothing Store,Bar,Pizza Place,Sporting Goods Shop,Coffee Shop,American Restaurant,Shoe Store,Ice Cream Shop,Discount Store
15,Gibsonia,-79.95766,8,Salon / Barbershop,Sandwich Place,American Restaurant,Pizza Place,Gas Station,Mexican Restaurant,Pharmacy,Ice Cream Shop,Mobile Phone Shop,Bank
18,Harwick,-79.8064,8,Sandwich Place,Pizza Place,Clothing Store,Bar,Fast Food Restaurant,American Restaurant,Sporting Goods Shop,Discount Store,Hotel,Ice Cream Shop
19,Indianola,-79.85848,8,Bank,Ice Cream Shop,Business Service,Fast Food Restaurant,Betting Shop,Clothing Store,Sandwich Place,Italian Restaurant,Restaurant,Electronics Store
23,Morgan,-80.14529,8,Ice Cream Shop,Park,American Restaurant,Pub,Rest Area,Coffee Shop,Pizza Place,Burger Joint,Smoothie Shop,Mediterranean Restaurant
26,Oakdale,-80.18692,8,Park,Sandwich Place,Trail,Mexican Restaurant,Pizza Place,Playground,Grocery Store,American Restaurant,Burger Joint,Church
31,Warrendale,-80.09304,8,American Restaurant,Coffee Shop,Sandwich Place,Pizza Place,Grocery Store,Hotel,Sporting Goods Shop,Supplement Shop,Italian Restaurant,Mexican Restaurant


#### Cluster 10

In [39]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 9, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,McDonald,-80.23629,9,Pizza Place,Discount Store,Sandwich Place,Business Service,Bank,Flower Shop,Golf Course,Restaurant,Trail,Supermarket


#### Cluster 11

In [40]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 10, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,Valencia,-79.93733,10,Bank,Gas Station,Baseball Field,Jewelry Store,Soccer Field,Music Store,Farm,Farmers Market,Toy / Game Store,Shoe Store


#### Cluster 12

In [41]:
burgh_merged.loc[burgh_merged['Cluster Labels'] == 11, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,Russellton,-79.83394,11,Airport,Deli / Bodega,Gas Station,Disc Golf,Discount Store,Pub,Park,Chinese Restaurant,Food,Pizza Place
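
The per-cluster cells above all repeat the same selection with a different label. A minimal sketch of how that could be wrapped in a helper is shown here. The function name `show_cluster`, the `first_venue_col` parameter, and the toy frame are all hypothetical. The toy column order is only an assumption chosen so that position 1 is `Neighborhood` and position 5 is `Longitude`, as the outputs above suggest.

```python
import pandas as pd

def show_cluster(df, label, first_venue_col=5):
    """Return column 1 plus every column from first_venue_col onward
    for the rows whose 'Cluster Labels' value equals label."""
    cols = df.columns[[1] + list(range(first_venue_col, df.shape[1]))]
    return df.loc[df['Cluster Labels'] == label, cols]

# Toy stand-in for burgh_merged (made-up values, assumed column order)
toy = pd.DataFrame({
    'ZIP Code': [1, 2, 3],
    'Neighborhood': ['A', 'B', 'C'],
    'Population': [10, 20, 30],
    'ZIP Type': ['x', 'x', 'x'],
    'Latitude': [40.0, 40.1, 40.2],
    'Longitude': [-80.0, -80.1, -80.2],
    'Cluster Labels': [0, 1, 0],
    '1st Most Common Venue': ['Park', 'Bank', 'Pub'],
})

cluster_0 = show_cluster(toy, 0)
print(cluster_0)
```

With a helper like this, each cluster cell would reduce to a call such as `show_cluster(burgh_merged, 8)`, assuming `burgh_merged` has the column layout described above.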


In [42]:
### Cluster 9 (label = 8) appears to be the best mix of restaurants
best = burgh_merged.loc[burgh_merged['Cluster Labels'] == 8, burgh_merged.columns[[1] + list(range(5, burgh_merged.shape[1]))]]
print('The size before dropping Chinese restaurant competition: ', best.shape)

# drop any suburb where a Chinese restaurant already ranks among its 10 most common venue types
venue_cols = [c for c in best.columns if c.endswith('Most Common Venue')]
for col in venue_cols:
    best = best[~best[col].str.contains("Chinese")].reset_index(drop=True)

print('The size after dropping Chinese restaurant competition: ', best.shape)

best

The size before dropping Chinese restaurant competition:  (15, 13)
The size after dropping Chinese restaurant competition:  (13, 13)


Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bakerstown,-79.93303,8,American Restaurant,Sandwich Place,Salon / Barbershop,Fast Food Restaurant,Gas Station,Bank,Big Box Store,Gym / Fitness Center,Coffee Shop,Playground
1,Bridgeville,-80.11534,8,Park,Gym,Sandwich Place,American Restaurant,Italian Restaurant,Ice Cream Shop,Mexican Restaurant,Gas Station,Rest Area,Coffee Shop
2,Cheswick,-79.83242,8,Sandwich Place,Clothing Store,Sporting Goods Shop,Shoe Store,Fast Food Restaurant,Hotel,Ice Cream Shop,Italian Restaurant,Pizza Place,Burger Joint
3,Creighton,-79.77947,8,Sandwich Place,Clothing Store,Bar,Pizza Place,Sporting Goods Shop,Coffee Shop,American Restaurant,Shoe Store,Ice Cream Shop,Discount Store
4,Gibsonia,-79.95766,8,Salon / Barbershop,Sandwich Place,American Restaurant,Pizza Place,Gas Station,Mexican Restaurant,Pharmacy,Ice Cream Shop,Mobile Phone Shop,Bank
5,Harwick,-79.8064,8,Sandwich Place,Pizza Place,Clothing Store,Bar,Fast Food Restaurant,American Restaurant,Sporting Goods Shop,Discount Store,Hotel,Ice Cream Shop
6,Indianola,-79.85848,8,Bank,Ice Cream Shop,Business Service,Fast Food Restaurant,Betting Shop,Clothing Store,Sandwich Place,Italian Restaurant,Restaurant,Electronics Store
7,Morgan,-80.14529,8,Ice Cream Shop,Park,American Restaurant,Pub,Rest Area,Coffee Shop,Pizza Place,Burger Joint,Smoothie Shop,Mediterranean Restaurant
8,Oakdale,-80.18692,8,Park,Sandwich Place,Trail,Mexican Restaurant,Pizza Place,Playground,Grocery Store,American Restaurant,Burger Joint,Church
9,Warrendale,-80.09304,8,American Restaurant,Coffee Shop,Sandwich Place,Pizza Place,Grocery Store,Hotel,Sporting Goods Shop,Supplement Shop,Italian Restaurant,Mexican Restaurant
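
The before and after shapes show how many suburbs were removed, but not which ones. As a sketch only, a single boolean mask over all venue columns could both apply the filter and report the dropped names. The toy frame and the names `is_chinese`, `dropped`, and `kept` are hypothetical, and `na=False` is an added assumption so that missing venue values count as "not Chinese".

```python
import pandas as pd

# Toy stand-in for the cluster table (made-up values)
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    '1st Most Common Venue': ['Park', 'Chinese Restaurant', 'Bank'],
    '2nd Most Common Venue': ['Pub', 'Gym', 'Chinese Restaurant'],
})

venue_cols = [c for c in toy.columns if c.endswith('Most Common Venue')]

# True wherever a venue cell mentions "Chinese"
is_chinese = toy[venue_cols].apply(lambda col: col.str.contains('Chinese', na=False))
has_competition = is_chinese.any(axis=1)

dropped = toy.loc[has_competition, 'Neighborhood'].tolist()
kept = toy.loc[~has_competition].reset_index(drop=True)

print('Dropped:', dropped)
print(kept)
```

Listing the dropped names makes it easier to check the filter by eye against the cluster table.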


In [43]:
# start the destination list from the neighborhoods in that cluster (populations are added in the next cell)
dest = best[['Neighborhood']].copy()
dest

Unnamed: 0,Neighborhood
0,Bakerstown
1,Bridgeville
2,Cheswick
3,Creighton
4,Gibsonia
5,Harwick
6,Indianola
7,Morgan
8,Oakdale
9,Warrendale


In [44]:
### rank on population

# drop McKeesport from burgh_merged because its name is not unique there,
# so that the Neighborhood lookup built below has unique labels
burgh_merged_nomkp = burgh_merged[~burgh_merged['Neighborhood'].str.contains("McKeesport")].reset_index(drop=True)

# adds population to destination df
dest['Population'] = dest['Neighborhood'].map(burgh_merged_nomkp.set_index('Neighborhood')['Population'])

# re-rank
dest.sort_values(by='Population', ascending=False, inplace=True)
dest

# list of best suburbs to move restaurant to is ready to go!

Unnamed: 0,Neighborhood,Population
10,Bethel Park,29529
4,Gibsonia,27049
1,Bridgeville,16213
8,Oakdale,9956
2,Cheswick,9029
11,Pitcairn,3294
12,Presto,1163
3,Creighton,1128
5,Harwick,895
6,Indianola,461
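
The cell above removes the one duplicated neighborhood by name before building the population lookup. A more general sketch, shown here on made-up data, would deduplicate on the `Neighborhood` column instead. Note the assumption: `drop_duplicates` keeps the first row for each name, which is an arbitrary choice and differs from the notebook's approach of dropping the duplicated name entirely. The frames `merged` and `dest_toy` and the name `pop_lookup` are hypothetical.

```python
import pandas as pd

# Toy stand-ins (made-up values): 'B' appears twice, like a non-unique name
merged = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'B', 'C'],
    'Population': [100, 50, 70, 300],
})
dest_toy = pd.DataFrame({'Neighborhood': ['C', 'A']})

# Keep the first row per name so the lookup index is unique
pop_lookup = (
    merged.drop_duplicates(subset='Neighborhood')
          .set_index('Neighborhood')['Population']
)

# Attach populations and rank from largest to smallest
dest_toy['Population'] = dest_toy['Neighborhood'].map(pop_lookup)
ranked = dest_toy.sort_values(by='Population', ascending=False).reset_index(drop=True)
print(ranked)
```

If a duplicated name covers several ZIP Codes, summing populations per name with `groupby` might be a better fit than keeping the first row; which is appropriate depends on how the source data defines each entry.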


In [45]:
# reset index to create final list

dest.reset_index(drop=True, inplace=True)
dest

Unnamed: 0,Neighborhood,Population
0,Bethel Park,29529
1,Gibsonia,27049
2,Bridgeville,16213
3,Oakdale,9956
4,Cheswick,9029
5,Pitcairn,3294
6,Presto,1163
7,Creighton,1128
8,Harwick,895
9,Indianola,461
