<h1><center><em>Applied Data Science Capstone Project</em></center></h1>

## 1. Introduction

This analysis aims to identify similar neighborhoods across major North American cities (Toronto, Los Angeles, Miami, San Francisco, and New York) to support the expansion of business franchises. The target audience and stakeholders are franchise owners, real estate investment firms, and banks.

For example, if a franchise (a pet grooming boutique, vegan bakery, vintage comic book store, or fast food restaurant) is successful in one NYC neighborhood and wants to expand to another city, it may be more likely to succeed in a neighborhood whose characteristics resemble those of the original NYC neighborhood.

The franchise is interested in finding the most promising location for its expanding business. The real estate investment firm is interested in buying up lucrative real estate. The bank is interested in understanding the risk/reward of making loan agreements.

## 2. Data

This analysis relies heavily on Foursquare location data and on neighborhood tables from Wikipedia and other public web pages. The tables are read with pandas or scraped and parsed with the Python package BeautifulSoup, then converted into pandas dataframes. K-means clustering is applied to the venue data so that the neighborhoods across the chosen cities are segmented into groups of similar neighborhoods.

### Import Libraries

In [None]:
import requests # library to handle requests
from bs4 import BeautifulSoup
import pandas as pd

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import json # library to handle JSON files

# import k-means from clustering stage
from sklearn.cluster import KMeans

import sys
!{sys.executable} -m pip install folium
import folium # map rendering library

!{sys.executable} -m pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import numpy as np


#### Neighborhood Lat Long Data

In [3]:
# Toronto
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', na_values=["Not assigned"])
df_toronto = pd.DataFrame(df[0])
df_toronto.dropna(axis=0, how='any', subset=['Borough'], inplace=True)
coordinates = pd.read_csv('http://cocl.us/Geospatial_data')
toronto_data = pd.merge(df_toronto, coordinates, on='Postal Code')
toronto_data.drop(['Borough', 'Postal Code'], axis=1, inplace=True)
toronto_data.rename(columns={'Neighbourhood': 'Neighborhood'}, inplace=True)

In [4]:
# Los Angeles
df = pd.read_html('http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_LA_Neighborhoods_Data', na_values=["Not assigned"])
df_los_angeles = pd.DataFrame(df[2])
df_los_angeles.drop(['Income', 'Schools', 'Diversity', 'Age', 'Homes', 'Vets', 'Asian', 'Black', 'Latino', 'White', 'Population', 'Area'], axis=1, inplace=True)
df_los_angeles = df_los_angeles[['LA_Nbhd','Latitude','Longitude']]
df_los_angeles.rename(columns={'LA_Nbhd': 'Neighborhood'}, inplace=True)

In [5]:
# Miami
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Miami')
df_miami = pd.DataFrame(df[0])
df_miami.drop(['Demonym','Population2010','Population/Km²','Sub-neighborhoods'], axis=1, inplace=True)
df_miami[['Latitude','Longitude']] = df_miami.Coordinates.str.split("-",expand=True) 
df_miami['Longitude'] = '-' + df_miami['Longitude'].astype(str)
df_miami.drop(['Coordinates'], axis=1, inplace=True)
df_miami = df_miami.dropna()

In [None]:
# San Francisco
response = requests.get("http://www.healthysf.org/bdi/outcomes/zipmap.htm")
soup = BeautifulSoup(response.text, "lxml")
table = soup.find_all("table")
df = pd.read_html(str(table))
df = pd.DataFrame(df[4])
df.columns = df.iloc[0]
df = df.iloc[1:-1, :-1]
sf_data = df.copy() # work on a copy so that adding columns below does not modify a slice of df
!{sys.executable} -m pip install uszipcode
from uszipcode import SearchEngine
search = SearchEngine(simple_zipcode=True)

latitude = []
longitude = []

for index, row in df.iterrows():
    zipcode = search.by_zipcode(row["Zip Code"]).to_dict()
    latitude.append(zipcode.get("lat"))
    longitude.append(zipcode.get("lng"))

sf_data["Latitude"] = latitude
sf_data["Longitude"] = longitude
sf_data.drop(['Zip Code'], axis=1, inplace=True)
sf_data = sf_data.dropna()

In [None]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

In [8]:
import json # library to handle JSON files
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [9]:
neighborhoods_data = newyork_data['features']

# define the dataframe columns
column_names = ['Borough','Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
nyc_data = pd.DataFrame(columns=column_names)


In [10]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    # collect the rows in a list; DataFrame.append was removed in pandas 2.0
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe once from all collected rows
nyc_data = pd.DataFrame(rows, columns=column_names)


In [11]:
manhattan_data = nyc_data[nyc_data['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data = manhattan_data.dropna()

In [12]:
# all_neighborhoods
neighbs = [toronto_data, df_los_angeles, df_miami, sf_data, manhattan_data]
all_neighbs = pd.concat(neighbs)
all_neighbs[['Neighborhood','Latitude','Longitude']] 
all_neighbs.drop(['Borough'], axis=1, inplace=True)
all_neighbs.drop_duplicates()
all_neighbs = all_neighbs.dropna()
all_neighbs

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Parkwoods,43.7533,-79.3297
1,Victoria Village,43.7259,-79.3156
2,"Regent Park, Harbourfront",43.6543,-79.3606
3,"Lawrence Manor, Lawrence Heights",43.7185,-79.4648
4,"Queen's Park, Ontario Provincial Government",43.6623,-79.3895
...,...,...,...
35,Turtle Bay,40.752,-73.9677
36,Tudor City,40.7469,-73.9712
37,Stuyvesant Town,40.731,-73.9741
38,Flatiron,40.7397,-73.9909


#### Foursquare Data

In [13]:
import os

# Read the Foursquare credentials from the environment instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are placeholders.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20180604'
LIMIT = 100
radius = 500

from pandas import json_normalize # transform JSON file into a pandas dataframe


In [14]:
# function to get venues from each neighborhood in each city
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
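
`getNearbyVenues` indexes straight into the JSON response, so a single failed or empty response raises a `KeyError` or `IndexError` and aborts the whole loop. The sketch below shows one possible, more defensive parsing step. The helper name `parse_explore_items` and the sample payloads are hypothetical; the key layout (`response`, `groups`, `items`, `venue`) is simply the one that `getNearbyVenues` already relies on.

```python
def parse_explore_items(payload, name, lat, lng):
    """Turn one explore-style JSON payload into a list of venue rows.

    Hypothetical helper: returns an empty list instead of raising when the
    payload has no groups, and skips venues that have no category.
    """
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []

    rows = []
    for item in groups[0].get("items", []):
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        location = venue.get("location", {})
        if not categories:
            continue  # indexing categories[0] would raise IndexError here
        rows.append((
            name,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0]["name"],
        ))
    return rows


# made-up payload: one complete venue and one venue without a category
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Brookbanks Park",
                           "location": {"lat": 43.751976, "lng": -79.33214},
                           "categories": [{"name": "Park"}]}},
                {"venue": {"name": "Unnamed Spot",
                           "location": {"lat": 43.75, "lng": -79.33},
                           "categories": []}},
            ]
        }]
    }
}

sample_rows = parse_explore_items(sample_payload, "Parkwoods", 43.7533, -79.3297)
# a made-up payload without a usable "response" section
empty_rows = parse_explore_items({"meta": {"code": 429}}, "Parkwoods", 43.7533, -79.3297)
print(sample_rows)
print(empty_rows)
```

The rows keep the same seven-field order as the tuples built in `getNearbyVenues`, so they could be appended to `venues_list` unchanged.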

In [None]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

In [None]:
print(toronto_venues.shape)
toronto_venues.head()

In [None]:
los_angeles_venues = getNearbyVenues(names=df_los_angeles['Neighborhood'],
                                   latitudes=df_los_angeles['Latitude'],
                                   longitudes=df_los_angeles['Longitude']
                                  )


In [None]:
miami_venues = getNearbyVenues(names=df_miami['Neighborhood'],
                                   latitudes=df_miami['Latitude'],
                                   longitudes=df_miami['Longitude']
                                  )

In [None]:
san_fran_venues = getNearbyVenues(names=sf_data['Neighborhood'],
                                   latitudes=sf_data['Latitude'],
                                   longitudes=sf_data['Longitude']
                                  )

In [None]:
all_neighbs_venues = getNearbyVenues(names=all_neighbs['Neighborhood'],
                                   latitudes=all_neighbs['Latitude'],
                                   longitudes=all_neighbs['Longitude']
                                  )

#### We now have a dataframe of venues for each city and one dataframe of venues across all of the chosen cities, which is used to analyze the neighborhoods together.

## 3. Methodology

### Exploratory Data Analysis

In [16]:
print(all_neighbs_venues.shape)
print(all_neighbs_venues.columns)
all_neighbs_venues.head()

(8700, 7)
Index(['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
       'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'],
      dtype='object')


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.7533,-79.3297,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.7533,-79.3297,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.7259,-79.3156,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.7259,-79.3156,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.7259,-79.3156,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [17]:
all_neighbs_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adams_Normandie,9,9,9,9,9,9
Agincourt,5,5,5,5,5,5
"Alderwood, Long Branch",7,7,7,7,7,7
Allapattah,2,2,2,2,2,2
Arleta,5,5,5,5,5,5
...,...,...,...,...,...,...
Woodland_Hills,54,54,54,54,54,54
Wynwood,76,76,76,76,76,76
York Mills West,2,2,2,2,2,2
"York Mills, Silver Hills",1,1,1,1,1,1


In [18]:
print('There are {} unique categories.'.format(len(all_neighbs_venues['Venue Category'].unique())))


There are 446 unique categories.


### Analyze The Neighborhoods

In [59]:
# one hot encoding
all_neighbs_venues_onehot = pd.get_dummies(all_neighbs_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
all_neighbs_venues_onehot['Neighborhood'] = all_neighbs_venues['Neighborhood'] 

# move neighborhood column to the first column
# get a list of columns
cols = list(all_neighbs_venues_onehot)

# move the column to head of list using index, pop and insert
cols.insert(0, cols.pop(cols.index('Neighborhood')))

# use .loc to reorder the columns
all_neighbs_venues_onehot = all_neighbs_venues_onehot.loc[:, cols]

all_neighbs_venues_onehot

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,...,Watch Shop,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8695,Hudson Yards,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8696,Hudson Yards,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8697,Hudson Yards,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8698,Hudson Yards,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [60]:
all_neighb_venues_grouped = all_neighbs_venues_onehot.groupby('Neighborhood').mean().reset_index()
all_neighb_venues_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,...,Watch Shop,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,Adams_Normandie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00,0.00,0.000000,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00,0.00,0.000000,0.0,0.0,0.0
2,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00,0.00,0.000000,0.0,0.0,0.0
3,Allapattah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00,0.00,0.000000,0.0,0.0,0.0
4,Arleta,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00,0.00,0.000000,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
281,Woodland_Hills,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00,0.00,0.000000,0.0,0.0,0.0
282,Wynwood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00,0.00,0.013158,0.0,0.0,0.0
283,York Mills West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00,0.00,0.000000,0.0,0.0,0.0
284,"York Mills, Silver Hills",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00,0.00,0.000000,0.0,0.0,0.0
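
To make the two steps above concrete, the sketch below runs the same one-hot encoding and group mean on a tiny, made-up venue table (the neighborhood names and venues are illustrative only, and `dtype=int` is an addition to keep the encoded values as 0/1). Each value in the grouped frame is the share of a neighborhood's venues that fall in that category, so every row sums to 1.

```python
import pandas as pd

# made-up venue table: two neighborhoods, four venues
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Park", "Park", "Coffee Shop", "Coffee Shop"],
})

# one-hot encode the category column (dtype=int keeps 0/1 values)
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# mean of the 0/1 columns = relative frequency of each category per neighborhood
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Neighborhood A has two parks among three venues, so its `Park` frequency is 2/3, while neighborhood B consists of a single coffee shop.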


In [62]:
#function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [201]:
#create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = all_neighb_venues_grouped['Neighborhood']

for ind in np.arange(all_neighb_venues_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(all_neighb_venues_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adams_Normandie,Sushi Restaurant,Gas Station,Playground,Home Service,Taco Place,Park,Grocery Store,Electronics Store,Empanada Restaurant,English Restaurant
1,Agincourt,Breakfast Spot,Clothing Store,Lounge,Latin American Restaurant,Skating Rink,Yoga Studio,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Escape Room
2,"Alderwood, Long Branch",Pizza Place,Pub,Coffee Shop,Sandwich Place,Pharmacy,Gym,Exhibit,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant
3,Allapattah,Lounge,Department Store,Yoga Studio,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit
4,Arleta,Home Service,Convenience Store,Bakery,Video Store,Historic Site,Falafel Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant
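
One caveat about the table above: `return_most_common_venues` always returns `num_top_venues` categories, even when a neighborhood has fewer distinct categories than that. Allapattah, for example, has only 2 venues in the count table yet is assigned 10 ranked categories, so its trailing entries are zero-frequency categories that appear only because of the sort order. A possible variant, sketched here on a made-up row under the hypothetical name `top_nonzero_venues`, keeps only categories that actually occur and pads the remainder with `None`.

```python
import pandas as pd


def top_nonzero_venues(row, num_top_venues):
    """Rank categories by frequency, ignoring categories with frequency 0.

    Hypothetical variant of return_most_common_venues; `row` is one row of the
    grouped frame with 'Neighborhood' in the first position.
    """
    freqs = row.iloc[1:].astype(float)
    ranked = freqs[freqs > 0].sort_values(ascending=False)
    top = list(ranked.index[:num_top_venues])
    # pad so that every neighborhood still yields the same number of columns
    return top + [None] * (num_top_venues - len(top))


# made-up row: only two of the four categories occur in this neighborhood
toy_row = pd.Series({
    "Neighborhood": "Toyville",
    "Bakery": 0.0,
    "Lounge": 0.75,
    "Park": 0.0,
    "Yoga Studio": 0.25,
})
toy_top = top_nonzero_venues(toy_row, 4)
print(toy_top)
```

Whether padded entries are preferable to zero-frequency categories depends on how the table is read; the cluster labels themselves are unaffected, since k-means runs on the frequency matrix rather than on these ranked columns.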


### Use k-means technique to cluster the neighborhoods into 6 clusters.

In [300]:
# set number of clusters
kclusters = 6

all_neighb_venues_grouped_clustering = all_neighb_venues_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(all_neighb_venues_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 3, 4, 4, 4, 4, 4, 4, 4], dtype=int32)
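
The value `kclusters = 6` is fixed by hand here. One common way to sanity-check such a choice is to compare silhouette scores over a range of `k`. The sketch below does this on synthetic 2-D points rather than on the venue data, so the `k` it selects says nothing about the neighborhoods themselves; the names `toy_points`, `scores`, and `best_k` are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic data: three tight, well-separated groups of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
toy_points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# fit k-means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(toy_points)
    scores[k] = silhouette_score(toy_points, labels)

# the k with the highest mean silhouette score
best_k = max(scores, key=scores.get)
print(scores)
print('best k by silhouette:', best_k)
```

The same loop could be pointed at `all_neighb_venues_grouped_clustering` to see how 6 clusters compares with neighboring values of `k`.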

#### Create a new dataframe that includes the cluster label as well as the top 10 venues for each neighborhood.

In [None]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

all_neighbs_merged = all_neighbs

# join the cluster labels and top venues onto the latitude/longitude of each neighborhood
all_neighbs_merged = all_neighbs_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# drop rows that have no match in neighborhoods_venues_sorted (no cluster label)
all_neighbs_merged.dropna(inplace=True)
all_neighbs_merged.head() # check the last columns!

In [414]:
import branca

legend_html = '''
{% macro html(this, kwargs) %}
<div style="
    position: fixed; 
    bottom: 10px;
    left: 7px;
    width: 200px;
    height: 140px;
    z-index:9999;
    font-size:8px;
    ">
    <p><a style="color:#eb1a03;">&#x25CF;</a>&emsp;Construction & Landscaping, Yoga, & Farmers Market Neighborhoods</p>
    <p><a style="color:#8d06ee;">&#x25CF;</a>&emsp;Business Services, Electronics, & Empanadas Neighborhoods</p>
    <p><a style="color:#00a3ff;">&#x25CF;</a>&emsp;Parks & Outdoors Neighborhoods</p>
    <p><a style="color:#8effff;">&#x25CF;</a>&emsp;Restaurants & Shopping Neighborhoods</p>
    <p><a style="color:#cffbba;">&#x25CF;</a>&emsp;Varied Neighborhoods</p>
    <p><a style="color:#cb7e35;">&#x25CF;</a>&emsp;Sporting Goods and Yoga Neighborhood</p>
</div>
<div style="
    position: fixed; 
    bottom: 2px;
    left: 2px;
    width: 185px;
    height: 150px; 
    z-index:9998;
    font-size:10px;
    background-color: #cfcfcf;
    opacity: 0.9;
    ">
</div>
{% endmacro %}
'''
legend = branca.element.MacroElement()
legend._template = branca.element.Template(legend_html)

In [422]:
# create map
map_clusters = folium.Map(location=[37.0902, -95.7129], attr = 'http://{s}.yourtiles.com/{z}/{x}/{y}.png', zoom_start=3)
 
# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
# note: the -1 offset wraps cluster 0 around to the last color in the list
for lat, lon, poi, cluster in zip(all_neighbs_merged['Latitude'], all_neighbs_merged['Longitude'], all_neighbs_merged['Neighborhood'], all_neighbs_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.8).add_to(map_clusters)
map_clusters.get_root().add_child(legend)

map_clusters
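
The marker colors are picked with `rainbow[int(cluster)-1]`, so cluster 0 resolves to index -1, which is the last color in the list, and every other cluster is shifted by one position. The short sketch below prints the resulting label-to-color mapping for `kclusters = 6`; `cluster_to_color` is an illustrative name. It can help when matching marker colors against the hand-written legend.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 6
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# same indexing as in the map cell: cluster label minus one
cluster_to_color = {cluster: rainbow[cluster - 1] for cluster in range(kclusters)}
for cluster, color in cluster_to_color.items():
    print(cluster, color)
```

Cluster 0 ends up with the red end of the colormap and cluster 1 with the violet end, which is the order in which the legend entries are listed.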

In [205]:
all_neighbs_merged['Cluster Labels'] = all_neighbs_merged['Cluster Labels'].astype('int64')

In [206]:
all_neighbs_merged.astype({'Cluster Labels': 'int64'})

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,43.7533,-79.3297,2,Park,Food & Drink Shop,Farmers Market,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space
1,Victoria Village,43.7259,-79.3156,3,Hockey Arena,French Restaurant,Portuguese Restaurant,Intersection,Pizza Place,Coffee Shop,Ethiopian Restaurant,Eye Doctor,Exhibit,Event Space
2,"Regent Park, Harbourfront",43.6543,-79.3606,4,Coffee Shop,Pub,Park,Bakery,Breakfast Spot,Café,Theater,Gym / Fitness Center,Hotel,Bank
3,"Lawrence Manor, Lawrence Heights",43.7185,-79.4648,4,Clothing Store,Vietnamese Restaurant,Boutique,Furniture / Home Store,Gift Shop,Event Space,Coffee Shop,Women's Store,Accessories Store,Cycle Studio
4,"Queen's Park, Ontario Provincial Government",43.6623,-79.3895,4,Coffee Shop,Distribution Center,Park,College Cafeteria,College Auditorium,Gym,Theater,Sandwich Place,Bank,Bar
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35,Turtle Bay,40.752,-73.9677,4,Coffee Shop,Italian Restaurant,Sushi Restaurant,Park,Deli / Bodega,Ramen Restaurant,Japanese Restaurant,Seafood Restaurant,Garden,Karaoke Bar
36,Tudor City,40.7469,-73.9712,4,Café,Mexican Restaurant,Park,Deli / Bodega,Coffee Shop,Asian Restaurant,Diner,Dog Run,Sushi Restaurant,Gym
37,Stuyvesant Town,40.731,-73.9741,4,Park,Gym / Fitness Center,Bistro,Cocktail Bar,Coffee Shop,Harbor / Marina,Heliport,Pet Service,Bar,Farmers Market
38,Flatiron,40.7397,-73.9909,4,Italian Restaurant,New American Restaurant,Japanese Restaurant,Gym / Fitness Center,Sporting Goods Shop,Spa,Mediterranean Restaurant,Gym,American Restaurant,Furniture / Home Store


##### Cluster 0 - Construction & Landscaping, Yoga, & Farmers Market Neighborhoods

In [207]:
all_neighbs_merged.loc[all_neighbs_merged['Cluster Labels'] == 0, all_neighbs_merged.columns[[0] + list(range(4, all_neighbs_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,"Rouge Hill, Port Union, Highland Creek",Construction & Landscaping,Bar,Yoga Studio,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space
101,"Old Mill South, King's Mill Park, Sunnylea, Hu...",Construction & Landscaping,Baseball Field,Yoga Studio,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space
35,Granada_Hills,Construction & Landscaping,Yoga Studio,Farmers Market,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space


##### Cluster 1 - Business Services, Electronics, & Empanadas Neighborhoods

In [208]:
all_neighbs_merged.loc[all_neighbs_merged['Cluster Labels'] == 1, all_neighbs_merged.columns[[0] + list(range(4, all_neighbs_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
79,Shadow_Hills,Business Service,Yoga Studio,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit,Eye Doctor
86,Sylmar,Liquor Store,Business Service,Yoga Studio,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space
87,Tarzana,Home Service,Business Service,Duty-free Shop,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit


##### Cluster 2 - Parks & Outdoors Neighborhoods

In [209]:
all_neighbs_merged.loc[all_neighbs_merged['Cluster Labels'] == 2, all_neighbs_merged.columns[[0] + list(range(4, all_neighbs_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,Park,Food & Drink Shop,Farmers Market,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space
21,Caledonia-Fairbanks,Park,Women's Store,Pool,Falafel Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room
35,"East Toronto, Broadview North (Old East York)",Convenience Store,Intersection,Park,Yoga Studio,Falafel Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant
49,"North Park, Maple Leaf Park, Upwood Park",Park,Construction & Landscaping,Bakery,Yoga Studio,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant
61,Lawrence Park,Park,Swim School,Bus Line,Yoga Studio,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant
64,Weston,Park,Farmers Market,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit
66,York Mills West,Park,Convenience Store,Farmers Market,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space
68,"Forest Hill North & West, Forest Hill Road Park",Park,Trail,Sushi Restaurant,Jewelry Store,Yoga Studio,Falafel Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant
85,"Milliken, Agincourt North, Steeles East, L'Amo...",Park,Bakery,Intersection,Playground,Yoga Studio,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room
91,Rosedale,Park,Trail,Playground,Yoga Studio,Falafel Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room


##### Cluster 3 - Restaurants & Shopping Neighborhoods

In [421]:
all_neighbs_merged.loc[all_neighbs_merged['Cluster Labels'] == 3, all_neighbs_merged.columns[[0] + list(range(4, all_neighbs_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Victoria Village,Hockey Arena,French Restaurant,Portuguese Restaurant,Intersection,Pizza Place,Coffee Shop,Ethiopian Restaurant,Eye Doctor,Exhibit,Event Space
6,"Malvern, Rouge",Fast Food Restaurant,Farmers Market,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit
8,"Parkview Hill, Woodbine Gardens",Pizza Place,Gym / Fitness Center,Bus Line,Pharmacy,Gastropub,Athletics & Sports,Pet Store,Intersection,Bank,Breakfast Spot
10,Glencairn,Pub,Park,Bakery,Japanese Restaurant,Pizza Place,Falafel Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant
17,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",Shopping Plaza,Coffee Shop,Beer Store,Liquor Store,Café,Pizza Place,Pharmacy,Convenience Store,Ethiopian Restaurant,Escape Room
50,Humber Summit,Furniture / Home Store,Pizza Place,Yoga Studio,Fast Food Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space
63,"Runnymede, The Junction North",Breakfast Spot,Brewery,Convenience Store,Bus Line,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space
70,Westmount,Chinese Restaurant,Intersection,Coffee Shop,Sandwich Place,Pizza Place,Discount Store,Yoga Studio,Fabric Shop,Electronics Store,Empanada Restaurant
72,"Willowdale, Willowdale West",Butcher,Grocery Store,Coffee Shop,Pizza Place,Pharmacy,Yoga Studio,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room
77,"Kingsview Village, St. Phillips, Martin Grove ...",Park,Sandwich Place,Bus Line,Pizza Place,Yoga Studio,Fabric Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant


##### Cluster 4 - Varied Neighborhoods

In [219]:
all_neighbs_merged.loc[all_neighbs_merged['Cluster Labels'] == 4, all_neighbs_merged.columns[[0] + list(range(4, all_neighbs_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Regent Park, Harbourfront",Coffee Shop,Pub,Park,Bakery,Breakfast Spot,Café,Theater,Gym / Fitness Center,Hotel,Bank
3,"Lawrence Manor, Lawrence Heights",Clothing Store,Vietnamese Restaurant,Boutique,Furniture / Home Store,Gift Shop,Event Space,Coffee Shop,Women's Store,Accessories Store,Cycle Studio
4,"Queen's Park, Ontario Provincial Government",Coffee Shop,Distribution Center,Park,College Cafeteria,College Auditorium,Gym,Theater,Sandwich Place,Bank,Bar
7,Don Mills,Gym,Coffee Shop,Beer Store,Japanese Restaurant,Dim Sum Restaurant,Sandwich Place,Chinese Restaurant,Caribbean Restaurant,Bike Shop,Asian Restaurant
9,"Garden District, Ryerson",Clothing Store,Coffee Shop,Café,Japanese Restaurant,Cosmetics Shop,Bubble Tea Shop,Hotel,Italian Restaurant,Fast Food Restaurant,Theater
...,...,...,...,...,...,...,...,...,...,...,...
35,Turtle Bay,Coffee Shop,Italian Restaurant,Sushi Restaurant,Park,Deli / Bodega,Ramen Restaurant,Japanese Restaurant,Seafood Restaurant,Garden,Karaoke Bar
36,Tudor City,Café,Mexican Restaurant,Park,Deli / Bodega,Coffee Shop,Asian Restaurant,Diner,Dog Run,Sushi Restaurant,Gym
37,Stuyvesant Town,Park,Gym / Fitness Center,Bistro,Cocktail Bar,Coffee Shop,Harbor / Marina,Heliport,Pet Service,Bar,Farmers Market
38,Flatiron,Italian Restaurant,New American Restaurant,Japanese Restaurant,Gym / Fitness Center,Sporting Goods Shop,Spa,Mediterranean Restaurant,Gym,American Restaurant,Furniture / Home Store


##### Cluster 5 - Sporting Goods and Yoga Neighborhood

In [212]:
all_neighbs_merged.loc[all_neighbs_merged['Cluster Labels'] == 5, all_neighbs_merged.columns[[0] + list(range(4, all_neighbs_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
107,Windsor_Square,Sporting Goods Shop,Yoga Studio,Farmers Market,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space


#### Let's Check out Toronto

In [423]:
# create map
map_clusters = folium.Map(location=[43.6532, -79.3832], attr = 'http://{s}.yourtiles.com/{z}/{x}/{y}.png', zoom_start=10)
 
# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
# note: the -1 offset wraps cluster 0 around to the last color in the list
for lat, lon, poi, cluster in zip(all_neighbs_merged['Latitude'], all_neighbs_merged['Longitude'], all_neighbs_merged['Neighborhood'], all_neighbs_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.8).add_to(map_clusters)
map_clusters.get_root().add_child(legend)

map_clusters

#### Let's Check out LA

In [424]:
# create a map centered on LA
map_clusters = folium.Map(location=[34.0522, -118.2437], zoom_start=10)

# set color scheme for the clusters: one hex color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its cluster label
# (note: label 0 indexes rainbow[-1], i.e. the last color)
for lat, lon, poi, cluster in zip(all_neighbs_merged['Latitude'], all_neighbs_merged['Longitude'], all_neighbs_merged['Neighborhood'], all_neighbs_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.8).add_to(map_clusters)

# add the legend to the map
map_clusters.get_root().add_child(legend)

map_clusters

#### Let's Check out Miami

In [425]:
# create a map centered on Miami
map_clusters = folium.Map(location=[25.7617, -80.1918], zoom_start=11)

# set color scheme for the clusters: one hex color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its cluster label
# (note: label 0 indexes rainbow[-1], i.e. the last color)
for lat, lon, poi, cluster in zip(all_neighbs_merged['Latitude'], all_neighbs_merged['Longitude'], all_neighbs_merged['Neighborhood'], all_neighbs_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.8).add_to(map_clusters)

# add the legend to the map
map_clusters.get_root().add_child(legend)

map_clusters

#### Let's Check out San Francisco

In [426]:
# create a map centered on San Francisco
map_clusters = folium.Map(location=[37.7749, -122.4194], zoom_start=12)

# set color scheme for the clusters: one hex color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its cluster label
# (note: label 0 indexes rainbow[-1], i.e. the last color)
for lat, lon, poi, cluster in zip(all_neighbs_merged['Latitude'], all_neighbs_merged['Longitude'], all_neighbs_merged['Neighborhood'], all_neighbs_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.8).add_to(map_clusters)

# add the legend to the map
map_clusters.get_root().add_child(legend)

map_clusters

#### Let's Check out Manhattan

In [427]:
# create a map centered on Manhattan
map_clusters = folium.Map(location=[40.7831, -73.9712], zoom_start=11)

# set color scheme for the clusters: one hex color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its cluster label
# (note: label 0 indexes rainbow[-1], i.e. the last color)
for lat, lon, poi, cluster in zip(all_neighbs_merged['Latitude'], all_neighbs_merged['Longitude'], all_neighbs_merged['Neighborhood'], all_neighbs_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.8).add_to(map_clusters)

# add the legend to the map
map_clusters.get_root().add_child(legend)

map_clusters

## 4. Results

The analysis shows that of the 294 neighborhoods included, the vast majority (233) were grouped into Cluster 4, which we have deemed "Varied Neighborhoods" and which spans all cities in the analysis.

33 neighborhoods are in Cluster 3 - Restaurants & Shopping Neighborhoods, mostly in Toronto and LA.

21 neighborhoods are in Cluster 2 - Parks & Outdoors Neighborhoods, mostly in Toronto and LA.

Three neighborhoods were grouped into Cluster 0, which we have characterised as "Construction & Landscaping, Yoga, & Farmers Market Neighborhoods": one in LA and two in Toronto.

Three neighborhoods belong to Cluster 1 - Business Services, Electronics, & Empanadas Neighborhoods, and they are all in the north of LA.

Cluster 5 - Sporting Goods and Yoga Neighborhood consists of a single neighborhood in central LA.
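
As a quick sanity check on the figures above, the cluster sizes can be tallied and turned into shares of the total. This is only a sketch using the counts quoted in this section; `cluster_sizes` and `cluster_shares` are names introduced here for illustration. In the notebook itself the counts would presumably come from something like `all_neighbs_merged['Cluster Labels'].value_counts()`.

```python
# Cluster sizes as reported in this section (cluster label -> neighborhoods)
cluster_sizes = {4: 233, 3: 33, 2: 21, 0: 3, 1: 3, 5: 1}


def cluster_shares(sizes):
    """Return each cluster's share of all neighborhoods, as a percentage."""
    total = sum(sizes.values())
    return {label: 100 * n / total for label, n in sizes.items()}


total = sum(cluster_sizes.values())
shares = cluster_shares(cluster_sizes)

print(f"Total neighborhoods: {total}")
# List clusters from largest to smallest
for label, n in sorted(cluster_sizes.items(), key=lambda kv: -kv[1]):
    print(f"Cluster {label}: {n:>3} neighborhoods ({shares[label]:.1f}%)")
```

The counts sum to 294, and Cluster 4 alone accounts for roughly 79% of the neighborhoods, which is the imbalance the next section discusses.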

## 5. Discussion

I believe the analysis was skewed by uneven cluster sizes and possible outliers. However, one distinction that can be made is that Cluster 4 is the cluster of diverse neighborhoods with a variety of venues, so it may make sense that Manhattan is made up entirely of this cluster. The central areas of the other cities analyzed are also primarily Cluster 4.

Toronto and LA are the cities with the most Restaurants & Shopping and Parks & Outdoors neighborhoods. Let's go!

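Moving forward, I would try this analysis again using DBSCAN, which has some advantages over k-means: it can form clusters of arbitrary shape and size, it does not require the number of clusters to be chosen in advance, and it labels outliers as noise rather than forcing them into a cluster.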
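
To illustrate how DBSCAN treats outliers, here is a minimal sketch using scikit-learn's `DBSCAN` on synthetic 2-D points. The points are made up for illustration and stand in for neighborhood feature vectors; they are not the notebook's data. The `eps` and `min_samples` values are assumptions that would need tuning on the real venue-frequency features.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic points (NOT the notebook's data): two tight groups plus one isolated point
group_a = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1], [0.05, 0.05]]
group_b = [[5.0, 5.0], [5.0, 5.1], [5.1, 5.0], [5.1, 5.1], [5.05, 5.05]]
outlier = [[10.0, 0.0]]
X = np.array(group_a + group_b + outlier)

# eps (neighborhood radius) and min_samples are illustrative guesses
dbscan = DBSCAN(eps=0.3, min_samples=3).fit(X)
labels = dbscan.labels_

# DBSCAN marks noise points with the label -1, so exclude it when counting clusters
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

print("Labels:", labels)
print("Clusters found:", n_clusters)
print("Noise points:", n_noise)
```

Unlike k-means, the number of clusters is not passed in; it falls out of the density parameters, and the isolated point is reported as noise instead of forming its own single-member cluster.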

## 6. Conclusion

In conclusion, while this may not be the most solid data analysis out there, I certainly learned a lot about the Python libraries used in data science and about some of the statistics behind the code.