# Identify the similarities and dissimilarities between neighborhoods in Toronto and New York City.
###  Where are the main retail centers? Which neighborhoods would be the best locations?



         Data will be retrieved from Foursquare.  This data will identify the most common venues 
         in both cities.  Data for these cities' neighborhoods and their most common venues, lat and long
         A cluster analysis will be performed to help identify the major retail neighborhoods. 
         The clusters will be inspected to see what the top most common venues are and may suggest
         a neighborhood's that favor retail and restaurants.  
         And the clusters that do not appear to be good candidates for retail/restaurants will be
         examined for other business possibilities.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 4>

1. <a href="#item1">Download datasets for Toronto and New York and merge then into one dataset.</a>

2. <a href="#item2">Explore Neighborhoods in New York and Toronto</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters:  Look for simularities  between the two cities</a>    
</font>
</div>

<h1>1) Get the data, and create dataframes for Toronto and New york.  Merge the two into one 'combined dataset'</h1>

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

## 1. Download and Explore Dataset

New York dataset.

In [2]:
!wget -q -O 'newyork_data.html' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


Let's take a quick look at the data.

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
# newyork_data is a Dictionary data type

In [4]:
#use the Dictionary 'featues to create a list '
NY_neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

In [5]:
NY_neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Tranform the New York data (a list) into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [6]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe 
NY_neighborhoods = pd.DataFrame(columns=column_names)
NY_neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data and fill the dataframe one row at a time.

In [7]:

# remember NY_Neighborhoods_data

for data in NY_neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']  
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = data['geometry']['coordinates'][1]
    neighborhood_lon = data['geometry']['coordinates'][0]
    
    NY_neighborhoods = NY_neighborhoods.append({'Borough': borough,
                                                'Neighborhood': neighborhood_name,
                                                'Latitude': neighborhood_lat,
                                                'Longitude': neighborhood_lon}, ignore_index=True)
NY_neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [8]:
print(NY_neighborhoods.shape)

(306, 4)


## Next, let's create a similar dataframe for Toronto
## Eliminate all rows that have 'Not assigned' value for 'Borough'

In [9]:
df_Toronto = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
df_Toronto = pd.DataFrame(df_Toronto[0])
for column in df_Toronto.columns:
    print(column)

Postcode
Borough
Neighbourhood


In [10]:
print(df_Toronto.shape)
df_Toronto_filtered = df_Toronto[df_Toronto['Borough'] != 'Not assigned']
print(df_Toronto_filtered.shape)

(288, 3)
(211, 3)


In [11]:
print(NY_neighborhoods.shape)

(306, 4)


# Use numpy 'where' to replace 'Neighbourhood' values of 'Not assigned' with the 'Borough' value. 

In [12]:
df_Toronto_filtered['Neighbourhood'] = np.where(df_Toronto_filtered['Neighbourhood'] == 'Not assigned',
                                         df_Toronto_filtered['Borough'],
                                         df_Toronto_filtered['Neighbourhood'])
df_Toronto_final=df_Toronto_filtered

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


In [13]:
print('df_Toronto_final',type(df_Toronto_final),df_Toronto_final.shape)

df_Toronto_final <class 'pandas.core.frame.DataFrame'> (211, 3)


## Read geographical coordinates of each postal code

In [14]:
geo_postal_code = pd.read_csv('http://cocl.us/Geospatial_data')
df_toronto_postalCodes = pd.DataFrame(geo_postal_code)
df_toronto_postalCodes.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


##   Merge the two Toronto dataframes such that the 'lat' and 'long and 'neighborhood' data are combined 
## into a new dataframe 'toronto'

In [15]:
toronto = pd.merge(df_Toronto_final,df_toronto_postalCodes,left_on= 'Postcode', right_on='Postal Code')
toronto.head()
print('size toronto {} '.format(toronto.shape))

size toronto (211, 6) 


In [16]:
toronto.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Postal Code,Latitude,Longitude
0,M3A,North York,Parkwoods,M3A,43.753259,-79.329656
1,M4A,North York,Victoria Village,M4A,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,M5A,43.65426,-79.360636
3,M5A,Downtown Toronto,Regent Park,M5A,43.65426,-79.360636
4,M6A,North York,Lawrence Heights,M6A,43.718518,-79.464763


In [17]:
print('size toronto {} '.format(toronto.shape))

size toronto (211, 6) 


In [18]:
toronto.drop(['Postcode','Postal Code'], 1,inplace = True)

In [19]:
toronto.rename(columns={'Neighbourhood':'Neighborhood'},inplace=True)

## Next, add a column called 'City' to both the New York and Toronto dataframes

In [20]:
toronto['City'] = "Totonto"
NY_neighborhoods['City'] = 'New York City'

## Next let's merge the Toronto and New York dataframe into one master dataframe

In [21]:
df_master = pd.DataFrame(NY_neighborhoods)

In [22]:
df_combined = pd.DataFrame(df_master.append(toronto,ignore_index=True))

In [23]:
df_combined.shape

(517, 5)

## The Toronto and NY dataframes have been combined to create a new DataFrame 'all_data'

In [24]:

all_data = df_combined
all_data.head()


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,City
0,Bronx,Wakefield,40.894705,-73.847201,New York City
1,Bronx,Co-op City,40.874294,-73.829939,New York City
2,Bronx,Eastchester,40.887556,-73.827806,New York City
3,Bronx,Fieldston,40.895437,-73.905643,New York City
4,Bronx,Riverdale,40.890834,-73.912585,New York City


In [25]:
CLIENT_ID = 'HPNNZXX1CG5QXNX20LUFZMXS2S30YSY4RTL1JLRGFWFHXKKE' # your Foursquare ID
CLIENT_SECRET = 'J53PPH0E5AS4YWAE5ANDJUV1WCI3N0PJ5FLPJJ03XMEPIDD3' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: HPNNZXX1CG5QXNX20LUFZMXS2S30YSY4RTL1JLRGFWFHXKKE
CLIENT_SECRET:J53PPH0E5AS4YWAE5ANDJUV1WCI3N0PJ5FLPJJ03XMEPIDD3


##  Get venues.

In [26]:
# a function for retrieving venues 

LIMIT = 500
radius = 300

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)    
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],         
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude',
                  'Venue Category']
    
    return(nearby_venues)

<h1>Get venues for the Toronto/New York dataframe</h1>

In [27]:
#
all_venues = getNearbyVenues(names=all_data['Neighborhood'],
                             latitudes=all_data['Latitude'],
                             longitudes=all_data['Longitude'],
                             )

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

In [28]:
print(type(all_venues))
all_venues.head()

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
1,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
2,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
3,Wakefield,40.894705,-73.847201,Cooler Runnings Jamaican Restaurant Inc,40.898276,-73.850381,Caribbean Restaurant
4,Wakefield,40.894705,-73.847201,Shell,40.894187,-73.845862,Gas Station


In [29]:
all_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelaide,100,100,100,100,100,100
Agincourt,5,5,5,5,5,5
Agincourt North,3,3,3,3,3,3
Albion Gardens,9,9,9,9,9,9
Alderwood,9,9,9,9,9,9
Allerton,36,36,36,36,36,36
Annadale,11,11,11,11,11,11
Arden Heights,5,5,5,5,5,5
Arlington,5,5,5,5,5,5
Arrochar,19,19,19,19,19,19


In [30]:
print('There are {} uniques categories.'.format(len(all_venues['Venue Category'].unique())))

There are 466 uniques categories.


In [31]:
# one hot encoding
all_onehot = pd.get_dummies(all_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
all_onehot['Neighborhood'] = all_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [all_onehot.columns[-1]] + list(all_onehot.columns[:-1])
all_onehot = all_onehot[fixed_columns]
print(all_onehot.shape)
all_onehot.head()

(14803, 466)


Unnamed: 0,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,Airport Tram,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boarding House,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Cha Chaan Teng,Check Cashing Service,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Auditorium,College Basketball Court,College Bookstore,College Cafeteria,College Gym,College Rec Center,College Stadium,College Theater,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General College & University,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Hakka Restaurant,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Herbs & Spices Store,High School,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hockey Field,Home Service,Hookah Bar,Hospital,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lawyer,Leather Goods Store,Lebanese Restaurant,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Luggage Store,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music School,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Newsstand,Nightclub,Nightlife Spot,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Sculpture,Outdoors & Recreation,Outlet Store,Paella Restaurant,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Plane,Platform,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Post Office,Poutine Place,Print Shop,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roller Rink,Romanian Restaurant,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Ski Shop,Smoke Shop,Smoothie Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stables,Stadium,State / Provincial Park,Stationery Store,Steakhouse,Storage Facility,Street Art,Strip Club,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Swim School,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Tennis Stadium,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Toll Plaza,Tourist Information Center,Toy / Game Store,Track,Trail,Train,Train Station,Tree,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waste Facility,Watch Shop,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [32]:
all_grouped = all_onehot.groupby('Neighborhood').mean().reset_index()
print(all_grouped.shape)
all_grouped.head()

(502, 466)


Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,Airport Tram,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boarding House,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Cha Chaan Teng,Check Cashing Service,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Auditorium,College Basketball Court,College Bookstore,College Cafeteria,College Gym,College Rec Center,College Stadium,College Theater,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General College & University,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Hakka Restaurant,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Herbs & Spices Store,High School,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hockey Field,Home Service,Hookah Bar,Hospital,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lawyer,Leather Goods Store,Lebanese Restaurant,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Luggage Store,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Nightlife Spot,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Sculpture,Outdoors & Recreation,Outlet Store,Paella Restaurant,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Plane,Platform,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Post Office,Poutine Place,Print Shop,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roller Rink,Romanian Restaurant,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Ski Shop,Smoke Shop,Smoothie Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stables,Stadium,State / Provincial Park,Stationery Store,Steakhouse,Storage Facility,Street Art,Strip Club,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Swim School,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Tennis Stadium,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Toll Plaza,Tourist Information Center,Toy / Game Store,Track,Trail,Train,Train Station,Tree,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waste Facility,Watch Shop,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Adelaide,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Agincourt North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Albion Gardens,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Alderwood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [33]:
all_grouped.shape

(502, 466)

## Let's look at some some of the top venues for some of the neighborhoods

In [34]:
num_top_venues = 5
cnt = 0
print('len all_grouped',all_grouped.shape)

#print out first 3 neighborhoods
for hood in all_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = all_grouped[all_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
    cnt += 1
    if cnt == 3:
        break

len all_grouped (502, 466)
----Adelaide----
             venue  freq
0      Coffee Shop  0.06
1             Café  0.05
2       Steakhouse  0.04
3              Bar  0.04
4  Thai Restaurant  0.04


----Agincourt----
                venue  freq
0              Lounge   0.2
1      Clothing Store   0.2
2      Sandwich Place   0.2
3  Chinese Restaurant   0.2
4      Breakfast Spot   0.2


----Agincourt North----
                   venue  freq
0                   Park  0.67
1             Playground  0.33
2            Yoga Studio  0.00
3  Outdoors & Recreation  0.00
4               Pet Café  0.00




In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

<h1>Top 10 venues for each neighborhood</h1>

In [36]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
all_venues_sorted = pd.DataFrame(columns=columns)
all_venues_sorted['Neighborhood'] = all_grouped['Neighborhood']

for ind in np.arange(all_grouped.shape[0]):
    all_venues_sorted.iloc[ind, 1:] = return_most_common_venues(all_grouped.iloc[ind, :], num_top_venues)

all_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant,American Restaurant,Bakery,Burger Joint,Hotel,Cosmetics Shop
1,Agincourt,Lounge,Breakfast Spot,Chinese Restaurant,Clothing Store,Sandwich Place,Women's Store,Falafel Restaurant,Electronics Store,Empanada Restaurant,English Restaurant
2,Agincourt North,Park,Playground,Women's Store,Farm,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space
3,Albion Gardens,Grocery Store,Coffee Shop,Beer Store,Fried Chicken Joint,Pizza Place,Pharmacy,Fast Food Restaurant,Sandwich Place,Factory,Eye Doctor
4,Alderwood,Pizza Place,Coffee Shop,Pool,Gym,Skating Rink,Sandwich Place,Pub,Pharmacy,Event Space,Event Service


## 4. Cluster Neighborhoods  - 5 clusters  - Kmeans

In [37]:
# set number of clusters
kclusters = 5

# drop Neighborhood from fequency dataframe so it can used KMeans
all_grouped_clustering = all_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(all_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([4, 4, 0, 1, 1, 1, 1, 1, 4, 4], dtype=int32)

In [38]:
# add clustering labels to rows of DataFrame
all_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

all_merged = all_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
all_merged = all_merged.join(all_venues_sorted.set_index('Neighborhood'), on='Neighborhood', how='right')

all_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,City,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bronx,Wakefield,40.894705,-73.847201,New York City,4,Dessert Shop,Gas Station,Food Truck,Pharmacy,Donut Shop,Sandwich Place,Laundromat,Caribbean Restaurant,Food,Ice Cream Shop
1,Bronx,Co-op City,40.874294,-73.829939,New York City,1,Baseball Field,Bus Station,Fast Food Restaurant,Pharmacy,Park,Grocery Store,Ice Cream Shop,Discount Store,Liquor Store,Restaurant
2,Bronx,Eastchester,40.887556,-73.827806,New York City,4,Caribbean Restaurant,Bus Station,Metro Station,Bus Stop,Diner,Juice Bar,Bowling Alley,Seafood Restaurant,Donut Shop,Pizza Place
3,Bronx,Fieldston,40.895437,-73.905643,New York City,4,Plaza,River,Bus Station,Playground,Women's Store,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
4,Bronx,Riverdale,40.890834,-73.912585,New York City,4,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Furniture / Home Store,Bus Station,Yoga Studio,Trail,Bookstore,Café


In [39]:
#this is cluster 3
all_merged.loc[all_merged['Cluster Labels'] == 0, all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,Clason Point,0,Park,South American Restaurant,Boat or Ferry,Bus Stop,Grocery Store,Scenic Lookout,Pool,Event Space,Eye Doctor,Exhibit
192,Somerville,0,Park,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit
203,Todt Hill,0,Park,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit
256,Randall Manor,0,Park,Bus Stop,Bagel Shop,Pizza Place,Women's Store,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service
303,Bayswater,0,Park,Tennis Court,Playground,Women's Store,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service
306,Parkwoods,0,Fast Food Restaurant,Food & Drink Shop,Park,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space
344,Caledonia-Fairbanks,0,Park,Women's Store,Pharmacy,Market,Fast Food Restaurant,Falafel Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
366,East Toronto,0,Park,Convenience Store,Coffee Shop,Farm,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space
376,CFB Toronto,0,Airport,Park,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space
377,Downsview East,0,Airport,Park,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space


Finally, let's visualize the resulting clusters

In [40]:
# create map
# I selected 'Binghampton, NY', New York as the center on the map.  
# It is approxiamatelt mid-way between NYC and Toronto 
address = 'Binghampton, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of ''Binghampton, NY'', New York  are {}, {}.'.format(latitude, longitude))
map_clusters = folium.Map(location=[41.0, -74], zoom_start=6)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(all_merged['Latitude'], all_merged['Longitude'], all_merged['Neighborhood'], all_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The geograpical coordinate of Binghampton, NY, New York  are 42.1147984, -75.8540822.


<a id='item5'></a>

## 5. Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you.

#### Cluster 1

In [41]:
cluster_1 = all_merged.loc[all_merged['Cluster Labels'] == 0, all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]]
print(len(cluster_1))
cluster_1

29


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,Clason Point,0,Park,South American Restaurant,Boat or Ferry,Bus Stop,Grocery Store,Scenic Lookout,Pool,Event Space,Eye Doctor,Exhibit
192,Somerville,0,Park,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit
203,Todt Hill,0,Park,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit
256,Randall Manor,0,Park,Bus Stop,Bagel Shop,Pizza Place,Women's Store,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service
303,Bayswater,0,Park,Tennis Court,Playground,Women's Store,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service
306,Parkwoods,0,Fast Food Restaurant,Food & Drink Shop,Park,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space
344,Caledonia-Fairbanks,0,Park,Women's Store,Pharmacy,Market,Fast Food Restaurant,Falafel Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
366,East Toronto,0,Park,Convenience Store,Coffee Shop,Farm,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space
376,CFB Toronto,0,Airport,Park,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space
377,Downsview East,0,Airport,Park,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space


#### Cluster 2

In [42]:
cluster_2 = all_merged.loc[all_merged['Cluster Labels'] == 1, all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]]
cluster_2


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Co-op City,1,Baseball Field,Bus Station,Fast Food Restaurant,Pharmacy,Park,Grocery Store,Ice Cream Shop,Discount Store,Liquor Store,Restaurant
8,Norwood,1,Pizza Place,Deli / Bodega,Park,Bank,Chinese Restaurant,Pharmacy,Liquor Store,Sandwich Place,Mexican Restaurant,Athletics & Sports
14,University Heights,1,Pizza Place,Food,Deli / Bodega,Shoe Store,Bank,Bakery,Chinese Restaurant,Donut Shop,Pharmacy,Sandwich Place
15,Morris Heights,1,Spanish Restaurant,Food Truck,Grocery Store,Latin American Restaurant,Bus Station,Playground,Deli / Bodega,Pizza Place,Pharmacy,Bank
17,East Tremont,1,Pizza Place,Café,Fish & Chips Shop,Restaurant,Mobile Phone Shop,Fast Food Restaurant,Deli / Bodega,Discount Store,Shoe Store,Supermarket
19,High Bridge,1,Pizza Place,Pharmacy,Supermarket,Sandwich Place,Discount Store,Sports Club,Latin American Restaurant,Asian Restaurant,Donut Shop,Chinese Restaurant
20,Melrose,1,Pizza Place,Pharmacy,Supermarket,Discount Store,Intersection,Bus Station,Market,Gym,Gym / Fitness Center,Mexican Restaurant
25,Morrisania,1,Discount Store,Pizza Place,Fast Food Restaurant,Metro Station,Donut Shop,Grocery Store,Chinese Restaurant,Mexican Restaurant,Sandwich Place,Bowling Alley
26,Soundview,1,Grocery Store,Chinese Restaurant,Bus Station,Video Store,Discount Store,Bus Stop,Lawyer,Pizza Place,Pharmacy,Playground
29,Country Club,1,Sandwich Place,Playground,Fried Chicken Joint,Chinese Restaurant,Women's Store,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant


#### Cluster 3

In [43]:
cluster_3 = all_merged.loc[all_merged['Cluster Labels'] == 2, all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]]
cluster_3

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
255,Emerson Hill,2,Construction & Landscaping,Farmers Market,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit
412,Emery,2,Baseball Field,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit
413,Humberlea,2,Baseball Field,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit
504,Humber Bay,2,Construction & Landscaping,Baseball Field,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor
505,King's Mill Park,2,Construction & Landscaping,Baseball Field,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor
506,Kingsway Park South East,2,Construction & Landscaping,Baseball Field,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor
507,Mimico NE,2,Construction & Landscaping,Baseball Field,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor
508,Old Mill South,2,Construction & Landscaping,Baseball Field,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor
509,The Queensway East,2,Construction & Landscaping,Baseball Field,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor
510,Royal York South East,2,Construction & Landscaping,Baseball Field,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor


#### Cluster 4

In [44]:
cluster_4 = all_merged.loc[all_merged['Cluster Labels'] == 3, all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]]
cluster_4

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
322,Cloverdale,3,Bank,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit
323,Islington,3,Bank,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit
324,Martin Grove,3,Bank,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit
325,Princess Gardens,3,Bank,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit
326,West Deane Park,3,Bank,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit


#### Cluster 5

In [45]:
cluster_5 = all_merged.loc[all_merged['Cluster Labels'] == 4, all_merged.columns[[1] + list(range(5, all_merged.shape[1]))]]
cluster_5

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Wakefield,4,Dessert Shop,Gas Station,Food Truck,Pharmacy,Donut Shop,Sandwich Place,Laundromat,Caribbean Restaurant,Food,Ice Cream Shop
2,Eastchester,4,Caribbean Restaurant,Bus Station,Metro Station,Bus Stop,Diner,Juice Bar,Bowling Alley,Seafood Restaurant,Donut Shop,Pizza Place
3,Fieldston,4,Plaza,River,Bus Station,Playground,Women's Store,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant
4,Riverdale,4,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Furniture / Home Store,Bus Station,Yoga Studio,Trail,Bookstore,Café
379,Riverdale,4,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Furniture / Home Store,Bus Station,Yoga Studio,Trail,Bookstore,Café
5,Kingsbridge,4,Pizza Place,Sandwich Place,Deli / Bodega,Mexican Restaurant,Supermarket,Bar,Discount Store,Spanish Restaurant,Fried Chicken Joint,Latin American Restaurant
6,Marble Hill,4,Coffee Shop,Sandwich Place,Discount Store,Deli / Bodega,Tennis Stadium,Bank,Bakery,Gym,Seafood Restaurant,Donut Shop
7,Woodlawn,4,Deli / Bodega,Playground,Bar,Pizza Place,Indian Restaurant,Lawyer,Pharmacy,Donut Shop,Train Station,Bus Station
9,Williamsbridge,4,Caribbean Restaurant,Soup Place,Nightclub,Bar,Women's Store,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service
10,Baychester,4,Bus Station,Electronics Store,Convenience Store,Shopping Mall,Sandwich Place,Donut Shop,Sporting Goods Shop,Gym / Fitness Center,Bank,Mattress Store


<h1>Let's add city names to the clusters</h1>

In [46]:
cluster1_cities = pd.merge(cluster_1,df_combined,left_on= 'Neighborhood', right_on='Neighborhood')
cluster2_cities = pd.merge(cluster_2,df_combined,left_on= 'Neighborhood', right_on='Neighborhood')
cluster3_cities = pd.merge(cluster_3,df_combined,left_on= 'Neighborhood', right_on='Neighborhood')
cluster4_cities = pd.merge(cluster_4,df_combined,left_on= 'Neighborhood', right_on='Neighborhood')
cluster5_cities = pd.merge(cluster_5,df_combined,left_on= 'Neighborhood', right_on='Neighborhood')

In [47]:
cluster3_cities.head()

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude,City
0,Emerson Hill,2,Construction & Landscaping,Farmers Market,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Staten Island,40.606794,-74.097762,New York City
1,Emery,2,Baseball Field,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,North York,43.724766,-79.532242,Totonto
2,Humberlea,2,Baseball Field,Women's Store,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,North York,43.724766,-79.532242,Totonto
3,Humber Bay,2,Construction & Landscaping,Baseball Field,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Etobicoke,43.636258,-79.498509,Totonto
4,King's Mill Park,2,Construction & Landscaping,Baseball Field,Fast Food Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Etobicoke,43.636258,-79.498509,Totonto


<H1>Drop duplicates in clusters</H1>

In [48]:


cluster1_cities.drop_duplicates(inplace=True)
cluster2_cities.drop_duplicates(inplace=True)
cluster3_cities.drop_duplicates(inplace=True)
cluster4_cities.drop_duplicates(inplace=True)
cluster5_cities.drop_duplicates(inplace=True)



In [49]:
clust1_NYC     = cluster1_cities[cluster1_cities['City'] =='New York City']
clust1_Toronto = cluster1_cities[cluster1_cities['City'] =='Toronto']
len(clust1_NYC)

5

In [50]:


clust1_NYC= cluster1_cities[cluster1_cities['City']=='New York City']
clust1_Toronto = cluster1_cities[cluster1_cities['City']=='Toronto']

clust2_NYC= cluster2_cities[cluster2_cities['City']=='New York City']
clust2_Toronto = cluster2_cities[cluster2_cities['City']=='Toronto']

clust3_NYC= cluster3_cities[cluster3_cities['City']=='New York City']
clust3_Toronto = cluster3_cities[cluster3_cities['City']=='Toronto']

clust4_NYC= cluster4_cities[cluster4_cities['City']=='New York City']
clust4_Toronto = cluster4_cities[cluster4_cities['City']=='Toronto']


clust5_NYC= cluster5_cities[cluster5_cities['City']=='New York City']
clust5_Toronto = cluster5_cities[cluster5_cities['City']=='Toronto']

<h2>Let's a look at the contribution of Toronto and New York to each cluster.</h2>
<h2>Both cities contribute to clusters 1, 2,3 and 5.</h2>
<h2>Cluster 4 is 100% Toronto neighborhoods</h2>

In [51]:
print("Cluster 1:  New York members={},  Toronto members={}".format(len(clust1_NYC),(len(cluster1_cities)-len(clust1_NYC))))
print("Cluster 2:  New York members={},  Toronto members={}".format(len(clust2_NYC),(len(cluster2_cities)-len(clust2_NYC))))
print("Cluster 3:  New York members={},  Toronto members={}".format(len(clust3_NYC),(len(cluster3_cities)-len(clust3_NYC))))
print("Cluster 4:  New York members={},  Toronto members={}".format(len(clust4_NYC),(len(cluster4_cities)-len(clust4_NYC))))
print("Cluster 5:  New York members={},  Toronto members={}".format(len(clust5_NYC),(len(cluster5_cities)-len(clust5_NYC))))

Cluster 1:  New York members=5,  Toronto members=24
Cluster 2:  New York members=51,  Toronto members=41
Cluster 3:  New York members=1,  Toronto members=10
Cluster 4:  New York members=0,  Toronto members=5
Cluster 5:  New York members=249,  Toronto members=127


In [52]:
# all the venues have the same column names
cols=list(cluster_2.columns)
col = cols[2:12]

# extract venue names for each cluster
def venue_list(thisClust):
    LL=[]
    for c in col:
        L2 = list((thisClust[c]))
        LL.append(L2)

    LLL = list()
    for i in range(10):
        for j in LL[:][i]:
            LLL.append(j)
    return LLL

# a list of venues for each of the five clusters
# these lists will be used later to create new lists of venu counts (for each cluster)
Ven1 = venue_list(cluster_1)
Ven2 = venue_list(cluster_2)
Ven3 = venue_list(cluster_3)
Ven4 = venue_list(cluster_4)
Ven5 = venue_list(cluster_5)


## Let's try to get a feel of each cluster's '1st Most Common Venues'

<h2>'1st Most Common Venues in Cluster # 1'    [notice no restaurants in the '1st Most Common Venue']</h2>

In [53]:
c1 = cluster1_cities[['City','Neighborhood','1st Most Common Venue']]
v1=c1[['City','1st Most Common Venue']]
most1= pd.DataFrame(v1['1st Most Common Venue'])
most1['1st Most Common Venue'].value_counts(dropna=False)

Park                          17
Construction & Landscaping     3
Smoke Shop                     3
Airport                        2
Trail                          2
Bus Line                       1
Fast Food Restaurant           1
Name: 1st Most Common Venue, dtype: int64

<h2>'1st Most Common Venue' in Cluster # 2  [notice MANY restaurants in the '1st Most Common Venue']</h2>

In [54]:
c2 = cluster2_cities[['City','Neighborhood','1st Most Common Venue']]
v2=c2[['City','1st Most Common Venue']]
most2= pd.DataFrame(v2['1st Most Common Venue'])
most2['1st Most Common Venue'].value_counts(dropna=False)

Pizza Place                        35
Pharmacy                           11
Coffee Shop                        10
Grocery Store                       9
Café                                3
Caribbean Restaurant                3
Indian Restaurant                   2
Deli / Bodega                       2
Chinese Restaurant                  1
Spanish Restaurant                  1
Southern / Soul Food Restaurant     1
Empanada Restaurant                 1
Intersection                        1
Bank                                1
Bakery                              1
Fast Food Restaurant                1
Other Nightlife                     1
Playground                          1
Pub                                 1
Sandwich Place                      1
Baseball Field                      1
Cosmetics Shop                      1
Discount Store                      1
Rental Car Location                 1
Italian Restaurant                  1
Name: 1st Most Common Venue, dtype: int64

<h2>'1st Most Common Venue' in Cluster # 3   [notice no restaurants in the '1st Most Common Venue']'</h2>

In [55]:
c3 = cluster3_cities[['City','Neighborhood','1st Most Common Venue']]
v3=c3[['City','1st Most Common Venue']]
most3= pd.DataFrame(v3['1st Most Common Venue'])
most3['1st Most Common Venue'].value_counts(dropna=False)

Construction & Landscaping    9
Baseball Field                2
Name: 1st Most Common Venue, dtype: int64

<h2>'1st Most Common Venue' in Cluster # 4   [notice no restaurants in the '1st Most Common Venue']</h2>

In [56]:
c4 = cluster4_cities[['City','Neighborhood','1st Most Common Venue']]
v4=c4[['City','1st Most Common Venue']]
most4= pd.DataFrame(v4['1st Most Common Venue'])
most4['1st Most Common Venue'].value_counts(dropna=False)

Bank    5
Name: 1st Most Common Venue, dtype: int64

<h2>'1st Most Common Venue' in Cluster # 5   [notice there are some restaurants in the '1st Most Common Venue']</h2>

In [57]:
c5 = cluster5_cities[['City','Neighborhood','1st Most Common Venue']]
v5=c5[['City','1st Most Common Venue']].drop_duplicates()
most5= pd.DataFrame(v5['1st Most Common Venue'])
most5['1st Most Common Venue'].value_counts(dropna=False)

Rental Car Location            2
Convenience Store              2
Bakery                         2
Clothing Store                 2
Middle Eastern Restaurant      2
Cosmetics Shop                 2
Bus Station                    2
Liquor Store                   2
Fried Chicken Joint            2
Chinese Restaurant             2
Café                           2
Furniture / Home Store         2
Dessert Shop                   2
Indian Restaurant              2
Greek Restaurant               2
Coffee Shop                    2
Pizza Place                    2
Fast Food Restaurant           2
Grocery Store                  2
Bar                            2
Women's Store                  2
Intersection                   2
Caribbean Restaurant           2
Food Truck                     2
Ramen Restaurant               1
Burger Joint                   1
Gym                            1
Trail                          1
Athletics & Sports             1
Park                           1
Peruvian R

In [58]:
L=cluster_2[['1st Most Common Venue',
 '2nd Most Common Venue',
 '3rd Most Common Venue',
 '4th Most Common Venue',
 '5th Most Common Venue',
 '6th Most Common Venue',
 '7th Most Common Venue',
 '8th Most Common Venue',
 '9th Most Common Venue',
 '10th Most Common Venue']]

In [59]:
# define the dataframe columns

col_names1 = ['venues_Cluster_1'] 
col_names2 = ['venues_Cluster_2'] 
col_names3 = ['venues_Cluster_3'] 
col_names4 = ['venues_Cluster_4'] 
col_names5 = ['venues_Cluster_5'] 

# instantiate the dataframes for each venue 

clust1_venues = pd.DataFrame(Ven1,columns=col_names1)
clust2_venues = pd.DataFrame(Ven2,columns=col_names2)
clust3_venues = pd.DataFrame(Ven3,columns=col_names3)
clust4_venues = pd.DataFrame(Ven4,columns=col_names4)
clust5_venues = pd.DataFrame(Ven5,columns=col_names5)

# count each venue category of each cluster
c1_venue_count =pd.DataFrame(clust1_venues['venues_Cluster_1'].value_counts(dropna=False))
c2_venue_count =pd.DataFrame(clust2_venues['venues_Cluster_2'].value_counts(dropna=False))
c3_venue_count =pd.DataFrame(clust3_venues['venues_Cluster_3'].value_counts(dropna=False))
c4_venue_count =pd.DataFrame(clust4_venues['venues_Cluster_4'].value_counts(dropna=False))
c5_venue_count =pd.DataFrame(clust5_venues['venues_Cluster_5'].value_counts(dropna=False))

<h1>Let's refine cluster characterization by expanding venue count to include all top 10 venues.</h1>

<h3>---------------------------------------</h3>

<h3>The BAKERY LOCATION SEARCH HYPOTHEISIS</h3>

<h3>The hypothesis is:  Venues that neighborhoods that have many restaurants are food friendly' neighborhoods and </h3>
<h3>'food friendly' neighborhoods that currently having no bakeries may be potentially a good location for a new bakery.</h3>
<h3>Let's look at each cluster's restaurant venues counts.</h3>
<h3>----------------------------------------------------------------------------------------------------------------</h3>

<h3>Is Cluster 1 a 'restaurant friendly' cluster ?  YES</h3>
<h3>There are 100 restaurants and 3 bakeries. Approximatetly a ratio 33:1  restaurants to bakeries  </h3>

In [60]:
c1_rest= pd.DataFrame(clust1_venues[clust1_venues['venues_Cluster_1'].str.contains('Restaurant')])
Clust_1_rest_types =pd.DataFrame(c1_rest['venues_Cluster_1'].value_counts(dropna=False))
c1_bakery= pd.DataFrame(clust1_venues[clust1_venues['venues_Cluster_1'].str.contains('Bakery')])
Clust_1_bakery_types =pd.DataFrame(c1_bakery['venues_Cluster_1'].value_counts(dropna=False))
print(Clust_1_rest_types)
print('---------------------------------------')

print('There are {} restaurants and {} bakeries.'.format(len(c1_rest),len(c1_bakery)))


                             venues_Cluster_1
Empanada Restaurant                        28
English Restaurant                         28
Ethiopian Restaurant                       26
Falafel Restaurant                          6
Egyptian Restaurant                         6
Fast Food Restaurant                        2
Sushi Restaurant                            2
South American Restaurant                   1
Eastern European Restaurant                 1
---------------------------------------
There are 100 restaurants and 3 bakeries.


<h3>Is Cluster 2 a 'restaurant friendly' cluster ?   YES</h3>
<h3>There are 191 restaurants and 14 bakeries. The ratio of restaurant to bakery = 14:1 </h3>

In [61]:
c2_rest= pd.DataFrame(clust2_venues[clust2_venues['venues_Cluster_2'].str.contains('Restaurant')])
Clust_2_rest_types =pd.DataFrame(c2_rest['venues_Cluster_2'].value_counts(dropna=False))
c2_bakery= pd.DataFrame(clust2_venues[clust2_venues['venues_Cluster_2'].str.contains('Bakery')])
Clust_2_bakery_types =pd.DataFrame(c2_bakery['venues_Cluster_2'].value_counts(dropna=False))
print(Clust_2_rest_types)
print('---------------------------------------')
print('There are {} restaurants and {} bakeries.'.format(len(c2_rest),len(c2_bakery)))

                                 venues_Cluster_2
Fast Food Restaurant                           26
Chinese Restaurant                             23
Empanada Restaurant                            19
English Restaurant                             19
Restaurant                                     13
Italian Restaurant                             11
Mexican Restaurant                              9
Ethiopian Restaurant                            8
Egyptian Restaurant                             7
Caribbean Restaurant                            7
American Restaurant                             6
Sushi Restaurant                                5
Middle Eastern Restaurant                       5
Latin American Restaurant                       5
Indian Restaurant                               4
Thai Restaurant                                 4
Falafel Restaurant                              4
Asian Restaurant                                3
Spanish Restaurant                              3


<h3>Cluster 3 is restaurant friendly?  YES</h3>
<h3>There are 42 restaurants and 0 bakeries   </h3>

In [62]:
c3_rest= pd.DataFrame(clust3_venues[clust3_venues['venues_Cluster_3'].str.contains('Restaurant')])
Clust_3_rest_types =pd.DataFrame(c3_rest['venues_Cluster_3'].value_counts(dropna=False))
c3_bakery= pd.DataFrame(clust3_venues[clust3_venues['venues_Cluster_3'].str.contains('Bakery')])
Clust_3_bakery_types =pd.DataFrame(c3_bakery['venues_Cluster_3'].value_counts(dropna=False))
print(Clust_3_rest_types)
print('---------------------------------------')
#print(Clust_1_bakery_types)
#print('---------------------------------------')
print('There are {} restaurants and {} bakeries.'.format(len(c3_rest),len(c3_bakery)))



                      venues_Cluster_3
Empanada Restaurant                 11
English Restaurant                  11
Ethiopian Restaurant                11
Fast Food Restaurant                 8
Egyptian Restaurant                  1
---------------------------------------
There are 42 restaurants and 0 bakeries.


<h3>Is Cluster 4 'restaurant friendly' cluster ?</h3>
<h3>There are 15 restaurants, but there no Bakeries. </h3>

In [63]:
c4_rest= pd.DataFrame(clust4_venues[clust4_venues['venues_Cluster_4'].str.contains('Restaurant')])
Clust_4_rest_types =pd.DataFrame(c4_rest['venues_Cluster_4'].value_counts(dropna=False))
c4_bakery= pd.DataFrame(clust4_venues[clust4_venues['venues_Cluster_4'].str.contains('Bakery')])
Clust_4_bakery_types =pd.DataFrame(c4_bakery['venues_Cluster_4'].value_counts(dropna=False))
print(Clust_4_rest_types)
print('---------------------------------------')
print('There are {} restaurants and {} bakeries.'.format(len(c4_rest),len(c4_bakery)))

                      venues_Cluster_4
English Restaurant                   5
Empanada Restaurant                  5
Ethiopian Restaurant                 5
---------------------------------------
There are 15 restaurants and 0 bakeries.


<h3>Is Cluster 5 restaurant friendly cluster ?   YES,  but there any many bakeries </h3>

<h3>There are 1005 restaurants and 101 bakeries.   Approximatetly a ratio 10:1  restaurants to bakeries</h3>

In [64]:
c5_rest= pd.DataFrame(clust5_venues[clust5_venues['venues_Cluster_5'].str.contains('Restaurant')])
Clust_5_rest_types =pd.DataFrame(c5_rest['venues_Cluster_5'].value_counts(dropna=False))
c5_bakery= pd.DataFrame(clust5_venues[clust5_venues['venues_Cluster_5'].str.contains('Bakery')])
Clust_5_bakery_types =pd.DataFrame(c5_bakery['venues_Cluster_5'].value_counts(dropna=False))
print(Clust_5_rest_types)
print('---------------------------------------')
print('There are {} restaurants and {} bakeries.'.format(len(c5_rest),len(c5_bakery)))

                                 venues_Cluster_5
Italian Restaurant                            100
Chinese Restaurant                             82
Ethiopian Restaurant                           67
English Restaurant                             67
Empanada Restaurant                            62
American Restaurant                            61
Fast Food Restaurant                           60
Mexican Restaurant                             57
Restaurant                                     47
Sushi Restaurant                               37
Caribbean Restaurant                           30
Japanese Restaurant                            30
Thai Restaurant                                23
Seafood Restaurant                             23
Latin American Restaurant                      20
Vietnamese Restaurant                          17
Asian Restaurant                               17
Indian Restaurant                              17
French Restaurant                              16


<h3>Cluster 1, 3 and 4 would be worth evaluating initially.</h3>
<h3>Cluster 1 has a high restaurant to bakery ratio of 33.  
<h3>Clusters 3 and 4 have many restauants and no bakeries.</h3>
<h3>Which cities/neighborhoods are in Clusters 1, 3 and 4  ?</h3>

In [65]:
city_neigh_134 = pd.concat([cluster1_cities[['City','Neighborhood']],
                           cluster3_cities[['City','Neighborhood']],
                           cluster4_cities[['City','Neighborhood']]],axis=0)

city_neigh_134.reset_index(drop=True)

Unnamed: 0,City,Neighborhood
0,New York City,Clason Point
1,New York City,Somerville
2,New York City,Todt Hill
3,New York City,Randall Manor
4,New York City,Bayswater
5,Totonto,Parkwoods
6,Totonto,Caledonia-Fairbanks
7,Totonto,East Toronto
8,Totonto,CFB Toronto
9,Totonto,Downsview East


<h4>Conclusions:</h4>
<h4>Toronto and New York City both have urban neighborhoods.  During the clustering creation a fixed radius (300 meters) was used.</h4>
<h4>The neighborhoods in Clusters 1, 3 and 4 are 'restaurant friendly' and currently have either no bakeries or have a high</h4> 
<h4>restaurant to bakery ratio and represent a potential favorable location for a new bakery.</h4>
<h4>There are 517 neighborhoods in Toronto and New York City.</h4>
<h4>Using the Kmeans algorithm, the number of neighborhood in these two cities that</h4>
<h4>is 44.  So, reducing potential neighborhoods for a new bakery location from 517 to 44 using machine learning.</h4>
<h4>This is just a 'quick look'.  Future studies would include demographic data.</h4>