# Data Science Capstone Project - The Battle of Neighborhoods

In [None]:

Business Problem section
Background

Newsweek Japan reported a popularity ranking of countries for moving abroad in the wake of the coronavirus pandemic.
Among 101 countries, Canada ranked first and Japan ranked second.
Of the 101 countries, 29 ranked Canada as the country they would most like to immigrate to. Japan was ranked number one by 13 countries, making it the world's second most popular destination. Third place went to Spain, followed by Germany, Qatar and Australia. Canada's neighbor, the United States, ranked ninth.
Business Problem

New York, Toronto, and Tokyo differ markedly in culture and geographical location.
I want to find out which neighborhoods in these cities are similar to one another and which are different.
Data Science Methodology

    Collect data
    Explore and understand data
    Data preparation and preprocessing
    Modeling
    Visualization

    Search the internet for neighborhood address data for New York, Toronto, and Tokyo.
    Convert addresses into their equivalent latitude and longitude values, and use the Foursquare API to explore neighborhoods.
    Use the explore function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters.
    Use hierarchical agglomerative clustering to compare neighborhoods across the cities.
    Use WordCloud and the Folium library to visualize the neighborhoods in New York City and their emerging clusters.
    Finally, draw a conclusion in the last step.
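
The clustering step named in the plan can be illustrated with a small, self-contained sketch. It is a pure-Python version of bottom-up (agglomerative) clustering with average linkage and Euclidean distance. The linkage choice, the `agglomerative` helper, and the frequency profiles are assumptions made for illustration, not the notebook's actual model or data.

```python
from itertools import combinations
from math import dist


def agglomerative(points, n_clusters):
    """Bottom-up clustering with average linkage; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between the members of two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (a < b) and merge b into a.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical venue-category frequencies (coffee, ramen, pizza) per neighborhood.
names = ["Nagatacho", "Akihabara", "Kingsbridge", "Riverdale"]
profiles = [(0.5, 0.5, 0.0), (0.4, 0.6, 0.0), (0.2, 0.0, 0.8), (0.1, 0.0, 0.9)]

groups = agglomerative(profiles, n_clusters=2)
labelled = [[names[i] for i in group] for group in groups]
print(labelled)
```

A library implementation would normally replace this loop; the sketch only shows the idea of repeatedly merging the closest pair of clusters.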



# 1. Import packages

In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json
from geopy.geocoders import Nominatim 
import requests 
from pandas import json_normalize  # the pandas.io.json location is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import folium # map rendering library

print('Libraries imported.')

ModuleNotFoundError: No module named 'folium'

# 2. Define a city map function that colors neighborhood markers by Borough

In [2]:
def city_map(city,country,df):
    # create map
    address = city + ',' + country

    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    citymap = folium.Map(location=[latitude, longitude], zoom_start=10)

    # set color scheme: one rainbow color per Borough
    borough_name = df['Borough'].unique().tolist()
    colors_array = cm.rainbow(np.linspace(0, 1, len(borough_name)))
    rainbow = [colors.rgb2hex(i) for i in colors_array]

    # add markers to the map
    for lat, lon, neighborhood, borough in zip(df['Latitude'], df['Longitude'], df['Neighborhood'], df['Borough']):
        cluster = borough_name.index(borough)
        label = '{}, {}'.format(neighborhood, borough)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster],
            fill=True,
            fill_color=rainbow[cluster],
            fill_opacity=0.7).add_to(citymap)
    
    return citymap
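
The idea behind the color scheme, one distinct color per borough, can be shown without any plotting library. The sketch below uses evenly spaced HSV hues from the standard `colorsys` module, which is a substitution for illustration and not the rainbow colormap used above; the `borough_palette` helper is hypothetical.

```python
import colorsys


def borough_palette(boroughs):
    """Map each distinct borough name to a hex color with evenly spaced hues."""
    unique = list(dict.fromkeys(boroughs))  # drops duplicates, keeps first-seen order
    palette = {}
    for index, name in enumerate(unique):
        hue = index / len(unique)
        r, g, b = colorsys.hsv_to_rgb(hue, 0.9, 0.9)
        palette[name] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255))
    return palette


palette = borough_palette(["Bronx", "Bronx", "Queens", "Brooklyn"])
print(palette)
```

Looking a color up by borough name in a dictionary avoids the repeated `list.index` search inside the marker loop.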

# 3. Download and Explore Dataset

In [None]:
with open('new_york.json') as json_data:
    newyork_data = json.load(json_data)
    
neighborhoods_data_ny = newyork_data['features']

# define the dataframe with five columns: City, Borough, Neighborhood, Latitude, Longitude
column_names = ['City','Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# collect one row per neighborhood (DataFrame.append was removed in pandas 2.0)
ny_rows = []
for data in neighborhoods_data_ny:
    borough = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    ny_rows.append({'City': 'New York',
                    'Borough': borough,
                    'Neighborhood': neighborhood_name,
                    'Latitude': neighborhood_lat,
                    'Longitude': neighborhood_lon})

ny_neighborhoods = pd.DataFrame(ny_rows, columns=column_names)
print(ny_neighborhoods.shape)
ny_neighborhoods.head()

In [None]:
New York neighborhood data


 	City 	Borough 	Neighborhood 	Latitude 	Longitude
0 	New York 	Bronx 	Wakefield 	40.894705 	-73.847201
1 	New York 	Bronx 	Co-op City 	40.874294 	-73.829939
2 	New York 	Bronx 	Eastchester 	40.887556 	-73.827806
3 	New York 	Bronx 	Fieldston 	40.895437 	-73.905643
4 	New York 	Bronx 	Riverdale 	40.890834 	-73.912585
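
To make the shape of the input file concrete, here is a minimal sketch of how one GeoJSON feature becomes one table row. The sample is hand-written to mirror only the fields the loading code reads (it is not the real file), and `feature_to_row` is a hypothetical helper.

```python
import json

# Hand-written sample shaped like the entries in new_york.json; values match the first two table rows.
sample = json.loads("""
{"features": [
  {"properties": {"borough": "Bronx", "name": "Wakefield"},
   "geometry": {"coordinates": [-73.847201, 40.894705]}},
  {"properties": {"borough": "Bronx", "name": "Co-op City"},
   "geometry": {"coordinates": [-73.829939, 40.874294]}}
]}
""")


def feature_to_row(feature, city="New York"):
    """Flatten one GeoJSON feature into a row dict; coordinates are [lon, lat]."""
    lon, lat = feature["geometry"]["coordinates"][:2]
    return {
        "City": city,
        "Borough": feature["properties"]["borough"],
        "Neighborhood": feature["properties"]["name"],
        "Latitude": lat,
        "Longitude": lon,
    }


rows = [feature_to_row(f) for f in sample["features"]]
print(rows[0])
```

The point to watch is the coordinate order: GeoJSON stores longitude first, so the two values have to be swapped before they go into the Latitude and Longitude columns.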

# We see that all the neighborhoods in the same Borough share one color and are well separated on the map, indicating good data quality.

In [None]:
Toronto neighborhood data

In [6]:
# Read Toronto neighborhood data from the Wikipedia page
toronto = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M',header=0)[0]
# Clean dataframe: drop rows with 'Not assigned' (copy so the later in-place edits do not act on a slice)
toronto = toronto[~toronto['Borough'].str.contains("Not assigned", na=False)].copy()
# rename columns
toronto.rename(columns={"Postal Code":'PostalCode',"Neighbourhood":"Neighborhood"}, inplace=True)
# sort by postal code
toronto = toronto.sort_values(by=['PostalCode'])
# Reset index for dataframe
toronto.reset_index(drop=True,inplace=True)
print(toronto.shape)
toronto.head()

(103, 3)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [7]:
#Read Toronto geo
url = 'https://cocl.us/Geospatial_data/Geospatial_Coordinates.csv'
toronto_geo = pd.read_csv(url)
toronto_geo.rename(columns={"Postal Code":'PostalCode'}, inplace=True)
toronto_geo = pd.DataFrame(toronto_geo).sort_values(by=['PostalCode'])
print(toronto_geo.shape)
toronto_geo.head()

(103, 3)


Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [8]:
# merge toronto city and geo to toronto neighborhoods
toronto_neighborhoods = pd.merge(toronto,toronto_geo,on='PostalCode')
toronto_neighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [9]:
# change PostalCode to City
toronto_neighborhoods.rename(columns={'PostalCode':'City'}, inplace=True)
toronto_neighborhoods["City"] = "Toronto"
print(toronto_neighborhoods.shape)
toronto_neighborhoods.head()

(103, 5)


Unnamed: 0,City,Borough,Neighborhood,Latitude,Longitude
0,Toronto,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,Toronto,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,Toronto,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,Toronto,Scarborough,Woburn,43.770992,-79.216917
4,Toronto,Scarborough,Cedarbrae,43.773136,-79.239476


In [10]:


toronto_neighborhoods['Borough'].unique()



array(['Scarborough', 'North York', 'East York', 'East Toronto',
       'Central Toronto', 'Downtown Toronto', 'York', 'West Toronto',
       'Mississauga', 'Etobicoke'], dtype=object)


# Tokyo neighborhood data

In [11]:
# https://en.wikipedia.org/wiki/Special_wards_of_Tokyo#List_of_special_wards
# Read Tokyo neighborhood data from wikipedia page
tk = pd.read_html('https://en.wikipedia.org/wiki/Special_wards_of_Tokyo#List_of_special_wards',header=0)[3]
tk = pd.DataFrame(tk)
#print(tokyo.columns.values)
tk.drop(['No.','Flag','Density(/km2)','Area(km2)','Population(as of October\xa02016','Kanji'],axis=1,inplace=True)
#rename column names
tk.rename(columns={"Name":'Borough',"Major districts":"Neighborhood"}, inplace=True)
# add a leading City column
tokyo = tk.copy()
tokyo.insert(0, "City", "Tokyo")
print(tokyo.shape)
tokyo.head()

(24, 3)


Unnamed: 0,City,Borough,Neighborhood
0,Tokyo,Chiyoda,"Nagatachō, Kasumigaseki, Ōtemachi, Marunouchi,..."
1,Tokyo,Chūō,"Nihonbashi, Kayabachō, Ginza, Tsukiji, Hatchōb..."
2,Tokyo,Minato,"Odaiba, Shinbashi, Hamamatsuchō, Mita, Roppong..."
3,Tokyo,Shinjuku,"Shinjuku, Takadanobaba, Ōkubo, Kagurazaka, Ich..."
4,Tokyo,Bunkyō,"Hongō, Yayoi, Hakusan"


In [12]:
tokyo.replace(regex=['Ō'], value='O',inplace=True)
tokyo.replace(regex=['ō'], value='o',inplace=True)
tokyo.replace(regex=['ū'], value='u',inplace=True)
tokyo.head()

Unnamed: 0,City,Borough,Neighborhood
0,Tokyo,Chiyoda,"Nagatacho, Kasumigaseki, Otemachi, Marunouchi,..."
1,Tokyo,Chuo,"Nihonbashi, Kayabacho, Ginza, Tsukiji, Hatchob..."
2,Tokyo,Minato,"Odaiba, Shinbashi, Hamamatsucho, Mita, Roppong..."
3,Tokyo,Shinjuku,"Shinjuku, Takadanobaba, Okubo, Kagurazaka, Ich..."
4,Tokyo,Bunkyo,"Hongo, Yayoi, Hakusan"


In [13]:
columns = ["City",'Borough','Neighborhood']
# collect one row per neighborhood (DataFrame.append was removed in pandas 2.0)
tokyo_rows = []
for i in range(tokyo.shape[0]):
    borough = tokyo.loc[i,'Borough']
    neighborhood = tokyo.loc[i,'Neighborhood']
    
    # skip wards that list no major districts
    if pd.isnull(neighborhood):
        continue
    # split() also handles cells without a comma; strip the space left after each comma
    for name in neighborhood.split(','):
        tokyo_rows.append({'City': 'Tokyo',
                           'Borough': borough,
                           'Neighborhood': name.strip()})

tokyo_df = pd.DataFrame(tokyo_rows, columns=columns)
print(tokyo_df.shape)
tokyo_df.head()

(106, 3)


Unnamed: 0,City,Borough,Neighborhood
0,Tokyo,Chiyoda,Nagatacho
1,Tokyo,Chiyoda,Kasumigaseki
2,Tokyo,Chiyoda,Otemachi
3,Tokyo,Chiyoda,Marunouchi
4,Tokyo,Chiyoda,Akihabara
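
When splitting comma-separated cells like these, note that `str.find` is not a membership test: it returns an index, and the "not found" value is truthy. The short sketch below shows the pitfall and a hypothetical `split_neighborhoods` helper that splits and trims in one step.

```python
# str.find returns -1 (truthy) when the separator is missing and 0 (falsy)
# when it is the first character, so it is not a reliable membership test.
print("Woburn".find(","))        # -1
print(bool("Woburn".find(",")))  # True
print("," in "Woburn")           # False


def split_neighborhoods(cell):
    """Split a comma-separated cell into trimmed, non-empty names."""
    return [part.strip() for part in cell.split(",") if part.strip()]


print(split_neighborhoods("Hongo, Yayoi, Hakusan"))
print(split_neighborhoods("Woburn"))
```

Because `split` returns a one-element list when no comma is present, no separate check for the separator is needed.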


# Convert addresses into their equivalent latitude and longitude values

In [16]:
#create a new dataframe tk_neighborhoods to find all neighborhoods with the location information
# reuse one geocoder instead of creating it on every iteration
geolocator = Nominatim(user_agent="ny_explorer")
tk_rows = []

for i in range(tokyo_df.shape[0]):
    borough = tokyo_df.loc[i,'Borough']
    neighborhood = tokyo_df.loc[i,'Neighborhood']
      
    #find the location data, ignore the neighborhoods that are unable to be located by Nominatim
    location = geolocator.geocode("{},{},Tokyo,Japan".format(neighborhood,borough))
    
    #retry once with a less specific query
    if location is None: 
        location = geolocator.geocode("{},Tokyo,Japan".format(neighborhood))
        
    if location is None: 
        print("The location data of {} in {} is not available!".format(neighborhood,borough))
    else:
        tk_rows.append({'City': 'Tokyo',
                        'Borough': borough,
                        'Neighborhood': neighborhood,
                        'Latitude': location.latitude,
                        'Longitude': location.longitude})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
tk_neighborhoods = pd.DataFrame(tk_rows, columns=['City','Borough','Neighborhood','Latitude','Longitude'])
tk_neighborhoods.head()

Unnamed: 0,City,Borough,Neighborhood,Latitude,Longitude
0,Tokyo,Chiyoda,Nagatacho,35.675618,139.743469
1,Tokyo,Chiyoda,Kasumigaseki,35.674054,139.750972
2,Tokyo,Chiyoda,Otemachi,35.684631,139.766466
3,Tokyo,Chiyoda,Marunouchi,35.680656,139.765222
4,Tokyo,Chiyoda,Akihabara,35.698768,139.774255
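
The "try a specific query, then a broader one" pattern used for geocoding can be factored into a small helper. The sketch below is an assumption-laden illustration: `geocode_with_fallback` is hypothetical, and a dictionary lookup stands in for the real geocoder so the example runs offline.

```python
def geocode_with_fallback(geocode, queries):
    """Try each query in order; return (query, location) for the first hit, else (None, None)."""
    for query in queries:
        location = geocode(query)
        if location is not None:
            return query, location
    return None, None


# Stand-in geocoder for illustration; a real run would pass the geocode method of a Nominatim instance.
known = {"Akihabara,Tokyo,Japan": (35.698768, 139.774255)}
fake_geocode = known.get

queries = ["Akihabara,Chiyoda,Tokyo,Japan", "Akihabara,Tokyo,Japan"]
used, location = geocode_with_fallback(fake_geocode, queries)
print(used, location)
```

With the public Nominatim service, requests should also be spaced out (its usage policy asks for at most one request per second); geopy ships a `RateLimiter` wrapper in `geopy.extra.rate_limiter` that can wrap the geocode call for this purpose.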



In [17]:
print(tk_neighborhoods.shape)
tk_neighborhoods['Borough'].unique()

(106, 5)


array(['Chiyoda', 'Chuo', 'Minato', 'Shinjuku', 'Bunkyo', 'Taito',
       'Sumida', 'Koto', 'Shinagawa', 'Meguro', 'Ota', 'Setagaya',
       'Shibuya', 'Nakano', 'Suginami', 'Toshima', 'Kita', 'Arakawa',
       'Itabashi', 'Nerima', 'Adachi', 'Katsushika', 'Edogawa'],
      dtype=object)



In [18]:
print('The dataframe has {} Borough and {} neighborhoods.'.format(
        len(tokyo_df['Borough'].unique()),
        tokyo_df.shape[0]
    )
)

The dataframe has 23 Borough and 106 neighborhoods.


In [None]:
city_map('Tokyo','Japan', tk_neighborhoods)


# Save the data

In [None]:
ny_neighborhoods.to_csv('ny_neighborhoods.csv', index=False)
toronto_neighborhoods.to_csv('toronto_neighborhoods.csv', index=False)
tk_neighborhoods.to_csv('tk_neighborhoods.csv', index=False)

# Use the Foursquare API to explore neighborhoods.

In [None]:
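import os

# read the Foursquare credentials from the environment instead of hard-coding them
# (the environment variable names are placeholders; use whatever names you export)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret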

VERSION = '20180605' # Foursquare API version
LIMIT = 100 # only return the top 100 venues

# define a function to explore the venues within a radius (100 meters by default) of each neighborhood
def getNearbyVenues(nborhood, radius=100):
    
    rows = []
    for city, borough, neighborhood, lat, lng in zip(nborhood['City'], nborhood['Borough'], nborhood['Neighborhood'], nborhood['Latitude'], nborhood['Longitude']):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        
        # make the GET request; fall back to an empty list when the response has no venue groups
        groups = requests.get(url).json().get('response', {}).get('groups', [])
        results = groups[0]['items'] if groups else []
        
        # keep only the relevant information for each nearby venue
        for v in results:
            categories = v['venue']['categories']
            rows.append({'City':city, 'Borough':borough, 'Neighborhood': neighborhood, 
                         'VenueName': v['venue']['name'], 
                         'VenueCategory': categories[0]['name'] if categories else None})

    # DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
    return pd.DataFrame(rows, columns=['City','Borough','Neighborhood','VenueName','VenueCategory'])
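
Parsing the explore response is the fragile part of this step, since a response may lack venue groups or a venue may have no categories. The following sketch shows one defensive way to pull out name and category pairs. The sample payload is hand-written and mirrors only the fields the function above reads; `extract_venues` is a hypothetical helper, not part of any Foursquare client.

```python
def extract_venues(payload):
    """Return (name, category) pairs from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    venues = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        venues.append((venue.get("name"), category))
    return venues


# Hand-written sample mirroring only the fields the notebook reads.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Stop & Shop", "categories": [{"name": "Supermarket"}]}},
    {"venue": {"name": "Unnamed Spot", "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"response": {}}))
```

Keeping the parsing separate from the HTTP request also makes it possible to test the logic without network access or credentials.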

# Look at neighborhoods in New York

In [None]:
# obtain the nearby venues of each neighborhood in New York
ny_venues = getNearbyVenues(ny_neighborhoods)
ny_venues.head()

 	City 	Borough 	Neighborhood 	VenueName 	VenueCategory
0 	New York 	Bronx 	Fieldston 	Ecco Salon 	Cosmetics Shop
1 	New York 	Bronx 	Kingsbridge 	Garden Gourmet Market 	Gourmet Shop
2 	New York 	Bronx 	Kingsbridge 	MyUnique 	Thrift / Vintage Store
3 	New York 	Bronx 	Kingsbridge 	Stop & Shop 	Supermarket
4 	New York 	Bronx 	Kingsbridge 	Mattress Firm 	Mattress Store

In [None]:
#save the data
ny_venues.to_csv('ny_venues.csv', index=False)
ny_venues.shape

In [None]:
(795, 5)

In [None]:
# rename columns name
ny_venues = ny_venues.rename(columns={'Neighborhood': 'NeighborhoodName'})

In [None]:
ny_venues_category_counts = ny_venues["VenueCategory"].value_counts()
# name the count column directly; in pandas 2.x the series is named "count", so a rename from "VenueCategory" would do nothing
ny_venues_category_counts = ny_venues_category_counts.to_frame(name="NewYork_VenueCategory")
ny_venues_category_counts

 	NewYork_VenueCategory
Pizza Place 	31
Chinese Restaurant 	29
Deli / Bodega 	27
Coffee Shop 	25
Italian Restaurant 	18
Grocery Store 	16
Bar 	15
Café 	14
Ice Cream Shop 	13
Mexican Restaurant 	13
Sandwich Place 	12
Park 	12
American Restaurant 	11
Gym 	11
Bagel Shop 	11
Gourmet Shop 	11
Bakery 	11
Bus Stop 	11
Pharmacy 	11
Gym / Fitness Center 	10
Hotel 	10
Donut Shop 	9
Diner 	9
Playground 	9
Korean Restaurant 	8
Cosmetics Shop 	8
Food Truck 	8
Cocktail Bar 	8
Bank 	8
Greek Restaurant 	8
Vegetarian / Vegan Restaurant 	7
Restaurant 	7
Thai Restaurant 	7
Art Gallery 	7
Burger Joint 	7
Wine Shop 	7
Sushi Restaurant 	7
Bus Station 	7
Wine Bar 	6
Discount Store 	6
Juice Bar 	6
Thrift / Vintage Store 	6
Fast Food Restaurant 	6
Pet Store 	6
Japanese Restaurant 	6
Boat or Ferry 	5
Liquor Store 	5
Supermarket 	5
Beach 	5
Asian Restaurant 	5
Fried Chicken Joint 	5
Home Service 	4
Arts & Crafts Store 	4
Gift Shop 	4
Concert Hall 	4
Shipping Store 	4
Performing Arts Venue 	4
New American Restaurant 	4
Optical Shop 	4
Mediterranean Restaurant 	4
Indian Restaurant 	4
Yoga Studio 	4
Toy / Game Store 	4
Food & Drink Shop 	4
Caribbean Restaurant 	4
Spa 	4
Pub 	4
Vietnamese Restaurant 	3
Clothing Store 	3
Dessert Shop 	3
Latin American Restaurant 	3
Dumpling Restaurant 	3
Dance Studio 	3
Fruit & Vegetable Store 	3
Movie Theater 	3
Beer Garden 	3
Spanish Restaurant 	3
Mobile Phone Shop 	3
Train Station 	3
Department Store 	3
Hotel Bar 	3
Theater 	3
Lounge 	3
Bookstore 	3
Sporting Goods Shop 	3
Historic Site 	3
Pier 	2
Malay Restaurant 	2
Polish Restaurant 	2
Dog Run 	2
Bubble Tea Shop 	2
Monument / Landmark 	2
Music Venue 	2
Frozen Yogurt Shop 	2
Salon / Barbershop 	2
Roof Deck 	2
Food 	2
Residential Building (Apartment / Condo) 	2
Pool 	2
Nail Salon 	2
Salad Place 	2
Racetrack 	2
Food Court 	2
Check Cashing Service 	2
Tapas Restaurant 	2
Jewelry Store 	2
Nightclub 	2
Farmers Market 	2
Noodle House 	2
Hookah Bar 	2
Tea Room 	2
Cycle Studio 	2
Miscellaneous Shop 	1
Accessories Store 	1
Steakhouse 	1
Argentinian Restaurant 	1
Sports Bar 	1
Other Great Outdoors 	1
Tattoo Parlor 	1
Shoe Store 	1
Himalayan Restaurant 	1
Cupcake Shop 	1
Pilates Studio 	1
Basketball Court 	1
Speakeasy 	1
South American Restaurant 	1
Board Shop 	1
Construction & Landscaping 	1
Opera House 	1
Szechuan Restaurant 	1
Real Estate Office 	1
Bridal Shop 	1
Middle Eastern Restaurant 	1
Building 	1
Seafood Restaurant 	1
French Restaurant 	1
Furniture / Home Store 	1
Gastropub 	1
Museum 	1
Smoothie Shop 	1
Community Center 	1
Plaza 	1
BBQ Joint 	1
Piercing Parlor 	1
High School 	1
Shopping Mall 	1
Bike Shop 	1
Hotpot Restaurant 	1
Rock Climbing Spot 	1
Gas Station 	1
Bike Rental / Bike Share 	1
Neighborhood 	1
Taco Place 	1
Indoor Play Area 	1
Pool Hall 	1
Tailor Shop 	1
German Restaurant 	1
Boxing Gym 	1
Tennis Court 	1
Lingerie Store 	1
Hawaiian Restaurant 	1
Comedy Club 	1
Caucasian Restaurant 	1
Music Store 	1
Laundry Service 	1
Gay Bar 	1
Laundromat 	1
Puerto Rican Restaurant 	1
Shop & Service 	1
Tex-Mex Restaurant 	1
Doctor's Office 	1
Fish & Chips Shop 	1
Garden 	1
Falafel Restaurant 	1
Southern / Soul Food Restaurant 	1
Kids Store 	1
Shanghai Restaurant 	1
Varenyky restaurant 	1
Filipino Restaurant 	1
Flower Shop 	1
Art Museum 	1
Pakistani Restaurant 	1
Drugstore 	1
Metro Station 	1
Climbing Gym 	1
Trail 	1
Mattress Store 	1
Poke Place 	1
Video Store 	1
Bus Line 	1
Sports Club 	1
Massage Studio 	1
Cooking School 	1
Afghan Restaurant 	1
Moving Target 	1
Bistro 	1
Convenience Store 	1
Library 	1
Beer Bar 	1
Big Box Store 	1
Farm 	1
Creperie 	1
Scenic Lookout 	1
Temple 	1
Baseball Field 	1
Paper / Office Supplies Store 	1
Women's Store 	1
Cheese Shop 	1
Outdoor Sculpture 	1

In [None]:
#ny_venues_category_counts.index.values

In [None]:
We now have a total of 795 venues for New York.

# Visualize with WordCloud

In [None]:
# import package and its set of stopwords
from wordcloud import WordCloud, STOPWORDS

In [None]:
#venues_counts
def word_string(venues_category_counts):
    # repeat each category name once per occurrence, separated by spaces
    words = []
    for i in range(venues_category_counts.shape[0]):
        repeat_num_times = int(venues_category_counts.iloc[i,0])
        words.append((venues_category_counts.iloc[i].name + ' ') * repeat_num_times)
                                     
    # return the generated text
    return ''.join(words)

ny_word_string = word_string(ny_venues_category_counts)
#ny_word_string
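
An alternative to building one long repeated string is to count the categories directly and keep each multi-word name as a single key. The sketch below uses only the standard library; the `category_frequencies` helper and the sample list are hypothetical.

```python
from collections import Counter


def category_frequencies(categories):
    """Count how often each venue category occurs, keeping multi-word names whole."""
    return dict(Counter(categories))


# Hypothetical sample; the notebook would pass the VenueCategory column instead.
sample = ["Pizza Place", "Coffee Shop", "Pizza Place", "Deli / Bodega"]
frequencies = category_frequencies(sample)
print(frequencies)
```

A dictionary like this can be passed to `WordCloud.generate_from_frequencies`, which sizes each key by its count. By contrast, `generate` tokenizes the text into words, so a name such as "Pizza Place" may be split or recombined in the rendered cloud.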

In [None]:
# create the word cloud
wordcloud = WordCloud(background_color='white').generate(ny_word_string)

In [None]:


# display the cloud
fig = plt.figure()
fig.set_figwidth(14)
fig.set_figheight(18)

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
fig.savefig("ny_img.png")



In [None]:


As you can see, the top 10 venue categories in New York are:

    Pizza Place 31
    Chinese Restaurant 29
    Deli / Bodega 27
    Coffee Shop 25
    Italian Restaurant 18
    Grocery Store 16
    Bar 15
    Café 14
    Ice Cream Shop 13
    Mexican Restaurant 13



# Look at neighborhoods in Toronto

In [None]:
# repeat on Toronto
toronto_venues = getNearbyVenues(toronto_neighborhoods)
toronto_venues.head()

 	City 	Borough 	Neighborhood 	VenueName 	VenueCategory
0 	Toronto 	Scarborough 	Scarborough Village 	McCowan Park 	Playground
1 	Toronto 	Scarborough 	Steeles West, L'Amoreaux West 	Mr Congee Chinese Cuisine 龍粥記 	Chinese Restaurant
2 	Toronto 	Scarborough 	Steeles West, L'Amoreaux West 	Subway 	Sandwich Place
3 	Toronto 	Scarborough 	Steeles West, L'Amoreaux West 	KFC 	Fast Food Restaurant
4 	Toronto 	Scarborough 	Steeles West, L'Amoreaux West 	Tim Hortons 	Coffee Shop

In [None]:
#save the data
toronto_venues.to_csv('toronto_venues.csv', index=False)

#  to avoid multi-index problem
toronto_venues = toronto_venues.rename(columns={'Neighborhood': 'NeighborhoodName'})
toronto_venues.shape

In [None]:
(141, 5)



In [None]:
toronto_venues_category_counts = toronto_venues["VenueCategory"].value_counts()
# name the count column directly; in pandas 2.x the series is named "count", so a rename from "VenueCategory" would do nothing
toronto_venues_category_counts = toronto_venues_category_counts.to_frame(name="Toronto_VenueCategory")
toronto_venues_category_counts

 	Toronto_VenueCategory
Coffee Shop 	17
Café 	6
Italian Restaurant 	6
Japanese Restaurant 	5
Restaurant 	5
Sushi Restaurant 	4
Sandwich Place 	4
Bakery 	4
Deli / Bodega 	4
Gym 	3
Fast Food Restaurant 	3
Breakfast Spot 	3
Dessert Shop 	3
Bar 	2
Concert Hall 	2
Seafood Restaurant 	2
Burger Joint 	2
Liquor Store 	2
Cupcake Shop 	2
Salad Place 	2
Park 	2
Pharmacy 	2
Art Gallery 	2
Pub 	2
Building 	1
Hot Dog Joint 	1
School 	1
Burrito Place 	1
American Restaurant 	1
Tea Room 	1
Gift Shop 	1
Vegetarian / Vegan Restaurant 	1
Playground 	1
Toy / Game Store 	1
Trail 	1
Supermarket 	1
Garden 	1
Spa 	1
Thai Restaurant 	1
Cocktail Bar 	1
Electronics Store 	1
Hotel 	1
Bank 	1
Brewery 	1
Middle Eastern Restaurant 	1
Grocery Store 	1
Housing Development 	1
Fried Chicken Joint 	1
Gay Bar 	1
Chinese Restaurant 	1
Asian Restaurant 	1
Beer Store 	1
Boutique 	1
Taco Place 	1
Gastropub 	1
Farmers Market 	1
Gluten-free Restaurant 	1
Food Court 	1
Soup Place 	1
Pool 	1
Indian Restaurant 	1
Performing Arts Venue 	1
College Gym 	1
Yoga Studio 	1
Diner 	1
Record Shop 	1
Greek Restaurant 	1
Furniture / Home Store 	1
Cosmetics Shop 	1
Roof Deck 	1
Clothing Store 	1
Gym / Fitness Center 	1
Dance Studio 	1
Theater 	1
Thrift / Vintage Store 	1
Bookstore 	1

In [None]:
#toronto_word_string
toronto_word_string = word_string(toronto_venues_category_counts)
# create the word cloud
toronto_wordcloud = WordCloud(background_color='white').generate(toronto_word_string)
# display the cloud
fig = plt.figure()
fig.set_figwidth(14)
fig.set_figheight(18)

plt.imshow(toronto_wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
fig.savefig("toronto_img.png")

In [None]:

Top 10 venue categories in Toronto:

    Coffee Shop 17
    Café 6
    Italian Restaurant 6
    Japanese Restaurant 5
    Restaurant 5
    Sushi Restaurant 4
    Sandwich Place 4
    Bakery 4
    Deli / Bodega 4
    Gym 3



# Look at neighborhoods in Tokyo

In [None]:
# repeat on tokyo
tokyo_venues = getNearbyVenues(tk_neighborhoods)
tokyo_venues.head()

 	City 	Borough 	Neighborhood 	VenueName 	VenueCategory
0 	Tokyo 	Chiyoda 	Nagatacho 	Tully's Coffee 	Coffee Shop
1 	Tokyo 	Chiyoda 	Nagatacho 	Tully's Coffee 	Coffee Shop
2 	Tokyo 	Chiyoda 	Nagatacho 	7-Eleven (セブンイレブン) 	Convenience Store
3 	Tokyo 	Chiyoda 	Nagatacho 	Yoshinoya (吉野家) 	Donburi Restaurant
4 	Tokyo 	Chiyoda 	Nagatacho 	一茶そば 	Japanese Restaurant

In [None]:
#save the data
tokyo_venues.to_csv('tokyo_venues.csv', index=False)

#  to avoid multi-index problem
tokyo_venues = tokyo_venues.rename(columns={'Neighborhood': 'NeighborhoodName'})
tokyo_venues.shape

In [None]:
(1120, 5)

In [None]:
tokyo_venues_category_counts = tokyo_venues["VenueCategory"].value_counts()
# name the count column directly; in pandas 2.x the series is named "count", so a rename from "VenueCategory" would do nothing
tokyo_venues_category_counts = tokyo_venues_category_counts.to_frame(name="Tokyo_VenueCategory")
tokyo_venues_category_counts

 	Tokyo_VenueCategory
Convenience Store 	86
Coffee Shop 	76
Sake Bar 	63
Café 	51
Ramen Restaurant 	49
Japanese Restaurant 	41
Chinese Restaurant 	27
Italian Restaurant 	27
Soba Restaurant 	24
Donburi Restaurant 	22
BBQ Joint 	20
Fast Food Restaurant 	20
Train Station 	20
Bakery 	18
Steakhouse 	16
Sushi Restaurant 	15
Supermarket 	15
Dessert Shop 	15
Shopping Mall 	14
Bookstore 	12
Ice Cream Shop 	12
Tonkatsu Restaurant 	12
Bar 	11
Yoshoku Restaurant 	11
Clothing Store 	11
ATM 	10
Bus Stop 	10
Drugstore 	10
Discount Store 	10
Udon Restaurant 	10
Park 	10
Noodle House 	9
Platform 	9
Seafood Restaurant 	9
Indian Restaurant 	8
Beer Bar 	8
Pharmacy 	8
French Restaurant 	8
Arcade 	7
Tempura Restaurant 	7
Fried Chicken Joint 	7
Japanese Curry Restaurant 	7
Teishoku Restaurant 	7
Sandwich Place 	7
Tea Room 	6
Restaurant 	6
Grocery Store 	6
Hotel 	6
Thai Restaurant 	6
Takoyaki Place 	6
Yakitori Restaurant 	6
Dumpling Restaurant 	6
Burger Joint 	5
Japanese Family Restaurant 	5
Bistro 	5
Korean Restaurant 	5
Plaza 	4
Cosmetics Shop 	4
Outdoor Sculpture 	4
Metro Station 	4
Wagashi Place 	4
American Restaurant 	4
Theater 	4
Deli / Bodega 	4
Unagi Restaurant 	4
Furniture / Home Store 	4
Playground 	3
Spa 	3
Bed & Breakfast 	3
Shabu-Shabu Restaurant 	3
Historic Site 	3
Juice Bar 	3
Art Gallery 	3
Electronics Store 	3
Wine Bar 	3
Flower Shop 	3
Pub 	3
Bubble Tea Shop 	3
Gift Shop 	3
Szechuan Restaurant 	3
Donut Shop 	3
Pastry Shop 	2
Hobby Shop 	2
Athletics & Sports 	2
Dim Sum Restaurant 	2
Hookah Bar 	2
Pizza Place 	2
Movie Theater 	2
Cocktail Bar 	2
Museum 	2
Brazilian Restaurant 	2
Sporting Goods Shop 	2
Kushikatsu Restaurant 	2
Taxi Stand 	2
Gym 	2
Gym / Fitness Center 	2
Rock Club 	2
Intersection 	2
Singaporean Restaurant 	2
Vietnamese Restaurant 	2
Boutique 	2
History Museum 	2
Record Shop 	2
Fishing Spot 	2
Tourist Information Center 	2
Stationery Store 	2
Comedy Club 	2
Soup Place 	2
Food & Drink Shop 	1
Roof Deck 	1
Performing Arts Venue 	1
Pet Café 	1
Cantonese Restaurant 	1
Wine Shop 	1
Bath House 	1
Scenic Lookout 	1
Sports Bar 	1
Light Rail Station 	1
Track 	1
Department Store 	1
Toy / Game Store 	1
Gastropub 	1
Art Museum 	1
Falafel Restaurant 	1
Asian Restaurant 	1
Sri Lankan Restaurant 	1
Creperie 	1
Luggage Store 	1
Used Bookstore 	1
Nabe Restaurant 	1
Nightclub 	1
Hotel Bar 	1
Pool 	1
Concert Hall 	1
Arts & Crafts Store 	1
North Indian Restaurant 	1
Bagel Shop 	1
Garden 	1
Kebab Restaurant 	1
Farmers Market 	1
Hong Kong Restaurant 	1
Tapas Restaurant 	1
Indie Theater 	1
Okonomiyaki Restaurant 	1
Chocolate Shop 	1
Comic Shop 	1
Multiplex 	1
Liquor Store 	1
Public Art 	1
Candy Store 	1
Event Space 	1
Jazz Club 	1
Gelato Shop 	1
German Restaurant 	1
Pachinko Parlor 	1
South Indian Restaurant 	1
Mexican Restaurant 	1
Vegetarian / Vegan Restaurant 	1
Outdoors & Recreation 	1
Brewery 	1
Irish Pub 	1
Public Bathroom 	1
Street Art 	1
Music Store 	1
Cycle Studio 	1
Gourmet Shop 	1
Mobile Phone Shop 	1
Food Truck 	1
Acai House 	1
Australian Restaurant 	1
Diner 	1
Soccer Field 	1

In [None]:
#tokyo_word_string
tokyo_word_string = word_string(tokyo_venues_category_counts)
# create the word cloud
tokyo_wordcloud = WordCloud(background_color='white').generate(tokyo_word_string)
# display the cloud
fig = plt.figure()
fig.set_figwidth(14)
fig.set_figheight(18)

plt.imshow(tokyo_wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
fig.savefig("tokyo_img.png")



In [None]:


Top 10 venue categories in Tokyo:

    Convenience Store 86
    Coffee Shop 76
    Sake Bar 63
    Café 51
    Ramen Restaurant 49
    Japanese Restaurant 41
    Chinese Restaurant 27
    Italian Restaurant 27
    Soba Restaurant 24
    Donburi Restaurant 22



# Now let's combine all the data into a Dataframe


In [None]:
#combine venues 
allvenues =  pd.concat([ny_venues,toronto_venues,tokyo_venues])
allvenues = allvenues.rename(columns={'Neighborhood': 'NeighborhoodName'})
allvenues.shape

In [None]:


(2056, 5)



In [None]:
allvenues.head()

 	City 	Borough 	NeighborhoodName 	VenueName 	VenueCategory
0 	New York 	Bronx 	Fieldston 	Ecco Salon 	Cosmetics Shop
1 	New York 	Bronx 	Kingsbridge 	Garden Gourmet Market 	Gourmet Shop
2 	New York 	Bronx 	Kingsbridge 	MyUnique 	Thrift / Vintage Store
3 	New York 	Bronx 	Kingsbridge 	Stop & Shop 	Supermarket
4 	New York 	Bronx 	Kingsbridge 	Mattress Firm 	Mattress Store

# Group the venues by category and calculate the total number of venues in each category


In [None]:


# get dummies
allvenues_onehot = pd.get_dummies(allvenues, columns = ['VenueCategory'], prefix="", prefix_sep="")
allvenues_onehot = allvenues_onehot.drop('VenueName',axis = 1)
#allvenues_onehot.columns.values



In [None]:
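##group the venues based on Neighborhoods
# drop the text columns first so that sum() only adds up the category indicators
allvenues_grouped = allvenues_onehot.drop(columns=['City', 'Borough']).groupby('NeighborhoodName').sum().reset_index()
# allneighborhoods is not defined in the cells shown; it is assumed here to be the three
# neighborhood tables stacked, with Neighborhood renamed to match the venues table
allneighborhoods = pd.concat([ny_neighborhoods, toronto_neighborhoods, tk_neighborhoods], ignore_index=True).rename(columns={'Neighborhood': 'NeighborhoodName'})
allvenues_grouped = allneighborhoods.join(allvenues_grouped.set_index('NeighborhoodName'), on='NeighborhoodName')

#drop the rows with NaN (no venues information)
allvenues_grouped.dropna(inplace=True)
allvenues_grouped.head()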
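
What the one-hot encoding followed by a grouped sum computes is a table of venue counts per neighborhood and category. The sketch below reproduces that idea with the standard library only; `category_counts` and `to_matrix` are hypothetical helpers, and the sample pairs are a small illustrative subset.

```python
from collections import Counter, defaultdict


def category_counts(venues):
    """Build {neighborhood: Counter(category -> count)} from (neighborhood, category) pairs."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return counts


def to_matrix(counts):
    """Turn nested counts into (sorted columns, rows), with zeros for absent categories."""
    columns = sorted({category for counter in counts.values() for category in counter})
    rows = {name: [counter[category] for category in columns]
            for name, counter in counts.items()}
    return columns, rows


# Small illustrative subset of (NeighborhoodName, VenueCategory) pairs.
venues = [
    ("Kingsbridge", "Gourmet Shop"),
    ("Kingsbridge", "Supermarket"),
    ("Nagatacho", "Coffee Shop"),
    ("Nagatacho", "Coffee Shop"),
    ("Nagatacho", "Convenience Store"),
]
columns, rows = to_matrix(category_counts(venues))
print(columns)
print(rows)
```

In pandas, the same count table can be obtained in one call with `pd.crosstab` on the neighborhood and category columns.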

In [None]:
 	City 	Borough 	NeighborhoodName 	Latitude 	Longitude 	ATM 	Acai House 	Accessories Store 	Afghan Restaurant 	American Restaurant 	Arcade 	Argentinian Restaurant 	Art Gallery 	Art Museum 	Arts & Crafts Store 	Asian Restaurant 	Athletics & Sports 	Australian Restaurant 	BBQ Joint 	Bagel Shop 	Bakery 	Bank 	Bar 	Baseball Field 	Basketball Court 	Bath House 	Beach 	Bed & Breakfast 	Beer Bar 	Beer Garden 	Beer Store 	Big Box Store 	Bike Rental / Bike Share 	Bike Shop 	Bistro 	Board Shop 	Boat or Ferry 	Bookstore 	Boutique 	Boxing Gym 	Brazilian Restaurant 	Breakfast Spot 	Brewery 	Bridal Shop 	Bubble Tea Shop 	Building 	Burger Joint 	Burrito Place 	Bus Line 	Bus Station 	Bus Stop 	Café 	Candy Store 	Cantonese Restaurant 	Caribbean Restaurant 	Caucasian Restaurant 	Check Cashing Service 	Cheese Shop 	Chinese Restaurant 	Chocolate Shop 	Climbing Gym 	Clothing Store 	Cocktail Bar 	Coffee Shop 	College Gym 	Comedy Club 	Comic Shop 	Community Center 	Concert Hall 	Construction & Landscaping 	Convenience Store 	Cooking School 	Cosmetics Shop 	Creperie 	Cupcake Shop 	Cycle Studio 	Dance Studio 	Deli / Bodega 	Department Store 	Dessert Shop 	Dim Sum Restaurant 	Diner 	Discount Store 	Doctor's Office 	Dog Run 	Donburi Restaurant 	Donut Shop 	Drugstore 	Dumpling Restaurant 	Electronics Store 	Event Space 	Falafel Restaurant 	Farm 	Farmers Market 	Fast Food Restaurant 	Filipino Restaurant 	Fish & Chips Shop 	Fishing Spot 	Flower Shop 	Food 	Food & Drink Shop 	Food Court 	Food Truck 	French Restaurant 	Fried Chicken Joint 	Frozen Yogurt Shop 	Fruit & Vegetable Store 	Furniture / Home Store 	Garden 	Gas Station 	Gastropub 	Gay Bar 	Gelato Shop 	German Restaurant 	Gift Shop 	Gluten-free Restaurant 	Gourmet Shop 	Greek Restaurant 	Grocery Store 	Gym 	Gym / Fitness Center 	Hawaiian Restaurant 	High School 	Himalayan Restaurant 	Historic Site 	History Museum 	Hobby Shop 	Home Service 	Hong Kong Restaurant 	Hookah Bar 	Hot Dog Joint 	Hotel 	Hotel Bar 	Hotpot Restaurant 	Housing 
Development 	Ice Cream Shop 	Indian Restaurant 	Indie Theater 	Indoor Play Area 	Intersection 	Irish Pub 	Italian Restaurant 	Japanese Curry Restaurant 	Japanese Family Restaurant 	Japanese Restaurant 	Jazz Club 	Jewelry Store 	Juice Bar 	Kebab Restaurant 	Kids Store 	Korean Restaurant 	Kushikatsu Restaurant 	Latin American Restaurant 	Laundromat 	Laundry Service 	Library 	Light Rail Station 	Lingerie Store 	Liquor Store 	Lounge 	Luggage Store 	Malay Restaurant 	Massage Studio 	Mattress Store 	Mediterranean Restaurant 	Metro Station 	Mexican Restaurant 	Middle Eastern Restaurant 	Miscellaneous Shop 	Mobile Phone Shop 	Monument / Landmark 	Movie Theater 	Moving Target 	Multiplex 	Museum 	Music Store 	Music Venue 	Nabe Restaurant 	Nail Salon 	Neighborhood 	New American Restaurant 	Nightclub 	Noodle House 	North Indian Restaurant 	Okonomiyaki Restaurant 	Opera House 	Optical Shop 	Other Great Outdoors 	Outdoor Sculpture 	Outdoors & Recreation 	Pachinko Parlor 	Pakistani Restaurant 	Paper / Office Supplies Store 	Park 	Pastry Shop 	Performing Arts Venue 	Pet Café 	Pet Store 	Pharmacy 	Pier 	Piercing Parlor 	Pilates Studio 	Pizza Place 	Platform 	Playground 	Plaza 	Poke Place 	Polish Restaurant 	Pool 	Pool Hall 	Pub 	Public Art 	Public Bathroom 	Puerto Rican Restaurant 	Racetrack 	Ramen Restaurant 	Real Estate Office 	Record Shop 	Residential Building (Apartment / Condo) 	Restaurant 	Rock Climbing Spot 	Rock Club 	Roof Deck 	Sake Bar 	Salad Place 	Salon / Barbershop 	Sandwich Place 	Scenic Lookout 	School 	Seafood Restaurant 	Shabu-Shabu Restaurant 	Shanghai Restaurant 	Shipping Store 	Shoe Store 	Shop & Service 	Shopping Mall 	Singaporean Restaurant 	Smoothie Shop 	Soba Restaurant 	Soccer Field 	Soup Place 	South American Restaurant 	South Indian Restaurant 	Southern / Soul Food Restaurant 	Spa 	Spanish Restaurant 	Speakeasy 	Sporting Goods Shop 	Sports Bar 	Sports Club 	Sri Lankan Restaurant 	Stationery Store 	Steakhouse 	Street Art 	Supermarket 	Sushi Restaurant 	
Szechuan Restaurant 	Taco Place 	Tailor Shop 	Takoyaki Place 	Tapas Restaurant 	Tattoo Parlor 	Taxi Stand 	Tea Room 	Teishoku Restaurant 	Temple 	Tempura Restaurant 	Tennis Court 	Tex-Mex Restaurant 	Thai Restaurant 	Theater 	Thrift / Vintage Store 	Tonkatsu Restaurant 	Tourist Information Center 	Toy / Game Store 	Track 	Trail 	Train Station 	Udon Restaurant 	Unagi Restaurant 	Used Bookstore 	Varenyky restaurant 	Vegetarian / Vegan Restaurant 	Video Store 	Vietnamese Restaurant 	Wagashi Place 	Wine Bar 	Wine Shop 	Women's Store 	Yakitori Restaurant 	Yoga Studio 	Yoshoku Restaurant
3 	New York 	Bronx 	Fieldston 	40.895437 	-73.905643 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	1.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0
5 	New York 	Bronx 	Kingsbridge 	40.881687 	-73.902818 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	1.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	1.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	1.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	1.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0
7 	New York 	Bronx 	Woodlawn 	40.898273 	-73.867315 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	1.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	2.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	1.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	1.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	1.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	3.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0
8 	New York 	Bronx 	Norwood 	40.877224 	-73.879391 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	1.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0 	0.0
12 	New York 	Bronx 	City Island 	40.847247 	-73.786488 	0.0 	0.0 	0.0 	0.0 	1.0 	0.0 	0.0 	0.0 	0.0

In [None]:
# create a new dataframe with the most common venue categories
def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues categories with the highest counts in row."""
    row_categories_sorted = row.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

columns = ['City','Borough','NeighborhoodName','Latitude','Longitude','Total Number of Venues']
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues

for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks 4 and above take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
allvenues_sorted = pd.DataFrame(columns = columns)
#allvenues_sorted = allvenues_grouped[['City','Borough','NeighborhoodName']]

for ind in range(allvenues_grouped.shape[0]):
    allvenues_sorted.loc[ind, 'City'] = allvenues_grouped.iloc[ind].City
    allvenues_sorted.loc[ind, 'Borough'] = allvenues_grouped.iloc[ind].Borough
    allvenues_sorted.loc[ind, 'NeighborhoodName'] = allvenues_grouped.iloc[ind].NeighborhoodName
    allvenues_sorted.loc[ind, 'Latitude'] = allvenues_grouped.iloc[ind].Latitude
    allvenues_sorted.loc[ind, 'Longitude'] = allvenues_grouped.iloc[ind].Longitude
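    # venue-count columns start at position 5 in allvenues_grouped (after the five location columns)
    allvenues_sorted.loc[ind, 'Total Number of Venues'] = allvenues_grouped.iloc[ind,5:].sum()
    # the ranked-venue columns start at position 6 in allvenues_sorted (after 'Total Number of Venues')
    allvenues_sorted.iloc[ind, 6:] = return_most_common_venues(allvenues_grouped.iloc[ind, 5:], num_top_venues)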

In [None]:
 	City 	Borough 	NeighborhoodName 	Latitude 	Longitude 	Total Number of Venues 	1st Most Common Venue 	2nd Most Common Venue 	3rd Most Common Venue 	4th Most Common Venue 	5th Most Common Venue 	6th Most Common Venue 	7th Most Common Venue 	8th Most Common Venue 	9th Most Common Venue 	10th Most Common Venue
0 	New York 	Bronx 	Fieldston 	40.895437 	-73.905643 	1.0 	Cosmetics Shop 	ATM 	Pastry Shop 	Piercing Parlor 	Pier 	Pharmacy 	Pet Store 	Pet Café 	Performing Arts Venue 	Park
1 	New York 	Bronx 	Kingsbridge 	40.881687 	-73.902818 	4.0 	Supermarket 	Thrift / Vintage Store 	Gourmet Shop 	Mattress Store 	Pastry Shop 	Pier 	Pharmacy 	Pet Store 	Pet Café 	Performing Arts Venue
2 	New York 	Bronx 	Woodlawn 	40.898273 	-73.867315 	9.0 	Pub 	Deli / Bodega 	Bar 	Grocery Store 	Indian Restaurant 	Pizza Place 	Piercing Parlor 	Pier 	Pharmacy 	Pet Store
3 	New York 	Bronx 	Norwood 	40.877224 	-73.879391 	1.0 	Mexican Restaurant 	Pizza Place 	Pilates Studio 	Piercing Parlor 	Pier 	Pharmacy 	Pet Store 	Pet Café 	Performing Arts Venue 	ATM
4 	New York 	Bronx 	City Island 	40.847247 	-73.786488 	9.0 	Diner 	Bar 	Grocery Store 	Frozen Yogurt Shop 	French Restaurant 	Pizza Place 	Ice Cream Shop 	Deli / Bodega 	American Restaurant 	Real Estate Office
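
The table above shows a side effect of sorting raw counts: Fieldston has a single venue, yet ten "most common" categories are listed, because the zero-count categories are all tied and fill the remaining ranks. A minimal sketch, assuming one wanted to leave those ranks empty instead; the helper name `top_venues_nonzero` and the toy counts are hypothetical and not part of the notebook:

```python
import numpy as np
import pandas as pd

def top_venues_nonzero(row, num_top_venues):
    """Return the top categories by count, padding with NaN where no venue exists."""
    # mergesort is a stable algorithm, so tied counts come out in a reproducible order
    ranked = row.sort_values(ascending=False, kind='mergesort')
    # keep only categories that actually occur in the neighborhood
    names = ranked[ranked > 0].index.tolist()[:num_top_venues]
    # pad the remaining ranks so every row has the same length
    return names + [np.nan] * (num_top_venues - len(names))

# toy venue counts for one neighborhood (illustrative values only)
toy_row = pd.Series({'ATM': 0.0, 'Bar': 1.0, 'Deli / Bodega': 2.0, 'Pub': 3.0, 'Park': 0.0})
top3 = top_venues_nonzero(toy_row, 3)
top5 = top_venues_nonzero(toy_row, 5)
print(top3)
print(top5)
```

With this variant the padded ranks show up as missing values, which makes sparse neighborhoods easy to spot before clustering.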

In [None]:
allvenues_grouped.shape

# Use hierarchical agglomerative clustering to compare neighborhoods across cities
First, find the number of clusters. Let's use the scipy library to draw a dendrogram for our dataset.


In [None]:
import scipy.cluster.hierarchy as shc

# venue-count columns start at position 5 (ATM); positions 0-4 are the location columns
data = allvenues_grouped.iloc[:,5:]
plt.figure(figsize=(10, 7))   
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Neighborhoods')
plt.ylabel('Distance')
# horizontal cut line used to read the number of clusters off the dendrogram
plt.axhline(y=31, c='k')
dend = shc.dendrogram(shc.linkage(data, method='ward'))
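
The horizontal line corresponds to cutting the tree at a fixed distance: each branch it crosses becomes one cluster. A minimal sketch of counting those branches numerically with scipy's `fcluster`; the toy matrix and the threshold `t=5.0` are illustrative assumptions standing in for `data` and the cut height used in this notebook:

```python
import numpy as np
import scipy.cluster.hierarchy as shc

# toy data: three well-separated groups of five points each (not the venue matrix)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
offsets = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [-0.2, 0.0], [0.0, -0.2]])
toy_data = np.vstack([c + offsets for c in centers])

# same linkage method as the dendrogram above
Z = shc.linkage(toy_data, method='ward')

# cut the tree at a fixed distance, like the horizontal line on the plot
labels = shc.fcluster(Z, t=5.0, criterion='distance')
n_clusters = len(np.unique(labels))
print(n_clusters)
```

On this toy data the cut yields three clusters. `fcluster` also accepts `criterion='maxclust'` when the desired number of clusters is known in advance rather than the cut height.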

In [None]:

Based on the dendrogram above, I separate the neighborhoods into 9 clusters. I will use the hierarchical agglomerative clustering implementation from sklearn.
Import AgglomerativeClustering:
