# Mácio Matheus Arruda

--------------------------------------------
## A Brazilian tourism agency plans to organize its first trip to Toronto, Canada, starting with the borough of North York.

#### To build the best itinerary for its clients, the agency wants to know which districts near North York to visit.
#### To that end, it has commissioned an analysis of the characteristics and most relevant places in these neighborhoods.

#### The agency is mainly interested in hotels, restaurants, parks, shops, squares, and similar places.

#### Question: What are the characteristics of the neighborhoods around North York, and which places should the Brazilian tourism agency visit to give its clients the best tour?


In [1]:
!pip -q install geopy

In [2]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
import folium

### Load the pandas dataframe with Toronto data

In [3]:
df_toronto = pd.read_csv('toronto_data.csv')
df_toronto.tail(50)

Unnamed: 0.1,Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
53,53,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
54,54,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
55,55,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
56,56,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
57,57,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
58,58,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568
59,59,M5J,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",43.640816,-79.381752
60,60,M5K,Downtown Toronto,"Design Exchange, Toronto Dominion Centre",43.647177,-79.381576
61,61,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817
62,62,M5M,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975


### Create a simple map of Toronto City

In [61]:
# latitude and longitude of Toronto, looked up manually via a Google search
toronto_latitude = 43.6532
toronto_longitude = -79.3832
map_toronto = folium.Map(location = [toronto_latitude, toronto_longitude], zoom_start = 11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='green',
        fill=True,
        fill_opacity=0.4).add_to(map_toronto)  
    

map_toronto

#### Below is a screenshot of the plotted map, in case the previous cell does not render

![Folium map screenshot](https://raw.githubusercontent.com/macio-matheus/Coursera_Capstone/master/week4/screenshot_folium_map_toronto.png)

### Create a new data frame with neighborhoods in North York 

In [62]:
CLIENT_ID = 'UOQZ3Z5EOT1H1QXV0X14VID1JIYYU1I0SIPFFTM1IYJQXFTU' # your Foursquare ID
CLIENT_SECRET = 'VBAWSMC0XESLE1FFC5H3T4L1JDSWSRIGSWNDI3Z00YGODI1O' # your Foursquare Secret
VERSION = '20180604'

In [63]:
nyork_data = df_toronto[df_toronto['Borough'] == 'North York'].reset_index(drop=True)
nyork_data.drop(['Unnamed: 0'], axis=1, inplace=True)
nyork_data

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M2H,North York,Hillcrest Village,43.803762,-79.363452
1,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556
2,M2K,North York,Bayview Village,43.786947,-79.385975
3,M2L,North York,"Silver Hills, York Mills",43.75749,-79.374714
4,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493
5,M2N,North York,Willowdale South,43.77012,-79.408493
6,M2P,North York,York Mills West,43.752758,-79.400049
7,M2R,North York,Willowdale West,43.782736,-79.442259
8,M3A,North York,Parkwoods,43.753259,-79.329656
9,M3B,North York,Don Mills North,43.745906,-79.352188


### Create a map of North York and its neighbourhoods

In [64]:
address_nyork = 'North York,Toronto'
latitude_nyork = 43.773077
longitude_nyork = -79.257774
print('The geographical coordinates of North York are {}, {}.'.format(latitude_nyork, longitude_nyork))

The geographical coordinates of North York are 43.773077, -79.257774.
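As a quick sanity check on the hard-coded centre, the sketch below compares it with the mean position of the ten North York neighborhoods listed in the table above. The `haversine_km` helper and the Earth radius of 6371 km are assumptions introduced for illustration; they are not part of the original notebook.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# (latitude, longitude) of the ten neighborhoods shown in the table above
neighborhoods = [
    (43.803762, -79.363452), (43.778517, -79.346556),
    (43.786947, -79.385975), (43.75749, -79.374714),
    (43.789053, -79.408493), (43.77012, -79.408493),
    (43.752758, -79.400049), (43.782736, -79.442259),
    (43.753259, -79.329656), (43.745906, -79.352188),
]
centroid_lat = sum(lat for lat, _ in neighborhoods) / len(neighborhoods)
centroid_lon = sum(lon for _, lon in neighborhoods) / len(neighborhoods)

# Hard-coded map centre used in this notebook
latitude_nyork, longitude_nyork = 43.773077, -79.257774

offset_km = haversine_km(centroid_lat, centroid_lon, latitude_nyork, longitude_nyork)
print('Map centre is {:.1f} km from the neighborhood centroid.'.format(offset_km))
```

A large offset would suggest that the map centre, and any query built from it, may not sit inside the area of interest.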


In [67]:
map_nyork = folium.Map(location=[latitude_nyork, longitude_nyork], zoom_start=11)

# add markers to map
for lat, lng, label in zip(nyork_data['Latitude'], nyork_data['Longitude'], nyork_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=9,
        popup=label,
        color='blue',
        fill=True,
        fill_opacity=0.5).add_to(map_nyork)  
    
map_nyork

#### Below is a screenshot of the plotted map, in case the previous cell does not render

![Folium map screenshot](https://raw.githubusercontent.com/macio-matheus/Coursera_Capstone/master/week4/screenshot_folium_map_northyork.png)

### Get the top 100 venues in the neighborhood 'Hillcrest Village', from North York

In [9]:
neighborhood_latitude = nyork_data.loc[0, 'Latitude'] # neighbourhood latitude value
neighborhood_longitude = nyork_data.loc[0, 'Longitude'] # neighbourhood longitude value

neighborhood_name = nyork_data.loc[0, 'Neighborhood'] # neighbourhood name

print('Latitude and longitude values of "{}" are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of "Hillcrest Village" are 43.8037622, -79.3634517.


In [10]:
LIMIT = 100
radius = 1000
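# NOTE: this request is centred on latitude_nyork/longitude_nyork (defined earlier),
# not on neighborhood_latitude/neighborhood_longitude computed for Hillcrest Village.
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude_nyork, longitude_nyork, VERSION, radius, LIMIT)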
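The query string can also be assembled from a dictionary, which keeps each parameter visible and takes care of escaping. The following is a minimal sketch using only the standard library; `build_explore_url` is a hypothetical helper, and the credentials shown are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Return an explore URL carrying the same query parameters used in this notebook."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude" of the search centre
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll parameter unescaped
    return '{}?{}'.format(EXPLORE_ENDPOINT, urlencode(params, safe=','))

# Placeholder credentials; coordinates are those printed for Hillcrest Village
example_url = build_explore_url('YOUR_ID', 'YOUR_SECRET',
                                43.8037622, -79.3634517, '20180604', 1000, 100)
print(example_url)
```

Passing the coordinates as arguments makes it explicit which location each request is centred on.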

In [11]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c40cc22db04f57d21412223'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Scarborough City Centre',
  'headerFullLocation': 'Scarborough City Centre, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 62,
  'suggestedBounds': {'ne': {'lat': 43.78207700900001,
    'lng': -79.24533335909429},
   'sw': {'lat': 43.76407699099999, 'lng': -79.2702146409057}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5085ec39e4b0b1ead2eb0818',
       'name': 'Disney Store',
       'location': {'address': '300 Borough Drive',
        'crossStreet': 'in Scarborough Town Centre',
        'lat': 43.775537,
        'lng': -79.256833,
       

In [12]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [13]:
import json

venues = results['response']['groups'][0]['items']
# pd.json_normalize (pandas >= 1.0) flattens the nested JSON into dotted column names
nearby_venues = pd.json_normalize(venues)
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

Unnamed: 0,name,categories,lat,lng
0,Disney Store,Toy / Game Store,43.775537,-79.256833
1,Canyon Creek Chophouse,Steakhouse,43.776959,-79.261694
2,DAVIDsTEA,Tea Room,43.776613,-79.258516
3,Tommy Hilfiger Company Store,Clothing Store,43.776015,-79.257369
4,American Eagle Outfitters,Clothing Store,43.775908,-79.258352
5,Chipotle Mexican Grill,Mexican Restaurant,43.77641,-79.258069
6,SEPHORA,Cosmetics Shop,43.775592,-79.258242
7,Coliseum Scarborough Cinemas,Movie Theater,43.775995,-79.255649
8,Shoppers Drug Mart,Pharmacy,43.772747,-79.251123
9,CANBE Foods Inc,Indian Restaurant,43.773546,-79.246082


In [14]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

62 venues were returned by Foursquare.


In [37]:
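def getNearbyVenues(names, latitudes, longitudes, radius=700):
    """Query Foursquare for venues within `radius` metres of each neighborhood and return them as one DataFrame."""
    venues_list = []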
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
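The list comprehension above indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no categories. A defensive variant might look like the sketch below; `flatten_items` is a hypothetical helper, and the sample items only mimic the shape of the response shown earlier.

```python
def flatten_items(name, lat, lng, items):
    """Turn explore 'items' into rows, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when the venue carries no category at all
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Sample items: the first mirrors a venue from the output above, the second is made up
sample_items = [
    {'venue': {'name': 'Disney Store',
               'location': {'lat': 43.775537, 'lng': -79.256833},
               'categories': [{'name': 'Toy / Game Store'}]}},
    {'venue': {'name': 'Unlabelled venue',
               'location': {'lat': 43.77, 'lng': -79.25},
               'categories': []}},
]
rows = flatten_items('Example', 43.773077, -79.257774, sample_items)
print(rows)
```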

### Get venues for each neighborhood in North York

In [38]:
nyork_venues = getNearbyVenues(names=nyork_data['Neighborhood'], latitudes=nyork_data['Latitude'], longitudes=nyork_data['Longitude'])

Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Bedford Park, Lawrence Manor East
Lawrence Heights, Lawrence Manor
Glencairn
Maple Leaf Park, North Park, Upwood Park


In [39]:
nyork_venues.tail(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
349,Glencairn,43.709577,-79.445073,Mr. Sub,43.708975,-79.453588,Sandwich Place
350,Glencairn,43.709577,-79.445073,Cinque Lire,43.708698,-79.453576,Pizza Place
351,Glencairn,43.709577,-79.445073,City Fish Market,43.709105,-79.45366,Fish Market
352,Glencairn,43.709577,-79.445073,Enterprise Rent-A-Car,43.70644,-79.45256,Rental Car Location
353,"Maple Leaf Park, North Park, Upwood Park",43.713756,-79.490074,Rustic Bakery,43.715414,-79.4903,Bakery
354,"Maple Leaf Park, North Park, Upwood Park",43.713756,-79.490074,Maple leaf park,43.716188,-79.493531,Park
355,"Maple Leaf Park, North Park, Upwood Park",43.713756,-79.490074,Mika's Trim,43.714068,-79.496113,Construction & Landscaping
356,"Maple Leaf Park, North Park, Upwood Park",43.713756,-79.490074,Inch by Inch EnviroComm Consultants,43.718373,-79.48779,Home Service
357,"Maple Leaf Park, North Park, Upwood Park",43.713756,-79.490074,Shan Webtech,43.715953,-79.496931,Business Service
358,"Maple Leaf Park, North Park, Upwood Park",43.713756,-79.490074,Toronto - Clearview Home & Property Inspections,43.719071,-79.486964,Home Service


In [40]:
nyork_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Bathurst Manor, Downsview North, Wilson Heights",19,19,19,19,19,19
Bayview Village,9,9,9,9,9,9
"Bedford Park, Lawrence Manor East",31,31,31,31,31,31
"CFB Toronto, Downsview East",4,4,4,4,4,4
Don Mills North,7,7,7,7,7,7
Downsview Central,5,5,5,5,5,5
Downsview Northwest,12,12,12,12,12,12
Downsview West,8,8,8,8,8,8
"Fairview, Henry Farm, Oriole",68,68,68,68,68,68
"Flemingdon Park, Don Mills South",25,25,25,25,25,25


In [41]:
print('There are {} unique categories.'.format(len(nyork_venues['Venue Category'].unique())))

There are 129 unique categories.


In [42]:
# one hot encoding
nyork_onehot = pd.get_dummies(nyork_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
nyork_onehot['Neighborhood'] = nyork_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [nyork_onehot.columns[-1]] + list(nyork_onehot.columns[:-1])
nyork_onehot = nyork_onehot[fixed_columns]

nyork_onehot.head(20)

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bank,...,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,Hillcrest Village,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,Hillcrest Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Hillcrest Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Hillcrest Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Fairview, Henry Farm, Oriole",0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
5,"Fairview, Henry Farm, Oriole",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,"Fairview, Henry Farm, Oriole",0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
7,"Fairview, Henry Farm, Oriole",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,"Fairview, Henry Farm, Oriole",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"Fairview, Henry Farm, Oriole",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [43]:
nyork_onehot.shape

(359, 130)

In [44]:
nyork_grouped = nyork_onehot.groupby('Neighborhood').mean().reset_index()
nyork_grouped.head(10)

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bank,...,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,"Bathurst Manor, Downsview North, Wilson Heights",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,...,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bedford Park, Lawrence Manor East",0.0,0.0,0.032258,0.0,0.0,0.0,0.032258,0.032258,0.0,...,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0
3,"CFB Toronto, Downsview East",0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Don Mills North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Downsview Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0
6,Downsview Northwest,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0
7,Downsview West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0
8,"Fairview, Henry Farm, Oriole",0.0,0.0,0.014706,0.0,0.014706,0.0,0.0,0.029412,0.014706,...,0.0,0.0,0.014706,0.029412,0.0,0.014706,0.0,0.0,0.014706,0.014706
9,"Flemingdon Park, Don Mills South",0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
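Because each row of the one-hot table is a single venue, the per-neighborhood mean of a category column equals that category's share of the neighborhood's venues. For instance, the value 0.222222 for Bank in Bayview Village corresponds to 2 of its 9 venues. The small sketch below illustrates this with made-up data; the toy neighborhoods `A` and `B` are assumptions, not rows from the dataset.

```python
import pandas as pd

# Made-up venues: neighborhood A has three venues, B has one
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Park', 'Bakery', 'Park', 'Bank'],
})

# One column per category, one row per venue (1 where the venue has that category)
onehot = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# Mean of the 0/1 indicators = fraction of venues in each category
grouped = onehot.groupby('Neighborhood').mean()
print(grouped)
```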


### Get the 50 most common venue categories per neighborhood

In [45]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
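Sorting every category, including those with a frequency of zero, pads the lower ranks with categories the neighborhood does not actually contain. One possible variant, sketched here under the assumption that only observed categories are of interest, filters them out first; `most_common_nonzero` is a hypothetical name.

```python
import pandas as pd

def most_common_nonzero(row, num_top_venues):
    """Return up to num_top_venues category names whose frequency is above zero."""
    frequencies = row.iloc[1:].astype(float)  # skip the leading 'Neighborhood' entry
    observed = frequencies[frequencies > 0].sort_values(ascending=False)
    return list(observed.index[:num_top_venues])

# Made-up row in the same layout as a row of nyork_grouped
example_row = pd.Series({'Neighborhood': 'Toy', 'Airport': 0.0, 'Bank': 0.25, 'Park': 0.75})
top = most_common_nonzero(example_row, 10)
print(top)
```

The returned list can be shorter than `num_top_venues`, so a caller filling a fixed number of columns would need to pad it.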

In [46]:
num_top_venues = 50

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = nyork_grouped['Neighborhood']

for ind in np.arange(nyork_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(nyork_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,41th Most Common Venue,42th Most Common Venue,43th Most Common Venue,44th Most Common Venue,45th Most Common Venue,46th Most Common Venue,47th Most Common Venue,48th Most Common Venue,49th Most Common Venue,50th Most Common Venue
0,"Bathurst Manor, Downsview North, Wilson Heights",Coffee Shop,Bridal Shop,Bank,Sandwich Place,Shopping Mall,Deli / Bodega,Fast Food Restaurant,Restaurant,Sushi Restaurant,...,Clothing Store,Chinese Restaurant,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bar
1,Bayview Village,Bank,Chinese Restaurant,Café,Skating Rink,Restaurant,Japanese Restaurant,Skate Park,Grocery Store,Deli / Bodega,...,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop
2,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Juice Bar,Sushi Restaurant,Fast Food Restaurant,Liquor Store,Pub,Pizza Place,Restaurant,...,Fried Chicken Joint,Frozen Yogurt Shop,Dance Studio,Caribbean Restaurant,Construction & Landscaping,Community Center,Airport,Arts & Crafts Store,Asian Restaurant,Athletics & Sports
3,"CFB Toronto, Downsview East",Coffee Shop,Airport,Park,Sandwich Place,Dessert Shop,Electronics Store,Discount Store,Diner,Dim Sum Restaurant,...,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop,Boutique
4,Don Mills North,Japanese Restaurant,Gym / Fitness Center,Pool,Caribbean Restaurant,Café,Paper / Office Supplies Store,General Entertainment,Furniture / Home Store,Construction & Landscaping,...,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop
5,Downsview Central,Market,Vietnamese Restaurant,Outdoor Supply Store,Korean Restaurant,Baseball Field,Dim Sum Restaurant,Event Space,Electronics Store,Discount Store,...,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Bed & Breakfast,Beer Store,Bike Shop,Boutique
6,Downsview Northwest,Grocery Store,Liquor Store,Athletics & Sports,Sandwich Place,Discount Store,Fast Food Restaurant,Fried Chicken Joint,Pizza Place,Gym / Fitness Center,...,Arts & Crafts Store,Asian Restaurant,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop
7,Downsview West,Coffee Shop,Vietnamese Restaurant,Park,Gym / Fitness Center,Pizza Place,Moving Target,Shopping Mall,Bank,Department Store,...,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bar,Baseball Field,Bed & Breakfast
8,"Fairview, Henry Farm, Oriole",Clothing Store,Fast Food Restaurant,Coffee Shop,Baseball Field,Bakery,Japanese Restaurant,Kids Store,Toy / Game Store,Electronics Store,...,Supplement Shop,Boutique,Asian Restaurant,Shopping Mall,Arts & Crafts Store,Dessert Shop,Deli / Bodega,Bubble Tea Shop,Dance Studio,Diner
9,"Flemingdon Park, Don Mills South",Gym,Japanese Restaurant,Asian Restaurant,Beer Store,Coffee Shop,Chinese Restaurant,Office,Bus Stop,Clothing Store,...,Gift Shop,Greek Restaurant,Bagel Shop,Bakery,Department Store,Candy Store,Burger Joint,Burrito Place,Bowling Alley,Bus Line
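The column labels above fall back to "th" for every rank past the third, which yields names such as "41th". If conventional English ordinals are preferred, a helper along these lines could generate them; `ordinal` is a hypothetical function, and adopting it would change the column names used in the rest of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Column names for the 50 ranks used in this notebook
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 51)]
print(columns[:4], columns[41:44])
```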


### Run k-means to cluster the neighborhoods into 5 clusters

In [55]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

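# keep the first 21 rows so the length matches the 21 rows of nyork_grouped
nyork_data = nyork_data[0:21]
#nyork_data = nyork_data.drop(16)
# set number of clusters
kclusters = 5
# drop the name column so that only the category frequencies are clustered
nyork_grouped_clustering = nyork_grouped.drop('Neighborhood', axis=1)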
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(nyork_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:30] 

array([3, 3, 3, 1, 3, 4, 3, 1, 3, 3, 3, 2, 3, 0, 1, 3, 1, 3, 3, 3, 1], dtype=int32)

In [56]:
kmeans.labels_.shape

(21,)
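Counting the labels printed above gives a quick view of how balanced the clusters are. The sketch copies the 21 labels from the output rather than recomputing them.

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above
labels = [3, 3, 3, 1, 3, 4, 3, 1, 3, 3, 3, 2, 3, 0, 1, 3, 1, 3, 3, 3, 1]

cluster_sizes = Counter(labels)
for cluster in sorted(cluster_sizes):
    print('Cluster {}: {} neighborhoods'.format(cluster, cluster_sizes[cluster]))
```

With these labels, cluster 3 holds 13 of the 21 neighborhoods and three of the five clusters contain a single neighborhood each.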

### Add kmeans.labels_ to the original North York dataframe

In [57]:
# work on a copy so the new column is not written into a slice of nyork_data
nyork_merged = nyork_data.copy()

# add clustering labels
nyork_merged['Cluster Labels'] = kmeans.labels_
nyork_merged = nyork_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
nyork_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,41th Most Common Venue,42th Most Common Venue,43th Most Common Venue,44th Most Common Venue,45th Most Common Venue,46th Most Common Venue,47th Most Common Venue,48th Most Common Venue,49th Most Common Venue,50th Most Common Venue
0,M2H,North York,Hillcrest Village,43.803762,-79.363452,3,Bakery,Chinese Restaurant,Housing Development,Diner,...,Athletics & Sports,Bagel Shop,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop,Boutique,Bridal Shop
1,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,3,Clothing Store,Fast Food Restaurant,Coffee Shop,Baseball Field,...,Supplement Shop,Boutique,Asian Restaurant,Shopping Mall,Arts & Crafts Store,Dessert Shop,Deli / Bodega,Bubble Tea Shop,Dance Studio,Diner
2,M2K,North York,Bayview Village,43.786947,-79.385975,3,Bank,Chinese Restaurant,Café,Skating Rink,...,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop
3,M2L,North York,"Silver Hills, York Mills",43.75749,-79.374714,1,,,,,...,,,,,,,,,,
4,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,3,Park,Trail,Music Venue,Coffee Shop,...,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop,Boutique
5,M2N,North York,Willowdale South,43.77012,-79.408493,4,Coffee Shop,Pizza Place,Japanese Restaurant,Ramen Restaurant,...,Fried Chicken Joint,Arts & Crafts Store,Dance Studio,Cosmetics Shop,Falafel Restaurant,Farmers Market,Event Space,Deli / Bodega,Construction & Landscaping,Department Store
6,M2P,North York,York Mills West,43.752758,-79.400049,3,Gym,Intersection,Pet Store,Bank,...,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop,Boutique
7,M2R,North York,Willowdale West,43.782736,-79.442259,1,Pizza Place,Pharmacy,Butcher,Coffee Shop,...,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop,Bowling Alley
8,M3A,North York,Parkwoods,43.753259,-79.329656,3,Park,Food & Drink Shop,Burger Joint,Pet Store,...,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beer Store,Bike Shop,Boutique,Bridal Shop
9,M3B,North York,Don Mills North,43.745906,-79.352188,3,Japanese Restaurant,Gym / Fitness Center,Pool,Caribbean Restaurant,...,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop
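Note that `nyork_grouped` is ordered alphabetically by neighborhood (a result of `groupby`), while `nyork_data` follows postal-code order, so assigning `kmeans.labels_` by position can pair a label with the wrong neighborhood. A minimal sketch of aligning by name instead is shown here, using made-up labels and a hypothetical `label_table`.

```python
import pandas as pd

# Row order as produced by groupby('Neighborhood'): alphabetical
grouped_names = ['Bayview Village', 'Hillcrest Village', 'Parkwoods']
labels = [0, 1, 2]  # made-up cluster labels, one per grouped row

# Original table in postal-code order, including a neighborhood with no venues
data = pd.DataFrame({'Neighborhood': ['Hillcrest Village', 'Bayview Village',
                                      'Silver Hills, York Mills', 'Parkwoods']})

# Attach each label to its neighborhood name, then join on that name
label_table = pd.DataFrame({'Neighborhood': grouped_names, 'Cluster Labels': labels})
merged = data.merge(label_table, on='Neighborhood', how='left')
print(merged)
```

Neighborhoods absent from the grouped table, such as one for which no venues were returned, end up with a missing label instead of shifting the labels of the rows below them.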


### Visualize the clusters on the map

In [58]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location = [latitude_nyork, longitude_nyork], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(nyork_merged['Latitude'], nyork_merged['Longitude'], nyork_merged['Neighborhood'], nyork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=1.0).add_to(map_clusters)
       
map_clusters

#### Below is a screenshot of the plotted map, in case the previous cell does not render

![Folium map screenshot](https://raw.githubusercontent.com/macio-matheus/Coursera_Capstone/master/week4/screenshot_folium_map_clusteres.png)

### Examine each of the five clusters

In [28]:
nyork_merged.loc[nyork_merged['Cluster Labels'] == 0, nyork_merged.columns[[1] + list(range(5, nyork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0,Golf Course,Pool,Mediterranean Restaurant,Dog Run,Clothing Store,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store
1,North York,0,Clothing Store,Fast Food Restaurant,Coffee Shop,Restaurant,Toy / Game Store,Asian Restaurant,Kids Store,Bakery,Food Court,Tea Room
2,North York,0,Chinese Restaurant,Café,Bank,Japanese Restaurant,Empanada Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant
3,North York,0,,,,,,,,,,
4,North York,0,,,,,,,,,,
5,North York,0,Ramen Restaurant,Coffee Shop,Restaurant,Pizza Place,Sandwich Place,Japanese Restaurant,Café,Fast Food Restaurant,Indonesian Restaurant,Hotel
6,North York,0,Park,Electronics Store,Bank,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant
7,North York,0,Grocery Store,Butcher,Pharmacy,Pizza Place,Coffee Shop,Frozen Yogurt Shop,Fried Chicken Joint,Gift Shop,General Entertainment,Comfort Food Restaurant
9,North York,0,Caribbean Restaurant,Gym / Fitness Center,Café,Pool,Japanese Restaurant,Women's Store,Dog Run,Construction & Landscaping,Cosmetics Shop,Deli / Bodega
10,North York,0,Gym,Asian Restaurant,Coffee Shop,Beer Store,Grocery Store,Bike Shop,Fast Food Restaurant,Italian Restaurant,Japanese Restaurant,Dim Sum Restaurant


In [29]:
nyork_merged.loc[nyork_merged['Cluster Labels'] == 1, nyork_merged.columns[[1] + list(range(5, nyork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,North York,1,Park,Food & Drink Shop,Fast Food Restaurant,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store


In [30]:
nyork_merged.loc[nyork_merged['Cluster Labels'] == 2, nyork_merged.columns[[1] + list(range(5, nyork_merged.shape[1]))]]    

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,North York,2,Park,Airport,Playground,Bus Stop,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega


In [59]:
nyork_merged.loc[nyork_merged['Cluster Labels'] == 3, nyork_merged.columns[[1] + list(range(5, nyork_merged.shape[1]))]]    

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,41th Most Common Venue,42th Most Common Venue,43th Most Common Venue,44th Most Common Venue,45th Most Common Venue,46th Most Common Venue,47th Most Common Venue,48th Most Common Venue,49th Most Common Venue,50th Most Common Venue
0,North York,3,Bakery,Chinese Restaurant,Housing Development,Diner,Dim Sum Restaurant,Falafel Restaurant,Event Space,Electronics Store,...,Athletics & Sports,Bagel Shop,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop,Boutique,Bridal Shop
1,North York,3,Clothing Store,Fast Food Restaurant,Coffee Shop,Baseball Field,Bakery,Japanese Restaurant,Kids Store,Toy / Game Store,...,Supplement Shop,Boutique,Asian Restaurant,Shopping Mall,Arts & Crafts Store,Dessert Shop,Deli / Bodega,Bubble Tea Shop,Dance Studio,Diner
2,North York,3,Bank,Chinese Restaurant,Café,Skating Rink,Restaurant,Japanese Restaurant,Skate Park,Grocery Store,...,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop
4,North York,3,Park,Trail,Music Venue,Coffee Shop,Greek Restaurant,Event Space,Community Center,Construction & Landscaping,...,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop,Boutique
6,North York,3,Gym,Intersection,Pet Store,Bank,Tennis Court,Park,Furniture / Home Store,Frozen Yogurt Shop,...,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop,Boutique
8,North York,3,Park,Food & Drink Shop,Burger Joint,Pet Store,Bed & Breakfast,Fast Food Restaurant,General Entertainment,Event Space,...,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beer Store,Bike Shop,Boutique,Bridal Shop
9,North York,3,Japanese Restaurant,Gym / Fitness Center,Pool,Caribbean Restaurant,Café,Paper / Office Supplies Store,General Entertainment,Furniture / Home Store,...,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop
10,North York,3,Gym,Japanese Restaurant,Asian Restaurant,Beer Store,Coffee Shop,Chinese Restaurant,Office,Bus Stop,...,Gift Shop,Greek Restaurant,Bagel Shop,Bakery,Department Store,Candy Store,Burger Joint,Burrito Place,Bowling Alley,Bus Line
12,North York,3,Coffee Shop,Bank,Pizza Place,Miscellaneous Shop,Massage Studio,Japanese Restaurant,Bar,Fast Food Restaurant,...,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Baseball Field,Bed & Breakfast,Beer Store,Bike Shop,Boutique,Bowling Alley
15,North York,3,Market,Vietnamese Restaurant,Outdoor Supply Store,Korean Restaurant,Baseball Field,Dim Sum Restaurant,Event Space,Electronics Store,...,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Bed & Breakfast,Beer Store,Bike Shop,Boutique


In [60]:
nyork_merged.loc[nyork_merged['Cluster Labels'] == 4, nyork_merged.columns[[1] + list(range(5, nyork_merged.shape[1]))]]    

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,41th Most Common Venue,42th Most Common Venue,43th Most Common Venue,44th Most Common Venue,45th Most Common Venue,46th Most Common Venue,47th Most Common Venue,48th Most Common Venue,49th Most Common Venue,50th Most Common Venue
5,North York,4,Coffee Shop,Pizza Place,Japanese Restaurant,Ramen Restaurant,Fast Food Restaurant,Sandwich Place,Gym,Café,...,Fried Chicken Joint,Arts & Crafts Store,Dance Studio,Cosmetics Shop,Falafel Restaurant,Farmers Market,Event Space,Deli / Bodega,Construction & Landscaping,Department Store
