# Introduction/Business Problem

The problem I would like to analyze arises from a personal need: where should I choose to live?
It is the typical problem that a person who has to be relocated for job has to face: in which part of the city should I choose to live? In the next month I have to move for job in a different city from the one I live now and I will deal with the choice of where to take home, in which area of the city. Of course one could see in a place not too far from the office, but this is not enough. I want to explore the neighborhood of Milan city because I want to find an area where I can practice my hobbies. In particular since I am a swimmer and I love to practice sport in general, I want an area full of swimming pools, jyms, fitness club. A different person who love arts and exibition could want an area full of teathers or cinemas for example. 
Other people could like instead going and chilling out so an area full of pubs and clubs would be perfect for them. 
So my problem would be in general: find the better place where to live depending of the hobbies and interests of a person.

# The Data

The data I want to use are the geografical coordinate of Milan city, retrieved from the geocoder or geolocator API. 
In particular I know that the postal codes of Milan city go from 20121 to 201612, so I will get the geografical coordinates of these postalcodes.
Next, I will explore with the Forsquare API the different venues of these areas paying particulat attention to the gym, swimming pools, fitness centers and at the end using K-means to cluster the data and choose the cluster that better fits to my hobbies. 

# Methodology

In [2]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt # plotting library

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

import wikipedia
import requests
#https://beautiful-soup-4.readthedocs.io/en/latest/
from bs4 import BeautifulSoup   #Python package for parsing HTML and XML documents
import time
print('Libraries imported.')

Libraries imported.


In [3]:
address =  'Milan Italy'
print(address)
geolocator = Nominatim()
location = geolocator.geocode(address)
latitudeMi = location.latitude
longitudeMi = location.longitude
print('The geograpical coordinate of {} are {}, {}.'.format(address, latitudeMi, longitudeMi))

Milan Italy




The geograpical coordinate of Milan Italy are 45.4667971, 9.1904984.


In [4]:
#for i in range(20121, 20162+1):
 #   print('POSTAL CODE: ', i)

In [5]:
# initialize the variables
lat_lng_coords = None
latitude = []
longitude = []

In [6]:
for i in range(20121, 20162+1):
    address = ' Milan, Italy'
    geolocator = Nominatim()
    print('ADDRESS: ', i, '-', address)
    location = geolocator.geocode(str(i), address)
    latitudeMi = location.latitude
    longitudeMi = location.longitude
    latitude.append(latitudeMi)
    longitude.append(longitudeMi)
    print('The geograpical coordinate of {} are {}, {}.'.format(str(i) + address, latitudeMi, longitudeMi))



ADDRESS:  20121 -  Milan, Italy
The geograpical coordinate of 20121 Milan, Italy are 45.4721783683814, 9.18804382276746.
ADDRESS:  20122 -  Milan, Italy
The geograpical coordinate of 20122 Milan, Italy are 45.4618117954248, 9.19630988449062.
ADDRESS:  20123 -  Milan, Italy
The geograpical coordinate of 20123 Milan, Italy are 45.462639, 9.1885153.
ADDRESS:  20124 -  Milan, Italy
The geograpical coordinate of 20124 Milan, Italy are 45.4846035406421, 9.20081739536339.
ADDRESS:  20125 -  Milan, Italy
The geograpical coordinate of 20125 Milan, Italy are 45.4997708807025, 9.20491090230299.
ADDRESS:  20126 -  Milan, Italy
The geograpical coordinate of 20126 Milan, Italy are 45.5132902305084, 9.2176238336645.
ADDRESS:  20127 -  Milan, Italy
The geograpical coordinate of 20127 Milan, Italy are 45.4965625911373, 9.22041971183362.
ADDRESS:  20128 -  Milan, Italy
The geograpical coordinate of 20128 Milan, Italy are 45.5151286112891, 9.22456354838012.
ADDRESS:  20129 -  Milan, Italy
The geograpical

In [7]:
#create a vector with the postal codes
postalcode = np.arange(20121, 20162+1)
postalcode

array([20121, 20122, 20123, 20124, 20125, 20126, 20127, 20128, 20129,
       20130, 20131, 20132, 20133, 20134, 20135, 20136, 20137, 20138,
       20139, 20140, 20141, 20142, 20143, 20144, 20145, 20146, 20147,
       20148, 20149, 20150, 20151, 20152, 20153, 20154, 20155, 20156,
       20157, 20158, 20159, 20160, 20161, 20162])

In [129]:
#create a dictionary with PostalCode, Latitude and Longitude
d = {'Postal Code': postalcode, 'Latitude': latitude, 'Longitude': longitude}
#d

In [130]:
#create the dataframe
milan_data = pd.DataFrame(d, columns=['Postal Code', 'Latitude', 'Longitude'])

In [131]:
milan_data.head(10)

Unnamed: 0,Postal Code,Latitude,Longitude
0,20121,45.472178,9.188044
1,20122,45.461812,9.19631
2,20123,45.462639,9.188515
3,20124,45.484604,9.200817
4,20125,45.499771,9.204911
5,20126,45.51329,9.217624
6,20127,45.496563,9.22042
7,20128,45.515129,9.224564
8,20129,45.470966,9.213798
9,20130,43.244552,-1.990588


This is the address of the place where my office is located

In [12]:
myaddress = 'Via Castellanza 11, 20151 Milan'
geolocator = Nominatim()
location = geolocator.geocode(myaddress)
latitudeMy = location.latitude
longitudeMy = location.longitude
print('The geographical coordinate of {} are {}, {}.'.format(myaddress, latitudeMy, longitudeMy))



The geographical coordinate of Via Castellanza 11, 20151 Milan are 45.5009523, 9.109888.


Now I want to explore each area of each differnt postal code, examine the venues in order to find an area full of swimming pools and jyms where I can live

In [13]:
#Define Foursquare Credentials and Version
CLIENT_ID = '1RHG0VKNFIUBDXQGDKDQEZP1WJXPIYBKMGIQO1Z0RYGZRM1A' # your Foursquare ID
CLIENT_SECRET = 'VJ4E4WB2B1Z4AEGNAHJYQQCJGPERRA5RE3K15FLY02R0HFYQ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 1RHG0VKNFIUBDXQGDKDQEZP1WJXPIYBKMGIQO1Z0RYGZRM1A
CLIENT_SECRET:VJ4E4WB2B1Z4AEGNAHJYQQCJGPERRA5RE3K15FLY02R0HFYQ


Search all the venues for the Swimming Pool

In [14]:
#Let's get the top 100 venues that are in Milan
search_query = 'Swimming Pool'
radius = 2000
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitudeMy, longitudeMy, VERSION, search_query, radius, LIMIT)
print(url)

https://api.foursquare.com/v2/venues/explore?client_id=1RHG0VKNFIUBDXQGDKDQEZP1WJXPIYBKMGIQO1Z0RYGZRM1A&client_secret=VJ4E4WB2B1Z4AEGNAHJYQQCJGPERRA5RE3K15FLY02R0HFYQ&ll=45.5009523,9.109888&v=20180605&query=Swimming Pool&radius=2000&limit=100


In [15]:
#Send the GET request and examine the results
resultsSwimming = requests.get(url).json()
resultsSwimming

{'meta': {'code': 200, 'requestId': '5bc215921ed219428acee9be'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4c308fa366e40f4734aac38b-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/pool_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d15e941735',
         'name': 'Pool',
         'pluralName': 'Pools',
         'primary': True,
         'shortName': 'Pool'}],
       'id': '4c308fa366e40f4734aac38b',
       'location': {'address': 'Via Adolfo Omodeo',
        'cc': 'IT',
        'city': 'Milano',
        'country': 'Italia',
        'crossStreet': 'Via Antonio Cechov',
        'distance': 1156,
        'formattedAddress': ['Via Adolfo Omodeo (Via Antonio Cechov)',
         '20151 Milano Lombardia',
         'Italia'],
        'labeledLatL

In [16]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [17]:
#clean the json and structure it into a pandas dataframe
venues = resultsSwimming['response']['groups'][0]['items']

In [18]:
len(venues)

4

In [19]:
#Flatten JSON into a dataframe
nearby_venues = json_normalize(venues)
nearby_venues.head(10)

Unnamed: 0,reasons.count,reasons.items,referralId,venue.categories,venue.id,venue.location.address,venue.location.cc,venue.location.city,venue.location.country,venue.location.crossStreet,...,venue.location.formattedAddress,venue.location.labeledLatLngs,venue.location.lat,venue.location.lng,venue.location.neighborhood,venue.location.postalCode,venue.location.state,venue.name,venue.photos.count,venue.photos.groups
0,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4c308fa366e40f4734aac38b-0,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",4c308fa366e40f4734aac38b,Via Adolfo Omodeo,IT,Milano,Italia,Via Antonio Cechov,...,"[Via Adolfo Omodeo (Via Antonio Cechov), 20151...","[{'label': 'display', 'lat': 45.49138567817066...",45.491386,9.115659,,20151.0,Lombardia,Piscina Comunale Lampugnano,0,[]
1,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4f0fd180e4b0d3f8a3c90de1-1,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",4f0fd180e4b0d3f8a3c90de1,Via Alcide de Gasperi 1,IT,Milano,Italia,,...,"[Via Alcide de Gasperi 1, Milano Lombardia, It...","[{'label': 'display', 'lat': 45.49636132751478...",45.496361,9.128447,Zona 8,,Lombardia,Virgin Swimming Pool,0,[]
2,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-51dbaf19498ea602283851a2-2,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",51dbaf19498ea602283851a2,Via Privata Polonia 10,IT,Milano,Italia,,...,"[Via Privata Polonia 10, Milano Lombardia, Ita...","[{'label': 'display', 'lat': 45.51361557896577...",45.513616,9.119248,,,Lombardia,The Hub's pool,0,[]
3,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-51699a22e4b04259cdd600bf-3,"[{'id': '4bf58dd8d48988d105941735', 'name': 'G...",51699a22e4b04259cdd600bf,,IT,,Italia,,...,[Italia],"[{'label': 'display', 'lat': 45.511416, 'lng':...",45.511416,9.124666,,,,Superspa by Angelo Caroli @ B4,0,[]


In [20]:
#Consider only the columns 'venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng'
filtered_columns = ['venue.id', 'venue.name', 'venue.categories', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues.head()

Unnamed: 0,venue.id,venue.name,venue.categories,venue.location.address,venue.location.lat,venue.location.lng
0,4c308fa366e40f4734aac38b,Piscina Comunale Lampugnano,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Adolfo Omodeo,45.491386,9.115659
1,4f0fd180e4b0d3f8a3c90de1,Virgin Swimming Pool,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Alcide de Gasperi 1,45.496361,9.128447
2,51dbaf19498ea602283851a2,The Hub's pool,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Privata Polonia 10,45.513616,9.119248
3,51699a22e4b04259cdd600bf,Superspa by Angelo Caroli @ B4,"[{'id': '4bf58dd8d48988d105941735', 'name': 'G...",,45.511416,9.124666


In [21]:
# clean columns (remove the . from the column name)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()

Unnamed: 0,id,name,categories,address,lat,lng
0,4c308fa366e40f4734aac38b,Piscina Comunale Lampugnano,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Adolfo Omodeo,45.491386,9.115659
1,4f0fd180e4b0d3f8a3c90de1,Virgin Swimming Pool,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Alcide de Gasperi 1,45.496361,9.128447
2,51dbaf19498ea602283851a2,The Hub's pool,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Privata Polonia 10,45.513616,9.119248
3,51699a22e4b04259cdd600bf,Superspa by Angelo Caroli @ B4,"[{'id': '4bf58dd8d48988d105941735', 'name': 'G...",,45.511416,9.124666


In [22]:
#And how many venues were returned by Foursquare?
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


Search all the venues for the Gym

In [23]:
#Let's get the top 100 venues that are in Milan
search_query = 'Gym'
radius = 2000
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitudeMy, longitudeMy, VERSION, search_query, radius, LIMIT)
print(url)

https://api.foursquare.com/v2/venues/explore?client_id=1RHG0VKNFIUBDXQGDKDQEZP1WJXPIYBKMGIQO1Z0RYGZRM1A&client_secret=VJ4E4WB2B1Z4AEGNAHJYQQCJGPERRA5RE3K15FLY02R0HFYQ&ll=45.5009523,9.109888&v=20180605&query=Gym&radius=2000&limit=100


In [24]:
#Send the GET request and examine the results
resultsGym = requests.get(url).json()
resultsGym

{'meta': {'code': 200, 'requestId': '5bc215dbdd57970753df4626'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4cac3b1744a8224bda8e3140-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/gym_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d175941735',
         'name': 'Gym / Fitness Center',
         'pluralName': 'Gyms or Fitness Centers',
         'primary': True,
         'shortName': 'Gym / Fitness'}],
       'id': '4cac3b1744a8224bda8e3140',
       'location': {'address': 'Viale Alcide De Gasperi, 2',
        'cc': 'IT',
        'city': 'Milano',
        'country': 'Italia',
        'distance': 1535,
        'formattedAddress': ['Viale Alcide De Gasperi, 2',
         '20151 Milano Lombardia',
         'Italia'],
        'labeledLatLngs': [{'lab

In [25]:
#clean the json and structure it into a pandas dataframe
venuesGym = resultsGym['response']['groups'][0]['items']

In [26]:
#Flatten JSON into a dataframe
nearby_venuesGym = json_normalize(venuesGym)
nearby_venuesGym.head(10)

Unnamed: 0,reasons.count,reasons.items,referralId,venue.categories,venue.id,venue.location.address,venue.location.cc,venue.location.city,venue.location.country,venue.location.crossStreet,...,venue.location.formattedAddress,venue.location.labeledLatLngs,venue.location.lat,venue.location.lng,venue.location.neighborhood,venue.location.postalCode,venue.location.state,venue.name,venue.photos.count,venue.photos.groups
0,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4cac3b1744a8224bda8e3140-0,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...",4cac3b1744a8224bda8e3140,"Viale Alcide De Gasperi, 2",IT,Milano,Italia,,...,"[Viale Alcide De Gasperi, 2, 20151 Milano Lomb...","[{'label': 'display', 'lat': 45.49644343158925...",45.496443,9.128483,Zona 8,20151.0,Lombardia,Virgin Active,0,[]
1,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4d42a020b6e73704fbda8609-1,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",4d42a020b6e73704fbda8609,"Via Sapri, 64",IT,Milano,Italia,,...,"[Via Sapri, 64, 20156 Milano Lombardia, Italia]","[{'label': 'display', 'lat': 45.50386345370981...",45.503863,9.130028,,20156.0,Lombardia,Way Out,0,[]
2,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4defe40fd16486e86e4d90a3-2,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",4defe40fd16486e86e4d90a3,Via Lampugnano 80,IT,Milano,Italia,Via Federico Zardi,...,"[Via Lampugnano 80 (Via Federico Zardi), 20151...","[{'label': 'display', 'lat': 45.49088190864057...",45.490882,9.113795,,20151.0,Lombardia,Capelli e Sforza,0,[]
3,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-52ed2d6a498e23a6618ffdae-3,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",52ed2d6a498e23a6618ffdae,Via Gallarate 207,IT,Milano,Italia,,...,"[Via Gallarate 207, 20151 Milano Lombardia, It...","[{'label': 'display', 'lat': 45.49912796896843...",45.499128,9.12477,,20151.0,Lombardia,CrossFit San Siro,0,[]
4,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-51c3530c498e5911e791b1fd-4,"[{'id': '503289d391d4c4b30a586d6a', 'name': 'C...",51c3530c498e5911e791b1fd,,IT,,Italia,,...,[Italia],"[{'label': 'display', 'lat': 45.51138339107317...",45.511383,9.091387,,,,Rockspot,0,[]


In [27]:
nearby_venuesGym =nearby_venuesGym.loc[:, filtered_columns]
nearby_venuesGym.head()

Unnamed: 0,venue.id,venue.name,venue.categories,venue.location.address,venue.location.lat,venue.location.lng
0,4cac3b1744a8224bda8e3140,Virgin Active,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...","Viale Alcide De Gasperi, 2",45.496443,9.128483
1,4d42a020b6e73704fbda8609,Way Out,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...","Via Sapri, 64",45.503863,9.130028
2,4defe40fd16486e86e4d90a3,Capelli e Sforza,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",Via Lampugnano 80,45.490882,9.113795
3,52ed2d6a498e23a6618ffdae,CrossFit San Siro,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",Via Gallarate 207,45.499128,9.12477
4,51c3530c498e5911e791b1fd,Rockspot,"[{'id': '503289d391d4c4b30a586d6a', 'name': 'C...",,45.511383,9.091387


In [28]:
# clean columns names (remove the . from the column name)
nearby_venuesGym.columns = [col.split(".")[-1] for col in nearby_venuesGym.columns]
nearby_venuesGym.head()

Unnamed: 0,id,name,categories,address,lat,lng
0,4cac3b1744a8224bda8e3140,Virgin Active,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...","Viale Alcide De Gasperi, 2",45.496443,9.128483
1,4d42a020b6e73704fbda8609,Way Out,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...","Via Sapri, 64",45.503863,9.130028
2,4defe40fd16486e86e4d90a3,Capelli e Sforza,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",Via Lampugnano 80,45.490882,9.113795
3,52ed2d6a498e23a6618ffdae,CrossFit San Siro,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",Via Gallarate 207,45.499128,9.12477
4,51c3530c498e5911e791b1fd,Rockspot,"[{'id': '503289d391d4c4b30a586d6a', 'name': 'C...",,45.511383,9.091387


In [29]:
#And how many venues were returned by Foursquare?
print('{} venues were returned by Foursquare.'.format(nearby_venuesGym.shape[0]))

5 venues were returned by Foursquare.


In [30]:
nearby_venuesGym.head(10)

Unnamed: 0,id,name,categories,address,lat,lng
0,4cac3b1744a8224bda8e3140,Virgin Active,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...","Viale Alcide De Gasperi, 2",45.496443,9.128483
1,4d42a020b6e73704fbda8609,Way Out,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...","Via Sapri, 64",45.503863,9.130028
2,4defe40fd16486e86e4d90a3,Capelli e Sforza,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",Via Lampugnano 80,45.490882,9.113795
3,52ed2d6a498e23a6618ffdae,CrossFit San Siro,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",Via Gallarate 207,45.499128,9.12477
4,51c3530c498e5911e791b1fd,Rockspot,"[{'id': '503289d391d4c4b30a586d6a', 'name': 'C...",,45.511383,9.091387


Search all the venues for Park

In [31]:
search_query = 'Park'
radius = 2000
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitudeMy, longitudeMy, VERSION, search_query, radius, LIMIT)
print(url)

https://api.foursquare.com/v2/venues/explore?client_id=1RHG0VKNFIUBDXQGDKDQEZP1WJXPIYBKMGIQO1Z0RYGZRM1A&client_secret=VJ4E4WB2B1Z4AEGNAHJYQQCJGPERRA5RE3K15FLY02R0HFYQ&ll=45.5009523,9.109888&v=20180605&query=Park&radius=2000&limit=100


In [32]:
#Send the GET request and examine the results
resultsPark = requests.get(url).json()
resultsPark

{'meta': {'code': 200, 'requestId': '5bc21601db04f55c3eebd0ee'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4bf026f1d4e4d13a97ea15a7-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/park_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d163941735',
         'name': 'Park',
         'pluralName': 'Parks',
         'primary': True,
         'shortName': 'Park'}],
       'id': '4bf026f1d4e4d13a97ea15a7',
       'location': {'address': 'Via Cascina Bellaria',
        'cc': 'IT',
        'city': 'Milano',
        'country': 'Italia',
        'distance': 1870,
        'formattedAddress': ['Via Cascina Bellaria',
         '20151 Milano Lombardia',
         'Italia'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 45.484266515634

In [33]:
#clean the json and structure it into a pandas dataframe
venuesPark = resultsPark['response']['groups'][0]['items']

In [34]:
#Flatten JSON into a dataframe
nearby_venuesPark = json_normalize(venuesPark)
nearby_venuesPark.head(10)

Unnamed: 0,reasons.count,reasons.items,referralId,venue.categories,venue.id,venue.location.address,venue.location.cc,venue.location.city,venue.location.country,venue.location.crossStreet,venue.location.distance,venue.location.formattedAddress,venue.location.labeledLatLngs,venue.location.lat,venue.location.lng,venue.location.postalCode,venue.location.state,venue.name,venue.photos.count,venue.photos.groups
0,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4bf026f1d4e4d13a97ea15a7-0,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",4bf026f1d4e4d13a97ea15a7,Via Cascina Bellaria,IT,Milano,Italia,,1870,"[Via Cascina Bellaria, 20151 Milano Lombardia,...","[{'label': 'display', 'lat': 45.48426651563477...",45.484267,9.107022,20151.0,Lombardia,Parco di Trenno,0,[]
1,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-55af50bb498e3c69cda54dfb-1,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",55af50bb498e3c69cda54dfb,Via Gallarate,IT,Milano,Italia,,800,"[Via Gallarate, Milano Lombardia, Italia]","[{'label': 'display', 'lat': 45.50667131421003...",45.506671,9.103671,,Lombardia,Cascina Merlata,0,[]
2,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4f07fdcbe4b0e624d6a9eb3e-2,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",4f07fdcbe4b0e624d6a9eb3e,Via Francesco Cilea,IT,Milano,Italia,,438,"[Via Francesco Cilea, Milano Lombardia, Italia]","[{'label': 'display', 'lat': 45.49701157069697...",45.497012,9.110018,,Lombardia,Parco Sandro Pertini,0,[]
3,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4c963c94f7cfa1cd734fc415-3,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",4c963c94f7cfa1cd734fc415,,IT,Milano,Italia,,579,"[Milano Lombardia, Italia]","[{'label': 'display', 'lat': 45.50011856103531...",45.500119,9.117218,,Lombardia,Circolo Ricreativo RCS,0,[]
4,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4f89ac89e4b09efba1b8554b-4,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",4f89ac89e4b09efba1b8554b,Parco Di Trenno,IT,Milano,Italia,,1404,"[Parco Di Trenno, Milano Lombardia, Italia]","[{'label': 'display', 'lat': 45.48935902138292...",45.489359,9.102798,,Lombardia,Spazio Gorlini,0,[]
5,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4cbd58244495721ec7215f7a-5,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",4cbd58244495721ec7215f7a,Via Benedetto Croce,IT,Milano,Italia,Via Fratelli Vigorelli,1653,"[Via Benedetto Croce (Via Fratelli Vigorelli),...","[{'label': 'display', 'lat': 45.49401423814139...",45.494014,9.12863,20151.0,Lombardia,Giardini dei Caduti di Nassiriya,0,[]
6,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4d8a291399c2a1cdde908ad7-6,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",4d8a291399c2a1cdde908ad7,"Via Giovanni Keplero, 21",IT,Pero,Italia,,1832,"[Via Giovanni Keplero, 21, 20016 Pero Lombardi...","[{'label': 'display', 'lat': 45.50337297733133...",45.503373,9.086651,20016.0,Lombardia,Parco Naturale Pero Atahotel,0,[]
7,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4e144d4414951daa08a8dccf-7,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",4e144d4414951daa08a8dccf,Via Trenno,IT,Milano,Italia,Via Giulio Natta,1877,"[Via Trenno (Via Giulio Natta), Milano Lombard...","[{'label': 'display', 'lat': 45.48878980367733...",45.48879,9.126559,,Lombardia,Parchetto Trenno,0,[]


In [35]:
nearby_venuesPark =nearby_venuesPark.loc[:, filtered_columns]
nearby_venuesPark.head()

Unnamed: 0,venue.id,venue.name,venue.categories,venue.location.address,venue.location.lat,venue.location.lng
0,4bf026f1d4e4d13a97ea15a7,Parco di Trenno,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Cascina Bellaria,45.484267,9.107022
1,55af50bb498e3c69cda54dfb,Cascina Merlata,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Gallarate,45.506671,9.103671
2,4f07fdcbe4b0e624d6a9eb3e,Parco Sandro Pertini,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Francesco Cilea,45.497012,9.110018
3,4c963c94f7cfa1cd734fc415,Circolo Ricreativo RCS,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",,45.500119,9.117218
4,4f89ac89e4b09efba1b8554b,Spazio Gorlini,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Parco Di Trenno,45.489359,9.102798


In [36]:
# clean columns names (remove the . from the column name)
nearby_venuesPark.columns = [col.split(".")[-1] for col in nearby_venuesPark.columns]
nearby_venuesPark.head(10)

Unnamed: 0,id,name,categories,address,lat,lng
0,4bf026f1d4e4d13a97ea15a7,Parco di Trenno,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Cascina Bellaria,45.484267,9.107022
1,55af50bb498e3c69cda54dfb,Cascina Merlata,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Gallarate,45.506671,9.103671
2,4f07fdcbe4b0e624d6a9eb3e,Parco Sandro Pertini,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Francesco Cilea,45.497012,9.110018
3,4c963c94f7cfa1cd734fc415,Circolo Ricreativo RCS,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",,45.500119,9.117218
4,4f89ac89e4b09efba1b8554b,Spazio Gorlini,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Parco Di Trenno,45.489359,9.102798
5,4cbd58244495721ec7215f7a,Giardini dei Caduti di Nassiriya,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Benedetto Croce,45.494014,9.12863
6,4d8a291399c2a1cdde908ad7,Parco Naturale Pero Atahotel,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...","Via Giovanni Keplero, 21",45.503373,9.086651
7,4e144d4414951daa08a8dccf,Parchetto Trenno,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Trenno,45.48879,9.126559


In [37]:
#And how many venues were returned by Foursquare?
print('{} venues were returned by Foursquare.'.format(nearby_venuesPark.shape[0]))

8 venues were returned by Foursquare.


In [38]:
nearby_venues.head()

Unnamed: 0,id,name,categories,address,lat,lng
0,4c308fa366e40f4734aac38b,Piscina Comunale Lampugnano,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Adolfo Omodeo,45.491386,9.115659
1,4f0fd180e4b0d3f8a3c90de1,Virgin Swimming Pool,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Alcide de Gasperi 1,45.496361,9.128447
2,51dbaf19498ea602283851a2,The Hub's pool,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Privata Polonia 10,45.513616,9.119248
3,51699a22e4b04259cdd600bf,Superspa by Angelo Caroli @ B4,"[{'id': '4bf58dd8d48988d105941735', 'name': 'G...",,45.511416,9.124666


In [39]:
nearby_venuesGym.head()

Unnamed: 0,id,name,categories,address,lat,lng
0,4cac3b1744a8224bda8e3140,Virgin Active,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...","Viale Alcide De Gasperi, 2",45.496443,9.128483
1,4d42a020b6e73704fbda8609,Way Out,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...","Via Sapri, 64",45.503863,9.130028
2,4defe40fd16486e86e4d90a3,Capelli e Sforza,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",Via Lampugnano 80,45.490882,9.113795
3,52ed2d6a498e23a6618ffdae,CrossFit San Siro,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",Via Gallarate 207,45.499128,9.12477
4,51c3530c498e5911e791b1fd,Rockspot,"[{'id': '503289d391d4c4b30a586d6a', 'name': 'C...",,45.511383,9.091387


In [40]:
nearby_venuesPark.head()

Unnamed: 0,id,name,categories,address,lat,lng
0,4bf026f1d4e4d13a97ea15a7,Parco di Trenno,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Cascina Bellaria,45.484267,9.107022
1,55af50bb498e3c69cda54dfb,Cascina Merlata,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Gallarate,45.506671,9.103671
2,4f07fdcbe4b0e624d6a9eb3e,Parco Sandro Pertini,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Francesco Cilea,45.497012,9.110018
3,4c963c94f7cfa1cd734fc415,Circolo Ricreativo RCS,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",,45.500119,9.117218
4,4f89ac89e4b09efba1b8554b,Spazio Gorlini,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Parco Di Trenno,45.489359,9.102798


Join the 3 dataframes

In [41]:
frames = [nearby_venues, nearby_venuesGym, nearby_venuesPark]
result = pd.concat(frames, ignore_index= True)
result

Unnamed: 0,id,name,categories,address,lat,lng
0,4c308fa366e40f4734aac38b,Piscina Comunale Lampugnano,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Adolfo Omodeo,45.491386,9.115659
1,4f0fd180e4b0d3f8a3c90de1,Virgin Swimming Pool,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Alcide de Gasperi 1,45.496361,9.128447
2,51dbaf19498ea602283851a2,The Hub's pool,"[{'id': '4bf58dd8d48988d15e941735', 'name': 'P...",Via Privata Polonia 10,45.513616,9.119248
3,51699a22e4b04259cdd600bf,Superspa by Angelo Caroli @ B4,"[{'id': '4bf58dd8d48988d105941735', 'name': 'G...",,45.511416,9.124666
4,4cac3b1744a8224bda8e3140,Virgin Active,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...","Viale Alcide De Gasperi, 2",45.496443,9.128483
5,4d42a020b6e73704fbda8609,Way Out,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...","Via Sapri, 64",45.503863,9.130028
6,4defe40fd16486e86e4d90a3,Capelli e Sforza,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",Via Lampugnano 80,45.490882,9.113795
7,52ed2d6a498e23a6618ffdae,CrossFit San Siro,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",Via Gallarate 207,45.499128,9.12477
8,51c3530c498e5911e791b1fd,Rockspot,"[{'id': '503289d391d4c4b30a586d6a', 'name': 'C...",,45.511383,9.091387
9,4bf026f1d4e4d13a97ea15a7,Parco di Trenno,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",Via Cascina Bellaria,45.484267,9.107022


Let's create a function to repeat the same process to all the postalcode in Milan

In [43]:
def getNearbyVenues(names, latitudes, longitudes, query, radius=1000):

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name, lat, lng, '->')

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&query={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            query,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        print('num venues -> ', len(results))

        # return only relevant information for each nearby venue
        for v in results:
            if 'address' in v['venue']['location']:
                venues_list.append([(
                    name,
                    lat,
                    lng,
                    v['venue']['name'],
                    v['venue']['id'],
                    v['venue']['location']['lat'],
                    v['venue']['location']['lng'],
                    v['venue']['location']['address'],
                    v['venue']['categories'][0]['name'])])
                #print('NAME:', v['venue']['name'], 'LAT: ', v['venue']['location']['lat'], 'LONG: ', v['venue']['location']['lng'], 'ADDRESS: ',  v['venue']['location']['address'], 'CATEGORY: ',  v['venue']['categories'][0]['name'], '\n')
            else:
                venues_list.append([(
                    name,
                    lat,
                    lng,
                    v['venue']['name'],
                    v['venue']['id'],
                    v['venue']['location']['lat'],
                    v['venue']['location']['lng'],
                    np.nan,
                    v['venue']['categories'][0]['name'])])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code',
                             'Postal Code Latitude',
                             'Postal Code Longitude',
                             'Venue',
                             'Venue ID',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Address',
                             'Venue Category']

    return(nearby_venues)

Run the above function on each postal code and create a new dataframe called *milan_venues*

In [63]:
#milan_data['PostalCode'] = milan_data['PostalCode'].astype(int)
#milan_data['Latitude'] = milan_data['Latitude'].astype(float)

In [44]:
pool_venues = getNearbyVenues(milan_data['PostalCode'], milan_data['Latitude'], milan_data['Longitude'], 'Swimming Pool')
pool_venues.head()

20121 45.4721783683814 9.18804382276746 ->
num venues ->  2
20122 45.4618117954248 9.19630988449062 ->
num venues ->  2
20123 45.462639 9.1885153 ->
num venues ->  1
20124 45.4846035406421 9.20081739536339 ->
num venues ->  1
20125 45.4997708807025 9.20491090230299 ->
num venues ->  0
20126 45.5132902305084 9.2176238336645 ->
num venues ->  3
20127 45.4965625911373 9.22041971183362 ->
num venues ->  1
20128 45.5151286112891 9.22456354838012 ->
num venues ->  3
20129 45.4709658466505 9.21379776222726 ->
num venues ->  2
20130 43.2445524454715 -1.9905876493851 ->
num venues ->  0
20131 45.4838376477017 9.22238340908133 ->
num venues ->  3
20132 45.4992682 9.2418212 ->
num venues ->  0
20133 45.4713355926092 9.22804657372785 ->
num venues ->  0
20134 45.4777472217568 9.24476583022646 ->
num venues ->  0
20135 45.4545233555934 9.21137011152827 ->
num venues ->  4
20136 45.449414397427 9.18445188536978 ->
num venues ->  3
20137 55.2417324754743 24.7643199841379 ->
num venues ->  0
20138 45.

Unnamed: 0,Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Address,Venue Category
0,20121,45.472178,9.188044,"Boscolo Milano, Autograph Collection",4ba68fd0f964a5208b5e39e3,45.466514,9.193668,Corso Matteotti 4/6,Hotel
1,20121,45.472178,9.188044,acqua go,54fefc01498ea7e049439d53,45.477745,9.184486,,Gym Pool
2,20122,45.461812,9.19631,"Boscolo Milano, Autograph Collection",4ba68fd0f964a5208b5e39e3,45.466514,9.193668,Corso Matteotti 4/6,Hotel
3,20122,45.461812,9.19631,Physioclinic,4c9daae40e9bb1f744c1df5f,45.461423,9.205257,Via Fontana 18,Gym
4,20123,45.462639,9.188515,"Boscolo Milano, Autograph Collection",4ba68fd0f964a5208b5e39e3,45.466514,9.193668,Corso Matteotti 4/6,Hotel


In [45]:
print(len(pool_venues['Postal Code'].unique()))
pool_venues['Postal Code'].unique()

31


array([20121, 20122, 20123, 20124, 20126, 20127, 20128, 20129, 20131,
       20135, 20136, 20138, 20139, 20141, 20143, 20144, 20145, 20147,
       20148, 20149, 20150, 20151, 20152, 20154, 20156, 20157, 20158,
       20159, 20160, 20161, 20162], dtype=int64)

We can see that for 'Swimming pool' there are some postal codes missing in our research, in particular: 20125, 20130, 20131, 20132, 20133, 20134, 20137, 20140, 20142, 20146, 20153, 20155.

Let' s repeat the search for the Gym

In [46]:
gym_venues = getNearbyVenues(milan_data['PostalCode'], milan_data['Latitude'], milan_data['Longitude'], 'Gym')
gym_venues.head()

20121 45.4721783683814 9.18804382276746 ->
num venues ->  16
20122 45.4618117954248 9.19630988449062 ->
num venues ->  20
20123 45.462639 9.1885153 ->
num venues ->  19
20124 45.4846035406421 9.20081739536339 ->
num venues ->  18
20125 45.4997708807025 9.20491090230299 ->
num venues ->  3
20126 45.5132902305084 9.2176238336645 ->
num venues ->  4
20127 45.4965625911373 9.22041971183362 ->
num venues ->  9
20128 45.5151286112891 9.22456354838012 ->
num venues ->  4
20129 45.4709658466505 9.21379776222726 ->
num venues ->  11
20130 43.2445524454715 -1.9905876493851 ->
num venues ->  1
20131 45.4838376477017 9.22238340908133 ->
num venues ->  21
20132 45.4992682 9.2418212 ->
num venues ->  6
20133 45.4713355926092 9.22804657372785 ->
num venues ->  11
20134 45.4777472217568 9.24476583022646 ->
num venues ->  4
20135 45.4545233555934 9.21137011152827 ->
num venues ->  9
20136 45.449414397427 9.18445188536978 ->
num venues ->  13
20137 55.2417324754743 24.7643199841379 ->
num venues ->  0
2

Unnamed: 0,Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Address,Venue Category
0,20121,45.472178,9.188044,Virgin Active,55dab2b2498e8313350a2658,45.472465,9.196257,Piazza cavoiur,Gym
1,20121,45.472178,9.188044,Hard Candy Fitness Audace Repubblica,54245d31498e395bb7286579,45.477589,9.195096,Via Parini 1,Gym
2,20121,45.472178,9.188044,Manzoni Fitness,521cd25611d23cab4708263f,45.469534,9.191433,,Gym
3,20121,45.472178,9.188044,La Palestrina di Emiliano,54662da5498ea575689bf64f,45.471299,9.193989,"Via Borgospesso, 12",Gym / Fitness Center
4,20121,45.472178,9.188044,20Hours Club Bossi,4cf4b2de7e0da1cdb37da397,45.467466,9.18697,Piazzetta Maurilio Bossi 4,Gym / Fitness Center


In [47]:
len(gym_venues['Postal Code'].unique())

40

In [48]:
gym_venues['Postal Code'].unique()

array([20121, 20122, 20123, 20124, 20125, 20126, 20127, 20128, 20129,
       20130, 20131, 20132, 20133, 20134, 20135, 20136, 20138, 20139,
       20140, 20141, 20142, 20143, 20144, 20145, 20146, 20147, 20148,
       20149, 20150, 20151, 20152, 20153, 20154, 20155, 20157, 20158,
       20159, 20160, 20161, 20162], dtype=int64)

For the 'Gym' search there are only 2 postal codes missing: 20137, 20156

In [50]:
park_venues = getNearbyVenues(milan_data['PostalCode'], milan_data['Latitude'], milan_data['Longitude'], 'Park')
park_venues.head()

20121 45.4721783683814 9.18804382276746 ->
num venues ->  12
20122 45.4618117954248 9.19630988449062 ->
num venues ->  7
20123 45.462639 9.1885153 ->
num venues ->  13
20124 45.4846035406421 9.20081739536339 ->
num venues ->  4
20125 45.4997708807025 9.20491090230299 ->
num venues ->  6
20126 45.5132902305084 9.2176238336645 ->
num venues ->  3
20127 45.4965625911373 9.22041971183362 ->
num venues ->  4
20128 45.5151286112891 9.22456354838012 ->
num venues ->  3
20129 45.4709658466505 9.21379776222726 ->
num venues ->  6
20130 43.2445524454715 -1.9905876493851 ->
num venues ->  0
20131 45.4838376477017 9.22238340908133 ->
num venues ->  8
20132 45.4992682 9.2418212 ->
num venues ->  3
20133 45.4713355926092 9.22804657372785 ->
num venues ->  4
20134 45.4777472217568 9.24476583022646 ->
num venues ->  4
20135 45.4545233555934 9.21137011152827 ->
num venues ->  8
20136 45.449414397427 9.18445188536978 ->
num venues ->  13
20137 55.2417324754743 24.7643199841379 ->
num venues ->  1
20138 

Unnamed: 0,Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Address,Venue Category
0,20121,45.472178,9.188044,Parco Sempione,4b05887bf964a52079c822e3,45.473129,9.177281,Parco Sempione,Park
1,20121,45.472178,9.188044,Giardini Perego,4c75435fff1fb60cada6f5a7,45.472107,9.192224,,Park
2,20121,45.472178,9.188044,Giardini Indro Montanelli,4bf54718706e20a1063daa98,45.473796,9.200078,Corso Porta Venezia,Park
3,20121,45.472178,9.188044,Castello Sforzesco,4b05887cf964a520dcc822e3,45.469545,9.180424,"Piazza Castello, 3",Castle
4,20121,45.472178,9.188044,Giardini di Villa Reale,4b05887bf964a52077c822e3,45.472127,9.199721,Via Palestro,Park


In [51]:
print(len(park_venues['Postal Code'].unique()))
park_venues['Postal Code'].unique()

39


array([20121, 20122, 20123, 20124, 20125, 20126, 20127, 20128, 20129,
       20131, 20132, 20133, 20134, 20135, 20136, 20137, 20138, 20139,
       20140, 20141, 20143, 20144, 20145, 20146, 20147, 20148, 20149,
       20150, 20151, 20152, 20153, 20154, 20155, 20156, 20157, 20158,
       20159, 20161, 20162], dtype=int64)

For the 'Park' search there are only 3 postal codes missing: 20130, 20142, 20160

Check the shape of the 3 dataframes

In [55]:
print('pool_venues: ', pool_venues.shape)
print('gym_venues: ', gym_venues.shape)
print('park_venues: ', park_venues.shape)

pool_venues:  (82, 9)
gym_venues:  (307, 9)
park_venues:  (187, 9)


In [57]:
#join the 3 dataframe
frames = [pool_venues, gym_venues, park_venues]

milan_venues = pd.concat(frames, ignore_index= True)
print('milan_venues: ', milan_venues.shape)
milan_venues.head()

milan_venues:  (576, 9)


Unnamed: 0,Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Address,Venue Category
0,20121,45.472178,9.188044,"Boscolo Milano, Autograph Collection",4ba68fd0f964a5208b5e39e3,45.466514,9.193668,Corso Matteotti 4/6,Hotel
1,20121,45.472178,9.188044,acqua go,54fefc01498ea7e049439d53,45.477745,9.184486,,Gym Pool
2,20122,45.461812,9.19631,"Boscolo Milano, Autograph Collection",4ba68fd0f964a5208b5e39e3,45.466514,9.193668,Corso Matteotti 4/6,Hotel
3,20122,45.461812,9.19631,Physioclinic,4c9daae40e9bb1f744c1df5f,45.461423,9.205257,Via Fontana 18,Gym
4,20123,45.462639,9.188515,"Boscolo Milano, Autograph Collection",4ba68fd0f964a5208b5e39e3,45.466514,9.193668,Corso Matteotti 4/6,Hotel


In [67]:
#check the postal code resulting
milan_venues.sort_values(by=['Postal Code'])['Postal Code'].unique()

array([20121, 20122, 20123, 20124, 20125, 20126, 20127, 20128, 20129,
       20130, 20131, 20132, 20133, 20134, 20135, 20136, 20137, 20138,
       20139, 20140, 20141, 20142, 20143, 20144, 20145, 20146, 20147,
       20148, 20149, 20150, 20151, 20152, 20153, 20154, 20155, 20156,
       20157, 20158, 20159, 20160, 20161, 20162], dtype=int64)

In [70]:
len(milan_venues.sort_values(by=['Postal Code'], ascending=True)['Postal Code'].unique())

42

In [71]:
# one hot encoding
milan_onehot = pd.get_dummies(milan_venues[['Venue Category']], prefix="", prefix_sep="")
milan_onehot.head(10)

Unnamed: 0,Art Gallery,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,College Gym,Event Space,Garden,General Entertainment,...,Pool,Resort,Road,Spa,Sports Club,Stadium,Supermarket,Track,Vegetarian / Vegan Restaurant,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0


In [72]:
milan_onehot.columns

Index(['Art Gallery', 'Athletics & Sports', 'Boxing Gym', 'Campground',
       'Castle', 'Climbing Gym', 'College Gym', 'Event Space', 'Garden',
       'General Entertainment', 'Gym', 'Gym / Fitness Center', 'Gym Pool',
       'Harbor / Marina', 'Hotel', 'Italian Restaurant', 'Lake',
       'Martial Arts Dojo', 'Monument / Landmark', 'Park', 'Playground',
       'Plaza', 'Pool', 'Resort', 'Road', 'Spa', 'Sports Club', 'Stadium',
       'Supermarket', 'Track', 'Vegetarian / Vegan Restaurant', 'Yoga Studio'],
      dtype='object')

In [73]:
#Add Postal Code column back to dataframe
milan_onehot['Postal Code'] = milan_venues['Postal Code']
milan_onehot.head()

Unnamed: 0,Art Gallery,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,College Gym,Event Space,Garden,General Entertainment,...,Resort,Road,Spa,Sports Club,Stadium,Supermarket,Track,Vegetarian / Vegan Restaurant,Yoga Studio,Postal Code
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,20121
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,20121
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,20122
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,20122
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,20123


In [74]:
# move neighborhood column to the first column
fixed_columns = [milan_onehot.columns[-1]] + list(milan_onehot.columns[:-1])
milan_onehot = milan_onehot[fixed_columns]

In [75]:
milan_onehot.head()

Unnamed: 0,Postal Code,Art Gallery,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,College Gym,Event Space,Garden,...,Pool,Resort,Road,Spa,Sports Club,Stadium,Supermarket,Track,Vegetarian / Vegan Restaurant,Yoga Studio
0,20121,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,20121,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,20122,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,20122,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,20123,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [76]:
#let's examine the new dataframe size
print("Shape of dataset milan_onehot: ", milan_onehot.shape)

Shape of dataset milan_onehot:  (576, 33)


In [77]:
# let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
milan_grouped = milan_onehot.groupby('Postal Code').mean().reset_index()
milan_grouped.head()

Unnamed: 0,Postal Code,Art Gallery,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,College Gym,Event Space,Garden,...,Pool,Resort,Road,Spa,Sports Club,Stadium,Supermarket,Track,Vegetarian / Vegan Restaurant,Yoga Studio
0,20121,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333
1,20122,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,...,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0
2,20123,0.030303,0.0,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,...,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0
3,20124,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,...,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0
4,20125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0


In [81]:
#Let's print each postal code along with the top 5 most common venues
num_top_venues = 5

Cast Postal Code to string

In [78]:
milan_grouped['Postal Code'] = milan_grouped['Postal Code'].astype(str)

In [82]:
for hood in milan_grouped['Postal Code']:
    print("----"+hood+"----")
    temp = milan_grouped[milan_grouped['Postal Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----20121----
                  venue  freq
0                   Gym  0.27
1                  Park  0.20
2  Gym / Fitness Center  0.13
3                 Hotel  0.10
4                 Plaza  0.07


----20122----
                  venue  freq
0                   Gym  0.31
1  Gym / Fitness Center  0.24
2                  Park  0.14
3                 Hotel  0.07
4           Art Gallery  0.03


----20123----
                  venue  freq
0  Gym / Fitness Center  0.24
1                   Gym  0.21
2                  Park  0.21
3                 Hotel  0.06
4                 Plaza  0.06


----20124----
                  venue  freq
0                 Hotel  0.30
1  Gym / Fitness Center  0.17
2                  Park  0.17
3                   Gym  0.17
4     Martial Arts Dojo  0.04


----20125----
                  venue  freq
0                  Park  0.56
1                 Track  0.11
2            Playground  0.11
3  Gym / Fitness Center  0.11
4                   Gym  0.11


----20126----
      

In [83]:
#Let's write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]

In [84]:
# Let's create the new dataframe and display the top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

In [85]:
# Create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

In [86]:
# Create a new dataframe
postalcode_venues_sorted = pd.DataFrame(columns=columns)

In [87]:
postalcode_venues_sorted['Postal Code'] = milan_grouped['Postal Code']

In [88]:
for ind in np.arange(milan_grouped.shape[0]):
    postalcode_venues_sorted.iloc[ind, 1:] = return_most_common_venues(milan_grouped.iloc[ind, :], num_top_venues)

In [89]:
postalcode_venues_sorted.head()

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,20121,Gym,Park,Gym / Fitness Center,Hotel,Plaza,Lake,Monument / Landmark,Yoga Studio,Road,Spa
1,20122,Gym,Gym / Fitness Center,Park,Hotel,Plaza,General Entertainment,Martial Arts Dojo,Monument / Landmark,Art Gallery,College Gym
2,20123,Gym / Fitness Center,Gym,Park,Plaza,Hotel,General Entertainment,Monument / Landmark,Art Gallery,Road,Spa
3,20124,Hotel,Gym / Fitness Center,Gym,Park,College Gym,Martial Arts Dojo,Pool,Spa,Climbing Gym,Gym Pool
4,20125,Park,Track,Gym / Fitness Center,Gym,Playground,Yoga Studio,Event Space,General Entertainment,Garden,College Gym


# Cluster Neighborhoods anlysis

In [90]:
#Run k-means to cluster the neighborhood into 5 clusters
# set number of clusters
kclusters = 5

In [91]:
#Drop the column Postal Code since kmeans can run only on numerical data
milan_grouped_clustering = milan_grouped.drop('Postal Code', 1)

In [92]:
milan_grouped_clustering.head()

Unnamed: 0,Art Gallery,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,College Gym,Event Space,Garden,General Entertainment,...,Pool,Resort,Road,Spa,Sports Club,Stadium,Supermarket,Track,Vegetarian / Vegan Restaurant,Yoga Studio
0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333
1,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.034483,...,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0
2,0.030303,0.0,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.030303,...,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,...,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0


In [167]:
# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(milan_grouped_clustering)

In [168]:
# check cluster labels generated for each row in the dataframe
print("\n FIRST 10 LABELS \n", kmeans.labels_[0:10])


 FIRST 10 LABELS 
 [3 3 3 3 1 0 3 0 3 2]


In [173]:
milan_data.head()


Unnamed: 0,Postal Code,Latitude,Longitude
0,20121,45.472178,9.188044
1,20122,45.461812,9.19631
2,20123,45.462639,9.188515
3,20124,45.484604,9.200817
4,20125,45.499771,9.204911


In [174]:
#Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood
milan_merged = milan_data

In [175]:
milan_merged.shape

(42, 3)

In [176]:
milan_merged.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,20121,45.472178,9.188044
1,20122,45.461812,9.19631
2,20123,45.462639,9.188515
3,20124,45.484604,9.200817
4,20125,45.499771,9.204911


In [177]:
len(kmeans.labels_)

42

In [178]:
#Add clustering labels
milan_merged['Cluster Labels'] = kmeans.labels_

In [179]:
milan_merged.head()

Unnamed: 0,Postal Code,Latitude,Longitude,Cluster Labels
0,20121,45.472178,9.188044,3
1,20122,45.461812,9.19631,3
2,20123,45.462639,9.188515,3
3,20124,45.484604,9.200817,3
4,20125,45.499771,9.204911,1


In [180]:
postalcode_venues_sorted.head()

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,20121,Gym,Park,Gym / Fitness Center,Hotel,Plaza,Lake,Monument / Landmark,Yoga Studio,Road,Spa
1,20122,Gym,Gym / Fitness Center,Park,Hotel,Plaza,General Entertainment,Martial Arts Dojo,Monument / Landmark,Art Gallery,College Gym
2,20123,Gym / Fitness Center,Gym,Park,Plaza,Hotel,General Entertainment,Monument / Landmark,Art Gallery,Road,Spa
3,20124,Hotel,Gym / Fitness Center,Gym,Park,College Gym,Martial Arts Dojo,Pool,Spa,Climbing Gym,Gym Pool
4,20125,Park,Track,Gym / Fitness Center,Gym,Playground,Yoga Studio,Event Space,General Entertainment,Garden,College Gym


In [184]:
milan_merged.dtypes

Postal Code        object
Latitude          float64
Longitude         float64
Cluster Labels      int32
dtype: object

In [160]:
#Set the same type for the columns to merge

In [183]:
milan_merged['Postal Code'] = milan_merged['Postal Code'].astype(str)

In [185]:
postalcode_venues_sorted['Postal Code'] = postalcode_venues_sorted['Postal Code'].astype(str)

In [186]:
postalcode_venues_sorted.dtypes

Postal Code               object
1st Most Common Venue     object
2nd Most Common Venue     object
3rd Most Common Venue     object
4th Most Common Venue     object
5th Most Common Venue     object
6th Most Common Venue     object
7th Most Common Venue     object
8th Most Common Venue     object
9th Most Common Venue     object
10th Most Common Venue    object
dtype: object

In [187]:
##Merge the 2 dataframes
milan_merged = milan_merged.join(postalcode_venues_sorted.set_index('Postal Code'), on='Postal Code')

In [188]:
milan_merged.head(10)

Unnamed: 0,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,20121,45.472178,9.188044,3,Gym,Park,Gym / Fitness Center,Hotel,Plaza,Lake,Monument / Landmark,Yoga Studio,Road,Spa
1,20122,45.461812,9.19631,3,Gym,Gym / Fitness Center,Park,Hotel,Plaza,General Entertainment,Martial Arts Dojo,Monument / Landmark,Art Gallery,College Gym
2,20123,45.462639,9.188515,3,Gym / Fitness Center,Gym,Park,Plaza,Hotel,General Entertainment,Monument / Landmark,Art Gallery,Road,Spa
3,20124,45.484604,9.200817,3,Hotel,Gym / Fitness Center,Gym,Park,College Gym,Martial Arts Dojo,Pool,Spa,Climbing Gym,Gym Pool
4,20125,45.499771,9.204911,1,Park,Track,Gym / Fitness Center,Gym,Playground,Yoga Studio,Event Space,General Entertainment,Garden,College Gym
5,20126,45.51329,9.217624,0,Pool,Park,Gym,Martial Arts Dojo,Climbing Gym,Yoga Studio,Gym / Fitness Center,General Entertainment,Garden,Event Space
6,20127,45.496563,9.22042,3,Gym,Gym / Fitness Center,Park,College Gym,Martial Arts Dojo,Playground,Plaza,Pool,Track,Climbing Gym
7,20128,45.515129,9.224564,0,Pool,Park,Gym,Martial Arts Dojo,Climbing Gym,Yoga Studio,Gym / Fitness Center,General Entertainment,Garden,Event Space
8,20129,45.470966,9.213798,3,Gym / Fitness Center,Gym,Park,College Gym,Garden,Playground,Plaza,Gym Pool,Yoga Studio,Supermarket
9,20130,43.244552,-1.990588,2,Gym,Yoga Studio,Vegetarian / Vegan Restaurant,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,College Gym,Event Space


In [189]:
#Finally, let's visualize the resulting clusters
# create map
map_clusters = folium.Map(location=[latitudeMi, longitudeMi], zoom_start=11)

In [190]:
map_clusters

In [122]:
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
print("rainbow \n", rainbow)

rainbow 
 ['#8000ff', '#00b5eb', '#80ffb4', '#ffb360', '#ff0000']


In [192]:
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(milan_merged['Latitude'], milan_merged['Longitude'], milan_merged['Postal Code'], milan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

In [193]:
#Show the map
map_clusters

# Results section

In [194]:
# Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster

######## CLUSTER 1 ########
milan_merged.loc[milan_merged['Cluster Labels'] == 0, milan_merged.columns[[1] + list(range(5, milan_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,45.51329,Park,Gym,Martial Arts Dojo,Climbing Gym,Yoga Studio,Gym / Fitness Center,General Entertainment,Garden,Event Space
7,45.515129,Park,Gym,Martial Arts Dojo,Climbing Gym,Yoga Studio,Gym / Fitness Center,General Entertainment,Garden,Event Space
14,45.454523,Gym,Pool,Martial Arts Dojo,Monument / Landmark,Yoga Studio,Sports Club,Boxing Gym,Castle,Climbing Gym
26,45.465451,Gym,Pool,Gym / Fitness Center,Martial Arts Dojo,Yoga Studio,General Entertainment,Garden,Event Space,Harbor / Marina
27,45.477748,Pool,Gym Pool,Gym,Gym / Fitness Center,Hotel,Yoga Studio,General Entertainment,Garden,Event Space
30,45.493479,Pool,Gym,Gym / Fitness Center,Yoga Studio,Gym Pool,General Entertainment,Garden,Event Space,Hotel
31,38.908691,Gym,Park,College Gym,Yoga Studio,Gym Pool,Gym / Fitness Center,General Entertainment,Garden,Event Space
38,45.492482,Park,Pool,Gym / Fitness Center,Yoga Studio,Gym Pool,Hotel,General Entertainment,Garden,Event Space
39,55.23894,Pool,Yoga Studio,Hotel,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,Event Space
41,45.512385,Gym,Park,Gym / Fitness Center,Yoga Studio,Gym Pool,General Entertainment,Garden,Event Space,Hotel


In [195]:
######## CLUSTER 2 ########
milan_merged.loc[milan_merged['Cluster Labels'] == 1, milan_merged.columns[[1] + list(range(5, milan_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,45.499771,Track,Gym / Fitness Center,Gym,Playground,Yoga Studio,Event Space,General Entertainment,Garden,College Gym
13,45.477747,Gym,Gym / Fitness Center,Yoga Studio,Hotel,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym
15,45.449414,Gym / Fitness Center,Gym,Pool,Vegetarian / Vegan Restaurant,Martial Arts Dojo,Yoga Studio,Gym Pool,College Gym,Event Space
16,55.241732,Yoga Studio,Hotel,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,College Gym,Event Space
17,45.445099,Pool,Gym / Fitness Center,Yoga Studio,Gym Pool,Gym,General Entertainment,Garden,Event Space,Hotel
19,43.217532,Gym / Fitness Center,Yoga Studio,Hotel,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,College Gym
25,45.458608,Gym / Fitness Center,Gym,Martial Arts Dojo,Yoga Studio,Garden,Gym Pool,General Entertainment,Event Space,Hotel
32,45.470082,Campground,Gym,Yoga Studio,Hotel,Athletics & Sports,Boxing Gym,Castle,Climbing Gym,College Gym
35,45.503402,Gym Pool,Yoga Studio,Hotel,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,College Gym
36,45.512299,Pool,Gym / Fitness Center,Yoga Studio,Gym Pool,Gym,General Entertainment,Garden,Event Space,Hotel


In [196]:
######## CLUSTER 3 ########
milan_merged.loc[milan_merged['Cluster Labels'] == 2, milan_merged.columns[[1] + list(range(5, milan_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,43.244552,Yoga Studio,Vegetarian / Vegan Restaurant,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,College Gym,Event Space


In [197]:
######## CLUSTER 4 ########
milan_merged.loc[milan_merged['Cluster Labels'] == 3, milan_merged.columns[[1] + list(range(5, milan_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,45.472178,Park,Gym / Fitness Center,Hotel,Plaza,Lake,Monument / Landmark,Yoga Studio,Road,Spa
1,45.461812,Gym / Fitness Center,Park,Hotel,Plaza,General Entertainment,Martial Arts Dojo,Monument / Landmark,Art Gallery,College Gym
2,45.462639,Gym,Park,Plaza,Hotel,General Entertainment,Monument / Landmark,Art Gallery,Road,Spa
3,45.484604,Gym / Fitness Center,Gym,Park,College Gym,Martial Arts Dojo,Pool,Spa,Climbing Gym,Gym Pool
6,45.496563,Gym / Fitness Center,Park,College Gym,Martial Arts Dojo,Playground,Plaza,Pool,Track,Climbing Gym
8,45.470966,Gym,Park,College Gym,Garden,Playground,Plaza,Gym Pool,Yoga Studio,Supermarket
10,45.483838,Gym / Fitness Center,Park,Pool,Plaza,Yoga Studio,Martial Arts Dojo,College Gym,Climbing Gym,Event Space
11,45.499268,Martial Arts Dojo,Gym / Fitness Center,Gym,College Gym,Yoga Studio,Garden,Gym Pool,General Entertainment,Event Space
12,45.471336,Gym,Park,Garden,College Gym,Plaza,Yoga Studio,Gym Pool,General Entertainment,Event Space
18,45.439824,Gym,Park,Gym Pool,Pool,Yoga Studio,General Entertainment,Garden,Event Space,Hotel


In [198]:
######## CLUSTER 5 ########
milan_merged.loc[milan_merged['Cluster Labels'] == 4, milan_merged.columns[[1] + list(range(5, milan_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,55.248211,Yoga Studio,Vegetarian / Vegan Restaurant,Athletics & Sports,Boxing Gym,Campground,Castle,Climbing Gym,College Gym,Event Space


# Discussion section

Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.