# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The objective of this capstone project is to find the most suitable location for living in a new city. Within the project we will explore the neighborhood of Milano and search for a better place where to live depending of the hobbies and interests of a person.

## Data <a name="data"></a>

To solve this problem, I will need below data :
* geographical coordinate of Milano: the postal codes of Milano go from 20121 to 201612 so we will retrieve the geographical coordinates of these postal codes
* Forsquare API to get venues data related to Milano areas. The main focus will be on  gyms, swimming pools and fitness centers because these are my favorite hobbies.


## Methodology <a name="methodology"></a>

In [1]:
from geopy.geocoders import Nominatim

address =  'Milan Italy'

geolocator = Nominatim(user_agent="DreamJobCity")
location = geolocator.geocode(address)
latitudeMi = location.latitude
longitudeMi = location.longitude

print('The geograpical coordinate of {} are {}, {}.'.format(address, latitudeMi, longitudeMi))

The geograpical coordinate of Milan Italy are 45.4668, 9.1905.


In [2]:
# initialize the variables
lat_lng_coords = None
latitude = []
longitude = []

In [3]:
#get coordinates
for i in range(20121, 20162+1):
    location = geolocator.geocode(str(i), address)
    latitudeMi = location.latitude
    longitudeMi = location.longitude
    latitude.append(latitudeMi)
    longitude.append(longitudeMi)
    print('The geograpical coordinate of {} are {}, {}.'.format(str(i) + address, latitudeMi, longitudeMi))

The geograpical coordinate of 20121Milan Italy are 45.47209965286037, 9.188083637357634.
The geograpical coordinate of 20122Milan Italy are 45.461913126654856, 9.196374983587853.
The geograpical coordinate of 20123Milan Italy are 45.4632179225897, 9.177475393185716.
The geograpical coordinate of 20124Milan Italy are 45.4831028, 9.1994731.
The geograpical coordinate of 20125Milan Italy are 45.4996703215885, 9.204921034636818.
The geograpical coordinate of 20126Milan Italy are 21.9308311, -102.2843864.
The geograpical coordinate of 20127Milan Italy are 45.496602297442486, 9.220526978547303.
The geograpical coordinate of 20128Milan Italy are 45.51493449289599, 9.225577792516473.
The geograpical coordinate of 20129Milan Italy are 45.47140199137647, 9.213718798757128.
The geograpical coordinate of 20130Milan Italy are 43.24453390258106, -1.990582383566761.
The geograpical coordinate of 20131Milan Italy are 45.48376029871447, 9.222420693236819.
The geograpical coordinate of 20132Milan Italy 

In [4]:
#create a dictionary with PostalCode, Latitude and Longitude
import numpy as np

data = {'Postal Code': np.arange(20121, 20162+1), 'Latitude': latitude, 'Longitude': longitude}

In [5]:
#create the dataframe with geograpical data
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


milan_df = pd.DataFrame(data, columns=['Postal Code', 'Latitude', 'Longitude'])

In [6]:
milan_df.head(10)

Unnamed: 0,Postal Code,Latitude,Longitude
0,20121,45.4721,9.188084
1,20122,45.461913,9.196375
2,20123,45.463218,9.177475
3,20124,45.483103,9.199473
4,20125,45.49967,9.204921
5,20126,21.930831,-102.284386
6,20127,45.496602,9.220527
7,20128,45.514934,9.225578
8,20129,45.471402,9.213719
9,20130,43.244534,-1.990582


Get office geograpical coordinates

In [7]:
office = 'Piazza Gae Aulenti, 20124 Milano' # IBM Client Center address

geolocator = Nominatim(user_agent="DreamJobCity")
location = geolocator.geocode(office)
latitudeOffice = location.latitude
longitudeOffice = location.longitude

print('The geographical coordinate of {} are {}, {}.'.format(office, latitudeOffice, longitudeOffice))

The geographical coordinate of Piazza Gae Aulenti, 20124 Milano are 45.483456000000004, 9.190440363642619.


### Explore the venues in each area in order to find the most enjoyable area
For me an enjoyable area means and area with a large number of swimming pools and gyms

* Define Foursquare credentials¶


In [41]:
CLIENT_ID = "JXTIAE2GONXLA2AZ3WPAUDWDEQVO2Y02AQ3G2XHH44D4Q0G5"
CLIENT_SECRET = "F0UIQ0H5ICBLLZSWBIZ44RHZ2TBZIDGIKOW1KYXC3ZQT0V4P"

VERSION = '20200620'
RADIUS = 2000
LIMIT = 100
VENUES_URI = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}"

In [9]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [20]:
#Flatten JSON into a dataframe
from pandas.io.json import json_normalize

# function that extracts data from Forsquare response
def get_result_data(results):
    data = results['response']['groups'][0]['items']
    venues = json_normalize(data)
    
    #Consider only the columns 'venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng'
    filtered_columns = ['venue.id', 'venue.name', 'venue.categories', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']
    venues = venues.loc[:, filtered_columns]
    venues['venue.categories'] = venues.apply(get_category_type, axis=1)
    venues.columns = [col.split(".")[-1] for col in venues.columns]
    
    return venues

* Search for Swimming Pools

In [14]:
#Fetch the top 100 venues that are in Milan
import requests

query = 'Swimming Pool'
resultsSwimming = requests.get( VENUES_URI.format(CLIENT_ID, CLIENT_SECRET, latitudeOffice, longitudeOffice, VERSION, query, RADIUS, LIMIT) ).json()
resultsSwimming

{'meta': {'code': 200, 'requestId': '5eee7248bae9a2001b4f4dc4'},
 'response': {'headerLocation': 'Zona 9',
  'headerFullLocation': 'Zona 9, Milan',
  'headerLocationGranularity': 'neighborhood',
  'query': 'swimming pool',
  'totalResults': 7,
  'suggestedBounds': {'ne': {'lat': 45.50145601800002,
    'lng': 9.216065839044733},
   'sw': {'lat': 45.46545598199999, 'lng': 9.164814888240505}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5244274e11d2a351e0178739',
       'name': 'Ceresio 7 Pools & Restaurant',
       'location': {'address': 'Via Ceresio, 7',
        'lat': 45.48402456390375,
        'lng': 9.179849343941378,
        'labeledLatLngs': [{'label': 'display',
          'lat': 45.48402456390375,
          'lng': 9.179849343941378}],
        'distance': 829,
 

In [21]:
nearby_venues = get_result_data(resultsSwimming)
nearby_venues.head()

Unnamed: 0,id,name,categories,address,lat,lng
0,5244274e11d2a351e0178739,Ceresio 7 Pools & Restaurant,Italian Restaurant,"Via Ceresio, 7",45.484025,9.179849
1,4bd198dbb221c9b66ebfd5d0,Piscina Cozzi,Pool,Viale Tunisia 35,45.478397,9.201517
2,4ba4f699f964a52053c938e3,Miele,Pool,Via Lambertenghi 12,45.489098,9.186258
3,4c8b62663dc2a1cd1fa7b532,Centro Sportivo Murat,Pool,"Via Dino Villani, 2",45.499327,9.19058
4,4c1b498f63750f474498b467,Virgin Solarium Terrazzo,Gym Pool,Corso como,45.482559,9.187175
5,4bc76f5f6501c9b6a86d3e29,Piscina Bacone,Pool,Via Piccinni 10,45.48263,9.214902
6,4ba68fd0f964a5208b5e39e3,"Boscolo Milano, Autograph Collection",Hotel,Corso Matteotti 4/6,45.466908,9.194086


In [22]:
#Count venues returned by Foursquare?
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

7 venues were returned by Foursquare.


* Search for Gyms

In [23]:
gym_query = 'Gym'
resultsGym = requests.get( VENUES_URI.format(CLIENT_ID, CLIENT_SECRET, latitudeOffice, longitudeOffice, VERSION, gym_query, RADIUS, LIMIT) ).json()
resultsGym

{'meta': {'code': 200, 'requestId': '5eee736a71c428001b3416bb'},
 'response': {'headerLocation': 'Zona 9',
  'headerFullLocation': 'Zona 9, Milan',
  'headerLocationGranularity': 'neighborhood',
  'query': 'gym',
  'totalResults': 45,
  'suggestedBounds': {'ne': {'lat': 45.50145601800002,
    'lng': 9.216065839044733},
   'sw': {'lat': 45.46545598199999, 'lng': 9.164814888240505}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bd0949377b29c74120a8b82',
       'name': 'Virgin Active',
       'location': {'address': 'Corso Como, 15',
        'lat': 45.48263212145271,
        'lng': 9.186464398536367,
        'labeledLatLngs': [{'label': 'display',
          'lat': 45.48263212145271,
          'lng': 9.186464398536367}],
        'distance': 323,
        'postalCode': '20

In [24]:
nearby_venuesGym = get_result_data(resultsGym)
nearby_venuesGym.head()

Unnamed: 0,id,name,categories,address,lat,lng
0,4bd0949377b29c74120a8b82,Virgin Active,Gym,"Corso Como, 15",45.482632,9.186464
1,54245d31498e395bb7286579,Hard Candy Fitness Audace Repubblica,Gym,Via Parini 1,45.477589,9.195096
2,570a1fea498e9396ac3a8cc0,Excelsior Hotel Gallia Fitness Center,Gym / Fitness Center,"Piazza Duca d'Acosta, 9",45.485718,9.20142
3,55dab2b2498e8313350a2658,Virgin Active,Gym,Piazza Cavour,45.472222,9.196479
4,5526612e498eede4fe14e2f4,adidas Runbase,Gym,Corso Sempione 10,45.477511,9.170063
5,4c0bba94bbc676b0b5e84bd5,GetFIT,Gym / Fitness Center,Via Cenisio 10,45.488056,9.170602
6,4f74b75de4b08bc8e48b8239,Romans Club,Gym,Corso Sempione 30,45.479552,9.167653
7,59e77d3f6e4650734479f349,FitMi,Gym / Fitness Center,,45.486928,9.196585
8,4ba290cdf964a520e00438e3,GetFIT,Gym,"Viale Stelvio, 65",45.495446,9.181803
9,4bc04c9374a9a59348e2cff6,Greenline,Gym,"Via Procaccini, 36",45.483121,9.169461


In [25]:
#Count venues returned by Foursquare?
print('{} venues were returned by Foursquare.'.format(nearby_venuesGym.shape[0]))

45 venues were returned by Foursquare.


* Search for Parks

In [26]:
park_query = 'Park'
resultsPark = requests.get( VENUES_URI.format(CLIENT_ID, CLIENT_SECRET, latitudeOffice, longitudeOffice, VERSION, park_query, RADIUS, LIMIT) ).json()
resultsPark

{'meta': {'code': 200, 'requestId': '5eee74e49da7ee001bb47043'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Zona 9',
  'headerFullLocation': 'Zona 9, Milan',
  'headerLocationGranularity': 'neighborhood',
  'query': 'park',
  'totalResults': 29,
  'suggestedBounds': {'ne': {'lat': 45.50145601800002,
    'lng': 9.216065839044733},
   'sw': {'lat': 45.46545598199999, 'lng': 9.164814888240505}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bf54718706e20a1063daa98',
       'name': 'Giardini Indro Montanelli',
       'location': {'address': 'Corso Porta Venezia',
        'crossStreet': 'Via Palestro',
        'lat': 45.47435821048079,
        'lng': 9.200626804149993,
        '

In [27]:
nearby_venuesPark = get_result_data(resultsPark)
nearby_venuesPark.head()

Unnamed: 0,id,name,categories,address,lat,lng
0,4bf54718706e20a1063daa98,Giardini Indro Montanelli,Park,Corso Porta Venezia,45.474358,9.200627
1,4b05887bf964a52079c822e3,Parco Sempione,Park,Parco Sempione,45.472956,9.176876
2,59de4c5b3ba7671012779322,Biblioteca degli Alberi,Park,Piazza Gae Aulenti,45.484417,9.192857
3,4b05887bf964a52077c822e3,Giardini di Villa Reale,Park,Via Palestro,45.472127,9.199721
4,4c75435fff1fb60cada6f5a7,Giardini Perego,Park,Via dei Giardini 7/9,45.472107,9.192224
5,4b05887cf964a520dcc822e3,Castello Sforzesco,Castle,Piazza Castello 3,45.469545,9.180424
6,4b97fd80f964a520742435e3,Largo Claudio Treves,Park,Largo Treves,45.475239,9.186873
7,4fc1035de4b0cec93206aa4e,Laghetto del Parco Sempione,Lake,Parco Sempione,45.47301,9.17657
8,4cf51e527e0da1cd09b2a597,Parco Giochi Giardini di P.ta Venezia,Playground,Giardini Indro Montanelli,45.473753,9.202342
9,4b71ab41f964a520e7542de3,Piazza Castello,Plaza,Piazza Castello,45.468965,9.181312


In [28]:
#Count venues returned by Foursquare?
print('{} venues were returned by Foursquare.'.format(nearby_venuesPark.shape[0]))

24 venues were returned by Foursquare.


In [29]:
nearby_venues.head()

Unnamed: 0,id,name,categories,address,lat,lng
0,5244274e11d2a351e0178739,Ceresio 7 Pools & Restaurant,Italian Restaurant,"Via Ceresio, 7",45.484025,9.179849
1,4bd198dbb221c9b66ebfd5d0,Piscina Cozzi,Pool,Viale Tunisia 35,45.478397,9.201517
2,4ba4f699f964a52053c938e3,Miele,Pool,Via Lambertenghi 12,45.489098,9.186258
3,4c8b62663dc2a1cd1fa7b532,Centro Sportivo Murat,Pool,"Via Dino Villani, 2",45.499327,9.19058
4,4c1b498f63750f474498b467,Virgin Solarium Terrazzo,Gym Pool,Corso como,45.482559,9.187175


In [30]:
nearby_venuesGym.head()

Unnamed: 0,id,name,categories,address,lat,lng
0,4bd0949377b29c74120a8b82,Virgin Active,Gym,"Corso Como, 15",45.482632,9.186464
1,54245d31498e395bb7286579,Hard Candy Fitness Audace Repubblica,Gym,Via Parini 1,45.477589,9.195096
2,570a1fea498e9396ac3a8cc0,Excelsior Hotel Gallia Fitness Center,Gym / Fitness Center,"Piazza Duca d'Acosta, 9",45.485718,9.20142
3,55dab2b2498e8313350a2658,Virgin Active,Gym,Piazza Cavour,45.472222,9.196479
4,5526612e498eede4fe14e2f4,adidas Runbase,Gym,Corso Sempione 10,45.477511,9.170063


In [31]:
nearby_venuesPark.head()

Unnamed: 0,id,name,categories,address,lat,lng
0,4bf54718706e20a1063daa98,Giardini Indro Montanelli,Park,Corso Porta Venezia,45.474358,9.200627
1,4b05887bf964a52079c822e3,Parco Sempione,Park,Parco Sempione,45.472956,9.176876
2,59de4c5b3ba7671012779322,Biblioteca degli Alberi,Park,Piazza Gae Aulenti,45.484417,9.192857
3,4b05887bf964a52077c822e3,Giardini di Villa Reale,Park,Via Palestro,45.472127,9.199721
4,4c75435fff1fb60cada6f5a7,Giardini Perego,Park,Via dei Giardini 7/9,45.472107,9.192224


In [32]:
frames = [nearby_venues, nearby_venuesGym, nearby_venuesPark]
result = pd.concat(frames, ignore_index= True)
result

Unnamed: 0,id,name,categories,address,lat,lng
0,5244274e11d2a351e0178739,Ceresio 7 Pools & Restaurant,Italian Restaurant,"Via Ceresio, 7",45.484025,9.179849
1,4bd198dbb221c9b66ebfd5d0,Piscina Cozzi,Pool,Viale Tunisia 35,45.478397,9.201517
2,4ba4f699f964a52053c938e3,Miele,Pool,Via Lambertenghi 12,45.489098,9.186258
3,4c8b62663dc2a1cd1fa7b532,Centro Sportivo Murat,Pool,"Via Dino Villani, 2",45.499327,9.19058
4,4c1b498f63750f474498b467,Virgin Solarium Terrazzo,Gym Pool,Corso como,45.482559,9.187175
5,4bc76f5f6501c9b6a86d3e29,Piscina Bacone,Pool,Via Piccinni 10,45.48263,9.214902
6,4ba68fd0f964a5208b5e39e3,"Boscolo Milano, Autograph Collection",Hotel,Corso Matteotti 4/6,45.466908,9.194086
7,4bd0949377b29c74120a8b82,Virgin Active,Gym,"Corso Como, 15",45.482632,9.186464
8,54245d31498e395bb7286579,Hard Candy Fitness Audace Repubblica,Gym,Via Parini 1,45.477589,9.195096
9,570a1fea498e9396ac3a8cc0,Excelsior Hotel Gallia Fitness Center,Gym / Fitness Center,"Piazza Duca d'Acosta, 9",45.485718,9.20142


In [36]:
milan_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,20121,45.4721,9.188084
1,20122,45.461913,9.196375
2,20123,45.463218,9.177475
3,20124,45.483103,9.199473
4,20125,45.49967,9.204921


### Explore Neighborhoods in Milan

In [42]:
def getNearbyVenues(names, latitudes, longitudes, query, radius=1000):

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&query={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            query,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        #return print(requests.get(url).json())
        results = requests.get(url).json()["response"]['groups'][0]['items']
       
        # return only relevant information for each nearby venue
        for v in results:
            if 'address' in v['venue']['location']:
                venues_list.append([(
                    name,
                    lat,
                    lng,
                    v['venue']['name'],
                    v['venue']['id'],
                    v['venue']['location']['lat'],
                    v['venue']['location']['lng'],
                    v['venue']['location']['address'],
                    v['venue']['categories'][0]['name'])])
            else:
                venues_list.append([(
                    name,
                    lat,
                    lng,
                    v['venue']['name'],
                    v['venue']['id'],
                    v['venue']['location']['lat'],
                    v['venue']['location']['lng'],
                    np.nan,
                    v['venue']['categories'][0]['name'])])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code',
                             'Postal Code Latitude',
                             'Postal Code Longitude',
                             'Venue',
                             'Venue ID',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Address',
                             'Venue Category']
    return(nearby_venues)

In [43]:
pool_venues = getNearbyVenues(milan_df['Postal Code'], milan_df['Latitude'], milan_df['Longitude'], 'Swimming Pool')

In [45]:
print(len(pool_venues['Postal Code'].unique()))
pool_venues['Postal Code'].unique()

25


array([20121, 20122, 20123, 20124, 20127, 20128, 20129, 20131, 20135,
       20137, 20138, 20139, 20141, 20144, 20145, 20148, 20149, 20150,
       20151, 20154, 20157, 20158, 20159, 20161, 20162])

##### We can see that for 'Swimming pool' there are multiple postal codes missing in our research: 20125, 20126, 20130, 20132, 20133, 20134, 20136, 20140, 20142, 20143, 20146, 20147, 20152, 20153, 20155, 20156, 20160

In [46]:
gym_venues = getNearbyVenues(milan_df['Postal Code'], milan_df['Latitude'], milan_df['Longitude'], 'Gym')

In [47]:
gym_venues_with_nan = gym_venues

In [48]:
len(gym_venues['Postal Code'].unique())
gym_venues['Postal Code'].unique()

array([20121, 20122, 20123, 20124, 20125, 20126, 20127, 20128, 20129,
       20131, 20133, 20134, 20135, 20137, 20138, 20139, 20140, 20141,
       20144, 20145, 20146, 20147, 20148, 20149, 20150, 20151, 20154,
       20156, 20158, 20159, 20161, 20162])

##### Gyms are missing in the follow areas: 20130, 20132, 20136, 20142, 20143, 20152, 20153, 20155, 20157, 20160

In [49]:
park_venues = getNearbyVenues(milan_df['Postal Code'], milan_df['Latitude'], milan_df['Longitude'], 'Park')

In [50]:
park_venues_with_nan = park_venues

In [51]:

print(len(park_venues['Postal Code'].unique()))
park_venues['Postal Code'].unique()

33


array([20121, 20122, 20123, 20124, 20125, 20126, 20127, 20128, 20129,
       20131, 20133, 20134, 20135, 20137, 20138, 20139, 20140, 20141,
       20144, 20145, 20146, 20147, 20148, 20149, 20151, 20153, 20154,
       20156, 20157, 20158, 20159, 20161, 20162])

#####  Parks are not present only in this areas: 20130, 20132, 20136, 20142, 20143, 20150, 20152, 20155, 20160

In [52]:
print('pool_venues: ', pool_venues.shape)
print('gym_venues: ', gym_venues.shape)
print('park_venues: ', park_venues.shape)

pool_venues:  (66, 9)
gym_venues:  (259, 9)
park_venues:  (155, 9)


In [53]:
#join the 3 dataframe
frames = [pool_venues, gym_venues, park_venues]

milan_venues = pd.concat(frames, ignore_index= True)
print('milan_venues: ', milan_venues.shape)
milan_venues.head()

milan_venues:  (480, 9)


Unnamed: 0,Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Address,Venue Category
0,20121,45.4721,9.188084,acqua go,54fefc01498ea7e049439d53,45.477745,9.184486,,Gym Pool
1,20121,45.4721,9.188084,"Boscolo Milano, Autograph Collection",4ba68fd0f964a5208b5e39e3,45.466908,9.194086,Corso Matteotti 4/6,Hotel
2,20122,45.461913,9.196375,Physioclinic,4c9daae40e9bb1f744c1df5f,45.461423,9.205257,Via Fontana 18,Gym
3,20123,45.463218,9.177475,Centro Sportivo San Carlo,4bee990f3686c9b6fea9246e,45.464768,9.169795,Via Zenale 6,Gym
4,20123,45.463218,9.177475,Piscina Conca del Naviglio,541803af498e431f31eaa085,45.455677,9.177178,,Pool


In [54]:
#check the postal code resulting
milan_venues.sort_values(by=['Postal Code'])['Postal Code'].unique()

array([20121, 20122, 20123, 20124, 20125, 20126, 20127, 20128, 20129,
       20131, 20133, 20134, 20135, 20137, 20138, 20139, 20140, 20141,
       20144, 20145, 20146, 20147, 20148, 20149, 20150, 20151, 20153,
       20154, 20156, 20157, 20158, 20159, 20161, 20162])

In [55]:
len(milan_venues.sort_values(by=['Postal Code'], ascending=True)['Postal Code'].unique())

34

## Analysis <a name="analysis"></a>

In [56]:
# one hot encoding
milan_onehot = pd.get_dummies(milan_venues[['Venue Category']], prefix="", prefix_sep="")
milan_onehot.head()

Unnamed: 0,Art Gallery,Art Museum,Boxing Gym,Building,Campground,Castle,Climbing Gym,College Gym,Dance Studio,Garden,General Entertainment,Gym,Gym / Fitness Center,Gym Pool,Hotel,Italian Restaurant,Lake,Martial Arts Dojo,Monument / Landmark,Office,Park,Parking,Playground,Plaza,Pool,Resort,Road,Skate Park,Spa,Sports Club,Stadium,Track,Water Park,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [57]:
milan_onehot.columns

Index(['Art Gallery', 'Art Museum', 'Boxing Gym', 'Building', 'Campground',
       'Castle', 'Climbing Gym', 'College Gym', 'Dance Studio', 'Garden',
       'General Entertainment', 'Gym', 'Gym / Fitness Center', 'Gym Pool',
       'Hotel', 'Italian Restaurant', 'Lake', 'Martial Arts Dojo',
       'Monument / Landmark', 'Office', 'Park', 'Parking', 'Playground',
       'Plaza', 'Pool', 'Resort', 'Road', 'Skate Park', 'Spa', 'Sports Club',
       'Stadium', 'Track', 'Water Park', 'Yoga Studio'],
      dtype='object')

In [58]:
#Add Postal Code column back to dataframe
milan_onehot['Postal Code'] = milan_venues['Postal Code']
milan_onehot.head()

Unnamed: 0,Art Gallery,Art Museum,Boxing Gym,Building,Campground,Castle,Climbing Gym,College Gym,Dance Studio,Garden,General Entertainment,Gym,Gym / Fitness Center,Gym Pool,Hotel,Italian Restaurant,Lake,Martial Arts Dojo,Monument / Landmark,Office,Park,Parking,Playground,Plaza,Pool,Resort,Road,Skate Park,Spa,Sports Club,Stadium,Track,Water Park,Yoga Studio,Postal Code
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20121
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20121
2,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20122
3,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20123
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,20123


In [59]:
# move neighborhood column to the first column
fixed_columns = [milan_onehot.columns[-1]] + list(milan_onehot.columns[:-1])
milan_onehot = milan_onehot[fixed_columns]

In [60]:
milan_onehot.head()

Unnamed: 0,Postal Code,Art Gallery,Art Museum,Boxing Gym,Building,Campground,Castle,Climbing Gym,College Gym,Dance Studio,Garden,General Entertainment,Gym,Gym / Fitness Center,Gym Pool,Hotel,Italian Restaurant,Lake,Martial Arts Dojo,Monument / Landmark,Office,Park,Parking,Playground,Plaza,Pool,Resort,Road,Skate Park,Spa,Sports Club,Stadium,Track,Water Park,Yoga Studio
0,20121,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,20121,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,20122,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,20123,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,20123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0


In [61]:
# examine the new dataframe size
print("Shape of dataset milan_onehot: ", milan_onehot.shape)

Shape of dataset milan_onehot:  (480, 35)


In [64]:
# group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
milan_grouped = milan_onehot.groupby('Postal Code').mean().reset_index()
milan_grouped.head()

Unnamed: 0,Postal Code,Art Gallery,Art Museum,Boxing Gym,Building,Campground,Castle,Climbing Gym,College Gym,Dance Studio,Garden,General Entertainment,Gym,Gym / Fitness Center,Gym Pool,Hotel,Italian Restaurant,Lake,Martial Arts Dojo,Monument / Landmark,Office,Park,Parking,Playground,Plaza,Pool,Resort,Road,Skate Park,Spa,Sports Club,Stadium,Track,Water Park,Yoga Studio
0,20121,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.241379,0.172414,0.034483,0.034483,0.0,0.034483,0.0,0.034483,0.0,0.241379,0.0,0.0,0.068966,0.0,0.0,0.034483,0.0,0.034483,0.0,0.0,0.0,0.0,0.034483
1,20122,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.038462,0.307692,0.269231,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.153846,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0
2,20123,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.277778,0.166667,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.055556,0.055556,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.083333
3,20124,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.173913,0.26087,0.130435,0.0,0.0,0.0,0.043478,0.0,0.0,0.304348,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478
4,20125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0


In [65]:
#print each postal code along with the top 5 most common venues
num_top_venues = 5

In [66]:
#cast Postal Code to string
milan_grouped['Postal Code'] = milan_grouped['Postal Code'].astype(str)

In [67]:
for hood in milan_grouped['Postal Code']:
    print("----"+hood+"----")
    temp = milan_grouped[milan_grouped['Postal Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----20121----
                  venue  freq
0                  Park  0.24
1                   Gym  0.24
2  Gym / Fitness Center  0.17
3                 Plaza  0.07
4                  Road  0.03


----20122----
                   venue  freq
0                    Gym  0.31
1   Gym / Fitness Center  0.27
2                   Park  0.15
3            Art Gallery  0.04
4  General Entertainment  0.04


----20123----
                  venue  freq
0                   Gym  0.28
1                  Park  0.22
2  Gym / Fitness Center  0.17
3           Yoga Studio  0.08
4                 Plaza  0.06


----20124----
                  venue  freq
0                  Park  0.30
1  Gym / Fitness Center  0.26
2                   Gym  0.17
3              Gym Pool  0.13
4     Martial Arts Dojo  0.04


----20125----
                  venue  freq
0                  Park  0.33
1                 Track  0.17
2            Playground  0.17
3  Gym / Fitness Center  0.17
4                   Gym  0.17


----20126----


In [68]:
#sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]

In [69]:
#create the new dataframe and display the top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
# Create a new dataframe
postalcode_venues_sorted = pd.DataFrame(columns=columns)
postalcode_venues_sorted['Postal Code'] = milan_grouped['Postal Code']
for ind in np.arange(milan_grouped.shape[0]):
        postalcode_venues_sorted.iloc[ind, 1:] = return_most_common_venues(milan_grouped.iloc[ind, :], num_top_venues)        

In [70]:
postalcode_venues_sorted.head()

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,20121,Gym,Park,Gym / Fitness Center,Plaza,Yoga Studio,Castle,Gym Pool,Hotel,Monument / Landmark,Lake
1,20122,Gym,Gym / Fitness Center,Park,Plaza,Martial Arts Dojo,Monument / Landmark,General Entertainment,Art Gallery,Spa,College Gym
2,20123,Gym,Park,Gym / Fitness Center,Yoga Studio,Plaza,Pool,Spa,Road,Castle,Gym Pool
3,20124,Park,Gym / Fitness Center,Gym,Gym Pool,Martial Arts Dojo,Yoga Studio,Pool,Castle,Climbing Gym,College Gym
4,20125,Park,Track,Gym / Fitness Center,Gym,Playground,Yoga Studio,Dance Studio,General Entertainment,Garden,College Gym
5,20126,Gym / Fitness Center,Gym,Park,Yoga Studio,Dance Studio,Gym Pool,General Entertainment,Garden,College Gym,Italian Restaurant
6,20127,Park,Gym,Gym / Fitness Center,Gym Pool,Pool,Martial Arts Dojo,Playground,Plaza,Track,Climbing Gym
7,20128,Pool,Gym,Park,Martial Arts Dojo,Climbing Gym,Dance Studio,Gym / Fitness Center,General Entertainment,Garden,Yoga Studio
8,20129,Gym / Fitness Center,Gym,Park,Yoga Studio,College Gym,Playground,Plaza,Pool,Building,Gym Pool
9,20131,Gym,Gym / Fitness Center,Park,Pool,Plaza,Yoga Studio,Martial Arts Dojo,College Gym,Castle,Climbing Gym


In [71]:
#drop the column Postal Code since kmeans can run only on numerical data
milan_grouped_clustering = milan_grouped.drop('Postal Code', 1)

milan_grouped_clustering.head()

Unnamed: 0,Art Gallery,Art Museum,Boxing Gym,Building,Campground,Castle,Climbing Gym,College Gym,Dance Studio,Garden,General Entertainment,Gym,Gym / Fitness Center,Gym Pool,Hotel,Italian Restaurant,Lake,Martial Arts Dojo,Monument / Landmark,Office,Park,Parking,Playground,Plaza,Pool,Resort,Road,Skate Park,Spa,Sports Club,Stadium,Track,Water Park,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.241379,0.172414,0.034483,0.034483,0.0,0.034483,0.0,0.034483,0.0,0.241379,0.0,0.0,0.068966,0.0,0.0,0.034483,0.0,0.034483,0.0,0.0,0.0,0.0,0.034483
1,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.038462,0.307692,0.269231,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.153846,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0
2,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.277778,0.166667,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.055556,0.055556,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.083333
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.173913,0.26087,0.130435,0.0,0.0,0.0,0.043478,0.0,0.0,0.304348,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0


### Cluster Neighborhoods¶

Run k-means to cluster the neighborhood into 5 clusters.

In [72]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(milan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 2, 2, 2, 1, 2, 0, 1, 2], dtype=int32)

In [73]:
milan_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,20121,45.4721,9.188084
1,20122,45.461913,9.196375
2,20123,45.463218,9.177475
3,20124,45.483103,9.199473
4,20125,45.49967,9.204921


In [96]:
#create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood
milan_merged = milan_df.copy()

#prepare Postal Code form merfing
milan_merged['Postal Code'] = milan_merged['Postal Code'].astype(str)
postalcode_venues_sorted['Postal Code'] = postalcode_venues_sorted['Postal Code'].astype(str)


In [97]:
# merge milano_grouped with milano_df
milan_merged = milan_merged.join(postalcode_venues_sorted.set_index('Postal Code'), on='Postal Code')

In [100]:
milan_merged.head()

Unnamed: 0,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,20121,45.4721,9.188084,2.0,Gym,Park,Gym / Fitness Center,Plaza,Yoga Studio,Castle,Gym Pool,Hotel,Monument / Landmark,Lake
1,20122,45.461913,9.196375,1.0,Gym,Gym / Fitness Center,Park,Plaza,Martial Arts Dojo,Monument / Landmark,General Entertainment,Art Gallery,Spa,College Gym
2,20123,45.463218,9.177475,2.0,Gym,Park,Gym / Fitness Center,Yoga Studio,Plaza,Pool,Spa,Road,Castle,Gym Pool
3,20124,45.483103,9.199473,2.0,Park,Gym / Fitness Center,Gym,Gym Pool,Martial Arts Dojo,Yoga Studio,Pool,Castle,Climbing Gym,College Gym
4,20125,45.49967,9.204921,2.0,Park,Track,Gym / Fitness Center,Gym,Playground,Yoga Studio,Dance Studio,General Entertainment,Garden,College Gym
5,20126,21.930831,-102.284386,1.0,Gym / Fitness Center,Gym,Park,Yoga Studio,Dance Studio,Gym Pool,General Entertainment,Garden,College Gym,Italian Restaurant
6,20127,45.496602,9.220527,2.0,Park,Gym,Gym / Fitness Center,Gym Pool,Pool,Martial Arts Dojo,Playground,Plaza,Track,Climbing Gym
7,20128,45.514934,9.225578,0.0,Pool,Gym,Park,Martial Arts Dojo,Climbing Gym,Dance Studio,Gym / Fitness Center,General Entertainment,Garden,Yoga Studio
8,20129,45.471402,9.213719,1.0,Gym / Fitness Center,Gym,Park,Yoga Studio,College Gym,Playground,Plaza,Pool,Building,Gym Pool
9,20130,43.244534,-1.990582,,,,,,,,,,,


In [108]:
#Remove NaN Values

milan_merged = milan_merged.dropna()

### Visualize the resulting clusters¶

In [103]:
#install folium if missing
!conda install -c conda-forge folium=0.5.0 --yes

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.1.0               |             py_1         614 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    ca-certificates-2020.6.20  |       hecda079_0         145 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    certifi-2020.4.5.2         |   py36h9f0ad1d_0         152 KB  conda-forge
    branca-0.4.1               |             py_0          26 KB  conda-forge
    ------------------------------------------------------------
                       

In [109]:
milan_merged

Unnamed: 0,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,20121,45.4721,9.188084,2.0,Gym,Park,Gym / Fitness Center,Plaza,Yoga Studio,Castle,Gym Pool,Hotel,Monument / Landmark,Lake
1,20122,45.461913,9.196375,1.0,Gym,Gym / Fitness Center,Park,Plaza,Martial Arts Dojo,Monument / Landmark,General Entertainment,Art Gallery,Spa,College Gym
2,20123,45.463218,9.177475,2.0,Gym,Park,Gym / Fitness Center,Yoga Studio,Plaza,Pool,Spa,Road,Castle,Gym Pool
3,20124,45.483103,9.199473,2.0,Park,Gym / Fitness Center,Gym,Gym Pool,Martial Arts Dojo,Yoga Studio,Pool,Castle,Climbing Gym,College Gym
4,20125,45.49967,9.204921,2.0,Park,Track,Gym / Fitness Center,Gym,Playground,Yoga Studio,Dance Studio,General Entertainment,Garden,College Gym
5,20126,21.930831,-102.284386,1.0,Gym / Fitness Center,Gym,Park,Yoga Studio,Dance Studio,Gym Pool,General Entertainment,Garden,College Gym,Italian Restaurant
6,20127,45.496602,9.220527,2.0,Park,Gym,Gym / Fitness Center,Gym Pool,Pool,Martial Arts Dojo,Playground,Plaza,Track,Climbing Gym
7,20128,45.514934,9.225578,0.0,Pool,Gym,Park,Martial Arts Dojo,Climbing Gym,Dance Studio,Gym / Fitness Center,General Entertainment,Garden,Yoga Studio
8,20129,45.471402,9.213719,1.0,Gym / Fitness Center,Gym,Park,Yoga Studio,College Gym,Playground,Plaza,Pool,Building,Gym Pool
10,20131,45.48376,9.222421,2.0,Gym,Gym / Fitness Center,Park,Pool,Plaza,Yoga Studio,Martial Arts Dojo,College Gym,Castle,Climbing Gym


In [110]:
import folium

import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitudeMi, longitudeMi], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(milan_merged['Latitude'], milan_merged['Longitude'], milan_merged['Postal Code'], milan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters    

## Examine Clusters

In [111]:
#Cluster 1
milan_merged.loc[milan_merged['Cluster Labels'] == 0, milan_merged.columns[[0] + list(range(4, milan_merged.shape[1]))]]

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,20128,Pool,Gym,Park,Martial Arts Dojo,Climbing Gym,Dance Studio,Gym / Fitness Center,General Entertainment,Garden,Yoga Studio
13,20134,Park,Gym,Gym / Fitness Center,Yoga Studio,Dance Studio,Gym Pool,General Entertainment,Garden,College Gym,Italian Restaurant
14,20135,Park,Gym,Pool,Martial Arts Dojo,Monument / Landmark,Yoga Studio,Sports Club,Boxing Gym,Campground,Castle
16,20137,Park,Gym,Pool,Boxing Gym,Martial Arts Dojo,Monument / Landmark,College Gym,Garden,Gym / Fitness Center,General Entertainment
17,20138,Pool,Gym,Park,Climbing Gym,Dance Studio,Gym Pool,Gym / Fitness Center,General Entertainment,Garden,Yoga Studio
25,20146,Park,Gym / Fitness Center,Gym,Martial Arts Dojo,Yoga Studio,Italian Restaurant,Gym Pool,General Entertainment,Garden,Dance Studio
27,20148,Park,Pool,Playground,Track,Gym,Hotel,Gym / Fitness Center,General Entertainment,Garden,Dance Studio
30,20151,Park,Gym,Pool,Gym / Fitness Center,Sports Club,Office,College Gym,General Entertainment,Garden,Dance Studio
36,20157,Park,Pool,Italian Restaurant,Gym Pool,Gym / Fitness Center,Gym,General Entertainment,Garden,Dance Studio,Yoga Studio


In [112]:
#Cluster 2
milan_merged.loc[milan_merged['Cluster Labels'] == 1, milan_merged.columns[[0] + list(range(4, milan_merged.shape[1]))]]

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,20122,Gym,Gym / Fitness Center,Park,Plaza,Martial Arts Dojo,Monument / Landmark,General Entertainment,Art Gallery,Spa,College Gym
5,20126,Gym / Fitness Center,Gym,Park,Yoga Studio,Dance Studio,Gym Pool,General Entertainment,Garden,College Gym,Italian Restaurant
8,20129,Gym / Fitness Center,Gym,Park,Yoga Studio,College Gym,Playground,Plaza,Pool,Building,Gym Pool
18,20139,Gym / Fitness Center,Gym Pool,Gym,Park,Pool,Italian Restaurant,General Entertainment,Garden,Dance Studio,Yoga Studio
19,20140,Gym / Fitness Center,Park,Yoga Studio,Dance Studio,Gym Pool,Gym,General Entertainment,Garden,College Gym,Italian Restaurant
23,20144,Gym / Fitness Center,Park,Yoga Studio,Pool,Spa,Gym,Martial Arts Dojo,Building,Campground,Castle
29,20150,Gym / Fitness Center,Hotel,Gym,Yoga Studio,Resort,Pool,Spa,Campground,Castle,Building
40,20161,Gym / Fitness Center,Park,Pool,Gym,Italian Restaurant,Gym Pool,General Entertainment,Garden,Dance Studio,Yoga Studio


In [113]:
#Cluster 3
milan_merged.loc[milan_merged['Cluster Labels'] == 2, milan_merged.columns[[0] + list(range(4, milan_merged.shape[1]))]]

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,20121,Gym,Park,Gym / Fitness Center,Plaza,Yoga Studio,Castle,Gym Pool,Hotel,Monument / Landmark,Lake
2,20123,Gym,Park,Gym / Fitness Center,Yoga Studio,Plaza,Pool,Spa,Road,Castle,Gym Pool
3,20124,Park,Gym / Fitness Center,Gym,Gym Pool,Martial Arts Dojo,Yoga Studio,Pool,Castle,Climbing Gym,College Gym
4,20125,Park,Track,Gym / Fitness Center,Gym,Playground,Yoga Studio,Dance Studio,General Entertainment,Garden,College Gym
6,20127,Park,Gym,Gym / Fitness Center,Gym Pool,Pool,Martial Arts Dojo,Playground,Plaza,Track,Climbing Gym
10,20131,Gym,Gym / Fitness Center,Park,Pool,Plaza,Yoga Studio,Martial Arts Dojo,College Gym,Castle,Climbing Gym
20,20141,Gym,Park,Pool,Hotel,Martial Arts Dojo,Gym / Fitness Center,General Entertainment,Garden,Dance Studio,Yoga Studio
24,20145,Park,Gym / Fitness Center,Gym,Yoga Studio,Martial Arts Dojo,Gym Pool,Italian Restaurant,General Entertainment,Garden,Dance Studio
26,20147,Park,Gym,Dance Studio,Martial Arts Dojo,Gym / Fitness Center,Italian Restaurant,Gym Pool,General Entertainment,Garden,Yoga Studio
28,20149,Pool,Gym,Park,Gym / Fitness Center,Hotel,Yoga Studio,Stadium,Campground,Castle,Climbing Gym


In [114]:
#Cluster 4
milan_merged.loc[milan_merged['Cluster Labels'] == 3, milan_merged.columns[[0] + list(range(4, milan_merged.shape[1]))]]

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,20133,Gym,College Gym,Park,Garden,Dance Studio,Gym Pool,Gym / Fitness Center,General Entertainment,Yoga Studio,Italian Restaurant
35,20156,Gym,Park,Yoga Studio,Dance Studio,Gym Pool,Gym / Fitness Center,General Entertainment,Garden,College Gym,Italian Restaurant


In [115]:
#Cluster 5
milan_merged.loc[milan_merged['Cluster Labels'] == 4, milan_merged.columns[[0] + list(range(4, milan_merged.shape[1]))]]

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,20153,Water Park,Campground,Park,Garden,Hotel,Gym Pool,Gym / Fitness Center,Gym,General Entertainment,Yoga Studio


## Results and Discussion <a name="results"></a>

Our analysis show that a large number of Swimming Pools, Gyms and Parks are present in Milan, however hobbies relevance is different for everyone. For me the most important hobbie is swimming followed by gym and park walks. So the search for my new home will be focused in the areas(postal code) where the most common venues will allow me to enjoy my hobbies and be as close as possibile to the office.
* **Cluster 1**: i will not choose this cluster because the most common venues are parks, but parks i my last favorite hobby so will not be most suited place in town
* **Cluster 2**: is the second most crowded cluster but is not a good choice because the most common venues are Gyms and Parks. I know that is 2 of 3 but my office address is not in this cluster.
* **Cluster 3**: is the biggest and the most relevant cluster because contains my office Postal Code **20124**. I can see that the  most common venues are in revese order of preference, however all my hobbies are the most common venues in many areas of this cluster, for example in *20149* pools are the most common venue, in *20162* are the second one.
* **Cluster 4**: is the second small cluster where gyms are the most common venues but there few swimming pools are present
* **Cluster 5**: is the smallest cluster with only one postal code where my favorite activity is in 6th position

The analysis reveal that the most suited area for me is the **20124**, but this area should be considered only as starting point because all the nearby areas( *Cluster 3*) of this postal code are suited to my lifestyle.

## Conclusion <a name="conclusion"></a>

The main purpose of this project was to identify most suitable location for living in Milan in order to aid me have my Dream Job without limit my way of life.
Final decission wil be made based on additional factors like real estate availability and prices.