<h1> The best place to open a new Pizza Restaurant in Milan

<p style="text-align:justify">In this Data Science project, we look for the best place in Milan to open a new pizza restaurant close to universities. The data are clustered and analyzed in order to find the best location for our brand-new restaurant.</p>

<h3> Data Collection

Data are collected in different ways:
<ul>
  <li>The list of districts of Milan, Italy, read from an external file, "Milan_Districts.xlsx". It contains the districts of Milan together with their population and population density</li>
  <li>The latitude and longitude of the districts, obtained by geocoding their postal codes (ArcGIS through the geocoder library; the city centre itself is located with Nominatim, the OpenStreetMap geocoder)</li>
    <li>The list of universities with their postal codes, read from an external file, "List_University.xlsx"</li>
    <li>The list of venues, retrieved from the Foursquare API</li>
</ul>
In this section the data collection is described:     

In [80]:
 
#-- Importing Libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
import xml 
 
#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; formerly in pandas.io.json)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

from bs4 import BeautifulSoup #library for web scraping

print('Libraries imported.')


Libraries imported.


<h4> Import Districts Names and Zones

In [82]:
df = pd.read_excel("Milan_Districts.xlsx")

In [83]:
#-- Create districts dataframe
columns_name = ['Zone_Number','Zone','Name']
districts = pd.DataFrame(columns = columns_name)
district_name = []
zone_name = []
zone_number = []
for i in range(len(df['No.'])):
    line_att = df['Districts'][i]
    line_att = line_att.replace('\xa0','')
    line_att = line_att.upper()
    line_att = line_att.split(',')
    for j in range(len(line_att)):
        district_name.append(line_att[j])
        zone_name.append(df['Name'][i])
        zone_number.append(df['No.'][i])

districts['Zone']=zone_name
districts['Name']=district_name
districts['Zone_Number']=zone_number
districts.head()

Unnamed: 0,Zone_Number,Zone,Name
0,1,Centro Storico,BRERA
1,1,Centro Storico,CENTRO STORICO
2,1,Centro Storico,CONCA DEL NAVIGLIO
3,1,Centro Storico,GUASTALLA
4,1,Centro Storico,PORTA SEMPIONE
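
The nested loop above can also be written with pandas string methods. The following is a minimal sketch rather than the notebook's original code: it assumes a frame shaped like the first sheet of "Milan_Districts.xlsx" (columns `No.`, `Name`, `Districts`), the helper name `explode_districts` is hypothetical, and the two sample rows are shortened for illustration. Unlike the loop, it also trims ordinary spaces around each district name.

```python
import pandas as pd

# Illustrative stand-in for the first sheet of Milan_Districts.xlsx
sample = pd.DataFrame({
    'No.': [1, 2],
    'Name': ['Centro Storico', 'Zone Two (placeholder)'],
    'Districts': ['Brera,\xa0Guastalla', 'Gorla, Turro, Greco'],
})

def explode_districts(df):
    """Return one row per district with its zone number and zone name."""
    names = (
        df['Districts']
        .str.replace('\xa0', '', regex=False)  # drop non-breaking spaces
        .str.upper()
        .str.split(',')                        # one list of districts per zone
    )
    out = df[['No.', 'Name']].assign(District=names).explode('District')
    out['District'] = out['District'].str.strip()  # trim ordinary spaces too
    out = out.rename(columns={'No.': 'Zone_Number', 'Name': 'Zone', 'District': 'Name'})
    return out.reset_index(drop=True)

print(explode_districts(sample))
```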


<h4> Import districts latitude and longitude

In [84]:
#-- import CAP
df_CAP = pd.read_excel('Milan_Districts.xlsx',sheet_name = 'CAP')
df_CAP = df_CAP.rename(columns = {'LOCATION':'Name'})
#-- Merging CAP with districts
results = pd.merge(districts, df_CAP, on = 'Name')
#-- Determine unique CAPs for each zone
df_Milan_Districts = results.drop_duplicates(subset='CAP')
df_Milan_Districts = df_Milan_Districts.drop(columns='Name')
df_Milan_Districts.reset_index(drop=True,inplace=True)

In [85]:
# Install and import Geocoder library
import geocoder

In [86]:
#-- Define the function to extract latitude and longitude
def get_latlng(postal_code, max_attempts=10):
    # query the ArcGIS geocoder until it returns coordinates,
    # giving up after max_attempts instead of looping forever
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, Milan, Italy'.format(postal_code))
        if g.latlng is not None:
            return g.latlng  # [latitude, longitude]
    raise RuntimeError('No coordinates found for postal code {}'.format(postal_code))

In [87]:
# Postal codes coordinates
postal_codes = df_Milan_Districts['CAP']    
coords = [ get_latlng(postal_code) for postal_code in postal_codes.tolist() ]

In [88]:
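#-- Add latitude and longitude values to df_Milan_districts DataFrame
# NOTE: geocoder returns [latitude, longitude], so coords[i][0] is a latitude.
# The column labelled 'Longitude' below therefore holds latitudes and 'Latitude'
# holds longitudes; the map cells compensate by passing [lng_1, lat_1], and the
# labels are swapped back when df_new is built.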
long =[]
lat =[]
for i in range(len(coords)):
    long.append(coords[i][0])
    lat.append(coords[i][1])
    
df_Milan_Districts['Longitude']=long
df_Milan_Districts['Latitude']=lat
df_Milan_Districts = df_Milan_Districts.rename(columns = {'Zone':'Name'})
df_Milan_Districts.head()

Unnamed: 0,Zone_Number,Name,CAP,Longitude,Latitude
0,1,Centro Storico,20121,45.472955,9.187099
1,1,Centro Storico,20123,45.463065,9.177542
2,1,Centro Storico,20122,45.462565,9.198647
3,2,"Stazione Centrale, Gorla, Turro, Greco, Cresce...",20134,45.482773,9.2548
4,2,"Stazione Centrale, Gorla, Turro, Greco, Cresce...",20126,45.515929,9.215605


In [89]:
#-- Add missing info about population and density
df_Milan_Districts = pd.merge(df_Milan_Districts, df[['Population (2014)','Population Density','Name']], on='Name')
df_Milan_Districts.head()

Unnamed: 0,Zone_Number,Name,CAP,Longitude,Latitude,Population (2014),Population Density
0,1,Centro Storico,20121,45.472955,9.187099,96.315,11.074
1,1,Centro Storico,20123,45.463065,9.177542,96.315,11.074
2,1,Centro Storico,20122,45.462565,9.198647,96.315,11.074
3,2,"Stazione Centrale, Gorla, Turro, Greco, Cresce...",20134,45.482773,9.2548,153109.0,13.031
4,2,"Stazione Centrale, Gorla, Turro, Greco, Cresce...",20126,45.515929,9.215605,153109.0,13.031
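
In the table above, `Population (2014)` reads 96.315 for Centro Storico but 153109.0 for the next zone, which suggests that a dot used as a thousands separator was parsed as a decimal point in some rows. A possible clean-up is sketched below, under the assumption that every zone has at least 1,000 inhabitants, so that any value below 1000 is a mis-parsed figure. The helper name `fix_thousands` is hypothetical, and the same caveat may apply to `Population Density`.

```python
import pandas as pd

def fix_thousands(series, threshold=1000):
    """Rescale values that lost their thousands separator (e.g. 96.315 -> 96315)."""
    values = series.astype(float)
    # values below the threshold are assumed to be 'thousands.units' read as a decimal
    repaired = values.where(values >= threshold, values * 1000)
    return repaired.round().astype(int)

# Population values as they appear in the notebook's tables
population = pd.Series([96.315, 153109.0, 141.229], name='Population (2014)')
fixed_population = fix_thousands(population)
print(fixed_population.tolist())
```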


Now that the DataFrame with all the geolocation information is ready, we can create a map of Milan and plot the points of interest.

In [90]:
#-- Geopy
address = 'Milan, Italy'

geolocator = Nominatim(user_agent="Milan_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Milan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Milan are 45.4667971, 9.1904984.


In [91]:
#-- Print map of milan
# create a map of milan using latitude and longitude values
map_milan = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat_1, lng_1, borough, neighborhood in zip(df_Milan_Districts['Latitude'], df_Milan_Districts['Longitude'], df_Milan_Districts['Zone_Number'], df_Milan_Districts['Name']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    #print(lat_1, lng_1)
    folium.CircleMarker(
        [lng_1, lat_1],
        radius=15,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.2,
        parse_html=False).add_to(map_milan)  
    
map_milan
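
Folium expects marker locations as `[latitude, longitude]`, so points end up in the wrong place when the two columns are mixed up. A small sanity check can catch swapped pairs before plotting. It is sketched here assuming a rough bounding box around Milan; the bounds are approximate and chosen only for illustration, and the helper names are hypothetical.

```python
# Approximate bounding box around Milan (illustrative values, not from the notebook)
LAT_RANGE = (45.3, 45.6)
LON_RANGE = (9.0, 9.4)

def in_range(value, bounds):
    """True if value lies within the closed interval bounds = (low, high)."""
    low, high = bounds
    return low <= value <= high

def classify_point(lat, lon):
    """Return 'ok', 'swapped' or 'outside' for a (lat, lon) pair."""
    if in_range(lat, LAT_RANGE) and in_range(lon, LON_RANGE):
        return 'ok'
    if in_range(lon, LAT_RANGE) and in_range(lat, LON_RANGE):
        return 'swapped'  # the pair makes sense only with the values exchanged
    return 'outside'

print(classify_point(45.472955, 9.187099))
print(classify_point(9.187099, 45.472955))
```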

Next, we add the main constraint of our problem: the presence of a university. There are several universities and colleges in Milan. We load the list of them into a dataframe so that we can keep only the markers that fall near a university.

In [92]:
#-- Load university df
df_uni = pd.read_excel('List_University.xlsx')
df_uni.head(5)

#-- find latitude and longitude info for all the colleges
# Postal codes coordinates
postal_codes_uni = df_uni['CAP']    
coords_uni = [ get_latlng(postal_code) for postal_code in postal_codes_uni.tolist() ]

In [93]:
#-- Add latitude and longitude values to df_uni DataFrame
long_uni =[]
lat_uni =[]
for i in range(len(coords_uni)):
    lat_uni.append(coords_uni[i][0])
    long_uni.append(coords_uni[i][1])
    
df_uni['Longitude']=long_uni
df_uni['Latitude']=lat_uni

#-- Display the map of Milan with the unique locations and superimpose the location of the colleges
#-- Geopy
address = 'Milan, Italy'

geolocator = Nominatim(user_agent="Milan_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

#-- Print map of milan
# create a map of milan using latitude and longitude values
map_milan = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat_1, lng_1, borough, neighborhood in zip(df_Milan_Districts['Latitude'], df_Milan_Districts['Longitude'], df_Milan_Districts['Zone_Number'], df_Milan_Districts['Name']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    #print(lat_1, lng_1)
    folium.CircleMarker(
        [lng_1, lat_1],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.2,
        parse_html=False).add_to(map_milan)  
for lat_2, lng_2, borough_1 in zip(df_uni['Latitude'], df_uni['Longitude'], df_uni['University_name']):
    label = '{}'.format(borough_1)
    label = folium.Popup(label, parse_html=True)
    #print(lat_1, lng_1)
    folium.CircleMarker(
        [lat_2, lng_2],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.1,
        parse_html=False).add_to(map_milan) 

    
map_milan

<h3>Methodology

The map, which overlays the universities of Milan on its main districts, shows which neighborhoods have to be considered in the analysis: those where at least one university is present. We can thus exclude all the blue points (the districts) that do not contain a green one (a university).

In the next step, the Foursquare API is used to retrieve the venues near the districts selected for the analysis. For each location, at most 100 venues within a radius of 700 meters are requested. The purpose is to compute how frequently each venue category occurs, in order to find the districts where pizza restaurants are least common.
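
The request described above (at most 100 venues within 700 meters of a point) can be expressed as a parameter dictionary instead of a hand-formatted URL. This is only a sketch: the endpoint and parameter names are the ones used in this notebook's `getNearbyVenues` function, the helper name `build_explore_url` is hypothetical, and the credentials are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lng, client_id, client_secret,
                      version='20180605', radius=700, limit=100):
    """Build the venues/explore URL for one location."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude first, then longitude
        'radius': radius,                # search radius in meters
        'limit': limit,                  # maximum number of venues returned
    }
    # urlencode escapes special characters (the comma in 'll' becomes %2C)
    return EXPLORE_URL + '?' + urlencode(params)

url = build_explore_url(45.472955, 9.187099, 'YOUR_ID', 'YOUR_SECRET')
print(url)
```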

K-means is then used to cluster the districts according to their venue frequencies. Finally, the resulting clusters are analyzed to determine the best place to open the restaurant.

<h3>Results and Discussion

The final dataframe contains the information about the districts where at least one university is present. It is defined as follows:

In [94]:
df_new = pd.merge(df_Milan_Districts, df_uni, on='CAP')
df_new = df_new.drop(columns=['Longitude_y','Latitude_y'])
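# the district columns were labelled the wrong way round when they were created
# ('Longitude' holds latitudes), so the labels are swapped here
df_new = df_new.rename(columns={'Longitude_x':'Latitude', 'Latitude_x':'Longitude'})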
df_new.head()

Unnamed: 0,Zone_Number,Name,CAP,Latitude,Longitude,Population (2014),Population Density,University_name
0,1,Centro Storico,20121,45.472955,9.187099,96.315,11.074,Università degli Studi di Milano
1,1,Centro Storico,20121,45.472955,9.187099,96.315,11.074,Accademia delle belle arti di brera
2,1,Centro Storico,20123,45.463065,9.177542,96.315,11.074,Università Cattolica Sacro Cuore
3,1,Centro Storico,20122,45.462565,9.198647,96.315,11.074,Universita degli Studi di Milano
4,2,"Stazione Centrale, Gorla, Turro, Greco, Cresce...",20126,45.515929,9.215605,153109.0,13.031,Università degli studi di Milano Bicocca
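
Because `df_new` has one row per university, the number of universities in each zone can be read off with a `groupby`. The sketch below uses only the five rows displayed above, so the counts refer to that sample rather than to the full table.

```python
import pandas as pd

# The five rows shown in the df_new.head() output above
sample = pd.DataFrame({
    'Zone_Number': [1, 1, 1, 1, 2],
    'CAP': [20121, 20121, 20123, 20122, 20126],
    'University_name': [
        'Università degli Studi di Milano',
        'Accademia delle belle arti di brera',
        'Università Cattolica Sacro Cuore',
        'Universita degli Studi di Milano',
        'Università degli studi di Milano Bicocca',
    ],
})

# number of university rows per zone and per postal code
per_zone = sample.groupby('Zone_Number')['University_name'].count()
per_cap = sample.groupby('CAP')['University_name'].count()
print(per_zone)
print(per_cap)
```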


Now we retrieve the most relevant venues for each selected district from the Foursquare API. Then we apply the k-means clustering algorithm to group the districts by the venues found.

In [98]:
#-- Connect to the Foursquare API
import os

# Credentials are read from environment variables (the variable names are an assumption)
# so that the secret is neither stored in the notebook nor printed in its output
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Foursquare credentials loaded.')

#-- Define the neighborhood
#df_new.loc[0, 'Name']
#first_guess_latitude = df_new.loc[0, 'Latitude'] # neighborhood latitude value
#first_guess_longitude = df_new.loc[0, 'Longitude'] # neighborhood longitude value
#first_guess_name = df_new.loc[0, 'Name'] # neighborhood name

#-- Top 100 venues
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 700 # define radius
# create URL
#url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
#    CLIENT_ID, 
#    CLIENT_SECRET, 
#    VERSION, 
#    first_guess_latitude, 
#    first_guess_longitude, 
#    radius, 
#    LIMIT)
#url # display URL

Foursquare credentials loaded.


In [96]:
#results = requests.get(url).json()

In [99]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [100]:
#venues = results['response']['groups'][0]['items']
#nearby_venues = json_normalize(venues) # flatten JSON
# filter columns
#filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
#nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
#nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
#nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

#nearby_venues.head()

In [101]:
#-- Function to extract all the venues for each selected district in milan (df_new)
def getNearbyVenues(names, latitudes, longitudes, radius=700):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
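
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` for a venue that has no category. A more defensive way to flatten the response items is sketched below. The helper name `venue_rows` and the second fake item are made up for illustration, and the items only mimic the fields that the function above reads.

```python
def venue_rows(name, lat, lng, items):
    """Flatten explore-style items into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # fall back to None when the venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Fake items shaped like the entries read by getNearbyVenues
fake_items = [
    {'venue': {'name': 'Pinacoteca di Brera',
               'location': {'lat': 45.471979, 'lng': 9.188128},
               'categories': [{'name': 'Art Museum'}]}},
    {'venue': {'name': 'Venue without category (made up)',
               'location': {'lat': 45.47, 'lng': 9.18},
               'categories': []}},
]

rows = venue_rows('Centro Storico', 45.472955, 9.187099, fake_items)
for row in rows:
    print(row)
```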

In [102]:
#-- Get all the Data
Milan_venues = getNearbyVenues(names=df_new['Name'],
                                   latitudes=df_new['Latitude'],
                                   longitudes=df_new['Longitude']
                                  )

Centro Storico
Centro Storico
Centro Storico
Centro Storico
Stazione Centrale, Gorla, Turro, Greco, Crescenzago
Stazione Centrale, Gorla, Turro, Greco, Crescenzago
Città Studi, Lambrate, Porta Venezia
Città Studi, Lambrate, Porta Venezia
Città Studi, Lambrate, Porta Venezia
Vigentino, Chiaravalle, Gratosoglio
Vigentino, Chiaravalle, Gratosoglio
Vigentino, Chiaravalle, Gratosoglio
Barona, Lorenteggio
Barona, Lorenteggio
Fiera, Gallaratese, Quarto Oggiaro
Porta Garibaldi, Niguarda
Porta Garibaldi, Niguarda
Porta Garibaldi, Niguarda


In [103]:
print(Milan_venues.shape)
Milan_venues.head()

(977, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro Storico,45.472955,9.187099,Pinacoteca di Brera,45.471979,9.188128,Art Museum
1,Centro Storico,45.472955,9.187099,Di Viole Di Liquirizia,45.47146,9.185336,Cupcake Shop
2,Centro Storico,45.472955,9.187099,N'Ombra de Vin,45.473452,9.187873,Wine Bar
3,Centro Storico,45.472955,9.187099,Palazzo di Brera,45.472019,9.188043,College Arts Building
4,Centro Storico,45.472955,9.187099,Antica Osteria Stendhal,45.473978,9.187678,Italian Restaurant


Let's group the venues

In [104]:
print('There are {} unique categories.'.format(len(Milan_venues['Venue Category'].unique())))

There are 164 unique categories.


We want to investigate the frequency for each venue:

In [105]:
#-- Determine the frequency of each venue category
# one hot encoding
Milan_onehot = pd.get_dummies(Milan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Milan_onehot['Neighborhood'] = Milan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Milan_onehot.columns[-1]] + list(Milan_onehot.columns[:-1])
Milan_onehot = Milan_onehot[fixed_columns]

Milan_onehot.head()

Milan_onehot.shape

Milan_grouped = Milan_onehot.groupby('Neighborhood').mean().reset_index()

Milan_grouped.shape

num_top_venues = 5

for hood in Milan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Milan_grouped[Milan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')



----Barona, Lorenteggio----
                venue  freq
0  Italian Restaurant  0.12
1                Café  0.11
2         Pizza Place  0.07
3         Supermarket  0.05
4      Ice Cream Shop  0.04


----Centro Storico----
                venue  freq
0  Italian Restaurant  0.12
1               Hotel  0.06
2      Ice Cream Shop  0.06
3            Boutique  0.05
4                Café  0.04


----Città Studi, Lambrate, Porta Venezia----
                venue  freq
0                Café  0.17
1         Pizza Place  0.17
2  Italian Restaurant  0.17
3               Hotel  0.11
4    Department Store  0.06


----Fiera, Gallaratese, Quarto Oggiaro----
                    venue  freq
0             Pizza Place  0.15
1                   Hotel  0.11
2             Supermarket  0.07
3                     Gym  0.07
4  Thrift / Vintage Store  0.04


----Porta Garibaldi, Niguarda----
            venue  freq
0            Café  0.12
1     Pizza Place  0.11
2     Supermarket  0.04
3  Ice Cream Shop  0.03
4  
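
The one-hot encoding followed by a grouped mean computes, for each neighborhood, the share of its venues that fall in each category. The same table can be obtained with `pd.crosstab(..., normalize='index')`. The sketch below checks this on a few made-up rows; the venue data here is illustrative and not taken from Foursquare.

```python
import numpy as np
import pandas as pd

# Made-up venues for two placeholder neighborhoods
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Pizza Place', 'Café', 'Café', 'Hotel',
                       'Pizza Place', 'Pizza Place'],
})

# Approach used in the notebook: one-hot encode, then average per neighborhood
onehot = pd.get_dummies(venues['Venue Category']).astype(float)
onehot['Neighborhood'] = venues['Neighborhood']
grouped = onehot.groupby('Neighborhood').mean()

# Shortcut: contingency table normalised so that each row sums to 1
freq = pd.crosstab(venues['Neighborhood'], venues['Venue Category'],
                   normalize='index')

# Align the column order before comparing the two tables
same = np.allclose(grouped.to_numpy(), freq[grouped.columns].to_numpy())
print(freq)
print(same)
```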

We now collect the most common venues of each neighborhood in a pandas DataFrame

In [106]:
#-- Put results into Pandas Dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Milan_grouped['Neighborhood']

for ind in np.arange(Milan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Milan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Barona, Lorenteggio",Italian Restaurant,Café,Pizza Place,Supermarket,Restaurant,Ice Cream Shop,Gym,Rock Club,Concert Hall,Pub
1,Centro Storico,Italian Restaurant,Ice Cream Shop,Hotel,Boutique,Café,Plaza,Wine Bar,Pizza Place,Sandwich Place,Restaurant
2,"Città Studi, Lambrate, Porta Venezia",Italian Restaurant,Café,Pizza Place,Hotel,Garden Center,Department Store,Wine Bar,Cafeteria,Soccer Field,Furniture / Home Store
3,"Fiera, Gallaratese, Quarto Oggiaro",Pizza Place,Hotel,Gym,Supermarket,Italian Restaurant,Plaza,Bus Stop,Mexican Restaurant,Café,Moroccan Restaurant
4,"Porta Garibaldi, Niguarda",Café,Pizza Place,Supermarket,Italian Restaurant,Plaza,Restaurant,Bus Station,Diner,Hotel,Ice Cream Shop
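
Since the aim is to find districts where pizza places are comparatively rare, the frequency table can also be sorted by the `Pizza Place` column directly. The sketch below uses the four `Pizza Place` frequencies printed in the top-5 listing earlier. Neighborhoods whose value was not printed there (for example Centro Storico) are left out, so the ranking is incomplete.

```python
import pandas as pd

# 'Pizza Place' frequencies as printed in the top-5 listing
pizza_freq = pd.Series({
    'Barona, Lorenteggio': 0.07,
    'Città Studi, Lambrate, Porta Venezia': 0.17,
    'Fiera, Gallaratese, Quarto Oggiaro': 0.15,
    'Porta Garibaldi, Niguarda': 0.11,
}, name='Pizza Place')

# lowest share of pizza places first
ranking = pizza_freq.sort_values()
print(ranking)
```

With the full frequency table, the same idea would presumably read `Milan_grouped.set_index('Neighborhood')['Pizza Place'].sort_values()`.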


We finally apply K-means to cluster the neighborhoods by their venue frequencies

In [109]:
#-- Apply k-means clustering algorithm
# set number of clusters
kclusters = 4

Milan_grouped_clustering = Milan_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Milan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:7]

# NOTE: kmeans.labels_ holds one label per neighborhood (row of Milan_grouped), while
# df_new has one row per university. The labels are assigned to the first 7 rows of
# df_new by position, not matched by neighborhood name.
Milan_merged = df_new[0:7].copy()  # copy to avoid pandas' SettingWithCopyWarning

# add clustering labels
Milan_merged['Cluster Labels'] = kmeans.labels_

# add the most common venues of each neighborhood
Milan_merged = Milan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Name')

Milan_merged.head() 


Unnamed: 0,Zone_Number,Name,CAP,Latitude,Longitude,Population (2014),Population Density,University_name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Centro Storico,20121,45.472955,9.187099,96.315,11.074,Università degli Studi di Milano,0,Italian Restaurant,Ice Cream Shop,Hotel,Boutique,Café,Plaza,Wine Bar,Pizza Place,Sandwich Place,Restaurant
1,1,Centro Storico,20121,45.472955,9.187099,96.315,11.074,Accademia delle belle arti di brera,2,Italian Restaurant,Ice Cream Shop,Hotel,Boutique,Café,Plaza,Wine Bar,Pizza Place,Sandwich Place,Restaurant
2,1,Centro Storico,20123,45.463065,9.177542,96.315,11.074,Università Cattolica Sacro Cuore,3,Italian Restaurant,Ice Cream Shop,Hotel,Boutique,Café,Plaza,Wine Bar,Pizza Place,Sandwich Place,Restaurant
3,1,Centro Storico,20122,45.462565,9.198647,96.315,11.074,Universita degli Studi di Milano,1,Italian Restaurant,Ice Cream Shop,Hotel,Boutique,Café,Plaza,Wine Bar,Pizza Place,Sandwich Place,Restaurant
4,2,"Stazione Centrale, Gorla, Turro, Greco, Cresce...",20126,45.515929,9.215605,153109.0,13.031,Università degli studi di Milano Bicocca,0,Café,Italian Restaurant,Pizza Place,Ice Cream Shop,Hotel,Sushi Restaurant,Chinese Restaurant,Japanese Restaurant,Gym,Bar
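
k-means returns one label per row of `Milan_grouped`, that is, one per neighborhood, whereas `df_new` has one row per university. A sketch of attaching the labels by neighborhood name rather than by position follows. The label values, the university names and the shortened tables are invented for illustration.

```python
import pandas as pd

# Shortened stand-in for Milan_grouped (one row per neighborhood)
grouped = pd.DataFrame({'Neighborhood': ['Barona, Lorenteggio', 'Centro Storico']})
labels = [1, 0]  # hypothetical kmeans.labels_, aligned with the rows of grouped

# Shortened stand-in for df_new (one row per university)
universities = pd.DataFrame({
    'Name': ['Centro Storico', 'Centro Storico', 'Barona, Lorenteggio'],
    'University_name': ['University A', 'University B', 'University C'],
})

# Look-up table from neighborhood name to cluster label
label_by_name = pd.Series(labels, index=grouped['Neighborhood'])

# Every university row receives the label of its own neighborhood
universities['Cluster Labels'] = universities['Name'].map(label_by_name)
print(universities)
```

With this approach, all rows that share a neighborhood also share a cluster label.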


Display the result of the clustering on the map of Milan

In [110]:
#-- Display the results
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
print(rainbow)
# add markers to the map
markers_colors = []
for lat, lon, nei , cluster in zip(Milan_merged['Latitude'], Milan_merged['Longitude'], Milan_merged['Name'], Milan_merged['Cluster Labels']):
    label = folium.Popup(str(nei) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

['#8000ff', '#2adddd', '#d4dd80', '#ff0000']


We finally examine the different clusters to derive the final solution

In [111]:
#- Cluster 1
Milan_merged.loc[Milan_merged['Cluster Labels'] == 0,Milan_merged.columns[[2] + list(range(5, Milan_merged.shape[1]))]]

Unnamed: 0,CAP,Population (2014),Population Density,University_name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,20121,96.315,11.074,Università degli Studi di Milano,0,Italian Restaurant,Ice Cream Shop,Hotel,Boutique,Café,Plaza,Wine Bar,Pizza Place,Sandwich Place,Restaurant
4,20126,153109.0,13.031,Università degli studi di Milano Bicocca,0,Café,Italian Restaurant,Pizza Place,Ice Cream Shop,Hotel,Sushi Restaurant,Chinese Restaurant,Japanese Restaurant,Gym,Bar
5,20131,153109.0,13.031,Università degli Studi di Milano,0,Café,Italian Restaurant,Pizza Place,Ice Cream Shop,Hotel,Sushi Restaurant,Chinese Restaurant,Japanese Restaurant,Gym,Bar


In [112]:
#-- Cluster 2
Milan_merged.loc[Milan_merged['Cluster Labels'] == 1,Milan_merged.columns[[2] + list(range(5, Milan_merged.shape[1]))]]

Unnamed: 0,CAP,Population (2014),Population Density,University_name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,20122,96.315,11.074,Universita degli Studi di Milano,1,Italian Restaurant,Ice Cream Shop,Hotel,Boutique,Café,Plaza,Wine Bar,Pizza Place,Sandwich Place,Restaurant


In [113]:
#-- Cluster 3
Milan_merged.loc[Milan_merged['Cluster Labels'] == 2,Milan_merged.columns[[2] + list(range(5, Milan_merged.shape[1]))]]

Unnamed: 0,CAP,Population (2014),Population Density,University_name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,20121,96.315,11.074,Accademia delle belle arti di brera,2,Italian Restaurant,Ice Cream Shop,Hotel,Boutique,Café,Plaza,Wine Bar,Pizza Place,Sandwich Place,Restaurant
6,20132,141.229,10.785,Politecnico di Milano,2,Italian Restaurant,Café,Pizza Place,Hotel,Garden Center,Department Store,Wine Bar,Cafeteria,Soccer Field,Furniture / Home Store


Based on these results, the best district is Zone 1, "Centro Storico": it hosts the highest number of universities (4 colleges), it has a fairly high population density (11.074 in the table, which presumably means about 11,074 people/km2), and Pizza Place is only its 8th most common venue category.