# Capstone Project - Analysis of Venues for a Corporate Event in Milan and for Trips in Milan and Como, Italy
### Applied Data Science Capstone

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

A multinational company based in Milan, Italy (call it XYZ) is organizing a corporate event and has invited business executives from its offices around the world for a 5-day corporate program. The company has to put together a good program: a hotel for the executives to stay in, an event space, and nearby venues in the Milan Area, where the company is located. It also wants a list of further restaurants, shopping malls and fashion stores spread over a larger radius within the city, and it intends to offer the executives a one-day tour of the city of Como.

The company's first goal is therefore a list of such places in Milan. The executives will be staying in Milan, so the city can be visited in the evenings after work, and the company plans to take them to nearby venues in the Milan Area then. The one-day tour of Como (approx. 50 km from Milan) is planned for the 6th day, a Saturday, at the end of the week.

We can use data science to find hotels, an event space and nearby venues in the Milan Area, plus further restaurants, shopping malls and fashion stores over a larger radius within the city. Similarly, we can find the top 10 most common venues for each neighborhood in Como, so that the company can choose neighborhoods by the kinds of venues it is interested in or by the number of venues in a neighborhood.

## Data <a name="data"></a>

Based on the definition of our problem, the data used is listed below:

* I found the data for the first-, second- and third-level administrative divisions of the whole of Italy, along with geospatial data, at http://www.geonames.org. I cleaned the data and narrowed it down from country to state, and then to the cities of Milan and Como.
* The json file has the coordinates and data of the city of Milan. I have cleaned it to obtain the data required for this project.
* I used this data to create maps of places in Milan and Como.
* I used the Foursquare API to query for hotels, an event space and nearby venues in the Milan Area, where the company is located, and for further restaurants, shopping malls and fashion stores spread over a larger radius within the city. I also used it to find the top 10 most common venues for each neighborhood of Como.
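The loading and column-naming steps that follow can be condensed into a single call. The sketch below assumes the GeoNames file is tab-separated with the twelve columns used in this notebook; `SAMPLE` and `load_postal_codes` are illustrative names, and the two sample rows are copied from the table output shown further down. Reading `postal_code` as a string is a suggestion, not something the notebook does: it would preserve leading zeros in codes that have them.

```python
import io

import pandas as pd

# Column names as assigned later in this notebook.
COLUMNS = ["Country_code", "postal_code", "place_name", "admin_name1", "admin_code1",
           "admin_name2", "admin_code2", "admin_name3", "admin_code3",
           "latitude", "longitude", "accuracy"]

# Two rows in the same tab-separated layout as the downloaded file (illustrative sample).
SAMPLE = (
    "IT\t67010\tBarete\tAbruzzi\t1\tL'Aquila\tAQ\t\t\t42.4501\t13.2806\t4\n"
    "IT\t67012\tCagnano Amiterno\tAbruzzi\t1\tL'Aquila\tAQ\t\t\t42.4574\t13.2279\t4\n"
)


def load_postal_codes(source):
    """Read a tab-separated postal-code table, naming columns and keeping codes as text."""
    return pd.read_csv(source, sep="\t", header=None, names=COLUMNS,
                       dtype={"postal_code": str})


sample_df = load_postal_codes(io.StringIO(SAMPLE))
print(sample_df[["postal_code", "place_name", "latitude", "longitude"]])
```

With the real file, `load_postal_codes('IT.txt')` would stand in for the separate `read_table` and `df.columns = ...` cells.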


In [1]:
#Importing necessary libraries
import pandas as pd
import numpy as np

In [2]:
#Loading tabular data by reading it from the file into pandas dataframe.
df=pd.read_table('IT.txt',header=None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11
0,IT,67010,Barete,Abruzzi,1,L'Aquila,AQ,,,42.4501,13.2806,4
1,IT,67012,Cagnano Amiterno,Abruzzi,1,L'Aquila,AQ,,,42.4574,13.2279,4
2,IT,67012,San Giovanni,Abruzzi,1,L'Aquila,AQ,,,42.4642,13.2584,4
3,IT,67013,Mascioni,Abruzzi,1,L'Aquila,AQ,,,42.5282,13.3492,4
4,IT,67013,Campotosto,Abruzzi,1,L'Aquila,AQ,,,42.5588,13.368,4


In [3]:
df.shape

(18385, 12)

In [4]:
#Assigning column names to each column as documented in the downloaded file. Admin1 corresponds to the states of Italy, Admin2
#corresponds to the cities and Admin3 corresponds to the places/neighborhoods in the cities.
df.columns = ["Country_code","postal_code","place_name","admin_name1","admin_code1","admin_name2","admin_code2","admin_name3","admin_code3","latitude","longitude","accuracy"]

In [5]:
df.head()

Unnamed: 0,Country_code,postal_code,place_name,admin_name1,admin_code1,admin_name2,admin_code2,admin_name3,admin_code3,latitude,longitude,accuracy
0,IT,67010,Barete,Abruzzi,1,L'Aquila,AQ,,,42.4501,13.2806,4
1,IT,67012,Cagnano Amiterno,Abruzzi,1,L'Aquila,AQ,,,42.4574,13.2279,4
2,IT,67012,San Giovanni,Abruzzi,1,L'Aquila,AQ,,,42.4642,13.2584,4
3,IT,67013,Mascioni,Abruzzi,1,L'Aquila,AQ,,,42.5282,13.3492,4
4,IT,67013,Campotosto,Abruzzi,1,L'Aquila,AQ,,,42.5588,13.368,4


In [6]:
#Deleting columns that are not needed
columns = ['admin_code1', 'admin_code2','admin_name3','admin_code3','accuracy']
df.drop(columns, inplace=True, axis=1)

In [7]:
df.head()

Unnamed: 0,Country_code,postal_code,place_name,admin_name1,admin_name2,latitude,longitude
0,IT,67010,Barete,Abruzzi,L'Aquila,42.4501,13.2806
1,IT,67012,Cagnano Amiterno,Abruzzi,L'Aquila,42.4574,13.2279
2,IT,67012,San Giovanni,Abruzzi,L'Aquila,42.4642,13.2584
3,IT,67013,Mascioni,Abruzzi,L'Aquila,42.5282,13.3492
4,IT,67013,Campotosto,Abruzzi,L'Aquila,42.5588,13.368


In [8]:
df.shape

(18385, 7)

__Note that the table above has 18,385 rows because it covers the whole of Italy. We will therefore use groupby to obtain the data for the state of Lombardia and then for the cities of Milan and Como.__

In [9]:
df1=df.groupby('admin_name1')

In [10]:
#Obtaining data for the state 'Lombardia'
df2=df1.get_group('Lombardia')
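Selecting one group with `groupby(...).get_group(...)` gives the same rows as a plain boolean filter. The sketch below checks this on a three-row toy frame; `toy`, `via_group` and `via_mask` are illustrative names, not variables from this notebook.

```python
import pandas as pd

# Toy stand-in for the country-wide table: one row outside Lombardia, two inside.
toy = pd.DataFrame({
    "place_name": ["Barete", "Ubiale", "Peghera"],
    "admin_name1": ["Abruzzi", "Lombardia", "Lombardia"],
})

# Approach used in the notebook: group, then pick one group.
via_group = toy.groupby("admin_name1").get_group("Lombardia")

# Equivalent boolean mask, which avoids building every group.
via_mask = toy[toy["admin_name1"] == "Lombardia"]

same_rows = via_group["place_name"].tolist() == via_mask["place_name"].tolist()
print(same_rows)
```

The mask form reads more directly when only one value is needed; `groupby` pays off when several groups are pulled from the same grouping, as this notebook does later for Milano and Como.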

In [11]:
df2.head()

Unnamed: 0,Country_code,postal_code,place_name,admin_name1,admin_name2,latitude,longitude
6598,IT,24010,Ubiale,Lombardia,Bergamo,45.7835,9.6165
6599,IT,24010,Peghera,Lombardia,Bergamo,45.8731,9.5692
6600,IT,24010,Roncobello,Lombardia,Bergamo,45.9557,9.7523
6601,IT,24010,Piazzolo,Lombardia,Bergamo,45.9798,9.6705
6602,IT,24010,Taleggio,Lombardia,Bergamo,45.8934,9.5648


In [12]:
df2.shape

(2500, 7)

In [13]:
#Assigning the required columns to another dataframe
data=df2[['postal_code','admin_name2','place_name','latitude','longitude']]

In [14]:
data.head()

Unnamed: 0,postal_code,admin_name2,place_name,latitude,longitude
6598,24010,Bergamo,Ubiale,45.7835,9.6165
6599,24010,Bergamo,Peghera,45.8731,9.5692
6600,24010,Bergamo,Roncobello,45.9557,9.7523
6601,24010,Bergamo,Piazzolo,45.9798,9.6705
6602,24010,Bergamo,Taleggio,45.8934,9.5648


In [15]:
#Renaming columns with the appropriate names i.e Borough for Cities and Neighborhoods for the places in the city
neighborhoodb = data.rename(columns={'postal_code':'PostalCode','admin_name2':'Borough','place_name':'Neighborhood','latitude':'Latitude','longitude':'Longitude'})
neighborhoodb.reset_index(inplace = True, drop = True) 

In [16]:
neighborhoodb.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,24010,Bergamo,Ubiale,45.7835,9.6165
1,24010,Bergamo,Peghera,45.8731,9.5692
2,24010,Bergamo,Roncobello,45.9557,9.7523
3,24010,Bergamo,Piazzolo,45.9798,9.6705
4,24010,Bergamo,Taleggio,45.8934,9.5648


In [17]:
neighborhoodb.shape

(2500, 5)

In [18]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoodb['Borough'].unique()),
        neighborhoodb.shape[0]
    )
)

The dataframe has 12 boroughs and 2500 neighborhoods.


In [19]:
#Unique boroughs in the dataframe
neighborhoodb['Borough'].unique()

array(['Bergamo', 'Brescia', 'Como', 'Cremona', 'Lecco', 'Lodi',
       'Monza e Brianza', 'Milano', 'Mantova', 'Pavia', 'Sondrio',
       'Varese'], dtype=object)

In [20]:
#Using groupby to obtain data for specific cities 
neighborhoodc=neighborhoodb.groupby('Borough')

In [21]:
#Obtaining data for the city of  Milano
neighborhoodsh=neighborhoodc.get_group('Milano')

In [22]:
neighborhoodsh.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
1440,20010,Milano,Cornaredo,45.5013,9.0268
1441,20010,Milano,Mesero,45.5023,8.8554
1442,20010,Milano,Rogorotto,45.5062,8.9694
1443,20010,Milano,San Giorgio Su Legnano,45.5738,8.9137
1444,20010,Milano,Vanzago,45.5257,8.991


In [23]:
#Resetting the index
neighborhoodsh.reset_index(inplace = True, drop = True)
neighborhoodsh.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,20010,Milano,Cornaredo,45.5013,9.0268
1,20010,Milano,Mesero,45.5023,8.8554
2,20010,Milano,Rogorotto,45.5062,8.9694
3,20010,Milano,San Giorgio Su Legnano,45.5738,8.9137
4,20010,Milano,Vanzago,45.5257,8.991


In [24]:
neighborhoodsh.shape

(279, 5)

In [25]:
#Assigning Milano location data columns to a separate dataframe in order to merge neighborhoods with a common postal code & borough
geo=neighborhoodsh[['PostalCode','Latitude','Longitude']]
geo.head()


Unnamed: 0,PostalCode,Latitude,Longitude
0,20010,45.5013,9.0268
1,20010,45.5023,8.8554
2,20010,45.5062,8.9694
3,20010,45.5738,8.9137
4,20010,45.5257,8.991


__As there are several location values for the same postal code, let's group the data by postal code and take the mean
of the latitude and longitude values for each postal code__

In [26]:
geo1 = geo.groupby('PostalCode').mean().reset_index()
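To make the averaging step concrete, here is a small worked example on toy data. The first two rows reuse coordinates shown in the table above for postal code 20010; the frame itself (`geo_toy`) and the result name (`centroids`) are illustrative only. The averaged point is an approximate centre for the postal code and need not coincide with any listed place.

```python
import pandas as pd

# Toy input: two places sharing postal code 20010 and one place under 20011.
geo_toy = pd.DataFrame({
    "PostalCode": [20010, 20010, 20011],
    "Latitude": [45.5013, 45.5023, 45.4672],
    "Longitude": [9.0268, 8.8554, 8.9187],
})

# One row per postal code, holding the mean latitude and longitude of its places.
centroids = geo_toy.groupby("PostalCode").mean().reset_index()
print(centroids)
```

For 20010 the result is the midpoint of the two input rows: latitude (45.5013 + 45.5023) / 2 = 45.5018 and longitude (9.0268 + 8.8554) / 2 = 8.9411.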

In [27]:
geo1.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,20010,45.508617,8.918033
1,20011,45.4672,8.9187
2,20012,45.5068,8.8155
3,20013,45.4646,8.8845
4,20014,45.5534,8.9792


__Please note that there are no unassigned Boroughs or Neighborhoods in the file__

In [28]:
#Joining, with commas, the neighborhoods that share the same postal code and borough in Milano into a single row
neighborsh= neighborhoodsh.groupby(['PostalCode','Borough'])['Neighborhood'].agg( ','.join).reset_index()

In [29]:
neighborsh.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,20010,Milano,"Cornaredo,Mesero,Rogorotto,San Giorgio Su Legn..."
1,20011,Milano,"Corbetta,Battuello,Cerello"
2,20012,Milano,Cuggiono
3,20013,Milano,"Magenta,Ponte Nuovo"
4,20014,Milano,"Nerviano,Sant'Ilario Milanese"


In [30]:
#Merging the postalcode, borough and neighborhood table with geospatial data table to create a new dataframe
neighborhoods = pd.merge(neighborsh, geo1,on=['PostalCode'], how='inner',sort=False)
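The join-names, average-coordinates and merge steps can also be done in one pass with named aggregation (available in pandas 0.25 and later). This is an alternative sketch, not the method used in this notebook; the toy coordinates for Magenta and Ponte Nuovo are made up for illustration.

```python
import pandas as pd

# Toy input shaped like the Milano dataframe (coordinates are illustrative).
toy = pd.DataFrame({
    "PostalCode": [20013, 20013, 20012],
    "Borough": ["Milano", "Milano", "Milano"],
    "Neighborhood": ["Magenta", "Ponte Nuovo", "Cuggiono"],
    "Latitude": [45.46, 45.47, 45.5068],
    "Longitude": [8.88, 8.89, 8.8155],
})

# One groupby: join neighborhood names and average the coordinates together.
combined = (
    toy.groupby(["PostalCode", "Borough"], as_index=False)
       .agg(Neighborhood=("Neighborhood", ",".join),
            Latitude=("Latitude", "mean"),
            Longitude=("Longitude", "mean"))
)
print(combined)
```

Because both aggregations come from the same grouping, there is no separate merge step and no risk of the two intermediate tables falling out of sync.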

In [31]:
#Resetting the index
neighborhoods.reset_index(inplace = True, drop = True)
neighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,20010,Milano,"Cornaredo,Mesero,Rogorotto,San Giorgio Su Legn...",45.508617,8.918033
1,20011,Milano,"Corbetta,Battuello,Cerello",45.4672,8.9187
2,20012,Milano,Cuggiono,45.5068,8.8155
3,20013,Milano,"Magenta,Ponte Nuovo",45.4646,8.8845
4,20014,Milano,"Nerviano,Sant'Ilario Milanese",45.5534,8.9792


In [32]:
neighborhoods.shape

(96, 5)

__Getting data for the city of Como__

In [33]:
#Getting data for the city of Como
neighs=neighborhoodc.get_group('Como')
neighs.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
731,22010,Como,San Gregorio,46.1464,9.3009
732,22010,Como,Pianello Del Lario,46.1028,9.2769
733,22010,Como,San Siro,46.0657,9.2688
734,22010,Como,Piano Porlezza,46.0333,9.1667
735,22010,Como,San Bartolomeo Val Cavargna,46.0835,9.1491


In [34]:
neighs.reset_index(inplace = True, drop = True)
neighs.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,22010,Como,San Gregorio,46.1464,9.3009
1,22010,Como,Pianello Del Lario,46.1028,9.2769
2,22010,Como,San Siro,46.0657,9.2688
3,22010,Como,Piano Porlezza,46.0333,9.1667
4,22010,Como,San Bartolomeo Val Cavargna,46.0835,9.1491


In [35]:
#Assigning Como location data columns to separate dataframe
geodata=neighs[['PostalCode','Latitude','Longitude']]
geodata.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,22010,46.1464,9.3009
1,22010,46.1028,9.2769
2,22010,46.0657,9.2688
3,22010,46.0333,9.1667
4,22010,46.0835,9.1491


__As there are several location values for the same postal code, we can group the data by postal code and take the mean
of latitude and longitude values for each postal code__

In [36]:
geodata1 = geodata.groupby('PostalCode').mean().reset_index()

In [37]:
geodata1.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,22010,46.049728,9.209642
1,22011,45.99195,9.23585
2,22012,45.8431,9.0719
3,22013,46.15615,9.3333
4,22014,46.13,9.2817


__Please note that there are no unassigned Boroughs or Neighborhoods in the file__

In [38]:
#Merging neighborhoods in the same row with a comma having the same postal code and borough in Como
neighbors= neighs.groupby(['PostalCode','Borough'])['Neighborhood'].agg( ','.join).reset_index()

In [39]:
neighbors.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,22010,Como,"San Gregorio,Pianello Del Lario,San Siro,Piano..."
1,22011,Como,"Griante,Cadenabbia"
2,22012,Como,Cernobbio
3,22013,Como,"Vercana,Domaso"
4,22014,Como,Dongo
5,22015,Como,Gravedona
6,22016,Como,"Lenno,Tremezzina"
7,22017,Como,Menaggio
8,22018,Como,"Porlezza,Cima"
9,22019,Como,Tremezzo


In [40]:
neighbors.shape

(50, 3)

In [41]:
#Merging the postalcode, borough and neighborhood table with geospatial data table to create a new dataframe
cneighborhoods = pd.merge(neighbors, geodata1,on=['PostalCode'], how='inner',sort=False)
       
cneighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,22010,Como,"San Gregorio,Pianello Del Lario,San Siro,Piano...",46.049728,9.209642
1,22011,Como,"Griante,Cadenabbia",45.99195,9.23585
2,22012,Como,Cernobbio,45.8431,9.0719
3,22013,Como,"Vercana,Domaso",46.15615,9.3333
4,22014,Como,Dongo,46.13,9.2817


In [42]:
cneighborhoods.shape

(50, 5)

In [43]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (on pandas >= 1.0, use `from pandas import json_normalize`)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')


## 1. Looking for hotels, event space, shopping malls, fashion stores, restaurants and cafes in the Milan Area of the city and in other areas of the city

__Use geopy library to get the latitude and longitude values of Milan City__

In [44]:
address = 'Milan, Italy'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Milano are {}, {}.'.format(latitude, longitude))

GeocoderServiceError: [Errno 99] Cannot assign requested address
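The geocoding call above failed with a network error, which leaves `latitude` and `longitude` undefined for the cells that follow. One way to keep the notebook running is to wrap the lookup and fall back to known coordinates. The sketch below is an assumption about how this could be handled, not part of the original analysis: `geocode_or_fallback` and `failing_geocode` are hypothetical helpers, and (45.4642, 9.19) is only an approximate position for central Milan.

```python
def geocode_or_fallback(geocode, address, fallback):
    """Return (latitude, longitude) from geocode(address), or fallback if the lookup fails."""
    try:
        location = geocode(address)
    except Exception as exc:
        # A broad except keeps this sketch free of a geopy import; with geopy
        # installed, catching geopy.exc.GeopyError would be more precise.
        print("Geocoding failed ({}); using fallback coordinates.".format(exc))
        return fallback
    if location is None:
        # geopy returns None when the address has no match.
        return fallback
    return (location.latitude, location.longitude)


def failing_geocode(address):
    """Stand-in for a geocoder that cannot reach the network."""
    raise OSError("[Errno 99] Cannot assign requested address")


coords = geocode_or_fallback(failing_geocode, "Milan, Italy", (45.4642, 9.19))
print(coords)
```

In this notebook the call would look like `latitude, longitude = geocode_or_fallback(geolocator.geocode, address, (neighborhoods['Latitude'].mean(), neighborhoods['Longitude'].mean()))`. Note that this fallback is the centre of all Milano postal codes, which may sit some distance from the city centre.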

__Create a map of Milan with neighborhoods superimposed on top.__

In [None]:
# create map of Milano using latitude and longitude values
map_milano= folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_milano)  
    
map_milano

__Defining Foursquare Credentials and Version__

In [None]:
CLIENT_ID = 'FFABTQD45IG2BFTA1MKTBBFB2PAYVFBDOL2TQ2VVJDXHOUTC' #  Foursquare ID
CLIENT_SECRET = '1JCR2KTRTT41H2NQDN13C4BGDGVAFZGQOGC0Q3UVMV02YIV1' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

__We are looking for a hotel in the Neighborhood Milano of the Borough Milano, with postal code 20121, as the company is located there.__
__From the data above, we know the location of the Neighborhood Milano in the Borough Milano.__

In [None]:
neighborhoods.loc[58,'Neighborhood']

In [None]:
neighborhood_latitude = neighborhoods.loc[58, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[58, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[58, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

__Foursquare__

Let's use the Foursquare API to get information on hotels, event spaces and nearby venues in the Milan Area of the city, and also on
shopping malls, fashion stores and restaurants spread over a larger radius in the city of Milan, Italy.

We will specify search queries for hotels and nearby venues in the required area, and specific search queries for shopping malls, fashion stores and restaurants, to obtain results for each of these.

We will also use Foursquare to obtain venue data for the city of Como.

__We will obtain hotel data within a radius of 500 meters from the Milan Area and then obtain data for the__
__required postal code__

In [None]:
#Now, let's input a search query for top 100 hotels that are in Milan area within a radius of 500 meters.
search_query = 'Hotel'
LIMIT=100
radius = 500

# Define the corresponding URL
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'\
.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, search_query, radius, LIMIT)
url
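The same request URL is assembled by string formatting several times in this notebook, sometimes with queries that contain spaces (for example 'Shopping mall'). A small helper can build the query string once and percent-encode it. This is a sketch using only the standard library: `build_search_url` is a hypothetical helper, the parameter names mirror those in the URL above, and the credentials are placeholders.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL used in this notebook.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, version, lat, lng, query, radius, limit):
    """Assemble a venue-search URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


example_url = build_search_url("YOUR_ID", "YOUR_SECRET", "20180605",
                               45.47, 9.19, "Shopping mall", 10000, 100)
print(example_url)
```

`requests` normally encodes such URLs on its own, so this is mainly a tidiness measure: each search then differs only in the arguments passed, and the credentials are not pasted into every format string.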

In [None]:
# Send the GET Request and examine the results
results = requests.get(url).json()
#results

In [None]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

__Cleaning hotel dataframe__

In [None]:
# keep only columns that include venue name, and anything that is associated with location
clean_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')]+ ['id']
clean_dataframe = dataframe.loc[:,clean_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
clean_dataframe['categories'] = clean_dataframe.apply(get_category_type, axis=1)

# clean column names by keeping only last term
clean_dataframe.columns = [column.split('.')[-1] for column in clean_dataframe.columns]

clean_dataframe.head(5)
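The category step above relies on each venue carrying a list of category dictionaries, of which only the first name is kept. The sketch below shows that logic on plain Python data, without pandas. The two sample venues are hypothetical and only mimic the fields the cleaning code reads (`name`, `categories`, `location`); `first_category_name` is an illustrative helper.

```python
def first_category_name(categories):
    """Return the name of the first category, or None if the list is empty."""
    if not categories:
        return None
    return categories[0]["name"]


# Hypothetical venues shaped like the entries the cleaning code expects.
sample_venues = [
    {"name": "Hotel Example", "categories": [{"name": "Hotel"}],
     "location": {"postalCode": "20121"}},
    {"name": "Unlabelled Place", "categories": [], "location": {}},
]

# One flat record per venue, keeping only the first category name.
rows = [
    {"name": venue["name"],
     "categories": first_category_name(venue["categories"]),
     "postalCode": venue["location"].get("postalCode")}
    for venue in sample_venues
]
print(rows)
```

Venues without a category or postal code end up with `None` in those fields, which is why the later `dropna` step removes them.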

In [None]:
clean_dataframe.shape

In [None]:
# delete unnecessary columns
clean_dataframe2= clean_dataframe.drop(['labeledLatLngs','distance','cc','neighborhood', 'state', 'country', 'crossStreet', 'formattedAddress', 'id'], axis=1)
clean_dataframe2.head()

In [None]:

# delete rows with none values
clean_dataframe3 = clean_dataframe2.dropna()  # defaults drop any row with a missing value; passing both how= and thresh= raises a TypeError on recent pandas
clean_dataframe3.head()

In [None]:
# keep only the rows whose category is Hotel or Event Space
array= ['Hotel', 'Event Space']
hotel_dataframe= clean_dataframe3.loc[clean_dataframe3['categories'].isin(array)]
hotel_dataframe.head()

In [None]:
# drop rows with duplicate hotel names
df_hotels = hotel_dataframe.drop_duplicates(subset='name', keep="first")
df_hotels.head()

In [None]:
#Checking all the hotels within a radius of 500 meters
df_hotels.shape

In [None]:
# choose the hotels and event space with the postal code 20121 in Milano 
df_hotel = df_hotels[df_hotels.postalCode == '20121']
df_hotel.head()

In [None]:
df_hotel.reset_index(inplace = True, drop = True) 
df_hotel

In [None]:
#Checking the number of hotels
df_hotel.shape

__From the data above, we have 9 hotels and 1 event space in the Milan Area with postal code 20121.__
__The company can choose its hotel and event space from this list.__

#### Now, let's input a query to find the top 100 venues that are in Milan Area within a radius of 500 meters.

In [None]:
# Explore up to 100 venues within 500 meters of the Milan Area
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

In [None]:
results1 = requests.get(url).json()
#results1

In [None]:
venues = results1['response']['groups'][0]['items']
    
nearby_venues1 = json_normalize(venues) # flatten JSON
nearby_venues1.head()

In [None]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [None]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng','venue.location.formattedAddress']
nearby_venues2 =nearby_venues1.loc[:, filtered_columns]

# filter the category for each row
nearby_venues2['venue.categories'] = nearby_venues2.apply(get_category_type, axis=1)

# clean columns
nearby_venues2.columns = [col.split(".")[-1] for col in nearby_venues2.columns]
nearby_venues2.reset_index(inplace=True,drop=True)
nearby_venues2.head()

In [None]:
# Choose the venues other than Hotel
nearby_venue3 = nearby_venues2[nearby_venues2.categories != 'Hotel']
nearby_venue3.head()

In [None]:
# delete rows with none values
nearby_venue3 = nearby_venue3.dropna()  # defaults drop any row with a missing value

In [None]:
nearby_venue3.reset_index(inplace = True, drop = True) 
nearby_venue3.head()

In [None]:
nearby_venue3.shape

__Data analysis of the above is carried out in the analysis section__

__We can also run specific search queries for restaurants, fashion stores and shopping malls in areas other than the Milan Area, to give the company a few more options. These queries are not tied to any specific neighborhood; they use the location data (latitude and longitude values) of the city of Milan.__


## Search for restaurants

In [None]:
# search for Restaurants within radius of 10000 meters
search_query = 'Restaurant'
radius = 10000

# Define the corresponding URL
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

In [None]:

# Send the GET Request and examine the results
Rresults = requests.get(url).json()

In [None]:

# assign relevant part of JSON to venues
venues = Rresults['response']['venues']

# transform venues into a dataframe
Restaurant_dataframe = json_normalize(venues)
Restaurant_dataframe.head()

In [None]:
# keep only columns that include venue name, and anything that is associated with location
Restaurant_clean_columns = ['name', 'categories'] + [col for col in Restaurant_dataframe.columns if col.startswith('location.')]+ ['id']
clean_Restaurant_dataframe = Restaurant_dataframe.loc[:,Restaurant_clean_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list3 = row['categories']
    except KeyError:
        categories_list3 = row['venue.categories']
        
    if len(categories_list3) == 0:
        return None
    else:
        return categories_list3[0]['name']

# filter the category for each row
clean_Restaurant_dataframe['categories'] = clean_Restaurant_dataframe.apply(get_category_type, axis=1)

# clean column names by keeping only last term
clean_Restaurant_dataframe.columns = [column.split('.')[-1] for column in clean_Restaurant_dataframe.columns]

clean_Restaurant_dataframe.head()

In [None]:

# delete unnecessary columns
clean_Restaurant_dataframe2= clean_Restaurant_dataframe.drop(['labeledLatLngs','distance','cc','neighborhood', 'state', 'country', 'crossStreet', 'formattedAddress', 'id'], axis=1)
clean_Restaurant_dataframe2.head()

In [None]:
# delete rows with none values
df_Restaurant = clean_Restaurant_dataframe2.dropna()  # defaults drop any row with a missing value
df_Restaurant.head(5)

In [None]:
df_Restaurant.reset_index(inplace = True, drop = True) 
df_Restaurant.head()

In [None]:
#Checking the no of restaurants
df_Restaurant.shape

__The company can also choose from the 31 restaurants above, which cover different categories of food in other areas of the city, for the executives' team dinner.__

__Milan is a city famous for fashion. Let's run a query to find fashion stores in other areas of the city as well.__

In [None]:
# search for fashion stores within radius of 20000 meters.
search_query = 'Fashion Store'
radius = 20000
LIMIT=200

# Define the corresponding URL
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

In [None]:

# Send the GET Request and examine the results
Fresults = requests.get(url).json()

In [None]:
# assign relevant part of JSON to venues
venues = Fresults['response']['venues']

# transform venues into a dataframe
Fashion_dataframe = json_normalize(venues)
Fashion_dataframe.head()

In [None]:
# keep only columns that include venue name, and anything that is associated with location
Fashion_clean_columns = ['name', 'categories'] + [col for col in Fashion_dataframe.columns if col.startswith('location.')]+ ['id']
clean_fashion_dataframe = Fashion_dataframe.loc[:,Fashion_clean_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list3 = row['categories']
    except KeyError:
        categories_list3 = row['venue.categories']
        
    if len(categories_list3) == 0:
        return None
    else:
        return categories_list3[0]['name']

# filter the category for each row
clean_fashion_dataframe['categories'] = clean_fashion_dataframe.apply(get_category_type, axis=1)

# clean column names by keeping only last term
clean_fashion_dataframe.columns = [column.split('.')[-1] for column in clean_fashion_dataframe.columns]

clean_fashion_dataframe.head()

In [None]:
clean_fashion_dataframe.shape

In [None]:
# delete unnecessary columns
clean_fashion_dataframe= clean_fashion_dataframe.drop(['labeledLatLngs','distance','cc', 'state', 'country', 'crossStreet', 'formattedAddress', 'id','neighborhood'], axis=1)
clean_fashion_dataframe.head()

In [None]:
# delete rows with none values
df_fashion = clean_fashion_dataframe.dropna()  # defaults drop any row with a missing value
df_fashion.head()

In [None]:
df_fashion.reset_index(inplace = True, drop = True) 
df_fashion.head(5)

In [None]:
df_fashion.shape

__The company can also suggest the 34 fashion stores above to the executives for shopping in the city.__

__Let's run a query to find shopping malls in other areas as well.__

In [None]:
# search for Shopping Malls within radius of 10000 meters
search_query = 'Shopping mall'
radius = 10000

# Define the corresponding URL
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

In [None]:
# Send the GET Request and examine the results
sresults = requests.get(url).json()

In [None]:
# assign relevant part of JSON to venues
venues = sresults['response']['venues']

# transform venues into a dataframe
S_dataframe = json_normalize(venues)
S_dataframe.head()

In [None]:
# keep only columns that include venue name, and anything that is associated with location
shopping_clean_columns = ['name', 'categories'] + [col for col in S_dataframe.columns if col.startswith('location.')]+ ['id']
clean_shopping_dataframe = S_dataframe.loc[:,shopping_clean_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list4 = row['categories']
    except KeyError:
        categories_list4 = row['venue.categories']
        
    if len(categories_list4) == 0:
        return None
    else:
        return categories_list4[0]['name']

# filter the category for each row
clean_shopping_dataframe['categories'] = clean_shopping_dataframe.apply(get_category_type, axis=1)

# clean column names by keeping only last term
clean_shopping_dataframe.columns = [column.split('.')[-1] for column in clean_shopping_dataframe.columns]

clean_shopping_dataframe.head()

In [None]:
# delete unnecessary columns
clean_shopping_dataframe= clean_shopping_dataframe.drop(['cc', 'state', 'country', 'crossStreet', 'distance', 'formattedAddress',\
                                        'labeledLatLngs', 'id','neighborhood'], axis=1)
clean_shopping_dataframe.head()

In [None]:
# delete rows with none values
df_shopping = clean_shopping_dataframe.dropna()  # defaults drop any row with a missing value
df_shopping.head(5)

In [None]:
df_shopping.reset_index(inplace = True, drop = True) 
df_shopping.head()

In [None]:
#Checking the no of shopping malls/places
df_shopping.shape

__The company can also suggest the 18 shopping malls/places above to the executives, or take them there for any specific shopping.__

__Further analysis is continued in the analysis section.__
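The hotel, restaurant, fashion-store and shopping-mall searches all repeat the same clean-up: flatten the JSON, keep the name, category and location columns, shorten the column names, drop unused columns and drop incomplete rows. The sketch below folds those steps into one function. `clean_venues` is a hypothetical helper, it assumes pandas 1.0 or later for `pd.json_normalize`, and the two toy venues are invented for illustration.

```python
import pandas as pd

# Columns the notebook discards after flattening (ignored if absent).
UNUSED = ("labeledLatLngs", "distance", "cc", "neighborhood", "state",
          "country", "crossStreet", "formattedAddress", "id")


def clean_venues(venues, drop_columns=UNUSED):
    """Flatten a list of venue dicts into a tidy frame with complete rows only."""
    frame = pd.json_normalize(venues)
    keep = ["name", "categories"] + [c for c in frame.columns if c.startswith("location.")]
    frame = frame.loc[:, keep].copy()
    # Keep only the first category name; empty or missing lists become None.
    frame["categories"] = frame["categories"].apply(
        lambda cats: cats[0]["name"] if isinstance(cats, list) and cats else None)
    # 'location.lat' -> 'lat', and so on.
    frame.columns = [c.split(".")[-1] for c in frame.columns]
    frame = frame.drop(columns=list(drop_columns), errors="ignore")
    return frame.dropna().reset_index(drop=True)


toy_venues = [
    {"id": "a1", "name": "Mall Example", "categories": [{"name": "Shopping Mall"}],
     "location": {"lat": 45.47, "lng": 9.19, "postalCode": "20121", "cc": "IT"}},
    {"id": "b2", "name": "No Category", "categories": [],
     "location": {"lat": 45.48, "lng": 9.20, "postalCode": "20122", "cc": "IT"}},
]
cleaned = clean_venues(toy_venues)
print(cleaned)
```

With such a helper, each search section would reduce to a request followed by `clean_venues(results['response']['venues'])`. Using `errors="ignore"` in `drop` also avoids a `KeyError` when a response happens to lack one of the optional location fields.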

## 2. Segmenting and Clustering neighborhoods in Como

In [None]:
#Viewing the dataframe of Como city extracted and cleaned from the country data
cneighborhoods.head()

In [None]:
#Getting co-ordinates of the city
address = 'Como,Italy'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude1 = location.latitude
longitude1 = location.longitude
print('The geographical coordinates of Como are {}, {}.'.format(latitude1, longitude1))


In [None]:
# create map of Como using latitude and longitude values
map_Como = folium.Map(location=[latitude1, longitude1], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(cneighborhoods['Latitude'], cneighborhoods['Longitude'], cneighborhoods['Borough'], cneighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Como)  
    
map_Como

In [None]:
#Foursquare Credentials
CLIENT_ID = 'FFABTQD45IG2BFTA1MKTBBFB2PAYVFBDOL2TQ2VVJDXHOUTC' #  Foursquare ID
CLIENT_SECRET = '1JCR2KTRTT41H2NQDN13C4BGDGVAFZGQOGC0Q3UVMV02YIV1' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
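
Hard-coding the credentials and printing the secret exposes them to anyone who can read the notebook. A minimal sketch of one alternative is shown here, assuming the credentials are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for illustration.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables.

    The variable names used here are assumptions chosen for this sketch.
    """
    try:
        client_id = env['FOURSQUARE_CLIENT_ID']
        client_secret = env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep only the first few characters of a secret visible when printing."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)
```

With this in place, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` replaces the literal strings, and `print(mask(CLIENT_SECRET))` confirms that a value was loaded without revealing it.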

In [None]:
#Exploring the first neighborhood in Como
cneighborhoods.loc[0, 'Neighborhood']

In [None]:
#Getting neighborhood latitude and longitude values
neighborhood_latitude = cneighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude =cneighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = cneighborhoods.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

In [None]:
# Using Foursquare to obtain up to 100 venues for the neighborhood within a radius of 500 meters
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

In [None]:
#Send the GET request and examine the results
results2 = requests.get(url).json()
results2

__Since there are not many results for this neighborhood, we will explore the results for all neighborhoods next.__

In [None]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [None]:
#Let's create a function to repeat the same process to all the neighborhoods in Como
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
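
The function above assumes that every response contains `response -> groups -> items` and that every venue has at least one category. If either assumption fails, the lookup raises a `KeyError` or `IndexError` and the whole loop stops. The following is a minimal sketch of a more tolerant extraction step, assuming a hypothetical helper named `extract_venue_rows` that works on the already parsed JSON. The sample payload is made up and only mimics the fields read above.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Turn one explore-style response dict into rows for the venues table.

    Hypothetical helper: returns an empty list when the expected keys are
    missing, and uses None as the category when a venue has no category.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Made-up payload that only mimics the fields read above.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 45.81, 'lng': 9.08},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unnamed spot', 'location': {'lat': 45.82, 'lng': 9.09},
               'categories': []}},
]}]}}
rows = extract_venue_rows(sample, 'Como', 45.80, 9.08)
```

Rows with a `None` category can later be dropped or relabelled before the one-hot encoding step.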

In [None]:
#Running the above function on each neighborhood and create a new dataframe called *como_venues*.

como_venues = getNearbyVenues(names=cneighborhoods['Neighborhood'],
                                   latitudes=cneighborhoods['Latitude'],
                                   longitudes=cneighborhoods['Longitude']
                                  )

In [None]:
print(como_venues.shape)
como_venues.head()

__Further, analysis is continued in the analysis section__

## Methodology <a name="methodology"></a>

In this project, we focus on finding hotels, event spaces and nearby venues in the Milan Area, where the company is
located, together with additional restaurants, shopping malls and fashion stores in other areas spread over a larger
radius within the city, for the corporate event (5-day program) that the company is organising for business executives.
We also find the top 10 most common venues for each neighborhood of Como, as the company intends to take the executives
on a one-day tour of the city of Como.

In the first step, we collected the required **data**: hotels, event spaces and nearby venues such as shopping stores,
fashion stores, restaurants, monuments, cafes etc. within a radius of 500 meters of the Milan Area, plus venues in
some other areas of the city to have more options. We also collected the venues for each neighborhood of Como.

The second step in our analysis is the calculation and exploration of the categories of venues in the Milan Area
and in Como. We use **bar charts** to show the count of each category of venue in the Milan Area and the number of
venues in each neighborhood of Como.

In the third and final step, we focus on the areas of interest in Milan and on the top 10 most common venues for each
neighborhood, presented as **clusters of locations** in Como. We present a map of all such locations for the Milan Area
and for the other areas of the city. We create a separate map with clusters (using **k-means clustering**) of
neighborhoods with their top 10 most common venues in Como, to identify general neighborhoods / addresses that should be
of interest to the company when choosing venue locations.

## Analysis <a name="analysis"></a>

## 1. Let's first perform some analysis on the Milano data

In [None]:
#Venues obtained after cleaning data and deleting hotel information as we already have obtained hotel information earlier.
print('{} venues '.format(nearby_venue3.shape[0]))

__We can see the number of unique categories of venues in the Milan Area__

In [None]:
print("The number of unique categories of venues are",len(nearby_venue3['categories'].unique()))

__Let's see each unique category__

In [None]:
nearby_venue3['categories'].unique()

In [None]:
nearby_venue3.head()

In [None]:
nb=nearby_venue3.groupby(['categories']).size().reset_index(name='counts')
nb.head()

In [None]:
# use the inline backend to generate the plots within the browser
%matplotlib inline 

import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.style.use('ggplot') #  for ggplot-like style

# for latest version of Matplotlib
print ('Matplotlib version: ', mpl.__version__) # >= 2.0.0

# list the named colors available in Matplotlib together with their hex codes
for name, hex_code in mpl.colors.cnames.items():
    print(name, hex_code)

In [None]:
nb.set_index("categories",drop=True,inplace=True)


__Plotting a bar graph to check the count of each category of venues in the Milan Area of the city.__

In [None]:
nb.plot(kind='bar', figsize=(10,6),color='green',legend=None)

plt.xlabel('Categories of Venues in the Milan Area') # add to x-label to the plot
plt.ylabel('Count of each category') # add y-label to the plot
plt.title('Count of each category of venue in Milano') # add title to the plot

plt.show()

__The company can choose from 93 venues available for a short trip in the evenings, all within a radius of 500 meters of the Milan Area.__
__There are 43 unique categories of venues; Italian Restaurants, Ice Cream Shops, Plazas, Cafes and Pizza Places have the highest counts.__
__Venues also include Art Galleries, Art Museums, Monuments, Cosmetics Stores, Boutiques (which are also popular in the city) and many others that can be seen in the above graph.__

In [None]:
#This dataframe also includes hotels that fall within the radius of 500 meters from the Milan Area
nearby_venues2.head()

__Visualising all the venues that fall within the radius of 500 meters from the Milan Area in a folium map.__

In [None]:
# create map of Milano using latitude and longitude values
map_milan= folium.Map(location=[latitude, longitude], zoom_start=14)

# add markers to map
for lat, lng, name, categories in zip(nearby_venues2['lat'], nearby_venues2['lng'], nearby_venues2['name'], nearby_venues2['categories']):
    label = '{}, {}'.format(name, categories)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='orange',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_milan)  
    
map_milan

__Since the venues are very close to each other, they can be easily visited in the Milan Area.__

__Let's merge the data on hotels (all hotels within a 500 meter radius of the area) with the restaurants, shopping malls and fashion stores obtained using specific search queries in other areas of the city of Milan.__

In [None]:
# Merged Data
nb_df = pd.concat([df_hotels, df_fashion,df_shopping,df_Restaurant], ignore_index=True)
nb_df.head()


In [None]:
nb_df.tail(5)

In [None]:
nb_df.shape

In [None]:
#Checking the count of each category in the merged dataframe
nb1=nb_df.groupby(['categories']).size().reset_index(name='counts')
nb1.head()

In [None]:
#Checking the number of categories
nb1.shape

In [None]:
nb1.plot(kind='bar', x= 'categories',figsize=(10,6),legend=None)

plt.xlabel('Categories of Venues in city of Milano spread over larger radius') # add to x-label to the plot
plt.ylabel('Count of each category') # add y-label to the plot
plt.title('Count of each category of venue in city of Milano spread over larger radius') # add title to the plot

plt.show()

__The company can choose from the 41 categories of places in the city of Milano, spread over a larger radius, for any specific type of hotel, restaurant, shopping mall or fashion store.__
__From the above graph, we can see that hotels (as they were already chosen from the filtered category) have the highest count, followed by clothing stores, mobile phone shops, restaurants, sporting goods shops and others.__

In [None]:
# Generate map to visualize merged data of hotels,shopping malls,restaurants and fashion stores.
merged_map = folium.Map(location=[latitude, longitude], zoom_start=14)

for lat, lng, name, categories, address in zip(nb_df['lat'], nb_df['lng'], 
                                           nb_df['name'], nb_df['categories'],\
                                               nb_df['address']):
    label = '{}, {},{}'.format(name,categories,address)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='purple',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7,
        parse_html=False).add_to(merged_map)  
    
merged_map

__The company can choose from any of the above 23 hotels, 31 restaurants, 34 fashion stores and 18 shopping malls/places for any specific category of location spread over a larger radius in the city of Milano.__

## 2. Next, let's do some analysis on the Como data.

Analyzing each neighborhood

In [None]:
#Let's check how many venues were returned for each neighborhood
como_venues.groupby('Neighborhood').count()

In [None]:
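# count distinct neighborhoods; len() on the column would count venue rows instead
print("The total number of neighborhoods in Como with venues is", como_venues['Neighborhood'].nunique())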

In [None]:
cg2=como_venues.groupby('Neighborhood').count()
cg2=cg2.reset_index()

In [None]:
cg2.shape

In [None]:
cg2.head()

In [None]:
cg3=cg2[['Neighborhood','Venue']]
cg3.head()

__We can see in the graph below that the neighborhoods with the highest number of venues are Bellagio,San Giovanni di Bellagio with 50 venues, Como-Lora (name abbreviated) with 41 venues, Mennaggio with 25 venues, Mariano Comense,Perticato with 22 venues and Cernobbio with 22 venues, followed by the others.__

In [None]:
#Plotting a bar graph of no of venues for each neighborhood
cg3.plot(kind='bar', x= 'Neighborhood',y='Venue',figsize=(6,4),color='chocolate',legend=None)

plt.xlabel('Neighborhoods') # add to x-label to the plot
plt.ylabel('No of venues') # add y-label to the plot
plt.title('No of venues in each neighborhood of Como city') # add title to the plot

plt.show()

In [None]:
#Checking the number of unique categories
print('There are {} unique categories of venues.'.format(len(como_venues['Venue Category'].unique())))

In [None]:
# one hot encoding
como_onehot = pd.get_dummies(como_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
como_onehot['Neighborhood'] = como_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [como_onehot.columns[-1]] + list(como_onehot.columns[:-1])
como_onehot = como_onehot[fixed_columns]

como_onehot.head()

In [None]:
como_onehot.shape

In [None]:
#let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
como_grouped = como_onehot.groupby('Neighborhood').mean().reset_index()
como_grouped.head(5)
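
Taking the mean of the one-hot columns per neighborhood gives, for each category, the share of that neighborhood's venues that belong to the category. The sketch below reproduces that arithmetic in plain Python rather than pandas so that each step is visible. The helper name `category_shares` and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def category_shares(pairs):
    """Map each neighborhood to {category: share of its venues in that category}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares


# Made-up (neighborhood, category) pairs for illustration.
toy = [('A', 'Hotel'), ('A', 'Hotel'), ('A', 'Pizza Place'), ('A', 'Café'),
       ('B', 'Café')]
shares = category_shares(toy)
```

Neighborhood `A` has four venues, two of which are hotels, so its `Hotel` share is 0.5. Neighborhood `B` has a single venue, so its only category has a share of 1.0. This is why neighborhoods with very few venues can dominate a cluster on the strength of one venue.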

In [None]:
como_grouped.shape

__Function to sort the venues in descending order.__

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
#create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = como_grouped['Neighborhood']

for ind in np.arange(como_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(como_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
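
The column labels above rely on a `try`/`except` around a three-element suffix list, which is correct for the 10 columns used here but would produce labels such as "21th" for a larger `num_top_venues`. A minimal sketch of a general suffix helper follows, assuming a hypothetical function named `ordinal`.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th and 13th are exceptions to the st/nd/rd rule
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Build the same column labels as above for the top 10 venues.
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, 11)]
```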

__We will be using *k*-means to cluster the neighborhood into 5 clusters.__

In [None]:
# set number of clusters
kclusters = 5

como_grouped_clustering = como_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(como_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
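
Before mapping the clusters, it can help to check how many neighborhoods fall into each one, since a very uneven split may suggest that a different number of clusters is worth trying. The following is a minimal sketch, assuming a hypothetical helper named `cluster_sizes`. The label list is made up and stands in for `kmeans.labels_`.

```python
from collections import Counter


def cluster_sizes(labels, n_clusters):
    """Count members per cluster, including clusters that received no members."""
    counts = Counter(int(label) for label in labels)
    return {cluster: counts.get(cluster, 0) for cluster in range(n_clusters)}


# Made-up labels standing in for kmeans.labels_.
example_labels = [0, 0, 2, 0, 1, 0, 4, 0]
sizes = cluster_sizes(example_labels, n_clusters=5)
```

In the notebook this would be called as `cluster_sizes(kmeans.labels_, kclusters)`.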

In [None]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

como_merged = cneighborhoods

# merge the sorted top-venue table into the Como neighborhood data to add latitude/longitude for each neighborhood
como_merged = como_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

como_merged.head() # check the last columns!

In [None]:
# delete rows with none values
como_merged1 = como_merged.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
como_merged1.head(10)

In [None]:
#Converting the data type of Cluster Labels to integer
como_merged1["Cluster Labels"]= como_merged1["Cluster Labels"].astype(int) 

In [None]:
como_merged1.reset_index(inplace = True, drop = True) 
como_merged1.head(10)

In [None]:
como_merged1.head(5)

In [None]:
#Checking the size of the dataframe
como_merged1.shape

__Visualizing the resulting clusters on the map__

In [None]:
# create map
map_clusters = folium.Map(location=[latitude1, longitude1], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(como_merged1['Latitude'], como_merged1['Longitude'], como_merged1['Neighborhood'], como_merged1['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
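
The map above indexes the palette with `cluster-1`, so cluster 0 takes the last color in the list. A minimal sketch of a more direct label-to-color mapping follows. It swaps Matplotlib's `cm.rainbow` for the standard-library `colorsys` module, and the helper name `cluster_palette` is an assumption made for illustration.

```python
import colorsys


def cluster_palette(n_clusters):
    """Return one hex color per cluster, evenly spaced around the hue wheel."""
    palette = []
    for cluster in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(cluster / n_clusters, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            int(round(r * 255)), int(round(g * 255)), int(round(b * 255))))
    return palette


palette = cluster_palette(5)
# Explicit mapping: cluster label -> color, with no index shifting.
color_for_cluster = {cluster: palette[cluster] for cluster in range(5)}
```

Inside the marker loop, `color=color_for_cluster[cluster]` would then give each cluster label its own fixed color.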

## Results and Discussion <a name="results"></a>

The above analysis shows that, for the corporate event (5-day program), the company can choose from any of the 9 hotels and
1 event space in the Milan Area near the company location, and from several nearby venues available for a short trip in
the evenings within a radius of 500 meters of the Milan Area. There are 43 unique categories, which include nearby venues
such as Italian Restaurants, Ice Cream Shops, Plazas, Cafes and Pizza Places with the highest counts. Venues also include
Art Galleries, Art Museums, Monuments, Cosmetics Stores, Boutiques (which are also popular in the city) and many others.

For any specific category of location, the company can also choose from any of the 23 hotels, 31 restaurants,
34 fashion stores and 18 shopping malls/places, covering 41 unique categories and spread over a larger radius (different
for each venue) in the city of Milano, to have some more options.

There are 45 neighborhoods in Como with 370 venues and 97 unique categories of venues. The top 5 neighborhoods by
number of venues are Bellagio,San Giovanni di Bellagio with 50 venues, Como-Lora (a long set of names, abbreviated here)
with 41 venues, Mennaggio with 25 venues, Mariano Comense,Perticato with 22 venues and Cernobbio with 22 venues, followed
by the other neighborhoods. The top 10 most common venues for each neighborhood in Como are also identified.

Likewise, the company can choose neighborhoods in Como according to their top 10 most common venues or their total
number of venues, and plan a one-day tour of the city of Como for the executives.

To avoid confusion, I would like to state clearly that the dataset I obtained contains several different Milan
neighborhoods/areas within the city of Milan. They can be told apart by their postal codes, and the complete address
of any venue of interest can be looked up as described above. I would also like to mention that the data returned by
Foursquare may vary slightly depending on the time of day the query is run.

I have used just one of several possible ways to suggest venues, locations and their categories for the cities of Milan
and Como. A given location may turn out to be unsuitable for reasons not covered here, which would be a subject for
further analysis. As Milan and Como are big cities, the problem is complex and very different approaches can be tried
in clustering and classification studies. Moreover, not every classification method will yield results of the same
quality for each city.

I used the k-means algorithm for the clustering part of this study on the dataset I obtained. A more detailed study
could also drill down to the level of individual neighborhoods or streets.

I ended the study by visualising the locations in the Milan Area on one map and the data spread over a larger radius
of the city on another map. The data for Como is visualised on a separate map with clusters
(using **k-means clustering**) of neighborhoods with their top 10 most common venues, to identify general neighborhoods
that should be of interest to the company when choosing venue locations.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify hotels and event spaces (9 hotels and 1 event space were identified) close
to the company location in the Milan Area of the city of Milan for the corporate event, and also nearby venues (93 venues
were identified, in 43 unique categories) that can be visited in the evenings after work within a radius of 500
meters of the area. I have also identified 31 restaurants, 18 shopping malls and 34 fashion stores that are not specific
to any neighborhood and are spread over a larger radius in the city, to give the company a few more options.

I have also identified 45 neighborhoods in Como with 370 venues and 97 unique categories of venues. The top 5
neighborhoods by number of venues and the top 10 most common venues for each neighborhood in Como are identified as well.

The company can accordingly choose neighborhoods, based on the venues of interest or the total number of venues in a
neighborhood, for the executives' one-day tour of Como.

The final decision on the optimal venue locations will be made by the company based on the specific characteristics of
the neighborhoods and locations in Milan and Como, taking into consideration additional factors such as the
attractiveness of each location for the executives' event, its appeal as a tourist place, and its opening and closing
times.

Other companies can use a similar type of data analysis for any event or tourist package that they are interested
in organising.