# Capstone Project - The Battle of the Neighborhoods

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

This project aims to leverage the Foursquare location data to explore top places of interest by district in the city of Shanghai, China. The results should serve as a practical travel guide for visitors of Shanghai.

## Data <a name="data"></a>

The data to be used in the project include the following:

* district names of Shanghai
* coordinates of each district
* Foursquare location data such as restaurants, bars, hotels, shops, shopping malls and parks

Based on Foursquare location data, we will use the coordinates of each district in Shanghai to find top places of interest within a specific radius in each district. We will then cluster the districts into 5 clusters and present the top 10 venues for each district to travelers.

## Methodology <a name="methodology"></a>

* Based on coordinate data, first create a map of Shanghai consisting of 16 marked districts.
* Utilize the Foursquare API to explore each district to collect the top 100 venues within a radius of 1000 meters.
* Select the top 10 venues from each district and apply k-means machine learning methods to segment the districts into 5 clusters.
* Present a refined map of Shanghai with 16 districts clustered into 5 clusters.
* Examine each cluster and present top 10 places of interest in each district as a travel guide to travelers in Shanghai.

Finding the coordinates of Shanghai.

In [2]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
address = 'Shanghai, China'

geolocator = Nominatim(user_agent="sh_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Shanghai, China are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Shanghai, China are 31.2253441, 121.4888922.


Read the CSV file containing coordinates of the 16 districts in Shanghai and turn it into a Pandas DataFrame.

In [3]:
import pandas as pd
df_districts = pd.read_csv('shanghai_districts_coordinates.csv')
df_districts.head()

Unnamed: 0,District,Latitude,Longitude
0,Baoshan,31.4055,121.4896
1,Changning,31.2204,121.4246
2,Fengxian,30.9178,121.474
3,Hongkou,31.2646,121.5051
4,Huangpu,31.2318,121.4844


Creating a map of Shanghai with markers for each district.

In [4]:
import folium
# create map of Shanghai
map_Shanghai = folium.Map(location=[latitude, longitude], zoom_start=9)

# add markers to map
for lat, lng, district, in zip(df_districts['Latitude'], df_districts['Longitude'], df_districts['District']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Shanghai)  
    
map_Shanghai

Utilizing the Foursquare API to explore and segment districts of Shanghai.

In [5]:
CLIENT_ID = '14T2OGYXAYSX1QFUKOGI5MDHQYOPYE5EKUA5UANZ3FZHA4T1'
CLIENT_SECRET = '4DRRLOE1OBYLBARPYTJMLHJQLMXUB4PQCQJUPXKOZ3UPUNIA'
VERSION = '20180605'

First, explore the Huangpu District by getting the top 100 venues within a radius of 1000 meters.

In [6]:
import requests
Huangpu_latitude = df_districts.loc[4, 'Latitude']
Huangpu_longitude = df_districts.loc[4, 'Longitude']
Huangpu_name = df_districts.loc[4, 'District']

LIMIT = 100
radius = 1000

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c649dcb9fb6b72b0b797644'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Xiǎo dōng mén',
  'headerFullLocation': 'Xiǎo dōng mén, Shanghai',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 54,
  'suggestedBounds': {'ne': {'lat': 31.23434410900001,
    'lng': 121.49939721336264},
   'sw': {'lat': 31.216344090999993, 'lng': 121.47838718663735}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c53decc4623be9a6222d2f1',
       'name': 'City of God Temple (城隍庙)',
       'location': {'address': '249 Fangbang M Rd',
        'lat': 31.22785855139222,
        'lng': 121.48753566826004,
        'labeledLatLngs': [{'label': 'display',
 

In [7]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [8]:
#clean the json and structure it into a pandas dataframe
from pandas.io.json import json_normalize

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,City of God Temple (城隍庙),Temple,31.227859,121.487536
1,Yu Garden (豫园),Garden,31.228922,121.487982
2,Shanghai Old Street (上海老街),Historic Site,31.227097,121.487065
3,Hotel Indigo Shanghai On The Bund (上海外灘英迪格酒店),Hotel,31.228193,121.495571
4,CHAR Bar,Hotel Bar,31.228209,121.495593


Repeating the same process to all districts in Shanghai and analyze each district.

In [9]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
shanghai_venues = getNearbyVenues(names=df_districts['District'],
                                   latitudes=df_districts['Latitude'],
                                   longitudes=df_districts['Longitude']
                                  )

Baoshan 
Changning 
Fengxian 
Hongkou 
Huangpu 
Jiading 
Jing'an
Jinshan 
Minhang 
Pudong
Putuo
Qingpu 
Songjiang 
Xuhui 
Yangpu 
Chongming 


In [10]:
# one hot encoding
shanghai_onehot = pd.get_dummies(shanghai_venues[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
shanghai_onehot['District'] = shanghai_venues['District'] 

# move district column to the first column
fixed_columns = [shanghai_onehot.columns[-1]] + list(shanghai_onehot.columns[:-1])
shanghai_onehot = shanghai_onehot[fixed_columns]

shanghai_onehot.head()

Unnamed: 0,District,American Restaurant,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,Beer Bar,Bistro,Boat or Ferry,...,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Waterfront,Wine Bar,Xinjiang Restaurant,Yoga Studio,Yunnan Restaurant,Zhejiang Restaurant
0,Baoshan,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Baoshan,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Baoshan,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
3,Baoshan,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Changning,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [11]:
#group rows by district and by taking the mean of the frequency of occurrence of each category
shanghai_grouped = shanghai_onehot.groupby('District').mean().reset_index()
shanghai_grouped

Unnamed: 0,District,American Restaurant,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,Beer Bar,Bistro,Boat or Ferry,...,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Waterfront,Wine Bar,Xinjiang Restaurant,Yoga Studio,Yunnan Restaurant,Zhejiang Restaurant
0,Baoshan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Changning,0.0,0.0,0.010989,0.010989,0.0,0.010989,0.010989,0.0,0.0,...,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.010989,0.010989
2,Chongming,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Fengxian,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Hongkou,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Huangpu,0.01,0.01,0.01,0.02,0.01,0.01,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0
6,Jiading,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Jing'an,0.0,0.0,0.0,0.01,0.04,0.02,0.0,0.01,0.0,...,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0
8,Jinshan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Minhang,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,...,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Printing each district along with the top 10 most common venues.

In [13]:
num_top_venues = 10

for hood in shanghai_grouped['District']:
    print("--------"+hood+"--------")
    temp = shanghai_grouped[shanghai_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

--------Baoshan --------
                    venue  freq
0           Boat or Ferry  0.25
1                 Stadium  0.25
2        Department Store  0.25
3                    Park  0.25
4  Peking Duck Restaurant  0.00
5            Noodle House  0.00
6            Optical Shop  0.00
7         Other Nightlife  0.00
8        Pedestrian Plaza  0.00
9     American Restaurant  0.00


--------Changning --------
                 venue  freq
0          Coffee Shop  0.12
1   Chinese Restaurant  0.12
2                Hotel  0.07
3   Italian Restaurant  0.04
4                 Café  0.04
5  Japanese Restaurant  0.03
6                  Gym  0.03
7    Convenience Store  0.03
8         Noodle House  0.02
9   Dongbei Restaurant  0.02


--------Chongming --------
                    venue  freq
0      Chinese Restaurant   0.3
1             Bus Station   0.2
2             Coffee Shop   0.2
3    Fast Food Restaurant   0.1
4                    Park   0.1
5           Boat or Ferry   0.1
6     American Restaur

In [14]:
#function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Creating the new dataframe and display the top 10 venues for each district.

In [15]:
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
district_venues_sorted = pd.DataFrame(columns=columns)
district_venues_sorted['District'] = shanghai_grouped['District']

for ind in np.arange(shanghai_grouped.shape[0]):
    district_venues_sorted.iloc[ind, 1:] = return_most_common_venues(shanghai_grouped.iloc[ind, :], num_top_venues)

district_venues_sorted.head()

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Baoshan,Department Store,Park,Stadium,Boat or Ferry,Zhejiang Restaurant,Fish Market,Furniture / Home Store,French Restaurant,Food Court,Flea Market
1,Changning,Coffee Shop,Chinese Restaurant,Hotel,Café,Italian Restaurant,Convenience Store,Japanese Restaurant,Gym,Cantonese Restaurant,Shanghai Restaurant
2,Chongming,Chinese Restaurant,Bus Station,Coffee Shop,Boat or Ferry,Park,Fast Food Restaurant,Flea Market,Furniture / Home Store,French Restaurant,Food Court
3,Fengxian,Bus Stop,Fast Food Restaurant,Cantonese Restaurant,Chinese Restaurant,Shopping Mall,Coffee Shop,Pizza Place,Farmers Market,Electronics Store,Garden
4,Hongkou,Coffee Shop,Bakery,Chinese Restaurant,Café,Plaza,Taiwanese Restaurant,Multiplex,Sandwich Place,Shopping Mall,Chinese Breakfast Place


Running k-means to cluster the districts into 5 clusters.

In [16]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

shanghai_grouped_clustering = shanghai_grouped.drop('District', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(shanghai_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 4, 4, 4, 4, 4, 3, 4, 0, 4], dtype=int32)

In [17]:
# add clustering labels
district_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

#create a new dataframe that includes the cluster as well as the top 10 venues for each district
shanghai_merged = df_districts

# add latitude/longitude for each district
shanghai_merged = shanghai_merged.join(district_venues_sorted.set_index('District'), on='District')

shanghai_merged.head()

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Baoshan,31.4055,121.4896,2,Department Store,Park,Stadium,Boat or Ferry,Zhejiang Restaurant,Fish Market,Furniture / Home Store,French Restaurant,Food Court,Flea Market
1,Changning,31.2204,121.4246,4,Coffee Shop,Chinese Restaurant,Hotel,Café,Italian Restaurant,Convenience Store,Japanese Restaurant,Gym,Cantonese Restaurant,Shanghai Restaurant
2,Fengxian,30.9178,121.474,4,Bus Stop,Fast Food Restaurant,Cantonese Restaurant,Chinese Restaurant,Shopping Mall,Coffee Shop,Pizza Place,Farmers Market,Electronics Store,Garden
3,Hongkou,31.2646,121.5051,4,Coffee Shop,Bakery,Chinese Restaurant,Café,Plaza,Taiwanese Restaurant,Multiplex,Sandwich Place,Shopping Mall,Chinese Breakfast Place
4,Huangpu,31.2318,121.4844,4,Hotel,Chinese Restaurant,French Restaurant,Lounge,Italian Restaurant,Restaurant,Coffee Shop,Dumpling Restaurant,Shanghai Restaurant,Seafood Restaurant


Creating a map of Shanghai with 16 districts clustered into 5 clusters.

In [18]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=9)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(shanghai_merged['Latitude'], shanghai_merged['Longitude'], shanghai_merged['District'], shanghai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Examining each cluster and present results in cluster-specific dataframes as guide to travelers in Shanghai.

In [23]:
shanghai_merged.loc[shanghai_merged['Cluster Labels'] == 0, shanghai_merged.columns[[0] + list(range(3, shanghai_merged.shape[1]))]]

Unnamed: 0,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Jinshan,0,Fast Food Restaurant,Shopping Mall,Cantonese Restaurant,Convenience Store,Chinese Restaurant,Coffee Shop,French Restaurant,Food Court,Flea Market,Zhejiang Restaurant
11,Qingpu,0,Fast Food Restaurant,Hotel,Coffee Shop,Shanghai Restaurant,Zhejiang Restaurant,Farmers Market,French Restaurant,Food Court,Flea Market,Fish Market
12,Songjiang,0,Fast Food Restaurant,Convenience Store,Metro Station,Light Rail Station,Indian Restaurant,Eastern European Restaurant,Electronics Store,Farmers Market,Gastropub,Dumpling Restaurant


In [24]:
shanghai_merged.loc[shanghai_merged['Cluster Labels'] == 1, shanghai_merged.columns[[0] + list(range(3, shanghai_merged.shape[1]))]]

Unnamed: 0,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Yangpu,1,Park,Bakery,Hotel,Seafood Restaurant,Farmers Market,French Restaurant,Food Court,Flea Market,Fish Market,Fast Food Restaurant


In [25]:
shanghai_merged.loc[shanghai_merged['Cluster Labels'] == 2, shanghai_merged.columns[[0] + list(range(3, shanghai_merged.shape[1]))]]

Unnamed: 0,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Baoshan,2,Department Store,Park,Stadium,Boat or Ferry,Zhejiang Restaurant,Fish Market,Furniture / Home Store,French Restaurant,Food Court,Flea Market


In [26]:
shanghai_merged.loc[shanghai_merged['Cluster Labels'] == 3, shanghai_merged.columns[[0] + list(range(3, shanghai_merged.shape[1]))]]

Unnamed: 0,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Jiading,3,Convenience Store,Stadium,Hotel,Garden,Fast Food Restaurant,Furniture / Home Store,French Restaurant,Food Court,Flea Market,Fish Market


In [27]:
shanghai_merged.loc[shanghai_merged['Cluster Labels'] == 4, shanghai_merged.columns[[0] + list(range(3, shanghai_merged.shape[1]))]]

Unnamed: 0,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Changning,4,Coffee Shop,Chinese Restaurant,Hotel,Café,Italian Restaurant,Convenience Store,Japanese Restaurant,Gym,Cantonese Restaurant,Shanghai Restaurant
2,Fengxian,4,Bus Stop,Fast Food Restaurant,Cantonese Restaurant,Chinese Restaurant,Shopping Mall,Coffee Shop,Pizza Place,Farmers Market,Electronics Store,Garden
3,Hongkou,4,Coffee Shop,Bakery,Chinese Restaurant,Café,Plaza,Taiwanese Restaurant,Multiplex,Sandwich Place,Shopping Mall,Chinese Breakfast Place
4,Huangpu,4,Hotel,Chinese Restaurant,French Restaurant,Lounge,Italian Restaurant,Restaurant,Coffee Shop,Dumpling Restaurant,Shanghai Restaurant,Seafood Restaurant
6,Jing'an,4,Café,Cocktail Bar,Spa,Coffee Shop,Bakery,Park,Dumpling Restaurant,Japanese Restaurant,Chinese Restaurant,Turkish Restaurant
8,Minhang,4,Coffee Shop,Café,Chinese Restaurant,Bakery,Fast Food Restaurant,Shopping Mall,Multiplex,Cantonese Restaurant,Burger Joint,Metro Station
9,Pudong,4,Coffee Shop,Convenience Store,Fast Food Restaurant,Hotel,Japanese Restaurant,French Restaurant,Park,Plaza,Ramen Restaurant,Movie Theater
10,Putuo,4,Seafood Restaurant,Department Store,Chinese Restaurant,Metro Station,Fish Market,Fast Food Restaurant,French Restaurant,Food Court,Flea Market,Zhejiang Restaurant
13,Xuhui,4,Coffee Shop,Chinese Restaurant,Hotel,Japanese Restaurant,Shopping Mall,Café,Fast Food Restaurant,Bakery,Dumpling Restaurant,Asian Restaurant
15,Chongming,4,Chinese Restaurant,Bus Station,Coffee Shop,Boat or Ferry,Park,Fast Food Restaurant,Flea Market,Furniture / Home Store,French Restaurant,Food Court


## Results and Discussion <a name="results"></a>

Our analysis segments the 16 districts of Shanghai into 5 clusters:
* Cluster 1: Jinshan, Qingpu and Songjiang
* Cluster 2: Yangpu
* Cluster 3: Baoshan
* Cluster 4: Jiading
* Cluster 5: Changning, Fengxian, Hongkou, Huangpu, Jing'an, Minhang, Pudong, Putuo, Xuhui and Chongming.

Restaurants are the most common places of interest in most districts/clusters, followed by coffee shops and shopping malls.

Cluster 5 contains most of the districts that are located in the city center of Shanghai, with a larger variety of places of interest than the other clusters, and is therefore recommended to travelers.

## Conclusion <a name="conclusion"></a>

The project has explored the 16 districts of Shanghai to find top places of interest based on Foursquare location data. The results indicate that the central area of the city encompasses the most districts with various restaurants, cafés, stores and shopping malls. A map of Shanghai with different clusters as well as tables of top venues is presented as a reference for travelers and visitors.