# Capstone Project - Move to a metropolis

## Table of contents
* [Introduction](#introduction)
* [Data](#data)

## Introduction <a name="introduction"></a>

New York City (NYC), Toronto, and Shanghai are the financial centers of their respective countries, and all three rank among the top 10 financial centers in the world.<sup>1</sup> However, their residents come from different cultural backgrounds. Historically, NYC and Toronto have been destinations for immigrants, mostly from Europe, whereas Shanghai's population is mostly Chinese, and the city serves as an important trading port.<sup>2-4</sup> As someone from a less developed city, I am curious to learn more about life in these three big cities.

The goal of this project is to identify the similarities and differences between the three cities, in the hope that the results can help people decide which of them is likely to be a better new home.

## Data <a name="data"></a>

In this project, we will use the neighborhood information for NYC and Toronto from the previous module, together with new information on the districts of Shanghai from Wikipedia.<sup>5</sup> Foursquare location data will be used to obtain the venues around each neighborhood.

We want all types of venues around each neighborhood, because they tell us about food, transportation, and leisure. These places contain hints about people's lifestyles, and they show how convenient it is to find everyday amenities. As shown later, Shanghai has far fewer districts than the other two cities have neighborhoods (15 districts versus 40 neighborhoods in Manhattan, for example), and each district covers a much larger area. To compensate, the search radius for Shanghai is increased from 500 m to 1000 m.
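Because the area of a circular search region grows with the square of its radius, doubling the radius from 500 m to 1000 m covers four times as much ground. A quick check of that arithmetic, assuming the API treats `radius` as a plain circular radius in meters:

```python
import math

def search_area_km2(radius_m: float) -> float:
    """Area in km² of a circle whose radius is given in meters."""
    return math.pi * radius_m ** 2 / 1_000_000

small = search_area_km2(500)   # search area used for NYC and Toronto
large = search_area_km2(1000)  # search area used for Shanghai
print(f"500 m: {small:.3f} km², 1000 m: {large:.3f} km², ratio: {large / small:.1f}")
```

This prints `500 m: 0.785 km², 1000 m: 3.142 km², ratio: 4.0`. For scale, a single Shanghai district such as Huangpu (20.46 km² in the district table) is still much larger than even the enlarged search circle.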

While preparing the data, I found that some information, such as the universities in Shanghai, could not be retrieved precisely from Foursquare, possibly because of how names are translated or other language-related issues. Since I would like to use the distribution of universities in Shanghai as an example for discussion, the list of universities was obtained separately from Wikipedia.<sup>6</sup> Even so, some names in the list had to be filtered out manually to avoid errors.

### Neighborhood data

First, import libraries for data acquisition and processing.

```python
import pandas as pd
import numpy as np # library to handle data in a vectorized manner
import json # library to handle JSON files

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

#!conda install -c conda-forge geocoder --yes
import geocoder # import geocoder

import requests # library to handle requests

# import k-means from clustering stage
#from sklearn.cluster import KMeans
#from sklearn import metrics
#from scipy.spatial.distance import cdist
#import numpy as np
#import matplotlib.pyplot as plt

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')
```

```
Libraries imported.
```


#### NYC Manhattan data
As in the earlier lab, we download the NYC neighborhood dataset and build a dataframe containing the coordinates of each neighborhood, then keep only the neighborhoods in Manhattan. All the Manhattan neighborhoods are then marked on a map.
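Each entry in the dataset's `features` list is a GeoJSON point feature, and GeoJSON orders coordinates as `[longitude, latitude]`; that is why the dataframe-building loop in this subsection reads index 1 for latitude and index 0 for longitude. A minimal sketch of the same extraction on a hand-written two-feature sample (the sample only mimics the structure of `newyork_data.json`; the coordinates are copied from the tables below):

```python
import pandas as pd

# Hand-written sample shaped like newyork_data.json (structure assumed, not downloaded)
sample = {
    "features": [
        {"properties": {"borough": "Bronx", "name": "Wakefield"},
         "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]}},
        {"properties": {"borough": "Manhattan", "name": "Chinatown"},
         "geometry": {"type": "Point", "coordinates": [-73.994279, 40.715618]}},
    ]
}

def features_to_frame(features):
    """Flatten GeoJSON point features into one row per neighborhood."""
    rows = []
    for feature in features:
        coords = feature["geometry"]["coordinates"]
        rows.append({
            "Borough": feature["properties"]["borough"],
            "Neighborhood": feature["properties"]["name"],
            "Latitude": coords[1],   # GeoJSON order is [lon, lat]
            "Longitude": coords[0],
        })
    return pd.DataFrame(rows, columns=["Borough", "Neighborhood", "Latitude", "Longitude"])

df = features_to_frame(sample["features"])
print(df[df["Borough"] == "Manhattan"])
```

Collecting plain dictionaries and building the dataframe once is also faster than growing a dataframe row by row.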

```python
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')
```

```
Data downloaded!
```


```python
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
```

```python
print(type(newyork_data))
neighborhoods_data = newyork_data['features']
print(type(neighborhoods_data))
```

```
<class 'dict'>
<class 'list'>
```


```python
# define the dataframe columns
ny_column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one record per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
records = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    records.append({'Borough': borough,
                    'Neighborhood': neighborhood_name,
                    'Latitude': neighborhood_lat,
                    'Longitude': neighborhood_lon})

ny_neighborhoods = pd.DataFrame(records, columns=ny_column_names)
ny_neighborhoods.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


```python
manhattan_data = ny_neighborhoods[ny_neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


```python
address1 = 'Manhattan, NY'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address1)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Manhattan are 40.7896239, -73.9598939.
```


```python
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)

map_manhattan
```

#### Toronto main area data
We read the table of Toronto postal codes from Wikipedia, build a dataframe containing the coordinates of each neighbourhood, and keep only the boroughs whose names contain "Toronto". All the remaining neighbourhoods are then marked on a map.
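The Toronto cleanup below has three steps: drop postcodes whose borough is "Not assigned", merge all neighbourhoods that share a postcode into one comma-separated string, and fall back to the borough name where the neighbourhood is "Not assigned". A minimal sketch of those steps on a tiny hand-made table (the rows are illustrative, not copied from Wikipedia):

```python
import pandas as pd

# Illustrative rows in the shape of the Wikipedia postcode table
raw = pd.DataFrame({
    "Postcode": ["M1A", "M1B", "M1B", "M7A"],
    "Borough": ["Not assigned", "Scarborough", "Scarborough", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Rouge", "Malvern", "Not assigned"],
})

# 1. Drop postcodes that have no borough
assigned = raw[raw["Borough"] != "Not assigned"]

# 2. One row per postcode; unique names joined in sorted order
grouped = (assigned.groupby("Postcode")
                   .agg(lambda s: ", ".join(sorted(set(s))))
                   .reset_index())

# 3. Use the borough name where no neighbourhood is assigned
mask = grouped["Neighbourhood"] == "Not assigned"
grouped.loc[mask, "Neighbourhood"] = grouped.loc[mask, "Borough"]
print(grouped)
```

Assigning through `.loc` with a boolean mask avoids chained in-place modification of a single column, which newer pandas versions warn about.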

```python
toronto_data = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
toronto_data_df = toronto_data[0]
# drop postcodes that have no borough assigned
toronto_data2 = toronto_data_df.drop(toronto_data_df[toronto_data_df['Borough']=='Not assigned'].index, axis=0)
toronto_data3 = toronto_data2.copy()
type(toronto_data3)
```

```
pandas.core.frame.DataFrame
```

```python
# combine all neighbourhoods that share a postcode into one comma-separated string
toronto_data_group = toronto_data3.groupby(['Postcode']).agg(lambda x: ", ".join(sorted(set(x))))
toronto_data_group.reset_index()
```

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Highland Creek, Port Union, Rouge Hill |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
| 5 | M1J | Scarborough | Scarborough Village |
| 6 | M1K | Scarborough | East Birchmount Park, Ionview, Kennedy Park |
| 7 | M1L | Scarborough | Clairlea, Golden Mile, Oakridge |
| 8 | M1M | Scarborough | Cliffcrest, Cliffside, Scarborough Village West |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West |


```python
# move Postcode from the index back into a regular column
toronto_data_group = toronto_data_group.reset_index()

# where a postcode has no neighbourhood, use the borough name instead
not_assigned = toronto_data_group['Neighbourhood'] == 'Not assigned'
toronto_data_group.loc[not_assigned, 'Neighbourhood'] = toronto_data_group.loc[not_assigned, 'Borough']
toronto_data_group.head()
```

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Highland Creek, Port Union, Rouge Hill |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


```python
toronto_data_group.shape
```

```
(103, 3)
```

```python
postcode_df = pd.read_csv('https://cocl.us/Geospatial_data')
postcode_df.rename(columns={'Postal Code': 'Postcode'}, inplace=True)
postcode_df.head()
```

| | Postcode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


```python
# attach the coordinates of each postcode
toronto_data_merged = toronto_data_group.join(postcode_df.set_index('Postcode'), on='Postcode')
toronto_data_merged.head()
```

| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Port Union, Rouge Hill | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |


```python
# keep only the boroughs whose name contains "Toronto"
# (.ix was removed in pandas 1.0; a boolean mask replaces the row-by-row loop)
toronto_data_filtered = (toronto_data_merged[toronto_data_merged['Borough'].str.contains('Toronto')]
                         .reset_index(drop=True))
toronto_data_filtered.head()
```

| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 |
| 1 | M4K | East Toronto | Riverdale, The Danforth West | 43.679557 | -79.352188 |
| 2 | M4L | East Toronto | India Bazaar, The Beaches West | 43.668999 | -79.315572 |
| 3 | M4M | East Toronto | Studio District | 43.659526 | -79.340923 |
| 4 | M4N | Central Toronto | Lawrence Park | 43.72802 | -79.38879 |


```python
address2 = 'Toronto, Ontario, Canada'
geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address2)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Toronto are 43.653963, -79.387207.
```


```python
# create map of Toronto using the latitude and longitude values geocoded above
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(toronto_data_filtered['Latitude'], toronto_data_filtered['Longitude'], toronto_data_filtered['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto
```

#### Shanghai main area data
We read the table of Shanghai's administrative divisions from Wikipedia and geocode each district to obtain its coordinates; in Shanghai, the districts play the role of neighborhoods. All the districts are then marked on a map.
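The district names in the Wikipedia table end with " District" (and, for Huangpu, with footnote and "(City seat)" residue). The loop in the next steps keeps everything before the last occurrence of " District"; the same cleanup can be written as a vectorized string operation. A minimal sketch on three names taken from this section, assuming pandas' `Series.str.rsplit`:

```python
import pandas as pd

names = pd.Series(["Huangpu District[5](City seat)", "Jing'an District", "Pudong New Area"])

# Keep the text before the last " District"; names without the suffix pass through unchanged
cleaned = names.str.rsplit(" District", n=1).str[0]
print(cleaned.tolist())
```

This prints `['Huangpu', "Jing'an", 'Pudong New Area']`, matching what the loop produces for those rows.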

```python
sh_district_data = pd.read_html('https://en.wikipedia.org/wiki/List_of_administrative_divisions_of_Shanghai')
sh_district_data_df = sh_district_data[3]  # the fourth table on the page lists the districts
print(type(sh_district_data_df))
print(sh_district_data_df.shape)
sh_district_data_df.head()
```

```
<class 'pandas.core.frame.DataFrame'>
(16, 9)
```

The eight named columns sit under a top-level `County Level` header; a ninth, empty column without a name is omitted below.

| | Name | Chinese | Hanyu Pinyin | Division code (numeric) | Division code (letters) | Area (km²) | Population (2015 census) | Density (/km²) |
|---|---|---|---|---|---|---|---|---|
| 0 | Huangpu District\[5\](City seat) | 黄浦区 | Huángpǔ Qū | 310101 | HGP | 20.46 | 658600 | 32190 |
| 1 | Xuhui District | 徐汇区 | Xúhuì Qū | 310104 | XHI | 54.76 | 1089100 | 19889 |
| 2 | Changning District | 长宁区 | Chángníng Qū | 310105 | CNQ | 38.3 | 691100 | 18044 |
| 3 | Jing'an District | 静安区 | Jìng'ān Qū | 310106 | JAQ | 37.37 | 1000000 | 27000 |
| 4 | Putuo District | 普陀区 | Pǔtuó Qū | 310107 | PTQ | 54.83 | 1288000 | 23491 |


```python
district_info = sh_district_data_df['County Level']['Name'].to_frame()
district_info.rename(columns={'Name': 'Neighborhood'}, inplace=True)
# strip the " District" suffix (and anything after it) from each name
for i in range(0, len(district_info['Neighborhood'])):
    district_info.iloc[i, 0] = district_info.iloc[i, 0].rsplit(' District', 1)[0]
district_info
```

| | Neighborhood |
|---|---|
| 0 | Huangpu |
| 1 | Xuhui |
| 2 | Changning |
| 3 | Jing'an |
| 4 | Putuo |
| 5 | Hongkou |
| 6 | Yangpu |
| 7 | Minhang |
| 8 | Baoshan |
| 9 | Jiading |


```python
# float placeholders so the geocoded coordinates keep a float dtype
district_info['Latitude'] = 0.0
district_info['Longitude'] = 0.0

for i in range(0, len(district_info['Neighborhood'])):
    address = '{}, Shanghai, China'.format(district_info.iloc[i, 0])
    location = geolocator.geocode(address)
    district_info.iloc[i, 1] = location.latitude
    district_info.iloc[i, 2] = location.longitude
    #print('The geographical coordinates of {} district are {}, {}.'.format(district_info.iloc[i, 0], location.latitude, location.longitude))

# exclude the last district (index 15, Chongming) from the main-area analysis
district_info.drop(index=15, inplace=True)
district_info.head()
```

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Huangpu | 31.233593 | 121.479864 |
| 1 | Xuhui | 31.163698 | 121.427994 |
| 2 | Changning | 31.209276 | 121.389986 |
| 3 | Jing'an | 31.229776 | 121.44306 |
| 4 | Putuo | 31.251326 | 121.391229 |


```python
address3 = 'Shanghai, China'
geolocator = Nominatim(user_agent="sh_explorer")
location = geolocator.geocode(address3)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Shanghai are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Shanghai are 31.2322758, 121.4692071.
```


```python
# create map of Shanghai using the latitude and longitude values geocoded above
map_sh = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(district_info['Latitude'], district_info['Longitude'], district_info['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sh)

map_sh
```

### Venue data
Next, we will need to use Foursquare location data to find out the venues around the neighborhoods.
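Each venue lookup in this notebook is a GET request to the Foursquare v2 `venues/explore` endpoint, with the credentials, API version, coordinates, radius, and result limit passed as query parameters. A minimal sketch that builds such a URL with the standard library's `urllib.parse.urlencode`, which escapes values instead of formatting them by hand. `build_explore_url` is a hypothetical helper, `YOUR_ID` and `YOUR_SECRET` are placeholders, and this v2 endpoint is the one the notebook used in 2020; it may no longer be served for new accounts:

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL for one location (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # "latitude,longitude"
        "radius": radius,      # search radius in meters
        "limit": limit,        # maximum number of venues returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

# Placeholder credentials; Huangpu coordinates from the district table
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20200119", 31.233593, 121.479864, radius=1000)
print(url)
```

`urlencode` percent-encodes the comma in `ll` as `%2C`, which the server decodes back to a comma.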

```python
import os

# read the Foursquare credentials from environment variables instead of hard-coding
# them in the notebook (the variable names FOURSQUARE_CLIENT_ID/SECRET are our own choice)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')  # your Foursquare Secret
VERSION = '20200119'  # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + '*' * len(CLIENT_SECRET))  # never print the secret itself
```


Define a function that retrieves the venues near each neighborhood.
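Inside the function, each element of `response['groups'][0]['items']` wraps a `venue` object, and only its name, coordinates, and first category are kept. A standard-library sketch of that flattening on a hand-made response fragment (field names follow the parsing code in the function; the venue values are invented). Unlike the function, this hypothetical `flatten_venues` helper skips venues whose `categories` list is empty instead of raising `IndexError`:

```python
def flatten_venues(neighborhood, lat, lng, items):
    """Turn explore 'items' into (neighborhood, lat, lng, venue, venue_lat, venue_lng, category) tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:  # skip venues without a category
            continue
        rows.append((neighborhood, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     categories[0]["name"]))
    return rows

# Invented fragment shaped like response['groups'][0]['items']
items = [
    {"venue": {"name": "Cafe A", "location": {"lat": 31.234, "lng": 121.48},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unlabelled spot", "location": {"lat": 31.235, "lng": 121.481},
               "categories": []}},
]
rows = flatten_venues("Huangpu", 31.233593, 121.479864, items)
print(rows)
```

Only `Cafe A` survives; whether to skip or keep uncategorized venues is a design choice, and the notebook's function keeps all of them.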

```python
LIMIT = 100  # maximum number of venues returned per request

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe with one row per venue found near each location.

    radius is the search radius in meters (500 by default).
    """
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep the recommended items
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
```

#### NYC Manhattan venue data

```python
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'],
                                   radius=500)
manhattan_venues.groupby('Neighborhood').count()
```

```
Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards
```


| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Battery Park City | 96 | 96 | 96 | 96 | 96 | 96 |
| Carnegie Hill | 100 | 100 | 100 | 100 | 100 | 100 |
| Central Harlem | 45 | 45 | 45 | 45 | 45 | 45 |
| Chelsea | 100 | 100 | 100 | 100 | 100 | 100 |
| Chinatown | 100 | 100 | 100 | 100 | 100 | 100 |
| Civic Center | 100 | 100 | 100 | 100 | 100 | 100 |
| Clinton | 100 | 100 | 100 | 100 | 100 | 100 |
| East Harlem | 39 | 39 | 39 | 39 | 39 | 39 |
| East Village | 100 | 100 | 100 | 100 | 100 | 100 |
| Financial District | 100 | 100 | 100 | 100 | 100 | 100 |
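Several Manhattan neighborhoods in the table above return exactly 100 venues, which equals the `LIMIT` passed to the API, so those counts are capped by the request rather than being true totals. A small standard-library sketch (counts copied from the table) that flags the capped neighborhoods:

```python
LIMIT = 100  # same value as in getNearbyVenues

# Venue counts copied from the first rows of the Manhattan table above
counts = {
    "Battery Park City": 96, "Carnegie Hill": 100, "Central Harlem": 45,
    "Chelsea": 100, "Chinatown": 100, "East Harlem": 39,
}

capped = sorted(name for name, n in counts.items() if n >= LIMIT)
print(capped)
```

This prints `['Carnegie Hill', 'Chelsea', 'Chinatown']`. When comparing venue density across cities, such capped counts should be read as lower bounds.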


#### Toronto main area venue data

```python
toronto_venues = getNearbyVenues(names=toronto_data_filtered['Neighbourhood'],
                                 latitudes=toronto_data_filtered['Latitude'],
                                 longitudes=toronto_data_filtered['Longitude'],
                                 radius=500)
toronto_venues.groupby('Neighborhood').count()
```

```
The Beaches
Riverdale, The Danforth West
India Bazaar, The Beaches West
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
North Midtown, The Annex, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
Bathurst Quay, CN Tower, Harbourfront West, Island airport, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction Sout
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Adelaide, King, Richmond | 100 | 100 | 100 | 100 | 100 | 100 |
| Bathurst Quay, CN Tower, Harbourfront West, Island airport, King and Spadina, Railway Lands, South Niagara | 15 | 15 | 15 | 15 | 15 | 15 |
| Berczy Park | 56 | 56 | 56 | 56 | 56 | 56 |
| Brockton, Exhibition Place, Parkdale Village | 22 | 22 | 22 | 22 | 22 | 22 |
| Business Reply Mail Processing Centre 969 Eastern | 14 | 14 | 14 | 14 | 14 | 14 |
| Cabbagetown, St. James Town | 47 | 47 | 47 | 47 | 47 | 47 |
| Central Bay Street | 83 | 83 | 83 | 83 | 83 | 83 |
| Chinatown, Grange Park, Kensington Market | 84 | 84 | 84 | 84 | 84 | 84 |
| Christie | 18 | 18 | 18 | 18 | 18 | 18 |
| Church and Wellesley | 84 | 84 | 84 | 84 | 84 | 84 |


#### Shanghai main area venue data

```python
sh_venues = getNearbyVenues(names=district_info['Neighborhood'],
                            latitudes=district_info['Latitude'],
                            longitudes=district_info['Longitude'],
                            radius=1000)  # larger radius for the much larger districts
sh_venues.groupby('Neighborhood').count()
```

```
Huangpu
Xuhui
Changning
Jing'an
Putuo
Hongkou
Yangpu
Minhang
Baoshan
Jiading
Pudong New Area
Jinshan
Songjiang
Qingpu
Fengxian
```


| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Baoshan | 3 | 3 | 3 | 3 | 3 | 3 |
| Changning | 70 | 70 | 70 | 70 | 70 | 70 |
| Fengxian | 8 | 8 | 8 | 8 | 8 | 8 |
| Hongkou | 37 | 37 | 37 | 37 | 37 | 37 |
| Huangpu | 100 | 100 | 100 | 100 | 100 | 100 |
| Jiading | 11 | 11 | 11 | 11 | 11 | 11 |
| Jing'an | 100 | 100 | 100 | 100 | 100 | 100 |
| Jinshan | 1 | 1 | 1 | 1 | 1 | 1 |
| Minhang | 16 | 16 | 16 | 16 | 16 | 16 |
| Pudong New Area | 19 | 19 | 19 | 19 | 19 | 19 |


In addition, we will import a list of universities in Shanghai.

```python
sh_univ_data = pd.read_html('https://en.wikipedia.org/wiki/List_of_universities_and_colleges_in_Shanghai')
sh_univ_data_df = sh_univ_data[0]
print(type(sh_univ_data_df))
print(sh_univ_data_df.shape)
sh_univ_data_df.head()
```

```
<class 'pandas.core.frame.DataFrame'>
(36, 4)
```

| | Name | Chinese name | Type | Note |
|---|---|---|---|---|
| 0 | Fudan University | 复旦大学 | National (Direct) | Ω |
| 1 | Tongji University | 同济大学 | National (Direct) | Ω |
| 2 | Shanghai Jiao Tong University | 上海交通大学 | National (Direct) | Ω |
| 3 | East China University of Science and Technology | 华东理工大学 | National (Direct) | Ω |
| 4 | University of Shanghai for Science and Technology | 上海理工大学 | Municipal | |


```python
univ_info = sh_univ_data_df['Name'].to_frame()
univ_info.rename(columns={'Name': 'University'}, inplace=True)
# manually filter out names that cause errors (see the Data section)
univ_info = univ_info.drop(index=[8, 10, 20, 25, 27, 30, 32, 33, 35]).reset_index(drop=True)
univ_info
```

| | University |
|---|---|
| 0 | Fudan University |
| 1 | Tongji University |
| 2 | Shanghai Jiao Tong University |
| 3 | East China University of Science and Technology |
| 4 | University of Shanghai for Science and Technology |
| 5 | Shanghai Maritime University |
| 6 | Donghua University |
| 7 | Shanghai Institute of Technology |
| 8 | Shanghai Ocean University |
| 9 | East China Normal University |
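The rows dropped above were chosen by hand. An alternative, sketched here under the assumption that the errors came from names the geocoder cannot resolve, is to keep only the names that geocode successfully. `geocode_all` is a hypothetical helper; `lookup` stands in for `geolocator.geocode`, which returns `None` when Nominatim finds no match. Here it is a dictionary-based stub so the example runs offline, and "Hypothetical College" is an invented name:

```python
from collections import namedtuple

# Stand-in for geopy's Location object (only .latitude/.longitude are used)
Point = namedtuple("Point", ["latitude", "longitude"])

def geocode_all(names, lookup):
    """Split names into (found, missing); found maps name -> (lat, lng)."""
    found, missing = {}, []
    for name in names:
        location = lookup(f"{name}, China")
        if location is None:  # no match for this name
            missing.append(name)
        else:
            found[name] = (location.latitude, location.longitude)
    return found, missing

# Hypothetical stub (coordinates copied from the table below)
known = {
    "Fudan University, China": Point(31.301044, 121.500455),
    "Tongji University, China": Point(31.284739, 121.496949),
}
found, missing = geocode_all(["Fudan University", "Tongji University", "Hypothetical College"], known.get)
print(found)
print(missing)
```

In the notebook one would pass `geolocator.geocode` instead of the stub, optionally wrapped in geopy's `RateLimiter` to respect Nominatim's limit of one request per second.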


```python
# float placeholders so the geocoded coordinates keep a float dtype
univ_info['Latitude'] = 0.0
univ_info['Longitude'] = 0.0

for i in range(0, len(univ_info['University'])):
    address = '{}, China'.format(univ_info.iloc[i, 0])
    location = geolocator.geocode(address)
    univ_info.iloc[i, 1] = location.latitude
    univ_info.iloc[i, 2] = location.longitude
    #print('The geographical coordinates of {} are {}, {}.'.format(univ_info.iloc[i, 0], location.latitude, location.longitude))

univ_info.head()
```

| | University | Latitude | Longitude |
|---|---|---|---|
| 0 | Fudan University | 31.301044 | 121.500455 |
| 1 | Tongji University | 31.284739 | 121.496949 |
| 2 | Shanghai Jiao Tong University | 31.200815 | 121.428407 |
| 3 | East China University of Science and Technology | 31.145081 | 121.419509 |
| 4 | University of Shanghai for Science and Technology | 31.295016 | 121.550674 |


We now have the venue data for all three cities; it will be analyzed later to learn more about life in each of them.

### References

1. [Which Cities Are The World's Financial Centers?](https://www.worldatlas.com/articles/the-world-s-top-financial-cities.html)
2. [Toronto - Wikipedia](https://en.wikipedia.org/wiki/Toronto)
3. [New York City - Wikipedia](https://en.wikipedia.org/wiki/New_York_City)
4. [Shanghai - Wikipedia](https://en.wikipedia.org/wiki/Shanghai)
5. [List of administrative divisions of Shanghai](https://en.wikipedia.org/wiki/List_of_administrative_divisions_of_Shanghai)
6. [List of universities and colleges in Shanghai](https://en.wikipedia.org/wiki/List_of_universities_and_colleges_in_Shanghai)