# Capstone Project - The Battle of the Neighborhoods
## A Full Report
### By: Martin Foo

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# 1.0 Introduction
## 1.1 Background
   **Toronto** is the capital city of the Canadian province of Ontario. With a recorded population of approximately 2.7 million in 2016, it is the most populous city in Canada and the fourth most populous city in North America. The diverse population of Toronto reflects its current and historical role as an important destination for immigrants to Canada. More than 50 percent of residents belong to a visible minority population group, and over 200 distinct ethnic origins are represented among its inhabitants. Toronto is an international centre of business, finance, arts, and culture, and is recognized as one of the most multicultural and cosmopolitan cities in the world. Toronto covers an area of 630 square kilometres (243 sq mi), with a maximum north–south distance of 21 km (13 mi). It has a maximum east–west distance of 43 km (27 mi) and it has a 46-kilometre (29 mi) long waterfront shoreline, on the northwestern shore of Lake Ontario. Toronto encompasses a geographical area formerly administered by many separate municipalities. These municipalities have each developed a distinct history and identity over the years, and their names remain in common use among Torontonians. Former municipalities include East York, Etobicoke, Forest Hill, Mimico, North York, Parkdale, Scarborough, Swansea, Weston and York. Throughout the city there exist hundreds of small neighbourhoods and some larger neighbourhoods covering a few square kilometres.

   **Penang** is a Malaysian state located on the northwest coast of Peninsular Malaysia, by the Malacca Strait. The state consists of Penang Island, Seberang Perai (a narrow strip of the Malay Peninsula) and a handful of smaller islets. Its capital city, George Town, is located at the northeastern tip of Penang Island. They are connected by Malaysia's two longest road bridges, the Penang Bridge and the Sultan Abdul Halim Muadzam Shah Bridge. Penang's population stood at nearly 1.767 million as of 2018, while its population density rose to 1,684/km2 (4,360/sq mi). It has among the nation's highest population densities and is one of the country's most urbanised states.With a total land area of just 1,048 km2 (405 sq mi), Penang is the second smallest state in Malaysia by land mass after Perlis. Penang, situated at the northwestern coastline of Peninsular Malaysia, lies between latitudes 5.59° and 5.12°N, and longitudes 100.17° and 100.56°E. The city of George Town includes the Bayan Lepas Free Industrial Zone, a high-tech manufacturing hub regarded as the "Silicon Valley of the East". The expansion of George Town has created suburbs to its northwest, west and south. The northwestern suburbs are somewhat more affluent, given their seafront locations which attract tourists and expatriates. The southern suburbs, such as Jelutong, grew due to industrial activities. On the other hand, Air Itam and Paya Terubong emerged to the west of George Town as a result of agricultural plantations on the central hills of Penang Island. Since the 1970s, massive industrialisation around Bayan Lepas, which created the Bayan Lepas Free Industrial Zone, led to the rapid urbanisation of the southeastern corner of Penang Island as well. The western half of the island, where Balik Pulau forms the main population centre, remains sparsely-populated, although urbanisation has encroached into the area in recent years.
   
## 1.2 Problem Statement
As both selected locations as stated in clause 1.1 are regarded as top spots of tourism in the world due to the compactness and richness of culture and livelihood as paracticed in the two locations, both reputations in the tourism field can be studied and compared. Hence, the following case study is conducted with the motive of **comparing the tourism characteristics of two major cities**, which are **Toronto, Ontario** and **Georgetown, Penang**. Such comparison study is prioritized to access the potentiality of both locations on tourism today and in the near future. The fields of comparison study includes food spots (restaurants, cafes etc.), galleries, museums and so much more. But, for this study, I will be using **restaurants**, **museums** and **galleries** for the comparison study. 
## 1.3 Scope of Study   
For the case of **Toronto, Ontario**, the area scope will be covering the **entire city**. And as for **Penang island**, the scope will only be covering the **North-East District of the island**, which covers approximately 10 sub-districts.

# 2.0 Data

The targeted datas of the two locations of interest, which are **Toronto, Ontario** and the **North-East District of the Penang island** selected for this study are listed as below, with the corresponding methods:
* The number of **restaurants** in **Toronto, Ontario** and the **North-East District of the Penang island**.
* The number of **museums** in **Toronto, Ontario** and the **North-East District of the Penang island**.
* The number of **galleries** in in **Toronto, Ontario** and the **North-East District of the Penang island**.

Firstly, the data sources that explain the postal codes and corresponding neighborhoods/ boroughs/ districts of the two locations of interest are extracted from 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M' (Toronto, Ontario) and 'https://en.wikipedia.org/wiki/Northeast_Penang_Island_District' (North-East District of the Penang island). As follows, the geospatial datas of the corresponding postal codes, restaurants, museums and galleris of both locations are extracted as explained below:
* The coordinates of the postal codes based on the district/ borough/ neoghbourhood of both locations are extracted via 'http://cocl.us/Geospatial_data' (Toronto, Ontario) and **GeoPy's Geocoder** (North-East District of the Penang island).
* The geospatial data, numbers and other detials of the restaurants, museums and galleries of both location of interest are extracted via **Foursquare API**.


**Importing related packages and libraries for data extraction, manipulation and analysis.**

In [17]:
import numpy as np # library to handle data in a vectorized manner and computational calculations
import pandas as pd # library for data analsysis

#import folium # To install folium (if package is not installed)
import folium # To import folium

!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

from pandas.io.json import json_normalize # tranforming json file into a pandas dataframe library
import requests # library to handle requests



## 2.1 Geospatial Data of Toronto, Ontario

In [3]:
wilipedia_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M' # URL of Wikipedia for Canada postal codes list
df_list = pd.read_html(wilipedia_url) # To read HTML
canada_df = pd.DataFrame (df_list[0]) # To create dataframe from the HTML file

canada_df.drop(canada_df[canada_df['Borough'] == 'Not assigned'].index, inplace = True) # To remove rows with 'Not Assigned' in 'Borough'


canada_df_filtered = canada_df.reset_index(drop=True) #To reset dataframe index

canada_df_filtered # To display filtered dataframe

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


To extract the table format as above, I used the pandas library to read the HTML file from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M . After examining the HTML file, the table format was found in df_list[0], refering the code works above. The data cleaning process was done by removing rows of data with the 'Borough' feature having the attribute of 'Not assigned'. After reseting the index of the dataframe, the expected format of dataframe is displayed as above.

Here, I proceed to download the CSV file from http://cocl.us/Geospatial_data  for the coordinates data based on the postal codes displayed in the above dataframe.

In [4]:
!wget -q -O 'geo_data_csv' http://cocl.us/Geospatial_data #To download CSV file for Geospatial Data
print('Data downloaded!')

Data downloaded!


In [6]:
geo_data = pd.read_csv('geo_data_csv') # To read CSV data using pandas
geo_data

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


The coordinates data of the corresponding postal codes are then merged wuth the original dataframe.

In [8]:
geo_latlng = canada_df_filtered[canada_df_filtered['Postal Code'].isin(geo_data['Postal Code'].tolist())] # Filtering out the latitude and longitude by postal codes

geo_df = pd.merge(geo_latlng,geo_data) # Merging two dataframes together with the filtered values

geo_df # Displaying dataframe with additional info of latitude and longitude to the respectibe postal codes

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


The compiled dataframe of **Toronto, Ontario** with the neighborhood, borough and postal codes is then visualised as below:

In [12]:
# create map of Toronto latitude and longitude values
map_Toronto = folium.Map(location=[43.651070, -79.347015], zoom_start=11)

# add markers to map
for lat, lng, borough, neighbourhood in zip(geo_df['Latitude'], geo_df['Longitude'], geo_df['Borough'], geo_df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto #To display Map of Toronto with markers of Neighbourhood and Borough

Since geocoder is unreliable and does not provide any geospatial data, the latitude and longitude values of each postal codes in the dataframe above are found and paired through the CSV as available on 'http://cocl.us/Geospatial_data'. The latitude and longitude values in the merged dataframe, along with the postal codes, boroughs and neighbourhoods are then applied as a reference data to add markers in the map of Toronto, Ontario generated by folium. An evenly-separated segmentation and clustering via markers can be observed on the map of Toronto as the geospatial data are based on the postal codes provided in the CSV file.

## 2.2 Geospatial Data of North-West District of Penang Island, Malaysia
The district and postal codes of the **North-West District of Penang Island, Malaysia** are sourced from https://en.wikipedia.org/wiki/Northeast_Penang_Island_District

In [14]:
#To conduct a case study on the north west district of Penang Island

#Selected information of North-West Penang Island Districts and their corresponding postal codes
penang_district_north_west = [[['Georgetown'],[10000]],[['Batu Ferringhi'],[11100]],[['Tanjung Bungah'],[11200]],[['Tanjung Tokong'],[10470]], \
                              [['Pulau Tikus'],[10250, 10350, 10400]],[['Batu Lanchang'],[11600]], \
                              [['Air Itam'],[11500]],[['Paya Terubong'],[11060]],[['Jelutong'],[11600]],[['Gelugor'],[11700]]]

district_list =[]
postal_code_list=[]

for dist in penang_district_north_west:
    for dist_name in dist[0]:
        for post_code in dist[1]:
            district_list.append(dist_name)
            postal_code_list.append(post_code)
     
penang_df=pd.DataFrame({'District':district_list,'Postal Code':postal_code_list}) #To create Dataframe with districts and postal code information

penang_df #Displaying the resulted dataframe

Unnamed: 0,District,Postal Code
0,Georgetown,10000
1,Batu Ferringhi,11100
2,Tanjung Bungah,11200
3,Tanjung Tokong,10470
4,Pulau Tikus,10250
5,Pulau Tikus,10350
6,Pulau Tikus,10400
7,Batu Lanchang,11600
8,Air Itam,11500
9,Paya Terubong,11060


From here, I applied **GeoPy's Geocoder** to extract the coordinates data of the corresponding postal codes and districts, and merge the data to the original dataframe.

In [15]:
geolocator = Nominatim(user_agent="foursquare_agent")

lat_list =[] #To create a list to store latitude values
lng_list=[] #To create a list to store longitude values

#To obtain geospatial data for the corresponding districts and postal codes
for pc, d in zip(penang_df['Postal Code'],penang_df['District']):
   
    location = geolocator.geocode('{},{}'.format(pc,d))
    latitude = location.latitude
    longitude = location.longitude
    lat_list.append(latitude)
    lng_list.append(longitude)

penang_df['Latitude']=lat_list
penang_df['Longitude']=lng_list

penang_df # To display dataframe with geospatial data

Unnamed: 0,District,Postal Code,Latitude,Longitude
0,Georgetown,10000,5.414568,100.329803
1,Batu Ferringhi,11100,5.478218,100.268761
2,Tanjung Bungah,11200,5.462163,100.286995
3,Tanjung Tokong,10470,5.446139,100.305254
4,Pulau Tikus,10250,5.431822,100.311768
5,Pulau Tikus,10350,5.476292,100.29737
6,Pulau Tikus,10400,5.476292,100.29737
7,Batu Lanchang,11600,5.390322,100.306109
8,Air Itam,11500,5.388131,100.278691
9,Paya Terubong,11060,5.371803,100.276162


The compiled dataframe of **North-West District of Penang Island** with the districts and postal codes is then visualised as below:

In [16]:
# create map of North-West District of Penang Island (Georgetown) latitude and longitude values
map_Penang = folium.Map(location=[5.4145, 100.329194], zoom_start=13)

# add markers to map
for lat, lng, district, postalcode in zip(penang_df['Latitude'], penang_df['Longitude'], penang_df['District'], penang_df['Postal Code']):
    label = '{}, {}'.format(postalcode, district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Penang)  
    
map_Penang #To display Map of Penang island with markers of District and Postal Codes

## 2.3 To find the Geospatial Data of Restaurants, Galleries and Museums  (Foursquare API)

The credentials to call for the **Foursquare API** are stated below:

In [35]:
CLIENT_ID = 'BGC0PZQOVF2BA041GXVNCY04NFBDGGZWFBHHF00L13ROBSZ2' # your Foursquare ID
CLIENT_SECRET = 'Y0VLSA0WCJ1XNE52SIXH2ZCBUNOP03GAS3ZXGSUHJUKSOPJB' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 500
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: BGC0PZQOVF2BA041GXVNCY04NFBDGGZWFBHHF00L13ROBSZ2
CLIENT_SECRET:Y0VLSA0WCJ1XNE52SIXH2ZCBUNOP03GAS3ZXGSUHJUKSOPJB


### 2.3.1 To find statistical patterns of restaurants, galleries and museums in the city of Tonto, Ontario

#### 2.3.1.1 Geospatial Data of Restaurants in Toronto, Ontario
By using the **'search' query**, I am able to explore the geospatial data of nearby retaurants, galleries and museums of ""Toronto, Ontario"", with a **default radius of 2000 m**. The defined function to explore nearby spots and obtain the corresponding geospatial data via **Foursquare API** is explained as below:

In [34]:
#Defining a function to search nearby venues as specified via search-query

def getNearbyVenues(names, latitudes, longitudes, radius, search):
    
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
         # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION, search, radius, LIMIT)
            
        # make the GET request
        results_1 = requests.get(url).json()["response"]['venues']
   
        for r,n,la,lo in zip(results_1,names,latitudes,longitudes):
        
              venues_list.append([(n,la,lo,r['name'],r['location']['lat'],r['location']['lng'])])
            
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude']
    
    return(nearby_venues)   

In [36]:
df_toronto_res = getNearbyVenues(names=geo_df['Borough'], latitudes=geo_df['Latitude'], longitudes=geo_df['Longitude'],radius=2000, search='Restaurant') #Call function for restaurants geospatial data

North York
North York
Downtown Toronto
North York
Downtown Toronto
Etobicoke
Scarborough
North York
East York
Downtown Toronto
North York
Etobicoke
Scarborough
North York
East York
Downtown Toronto
York
Etobicoke
Scarborough
East Toronto
Downtown Toronto
York
Scarborough
East York
Downtown Toronto
Downtown Toronto
Scarborough
North York
North York
East York
Downtown Toronto
West Toronto
Scarborough
North York
North York
East York
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
North York
North York
Scarborough
North York
North York
East Toronto
North York
York
North York
Scarborough
North York
North York
Central Toronto
Central Toronto
York
York
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
Etobicoke
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
Mississauga
Etobicoke
Scarborough
Central Toronto
Downtown Toronto
West Toron

In [39]:
print('There are a total of {} registered restaurants throughout Toronto, Ontario.'.format(len(df_toronto_res['Venue'].unique())))
df_toronto_res.head()

There are a total of 1003 registered restaurants throughout Toronto, Ontario.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,North York,43.753259,-79.329656,The Curry & Roti Restaurant,43.742554,-79.308792
1,North York,43.725882,-79.315572,Katsura Japanese Restaurant 桂,43.756259,-79.349571
2,Downtown Toronto,43.65426,-79.360636,Darband Restaurant,43.755194,-79.348498
3,North York,43.718518,-79.464763,Hakka No.1 Restaurant,43.7568,-79.31285
4,Downtown Toronto,43.662301,-79.389494,Valley Fields Family Restaurant,43.741452,-79.319633


Due to the fact that the radius settings for the search query via Foursquare API was set at **2000 m**, overlapping of venues might occur. Hence, it is important to filter duplicated data during the **data analysis section**.

The geospatial data of restaurants is visualized as below:

In [40]:
map_toronto_res = folium.Map(location=[43.651070, -79.347015], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df_toronto_res['Venue Latitude'], df_toronto_res['Venue Longitude'], df_toronto_res['Venue']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_res)  
    
map_toronto_res #Display map

#### 2.3.1.2 Geospatial Data of Galleries in Toronto, Ontario

In [41]:
df_toronto_gal = getNearbyVenues(names=geo_df['Borough'], latitudes=geo_df['Latitude'], longitudes=geo_df['Longitude'],radius=2000, search='Gallery')

North York
North York
Downtown Toronto
North York
Downtown Toronto
Etobicoke
Scarborough
North York
East York
Downtown Toronto
North York
Etobicoke
Scarborough
North York
East York
Downtown Toronto
York
Etobicoke
Scarborough
East Toronto
Downtown Toronto
York
Scarborough
East York
Downtown Toronto
Downtown Toronto
Scarborough
North York
North York
East York
Downtown Toronto
West Toronto
Scarborough
North York
North York
East York
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
North York
North York
Scarborough
North York
North York
East Toronto
North York
York
North York
Scarborough
North York
North York
Central Toronto
Central Toronto
York
York
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
Etobicoke
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
Mississauga
Etobicoke
Scarborough
Central Toronto
Downtown Toronto
West Toron

In [42]:
print('There are a total of {} registered galleries throughout Toronto, Ontario.'.format(len(df_toronto_gal['Venue'].unique())))
df_toronto_gal.head()

There are a total of 411 registered galleries throughout Toronto, Ontario.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,North York,43.753259,-79.329656,Sandra Ainsley Gallery,43.719044,-79.308892
1,North York,43.753259,-79.329656,Danish Connection/Gallery 402,43.65399,-79.36109
2,North York,43.725882,-79.315572,Rouge gallery,43.65907,-79.349182
3,Downtown Toronto,43.65426,-79.360636,Bottē Gallery,43.65411,-79.360976
4,North York,43.718518,-79.464763,Jane Roos Gallery,43.653962,-79.36109


Due to the fact that the radius settings for the search query via Foursquare API was set at **2000 m**, overlapping of venues might occur. Hence, it is important to filter duplicated data during the **data analysis section**.

The geospatial data of galleries is visualized as below:

In [46]:
map_toronto_gal = folium.Map(location=[43.651070, -79.347015], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df_toronto_gal['Venue Latitude'], df_toronto_gal['Venue Longitude'], df_toronto_gal['Venue']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_gal)  
    
map_toronto_gal

#### 2.3.1.3 Geospatial Data of Museums in Toronto, Ontario

In [47]:
df_toronto_meu = getNearbyVenues(names=geo_df['Borough'], latitudes=geo_df['Latitude'], longitudes=geo_df['Longitude'],radius=2000, search='Museum')

North York
North York
Downtown Toronto
North York
Downtown Toronto
Etobicoke
Scarborough
North York
East York
Downtown Toronto
North York
Etobicoke
Scarborough
North York
East York
Downtown Toronto
York
Etobicoke
Scarborough
East Toronto
Downtown Toronto
York
Scarborough
East York
Downtown Toronto
Downtown Toronto
Scarborough
North York
North York
East York
Downtown Toronto
West Toronto
Scarborough
North York
North York
East York
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
North York
North York
Scarborough
North York
North York
East Toronto
North York
York
North York
Scarborough
North York
North York
Central Toronto
Central Toronto
York
York
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
Etobicoke
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
Mississauga
Etobicoke
Scarborough
Central Toronto
Downtown Toronto
West Toron

In [48]:
print('There are a total of {} registered museums throughout Toronto, Ontario.'.format(len(df_toronto_meu['Venue'].unique())))
df_toronto_meu.head()

There are a total of 47 registered museums throughout Toronto, Ontario.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,North York,43.753259,-79.329656,Aga Khan Museum,43.725105,-79.332076
1,North York,43.725882,-79.315572,Islamic Museum Toronto,43.717784,-79.32775
2,North York,43.753259,-79.329656,Museum Of Illusions,43.650219,-79.369451
3,North York,43.725882,-79.315572,Museum Of The End Of The World,43.654061,-79.385452
4,Downtown Toronto,43.65426,-79.360636,Cabbagetown Regent Park Museum,43.667732,-79.35998


Due to the fact that the radius settings for the search query via Foursquare API was set at **2000 m**, overlapping of venues might occur. Hence, it is important to filter duplicated data during the **data analysis section**.

The geospatial data of museums is visualized as below:

In [49]:
map_toronto_meu = folium.Map(location=[43.651070, -79.347015], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df_toronto_meu['Venue Latitude'], df_toronto_meu['Venue Longitude'], df_toronto_meu['Venue']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_meu)  
    
map_toronto_meu

### 2.3.2 To find statistical patterns of restaurants, galleries and museums in the North-West District of Penang, Malaysia

#### 2.3.2.1 Geospatial Data of Restaurants in  North-West District of Penang, Malaysia

In [52]:
df_penang_res = getNearbyVenues(names=penang_df['District'], latitudes=penang_df['Latitude'], longitudes=penang_df['Longitude'],radius=2500, search='Restaurant') #Call function for restaurants geospatial data

Georgetown
Batu Ferringhi
Tanjung Bungah
Tanjung Tokong
Pulau Tikus
Pulau Tikus
Pulau Tikus
Batu Lanchang
Air Itam
Paya Terubong
Jelutong
Gelugor


In [53]:
print('There are a total of {} registered restaurants throughout North-West District of Penang, Malaysia.'.format(len(df_penang_res['Venue'].unique())))
df_penang_res.head()

There are a total of 88 registered restaurants throughout North-West District of Penang, Malaysia.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,Georgetown,5.414568,100.329803,Hameediyah Restaurant,5.418519,100.332556
1,Batu Ferringhi,5.478218,100.268761,Kimberly Restaurant (汕头街权记鸭粥粿汁专卖店) (Restoran K...,5.416286,100.332799
2,Tanjung Bungah,5.462163,100.286995,Kapitan Restaurant,5.41622,100.338714
3,Tanjung Tokong,5.446139,100.305254,Yunus Khan Restaurant (Jiao Sai) 鸟粪,5.417568,100.323369
4,Pulau Tikus,5.431822,100.311768,Guo Guo Le Steamboat Restaurant (锅锅乐),5.414798,100.330769


Due to the fact that the radius settings for the search query via Foursquare API was set at **2500 m**, overlapping of venues might occur. Hence, it is important to filter duplicated data during the **data analysis section**.


The geospatial data of restaurants is visualized as below:

In [56]:
map_penang_res = folium.Map(location=[5.4145, 100.329194], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(df_penang_res['Venue Latitude'], df_penang_res['Venue Longitude'], df_penang_res['Venue']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_penang_res)  
    
map_penang_res #Display map

#### 2.3.2.2 Geospatial Data of Galleries in  North-West District of Penang, Malaysia

In [57]:
df_penang_gal = getNearbyVenues(names=penang_df['District'], latitudes=penang_df['Latitude'], longitudes=penang_df['Longitude'],radius=2500, search='Gallery') #Call function for galleries geospatial data

Georgetown
Batu Ferringhi
Tanjung Bungah
Tanjung Tokong
Pulau Tikus
Pulau Tikus
Pulau Tikus
Batu Lanchang
Air Itam
Paya Terubong
Jelutong
Gelugor


In [58]:
print('There are a total of {} registered galleries throughout North-West District of Penang, Malaysia.'.format(len(df_penang_gal['Venue'].unique())))
df_penang_gal.head()

There are a total of 59 registered galleries throughout North-West District of Penang, Malaysia.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,Georgetown,5.414568,100.329803,Tomnic Gallery,5.412844,100.331163
1,Batu Ferringhi,5.478218,100.268761,Food Gallery (食代广场),5.411952,100.326028
2,Tanjung Bungah,5.462163,100.286995,Tongkat Ali King Gallery,5.414604,100.329768
3,Tanjung Tokong,5.446139,100.305254,World Buddhist Stamps Gallery,5.414942,100.322103
4,Pulau Tikus,5.431822,100.311768,Penang State Museum & Art Gallery,5.421268,100.339404


Due to the fact that the radius settings for the search query via Foursquare API was set at **2500 m**, overlapping of venues might occur. Hence, it is important to filter duplicated data during the **data analysis section**.

The geospatial data of galleries is visualized as below:

In [59]:
map_penang_gal = folium.Map(location=[5.4145, 100.329194], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(df_penang_gal['Venue Latitude'], df_penang_gal['Venue Longitude'], df_penang_gal['Venue']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_penang_gal)  
    
map_penang_gal #Display map

#### 2.3.2.3 Geospatial Data of Museums in  North-West District of Penang, Malaysia

In [60]:
df_penang_meu = getNearbyVenues(names=penang_df['District'], latitudes=penang_df['Latitude'], longitudes=penang_df['Longitude'],radius=2500, search='Museum') #Call function for museums geospatial data

Georgetown
Batu Ferringhi
Tanjung Bungah
Tanjung Tokong
Pulau Tikus
Pulau Tikus
Pulau Tikus
Batu Lanchang
Air Itam
Paya Terubong
Jelutong
Gelugor


In [61]:
print('There are a total of {} registered museums throughout North-West District of Penang, Malaysia.'.format(len(df_penang_meu['Venue'].unique())))
df_penang_meu.head()

There are a total of 26 registered museums throughout North-West District of Penang, Malaysia.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,Georgetown,5.414568,100.329803,Ghost Museum,5.413839,100.334855
1,Batu Ferringhi,5.478218,100.268761,Asia Comic Cultural Museum 亚洲漫画文化馆,5.414315,100.330464
2,Tanjung Bungah,5.462163,100.286995,Penang State Museum (King Edward VII Memorial ...,5.415391,100.325809
3,Tanjung Tokong,5.446139,100.305254,Asia Camera Museum,5.417695,100.32934
4,Pulau Tikus,5.431822,100.311768,Glass Museum Penang,5.417439,100.329198


Due to the fact that the radius settings for the search query via Foursquare API was set at **2500 m**, overlapping of venues might occur. Hence, it is important to filter duplicated data during the **data analysis section**.

The geospatial data of museums is visualized as below:

In [63]:
map_penang_meu = folium.Map(location=[5.4145, 100.329194], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(df_penang_meu['Venue Latitude'], df_penang_meu['Venue Longitude'], df_penang_meu['Venue']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_penang_meu)  
    
map_penang_meu #Display map