# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis and Discussion](#analysis)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The COVID pandemic has raged the whole world from its outbreak in January 2020 until today. Due to its nature of transmission and its deadly symptoms, data has been heavily utilized to trace contacts and confirmed cases of people that had contracted the virus in order to reduce its transmission.


From the previous activities, we have utilized Foursquare venue data to get common venues from neighborhoods and cluster these neighborhoods according to their similarity of frequency of specific venues. For this project, we want to look at how common venues in neighborhoods may have affected COVID cases in these neighborhoods through clustering and correlation of population and confirmed covid cases. 

I have taken information in datasets found in the capital region of the Philippines - the National Capital Region or Metro Manila as locals call the region. As a Filipino, I may be able to insert context into the data and further expound correlations in the data analysis.

The results should be able to inform governing bodies in implementing informed quarantine measures and contact tracing to more vulnerable regions.

## Data <a name="data"></a>

To start, neighborhoods in the Philippines are called Barangays and comprise a smaller unit inside a local government unit (LGU) like a city. We'll be grabbing data on the list of barangays in Metro Manila, the population of each barangay (when data is present) and the current confirmed COVID cases as posted by each LGU in Facebook. We'll also be using data on venues closest to these barangays.

1. List of barangays in Metro Manila - https://en.wikipedia.org/wiki/List_of_barangays_of_Metro_Manila

2. List of confirmed COVID cases found in:
	* Public Information Office of the government unit in charge of City (via Facebook)
	* Facebook pages of the current mayor of the government unit
    
    
3. Venues near each barangay via Foursquare API

We scrape the data off the Wikipedia page via pandas. I have personally compiled all of the data on the numbers of covid cases in each barangay. The data is stored in an excel file via Google Drive.

In [61]:
import pandas as pd
#Webscrape wikipedia page via pandas
manila_dummy = pd.read_html('https://en.wikipedia.org/wiki/List_of_barangays_of_Metro_Manila')
covid = pd.read_excel('https://drive.google.com/uc?id=1Nsw3K6e1GntljYfJ7zaU2hWolJUBoBvw&export=download',
              sheet_name='A_edit_') 
manila = pd.concat([manila_dummy[x] for x in range(1,25)], axis=0).reset_index(drop=True)
manila

We now render the map backdrop where the data will be drawn using Folium. We will also find the respective coordinates of the barangays listed in the Wikipedia page. All barangays that did not return coordinates will be dropped in future analysis. The same method applied in Week 2 of the capstone will be applied here.

In [4]:
from geopy.geocoders import Nominatim
address= 'National Capital Region, Philippines'

geolocator = Nominatim(user_agent="capstone")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of National Capital Region, Philippines are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of National Capital Region, Philippines are 14.5736108, 121.0329706.


In [5]:
latitude_x = [None] * 519
longitude_x = [None] * 519
for index,x in manila.iterrows():
    location1 = geolocator.geocode('{}, {}'.format(x[0],x[3]))
    if location1 is None:
        continue
    latitude_x[index]=location1.latitude
    longitude_x[index]=location1.longitude

In [6]:
final_df = pd.DataFrame({'Neighborhood':manila['Name'],'City/Municipality':manila['City/Municipality'],'Latitude':latitude_x,'Longitude':longitude_x})
final_df.dropna(inplace=True)

In [7]:
import folium 
# create map of Toronto using latitude and longitude values
map_manila= folium.Map(location=[latitude, longitude], zoom_start=10)
final_df1=final_df.dropna()
for index, x in final_df1.iterrows():
    label = '{}, {}'.format(x[1],x[2])
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [x[2], x[3]],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manila) 
  
map_manila


Now that we have the necessary information to scrape venues through the Foursquare API, we now proceed to retrieve nearby venues for each barangay. The same methodology will be applied from the Week 2 of the capstone.

In [8]:
CLIENT_ID = 'IUAZRPBJO2WU52PVY5KNCKTU4FG23TLDGESFU45FH1B2HX0S' # your Foursquare ID
CLIENT_SECRET = 'KDYE21MN0U03DJGF3JHJHSD5QWVH3ZGSRTUB2045NDWWOAD5' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [9]:
radius = 500
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [10]:
manila_venues = getNearbyVenues(names=final_df['Neighborhood'],
                                   latitudes=final_df['Latitude'],
                                   longitudes=final_df['Longitude']
                                  )

Acacia
Addition Hills
Addition Hills
Aguho
Alabang
Alicia
Almanza Uno
Almanza Dos
Amihan
Amparo
Apolonio Samson
Arkong Bato
Aurora
Ayala Alabang (New Alabang)
Baclaran
Baesa
Baesa
Bagbag
Bagbaguin
Bagong Barrio
Bagong Ilog
Bagong Katipunan
Bagong Lipunan ng Crame
Bagong Pag-asa
Bagong Silang
Bagong Silang
Bagong Silangan
Bagumbayan
Bagumbayan
Bagumbayan North
Bagumbayan South
Bagumbong
Bagumbuhay
Bahay Toro
Balangkas
Balingasa
Balintawak
Balic-Balic
Balong Bato
Balong-bato
Balut
Bambang
Bambang
Bangkal
Bangkulasi
Barangka
Barangka Drive
Barangka Ibaba
Barangka Ilaya
Barangka Itaas
Baritan
Barrio San Jose
Batasan Hills
Batis
Bay City
Bayanan
Bayanihan
Bayan-bayanan
Bel-Air
BF Homes Caloocan
BF Homes
BF International Village-CAA
Biglang-Awa
Bignay
Binondo
Bisig
Blue Ridge A
Blue Ridge B
Botocan
Buayang Bato
Buli
Bungad
Burol
Buting
Cabrera
Calumpang
Calzada
Camarin
Camarin-Central
Camarin-Cielito
Camp Aguinaldo
Caniogan
Canumay East
Canumay West
Capri
Carmona
Cartimar
Catmon
Caybiga
Cemb

In [11]:
print(manila_venues.shape)
manila_venues.head()

(8787, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Acacia,14.668471,120.96998,Save More,14.670069,120.970051,Grocery Store
1,Acacia,14.668471,120.96998,Jollibee,14.66831,120.965801,Fast Food Restaurant
2,Acacia,14.668471,120.96998,Dunkin',14.668701,120.966594,Donut Shop
3,Acacia,14.668471,120.96998,Max's Restaurant,14.667699,120.965948,Fried Chicken Joint
4,Acacia,14.668471,120.96998,Chowking,14.666208,120.96665,Chinese Restaurant


Now that we have the backdrop and the necessary data for analysis, we now discuss the methodology in executing the analysis and draw results.

## Methodology <a name="methodology"></a>

From the previous processing, we have gathered the required data: **location of every venue near barangays** within Metro Manila and the **Number of COVID cases** for each barangay. The wikipedia page also provides values on population of each barangay. This will also be used for analysis. However, it is important to note that the data was gathered in 2015 so the **actual percentage ratios of confirmed cases to yet-to-be affected persons will be far** from the results here. For the sake of discussion, we assume that the rate of difference in populations between barangays should be the same for today. The next step would be to process the data to get the frequency of venues in a barangay via one-hot encoding.

After one-hot encoding, we will be utilizing K-means clustering to find commonalities between common venues in barangays and give context towards the covid cases in each barangay. After clustering and encoding, we will be plotting the clusters (via color) and COVID cases/Population (via size) of each barangay and try to correlate the clustering of barangays (based on common venues) and the number of covid cases in that barangay.

After the analysis, we will conclude on how the common venues may have affected the rate of infection in each barangay based on the potential movement of people through these venues.

## Analysis and Discussion<a name="analysis"></a>

We now proceed to one hot encoding and listing of top venues. The methodology will be the same as Week 2 of the capstone

In [12]:
# one hot encoding
manila_onehot = pd.get_dummies(manila_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manila_onehot['_Neighborhood'] = manila_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [manila_onehot.columns[-1]] + list(manila_onehot.columns[:-1])
manila_onehot = manila_onehot[fixed_columns]

manila_onehot.head()

Unnamed: 0,_Neighborhood,Accessories Store,Airport,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,...,Water Park,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yakitori Restaurant,Yoga Studio,Zoo,Zoo Exhibit
0,Acacia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Acacia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Acacia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Acacia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Acacia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [13]:
manila_grouped = toronto_onehot.groupby('_Neighborhood').mean().reset_index()
manila_grouped

Unnamed: 0,_Neighborhood,Accessories Store,Airport,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,...,Water Park,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yakitori Restaurant,Yoga Studio,Zoo,Zoo Exhibit
0,Acacia,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0
1,Addition Hills,0.0,0.0,0.0,0.0,0.031250,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.03125,0.0,0.0
2,Aguho,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0
3,Alabang,0.0,0.0,0.0,0.0,0.011364,0.0,0.011364,0.0,0.0,...,0.0,0.0,0.011364,0.0,0.0,0.011364,0.0,0.00000,0.0,0.0
4,Alicia,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
446,West Rembo,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0
447,West Triangle,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0
448,Western Bicutan,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0
449,White Plains,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0


In [14]:
num_top_venues = 5

for hood in manila_grouped['_Neighborhood']:
    print("----"+hood+"----")
    temp = manila_grouped[manila_grouped['_Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Acacia----
                 venue  freq
0  Fried Chicken Joint  0.17
1        Grocery Store  0.17
2   Chinese Restaurant  0.17
3        Shopping Mall  0.17
4           Donut Shop  0.17


----Addition Hills----
                 venue  freq
0  Japanese Restaurant  0.09
1   Chinese Restaurant  0.09
2                  Bar  0.09
3               Bakery  0.06
4    Convenience Store  0.06


----Aguho----
                  venue  freq
0  Fast Food Restaurant  0.57
1      Asian Restaurant  0.14
2      Basketball Court  0.14
3                 Diner  0.14
4       Other Nightlife  0.00


----Alabang----
                  venue  freq
0           Coffee Shop  0.09
1  Fast Food Restaurant  0.07
2                Bakery  0.06
3           Pizza Place  0.05
4                 Hotel  0.03


----Alicia----
                     venue  freq
0                   Hostel  0.33
1  Health & Beauty Service  0.33
2            Grocery Store  0.33
3            National Park  0.00
4  New American Restaurant  0.00


-

4         Organic Grocery  0.00


----Burol----
                  venue  freq
0  Fast Food Restaurant  0.30
1     Convenience Store  0.17
2                  Café  0.09
3                Bakery  0.04
4            Donut Shop  0.04


----Buting----
                  venue  freq
0           Supermarket  0.12
1            Restaurant  0.12
2                Bakery  0.12
3  Fast Food Restaurant  0.12
4             Pet Store  0.12


----Cabrera----
                  venue  freq
0     Convenience Store  0.50
1  Fast Food Restaurant  0.25
2           Pizza Place  0.25
3       Organic Grocery  0.00
4                  Park  0.00


----Calumpang----
                 venue  freq
0                 Café  0.14
1         Noodle House  0.10
2  Filipino Restaurant  0.10
3         Burger Joint  0.10
4    Convenience Store  0.05


----Calzada----
                     venue  freq
0        Convenience Store  0.25
1  Comfort Food Restaurant  0.25
2           Ice Cream Shop  0.25
3                BBQ Joint  0.25


4                Hostel  0.05


----Elias Aldana----
           venue  freq
0     Food Truck  0.17
1         Bakery  0.17
2  Grocery Store  0.17
3          Plaza  0.17
4            Spa  0.17


----Ermita----
                venue  freq
0               Hotel  0.10
1         Coffee Shop  0.07
2  Chinese Restaurant  0.06
3                 Bar  0.06
4         Pizza Place  0.05


----Ermitaño----
                venue  freq
0  Chinese Restaurant  0.14
1            Boutique  0.09
2       Garden Center  0.05
3      Massage Studio  0.05
4   Convenience Store  0.05


----Escopa I----
          venue  freq
0  Dessert Shop  0.19
1     BBQ Joint  0.12
2   Coffee Shop  0.12
3           Bar  0.06
4   Pizza Place  0.06


----Escopa II----
          venue  freq
0  Dessert Shop  0.14
1     BBQ Joint  0.14
2   Coffee Shop  0.14
3           Bar  0.07
4  Burger Joint  0.07


----Escopa III----
            venue  freq
0     Music Store  0.12
1      Laundromat  0.12
2           Plaza  0.12
3  Shipping Store

                  venue  freq
0     Convenience Store   0.4
1  Fast Food Restaurant   0.2
2    Seafood Restaurant   0.2
3          Dessert Shop   0.2
4            Non-Profit   0.0


----La Paz----
                 venue  freq
0  Filipino Restaurant  0.10
1    Convenience Store  0.07
2                Hotel  0.07
3        Shopping Mall  0.03
4                  Bar  0.03


----Laging Handa----
                 venue  freq
0  Filipino Restaurant  0.06
1            BBQ Joint  0.06
2               Bakery  0.05
3   Seafood Restaurant  0.05
4   Chinese Restaurant  0.04


----Lawang Bato----
               venue  freq
0  Convenience Store   1.0
1  Accessories Store   0.0
2    Organic Grocery   0.0
3        Pastry Shop   0.0
4               Park   0.0


----Layug----
                  venue  freq
0  Fast Food Restaurant  0.16
1   Japanese Restaurant  0.09
2                 Hotel  0.09
3                  Park  0.07
4               Theater  0.05


----Leveriza----
                           venue 

                           venue  freq
0                           Café  0.11
1            Japanese Restaurant  0.08
2  Vegetarian / Vegan Restaurant  0.05
3            Filipino Restaurant  0.05
4             Chinese Restaurant  0.05


----Merville----
               venue  freq
0               Park  0.50
1  Convenience Store  0.25
2               Café  0.25
3        Record Shop  0.00
4        Pastry Shop  0.00


----Milagrosa----
                 venue  freq
0         Dessert Shop  0.11
1            BBQ Joint  0.11
2  Filipino Restaurant  0.08
3                  Bar  0.08
4           Restaurant  0.05


----Monumento----
                  venue  freq
0  Fast Food Restaurant  0.28
1    Chinese Restaurant  0.19
2           Pizza Place  0.06
3             Bookstore  0.06
4           Supermarket  0.06


----Moonwalk----
                  venue  freq
0     Convenience Store  0.23
1           Snack Place  0.15
2      Asian Restaurant  0.08
3  Gym / Fitness Center  0.08
4  Fast Food Restauran

                        venue  freq
0        Fast Food Restaurant  0.25
1           Convenience Store  0.12
2                Burger Joint  0.12
3               Grocery Store  0.12
4  Tourist Information Center  0.12


----Pasolo----
                venue  freq
0          Food Stand   0.2
1    Tapas Restaurant   0.2
2        Food Service   0.2
3  Basketball Stadium   0.2
4          Donut Shop   0.2


----Pasong Putik----
                  venue  freq
0                  Café  0.11
1  Fast Food Restaurant  0.08
2            Donut Shop  0.08
3           Coffee Shop  0.08
4  Gym / Fitness Center  0.05


----Pasong Tamo----
               venue  freq
0                Spa  0.25
1       Burger Joint  0.25
2          Pet Store  0.25
3  Convenience Store  0.25
4          Nightclub  0.00


----Payatas----
              venue  freq
0               Gym   0.2
1            Bakery   0.2
2         Pet Store   0.2
3        Club House   0.2
4  Asian Restaurant   0.2


----Pedro Cruz----
                 

                  venue  freq
0     Convenience Store  0.13
1   Filipino Restaurant  0.11
2                  Park  0.08
3                  Pool  0.08
4  Fast Food Restaurant  0.05


----San Nicolas----
                  venue  freq
0     Convenience Store  0.18
1  Fast Food Restaurant  0.15
2           Pizza Place  0.12
3    Chinese Restaurant  0.12
4              Pharmacy  0.06


----San Pedro----
                  venue  freq
0  Fast Food Restaurant  0.18
1              Pharmacy  0.18
2     Convenience Store  0.09
3      Tapas Restaurant  0.05
4      Basketball Court  0.05


----San Perfecto----
                  venue  freq
0     Convenience Store  0.21
1   Filipino Restaurant  0.14
2  Fast Food Restaurant  0.14
3            Donut Shop  0.07
4           Supermarket  0.07


----San Rafael----
              venue  freq
0    Breakfast Spot  0.17
1   Badminton Court  0.17
2       Art Gallery  0.17
3        Food Court  0.17
4  Asian Restaurant  0.17


----San Rafael Village----
         

                venue  freq
0              Market  0.17
1          Donut Shop  0.17
2                Pool  0.17
3  Seafood Restaurant  0.17
4           Rest Area  0.17


----Tanza 2----
                  venue  freq
0         Grocery Store  0.25
1  Fast Food Restaurant  0.25
2          Burger Joint  0.25
3                  Café  0.25
4     Accessories Store  0.00


----Tatalon----
                venue  freq
0         Flower Shop  0.12
1   Convenience Store  0.12
2  Chinese Restaurant  0.12
3       Bowling Alley  0.12
4              Church  0.12


----Tañong----
                  venue  freq
0  Fast Food Restaurant  0.10
1              Tea Room  0.10
2    Chinese Restaurant  0.07
3   American Restaurant  0.07
4       Bubble Tea Shop  0.07


----Teacher's Village East----
                 venue  freq
0  Filipino Restaurant  0.12
1                Diner  0.10
2                 Café  0.07
3   Chinese Restaurant  0.05
4             Tea Room  0.05


----Teacher's Village West----
           

In [15]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [16]:
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['_Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['_Neighborhood'] = manila_grouped['_Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manila_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,_Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Acacia,Grocery Store,Fried Chicken Joint,Shopping Mall,Chinese Restaurant,Donut Shop,Fast Food Restaurant,Flea Market,Empanada Restaurant,Dumpling Restaurant,Duty-free Shop
1,Addition Hills,Bar,Japanese Restaurant,Chinese Restaurant,Bakery,Convenience Store,Bubble Tea Shop,Vegetarian / Vegan Restaurant,Yoga Studio,American Restaurant,Mediterranean Restaurant
2,Aguho,Fast Food Restaurant,Asian Restaurant,Basketball Court,Diner,Zoo Exhibit,Electronics Store,Empanada Restaurant,Event Space,Exhibit,Farmers Market
3,Alabang,Coffee Shop,Fast Food Restaurant,Bakery,Pizza Place,Hotel,Bubble Tea Shop,Restaurant,Convenience Store,Clothing Store,French Restaurant
4,Alicia,Grocery Store,Health & Beauty Service,Hostel,Farmers Market,Drugstore,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant


We now proceed to cluster the barangays from their common venues using K-means clustering

In [17]:
from sklearn.cluster import KMeans
kclusters = 5

manila_grouped_clustering = manila_grouped.drop('_Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manila_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 4, 2, 4, 4, 4, 4, 4, 4, 4])

In [20]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manila_merged = final_df


manila1 = manila[['Name','Population(2015)[2]']].dropna()
manila1.rename(columns={'Population(2015)[2]':'Population'},inplace=True)

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
manila_merged = manila_merged.join(neighborhoods_venues_sorted.set_index('_Neighborhood'), on='Neighborhood')
manila_merged = manila_merged.merge(manila1.set_index('Name'), left_on='Neighborhood',right_on='Name',how='left')

manila_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,City/Municipality,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
0,Acacia,Malabon,14.668471,120.96998,2.0,Grocery Store,Fried Chicken Joint,Shopping Mall,Chinese Restaurant,Donut Shop,Fast Food Restaurant,Flea Market,Empanada Restaurant,Dumpling Restaurant,Duty-free Shop,5127.0
1,Addition Hills,Mandaluyong,14.58464,121.036301,4.0,Bar,Japanese Restaurant,Chinese Restaurant,Bakery,Convenience Store,Bubble Tea Shop,Vegetarian / Vegan Restaurant,Yoga Studio,American Restaurant,Mediterranean Restaurant,99058.0
2,Addition Hills,Mandaluyong,14.58464,121.036301,4.0,Bar,Japanese Restaurant,Chinese Restaurant,Bakery,Convenience Store,Bubble Tea Shop,Vegetarian / Vegan Restaurant,Yoga Studio,American Restaurant,Mediterranean Restaurant,3739.0
3,Addition Hills,San Juan,14.594252,121.040723,4.0,Bar,Japanese Restaurant,Chinese Restaurant,Bakery,Convenience Store,Bubble Tea Shop,Vegetarian / Vegan Restaurant,Yoga Studio,American Restaurant,Mediterranean Restaurant,99058.0
4,Addition Hills,San Juan,14.594252,121.040723,4.0,Bar,Japanese Restaurant,Chinese Restaurant,Bakery,Convenience Store,Bubble Tea Shop,Vegetarian / Vegan Restaurant,Yoga Studio,American Restaurant,Mediterranean Restaurant,3739.0


Now that we have the clusters for the barangays. We now proceed to plot these data using Folium.

In [21]:
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


manila_merged1 = manila_merged.dropna()


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, size in zip(manila_merged1['Latitude'], manila_merged1['Longitude'], manila_merged1['Neighborhood'], manila_merged1['Cluster Labels'].astype(int),manila_merged1['Population']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=size/10000,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters

From the plot, we can see three major clusters (Blue, Orange and Red). Orange (Cluster 4.0) is the cluster that contained the most barangays. Blue (Cluster 2.0) comes second and red (Cluster 0.0) comes third. From the summary of mean values (below), we can see that the blue cluster gets a higher average population. Orange cluster receives the second highest mean population and red cluster receives the third highest mean population.

Further analysis will be described by looking at the common venues for the clusters

In [22]:
manila_merged.loc[manila_merged['Cluster Labels'] == 0, manila_merged.columns[[1] + list(range(5, manila_merged.shape[1]))]]

Unnamed: 0,City/Municipality,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
14,Quezon City,Convenience Store,Filipino Restaurant,Basketball Court,Chinese Restaurant,Dessert Shop,Burger Joint,Café,Shop & Service,Donut Shop,Grocery Store,5636.0
19,Quezon City,Convenience Store,Fast Food Restaurant,Pharmacy,Comfort Food Restaurant,Korean Restaurant,Food Truck,Fruit & Vegetable Store,Soup Place,Bar,Bakery,56936.0
23,Pasig,Convenience Store,Pizza Place,Fast Food Restaurant,Café,Chinese Restaurant,Asian Restaurant,Bubble Tea Shop,Spa,Noodle House,Grocery Store,1231.0
24,Quezon City,Convenience Store,Jazz Club,Fast Food Restaurant,Gun Range,Gym / Fitness Center,Chinese Restaurant,Badminton Court,Asian Restaurant,Sports Bar,Department Store,14996.0
26,Caloocan,Convenience Store,Fast Food Restaurant,Market,Chinese Restaurant,Filipino Restaurant,Food Court,Bookstore,Salon / Barbershop,Spa,Bar,5572.0
...,...,...,...,...,...,...,...,...,...,...,...,...
575,Quezon City,Fast Food Restaurant,Convenience Store,Coffee Shop,Clothing Store,Bakery,Video Game Store,Shopping Mall,Dessert Shop,Dumpling Restaurant,Department Store,7267.0
579,Taguig,Basketball Court,Park,Vegetarian / Vegan Restaurant,Filipino Restaurant,Convenience Store,Food Court,Farmers Market,Eastern European Restaurant,Electronics Store,Empanada Restaurant,41216.0
584,Makati,Convenience Store,Café,Filipino Restaurant,Asian Restaurant,Italian Restaurant,Fast Food Restaurant,Gym / Fitness Center,Diner,Bar,Malay Restaurant,6310.0
587,Pasay,Convenience Store,Fast Food Restaurant,Gym / Fitness Center,Intersection,Arts & Entertainment,Pharmacy,Zoo Exhibit,Duty-free Shop,Eastern European Restaurant,Electronics Store,


In [24]:
manila_merged.loc[manila_merged['Cluster Labels'] == 1, manila_merged.columns[[1] + list(range(5, manila_merged.shape[1]))]]

Unnamed: 0,City/Municipality,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
20,Valenzuela,Convenience Store,Motorcycle Shop,Bike Rental / Bike Share,Zoo Exhibit,Filipino Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,Farmers Market,13770.0
42,Quezon City,Market,Convenience Store,Field,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,Farmers Market,8228.0
77,Quezon City,Convenience Store,Furniture / Home Store,Deli / Bodega,Health Food Store,Zoo Exhibit,Field,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,8057.0
80,Pasay,Convenience Store,Pizza Place,Fast Food Restaurant,Zoo Exhibit,Field,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,
94,Caloocan,Convenience Store,Zoo Exhibit,Field,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,Farmers Market,
123,Caloocan,Convenience Store,Miscellaneous Shop,Fast Food Restaurant,Dim Sum Restaurant,Zoo Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,
124,Caloocan,Convenience Store,Miscellaneous Shop,Fast Food Restaurant,Dim Sum Restaurant,Zoo Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,
194,Parañaque,Convenience Store,Fast Food Restaurant,Seafood Restaurant,Dessert Shop,Zoo Exhibit,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,9569.0
197,Valenzuela,Convenience Store,Zoo Exhibit,Field,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,Farmers Market,19301.0
204,Taguig,Convenience Store,Gym,Zoo Exhibit,Field,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,9700.0


In [25]:
manila_merged.loc[manila_merged['Cluster Labels'] == 2, manila_merged.columns[[1] + list(range(5, manila_merged.shape[1]))]]

Unnamed: 0,City/Municipality,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
0,Malabon,Grocery Store,Fried Chicken Joint,Shopping Mall,Chinese Restaurant,Donut Shop,Fast Food Restaurant,Flea Market,Empanada Restaurant,Dumpling Restaurant,Duty-free Shop,5127.0
5,Pateros,Fast Food Restaurant,Asian Restaurant,Basketball Court,Diner,Zoo Exhibit,Electronics Store,Empanada Restaurant,Event Space,Exhibit,Farmers Market,
16,Parañaque,Fast Food Restaurant,Convenience Store,Seafood Restaurant,Fried Chicken Joint,Park,Zoo Exhibit,Exhibit,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,28385.0
17,Caloocan,Fast Food Restaurant,Convenience Store,Gym,Shopping Mall,Automotive Shop,Pharmacy,Zoo Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,61278.0
18,Quezon City,Fast Food Restaurant,Convenience Store,Gym,Shopping Mall,Automotive Shop,Pharmacy,Zoo Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,61278.0
...,...,...,...,...,...,...,...,...,...,...,...,...
566,Taguig,Fast Food Restaurant,Shopping Mall,American Restaurant,Supermarket,Restaurant,Bookstore,Cantonese Restaurant,Donut Shop,Automotive Shop,Coffee Shop,10730.0
569,City of Manila,Fast Food Restaurant,Shopping Mall,Chinese Restaurant,Asian Restaurant,Burger Joint,Filipino Restaurant,Massage Studio,Grocery Store,Trail,Train Station,
582,Taguig,Fast Food Restaurant,Convenience Store,Automotive Shop,Bistro,Coffee Shop,Filipino Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,54186.0
599,Makati,Fast Food Restaurant,Gym,Convenience Store,Hostel,Zoo Exhibit,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,29899.0


In [26]:
manila_merged.loc[manila_merged['Cluster Labels'] == 3, manila_merged.columns[[1] + list(range(5, manila_merged.shape[1]))]]

Unnamed: 0,City/Municipality,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
88,Valenzuela,Pool,Zoo Exhibit,Fast Food Restaurant,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,12462.0
106,Caloocan,Pool,Tea Room,Health & Beauty Service,Smoke Shop,Coffee Shop,Restaurant,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,
118,Malabon,Pool,Resort,Snack Place,Filipino Restaurant,Zoo Exhibit,Farmers Market,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,12124.0
161,Taguig,Water Park,Pool,Basketball Court,Zoo Exhibit,Field,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,18652.0
176,Valenzuela,Pool,Tea Room,Zoo Exhibit,Field,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,4793.0
180,Pasay,Bakery,Airport Terminal,Pool,Spa,Field,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,
251,Malabon,Concert Hall,Discount Store,Pool,Fast Food Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,11262.0
267,Taguig,Filipino Restaurant,Field,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,Exhibit,Farmers Market,Fast Food Restaurant,49829.0
289,Valenzuela,Donut Shop,Pool,Smoke Shop,Tea Room,Filipino Restaurant,Farmers Market,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,6089.0
340,Valenzuela,Breakfast Spot,Pool,Resort,Zoo Exhibit,Field,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Event Space,20930.0


In [27]:
manila_merged.loc[manila_merged['Cluster Labels'] == 4, manila_merged.columns[[1] + list(range(5, manila_merged.shape[1]))]]

Unnamed: 0,City/Municipality,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population
1,Mandaluyong,Bar,Japanese Restaurant,Chinese Restaurant,Bakery,Convenience Store,Bubble Tea Shop,Vegetarian / Vegan Restaurant,Yoga Studio,American Restaurant,Mediterranean Restaurant,99058.0
2,Mandaluyong,Bar,Japanese Restaurant,Chinese Restaurant,Bakery,Convenience Store,Bubble Tea Shop,Vegetarian / Vegan Restaurant,Yoga Studio,American Restaurant,Mediterranean Restaurant,3739.0
3,San Juan,Bar,Japanese Restaurant,Chinese Restaurant,Bakery,Convenience Store,Bubble Tea Shop,Vegetarian / Vegan Restaurant,Yoga Studio,American Restaurant,Mediterranean Restaurant,99058.0
4,San Juan,Bar,Japanese Restaurant,Chinese Restaurant,Bakery,Convenience Store,Bubble Tea Shop,Vegetarian / Vegan Restaurant,Yoga Studio,American Restaurant,Mediterranean Restaurant,3739.0
6,Muntinlupa,Coffee Shop,Fast Food Restaurant,Bakery,Pizza Place,Hotel,Bubble Tea Shop,Restaurant,Convenience Store,Clothing Store,French Restaurant,63793.0
...,...,...,...,...,...,...,...,...,...,...,...,...
593,City of Manila,Park,Plaza,Chinese Restaurant,Café,Snack Place,Exhibit,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,
595,Taguig,Flea Market,Sandwich Place,Burger Joint,Diner,Zoo Exhibit,Farmers Market,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,11177.0
597,San Juan,Multiplex,Shopping Mall,Movie Theater,Coffee Shop,Snack Place,Frozen Yogurt Shop,Bookstore,Mediterranean Restaurant,Japanese Restaurant,Market,16773.0
600,Quezon City,Coffee Shop,Café,Spa,Bar,Chinese Restaurant,Dessert Shop,Soccer Field,Fast Food Restaurant,Beer Garden,Tea Room,4199.0


Looking at the top 3 clusters, we can observe a trend in population vs common venues. 

The orange cluster (cluster 4.0) shows a variety of restaurants as its common venues. We also have bars, bakeries and coffee shops. This cluster happen to contain barangays that are comprised of residential areas. Due to the nature of roadbuilding in Metro Manila, most of these barangays will not be travelled by a person unless he/she lives and/or visits someone in that area. 

The blue cluster (cluster 2.0) shows a spread of common venues. We can see fast food restaurants, electronics stores, gyms, pharmacies, and some niche areas like zoo exhibits. The barangays in the blue cluster happens to be areas that are frequented or commonly visited by people outside that barangay. Most of the venues there have the cheapest rates in some commodities or the area happens to house multiple commercial establishments in a common area inside the barangay.

The red cluster (cluster 0.0) shows a larger spread of common venues compared to the blue cluster. However, one thing in common in these barangays is the frequency of convenience stores in these areas. Most of the barangays in these clusters happen to be very far to the city center to frequent the nearest mall or happen to contain transportation routes that are commonly frequented by people that wish to travel to another place.

We now proceed in plotting the COVID cases through Folium.

In [43]:
manila_merged2 = manila_merged.merge(covid.set_index('Name'), left_on='Neighborhood',right_on='Name',how='left')

In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


toronto_merged2 = toronto_merged2.dropna()


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, size in zip(manila_merged2['Latitude'], manila_merged2['Longitude'], manila_merged2['Neighborhood'], manila_merged2['Cluster Labels'].astype(int),manila_merged2['COVID Cases']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=size/500,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters

From the plot, we can see that the barangays with higher covid cases are situated near the city center. It also happens that these barangays are also densely populated areas as well. We observe a large distribution of total covid cases in the orange cluster. There is a disparity in total covid cases in the blue cluster. For the red cluster, most of the barangays have a sharp distribution (or small deviation) in covid cases.


We now look at summary of averages below. Looking at the top three clusters (0.0/Red, 2.0/Blue, 4.0/Orange), we can see the respective covid cases and percentage affected. We can see that the orange cluster has the highest percentage of infected, with blue having the second highest and red having the third highest. Looking at the data regarding common venues in the orange cluster, we can correlate how these common venues (restaurants) may have been agents of transmission for locals. 

Taking into account the quarantine measures applied during the pandemic, we should expect that transportation routes should be less frequented compared to pre-pandemic times. As the orange cluster contains residential areas, we should expect high covid cases in those barangays. Comparing the small number of covid cases in the orange and red clusters, we can see that barangays in the red clusters has a higher tally of cases. 

While transporation routes are less frequented, the transportation issue of Metro Manila may have affected transmission in these areas. While blue areas are commonly frequented by people from other barangays, the controlled movement of population may have contributed to the smaller covid cases of barangays in the blue cluster compared to barangays in orange clusters. However, it is still important to note that the common venues in blue clusters are fast food restaurants and shopping malls. These areas may have been agents of transmission and should be recommended to be avoided as much as possible.

In [44]:
manila_merged2.groupby('Cluster Labels').mean()

Unnamed: 0_level_0,Latitude,Longitude,Population,COVID Cases,% Population
Cluster Labels,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0.0,14.482712,121.127002,24065.733945,808.17,4.075252
1.0,14.632559,121.01015,21708.764706,672.7,3.806586
2.0,14.602364,121.019734,35912.929577,1001.046875,4.092451
3.0,14.665729,121.006285,20147.666667,890.6,3.395679
4.0,14.868267,119.535094,24860.176656,1044.470588,4.234115


## Conclusion <a name="conclusion"></a>

The objective of this study is to determine a correlation between common venues in a neighborhood and the total covid cases in the neighborhood. From context and data averages, we observe a correlation in the two data. While stakeholders may now have an informed decision in further restrictions, more data may be utilized such as the rate of change of cases and travel routes of workers to complement the analysis and further fortify these claims. Nevertheless, we were able to find clusters and/or neighborhoods that may be high-risk such as barangays in Red and Blue clusters.