## Coursera Capstone Project 
# The Battle of Neighborhoods
In this project, I compare two Southeast Asian metropolitan cities, Jakarta and Bangkok. The analysis focuses only on the central district of each city.

In [102]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json
import requests 
from pandas import json_normalize # the pandas.io.json import path is deprecated

#!pip install lxml
from lxml import etree

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# install the geocoder package and import the library

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
#!pip install geocoder
import geocoder

print('Libraries imported.')

Libraries imported.


### Central Jakarta 

This section shows how the neighborhoods of Central Jakarta are extracted into a dataframe, together with their geospatial data.

Data for the boroughs and neighborhoods of Central Jakarta (Indonesian: Jakarta Pusat) are loaded from an Excel list covering all boroughs in Indonesia. The cell below loads the list and keeps only the rows located in Jakarta Pusat.

In [103]:
df_        = pd.read_excel('Data Kodepos Indonesia.xlsx',usecols = 'A:C,E:F')
df_Jkt     = df_[df_['Kabupaten'] == 'Jakarta Pusat'].reset_index(drop=True)  
print(df_Jkt.shape)
df_Jkt.head()

(44, 5)


Unnamed: 0,No Kode Pos,Kelurahan,Kecamatan,Kabupaten,Propinsi
0,10110,Gambir,Gambir,Jakarta Pusat,DKI Jakarta
1,10120,Kebon Kelapa,Gambir,Jakarta Pusat,DKI Jakarta
2,10130,Petojo Utara,Gambir,Jakarta Pusat,DKI Jakarta
3,10140,Duri Pulo,Gambir,Jakarta Pusat,DKI Jakarta
4,10150,Cideng,Gambir,Jakarta Pusat,DKI Jakarta


Standardize the column headers as shown below and drop the unneeded column from the dataframe.

In [87]:
# drop the province column; the keyword form works across pandas versions
df_Jkt = df_Jkt.drop(columns='Propinsi')
df_Jkt.rename(columns = {'No Kode Pos':'Postal Code', 'Kelurahan':'Neighborhood', 
                              'Kecamatan':'Borough', 'Kabupaten': 'City'}, inplace = True) 
print('df size =',df_Jkt.shape)
df_Jkt.head()

df size = (44, 4)


Unnamed: 0,Postal Code,Neighborhood,Borough,City
0,10110,Gambir,Gambir,Jakarta Pusat
1,10120,Kebon Kelapa,Gambir,Jakarta Pusat
2,10130,Petojo Utara,Gambir,Jakarta Pusat
3,10140,Duri Pulo,Gambir,Jakarta Pusat
4,10150,Cideng,Gambir,Jakarta Pusat


The geospatial data for each neighborhood in Central Jakarta are added in the cell below.

In [88]:
latitude=[]
longitude=[]
for code in df_Jkt['Postal Code']:
    g = geocoder.arcgis('{}, Jakarta Pusat, DKI Jakarta'.format(code))
    
    latlng = g.latlng
    latitude.append(latlng[0])
    longitude.append(latlng[1])

df_Jkt['Latitude']  = latitude
df_Jkt['Longitude'] = longitude

df_Jkt.head(20)

Unnamed: 0,Postal Code,Neighborhood,Borough,City,Latitude,Longitude
0,10110,Gambir,Gambir,Jakarta Pusat,-6.176395,106.826449
1,10120,Kebon Kelapa,Gambir,Jakarta Pusat,-6.163759,106.824444
2,10130,Petojo Utara,Gambir,Jakarta Pusat,-6.165436,106.815426
3,10140,Duri Pulo,Gambir,Jakarta Pusat,-6.16344,106.804257
4,10150,Cideng,Gambir,Jakarta Pusat,-6.172431,106.808155
5,10160,Petojo Selatan,Gambir,Jakarta Pusat,-6.174288,106.815949
6,10210,Bendungan Hilir,Tanah Abang,Jakarta Pusat,-6.209489,106.809016
7,10220,Karet Tengsin,Tanah Abang,Jakarta Pusat,-6.207506,106.817861
8,10230,Kebon Melati,Tanah Abang,Jakarta Pusat,-6.197152,106.816753
9,10240,Kebon Kacang,Tanah Abang,Jakarta Pusat,-6.190494,106.817182
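
The geocoding loop above assumes every lookup succeeds. When the geocoder finds nothing, `g.latlng` is `None`, and indexing it raises a `TypeError`. The following is a minimal sketch of a more defensive loop. The helper name `collect_coordinates` and the `lookup` callable are hypothetical and introduced only for illustration; a plain dictionary stands in for the real geocoder so the sketch runs offline.

```python
def collect_coordinates(postal_codes, lookup):
    """Return parallel latitude/longitude lists, with None where a lookup fails.

    `lookup` is any callable that takes a postal code and returns a
    (lat, lng) pair or None (hypothetical interface for this sketch).
    """
    latitudes, longitudes = [], []
    for code in postal_codes:
        latlng = lookup(code)
        if latlng is None:
            # keep the lists aligned with the dataframe rows even on a miss
            latitudes.append(None)
            longitudes.append(None)
        else:
            latitudes.append(latlng[0])
            longitudes.append(latlng[1])
    return latitudes, longitudes


# stand-in for the real geocoder: one known postal code, one unknown
known = {10110: (-6.176395, 106.826449)}
lats, lngs = collect_coordinates([10110, 99999], known.get)
print(lats, lngs)
```

In the notebook, `lookup` could be `lambda code: geocoder.arcgis('{}, Jakarta Pusat, DKI Jakarta'.format(code)).latlng`, and rows with missing coordinates could then be inspected or dropped before mapping.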


Use geopy to get the latitude and longitude of Central Jakarta, DKI Jakarta

In [89]:
address = 'Jakarta Pusat, DKI Jakarta'

geolocator = Nominatim(user_agent="jkpus_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Central Jakarta are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Central Jakarta are -6.1753942, 106.827183.


Show map of Central Jakarta with neighborhoods superimposed on top

In [90]:
import folium 

# create map of Central Jakarta using latitude and longitude values
map_jakpus = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_Jkt['Latitude'], df_Jkt['Longitude'], df_Jkt['Borough'], df_Jkt['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_jakpus)  
    
map_jakpus

### Explore Central Jakarta

Exploring the neighborhoods of Central Jakarta requires Foursquare credentials.

In [91]:
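import os

# read the credentials from the environment instead of hard-coding them
# (FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are placeholder variable names)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret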
VERSION = '20180605' # Foursquare API version

Using the coordinates obtained above, each neighborhood of Central Jakarta is explored within a radius of 500 m.

In [98]:
# function that queries Foursquare for the venues near each neighborhood in Central Jakarta

def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    # limit: maximum number of venues per request; 100 is an assumed default,
    # because LIMIT is not defined in the cells shown here
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
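
The request URL in `getNearbyVenues` is assembled with string formatting. As an alternative, here is a small sketch that builds the same query string with the standard library, which also takes care of URL encoding. The function name `build_explore_url`, the `limit=100` default, and the placeholder credentials `MY_ID` / `MY_SECRET` are assumptions made for illustration; only the endpoint and parameter names come from the function above.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Assemble the venues/explore request URL for one coordinate pair."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude as one value
        'radius': radius,                # search radius in metres
        'limit': limit,                  # assumed cap on venues per request
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# placeholder credentials; coordinates of Gambir from the table above
url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', -6.176395, 106.826449)
print(url)
```

Note that `urlencode` percent-encodes the comma inside `ll`, which is equivalent to the literal comma used in the formatted string.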

In [100]:
# run the above function on each neighborhood and create a new dataframe called jkpus_venues

jkpus_venues = getNearbyVenues(names=df_Jkt['Neighborhood'],
                                   latitudes=df_Jkt['Latitude'],
                                   longitudes=df_Jkt['Longitude']
                                  )

Gambir
Kebon Kelapa
Petojo Utara
Duri Pulo
Cideng
Petojo Selatan
Bendungan Hilir
Karet Tengsin
Kebon Melati
Kebon Kacang
Kampung Bali
Petamburan
Gelora
Menteng
Pegangsaan
Cikini
Kebon Sirih
Gondangdia
Senen
Kwitang
Kenari
Paseban
Kramat
Bungur
Cempaka Putih Timur
Cempaka Putih Barat
Galur
Tanah Tinggi
Kampung Rawa
Johar Baru
Rawasari
Gunung Sahari Selatan
Kemayoran
Kebon Kosong
Cempaka Baru
Harapan Mulya
Sumur Batu
Serdang
Utan Panjang
Pasar Baru
Gunung Sahari Utara
Mangga Dua Selatan
Karang Anyar
Kartini


In [101]:
# check the size of resulting data
print(jkpus_venues.shape)
jkpus_venues.head()

(910, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Gambir,-6.176395,106.826449,National Monument Park,-6.175331,106.827047,Park
1,Gambir,-6.176395,106.826449,Museum Nasional Indonesia,-6.176206,106.822854,Museum
2,Gambir,-6.176395,106.826449,Lapangan Silang Monas,-6.177649,106.826851,Plaza
3,Gambir,-6.176395,106.826449,Jogging Track MONAS,-6.175537,106.827134,Track
4,Gambir,-6.176395,106.826449,Starbucks,-6.177147,106.830818,Coffee Shop


Get the category type

In [94]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Clean the json and structure it into a pandas dataframe.

In [95]:
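# `results` must hold the raw JSON of a single explore request; it is not assigned
# at notebook level in the cells shown (the output below matches the Gambir rows above)
venues = results['response']['groups'][0]['items']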
    
jkpus_nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
jkpus_nearby_venues = jkpus_nearby_venues.loc[:, filtered_columns]

# filter the category for each row
jkpus_nearby_venues['venue.categories'] = jkpus_nearby_venues.apply(get_category_type, axis=1)

# clean columns
jkpus_nearby_venues.columns = [col.split(".")[-1] for col in jkpus_nearby_venues.columns]

jkpus_nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,National Monument Park,Park,-6.175331,106.827047
1,Museum Nasional Indonesia,Museum,-6.176206,106.822854
2,Lapangan Silang Monas,Plaza,-6.177649,106.826851
3,Jogging Track MONAS,Track,-6.175537,106.827134
4,Starbucks,Coffee Shop,-6.177147,106.830818
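
To make the flattening step more concrete, the sketch below mimics how nested keys turn into dotted column names such as `venue.location.lat`. It is a simplified, dependency-free illustration and not the pandas implementation. The `flatten` helper is hypothetical, and the `item` dictionary is hand-made: its shape follows the field accesses used in `getNearbyVenues`, and its values are taken from the first row of the table above.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into one level with dotted keys; lists are left as-is."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # descend into nested dictionaries and prefix their keys
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# hand-made item shaped like one entry of the 'items' list in the response
item = {
    'venue': {
        'name': 'National Monument Park',
        'categories': [{'name': 'Park'}],
        'location': {'lat': -6.175331, 'lng': 106.827047},
    }
}

row = flatten(item)
print(sorted(row))
```

The `venue.categories` entry stays a list of dictionaries, which is why `get_category_type` is still needed to pull out the first category name.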


In [97]:
print('{} venues were returned by Foursquare.'.format(jkpus_nearby_venues.shape[0]))

23 venues were returned by Foursquare.


Analyze each neighborhood in Central Jakarta

In [None]:
# one hot encoding
jkpus_onehot = pd.get_dummies(jkpus_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
jkpus_onehot['Neighborhood'] = jkpus_venues['Neighborhood'] 

# move neighborhood column to the first column (by name, so no hard-coded position is needed)
fixed_columns = ['Neighborhood'] + [col for col in jkpus_onehot.columns if col != 'Neighborhood']
jkpus_onehot = jkpus_onehot[fixed_columns]

jkpus_onehot.head()
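
To show what the one-hot step produces, here is a dependency-free sketch of the same idea: one row per venue, one 0/1 column per category, and the neighborhood label in the first column. The `one_hot` helper is hypothetical and only approximates what `pd.get_dummies` does in the cell above. The three sample rows are toy data; the Menteng pairing in particular is invented for illustration.

```python
def one_hot(rows):
    """Turn (neighborhood, category) pairs into one dict per venue with 0/1 flags."""
    categories = sorted({category for _, category in rows})
    encoded = []
    for neighborhood, category in rows:
        record = {'Neighborhood': neighborhood}  # label column comes first
        for name in categories:
            record[name] = 1 if name == category else 0
        encoded.append(record)
    return encoded


# toy input: two venues in Gambir, one in Menteng
venues = [('Gambir', 'Park'), ('Gambir', 'Museum'), ('Menteng', 'Coffee Shop')]
table = one_hot(venues)
for record in table:
    print(record)
```

Each row carries exactly one `1`, so averaging the rows per neighborhood would give the share of each category among that neighborhood's venues.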

### Central Bangkok

This section shows how the neighborhoods of Central Bangkok are extracted into a dataframe, together with their geospatial data.

Data for the districts of Bangkok are loaded from Wikipedia. The cell below shows how the table is retrieved and presented.

In [65]:
Bkk_url = 'https://en.wikipedia.org/wiki/List_of_districts_of_Bangkok'

html        = requests.get(Bkk_url).content
df_Bkklist  = pd.read_html(html) # get the list

df_Bkk      = pd.DataFrame(df_Bkklist[0]) # get the dataframe
print(df_Bkk.shape)
df_Bkk.head()

(50, 8)


Unnamed: 0,District(Khet),MapNr,Post-code,Thai,Popu-lation,No. ofSubdis-trictsKhwaeng,Latitude,Longitude
0,Bang Bon,50,10150,บางบอน,105161,4,13.6592,100.3991
1,Bang Kapi,6,10240,บางกะปิ,148465,2,13.765833,100.647778
2,Bang Khae,40,10160,บางแค,191781,4,13.696111,100.409444
3,Bang Khen,5,10220,บางเขน,189539,2,13.873889,100.596389
4,Bang Kho Laem,31,10120,บางคอแหลม,94956,3,13.693333,100.5025


The table above already provides the geospatial data for each neighborhood. The next step is to standardize the column headers as shown below and drop the unneeded columns from the dataframe.

In [66]:
df_Bkk = df_Bkk.drop(columns=['MapNr','Popu-lation','No. ofSubdis-trictsKhwaeng'])
df_Bkk.rename(columns = {'District(Khet)':'Neighborhood', 'Post-code':'Postal code', 
                              'Thai':'Neighborhood-in-Thai'}, inplace = True) 

In [67]:
# reindex returns a new dataframe, so assign it back to keep the new column order
df_Bkk = df_Bkk.reindex(columns = ['Postal code','Neighborhood','Neighborhood-in-Thai','Latitude','Longitude'])
df_Bkk

Unnamed: 0,Postal code,Neighborhood,Neighborhood-in-Thai,Latitude,Longitude
0,10150,Bang Bon,บางบอน,13.6592,100.3991
1,10240,Bang Kapi,บางกะปิ,13.765833,100.647778
2,10160,Bang Khae,บางแค,13.696111,100.409444
3,10220,Bang Khen,บางเขน,13.873889,100.596389
4,10120,Bang Kho Laem,บางคอแหลม,13.693333,100.5025
5,10150,Bang Khun Thian,บางขุนเทียน,13.660833,100.435833
6,10260,Bang Na,บางนา,13.680081,100.5918
7,10700,Bang Phlat,บางพลัด,13.793889,100.505
8,10500,Bang Rak,บางรัก,13.730833,100.524167
9,10800,Bang Sue,บางซื่อ,13.809722,100.537222


Use geopy to locate the latitude and longitude of Bangkok, Thailand

In [68]:
address = 'Bangkok, Thailand'

geolocator = Nominatim(user_agent="bkk_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bangkok are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Bangkok are 13.7538929, 100.8160803.


Show map of Central Bangkok with neighborhoods superimposed on top

In [77]:
import folium 

# create map of Central Bangkok using latitude and longitude values
map_bkk = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(df_Bkk['Latitude'], df_Bkk['Longitude'], df_Bkk['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bkk)  
    
map_bkk
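
The geocoded longitude printed above (100.816) lies east of the district longitudes listed in the table, so the map may open off-centre relative to the markers. One possible alternative, sketched here as an assumption rather than as part of the original analysis, is to centre the map on the mean of the district coordinates. The sketch uses only the first five rows of the table to stay short.

```python
from statistics import mean

# (latitude, longitude) of the first five districts listed in the table above
districts = [
    (13.6592, 100.3991),      # Bang Bon
    (13.765833, 100.647778),  # Bang Kapi
    (13.696111, 100.409444),  # Bang Khae
    (13.873889, 100.596389),  # Bang Khen
    (13.693333, 100.5025),    # Bang Kho Laem
]

# average each coordinate separately to get a rough centre point
center_lat = mean(lat for lat, _ in districts)
center_lng = mean(lng for _, lng in districts)
print(center_lat, center_lng)
```

In the notebook, the same values could be obtained for all districts with `df_Bkk['Latitude'].mean()` and `df_Bkk['Longitude'].mean()`, then passed as `location` to `folium.Map`.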

Foursquare credential