# Capstone Project - The Battle of the Neighborhoods (Week 1)
### IBM Applied Data Science Capstone by Coursera



## Introduction: Business Problem <a name="introduction"></a>

Malaysia is a country in Southeast Asia with a population of over 32 million [1]. 

According to Health Minister Datuk Seri Dr Adham Baba, Malaysia had a total of **11,059 dentists** as of June 2020, a dentist-to-population ratio of **1:2,963** [2]. The WHO does not recommend an ideal dentist-to-population ratio; however, compared with the U.S. ratio of **1:1,638** [2], this suggests there is room for more dental services in Malaysia.
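As a rough check on these figures, the arithmetic below works out the population implied by the quoted ratio and how many dentists would be needed to match the U.S. ratio. It is only a back-of-the-envelope sketch that assumes both ratios from [2] can be applied to the same population; the variable names are illustrative.

```python
# Figures quoted from [2]; the derived numbers are rough estimates only
dentists_my = 11_059           # dentists in Malaysia, June 2020
people_per_dentist_my = 2_963  # Malaysia: 1 dentist per 2,963 people
people_per_dentist_us = 1_638  # U.S.: 1 dentist per 1,638 people

# Population implied by the Malaysian ratio
implied_population = dentists_my * people_per_dentist_my

# Dentists needed to reach the U.S. ratio for that population
dentists_needed = implied_population / people_per_dentist_us
additional_dentists = dentists_needed - dentists_my

print('Implied population: {:,}'.format(implied_population))
print('Dentists needed at the U.S. ratio: {:,.0f}'.format(dentists_needed))
print('Additional dentists: {:,.0f}'.format(additional_dentists))
```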

We will use this project to find an optimal location to open a dental clinic. As I reside in Kuala Lumpur, Malaysia, the analysis will focus on the vicinity of this city. This report should also suit stakeholders who are interested in starting a **dental clinic in Kuala Lumpur, Malaysia**.

To select an optimal location, we will focus on detecting areas that have **no dental clinics in the vicinity**. Ideally, the location should also be **surrounded by general clinics or specialist clinics that provide non-dental services**, as this can indicate potential demand in the vicinity. Lastly, the location has to be **close to Kuala Lumpur**.

We will use data and our data science knowledge to identify a list of areas that fit the criteria above. Advantages of each area will also be clearly stated so the stakeholders can make the best possible location selection.



## Data <a name="introduction"></a>

To measure the selection criteria listed in the problem statement, the following data are needed: <br>
1. GPS Location of Kuala Lumpur<br>
We will use a **geolocation function** to obtain the GPS coordinates of Kuala Lumpur. These coordinates will then be used as the center point for calculating distances. 

2. Cities near Kuala Lumpur by Postcode<br>
Kuala Lumpur is surrounded by the state of Selangor, so the data collection also extends to Selangor. We obtained the postcodes for cities in both Kuala Lumpur and Selangor from a **public website** [3][4]. The information was then tidied into a table that lists each postcode with its latitude and longitude.

3. Clinics and their types in the areas near Kuala Lumpur<br>
We will use the **Foursquare API** to find the clinics near each postcode within the area.


### City Candidates

We will first identify the latitude and longitude of Kuala Lumpur, then use that as the center point to calculate each city's distance from Kuala Lumpur.

Let's start by getting the latitude and longitude of Kuala Lumpur using a geolocation function.

In [3]:
# import the libraries
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim 

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [4]:
# get the GPS location for Kuala Lumpur
address = 'Kuala Lumpur City, MY'

geolocator = Nominatim(user_agent="kuala_lumpur_explorer")
location = geolocator.geocode(address)
lat = location.latitude
lon = location.longitude
print('The geographical coordinates of Kuala Lumpur City are {}, {}.'.format(lat, lon))

The geographical coordinates of Kuala Lumpur City are 3.1516964, 101.6942371.
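`geocode` returns `None` when the geocoder cannot resolve an address, so reading `.latitude` directly can fail. A minimal sketch of a guard follows; the helper name `geocode_or_default` and the fallback value are assumptions for illustration, and a stub function stands in for a real geocoder so that the example runs offline.

```python
from collections import namedtuple

# Stand-in for the object returned by a geocoder (only the two fields used here)
Location = namedtuple('Location', ['latitude', 'longitude'])

def geocode_or_default(geocode, address, default):
    """Return (lat, lon) for address, or default when the geocoder finds nothing."""
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)

# Hypothetical offline geocoder: knows one address, returns None otherwise
def fake_geocode(address):
    known = {'Kuala Lumpur City, MY': Location(3.1516964, 101.6942371)}
    return known.get(address)

kl_fallback = (3.1516964, 101.6942371)  # coordinates printed in the cell above
found = geocode_or_default(fake_geocode, 'Kuala Lumpur City, MY', kl_fallback)
missing = geocode_or_default(fake_geocode, 'Unknown Place', (0.0, 0.0))
print(found, missing)
```

With geopy, `geolocator.geocode` would be passed in place of `fake_geocode`.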


<br>
Now, let's calculate the distance of each city from Kuala Lumpur. <br> <br>


To calculate distances accurately in kilometers, we cannot simply treat latitude and longitude degrees as coordinates on a flat 2D grid. Instead, we will use each city's latitude and longitude to calculate the great-circle distance.

<br>
Now, let's get the list of cities in Kuala Lumpur and Selangor from the table we prepared earlier and use the Haversine distance formula to calculate the distance from each city to Kuala Lumpur.

In [5]:
import pandas as pd

url = 'https://raw.githubusercontent.com/shilingt/Coursera_Capstone/main/Klang%20Valley%20Postcode.csv'
city_postcode= pd.read_csv(url)

city_postcode.head()

Unnamed: 0,State,City,Postcode,Latitude,Longitude
0,Kuala Lumpur,Kuala Lumpur,50000,3.1433,101.6955
1,Kuala Lumpur,Kuala Lumpur,50050,3.1451,101.6945
2,Kuala Lumpur,Kuala Lumpur,50088,3.1479,101.7008
3,Kuala Lumpur,Kuala Lumpur,50100,3.1513,101.6947
4,Kuala Lumpur,Kuala Lumpur,50150,3.1406,101.6955


<br>

We use the Haversine formula to determine the great-circle distance between two points on a sphere, given their longitudes and latitudes [5]. The radius of the Earth is taken as 6,371 km.
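For reference, the form of the formula used in the code that follows can be written as below, where $\varphi_1, \varphi_2$ are the latitudes and $\lambda_1, \lambda_2$ the longitudes of the two points (in radians), $r$ is the Earth's radius and $d$ is the resulting distance:

```latex
\[
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2r \,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right)
\]
```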

In [6]:
#import libraries
import numpy as np

# define the Haversine formula, where the radius of the Earth is 6,371 km
def haversine_distance(lat1, lon1, lat2, lon2):
   r = 6371
   phi1 = np.radians(lat1)
   phi2 = np.radians(lat2)
   delta_phi = np.radians(lat2 - lat1)
   delta_lambda = np.radians(lon2 - lon1)
   a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) *   np.sin(delta_lambda / 2)**2
   res = r * (2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)))
   return np.round(res, 2)

Now we are ready to start calculating the distances. We can do so with a simple loop, temporarily storing the distances in a list.

In [7]:
# set the starting geolocation to the Kuala Lumpur latitude and longitude
start_lat = lat
start_lon = lon

distances_km = []

for row in city_postcode.itertuples(index=False):
   distances_km.append(
       haversine_distance(start_lat, start_lon, row.Latitude, row.Longitude)
   )


Then, transform this list into a new column in our Dataframe:

In [8]:
city_postcode['DistanceFromKL'] = pd.DataFrame(distances_km)
city_postcode

Unnamed: 0,State,City,Postcode,Latitude,Longitude,DistanceFromKL
0,Kuala Lumpur,Kuala Lumpur,50000,3.1433,101.6955,0.94
1,Kuala Lumpur,Kuala Lumpur,50050,3.1451,101.6945,0.73
2,Kuala Lumpur,Kuala Lumpur,50088,3.1479,101.7008,0.84
3,Kuala Lumpur,Kuala Lumpur,50100,3.1513,101.6947,0.07
4,Kuala Lumpur,Kuala Lumpur,50150,3.1406,101.6955,1.24
...,...,...,...,...,...,...
554,Selangor,Tanjong Sepat,42800,2.7691,101.5625,44.99
555,Selangor,Telok Panglima Garang,42425,3.0195,101.5246,23.89
556,Selangor,Telok Panglima Garang,42500,3.0639,101.5355,20.15
557,Selangor,Telok Panglima Garang,42507,2.9129,101.4686,36.51
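Because `haversine_distance` is built from NumPy functions, the same distances could also be computed without an explicit loop by passing whole arrays. The sketch below repeats the function so that it is self-contained and uses the first two postcode rows shown above as sample input.

```python
import numpy as np

def haversine_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; lat2/lon2 may be scalars or NumPy arrays."""
    r = 6371
    phi1 = np.radians(lat1)
    phi2 = np.radians(lat2)
    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)
    a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) * np.sin(delta_lambda / 2)**2
    return np.round(r * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)), 2)

# Kuala Lumpur coordinates and the first two postcode rows from the table
start_lat, start_lon = 3.1516964, 101.6942371
lats = np.array([3.1433, 3.1451])
lons = np.array([101.6955, 101.6945])

# One call returns one distance per input row
distances = haversine_distance(start_lat, start_lon, lats, lons)
print(distances)
```

With the DataFrame, `city_postcode['Latitude'].to_numpy()` and `city_postcode['Longitude'].to_numpy()` would take the place of `lats` and `lons`.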


We are only interested in the cities that are close to Kuala Lumpur, so we will consider only the cities within a 10 km radius of Kuala Lumpur.

In [9]:
kl_vicinity = city_postcode[(city_postcode['DistanceFromKL']<=10)]
kl_vicinity = kl_vicinity.sort_values('DistanceFromKL', ascending=True)
kl_vicinity = kl_vicinity.reset_index(drop=True)
kl_vicinity

Unnamed: 0,State,City,Postcode,Latitude,Longitude,DistanceFromKL
0,Kuala Lumpur,Kuala Lumpur,50100,3.1513,101.6947,0.07
1,Kuala Lumpur,Kuala Lumpur,50350,3.1512,101.6956,0.16
2,Kuala Lumpur,Kuala Lumpur,50512,3.1503,101.6924,0.26
3,Kuala Lumpur,Kuala Lumpur,50634,3.1488,101.6952,0.34
4,Kuala Lumpur,Kuala Lumpur,50400,3.1490,101.6970,0.43
...,...,...,...,...,...,...
292,Selangor,Petaling Jaya,46400,3.0762,101.6529,9.57
293,Selangor,Petaling Jaya,46200,3.0832,101.6407,9.66
294,Selangor,Petaling Jaya,47300,3.1097,101.6171,9.75
295,Selangor,Subang Jaya,47650,3.1402,101.6072,9.75


In [10]:
print('There are {} postcodes within a 10 km radius of Kuala Lumpur'.format(len(kl_vicinity)))

There are 297 postcodes within a 10 km radius of Kuala Lumpur


Now, we will visualise the locations of the postcodes that are within a 10 km radius of Kuala Lumpur.

In [11]:
#!pip install folium

import folium

map_kl = folium.Map(location=(start_lat,start_lon), zoom_start=12)
folium.Marker((start_lat,start_lon), popup='Kuala Lumpur').add_to(map_kl)
for lat, lon in zip(kl_vicinity['Latitude'], kl_vicinity['Longitude']):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_kl) 
    folium.Circle([lat, lon], radius=500, color='blue', fill=False).add_to(map_kl)
    #folium.Marker([lat, lon]).add_to(map_kl)
map_kl

From the map, postcodes are not a good choice of neighborhood center point, as circles with a 500 m radius were not able to cover all the areas. Some of the postcode areas overlap closely, while a significant portion of the area is not covered.
To solve this issue, we will use Foursquare to locate the medical centers within 2 km of each postcode, then assess whether medical centers are a better choice of neighborhood center point.
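The visual impression of gaps could also be checked numerically. A minimal sketch, assuming a set of sample points spread over the area of interest (the `coverage_fraction` helper and the sample coordinates are hypothetical and not part of the original analysis): it reports the share of sample points that fall inside at least one 500 m circle.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2)**2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2)**2
    return r * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

def coverage_fraction(centres, samples, radius_km=0.5):
    """Share of sample points lying within radius_km of at least one centre."""
    covered = 0
    for s_lat, s_lon in samples:
        if any(haversine_km(s_lat, s_lon, c_lat, c_lon) <= radius_km
               for c_lat, c_lon in centres):
            covered += 1
    return covered / len(samples)

# Illustrative data: one centre and three sample points
# (0.004 degrees of latitude is about 0.44 km, 0.01 degrees about 1.1 km)
centres = [(3.1513, 101.6947)]
samples = [(3.1513, 101.6947), (3.1553, 101.6947), (3.1613, 101.6947)]
fraction = coverage_fraction(centres, samples)
print(fraction)
```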

### Foursquare
Now that we have a list of postcodes that are close to Kuala Lumpur, let's use the Foursquare API to get information on the medical centers in each neighborhood.

Fill in your own Foursquare credentials in place of the placeholders below.

In [12]:
client_id = 'YOUR_CLIENT_ID'          # placeholder for the Foursquare client ID
client_secret = 'YOUR_CLIENT_SECRET'  # placeholder for the Foursquare client secret
access_token = 'YOUR_ACCESS_TOKEN'    # placeholder for the Foursquare access token

version = '20180724'
print('Your credentials:')
print('CLIENT_ID: ' + client_id)
print('CLIENT_SECRET: ' + client_secret)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET
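Credentials are often kept out of the notebook itself. A minimal sketch, assuming they are stored in environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions for illustration, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read credentials from environment variables (variable names are assumptions)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters when printing a secret."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demonstration with a plain dict standing in for os.environ
demo_env = {'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
            'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(demo_id))
```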


In [13]:
# Category IDs corresponding to medical centers were taken from the Foursquare web site (https://developer.foursquare.com/docs/resources/categories)
# We also define the dental clinic category here to simplify the code in later sections

medical_category = '4bf58dd8d48988d104941735'
dentist_sub_category = ['4bf58dd8d48988d178941735']

def is_medical_center(categories, specific_filter=None):
    medical_words = ['medical center']  # lowercase, because category names are lowercased before matching
    medical_center = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in medical_words:
            if r in category_name:
                medical_center = True
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            medical_center = True
    return medical_center, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Kuala Lumpur', '')
    address = address.replace(', Selangor', '')
    address = address.replace(', Malaysia', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    import requests
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:  # network errors or an unexpected response shape
        venues = []
    return venues
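The request URL above is assembled with `str.format`. As an alternative sketch, the query string can be built with the standard library's `urlencode`, which escapes each value, and the nested response can be unpacked defensively. The helper names `build_explore_url` and `extract_items` are hypothetical; the endpoint, parameter names and response nesting are the ones used in `get_venues_near_location`.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lon, category, client_id, client_secret,
                      radius=500, limit=100, version='20180724'):
    """Build the explore URL with urlencode so that every value is escaped."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'categoryId': category,
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)

def extract_items(payload):
    """Return the list of items from a response dict, or [] if the shape is unexpected."""
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []
    return groups[0].get('items', [])

url = build_explore_url(3.1513, 101.6947, '4bf58dd8d48988d104941735', 'ID', 'SECRET')
query = parse_qs(urlparse(url).query)
print(query['ll'], query['radius'])

# Hand-made payload mimicking the nesting used in get_venues_near_location
sample = {'response': {'groups': [{'items': [{'venue': {'name': 'Example Clinic'}}]}]}}
print(len(extract_items(sample)), len(extract_items({})))
```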


In [14]:
# Let's now go over our neighborhood locations and get nearby medical services; we'll also maintain a dictionary of all found medical centers and all found dental clinics.

import pickle

def get_new_list(lats, lons):
    medical_centers_initial = {}
    dental_clinics_ = {}
    location_dentals_ = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=2km to make sure we have overlaps/full coverage so we don't miss any medical center (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, medical_category, client_id, client_secret, radius=2000, limit=300)
        area_medical_centers = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
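            # Note: is_medical_center returns a (medical_center, specific) tuple, so is_med is always truthy
            # and every venue returned for medical_category is kept
            is_med = is_medical_center(venue_categories, specific_filter=None)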
            is_dental = is_medical_center(venue_categories, specific_filter=dentist_sub_category)[1]
            if is_med:
                medical_center = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_dental)
                if venue_distance<=2000:
                    area_medical_centers.append(medical_center)
                medical_centers_initial[venue_id] = medical_center
                if is_dental:
                    dental_clinics_[venue_id] = medical_center
        # append once per candidate location, after all of its venues have been processed
        location_dentals_.append(area_medical_centers)
        print(' .', end='')
    print(' done.')
    return medical_centers_initial, dental_clinics_, location_dentals_

medical_centers_initial, dental_clinics_, location_dentals_ = get_new_list(kl_vicinity['Latitude'], kl_vicinity['Longitude'])

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
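`is_medical_center` returns its two flags as a tuple, and a non-empty tuple is always truthy in Python, so a test such as `if is_med:` on the whole tuple passes for every venue. The sketch below is a simplified, hypothetical variant (`classify` is not part of the notebook) showing how unpacking makes each flag available separately; the category names and the second ID are made up for the example.

```python
DENTAL_IDS = {'4bf58dd8d48988d178941735'}  # dental category ID defined earlier in the notebook

def classify(categories, dental_ids=DENTAL_IDS, medical_words=('medical center',)):
    """Return (is_medical, is_dental) for a list of (name, id) category pairs."""
    is_medical = False
    is_dental = False
    for name, category_id in categories:
        if any(word in name.lower() for word in medical_words):
            is_medical = True
        if category_id in dental_ids:
            is_dental = True
            is_medical = True
    return is_medical, is_dental

# Illustrative category lists
dental_venue = [('Dental Office', '4bf58dd8d48988d178941735')]
other_venue = [('Coffee Shop', 'hypothetical-id')]

flags = classify(other_venue)
print(bool(flags))             # the tuple itself is truthy even when both flags are False
is_medical, is_dental = flags  # unpacking gives the individual flags
print(is_medical, is_dental)
print(classify(dental_venue))
```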


Next, we will assess whether the medical centers are a good choice of neighborhood center by visualizing them on the map.

In [15]:
# turn the medical centers into DataFrame and assign column name
df_medical_centers = pd.DataFrame(medical_centers_initial)
df_medical_centers = df_medical_centers.transpose()
df_medical_centers = df_medical_centers.reset_index(drop=True)
df_medical_centers.columns = ['Venue ID','Venue','Latitude','Longitude','Address','Venue Distance','is dental']
df_medical_centers.drop(['Venue Distance'],axis=1,inplace=True)
df_medical_centers

Unnamed: 0,Venue ID,Venue,Latitude,Longitude,Address,is dental
0,4c20af098082d13ab8f9f72a,AL-Islam Specialist Hospital (KBMC),3.16347,101.704,"85 Jalan Raja Abdullah (Kg Baru), 50300 Kuala ...",False
1,4d05810654d0236a5ccff4d5,Klinik Kakitangan DBKL,3.15259,101.695,"Menara DBKL I (Jalan Raja Laut), 50350 Kuala L...",False
2,4c6b710c99b9236ae3bce0c9,Sachdev Skin Specialist,3.15255,101.696,Malaysia,False
3,4d798a23ceaa224bb4c0fb70,Klinik Chong Dispensary,3.15167,101.698,Malaysia,False
4,4b95ff7af964a520b4b934e3,Twin Towers Medical Centre,3.15675,101.711,"Suria KLCC (LC 402-404, Level 4, Lot C), 50088...",False
...,...,...,...,...,...,...
1636,4cfdcfb0c6cca35dc2e09832,Kedai Ubat Yun Choong Tong,3.0878,101.738,"Taman Taynton View (Cheras), Federal Territory...",False
1637,578c4e05498e639d77b924dc,漢生堂 Han Sheng Tang Chinese Medical Centre,3.07388,101.739,Malaysia,False
1638,56f16180498ecc5661eb10c3,Klinik HealthMate,3.0701,101.743,"Alam Damai, 56000 Cheras",False
1639,57038d56498e5706dd6236d3,OneMeds Pharmacy,3.06978,101.743,Malaysia,False
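Building the frame from a dictionary and then transposing it works, but it usually leaves every column with the generic `object` dtype. A possible alternative, sketched here with two made-up records in the same tuple layout, is `pd.DataFrame.from_dict` with `orient='index'`, which treats each dictionary value as a row and infers a dtype per column.

```python
import pandas as pd

# Two placeholder records (illustrative values only) in the layout
# (id, name, lat, lon, address, distance, is_dental)
records = {
    'id-1': ('id-1', 'Clinic A', 3.1526, 101.6950, 'Address A', 347, False),
    'id-2': ('id-2', 'Dental B', 3.1962, 101.6170, 'Address B', 120, True),
}

columns = ['Venue ID', 'Venue', 'Latitude', 'Longitude', 'Address', 'Venue Distance', 'is dental']

# Each dictionary value becomes one row; the keys become the index
df = pd.DataFrame.from_dict(records, orient='index', columns=columns)
df = df.drop(columns=['Venue Distance']).reset_index(drop=True)
print(df.dtypes)
```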


In [16]:
# calculate the distance from KL
distances_km = []

for row in df_medical_centers.itertuples(index=False):
   distances_km.append(
       haversine_distance(start_lat, start_lon, row.Latitude, row.Longitude)
   )

df_medical_centers['DistanceFromKL'] = pd.DataFrame(distances_km)
df_medical_centers

Unnamed: 0,Venue ID,Venue,Latitude,Longitude,Address,is dental,DistanceFromKL
0,4c20af098082d13ab8f9f72a,AL-Islam Specialist Hospital (KBMC),3.16347,101.704,"85 Jalan Raja Abdullah (Kg Baru), 50300 Kuala ...",False,1.67
1,4d05810654d0236a5ccff4d5,Klinik Kakitangan DBKL,3.15259,101.695,"Menara DBKL I (Jalan Raja Laut), 50350 Kuala L...",False,0.10
2,4c6b710c99b9236ae3bce0c9,Sachdev Skin Specialist,3.15255,101.696,Malaysia,False,0.24
3,4d798a23ceaa224bb4c0fb70,Klinik Chong Dispensary,3.15167,101.698,Malaysia,False,0.41
4,4b95ff7af964a520b4b934e3,Twin Towers Medical Centre,3.15675,101.711,"Suria KLCC (LC 402-404, Level 4, Lot C), 50088...",False,1.95
...,...,...,...,...,...,...,...
1636,4cfdcfb0c6cca35dc2e09832,Kedai Ubat Yun Choong Tong,3.0878,101.738,"Taman Taynton View (Cheras), Federal Territory...",False,8.62
1637,578c4e05498e639d77b924dc,漢生堂 Han Sheng Tang Chinese Medical Centre,3.07388,101.739,Malaysia,False,9.99
1638,56f16180498ecc5661eb10c3,Klinik HealthMate,3.0701,101.743,"Alam Damai, 56000 Cheras",False,10.55
1639,57038d56498e5706dd6236d3,OneMeds Pharmacy,3.06978,101.743,Malaysia,False,10.58


In [20]:
# Limit the data to only those within 10km from Kuala Lumpur
kl_vicinity2 = df_medical_centers[(df_medical_centers['DistanceFromKL']<10)]
kl_vicinity2 = kl_vicinity2.sort_values('DistanceFromKL', ascending=True)
kl_vicinity2 = kl_vicinity2.reset_index(drop=True)
kl_vicinity2

Unnamed: 0,Venue ID,Venue,Latitude,Longitude,Address,is dental,DistanceFromKL
0,4e9cdaaa9adfe5e71c8d4ffb,Bilik Sakit Mental & Jiwa BRCC,3.15188,101.695,"Kuala Lumpur, Federal Territory of Kuala Lum",False,0.09
1,4d05810654d0236a5ccff4d5,Klinik Kakitangan DBKL,3.15259,101.695,"Menara DBKL I (Jalan Raja Laut), 50350 Kuala L...",False,0.10
2,4d3a3df249cb236ac329b847,Sachdev Skin Clinic,3.15243,101.696,"6th Floor, Wisma Gurcharan (71-75, Jalan Tuank...",False,0.22
3,4efa64fc722e340611e5422a,Qualitas Medic Clinic (W.K),3.15356,101.695,"1A Floor,Jalan Raja Laut (Epf Building), 50350...",False,0.22
4,4c6b710c99b9236ae3bce0c9,Sachdev Skin Specialist,3.15255,101.696,Malaysia,False,0.24
...,...,...,...,...,...,...,...
1534,50d9a202e4b0dbf31aacf256,Parkcity Dental Clinic,3.19618,101.617,"Lot B-F-3A, Ativo Plaza, No.1, Jalan PJU 9/1,...",True,9.94
1535,4cf7229571538cfab8f0b92e,Center For Sight (PJ),3.11593,101.612,Malaysia,False,9.95
1536,4ef57963775b54cdb2d7dccb,Centre For Sight Laser Eye Centre 慧眼眼科专科诊所,3.11593,101.612,"No. 1-1, Jalan SS23/15, Taman SEA, 47400 Petal...",False,9.96
1537,4d5cd79f5d153704eb5377e7,Klinik Medi-Link PJ City,3.0905,101.629,"Ground Floor, Tower B, PJ City, 46100 Petaling...",False,9.96


The number of locations has decreased from 1,641 to 1,539 after filtering to within 10 km. Now, let's plot them on the map and assess.

In [21]:
import folium

map_kl = folium.Map(location=(start_lat,start_lon), zoom_start=12)
folium.Marker((start_lat,start_lon), popup='Kuala Lumpur').add_to(map_kl)
for lat, lon in zip(kl_vicinity2['Latitude'], kl_vicinity2['Longitude']):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_kl) 
    folium.Circle([lat, lon], radius=500, color='blue', fill=False).add_to(map_kl)
    #folium.Marker([lat, lon]).add_to(map_kl)
map_kl

It seems that many of the locations overlap with each other. Let's try removing the locations that are very close to each other, to reduce the number of locations that will be queried via Foursquare in later steps.

In [22]:
# We consider locations to be very close to each other if they share the same latitude and longitude (truncated to 3 decimals)
kl_vicinity2['TrLat'] = np.trunc(1000*kl_vicinity2['Latitude'])/1000
kl_vicinity2['TrLon'] = np.trunc(1000*kl_vicinity2['Longitude'])/1000
# keep=False drops every member of a duplicate group, so only locations with a unique truncated coordinate remain
kl_vicinity2.drop_duplicates(subset=['TrLat','TrLon'], keep=False, inplace=True)
kl_vicinity2

Unnamed: 0,Venue ID,Venue,Latitude,Longitude,Address,is dental,DistanceFromKL,TrLat,TrLon
0,4e9cdaaa9adfe5e71c8d4ffb,Bilik Sakit Mental & Jiwa BRCC,3.15188,101.695,"Kuala Lumpur, Federal Territory of Kuala Lum",False,0.09,3.151,101.695
1,4d05810654d0236a5ccff4d5,Klinik Kakitangan DBKL,3.15259,101.695,"Menara DBKL I (Jalan Raja Laut), 50350 Kuala L...",False,0.10,3.152,101.694
3,4efa64fc722e340611e5422a,Qualitas Medic Clinic (W.K),3.15356,101.695,"1A Floor,Jalan Raja Laut (Epf Building), 50350...",False,0.22,3.153,101.694
5,528ed36f498ea0152cbafc1a,Bala neuro Medical,3.15416,101.695,Malaysia,False,0.28,3.154,101.694
8,4d3e51e46b3d236a5cc47164,Klinik Chin,3.15404,101.696,Jalan Tunku Abdul Rahman,False,0.35,3.154,101.696
...,...,...,...,...,...,...,...,...,...
1531,4becd8432cf820a11f59b91c,Klinik Pergigian Low,3.11455,101.613,Malaysia,True,9.93,3.114,101.612
1533,4c183db56a21c9b619d6c897,Yeoh Veterinary Clinic & Surgery,3.11385,101.613,"126, Jalan SS24/2, Taman Megah,, 47301 Petalin...",False,9.94,3.113,101.613
1534,50d9a202e4b0dbf31aacf256,Parkcity Dental Clinic,3.19618,101.617,"Lot B-F-3A, Ativo Plaza, No.1, Jalan PJU 9/1,...",True,9.94,3.196,101.616
1537,4d5cd79f5d153704eb5377e7,Klinik Medi-Link PJ City,3.0905,101.629,"Ground Floor, Tower B, PJ City, 46100 Petaling...",False,9.96,3.09,101.628
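`drop_duplicates` with `keep=False` removes every row in a group of duplicates rather than keeping one representative. The toy example below (made-up venues A, B and C) contrasts this with `keep='first'`, an alternative that is not used in this notebook.

```python
import numpy as np
import pandas as pd

# A and B share the same coordinates once truncated to 3 decimals; C is unique
df = pd.DataFrame({
    'Venue': ['A', 'B', 'C'],
    'Latitude': [3.15243, 3.15255, 3.15404],
    'Longitude': [101.6962, 101.6962, 101.6968],
})
df['TrLat'] = np.trunc(1000 * df['Latitude']) / 1000
df['TrLon'] = np.trunc(1000 * df['Longitude']) / 1000

# keep=False: both A and B are dropped, only C remains
dropped_all = df.drop_duplicates(subset=['TrLat', 'TrLon'], keep=False)
# keep='first': A is kept as the representative of the A/B group
kept_first = df.drop_duplicates(subset=['TrLat', 'TrLon'], keep='first')
print(list(dropped_all['Venue']), list(kept_first['Venue']))
```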


The number of locations has now dropped to 678. Let's plot the map to see how it looks.

In [23]:
import folium

map_kl = folium.Map(location=(start_lat,start_lon), zoom_start=12)
folium.Marker((start_lat,start_lon), popup='Kuala Lumpur').add_to(map_kl)
for lat, lon in zip(kl_vicinity2['Latitude'], kl_vicinity2['Longitude']):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_kl) 
    folium.Circle([lat, lon], radius=500, color='blue', fill=False).add_to(map_kl)
    #folium.Marker([lat, lon]).add_to(map_kl)
map_kl

Looks good. The overlap has been reduced significantly, and the area near Kuala Lumpur is still nicely covered with a radius of 500 m. So, we will use these locations to find the nearby medical centers and identify the dental clinics among them.

In [24]:
# save dataframe
import pickle
kl_vicinity2.to_pickle('kl_vicinity2.pkl')

Now, we will look for the nearby medical services from the 678 locations in the list.

In [51]:
# Let's now go over our neighborhood locations and get nearby medical services; we'll also maintain a dictionary of all found medical centers and all found dental clinics.

import pickle

def get_medical_centers(lats, lons):
    medical_centers = {}
    dental_clinics = {}
    location_medical_centers = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=500m to make sure we have overlaps/full coverage so we don't miss any medical center (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, medical_category, client_id, client_secret, radius=500, limit=100)
        area_medical_centers = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
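            # is_med is a (medical_center, specific) tuple and therefore always truthy, so every returned venue is kept
            is_med = is_medical_center(venue_categories, specific_filter=None)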
            is_dental = is_medical_center(venue_categories, specific_filter=dentist_sub_category)[1]
            if is_med:
                medical_center = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_dental)
                if venue_distance<=500:
                    area_medical_centers.append(medical_center)
                medical_centers[venue_id] = medical_center
                if is_dental:
                    dental_clinics[venue_id] = medical_center
        location_medical_centers.append(area_medical_centers)
        print(' .', end='')
    print(' done.')
    return medical_centers, dental_clinics, location_medical_centers

# Try to load from local file system in case we did this before
medical_centers = {}
dental_clinics = {}
location_medical_centers = []
loaded = False
try:
    with open('medical_centers_500.pkl', 'rb') as f:
        medical_centers = pickle.load(f)
    with open('dental_clinics_500.pkl', 'rb') as f:
        dental_clinics = pickle.load(f)
    with open('location_medical_centers_500.pkl', 'rb') as f:
        location_medical_centers = pickle.load(f)
    print('Medical Centers data loaded.')
    loaded = True
except Exception:
    # No usable cache on disk; fall through to the Foursquare query below
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    medical_centers, dental_clinics, location_medical_centers = get_medical_centers(kl_vicinity2['Latitude'], kl_vicinity2['Longitude'])
    
    # Let's persist this in the local file system
    with open('medical_centers_500.pkl', 'wb') as f:
        pickle.dump(medical_centers, f)
    with open('dental_clinics_500.pkl', 'wb') as f:
        pickle.dump(dental_clinics, f)
    with open('location_medical_centers_500.pkl', 'wb') as f:
        pickle.dump(location_medical_centers, f)

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
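The load-or-query pattern above can be wrapped in a small reusable helper. This is a minimal sketch, assuming one pickle file per cached object; `load_or_compute` is a hypothetical name, and a stand-in function replaces the Foursquare query so that the example runs offline.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the pickled object at path, or call compute() and cache its result."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    result = compute()
    with open(path, 'wb') as f:
        pickle.dump(result, f)
    return result

# Demonstration in a temporary directory with a stand-in for the API query
calls = []

def fake_query():
    calls.append(1)  # record each time the expensive step actually runs
    return {'venue-1': ('venue-1', 'Example Clinic')}

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, 'medical_centers_demo.pkl')
    first = load_or_compute(cache_path, fake_query)   # computes and writes the cache
    second = load_or_compute(cache_path, fake_query)  # reads the cache, no second call

print(len(calls), first == second)
```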

In [26]:
import numpy as np

print('Total number of Medical Centers:', len(medical_centers))
print('Total number of Dental Clinics:', len(dental_clinics))
print('Percentage of Dental Clinics: {:.2f}%'.format(len(dental_clinics) / len(medical_centers) * 100))
print('Average number of Medical Centers around each candidate location:', np.array([len(r) for r in location_medical_centers]).mean())

Total number of Medical Centers: 2636
Total number of Dental Clinics: 329
Percentage of Dental Clinics: 12.48%
Average number of Medical Centers around each candidate location: 17.874631268436577


In [28]:
print('List of all medical centers')
print('-----------------------')
for r in list(medical_centers.values())[:10]:
    print(r)
print('...')
print('Total:', len(medical_centers))

List of all medical centers
-----------------------
('4c0f3c6198102d7f2afce406', 'Klinik Pakar Kulit Md Noh', 3.153035973861806, 101.69652467026499, 'Jalan tunku abdul rahman', 358, False)
('4cb3c2c61168a09343994423', 'Klinik Kumpulan Medic (KWSP)', 3.1513110438866234, 101.69530039570797, 'Malaysia', 380, False)
('4dc9dde8c65bebb82f2c8c73', 'Klinik Kumpulan Medic', 3.1532184279898634, 101.69500270079763, 'Bangunan KWSP, 50000 Kuala Lumpur, Federal Territory of Kuala Lum', 444, False)
('4efa64fc722e340611e5422a', 'Qualitas Medic Clinic (W.K)', 3.153556958758094, 101.69478506422777, '1A Floor,Jalan Raja Laut (Epf Building), 50350 Wilayah Persekutuan', 476, False)
('4c6b710c99b9236ae3bce0c9', 'Sachdev Skin Specialist', 3.1525515529465094, 101.69621789461728, 'Malaysia', 416, False)
('4d05810654d0236a5ccff4d5', 'Klinik Kakitangan DBKL', 3.1525850485260944, 101.6945036152386, 'Menara DBKL I (Jalan Raja Laut), 50350 Kuala Lumpur', 347, False)
('4d798a23ceaa224bb4c0fb70', 'Klinik Chong Dispen

In [29]:
print('List of Dental Clinics')
print('---------------------------')
for r in list(dental_clinics.values())[:10]:
    print(r)
print('...')
print('Total:', len(dental_clinics))

List of Dental Clinics
---------------------------
('4ecc4df5e5fa85e5ec2a51b4', 'wilayah mahsa dental clinic', 3.153581491588436, 101.69649146190253, 'Malaysia', 486, True)
('4cf356787bf3b60c08906a7f', 'Nair Dental Surgery', 3.1536033761248055, 101.69787744944841, 'Malaysia', 478, True)
('4d23e3a83c026ea8a269874e', 'Kawauchi Dental Clinic', 3.156928, 101.696481, 'Malaysia', 404, True)
('4ecc4f089a528a8ec6a36e9d', 'Mahsa Dental Clinic', 3.15618112352476, 101.69911546918516, 'Jalan Dang Wangi (Jalan Munshi Abdullah), Federal Territory of Kuala Lum', 320, True)
('510f2aabe4b0428ee3976de1', 'Dr. Wong Dental Specialist', 3.1578684726530923, 101.69734759259626, 'Malaysia', 482, True)
('4d018c9a85c6a14336bc5337', 'Lim Dental', 3.1466258700852476, 101.69919886033233, '11 Jalan Tun Tan Cheng Lock', 500, True)
('4ce1cd107e2e236a0162921b', 'Klinik Pergigian Cahaya Suria', 3.1465102786715615, 101.69917401089789, 'Kuala Lumpur', 500, True)
('502c5432e4b06e61e070d222', 'Unit Ortodontik KP Cahaya Sur

In [30]:
print('Medical Centers around location')
print('---------------------------')
for i in range(670, 678):
    rs = location_medical_centers[i][:15]
    names = ', '.join([r[1] for r in rs])
    print('Medical Centers around location {}: {}'.format(i+1, names))

Medical Centers around location
---------------------------
Medical Centers around location 671: Klinik Mediviron, Mediviron Clinic, Tmn Sri Sentosa, Klinik H. P. Kwok, Klinik mediviron taman sri sentosa, Klinik Sri Sentosa, Chang Clinic, Klinik Ahmad Shah, Klinik TTDI Sri Manja, Klinik TTDI, Manja Square, Petaling Jaya, Klinik Pergigian Datta
Medical Centers around location 672: Clinic Aw, Klinik Aw Sri Damansara, Tabib Cina Siew Hong, Eye Clinic, Wisma Twintech
Medical Centers around location 673: Yeoh Veterinary Clinic & Surgery, Centre For Sight Laser Eye Centre 慧眼眼科专科诊所, Center For Sight (PJ), FT Wong Clinic Taman Megah, Yee Chou Acupuncture, Klinik Petaling, Klinik Ling, Klinik Low Pergigian, 宇宙中医针灸铁打诊所, Sime Darby Specialist Center, BP Specialist Centre, Hauz of Smile Dental Care, Klinik Pergigian Low, L.H. Ong Dental Surgery, Sim & Hooi Dental Centre
Medical Centers around location 674: Yeoh Veterinary Clinic & Surgery, Centre For Sight Laser Eye Centre 慧眼眼科专科诊所, Center For Sig

Now, let's visualize the data we have: Kuala Lumpur city center, with the locations of medical centers in blue and dental clinics in red. The white circle marks a 10 km radius from the city center.

In [31]:
#!pip install folium

import folium

map_kl = folium.Map(location=(start_lat,start_lon), zoom_start=12)
folium.Marker((start_lat,start_lon), popup='Kuala Lumpur').add_to(map_kl)
folium.Circle((start_lat,start_lon), radius=10000, fill=False, color='white').add_to(map_kl)
for med in medical_centers.values():
    lat = med[2]; lon = med[3]
    is_dental = med[6]
    color = 'red' if is_dental else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_kl)

map_kl

Looking good. We now have all the medical centers that are within 10 km of Kuala Lumpur and can tell which ones are dental clinics.
Let's clean up the dataset and remove the unwanted columns.

In [36]:
kl_vicinity2.drop( ['TrLat','TrLon'], axis=1, inplace=True)
kl_vicinity2.reset_index(drop=True, inplace=True)
kl_vicinity2.head()

Unnamed: 0,Venue ID,Venue,Latitude,Longitude,Address,is dental,DistanceFromKL
0,4e9cdaaa9adfe5e71c8d4ffb,Bilik Sakit Mental & Jiwa BRCC,3.15188,101.695,"Kuala Lumpur, Federal Territory of Kuala Lum",False,0.09
1,4d05810654d0236a5ccff4d5,Klinik Kakitangan DBKL,3.15259,101.695,"Menara DBKL I (Jalan Raja Laut), 50350 Kuala L...",False,0.1
2,4efa64fc722e340611e5422a,Qualitas Medic Clinic (W.K),3.15356,101.695,"1A Floor,Jalan Raja Laut (Epf Building), 50350...",False,0.22
3,528ed36f498ea0152cbafc1a,Bala neuro Medical,3.15416,101.695,Malaysia,False,0.28
4,4d3e51e46b3d236a5cc47164,Klinik Chin,3.15404,101.696,Jalan Tunku Abdul Rahman,False,0.35



This concludes our data gathering. We are now ready to use this data for analysis to determine the optimal location for our dental clinic.

## Appendix <a name="introduction"></a>

[1] Current Population Estimates, Malaysia, 2021 https://www.dosm.gov.my/v1/index.php?r=column/cthemeByCat&cat=155&bul_id=ZjJOSnpJR21sQWVUcUp6ODRudm5JZz09&menu_id=L0pheU43NWJwRWVSZklWdzQ4TlhUUT09 <br> 
[2] 'Msia doctor-population ratio stands at 1: 454' by New Straits Times, 2020 https://www.nst.com.my/news/nation/2020/08/613844/msia-doctor-population-ratio-stands-1-454 <br>
[3] Kuala Lumpur Postcode list http://malaysia.postcode.info/kuala-lumpur/kuala-lumpur <br>
[4] Selangor Postcode list http://malaysia.postcode.info/selangor/<br>
[5] Haversine Formula https://en.wikipedia.org/wiki/Haversine_formula <br>