# Capstone Project - Where to open a health facility in a developing country

## Introduction & Background
In a developing country such as Pakistan, health facilities are often not available nearby, and people sometimes have to travel long distances to see a medical practitioner or specialist suited to their needs.  With scarce resources, it is important to make the right decision when opening a health facility to ensure maximum accessibility and coverage.
From a business point of view, one would open a facility where population density is higher and there is little competition.  In contrast, from an accessibility point of view, the facility should be located where there is little or no alternative available nearby.
Therefore, the factors to consider are:
* Population in the vicinity (within x miles/kilometers)
* Presence of facilities in the vicinity
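
One hedged way to combine the two factors is a simple ratio of population to nearby facilities. The sketch below is illustrative only: the function name `underserved_score` and the `+ 1` smoothing are assumptions, not part of the original analysis. The populations and within-10 km venue counts are taken from the tables in the appendices. Note that the population is district-wide while the venue count covers only a 10 km radius around the district coordinates, so the ratio is a rough indicator at best.

```python
def underserved_score(population, facilities_nearby):
    """People per nearby facility; the +1 avoids division by zero (an assumed smoothing)."""
    return population / (facilities_nearby + 1)

# (population in 2017, venues within 10 km of the district coordinates)
districts = {
    "Badin": (1804516, 0),
    "Hyderabad": (2201079, 10),
    "Larkana": (1524391, 2),
}

scores = {name: underserved_score(pop, n) for name, (pop, n) in districts.items()}

# highest score first: many people, few facilities nearby
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A higher score marks a district where a new facility would, under these assumptions, serve more people per existing alternative.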

## Data
Because the required data is not readily available and has to be gathered manually or through APIs, the scope is restricted to the districts of Sindh Province.  Pakistan has four provinces (and other administrative units).  Sindh Province has 29 districts.  Depending on the time available, data for some or all of the 29 districts would be used for the project.

The Foursquare API would be used to explore the presence of medical centers; Medical Center is one of its parent categories and includes various sub-categories.  Geopy would be used to get the geographic coordinates of the districts.


### Initial Data Exploration
Data about districts of Sindh province was found here: https://en.wikipedia.org/wiki/Districts_of_Sindh

Geopy was used to obtain the coordinates; some of them were incorrect and were corrected manually.  The work is provided in **Appendix A** below.
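
The geocode-then-correct step can be sketched as a single function in which manual corrections take precedence over geocoder results. This is a minimal sketch, not the code used in Appendix A: `geocode_with_overrides`, `MANUAL_FIXES` and `fake_geocode` are hypothetical names, and the fake geocoder stands in for geopy so that the example runs offline. The coordinates are the ones that appear in Appendix A.

```python
from types import SimpleNamespace

def geocode_with_overrides(names, geocode, overrides):
    """Return {name: (lat, lng)}; manual overrides win over geocoder results."""
    coords = {}
    for name in names:
        if name in overrides:
            coords[name] = overrides[name]
            continue
        location = geocode(name + ", Sindh, Pakistan")
        # the geocoder is expected to return None when nothing is found
        coords[name] = None if location is None else (location.latitude, location.longitude)
    return coords

# manual corrections as (lat, lng), taken from Appendix A
MANUAL_FIXES = {
    "Sanghar": (26.0455708, 68.9316359),
    "Karachi Central": (24.944166, 66.9784251),
}

def fake_geocode(address):
    # offline stand-in for a real geocoder (assumption for this sketch)
    known = {
        "Badin, Sindh, Pakistan": SimpleNamespace(latitude=24.6551671, longitude=68.8401509),
    }
    return known.get(address)

result = geocode_with_overrides(["Badin", "Sanghar", "Unknown"], fake_geocode, MANUAL_FIXES)
print(result)
```

With geopy, the `geocode` argument would be the `geocode` method of a `Nominatim` instance, as in Appendix A.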

The Foursquare API was used to search for nearby medical centers.  A summary of the results is provided in **Appendix B** below.
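
As a hedged alternative to formatting the request URL by hand, the query can be expressed as a parameter dictionary and passed to `requests.get(..., params=...)`. The sketch below makes two assumptions that differ from Appendix B: it sends the coordinates through the `ll` parameter rather than `near`, and it defaults `limit` to 50 because no district in the results returned more than 49 venues. `build_search_params` is a hypothetical helper name.

```python
def build_search_params(lat, lng, client_id, client_secret,
                        version="20180605", radius_m=100000, limit=50,
                        category_id="4bf58dd8d48988d104941735"):
    """Query parameters for a venues/search request like the one in Appendix B."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # assumption: "lat,lng" string
        "radius": radius_m,              # meters
        "limit": limit,
        "categoryId": category_id,       # category ID used in Appendix B
    }

params = build_search_params(24.6551671, 68.8401509, "ID", "SECRET")
print(params["ll"])

# The request itself would then be (not executed here):
# requests.get("https://api.foursquare.com/v2/venues/search", params=params)
```

Passing a dictionary lets `requests` handle URL encoding and keeps the radius and limit visible as named values.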

## Limitations
The results of this work depend on the accuracy and completeness of the Foursquare API data.
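
Two patterns in the venue counts of Appendix C illustrate this dependence: several districts return exactly 49 venues even though 200 were requested, which suggests the results may be truncated, and two districts return no venues at all. The sketch below flags both cases. Treating the largest observed count as a cap is an assumption, and `flag_suspect_counts` is a hypothetical helper; the counts are copied from the Appendix C tables.

```python
# venue counts per district, copied from Appendix C
venue_counts = {
    "Umarkot": 0, "Tharparkar": 0, "Dadu": 5, "Kashmore": 5, "Nawabshah": 14,
    "Jamshoro": 47, "Thatta": 49, "Sujawal": 49, "Karachi Central": 49,
}

def flag_suspect_counts(counts):
    """Return (districts at the observed maximum, districts with no venues)."""
    cap = max(counts.values())  # assumed to be the API's effective result cap
    saturated = sorted(d for d, n in counts.items() if n == cap)
    empty = sorted(d for d, n in counts.items() if n == 0)
    return saturated, empty

saturated, empty = flag_suspect_counts(venue_counts)
print("possibly truncated:", saturated)
print("no data returned:", empty)
```

Counts in the first group are better read as "at least 49", and a zero in the second group may reflect missing data rather than a genuine absence of facilities.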


### Request
**I am not sure whether the above level of detail is sufficient. Please feel free to comment on whether any of the above needs more elaboration.**

# Appendix A

In [128]:
import pandas as pd
import numpy as np

In [3]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [65]:
import os
import requests # library to handle requests

# read the Foursquare credentials from the environment instead of hard-coding them
# (the environment variable names below are placeholders)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 10000 # search radius in meters

In [5]:
# Webpage url                                                                                                               
url = 'https://en.wikipedia.org/wiki/Districts_of_Sindh'

# Extract tables
dfs = pd.read_html(url)

df = dfs[0]
df.dropna(how='all',inplace=True)
print(df.shape)
#print(df.columns)
#print(len(df.index))

(29, 7)


In [6]:
# fix district names
df.loc[(df.District == 'Naushahro Firoze'), 'District'] = 'Naushahro Feroze'
df.loc[(df.District == 'Shaheed Benazirabad (formerly Nawabshah)'), 'District'] = 'Nawabshah'
df.loc[(df.District == 'Umerkot'), 'District'] = 'Umarkot'
df.loc[(df.District == 'Qambar Shahdadkot'), 'District'] = 'Qambar'

In [7]:
# let's see whether we can get coordinates from geopy

df['lng'] = 0.0
df['lat'] = 0.0

# create the geocoder once and reuse it for every district
geolocator = Nominatim(user_agent="pakistan")

for ind in df.index:
    name = df.at[ind, 'District']
    address = name + ', Sindh, Pakistan'

    location = geolocator.geocode(address)
    if location is None:
        print('The geographical coordinates of District {} were not found!'.format(name))
    else:
        latitude = location.latitude
        longitude = location.longitude
        df.at[ind, 'lng'] = longitude
        df.at[ind, 'lat'] = latitude

df

Unnamed: 0,Map,Sr. No.,District,Headquarters,Area (km²),Population (in 2017),Density (people/km²),lng,lat
0,,1,Badin,Badin,6470.0,1804516.0,279.0,68.840151,24.655167
1,,2,Dadu,Dadu,8034.0,1550266.0,193.0,67.771833,26.732137
2,,3,Ghotki,Mirpur Mathelo,6506.0,1647239.0,253.0,69.31411,28.002629
3,,4,Hyderabad,Hyderabad,1022.0,2201079.0,2155.0,68.375038,25.380102
4,,5,Jacobabad,Jacobabad,2771.0,1006297.0,363.0,68.436436,28.281309
5,,6,Jamshoro,Jamshoro,11250.0,993142.0,88.0,68.266172,25.400723
6,,7,Karachi Central,Karachi,62.0,2972639.0,48336.0,67.184777,25.14469
7,,8,Kashmore,Kandhkot,2551.0,1089169.0,427.0,69.581475,28.432292
8,,9,Khairpur,Khairpur,15925.0,2405523.0,151.0,68.763411,27.52954
9,,10,Larkana,Larkana,1906.0,1524391.0,800.0,68.210151,27.55648


In [47]:
# fix incorrect longitudes and latitudes
df.loc[(df.District == 'Sanghar'), 'lng'] = 68.9316359
df.loc[(df.District == 'Sanghar'), 'lat'] = 26.0455708
# karachi central 24.944166,66.9784251
df.loc[(df.District == 'Karachi Central'), 'lng'] = 66.9784251
df.loc[(df.District == 'Karachi Central'), 'lat'] = 24.944166
# karachi east 24.8708942,67.0466539
df.loc[(df.District == 'Karachi East'), 'lng'] = 67.0466539
df.loc[(df.District == 'Karachi East'), 'lat'] = 24.8708942
# karachi south 24.844128,66.980173
df.loc[(df.District == 'Karachi South'), 'lng'] = 66.980173
df.loc[(df.District == 'Karachi South'), 'lat'] = 24.844128
# karachi west 24.969503,66.959243
df.loc[(df.District == 'Karachi West'), 'lng'] = 66.959243
df.loc[(df.District == 'Karachi West'), 'lat'] = 24.969503
df

Unnamed: 0,Map,Sr. No.,District,Headquarters,Area (km²),Population (in 2017),Density (people/km²),lng,lat
0,,1,Badin,Badin,6470.0,1804516.0,279.0,68.840151,24.655167
1,,2,Dadu,Dadu,8034.0,1550266.0,193.0,67.771833,26.732137
2,,3,Ghotki,Mirpur Mathelo,6506.0,1647239.0,253.0,69.31411,28.002629
3,,4,Hyderabad,Hyderabad,1022.0,2201079.0,2155.0,68.375038,25.380102
4,,5,Jacobabad,Jacobabad,2771.0,1006297.0,363.0,68.436436,28.281309
5,,6,Jamshoro,Jamshoro,11250.0,993142.0,88.0,68.266172,25.400723
6,,7,Karachi Central,Karachi,62.0,2972639.0,48336.0,66.978425,24.944166
7,,8,Kashmore,Kandhkot,2551.0,1089169.0,427.0,69.581475,28.432292
8,,9,Khairpur,Khairpur,15925.0,2405523.0,151.0,68.763411,27.52954
9,,10,Larkana,Larkana,1906.0,1524391.0,800.0,68.210151,27.55648


# Appendix B

In [68]:
venues_list=[]

for name, lng, lat in zip(df['District'], df['lng'], df['lat']):
    
    print( "Getting venues of district {} with lat: {} and lng: {}".format(name, lat, lng))
    # create the API request URL
    url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&near={},{}&radius={}&limit={}&categoryId={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng, 
        100000, # search radius in meters (100 km)
        200, # requested number of results
        '4bf58dd8d48988d104941735') # category ID for medical centers

    # make the GET request
    temp = requests.get(url).json()
    #print(temp)
    results = temp["response"]['venues']
    #if len(results) > 0:
    #    print('first result of {}:'.format(name))
    #    print(results[0])

    # return only relevant information for each nearby venue
    venues_list.append([(
    name,
    lat,
    lng,
    v['location']['address'] if 'address' in v['location'] else '', 
    v['name'], 
    v['location']['lat'], 
    v['location']['lng'],  
    v['categories'][0]['name'] if len(v['categories']) > 0 else '') for v in results])

nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
nearby_venues.columns = ['District', 'Latitude', 'Longitude',
              'Venue Address', 
              'Venue', 
              'Venue Latitude', 
              'Venue Longitude', 
              'Venue Category']
print(nearby_venues.shape)
nearby_venues.head(15)

Getting venues of district Badin with lat: 24.6551671 and lng: 68.8401509
Getting venues of district Dadu with lat: 26.7321366 and lng: 67.7718334
Getting venues of district Ghotki with lat: 28.00262865 and lng: 69.31411028830948
Getting venues of district Hyderabad with lat: 25.3801017 and lng: 68.3750376
Getting venues of district Jacobabad with lat: 28.2813094 and lng: 68.4364361
Getting venues of district Jamshoro with lat: 25.4007232 and lng: 68.266172
Getting venues of district Karachi Central with lat: 24.944166 and lng: 66.9784251
Getting venues of district Kashmore with lat: 28.4322915 and lng: 69.5814755
Getting venues of district Khairpur with lat: 27.5295402 and lng: 68.7634109
Getting venues of district Larkana with lat: 27.5564798 and lng: 68.2101509
Getting venues of district Matiari with lat: 25.5971858 and lng: 68.4454874
Getting venues of district Mirpur Khas with lat: 25.5262817 and lng: 69.0110617
Getting venues of district Naushahro Feroze with lat: 26.8491233 and 

Unnamed: 0,District,Latitude,Longitude,Venue Address,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Badin,24.655167,68.840151,,Afzal medical centre,25.401089,68.321594,Hospital
1,Badin,24.655167,68.840151,,Saddar Doctor's Line,25.398275,68.36851,Medical Center
2,Badin,24.655167,68.840151,,Wali Bhai Rajputana Hospital,25.417656,68.345596,Hospital
3,Badin,24.655167,68.840151,Near Tower Market,Liaquat University Hospital,25.403516,68.367846,Hospital
4,Badin,24.655167,68.840151,LUMHS Colloney,Civil Hospital Jamshoro,25.431655,68.276126,Hospital
5,Badin,24.655167,68.840151,,Dr Azam Clinic,25.402332,68.375298,Doctor's Office
6,Badin,24.655167,68.840151,Latifabad #6,Hilal-e-Ahmer Hospital,25.365412,68.352836,Hospital
7,Badin,24.655167,68.840151,,Civil Hospital Hyderabad,25.401903,68.367101,Hospital
8,Badin,24.655167,68.840151,Near CTS Coaching Centre,G.G Jhagrani Medicare,25.400939,68.37007,Medical Center
9,Badin,24.655167,68.840151,Autobhan,Dr Safdar,25.388852,68.349872,Hospital


In [69]:
#nearby_venues.groupby(['District', 'Venue Category']).agg({'Venue': 'count', 'Venue Distance': ['min', 'max']})
nearby_venues.groupby('District').agg({'Venue': 'count', 'Venue Category': 'nunique'})

Unnamed: 0_level_0,Venue,Venue Category
District,Unnamed: 1_level_1,Unnamed: 2_level_1
Badin,11,3
Dadu,5,3
Ghotki,13,4
Hyderabad,13,3
Jacobabad,9,4
Jamshoro,47,6
Karachi Central,49,6
Karachi East,49,6
Karachi South,49,6
Karachi West,49,6


# Appendix C

In [30]:
# lets calculate the distance
from math import sin, cos, sqrt, atan2, radians

def calc_dist(x):

    # approximate radius of earth in km
    R = 6373.0

    lat1 = radians(x[0])
    lng1 = radians(x[1])
    lat2 = radians(x[2])
    lng2 = radians(x[3])

    dlon = lng2 - lng1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    return R * c
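
The function above is the haversine formula. As a quick check, the standalone sketch below applies the same formula to the first Badin row of the venue table (Afzal medical centre), for which the distance column in this appendix lists about 98.06 km. `haversine_km` is a hypothetical name; it takes the four coordinates as separate arguments instead of a row.

```python
from math import radians, sin, cos, sqrt, atan2

def haversine_km(lat1, lng1, lat2, lng2, earth_radius_km=6373.0):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return earth_radius_km * 2 * atan2(sqrt(a), sqrt(1 - a))

# Badin district coordinates -> Afzal medical centre
d = haversine_km(24.655167, 68.840151, 25.401089, 68.321594)
print(round(d, 2))
```

A venue this far from the district coordinates falls outside all three thresholds (10, 20 and 50 km) used in the next cell.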


In [79]:
# distance in km between each district's coordinates and each venue
nearby_venues['distance'] = nearby_venues[['Latitude','Longitude','Venue Latitude','Venue Longitude']].apply(calc_dist,axis=1)
nearby_venues['Sub10K'] = np.where(nearby_venues['distance'] <= 10.0, 1, 0)
nearby_venues['Sub20K'] = np.where(nearby_venues['distance'] <= 20.0, 1, 0)
nearby_venues['Sub50K'] = np.where(nearby_venues['distance'] <= 50.0, 1, 0)
#columns = ['Sub10K', 'Sub20K', 'Sub50K']
#nearby_venues.drop(columns, inplace=True, axis=1)
nearby_venues.head(30)

Unnamed: 0,District,Latitude,Longitude,Venue Address,Venue,Venue Latitude,Venue Longitude,Venue Category,distance,Sub10K,Sub20K,Sub50K
0,Badin,24.655167,68.840151,,Afzal medical centre,25.401089,68.321594,Hospital,98.056911,0,0,0
1,Badin,24.655167,68.840151,,Saddar Doctor's Line,25.398275,68.36851,Medical Center,95.349407,0,0,0
2,Badin,24.655167,68.840151,,Wali Bhai Rajputana Hospital,25.417656,68.345596,Hospital,98.371765,0,0,0
3,Badin,24.655167,68.840151,Near Tower Market,Liaquat University Hospital,25.403516,68.367846,Hospital,95.887845,0,0,0
4,Badin,24.655167,68.840151,LUMHS Colloney,Civil Hospital Jamshoro,25.431655,68.276126,Hospital,103.392746,0,0,0
5,Badin,24.655167,68.840151,,Dr Azam Clinic,25.402332,68.375298,Doctor's Office,95.402671,0,0,0
6,Badin,24.655167,68.840151,Latifabad #6,Hilal-e-Ahmer Hospital,25.365412,68.352836,Hospital,93.026424,0,0,0
7,Badin,24.655167,68.840151,,Civil Hospital Hyderabad,25.401903,68.367101,Hospital,95.769651,0,0,0
8,Badin,24.655167,68.840151,Near CTS Coaching Centre,G.G Jhagrani Medicare,25.400939,68.37007,Medical Center,95.528016,0,0,0
9,Badin,24.655167,68.840151,Autobhan,Dr Safdar,25.388852,68.349872,Hospital,95.402497,0,0,0


In [152]:
nearby_venues.groupby('District')[['Sub10K', 'Sub20K']].sum().reset_index().sort_values(by=['Sub10K', 'Sub20K'], ascending=False).reset_index()

Unnamed: 0,index,District,Sub10K,Sub20K
0,7,Karachi East,43,49
1,12,Korangi,27,49
2,8,Karachi South,15,45
3,3,Hyderabad,10,11
4,14,Malir,7,47
5,6,Karachi Central,5,47
6,5,Jamshoro,5,9
7,11,Khairpur,4,4
8,9,Karachi West,2,45
9,13,Larkana,2,2


In [149]:
countdf = nearby_venues.groupby('District').count()[['Venue']].reset_index()
mergecountdf = df[['District', 'lat', 'lng']]
mergecountdf = mergecountdf.join(countdf.set_index('District'), on='District')
mergecountdf.fillna(0, inplace=True)
mergecountdf = mergecountdf.sort_values(by='Venue').reset_index()
mergecountdf

Unnamed: 0,index,District,lat,lng,Venue
0,22,Umarkot,25.36553,69.740126,0.0
1,20,Tharparkar,24.943186,70.241012,0.0
2,1,Dadu,26.732137,67.771833,5.0
3,7,Kashmore,28.432292,69.581475,5.0
4,14,Qambar,27.589243,67.999675,9.0
5,4,Jacobabad,28.281309,68.436436,9.0
6,8,Khairpur,27.52954,68.763411,9.0
7,9,Larkana,27.55648,68.210151,9.0
8,17,Sukkur,27.696188,68.858875,9.0
9,16,Shikarpur,27.957798,68.646551,9.0


In [150]:
mergecountdf.sort_values(by='Venue', ascending=False)

Unnamed: 0,index,District,lat,lng,Venue
28,28,Malir,24.895919,67.196919,49.0
27,26,Karachi West,24.969503,66.959243,49.0
26,25,Karachi South,24.844128,66.980173,49.0
25,24,Karachi East,24.870894,67.046654,49.0
24,23,Sujawal,24.603377,68.0789,49.0
23,21,Thatta,24.7469,67.924028,49.0
22,6,Karachi Central,24.944166,66.978425,49.0
21,27,Korangi,24.826819,67.129813,49.0
20,5,Jamshoro,25.400723,68.266172,47.0
19,13,Nawabshah,26.245292,68.404023,14.0


In [93]:
grouped = nearby_venues.groupby(['District','Venue Category']).agg({'Venue': 'count', 'Sub10K': 'sum', 'Sub20K': 'sum', 'Sub50K': 'sum'}).reset_index()
print(grouped.columns)
grouped.head()

Index(['District', 'Venue Category', 'Venue', 'Sub10K', 'Sub20K', 'Sub50K'], dtype='object')


Unnamed: 0,District,Venue Category,Venue,Sub10K,Sub20K,Sub50K
0,Badin,Doctor's Office,2,0,0,0
1,Badin,Hospital,7,0,0,0
2,Badin,Medical Center,2,0,0,0
3,Dadu,Doctor's Office,1,0,1,1
4,Dadu,Hospital,2,0,0,0


In [94]:
pivoted = grouped.pivot(index='District', columns='Venue Category').fillna(0).reset_index()
print(pivoted.shape)
print(pivoted.columns)
pivoted.head()

(27, 29)
MultiIndex(levels=[['Venue', 'Sub10K', 'Sub20K', 'Sub50K', 'District'], ['Dentist's Office', 'Doctor's Office', 'Emergency Room', 'Eye Doctor', 'Hospital', 'Medical Center', 'Medical Lab', '']],
           codes=[[4, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3], [7, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6]],
           names=[None, 'Venue Category'])


Unnamed: 0_level_0,District,Venue,Venue,Venue,Venue,Venue,Venue,Venue,Sub10K,Sub10K,...,Sub20K,Sub20K,Sub20K,Sub50K,Sub50K,Sub50K,Sub50K,Sub50K,Sub50K,Sub50K
Venue Category,Unnamed: 1_level_1,Dentist's Office,Doctor's Office,Emergency Room,Eye Doctor,Hospital,Medical Center,Medical Lab,Dentist's Office,Doctor's Office,...,Hospital,Medical Center,Medical Lab,Dentist's Office,Doctor's Office,Emergency Room,Eye Doctor,Hospital,Medical Center,Medical Lab
0,Badin,0.0,2.0,0.0,0.0,7.0,2.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Dadu,0.0,1.0,0.0,0.0,2.0,2.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,Ghotki,0.0,2.0,1.0,0.0,7.0,3.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3,Hyderabad,0.0,2.0,0.0,0.0,8.0,3.0,0.0,0.0,2.0,...,7.0,2.0,0.0,0.0,2.0,0.0,0.0,7.0,2.0,0.0
4,Jacobabad,0.0,2.0,1.0,0.0,4.0,2.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [82]:
# lets create a new dataframe and record number of centers of each type
onehot = pd.get_dummies(nearby_venues[['Venue Category']], prefix="", prefix_sep="")
# add district column back to dataframe
onehot['District'] = nearby_venues['District'] 

# move district column to the first column
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1])
onehot = onehot[fixed_columns]

onehot.head()

#nearby_grouped = onehot.groupby('District').sum().reset_index()
#nearby_grouped

Unnamed: 0,District,Dentist's Office,Doctor's Office,Emergency Room,Eye Doctor,Hospital,Medical Center,Medical Lab
0,Badin,0,0,0,0,1,0,0
1,Badin,0,0,0,0,0,1,0
2,Badin,0,0,0,0,1,0,0
3,Badin,0,0,0,0,1,0,0
4,Badin,0,0,0,0,1,0,0


In [125]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

kclusters = 4

clustering = pivoted.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 1, 1, 2, 3, 0, 3, 3, 1, 1, 0, 1, 3, 1, 1, 1, 1, 1, 1, 1,
       2, 1, 1, 1, 2], dtype=int32)

In [111]:
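# flatten the two-level column names, e.g. ('Venue', 'Hospital') -> 'Venue|Hospital'
# (guarded so that re-running the cell does not mangle already flattened names)
if isinstance(pivoted.columns, pd.MultiIndex):
    pivoted.columns = pivoted.columns.map('|'.join).str.strip('|')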

pivoted.head()

Unnamed: 0,Cluster Labels,District,Venue|Dentist's Office,Venue|Doctor's Office,Venue|Emergency Room,Venue|Eye Doctor,Venue|Hospital,Venue|Medical Center,Venue|Medical Lab,Sub10K|Dentist's Office,...,Sub20K|Hospital,Sub20K|Medical Center,Sub20K|Medical Lab,Sub50K|Dentist's Office,Sub50K|Doctor's Office,Sub50K|Emergency Room,Sub50K|Eye Doctor,Sub50K|Hospital,Sub50K|Medical Center,Sub50K|Medical Lab
0,1,Badin,0.0,2.0,0.0,0.0,7.0,2.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,Dadu,0.0,1.0,0.0,0.0,2.0,2.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,1,Ghotki,0.0,2.0,1.0,0.0,7.0,3.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3,1,Hyderabad,0.0,2.0,0.0,0.0,8.0,3.0,0.0,0.0,...,7.0,2.0,0.0,0.0,2.0,0.0,0.0,7.0,2.0,0.0
4,1,Jacobabad,0.0,2.0,1.0,0.0,4.0,2.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [126]:
# remove labels from a previous run, if any, before inserting the new ones
pivoted.drop('Cluster Labels', inplace=True, axis=1, errors='ignore')
pivoted.insert(0, 'Cluster Labels', kmeans.labels_)

merged = df[['District', 'lat', 'lng']]
# add the cluster labels and venue counts to each district's latitude/longitude
merged = merged.join(pivoted.set_index('District'), on='District')

# districts with no venues returned are missing from pivoted; assign them to cluster 1
merged['Cluster Labels'] = merged['Cluster Labels'].fillna(1)

print(merged.columns)
merged # check the last columns!

Index(['District', 'lat', 'lng', 'Cluster Labels', 'Venue|Dentist's Office',
       'Venue|Doctor's Office', 'Venue|Emergency Room', 'Venue|Eye Doctor',
       'Venue|Hospital', 'Venue|Medical Center', 'Venue|Medical Lab',
       'Sub10K|Dentist's Office', 'Sub10K|Doctor's Office',
       'Sub10K|Emergency Room', 'Sub10K|Eye Doctor', 'Sub10K|Hospital',
       'Sub10K|Medical Center', 'Sub10K|Medical Lab',
       'Sub20K|Dentist's Office', 'Sub20K|Doctor's Office',
       'Sub20K|Emergency Room', 'Sub20K|Eye Doctor', 'Sub20K|Hospital',
       'Sub20K|Medical Center', 'Sub20K|Medical Lab',
       'Sub50K|Dentist's Office', 'Sub50K|Doctor's Office',
       'Sub50K|Emergency Room', 'Sub50K|Eye Doctor', 'Sub50K|Hospital',
       'Sub50K|Medical Center', 'Sub50K|Medical Lab'],
      dtype='object')


Unnamed: 0,District,lat,lng,Cluster Labels,Venue|Dentist's Office,Venue|Doctor's Office,Venue|Emergency Room,Venue|Eye Doctor,Venue|Hospital,Venue|Medical Center,...,Sub20K|Hospital,Sub20K|Medical Center,Sub20K|Medical Lab,Sub50K|Dentist's Office,Sub50K|Doctor's Office,Sub50K|Emergency Room,Sub50K|Eye Doctor,Sub50K|Hospital,Sub50K|Medical Center,Sub50K|Medical Lab
0,Badin,24.655167,68.840151,1.0,0.0,2.0,0.0,0.0,7.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Dadu,26.732137,67.771833,1.0,0.0,1.0,0.0,0.0,2.0,2.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,Ghotki,28.002629,69.31411,1.0,0.0,2.0,1.0,0.0,7.0,3.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3,Hyderabad,25.380102,68.375038,1.0,0.0,2.0,0.0,0.0,8.0,3.0,...,7.0,2.0,0.0,0.0,2.0,0.0,0.0,7.0,2.0,0.0
4,Jacobabad,28.281309,68.436436,1.0,0.0,2.0,1.0,0.0,4.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Jamshoro,25.400723,68.266172,2.0,5.0,7.0,0.0,1.0,23.0,9.0,...,7.0,2.0,0.0,0.0,0.0,0.0,0.0,7.0,2.0,0.0
6,Karachi Central,24.944166,66.978425,3.0,5.0,6.0,1.0,0.0,33.0,3.0,...,31.0,3.0,1.0,5.0,6.0,1.0,0.0,33.0,3.0,1.0
7,Kashmore,28.432292,69.581475,1.0,0.0,1.0,0.0,0.0,3.0,1.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
8,Khairpur,27.52954,68.763411,1.0,0.0,2.0,1.0,0.0,4.0,2.0,...,2.0,0.0,0.0,0.0,1.0,1.0,0.0,3.0,1.0,0.0
9,Larkana,27.55648,68.210151,1.0,0.0,2.0,1.0,0.0,4.0,2.0,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0


In [156]:
merged[merged['Cluster Labels'] != 1.0][['District', 'Cluster Labels', 'Sub10K|Hospital', 'Sub20K|Hospital', 'Venue|Hospital']]

Unnamed: 0,District,Cluster Labels,Sub10K|Hospital,Sub20K|Hospital,Venue|Hospital
5,Jamshoro,2.0,5.0,7.0,23.0
6,Karachi Central,3.0,4.0,31.0,33.0
21,Thatta,2.0,0.0,0.0,33.0
23,Sujawal,2.0,0.0,0.0,33.0
24,Karachi East,0.0,27.0,33.0,33.0
25,Karachi South,3.0,11.0,29.0,33.0
26,Karachi West,3.0,1.0,30.0,33.0
27,Korangi,0.0,17.0,33.0,33.0
28,Malir,3.0,5.0,31.0,33.0


In [101]:
!conda install -c conda-forge folium=0.5.0 --yes # install folium if it is not already available
import folium # map rendering library

In [127]:
import matplotlib.cm as cm
import matplotlib.colors as colors


# center the map on the mean of the district coordinates
map_clusters = folium.Map(location=[merged['lat'].mean(), merged['lng'].mean()], zoom_start=7)

# set color scheme for the clusters
x = np.arange(kclusters+1)
ys = [i + x + (i*x)**2 for i in range(kclusters+1)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(merged['lat'], merged['lng'], merged['District'], merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters