# Capstone Project - Where to open a health facility in a developing country

## Introduction & Background
In a developing country like Pakistan, health facilities are often not available nearby, and people sometimes have to travel long distances to see a medical practitioner or specialist suited to their needs.  With scarce resources, it is important to make the right decision when opening a health facility to ensure maximum accessibility and coverage.
From a business point of view, one would open a facility where population density is higher and there is little competition.  In contrast, from an accessibility point of view, the facility should be located where there is little or no alternative available nearby.
Therefore, the factors to consider are:
* Population in the vicinity (within x miles/kilometers)
* Presence of existing facilities in the vicinity
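
The two factors can be combined into a rough ranking. The sketch below is only an illustration, not the method used in this project: the helper `people_per_facility` and its `+ 1` term (which counts the facility being planned and avoids dividing by zero) are assumptions. The figures are the 2017 district populations and the Foursquare venue counts reported in the appendices; note that the populations cover whole districts while the venue counts come from a fixed-radius search, so the numbers are indicative at best.

```python
def people_per_facility(population, facilities):
    """Residents per facility, counting one hypothetical new facility."""
    return population / (facilities + 1)

# (district, population in 2017, medical venues found by the search)
districts = [
    ("Badin", 1804516, 0),
    ("Hyderabad", 2201079, 11),
    ("Larkana", 1524391, 2),
]

# Districts with the most residents per facility come first
ranked = sorted(
    districts,
    key=lambda d: people_per_facility(d[1], d[2]),
    reverse=True,
)

for name, population, facilities in ranked:
    print(name, round(people_per_facility(population, facilities)))
```

A higher value suggests more people would be served by each facility, which fits both the business and the accessibility point of view.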

## Data
Because the required data is not readily available and has to be gathered manually or through other APIs, the scope is restricted to some or all districts of Sindh Province.  Pakistan has four provinces (and other administrative units).  Sindh Province has 29 districts.  Depending on the time available, data for some or all of the 29 districts will be used for the project.

The Foursquare API will be used to explore the presence of medical centers; this is one of Foursquare's parent categories and includes various sub-categories.  Geopy will be used to get the geographic coordinates of the districts.


### Initial Data Exploration
Data about districts of Sindh province was found here: https://en.wikipedia.org/wiki/Districts_of_Sindh

Geopy was used to obtain the coordinates; some of them were incorrect and were corrected manually.  The work done is provided in **Appendix A** below.
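
One way to spot grossly wrong geocoding results before correcting them by hand is a bounding-box check. The following is a minimal sketch under stated assumptions: the box limits are rough approximations chosen for illustration (not an official boundary), `outside_sindh` is a hypothetical helper, and the "Example outlier" entry is made up to show a failing case.

```python
# Approximate bounding box for Sindh (assumed values, for illustration only)
SINDH_LAT = (23.5, 28.6)
SINDH_LNG = (66.6, 71.2)

def outside_sindh(lat, lng):
    """Return True when a coordinate falls outside the rough Sindh box."""
    lat_ok = SINDH_LAT[0] <= lat <= SINDH_LAT[1]
    lng_ok = SINDH_LNG[0] <= lng <= SINDH_LNG[1]
    return not (lat_ok and lng_ok)

# (lat, lng) pairs; the first two are taken from the geocoding output
coords = {
    "Badin": (24.655167, 68.840151),
    "Kashmore": (28.432292, 69.581475),
    "Example outlier": (40.0, -75.0),  # hypothetical bad geocoding result
}

suspect = [name for name, (lat, lng) in coords.items() if outside_sindh(lat, lng)]
print(suspect)
```

A check like this only catches results that land outside the province; a point that is inside Sindh but in the wrong district still needs manual review.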

The Foursquare API was used to search for medical centers nearby.  A summary of the results is provided in **Appendix B** below.

## Limitations
The results of this work depend on the accuracy of the Foursquare API data.


### Request
**Not sure whether the above level of detail is sufficient; please feel free to comment on whether any of the above needs more elaboration.**

# Appendix A

In [2]:
import pandas as pd
import numpy as np

In [3]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [4]:
import os
import requests # library to handle requests

# Credentials are read from the environment so that they are not stored in the notebook.
# NOTE: the environment variable names below are an assumed convention, not part of the Foursquare API.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 10000 # define radius (in meters)

In [5]:
# Webpage url
url = 'https://en.wikipedia.org/wiki/Districts_of_Sindh'

# Extract tables
dfs = pd.read_html(url)

df = dfs[0]
df.dropna(how='all',inplace=True)
print(df.shape)
#print(df.columns)
#print(len(df.index))

(29, 7)


In [6]:
# fix district names
df.loc[(df.District == 'Naushahro Firoze'), 'District'] = 'Naushahro Feroze'
df.loc[(df.District == 'Shaheed Benazirabad (formerly Nawabshah)'), 'District'] = 'Nawabshah'
df.loc[(df.District == 'Umerkot'), 'District'] = 'Umarkot'
df.loc[(df.District == 'Qambar Shahdadkot'), 'District'] = 'Qambar'

In [7]:
# let's see whether we can get coordinates from geopy

df['lng'] = 0.0
df['lat'] = 0.0

# create the geocoder once and reuse it for every district
geolocator = Nominatim(user_agent="pakistan")

# iterate over the actual index labels; after dropna they may not form a gap-free 0..n-1 range
for ind in df.index:
    name = df.at[ind, 'District']
    address = name + ', Sindh, Pakistan'

    location = geolocator.geocode(address)
    if location is None:
        print('The geographical coordinates of District {} were not found!'.format(name))
    else:
        df.at[ind, 'lng'] = location.longitude
        df.at[ind, 'lat'] = location.latitude

df

| | Map | Sr. No. | District | Headquarters | Area (km²) | Population (in 2017) | Density (people/km²) | lng | lat |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | 1 | Badin | Badin | 6470.0 | 1804516.0 | 279.0 | 68.840151 | 24.655167 |
| 1 | | 2 | Dadu | Dadu | 8034.0 | 1550266.0 | 193.0 | 67.771833 | 26.732137 |
| 2 | | 3 | Ghotki | Mirpur Mathelo | 6506.0 | 1647239.0 | 253.0 | 69.31411 | 28.002629 |
| 3 | | 4 | Hyderabad | Hyderabad | 1022.0 | 2201079.0 | 2155.0 | 68.375038 | 25.380102 |
| 4 | | 5 | Jacobabad | Jacobabad | 2771.0 | 1006297.0 | 363.0 | 68.436436 | 28.281309 |
| 5 | | 6 | Jamshoro | Jamshoro | 11250.0 | 993142.0 | 88.0 | 68.266172 | 25.400723 |
| 6 | | 7 | Karachi Central | Karachi | 62.0 | 2972639.0 | 48336.0 | 67.184777 | 25.14469 |
| 7 | | 8 | Kashmore | Kandhkot | 2551.0 | 1089169.0 | 427.0 | 69.581475 | 28.432292 |
| 8 | | 9 | Khairpur | Khairpur | 15925.0 | 2405523.0 | 151.0 | 68.763411 | 27.52954 |
| 9 | | 10 | Larkana | Larkana | 1906.0 | 1524391.0 | 800.0 | 68.210151 | 27.55648 |


In [8]:
# fix incorrect longitudes and latitudes
df.loc[(df.District == 'Sanghar'), 'lng'] = 68.9316359
df.loc[(df.District == 'Sanghar'), 'lat'] = 26.0455708
df

| | Map | Sr. No. | District | Headquarters | Area (km²) | Population (in 2017) | Density (people/km²) | lng | lat |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | 1 | Badin | Badin | 6470.0 | 1804516.0 | 279.0 | 68.840151 | 24.655167 |
| 1 | | 2 | Dadu | Dadu | 8034.0 | 1550266.0 | 193.0 | 67.771833 | 26.732137 |
| 2 | | 3 | Ghotki | Mirpur Mathelo | 6506.0 | 1647239.0 | 253.0 | 69.31411 | 28.002629 |
| 3 | | 4 | Hyderabad | Hyderabad | 1022.0 | 2201079.0 | 2155.0 | 68.375038 | 25.380102 |
| 4 | | 5 | Jacobabad | Jacobabad | 2771.0 | 1006297.0 | 363.0 | 68.436436 | 28.281309 |
| 5 | | 6 | Jamshoro | Jamshoro | 11250.0 | 993142.0 | 88.0 | 68.266172 | 25.400723 |
| 6 | | 7 | Karachi Central | Karachi | 62.0 | 2972639.0 | 48336.0 | 67.184777 | 25.14469 |
| 7 | | 8 | Kashmore | Kandhkot | 2551.0 | 1089169.0 | 427.0 | 69.581475 | 28.432292 |
| 8 | | 9 | Khairpur | Khairpur | 15925.0 | 2405523.0 | 151.0 | 68.763411 | 27.52954 |
| 9 | | 10 | Larkana | Larkana | 1906.0 | 1524391.0 | 800.0 | 68.210151 | 27.55648 |


# Appendix B

In [9]:
venues_list=[]

for name, lng, lat in zip(df['District'], df['lng'], df['lat']):
    
    print( "Getting venues of district {} with lat: {} and lng: {}".format(name, lat, lng))
    # build the query parameters; requests takes care of URL encoding
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'near': '{},{}'.format(lat, lng),
        'radius': 15000, # search radius in meters
        'limit': 200,
        'categoryId': '4bf58dd8d48988d104941735', # medical centers parent category
    }

    # make the GET request
    results = requests.get('https://api.foursquare.com/v2/venues/search', params=params).json()["response"]['venues']
    print(len(results))

    # return only relevant information for each nearby venue
    venues_list.append([(
    name, 
    v['location']['distance'] if 'distance' in v['location'] else '', 
    v['location']['address'] if 'address' in v['location'] else '', 
    v['name'], 
    v['location']['lat'], 
    v['location']['lng'],  
    v['categories'][0]['name'] if len(v['categories']) > 0 else '') for v in results])

nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
nearby_venues.columns = ['District', 
              'Venue Distance', 
              'Venue Address', 
              'Venue', 
              'Venue Latitude', 
              'Venue Longitude', 
              'Venue Category']
nearby_venues.head(5)

Getting venues of district Badin with lat: 24.6551671 and lng: 68.8401509
0
Getting venues of district Dadu with lat: 26.7321366 and lng: 67.7718334
0
Getting venues of district Ghotki with lat: 28.00262865 and lng: 69.31411028830948
0
Getting venues of district Hyderabad with lat: 25.3801017 and lng: 68.3750376
11
Getting venues of district Jacobabad with lat: 28.2813094 and lng: 68.4364361
0
Getting venues of district Jamshoro with lat: 25.4007232 and lng: 68.266172
11
Getting venues of district Karachi Central with lat: 25.1446897 and lng: 67.1847767315734
4
Getting venues of district Kashmore with lat: 28.4322915 and lng: 69.5814755
0
Getting venues of district Khairpur with lat: 27.5295402 and lng: 68.7634109
4
Getting venues of district Larkana with lat: 27.5564798 and lng: 68.2101509
2
Getting venues of district Matiari with lat: 25.5971858 and lng: 68.4454874
0
Getting venues of district Mirpur Khas with lat: 25.5262817 and lng: 69.0110617
0
Getting venues of district Naushahro

| | District | Venue Distance | Venue Address | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Hyderabad | | Saddar | Doctor's Line | 25.401572 | 68.368609 | Doctor's Office |
| 1 | Hyderabad | | | Afzal medical centre | 25.401089 | 68.321594 | Hospital |
| 2 | Hyderabad | | | Civil Hospital Hyderabad | 25.401903 | 68.367101 | Hospital |
| 3 | Hyderabad | | | Wali Bhai Rajputana Hospital | 25.417656 | 68.345596 | Hospital |
| 4 | Hyderabad | | Autobhan | Dr Safdar | 25.388852 | 68.349872 | Hospital |
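
The `Venue Distance` column is empty in the output above, but a distance can be derived from the coordinates that are present. The sketch below uses the haversine great-circle formula; `haversine_km` is a hypothetical helper written for this illustration and is not part of the notebook. The two points are the Hyderabad district coordinates from Appendix A and the first venue listed above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two (lat, lng) points."""
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

hyderabad = (25.380102, 68.375038)     # district coordinates (lat, lng)
doctors_line = (25.401572, 68.368609)  # venue coordinates (lat, lng)

distance = haversine_km(*hyderabad, *doctors_line)
print(round(distance, 1), "km")
```

Applied to every row, a function like this could fill the empty column and support the "within x kilometers" criterion from the introduction.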


In [10]:
#nearby_venues.groupby(['District', 'Venue Category']).agg({'Venue': 'count', 'Venue Distance': ['min', 'max']})
nearby_venues.groupby('District').agg({'Venue': 'count', 'Venue Category': 'nunique'})

| District | Venue | Venue Category |
|---|---|---|
| Hyderabad | 11 | 3 |
| Jamshoro | 11 | 3 |
| Karachi Central | 4 | 3 |
| Karachi East | 4 | 3 |
| Karachi South | 4 | 3 |
| Karachi West | 4 | 3 |
| Khairpur | 4 | 3 |
| Korangi | 49 | 6 |
| Larkana | 2 | 2 |
| Malir | 49 | 8 |
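
The grouped summary only lists districts with at least one venue, so districts where the search returned 0 results (Badin and Dadu in the log above, for example) drop out, even though they matter most from an accessibility point of view. The following is a minimal sketch, in plain Python with abbreviated sample data, of keeping those districts with an explicit zero; the variable names are illustrative assumptions.

```python
from collections import Counter

# District of each venue returned by the search (abbreviated sample)
venue_districts = ["Hyderabad"] * 11 + ["Larkana"] * 2

# Every district that was queried, including those with no results
all_districts = ["Badin", "Dadu", "Hyderabad", "Larkana"]

counts = Counter(venue_districts)

# A Counter returns 0 for missing keys, so empty districts are kept
venue_counts = {district: counts[district] for district in all_districts}
print(venue_counts)
```

With pandas, a similar effect could probably be achieved by reindexing the grouped result on the full list of districts with `fill_value=0`.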
