# Data Science Capstone Project - The Battle of the Neighborhoods (Week 1)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

When a stakeholder plans to build a hospital to help people in emergency situations, it is useful to compute the distances to existing hospitals and to find the areas with few or no hospitals. The purpose of this project is therefore to find some optimal locations for a new hospital.

In particular, we are interested in locations in **Bangalore, India**, and we will focus on areas with few or no hospitals: areas with **no hospitals** or with **fewer than 2 hospitals** within 1 km, as **close to the city center** as possible.

We will use data analysis and a clustering algorithm to identify the most suitable neighborhoods based on these criteria, and we will recommend 5 neighborhoods to the stakeholder.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing hospitals in the neighborhood
* distance from the neighborhood to the nearest existing hospital

We decided to use a regularly spaced grid of locations, centered around the city center, to define our neighborhoods.

The following data sources will be needed to extract or generate the required information:
* centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using **Google Maps API reverse geocoding**
* number of hospitals and their type and location in every neighborhood will be obtained using **Foursquare API**

### Neighborhood Candidates

First, we will find the latitude and longitude of Shivaji Nagar, Bangalore, India (the central part of Bangalore). Then we will create a grid of cells covering our area of interest, which extends roughly 24 km around the city center.

Let's first find the latitude and longitude of the city center, using a specific, well-known address and the Google Maps Geocoding API.

In [292]:
# install dependencies

# !pip install folium
# !pip install shapely
# !pip install pyproj

In [293]:
# The code was removed by Watson Studio for sharing.
# (This hidden cell defined the Google Maps API key, google_api_key, used below.)

In [294]:
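import requests

def get_coordinates(api_key, address, verbose=False):
    """Geocode `address` with the Google Maps Geocoding API; return [lat, lon], or [None, None] on failure."""
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    try:
        # Passing params lets requests URL-encode the address (spaces, commas, '&', ...)
        response = requests.get(url, params={'key': api_key, 'address': address}, timeout=10).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # coordinates of the first result
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # Network error, non-JSON reply, or no results for this address
        return [None, None]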
    
location_center_address = 'shivaji Nagar, Bengaluru'
location_center = get_coordinates(google_api_key, location_center_address)
print('Coordinate of {}: {}'.format(location_center_address, location_center))

Coordinate of shivaji Nagar, Bengaluru: [12.9856503, 77.60569269999999]
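The original version of this cell formatted the address directly into the query string. The following sketch shows the same request/response handling using only the standard library. `urllib.parse.urlencode` escapes spaces and commas. It also escapes characters such as `&` or `#`, which would otherwise break a hand-built query string. The helper names (`build_geocode_url`, `parse_first_location`), the placeholder key `MY_KEY` and the sample response are hypothetical. The sample only mimics the fields the notebook reads (`results[0].geometry.location.lat/lng`).

```python
from urllib.parse import urlencode

GEOCODE_URL = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(api_key, address):
    """Build a forward-geocoding URL; urlencode escapes spaces, commas, '&', '#', etc."""
    return GEOCODE_URL + '?' + urlencode({'key': api_key, 'address': address})

def parse_first_location(response):
    """Return [lat, lon] of the first result in a geocoding JSON response, or [None, None]."""
    results = response.get('results') or []
    if not results:
        return [None, None]
    location = results[0]['geometry']['location']
    return [location['lat'], location['lng']]

# 'MY_KEY' is a placeholder; the response below is hand-written, not real API output
print(build_geocode_url('MY_KEY', 'shivaji Nagar, Bengaluru'))
# → https://maps.googleapis.com/maps/api/geocode/json?key=MY_KEY&address=shivaji+Nagar%2C+Bengaluru
sample = {'status': 'OK',
          'results': [{'geometry': {'location': {'lat': 12.9856503, 'lng': 77.60569269999999}}}]}
print(parse_first_location(sample))  # → [12.9856503, 77.60569269999999]
```
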


Now let's create a grid of candidate areas, equally spaced, centered around the city center and within ~24 km of Shivaji Nagar. Our neighborhoods will be defined as circular areas with a radius of 1200 meters, so neighboring centers will be 2400 meters apart.

To calculate distances accurately, we create our grid of locations in a Cartesian 2D coordinate system. This lets us measure distances in meters rather than in latitude/longitude degrees. We then project those coordinates back to latitude/longitude degrees to display them on a Folium map. So let's create functions to convert between two systems: the WGS84 geographic coordinate system (latitude/longitude in degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters). Bangalore lies in UTM zone 43N.
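The notebook relies on pyproj for an exact UTM projection. The sketch below illustrates why a planar coordinate system is good enough at city scale, without any dependencies. It uses a different, simpler technique: a local equirectangular projection around a fixed reference point on a spherical Earth, checked against the haversine great-circle distance. This is an approximation, not UTM. `make_local_projection` and `haversine_m` are hypothetical helper names.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical Earth model)

def make_local_projection(ref_lat, ref_lon):
    """Return (to_xy, to_lonlat) for a local equirectangular plane centered on
    (ref_lat, ref_lon): x is meters east, y is meters north of the reference."""
    cos_ref = math.cos(math.radians(ref_lat))

    def to_xy(lon, lat):
        x = EARTH_RADIUS_M * math.radians(lon - ref_lon) * cos_ref
        y = EARTH_RADIUS_M * math.radians(lat - ref_lat)
        return x, y

    def to_lonlat(x, y):
        lon = ref_lon + math.degrees(x / (EARTH_RADIUS_M * cos_ref))
        lat = ref_lat + math.degrees(y / EARTH_RADIUS_M)
        return lon, lat

    return to_xy, to_lonlat

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on the same spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Reference point: the geocoded Shivaji Nagar coordinates from above
center_lat, center_lon = 12.9856503, 77.6056927
to_xy, to_lonlat = make_local_projection(center_lat, center_lon)
x, y = to_xy(center_lon + 0.15, center_lat + 0.15)  # a point ~23 km to the north-east
print('planar distance   : {:.0f} m'.format(math.hypot(x, y)))
print('haversine distance: {:.0f} m'.format(
    haversine_m(center_lat, center_lon, center_lat + 0.15, center_lon + 0.15)))
```

Near Shivaji Nagar, the two distances agree to within a small fraction of a percent over ~23 km. This is why Euclidean distances in a projected plane (UTM in the notebook) are adequate for this analysis.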

In [295]:
radius = 1200
cover_distance = 24
cover_distance_in_meters = cover_distance * 1000

In [296]:
import math

import pyproj

# pyproj.transform() is deprecated; reusable Transformer objects are the recommended API.
# EPSG:4326 is WGS84 lon/lat, EPSG:32643 is WGS84 / UTM zone 43N (same as proj=utm, zone=43).
# always_xy=True keeps the (lon, lat) and (x, y) argument order.
_to_utm = pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:32643', always_xy=True)
_to_lonlat = pyproj.Transformer.from_crs('EPSG:32643', 'EPSG:4326', always_xy=True)

def lonlat_to_xy(lon, lat):
    """Convert WGS84 longitude/latitude (degrees) to UTM zone 43N X/Y (meters)."""
    return _to_utm.transform(lon, lat)

def xy_to_lonlat(x, y):
    """Convert UTM zone 43N X/Y (meters) back to WGS84 longitude/latitude (degrees)."""
    return _to_lonlat.transform(x, y)

def calc_xy_distance(x1, y1, x2, y2):
    """Euclidean distance in meters between two points in UTM coordinates."""
    return math.hypot(x2 - x1, y2 - y1)

print('Coordinate transformation check')
print('-------------------------------')
print('Location center longitude={}, latitude={}'.format(location_center[1], location_center[0]))
x, y = lonlat_to_xy(location_center[1], location_center[0])
print('Location center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Location center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Location center longitude=77.60569269999999, latitude=12.9856503
Location center UTM X=782669.2908568365, Y=1436993.904563177
Location center longitude=77.60569269999999, latitude=12.985650299999998


In [87]:
location_center_x, location_center_y = lonlat_to_xy(location_center[1], location_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = location_center_x - cover_distance_in_meters
x_step = 2*radius
y_min = location_center_y - cover_distance_in_meters - (int(21/k)*k*2*radius - 2*cover_distance_in_meters)/2
y_step = 2*radius * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = radius if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(location_center_x, location_center_y, x, y)
        if (distance_from_center <= (cover_distance_in_meters+1)):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
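The cell above builds a hexagonal (triangular-lattice) grid. Rows are `2*radius*k` apart with `k = sqrt(3)/2`, and every other row is shifted by `radius`, so each center is 2400 m from its six neighbors. Below is a symmetric, standalone sketch of the same idea. It also computes the covering radius of such a grid: the farthest any location can be from its nearest grid point, which is the circumradius of one lattice triangle, spacing/√3. The function names are hypothetical, and unlike the notebook's cell this grid is centered exactly on the given point.

```python
import math

def hex_grid(center_x, center_y, spacing, max_distance):
    """Points of a hexagonal (triangular-lattice) grid within max_distance of the center.

    Rows are spacing * sqrt(3) / 2 apart and every other row is shifted by spacing / 2,
    so every point is exactly `spacing` away from each of its six nearest neighbors."""
    row_step = spacing * math.sqrt(3) / 2
    n_rows = int(max_distance // row_step) + 1
    n_cols = int(max_distance // spacing) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * row_step
        x_offset = spacing / 2 if i % 2 else 0.0  # shift odd rows by half a step
        for j in range(-n_cols - 1, n_cols + 1):
            x = center_x + j * spacing + x_offset
            if math.hypot(x - center_x, y - center_y) <= max_distance:
                points.append((x, y))
    return points

def covering_radius(spacing):
    """Farthest any location can be from its nearest grid point: the circumradius
    of one lattice triangle, spacing / sqrt(3)."""
    return spacing / math.sqrt(3)

grid = hex_grid(0.0, 0.0, 2400, 24000)  # meters relative to the center
print(len(grid), 'grid points; covering radius about', round(covering_radius(2400)), 'm')
```

With the notebook's 2400 m spacing, `covering_radius(2400)` is about 1386 m. So circles of the 1200 m neighborhood radius, or of the 1300 m Foursquare query radius used below, leave small gaps around the triangle centers.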


In [91]:
import folium

map_location = folium.Map(location=location_center, zoom_start=11)
folium.Marker(location_center, popup=location_center_address).add_to(map_location)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=radius, color='blue', fill=False).add_to(map_location)
map_location

In [92]:
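def get_address(api_key, latitude, longitude, verbose=False):
    """Reverse-geocode a coordinate with the Google Maps Geocoding API; return the formatted address or None."""
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    try:
        params = {'key': api_key, 'latlng': '{},{}'.format(latitude, longitude)}
        response = requests.get(url, params=params, timeout=10).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']  # address of the first result
        return address
    except (requests.RequestException, ValueError, KeyError, IndexError):
        return None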

addr = get_address(google_api_key, location_center[0], location_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(location_center[0], location_center[1], addr))

Reverse geocoding check
-----------------------
Address of [12.9856503, 77.60569269999999] is: New Market Road, Sulthangunta, Shivaji Nagar, Bengaluru, Karnataka 560051, India


In [93]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', India', '') # country name is not needed in address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [94]:
addresses[1:20]

['Unnamed Road, Bengaluru, Karnataka 560083',
 'Unnamed Road, Bengaluru, Karnataka 562112',
 'Unnamed Road, Bukkasagara, Karnataka 560105',
 'Unnamed Road, Jigani, Karnataka 560105',
 'Shagufta, Plot No. 322, Bommasandra Jigani Link Rd, Jigani, Karnataka 560099',
 'HENNAGARA Rd, Rajapura, Karnataka 562106',
 'kaggalipura,kanakapura main road, opp.masjid,near muthoot finance, bangalore -560082, Bengaluru, Karnataka 560082',
 'Gulakamale Village, Near Kaggalipura 17th Mile Kanakapura Road, Post Taralu, Bengaluru, Karnataka 560082',
 'Kaggalipura Road, Bengaluru, Karnataka 560083',
 'Bannerghatta Rd, Bengaluru, Karnataka 560083',
 'kaleshwari, Karnataka 560105',
 'Bannerghatta Main Road, Jigani, Bangalore, Bengaluru, Karnataka 560105',
 '67 nanjapura village jigani hobli meenakshi meadows an email taluk, Nanjapura, Karnataka 560105',
 'Bommasandra, Bangalor, No. 86, Link road, KIADB Industrial Area, Jigani, Karnataka 560105',
 'Hennagara Main Rd, Omkar Nagar, Bengaluru, Karnataka 562106',

In [95]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Distance from center | Latitude | Longitude | X | Y |
|---|---|---|---|---|---|---|
| 0 | Unnamed Road, Bengaluru, Karnataka 560082 | 23969.981227 | 12.779765 | 77.537288 | 775469.290857 | 1414131.0 |
| 1 | Unnamed Road, Bengaluru, Karnataka 560083 | 23361.506801 | 12.779551 | 77.559378 | 777869.290857 | 1414131.0 |
| 2 | Unnamed Road, Bengaluru, Karnataka 562112 | 22988.692873 | 12.779336 | 77.581468 | 780269.290857 | 1414131.0 |
| 3 | Unnamed Road, Bukkasagara, Karnataka 560105 | 22863.07066 | 12.779119 | 77.603558 | 782669.290857 | 1414131.0 |
| 4 | Unnamed Road, Jigani, Karnataka 560105 | 22988.692873 | 12.7789 | 77.625647 | 785069.290857 | 1414131.0 |
| 5 | Shagufta, Plot No. 322, Bommasandra Jigani Lin... | 23361.506801 | 12.778679 | 77.647736 | 787469.290857 | 1414131.0 |
| 6 | HENNAGARA Rd, Rajapura, Karnataka 562106 | 23969.981227 | 12.778457 | 77.669824 | 789869.290857 | 1414131.0 |
| 7 | kaggalipura,kanakapura main road, opp.masjid,n... | 23423.065555 | 12.798859 | 77.504336 | 771869.290857 | 1416209.0 |
| 8 | Gulakamale Village, Near Kaggalipura 17th Mile... | 22417.850031 | 12.798648 | 77.526429 | 774269.290857 | 1416209.0 |
| 9 | Kaggalipura Road, Bengaluru, Karnataka 560083 | 21633.307653 | 12.798435 | 77.548521 | 776669.290857 | 1416209.0 |


In [116]:
df_locations.to_pickle('./locations.pkl')    

In [117]:
# The code was removed by Watson Studio for sharing.
# (This hidden cell defined the Foursquare credentials CLIENT_ID and CLIENT_SECRET used below.)

In [118]:
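category = '4bf58dd8d48988d104941735' # Foursquare 'Medical Center' category, the root for medicine-related venues (sub-categories such as Doctor's Office and Rehab Center are returned too)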

# def is_hospital(categories):
#     hospital_words = ['hospital', 'clinic']
#     hospital = False
#     specific = False
#     for c in categories:
#         category_name = c[0].lower()
#         category_id = c[1]
#         for r in hospital_words:
#             if r in category_name:
#                 hospital = True
#         if 'shop' in category_name:
#             hospital = False
#     return hospital

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    # Some venues have no 'formattedAddress'; default to an empty list instead of raising KeyError
    address = ', '.join(location.get('formattedAddress', []))
    address = address.replace(', India', '')  # country name is not needed in address
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius, limit=1000):
    """Query the Foursquare v2 'explore' endpoint and return a list of venue tuples:
    (id, name, categories, (lat, lng), address, distance in meters)."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = {'client_id': client_id, 'client_secret': client_secret, 'v': '20180724',
              'll': '{},{}'.format(lat, lon), 'categoryId': category,
              'radius': radius, 'limit': limit}
    try:
        results = requests.get(url, params=params, timeout=10).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance'])
                  for item in results]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        venues = []  # request failed or the response did not have the expected shape
    return venues
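In `get_venues_near_location`, a single missing key in any venue raises an exception inside the list comprehension. A venue without coordinates or a distance is one example. The `except` clause then discards every venue returned for that location. The sketch below shows a more forgiving parser. It assumes the response shape the notebook itself reads (`response.groups[0].items[].venue`), returns tuples in the same order, and skips only the malformed items. `parse_explore_items` and the sample payload are hypothetical; the payload is not real Foursquare output.

```python
def parse_explore_items(payload):
    """Turn a Foursquare v2 'explore' JSON payload into
    (id, name, [(category name, category id)], (lat, lng), address, distance) tuples,
    skipping individual malformed items instead of discarding the whole response."""
    try:
        items = payload['response']['groups'][0]['items']
    except (KeyError, IndexError, TypeError):
        return []
    venues = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        try:
            venues.append((
                venue['id'],
                venue['name'],
                [(c['name'], c['id']) for c in venue.get('categories', [])],
                (location['lat'], location['lng']),
                ', '.join(location.get('formattedAddress', [])),  # address may be missing
                location['distance'],
            ))
        except KeyError:
            continue  # no id, name, coordinates or distance: skip just this venue
    return venues

# Hand-written sample: the second venue has no latitude and no distance
sample = {'response': {'groups': [{'items': [
    {'venue': {'id': 'a1', 'name': 'Clinic A',
               'categories': [{'name': 'Medical Center', 'id': '4bf58dd8d48988d104941735'}],
               'location': {'lat': 12.98, 'lng': 77.60, 'distance': 250}}},
    {'venue': {'id': 'b2', 'name': 'Broken', 'location': {'lng': 77.61}}},
]}]}}
print(parse_explore_items(sample))
```
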

In [119]:
# We will now go over our neighborhood locations and get nearby hospitals and also maintain a dictionary for all found hospitals

import pickle

def get_hospitals(lats, lons):
    hospitals = {}
    location_hospitals = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
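        # Query with radius=1300 (> the 1200 m area radius) so neighboring areas overlap and we are less likely to miss a hospital;
        # the `hospitals` dictionary, keyed by venue ID, removes duplicates. (Full coverage of this hexagonal grid, with centers
        # 2400 m apart, would need a radius of at least 2400/sqrt(3), about 1386 m.)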
        venues = get_venues_near_location(lat, lon, category, CLIENT_ID, CLIENT_SECRET, 1300)
        area_hospitals = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
            hospital = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, x, y)
            if venue_distance<=radius:
                area_hospitals.append(hospital)
            hospitals[venue_id] = hospital
        location_hospitals.append(area_hospitals)
        print(' .', end='')
    print(' done.')
    return hospitals, location_hospitals

# Try to load from local file system in case we did this before
hospitals = {}
location_hospitals = []
loaded = False
try:
    with open('hospitals.pkl', 'rb') as f:
        hospitals = pickle.load(f)
    with open('location_hospitals.pkl', 'rb') as f:
        location_hospitals = pickle.load(f)
    print('Hospitals data loaded.')
    loaded = True
except (OSError, EOFError, pickle.UnpicklingError):  # no cached data yet, or the cache is unreadable
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    hospitals, location_hospitals = get_hospitals(latitudes, longitudes)
    
    # Persist the results on the local file system
    with open('hospitals.pkl', 'wb') as f:
        pickle.dump(hospitals, f)
    with open('location_hospitals.pkl', 'wb') as f:
        pickle.dump(location_hospitals, f)
        

Hospitals data loaded.

In [122]:
import numpy as np

print('Total number of hospitals:', len(hospitals))
print('Average number of hospitals in neighborhood:', np.array([len(r) for r in location_hospitals]).mean())

Total number of hospitals: 571
Average number of hospitals in neighborhood: 1.39285714286


In [123]:
print('List of all hospitals')
print('-----------------------')
for r in list(hospitals.values())[0:10]:
    print(r)
print('...')
print('Total:', len(hospitals))

List of all hospitals
-----------------------
('5c19fd2c3b8307002c64932a', "Ratkal's Rescue Urology Center", 12.921384, 77.48174, 'No 458, 6th main road, opposite kalikamba temple, kengeri Satellite Town, Bangalore 560060, Karnātaka, India', 1088, 769284.0255233254, 1429746.6078434568)
('4d061c8d28926ea8bee473c2', 'Apollo Clinic', 12.956195880819386, 77.70551601419882, 'Bangalore, Karnātaka, India', 1017, 793540.0061090232, 1433846.308695319)
('58edfa692980db13cf585243', 'Wishing Well Health Care Pvt. Ltd', 12.9991939, 77.59793400000001, '19/1A 1st Main Rd, Bangalore 560046, Karnātaka, India', 1040, 781811.8000027211, 1438484.4339257712)
('58480b22dfa6ff4be6f86c06', 'Smiles Dental Clinic', 13.000308, 77.617922, 'India', 1175, 783980.0977506011, 1438629.9831477357)
('4f94cface4b01143d31e7faa', 'PD Hinduja Sindhi Hospital', 12.964624439179051, 77.59294266533912, 'Sindhi Hospital Road, (12th A Cross, Behind Woodlands Hotel), Sampangirama Nagar 560027, Karnātaka, India', 1072, 781308.95622

In [125]:
print('Hospitals around location')
print('---------------------------')
for i in range(0, 50):
    rs = location_hospitals[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Hospitals around location {}: {}'.format(i+1, names))

Hospitals around location
---------------------------
Hospitals around location 1: 
Hospitals around location 2: 
Hospitals around location 3: 
Hospitals around location 4: 
Hospitals around location 5: 
Hospitals around location 6: 
Hospitals around location 7: 
Hospitals around location 8: 
Hospitals around location 9: cadabams, Cadabam's Amitha - Centre for Short and Long Term Rehabilitation Care
Hospitals around location 10: 
Hospitals around location 11: 
Hospitals around location 12: 
Hospitals around location 13: 
Hospitals around location 14: 
Hospitals around location 15: 
Hospitals around location 16: 
Hospitals around location 17: 
Hospitals around location 18: 
Hospitals around location 19: 
Hospitals around location 20: Sri Sri Ayurveda Hospital, Ashram Clinic
Hospitals around location 21: 
Hospitals around location 22: 
Hospitals around location 23: Bannerghatta Road Pet Clinic
Hospitals around location 24: 
Hospitals around location 25: 
Hospitals around location 26: 
Ho

In [126]:
map_location = folium.Map(location=location_center, zoom_start=11)
folium.Marker(location_center, popup=location_center_address).add_to(map_location)
for hosp in hospitals.values():
    lat = hosp[2]; lon = hosp[3]
    color = 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_location)
map_location

In [128]:
location_hospitals_count = [len(res) for res in location_hospitals]

df_locations['Hospitals in area'] = location_hospitals_count

print('Average number of hospitals in every area with radius=1200m:', np.array(location_hospitals_count).mean())

df_locations.head(10)

Average number of hospitals in every area with radius=1200m: 1.39285714286


| | Address | Distance from center | Latitude | Longitude | X | Y | Hospitals in area |
|---|---|---|---|---|---|---|---|
| 0 | Unnamed Road, Bengaluru, Karnataka 560082 | 23969.981227 | 12.779765 | 77.537288 | 775469.290857 | 1414131.0 | 0 |
| 1 | Unnamed Road, Bengaluru, Karnataka 560083 | 23361.506801 | 12.779551 | 77.559378 | 777869.290857 | 1414131.0 | 0 |
| 2 | Unnamed Road, Bengaluru, Karnataka 562112 | 22988.692873 | 12.779336 | 77.581468 | 780269.290857 | 1414131.0 | 0 |
| 3 | Unnamed Road, Bukkasagara, Karnataka 560105 | 22863.07066 | 12.779119 | 77.603558 | 782669.290857 | 1414131.0 | 0 |
| 4 | Unnamed Road, Jigani, Karnataka 560105 | 22988.692873 | 12.7789 | 77.625647 | 785069.290857 | 1414131.0 | 0 |
| 5 | Shagufta, Plot No. 322, Bommasandra Jigani Lin... | 23361.506801 | 12.778679 | 77.647736 | 787469.290857 | 1414131.0 | 0 |
| 6 | HENNAGARA Rd, Rajapura, Karnataka 562106 | 23969.981227 | 12.778457 | 77.669824 | 789869.290857 | 1414131.0 | 0 |
| 7 | kaggalipura,kanakapura main road, opp.masjid,n... | 23423.065555 | 12.798859 | 77.504336 | 771869.290857 | 1416209.0 | 0 |
| 8 | Gulakamale Village, Near Kaggalipura 17th Mile... | 22417.850031 | 12.798648 | 77.526429 | 774269.290857 | 1416209.0 | 2 |
| 9 | Kaggalipura Road, Bengaluru, Karnataka 560083 | 21633.307653 | 12.798435 | 77.548521 | 776669.290857 | 1416209.0 | 0 |


In [130]:
distances_to_hospitals = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for hosp in hospitals.values():
        res_x = hosp[6]
        res_y = hosp[7]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_hospitals.append(min_distance)

df_locations['Distance to Hospitals'] = distances_to_hospitals

df_locations.head(10)

print('Average distance to closest hospital from each area center:', df_locations['Distance to Hospitals'].mean())

Average distance to closest hospital from each area center: 2890.4462357494567
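The loop above starts each search from `min_distance = 10000`. As a result, any area more than 10 km from every hospital would be recorded as exactly 10000 m; the later `find_nearest_hospital` uses 100000 for the same purpose. A minimal sketch without such a sentinel uses `min()` with `default=math.inf`. `nearest_distance` and the coordinates below are made up for illustration.

```python
import math

def nearest_distance(x, y, points):
    """Distance in meters from (x, y) to the closest (px, py) in points; inf if points is empty."""
    return min((math.hypot(px - x, py - y) for px, py in points), default=math.inf)

hospital_xys = [(0.0, 0.0), (3000.0, 4000.0)]  # made-up planar coordinates in meters
print(nearest_distance(3000.0, 0.0, hospital_xys))   # → 3000.0
print(nearest_distance(0.0, 20000.0, hospital_xys))  # more than 10 km: not capped at 10000
print(nearest_distance(0.0, 0.0, []))                # no hospitals at all
```
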


In [145]:
hospital_latlons = [[hosp[2], hosp[3]] for hosp in hospitals.values()]

from folium import plugins
from folium.plugins import HeatMap

map_location = folium.Map(location=location_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_location) #cartodbpositron cartodbdark_matter
HeatMap(hospital_latlons).add_to(map_location)
folium.Marker(location_center).add_to(map_location)
folium.Circle(location_center, radius=5000, fill=False, color='white').add_to(map_location)
folium.Circle(location_center, radius=10000, fill=False, color='white').add_to(map_location)
folium.Circle(location_center, radius=15000, fill=False, color='white').add_to(map_location)
map_location

In [228]:
roi_x_min = location_center_x - 2000
roi_y_max = location_center_y + 1000
roi_width = 5000
roi_height = 5000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_location = folium.Map(location=roi_center, zoom_start=12)
HeatMap(hospital_latlons).add_to(map_location)
folium.Marker(location_center).add_to(map_location)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_location)
map_location

In [285]:
roi_radius = 400
roi_distance = 10000

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 2*roi_radius
y_step = 2*roi_radius * k 
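roi_y_min = roi_center_y - roi_distance
# Note: the x loop below starts at roi_x_min (2 km west of the city center) rather than at
# roi_center_x - roi_distance, so the western part of the 10 km circle is not sampled.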

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = roi_radius if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= (roi_distance+1)):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

def count_hospitals_nearby(x, y, hospitals, radius):    
    count = 0
    for hosp in hospitals.values():
        hosp_x = hosp[6]; hosp_y = hosp[7]
        d = calc_xy_distance(x, y, hosp_x, hosp_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_hospital(x, y, hospitals):
    d_min = 100000
    for hosp in hospitals.values():
        hosp_x = hosp[6]; hosp_y = hosp[7]
        d = calc_xy_distance(x, y, hosp_x, hosp_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_hospital_counts = []
roi_hospital_distance = []

print('Generating data on location candidates... ', end='')
# roi_latitudes = latitudes
# roi_longitudes = longitudes
# roi_xs = xs
# roi_ys = ys
roi_center = location_center
for x, y in zip(roi_xs, roi_ys):
    count = count_hospitals_nearby(x, y, hospitals, 1000)
    roi_hospital_counts.append(count)
    distance = find_nearest_hospital(x, y, hospitals)
    roi_hospital_distance.append(distance)
print('done.')


375 candidate neighborhood centers generated.
Generating data on location candidates... done.
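`count_hospitals_nearby` and `find_nearest_hospital` scan every hospital for every candidate. That is cheap here (571 hospitals times 375 candidates), but the cost grows as the product of the two. A common alternative is a uniform grid, also called a spatial hash. Hospital X/Y coordinates are bucketed into square cells at least as large as the search radius, and only the 3×3 block of cells around each candidate is checked. The sketch below is an illustrative swap-in, not what the notebook does; `build_grid_index` and `count_within` are hypothetical names.

```python
import math
from collections import defaultdict

def build_grid_index(points, cell_size):
    """Bucket (x, y) points (meters) into square cells of side cell_size."""
    index = defaultdict(list)
    for x, y in points:
        index[(math.floor(x / cell_size), math.floor(y / cell_size))].append((x, y))
    return index

def count_within(index, cell_size, x, y, radius):
    """Count indexed points within radius of (x, y).

    Assumes radius <= cell_size, so any match lies in the 3x3 block of cells around (x, y)."""
    cx, cy = math.floor(x / cell_size), math.floor(y / cell_size)
    count = 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            # .get() on a defaultdict does not create empty buckets
            for px, py in index.get((cx + dx, cy + dy), ()):
                if math.hypot(px - x, py - y) <= radius:
                    count += 1
    return count

# In the notebook the points could be [(h[6], h[7]) for h in hospitals.values()]
example_index = build_grid_index([(0.0, 0.0), (500.0, 500.0), (5000.0, 5000.0)], 1000)
print(count_within(example_index, 1000, 100.0, 100.0, 1000))  # → 2
```
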


In [286]:
# Let's put this into dataframe
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Hospitals nearby':roi_hospital_counts,
                                 'Distance to hospitals':roi_hospital_distance})

df_roi_locations.tail(10)

| | Distance to hospitals | Hospitals nearby | Latitude | Longitude | X | Y |
|---|---|---|---|---|---|---|
| 365 | 650.526222 | 2 | 13.050409 | 77.639543 | 786269.290857 | 1444200.0 |
| 366 | 310.108138 | 3 | 13.050334 | 77.646914 | 787069.290857 | 1444200.0 |
| 367 | 1011.315021 | 0 | 13.050258 | 77.654284 | 787869.290857 | 1444200.0 |
| 368 | 294.42334 | 6 | 13.057152 | 77.591696 | 781069.290857 | 1444893.0 |
| 369 | 734.336719 | 6 | 13.057078 | 77.599068 | 781869.290857 | 1444893.0 |
| 370 | 466.025129 | 3 | 13.057004 | 77.606439 | 782669.290857 | 1444893.0 |
| 371 | 530.950441 | 1 | 13.05693 | 77.61381 | 783469.290857 | 1444893.0 |
| 372 | 633.83545 | 1 | 13.056855 | 77.621181 | 784269.290857 | 1444893.0 |
| 373 | 966.167665 | 1 | 13.05678 | 77.628553 | 785069.290857 | 1444893.0 |
| 374 | 1105.425068 | 0 | 13.056705 | 77.635924 | 785869.290857 | 1444893.0 |


In [287]:
good_hosp_count = np.array((df_roi_locations['Hospitals nearby']<=2))
print('Locations with no more than two hospitals nearby:', good_hosp_count.sum())

good_hosp_distance = np.array(df_roi_locations['Distance to hospitals']>=1000)
print('Locations with no hospitals within 1km:', good_hosp_distance.sum())

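# Note: a nearest-hospital distance of >= 1000 m leaves no hospital strictly within 1 km,
# so the distance condition effectively implies the count condition (hence the identical counts).
good_locations = np.logical_and(good_hosp_count, good_hosp_distance)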
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than two hospitals nearby: 181
Locations with no hospitals within 1km: 53
Locations with both conditions met: 53


In [288]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_location = folium.Map(location=roi_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_location)
HeatMap(hospital_latlons).add_to(map_location)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_location)
folium.Marker(location_center).add_to(map_location)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_location)
map_location

In [289]:
from sklearn.cluster import KMeans

number_of_clusters = 5
good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_location = folium.Map(location=location_center, zoom_start=11)
folium.TileLayer('cartodbpositron').add_to(map_location)
HeatMap(hospital_latlons).add_to(map_location)
folium.Circle(roi_center, radius=5000, color='white', fill=True, fill_opacity=0.4).add_to(map_location)
folium.Marker(location_center).add_to(map_location)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=radius, color='green', fill=True, fill_opacity=0.25).add_to(map_location) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_location)
map_location
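The notebook clusters the good locations in UTM X/Y meters, so the cluster centers are geographic centroids. It then converts them back with `xy_to_lonlat`. To show what `KMeans` does, here is a bare-bones Lloyd's algorithm in pure Python. It alternates between assigning each point to its nearest center and moving each center to the mean of its points, until nothing changes. scikit-learn's implementation adds smarter initialization (k-means++ by default) and other refinements. This sketch instead takes explicit starting centers so its result is deterministic; `kmeans` is a hypothetical name.

```python
import math

def kmeans(points, initial_centers, n_iter=100):
    """Plain Lloyd's k-means on (x, y) tuples, starting from the given centers.

    Returns (centers, labels), where labels come from the final assignment step.
    An empty cluster keeps its previous center."""
    centers = list(initial_centers)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center
        labels = [min(range(len(centers)),
                      key=lambda c: math.hypot(px - centers[c][0], py - centers[c][1]))
                  for px, py in points]
        # Update step: move each center to the mean of its assigned points
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, [(0, 0), (10, 10)])
print(labels)  # → [0, 0, 0, 1, 1, 1]
print(centers)
```
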

In [290]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_address(google_api_key, lat, lon) or 'NO ADDRESS'  # get_address returns None on failure
    addr = addr.replace(', India', '')
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, location_center_x, location_center_y)
    print('{}{} => {:.1f}km from Shivaji nagar'.format(addr, ' '*(50-len(addr)), d/1000))
    

Addresses of centers of areas recommended for further analysis

4th A Cross Road, Banjara Residency, Lakeview Residency, Bengaluru, Karnataka 560043 => 7.9km from Shivaji nagar
Unnamed Road, Kempapura, Bellandur, Bengaluru, Karnataka 560017 => 8.6km from Shivaji nagar
18, MG Road, Craig Park Layout, Ashok Nagar, Bengaluru, Karnataka 560001 => 2.0km from Shivaji nagar
45/1, 10th B Cross Rd, Nagavarapalya, C V Raman Nagar, Bengaluru, Karnataka 560093 => 6.7km from Shivaji nagar
1, 1st Cross Rd, Kodichikknahalli, Bommanahalli, Bengaluru, Karnataka 560076 => 9.8km from Shivaji nagar


In [291]:
map_location = folium.Map(location=roi_center, zoom_start=11)
folium.Circle(location_center, radius=50, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_location)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_location) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_location)
map_location

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas of Bangalore close to the city center with few or no hospitals, in order to help stakeholders narrow down the search for an optimal location for a new hospital. The final decision on the location will be made by the stakeholders based on additional factors, such as the cost of building a hospital in the area and the available road infrastructure.
