# 1. Introduction

Santiago de Chile is a growing city whose population has increased substantially over the last decades. This growth raises several issues, in particular pressure on healthcare, especially in winter, when illnesses peak.
To address this problem, the city needs to allocate resources to build more healthcare facilities in locations where none exist today or where the only existing facilities are private.


# 2. Business Problem

The goal is to find the best locations for the next healthcare facilities to be built in Santiago, so that they can serve the increased demand, especially during the peak period.

#### Stakeholders 
Local government/authorities

#### Target Audience
Santiago population

#### Reason to solve 
If the new facilities are located correctly, the quality of life of thousands of people will improve considerably during the peak period of diseases (winter).

# 3. Data Section

#### Repositories

The idea is to gather the population, latitude, and longitude of each neighborhood in Santiago de Chile, together with the locations of its healthcare facilities. To achieve this, I am going to use the following sites and APIs.

Neighborhood names and population figures come from Wikipedia (https://es.wikipedia.org/wiki/Anexo:Comunas_de_Santiago_de_Chile). The latitude and longitude of each neighborhood are retrieved with the geopy library, and the healthcare facilities come from the Foursquare API, restricted to venues in the "Medical Center" category (https://developer.foursquare.com/docs/resources/categories).

#### Header information of libraries needed

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

#!pip install beautifulsoup4 # uncomment this line if you don't have BeautifulSoup installed

from bs4 import BeautifulSoup

geolocator = Nominatim(user_agent="ny_explorer")

print('Libraries imported.')

Libraries imported.


#### Download and Refine Dataset
The following code does the following:
1. Downloads the population and neighborhood data.
2. Cleans the data, as in the previous assignment.
3. Assigns location data to each neighborhood using the geopy API.
4. Saves the result as a CSV file so it can be read from disk instead of being requested every time (this speeds up testing once the data is correct).

In [4]:
!wget -q -O 'santiago_data.html' https://es.wikipedia.org/wiki/Anexo:Comunas_de_Santiago_de_Chile

with open("santiago_data.html") as html_doc:
    soup = BeautifulSoup(html_doc, 'lxml') 
    table_data = soup.find_all('table')[3] 
    table_data = [[cell.text.replace('\n', '') for cell in row("td")]
                         for row in table_data("tr")]
    table_data[0] = ["Comuna","Ubicacion","Poblacion","Viviendas","Densidad","Crecimiento","ICVU","Pobreza"]

data = np.array(table_data)
df = pd.DataFrame({'Neighborhood':data[1:,0],'Population':data[1:,2]})

# Look up the coordinates of a neighborhood and return them as "lat,lon"
def get_location(row):
    location = geolocator.geocode('{}, Santiago'.format(row['Neighborhood']))
    return '{},{}'.format(location.latitude, location.longitude)

df['Location'] = df.apply(get_location, axis=1)
# Split the "lat,lon" string into two separate columns
new = df["Location"].str.split(",", n=1, expand=True)
df["Latitude"] = new[0]
df["Longitude"] = new[1]
df.drop(columns=["Location"], inplace=True)
# The default integer index is written as the first CSV column
df.to_csv("santiago_data.csv")
print("Done saving file")
print("Done saving file")

Done saving file
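
The `get_location` function above assumes every lookup succeeds. geopy's `geocode` returns `None` when nothing matches, which would raise an `AttributeError` inside `df.apply`. The following is a minimal sketch of a guarded lookup, not the notebook's own method. The helper name `safe_coordinates` and the `city` default of "Santiago, Chile" are assumptions made for illustration; adding the country to the query may help disambiguate place names, but that is not verified here.

```python
from typing import Callable, Optional, Tuple


def safe_coordinates(
    geocode: Callable[[str], Optional[object]],
    neighborhood: str,
    city: str = "Santiago, Chile",  # assumed default, not taken from the notebook
) -> Tuple[Optional[float], Optional[float]]:
    """Return (latitude, longitude), or (None, None) when nothing is found.

    `geocode` is any callable that takes a query string and returns an object
    with `latitude` and `longitude` attributes, or None (as geopy does).
    """
    location = geocode("{}, {}".format(neighborhood, city))
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)
```

With the notebook's geolocator this could be called as `safe_coordinates(geolocator.geocode, "El Bosque")`. Rows that come back as `(None, None)` can then be reviewed by hand instead of stopping the whole run.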


At this point, I work with the data from the CSV file; the only data still to be retrieved is the Foursquare data.

In [5]:
df = pd.read_csv('santiago_data.csv')
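df.drop(df.columns[0], axis=1, inplace = True) # drop the index column written by to_csv
df[["Longitude","Latitude"]] = df[["Longitude","Latitude"]].apply(pd.to_numeric) # assign back so the conversion takes effect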
df.head()

|   | Neighborhood | Population | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Cerrillos | 80832 | -33.487987 | -70.703081 |
| 1 | Cerro Navia | 132622 | -33.425145 | -70.743954 |
| 2 | Conchalí | 126955 | -33.385096 | -70.674491 |
| 3 | El Bosque | 162505 | 3.485685 | -76.529368 |
| 4 | Estación Central | 147041 | -33.463658 | -70.704966 |
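
The El Bosque row in the preview has latitude 3.485685 and longitude -76.529368, far from the other rows (around -33.4, -70.7), which suggests the geocoder matched a different place with the same name. A quick sanity check can flag such rows before they reach the Foursquare step. The sketch below is one possible check; the bounding box values are rough assumptions, not official limits of the city.

```python
# Rough bounding box around Greater Santiago (assumed values, adjust as needed)
LAT_MIN, LAT_MAX = -34.0, -33.0
LON_MIN, LON_MAX = -71.0, -70.0


def outside_santiago(rows):
    """Return the names whose coordinates fall outside the assumed box.

    `rows` is an iterable of (name, latitude, longitude) tuples.
    """
    return [
        name
        for name, lat, lon in rows
        if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX)
    ]


# The five rows shown in the preview above
sample = [
    ("Cerrillos", -33.487987, -70.703081),
    ("Cerro Navia", -33.425145, -70.743954),
    ("Conchalí", -33.385096, -70.674491),
    ("El Bosque", 3.485685, -76.529368),
    ("Estación Central", -33.463658, -70.704966),
]

print(outside_santiago(sample))  # ['El Bosque']
```

With the dataframe, the same rows can be passed as `zip(df['Neighborhood'], df['Latitude'], df['Longitude'])`.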


Now I retrieve the Medical Center venues from Foursquare, add them to a dataframe, and save it to disk as before.

In [9]:
# The code was removed by Watson Studio for sharing.

In [10]:

VERSION = '20180604' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
CATEGORY= "4bf58dd8d48988d104941735" # Medical Center categoryID

def getNearbyVenues(names, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, #use your Foursquare client ID here
            CLIENT_SECRET, #use your Foursquare client secret here
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            CATEGORY)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

df_full = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )
df_full.to_csv("santiago_data_full.csv")
print("Done saving file")

Done saving file
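
The request above indexes straight into `["response"]['groups'][0]['items']`, so a response without those keys (for example when the API returns an error) would raise an exception partway through the loop. The following is a hedged sketch of a more defensive parser. The helper name `extract_venues` is hypothetical, the payload shape is taken from the indexing used in the notebook, and the shape of an error payload is an assumption.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into a list of row tuples.

    Returns an empty list when the expected keys are missing instead of
    raising, so one bad response does not abort the whole loop.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []

    rows = []
    for item in items:
        venue = item["venue"]
        # Some venues may carry no category; fall back to None in that case
        categories = venue.get("categories") or [{}]
        rows.append((
            name,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0].get("name"),
        ))
    return rows
```

Inside `getNearbyVenues`, the result of `requests.get(url).json()` could be passed to this helper and the returned list appended to `venues_list`.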


Load the saved data, which now contains everything needed for the map, and keep only the venues categorized as Hospital or Emergency Room.

In [11]:
df_full = pd.read_csv('santiago_data_full.csv')
coord_cols = ["Neighborhood Latitude", "Neighborhood Longitude", "Venue Latitude", "Venue Longitude"]
df_full[coord_cols] = df_full[coord_cols].apply(pd.to_numeric) # assign back so the conversion takes effect
valid_categories = ['Hospital','Emergency Room']
df_full = df_full[df_full['Venue Category'].isin(valid_categories)]
df_full.head(20)

|   | Unnamed: 0 | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 15 | 15 | Cerrillos | -33.487987 | -70.703081 | Hospital Clínico Mutual de Seguridad | -33.457448 | -70.700683 | Hospital |
| 17 | 17 | Cerrillos | -33.487987 | -70.703081 | Clínica Hospital del Profesor | -33.457492 | -70.702841 | Hospital |
| 22 | 22 | Cerrillos | -33.487987 | -70.703081 | Consultorio N° 5 | -33.451177 | -70.673671 | Hospital |
| 24 | 24 | Cerrillos | -33.487987 | -70.703081 | Clinica Bicentenario | -33.457439 | -70.701743 | Hospital |
| 25 | 25 | Cerrillos | -33.487987 | -70.703081 | Pabellon Mutual de Seguridad | -33.45737 | -70.701023 | Hospital |
| 26 | 26 | Cerrillos | -33.487987 | -70.703081 | Clínica Bicentenario | -33.457076 | -70.701917 | Hospital |
| 27 | 27 | Cerrillos | -33.487987 | -70.703081 | UCI Mutual De Seguridad | -33.456659 | -70.701221 | Hospital |
| 28 | 28 | Cerrillos | -33.487987 | -70.703081 | Hospital IST | -33.476114 | -70.651433 | Hospital |
| 32 | 32 | Cerrillos | -33.487987 | -70.703081 | Urgencia Hospital Del Profesor | -33.457954 | -70.702976 | Emergency Room |
| 33 | 33 | Cerrillos | -33.487987 | -70.703081 | Urgencia Clinica Bicentenario | -33.457823 | -70.701218 | Emergency Room |
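
Once the venues are filtered, a simple tally per neighborhood and category gives a first impression of coverage. The sketch below uses only the standard library and the ten Cerrillos rows shown above; the helper name `count_by_category` is hypothetical. Because each query uses a 5000 m radius, the same venue can be returned for more than one neighborhood, so these counts describe what is within reach of each neighborhood rather than what lies inside its borders.

```python
from collections import Counter


def count_by_category(rows):
    """Count venues per (neighborhood, category) pair.

    `rows` is an iterable of (neighborhood, category) tuples.
    """
    return Counter(rows)


# The ten rows in the preview above: 8 hospitals and 2 emergency rooms
sample = [("Cerrillos", "Hospital")] * 8 + [("Cerrillos", "Emergency Room")] * 2

counts = count_by_category(sample)
print(counts[("Cerrillos", "Hospital")])        # 8
print(counts[("Cerrillos", "Emergency Room")])  # 2
```

On the dataframe, the equivalent tally is `df_full.groupby(['Neighborhood', 'Venue Category']).size()`.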


Now render a sample map so we can see where all these venues are located.

In [None]:
location = geolocator.geocode('Santiago, Chile')
santiago_map = folium.Map(location=[location.latitude, location.longitude], zoom_start=11)

def get_color(category):
    ret = '#000000'
    if ( category == 'Hospital'):
        ret = '#FF0000'
    elif ( category == 'Emergency Room'):
        ret = '#00FF00'
    return ret

for lat, lng, neighborhood,population in zip(df['Latitude'], df['Longitude'], df['Neighborhood'],df['Population']):
    label_str = '{}, Population : {}'.format(neighborhood,population)
    label = folium.Popup(label_str, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(santiago_map)  

for lat, lng, venue,category in zip(df_full['Venue Latitude'], df_full['Venue Longitude'], df_full['Venue'],df_full['Venue Category']):
    label_str = '{}, Category : {}'.format(venue,category)
    label = folium.Popup(label_str, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=get_color(category),
        fill=True,
        fill_color=get_color(category),
        fill_opacity=0.7,
        parse_html=False).add_to(santiago_map) 
    
santiago_map