# Hospital Info File Data Cleaning

This notebook cleans the raw hospital info file, which was initially created by Julie Leung in Microsoft Excel.

### Information about the Source Data:
* Geographic location data such as City Area, City Population and City Latitude and Longitude were provided by the Data > Geography feature described here:
    * https://support.microsoft.com/en-us/office/get-geographic-location-data-287b4cf2-3d7d-4bc1-b412-3d00f45dbbd6
* The Hospital addresses were manually looked up and filled in using Google Maps.
* The delineation of the 'services' provided was informed by a manual lookup of each hospital on Alberta Health Services' Find Healthcare website at:
    * https://www.albertahealthservices.ca/findhealth/search.aspx?type=facility#icon_banner
* The 'citytype' categorization ('peri', 'urban', 'rural') was made arbitrarily as follows:
    * Population < 60,000:  classify as 'rural'
    * Population > 60,000 and < 100,000:  classify as 'peri' (peri-urban)
    * Population > 100,000:  classify as 'urban'
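
The thresholds above can be written as a small function. This is only a sketch: `classify_citytype` is a hypothetical helper, not part of the notebook or the `Utility` class, and the list does not say which class a population of exactly 60,000 or 100,000 belongs to, so the boundary handling here is an assumption. The classification in the file was made by hand, so individual rows (for example Cochrane in the table further down) may not follow these thresholds exactly.

```python
def classify_citytype(population: int) -> str:
    """Classify a city by population using the thresholds listed above.

    Assumption: a population of exactly 60,000 is treated as 'peri' and
    exactly 100,000 as 'urban'; the source does not specify these cases.
    """
    if population < 60_000:
        return 'rural'
    if population < 100_000:
        return 'peri'
    return 'urban'


# Populations taken from the hospital info table: Devon, Airdrie, Calgary
for pop in (6_545, 74_100, 1_306_784):
    print(pop, classify_citytype(pop))
```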

### What is done in this notebook:
* The raw Hospital Info file is loaded and cleaned
* The Geocode API is queried to fill in hospital latitude and hospital longitude
* Where the Geocode API returns an empty payload, hospital latitude and longitude values are looked up manually on geocoder.ca and filled into the dataframe
* The finalized "cleaned" hospital info dataframe is saved as .csv

In [1]:
from modules.utility import Utility
import pandas as pd
import time

In [2]:
# Load Hospital Info
info = Utility.get_raw_hospital_info_dataframe()

In [3]:
info.head()

Unnamed: 0,name,id,services,city,province,city.area,city.lat,city.long,city.pop,citytype,address,hosp_lat,hosp_long
0,Alberta Children's Hospital,ach,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"28 Oki Drive NW, Calgary, Alberta, T3B 6A8",,
1,Foothills Medical Centre,fmc,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"1403 29 Street NW, Calgary, Alberta, T2N 2T9",,
2,Peter Lougheed Centre,plc,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"3500 26 Avenue NE, Calgary, Alberta, T1Y 6J4",,
3,Rockyview General Hospital,rgh,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"7007 14 Street SW, Calgary, Alberta, T2V 1P9",,
4,South Health Campus,shc,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"4448 Front Street SE, Calgary, Alberta, T3M 1M4",,


### Get geocoded coordinates for all hospitals

Note: the "get_geocode" function is defined in the Utility class
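
The Utility class is not shown in this notebook, so the following is only a sketch of how a function like "get_geocode" could behave, based on the request URL and the empty-payload message in the log output below. The names `get_geocode_sketch` and `parse_geocode_payload` are hypothetical, and the response shape (a JSON list of results with string `lat` and `lon` fields) is an assumption about geocode.maps.co; the service may also require an API key.

```python
import json
import urllib.parse
import urllib.request


def parse_geocode_payload(payload):
    """Return (lat, long) from the first result, or (0, 0) if the payload is empty.

    Assumption: each result is a dict with string 'lat' and 'lon' fields.
    """
    if not payload:
        return 0, 0
    first = payload[0]
    return float(first['lat']), float(first['lon'])


def get_geocode_sketch(address, base_url='https://geocode.maps.co/search'):
    """Query the geocoding service for one address.

    Hypothetical stand-in for Utility.get_geocode; not the actual implementation.
    """
    request_url = f"{base_url}?{urllib.parse.urlencode({'q': address})}"
    with urllib.request.urlopen(request_url, timeout=10) as response:
        payload = json.load(response)
    return parse_geocode_payload(payload)


# The parsing step can be exercised without any network access:
print(parse_geocode_payload([]))
print(parse_geocode_payload([{'lat': '51.0747591', 'lon': '-114.1468334'}]))
```

Keeping the parsing separate from the HTTP call makes the empty-payload fallback easy to check offline.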

In [8]:
for address in info['address']:
    # Set mask
    mask = (info['address'] == address)

    print(f"\nWorking on hospital {info[mask]['id'].iloc[0]}, getting lat/long for address: {address}")

    # Call the API; get_geocode returns (0, 0) when the payload is empty
    lat, long = Utility.get_geocode(address)

    # Set the lat and long for this line in the info df
    print(f"     Writing lat={lat} and long={long} into the info dataframe...")
    info.loc[mask, 'hosp_lat'] = lat
    info.loc[mask, 'hosp_long'] = long

    # Sleep for 2 seconds to stay well under the geocode.maps.co limit of 2 calls per second
    time.sleep(2)


Working on hospital ach, getting lat/long for address: 28 Oki Drive NW, Calgary, Alberta, T3B 6A8
     Calling API: request_url = https://geocode.maps.co/search?q=28 Oki Drive NW, Calgary, Alberta, T3B 6A8
     Writing lat=51.0747591 and long=-114.1468334 into the info dataframe...

Working on hospital fmc, getting lat/long for address: 1403 29 Street NW, Calgary, Alberta, T2N 2T9
     Calling API: request_url = https://geocode.maps.co/search?q=1403 29 Street NW, Calgary, Alberta, T2N 2T9
          *** Inside get_geocode(): payload is empty!  Please check out address: 1403 29 Street NW, Calgary, Alberta, T2N 2T9!
     Writing lat=0 and long=0 into the info dataframe...

Working on hospital plc, getting lat/long for address: 3500 26 Avenue NE, Calgary, Alberta, T1Y 6J4
     Calling API: request_url = https://geocode.maps.co/search?q=3500 26 Avenue NE, Calgary, Alberta, T1Y 6J4
     Writing lat=51.0789144 and long=-113.9846106 into the info dataframe...

Working on hospital rgh, getting

# Dealing with addresses which had no geocode information returned

The geocode API does not have latitude and longitude for all addresses (we'll forgive them, as it's a free service!)

Therefore, the "get_geocode" function is set to return (0,0) if there is no payload.  Next, we need to fill in the lat/long of those hospitals manually.  First, find out which ones they are.

In [9]:
# Figure out which hospitals we need to pull and fill manually
info[(info['hosp_lat']==0) | (info['hosp_long']==0)]

Unnamed: 0,name,id,services,city,province,city.area,city.lat,city.long,city.pop,citytype,address,hosp_lat,hosp_long
1,Foothills Medical Centre,fmc,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"1403 29 Street NW, Calgary, Alberta, T2N 2T9",0,0
4,South Health Campus,shc,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"4448 Front Street SE, Calgary, Alberta, T3M 1M4",0,0
6,South Calgary Health Centre,schc,urgentcare,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"31 Sunpark Plaza SE, Calgary, Alberta, T2X 3W5",0,0
10,Devon General Hospital,dgh,emergency,"Devon, Alberta",Alberta,14,53.3633,-113.7322,6545,rural,"101 Erie Street S, Devon, Alberta, T9G 1A6",0,0
16,Fort Sask Community Hospital,fsch,emergency,Fort Saskatchewan,Alberta,48,53.7128,-113.2131,27088,rural,"9401 86 Avenue, Fort Saskatchewan, Alberta, T8...",0,0
18,Northeast Community Health Centre,nchc,emergency,Edmonton,Alberta,768,53.5333,-113.5,1010899,urban,"14007 50 Street NW, Edmonton, Alberta, T5A 5E4",0,0
21,WestView Health Centre,whc,emergency,"Stony Plain, Alberta",Alberta,36,53.53,-114.006,17993,rural,"4405 South Park Drive, Stony Plain, Alberta, T...",0,0
26,Innisfail Health Centre,ihc,emergency,"Innisfail, Alberta",Alberta,19,52.0333,-113.95,7985,rural,"5023 42 Street, Innisfail, Alberta, T4G 1A9",0,0


# Here are the results of manually getting data from geocoder.ca:

* 1403 29 Street NW, Calgary, Alberta, T2N 2T9

51.064657, -114.130926

* 4448 Front Street SE, Calgary, Alberta, T3M 1M4

50.880825, -113.952720

* 31 Sunpark Plaza SE, Calgary, Alberta, T2X 3W5

50.902701, -114.058634

* 101 Erie Street S, Devon, Alberta, T9G 1A6

53.352265, -113.728288

* 9401 86 Avenue, Fort Saskatchewan, Alberta, T8L 0C6

53.693175, -113.213436

* 14007 50 Street NW, Edmonton, Alberta, T5A 5E4

53.604308, -113.417595

* 4405 South Park Drive, Stony Plain, Alberta, T7Z 2M7

53.538406, -113.978897

* 5023 42 Street, Innisfail, Alberta, T4G 1A9

52.020008, -113.951336

In [10]:
# Writing the lat and longs to each address:

manual_geocodes = [
{
    'address':'1403 29 Street NW, Calgary, Alberta, T2N 2T9',
    'lat':51.064657,
    'long':-114.130926
}
,
{
    'address':'4448 Front Street SE, Calgary, Alberta, T3M 1M4',
    'lat':50.880825,
    'long':-113.952720
}
,
{
    'address':'31 Sunpark Plaza SE, Calgary, Alberta, T2X 3W5',
    'lat':50.902701,
    'long':-114.058634
}
,
{
    'address':'101 Erie Street S, Devon, Alberta, T9G 1A6',
    'lat':53.352265,
    'long':-113.728288
}
,
{
    'address':'9401 86 Avenue, Fort Saskatchewan, Alberta, T8L 0C6',
    'lat':53.693175,
    'long':-113.213436
}
,
{
    'address':'14007 50 Street NW, Edmonton, Alberta, T5A 5E4',
    'lat':53.604308,
    'long':-113.417595
}
,
{
    'address':'4405 South Park Drive, Stony Plain, Alberta, T7Z 2M7',
    'lat':53.538406,
    'long':-113.978897
}
,
{
    'address':'5023 42 Street, Innisfail, Alberta, T4G 1A9',
    'lat':52.020008,
    'long':-113.951336
}
]

for item in manual_geocodes:
    mask = info['address']==item['address']
    info.loc[mask, 'hosp_lat'] = item['lat']
    info.loc[mask, 'hosp_long'] = item['long']
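
After the manual fill, it may be worth checking that no (0, 0) placeholders remain and that every coordinate lies inside Alberta. The sketch below is an optional sanity check, not part of the original notebook; `find_suspect_coordinates` is a hypothetical helper, and the bounding box is only approximate (Alberta spans roughly 49 to 60 degrees north and 110 to 120 degrees west).

```python
# Approximate bounding box for Alberta (49-60 deg N, 110-120 deg W)
ALBERTA_LAT = (49.0, 60.0)
ALBERTA_LONG = (-120.0, -110.0)


def find_suspect_coordinates(rows):
    """Return ids whose coordinates fall outside the Alberta bounding box.

    `rows` is an iterable of (id, lat, long) tuples. The (0, 0) placeholder
    lies outside the box, so unfilled rows are reported as well.
    """
    suspects = []
    for hosp_id, lat, long in rows:
        lat, long = float(lat), float(long)
        inside = (ALBERTA_LAT[0] <= lat <= ALBERTA_LAT[1]
                  and ALBERTA_LONG[0] <= long <= ALBERTA_LONG[1])
        if not inside:
            suspects.append(hosp_id)
    return suspects


sample = [
    ('ach', 51.0747591, -114.1468334),  # geocoded by the API
    ('fmc', 51.064657, -114.130926),    # filled in manually
    ('xyz', 0, 0),                      # hypothetical row that was never filled
]
print(find_suspect_coordinates(sample))
```

With the dataframe from this notebook, the rows could be supplied as `info[['id', 'hosp_lat', 'hosp_long']].itertuples(index=False)`.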

In [11]:
info

Unnamed: 0,name,id,services,city,province,city.area,city.lat,city.long,city.pop,citytype,address,hosp_lat,hosp_long
0,Alberta Children's Hospital,ach,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"28 Oki Drive NW, Calgary, Alberta, T3B 6A8",51.0747591,-114.1468334
1,Foothills Medical Centre,fmc,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"1403 29 Street NW, Calgary, Alberta, T2N 2T9",51.064657,-114.130926
2,Peter Lougheed Centre,plc,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"3500 26 Avenue NE, Calgary, Alberta, T1Y 6J4",51.0789144,-113.9846106
3,Rockyview General Hospital,rgh,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"7007 14 Street SW, Calgary, Alberta, T2V 1P9",50.9900558,-114.09707702684253
4,South Health Campus,shc,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"4448 Front Street SE, Calgary, Alberta, T3M 1M4",50.880825,-113.95272
5,Sheldon M. Chumir Centre,smcc,urgentcare,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"1213 4 Street SW, Calgary, Alberta, T2R 0X7",51.04116535,-114.0721785791019
6,South Calgary Health Centre,schc,urgentcare,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"31 Sunpark Plaza SE, Calgary, Alberta, T2X 3W5",50.902701,-114.058634
7,Airdrie Community Health Centre,achc,urgentcare,"Airdrie, Alberta",Alberta,33,51.2917,-114.014,74100,peri,"604 Main Street S, Airdrie, Alberta, T4B 3K7",51.2871094,-114.0134299
8,Cochrane Community Health Centre,cchc,urgentcare,"Cochrane, Alberta",Alberta,30,51.189,-114.467,32199,peri,"60 Grande Boulevard, Cochrane, Alberta, T4C 0S4",51.184676,-114.4738215
9,Okotoks Health and Wellness Centre,ohwc,urgentcare,Okotoks,Alberta,20,50.725,-113.975,30405,peri,"11 Cimarron Common, Okotoks, Alberta, T1S 2E9",50.707517800000005,-113.9735785300802


# Clean up / standardize the formatting of the 'city' column that came from Microsoft's service in the original Excel file

In [12]:
# Clean up the 'city' column that came from Excel
# regex=False makes the literal (non-regex) match explicit across pandas versions
info['city'] = info['city'].str.replace(', Alberta', '', regex=False)
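
The replacement above removes ', Alberta' wherever it occurs in the string. As a slightly stricter alternative, the sketch below strips it only when it appears at the end of the city name. `strip_province` is a hypothetical helper introduced here for illustration; it is not part of the notebook.

```python
def strip_province(city, suffix=', Alberta'):
    """Remove a trailing ', Alberta' from a city name; other names are returned unchanged."""
    if city.endswith(suffix):
        return city[:-len(suffix)]
    return city


# City values as they appear in the raw file
cities = ['Calgary', 'Devon, Alberta', 'Stony Plain, Alberta']
print([strip_province(c) for c in cities])
```

On the dataframe this could be applied with `info['city'].map(strip_province)`.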

In [13]:
info

Unnamed: 0,name,id,services,city,province,city.area,city.lat,city.long,city.pop,citytype,address,hosp_lat,hosp_long
0,Alberta Children's Hospital,ach,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"28 Oki Drive NW, Calgary, Alberta, T3B 6A8",51.0747591,-114.1468334
1,Foothills Medical Centre,fmc,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"1403 29 Street NW, Calgary, Alberta, T2N 2T9",51.064657,-114.130926
2,Peter Lougheed Centre,plc,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"3500 26 Avenue NE, Calgary, Alberta, T1Y 6J4",51.0789144,-113.9846106
3,Rockyview General Hospital,rgh,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"7007 14 Street SW, Calgary, Alberta, T2V 1P9",50.9900558,-114.09707702684253
4,South Health Campus,shc,emergency,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"4448 Front Street SE, Calgary, Alberta, T3M 1M4",50.880825,-113.95272
5,Sheldon M. Chumir Centre,smcc,urgentcare,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"1213 4 Street SW, Calgary, Alberta, T2R 0X7",51.04116535,-114.0721785791019
6,South Calgary Health Centre,schc,urgentcare,Calgary,Alberta,826,51.05,-114.0667,1306784,urban,"31 Sunpark Plaza SE, Calgary, Alberta, T2X 3W5",50.902701,-114.058634
7,Airdrie Community Health Centre,achc,urgentcare,Airdrie,Alberta,33,51.2917,-114.014,74100,peri,"604 Main Street S, Airdrie, Alberta, T4B 3K7",51.2871094,-114.0134299
8,Cochrane Community Health Centre,cchc,urgentcare,Cochrane,Alberta,30,51.189,-114.467,32199,peri,"60 Grande Boulevard, Cochrane, Alberta, T4C 0S4",51.184676,-114.4738215
9,Okotoks Health and Wellness Centre,ohwc,urgentcare,Okotoks,Alberta,20,50.725,-113.975,30405,peri,"11 Cimarron Common, Okotoks, Alberta, T1S 2E9",50.707517800000005,-113.9735785300802


In [17]:
# Make lat/long into floats
info['hosp_lat'] = info['hosp_lat'].astype(float)
info['hosp_long'] = info['hosp_long'].astype(float)

In [18]:
info.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27 entries, 0 to 26
Data columns (total 13 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   name       27 non-null     object 
 1   id         27 non-null     object 
 2   services   27 non-null     object 
 3   city       27 non-null     object 
 4   province   27 non-null     object 
 5   city.area  27 non-null     int64  
 6   city.lat   27 non-null     float64
 7   city.long  27 non-null     float64
 8   city.pop   27 non-null     object 
 9   citytype   27 non-null     object 
 10  address    27 non-null     object 
 11  hosp_lat   27 non-null     float64
 12  hosp_long  27 non-null     float64
dtypes: float64(4), int64(1), object(8)
memory usage: 2.9+ KB
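
The summary above lists 'city.pop' as `object` rather than an integer type. The notebook does not say why; one possible cause (an assumption, not confirmed by the source) is thousands separators or stray whitespace carried over from Excel. If a numeric column is needed downstream, a conversion along these lines might work. `parse_population` is a hypothetical helper.

```python
def parse_population(value):
    """Convert a population value such as '1,306,784' or 74100 to an int.

    Assumption: non-numeric characters are limited to commas and whitespace.
    """
    return int(str(value).replace(',', '').strip())


print(parse_population('1,306,784'))
print(parse_population(' 74100 '))
print(parse_population(6545))
```

It could be applied with `info['city.pop'].map(parse_population)`; a value that still fails to parse would raise a `ValueError`, which points to the row that needs a manual fix.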


# Write the cleaned hospital info .csv file to the appropriate subdirectory

In [19]:
# Write info out to cleandata subdirectory
info.to_csv(Utility.CLEAN_INFO_FILENAME, index=False)