# <p style="text-align: center;"> Using School Immunization Rates and Venue Location Data to Choose Medical Clinic Locations </p>

## <p style="text-align: center;">  Kaif Patel </p>

## Introduction/Business Problem:

The business problem is how to choose the ideal location for a new medical clinic. The audience for this analysis is medical professionals and clinic stakeholders, whose primary interests are public health and return on investment. We use the city of Toronto as a case study. We address the problem by assessing the supply of and demand for medical services: demand is estimated from the immunization rates of nearby schools, and supply from the number of existing medical clinics in the local area.

<br>

## Data:

The data come from two sources: the Immunization Coverage for Students dataset from the Toronto Open Data portal, and Foursquare location data, which we use to search for and count medical clinics near each Toronto school. The immunization dataset gives the immunization rates for each individual school along with its latitude and longitude, which we use to search for medical clinic venues within a given radius.
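
To make "within a given radius" concrete, the sketch below computes the great-circle distance between a school and a venue with the haversine formula and checks it against the 3 km radius used later in this notebook. The function names `haversine_m` and `within_radius` are illustrative and not part of the original analysis; the Foursquare API performs its own radius filtering, so this is only a way to sanity-check returned venues.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within_radius(school, venue, radius_m=3000):
    """True if the venue lies within radius_m metres of the school."""
    return haversine_m(*school, *venue) <= radius_m


# coordinates taken from the tables in this notebook
school = (43.819348, -79.288303)  # St Rene Goupil C.S.
venue = (43.815107, -79.294900)   # Silver Star Medical Centre
print(within_radius(school, venue))
```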

In [1]:
# importing and installing necessary libraries

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment these 2 lines if you don't have these packages installed
#!conda install -c conda-forge geopy --yes 

import pandas as pd
import numpy as np
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import geocoder 
from geopy.geocoders import Nominatim 
import json 
from pandas.io.json import json_normalize 
import requests
import csv


print ('imported')

imported


In [2]:
pip install geocoder

Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install requests

Note: you may need to restart the kernel to use updated packages.


In [4]:
# downloading dataset from toronto open data portal

!wget -O immunization-coverage-2018-2019.csv https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/dd2f7d66-3902-478d-bde8-dcfbaef589d6?format=csv

--2020-02-02 04:37:47--  https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/dd2f7d66-3902-478d-bde8-dcfbaef589d6?format=csv
Resolving ckan0.cf.opendata.inter.prod-toronto.ca (ckan0.cf.opendata.inter.prod-toronto.ca)... 13.249.134.65, 13.249.134.19, 13.249.134.28, ...
Connecting to ckan0.cf.opendata.inter.prod-toronto.ca (ckan0.cf.opendata.inter.prod-toronto.ca)|13.249.134.65|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘immunization-coverage-2018-2019.csv’

immunization-covera     [ <=>                ]  57.05K  --.-KB/s    in 0.02s   

2020-02-02 04:37:49 (2.53 MB/s) - ‘immunization-coverage-2018-2019.csv’ saved [58417]



In [5]:
# reading and displaying dataset

df = pd.read_csv('immunization-coverage-2018-2019.csv')
df

Unnamed: 0,_id,School Name,Enrolled population,DTP coverage rate (%),DTP Religious exemption rate (%),MMR coverage rate (%),MMR Religious exemption rate (%),Lat,Lng
0,1,A Y Jackson S.S.,1027,89.0,1.1,96.5,1.1,43.805261,-79.366555
1,2,Academie Alexandre-Dumas,129,86.8,1.6,89.1,1.6,43.762419,-79.179765
2,3,Adam Beck Jr P.S.,309,96.1,3.6,95.8,3.6,43.683152,-79.288488
3,4,Africentric Alternative School,77,71.4,20.8,72.7,20.8,43.745424,-79.488261
4,5,Agincourt C.I.,1241,87.3,1.0,98.1,1.0,43.788874,-79.278910
...,...,...,...,...,...,...,...,...,...
801,802,York Memorial C.I.,876,86.8,2.5,96.3,2.5,43.690279,-79.476240
802,803,York Mills C.I.,1200,86.3,1.3,97.2,1.3,43.751529,-79.373524
803,804,Yorkview P.S.,256,93.8,2.3,93.0,2.3,43.772574,-79.435566
804,805,Yorkwoods P.S.,224,88.4,0.9,91.5,0.9,43.750660,-79.513885


## Table Information:

DTP = diphtheria, tetanus, polio vaccine
<br>
MMR = measles, mumps, rubella vaccine

In [6]:
# We don't need the religious exemption rates for either vaccine, so we drop these columns. We keep the enrolled population so we can estimate how many students are affected by the lack of vaccinations.

df = df.drop(['_id', 'DTP Religious exemption rate (%)', 'MMR Religious exemption rate (%)'], axis=1)
df.head()

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng
0,A Y Jackson S.S.,1027,89.0,96.5,43.805261,-79.366555
1,Academie Alexandre-Dumas,129,86.8,89.1,43.762419,-79.179765
2,Adam Beck Jr P.S.,309,96.1,95.8,43.683152,-79.288488
3,Africentric Alternative School,77,71.4,72.7,43.745424,-79.488261
4,Agincourt C.I.,1241,87.3,98.1,43.788874,-79.27891


In [7]:
# Because of the recent resurgence of measles, we sort by MMR coverage rate to see which schools have the lowest MMR immunization rates

df = df.sort_values(['MMR coverage rate (%)']).reset_index(drop=True)
df

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng
0,St Rene Goupil C.S.,95,57.9,57.9,43.819348,-79.288303
1,Da Vinci School,47,57.4,59.6,43.658953,-79.401910
2,Ecole elementaire Paul-Demers,66,53.0,60.6,43.791952,-79.367340
3,Greenwood S.S.,123,61.8,64.2,43.682839,-79.334090
4,Alpha II Alternative School,19,52.6,68.4,43.659557,-79.436283
...,...,...,...,...,...,...
801,South East Year Round Alternative,27,92.6,100.0,43.728101,-79.256689
802,Monsignor Fraser College - Midland,16,87.5,100.0,43.801069,-79.285717
803,Monsignor Fraser College - St. Martin,16,62.5,100.0,43.667175,-79.364426
804,Victoria Park Elementary,92,98.9,100.0,43.712689,-79.298151


In [8]:
# checking Toronto latitude and longitude coordinates using geopy's Nominatim geocoder

address = 'Toronto, ON'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [9]:
# creating map of Toronto with school data
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10.7)

# adding markers to map
for lat, lng, school, pop, mmrcov in zip(df['Lat'], df['Lng'], df['School Name'], df['Enrolled population'], df['MMR coverage rate (%)']):
    label = ("{}, Vaccination Rate (%): {}, Population: {} ").format(school, mmrcov, pop)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

# generating map    
map_toronto

### Figure 1. Map showing vaccination rate and population of every school in Toronto. The blue markers represent the schools. 

In [10]:
# entering foursquare credentials

CLIENT_ID = 'MSWQU2E54PNNZER5WFZQ4P3YMHKXXESTOBXLSPUFRDASCJKI' # your Foursquare ID
CLIENT_SECRET = '5VTGKWKXGLXY4KSLLXVMEWIMQZ0BIVAEEGFL2BXXWZBRRQJI' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: MSWQU2E54PNNZER5WFZQ4P3YMHKXXESTOBXLSPUFRDASCJKI
CLIENT_SECRET:5VTGKWKXGLXY4KSLLXVMEWIMQZ0BIVAEEGFL2BXXWZBRRQJI
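
API credentials are usually easier to manage when they are not stored in the notebook itself. A minimal sketch, assuming the credentials are supplied through environment variables, is shown below. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical and not part of the Foursquare API or the original notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping (by default the process environment)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret


def mask_secret(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks for display."""
    return secret[:visible] + "*" * (len(secret) - visible)
```

With these helpers, a confirmation line such as `print('CLIENT_SECRET: ' + mask_secret(CLIENT_SECRET))` shows that a value was loaded without revealing it in the saved output.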


In [11]:
# creating function to generate venue information from foursquare API

def getNearbyVenues(names, latitudes, longitudes, radius=3000, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4bf58dd8d48988d177941735&intent=browse&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['School Name', 
                  'School Lat', 
                  'School Lng', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
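
The function above indexes directly into the JSON response, so a single response without a `groups` entry (for example after a rate-limit error) would stop the whole loop with an exception. The sketch below shows one defensive way to parse a response. It assumes the payload has the same shape that `getNearbyVenues` relies on (`response` → `groups` → `items` → `venue`); the helper name `extract_venues` is illustrative.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload.

    Missing keys yield an empty list or None fields instead of raising.
    """
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or [{}]  # tolerate venues with no category
        rows.append((
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name"),
        ))
    return rows


# hand-written payload mimicking the keys used above; not real API output
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Clinic",
               "location": {"lat": 43.7, "lng": -79.4},
               "categories": [{"name": "Doctor's Office"}]}},
    {"venue": {"name": "Uncategorised Venue",
               "location": {"lat": 43.6, "lng": -79.3},
               "categories": []}},
]}]}}
rows = extract_venues(sample)
```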

In [None]:
# generating venue data from foursquare API

names=df['School Name']
latitudes=df['Lat']
longitudes=df['Lng']

school_venues = getNearbyVenues(names, 
                                latitudes,
                                longitudes)

In [None]:
# saving venue information

school_venues.to_csv('school_venues.csv')
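
Saving with the default settings writes the dataframe index as an extra unnamed column, which is why that column has to be dropped again after reading the file back. A small sketch of the alternative, using an in-memory buffer and made-up rows in place of the real file, is:

```python
import io

import pandas as pd

# tiny stand-in for the venue dataframe
venues = pd.DataFrame({
    "School Name": ["St Rene Goupil C.S.", "Da Vinci School"],
    "Venue": ["Silver Star Medical Centre", "Emkiro Health Services"],
})

buffer = io.StringIO()
venues.to_csv(buffer, index=False)  # index=False keeps the row index out of the file
buffer.seek(0)
reloaded = pd.read_csv(buffer)      # no extra index column to drop afterwards
```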

In [12]:
# reading venue information

school_venues = pd.read_csv('school_venues.csv')

In [13]:
# displaying venue information

school_venues

Unnamed: 0.1,Unnamed: 0,School Name,School Lat,School Lng,Venue,Venue Latitude,Venue Longitude,Venue Category
0,0,St Rene Goupil C.S.,43.819348,-79.288303,Silver Star Medical Centre,43.815107,-79.294900,Doctor's Office
1,1,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Albert S.Y. Ng & Associates (Optometrist),43.804021,-79.286738,Doctor's Office
2,2,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Norman Bethune C.I.,43.811589,-79.319662,Doctor's Office
3,3,St Rene Goupil C.S.,43.819348,-79.288303,MCI The Doctor's Office (Bamburgh),43.814874,-79.323283,Doctor's Office
4,4,Da Vinci School,43.658953,-79.401910,Totum Life Science King,43.645119,-79.395883,Doctor's Office
...,...,...,...,...,...,...,...,...
12004,12004,East York Alternative S.S.,43.694991,-79.325296,Appletree Medical Centre,43.713675,-79.306989,Doctor's Office
12005,12005,East York Alternative S.S.,43.694991,-79.325296,Crescent Town Health Centre,43.696474,-79.293405,Doctor's Office
12006,12006,East York Alternative S.S.,43.694991,-79.325296,German Roots in Russia PGS,43.674828,-79.342977,Doctor's Office
12007,12007,East York Alternative S.S.,43.694991,-79.325296,GoodLife Fitness Toronto Danforth and Pape,43.678545,-79.345305,Doctor's Office


In [14]:
# dropping extra index column from saved file

school_venues.drop(school_venues.columns[0], axis = 1, inplace = True)

In [15]:
# inspecting the first rows for venues that are not medical clinics (e.g. gyms) and would make the data dirty

school_venues.head(10)

Unnamed: 0,School Name,School Lat,School Lng,Venue,Venue Latitude,Venue Longitude,Venue Category
0,St Rene Goupil C.S.,43.819348,-79.288303,Silver Star Medical Centre,43.815107,-79.2949,Doctor's Office
1,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Albert S.Y. Ng & Associates (Optometrist),43.804021,-79.286738,Doctor's Office
2,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Norman Bethune C.I.,43.811589,-79.319662,Doctor's Office
3,St Rene Goupil C.S.,43.819348,-79.288303,MCI The Doctor's Office (Bamburgh),43.814874,-79.323283,Doctor's Office
4,Da Vinci School,43.658953,-79.40191,Totum Life Science King,43.645119,-79.395883,Doctor's Office
5,Da Vinci School,43.658953,-79.40191,GoodLife Fitness Toronto 137 Yonge Street,43.651242,-79.378068,Gym
6,Da Vinci School,43.658953,-79.40191,GoodLife Fitness Toronto Bell Trinity Centre,43.653436,-79.382314,Doctor's Office
7,Da Vinci School,43.658953,-79.40191,GoodLife Fitness Toronto Richmond and John,43.648754,-79.39194,Gym
8,Da Vinci School,43.658953,-79.40191,GoodLife Fitness Toronto Bloor and Bay,43.669863,-79.390176,Doctor's Office
9,Da Vinci School,43.658953,-79.40191,Emkiro Health Services,43.646784,-79.384264,Doctor's Office


In [16]:
# creating new column with frequency counts of all venues per school

school_venues['freq'] = school_venues.groupby('School Name')['School Name'].transform('count')
school_venues

Unnamed: 0,School Name,School Lat,School Lng,Venue,Venue Latitude,Venue Longitude,Venue Category,freq
0,St Rene Goupil C.S.,43.819348,-79.288303,Silver Star Medical Centre,43.815107,-79.294900,Doctor's Office,4
1,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Albert S.Y. Ng & Associates (Optometrist),43.804021,-79.286738,Doctor's Office,4
2,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Norman Bethune C.I.,43.811589,-79.319662,Doctor's Office,4
3,St Rene Goupil C.S.,43.819348,-79.288303,MCI The Doctor's Office (Bamburgh),43.814874,-79.323283,Doctor's Office,4
4,Da Vinci School,43.658953,-79.401910,Totum Life Science King,43.645119,-79.395883,Doctor's Office,69
...,...,...,...,...,...,...,...,...
12004,East York Alternative S.S.,43.694991,-79.325296,Appletree Medical Centre,43.713675,-79.306989,Doctor's Office,13
12005,East York Alternative S.S.,43.694991,-79.325296,Crescent Town Health Centre,43.696474,-79.293405,Doctor's Office,13
12006,East York Alternative S.S.,43.694991,-79.325296,German Roots in Russia PGS,43.674828,-79.342977,Doctor's Office,13
12007,East York Alternative S.S.,43.694991,-79.325296,GoodLife Fitness Toronto Danforth and Pape,43.678545,-79.345305,Doctor's Office,13


In [17]:
# assigning the venue dataframe to a shorter name, df2 (same object, not a copy)

df2 = school_venues
df2

Unnamed: 0,School Name,School Lat,School Lng,Venue,Venue Latitude,Venue Longitude,Venue Category,freq
0,St Rene Goupil C.S.,43.819348,-79.288303,Silver Star Medical Centre,43.815107,-79.294900,Doctor's Office,4
1,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Albert S.Y. Ng & Associates (Optometrist),43.804021,-79.286738,Doctor's Office,4
2,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Norman Bethune C.I.,43.811589,-79.319662,Doctor's Office,4
3,St Rene Goupil C.S.,43.819348,-79.288303,MCI The Doctor's Office (Bamburgh),43.814874,-79.323283,Doctor's Office,4
4,Da Vinci School,43.658953,-79.401910,Totum Life Science King,43.645119,-79.395883,Doctor's Office,69
...,...,...,...,...,...,...,...,...
12004,East York Alternative S.S.,43.694991,-79.325296,Appletree Medical Centre,43.713675,-79.306989,Doctor's Office,13
12005,East York Alternative S.S.,43.694991,-79.325296,Crescent Town Health Centre,43.696474,-79.293405,Doctor's Office,13
12006,East York Alternative S.S.,43.694991,-79.325296,German Roots in Russia PGS,43.674828,-79.342977,Doctor's Office,13
12007,East York Alternative S.S.,43.694991,-79.325296,GoodLife Fitness Toronto Danforth and Pape,43.678545,-79.345305,Doctor's Office,13


In [18]:
# dropping all GoodLife Fitness rows (gyms, not medical clinics)

df2 = df2[~df2.Venue.str.startswith('GoodLife Fitness')]
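
Note that the `freq` column was computed in cell 16, before the GoodLife Fitness rows were removed here, so the stored counts may still include those gym rows. If the intent is to count only clinics, one option is to filter first and count afterwards. The sketch below illustrates this on a made-up four-row dataframe; the names `venues`, `clinics`, and `clinic_counts` are illustrative.

```python
import pandas as pd

# made-up rows: school A has one gym and two clinics, school B has one clinic
venues = pd.DataFrame({
    "School Name": ["A", "A", "A", "B"],
    "Venue": ["GoodLife Fitness Toronto X", "Clinic One", "Clinic Two", "Clinic Three"],
})

# filter first; .copy() gives an independent dataframe that is safe to modify
is_gym = venues["Venue"].str.startswith("GoodLife Fitness")
clinics = venues[~is_gym].copy()

# count the remaining venues per school
clinics["freq"] = clinics.groupby("School Name")["Venue"].transform("count")
clinic_counts = clinics.groupby("School Name").size().to_dict()
```

Taking an explicit `.copy()` after the boolean filter also avoids pandas' `SettingWithCopyWarning` when the filtered dataframe is modified later.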

In [None]:
# saving cleaned venue data

df2.to_csv('school_venues_cleaned.csv')

In [19]:
# reading cleaned venue data to new dataframe

df3 = pd.read_csv('school_venues_cleaned.csv')

In [20]:
# displaying cleaned venue data

df3

Unnamed: 0.1,Unnamed: 0,School Name,School Lat,School Lng,Venue,Venue Latitude,Venue Longitude,Venue Category,freq
0,0,St Rene Goupil C.S.,43.819348,-79.288303,Silver Star Medical Centre,43.815107,-79.294900,Doctor's Office,4
1,1,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Albert S.Y. Ng & Associates (Optometrist),43.804021,-79.286738,Doctor's Office,4
2,2,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Norman Bethune C.I.,43.811589,-79.319662,Doctor's Office,4
3,3,St Rene Goupil C.S.,43.819348,-79.288303,MCI The Doctor's Office (Bamburgh),43.814874,-79.323283,Doctor's Office,4
4,4,Da Vinci School,43.658953,-79.401910,Totum Life Science King,43.645119,-79.395883,Doctor's Office,69
...,...,...,...,...,...,...,...,...,...
11495,12003,East York Alternative S.S.,43.694991,-79.325296,Dawes Family Practice & Walk-In,43.689467,-79.296951,Doctor's Office,13
11496,12004,East York Alternative S.S.,43.694991,-79.325296,Appletree Medical Centre,43.713675,-79.306989,Doctor's Office,13
11497,12005,East York Alternative S.S.,43.694991,-79.325296,Crescent Town Health Centre,43.696474,-79.293405,Doctor's Office,13
11498,12006,East York Alternative S.S.,43.694991,-79.325296,German Roots in Russia PGS,43.674828,-79.342977,Doctor's Office,13


In [21]:
# dropping extra index column from saved file

df3.drop(df3.columns[0:1], axis = 1, inplace = True)

df3

Unnamed: 0,School Name,School Lat,School Lng,Venue,Venue Latitude,Venue Longitude,Venue Category,freq
0,St Rene Goupil C.S.,43.819348,-79.288303,Silver Star Medical Centre,43.815107,-79.294900,Doctor's Office,4
1,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Albert S.Y. Ng & Associates (Optometrist),43.804021,-79.286738,Doctor's Office,4
2,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Norman Bethune C.I.,43.811589,-79.319662,Doctor's Office,4
3,St Rene Goupil C.S.,43.819348,-79.288303,MCI The Doctor's Office (Bamburgh),43.814874,-79.323283,Doctor's Office,4
4,Da Vinci School,43.658953,-79.401910,Totum Life Science King,43.645119,-79.395883,Doctor's Office,69
...,...,...,...,...,...,...,...,...
11495,East York Alternative S.S.,43.694991,-79.325296,Dawes Family Practice & Walk-In,43.689467,-79.296951,Doctor's Office,13
11496,East York Alternative S.S.,43.694991,-79.325296,Appletree Medical Centre,43.713675,-79.306989,Doctor's Office,13
11497,East York Alternative S.S.,43.694991,-79.325296,Crescent Town Health Centre,43.696474,-79.293405,Doctor's Office,13
11498,East York Alternative S.S.,43.694991,-79.325296,German Roots in Russia PGS,43.674828,-79.342977,Doctor's Office,13


In [22]:
# dropping columns to clean this dataframe so it can be merged with original dataframe

df2 = df2.drop(['School Lat', 'School Lng', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'], axis = 1)  # reassigning instead of dropping in place avoids SettingWithCopyWarning on the filtered dataframe
df2

Unnamed: 0,School Name,freq
0,St Rene Goupil C.S.,4
1,St Rene Goupil C.S.,4
2,St Rene Goupil C.S.,4
3,St Rene Goupil C.S.,4
4,Da Vinci School,69
...,...,...
12003,East York Alternative S.S.,13
12004,East York Alternative S.S.,13
12005,East York Alternative S.S.,13
12006,East York Alternative S.S.,13


In [23]:
# dropping duplicates and resetting index

df2 = df2.drop_duplicates().reset_index(drop = True)  # reassigning avoids SettingWithCopyWarning
df2

Unnamed: 0,School Name,freq
0,St Rene Goupil C.S.,4
1,Da Vinci School,69
2,Ecole elementaire Paul-Demers,11
3,Greenwood S.S.,17
4,Alpha II Alternative School,22
...,...,...
801,South East Year Round Alternative,4
802,Monsignor Fraser College - Midland,10
803,Monsignor Fraser College - St. Martin,67
804,Victoria Park Elementary,11


In [24]:
# displaying original dataframe

df

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng
0,St Rene Goupil C.S.,95,57.9,57.9,43.819348,-79.288303
1,Da Vinci School,47,57.4,59.6,43.658953,-79.401910
2,Ecole elementaire Paul-Demers,66,53.0,60.6,43.791952,-79.367340
3,Greenwood S.S.,123,61.8,64.2,43.682839,-79.334090
4,Alpha II Alternative School,19,52.6,68.4,43.659557,-79.436283
...,...,...,...,...,...,...
801,South East Year Round Alternative,27,92.6,100.0,43.728101,-79.256689
802,Monsignor Fraser College - Midland,16,87.5,100.0,43.801069,-79.285717
803,Monsignor Fraser College - St. Martin,16,62.5,100.0,43.667175,-79.364426
804,Victoria Park Elementary,92,98.9,100.0,43.712689,-79.298151


In [25]:
# merging dataframes to get frequency counts of nearby clinics for each school

result = pd.merge(df,
                 df2[['School Name', 'freq']],
                 on='School Name',
                 how='left')

df = result
df

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq
0,St Rene Goupil C.S.,95,57.9,57.9,43.819348,-79.288303,4
1,Da Vinci School,47,57.4,59.6,43.658953,-79.401910,69
2,Ecole elementaire Paul-Demers,66,53.0,60.6,43.791952,-79.367340,11
3,Greenwood S.S.,123,61.8,64.2,43.682839,-79.334090,17
4,Alpha II Alternative School,19,52.6,68.4,43.659557,-79.436283,22
...,...,...,...,...,...,...,...
801,South East Year Round Alternative,27,92.6,100.0,43.728101,-79.256689,4
802,Monsignor Fraser College - Midland,16,87.5,100.0,43.801069,-79.285717,10
803,Monsignor Fraser College - St. Martin,16,62.5,100.0,43.667175,-79.364426,67
804,Victoria Park Elementary,92,98.9,100.0,43.712689,-79.298151,11
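
Because this is a left merge, any school for which no venues were returned would end up with a missing `freq` value, and any duplicated school name would silently multiply rows. The sketch below shows how the `validate` and `indicator` options of `DataFrame.merge` can surface both cases. The three-school data is made up for illustration.

```python
import pandas as pd

# made-up data: school C has no venue count
schools = pd.DataFrame({"School Name": ["A", "B", "C"],
                        "Enrolled population": [100, 200, 50]})
counts = pd.DataFrame({"School Name": ["A", "B"], "freq": [4, 10]})

merged = schools.merge(
    counts,
    on="School Name",
    how="left",
    validate="one_to_one",  # raises MergeError if a school name is duplicated on either side
    indicator=True,         # adds a _merge column recording where each row was found
)

# schools present in the immunization data but absent from the venue counts
unmatched = merged.loc[merged["_merge"] == "left_only", "School Name"].tolist()
```

Rows listed in `unmatched` have `NaN` in `freq`, so any later division by `freq` would also produce `NaN` for them.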


In [26]:
# averaging the two coverage rates: some schools have very high MMR coverage but very low DTP coverage (or vice versa), so the mean gives an approximate overall vaccination rate per school

cols = ['DTP coverage rate (%)', 'MMR coverage rate (%)']

df['Average vaccine coverage rate (%)'] = df[cols].astype(float).mean(axis=1)

df

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%)
0,St Rene Goupil C.S.,95,57.9,57.9,43.819348,-79.288303,4,57.90
1,Da Vinci School,47,57.4,59.6,43.658953,-79.401910,69,58.50
2,Ecole elementaire Paul-Demers,66,53.0,60.6,43.791952,-79.367340,11,56.80
3,Greenwood S.S.,123,61.8,64.2,43.682839,-79.334090,17,63.00
4,Alpha II Alternative School,19,52.6,68.4,43.659557,-79.436283,22,60.50
...,...,...,...,...,...,...,...,...
801,South East Year Round Alternative,27,92.6,100.0,43.728101,-79.256689,4,96.30
802,Monsignor Fraser College - Midland,16,87.5,100.0,43.801069,-79.285717,10,93.75
803,Monsignor Fraser College - St. Martin,16,62.5,100.0,43.667175,-79.364426,67,81.25
804,Victoria Park Elementary,92,98.9,100.0,43.712689,-79.298151,11,99.45


In [27]:
# finding the percentage of students that are not vaccinated (100 minus the average coverage rate)

df['Average vaccine noncoverage rate (%)'] = 100 - df['Average vaccine coverage rate (%)']

df

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%),Average vaccine noncoverage rate (%)
0,St Rene Goupil C.S.,95,57.9,57.9,43.819348,-79.288303,4,57.90,42.10
1,Da Vinci School,47,57.4,59.6,43.658953,-79.401910,69,58.50,41.50
2,Ecole elementaire Paul-Demers,66,53.0,60.6,43.791952,-79.367340,11,56.80,43.20
3,Greenwood S.S.,123,61.8,64.2,43.682839,-79.334090,17,63.00,37.00
4,Alpha II Alternative School,19,52.6,68.4,43.659557,-79.436283,22,60.50,39.50
...,...,...,...,...,...,...,...,...,...
801,South East Year Round Alternative,27,92.6,100.0,43.728101,-79.256689,4,96.30,3.70
802,Monsignor Fraser College - Midland,16,87.5,100.0,43.801069,-79.285717,10,93.75,6.25
803,Monsignor Fraser College - St. Martin,16,62.5,100.0,43.667175,-79.364426,67,81.25,18.75
804,Victoria Park Elementary,92,98.9,100.0,43.712689,-79.298151,11,99.45,0.55


In [28]:
# changing nonvaccinated student percentage into a ratio

df['Average vaccine noncoverage ratio'] = df['Average vaccine noncoverage rate (%)']/100

df

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%),Average vaccine noncoverage rate (%),Average vaccine noncoverage ratio
0,St Rene Goupil C.S.,95,57.9,57.9,43.819348,-79.288303,4,57.90,42.10,0.4210
1,Da Vinci School,47,57.4,59.6,43.658953,-79.401910,69,58.50,41.50,0.4150
2,Ecole elementaire Paul-Demers,66,53.0,60.6,43.791952,-79.367340,11,56.80,43.20,0.4320
3,Greenwood S.S.,123,61.8,64.2,43.682839,-79.334090,17,63.00,37.00,0.3700
4,Alpha II Alternative School,19,52.6,68.4,43.659557,-79.436283,22,60.50,39.50,0.3950
...,...,...,...,...,...,...,...,...,...,...
801,South East Year Round Alternative,27,92.6,100.0,43.728101,-79.256689,4,96.30,3.70,0.0370
802,Monsignor Fraser College - Midland,16,87.5,100.0,43.801069,-79.285717,10,93.75,6.25,0.0625
803,Monsignor Fraser College - St. Martin,16,62.5,100.0,43.667175,-79.364426,67,81.25,18.75,0.1875
804,Victoria Park Elementary,92,98.9,100.0,43.712689,-79.298151,11,99.45,0.55,0.0055


In [29]:
# estimating total number of nonvaccinated students per school

df['Nonvaccinated students'] = df['Enrolled population']*df['Average vaccine noncoverage ratio']

df

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%),Average vaccine noncoverage rate (%),Average vaccine noncoverage ratio,Nonvaccinated students
0,St Rene Goupil C.S.,95,57.9,57.9,43.819348,-79.288303,4,57.90,42.10,0.4210,39.995
1,Da Vinci School,47,57.4,59.6,43.658953,-79.401910,69,58.50,41.50,0.4150,19.505
2,Ecole elementaire Paul-Demers,66,53.0,60.6,43.791952,-79.367340,11,56.80,43.20,0.4320,28.512
3,Greenwood S.S.,123,61.8,64.2,43.682839,-79.334090,17,63.00,37.00,0.3700,45.510
4,Alpha II Alternative School,19,52.6,68.4,43.659557,-79.436283,22,60.50,39.50,0.3950,7.505
...,...,...,...,...,...,...,...,...,...,...,...
801,South East Year Round Alternative,27,92.6,100.0,43.728101,-79.256689,4,96.30,3.70,0.0370,0.999
802,Monsignor Fraser College - Midland,16,87.5,100.0,43.801069,-79.285717,10,93.75,6.25,0.0625,1.000
803,Monsignor Fraser College - St. Martin,16,62.5,100.0,43.667175,-79.364426,67,81.25,18.75,0.1875,3.000
804,Victoria Park Elementary,92,98.9,100.0,43.712689,-79.298151,11,99.45,0.55,0.0055,0.506


In [30]:
# finding number of nonvaccinated students per clinic to understand overall demand for medical clinics in the nearby area

df['Nonvaccinated students/clinic'] = df['Nonvaccinated students']/df['freq']

df

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%),Average vaccine noncoverage rate (%),Average vaccine noncoverage ratio,Nonvaccinated students,Nonvaccinated students/clinic
0,St Rene Goupil C.S.,95,57.9,57.9,43.819348,-79.288303,4,57.90,42.10,0.4210,39.995,9.998750
1,Da Vinci School,47,57.4,59.6,43.658953,-79.401910,69,58.50,41.50,0.4150,19.505,0.282681
2,Ecole elementaire Paul-Demers,66,53.0,60.6,43.791952,-79.367340,11,56.80,43.20,0.4320,28.512,2.592000
3,Greenwood S.S.,123,61.8,64.2,43.682839,-79.334090,17,63.00,37.00,0.3700,45.510,2.677059
4,Alpha II Alternative School,19,52.6,68.4,43.659557,-79.436283,22,60.50,39.50,0.3950,7.505,0.341136
...,...,...,...,...,...,...,...,...,...,...,...,...
801,South East Year Round Alternative,27,92.6,100.0,43.728101,-79.256689,4,96.30,3.70,0.0370,0.999,0.249750
802,Monsignor Fraser College - Midland,16,87.5,100.0,43.801069,-79.285717,10,93.75,6.25,0.0625,1.000,0.100000
803,Monsignor Fraser College - St. Martin,16,62.5,100.0,43.667175,-79.364426,67,81.25,18.75,0.1875,3.000,0.044776
804,Victoria Park Elementary,92,98.9,100.0,43.712689,-79.298151,11,99.45,0.55,0.0055,0.506,0.046000
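
The steps in cells 26 to 30 can be collapsed into a single formula: nonvaccinated students per clinic = enrolled population × (100 − average coverage) / 100 ÷ number of nearby clinics. The sketch below works through it for St Rene Goupil C.S. using the values from the table above; the function name `unvaccinated_per_clinic` is illustrative.

```python
def unvaccinated_per_clinic(enrolled, dtp_rate, mmr_rate, clinics):
    """Estimated number of nonvaccinated students per nearby clinic."""
    avg_coverage = (dtp_rate + mmr_rate) / 2        # cell 26
    noncoverage_ratio = (100 - avg_coverage) / 100  # cells 27 and 28
    unvaccinated = enrolled * noncoverage_ratio     # cell 29
    return unvaccinated / clinics                   # cell 30


# St Rene Goupil C.S.: 95 students, 57.9% DTP, 57.9% MMR, 4 nearby venues
value = unvaccinated_per_clinic(95, 57.9, 57.9, 4)
print(round(value, 2))
```

This gives roughly 10 nonvaccinated students per clinic, in line with the first row of the table.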


In [31]:
# rounding values so they display nicely on the map

df['Nonvaccinated students'] = df['Nonvaccinated students'].round(2)
df['Nonvaccinated students/clinic'] = df['Nonvaccinated students/clinic'].round(2)

df

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%),Average vaccine noncoverage rate (%),Average vaccine noncoverage ratio,Nonvaccinated students,Nonvaccinated students/clinic
0,St Rene Goupil C.S.,95,57.9,57.9,43.819348,-79.288303,4,57.90,42.10,0.4210,40.00,10.00
1,Da Vinci School,47,57.4,59.6,43.658953,-79.401910,69,58.50,41.50,0.4150,19.50,0.28
2,Ecole elementaire Paul-Demers,66,53.0,60.6,43.791952,-79.367340,11,56.80,43.20,0.4320,28.51,2.59
3,Greenwood S.S.,123,61.8,64.2,43.682839,-79.334090,17,63.00,37.00,0.3700,45.51,2.68
4,Alpha II Alternative School,19,52.6,68.4,43.659557,-79.436283,22,60.50,39.50,0.3950,7.51,0.34
...,...,...,...,...,...,...,...,...,...,...,...,...
801,South East Year Round Alternative,27,92.6,100.0,43.728101,-79.256689,4,96.30,3.70,0.0370,1.00,0.25
802,Monsignor Fraser College - Midland,16,87.5,100.0,43.801069,-79.285717,10,93.75,6.25,0.0625,1.00,0.10
803,Monsignor Fraser College - St. Martin,16,62.5,100.0,43.667175,-79.364426,67,81.25,18.75,0.1875,3.00,0.04
804,Victoria Park Elementary,92,98.9,100.0,43.712689,-79.298151,11,99.45,0.55,0.0055,0.51,0.05


In [32]:
# displaying venue dataframe

df3

Unnamed: 0,School Name,School Lat,School Lng,Venue,Venue Latitude,Venue Longitude,Venue Category,freq
0,St Rene Goupil C.S.,43.819348,-79.288303,Silver Star Medical Centre,43.815107,-79.294900,Doctor's Office,4
1,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Albert S.Y. Ng & Associates (Optometrist),43.804021,-79.286738,Doctor's Office,4
2,St Rene Goupil C.S.,43.819348,-79.288303,Dr. Norman Bethune C.I.,43.811589,-79.319662,Doctor's Office,4
3,St Rene Goupil C.S.,43.819348,-79.288303,MCI The Doctor's Office (Bamburgh),43.814874,-79.323283,Doctor's Office,4
4,Da Vinci School,43.658953,-79.401910,Totum Life Science King,43.645119,-79.395883,Doctor's Office,69
...,...,...,...,...,...,...,...,...
11495,East York Alternative S.S.,43.694991,-79.325296,Dawes Family Practice & Walk-In,43.689467,-79.296951,Doctor's Office,13
11496,East York Alternative S.S.,43.694991,-79.325296,Appletree Medical Centre,43.713675,-79.306989,Doctor's Office,13
11497,East York Alternative S.S.,43.694991,-79.325296,Crescent Town Health Centre,43.696474,-79.293405,Doctor's Office,13
11498,East York Alternative S.S.,43.694991,-79.325296,German Roots in Russia PGS,43.674828,-79.342977,Doctor's Office,13


In [33]:
# cleaning venue dataframe

df3.drop(['School Name', 'School Lat', 'School Lng', 'Venue Category', 'freq'], axis = 1, inplace = True)
df3

Unnamed: 0,Venue,Venue Latitude,Venue Longitude
0,Silver Star Medical Centre,43.815107,-79.294900
1,Dr. Albert S.Y. Ng & Associates (Optometrist),43.804021,-79.286738
2,Dr. Norman Bethune C.I.,43.811589,-79.319662
3,MCI The Doctor's Office (Bamburgh),43.814874,-79.323283
4,Totum Life Science King,43.645119,-79.395883
...,...,...,...
11495,Dawes Family Practice & Walk-In,43.689467,-79.296951
11496,Appletree Medical Centre,43.713675,-79.306989
11497,Crescent Town Health Centre,43.696474,-79.293405
11498,German Roots in Russia PGS,43.674828,-79.342977


In [34]:
# dropping duplicates and resetting index

df3.drop_duplicates(inplace = True)
df3.reset_index(drop = True, inplace = True)
df3

Unnamed: 0,Venue,Venue Latitude,Venue Longitude
0,Silver Star Medical Centre,43.815107,-79.294900
1,Dr. Albert S.Y. Ng & Associates (Optometrist),43.804021,-79.286738
2,Dr. Norman Bethune C.I.,43.811589,-79.319662
3,MCI The Doctor's Office (Bamburgh),43.814874,-79.323283
4,Totum Life Science King,43.645119,-79.395883
...,...,...,...
390,MCI The Doctor's Office At The Clinic In Walmart,43.835784,-79.255757
391,Dr. Sriharan,43.810699,-79.255699
392,Dr. R.S. Rana's office,43.820171,-79.261795
393,Dr. Judith Nacua M.D,43.798225,-79.216995


In [35]:
# creating map of Toronto with school and venue data
map_toronto_2 = folium.Map(location=[latitude, longitude], zoom_start=10.7)

# adding school markers to map
for lat, lng, school, non in zip(df['Lat'], df['Lng'], df['School Name'], df['Nonvaccinated students/clinic']):
    label = ("{}, Estimated nonvaccinated students/clinic: {}").format(school, non)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_2)
    

# adding venue markers to map    
for lat, lng, venue in zip(df3["Venue Latitude"], df3["Venue Longitude"], df3["Venue"]):
    label = ("{}").format(venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='red',
        fill=True,
        fill_color='#f76f6a',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_2)
    
# generating map    
map_toronto_2

### Figure 2. Map showing estimated nonvaccinated students/clinic for each school, and every medical clinic within 3 km of each school in Toronto. The blue markers represent the schools and the red markers represent the medical clinics. 

In [36]:
# sorting dataframe by nonvaccinated students to see the schools with the greatest population of nonvaccinated students

df = df.sort_values(['Nonvaccinated students'], ascending=False).reset_index(drop=True)

df

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%),Average vaccine noncoverage rate (%),Average vaccine noncoverage ratio,Nonvaccinated students,Nonvaccinated students/clinic
0,Michael Power/St Joseph H.S.,1818,84.6,95.5,43.659290,-79.581984,5,90.05,9.95,0.0995,180.89,36.18
1,Marc Garneau C.I.,1629,83.2,95.0,43.708976,-79.334968,10,89.10,10.90,0.1090,177.56,17.76
2,Bishop Allen Academy Catholic S.S.,1540,85.1,93.6,43.634428,-79.504465,11,89.35,10.65,0.1065,164.01,14.91
3,Northview Heights S.S.,1490,84.6,93.7,43.774429,-79.446325,14,89.15,10.85,0.1085,161.66,11.55
4,Northern S.S.,1678,85.3,95.8,43.710537,-79.390164,30,90.55,9.45,0.0945,158.57,5.29
...,...,...,...,...,...,...,...,...,...,...,...,...
801,South East Year Round Alternative,27,92.6,100.0,43.728101,-79.256689,4,96.30,3.70,0.0370,1.00,0.25
802,Monsignor Fraser College - Midland,16,87.5,100.0,43.801069,-79.285717,10,93.75,6.25,0.0625,1.00,0.10
803,Victoria Park Elementary,92,98.9,100.0,43.712689,-79.298151,11,99.45,0.55,0.0055,0.51,0.05
804,Kimberley Jr P.S.,160,100.0,100.0,43.682620,-79.299128,13,100.00,0.00,0.0000,0.00,0.00
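
The derived columns in the table above appear to follow from the raw columns by simple arithmetic. The sketch below reproduces the Michael Power/St Joseph H.S. row in plain Python. The formulas are inferred from the values shown in the table, not quoted from the cell that created the columns. The variable names are illustrative, and `freq` is assumed to be the number of clinics found near the school.

```python
# Values copied from the Michael Power/St Joseph H.S. row of the table above
enrolled = 1818
dtp_rate = 84.6     # DTP coverage rate (%)
mmr_rate = 95.5     # MMR coverage rate (%)
clinics_nearby = 5  # the "freq" column (assumed to be the nearby clinic count)

# Assumed relationships, inferred from the table values
avg_coverage = (dtp_rate + mmr_rate) / 2
noncoverage_ratio = (100 - avg_coverage) / 100
nonvaccinated = enrolled * noncoverage_ratio
nonvaccinated_per_clinic = nonvaccinated / clinics_nearby

print(round(avg_coverage, 2), round(nonvaccinated, 2), round(nonvaccinated_per_clinic, 2))
```

Rounded to two decimals, the results match the table values for this row: 90.05, 180.89 and 36.18.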


In [37]:
# sorting dataframe by nonvaccinated students/clinic to see the schools with the highest ratio of unvaccinated students per clinic

df = df.sort_values(['Nonvaccinated students/clinic'], ascending=False).reset_index(drop=True)

df

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%),Average vaccine noncoverage rate (%),Average vaccine noncoverage ratio,Nonvaccinated students,Nonvaccinated students/clinic
0,Monsignor Percy Johnson Catholic H.S.,936,84.9,94.7,43.720695,-79.572017,1,89.80,10.20,0.1020,95.47,95.47
1,Josyf Cardinal Slipyj C.S.,395,84.8,84.8,43.659475,-79.566025,1,84.80,15.20,0.1520,60.04,60.04
2,St. Basil The Great College,1258,86.2,95.1,43.727168,-79.533451,2,90.65,9.35,0.0935,117.62,58.81
3,St. Mother Teresa Catholic Academy,413,83.5,90.1,43.807243,-79.217789,1,86.80,13.20,0.1320,54.52,54.52
4,St Benedict C.S.,436,86.5,89.4,43.720695,-79.572017,1,87.95,12.05,0.1205,52.54,52.54
...,...,...,...,...,...,...,...,...,...,...,...,...
801,Monsignor Fraser College - St. Martin,16,62.5,100.0,43.667175,-79.364426,67,81.25,18.75,0.1875,3.00,0.04
802,Rosedale Jr P.S.,151,98.7,98.7,43.677656,-79.381686,53,98.70,1.30,0.0130,1.96,0.04
803,Montrose Jr P.S.,112,99.1,98.2,43.658682,-79.418844,53,98.65,1.35,0.0135,1.51,0.03
804,Kimberley Jr P.S.,160,100.0,100.0,43.682620,-79.299128,13,100.00,0.00,0.0000,0.00,0.00


In [38]:
# checking dataframe to see any trends

df.head(60)

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%),Average vaccine noncoverage rate (%),Average vaccine noncoverage ratio,Nonvaccinated students,Nonvaccinated students/clinic
0,Monsignor Percy Johnson Catholic H.S.,936,84.9,94.7,43.720695,-79.572017,1,89.8,10.2,0.102,95.47,95.47
1,Josyf Cardinal Slipyj C.S.,395,84.8,84.8,43.659475,-79.566025,1,84.8,15.2,0.152,60.04,60.04
2,St. Basil The Great College,1258,86.2,95.1,43.727168,-79.533451,2,90.65,9.35,0.0935,117.62,58.81
3,St. Mother Teresa Catholic Academy,413,83.5,90.1,43.807243,-79.217789,1,86.8,13.2,0.132,54.52,54.52
4,St Benedict C.S.,436,86.5,89.4,43.720695,-79.572017,1,87.95,12.05,0.1205,52.54,52.54
5,St John Vianney C.S.,282,86.9,86.2,43.737976,-79.553846,1,86.55,13.45,0.1345,37.93,37.93
6,Michael Power/St Joseph H.S.,1818,84.6,95.5,43.65929,-79.581984,5,90.05,9.95,0.0995,180.89,36.18
7,Hollycrest Middle School,475,90.7,94.3,43.656104,-79.582708,1,92.5,7.5,0.075,35.62,35.62
8,Father John Redmond Catholic S.S.,1187,83.0,94.4,43.595141,-79.516539,4,88.7,11.3,0.113,134.13,33.53
9,Richview C.I.,1035,84.8,96.8,43.678869,-79.538912,3,90.8,9.2,0.092,95.22,31.74


In [39]:
# checking bottom of dataframe: a considerable number of schools here have small enrolled populations;
# some of these may be schools with more stringent vaccination requirements

df.tail(60)

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%),Average vaccine noncoverage rate (%),Average vaccine noncoverage ratio,Nonvaccinated students,Nonvaccinated students/clinic
746,Harrison P.S.,102,97.1,97.1,43.757368,-79.377309,14,97.1,2.9,0.029,2.96,0.21
747,Spectrum Alternative Sr,60,88.3,91.7,43.699276,-79.394896,28,90.0,10.0,0.1,6.0,0.21
748,George Anderson P.S.,91,100.0,96.7,43.705454,-79.478948,7,98.35,1.65,0.0165,1.5,0.21
749,Monsignor Fraser College - Midland North,28,89.3,96.4,43.801069,-79.285717,10,92.85,7.15,0.0715,2.0,0.2
750,Whitney Jr P.S.,161,95.7,95.7,43.68716,-79.378397,37,95.7,4.3,0.043,6.92,0.19
751,Nelson Mandela Park P.S.,234,94.0,94.9,43.658353,-79.360603,67,94.45,5.55,0.0555,12.99,0.19
752,Clinton Street Jr P.S.,273,95.6,96.0,43.657646,-79.414058,60,95.8,4.2,0.042,11.47,0.19
753,Quest Alternative Sr,66,86.4,89.4,43.67046,-79.35213,43,87.9,12.1,0.121,7.99,0.19
754,Orde Street Jr P.S.,243,96.3,92.6,43.658627,-79.39226,71,94.45,5.55,0.0555,13.49,0.19
755,St Francis of Assisi C.S.,123,87.8,93.5,43.6562,-79.414628,61,90.65,9.35,0.0935,11.5,0.19


In [40]:
# slicing dataframe to show only top 10% of schools 

topdecile = df[0:80]
topdecile

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%),Average vaccine noncoverage rate (%),Average vaccine noncoverage ratio,Nonvaccinated students,Nonvaccinated students/clinic
0,Monsignor Percy Johnson Catholic H.S.,936,84.9,94.7,43.720695,-79.572017,1,89.80,10.20,0.1020,95.47,95.47
1,Josyf Cardinal Slipyj C.S.,395,84.8,84.8,43.659475,-79.566025,1,84.80,15.20,0.1520,60.04,60.04
2,St. Basil The Great College,1258,86.2,95.1,43.727168,-79.533451,2,90.65,9.35,0.0935,117.62,58.81
3,St. Mother Teresa Catholic Academy,413,83.5,90.1,43.807243,-79.217789,1,86.80,13.20,0.1320,54.52,54.52
4,St Benedict C.S.,436,86.5,89.4,43.720695,-79.572017,1,87.95,12.05,0.1205,52.54,52.54
...,...,...,...,...,...,...,...,...,...,...,...,...
75,Satec at W A Porter C.I.,1181,88.0,96.4,43.716593,-79.287270,9,92.20,7.80,0.0780,92.12,10.24
76,Joseph Brant Sr P.S.,459,86.9,90.8,43.767044,-79.175498,5,88.85,11.15,0.1115,51.18,10.24
77,John Polanyi C.I.,743,79.1,87.9,43.717809,-79.439855,12,83.50,16.50,0.1650,122.60,10.22
78,Pierre Laporte Middle School,398,88.2,91.7,43.725075,-79.492124,4,89.95,10.05,0.1005,40.00,10.00


In [41]:
# slicing dataframe to show only bottom 10% of schools
# (the -1 stop excludes the last row, Kimberley Jr P.S., which has 0 nonvaccinated students)

bottomdecile = df[725:-1]
bottomdecile

Unnamed: 0,School Name,Enrolled population,DTP coverage rate (%),MMR coverage rate (%),Lat,Lng,freq,Average vaccine coverage rate (%),Average vaccine noncoverage rate (%),Average vaccine noncoverage ratio,Nonvaccinated students,Nonvaccinated students/clinic
725,Iroquois Jr P.S.,240,99.2,99.2,43.802680,-79.269441,7,99.20,0.80,0.0080,1.92,0.27
726,St Michael C.S.,129,82.9,90.7,43.647869,-79.369620,65,86.80,13.20,0.1320,17.03,0.26
727,Givins/Shaw Jr P.S.,194,92.8,93.3,43.646408,-79.417584,51,93.05,6.95,0.0695,13.48,0.26
728,Subway Academy I,21,71.4,81.0,43.679140,-79.336636,19,76.20,23.80,0.2380,5.00,0.26
729,Ernest P.S.,69,95.7,91.3,43.791415,-79.337383,17,93.50,6.50,0.0650,4.49,0.26
...,...,...,...,...,...,...,...,...,...,...,...,...
800,Native Learning Centre,15,66.7,86.7,43.663608,-79.379178,67,76.70,23.30,0.2330,3.49,0.05
801,Monsignor Fraser College - St. Martin,16,62.5,100.0,43.667175,-79.364426,67,81.25,18.75,0.1875,3.00,0.04
802,Rosedale Jr P.S.,151,98.7,98.7,43.677656,-79.381686,53,98.70,1.30,0.0130,1.96,0.04
803,Montrose Jr P.S.,112,99.1,98.2,43.658682,-79.418844,53,98.65,1.35,0.0135,1.51,0.03
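
The slice bounds `80` and `725` used above are specific to a dataframe of 805 rows. As a sketch of one alternative, the decile size could be derived from the row count instead. `decile_slices` is a hypothetical helper, not part of the original notebook. Unlike the `df[725:-1]` slice, it keeps the final row.

```python
def decile_slices(n_rows):
    """Return (top, bottom) slice objects covering the first and last 10% of rows."""
    k = n_rows // 10  # decile size, rounded down
    return slice(0, k), slice(n_rows - k, n_rows)


top, bottom = decile_slices(805)
print(top, bottom)
# With a sorted dataframe, df.iloc[top] and df.iloc[bottom] would select the two deciles
```

For 805 rows this gives rows 0 to 79 and rows 725 to 804, so the starting index of the bottom decile agrees with the hard-coded value.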


In [42]:
# generating map of Toronto with the top and bottom 10% of schools by nonvaccinated students/clinic
map_deciles = folium.Map(location=[latitude, longitude], zoom_start=10.7)

# adding top-decile markers (blue) to map
for lat, lng, school, non, nsc in zip(topdecile['Lat'], topdecile['Lng'], topdecile['School Name'], topdecile['Nonvaccinated students'], topdecile['Nonvaccinated students/clinic']):
    label = ("{}, Nonvaccinated students: {}, Nonvaccinated students/clinic: {}").format(school, non, nsc)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#40e3f5',
        fill_opacity=0.7,
        parse_html=False).add_to(map_deciles)
    
# adding bottom-decile markers (red) to map
for lat, lng, school, non, nsc in zip(bottomdecile['Lat'], bottomdecile['Lng'], bottomdecile['School Name'], bottomdecile['Nonvaccinated students'], bottomdecile['Nonvaccinated students/clinic']):
    label = ("{}, Nonvaccinated students: {}, Nonvaccinated students/clinic: {}").format(school, non, nsc)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='red',
        fill=True,
        fill_color='#fa7b0c',
        fill_opacity=0.7,
        parse_html=False).add_to(map_deciles)
    
# generating map    
map_deciles

### Figure 3. Map showing the number of nonvaccinated students and nonvaccinated students/clinic for selected schools in Toronto. The blue markers represent the top 10% of schools by nonvaccinated students/clinic, and the red markers represent the bottom 10%. From an investor's standpoint, the areas around the blue markers would be ideal locations to open a medical clinic, while the areas around the red markers would be poor locations with heavy competition.

In [43]:
# dropping all other columns to show only simplified information in dataframe

df.drop(df.columns[1:10], axis = 1, inplace = True)

df

Unnamed: 0,School Name,Nonvaccinated students,Nonvaccinated students/clinic
0,Monsignor Percy Johnson Catholic H.S.,95.47,95.47
1,Josyf Cardinal Slipyj C.S.,60.04,60.04
2,St. Basil The Great College,117.62,58.81
3,St. Mother Teresa Catholic Academy,54.52,54.52
4,St Benedict C.S.,52.54,52.54
...,...,...,...
801,Monsignor Fraser College - St. Martin,3.00,0.04
802,Rosedale Jr P.S.,1.96,0.04
803,Montrose Jr P.S.,1.51,0.03
804,Kimberley Jr P.S.,0.00,0.00


## Data Insights:

This analysis highlights the areas of greatest interest to a prospective investor looking to open a medical clinic in the city of Toronto. It has several limitations, however. First, it assumes a strong correlation between vaccination rates and disease prevalence in the local area; in reality there may only be occasional outbreaks. Second, the Foursquare venue data is not entirely accurate and introduces some noise into the results. For example, many Goodlife Fitness gyms were miscategorized as medical clinics; these have been removed, but other venues may also be miscategorized. Third, the results may be skewed by the very high number of medical clinics near each school in the downtown area.
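
The removal of miscategorized venues mentioned above could be done with a simple name filter. The sketch below works on a plain list of venue names. The exclusion keyword, the helper `is_probable_clinic`, and the sample list (including the illustrative Goodlife row) are assumptions, and the actual cleaning step in the notebook may have differed.

```python
# Keywords that mark a venue as not a medical clinic (hypothetical list)
EXCLUDED_KEYWORDS = ["goodlife"]


def is_probable_clinic(name):
    """Return False when the venue name contains any excluded keyword."""
    lowered = name.lower()
    return not any(keyword in lowered for keyword in EXCLUDED_KEYWORDS)


venues = [
    "Silver Star Medical Centre",
    "Goodlife Fitness",  # illustrative miscategorized entry
    "Appletree Medical Centre",
]
clinics = [name for name in venues if is_probable_clinic(name)]
print(clinics)
```

On the venue dataframe, the same idea could be expressed as `df3[~df3['Venue'].str.contains('goodlife', case=False)]`. A keyword filter only catches known offenders, so some miscategorized venues would likely remain.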

That said, the analysis does reveal some interesting trends. The schools with the highest numbers of nonvaccinated students/clinic tend to be high-population schools in some of the poorer areas of the city. Conversely, the schools with the lowest numbers of nonvaccinated students/clinic tend to be low-population schools, possibly alternative and/or private schools with stringent vaccination requirements. This is supported by the fact that many of these low-population schools are downtown, where schools would normally be expected to have larger populations. The data thus suggests a possible correlation between vaccination rates and income.