### Step 1: Identify the universities in Toronto

In [1]:
import pandas as pd
import numpy as np
import folium
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [2]:
# list of universities in Toronto retrieved from: https://www.unipage.net/en/universities?country=38&city_id=6167865
dataset = pd.read_excel('uni_toronto.xlsx')

# preview of dataset
dataset

|   | University | Country | City | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | University of Toronto | Canada | Toronto | 43.663462 | -79.39776 |
| 1 | York University | Canada | Toronto | 43.779242 | -79.483559 |
| 2 | Ryerson University | Canada | Toronto | 43.658469 | -79.378993 |
| 3 | Humber College | Canada | Toronto | 43.728512 | -79.60625 |
| 4 | Centennial College | Canada | Toronto | 43.785792 | -79.22781 |
| 5 | George Brown College | Canada | Toronto | 6.93121 | -10.91934 |
| 6 | OCAD University | Canada | Toronto | 43.649334 | -79.388887 |
| 7 | Michener Institute for Applied Health Sciences | Canada | Toronto | | |
| 8 | Canadian Memorial Chiropractic College | Canada | Toronto | | |
| 9 | Tyndale College & Seminary | Canada | Toronto | 43.797251 | -79.392617 |


#### 1.1 Using the geopy library to get the latitude and longitude of universities in Toronto

To create an instance of the geocoder, we need to specify a `user_agent`. We will name our agent <em>ca_explorer</em>, as shown below.

In [None]:
address = 'University of Toronto'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of University of Toronto are {}, {}.'.format(latitude, longitude))

Great! We were able to retrieve the latitude and longitude of University of Toronto using geopy's Nominatim geocoder. Let's use a **for** loop to retrieve the coordinates for the remaining universities and add them to the dataframe.
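The loop in the next cell follows a simple pattern: call a geocoder for each university name and only write coordinates back when a result is found. The sketch below isolates that pattern so it can run without network access. The helper `fill_coordinates`, the lookup `fake_geocode` and the row "Unknown College" are hypothetical stand-ins; with geopy, `geocode` could instead wrap `Nominatim(user_agent="ca_explorer").geocode` (ideally through `geopy.extra.rate_limiter.RateLimiter` with `min_delay_seconds=1`) and return `(location.latitude, location.longitude)`.

```python
import numpy as np
import pandas as pd


def fill_coordinates(frame, geocode):
    """Return a copy of `frame` with Latitude/Longitude filled in via `geocode`.

    `geocode` is any callable that takes a name and returns a (lat, lon) tuple,
    or None when the name cannot be resolved; unresolved rows keep their values.
    """
    out = frame.copy()
    for idx, name in out['University'].items():
        result = geocode(name)
        if result is None:
            print(f"{idx}: {name}: no location found")
            continue
        lat, lon = result
        out.loc[idx, 'Latitude'] = lat
        out.loc[idx, 'Longitude'] = lon
        print(f"{idx}: {name}: {lat},{lon}")
    return out


# Hypothetical offline lookup standing in for a real geocoder (values from the table above)
FAKE_RESULTS = {
    'University of Toronto': (43.663462, -79.39776),
    'York University': (43.779242, -79.483559),
}


def fake_geocode(name):
    return FAKE_RESULTS.get(name)


geo_sample = pd.DataFrame({
    'University': ['University of Toronto', 'York University', 'Unknown College'],
    'Latitude': np.nan,
    'Longitude': np.nan,
})
geo_filled = fill_coordinates(geo_sample, fake_geocode)
print(geo_filled)
```

Passing the geocoder in as a parameter keeps the loop logic testable and makes it easy to swap in a rate-limited geocoder later.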

In [None]:
import time  # used to pause between requests

# create the geocoder once, outside the loop, reusing the same user_agent
geolocator = Nominatim(user_agent="ca_explorer")

for index, row in dataset.iterrows():
    address = row['University']
    location = geolocator.geocode(address)
    time.sleep(1)  # Nominatim's usage policy allows at most one request per second

    if location is None:
        # geocoding failed: keep whatever values the row already has
        print(f"{index}: {address}: no location found")
        continue

    print(f"{index}: {address}: {location.latitude},{location.longitude}")
    dataset.loc[index, 'Latitude'] = location.latitude
    dataset.loc[index, 'Longitude'] = location.longitude

# save what we have so far
dataset.to_excel('uni_toronto_edited.xlsx', index=False)

dataset

We were able to get coordinates for most of the universities. A few rows are still missing their latitude and longitude, which is fine: we can add those manually for now. First, let's reload the saved file and see what we currently have.

#### 1.2 Handling missing coordinates

In [4]:
df = pd.read_excel('uni_toronto_edited.xlsx')

# get the rows where either coordinate is missing
df[df[['Latitude', 'Longitude']].isna().any(axis=1)]

|   | University | Country | City | Latitude | Longitude |
|---|---|---|---|---|---|


In [None]:
# coordinates for Michener Institute for Applied Health Sciences: 43.6558896, -79.3910475
df.loc[7, 'Latitude'] = 43.6558896
df.loc[7, 'Longitude'] = -79.3910475

# coordinates for Canadian Memorial Chiropractic College: 43.8039466,-79.3718667
df.loc[8, 'Latitude'] = 43.8039466
df.loc[8, 'Longitude'] = -79.3718667

# coordinates for Regis College University of Toronto: 43.6639847,-79.3902391
df.loc[10, 'Latitude'] = 43.6639847
df.loc[10, 'Longitude'] = -79.3902391

# coordinates for Academy of Design Toronto: 43.696708,-79.396104
df.loc[13, 'Latitude'] = 43.696708
df.loc[13, 'Longitude'] = -79.396104

# coordinates for Sprott Shaw Language College: 43.7076913,-79.3949289
df.loc[15, 'Latitude'] = 43.7076913
df.loc[15, 'Longitude'] = -79.3949289


# coordinates for Upper Career College Business & Technology: 43.7076738,-79.3949037
df.loc[16, 'Latitude'] = 43.7076738
df.loc[16, 'Longitude'] = -79.3949037
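The manual fixes above write each value with `.loc`. As an alternative sketch, assuming the looked-up values are collected in a small DataFrame keyed by row index (the names `missing_sample` and `manual_coords` are illustrative), `DataFrame.fillna` can apply them in one step. It aligns on index and column labels and only fills cells that are still missing, so rows that already have coordinates are left alone.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring rows 6-8 of the tables above (rows 7 and 8 start out missing)
missing_sample = pd.DataFrame(
    {
        'University': ['OCAD University',
                       'Michener Institute for Applied Health Sciences',
                       'Canadian Memorial Chiropractic College'],
        'Latitude': [43.649334, np.nan, np.nan],
        'Longitude': [-79.388887, np.nan, np.nan],
    },
    index=[6, 7, 8],
)

# Manually looked-up coordinates, keyed by the same row index
manual_coords = pd.DataFrame(
    {'Latitude': [43.6558896, 43.8039466], 'Longitude': [-79.3910475, -79.3718667]},
    index=[7, 8],
)

# fillna aligns on index and column labels and only replaces NaN cells
coord_cols = ['Latitude', 'Longitude']
missing_sample[coord_cols] = missing_sample[coord_cols].fillna(manual_coords)
print(missing_sample)
```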

In [5]:
# Let's preview the dataset
df

|   | University | Country | City | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | University of Toronto | Canada | Toronto | 43.663462 | -79.39776 |
| 1 | York University | Canada | Toronto | 43.779242 | -79.483559 |
| 2 | Ryerson University | Canada | Toronto | 43.658469 | -79.378993 |
| 3 | Humber College | Canada | Toronto | 43.728512 | -79.60625 |
| 4 | Centennial College | Canada | Toronto | 43.785792 | -79.22781 |
| 5 | George Brown College | Canada | Toronto | 43.67424 | -79.419903 |
| 6 | OCAD University | Canada | Toronto | 43.649334 | -79.388887 |
| 7 | Michener Institute for Applied Health Sciences | Canada | Toronto | 43.65589 | -79.391047 |
| 8 | Canadian Memorial Chiropractic College | Canada | Toronto | 43.803947 | -79.371867 |
| 9 | Tyndale College & Seminary | Canada | Toronto | 43.797251 | -79.392617 |


Now that all the coordinates are filled in, let's visualise them on a map to check that they are correct.

In [7]:
# create map of Toronto using latitude and longitude values
toronto_latitude = 43.6532
toronto_longitude = -79.3832
map_toronto = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=10.5)

# add markers to map
# draw a circle marker for each university (radius/fill options belong to CircleMarker)
for lat, lng, uni in zip(df['Latitude'], df['Longitude'], df['University']):
    label = folium.Popup(uni, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

If we zoom out a little, we can see several points that fall outside Toronto. Let's fix those, and we should then have a complete dataset of universities.

#### 1.3 Handling incorrect coordinates

The incorrectly placed points are:

- Victoria University
- George Brown College
- Institute for Christian Studies
- International Language Academy of Canada
- Upper Madison College
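The stray points in this list were spotted by eye on the map. As a sketch of how they could be flagged automatically, the code below computes each point's great-circle (haversine) distance from the same centre used for the map, (43.6532, -79.3832), and flags anything farther than a cut-off. The 40 km cut-off and the helper name `haversine_km` are illustrative assumptions, not part of the original analysis; the sample rows reuse York University's coordinates and the misplaced George Brown College value from the first table.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees.

    Works element-wise on scalars, NumPy arrays or pandas Series.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


TORONTO_CENTRE = (43.6532, -79.3832)  # same centre as the map cell
MAX_DISTANCE_KM = 40  # assumed cut-off for "inside Toronto"

dist_sample = pd.DataFrame({
    'University': ['York University', 'George Brown College'],
    'Latitude': [43.779242, 6.93121],
    'Longitude': [-79.483559, -10.91934],
})
dist_sample['dist_km'] = haversine_km(
    dist_sample['Latitude'], dist_sample['Longitude'], *TORONTO_CENTRE
)
outliers = dist_sample[dist_sample['dist_km'] > MAX_DISTANCE_KM]
print(outliers[['University', 'dist_km']])
```

A distance threshold is coarser than checking against the actual city boundary, but it is enough to catch results that land on another continent.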

In [None]:
# coordinates for Victoria University: 43.6585955,-79.4060162
df.loc[12, 'Latitude'] = 43.6585955
df.loc[12, 'Longitude'] = -79.4060162

# coordinates for George Brown College: 43.6742401,-79.4199025
df.loc[5, 'Latitude'] = 43.6742401
df.loc[5, 'Longitude'] = -79.4199025

# coordinates for Institute for Christian Studies: 43.6613561,-79.3970904
df.loc[19, 'Latitude'] = 43.6613561
df.loc[19, 'Longitude'] = -79.3970904

# coordinates for International Language Academy of Canada: 43.6740717,-79.3887625
df.loc[17, 'Latitude'] = 43.6740717
df.loc[17, 'Longitude'] = -79.3887625

# coordinates for Upper Madison College: 43.7066472,-79.3999302
df.loc[18, 'Latitude'] = 43.7066472
df.loc[18, 'Longitude'] = -79.3999302
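Because these rows already contain (wrong) coordinates, the values have to be overwritten rather than filled in. Besides per-cell `.loc` assignments, pandas offers `DataFrame.update`, which modifies a frame in place with every non-missing value from another frame, aligned on index and column labels. A minimal sketch, using the George Brown College correction from the cell above on a two-row toy frame (`fix_sample` and `corrections` are illustrative names):

```python
import pandas as pd

fix_sample = pd.DataFrame(
    {
        'University': ['George Brown College', 'OCAD University'],
        'Latitude': [6.93121, 43.649334],
        'Longitude': [-10.91934, -79.388887],
    },
    index=[5, 6],
)

# Corrected coordinates keyed by row index; rows not listed are left untouched
corrections = pd.DataFrame(
    {'Latitude': [43.6742401], 'Longitude': [-79.4199025]},
    index=[5],
)

fix_sample.update(corrections)  # modifies fix_sample in place and returns None
print(fix_sample)
```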

In [None]:
# Let's preview the data again and see if the locations are correct

toronto_latitude = 43.6532
toronto_longitude = -79.3832
map_toronto = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=10.5)

# draw a circle marker for each university (radius/fill options belong to CircleMarker)
for lat, lng, uni in zip(df['Latitude'], df['Longitude'], df['University']):
    label = folium.Popup(uni, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

Great! We managed to get the coordinates for the universities in Toronto. It took a bit of work to build the dataset, but it will be crucial for our analysis.

#### 1.4 Saving final output

In [None]:
# saving the data to excel
df.to_excel('uni_toronto_final.xlsx', index=False)