<H3>Import Toronto neighborhood data from Wikipedia</H3>

<ul>
    <li>The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood</li>
    <li>Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.</li>
</ul>
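The two rules above can be sketched on a few hand-typed rows before touching the real page. The `sample` frame and its values are illustrative stand-ins, not the scraped data.

```python
import pandas as pd

# Hand-typed rows standing in for the scraped Wikipedia table (illustrative only).
sample = pd.DataFrame(
    [
        ("M1A", "Not assigned", "Not assigned"),
        ("M3A", "North York", "Parkwoods"),
        ("M7A", "Queen's Park", "Not assigned"),
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# Keep only rows whose borough is assigned; compare case-insensitively.
is_assigned = sample["Borough"].str.strip().str.lower() != "not assigned"
assigned = sample[is_assigned].reset_index(drop=True)
print(assigned)
```

Only the borough decides whether a row is kept, so M7A survives even though its neighborhood is still unassigned at this stage.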

In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

# Getting the webpage, creating a Response object.
response = requests.get(url)

# Extracting the source code of the page.
data = response.text

# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')

# Extracting all the <table> tags into a list.
tables = soup.find_all('table')

# The Toronto table is the first table in the list
table_toronto = tables[0]

# Extracting all the rows from toronto table
table_toronto_rows = table_toronto.find_all('tr')

# Collect one record per table row for the columns PostalCode, Borough and Neighborhood
column_name = ['PostalCode', 'Borough', 'Neighborhood']
records = []

# For each row in the Toronto table, read the three cells and keep the row if a borough is assigned
for row in table_toronto_rows:
    table_toronto_columns = row.find_all('td')

    if len(table_toronto_columns) > 0:

        postcode = table_toronto_columns[0].string

        # Prefer the link text when the cell contains a link
        borough_a = table_toronto_columns[1].find('a')
        if borough_a is not None:
            borough = borough_a.string
        else:
            borough = table_toronto_columns[1].string

        neighborhood_a = table_toronto_columns[2].find('a')
        if neighborhood_a is not None:
            neighborhood = neighborhood_a.string
        else:
            neighborhood = table_toronto_columns[2].string

        if borough.upper() != 'NOT ASSIGNED':
            records.append({'PostalCode': postcode,
                            'Borough': borough,
                            'Neighborhood': neighborhood})

# DataFrame.append was removed in pandas 2.0, so build the dataframe once from the collected records
df = pd.DataFrame(records, columns=column_name)
df.Neighborhood = df.Neighborhood.str.strip()
print(df.head(10))
print(df.shape)

  PostalCode           Borough      Neighborhood
0        M3A        North York         Parkwoods
1        M4A        North York  Victoria Village
2        M5A  Downtown Toronto      Harbourfront
3        M5A  Downtown Toronto       Regent Park
4        M6A        North York  Lawrence Heights
5        M6A        North York    Lawrence Manor
6        M7A      Queen's Park      Not assigned
7        M9A         Etobicoke  Islington Avenue
8        M1B       Scarborough             Rouge
9        M1B       Scarborough           Malvern
(212, 3)


<ul>
<li>More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row, with the neighborhoods separated by a comma.</li>
</ul>
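As a one-pass alternative sketch, grouping on both PostalCode and Borough keeps the borough column and avoids a later merge. It assumes each postal code belongs to exactly one borough; the `rows` frame reuses three rows from the output above.

```python
import pandas as pd

# Three rows taken from the printed output above, one row per neighborhood.
rows = pd.DataFrame(
    [
        ("M3A", "North York", "Parkwoods"),
        ("M5A", "Downtown Toronto", "Harbourfront"),
        ("M5A", "Downtown Toronto", "Regent Park"),
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# Group on both key columns so Borough survives, then join the neighborhoods.
# sort=False keeps the groups in order of first appearance.
combined = (
    rows.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```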

In [3]:
# Combine the Neighborhoods that share a PostalCode into one comma-separated string.
# groupby replaces the row-by-row DataFrame.append loop (append was removed in pandas 2.0);
# sort=False keeps the postal codes in order of first appearance.
df_postal = (
    df.groupby('PostalCode', sort=False)['Neighborhood']
      .agg(', '.join)
      .reset_index()
)

print(df_postal.head())

# Now merge the 2 dataframes together to get the final dataframe with PostalCode, Borough, and Neighborhood
df.drop(['Neighborhood'], axis=1, inplace=True)
df.drop_duplicates(inplace=True)

print(df.head())

df_toronto = pd.merge(df, df_postal, on='PostalCode')

print(df_toronto.head())

  PostalCode                      Neighborhood
0        M3A                         Parkwoods
1        M4A                  Victoria Village
2        M5A         Harbourfront, Regent Park
3        M6A  Lawrence Heights, Lawrence Manor
4        M7A                      Not assigned
  PostalCode           Borough
0        M3A        North York
1        M4A        North York
2        M5A  Downtown Toronto
4        M6A        North York
6        M7A      Queen's Park
  PostalCode           Borough                      Neighborhood
0        M3A        North York                         Parkwoods
1        M4A        North York                  Victoria Village
2        M5A  Downtown Toronto         Harbourfront, Regent Park
3        M6A        North York  Lawrence Heights, Lawrence Manor
4        M7A      Queen's Park                      Not assigned


<ul>
<li>If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.</li>
</ul>
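A minimal sketch of this rule on two sample rows, using `Series.mask` as one possible way to express it (the `sample` frame is illustrative):

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "PostalCode": ["M6A", "M7A"],
        "Borough": ["North York", "Queen's Park"],
        "Neighborhood": ["Lawrence Heights, Lawrence Manor", "Not assigned"],
    }
)

# Series.mask replaces values where the condition is True
# with the index-aligned value from the second argument.
unassigned = sample["Neighborhood"] == "Not assigned"
sample["Neighborhood"] = sample["Neighborhood"].mask(unassigned, sample["Borough"])
print(sample)
```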

In [4]:
mask = df_toronto.Neighborhood == 'Not assigned'
column_name = 'Neighborhood'
df_toronto.loc[mask, column_name] = df_toronto.Borough

df_toronto.head()

  PostalCode           Borough                      Neighborhood
0        M3A        North York                         Parkwoods
1        M4A        North York                  Victoria Village
2        M5A  Downtown Toronto         Harbourfront, Regent Park
3        M6A        North York  Lawrence Heights, Lawrence Manor
4        M7A      Queen's Park                      Queen's Park


<ul><li>In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.</li></ul>
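For reference, `shape` is a `(rows, columns)` tuple, so the row count is its first element. A small sketch on an illustrative two-row frame:

```python
import pandas as pd

sample = pd.DataFrame(
    {"PostalCode": ["M3A", "M4A"], "Borough": ["North York", "North York"]}
)

# shape is a (rows, columns) tuple; unpack it to get the row count on its own.
n_rows, n_cols = sample.shape
print(n_rows, n_cols)
```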

In [5]:
df_toronto.shape

(103, 3)

<H3>Getting coordinates</H3>

First, install the geocoder package.

In [6]:
!conda install -c conda-forge geocoder --yes

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    geocoder:   1.38.1-py_0  conda-forge
    orderedset: 2.0-py35_0   conda-forge
    ratelim:    0.1.6-py35_0 conda-forge

orderedset-2.0 100% |################################| Time: 0:00:00  43.49 MB/s
ratelim-0.1.6- 100% |################################| Time: 0:00:00  12.47 MB/s
geocoder-1.38. 100% |################################| Time: 0:00:00  46.11 MB/s


After the package has been installed, import geocoder and pass in each postal code to retrieve the latitude and longitude for each row in the dataframe.
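Geocoding services sometimes return no result for a query, so a retry wrapper can help. The sketch below assumes a failed lookup yields `None` or an empty list; `latlng_with_retry` and `fake_lookup` are hypothetical names, and the fake lookup stands in for something like `lambda q: geocoder.arcgis(q).latlng` so the example runs offline.

```python
import time


def latlng_with_retry(query, lookup, max_tries=3, pause=0.0):
    """Call lookup(query) until it returns a usable [lat, lng] pair.

    `lookup` is any callable taking the query string; it is injected here
    so the retry logic can be exercised without a network connection.
    Returns None if every attempt comes back empty.
    """
    for _ in range(max_tries):
        coords = lookup(query)
        if coords:  # treats both None and [] as "no result"
            return coords
        time.sleep(pause)
    return None


# Fake lookup that fails once and then succeeds (hypothetical, for illustration).
calls = {"n": 0}


def fake_lookup(query):
    calls["n"] += 1
    return None if calls["n"] < 2 else [43.752435, -79.329268]


print(latlng_with_retry("M3A, Toronto, Ontario", fake_lookup))
```

Capping the number of attempts avoids looping forever on a postal code the service cannot resolve.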

In [7]:
import geocoder

def get_coordinates(postal_code):
    # Look up the postal code with the ArcGIS provider; latlng holds [lat, lng] on success
    g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
    return g.latlng

# Start with float NaN columns so rows without a result stay visibly empty
df_toronto['Latitude'] = np.nan
df_toronto['Longitude'] = np.nan

for index, row in df_toronto.iterrows():
    coords = get_coordinates(row['PostalCode'])
    if coords:  # skip rows where the lookup returned nothing
        df_toronto.loc[index, 'Latitude'] = coords[0]
        df_toronto.loc[index, 'Longitude'] = coords[1]

df_toronto.head(12)

  PostalCode           Borough                      Neighborhood   Latitude   Longitude
0        M3A        North York                         Parkwoods  43.752435  -79.329268
1        M4A        North York                  Victoria Village  43.730417   -79.31334
2        M5A  Downtown Toronto         Harbourfront, Regent Park   43.65512   -79.36264
3        M6A        North York  Lawrence Heights, Lawrence Manor   43.72327  -79.451603
4        M7A      Queen's Park                      Queen's Park  43.661072  -79.390895
5        M9A         Etobicoke                  Islington Avenue  43.662302  -79.528305
6        M1B       Scarborough                    Rouge, Malvern   43.81165  -79.195561
7        M3B        North York                   Don Mills North  43.749055  -79.362212
8        M4B         East York   Woodbine Gardens, Parkview Hill  43.707465  -79.311803
9        M5B  Downtown Toronto          Ryerson, Garden District  43.657363   -79.37818


Get the latitude and longitude of the city of Toronto.

In [8]:
address = 'Toronto, Ontario'

g_toronto = geocoder.arcgis(address)
toronto_coords = g_toronto.latlng
toronto_lat = toronto_coords[0]
toronto_long = toronto_coords[1]

print(toronto_lat, toronto_long)

43.648690000000045 -79.38543999999996


Create a Toronto map with neighborhoods

In [10]:
# install folium (--yes skips the confirmation prompt, as in the geocoder install above)
!conda install -c conda-forge folium=0.5.0 --yes

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    altair:  2.2.2-py35_1 conda-forge
    branca:  0.3.1-py_0   conda-forge
    folium:  0.5.0-py_0   conda-forge
    vincent: 0.4.4-py_1   conda-forge

altair-2.2.2-p 100% |################################| Time: 0:00:00  51.86 MB/s
branca-0.3.1-p 100% |################################| Time: 0:00:00  35.87 MB/s
vincent-0.4.4- 100% |################################| Time: 0:00:00  36.85 MB/s
folium-0.5.0-p 100% |################################| Time: 0:00:00  48.32 MB/s


In [18]:
import folium

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[toronto_lat, toronto_long], zoom_start=10)

# add one circle marker per postal code, labelled "Neighborhood, Borough"
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

# save the map as a standalone HTML file, then display it inline
map_toronto.save('simple_popups.html')
map_toronto

<H3>North York borough only</H3>

Let's narrow our analysis to the North York borough.
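When filtering by borough name, a substring match and an exact match can select different rows. The sketch below uses an illustrative list of borough names to show the difference; which one is appropriate depends on how the names appear in the data.

```python
import pandas as pd

# Illustrative borough names; several contain the word "York".
boroughs = pd.Series(["North York", "East York", "York", "Etobicoke"])

# Substring match: selects every value that contains the text.
by_substring = boroughs[boroughs.str.contains("York", regex=False)]

# Exact match: selects only the borough named exactly "North York".
by_equality = boroughs[boroughs == "North York"]

print(list(by_substring))
print(list(by_equality))
```

Searching for the full name "North York" with `str.contains` avoids the other two matches, while `==` also rules out any longer name that happens to include it.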

In [15]:
df_north_york = df_toronto.loc[df_toronto['Borough'].str.contains('North York')]

df_north_york

   PostalCode     Borough                                     Neighborhood   Latitude   Longitude
0         M3A  North York                                        Parkwoods  43.752435  -79.329268
1         M4A  North York                                 Victoria Village  43.730417   -79.31334
3         M6A  North York                 Lawrence Heights, Lawrence Manor   43.72327  -79.451603
7         M3B  North York                                  Don Mills North  43.749055  -79.362212
10        M6B  North York                                        Glencairn   43.70799   -79.44837
13        M3C  North York                 Flemingdon Park, Don Mills South  43.721375  -79.343415
27        M2H  North York                                Hillcrest Village  43.802845  -79.356236
28        M3H  North York  Bathurst Manor, Downsview North, Wilson Heights  43.757875  -79.448621
33        M2J  North York                     Fairview, Henry Farm, Oriole   43.78081  -79.347782
34        M3J  North York                  Northwood Park, York University  43.764765  -79.488502


In [16]:
# create map of Toronto with North York borough only, using latitude and longitude values
map_north_york = folium.Map(location=[toronto_lat, toronto_long], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_north_york['Latitude'], df_north_york['Longitude'], df_north_york['Borough'], df_north_york['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_north_york)
    
map_north_york

In [17]:
df_north_york.shape

(24, 5)

In [18]:
# save the dataframe to csv for next assignment
df_north_york.to_csv('north_york.csv')