<h1>Segmenting and Clustering Neighborhoods in Toronto</h1>

<p>Import the required libraries.</p>

In [3]:
import requests
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd

<p>Fetch the Wikipedia HTML page that lists the postal codes for Toronto.</p>

In [4]:
wiki_link = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
raw_wiki_page = requests.get(wiki_link)
page_text = raw_wiki_page.text
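
<p>The request above does not check whether the download succeeded. A minimal sketch of a more defensive fetch follows. The helper name <code>fetch_page_text</code>, its <code>get</code> parameter, and the 10-second timeout are assumptions made for illustration and are not part of the original notebook.</p>

```python
import requests


def fetch_page_text(url, get=requests.get, timeout=10):
    """Return the body of `url` as text, raising on HTTP errors.

    `get` is a hypothetical injection point so the helper can be tried
    without network access; by default it is requests.get.
    """
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.text
```

<p>With this helper, the cell above could be written as <code>page_text = fetch_page_text(wiki_link)</code>.</p>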

<p>Parse the page using BeautifulSoup.</p>

In [5]:
soup = BeautifulSoup(page_text, "html.parser")

<p>Find the table body, then select all the rows.</p>

In [6]:
codes_table = soup.find("tbody")
table_rows = codes_table.find_all('tr')

<p>Iterate through the rows. For each row, collect the text of every cell into a new list (rows), then append that list to the data list. A row without td cells, such as a header row, contributes an empty list.</p>

In [7]:
data = []  # one list of cell texts per table row
columns = ["PostalCode", "Borough", "Neighborhood"]
for row in table_rows:
    data_cells = row.find_all('td')
    rows = []
    for cell in data_cells:
        # strip=True removes surrounding whitespace and trailing newlines
        rows.append(cell.get_text(strip=True))
    data.append(rows)
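
<p>To see what the loop produces without fetching the live page, the same extraction can be run on a small inline table. This is only a sketch: the markup in <code>sample_html</code> is made up for illustration, and the <code>if cells</code> guard is an optional variation that skips empty rows instead of dropping them later.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not copied from the Wikipedia page
sample_html = """
<table>
  <tbody>
    <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
    <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
    <tr><td>M3A</td><td>North York</td><td> Parkwoods </td></tr>
  </tbody>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_rows = []
for row in sample_soup.find("tbody").find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
    if cells:  # the header row has only <th> cells, so its list is empty
        sample_rows.append(cells)

print(sample_rows)
```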

<p>Create a dataframe from the data list. Remove rows where Borough is "Not assigned", then drop rows that contain None. Where Neighborhood is "Not assigned", replace it with the value from the Borough column.</p>

In [8]:
df = pd.DataFrame(data=data, columns=columns)
df = df[df.Borough != "Not assigned"]
df = df.dropna()
df = df.reset_index(drop=True)
df['Neighborhood'] = np.where(df['Neighborhood'] == "Not assigned", df['Borough'], df['Neighborhood'])
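
<p>The effect of each cleaning step is easier to follow on a few rows. The sketch below applies the same filter, <code>dropna</code>, and <code>np.where</code> replacement to a hand-written sample; the rows are illustrative and the names <code>sample</code> and <code>cleaned</code> are not from the notebook.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample, including an all-None row like the one an empty list produces
sample = pd.DataFrame(
    [
        [None, None, None],
        ["M1A", "Not assigned", "Not assigned"],
        ["M3A", "North York", "Parkwoods"],
        ["M7A", "Queen's Park", "Not assigned"],
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# Drop unassigned boroughs, then rows with missing values, then renumber
cleaned = sample[sample["Borough"] != "Not assigned"].dropna().reset_index(drop=True)

# Fall back to the borough name where the neighborhood is unassigned
cleaned["Neighborhood"] = np.where(
    cleaned["Neighborhood"] == "Not assigned",
    cleaned["Borough"],
    cleaned["Neighborhood"],
)
print(cleaned)
```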


<p>Define a function that joins neighborhood names into a single comma-separated string. Group the rows by Borough, keep the first PostalCode of each group, and combine the Neighborhood values with create_list.</p>

In [9]:
def create_list(x):
    return ', '.join(x)

df = df.groupby('Borough', as_index=False).agg({'PostalCode': 'first', 'Neighborhood': create_list})
df

Unnamed: 0,Borough,PostalCode,Neighborhood
0,Central Toronto,M4N,"Lawrence Park, Roselawn, Davisville North, For..."
1,Downtown Toronto,M5A,"Harbourfront, Regent Park, Ryerson, Garden Dis..."
2,East Toronto,M4E,"The Beaches, The Danforth West, Riverdale, The..."
3,East York,M4B,"Woodbine Gardens, Parkview Hill, Woodbine Heig..."
4,Etobicoke,M9A,"Islington Avenue, Cloverdale, Islington, Marti..."
5,Mississauga,M7R,Canada Post Gateway Processing Centre
6,North York,M3A,"Parkwoods, Victoria Village, Lawrence Heights,..."
7,Queen's Park,M7A,Queen's Park
8,Scarborough,M1B,"Rouge, Malvern, Highland Creek, Rouge Hill, Po..."
9,West Toronto,M6H,"Dovercourt Village, Dufferin, Little Portugal,..."
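
<p>As a small illustration of the aggregation, the sketch below groups three sample rows by Borough. Note that <code>'first'</code> keeps only one postal code per borough, so the other codes in a group are discarded. The sample frame is illustrative and <code>grouped</code> is a name chosen for this example.</p>

```python
import pandas as pd

# Illustrative sample rows; two of them share a borough
sample = pd.DataFrame(
    {
        "PostalCode": ["M3A", "M4A", "M5A"],
        "Borough": ["North York", "North York", "Downtown Toronto"],
        "Neighborhood": ["Parkwoods", "Victoria Village", "Harbourfront"],
    }
)

grouped = sample.groupby("Borough", as_index=False).agg(
    {
        "PostalCode": "first",  # keeps one code per borough
        "Neighborhood": lambda names: ", ".join(names),
    }
)
print(grouped)
```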


<h2>Part Two</h2>

In [10]:
%matplotlib inline 

import matplotlib as mpl
import matplotlib.pyplot as plt

#!conda install -c conda-forge folium=0.5.0 --yes
import folium

<p>Read geospatial coordinates for Toronto.</p>

In [12]:
geo_coords = pd.read_csv('Geospatial_Coordinates.csv')

In [13]:
# Rename the key column so it matches the PostalCode column in df for the merge
geo_coords.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
#geo_coords.head()

In [14]:
new_df = pd.merge(df, geo_coords, how="inner", on="PostalCode")
new_df

Unnamed: 0,Borough,PostalCode,Neighborhood,Latitude,Longitude
0,Central Toronto,M4N,"Lawrence Park, Roselawn, Davisville North, For...",43.72802,-79.38879
1,Downtown Toronto,M5A,"Harbourfront, Regent Park, Ryerson, Garden Dis...",43.65426,-79.360636
2,East Toronto,M4E,"The Beaches, The Danforth West, Riverdale, The...",43.676357,-79.293031
3,East York,M4B,"Woodbine Gardens, Parkview Hill, Woodbine Heig...",43.706397,-79.309937
4,Etobicoke,M9A,"Islington Avenue, Cloverdale, Islington, Marti...",43.667856,-79.532242
5,Mississauga,M7R,Canada Post Gateway Processing Centre,43.636966,-79.615819
6,North York,M3A,"Parkwoods, Victoria Village, Lawrence Heights,...",43.753259,-79.329656
7,Queen's Park,M7A,Queen's Park,43.662301,-79.389494
8,Scarborough,M1B,"Rouge, Malvern, Highland Creek, Rouge Hill, Po...",43.806686,-79.194353
9,West Toronto,M6H,"Dovercourt Village, Dufferin, Little Portugal,...",43.669005,-79.442259
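
<p>An inner merge silently drops postal codes that have no coordinates. One way to check for that, sketched below on sample frames, is a left merge with <code>indicator=True</code>, which adds a <code>_merge</code> column, and <code>validate="one_to_one"</code>, which raises if a key is duplicated on either side. The code <code>M0Z</code> and the frame names are hypothetical.</p>

```python
import pandas as pd

boroughs = pd.DataFrame(
    {
        "PostalCode": ["M3A", "M5A", "M0Z"],  # M0Z is a made-up code with no coordinates
        "Borough": ["North York", "Downtown Toronto", "Hypothetical"],
    }
)
coords = pd.DataFrame(
    {
        "PostalCode": ["M3A", "M5A"],
        "Latitude": [43.753259, 43.65426],
        "Longitude": [-79.329656, -79.360636],
    }
)

checked = pd.merge(
    boroughs,
    coords,
    how="left",
    on="PostalCode",
    validate="one_to_one",  # raise MergeError on duplicate keys
    indicator=True,  # add a _merge column: both / left_only / right_only
)

# Postal codes present in boroughs but missing from coords
unmatched = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```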


<h2>Part Three</h2>

<p>Generate a map of Toronto.</p>

In [31]:
to_map = folium.Map(location=[43.6532, -79.3832], zoom_start=11, tiles='Stamen Terrain')
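
<p>The map above is centered on a hard-coded point. As an alternative sketch, the center could be derived from the data by averaging the marker coordinates. This is an assumption about what a reasonable center looks like, not the notebook's method, and <code>points</code> and <code>center</code> are names chosen for this example.</p>

```python
import pandas as pd

# Two coordinate pairs taken from the merged table above
points = pd.DataFrame(
    {
        "Latitude": [43.753259, 43.65426],
        "Longitude": [-79.329656, -79.360636],
    }
)

# Mean latitude and longitude as a [lat, lng] pair
center = [points["Latitude"].mean(), points["Longitude"].mean()]
print(center)
```

<p>The resulting list has the same [latitude, longitude] shape as the <code>location</code> argument used above.</p>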

<p>Add a marker for each Borough in Toronto.</p>

In [32]:

for lat, lng, borough, neighborhood in zip(new_df['Latitude'], new_df['Longitude'], new_df['Borough'], new_df['Neighborhood']):
    # Build the popup text; parse_html=True escapes characters such as apostrophes
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    # parse_html is a Popup option, so it is not passed to CircleMarker
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(to_map)
    
to_map