
## ***Segmenting and Clustering Neighborhoods in Toronto***

**Import libraries**

In [0]:
import pandas as pd 
import requests
from bs4 import BeautifulSoup


**Extract Data**

In [0]:
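# Download the Wikipedia page that lists the postal codes starting with M
website_link = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(website_link, 'lxml')

# The first table on the page holds the postcode, borough and neighbourhood columns
tab = soup.find('table')
table_data = []
for line in tab.find_all('tr'):
    # contents[1], [3] and [5] are the three cells of the row;
    # skip rows whose borough is 'Not assigned'
    if line.contents[3].text != 'Not assigned':
        # [:-1] drops the trailing newline from the last cell
        table_data.append([line.contents[1].text, line.contents[3].text, line.contents[5].text[:-1]])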
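
Indexing into `contents` depends on the exact whitespace layout of the page. As a hedged alternative, the sketch below collects the cells of each row with `find_all(['th', 'td'])` instead. It runs on a small inline HTML string made up for illustration (not the live Wikipedia page) and uses the built-in `html.parser`, so it needs no network access.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the layout of the postal code table
html = """<table>
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>"""

soup = BeautifulSoup(html, 'html.parser')

rows = []
for tr in soup.find('table').find_all('tr'):
    # get_text(strip=True) removes surrounding whitespace, including trailing newlines
    cells = [cell.get_text(strip=True) for cell in tr.find_all(['th', 'td'])]
    # keep only complete rows whose borough is assigned
    if len(cells) == 3 and cells[1] != 'Not assigned':
        rows.append(cells)

print(rows)
```

The first element of `rows` is the header row, so it can be passed to `pd.DataFrame(rows[1:], columns=rows[0])` in the same way as `table_data`.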

**Build pandas dataframe**

In [0]:
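# The first scraped row is the header; the remaining rows are the data
df_toronto = pd.DataFrame(table_data[1:], columns=table_data[0])
# Array of unique postcodes, used later to find postcodes that appear more than once
df_code = df_toronto['Postcode'].unique()
# Index by postcode but keep the column so it can still be selected by name
df_toronto.set_index('Postcode', drop=False, inplace=True)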


## **Data Processing**



## Group data based on postcode and Borough

A single postal code area can contain more than one neighborhood. For example, in the table on the Wikipedia page, M5A is listed twice, once for Harbourfront and once for Regent Park. Such rows are combined into one row, with the neighborhoods separated by a comma, as in the M5A row of the output below.


In [0]:
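separator = ', '
for postcode in df_code:
    # .loc returns a Series for a single row and a DataFrame for several rows
    df_1 = df_toronto.loc[postcode]
    # a single row has size 3 (three columns), so a larger size means the postcode repeats
    if df_1.size > 3:
        neighbourhood = separator.join(df_1['Neighbourhood'])
        df_toronto.loc[postcode, 'Neighbourhood'] = neighbourhood

# drop postcodes whose neighbourhood is 'Not assigned', then keep one row per postcode
df_toronto.drop(df_toronto[df_toronto.Neighbourhood == 'Not assigned'].index, inplace=True)
df_toronto.drop_duplicates('Postcode', inplace=True)
df_toronto.reset_index(drop=True, inplace=True)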
df_toronto.head()
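
The same combining step can also be written with `groupby`. The sketch below is an alternative, not the code used in this notebook, and it runs on a three-row frame made up for illustration.

```python
import pandas as pd

# Toy rows (made up for illustration) in the same layout as the scraped table
rows = pd.DataFrame({
    'Postcode': ['M3A', 'M5A', 'M5A'],
    'Borough': ['North York', 'Downtown Toronto', 'Downtown Toronto'],
    'Neighbourhood': ['Parkwoods', 'Harbourfront', 'Regent Park'],
})

# Group by postcode and borough, then join the neighbourhood names with ', '.
# sort=False keeps the groups in order of first appearance.
combined = (
    rows.groupby(['Postcode', 'Borough'], sort=False)['Neighbourhood']
        .agg(', '.join)
        .reset_index()
)

print(combined)
```

Grouping avoids the explicit loop and the size check, at the cost of rebuilding the frame instead of updating it in place.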


In [80]:
df_toronto.head()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 4 | M9A | Etobicoke | Islington Avenue |


In [81]:
df_toronto.shape

(102, 3)

**Geocoder**

In [0]:
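import geocoder # import geocoder

# example postal code to look up (illustrative; the original cell left postal_code undefined)
postal_code = 'M5A'

# initialize your variable to None
lat_lng_coords = None

# loop until you get the coordinates
# note: this loop never ends if the service keeps returning no result
while lat_lng_coords is None:
    g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
    lat_lng_coords = g.latlng

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]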
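
Because the loop above retries without limit, a bounded retry may be safer. The sketch below is a minimal illustration: `geocode_with_retry` and `fake_lookup` are hypothetical names, and the lookup is a stand-in function so the example runs without calling any geocoding service.

```python
def geocode_with_retry(lookup, query, max_attempts=5):
    """Call lookup(query) up to max_attempts times; return None if every attempt fails."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
    return None


# Stand-in for a real geocoder call: fails twice, then returns coordinates
calls = {'n': 0}

def fake_lookup(query):
    calls['n'] += 1
    return [43.65426, -79.360636] if calls['n'] >= 3 else None


result = geocode_with_retry(fake_lookup, 'M5A, Toronto, Ontario')
print(result, calls['n'])
```

With a real service, `lookup` could be a small wrapper that returns the `latlng` attribute of the geocoder result.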

***Create Dataframe of latitude and longitude***


In [95]:
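# Load the latitude and longitude of each postal code from the CSV file
geocoor_df = pd.read_csv('https://cocl.us/Geospatial_data')
# Rename the columns so the postal code column matches df_toronto
geocoor_df.columns = ['Postcode', 'Latitude', 'Longitude']

# Add the coordinates to the main dataframe by joining on the shared Postcode column
df_toronto_final = pd.merge(df_toronto, geocoor_df, on='Postcode')
df_toronto_final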

| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 |
| 4 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 |
| 5 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 6 | M3B | North York | Don Mills North | 43.745906 | -79.352188 |
| 7 | M4B | East York | Woodbine Gardens, Parkview Hill | 43.706397 | -79.309937 |
| 8 | M5B | Downtown Toronto | Ryerson, Garden District | 43.657162 | -79.378937 |
| 9 | M6B | North York | Glencairn | 43.709577 | -79.445073 |
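
The default inner merge silently drops any postcode that has no match in the coordinates file. As a hedged sketch, a left merge with `indicator=True` makes such rows visible. The frames below are toy data: the two coordinate rows are copied from the output above, and `M0X` is a made-up postcode with no coordinates.

```python
import pandas as pd

# Toy frames for illustration; 'M0X' is a made-up postcode without coordinates
left = pd.DataFrame({
    'Postcode': ['M3A', 'M4A', 'M0X'],
    'Borough': ['North York', 'North York', 'Nowhere'],
})
coords = pd.DataFrame({
    'Postcode': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every row of `left`; indicator=True adds a '_merge' column
merged = left.merge(coords, on='Postcode', how='left', indicator=True)

# Rows marked 'left_only' found no coordinates
missing = merged.loc[merged['_merge'] == 'left_only', 'Postcode'].tolist()
print(missing)
```

An empty `missing` list would mean that every postcode received coordinates.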


# **Explore**

Explore and cluster the neighborhoods in Toronto. 

**Import Libraries**


In [89]:
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [90]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


#### Create a map of Toronto with neighborhoods superimposed on top.

In [0]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto_final['Latitude'], df_toronto_final['Longitude'], df_toronto_final['Borough'], df_toronto_final['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto