# Segmenting and Clustering Neighborhoods in Toronto, Canada

## Part 2: Creating a DataFrame that contains PostalCode, Borough, and Neighborhoods of Toronto. Add Latitude and Longitude of each PostalCode to the DataFrame.

### Author: Fereshteh Bashiri

Use this notebook to scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, extract the table of postal codes, and load it into a pandas DataFrame.

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

In [2]:
# Web scraping
base_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
res = requests.get(base_url)
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]
df_wiki = pd.read_html(str(table))[0]

# Drop rows whose Borough is "Not assigned"
df_wiki = df_wiki[df_wiki['Borough'] != "Not assigned"].reset_index(drop=True)

# Replace a Neighbourhood that is "Not assigned" with its Borough's name.
# Use .loc with a boolean mask: chained assignment such as
# df_wiki['Neighbourhood'][i] = ... may not write back to df_wiki.
not_assigned = df_wiki['Neighbourhood'] == "Not assigned"
df_wiki.loc[not_assigned, 'Neighbourhood'] = df_wiki.loc[not_assigned, 'Borough']

# df_wiki

# Merge neighbourhoods that share a PostalCode and Borough into one list per postal code
df_wiki.rename(columns={'Postcode':'PostalCode'}, inplace=True)
df_toronto = df_wiki.groupby(by=['PostalCode','Borough'])['Neighbourhood'].apply(list).reset_index(name='Neighbourhood')
df_toronto.head(10)


Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"[Rouge, Malvern]"
1,M1C,Scarborough,"[Highland Creek, Rouge Hill, Port Union]"
2,M1E,Scarborough,"[Guildwood, Morningside, West Hill]"
3,M1G,Scarborough,[Woburn]
4,M1H,Scarborough,[Cedarbrae]
5,M1J,Scarborough,[Scarborough Village]
6,M1K,Scarborough,"[East Birchmount Park, Ionview, Kennedy Park]"
7,M1L,Scarborough,"[Clairlea, Golden Mile, Oakridge]"
8,M1M,Scarborough,"[Cliffcrest, Cliffside, Scarborough Village West]"
9,M1N,Scarborough,"[Birch Cliff, Cliffside West]"
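
The cleaning steps in the cell above (drop unassigned boroughs, fall back to the borough name, group by postal code) can be tried without touching the network. The sketch below runs them on a few made-up rows; the `raw`, `clean`, and `grouped` names and the sample values are assumptions for illustration, not data from the Wikipedia page.

```python
import pandas as pd

# Made-up rows in the same shape as the scraped table (illustrative only).
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M1B", "M1B", "M7A"],
    "Borough": ["Not assigned", "Scarborough", "Scarborough", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Rouge", "Malvern", "Not assigned"],
})

# Drop rows without a borough; copy so later writes do not target a view.
clean = raw[raw["Borough"] != "Not assigned"].copy()

# Where the neighbourhood is missing, fall back to the borough name.
missing = clean["Neighbourhood"] == "Not assigned"
clean.loc[missing, "Neighbourhood"] = clean.loc[missing, "Borough"]

# Collect the neighbourhoods of each postal code into one list.
grouped = (
    clean.groupby(["PostalCode", "Borough"])["Neighbourhood"]
    .apply(list)
    .reset_index(name="Neighbourhood")
)
print(grouped)
```

In this sample, the `M1A` row is dropped, the two `M1B` rows collapse into one list, and `M7A` takes its borough name as its only neighbourhood.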


Use the .shape attribute to print the number of rows in the DataFrame.

In [3]:
print('The number of rows of the dataframe is: {}.'.format(df_toronto.shape[0]))

The number of rows of the dataframe is: 103.


The DataFrame now holds the postal code, borough name, and neighborhood names. To use the Foursquare location data, we also need the latitude and longitude coordinates of each postal code.

In [4]:
## One way is to download a csv file that has the geographical coordinates of each postal code:
# import sys
# !{sys.executable} -m pip install wget

# import wget
# lat_lng_url = 'https://cocl.us/Geospatial_data'
# wget.download(lat_lng_url, 'Geospatial_data.csv')

## The other way is to use the geocoder Python package in a loop to obtain the lat-lng coordinates of each postal code
# import geocoder

# for i, postal_code in enumerate(df_toronto['PostalCode']):
#     print('\nDownloading coordinates of ' + postal_code)

#     # initialize your variable to None
#     lat_lng_coords = None

#     # loop until you get the coordinates
#     while lat_lng_coords is None:
#         g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
#         lat_lng_coords = g.latlng

#     # write to the row by its index label, not by the postal code string
#     df_toronto.loc[i, 'Latitude'] = lat_lng_coords[0]
#     df_toronto.loc[i, 'Longitude'] = lat_lng_coords[1]

## Another way: read the geographic coordinates from a copy of the file saved on a local drive
geo_df = pd.read_csv('./Geospatial_Coordinates.csv')
# geo_df.rename(columns={'Postal Code':'PostalCode'}, inplace=True)
geo_df.sort_values(by='Postal Code', inplace=True)
geo_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
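
The commented-out geocoder loop in the cell above retries until a result arrives, which never terminates if the service keeps returning `None`. A minimal sketch of a bounded retry follows. The helper `lookup_with_retry` and the stand-in `flaky_lookup` are hypothetical names introduced here; `flaky_lookup` only simulates a geocoding call and does not use the geocoder package.

```python
import time


def lookup_with_retry(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns a non-None value or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = lookup(query)
        if result is not None:
            return result
        # Pause before the next attempt, but not after the last one.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return None


# Hypothetical stand-in for a geocoding call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_lookup(query):
    calls["count"] += 1
    if calls["count"] < 3:
        return None
    return [43.806686, -79.194353]


coords = lookup_with_retry(flaky_lookup, "M1B, Toronto, Ontario")
print(coords, "after", calls["count"], "calls")
```

Returning `None` after the last attempt lets the caller decide whether to skip the postal code or fall back to the CSV file.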


In [5]:
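# Merge latitude and longitude into the Toronto DataFrame on the postal code.
# Column assignment aligns on the row index, so it only works when both
# frames happen to list the postal codes in the same order.
df_toronto = df_toronto.merge(
    geo_df.rename(columns={'Postal Code': 'PostalCode'}),
    on='PostalCode', how='left')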
df_toronto.tail()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
98,M9N,York,[Weston],43.706876,-79.518188
99,M9P,Etobicoke,[Westmount],43.696319,-79.532242
100,M9R,Etobicoke,"[Kingsview Village, Martin Grove Gardens, Rich...",43.688905,-79.554724
101,M9V,Etobicoke,"[Albion Gardens, Beaumond Heights, Humbergate,...",43.739416,-79.588437
102,M9W,Etobicoke,[Northwest],43.706748,-79.594054
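
Attaching coordinates by postal code key, rather than by row position, keeps the result correct even when the two tables are ordered differently or one is missing a code. The sketch below shows this on made-up rows; the `codes`, `coords_df`, `merged`, and `missing_codes` names are assumptions for illustration.

```python
import pandas as pd

# Made-up postal codes, deliberately not in sorted order.
codes = pd.DataFrame({
    "PostalCode": ["M1C", "M1B", "M1E"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
})

# Coordinates table with a differently named key column and one code missing.
coords_df = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Left join on the key; validate guards against duplicated postal codes.
merged = codes.merge(
    coords_df.rename(columns={"Postal Code": "PostalCode"}),
    on="PostalCode",
    how="left",
    validate="one_to_one",
)

# Postal codes that found no coordinates show up as NaN.
missing_codes = merged.loc[merged["Latitude"].isna(), "PostalCode"].tolist()
print(merged)
print("No coordinates for:", missing_codes)
```

With `validate="one_to_one"`, pandas raises a `MergeError` if either table repeats a postal code, so a silent row duplication cannot slip into the result.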
