## Scraping Toronto Neighborhoods Data
### The code here scrapes the data from the Wikipedia page of Toronto's neighborhoods and cleans it.

#### Import libraries and display settings:

In [38]:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
pd.set_option('display.max_rows', None)

#### Use pandas and BeautifulSoup to scrape the table from the Wikipedia page:

In [50]:
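res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
res.raise_for_status()  # fail early if the page could not be fetched
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]  # the first table on the page
tables = pd.read_html(str(table))  # read_html returns a list of DataFrames
df = tables[0]  # the table as a DataFrame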

#### Drop the rows where Borough is 'Not assigned':

In [51]:
# Drop the rows where Borough is 'Not assigned', then renumber the index from 0
df = df[df['Borough'] != 'Not assigned'].reset_index(drop=True)
df.head()

| | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
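
The filtering step above can be tried in isolation. The following is a minimal sketch on a small made-up frame (the sample rows are for illustration only and are not the scraped table); it keeps the rows with an assigned Borough by using a boolean mask and then renumbers the index:

```python
import pandas as pd

# Toy data for illustration only; not the scraped Wikipedia table.
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only the rows whose Borough is assigned, then renumber the index from 0.
assigned = sample[sample['Borough'] != 'Not assigned'].reset_index(drop=True)
print(assigned)
```

Resetting the index matters here because later steps that rely on positional labels (such as `df.head()` output starting at 0) would otherwise keep the gaps left by the removed rows.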


#### Handle 'Not assigned' or NaN entries in the 'Neighborhood' column:

In [30]:
# If Neighborhood is 'Not assigned' or NaN, replace it with the Borough name
neighborhood_missing = (df['Neighborhood'] == 'Not assigned') | df['Neighborhood'].isna()
df.loc[neighborhood_missing, 'Neighborhood'] = df.loc[neighborhood_missing, 'Borough']
if neighborhood_missing.any():
    print('These Neighborhood names were replaced with the Borough name:')
    print(df[neighborhood_missing])
else:
    print("There are no Neighborhood rows with 'Not assigned' or empty values.")

There are no Neighborhood rows with 'Not assigned' or empty values.
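
Since the scraped table had no such rows, the replacement branch never ran. The sketch below exercises it on a made-up frame (the rows are illustrative assumptions, not real data) and uses `Series.mask` as one possible way to express the same fallback:

```python
import numpy as np
import pandas as pd

# Toy data for illustration only: one 'Not assigned' and one NaN Neighborhood.
sample = pd.DataFrame({
    'Borough': ['North York', "Queen's Park", 'Etobicoke'],
    'Neighborhood': ['Parkwoods', 'Not assigned', np.nan],
})

# Flag the rows whose Neighborhood is missing in either sense.
missing = sample['Neighborhood'].isna() | (sample['Neighborhood'] == 'Not assigned')

# Where the flag is True, take the value from the Borough column instead.
sample['Neighborhood'] = sample['Neighborhood'].mask(missing, sample['Borough'])
print(sample)
```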


#### Display the number of rows of the dataframe:

In [31]:
print("Number of rows of the dataframe:", df.shape[0], '\n') 

Number of rows of the dataframe: 103 



### Next, get the latitudes and longitudes and add them to the dataframe:

Use the geocoder Python package to get the latitude and longitude coordinates for all postal codes:

In [32]:
# Install geocoder
%pip install geocoder
print('geocoder installed!')

import geocoder
print('geocoder imported!')

# Collect the coordinates of every postal code
latList = []
lngList = []

for postCode in df['Postal Code']:
    print('Postal code: ', postCode)
    i = 1
    lat_lng_coords = None

    # Loop until the coordinates are returned, since the calls are not reliable
    while lat_lng_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(postCode))
        # Track the number of attempts
        i = i + 1
        if i % 20 == 1:
            print('Tried ' + str(i) + ' times...')
        # Stop trying after 200 failed attempts
        if i > 200:
            break
        lat_lng_coords = g.latlng

    # If every attempt failed, lat_lng_coords is still None here
    # and the indexing below raises a TypeError
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]

    latList.append(latitude)
    lngList.append(longitude)



Note: you may need to restart the kernel to use updated packages.
geocoder installed!
geocoder imported!
Postal code:  M3A
Tried 21 times...
Tried 41 times...
Tried 61 times...
Tried 81 times...
Tried 101 times...
Tried 121 times...
Tried 141 times...
Tried 161 times...
Tried 181 times...
Tried 201 times...


TypeError: 'NoneType' object is not subscriptable
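
The `TypeError` above comes from indexing `lat_lng_coords` after the retry loop gave up with the value still `None`. One way to avoid the crash is to return a placeholder when every attempt fails. The sketch below is only an illustration under assumptions: `lookup_with_retries`, `always_fails`, and `always_answers` are hypothetical names, and the two stand-in functions replace the real geocoding call so the example runs without network access.

```python
from typing import Callable, Optional, Sequence, Tuple


def lookup_with_retries(
    postal_code: str,
    lookup: Callable[[str], Optional[Sequence[float]]],
    max_tries: int = 5,
) -> Tuple[Optional[float], Optional[float]]:
    """Call `lookup` up to `max_tries` times; return (None, None) if all calls fail."""
    query = '{}, Toronto, Ontario'.format(postal_code)
    for _ in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords[0], coords[1]
    # Every attempt returned None: report that instead of indexing None.
    return None, None


# Hypothetical stand-in for a geocoding service that never answers.
def always_fails(query: str) -> None:
    return None


# Hypothetical stand-in that answers with fixed, made-up coordinates.
def always_answers(query: str) -> Sequence[float]:
    return [43.0, -79.0]


print(lookup_with_retries('M3A', always_fails))
print(lookup_with_retries('M3A', always_answers))
```

With the geocoder package, the `lookup` argument could be something like `lambda q: geocoder.google(q).latlng`, mirroring the call used in the cell above.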

The code above could not retrieve the latitude and longitude coordinates: every call returned no result. Instead, use the data in the CSV file provided at http://cocl.us/Geospatial_data and merge it with df:

In [52]:
geospatial_data = pd.read_csv('http://cocl.us/Geospatial_data')
df = df.merge(geospatial_data, on='Postal Code', how='inner')
df.head()

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
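
An inner merge silently drops any postal code that has no match in the coordinates file. As a possible sanity check, the sketch below uses a left merge with pandas' `indicator` and `validate` options on made-up frames (the postal code `M9Z` and its row are invented for illustration) to list the codes that found no coordinates:

```python
import pandas as pd

# Toy frames for illustration only; 'M9Z' is an invented code with no coordinates.
left = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M9Z'],
    'Borough': ['North York', 'North York', 'Example'],
})
right = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# validate raises if a postal code is duplicated on either side;
# indicator adds a '_merge' column recording where each row came from.
checked = left.merge(right, on='Postal Code', how='left',
                     validate='one_to_one', indicator=True)

# Rows marked 'left_only' had no matching coordinates.
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

If the list is empty, switching back to `how='inner'` gives the same rows as the left merge.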
