# Segmenting and Clustering Neighborhoods in Toronto Part 2

In Part 1, we scraped the data from the Wikipedia page and cleaned it so that it was ready for the next step.

In Part 2 (this part), we get the latitude and longitude coordinates of each neighborhood in Toronto.

Let's begin!

To get the coordinates, we first have to rerun all the code from Part 1.

In [1]:
# import libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup

First, we fetch the HTML of the Wikipedia page that holds the data.

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
wikipedia_page = requests.get(url).text

In [3]:
# Parse the HTML and set up an empty dataframe for the table rows
soup = BeautifulSoup(wikipedia_page, 'html.parser')  # the page is HTML, so use an HTML parser
table = soup.find('table')
column_name = ['PostalCode','Borough','Neighbourhood']
df = pd.DataFrame(columns=column_name)

To get a clean dataframe, we strip the whitespace from each cell.

In [4]:
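for tr in table.find_all('tr'):
    # Collect the stripped text of every cell in this row
    data = []
    for td in tr.find_all('td'):
        data.append(td.text.strip())
    # Keep only complete rows (header rows have no td cells)
    if len(data) == 3:
        df.loc[len(df)] = data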
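
The loop above appends to the dataframe one row at a time with `df.loc[len(df)]`, which can get slow on larger tables. A minimal sketch of an alternative collects the cleaned rows in a list and builds the dataframe once. The helper `rows_to_dataframe` and the sample rows are hypothetical stand-ins for the parsed `td` cells, not part of the original notebook.

```python
import pandas as pd

COLUMN_NAMES = ['PostalCode', 'Borough', 'Neighbourhood']

def rows_to_dataframe(raw_rows):
    """Strip each cell and keep only rows that have exactly three cells."""
    cleaned = []
    for cells in raw_rows:
        data = [cell.strip() for cell in cells]
        if len(data) == len(COLUMN_NAMES):
            cleaned.append(data)
    # Build the dataframe once, from the full list of rows
    return pd.DataFrame(cleaned, columns=COLUMN_NAMES)

# Hypothetical cell text, standing in for td.text from each table row
sample_rows = [
    [],  # a header row yields no td cells
    ['M1A\n', 'Not assigned\n', 'Not assigned\n'],
    ['M3A\n', 'North York\n', 'Parkwoods\n'],
]
sample_df = rows_to_dataframe(sample_rows)
print(sample_df)
```

The result should match what the row-by-row loop produces for the same input, since both keep only three-cell rows in their original order.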

In [5]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [6]:
# Drop rows whose borough or neighbourhood is 'Not assigned'
df = df[df.Borough != 'Not assigned']
df = df[df.Neighbourhood != 'Not assigned']
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [7]:
# Check whether any 'Not assigned' boroughs remain
df.loc[df.Borough == 'Not assigned']

Unnamed: 0,PostalCode,Borough,Neighbourhood


In [8]:
# Check whether any 'Not assigned' neighbourhoods remain
df.loc[df.Neighbourhood == 'Not assigned']

Unnamed: 0,PostalCode,Borough,Neighbourhood
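
The two checks above can also be expressed as a single count per column. The following is a small sketch on a hypothetical three-row sample that has the same columns as `df`; the names `sample`, `is_unassigned`, `cleaned`, and `remaining` are illustrative only.

```python
import pandas as pd

# Hypothetical sample with the same columns as df
sample = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# True wherever a borough or neighbourhood cell is 'Not assigned'
is_unassigned = sample[['Borough', 'Neighbourhood']].eq('Not assigned')

# Keep only rows where neither column is 'Not assigned'
cleaned = sample[~is_unassigned.any(axis=1)]

# Count leftover 'Not assigned' cells per column; both counts should be zero
remaining = cleaned[['Borough', 'Neighbourhood']].eq('Not assigned').sum()
print(remaining)
```

A count of zero in both columns gives the same assurance as the two empty dataframes, in one step.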


As the empty results above show, no 'Not assigned' values remain in the data. Now we can group neighbourhoods by postal code and drop duplicates.

In [9]:
# Join the neighbourhoods that share a postal code into one comma-separated string
grouped_df = df.groupby('PostalCode')['Neighbourhood'].apply(','.join)
grouped_df = grouped_df.reset_index()
grouped_df.rename(columns={'Neighbourhood': 'joined_neighbourhood'}, inplace=True)

In [10]:
# Merge the two dataframes to build the final version
df_merged = pd.merge(df,grouped_df,on='PostalCode')

In [11]:
df_merged.drop(['Neighbourhood'],axis=1,inplace=True)
df_merged.drop_duplicates(inplace=True)

In [12]:
df_merged.rename(columns={'joined_neighbourhood':'Neighbourhood'},inplace=True)

In [13]:
df_merged.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
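
The group, merge, drop, and rename steps above can be condensed into one `groupby` call. This is a sketch under the assumption that each postal code belongs to exactly one borough; the sample data is hypothetical. Unlike the merge-based version, the result comes back sorted by the group keys rather than in the original row order.

```python
import pandas as pd

# Hypothetical sample in which one postal code spans two rows
sample = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# Group on postal code and borough, joining neighbourhood names with a comma
grouped = (
    sample.groupby(['PostalCode', 'Borough'], as_index=False)['Neighbourhood']
          .agg(','.join)
)
print(grouped)
```

Grouping on both key columns keeps `Borough` in the output, so no separate merge or duplicate removal is needed.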


Now we can load the latitude and longitude for each postal code.

In [14]:
# Read the geospatial data
geo_df = pd.read_csv('http://cocl.us/Geospatial_data')

In [15]:
geo_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [16]:
geo_df.rename(columns={'Postal Code':'PostalCode'},inplace=True)

Now we can merge our data with the geodata on the shared PostalCode column.

In [17]:
df_geodata = pd.merge(geo_df,df_merged,on='PostalCode')
df_geodata.head()

Unnamed: 0,PostalCode,Latitude,Longitude,Borough,Neighbourhood
0,M1B,43.806686,-79.194353,Scarborough,"Malvern, Rouge"
1,M1C,43.784535,-79.160497,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,43.763573,-79.188711,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,43.770992,-79.216917,Scarborough,Woburn
4,M1H,43.773136,-79.239476,Scarborough,Cedarbrae
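
`pd.merge` defaults to an inner join, so a postal code that is missing from either table is dropped without any warning. The sketch below shows one way to surface such rows, using the `how`, `validate`, and `indicator` parameters of `pd.merge`. The sample frames are hypothetical, and `M9Z` is a made-up code added only to show an unmatched row.

```python
import pandas as pd

# Hypothetical samples shaped like geo_df and df_merged
geo_sample = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M1E'],
    'Latitude': [43.806686, 43.784535, 43.763573],
    'Longitude': [-79.194353, -79.160497, -79.188711],
})
hoods_sample = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M9Z'],  # 'M9Z' is made up and has no coordinates
    'Borough': ['Scarborough', 'Scarborough', 'Etobicoke'],
    'Neighbourhood': ['Malvern, Rouge', 'Rouge Hill, Port Union, Highland Creek', 'Example'],
})

checked = pd.merge(
    hoods_sample, geo_sample,
    on='PostalCode',
    how='left',             # keep every neighbourhood row, matched or not
    validate='one_to_one',  # raise if a postal code repeats in either table
    indicator=True,         # add a _merge column describing each row's origin
)

# Postal codes that found no coordinates in the geodata
missing = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)
```

An empty `missing` list on the real data would confirm that the inner join above lost no neighbourhoods.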


In [18]:
# Reorder the columns so the coordinates come last
df_geodata = df_geodata[['PostalCode','Borough','Neighbourhood','Latitude','Longitude']]
df_geodata

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
