# Toronto Neighborhood Segmentation Data

### By: Gyan Prakash

*The notebook below fetches the table from a Wikipedia page and converts it into a pandas dataframe. Data wrangling is then performed to clean the data.*

### Let's import the libraries 

In [46]:
import numpy as np
import requests
import pandas as pd


### Now we will fetch the tables from the given page into a list of dataframe objects

In [47]:
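from io import StringIO

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
r = requests.get(url)
r.raise_for_status()                    # stop early if the page could not be downloaded
page = pd.read_html(StringIO(r.text))   # parse the downloaded HTML: one DataFrame per <table>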

*We have fetched the tables from the Wikipedia page. The result is a list.*
### Let's check the datatype of 'page':

In [48]:
type(page)

list

*Since we need to work only with the first table,* 
### Let's take out the first table from page: 

In [49]:
df=page[0]

In [50]:
type(df)

pandas.core.frame.DataFrame
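Taking `page[0]` assumes the postal-code table is always the first table on the page. A slightly more defensive sketch, assuming the wanted table is the first one that has a `Borough` column, selects it by its columns instead of its position. The helper `first_table_with` and the stand-in tables are hypothetical, for illustration only.

```python
import pandas as pd


def first_table_with(tables, column):
    """Return the first DataFrame in `tables` that has a column named `column`."""
    for table in tables:
        if column in table.columns:
            return table
    raise ValueError(f"no table has a {column!r} column")


# Hypothetical stand-in for the list returned by pd.read_html:
# a table without 'Borough' comes first, so positional indexing would pick the wrong one.
tables = [
    pd.DataFrame({"Note": ["navigation box"]}),
    pd.DataFrame({
        "Postcode": ["M3A"],
        "Borough": ["North York"],
        "Neighbourhood": ["Parkwoods"],
    }),
]
df = first_table_with(tables, "Borough")
print(df.shape)  # (1, 3)
```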

*Let's have a look at our dataframe:*

In [51]:
df.head()

|   | Postcode | Borough | Neighbourhood |
|---|----------|---------|---------------|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


### Rename the columns:

In [52]:
df.columns=['Post Code','Borough','Neighborhood']
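Assigning a list to `df.columns` relabels by position, so it silently mislabels the data if the table's columns ever arrive in a different order. A sketch of the by-name alternative, using `DataFrame.rename` with the original headers shown in the output above (`Postcode`, `Neighbourhood`); the sample rows are a small hand-made copy of that output:

```python
import pandas as pd

df = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighbourhood": ["Parkwoods", "Victoria Village"],
})

# Rename by name, not position; errors="raise" fails loudly if an expected header is missing.
df = df.rename(
    columns={"Postcode": "Post Code", "Neighbourhood": "Neighborhood"},
    errors="raise",
)
print(list(df.columns))  # ['Post Code', 'Borough', 'Neighborhood']
```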

*Missing values in our dataframe are marked as 'Not assigned'. Let's replace them with NumPy's NaN, which pandas recognizes as missing, so methods such as `dropna` and `fillna` can handle them.*

In [53]:
df.replace("Not assigned", np.nan, inplace=True)
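To see what the replacement does, here is a minimal sketch on a few rows modeled on the `head()` output above (the `Neighborhood` value for M7A is assumed to be 'Not assigned', which is why it shows up as missing later). After the replacement, `isna().sum()` counts the missing values per column:

```python
import numpy as np
import pandas as pd

# A few rows modeled on the head() output above.
df = pd.DataFrame({
    "Post Code": ["M1A", "M2A", "M3A", "M7A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Not assigned", "Parkwoods", "Not assigned"],
})

# Turn the placeholder string into a real missing value.
df = df.replace("Not assigned", np.nan)

# Count missing values per column: Borough has 2, Neighborhood has 3.
missing = df.isna().sum()
print(missing)
```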

### Dropping rows with a missing Borough

In [54]:
df.dropna(subset=["Borough"], axis=0, inplace=True)

# reset the index, because dropna removed rows and left gaps in it
df.reset_index(drop=True, inplace=True)
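Rather than stating how many rows were dropped, the count can be computed. A minimal sketch on four hand-made rows, where the first two have no Borough, chains `dropna` and `reset_index` and reports the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Post Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": [np.nan, np.nan, "North York", "North York"],
    "Neighborhood": [np.nan, np.nan, "Parkwoods", "Victoria Village"],
})

before = len(df)
# Keep only rows whose Borough is present, then renumber the index from 0.
df = df.dropna(subset=["Borough"]).reset_index(drop=True)
print(f"dropped {before - len(df)} rows")  # dropped 2 rows
print(df.index.tolist())                    # [0, 1]
```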

In [55]:
df.head(10)

|   | Post Code | Borough | Neighborhood |
|---|-----------|---------|--------------|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M5A | Downtown Toronto | Regent Park |
| 4 | M6A | North York | Lawrence Heights |
| 5 | M6A | North York | Lawrence Manor |
| 6 | M7A | Queen's Park | NaN |
| 7 | M9A | Etobicoke | Islington Avenue |
| 8 | M1B | Scarborough | Rouge |
| 9 | M1B | Scarborough | Malvern |


### Now, let's replace missing neighborhood values with the corresponding Borough value, as instructed in the assignment question

In [56]:
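# fillna takes the Borough value wherever Neighborhood is missing;
# reassigning avoids chained inplace calls, which may not update df
df['Neighborhood'] = df['Neighborhood'].fillna(df['Borough'])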
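`fillna` accepts a Series as the fill value and matches it by index label, so each row with a missing neighborhood receives the Borough from the same row. A minimal sketch on two rows modeled on the output above, using a deliberately non-default index to show that matching is by label rather than by position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Post Code": ["M5A", "M7A"],
        "Borough": ["Downtown Toronto", "Queen's Park"],
        "Neighborhood": ["Harbourfront", np.nan],
    },
    index=[2, 6],  # labels taken from the head(10) output above
)

# Each NaN is filled from the Borough entry with the same index label.
df["Neighborhood"] = df["Neighborhood"].fillna(df["Borough"])
print(df["Neighborhood"].tolist())  # ['Harbourfront', "Queen's Park"]
```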

In [57]:
df.head(10)

|   | Post Code | Borough | Neighborhood |
|---|-----------|---------|--------------|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M5A | Downtown Toronto | Regent Park |
| 4 | M6A | North York | Lawrence Heights |
| 5 | M6A | North York | Lawrence Manor |
| 6 | M7A | Queen's Park | Queen's Park |
| 7 | M9A | Etobicoke | Islington Avenue |
| 8 | M1B | Scarborough | Rouge |
| 9 | M1B | Scarborough | Malvern |


## Hurrah!
### We are done with the cleaning phase.
### Finally, let's check the number of rows and columns in the dataframe:

In [58]:
df.shape

(211, 3)
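The shape alone does not confirm that the cleaning worked. A sketch of a few checks, assuming the invariants are exactly what the steps above aim for (the three renamed columns, no NaN, no leftover 'Not assigned', a freshly reset index); the helper `check_cleaned` is hypothetical:

```python
import pandas as pd

EXPECTED_COLUMNS = ["Post Code", "Borough", "Neighborhood"]


def check_cleaned(df):
    """Raise ValueError if `df` violates what the cleaning steps should guarantee."""
    if list(df.columns) != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {list(df.columns)}")
    if df.isna().any().any():
        raise ValueError("missing values remain")
    if df.isin(["Not assigned"]).any().any():
        raise ValueError("'Not assigned' placeholders remain")
    if not df.index.equals(pd.RangeIndex(len(df))):
        raise ValueError("index was not reset")


cleaned = pd.DataFrame({
    "Post Code": ["M3A", "M7A"],
    "Borough": ["North York", "Queen's Park"],
    "Neighborhood": ["Parkwoods", "Queen's Park"],
})
check_cleaned(cleaned)  # passes silently
```

In the notebook, `check_cleaned(df)` would run right after `df.shape`.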

# Finding out latitude and longitude for each borough
### Let's create another dataframe that combines the above dataframe with the latitude and longitude of each borough:

In [59]:
column_names=['Post Code','Borough','Neighborhood','latitude','longitude']
df2=pd.DataFrame(columns=column_names)

In [60]:
#!conda install -c conda-forge geopy --yes

In [61]:
df2[['Post Code','Borough','Neighborhood']]=df[['Post Code','Borough','Neighborhood']]
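Building an empty `df2` and then copying columns into it leaves every column with object dtype. A shorter sketch, assuming the goal is simply the cleaned columns plus two empty coordinate columns, uses `DataFrame.assign`, which returns a new frame and gives the coordinate columns a float dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Post Code": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighborhood": ["Parkwoods", "Victoria Village"],
})

# assign returns a new DataFrame, so df itself is left unchanged.
df2 = df.assign(latitude=np.nan, longitude=np.nan)
print(list(df2.columns))  # ['Post Code', 'Borough', 'Neighborhood', 'latitude', 'longitude']
```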

### Import the geocoder for finding latitude and longitude. We use geopy's Nominatim geocoder (OpenStreetMap data), with "foursquare_agent" as its user agent string

In [37]:
from geopy.geocoders import Nominatim

In [42]:
CLIENT_ID = 'removed after running the code' # your Foursquare ID
CLIENT_SECRET = 'removed after running the code' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: removed after running the code
CLIENT_SECRET: removed after running the code
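Hard-coding and printing the credentials means real secrets can end up in the saved notebook. A sketch, assuming environment variables with the hypothetical names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, reads them at runtime and prints only a masked form:

```python
import os


def load_credentials():
    """Read the Foursquare credentials from (hypothetically named) environment variables."""
    client_id = os.environ.get("FOURSQUARE_CLIENT_ID")
    client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret


def mask(secret, visible=4):
    """Hide all but the last `visible` characters (all of them if the secret is short)."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Usage, after exporting both variables in the shell that starts Jupyter:
# CLIENT_ID, CLIENT_SECRET = load_credentials()
# print('CLIENT_ID: ' + mask(CLIENT_ID))
# print('CLIENT_SECRET: ' + mask(CLIENT_SECRET))
```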


In [39]:
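geolocator = Nominatim(user_agent="foursquare_agent")  # create the geocoder once, outside the loop
for index, row in df2.iterrows():
    location = geolocator.geocode(row['Borough'])
    # iterrows may yield a copy, so assigning to `row` is not guaranteed to update df2
    df2.at[index, 'latitude'] = location.latitude
    df2.at[index, 'longitude'] = location.longitude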
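The loop sends one geocoding request per row, although there are far fewer distinct boroughs than rows. A sketch of geocoding each distinct borough once and mapping the coordinates back onto every row; `geocode_unique` and `fake_geocode` are hypothetical, and the stub returns the coordinates shown in the `df2.head()` output below instead of calling a real service:

```python
import pandas as pd


def geocode_unique(df, geocode, column="Borough"):
    """Geocode each distinct value of `column` once and add latitude/longitude columns.

    `geocode` is any callable mapping a place name to (latitude, longitude),
    or None when the place is not found (those rows get NaN).
    """
    coords = {place: geocode(place) for place in df[column].drop_duplicates()}
    lat = {p: c[0] for p, c in coords.items() if c is not None}
    lon = {p: c[1] for p, c in coords.items() if c is not None}
    out = df.copy()
    out["latitude"] = out[column].map(lat)
    out["longitude"] = out[column].map(lon)
    return out


# Hypothetical stand-in for the real geocoder; it records every lookup.
KNOWN = {"North York": (43.7708, -79.4133), "Downtown Toronto": (43.6542, -79.3808)}
calls = []


def fake_geocode(place):
    calls.append(place)
    return KNOWN.get(place)


df = pd.DataFrame({
    "Post Code": ["M3A", "M4A", "M5A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto", "Downtown Toronto"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Harbourfront", "Regent Park"],
})
df2 = geocode_unique(df, fake_geocode)
print(len(calls))  # 2: one lookup per distinct borough instead of one per row
```

With the real service, the callable would wrap `geolocator.geocode` and return `(location.latitude, location.longitude)`. Nominatim's usage policy asks for at most one request per second, and geopy's `RateLimiter` (in `geopy.extra.rate_limiter`) can wrap `geolocator.geocode` to enforce such a delay.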


In [45]:
df2.head()

|   | Post Code | Borough | Neighborhood | latitude | longitude |
|---|-----------|---------|--------------|----------|-----------|
| 0 | M3A | North York | Parkwoods | 43.7708 | -79.4133 |
| 1 | M4A | North York | Victoria Village | 43.7708 | -79.4133 |
| 2 | M5A | Downtown Toronto | Harbourfront | 43.6542 | -79.3808 |
| 3 | M5A | Downtown Toronto | Regent Park | 43.6542 | -79.3808 |
| 4 | M6A | North York | Lawrence Heights | 43.7708 | -79.4133 |
