# Segmenting and Clustering Neighborhoods in Toronto


## Introduction

In this assignment, we explore, segment, and cluster the neighborhoods in the city of Toronto. Unlike New York, however, Toronto's neighborhood data is not readily available on the internet. What is interesting about data science is that each project is challenging in its own way, so you need to stay agile and practice picking up new libraries and tools quickly as each project demands.


### 1.- Extracting and Formatting Data from Wikipedia

#### Importing libraries and scraping Wikipedia for the postal code table

In [2]:
import requests 
from bs4 import BeautifulSoup 
import pandas as pd

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
resp = requests.get(url)
column_names = ['Postalcode', 'Borough', 'Neighborhood']

# HTTP status code 200 means the request succeeded
if resp.status_code == 200:
    print("Successfully opened the web page")

    # we need a parser; Python's built-in HTML parser is enough
    soup = BeautifulSoup(resp.text, 'html.parser')

    # table is the first <table> element whose class is "wikitable sortable"
    table = soup.find("table", {"class": "wikitable sortable"})

else:
    # stop here: the following cells need `table`, which is undefined if the request failed
    raise RuntimeError("Error: HTTP status {}".format(resp.status_code))
    

Successfully opened the web page
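
To make the `soup.find` step concrete, the sketch below runs the same lookup on a small inline HTML string instead of the live page. The HTML snippet is a made-up stand-in for the Wikipedia table (its layout is an assumption, not a copy of the real page), and `sample_rows` is a name introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: one table with the class we search for.
sample_html = """
<table class="wikitable sortable">
  <tr><th>Postalcode</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1B</td><td>Scarborough</td><td>Malvern</td></tr>
  <tr><td>M1B</td><td>Scarborough</td><td>Rouge</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')
sample_table = sample_soup.find("table", {"class": "wikitable sortable"})

# Collect the text of every cell, row by row (header row included).
sample_rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in sample_table.find_all("tr")
]
print(sample_rows)
```

Each inner list corresponds to one `<tr>` element, which is the row structure that `pd.read_html` turns into a dataframe in the next step.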


#### Create a dataframe from the scraped table, removing records whose Borough is "Not assigned" and assigning the Borough value to any Neighborhood that is "Not assigned"

In [4]:
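from io import StringIO

# pd.read_html returns a list of dataframes; the scraped table is the first one
df_table = pd.read_html(StringIO(str(table)))
df_table = df_table[0].dropna(axis=0)
df_table.columns = column_names

# drop rows whose Borough is "Not assigned"
df_table.drop(df_table.loc[df_table['Borough']=='Not assigned'].index, inplace=True)

# where Neighborhood is "Not assigned", use the Borough value instead;
# chained indexing such as df_table.iloc[i][2] = ... assigns to a temporary copy,
# so the assignment goes through a boolean mask and .loc
mask = df_table['Neighborhood'] == 'Not assigned'
df_table.loc[mask, 'Neighborhood'] = df_table.loc[mask, 'Borough']
df_table.reset_index(drop=True, inplace=True)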
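
The two cleaning rules can be checked on a few made-up rows before trusting them on the scraped table. The sketch below uses a toy dataframe (`toy` and its values are illustrative assumptions, not scraped data) and applies both rules with boolean masks and `.loc`, which writes to the dataframe itself rather than to a temporary copy.

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    'Postalcode': ['M1A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Not assigned'],
})

# Rule 1: drop rows whose Borough is "Not assigned".
toy = toy[toy['Borough'] != 'Not assigned'].reset_index(drop=True)

# Rule 2: where Neighborhood is "Not assigned", copy the Borough value.
mask = toy['Neighborhood'] == 'Not assigned'
toy.loc[mask, 'Neighborhood'] = toy.loc[mask, 'Borough']

print(toy)
```

The first toy row disappears entirely, and the last one ends up with its borough name as its neighborhood.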

#### Combine rows that share a postal code, separating the neighborhoods with commas

In [5]:
df_table = df_table.groupby(['Postalcode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
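
As a small illustration of what the `groupby` and `', '.join` combination does, the sketch below applies it to three invented rows in which two neighborhoods share one postal code. The dataframe `toy_rows` and the result name `combined` are assumptions introduced for this example.

```python
import pandas as pd

# Toy rows, invented for illustration: two neighborhoods share postal code M1B.
toy_rows = pd.DataFrame({
    'Postalcode': ['M1B', 'M1B', 'M1G'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighborhood': ['Malvern', 'Rouge', 'Woburn'],
})

# Group by postal code and borough, then join each group's neighborhoods.
combined = (
    toy_rows.groupby(['Postalcode', 'Borough'])['Neighborhood']
            .apply(', '.join)
            .reset_index()
)
print(combined)
```

Within each group the neighborhoods are joined in the order the rows appear in the dataframe.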

In [6]:
df_table

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."


#### Show final results

In [7]:
df_table.shape

(103, 3)

### 2.- Working with coordinates 

#### We wrote the following code to get the coordinates with Geocoder; however, the library has issues returning the values.
#### For that reason, we will use a CSV file with the coordinates instead.

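import geocoder

def get_coord(postal_code, max_attempts=10):
    """Return [latitude, longitude] for a postal code, or None if every attempt fails."""
    lat_lng_coords = None

    # retry up to max_attempts times (an arbitrary limit) so the loop
    # cannot run forever when the service keeps returning None
    for _ in range(max_attempts):
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
        if lat_lng_coords is not None:
            break
    return lat_lng_coords

get_coord('M4G')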

#### Importing the dataset with coordinates and merging it into the dataframe

In [9]:
df_coord = pd.read_csv('C:/$user/test/Geospatial_Coordinates.csv')

In [10]:
df_coord

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437
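
With the coordinates in a dataframe, a single postal code can be looked up directly, which covers the same use case as `get_coord` without a network call. The sketch below is only an illustration: it reads two rows from the table above out of an inline string (an assumption made so the example does not depend on the local file path), and `df_sample` and `coord_lookup` are names introduced here.

```python
import io
import pandas as pd

# Two rows taken from the coordinates table, inlined so no file is needed.
csv_text = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
"""
df_sample = pd.read_csv(io.StringIO(csv_text))

# Index by postal code so .loc can fetch a row by its code.
coord_lookup = df_sample.set_index('Postal Code')

lat, lng = coord_lookup.loc['M1B', ['Latitude', 'Longitude']]
print(lat, lng)
```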


In [21]:
df_total = df_table.merge(df_coord, how='inner', left_on='Postalcode', right_on='Postal Code')

In [23]:
df_total.drop(['Postal Code'], axis=1, inplace=True)
df_total

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
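
An inner merge silently drops postal codes that appear in only one of the two dataframes. One way to check for that, sketched here on toy data (`left`, `right`, and the postal code `M9Z` are illustrative assumptions), is an outer merge with `indicator=True`, which adds a `_merge` column naming the side each row came from.

```python
import pandas as pd

# Toy inputs; 'M9Z' is a made-up code with no coordinates on the right side.
left = pd.DataFrame({'Postalcode': ['M1B', 'M1C', 'M9Z'],
                     'Borough': ['Scarborough', 'Scarborough', 'Etobicoke']})
right = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],
                      'Latitude': [43.806686, 43.784535],
                      'Longitude': [-79.194353, -79.160497]})

check = left.merge(right, how='outer', left_on='Postalcode',
                   right_on='Postal Code', indicator=True)

# Rows not marked "both" would be lost by an inner merge.
unmatched = check[check['_merge'] != 'both']
print(unmatched[['Postalcode', '_merge']])
```

If `unmatched` is empty for the real dataframes, the inner merge has kept every postal code.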
