<h2>Segmenting and Clustering Neighborhoods in Toronto</h2>

<h4>Importing libraries</h4>

Here we import all the libraries needed for the capstone project.

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas.io.json.json_normalize is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<h4>Scraping the Wikipedia webpage</h4>

We use the BeautifulSoup library to scrape the Wikipedia page and extract the table of postal codes, boroughs and neighborhoods.

In [4]:
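import requests
# Download the raw HTML of the Wikipedia page listing the postal codes that start with M.
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

from bs4 import BeautifulSoup
# Parse the HTML with the lxml parser.
soup = BeautifulSoup(website_url,'lxml')

# Locate the postal code table by its CSS classes.
My_table = soup.find('table',{'class':'wikitable sortable'})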

<h4>Finding special characters in the html code for creating the dataframe</h4>

Once the HTML has been extracted from the Wikipedia page, we scan each table cell for the angle brackets that delimit its tags, which lets us isolate the text we want to gather.

In [78]:
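rows = My_table.findAll('td')
rowslink = My_table.findAll('a')
results=[]
for row in rows:
    aux=str(row)
    # Fewer than four '<'-separated pieces means a plain <td>text</td> cell.
    if len(aux.split('<'))<4: 
        results.append(aux.split('>')[1].split('<')[0])
    else:
        # Otherwise the text sits inside a nested tag such as <a>.
        results.append(aux.split('>')[2].split('<')[0])
results2=[]
for result in results:
    # Strip the trailing newline; endswith() is also safe for empty cells.
    if result.endswith('\n'):
        result=result[:-1]
    results2.append(result)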
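
The string splitting above depends on the exact tag layout of each cell. As a hedged alternative, BeautifulSoup's `get_text` returns the visible text of a cell whether or not it contains a nested link. The sketch below runs on a small hand-written HTML fragment (an assumption that imitates the table layout, not the live page) and uses the built-in `html.parser`; `sample_html`, `sample_table` and `cells` are illustrative names.

```python
from bs4 import BeautifulSoup

# Hand-written fragment imitating the wiki table layout (illustrative only).
sample_html = """
<table class="wikitable sortable">
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
<tr><td>M3A</td><td><a href="/wiki/North_York">North York</a></td><td><a href="/wiki/Parkwoods">Parkwoods</a>
</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')
sample_table = sample_soup.find('table', {'class': 'wikitable sortable'})

# get_text(strip=True) drops nested tags and surrounding whitespace in one step.
cells = [td.get_text(strip=True) for td in sample_table.find_all('td')]
print(cells)
```

This yields one clean string per cell, so the separate pass that removes trailing newlines would not be needed.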

<h4>Creating the dataframe</h4>

In this step we create the dataframe and assign the column names. 

In [79]:
import pandas as pd
df = pd.DataFrame()
df['PostalCode'] = results2[0::3]
df['Borough']=results2[1::3]
df['Neighborhood']=results2[2::3]
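
The stride slices above assume the flat list holds exactly three values per table row. A minimal sketch of the same idea on made-up values groups the list into rows of three before building the dataframe, and checks the length first; `flat`, `records` and `df_demo` are hypothetical names.

```python
import pandas as pd

# Made-up flat list in the same order as the scraped cells:
# postal code, borough, neighborhood.
flat = ['M3A', 'North York', 'Parkwoods',
        'M4A', 'North York', 'Victoria Village']

# Fail early if a cell is missing, otherwise the columns would silently shift.
assert len(flat) % 3 == 0

# Group every three consecutive values into one row.
records = [flat[i:i + 3] for i in range(0, len(flat), 3)]
df_demo = pd.DataFrame(records, columns=['PostalCode', 'Borough', 'Neighborhood'])
print(df_demo)
```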

<h4>Cleaning the data</h4>

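Here the data is cleaned to meet the conditions of the assignment: rows without an assigned borough are dropped, a neighborhood that is not assigned takes the name of its borough, and neighborhoods that share a postal code are combined into a single comma-separated row.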

In [80]:
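# Re-run the dataframe creation cell before re-running this one, so neighborhoods are not appended twice.
# Drop rows whose borough is not assigned (.copy() avoids SettingWithCopyWarning below).
df = df[~df.Borough.str.contains("Not assigned")].copy()
trows = df.index.to_numpy()  # Index.get_values() was removed in pandas 1.0
# A neighborhood that is not assigned takes the name of its borough.
for i in trows:
    if df.loc[i, 'Neighborhood'] == 'Not assigned':
        df.loc[i, 'Neighborhood'] = df.loc[i, 'Borough']
# For consecutive rows with the same postal code, accumulate the neighborhoods in the later row.
for i in range(0, len(trows) - 1):
    if df.loc[trows[i], 'PostalCode'] == df.loc[trows[i+1], 'PostalCode']:
        df.loc[trows[i+1], 'Neighborhood'] = df.loc[trows[i+1], 'Neighborhood'] + ',' + df.loc[trows[i], 'Neighborhood']
# Keep only the last (fully accumulated) row of each postal code.
df2 = df.drop_duplicates(subset=['PostalCode'], keep='last', inplace=False)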
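
The same three cleaning rules can also be written without explicit loops. The following is a sketch on a small made-up dataframe, not the method used in this notebook; `raw` and `clean` are hypothetical names. Note that it joins neighborhoods in their original order, whereas the loop above places earlier rows after later ones.

```python
import pandas as pd

# Made-up rows covering the three cases handled by the cleaning step.
raw = pd.DataFrame({
    'PostalCode':   ['M1A', 'M5A', 'M5A', 'M7A'],
    'Borough':      ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})

# Rule 1: drop rows without a borough.
clean = raw[raw['Borough'] != 'Not assigned'].copy()

# Rule 2: an unassigned neighborhood takes the borough name.
unassigned = clean['Neighborhood'] == 'Not assigned'
clean.loc[unassigned, 'Neighborhood'] = clean.loc[unassigned, 'Borough']

# Rule 3: one row per postal code, neighborhoods joined by commas.
clean = (clean.groupby(['PostalCode', 'Borough'], as_index=False)['Neighborhood']
              .agg(','.join))
print(clean)
```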

In [102]:
df2.shape

(103, 3)

<h4>Importing coordinates from a CSV file</h4>

In [103]:
dfcoord = pd.read_csv('https://cocl.us/Geospatial_data')

In [104]:
dfcoord.shape

(103, 3)

<h4>Sorting data for merging dataframes</h4>

In [105]:
df2=df2.sort_values('PostalCode')

In [106]:
df2.index = range(len(df2))

In [113]:
df_row_merged = pd.concat([df2, dfcoord], axis=1)
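
`pd.concat(..., axis=1)` pairs rows by position, so the result is correct only if both dataframes list the postal codes in the same order. A hedged alternative is to join on the postal code itself. The sketch below uses two rows with made-up ordering and assumes the key column of the coordinate file is named `Postal Code`, as the `drop` call in the following cell suggests; `left`, `coords` and `merged` are hypothetical names.

```python
import pandas as pd

left = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighborhood': ['Malvern,Rouge', 'Port Union,Rouge Hill,Highland Creek'],
})

# Coordinates deliberately listed in a different order than `left`.
coords = pd.DataFrame({
    'Postal Code': ['M1C', 'M1B'],
    'Latitude': [43.784535, 43.806686],
    'Longitude': [-79.160497, -79.194353],
})

# Join on the key; validate raises if a postal code appears more than once on either side.
merged = (left.merge(coords, left_on='PostalCode', right_on='Postal Code',
                     how='left', validate='one_to_one')
              .drop(columns='Postal Code'))
print(merged)
```

With a key-based join the sorting and re-indexing steps are no longer needed for correctness, and a missing postal code shows up as NaN coordinates instead of shifting every later row.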

In [114]:
df_row_merged=df_row_merged.drop(['Postal Code'], axis=1)
df_row_merged.head(15)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern,Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Port Union,Rouge Hill,Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"West Hill,Morningside,Guildwood",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park,Ionview,East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Oakridge,Golden Mile,Clairlea",43.711112,-79.284577
8,M1M,Scarborough,"Scarborough Village West,Cliffside,Cliffcrest",43.716316,-79.239476
9,M1N,Scarborough,"Cliffside West,Birch Cliff",43.692657,-79.264848
