# This notebook is used mainly for the Capstone Project of the IBM Data Science Professional Certificate

## The capstone project aims to compare Canadian neighborhoods using location data from the Foursquare API

In [1]:
import pandas as pd

import numpy as np

print("Hello Capstone Project Course!")

Hello Capstone Project Course!


## Part 1: Segmenting and Clustering Neighborhoods in Toronto

- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood

- Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated by a comma, as shown in row 11 in the above table.

- If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

- Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.

- In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.
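The cleaning rules listed above can be sketched with pandas on a few hypothetical rows. This is a minimal illustration under stated assumptions, not the scraping path this notebook actually takes (which filters and formats cells while parsing the HTML). The sample frame `raw` and the helper name `clean_postal_table` are invented for the example.

```python
import pandas as pd

# Hypothetical sample rows that mimic the three-column layout described above.
raw = pd.DataFrame(
    {
        "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
        "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
    }
)


def clean_postal_table(frame: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleaning rules to a raw postal-code table."""
    # Rule 1: keep only rows that have an assigned borough.
    out = frame[frame["Borough"] != "Not assigned"].copy()
    # Rule 2: a "Not assigned" neighborhood takes the borough's name.
    unassigned = out["Neighborhood"] == "Not assigned"
    out.loc[unassigned, "Neighborhood"] = out.loc[unassigned, "Borough"]
    # Rule 3: combine rows that share a postal code into one comma-separated row.
    return (
        out.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
        .agg(", ".join)
        .reset_index()
    )


cleaned = clean_postal_table(raw)
print(cleaned)
print(cleaned.shape)
```

Passing `sort=False` to `groupby` keeps the postal codes in their order of first appearance, which makes the result easier to compare against the source table.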

### Import necessary Libraries and Retrieve the URL

In [2]:
import requests
from bs4 import BeautifulSoup

result = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
print(result.status_code)

200


### Creating a BeautifulSoup object

In [3]:
data = result.content

soup = BeautifulSoup(data,'html5lib')

### Find the table and create the DataFrame

In [4]:
# Start with an empty list that will hold one dictionary per table cell
table_contents = []

# Locate the first table on the page
table = soup.find('table')

for row in table.find_all('td'):

    # Skip cells whose borough is not assigned
    if row.span.text == 'Not assigned':
        continue

    # Each cell becomes a dictionary with 3 keys: PostalCode, Borough and Neighborhood
    cell = {}
    # The postal code is the first 3 characters of row.p.text
    cell['PostalCode'] = row.p.text[:3]
    # The span text holds the borough before '(' and the neighborhoods inside the parentheses
    span_parts = row.span.text.split('(')
    cell['Borough'] = span_parts[0]
    # Drop the closing parenthesis and turn ' /' separators into commas
    cell['Neighborhood'] = span_parts[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' ')
    # Append to the list
    table_contents.append(cell)

# Create a DataFrame from the list
df = pd.DataFrame(table_contents)

# Shorten borough names that were fused with extra cell text during parsing
df['Borough'] = df['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade': 'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
    'EtobicokeNorthwest': 'Etobicoke Northwest',
    'East YorkEast Toronto': 'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre': 'Mississauga',
})

df.head()

  PostalCode           Borough                      Neighborhood
0        M3A        North York                         Parkwoods
1        M4A        North York                  Victoria Village
2        M5A  Downtown Toronto         Regent Park, Harbourfront
3        M6A        North York  Lawrence Manor, Lawrence Heights
4        M7A      Queen's Park     Ontario Provincial Government
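The string handling in the loop above can be isolated into a small helper so it can be checked without fetching the page. The sketch below is an illustration, not the notebook's own code: the function name `parse_cell` is hypothetical, and the sample strings assume the span text has the form `Borough(Neighborhood / Neighborhood)`, as the split and replace calls in the loop imply. The fallback for text with no parenthesis is also an assumption; the loop above would raise an `IndexError` in that case.

```python
from typing import Tuple


def parse_cell(span_text: str) -> Tuple[str, str]:
    """Split span text into (borough, neighborhood) using the loop's rules."""
    parts = span_text.split("(")
    borough = parts[0].strip()
    if len(parts) < 2:
        # Assumption: with no parenthesised part, reuse the borough name.
        return borough, borough
    # Drop the closing parenthesis and turn ' /' separators into commas.
    neighborhood = parts[1].strip(")").replace(" /", ",").replace(")", " ").strip()
    return borough, neighborhood


# Hypothetical inputs in the assumed "Borough(Neighborhood / ...)" shape.
print(parse_cell("North York(Parkwoods)"))
print(parse_cell("Downtown Toronto(Regent Park / Harbourfront)"))
```

Keeping the parsing in one function means a change in the page's cell layout only needs to be handled in one place.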


In [5]:
print(df[df['PostalCode']=='M5A'])
print(df[df['Borough']==df['Neighborhood']])

  PostalCode           Borough               Neighborhood
2        M5A  Downtown Toronto  Regent Park, Harbourfront
Empty DataFrame
Columns: [PostalCode, Borough, Neighborhood]
Index: []


In [6]:
df.shape

(103, 3)

### Add Latitude and Longitude

In [7]:
lat_lng_coords = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv')
lat_lng_coords.columns = ['PostalCode', 'Latitude', 'Longitude']

df = pd.merge(df, lat_lng_coords, on = "PostalCode")
df.head()

  PostalCode           Borough                      Neighborhood   Latitude   Longitude
0        M3A        North York                         Parkwoods  43.753259  -79.329656
1        M4A        North York                  Victoria Village  43.725882  -79.315572
2        M5A  Downtown Toronto         Regent Park, Harbourfront  43.654260  -79.360636
3        M6A        North York  Lawrence Manor, Lawrence Heights  43.718518  -79.464763
4        M7A      Queen's Park     Ontario Provincial Government  43.662301  -79.389494


In [8]:
df.shape

(103, 5)
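The row count is still 103 after the merge, so the default inner join dropped no postal codes. A more explicit check, sketched below on toy data, is a left merge with `indicator=True`, which marks any row that found no coordinates. The frames `codes` and `coords` and the postal code `M9Z` are hypothetical; the two coordinate pairs are taken from the output above.

```python
import pandas as pd

# Toy stand-ins for the notebook's df and lat_lng_coords; "M9Z" is a made-up code.
codes = pd.DataFrame(
    {
        "PostalCode": ["M3A", "M4A", "M9Z"],
        "Borough": ["North York", "North York", "Hypothetical"],
    }
)
coords = pd.DataFrame(
    {
        "PostalCode": ["M3A", "M4A"],
        "Latitude": [43.753259, 43.725882],
        "Longitude": [-79.329656, -79.315572],
    }
)

# A left merge keeps every postal code; validate raises if a key is duplicated.
merged = codes.merge(
    coords, on="PostalCode", how="left", validate="one_to_one", indicator=True
)

# Rows marked "left_only" found no matching coordinates.
missing = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```

An empty `missing` list on the real data would confirm that every postal code received a latitude and longitude.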