# Applied Data Science Capstone Project

This notebook is mainly used for the assignments of the IBM Applied Data Science Capstone Project on Coursera.

I am excited to share my notebook with you, and hope to learn a lot from this experience!

## Part 1: Creating Notebook and importing libraries

The first assignment is to import some libraries and say hello to the notebook's readers.

In [1]:
import pandas as pd
import numpy as np
print('Hello Capstone Project Course!')

Hello Capstone Project Course!


## Part 2-A: Creating a dataframe of neighborhoods in Toronto

The Week 3 assignment is to create a dataframe of Toronto neighborhoods from the following Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

To do so, we will use the BeautifulSoup Library:

In [2]:
import requests
from bs4 import BeautifulSoup

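First, we check that the page can be retrieved. A response status code of 200 means the HTTP request succeeded; it does not by itself say whether scraping is permitted, which depends on the site's terms of use and its robots.txt file.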

In [3]:
wikiurl = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
response = requests.get(wikiurl)
# 200 means the request succeeded and the page content is available
print(response.status_code)

200
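
A status code describes the outcome of the HTTP request. A site's crawling preferences are usually published separately in its robots.txt file, and one possible way to consult such rules is Python's standard `urllib.robotparser`. The sketch below is only an illustration: the `SAMPLE_RULES` text, the `example.org` URLs and the `is_allowed` helper are assumptions made up for this example and do not reflect Wikipedia's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
SAMPLE_RULES = """
User-agent: *
Disallow: /private/
""".strip().splitlines()


def is_allowed(rules, url, user_agent="*"):
    """Return True if the robots.txt lines in `rules` let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse rules from a list of lines instead of the network
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_RULES, "https://example.org/wiki/Some_page"))
print(is_allowed(SAMPLE_RULES, "https://example.org/private/data"))
```

For a live site, the same parser can load the published rules with `set_url(...)` followed by `read()` instead of `parse(...)`.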


Since the request succeeded, we can proceed. The next step is to extract the table using the HTML attributes of the table element on the page.

In [4]:
soup = BeautifulSoup(response.text, 'html.parser')
table=soup.find('table',{'class':"wikitable"})

Now, we modify the table with pandas until we have the desired dataframe, sorted by postal code.

In [5]:
from io import StringIO

# read_html returns a list of DataFrames; the first one is the postal code table
toronto_neighborhoods = pd.read_html(StringIO(str(table)))[0]
toronto_neighborhoods = toronto_neighborhoods.rename(columns={'Neighbourhood': 'Neighborhood'})
# Drop rows whose borough is 'Not assigned'
toronto_neighborhoods = toronto_neighborhoods[~toronto_neighborhoods['Borough'].str.contains('Not assigned')]
# Sort by postal code and rebuild a clean 0..n-1 index
toronto_neighborhoods = toronto_neighborhoods.sort_values('Postal Code').reset_index(drop=True)
toronto_neighborhoods.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
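
The cleaning steps above (rename, filter, sort, reset the index) can also be tried on a small hand-made frame without fetching the page. This is a minimal sketch with illustrative rows, not the scraped data; it filters with an exact comparison where the cell above uses `str.contains`.

```python
import pandas as pd

# Hand-made rows standing in for the scraped table (illustrative values only).
raw = pd.DataFrame({
    "Postal Code": ["M3A", "M1A", "M1B", "M2A"],
    "Borough": ["North York", "Not assigned", "Scarborough", "Not assigned"],
    "Neighbourhood": ["Parkwoods", "Not assigned", "Malvern, Rouge", "Not assigned"],
})

cleaned = (
    raw.rename(columns={"Neighbourhood": "Neighborhood"})
       .loc[lambda df: df["Borough"] != "Not assigned"]  # drop unassigned boroughs
       .sort_values("Postal Code")                       # order by postal code
       .reset_index(drop=True)                           # fresh 0..n-1 index, no extra 'index' column
)
print(cleaned)
```

Chaining the steps avoids `inplace=True` on a filtered slice, which pandas may flag with a `SettingWithCopyWarning`.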


## Part 2-B: Appending coordinates to the neighborhoods dataframe

The second part of the Week 3 assignment is to append the coordinates of each neighborhood to the dataframe created in Part 2-A.

To do so, we import the coordinates from the Geospatial Coordinates CSV file.

In [6]:
coordinates=pd.read_csv('Geospatial_Coordinates.csv')
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


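Since the coordinates dataframe is also sorted by postal code and both dataframes share the same default integer index, the 'Latitude' and 'Longitude' columns line up row by row (assuming both cover the same postal codes) and can be appended directly to the neighborhoods dataframe.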
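
Direct column assignment depends on both frames listing the same postal codes in the same order. As a hedged alternative, a key-based merge on 'Postal Code' gives the same result without that assumption. The sketch below uses small toy frames (the `neighborhoods`, `coords` and `merged` names are illustrative, and the coordinate rows are shuffled on purpose) rather than the notebook's full data.

```python
import pandas as pd

# Toy frames for illustration; the coordinates are deliberately out of order.
neighborhoods = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Borough": ["Scarborough"] * 3,
})
coords = pd.DataFrame({
    "Postal Code": ["M1E", "M1B", "M1C"],
    "Latitude": [43.763573, 43.806686, 43.784535],
    "Longitude": [-79.188711, -79.194353, -79.160497],
})

# A left merge keeps every neighborhood row and matches coordinates by key,
# regardless of row order; validate= raises if a postal code is duplicated.
merged = neighborhoods.merge(coords, on="Postal Code", how="left", validate="one_to_one")
print(merged)
```

With a left merge, any postal code missing from the coordinates file shows up as `NaN` in 'Latitude' and 'Longitude', so gaps are visible rather than silently misaligned.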

In [7]:
toronto_neighborhoods['Latitude']=coordinates['Latitude']
toronto_neighborhoods['Longitude']=coordinates['Longitude']
toronto_neighborhoods

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
