# Introduction.

## Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto.

### Contains Exercises 1 and 2.

### Table of Contents:
1. <a href="#item1">Exercise 1</a>

2. <a href="#item2">Exercise 2</a>

<a id='item1'></a>
# 1. First exercise.

## 1. Importing the required dependencies

```python
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import wget  # third-party "wget" package from PyPI, used in step 9
```

## 2. Downloading the Wikipedia page

```python
URL = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
res = requests.get(URL)
res.raise_for_status()  # stop early if the request failed
```

## 3. Parsing the HTML

```python
scraped_data = BeautifulSoup(res.text, 'html.parser')
```

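## 4. Extracting the table rows

```python
# Every <tr> inside a table with class "wikitable".
data_rows = scraped_data.select('.wikitable tr')
```

```python
data_header = data_rows[0]    # first row holds the <th> column headers
data_content = data_rows[1:]  # remaining rows hold the <td> data cells
```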

```python
data_columns = []

# Column names come from the <th> cells of the header row.
for column in data_header.select('th'):
    column_name = column.getText()
    data_columns.append(column_name.replace('\n', ''))

data_columns
```

```
['Postal code', 'Borough', 'Neighborhood']
```

```python
final_data = []

# One list of cell texts per table row, with newlines removed.
for row in data_content:
    data_row = []
    for data in row.select('td'):
        data_row.append(data.getText().replace('\n', ''))
    final_data.append(data_row)
```
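
The header/body split above can be tried offline on a small HTML snippet. The following is a minimal sketch that assumes a hypothetical two-row table in place of the real Wikipedia page. `SAMPLE_HTML` and `parse_wikitable` are illustrative names and are not part of the notebook. It uses `get_text().strip()`, which trims leading and trailing whitespace (including newlines) and leaves inner spacing such as `" / "` intact.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the Wikipedia table (not the real page).
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Postal code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td></td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""


def parse_wikitable(html):
    """Hypothetical helper: return (column names, data rows) from a .wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select(".wikitable tr")
    # First row: <th> header cells; strip() only trims the ends of each cell's text.
    columns = [th.get_text().strip() for th in rows[0].select("th")]
    # Remaining rows: <td> data cells.
    body = [[td.get_text().strip() for td in row.select("td")] for row in rows[1:]]
    return columns, body


columns, body = parse_wikitable(SAMPLE_HTML)
print(columns)  # ['Postal code', 'Borough', 'Neighborhood']
print(body)     # [['M1A', 'Not assigned', ''], ['M3A', 'North York', 'Parkwoods']]
```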

## 5. Creating a pandas DataFrame

```python
torontoDF = pd.DataFrame(data=final_data, columns=data_columns)
torontoDF.head()
```

|   | Postal code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned |  |
| 1 | M2A | Not assigned |  |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park / Harbourfront |


```python
torontoDF.tail()
```

|   | Postal code | Borough | Neighborhood |
|---|---|---|---|
| 175 | M5Z | Not assigned |  |
| 176 | M6Z | Not assigned |  |
| 177 | M7Z | Not assigned |  |
| 178 | M8Z | Etobicoke | Mimico NW / The Queensway West / South of Bloo... |
| 179 | M9Z | Not assigned |  |


## 6. Cleaning the pandas DataFrame
##### 1. Drop rows whose Borough is "Not assigned".
##### 2. More than one neighborhood can exist in one postal code area. These neighborhoods should appear in a single row, separated by commas. In the scraped table they already share one row, separated by " / ", so only the separator needs to be replaced.
##### 3. Reset the index so that the remaining rows are numbered consecutively from 0.

```python
# Keep only rows with an assigned borough; .copy() avoids pandas' SettingWithCopyWarning below.
torontoDF = torontoDF[torontoDF.Borough != 'Not assigned'].copy()
# Neighborhoods sharing a postal code are separated by ' / '; switch to ', '.
torontoDF['Neighborhood'] = torontoDF['Neighborhood'].str.replace(' / ', ', ', regex=False)
torontoDF = torontoDF.reset_index(drop=True)
torontoDF.head()
```

|   | Postal code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
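
In the version of the page scraped here, each postal code appears to occupy a single row, so the comma-joining in step 2 reduces to a separator swap. If a source listed one neighborhood per row instead, the rows would have to be merged. The following minimal sketch covers that case. It assumes a hypothetical input in which M5A is split over two rows. It groups by postal code and borough and joins the neighborhood names with `", "`.

```python
import pandas as pd

# Hypothetical input: M5A is split over two rows, one neighborhood each.
raw = pd.DataFrame({
    "Postal code": ["M1A", "M5A", "M5A", "M3A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighborhood": ["", "Regent Park", "Harbourfront", "Parkwoods"],
})

combined = (
    raw[raw["Borough"] != "Not assigned"]
    # One group per (postal code, borough); sort=False keeps first-appearance order.
    .groupby(["Postal code", "Borough"], sort=False)["Neighborhood"]
    # Join the neighborhood names of each group into one comma-separated string.
    .agg(", ".join)
    # Turn the group keys back into ordinary columns with a fresh 0..n-1 index.
    .reset_index()
)
print(combined)
```

Grouping on borough as well as postal code keeps the `Borough` column in the result. This assumes that each postal code belongs to exactly one borough.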


## 7. Saving the result to a CSV file

```python
torontoDF.to_csv('Toronto_pc.csv')
```
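
By default, `to_csv` writes the DataFrame index as an unnamed first column. Reading `Toronto_pc.csv` back with `pd.read_csv` therefore yields an extra `Unnamed: 0` column. The small in-memory check below shows the effect of `index=False`. It uses `io.StringIO` in place of a file and a hypothetical one-row frame.

```python
import io

import pandas as pd

# One-row hypothetical frame with the notebook's column names.
df = pd.DataFrame({"Postal code": ["M3A"], "Borough": ["North York"],
                   "Neighborhood": ["Parkwoods"]})

# Default to_csv: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)  # ['Unnamed: 0', 'Postal code', 'Borough', 'Neighborhood']

# index=False leaves the index out, so the round trip returns the original columns.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)  # ['Postal code', 'Borough', 'Neighborhood']
```

Alternatively, `pd.read_csv('Toronto_pc.csv', index_col=0)` restores the saved index instead of creating a new column.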

## 8. Number of rows

```python
torontoDF.shape
```

```
(103, 3)
```

<a id='item2'></a>
# 2. Second exercise.

## 9. Getting the latitude and longitude of each postal code

```python
# wget.download saves the file locally and returns the local filename.
Geospatial_data = wget.download('https://cocl.us/Geospatial_data')
```


```python
# Read the file saved by wget.download in the previous cell.
GeospatialDF = pd.read_csv(Geospatial_data)
GeospatialDF.head()
```

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


## 10. Combining both DataFrames

```python
# Inner join: keep only postal codes present in both tables.
Final_data = pd.merge(torontoDF, GeospatialDF, left_on='Postal code', right_on='Postal Code', how='inner')
Final_data.head()
```

|   | Postal code | Borough | Neighborhood | Postal Code | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | M3A | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | M4A | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | M5A | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | M6A | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | M7A | 43.662301 | -79.389494 |
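
`how='inner'` silently drops any postal code that has no match in the coordinates file. One way to see which rows would be dropped is a left merge with `indicator=True`. Passing `validate='one_to_one'` also makes pandas raise a `MergeError` if either key column contains duplicates. The frames below are hypothetical miniatures that reuse the notebook's column names. `M0X` is a made-up code with no coordinates.

```python
import pandas as pd

# Hypothetical miniature inputs that reuse the notebook's column names.
toronto = pd.DataFrame({
    "Postal code": ["M3A", "M4A", "M0X"],  # M0X is made up and has no coordinates
    "Borough": ["North York", "North York", "Hypothetical"],
})
geo = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every Toronto row; indicator=True adds a "_merge" column.
# validate="one_to_one" raises pandas.errors.MergeError if either key repeats.
check = pd.merge(toronto, geo, left_on="Postal code", right_on="Postal Code",
                 how="left", indicator=True, validate="one_to_one")

# Rows found only on the left side are the ones how="inner" would drop.
missing = check.loc[check["_merge"] == "left_only", "Postal code"].tolist()
print(missing)  # ['M0X']
```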


## 11. Dropping the duplicated Postal Code column

```python
# Both key columns survive a left_on/right_on merge; keep only 'Postal code'.
Final_data = Final_data.drop(columns=['Postal Code'])
Final_data.head()
```

|   | Postal code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
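
The duplicate column exists because a `left_on`/`right_on` merge keeps both key columns. An alternative is to rename `Postal Code` to `Postal code` in the coordinates frame first. A merge with `on=` then leaves a single key column, and no drop step is needed. The following is a sketch of that approach with two-row frames, using values from the output above.

```python
import pandas as pd

toronto = pd.DataFrame({"Postal code": ["M3A", "M4A"],
                        "Neighborhood": ["Parkwoods", "Victoria Village"]})
geo = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                    "Latitude": [43.753259, 43.725882],
                    "Longitude": [-79.329656, -79.315572]})

# Align the key name first, so the merge produces one shared key column.
merged = toronto.merge(geo.rename(columns={"Postal Code": "Postal code"}),
                       on="Postal code", how="inner")
print(merged.columns.tolist())
# ['Postal code', 'Neighborhood', 'Latitude', 'Longitude']
```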


```python
Final_data.shape
```

```
(103, 5)
```