# Project: Part I
- Steps mentioned below
For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

Start by creating a new Notebook for this assignment.

Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data in the table of postal codes and to transform it into a pandas dataframe like the one shown below:



     

In [2]:
# !pip install beautifulsoup4
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [3]:
page = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(page.content, 'html.parser')

In [4]:
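# soup  (uncomment to inspect the parsed HTML)
table = soup.find('tbody')  # first table body on the page, which holds the postal codes
rows = table.select('tr')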


In [5]:
# Collect the raw text of every table row
row = [r.get_text() for r in rows]
row

['\nPostal Code\n\nBorough\n\nNeighbourhood\n',
 '\nM1A\n\nNot assigned\n\nNot assigned\n',
 '\nM2A\n\nNot assigned\n\nNot assigned\n',
 '\nM3A\n\nNorth York\n\nParkwoods\n',
 '\nM4A\n\nNorth York\n\nVictoria Village\n',
 '\nM5A\n\nDowntown Toronto\n\nRegent Park, Harbourfront\n',
 '\nM6A\n\nNorth York\n\nLawrence Manor, Lawrence Heights\n',
 "\nM7A\n\nDowntown Toronto\n\nQueen's Park, Ontario Provincial Government\n",
 '\nM8A\n\nNot assigned\n\nNot assigned\n',
 '\nM9A\n\nEtobicoke\n\nIslington Avenue, Humber Valley Village\n',
 '\nM1B\n\nScarborough\n\nMalvern, Rouge\n',
 '\nM2B\n\nNot assigned\n\nNot assigned\n',
 '\nM3B\n\nNorth York\n\nDon Mills\n',
 '\nM4B\n\nEast York\n\nParkview Hill, Woodbine Gardens\n',
 '\nM5B\n\nDowntown Toronto\n\nGarden District, Ryerson\n',
 '\nM6B\n\nNorth York\n\nGlencairn\n',
 '\nM7B\n\nNot assigned\n\nNot assigned\n',
 '\nM8B\n\nNot assigned\n\nNot assigned\n',
 '\nM9B\n\nEtobicoke\n\nWest Deane Park, Princess Gardens, Martin Grove, Islington, Clover

In [6]:
df= pd.DataFrame(row)

### Create and clean the table step by step


In [7]:
# Split each row string on newlines; the empty pieces become extra, unnamed columns
df1 = df[0].str.split('\n', expand=True)

In [8]:
# Promote the first row to column names, then drop it from the data
df2 = df1.rename(columns=df1.iloc[0])
df3 = df2.drop(df2.index[0])
df3.head()
df3.shape

(180, 7)
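The seven columns reported above include four empty ones, because splitting on every newline also keeps the empty strings between cells. As a minimal sketch of one way to avoid them, the snippet below drops the empty pieces before building the table. It runs on a few row strings copied from the output above; `sample_rows` and `split_row` are illustrative names, not part of the original notebook.

```python
# A few scraped row strings, copied from the output shown earlier.
sample_rows = [
    '\nPostal Code\n\nBorough\n\nNeighbourhood\n',
    '\nM1A\n\nNot assigned\n\nNot assigned\n',
    '\nM3A\n\nNorth York\n\nParkwoods\n',
    '\nM5A\n\nDowntown Toronto\n\nRegent Park, Harbourfront\n',
]


def split_row(text):
    """Split one row string on newlines and drop the empty pieces."""
    return [cell.strip() for cell in text.split('\n') if cell.strip()]


# The first parsed row is the header; the remaining rows are the records.
header, *records = [split_row(text) for text in sample_rows]

print(header)
for record in records:
    print(record)
```

Passing the result to `pd.DataFrame(records, columns=header)` should then give a frame with only the three named columns.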

### Remove rows whose borough is 'Not assigned'

In [9]:
df4 = df3[df3.Borough != 'Not assigned']
df4.shape

(103, 7)

### Combine rows that share a postal code

In [10]:
# Join the remaining columns of rows that share a postal code and borough
df5 = df4.groupby(['Postal Code', 'Borough'], sort=False).agg(','.join)
df5.reset_index(inplace=True)
df5

Unnamed: 0,Postal Code,Borough,Unnamed: 3,Unnamed: 4,Unnamed: 5,Neighbourhood,Unnamed: 7
0,M3A,North York,,,,Parkwoods,
1,M4A,North York,,,,Victoria Village,
2,M5A,Downtown Toronto,,,,"Regent Park, Harbourfront",
3,M6A,North York,,,,"Lawrence Manor, Lawrence Heights",
4,M7A,Downtown Toronto,,,,"Queen's Park, Ontario Provincial Government",
...,...,...,...,...,...,...,...
98,M8X,Etobicoke,,,,"The Kingsway, Montgomery Road, Old Mill North",
99,M4Y,Downtown Toronto,,,,Church and Wellesley,
100,M7Y,East Toronto,,,,"Business reply mail Processing Centre, South C...",
101,M8Y,Etobicoke,,,,"Old Mill South, King's Mill Park, Sunnylea, Hu...",


In [11]:
df5.shape

(103, 7)
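The row count is the same before and after grouping, so the scraped table appears to have no repeated postal codes. To show what the grouping step does when duplicates are present, here is a small sketch on hypothetical toy data in which M5A is deliberately listed twice; the frame `toy` is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: the postal code M5A appears on two rows.
toy = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# One row per (postal code, borough); neighbourhood names are joined with commas.
# sort=False keeps the groups in order of first appearance.
combined = (
    toy.groupby(['Postal Code', 'Borough'], sort=False)['Neighbourhood']
       .agg(', '.join)
       .reset_index()
)

print(combined)
```

Selecting the `Neighbourhood` column before aggregating keeps the join from touching any other columns.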

In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181 entries, 0 to 180
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       181 non-null    object
dtypes: object(1)
memory usage: 1.5+ KB


# Part One Ended

# Answer 2 Starts Here


Now that you have built a dataframe with the postal code, borough name, and neighborhood name of each neighborhood, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data.

In an older version of this course, we used the Google Maps Geocoding API to get the latitude and longitude coordinates of each neighborhood. However, Google recently started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code. A call for a given postal code can return None, and the same call made again can return the coordinates. So, to make sure that you get the coordinates for all of the neighborhoods, you can run a while loop for each postal code. Taking postal code M5G as an example, the loop would repeat the call until it returns coordinates.
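The retry idea described above can be sketched without any network access. In the snippet below, `geocode_with_retry` and `flaky_lookup` are hypothetical names: the first retries a lookup function a bounded number of times, and the second is a stand-in that fails twice before returning the M5G coordinates listed later in this notebook. A bounded loop is used here instead of an open-ended `while` so that a postal code that never resolves cannot hang the notebook.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords:  # treats both None and an empty list as "no result"
            return coords
        time.sleep(delay)  # optional pause between attempts
    return None


# Hypothetical stand-in for a geocoding call: fails twice, then succeeds.
calls = {'count': 0}


def flaky_lookup(query):
    calls['count'] += 1
    if calls['count'] < 3:
        return None
    return [43.657952, -79.387383]  # M5G coordinates from this notebook's data


latlng = geocode_with_retry(flaky_lookup, 'M5G, Toronto, Ontario')
print(latlng, 'after', calls['count'], 'calls')
```

With the Geocoder package installed, the lookup could plausibly be something like `lambda q: geocoder.arcgis(q).latlng`; which provider to use is an assumption here, not something the assignment text fixes.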

In [13]:
# CSV file of postal codes with their latitude and longitude
url = "http://cocl.us/Geospatial_data"
df6 = pd.read_csv(url)
df6.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [14]:
# Attach the coordinates to the cleaned table by postal code
df6 = pd.merge(df5, df6, on='Postal Code')

In [15]:
df6

Unnamed: 0,Postal Code,Borough,Unnamed: 3,Unnamed: 4,Unnamed: 5,Neighbourhood,Unnamed: 7,Latitude,Longitude
0,M3A,North York,,,,Parkwoods,,43.753259,-79.329656
1,M4A,North York,,,,Victoria Village,,43.725882,-79.315572
2,M5A,Downtown Toronto,,,,"Regent Park, Harbourfront",,43.654260,-79.360636
3,M6A,North York,,,,"Lawrence Manor, Lawrence Heights",,43.718518,-79.464763
4,M7A,Downtown Toronto,,,,"Queen's Park, Ontario Provincial Government",,43.662301,-79.389494
...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,,,,"The Kingsway, Montgomery Road, Old Mill North",,43.653654,-79.506944
99,M4Y,Downtown Toronto,,,,Church and Wellesley,,43.665860,-79.383160
100,M7Y,East Toronto,,,,"Business reply mail Processing Centre, South C...",,43.662744,-79.321558
101,M8Y,Etobicoke,,,,"Old Mill South, King's Mill Park, Sunnylea, Hu...",,43.636258,-79.498509
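`pd.merge` performs an inner join by default, so any postal code missing from the coordinates file would be dropped without warning. As a hedged sketch of how that could be checked, the snippet below uses a left join with `indicator=True` on toy frames. The `M0Z` row is a made-up postal code added only to trigger the unmatched case; the two coordinate pairs are copied from the output above.

```python
import pandas as pd

# Toy left table; 'M0Z' is a hypothetical code with no coordinates.
left = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M0Z'],
    'Borough': ['North York', 'North York', 'Hypothetical'],
})

# Toy coordinates table with values copied from the merged output above.
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every row of the left table; indicator=True adds a
# '_merge' column saying where each row was found; validate raises an
# error if a postal code is repeated on either side.
checked = pd.merge(left, coords, on='Postal Code', how='left',
                   indicator=True, validate='one_to_one')

missing = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(missing)
```

An empty `missing` list on the real data would indicate that every postal code received coordinates.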


## Week 2 Step II Ended

# Part 3 Starts Here: Answer III

Explore and cluster the neighborhoods in Toronto. You can decide to work with only the boroughs that contain the word Toronto and then replicate the analysis we did on the New York City data. It is up to you.

Just make sure:

- to add enough Markdown cells to explain what you decided to do and to report any observations you make.
- to generate maps to visualize your neighborhoods and how they cluster together.

Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. (3 marks)
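As a rough illustration of the clustering step, the sketch below runs scikit-learn's `KMeans` on six coordinate pairs copied from the Toronto borough rows of this notebook. Clustering on latitude and longitude alone is a simplifying assumption made here and may differ from the features used in the New York City analysis; the choice of three clusters is likewise arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

# (latitude, longitude) pairs copied from this notebook's Toronto rows.
postal_codes = ['M5A', 'M5B', 'M5C', 'M4E', 'M6G', 'M6H']
coords = np.array([
    [43.654260, -79.360636],
    [43.657162, -79.378937],
    [43.651494, -79.375418],
    [43.676357, -79.293031],
    [43.669542, -79.422564],
    [43.669005, -79.442259],
])

# n_clusters=3 is an arbitrary choice for this sketch;
# random_state fixes the initialisation so the run is repeatable.
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(coords)
labels = kmeans.labels_

for code, label in zip(postal_codes, labels):
    print(code, '-> cluster', label)
```

The resulting labels could then be added as a column to the dataframe and used to colour the markers on a map.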

In [17]:
df6


Unnamed: 0,Postal Code,Borough,Unnamed: 3,Unnamed: 4,Unnamed: 5,Neighbourhood,Unnamed: 7,Latitude,Longitude
0,M3A,North York,,,,Parkwoods,,43.753259,-79.329656
1,M4A,North York,,,,Victoria Village,,43.725882,-79.315572
2,M5A,Downtown Toronto,,,,"Regent Park, Harbourfront",,43.654260,-79.360636
3,M6A,North York,,,,"Lawrence Manor, Lawrence Heights",,43.718518,-79.464763
4,M7A,Downtown Toronto,,,,"Queen's Park, Ontario Provincial Government",,43.662301,-79.389494
...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,,,,"The Kingsway, Montgomery Road, Old Mill North",,43.653654,-79.506944
99,M4Y,Downtown Toronto,,,,Church and Wellesley,,43.665860,-79.383160
100,M7Y,East Toronto,,,,"Business reply mail Processing Centre, South C...",,43.662744,-79.321558
101,M8Y,Etobicoke,,,,"Old Mill South, King's Mill Park, Sunnylea, Hu...",,43.636258,-79.498509


In [20]:
tor = df6[df6['Borough'].str.contains("Toronto")]
tor

Unnamed: 0,Postal Code,Borough,Unnamed: 3,Unnamed: 4,Unnamed: 5,Neighbourhood,Unnamed: 7,Latitude,Longitude
2,M5A,Downtown Toronto,,,,"Regent Park, Harbourfront",,43.65426,-79.360636
4,M7A,Downtown Toronto,,,,"Queen's Park, Ontario Provincial Government",,43.662301,-79.389494
9,M5B,Downtown Toronto,,,,"Garden District, Ryerson",,43.657162,-79.378937
15,M5C,Downtown Toronto,,,,St. James Town,,43.651494,-79.375418
19,M4E,East Toronto,,,,The Beaches,,43.676357,-79.293031
20,M5E,Downtown Toronto,,,,Berczy Park,,43.644771,-79.373306
24,M5G,Downtown Toronto,,,,Central Bay Street,,43.657952,-79.387383
25,M6G,Downtown Toronto,,,,Christie,,43.669542,-79.422564
30,M5H,Downtown Toronto,,,,"Richmond, Adelaide, King",,43.650571,-79.384568
31,M6H,West Toronto,,,,"Dufferin, Dovercourt Village",,43.669005,-79.442259


In [28]:
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
from geopy.geocoders import Nominatim


In [29]:

address = 'Toronto'
geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [40]:
Tor_map = folium.Map(location=[latitude, longitude], zoom_start=11)  # zoom level that shows the city as a whole


In [45]:
for lat, lng, borough, neighborhood in zip(tor['Latitude'], tor['Longitude'], 
                                           tor['Borough'], tor['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='red',
        fill_opacity=0.8,
        parse_html=False).add_to(Tor_map)  
Tor_map  