## Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

Instructions: Use a Jupyter Notebook (linked to your Github repository) to perform the following steps. Explore, segment, and cluster the neighborhoods in the city of Toronto. Scrape the Wikipedia page which has all the information we need to explore and cluster the neighborhoods in Toronto. Wrangle scraped data, clean it, and read it into a pandas dataframe. Use the structured data to explore and cluster the neighborhoods in the city of Toronto. 
(paraphrased from the week three instructions)

## This is my notebook, used to meet the requirements outlined in the instructions (above).

## ========= Week Three Assignment Below ===============

### Scraping the data...

In [46]:
from bs4 import BeautifulSoup
import requests

In [47]:
wiki_url= 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
source= requests.get(wiki_url).text
soup= BeautifulSoup(source, 'html.parser')  # the page is HTML, so parse it with an HTML parser
table= soup.find('table')

### Importing Pandas and NumPy...

In [48]:
import pandas as pd
import numpy as np

In [49]:
column_names= ['PostalCode', 'Borough', 'Neighborhood']
df= pd.DataFrame(columns=column_names)

#### Locating each borough, neighborhood, and postal codes

In [50]:
for tr_cell in table.find_all('tr'):
    row_data= []
    for td_cell in tr_cell.find_all('td'):
        row_data.append(td_cell.text.strip())
    if len(row_data)== 3:
        df.loc[len(df)]= row_data
        
df.head()

|   | PostalCode | Borough | Neighborhood |
|---|------------|---------|--------------|
| 0 | M1A | Not assigned | |
| 1 | M2A | Not assigned | |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park / Harbourfront |
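
The loop above grows the dataframe one row at a time with `df.loc[len(df)]`, which works for a table this small but tends to get slow as tables grow. A minimal sketch of an alternative, assuming the cell text has already been collected into plain Python lists: filter the complete rows first, then build the dataframe in one call. The `scraped_rows` values below are hypothetical stand-ins for the scraped cells, and `rows_df` is an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for the text pulled out of each <tr>; not real scraped output.
scraped_rows = [
    [],                                       # header row: only <th> cells, so no <td> text
    ["M1A", "Not assigned", "Not assigned"],
    ["M3A", "North York", "Parkwoods"],
    ["M4A", "North York", "Victoria Village"],
    ["M5A", "Downtown Toronto"],              # incomplete row, dropped by the length check
]

column_names = ["PostalCode", "Borough", "Neighborhood"]

# Keep only rows with one value per column, then build the frame once.
complete_rows = [row for row in scraped_rows if len(row) == len(column_names)]
rows_df = pd.DataFrame(complete_rows, columns=column_names)

print(rows_df)
```

The length check plays the same role as `len(row_data)== 3` in the notebook cell: header rows and malformed rows never reach the dataframe.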


### We have gathered our data. Now, we have to clean the data...

In [51]:
df= df[df['Borough'] != 'Not assigned']
df= df.groupby(['PostalCode','Borough'], sort=False).agg(', '.join)
df.reset_index(inplace=True)
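
To see what the filter and `groupby` steps do, here is a small sketch on a hand-made dataframe. The rows in `toy` are hypothetical and only mimic the shape of the scraped table; the point is that rows sharing a postal code and borough collapse into one row with a comma-separated neighborhood list.

```python
import pandas as pd

# Hypothetical rows in the same shape as the scraped table.
toy = pd.DataFrame({
    "PostalCode": ["M1A", "M5A", "M5A", "M9A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Etobicoke"],
    "Neighborhood": ["Not assigned", "Regent Park", "Harbourfront", "Not assigned"],
})

# Step 1: drop postal codes that have no borough.
toy = toy[toy["Borough"] != "Not assigned"]

# Step 2: merge neighborhoods that share a postal code and borough.
toy = (
    toy.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)

print(toy)
```

Two rows remain: the two `M5A` rows are joined into one, and the `M9A` row keeps its `Not assigned` neighborhood because only the borough is filtered at this stage.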

### Next, we replace each 'Not assigned' neighborhood with the name of its borough

In [52]:
df['Neighborhood']= np.where(df['Neighborhood']== 'Not assigned',df['Borough'], df['Neighborhood'])
df.head()

|   | PostalCode | Borough | Neighborhood |
|---|------------|---------|--------------|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park / Harbourfront |
| 3 | M6A | North York | Lawrence Manor / Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park / Ontario Provincial Government |
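
The `np.where` replacement can be checked on a tiny example. This is only a sketch: the rows in `toy` are hypothetical, and `Series.mask` is shown as an equivalent pandas-only spelling rather than as the method used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one neighborhood is known, one is 'Not assigned'.
toy = pd.DataFrame({
    "PostalCode": ["M5A", "M7A"],
    "Borough": ["Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Regent Park / Harbourfront", "Not assigned"],
})

unassigned = toy["Neighborhood"] == "Not assigned"

# Same pattern as the notebook cell: take the borough where the condition holds.
filled_where = np.where(unassigned, toy["Borough"], toy["Neighborhood"])

# Equivalent pandas spelling: mask() replaces values where the condition is True.
filled_mask = toy["Neighborhood"].mask(unassigned, toy["Borough"])

toy["Neighborhood"] = filled_where
print(toy)
```

Both spellings give the same values here; the `M7A` row ends up with its borough name as the neighborhood.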


In [53]:
df.shape

(103, 3)

In [54]:
# Now, we pull in the map data: latitude and longitude for each postal code

lat_lon_df= pd.read_csv('https://cocl.us/Geospatial_data')
lat_lon_df.head()

|   | Postal Code | Latitude | Longitude |
|---|-------------|----------|-----------|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [55]:
# Now, we merge the two dataframes on PostalCode

lat_lon_df.rename(columns= {'Postal Code':'PostalCode'},inplace= True)
df= pd.merge(df,lat_lon_df, on= 'PostalCode')
df.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|------------|---------|--------------|----------|-----------|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park / Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor / Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park / Ontario Provincial Government | 43.662301 | -79.389494 |
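
`pd.merge` defaults to an inner join, so a postal code with no coordinates would silently disappear from the result. A minimal sketch of one way to check for that, assuming a left join with `indicator=True`: the frames `codes` and `coords` are small stand-ins, and `M0Z` is a made-up postal code added only to trigger the unmatched case.

```python
import pandas as pd

# Stand-in for the cleaned neighborhood table; 'M0Z' is a made-up code.
codes = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M0Z"],
    "Borough": ["North York", "North York", "Hypothetical Borough"],
})

# Stand-in for the coordinates table (values copied from the merged output above).
coords = pd.DataFrame({
    "PostalCode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every postal code; '_merge' records where each row matched.
# validate='one_to_one' raises if a postal code is duplicated on either side.
checked = pd.merge(
    codes, coords, on="PostalCode", how="left",
    validate="one_to_one", indicator=True,
)

missing = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print("Postal codes without coordinates:", missing)
```

If `missing` comes back empty on the real data, the inner join in the notebook loses nothing.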


### Analyzing each cluster...

In [57]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim
!conda install -c conda-forge folium=0.5.0 --yes
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

In [65]:
map_toronto= folium.Map(location= [43.651070,-79.347015],zoom_start= 10)

for lat,lng,borough,neighborhood in zip(df['Latitude'],df['Longitude'],df['Borough'],df['Neighborhood']):
    label= '{}, {}'.format(neighborhood, borough)
    label= folium.Popup(label, parse_html= True)
    folium.CircleMarker(
        [lat,lng],
        radius= 5,
        popup= label,
        color= 'blue',
        fill= True,
        fill_color= '#000d99',
        fill_opacity= 0.8,
        parse_html= False).add_to(map_toronto)
map_toronto

## IMPORTANT NOTE: You must open the shareable link in Firefox, as Internet Explorer and Edge fail to show the above map.

In [59]:
k= 5
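# Cluster on latitude and longitude only; pass axis by keyword, since newer pandas rejects the positional form
toronto_clustering= df.drop(['PostalCode','Borough','Neighborhood'], axis= 1)
kmeans= KMeans(n_clusters= k,random_state= 0).fit(toronto_clustering)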
df.insert(0, 'Cluster Labels', kmeans.labels_)
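
The notebook fixes `k= 5` without showing how that value compares with others. One common way to sanity-check it is the elbow method: fit K-means for several values of `k` and look at how the inertia (within-cluster sum of squared distances) falls. The sketch below is an illustration, not part of the assignment: it uses only the ten coordinate pairs that appear in the outputs above, and the range of `k` values and `n_init=10` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# The ten (latitude, longitude) pairs shown in the outputs above.
coords = np.array([
    [43.806686, -79.194353],
    [43.784535, -79.160497],
    [43.763573, -79.188711],
    [43.770992, -79.216917],
    [43.773136, -79.239476],
    [43.753259, -79.329656],
    [43.725882, -79.315572],
    [43.654260, -79.360636],
    [43.718518, -79.464763],
    [43.662301, -79.389494],
])

candidate_ks = range(1, 6)  # assumed range, chosen for illustration
inertias = []
for n_clusters in candidate_ks:
    model = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit(coords)
    inertias.append(model.inertia_)

# Look for the k after which the inertia stops dropping sharply.
for n_clusters, inertia in zip(candidate_ks, inertias):
    print(n_clusters, inertia)
```

On the full 103-row dataframe, the same loop would run on `df[['Latitude','Longitude']]` instead of `coords`.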

### Creating our map...

In [60]:
map_clusters= folium.Map(location=[43.651070,-79.347015],zoom_start= 10)

### Clusters are color-coded to identify the different groupings...

In [61]:
x= np.arange(k)
ys= [i + x + (i*x)**2 for i in range(k)]
colors_array= cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow= [colors.rgb2hex(i) for i in colors_array]
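
In the cell above, `ys` is only used for its length, which is `k`. A shorter sketch that should give one hex color per cluster, assuming the same `rainbow` colormap (the name `palette` is illustrative):

```python
import numpy as np
from matplotlib import cm, colors

k = 5  # number of clusters, as in the notebook

# Sample k evenly spaced points from the rainbow colormap and convert each RGBA row to hex.
palette = [colors.to_hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, k))]

print(palette)
```

Because the list has exactly `k` entries, a cluster label in `0..k-1` can index it directly.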

### Place markers on the map...

In [66]:
for lat, lon, neighborhood, cluster in zip(df['Latitude'], df['Longitude'], df['Neighborhood'], df['Cluster Labels']):
    label= folium.Popup('Cluster ' + str(cluster), parse_html= True)
    folium.CircleMarker(
        [lat, lon],
        radius= 5,
        popup= label,
        # labels run from 0 to k-1, so they index the color list directly
        color= rainbow[cluster],
        fill= True,
        fill_color= rainbow[cluster],
        fill_opacity= 0.7).add_to(map_clusters)
map_clusters

## IMPORTANT NOTE: You must open the shareable link in Firefox, as Internet Explorer and Edge fail to show the above map.
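
Beyond the map, the clusters can also be summarized in a table. The sketch below shows one possible summary: the number of neighborhoods and the mean coordinates per cluster. The coordinates are the five rows from the merged output earlier in the notebook, but the `Cluster Labels` values are hypothetical, since the real labels are not printed anywhere above.

```python
import pandas as pd

labelled = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 2, 1],  # hypothetical labels, for illustration only
    "Neighborhood": [
        "Parkwoods",
        "Victoria Village",
        "Regent Park / Harbourfront",
        "Lawrence Manor / Lawrence Heights",
        "Queen's Park / Ontario Provincial Government",
    ],
    "Latitude": [43.753259, 43.725882, 43.654260, 43.718518, 43.662301],
    "Longitude": [-79.329656, -79.315572, -79.360636, -79.464763, -79.389494],
})

# One row per cluster: size and geographic center.
summary = labelled.groupby("Cluster Labels").agg(
    neighborhoods=("Neighborhood", "count"),
    mean_lat=("Latitude", "mean"),
    mean_lon=("Longitude", "mean"),
)

print(summary)
```

Applied to the notebook's `df`, the same `groupby` on `'Cluster Labels'` would give the size and center of each of the five clusters.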