<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto City</font></h1>

## Introduction

In this lab, you will learn how to scrape Toronto neighborhood data from the web and convert addresses into their equivalent latitude and longitude values. You will also use the Foursquare API to explore neighborhoods in Toronto, and the Folium library to visualize the neighborhoods in Toronto and their emerging clusters.

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [None]:
import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen, Request

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# transform a JSON file into a pandas dataframe
# (pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize)
from pandas import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

import csv

print("Libraries imported.")


The Toronto neighborhood data has a total of 11 boroughs. In order to segment the neighborhoods and explore them, we need a dataset that contains the 11 boroughs and the neighborhoods that exist in each borough, as well as the latitude and longitude coordinates of each neighborhood.

Before we start working with the dataset, we need to do some data mining, since Toronto data is not readily available. Luckily, the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M can be scraped with the Beautiful Soup Python package to extract the dataset.

#### The function below extracts the data with Beautiful Soup and saves it to a local CSV file.

In [None]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

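def getData(url):
    """Scrape the first 'wikitable' on the page and save its rows to output.csv."""
    html = urlopen(url)
    bsObj = BeautifulSoup(html, 'html.parser')
    tables = bsObj.find('table', {'class':'wikitable'})
    table = tables.find('tbody')

    output_rows = []
    for table_row in table.findAll('tr'):
        # a header row made of <th> cells has no <td> cells, so it produces an empty row here
        columns = table_row.findAll('td')
        output_row = []
        for column in columns:
            output_row.append((column.text).rstrip())
        output_rows.append(output_row)

    # newline='' stops the csv module from writing extra blank lines on Windows
    with open('output.csv', 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(output_rows)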

getData(url)
print('Data downloaded!')
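
To see what the row loop produces without hitting the network, the same `find`/`find_all` pattern can be run on an inline table. This is only a minimal sketch: `sample_html` and `table_rows` are made-up names, and the snippet is a hand-written stand-in, not the live Wikipedia markup.

```python
from bs4 import BeautifulSoup

# Hypothetical three-row table that mimics the shape of the postal code table.
sample_html = """
<table class="wikitable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
  </td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
  </td></tr>
</table>
"""

def table_rows(html):
    """Return the text of every <td> cell, one list per table row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable"})
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text().strip() for td in tr.find_all("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    return rows

rows = table_rows(sample_html)
print(rows)
```

Skipping rows without `<td>` cells keeps the header out of the result, so no blank line would end up in a CSV written from `rows`.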

## PART 1

Let us load the data and name the columns __Postcode__, __Borough__, and __Neighborhood__.

In [None]:
# output.csv has no header line, so supply the column names directly
# instead of treating the first data row as a header
df = pd.read_csv("output.csv", header=None, names=["Postcode", "Borough", "Neighborhood"])
df.head()

- Keep only the cells that have an assigned borough; ignore cells whose borough is __Not assigned__.
- Next, find postal code areas that contain more than one neighborhood and combine their rows into a single row, with the neighborhoods separated by commas, as shown in the output below.

In [None]:
# keep only the cells that have an assigned borough; drop rows whose borough is 'Not assigned'
df.drop(df[df['Borough'] == 'Not assigned'].index, inplace=True)


# combine postal code areas that contain more than one neighborhood into a single row,
# with the neighborhoods separated by commas

df = df.groupby(['Postcode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
df.head()
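
The effect of the filter and the `groupby` join is easier to see on a few rows. The sketch below uses a small hand-made frame; `toy`, `assigned`, and `combined` are illustrative names and the rows are sample values, not the scraped data.

```python
import pandas as pd

# Hand-made sample rows: one unassigned postcode and one postcode listed twice.
toy = pd.DataFrame({
    "Postcode": ["M1A", "M5A", "M5A", "M6A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Lawrence Heights"],
})

# Step 1: ignore rows without an assigned borough.
assigned = toy[toy["Borough"] != "Not assigned"]

# Step 2: one row per (Postcode, Borough), neighborhoods joined with ", ".
combined = (
    assigned.groupby(["Postcode", "Borough"])["Neighborhood"]
    .apply(", ".join)
    .reset_index()
)
print(combined)
```

The two `M5A` rows collapse into a single row whose `Neighborhood` value is `Harbourfront, Regent Park`.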

Next, replace the neighborhood with the borough wherever the neighborhood is "Not assigned".

In [None]:
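# copy the borough name into the neighborhood only for the 'Not assigned' rows
not_assigned = df['Neighborhood'] == 'Not assigned'
df.loc[not_assigned, 'Neighborhood'] = df.loc[not_assigned, 'Borough']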
print('Shape of data frame', df.shape)
df.head()
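
As a small illustration of this replacement, `Series.mask` can express the same rule in one step. The frame below is a made-up two-row sample, and `toy` and `missing` are illustrative names.

```python
import pandas as pd

# Made-up sample: the first row has a borough but no neighborhood name.
toy = pd.DataFrame({
    "Borough": ["Queen's Park", "North York"],
    "Neighborhood": ["Not assigned", "Parkwoods"],
})

# Where the condition is True, take the value from Borough instead.
missing = toy["Neighborhood"] == "Not assigned"
toy["Neighborhood"] = toy["Neighborhood"].mask(missing, toy["Borough"])
print(toy)
```

Rows that already have a neighborhood name are left untouched.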

### Let us check the total number of records with the __.shape__ attribute

In [None]:
df.shape

## PART 2

#### Download the CSV file that has the geographical coordinates from the given URL http://cocl.us/Geospatial_data 

In [None]:
!wget -q -O 'toronto_data.csv' http://cocl.us/Geospatial_data
print('Data downloaded!')

Load the latitude and longitude data into a pandas dataframe

In [None]:
# load the latitude and longitude data into a pandas dataframe

geos = pd.read_csv("toronto_data.csv", delimiter=",")
geos.head()

Merge the two data frames on the postal code

In [None]:
# merge the two data frames on the postal code

geos = geos.rename(columns={'Postal Code':'Postcode'})

neighborhoods_geos = pd.merge(df, geos, on='Postcode')
print("Shape of the data frame", neighborhoods_geos.shape)
neighborhoods_geos.head()
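
An inner merge silently drops postal codes that appear in only one of the two frames. One way to check for that, sketched here on made-up data, is a left merge with `indicator=True`. The frames `left` and `right`, the postcode `M9Z`, and the coordinate values are all invented for illustration.

```python
import pandas as pd

# Invented sample frames; M9Z has no matching coordinates on purpose.
left = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Etobicoke"],
})
right = pd.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Latitude": [43.81, 43.78],
    "Longitude": [-79.19, -79.16],
})

# Default inner merge: only postcodes present in both frames survive.
inner = pd.merge(left, right, on="Postcode")

# Left merge with an indicator column to list the postcodes that found no match.
checked = pd.merge(left, right, on="Postcode", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()

print(inner.shape)
print(unmatched)
```

If `unmatched` comes back empty on the real data, the inner merge has not lost any rows.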

## PART 3

In [None]:
# find the latitude and longitude of Toronto for mapping purposes

address = 'Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))
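
`geolocator.geocode` returns `None` when no match is found, in which case reading `.latitude` raises an `AttributeError`. A small guard, sketched below, avoids that. `coordinates_or_default`, `fake_geocode`, and the `Location` tuple are hypothetical stand-ins so the example runs offline, and the coordinates are approximate.

```python
from collections import namedtuple

# Hypothetical stand-in for the object a geocoder returns.
Location = namedtuple("Location", ["latitude", "longitude"])

def coordinates_or_default(geocode, address, default):
    """Return (latitude, longitude) for address, or default if nothing is found."""
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)

def fake_geocode(address):
    """Offline substitute for a real geocoder; knows a single address."""
    known = {"Toronto, Ontario, Canada": Location(43.65, -79.38)}  # approximate values
    return known.get(address)

found = coordinates_or_default(fake_geocode, "Toronto, Ontario, Canada", (0.0, 0.0))
missing = coordinates_or_default(fake_geocode, "Nowhere at all", (0.0, 0.0))
print(found, missing)
```

With the real geocoder, the first argument would be `geolocator.geocode`.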

#### Create a map of Toronto with neighborhoods superimposed on top.

In [None]:
# create map of Toronto using latitude and longitude values
map_tornoto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods_geos['Latitude'], neighborhoods_geos['Longitude'], neighborhoods_geos['Borough'], neighborhoods_geos['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tornoto)  
    
map_tornoto

In [None]:
toronto_Borough = neighborhoods_geos[neighborhoods_geos['Borough'].str.contains('Toronto',regex=False)].reset_index(drop=True)
toronto_Borough.head()
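
With `regex=False`, `str.contains` does a plain substring match, so the filter keeps every borough whose name includes the text "Toronto". A minimal sketch on sample values follows; the `boroughs` series is made up, and `na=False` is an addition that the cell above does not use.

```python
import pandas as pd

# Sample borough names, including one missing value.
boroughs = pd.Series(["Downtown Toronto", "North York", "East Toronto", None])

# Literal substring match; na=False treats missing names as non-matches.
mask = boroughs.str.contains("Toronto", regex=False, na=False)
print(boroughs[mask].tolist())
```

Without `na=False`, a missing borough name would leave a missing value in the mask, which cannot be used for boolean indexing.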

#### Create a map of Toronto where Borough contains "Toronto" with neighborhoods superimposed on top.

In [None]:
# create map of Toronto using latitude and longitude values
map_tornoto_borough = folium.Map(location=[latitude, longitude], zoom_start=10)


# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_Borough['Latitude'], toronto_Borough['Longitude'], toronto_Borough['Borough'], toronto_Borough['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tornoto_borough)  
    
map_tornoto_borough

### Cluster Neighborhoods
Run *k*-means to cluster the neighborhoods into 5 clusters based on their latitude and longitude.

In [None]:
kclusters = 5
# keep only the Latitude and Longitude columns as clustering features
toronto_clustering = toronto_Borough.drop(['Postcode','Borough','Neighborhood'], axis=1)
kmeans = KMeans(n_clusters = kclusters,random_state=0).fit(toronto_clustering)
kmeans.labels_
toronto_Borough.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_Borough.head()
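
To give a feel for what *k*-means does with coordinate pairs, here is a bare-bones Lloyd iteration in NumPy. It is an illustrative sketch only, not scikit-learn's implementation: `lloyd_kmeans` is a made-up helper, and `coords` holds invented points forming two obvious groups.

```python
import numpy as np

def lloyd_kmeans(points, k, n_iter=20, seed=0):
    """Assign points to their nearest center, then move each center to its members' mean."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()

    def nearest(current_centers):
        # Distances from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - current_centers[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = nearest(centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave the center in place if its cluster is empty
                centers[j] = members.mean(axis=0)
    return nearest(centers), centers

# Invented (latitude, longitude) pairs: three points in the west, three in the east.
coords = np.array([
    [43.60, -79.50], [43.61, -79.51], [43.63, -79.48],
    [43.80, -79.20], [43.81, -79.21], [43.79, -79.19],
])
labels, centers = lloyd_kmeans(coords, k=2)
print(labels)
```

The first three points end up sharing one label and the last three the other. Which group gets label 0 depends on the random start, which is why the notebook fixes `random_state`.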

#### Finally, let's visualize the resulting clusters

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_Borough['Latitude'], toronto_Borough['Longitude'], toronto_Borough['Borough'], toronto_Borough['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # cluster labels run from 0 to kclusters-1
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters