# Segmenting and Clustering Neighborhoods in Toronto

In this notebook we cluster the neighborhoods of Toronto. We start by importing the required libraries and then scrape the neighborhood data from Wikipedia.

### Importing Libraries

In [15]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)

pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0 exposes this as pd.json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium 0.5.0; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')


### Part 1

In [16]:
from bs4 import BeautifulSoup

Now that the libraries are imported, we fetch the Wikipedia page that lists the Toronto postal codes and store the response in a variable.

In [17]:
r=requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")

In [18]:
soup=BeautifulSoup(r.text,'lxml')

With the HTML page parsed, we locate the table that holds the postal code data and store each of its columns in a separate list.

In [19]:
postal_table=soup.find('table',{'class':'wikitable sortable'})

In [20]:
postal_codes = []
borough = []
neighborhood = []

for row in postal_table.findAll('tr'):
    cells = row.findAll('td') 
    if len(cells) == 3:
        postal_codes.append(cells[0].text.strip())
        borough.append(cells[1].text.strip())
        neighborhood.append(cells[2].text.strip())
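
The row filter can be checked offline on a small hand-written table. The sketch below assumes a made-up HTML snippet (`sample_html`, not the real Wikipedia page) and uses Python's built-in `html.parser` backend so that `lxml` is not required. Header rows use `th` cells, so they contain no `td` cells and are skipped by the `len(cells) == 3` check.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table; not the real page content.
sample_html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A </td><td>Not assigned </td><td>Not assigned </td></tr>
  <tr><td>M3A </td><td>North York </td><td>Parkwoods </td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_table = sample_soup.find("table", {"class": "wikitable sortable"})

rows = []
for row in sample_table.find_all("tr"):
    cells = row.find_all("td")
    # Only data rows have exactly three <td> cells; the header row has none.
    if len(cells) == 3:
        rows.append(tuple(cell.text.strip() for cell in cells))

print(rows)
```

Rows marked "Not assigned" still pass this filter, which is why they are dropped from the dataframe in a later step.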


Next we create the dataframe and clean it in the following steps.

In [21]:
df=pd.DataFrame()
df['Postal Code']=postal_codes
df['Borough']=borough
df['Neighborhood']=neighborhood
df.shape

(180, 3)

In [22]:
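# drop the rows whose borough is "Not assigned"
df.drop(df[df["Borough"]=="Not assigned"].index,axis=0,inplace=True)
# the index still has gaps at this point; it is reset in the next cell
df.head()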


| | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


In [23]:
df.reset_index(inplace=True,drop=True)
df

| | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 6 | M1B | Scarborough | Malvern, Rouge |
| 7 | M3B | North York | Don Mills |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson |
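
The cleaning step can also be written without in-place mutation. A minimal sketch, assuming a small made-up frame (`sample`) in place of the scraped data: a boolean mask keeps the assigned rows and `reset_index(drop=True)` renumbers them in one chained expression.

```python
import pandas as pd

# Made-up rows standing in for the scraped table.
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighborhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only rows with an assigned borough, then renumber the index from 0.
cleaned = sample[sample["Borough"] != "Not assigned"].reset_index(drop=True)

print(cleaned)
```

`reset_index` returns a new dataframe unless `inplace=True` is passed, so its result has to be assigned or chained as above.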


In [24]:
df.columns

Index(['Postal Code', 'Borough', 'Neighborhood'], dtype='object')

### Part 2

### Geospatial Data
Now we download a CSV file of postal code coordinates and load it into a new dataframe.

In [25]:
df2=pd.read_csv("https://cocl.us/Geospatial_data")
df2

| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
| 5 | M1J | 43.744734 | -79.239476 |
| 6 | M1K | 43.727929 | -79.262029 |
| 7 | M1L | 43.711112 | -79.284577 |
| 8 | M1M | 43.716316 | -79.239476 |
| 9 | M1N | 43.692657 | -79.264848 |


Next we merge the two dataframes on the Postal Code column to attach a latitude and longitude to each postal code scraped from Wikipedia.

In [26]:
df_merge=pd.merge(df,df2,on='Postal Code')
df_merge

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
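
`pd.merge` performs an inner join by default, so a postal code that is missing from the coordinates file would be dropped without any warning. The sketch below uses made-up frames (`left` and `coords` are illustrative names, and "M0Z" is an invented code) to show one way of surfacing such rows with `how="left"` and `indicator=True`.

```python
import pandas as pd

left = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M0Z"],  # "M0Z" is a made-up code with no coordinates
    "Borough": ["North York", "North York", "Example"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every row of `left`; the _merge column records the match status.
checked = pd.merge(left, coords, on="Postal Code", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()

print(unmatched)
```

An empty `unmatched` list would suggest that the inner join used in the notebook loses no rows.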


### Part 3

### Generating map

In [27]:
# Toronto latitude and longitude values
latitude = 43.651070
longitude = -79.347015

In [28]:
# create map and display it
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=10)
# display the map of Toronto
toronto_map

### Creating clusters
Now we add one marker per postal code to the map, labelled with its borough, and let the marker cluster plugin group nearby markers.

In [29]:
from folium import plugins
# instantiate a marker cluster layer and attach it to the map
borough_cluster = plugins.MarkerCluster().add_to(toronto_map)

# loop through the merged rows and add one marker per postal code, labelled with its borough
for lat, lng, label in zip(df_merge.Latitude, df_merge.Longitude, df_merge.Borough):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(borough_cluster)


# display the map with the clustered markers
toronto_map

As the map shows, the markers are grouped into clusters. Zooming out merges them into one large cluster that contains every marker, and zooming in splits the clusters until the individual markers, each labelled with its borough, appear.
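
`MarkerCluster` groups markers by how close they are on screen, not by borough. For a summary that is organized by borough, one option is to group the merged dataframe directly. A minimal sketch, assuming a few rows copied from the merged table (`sample_merge` and the output column names are illustrative):

```python
import pandas as pd

sample_merge = pd.DataFrame({
    "Borough": ["North York", "North York", "Downtown Toronto", "Downtown Toronto"],
    "Latitude": [43.753259, 43.725882, 43.65426, 43.662301],
    "Longitude": [-79.329656, -79.315572, -79.360636, -79.389494],
})

# Count the rows in each borough and average their coordinates.
summary = sample_merge.groupby("Borough").agg(
    postal_codes=("Latitude", "size"),
    mean_latitude=("Latitude", "mean"),
    mean_longitude=("Longitude", "mean"),
)

print(summary)
```

The averaged coordinates are only a rough center for each borough and could serve as positions for one summary marker per borough.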