# Segmenting And Clustering Neighbourhoods In Toronto

### Part I - Neighbourhood Dataframe Creation

In [271]:
import pandas as pd # Importing Pandas

<em>**Reading the postal code table from the web with pandas**</em>

In [272]:
tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")

In [274]:
raw_data = tables[0]

In [276]:
raw_data.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned


<em>**Dropping rows whose Borough is "Not assigned"**</em>

In [277]:
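# .copy() makes raw_data_2 an independent frame, so later assignments do not trigger SettingWithCopyWarning
raw_data_2 = raw_data[raw_data['Borough'] != "Not assigned"].copy()
raw_data_2.reset_index(drop=True, inplace=True)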

<em>**Replacing each Neighbourhood value of "Not assigned" with the corresponding Borough value**</em>

In [278]:
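# Index labels of the rows whose Neighbourhood is still "Not assigned"
p = raw_data_2[raw_data_2['Neighbourhood'] == 'Not assigned'].index
# Borough of the first such row, looked up by column name rather than by position
xv = raw_data_2.loc[p[0], 'Borough']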

In [279]:
print("Index of \'Not assigned\':",p[0],"\nValue to be replaced with:",xv)

Index of 'Not assigned': 6 
Value to be replaced with: Queen's Park


In [282]:
# .loc accepts the whole index p; .at is meant for a single scalar label
raw_data_2.loc[p, 'Neighbourhood'] = xv
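
The two cleaning steps above (dropping unassigned boroughs, then filling unassigned neighbourhoods from the borough) can also be written with a boolean mask, which handles any number of unassigned rows without looking up index positions. This is a minimal sketch on a small hand-made frame that mirrors a few rows of the table; the names `sample`, `clean`, and `unassigned` are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical sample mirroring a few rows of the scraped table.
sample = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Harbourfront", "Not assigned"],
})

# Keep only rows with an assigned borough; .copy() gives an independent frame.
clean = sample[sample["Borough"] != "Not assigned"].copy().reset_index(drop=True)

# Fill every unassigned neighbourhood from the borough in the same row.
unassigned = clean["Neighbourhood"] == "Not assigned"
clean.loc[unassigned, "Neighbourhood"] = clean.loc[unassigned, "Borough"]

print(clean)
```

Because the right-hand side is selected with the same mask, each row receives its own borough, so the sketch would still be correct if more than one borough had an unassigned neighbourhood.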

<em>**Bringing the Neighbourhood dataframe into the desired format: neighbourhoods that share a Postcode joined into one comma-separated string**</em>

In [283]:
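def f(x):
    # x holds all rows for one Postcode; join their neighbourhoods into one string.
    # Postcode and Borough are still Series here, so the result keeps one row per input row.
    return pd.DataFrame({
        'Postcode': x['Postcode'],
        'Borough': x['Borough'],
        'Neighbourhood': ', '.join(x['Neighbourhood']),
    })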

In [285]:
raw_data_3 = raw_data_2.groupby('Postcode').apply(f)

In [286]:
raw_data_3.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M5A,Downtown Toronto,"Harbourfront, Regent Park"
4,M6A,North York,"Lawrence Heights, Lawrence Manor"
5,M6A,North York,"Lawrence Heights, Lawrence Manor"
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,"Rouge, Malvern"
9,M1B,Scarborough,"Rouge, Malvern"


<em>**Final shape of Neighbourhood dataframe**</em>

In [287]:
raw_data_3.shape

(211, 3)
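
The preview above repeats a postcode once for every neighbourhood row it originally had (for example, `M5A` appears twice), so the reported shape counts those repeats. If one row per postcode is wanted, `groupby(...).agg(...)` is one way to get there. The following is a sketch on hypothetical sample rows, not the notebook's own method; `sample` and `grouped` are illustrative names.

```python
import pandas as pd

# Hypothetical sample: two postcodes have two neighbourhoods each.
sample = pd.DataFrame({
    "Postcode": ["M3A", "M5A", "M5A", "M6A", "M6A"],
    "Borough": ["North York", "Downtown Toronto", "Downtown Toronto",
                "North York", "North York"],
    "Neighbourhood": ["Parkwoods", "Harbourfront", "Regent Park",
                      "Lawrence Heights", "Lawrence Manor"],
})

# One output row per postcode: keep the first borough, join the neighbourhoods.
# sort=False preserves the order in which postcodes first appear.
grouped = (
    sample.groupby("Postcode", as_index=False, sort=False)
          .agg({"Borough": "first", "Neighbourhood": ", ".join})
)

print(grouped)
```

Taking the `first` borough assumes every row of a postcode carries the same borough, which holds for the rows shown in the preview.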

### Part II - Importing Coordinates From Geospatial Data Into the Neighbourhood Dataframe

<em>**Getting Geospatial data in CSV format from given URL**</em>

In [288]:
url = "https://cocl.us/Geospatial_data"
geo_data = pd.read_csv(url)

In [289]:
geo_data.head(10)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


<em>**Bringing Latitude and Longitude information from Geospatial data into Neighbourhood data**</em>

In [290]:
geo_data.rename(columns={'Postal Code': 'Postcode'}, inplace=True)

In [291]:
dd = raw_data_3.set_index('Postcode').join(geo_data.set_index('Postcode'),on = 'Postcode', how = 'left')
dd.reset_index(inplace=True)

In [292]:
dd.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
4,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
5,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
6,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
7,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
8,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
9,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
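
A left join silently leaves `NaN` coordinates for any postcode that is absent from the geospatial file. One way to make that visible is `DataFrame.merge` with its `validate` and `indicator` parameters. This is a minimal sketch on hand-made frames; `neighbourhoods`, `geo`, and the postcode `M0X` are made up for illustration.

```python
import pandas as pd

neighbourhoods = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M0X"],  # "M0X" is a made-up code with no coordinates
    "Borough": ["North York", "North York", "Example"],
})
geo = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

merged = neighbourhoods.merge(
    geo.rename(columns={"Postal Code": "Postcode"}),
    on="Postcode",
    how="left",
    validate="many_to_one",  # raises MergeError if geo repeats a postcode
    indicator=True,          # adds a "_merge" column: "both" or "left_only"
)

# Postcodes that found no coordinates in the geospatial frame.
missing = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()
print(missing)
```

An empty `missing` list would indicate that every postcode received coordinates.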


### Part III - Exploring And Clustering Neighbourhoods In Toronto

<em>**Formation of Toronto dataframe from Neighbourhood dataframe**</em>

In [293]:
toronto_data = dd[dd['Borough'].str.contains("Toronto")]

In [294]:
toronto_data.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
13,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
14,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
27,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
36,M4E,East Toronto,The Beaches,43.676357,-79.293031
37,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
41,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
42,M6G,Downtown Toronto,Christie,43.669542,-79.422564
49,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568
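
The preview above still contains repeated postcodes (`M5A` and `M5B` each appear twice), so any later per-row step would process them twice. A sketch of the same borough filter followed by de-duplication, on hypothetical sample rows (`sample`, `is_toronto`, and `toronto_sample` are illustrative names):

```python
import pandas as pd

sample = pd.DataFrame({
    "Postcode": ["M3A", "M5A", "M5A", "M4E"],
    "Borough": ["North York", "Downtown Toronto", "Downtown Toronto", "East Toronto"],
    "Neighbourhood": ["Parkwoods", "Harbourfront, Regent Park",
                      "Harbourfront, Regent Park", "The Beaches"],
})

# regex=False treats "Toronto" as a literal substring; na=False maps missing boroughs to False.
is_toronto = sample["Borough"].str.contains("Toronto", regex=False, na=False)

# Keep the first row of each postcode and renumber the index from 0.
toronto_sample = (
    sample[is_toronto]
    .drop_duplicates(subset="Postcode")
    .reset_index(drop=True)
)

print(toronto_sample)
```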


<em>**Getting geographical coordinates for Toronto**</em>

In [295]:
!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

Solving environment: done

# All requested packages already installed.



In [296]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
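
`geolocator.geocode` returns `None` when Nominatim finds no match for the address, in which case `location.latitude` raises an `AttributeError`. A small guard can fall back to known coordinates instead. This is a sketch, not part of the notebook: `coordinates_or_default` and `TORONTO_FALLBACK` are hypothetical names, and the fallback values are the coordinates printed above.

```python
from typing import Callable, Optional, Tuple

# Fallback taken from the coordinates printed by the cell above.
TORONTO_FALLBACK = (43.653963, -79.387207)


def coordinates_or_default(
    geocode: Callable[[str], Optional[object]],
    address: str,
    default: Tuple[float, float] = TORONTO_FALLBACK,
) -> Tuple[float, float]:
    """Return (latitude, longitude) for address, or default if nothing is found."""
    location = geocode(address)
    if location is None:
        # No match: avoid an AttributeError on location.latitude.
        return default
    return (location.latitude, location.longitude)
```

With geopy installed, it would be called as `latitude, longitude = coordinates_or_default(geolocator.geocode, 'Toronto, CA')`. Passing the geocoding function as an argument keeps the helper free of any network dependency.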


<em>**Creating a map for segmenting and clustering neighbourhoods in Toronto**</em>

In [297]:
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

Solving environment: done

# All requested packages already installed.



In [298]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto