# Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

# Gokhan Ince

## PART 1

### For this assignment, we are required to explore and cluster the neighborhoods in Toronto.

### First, we import the pandas library, assign the URL to a variable, and use the read_html function to load the table into a dataframe.

In [1]:
# read_html needs an HTML parser such as lxml
!pip install lxml
import pandas as pd

# Assign the Wikipedia page to the url variable
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

# read_html returns a list of tables; the first one holds the postal codes
df = pd.read_html(url, header=0)[0]



### Next, we drop the rows that have no borough assigned.

In [2]:
#Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
df = df[df.Borough != 'Not assigned']
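As a small illustration of the boolean-mask filter above, here is a sketch on a three-row frame. The rows are invented for the example and are not taken from the Wikipedia table. The sketch also resets the index, which this notebook does not do at this step.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only the rows whose Borough is assigned, then renumber the index
assigned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned)
```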

### Sometimes one postal code covers several neighbourhoods. We need to combine them into one row.

In [3]:
# More than one neighbourhood can exist in one postal code area. For example, on the
# Wikipedia page M5A is listed twice, with the neighbourhoods Harbourfront and Regent Park.
# Such rows are combined into one row, with the neighbourhoods separated by a comma.
df = df.groupby(['Postcode', 'Borough'])['Neighbourhood'].agg(', '.join).reset_index()
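To make the grouping step concrete, here is a minimal sketch on a toy frame that uses the M5A example mentioned in the comment above. The frame itself is built by hand for illustration; only the M5A and Queen's Park values come from this notebook.

```python
import pandas as pd

# Hand-built frame: M5A appears twice, once per neighbourhood
toy = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M7A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Not assigned"],
})

# Join the neighbourhood names of each (Postcode, Borough) group with ", "
combined = (
    toy.groupby(["Postcode", "Borough"])["Neighbourhood"]
       .agg(", ".join)
       .reset_index()
)
print(combined)
```

The two M5A rows collapse into one row, while M7A is left unchanged because its group has a single member.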

### If a cell has a borough but no assigned neighbourhood, the neighbourhood is set to the borough name.

In [4]:
# If a row has a borough but a 'Not assigned' neighbourhood, copy the borough name into
# the Neighbourhood column. For example, on the Wikipedia page both the Borough and the
# Neighbourhood of the Queen's Park row become Queen's Park.
# Assign through .loc: writing to the row returned by iterrows() does not reliably update df.
not_assigned = df['Neighbourhood'] == 'Not assigned'
df.loc[not_assigned, 'Neighbourhood'] = df.loc[not_assigned, 'Borough']
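A minimal sketch of this rule on a two-row toy frame, using a boolean mask with `.loc`. The mask approach writes directly to the DataFrame, which matters because pandas does not guarantee that assigning to the row object yielded by `iterrows()` changes the underlying frame. The frame is hand-built for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "Postcode": ["M5A", "M7A"],
    "Borough": ["Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Harbourfront, Regent Park", "Not assigned"],
})

# Boolean mask marking rows whose neighbourhood is missing
missing = toy["Neighbourhood"] == "Not assigned"

# Copy the borough name into those rows only; other rows are untouched
toy.loc[missing, "Neighbourhood"] = toy.loc[missing, "Borough"]
print(toy)
```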

### Now, our dataframe is ready. Let's see its shape.

In [5]:
#shape of the dataframe
df.shape

(103, 3)

### Its shape is (103, 3), which means the dataframe has 103 rows and 3 columns.
### Now we can look at the data.

In [6]:
# data
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


## Part 2

### Now, I will download the geospatial data of Toronto, create a dataframe and then show some data from the dataframe.

In [7]:
!wget -q -O 'Toronto_long_lat_data.csv'  http://cocl.us/Geospatial_data
df_lon_lat = pd.read_csv('Toronto_long_lat_data.csv')
df_lon_lat.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### I will rename the columns so that the postal code column is called Postcode, matching the first dataframe.

In [8]:
df_lon_lat.columns=['Postcode','Latitude','Longitude']
df_lon_lat.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Now I will merge the two dataframes on Postcode and store the result in df_toronto.

In [9]:
df_toronto = pd.merge(df,df_lon_lat[['Postcode','Latitude', 'Longitude']], on='Postcode')
df_toronto

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
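`pd.merge` defaults to an inner join, so a postal code without coordinates would be dropped without warning. The sketch below shows one way to check for that, using a left join with the `indicator` and `validate` options. The code "M9Z" and the borough "Hypothetical" are made up for the example; the coordinates are copied from the output above.

```python
import pandas as pd

left = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],  # "M9Z" is a made-up code with no coordinates
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
right = pd.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# how="left" keeps every row of `left`; validate raises if Postcode is duplicated
# on either side; indicator adds a "_merge" column describing each row's origin
merged = pd.merge(left, right, on="Postcode", how="left",
                  validate="one_to_one", indicator=True)

# Rows that found no matching coordinates
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["Postcode"].tolist())
```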


## Part 3

### Now, I will import a library that converts an address into latitude and longitude values.

In [14]:
!pip install geopy

Successfully installed geographiclib-1.49 geopy-1.20.0


In [15]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated modules
import matplotlib.cm as cm
import matplotlib.colors as colors

### Now, I will look up the latitude and longitude of Toronto.

In [16]:
# Toronto is assigned to the variable "address"
address = 'Toronto, ON'
geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)  # returns None if the address is not found
latitude_toronto = location.latitude
longitude_toronto = location.longitude
print('Latitude of Toronto is {} and longitude is {}.'.format(latitude_toronto, longitude_toronto))

Latitude of Toronto is 43.653963 and longitude is -79.387207.


### Now, I will import the Folium library to generate maps.

In [17]:
!pip install folium
import pandas as pd
import folium

Successfully installed folium-0.10.0


### The library is ready. Now we can generate our map.

In [18]:
map_toronto = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=10)

# Add one circle marker per postal code, labelled with its neighbourhood(s) and borough
for lat, lng, borough, neighbourhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)


### Finally, we can display the map.

In [21]:
map_toronto

## Thanks for reviewing
## Regards
## Gokhan Ince