<h1 align="center">Segmenting and Clustering Neighborhoods in Toronto (part 2)</h1>
<p>By Dru Norman</p>
<hr>
<h2>Question 2:</h2>
<ul>
   <li>2.1 Now that we have built a dataframe of each neighborhood's postal code, borough name, and neighborhood name, we need to <strong>get the latitude and the longitude coordinates of each neighborhood</strong> in order to use the Foursquare location data.</li>
   <li>An older version of this course used the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, Google has since started charging for the API (http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/), so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.</li>
   <li><strong>2.2 Use the Geocoder package or the csv file to create the requested dataframe</strong></li>
</ul>
<hr>

<h2>2.1 Get the latitude and longitude coordinates of each neighborhood:</h2>

* We must first create the dataframe that the newly requested data will be added to.

In [3]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

# Download url data from internet

url="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
source=requests.get(url).text
Canada_data=BeautifulSoup(source, 'lxml')

# Create a new dataframe
column_names=['Postalcode','Borough','Neighborhood']
toronto=pd.DataFrame(columns = column_names)
content=Canada_data.find('div', class_='mw-parser-output')
table=content.table.tbody
postcode=0
borough=0
neighborhood=0

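# Walk the table row by row; the three <td> cells hold postal code, borough and neighborhood
for tr in table.find_all('tr'):
    i = 0
    for td in tr.find_all('td'):
        if i == 0:
            postcode = td.text
            i = i + 1
        elif i == 1:
            borough = td.text
            i = i + 1
        elif i == 2: 
            neighborhood = td.text.strip('\n').replace(']','')
    # DataFrame.append was removed in pandas 2.0, so add the row with pd.concat
    row = pd.DataFrame([{'Postalcode': postcode,'Borough': borough,'Neighborhood': neighborhood}])
    toronto = pd.concat([toronto, row], ignore_index=True)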

# Clean up the dataframe 
toronto=toronto[toronto.Borough!='Not assigned']
toronto=toronto[toronto.Borough!= 0]
toronto.reset_index(drop = True, inplace = True)
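# Where a neighborhood is 'Not assigned', fall back to the borough name.
# .loc with a boolean mask writes to the dataframe itself; chained iloc[i][2] may only change a copy.
not_assigned = toronto['Neighborhood'] == 'Not assigned'
toronto.loc[not_assigned, 'Neighborhood'] = toronto.loc[not_assigned, 'Borough']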
                                 
# Combine the neighborhoods that share a postal code into one comma-separated string
df2=toronto.groupby(['Postalcode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
df2.rename(columns={'Postalcode':'Postal Code'}, inplace=True)
df2.head()

,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
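
The cleaning steps in the cell above can be checked on a small in-memory table, without scraping anything. The sketch below is illustrative only: the `clean_postal_table` helper and the sample rows are assumptions made up for this example, not part of the assignment data.

```python
import pandas as pd

def clean_postal_table(raw):
    """Clean a frame that has Postalcode, Borough and Neighborhood columns."""
    # Step 1: ignore rows whose borough is 'Not assigned'
    out = raw[raw['Borough'] != 'Not assigned'].copy()
    # Step 2: a 'Not assigned' neighborhood takes the borough name
    missing = out['Neighborhood'] == 'Not assigned'
    out.loc[missing, 'Neighborhood'] = out.loc[missing, 'Borough']
    # Step 3: neighborhoods sharing a postal code are joined with commas
    out = (out.groupby(['Postalcode', 'Borough'])['Neighborhood']
              .apply(', '.join)
              .reset_index())
    return out.rename(columns={'Postalcode': 'Postal Code'})

# Made-up sample rows (hypothetical, for illustration only)
sample = pd.DataFrame({
    'Postalcode':   ['M1A', 'M1B', 'M1B', 'M7A'],
    'Borough':      ['Not assigned', 'Scarborough', 'Scarborough', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Rouge', 'Malvern', 'Not assigned'],
})
cleaned = clean_postal_table(sample)
print(cleaned)
```

Keeping the three steps in one function makes it easy to rerun them if the source table changes.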


<h2>2.2 Use the Geocoder package or the csv file to create the requested dataframe:</h2>
* The Geocoder package can be very unreliable. If you are not able to get the geographical coordinates of the neighborhoods with it, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data
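
Because a single Geocoder lookup may come back empty, one way to use the package is to retry a bounded number of times and fall back to the csv file if nothing is returned. This is only a sketch: the helper names `lookup_with_geocoder` and `get_latlng`, the choice of the ArcGIS provider, and the retry limit are all assumptions of this example, not part of the assignment.

```python
import time

def lookup_with_geocoder(postal_code):
    """Make one lookup attempt and return [lat, lng], or an empty/None result.

    Assumes the third-party `geocoder` package is installed and that its
    ArcGIS provider is reachable; both are assumptions of this sketch.
    """
    import geocoder  # imported lazily so get_latlng can be used without it
    result = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
    return result.latlng

def get_latlng(postal_code, lookup=lookup_with_geocoder, max_attempts=5, pause=0.0):
    """Retry `lookup` until it returns coordinates or the attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(postal_code)
        if coords:  # treats both None and an empty list as "no result"
            return tuple(coords)
        time.sleep(pause)
    return None  # the caller should fall back to the csv file
```

Calling `get_latlng` with the default `lookup` performs live requests; a `None` result is the signal to use the csv file instead.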

In [13]:
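# Use the link to the csv file to download the geographical coordinates

toronto_geocsv='https://cocl.us/Geospatial_data'
# Optional local copy; {toronto_geocsv} makes IPython pass the variable's value to wget
!wget -q -O 'toronto_m.geospatial_data.csv' {toronto_geocsv}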
geocsv_data=pd.read_csv(toronto_geocsv).set_index("Postal Code")
geocsv_data.head()

Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476


In [5]:
# Add Latitude and Longitude data to the dataframe (df2)

df=pd.merge(geocsv_data, df2, on='Postal Code')
df.head()

,Postal Code,Latitude,Longitude,Borough,Neighborhood
0,M1B,43.806686,-79.194353,Scarborough,"Rouge, Malvern"
1,M1C,43.784535,-79.160497,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,43.763573,-79.188711,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,43.770992,-79.216917,Scarborough,Woburn
4,M1H,43.773136,-79.239476,Scarborough,Cedarbrae


In [7]:
# Process the dataframe to match requested format

df = df[['Postal Code', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']]
df.head(10)

,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


In [8]:
df.shape

(103, 5)
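
The shape confirms the row count, but `pd.merge` defaults to an inner join, which silently drops any postal code that has no coordinates. A complementary check is to merge with `how='left'` and `indicator=True` and list the rows that found no match. The sketch below uses made-up stand-ins for `df2` and `geocsv_data`; the frame names and the `M9Z` code are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for df2 and geocsv_data (hypothetical rows, illustration only)
neighborhoods = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Etobicoke'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# how='left' keeps every neighborhood row, indicator=True adds a '_merge' column,
# and validate='one_to_one' raises MergeError if a postal code is duplicated on either side
merged = pd.merge(neighborhoods, coords, on='Postal Code',
                  how='left', indicator=True, validate='one_to_one')

# Postal codes that have no coordinates in the csv data
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

An empty `unmatched` list means every postal code received coordinates.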