# Capstone, Week Three, Part Three

## Exploring Toronto Neighborhoods

First we re-create the dataframe from parts one and two:

In [1]:
import urllib.request
from bs4 import BeautifulSoup

target_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
webpage = urllib.request.urlopen(target_url)
soup = BeautifulSoup(webpage,'html.parser')
#print(soup.prettify())

import pandas as pd
import numpy as np
nbh_frame = pd.DataFrame(columns = ['PostalCode', 'Borough', 'Neighborhood'])

for entry in soup.find_all('tr'):
    table_row = []
    for cell in entry.find_all('td'):
        table_row.append(cell.get_text())
    # following line works for the postal code page, but is not robust
    if len(table_row)==3:
        table_row[2] = table_row[2].rstrip()
        if table_row[2] == 'Not assigned':
            table_row[2] = table_row[1]
        if table_row[1] != 'Not assigned':
            if table_row[0] in nbh_frame.PostalCode.values:
                # duplicate postal zone, so append neighborhood
                idx = nbh_frame[nbh_frame['PostalCode']==table_row[0]].index.values
                nbh_frame.iloc[idx,2] += (', ' + table_row[2])
            else:
                nbh_frame.loc[len(nbh_frame)] = table_row
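
# merge in the latitude and longitude of each postal code
target_url = 'http://cocl.us/Geospatial_data'
geodata_df = pd.read_csv(target_url)
merged_df = pd.merge(nbh_frame, geodata_df, left_on='PostalCode', right_on='Postal Code')
# note: drop() returns a new frame; the result is not assigned here,
# so the duplicate 'Postal Code' column remains in merged_df (see the output)
merged_df.drop(['Postal Code'], axis = 1)
merged_df.head()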

| | PostalCode | Borough | Neighborhood | Postal Code | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | M3A | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | M4A | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | M5A | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | M6A | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Queen's Park | M7A | 43.662301 | -79.389494 |
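
The row-by-row loop above appends neighborhoods whenever a postal code repeats. As a minimal sketch, the same combining step can be written with `groupby`. The rows below are hand-made for illustration and are not the scraped table; `raw`, `assigned`, and `combined` are names introduced here.

```python
import pandas as pd

# Illustrative rows only; the real table comes from the Wikipedia page.
raw = pd.DataFrame(
    [
        ("M5A", "Downtown Toronto", "Harbourfront"),
        ("M5A", "Downtown Toronto", "Regent Park"),
        ("M7A", "Queen's Park", "Not assigned"),
        ("M1A", "Not assigned", "Not assigned"),
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# Keep only rows with an assigned borough, then fall back to the
# borough name when the neighborhood is not assigned.
assigned = raw[raw["Borough"] != "Not assigned"].copy()
unnamed = assigned["Neighborhood"] == "Not assigned"
assigned.loc[unnamed, "Neighborhood"] = assigned.loc[unnamed, "Borough"]

# Join neighborhoods that share a postal code into one comma-separated string.
combined = (
    assigned.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```

This avoids growing the dataframe one row at a time, at the cost of first collecting every table row into `raw`.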


## Postal Zone Characteristics
It would be helpful to know approximately what size these postal zones are, to aid in selecting search radii for Foursquare searches.

In [2]:
# get a general idea of the distances between postal codes to help define Foursquare search radii

# This uses the Python implementation of the haversine distance formula
# contributed by Wayne Dyck: https://gist.github.com/rochacbruno/2883505

import math

def distance(origin, destination):
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 6371 # radius of the earth in km, returned distance will be in kilometers

    dlat = math.radians(lat2-lat1)
    dlon = math.radians(lon2-lon1)
    a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
        * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    d = radius * c

    return d

all_distances = []
for row_namedTuple1 in merged_df.itertuples(index=False):
    lat1 = row_namedTuple1.Latitude
    lon1 = row_namedTuple1.Longitude
    distances = []
    for row_namedTuple2 in merged_df.itertuples(index=False):
        lat2 = row_namedTuple2.Latitude
        lon2 = row_namedTuple2.Longitude
        distances.append(distance((lat1,lon1),(lat2,lon2)))
    all_distances.append(distances)
distance_df = pd.DataFrame(all_distances, columns = merged_df['PostalCode'])
distance_df.set_index(merged_df['PostalCode'], inplace = True)

# We can then sort to find the distance from each postal code to its closest neighbor, creating a new dataframe
codes = merged_df['PostalCode']
closest_list = []
for code in codes:
    pc_series = distance_df[code].sort_values(ascending=True)
    # position 0 is the zone itself (distance 0); position 1 is its nearest neighbor
    # use .iloc for positional access, since the series is indexed by postal code
    this_pair = (pc_series.index[0], pc_series.index[1], pc_series.iloc[1])
    closest_list.append(this_pair)
closest_df = pd.DataFrame(closest_list, columns = ['from', 'to', 'distance'])
closest_df.set_index('from', inplace = True)
print(closest_df.head())
closest_df.describe()

       to  distance
from               
M3A   M3B  1.985923
M4A   M3C  2.037127
M5A   M5C  1.228390
M6A   M6B  1.868943
M7A   M5G  0.512552


| | distance |
|---|---|
| count | 103.0 |
| mean | 1.767151 |
| std | 0.803399 |
| min | 0.150334 |
| 25% | 1.247867 |
| 50% | 1.824053 |
| 75% | 2.225379 |
| max | 3.660469 |


### Mean distance from a zone to its nearest neighboring zone is about 1.8 km.
Increasing the search radius for Foursquare queries to 1 km, a little over half that distance, seems appropriate.
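
The nested loop above calls the haversine function once per pair of zones. As a hedged sketch, the same distance matrix and nearest-neighbor lookup can be computed with NumPy broadcasting. The function name `pairwise_haversine_km` is introduced here for illustration, and it uses the equivalent `arcsin` form of the formula rather than `atan2`. The three coordinates are M3A, M4A, and M5A from the table earlier in this notebook.

```python
import numpy as np

def pairwise_haversine_km(lat_deg, lon_deg, radius_km=6371.0):
    """Return the full matrix of great-circle distances in kilometers."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting builds every pairwise difference at once.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against tiny floating-point excursions outside [0, 1].
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Coordinates for M3A, M4A and M5A, copied from the merged table.
lats = [43.753259, 43.725882, 43.654260]
lons = [-79.329656, -79.315572, -79.360636]

dist = pairwise_haversine_km(lats, lons)
np.fill_diagonal(dist, np.inf)      # ignore each zone's zero distance to itself
nearest_idx = dist.argmin(axis=1)   # position of the closest other zone
nearest_km = dist.min(axis=1)       # distance to that zone in km
print(nearest_idx, nearest_km.round(2))
```

With only three zones, the nearest neighbors found here differ from those in the full 103-zone result; the sketch shows the technique, not the notebook's numbers.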

In [3]:
!conda install -c conda-forge folium=0.5.0 --yes


## Mapping postal zones

Postal zones are mapped, each with a circle whose radius equals half the distance to the nearest neighboring postal zone. Postal zones are scattered evenly throughout the city of Toronto, with the exception of two groups of postal codes clustered close together downtown.

In [4]:
import folium

latitude, longitude  = 43.66586, -79.38316 # center map on the Church and Wellesley postal code location
postal_zone_map = folium.Map(location=[latitude, longitude], zoom_start=11) # generate map centred on that location

# add a red circle marker to show downtown Toronto
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(postal_zone_map)




# add the postal code locations as blue circle markers
for lat, lng, label in zip(merged_df.Latitude, merged_df.Longitude, merged_df.PostalCode):
    
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(postal_zone_map)
    folium.Circle(
        [lat, lng],
        color = 'green',
        fill_color = 'green',
        opacity = 0.2,
        fill=True,
        radius=closest_df.at[label,'distance']*500,
        fill_opacity=0.2
    ).add_to(postal_zone_map)
# display map
postal_zone_map
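
The factor of 500 in `closest_df.at[label,'distance']*500` combines two steps: converting kilometers to meters (the unit `folium.Circle` uses for `radius`, unlike `CircleMarker`, whose radius is in pixels) and halving the nearest-neighbor distance. A small sketch of that arithmetic, using a helper name introduced here for illustration:

```python
def half_distance_radius_m(nearest_km):
    """Half of a nearest-neighbor distance given in km, expressed in meters."""
    return nearest_km * 1000 / 2

# Nearest-neighbor distances (km) for M3A and M7A, from the closest_df output.
for code, km in [("M3A", 1.985923), ("M7A", 0.512552)]:
    print(code, round(half_distance_radius_m(km), 1), "m")
```

Two neighboring circles drawn this way just touch when each zone is the other's nearest neighbor.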

In [7]:
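# The code was removed by Watson Studio for sharing.
# CLIENT_ID, CLIENT_SECRET, and VERSION, used in the cells below, are not shown in this shared copy.

Your credentials: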


## Foursquare Venues in Toronto
The number of venues Foursquare returns for the Toronto area appears limited, which may restrict how well different neighborhoods can be characterized.

In [8]:
# Explore Foursquare locations close to the location of each postal zone
import requests

# search settings are the same for every zone, so define them once
search_query = 'food'
radius = 1000 # 1 km radius; there might be some overlap, especially in small zones
LIMIT = 100
for row_namedTuple in merged_df.itertuples(index=False):
    latitude = row_namedTuple.Latitude
    longitude = row_namedTuple.Longitude
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
    results = requests.get(url).json()
    food_venues = len(results['response']['venues'])

    print (row_namedTuple.PostalCode, 'has ', food_venues, 'food venues')

M3A has  3 food venues
M4A has  9 food venues
M5A has  29 food venues
M6A has  2 food venues
M7A has  50 food venues
M9A has  1 food venues
M1B has  3 food venues
M3B has  3 food venues
M4B has  3 food venues
M5B has  50 food venues
M6B has  7 food venues
M9B has  3 food venues
M1C has  0 food venues
M3C has  1 food venues
M4C has  4 food venues
M5C has  50 food venues
M6C has  11 food venues
M9C has  0 food venues
M1E has  5 food venues
M4E has  15 food venues
M5E has  50 food venues
M6E has  4 food venues
M1G has  1 food venues
M4G has  6 food venues
M5G has  50 food venues
M6G has  14 food venues
M1H has  5 food venues
M2H has  2 food venues
M3H has  1 food venues
M4H has  4 food venues
M5H has  50 food venues
M6H has  13 food venues
M1J has  1 food venues
M2J has  8 food venues
M3J has  4 food venues
M4J has  13 food venues
M5J has  50 food venues
M6J has  18 food venues
M1K has  3 food venues
M2K has  0 food venues
M3K has  1 food venues
M4K has  15 food venues
M5K has  50 food ve
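
Several zones in the output above report exactly 50 venues even though `LIMIT` was set to 100, which suggests the count hit a ceiling rather than reflecting the true number of venues. A rough sketch for flagging such zones, assuming the counts have been collected into a dictionary; `counts` and `suspected_cap` are names introduced here, and the values are a subset copied from the output above.

```python
# A few postal codes and their venue counts, copied from the printed output.
counts = {"M3A": 3, "M4A": 9, "M5A": 29, "M6A": 2, "M7A": 50, "M5B": 50, "M1C": 0}

# Assumption: the largest observed count is the ceiling on returned results.
suspected_cap = max(counts.values())

capped = sorted(code for code, n in counts.items() if n == suspected_cap)
empty = sorted(code for code, n in counts.items() if n == 0)

print("possibly capped:", capped)
print("no venues found:", empty)
```

Zones in the first list may be undercounted at this radius, and zones in the second list cannot be characterized by food venues at all.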

In [11]:
search_query = ''
latitude, longitude  = 43.66586, -79.38316
radius = 10000 # 10 kilometers
LIMIT = 1000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()
all_venues = len(results['response']['venues'])
print ('The Toronto area has ', all_venues, 'venues')

The Toronto area has  155 venues
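
Both queries above build the URL with `str.format` and index the JSON response directly, which raises a `KeyError` if a request fails and the `venues` key is absent. A minimal sketch of a more defensive version follows. The helper names, the placeholder credentials, the version string `20180605`, and the hand-written payload are assumptions for illustration only; no request is sent.

```python
from urllib.parse import urlencode

# Same endpoint as used in the cells above.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"

def build_search_url(client_id, client_secret, version, lat, lng, query, radius_m, limit):
    """Assemble the venue search URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius_m,
        "limit": limit,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"

def count_venues(payload):
    """Count venues in a decoded response, returning 0 if the keys are missing."""
    return len(payload.get("response", {}).get("venues", []))

# Placeholder credentials and a hand-written payload, for illustration only.
url = build_search_url("ID", "SECRET", "20180605", 43.66586, -79.38316, "food", 1000, 50)
fake_payload = {"response": {"venues": [{"name": "a"}, {"name": "b"}]}}

print(url)
print(count_venues(fake_payload), count_venues({"meta": {"code": 400}}))
```

Returning 0 for a malformed response hides errors, so a real run would probably also check the HTTP status before counting.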
