## Capstone Project - The Battle of the Neighborhoods
##### Presents

# **_The Battle of Alberta_**
### **_Calgary vs Edmonton_**     
##### by Pam Pritchett
<br>

###### *This location analysis project was inspired by the Battle of the Neighborhoods capstone requirements for the IBM Data Science Professional Certificate offered by Coursera.

## Introduction: Business Problem

The Battle of Alberta refers to the long-standing rivalry between Calgary (the largest city in Alberta) and Edmonton (the province’s capital). The rivalry dates back to the early 1900s, when the two cities competed to become the young province’s capital.

The rivalry later spread to sports, most prominently the historic rivalry between the cities’ NHL teams, the Edmonton Oilers and the Calgary Flames. It fuels the passion of the fans, with each side boasting that its city is better than the other. But which city is correct? Do the neighborhoods of both cities offer the same access to amenities, or do the neighborhoods of one city outshine the other’s?

This project takes a playful approach to the question: Does Calgary or Edmonton have the better neighborhoods? How do they differ?

This project compares the two cities by exploring venues and amenities using data retrieved from Foursquare. This is a simplistic approach that does not consider the myriad other socio-economic indices that affect a neighborhood’s livability. It is a demonstration of location analysis for comparing the merits of neighborhoods by city.

#### Target Audience
Anyone who has ever questioned the basis of the Calgary / Edmonton rivalry and wondered whether the neighborhoods of the two cities differ and, if so, how.

## Data
### Data Sources
The data for this analysis are derived from the following sources:

•	Postal codes for each city, scraped from Wikipedia to enable neighborhood groupings: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_T

•	ArcGIS geocoding web services, used to retrieve geographical coordinates for each postal code and build a new dataframe to be passed to Foursquare

•	Foursquare API, used to explore each neighborhood cluster and identify the top 5 neighborhood venues for each city

### Initial Data Preparation


## Methodology
##### Explore and Model the Data

•	Transform Foursquare JSON results into a pandas dataframe

•	Use one-hot encoding to analyse each neighborhood

•	Identify top 5 venues of each neighborhood and create dataframe

•	Perform k-means clustering of neighborhoods

•	Visualize the neighborhood rankings using the folium library

•	Examine clusters for comparison
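
To make the clustering step in the list above concrete, here is a minimal plain-Python sketch of the k-means idea (assign each point to its nearest centroid, then recompute the centroids). It is an illustration only: the notebook itself imports `KMeans` from scikit-learn, and the neighborhood names, frequency vectors, and fixed starting centroids below are made-up demo values.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Tiny k-means loop; a teaching sketch, not scikit-learn's implementation."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical venue-category frequency profiles (one vector per neighborhood).
profiles = {
    "Demo A": [0.6, 0.3, 0.1],
    "Demo B": [0.5, 0.4, 0.1],
    "Demo C": [0.1, 0.1, 0.8],
    "Demo D": [0.0, 0.2, 0.8],
}
points = list(profiles.values())
labels, centroids = kmeans(points, initial_centroids=[points[0], points[2]])
print(dict(zip(profiles, labels)))
```

Neighborhoods with similar venue mixes end up sharing a label, which is what makes the later cluster-by-cluster comparison possible.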


##### Install libraries for web scraping and data preprocessing

In [1]:
!pip install beautifulsoup4 # web scraping
from bs4 import BeautifulSoup
import requests # handle requests
import pandas as pd # data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np # library to handle data in a vectorized manner
import urllib.request # extensible library for opening urls

import json # library to handle JSON files

from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

!pip install geocoder # Geocoding library
import geocoder

! pip install geopy
from geopy.geocoders import Nominatim

print('Libraries imported.')


##### Scrape Calgary and Edmonton postal codes from Wikipedia and create dataframes for each city

In [2]:
# Web scraping wikipedia for Calgary and Edmonton postal codes
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_T'
req = urllib.request.urlopen(url)
article = req.read().decode()
with open ('List_of_postal_codes_of_Canada:_T', 'w') as fo:
    fo.write(article)
# Parse the  data with Beautiful Soup   
article = open('List_of_postal_codes_of_Canada:_T').read()
soup = BeautifulSoup(article, 'html.parser')
tables = soup.find_all('table', class_='sortable')
    
data = []

for tr in soup.tbody.find_all('tr'):
    data.append([ td.get_text().strip() for td in tr.find_all('td')])

# create dataframe from the Alberta postal code table
AB_df = pd.DataFrame(data, columns=['0','Calgary_1','Calgary_2','4','Edmonton_1','Edmonton_2','7','8','9'])

# create dataframes for each city
cal_df = pd.DataFrame(AB_df, columns=['Calgary_1', 'Calgary_2'])
edm_df = pd.DataFrame(AB_df, columns=['Edmonton_1', 'Edmonton_2'])

# Stack each city's postal code columns into one column named 'Neighborhood'
calgary_data = pd.DataFrame(cal_df.stack(),columns=['Neighborhood'])
edmonton_data = pd.DataFrame(edm_df.stack(),columns=['Neighborhood'])

# Reset index of dataframe for each city
calgary_data.reset_index(drop=True, inplace=True)
edmonton_data.reset_index(drop=True, inplace=True)

# Create column with parsed postal code for location analysis
calgary_data['Postal Code'] = calgary_data['Neighborhood'].str[:3]
edmonton_data['Postal Code'] = edmonton_data['Neighborhood'].str[:3]

# Widen column display so long neighborhood names are not truncated
pd.set_option('display.max_colwidth', 500)

# Export to csv for further data cleaning
calgary_data.to_csv(r'C:\Data\BattleOfAlberta\cal_post.csv', index = False)
edmonton_data.to_csv(r'C:\Data\BattleOfAlberta\edm_post.csv', index = False)

# Read the cleaned csv files (saved under new names) into dataframes
calgary_data=pd.read_csv("../cal_post_neigh.csv")
edmonton_data=pd.read_csv("../edm_post_neigh.csv")
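
The `.str[:3]` slice above relies on each scraped cell starting with a three-character postal code prefix (a Canadian forward sortation area: letter, digit, letter). The sketch below shows one way that assumption could be checked before slicing. The function name `split_postal_cell` and the sample cell strings are hypothetical, and the exact text layout of the Wikipedia cells is assumed rather than taken from the page.

```python
import re

# Forward sortation area: letter, digit, letter (for example T2A).
FSA_PATTERN = re.compile(r"^[A-Z]\d[A-Z]$")


def split_postal_cell(cell_text):
    """Return (postal_code, remainder), or None if the cell has no valid prefix."""
    text = cell_text.strip()
    code, rest = text[:3], text[3:].strip()
    if not FSA_PATTERN.match(code):
        return None
    return code, rest


# Made-up cell strings standing in for the scraped table cells.
cells = [
    "T2A Penbrooke Meadows / Marlborough",
    "",
    "T5A West Clareview / East Londonderry",
    "Not assigned",
]
parsed = [p for p in map(split_postal_cell, cells) if p is not None]
print(parsed)
```

Rows that fail the check are dropped here; in the notebook they could instead be logged for the manual cleaning pass.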

##### Explore the dataframes

In [3]:
calgary_data.head()

| | Neighborhood | Postal Code |
|---|---|---|
| 0 | Penbrooke Meadows / Marlborough | T2A |
| 1 | Forest Lawn / Dover / Erin Woods | T2B |
| 2 | Lynnwood Ridge / Ogden / Foothills Industrial / Great Plains | T2C |
| 3 | Bridgeland / Greenview / Zoo / YYC | T2E |
| 4 | Inglewood / Burnsland / Chinatown / East Victoria Park / Saddledome | T2G |


In [4]:
calgary_data.describe()

| | Neighborhood | Postal Code |
|---|---|---|
| count | 36 | 36 |
| unique | 36 | 36 |
| top | Queensland / Lake Bonavista / Willow Park / Acadia | T2R |
| freq | 1 | 1 |


In [5]:
edmonton_data.head()

| | Neighborhood | Postal Code |
|---|---|---|
| 0 | West Clareview / East Londonderry | T5A |
| 1 | East North Central / West Beverly | T5B |
| 2 | Central Londonderry | T5C |
| 3 | West Londonderry / East Calder | T5E |
| 4 | North Central / Queen Mary Park / Blatchford | T5G |


In [6]:
edmonton_data.describe()


| | Neighborhood | Postal Code |
|---|---|---|
| count | 39 | 39 |
| unique | 39 | 39 |
| top | Heritage Valley | T6R |
| freq | 1 | 1 |


##### Retrieve Calgary coordinates: neighborhood postal codes via the ArcGIS geocoder, city centre via Nominatim

In [7]:
address = 'Calgary, Alberta'

def get_geocoder(PostalCode_from_df):
    # initialize the coordinates to None
    lat_lng_coords = None
    # loop until the ArcGIS geocoder returns coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Calgary, Alberta'.format(PostalCode_from_df))
        lat_lng_coords = g.latlng
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]
    return latitude, longitude

calgary_data['Latitude'], calgary_data['Longitude'] = zip(*calgary_data['Postal Code'].apply(get_geocoder))

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude_calgary = location.latitude
longitude_calgary = location.longitude
print('The geographical coordinates of Calgary are {}, {}.'.format(latitude_calgary, longitude_calgary))

The geographical coordinates of Calgary are 51.0534234, -114.0625892.
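
The `while` loop in `get_geocoder` keeps calling the geocoder until coordinates come back, so a postal code that never resolves would loop forever. A bounded retry is one possible alternative. The sketch below is hedged: `geocode_with_retry`, its parameters, and the `flaky_lookup` stand-in are hypothetical names, and the lookup is passed in as a function so the example runs without network access.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords[0], coords[1]
        # Optional pause before the next attempt.
        time.sleep(delay_seconds)
    raise RuntimeError(
        "No coordinates for {!r} after {} attempts".format(query, max_attempts)
    )


# Stand-in for the real geocoder: returns None twice, then coordinates.
calls = {"n": 0}


def flaky_lookup(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else [51.05, -114.07]


result = geocode_with_retry(flaky_lookup, "T2P, Calgary, Alberta")
print(result, "after", calls["n"], "calls")
```

In the notebook, the lookup could be `lambda q: geocoder.arcgis(q).latlng`, which mirrors the call already used in `get_geocoder`.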


##### Retrieve Edmonton coordinates: neighborhood postal codes via the ArcGIS geocoder, city centre via Nominatim

In [8]:
address = 'Edmonton, Alberta'

def get_geocoder(PostalCode_from_df):
    # initialize the coordinates to None
    lat_lng_coords = None
    # loop until the ArcGIS geocoder returns coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Edmonton, Alberta'.format(PostalCode_from_df))
        lat_lng_coords = g.latlng
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]
    return latitude, longitude

edmonton_data['Latitude'], edmonton_data['Longitude'] = zip(*edmonton_data['Postal Code'].apply(get_geocoder))
edmonton_data.head()

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude_edmonton = location.latitude
longitude_edmonton = location.longitude
print('The geographical coordinates of Edmonton are {}, {}.'.format(latitude_edmonton, longitude_edmonton))

The geographical coordinates of Edmonton are 53.535411, -113.507996.


##### Connect to Foursquare API

In [9]:
CLIENT_ID = '5HR0IGISRNVFYO1VD0RBMZDP1M0WZKN1HU2IIBMSAAHYEQF4' # your Foursquare ID
CLIENT_SECRET = 'JDHOZEQCK2SG2YGNCXWTSXKWUZCTO4LR0WNY2X502C3FI0XK' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 5HR0IGISRNVFYO1VD0RBMZDP1M0WZKN1HU2IIBMSAAHYEQF4
CLIENT_SECRET:JDHOZEQCK2SG2YGNCXWTSXKWUZCTO4LR0WNY2X502C3FI0XK
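
Hard-coding and printing API credentials means they travel with every copy of the notebook. One common alternative is to read them from environment variables and print only a masked form. The sketch below assumes two variable names, `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are hypothetical and not defined anywhere in this project. The demo values are placeholders.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Read the client id and secret from a mapping such as os.environ."""
    client_id = environ.get("FOURSQUARE_CLIENT_ID")
    client_secret = environ.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first"
        )
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep the first few characters and hide the rest."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Placeholder mapping used so the example runs without real credentials.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-123",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-456",
}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(CLIENT_ID))
print("CLIENT_SECRET:", mask(CLIENT_SECRET))
```

Calling `load_foursquare_credentials()` with no argument would read from the real process environment instead of the demo mapping.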


##### Explore a downtown neighborhood in the Calgary dataframe

In [10]:
# identify the downtown neighborhood (index 11) in the Calgary dataframe
calgary_data.loc[11, 'Neighborhood']

'City Centre / Calgary Tower'

In [11]:
# get latitude and longitude for the City Centre / Calgary Tower neighborhood
neighborhood_latitude = calgary_data.loc[11, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = calgary_data.loc[11, 'Longitude'] # neighborhood longitude value

neighborhood_name = calgary_data.loc[11, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of City Centre / Calgary Tower are 51.05063000000007, -114.07540999999998.


##### Retrieve the top venues in City Centre / Calgary Tower within a radius of 500 meters

In [12]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=5HR0IGISRNVFYO1VD0RBMZDP1M0WZKN1HU2IIBMSAAHYEQF4&client_secret=JDHOZEQCK2SG2YGNCXWTSXKWUZCTO4LR0WNY2X502C3FI0XK&v=20180605&ll=51.05063000000007,-114.07540999999998&radius=500&limit=100'
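
The request URL above is assembled with string formatting. As a hedged alternative, the same query string can be built with the standard library's `urlencode`, which handles escaping and keeps the parameters in one dictionary. The endpoint and parameter names come from the URL shown above. The helper name `build_explore_url` and the `DEMO_ID` / `DEMO_SECRET` values are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL from its query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


demo_url = build_explore_url(
    "DEMO_ID", "DEMO_SECRET", "20180605", 51.05063, -114.07541, 500, 100
)
# Parse the query string back to confirm every parameter survived encoding.
query = parse_qs(urlsplit(demo_url).query)
print(query["ll"], query["radius"], query["limit"])
```

With `requests`, passing the same dictionary as `requests.get(EXPLORE_ENDPOINT, params=params)` achieves the same result without building the string by hand.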

In [13]:
#retrieve results from FourSquare in json format
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ee645c6a5f59c75998ccf95'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Downtown Calgary',
  'headerFullLocation': 'Downtown Calgary, Calgary',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 51,
  'suggestedBounds': {'ne': {'lat': 51.05513000450007,
    'lng': -114.0682649733636},
   'sw': {'lat': 51.04612999550007, 'lng': -114.08255502663636}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b0586eaf964a520167522e3',
       'name': "Buchanan's",
       'location': {'address': '738 3rd Ave. SW',
        'crossStreet': '7th St.',
        'lat': 51.05081381790359,
        'lng': -114.07831957639256,
        'labeledLatLngs':

##### Transform results into dataframe

In [14]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Buchanan's | Steakhouse | 51.050814 | -114.07832 |
| 1 | Alforno Bakery & Cafe | Bakery | 51.051528 | -114.078271 |
| 2 | Gyu-Kaku Japanese BBQ | Japanese Restaurant | 51.047934 | -114.07611 |
| 3 | Q Haute Cuisine | French Restaurant | 51.05213 | -114.078855 |
| 4 | Caesar's Steak House | Eastern European Restaurant | 51.049772 | -114.072317 |


In [15]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

51 venues were returned by Foursquare.
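
For readers who want to see what the flattening step does without pandas, the sketch below walks the same `items` structure by hand and keeps the four fields selected above. It also returns `None` when a venue has no category, mirroring `get_category_type`. The function names are hypothetical, the first sample record copies values shown in the output above, and the second record is made up to exercise the empty-category case.

```python
def first_category_name(venue):
    """Return the first category name, or None when the list is empty."""
    categories = venue.get("categories") or []
    return categories[0]["name"] if categories else None


def flatten_items(items):
    """Turn Foursquare 'items' into flat rows: name, category, lat, lng."""
    rows = []
    for item in items:
        venue = item["venue"]
        rows.append({
            "name": venue["name"],
            "categories": first_category_name(venue),
            "lat": venue["location"]["lat"],
            "lng": venue["location"]["lng"],
        })
    return rows


sample_items = [
    {"venue": {"name": "Buchanan's",
               "categories": [{"name": "Steakhouse"}],
               "location": {"lat": 51.050814, "lng": -114.07832}}},
    # Made-up record with no category, to show the None fallback.
    {"venue": {"name": "Uncategorised Spot",
               "categories": [],
               "location": {"lat": 51.05, "lng": -114.07}}},
]
rows = flatten_items(sample_items)
for row in rows:
    print(row)
```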


##### Explore a downtown neighborhood in the Edmonton dataframe 

In [16]:
# identify the downtown neighborhood (index 7) in the Edmonton dataframe
edmonton_data.loc[7, 'Neighborhood']

'South\xa0Downtown\xa0/ South\xa0Downtown Fringe/AB Government'

In [17]:
# get latitude and longitude for the downtown neighborhood (index 7)
neighborhood_latitude = edmonton_data.loc[7, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = edmonton_data.loc[7, 'Longitude'] # neighborhood longitude value

neighborhood_name = edmonton_data.loc[7, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of South Downtown / South Downtown Fringe/AB Government are 53.53798695000006, -113.51027983999995.


##### Retrieve the top venues in South Downtown / South Downtown Fringe / AB Government within a radius of 750 meters

In [21]:
LIMIT = 125 # limit of number of venues returned by Foursquare API

radius = 750 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=5HR0IGISRNVFYO1VD0RBMZDP1M0WZKN1HU2IIBMSAAHYEQF4&client_secret=JDHOZEQCK2SG2YGNCXWTSXKWUZCTO4LR0WNY2X502C3FI0XK&v=20180605&ll=53.53798695000006,-113.51027983999995&radius=750&limit=125'

In [22]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ee64785237de16992e79497'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Oliver',
  'headerFullLocation': 'Oliver, Edmonton',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 64,
  'suggestedBounds': {'ne': {'lat': 53.544736956750064,
    'lng': -113.4989429386492},
   'sw': {'lat': 53.53123694325006, 'lng': -113.52161674135071}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b08bc4cf964a5209f1123e3',
       'name': 'The Common',
       'location': {'address': '9910 109 St NW',
        'lat': 53.53763503776285,
        'lng': -113.50857049590414,
        'labeledLatLngs': [{'label': 'display',
          'lat': 53.5376350377

##### Transform results into dataframe

In [23]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | The Common | Nightclub | 53.537635 | -113.50857 |
| 1 | District Coffee Co | Café | 53.538903 | -113.508257 |
| 2 | Zuppa Cafe | Breakfast Spot | 53.537059 | -113.509847 |
| 3 | Pampa Brazilian Steakhouse | Brazilian Restaurant | 53.537964 | -113.508288 |
| 4 | Central Social Hall | Bar | 53.540857 | -113.508892 |


In [24]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

64 venues were returned by Foursquare.


### Visualization

##### Map of Calgary with neighborhoods superimposed

In [25]:
# create map of Calgary using latitude and longitude values
map_calgary = folium.Map(location=[latitude_calgary,longitude_calgary], zoom_start=10)

# add markers to map
for lat, lng, label in zip(calgary_data['Latitude'], calgary_data['Longitude'], calgary_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_calgary)  
    
map_calgary

##### Map of Edmonton with neighborhoods superimposed

In [26]:
# create map of Edmonton using latitude and longitude values
map_edmonton = folium.Map(location=[latitude_edmonton,longitude_edmonton], zoom_start=10)

# add markers to map
for lat, lng, label in zip(edmonton_data['Latitude'], edmonton_data['Longitude'], edmonton_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_edmonton)  
    
map_edmonton

##### Calgary: Retrieve location data for neighborhood analysis

In [27]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
# Function to repeat the process for all the neighborhoods in Calgary
# Note: it reads the module-level `limit` variable, which is set just before the call below
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

limit=100
calgary_venues = getNearbyVenues(names=calgary_data['Neighborhood'],
                                   latitudes=calgary_data['Latitude'],
                                   longitudes=calgary_data['Longitude']
                                  )

Penbrooke Meadows / Marlborough
Forest Lawn / Dover / Erin Woods
Lynnwood Ridge / Ogden / Foothills Industrial / Great Plains
Bridgeland / Greenview / Zoo / YYC
Inglewood / Burnsland / Chinatown / East Victoria Park / Saddledome
Highfield / Burns Industrial
Queensland / Lake Bonavista / Willow Park / Acadia
Thorncliffe / Tuxedo Park
Brentwood / Collingwood / Nose Hill
Mount Pleasant / Capitol Hill / Banff Trail
Kensington / Westmont / Parkdale / University
City Centre / Calgary Tower
Connaught / West Victoria Park
Elbow Park / Britannia / Parkhill / Mission
outh(Altadore / Bankview / Richmond
Oak Ridge / Haysboro / Kingsland / Kelvin Grove / Windsor Park
Braeside / Cedarbrae / Woodbine
Midnapore / Sundance
Millrise / Somerset / Bridlewood / Evergreen
Douglas Glen / McKenzie Lake / Copperfield / East Shepard
Dalhousie / Edgemont / Hamptons / Hidden Valley
Montgomery / Bowness / Silver Springs / Greenwood
Rosscarrock / Westgate / Wildwood / Shaganappi / Sunalta
Lakeview / Glendale / Kill
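
`getNearbyVenues` depends on a global `limit` and indexes `categories[0]` directly, so a venue with no category would raise an `IndexError`. The sketch below shows one possible variant that takes `limit` as a parameter, tolerates empty category lists, and receives the fetching step as a function so it can be tried offline. `collect_nearby_venues`, `fetch_items`, and `fake_fetch` are hypothetical names, and the venue records are made up.

```python
def collect_nearby_venues(names, latitudes, longitudes, fetch_items,
                          radius=500, limit=100):
    """Collect one row per venue for every neighborhood."""
    rows = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        for item in fetch_items(lat, lng, radius, limit):
            venue = item["venue"]
            categories = venue.get("categories") or []
            rows.append((
                name,
                lat,
                lng,
                venue["name"],
                venue["location"]["lat"],
                venue["location"]["lng"],
                # Fall back to None instead of failing on an empty list.
                categories[0]["name"] if categories else None,
            ))
    return rows


def fake_fetch(lat, lng, radius, limit):
    """Offline stand-in for the Foursquare request; returns made-up venues."""
    items = [
        {"venue": {"name": "Venue A", "categories": [{"name": "Pub"}],
                   "location": {"lat": lat, "lng": lng}}},
        {"venue": {"name": "Venue B", "categories": [],
                   "location": {"lat": lat, "lng": lng}}},
    ]
    return items[:limit]


rows = collect_nearby_venues(["Demo Neighborhood"], [51.0], [-114.0],
                             fake_fetch, limit=1)
print(rows)
```

In the notebook, `fetch_items` could wrap the existing request, returning `requests.get(url).json()["response"]["groups"][0]["items"]` for the URL built from its arguments.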

##### Edmonton: Retrieve location data for neighborhood analysis

In [28]:
# Function to repeat the process for all the neighborhoods in Edmonton
# Note: it reads the module-level `limit` variable, which is set just before the call below
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

limit=125
edmonton_venues = getNearbyVenues(names=edmonton_data['Neighborhood'],
                                   latitudes=edmonton_data['Latitude'],
                                   longitudes=edmonton_data['Longitude']
                                  )

West Clareview / East Londonderry
East North Central / West Beverly
Central Londonderry
West Londonderry / East Calder
North Central / Queen Mary Park / Blatchford
North and East Downtown Fringe
North Downtown
South Downtown / South Downtown Fringe/AB Government
North Westmount / West Calder / East Mistatim
South Westmount / Groat Estate / East Northwest Industrial
Glenora / SW Downtown Fringe
North Jasper Place
Central Jasper Place / Buena Vista
West Northwest Industrial / Winterburn
West Jasper Place / West Edmonton Mall
Central Mistatim
Central Beverly
East Castle Downs
Landbank / East Lake District
West Lake District
Edmonton
Ellerslie
Heritage Valley
West Castle Downs
The Meadows
North Clover Bar
Riverbend
East Southeast Industrial / South Clover Bar
South Industrial
Southwest
East Mill Woods
West Mill Woods
Kaskitayo / Aspen Gardens
Southgate / North Riverbend
West University / Strathcona Place
South Bonnie Doon / East University
Central Bonnie Doon
SE Capilano / West Southeast I

##### Exploring neighborhoods in Calgary

In [29]:
print(calgary_venues.shape)
calgary_venues.head()

(300, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Penbrooke Meadows / Marlborough | 51.051934 | -113.95668 | Shoppers Drug Mart | 51.054478 | -113.955168 | Pharmacy |
| 1 | Penbrooke Meadows / Marlborough | 51.051934 | -113.95668 | Chelsea's pub and grill | 51.054748 | -113.954981 | Pub |
| 2 | Penbrooke Meadows / Marlborough | 51.051934 | -113.95668 | Burger King | 51.053909 | -113.955502 | Fast Food Restaurant |
| 3 | Penbrooke Meadows / Marlborough | 51.051934 | -113.95668 | Quiznos | 51.053881 | -113.954775 | Sandwich Place |
| 4 | Penbrooke Meadows / Marlborough | 51.051934 | -113.95668 | Little Caesars Pizza | 51.054691 | -113.955464 | Pizza Place |


##### Exploring neighborhoods in Edmonton

In [30]:
print(edmonton_venues.shape)
edmonton_venues.head()

(355, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | West Clareview / East Londonderry | 53.5945 | -113.40573 | Cora's | 53.597832 | -113.408026 | Breakfast Spot |
| 1 | West Clareview / East Londonderry | 53.5945 | -113.40573 | Shoppers Drug Mart | 53.597623 | -113.409703 | Pharmacy |
| 2 | West Clareview / East Londonderry | 53.5945 | -113.40573 | East Clareview Transit Centre | 53.594927 | -113.404422 | Bus Station |
| 3 | West Clareview / East Londonderry | 53.5945 | -113.40573 | Michaels | 53.598221 | -113.405771 | Arts & Crafts Store |
| 4 | West Clareview / East Londonderry | 53.5945 | -113.40573 | Dollarama | 53.597757 | -113.408492 | Discount Store |


In [31]:
# Check the number of Calgary venues returned for each neighborhood
calgary_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Braeside / Cedarbrae / Woodbine | 4 | 4 | 4 | 4 | 4 | 4 |
| Bridgeland / Greenview / Zoo / YYC | 5 | 5 | 5 | 5 | 5 | 5 |
| City Centre / Calgary Tower | 51 | 51 | 51 | 51 | 51 | 51 |
| Connaught / West Victoria Park | 93 | 93 | 93 | 93 | 93 | 93 |
| Douglas Glen / McKenzie Lake / Copperfield / East Shepard | 5 | 5 | 5 | 5 | 5 | 5 |
| Elbow Park / Britannia / Parkhill / Mission | 3 | 3 | 3 | 3 | 3 | 3 |
| Forest Lawn / Dover / Erin Woods | 4 | 4 | 4 | 4 | 4 | 4 |
| Hawkwood / Arbour Lake / Citadel / Ranchlands / Royal Oak / Rocky Ridge | 6 | 6 | 6 | 6 | 6 | 6 |
| Highfield / Burns Industrial | 7 | 7 | 7 | 7 | 7 | 7 |
| Inglewood / Burnsland / Chinatown / East Victoria Park / Saddledome | 7 | 7 | 7 | 7 | 7 | 7 |


In [32]:
# Check the number of Edmonton venues returned for each neighborhood
edmonton_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Central Beverly | 4 | 4 | 4 | 4 | 4 | 4 |
| Central Bonnie Doon | 19 | 19 | 19 | 19 | 19 | 19 |
| Central Jasper Place / Buena Vista | 6 | 6 | 6 | 6 | 6 | 6 |
| Central Londonderry | 4 | 4 | 4 | 4 | 4 | 4 |
| Central Mistatim | 1 | 1 | 1 | 1 | 1 | 1 |
| East North Central / West Beverly | 4 | 4 | 4 | 4 | 4 | 4 |
| East Castle Downs | 1 | 1 | 1 | 1 | 1 | 1 |
| East Mill Woods | 6 | 6 | 6 | 6 | 6 | 6 |
| Edmonton | 1 | 1 | 1 | 1 | 1 | 1 |
| Glenora / SW Downtown Fringe | 3 | 3 | 3 | 3 | 3 | 3 |


In [33]:
# Identify the number of unique categories curated from all the Calgary returned venues
print('There are {} unique categories.'.format(len(calgary_venues['Venue Category'].unique())))

There are 104 unique categories.


In [34]:
# Identify the number of unique categories curated from all the Edmonton returned venues
print('There are {} unique categories.'.format(len(edmonton_venues['Venue Category'].unique())))

There are 127 unique categories.
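
The category counts alone do not show how much the two cities overlap. A set comparison is one simple way to look at that. The sketch below uses a small hand-picked subset of category names taken from the one-hot column headers in this notebook, not the full lists, so the numbers it prints are illustrative only.

```python
# Hand-picked subsets of venue categories, for illustration only.
calgary_categories = {"Pub", "Pharmacy", "Steakhouse",
                      "Light Rail Station", "Poutine Place"}
edmonton_categories = {"Pub", "Pharmacy", "Steakhouse",
                       "Nightclub", "Hockey Arena"}

shared = calgary_categories & edmonton_categories
only_calgary = calgary_categories - edmonton_categories
only_edmonton = edmonton_categories - calgary_categories

# Jaccard similarity: size of the overlap divided by size of the union.
jaccard = len(shared) / len(calgary_categories | edmonton_categories)

print("Shared:", sorted(shared))
print("Only in Calgary subset:", sorted(only_calgary))
print("Only in Edmonton subset:", sorted(only_edmonton))
print("Jaccard similarity: {:.2f}".format(jaccard))
```

With the real data, the two sets could be built from `calgary_venues['Venue Category'].unique()` and `edmonton_venues['Venue Category'].unique()`.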


### Modelling: Analysing Each Calgary Neighborhood

In [36]:
# one hot encoding
calgary_onehot = pd.get_dummies(calgary_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
calgary_onehot['Neighborhood'] = calgary_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [calgary_onehot.columns[-1]] + list(calgary_onehot.columns[:-1])
calgary_onehot = calgary_onehot[fixed_columns]

calgary_onehot.head()

Unnamed: 0,Neighborhood,ATM,American Restaurant,Asian Restaurant,Bakery,Bank,Bar,Baseball Field,Boat or Ferry,Bookstore,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Bus Station,Business Service,Café,Camera Store,Chinese Restaurant,Chocolate Shop,Cocktail Bar,Coffee Shop,Comedy Club,Concert Hall,Construction & Landscaping,Convenience Store,Dance Studio,Department Store,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Food & Drink Shop,Food Court,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,History Museum,Home Service,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,Museum,Music Store,Music Venue,New American Restaurant,Other Repair Shop,Park,Pharmacy,Pizza Place,Playground,Poutine Place,Pub,Ramen Restaurant,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Rock Club,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shop & Service,Skating Rink,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theme Park,Vietnamese Restaurant,Warehouse Store,Wine Bar,Yoga Studio
0,Penbrooke Meadows / Marlborough,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Penbrooke Meadows / Marlborough,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Penbrooke Meadows / Marlborough,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Penbrooke Meadows / Marlborough,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Penbrooke Meadows / Marlborough,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### Modelling: Analysing Each Edmonton Neighborhood

In [37]:
# one hot encoding
edmonton_onehot = pd.get_dummies(edmonton_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
edmonton_onehot['Neighborhood'] = edmonton_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [edmonton_onehot.columns[-1]] + list(edmonton_onehot.columns[:-1])
edmonton_onehot = edmonton_onehot[fixed_columns]

edmonton_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bank,Bar,Bookstore,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Bus Station,Business Service,Café,Caribbean Restaurant,Carpet Store,Casino,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,College Cafeteria,College Gym,College Residence Hall,Community Center,Convenience Store,Cosmetics Shop,Creperie,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Eastern European Restaurant,Electronics Store,Event Space,Fabric Shop,Factory,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Gas Station,Gastropub,Gay Bar,Gift Shop,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Hobby Shop,Hockey Arena,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Liquor Store,Lounge,Martial Arts Dojo,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Museum,Music Store,New American Restaurant,Nightclub,Noodle House,Office,Optical Shop,Other Repair Shop,Paintball Field,Park,Pet Store,Pharmacy,Pizza Place,Pool,Pool Hall,Pub,Recreation Center,Rental Car Location,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Shoe Store,Shop & Service,Shopping Mall,Smoothie Shop,Sporting Goods Shop,Sports Bar,Stables,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Shop,Women's Store,Yoga Studio
0,West Clareview / East Londonderry,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,West Clareview / East Londonderry,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,West Clareview / East Londonderry,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,West Clareview / East Londonderry,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,West Clareview / East Londonderry,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [38]:
calgary_onehot.shape

(300, 105)

In [39]:
edmonton_onehot.shape

(355, 128)

In [42]:
# Calgary: group rows by neighborhood, taking the mean frequency of occurrence of each category
calgary_grouped = calgary_onehot.groupby('Neighborhood').mean().reset_index()

calgary_grouped.shape

(26, 105)

In [43]:
# Edmonton: group rows by neighborhood, taking the mean frequency of occurrence of each category
edmonton_grouped = edmonton_onehot.groupby('Neighborhood').mean().reset_index()

edmonton_grouped.shape

(33, 128)
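
To make the one-hot-then-average step concrete, here is a minimal sketch on made-up data. The neighborhood names, the venues, and the `toy_` variable names are all hypothetical; only the `get_dummies` and `groupby(...).mean()` pattern mirrors the cells above.

```python
import pandas as pd

# Hypothetical toy data: five venues across two made-up neighborhoods
toy_venues = pd.DataFrame({
    "Neighborhood": ["North", "North", "North", "North", "South"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Pub", "Park", "Park"],
})

# One row per venue, one 0/1 column per category
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Averaging the 0/1 columns gives each category's share of a neighborhood's venues
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Each row of the grouped table sums to 1, so a value such as 0.25 reads as "a quarter of the venues returned for this neighborhood fall into this category".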

In [44]:
# Print each Calgary neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in calgary_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = calgary_grouped[calgary_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Braeside / Cedarbrae / Woodbine----
                        venue  freq
0                         ATM  0.25
1  Construction & Landscaping  0.25
2           Other Repair Shop  0.25
3        Fast Food Restaurant  0.25
4  Modern European Restaurant  0.00


----Bridgeland / Greenview / Zoo / YYC----
                    venue  freq
0      Italian Restaurant   0.2
1  Furniture / Home Store   0.2
2          Hardware Store   0.2
3          Breakfast Spot   0.2
4            Gourmet Shop   0.2


----City Centre / Calgary Tower----
            venue  freq
0     Coffee Shop  0.18
1  Sandwich Place  0.08
2            Café  0.08
3           Hotel  0.06
4             Pub  0.06


----Connaught / West Victoria Park----
                   venue  freq
0            Coffee Shop  0.09
1             Restaurant  0.06
2                    Pub  0.06
3  Vietnamese Restaurant  0.04
4                   Café  0.04


----Douglas Glen / McKenzie Lake / Copperfield / East Shepard----
                  venue  freq


In [45]:
# Print each Edmonton neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in edmonton_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = edmonton_grouped[edmonton_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central Beverly----
                  venue  freq
0        Sandwich Place  0.25
1       Thai Restaurant  0.25
2  Fast Food Restaurant  0.25
3  Caribbean Restaurant  0.25
4                  Park  0.00


----Central Bonnie Doon----
               venue  freq
0               Bank  0.11
1  Electronics Store  0.05
2     Breakfast Spot  0.05
3  French Restaurant  0.05
4        Gas Station  0.05


----Central Jasper Place / Buena Vista----
               venue  freq
0       Liquor Store  0.17
1     Sandwich Place  0.17
2        Pizza Place  0.17
3  Convenience Store  0.17
4             Bakery  0.17


----Central Londonderry----
               venue  freq
0         Food Court  0.25
1  Recreation Center  0.25
2                Gym  0.25
3  Martial Arts Dojo  0.25
4             Office  0.00


----Central Mistatim----
               venue  freq
0  Electronics Store   1.0
1  Accessories Store   0.0
2              Hotel   0.0
3        Pizza Place   0.0
4           Pharmacy   0.0


----East North

##### Calgary dataframe with top neighborhood venues

In [50]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no special suffix past 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new Calgary dataframe
cal_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
cal_neighborhoods_venues_sorted['Neighborhood'] = calgary_grouped['Neighborhood']

for ind in np.arange(calgary_grouped.shape[0]):
    cal_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(calgary_grouped.iloc[ind, :], num_top_venues)

cal_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Braeside / Cedarbrae / Woodbine,ATM,Construction & Landscaping,Other Repair Shop,Fast Food Restaurant,Department Store,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market
1,Bridgeland / Greenview / Zoo / YYC,Furniture / Home Store,Italian Restaurant,Hardware Store,Gourmet Shop,Breakfast Spot,French Restaurant,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant
2,City Centre / Calgary Tower,Coffee Shop,Sandwich Place,Café,Restaurant,Bar,Hotel,Pub,Indian Restaurant,Light Rail Station,Sushi Restaurant
3,Connaught / West Victoria Park,Coffee Shop,Pub,Restaurant,Vietnamese Restaurant,Bar,Café,Brewery,Hotel,French Restaurant,Pizza Place
4,Douglas Glen / McKenzie Lake / Copperfield / East Shepard,Shop & Service,Gym / Fitness Center,Sandwich Place,Park,Hardware Store,Gym,Department Store,Diner,Discount Store,Donut Shop


##### Edmonton dataframe with top neighborhood venues

In [46]:
# create a new Edmonton dataframe
edm_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
edm_neighborhoods_venues_sorted['Neighborhood'] = edmonton_grouped['Neighborhood']

for ind in np.arange(edmonton_grouped.shape[0]):
    edm_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(edmonton_grouped.iloc[ind, :], num_top_venues)

edm_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Beverly,Thai Restaurant,Caribbean Restaurant,Sandwich Place,Fast Food Restaurant,Yoga Studio,Falafel Restaurant,Factory,Fabric Shop,Event Space,Electronics Store
1,Central Bonnie Doon,Bank,Liquor Store,Gas Station,Clothing Store,Coffee Shop,French Restaurant,Electronics Store,Breakfast Spot,Eastern European Restaurant,Bowling Alley
2,Central Jasper Place / Buena Vista,Convenience Store,Pizza Place,Sandwich Place,Bakery,Liquor Store,Sushi Restaurant,Electronics Store,Factory,Fabric Shop,Event Space
3,Central Londonderry,Gym,Recreation Center,Food Court,Martial Arts Dojo,Yoga Studio,Electronics Store,Factory,Fabric Shop,Event Space,Discount Store
4,Central Mistatim,Electronics Store,Yoga Studio,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Eastern European Restaurant,Event Space
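
Several rows above list categories whose frequency is zero. Central Mistatim, for example, has a single category with a frequency of 1.0, yet ten "most common" venues are shown, because the sort simply continues into the zero-frequency columns. The sketch below is a possible variant, not the function used in this notebook: the helper name `top_venues_nonzero` and the toy row are hypothetical.

```python
import pandas as pd

def top_venues_nonzero(row, num_top_venues):
    """Return up to num_top_venues category names, skipping zero-frequency ones."""
    freqs = row.iloc[1:].astype(float)  # drop the leading Neighborhood entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top_venues])

# Hypothetical grouped row in which only two categories are actually present
toy_row = pd.Series(
    {"Neighborhood": "Toyville", "ATM": 0.75, "Bakery": 0.0, "Bank": 0.0, "Pub": 0.25}
)
print(top_venues_nonzero(toy_row, 3))
```

A variant like this returns lists of different lengths, so filling a fixed-width dataframe with it would need padding for the missing positions.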


### Clustering Calgary Neighborhoods

In [47]:
# set number of clusters
kclusters = 7

# drop the label column so that only the numeric frequency columns are clustered
calgary_grouped_clustering = calgary_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(calgary_grouped_clustering)

# check cluster labels generated for each row in the dataframe
labels = kmeans.labels_
labels

array([5, 0, 0, 0, 4, 4, 6, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 3,
       0, 5, 0, 0], dtype=int32)
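
A quick tally of the labels shows how unevenly the neighborhoods are spread across the clusters. The sketch below copies the array printed above into a plain list (the name `calgary_labels` is introduced here for illustration) and counts each label with the standard library.

```python
from collections import Counter

# Cluster labels copied from the Calgary k-means output above
calgary_labels = [5, 0, 0, 0, 4, 4, 6, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 3,
                  0, 5, 0, 0]

cluster_sizes = Counter(calgary_labels)
for label in sorted(cluster_sizes):
    print(f"label {label}: {cluster_sizes[label]} neighborhoods")
```

Label 0 accounts for 18 of the 26 grouped neighborhoods, and four of the labels hold a single neighborhood each.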

In [52]:
# add clustering labels
cal_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

calgary_merged = calgary_data

# join the sorted venue table (with cluster labels) onto calgary_data, which supplies latitude/longitude for each neighborhood
calgary_merged = calgary_merged.join(cal_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

calgary_merged.head() # check the first columns!

Unnamed: 0,Neighborhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Penbrooke Meadows / Marlborough,T2A,51.051934,-113.95668,0.0,Pizza Place,Pharmacy,Pub,Fast Food Restaurant,Sandwich Place,Dance Studio,Department Store,Diner,Discount Store,Donut Shop
1,Forest Lawn / Dover / Erin Woods,T2B,51.02711,-113.96678,6.0,Playground,Liquor Store,Skating Rink,Food Court,Department Store,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant
2,Lynnwood Ridge / Ogden / Foothills Industrial / Great Plains,T2C,50.979966,-113.967481,2.0,Music Venue,Yoga Studio,Hotel,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant
3,Bridgeland / Greenview / Zoo / YYC,T2E,51.086868,-114.050843,0.0,Furniture / Home Store,Italian Restaurant,Hardware Store,Gourmet Shop,Breakfast Spot,French Restaurant,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant
4,Inglewood / Burnsland / Chinatown / East Victoria Park / Saddledome,T2G,51.028627,-114.035519,0.0,Sporting Goods Shop,Brewery,Comedy Club,Farmers Market,Sports Bar,Café,French Restaurant,Discount Store,Donut Shop,Eastern European Restaurant


In [53]:
# Clean up the data for further analysis
import numpy as np  # needed for np.nan and the int64 dtype

# treat empty strings as missing values
calgary_merged.replace("", np.nan, inplace=True)

# drop neighborhoods that have no cluster label after the join
calgary_merged.dropna(subset=["Cluster Labels"], inplace=True)

# the join left the labels as floats (e.g. 0.0); cast them back to integers
calgary_merged["Cluster Labels"] = calgary_merged["Cluster Labels"].astype(np.int64)
calgary_merged.head()

Unnamed: 0,Neighborhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Penbrooke Meadows / Marlborough,T2A,51.051934,-113.95668,0,Pizza Place,Pharmacy,Pub,Fast Food Restaurant,Sandwich Place,Dance Studio,Department Store,Diner,Discount Store,Donut Shop
1,Forest Lawn / Dover / Erin Woods,T2B,51.02711,-113.96678,6,Playground,Liquor Store,Skating Rink,Food Court,Department Store,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant
2,Lynnwood Ridge / Ogden / Foothills Industrial / Great Plains,T2C,50.979966,-113.967481,2,Music Venue,Yoga Studio,Hotel,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant
3,Bridgeland / Greenview / Zoo / YYC,T2E,51.086868,-114.050843,0,Furniture / Home Store,Italian Restaurant,Hardware Store,Gourmet Shop,Breakfast Spot,French Restaurant,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant
4,Inglewood / Burnsland / Chinatown / East Victoria Park / Saddledome,T2G,51.028627,-114.035519,0,Sporting Goods Shop,Brewery,Comedy Club,Farmers Market,Sports Bar,Café,French Restaurant,Discount Store,Donut Shop,Eastern European Restaurant


In [54]:
# create map
map_clusters = folium.Map(location=[latitude_calgary, longitude_calgary], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(calgary_merged['Latitude'], calgary_merged['Longitude'], calgary_merged['Neighborhood'], calgary_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
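
Because the map cell picks colours with `rainbow[cluster-1]`, label 0 wraps around to the last colour in the palette rather than the first. The sketch below rebuilds the same palette and prints the colour each label receives; the dictionary name `label_to_colour` is introduced here for illustration.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 7  # same value as in the clustering cell

# Same palette construction as the map cell, without the unused x/ys helpers
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# The map cell indexes the palette with cluster-1, so label 0 maps to rainbow[-1]
label_to_colour = {label: rainbow[label - 1] for label in range(kclusters)}
for label, hex_colour in label_to_colour.items():
    print(f"label {label} (heading 'Cluster {label + 1}'): {hex_colour}")
```

Printing the hex codes this way is a simple means of checking the colour named under each cluster heading against what the map actually draws.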

## Cluster 1

#### Represented by red circles on the map

In [73]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 0, calgary_merged.columns[[0] + list(range(5, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Penbrooke Meadows / Marlborough,Pizza Place,Pharmacy,Pub,Fast Food Restaurant,Sandwich Place,Dance Studio,Department Store,Diner,Discount Store,Donut Shop
3,Bridgeland / Greenview / Zoo / YYC,Furniture / Home Store,Italian Restaurant,Hardware Store,Gourmet Shop,Breakfast Spot,French Restaurant,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant
4,Inglewood / Burnsland / Chinatown / East Victoria Park / Saddledome,Sporting Goods Shop,Brewery,Comedy Club,Farmers Market,Sports Bar,Café,French Restaurant,Discount Store,Donut Shop,Eastern European Restaurant
5,Highfield / Burns Industrial,Warehouse Store,Pizza Place,Asian Restaurant,Discount Store,Coffee Shop,Fast Food Restaurant,French Restaurant,Diner,Donut Shop,Eastern European Restaurant
6,Queensland / Lake Bonavista / Willow Park / Acadia,Dance Studio,Baseball Field,Chinese Restaurant,Furniture / Home Store,Hardware Store,Food Court,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant
7,Thorncliffe / Tuxedo Park,Liquor Store,Convenience Store,Bank,Coffee Shop,Supermarket,Vietnamese Restaurant,Pharmacy,Sandwich Place,Discount Store,Diner
9,Mount Pleasant / Capitol Hill / Banff Trail,Massage Studio,Vietnamese Restaurant,Pub,Rental Car Location,Gas Station,Bookstore,Mediterranean Restaurant,Yoga Studio,Diner,Discount Store
10,Kensington / Westmont / Parkdale / University,Café,Yoga Studio,Pharmacy,Bank,Bar,Camera Store,Coffee Shop,Dance Studio,Department Store,Fast Food Restaurant
11,City Centre / Calgary Tower,Coffee Shop,Sandwich Place,Café,Restaurant,Bar,Hotel,Pub,Indian Restaurant,Light Rail Station,Sushi Restaurant
12,Connaught / West Victoria Park,Coffee Shop,Pub,Restaurant,Vietnamese Restaurant,Bar,Café,Brewery,Hotel,French Restaurant,Pizza Place


## Cluster 2
#### Represented by purple circles on the map

In [74]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 1, calgary_merged.columns[[0] + list(range(5, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,Martindale / Taradale / Falconridge / Saddle Ridge,Restaurant,Yoga Studio,Dance Studio,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant


## Cluster 3
#### Represented by light blue circles on the map

In [75]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 2, calgary_merged.columns[[0] + list(range(5, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Lynnwood Ridge / Ogden / Foothills Industrial / Great Plains,Music Venue,Yoga Studio,Hotel,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant


## Cluster 4
#### Represented by light green circles on the map

In [76]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 3, calgary_merged.columns[[0] + list(range(5, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Rosscarrock / Westgate / Wildwood / Shaganappi / Sunalta,Café,Yoga Studio,Department Store,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Financial or Legal Service


## Cluster 5
#### Represented by orange circles on the map

In [77]:
calgary_merged.loc[calgary_merged['Cluster Labels'] == 4, calgary_merged.columns[[0] + list(range(5, calgary_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Elbow Park / Britannia / Parkhill / Mission,Park,River,Yoga Studio,Food Court,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market
19,Douglas Glen / McKenzie Lake / Copperfield / East Shepard,Shop & Service,Gym / Fitness Center,Sandwich Place,Park,Hardware Store,Gym,Department Store,Diner,Discount Store,Donut Shop


### Clustering Edmonton Neighborhoods

In [90]:
# set number of clusters
kclusters = 7

# drop the label column so that only the numeric frequency columns are clustered
edmonton_grouped_clustering = edmonton_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(edmonton_grouped_clustering)

# check cluster labels generated for each row in the dataframe
labels = kmeans.labels_
labels

array([0, 0, 0, 0, 2, 0, 1, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0,
       0, 0, 0, 6, 4, 0, 0, 0, 0, 2, 0], dtype=int32)
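
The value `kclusters = 7` is set by hand, and the labels above place 26 of the 33 Edmonton neighborhoods under label 0. One common, though not the only, way to sanity-check a choice of k is the silhouette score. The sketch below is an illustration on synthetic blobs, not on the project data: `features`, the blob centres, and the range of k values are all assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the grouped frequency table (an assumption, not project data)
features, _ = make_blobs(
    n_samples=60,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

scores = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    # silhouette ranges from -1 to 1; higher means tighter, better separated clusters
    scores[k] = silhouette_score(features, model.labels_)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("highest silhouette at k =", best_k)
```

With the notebook's data, `edmonton_grouped_clustering` would take the place of `features`. A low score at every k would suggest that the venue frequencies do not separate the neighborhoods into distinct groups.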

In [101]:
# add clustering labels
edm_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

edmonton_merged = edmonton_data

# join the sorted venue table (with cluster labels) onto edmonton_data, which supplies latitude/longitude for each neighborhood
edmonton_merged = edmonton_merged.join(edm_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

edmonton_merged.head() # check the first columns!

Unnamed: 0,Neighborhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,West Clareview / East Londonderry,T5A,53.5945,-113.40573,0.0,Pharmacy,Ice Cream Shop,Bus Station,Breakfast Spot,Discount Store,Arts & Crafts Store,Pizza Place,Department Store,Deli / Bodega,Dessert Shop
1,East North Central / West Beverly,T5B,53.573905,-113.443,0.0,Hockey Arena,Fabric Shop,Park,Convenience Store,Grocery Store,Gift Shop,Deli / Bodega,Department Store,Gym / Fitness Center,Dessert Shop
2,Central Londonderry,T5C,53.599927,-113.454335,0.0,Gym,Recreation Center,Food Court,Martial Arts Dojo,Yoga Studio,Electronics Store,Factory,Fabric Shop,Event Space,Discount Store
3,West Londonderry / East Calder,T5E,53.59957,-113.495145,0.0,Fast Food Restaurant,Coffee Shop,Pharmacy,Pizza Place,Discount Store,Flower Shop,Juice Bar,Bubble Tea Shop,Supermarket,Japanese Restaurant
4,North Central / Queen Mary Park / Blatchford,T5G,53.56806,-113.5074,0.0,Coffee Shop,Pizza Place,Hotel,Liquor Store,Optical Shop,College Cafeteria,Restaurant,Health & Beauty Service,Bar,Food & Drink Shop


In [103]:
# Clean up the data for further analysis
import numpy as np  # needed for np.nan and the int64 dtype

# treat empty strings as missing values
edmonton_merged.replace("", np.nan, inplace=True)

# drop neighborhoods that have no cluster label after the join
edmonton_merged.dropna(subset=["Cluster Labels"], inplace=True)

# the join left the labels as floats (e.g. 0.0); cast them back to integers
edmonton_merged["Cluster Labels"] = edmonton_merged["Cluster Labels"].astype(np.int64)
edmonton_merged.head()

Unnamed: 0,Neighborhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,West Clareview / East Londonderry,T5A,53.5945,-113.40573,0,Pharmacy,Ice Cream Shop,Bus Station,Breakfast Spot,Discount Store,Arts & Crafts Store,Pizza Place,Department Store,Deli / Bodega,Dessert Shop
1,East North Central / West Beverly,T5B,53.573905,-113.443,0,Hockey Arena,Fabric Shop,Park,Convenience Store,Grocery Store,Gift Shop,Deli / Bodega,Department Store,Gym / Fitness Center,Dessert Shop
2,Central Londonderry,T5C,53.599927,-113.454335,0,Gym,Recreation Center,Food Court,Martial Arts Dojo,Yoga Studio,Electronics Store,Factory,Fabric Shop,Event Space,Discount Store
3,West Londonderry / East Calder,T5E,53.59957,-113.495145,0,Fast Food Restaurant,Coffee Shop,Pharmacy,Pizza Place,Discount Store,Flower Shop,Juice Bar,Bubble Tea Shop,Supermarket,Japanese Restaurant
4,North Central / Queen Mary Park / Blatchford,T5G,53.56806,-113.5074,0,Coffee Shop,Pizza Place,Hotel,Liquor Store,Optical Shop,College Cafeteria,Restaurant,Health & Beauty Service,Bar,Food & Drink Shop


In [104]:
# create map
map_clusters = folium.Map(location=[latitude_edmonton, longitude_edmonton], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(edmonton_merged['Latitude'], edmonton_merged['Longitude'], edmonton_merged['Neighborhood'], edmonton_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Cluster 1

In [105]:
edmonton_merged.loc[edmonton_merged['Cluster Labels'] == 0, edmonton_merged.columns[[0] + list(range(5, edmonton_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,West Clareview / East Londonderry,Pharmacy,Ice Cream Shop,Bus Station,Breakfast Spot,Discount Store,Arts & Crafts Store,Pizza Place,Department Store,Deli / Bodega,Dessert Shop
1,East North Central / West Beverly,Hockey Arena,Fabric Shop,Park,Convenience Store,Grocery Store,Gift Shop,Deli / Bodega,Department Store,Gym / Fitness Center,Dessert Shop
2,Central Londonderry,Gym,Recreation Center,Food Court,Martial Arts Dojo,Yoga Studio,Electronics Store,Factory,Fabric Shop,Event Space,Discount Store
3,West Londonderry / East Calder,Fast Food Restaurant,Coffee Shop,Pharmacy,Pizza Place,Discount Store,Flower Shop,Juice Bar,Bubble Tea Shop,Supermarket,Japanese Restaurant
4,North Central / Queen Mary Park / Blatchford,Coffee Shop,Pizza Place,Hotel,Liquor Store,Optical Shop,College Cafeteria,Restaurant,Health & Beauty Service,Bar,Food & Drink Shop
5,North and East Downtown Fringe,Chinese Restaurant,Vietnamese Restaurant,Asian Restaurant,Korean Restaurant,Vegetarian / Vegan Restaurant,Bakery,Thai Restaurant,Museum,Rental Car Location,Middle Eastern Restaurant
6,North Downtown,Coffee Shop,Sandwich Place,Italian Restaurant,Pub,Café,Restaurant,Fast Food Restaurant,Pharmacy,Steakhouse,Brewery
7,South Downtown / South Downtown Fringe/AB Government,Coffee Shop,Bar,Sandwich Place,Restaurant,Nightclub,Breakfast Spot,Park,Café,Italian Restaurant,Japanese Restaurant
8,North Westmount / West Calder / East Mistatim,Hobby Shop,Sporting Goods Shop,Carpet Store,Fast Food Restaurant,Business Service,Hotel,Grocery Store,Department Store,Dessert Shop,Diner
9,South Westmount / Groat Estate / East Northwest Industrial,Fast Food Restaurant,Yoga Studio,Japanese Restaurant,Factory,Vietnamese Restaurant,Thrift / Vintage Store,Coffee Shop,Falafel Restaurant,Fabric Shop,Event Space


### Examine Cluster 2

In [106]:
edmonton_merged.loc[edmonton_merged['Cluster Labels'] == 1, edmonton_merged.columns[[0] + list(range(5, edmonton_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,East Castle Downs,Park,Yoga Studio,Hobby Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Eastern European Restaurant,Electronics Store


### Examine Cluster 3

In [107]:
edmonton_merged.loc[edmonton_merged['Cluster Labels'] == 2, edmonton_merged.columns[[0] + list(range(5, edmonton_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,West Northwest Industrial / Winterburn,Electronics Store,Yoga Studio,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Eastern European Restaurant,Event Space
15,Central Mistatim,Electronics Store,Yoga Studio,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Eastern European Restaurant,Event Space


### Examine Cluster 4

In [108]:
edmonton_merged.loc[edmonton_merged['Cluster Labels'] == 3, edmonton_merged.columns[[0] + list(range(5, edmonton_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Edmonton,Stables,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Eastern European Restaurant,Electronics Store,Event Space


### Examine Cluster 5

In [109]:
edmonton_merged.loc[edmonton_merged['Cluster Labels'] == 4, edmonton_merged.columns[[0] + list(range(5, edmonton_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,West Castle Downs,Women's Store,Hobby Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Eastern European Restaurant,Electronics Store,Event Space


## Discussion

This analysis compares Calgary and Edmonton neighborhoods using data from FourSquare to identify similarities and differences in the number and categories of venues in each city. Overall, it presents a process for scraping postal code data from the web, geocoding it, and passing the coordinates to the FourSquare API to retrieve venue data. The neighborhoods are then grouped with k-means clustering, an unsupervised machine learning algorithm that groups similar observations together to reveal underlying patterns in the data.

The results show that despite a population deficit of ~85k, Edmonton leads in the total number of venues by 16%. Edmonton also leads in the number of unique categories by 22%. This is where the FourSquare venue categories become questionable and deserve further investigation.
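
Percentages like these depend on which city is used as the base. As a rough check, the sketch below assumes that the venue totals are the row counts of the one-hot tables printed earlier (300 and 355) and that the category counts are their column counts minus the Neighborhood column (104 and 127). The helper name `lead_percent` is hypothetical.

```python
# Figures read from the shapes printed earlier. Treating each one-hot row as one venue
# and every column except 'Neighborhood' as one category is an assumption.
n_calgary_venues, n_edmonton_venues = 300, 355
n_calgary_categories, n_edmonton_categories = 105 - 1, 128 - 1

def lead_percent(leader, other, base):
    """Percentage by which leader exceeds other, relative to the chosen base."""
    return 100 * (leader - other) / base

venue_lead_vs_calgary = lead_percent(n_edmonton_venues, n_calgary_venues, n_calgary_venues)
venue_lead_vs_edmonton = lead_percent(n_edmonton_venues, n_calgary_venues, n_edmonton_venues)
category_lead_vs_calgary = lead_percent(
    n_edmonton_categories, n_calgary_categories, n_calgary_categories
)

print(f"venues, relative to Calgary's total:      {venue_lead_vs_calgary:.1f}%")
print(f"venues, relative to Edmonton's total:     {venue_lead_vs_edmonton:.1f}%")
print(f"categories, relative to Calgary's total:  {category_lead_vs_calgary:.1f}%")
```

Under these assumptions the venue lead is about 18% relative to Calgary's total and about 16% relative to Edmonton's, while the category lead is about 22% relative to Calgary's.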

Edmonton's data contains 23 more venue categories than Calgary's (127 versus 104). Does this mean that Edmonton has a more diverse set of venues, or does it imply that the FourSquare categories are applied differently in each city?

A quick observation suggests that at least some similar venues are categorized differently in each city. For instance, Calgary has a ‘Donut’ category with 8 venues assigned to it, while in Edmonton this category does not exist. Does this mean that people do not eat donuts in Edmonton, or does it imply that a different set of criteria is used to assign categories in each city? The presence of Tim Hortons shops in Edmonton suggests the latter.
 
In another example, Edmonton has a Deli / Bodega category that contains 8 venues, and this category does not exist in the Calgary data. This does not mean that Calgary has no venues that fit this category; rather, such venues are likely assigned to another existing category by an unknown set of criteria. 

These examples suggest that the FourSquare data used in this analysis lacks consistency, which in turn affects the results. Further analysis is required to determine the criteria used by FourSquare to assign venues to categories in each city. 
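One way such a follow-up check might be approached is sketched below: find the categories that appear in only one city, then flag venue names that occur in both cities under different categories. The records, venue names, category assignments, and helper function names are all invented for illustration and are not taken from the FourSquare results.

```python
# Toy records invented for illustration: (city, venue name, category).
# These category assignments are assumptions, not actual FourSquare data.
venues = [
    ("Calgary", "Tim Hortons", "Donut Shop"),
    ("Calgary", "Local Roasters", "Coffee Shop"),
    ("Edmonton", "Tim Hortons", "Coffee Shop"),
    ("Edmonton", "Corner Deli", "Deli / Bodega"),
]


def categories_by_city(records):
    """Map each city to the set of categories seen in it."""
    result = {}
    for city, _name, category in records:
        result.setdefault(city, set()).add(category)
    return result


def inconsistent_venues(records):
    """Return venue names found in more than one city whose categories differ by city."""
    seen = {}
    for city, name, category in records:
        seen.setdefault(name, {}).setdefault(city, set()).add(category)
    return {
        name: per_city
        for name, per_city in seen.items()
        if len(per_city) > 1
        and len({frozenset(cats) for cats in per_city.values()}) > 1
    }


cats = categories_by_city(venues)
only_calgary = cats["Calgary"] - cats["Edmonton"]   # categories unique to Calgary
only_edmonton = cats["Edmonton"] - cats["Calgary"]  # categories unique to Edmonton
flagged = inconsistent_venues(venues)

print("Only in Calgary:", sorted(only_calgary))
print("Only in Edmonton:", sorted(only_edmonton))
print("Same name, different category:", sorted(flagged))
```

A chain that shows up in the flagged list would be a candidate for manual review before concluding that one city truly lacks a category.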


## Conclusion

In conclusion, this analysis was intended to be a simplistic and lighthearted approach to the question: 
“Calgary or Edmonton – Which city is better to live in?” 

A severe limitation of this approach lies both in its simplicity and in the accuracy of its data. A future approach would include other socioeconomic factors, such as real estate and rental prices, commute time, weather, and neighborhood walkability, to provide a more holistic picture. In addition, the accuracy of the analysis depends on the data retrieved from FourSquare, and the inconsistencies discussed above require further investigation. Therefore, the results of this analysis are inconclusive.

The question remains: “Calgary or Edmonton – Which city is better to live in?”

This analysis was designed to meet the Battle of the Neighborhoods capstone requirement for the IBM Data Science Professional Certificate offered by Coursera. These requirements were met through the following steps:

•	Data acquisition and preparation: web scraping with various Python libraries

•	Data processing: geocoding the data to pass to the FourSquare API for venue data retrieval

•	Data exploration: using Folium to visualize the data

•	Analysis: applying one-hot encoding and k-means clustering to uncover patterns

•	Communicate results
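The analysis step listed above can be sketched roughly as follows, assuming pandas and scikit-learn are available. The neighborhoods, venue categories, and cluster count are toy values chosen for illustration and are not the project's data or settings; only the `Cluster Labels` column name mirrors the notebook.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy venue data invented for illustration (not the project's results)
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "C", "C"],
    "Venue Category": ["Coffee Shop", "Pizza Place",
                       "Coffee Shop", "Pizza Place",
                       "Park", "Trail"],
})

# One-hot encode the venue categories, one row per venue
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Average per neighborhood to get the frequency of each category
grouped = onehot.groupby("Neighborhood").mean().reset_index()

# Cluster neighborhoods on their category frequencies
features = grouped.drop(columns="Neighborhood")
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
grouped.insert(0, "Cluster Labels", kmeans.labels_)

print(grouped[["Cluster Labels", "Neighborhood"]])
```

In this toy example, neighborhoods A and B share the same venue mix and fall into one cluster, while C lands in the other. With real data, the number of clusters is a judgment call that is worth testing with several values.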

Data science can be used to answer a myriad of questions, and there is plenty of free data available to satisfy the most curious minds. Go ahead and start exploring; you may find insights into your most pressing questions.
