# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening a **restaurant** in **Alberta**, Canada.

Since there are lots of restaurants in Alberta, we will try to detect **locations that are not already crowded with restaurants**. We would also prefer locations **as close to the city center as possible**. Additionally, if the data supports it, we may consider a specific cuisine such as a Thai restaurant.

We will use our data science powers to generate a shortlist of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that stakeholders can choose the best possible final location.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* distance of the neighborhood from the city center

The following data sources will be needed to extract/generate the required information:
* borough, neighborhood, latitude, and longitude for each postal code, taken from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_T
* number of restaurants and their type and location in every neighborhood, obtained using the **Foursquare API**
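
The second factor, distance from the city center, can be computed directly from latitude/longitude pairs. The following is a minimal sketch, assuming the haversine formula on a spherical Earth is accurate enough at city scale; the function name `haversine_km` is illustrative and not part of the original notebook.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius; an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Coordinates taken from this notebook: the geocoded Edmonton center
# and the T6A (North Capilano) postal code centroid.
center = (53.535411, -113.507996)
north_capilano = (53.5483, -113.408)
print(round(haversine_km(*center, *north_capilano), 1), "km")
```

Sorting neighborhoods by this distance would give one simple way to express the "as close to the city center as possible" preference.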

## Methodology  <a name="methodology"></a>

First, we need the neighborhood information.

Let's scrape it from the Wikipedia page using BeautifulSoup.

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle HTTP requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

from bs4 import BeautifulSoup # library to parse HTML

In [3]:
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_T")                                                  
soup = BeautifulSoup(res.text, 'lxml') #if you find any problem with "lxml" then try using "html.parser" instead
table = soup.find("table",class_="wikitable")
len(soup.find_all('table'))

6

In [13]:
rows = table.find_all("tr")
row_lengths = [len(r.find_all(['th', 'td'])) for r in rows]
ncols = max(row_lengths)
nrows = len(rows)

# start from an empty nrows x ncols grid of strings
data = [['' for _ in range(ncols)] for _ in range(nrows)]

# process html
for i, row in enumerate(rows):
    cells = row.find_all(["td", "th"])
    for j, cell in enumerate(cells):
        # lots of cells span cols and rows, so let's deal with that
        cspan = int(cell.get('colspan', 1))
        rspan = int(cell.get('rowspan', 1))
        l = 0
        for k in range(rspan):
            # shift to the first empty cell of this row
            while data[i + k][j + l]:
                l += 1
            for m in range(cspan):
                cell_n = j + l + m
                row_n = i + k
                # in some cases the colspan can overflow the table; in those cases just use the last column
                cell_n = min(cell_n, len(data[row_n]) - 1)
                data[row_n][cell_n] += cell.text.strip().replace('\n', "")
                print(cell.text) # echo each cell as it is parsed

Postal Code | Borough | Neighborhood | Latitude | Longitude
T1A | Medicine Hat | Central Medicine Hat | 50.036460 | -110.679250
T2A | Calgary | Penbrooke Meadows, Marlborough | 51.049680 | -113.964320
T3A | Calgary | Dalhousie, Edgemont, Hamptons, Hidden Valley | 51.126060 | -114.143158
T4A | Airdrie | East Airdrie | 51.272450 | -113.986980
T5A | Edmonton | West Clareview, East Londonderry | 53.5899 | -113.4413
T6A | Edmonton | North Capilano | 53.5483 | -113.408
T7A | Drayton Valley | Not assigned | 53.2165 | -114.9893
T8A | Sherwood Park | West Sherwood Park | 53.519 | -113.3216
T9A | Wetaskiwin | Not assigned | 52.9741 | -113.3646
T1B | Medicine Hat | South Medicine Hat | 50.0172 | -110.651
T2B | Calgary | Forest Lawn, Dover, Erin Woods | 51.0318 | -113.9786
T3B | Calgary | Montgomery, Bowness, Silver Springs, Greenwood | 51.0809 | -114.1616
T4B | Airdrie | West Airdrie | 51.2816 | -114.0153
T5B | Edmonton | East North Central, West Beverly | 53.5766 | -113.4608
T6B | Edmonton | SE Capilano, West Southeast Industrial, East Bonni
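
The span handling in the parsing cell above is easier to follow on a toy input with no HTML involved. Below is a simplified sketch of the same grid-filling idea, assuming each cell is given as a `(text, rowspan, colspan)` tuple and that at least one row (typically the header) lists every column. The function name `expand_spans` and the rowspan in the example are made up for illustration; the real Wikipedia table may not contain such a span.

```python
def expand_spans(rows):
    """Expand rows of (text, rowspan, colspan) cells into a rectangular grid."""
    ncols = max(sum(colspan for _, _, colspan in row) for row in rows)
    grid = [[None] * ncols for _ in rows]
    for i, row in enumerate(rows):
        col = 0
        for text, rowspan, colspan in row:
            # skip slots already filled by a rowspan from an earlier row
            while col < ncols and grid[i][col] is not None:
                col += 1
            # copy the cell text into every slot it covers, clipped to the grid
            for di in range(rowspan):
                for dj in range(colspan):
                    r, c = i + di, col + dj
                    if r < len(grid) and c < ncols:
                        grid[r][c] = text
            col += colspan
    return grid

# Hypothetical layout: "Edmonton" spans two rows.
example_rows = [
    [("Postal Code", 1, 1), ("Borough", 1, 1), ("Neighborhood", 1, 1)],
    [("T5A", 1, 1), ("Edmonton", 2, 1), ("West Clareview, East Londonderry", 1, 1)],
    [("T6A", 1, 1), ("North Capilano", 1, 1)],
]
grid = expand_spans(example_rows)
for line in grid:
    print(line)
```

Unlike the notebook cell, this sketch assigns the text rather than appending to it, so a slot never ends up with concatenated content.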

In [14]:
df = pd.DataFrame(data)
df = df[1:] # drop the header row
df = df.replace(to_replace='None', value=np.nan).dropna() # remove rows with missing values
df = df.replace(to_replace='Not assigned', value=np.nan).dropna() # remove rows whose neighborhood is not assigned
df.columns = ['Postal Code', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']
df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
1,T1A,Medicine Hat,Central Medicine Hat,50.03646,-110.67925
2,T2A,Calgary,"Penbrooke Meadows, Marlborough",51.04968,-113.96432
3,T3A,Calgary,"Dalhousie, Edgemont, Hamptons, Hidden Valley",51.12606,-114.143158
4,T4A,Airdrie,East Airdrie,51.27245,-113.98698
5,T5A,Edmonton,"West Clareview, East Londonderry",53.5899,-113.4413
6,T6A,Edmonton,North Capilano,53.5483,-113.408
8,T8A,Sherwood Park,West Sherwood Park,53.519,-113.3216
10,T1B,Medicine Hat,South Medicine Hat,50.0172,-110.651
11,T2B,Calgary,"Forest Lawn, Dover, Erin Woods",51.0318,-113.9786
12,T3B,Calgary,"Montgomery, Bowness, Silver Springs, Greenwood",51.0809,-114.1616
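
Because the table is scraped as text, the `Latitude` and `Longitude` columns appear to be kept as strings (a later cell prints `50.036460` with its trailing zero). The sketch below shows one way the cleaning step could also convert them to numbers, using three rows from the scraped output above; the names `raw` and `clean` are illustrative.

```python
import pandas as pd

# Three rows copied from the scraped output above, all values still strings
raw = pd.DataFrame(
    [
        ["T6A", "Edmonton", "North Capilano", "53.5483", "-113.408"],
        ["T7A", "Drayton Valley", "Not assigned", "53.2165", "-114.9893"],
        ["T8A", "Sherwood Park", "West Sherwood Park", "53.519", "-113.3216"],
    ],
    columns=["Postal Code", "Borough", "Neighborhood", "Latitude", "Longitude"],
)

# Drop unassigned neighborhoods, then convert coordinates to floats
clean = raw[raw["Neighborhood"] != "Not assigned"].copy()
for col in ["Latitude", "Longitude"]:
    clean[col] = pd.to_numeric(clean[col], errors="coerce")  # unparsable values become NaN
clean = clean.dropna(subset=["Latitude", "Longitude"]).reset_index(drop=True)

print(clean.dtypes)
```

Numeric coordinates make later arithmetic, such as distance calculations, possible without further casting.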


In [15]:
df.shape

(97, 5)

In [26]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df['Borough'].unique()),
        df.shape[0]
    )
)

The dataframe has 11 boroughs and 97 neighborhoods.


In [28]:
df.groupby(['Borough']).count()

Borough,Postal Code,Neighborhood,Latitude,Longitude
Airdrie,2,2,2,2
Calgary,34,34,34,34
Edmonton,38,38,38,38
Fort McMurray,3,3,3,3
Grande Prairie,3,3,3,3
Leduc,1,1,1,1
Lethbridge,3,3,3,3
Medicine Hat,3,3,3,3
Red Deer,3,3,3,3
Sherwood Park,6,6,6,6


We select Edmonton because it is the capital city of the Canadian province of Alberta.

In [16]:
Alberta_data = df[df['Borough'].str.contains('Edmonton', regex=False)]
Alberta_data

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
5,T5A,Edmonton,"West Clareview, East Londonderry",53.5899,-113.4413
6,T6A,Edmonton,North Capilano,53.5483,-113.408
14,T5B,Edmonton,"East North Central, West Beverly",53.5766,-113.4608
15,T6B,Edmonton,"SE Capilano, West Southeast Industrial, East B...",53.5322,-113.4404
23,T5C,Edmonton,Central Londonderry,53.6129,-113.4572
24,T6C,Edmonton,Central Bonnie Doon,53.5182,-113.4769
32,T5E,Edmonton,"West Londonderry, East Calder",53.5923,-113.5168
33,T6E,Edmonton,"South Bonnie Doon, East University",53.5087,-113.5078
41,T5G,Edmonton,"North Central, Queen Mary Park, Blatchford",53.5682,-113.4822
42,T6G,Edmonton,"West University, Strathcona Place",53.5248,-113.5334


In [29]:
print('Edmonton City has {} boroughs and {} neighborhoods.'.format(
        len(Alberta_data['Borough'].unique()),
        Alberta_data.shape[0]
    )
)

Edmonton City has 1 boroughs and 38 neighborhoods.


### Use geopy library to get the latitude and longitude values of Edmonton, Alberta.
Let's get the geographical coordinates of Edmonton, Alberta.

In [18]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [31]:
address = 'Edmonton, Alberta'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Edmonton are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Edmonton are 53.535411, -113.507996.


Next, let's install and import Folium so that we can plot the Edmonton neighborhoods on a map.

In [21]:
!pip install folium
import folium # map rendering library
from IPython.display import HTML, display

Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1


In [32]:
# create map of Edmonton using latitude and longitude values
map_Alberta = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(Alberta_data['Latitude'], Alberta_data['Longitude'], Alberta_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Alberta)

map_Alberta

**Folium** is a great visualization library. Feel free to zoom into the above map, and click on each circle mark to reveal the name of the neighborhood.

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [33]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # your Foursquare Access Token
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)


Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET
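
Credentials pasted into a notebook are easy to leak when the notebook is shared. One common alternative is to read them from environment variables and to mask them when printing. The sketch below assumes variables named `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; these names and both helper functions are illustrative choices, not something Foursquare requires.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping such as os.environ."""
    try:
        # hypothetical variable names; use whatever your environment defines
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"FOURSQUARE_CLIENT_ID": "DEMO-ID-123456", "FOURSQUARE_CLIENT_SECRET": "DEMO-SECRET-123456"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```

In a real session, calling `load_foursquare_credentials()` with no argument reads from the process environment.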


#### Let's explore the first neighborhood in our dataframe.

In [37]:
neighborhood_latitude = df.loc[1, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[1, 'Longitude'] # neighborhood longitude value

neighborhood_name = df.loc[1, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Central Medicine Hat are 50.036460, -110.679250.


#### Now, let's get the top 100 venues within a radius of 500 meters of Central Medicine Hat.

First, let's create the GET request URL and store it in `url`.

In [38]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180604&ll=50.036460,-110.679250&radius=500&limit=100'
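
The same request URL can also be assembled from a dictionary of parameters, which avoids counting `{}` placeholders by hand. This is a minimal sketch using the standard library's `urllib.parse.urlencode`. The function name `build_explore_url` is illustrative, and the endpoint and parameter names are simply those used in the cell above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL from named query parameters."""
    base = "https://api.foursquare.com/v2/venues/explore"
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # urlencode percent-encodes the comma as %2C
        "radius": radius,
        "limit": limit,
    }
    return f"{base}?{urlencode(params)}"

# Placeholder credentials; coordinates are those of Central Medicine Hat
example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180604", "50.036460", "-110.679250", 500, 100
)
print(example_url)
```

Passing the dictionary to `requests.get(base, params=params)` would be an equivalent option, since `requests` encodes query parameters itself.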

Send the GET request and examine the results.

In [40]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60ed222207d1373caaace977'},
 'response': {'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 7,
  'suggestedBounds': {'ne': {'lat': 50.0409600045, 'lng': -110.67225700231893},
   'sw': {'lat': 50.031959995499996, 'lng': -110.68624299768106}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c38ff1e18e72d7f6d7619f5',
       'name': 'Madhatter Coffee Roastery',
       'location': {'address': '513 3rd St SE',
        'lat': 50.03916323500158,
        'lng': -110.67718867205282,
        'labeledLatLngs': [{'label': 'display',
          'lat': 50.03916323500158,
          'lng': -110.67718867205282}],
        'distance': 335,
        'cc': 'CA',
    

From the Foursquare lab in the previous module, we know that all the information is in the `items` key. Before we proceed, let's borrow the `get_category_type` function from the Foursquare lab.

In [41]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # the flattened dataframe uses the 'venue.' prefix
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe.

In [42]:
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Madhatter Coffee Roastery,Coffee Shop,50.039163,-110.677189
1,Local Public Eatery Medicine Hat,Pub,50.039218,-110.676133
2,Dairy Queen,Ice Cream Shop,50.03792,-110.680125
3,Esplanade Arts And Heritage Centre,Theater,50.039907,-110.680308
4,Subway,Sandwich Place,50.040062,-110.676073


And how many venues were returned by Foursquare?

In [43]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

7 venues were returned by Foursquare.


### Explore Neighborhoods in Edmonton, Alberta

#### Let's create a function to repeat the same process to all the neighborhoods in Edmonton, Alberta

In [44]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
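
The list comprehension in `getNearbyVenues` indexes `v['venue']['categories'][0]` directly, so a venue returned without any category would raise an `IndexError`. The sketch below shows one defensive way to turn the `items` list into rows. The helper name `extract_venues` is illustrative, and the second sample item (a venue with no categories) is hypothetical.

```python
def extract_venues(name, lat, lng, items):
    """Turn the 'items' list of an explore response into flat tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None  # None when uncategorized
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

sample_items = [
    # first venue taken from the response shown earlier in this notebook
    {"venue": {"name": "Madhatter Coffee Roastery",
               "location": {"lat": 50.03916323500158, "lng": -110.67718867205282},
               "categories": [{"name": "Coffee Shop"}]}},
    # hypothetical venue with an empty category list
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 50.0, "lng": -110.0},
               "categories": []}},
]
for row in extract_venues("Central Medicine Hat", 50.036460, -110.679250, sample_items):
    print(row)
```

Rows built this way can be passed to `pd.DataFrame` with the same seven column names used above.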

#### Now let's run the above function on each Edmonton neighborhood and create a new dataframe called _Alberta_venues_.

In [45]:
Alberta_venues = getNearbyVenues(names=Alberta_data['Neighborhood'],
                                   latitudes=Alberta_data['Latitude'],
                                   longitudes=Alberta_data['Longitude']
                                  )

West Clareview, East Londonderry
North Capilano
East North Central, West Beverly
SE Capilano, West Southeast Industrial, East Bonnie Doon
Central Londonderry
Central Bonnie Doon
West Londonderry, East Calder
South Bonnie Doon, East University
North Central, Queen Mary Park, Blatchford
West University, Strathcona Place
North Downtown Fringe, East Downtown Fringe
Southgate, North Riverbend
North Downtown
Kaskitayo, Aspen Gardens
South Downtown, South Downtown Fringe (Alberta Provincial Government)
West Mill Woods
North Westmount, West Calder, East Mistatim
East Mill Woods
South Westmount, Groat Estate, East Northwest Industrial
Southwest Edmonton
Glenora, SW Downtown Fringe
South Industrial
North Jasper Place
East Southeast Industrial, South Clover Bar
Central Jasper Place, Buena Vista
Southgate, North Riverbend
West Northwest Industrial, Winterburn
North Clover Bar
West Jasper Place, West Edmonton Mall
The Meadows
Central Mistatim
The Palisades, West Castle Downs
Central Beverly
Heritag

#### Let's check the size of the resulting dataframe

In [46]:
print(Alberta_venues.shape)
Alberta_venues.head()

(320, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"West Clareview, East Londonderry",53.5899,-113.4413,Buffet Royale Carvery,53.587229,-113.439075,Buffet
1,"West Clareview, East Londonderry",53.5899,-113.4413,Café del Sol,53.592441,-113.441455,Mexican Restaurant
2,"West Clareview, East Londonderry",53.5899,-113.4413,Red Claw Gaming,53.586937,-113.439775,Toy / Game Store
3,"West Clareview, East Londonderry",53.5899,-113.4413,My Grandma's Attic,53.586033,-113.441629,Record Shop
4,"West Clareview, East Londonderry",53.5899,-113.4413,Belvedere Transit Centre,53.587932,-113.435254,Bus Station


Let's check how many venues were returned for each neighborhood.

In [47]:
Alberta_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Central Beverly,4,4,4,4,4,4
Central Bonnie Doon,6,6,6,6,6,6
"Central Jasper Place, Buena Vista",10,10,10,10,10,10
Central Mistatim,3,3,3,3,3,3
East Castledowns,4,4,4,4,4,4
East Mill Woods,3,3,3,3,3,3
"East North Central, West Beverly",4,4,4,4,4,4
"East Southeast Industrial, South Clover Bar",2,2,2,2,2,2
Ellerslie,4,4,4,4,4,4
"Glenora, SW Downtown Fringe",1,1,1,1,1,1


Let's find out how many unique categories there are among all the returned venues.

In [48]:
print('There are {} unique categories.'.format(len(Alberta_venues['Venue Category'].unique())))

There are 129 unique categories.
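
Since the business problem is about restaurant density, the venue categories can be reduced to a restaurant count per neighborhood. This is a rough sketch, assuming that a category counts as a restaurant when its name contains the word "Restaurant". That rule is an assumption and it misses food venues such as "Buffet", "Pizza Place" or "Burger Joint". The function name `restaurant_counts` is illustrative.

```python
from collections import Counter

def restaurant_counts(venues):
    """Count venues whose category name contains 'Restaurant', per neighborhood."""
    counts = Counter()
    for neighborhood, category in venues:
        # adding 0 still registers the neighborhood, so empty ones show up with a count of 0
        counts[neighborhood] += int("Restaurant" in category)
    return counts

# (neighborhood, category) pairs taken from the first five rows of Alberta_venues above
sample_venues = [
    ("West Clareview, East Londonderry", "Buffet"),
    ("West Clareview, East Londonderry", "Mexican Restaurant"),
    ("West Clareview, East Londonderry", "Toy / Game Store"),
    ("West Clareview, East Londonderry", "Record Shop"),
    ("West Clareview, East Londonderry", "Bus Station"),
]
print(restaurant_counts(sample_venues))
```

With the full dataframe, the pairs could be produced with `zip(Alberta_venues['Neighborhood'], Alberta_venues['Venue Category'])`.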


### Analyze Each Neighborhood

In [49]:
# one hot encoding
Alberta_onehot = pd.get_dummies(Alberta_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Alberta_onehot['Neighborhood'] = Alberta_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Alberta_onehot.columns[-1]] + list(Alberta_onehot.columns[:-1])
Alberta_onehot = Alberta_onehot[fixed_columns]

Alberta_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Arts & Crafts Store,Asian Restaurant,Auto Dealership,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Big Box Store,Bookstore,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Business Service,Butcher,Café,Casino,Cheese Shop,Chinese Restaurant,Clothing Store,Coffee Shop,College Gym,College Residence Hall,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Department Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Eastern European Restaurant,Electronics Store,Fast Food Restaurant,Filipino Restaurant,Flower Shop,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gas Station,Gastropub,Gay Bar,Gift Shop,Golf Driving Range,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Health & Beauty Service,Hockey Arena,Home Service,Hot Dog Joint,Hotel,Housing Development,IT Services,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Lake,Lawyer,Light Rail Station,Liquor Store,Lounge,Massage Studio,Medical Supply Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Motorcycle Shop,Movie Theater,Museum,Music Venue,New American Restaurant,Nightclub,Noodle House,Office,Paintball Field,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Photography Studio,Pizza Place,Playground,Plaza,Pool Hall,Portuguese Restaurant,Pub,Record Shop,Recreation Center,Rental Car Location,Rest Area,Restaurant,Rock Club,Salad Place,Sandwich Place,Shopping Mall,Skating Rink,Ski Trail,Smoke Shop,Smoothie Shop,Soccer Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Turkish Restaurant,Vietnamese Restaurant,Warehouse Store,Water Park,Whisky Bar,Wine Shop
0,"West Clareview, East Londonderry",0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"West Clareview, East Londonderry",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"West Clareview, East Londonderry",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
3,"West Clareview, East Londonderry",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"West Clareview, East Londonderry",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [50]:
Alberta_onehot.shape

(320, 130)

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [51]:
Alberta_grouped = Alberta_onehot.groupby('Neighborhood').mean().reset_index()
Alberta_grouped

Unnamed: 0,Neighborhood,American Restaurant,Arts & Crafts Store,Asian Restaurant,Auto Dealership,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Big Box Store,Bookstore,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Business Service,Butcher,Café,Casino,Cheese Shop,Chinese Restaurant,Clothing Store,Coffee Shop,College Gym,College Residence Hall,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Department Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Eastern European Restaurant,Electronics Store,Fast Food Restaurant,Filipino Restaurant,Flower Shop,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gas Station,Gastropub,Gay Bar,Gift Shop,Golf Driving Range,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Health & Beauty Service,Hockey Arena,Home Service,Hot Dog Joint,Hotel,Housing Development,IT Services,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Lake,Lawyer,Light Rail Station,Liquor Store,Lounge,Massage Studio,Medical Supply Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Motorcycle Shop,Movie Theater,Museum,Music Venue,New American Restaurant,Nightclub,Noodle House,Office,Paintball Field,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Photography Studio,Pizza Place,Playground,Plaza,Pool Hall,Portuguese Restaurant,Pub,Record Shop,Recreation Center,Rental Car Location,Rest Area,Restaurant,Rock Club,Salad Place,Sandwich Place,Shopping Mall,Skating Rink,Ski Trail,Smoke Shop,Smoothie Shop,Soccer Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Turkish Restaurant,Vietnamese Restaurant,Warehouse Store,Water Park,Whisky Bar,Wine Shop
0,Central Beverly,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Central Bonnie Doon,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0
2,"Central Jasper Place, Buena Vista",0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Central Mistatim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0
4,East Castledowns,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,East Mill Woods,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"East North Central, West Beverly",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"East Southeast Industrial, South Clover Bar",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Ellerslie,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Glenora, SW Downtown Fringe",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [52]:
Alberta_grouped.shape

(35, 130)
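
Taking the mean of one-hot columns within a neighborhood is the same as computing each category's share of that neighborhood's venues. The plain-Python sketch below makes that explicit. The function name `category_frequencies` is illustrative, and the sample counts are chosen to be consistent with the Central Beverly row above (4 venues, with shares 0.5, 0.25 and 0.25).

```python
from collections import Counter, defaultdict

def category_frequencies(venues):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    per_hood = defaultdict(Counter)
    for neighborhood, category in venues:
        per_hood[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(counts.values()) for cat, n in counts.items()}
        for hood, counts in per_hood.items()
    }

sample = [
    ("Central Beverly", "Smoke Shop"),
    ("Central Beverly", "Smoke Shop"),
    ("Central Beverly", "IT Services"),
    ("Central Beverly", "Grocery Store"),
]
freqs = category_frequencies(sample)
print(freqs["Central Beverly"])
```

Categories that never occur in a neighborhood are simply absent here, whereas the grouped dataframe stores them as 0.0.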

Let's print each neighborhood along with the top 5 most common venues

In [53]:
num_top_venues = 5

for hood in Alberta_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Alberta_grouped[Alberta_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central Beverly----
                 venue  freq
0           Smoke Shop  0.50
1          IT Services  0.25
2        Grocery Store  0.25
3  American Restaurant  0.00
4               Office  0.00


----Central Bonnie Doon----
                 venue  freq
0  American Restaurant  0.17
1           Water Park  0.17
2       Cosmetics Shop  0.17
3                Trail  0.17
4         Liquor Store  0.17


----Central Jasper Place, Buena Vista----
                     venue  freq
0                     Café   0.1
1  Health & Beauty Service   0.1
2     Fast Food Restaurant   0.1
3        Convenience Store   0.1
4              Salad Place   0.1


----Central Mistatim----
                venue  freq
0        Liquor Store  0.33
1     Warehouse Store  0.33
2              Casino  0.33
3  Photography Studio  0.00
4         Pizza Place  0.00


----East Castledowns----
                venue  freq
0               Plaza  0.50
1          Playground  0.25
2        Skating Rink  0.25
3         Pizza Place 

Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.

In [54]:
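def return_most_common_venues(row, num_top_venues):
    """Return the names of the `num_top_venues` most frequent venue categories in a row."""
    # skip the first entry, which holds the neighborhood name
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]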
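
To make the behavior of this helper concrete, here is a minimal sketch that applies it to a single hypothetical row. The row `toy_row` and its values are invented for illustration and only mimic the shape of one row of `Alberta_grouped` (neighborhood name first, then one frequency per category).

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name), then rank the frequencies.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row shaped like one row of Alberta_grouped (illustrative values).
toy_row = pd.Series(
    {"Neighborhood": "Toy Hood", "Café": 0.1, "Park": 0.6, "Pub": 0.3}
)

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```

Note that when a neighborhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with categories whose frequency is 0.0, so the later "most common" columns should be read with care.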

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [55]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # beyond 'st', 'nd', 'rd' the suffix is 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Alberta_grouped['Neighborhood']

for ind in np.arange(Alberta_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Alberta_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Beverly,Smoke Shop,IT Services,Grocery Store,Filipino Restaurant,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop,Fast Food Restaurant
1,Central Bonnie Doon,American Restaurant,Water Park,Cosmetics Shop,Trail,Breakfast Spot,Liquor Store,Filipino Restaurant,French Restaurant,Food Truck,Food & Drink Shop
2,"Central Jasper Place, Buena Vista",Café,Salad Place,Pizza Place,Health & Beauty Service,Bakery,Convenience Store,Sandwich Place,Fast Food Restaurant,Sushi Restaurant,Liquor Store
3,Central Mistatim,Warehouse Store,Casino,Liquor Store,Wine Shop,Filipino Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop,Fast Food Restaurant
4,East Castledowns,Plaza,Playground,Skating Rink,Electronics Store,Food Truck,Food & Drink Shop,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Eastern European Restaurant


### Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 5 clusters.

In [56]:
# set number of clusters
kclusters = 5

Alberta_grouped_clustering = Alberta_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Alberta_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 1, 2, 2, 2, 2, 3], dtype=int32)
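
The choice of 5 clusters is fixed by hand in this notebook. One common way to sanity-check such a choice is to compare silhouette scores over several values of k. The sketch below is not part of the original analysis: it uses synthetic data (`X`, three artificial blobs) as a stand-in for `Alberta_grouped_clustering`, and the range of k values is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for Alberta_grouped_clustering: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score gives the most compact, best-separated clusters.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On the real data the same loop could be run on `Alberta_grouped_clustering`. With only 35 neighborhoods the scores may be noisy, so they are a hint rather than a verdict.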

In [64]:
# add clustering labels; the guard keeps the cell re-runnable, since a second
# insert raises "ValueError: cannot insert Cluster Labels, already exists"
if 'Cluster Labels' not in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Alberta_merged = Alberta_data

# merge Alberta_data with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
Alberta_merged = Alberta_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

In [66]:
Alberta_merged = Alberta_merged.dropna()

In [74]:
Alberta_merged = Alberta_merged.astype({"Cluster Labels": "int64"})
Alberta_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,T5A,Edmonton,"West Clareview, East Londonderry",53.5899,-113.4413,2,Bus Station,Toy / Game Store,Buffet,Record Shop,Mexican Restaurant,Eastern European Restaurant,Electronics Store,Dog Run,Distribution Center,Gas Station
6,T6A,Edmonton,North Capilano,53.5483,-113.408,1,Playground,Park,Ski Trail,Bus Station,Discount Store,Distribution Center,Dog Run,Eastern European Restaurant,Diner,Fast Food Restaurant
14,T5B,Edmonton,"East North Central, West Beverly",53.5766,-113.4608,2,Smoke Shop,IT Services,Grocery Store,Filipino Restaurant,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop,Fast Food Restaurant
15,T6B,Edmonton,"SE Capilano, West Southeast Industrial, East B...",53.5322,-113.4404,1,Playground,Construction & Landscaping,Bar,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop,Filipino Restaurant,Eastern European Restaurant
24,T6C,Edmonton,Central Bonnie Doon,53.5182,-113.4769,2,American Restaurant,Water Park,Cosmetics Shop,Trail,Breakfast Spot,Liquor Store,Filipino Restaurant,French Restaurant,Food Truck,Food & Drink Shop


In [75]:
Alberta_merged

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,T5A,Edmonton,"West Clareview, East Londonderry",53.5899,-113.4413,2,Bus Station,Toy / Game Store,Buffet,Record Shop,Mexican Restaurant,Eastern European Restaurant,Electronics Store,Dog Run,Distribution Center,Gas Station
6,T6A,Edmonton,North Capilano,53.5483,-113.408,1,Playground,Park,Ski Trail,Bus Station,Discount Store,Distribution Center,Dog Run,Eastern European Restaurant,Diner,Fast Food Restaurant
14,T5B,Edmonton,"East North Central, West Beverly",53.5766,-113.4608,2,Smoke Shop,IT Services,Grocery Store,Filipino Restaurant,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop,Fast Food Restaurant
15,T6B,Edmonton,"SE Capilano, West Southeast Industrial, East B...",53.5322,-113.4404,1,Playground,Construction & Landscaping,Bar,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop,Filipino Restaurant,Eastern European Restaurant
24,T6C,Edmonton,Central Bonnie Doon,53.5182,-113.4769,2,American Restaurant,Water Park,Cosmetics Shop,Trail,Breakfast Spot,Liquor Store,Filipino Restaurant,French Restaurant,Food Truck,Food & Drink Shop
32,T5E,Edmonton,"West Londonderry, East Calder",53.5923,-113.5168,2,Grocery Store,Arts & Crafts Store,Dog Run,Butcher,Bakery,Hockey Arena,Baseball Field,Comic Shop,Recreation Center,Wine Shop
33,T6E,Edmonton,"South Bonnie Doon, East University",53.5087,-113.5078,2,American Restaurant,Coffee Shop,Pharmacy,Mediterranean Restaurant,Flower Shop,French Restaurant,Food Truck,Food & Drink Shop,Filipino Restaurant,Fast Food Restaurant
41,T5G,Edmonton,"North Central, Queen Mary Park, Blatchford",53.5682,-113.4822,2,Pharmacy,Bakery,Bank,Music Venue,Theater,Café,Grocery Store,Flower Shop,French Restaurant,Food Truck
42,T6G,Edmonton,"West University, Strathcona Place",53.5248,-113.5334,2,Theater,College Gym,Pub,Sandwich Place,Coffee Shop,College Residence Hall,Diner,Bank,Fast Food Restaurant,French Restaurant
50,T5H,Edmonton,"North Downtown Fringe, East Downtown Fringe",53.555,-113.4822,2,Thai Restaurant,Gym,Soccer Stadium,Gift Shop,Grocery Store,Café,Eastern European Restaurant,Electronics Store,Fast Food Restaurant,French Restaurant


Finally, let's visualize the resulting clusters

In [81]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Alberta_merged['Latitude'], Alberta_merged['Longitude'], Alberta_merged['Neighborhood'], Alberta_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster.
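
Before reading the per-cluster tables one by one, a compact summary can help. The following is a minimal sketch on a hypothetical miniature of `Alberta_merged` (`toy_merged`, with invented values). It counts the neighborhoods in each cluster and picks the most frequent top venue as a rough label.

```python
import pandas as pd

# Hypothetical miniature of Alberta_merged; values are illustrative only.
toy_merged = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D", "E"],
    "Cluster Labels": [1, 1, 2, 2, 2],
    "1st Most Common Venue": ["Park", "Park", "Coffee Shop", "Pharmacy", "Coffee Shop"],
})

# Number of neighborhoods per cluster.
cluster_sizes = toy_merged["Cluster Labels"].value_counts().sort_index()

# Most frequent top venue within each cluster (first mode if tied).
top_venue = toy_merged.groupby("Cluster Labels")["1st Most Common Venue"].agg(
    lambda s: s.mode().iloc[0]
)

summary = pd.DataFrame({"size": cluster_sizes, "typical top venue": top_venue})
print(summary)
```

Clusters that contain a single neighborhood, such as clusters 0, 3, and 4 in the tables that follow, are described entirely by that one row, so a "typical" venue for them carries little weight.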

#### Cluster 0

In [88]:
Alberta_merged.loc[Alberta_merged['Cluster Labels'] == 0, Alberta_merged.columns[[1] + list(range(5, Alberta_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
60,Edmonton,0,Lake,Tennis Court,Wine Shop,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop,Filipino Restaurant,Electronics Store


#### Cluster 1

In [84]:
Alberta_merged.loc[Alberta_merged['Cluster Labels'] == 1, Alberta_merged.columns[[1] + list(range(5, Alberta_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Edmonton,1,Playground,Park,Ski Trail,Bus Station,Discount Store,Distribution Center,Dog Run,Eastern European Restaurant,Diner,Fast Food Restaurant
15,Edmonton,1,Playground,Construction & Landscaping,Bar,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop,Filipino Restaurant,Eastern European Restaurant
68,Edmonton,1,Park,Thai Restaurant,Hotel,Plaza,Convenience Store,Sandwich Place,French Restaurant,Baseball Stadium,Fast Food Restaurant,Food & Drink Shop
96,Edmonton,1,Park,Golf Driving Range,Gym,Wine Shop,Filipino Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop,Fast Food Restaurant
158,Edmonton,1,Plaza,Playground,Skating Rink,Electronics Store,Food Truck,Food & Drink Shop,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Eastern European Restaurant


#### Cluster 2

In [85]:
Alberta_merged.loc[Alberta_merged['Cluster Labels'] == 2, Alberta_merged.columns[[1] + list(range(5, Alberta_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Edmonton,2,Bus Station,Toy / Game Store,Buffet,Record Shop,Mexican Restaurant,Eastern European Restaurant,Electronics Store,Dog Run,Distribution Center,Gas Station
14,Edmonton,2,Smoke Shop,IT Services,Grocery Store,Filipino Restaurant,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop,Fast Food Restaurant
24,Edmonton,2,American Restaurant,Water Park,Cosmetics Shop,Trail,Breakfast Spot,Liquor Store,Filipino Restaurant,French Restaurant,Food Truck,Food & Drink Shop
32,Edmonton,2,Grocery Store,Arts & Crafts Store,Dog Run,Butcher,Bakery,Hockey Arena,Baseball Field,Comic Shop,Recreation Center,Wine Shop
33,Edmonton,2,American Restaurant,Coffee Shop,Pharmacy,Mediterranean Restaurant,Flower Shop,French Restaurant,Food Truck,Food & Drink Shop,Filipino Restaurant,Fast Food Restaurant
41,Edmonton,2,Pharmacy,Bakery,Bank,Music Venue,Theater,Café,Grocery Store,Flower Shop,French Restaurant,Food Truck
42,Edmonton,2,Theater,College Gym,Pub,Sandwich Place,Coffee Shop,College Residence Hall,Diner,Bank,Fast Food Restaurant,French Restaurant
50,Edmonton,2,Thai Restaurant,Gym,Soccer Stadium,Gift Shop,Grocery Store,Café,Eastern European Restaurant,Electronics Store,Fast Food Restaurant,French Restaurant
51,Edmonton,2,Furniture / Home Store,Coffee Shop,Sandwich Place,Restaurant,Clothing Store,Light Rail Station,Distribution Center,Electronics Store,Fast Food Restaurant,Filipino Restaurant
59,Edmonton,2,Coffee Shop,Sandwich Place,Pub,Restaurant,Hotel,Café,Fast Food Restaurant,Italian Restaurant,Diner,Pharmacy


#### Cluster 3

In [86]:
Alberta_merged.loc[Alberta_merged['Cluster Labels'] == 3, Alberta_merged.columns[[1] + list(range(5, Alberta_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
95,Edmonton,3,Portuguese Restaurant,Wine Shop,Furniture / Home Store,Diner,Discount Store,Distribution Center,Dog Run,Eastern European Restaurant,Electronics Store,Fast Food Restaurant


#### Cluster 4

In [87]:
Alberta_merged.loc[Alberta_merged['Cluster Labels'] == 4, Alberta_merged.columns[[1] + list(range(5, Alberta_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
87,Edmonton,4,Paintball Field,Wine Shop,Dim Sum Restaurant,Discount Store,Distribution Center,Dog Run,Eastern European Restaurant,Electronics Store,Fast Food Restaurant,Filipino Restaurant


## Results <a name="results"></a>

Finally, we can identify a good location for opening a restaurant in **Edmonton, Alberta**, Canada. Edmonton has 1 borough and 38 neighborhoods.
We tried to detect **locations that are not already crowded with restaurants**. We would also prefer locations **as close to the city center as possible**.
For Edmonton's neighborhoods, Foursquare returned 7 venues and 129 unique categories within a radius of 500 meters.
We then grouped Edmonton's neighborhoods into 5 clusters using k-means.
According to the clustering, Clusters 1 and 2 have many restaurants and also include Thai restaurants among their most common venues.
Clusters 0, 3, and 4 have no Thai restaurants among their most common venues, so we will consider Clusters 0, 3, and 4.
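
The screening step described here (keep only clusters with no Thai restaurant among the most common venues) can be sketched as follows. The frame `toy_merged` is a hypothetical miniature of `Alberta_merged` with invented rows, and only two of the ten venue columns are shown.

```python
import pandas as pd

# Hypothetical miniature of Alberta_merged; values are illustrative only.
toy_merged = pd.DataFrame({
    "Cluster Labels": [0, 1, 1, 2, 3],
    "1st Most Common Venue": ["Lake", "Park", "Playground", "Thai Restaurant", "Wine Shop"],
    "2nd Most Common Venue": ["Tennis Court", "Thai Restaurant", "Park", "Gym", "Diner"],
})

venue_cols = [c for c in toy_merged.columns if c.endswith("Most Common Venue")]

# True for every neighborhood that lists a Thai restaurant among its top venues.
has_thai = toy_merged[venue_cols].eq("Thai Restaurant").any(axis=1)

# A cluster is a candidate only if none of its neighborhoods has one.
clusters_with_thai = set(toy_merged.loc[has_thai, "Cluster Labels"].tolist())
all_clusters = set(toy_merged["Cluster Labels"].tolist())
candidates = sorted(all_clusters - clusters_with_thai)
print(candidates)
```

This check only looks at the top-venue columns, so a Thai restaurant that ranks below the tenth most common venue would not be detected.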

## Discussion <a name="discussion"></a>
Clusters 0, 3, and 4 have no Thai restaurants among their most common venues. I prefer Cluster 3 because this zone is close to the city center.
Cluster 4 has many Asian restaurants, so a new restaurant there would have to set itself apart from the existing Asian restaurants, and Cluster 0 is farther from the city center than Cluster 3.
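
The distance comparison in this discussion can be made explicit with a great-circle distance. The sketch below is illustrative only: it assumes the T5H postal code ("North Downtown Fringe, East Downtown Fringe") as a stand-in for the city center, which the notebook does not define, and measures the distance to T5A using the coordinates from the merged table.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

# Assumption: T5H stands in for the city center (not defined in the notebook).
center = (53.555, -113.4822)
# T5A ("West Clareview, East Londonderry") from the merged table.
t5a = (53.5899, -113.4413)

distance_km = haversine_km(*center, *t5a)
print(round(distance_km, 2))
```

Applying the same function to the neighborhoods in Clusters 0, 3, and 4 would turn "close to" and "far from" the city center into comparable numbers.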

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas of Edmonton, Alberta, that are close to the city center. Those locations were then clustered to create zones of interest, and addresses of the zone centers were created to serve as starting points for final exploration by stakeholders.

The final decision about the optimal restaurant location will be made by stakeholders based on the specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), noise levels and proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.