# IBM Applied Data Science Capstone Course 

## Exploring the opportunity for opening a new recreational center in Calgary, Alberta

### Week 5 Final Report

### Table of contents

* [Introduction: Business Problem](#Introduction:Business)
* [Data](#Data)
* [Use the Foursquare API to explore the neighborhoods](#API)
* [Analyze each Neighborhood](#neigh)
* [Recreation center Data Frame](#rec)
* [Visualizing the Clusters](#vis)
* [Exploratory Data Analysis](#exp)
* [Examining Clusters](#exa)
    * [Observations](#obvs)
* [Proposed location](#prop)

## Introduction: Business Problem <a class="anchor" id="Introduction:Business"></a>

In this project we will try to find an optimal location for a recreational center. Specifically, this report targets developers and city council members who are considering such a development.

Data available online shows that Calgary already has numerous recreational facilities, so one objective is to find locations that don't already have a facility in the vicinity. Among locations that meet this first condition, preference goes to those with enough residents to take advantage of such a facility.

**Target Audience**

The target audience for this project is developers or Calgary city officials who are exploring the idea of building a recreational center and who, alongside other analyses, are interested in a data-led analysis of possible locations for such a facility.

**Data**

* A list of all neighbourhoods within the City of Calgary including residential communities, industrial areas, major parks and residual areas by electoral ward.
* Datasets that show names and addresses for current recreation facilities, including amenities available at each location.
* Longitude and latitude data for the neighbourhoods and the current recreational facilities, which will aid in collecting more data about those locations and allow them to be plotted on a map.

Wikipedia provides a list of the neighbourhoods in Calgary along with information such as the type, population and dwellings for each neighbourhood. I will scrape this data from the web and, using the lxml and BeautifulSoup packages, organize it into a pandas data frame. A Python geocoding package will then be used to assign longitude and latitude coordinates to the neighbourhoods.

___

## Data <a class="anchor" id="Data"></a>

### Import Libraries

In [1]:
print("Importing Libraries.....")

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!conda install -c conda-forge geocoder --yes
import geocoder

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import seaborn as sns

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if Folium is already installed
import folium # map rendering library
from folium import plugins
from folium.plugins import HeatMap

!conda install -y lxml # comment this line out if lxml is already installed
import lxml.html as lh # library to parse HTML and XML documents

print('Libraries imported.')

Importing Libraries.....

In [2]:
url_1 = 'https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Calgary'

# get page contents
page = requests.get(url_1)

# Store contents of the webpage
doc = lh.fromstring(page.content)

# row elements in html are stored between <tr>..</tr>
row_elements = doc.xpath('//tr')

# Create an empty list of (column name, values) pairs
col=[]
# For each cell of the header row, store the cleaned header text and an empty list for its values
for t in row_elements[0]:
    name=t.text_content()
    if name[-2] == "]":  # remove the footnote marker [X] (and trailing newline) from the original table
        name = name[:-4]
        if name[-1] == "[":  # two-character markers such as [XX] leave a stray "["
            name = name[:-1]
    name = name.rstrip("\n") # remove any remaining newline characters
    col.append((name,[]))
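
As a side note, the footnote markers in the header cells can also be removed with a regular expression instead of index arithmetic. The sketch below is only an illustration: `clean_header` is a hypothetical helper, and the sample header strings (including the footnote numbers) are made up for the example rather than taken from the Wikipedia page.

```python
import re

def clean_header(raw: str) -> str:
    """Strip footnote markers such as [1] or [12] and surrounding whitespace."""
    without_refs = re.sub(r"\[[^\]]*\]", "", raw)  # drop every [...] group
    return without_refs.strip()                    # drop trailing newline / spaces

# Hypothetical header cells, shaped like the text returned by text_content()
headers = ["Name\n", "Population (2012)[5]\n", "Area (km2)[12]\n"]
cleaned = [clean_header(h) for h in headers]
print(cleaned)
```

This version does not depend on how many characters the footnote marker has, which is the case the nested `if` in the cell above handles by hand.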



In [3]:
#Since our first row is the header, data is stored on the second row onwards
for j in range(1,len(row_elements)):
    #T is our j'th row
    T=row_elements[j]
    
    #If row is not of size 12, the //tr data is not from our table 
    if len(T)!=12:
        break
    
    #i is the index of our column
    i=0
    
    #Iterate through each element of the row
    for t in T.iterchildren():
        data=t.text_content() 
        #Append the data to the empty list of the i'th column
        col[i][1].append(data)
        #Increment i for the next column
        i+=1

In [4]:
# Organize the data into a pandas data frame
Dict={title:column for (title,column) in col}
df = pd.DataFrame.from_dict(Dict, orient='index').transpose()
df = df.replace('\n','', regex=True)
df.columns = [c.replace(' ', '') for c in df.columns]

# remove columns not needed for analysis
df =df.drop(['Area(km2)','2012PopulationRank','Ward','Population(2011)'], axis=1)

# remove rows with no information
# (a boolean mask avoids dropping rows from the frame while iterating over it)
df = df[df['Population(2012)'] != " "]

print(df.shape)
df.head()

(257, 8)


Unnamed: 0,Name,Quadrant,Sector,Type,Population(2012),%change,Dwellings(2012),Populationdensity
0,Abbeydale,NE/SE,Northeast,Residential,5917,3.8,2023,3480.6
1,Acadia,SE,South,Residential,10705,0.8,5053,2744.9
2,Albert Park/Radisson Heights,SE,East,Residential,6234,0.3,2709,2493.6
3,Altadore,SW,Centre,Residential,9116,2.3,4486,3143.4
4,Alyth/Bonnybrook,SE,Centre,Industrial,16,−5.9,14,4.2


In [5]:
# remove any neighbourhoods that aren't residential
df = df[df.Type == "Residential"].reset_index(drop=True)
df = df.dropna()

# drop neighbourhoods with no residents
df = df[df.Populationdensity != "0"].reset_index(drop=True) 

df.head()

Unnamed: 0,Name,Quadrant,Sector,Type,Population(2012),%change,Dwellings(2012),Populationdensity
0,Abbeydale,NE/SE,Northeast,Residential,5917,3.8,2023,3480.6
1,Acadia,SE,South,Residential,10705,0.8,5053,2744.9
2,Albert Park/Radisson Heights,SE,East,Residential,6234,0.3,2709,2493.6
3,Altadore,SW,Centre,Residential,9116,2.3,4486,3143.4
4,Applewood Park,SE/NE,East,Residential,6498,1.5,2215,4061.3


In [6]:
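# remove neighbourhoods with a negative (or missing) % change in population
# negative values in the scraped table start with the Unicode minus sign "−"
pct = df['%change']
is_negative = pct.str.startswith("−", na=True)  # non-string values are treated as rows to drop
is_empty = pct.str.len().fillna(0) == 0         # empty cells carry no information
df = df[~(is_negative | is_empty)]
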
print(df.shape)
df.head(5)

(145, 8)


Unnamed: 0,Name,Quadrant,Sector,Type,Population(2012),%change,Dwellings(2012),Populationdensity
0,Abbeydale,NE/SE,Northeast,Residential,5917,3.8,2023,3480.6
1,Acadia,SE,South,Residential,10705,0.8,5053,2744.9
2,Albert Park/Radisson Heights,SE,East,Residential,6234,0.3,2709,2493.6
3,Altadore,SW,Centre,Residential,9116,2.3,4486,3143.4
4,Applewood Park,SE/NE,East,Residential,6498,1.5,2215,4061.3


We're now left with residential neighbourhoods whose population did not decline.
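
The filter above works on the raw strings. If numeric values are needed later, the Unicode minus sign has to be replaced first, because Python's `float()` does not accept it. A minimal sketch, assuming the cells look like the `%change` values shown in the tables above (`parse_percent_change` is a hypothetical helper, not part of the notebook):

```python
def parse_percent_change(text):
    """Return the value as a float, or None when the cell is empty or not numeric."""
    cleaned = text.strip().replace("\u2212", "-")  # U+2212 MINUS SIGN -> ASCII hyphen-minus
    try:
        return float(cleaned)
    except ValueError:
        return None

# Sample cells: two positive values, one negative value, one blank cell
samples = ["3.8", "\u22125.9", " ", "0.8"]
parsed = [parse_percent_change(s) for s in samples]

# Keep only the non-negative, non-missing values
kept = [value for value in parsed if value is not None and value >= 0]
print(parsed, kept)
```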

Next, we retrieve the coordinates of the neighbourhoods with the geopy client. A few neighbourhoods returned incorrect latitude and longitude values, so those were replaced with the correct coordinates; the resulting data frame is shown below.

In [7]:
address = 'Calgary, AB'

geolocator = Nominatim(user_agent="cyyc_explorer")
location = geolocator.geocode(address)
yyc_latitude = location.latitude
yyc_longitude = location.longitude
print('The geographical coordinates of Calgary are {}, {}.'.format(yyc_latitude, yyc_longitude))

The geographical coordinates of Calgary are 51.0534234, -114.0625892.


Now we get the coordinates for each neighborhood and plot them on a map of Calgary

In [8]:
# geocode each neighbourhood once and reuse the result for both columns
locations = (df["Name"]+",Calgary").apply(geolocator.geocode)
df["Latitude"]=locations.apply(lambda x: x.latitude if x is not None else None)
df["Longitude"]=locations.apply(lambda x: x.longitude if x is not None else None)

df.head()

Unnamed: 0,Name,Quadrant,Sector,Type,Population(2012),%change,Dwellings(2012),Populationdensity,Latitude,Longitude
0,Abbeydale,NE/SE,Northeast,Residential,5917,3.8,2023,3480.6,51.058836,-113.929413
1,Acadia,SE,South,Residential,10705,0.8,5053,2744.9,50.968655,-114.055587
2,Albert Park/Radisson Heights,SE,East,Residential,6234,0.3,2709,2493.6,51.044845,-113.990195
3,Altadore,SW,Centre,Residential,9116,2.3,4486,3143.4,51.015104,-114.100756
4,Applewood Park,SE/NE,East,Residential,6498,1.5,2215,4061.3,51.044658,-113.928931
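
Each geocoding call is a network request, so it is worth making sure every address is looked up only once and that missing results are handled. The sketch below illustrates the idea with a stand-in lookup table instead of a real geocoder, so it runs offline; `FAKE_GAZETTEER`, `geocode_once` and `coordinates_for` are hypothetical names, and only the two coordinate pairs are taken from the table above.

```python
from functools import lru_cache

# Hypothetical stand-in for geolocator.geocode: maps a query to (lat, lon)
FAKE_GAZETTEER = {
    "Abbeydale,Calgary": (51.058836, -113.929413),
    "Acadia,Calgary": (50.968655, -114.055587),
}
lookup_calls = []  # records every lookup that is actually performed

@lru_cache(maxsize=None)
def geocode_once(query):
    """Look up a query a single time; repeated queries are served from the cache."""
    lookup_calls.append(query)
    return FAKE_GAZETTEER.get(query)  # None when the place is unknown

def coordinates_for(names, city="Calgary"):
    """Return (name, latitude, longitude) rows, with None for failed lookups."""
    rows = []
    for name in names:
        hit = geocode_once(f"{name},{city}")
        lat, lon = hit if hit is not None else (None, None)
        rows.append((name, lat, lon))
    return rows

rows = coordinates_for(["Abbeydale", "Acadia", "Abbeydale", "Nowhere"])
print(rows)
print(len(lookup_calls))  # number of real lookups performed
```

With a real geocoder, `geocode_once` would wrap the service call instead of the dictionary lookup.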


In [9]:
df = df.dropna()
df=df.rename(columns={"Name": "Neighborhood",})
df.head(10)

Unnamed: 0,Neighborhood,Quadrant,Sector,Type,Population(2012),%change,Dwellings(2012),Populationdensity,Latitude,Longitude
0,Abbeydale,NE/SE,Northeast,Residential,5917,3.8,2023,3480.6,51.058836,-113.929413
1,Acadia,SE,South,Residential,10705,0.8,5053,2744.9,50.968655,-114.055587
2,Albert Park/Radisson Heights,SE,East,Residential,6234,0.3,2709,2493.6,51.044845,-113.990195
3,Altadore,SW,Centre,Residential,9116,2.3,4486,3143.4,51.015104,-114.100756
4,Applewood Park,SE/NE,East,Residential,6498,1.5,2215,4061.3,51.044658,-113.928931
5,Arbour Lake,NW,Northwest,Residential,10836,0.7,3918,2462.7,51.136786,-114.202355
6,Aspen Woods,SW,West,Residential,5271,17.9,2281,1387.1,51.043119,-114.210185
7,Auburn Bay,SE,Southeast,Residential,7193,24.7,2808,1598.4,50.890605,-113.959565
8,Banff Trail,NW,Centre,Residential,3837,7.1,1950,2558.0,51.071462,-114.111647
9,Bankview,SW,Centre,Residential,5221,9.8,3528,7458.6,51.033887,-114.099518


In [10]:
# create map of Calgary using latitude and longitude values
map_calgary = folium.Map(location=[yyc_latitude,yyc_longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_calgary)
map_calgary

In [11]:
map_calgary.save('map_calgary.html')

## Use the Foursquare API to explore the neighborhoods <a class="anchor" id="API"></a>

In [14]:
CLIENT_ID = '<redacted>' # your Foursquare ID
CLIENT_SECRET = '<redacted>' # your Foursquare Secret
VERSION = '20200524' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: <redacted>
CLIENT_SECRET:<redacted>


#### Now let's get the top 100 venues that are within a radius of 2000 meters

In [15]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'],df['Longitude'], df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
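
The loop above assumes every response contains at least one group and every venue has at least one category. The sketch below shows one way to flatten a response of the same shape more defensively. It runs on a hand-written sample payload rather than a live API call; `extract_venues` is a hypothetical helper, and the second venue in the sample is invented to exercise the empty-category case.

```python
def extract_venues(payload, neighborhood):
    """Flatten one explore-style response into (neighborhood, name, lat, lng, category) rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # A venue without categories gets None instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Sample payload mirroring the keys used in the cell above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Subway", "location": {"lat": 51.059239, "lng": -113.934423},
               "categories": [{"name": "Sandwich Place"}]}},
    {"venue": {"name": "Unnamed Spot", "location": {"lat": 51.06, "lng": -113.93},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Abbeydale")
empty = extract_venues({"response": {}}, "Abbeydale")  # e.g. a response with no groups
print(rows)
```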

In [16]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(8193, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Abbeydale,51.058836,-113.929413,Atlas Pizza and Sports Bar,51.052481,-113.941859,Pizza Place
1,Abbeydale,51.058836,-113.929413,A&W,51.068291,-113.933571,Fast Food Restaurant
2,Abbeydale,51.058836,-113.929413,Subway,51.059239,-113.934423,Sandwich Place
3,Abbeydale,51.058836,-113.929413,Costco,51.043141,-113.917518,Warehouse Store
4,Abbeydale,51.058836,-113.929413,Calgary Co-op,51.068719,-113.934014,Grocery Store


Let's check how many venues were returned for each neighborhood

In [17]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Abbeydale,25,25,25,25,25,25
Acadia,100,100,100,100,100,100
Albert Park/Radisson Heights,67,67,67,67,67,67
Altadore,49,49,49,49,49,49
Applewood Park,24,24,24,24,24,24
Arbour Lake,76,76,76,76,76,76
Aspen Woods,44,44,44,44,44,44
Auburn Bay,42,42,42,42,42,42
Banff Trail,99,99,99,99,99,99
Bankview,100,100,100,100,100,100


**Let's find out how many unique categories can be curated from all the returned venues**

In [18]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 248 unique categories.


**Show the unique categories**

In [19]:
venues_df['VenueCategory'].unique()[:]

array(['Pizza Place', 'Fast Food Restaurant', 'Sandwich Place',
       'Warehouse Store', 'Grocery Store', 'Gas Station',
       'Ice Cream Shop', 'Pharmacy', 'Filipino Restaurant', 'Café', 'Pub',
       'Vietnamese Restaurant', 'Italian Restaurant', 'Golf Course',
       'Park', 'Wings Joint', 'Liquor Store', 'Gym / Fitness Center',
       'Wine Shop', 'Yoga Studio', 'Farmers Market', 'Greek Restaurant',
       'Mexican Restaurant', 'Tea Room', 'Bar', 'Sushi Restaurant',
       'Clothing Store', 'Dog Run', 'Hotel', 'Restaurant', 'Nightclub',
       'Furniture / Home Store', 'Candy Store', 'American Restaurant',
       'Shopping Mall', 'Laser Tag', 'Coffee Shop', 'Sports Bar',
       'Toy / Game Store', 'Gun Shop', 'Electronics Store',
       'Burger Joint', 'Hardware Store', 'Steakhouse', 'Pool',
       'Bookstore', 'Department Store', 'Smoothie Shop',
       'Sporting Goods Shop', 'Bank', 'Juice Bar', 'Breakfast Spot',
       'Arts & Crafts Store', 'Video Game Store', 'Pet Store',
  

## Analyze each Neighborhood <a class="anchor" id="neigh"></a>

In [20]:
# one hot encoding
yyc_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
yyc_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [yyc_onehot.columns[-1]] + list(yyc_onehot.columns[:-1])
yyc_onehot = yyc_onehot[fixed_columns]

print(yyc_onehot.shape)
yyc_onehot.head()

(8193, 249)


Unnamed: 0,Neighborhoods,Accessories Store,Advertising Agency,African Restaurant,Airport Terminal,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Beach,Beer Garden,Big Box Store,Bistro,Board Shop,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Bus Station,Business Service,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Carpet Store,Casino,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Classroom,College Gym,Comedy Club,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Dry Cleaner,Electronics Store,Empanada Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Forest,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Gift Shop,Gluten-free Restaurant,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Shop,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Health & Beauty Service,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hockey Field,Hockey Rink,Home Service,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Inn,Intersection,Island,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Laser Tag,Latin American Restaurant,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Market,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Other Great Outdoors,Outdoor Sculpture,Paintball Field,Paper / Office Supplies Store,Park,Parking,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pier,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Print Shop,Professional & Other Places,Pub,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,River,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Science Museum,Seafood Restaurant,Shabu-Shabu Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Ski Area,Ski Lodge,Smoke Shop,Smoothie Shop,Snack Place,Soccer Stadium,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stables,Stadium,State / Provincial Park,Steakhouse,Student Center,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toy / Game Store,Trail,Train,Train Station,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waste Facility,Water Park,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Abbeydale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abbeydale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abbeydale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abbeydale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
4,Abbeydale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [21]:
yyc_grouped = yyc_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(yyc_grouped.shape)
yyc_grouped.head()

(142, 249)


Unnamed: 0,Neighborhoods,Accessories Store,Advertising Agency,African Restaurant,Airport Terminal,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Beach,Beer Garden,Big Box Store,Bistro,Board Shop,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Bus Station,Business Service,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Carpet Store,Casino,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Classroom,College Gym,Comedy Club,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Dry Cleaner,Electronics Store,Empanada Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Forest,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Gift Shop,Gluten-free Restaurant,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Shop,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Health & Beauty Service,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hockey Field,Hockey Rink,Home Service,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Inn,Intersection,Island,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Laser Tag,Latin American Restaurant,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Market,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Other Great Outdoors,Outdoor Sculpture,Paintball Field,Paper / Office Supplies Store,Park,Parking,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pier,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Print Shop,Professional & Other Places,Pub,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,River,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Science Museum,Seafood Restaurant,Shabu-Shabu Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Ski Area,Ski Lodge,Smoke Shop,Smoothie Shop,Snack Place,Soccer Stadium,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stables,Stadium,State / Provincial Park,Steakhouse,Student Center,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toy / Game Store,Trail,Train,Train Station,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waste Facility,Water Park,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Abbeydale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0
1,Acadia,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0
2,Albert Park/Radisson Heights,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.014925,0.0,0.0,0.0,0.0,0.029851,0.029851,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.014925,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.059701,0.0,0.0,0.0,0.0,0.0,0.0,0.059701,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.014925,0.0,0.0,0.074627,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.089552,0.0,0.029851,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.029851,0.0,0.014925,0.0,0.0,0.0,0.029851,0.0,0.0,0.014925,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925,0.029851,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.0,0.044776,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Altadore,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.020408,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Applewood Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
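
To make the two steps above easier to follow, here is a small sketch of one-hot encoding followed by a per-neighborhood mean on a toy table. The neighborhood names `A` and `B` and the six venues are invented for illustration; only the technique matches the cells above.

```python
import pandas as pd

# Toy venue list: four venues in neighborhood A, two in neighborhood B
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Park", "Pool", "Park", "Gym", "Pool", "Pool"],
})

# One column per category, 1 where the venue belongs to that category
onehot = pd.get_dummies(venues["VenueCategory"], dtype=int)
onehot.insert(0, "Neighborhoods", venues["Neighborhood"])

# The mean of a 0/1 column is the share of venues in that category
grouped = onehot.groupby("Neighborhoods").mean().reset_index()
print(grouped)
```

Because each venue contributes exactly one `1`, the category shares of every neighborhood add up to 1, which is why the values in the grouped table above can be read as frequencies.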


## Recreation center Data Frame <a class="anchor" id="rec"></a>

In the foursquare API there are other venue categories besides just "Recreation center" that are considered as such in Calgary. For example pools, sports fields, parks are also registers as recreation centers, and therefore these will be included in the analysis.

See https://data.calgary.ca/Recreation-and-Culture/Recreation-Facilities/hxfu-6d96/data and https://developer.foursquare.com/docs/build-with-foursquare/categories/ for reference.
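
Foursquare only returns the categories that actually occur near the queried points, so some of these category columns may be missing from the grouped frame. Below is a minimal sketch of guarding the column selection. The helper name `select_present` and the short `columns` list are hypothetical stand-ins for `yyc_grouped.columns`, not part of the original notebook.

```python
# Recreation-related Foursquare categories used in this analysis.
REC_CATEGORIES = [
    "Athletics & Sports", "Gym / Fitness Center", "Recreation Center", "Pool",
    "Hockey Rink", "Golf Course", "Skate Park", "Soccer Stadium",
]

def select_present(columns, wanted=REC_CATEGORIES):
    """Split `wanted` into (present, missing), preserving the order of `wanted`."""
    available = set(columns)
    present = [c for c in wanted if c in available]
    missing = [c for c in wanted if c not in available]
    return present, missing

# Hypothetical stand-in for list(yyc_grouped.columns).
columns = ["Neighborhoods", "Bakery", "Golf Course", "Pool", "Gym / Fitness Center"]

present, missing = select_present(columns)
print("present:", present)
print("missing:", missing)
```

With the real data, `yyc_grouped[["Neighborhoods"] + present]` would then select only the columns that exist, and `missing` shows which categories returned no venues.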

In [22]:
yyc_rec = yyc_grouped[["Neighborhoods","Athletics & Sports","Gym / Fitness Center","Recreation Center","Pool","Hockey Rink","Golf Course","Skate Park","Soccer Stadium"]]

In [23]:
yyc_rec.head()

Unnamed: 0,Neighborhoods,Athletics & Sports,Gym / Fitness Center,Recreation Center,Pool,Hockey Rink,Golf Course,Skate Park,Soccer Stadium
0,Abbeydale,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0
1,Acadia,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0
2,Albert Park/Radisson Heights,0.0,0.014925,0.0,0.0,0.0,0.014925,0.0,0.0
3,Altadore,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0
4,Applewood Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [24]:
# set number of clusters
kclusters = 3

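# keep only the numeric venue-frequency columns for clustering
# (the axis is given by keyword; the positional form is not accepted by recent pandas versions)
yyc_clustering = yyc_rec.drop(columns="Neighborhoods")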

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(yyc_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:15]

array([2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
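
Before interpreting the clusters, it can help to check how many neighborhoods fall into each one, since k-means can produce very small clusters. The sketch below uses only the 15 labels printed above. With the full data, one would pass `kmeans.labels_` instead of the hard-coded list.

```python
from collections import Counter

# The first 15 cluster labels shown in the output above.
labels = [2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Count how many neighborhoods were assigned to each cluster.
cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print(f"Cluster {cluster}: {size} neighborhoods")
```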

In [25]:
# create a new dataframe that includes the cluster label as well as the recreation venue frequencies for each neighborhood
yyc_merged = yyc_rec.copy()

# add clustering labels
yyc_merged.insert(0, 'Cluster Labels', kmeans.labels_)

yyc_df=df[['Neighborhood', 'Quadrant', 'Latitude','Longitude']].copy()
#yyc_merged.head()

# join the neighborhood coordinates (yyc_df) with the clustered recreation data on the neighborhood name
yyc_merged = yyc_df.join(yyc_merged.set_index('Neighborhoods'),on='Neighborhood')

yyc_merged.head()

Unnamed: 0,Neighborhood,Quadrant,Latitude,Longitude,Cluster Labels,Athletics & Sports,Gym / Fitness Center,Recreation Center,Pool,Hockey Rink,Golf Course,Skate Park,Soccer Stadium
0,Abbeydale,NE/SE,51.058836,-113.929413,2,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0
1,Acadia,SE,50.968655,-114.055587,0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0
2,Albert Park/Radisson Heights,SE,51.044845,-113.990195,0,0.0,0.014925,0.0,0.0,0.0,0.014925,0.0,0.0
3,Altadore,SW,51.015104,-114.100756,0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0
4,Applewood Park,SE/NE,51.044658,-113.928931,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Finally, let's visualize the resulting clusters <a class="anchor" id="vis"></a>

In [26]:
# create map
map_clusters = folium.Map(location=[yyc_latitude, yyc_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(yyc_merged['Latitude'], yyc_merged['Longitude'], yyc_merged['Neighborhood'], yyc_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Exploratory Data Analysis <a class="anchor" id="exp"></a>

In [27]:
current_rec = pd.read_csv("Recreation_Facilities.csv")

In [28]:
# create map of Calgary using latitude and longitude values
map_calgary = folium.Map(location=[yyc_latitude,yyc_longitude], zoom_start=11)

# add neighborhood markers (blue) to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_calgary)

# add current recreation facility markers (red) to map
for lat, lng, label in zip(current_rec['latitude'], current_rec['longitude'], current_rec['COMPLEX_NAME']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(map_calgary)

map_calgary   

**In the plot above, the blue markers are the neighborhoods and the red markers are the current recreation centers. We can see areas where clusters of neighborhoods have no center nearby.**
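
This visual impression can be backed up numerically by measuring, for each neighborhood, the great-circle distance to its nearest existing facility. The sketch below uses the haversine formula. The two neighborhood coordinates come from the tables above, while the facility coordinates and the helper names `haversine_km` and `nearest_facility_km` are hypothetical placeholders, not values from `current_rec`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_facility_km(point, facilities):
    """Distance from `point` to the closest facility in `facilities`."""
    return min(haversine_km(point[0], point[1], f[0], f[1]) for f in facilities)

# Neighborhood coordinates taken from the tables above.
neighborhoods = {
    "Abbeydale": (51.058836, -113.929413),
    "Altadore": (51.015104, -114.100756),
}

# Hypothetical facility coordinates, for illustration only.
facilities = [(51.0450, -114.0700), (51.0200, -114.1100)]

for name, point in neighborhoods.items():
    print(name, round(nearest_facility_km(point, facilities), 2), "km")
```

Neighborhoods whose nearest-facility distance is large would be the underserved areas that the map suggests.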

In [29]:
# create map of Calgary using latitude and longitude values
map_calgary = folium.Map(location=[yyc_latitude,yyc_longitude], zoom_start=11)


# build the heat map from the neighborhood coordinates held in df
heat_df = df[['Latitude', 'Longitude']]
heat_df = heat_df.dropna(axis=0, subset=['Latitude','Longitude'])

heat_data = [[row['Latitude'],row['Longitude']] for index, row in heat_df.iterrows()]

HeatMap(heat_data).add_to(map_calgary)

map_calgary

##### **From the maps above we can see that the majority of the recreational facilities are in central Calgary.**

## Examining Clusters <a class="anchor" id="exa"></a>

### Cluster 0

In [30]:
yyc_merged.loc[yyc_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Quadrant,Latitude,Longitude,Cluster Labels,Athletics & Sports,Gym / Fitness Center,Recreation Center,Pool,Hockey Rink,Golf Course,Skate Park,Soccer Stadium
1,Acadia,SE,50.968655,-114.055587,0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0
2,Albert Park/Radisson Heights,SE,51.044845,-113.990195,0,0.0,0.014925,0.0,0.0,0.0,0.014925,0.0,0.0
3,Altadore,SW,51.015104,-114.100756,0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0
4,Applewood Park,SE/NE,51.044658,-113.928931,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Arbour Lake,NW,51.136786,-114.202355,0,0.0,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0
6,Aspen Woods,SW,51.043119,-114.210185,0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0
7,Auburn Bay,SE,50.890605,-113.959565,0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0
8,Banff Trail,NW,51.071462,-114.111647,0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0
9,Bankview,SW,51.033887,-114.099518,0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0
10,Bayview,SW,50.973577,-114.112561,0,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0


### Cluster 1

In [31]:
yyc_merged.loc[yyc_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Quadrant,Latitude,Longitude,Cluster Labels,Athletics & Sports,Gym / Fitness Center,Recreation Center,Pool,Hockey Rink,Golf Course,Skate Park,Soccer Stadium
46,Crestmont,SW,51.085215,-114.265072,1,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0


### Cluster 2

In [32]:
cluster_2=yyc_merged.loc[yyc_merged['Cluster Labels'] == 2]
cluster_2

Unnamed: 0,Neighborhood,Quadrant,Latitude,Longitude,Cluster Labels,Athletics & Sports,Gym / Fitness Center,Recreation Center,Pool,Hockey Rink,Golf Course,Skate Park,Soccer Stadium
0,Abbeydale,NE/SE,51.058836,-113.929413,2,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0
21,Cambrian Heights,NW,51.087196,-114.088976,2,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0
28,Chaparral,SE,50.883594,-114.021265,2,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0
33,Citadel,NW,51.145091,-114.17573,2,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0
74,Greenwood/Greenbriar,NW,51.090697,-114.222957,2,0.032258,0.0,0.0,0.0,0.0,0.064516,0.0,0.0
79,Hidden Valley,NW,51.151085,-114.112672,2,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0
110,Monterey Park,NE,51.081473,-113.928459,2,0.0,0.045455,0.0,0.0,0.045455,0.045455,0.0,0.0
134,Renfrew,NE,51.062572,-114.041687,2,0.0,0.01087,0.0,0.0,0.0,0.032609,0.0,0.0
148,Sandstone Valley,NW,51.137325,-114.098842,2,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0
153,Shawnee Slopes,SW,50.925362,-114.083834,2,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0


### Observations <a class="anchor" id="obvs"></a>

Some observations from the analysis:

* Most of the recreation centers are located centrally in Calgary, with the vast majority in Cluster 0.
* The location in Cluster 1 is isolated on the outskirts of the city and can be excluded from consideration.
* Cluster 2 offers the most promising location for a new center. There are a limited number of centers, especially in the NW Quadrant.
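
To see how the Cluster 2 neighborhoods are spread across the city, the rows can be tallied by quadrant. This is a minimal sketch using only the ten rows displayed in the Cluster 2 table above. With the full frame, `cluster_2['Quadrant'].value_counts()` would give the same kind of tally.

```python
from collections import Counter

# (Neighborhood, Quadrant) pairs copied from the Cluster 2 table above.
cluster_2_rows = [
    ("Abbeydale", "NE/SE"), ("Cambrian Heights", "NW"), ("Chaparral", "SE"),
    ("Citadel", "NW"), ("Greenwood/Greenbriar", "NW"), ("Hidden Valley", "NW"),
    ("Monterey Park", "NE"), ("Renfrew", "NE"), ("Sandstone Valley", "NW"),
    ("Shawnee Slopes", "SW"),
]

# Count the displayed Cluster 2 neighborhoods per quadrant.
quadrant_counts = Counter(quadrant for _, quadrant in cluster_2_rows)
print(quadrant_counts.most_common())
```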




# Proposed location of new recreation center <a class="anchor" id="prop"></a>

### Neighborhoods in Cluster 2

In [33]:
# create map of Calgary using latitude and longitude values
map_calgary = folium.Map(location=[yyc_latitude,yyc_longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(cluster_2['Latitude'], cluster_2['Longitude'], cluster_2['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_calgary)
map_calgary

In [34]:
# create map of Calgary using latitude and longitude values
map_calgary = folium.Map(location=[yyc_latitude,yyc_longitude], zoom_start=11)

# circle the proposed location
folium.CircleMarker([51.129478, -114.170135],
                    radius=35,
                    popup='Proposed Location',
                    color='black',
                    ).add_to(map_calgary)
map_calgary

**Therefore, I recommend building in the region circled above, as it offers the best location for a new facility: it is close to nearby residents and is not already crowded with existing centers.**