# NarniaLaundry

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction
Our client NarniaLaundry is looking to expand to Toronto. The expansion follows a clear plan that has been proven in various countries and across cultures.
* The first step is to establish a beachhead and build brand recognition.
* The second step is to acquire existing laundry services until none remain.

NarniaLaundry experts have already identified several factors that make two out of three sites financially viable in other countries when they use the proprietary FSLS (Fast Sport Laundry Service) recipe. FSLS gives sports enthusiasts a place to wash their equipment and store it until the next use.

They are now looking for candidate locations that reduce the risk of failure of a given laundry station. Previous experience has shown a strong correlation between successful laundry stations and the density of gyms and coffee shops.
Therefore, our customer is looking for locations within walking distance of:

* places where people sweat a lot and arrive with gym bags
  * yoga studios 
  * gym studios
  * dojos 
  * martial arts studios
* coffee shops
  * stand up coffee shops
  * take away coffee
  * not coffee shops where they serve more than some light patisserie
  * not restaurants
  * not automatic coffee dispensers
  

In [1]:

!conda install -c conda-forge pandas --yes 
!conda install -c conda-forge geocoder --yes 
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge tqdm --yes
!conda install -c conda-forge folium=0.5.0 --yes
#!conda install -c conda-forge hdbscan --yes

# import itertools
# import numpy as np
# import matplotlib.pyplot as plt
# from matplotlib.ticker import NullFormatter
# import pandas as pd
# import numpy as np
# import matplotlib.ticker as ticker
# from sklearn import preprocessing
# %matplotlib inline


import numpy as np
import pandas as pd


## Data
Our dataset is a list of all the venues in Toronto that we are going to analyze.

We scrape Wikipedia for an up-to-date list of neighbourhoods and retrieve their coordinates. With the neighbourhoods and their coordinates in hand, we extract the venues around each of them from Foursquare.

With this data we determine the neighbourhoods that are most likely to support our client's business.


### Scrape the Wikipedia page

In [2]:
wiki = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
wiki[0].head()

Unnamed: 0,0,1,2
0,Postcode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village


### Create the frame

In [3]:
# The dataframe consists of three columns: PostalCode, Borough, and Neighbourhood
df = pd.DataFrame(wiki[0])
# the first row holds the header text: drop it and name the columns explicitly
df = df.drop(df.index[0])
df.rename(index=str, columns={0: 'PostalCode',
                             1: 'Borough',
                             2: 'Neighbourhood'},
          inplace=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront


### Clean the frame

In [4]:
df.replace({'Borough': 'Not assigned' }, 
           np.nan,
           inplace = True)
df.dropna(subset=["Borough"],
          inplace=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights


In [5]:
print(df.shape)

(211, 3)


#### If a neighbourhood is not assigned, use the borough name as the neighbourhood

In [6]:
df['Neighbourhood'] = np.where(df['Neighbourhood'] == 'Not assigned',
                               df['Borough'], 
                               df['Neighbourhood'])
assert(df.loc[df['Neighbourhood'] == 'Not assigned'].shape == (0,3))

#### Merge neighbourhoods in one postal code area.

In [7]:
df = df.groupby(['PostalCode','Borough'])['Neighbourhood'].apply(', '.join).reset_index()

### Adding Latitude and Longitude

In [8]:
gf = pd.read_csv('https://cocl.us/Geospatial_data')
gf.rename(index=str, columns={'Postal Code': 'PostalCode'}, inplace=True)
df = pd.merge(df, gf, how='inner', on=['PostalCode'])
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


### Showing the area we are interested in

In [9]:
latitude = df["Latitude"].mean()
longitude = df["Longitude"].mean()

import folium
tmap = folium.Map(location=[latitude, longitude], zoom_start=11, control_scale = True)

# add a red marker at the mean coordinate of all postal code areas
folium.CircleMarker(
    [latitude, longitude],
    radius=5,
    popup='Toronto',
    color='red',
    fill=True,
    fill_color='red',
    fill_opacity=0.6,
    parse_html=False).add_to(tmap) 

# add a blue marker for each postal code area
for lat, lng, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(tmap)
    
tmap

### Enhancing the data with info about other venues

In [10]:
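import os

# Foursquare credentials are read from the environment instead of being hard-coded;
# the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumption
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] #Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] #Foursquare Secret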
VERSION = '20180604'
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
    
import requests
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

venues = getNearbyVenues(names=df['Neighbourhood'],
                         latitudes=df['Latitude'],
                         longitudes=df['Longitude']
                        )
venues.shape

(2243, 7)
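
The request URL in the cell above is assembled with `str.format`. As a minimal sketch, the same query string could be built with `urllib.parse.urlencode` from the standard library, which takes care of escaping the parameter values. The helper name `build_explore_url` and the placeholder credentials are assumptions made for illustration; the endpoint and the parameter names are the ones used in the notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Return the explore URL for one neighbourhood centre (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,                # search radius in metres
        'limit': limit,                  # maximum number of venues returned
    }
    # safe=',' keeps the comma of the ll parameter readable
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


# placeholder credentials, coordinates of the first postal code area in the frame
url = build_explore_url('MY_ID', 'MY_SECRET', '20180604', 43.806686, -79.194353)
print(url)
```

Keeping the parameters in a dictionary also makes it easy to change the radius or the limit per call without touching the URL template.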

In [11]:
print('There are {} venues in {} unique categories.'.format(len(venues['Venue'].unique()),len(venues['Venue Category'].unique())))

There are 1455 venues in 274 unique categories.


In [12]:
# one hot encoding
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# add neighbourhood column back to dataframe
onehot['Neighbourhood'] = venues['Neighbourhood'] 

# move neighbourhood column to the first column
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1])
onehot = onehot[fixed_columns]

onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Highland Creek, Rouge Hill, Port Union",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [13]:
grouped = onehot.groupby('Neighbourhood').mean().reset_index()

In [14]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions after the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = grouped['Neighbourhood']

for ind in np.arange(grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Thai Restaurant,American Restaurant,Hotel,Gym,Bakery,Bar,Burger Joint
1,Agincourt,Breakfast Spot,Lounge,Sandwich Place,Skating Rink,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Yoga Studio
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Playground,Coffee Shop,Yoga Studio,Donut Shop,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Pharmacy,Coffee Shop,Sandwich Place,Fast Food Restaurant,Fried Chicken Joint,Beer Store,Pizza Place,Dumpling Restaurant,Drugstore
4,"Alderwood, Long Branch",Pizza Place,Pool,Pub,Coffee Shop,Gym,Pharmacy,Athletics & Sports,Skating Rink,Sandwich Place,Department Store
5,"Bathurst Manor, Downsview North, Wilson Heights",Coffee Shop,Sandwich Place,Supermarket,Middle Eastern Restaurant,Restaurant,Deli / Bodega,Bank,Fried Chicken Joint,Frozen Yogurt Shop,Fast Food Restaurant
6,Bayview Village,Bank,Café,Japanese Restaurant,Chinese Restaurant,Yoga Studio,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop
7,"Bedford Park, Lawrence Manor East",Fast Food Restaurant,Coffee Shop,Italian Restaurant,Greek Restaurant,Thai Restaurant,Liquor Store,Sandwich Place,Juice Bar,Butcher,Restaurant
8,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Steakhouse,Bakery,Beer Bar,Seafood Restaurant,Cheese Shop,Farmers Market,Café
9,"Birch Cliff, Cliffside West",General Entertainment,College Stadium,Café,Skating Rink,Yoga Studio,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant


## Methodology

Using this information we are going to build a frame of our target venues. 

We are going to create a sweat column and a coffee column that mark our venues according to the profile.

Uninteresting venues will be dropped.

In [15]:
def what_am_i(x):    
    sweat_venues = [ 'Rock Climbing Spot','Curling Ice', 'College Gym', 'Skate Park', 'Climbing Gym', 'Stadium','Field','Harbor / Marina', 'Martial Arts Dojo', 'Baseball Stadium', 'Basketball Stadium', 'College Rec Center', 'Dance Studio', 'Yoga Studio', 'Trail', 'Hockey Arena', 'Baseball Field', 'Dog Run', 'Golf Course', 'Pool','Soccer Field', 'Café', 'Skating Rink', 'College Stadium']
    coffee_venues = ['Sandwich Place','College Cafeteria', 'Mac & Cheese Joint', 'Gaming Cafe', 'Cupcake Shop', 'BBQ Joint', 'Cocktail Bar', 'Chocolate Shop', 'Creperie', 'Gourmet Shop', 'Coworking Space', 'Cheese Shop', 'Health Food Store', 'Pub', 'Bagel Shop', 'Sports Bar', 'Fish & Chips Shop', 'Gastropub', 'Moving Target', 'Food Truck', 'Diner', 'Gym / Fitness Center', 'Gym', 'Bubble Tea Shop', 'Ice Cream Shop', 'Cafeteria', 'Dessert Shop', 'Wings Joint', 'Burrito Place', 'Deli / Bodega', 'Food Court', 'Smoothie Shop', 'Juice Bar', 'Shopping Mall', 'Fried Chicken Joint','Bakery','Fast Food Restaurant','Bar','Breakfast Spot', 'Coffee Shop', 'Lounge', 'Burger Joint', 'Tea Room']
    drop_venues = ['Neighborhood', 'Print Shop', 'Construction & Landscaping', 'Pizza Place', 'Electronics Store', 'Spa', 'Mexican Restaurant', 'Rental Car Location', 'Medical Center', 'Intersection', 'Korean Restaurant', 'Hakka Restaurant', 'Caribbean Restaurant', 'Thai Restaurant', 'Athletics & Sports', 'Bank', 'Playground', 'Jewelry Store', 'Department Store', 'Convenience Store', 'Discount Store', 'Hobby Shop', 'Bus Line', 'Metro Station', 'Bus Station', 'Park', 'Motel', 'Movie Theater', 'American Restaurant', 'General Entertainment', 'Indian Restaurant', 'Chinese Restaurant', 'Latin American Restaurant', 'Pet Store', 'Vietnamese Restaurant', 'Thrift / Vintage Store', 'Furniture / Home Store', 'Smoke Shop', 'Auto Garage', 'Clothing Store', 'Italian Restaurant', 'Noodle House', 'Grocery Store', 'Pharmacy', 'Cosmetics Shop', 'Nail Salon', 'Mediterranean Restaurant', 'Toy / Game Store', 'Candy Store', 'Salon / Barbershop', 'Japanese Restaurant', 'Theater', 'Restaurant', 'Liquor Store', 'Sporting Goods Shop', 'Video Game Store', 'Asian Restaurant', 'Lingerie Store', 'Frozen Yogurt Shop', 'Optical Shop', 'Boutique', 'Tailor Shop', 'Supplement Shop', 'Women\'s Store', 'Men\'s Store', 'Luggage Store', 'Greek Restaurant', 'Steakhouse', 'Ramen Restaurant', 'Indonesian Restaurant', 'Plaza', 'Arts & Crafts Store', 'Sushi Restaurant', 'Middle Eastern Restaurant', 'Hotel', 'Food & Drink Shop', 'Bike Shop', 'Beer Store', 'Dim Sum Restaurant', 'Bridal Shop', 'Video Store', 'Supermarket', 'Massage Studio', 'Miscellaneous Shop', 'Airport', 'Bus Stop', 'Other Repair Shop', 'Portuguese Restaurant', 'Brewery', 'Warehouse Store', 'Fruit & Vegetable Store', 'Bookstore', 'Light Rail Station', 'Fish Market', 'Comfort Food Restaurant', 'Seafood Restaurant', 'Stationery Store', 'Music Store', 'Swim School', 'Gift Shop', 'Farmers Market', 'Costume Shop', 'Butcher', 'Taiwanese Restaurant', 'Market', 'Dive Bar', 'Theme Restaurant', 'Ethiopian Restaurant', 'Gay Bar', 'Adult Boutique',
                   'Sake Bar', 'Nightclub', 'Afghan Restaurant', 'Health & Beauty Service', 'Strip Club', 'Sculpture Garden', 'Historic Site', 'Performing Arts Venue', 'French Restaurant', 'Event Space', 'Shoe Store', 'Antique Shop', 'Comic Shop', 'Taco Place', 'Music Venue', 'Vegetarian / Vegan Restaurant', 'Beer Bar', 'Art Gallery', 'Concert Hall', 'Tanning Salon', 'Modern European Restaurant', 'Hookah Bar', 'Wine Bar', 'Other Great Outdoors', 'Lake', 'Office', 'Poutine Place', 'Church', 'Poke Place', 'Speakeasy', 'New American Restaurant', 'Hostel', 'Jazz Club', 'Camera Store', 'Molecular Gastronomy Restaurant', 'Salad Place', 'Fountain', 'German Restaurant', 'Museum', 'Belgian Restaurant', 'Bistro', 'Beach', 'Irish Pub', 'Art Museum', 'Falafel Restaurant', 'Donut Shop', 'Opera House', 'Monument / Landmark', 'General Travel', 'Colombian Restaurant', 'Brazilian Restaurant', 'Record Shop', 'Gluten-free Restaurant', 'Building', 'Aquarium', 'Train Station', 'Scenic Lookout', 'History Museum', 'Hotel Bar', 'Soup Place', 'Hardware Store', 'Garden', 'Jewish Restaurant', 'College Arts Building', 'Organic Grocery', 'Dumpling Restaurant', 'Snack Place', 'Doner Restaurant', 'Filipino Restaurant', 'Hospital', 'Airport Lounge', 'Airport Food Court', 'Airport Terminal', 'Airport Gate', 'Plane', 'Airport Service', 'Boat or Ferry', 'Accessories Store', 'Arcade', 'Baby Store', 'Wine Shop', 'Cuban Restaurant', 'Malay Restaurant', 'Argentinian Restaurant', 'Tapas Restaurant', 'Southern / Soul Food Restaurant', 'Check Cashing Service', 'Flea Market', 'Cajun / Creole Restaurant', 'Eastern European Restaurant', 'Food', 'Indie Movie Theater', 'South American Restaurant', 'College Auditorium', 'Garden Center', 'Auto Workshop', 'Recording Studio', 'River', 'Home Service', 'Flower Shop', 'Empanada Restaurant', 'Mobile Phone Shop', 'Drugstore']
    if x in sweat_venues:   
        return 'sweat'
    if x in coffee_venues:
        return 'coffee'
    if x in drop_venues:
        return 'drop'
    
def set_marker_color(x):
    if x == 'sweat':
        return 'yellow'
    if x == 'coffee':
        return 'green'
    return 'black'
    
    
isolated_venues = pd.DataFrame(data=venues)

isolated_venues['Type']= isolated_venues['Venue Category'].apply(what_am_i)
isolated_venues['marker_color'] = isolated_venues['Type'].apply(set_marker_color)
isolated_venues.drop(isolated_venues.loc[isolated_venues['Type']=='drop'].index, inplace=True)
# isolated_venues.drop(isolated_venues.loc[isolated_venues['Type']=='coffee'].index, inplace=True)
# isolated_venues.drop(isolated_venues.loc[isolated_venues['Type']=='sweat'].index, inplace=True)

#one hot encoding
onehot = pd.get_dummies(isolated_venues[['Type']], prefix="", prefix_sep="")

# use pd.concat to join the new columns with our original dataframe
isolated_venues = pd.concat([isolated_venues, onehot],axis=1)
isolated_venues.drop(['Venue Category','Venue','Neighbourhood','Neighbourhood Latitude','Neighbourhood Longitude'],axis=1, inplace=True)
isolated_venues = isolated_venues[isolated_venues['marker_color'] != 'black']

In [16]:
isolated_venues.head()

Unnamed: 0,Venue Latitude,Venue Longitude,Type,marker_color,coffee,sweat
0,43.807448,-79.199056,coffee,green,1,0
1,43.782533,-79.163085,coffee,green,1,0
9,43.7678,-79.190466,coffee,green,1,0
10,43.770037,-79.221156,coffee,green,1,0
11,43.770827,-79.223078,coffee,green,1,0


In [17]:
isolated_venues.shape

(918, 6)

Showing our venues on the map

In [18]:
latitude = isolated_venues["Venue Latitude"].mean()
longitude = isolated_venues["Venue Longitude"].mean()

import folium
tmap = folium.Map(location=[latitude, longitude], zoom_start=11, control_scale = True)

# add markers to map center of Toronto
folium.CircleMarker(
    [latitude, longitude],
    radius=5,
    popup='Toronto',
    color='red',
    fill=True,
    fill_color='red',
    fill_opacity=0.6,
    parse_html=False).add_to(tmap) 

# add a marker for each venue, colored by its type

for lat, lng, type, marker_color in zip(isolated_venues['Venue Latitude'], 
                                   isolated_venues['Venue Longitude'], 
                                   isolated_venues['Type'],
                                   isolated_venues['marker_color']):
    label = '{}'.format(type)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=marker_color,
        fill=True,
        fill_color=marker_color,
        fill_opacity=0.7,
        parse_html=False).add_to(tmap)
    
tmap

## Analysis

We cluster all the venues, regardless of their type, with DBSCAN. A venue becomes the core of a cluster when at least 5 venues (itself included) lie within a radius of about 200 m.

In [19]:
import numpy as np
from sklearn.cluster import DBSCAN

# DBSCAN is run on the venue coordinates expressed in radians
points = isolated_venues[['Venue Latitude','Venue Longitude']].values
rads = np.radians(points)

# eps = 0.2 km / 6371 km (mean Earth radius), i.e. about 200 m expressed in radians
db = DBSCAN(eps=0.2/6371., min_samples=5).fit(rads)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
# Number of venues labelled as noise (-1)
n_noise_ = list(labels).count(-1)
print('We have identified {} clusters'.format(n_clusters_))
print('We have identified {} venues as outliers that do not meet the requirements to form a cluster'.format(n_noise_))

isolated_venues['cluster'] = labels

We have identified 28 clusters
We have identified 245 venues as outliers that do not meet the requirements to form a cluster
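
To make the `eps=0.2/6371.` value passed to DBSCAN concrete, here is a minimal sketch that uses only the standard library. It converts a distance in kilometres to radians on a sphere of radius 6371 km and computes the haversine (great-circle) distance between two points. The function names `km_to_radians` and `haversine_km` are illustrative and are not part of the notebook; the two coordinates are venue positions that appear in the cluster table further down.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, the value used for eps in the notebook


def km_to_radians(km):
    """Angle (in radians) subtended by a distance on the Earth's surface."""
    return km / EARTH_RADIUS_KM


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


eps = km_to_radians(0.2)  # 200 m expressed in radians
d = haversine_km(43.769449, -79.413081, 43.768192, -79.413021)
print('eps in radians:', eps)
print('distance in km:', d, 'within 200 m:', d <= 0.2)
```

The DBSCAN call above relies on the default Euclidean metric applied to radians, so the 200 m radius holds along the north-south direction while the effective east-west radius shrinks by a factor of cos(latitude). scikit-learn also accepts `metric='haversine'` together with `algorithm='ball_tree'` on `[lat, lng]` pairs in radians, which would measure great-circle distances instead.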


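We assign a random color to each cluster and show the clusters on the map. The inner merge keeps only the venues whose cluster label has a color, so the venues that DBSCAN marked as noise (label -1) are left out.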

In [20]:
counts = isolated_venues.groupby('cluster').size()
import random

color_vector = ["#"+''.join([random.choice('0123456789ABCDEF') for j in range(6)])
             for i in range(counts.size)]
color_df = pd.DataFrame({'cluster': list(range(1, counts.size + 1)),        
                        'cluster_color': color_vector})


isolated_venues = pd.merge(isolated_venues, color_df, how='inner', on=['cluster'])
isolated_venues.head()

Unnamed: 0,Venue Latitude,Venue Longitude,Type,marker_color,coffee,sweat,cluster,cluster_color
0,43.769449,-79.413081,sweat,yellow,0,1,1,#058815
1,43.768192,-79.413021,coffee,green,1,0,1,#058815
2,43.76693,-79.41206,sweat,yellow,0,1,1,#058815
3,43.768627,-79.4131,coffee,green,1,0,1,#058815
4,43.76854,-79.412671,coffee,green,1,0,1,#058815
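
As a sketch of an alternative way to attach colors: scikit-learn's DBSCAN marks noise with `-1` and numbers clusters from `0`, so the mapping can be built from the labels that actually occur rather than from a fixed range. The `labels` list below is made-up sample data and `random_hex_color` is a hypothetical helper, neither is taken from the notebook.

```python
import random


def random_hex_color(rng):
    """Return a random color such as '#3186CC' (hypothetical helper)."""
    return '#' + ''.join(rng.choice('0123456789ABCDEF') for _ in range(6))


labels = [-1, 0, 0, 1, -1, 2, 1, 0]  # made-up sample labels, -1 is noise
rng = random.Random(42)              # seeded so the colors are reproducible

# one color per cluster label that is present, skipping the noise label
cluster_colors = {label: random_hex_color(rng)
                  for label in sorted(set(labels)) if label != -1}
print(cluster_colors)
```

A dictionary built this way can be applied to a label column with `Series.map`, and every cluster that DBSCAN found keeps its venues.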


In [21]:
latitude = isolated_venues["Venue Latitude"].mean()
longitude = isolated_venues["Venue Longitude"].mean()

import folium
tmap = folium.Map(location=[latitude, longitude], zoom_start=14, control_scale = True)
    

for lat, lng, type, marker_color, cluster_color, cluster in zip(isolated_venues['Venue Latitude'], 
                                   isolated_venues['Venue Longitude'], 
                                   isolated_venues['Type'],
                                   isolated_venues['marker_color'],
                                   isolated_venues['cluster_color'],
                                   isolated_venues['cluster']):
    label = '{},{}'.format(type,cluster)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=cluster_color,
        fill=True,
        fill_color=marker_color,
        fill_opacity=0.7,
        parse_html=False).add_to(tmap)
    
tmap

Looking at the clusters, we notice that some of them contain no gyms.
We drop them.

In [22]:
groups_df = pd.DataFrame(columns=['cluster','coffee_count','sweat_count'])

# count the coffee and sweat venues of every cluster
for i in range(1,counts.size):
    coffee_count = 0
    sweat_count = 0
    try:
        coffee_count = isolated_venues.loc[isolated_venues['cluster'] == i]['Type'].value_counts()['coffee']
    except KeyError:
        # no coffee venue in this cluster
        pass
    try:
        sweat_count = isolated_venues.loc[isolated_venues['cluster'] == i]['Type'].value_counts()['sweat']
    except KeyError:
        # no sweat venue in this cluster: remove its venues from the frame
        isolated_venues = isolated_venues[isolated_venues.cluster != i]
        print('drop: ' + str(i))
    if (sweat_count != 0):
        d = {'cluster': [i], 'coffee_count':[coffee_count], 'sweat_count': [sweat_count]}
        df2 = pd.DataFrame(data=d)
        groups_df = groups_df.append(df2, ignore_index=True)

groups_df.sort_values(by=['sweat_count', 'coffee_count'],ascending = False,inplace = True)
groups_df.head(10)


drop: 2
drop: 3
drop: 4
drop: 6
drop: 10
drop: 17
drop: 23
drop: 26
drop: 27
drop: 28


Unnamed: 0,cluster,coffee_count,sweat_count
6,12,335,60
12,19,24,9
11,18,5,4
13,20,16,3
3,8,13,3
1,5,12,3
14,21,4,3
10,16,2,3
5,11,13,2
16,24,13,2
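
The counting step can also be expressed without exception handling. The following is a minimal sketch in plain Python on made-up `(cluster, type)` pairs: it counts both venue types per cluster, keeps only clusters with at least one sweat venue, and sorts them by sweat count and then coffee count, as the cell above does. The names `sample_venues`, `type_counts`, `rows`, and `kept` are illustrative.

```python
from collections import Counter

# made-up sample data: (cluster label, venue type)
sample_venues = [
    (1, 'coffee'), (1, 'sweat'), (1, 'coffee'),
    (2, 'coffee'), (2, 'coffee'),
    (3, 'sweat'), (3, 'sweat'), (3, 'coffee'),
]

# a Counter returns 0 for missing keys, so no try/except is needed
type_counts = Counter(sample_venues)
clusters = sorted({cluster for cluster, _ in sample_venues})

# one (cluster, coffee_count, sweat_count) row per cluster
rows = [(c, type_counts[(c, 'coffee')], type_counts[(c, 'sweat')]) for c in clusters]

# drop clusters without sweat venues, then rank by sweat count and coffee count
kept = [row for row in rows if row[2] > 0]
kept.sort(key=lambda row: (row[2], row[1]), reverse=True)
print(kept)
```

In this sample, cluster 2 is dropped because it has no sweat venue, and cluster 3 ranks first because it has the most.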


Merge the groups dataframe into the isolated_venues

In [23]:
groups_df.cluster = groups_df.cluster.astype('int64')
groups_df.coffee_count = groups_df.coffee_count.astype('int64')
groups_df.sweat_count = groups_df.sweat_count.astype('int64')
isolated_venues = isolated_venues.merge(groups_df, left_on='cluster', right_on='cluster')

isolated_venues.sort_values(by=['sweat_count', 'coffee_count', 'cluster'],ascending = False,inplace = True)

top10 = groups_df.head(10).cluster.values

keeps = pd.DataFrame(columns = list(isolated_venues))
cluster_array = []
for i in top10:
    keep = isolated_venues[isolated_venues['cluster'] == i]
    cluster_array.append(keep)
    keeps = keeps.append(keep)
keeps.head()

Unnamed: 0,Venue Latitude,Venue Longitude,Type,marker_color,coffee,sweat,cluster,cluster_color,coffee_count,sweat_count
74,43.666956,-79.385297,sweat,yellow,0,1,12,#F2EE3D,335,60
75,43.665922,-79.385567,coffee,green,1,0,12,#F2EE3D,335,60
76,43.665905,-79.38572,coffee,green,1,0,12,#F2EE3D,335,60
77,43.664991,-79.384814,coffee,green,1,0,12,#F2EE3D,335,60
78,43.663452,-79.384125,coffee,green,1,0,12,#F2EE3D,335,60


Displaying the remaining clusters

In [24]:
latitude = keeps["Venue Latitude"].mean()
longitude = keeps["Venue Longitude"].mean()

import folium
tmap = folium.Map(location=[latitude, longitude], zoom_start=12, control_scale = True)
    

for lat, lng, type, marker_color, cluster_color, cluster in zip(keeps['Venue Latitude'], 
                                   keeps['Venue Longitude'], 
                                   keeps['Type'],
                                   keeps['marker_color'],
                                   keeps['cluster_color'],
                                   keeps['cluster']):
    label = '{},{}'.format(type,cluster)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=cluster_color,
        fill=True,
        fill_color=marker_color,
        fill_opacity=0.7,
        parse_html=False).add_to(tmap)
    
tmap

Looking at each cluster to analyze it, sorted by the number of gyms in the cluster

In [25]:
#Convenience method to draw
def draw_me_venues(venues_frame):
    latitude = venues_frame["Venue Latitude"].mean()
    longitude = venues_frame["Venue Longitude"].mean()

    tmap2 = folium.Map(location=[latitude, longitude], zoom_start=16, control_scale = True)

    for lat, lng, type, marker_color, cluster_color, cluster in zip(venues_frame['Venue Latitude'], 
                                       venues_frame['Venue Longitude'], 
                                       venues_frame['Type'],
                                       venues_frame['marker_color'],
                                       venues_frame['cluster_color'],
                                       venues_frame['cluster']):
        label = '{},{}'.format(type,cluster)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=cluster_color,
            fill=True,
            fill_color=marker_color,
            fill_opacity=0.7,
            parse_html=False).add_to(tmap2)
   
    print("sweat_count:" + str(venues_frame['sweat_count'].unique()))
    print("coffee_count:" + str(venues_frame['coffee_count'].unique()))
    
    return tmap2

In [26]:
draw_me_venues(cluster_array[0])

sweat_count:[60]
coffee_count:[335]


In [27]:
draw_me_venues(cluster_array[1])

sweat_count:[9]
coffee_count:[24]


In [28]:
draw_me_venues(cluster_array[2])

sweat_count:[4]
coffee_count:[5]


In [29]:
draw_me_venues(cluster_array[3])

sweat_count:[3]
coffee_count:[16]


In [30]:
draw_me_venues(cluster_array[4])

sweat_count:[3]
coffee_count:[13]


In [31]:
draw_me_venues(cluster_array[5])

sweat_count:[3]
coffee_count:[12]


In [32]:
draw_me_venues(cluster_array[6])

sweat_count:[3]
coffee_count:[4]


In [33]:
draw_me_venues(cluster_array[7])

sweat_count:[3]
coffee_count:[2]


In [34]:
draw_me_venues(cluster_array[8])

sweat_count:[2]
coffee_count:[13]


In [35]:
draw_me_venues(cluster_array[9])

sweat_count:[2]
coffee_count:[13]


## Results

The first eight clusters each contain 3 or more gyms in close proximity to coffee shops. They are a good starting point for further analysis of the profitability of the FSLS (Fast Sport Laundry Service) approach.

The first cluster needs more analysis. Preliminary analysis indicates that around the edges of the Financial District there are four sites with visually clustered gyms.

## Conclusion
Further analysis should be targeted around these points.

### High ranking
With more than 3 gyms close to each other.

#### Financial District Cluster (#12)
With 60 gyms and over 300 coffee shops in about 3 square kilometres. The most interesting sites are as follows:

* North of the Financial District, in the perimeter of Gerrard Street East, Church Street, Dundas Street East and Yonge Street, around the Devonian Pond, there are 4 clustered gyms and a fair number of coffee shops.
* East of the Financial District, in the perimeter of Adelaide Street, Lower Jarvis Street, Front Street East and Yonge Street, around One Toronto.
* South of the Financial District, between Union Station and the waterfront, east of Don Station and west of Yonge Street, around the Starbucks at 15 York Street.
* West of the Financial District, around the corner of Adelaide Street West and University Avenue.

#### Kensington Market Cluster (#19)
With 9 gyms and 24 coffee shops, in the perimeter of College Street, Spadina Avenue, Dundas Street and Bellevue Avenue, mostly along Augusta Avenue.

#### Harbord Village Cluster (#18)
With 4 gyms and 5 coffee shops on Harbord Street, between the Knox Presbyterian Church and the First Narayever Congregation Synagogue.


### Medium ranking
With exactly 3 gyms.


#### Trinity Bellwoods Cluster (#20)
With 3 gyms and 16 coffee shops on Ossington Avenue, between Dundas Street West and Queen Street West.

#### Mount Pleasant Cluster (#8)
With 3 gyms and 13 coffee shops.

#### Toronto-Danforth Cluster (#5)
With 3 gyms and 12 coffee shops.

#### Liberty Street Cluster (#21)
With 3 gyms and 4 coffee shops.

#### Elm Place Cluster (#16)
With 3 gyms and 2 coffee shops.
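
The ranking used in this conclusion can be written down as a small rule. The sketch below applies it to the `coffee_count` and `sweat_count` values reported for the top ten clusters in the Analysis section. The function `rank_cluster` and the 'low' label for clusters with fewer than 3 gyms are assumptions made for illustration.

```python
# cluster label -> (coffee_count, sweat_count), as reported in the Analysis section
cluster_counts = {
    12: (335, 60), 19: (24, 9), 18: (5, 4),
    20: (16, 3), 8: (13, 3), 5: (12, 3), 21: (4, 3), 16: (2, 3),
    11: (13, 2), 24: (13, 2),
}


def rank_cluster(sweat_count):
    """Map a gym count to a ranking tier (hypothetical helper)."""
    if sweat_count > 3:
        return 'high'
    if sweat_count == 3:
        return 'medium'
    return 'low'  # assumption: the report does not name this tier


# group the cluster labels by tier
ranking = {}
for cluster, (coffee_count, sweat_count) in cluster_counts.items():
    ranking.setdefault(rank_cluster(sweat_count), []).append(cluster)

for tier in ('high', 'medium', 'low'):
    print(tier, sorted(ranking.get(tier, [])))
```

Writing the rule as a function makes it straightforward to try another threshold, for example one that also takes the number of coffee shops into account.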