# Capstone Project: Office Supplier Expansion - New Location Data Analysis

## Introduction

### Background and Description of the Problem

The client is an office supplier in Toronto that services small to medium offices. Having saturated the market in its own area, it is now looking to expand into a nearby city. The two options under consideration are Quebec City and Montreal. As this will be the client's first business expansion, it is important to choose whichever of the two offers the better outcome.

Another aspect of the client's business must also be taken into account: customers expect delivery within the hour, so any new business location has to be close to many customer sites.

The data science problem can be described by the following questions:

Part One: Which of the two possible locations has the greater number of potential customers?

Part Two: In the city chosen in answer to part one, which city borough would offer the greatest catchment area for the business?

## Part One: 

## Which of the two possible locations has the greater number of potential customers?

### Data Collection: Location option one - Quebec

Importing Libraries

In [18]:
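import numpy as np
import types  # used by the Watson Studio data-access cells (removed for sharing)
import pandas as pd
from botocore.client import Config  # IBM Cloud Object Storage client configuration
import ibm_boto3

def __iter__(self): return 0  # Watson Studio boilerplate: adds the missing __iter__ so pandas accepts the file body as file-like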

Quebec borough information was scraped from Wikipedia, saved as a .csv file, and imported into a pandas DataFrame.

In [142]:
# The code was removed by Watson Studio for sharing.

In [143]:
df_data_1 = pd.read_csv(body)
df_data_1.head()

|   | Postal Code | Borough |
|---|---|---|
| 0 | G1J | La Cité-Limoilou |
| 1 | G8T | Les Rivières |
| 2 | G1V | Sainte-Foy |
| 3 | G1H | Charlesbourg |
| 4 | G1E | Beauport |


In [144]:
df_data_1.shape

(6, 2)

Importing the remaining dependencies.

In [21]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json  # handle JSON responses

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests  # HTTP requests to the Foursquare API
from pandas import json_normalize  # flatten JSON into a DataFrame (pandas >= 1.0; older versions: pandas.io.json)

import matplotlib.cm as cm  # colour maps for the cluster map
import matplotlib.colors as colors

from sklearn.cluster import KMeans  # k-means clustering

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Geographic coordinates for the Quebec boroughs were imported from a .csv file.

In [145]:
# The code was removed by Watson Studio for sharing.

In [146]:
df_geo = pd.read_csv(body)
df_geo.head()

|   | Postal Code | Longitude | Latitude |
|---|---|---|---|
| 0 | G1J | -71.21341 | 46.8349 |
| 1 | G8T | -71.30556 | 46.830556 |
| 2 | G1V | -71.28764 | 46.77857 |
| 3 | G1H | -71.25796 | 46.85268 |
| 4 | G1E | -71.19266 | 46.85851 |


The two dataframes were merged.
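
Because this is a left join on 'Postal Code', any borough whose postal code is missing from the geo file would be kept but end up with missing (NaN) coordinates. A small sketch of how pandas' `indicator=True` option can flag such rows; the first two rows reuse values from the tables above, while the third postal code ('Z0Z') and its borough name are hypothetical, added only to create an unmatched row:

```python
import pandas as pd

# Two real rows from the Quebec tables plus a hypothetical unmatched postal code.
boroughs = pd.DataFrame({
    "Postal Code": ["G1J", "G1H", "Z0Z"],
    "Borough": ["La Cité-Limoilou", "Charlesbourg", "Hypothetical Borough"],
})
geo = pd.DataFrame({
    "Postal Code": ["G1J", "G1H"],
    "Longitude": [-71.21341, -71.25796],
    "Latitude": [46.8349, 46.85268],
})

# Same left join as the notebook, with indicator=True adding a '_merge' column.
merged = pd.merge(boroughs, geo, on="Postal Code", how="left", indicator=True)

# Rows tagged 'left_only' found no coordinates in the geo table.
unmatched = merged.loc[merged["_merge"] == "left_only", "Borough"].tolist()
print(unmatched)
```

On the real data, an empty `unmatched` list confirms that every borough received coordinates before mapping.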

In [147]:
df_quebec = pd.merge(df_data_1, df_geo, on='Postal Code', how='left')  # left join keeps every borough
df_quebec.head()

|   | Postal Code | Borough | Longitude | Latitude |
|---|---|---|---|---|
| 0 | G1J | La Cité-Limoilou | -71.21341 | 46.8349 |
| 1 | G8T | Les Rivières | -71.30556 | 46.830556 |
| 2 | G1V | Sainte-Foy | -71.28764 | 46.77857 |
| 3 | G1H | Charlesbourg | -71.25796 | 46.85268 |
| 4 | G1E | Beauport | -71.19266 | 46.85851 |


Using the geopy library to get the latitude and longitude of Quebec.

In [148]:
address = 'Quebec, Quebec, Canada'

geolocator = Nominatim(user_agent="quebec_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Quebec are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Quebec are 46.8259601, -71.2352226.


Creating a map of Quebec with its boroughs superimposed on top.

In [149]:
map_quebec = folium.Map(location=[latitude, longitude], zoom_start=10)

quebec = df_quebec

for lat, lng, borough in zip(quebec['Latitude'], quebec['Longitude'], quebec['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_quebec)  
    
map_quebec

Using the Foursquare API, a search was made for potential customers in Foursquare's Office category (category ID 4bf58dd8d48988d124941735), within a five-mile (8,046 m) radius of the centre of Quebec.
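
The query URL in the notebook is assembled by string formatting. As a sketch of an alternative, the helper below (a hypothetical `build_explore_url`, not part of the notebook) builds the same venues/explore URL with the standard library's `urlencode`, which escapes each value, and converts the five-mile radius to metres (1 mile = 1,609.344 m, which truncates to the 8046 used below). The credentials and the version string are placeholders; the notebook's real values are in a cell removed for sharing:

```python
from urllib.parse import urlencode, urlparse, parse_qs

METRES_PER_MILE = 1609.344  # international mile
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"
OFFICE_CATEGORY_ID = "4bf58dd8d48988d124941735"  # 'Office' category ID used in the notebook


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius_miles=5, limit=100, category_id=OFFICE_CATEGORY_ID):
    """Build a venues/explore URL like the notebook's, taking the radius in miles."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": int(radius_miles * METRES_PER_MILE),  # the API expects metres
        "limit": limit,
        "categoryId": category_id,
    }
    # urlencode escapes every value, unlike manual string formatting.
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials and version; coordinates are Quebec's from the geopy cell.
url = build_explore_url("MY_CLIENT_ID", "MY_CLIENT_SECRET", "20180605",
                        46.8259601, -71.2352226)
query = parse_qs(urlparse(url).query)  # decode the query string again to inspect it
print(query["radius"])  # ['8046']
```

With the `requests` library, passing such a dictionary through the `params` argument of `requests.get` has the same effect.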

In [150]:
# The code was removed by Watson Studio for sharing.

In [151]:
LIMIT = 100  # maximum number of venues to request

radius = 8046  # five miles in metres

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId=4bf58dd8d48988d124941735'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d050885dd57972e8c4e5e49'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4b59ff44f964a520a7a628e3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/default_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d124941735',
         'name': 'Office',
         'pluralName': 'Offices',
         'primary': True,
         'shortName': 'Office'}],
       'id': '4b59ff44f964a520a7a628e3',
       'location': {'address': '6655 boulevard Pierre-Bertrand',
        'cc': 'CA',
        'city': 'Québec',
        'country': 'Canada',
        'crossStreet': 'Boulevard Lebourgneuf',
        'distance': 1570,
        'formattedAddress': ['6655 boulevard Pierre-Bertrand (Boulevard Lebourgneuf)',
         'Quebec QC G2K 1M1',
         '

The resulting .json file was normalized and imported into a pandas dataframe.
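
The next cell, and the matching Montreal cell later on, call `get_category_type`, whose definition is not shown; it was presumably in a cell removed for sharing. A minimal sketch, assuming the helper should return the name of a venue's first listed category, or None when the venue has no categories:

```python
def get_category_type(row):
    """Return the first category name of a flattened Foursquare venue row.

    Assumed behaviour (the notebook's own definition is not shown): `row` is a
    mapping, such as a DataFrame row passed by .apply(axis=1), that holds a list
    of category dicts under 'venue.categories', or under 'categories' once the
    columns have been renamed.
    """
    if "categories" in row:
        categories = row["categories"]
    else:
        categories = row["venue.categories"]
    if not categories:
        return None  # venue without any category
    return categories[0]["name"]


row = {"venue.categories": [{"id": "4bf58dd8d48988d124941735", "name": "Office"}]}
print(get_category_type(row))  # Office
```

In the notebook it is applied row by row, as in `nearby_offices_quebec.apply(get_category_type, axis=1)`.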

In [152]:
offices = results['response']['groups'][0]['items']  # list of venue items returned by Foursquare

nearby_offices_quebec = json_normalize(offices)  # flatten the nested JSON into columns

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_offices_quebec = nearby_offices_quebec.loc[:, filtered_columns]

nearby_offices_quebec['venue.categories'] = nearby_offices_quebec.apply(get_category_type, axis=1)  # keep only the category name

nearby_offices_quebec.columns = [col.split(".")[-1] for col in nearby_offices_quebec.columns]  # 'venue.name' -> 'name'

nearby_offices_quebec.head()

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Absolu | Office | 46.814336 | -71.223545 |
| 1 | Le Soleil | Office | 46.813975 | -71.2243 |
| 2 | CGI Québec | Office | 46.813938 | -71.224311 |
| 3 | Festival d'été de Québec - siège social | Office | 46.815324 | -71.221039 |
| 4 | CSN Conseil Central Québec | Office | 46.812067 | -71.227938 |


In [153]:
nearby_offices_quebec.shape

(43, 4)

The resulting DataFrame shows 43 potential customers within a five-mile radius of the centre of Quebec.

### Data Collection: Location option two - Montreal

The same process was then repeated for the city of Montreal.

Montreal borough information was scraped from Wikipedia, saved as a .csv file, and imported into a pandas DataFrame.

In [154]:
# The code was removed by Watson Studio for sharing.

In [155]:
df_data_2 = pd.read_csv(body)
df_data_2.head()

|   | Postal code | Borough |
|---|---|---|
| 0 | H3M | Ahuntsic-Cartierville |
| 1 | H1K | Anjou |
| 2 | H3W | Côte-des-Neiges–Notre-Dame-de-Grâce |
| 3 | H8S | Lachine |
| 4 | H8N | Lasalle |


In [73]:
df_data_2.shape

(19, 2)

Geographic coordinates for the Montreal boroughs were imported from a .csv file.

In [156]:
# The code was removed by Watson Studio for sharing.

In [157]:
df_geo_montreal = pd.read_csv(body)
df_geo_montreal.head()

|   | Postal code | Latitude | Longitude |
|---|---|---|---|
| 0 | H3M | 45.53856 | -73.69266 |
| 1 | H1K | 45.60933 | -73.54508 |
| 2 | H3W | 45.49069 | -73.63321 |
| 3 | H8S | 45.43665 | -73.6851 |
| 4 | H8N | 45.4389 | -73.62583 |


The two dataframes were merged.

In [158]:
df_montreal = pd.merge(df_data_2, df_geo_montreal, on='Postal code', how='left')  # left join keeps every borough
df_montreal.head()

|   | Postal code | Borough | Latitude | Longitude |
|---|---|---|---|---|
| 0 | H3M | Ahuntsic-Cartierville | 45.53856 | -73.69266 |
| 1 | H1K | Anjou | 45.60933 | -73.54508 |
| 2 | H3W | Côte-des-Neiges–Notre-Dame-de-Grâce | 45.49069 | -73.63321 |
| 3 | H8S | Lachine | 45.43665 | -73.6851 |
| 4 | H8N | Lasalle | 45.4389 | -73.62583 |


Using the geopy library to get the latitude and longitude of Montreal.

In [159]:
address = 'Montreal, Quebec, Canada'

geolocator = Nominatim(user_agent="montreal_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Montreal are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Montreal are 45.4972159, -73.6103642.


Creating a map of Montreal with its boroughs superimposed on top.

In [160]:
map_montreal = folium.Map(location=[latitude, longitude], zoom_start=10)

montreal = df_montreal

for lat, lng, borough in zip(montreal['Latitude'], montreal['Longitude'], montreal['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_montreal)  
    
map_montreal

Using the Foursquare API, a search was made for potential customers in Foursquare's Office category (category ID 4bf58dd8d48988d124941735), within a five-mile (8,046 m) radius of the centre of Montreal.

In [161]:
LIMIT = 200  # maximum number of venues to request

radius = 8046  # five miles in metres

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId=4bf58dd8d48988d124941735'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d0508be4c1f672bccb6ab81'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4c53143130f92d7f7ac433b8-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/default_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d124941735',
         'name': 'Office',
         'pluralName': 'Offices',
         'primary': True,
         'shortName': 'Office'}],
       'id': '4c53143130f92d7f7ac433b8',
       'location': {'address': '6250 rue Hutchison',
        'cc': 'CA',
        'city': 'Montréal',
        'country': 'Canada',
        'distance': 3141,
        'formattedAddress': ['6250 rue Hutchison',
         'Montréal QC H2V 4C5',
         'Canada'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 45.52540441831818

The resulting .json file was normalized and imported into a pandas dataframe.

In [162]:
offices = results['response']['groups'][0]['items']  # list of venue items returned by Foursquare

nearby_offices_montreal = json_normalize(offices)  # flatten the nested JSON into columns

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_offices_montreal = nearby_offices_montreal.loc[:, filtered_columns]

nearby_offices_montreal['venue.categories'] = nearby_offices_montreal.apply(get_category_type, axis=1)  # keep only the category name

nearby_offices_montreal.columns = [col.split(".")[-1] for col in nearby_offices_montreal.columns]  # 'venue.name' -> 'name'

nearby_offices_montreal.tail()

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 95 | MPC | Office | 45.498711 | -73.555387 |
| 96 | Morgan Stanley | Office | 45.498053 | -73.555189 |
| 97 | Camden | Office | 45.502593 | -73.555518 |
| 98 | Saputo Head Office Canada | Office | 45.522396 | -73.654251 |
| 99 | WB Games Montréal | Office | 45.516739 | -73.559559 |


In [163]:
nearby_offices_montreal.shape

(100, 4)

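The resulting DataFrame shows 100 potential customers within a five-mile radius of the centre of Montreal. Because 200 results were requested but exactly 100 were returned, this figure is probably capped by the API and is best read as a lower bound; either way, it exceeds Quebec's 43.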

### Results of Part One

In [164]:
print('{} potential customers were returned by Foursquare in Montreal.'.format(nearby_offices_montreal.shape[0]))
print('{} potential customers were returned by Foursquare in Quebec.'.format(nearby_offices_quebec.shape[0]))

100 potential customers were returned by Foursquare in Montreal.
43 potential customers were returned by Foursquare in Quebec.


### Conclusion of Part One

Montreal should be analysed further, as it offers the greater number of potential customers for the client.

## Part Two: 

## In the city chosen in answer to part one, which city borough would offer the greatest catchment area for the business?

### Data Collection

In [165]:
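def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return Office-category venues within `radius` metres of each borough centre (uses the global LIMIT)."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator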
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId=4bf58dd8d48988d124941735'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_offices = pd.DataFrame([item for venue_list in venues_list for item in venue_list])  # one row per venue
    nearby_offices.columns = ['Borough', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_offices)

In [166]:
montreal_offices = getNearbyVenues(names=df_montreal['Borough'],
                                   latitudes=df_montreal['Latitude'],
                                   longitudes=df_montreal['Longitude'])

Ahuntsic-Cartierville
Anjou
Côte-des-Neiges–Notre-Dame-de-Grâce
Lachine
Lasalle
Le Plateau-Mont-Royal
Le Sud-Ouest
L'Île-Bizard–Sainte-Geneviève
Mercier–Hochelaga-Maisonneuve
Montréal-Nord
Outremont
Pierrefonds-Roxboro
Rivière-des-Prairies
Rosemont–La Petite-Patrie
Saint-Laurent
Saint-Léonard
Verdun
Ville-Marie
Villeray–Saint-Michel–Parc-Extension


In [167]:
print(montreal_offices.shape)
montreal_offices.head()

(67, 7)


|   | Borough | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Ahuntsic-Cartierville | 45.53856 | -73.69266 | OSHARA INC | 45.536596 | -73.693092 | Office |
| 1 | Ahuntsic-Cartierville | 45.53856 | -73.69266 | Peter K Photography - (Montreal Wedding Photog... | 45.541345 | -73.695517 | Office |
| 2 | Ahuntsic-Cartierville | 45.53856 | -73.69266 | santé de pieds podologie | 45.542016 | -73.694763 | Coworking Space |
| 3 | Côte-des-Neiges–Notre-Dame-de-Grâce | 45.49069 | -73.63321 | TD Canada Trust | 45.491823 | -73.631786 | Office |
| 4 | Lachine | 45.43665 | -73.6851 | Vitrerie Chatelle/Simard | 45.438562 | -73.681563 | Office |


In [168]:
montreal_offices.groupby('Borough').count()

| Borough | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Ahuntsic-Cartierville | 3 | 3 | 3 | 3 | 3 | 3 |
| Côte-des-Neiges–Notre-Dame-de-Grâce | 1 | 1 | 1 | 1 | 1 | 1 |
| Lachine | 5 | 5 | 5 | 5 | 5 | 5 |
| Lasalle | 8 | 8 | 8 | 8 | 8 | 8 |
| Le Plateau-Mont-Royal | 6 | 6 | 6 | 6 | 6 | 6 |
| Montréal-Nord | 1 | 1 | 1 | 1 | 1 | 1 |
| Outremont | 2 | 2 | 2 | 2 | 2 | 2 |
| Pierrefonds-Roxboro | 1 | 1 | 1 | 1 | 1 | 1 |
| Rosemont–La Petite-Patrie | 27 | 27 | 27 | 27 | 27 | 27 |
| Saint-Laurent | 4 | 4 | 4 | 4 | 4 | 4 |

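As a simple cross-check on catchment, the per-borough counts above can also be ranked directly. A stdlib sketch using the counts copied from that table (only the ten boroughs shown there; each count covers venues within 500 m of the borough's coordinates). In the notebook itself, `montreal_offices['Borough'].value_counts()` would give the same ranking from the DataFrame:

```python
from collections import Counter

# Venue counts per borough, copied from the groupby().count() output above.
borough_counts = Counter({
    "Ahuntsic-Cartierville": 3,
    "Côte-des-Neiges–Notre-Dame-de-Grâce": 1,
    "Lachine": 5,
    "Lasalle": 8,
    "Le Plateau-Mont-Royal": 6,
    "Montréal-Nord": 1,
    "Outremont": 2,
    "Pierrefonds-Roxboro": 1,
    "Rosemont–La Petite-Patrie": 27,
    "Saint-Laurent": 4,
})

# most_common() orders boroughs by venue count, highest first.
for borough, count in borough_counts.most_common(3):
    print(f"{borough}: {count}")
```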

Analyze each borough of Montreal

To segment and cluster the Montreal boroughs, the venue categories are one-hot encoded into a new DataFrame, with one row per venue and the borough name moved to the first column.

In [169]:
montreal_onehot = pd.get_dummies(montreal_offices[['Venue Category']], prefix="", prefix_sep="")

montreal_onehot['Borough'] = montreal_offices['Borough'] 

fixed_columns = [montreal_onehot.columns[-1]] + list(montreal_onehot.columns[:-1])
montreal_onehot = montreal_onehot[fixed_columns]

montreal_onehot.head()

|   | Borough | Advertising Agency | Coworking Space | Electronics Store | Office | Tech Startup |
|---|---|---|---|---|---|---|
| 0 | Ahuntsic-Cartierville | 0 | 0 | 0 | 1 | 0 |
| 1 | Ahuntsic-Cartierville | 0 | 0 | 0 | 1 | 0 |
| 2 | Ahuntsic-Cartierville | 0 | 1 | 0 | 0 | 0 |
| 3 | Côte-des-Neiges–Notre-Dame-de-Grâce | 0 | 0 | 0 | 1 | 0 |
| 4 | Lachine | 0 | 0 | 0 | 1 | 0 |


In [170]:
montreal_grouped = montreal_onehot.groupby('Borough').mean().reset_index()
montreal_grouped

|   | Borough | Advertising Agency | Coworking Space | Electronics Store | Office | Tech Startup |
|---|---|---|---|---|---|---|
| 0 | Ahuntsic-Cartierville | 0.0 | 0.333333 | 0.0 | 0.666667 | 0.0 |
| 1 | Côte-des-Neiges–Notre-Dame-de-Grâce | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | Lachine | 0.0 | 0.2 | 0.0 | 0.8 | 0.0 |
| 3 | Lasalle | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 4 | Le Plateau-Mont-Royal | 0.0 | 0.666667 | 0.0 | 0.333333 | 0.0 |
| 5 | Montréal-Nord | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 6 | Outremont | 0.0 | 0.5 | 0.0 | 0.5 | 0.0 |
| 7 | Pierrefonds-Roxboro | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 8 | Rosemont–La Petite-Patrie | 0.037037 | 0.148148 | 0.037037 | 0.703704 | 0.074074 |
| 9 | Saint-Laurent | 0.25 | 0.0 | 0.0 | 0.75 | 0.0 |

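The one-hot encoding followed by `groupby('Borough').mean()` computes, for each borough, the share of its venues that fall in each category. A stdlib-only sketch of the same calculation (a hypothetical `category_frequencies` helper, not the notebook's code), using the Ahuntsic-Cartierville and Lachine venue categories listed in the outputs of this section:

```python
from collections import Counter, defaultdict

# (borough, venue category) pairs taken from the montreal_offices rows.
venues = [
    ("Ahuntsic-Cartierville", "Office"),
    ("Ahuntsic-Cartierville", "Office"),
    ("Ahuntsic-Cartierville", "Coworking Space"),
    ("Lachine", "Office"),
    ("Lachine", "Office"),
    ("Lachine", "Office"),
    ("Lachine", "Coworking Space"),
    ("Lachine", "Office"),
]


def category_frequencies(pairs):
    """Share of each category among a borough's venues (the mean of one-hot columns)."""
    counts = defaultdict(Counter)
    for borough, category in pairs:
        counts[borough][category] += 1
    all_categories = sorted({category for _, category in pairs})
    freqs = {}
    for borough, counter in counts.items():
        total = sum(counter.values())
        # A category absent from a borough gets 0.0, like a one-hot column of zeros.
        freqs[borough] = {c: counter[c] / total for c in all_categories}
    return freqs


freqs = category_frequencies(venues)
print(freqs["Lachine"])  # {'Coworking Space': 0.2, 'Office': 0.8}
```

These values match the Lachine and Ahuntsic-Cartierville rows of `montreal_grouped`.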

In [171]:
montreal_grouped.shape

(13, 6)

Printing each borough with its five most common venue categories.

In [172]:
num_top_offices = 5

for Borough in montreal_grouped['Borough']:
    print("----"+Borough+"----")
    temp = montreal_grouped[montreal_grouped['Borough'] == Borough].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_offices))
    print('\n')

----Ahuntsic-Cartierville----
                venue  freq
0              Office  0.67
1     Coworking Space  0.33
2  Advertising Agency  0.00
3   Electronics Store  0.00
4        Tech Startup  0.00


----Côte-des-Neiges–Notre-Dame-de-Grâce----
                venue  freq
0              Office   1.0
1  Advertising Agency   0.0
2     Coworking Space   0.0
3   Electronics Store   0.0
4        Tech Startup   0.0


----Lachine----
                venue  freq
0              Office   0.8
1     Coworking Space   0.2
2  Advertising Agency   0.0
3   Electronics Store   0.0
4        Tech Startup   0.0


----Lasalle----
                venue  freq
0              Office   1.0
1  Advertising Agency   0.0
2     Coworking Space   0.0
3   Electronics Store   0.0
4        Tech Startup   0.0


----Le Plateau-Mont-Royal----
                venue  freq
0     Coworking Space  0.67
1              Office  0.33
2  Advertising Agency  0.00
3   Electronics Store  0.00
4        Tech Startup  0.00


----Montréal-N

##### Place the results into a pandas DataFrame sorted in descending order, and display the top three venue categories for each borough.

In [173]:
def return_most_common_venues(row, num_top_offices):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_offices]

In [174]:
num_top_offices = 3

indicators = ['st', 'nd', 'rd']

columns = ['Borough']
for ind in np.arange(num_top_offices):
    try:
        columns.append('{}{} Most Common Office'.format(ind+1, indicators[ind]))
    except IndexError:  # beyond the 3rd position, fall back to the 'th' suffix
        columns.append('{}th Most Common Office'.format(ind+1))

boroughs_offices_sorted = pd.DataFrame(columns=columns)
boroughs_offices_sorted['Borough'] = montreal_grouped['Borough']

for ind in np.arange(montreal_grouped.shape[0]):
    boroughs_offices_sorted.iloc[ind, 1:] = return_most_common_venues(montreal_grouped.iloc[ind, :], num_top_offices)

boroughs_offices_sorted.head()

|   | Borough | 1st Most Common Office | 2nd Most Common Office | 3rd Most Common Office |
|---|---|---|---|---|
| 0 | Ahuntsic-Cartierville | Office | Coworking Space | Tech Startup |
| 1 | Côte-des-Neiges–Notre-Dame-de-Grâce | Office | Tech Startup | Electronics Store |
| 2 | Lachine | Office | Coworking Space | Tech Startup |
| 3 | Lasalle | Office | Tech Startup | Electronics Store |
| 4 | Le Plateau-Mont-Royal | Coworking Space | Office | Tech Startup |


Next, k-means is run to group the boroughs into five clusters, using each borough's venue-category frequencies as features, and a new DataFrame is created that includes each borough's cluster label.

In [175]:
kclusters = 5

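montreal_grouped_clustering = montreal_grouped.drop(columns='Borough')  # keep only the category frequencies

kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(montreal_grouped_clustering)  # n_init=10 matches the older scikit-learn default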

kmeans.labels_[0:10] 

array([3, 0, 3, 0, 1, 0, 1, 0, 3, 4], dtype=int32)
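
For readers unfamiliar with k-means, here is a minimal pure-Python sketch of Lloyd's algorithm, the basic assign-then-update iteration behind scikit-learn's `KMeans`. It is not a reimplementation of scikit-learn, which also uses k-means++ initialisation and several restarts; here the first k points serve as initial centroids, an assumption made to keep the example deterministic. The toy input uses six boroughs from `montreal_grouped`, keeping only their Coworking Space and Office frequencies:

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm.

    Simplification: the first k points are the initial centroids, so the
    result is deterministic. Returns the list of cluster labels.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# (Coworking Space, Office) frequencies for six boroughs, from montreal_grouped.
boroughs = ["Ahuntsic-Cartierville", "Côte-des-Neiges–Notre-Dame-de-Grâce", "Lachine",
            "Lasalle", "Le Plateau-Mont-Royal", "Outremont"]
points = [(0.333333, 0.666667), (0.0, 1.0), (0.2, 0.8),
          (0.0, 1.0), (0.666667, 0.333333), (0.5, 0.5)]

labels = kmeans(points, k=2)
for borough, label in zip(boroughs, labels):
    print(label, borough)
```

On this toy input the sketch groups Côte-des-Neiges–Notre-Dame-de-Grâce, Lachine and Lasalle together and puts the three boroughs with a larger coworking share in the other cluster. The notebook's five-cluster scikit-learn run on all five categories is a different computation and need not agree with it.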

In [176]:
boroughs_offices_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

montreal_merged = montreal_offices

montreal_merged = montreal_merged.join(boroughs_offices_sorted.set_index('Borough'), on='Borough',how='right')

montreal_merged.head()

|   | Borough | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | Cluster Labels | 1st Most Common Office | 2nd Most Common Office | 3rd Most Common Office |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ahuntsic-Cartierville | 45.53856 | -73.69266 | OSHARA INC | 45.536596 | -73.693092 | Office | 3 | Office | Coworking Space | Tech Startup |
| 1 | Ahuntsic-Cartierville | 45.53856 | -73.69266 | Peter K Photography - (Montreal Wedding Photog... | 45.541345 | -73.695517 | Office | 3 | Office | Coworking Space | Tech Startup |
| 2 | Ahuntsic-Cartierville | 45.53856 | -73.69266 | santé de pieds podologie | 45.542016 | -73.694763 | Coworking Space | 3 | Office | Coworking Space | Tech Startup |
| 3 | Côte-des-Neiges–Notre-Dame-de-Grâce | 45.49069 | -73.63321 | TD Canada Trust | 45.491823 | -73.631786 | Office | 0 | Office | Tech Startup | Electronics Store |
| 4 | Lachine | 45.43665 | -73.6851 | Vitrerie Chatelle/Simard | 45.438562 | -73.681563 | Office | 3 | Office | Coworking Space | Tech Startup |


In [177]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

colors_array = cm.rainbow(np.linspace(0, 1, kclusters))  # one colour per cluster label
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster in zip(montreal_merged['Latitude'], montreal_merged['Longitude'], montreal_merged['Borough'], montreal_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Examining each cluster to determine the potential-customer categories that distinguish it, and naming each cluster accordingly. Cluster 1 corresponds to cluster label 0, Cluster 2 to label 1, and so on.

#### Cluster 1: Tech Startup

In [178]:
montreal_merged.loc[montreal_merged['Cluster Labels'] == 0, montreal_merged.columns[[1] + list(range(5, montreal_merged.shape[1]))]]

|   | Latitude | Venue Longitude | Venue Category | Cluster Labels | 1st Most Common Office | 2nd Most Common Office | 3rd Most Common Office |
|---|---|---|---|---|---|---|---|
| 3 | 45.49069 | -73.631786 | Office | 0 | Office | Tech Startup | Electronics Store |
| 9 | 45.4389 | -73.622861 | Office | 0 | Office | Tech Startup | Electronics Store |
| 10 | 45.4389 | -73.62259 | Office | 0 | Office | Tech Startup | Electronics Store |
| 11 | 45.4389 | -73.626532 | Office | 0 | Office | Tech Startup | Electronics Store |
| 12 | 45.4389 | -73.621579 | Office | 0 | Office | Tech Startup | Electronics Store |
| 13 | 45.4389 | -73.623903 | Office | 0 | Office | Tech Startup | Electronics Store |
| 14 | 45.4389 | -73.620129 | Office | 0 | Office | Tech Startup | Electronics Store |
| 15 | 45.4389 | -73.629718 | Office | 0 | Office | Tech Startup | Electronics Store |
| 16 | 45.4389 | -73.631015 | Office | 0 | Office | Tech Startup | Electronics Store |
| 23 | 45.593899 | -73.634813 | Office | 0 | Office | Tech Startup | Electronics Store |


#### Cluster 2: Coworking Space / Tech Startup

In [138]:
montreal_merged.loc[montreal_merged['Cluster Labels'] == 1, montreal_merged.columns[[1] + list(range(5, montreal_merged.shape[1]))]]

|   | Latitude | Venue Longitude | Venue Category | Cluster Labels | 1st Most Common Office | 2nd Most Common Office | 3rd Most Common Office |
|---|---|---|---|---|---|---|---|
| 17 | 45.53021 | -73.587025 | Office | 1 | Coworking Space | Office | Tech Startup |
| 18 | 45.53021 | -73.587155 | Office | 1 | Coworking Space | Office | Tech Startup |
| 19 | 45.53021 | -73.584759 | Coworking Space | 1 | Coworking Space | Office | Tech Startup |
| 20 | 45.53021 | -73.578515 | Coworking Space | 1 | Coworking Space | Office | Tech Startup |
| 21 | 45.53021 | -73.57964 | Coworking Space | 1 | Coworking Space | Office | Tech Startup |
| 22 | 45.53021 | -73.579936 | Coworking Space | 1 | Coworking Space | Office | Tech Startup |
| 24 | 45.518617 | -73.608387 | Office | 1 | Office | Coworking Space | Tech Startup |
| 25 | 45.518617 | -73.602905 | Coworking Space | 1 | Office | Coworking Space | Tech Startup |


#### Cluster 3: Tech Startup / Coworking Space

In [139]:
montreal_merged.loc[montreal_merged['Cluster Labels'] == 2, montreal_merged.columns[[1] + list(range(5, montreal_merged.shape[1]))]]

|   | Latitude | Venue Longitude | Venue Category | Cluster Labels | 1st Most Common Office | 2nd Most Common Office | 3rd Most Common Office |
|---|---|---|---|---|---|---|---|
| 58 | 45.460389 | -73.563123 | Office | 2 | Office | Tech Startup | Coworking Space |
| 59 | 45.460389 | -73.563616 | Office | 2 | Office | Tech Startup | Coworking Space |
| 60 | 45.460389 | -73.566862 | Office | 2 | Office | Tech Startup | Coworking Space |
| 61 | 45.460389 | -73.563102 | Coworking Space | 2 | Office | Tech Startup | Coworking Space |
| 62 | 45.460389 | -73.562706 | Tech Startup | 2 | Office | Tech Startup | Coworking Space |
| 63 | 45.460389 | -73.568766 | Advertising Agency | 2 | Office | Tech Startup | Coworking Space |


#### Cluster 4: Coworking / Tech Startup

In [140]:
montreal_merged.loc[montreal_merged['Cluster Labels'] == 3, montreal_merged.columns[[1] + list(range(5, montreal_merged.shape[1]))]]

|   | Latitude | Venue Longitude | Venue Category | Cluster Labels | 1st Most Common Office | 2nd Most Common Office | 3rd Most Common Office |
|---|---|---|---|---|---|---|---|
| 0 | 45.53856 | -73.693092 | Office | 3 | Office | Coworking Space | Tech Startup |
| 1 | 45.53856 | -73.695517 | Office | 3 | Office | Coworking Space | Tech Startup |
| 2 | 45.53856 | -73.694763 | Coworking Space | 3 | Office | Coworking Space | Tech Startup |
| 4 | 45.43665 | -73.681563 | Office | 3 | Office | Coworking Space | Tech Startup |
| 5 | 45.43665 | -73.682269 | Office | 3 | Office | Coworking Space | Tech Startup |
| 6 | 45.43665 | -73.690605 | Office | 3 | Office | Coworking Space | Tech Startup |
| 7 | 45.43665 | -73.683516 | Coworking Space | 3 | Office | Coworking Space | Tech Startup |
| 8 | 45.43665 | -73.681848 | Office | 3 | Office | Coworking Space | Tech Startup |
| 27 | 45.531567 | -73.599026 | Office | 3 | Office | Coworking Space | Tech Startup |
| 28 | 45.531567 | -73.598421 | Office | 3 | Office | Coworking Space | Tech Startup |


#### Cluster 5: Advertising Agency / Tech Startup

In [141]:
montreal_merged.loc[montreal_merged['Cluster Labels'] == 4, montreal_merged.columns[[1] + list(range(5, montreal_merged.shape[1]))]]

|   | Latitude | Venue Longitude | Venue Category | Cluster Labels | 1st Most Common Office | 2nd Most Common Office | 3rd Most Common Office |
|---|---|---|---|---|---|---|---|
| 54 | 45.50327 | -73.722579 | Office | 4 | Office | Advertising Agency | Tech Startup |
| 55 | 45.50327 | -73.720534 | Office | 4 | Office | Advertising Agency | Tech Startup |
| 56 | 45.50327 | -73.730129 | Office | 4 | Office | Advertising Agency | Tech Startup |
| 57 | 45.50327 | -73.727789 | Advertising Agency | 4 | Office | Advertising Agency | Tech Startup |
| 65 | 45.537006 | -73.624193 | Office | 4 | Office | Advertising Agency | Tech Startup |
| 66 | 45.537006 | -73.62448 | Advertising Agency | 4 | Office | Advertising Agency | Tech Startup |


### Results of Part Two

Cluster 4 contains the greatest concentration of potential customers for the client, mainly tech startups and coworking offices. Both customer types fall within the small-to-medium office size that the client prefers to service, and both would need office supplies on a regular basis.

Cluster 4 consists of two boroughs: Villeray–Saint-Michel–Parc-Extension and Saint-Laurent.

### Conclusion of Part Two

The borough that would offer the greatest catchment area for the client's business is Villeray–Saint-Michel–Parc-Extension. I chose it over the other borough identified by the analysis because it lies close to two other cluster centres. The second option, Saint-Laurent, has a large concentration of potential customers but is isolated, which would limit opportunities for future business growth.
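
The proximity argument can be made quantitative, which also bears on the one-hour delivery requirement from the introduction. A minimal haversine sketch, assuming a spherical Earth with a mean radius of 6,371 km and measuring straight-line rather than road distance. The Villeray–Saint-Michel–Parc-Extension centre does not appear in the outputs shown, so two borough centres that do appear in `df_montreal` (Lachine and Lasalle) are used purely as an illustration:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Borough centres (latitude, longitude) taken from df_montreal.
lachine = (45.43665, -73.6851)
lasalle = (45.4389, -73.62583)
print(round(haversine_km(*lachine, *lasalle), 1))
```

Applied to each candidate borough centre and the surrounding cluster centres, the same function would give the distances behind the choice above.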

# Conclusion of Data Analysis

After data collection, cleaning and analysis, I can now answer the data science problem posed in the introduction.

Part One: Which of the two possible locations (Quebec and Montreal) has the greater number of potential customers?

Solution: Montreal, as it offers the greater number of potential customers for the client.

Part Two: In the city chosen in Part One, in this case Montreal, which borough would offer the greatest catchment area for the business?

Solution: The Montreal borough that would offer the greatest catchment area for the client's business is Villeray–Saint-Michel–Parc-Extension.

