## 1. Introduction

I recently moved to Alberta, Canada, and there is a lack of options when it comes to African/Caribbean restaurants throughout the city, even though the population of African immigrants continues to grow year over year. According to the 2016 census, 10.4% of the province identifies as Black (African or Caribbean), so there is a core market for an Afro-Caribbean restaurant in the province.

Together with a group of investor friends, I have decided to open a high-profile restaurant in the province, preferably an Afro-Caribbean one. As they say in the food business, location is key. This project therefore looks for the best location to open a restaurant in the province.



## Business Problem/Challenge:

The challenge is to find a suitable neighborhood in any borough in Alberta where the restaurant will thrive and, if possible, to investigate further the viability of opening an Afro-Caribbean restaurant. The location of interest is a densely populated area with few or no restaurants.


## 2. Data

To provide my co-investors with the necessary information, I will be looking at:

1.	The top neighborhoods with the highest number of restaurants, and what kinds of restaurants they have
2.	Boroughs with a population large enough to support the business
3.	The top 10 restaurants in the selected neighborhood, to assess whether there is an opportunity for an Afro-Caribbean restaurant

I will combine different data sets from Alberta's open data website (https://open.alberta.ca/opendata), such as neighborhood/ward population and average income per neighborhood, with additional data that helps answer the questions below. The Foursquare API will be used to explore neighborhoods, and the K-means method to cluster and segment them.

•	Is there enough population in an area/city to be the core support of a restaurant business?
•	If yes, are there similar businesses in the city?
•	If yes, what do they serve and how are they rated?
•	Is there an opportunity for an Afro-Caribbean restaurant in the neighborhood?

Alberta data: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_T
Population data: https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Table.cfm?Lang=Eng&T=1201&SR=1&S=22&O=A&RPP=9999&PR=0
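
As a rough illustration of the K-means step mentioned above, here is a minimal sketch that clusters a few postal areas. It assumes the features are just latitude and longitude and that two clusters are wanted; both are assumptions made for this toy example, and the actual analysis may well cluster on venue data from Foursquare instead. The coordinates are taken from the Alberta postal code table used later in this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative (latitude, longitude) pairs for two Calgary and two Edmonton postal areas.
coords = np.array([
    [51.04968, -113.96432],   # T2A, Calgary
    [51.12606, -114.143158],  # T3A, Calgary
    [53.5899, -113.4413],     # T5A, Edmonton
    [53.5483, -113.408],      # T6A, Edmonton
])

# n_clusters=2 is an assumption for this toy data; n_init and random_state
# are fixed only so that repeated runs give the same grouping.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)
labels = kmeans.labels_
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which rows share a label. On this sample the two Calgary areas should share one label and the two Edmonton areas the other.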




### Importing necessary Libraries

In [4]:
import numpy as np
import matplotlib.pyplot as plt
# install and import folium for map rendering
!pip install folium
import folium

import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans



In [1]:
import pandas as pd
!conda install --yes lxml

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



### Exploring the Alberta dataset: an HTML page containing Alberta postal codes, boroughs, neighborhoods, and latitude/longitude

In [12]:
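url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_T'
tables = pd.read_html(url)  # returns a list with one DataFrame per HTML table on the page
df = tables[1]  # the table at index 1 holds the postal code data
df.head(10)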

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,T1A,Medicine Hat,Central Medicine Hat,50.03646,-110.67925
1,T2A,Calgary,"Penbrooke Meadows, Marlborough",51.04968,-113.96432
2,T3A,Calgary,"Dalhousie, Edgemont, Hamptons, Hidden Valley",51.12606,-114.143158
3,T4A,Airdrie,East Airdrie,51.27245,-113.98698
4,T5A,Edmonton,"West Clareview, East Londonderry",53.5899,-113.4413
5,T6A,Edmonton,North Capilano,53.5483,-113.408
6,T7A,Drayton Valley,Not assigned,53.2165,-114.9893
7,T8A,Sherwood Park,West Sherwood Park,53.519,-113.3216
8,T9A,Wetaskiwin,Not assigned,52.9741,-113.3646
9,T1B,Medicine Hat,South Medicine Hat,50.0172,-110.651
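
Picking a table by position can break if the page layout changes. One possible alternative, sketched below, is to let `pd.read_html` select tables by their text through its `match` parameter. The HTML string is a small hypothetical stand-in for a page with several tables, not the real Wikipedia markup, and `postal_df` is an illustrative name.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a page that contains more than one table.
html = """
<table><tr><th>Note</th></tr><tr><td>navigation box</td></tr></table>
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>T1A</td><td>Medicine Hat</td><td>Central Medicine Hat</td></tr>
  <tr><td>T7A</td><td>Drayton Valley</td><td>Not assigned</td></tr>
</table>
"""

# match keeps only the tables whose text matches the given string or regex,
# so the result does not depend on where the table sits on the page.
tables = pd.read_html(StringIO(html), match="Postal Code")
postal_df = tables[0]
print(postal_df)
```

Like the cell above, this needs an HTML parser such as lxml to be installed.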



### Data cleanup: drop rows whose Neighborhood is "Not assigned", then reset the index


In [17]:
df = df[df['Neighborhood'] != 'Not assigned'].reset_index(drop=True)
df.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,T1A,Medicine Hat,Central Medicine Hat,50.03646,-110.67925
1,T2A,Calgary,"Penbrooke Meadows, Marlborough",51.04968,-113.96432
2,T3A,Calgary,"Dalhousie, Edgemont, Hamptons, Hidden Valley",51.12606,-114.143158
3,T4A,Airdrie,East Airdrie,51.27245,-113.98698
4,T5A,Edmonton,"West Clareview, East Londonderry",53.5899,-113.4413
5,T6A,Edmonton,North Capilano,53.5483,-113.408
6,T8A,Sherwood Park,West Sherwood Park,53.519,-113.3216
7,T1B,Medicine Hat,South Medicine Hat,50.0172,-110.651
8,T2B,Calgary,"Forest Lawn, Dover, Erin Woods",51.0318,-113.9786
9,T3B,Calgary,"Montgomery, Bowness, Silver Springs, Greenwood",51.0809,-114.1616


###### As seen above, rows with a "Not assigned" Neighborhood have been dropped
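
Rather than relying on the first ten rows, the cleanup can also be checked programmatically. The sketch below applies the same filter to a small sample built from rows shown earlier and counts what was dropped and what remains; `sample`, `cleaned`, `remaining`, and `dropped` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Small sample mirroring rows from the postal code table above.
sample = pd.DataFrame({
    "Postal Code": ["T1A", "T7A", "T8A", "T9A"],
    "Borough": ["Medicine Hat", "Drayton Valley", "Sherwood Park", "Wetaskiwin"],
    "Neighborhood": ["Central Medicine Hat", "Not assigned",
                     "West Sherwood Park", "Not assigned"],
})

# Same filter as in the cleanup cell: keep assigned neighborhoods, renumber rows.
cleaned = sample[sample["Neighborhood"] != "Not assigned"].reset_index(drop=True)

# Count placeholders left behind and rows removed.
remaining = int((cleaned["Neighborhood"] == "Not assigned").sum())
dropped = len(sample) - len(cleaned)
print(f"dropped {dropped} rows, {remaining} placeholders remain")
```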

### Canadian Population Information

In [19]:
html = "https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Table.cfm?Lang=Eng&T=1201&SR=1&S=22&O=A&RPP=9999&PR=0"
# header=0 uses the first table row as column names; [0] selects the first table on the page
df_Canada_Population = pd.read_html(html, header=0)[0]
df_Canada_Population.head()

Unnamed: 0,Geographic name,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016"
0,,,,
1,CanadaFootnote 1,35151728.0,15412443.0,14072079.0
2,A0A,46587.0,26155.0,19426.0
3,A0B,19792.0,13658.0,8792.0
4,A0C,12587.0,8010.0,5606.0
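
The census table covers all of Canada and starts with a blank row and a national total. A minimal sketch of narrowing it to Alberta before merging is shown below, assuming Alberta's codes are the ones that begin with "T" (as in the Wikipedia list used earlier). The sample frame reuses values shown in this notebook; `census` and `alberta_census` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample shaped like the census table: blank row, national total, then postal areas.
census = pd.DataFrame({
    "Geographic name": [np.nan, "CanadaFootnote 1", "A0A", "T1A", "T2A"],
    "Population, 2016": [np.nan, 35151728.0, 46587.0, 25409.0, 59641.0],
})

# Keep codes starting with "T"; na=False treats the blank first row as a non-match.
is_alberta = census["Geographic name"].str.startswith("T", na=False)
alberta_census = census[is_alberta].reset_index(drop=True)
print(alberta_census)
```

The merge in the next step filters implicitly through its join keys, so this step is optional; it mainly makes the intermediate table easier to inspect.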



#### Merge the population dataframe with the Alberta neighborhoods


In [20]:
df_Alberta = df.merge(df_Canada_Population, left_on='Postal Code', right_on='Geographic name')
df_Alberta.drop(columns=['Geographic name', 'Total private dwellings, 2016', 'Private dwellings occupied by usual residents, 2016'], inplace=True)
df_Alberta.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,"Population, 2016"
0,T1A,Medicine Hat,Central Medicine Hat,50.03646,-110.67925,25409.0
1,T2A,Calgary,"Penbrooke Meadows, Marlborough",51.04968,-113.96432,59641.0
2,T3A,Calgary,"Dalhousie, Edgemont, Hamptons, Hidden Valley",51.12606,-114.143158,53224.0
3,T4A,Airdrie,East Airdrie,51.27245,-113.98698,16054.0
4,T5A,Edmonton,"West Clareview, East Londonderry",53.5899,-113.4413,35049.0
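
By default `merge` performs an inner join, so any postal code without a matching census row disappears silently. One way to check for that, sketched here on toy data, is a left join with `indicator=True`, which adds a `_merge` column saying where each row came from. The code "T0X" and the borough "Hypothetical" are made up purely to show an unmatched row.

```python
import pandas as pd

neighborhoods = pd.DataFrame({
    "Postal Code": ["T1A", "T2A", "T0X"],  # "T0X" is a made-up code with no census row
    "Borough": ["Medicine Hat", "Calgary", "Hypothetical"],
})
census = pd.DataFrame({
    "Geographic name": ["T1A", "T2A", "A0A"],
    "Population, 2016": [25409.0, 59641.0, 46587.0],
})

# how="left" keeps every neighborhood row; indicator=True adds the _merge column.
merged = neighborhoods.merge(
    census, left_on="Postal Code", right_on="Geographic name",
    how="left", indicator=True,
)

# Rows marked "left_only" had no population record.
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

If the list comes back empty for the real data, the inner join above loses nothing.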


#### Rename the last column

In [21]:
df_Alberta = df_Alberta.rename(columns={'Population, 2016': 'Population'})
df_Alberta.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Population
0,T1A,Medicine Hat,Central Medicine Hat,50.03646,-110.67925,25409.0
1,T2A,Calgary,"Penbrooke Meadows, Marlborough",51.04968,-113.96432,59641.0
2,T3A,Calgary,"Dalhousie, Edgemont, Hamptons, Hidden Valley",51.12606,-114.143158,53224.0
3,T4A,Airdrie,East Airdrie,51.27245,-113.98698,16054.0
4,T5A,Edmonton,"West Clareview, East Londonderry",53.5899,-113.4413,35049.0


#### Drop the Postal Code column and sort values by population