## Final Capstone Project

### Business problem

Given the surge in popularity of vegetarian food, I have teamed up with three partners to open a restaurant in Perth, Western Australia. The goal of this project is to determine where the restaurant should be located.

Perth is an up-and-coming city, so the number of vegetarian restaurants is still limited compared with other Australian capital cities. Because vegetarians are a minority compared with meat eaters, our goal is to open a restaurant in an area that is popular with diners but currently lacks vegetarian options. This project aims to identify a location that has no existing competition, is surrounded by entertainment venues, and is occupied by relatively higher income earners*.

*We have assumed that vegetarians are more common among higher income earners.

### Data description

We will use Foursquare data to locate restaurants that are categorised as vegetarian and to identify suburbs with a high concentration of entertainment venues, such as cinemas, night clubs, and bars, within a 1 km radius. I will also extract data from the following URL to determine which suburbs are more affluent than others and overlay it with the Foursquare data: https://www.microburbs.com.au/heat-map/average-income#115.89421648019166:-31.97605525547553:11

I will also obtain suburb location data from the Western Australian government data service: https://catalogue.data.wa.gov.au/dataset/geographic-names-geonoma
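
To make the 1 km radius concrete, here is a minimal sketch that computes the great-circle distance between a suburb centre and a venue. The helper `haversine_km` is a hypothetical name introduced for illustration, and the spherical-Earth radius of 6371 km is an approximation. The coordinates are those of Alfred Cove and Legends Bar as they appear in the tables further down this notebook.

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres (spherical Earth approximation)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Suburb centre (Alfred Cove) and one venue (Legends Bar)
distance = haversine_km(-32.03045, 115.81509, -32.031349, 115.809198)
within_radius = distance <= 1.0  # 1 km search radius
print(round(distance, 2), within_radius)
```

The Foursquare request does this filtering on the server side through its `radius` parameter; the sketch only shows what that parameter means geometrically.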

### Data cleaning

Importing Western Australia location data and dropping all columns except the suburb names and coordinates

In [45]:
import pandas as pd
import numpy as np

In [46]:
df=pd.read_csv('Suburb_location_data.csv')

In [47]:
df.head()

Unnamed: 0,zone,feature_number,feature_class,longitude,easting,latitude,feature_class_description,geographic_name,northing,date_approved,geometry
0,51,100164136,AOS,124.7325,684179,-17.248889,Aborig. Outstation,WINDJINGAYR ABORIGINAL OUTSTATION,8092087,1993-06-01,Point (124.7325 -17.2488888885)
1,52,100166704,AOS,127.60935,355595,-21.13886,Aborig. Outstation,WALGALI ABORIGINAL OUTSTATION,7661852,1993-10-27,Point (127.60935 -21.1388599993)
2,52,100166706,AOS,127.85435,381214,-21.35692,Aborig. Outstation,LAMANBANDAH ABORIGINAL OUTSTATION,7637918,1993-10-27,Point (127.85435 -21.3569199995)
3,52,100166708,AOS,127.72103,367943,-21.96525,Aborig. Outstation,BIBARRD ABORIGINAL OUTSTATION,7570468,1993-10-27,Point (127.72103 -21.9652499989)
4,52,100188056,AOS,128.424167,442250,-25.728611,Aborig. Outstation,LLURRPA BORE,7154250,2001-08-03,Point (128.424166667 -25.7286111106)


In [48]:
df.drop(['zone','feature_number','feature_class','easting','northing','date_approved','geometry'],axis=1, inplace=True)

In [49]:
df.head()

Unnamed: 0,longitude,latitude,feature_class_description,geographic_name
0,124.7325,-17.248889,Aborig. Outstation,WINDJINGAYR ABORIGINAL OUTSTATION
1,127.60935,-21.13886,Aborig. Outstation,WALGALI ABORIGINAL OUTSTATION
2,127.85435,-21.35692,Aborig. Outstation,LAMANBANDAH ABORIGINAL OUTSTATION
3,127.72103,-21.96525,Aborig. Outstation,BIBARRD ABORIGINAL OUTSTATION
4,128.424167,-25.728611,Aborig. Outstation,LLURRPA BORE


In [50]:
df_clean=df[df.feature_class_description == 'Suburb']

In [51]:
df_clean.head()

Unnamed: 0,longitude,latitude,feature_class_description,geographic_name
41251,115.81509,-32.03045,Suburb,ALFRED COVE
41252,115.6687,-31.60683,Suburb,ALKIMOS
41253,115.83871,-32.01406,Suburb,APPLECROSS
41254,115.83648,-32.02767,Suburb,ARDROSS
41255,116.0151,-32.15322,Suburb,ARMADALE


Importing and cleaning the income-by-suburb data (dropping unnecessary data)

In [52]:
df_income=pd.read_csv('Suburb_income.csv')
df_income.head()

Unnamed: 0,Suburb,Income per week
0,City Beach,"$1,632"
1,Dalkeith,"$1,592"
2,Floreat,"$1,558"
3,Peppermint Grove,"$1,556"
4,Swanbourne,"$1,490"


In [53]:
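#removing $ sign and thousands separator from income and obtaining median income per week
#regex=True is stated explicitly because newer pandas versions treat the pattern as a literal string by default
df_income["Income per week"] = df_income["Income per week"].str.replace(r'[^\w\s]+', '', regex=True)
df_income["Income per week"] = pd.to_numeric(df_income["Income per week"])
#take the median of the numeric column only; the Suburb column is text
df_income[["Income per week"]].median()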

Income per week    1083.0
dtype: float64
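
The cleaning step above can be sketched on a few of the rows shown earlier. This is a minimal illustration rather than the notebook's exact code: the `sample` frame is an assumption built from the first three rows of the income table, and the pattern `[^\d]` (drop every non-digit) is a stricter alternative to the pattern used in the cell.

```python
import pandas as pd

# Small stand-in for Suburb_income.csv, using rows shown in the output above
sample = pd.DataFrame({
    "Suburb": ["City Beach", "Dalkeith", "Floreat"],
    "Income per week": ["$1,632", "$1,592", "$1,558"],
})

# Remove every non-digit character ("$" and ","), then convert to numbers
sample["Income per week"] = pd.to_numeric(
    sample["Income per week"].str.replace(r"[^\d]", "", regex=True)
)

# Median of the single numeric column
median_income = sample["Income per week"].median()
print(median_income)
```

Dropping all non-digits would also remove a decimal point, so this pattern only suits whole-dollar amounts like the ones in this dataset.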

In [54]:
#Capitalising Suburb names to merge with other dataframe later on
df_income['Suburb']=df_income['Suburb'].str.upper()

Merging the two dataframes into a single dataframe and cleaning the data

In [55]:
dfmerge=df_clean.merge(df_income, how='inner', left_on='geographic_name', right_on='Suburb')
dfmerge.head()

Unnamed: 0,longitude,latitude,feature_class_description,geographic_name,Suburb,Income per week
0,115.81509,-32.03045,Suburb,ALFRED COVE,ALFRED COVE,1318
1,115.83871,-32.01406,Suburb,APPLECROSS,APPLECROSS,1389
2,115.83648,-32.02767,Suburb,ARDROSS,ARDROSS,1257
3,115.84676,-32.05461,Suburb,BATEMAN,BATEMAN,1169
4,115.95398,-32.02184,Suburb,BECKENHAM,BECKENHAM,776
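
An inner merge silently drops any suburb whose name does not match exactly in both sources. As a hedged sketch of how to check this, the example below runs an outer merge with `indicator=True` on small stand-in frames and lists the rows that appear on one side only. The frames `locations` and `income` are assumptions built from rows shown in the outputs above.

```python
import pandas as pd

# Stand-ins for df_clean and df_income (values taken from the outputs above)
locations = pd.DataFrame({
    "geographic_name": ["ALFRED COVE", "APPLECROSS", "ALKIMOS"],
    "latitude": [-32.03045, -32.01406, -31.60683],
    "longitude": [115.81509, 115.83871, 115.6687],
})
income = pd.DataFrame({
    "Suburb": ["ALFRED COVE", "APPLECROSS", "CITY BEACH"],
    "Income per week": [1318, 1389, 1632],
})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only"
check = locations.merge(income, how="outer", left_on="geographic_name",
                        right_on="Suburb", indicator=True)

# Rows that an inner merge would have dropped
unmatched = check[check["_merge"] != "both"]
print(unmatched[["geographic_name", "Suburb", "_merge"]])
```

In this toy data, ALKIMOS has no income record and CITY BEACH has no location record, so both would be missing from an inner merge.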


In [56]:
#drop data not required for this project
dfmerge.drop(['feature_class_description','geographic_name'],axis=1, inplace=True)

### Visualising metropolitan suburbs

In [57]:
#Importing packages for map creation
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
import folium


In [58]:
#Map of Metropolitan Perth
address = 'Perth'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_perth = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(dfmerge['latitude'], dfmerge['longitude'], dfmerge['Suburb']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_perth)  
    
map_perth

In [59]:
#importing libraries
import json
import matplotlib.cm as cm
import matplotlib.colors as colors
import requests
CLIENT_ID = 'O54CELPNUXDZJT0F2U0D2Q2MKWCEUETBSJAYXZDVEJ4CT2US' # your Foursquare ID
CLIENT_SECRET = 'RZLOFRROKA2214EMGZG44FLOKLDVG4YTH3NDBJOJDHLYLDYX' # your Foursquare Secret
VERSION = '20200222' # Foursquare API version

#### Obtaining a list of entertainment and nightlife venues from Foursquare and overlaying it on the map

In [60]:
LIMIT = 300 # limit of number of venues returned by Foursquare API

radius = 1000 # define radius

def getNearbyVenues(names, latitudes, longitudes, radius=1000, categoryId='56aa371be4b08b9a8d5734db,4bf58dd8d48988d1e2931735,4bf58dd8d48988d17c941735,4bf58dd8d48988d18e941735,5032792091d4c4b30a586d5c,56aa371be4b08b9a8d573532,4bf58dd8d48988d1f1931735,4bf58dd8d48988d17f941735,4bf58dd8d48988d1e5931735,4bf58dd8d48988d1f2931735,4bf58dd8d48988d184941735,4d4b7105d754a06376d81259'):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            categoryId)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb', 
                  'Suburb Latitude', 
                  'Surburb Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
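
The function above indexes `["response"]['groups'][0]['items']` directly, so it raises a `KeyError` or `IndexError` if a request fails or returns no groups. The sketch below shows one defensive way to pull the same fields out of a response. `extract_venues` is a hypothetical helper, and `sample_payload` is a hand-written stand-in that only assumes the response shape the function above already relies on.

```python
def extract_venues(payload, suburb):
    """Return (suburb, venue, lat, lng, category) rows, or [] if nothing usable."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        # A venue may come back without any category attached
        categories = venue.get("categories") or [{}]
        rows.append((suburb,
                     venue.get("name"),
                     location.get("lat"),
                     location.get("lng"),
                     categories[0].get("name")))
    return rows

# Hand-written payload mirroring the structure parsed by getNearbyVenues
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Legends Bar",
               "location": {"lat": -32.031349, "lng": 115.809198},
               "categories": [{"name": "Bar"}]}}
]}]}}

rows = extract_venues(sample_payload, "ALFRED COVE")
empty = extract_venues({"response": {}}, "BOYA")  # e.g. an error or empty reply
print(rows, empty)
```

With a helper like this, a suburb with no usable response contributes no rows instead of stopping the whole loop.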

In [61]:
#get a list of entertainment venues from Foursquare
perth_venues = getNearbyVenues(names=dfmerge['Suburb'],
                                   latitudes=dfmerge['latitude'],
                                   longitudes=dfmerge['longitude']
                              )

ALFRED COVE
APPLECROSS
ARDROSS
BATEMAN
BECKENHAM
BEDFORD
BICTON
BOORAGOON
BOYA
CANNINGTON
CARLISLE
CARMEL
CAVERSHAM
CITY BEACH
CLOVERDALE
COTTESLOE
DAGLISH
DALKEITH
DIANELLA
EAST CANNINGTON
EAST FREMANTLE
EAST PERTH
EDEN HILL
EMBLETON
FREMANTLE
GOOSEBERRY HILL
HELENA VALLEY
KALAMUNDA
KENWICK
LANGFORD
LESMURDIE
MADDINGTON
MAIDA VALE
MANNING
MIDDLE SWAN
MIDLAND
MORLEY
MOUNT HAWTHORN
MYAREE
NEDLANDS
NORTH FREMANTLE
NORTH PERTH
OSBORNE PARK
PEPPERMINT GROVE
RIVERVALE
ROSSMOYNE
SHENTON PARK
SOUTH GUILDFORD
SOUTH PERTH
SUBIACO
SWAN VIEW
SWANBOURNE
THORNLIE
TUART HILL
VICTORIA PARK
WEMBLEY
WEMBLEY DOWNS
WEST PERTH
WILLAGEE
WILLETTON
WILSON
YOKINE
ATTADALE
BURSWOOD
CHURCHLANDS
COOLBINIA
CRAWLEY
DOUBLEVIEW
EAST VICTORIA PARK
FLOREAT
GLENDALOUGH
GWELUP
HIGH WYCOMBE
INNALOO
JOLIMONT
JOONDANNA
KARAWARA
KARRINYUP
KEWDALE
KOONGAMIA
LATHLAIN
LEEDERVILLE
LOCKRIDGE
MARTIN
MENORA
MIDVALE
MOUNT CLAREMONT
MOUNT LAWLEY
NOLLAMARA
NORANDA
VIVEASH
WINTHROP
BALCATTA
FORRESTFIELD
HAZELMERE
SALTER POINT
KIARA
TR

In [62]:
perth_venues.head()

Unnamed: 0,Suburb,Suburb Latitude,Surburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ALFRED COVE,-32.03045,115.81509,Legends Bar,-32.031349,115.809198,Bar
1,APPLECROSS,-32.01406,115.83871,Bad Apples Bar,-32.02092,115.83491,Lounge
2,APPLECROSS,-32.01406,115.83871,Clancy's Fish Pub,-32.013137,115.848458,Pub
3,APPLECROSS,-32.01406,115.83871,Bartletts bar,-32.007362,115.838217,Bar
4,APPLECROSS,-32.01406,115.83871,Pepper wine bar,-32.013408,115.848064,Wine Bar


In [63]:
# add markers to map
for lat, lng, label in zip(perth_venues['Venue Latitude'], perth_venues['Venue Longitude'], perth_venues['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.3,
        parse_html=False).add_to(map_perth)  

map_perth

#### Obtaining a list of vegetarian restaurants from Foursquare and overlaying it on the map

In [64]:
LIMIT = 50 # limit of number of venues returned by Foursquare API

radius = 1000 # define radius

def getNearbyVenues(names, latitudes, longitudes, radius=1000, categoryId='4bf58dd8d48988d1d3941735'):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            categoryId)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb', 
                  'Suburb Latitude', 
                  'Suburb Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [65]:
#get vegetarian restaurants from Foursquare
vege_venues = getNearbyVenues(names=dfmerge['Suburb'],
                                   latitudes=dfmerge['latitude'],
                                   longitudes=dfmerge['longitude']
                                  )

ALFRED COVE
APPLECROSS
ARDROSS
BATEMAN
BECKENHAM
BEDFORD
BICTON
BOORAGOON
BOYA
CANNINGTON
CARLISLE
CARMEL
CAVERSHAM
CITY BEACH
CLOVERDALE
COTTESLOE
DAGLISH
DALKEITH
DIANELLA
EAST CANNINGTON
EAST FREMANTLE
EAST PERTH
EDEN HILL
EMBLETON
FREMANTLE
GOOSEBERRY HILL
HELENA VALLEY
KALAMUNDA
KENWICK
LANGFORD
LESMURDIE
MADDINGTON
MAIDA VALE
MANNING
MIDDLE SWAN
MIDLAND
MORLEY
MOUNT HAWTHORN
MYAREE
NEDLANDS
NORTH FREMANTLE
NORTH PERTH
OSBORNE PARK
PEPPERMINT GROVE
RIVERVALE
ROSSMOYNE
SHENTON PARK
SOUTH GUILDFORD
SOUTH PERTH
SUBIACO
SWAN VIEW
SWANBOURNE
THORNLIE
TUART HILL
VICTORIA PARK
WEMBLEY
WEMBLEY DOWNS
WEST PERTH
WILLAGEE
WILLETTON
WILSON
YOKINE
ATTADALE
BURSWOOD
CHURCHLANDS
COOLBINIA
CRAWLEY
DOUBLEVIEW
EAST VICTORIA PARK
FLOREAT
GLENDALOUGH
GWELUP
HIGH WYCOMBE
INNALOO
JOLIMONT
JOONDANNA
KARAWARA
KARRINYUP
KEWDALE
KOONGAMIA
LATHLAIN
LEEDERVILLE
LOCKRIDGE
MARTIN
MENORA
MIDVALE
MOUNT CLAREMONT
MOUNT LAWLEY
NOLLAMARA
NORANDA
VIVEASH
WINTHROP
BALCATTA
FORRESTFIELD
HAZELMERE
SALTER POINT
KIARA
TR

In [66]:
# add markers to map
for lat, lng, label in zip(vege_venues['Venue Latitude'], vege_venues['Venue Longitude'], vege_venues['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.3,
        parse_html=False).add_to(map_perth)  

map_perth

### Data Analysis

#### Adding the number of entertainment venues and vegetarian restaurants per suburb to the suburb income dataframe

In [67]:
perth_venues_counts = perth_venues['Suburb'].value_counts().to_frame()
perth_venues_counts.head(20)

Unnamed: 0,Suburb
FREMANTLE,19
WEST PERTH,16
VICTORIA PARK,12
BURSWOOD,11
SUBIACO,10
LEEDERVILLE,10
WEST LEEDERVILLE,10
EAST PERTH,10
COTTESLOE,6
NORTH FREMANTLE,6


In [68]:
perth_venues_counts.index.name = 'Suburb'
perth_venues_counts.rename(columns={'Suburb': 'Venues Counts'}, inplace=True)

In [69]:
#grouping venues by suburbs
perth_venues_counts.head()

Suburb,Venues Counts
FREMANTLE,19
WEST PERTH,16
VICTORIA PARK,12
BURSWOOD,11
SUBIACO,10


In [70]:
vege_venues_count=vege_venues['Suburb'].value_counts().to_frame()

In [71]:
#grouping vegetarian restaurants by suburbs
vege_venues_count.index.name = 'Suburb'
vege_venues_count.rename(columns={'Suburb': 'Vege Counts'}, inplace=True)
vege_venues_count.head()

Suburb,Vege Counts
FREMANTLE,5
EAST VICTORIA PARK,2
WEST PERTH,2
VICTORIA PARK,1
MYAREE,1
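
The column name produced by `value_counts().to_frame()` depends on the pandas version: older releases reuse the Series name (`Suburb`), while pandas 2.x names it `count`, in which case renaming `'Suburb'` has no effect. A minimal sketch that names the column explicitly, and so behaves the same on both, could look like this. The `venues` frame is a made-up stand-in for `perth_venues`.

```python
import pandas as pd

# Stand-in for perth_venues: one row per venue, labelled with its suburb
venues = pd.DataFrame({"Suburb": ["FREMANTLE", "FREMANTLE", "WEST PERTH"]})

counts = (venues["Suburb"]
          .value_counts()
          .rename("Venues Counts")   # name of the count column
          .rename_axis("Suburb")     # name of the index
          .to_frame())
print(counts)
```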


In [72]:
dfmerge.set_index('Suburb', inplace=True)
dfmerge.head()

Suburb,longitude,latitude,Income per week
ALFRED COVE,115.81509,-32.03045,1318
APPLECROSS,115.83871,-32.01406,1389
ARDROSS,115.83648,-32.02767,1257
BATEMAN,115.84676,-32.05461,1169
BECKENHAM,115.95398,-32.02184,776


In [73]:
#Combining all data into a single dataframe
combinedf=pd.concat([dfmerge, perth_venues_counts, vege_venues_count], axis=1, sort=False)

In [74]:
combinedf.head(50)

Unnamed: 0,longitude,latitude,Income per week,Venues Counts,Vege Counts
ALFRED COVE,115.81509,-32.03045,1318,1.0,
APPLECROSS,115.83871,-32.01406,1389,5.0,1.0
ARDROSS,115.83648,-32.02767,1257,2.0,
BATEMAN,115.84676,-32.05461,1169,1.0,
BECKENHAM,115.95398,-32.02184,776,,
BEDFORD,115.89093,-31.9085,1069,2.0,
BICTON,115.78343,-32.02934,1293,1.0,1.0
BOORAGOON,115.83093,-32.03878,1217,1.0,
BOYA,116.05787,-31.91433,1165,,
CANNINGTON,115.93454,-32.01711,711,1.0,1.0
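
`pd.concat(..., axis=1)` aligns the frames on their index, so a suburb that is missing from one of the count frames gets `NaN` in that column. The sketch below reproduces this on three suburbs from the table above and shows one possible alternative, which is not what this notebook does: filling the gaps with 0 so the counts are integers and can be filtered with a plain comparison.

```python
import pandas as pd

# Stand-ins for dfmerge, perth_venues_counts and vege_venues_count
income = pd.DataFrame({"Income per week": [1318, 1389, 776]},
                      index=["ALFRED COVE", "APPLECROSS", "BECKENHAM"])
venue_counts = pd.DataFrame({"Venues Counts": [1, 5]},
                            index=["ALFRED COVE", "APPLECROSS"])
vege_counts = pd.DataFrame({"Vege Counts": [1]}, index=["APPLECROSS"])

# Index-aligned, column-wise concatenation; missing suburbs become NaN
combined = pd.concat([income, venue_counts, vege_counts], axis=1, sort=False)

# Treat "no venue returned" as a count of zero
count_cols = ["Venues Counts", "Vege Counts"]
combined[count_cols] = combined[count_cols].fillna(0).astype(int)

# Suburbs without a vegetarian restaurant
no_vege = combined[combined["Vege Counts"] == 0]
print(no_vege)
```

Whether 0 is the right reading of a missing value is a judgment call: it assumes Foursquare returned nothing because nothing is there, not because the request failed.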


#### Removing all suburbs with existing vegetarian restaurants from consideration

In [75]:
finaldf = combinedf[pd.isnull(combinedf['Vege Counts'])]

In [76]:
finaldf.head()

Unnamed: 0,longitude,latitude,Income per week,Venues Counts,Vege Counts
ALFRED COVE,115.81509,-32.03045,1318,1.0,
ARDROSS,115.83648,-32.02767,1257,2.0,
BATEMAN,115.84676,-32.05461,1169,1.0,
BECKENHAM,115.95398,-32.02184,776,,
BEDFORD,115.89093,-31.9085,1069,2.0,


#### Ranking the remaining suburbs by number of entertainment venues

In [77]:
finaldf.sort_values(by=['Venues Counts'], inplace=False, ascending=False)

Unnamed: 0,longitude,latitude,Income per week,Venues Counts,Vege Counts
WEST LEEDERVILLE,115.832310,-31.937390,1329,10.0,
LEEDERVILLE,115.833889,-31.936111,1279,10.0,
NORTH FREMANTLE,115.755930,-32.033500,1272,6.0,
FLOREAT,115.794260,-31.937950,1558,5.0,
ROSSMOYNE,115.762320,-32.038230,1300,4.0,
...,...,...,...,...,...
FORRESTFIELD,115.993700,-31.979890,877,,
HAZELMERE,115.992310,-31.900450,759,,
SALTER POINT,115.872040,-32.025720,1415,,
KIARA,115.938700,-31.881830,914,,
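
Several suburbs tie on venue count, so a second sort key can help. As a sketch under that assumption, the example below ranks by venue count first and uses weekly income to break ties. The rows are taken from the output above and the frame name `candidates` is hypothetical.

```python
import numpy as np
import pandas as pd

candidates = pd.DataFrame(
    {"Income per week": [1558, 914, 1279, 1272, 1329],
     "Venues Counts": [5, np.nan, 10, 6, 10]},
    index=["FLOREAT", "KIARA", "LEEDERVILLE", "NORTH FREMANTLE",
           "WEST LEEDERVILLE"],
)

# Venue count first, income as tie-breaker; NaN counts are placed last
ranked = candidates.sort_values(
    by=["Venues Counts", "Income per week"],
    ascending=False,
    na_position="last",
)
print(ranked)
```

Here WEST LEEDERVILLE and LEEDERVILLE both have 10 venues, and the higher income puts WEST LEEDERVILLE first.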


#### Ranking the remaining suburbs by income

In [78]:
finaldf.sort_values(by=['Income per week'], inplace=False, ascending=False)

Unnamed: 0,longitude,latitude,Income per week,Venues Counts,Vege Counts
CITY BEACH,115.76454,-31.93434,1632,1.0,
DALKEITH,115.79982,-31.99517,1592,,
FLOREAT,115.79426,-31.93795,1558,5.0,
SWANBOURNE,115.76732,-31.96878,1490,,
NEDLANDS,115.80732,-31.98184,1484,2.0,
...,...,...,...,...,...
MIDDLE SWAN,116.02037,-31.85267,716,3.0,
LANGFORD,115.94204,-32.04434,707,2.0,
MADDINGTON,115.98982,-32.04934,701,,
LOCKRIDGE,115.94676,-31.88100,685,1.0,


In [79]:
#Drop all suburbs with income per week at or below the median (1083)
df_rich= finaldf[~(finaldf['Income per week'] <= 1083)]
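
One detail of the filter above is worth illustrating: negating `<=` is not the same as `>` when a value is missing, because any comparison with `NaN` is `False`. The sketch below compares the two forms. The frame `df` is a stand-in; `UNKNOWN SUBURB` is a made-up row with no income, and the other incomes come from the tables above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Income per week": [1632, 914, 776, np.nan]},
    index=["CITY BEACH", "KIARA", "BECKENHAM", "UNKNOWN SUBURB"],
)

threshold = 1083  # median weekly income reported earlier in the notebook

# Form used in the cell above: NaN <= threshold is False, so ~False keeps the row
kept_negated = df[~(df["Income per week"] <= threshold)]

# Direct comparison: NaN > threshold is False, so the row is dropped
kept_direct = df[df["Income per week"] > threshold]

print(list(kept_negated.index))
print(list(kept_direct.index))
```

If every suburb in `finaldf` has an income, both forms give the same result; they differ only for rows without one.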

In [80]:
#Display map that shows suburbs that do not contain vegetarian restaurants and with higher than median income
address = 'Perth'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

considered_location = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_rich['latitude'], df_rich['longitude'], df_rich.index):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(considered_location)  
    
considered_location