## Opening A New Restaurant in Canberra Using Machine Learning
*****

### Business Problem
*****
This project aims to make an educated prediction of the best place to open a new restaurant, using the Foursquare API and a machine learning approach. There are over 200 restaurants in the city of Canberra, so when opening a new one, choosing the location wisely is one of the most important decisions an owner can make.

Canberra is the capital city of Australia and had an estimated population of 447,692 in 2018. According to the 2016 ABS Census data, the median age of residents is about 37 years, and about one-third of the population is under the age of 25, so many residents are likely to go out during the weekends. Canberra is also home to one of the largest universities in Australia (The Australian National University) and other established institutions such as the University of Canberra and the Canberra Institute of Technology. These universities attract a lot of international students and academics every year.

So, it's safe to say that Canberra is culturally rich and ethnically diverse, which makes the choice of cuisine another crucial decision. To help the owner make these decisions, I want to use machine learning and the Foursquare API to predict which area or suburb would be the best spot to open a new restaurant to maximise footfall. The Foursquare API and the Yelp API can also be used to determine which cuisines the people of a suburb or area prefer.

In this way, I think I can make a fairly good prediction for the choice of restaurant location and cuisine in the city, which would help maximise the owner's profit.


### Data
*****
Solving this problem through machine learning will take a lot of data about the people of Canberra and their food choices. To gather this data, I aim to use one or more of these sources:

#### Foursquare API

The Foursquare API will help me find out which types of buildings and dwellings each suburb has. I will query the API for the number of buildings of each type in an area and plot the results on a map of Canberra. This matters for the overall decision, as opening a restaurant near office spaces or residential areas can be advantageous for business.

#### SafeGraph GPS 

I can also use SafeGraph's GPS data to estimate the footfall in a particular area or suburb of Canberra. SafeGraph provides anonymous GPS movement data of its users, which is very useful for estimating the footfall in an area of the city. An area with high footfall but comparatively few restaurants can be a good place to open a restaurant.
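
The "high footfall, few restaurants" idea can be turned into a simple ranking. The sketch below is only an illustration: `rank_by_footfall_per_restaurant` is a hypothetical helper, the suburb labels and numbers are made up (they are not SafeGraph data), and adding 1 to the restaurant count is an assumption to avoid dividing by zero.

```python
def rank_by_footfall_per_restaurant(footfall, restaurant_counts):
    """Rank suburbs by visits per existing restaurant, highest first."""
    ranked = []
    for suburb, visits in footfall.items():
        n_restaurants = restaurant_counts.get(suburb, 0)
        # Add 1 so that suburbs with no restaurants yet do not divide by zero.
        ranked.append((suburb, visits / (n_restaurants + 1)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up numbers for illustration only; not real footfall data.
footfall = {"Suburb A": 12000, "Suburb B": 4000, "Suburb C": 6000}
restaurant_counts = {"Suburb A": 59, "Suburb B": 3, "Suburb C": 4}

ranking = rank_by_footfall_per_restaurant(footfall, restaurant_counts)
for suburb, score in ranking:
    print(suburb, round(score, 1))
```

Suburbs near the top of this ranking have the most visitors per existing restaurant, which is the pattern described above.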

#### Yelp / Foursquare

I also want to use the Yelp or Foursquare API to get people's reviews of restaurants in an area. If a particular cuisine is poorly reviewed in a suburb, then it may be a good idea to open a new restaurant featuring that cuisine there.
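
As a rough sketch of that idea, reviews could be grouped by suburb and cuisine, and the cuisine with the lowest average rating picked out for each suburb. Everything here is hypothetical: the `(suburb, cuisine, rating)` tuple layout, the `weakest_cuisine_by_suburb` helper and the sample ratings are assumptions for illustration, not output from Yelp or Foursquare.

```python
from collections import defaultdict
from statistics import mean


def weakest_cuisine_by_suburb(reviews):
    """Return {suburb: (cuisine, mean_rating)} for the lowest-rated cuisine."""
    grouped = defaultdict(list)
    for suburb, cuisine, rating in reviews:
        grouped[(suburb, cuisine)].append(rating)

    weakest = {}
    for (suburb, cuisine), ratings in grouped.items():
        avg = mean(ratings)
        # Keep the cuisine with the lowest average rating seen so far.
        if suburb not in weakest or avg < weakest[suburb][1]:
            weakest[suburb] = (cuisine, avg)
    return weakest


# Made-up reviews for illustration only.
sample_reviews = [
    ("Suburb A", "Thai", 4.5),
    ("Suburb A", "Thai", 4.0),
    ("Suburb A", "Italian", 3.0),
    ("Suburb B", "Indian", 2.5),
    ("Suburb B", "Thai", 4.0),
]
weakest = weakest_cuisine_by_suburb(sample_reviews)
print(weakest)
```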

So, in this way, I can make an educated guess as to where to open a restaurant in Canberra.

In [156]:
import json
import pandas as pd
import numpy as np
import urllib3
import warnings
from googleplaces import GooglePlaces, types, lang
from unidecode import unidecode 
import time
import requests
warnings.simplefilter(action='ignore', category=FutureWarning)
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0
import folium
from folium import plugins

In [200]:
http = urllib3.PoolManager()
client_id = 'PW5JOBUR1JZUSJM5NOATIVXPSJVFET0C4NEESACDN4RXJN4H'
client_secret = '2XYJJ53EVPBMKLETY3ARM0FET2Q0FW4IHKXYKKCM2NOUZPYF'
foursquare = http.request('GET', 'https://api.foursquare.com/v2/venues/search?client_id=' + client_id
                          + '&client_secret='+ client_secret 
                          +'&v=20180602&ll=-35.28346,149.12807&query=food')
food_foursquare = json.loads(foursquare.data.decode('UTF-8'))
foursquare_results = food_foursquare['response']['venues']

# print(food_foursquare)
# print("API Response Status:", food_foursquare['meta']['code'])
# Note: this counts the venues returned by this single request, not every restaurant in Canberra.
print("Foursquare API - No. of restaurants in Canberra:", len(foursquare_results),"\n")
# for venue in foursquare_results:
#     print(venue['name'])
df_foursquare = json_normalize(foursquare_results)  # flatten the list of venue dicts
df_foursquare = df_foursquare[['name', 'location.postalCode', 'location.lat', 'location.lng', 'location.address']]

Foursquare API - No. of restaurants in Canberra: 30 
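
The request above builds its URL by concatenating strings, which is easy to get wrong once more parameters are added. A minimal sketch of the same URL built with the standard library's `urlencode` is shown below; `build_venue_search_url` is a hypothetical helper, and the credentials are placeholders rather than working keys.

```python
from urllib.parse import urlencode


def build_venue_search_url(client_id, client_secret, lat, lng, query,
                           version="20180602"):
    """Build a Foursquare v2 venue search URL with encoded parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude,longitude of the search centre
        "query": query,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


# Placeholder credentials; substitute real ones before sending the request.
url = build_venue_search_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                             -35.28346, 149.12807, "food")
print(url)
```

The resulting string can be passed to `http.request('GET', url)` exactly as in the cell above.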



In [182]:
def google_places_results(s_types):

    API_KEY = 'AIzaSyCBWhdOTs5-a6KveEsX4owSIR5nU22JOys'
    # url variable store url 
    url = "https://maps.googleapis.com/maps/api/place/textsearch/json?"
  
    google_places = GooglePlaces(API_KEY)
    query_result = google_places.nearby_search(
                   lat_lng={'lat' : -35.28346, 'lng' : 149.12807}, 
                   radius=5000,
                   types= s_types)

    time.sleep(10)  

    df_google = pd.DataFrame(columns = ['name', 'rating', 'lat','lng', 'address'])
    ctr = 0

    for place in query_result.places:
        place.get_details()
        for key, item in place.geo_location.items():
            df_google.loc[ctr, 'name'] = place.name
            df_google.loc[ctr, 'rating'] = place.rating
            df_google.loc[ctr, key] = item
            df_google.loc[ctr, 'address'] = place.formatted_address
        ctr += 1
        # print(unidecode(place.name))

    # Fetch up to two further result pages, reusing the same location and place types.
    if query_result.has_next_page_token:
        query_result_next_page = google_places.nearby_search(
            lat_lng={'lat' : -35.28346, 'lng' : 149.12807}, 
            radius=5000,
            types=s_types, pagetoken=query_result.next_page_token)
        for pl in query_result_next_page.places:
            pl.get_details()
            # Read coordinates from the current place (pl), not the last place of the first page.
            for key, item in pl.geo_location.items():
                df_google.loc[ctr, 'name'] = pl.name
                df_google.loc[ctr, 'rating'] = pl.rating
                df_google.loc[ctr, key] = item
                df_google.loc[ctr, 'address'] = pl.formatted_address
            ctr += 1
            # print(unidecode(pl.name))

        # A page token needs a short delay before it becomes valid.
        time.sleep(2)

        if query_result_next_page.has_next_page_token:
            query_result_third_page = google_places.nearby_search(
                lat_lng={'lat' : -35.28346, 'lng' : 149.12807}, 
                radius=5000,
                types=s_types, pagetoken=query_result_next_page.next_page_token)
            for pl in query_result_third_page.places:
                pl.get_details()
                for key, item in pl.geo_location.items():
                    df_google.loc[ctr, 'name'] = pl.name
                    df_google.loc[ctr, 'rating'] = pl.rating
                    df_google.loc[ctr, key] = item
                    df_google.loc[ctr, 'address'] = pl.formatted_address
                ctr += 1
                # print(unidecode(pl.name))
        
    return df_google
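
The function above repeats nearly the same block for the second and third result pages. One possible way to express that as a loop is sketched below. `collect_pages` and `fetch_page` are hypothetical names that are not part of `python-google-places`; `fetch_page` stands in for a wrapper around `google_places.nearby_search` that returns one page of places together with its next page token (or `None` when there are no more pages). The example uses a fake fetcher so it runs without an API key.

```python
import time


def collect_pages(fetch_page, max_pages=3, delay_seconds=2.0):
    """Collect items from up to max_pages pages of a token-paginated API."""
    results = []
    token = None
    for page_number in range(max_pages):
        if page_number > 0:
            # Give the page token a moment to become valid before using it.
            time.sleep(delay_seconds)
        items, token = fetch_page(token)
        results.extend(items)
        if token is None:
            break  # no further pages
    return results


def fake_fetch(token):
    """Stand-in for a real API call: maps a token to (items, next_token)."""
    pages = {
        None: (["a", "b"], "t1"),
        "t1": (["c"], "t2"),
        "t2": (["d"], None),
    }
    return pages[token]


all_items = collect_pages(fake_fetch, delay_seconds=0)
print(all_items)
```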

In [199]:
s_types = [types.TYPE_RESTAURANT]
df_google_restaurants = google_places_results(s_types)
s_types = [types.TYPE_CAFE]
df_google_cafes = google_places_results(s_types)
s_types = [types.TYPE_FOOD]
df_google_food = google_places_results(s_types)
df_google_food_pts = pd.concat([df_google_cafes, df_google_restaurants, df_google_food], ignore_index=True)
df_google_food_pts.head()

| | name | rating | lat | lng | address |
|---|---|---|---|---|---|
| 0 | Adina Serviced Apartments Canberra Dickson (fo... | 4.2 | -35.2595 | 149.133829 | 45 Dooring St, Dickson ACT 2602, Australia |
| 1 | Hotel Kurrajong Canberra | 4.2 | -35.30753000000001 | 149.134518 | 8 National Circuit, Barton ACT 2600, Australia |
| 2 | Rolls Choice Cafe | | -35.2792228 | 149.1318656 | 180 City Walk, Canberra ACT 2601, Australia |
| 3 | Xchange on London | 4.0 | -35.2816936 | 149.1249786 | 7 London Circuit, Canberra ACT 2601, Australia |
| 4 | NewActon | 4.2 | -35.2841887 | 149.1245382 | Cnr Marcus Clarke St & Edinburgh Ave, Canberra... |


In [202]:
h_map = folium.Map([-35.2809, 149.1300], zoom_start=12)

# convert to (n, 2) nd-array format for heatmap
# (DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() is its replacement)
google_rest = df_google_restaurants[['lat', 'lng']].to_numpy()
google_food = df_google_food[['lat', 'lng']].to_numpy()
google_cafe = df_google_cafes[['lat', 'lng']].to_numpy()
foursquare_rest = df_foursquare[['location.lat', 'location.lng']].to_numpy()

# plot heatmap
h_map.add_child(plugins.HeatMap(foursquare_rest, radius=20))
h_map.add_child(plugins.HeatMap(google_rest, radius=20))
h_map.add_child(plugins.HeatMap(google_food, radius=20))
h_map.add_child(plugins.HeatMap(google_cafe, radius=20))

In [203]:
m_map = folium.Map([-35.2809, 149.1300], zoom_start=12)

for index, row in df_foursquare.iterrows():
    folium.CircleMarker([row['location.lat'], row['location.lng']],
                         radius=5,
                         fill=True,
                         color = 'red',
                         popup=folium.Popup(row['name'],parse_html=True)
                       ).add_to(m_map)
    
for index, row in df_google_restaurants.iterrows():
    folium.CircleMarker([row['lat'], row['lng']],
                        radius=5,
                        fill=True,
                        color = 'red',
                        popup=folium.Popup(row['name'],parse_html=True)
                       ).add_to(m_map)
    
for index, row in df_google_cafes.iterrows():
    folium.CircleMarker([row['lat'], row['lng']],
                        radius=5,
                        fill=True,
                        color = 'red',
                        popup=folium.Popup(row['name'],parse_html=True)
                       ).add_to(m_map)

for index, row in df_google_food.iterrows():
    folium.CircleMarker([row['lat'], row['lng']],
                        radius=5,
                        fill=True,
                        color = 'red',
                        popup=folium.Popup(row['name'],parse_html=True)
                       ).add_to(m_map)
m_map

As we can see from these markers, the suburbs of **City** and **Braddon** are densely covered with restaurants and cafes. This is primarily because of the **Australian National University** in the nearby suburb of Acton and the many **government and corporate offices** in the City area.

Another well-covered region is the **Griffith and Kingston** area of **South Canberra**, due to the presence of many government buildings and premium lakeside real estate. To test these hypotheses and get an idea of where the office buildings and corporate businesses lie in Canberra, let's map out the government and corporate offices (and other places) that may be of interest to a restaurant as clientele.
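
The density observations above come from reading the map. As a rough numerical cross-check, venues could be counted per suburb by pulling the suburb name out of the `address` column. The sketch below assumes addresses follow the `..., <suburb> ACT <postcode>, Australia` pattern seen in the sample output earlier; `suburb_of` is a hypothetical helper, and the regular expression is an assumption that will miss truncated or differently formatted addresses.

```python
import re
from collections import Counter

# Matches ", <suburb> ACT <4-digit postcode>" inside a formatted address.
SUBURB_PATTERN = re.compile(r",\s*([A-Za-z' ]+?)\s+ACT\s+\d{4}")


def suburb_of(address):
    """Return the suburb named in an ACT address, or None if none is found."""
    match = SUBURB_PATTERN.search(address)
    return match.group(1) if match else None


# Addresses copied from the sample output shown earlier in this notebook.
addresses = [
    "45 Dooring St, Dickson ACT 2602, Australia",
    "8 National Circuit, Barton ACT 2600, Australia",
    "180 City Walk, Canberra ACT 2601, Australia",
    "7 London Circuit, Canberra ACT 2601, Australia",
    "Cnr Marcus Clarke St & Edinburgh Ave, Canberra...",  # truncated, no match
]

suburbs = [suburb_of(a) for a in addresses]
counts = Counter(s for s in suburbs if s is not None)
print(counts.most_common())
```

With the full data, the same count could be produced with `df_google_food_pts['address'].map(suburb_of).value_counts()`.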

In [190]:
s_types = [types.TYPE_ADMINISTRATIVE_AREA_LEVEL_1]
df_google_admin1 = google_places_results(s_types)
s_types = [types.TYPE_ADMINISTRATIVE_AREA_LEVEL_2]
df_google_admin2 = google_places_results(s_types)
s_types = [types.TYPE_ADMINISTRATIVE_AREA_LEVEL_3]
df_google_admin3 = google_places_results(s_types)
s_types = [types.TYPE_ACCOUNTING]
df_google_accnt = google_places_results(s_types)
s_types = [types.TYPE_BANK]
df_google_banks = google_places_results(s_types)
s_types = [types.TYPE_EMBASSY]
df_google_embassy = google_places_results(s_types)
s_types = [types.TYPE_FINANCE]
df_google_finance = google_places_results(s_types)
s_types = [types.TYPE_INSURANCE_AGENCY]
df_google_insur = google_places_results(s_types)
s_types = [types.TYPE_SCHOOL]
df_google_schools = google_places_results(s_types)
s_types = [types.TYPE_MUSEUM]
df_google_museums = google_places_results(s_types)

In [207]:
df_google_work = pd.concat([df_google_accnt, df_google_admin1, df_google_admin2, df_google_admin3,
                            df_google_banks, df_google_finance, df_google_embassy, df_google_insur,
                            df_google_museums, df_google_schools], ignore_index=True)
df_google_work
h_map_work = folium.Map([-35.2809, 149.1300], zoom_start=12)

# convert to (n, 2) nd-array format for heatmap
# (DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() is its replacement)
google_work = df_google_work[['lat', 'lng']].to_numpy()
# plot heatmap
h_map_work.add_child(plugins.HeatMap(google_work, radius=20))

Now, let's plot these workplaces (potential clientele) onto our existing map of restaurants in Canberra.

In [198]:
for index, row in df_google_work.iterrows():
    folium.CircleMarker([row['lat'], row['lng']],
                        radius=5,
                        fill=True,
                        color = 'green',
                        popup=folium.Popup(row['name'],parse_html=True)
                       ).add_to(m_map)
m_map

So, as we can see from the above map, two regions of Canberra have a good number of workplaces:
* The suburb of **Campbell**

* The suburb of **Parkes**

However, both of these suburbs are reasonably untapped in terms of food joints.
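
The "untapped" observation is read off the map by eye. One way it could be checked is to count the food venues within a fixed radius of a point of interest, using the haversine great-circle distance. The sketch below is an illustration under stated assumptions: `haversine_km` and `count_within` are hypothetical helpers, the 1 km radius is an arbitrary choice, and the sample points are the first four coordinates from the sample output earlier in this notebook, measured from the map centre used above.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


def count_within(center, points, radius_km):
    """Count how many (lat, lng) points lie within radius_km of center."""
    return sum(
        1 for lat, lng in points
        if haversine_km(center[0], center[1], lat, lng) <= radius_km
    )


map_center = (-35.2809, 149.1300)
sample_points = [
    (-35.2595, 149.133829),
    (-35.30753000000001, 149.134518),
    (-35.2792228, 149.1318656),
    (-35.2816936, 149.1249786),
]
nearby = count_within(map_center, sample_points, radius_km=1.0)
print(nearby)
```

Running the same count around each workplace in `df_google_work` against the food venue coordinates would show which workplace clusters have few food venues nearby.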

* **Campbell** has a number of schools and sports grounds, so it is a reasonable guess that the area is regularly visited by people aged between 15 and 30. This age group is likely to be drawn more towards fast food or casual dining joints than luxury dining.


* In contrast, **Parkes** has a number of museums, government offices and workplaces, so visitors here would be more inclined to go to a high-end restaurant to meet colleagues or clients for lunch or dinner. This means a luxury dining restaurant would be a more rational choice for this suburb.

So these two locations would, in my opinion, be good places to open a new restaurant, depending on the type of restaurant the owner would like to open.