# **IBM Data Science – Capstone**

**Choosing the Best Cities for Middle Eastern Eateries in Australia**:

An exploratory analysis of Sydney, Melbourne, Canberra, Brisbane and Perth

The notebook can be viewed here: https://colab.research.google.com/drive/1F7bI8IXsrmp3evGtEz1jfbBX2G75wEmf



(A detailed report can be found on my Git repository.)


# **Importing the Necessary Libraries: pandas, NumPy and Folium**

In [0]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import requests
from pandas import json_normalize  # flattens the nested JSON returned by Foursquare (top-level import, pandas 1.0+)
import folium  # map library

print('Libraries imported.')

Libraries imported.


# **As our project is map intensive, we first try out Folium to check that the map coordinates work properly, using Australia and Sydney as examples**

In [0]:
Aus_m = folium.Map([-25.2743988, 133.7751312], zoom_start=5)
Aus_m


In [0]:
Sydney_m = folium.Map([-33.870453, 151.208755], zoom_start=10)
Sydney_m

# **Once the maps are working, we set up the Foursquare API with our client ID and secret, and run the five queries (one per city) in a single loop below. We also limit each query to 100 results.**

**We also use the Foursquare category ID for Middle Eastern restaurants:** 4bf58dd8d48988d115941735

In [0]:
CLIENT_ID = 'KEY HIDDEN FOR REVIEW' # your Foursquare ID
CLIENT_SECRET = 'KEY HIDDEN FOR REVIEW' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


In [0]:
LIMIT = 100  # maximum number of venues returned per query
CATEGORY_ID = '4bf58dd8d48988d115941735'  # Foursquare category ID for Middle Eastern restaurants
cities = ['Sydney, NSW', 'Melbourne, VIC', 'Perth, WA', 'Canberra, ACT', 'Brisbane, QLD']  # our five target cities and their states
results = {}
for city in cities:
    # passing the query as params lets requests URL-encode the comma and space in each city name
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'near': city,
        'limit': LIMIT,
        'categoryId': CATEGORY_ID,
    }
    results[city] = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()
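
City names such as 'Sydney, NSW' contain a comma and a space, both of which have to be percent-encoded in the query string. The sketch below shows one way to build the request URL with only the standard library. `build_explore_url` is a hypothetical helper, not part of the notebook, and the credentials are placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(city, client_id, client_secret, version, limit, category_id):
    """Return the explore URL with every query parameter percent-encoded."""
    query = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "near": city,
        "limit": limit,
        "categoryId": category_id,
    })
    return f"{BASE_URL}?{query}"


# Placeholder credentials, for illustration only
url = build_explore_url("Sydney, NSW", "ID", "SECRET", "20180605", 100,
                        "4bf58dd8d48988d115941735")
print(url)
```

The `near` value comes out as `Sydney%2C+NSW`, so the city name cannot be mistaken for a separator in the query string.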

## **In this step we normalise the JSON data and build a dictionary of DataFrames, one per city, holding the Name (of the venue), Address, Lat and Lng**

In [0]:
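df_venues = {}
for city in cities:
    # flatten the nested 'items' records into a flat table
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    # copy the selection so that renaming the columns does not act on a slice of venues
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']].copy()
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']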
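
To make the column names such as `venue.location.lat` less mysterious, here is a minimal plain-Python sketch of the flattening that `json_normalize` does for nested dictionaries. The `flatten` function and the sample item are made up for illustration; the real Foursquare items carry many more fields.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse into nested dicts, extending the dotted prefix
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Made-up item shaped like one entry of response -> groups[0] -> items
item = {
    "venue": {
        "name": "Example Kebabs",
        "location": {"address": "1 Example St", "lat": -33.87, "lng": 151.21},
    }
}
row = flatten(item)
print(row)
```

Each flattened dictionary corresponds to one row of the normalised table, and its keys are the column names selected in the cell above.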


# **After normalizing our data, we get the Name, Address, Lat and Long coordinates**

In [0]:
df_venues

{'Brisbane, QLD':                                               Name  \
 0                                             Naïm   
 1                          Gad's: Charcoal Chicken   
 2                                           Byblós   
 3   Arabella’s Charcoal and Middle Eastern Cuisine   
 4                        Sunshine Kebabs Underwood   
 5   Arabella's Charcoal and Middle Eastern Cuisine   
 6                                      Baba Ganouj   
 7                                  Watany Manoushi   
 8                                  Sinbad's Kebabs   
 9                         Baalbek Lebanese Cuisine   
 10                                            Naïm   
 11                                     1001 Nights   
 12                                     ISPA Kebabs   
 13                        Farah Persian Restaurant   
 14                                     Baba Ganouj   
 15                         Rockys Bakehouse & Cafe   
 16                       King Ahiram Lebanese F

In [0]:
df_venues.keys()

dict_keys(['Sydney, NSW', 'Melbourne, VIC', 'Perth, WA', 'Canberra, ACT', 'Brisbane, QLD'])

# **As we can see above, we have 5 keys, one per city. To look up a city's map we can either use its position in the cities list (0, 1, 2, 3 or 4) or simply its name. Next, we plot the venues on a map for each city**

In [0]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
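
The map centre used above is the midpoint of the bounding box that Foursquare returns for each city. A minimal sketch of that step on its own, assuming a `bounds` dictionary with `ne` and `sw` corners as in the response; `bounds_center` is a hypothetical helper and the sample values are made up.

```python
def bounds_center(bounds):
    """Return (lat, lng) of the midpoint between the NE and SW corners."""
    lat = (bounds["ne"]["lat"] + bounds["sw"]["lat"]) / 2
    lng = (bounds["ne"]["lng"] + bounds["sw"]["lng"]) / 2
    return lat, lng


# Made-up bounding box, for illustration only
bounds = {"ne": {"lat": -33.0, "lng": 152.0}, "sw": {"lat": -35.0, "lng": 150.0}}
center = bounds_center(bounds)
print(center)
```

Averaging the corners is adequate for Australian cities; a box that crosses the 180° meridian would need special handling.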

In [0]:
maps[cities[0]] #Sydney

In [0]:
maps[cities[1]] #Melbourne

In [0]:
maps[cities[2]] #Perth

In [0]:
maps[cities[3]] #Canberra

In [0]:
maps[cities[4]] #Brisbane

# **Now that we can see the restaurant clusters on the map of each city, Sydney and Melbourne look like the best options, but we need a more in-depth analysis. Sydney has a bigger Central Business District than the other cities, so its cluster might not present the true picture. Hence, we take the mean coordinate of all venues in each city and calculate the mean distance between that point and every venue**

In [0]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="orange", weight=1.5, opacity=0.5).add_to(maps[city])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(maps[city])

    print(city)
    print("Mean Distance from Mean coordinates")
    print(np.mean(np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)))

Sydney, NSW
Mean Distance from Mean coordinates
0.11600453159957849
Melbourne, VIC
Mean Distance from Mean coordinates
0.037171818747818525
Perth, WA
Mean Distance from Mean coordinates
0.04414071939304621
Canberra, ACT
Mean Distance from Mean coordinates
0.07499200150714927
Brisbane, QLD
Mean Distance from Mean coordinates
0.0646339126548971
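
The values above are in degrees, because `np.linalg.norm` is applied directly to latitude and longitude differences. A degree of longitude covers less ground the further a city is from the equator, so comparing cities at different latitudes this way is only approximate. As a possible refinement, not used in the notebook, the same mean distance could be expressed in kilometres with the haversine formula. The function names and the Earth radius below are assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_distance_km(points):
    """Mean haversine distance from each (lat, lng) point to the mean coordinate."""
    center_lat = sum(lat for lat, _ in points) / len(points)
    center_lng = sum(lng for _, lng in points) / len(points)
    total = sum(haversine_km(lat, lng, center_lat, center_lng) for lat, lng in points)
    return total / len(points)


# Two made-up points two degrees of latitude apart
print(mean_distance_km([(0.0, 0.0), (2.0, 0.0)]))
```

With the notebook's data, the points for a city could be taken from `df_venues[city][['Lat', 'Lng']].values.tolist()`.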


In [0]:
maps[cities[0]] #SYDNEY 

In [0]:
maps[cities[1]] #MELBOURNE

In [0]:
maps[cities[2]] #PERTH


In [0]:
maps[cities[3]] #CANBERRA

In [0]:
maps[cities[4]] #BRISBANE

## **In the steps below, we drop the single farthest venue in each city (the largest outlier) and recompute the mean distance without it. The resulting values, in degrees of latitude/longitude, show that Melbourne has the smallest mean distance from the central point, so its Middle Eastern restaurants are the most tightly clustered:**


**Sydney, NSW**

Mean Distance from Mean coordinates: 0.11287612594187721

**Canberra, ACT**

Mean Distance from Mean coordinates: 0.07047281616415198

**Melbourne, VIC**

Mean Distance from Mean coordinates: 0.03318223070418223

**Perth, WA**

Mean Distance from Mean coordinates: 0.04176312230297355

**Brisbane, QLD**

Mean Distance from Mean coordinates: 0.05986191980294174




In [0]:
city = 'Sydney, NSW'
venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 

print(city)
print("Mean Distance from Mean coordinates")
dists = np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)
dists.sort()
print(np.mean(dists[:-1]))

Sydney, NSW
Mean Distance from Mean coordinates
0.11287612594187721


In [0]:
city = 'Canberra, ACT'
venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 

print(city)
print("Mean Distance from Mean coordinates")
dists = np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)
dists.sort()
print(np.mean(dists[:-1]))

Canberra, ACT
Mean Distance from Mean coordinates
0.07047281616415198


In [0]:
city = 'Melbourne, VIC'
venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 

print(city)
print("Mean Distance from Mean coordinates")
dists = np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)
dists.sort()
print(np.mean(dists[:-1]))

Melbourne, VIC
Mean Distance from Mean coordinates
0.03318223070418223


In [0]:
city = 'Perth, WA'
venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 

print(city)
print("Mean Distance from Mean coordinates")
dists = np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)
dists.sort()
print(np.mean(dists[:-1]))

Perth, WA
Mean Distance from Mean coordinates
0.04176312230297355


In [0]:
city = 'Brisbane, QLD'
venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 

print(city)
print("Mean Distance from Mean coordinates")
dists = np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)
dists.sort()
print(np.mean(dists[:-1]))

Brisbane, QLD
Mean Distance from Mean coordinates
0.05986191980294174
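
The five cells above repeat the same calculation with a different city name. A minimal sketch of how it could be wrapped in one function, in plain Python and assuming a list of (lat, lng) pairs; `trimmed_mean_distance` is a hypothetical name. As in the cells above, the mean coordinate is computed from all venues before the farthest one is dropped.

```python
import math


def trimmed_mean_distance(points, drop=1):
    """Mean Euclidean distance (in degrees) to the mean coordinate,
    ignoring the `drop` farthest points."""
    if drop < 0 or drop >= len(points):
        raise ValueError("drop must be between 0 and len(points) - 1")
    center_lat = sum(lat for lat, _ in points) / len(points)
    center_lng = sum(lng for _, lng in points) / len(points)
    dists = sorted(math.hypot(lat - center_lat, lng - center_lng) for lat, lng in points)
    kept = dists[:len(dists) - drop]  # discard the largest distances
    return sum(kept) / len(kept)


# Made-up points: three venues together and one far away
points = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (4.0, 0.0)]
print(trimmed_mean_distance(points))          # farthest venue dropped
print(trimmed_mean_distance(points, drop=0))  # all venues kept
```

With the notebook's data, a single loop over `cities` could call `trimmed_mean_distance(df_venues[city][['Lat', 'Lng']].values.tolist())` for each city.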
