# Clustering Restaurants in Sydney

**By Harry Ngo**    
This notebook is a continuation of my first project, "[Exploring areas and venues in Sydney NSW, Australia](https://medium.com/@harryngo/exploring-areas-and-venues-of-sydney-nsw-australia-88c0cf4f3da2)". It focuses on restaurants and food-related shops around train stations in Sydney.

## Introduction

The aim of this project is to explore the restaurants and cuisines available in Sydney. We will use the Foursquare API to collect food venues and K-means clustering to group areas of Sydney that share similar features. The target audience is people who live in Sydney, especially public transport commuters, who are looking for areas that offer a particular cuisine.

In [1]:
import pandas as pd

In [8]:
df = pd.read_csv('sydney_trains.csv')
df.drop(['Unnamed: 0'], axis=1, inplace=True)
df.head()

| | Station | Latitude | Longitude |
|---|---|---|---|
| 0 | Allawah | -33.9697 | 151.1145 |
| 1 | Arncliffe | -33.9362 | 151.1473 |
| 2 | Artarmon | -33.8088 | 151.1851 |
| 3 | Ashfield | -33.8876 | 151.1259 |
| 4 | Asquith | -33.6887 | 151.1081 |


This data set contains all of Sydney's train and Metro stations, along with their latitude and longitude. The coordinates allow us to visualise the stations on a map of Sydney. Station locations were chosen because many Sydney workers commute by public transport, so restaurants close to stations are convenient options for them.
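
The `Unnamed: 0` column dropped above is typically the old index that `DataFrame.to_csv()` writes as an unnamed first column. As a minimal sketch, assuming the file was saved that way, the column can be read back as the index and the coordinates sanity-checked before they are used. The inline CSV below is a stand-in built from the first rows shown above, not the real file.

```python
import io

import pandas as pd

# Stand-in for sydney_trains.csv: the unnamed leading column is the saved index.
sample_csv = """,Station,Latitude,Longitude
0,Allawah,-33.9697,151.1145
1,Arncliffe,-33.9362,151.1473
2,Artarmon,-33.8088,151.1851
"""

# index_col=0 reads the first column as the index, so no drop() is needed.
stations = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# Basic checks before the coordinates are plotted or sent to an API.
assert stations[['Latitude', 'Longitude']].notna().all().all()
assert stations['Latitude'].between(-90, 90).all()
assert stations['Longitude'].between(-180, 180).all()

print(stations.columns.tolist())
```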

Before we begin our analysis, the following libraries will be imported.

In [4]:
import numpy as np # library to handle data in a vectorized manner

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a flat pandas dataframe (top-level import, pandas 1.0+)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [32]:
print('The dataframe has', len(df['Station']), 'stations.')

The dataframe has 181 stations.


## Methodology and Analysis

Use the geopy library to get the latitude and longitude of Sydney.

In [33]:
# Get geographical coordinates of Sydney, Australia
address = "Sydney, AU"

geolocator = Nominatim(user_agent="sydney_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Sydney, AU are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Sydney, AU are -33.8548157, 151.2164539.
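
`geolocator.geocode()` returns `None` when it finds no match, in which case `location.latitude` raises an `AttributeError`. The sketch below shows one way to guard against that. The helper `coords_or_fallback` is hypothetical, the fallback is the coordinate pair printed above, and `SimpleNamespace` stands in for a geopy result so the example runs offline.

```python
from types import SimpleNamespace

# Fallback taken from the coordinates printed in the output above.
SYDNEY_FALLBACK = (-33.8548157, 151.2164539)


def coords_or_fallback(location, fallback=SYDNEY_FALLBACK):
    """Return (latitude, longitude), or the fallback if the lookup returned None."""
    if location is None:
        return fallback
    return (location.latitude, location.longitude)


# Stand-ins for geocoder results: a successful lookup and a failed one.
found = SimpleNamespace(latitude=-33.9697, longitude=151.1145)
print(coords_or_fallback(found))
print(coords_or_fallback(None))
```

With the real geocoder, the call would be `coords_or_fallback(geolocator.geocode(address))`.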


Create a map of Sydney with the stations superimposed on top.

In [43]:
# create map of Sydney using latitude and longitude values
map_sydney = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, stat in zip(df['Latitude'], df['Longitude'], df['Station']):
    label = '{}'.format(stat)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_sydney)
    
map_sydney

Define Foursquare Credentials

In [44]:
CLIENT_ID = 'OOJQPJJQ53MGWUAXT0GUIXS54AC40GFD3WOVOYRCJ0QWIDWL' # your Foursquare ID
CLIENT_SECRET = '3ZSURTR2Q3344PUWY55GBI1VHP5AHFJQLBXAJZU0QK4E3EQD' # your Foursquare Secret
VERSION = '20180605' 
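
Credentials written into a shared notebook are visible to anyone who reads it. One common alternative, sketched here, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions made for this example, not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping (the process environment by default)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret


# Demonstration with a plain dict so the sketch runs without real credentials.
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the hard-coded strings.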

Create a function that retrieves all venues in the 'Food' category (ID: 4d4b7105d754a06374d81259) near each station in our data.

In [72]:
LIMIT = 300 # maximum number of venues requested per Foursquare API call

radius = 1000 # search radius in metres; only used if passed to getNearbyVenues, which otherwise defaults to 500

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&categoryId=4d4b7105d754a06374d81259&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
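        # make the GET request; a KeyError means the response has no 'venues' key,
        # which can also happen when the API returns an error instead of an empty result
        try:
            results = requests.get(url).json()["response"]['venues']
        except KeyError:
            print("No food venues: {} ".format(name))
            continue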
            
        if len(results) == 0:
            print("No food venues: {} ".format(name))
        
        # return only relevant information for each nearby venue
        for v in results:
            try:
                venues_list.append([(
                    name, 
                    lat, 
                    lng, 
                    v['name'], 
                    v['location']['lat'], 
                    v['location']['lng'],  
                    v['categories'][0]['name'])])
            except IndexError:
                print("Index Error: {}".format(v['name']))

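    # flatten the per-venue lists into rows; if no venues were collected, the DataFrame
    # has no columns and the column assignment below raises a ValueError
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])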
    nearby_venues.columns = ['Station', 
                  'Station Latitude', 
                  'Station Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
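
The function above builds the URL with one long format string and treats every missing `venues` key as "no food venues". As a rough sketch of an alternative, the query parameters can be kept in a dict, and a failed request can be told apart from an empty result. Both helpers are hypothetical, the payloads are made up for illustration, and the layout of an error response (an `errorType` under `meta`, no `venues` under `response`) is an assumption about the v2 API rather than something shown in this notebook.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'
FOOD_CATEGORY_ID = '4d4b7105d754a06374d81259'  # 'Food' category ID used above


def build_search_params(lat, lng, radius, limit, client_id, client_secret, version):
    """Hypothetical helper: collect the query parameters in a dict."""
    return {
        'categoryId': FOOD_CATEGORY_ID,
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }


def extract_venues(payload):
    """Hypothetical helper: return (venues, error); error is None on success."""
    venues = payload.get('response', {}).get('venues')
    if venues is None:
        # Assumed error layout: the reason is reported under meta.errorType.
        return [], payload.get('meta', {}).get('errorType', 'unknown error')
    return venues, None


# Build the URL for one station without sending a request.
params = build_search_params(-33.9697, 151.1145, 1000, 300,
                             'YOUR_ID', 'YOUR_SECRET', '20180605')
url = SEARCH_URL + '?' + urlencode(params)

# Made-up payloads, for illustration only.
ok_payload = {'meta': {'code': 200}, 'response': {'venues': [{'name': 'Example Cafe'}]}}
error_payload = {'meta': {'code': 429, 'errorType': 'quota_exceeded'}, 'response': {}}

print(extract_venues(ok_payload))
print(extract_venues(error_payload))
```

With `requests`, the same dict can be passed directly as `requests.get(SEARCH_URL, params=params)`, which does the URL encoding itself.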

In [73]:
sydney_venues = getNearbyVenues(names=df['Station'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Allawah
No food venues: Allawah 
Arncliffe
No food venues: Arncliffe 
Artarmon
No food venues: Artarmon 
Ashfield
No food venues: Ashfield 
Asquith
No food venues: Asquith 
Auburn
No food venues: Auburn 
Banksia
No food venues: Banksia 
Bankstown
No food venues: Bankstown 
Bardwell Park
No food venues: Bardwell Park 
Beecroft
No food venues: Beecroft 
Bella Vista
No food venues: Bella Vista 
Belmore
No food venues: Belmore 
Berala
No food venues: Berala 
Berowra
No food venues: Berowra 
Beverly Hills
No food venues: Beverly Hills 
Bexley North
No food venues: Bexley North 
Birrong
No food venues: Birrong 
Blacktown
No food venues: Blacktown 
Bondi Junction
No food venues: Bondi Junction 
Burwood
No food venues: Burwood 
Cabramatta
No food venues: Cabramatta 
Campbelltown
No food venues: Campbelltown 
Campsie
No food venues: Campsie 
Canley Vale
No food venues: Canley Vale 
Canterbury
No food venues: Canterbury 
Caringbah
No food venues: Caringbah 
Carlton
No food venues: Carlton 
Carra

ValueError: Length mismatch: Expected axis has 0 elements, new values have 7 elements
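
The `ValueError` above is what pandas raises when seven column names are assigned to a DataFrame that was built from an empty list, which is the situation when no station returns any venues. The sketch below reproduces the error in isolation and shows one possible guard: passing `columns=` to the constructor, which works for empty and non-empty input alike. The names `COLUMNS`, `rows` and `safe` are used for illustration only.

```python
import pandas as pd

COLUMNS = ['Station', 'Station Latitude', 'Station Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

rows = []  # what the flattened venue list looks like when every request comes back empty

# Pattern used in getNearbyVenues: build first, then assign the column names.
try:
    broken = pd.DataFrame(rows)
    broken.columns = COLUMNS
    error_message = None
except ValueError as err:
    error_message = str(err)
print('ValueError:', error_message)

# Passing the names to the constructor yields an empty frame with the right columns.
safe = pd.DataFrame(rows, columns=COLUMNS)
print(safe.shape)  # (0, 7)
```

An empty but well-formed result lets later cells run and report zero venues, which makes a failed API run easier to spot than a traceback at the end of the loop.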

In [74]:
print('{} food venues were returned by Foursquare.'.format(sydney_venues.shape[0]))

1438 food venues were returned by Foursquare.
