## 1. Introduction

### 1.1 Background

The land Down Under has a real passion for Indian restaurants and Indian cuisine, so it comes as no surprise that the fan base for Indian cuisine has been growing steadily since the 1960s, when Australians began travelling to India.

Besides the obvious reason that it is amazingly delicious, Indian cuisine's growing popularity is also due to the increase in Indians migrating to Australia and the sizeable Indian community here.



### 1.2 Problem & Interest

This project aims to give Indian food lovers a place to gain insights into the locations of their favourite Indian restaurants and grocery stores in the Greater Sydney area.

## 2. Data Acquisition and Processing

The Sydney Metropolitan Area can be broadly divided into three regions containing a total of 689 suburbs. To segment and explore these suburbs, a dataset is needed that lists the 3 regions, the suburbs in each region, and the latitude and longitude coordinates of each suburb.

### 2.1 Data Sources

The data needed for this analysis is sourced from the following locations:

- The list of Sydney suburbs along with their postcodes will be obtained from the URL below:

    https://www.intosydneydirectory.com.au/sydney-postcodes.php

- Training Services NSW provides the regional classification of the Sydney Metropolitan Area and its mapping to postcodes at:

    https://www.training.nsw.gov.au/about_us/postcodes_byregion.html

- The latitude and longitude coordinates for Australian postcodes are found at the link below:

    http://www.corra.com.au/australian-postcode-location-data/

- The locations of Indian restaurants and grocery stores in every Sydney suburb will be obtained using the Foursquare API.


### 2.2 Data Processing

To obtain the list of Sydney metropolitan suburbs, including their region, postcode, latitude and longitude, in a single table, the data from each of the sources above had to be downloaded separately and combined into one file. The data was extracted manually into spreadsheets, the datasets were combined using the VLOOKUP function, and the merged data was saved as a CSV file.
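The VLOOKUP step can also be expressed in pandas as a join on the postcode column. The sketch below is a minimal illustration only, assuming the suburb list and the region mapping were exported to dataframes with hypothetical column names `Suburb`, `Postcode` and `Region`; the sample values are taken from the merged output shown later in this notebook. The coordinate lookup would work the same way, joined on whichever key columns the coordinate file provides.

```python
import pandas as pd

# Hypothetical extracts of two of the sources (column names are assumptions)
suburbs = pd.DataFrame({
    'Suburb': ['Barangaroo', 'Dawes Point', 'Haymarket'],
    'Postcode': [2000, 2000, 2000],
})
regions = pd.DataFrame({
    'Postcode': [2000],
    'Region': ['Central & Northern Sydney'],
})

# Equivalent of an exact-match VLOOKUP on Postcode: a left join keeps every suburb
# and fills Region from the matching postcode row (NaN when there is no match)
merged = suburbs.merge(regions, on='Postcode', how='left', validate='many_to_one')

# Suburbs whose postcode found no region, like #N/A in a spreadsheet
unmatched = merged[merged['Region'].isna()]
print(merged)
```

`validate='many_to_one'` makes pandas raise a `MergeError` if a postcode appears more than once in the lookup table, the situation in which VLOOKUP would silently return the first match.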

##### Below I load the processed CSV file into a pandas dataframe for further analysis.

Before loading the file, I import all the dependencies needed to load and explore the data.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

# transform JSON into a pandas dataframe
try:
    from pandas import json_normalize # pandas >= 1.0
except ImportError:
    from pandas.io.json import json_normalize # older pandas versions

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')


In [2]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The code that accesses the file in IBM Cloud Object Storage includes credentials,
# so it has been removed before sharing the notebook. That code defines `body`,
# the streaming object holding the CSV contents, which is used below.

# add missing __iter__ method, so pandas accepts body as a file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_sydney_suburbs = pd.read_csv(body)
df_sydney_suburbs.head()



| | Region | Suburb | Postcode | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Central & Northern Sydney | Barangaroo | 2000 | -33.855601 | 151.20822 |
| 1 | Central & Northern Sydney | Dawes Point | 2000 | -33.855601 | 151.20822 |
| 2 | Central & Northern Sydney | Haymarket | 2000 | -33.855601 | 151.20822 |
| 3 | Central & Northern Sydney | Millers Point | 2000 | -33.877718 | 151.205723 |
| 4 | Central & Northern Sydney | Sydney | 2000 | -33.867139 | 151.207114 |


##### Checking that the resulting dataframe includes all 3 regions and 689 suburbs

In [3]:
print('The dataframe has {} regions and {} suburbs.'.format(
        len(df_sydney_suburbs['Region'].unique()),
        df_sydney_suburbs.shape[0]
    )
)

The dataframe has 3 regions and 689 suburbs.
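The preview above also shows that Barangaroo, Dawes Point and Haymarket share exactly the same coordinates. Suburbs with identical coordinates will produce identical Foursquare searches later, so it may be worth flagging them up front. A minimal sketch using `DataFrame.duplicated`; the sample rows are copied from the preview, and on the real data the same expressions would be applied to `df_sydney_suburbs`.

```python
import pandas as pd

# Rows copied from the preview of df_sydney_suburbs above
sample = pd.DataFrame({
    'Suburb': ['Barangaroo', 'Dawes Point', 'Haymarket', 'Millers Point', 'Sydney'],
    'Postcode': [2000, 2000, 2000, 2000, 2000],
    'Latitude': [-33.855601, -33.855601, -33.855601, -33.877718, -33.867139],
    'Longitude': [151.20822, 151.20822, 151.20822, 151.205723, 151.207114],
})

# keep=False marks every member of a group of identical coordinates, not only the repeats
shared = sample[sample.duplicated(subset=['Latitude', 'Longitude'], keep=False)]
print(shared['Suburb'].tolist())

# Number of distinct search points the API would actually be queried with
n_points = sample[['Latitude', 'Longitude']].drop_duplicates().shape[0]
print(n_points)
```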


##### Using the geopy library to get the latitude and longitude values of Sydney City

In [4]:
address = 'Sydney, NSW'

geolocator = Nominatim(user_agent="syd_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Sydney are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Sydney are -33.8548157, 151.2164539.


##### Creating a map of Sydney with suburbs superimposed on top for visualization

In [5]:
# create map of Sydney using latitude and longitude values
map_sydney = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, region, suburb in zip(df_sydney_suburbs['Latitude'], df_sydney_suburbs['Longitude'],
                                    df_sydney_suburbs['Region'], df_sydney_suburbs['Suburb']):
    label = '{}, {}'.format(suburb, region)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_sydney)
    
map_sydney

##### Below I utilize the Foursquare API to explore the suburbs and segment them.

In [1]:
CLIENT_ID = 'XXXX' # removing the Foursquare credentials for sharing the notebook
CLIENT_SECRET = 'XXXX' # removing the Foursquare credentials for sharing the notebook
VERSION = '20180605' # Foursquare API version
LIMIT = 100
#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

##### Exploring my home suburb in the dataframe

Getting the suburb's name

In [7]:
df_sydney_suburbs.loc[313, 'Suburb']

'Parramatta'

Getting the suburb's latitude and longitude values

In [8]:
suburb_latitude = df_sydney_suburbs.loc[313, 'Latitude'] # suburb latitude value
suburb_longitude = df_sydney_suburbs.loc[313, 'Longitude'] # suburb longitude value

suburb_name = df_sydney_suburbs.loc[313, 'Suburb'] # suburb name

print('Latitude and longitude values of {} are {}, {}.'.format(suburb_name, 
                                                               suburb_latitude, 
                                                               suburb_longitude))

Latitude and longitude values of Parramatta are -33.822427000000005, 151.008961.
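Looking the suburb up by name rather than by the hard-coded row position 313 keeps this step correct if the CSV is ever re-sorted. A minimal sketch on a small stand-in dataframe (the Parramatta coordinates are from the output above); in the notebook the same expression would apply to `df_sydney_suburbs`. Since suburb names are not guaranteed to be unique, the sketch takes the first match, and `.iloc[0]` raises `IndexError` if there is no match at all.

```python
import pandas as pd

# Stand-in for df_sydney_suburbs, using coordinates shown earlier in the notebook
suburbs = pd.DataFrame({
    'Suburb': ['Millers Point', 'Parramatta'],
    'Latitude': [-33.877718, -33.822427],
    'Longitude': [151.205723, 151.008961],
})

# Boolean mask on the name, then the first matching row
row = suburbs.loc[suburbs['Suburb'] == 'Parramatta'].iloc[0]
suburb_latitude, suburb_longitude = row['Latitude'], row['Longitude']
print(suburb_latitude, suburb_longitude)
```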


Defining a query to search for Indian food-related venues within 500 metres of Parramatta

In [9]:
search_query = 'Indian'
radius = 500
print(search_query + ' .... OK!')

Indian .... OK!
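Foursquare reports each venue's distance from the search point in metres (the `distance` field in the response below). To sanity-check those values, or to enforce the radius locally, the great-circle distance can be computed with the haversine formula. This is a minimal sketch assuming a spherical Earth with a mean radius of 6,371 km, so results may differ from Foursquare's figures by a few metres; `haversine_m` is a hypothetical helper, not part of any library used here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Parramatta search point vs. Taj Indian Sweets & Restaurant (coordinates from the results below)
d = haversine_m(-33.822427, 151.008961, -33.820522, 151.00948)
print(round(d))
```

For this pair the result should land within a few metres of the 217 m that Foursquare reports.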


Defining the corresponding URL

In [10]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, suburb_latitude, suburb_longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=XXXX&client_secret=XXXX&ll=-33.822427000000005,151.008961&v=20180605&query=Indian&radius=500&limit=100'
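Building the query string by hand with `str.format` works for simple values, but it does not URL-encode them, which matters for queries containing spaces or `&` (for example a hypothetical query such as `Indian & Sri Lankan`). A minimal sketch using the standard library's `urllib.parse.urlencode`, with placeholder credentials; `build_search_url` is a hypothetical helper. With the `requests` library, the same dictionary can instead be passed as `requests.get(url, params=params)`.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, lat, lng, query,
                     radius=500, limit=100, version='20180605'):
    """Return a venue-search URL with every parameter properly URL-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return BASE_URL + '?' + urlencode(params)

url = build_search_url('XXXX', 'XXXX', -33.822427, 151.008961, 'Indian & Sri Lankan')
print(url)
```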

Sending the GET request and examining the results

In [11]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ccf3ceb9fb6b7571a79795a'},
 'response': {'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/indian_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d10f941735',
      'name': 'Indian Restaurant',
      'pluralName': 'Indian Restaurants',
      'primary': True,
      'shortName': 'Indian'}],
    'hasPerk': False,
    'id': '4cd52943a5b34688b83a8c50',
    'location': {'address': '91 Wigram St',
     'cc': 'AU',
     'country': 'Australia',
     'distance': 217,
     'formattedAddress': ['91 Wigram St', 'Harris Park NSW 2150', 'Australia'],
     'labeledLatLngs': [{'label': 'display',
       'lat': -33.820522,
       'lng': 151.00948}],
     'lat': -33.820522,
     'lng': 151.00948,
     'postalCode': '2150',
     'state': 'New South Wales'},
    'name': 'Taj Indian Sweets & Restaurant',
    'referralId': 'v-1557085419'},
   {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food

Getting the relevant part of the JSON and transforming it into a pandas dataframe

In [12]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

| | categories | hasPerk | id | location.address | location.cc | location.city | location.country | location.crossStreet | location.distance | location.formattedAddress | location.labeledLatLngs | location.lat | location.lng | location.postalCode | location.state | name | referralId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [{'primary': True, 'id': '4bf58dd8d48988d10f94... | False | 4cd52943a5b34688b83a8c50 | 91 Wigram St | AU | | Australia | | 217 | [91 Wigram St, Harris Park NSW 2150, Australia] | [{'lng': 151.00948, 'label': 'display', 'lat':... | -33.820522 | 151.00948 | 2150 | New South Wales | Taj Indian Sweets & Restaurant | v-1557085419 |
| 1 | [{'primary': True, 'id': '54135bf5e4b08f3d2429... | False | 5af7bd34838e59002c67158b | 42 Marion st, | AU | | Australia | | 56 | [42 Marion st,, Harris Park NSW 2150, Australia] | [{'lng': 151.0084, 'label': 'display', 'lat': ... | -33.82222 | 151.0084 | 2150 | New South Wales | Amaravathi Indian Restaurant | v-1557085419 |
| 2 | [{'primary': True, 'id': '4bf58dd8d48988d10f94... | False | 530051d7498ef1d57b83a062 | 53 Marion St | AU | Sydney | Australia | Harris Park | 61 | [53 Marion St (Harris Park), Sydney NSW 2150, ... | [{'lng': 151.0083261621207, 'label': 'display'... | -33.822266 | 151.008326 | 2150 | NSW | Choice Indian Fast food | v-1557085419 |
| 3 | [{'primary': True, 'id': '4bf58dd8d48988d10f94... | False | 4c9d7b22031337047abf5fd5 | 77 Wigram St, | AU | | Australia | | 135 | [77 Wigram St,, Harris Park NSW 2150, Australia] | [{'lng': 151.0091837953755, 'label': 'display'... | -33.821227 | 151.009184 | 2150 | New South Wales | Chopsticks - Indian Chinese Cuisine | v-1557085419 |
| 4 | [{'primary': True, 'id': '4bf58dd8d48988d10f94... | False | 4c4bd8f946240f47b45fe6f3 | 94 Wigram St | AU | | Australia | at Ada St | 175 | [94 Wigram St (at Ada St), Harris Park NSW 215... | [{'lng': 151.0090824509468, 'label': 'display'... | -33.820857 | 151.009082 | 2150 | New South Wales | Ginger Indian Restaurant | v-1557085419 |
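`json_normalize` flattens nested dictionaries into dot-separated column names, which is where columns such as `location.lat` and `location.address` in the table above come from; list values such as `categories` are kept whole in a single cell, which is why a helper is needed next to pull out the category name. A minimal sketch on a trimmed, hand-written venue record shaped like the response above (only a few of its fields are kept); it uses `pd.json_normalize`, the top-level name available in pandas 1.0 and later.

```python
import pandas as pd

# Trimmed venue record shaped like results['response']['venues'][0] above
venues = [{
    'id': '4cd52943a5b34688b83a8c50',
    'name': 'Taj Indian Sweets & Restaurant',
    'categories': [{'name': 'Indian Restaurant', 'primary': True}],
    'location': {'address': '91 Wigram St', 'lat': -33.820522,
                 'lng': 151.00948, 'distance': 217},
}]

# Nested dicts become 'location.*' columns; the categories list stays in one cell
flat = pd.json_normalize(venues)
print(sorted(flat.columns))
```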


Defining the information of interest and filtering the dataframe

In [13]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# .copy() makes an independent dataframe, so assigning to it below does not trigger SettingWithCopyWarning
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

| | name | categories | address | cc | city | country | crossStreet | distance | formattedAddress | labeledLatLngs | lat | lng | postalCode | state | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Taj Indian Sweets & Restaurant | Indian Restaurant | 91 Wigram St | AU | | Australia | | 217 | [91 Wigram St, Harris Park NSW 2150, Australia] | [{'lng': 151.00948, 'label': 'display', 'lat':... | -33.820522 | 151.00948 | 2150.0 | New South Wales | 4cd52943a5b34688b83a8c50 |
| 1 | Amaravathi Indian Restaurant | South Indian Restaurant | 42 Marion st, | AU | | Australia | | 56 | [42 Marion st,, Harris Park NSW 2150, Australia] | [{'lng': 151.0084, 'label': 'display', 'lat': ... | -33.82222 | 151.0084 | 2150.0 | New South Wales | 5af7bd34838e59002c67158b |
| 2 | Choice Indian Fast food | Indian Restaurant | 53 Marion St | AU | Sydney | Australia | Harris Park | 61 | [53 Marion St (Harris Park), Sydney NSW 2150, ... | [{'lng': 151.0083261621207, 'label': 'display'... | -33.822266 | 151.008326 | 2150.0 | NSW | 530051d7498ef1d57b83a062 |
| 3 | Chopsticks - Indian Chinese Cuisine | Indian Restaurant | 77 Wigram St, | AU | | Australia | | 135 | [77 Wigram St,, Harris Park NSW 2150, Australia] | [{'lng': 151.0091837953755, 'label': 'display'... | -33.821227 | 151.009184 | 2150.0 | New South Wales | 4c9d7b22031337047abf5fd5 |
| 4 | Ginger Indian Restaurant | Indian Restaurant | 94 Wigram St | AU | | Australia | at Ada St | 175 | [94 Wigram St (at Ada St), Harris Park NSW 215... | [{'lng': 151.0090824509468, 'label': 'display'... | -33.820857 | 151.009082 | 2150.0 | New South Wales | 4c4bd8f946240f47b45fe6f3 |
| 5 | Haveli Indian Restaurant | Indian Restaurant | 67 Wigram Street | AU | | Australia | | 93 | [67 Wigram Street, Harris Park NSW, Australia] | [{'lng': 151.009254, 'label': 'display', 'lat'... | -33.821627 | 151.009254 | | New South Wales | 4d75c18f74eca093f4fcb2a8 |
| 6 | Sabzee Indian Supermarket | Supermarket | | AU | | Australia | | 116 | [Australia] | [{'lng': 151.009084, 'label': 'display', 'lat'... | -33.821388 | 151.009084 | | | 536da5c0498e3a58b46f9642 |
| 7 | Grandmaa's Indian Restaurant | Indian Restaurant | 42 Station Street East | AU | | Australia | | 218 | [42 Station Street East, Harris Park NSW 2150,... | [{'lng': 151.00815, 'label': 'display', 'lat':... | -33.82058 | 151.00815 | 2150.0 | New South Wales | 58e0621704f4d747c7c2dcf4 |
| 8 | Celebrations Indian Restaurant | Indian Restaurant | Suite 1/52 Station Street East, Harris Park | AU | Parramatta | Australia | | 367 | [Suite 1/52 Station Street East, Harris Park, ... | [{'lng': 151.00739961094294, 'label': 'display... | -33.819387 | 151.0074 | 2150.0 | NSW | 4e7f0452e5fa5ad2e5669c41 |
| 9 | Handi Lazeez Indian Restaurant | Indian Restaurant | | AU | Parramatta | Australia | | 541 | [Parramatta NSW 2150, Australia] | [{'lng': 151.005426, 'label': 'display', 'lat'... | -33.818548 | 151.005426 | 2150.0 | NSW | 59280b6204d1ae4b863fd066 |
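The last row, Handi Lazeez, is 541 m away even though the request asked for a 500 m radius, so the API appears to treat the radius as approximate rather than as a strict cut-off. If a strict radius is wanted, the `distance` column can be filtered after the fact. A minimal sketch on a few rows copied from the table above; in the notebook the same mask would be applied to `dataframe_filtered`.

```python
import pandas as pd

# A few rows copied from dataframe_filtered above
venues = pd.DataFrame({
    'name': ['Taj Indian Sweets & Restaurant', 'Celebrations Indian Restaurant',
             'Handi Lazeez Indian Restaurant'],
    'distance': [217, 367, 541],
})

radius = 500  # same value as the search radius used in the request
within_radius = venues[venues['distance'] <= radius]
print(within_radius['name'].tolist())
```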


Visualizing the venues near Parramatta

In [26]:
venues_map = folium.Map(location=[suburb_latitude, suburb_longitude], zoom_start=15) # generate map centred around Parramatta

# add a red circle marker to represent Parramatta
folium.CircleMarker(
    [suburb_latitude, suburb_longitude],
    radius=10,
    color='red',
    popup='Parramatta',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Indian restaurants/grocery stores as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.name):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=folium.Popup(label, parse_html=True),
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(venues_map)
# display map
venues_map

##### Now extending the same process to all the suburbs in Sydney using a function

In [27]:
def getNearbyVenues(names, latitudes, longitudes, search_query = 'Indian', radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            lat, 
            lng, 
            VERSION, 
            search_query, 
            radius, 
            LIMIT)
            
        # make the GET request
        try:
            results = requests.get(url).json()['response']['venues']
            # keep only relevant information for each nearby venue;
            # venues without a category get None instead of raising IndexError
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['name'], 
                v['location']['lat'], 
                v['location']['lng'],  
                v['categories'][0]['name'] if v.get('categories') else None) for v in results])
        except (requests.RequestException, ValueError, KeyError):
            # skip suburbs whose request fails or returns an unexpected payload
            continue
        
    # passing the column names to the constructor also works when no venues were found
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Suburb', 
                                          'Suburb Latitude', 
                                          'Suburb Longitude', 
                                          'Venue', 
                                          'Venue Latitude', 
                                          'Venue Longitude', 
                                          'Venue Category'])
    return nearby_venues
    

In [28]:
sydney_venues = getNearbyVenues(names=df_sydney_suburbs['Suburb'],
                                   latitudes=df_sydney_suburbs['Latitude'],
                                   longitudes=df_sydney_suburbs['Longitude']
                                  )

In [29]:
print(sydney_venues.shape)
sydney_venues.head()

(240, 7)


| | Suburb | Suburb Latitude | Suburb Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Millers Point | -33.877718 | 151.205723 | Mirchi Indian Malaysian | -33.876781 | 151.206917 | Indian Restaurant |
| 1 | Millers Point | -33.877718 | 151.205723 | Indian Excellency | -33.877494 | 151.20767 | Indian Restaurant |
| 2 | Millers Point | -33.877718 | 151.205723 | One Mb- Indian | -33.879289 | 151.20796 | Indian Restaurant |
| 3 | Millers Point | -33.877718 | 151.205723 | Nirvana New Indian | -33.874297 | 151.2069 | Indian Restaurant |
| 4 | Millers Point | -33.877718 | 151.205723 | Naked Indiana | -33.880824 | 151.209604 | Vegetarian / Vegan Restaurant |
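Because the 500 m search circles of neighbouring suburbs can overlap, and some suburbs share identical coordinates, the same venue may be listed under more than one suburb in `sydney_venues`. Counting venues per suburb and counting distinct venues across Sydney therefore answer different questions. A minimal sketch on hand-written rows shaped like `sydney_venues` (only some of its columns); the first two rows come from the preview above, while the `Hypothetical Suburb` row is invented purely to illustrate a duplicate.

```python
import pandas as pd

cols = ['Suburb', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
sydney_venues_sample = pd.DataFrame([
    ['Millers Point', 'Mirchi Indian Malaysian', -33.876781, 151.206917, 'Indian Restaurant'],
    ['Millers Point', 'Indian Excellency', -33.877494, 151.20767, 'Indian Restaurant'],
    # hypothetical row: the same venue returned for a second, overlapping search
    ['Hypothetical Suburb', 'Indian Excellency', -33.877494, 151.20767, 'Indian Restaurant'],
], columns=cols)

# Per-suburb view: how many matching venues each suburb's search returned
per_suburb = sydney_venues_sample.groupby('Suburb').size()

# City-wide view: each physical venue (name plus location) counted once
unique_venues = sydney_venues_sample.drop_duplicates(
    subset=['Venue', 'Venue Latitude', 'Venue Longitude'])

print(per_suburb.to_dict())
print(len(unique_venues))
```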


In [31]:
sydney_venues_map = folium.Map(location=[-33.8688, 151.2093], zoom_start=12) # generate map centred around Sydney

# add one red circle marker per suburb (drop_duplicates avoids stacking a marker for every venue row)
suburb_points = sydney_venues[['Suburb', 'Suburb Latitude', 'Suburb Longitude']].drop_duplicates()
for lat, lng, label in zip(suburb_points['Suburb Latitude'], suburb_points['Suburb Longitude'], suburb_points['Suburb']):
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        color='red',
        popup=folium.Popup(label, parse_html=True),
        fill = True,
        fill_color = 'red',
        fill_opacity = 0.6
    ).add_to(sydney_venues_map)

# add the Indian restaurants/grocery stores as blue circle markers
for lat, lng, label in zip(sydney_venues['Venue Latitude'], sydney_venues['Venue Longitude'], sydney_venues['Venue']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=folium.Popup(label, parse_html=True),
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(sydney_venues_map)
# display map
sydney_venues_map