# **Capstone Project**

## **Table of contents**
* [Background](#Background_discussion)
* [Problem description](#Problem_description)
* [Target audience](#Target_audience)
* [Data analysis](#Data_analysis)
* [Results](#Results)
* [Conclusion](#Conclusion)

### **Background discussion:** <a name="Background_discussion"></a>

Sydney is the state capital of New South Wales and the most populous city in Australia and Oceania. 

The city is made up of 658 suburbs, 40 local government areas and 15 contiguous regions. As of June 2017, Sydney's estimated metropolitan population was 5,230,330, approximately 65% of the state's population.

There are 32 Australian university campuses in Sydney, some of which belong to the most distinguished universities in Australia and the world. Together, Sydney's universities teach 254,000 students, including 51,000 from overseas.

### **Problem description:** <a name="Problem_description"></a>

X&Y Capital is a global investment fund that owns fast-food restaurants in Asia (mainly China and India), with a particular focus on university areas and quick-service dining. In recent years the fast-food segment of the restaurant industry has grown while full-service restaurants have lost ground, so this has proven to be a profitable investment for the fund.

Because Sydney is one of the education hubs of the Asia-Pacific region, X&Y Capital wants to explore business opportunities in the Sydney area. The fund is particularly interested in the following questions:

- What types of restaurants are around universities in Sydney? 
- What types of food do they serve?
- Are there any differences between various university locations?
- What would be the ideal location for the first restaurant?

### **Target audience:** <a name="Target_audience"></a>

The target audience is X&Y Capital's senior management, who want to investigate business opportunities in Sydney, Australia.

### **Data analysis:** <a name="Data_analysis"></a>

In [1]:
pip install folium 

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install geopy

Note: you may need to restart the kernel to use updated packages.


In [3]:
import pandas as pd # data analysis
import numpy as np
import folium # maps
from geopy.geocoders import Nominatim # geographical coordinates

The following CSV data on Sydney university locations is used as the basis for the analysis:

In [4]:
df_syd_unv_loc = pd.read_csv('Sydney_University_Locations.csv')
df_syd_unv_loc.head()

Unnamed: 0,Campus_Type,University_Campus,Post_Code,Suburb,Latitude,Longitude
0,Satellite,University of Sydney (Camden),2570,Camden Sydney NSW Australia,-34.035282,150.655934
1,Satellite,Western Sydney University (Campbelltown),2560,Campbelltown Sydney NSW Australia,-34.069407,150.788561
2,Main,University of Sydney (Camperdown),2006,Camperdown Sydney NSW Australia,-33.888584,151.187347
3,Satellite,Curtin University (Chippendale),2007,Chippendale Sydney NSW Australia,-33.885306,151.202449
4,Satellite,University of Notre Dame Australia (Chippendale),2007,Chippendale Sydney NSW Australia,-33.884555,151.197259


In [5]:
df_syd_unv_loc.shape

(32, 6)

In [6]:
# latitude and longitude of Sydney AUS

address = 'Sydney, AUS'

geolocator = Nominatim(user_agent="syd_map")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Sydney AUS are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Sydney AUS are -33.9247891, 151.2277413.


In [7]:
# create map of Sydney using latitude and longitude values
map_sydney = folium.Map(location=[latitude, longitude], zoom_start=10)
map_sydney

In [8]:
# create map of Sydney using latitude and longitude values
map_sydney = folium.Map(location=[latitude, longitude], zoom_start=10)

# add university locations to the map
for lat, lng, label in zip(df_syd_unv_loc['Latitude'], df_syd_unv_loc['Longitude'], df_syd_unv_loc['University_Campus']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sydney)  
    
map_sydney

In [9]:
# list the suburbs where the universities are located

df_syd_unv_suburbs_list = df_syd_unv_loc[df_syd_unv_loc.columns[3:4]]
df_syd_unv_suburbs_list.head(10)

Unnamed: 0,Suburb
0,Camden Sydney NSW Australia
1,Campbelltown Sydney NSW Australia
2,Camperdown Sydney NSW Australia
3,Chippendale Sydney NSW Australia
4,Chippendale Sydney NSW Australia
5,Darlinghurst Sydney NSW Australia
6,Erskineville Sydney NSW Australia
7,Hawkesbury Richmond NSW Australia
8,Haymarket Sydney NSW Australia
9,Kensington Sydney NSW Australia


In [10]:
# some suburbs are home to more than one campus (particularly in and around the city center)
# the following is the list of suburbs without duplicates

df_syd_unv_suburbs = df_syd_unv_suburbs_list.drop_duplicates()
df_syd_unv_sub = df_syd_unv_suburbs.reset_index(drop=True)
df_syd_unv_sub

Unnamed: 0,Suburb
0,Camden Sydney NSW Australia
1,Campbelltown Sydney NSW Australia
2,Camperdown Sydney NSW Australia
3,Chippendale Sydney NSW Australia
4,Darlinghurst Sydney NSW Australia
5,Erskineville Sydney NSW Australia
6,Hawkesbury Richmond NSW Australia
7,Haymarket Sydney NSW Australia
8,Kensington Sydney NSW Australia
9,Lidcombe NSW Australia


**Foursquare Credentials**

In [1]:
CLIENT_ID = 'XXXXX' # Foursquare ID
CLIENT_SECRET = 'XXXXX' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XXXXX
CLIENT_SECRET: XXXXX


In [12]:
# select one of the campuses

df_syd_unv_loc.loc[4, 'University_Campus']

'University of Notre Dame Australia (Chippendale)'

In [13]:
# latitude and longitude values of selected campus

neighborhood_latitude = df_syd_unv_loc.loc[4, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_syd_unv_loc.loc[4, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_syd_unv_loc.loc[4, 'University_Campus'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of University of Notre Dame Australia (Chippendale) are -33.884555, 151.197259.


In [14]:
# Get the top 100 venues that are around University of Notre Dame Australia (Chippendale) campus within a radius of 500 meters
# create the GET request URL

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL
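
The query string above is assembled by hand with `str.format`. As a minimal sketch, the same request URL could be built with the standard library's `urllib.parse.urlencode`, which percent-encodes the values and avoids the stray `?&`. The helper name `build_explore_url` is an illustrative assumption and not part of the original notebook; the credentials are the same placeholders used above.

```python
from urllib.parse import urlencode

# endpoint taken from the notebook's own URL
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL with the parameters the notebook passes (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# placeholder credentials, coordinates of the Chippendale campus selected above
example_url = build_explore_url('XXXXX', 'XXXXX', '20180605', -33.884555, 151.197259)
print(example_url)
```

The comma in `ll` is emitted as `%2C`, which is the standard percent-encoding of a comma.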

In [15]:
from pandas import json_normalize # transform JSON into a pandas dataframe
import json # JSON files manipulation
import requests # HTTP library

In [16]:
results = requests.get(url).json()
# results

In [17]:
# get_category_type function from the Foursquare lab.
# function that extracts the category of the venue

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [18]:
# list the venues around University of Notre Dame Australia (Chippendale) campus within a radius of 500 meters

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(100)

Unnamed: 0,name,categories,lat,lng
0,The Upside Cafe,Café,-33.884170,151.197900
1,Something for Jess,Café,-33.886055,151.198728
2,Malacca Straits,Malay Restaurant,-33.883715,151.197054
3,Victoria Park Swimming Pool,Pool,-33.885721,151.194119
4,La Mamma del Gelato Anita,Ice Cream Shop,-33.884853,151.200460
5,Staves Brewery,Brewery,-33.884055,151.194281
6,Four Points by Sheraton,Hotel,-33.884644,151.198876
7,White Rabbit Gallery,Art Gallery,-33.886466,151.200146
8,Salt Meats Cheese,Italian Restaurant,-33.883217,151.194867
9,Din Tai Fung (鼎泰豐),Dumpling Restaurant,-33.884466,151.200562
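
The request asks for venues within `radius = 500` metres of the campus. One way to sanity-check a returned venue is to compute its great-circle distance from the campus with the haversine formula. The following is a minimal sketch using coordinates from the table above; `haversine_m` is a hypothetical helper, and the Earth radius is the usual mean-radius approximation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

campus = (-33.884555, 151.197259)       # University of Notre Dame Australia (Chippendale)
upside_cafe = (-33.884170, 151.197900)  # first venue in the table above

distance = haversine_m(campus[0], campus[1], upside_cafe[0], upside_cafe[1])
print('{:.0f} m from campus'.format(distance))
```

For these two points the distance comes out well under the 500 m radius.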


**Repeat the process to explore all venues around university locations in Sydney**

In [19]:
# function to repeat the same process for all campuses

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['University_Campus', 
                  'University_Campus Latitude', 
                  'University_Campus Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
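
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue returned without a category, and `["response"]['groups'][0]` would fail on an error response. Below is a sketch of a more defensive extraction step, assuming the response shape the notebook reads; `extract_venue_rows` and the sample payload are hypothetical and hand-written, not real API output.

```python
def extract_venue_rows(payload, campus, lat, lng):
    """Flatten one explore response into row tuples; tolerate missing groups or categories."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # keep the venue even when no category is attached
        category = categories[0]['name'] if categories else None
        rows.append((campus, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# hand-written payload in the shape the notebook reads (illustrative only)
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A', 'location': {'lat': -33.0, 'lng': 151.0},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': -33.1, 'lng': 151.1},
               'categories': []}},
]}]}}

rows = extract_venue_rows(sample_payload, 'Example Campus', -33.05, 151.05)
empty = extract_venue_rows({'meta': {'code': 400}}, 'Example Campus', -33.05, 151.05)
print(len(rows), len(empty))
```

Rows with a `None` category can then be dropped or kept explicitly before the one-hot encoding step, rather than aborting the whole loop.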

In [20]:
# list all venues (Foursquare API limit = 100) around all university campuses in Sydney

df_syd_unv_loc_venues = getNearbyVenues(names=df_syd_unv_loc['University_Campus'],
                                   latitudes=df_syd_unv_loc['Latitude'],
                                   longitudes=df_syd_unv_loc['Longitude']
                                  )

University of Sydney (Camden)
Western Sydney University (Campbelltown)
University of Sydney (Camperdown)
Curtin University (Chippendale)
University of Notre Dame Australia (Chippendale)
University of Tasmania (Darlinghurst)
University of Sydney (Erskineville)
Western Sydney University (Hawkesbury)
Charles Darwin University (Haymarket)
University of New South Wales (Kensington)
University of Sydney (Lidcombe)
University of Technology Sydney (Lindfield)
University of Wollongong (Liverpool)
University of Wollongong (Loftus)
Western Sydney University (Milperra)
Macquarie University (Macquarie Park)
Australian Catholic University (North Sydney)
Western Sydney University (Parramatta)
University of New England (Parramatta)
Western Sydney University (Penrith)
Western Sydney University (Quakers Hill)
University of Sydney (Rozelle)
University of Tasmania (Rozelle)
Australian Catholic University (Strathfield)
University of Sydney (Surry Hills)
CQUniversity (Sydney City)
La Trobe University (Sydne

In [21]:
# total number of venues returned (within a radius of 500 meters) by Foursquare for the provided campus locations

print(df_syd_unv_loc_venues.shape)

(1309, 7)


In [22]:
# first 50 venues out of the total

df_syd_unv_loc_venues.head(50)

Unnamed: 0,University_Campus,University_Campus Latitude,University_Campus Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,University of Sydney (Camden),-34.035282,150.655934,Agritract,-34.036515,150.655614,Construction & Landscaping
1,Western Sydney University (Campbelltown),-34.069407,150.788561,Bobbies Cafe,-34.068606,150.790588,Café
2,Western Sydney University (Campbelltown),-34.069407,150.788561,Bar Cafe,-34.069533,150.791814,Café
3,Western Sydney University (Campbelltown),-34.069407,150.788561,Joes Chinese Take Away,-34.067525,150.792038,Asian Restaurant
4,Western Sydney University (Campbelltown),-34.069407,150.788561,Thai Centrist,-34.066277,150.790436,Asian Restaurant
5,University of Sydney (Camperdown),-33.888584,151.187347,Ralph's Cafe,-33.887722,151.18623,Café
6,University of Sydney (Camperdown),-33.888584,151.187347,Manning Bar,-33.886821,151.187747,Music Venue
7,University of Sydney (Camperdown),-33.888584,151.187347,Nicholson Museum,-33.886054,151.188807,History Museum
8,University of Sydney (Camperdown),-33.888584,151.187347,Rubyos Restaurant,-33.892573,151.18716,Australian Restaurant
9,University of Sydney (Camperdown),-33.888584,151.187347,Handcraft Specialty Coffee,-33.892643,151.185726,Café


In [23]:
# number of venues returned for each University_Campus

df_syd_unv_loc_venues.groupby('University_Campus').count()


University_Campus,University_Campus Latitude,University_Campus Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Australian Catholic University (North Sydney),37,37,37,37,37,37
Australian Catholic University (Strathfield),3,3,3,3,3,3
CQUniversity (Sydney City),100,100,100,100,100,100
Charles Darwin University (Haymarket),51,51,51,51,51,51
Curtin University (Chippendale),52,52,52,52,52,52
La Trobe University (Sydney City),100,100,100,100,100,100
Macquarie University (Macquarie Park),8,8,8,8,8,8
Torrens University Australia (The Rocks),100,100,100,100,100,100
Torrens University Australia (Ultimo),49,49,49,49,49,49
University of New England (Parramatta),41,41,41,41,41,41


In [24]:
print('There are {} unique venue categories.'.format(len(df_syd_unv_loc_venues['Venue Category'].unique())))

There are 168 unique venue categories.


In [25]:
# number of venues returned for each venue category

df_syd_unv_loc_venues.groupby('Venue Category').count()

Venue Category,University_Campus,University_Campus Latitude,University_Campus Longitude,Venue,Venue Latitude,Venue Longitude
Aquarium,2,2,2,2,2,2
Arcade,1,1,1,1,1,1
Argentinian Restaurant,1,1,1,1,1,1
Art Gallery,7,7,7,7,7,7
Art Museum,1,1,1,1,1,1
Asian Restaurant,9,9,9,9,9,9
Athletics & Sports,1,1,1,1,1,1
Australian Restaurant,27,27,27,27,27,27
Austrian Restaurant,1,1,1,1,1,1
BBQ Joint,4,4,4,4,4,4


**Create "University Campus x Venue Category" matrix**

In [26]:
df_syd_unv_loc_venues_matrix_all = pd.get_dummies(df_syd_unv_loc_venues[['Venue Category']], prefix="", prefix_sep="")

# add University_Campus column back to dataframe
df_syd_unv_loc_venues_matrix_all['University_Campus'] = df_syd_unv_loc_venues['University_Campus']

# move University_Campus column to the first column
fixed_columns = [df_syd_unv_loc_venues_matrix_all.columns[-1]] + list(df_syd_unv_loc_venues_matrix_all.columns[:-1])
df_syd_unv_loc_venues_matrix_all = df_syd_unv_loc_venues_matrix_all[fixed_columns]

df_syd_unv_loc_venues_matrix_all.head()

Unnamed: 0,University_Campus,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,...,Train Station,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Water Park,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio
0,University of Sydney (Camden),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Western Sydney University (Campbelltown),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Western Sydney University (Campbelltown),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Western Sydney University (Campbelltown),0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Western Sydney University (Campbelltown),0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [27]:
# dataframe size

df_syd_unv_loc_venues_matrix_all.shape


(1309, 169)

**Analyze Each University Campus**

In [28]:
# group rows by University_Campus and take the mean of the frequency of occurrence of each category

df_syd_unv_loc_venues_matrix = df_syd_unv_loc_venues_matrix_all.groupby('University_Campus').mean().reset_index()
df_syd_unv_loc_venues_matrix

Unnamed: 0,University_Campus,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,...,Train Station,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Water Park,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio
0,Australian Catholic University (North Sydney),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.027027,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0
1,Australian Catholic University (Strathfield),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,CQUniversity (Sydney City),0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0
3,Charles Darwin University (Haymarket),0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0
4,Curtin University (Chippendale),0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.038462,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0
5,La Trobe University (Sydney City),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01
6,Macquarie University (Macquarie Park),0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Torrens University Australia (The Rocks),0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.07,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
8,Torrens University Australia (Ultimo),0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,University of New England (Parramatta),0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.04878,0.0,...,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0
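
Because each venue row is one-hot encoded, the per-campus mean of a category column is simply that category's share of the campus's venues. Australian Catholic University (North Sydney) returned 37 venues, so the 0.027027 under Train Station is 1/37 and the 0.054054 under Vietnamese Restaurant is 2/37. The sketch below reproduces that arithmetic with `collections.Counter`; `category_shares` is a hypothetical helper, and the 34 remaining venues are lumped under a placeholder label.

```python
from collections import Counter

def category_shares(categories):
    """Return {category: share of venues}; same value as the mean of a one-hot column."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}

# 37 venues for North Sydney; only the two categories discussed are named,
# the remaining 34 are placeholders and not real category names
north_sydney = ['Train Station'] + ['Vietnamese Restaurant'] * 2 + ['(other)'] * 34
shares = category_shares(north_sydney)
print(round(shares['Train Station'], 6), round(shares['Vietnamese Restaurant'], 6))
```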


In [29]:
# the top 10 most common venues for each university campus 

num_top_venues = 10

for hood in df_syd_unv_loc_venues_matrix['University_Campus']:
    print("----"+hood+"----")
    temp = df_syd_unv_loc_venues_matrix[df_syd_unv_loc_venues_matrix['University_Campus'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Australian Catholic University (North Sydney)----
                   venue  freq
0                   Café  0.30
1                    Bar  0.11
2    Japanese Restaurant  0.08
3              Juice Bar  0.05
4  Vietnamese Restaurant  0.05
5                    Pub  0.05
6                 Bakery  0.05
7            Coffee Shop  0.05
8             Restaurant  0.03
9                   Park  0.03


----Australian Catholic University (Strathfield)----
                           venue  freq
0                           Park  0.67
1                      Bookstore  0.33
2  Paper / Office Supplies Store  0.00
3                Motorcycle Shop  0.00
4                  Movie Theater  0.00
5                      Multiplex  0.00
6                         Museum  0.00
7                    Music Venue  0.00
8                      Nightclub  0.00
9                   Noodle House  0.00


----CQUniversity (Sydney City)----
              venue  freq
0       Coffee Shop  0.10
1              Café  0.09
2     

**Load data into pandas dataframe and list the top 10 venues for each university campus**

In [30]:
# function to sort the venues in descending order

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [31]:
# list the top 10 venues for each university campus

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues

columns = ['University_Campus']

for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
university_campus_venues_sorted = pd.DataFrame(columns=columns)
university_campus_venues_sorted['University_Campus'] = df_syd_unv_loc_venues_matrix['University_Campus']

for ind in np.arange(df_syd_unv_loc_venues_matrix.shape[0]):
    university_campus_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_syd_unv_loc_venues_matrix.iloc[ind, :], num_top_venues)

university_campus_venues_sorted.head(35)

Unnamed: 0,University_Campus,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Australian Catholic University (North Sydney),Café,Bar,Japanese Restaurant,Coffee Shop,Vietnamese Restaurant,Pub,Juice Bar,Bakery,Sandwich Place,Restaurant
1,Australian Catholic University (Strathfield),Park,Bookstore,Yoga Studio,Electronics Store,Food & Drink Shop,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
2,CQUniversity (Sydney City),Coffee Shop,Café,Hotel,Cocktail Bar,Bar,Thai Restaurant,Shopping Mall,Bookstore,Ramen Restaurant,Japanese Restaurant
3,Charles Darwin University (Haymarket),Thai Restaurant,Café,Hostel,Ice Cream Shop,Coffee Shop,Dessert Shop,Bakery,Hotel,Hotpot Restaurant,Indonesian Restaurant
4,Curtin University (Chippendale),Café,Thai Restaurant,Bar,Bakery,Hotel,Australian Restaurant,Coffee Shop,Hostel,Wine Bar,Ice Cream Shop
5,La Trobe University (Sydney City),Café,Thai Restaurant,Hotel,Japanese Restaurant,Coffee Shop,Korean Restaurant,Chinese Restaurant,Breakfast Spot,Cocktail Bar,Greek Restaurant
6,Macquarie University (Macquarie Park),Café,Hotel,Art Gallery,College Cafeteria,Gym,Trail,Juice Bar,Event Space,Food & Drink Shop,Flower Shop
7,Torrens University Australia (The Rocks),Café,Hotel,Australian Restaurant,Pub,Italian Restaurant,Cocktail Bar,Park,Brewery,Museum,Hotel Bar
8,Torrens University Australia (Ultimo),Café,Coffee Shop,Bar,Hotel,Dumpling Restaurant,Gym,Food Court,Shopping Mall,Lounge,Malay Restaurant
9,University of New England (Parramatta),Thai Restaurant,Coffee Shop,Australian Restaurant,Café,Lebanese Restaurant,Dessert Shop,Gym,Ice Cream Shop,Miscellaneous Shop,Sandwich Place
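
Campuses with few venues have their top-10 list padded with categories whose frequency is 0. For Australian Catholic University (Strathfield), only Park (0.67) and Bookstore (0.33) are non-zero, so entries such as Yoga Studio or Electronics Store reflect the sort order among ties, not actual venues. Here is a sketch of a variant that drops zero-frequency categories, written on a plain dict for clarity; `top_nonzero_categories` is a hypothetical helper and the dict holds only a few of the categories.

```python
def top_nonzero_categories(frequencies, num_top_venues=10):
    """Return up to num_top_venues category names with frequency > 0, highest first."""
    nonzero = [(name, freq) for name, freq in frequencies.items() if freq > 0]
    nonzero.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in nonzero[:num_top_venues]]

# Strathfield frequencies as printed by the notebook; every other category is 0
strathfield_freq = {
    'Yoga Studio': 0.0,
    'Park': 0.67,
    'Electronics Store': 0.0,
    'Bookstore': 0.33,
}
top = top_nonzero_categories(strathfield_freq)
print(top)
```

With this variant a sparse campus yields a shorter list, which makes it clear how little venue data the cluster assignment rests on.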


**k-means clustering**

In [32]:
# import k-means for the clustering stage
from sklearn.cluster import KMeans # clustering algorithm

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [33]:
# set number of clusters
kclusters = 4

df_syd_unv_loc_venues_matrix_clustering = df_syd_unv_loc_venues_matrix.drop(columns='University_Campus')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_syd_unv_loc_venues_matrix_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]


array([3, 1, 3, 3, 3, 3, 3, 3, 3, 3])

In [34]:

# add clustering labels
university_campus_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

df_syd_unv_loc_merged = df_syd_unv_loc

# join the sorted venue table onto the campus data so each campus keeps its latitude/longitude
df_syd_unv_loc_merged = df_syd_unv_loc_merged.join(university_campus_venues_sorted.set_index('University_Campus'), on='University_Campus')

df_syd_unv_loc_merged.head(35) # check the last columns!


Unnamed: 0,Campus_Type,University_Campus,Post_Code,Suburb,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Satellite,University of Sydney (Camden),2570,Camden Sydney NSW Australia,-34.035282,150.655934,2,Construction & Landscaping,Electronics Store,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Event Space
1,Satellite,Western Sydney University (Campbelltown),2560,Campbelltown Sydney NSW Australia,-34.069407,150.788561,0,Asian Restaurant,Café,Yoga Studio,Event Space,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant
2,Main,University of Sydney (Camperdown),2006,Camperdown Sydney NSW Australia,-33.888584,151.187347,3,Café,Coffee Shop,Music Venue,Beer Garden,Beer Bar,Bus Line,Thai Restaurant,College Rec Center,Sports Bar,Middle Eastern Restaurant
3,Satellite,Curtin University (Chippendale),2007,Chippendale Sydney NSW Australia,-33.885306,151.202449,3,Café,Thai Restaurant,Bar,Bakery,Hotel,Australian Restaurant,Coffee Shop,Hostel,Wine Bar,Ice Cream Shop
4,Satellite,University of Notre Dame Australia (Chippendale),2007,Chippendale Sydney NSW Australia,-33.884555,151.197259,3,Café,Bar,Thai Restaurant,Coffee Shop,Burger Joint,Dumpling Restaurant,Pub,Chinese Restaurant,Supermarket,Italian Restaurant
5,Satellite,University of Tasmania (Darlinghurst),2010,Darlinghurst Sydney NSW Australia,-33.880336,151.2221,3,Café,Italian Restaurant,Thai Restaurant,Pub,Bar,Pizza Place,Burger Joint,Boutique,Bookstore,Indian Restaurant
6,Satellite,University of Sydney (Erskineville),2043,Erskineville Sydney NSW Australia,-33.895939,151.184601,3,Thai Restaurant,Café,Bar,Pub,Pizza Place,Japanese Restaurant,Bookstore,Coffee Shop,Italian Restaurant,Boutique
7,Satellite,Western Sydney University (Hawkesbury),2753,Hawkesbury Richmond NSW Australia,-33.614059,150.75023,0,Football Stadium,Café,Event Space,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
8,Satellite,Charles Darwin University (Haymarket),2000,Haymarket Sydney NSW Australia,-33.882972,151.203485,3,Thai Restaurant,Café,Hostel,Ice Cream Shop,Coffee Shop,Dessert Shop,Bakery,Hotel,Hotpot Restaurant,Indonesian Restaurant
9,Main,University of New South Wales (Kensington),2033,Kensington Sydney NSW Australia,-33.917015,151.225182,3,Fast Food Restaurant,Café,Italian Restaurant,Indonesian Restaurant,Sandwich Place,Pub,Bar,Japanese Restaurant,General College & University,Grocery Store


### **Results:** <a name="Results"></a>

**Visualize the resulting clusters**

In [35]:
# Matplotlib and associated plotting modules

import matplotlib.cm as cm
import matplotlib.colors as colors

In [36]:

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(df_syd_unv_loc_merged['Latitude'], df_syd_unv_loc_merged['Longitude'], df_syd_unv_loc_merged['University_Campus'], df_syd_unv_loc_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster-3],
        fill=True,
        fill_color=rainbow[cluster-3],
        fill_opacity=1).add_to(map_clusters)
       
map_clusters


**Examine Clusters**

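The following tables list the campuses in each cluster together with their 10 most common venue categories (based on k = 4). Cluster n below corresponds to cluster label n - 1 in the data.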


#### Cluster 1

In [37]:
df_syd_unv_loc_merged.loc[df_syd_unv_loc_merged['Cluster Labels'] == 0, df_syd_unv_loc_merged.columns[[1] + list(range(5, df_syd_unv_loc_merged.shape[1]))]]

Unnamed: 0,University_Campus,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Western Sydney University (Campbelltown),150.788561,0,Asian Restaurant,Café,Yoga Studio,Event Space,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant
7,Western Sydney University (Hawkesbury),150.75023,0,Football Stadium,Café,Event Space,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
20,Western Sydney University (Quakers Hill),150.876917,0,Café,Yoga Studio,Event Space,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


#### Cluster 2

In [38]:
df_syd_unv_loc_merged.loc[df_syd_unv_loc_merged['Cluster Labels'] == 1, df_syd_unv_loc_merged.columns[[1] + list(range(5, df_syd_unv_loc_merged.shape[1]))]]

Unnamed: 0,University_Campus,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,University of Sydney (Lidcombe),151.048948,1,Park,Construction & Landscaping,Dumpling Restaurant,Food & Drink Shop,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Event Space
23,Australian Catholic University (Strathfield),151.076798,1,Park,Bookstore,Yoga Studio,Electronics Store,Food & Drink Shop,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


#### Cluster 3

In [39]:
df_syd_unv_loc_merged.loc[df_syd_unv_loc_merged['Cluster Labels'] == 2, df_syd_unv_loc_merged.columns[[1] + list(range(5, df_syd_unv_loc_merged.shape[1]))]]

Unnamed: 0,University_Campus,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,University of Sydney (Camden),150.655934,2,Construction & Landscaping,Electronics Store,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Event Space


#### Cluster 4

In [40]:
df_syd_unv_loc_merged.loc[df_syd_unv_loc_merged['Cluster Labels'] == 3, df_syd_unv_loc_merged.columns[[1] + list(range(5, df_syd_unv_loc_merged.shape[1]))]]

Unnamed: 0,University_Campus,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,University of Sydney (Camperdown),151.187347,3,Café,Coffee Shop,Music Venue,Beer Garden,Beer Bar,Bus Line,Thai Restaurant,College Rec Center,Sports Bar,Middle Eastern Restaurant
3,Curtin University (Chippendale),151.202449,3,Café,Thai Restaurant,Bar,Bakery,Hotel,Australian Restaurant,Coffee Shop,Hostel,Wine Bar,Ice Cream Shop
4,University of Notre Dame Australia (Chippendale),151.197259,3,Café,Bar,Thai Restaurant,Coffee Shop,Burger Joint,Dumpling Restaurant,Pub,Chinese Restaurant,Supermarket,Italian Restaurant
5,University of Tasmania (Darlinghurst),151.2221,3,Café,Italian Restaurant,Thai Restaurant,Pub,Bar,Pizza Place,Burger Joint,Boutique,Bookstore,Indian Restaurant
6,University of Sydney (Erskineville),151.184601,3,Thai Restaurant,Café,Bar,Pub,Pizza Place,Japanese Restaurant,Bookstore,Coffee Shop,Italian Restaurant,Boutique
8,Charles Darwin University (Haymarket),151.203485,3,Thai Restaurant,Café,Hostel,Ice Cream Shop,Coffee Shop,Dessert Shop,Bakery,Hotel,Hotpot Restaurant,Indonesian Restaurant
9,University of New South Wales (Kensington),151.225182,3,Fast Food Restaurant,Café,Italian Restaurant,Indonesian Restaurant,Sandwich Place,Pub,Bar,Japanese Restaurant,General College & University,Grocery Store
11,University of Technology Sydney (Lindfield),151.161041,3,Bus Line,Park,Soccer Field,Yoga Studio,Electronics Store,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
12,University of Wollongong (Liverpool),150.925338,3,Coffee Shop,Fast Food Restaurant,Café,Shopping Mall,Supermarket,Burrito Place,Lebanese Restaurant,Big Box Store,Electronics Store,Sandwich Place
13,University of Wollongong (Loftus),151.052926,3,Athletics & Sports,Train Station,Gym,Basketball Court,Yoga Studio,Electronics Store,Food & Drink Shop,Flower Shop,Flea Market,Fish & Chips Shop


### **Conclusion:** <a name="Conclusion"></a>  

Combining the Sydney university location data with Foursquare venue category data (up to 100 venues within a 500-meter radius of each campus) and applying k-means clustering (with k = 4 *) showed that:

- 26 out of 32 campus locations in and around Sydney have similar characteristics based on Foursquare venue categories. Here cafés and coffee shops are, by far, the most common venue categories, followed by restaurants (mainly Thai, with some Italian and Japanese), fast-food/sandwich stores, pubs/bars and gyms. These campuses are mainly within the CBD area and in some congested suburbs outside the CBD, such as Chatswood and North Ryde in the north, Sutherland in the south, Liverpool in the south-west, and Penrith in the west.
- 2 inner-city campuses (University of Sydney in Lidcombe and Australian Catholic University in Strathfield) fall in the same cluster. Interestingly, the two most common venue categories around both locations are not food related. These suburbs are within a few kilometres of each other and, due to new construction projects (residential units, railway and highway upgrades), population density will increase in the coming years, in Strathfield in particular. This may present a good opportunity for a fast-food restaurant business.
- 3 campuses of Western Sydney University (Campbelltown, Hawkesbury and Quakers Hill) and 1 campus of the University of Sydney (Camden) are located on the outskirts of the city; Camden in particular is considered a remote area (rural NSW). All of these locations currently have low population density, so in the short to mid term they may not present good investment opportunities.

*The k-means clustering was repeated separately for k = 3, 4, 5, 6, 7, 8, 9 and 10. After reviewing the results, k = 4 was judged to give the most useful clusters: even with higher k values, the majority of locations remain grouped in the same cluster.
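
One way to make the comparison across k values concrete is to report, for each run, how many campuses fall in the largest cluster. The sketch below uses the k = 4 cluster sizes shown in the Results section (3, 2, 1 and 26 campuses). `largest_cluster_share` is a hypothetical helper; in practice the labels would come from `kmeans.labels_` for each value of k.

```python
from collections import Counter

def largest_cluster_share(labels):
    """Return (size of the largest cluster, its share of all labelled campuses)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return largest, largest / len(labels)

# labels rebuilt from the k = 4 cluster sizes reported above,
# not the original label order from kmeans.labels_
labels_k4 = [0] * 3 + [1] * 2 + [2] * 1 + [3] * 26
size, share = largest_cluster_share(labels_k4)
print('{} of {} campuses ({:.0%}) share one cluster'.format(size, len(labels_k4), share))
```

Repeating this for each k would show directly whether a higher k actually splits the dominant cluster or only peels off single campuses.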
