# Final Coursera Project - Melbourne Restaurant Analysis

### The following notebook contains my final project and report.

#### Introduction/Business Problem

I am thinking of opeining an Italian restaurant in Melbourne but am unsure where to open it. It would be best to be outside of suburbs with a lot of existing Italian restaurants but also not too far away that people would not think to look for an Italian restaurant in that location. My business problem is to find a suburb adjacent to suburbs with a strong Italian restaurant presence.

#### Data Requirements

This report will analyse the ditribution of restaurant types in Melbourne suburbs using foursquare data (venues/explore). I will find a list of Melbourne suburbs online and scrape the data. I will then use geopy.geocoders to find the latitude and longitude. I will find the number of Italian restaurants by suburb, and also the most common restaurant type by suburb and visualize the results using folium maps.

I have not completed the week 5 project yet so please disregard the code below for now. :-)

Firstly, import the required libraries

In [209]:
import pandas as pd
import numpy as np

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Reading in Melbourne surbub data and finding lat long with geocode.

In [210]:
#Defining Url path
url = "https://en.wikipedia.org/wiki/List_of_Melbourne_suburbs"

In [211]:
#Use pandas to read html
url_df = pd.read_html(url, header=0)

In [212]:
#Pull the table of suburbs from the html
df = url_df[0]
df.head()

Unnamed: 0,Suburb,Postcode,Local government area,Location[citation needed],Distance[3][citation needed],Area[citation needed],Population[citation needed],Population density[citation needed],Date established[citation needed]
0,Bellfield,3081,City of Banyule,,,0.9 km2,"1,793[4]",,
1,Briar Hill,3088,City of Banyule,,,,"3,152[4]",,
2,Bundoora,3083,City of Banyule; City of Darebin; City of Whit...,,,15 km2,28653,,
3,Eaglemont,3084,City of Banyule,,,1.9 km2,3873,,
4,Eltham,3095,City of Banyule; Shire of Nillumbik,,,,,,


In [140]:
#Format the address for geocode search
df['Address'] = df['Suburb'] + ', VIC ' + df['Postcode'].apply(str) + ", Australia"

In [141]:
#Creat a geolocator
geolocator = Nominatim(user_agent="specify_your_app_name_here")

In [142]:
#Create function to pull lat and long from location
def geoConvLat(address):
    a = geolocator.geocode(address, timeout=15)
    if a != None:
        return a.latitude
    else: return 0
def geoConvLong(address):
    a = geolocator.geocode(address, timeout=15)
    if a != None:
        return a.longitude
    else: return 0

In [61]:
#Find lat
#lats = []
#for i in df['Address']:
#    lats.append(geoConvLat(i))

In [62]:
#Find long
#longs = []
#for i in df['Address']:
#    longs.append(geoConvLong(i))

In [63]:
#Create lat and long columns in the df
#lats = pd.Series(lats)
#longs = pd.Series(longs)
#df['Latitude'] = lats
#df['Longitude'] = longs

In [64]:
#save the df (lat and long took a long time)
#df.to_csv('data/suburbs_and_coords.csv')

# Load Suburbs with Coordinates

In [213]:
#Read in suburbs and coords. Checkpoint
df = pd.read_csv('data/suburbs_and_coords.csv')
df.shape

(549, 13)

In [214]:
#drop irrelevant fields
drop_columns = ['Population[citation needed]', 'Area[citation needed]', 'Population density[citation needed]', 
                'Unnamed: 0', 'Location[citation needed]', 'Distance[3][citation needed]', 'Date established[citation needed]',
               'Address']
df = df.drop(drop_columns, axis=1)

In [215]:
#check the df
df.head()

Unnamed: 0,Suburb,Postcode,Local government area,Latitude,Longitude
0,Bellfield,3081,City of Banyule,-37.75,145.04
1,Briar Hill,3088,City of Banyule,-37.71,145.12
2,Bundoora,3083,City of Banyule; City of Darebin; City of Whit...,-37.7,145.07
3,Eaglemont,3084,City of Banyule,-37.77,145.06
4,Eltham,3095,City of Banyule; Shire of Nillumbik,-37.71,145.15


In [216]:
#Plot them on a folium map
import folium

# create map using latitude and longitude values
map_melbourne = folium.Map(location=(-37.9136, 144.9631), zoom_start=9)

# add markers to map
for lat, lng, suburb, lga in zip(df['Latitude'], df['Longitude'], df['Suburb'], df['Local government area']):
    label = '{}, {}'.format(suburb, lga)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='red',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.5,
        parse_html=False).add_to(map_melbourne)  
    
map_melbourne

## Foursquare Section

In [12]:
#Define my details for a Foursquare query

CLIENT_ID = 'LH2UYK41GOP20PJ0EMGOGJJZTNDBZIRGQIRZCOUU1JLRGEML' # your Foursquare ID
CLIENT_SECRET = 'AIP5GKY0EUPAIZNTNEOD3XKQ34YBHQ151HI4XBLURWPVXKFM' # your Foursquare Secret
VERSION = '20190701' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: LH2UYK41GOP20PJ0EMGOGJJZTNDBZIRGQIRZCOUU1JLRGEML
CLIENT_SECRET:AIP5GKY0EUPAIZNTNEOD3XKQ34YBHQ151HI4XBLURWPVXKFM


In [13]:
#Define the details of the query itself

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius
lat, long = (-37.9136, 144.9631)
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    lat, 
    long, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=LH2UYK41GOP20PJ0EMGOGJJZTNDBZIRGQIRZCOUU1JLRGEML&client_secret=AIP5GKY0EUPAIZNTNEOD3XKQ34YBHQ151HI4XBLURWPVXKFM&v=20190701&ll=-37.9136,144.9631&radius=1000&limit=100'

In [14]:
#Using function from Coursera labs to get nearby venues for each suburbs

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            long, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb', 
                  'Suburb Latitude', 
                  'Suburb Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [15]:
#Getting nearby venues for each suburb

melbourne_venues = getNearbyVenues(names=df['Suburb'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

In [16]:
melbourne_venues.groupby('Suburb')['Suburb'].count()
print("Done")

Done


In [17]:
print('There are {} uniques categories.'.format(len(melbourne_venues['Venue Category'].unique())))

There are 215 uniques categories.


In [18]:
# one hot encoding
onehot = pd.get_dummies(melbourne_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
onehot['Suburb'] = melbourne_venues['Suburb'] 

# move neighborhood column to the first column
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1])
onehot = onehot[fixed_columns]

onehot.head()

Unnamed: 0,Suburb,Adult Boutique,Afghan Restaurant,Antique Shop,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,...,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yunnan Restaurant
0,Bellfield,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Bellfield,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Bellfield,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Bellfield,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bellfield,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [19]:
onehot.shape

(8037, 216)

In [20]:
grouped = onehot.groupby('Suburb').sum().reset_index()
grouped.head()

Unnamed: 0,Suburb,Adult Boutique,Afghan Restaurant,Antique Shop,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,...,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yunnan Restaurant
0,Abbotsford,0,0,0,0,0,0,0,1,0,...,0,1,1,0,1,0,0,0,0,1
1,Aberfeldie,0,1,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
2,Aintree,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Airport West,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Albanvale,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,1,0,0


In [21]:
grouped.shape

(307, 216)

In [22]:
grouped.head()

Unnamed: 0,Suburb,Adult Boutique,Afghan Restaurant,Antique Shop,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,...,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yunnan Restaurant
0,Abbotsford,0,0,0,0,0,0,0,1,0,...,0,1,1,0,1,0,0,0,0,1
1,Aberfeldie,0,1,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
2,Aintree,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Airport West,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Albanvale,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,1,0,0


In [39]:
merged = df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
merged = merged.join(grouped.set_index('Suburb'), on='Suburb')

merged.tail()

Unnamed: 0,Suburb,Postcode,Local government area,Latitude,Longitude,Adult Boutique,Afghan Restaurant,Antique Shop,Aquarium,Arcade,...,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yunnan Restaurant
544,Woori Yallock,3139,Shire of Yarra Ranges,-37.779333,145.530126,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0
545,Yarra Glen,3775,Shire of Yarra Ranges,-37.657348,145.374396,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
546,Yarra Junction,3797,Shire of Yarra Ranges,-37.782169,145.615026,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
547,Yellingbo,3139,Shire of Yarra Ranges,-37.813513,145.508205,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0
548,Yering,3770,Shire of Yarra Ranges,-37.688713,145.374657,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [40]:
merged = merged.fillna(0)

In [41]:
merged.to_csv('data/mergedData.csv', index=None)

# Load Suburbs Merged with Venues

In [217]:
merged= pd.read_csv('data/mergedData.csv')
keepSuburbs = pd.read_csv('data/keepSuburbs.csv')
keepVenue = pd.read_csv('data/venues.csv')

In [218]:
keepList = []
for i in keepSuburbs['Suburb']:
    keepList.append(i)

keepList[:2]

['Beaumaris', 'Black Rock']

In [219]:
merged.head()

Unnamed: 0,Suburb,Postcode,Local government area,Latitude,Longitude,Adult Boutique,Afghan Restaurant,Antique Shop,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Court,Beach,Beer Garden,Bike Rental / Bike Share,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Station,Business Service,Café,Camera Store,Candy Store,Cantonese Restaurant,Casino,Cemetery,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Quad,Comedy Club,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,History Museum,Hobby Shop,Home Service,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Meatball Place,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Kitchen Supply Store,Korean Restaurant,Lake,Lebanese Restaurant,Library,Light Rail Station,Liquor Store,Lounge,Luggage Store,Malay Restaurant,Market,Massage Studio,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Music Store,Music Venue,Nail Salon,Night Market,Nightclub,Office,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Sake Bar,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Snack Place,Soccer Field,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Track,Train Station,Tram Station,Travel & Transport,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yunnan Restaurant
0,Bellfield,3081,City of Banyule,-37.75,145.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Briar Hill,3088,City of Banyule,-37.71,145.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bundoora,3083,City of Banyule; City of Darebin; City of Whit...,-37.7,145.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Eaglemont,3084,City of Banyule,-37.77,145.06,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,4.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0
4,Eltham,3095,City of Banyule; Shire of Nillumbik,-37.71,145.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [220]:
keepVenues = ['Suburb', 'Postcode', 'Local government area', 'Latitude', 'Longitude']
for i in keepVenue['Venue_Type']:
    keepVenues.append(i)

keepVenues[:2]

['Suburb', 'Postcode']

In [221]:
merged = merged[keepVenues]
merged = merged[merged['Suburb'].isin(keepList)]
merged = merged[merged['Suburb'].isin(keepList)]

In [222]:
merged['Italian Restaurant'].unique()

array([ 0.,  1.,  3.,  2., 14.,  5.,  4., 10.])

In [223]:
merged.shape

(78, 49)

In [224]:
house_prices = pd.read_excel('data/melbourneSalesPrices.xlsx', index=None)

In [225]:
house_prices.shape

(48427, 2)

In [226]:
houses_grouped = house_prices.groupby('Suburb').mean().reset_index()
houses_grouped.sort_values(by='Price')[:5]

Unnamed: 0,Suburb,Price
108,Darley,380000.0
204,Kurunjang,381785.71
224,Melton South,396267.5
90,Cockatoo,397000.0
240,Mount Dandenong,405000.0


In [227]:
joined_df = merged.join(houses_grouped.set_index('Suburb'), on='Suburb', how='inner')
joined_df.head()

Unnamed: 0,Suburb,Postcode,Local government area,Latitude,Longitude,Middle Eastern Restaurant,Mediterranean Restaurant,Indian Restaurant,Japanese Restaurant,Portuguese Restaurant,Fast Food Restaurant,Lebanese Restaurant,Mexican Restaurant,Falafel Restaurant,Vietnamese Restaurant,Thai Restaurant,Turkish Restaurant,Afghan Restaurant,Eastern European Restaurant,Dumpling Restaurant,Sushi Restaurant,Kebab Restaurant,Italian Restaurant,Indonesian Restaurant,Korean Restaurant,Ramen Restaurant,Dim Sum Restaurant,Chinese Restaurant,Malay Restaurant,Brazilian Restaurant,Vegetarian / Vegan Restaurant,Scandinavian Restaurant,Greek Restaurant,Asian Restaurant,Australian Restaurant,Molecular Gastronomy Restaurant,Seafood Restaurant,French Restaurant,Argentinian Restaurant,Tapas Restaurant,Modern European Restaurant,Yunnan Restaurant,Persian Restaurant,Egyptian Restaurant,Theme Restaurant,Gluten-free Restaurant,Cantonese Restaurant,South American Restaurant,Shanghai Restaurant,Price
21,Beaumaris,3193,City of Bayside,-37.98,145.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1514326.53
22,Black Rock,3193,City of Bayside,-37.97,145.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1541980.44
23,Brighton,3186,City of Bayside,-37.91,145.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2027741.78
24,Brighton East,3187,City of Bayside,-37.92,145.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1686790.44
25,Cheltenham,3192,City of Bayside; City of Kingston,-37.96,145.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,969127.39


In [228]:
# set number of clusters
kclusters = 5

merged_clustering = joined_df.drop(['Suburb', 'Postcode', 'Local government area', 'Latitude', 'Longitude'], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=42, n_init=100).fit(merged_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([3, 3, 0, 3, 1, 1, 1, 3, 1, 1, 1, 3, 3, 0, 3, 3, 0, 0, 3, 2, 2, 3,
       3, 2, 3, 2, 1, 4, 1, 1, 2, 4, 2, 4, 2, 1, 1, 2, 2, 4, 2, 4, 4, 4,
       2, 2, 4, 1, 1, 0, 4, 1, 0, 4, 4, 2, 3, 3, 0, 3, 1, 3, 1, 1, 1, 2,
       4, 1, 2, 2, 3, 1])

In [229]:
joined_df.insert(column ='Cluster', value = kmeans.labels_, loc = 1)
joined_df.head()

Unnamed: 0,Suburb,Cluster,Postcode,Local government area,Latitude,Longitude,Middle Eastern Restaurant,Mediterranean Restaurant,Indian Restaurant,Japanese Restaurant,Portuguese Restaurant,Fast Food Restaurant,Lebanese Restaurant,Mexican Restaurant,Falafel Restaurant,Vietnamese Restaurant,Thai Restaurant,Turkish Restaurant,Afghan Restaurant,Eastern European Restaurant,Dumpling Restaurant,Sushi Restaurant,Kebab Restaurant,Italian Restaurant,Indonesian Restaurant,Korean Restaurant,Ramen Restaurant,Dim Sum Restaurant,Chinese Restaurant,Malay Restaurant,Brazilian Restaurant,Vegetarian / Vegan Restaurant,Scandinavian Restaurant,Greek Restaurant,Asian Restaurant,Australian Restaurant,Molecular Gastronomy Restaurant,Seafood Restaurant,French Restaurant,Argentinian Restaurant,Tapas Restaurant,Modern European Restaurant,Yunnan Restaurant,Persian Restaurant,Egyptian Restaurant,Theme Restaurant,Gluten-free Restaurant,Cantonese Restaurant,South American Restaurant,Shanghai Restaurant,Price
21,Beaumaris,3,3193,City of Bayside,-37.98,145.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1514326.53
22,Black Rock,3,3193,City of Bayside,-37.97,145.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1541980.44
23,Brighton,0,3186,City of Bayside,-37.91,145.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2027741.78
24,Brighton East,3,3187,City of Bayside,-37.92,145.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1686790.44
25,Cheltenham,1,3192,City of Bayside; City of Kingston,-37.96,145.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,969127.39


In [230]:
# create map
map_clusters = folium.Map(location=(-37.9136, 144.9631), zoom_start=9)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
import math
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, n, p in zip(joined_df['Latitude'], joined_df['Longitude'], joined_df['Suburb'], 
                                     joined_df['Cluster'], joined_df['Italian Restaurant'], joined_df['Price']):
    label = folium.Popup(str(poi) + ", Cluster: " + str(cluster) + ", Price: " + str(p//1) + 
                         ", Italian Restuarants: " + str(int(n)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=6,
        popup=label,
        color=rainbow[cluster - 1],
        fill=True,
        fill_color=rainbow[cluster - 1],
        fill_opacity=1
    ).add_to(map_clusters)
    
       
map_clusters

In [231]:
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.float_format', lambda x: '%.2f' % x)

In [232]:
joined_df.loc[joined_df['Cluster'] == 0].describe()

Unnamed: 0,Cluster,Postcode,Latitude,Longitude,Middle Eastern Restaurant,Mediterranean Restaurant,Indian Restaurant,Japanese Restaurant,Portuguese Restaurant,Fast Food Restaurant,Lebanese Restaurant,Mexican Restaurant,Falafel Restaurant,Vietnamese Restaurant,Thai Restaurant,Turkish Restaurant,Afghan Restaurant,Eastern European Restaurant,Dumpling Restaurant,Sushi Restaurant,Kebab Restaurant,Italian Restaurant,Indonesian Restaurant,Korean Restaurant,Ramen Restaurant,Dim Sum Restaurant,Chinese Restaurant,Malay Restaurant,Brazilian Restaurant,Vegetarian / Vegan Restaurant,Scandinavian Restaurant,Greek Restaurant,Asian Restaurant,Australian Restaurant,Molecular Gastronomy Restaurant,Seafood Restaurant,French Restaurant,Argentinian Restaurant,Tapas Restaurant,Modern European Restaurant,Yunnan Restaurant,Persian Restaurant,Egyptian Restaurant,Theme Restaurant,Gluten-free Restaurant,Cantonese Restaurant,South American Restaurant,Shanghai Restaurant,Price
count,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0
mean,0.0,3153.43,-37.84,145.03,0.0,0.0,0.14,0.86,0.29,0.57,0.0,0.71,0.0,0.0,0.29,0.0,0.0,0.0,0.29,0.57,0.0,0.86,0.57,1.57,0.14,0.29,0.57,0.14,0.14,0.29,0.29,0.57,0.14,0.43,0.14,0.29,0.29,0.14,0.14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2104943.87
std,0.0,45.67,0.03,0.05,0.0,0.0,0.38,1.07,0.76,0.98,0.0,1.25,0.0,0.0,0.49,0.0,0.0,0.0,0.76,0.79,0.0,1.21,0.98,2.82,0.38,0.49,0.98,0.38,0.38,0.49,0.49,1.13,0.38,1.13,0.38,0.49,0.76,0.38,0.38,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,166557.72
min,0.0,3103.0,-37.91,144.96,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1891813.71
25%,0.0,3114.5,-37.85,144.98,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1993759.56
50%,0.0,3144.0,-37.85,145.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2055545.22
75%,0.0,3196.0,-37.82,145.07,0.0,0.0,0.0,2.0,0.0,1.0,0.0,1.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,1.0,0.0,1.5,1.0,2.0,0.0,0.5,1.0,0.0,0.0,0.5,0.5,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2219961.75
max,0.0,3206.0,-37.81,145.08,0.0,0.0,1.0,2.0,2.0,2.0,0.0,3.0,0.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,0.0,3.0,2.0,7.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,3.0,1.0,3.0,1.0,1.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2359805.56


In [233]:
joined_df.loc[joined_df['Cluster'] == 1].describe()

Unnamed: 0,Cluster,Postcode,Latitude,Longitude,Middle Eastern Restaurant,Mediterranean Restaurant,Indian Restaurant,Japanese Restaurant,Portuguese Restaurant,Fast Food Restaurant,Lebanese Restaurant,Mexican Restaurant,Falafel Restaurant,Vietnamese Restaurant,Thai Restaurant,Turkish Restaurant,Afghan Restaurant,Eastern European Restaurant,Dumpling Restaurant,Sushi Restaurant,Kebab Restaurant,Italian Restaurant,Indonesian Restaurant,Korean Restaurant,Ramen Restaurant,Dim Sum Restaurant,Chinese Restaurant,Malay Restaurant,Brazilian Restaurant,Vegetarian / Vegan Restaurant,Scandinavian Restaurant,Greek Restaurant,Asian Restaurant,Australian Restaurant,Molecular Gastronomy Restaurant,Seafood Restaurant,French Restaurant,Argentinian Restaurant,Tapas Restaurant,Modern European Restaurant,Yunnan Restaurant,Persian Restaurant,Egyptian Restaurant,Theme Restaurant,Gluten-free Restaurant,Cantonese Restaurant,South American Restaurant,Shanghai Restaurant,Price
count,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0
mean,1.0,3154.35,-37.88,145.02,0.1,0.0,0.1,0.4,0.1,0.1,0.0,0.25,0.0,0.1,0.4,0.05,0.0,0.0,0.05,0.15,0.1,0.8,0.15,0.5,0.05,0.05,0.15,0.4,0.0,0.15,0.05,0.1,0.1,0.2,0.05,0.15,0.05,0.0,0.0,0.1,0.05,0.05,0.05,0.05,0.0,0.05,0.0,0.0,1067327.42
std,0.0,49.22,0.05,0.04,0.31,0.0,0.31,0.99,0.31,0.45,0.0,0.72,0.0,0.31,0.94,0.22,0.0,0.0,0.22,0.49,0.31,2.44,0.49,1.47,0.22,0.22,0.49,1.05,0.0,0.37,0.22,0.45,0.31,0.89,0.22,0.49,0.22,0.0,0.0,0.31,0.22,0.22,0.22,0.22,0.0,0.22,0.0,0.0,56671.91
min,1.0,3003.0,-37.96,144.92,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,958154.49
25%,1.0,3136.0,-37.91,145.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1039088.03
50%,1.0,3173.0,-37.88,145.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1074450.17
75%,1.0,3185.75,-37.84,145.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1095591.03
max,1.0,3204.0,-37.8,145.07,1.0,0.0,1.0,3.0,1.0,2.0,0.0,3.0,0.0,1.0,4.0,1.0,0.0,0.0,1.0,2.0,1.0,10.0,2.0,6.0,1.0,1.0,2.0,4.0,0.0,1.0,1.0,2.0,1.0,4.0,1.0,2.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,1191875.62


In [234]:
joined_df.loc[joined_df['Cluster'] == 2].describe()

Unnamed: 0,Cluster,Postcode,Latitude,Longitude,Middle Eastern Restaurant,Mediterranean Restaurant,Indian Restaurant,Japanese Restaurant,Portuguese Restaurant,Fast Food Restaurant,Lebanese Restaurant,Mexican Restaurant,Falafel Restaurant,Vietnamese Restaurant,Thai Restaurant,Turkish Restaurant,Afghan Restaurant,Eastern European Restaurant,Dumpling Restaurant,Sushi Restaurant,Kebab Restaurant,Italian Restaurant,Indonesian Restaurant,Korean Restaurant,Ramen Restaurant,Dim Sum Restaurant,Chinese Restaurant,Malay Restaurant,Brazilian Restaurant,Vegetarian / Vegan Restaurant,Scandinavian Restaurant,Greek Restaurant,Asian Restaurant,Australian Restaurant,Molecular Gastronomy Restaurant,Seafood Restaurant,French Restaurant,Argentinian Restaurant,Tapas Restaurant,Modern European Restaurant,Yunnan Restaurant,Persian Restaurant,Egyptian Restaurant,Theme Restaurant,Gluten-free Restaurant,Cantonese Restaurant,South American Restaurant,Shanghai Restaurant,Price
count,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0,16.0
mean,2.0,3118.81,-37.83,145.0,0.06,0.0,0.06,0.5,0.06,0.06,0.0,0.5,0.0,0.12,0.69,0.0,0.0,0.0,0.19,0.56,0.12,1.12,0.12,0.31,0.12,0.06,0.12,0.25,0.12,0.38,0.06,0.06,0.44,0.38,0.06,0.12,0.12,0.06,0.06,0.19,0.12,0.06,0.12,0.0,0.0,0.0,0.0,0.0,1332251.53
std,0.0,68.68,0.05,0.05,0.25,0.0,0.25,0.89,0.25,0.25,0.0,0.82,0.0,0.34,0.79,0.0,0.0,0.0,0.54,0.63,0.34,1.82,0.5,0.6,0.34,0.25,0.5,0.58,0.34,0.5,0.25,0.25,0.51,0.72,0.25,0.34,0.5,0.25,0.25,0.4,0.34,0.25,0.34,0.0,0.0,0.0,0.0,0.0,93958.61
min,2.0,3002.0,-37.92,144.92,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1210067.18
25%,2.0,3062.25,-37.85,144.97,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1254947.59
50%,2.0,3122.5,-37.82,144.99,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1304597.06
75%,2.0,3189.75,-37.8,145.03,0.0,0.0,0.0,0.5,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,2.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1419179.24
max,2.0,3207.0,-37.78,145.11,1.0,0.0,1.0,2.0,1.0,1.0,0.0,3.0,0.0,1.0,2.0,0.0,0.0,0.0,2.0,2.0,1.0,5.0,2.0,2.0,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1476801.43


In [235]:
joined_df.loc[joined_df['Cluster'] == 3].describe()

Unnamed: 0,Cluster,Postcode,Latitude,Longitude,Middle Eastern Restaurant,Mediterranean Restaurant,Indian Restaurant,Japanese Restaurant,Portuguese Restaurant,Fast Food Restaurant,Lebanese Restaurant,Mexican Restaurant,Falafel Restaurant,Vietnamese Restaurant,Thai Restaurant,Turkish Restaurant,Afghan Restaurant,Eastern European Restaurant,Dumpling Restaurant,Sushi Restaurant,Kebab Restaurant,Italian Restaurant,Indonesian Restaurant,Korean Restaurant,Ramen Restaurant,Dim Sum Restaurant,Chinese Restaurant,Malay Restaurant,Brazilian Restaurant,Vegetarian / Vegan Restaurant,Scandinavian Restaurant,Greek Restaurant,Asian Restaurant,Australian Restaurant,Molecular Gastronomy Restaurant,Seafood Restaurant,French Restaurant,Argentinian Restaurant,Tapas Restaurant,Modern European Restaurant,Yunnan Restaurant,Persian Restaurant,Egyptian Restaurant,Theme Restaurant,Gluten-free Restaurant,Cantonese Restaurant,South American Restaurant,Shanghai Restaurant,Price
count,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0
mean,3.0,3143.0,-37.87,145.04,0.0,0.0,0.06,0.12,0.0,0.06,0.0,0.0,0.0,0.06,0.47,0.0,0.0,0.0,0.0,0.18,0.12,0.82,0.0,0.12,0.0,0.0,0.0,0.18,0.0,0.06,0.0,0.0,0.18,0.18,0.0,0.0,0.0,0.0,0.0,0.06,0.06,0.06,0.06,0.06,0.0,0.06,0.0,0.0,1647958.21
std,0.0,39.37,0.06,0.03,0.0,0.0,0.24,0.49,0.0,0.24,0.0,0.0,0.0,0.24,1.01,0.0,0.0,0.0,0.0,0.53,0.33,3.4,0.0,0.49,0.0,0.0,0.0,0.53,0.0,0.24,0.0,0.0,0.53,0.39,0.0,0.0,0.0,0.0,0.0,0.24,0.24,0.24,0.24,0.24,0.0,0.24,0.0,0.0,95636.71
min,3.0,3054.0,-37.98,144.97,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1514326.53
25%,3.0,3124.0,-37.92,145.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1577664.26
50%,3.0,3144.0,-37.86,145.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1642173.08
75%,3.0,3187.0,-37.83,145.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1722933.95
max,3.0,3193.0,-37.78,145.1,0.0,0.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,1.0,4.0,0.0,0.0,0.0,0.0,2.0,1.0,14.0,0.0,2.0,0.0,0.0,0.0,2.0,0.0,1.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,1810614.89


In [236]:
joined_df.loc[joined_df['Cluster'] == 4].describe()

Unnamed: 0,Cluster,Postcode,Latitude,Longitude,Middle Eastern Restaurant,Mediterranean Restaurant,Indian Restaurant,Japanese Restaurant,Portuguese Restaurant,Fast Food Restaurant,Lebanese Restaurant,Mexican Restaurant,Falafel Restaurant,Vietnamese Restaurant,Thai Restaurant,Turkish Restaurant,Afghan Restaurant,Eastern European Restaurant,Dumpling Restaurant,Sushi Restaurant,Kebab Restaurant,Italian Restaurant,Indonesian Restaurant,Korean Restaurant,Ramen Restaurant,Dim Sum Restaurant,Chinese Restaurant,Malay Restaurant,Brazilian Restaurant,Vegetarian / Vegan Restaurant,Scandinavian Restaurant,Greek Restaurant,Asian Restaurant,Australian Restaurant,Molecular Gastronomy Restaurant,Seafood Restaurant,French Restaurant,Argentinian Restaurant,Tapas Restaurant,Modern European Restaurant,Yunnan Restaurant,Persian Restaurant,Egyptian Restaurant,Theme Restaurant,Gluten-free Restaurant,Cantonese Restaurant,South American Restaurant,Shanghai Restaurant,Price
count,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0
mean,4.0,3102.58,-37.84,144.98,0.08,0.0,0.25,0.5,0.08,0.0,0.0,0.17,0.0,0.17,0.42,0.0,0.0,0.0,0.08,0.42,0.08,1.08,0.0,0.17,0.08,0.08,0.17,0.17,0.08,0.17,0.0,0.0,0.25,0.42,0.08,0.17,0.17,0.0,0.0,0.0,0.17,0.08,0.17,0.0,0.0,0.0,0.0,0.0,815137.76
std,0.0,75.63,0.04,0.04,0.29,0.0,0.45,0.8,0.29,0.0,0.0,0.39,0.0,0.39,0.79,0.0,0.0,0.0,0.29,0.51,0.29,1.83,0.0,0.58,0.29,0.29,0.58,0.39,0.29,0.39,0.0,0.0,0.45,0.79,0.29,0.39,0.58,0.0,0.0,0.0,0.39,0.29,0.39,0.0,0.0,0.0,0.0,0.0,82910.07
min,4.0,3006.0,-37.89,144.92,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,652058.82
25%,4.0,3031.0,-37.88,144.94,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,783177.3
50%,4.0,3114.0,-37.84,144.98,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,844532.91
75%,4.0,3167.75,-37.8,145.0,0.0,0.0,0.25,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,1.0,0.0,1.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,865426.36
max,4.0,3185.0,-37.79,145.06,1.0,0.0,1.0,2.0,1.0,0.0,0.0,1.0,0.0,1.0,2.0,0.0,0.0,0.0,1.0,1.0,1.0,5.0,0.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0,0.0,0.0,1.0,2.0,1.0,1.0,2.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,916205.07


In [237]:
cluster_ave_price = pd.DataFrame([joined_df.loc[joined_df['Cluster'] == 0]['Price'].mean(), joined_df.loc[joined_df['Cluster'] == 1]['Price'].mean(),
joined_df.loc[joined_df['Cluster'] == 2]['Price'].mean(), joined_df.loc[joined_df['Cluster'] == 3]['Price'].mean(),
joined_df.loc[joined_df['Cluster'] == 4]['Price'].mean()],columns=['Average Price'])
cluster_ave_price

Unnamed: 0,Average Price
0,2104943.87
1,1067327.42
2,1332251.53
3,1647958.21
4,815137.76


In [238]:
cluster_ave_italian = pd.DataFrame([joined_df.loc[joined_df['Cluster'] == 0]['Italian Restaurant'].mean(), 
joined_df.loc[joined_df['Cluster'] == 1]['Italian Restaurant'].mean(),
joined_df.loc[joined_df['Cluster'] == 2]['Italian Restaurant'].mean(), 
joined_df.loc[joined_df['Cluster'] == 3]['Italian Restaurant'].mean(),
joined_df.loc[joined_df['Cluster'] == 4]['Italian Restaurant'].mean()], columns=['Ave Italian Restaurants'])
cluster_ave_italian

Unnamed: 0,Ave Italian Restaurants
0,0.86
1,0.8
2,1.12
3,0.82
4,1.08


In [239]:
joined_df.loc[joined_df['Cluster'] == 1][['Suburb', 'Italian Restaurant', 'Price']]

Unnamed: 0,Suburb,Italian Restaurant,Price
25,Cheltenham,0.0,969127.39
26,Gardenvale,0.0,1095312.5
169,Gardenvale,0.0,1095312.5
28,Hampton East,0.0,1096513.04
29,Highett,0.0,1083245.06
30,Moorabbin,0.0,1009232.99
162,Bentleigh East,0.0,1132489.32
164,Caulfield East,0.0,1035750.0
165,Caulfield North,0.0,958154.49
172,Murrumbeena,0.0,1061576.9


In [255]:
best_suburb = joined_df.loc[joined_df['Cluster'] == 1][['Suburb', 'Cluster', 'Latitude', 'Longitude', 
                                                        'Postcode', 'Local government area', 'Italian Restaurant', 'Price']]
best_suburb.loc[best_suburb['Suburb'] == 'Burnley']

Unnamed: 0,Suburb,Cluster,Latitude,Longitude,Postcode,Local government area,Italian Restaurant,Price
479,Burnley,1,-37.83,145.02,3121,City of Yarra,0.0,1191875.62


In [279]:
# create map using latitude and longitude values
map_burnley = folium.Map(location=(-37.83, 145.02), zoom_start=13)

label = '{}, {}'.format('Burnley', 'City of Yarra')
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [-37.83, 145.0175],
    radius=40,
    popup=label,
    color='blue',
    fill=True,
    fill_opacity=0.2,
    parse_html=False).add_to(map_burnley)  

map_burnley