## Clustering the city of Johannesburg, South Africa

This machine learning project explores the most common venue categories in each neighborhood of Johannesburg and then uses them as features to group the neighborhoods into clusters with the K-Means clustering algorithm.

## 1. Import necessary dependencies to use in exploring the data

In [5]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (the pandas.io.json import path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed yet
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 2. Scrape the web and import the dataset

The data is read from a web page that lists postal codes for the neighbourhoods of Johannesburg.

In [36]:
url = 'https://geo.mycyber.org/south_africa/johannesburg' #source of the data
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1] #read_html returns a list of dataframes; keep the last table on the page
df.head()

Unnamed: 0,Suburb,Zip Code,Postal Code
0,,2000.0,2001.0
1,Abbotsford,,2192.0
2,Aeroton,2013.0,2190.0
3,Airdlin,2157.0,
4,Alan Manor,,2091.0


We will cluster by postal code. Some codes cover many suburbs, so we keep one row per unique postal code. Let's find out how many unique codes the dataset contains.

In [37]:
df['Postal Code'].nunique()

22


There are 22 unique postal codes, so after deduplication the dataset will have 22 rows, one suburb per postal code.
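To make the deduplication idea concrete, here is a minimal sketch on a made-up frame. The suburb names and codes are illustrative assumptions, not values from the scraped table; only `nunique`, `dropna` and `drop_duplicates` are the calls the notebook relies on.

```python
import pandas as pd

# Toy data (hypothetical values): several suburbs can share one postal code.
toy = pd.DataFrame({
    "Suburb": ["A", "B", "C", "D"],
    "Postal Code": [2091.0, 2091.0, 2192.0, None],
})

# nunique() ignores missing values by default.
n_codes = toy["Postal Code"].nunique()

# Drop rows without a code, then keep the first suburb seen for each code.
one_per_code = (
    toy.dropna(subset=["Postal Code"])
       .drop_duplicates("Postal Code", keep="first")
)

print(n_codes)
print(one_per_code)
```

Keeping the first suburb per code is an arbitrary choice: the retained suburb name stands in for every suburb that shares the code.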

## 3. Data Wrangling

We clean the data so that it can be processed in the later steps.

In [39]:
df.drop('Zip Code', axis=1, inplace=True)

In [40]:
df.head()

Unnamed: 0,Suburb,Postal Code
0,,2001.0
1,Abbotsford,2192.0
2,Aeroton,2190.0
3,Airdlin,
4,Alan Manor,2091.0


In [45]:
new_df = df.dropna()
new_df.head()

Unnamed: 0,Suburb,Postal Code
1,Abbotsford,2192.0
2,Aeroton,2190.0
4,Alan Manor,2091.0
5,Alan Manor Ext 2,2091.0
6,Alan Manor UIT 2,2091.0


In [46]:
new_df.drop_duplicates('Postal Code', keep='first') # returns a new DataFrame; new_df is unchanged because the result is not assigned
new_df.head()

Unnamed: 0,Suburb,Postal Code
1,Abbotsford,2192.0
2,Aeroton,2190.0
4,Alan Manor,2091.0
5,Alan Manor Ext 2,2091.0
6,Alan Manor UIT 2,2091.0


In [48]:
new_df.shape

(1468, 2)

In [64]:
refined_df = new_df.drop_duplicates('Postal Code', keep='first')
refined_df

Unnamed: 0,Suburb,Postal Code
1,Abbotsford,2192.0
2,Aeroton,2190.0
4,Alan Manor,2091.0
7,Albertskroon,2195.0
9,Aldarapark,2194.0
10,Alexandra,2090.0
19,Amalgam,2092.0
20,Argyle,2001.0
22,Atholhurst,2196.0
69,Bedford Gardens,2007.0


In [74]:
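refined_df = refined_df.assign(City='Johannesburg') # add a constant City column, used later to build addresses
refined_df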

Unnamed: 0,Suburb,Postal Code,City
1,Abbotsford,2192.0,Johannesburg
2,Aeroton,2190.0,Johannesburg
4,Alan Manor,2091.0,Johannesburg
7,Albertskroon,2195.0,Johannesburg
9,Aldarapark,2194.0,Johannesburg
10,Alexandra,2090.0,Johannesburg
19,Amalgam,2092.0,Johannesburg
20,Argyle,2001.0,Johannesburg
22,Atholhurst,2196.0,Johannesburg
69,Bedford Gardens,2007.0,Johannesburg


In [157]:
column_titles = ['Suburb', 'City', 'Postal Code']
df = refined_df.reindex(columns=column_titles)
df.head()

Unnamed: 0,Suburb,City,Postal Code
1,Abbotsford,Johannesburg,2192.0
2,Aeroton,Johannesburg,2190.0
4,Alan Manor,Johannesburg,2091.0
7,Albertskroon,Johannesburg,2195.0
9,Aldarapark,Johannesburg,2194.0


In [158]:
df['Postal Code'] = df['Postal Code'].astype(int) # cast the float codes to integers to drop the trailing .0
df.head()

Unnamed: 0,Suburb,City,Postal Code
1,Abbotsford,Johannesburg,2192
2,Aeroton,Johannesburg,2190
4,Alan Manor,Johannesburg,2091
7,Albertskroon,Johannesburg,2195
9,Aldarapark,Johannesburg,2194


In [159]:
df['Postal Code'] = df['Postal Code'].astype(str) # convert the integer codes to strings for concatenation
df.head()

Unnamed: 0,Suburb,City,Postal Code
1,Abbotsford,Johannesburg,2192
2,Aeroton,Johannesburg,2190
4,Alan Manor,Johannesburg,2091
7,Albertskroon,Johannesburg,2195
9,Aldarapark,Johannesburg,2194


In [82]:
df['Address'] = df[['Suburb', 'City', 'Postal Code']].apply(lambda x: ','.join(x), axis = 1) #join all columns into an address
df

Unnamed: 0,Suburb,City,Postal Code,Address
1,Abbotsford,Johannesburg,2192,"Abbotsford,Johannesburg,2192"
2,Aeroton,Johannesburg,2190,"Aeroton,Johannesburg,2190"
4,Alan Manor,Johannesburg,2091,"Alan Manor,Johannesburg,2091"
7,Albertskroon,Johannesburg,2195,"Albertskroon,Johannesburg,2195"
9,Aldarapark,Johannesburg,2194,"Aldarapark,Johannesburg,2194"
10,Alexandra,Johannesburg,2090,"Alexandra,Johannesburg,2090"
19,Amalgam,Johannesburg,2092,"Amalgam,Johannesburg,2092"
20,Argyle,Johannesburg,2001,"Argyle,Johannesburg,2001"
22,Atholhurst,Johannesburg,2196,"Atholhurst,Johannesburg,2196"
69,Bedford Gardens,Johannesburg,2007,"Bedford Gardens,Johannesburg,2007"


In [163]:
#from geopy.extra.rate_limiter import RateLimiter
#locator = Nominatim(user_agent='myGeocoder')
# 1 - convenient function to delay between geocoding calls
#geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
# 2 - create location column
#df['location'] = df['Address'].apply(geocode)
# 3 - create latitude, longitude and altitude from location column (returns tuple)
#df['point'] = df['location'].apply(lambda loc: tuple(loc.point) if loc else None)
# 4 - split point column into latitude, longitude and altitude columns
#df[['latitude', 'longitude', 'altitude']] = pd.DataFrame(df['point'].tolist(), index=df.index)
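Steps 3 and 4 of the commented-out cell can be tried without calling the geocoder. The sketch below assumes a `point` column that holds either a `(latitude, longitude, altitude)` tuple or `None` for a failed lookup; the second address is made up, and padding failed lookups explicitly is a choice of this sketch rather than something the notebook does.

```python
import pandas as pd

# Hypothetical geocoding results: a (lat, lon, alt) tuple, or None when the lookup failed.
toy = pd.DataFrame({
    "Address": ["Abbotsford,Johannesburg,2192", "Nowhere,Johannesburg,0000"],
    "point": [(-26.1431769, 28.0684413, 0.0), None],
})

# Pad failed lookups with missing values so every row yields three fields.
filled = [p if p is not None else (None, None, None) for p in toy["point"]]
coords = pd.DataFrame(
    filled, index=toy.index, columns=["latitude", "longitude", "altitude"]
)

toy = toy.join(coords)

# Rows that still need coordinates filled in by hand.
missing = toy[toy["latitude"].isna()]
print(missing["Address"].tolist())
```

Listing the rows with missing coordinates this way shows which addresses have to be completed manually.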

### Let's export the dataset to Excel to fill in the missing coordinates manually, since only 6 are missing

In [165]:
df.to_excel(r'C:\Users\lawt9\Desktop\Dataset\joburg_refined.xlsx', index=False, header=True)

### Let's import the data back into the notebook

In [166]:
df = pd.read_excel('joburg_refined.xlsx')
df

Unnamed: 0,Suburb,City,Postal Code,Address,location,point,latitude,longitude,altitude
0,Abbotsford,Johannesburg,2192,"Abbotsford,Johannesburg,2192","Abbotsford, Johannesburg Ward 74, Johannesburg...","(-26.1431769, 28.0684413, 0.0)",-26.143177,28.068441,0.0
1,Aeroton,Johannesburg,2190,"Aeroton,Johannesburg,2190","Johannesburg, Western Bypass, Aeroton, Johanne...","(-26.2576949, 27.9618345, 0.0)",-26.257695,27.961834,0.0
2,Alan Manor,Johannesburg,2091,"Alan Manor,Johannesburg,2091","Alan Manor, Johannesburg Ward 125, Johannesbur...","(-26.2776913, 27.9928566, 0.0)",-26.277691,27.992857,0.0
3,Albertskroon,Johannesburg,2195,"Albertskroon,Johannesburg,2195","Albertskroon, Johannesburg Ward 86, Johannesbu...","(-26.1613889, 27.975, 0.0)",-26.161389,27.975,0.0
4,Aldarapark,Johannesburg,2194,"Aldarapark,Johannesburg,2194",,,-26.134159,27.9809,
5,Alexandra,Johannesburg,2090,"Alexandra,Johannesburg,2090","Alexandra, Johannesburg Ward 105, Sandton, Cit...","(-26.104444, 28.098889, 0.0)",-26.104444,28.098889,0.0
6,Amalgam,Johannesburg,2092,"Amalgam,Johannesburg,2092","Amalgam Place, Mayfair, Johannesburg Ward 58, ...","(-26.2130449, 28.0051737, 0.0)",-26.213045,28.005174,0.0
7,Argyle,Johannesburg,2001,"Argyle,Johannesburg,2001",,,-26.114611,28.025999,
8,Atholhurst,Johannesburg,2196,"Atholhurst,Johannesburg,2196","Atholhurst school, Dennis Road, Atholl Gardens...","(-26.1178103, 28.0707958, 0.0)",-26.11781,28.070796,0.0
9,Bedford Gardens,Johannesburg,2007,"Bedford Gardens,Johannesburg,2007",,,-26.19042,28.12258,


#### Now let's drop some columns we don't need

In [167]:
df.drop(columns =['Address', 'location', 'point', 'altitude']) # displays a trimmed copy; df itself keeps these columns because the result is not assigned

Unnamed: 0,Suburb,City,Postal Code,latitude,longitude
0,Abbotsford,Johannesburg,2192,-26.143177,28.068441
1,Aeroton,Johannesburg,2190,-26.257695,27.961834
2,Alan Manor,Johannesburg,2091,-26.277691,27.992857
3,Albertskroon,Johannesburg,2195,-26.161389,27.975
4,Aldarapark,Johannesburg,2194,-26.134159,27.9809
5,Alexandra,Johannesburg,2090,-26.104444,28.098889
6,Amalgam,Johannesburg,2092,-26.213045,28.005174
7,Argyle,Johannesburg,2001,-26.114611,28.025999
8,Atholhurst,Johannesburg,2196,-26.11781,28.070796
9,Bedford Gardens,Johannesburg,2007,-26.19042,28.12258


In [168]:
df.rename(columns={'Postal Code':'Postalcode'}, inplace=True)

#### Let's get the geographical location of the city of Johannesburg

In [169]:
address = 'Johannesburg'

geolocator = Nominatim(user_agent="jhb_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Johannesburg are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Johannesburg are -26.205, 28.049722.


### Now, let's create a map of Johannesburg with suburbs superimposed on it

In [170]:
# create map of Johannesburg using latitude and longitude values
map_johannesburg = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, suburb, postalcode in zip(df['latitude'], df['longitude'], df['Suburb'], df['Postalcode']):
    label = '{}, {}'.format(suburb, postalcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_johannesburg)  
    
map_johannesburg

### Connect to Foursquare and use its location data

In [171]:
CLIENT_ID = 'PXHSUXW5LFVITQND3IJ4PJXA1WK3QMYK0L15DDM1WPPO3FF2' # your Foursquare ID
CLIENT_SECRET = 'BMJXIWT2G2QZTHUUKEZQJTMBIWMNNTGXESU2FUXKAWXKDWLW' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: PXHSUXW5LFVITQND3IJ4PJXA1WK3QMYK0L15DDM1WPPO3FF2
CLIENT_SECRET:BMJXIWT2G2QZTHUUKEZQJTMBIWMNNTGXESU2FUXKAWXKDWLW


In [175]:
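# coordinates of one suburb to test the request (row 9, Bedford Gardens, matches the URL printed below)
neighborhood_latitude = df.loc[9, 'latitude']
neighborhood_longitude = df.loc[9, 'longitude']
LIMIT = 10 # limit of number of venues returned by Foursquare API
radius = 50000 # search radius in meters
# create URL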
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=PXHSUXW5LFVITQND3IJ4PJXA1WK3QMYK0L15DDM1WPPO3FF2&client_secret=BMJXIWT2G2QZTHUUKEZQJTMBIWMNNTGXESU2FUXKAWXKDWLW&v=20180605&ll=-26.19042,28.12258&radius=50000&limit=10'

In [118]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ff8bcaae53c096fdf44e26d'},
 'response': {'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 0,
  'suggestedBounds': {'ne': {'lat': -26.314577995499995,
    'lng': 28.031792754110832},
   'sw': {'lat': -26.323578004500007, 'lng': 28.02177064588917}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': []}]}}

### Get Category type from Foursquare to categorize our data

In [176]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
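A small offline sketch of how such a category extractor is typically applied: flatten the venue items with `pd.json_normalize`, then map the nested category list to a single name. The `items` list is hand-written to imitate the shape of a venues/explore response and is not real API output; `category_name` is a hypothetical helper.

```python
import pandas as pd

# Hand-written stand-in for the 'items' list of a venues/explore response.
items = [
    {"venue": {"name": "Cafe One", "categories": [{"name": "Café"}],
               "location": {"lat": -26.1, "lng": 28.0}}},
    {"venue": {"name": "No Category", "categories": [],
               "location": {"lat": -26.2, "lng": 28.1}}},
]

def category_name(categories):
    """Return the first category name, or None when the list is empty."""
    return categories[0]["name"] if categories else None

# Nested keys become dotted column names such as 'venue.location.lat'.
flat = pd.json_normalize(items)
flat["venue.categories"] = flat["venue.categories"].apply(category_name)

print(flat[["venue.name", "venue.categories", "venue.location.lat"]])
```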

## Create a function to explore the suburbs of Johannesburg

In [177]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
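The parsing part of this function can be exercised without network access. The sketch below separates it into a hypothetical helper, `rows_from_items`, and feeds it hand-written payloads. Two assumptions are made that the notebook does not state: a venue may come back with an empty category list, and a suburb may return no venues at all.

```python
import pandas as pd

COLUMNS = ["Suburb", "Neighborhood Latitude", "Neighborhood Longitude",
           "Venue", "Venue Latitude", "Venue Longitude", "Venue Category"]

def rows_from_items(name, lat, lng, items):
    """Turn one suburb's 'items' list into flat tuples (hypothetical helper)."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of raising IndexError on an empty list.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-written payloads standing in for API responses.
payloads = {
    "Suburb X": [{"venue": {"name": "Deli", "categories": [],
                            "location": {"lat": -26.10, "lng": 28.00}}}],
    "Suburb Y": [],  # no venues within the radius
}

rows = []
for suburb, items in payloads.items():
    rows.extend(rows_from_items(suburb, -26.0, 28.0, items))

# Passing columns= here also works when rows is empty.
venues = pd.DataFrame(rows, columns=COLUMNS)
print(venues)
```

Suburbs without venues simply contribute no rows, which is why the resulting frame can contain fewer suburbs than were queried.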

### Run the function above on every suburb

In [178]:
johannesburg_venues = getNearbyVenues(names=df['Suburb'],
                                   latitudes=df['latitude'],
                                   longitudes=df['longitude']
                                  )

Abbotsford
Aeroton
Alan Manor
Albertskroon
Aldarapark
Alexandra
Amalgam
Argyle
Atholhurst
Bedford Gardens
Belgravia
Bellevue
Bloubosrand
Blue Heaven
Bosmont
City Deep
Fairland
Forest Town
KYA Sand
Naturena
Rietvlei Country Estate
Rispark


##### Check the size of the resulting dataframe

In [179]:
print(johannesburg_venues.shape)
johannesburg_venues.head()

(69, 7)


Unnamed: 0,Suburb,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Albertskroon,-26.161389,27.975,Little italy,-26.161876,27.974966,Italian Restaurant
1,Albertskroon,-26.161389,27.975,Jumbo farmers,-26.162502,27.974599,Food & Drink Shop
2,Albertskroon,-26.161389,27.975,Plaasjapie Antiques,-26.163257,27.973422,Antique Shop
3,Albertskroon,-26.161389,27.975,Thandidille Mountain Lodge,-26.158526,27.978684,Hotel
4,Aldarapark,-26.134159,27.9809,Carvers,-26.13167,27.981586,Restaurant


#### Number of venues returned for each suburb

In [180]:
johannesburg_venues.groupby('Suburb').count()

Suburb,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Albertskroon,4,4,4,4,4,4
Aldarapark,4,4,4,4,4,4
Alexandra,2,2,2,2,2,2
Amalgam,4,4,4,4,4,4
Argyle,8,8,8,8,8,8
Atholhurst,2,2,2,2,2,2
Bedford Gardens,10,10,10,10,10,10
Bellevue,6,6,6,6,6,6
Blue Heaven,2,2,2,2,2,2
City Deep,3,3,3,3,3,3


In [181]:
print('There are {} unique categories.'.format(len(johannesburg_venues['Venue Category'].unique())))

There are 49 unique categories.


### Analyze each suburb

In [182]:
# one hot encoding
johannesburg_onehot = pd.get_dummies(johannesburg_venues[['Venue Category']], prefix="", prefix_sep="")

# add suburb column back to dataframe
johannesburg_onehot['Suburb'] = johannesburg_venues['Suburb'] 

# move suburb column to the first column
fixed_columns = [johannesburg_onehot.columns[-1]] + list(johannesburg_onehot.columns[:-1])
johannesburg_onehot = johannesburg_onehot[fixed_columns]

johannesburg_onehot.head()

Unnamed: 0,Suburb,Afghan Restaurant,African Restaurant,Antique Shop,Art Gallery,Athletics & Sports,Automotive Shop,Bakery,Bar,Bistro,Boutique,Breakfast Spot,Burger Joint,Butcher,Cafeteria,Café,Caribbean Restaurant,Chinese Restaurant,Climbing Gym,Coffee Shop,Construction & Landscaping,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Furniture / Home Store,Garden Center,Gas Station,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Home Service,Hotel,Italian Restaurant,Market,Mediterranean Restaurant,Movie Theater,Music Venue,Pizza Place,Portuguese Restaurant,Pub,Resort,Restaurant,Shopping Mall,Sporting Goods Shop,Steakhouse,Supermarket,Vegetarian / Vegan Restaurant,Zoo
0,Albertskroon,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Albertskroon,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Albertskroon,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Albertskroon,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Aldarapark,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


In [129]:
johannesburg_onehot.shape

(69, 50)

### Next, let's group rows by suburb and take the mean of the frequency of occurrence of each category

In [183]:
johannesburg_grouped = johannesburg_onehot.groupby('Suburb').mean().reset_index()
johannesburg_grouped

Unnamed: 0,Suburb,Afghan Restaurant,African Restaurant,Antique Shop,Art Gallery,Athletics & Sports,Automotive Shop,Bakery,Bar,Bistro,Boutique,Breakfast Spot,Burger Joint,Butcher,Cafeteria,Café,Caribbean Restaurant,Chinese Restaurant,Climbing Gym,Coffee Shop,Construction & Landscaping,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Furniture / Home Store,Garden Center,Gas Station,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Home Service,Hotel,Italian Restaurant,Market,Mediterranean Restaurant,Movie Theater,Music Venue,Pizza Place,Portuguese Restaurant,Pub,Resort,Restaurant,Shopping Mall,Sporting Goods Shop,Steakhouse,Supermarket,Vegetarian / Vegan Restaurant,Zoo
0,Albertskroon,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aldarapark,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
2,Alexandra,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Amalgam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
4,Argyle,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0
5,Atholhurst,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bedford Gardens,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.1,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0
7,Bellevue,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0
8,Blue Heaven,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,City Deep,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
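The one-hot and group-mean steps can be checked by hand on a tiny example. The sketch below uses made-up suburbs and categories; it shows that the mean of the 0/1 indicators is the share of a suburb's venues that fall in each category, so every row sums to 1.

```python
import pandas as pd

# Toy venue list (made-up rows) with the two columns the notebook relies on.
toy = pd.DataFrame({
    "Suburb": ["X", "X", "X", "X", "Y", "Y"],
    "Venue Category": ["Café", "Café", "Hotel", "Zoo", "Hotel", "Hotel"],
})

# One indicator column per category, cast to float for averaging.
onehot = pd.get_dummies(toy["Venue Category"]).astype(float)
onehot.insert(0, "Suburb", toy["Suburb"])

# Mean of the indicators = share of the suburb's venues in each category.
grouped = onehot.groupby("Suburb").mean().reset_index()
print(grouped)
```

Because the features are shares, a suburb with 2 venues and one with 10 are compared on the same scale, although shares computed from very few venues are noisy.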


### Top 5 most common venues

In [131]:
num_top_venues = 5

for hood in johannesburg_grouped['Suburb']:
    print("----"+hood+"----")
    temp = johannesburg_grouped[johannesburg_grouped['Suburb'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albertskroon----
                venue  freq
0        Antique Shop  0.25
1   Food & Drink Shop  0.25
2               Hotel  0.25
3  Italian Restaurant  0.25
4   Afghan Restaurant  0.00


----Aldarapark----
              venue  freq
0  Greek Restaurant  0.25
1     Grocery Store  0.25
2        Restaurant  0.25
3      Burger Joint  0.25
4     Movie Theater  0.00


----Alexandra----
                           venue  freq
0              Afghan Restaurant   0.5
1                        Butcher   0.5
2  Vegetarian / Vegan Restaurant   0.0
3                  Movie Theater   0.0
4                  Grocery Store   0.0


----Amalgam----
                venue  freq
0    Halal Restaurant  0.25
1       Shopping Mall  0.25
2        Home Service  0.25
3  Chinese Restaurant  0.25
4   Afghan Restaurant  0.00


----Argyle----
                  venue  freq
0         Grocery Store  0.25
1  Fast Food Restaurant  0.25
2  Gym / Fitness Center  0.12
3         Shopping Mall  0.12
4                Bistro  0.

In [184]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
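As a quick check of this ranking logic, the sketch below applies the same idea to one hand-made row. `top_categories` is a hypothetical variant of the function above; the only change is an explicitly stable sort, chosen here so that the order of tied categories is reproducible.

```python
import pandas as pd

def top_categories(row, n):
    """Names of the n largest frequencies in a row whose first entry is the suburb name."""
    freqs = row.iloc[1:].astype(float)
    # A stable sort makes the order of tied categories reproducible.
    ranked = freqs.sort_values(ascending=False, kind="stable")
    return ranked.index[:n].tolist()

# Hand-made row in the same layout as johannesburg_grouped (values are illustrative).
row = pd.Series({"Suburb": "X", "Café": 0.5, "Hotel": 0.25, "Zoo": 0.25, "Bar": 0.0})
top3 = top_categories(row, 3)
print(top3)
```

Ties matter in this dataset: with only a handful of venues per suburb, many categories share the same frequency, and every category with frequency 0 is tied as well, so the lower ranks of a "top 10" list carry little information.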

### Top 10 most common venues for each neighborhood

In [153]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Suburb']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Suburb'] = johannesburg_grouped['Suburb']

for ind in np.arange(johannesburg_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(johannesburg_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albertskroon,Antique Shop,Hotel,Italian Restaurant,Food & Drink Shop,Zoo,Cafeteria,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
1,Aldarapark,Burger Joint,Greek Restaurant,Grocery Store,Restaurant,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym,Chinese Restaurant
2,Alexandra,Afghan Restaurant,Butcher,Furniture / Home Store,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym,Chinese Restaurant,Caribbean Restaurant
3,Amalgam,Shopping Mall,Chinese Restaurant,Halal Restaurant,Home Service,Zoo,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
4,Argyle,Fast Food Restaurant,Grocery Store,Shopping Mall,Gym / Fitness Center,Bistro,Music Venue,Butcher,Construction & Landscaping,Coffee Shop,Climbing Gym
5,Atholhurst,Construction & Landscaping,Market,Zoo,Butcher,Flower Shop,Fast Food Restaurant,Coffee Shop,Climbing Gym,Chinese Restaurant,Caribbean Restaurant
6,Bedford Gardens,Supermarket,Shopping Mall,Greek Restaurant,Pizza Place,Gym / Fitness Center,Movie Theater,Bakery,Portuguese Restaurant,Cafeteria,Café
7,Bellevue,Flower Shop,African Restaurant,Gas Station,Sporting Goods Shop,Caribbean Restaurant,Boutique,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop
8,Blue Heaven,Garden Center,Pub,Burger Joint,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym,Chinese Restaurant,Caribbean Restaurant
9,City Deep,Coffee Shop,Automotive Shop,Breakfast Spot,Zoo,Butcher,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Climbing Gym,Chinese Restaurant
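The column names above are built by indexing a short suffix list and falling back to "th" when the index runs out. An alternative sketch, using a hypothetical `ordinal` helper that is not part of the notebook, computes the suffix directly and also handles numbers such as 11 and 21:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st' (illustrative helper)."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Same column layout as in the notebook: 'Suburb' followed by ten ranked columns.
columns = ["Suburb"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[:4])
```

For ten columns both approaches give the same names; the helper only differs once the number of ranked columns exceeds 20.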


In [186]:
# set number of clusters
kclusters = 4

johannesburg_grouped_clustering = johannesburg_grouped.drop(columns='Suburb')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(johannesburg_grouped_clustering)

# check cluster labels generated for each row in the dataframe
#kmeans.labels_[0:10]
johannesburg_grouped_clustering

Unnamed: 0,Afghan Restaurant,African Restaurant,Antique Shop,Art Gallery,Athletics & Sports,Automotive Shop,Bakery,Bar,Bistro,Boutique,Breakfast Spot,Burger Joint,Butcher,Cafeteria,Café,Caribbean Restaurant,Chinese Restaurant,Climbing Gym,Coffee Shop,Construction & Landscaping,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Furniture / Home Store,Garden Center,Gas Station,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Home Service,Hotel,Italian Restaurant,Market,Mediterranean Restaurant,Movie Theater,Music Venue,Pizza Place,Portuguese Restaurant,Pub,Resort,Restaurant,Shopping Mall,Sporting Goods Shop,Steakhouse,Supermarket,Vegetarian / Vegan Restaurant,Zoo
0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
2,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.1,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0
7,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
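To see what the K-Means step does with a frequency table like this one, here is a minimal sketch on four made-up suburbs with two categories. The data, the choice of two clusters and the `Cluster Labels` name are assumptions for illustration; `n_init` is passed explicitly because its default has changed between scikit-learn releases.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up frequency table: rows are suburbs, columns are venue-category shares.
freq = pd.DataFrame(
    {"Café": [0.9, 0.8, 0.0, 0.1], "Hotel": [0.1, 0.2, 1.0, 0.9]},
    index=["W", "X", "Y", "Z"],
)

# Fit two clusters; random_state fixes the initialisation for repeatable runs.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freq)

# Attach the labels to the suburb names so they can be joined back later.
labels = pd.Series(model.labels_, index=freq.index, name="Cluster Labels")
print(labels)
```

The cluster numbers themselves are arbitrary; only which suburbs share a label is meaningful.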


## Cluster Neighborhoods

In [156]:
# add clustering labels (presumably commented out after the first run, since re-inserting an existing column raises a ValueError)
#neighborhoods_venues_sorted.insert(0, 'Cluster titles', kmeans.labels_)

johannesburg_merged = df

# join the sorted venue table onto df so each suburb keeps its latitude/longitude
johannesburg_merged = johannesburg_merged.join(neighborhoods_venues_sorted.set_index('Suburb'), on='Suburb')

johannesburg_merged # check the last columns!

Unnamed: 0,Suburb,City,Postalcode,Address,location,point,latitude,longitude,altitude,Cluster titles,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbotsford,Johannesburg,2192,"Abbotsford,Johannesburg,2192","Abbotsford, Johannesburg Ward 74, Johannesburg...","(-26.1431769, 28.0684413, 0.0)",-26.143177,28.068441,0.0,,,,,,,,,,,
1,Aeroton,Johannesburg,2190,"Aeroton,Johannesburg,2190","Johannesburg, Western Bypass, Aeroton, Johanne...","(-26.2576949, 27.9618345, 0.0)",-26.257695,27.961834,0.0,,,,,,,,,,,
2,Alan Manor,Johannesburg,2091,"Alan Manor,Johannesburg,2091","Alan Manor, Johannesburg Ward 125, Johannesbur...","(-26.2776913, 27.9928566, 0.0)",-26.277691,27.992857,0.0,,,,,,,,,,,
3,Albertskroon,Johannesburg,2195,"Albertskroon,Johannesburg,2195","Albertskroon, Johannesburg Ward 86, Johannesbu...","(-26.1613889, 27.975, 0.0)",-26.161389,27.975,0.0,0.0,Antique Shop,Hotel,Italian Restaurant,Food & Drink Shop,Zoo,Cafeteria,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
4,Aldarapark,Johannesburg,2194,"Aldarapark,Johannesburg,2194",,,-26.134159,27.9809,,0.0,Burger Joint,Greek Restaurant,Grocery Store,Restaurant,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym,Chinese Restaurant
5,Alexandra,Johannesburg,2090,"Alexandra,Johannesburg,2090","Alexandra, Johannesburg Ward 105, Sandton, Cit...","(-26.104444, 28.098889, 0.0)",-26.104444,28.098889,0.0,3.0,Afghan Restaurant,Butcher,Furniture / Home Store,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym,Chinese Restaurant,Caribbean Restaurant
6,Amalgam,Johannesburg,2092,"Amalgam,Johannesburg,2092","Amalgam Place, Mayfair, Johannesburg Ward 58, ...","(-26.2130449, 28.0051737, 0.0)",-26.213045,28.005174,0.0,0.0,Shopping Mall,Chinese Restaurant,Halal Restaurant,Home Service,Zoo,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
7,Argyle,Johannesburg,2001,"Argyle,Johannesburg,2001",,,-26.114611,28.025999,,0.0,Fast Food Restaurant,Grocery Store,Shopping Mall,Gym / Fitness Center,Bistro,Music Venue,Butcher,Construction & Landscaping,Coffee Shop,Climbing Gym
8,Atholhurst,Johannesburg,2196,"Atholhurst,Johannesburg,2196","Atholhurst school, Dennis Road, Atholl Gardens...","(-26.1178103, 28.0707958, 0.0)",-26.11781,28.070796,0.0,2.0,Construction & Landscaping,Market,Zoo,Butcher,Flower Shop,Fast Food Restaurant,Coffee Shop,Climbing Gym,Chinese Restaurant,Caribbean Restaurant
9,Bedford Gardens,Johannesburg,2007,"Bedford Gardens,Johannesburg,2007",,,-26.19042,28.12258,,0.0,Supermarket,Shopping Mall,Greek Restaurant,Pizza Place,Gym / Fitness Center,Movie Theater,Bakery,Portuguese Restaurant,Cafeteria,Café


In [191]:
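new = johannesburg_merged.dropna() # drops every row with any missing value, including suburbs whose 'location' and 'point' cells are empty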
new

Unnamed: 0,Suburb,City,Postalcode,Address,location,point,latitude,longitude,altitude,Cluster titles,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Albertskroon,Johannesburg,2195,"Albertskroon,Johannesburg,2195","Albertskroon, Johannesburg Ward 86, Johannesbu...","(-26.1613889, 27.975, 0.0)",-26.161389,27.975,0.0,0.0,Antique Shop,Hotel,Italian Restaurant,Food & Drink Shop,Zoo,Cafeteria,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
5,Alexandra,Johannesburg,2090,"Alexandra,Johannesburg,2090","Alexandra, Johannesburg Ward 105, Sandton, Cit...","(-26.104444, 28.098889, 0.0)",-26.104444,28.098889,0.0,3.0,Afghan Restaurant,Butcher,Furniture / Home Store,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym,Chinese Restaurant,Caribbean Restaurant
6,Amalgam,Johannesburg,2092,"Amalgam,Johannesburg,2092","Amalgam Place, Mayfair, Johannesburg Ward 58, ...","(-26.2130449, 28.0051737, 0.0)",-26.213045,28.005174,0.0,0.0,Shopping Mall,Chinese Restaurant,Halal Restaurant,Home Service,Zoo,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
8,Atholhurst,Johannesburg,2196,"Atholhurst,Johannesburg,2196","Atholhurst school, Dennis Road, Atholl Gardens...","(-26.1178103, 28.0707958, 0.0)",-26.11781,28.070796,0.0,2.0,Construction & Landscaping,Market,Zoo,Butcher,Flower Shop,Fast Food Restaurant,Coffee Shop,Climbing Gym,Chinese Restaurant,Caribbean Restaurant
11,Bellevue,Johannesburg,2198,"Bellevue,Johannesburg,2198","Bellevue, Johannesburg Ward 66, Johannesburg, ...","(-26.1772222, 28.07, 0.0)",-26.177222,28.07,0.0,0.0,Flower Shop,African Restaurant,Gas Station,Sporting Goods Shop,Caribbean Restaurant,Boutique,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop
15,City Deep,Johannesburg,2197,"City Deep,Johannesburg,2197","City Deep, Maritzburg Street, Jeppestown, Joha...","(-26.2151823, 28.0638983, 0.0)",-26.215182,28.063898,0.0,0.0,Coffee Shop,Automotive Shop,Breakfast Spot,Zoo,Butcher,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Climbing Gym,Chinese Restaurant
16,Fairland,Johannesburg,2170,"Fairland,Johannesburg,2170","Fairland, Johannesburg Ward 98, Johannesburg, ...","(-26.1336111, 27.9444444, 0.0)",-26.133611,27.944444,0.0,0.0,Grocery Store,Bar,Pizza Place,Café,Zoo,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
17,Forest Town,Johannesburg,2193,"Forest Town,Johannesburg,2193","Forest Town, Johannesburg Ward 87, Johannesbur...","(-26.1727778, 28.0366667, 0.0)",-26.172778,28.036667,0.0,0.0,Zoo,Art Gallery,Cafeteria,Resort,Food & Drink Shop,Athletics & Sports,Automotive Shop,Flower Shop,Fast Food Restaurant,Construction & Landscaping
18,KYA Sand,Johannesburg,2169,"KYA Sand,Johannesburg,2169","Kya Sand, Johannesburg Ward 96, Randburg, City...","(-26.02713645, 27.948544551597475, 0.0)",-26.027136,27.948545,0.0,1.0,Furniture / Home Store,Climbing Gym,Zoo,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Chinese Restaurant,Caribbean Restaurant,Café
19,Naturena,Johannesburg,2095,"Naturena,Johannesburg,2095","Naturena, Johannesburg Ward 125, Johannesburg,...","(-26.2832168, 27.959597, 0.0)",-26.283217,27.959597,0.0,0.0,Gym,Gym / Fitness Center,Pizza Place,Zoo,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym,Chinese Restaurant


In [192]:
# Save the cleaned table next to the notebook so the next cell reads the same file
new.to_excel('joburg_final.xlsx', index=False, header=True)

In [193]:
df = pd.read_excel('joburg_final.xlsx')
df

Unnamed: 0,Suburb,City,Postalcode,Address,location,point,latitude,longitude,altitude,Cluster titles,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albertskroon,Johannesburg,2195,"Albertskroon,Johannesburg,2195","Albertskroon, Johannesburg Ward 86, Johannesbu...","(-26.1613889, 27.975, 0.0)",-26.161389,27.975,0,0,Antique Shop,Hotel,Italian Restaurant,Food & Drink Shop,Zoo,Cafeteria,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
1,Alexandra,Johannesburg,2090,"Alexandra,Johannesburg,2090","Alexandra, Johannesburg Ward 105, Sandton, Cit...","(-26.104444, 28.098889, 0.0)",-26.104444,28.098889,0,3,Afghan Restaurant,Butcher,Furniture / Home Store,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym,Chinese Restaurant,Caribbean Restaurant
2,Amalgam,Johannesburg,2092,"Amalgam,Johannesburg,2092","Amalgam Place, Mayfair, Johannesburg Ward 58, ...","(-26.2130449, 28.0051737, 0.0)",-26.213045,28.005174,0,0,Shopping Mall,Chinese Restaurant,Halal Restaurant,Home Service,Zoo,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
3,Atholhurst,Johannesburg,2196,"Atholhurst,Johannesburg,2196","Atholhurst school, Dennis Road, Atholl Gardens...","(-26.1178103, 28.0707958, 0.0)",-26.11781,28.070796,0,2,Construction & Landscaping,Market,Zoo,Butcher,Flower Shop,Fast Food Restaurant,Coffee Shop,Climbing Gym,Chinese Restaurant,Caribbean Restaurant
4,Bellevue,Johannesburg,2198,"Bellevue,Johannesburg,2198","Bellevue, Johannesburg Ward 66, Johannesburg, ...","(-26.1772222, 28.07, 0.0)",-26.177222,28.07,0,0,Flower Shop,African Restaurant,Gas Station,Sporting Goods Shop,Caribbean Restaurant,Boutique,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop
5,City Deep,Johannesburg,2197,"City Deep,Johannesburg,2197","City Deep, Maritzburg Street, Jeppestown, Joha...","(-26.2151823, 28.0638983, 0.0)",-26.215182,28.063898,0,0,Coffee Shop,Automotive Shop,Breakfast Spot,Zoo,Butcher,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Climbing Gym,Chinese Restaurant
6,Fairland,Johannesburg,2170,"Fairland,Johannesburg,2170","Fairland, Johannesburg Ward 98, Johannesburg, ...","(-26.1336111, 27.9444444, 0.0)",-26.133611,27.944444,0,0,Grocery Store,Bar,Pizza Place,Café,Zoo,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
7,Forest Town,Johannesburg,2193,"Forest Town,Johannesburg,2193","Forest Town, Johannesburg Ward 87, Johannesbur...","(-26.1727778, 28.0366667, 0.0)",-26.172778,28.036667,0,0,Zoo,Art Gallery,Cafeteria,Resort,Food & Drink Shop,Athletics & Sports,Automotive Shop,Flower Shop,Fast Food Restaurant,Construction & Landscaping
8,KYA Sand,Johannesburg,2169,"KYA Sand,Johannesburg,2169","Kya Sand, Johannesburg Ward 96, Randburg, City...","(-26.02713645, 27.948544551597475, 0.0)",-26.027136,27.948545,0,1,Furniture / Home Store,Climbing Gym,Zoo,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Chinese Restaurant,Caribbean Restaurant,Café
9,Naturena,Johannesburg,2095,"Naturena,Johannesburg,2095","Naturena, Johannesburg Ward 125, Johannesburg,...","(-26.2832168, 27.959597, 0.0)",-26.283217,27.959597,0,0,Gym,Gym / Fitness Center,Pizza Place,Zoo,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym,Chinese Restaurant


## Visualize the resulting clusters

In [195]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per suburb, colored by its cluster label
for lat, lon, poi, cluster in zip(df['latitude'], df['longitude'], df['Suburb'], df['Cluster titles']):
    cluster = int(cluster)  # labels must be integers to index the color list
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
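
The markers look up their color with `rainbow[cluster-1]`, and the labels in `Cluster titles` start at 0, so label 0 wraps around to the last entry of the color list. The sketch below illustrates that mapping. It assumes `kclusters = 4` (the value is set outside this section; the tables show labels 0 to 3) and uses a hypothetical placeholder palette instead of the matplotlib colors, so it runs without any plotting library.

```python
# Assumption: four clusters, matching the labels 0..3 seen in the tables.
kclusters = 4

# Hypothetical stand-in for the `rainbow` list of hex colors.
palette = [f"color_{i}" for i in range(kclusters)]

# Reproduce the lookup used for the markers: palette[cluster - 1].
mapping = {cluster: palette[cluster - 1] for cluster in range(kclusters)}

for cluster, color in mapping.items():
    print(f"label {cluster} -> {color}")
```

Every label still receives a distinct color; the palette is only shifted by one position, with label 0 taking the last color.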

# Examine Clusters

## Cluster 1

In [220]:
df.loc[df['Cluster titles'] == 0, df.columns[[0] + list(range(5, df.shape[1]))]]

Unnamed: 0,Suburb,point,latitude,longitude,altitude,Cluster titles,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albertskroon,"(-26.1613889, 27.975, 0.0)",-26.161389,27.975,0,0,Antique Shop,Hotel,Italian Restaurant,Food & Drink Shop,Zoo,Cafeteria,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
2,Amalgam,"(-26.2130449, 28.0051737, 0.0)",-26.213045,28.005174,0,0,Shopping Mall,Chinese Restaurant,Halal Restaurant,Home Service,Zoo,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
4,Bellevue,"(-26.1772222, 28.07, 0.0)",-26.177222,28.07,0,0,Flower Shop,African Restaurant,Gas Station,Sporting Goods Shop,Caribbean Restaurant,Boutique,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop
5,City Deep,"(-26.2151823, 28.0638983, 0.0)",-26.215182,28.063898,0,0,Coffee Shop,Automotive Shop,Breakfast Spot,Zoo,Butcher,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Climbing Gym,Chinese Restaurant
6,Fairland,"(-26.1336111, 27.9444444, 0.0)",-26.133611,27.944444,0,0,Grocery Store,Bar,Pizza Place,Café,Zoo,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym
7,Forest Town,"(-26.1727778, 28.0366667, 0.0)",-26.172778,28.036667,0,0,Zoo,Art Gallery,Cafeteria,Resort,Food & Drink Shop,Athletics & Sports,Automotive Shop,Flower Shop,Fast Food Restaurant,Construction & Landscaping
9,Naturena,"(-26.2832168, 27.959597, 0.0)",-26.283217,27.959597,0,0,Gym,Gym / Fitness Center,Pizza Place,Zoo,Butcher,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym,Chinese Restaurant


## Cluster 2

In [219]:
df.loc[df['Cluster titles'] == 1, df.columns[[0] + list(range(5, df.shape[1]))]]


Unnamed: 0,Suburb,point,latitude,longitude,altitude,Cluster titles,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,KYA Sand,"(-26.02713645, 27.948544551597475, 0.0)",-26.027136,27.948545,0,1,Furniture / Home Store,Climbing Gym,Zoo,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Chinese Restaurant,Caribbean Restaurant,Café


## Cluster 3

In [218]:
df.loc[df['Cluster titles'] == 2, df.columns[[0] + list(range(5, df.shape[1]))]]


Unnamed: 0,Suburb,point,latitude,longitude,altitude,Cluster titles,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Atholhurst,"(-26.1178103, 28.0707958, 0.0)",-26.11781,28.070796,0,2,Construction & Landscaping,Market,Zoo,Butcher,Flower Shop,Fast Food Restaurant,Coffee Shop,Climbing Gym,Chinese Restaurant,Caribbean Restaurant


## Cluster 4

In [217]:
df.loc[df['Cluster titles'] == 3, df.columns[[0] + list(range(5, df.shape[1]))]]


Unnamed: 0,Suburb,point,latitude,longitude,altitude,Cluster titles,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Alexandra,"(-26.104444, 28.098889, 0.0)",-26.104444,28.098889,0,3,Afghan Restaurant,Butcher,Furniture / Home Store,Flower Shop,Fast Food Restaurant,Construction & Landscaping,Coffee Shop,Climbing Gym,Chinese Restaurant,Caribbean Restaurant
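
The four tables can be summarized by counting how many suburbs carry each label. The sketch below is a minimal standalone version that uses only the standard library, with the labels copied from the `Cluster titles` column of the ten-row table loaded from `joburg_final.xlsx`. Inside the notebook, `df['Cluster titles'].value_counts()` should give the same counts.

```python
from collections import Counter

# Labels copied from the `Cluster titles` column of the ten-row table.
cluster_by_suburb = {
    "Albertskroon": 0,
    "Alexandra": 3,
    "Amalgam": 0,
    "Atholhurst": 2,
    "Bellevue": 0,
    "City Deep": 0,
    "Fairland": 0,
    "Forest Town": 0,
    "KYA Sand": 1,
    "Naturena": 0,
}

# Count suburbs per label.
sizes = Counter(cluster_by_suburb.values())

# The section headings number clusters from 1, so label 0 is "Cluster 1".
for label in sorted(sizes):
    print(f"Cluster {label + 1}: {sizes[label]} suburb(s)")
```

Seven of the ten suburbs fall into the first cluster, and each of the other three clusters holds a single suburb.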


# End of Project