## Segmentation and clustering

Use the notebook to scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_Cape_Town_suburbs, extract the tables of postal codes, and transform the data into a pandas dataframe like the one shown below:

In [4]:
#!pip3 install bs4
import pandas as pd
import numpy as np
from requests import get
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/wiki/List_of_Cape_Town_suburbs'
response = get(url)
html_soup = BeautifulSoup(response.content, 'lxml')
boroughs = html_soup.find_all(class_='mw-headline')
df_tables = pd.read_html(response.content)
for i in range(len(df_tables[0:8])):
    df_tables[i].insert(0, 'Borough', boroughs[i].text)
    df_tables[i].drop("Postal Code", axis=1, inplace=True)
    # Only some tables have a "Dialing prefix" column, so ignore it when absent
    df_tables[i].drop("Dialing prefix", axis=1, inplace=True, errors="ignore")
cpt_merged = pd.concat(df_tables[0:8])
cpt_merged.dropna(subset=["Street Code"], axis=0, inplace=True)
cpt_merged.reset_index(drop=True, inplace=True)
cpt_merged[["Street Code"]] = cpt_merged[["Street Code"]].astype("int")
cpt_merged.rename(columns={'Street Code':'PostalCode', 'Suburb':'Neighbourhood'}, inplace=True)
cpt_merged = pd.DataFrame(cpt_merged.groupby(['PostalCode', 'Borough'])['Neighbourhood'].apply(', '.join))
cpt_merged.reset_index(inplace=True)
cpt_merged


Unnamed: 0,PostalCode,Borough,Neighbourhood
0,7100,Cape Flats,Delft
1,7130,Cape Flats,Macassar
2,7130,Helderberg,"Firgrove, Somerset West"
3,7140,Helderberg,"Gordon's Bay, Strand"
4,7349,West Coast,"Atlantis, Mamre"
5,7405,Northern Suburbs,"Brooklyn, Kensington, Maitland, Rugby"
6,7405,Southern Suburbs,"Ndabeni, Pinelands"
7,7441,Northern Suburbs,"Bothasig, Edgemead"
8,7441,West Coast,"Bloubergstrand, Melkbosstrand, Milnerton, Mont..."
9,7455,Cape Flats,Langa
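
The grouping step that collapses suburbs sharing a street code and borough into one comma-separated row can be checked on a small frame. The sketch below uses a hand-made frame (an illustrative assumption, not the scraped data), with suburb names borrowed from the table above.

```python
import pandas as pd

# Toy rows (illustrative only): one row per suburb, as after the rename step
toy = pd.DataFrame({
    'PostalCode': [7130, 7130, 7130, 7140],
    'Borough': ['Cape Flats', 'Helderberg', 'Helderberg', 'Helderberg'],
    'Neighbourhood': ['Macassar', 'Firgrove', 'Somerset West', 'Strand'],
})

# Join the suburb names within each (PostalCode, Borough) group
grouped = (
    toy.groupby(['PostalCode', 'Borough'])['Neighbourhood']
       .apply(', '.join)
       .reset_index()
)
print(grouped)
```

Because the key is the pair (PostalCode, Borough), a postal code that spans two boroughs yields two rows, which is why code 7130 appears twice in the real output.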


In [5]:
print('Cape Town has {} unique Postal codes\nCape Town has {} Unique Boroughs'.format( cpt_merged.PostalCode.unique().size,
                                                                                      cpt_merged.Borough.unique().size))

Cape Town has 34 unique Postal codes
Cape Town has 8 Unique Boroughs


Explore and cluster the neighborhoods in Cape Town.

Just make sure:

* to add enough Markdown cells to explain what you decided to do and to report any observations you make.
* to generate maps to visualize your neighborhoods and how they cluster together.

### Mapping
1. Install libraries and import modules

In [7]:
#!pip3 install geopy
from geopy.geocoders import Nominatim # Convert an address into latitude and longitude values
from pandas import json_normalize  # Flatten JSON into a pandas dataframe (pandas >= 1.0)
#Plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import k-means from clustering
from sklearn.cluster import KMeans

#!pip install folium # Install folium
import folium
print('Imported Libraries')

Imported Libraries


Define a geocoding helper. Nominatim requires a user agent, and the pause before each request avoids overloading the service.

In [8]:
from time import sleep
def get_location(address):
    """Return (latitude, longitude) for an address, or (nan, nan) if geocoding fails."""
    try:
        geolocator = Nominatim(user_agent="cape_town_explorer")
        sleep(2)
        location = geolocator.geocode(address)
        return location.latitude, location.longitude
    except Exception as e:
        print(e)
        return np.nan, np.nan
    

In [9]:

df_geo_loc = pd.DataFrame({"Neighbourhood":[], "Latitude":[], "Longitude":[]})
for b in cpt_merged['Neighbourhood']:
        latitude, longitude = get_location('{}, Cape Town, ZA'.format(b.split(',')[0]))
        row = [b, latitude, longitude]
        #print(row)
        df_geo_loc.loc[len(df_geo_loc)] = row

df_geo_loc


'NoneType' object has no attribute 'latitude'
'NoneType' object has no attribute 'latitude'
Service timed out


KeyboardInterrupt: 
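
The `'NoneType' object has no attribute 'latitude'` messages arise because a geopy geocoder returns `None` when it finds no match, while `Service timed out` is a transient failure. A minimal sketch that separates the two cases follows. The function name `geocode_with_retry`, its parameters, and the stand-in geocoder are assumptions made for illustration; in the notebook the `geocode` argument would be `geolocator.geocode`.

```python
import math
import time

def geocode_with_retry(geocode, address, retries=3, delay=0.0):
    """Return (lat, lon) for address, or (nan, nan) if no match is found.

    `geocode` is any callable that returns an object with .latitude and
    .longitude, or None when there is no match (as geopy geocoders do).
    """
    for attempt in range(retries):
        try:
            location = geocode(address)
        except Exception as exc:  # e.g. a timeout raised by the service
            print('attempt {} failed: {}'.format(attempt + 1, exc))
            time.sleep(delay)
            continue
        if location is None:  # no match: retrying will not help
            return math.nan, math.nan
        return location.latitude, location.longitude
    return math.nan, math.nan


# Stand-in geocoder used only for this demonstration (hypothetical data)
class FakeLocation:
    latitude, longitude = -33.9655556, 18.6444444

def fake_geocode(address):
    return FakeLocation() if address.startswith('Delft') else None

print(geocode_with_retry(fake_geocode, 'Delft, Cape Town, ZA'))
print(geocode_with_retry(fake_geocode, 'Nowhere, Cape Town, ZA'))
```

Retrying only on exceptions keeps the loop short for addresses that simply do not exist in the geocoder's database.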

In [21]:
df_cpt_merge = cpt_merged.merge(df_geo_loc, how='left', left_on='Neighbourhood', right_on='Neighbourhood')
df_cpt_merge.dropna(subset=["Latitude", 'Longitude'], axis=0, inplace=True)
df_cpt_merge.reset_index(drop=True, inplace=True)
df_cpt_merge.shape
df_cpt_merge

(41, 5)
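
Before dropping the rows that could not be geocoded, it can help to see which neighbourhoods they are. The sketch below uses the `indicator` option of `DataFrame.merge` on two hand-made frames; the frames are illustrative assumptions, with one neighbourhood left out of the right-hand side on purpose.

```python
import pandas as pd

left = pd.DataFrame({'Neighbourhood': ['Delft', 'Macassar', 'Langa']})
# Hypothetical geocoding results: 'Langa' is missing on purpose
right = pd.DataFrame({
    'Neighbourhood': ['Delft', 'Macassar'],
    'Latitude': [-33.965556, -34.066116],
    'Longitude': [18.644444, 18.767495],
})

# indicator=True adds a '_merge' column saying where each row came from
merged = left.merge(right, how='left', on='Neighbourhood', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Neighbourhood'].tolist()
print(unmatched)
```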

In [24]:
latitude, longitude = get_location('Cape Town, ZA')
map_cape_town = folium.Map(location=[latitude, longitude], zoom_start=10)
#Add markers to the map
for lat, lng, borough, neighbourhood in zip(df_cpt_merge['Latitude'], df_cpt_merge['Longitude'],
                                           df_cpt_merge['Borough'],
                                           df_cpt_merge['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat,lng],
                       radius=5,
                       popup=label,
                       color='blue',
                       fill=True,
                       fill_color='#3186cc',
                       fill_opacity=0.7,
                       parse_html=False).add_to(map_cape_town)
map_cape_town

Let us work with all the geocoded neighbourhoods in Cape Town

In [25]:
cape_town_data = df_cpt_merge
cape_town_data.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,7100,Cape Flats,Delft,-33.965556,18.644444
1,7130,Cape Flats,Macassar,-34.066116,18.767495
2,7130,Helderberg,"Firgrove, Somerset West",-34.040539,18.455753
3,7140,Helderberg,"Gordon's Bay, Strand",-34.161125,18.868687
4,7405,Northern Suburbs,"Brooklyn, Kensington, Maitland, Rugby",-33.908889,18.479167


In [26]:
CLIENT_ID = 'LH2OJQC5DUE5Y3TQZVD3DXDH2A3ASD5MANEVAKEMGDJN0HWT' # Inputs for 4Square Api
CLIENT_SECRET = 'YRM0HCKVDAG5GJU1JYCKPMUWJD0IZUVF1N0ZY3GRKLC3MXT3'
VERSION = '20190930' # API version, given as a date (YYYYMMDD)

Get the name of the neighbourhood

In [27]:
cape_town_data.loc[0, 'Neighbourhood']

'Delft'

Get the latitude and longitude values of the neighbourhood

In [28]:
neighborhood_latitude = cape_town_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = cape_town_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = cape_town_data.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Delft are -33.9655556, 18.6444444.


Get up to 100 venues within 1,000 m of Delft

In [33]:
LIMIT = 100
radius = 1000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=LH2OJQC5DUE5Y3TQZVD3DXDH2A3ASD5MANEVAKEMGDJN0HWT&client_secret=YRM0HCKVDAG5GJU1JYCKPMUWJD0IZUVF1N0ZY3GRKLC3MXT3&v=20190930&ll=-33.9655556,18.6444444&radius=1000&limit=100'
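
As an alternative to formatting the query string by hand, the same request URL can be assembled from a dictionary of parameters. This is a sketch using only the standard library; `build_explore_url` is a hypothetical helper name and the credentials are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlparse

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL used above from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes reserved characters (the comma in 'll' becomes %2C)
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials, for illustration only
url = build_explore_url('MY_ID', 'MY_SECRET', '20190930', -33.9655556, 18.6444444, 1000, 100)
print(url)

# Decoding the query string recovers the original values
query = parse_qs(urlparse(url).query)
print(query['ll'], query['radius'])
```

Keeping the parameters in a dictionary also makes it easier to read the credentials from the environment instead of hard-coding them in a cell.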

In [34]:
results = get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d99a6e30d2be7002c3d7579'},
  'headerLocation': 'Cape Town International Airport',
  'headerFullLocation': 'Cape Town International Airport, Cape Town',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 1,
  'suggestedBounds': {'ne': {'lat': -33.95655559099999,
    'lng': 18.65527571298506},
   'sw': {'lat': -33.974555609000014, 'lng': 18.633613087014943}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4fcdef06e4b0155a2e9906d0',
       'name': 'Saverite Supermarket',
       'location': {'lat': -33.95756530761719,
        'lng': 18.64678764343262,
        'labeledLatLngs': [{'label': 'display',
          'lat': -33.95756530761719,
          'lng': 18.64678764343262}],
        'distance': 915,
        'cc': 'ZA',
      

In [35]:
# Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Clean the json data and structure it into a dataframe

In [36]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Saverite Supermarket,Convenience Store,-33.957565,18.646788


In [37]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

1 venues were returned by Foursquare.
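
A venue can come back with an empty category list, and indexing `[0]` on it raises an `IndexError`. The sketch below shows one defensive way to read the first category name straight from the nested items. The item structure is inferred from the code above (`v['venue']['categories'][0]['name']`), and both the helper name and the sample items are illustrative assumptions.

```python
def first_category_name(item):
    """Return the first category name of an explore item, or None if absent."""
    categories = item.get('venue', {}).get('categories') or []
    return categories[0].get('name') if categories else None

# Hand-written items shaped like the response above (illustrative only)
items = [
    {'venue': {'name': 'Saverite Supermarket',
               'categories': [{'name': 'Convenience Store'}]}},
    {'venue': {'name': 'Uncategorised place', 'categories': []}},
]
names = [first_category_name(item) for item in items]
print(names)
```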


Explore the neighbourhoods in Cape Town

In [45]:
#Function to repeat the same process for all the neighbourhoods in Cape Town
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


In [46]:
#Now we get all the venues for the Cape Town neighbourhoods
downtown_venues = getNearbyVenues(names=cape_town_data['Neighbourhood'], 
                                  latitudes=cape_town_data['Latitude'], 
                                  longitudes=cape_town_data['Longitude'])

Delft
Macassar
Firgrove, Somerset West
Gordon's Bay, Strand
Brooklyn, Kensington, Maitland, Rugby
Ndabeni, Pinelands
Bothasig, Edgemead
Bloubergstrand, Melkbosstrand, Milnerton, Montague Gardens, Parklands, Table View, West Beach
Langa
Epping
Goodwood, Monte Vista, Thornton
Belhar
Panorama, Parow, Plattekloof
Bellville, Loevenstein
Durbanville
Brackenfell
Kraaifontein
Kuils River
Milnerton
Mowbray, Newlands, Rondebosch, Rosebank
Bishopscourt, Claremont, Harfield Village, Kenilworth
Gugulethu, Nyanga, Philippi
Athlone, Bonteheuwel
Crawford
Kenwyn, Lansdowne, Rondebosch East
Khayelitsha
Mitchells Plain, Samora Machel
Strandfontein
Ottery
Plumstead, Wynberg
Hout Bay, Imizamo Yethu, Llandudno
Constantia, Kreupelbosch, Meadowridge
SouthField
Salt River, Walmer Estate (District Six), Woodstock (including Upper Woodstock), Zonnebloem (District Six)
Observatory
Grassy Park, Lotus River
Lavender Hill
Lakeside, Marina da Gama, Muizenberg, St James
Bergvliet, Diep River, Heathfield, Kirstenhof, R

In [47]:
#Check the size of the resulting dataframe
print(downtown_venues.shape)
downtown_venues.head()

(1138, 7)


Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Delft,-33.965556,18.644444,Debonairs Pizza,-33.976087,18.645183,Pizza Place
1,Delft,-33.965556,18.644444,Designer Fireplaces,-33.964327,18.629529,Business Service
2,Delft,-33.965556,18.644444,Wimpy,-33.97903,18.65189,Burger Joint
3,Macassar,-34.066116,18.767495,Proudly Macassar Pottery,-34.058593,18.774445,Arts & Crafts Store
4,Macassar,-34.066116,18.767495,Corner Bakery,-34.050871,18.768658,Bakery


In [48]:
# Check how many venues were returned for each neighbourhood
downtown_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Athlone, Bonteheuwel",15,15,15,15,15,15
"Bakoven, Bantry Bay, Camps Bay, Clifton, Fresnaye, Green Point, Mouille Point, Sea Point, Three Anchor Bay",54,54,54,54,54,54
Belhar,8,8,8,8,8,8
"Bellville, Loevenstein",32,32,32,32,32,32
"Bergvliet, Diep River, Heathfield, Kirstenhof, Retreat, Steenberg, Tokai",19,19,19,19,19,19
"Bishopscourt, Claremont, Harfield Village, Kenilworth",78,78,78,78,78,78
"Bloubergstrand, Melkbosstrand, Milnerton, Montague Gardens, Parklands, Table View, West Beach",23,23,23,23,23,23
"Bothasig, Edgemead",8,8,8,8,8,8
Brackenfell,19,19,19,19,19,19
"Brooklyn, Kensington, Maitland, Rugby",18,18,18,18,18,18


In [49]:
# Check how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(downtown_venues['Venue Category'].unique())))

There are 162 unique categories.


## Analyzing each neighbourhood

In [50]:
# one hot encoding
downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
downtown_onehot['Neighbourhood'] = downtown_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [downtown_onehot.columns[-1]] + list(downtown_onehot.columns[:-1])
downtown_onehot = downtown_onehot[fixed_columns]

downtown_onehot.head()

Unnamed: 0,Neighbourhood,African Restaurant,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,...,Theater,Toll Plaza,Trail,Train,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vineyard,Wine Bar,Winery
0,Delft,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Delft,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Delft,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Macassar,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Macassar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [51]:
# Examine the new dataframe
downtown_onehot.shape

(1138, 163)

In [52]:
#Group the rows by neighbourhood
downtown_grouped = downtown_onehot.groupby('Neighbourhood').mean().reset_index()
downtown_grouped

Unnamed: 0,Neighbourhood,African Restaurant,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,...,Theater,Toll Plaza,Trail,Train,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vineyard,Wine Bar,Winery
0,"Athlone, Bonteheuwel",0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0
1,"Bakoven, Bantry Bay, Camps Bay, Clifton, Fresn...",0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,...,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Belhar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Bellville, Loevenstein",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0
4,"Bergvliet, Diep River, Heathfield, Kirstenhof,...",0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Bishopscourt, Claremont, Harfield Village, Ken...",0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.012821,0.0,0.0,0.0,0.012821,0.0,0.012821,0.0
6,"Bloubergstrand, Melkbosstrand, Milnerton, Mont...",0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Bothasig, Edgemead",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Brackenfell,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Brooklyn, Kensington, Maitland, Rugby",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [53]:
# Check the new size
downtown_grouped.shape

(41, 163)
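
Averaging the one-hot columns per neighbourhood turns the 0/1 indicators into the share of that neighbourhood's venues in each category. The toy example below (made-up venues, not Foursquare data) sketches the same two steps on a frame small enough to check by hand.

```python
import pandas as pd

# Toy venue list (illustrative): two neighbourhoods, four categories
venues = pd.DataFrame({
    'Neighbourhood': ['Delft', 'Delft', 'Delft', 'Macassar', 'Macassar'],
    'Venue Category': ['Pizza Place', 'Burger Joint', 'Pizza Place', 'Bakery', 'Beach'],
})

# One indicator column per category
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# Mean of the 0/1 indicators = share of a neighbourhood's venues in each category
freq = onehot.groupby('Neighbourhood').mean()
print(freq)
```

Each row of `freq` sums to 1, so neighbourhoods with very few venues (such as one with only three results) end up with coarse, high-valued frequencies.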

## Print each neighbourhood with the top 5 venues

In [54]:
num_top_venues = 5

for hood in downtown_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = downtown_grouped[downtown_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Athlone, Bonteheuwel----
                 venue  freq
0         Burger Joint  0.07
1    Convenience Store  0.07
2        Shopping Mall  0.07
3  Sporting Goods Shop  0.07
4              Stadium  0.07


----Bakoven, Bantry Bay, Camps Bay, Clifton, Fresnaye, Green Point, Mouille Point, Sea Point, Three Anchor Bay----
                venue  freq
0               Hotel  0.15
1                Café  0.09
2          Restaurant  0.07
3  Seafood Restaurant  0.07
4               Beach  0.07


----Belhar----
                  venue  freq
0  Fast Food Restaurant  0.25
1                 Hotel  0.12
2      Basketball Court  0.12
3           Pizza Place  0.12
4      Business Service  0.12


----Bellville, Loevenstein----
                  venue  freq
0  Fast Food Restaurant  0.09
1         Grocery Store  0.06
2             Nightclub  0.06
3           Coffee Shop  0.06
4              Pharmacy  0.06


----Bergvliet, Diep River, Heathfield, Kirstenhof, Retreat, Steenberg, Tokai----
           venue  f

               venue  freq
0              Beach   0.4
1       Soccer Field   0.2
2  Indian Restaurant   0.2
3        Gas Station   0.2
4          Nightclub   0.0




Define a function that returns the most common venue categories for a neighbourhood

In [55]:
#Function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create the new dataframe and display the top 10 venues for each neighbourhood

In [56]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = downtown_grouped['Neighbourhood']

for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Athlone, Bonteheuwel",Burger Joint,Asian Restaurant,Electronics Store,Café,Fast Food Restaurant,Gas Station,Sporting Goods Shop,Coffee Shop,Outdoors & Recreation,Shopping Mall
1,"Bakoven, Bantry Bay, Camps Bay, Clifton, Fresn...",Hotel,Café,Restaurant,Seafood Restaurant,Beach,Coffee Shop,Pizza Place,Scenic Lookout,Italian Restaurant,Ice Cream Shop
2,Belhar,Fast Food Restaurant,Pizza Place,Deli / Bodega,Hotel,Business Service,Supermarket,Basketball Court,Electronics Store,Fish & Chips Shop,Farm
3,"Bellville, Loevenstein",Fast Food Restaurant,Nightclub,Restaurant,Pharmacy,Convenience Store,Grocery Store,Coffee Shop,Breakfast Spot,Shoe Store,Burger Joint
4,"Bergvliet, Diep River, Heathfield, Kirstenhof,...",Gas Station,Grocery Store,Shopping Mall,Bakery,Steakhouse,Breakfast Spot,Portuguese Restaurant,Convenience Store,Arcade,Hotel
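
The column names above rely on a three-item suffix list with a fallback to "th", which is correct up to 20 but would give "21th" beyond that. A general ordinal helper is sketched below; the name `ordinal` is an assumption for illustration and is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Rebuild the column names used for the top-10 table
columns = ['Neighbourhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[1], '|', columns[4], '|', columns[10])
```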


## Cluster Neighbourhoods

In [57]:
# Run k-means to cluster the neighbourhoods into 5 clusters
kclusters = 5

downtown_grouped_clustering = downtown_grouped.drop(columns='Neighbourhood')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_grouped_clustering)
# Check cluster labels generated for each row in the dataframe
kmeans.labels_[:]

array([3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 3, 3, 3, 1, 3, 3, 4, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 0],
      dtype=int32)
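
A quick tally of the labels shows how the neighbourhoods are distributed across the clusters. The list below is copied from the output above, and the counting uses only the standard library.

```python
from collections import Counter

# Labels copied from the k-means output above
labels = [3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1,
          1, 3, 3, 3, 1, 3, 3, 4, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 0]

sizes = Counter(labels)
for cluster, size in sorted(sizes.items()):
    print('cluster {}: {} neighbourhoods'.format(cluster, size))
```

With these labels, cluster 1 holds 30 of the 41 rows, cluster 3 holds 8, and clusters 0, 2 and 4 contain a single neighbourhood each, so it may be worth experimenting with other values of `kclusters`.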

Create a new dataframe that includes the cluster as well as the top 10 venues for each neighbourhood

In [60]:
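# add clustering labels (guarded so that re-running the cell does not fail on a duplicate column)
if 'Cluster Labels' not in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)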

downtown_merged = cape_town_data
# Merge the data
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
downtown_merged.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,7100,Cape Flats,Delft,-33.965556,18.644444,2,Pizza Place,Burger Joint,Business Service,Winery,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Farm,Event Space,Event Service
1,7130,Cape Flats,Macassar,-34.066116,18.767495,4,Bakery,Arts & Crafts Store,Beach,Electronics Store,Flea Market,Fishing Store,Fish & Chips Shop,Fast Food Restaurant,Farm,Event Space
2,7130,Helderberg,"Firgrove, Somerset West",-34.040539,18.455753,1,Gas Station,Grocery Store,Steakhouse,Shopping Mall,Bakery,Hotel,Business Service,Thai Restaurant,Portuguese Restaurant,Restaurant
3,7140,Helderberg,"Gordon's Bay, Strand",-34.161125,18.868687,1,Seafood Restaurant,Shopping Mall,Burger Joint,Coffee Shop,Restaurant,Breakfast Spot,Scenic Lookout,Café,City,Bed & Breakfast
4,7405,Northern Suburbs,"Brooklyn, Kensington, Maitland, Rugby",-33.908889,18.479167,1,Bus Station,Hotel,Cultural Center,Gas Station,Flea Market,Café,Spa,Restaurant,Beach,Climbing Gym


Now let us visualize the resulting clusters

In [61]:
# Cluster labels are categorical, so rows without a label are dropped rather than imputed with a mean
downtown_merged = downtown_merged.dropna(subset=['Cluster Labels']).copy()
downtown_merged['Cluster Labels'] = downtown_merged['Cluster Labels'].astype('int')

In [62]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(downtown_merged['Latitude'], downtown_merged['Longitude'], 
                                  downtown_merged['Neighbourhood'], downtown_merged['Cluster Labels']):
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
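
When a map loop needs to guard against a missing cluster label, an equality test against NaN is not enough, because NaN compares unequal to everything, including itself. The sketch below demonstrates this with the standard library; `label_or_default` is a hypothetical helper, not part of the notebook.

```python
import math

missing = float('nan')

print(missing == missing)       # equality never matches NaN
print(math.isnan(missing))      # an explicit check detects it

def label_or_default(cluster, default=0):
    """Return an int cluster label, falling back to `default` for NaN."""
    return default if math.isnan(cluster) else int(cluster)

print(label_or_default(3.0), label_or_default(missing))
```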

In [63]:
#examine the clusters
downtown_merged.loc[downtown_merged['Cluster Labels'] == 0, 
                     downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,Cape Flats,0,Beach,Gas Station,Indian Restaurant,Soccer Field,Deli / Bodega,Event Service,Flea Market,Fishing Store,Fish & Chips Shop,Fast Food Restaurant


In [64]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 1, 
                     downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Helderberg,1,Gas Station,Grocery Store,Steakhouse,Shopping Mall,Bakery,Hotel,Business Service,Thai Restaurant,Portuguese Restaurant,Restaurant
3,Helderberg,1,Seafood Restaurant,Shopping Mall,Burger Joint,Coffee Shop,Restaurant,Breakfast Spot,Scenic Lookout,Café,City,Bed & Breakfast
4,Northern Suburbs,1,Bus Station,Hotel,Cultural Center,Gas Station,Flea Market,Café,Spa,Restaurant,Beach,Climbing Gym
5,Southern Suburbs,1,Grocery Store,Gym,Paper / Office Supplies Store,Seafood Restaurant,Gas Station,Burger Joint,Fast Food Restaurant,Restaurant,Pizza Place,Café
7,West Coast,1,Seafood Restaurant,Café,Restaurant,African Restaurant,Steakhouse,Hotel,Juice Bar,Lounge,Dessert Shop,Department Store
9,Cape Flats,1,Gas Station,Nightclub,Fast Food Restaurant,Grocery Store,Train Station,Hotel,Liquor Store,Diner,Farm,Event Space
10,Northern Suburbs,1,Fast Food Restaurant,Hotel,Steakhouse,Casino,Diner,Shopping Mall,Seafood Restaurant,Train Station,Department Store,Electronics Store
11,Northern Suburbs,1,Fast Food Restaurant,Pizza Place,Deli / Bodega,Hotel,Business Service,Supermarket,Basketball Court,Electronics Store,Fish & Chips Shop,Farm
12,Northern Suburbs,1,Shopping Mall,Seafood Restaurant,Grocery Store,Pizza Place,Fast Food Restaurant,Steakhouse,Supermarket,Portuguese Restaurant,Italian Restaurant,Department Store
13,Northern Suburbs,1,Fast Food Restaurant,Nightclub,Restaurant,Pharmacy,Convenience Store,Grocery Store,Coffee Shop,Breakfast Spot,Shoe Store,Burger Joint


In [65]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 2, 
                     downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Cape Flats,2,Pizza Place,Burger Joint,Business Service,Winery,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Farm,Event Space,Event Service


In [66]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 3, 
                     downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Northern Suburbs,3,Shopping Mall,Convenience Store,Grocery Store,Dance Studio,Pub,Seafood Restaurant,Stadium,Winery,Donut Shop,Farm
8,Cape Flats,3,Convenience Store,Fast Food Restaurant,Shopping Mall,Seafood Restaurant,Steakhouse,Pizza Place,Grocery Store,Coffee Shop,Cultural Center,Electronics Store
16,Northern Suburbs,3,Fast Food Restaurant,Train Station,Breakfast Spot,Coffee Shop,Seafood Restaurant,Multiplex,Music Store,Shopping Mall,Clothing Store,Burger Joint
17,Northern Suburbs,3,Shopping Mall,Convenience Store,Golf Course,Seafood Restaurant,Breakfast Spot,Food Court,Football Stadium,Bed & Breakfast,Restaurant,Burger Joint
22,Cape Flats,3,Burger Joint,Asian Restaurant,Electronics Store,Café,Fast Food Restaurant,Gas Station,Sporting Goods Shop,Coffee Shop,Outdoors & Recreation,Shopping Mall
25,Cape Flats,3,Convenience Store,Burger Joint,Shopping Mall,Coffee Shop,Lounge,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Farm,Event Space
28,Cape Flats,3,Shopping Mall,Grocery Store,Train Station,Portuguese Restaurant,Gas Station,Breakfast Spot,Winery,Diner,Fast Food Restaurant,Farm
36,Cape Flats,3,Fast Food Restaurant,Harbor / Marina,Train Station,Convenience Store,Department Store,Gas Station,Deli / Bodega,Dance Studio,Design Studio,Dessert Shop


In [67]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 4, 
                     downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Cape Flats,4,Bakery,Arts & Crafts Store,Beach,Electronics Store,Flea Market,Fishing Store,Fish & Chips Shop,Fast Food Restaurant,Farm,Event Space


In [68]:
#Save the data to be used in the final workbooks for the capstone project
downtown_merged.to_csv('cape_town_data.csv')
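
By default `to_csv` also writes the row index, which shows up as an `Unnamed: 0` column when the file is read back. The sketch below compares the default with `index=False` using an in-memory buffer; the small frame and the helper name `round_trip_columns` are illustrative assumptions.

```python
import io
import pandas as pd

def round_trip_columns(frame, **to_csv_kwargs):
    """Write frame to an in-memory CSV, read it back, and return the column names."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer).columns.tolist()

df = pd.DataFrame({'PostalCode': [7100, 7130], 'Borough': ['Cape Flats', 'Cape Flats']})

cols_default = round_trip_columns(df)                # index written as a nameless column
cols_no_index = round_trip_columns(df, index=False)  # index omitted
print(cols_default)
print(cols_no_index)
```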