## CAPSTONE PROJECT - OPENING A NEW MALL IN KAMPALA

# Introduction: Business problem

In this project we look for an optimal location for a new shopping mall in Kampala. The report is aimed at property developers who plan to open a shopping mall in the city.

Since there are only a limited number of shopping malls in and around Kampala, we first look for suburbs that have no shopping mall in their vicinity. Among the suburbs that meet this first condition, we are particularly interested in those with high property values and rental rates.

We use data science techniques to identify the most promising locations based on these criteria. The advantages of each area are then set out clearly so that stakeholders can choose the best final location.

## Import Libraries

In [1]:
# import pandas, numpy and json for loading and handling the data
import pandas as pd
import numpy as np
import json
#Importing libraries to be used
#!conda install -c conda-forge geopy --yes 

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests 

from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans # import k-means from clustering stage

#!conda install -c conda-forge folium=0.5.0 --yes 

import folium # map rendering library

print('Libraries imported.')


Libraries imported.


### Load 'latlong.xlsx' from the working directory
'latlong.xlsx' contains the pre-prepared data: suburb name, average rental price, latitude and longitude.

In [2]:
#working directory
import os
cwd = os.getcwd()
cwd

'C:\\Users\\user'

In [4]:
# read_excel already returns a DataFrame, so no further conversion is needed
df = pd.read_excel('latlong.xlsx')

print(df.shape)
df.head()

(32, 4)


Unnamed: 0,Suburb,Average rental prices,Latitude,Longitude
0,Bugolobi,2448924,0.313898,32.622041
1,Bukasa,856939,-0.43438,32.502563
2,Bukoto,1100000,-0.39288,31.63031
3,Bunga,1475000,0.272844,32.620329
4,Buziga,1216667,0.258917,32.617698


The table above lists the most prominent suburbs of Kampala together with their average rental charges, taken from the rental index published by Knight Frank, a property development company in Uganda, and from other web sources.
The latitude and longitude of each suburb were looked up manually with the online tool https://www.latlong.net/, because the Python geocoder library kept returning wrong or unrelated coordinates.

In [5]:
address = 'Kampala'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Kampala are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Kampala are 0.3177137, 32.5813539.
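
Because the suburb coordinates were entered by hand, a quick distance check against the city centre can help catch mistyped values. The sketch below is only an illustration: it uses the centre printed above and the five rows shown in `df.head()`, and the 50 km cut-off `MAX_KM` is an assumed value, not something taken from the data source.

```python
import math

# Kampala centre as printed by the geocoding cell above.
KAMPALA = (0.3177137, 32.5813539)

# First five rows of latlong.xlsx as shown in df.head().
suburbs = {
    "Bugolobi": (0.313898, 32.622041),
    "Bukasa": (-0.43438, 32.502563),
    "Bukoto": (-0.39288, 31.63031),
    "Bunga": (0.272844, 32.620329),
    "Buziga": (0.258917, 32.617698),
}

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

MAX_KM = 50  # assumed cut-off for "plausibly part of greater Kampala"

# Collect every suburb whose coordinates lie further than MAX_KM from the centre.
suspect = [name for name, point in suburbs.items()
           if haversine_km(KAMPALA, point) > MAX_KM]
print(suspect)
```

With the five rows shown, this flags Bukasa and Bukoto, whose negative latitudes place them well outside the city, so those two entries may be worth re-checking.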


## Create a map of Kampala showing the listed suburbs

In [6]:
# create map of Kampala indicating the suburbs using latitude and longitude values
map_kampala = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, hood in zip(df['Latitude'], df['Longitude'], df['Suburb']):
    label = '{}'.format(hood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_kampala)  
    
map_kampala

In [8]:
#save map as html
map_kampala.save('map_kampala.html')

## Use the Foursquare API to retrieve venues in the different suburbs

In [7]:
# define Foursquare Credentials and Version
CLIENT_ID = '0TSNTEYQISCUSFIFLM4JKYP4ONUK5L01ZKNNXG4CBQZB4AHX' # your Foursquare ID
CLIENT_SECRET = 'KIZN1LIXOV4JG0BVNUT3QIKY33EQBZH54CSYGDN3DGIVHTXA' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ', CLIENT_ID)
print('CLIENT_SECRET:', CLIENT_SECRET)

Your credentials:
CLIENT_ID:  0TSNTEYQISCUSFIFLM4JKYP4ONUK5L01ZKNNXG4CBQZB4AHX
CLIENT_SECRET: KIZN1LIXOV4JG0BVNUT3QIKY33EQBZH54CSYGDN3DGIVHTXA


In [8]:
radius = 2000
LIMIT = 100

venues = []

for lat, lng, hood in zip(df['Latitude'], df['Longitude'], df['Suburb']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        lng,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            hood,
            lat, 
            lng, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
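
The loop above indexes straight into the response, so a failed request or a venue without a category would raise a `KeyError` or `IndexError`. A more defensive parser could look like the sketch below. The payload shape is assumed from the indexing used in the loop, `extract_venues` is a hypothetical helper, and the sample payload is hand-written for illustration.

```python
def extract_venues(payload, suburb, lat, lng):
    """Flatten one explore response into
    (suburb, lat, lng, name, venue_lat, venue_lng, category) rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # Some venues may come back without any category; keep them with None.
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((suburb, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

# Hand-written payload in the assumed shape, for illustration only.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Alchemist",
               "location": {"lat": 0.3186, "lng": 32.6220},
               "categories": [{"name": "Bar"}]}},
    {"venue": {"name": "Unnamed place",
               "location": {"lat": 0.3190, "lng": 32.6200},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Bugolobi", 0.313898, 32.622041)
# A payload without a "response" key (for example an error body) yields no rows.
empty = extract_venues({"meta": {"code": 429}}, "Bugolobi", 0.313898, 32.622041)
print(rows)
print(empty)
```

In the loop, `venues.extend(extract_venues(requests.get(url).json(), hood, lat, lng))` would then replace the direct indexing.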

In [64]:
venues

[('Bugolobi',
  0.313898,
  32.622041,
  'Alchemist',
  0.31864252619093086,
  32.62204107079856,
  'Bar'),
 ('Bugolobi',
  0.313898,
  32.622041,
  'Planet Yogurt',
  0.31989734517698715,
  32.61768588959705,
  'Frozen Yogurt Shop'),
 ('Bugolobi',
  0.313898,
  32.622041,
  'MONOt Bar',
  0.3187015991906804,
  32.62219520024536,
  'Bar'),
 ('Bugolobi',
  0.313898,
  32.622041,
  'Jazz Ville',
  0.31874864577159606,
  32.62007476148969,
  'Jazz Club'),
 ('Bugolobi',
  0.313898,
  32.622041,
  'Jazzville',
  0.31879885051933354,
  32.62007758940084,
  'Performing Arts Venue'),
 ('Bugolobi',
  0.313898,
  32.622041,
  'Liquid Silk',
  0.3203690106192498,
  32.617784080318046,
  'Nightclub'),
 ('Bugolobi',
  0.313898,
  32.622041,
  'Java Coffee & Tea',
  0.3199489322503677,
  32.61758386189637,
  'Café'),
 ('Bugolobi',
  0.313898,
  32.622041,
  'Bugolobi Village Mall',
  0.3199664920348895,
  32.61754734957329,
  'Shopping Mall'),
 ('Bugolobi',
  0.313898,
  32.622041,
  'Bugolobi Villa

In [65]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Suburb', 'Suburb_lat', 'Suburb_long', 'Venue_name', 'Venue_lat','Venue_long', 'VenueCategory']

print(venues_df.shape)
venues_df.head()
#print('There are {} venues returned,'.format(venues_df[0:]))


(835, 7)


Unnamed: 0,Suburb,Suburb_lat,Suburb_long,Venue_name,Venue_lat,Venue_long,VenueCategory
0,Bugolobi,0.313898,32.622041,Alchemist,0.318643,32.622041,Bar
1,Bugolobi,0.313898,32.622041,Planet Yogurt,0.319897,32.617686,Frozen Yogurt Shop
2,Bugolobi,0.313898,32.622041,MONOt Bar,0.318702,32.622195,Bar
3,Bugolobi,0.313898,32.622041,Jazz Ville,0.318749,32.620075,Jazz Club
4,Bugolobi,0.313898,32.622041,Jazzville,0.318799,32.620078,Performing Arts Venue


In total, 835 venues were returned across all the listed suburbs, which is a small number compared with larger cities.

### Checking how many venues were returned for each suburb

In [11]:
venues_df.groupby(["Suburb"]).count()

Suburb,Suburb_lat,Suburb_long,Venue_name,Venue_lat,Venue_long,VenueCategory
Bugolobi,33,33,33,33,33,33
Bukasa,2,2,2,2,2,2
Bukoto,3,3,3,3,3,3
Bunga,13,13,13,13,13,13
Buziga,16,16,16,16,16,16
Bweyogerere,6,6,6,6,6,6
Entebbe,29,29,29,29,29,29
Kansanga,38,38,38,38,38,38
Kira,4,4,4,4,4,4
Kireka,19,19,19,19,19,19


In [12]:
# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Bar', 'Frozen Yogurt Shop', 'Jazz Club', 'Performing Arts Venue',
       'Nightclub', 'Café', 'Shopping Mall', 'Beer Garden',
       'Mexican Restaurant', 'Lounge', 'African Restaurant',
       'Convenience Store', 'Whisky Bar', 'Bed & Breakfast', 'Pub',
       'Italian Restaurant', 'Fast Food Restaurant', 'Hotel', 'Market',
       'Stadium', 'Cocktail Bar', 'Department Store', 'Pharmacy',
       'Hostel', 'BBQ Joint', 'Taxi Stand', 'Bakery', 'Resort', 'Gym',
       'Japanese Restaurant', 'Food & Drink Shop', 'Coffee Shop',
       'French Restaurant', 'Sports Bar', 'Beach', 'Restaurant', 'Diner',
       'Outdoor Supply Store', 'Athletics & Sports', 'Soccer Stadium',
       'American Restaurant', 'Theme Restaurant', 'Zoo',
       'Botanical Garden', 'Golf Course', 'Pool', 'Lake', 'Movie Theater',
       'Thai Restaurant', 'Wine Bar', 'Burger Joint', 'Neighborhood',
       'Theme Park', 'Grocery Store', 'Health & Beauty Service',
       'Indian Restaurant', 'Pizza Place', 'Chines

In [13]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

In [14]:
# one hot encoding
kl_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add Suburb column back to dataframe
kl_onehot['Suburb'] = venues_df['Suburb'] 

# move Suburb column to the first column
fixed_columns = [kl_onehot.columns[-1]] + list(kl_onehot.columns[:-1])
kl_onehot = kl_onehot[fixed_columns]

print(kl_onehot.shape)
kl_onehot.head()

(835, 121)


Unnamed: 0,Suburb,African Restaurant,Airport Lounge,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bakery,...,Tapas Restaurant,Taxi Stand,Thai Restaurant,Theme Park,Theme Restaurant,Turkish Restaurant,Video Store,Whisky Bar,Wine Bar,Zoo
0,Bugolobi,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Bugolobi,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Bugolobi,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Bugolobi,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bugolobi,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [16]:
kl_grouped = kl_onehot.groupby(["Suburb"]).mean().reset_index()

print(kl_grouped.shape)
kl_grouped.head()

(32, 121)


Unnamed: 0,Suburb,African Restaurant,Airport Lounge,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bakery,...,Tapas Restaurant,Taxi Stand,Thai Restaurant,Theme Park,Theme Restaurant,Turkish Restaurant,Video Store,Whisky Bar,Wine Bar,Zoo
0,Bugolobi,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0
1,Bukasa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bukoto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,...,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bunga,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Buziga,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
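
The grouped values above are the share of each suburb's venues that fall into each category, because the mean of a 0/1 indicator column is the fraction of rows where it is 1. The sketch below shows this on a toy frame with made-up suburbs `A` and `B` (not the Foursquare results).

```python
import pandas as pd

# Toy venue list (hypothetical rows, not the Foursquare results).
toy = pd.DataFrame({
    "Suburb": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Bar", "Bar", "Shopping Mall", "Café", "Bar", "Hotel"],
})

# One indicator column per category, as in the one-hot cell above.
onehot = pd.get_dummies(toy[["VenueCategory"]], prefix="", prefix_sep="")
onehot["Suburb"] = toy["Suburb"]

# The mean of an indicator column is the fraction of rows where it is set.
share = onehot.groupby("Suburb").mean()
print(share["Shopping Mall"])
```

Suburb `A` has one mall among four venues, so its share is 0.25, and `B` has none. The same reading applies to the real data: Bugolobi's 0.060606 corresponds to 2 mall entries among its 33 venues.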


In [29]:
kl_mall = kl_grouped[["Suburb","Shopping Mall"]]
kl_mall.head()

Unnamed: 0,Suburb,Shopping Mall
0,Bugolobi,0.060606
1,Bukasa,0.0
2,Bukoto,0.0
3,Bunga,0.0
4,Buziga,0.0


In [30]:
# work on an explicit copy so the new column is not written to a slice of kl_grouped
kl_mall = kl_mall.copy()
kl_mall['Rental charges'] = df['Average rental prices']
kl_mall.head()


Unnamed: 0,Suburb,Shopping Mall,Rental charges
0,Bugolobi,0.060606,2448924
1,Bukasa,0.0,856939
2,Bukoto,0.0,1100000
3,Bunga,0.0,1475000
4,Buziga,0.0,1216667
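
Assigning a column taken from `df` pairs rows by index label, not by suburb name, so it only gives the right rents if both frames list the suburbs in exactly the same order. The Kireka row, for instance, reads 946667 in the cluster-label table further down but 520000 after the later join on `Suburb`, which would be consistent with an ordering mismatch. A key-based merge avoids that assumption. The miniature frames below reuse a few values from the tables in this notebook, but their row order is hypothetical.

```python
import pandas as pd

# Miniature frames whose row order differs on purpose (order is hypothetical).
mall = pd.DataFrame({"Suburb": ["Kira", "Kireka", "Kisaasi"],
                     "Shopping Mall": [0.0, 0.0, 0.0625]})
rent = pd.DataFrame({"Suburb": ["Kira", "Kisaasi", "Kireka"],
                     "Average rental prices": [633333, 946667, 520000]})

# Index-based assignment pairs row 1 with row 1, whatever the suburb names are.
by_index = mall.copy()
by_index["Rental charges"] = rent["Average rental prices"]

# Key-based merge pairs rows that share the same Suburb value;
# validate="one_to_one" raises if a suburb appears twice on either side.
by_key = mall.merge(
    rent.rename(columns={"Average rental prices": "Rental charges"}),
    on="Suburb", how="left", validate="one_to_one")

print(by_index)
print(by_key)
```

In `by_index` Kireka receives Kisaasi's rent, while `by_key` keeps each rent with its own suburb.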


In [37]:
from sklearn import preprocessing
# set number of clusters
kclusters = 3

# keep only the numeric features and standardise them so both carry comparable weight
kl_clustering = kl_mall.drop(columns=["Suburb"])
kl_clustering = preprocessing.StandardScaler().fit_transform(kl_clustering)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_clustering)

# check cluster labels generated for the first 10 rows of the dataframe
kmeans.labels_[0:10]

array([1, 2, 2, 2, 2, 2, 1, 0, 2, 2])
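
The scaling step matters because the two features live on very different scales: mall shares are below 1, while rents are in the millions, so unscaled Euclidean distances would be driven almost entirely by rent. `StandardScaler` rescales each column to zero mean and unit variance. A minimal NumPy sketch of the same transformation, using the five `kl_mall` rows shown earlier:

```python
import numpy as np

# First five rows of kl_mall as shown above: mall share and rental charge.
features = np.array([
    [0.060606, 2448924],
    [0.0,       856939],
    [0.0,      1100000],
    [0.0,      1475000],
    [0.0,      1216667],
], dtype=float)

# Column-wise z-score using the population standard deviation (ddof=0),
# which is what StandardScaler uses as well.
mean = features.mean(axis=0)
std = features.std(axis=0)
scaled = (features - mean) / std

print(scaled.round(3))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates the distance computation.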

In [38]:
# create a new dataframe that holds the mall share, rental charge and cluster label for each suburb
kl_merged = kl_mall.copy()

# add clustering labels
kl_merged["Cluster Labels"] = kmeans.labels_

In [39]:
kl_merged

Unnamed: 0,Suburb,Shopping Mall,Rental charges,Cluster Labels
0,Bugolobi,0.060606,2448924,1
1,Bukasa,0.0,856939,2
2,Bukoto,0.0,1100000,2
3,Bunga,0.0,1475000,2
4,Buziga,0.0,1216667,2
5,Bweyogerere,0.0,550000,2
6,Entebbe,0.034483,2882138,1
7,Kansanga,0.052632,1141667,0
8,Kira,0.0,633333,2
9,Kireka,0.0,946667,2


In [51]:
# merge kl_merged with df to add latitude/longitude for each suburb
kl_merged = kl_merged.join(df.set_index("Suburb"), on="Suburb")

print(kl_merged.shape)
# 'Average rental prices' from df now carries the rent, so drop 'Rental charges'
kl_merged.drop(columns=['Rental charges'], inplace=True)
kl_merged.head() # check the last columns!

(32, 7)


Unnamed: 0,Suburb,Shopping Mall,Cluster Labels,Average rental prices,Latitude,Longitude
0,Bugolobi,0.060606,1,2448924,0.313898,32.622041
1,Bukasa,0.0,2,856939,-0.43438,32.502563
2,Bukoto,0.0,2,1100000,-0.39288,31.63031
3,Bunga,0.0,2,1475000,0.272844,32.620329
4,Buziga,0.0,2,1216667,0.258917,32.617698


In [56]:
kl_merged.sort_values(['Cluster Labels'], inplace=True)
kl_merged.head()

Unnamed: 0,Suburb,Shopping Mall,Cluster Labels,Average rental prices,Latitude,Longitude
15,Kyambogo,0.1,0,560000,0.3473,32.63088
27,Nakawa,0.107143,0,660000,0.3338,32.61847
25,Najjera,0.129032,0,750000,0.379828,32.626282
24,Naguru,0.09375,0,3287500,0.341905,32.606366
20,Mengo,0.058824,0,717000,0.31075,32.55886


In [57]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(kl_merged['Latitude'], kl_merged['Longitude'], kl_merged['Suburb'], kl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [59]:
map_clusters.save('clusters.html')

# Examine clusters

## Cluster 0

In [61]:
kl_merged.loc[kl_merged['Cluster Labels'] == 0]

Unnamed: 0,Suburb,Shopping Mall,Cluster Labels,Average rental prices,Latitude,Longitude
15,Kyambogo,0.1,0,560000,0.3473,32.63088
27,Nakawa,0.107143,0,660000,0.3338,32.61847
25,Najjera,0.129032,0,750000,0.379828,32.626282
24,Naguru,0.09375,0,3287500,0.341905,32.606366
20,Mengo,0.058824,0,717000,0.31075,32.55886
16,Lubowa,0.111111,0,2943828,0.233333,32.566667
30,Ntinda,0.075,0,841667,0.354428,32.613638
14,Kyaliwajjala,0.166667,0,591667,0.380206,32.646827
12,Kiwatule,0.09375,0,741667,0.372741,32.629864
10,Kisaasi,0.0625,0,946667,0.362895,32.599734


## Cluster 1

In [62]:
kl_merged.loc[kl_merged['Cluster Labels'] == 1]

Unnamed: 0,Suburb,Shopping Mall,Cluster Labels,Average rental prices,Latitude,Longitude
0,Bugolobi,0.060606,1,2448924,0.313898,32.622041
13,Kololo,0.022472,1,3500000,0.33781,32.58636
6,Entebbe,0.034483,1,2882138,0.061172,32.469856
28,Nalya,0.119048,1,816667,0.360831,32.628884
18,Lweza,0.0,1,2080000,0.224644,32.547464
19,Makerere,0.037037,1,591667,0.337735,32.562941
26,Nakasero,0.07,1,4000000,0.32045,32.57635


## Cluster 2

In [63]:
kl_merged.loc[kl_merged['Cluster Labels'] == 2]

Unnamed: 0,Suburb,Shopping Mall,Cluster Labels,Average rental prices,Latitude,Longitude
29,Namugongo,0.0,2,641667,0.390715,32.654199
1,Bukasa,0.0,2,856939,-0.43438,32.502563
2,Bukoto,0.0,2,1100000,-0.39288,31.63031
3,Bunga,0.0,2,1475000,0.272844,32.620329
11,Kitende,0.0,2,955000,0.2,32.533333
22,Mutungo,0.0,2,800000,0.315914,32.642562
9,Kireka,0.0,2,520000,0.3515,32.64518
4,Buziga,0.0,2,1216667,0.258917,32.617698
17,Luzira,0.0,2,1144293,0.297493,32.652819
5,Bweyogerere,0.0,2,550000,0.352032,32.673612


### Observations

Most of the shopping malls are concentrated in the central areas of Kampala. Cluster 0 has the highest share of shopping malls among its venues and cluster 1 a moderate share, while cluster 2 has no shopping mall in any of its suburbs. Cluster 2 therefore represents the areas with the greatest potential for a new shopping mall, since there is no competition from existing malls. Malls in cluster 0, by contrast, are likely to face strong competition because of their high concentration relative to the other clusters.

Seen from another angle, shopping malls are more common in suburbs close to the city centre, while outlying suburbs still have very few.

This project therefore recommends that property developers use these findings to open new shopping malls in the cluster 2 suburbs, which appear to have no competition. Developers with a unique selling proposition that sets them apart can also consider the cluster 1 suburbs, where competition is moderate. Lastly, developers are advised to avoid the cluster 0 suburbs, which already have a high concentration of shopping malls and are likely to face intense competition.
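
The comparison between clusters can be made explicit by aggregating the mall share and rent per cluster label. The sketch below uses only six rows copied from the cluster tables above, so its numbers illustrate the method and are not the full-cluster averages. The output column names `suburbs`, `mean_mall_share` and `mean_rent` are chosen here for illustration.

```python
import pandas as pd

# A few rows copied from the cluster tables above (not the full 32 suburbs).
sample = pd.DataFrame({
    "Suburb": ["Kyambogo", "Nakawa", "Bugolobi", "Kololo", "Bunga", "Buziga"],
    "Shopping Mall": [0.1, 0.107143, 0.060606, 0.022472, 0.0, 0.0],
    "Cluster Labels": [0, 0, 1, 1, 2, 2],
    "Average rental prices": [560000, 660000, 2448924, 3500000,
                              1475000, 1216667],
})

# One row per cluster: size, mean mall share and mean rent.
summary = sample.groupby("Cluster Labels").agg(
    suburbs=("Suburb", "count"),
    mean_mall_share=("Shopping Mall", "mean"),
    mean_rent=("Average rental prices", "mean"),
)
print(summary)
```

Running the same aggregation on the full `kl_merged` frame would give the figures behind the observations above.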

# FURTHER ANALYSIS
Because the data set is very small, we can also analyse it directly by keeping only the suburbs that have no shopping mall and have high rental charges.

## NB: remember we are only interested in suburbs without the venue category "Shopping Mall"

In [66]:
# return all venues categorised as shopping malls, together with their suburbs
venue_filt = venues_df.loc[venues_df['VenueCategory'] == 'Shopping Mall']
print(venue_filt.shape)
venue_filt

(51, 7)


Unnamed: 0,Suburb,Suburb_lat,Suburb_long,Venue_name,Venue_lat,Venue_long,VenueCategory
7,Bugolobi,0.313898,32.622041,Bugolobi Village Mall,0.319966,32.617547,Shopping Mall
8,Bugolobi,0.313898,32.622041,Bugolobi Village Mall,0.319941,32.617637,Shopping Mall
82,Entebbe,0.061172,32.469856,Victoria Mall,0.066485,32.476442,Shopping Mall
121,Kansanga,0.292001,32.604574,Tirupati Mazima Mall,0.299438,32.596503,Shopping Mall
122,Kansanga,0.292001,32.604574,Kingsgate Mall Kabalagala,0.298017,32.601367,Shopping Mall
156,Kisaasi,0.362895,32.599734,Tuskys,0.354983,32.612668,Shopping Mall
172,Kisaasi,0.362895,32.599734,Haruna Towers,0.354242,32.612022,Shopping Mall
184,Kiwatule,0.372741,32.629864,Quality Village -Namugongo,0.375563,32.643473,Shopping Mall
189,Kiwatule,0.372741,32.629864,Metroplex Mall,0.365035,32.633063,Shopping Mall
195,Kiwatule,0.372741,32.629864,u-save supermarket,0.370473,32.624387,Shopping Mall


Foursquare returns only 51 venues in the "Shopping Mall" category. We now extract the suburbs that already have shopping malls.

In [67]:
venue_remove=venue_filt['Suburb'].unique()
venue_remove

array(['Bugolobi', 'Entebbe ', 'Kansanga', 'Kisaasi', 'Kiwatule',
       'Kyaliwajjala', 'Lubowa', 'Naguru', 'Najjera', 'Nalya', 'Ntinda',
       'Seguku', 'Makerere', 'Kololo', 'Mengo', 'Nakasero', 'Nakawa',
       'Kyambogo'], dtype=object)

### Drop all the suburbs that already have a shopping mall from the main DataFrame

In [68]:
df = df.set_index('Suburb')
# drop every suburb listed in venue_remove (the suburbs that already have a mall)
df = df.drop(index=venue_remove)
df

Suburb,Average rental prices,Latitude,Longitude
Bukasa,856939,-0.43438,32.502563
Bukoto,1100000,-0.39288,31.63031
Bunga,1475000,0.272844,32.620329
Buziga,1216667,0.258917,32.617698
Bweyogerere,550000,0.352032,32.673612
Kira,633333,0.400341,32.639185
Kitende,955000,0.2,32.533333
Luzira,1144293,0.297493,32.652819
Lweza,2080000,0.224644,32.547464
Munyonyo,2010582,0.244545,32.62175
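
Dropping by label needs the names to match exactly, and one entry in the list above, `'Entebbe '`, carries a trailing space; `drop` raises a `KeyError` for labels it cannot find. A hedged alternative is to normalise the names and filter with `isin`. The frame and list below are hypothetical miniatures that reuse a few values from this notebook.

```python
import pandas as pd

# Hypothetical miniature of the suburb table and of venue_remove.
suburbs = pd.DataFrame({
    "Suburb": ["Bugolobi", "Bunga", "Entebbe", "Lweza"],
    "Average rental prices": [2448924, 1475000, 2882138, 2080000],
})
has_mall = ["Bugolobi", "Entebbe "]  # note the trailing space

# Strip surrounding whitespace on both sides before comparing names.
clean_names = suburbs["Suburb"].str.strip()
clean_has_mall = {name.strip() for name in has_mall}

# Keep only the suburbs that are not in the "already has a mall" set.
no_mall = suburbs[~clean_names.isin(clean_has_mall)].reset_index(drop=True)
print(no_mall)
```

Unlike `drop`, this filter silently ignores names that are missing from the table, which may or may not be the desired behaviour.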


In [69]:
df=df.reset_index()
print(df.shape)
df.head()

(14, 4)


Unnamed: 0,Suburb,Average rental prices,Latitude,Longitude
0,Bukasa,856939,-0.43438,32.502563
1,Bukoto,1100000,-0.39288,31.63031
2,Bunga,1475000,0.272844,32.620329
3,Buziga,1216667,0.258917,32.617698
4,Bweyogerere,550000,0.352032,32.673612


At this point we can create a map showing the suburbs without shopping malls.

In [70]:
# create map of Kampala indicating the suburbs using latitude and longitude values
map_kampala = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, hood in zip(df['Latitude'], df['Longitude'], df['Suburb']):
    label = '{}'.format(hood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_kampala)  
    
map_kampala

In [71]:
map_kampala.save('cluster_nomalls.html')

## Objective 2
As stated in our second objective, the intended location should be in a middle- to high-income suburb. We therefore select only suburbs with average rental charges above 1,000,000 Ugandan shillings, on the assumption that residents who can afford such rents can also afford to shop in a high-end shopping mall.

In [74]:
# keep only suburbs with average rental prices above 1,000,000
df_lan =df[df['Average rental prices'] > 1000000].reset_index(drop=True)
print(df_lan.shape)
df_lan

(7, 4)


Unnamed: 0,Suburb,Average rental prices,Latitude,Longitude
0,Bukoto,1100000,-0.39288,31.63031
1,Bunga,1475000,0.272844,32.620329
2,Buziga,1216667,0.258917,32.617698
3,Luzira,1144293,0.297493,32.652819
4,Lweza,2080000,0.224644,32.547464
5,Munyonyo,2010582,0.244545,32.62175
6,Muyenga,2275000,0.299107,32.618571


After applying both criteria, we are left with only 7 suburbs: Bukoto, Bunga, Buziga, Luzira, Lweza, Munyonyo and Muyenga.

## We then plot a map indicating the remaining suburb areas

In [81]:
# create map of Kampala using latitude and longitude values
map_kampala = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, hood in zip(df_lan['Latitude'], df_lan['Longitude'], df_lan['Suburb']):
    label = '{}'.format(hood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_kampala)

# draw the yellow circle once, outside the loop, around the proposed mall location
folium.CircleMarker(
    [0.2734828, 32.6165844],
    radius=90,
    popup='Proposed mall location',
    color='yellow',
    fill=False).add_to(map_kampala)

map_kampala

Visually, Muyenga, Bunga, Buziga and Munyonyo form a cluster of neighbouring suburbs. These are well-known high-end suburbs with posh neighbourhoods that house mostly wealthy residents. A mall that is accessible to the residents of all four suburbs would be ideal. A site within the vicinity of the yellow circle, whose centre was approximated by averaging the latitude and longitude coordinates of the four suburbs, would be both practical and visible.
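
The averaging step can be reproduced directly from the table above. The sketch below takes the plain arithmetic mean of the four coordinate pairs, which is a reasonable approximation for points only a few kilometres apart. The name `cluster` is chosen here for illustration.

```python
# Coordinates of the four suburbs, copied from the table above.
cluster = {
    "Bunga": (0.272844, 32.620329),
    "Buziga": (0.258917, 32.617698),
    "Munyonyo": (0.244545, 32.62175),
    "Muyenga": (0.299107, 32.618571),
}

lats = [lat for lat, _ in cluster.values()]
lons = [lon for _, lon in cluster.values()]

# Arithmetic mean of the coordinates; adequate for points this close together.
centre = (sum(lats) / len(lats), sum(lons) / len(lons))
print(centre)
```

This gives roughly (0.26885, 32.61959), which is close to, though not identical to, the hard-coded circle centre [0.2734828, 32.6165844] used in the map cell.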