# ANALYZING INVESTMENTS IN REAL ESTATE IN NAIROBI BOROUGH
## IBM DATA SCIENCE COURSE CAPSTONE PROJECT (WEEK 2)

### Business Problem

The real estate sector in Kenya has grown tremendously in recent years, and Nairobi, the capital city, leads the towns experiencing a rebirth of property development. In short, real estate investment is the talk of the town, and it seems as though everyone wants a slice of this ‘cake’. However promising it may be, not everyone knows where to begin. Investors also need to take care when deciding what kind of investment to make and which locations have good prospects for return on investment; otherwise a miscalculated investment can turn out to be a bitter pill to swallow. Take, for example, an investor who would like to invest in the real estate sector but doesn't know how or where to begin. Some of the questions they might ask are: Where should I buy land? What should I develop on the property? These and many similar questions demand answers.

It would be exhausting to visit every single property on sale to see its location and any venues around it, or to page through every page of a real estate agency's website to check which properties are on sale and where they are. A quick analytical tour through all the properties can therefore help in the decision-making process.

### Data Section

For this problem I'll need the following datasets:

1. A list of properties on sale, which I obtained by web scraping the BuyRentKenya website (buyrentkenya.com).
Some of the features I scraped from the website are:
  - Size  
  - Price  
  - Location  
  - Type of property  
  
2. A shapefile of the districts of Nairobi County (Borough). This I downloaded from 

3. A CSV of Cytonn research data.
    - I'll take the return on investment data from the Nairobi Metropolitan Residential Report 2017 by Cytonn.

4. Geospatial data of the property locations.
    - To get this I will use geopy to geocode the property addresses.

5. A list of venues around the property locations.
    - I will obtain these from the Foursquare API.


## Methodology

I'll apply the following data science problem-solving steps:

- Business understanding
- Data requirements
- Data understanding
- Data collection
- Modelling (clustering)
- Evaluation
- Conclusion
    



#### Importing Libraries

In [1]:
import pandas as pd
from geopy.geocoders import Nominatim
import folium
import numpy as np
import shapefile
import os
import geopandas as gpd

from sklearn.cluster import KMeans
import json
import requests
from pandas import json_normalize  # top-level since pandas 1.0; the pandas.io.json import path is deprecated

import matplotlib.cm as cm
import matplotlib.colors as colors

from bs4 import BeautifulSoup
import lxml
import html5lib

import geopy
from geopy.extra.rate_limiter import RateLimiter

## 1. Web scraping data from the BuyRentKenya website


In [21]:
#url = "https://www.buyrentkenya.com/plots-land-for-sale/nairobi?gclid=EAIaIQobChMItdbu8vOT5gIVh4bVCh2uQw1wEAAYASAAEgJFavD_BwE"
#source = requests.get(url).text
#soup = BeautifulSoup(source,'lxml')

The listings span 28 pages. To collect the data from all of them, I first build the URL of every page in a loop and then extract the data from each one.

In [23]:
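pages = []            # URL of every results page
prices_list = []      # scraped price strings
plocation2_list = []  # scraped location strings
size2_list = []       # scraped plot-size strings
ptype2_list = []      # scraped property-type strings

# range(29) yields page numbers 0..28; if the site numbers its 28 pages from 1,
# page=0 may repeat the listings of page=1
for i in range(29):
    url = 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=' + str(i)
    pages.append(url)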

Checking that the URLs of all the pages are listed.

In [7]:
pages

['https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=0',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=1',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=2',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=3',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=4',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=5',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=6',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=7',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=8',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=9',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=10',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=11',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=12',
 'https://www.buyrentkenya.com/plots-land-for-sale/nairobi?page=13',
 'https://www.buyrentkenya.com/plots-land-fo

Loop through all the pages and extract the data.

In [24]:
for item in pages:
    # fetch one results page and parse it
    page = requests.get(item, timeout=30).text
    soup = BeautifulSoup(page, 'lxml')

    # each selector returns one element per listing; the four lists are filled
    # independently, so a listing missing a field would shift later rows
    price_tag = soup.select(".item-price")
    price2 = [pt.get_text() for pt in price_tag]
    prices_list.extend(price2)

    property_location = soup.select(".property-location")
    plocation2 = [pt.get_text() for pt in property_location]
    plocation2_list.extend(plocation2)

    property_type = soup.select(".property-title")
    ptype2 = [pt.get_text() for pt in property_type]
    ptype2_list.extend(ptype2)

    property_size = soup.select(".h-area")
    size2 = [pt.get_text() for pt in property_size]
    size2_list.extend(size2)

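The loop above fills four independent lists, one per CSS selector, and lines them up only by position. If a listing is missing a field (say, a plot with no size shown), every later value in that list shifts onto the wrong row. A sturdier pattern is to select each listing's container first and read the fields inside it, recording a gap as `None`. The sketch below assumes such a container exists: `extract_listings` and the `.listing` selector in the usage are hypothetical, and the real container class has to be read off the site's HTML. The four field selectors are the ones used above.

```python
from bs4 import BeautifulSoup


def extract_listings(html, card_selector):
    """Return one dict per listing card so that type, price, location and
    size always stay on the same row. `card_selector` is an assumption."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {
        "Property Type": ".property-title",
        "Price": ".item-price",
        "Location": ".property-location",
        "Size": ".h-area",
    }
    rows = []
    for card in soup.select(card_selector):
        row = {}
        for column, selector in fields.items():
            tag = card.select_one(selector)
            # a missing field becomes None instead of shifting later rows
            row[column] = tag.get_text(strip=True) if tag else None
        rows.append(row)
    return rows


# usage on a tiny hand-written HTML snippet (not the real site markup)
sample = """
<div class="listing"><h2 class="property-title">Land</h2>
  <span class="item-price">KES 32,000,000</span>
  <p class="property-location">Karen, Langata</p>
  <span class="h-area">2,024m²</span></div>
<div class="listing"><h2 class="property-title">Residential Land</h2>
  <span class="item-price">KES 85,000,000</span>
  <p class="property-location">Karen, Langata</p></div>
"""
rows = extract_listings(sample, ".listing")
print(rows)
```

The returned list of dicts goes straight into `pd.DataFrame(rows)`, so the four columns cannot drift apart.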
Now let's put the extracted data into a dataframe.

In [25]:
pricess = pd.DataFrame(prices_list)
sizze = pd.DataFrame(size2_list)
llocation = pd.DataFrame(plocation2_list)
proptype = pd.DataFrame(ptype2_list)

In [27]:
newtable = pd.concat([proptype,pricess,llocation,sizze], ignore_index=True,axis=1)
newtable.head()

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | `\n\nResidential Land\n\n` | KES 185,000,000 | Njumbi Rd, Lavington, Dagoretti North | `\n3,237m²\n\n\n\n` |
| 1 | `\n\nLand\n\n` | KES 32,000,000 | Karen, Langata | `\n2,024m²\n\n\n\n` |
| 2 | `\n\n3 Bedroom Residential Land\n\n` | KES 180,000,000 | Dennis Pritt Road, State House, Dagoretti North | `\n955m²\n\n\n\n` |
| 3 | `\n\nLand\n\n` | KES 225,000,000 | Eastern Bypass, Embakasi, Embakasi East | `\n325m²\n\n\n\n` |
| 4 | `\n\nLand\n\n` | KES 650,000,000 | Roysambu Area, Roysambu | `\n21,044m²\n\n\n\n` |


## 2. Data cleaning and transformation
In this section we'll clean the scraped data and transform it into a format that we can work with.

Let's begin by assigning column names to the dataframe.

In [28]:
newtable.columns = ["Property Type","Price","Location","Size"]
newtable.head()

|   | Property Type | Price | Location | Size |
|---|---|---|---|---|
| 0 | `\n\nResidential Land\n\n` | KES 185,000,000 | Njumbi Rd, Lavington, Dagoretti North | `\n3,237m²\n\n\n\n` |
| 1 | `\n\nLand\n\n` | KES 32,000,000 | Karen, Langata | `\n2,024m²\n\n\n\n` |
| 2 | `\n\n3 Bedroom Residential Land\n\n` | KES 180,000,000 | Dennis Pritt Road, State House, Dagoretti North | `\n955m²\n\n\n\n` |
| 3 | `\n\nLand\n\n` | KES 225,000,000 | Eastern Bypass, Embakasi, Embakasi East | `\n325m²\n\n\n\n` |
| 4 | `\n\nLand\n\n` | KES 650,000,000 | Roysambu Area, Roysambu | `\n21,044m²\n\n\n\n` |


In [29]:
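# the *_list variables hold every scraped page; ptype2, price2, etc. hold only the last page.
# pd.DataFrame raises ValueError if the lists differ in length, which would signal misaligned fields
new_df = pd.DataFrame({
    "Property Type": ptype2_list,
    "Price": prices_list,
    "Location": plocation2_list,
    "Size": size2_list
})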

In [30]:
#rentpropdata = newtable.to_csv (r'C:\Users\dclinton\Downloads\rentpropdta.csv', index=None, header=True)
#rentpropdata

### Data cleaning

In [31]:
newtable=newtable.replace('\n','',regex=True)
newtable.head()

|   | Property Type | Price | Location | Size |
|---|---|---|---|---|
| 0 | Residential Land | KES 185,000,000 | Njumbi Rd, Lavington, Dagoretti North | 3,237m² |
| 1 | Land | KES 32,000,000 | Karen, Langata | 2,024m² |
| 2 | 3 Bedroom Residential Land | KES 180,000,000 | Dennis Pritt Road, State House, Dagoretti North | 955m² |
| 3 | Land | KES 225,000,000 | Eastern Bypass, Embakasi, Embakasi East | 325m² |
| 4 | Land | KES 650,000,000 | Roysambu Area, Roysambu | 21,044m² |


The Price column should be numeric (float or integer), so we'll remove the "KES" currency prefix from the values so that they can be converted.

In [32]:

newtable['Price']=newtable['Price'].replace(r'[a-zA-Z]+','',regex=True)
newtable.head()

|   | Property Type | Price | Location | Size |
|---|---|---|---|---|
| 0 | Residential Land | 185000000 | Njumbi Rd, Lavington, Dagoretti North | 3,237m² |
| 1 | Land | 32000000 | Karen, Langata | 2,024m² |
| 2 | 3 Bedroom Residential Land | 180000000 | Dennis Pritt Road, State House, Dagoretti North | 955m² |
| 3 | Land | 225000000 | Eastern Bypass, Embakasi, Embakasi East | 325m² |
| 4 | Land | 650000000 | Roysambu Area, Roysambu | 21,044m² |

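Removing the letters is only half of the conversion: the raw prices still contain thousands separators, and the sizes carry the `m²` unit and stray newlines. A minimal sketch of turning both columns into numbers, assuming the raw formats shown in the scraped tables above; `to_number` is a hypothetical helper, and `errors="coerce"` turns anything unparseable into NaN rather than raising.

```python
import pandas as pd


def to_number(series, unit_pattern):
    """Strip a unit/currency regex, thousands separators and surrounding
    whitespace from scraped strings, then convert to numbers (bad -> NaN)."""
    cleaned = (
        series.astype(str)
        .str.replace(unit_pattern, "", regex=True)  # drop e.g. 'KES' or 'm²'
        .str.replace(",", "", regex=False)          # drop thousands separators
        .str.strip()                                # drop leftover newlines/spaces
    )
    return pd.to_numeric(cleaned, errors="coerce")


# raw values copied from the scraped table above
sample = pd.DataFrame({
    "Price": ["KES 185,000,000", "KES 32,000,000"],
    "Size": ["\n3,237m²\n\n\n\n", "\n955m²\n\n\n\n"],
})
sample["Price"] = to_number(sample["Price"], r"KES")
sample["Size"] = to_number(sample["Size"], r"m²")
print(sample)
```

With both columns numeric, derived measures such as price per square metre become a simple division.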

The column labels may have stray whitespace around them, so we strip it. This affects only the labels; the cell values are left unchanged.

In [34]:
newtable.columns = newtable.columns.str.strip()

Next we'll geocode the property addresses. Before doing that, we'll prefix each address with "Nairobi", because some place names also exist elsewhere in the world and the prefix helps the geocoder pick the right one.

In [35]:
# no separator is added: the scraped Location values already start with whitespace
newtable['new_location'] = 'Nairobi' + newtable['Location'].astype(str)
newtable.head()

|   | Property Type | Price | Location | Size | new_location |
|---|---|---|---|---|---|
| 0 | Residential Land | 185000000 | Njumbi Rd, Lavington, Dagoretti North | 3,237² | Nairobi Njumbi Rd, Lavington, Dagoretti North |
| 1 | Land | 32000000 | Karen, Langata | 2,024² | Nairobi Karen, Langata |
| 2 | 3 Bedroom Residential Land | 180000000 | Dennis Pritt Road, State House, Dagoretti North | 955² | Nairobi Dennis Pritt Road, State House, Dagor... |
| 3 | Land | 225000000 | Eastern Bypass, Embakasi, Embakasi East | 325² | Nairobi Eastern Bypass, Embakasi, Embakasi East |
| 4 | Land | 650000000 | Roysambu Area, Roysambu | 21,044² | Nairobi Roysambu Area, Roysambu |


In [36]:
newtable.shape

(548, 5)

## 3. Geocoding the property addresses

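I'll use OpenStreetMap's Nominatim service, through geopy, to geocode the addresses. The geocoder is wrapped in geopy's RateLimiter so that requests are at least one second apart, in line with Nominatim's usage policy of at most one request per second.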

In [37]:
locator = Nominatim(user_agent="myGeocoder")

# wait at least 1 s between requests; lookups that fail or find nothing yield None
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

newtable['locationn'] = newtable['new_location'].apply(geocode)

# (latitude, longitude, altitude) tuple, or None when the address could not be geocoded
newtable['point'] = newtable['locationn'].apply(lambda loc: tuple(loc.point) if loc else None)

# latitude and longitude are split into their own columns only after the failed
# lookups are dropped: rows holding None instead of a 3-tuple would otherwise
# raise "ValueError: Columns must be same length as key"

RateLimiter caught an error, retrying (0/2 tries). Called with (*('Nairobi  Kabarnet Road, Ngong Road',), **{}).

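When Nominatim cannot resolve an address, geopy hands back `None`, which is what leaves gaps in the `point` column. The sketch below, a hypothetical helper `geocode_addresses`, collects latitude and longitude directly and writes NaN for failures, so the result always has exactly two columns and keeps the original index. It accepts any callable that behaves like geopy's geocoder (returning an object with `.latitude` and `.longitude`, or `None`); the usage substitutes a dictionary lookup so that it runs offline.

```python
from collections import namedtuple

import pandas as pd


def geocode_addresses(addresses, geocode):
    """Geocode a Series of addresses; failed lookups get NaN coordinates.
    Returns a DataFrame with latitude/longitude aligned to addresses.index."""
    coords = []
    for address in addresses:
        location = geocode(address)
        if location is None:
            coords.append((float("nan"), float("nan")))
        else:
            coords.append((location.latitude, location.longitude))
    return pd.DataFrame(coords, columns=["latitude", "longitude"], index=addresses.index)


# offline stand-in for geopy's Location: only the two attributes used above
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])
known = {"Nairobi Karen, Langata": FakeLocation(-1.3333492, 36.7503422)}

addresses = pd.Series(["Nairobi Karen, Langata", "Nairobi Njumbi Rd, Lavington"])
result = geocode_addresses(addresses, known.get)  # dict.get returns None for unknown keys
print(result)
```

In the notebook, `newtable.join(geocode_addresses(newtable['new_location'], geocode))` would add both columns in one step, after which `dropna(subset=['latitude'])` removes the failed rows.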
In [38]:
newtable.shape

(548, 7)

Checking the first five rows.

In [39]:
newtable.head()

|   | Property Type | Price | Location | Size | new_location | locationn | point |
|---|---|---|---|---|---|---|---|
| 0 | Residential Land | 185000000 | Njumbi Rd, Lavington, Dagoretti North | 3,237² | Nairobi Njumbi Rd, Lavington, Dagoretti North | None | None |
| 1 | Land | 32000000 | Karen, Langata | 2,024² | Nairobi Karen, Langata | (Nairobi Mamba Village, Langata, Nairobi, 0050... | (-1.3333492, 36.7503422, 0.0) |
| 2 | 3 Bedroom Residential Land | 180000000 | Dennis Pritt Road, State House, Dagoretti North | 955² | Nairobi Dennis Pritt Road, State House, Dagor... | None | None |
| 3 | Land | 225000000 | Eastern Bypass, Embakasi, Embakasi East | 325² | Nairobi Eastern Bypass, Embakasi, Embakasi East | None | None |
| 4 | Land | 650000000 | Roysambu Area, Roysambu | 21,044² | Nairobi Roysambu Area, Roysambu | (Roysambu Roundabout, Roysambu, Nairobi, Kenya... | (-1.21960835, 36.8914864399621, 0.0) |


In [40]:
newdf=newtable[['Property Type','new_location','Size','point','Price']].copy()
newdf.head()

|   | Property Type | new_location | Size | point | Price |
|---|---|---|---|---|---|
| 0 | Residential Land | Nairobi Njumbi Rd, Lavington, Dagoretti North | 3,237² | None | 185000000 |
| 1 | Land | Nairobi Karen, Langata | 2,024² | (-1.3333492, 36.7503422, 0.0) | 32000000 |
| 2 | 3 Bedroom Residential Land | Nairobi Dennis Pritt Road, State House, Dagor... | 955² | None | 180000000 |
| 3 | Land | Nairobi Eastern Bypass, Embakasi, Embakasi East | 325² | None | 225000000 |
| 4 | Land | Nairobi Roysambu Area, Roysambu | 21,044² | (-1.21960835, 36.8914864399621, 0.0) | 650000000 |


Some of the addresses could not be geocoded, so I'll drop the rows with missing values and reset the index.

In [41]:
# drop rows whose address could not be geocoded and renumber the rows from 0
newdf = newdf.dropna().reset_index(drop=True)

In [43]:
newdf.head()

|   | Property Type | new_location | Size | point | Price |
|---|---|---|---|---|---|
| 0 | Land | Nairobi Karen, Langata | 2,024² | (-1.3333492, 36.7503422, 0.0) | 32000000 |
| 1 | Land | Nairobi Roysambu Area, Roysambu | 21,044² | (-1.21960835, 36.8914864399621, 0.0) | 650000000 |
| 2 | Residential Land | Nairobi Karen, Langata | 7,285² | (-1.3333492, 36.7503422, 0.0) | 85000000 |
| 3 | Land | Nairobi Roysambu, Githurai, Roysambu | 0² | (-1.2182267, 36.8925596, 0.0) | 70000000 |
| 4 | Residential Land | Nairobi Karen, Langata | 2² | (-1.3333492, 36.7503422, 0.0) | 32000000 |


As we can see, latitude and longitude are wrapped together in the point column. I'll split them into separate columns and drop the altitude, since we won't need it.

In [44]:
# unpack each (latitude, longitude, altitude) tuple into three columns
newdf[['latitude','longitude','altitude']] = pd.DataFrame(newdf.point.values.tolist(), index=newdf.index)

In [45]:
newdf = newdf.drop(columns=['altitude'])
newdf.head()

|   | Property Type | new_location | Size | point | Price | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 0 | Land | Nairobi Karen, Langata | 2,024² | (-1.3333492, 36.7503422, 0.0) | 32000000 | -1.333349 | 36.750342 |
| 1 | Land | Nairobi Roysambu Area, Roysambu | 21,044² | (-1.21960835, 36.8914864399621, 0.0) | 650000000 | -1.219608 | 36.891486 |
| 2 | Residential Land | Nairobi Karen, Langata | 7,285² | (-1.3333492, 36.7503422, 0.0) | 85000000 | -1.333349 | 36.750342 |
| 3 | Land | Nairobi Roysambu, Githurai, Roysambu | 0² | (-1.2182267, 36.8925596, 0.0) | 70000000 | -1.218227 | 36.89256 |
| 4 | Residential Land | Nairobi Karen, Langata | 2² | (-1.3333492, 36.7503422, 0.0) | 32000000 | -1.333349 | 36.750342 |


In [47]:
newdf=newdf.rename(columns={'new_location':'Neighborhood'})
newdf.head()

|   | Property Type | Neighborhood | Size | point | Price | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 0 | Land | Nairobi Karen, Langata | 2,024² | (-1.3333492, 36.7503422, 0.0) | 32000000 | -1.333349 | 36.750342 |
| 1 | Land | Nairobi Roysambu Area, Roysambu | 21,044² | (-1.21960835, 36.8914864399621, 0.0) | 650000000 | -1.219608 | 36.891486 |
| 2 | Residential Land | Nairobi Karen, Langata | 7,285² | (-1.3333492, 36.7503422, 0.0) | 85000000 | -1.333349 | 36.750342 |
| 3 | Land | Nairobi Roysambu, Githurai, Roysambu | 0² | (-1.2182267, 36.8925596, 0.0) | 70000000 | -1.218227 | 36.89256 |
| 4 | Residential Land | Nairobi Karen, Langata | 2² | (-1.3333492, 36.7503422, 0.0) | 32000000 | -1.333349 | 36.750342 |


In [49]:
newdf.to_csv(r'C:/Users/dclinton/Downloads/newdf.csv')

## 4. Exploring neighborhoods in Nairobi with Foursquare API

Defining the Foursquare credentials and API version. The keys are read from environment variables rather than written into the notebook, so they are not published along with it.

In [50]:
# the environment-variable names are placeholders; set them to your own Foursquare keys
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version

# confirm the keys were found without printing the secret itself
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + ('*' * len(CLIENT_SECRET) if CLIENT_SECRET else '(not set)'))


Let's create a function that queries Foursquare for the venues around each property location in Nairobi.

In [51]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    """Query the Foursquare v2 explore endpoint around each location and
    return one row per venue found within `radius` metres."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the API request; requests encodes the query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,  # search radius in metres, as passed by the caller
            'limit': limit,    # maximum number of venues returned
        }

        # make the GET request and keep the recommended venues
        results = requests.get(url, params=params, timeout=30).json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-location lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues

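The function indexes straight into `['response']['groups'][0]['items']` and into `categories[0]`, so a response without groups, or a venue returned without any category, would stop the whole loop. A defensive variant of the parsing step is sketched below; `explore_items` and `parse_explore_items` are hypothetical helpers, and the sample payload is hand-made to mirror only the keys the notebook already reads.

```python
def explore_items(payload):
    """Return the venue items of a Foursquare v2 explore response,
    or [] when the response carries no groups."""
    groups = payload.get("response", {}).get("groups", [])
    return groups[0].get("items", []) if groups else []


def parse_explore_items(items, name, lat, lng):
    """Build the same 7-field rows as getNearbyVenues; a venue without
    a category gets None instead of raising IndexError."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,
        ))
    return rows


# hand-made payload that mirrors only the keys read in the notebook
payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": -1.334, "lng": 36.751},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": -1.332, "lng": 36.749},
               "categories": []}},
]}]}}

rows = parse_explore_items(explore_items(payload), "Nairobi Karen, Langata", -1.3333492, 36.7503422)
for row in rows:
    print(row)
```

Inside `getNearbyVenues`, `results = explore_items(requests.get(url, params=params, timeout=30).json())` followed by `venues_list.append(parse_explore_items(results, name, lat, lng))` would replace the two direct indexing steps.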
In [61]:
Nairobi_venues = getNearbyVenues( names=newdf['Neighborhood'],
                                 latitudes=newdf['latitude'],
                                 longitudes=newdf['longitude'])


Nairobi  Karen, Langata
Nairobi  Roysambu Area, Roysambu
Nairobi  Karen, Langata
Nairobi  Roysambu, Githurai, Roysambu
Nairobi  Karen, Langata
Nairobi  Westlands Area, Westlands
Nairobi  Parklands, Westlands
Nairobi  Karen, Langata
Nairobi  Roysambu Area, Roysambu
Nairobi  Karen, Langata
Nairobi  Roysambu, Githurai, Roysambu
Nairobi  Karen, Langata
Nairobi  Westlands Area, Westlands
Nairobi  Parklands, Westlands
Nairobi  Lower Kabete, Westlands
Nairobi  Eldama Ravine Road, Parklands, Westlands
Nairobi  Embakasi West
Nairobi  Parklands, Westlands
Nairobi  Lake View, Westlands
Nairobi  Lavington, Dagoretti North
Nairobi  Utawala, Embakasi East
Nairobi  Karen, Langata
Nairobi  Lavington, Dagoretti North
Nairobi  Thika Road
Nairobi  Lavington, Dagoretti North
Nairobi  Kasarani Area, Kasarani
Nairobi  Parklands, Westlands
Nairobi  Ngong Road
Nairobi  Kiambu Road
Nairobi  Karen, Langata
Nairobi  Kiambu Road
Nairobi  Spring Valley, Westlands
Nairobi  Lavington, Dagoretti North
Nairobi  Karen,

Let's check the size of the resulting dataframe.

In [62]:
Nairobi_venues.shape
#Nairobi_venues.head()

(4365, 7)

Let's count the number of venues returned for each neighborhood.

In [39]:
Nairobi_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Nairobi Bogani, Karen, Langata | 5 | 5 | 5 | 5 | 5 | 5 |
| Nairobi Eldama Ravine Road, Parklands, Westlands | 7 | 7 | 7 | 7 | 7 | 7 |
| Nairobi Embakasi West | 8 | 8 | 8 | 8 | 8 | 8 |
| Nairobi Fana Road, Karen, Langata | 3 | 3 | 3 | 3 | 3 | 3 |
| Nairobi Garden Estate, Ridgeways | 5 | 5 | 5 | 5 | 5 | 5 |
| Nairobi Jogoo Road | 8 | 8 | 8 | 8 | 8 | 8 |
| Nairobi Kahawa Sukari, Roysambu | 10 | 10 | 10 | 10 | 10 | 10 |
| Nairobi Karen, Langata | 352 | 352 | 352 | 352 | 352 | 352 |
| Nairobi Kasarani Area, Kasarani | 2 | 2 | 2 | 2 | 2 | 2 |
| Nairobi Kawangware, Dagoretti North | 24 | 24 | 24 | 24 | 24 | 24 |


Let's see how many unique categories appear among all the returned venues.

In [65]:
print('There are {} unique categories.'.format(len(Nairobi_venues['Venue Category'].unique())))

There are 108 unique categories.


## 5. Analyzing each Neighborhood

In [66]:
# one hot encoding
Nairobi_onehot = pd.get_dummies(Nairobi_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Nairobi_onehot['Neighborhood'] = Nairobi_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Nairobi_onehot.columns[-1]] + list(Nairobi_onehot.columns[:-1])
Nairobi_onehot = Nairobi_onehot[fixed_columns]

Nairobi_onehot.head()

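|   | Neighborhood | African Restaurant | Art Gallery | Asian Restaurant | Auto Garage | BBQ Joint | Bakery | Bar | Bed & Breakfast | Beer Garden | ... | Supermarket | Tapas Restaurant | Tea Room | Trail | Video Game Store | Video Store | Wine Bar | Women's Store | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Nairobi Karen, Langata | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Nairobi Karen, Langata | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Nairobi Karen, Langata | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Nairobi Karen, Langata | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Nairobi Roysambu Area, Roysambu | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |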


Let's check the size of the new dataframe.

In [67]:
Nairobi_onehot.shape

(4365, 109)

Next, we group the rows by neighborhood and take the mean frequency of occurrence of each category.

In [68]:
Nairobi_grouped = Nairobi_onehot.groupby('Neighborhood').mean().reset_index()
Nairobi_grouped.head()

|   | Neighborhood | African Restaurant | Art Gallery | Asian Restaurant | Auto Garage | BBQ Joint | Bakery | Bar | Bed & Breakfast | Beer Garden | ... | Supermarket | Tapas Restaurant | Tea Room | Trail | Video Game Store | Video Store | Wine Bar | Women's Store | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Nairobi Bogani, Karen, Langata | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Nairobi Eldama Ravine Road, Parklands, Westlands | 0.0 | 0.0 | 0.0 | 0.142857 | 0.142857 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Nairobi Embakasi West | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Nairobi Fana Road, Karen, Langata | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.333333 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Nairobi Garden Estate, Ridgeways | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

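To see what the one-hot encoding and the grouped mean produce, here is the same pipeline on a tiny hand-made venue list (the neighborhood names are reused, the venues are invented). Each value in the result is the share of that neighborhood's venues falling in a category, so every row sums to 1; that is how the 0.2, 0.142857 and 0.333333 values in the grouped table arise.

```python
import pandas as pd

# toy venue list: three venues in Karen, one in Roysambu
venues = pd.DataFrame({
    "Neighborhood": ["Karen", "Karen", "Karen", "Roysambu"],
    "Venue Category": ["Café", "Café", "Bar", "Shopping Mall"],
})

# one column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="").astype(int)
onehot["Neighborhood"] = venues["Neighborhood"]

# mean of the 0/1 columns = share of each neighborhood's venues in each category
grouped = onehot.groupby("Neighborhood").mean()
print(grouped.round(2))
```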

Our new dataframe size is: 

In [69]:
Nairobi_grouped.shape

(46, 109)

Let's print each neighborhood along with its top 5 most common venues.

In [70]:
num_top_venues = 5

for hood in Nairobi_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Nairobi_grouped[Nairobi_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Nairobi  Bogani, Karen, Langata----
                  venue  freq
0  Fast Food Restaurant   0.4
1    African Restaurant   0.2
2            Campground   0.2
3                  Café   0.2
4              Pharmacy   0.0


----Nairobi  Eldama Ravine Road, Parklands, Westlands----
                  venue  freq
0                 Hotel  0.14
1  Gym / Fitness Center  0.14
2    Seafood Restaurant  0.14
3   Japanese Restaurant  0.14
4          Burger Joint  0.14


----Nairobi  Embakasi West----
                  venue  freq
0                Lounge  0.50
1     Convenience Store  0.25
2  Fast Food Restaurant  0.25
3    African Restaurant  0.00
4         Movie Theater  0.00


----Nairobi  Fana Road, Karen, Langata----
                venue  freq
0   Recreation Center  0.33
1                 Bar  0.33
2     Bed & Breakfast  0.33
3  African Restaurant  0.00
4   Mobile Phone Shop  0.00


----Nairobi  Garden Estate, Ridgeways----
              venue  freq
0  Department Store   0.2
1               Ba

                   venue  freq
0       Department Store   1.0
1      Mobile Phone Shop   0.0
2  Outdoors & Recreation   0.0
3           Optical Shop   0.0
4           Noodle House   0.0


----Nairobi  Waithaka, Dagoretti South----
                venue  freq
0         Flea Market   0.5
1      Breakfast Spot   0.5
2  African Restaurant   0.0
3   Mobile Phone Shop   0.0
4        Optical Shop   0.0


----Nairobi  Waiyaki Way, Westlands----
         venue  freq
0    Nightclub  0.25
1         Café  0.12
2        Hotel  0.12
3  Coffee Shop  0.06
4  Gaming Cafe  0.06


----Nairobi  Westlands Area, Westlands----
               venue  freq
0  Indian Restaurant  0.08
1      Shopping Mall  0.08
2  Electronics Store  0.08
3       Noodle House  0.08
4        Coffee Shop  0.08




Now let's create a dataframe that holds this information, with the top 10 venues for each neighborhood.

In [71]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [72]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th, 5th, ... have no entry in indicators
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Nairobi_grouped['Neighborhood']

for ind in np.arange(Nairobi_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Nairobi_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Nairobi Bogani, Karen, Langata | Fast Food Restaurant | African Restaurant | Campground | Café | Department Store | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant | Falafel Restaurant |
| 1 | Nairobi Eldama Ravine Road, Parklands, Westlands | Gym / Fitness Center | BBQ Joint | Burger Joint | Japanese Restaurant | Hotel | Seafood Restaurant | Auto Garage | Gastropub | Electronics Store | Empanada Restaurant |
| 2 | Nairobi Embakasi West | Lounge | Convenience Store | Fast Food Restaurant | French Restaurant | Diner | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant | Falafel Restaurant |
| 3 | Nairobi Fana Road, Karen, Langata | Bar | Bed & Breakfast | Recreation Center | Zoo Exhibit | French Restaurant | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant | Falafel Restaurant |
| 4 | Nairobi Garden Estate, Ridgeways | Department Store | Garden Center | Bar | Garden | Scenic Lookout | Food Court | Diner | Donut Shop | Electronics Store | Empanada Restaurant |


#### Clustering the Neighborhoods

We'll run k-means to cluster the neighborhoods into 6 clusters.

In [73]:
# set number of clusters
kclusters = 6

Nairobi_grouped_clustering = Nairobi_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init set explicitly, since newer scikit-learn versions changed its default)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(Nairobi_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 0, 1, 3, 1])

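The number of clusters is fixed at 6 without a justification being shown. One common heuristic for choosing k is the elbow method: fit k-means for a range of k and look for the point where the inertia (the within-cluster sum of squared distances, exposed by scikit-learn as `inertia_`) stops falling sharply. The sketch runs on synthetic 2-D blobs so that it is self-contained; `inertia_by_k` is a hypothetical helper, applying it to `Nairobi_grouped_clustering` is an assumption, and with sparse venue-frequency features the elbow may be far less distinct.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(X, k_values, random_state=0):
    """Fit k-means once per k and return {k: inertia_}, the within-cluster
    sum of squared distances that an elbow chart plots against k."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        scores[k] = model.inertia_
    return scores


# synthetic stand-in for the feature matrix: three tight, well-separated 2-D blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

scores = inertia_by_k(X, range(1, 7))
for k, inertia in scores.items():
    print(k, round(inertia, 2))
# the drop from k=2 to k=3 is large and later drops are small: the elbow is at k=3
```

On the notebook's data the call would be `inertia_by_k(Nairobi_grouped_clustering, range(2, 11))`, with the scores plotted against k.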
Let's create a new dataframe that includes the cluster label as well as the top 10 venues for each neighborhood.

In [74]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Nairobi_merged = newdf

# join the cluster label and top-10 venues of each neighborhood onto every property in that neighborhood
Nairobi_merged = Nairobi_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Nairobi_merged.head() # check the last columns!

|   | Property Type | Neighborhood | Size | point | Price | latitude | longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Land | Nairobi Karen, Langata | 2,024² | (-1.3333492, 36.7503422, 0.0) | 32000000 | -1.333349 | 36.750342 | 1.0 | Nature Preserve | Playground | Music Venue | Boarding House | Zoo Exhibit | Food Court | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant |
| 1 | Land | Nairobi Roysambu Area, Roysambu | 21,044² | (-1.21960835, 36.8914864399621, 0.0) | 650000000 | -1.219608 | 36.891486 | 1.0 | Hookah Bar | Shopping Mall | Supermarket | Pizza Place | Lounge | Café | Restaurant | Coffee Shop | BBQ Joint | Gym / Fitness Center |
| 2 | Residential Land | Nairobi Karen, Langata | 7,285² | (-1.3333492, 36.7503422, 0.0) | 85000000 | -1.333349 | 36.750342 | 1.0 | Nature Preserve | Playground | Music Venue | Boarding House | Zoo Exhibit | Food Court | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant |
| 3 | Land | Nairobi Roysambu, Githurai, Roysambu | 0² | (-1.2182267, 36.8925596, 0.0) | 70000000 | -1.218227 | 36.89256 | 1.0 | Shopping Mall | Coffee Shop | Fast Food Restaurant | Moving Target | Pizza Place | Café | Sausage Shop | Gym / Fitness Center | BBQ Joint | Donut Shop |
| 4 | Residential Land | Nairobi Karen, Langata | 2² | (-1.3333492, 36.7503422, 0.0) | 32000000 | -1.333349 | 36.750342 | 1.0 | Nature Preserve | Playground | Music Venue | Boarding House | Zoo Exhibit | Food Court | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant |

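The cluster labels come back as floats (1.0 rather than 1) because `join` writes NaN for any property whose neighborhood is absent from `neighborhoods_venues_sorted`, plausibly neighborhoods for which Foursquare returned no venues, and a single NaN turns the whole column into floats. The toy frames below, with hypothetical property IDs and labels, reproduce this and show one way to drop the unmatched rows and restore integer labels without a `SettingWithCopyWarning`.

```python
import pandas as pd

# hypothetical property rows; in this toy data the third neighborhood has no cluster
properties = pd.DataFrame({
    "Property": ["A", "B", "C"],
    "Neighborhood": ["Nairobi Karen, Langata", "Nairobi Roysambu Area, Roysambu", "Nairobi Thika Road"],
})
# hypothetical per-neighborhood table, shaped like neighborhoods_venues_sorted
clusters = pd.DataFrame({
    "Neighborhood": ["Nairobi Karen, Langata", "Nairobi Roysambu Area, Roysambu"],
    "Cluster Labels": [1, 3],
})

# look up each property's neighborhood in the cluster table (left join)
joined = properties.join(clusters.set_index("Neighborhood"), on="Neighborhood")
print(joined["Cluster Labels"].dtype)  # float64, because property C got NaN

# drop properties without a cluster, then turn the labels back into integers
clean = joined.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
print(clean)
```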

In [75]:
Nairobi_mergedx = Nairobi_merged.dropna(axis=0)
Nairobi_mergedx.head()

|   | Property Type | Neighborhood | Size | point | Price | latitude | longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Land | Nairobi Karen, Langata | 2,024² | (-1.3333492, 36.7503422, 0.0) | 32000000 | -1.333349 | 36.750342 | 1.0 | Nature Preserve | Playground | Music Venue | Boarding House | Zoo Exhibit | Food Court | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant |
| 1 | Land | Nairobi Roysambu Area, Roysambu | 21,044² | (-1.21960835, 36.8914864399621, 0.0) | 650000000 | -1.219608 | 36.891486 | 1.0 | Hookah Bar | Shopping Mall | Supermarket | Pizza Place | Lounge | Café | Restaurant | Coffee Shop | BBQ Joint | Gym / Fitness Center |
| 2 | Residential Land | Nairobi Karen, Langata | 7,285² | (-1.3333492, 36.7503422, 0.0) | 85000000 | -1.333349 | 36.750342 | 1.0 | Nature Preserve | Playground | Music Venue | Boarding House | Zoo Exhibit | Food Court | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant |
| 3 | Land | Nairobi Roysambu, Githurai, Roysambu | 0² | (-1.2182267, 36.8925596, 0.0) | 70000000 | -1.218227 | 36.89256 | 1.0 | Shopping Mall | Coffee Shop | Fast Food Restaurant | Moving Target | Pizza Place | Café | Sausage Shop | Gym / Fitness Center | BBQ Joint | Donut Shop |
| 4 | Residential Land | Nairobi Karen, Langata | 2² | (-1.3333492, 36.7503422, 0.0) | 32000000 | -1.333349 | 36.750342 | 1.0 | Nature Preserve | Playground | Music Venue | Boarding House | Zoo Exhibit | Food Court | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant |


In [76]:
address = 'Nairobi City, Kenya'

geolocator = Nominatim(user_agent = "tr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Nairobi City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Nairobi City are -1.2832533, 36.8172449.


In [77]:
# astype returns a new frame, which avoids the SettingWithCopyWarning raised when
# assigning into Nairobi_mergedx (a filtered slice of Nairobi_merged)
Nairobi_mergedx = Nairobi_mergedx.astype({'Cluster Labels': int})


In [78]:
Nairobi_mergedx.head()

|   | Property Type | Neighborhood | Size | point | Price | latitude | longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Land | Nairobi Karen, Langata | 2,024² | (-1.3333492, 36.7503422, 0.0) | 32000000 | -1.333349 | 36.750342 | 1 | Nature Preserve | Playground | Music Venue | Boarding House | Zoo Exhibit | Food Court | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant |
| 1 | Land | Nairobi Roysambu Area, Roysambu | 21,044² | (-1.21960835, 36.8914864399621, 0.0) | 650000000 | -1.219608 | 36.891486 | 1 | Hookah Bar | Shopping Mall | Supermarket | Pizza Place | Lounge | Café | Restaurant | Coffee Shop | BBQ Joint | Gym / Fitness Center |
| 2 | Residential Land | Nairobi Karen, Langata | 7,285² | (-1.3333492, 36.7503422, 0.0) | 85000000 | -1.333349 | 36.750342 | 1 | Nature Preserve | Playground | Music Venue | Boarding House | Zoo Exhibit | Food Court | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant |
| 3 | Land | Nairobi Roysambu, Githurai, Roysambu | 0² | (-1.2182267, 36.8925596, 0.0) | 70000000 | -1.218227 | 36.89256 | 1 | Shopping Mall | Coffee Shop | Fast Food Restaurant | Moving Target | Pizza Place | Café | Sausage Shop | Gym / Fitness Center | BBQ Joint | Donut Shop |
| 4 | Residential Land | Nairobi Karen, Langata | 2² | (-1.3333492, 36.7503422, 0.0) | 32000000 | -1.333349 | 36.750342 | 1 | Nature Preserve | Playground | Music Venue | Boarding House | Zoo Exhibit | Food Court | Donut Shop | Electronics Store | Empanada Restaurant | Ethiopian Restaurant |


#### Visualizing the Clusters with Folium

In [97]:
import folium
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Set the color scheme: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# Add one marker per property, colored by its cluster label
for lat, lon, poi, cluster in zip(Nairobi_mergedx['latitude'], Nairobi_mergedx['longitude'], Nairobi_mergedx['Neighborhood'], Nairobi_mergedx['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],          # labels run from 0 to kclusters - 1
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
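The color scheme gives each cluster label one color sampled at evenly spaced points along a rainbow colormap, and the label itself is used as the list index. The sketch below illustrates the same idea without matplotlib, swapping in the standard library's `colorsys` HSV conversion; `cluster_palette` is a hypothetical helper, and its colors will not match `cm.rainbow` exactly. It assumes cluster labels run from 0 to k-1.

```python
import colorsys

def cluster_palette(k):
    """Return k hex colors with hues evenly spaced from red towards violet (hypothetical helper)."""
    palette = []
    for i in range(k):
        # Spread hues over [0, 0.8] so the first and last clusters do not both come out red
        hue = 0.8 * i / (k - 1) if k > 1 else 0.0
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(3)
# Index directly with the cluster label (0 .. k-1); no offset is needed
for label in range(3):
    print(label, palette[label])
```

Indexing with the label directly keeps cluster 0 on the first color; an offset such as `label - 1` only appears to work because Python wraps negative indices around to the end of the list.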

## 6. Visualizing the Property Locations on a Choropleth Map

In this section I will:

- Import the shapefile of the districts of Nairobi Borough and the CSV with data from Cytonn's research on real estate investment.
- Join the shapefile and the return on investment data into one dataframe.
- Create a choropleth map and then superimpose the properties on sale on top of it.

In [94]:
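import os
import pandas as pd
import geopandas as gpd

# Importing the shapefile of Nairobi districts (path is specific to the author's machine)
shapefile = gpd.read_file('~geospartial data/nairobi_shapefile/nairobi_shapefile.shp')

# Importing the Cytonn return on investment CSV
data_cytonn = os.path.expanduser('~IBM Data Science Course/Capstone Project/cytonn_data.csv')
mydata = pd.read_csv(data_cytonn)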


In [90]:
shapefile.head()

Unnamed: 0,ID_0,ISO,NAME_0,ID_1,NAME_1,ID_2,NAME_2,ID_3,NAME_3,ID_4,NAME_4,ID_5,NAME_5,TYPE_5,ENGTYPE_5,geometry
0,118,KEN,Kenya,4,Nairobi,22,Nairobi,104,Central,441,Ngara,1592,Ngara East,Kata Ndogo,Sub location,"POLYGON ((36.82690 -1.28279, 36.82653 -1.28380..."
1,118,KEN,Kenya,4,Nairobi,22,Nairobi,104,Central,441,Ngara,1593,Ngara West,Kata Ndogo,Sub location,"POLYGON ((36.81301 -1.28415, 36.81030 -1.28244..."
2,118,KEN,Kenya,4,Nairobi,22,Nairobi,104,Central,442,Starehe,1594,City Square,Kata Ndogo,Sub location,"POLYGON ((36.82626 -1.29233, 36.82508 -1.29234..."
3,118,KEN,Kenya,4,Nairobi,22,Nairobi,104,Central,442,Starehe,1595,Nairobi Central,Kata Ndogo,Sub location,"POLYGON ((36.81015 -1.29522, 36.81003 -1.29448..."
4,118,KEN,Kenya,4,Nairobi,22,Nairobi,104,Central,442,Starehe,1596,Pangani,Kata Ndogo,Sub location,"POLYGON ((36.83683 -1.27386, 36.83727 -1.27640..."


In [91]:
mydata.head()

Unnamed: 0,Location,Class,Type,Total Return,PercentageTotalReturn
0,Lower Kabete,High-End,Detached Units,0.09,9.2
1,Karura,High-End,Detached Units,0.09,8.7
2,Roselyn,High-End,Detached Units,0.08,7.7
3,Kitisuru,High-End,Detached Units,0.06,6.0
4,Karen,High-End,Detached Units,0.06,5.7


#### Merge the Shapefile and the Return on Investment Data into One Dataframe

In [95]:
# Inner join (the default): only districts whose NAME_5 matches a Cytonn Location are kept
merged = shapefile.merge(mydata, left_on='NAME_5', right_on='Location')
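Because this merge is an inner join, any Cytonn location whose name is spelled differently from the shapefile's `NAME_5` value silently drops out of the choropleth. A minimal sketch of checking the join keys first uses pandas' `indicator=True` option on an outer merge. The two small frames below are toy stand-ins (hypothetical rows, with names borrowed from the outputs above), not the real data.

```python
import pandas as pd

# Toy stand-ins for the shapefile attributes and the Cytonn table (hypothetical rows)
districts = pd.DataFrame({'NAME_5': ['Ngara East', 'Karen', 'Kitisuru']})
returns = pd.DataFrame({'Location': ['Karen', 'Kitisuru', 'Lower Kabete'],
                        'PercentageTotalReturn': [5.7, 6.0, 9.2]})

# An outer join with indicator=True labels every row 'both', 'left_only' or 'right_only'
check = districts.merge(returns, left_on='NAME_5', right_on='Location',
                        how='outer', indicator=True)

# Cytonn locations with no matching district polygon would vanish from an inner join
unmatched = check.loc[check['_merge'] == 'right_only', 'Location'].tolist()
print(unmatched)
```

Any names listed in `unmatched` can then be corrected in the CSV (or mapped to the shapefile spelling) before running the real merge.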

#### Choropleth Map

The choropleth map shades each district according to its total return on investment (%) as reported by Cytonn; the legend shows which return percentage each color stands for.
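To my understanding of folium's documentation, when `bins` is left at its default `Choropleth` splits the range of the data into six equal-width bins and gives each bin one shade of the chosen scale (`YlOrRd` here). The pure-Python sketch below shows that bucketing using only the five returns visible in `mydata.head()`, so the real edges will differ; `equal_width_edges` and `bin_index` are hypothetical helpers, and folium's exact handling of the edges may not match this.

```python
def equal_width_edges(values, n_bins=6):
    """Split [min, max] of values into n_bins equal-width intervals (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

def bin_index(value, edges):
    """Return the 0-based bin of value; the top edge belongs to the last bin."""
    for i in range(len(edges) - 1):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2

# PercentageTotalReturn values from the mydata.head() output above
returns = {'Lower Kabete': 9.2, 'Karura': 8.7, 'Roselyn': 7.7, 'Kitisuru': 6.0, 'Karen': 5.7}
edges = equal_width_edges(list(returns.values()))
for location, pct in returns.items():
    print(location, pct, '-> bin', bin_index(pct, edges))
```

With equal-width bins a single high or low outlier can stretch the range and push most districts into one or two shades; passing an explicit list of edges to `bins` is one way to avoid that.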

In [98]:
mapp = folium.Map(location=[-1.28333, 36.81667], zoom_start=11)

# Shade each district by its total return on investment (%)
# (folium.Choropleth replaces the deprecated Map.choropleth method)
folium.Choropleth(
    geo_data=merged,
    name='Choropleth',
    data=mydata,
    columns=['Location', 'PercentageTotalReturn'],
    key_on='feature.properties.NAME_5',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Total Return (%)'
).add_to(mapp)

# One evenly spaced rainbow color per cluster, as in the cluster map
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# Superimpose the properties on sale; the popup shows neighborhood, cluster and price
for lat, lon, poi, price, cluster in zip(Nairobi_mergedx['latitude'], Nairobi_mergedx['longitude'], Nairobi_mergedx['Neighborhood'], Nairobi_mergedx['Price'], Nairobi_mergedx['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster) + ' Price: ' + str(price), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
    ).add_to(mapp)

# Add the layer toggle before displaying the map
folium.LayerControl().add_to(mapp)
mapp
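The marker popups are built by concatenating strings. A small formatting helper keeps the label readable and adds thousands separators to the price; `format_popup` is a hypothetical name, and no currency symbol is added because the listing output above does not state one.

```python
def format_popup(neighborhood, cluster, price):
    """Build a one-line popup label from neighborhood, cluster and price (hypothetical helper)."""
    return '{} | Cluster {} | Price {:,}'.format(neighborhood, int(cluster), int(price))

print(format_popup('Nairobi Karen, Langata', 1, 32000000))
```

The returned string can be passed straight to `folium.Popup(..., parse_html=True)` in place of the concatenated label.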



## Results

Overlaying the clustered properties on the choropleth map does more than show the spatial distribution of the properties on sale. Clicking a point displays the property's details (neighborhood, cluster and price), while the shading of the district underneath indicates the expected return on investment in that area, for the districts covered by the Cytonn report.
The map therefore gives a quick, concise overview of the data and makes it easy to compare properties on the fly.
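Reading a property's expected return off the map amounts to finding which district polygon contains the property. In code this is a point-in-polygon test; geopandas offers spatial joins for it, but the dependency-free sketch below uses the ray-casting rule instead. The square "Karen" polygon is hypothetical (real district boundaries come from the shapefile), the 5.7% is the Karen value from `mydata.head()`, and coordinates are (longitude, latitude) as in the shapefile's polygons. Note that the Cytonn table can hold several rows per location (different classes and unit types), so a real lookup would need to pick one.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: toggle 'inside' each time a ray from (x, y) to the right crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the horizontal line through y ...
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # ... and the crossing lies to the right of the point
            if x < x_cross:
                inside = not inside
    return inside

def expected_return(lon, lat, districts):
    """Return (district, return %) for the first district containing the point, else None."""
    for name, (polygon, pct) in districts.items():
        if point_in_polygon(lon, lat, polygon):
            return name, pct
    return None

# Hypothetical square district drawn around the Karen coordinates from the property listings
districts = {'Karen': ([(36.70, -1.38), (36.80, -1.38), (36.80, -1.28), (36.70, -1.28)], 5.7)}
print(expected_return(36.750342, -1.333349, districts))
```

A property outside every polygon (for example the Roysambu listing at 36.89256, -1.218227) returns `None`, which mirrors an unshaded area on the choropleth.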

## Conclusion

Merely looking at raw data is rarely enough to reveal the insights that matter in decision making; further analysis often uncovers what would otherwise stay hidden from the naked eye.
Putting data through an analytical process is essential if one is to draw sound conclusions from it.