<h1 align="center">The Battle of Neighborhoods</h1>

## Background

Real estate buyers searching for a new home always face big decisions. There are lots of factors effecting where to buy, such as the trend of price growth, local amenities and other factors which might make negative effects on the property value. London, the capital and largest city of England and the United Kingdom, is one of the world's most important global cities. London's population is about 9 million which accounts for 13.4% of the U.K. population. Also, London has a diverse range of people and full of different cultures. London attracts people from all over the world. According to the Guardian, all areas of the country in the U.K. recorded growth, with London prices increasing by 7.3%.



## Problem Description

How to decide which neighborhood in London the property is worth to buy? For those who want to buy real estate in London, it’s hard to know where to start.

## Taget Audience

This project aims to make an analysis of features for buyers who want to find and purchase property in London. The buyers can target at the features including median housing price, local private rental market and loacal facilities in each borough to decide where is the best neighborhood to buy.

## Datasets

The datasets will include the following data:
1. <b> London boroughs </b>
    <br> Source: https://en.wikipedia.org/wiki/List_of_London_boroughs
    - The data will be scraped from web url.
    - Selected columns: borough, population, coordinates
<br><br>
2. <b> Private Rental Market in London: January to December 2020 </b>
    <br> Source: https://www.ons.gov.uk/peoplepopulationandcommunity/housing/adhocs/12871privaterentalmarketinlondonjanuarytodecember2020
    - The data we will use is mean price.
    - Selected columns: Borough, Bedroom Category, Mean
<br><br>
3. <b> London Average House Prices </b>
    <br> Source: https://data.london.gov.uk/dataset/average-house-prices
    - The Year we will use is 2017.
    - Selected columns: Area, Year, Measure, Value
<br><br>
4. <b> Foursquare location data </b>
    <br> Source: https://foursquare.com

In [1]:
# For manipulation and web scraping
!pip install bs4
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

# For geocoding
!pip install geopy
from geopy.geocoders import Nominatim 

Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
Collecting beautifulsoup4
  Downloading beautifulsoup4-4.9.3-py3-none-any.whl (115 kB)
[K     |████████████████████████████████| 115 kB 4.8 MB/s eta 0:00:01
[?25hCollecting soupsieve>1.2
  Downloading soupsieve-2.2.1-py3-none-any.whl (33 kB)
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... [?25ldone
[?25h  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1273 sha256=6cce6f6778e6f7d22d0dc02808007ef96cdf5ac02bb3c5060cb65d87f6abbedf
  Stored in directory: /home/jovyan/.cache/pip/wheels/0a/9e/ba/20e5bbc1afef3a491f0b3bb74d508f99403aabe76eda2167ca
Successfully built bs4
Installing collected packages: soupsieve, beautifulsoup4, bs4
Successfully installed beautifulsoup4-4.9.3 bs4-0.0.1 soupsieve-2.2.1
Collecting geopy
  Downloading geopy-2.1.0-py3-none-any.whl (112 kB)
[K     |████████████████████████████████| 112 kB 4.5 MB/s eta 0:00:01
[?25hCollecting geographiclib<2,>=1.49
  Downlo

## Get London Boroughs Data

In [2]:
url_Boroughs = 'https://en.wikipedia.org/wiki/List_of_London_boroughs'
html_Boroughs = requests.get(url_Boroughs).text
soup_Boroughs = BeautifulSoup(html_Boroughs, 'html.parser')
table_Boroughs = soup_Boroughs.find('table')

In [3]:
Borough_data = pd.DataFrame(columns=["Borough", "Population", "Coordinates"])

#Get all rows from the table
for row in table_Boroughs.find_all('tr')[2:]: # in html table row is represented by the tag <tr>
    # Get all columns in each row.
    cols = row.find_all('td') # in html a column is represented by the tag <td>
    Borough = cols[0].text.replace("[note 2]", "").replace("[note 4]", "").strip()
    Population = cols[7].text .strip()
    Coordinates = cols[8].text.strip()
    Borough_data = Borough_data.append({"Borough":Borough, "Population":Population, "Coordinates":Coordinates}, ignore_index=True)
    
Borough_data.head()

Unnamed: 0,Borough,Population,Coordinates
0,Barnet,395896,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...
1,Bexley,248287,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...
2,Brent,329771,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...
3,Bromley,332336,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...
4,Camden,270029,51°31′44″N 0°07′32″W﻿ / ﻿51.5290°N 0.1255°W﻿ /...


We need to transform the coordinates into Latitude and Longitude for further mapping.

In [4]:
geolocator = Nominatim(user_agent="London_explorer")
Borough_data['Coordinate']= Borough_data['Borough'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
Borough_data[['Latitude', 'Longitude']] = Borough_data['Coordinate'].apply(pd.Series)
Borough_data = Borough_data.drop(['Coordinate','Coordinates'], axis=1)
Borough_data['Population'] = Borough_data['Population'].str.replace(",", "")
Borough_data

Unnamed: 0,Borough,Population,Latitude,Longitude
0,Barnet,395896,51.65309,-0.200226
1,Bexley,248287,39.969238,-82.936864
2,Brent,329771,32.937346,-87.164718
3,Bromley,332336,51.402805,0.014814
4,Camden,270029,39.94484,-75.119891
5,Croydon,386710,51.371305,-0.101957
6,Ealing,341806,51.512655,-0.305195
7,Enfield,333794,53.430836,-2.96091
8,Greenwich,287942,51.482084,-0.004542
9,Hackney,281120,51.54324,-0.049362


It was found the Latitude and Longitude of Enfield & Tower Hamlets were not corretly corresponding after mapping, so I go back to this stage and update the correct data.

In [5]:
# Enfield 51.654827, -0.083599
# Tower Hamlets 51.5167, -0.0500.
Borough_data.loc[Borough_data['Borough'] == 'Enfield', 'Latitude'] = 51.654827
Borough_data.loc[Borough_data['Borough'] == 'Enfield', 'Longitude'] = -0.083599
Borough_data.loc[Borough_data['Borough'] == 'Tower Hamlets', 'Latitude'] = 51.5167
Borough_data.loc[Borough_data['Borough'] == 'Tower Hamlets', 'Longitude'] = -0.0500

In [6]:
Borough_data.query('Borough == "Enfield" | Borough == "Tower Hamlets"')

Unnamed: 0,Borough,Population,Latitude,Longitude
7,Enfield,333794,51.654827,-0.083599
27,Tower Hamlets,324745,51.5167,-0.05


Now the coordinates are correct.

In [7]:
Borough_data.shape

(31, 4)

## Get London Rental Market Data

In [8]:
Rent_data = pd.read_csv('LondonRent.csv')
# Borough,Bedroom Category,Mean
Rent_data.rename(columns = {'Bedroom Category':'Category'}, inplace = True)
Rent_data.head(10)

Unnamed: 0,Borough,Category,Count of rents,Mean,Lower quartile,Median,Upper quartile
0,Barking and Dagenham,Room,20.0,859.0,542.0,692.0,792.0
1,Barking and Dagenham,Studio,10.0,792.0,750.0,800.0,850.0
2,Barking and Dagenham,One Bedroom,150.0,990.0,900.0,1000.0,1050.0
3,Barking and Dagenham,Two Bedrooms,220.0,1220.0,1150.0,1200.0,1300.0
4,Barking and Dagenham,Three Bedrooms,150.0,1438.0,1300.0,1400.0,1550.0
5,Barking and Dagenham,Four or More Bedrooms,30.0,1695.0,1600.0,1700.0,1850.0
6,Barnet,Studio,60.0,916.0,823.0,900.0,1000.0
7,Barnet,One Bedroom,380.0,1162.0,1050.0,1150.0,1250.0
8,Barnet,Two Bedrooms,770.0,1404.0,1300.0,1400.0,1500.0
9,Barnet,Three Bedrooms,260.0,1797.0,1603.0,1798.0,1950.0


We will only need the data of mean rent price, so drop the columns we don't need and get a summary.

In [9]:
del Rent_data['Count of rents']
Rent_data.drop(Rent_data.loc[:, 'Lower quartile':], inplace = True, axis = 1)
Rent_data = pd.pivot_table(Rent_data, values = 'Mean', index='Borough', columns = 'Category').reset_index()
Rent_data.head()

Category,Borough,Four or More Bedrooms,One Bedroom,Room,Studio,Three Bedrooms,Two Bedrooms
0,Barking and Dagenham,1695.0,990.0,859.0,792.0,1438.0,1220.0
1,Barnet,2529.0,1162.0,,916.0,1797.0,1404.0
2,Bexley,1758.0,853.0,495.0,698.0,1294.0,1112.0
3,Brent,2279.0,1176.0,621.0,904.0,1850.0,1452.0
4,Bromley,2321.0,1001.0,549.0,775.0,1589.0,1262.0


In [10]:
Rent_data.shape

(33, 7)

## Get London Average House Prices

In [11]:
House_data = pd.read_csv('land-registry-house-prices-borough.csv')
del House_data['Code']
House_data.rename(columns = {'Area':'Borough'}, inplace = True)
House_data.head()

Unnamed: 0,Borough,Year,Measure,Value
0,City of London,Year ending Dec 1995,Median,105000
1,Barking and Dagenham,Year ending Dec 1995,Median,49000
2,Barnet,Year ending Dec 1995,Median,85125
3,Bexley,Year ending Dec 1995,Median,62000
4,Brent,Year ending Dec 1995,Median,68000


We need to remove those marks and years we don't need. Make it a clean data. The year will be used is the latest data in 2017 and the measure we will use the mean measure.

In [12]:
House_data['Value'] = House_data['Value'].str.replace(",", "")
House_data = House_data.query('Year == "Year ending Dec 2017" & Measure == "Mean"').reset_index(drop=True)
House_data['Year'] = House_data['Year'].str[-4:]
House_data.rename(columns = {'Value':'2017_MeanPrice'}, inplace = True)
House_data.drop(House_data.loc[:, 'Year':'Measure'], inplace = True, axis = 1)
House_data.head()

Unnamed: 0,Borough,2017_MeanPrice
0,City of London,950760
1,Barking and Dagenham,301518
2,Barnet,667593
3,Bexley,357779
4,Brent,578705


In [13]:
House_data.shape

(45, 2)

## Merge Data

Hear we are going to produce three datasets
- data_merged_RENT: a full data set with Population, Latitude, Longitude and rent price.
- data_merged_SALE: a full data set with Population, Latitude, Longitude and sale price.
- data_merged_BOTH: a full data set with Population, Latitude, Longitude and both rent & sale price.

In [52]:
data_merged_RENT = Borough_data.merge(Rent_data, on='Borough')
data_merged_RENT.head()

Unnamed: 0,Borough,Population,Latitude,Longitude,Four or More Bedrooms,One Bedroom,Room,Studio,Three Bedrooms,Two Bedrooms
0,Barnet,395896,51.65309,-0.200226,2529.0,1162.0,,916.0,1797.0,1404.0
1,Bexley,248287,39.969238,-82.936864,1758.0,853.0,495.0,698.0,1294.0,1112.0
2,Brent,329771,32.937346,-87.164718,2279.0,1176.0,621.0,904.0,1850.0,1452.0
3,Bromley,332336,51.402805,0.014814,2321.0,1001.0,549.0,775.0,1589.0,1262.0
4,Camden,270029,39.94484,-75.119891,3796.0,1639.0,795.0,1072.0,2877.0,2264.0


In [53]:
data_merged_RENT.shape

(31, 10)

In [30]:
data_merged_SALE = Borough_data.merge(House_data, on='Borough')
data_merged_SALE.head()

Unnamed: 0,Borough,Population,Latitude,Longitude,2017_MeanPrice
0,Barnet,395896,51.65309,-0.200226,667593
1,Bexley,248287,39.969238,-82.936864,357779
2,Brent,329771,32.937346,-87.164718,578705
3,Bromley,332336,51.402805,0.014814,502623
4,Camden,270029,39.94484,-75.119891,1099876


In [17]:
data_merged_SALE.shape

(31, 5)

In [15]:
data_merged_BOTH = Borough_data.merge(House_data, on='Borough').merge(Rent_data, on='Borough')
data_merged_BOTH.head()

Unnamed: 0,Borough,Population,Latitude,Longitude,2017_MeanPrice,Four or More Bedrooms,One Bedroom,Room,Studio,Three Bedrooms,Two Bedrooms
0,Barnet,395896,51.65309,-0.200226,667593,2529.0,1162.0,,916.0,1797.0,1404.0
1,Bexley,248287,39.969238,-82.936864,357779,1758.0,853.0,495.0,698.0,1294.0,1112.0
2,Brent,329771,32.937346,-87.164718,578705,2279.0,1176.0,621.0,904.0,1850.0,1452.0
3,Bromley,332336,51.402805,0.014814,502623,2321.0,1001.0,549.0,775.0,1589.0,1262.0
4,Camden,270029,39.94484,-75.119891,1099876,3796.0,1639.0,795.0,1072.0,2877.0,2264.0


## Let's see the London map first

Install the library for mapping

In [16]:
! pip install folium==0.5.0
import folium # plotting library

Collecting folium==0.5.0
  Downloading folium-0.5.0.tar.gz (79 kB)
[K     |████████████████████████████████| 79 kB 3.2 MB/s eta 0:00:011
Building wheels for collected packages: folium
  Building wheel for folium (setup.py) ... [?25ldone
[?25h  Created wheel for folium: filename=folium-0.5.0-py3-none-any.whl size=76240 sha256=978be9fc0fa3da1b6cf05be815364431c9840442f8c533d2de37e8975cd479ab
  Stored in directory: /home/jovyan/.cache/pip/wheels/b2/2f/2c/109e446b990d663ea5ce9b078b5e7c1a9c45cca91f377080f8
Successfully built folium
Installing collected packages: folium
Successfully installed folium-0.5.0


In [17]:
address = 'London'
geolocator = Nominatim(user_agent="London_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of London are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of London are 51.5073219, -0.1276474.


In [18]:
# create map of Toronto using latitude and longitude values
map_London = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, population in zip(Borough_data['Latitude'], Borough_data['Longitude'], Borough_data['Borough'],Borough_data['Population']):
    label = 'Borough: {}, Population: {}'.format(borough, population)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 8,
        popup=label,
        color='#2F97C1',
        fill=True,
        fill_color='#2F97C1',
        fill_opacity=0.5,
        parse_html=False).add_to(map_London)  
    
map_London

## Get the Foursquare Location Data

Retreive Foursquare information.

In [19]:
CLIENT_ID = 'IEWFAFLUUSE0LTN5LUQOYARSU13XUZXNGLHL3LPZJTUUOJXG' # your Foursquare ID
CLIENT_SECRET = 'AWFX1LKFB2P030S134BWSAXOUHQSYCZAEBRJ2C1RZV15NPT0' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Get London venues

In [21]:
# type your answer here
London_venues = getNearbyVenues(
                                    names=Borough_data['Borough'],
                                    latitudes=Borough_data['Latitude'],
                                    longitudes=Borough_data['Longitude']
                                  )


Barnet
Bexley
Brent
Bromley
Camden
Croydon
Ealing
Enfield
Greenwich
Hackney
Hammersmith and Fulham
Haringey
Harrow
Havering
Hillingdon
Hounslow
Islington
Kensington and Chelsea
Kingston upon Thames
Lambeth
Lewisham
Merton
Newham
Redbridge
Richmond upon Thames
Southwark
Sutton
Tower Hamlets
Waltham Forest
Wandsworth
Westminster


In [22]:
print(London_venues.shape)
London_venues.head()

(1180, 7)


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Barnet,51.65309,-0.200226,Ye Old Mitre Inne,51.65294,-0.199507,Pub
1,Barnet,51.65309,-0.200226,Caffè Nero,51.654861,-0.201743,Coffee Shop
2,Barnet,51.65309,-0.200226,The Black Horse,51.653075,-0.206719,Pub
3,Barnet,51.65309,-0.200226,Waterstones,51.655368,-0.202607,Bookstore
4,Barnet,51.65309,-0.200226,Domino's Pizza,51.652675,-0.198837,Pizza Place


We can explore in which borough having the most of the venues.
<br> Let's see the top 5 borough.

In [23]:
London_venues_count = London_venues[['Borough','Venue']].groupby('Borough', as_index=False).count()
#London_venues_count.sort_values('Venue', ascending=False).head()
London_venues_count = London_venues_count.sort_values('Venue', ascending=False).head()
#",".join(London_venues_count['Borough'].unique())
print('Top 5 boroughs with the most venues are {}.'.format(", ".join(London_venues_count['Borough'].unique())))

Top 5 boroughs with the most venues are Southwark, Kingston upon Thames, Ealing, Hammersmith and Fulham, Islington.


In [24]:
# one hot encoding
venues_onehot = pd.get_dummies(London_venues[['Venue Category']], prefix="", prefix_sep="")

# add Borough column back to dataframe
venues_onehot['Borough'] = London_venues['Borough'] 

# move Borough column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])

#fixed_columns
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Barnet,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Barnet,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Barnet,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Barnet,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Barnet,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [27]:
london_groupedVenues = venues_onehot.groupby('Borough').mean().reset_index()
london_groupedVenues.head()

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Barnet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Brent,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Camden,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [28]:
london_groupedVenues.shape

(31, 213)

Now we have a dataset [london_groupedVenues] wtih vanues in each London borough. Then we will use this data to compare rent price, sale price for further clustering.

## Lond Venue Exploration

What are the top 5 venues/facilities nearby profitable real estate investments?

In [32]:
num_top_venues = 5

for hood in london_grouped['Borough']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Borough'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Barnet----
                  venue  freq
0           Coffee Shop  0.12
1                   Pub  0.09
2     Convenience Store  0.06
3  Fast Food Restaurant  0.06
4              Pharmacy  0.06


----Bexley----
                 venue  freq
0                 Park   1.0
1    Afghan Restaurant   0.0
2      Organic Grocery   0.0
3  Monument / Landmark   0.0
4        Movie Theater   0.0


----Brent----
               venue  freq
0   Business Service   0.5
1  Convenience Store   0.5
2  Afghan Restaurant   0.0
3  Outdoor Sculpture   0.0
4      Movie Theater   0.0


----Bromley----
                   venue  freq
0         Clothing Store  0.14
1            Coffee Shop  0.11
2   Gym / Fitness Center  0.07
3  Portuguese Restaurant  0.05
4                    Pub  0.05


----Camden----
                venue  freq
0            Pharmacy  0.09
1         Pizza Place  0.09
2  Chinese Restaurant  0.09
3          Restaurant  0.09
4         Coffee Shop  0.05


----Croydon----
                   venue  fre

In [33]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
borough_venues_sorted = pd.DataFrame(columns=columns)
borough_venues_sorted['Borough'] = london_groupedVenues['Borough']

for ind in np.arange(london_grouped.shape[0]):
    borough_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_groupedVenues.iloc[ind, :], num_top_venues)

borough_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barnet,Coffee Shop,Pub,Convenience Store,Fast Food Restaurant,Pharmacy,Restaurant,Park,Pizza Place,Modern European Restaurant,Sandwich Place
1,Bexley,Park,Afghan Restaurant,Organic Grocery,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,Newsagent
2,Brent,Business Service,Convenience Store,Afghan Restaurant,Outdoor Sculpture,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,Newsagent
3,Bromley,Clothing Store,Coffee Shop,Gym / Fitness Center,Portuguese Restaurant,Pub,Burger Joint,Pizza Place,Gelato Shop,Sushi Restaurant,Bookstore
4,Camden,Pharmacy,Pizza Place,Chinese Restaurant,Restaurant,Coffee Shop,Park,Toll Plaza,Fried Chicken Joint,Café,Bar


### A. Venues & Sales Price

In [39]:
london_SALE_Cluster = london_groupedVenues.merge(data_merged_SALE, on='Borough')
london_SALE_Cluster.drop(london_SALE_Cluster.loc[:, 'Latitude':'Longitude'], inplace = True, axis = 1)
london_SALE_Cluster.head()


Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,...,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Population,2017_MeanPrice
0,Barnet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,395896,667593
1,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,248287,357779
2,Brent,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,329771,578705
3,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,332336,502623
4,Camden,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,270029,1099876


In [40]:
!pip install -U scikit-learn
#!pip install bs4

Collecting scikit-learn
  Downloading scikit_learn-0.24.2-cp37-cp37m-manylinux2010_x86_64.whl (22.3 MB)
[K     |████████████████████████████████| 22.3 MB 4.2 MB/s eta 0:00:01
[?25hCollecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-2.1.0-py3-none-any.whl (12 kB)
Installing collected packages: threadpoolctl, scikit-learn
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 0.21.3
    Uninstalling scikit-learn-0.21.3:
      Successfully uninstalled scikit-learn-0.21.3
Successfully installed scikit-learn-0.24.2 threadpoolctl-2.1.0


In [41]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

## Clustering with House Price

In [42]:
# set number of clusters
kclusters = 5

london_SALE_Clustering = london_SALE_Cluster.drop('Borough', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_SALE_Clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50] 

array([1, 4, 1, 4, 2, 4, 1, 4, 4, 1, 2, 1, 1, 4, 4, 4, 2, 0, 1, 1, 4, 1,
       4, 4, 2, 1, 4, 1, 4, 2, 3], dtype=int32)

In [43]:
borough_venues_sorted_SALE = borough_venues_sorted.merge(data_merged_SALE, on='Borough')

# add clustering labels
borough_venues_sorted_SALE['Cluster Labels'] = kmeans.labels_

borough_venues_sorted_SALE.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population,Latitude,Longitude,2017_MeanPrice,Cluster Labels
0,Barnet,Coffee Shop,Pub,Convenience Store,Fast Food Restaurant,Pharmacy,Restaurant,Park,Pizza Place,Modern European Restaurant,Sandwich Place,395896,51.65309,-0.200226,667593,1
1,Bexley,Park,Afghan Restaurant,Organic Grocery,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,Newsagent,248287,39.969238,-82.936864,357779,4
2,Brent,Business Service,Convenience Store,Afghan Restaurant,Outdoor Sculpture,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,Newsagent,329771,32.937346,-87.164718,578705,1
3,Bromley,Clothing Store,Coffee Shop,Gym / Fitness Center,Portuguese Restaurant,Pub,Burger Joint,Pizza Place,Gelato Shop,Sushi Restaurant,Bookstore,332336,51.402805,0.014814,502623,4
4,Camden,Pharmacy,Pizza Place,Chinese Restaurant,Restaurant,Coffee Shop,Park,Toll Plaza,Fried Chicken Joint,Café,Bar,270029,39.94484,-75.119891,1099876,2


In [44]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [79]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)
# add title
loc = 'Cluster with Sale Price'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(loc) 

    

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, price in zip(borough_venues_sorted_SALE['Latitude'], borough_venues_sorted_SALE['Longitude'], borough_venues_sorted_SALE['Borough'], borough_venues_sorted_SALE['Cluster Labels'], borough_venues_sorted_SALE['2017_MeanPrice']):
    label = folium.Popup(str(poi) + ' Cluster: ' + str(cluster)  + ' Mean House Price: ' + str(price), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color= rainbow[cluster-1],
        fill=True,
        fill_color= rainbow[cluster-1],
        fill_opacity=0.8).add_to(map_clusters)
       
map_clusters.get_root().html.add_child(folium.Element(title_html))

map_clusters

In [46]:
borough_venues_sorted_SALE.loc[borough_venues_sorted_SALE['Cluster Labels'] == 0, borough_venues_sorted_SALE.columns[[1] + list(range(5, borough_venues_sorted_SALE.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population,Latitude,Longitude,2017_MeanPrice,Cluster Labels
17,Bakery,Burger Joint,French Restaurant,Park,English Restaurant,Pizza Place,Plaza,156129,51.487517,-0.168701,2092485,0


In [47]:
borough_venues_sorted_SALE.loc[borough_venues_sorted_SALE['Cluster Labels'] == 1, borough_venues_sorted_SALE.columns[[1] + list(range(5, borough_venues_sorted_SALE.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population,Latitude,Longitude,2017_MeanPrice,Cluster Labels
0,Coffee Shop,Pharmacy,Restaurant,Park,Pizza Place,Modern European Restaurant,Sandwich Place,395896,51.65309,-0.200226,667593,1
2,Business Service,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,Newsagent,329771,32.937346,-87.164718,578705,1
6,Coffee Shop,Platform,Bus Stop,Grocery Store,Bakery,Italian Restaurant,Pizza Place,341806,51.512655,-0.305195,578110,1
9,Pub,Brewery,Garden,Movie Theater,Sporting Goods Shop,Butcher,Yoga Studio,281120,51.54324,-0.049362,614955,1
11,Café,Convenience Store,Bus Stop,Fast Food Restaurant,Bulgarian Restaurant,Sandwich Place,Middle Eastern Restaurant,268647,51.58793,-0.10541,683987,1
12,Afghan Restaurant,Kitchen Supply Store,Indian Restaurant,Pizza Place,Optical Shop,Movie Theater,Playground,251160,51.596827,-0.337305,527206,1
18,Coffee Shop,Italian Restaurant,Sandwich Place,Department Store,Hotel,Ice Cream Shop,Stationery Store,177507,51.409627,-0.306262,573938,1
19,Coffee Shop,Korean Restaurant,Beer Bar,Park,Event Space,Sandwich Place,Bakery,326034,51.501301,-0.117287,616126,1
21,Tram Station,Farm,Thai Restaurant,Flea Market,Park,Hardware Store,Sushi Restaurant,206548,51.41087,-0.188097,638519,1
25,Hotel,Bar,Café,Burger Joint,Bakery,Chinese Restaurant,Ramen Restaurant,318830,51.502922,-0.103458,641210,1


In [48]:
borough_venues_sorted_SALE.loc[borough_venues_sorted_SALE['Cluster Labels'] == 2, borough_venues_sorted_SALE.columns[[1] + list(range(5, borough_venues_sorted_SALE.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population,Latitude,Longitude,2017_MeanPrice,Cluster Labels
4,Pharmacy,Coffee Shop,Park,Toll Plaza,Fried Chicken Joint,Café,Bar,270029,39.94484,-75.119891,1099876,2
10,Pub,Grocery Store,Pizza Place,Pharmacy,Gym / Fitness Center,Sandwich Place,Burger Joint,185143,51.492038,-0.22364,972231,2
16,Pub,Bakery,French Restaurant,Gastropub,Ice Cream Shop,Breakfast Spot,Bookstore,242467,51.538429,-0.099905,778290,2
24,Bus Station,Afghan Restaurant,Modern European Restaurant,Movie Theater,Multiplex,Museum,Music Venue,198019,51.440553,-0.307639,819044,2
29,Pub,Asian Restaurant,Clothing Store,Burger Joint,Stationery Store,Café,Sporting Goods Shop,329677,51.457027,-0.193261,818443,2


In [49]:
borough_venues_sorted_SALE.loc[borough_venues_sorted_SALE['Cluster Labels'] == 3, borough_venues_sorted_SALE.columns[[1] + list(range(5, borough_venues_sorted_SALE.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population,Latitude,Longitude,2017_MeanPrice,Cluster Labels
30,Pub,Sandwich Place,Outdoor Sculpture,Café,Monument / Landmark,Hotel,Garden,261317,51.500444,-0.12654,1718124,3


In [50]:
borough_venues_sorted_SALE.loc[borough_venues_sorted_SALE['Cluster Labels'] == 4, borough_venues_sorted_SALE.columns[[1] + list(range(5, borough_venues_sorted_SALE.shape[1]))]]

Unnamed: 0,1st Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Population,Latitude,Longitude,2017_MeanPrice,Cluster Labels
1,Park,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,Newsagent,248287,39.969238,-82.936864,357779,4
3,Clothing Store,Pub,Burger Joint,Pizza Place,Gelato Shop,Sushi Restaurant,Bookstore,332336,51.402805,0.014814,502623,4
5,Pub,Malay Restaurant,Clothing Store,Tram Station,Caribbean Restaurant,Korean Restaurant,Gaming Cafe,386710,51.371305,-0.101957,399645,4
7,Pub,Optical Shop,Pharmacy,Department Store,Video Game Store,Newsagent,Shopping Mall,333794,51.654827,-0.083599,463806,4
8,Pub,Pizza Place,Garden,Grocery Store,Burger Joint,History Museum,Market,287942,51.482084,-0.004542,462820,4
13,Café,Supermarket,Market,Grocery Store,Flea Market,Cocktail Bar,Salon / Barbershop,259552,51.544095,-0.144329,387535,4
14,Pub,Organic Grocery,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,306870,51.542519,-0.448335,452272,4
15,Coffee Shop,Indian Restaurant,Hotel,Chinese Restaurant,Sandwich Place,Pharmacy,Bakery,271523,51.468613,-0.361347,507876,4
20,Clothing Store,Fast Food Restaurant,Restaurant,Sporting Goods Shop,Grocery Store,Gym,Café,305842,51.462432,-0.010133,475142,4
22,Pub,Outdoor Sculpture,Multiplex,Museum,Music Venue,New American Restaurant,Newsagent,353134,51.53,0.029318,409477,4


### B. Venues & Rent Price

In [55]:
london_RENT_Cluster = london_groupedVenues.merge(data_merged_RENT, on='Borough')
london_RENT_Cluster.drop(london_RENT_Cluster.loc[:, 'Latitude':'Longitude'], inplace = True, axis = 1)
london_RENT_Cluster.head()

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,...,Wings Joint,Women's Store,Yoga Studio,Population,Four or More Bedrooms,One Bedroom,Room,Studio,Three Bedrooms,Two Bedrooms
0,Barnet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,395896,2529.0,1162.0,,916.0,1797.0,1404.0
1,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,248287,1758.0,853.0,495.0,698.0,1294.0,1112.0
2,Brent,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,329771,2279.0,1176.0,621.0,904.0,1850.0,1452.0
3,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,332336,2321.0,1001.0,549.0,775.0,1589.0,1262.0
4,Camden,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,270029,3796.0,1639.0,795.0,1072.0,2877.0,2264.0


### Cluster with Rent price

In [66]:
# set number of clusters
kclusters = 5

london_RENT_Clustering = london_RENT_Cluster.drop('Borough', 1).fillna(0)

# run k-means clustering
kmeansR = KMeans(n_clusters=kclusters, random_state=0).fit(london_RENT_Clustering)

# check cluster labels generated for each row in the dataframe
kmeansR.labels_[0:50] 


array([4, 1, 0, 0, 1, 4, 0, 0, 3, 3, 2, 1, 1, 1, 3, 1, 1, 2, 2, 0, 3, 2,
       0, 3, 2, 0, 2, 0, 1, 0, 1], dtype=int32)

In [67]:
borough_venues_sorted_RENT = borough_venues_sorted.merge(data_merged_RENT, on='Borough')

# add clustering labels
borough_venues_sorted_RENT['Cluster Labels'] = kmeansR.labels_

borough_venues_sorted_RENT.head()


Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,Population,Latitude,Longitude,Four or More Bedrooms,One Bedroom,Room,Studio,Three Bedrooms,Two Bedrooms,Cluster Labels
0,Barnet,Coffee Shop,Pub,Convenience Store,Fast Food Restaurant,Pharmacy,Restaurant,Park,Pizza Place,Modern European Restaurant,...,395896,51.65309,-0.200226,2529.0,1162.0,,916.0,1797.0,1404.0,4
1,Bexley,Park,Afghan Restaurant,Organic Grocery,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,...,248287,39.969238,-82.936864,1758.0,853.0,495.0,698.0,1294.0,1112.0,1
2,Brent,Business Service,Convenience Store,Afghan Restaurant,Outdoor Sculpture,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,...,329771,32.937346,-87.164718,2279.0,1176.0,621.0,904.0,1850.0,1452.0,0
3,Bromley,Clothing Store,Coffee Shop,Gym / Fitness Center,Portuguese Restaurant,Pub,Burger Joint,Pizza Place,Gelato Shop,Sushi Restaurant,...,332336,51.402805,0.014814,2321.0,1001.0,549.0,775.0,1589.0,1262.0,0
4,Camden,Pharmacy,Pizza Place,Chinese Restaurant,Restaurant,Coffee Shop,Park,Toll Plaza,Fried Chicken Joint,Café,...,270029,39.94484,-75.119891,3796.0,1639.0,795.0,1072.0,2877.0,2264.0,1


In [74]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)
# add title
loc = 'Cluster with Rent'
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(loc)   

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(borough_venues_sorted_RENT['Latitude'], borough_venues_sorted_RENT['Longitude'], borough_venues_sorted_SALE['Borough'], borough_venues_sorted_RENT['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster: ' + str(cluster) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color= rainbow[cluster-1],
        fill=True,
        fill_color= rainbow[cluster-1],
        fill_opacity=0.8).add_to(map_clusters)
       
map_clusters.get_root().html.add_child(folium.Element(title_html))
map_clusters

### C. Venues with Sales & Rent Price

In [62]:
london_BOTH_Cluster = london_groupedVenues.merge(data_merged_BOTH, on='Borough')
london_BOTH_Cluster.drop(london_BOTH_Cluster.loc[:, 'Latitude':'Longitude'], inplace = True, axis = 1)
london_BOTH_Cluster.head()

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,...,Women's Store,Yoga Studio,Population,2017_MeanPrice,Four or More Bedrooms,One Bedroom,Room,Studio,Three Bedrooms,Two Bedrooms
0,Barnet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,395896,667593,2529.0,1162.0,,916.0,1797.0,1404.0
1,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,248287,357779,1758.0,853.0,495.0,698.0,1294.0,1112.0
2,Brent,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,329771,578705,2279.0,1176.0,621.0,904.0,1850.0,1452.0
3,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,332336,502623,2321.0,1001.0,549.0,775.0,1589.0,1262.0
4,Camden,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,270029,1099876,3796.0,1639.0,795.0,1072.0,2877.0,2264.0


### Cluster with Sales & Rent price

In [69]:
# set number of clusters
kclusters = 5

london_BOTH_Clustering = london_BOTH_Cluster.drop('Borough', 1).fillna(0)

# run k-means clustering
kmeansB = KMeans(n_clusters=kclusters, random_state=0).fit(london_BOTH_Clustering)

# check cluster labels generated for each row in the dataframe
kmeansB.labels_[0:50] 


array([1, 4, 1, 4, 2, 4, 1, 4, 4, 1, 2, 1, 1, 4, 4, 4, 2, 0, 1, 1, 4, 1,
       4, 4, 2, 1, 4, 1, 4, 2, 3], dtype=int32)

In [70]:
borough_venues_sorted_BOTH = borough_venues_sorted.merge(data_merged_BOTH, on='Borough')

# add clustering labels
borough_venues_sorted_BOTH['Cluster Labels'] = kmeansB.labels_

borough_venues_sorted_BOTH.head()


Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,Latitude,Longitude,2017_MeanPrice,Four or More Bedrooms,One Bedroom,Room,Studio,Three Bedrooms,Two Bedrooms,Cluster Labels
0,Barnet,Coffee Shop,Pub,Convenience Store,Fast Food Restaurant,Pharmacy,Restaurant,Park,Pizza Place,Modern European Restaurant,...,51.65309,-0.200226,667593,2529.0,1162.0,,916.0,1797.0,1404.0,1
1,Bexley,Park,Afghan Restaurant,Organic Grocery,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,...,39.969238,-82.936864,357779,1758.0,853.0,495.0,698.0,1294.0,1112.0,4
2,Brent,Business Service,Convenience Store,Afghan Restaurant,Outdoor Sculpture,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,...,32.937346,-87.164718,578705,2279.0,1176.0,621.0,904.0,1850.0,1452.0,1
3,Bromley,Clothing Store,Coffee Shop,Gym / Fitness Center,Portuguese Restaurant,Pub,Burger Joint,Pizza Place,Gelato Shop,Sushi Restaurant,...,51.402805,0.014814,502623,2321.0,1001.0,549.0,775.0,1589.0,1262.0,4
4,Camden,Pharmacy,Pizza Place,Chinese Restaurant,Restaurant,Coffee Shop,Park,Toll Plaza,Fried Chicken Joint,Café,...,39.94484,-75.119891,1099876,3796.0,1639.0,795.0,1072.0,2877.0,2264.0,2


In [73]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)
# add title
loc = 'Cluster with Rent & Sales '
title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(loc)   

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(borough_venues_sorted_BOTH['Latitude'], borough_venues_sorted_BOTH['Longitude'], borough_venues_sorted_BOTH['Borough'], borough_venues_sorted_BOTH['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster: ' + str(cluster) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        #color= rainbow[int(cluster)-1],
        color= rainbow[cluster-1],
        fill=True,
        #fill_color= rainbow[int(cluster)-1],
        fill_color= rainbow[cluster-1],
        fill_opacity=0.8).add_to(map_clusters)
       
map_clusters.get_root().html.add_child(folium.Element(title_html))
map_clusters