# Capstone Project - The Battle of Neighborhoods (Week 1)
### by Rishi Kumar Nursimloo

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis and Results](#analysis)
* [Discussion](#results)
* [Conclusion](#conclusion)

## 1. Introduction: Business Problem
#### A description of the problem and a discussion of the background

A client is looking to start up his own start-up business within the technology sector. He resides in the United Kingdom and is looking for recommendations i.e. attractive hotspots to find and set up his office. Money does not appear to be an issue and therefore is very flexible in terms of cost to purchase an office. Since London is the largest and fasting growing tech hub in the UK, we will look within this region.

In this data science project we will focus our efforts on detecting areas of London which have a high amount of tech businesses as "startups in London benefit from high levels of global connectedness to other top ecosystems, and a flow of knowledge into the city which helps them to build global market reach. London-based companies can tap into global markets due to the UK’s favourable time zone, with the city acting as a launchpad to scale and grow internationally". More information can be seen here https://media.londonandpartners.com/news/london-named-one-of-the-worlds-leading-startup-ecosystems. 

## 2. Data
#### A description of the data and how it will be used to solve the problem

We shall be using location data via Wikipedia (United Kingdom Postcodes) and FourSquare API to solve this problem. We will gain the Longitude and latitude of each postcode, utilie the geopy library to get the latitude and longitude values of London, UK. Together, by leveraging FourSquare API, allow us to explore, segment and cluster neighbourhoods in London, UK and look at other interesting insights such as unique categories and the frequency of occurrence of each category. We could look into other datasets and take this as far as we want in terms of deep diving into the respective area e.g. crime rates if the client wanted to look at safety of boroughs, but for the time being this will be more than sufficient.

## 3. Methodology

#### Purpose of the methdology are the major steps in the following:
####   - Tackling a data science problem.
####   - Practicing data science, from forming a concrete business or research problem, to collecting and analyzing data, to building a model, and understanding the feedback after model deployment.

The steps in order to generate key insights for the client is as follows:

1) You could webscrape UK postcodes, which I did in a previous course assignment (via Wikipedia), but I have found all 32 London Boroughs online, placed it into an Excel spreadsheet and cleaned up the data. The data can be accessed here https://en.wikipedia.org/wiki/List_of_London_boroughs 
We then imported the folium library to visualise geographic details of London and its boroughs.

2) I used Foursquare API to explore the boroughs and segment them. I set 100 as the limit number of venues returned and defined the radius as 500 metres for each borough as defined by their given latitude and longitude data.

3) Find out the unique categories by Foursquare and explore the top 10 most common venues for London and its boroughs, which includes utilising one hot encoding to encode categorical features as a one-hot numeric array. We then group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

4) We have some common venue categories in boroughs and therefore utilised k-means algorithm (unsupervised learning) to cluster the neighborhood into clusters.

5) Visualise the resulting clusters using a folium map

## 4. Analysis/Results

### 4.1. Data Pre-processing
#### Importing all of the crucial libraries

In [1]:
# Firstly I'm importing the libraries neccessary for analysis
import pandas as pd
import numpy as np
import requests
import csv
from bs4 import BeautifulSoup
import urllib.request
import xlrd
import sys
import os
import openpyxl
from openpyxl import Workbook
from comtypes.client import CreateObject
import csv
import json # library to handle JSON files
!conda install -c conda-forge geopy --yes
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



#### Importing/Reading the dataset

In [2]:
# Reading the Excel file which I copied into excel and downloaded as an xlsx file (this would be easily web scraped)

df = pd.read_excel("londonboroughs.xlsx")
pd.set_option('display.max_columns', None) # this will show all columns in the dataframe instead of the "..."
df.head(2)

Unnamed: 0,Borough,Local authority,Headquarters,Population (2013 est)[1],Latitude,Longitude
0,Barking and Dagenham,Barking and Dagenham London Borough Council,"Town Hall, 1 Town Square",194352,51.5607,0.1557
1,Barnet,Barnet London Borough Council,"Barnet House, 2 Bristol Avenue, Colindale",369088,51.6252,0.1517


#### Identifying and handling potential missing values

In [3]:
# Checking if the dataframe has NaN values
df.isnull

<bound method DataFrame.isnull of                    Borough                                Local authority  \
0     Barking and Dagenham    Barking and Dagenham London Borough Council   
1                   Barnet                  Barnet London Borough Council   
2                   Bexley                  Bexley London Borough Council   
3                    Brent                   Brent London Borough Council   
4                  Bromley                 Bromley London Borough Council   
5                   Camden                  Camden London Borough Council   
6                  Croydon                 Croydon London Borough Council   
7                   Ealing                  Ealing London Borough Council   
8                  Enfield                 Enfield London Borough Council   
9                Greenwich               Greenwich London Borough Council   
10                 Hackney                 Hackney London Borough Council   
11  Hammersmith and Fulham  Hammersmith an

#### Removing unneccesary columns not required for analysis

In [4]:
df = df.drop(df.columns[[1,2]], axis=1) # Dropping Local authority (Index 1) and Headquarters (Index 2)
pd.set_option('display.max_columns', None)
df.head(2)

Unnamed: 0,Borough,Population (2013 est)[1],Latitude,Longitude
0,Barking and Dagenham,194352,51.5607,0.1557
1,Barnet,369088,51.6252,0.1517


#### Obtrain Longitude and Latitude values of London, United Kingdon using Geopy library 

In [5]:
# Defining our user_agent as tr_explorer

address = 'London, UK'

geolocator = Nominatim(user_agent="tr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical co-ordinates of London, United Kingdom are {}, {}.'.format(latitude, longitude))

The geograpical co-ordinates of London, United Kingdom are 51.5073219, -0.1276474.


#### Create map of London, United Kingdom using Folium visualisation library

In [6]:
# create map of London,  using latitude and longitude values
map_London = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough in zip(df['Latitude'], df['Longitude'], df['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_London)
    
map_London

#### Utilising Foursquare API to explore the boroughs in London, United Kingdom and segment them

In [7]:
# Credentials

CLIENT_SECRET = 'JUKO0QBVGYWUJIJ3XPVS000Z52AWZB5RE4ITVHOQE41EYKDI'
VERSION = '20200630'
CLIENT_ID = 'GK0YVQBTQIUAN1NCHDWQ3YB0VEOZLATK04CJJBKHNKB30ZVV'

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: GK0YVQBTQIUAN1NCHDWQ3YB0VEOZLATK04CJJBKHNKB30ZVV
CLIENT_SECRET:JUKO0QBVGYWUJIJ3XPVS000Z52AWZB5RE4ITVHOQE41EYKDI


In [8]:
df.loc[0, 'Borough']

'Barking and Dagenham'

In [9]:
# Get the above Boroughs Longitude and Latitude value

neighborhood_latitude = df.loc[0, 'Latitude']
neighborhood_longitude = df.loc[0, 'Longitude'] 

neighborhood_name = df.loc[0, 'Borough'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Barking and Dagenham are 51.5607, 0.1557.


#### Top 100 venues that are in Barking and Dagenham within a radius of 500 meters.

In [10]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=GK0YVQBTQIUAN1NCHDWQ3YB0VEOZLATK04CJJBKHNKB30ZVV&client_secret=JUKO0QBVGYWUJIJ3XPVS000Z52AWZB5RE4ITVHOQE41EYKDI&v=20200630&ll=51.5607,0.1557&radius=500&limit=100'

In [11]:
## Send the GET request and examine the results
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f21974851d07f34bc29a44a'},
 'response': {'headerLocation': 'Heath',
  'headerFullLocation': 'Heath, London',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 7,
  'suggestedBounds': {'ne': {'lat': 51.5652000045, 'lng': 0.162924882650929},
   'sw': {'lat': 51.556199995499995, 'lng': 0.148475117349071}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ac518f8f964a520d6af20e3',
       'name': 'Central Park',
       'location': {'address': 'Wood Ln.',
        'crossStreet': 'Rainham Rd. N.',
        'lat': 51.559560186523925,
        'lng': 0.16198065419715413,
        'labeledLatLngs': [{'label': 'display',
          'lat': 51.559560186523925,
          'lng': 0.16198065419715413}],
        'distance': 452,
        'pos

#### All the information is in the items key. Before we proceed, let's borrow the get_category_type function from the Foursquare lab

In [12]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [13]:
### Below we clean the json and structure it into a pandas dataframe

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Central Park,Park,51.55956,0.161981
1,Crowlands Heath Golf Course,Golf Course,51.562457,0.155818
2,Robert Clack Leisure Centre,Martial Arts Dojo,51.560808,0.152704
3,Beacontree Heath Leisure Centre,Gym / Fitness Center,51.560997,0.148932
4,Becontree Heath Bus Station,Bus Station,51.561065,0.150998


#### Creating a function to repeat the same process to all the Boroughs in London

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [15]:
### Below is the code to run the above function on each neighborhood and create a new dataframe called London_venues.

London_venues = getNearbyVenues(names=df['Borough'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

print(London_venues.shape)
London_venues.head()

Barking and Dagenham
Barnet
Bexley
Brent
Bromley
Camden
Croydon
Ealing
Enfield
Greenwich
Hackney
Hammersmith and Fulham
Haringey
Harrow
Havering
Hillingdon
Hounslow
Islington
Kensington and Chelsea
Kingston upon Thames
Lambeth
Lewisham
Merton
Newham
Redbridge
Richmond upon Thames
Southwark
Sutton
Tower Hamlets
Waltham Forest
Wandsworth
Westminster
(289, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Barking and Dagenham,51.5607,0.1557,Central Park,51.55956,0.161981,Park
1,Barking and Dagenham,51.5607,0.1557,Crowlands Heath Golf Course,51.562457,0.155818,Golf Course
2,Barking and Dagenham,51.5607,0.1557,Robert Clack Leisure Centre,51.560808,0.152704,Martial Arts Dojo
3,Barking and Dagenham,51.5607,0.1557,Beacontree Heath Leisure Centre,51.560997,0.148932,Gym / Fitness Center
4,Barking and Dagenham,51.5607,0.1557,Becontree Heath Bus Station,51.561065,0.150998,Bus Station


In [16]:
### Checking how many venues were returned for each borough

London_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Barking and Dagenham,7,7,7,7,7,7
Bexley,27,27,27,27,27,27
Brent,2,2,2,2,2,2
Bromley,39,39,39,39,39,39
Camden,4,4,4,4,4,4
Croydon,7,7,7,7,7,7
Ealing,2,2,2,2,2,2
Enfield,5,5,5,5,5,5
Greenwich,42,42,42,42,42,42
Hackney,7,7,7,7,7,7


#### Analysing each borough in London, United Kingdom

In [17]:
# one hot encoding
London_onehot = pd.get_dummies(London_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
London_onehot['Neighborhood'] = London_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [London_onehot.columns[-1]] + list(London_onehot.columns[:-1])
London_onehot = London_onehot[fixed_columns]

London_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,Airport,Airport Lounge,Airport Service,American Restaurant,Asian Restaurant,Bakery,Bar,Bookstore,Boutique,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Electronics Store,English Restaurant,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Furniture / Home Store,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Home Service,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Light Rail Station,Lighthouse,Lingerie Store,Martial Arts Dojo,Metro Station,Movie Theater,Multiplex,Music Store,Nature Preserve,Outdoor Sculpture,Park,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Polish Restaurant,Pool,Portuguese Restaurant,Pub,Rafting,Restaurant,Rugby Pitch,Sandwich Place,Shopping Mall,Skate Park,Soccer Stadium,Sporting Goods Shop,Stables,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Thai Restaurant,Theater,Train Station,Turkish Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store
0,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [18]:
### Grouping rows by borough and by taking the mean of the frequency of occurrence of each category

London_grouped = London_onehot.groupby('Neighborhood').mean().reset_index()
London_grouped

Unnamed: 0,Neighborhood,African Restaurant,Airport,Airport Lounge,Airport Service,American Restaurant,Asian Restaurant,Bakery,Bar,Bookstore,Boutique,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Electronics Store,English Restaurant,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Furniture / Home Store,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Home Service,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Light Rail Station,Lighthouse,Lingerie Store,Martial Arts Dojo,Metro Station,Movie Theater,Multiplex,Music Store,Nature Preserve,Outdoor Sculpture,Park,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Polish Restaurant,Pool,Portuguese Restaurant,Pub,Rafting,Restaurant,Rugby Pitch,Sandwich Place,Shopping Mall,Skate Park,Soccer Stadium,Sporting Goods Shop,Stables,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Thai Restaurant,Theater,Train Station,Turkish Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store
0,Barking and Dagenham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bexley,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.037037,0.111111,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037
2,Brent,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bromley,0.0,0.0,0.0,0.0,0.0,0.025641,0.025641,0.051282,0.025641,0.0,0.0,0.0,0.051282,0.025641,0.0,0.0,0.0,0.025641,0.0,0.025641,0.128205,0.128205,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.0,0.025641,0.025641,0.025641,0.025641,0.0,0.0,0.025641,0.0,0.0,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.025641,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.025641,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Camden,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Croydon,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Ealing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Enfield,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Greenwich,0.02381,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.02381,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.071429,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.071429,0.0,0.02381,0.02381,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.047619,0.0,0.02381,0.0,0.0,0.047619,0.0,0.0,0.02381,0.071429,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.071429,0.0,0.02381,0.0,0.0,0.0,0.0,0.02381,0.02381
9,Hackney,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.428571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0


In [19]:
### Top 10 most common venues in ALL BOROUGHS IN LONDON

num_top_venues = 10

for hood in London_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = London_grouped[London_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Barking and Dagenham----
                  venue  freq
0                  Pool  0.14
1     Martial Arts Dojo  0.14
2  Gym / Fitness Center  0.14
3           Supermarket  0.14
4           Golf Course  0.14
5                  Park  0.14
6           Bus Station  0.14
7       Nature Preserve  0.00
8                 Plaza  0.00
9            Playground  0.00


----Bexley----
                    venue  freq
0                     Pub  0.11
1             Coffee Shop  0.11
2          Clothing Store  0.11
3                Pharmacy  0.07
4    Fast Food Restaurant  0.07
5             Supermarket  0.07
6         Warehouse Store  0.04
7      Chinese Restaurant  0.04
8                   Hotel  0.04
9  Furniture / Home Store  0.04


----Brent----
                venue  freq
0                 Pub   0.5
1         Golf Course   0.5
2  African Restaurant   0.0
3         Music Store   0.0
4          Playground   0.0
5            Platform   0.0
6         Pizza Place   0.0
7                Pier   0.0
8   

In [20]:
# Function to sort out the venues in descending order

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [21]:
### Creating the new dataframe and display the top 10 venues for each neighborhood

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = London_grouped['Neighborhood']

for ind in np.arange(London_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(London_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,Gym / Fitness Center,Pool,Golf Course,Supermarket,Park,Martial Arts Dojo,Bus Station,Warehouse Store,Discount Store,Cosmetics Shop
1,Bexley,Coffee Shop,Clothing Store,Pub,Fast Food Restaurant,Pharmacy,Supermarket,Warehouse Store,Sandwich Place,Hotel,Furniture / Home Store
2,Brent,Pub,Golf Course,Warehouse Store,English Restaurant,Cosmetics Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Discount Store
3,Bromley,Coffee Shop,Clothing Store,Burger Joint,Pizza Place,Gym / Fitness Center,Bar,Burrito Place,Cosmetics Shop,Donut Shop,Chocolate Shop
4,Camden,Gym,Business Service,Rugby Pitch,Skate Park,Warehouse Store,Donut Shop,Cosmetics Shop,Currency Exchange,Department Store,Dessert Shop


### 4.2. Clustering Boroughs using K-Means unsupervised clustering algorithm 

In [22]:
# set number of clusters
kclusters = 5

London_grouped_clustering = London_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(London_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 3, 0, 0, 0, 0, 0, 0, 0])

In [25]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_).reset_index(drop = True)

ValueError: cannot insert Cluster Labels, already exists

In [54]:
London_merged = London_data

# merge London_grouped with London_data to add latitude/longitude for each neighborhood
London_merged = London_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood', how='right')

London_merged.head()

NameError: name 'London_data' is not defined

In [53]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(London_merged['Latitude'], London_merged['Longitude'], London_merged['Neighborhood'], London_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

KeyError: 'Latitude'

In [34]:
 %whos # Checking all the variables as 

No variables match your requested type.


## 5. Discussion

As shown throughout the project, we have used Python to collect, clean and analyse crucial open source data, together with the utilisation of Foursquare API, neccessary in order to find the top 10 common venues, (more specifically within business services and easy access to food & drink) within all London boroughs to give the client an opportunity to view which places in London would yield the greatest return in traction of future clients, local networks and employee attraction.
 
Given the deadline on this course, in conjunction with work and life constraints, this has lmpacted the time spent on this project and therefore would not give the entire picture, although the results of the analysis were crucial. If we had extra time, could potentially explore/deep-dive into the following:

1) Crime data - are there areas we should avoid e.g. high security in Canary Wharf? Is there a correlation between tech business areas and crime rates and types of crime?

2) Deep dive into the businesses of each borough, as we are looking for tech related businesses not businesses in general.  
 
2) Customer/Client reviews - Webscrape rental office data/reviews e.g. tripadvisor, to find out quality of office space and residents within the building?

3) Effect of COVID-19 on rental prices - Does the current pandemic have a strong positive correlation on pricing within rental offices?

4) Given Number 3, what would be the average cost per employee to set up from home, versus rental of office?

## 6. Conclusion

Utilising open source data alongside Foursquare API has given us a tremendous insight into highlighting venues in the London Borough which are attractive and promising to setting up a start up business. Further analysis (and therefore manpower hours) will need to be conducted to give the overall landscape of the client end goal, especially top venues within the tech industry. However the current findings are extremely useful so far and sets the stage for a very promising (and potential) business plan.

In [43]:
neighborhoods_venues_sorted

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Barking and Dagenham,Gym / Fitness Center,Pool,Golf Course,Supermarket,Park,Martial Arts Dojo,Bus Station,Warehouse Store,Discount Store,Cosmetics Shop
1,0,Bexley,Coffee Shop,Clothing Store,Pub,Fast Food Restaurant,Pharmacy,Supermarket,Warehouse Store,Sandwich Place,Hotel,Furniture / Home Store
2,3,Brent,Pub,Golf Course,Warehouse Store,English Restaurant,Cosmetics Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Discount Store
3,0,Bromley,Coffee Shop,Clothing Store,Burger Joint,Pizza Place,Gym / Fitness Center,Bar,Burrito Place,Cosmetics Shop,Donut Shop,Chocolate Shop
4,0,Camden,Gym,Business Service,Rugby Pitch,Skate Park,Warehouse Store,Donut Shop,Cosmetics Shop,Currency Exchange,Department Store,Dessert Shop
5,0,Croydon,Pizza Place,Coffee Shop,Chinese Restaurant,Italian Restaurant,Bakery,Supermarket,Pub,Donut Shop,Cosmetics Shop,Currency Exchange
6,0,Ealing,Home Service,Business Service,English Restaurant,Cosmetics Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop
7,0,Enfield,Construction & Landscaping,Park,English Restaurant,Cosmetics Shop,Currency Exchange,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop
8,0,Greenwich,Pub,Supermarket,Clothing Store,Coffee Shop,Fast Food Restaurant,Pharmacy,Hotel,Sandwich Place,Plaza,Grocery Store
9,0,Hackney,Indian Restaurant,Hotel,Train Station,Park,Restaurant,Warehouse Store,Electronics Store,Cosmetics Shop,Currency Exchange,Department Store


NameError: name 'London_data' is not defined

In [51]:
London_merged

Unnamed: 0,Borough,Population (2013 est)[1],Latitude,Longitude
0,Barking and Dagenham,194352,51.5607,0.1557
1,Barnet,369088,51.6252,0.1517
2,Bexley,236687,51.4549,0.1505
3,Brent,317264,51.5588,0.2817
4,Bromley,317899,51.4039,0.0198
5,Camden,229719,51.529,0.1255
6,Croydon,372752,51.3714,0.0977
7,Ealing,342494,51.513,0.3089
8,Enfield,320524,51.6538,0.0799
9,Greenwich,264008,51.4892,0.0648
