# Battle of Neighborhoods in Central London

#### The idea of this notebook is to analyze neighborhoods in Central London and come up with 3 suggested neighborhood locations for an entrepreneur looking to start a fitness equipment and supplement store in Central London. More details can be found in the report.

Import the necessary libraries:

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (the pandas.io.json import path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

# lxml is needed by pd.read_html to parse the Wikipedia page
!pip install lxml

# getting geographical coordinates from OS code in data
!pip install OSGridConverter
from OSGridConverter import grid2latlong

print('Libraries imported.')

Libraries imported.


Now, scrape the list of Neighborhoods in London from the Wikipedia page: https://en.wikipedia.org/wiki/List_of_areas_of_London.

In [152]:
# Read DataFrame from Wikipedia using Pandas
df_london = pd.read_html('https://en.wikipedia.org/wiki/List_of_areas_of_London')[1]
df_london.columns = ['Neighborhood','Borough','Town','PostCode','DialCode','OS grid ref']
df_london.head()

Unnamed: 0,Neighborhood,Borough,Town,PostCode,DialCode,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


In [153]:
print('The shape of the dataframe is',df_london.shape)

The shape of the dataframe is (533, 6)


The data-frame needs cleaning. Row index 25, for example, shows an alternative name in parentheses ("also ...") in the Neighborhood column and a numbered citation in the Borough column; both have to be removed.

In [154]:
print(df_london.loc[[25]])

                                  Neighborhood     Borough    Town PostCode  \
25  Barnet (also Chipping Barnet, High Barnet)  Barnet[16]  BARNET      EN5   

   DialCode OS grid ref  
25      020    TQ245955  


In [155]:
# Remove bracketed text (alternative names in parentheses, citation markers in square brackets) from the dataframe
df_london.replace(to_replace=r'\(.*\)', value='', regex=True, inplace = True)
df_london.replace(to_replace=r'\[.*\]', value='', regex=True, inplace = True)
df_london.head()

Unnamed: 0,Neighborhood,Borough,Town,PostCode,DialCode,OS grid ref
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon,CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon,CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728
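The two `replace` calls use greedy patterns and leave trailing spaces behind (a value such as `'Bexley, Greenwich [7]'` becomes `'Bexley, Greenwich '`). A minimal sketch of a stricter cleanup on single strings, using only the standard `re` module; the helper name `clean_label` is hypothetical and not part of the notebook:

```python
import re

def clean_label(text):
    """Strip '(...)' asides and '[n]' citation markers, then tidy whitespace."""
    text = re.sub(r'\(.*?\)', '', text)   # non-greedy: one bracket group at a time
    text = re.sub(r'\[.*?\]', '', text)   # same for citation markers
    return re.sub(r'\s+', ' ', text).strip()

print(clean_label('Barnet (also Chipping Barnet, High Barnet)'))
print(clean_label('Bexley, Greenwich [7]'))
```

In pandas, the same non-greedy patterns can be passed to `df.replace(..., regex=True)`, followed by `.str.strip()` on the string columns.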


Row index 25 now reads:

In [156]:
print(df_london.loc[[25]])

   Neighborhood Borough    Town PostCode DialCode OS grid ref
25      Barnet   Barnet  BARNET      EN5      020    TQ245955


The OS grid ref is another way of specifying the geographical location of a neighborhood. I convert it to latitude and longitude values using the OSGridConverter package installed above and add the values to the data-frame. Rows with a missing OS grid ref cannot be converted, so they are dropped first.

In [157]:
# Get total null values in data-frame
df_london.isna().sum()

Neighborhood    0
Borough         0
Town            0
PostCode        0
DialCode        0
OS grid ref     2
dtype: int64

In [158]:
# Remove any null values in dataframe 
df_london.dropna(inplace=True)
df_london.shape

(531, 6)

In [159]:
# Convert OS grid ref to Geographical Coordinates for all Locations and add it to Data-frame.

latitude = []
longitude = []

for value in df_london['OS grid ref']:
    l = grid2latlong(value) # function in OS grid converter
    latitude.append(l.latitude)
    longitude.append(l.longitude)
    
df_london['Latitude'] = latitude  
df_london['Longitude'] = longitude 

df_london.head()

Unnamed: 0,Neighborhood,Borough,Town,PostCode,DialCode,OS grid ref,Latitude,Longitude
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,20,TQ465785,51.486484,0.109318
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",20,TQ205805,51.510591,-0.264585
2,Addington,Croydon,CROYDON,CR0,20,TQ375645,51.362934,-0.02578
3,Addiscombe,Croydon,CROYDON,CR0,20,TQ345665,51.381625,-0.068126
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728,51.434929,0.125663
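To make the OS grid ref less opaque: the two letters select a 100 km square of the National Grid, and the digits give the offset inside that square. The sketch below decodes a reference into easting and northing in metres using the standard letter layout. It is an illustration of the first step only, not OSGridConverter's code, and the function name is hypothetical; going from easting/northing to latitude/longitude additionally needs a projection and datum conversion, which is what `grid2latlong` takes care of.

```python
def gridref_to_easting_northing(gridref):
    """Decode an OS grid reference such as 'TQ465785' into (easting, northing) in metres."""
    ref = gridref.replace(' ', '').upper()
    letters, digits = ref[:2], ref[2:]
    if len(digits) % 2 != 0:
        raise ValueError('grid reference needs an even number of digits')

    # Letter indices A=0 ... Z=25; the grid lettering skips the letter I
    l1 = ord(letters[0]) - ord('A')
    l2 = ord(letters[1]) - ord('A')
    if l1 > 7:
        l1 -= 1
    if l2 > 7:
        l2 -= 1

    # Indices of the 100 km square, counted from the grid's false origin
    e100km = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100km = (19 - (l1 // 5) * 5) - (l2 // 5)

    half = len(digits) // 2
    # Pad each half to 5 digits, so a 3-digit half has 100 m resolution
    easting = int(digits[:half].ljust(5, '0'))
    northing = int(digits[half:].ljust(5, '0'))
    return e100km * 100_000 + easting, n100km * 100_000 + northing

print(gridref_to_easting_northing('TQ465785'))  # (546500, 178500)
```

A six-digit reference therefore identifies a 100 m square, and the coordinates used for each neighborhood are the corner of that square rather than a precise centre point.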


Now, removing the columns that are no longer needed: OS grid ref and DialCode.

In [163]:
# Remove the DialCode and OS grid ref columns
df_london.drop(columns=['DialCode', 'OS grid ref'], inplace=True)
df_london.head()

Unnamed: 0,Neighborhood,Borough,Town,PostCode,Latitude,Longitude
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,51.486484,0.109318
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.510591,-0.264585
2,Addington,Croydon,CROYDON,CR0,51.362934,-0.02578
3,Addiscombe,Croydon,CROYDON,CR0,51.381625,-0.068126
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",51.434929,0.125663


#### Now I have my London data ready. But since I am looking for Central London, I will slice the boroughs which come under Central London, provided here: https://en.wikipedia.org/wiki/Central_London

In [185]:
# Get all unique Borough Values to filter central London Boroughs later
df_london['Borough'].unique()

array(['Bexley, Greenwich ', 'Ealing, Hammersmith and Fulham', 'Croydon',
       'Bexley', 'Redbridge', 'City', 'Westminster', 'Brent', 'Bromley',
       'Islington', 'Havering', 'Barnet', 'Enfield', 'Wandsworth',
       'Southwark', 'Barking and Dagenham', 'Richmond upon Thames',
       'Newham', 'Sutton', 'Ealing', 'Lewisham', 'Harrow', 'Camden',
       'Kingston upon Thames', 'Tower Hamlets', 'Greenwich', 'Haringey',
       'Hounslow', 'Lambeth',
       'Kensington and ChelseaHammersmith and Fulham', 'Waltham Forest',
       'Redbridge, Barking and Dagenham', 'Kensington and Chelsea',
       'Hounslow, Ealing, Hammersmith and Fulham', 'Lambeth, Wandsworth',
       'Barnet, Enfield', 'Merton', 'Hillingdon', 'Barnet, Brent, Camden',
       'Hackney', 'Dartford', 'Bexley, Greenwich', 'Islington & City',
       'Haringey, Islington', 'Hammersmith and Fulham',
       'Greenwich, Lewisham', 'Lambeth, Southwark', 'Brent, Harrow',
       'Brent, Camden', 'Camden and Islington', 'Haringey an

In [194]:
# Set index to Borough
df_london.set_index('Borough', inplace = True)

In [195]:
central_boroughs = ['Camden', 'Islington', 'Kensington and Chelsea', 
                    'Lambeth', 'Southwark', 'Westminster','City',
                   'Islington & City', 'Lambeth, Southwark', 'Camden and Islington',
                   'City, Westminster', 'Islington, Camden']

df_central = df_london.loc[central_boroughs]
df_central.head()

Borough,Neighborhood,Town,PostCode,Latitude,Longitude
Camden,Belsize Park,LONDON,NW3,51.545049,-0.165131
Camden,Bloomsbury,LONDON,WC1,51.526345,-0.119715
Camden,Camden Town,LONDON,NW1,51.544548,-0.133398
Camden,Chalk Farm,LONDON,NW1,51.543969,-0.153628
Camden,Fitzrovia,LONDON,W1,51.518533,-0.137347


In [196]:
# Reset index of London Data
df_london.reset_index(inplace = True)
df_london.head()

Unnamed: 0,Borough,Neighborhood,Town,PostCode,Latitude,Longitude
0,"Bexley, Greenwich",Abbey Wood,LONDON,SE2,51.486484,0.109318
1,"Ealing, Hammersmith and Fulham",Acton,LONDON,"W3, W4",51.510591,-0.264585
2,Croydon,Addington,CROYDON,CR0,51.362934,-0.02578
3,Croydon,Addiscombe,CROYDON,CR0,51.381625,-0.068126
4,Bexley,Albany Park,"BEXLEY, SIDCUP","DA5, DA14",51.434929,0.125663


In [197]:
# Reset index of Central London Data
df_central.reset_index(inplace = True)
df_central.head()

Unnamed: 0,Borough,Neighborhood,Town,PostCode,Latitude,Longitude
0,Camden,Belsize Park,LONDON,NW3,51.545049,-0.165131
1,Camden,Bloomsbury,LONDON,WC1,51.526345,-0.119715
2,Camden,Camden Town,LONDON,NW1,51.544548,-0.133398
3,Camden,Chalk Farm,LONDON,NW1,51.543969,-0.153628
4,Camden,Fitzrovia,LONDON,W1,51.518533,-0.137347


#### Now, I have my Central London Data-frame df_central extracted from London Dataframe df_london.

Double checking the Data:

In [198]:
df_central['Borough'].unique()

array(['Camden', 'Islington', 'Kensington and Chelsea', 'Lambeth',
       'Southwark', 'Westminster', 'City', 'Islington & City',
       'Lambeth, Southwark', 'Camden and Islington', 'City, Westminster',
       'Islington, Camden'], dtype=object)

In [199]:
df_central['Town'].unique()

array(['LONDON'], dtype=object)

In [200]:
print("The shape of Central London Data-frame is",df_central.shape)

The shape of Central London Data-frame is (89, 6)


In [203]:
print('The number of neighborhoods in Central London is {} and the number of borough labels is {}'
      .format(df_central.shape[0], len(df_central['Borough'].unique())))

The number of neighborhoods in Central London is 89 and the number of borough labels is 12


#### Now I'll get the coordinates of London and plot the respective neighborhoods of Central London on the Folium Map

In [204]:
# Get Location Coordinates of London, and name user agent london_explorer

address = 'London'

geolocator = Nominatim(user_agent="london_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [207]:
# create map of Central London using latitude and longitude values
map_central_london = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood, postcode in zip(df_central['Latitude'], df_central['Longitude'], 
                                                       df_central['Borough'], df_central['Neighborhood'], 
                                                       df_central['PostCode']):
    label = '{}, {}, {}'.format(neighborhood, borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_central_london)  
    
map_central_london

In [208]:
# Defining Foursquare Credentials to start exploring Neighborhoods using Foursquare

CLIENT_ID = 'WGEHBVTXEOANDOSTZIM5BQO2DVGLIVZHJDZNRTREVZPH23VX' # your Foursquare ID
CLIENT_SECRET = 'V3OCNJU51334BB0I4KPQRK1YY1ZCSD3QCRKCIUTL5ZDWGRWW' # your Foursquare Secret
VERSION = '20200605' # Foursquare API version

print('Credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Credentials:
CLIENT_ID: WGEHBVTXEOANDOSTZIM5BQO2DVGLIVZHJDZNRTREVZPH23VX
CLIENT_SECRET:V3OCNJU51334BB0I4KPQRK1YY1ZCSD3QCRKCIUTL5ZDWGRWW
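Hard-coding and printing API credentials makes them visible to anyone who reads the notebook. One common alternative, sketched here under the assumption that the credentials are exported as environment variables before the notebook starts, is to read them at run time and only print a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the credentials from environment variables instead of the notebook source."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Set {} before running the notebook'.format(missing)) from None

def mask(secret, visible=4):
    """Show only the first few characters when a secret has to be printed."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Demo with placeholder values; a real run would call load_foursquare_credentials()
client_id, client_secret = load_foursquare_credentials(
    {'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123', 'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456'})
print('CLIENT_ID:', mask(client_id))
```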


I'll start by exploring a single neighborhood: Soho, in the borough of Westminster.

In [212]:
df_soho = df_central[df_central['Neighborhood']=='Soho'].reset_index(drop=True)

In [214]:
neighborhood_latitude = df_soho.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_soho.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_soho.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Soho are 51.5175885518576, -0.13450100389219394.


#### Now, exploring up to 100 venues within 500 m of Soho

In [215]:
# create url
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=WGEHBVTXEOANDOSTZIM5BQO2DVGLIVZHJDZNRTREVZPH23VX&client_secret=V3OCNJU51334BB0I4KPQRK1YY1ZCSD3QCRKCIUTL5ZDWGRWW&v=20200605&ll=51.5175885518576,-0.13450100389219394&radius=500&limit=100'
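The query string above is assembled by hand with `str.format`. A sketch of the same request URL built from a dictionary with the standard library, which keeps each parameter next to its name and handles escaping; the function name and the placeholder credentials are illustrative only.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the venues/explore URL used in this notebook from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll parameter readable
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20200605',
                        51.5175885518576, -0.13450100389219394)
print(url)
```

With `requests`, the same dictionary can be passed directly as `requests.get(base_url, params=params)`, which performs the encoding itself.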

In [216]:
# get results
results = requests.get(url).json()

In [217]:
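# function that extracts the category of the venue
def get_category_type(row):
    # the category list is stored under 'categories' or, for flattened explore results, under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']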
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Clean json and structure into Pandas Dataframe:

In [218]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,The Punch Room,Cocktail Bar,51.516905,-0.136151
1,The London Edition (The London EDITION),Hotel,51.516762,-0.136049
2,Kaffeine,Coffee Shop,51.516785,-0.13708
3,Roka,Japanese Restaurant,51.518992,-0.135308
4,Vagabond,Wine Bar,51.518695,-0.135003


In [219]:
print('{} venues with {} different categories were returned by Foursquare.'
      .format(nearby_venues.shape[0], len(nearby_venues['categories'].unique())))

79 venues with 49 different categories were returned by Foursquare.


Now creating a function to repeat the above process for all Neighborhoods in Central London

In [220]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
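`getNearbyVenues` indexes straight into the response, so a failed request (no `groups` key) or a venue without categories stops the whole loop with a `KeyError` or `IndexError`. A defensive variant of the extraction step might look like the sketch below. The helper name and the sample payload are hypothetical; the payload only mirrors the keys that the function above already reads.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Use None when a venue has no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# Hypothetical payload shaped like the structure indexed in getNearbyVenues
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Kaffeine', 'location': {'lat': 51.516785, 'lng': -0.13708},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unnamed place', 'location': {'lat': 51.5, 'lng': -0.1},
               'categories': []}},
]}]}}

rows = extract_venue_rows('Soho', 51.5176, -0.1345, sample)
empty = extract_venue_rows('Soho', 51.5176, -0.1345, {'meta': {'code': 429}})
print(rows)
print(empty)  # []
```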

Now running the above code for each Neighborhood for Central London

In [221]:
central_venues = getNearbyVenues(names=df_central['Neighborhood'],
                                   latitudes=df_central['Latitude'],
                                   longitudes=df_central['Longitude']
                                  )

Belsize Park
Bloomsbury
Camden Town
Chalk Farm
Fitzrovia
Frognal
Gospel Oak
Hampstead
Highgate
Holborn
Kentish Town
Primrose Hill
Somerstown
St Giles
St Pancras
Swiss Cottage
West Hampstead
Angel
Archway
Barnsbury
Canonbury
Clerkenwell
De Beauvoir Town
Finsbury
Highbury
Holloway
Islington
Nag's Head
Pentonville
St Luke's
Upper Holloway
Chelsea
Earls Court
Holland Park
Kensington
North Kensington
Notting Hill
South Kensington
West Brompton
Brixton
Gipsy Hill
Herne Hill
Lambeth
Oval
Stockwell
Streatham
Tulse Hill
Vauxhall
West Norwood
Bankside
Bermondsey
Camberwell
Denmark Hill
Dulwich
East Dulwich
Elephant and Castle
Newington
Nunhead
Peckham
Rotherhithe
Surrey Quays
Walworth
Aldwych
Bayswater
Belgravia
Charing Cross
Chinatown
Covent Garden
Knightsbridge
Lisson Grove
Little Venice
Maida Vale
Marylebone 
Mayfair
Millbank
Paddington
Pimlico
Soho
St James's
St John's Wood
Westminster
Aldgate
Barbican
Blackfriars
Farringdon
Kennington
King's Cross
Temple
Tufnell Park


In [222]:
print(central_venues.shape)
central_venues.head(15)

(4329, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Belsize Park,51.545049,-0.165131,Chamomile,51.545729,-0.162398,Café
1,Belsize Park,51.545049,-0.165131,Sable D'or,51.54599,-0.162048,Café
2,Belsize Park,51.545049,-0.165131,Black Truffle,51.545977,-0.16253,Deli / Bodega
3,Belsize Park,51.545049,-0.165131,Starbucks,51.545459,-0.162607,Coffee Shop
4,Belsize Park,51.545049,-0.165131,The Washington,51.545467,-0.162768,Pub
5,Belsize Park,51.545049,-0.165131,Primrose Hill Market,51.541779,-0.162012,Market
6,Belsize Park,51.545049,-0.165131,Grain Artisan Sourdough,51.54591,-0.162218,Café
7,Belsize Park,51.545049,-0.165131,M Lounge,51.542057,-0.170372,Hotel Bar
8,Belsize Park,51.545049,-0.165131,Co-op Food,51.543196,-0.166608,Grocery Store
9,Belsize Park,51.545049,-0.165131,Haverstock Hill,51.54846,-0.16534,Hill


In [223]:
print('{} venues with {} different categories were returned by Foursquare for all Neighborhoods in Central London.'
      .format(central_venues.shape[0], len(central_venues['Venue Category'].unique())))

4329 venues with 318 different categories were returned by Foursquare for all Neighborhoods in Central London.


#### Since the suggested business sells fitness equipment, accessories and supplements, the target locations are the neighborhoods with the most gyms. I will now pick out all the gyms and plot them on the folium map below

In [230]:
# Print Unique categories to determine which categories relate to gyms
venue_categories = central_venues['Venue Category'].unique()
print(sorted(venue_categories))

['Accessories Store', 'Afghan Restaurant', 'African Restaurant', 'American Restaurant', 'Antique Shop', 'Aquarium', 'Arcade', 'Arepa Restaurant', 'Argentinian Restaurant', 'Art Gallery', 'Art Museum', 'Arts & Crafts Store', 'Arts & Entertainment', 'Asian Restaurant', 'Astrologer', 'Athletics & Sports', 'Australian Restaurant', 'Austrian Restaurant', 'Auto Garage', 'BBQ Joint', 'Bagel Shop', 'Bakery', 'Bar', 'Basketball Court', 'Beach', 'Bed & Breakfast', 'Beer Bar', 'Beer Garden', 'Beer Store', 'Belgian Restaurant', 'Bike Rental / Bike Share', 'Bike Shop', 'Bistro', 'Boat or Ferry', 'Bookstore', 'Botanical Garden', 'Boutique', 'Bowling Alley', 'Boxing Gym', 'Brasserie', 'Brazilian Restaurant', 'Breakfast Spot', 'Brewery', 'Bubble Tea Shop', 'Building', 'Burger Joint', 'Burrito Place', 'Bus Station', 'Bus Stop', 'Business Service', 'Butcher', 'Cafeteria', 'Café', 'Camera Store', 'Canal', 'Canal Lock', 'Candy Store', 'Cantonese Restaurant', 'Caribbean Restaurant', 'Caucasian Restaurant',

The categories 'Gym', 'Gym / Fitness Center', 'Gym Pool' and 'Gymnastics Gym' all match the requirement, so I will slice the dataset for each of these categories.

In [232]:
df_central_gym = central_venues[central_venues['Venue Category']=='Gym'].reset_index(drop=True)
df_central_gymfitness = central_venues[central_venues['Venue Category']=='Gym / Fitness Center'].reset_index(drop=True)
df_central_gympool = central_venues[central_venues['Venue Category']=='Gym Pool'].reset_index(drop=True)
df_central_gymnastics = central_venues[central_venues['Venue Category']=='Gymnastics Gym'].reset_index(drop=True)

Now I will create a new dataframe containing all the gyms in these 4 categories in Central London

In [236]:
gyms = [df_central_gym, df_central_gymfitness, df_central_gympool, df_central_gymnastics]
df_central_allgyms = pd.concat(gyms)
df_central_allgyms.reset_index(inplace = True,drop = True)
df_central_allgyms

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bloomsbury,51.526345,-0.119715,The Gym,51.524175,-0.125031,Gym
1,Bloomsbury,51.526345,-0.119715,Nuffield Health Fitness & Wellbeing Gym,51.524178,-0.118479,Gym
2,Holborn,51.517359,-0.120085,Good Vibes,51.515064,-0.123741,Gym
3,Somerstown,51.526575,-0.134133,Bannatyne Health Club,51.524415,-0.127815,Gym
4,St Giles,51.517359,-0.120085,Good Vibes,51.515064,-0.123741,Gym
5,St Pancras,51.526345,-0.119715,The Gym,51.524175,-0.125031,Gym
6,St Pancras,51.526345,-0.119715,Nuffield Health Fitness & Wellbeing Gym,51.524178,-0.118479,Gym
7,De Beauvoir Town,51.540992,-0.080142,Crossfit Hackney,51.537875,-0.075192,Gym
8,St Luke's,51.52588,-0.090878,London Fight Factory,51.528507,-0.089985,Gym
9,Kensington,51.500517,-0.192876,Equinox Kensington,51.501369,-0.191634,Gym
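In the table above, Bloomsbury and St Pancras share the same coordinates, as do Holborn and St Giles, so the same gyms are returned for both neighborhoods of each pair and counted twice. A small check for such shared reference points, using only coordinates copied from that table; the variable names are illustrative:

```python
from collections import defaultdict

# (neighborhood, latitude, longitude) copied from the gym table above
neighborhoods = [
    ('Bloomsbury', 51.526345, -0.119715),
    ('St Pancras', 51.526345, -0.119715),
    ('Holborn', 51.517359, -0.120085),
    ('St Giles', 51.517359, -0.120085),
    ('Somerstown', 51.526575, -0.134133),
]

# Group neighborhood names by their coordinate pair
by_coordinate = defaultdict(list)
for name, lat, lng in neighborhoods:
    by_coordinate[(lat, lng)].append(name)

# Keep only coordinate pairs used by more than one neighborhood
shared = {coord: names for coord, names in by_coordinate.items() if len(names) > 1}
print(shared)
```

Whether such pairs should be merged before ranking is a judgment call; the check only makes the overlap visible.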


In [238]:
print('The number of gyms returned by Foursquare in Central London is {}.'.format(df_central_allgyms.shape[0]))

The number of gyms returned by Foursquare in Central London is 103.


Let's plot these gyms on the map

In [239]:
# create map of Central London using latitude and longitude values and plot all gyms
map_central_gyms = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood, venue, venuecat in zip(df_central_allgyms['Venue Latitude'], df_central_allgyms['Venue Longitude'], 
                                                       df_central_allgyms['Neighborhood'], df_central_allgyms['Venue'], 
                                                       df_central_allgyms['Venue Category']):
    label = '{}, {}, {}'.format(venue, venuecat,neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_central_gyms)  
    
map_central_gyms

The map shows where gyms are concentrated in Central London. The working assumption is that a fitness equipment and supplement store benefits from being close to existing gyms, so neighborhoods with many gyms are preferred over neighborhoods with few or none.

#### I will now count the gyms in each neighborhood to rank the potential locations for starting up the business.

In [252]:
# One-hot encode the venue category of each gym in df_central_allgyms
central_allgyms_onehot = pd.get_dummies(df_central_allgyms[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
central_allgyms_onehot['Neighborhood'] = df_central_allgyms['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [central_allgyms_onehot.columns[-1]] + list(central_allgyms_onehot.columns[:-1])
central_allgyms_onehot = central_allgyms_onehot[fixed_columns]

central_allgyms_onehot.head(10)

Unnamed: 0,Neighborhood,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym
0,Bloomsbury,1,0,0,0
1,Bloomsbury,1,0,0,0
2,Holborn,1,0,0,0
3,Somerstown,1,0,0,0
4,St Giles,1,0,0,0
5,St Pancras,1,0,0,0
6,St Pancras,1,0,0,0
7,De Beauvoir Town,1,0,0,0
8,St Luke's,1,0,0,0
9,Kensington,1,0,0,0


In [242]:
print ('Size of the above dataframe is', central_allgyms_onehot.shape)

Size of the above dataframe is (103, 5)


Now, grouping by neighborhood and summing the occurrences of each gym category

In [253]:
central_allgyms_grouped = central_allgyms_onehot.groupby('Neighborhood').sum().reset_index()
central_allgyms_grouped

Unnamed: 0,Neighborhood,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym
0,Aldgate,0,5,0,0
1,Aldwych,0,1,0,0
2,Barbican,1,3,0,0
3,Bayswater,0,1,0,0
4,Blackfriars,0,3,0,0
5,Bloomsbury,2,0,0,0
6,Brixton,0,1,0,0
7,Camberwell,0,2,0,0
8,Chalk Farm,0,0,0,1
9,Chelsea,0,1,0,0


Now, computing the total number of gyms for each neighborhood, adding it to the dataframe as a new column and sorting in descending order.

In [254]:
total_gyms = central_allgyms_grouped["Gym"] + central_allgyms_grouped["Gym / Fitness Center"] + central_allgyms_grouped["Gym Pool"] + central_allgyms_grouped["Gymnastics Gym"]
central_allgyms_grouped['Total'] = total_gyms

In [255]:
central_allgyms_grouped.head()

Unnamed: 0,Neighborhood,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Total
0,Aldgate,0,5,0,0,5
1,Aldwych,0,1,0,0,1
2,Barbican,1,3,0,0,4
3,Bayswater,0,1,0,0,1
4,Blackfriars,0,3,0,0,3


In [258]:
central_allgyms_grouped.sort_values(by = ['Total'], ascending = False, inplace = True)
print ("The top 5 Neighborhoods for starting up the suggested business are:")
central_allgyms_grouped.head(5)

The top 5 Neighborhoods for starting up the suggested business are:


Unnamed: 0,Neighborhood,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Total
40,Somerstown,1,5,0,0,6
13,Elephant and Castle,2,3,0,0,5
45,St Luke's,1,3,1,0,5
0,Aldgate,0,5,0,0,5
2,Barbican,1,3,0,0,4
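The one-hot encoding, group-by and column sum used above amount to a contingency table of neighborhood against gym category. A compact sketch of the same ranking with `pd.crosstab`, run on a toy frame standing in for `central_venues` (the rows are illustrative, not Foursquare results):

```python
import pandas as pd

# Toy stand-in for central_venues; rows are illustrative only
venues = pd.DataFrame({
    'Neighborhood': ['Aldgate', 'Aldgate', 'Barbican', 'Barbican', 'Barbican', 'Soho'],
    'Venue Category': ['Gym / Fitness Center', 'Gym / Fitness Center',
                       'Gym', 'Gym / Fitness Center', 'Café', 'Coffee Shop'],
})
gym_categories = ['Gym', 'Gym / Fitness Center', 'Gym Pool', 'Gymnastics Gym']

# Keep gym venues, count them per neighborhood and category, then rank by total
gyms = venues[venues['Venue Category'].isin(gym_categories)]
counts = pd.crosstab(gyms['Neighborhood'], gyms['Venue Category'])
counts['Total'] = counts.sum(axis=1)
counts = counts.sort_values('Total', ascending=False)
print(counts)
```

Note that `crosstab` only creates columns for categories that actually occur, so a category with no venues is absent rather than shown as a column of zeros.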


These are the 5 neighborhoods with the most gyms within 500 m of their reference point, and therefore the leading candidates for a store selling fitness equipment, accessories and supplements.

I'll now plot these 5 potential locations on the map.

In [266]:
# Use central dataframe and Set index to Neighborhood
df_central.set_index('Neighborhood', inplace = True)

In [267]:
top5 = ["Somerstown","Elephant and Castle", "St Luke's", "Aldgate", "Barbican"]
df_top5 = df_central.loc[top5]
df_top5

Neighborhood,Borough,Town,PostCode,Latitude,Longitude
Somerstown,Camden,LONDON,NW1,51.526575,-0.134133
Elephant and Castle,Southwark,LONDON,"SE1, SE11, SE17",51.493669,-0.100876
St Luke's,Islington,LONDON,EC1,51.52588,-0.090878
Aldgate,City,LONDON,EC3,51.514885,-0.078356
Barbican,City,LONDON,EC1,51.51966,-0.095466


In [268]:
# Reset index of Central London Data
df_central.reset_index(inplace = True)

In [270]:
# Reset index of Top 5 Data
df_top5.reset_index(inplace = True)

In [275]:
# create map of Potential Locations in Central London using latitude and longitude values
map_top5 = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, borough, neighborhood, postcode in zip(df_top5['Latitude'], df_top5['Longitude'], 
                                                       df_top5['Borough'], df_top5['Neighborhood'], 
                                                       df_top5['PostCode']):
    label = '{}, {}, {}'.format(neighborhood, borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_top5)  
    
map_top5