Capstone Project - The Battle of Neighborhoods - Fine Dining in Bristol

This notebook shows the analysis of data to recommend a location for a new high-end restaurant in Bristol. Full details can be found in the accompanying report.

Part 1: Import the relevant libraries and packages.

In [1]:
import pandas as pd
import numpy as np
import json
import geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests
from pandas import json_normalize # top-level import; pandas.io.json.json_normalize is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import geocoder

Part 2: Import district data for Bristol and clean the dataframe

In [2]:
# Import data as a pandas dataframe
url = "https://en.wikipedia.org/wiki/BS_postcode_area"
BS_Districts = pd.read_html(url)[1]

BS_Districts = BS_Districts.dropna(subset=["Coverage"]) # drop rows from dataframe with no assigned district
BS_Districts = BS_Districts[BS_Districts["Post town"] == "BRISTOL"] # keep only rows in the BRISTOL post town
# reassign instead of using inplace=True on a filtered slice, which can trigger SettingWithCopyWarning
BS_Districts = BS_Districts.drop(columns=['Post town', 'Local authority area(s)'])
BS_Districts = BS_Districts.reset_index(drop=True) # reset index
BS_Districts # display dataframe

Unnamed: 0,Postcode district,Coverage
0,BS1,"Bristol city centre, Redcliffe"
1,BS2,"Kingsdown, St Paul's, St Phillip's, St Agnes"
2,BS3,"Bedminster, Southville, Bower Ashton, part of ..."
3,BS4,"Brislington, Knowle, Knowle West, St Anne's, p..."
4,BS5,"Easton, St George, Redfield, Whitehall, Eastvi..."
5,BS6,"Cotham, Redland, Montpelier, Westbury Park, St..."
6,BS7,"Bishopston, Horfield, part of Filton, Lockleaz..."
7,BS8,"Clifton, Failand, Hotwells, Leigh Woods"
8,BS9,"Coombe Dingle, Sneyd Park, Stoke Bishop, Westb..."
9,BS10,"Brentry, Henbury, Southmead, part of Westbury ..."


Part 3: Find the co-ordinates of each postcode district using geocoder.arcgis and add these to the dataframe...

In [3]:
latitude = []
longitude = []

# geocode the centre of each postcode district
for pc in BS_Districts["Postcode district"]:
    g = geocoder.arcgis("{}, Bristol".format(pc), components="country:GB")
    coords = g.latlng # [latitude, longitude]
    latitude.append(coords[0])
    longitude.append(coords[1])

BS_Districts["Latitude"] = latitude
BS_Districts["Longitude"] = longitude

In [4]:
BS_Districts

Unnamed: 0,Postcode district,Coverage,Latitude,Longitude
0,BS1,"Bristol city centre, Redcliffe",51.453845,-2.594374
1,BS2,"Kingsdown, St Paul's, St Phillip's, St Agnes",51.450367,-2.571305
2,BS3,"Bedminster, Southville, Bower Ashton, part of ...",51.44329,-2.608419
3,BS4,"Brislington, Knowle, Knowle West, St Anne's, p...",51.425055,-2.587577
4,BS5,"Easton, St George, Redfield, Whitehall, Eastvi...",51.46032,-2.570874
5,BS6,"Cotham, Redland, Montpelier, Westbury Park, St...",51.467995,-2.586913
6,BS7,"Bishopston, Horfield, part of Filton, Lockleaz...",51.496505,-2.579725
7,BS8,"Clifton, Failand, Hotwells, Leigh Woods",51.454,-2.60209
8,BS9,"Coombe Dingle, Sneyd Park, Stoke Bishop, Westb...",51.475195,-2.633475
9,BS10,"Brentry, Henbury, Southmead, part of Westbury ...",51.498225,-2.599736
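
A geocoding lookup can come back without a match, in which case indexing the result as in the loop above would fail. The following is a minimal sketch of a more defensive loop, not the method used in this notebook. The helper name collect_coordinates and the stub fake_lookup are hypothetical; in practice the lookup argument could wrap the geocoder.arcgis call.

```python
def collect_coordinates(queries, lookup):
    """Return (latitudes, longitudes), storing None where a lookup finds nothing."""
    latitudes, longitudes = [], []
    for query in queries:
        coords = lookup(query)
        if not coords:  # None or an empty list: treat as "no match"
            latitudes.append(None)
            longitudes.append(None)
            continue
        latitudes.append(coords[0])
        longitudes.append(coords[1])
    return latitudes, longitudes


def fake_lookup(query):
    """Hypothetical stand-in for a real geocoder call, so the sketch runs offline."""
    known = {"BS1, Bristol": [51.453845, -2.594374]}
    return known.get(query)


lats, lngs = collect_coordinates(["BS1, Bristol", "BS999, Bristol"], fake_lookup)
print(lats, lngs)
```

Rows left with None can then be inspected or dropped before mapping, rather than stopping the whole loop.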


...then visualise the centres of each postcode district on a map

In [5]:
address = 'Bristol'

geolocator = Nominatim(user_agent="bristol_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Bristol using latitude and longitude values
map_bristol = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(BS_Districts['Latitude'], BS_Districts['Longitude'], BS_Districts['Postcode district'], BS_Districts['Coverage']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bristol)  
    
map_bristol

Part 4: Several postcode districts lie far from Bristol city centre, so we'll remove any district whose latitude or longitude is far from that of the centre (BS1).

In [6]:
CC_lat = BS_Districts["Latitude"][0]
CC_long = BS_Districts["Longitude"][0]

d = []

for i in range(1, len(BS_Districts["Latitude"])):
    lat = BS_Districts["Latitude"][i]
    long = BS_Districts["Longitude"][i]
    if (lat > CC_lat + 0.04) or (lat < CC_lat - 0.04) or (long > CC_long + 0.1) or (long < CC_long - 0.1):
        d = d + [i]
    
BS_Districts.drop(index = d, inplace = True)
BS_Districts.reset_index(drop=True, inplace=True) # reset index
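
The cell above filters on a box in degrees: 0.04 degrees of latitude is roughly 4.4 km, and 0.1 degrees of longitude is roughly 6.9 km at Bristol's latitude. As an alternative, the sketch below computes the great-circle distance from BS1 with the haversine formula, so a single cutoff in kilometres could be used instead. This is an illustration only; the function name haversine_km and the Earth radius constant are assumptions, and the coordinates are taken from the table above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (an approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


bs1 = (51.453845, -2.594374)   # city centre, from the table above
bs8 = (51.454, -2.60209)
bs10 = (51.498225, -2.599736)

d_bs8 = haversine_km(*bs1, *bs8)
d_bs10 = haversine_km(*bs1, *bs10)
print(round(d_bs8, 2), round(d_bs10, 2))
```

A distance cutoff treats all directions equally, whereas the degree-based box is wider east to west than north to south.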

Let's have another look:

In [7]:
address = 'Bristol'

geolocator = Nominatim(user_agent="bristol_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Bristol using latitude and longitude values
map_bristol2 = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(BS_Districts['Latitude'], BS_Districts['Longitude'], BS_Districts['Postcode district'], BS_Districts['Coverage']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bristol2)  
    
map_bristol2

Part 5: Now our district locations are determined, let's find the venues in each district.
The following functions were defined in an earlier lab book in the Coursera Applied Data Science Capstone course.

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postcode', 
                  'Postcode Latitude', 
                  'Postcode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
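
getNearbyVenues assumes that every response contains at least one group and that every venue has at least one category. The sketch below shows one way the parsing step could tolerate missing pieces. It is not part of the original lab code: the function name extract_venues is hypothetical, and the sample payload is a hand-written stand-in that only mirrors the keys accessed in getNearbyVenues.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing keys."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # use None when a venue has no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Hand-written sample; "Unlabelled Venue" is a made-up entry with no category
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Small Street Espresso",
               "location": {"lat": 51.454766, "lng": -2.5944},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unlabelled Venue",
               "location": {"lat": 51.45, "lng": -2.59},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "BS1", 51.453845, -2.594374)
print(rows)
```

In a live request, requests.get also accepts params= and timeout= arguments, and calling raise_for_status() on the response surfaces HTTP errors before the JSON is parsed.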

We use Foursquare to find the venues surrounding the centers of each postcode district.

In [30]:
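# Foursquare credentials, read from environment variables so the keys are not stored in the notebook
# (the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumed convention)
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # my Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # my Foursquare Secret
VERSION = '20200720'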

LIMIT = 100
bristol_venues = getNearbyVenues(names=BS_Districts['Postcode district'],
                                   latitudes=BS_Districts['Latitude'],
                                   longitudes=BS_Districts['Longitude']
                                  )
bristol_venues.head()

Unnamed: 0,Postcode,Postcode Latitude,Postcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,BS1,51.453845,-2.594374,Small Street Espresso,51.454766,-2.5944,Coffee Shop
1,BS1,51.453845,-2.594374,BrewDog Bristol,51.453659,-2.592372,Beer Bar
2,BS1,51.453845,-2.594374,St. Nicholas Market,51.454248,-2.593286,Market
3,BS1,51.453845,-2.594374,Small Bar,51.451791,-2.594948,Pub
4,BS1,51.453845,-2.594374,Bristol Old Vic,51.45209,-2.594154,Theater


Part 6: Let's look at how many venues are in each district in Bristol, what kinds of venues these are and select the important ones.

In [31]:
PC = bristol_venues["Postcode"].unique()
counts = bristol_venues["Postcode"].value_counts() # number of venues per postcode district
c = []
for pc in PC:
    c.append(counts[pc])
    print(pc, " Count:", counts[pc])

df = pd.DataFrame({
    'Postcode district': PC,
    'Count': c,
})
bristol_venues["Venue Category"].unique()

BS1  Count: 100
BS2  Count: 4
BS3  Count: 35
BS4  Count: 3
BS5  Count: 8
BS6  Count: 16
BS8  Count: 100
BS9  Count: 1
BS11  Count: 100
BS13  Count: 43
BS14  Count: 100
BS15  Count: 55
BS16  Count: 35
BS20  Count: 4
BS31  Count: 35
BS32  Count: 5
BS34  Count: 20
BS35  Count: 3
BS39  Count: 18
BS41  Count: 3
BS49  Count: 24


array(['Coffee Shop', 'Beer Bar', 'Market', 'Pub', 'Theater', 'Jazz Club',
       'Japanese Restaurant', 'Burger Joint', 'Bar', 'Italian Restaurant',
       'Falafel Restaurant', 'Café', 'Lounge', 'Steakhouse',
       'Middle Eastern Restaurant', 'Vietnamese Restaurant',
       'Indian Restaurant', 'Asian Restaurant', 'Pie Shop',
       'Korean Restaurant', 'Pizza Place', 'Hookah Bar', 'Concert Hall',
       'Breakfast Spot', 'Cocktail Bar', 'Record Shop', 'Park',
       'Ice Cream Shop', 'Indie Movie Theater', 'Hotel', 'Art Gallery',
       'Caribbean Restaurant', 'Frozen Yogurt Shop', 'English Restaurant',
       'Nightclub', 'Street Art', 'Tapas Restaurant', 'Greek Restaurant',
       'Bookstore', 'Music Venue', 'Cosmetics Shop', 'Dessert Shop',
       'Brewery', 'Seafood Restaurant', 'Gay Bar', 'Chinese Restaurant',
       'Bed & Breakfast', 'Gastropub', 'Rental Car Location',
       'Community Center', 'Bakery', 'Deli / Bodega', 'Supermarket',
       'Grocery Store', 'Food & Drink
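
The per-district venue counts can also be built in a single step with a groupby, which avoids the explicit loop. The sketch below uses a small made-up dataframe (toy) rather than the real Foursquare results, so the numbers are illustrative only.

```python
import pandas as pd

# Made-up rows standing in for bristol_venues
toy = pd.DataFrame({
    "Postcode": ["BS1", "BS1", "BS8", "BS1", "BS9"],
    "Venue Category": ["Pub", "Bar", "Café", "Market", "Park"],
})

# sort=False keeps districts in order of first appearance
counts = (
    toy.groupby("Postcode", sort=False)
       .size()
       .reset_index(name="Count")
       .rename(columns={"Postcode": "Postcode district"})
)
print(counts)
```

The result has the same two columns as df above, so it could be joined onto the grouped frequencies in the same way.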

Part 7: We can see that several districts have very few venues, so let's remove those with fewer than 10. We should also remove districts containing venue types that this analysis treats as indicators of lower-income visitors and residents, such as community centers, liquor stores and campgrounds.

In [32]:
# one hot encoding
bristol_onehot = pd.get_dummies(bristol_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
bristol_onehot['Postcode district'] = bristol_venues['Postcode'] 

# move neighborhood column to the first column
fixed_columns = [bristol_onehot.columns[-1]] + list(bristol_onehot.columns[:-1])
bristol_onehot = bristol_onehot[fixed_columns]

bristol_grouped = bristol_onehot.groupby('Postcode district').mean().reset_index()
bristol_grouped = bristol_grouped.join(df.set_index('Postcode district'), on='Postcode district') #Add a venue count column

n = []
for i in range(0, len(bristol_grouped['Postcode district'])):
    total = bristol_grouped["Count"][i]
    if total < 10: #pick out districts with few venues
        n = n + [i]
    cc = bristol_grouped["Community Center"][i]
    liquor = bristol_grouped["Liquor Store"][i]
    camp = bristol_grouped["Campground"][i]
    if cc > 0 or liquor > 0 or camp > 0 : #pick out districts with liquor stores, community centers and campgrounds
        n = n + [i]
bristol_grouped.drop(index = n, inplace = True) #remove these districts
bristol_grouped.drop(columns = ["Count"], inplace = True) #remove the count column
bristol_grouped.reset_index(drop=True, inplace=True) # reset index

num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postcode district']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # positions beyond the third have no entry in indicators
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Postcode district'] = bristol_grouped['Postcode district']

for ind in np.arange(bristol_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bristol_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted


Unnamed: 0,Postcode district,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,BS1,Bar,Pub,Café,Coffee Shop,Italian Restaurant
1,BS11,Bar,Pub,Café,Coffee Shop,Italian Restaurant
2,BS13,Pub,Bar,Hotel,Coffee Shop,Burger Joint
3,BS14,Bar,Pub,Café,Coffee Shop,Steakhouse
4,BS16,Platform,Pub,Coffee Shop,Café,Bakery
5,BS34,Pub,Pharmacy,Indian Restaurant,Breakfast Spot,Coffee Shop
6,BS49,Platform,Coffee Shop,Hotel,Bakery,Juice Bar
7,BS6,Café,Pub,Pizza Place,Gastropub,Mediterranean Restaurant
8,BS8,Bar,Café,Cocktail Bar,Asian Restaurant,Coffee Shop


CONCLUSIONS: Now that we've reduced the number of districts suitable for our high-end restaurant, we can look for indicators of higher-income visitors. The BS8 district contains bars and cocktail bars, which are often frequented by higher-income customers, but few places to eat in the evening. As such, I would recommend the BS8 area (Clifton, Failand, Hotwells and Leigh Woods) as a location for a new high-end restaurant in Bristol.
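
One way to put a number on "few places to eat" is to sum, for each district, the frequencies of all categories whose name contains "Restaurant". The sketch below shows the idea on made-up frequencies for two hypothetical districts (A and B). It is not a result from this analysis, and matching on the word "Restaurant" is a simplifying assumption that misses categories such as Steakhouse or Gastropub.

```python
import pandas as pd

# Made-up category frequencies for two hypothetical districts
freq = pd.DataFrame({
    "Postcode district": ["A", "B"],
    "Bar": [0.30, 0.10],
    "Cocktail Bar": [0.10, 0.00],
    "Italian Restaurant": [0.05, 0.20],
    "Asian Restaurant": [0.05, 0.10],
}).set_index("Postcode district")

restaurant_cols = [col for col in freq.columns if "Restaurant" in col]
restaurant_share = freq[restaurant_cols].sum(axis=1)
print(restaurant_share.sort_values())
```

A district with a high share of bars and a low restaurant share would fit the pattern described in the conclusion.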