# Identifying Neighbourhoods that will Benefit from Becoming an LTN in Bristol, UK

## Data Wrangling

### Bristol Neighbourhood Data 

We will be using the ArcGIS RESTful API to get the latitude and longitude of each of the informal neighbourhoods in Bristol. But first, we need a list of the neighbourhood names. We can get this by scraping [this Wikipedia page](https://en.wikipedia.org/wiki/Subdivisions_of_Bristol) with the Beautiful Soup 4 package.

We first request the HTML content of the Wiki page using the `requests` package, then we can turn it into soup.

In [1]:
import requests 

bristol_url = "https://en.wikipedia.org/wiki/Subdivisions_of_Bristol"
r = requests.get(bristol_url)

In [2]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(r.text, 'html.parser')

Inspection of the Wiki page shows that each of the neighbourhoods is a bullet point in the 'Neighbourhoods' section - in HTML these are list item (or 'li') tags. The useful text in each of the list elements is inside an 'a' tag, which is a hyperlink. We can use Beautiful Soup to find all of these on the page.

Not all of the 'li' tags on the page have an 'a' tag (the ones we are interested in do), so for now we will use a try/except to skip over those that don't.

In [3]:
li_tags = soup.find_all('li')

titles = []
for tag in li_tags:
    try:
        titles.append(tag.a.string)
    except AttributeError:  # tag.a is None when the bullet has no hyperlink
        pass

print(titles[:20])  # Print the first 20 elements 

[None, None, None, None, None, 'Bristol West', 'Bristol East', 'Bristol South', 'Bristol North West', 'Ashley', 'Avonmouth', 'Bedminster', 'Bishopston', 'Bishopsworth', 'Brislington East', 'Brislington West', 'Clifton', 'Clifton East', 'Cotham', 'Easton']


So far so good, but the problem is that we have taken all bullet points on the page, not just the useful ones in the Neighbourhoods section. If we inspect the webpage again, we can see that 'Bristol city centre' and 'Withywood' bookend the elements in that section, and each of those strings appears only once in our list. This is useful because we can use Python to get the index of each of those strings in our titles list and then slice the list on those indices, leaving just the elements in the Neighbourhoods section.

In [4]:
begin_slice = titles.index("Bristol city centre")
end_slice = titles.index("Withywood")
neighbourhoods = titles[begin_slice:end_slice+1]
print(neighbourhoods)

['Bristol city centre', 'Arnos Vale', 'Ashley Down', 'Ashton Vale', 'Avonmouth', 'Aztec West', 'Baptist Mills', 'Barrs Court', 'Barton Hill', 'Bedminster', 'Bedminster Down', 'Begbrook', 'Bishopston', 'Bishopsworth', 'Blaise Hamlet', 'Botany Bay', 'Bower Ashton', 'Bradley Stoke', 'Brandon Hill', 'Brentry', 'Brislington', 'Broadmead', 'Broomhill', 'Broom Hill', 'Canons Marsh', 'Catbrain', 'Charlton Mead', 'Chester Park', 'Cheswick', 'Clay Hill', 'Clifton', 'Coombe Dingle', 'Cotham', "Crew's Hole", 'Crofts End', 'Downend', 'Eastville', 'Easton', 'Emersons Green', 'Filton', 'Filwood Park', 'Fishponds', 'Frenchay', 'Golden Hill', 'Greenbank', 'Hanham', 'Hartcliffe', 'Headley Park', 'Henbury', 'Hengrove', 'Henleaze', 'Hillfields', 'Horfield', 'Hotwells', 'Kensington Park', 'Kingsdown', 'Kingswood', 'Knowle', 'Knowle West', 'Lawrence Hill', 'Lawrence Weston', 'Leigh Woods', "Lewin's Mead", 'Lockleaze', 'Lodge Hill', 'Longwell Green', 'Mangotsfield', 'Mayfield Park', 'Monks Park', 'Montpelier

If we've done our job correctly, then we should have 116 elements in this list:

In [5]:
if len(neighbourhoods) == 116:
    print("The list is the correct length!")
else:
    print("Nope, something has gone wrong...")

The list is the correct length!
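
The bookend slice used above can be wrapped in a small helper that fails loudly if either marker is missing or the two appear in the wrong order. This is only a minimal sketch: the name `slice_between` is hypothetical and not part of the original notebook, and the sample list is a toy stand-in for the scraped `titles`.

```python
def slice_between(items, first, last):
    """Return the items from `first` to `last` inclusive (hypothetical helper)."""
    start = items.index(first)  # raises ValueError if the marker is absent
    end = items.index(last)
    if end < start:
        raise ValueError(f"{last!r} appears before {first!r}")
    return items[start:end + 1]


# Toy list standing in for the scraped `titles`
sample = [None, "Bristol West", "Bristol city centre", "Arnos Vale", "Withywood", "Other"]
print(slice_between(sample, "Bristol city centre", "Withywood"))
```

Because `list.index` returns the first match only, this relies on the same assumption as the notebook: each bookend string appears once before any unrelated repeat.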


Great! Now we put these names into a DataFrame to use going forward.

In [6]:
import pandas as pd

neigh_df = pd.DataFrame({'neighbourhood': neighbourhoods})
neigh_df.head()

Unnamed: 0,neighbourhood
0,Bristol city centre
1,Arnos Vale
2,Ashley Down
3,Ashton Vale
4,Avonmouth


Next, we need to get the coordinates of each neighbourhood. We are going to use the ArcGIS RESTful API to achieve this. The API call is simple: for a given location search, we request the results as JSON. Below is an example for the neighbourhood 'Bedminster':

In [7]:
url = f"https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&SingleLine=Bedminster, Bristol"

r = requests.get(url)
if r.status_code == 200:
    print("Success!")
else:
    print("The search did not work - try again!")

r.json()

Success!


{'spatialReference': {'wkid': 4326, 'latestWkid': 4326},
 'candidates': [{'address': 'Bedminster, Bristol, Avon, England',
   'location': {'x': -2.6091299999999364, 'y': 51.44023000000004},
   'score': 100,
   'attributes': {},
   'extent': {'xmin': -2.6191299999999362,
    'ymin': 51.430230000000044,
    'xmax': -2.5991299999999367,
    'ymax': 51.45023000000004}},
  {'address': 'Bedminster',
   'location': {'x': -2.5946299999999383, 'y': 51.43990000000008},
   'score': 100,
   'attributes': {},
   'extent': {'xmin': -2.599629999999938,
    'ymin': 51.43490000000008,
    'xmax': -2.5896299999999384,
    'ymax': 51.44490000000008}},
  {'address': 'Bedminster',
   'location': {'x': -2.595299999999952, 'y': 51.44181000000003},
   'score': 100,
   'attributes': {},
   'extent': {'xmin': -2.600299999999952,
    'ymin': 51.43681000000003,
    'xmax': -2.590299999999952,
    'ymax': 51.446810000000035}},
  {'address': 'Bedminster Parade, Bristol, Avon, England, BS3 4',
   'location': {'x': -

We can see that a single search returns a number of results. Because ArcGIS ranks the results in order of decreasing relevance, we will simply assume that the top result is the correct one in every case and extract the coordinates from there.

In [8]:
bed = r.json()['candidates'][0]

bed_latlong = bed['location']
bed_lat = bed_latlong['y']   # ArcGIS returns latitude as 'y'
bed_long = bed_latlong['x']  # ...and longitude as 'x'

print(f"Latitude = {bed_lat}, longitude = {bed_long}")

Latitude = 51.44023000000004, longitude = -2.6091299999999364
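
The two steps above (building the search URL and pulling the coordinates out of the top candidate) can be sketched as a pair of small functions. The endpoint and the `f` and `SingleLine` parameters are taken from the request above; the function names are hypothetical. In the ArcGIS response `x` is the longitude and `y` is the latitude, which is easy to mix up, so the sketch returns them in `(latitude, longitude)` order explicitly.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = (
    "https://geocode.arcgis.com/arcgis/rest/services/World/"
    "GeocodeServer/findAddressCandidates"
)


def build_geocode_url(place):
    """Build the search URL with the query string percent-encoded."""
    query = urlencode({"f": "json", "SingleLine": place})
    return f"{GEOCODE_ENDPOINT}?{query}"


def top_candidate_latlong(payload):
    """Return (latitude, longitude) of the first candidate, or None if there is none."""
    candidates = payload.get("candidates", [])
    if not candidates:
        return None
    location = candidates[0]["location"]
    return location["y"], location["x"]  # y is latitude, x is longitude


# Trimmed copy of the Bedminster response shown above
sample = {"candidates": [{"address": "Bedminster, Bristol, Avon, England",
                          "location": {"x": -2.6091299999999364,
                                       "y": 51.44023000000004}}]}
print(build_geocode_url("Bedminster, Bristol"))
print(top_candidate_latlong(sample))
```

Returning `None` for an empty candidate list is a design choice for this sketch; it lets the caller decide how to treat neighbourhoods that ArcGIS cannot find.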


Great, so we know this works. The next thing to do is to bake the above process into some functions so that we can do the same thing for all 116 neighbourhoods in our DataFrame.

In [9]:
def get_lat(row):
    name = row['neighbourhood']
    print(f"Getting latitudinal coordinates for {name}...")
    
    url = f"https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&SingleLine={name}, Bristol, UK"
    
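    try:
        r = requests.get(url)
    except requests.RequestException:
        # Network failure: report it and leave this cell empty
        print(f"The request failed for the neighbourhood: {name}.")
        return None

    if r.status_code != 200:
        print(f"The search did not work for the neighbourhood: {name}.")
        return None

    print("Success!")
    return r.json()['candidates'][0]['location']['y']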

def get_long(row):
    name = row['neighbourhood']
    print(f"Getting longitudinal coordinates for {name}...")
    
    url = f"https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&SingleLine={name}, Bristol, UK"
    
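    try:
        r = requests.get(url)
    except requests.RequestException:
        # Network failure: report it and leave this cell empty
        print(f"The request failed for the neighbourhood: {name}.")
        return None

    if r.status_code != 200:
        print(f"The search did not work for the neighbourhood: {name}.")
        return None

    print("Success!")
    return r.json()['candidates'][0]['location']['x']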

Now we simply apply these functions to each row of the DataFrame...

In [10]:
neigh_df['latitude'] = neigh_df.apply(get_lat, axis=1)
neigh_df['longitude'] = neigh_df.apply(get_long, axis=1)

Getting latitudinal coordinates for Bristol city centre...
Success!
Getting latitudinal coordinates for Arnos Vale...
Success!
Getting latitudinal coordinates for Ashley Down...
Success!
Getting latitudinal coordinates for Ashton Vale...
Success!
Getting latitudinal coordinates for Avonmouth...
Success!
Getting latitudinal coordinates for Aztec West...
Success!
Getting latitudinal coordinates for Baptist Mills...
Success!
Getting latitudinal coordinates for Barrs Court...
Success!
Getting latitudinal coordinates for Barton Hill...
Success!
Getting latitudinal coordinates for Bedminster...
Success!
Getting latitudinal coordinates for Bedminster Down...
Success!
Getting latitudinal coordinates for Begbrook...
Success!
Getting latitudinal coordinates for Bishopston...
Success!
Getting latitudinal coordinates for Bishopsworth...
Success!
Getting latitudinal coordinates for Blaise Hamlet...
Success!
Getting latitudinal coordinates for Botany Bay...
Success!
Getting latitudinal coordinates f

Success!
Getting longitudinal coordinates for Broomhill...
Success!
Getting longitudinal coordinates for Broom Hill...
Success!
Getting longitudinal coordinates for Canons Marsh...
Success!
Getting longitudinal coordinates for Catbrain...
Success!
Getting longitudinal coordinates for Charlton Mead...
Success!
Getting longitudinal coordinates for Chester Park...
Success!
Getting longitudinal coordinates for Cheswick...
Success!
Getting longitudinal coordinates for Clay Hill...
Success!
Getting longitudinal coordinates for Clifton...
Success!
Getting longitudinal coordinates for Coombe Dingle...
Success!
Getting longitudinal coordinates for Cotham...
Success!
Getting longitudinal coordinates for Crew's Hole...
Success!
Getting longitudinal coordinates for Crofts End...
Success!
Getting longitudinal coordinates for Downend...
Success!
Getting longitudinal coordinates for Eastville...
Success!
Getting longitudinal coordinates for Easton...
Success!
Getting longitudinal coordinates for Emer
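
Applying `get_lat` and `get_long` separately sends the same request twice for every neighbourhood. One way to halve the traffic, sketched below, is to look each name up once and keep the result in a dictionary. The names `geocode_all` and `fake_fetch` are hypothetical; in the notebook, `fetch` would wrap the `requests.get` call and return a `(latitude, longitude)` pair.

```python
def geocode_all(names, fetch):
    """Map each name to whatever `fetch(name)` returns, calling it once per distinct name."""
    cache = {}
    for name in names:
        if name not in cache:
            cache[name] = fetch(name)
    return cache


# Stand-in for a real network lookup, so the sketch runs offline
def fake_fetch(name):
    fake_fetch.calls += 1
    return (51.0, -2.5)


fake_fetch.calls = 0
coords = geocode_all(["Clifton", "Cotham", "Clifton"], fake_fetch)
print(coords, fake_fetch.calls)
```

The two coordinate columns could then be filled from the dictionary rather than from two separate `apply` calls.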

In [11]:
pd.set_option("display.max_rows", None, "display.max_columns", None)
neigh_df.head()

Unnamed: 0,neighbourhood,latitude,longitude
0,Bristol city centre,51.45966,-2.59001
1,Arnos Vale,51.44106,-2.55976
2,Ashley Down,51.47783,-2.58769
3,Ashton Vale,51.439143,-2.625922
4,Avonmouth,51.49898,-2.69432


In [12]:
import folium

# create map of Bristol using latitude and longitude values
map_bristol = folium.Map(location=[neigh_df['latitude'][0], neigh_df['longitude'][0]], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(neigh_df['latitude'], neigh_df['longitude'], neigh_df['neighbourhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bristol)  
    
map_bristol

If we scroll out on the map, we can see that ArcGIS has mislocated a couple of those neighbourhoods, placing one near Sheffield (Botany Bay) and another up near the Scottish border (Cheswick). In the interests of time, we will drop these rows from the DataFrame.

In [13]:
to_drop = ['Botany Bay', 'Cheswick']
neigh_df = neigh_df[~neigh_df['neighbourhood'].isin(to_drop)].reset_index()
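
Rather than spotting mislocated points by eye, they could be flagged with a simple bounding-box check. The sketch below assumes a rough, hand-picked box around greater Bristol; the box values and the function name `outside_bbox` are illustrative assumptions, not an official boundary, and the far-away point is made up.

```python
# Rough, hand-picked box around greater Bristol (an assumption, not an official boundary)
BRISTOL_BBOX = {"lat_min": 51.35, "lat_max": 51.60, "lon_min": -2.80, "lon_max": -2.40}


def outside_bbox(rows, bbox=BRISTOL_BBOX):
    """Return the names whose (latitude, longitude) fall outside the box."""
    return [
        name for name, lat, lon in rows
        if not (bbox["lat_min"] <= lat <= bbox["lat_max"]
                and bbox["lon_min"] <= lon <= bbox["lon_max"])
    ]


rows = [
    ("Bristol city centre", 51.45966, -2.59001),  # from the table above
    ("Avonmouth", 51.49898, -2.69432),            # from the table above
    ("Somewhere far north", 53.4, -1.5),          # made-up outlier for illustration
]
print(outside_bbox(rows))
```

The returned names could be fed straight into `to_drop`, or re-geocoded with a more specific search string instead of being discarded.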

Now let's replot the map as we did before without these two neighbourhoods...

In [14]:
# create map of Bristol using latitude and longitude values
map_bristol = folium.Map(location=[neigh_df['latitude'][0], neigh_df['longitude'][0]], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(neigh_df['latitude'], neigh_df['longitude'], neigh_df['neighbourhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bristol)  
    
map_bristol

## Getting venues from Foursquare

First we define the Foursquare credentials and version:

In [15]:
CLIENT_ID = 'ZEXAMNCJWFKN1Q4D3ONJFUJAL4TW41VVG1RDLGHGQQ5VE4TV' # your Foursquare ID
CLIENT_SECRET = 'CKCHFYI3FEZSJ0C5FA2I5AILO0UDKWGMNY0JCXMTMDIO24FM' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ZEXAMNCJWFKN1Q4D3ONJFUJAL4TW41VVG1RDLGHGQQ5VE4TV
CLIENT_SECRET:CKCHFYI3FEZSJ0C5FA2I5AILO0UDKWGMNY0JCXMTMDIO24FM
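
Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. A common alternative is to read them from environment variables, sketched below. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the function name are assumptions made for this example, not something Foursquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from the environment (variable names are assumptions)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Set the environment variable {missing} before running") from None


# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID loaded:", bool(client_id))
```

Printing only whether a value was loaded, rather than the value itself, keeps the secret out of the saved cell output.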


Let's explore one of the neighbourhoods. We pick an arbitrary row of our DataFrame, index 6, to use for the search.

In [16]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

print(f"Searching around... {neigh_df.loc[6, 'neighbourhood']}")

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neigh_df.loc[6, 'latitude'], 
    neigh_df.loc[6, 'longitude'], 
    radius, 
    LIMIT)
url 

Searching around... Baptist Mills


'https://api.foursquare.com/v2/venues/explore?&client_id=ZEXAMNCJWFKN1Q4D3ONJFUJAL4TW41VVG1RDLGHGQQ5VE4TV&client_secret=CKCHFYI3FEZSJ0C5FA2I5AILO0UDKWGMNY0JCXMTMDIO24FM&v=20180605&ll=51.468990000000076,-2.5752199999999448&radius=500&limit=100'

Now we can send a GET request and examine the results.

In [17]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fd5fa01547d005374093dc5'},
 'response': {'headerLocation': 'Bristol',
  'headerFullLocation': 'Bristol',
  'headerLocationGranularity': 'city',
  'totalResults': 6,
  'suggestedBounds': {'ne': {'lat': 51.47349000450008,
    'lng': -2.568009648959892},
   'sw': {'lat': 51.464489995500074, 'lng': -2.5824303510399975}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c48893e0f5aa593a8938176',
       'name': 'The Duke of York',
       'location': {'address': '2 Jubilee Rd.',
        'lat': 51.46726081293404,
        'lng': -2.5757451679408034,
        'labeledLatLngs': [{'label': 'display',
          'lat': 51.46726081293404,
          'lng': -2.5757451679408034}],
        'distance': 195,
        'postalCode': 'BS2 9RS',
        'cc': 

We can use a (quite contrived) list comprehension to sort through the json and extract the name and category of each venue we find:

In [18]:
venue_name = [result['venue']['name'] for result in results['response']['groups'][0]['items']]
venue_cat = [result['venue']['categories'][0]['name'] for result in results['response']['groups'][0]['items']]

print(venue_name)
print(venue_cat)

['The Duke of York', 'The Better Food Company', 'Bloc Climbing', 'Napolita Pizza', 'Mina Road Park', 'The Bristol Climbing Centre']
['Pub', 'Food', 'Climbing Gym', 'Pizza Place', 'Park', 'Climbing Gym']


We can also do the same for the lat and long coordinates of each venue:

In [19]:
venue_lat = [result['venue']['location']['lat'] for result in results['response']['groups'][0]['items']]
venue_long = [result['venue']['location']['lng'] for result in results['response']['groups'][0]['items']]

print(venue_lat)
print(venue_long)

[51.46726081293404, 51.46879860290838, 51.46806857520593, 51.46931769808646, 51.46797553005654, 51.471640476355184]
[-2.5757451679408034, -2.5784954703963665, -2.5705135370547323, -2.575540244579315, -2.5751135132298892, -2.5765376043714956]
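
The four list comprehensions above each walk the same nested path. As an alternative, a single pass can collect all four fields at once. This is a sketch only: `parse_venues` is a hypothetical name, the sample is a cut-down response with the same nesting as the output above, and the guard for a venue with no category is an assumed edge case rather than something observed here.

```python
def parse_venues(results):
    """Flatten an explore response into one dict per venue."""
    items = results["response"]["groups"][0]["items"]
    venues = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        venues.append({
            "venue name": venue["name"],
            # Guard against a venue with no category (an assumed edge case)
            "category": categories[0]["name"] if categories else None,
            "latitude": venue["location"]["lat"],
            "longitude": venue["location"]["lng"],
        })
    return venues


# Cut-down response with the same nesting as the output above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "The Duke of York",
               "categories": [{"name": "Pub"}],
               "location": {"lat": 51.46726081293404, "lng": -2.5757451679408034}}},
]}]}}
print(parse_venues(sample))
```

A list of dicts like this can be passed directly to `pd.DataFrame` to get one row per venue.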


Great. Both of the above look right. Now we define a function that can do the above process for every neighbourhood in our DataFrame. 

In [20]:
def get_venues(DataFrame, radius=500):
    
    venue_df = pd.DataFrame(columns=['neighbourhood', 'venue name', 'category', 'latitude', 'longitude'])
    
    neighs = DataFrame['neighbourhood']
    lats = DataFrame['latitude']
    longs = DataFrame['longitude']
    
    for neigh, lat, long in zip(neighs, lats, longs):
        
        venue_df_ = pd.DataFrame()
        
        print(f"Searching around... {neigh}")

        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            long, 
            radius, 
            LIMIT)
        
        results = requests.get(url).json()
        venue_df_['venue name'] = [result['venue']['name'] for result in results['response']['groups'][0]['items']]
        venue_df_['category'] = [result['venue']['categories'][0]['name'] for result in results['response']['groups'][0]['items']]
        venue_df_['latitude'] = [result['venue']['location']['lat'] for result in results['response']['groups'][0]['items']]
        venue_df_['longitude'] = [result['venue']['location']['lng'] for result in results['response']['groups'][0]['items']]
        
        venue_df_['neighbourhood'] = neigh
        
        venue_df = pd.concat([venue_df, venue_df_], ignore_index=True)
        
    return venue_df

In [23]:
venues_df = get_venues(neigh_df)
venues_df.shape

Searching around... Bristol city centre
Searching around... Arnos Vale
Searching around... Ashley Down
Searching around... Ashton Vale
Searching around... Avonmouth
Searching around... Aztec West
Searching around... Baptist Mills
Searching around... Barrs Court
Searching around... Barton Hill
Searching around... Bedminster
Searching around... Bedminster Down
Searching around... Begbrook
Searching around... Bishopston
Searching around... Bishopsworth
Searching around... Blaise Hamlet
Searching around... Bower Ashton
Searching around... Bradley Stoke
Searching around... Brandon Hill
Searching around... Brentry
Searching around... Brislington
Searching around... Broadmead
Searching around... Broomhill
Searching around... Broom Hill
Searching around... Canons Marsh
Searching around... Catbrain
Searching around... Charlton Mead
Searching around... Chester Park
Searching around... Clay Hill
Searching around... Clifton
Searching around... Coombe Dingle
Searching around... Cotham
Searching aro

(1162, 5)

There's a good chance that the function picked up some duplicates since the neighbourhoods are quite closely packed together - there may have been some overlap. We can remove these like so:

In [24]:
venues_df.drop_duplicates(subset=['venue name'], inplace=True)
venues_df.reset_index(inplace=True)
venues_df.shape
venues_df.head()

Unnamed: 0,index,neighbourhood,venue name,category,latitude,longitude
0,0,Bristol city centre,Pieminister,Pie Shop,51.461261,-2.59074
1,1,Bristol city centre,Hampton by Hilton,Hotel,51.459609,-2.588604
2,2,Bristol city centre,Lush,Cosmetics Shop,51.457421,-2.590152
3,3,Bristol city centre,The Crazy Fox Coffee Bar,Café,51.457546,-2.589841
4,4,Bristol city centre,Eat a Pitta,Falafel Restaurant,51.457569,-2.58986


In [25]:
venues_df.drop(['index'], axis=1, inplace=True)
print(venues_df.shape)
venues_df.head()

(713, 5)


Unnamed: 0,neighbourhood,venue name,category,latitude,longitude
0,Bristol city centre,Pieminister,Pie Shop,51.461261,-2.59074
1,Bristol city centre,Hampton by Hilton,Hotel,51.459609,-2.588604
2,Bristol city centre,Lush,Cosmetics Shop,51.457421,-2.590152
3,Bristol city centre,The Crazy Fox Coffee Bar,Café,51.457546,-2.589841
4,Bristol city centre,Eat a Pitta,Falafel Restaurant,51.457569,-2.58986
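
Deduplicating on `venue name` alone also discards separate branches of a chain that happen to share a name. If that matters, one option is to treat two records as the same venue only when both the name and the (rounded) coordinates match. The sketch below works on plain dicts; `drop_duplicate_venues` is a hypothetical helper, `decimals=4` (roughly ten metres of latitude) is an assumed tolerance, and the third record is a made-up second branch.

```python
def drop_duplicate_venues(venues, decimals=4):
    """Keep the first record for each (name, rounded lat, rounded lon) key."""
    seen = set()
    unique = []
    for venue in venues:
        key = (venue["venue name"],
               round(venue["latitude"], decimals),
               round(venue["longitude"], decimals))
        if key not in seen:
            seen.add(key)
            unique.append(venue)
    return unique


venues = [
    {"venue name": "Pieminister", "latitude": 51.461261, "longitude": -2.59074},
    {"venue name": "Pieminister", "latitude": 51.461261, "longitude": -2.59074},  # same venue found twice
    {"venue name": "Pieminister", "latitude": 51.4500, "longitude": -2.6000},     # made-up second branch
]
print(len(drop_duplicate_venues(venues)))
```

The pandas equivalent would be to pass the name and both coordinate columns as the `subset` of `drop_duplicates`; note that this would change the row count reported above.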


To do:
- [ ] Find a meaningful way of categorising the venue categories - use categories below...
- [ ] Use DBSCAN to find close clusters of meaningful businesses
- [ ] Present the clusters with a geographic method

[Foursquare venue hierarchy](https://developer.foursquare.com/docs/build-with-foursquare/categories/)

## Finding meaning in the categories 

So far, we have simply recorded the specific category of each venue we have found. However, any decisions we make later on which areas to pedestrianise aren't going to need information as granular as 'Tapas Restaurant' or 'Cosmetics Shop'. What we really want is the top-level category of each business so that decisions can be made at a macro level. Luckily, Foursquare provides a hierarchy of its categories that can be broken down like so:

In [26]:
hierarchy = requests.get(f'https://api.foursquare.com/v2/venues/categories?&client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}').json()
hierarchy

{'meta': {'code': 200, 'requestId': '5fd5facf6713ee22166a1fc2'},
 'response': {'categories': [{'id': '4d4b7104d754a06370d81259',
    'name': 'Arts & Entertainment',
    'pluralName': 'Arts & Entertainment',
    'shortName': 'Arts & Entertainment',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/default_',
     'suffix': '.png'},
    'categories': [{'id': '56aa371be4b08b9a8d5734db',
      'name': 'Amphitheater',
      'pluralName': 'Amphitheaters',
      'shortName': 'Amphitheater',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/default_',
       'suffix': '.png'},
      'categories': []},
     {'id': '4fceea171983d5d06c3e9823',
      'name': 'Aquarium',
      'pluralName': 'Aquariums',
      'shortName': 'Aquarium',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/aquarium_',
       'suffix': '.png'},
      'categories': []},
     {'id': '4bf58dd8d48988d1e1931735',
      'name': 'A

In [27]:
top_level = [category['name'] for category in hierarchy['response']['categories']]
top_level

['Arts & Entertainment',
 'College & University',
 'Event',
 'Food',
 'Nightlife Spot',
 'Outdoors & Recreation',
 'Professional & Other Places',
 'Residence',
 'Shop & Service',
 'Travel & Transport']

These are the top-level categories provided by Foursquare. We now need to loop through the JSON response and build a dictionary that maps each top-level category to a list of the more detailed categories beneath it. The hierarchy is four levels deep, so we need a nested loop for each level:

In [28]:
cat_dict = {}

for top_level in hierarchy['response']['categories']:
    
    top_name = top_level['name']  # As before
    lower_names = []
    
    for second_level in top_level['categories']:
        lower_names.append(second_level['name'])
        
        for third_level in second_level['categories']:
            lower_names.append(third_level['name'])
            
            for fourth_level in third_level['categories']:
                lower_names.append(fourth_level['name'])
                
    cat_dict[top_name] = lower_names
    
cat_dict

{'Arts & Entertainment': ['Amphitheater',
  'Aquarium',
  'Arcade',
  'Art Gallery',
  'Bowling Alley',
  'Casino',
  'Circus',
  'Comedy Club',
  'Concert Hall',
  'Country Dance Club',
  'Disc Golf',
  'Escape Room',
  'Exhibit',
  'General Entertainment',
  'Go Kart Track',
  'Historic Site',
  'Karaoke Box',
  'Laser Tag',
  'Memorial Site',
  'Mini Golf',
  'Movie Theater',
  'Drive-in Theater',
  'Indie Movie Theater',
  'Multiplex',
  'Museum',
  'Art Museum',
  'Erotic Museum',
  'History Museum',
  'Planetarium',
  'Science Museum',
  'Music Venue',
  'Jazz Club',
  'Piano Bar',
  'Rock Club',
  'Pachinko Parlor',
  'Performing Arts Venue',
  'Dance Studio',
  'Indie Theater',
  'Opera House',
  'Theater',
  'Pool Hall',
  'Public Art',
  'Outdoor Sculpture',
  'Street Art',
  'Racecourse',
  'Racetrack',
  'Roller Rink',
  'Salsa Club',
  'Samba School',
  'Stadium',
  'Baseball Stadium',
  'Basketball Stadium',
  'Cricket Ground',
  'Football Stadium',
  'Hockey Arena',
  'R
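
The nested loops above are tied to a hierarchy that is exactly four levels deep. A recursive version, sketched below, collects the names at any depth in the same order. `descendant_names` is a hypothetical helper, and the tiny tree only mimics the shape of the Foursquare response.

```python
def descendant_names(category):
    """Return the names of every category nested below `category`, at any depth."""
    names = []
    for child in category.get("categories", []):
        names.append(child["name"])
        names.extend(descendant_names(child))
    return names


# Tiny tree with the same shape as the Foursquare hierarchy response
tree = [{"name": "Arts & Entertainment", "categories": [
    {"name": "Museum", "categories": [
        {"name": "Art Museum", "categories": []},
    ]},
    {"name": "Aquarium", "categories": []},
]}]
cat_dict_sketch = {top["name"]: descendant_names(top) for top in tree}
print(cat_dict_sketch)
```

With the real response, `tree` would be `hierarchy['response']['categories']`.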

We can now loop through the DataFrame and assign each venue its generalised, top-level category:

In [31]:
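def generalise_cat(row):
    spec_cat = row['category']
    
    for key, value in cat_dict.items():
        # A venue may already carry a top-level category (e.g. 'Food')
        if spec_cat == key or spec_cat in value:
            return key
    
    return None  # category not found in the hierarchy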

In [32]:
venues_df['top_cat'] = venues_df.apply(generalise_cat, axis=1)
venues_df.head()
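
Searching every list in `cat_dict` for every row works, but the dictionary can also be inverted once so that each lookup is a single key access. The sketch below additionally maps each top-level name to itself, since a venue can be labelled with a top-level category such as 'Food' directly. `build_lookup` and the sample dictionary are illustrative assumptions.

```python
def build_lookup(cat_dict):
    """Invert {top: [specific, ...]} into {specific: top}, plus top -> top."""
    lookup = {top: top for top in cat_dict}
    for top, specifics in cat_dict.items():
        for specific in specifics:
            lookup.setdefault(specific, top)  # keep the first parent if a name repeats
    return lookup


sample_cat_dict = {"Food": ["Pie Shop", "Café"], "Shop & Service": ["Cosmetics Shop"]}
lookup = build_lookup(sample_cat_dict)
print(lookup.get("Pie Shop"), lookup.get("Food"), lookup.get("Unknown"))
```

In pandas, a dictionary like this can be applied to a column with `Series.map`, which leaves unmatched categories as missing values.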

In [34]:
from sklearn.cluster import DBSCAN

clus_ds = venues_df[['latitude', 'longitude']]
clustering = DBSCAN(eps=0.002, min_samples=5).fit(clus_ds)

venues_df['cluster'] = clustering.labels_

unique_clusters = set(clustering.labels_)
unique_clusters

{-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18}

So we're seeing 19 distinct clusters (labelled 0 to 18) with this current set of DBSCAN parameters; the label -1 marks noise points that do not belong to any cluster. We may choose to modify these parameters later on to give us more granularity, but for now let's colour these clusters on our map of Bristol.

In [35]:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors as col

colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_clusters))).tolist()

In [36]:
# create map of Bristol using latitude and longitude values
bristol_venues = folium.Map(location=[venues_df['latitude'][0], venues_df['longitude'][0]], zoom_start=12)

# add markers to map
for lat, lng, clus in zip(venues_df['latitude'], venues_df['longitude'], venues_df['cluster']):
    if clus != -1:
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            color=col.to_hex(colors[clus]),
            fill=True,
            fill_color=col.to_hex(colors[clus]),
            fill_opacity=0.7,
            parse_html=False).add_to(bristol_venues)  
    
bristol_venues
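
The `eps=0.002` passed to DBSCAN earlier is measured in degrees, because the clustering runs directly on latitude and longitude. A degree of longitude covers less ground than a degree of latitude at Bristol's latitude, so the neighbourhood radius is not the same in every direction. The sketch below gives a rough conversion; the constant of about 111,320 m per degree of latitude is a spherical-Earth approximation, 51.46 is taken as a representative latitude from the tables above, and the function name is hypothetical.

```python
import math

METRES_PER_DEGREE_LAT = 111_320  # approximate; treats the Earth as a sphere


def eps_in_metres(eps_degrees, latitude_degrees):
    """Approximate north-south and east-west reach of an eps given in degrees."""
    north_south = eps_degrees * METRES_PER_DEGREE_LAT
    east_west = north_south * math.cos(math.radians(latitude_degrees))
    return north_south, east_west


ns, ew = eps_in_metres(0.002, 51.46)
print(f"north-south ~ {ns:.0f} m, east-west ~ {ew:.0f} m")
```

Under these assumptions the radius works out to roughly 220 m north-south but only about 140 m east-west, which is worth bearing in mind when tuning `eps`.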

### Categorising the clusters

Now that we have a nice spread of clusters over Bristol, we will want to find out what the top-level category is for each one. This will be as simple as grouping the DataFrame by cluster and taking the mode of the `top_cat` column.

In [41]:
venues_df.head()

Unnamed: 0,neighbourhood,venue name,category,latitude,longitude,top_cat,cluster
0,Bristol city centre,Pieminister,Pie Shop,51.461261,-2.59074,Food,0
1,Bristol city centre,Hampton by Hilton,Hotel,51.459609,-2.588604,Travel & Transport,1
2,Bristol city centre,Lush,Cosmetics Shop,51.457421,-2.590152,Shop & Service,1
3,Bristol city centre,The Crazy Fox Coffee Bar,Café,51.457546,-2.589841,Food,1
4,Bristol city centre,Eat a Pitta,Falafel Restaurant,51.457569,-2.58986,Food,1


In [67]:
grouped_df = venues_df.groupby(['cluster']).top_cat.apply(lambda x: x.mode())
grouped_df

cluster   
-1       0                     Food
         1           Nightlife Spot
 0       0                     Food
         1           Nightlife Spot
 1       0                     Food
 2       0                     Food
 3       0                     Food
 4       0                     Food
 5       0                     Food
         1       Travel & Transport
 6       0           Shop & Service
 7       0                     Food
 8       0           Shop & Service
 9       0                     Food
 10      0           Nightlife Spot
         1    Outdoors & Recreation
 11      0                     Food
 12      0                     Food
         1           Nightlife Spot
 13      0           Shop & Service
 14      0           Shop & Service
 15      0           Nightlife Spot
         1           Shop & Service
 16      0                     Food
 17      0                     Food
 18      0                     Food
         1           Nightlife Spot
Name: top_cat, dt
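
Some clusters appear twice in the output above because `mode()` returns every value that ties for most frequent. The same logic can be sketched with the standard library to make the tie handling explicit. `modal_categories` is a hypothetical helper and the `(cluster, category)` pairs are toy data.

```python
from collections import Counter, defaultdict


def modal_categories(pairs):
    """Map each cluster label to the list of its most frequent categories (ties kept)."""
    counts = defaultdict(Counter)
    for cluster, category in pairs:
        if category is not None:  # ignore venues with no top-level category
            counts[cluster][category] += 1
    modes = {}
    for cluster, counter in counts.items():
        best = max(counter.values())
        modes[cluster] = sorted(cat for cat, n in counter.items() if n == best)
    return modes


pairs = [(0, "Food"), (0, "Nightlife Spot"), (1, "Food"), (1, "Food"), (1, "Shop & Service")]
print(modal_categories(pairs))
```

Keeping the result keyed by cluster label, rather than by row position, makes it straightforward to look up the categories for a given cluster later on.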

In [66]:
# create map of Bristol using latitude and longitude values
bristol_venues = folium.Map(location=[venues_df['latitude'][0], venues_df['longitude'][0]], zoom_start=12)

# add markers to map
for lat, lng, clus, ven in zip(venues_df['latitude'], venues_df['longitude'], venues_df['cluster'], venues_df['venue name']):
    if clus != -1:
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            color=col.to_hex(colors[clus]),
            fill=True,
            fill_color=col.to_hex(colors[clus]),
            fill_opacity=0.7,
            popup=ven,
            parse_html=False).add_to(bristol_venues)  
    
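# Mark each cluster centroid with its most common top-level categories
centroid_df = venues_df.groupby('cluster')[['latitude', 'longitude']].mean()
for lat, long, clus in zip(centroid_df['latitude'], centroid_df['longitude'], centroid_df.index):
    if clus != -1:  # skip the DBSCAN noise points
        # look the categories up by cluster label, not by row position
        folium.Marker([lat, long], popup=', '.join(grouped_df.loc[clus])).add_to(bristol_venues)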
    
bristol_venues

Now, by clicking on the above labels, we can easily see which types of business dominate each of the clusters we have found. This can now be passed to the city planning authority to help decide where it would be reasonable to pedestrianise.