<h1>Data gathering </h1>

Import libraries for data gathering

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

Define the Wikipedia URL and the CSS class of the table

In [2]:
wiki_url = 'https://en.wikipedia.org/wiki/OX_postcode_area'
oxford_class = 'wikitable sortable'

Get the data table (Oxford postcode districts and locations) from Wikipedia

In [3]:
response = requests.get(wiki_url)
soup = BeautifulSoup(response.text, 'html.parser')

oxfordshire_table = soup.find('table', attrs={'class': oxford_class})
df1 = pd.read_html(str(oxfordshire_table))

# read_html returns a list of dataframes; take the first one
df = df1[0]

#check first 5 rows
df.head()

Unnamed: 0,Postcode district,Post town,Coverage,Local authority area(s)
0,OX1,OXFORD,"Central and South Oxford, Kennington, Boars Hi...","Oxford, Vale of White Horse"
1,OX2,OXFORD,"North and West Oxford, Botley, North Hinksey, ...","Oxford, Vale of White Horse, Cherwell"
2,OX3,OXFORD,"North East Oxford, Beckley, Headington, Marsto...","Oxford, South Oxfordshire, Cherwell"
3,OX4,OXFORD,"East Oxford, Cowley, Blackbird Leys, Littlemor...","Oxford, South Oxfordshire"
4,OX5,KIDLINGTON,"Kidlington, Yarnton, Begbroke, Tackley, Murcot...","Cherwell, West Oxfordshire, Buckinghamshire"


Install the additional packages needed for mapping and geocoding

In [4]:
!pip install folium
!pip install geocoder

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


The cell below is hidden because it contains credentials that should not be shared. It imports a CSV file into a dataframe named df_data_1, which holds the Latitude and Longitude values for each postcode district.


In [5]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Postcode district,Latitude,Longitude
0,OX1,51.75467,-1.25489
1,OX2,51.761534,-1.27847
2,OX3,51.772105,-1.24118
3,OX4,51.74698,-1.2344
4,OX5,51.831672,-1.265074


Merge the two dataframes together

In [6]:
df['Latitude'] = df['Postcode district'].map(df_data_1.set_index('Postcode district')['Latitude'])
df['Longitude'] = df['Postcode district'].map(df_data_1.set_index('Postcode district')['Longitude'])
df.head()

Unnamed: 0,Postcode district,Post town,Coverage,Local authority area(s),Latitude,Longitude
0,OX1,OXFORD,"Central and South Oxford, Kennington, Boars Hi...","Oxford, Vale of White Horse",51.75467,-1.25489
1,OX2,OXFORD,"North and West Oxford, Botley, North Hinksey, ...","Oxford, Vale of White Horse, Cherwell",51.761534,-1.27847
2,OX3,OXFORD,"North East Oxford, Beckley, Headington, Marsto...","Oxford, South Oxfordshire, Cherwell",51.772105,-1.24118
3,OX4,OXFORD,"East Oxford, Cowley, Blackbird Leys, Littlemor...","Oxford, South Oxfordshire",51.74698,-1.2344
4,OX5,KIDLINGTON,"Kidlington, Yarnton, Begbroke, Tackley, Murcot...","Cherwell, West Oxfordshire, Buckinghamshire",51.831672,-1.265074
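As a hedged alternative to the two `map` calls, a single left merge adds both coordinate columns and makes it easy to spot districts that have no coordinates. The sketch below uses small made-up dataframes (including a hypothetical `OX99` district) rather than the real `df` and `df_data_1`.

```python
import pandas as pd

# Toy stand-ins for the scraped table and the coordinates file.
# 'OX99' is a hypothetical district with no coordinates.
districts = pd.DataFrame({
    'Postcode district': ['OX1', 'OX2', 'OX99'],
    'Post town': ['OXFORD', 'OXFORD', 'NOWHERE'],
})
coords = pd.DataFrame({
    'Postcode district': ['OX1', 'OX2'],
    'Latitude': [51.75467, 51.761534],
    'Longitude': [-1.25489, -1.27847],
})

# A left merge keeps every district and adds both coordinate columns at once.
merged = districts.merge(coords, on='Postcode district', how='left')

# Districts without a match get NaN coordinates; list them before mapping.
missing = merged.loc[merged['Latitude'].isna(), 'Postcode district'].tolist()
print(missing)
```

Rows with missing coordinates would need to be dropped or filled before they are passed to folium.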


Create a map around Oxford. First, find the coordinates of Oxford.

In [8]:
from geopy.geocoders import Nominatim

address = 'Oxford'

geolocator = Nominatim(user_agent="oxford_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Oxford are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Oxford are 51.7520131, -1.2578499.
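The geocoder returns `None` when it finds no match, in which case `location.latitude` would raise an `AttributeError`. A minimal sketch of a guard follows. `coordinates_or_fail` and `fake_geocode` are hypothetical names introduced here, and the fake geocoder stands in for `geolocator.geocode` so the example runs without network access.

```python
from types import SimpleNamespace


def coordinates_or_fail(geocode, address):
    """Return (latitude, longitude) for address, raising if nothing is found."""
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude


def fake_geocode(address):
    """Hypothetical stand-in for geolocator.geocode, used only for this sketch."""
    known = {'Oxford': SimpleNamespace(latitude=51.7520131, longitude=-1.2578499)}
    return known.get(address)


print(coordinates_or_fail(fake_geocode, 'Oxford'))
```

In the notebook, the call would presumably become `coordinates_or_fail(geolocator.geocode, address)`.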


In [9]:
import folium # map rendering library

# create map of Oxford using latitude and longitude values
map_full = folium.Map(location=[latitude, longitude], zoom_start=10)


# add markers to map
for lat, lng, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Post town'], df['Coverage']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_full) 

map_full

To simplify the map above, segment and cluster only the neighborhoods in Oxford city (OX1, OX2, OX3, OX4, OX33, OX44). Slice the original dataframe to create a new dataframe for Oxford. Because OX33 and OX44 are not close to the city centre, exclude them for now.

In [12]:
oxford_data = df[df['Postcode district'].isin(['OX1', 'OX2', 'OX3', 'OX4'])].reset_index(drop=True)
oxford_data.head()

Unnamed: 0,Postcode district,Post town,Coverage,Local authority area(s),Latitude,Longitude
0,OX1,OXFORD,"Central and South Oxford, Kennington, Boars Hi...","Oxford, Vale of White Horse",51.75467,-1.25489
1,OX2,OXFORD,"North and West Oxford, Botley, North Hinksey, ...","Oxford, Vale of White Horse, Cherwell",51.761534,-1.27847
2,OX3,OXFORD,"North East Oxford, Beckley, Headington, Marsto...","Oxford, South Oxfordshire, Cherwell",51.772105,-1.24118
3,OX4,OXFORD,"East Oxford, Cowley, Blackbird Leys, Littlemor...","Oxford, South Oxfordshire",51.74698,-1.2344


Let's visualize central Oxford and the neighborhoods in it.

In [13]:
# create map of central Oxford using latitude and longitude values
map_oxford = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(oxford_data['Latitude'], oxford_data['Longitude'], oxford_data['Coverage']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_oxford)  
    
map_oxford

Next, use the Foursquare API to explore the neighborhoods and segment them.
Define the Foursquare credentials and version.
The cell below is hidden because it contains the credential details.

In [14]:
# The code was removed by Watson Studio for sharing.

Get the latitude and longitude values of the first neighborhood (OX1).

In [16]:
neighborhood_latitude = oxford_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = oxford_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = oxford_data.loc[0, 'Postcode district'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of OX1 are 51.754670000000004, -1.25489.


Now, let's get the top 100 venues that are in OX1 within a radius of 500 meters.
First, let's create the GET request URL.

In [17]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
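The same URL can be assembled with `urllib.parse.urlencode`, which escapes the parameter values instead of relying on string formatting. This is only a sketch: `build_explore_url` is a hypothetical helper, and the credentials and version string below are placeholders, not the values from the hidden cell.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build the venues/explore request URL from its query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# Placeholder credentials and version, for illustration only.
url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605', 51.75467, -1.25489)

# Parse the query string back to confirm that the parameters round-trip.
query = parse_qs(urlsplit(url).query)
print(query['ll'], query['radius'], query['limit'])
```

Passing the same dictionary as `params=` to `requests.get` would achieve the same effect without building the string by hand.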

Send the GET request and examine the results

In [18]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fa91665588af57b170bf92f'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Oxford',
  'headerFullLocation': 'Oxford',
  'headerLocationGranularity': 'city',
  'totalResults': 84,
  'suggestedBounds': {'ne': {'lat': 51.75917000450001,
    'lng': -1.2476341272268734},
   'sw': {'lat': 51.7501699955, 'lng': -1.2621458727731267}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4be54697cf200f477004133c',
       'name': "Blackwell's",
       'location': {'address': '48-51 Broad St',
        'lat': 51.75463543031628,
        'lng': -1.2555174561115983,
        'labeledLatLngs': [{'label': 'display',
          'lat': 51.75463543031628,
          'lng'

All the information is in the _items_ key, so we need to access that. To proceed, let's use the **get_category_type** function from the Foursquare lab.

In [19]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
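To illustrate the three cases the function handles, the sketch below repeats it and applies it to plain dictionaries in place of dataframe rows. The sample rows are hand-written for illustration, and 'Unnamed spot' is a made-up venue.

```python
def get_category_type(row):
    """Return the name of the first category of a venue row, or None."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Row whose keys carry the 'venue.' prefix, as produced by flattening the JSON.
flat_row = {'venue.name': "Blackwell's", 'venue.categories': [{'name': 'Bookstore'}]}
# Row with unprefixed keys.
plain_row = {'name': 'Radcliffe Square', 'categories': [{'name': 'Plaza'}]}
# Hypothetical venue that has no category at all.
empty_row = {'venue.name': 'Unnamed spot', 'venue.categories': []}

for row in (flat_row, plain_row, empty_row):
    print(get_category_type(row))
```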

Clean the json and structure it into a pandas dataframe.

In [21]:
import json # library to handle JSON files
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

In [22]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Blackwell's,Bookstore,51.754635,-1.255517
1,Radcliffe Square,Plaza,51.753114,-1.253618
2,Blackwell's Art & Poster Shop,Art Gallery,51.754303,-1.256413
3,The Alternative Tuck Shop,Sandwich Place,51.755106,-1.251797
4,The Turf Tavern,Pub,51.754657,-1.253032


Check how many venues were returned by Foursquare:

In [23]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

84 venues were returned by Foursquare.


<h1> Explore Neighborhoods in Oxford </h1>

Create a function that repeats the same process for all the neighborhoods (Coverage areas) in Oxford.

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
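The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue with no categories. A more defensive row extraction might look like the sketch below. `extract_venue_rows` is a hypothetical helper and the sample items are hand-written, with 'Unnamed spot' made up for the empty case.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn Foursquare 'items' into row tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Hand-written sample items; the second venue is hypothetical and has no category.
sample_items = [
    {'venue': {'name': 'The Turf Tavern',
               'location': {'lat': 51.754657, 'lng': -1.253032},
               'categories': [{'name': 'Pub'}]}},
    {'venue': {'name': 'Unnamed spot',
               'location': {'lat': 51.7550, 'lng': -1.2540},
               'categories': []}},
]

rows = extract_venue_rows('OX1', 51.75467, -1.25489, sample_items)
for row in rows:
    print(row)
```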

Run the above function on each neighborhood and create a new dataframe called oxford_venues.


In [25]:
oxford_venues = getNearbyVenues(names=oxford_data['Coverage'],
                                   latitudes=oxford_data['Latitude'],
                                   longitudes=oxford_data['Longitude']
                                  )

Central and South Oxford, Kennington, Boars Hill, New Hinksey, South Hinksey, Osney
North and West Oxford, Botley, North Hinksey, Summertown, Wytham, Jericho, Wolvercote
North East Oxford, Beckley, Headington, Marston, Elsfield, Noke, Woodeaton, Woodperry
East Oxford, Cowley, Blackbird Leys, Littlemore Sandford-on-Thames, Iffley, Rose Hill


Check the size of the resulting dataframe

In [26]:
print(oxford_venues.shape)
oxford_venues.head()

(130, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Central and South Oxford, Kennington, Boars Hi...",51.75467,-1.25489,Blackwell's,51.754635,-1.255517,Bookstore
1,"Central and South Oxford, Kennington, Boars Hi...",51.75467,-1.25489,Radcliffe Square,51.753114,-1.253618,Plaza
2,"Central and South Oxford, Kennington, Boars Hi...",51.75467,-1.25489,Blackwell's Art & Poster Shop,51.754303,-1.256413,Art Gallery
3,"Central and South Oxford, Kennington, Boars Hi...",51.75467,-1.25489,The Alternative Tuck Shop,51.755106,-1.251797,Sandwich Place
4,"Central and South Oxford, Kennington, Boars Hi...",51.75467,-1.25489,The Turf Tavern,51.754657,-1.253032,Pub


Try to find Czech venues in the data.

Check how many venues were returned for each category

In [27]:
oxford_venues.groupby('Venue Category').count()

Venue Category,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Art Gallery,2,2,2,2,2,2
Asian Restaurant,1,1,1,1,1,1
Bakery,2,2,2,2,2,2
Bar,1,1,1,1,1,1
Bed & Breakfast,1,1,1,1,1,1
...,...,...,...,...,...,...
Sushi Restaurant,2,2,2,2,2,2
Tennis Court,1,1,1,1,1,1
Thai Restaurant,4,4,4,4,4,4
Theater,1,1,1,1,1,1


print('There are {} unique categories.'.format(len(oxford_venues['Venue Category'].unique())))

<h1> Analyse neighborhoods</h1>

In [31]:
# one hot encoding
oxford_onehot = pd.get_dummies(oxford_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
oxford_onehot['Neighborhood'] = oxford_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [oxford_onehot.columns[-1]] + list(oxford_onehot.columns[:-1])
oxford_onehot = oxford_onehot[fixed_columns]

oxford_onehot.head()

Unnamed: 0,Neighborhood,Art Gallery,Asian Restaurant,Bakery,Bar,Bed & Breakfast,Beer Bar,Bookstore,Brazilian Restaurant,Breakfast Spot,...,Scandinavian Restaurant,Science Museum,Spanish Restaurant,Stationery Store,Steakhouse,Sushi Restaurant,Tennis Court,Thai Restaurant,Theater,Turkish Restaurant
0,"Central and South Oxford, Kennington, Boars Hi...",0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Central and South Oxford, Kennington, Boars Hi...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Central and South Oxford, Kennington, Boars Hi...",1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Central and South Oxford, Kennington, Boars Hi...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Central and South Oxford, Kennington, Boars Hi...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Check the new dataframe size

In [32]:
oxford_onehot.shape

(130, 72)

Group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category

In [33]:
oxford_grouped = oxford_onehot.groupby('Neighborhood').mean().reset_index()
oxford_grouped

Unnamed: 0,Neighborhood,Art Gallery,Asian Restaurant,Bakery,Bar,Bed & Breakfast,Beer Bar,Bookstore,Brazilian Restaurant,Breakfast Spot,...,Scandinavian Restaurant,Science Museum,Spanish Restaurant,Stationery Store,Steakhouse,Sushi Restaurant,Tennis Court,Thai Restaurant,Theater,Turkish Restaurant
0,"Central and South Oxford, Kennington, Boars Hi...",0.02381,0.0,0.02381,0.0,0.0,0.0,0.035714,0.011905,0.011905,...,0.011905,0.011905,0.0,0.011905,0.011905,0.011905,0.011905,0.047619,0.011905,0.0
1,"East Oxford, Cowley, Blackbird Leys, Littlemor...",0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,...,0.0,0.0,0.025641,0.0,0.0,0.025641,0.0,0.0,0.0,0.051282
2,"North East Oxford, Beckley, Headington, Marsto...",0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"North and West Oxford, Botley, North Hinksey, ...",0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
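Taking the mean of a one-hot column within a neighborhood is the same as dividing the number of venues in that category by the total number of venues in the neighborhood. For example, 0.011905 in the first row is 1/84. The sketch below reproduces that arithmetic with `collections.Counter` on a made-up list of (neighborhood, category) pairs.

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, category) pairs, for illustration only.
venues = [
    ('OX1', 'Pub'),
    ('OX1', 'Coffee Shop'),
    ('OX1', 'Pub'),
    ('OX1', 'Bookstore'),
    ('OX3', 'Bar'),
    ('OX3', 'Pub'),
]

# Count categories per neighborhood.
counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# Frequency = category count / total venues in that neighborhood.
freq = {
    neighborhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for neighborhood, counter in counts.items()
}

for neighborhood, table in freq.items():
    print(neighborhood, table)
```

One consequence is that a neighborhood with very few venues gets large frequencies from a single venue, as with the 0.25 and 0.333333 values above.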


Check the new size

In [34]:
oxford_grouped.shape

(4, 72)

Print each neighborhood along with the top 5 most common venues

In [35]:
num_top_venues = 5

for hood in oxford_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = oxford_grouped[oxford_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central and South Oxford, Kennington, Boars Hill, New Hinksey, South Hinksey, Osney----
             venue  freq
0      Coffee Shop  0.10
1              Pub  0.07
2             Café  0.07
3  Thai Restaurant  0.05
4            Hotel  0.04


----East Oxford, Cowley, Blackbird Leys, Littlemore Sandford-on-Thames, Iffley, Rose Hill----
                venue  freq
0                 Pub  0.18
1         Pizza Place  0.08
2  Turkish Restaurant  0.05
3        Burger Joint  0.05
4      Sandwich Place  0.03


----North East Oxford, Beckley, Headington, Marston, Elsfield, Noke, Woodeaton, Woodperry----
                    venue  freq
0                     Bar  0.25
1                     Pub  0.25
2       Convenience Store  0.25
3  Furniture / Home Store  0.25
4             Art Gallery  0.00


----North and West Oxford, Botley, North Hinksey, Summertown, Wytham, Jericho, Wolvercote----
             venue  freq
0  Bed & Breakfast  0.33
1  Harbor / Marina  0.33
2             Farm  0.33
3      Art

Put these into a pandas dataframe

First, let's write a function to sort the venues in descending order.


In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
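When several categories share the same frequency, the order returned by `sort_values` depends on the sorting algorithm, which by default is not guaranteed to be stable. That may explain why the two listings of North East Oxford in this notebook order its four tied categories differently. A sketch of a deterministic ranking that breaks ties alphabetically is shown below. `top_categories` is a hypothetical helper operating on a plain dictionary rather than a dataframe row.

```python
def top_categories(freqs, n):
    """Return the n most frequent categories, breaking ties alphabetically."""
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


# Frequencies for North East Oxford as printed in the top-5 listing above.
ox3 = {
    'Pub': 0.25,
    'Bar': 0.25,
    'Convenience Store': 0.25,
    'Furniture / Home Store': 0.25,
    'Art Gallery': 0.0,
}

print(top_categories(ox3, 3))
```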

Create the new dataframe and display the top 10 venues for each neighborhood.

In [46]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = oxford_grouped['Neighborhood']

for ind in np.arange(oxford_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(oxford_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Central and South Oxford, Kennington, Boars Hi...",Coffee Shop,Pub,Café,Thai Restaurant,History Museum,Hotel,Sandwich Place,Bookstore,English Restaurant,Burger Joint
1,"East Oxford, Cowley, Blackbird Leys, Littlemor...",Pub,Pizza Place,Turkish Restaurant,Burger Joint,Greek Restaurant,Moroccan Restaurant,Mediterranean Restaurant,Italian Restaurant,Indian Restaurant,Ice Cream Shop
2,"North East Oxford, Beckley, Headington, Marsto...",Convenience Store,Bar,Pub,Furniture / Home Store,Turkish Restaurant,English Restaurant,Concert Hall,Deli / Bodega,Dessert Shop,Diner
3,"North and West Oxford, Botley, North Hinksey, ...",Harbor / Marina,Farm,Bed & Breakfast,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner,English Restaurant,Farmers Market
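The `indicators` list with a fallback to 'th' produces correct column names up to the 20th venue, but would yield "21th" for a 21st column. If more columns were ever needed, a general ordinal helper could be used instead. `ordinal` below is a hypothetical function, shown as a sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Column names in the same style as the table above.
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], columns[4], columns[10])
```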


<h1> Cluster Neighborhoods</h1>

In [47]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 2

oxford_grouped_clustering = oxford_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(oxford_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 0], dtype=int32)
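With only four rows, one way to sanity-check the cluster labels is to look at the pairwise Euclidean distances between the frequency vectors, since k-means groups points that are close in that sense. The sketch below uses three made-up, three-dimensional vectors (labelled A, B and C) rather than the real 71-column rows.

```python
from itertools import combinations
from math import dist  # Python 3.8+

# Made-up frequency vectors, for illustration only.
rows = {
    'A': [0.10, 0.07, 0.00],
    'B': [0.08, 0.18, 0.00],
    'C': [0.00, 0.00, 0.33],
}

# Euclidean distance for every unordered pair of rows.
distances = {
    (a, b): dist(rows[a], rows[b]) for a, b in combinations(rows, 2)
}

for pair, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, round(d, 3))

# The closest pair is the one most likely to share a cluster.
closest = min(distances, key=distances.get)
```

On the real data, the same idea could be applied to the rows of `oxford_grouped_clustering`.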

Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [48]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

oxford_merged = oxford_data

# join neighborhoods_venues_sorted onto oxford_data so each neighborhood keeps its latitude/longitude
oxford_merged = oxford_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Coverage')

oxford_merged.head() # check the last columns!

Unnamed: 0,Postcode district,Post town,Coverage,Local authority area(s),Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,OX1,OXFORD,"Central and South Oxford, Kennington, Boars Hi...","Oxford, Vale of White Horse",51.75467,-1.25489,1,Coffee Shop,Pub,Café,Thai Restaurant,History Museum,Hotel,Sandwich Place,Bookstore,English Restaurant,Burger Joint
1,OX2,OXFORD,"North and West Oxford, Botley, North Hinksey, ...","Oxford, Vale of White Horse, Cherwell",51.761534,-1.27847,0,Harbor / Marina,Farm,Bed & Breakfast,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner,English Restaurant,Farmers Market
2,OX3,OXFORD,"North East Oxford, Beckley, Headington, Marsto...","Oxford, South Oxfordshire, Cherwell",51.772105,-1.24118,1,Convenience Store,Bar,Pub,Furniture / Home Store,Turkish Restaurant,English Restaurant,Concert Hall,Deli / Bodega,Dessert Shop,Diner
3,OX4,OXFORD,"East Oxford, Cowley, Blackbird Leys, Littlemor...","Oxford, South Oxfordshire",51.74698,-1.2344,1,Pub,Pizza Place,Turkish Restaurant,Burger Joint,Greek Restaurant,Moroccan Restaurant,Mediterranean Restaurant,Italian Restaurant,Indian Restaurant,Ice Cream Shop


Visualize the resulting clusters

In [52]:

import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(oxford_merged['Latitude'], oxford_merged['Longitude'], oxford_merged['Coverage'], oxford_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
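If all that is needed is one distinct colour per cluster, the colour list can also be generated with the standard library by spacing hues evenly around the colour wheel. This is a sketch of an alternative, not what the notebook does. `cluster_colors` is a hypothetical helper, and the saturation and brightness values are arbitrary choices.

```python
import colorsys


def cluster_colors(k):
    """Return k distinct hex colours with evenly spaced hues."""
    colours = []
    for i in range(k):
        # Hue runs over [0, 1); saturation and value are arbitrary fixed choices.
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        colours.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours


palette = cluster_colors(3)
print(palette)
```

A marker for cluster label `c` would then use `palette[c]`.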

<h1> Examine Clusters </h1>

Cluster 1 (cluster label 0)

In [53]:
oxford_merged.loc[oxford_merged['Cluster Labels'] == 0, oxford_merged.columns[[1] + list(range(5, oxford_merged.shape[1]))]]

Unnamed: 0,Post town,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,OXFORD,-1.27847,0,Harbor / Marina,Farm,Bed & Breakfast,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner,English Restaurant,Farmers Market


Cluster 2 (cluster label 1)

In [54]:
oxford_merged.loc[oxford_merged['Cluster Labels'] == 1, oxford_merged.columns[[1] + list(range(5, oxford_merged.shape[1]))]]

Unnamed: 0,Post town,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,OXFORD,-1.25489,1,Coffee Shop,Pub,Café,Thai Restaurant,History Museum,Hotel,Sandwich Place,Bookstore,English Restaurant,Burger Joint
2,OXFORD,-1.24118,1,Convenience Store,Bar,Pub,Furniture / Home Store,Turkish Restaurant,English Restaurant,Concert Hall,Deli / Bodega,Dessert Shop,Diner
3,OXFORD,-1.2344,1,Pub,Pizza Place,Turkish Restaurant,Burger Joint,Greek Restaurant,Moroccan Restaurant,Mediterranean Restaurant,Italian Restaurant,Indian Restaurant,Ice Cream Shop
