# Capstone Project 

# Segementation and Clustering Neighborhoods in north-central Boroughs of Kent County

## Introduction

In this lab, you will learn how to convert addresses into their equivalent latitude and longitude values. Also, you will use the Foursquare API to explore neighborhoods in New York City. You will use the **explore** function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. You will use the *k*-means clustering algorithm to complete this task. Finally, you will use the Folium library to visualize the neighborhoods in New York City and their emerging clusters.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href='#item1'>Downloading Dataset</a>
    

2. <a href="#item2">Get latitude and longtiude of each town</a>
    

3. <a href="#item3">lets explore the Medway and nearby towns</a>
    

4. <a href="#item4">Get nearby Venues in each town to establish similarity</a>
    

5. <a href="#item5">Clustering Towns</a>    
    
    
6. <a href="#item6">Examine Clusters</a>    
    
    
7. <a href="#item7">Conclusion</a>    


    
</font>
</div>

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: - 
  - anaconda/osx-64::ca-certificates-2020.1.1-0, anaconda/osx-64::certifi-2019.11.28-py37_0, anaconda/osx-64::openssl-1.1.1d-h1de35cc_4
  - anaconda/osx-64::ca-certificates-2020.1.1-0, anaconda/osx-64::certifi-2019.11.28-py37_0, defaults/osx-64::openssl-1.1.1d-h1de35cc_4
  - anaconda/osx-64::ca-certificates-2020.1.1-0, anaconda/osx-64::openssl-1.1.1d-h1de35cc_4, defaults/osx-64::certifi-2019.11.28-py37_0
  - anaconda/osx-64::ca-certificates-2020.1.1-0, defaults/osx-64::certifi-2019.11.28-py37_0, defaults/osx-64::openssl-1.1.1d-h1de35cc_4
  - anaconda/osx-64::openssl-1.1.1d-h1de35cc_4, defaults/osx-64::ca-certificates-2020.1.1-0, defaults/osx-64::certifi-2019.11.28-py37_0
  - defaults/osx-64::ca-certificates-2020.1.1-0, defaults/osx-64::certifi-2019.11.28-py37_0, defaults/osx-64::openssl-1.1.1d-h1de35cc_4
  - anaconda/osx-64::certifi-2019.11.28-py37_0, anaconda/osx-64::openssl-1.1.1d-h1de35cc_4, defaults/o

<a id='item1'></a>

## 1. Downloading Dataset

### Introduction to the North Central Kent Area

The ME postcode area, also known as the Rochester postcode area,is a group of 20 postcode districts in South East England, within 11 post towns. These cover **north central Kent**, including the Medway unitary authority and the borough of Swale, plus parts of the boroughs of Maidstone, Tonbridge and Malling, and Gravesham

### Lets import the data from wikipedia (https://en.wikipedia.org/wiki/ME_postcode_area)

In [3]:
url='https://en.wikipedia.org/wiki/ME_postcode_area'
d=pd.read_html(url)

In [4]:
# find number of tables found on the webpage
len(d)

4

In [6]:
#store the 2nd table on the webpage to the data frame
df_temp=d[1]
print(df_temp.shape)
df_temp.head()

(21, 4)


Unnamed: 0,Postcode district,Post town,Coverage,Local authority area
0,ME1,ROCHESTER,"Rochester, Borstal, Burham, Wouldham","Medway, Tonbridge and Malling"
1,ME2,ROCHESTER,"Strood, Halling, Cuxton, Frindsbury",Medway
2,ME3,ROCHESTER,"Rural, Hoo Peninsula, Higham","Medway, Gravesham"
3,ME4,CHATHAM,"Chatham, Brompton, Luton, St. Mary's Island",Medway
4,ME5,CHATHAM,"Walderslade, Blue Bell Hill, Lordswood, Luton","Medway, Tonbridge and Malling & Maidstone"


In [7]:
# Lets look at all the entries
df_temp

Unnamed: 0,Postcode district,Post town,Coverage,Local authority area
0,ME1,ROCHESTER,"Rochester, Borstal, Burham, Wouldham","Medway, Tonbridge and Malling"
1,ME2,ROCHESTER,"Strood, Halling, Cuxton, Frindsbury",Medway
2,ME3,ROCHESTER,"Rural, Hoo Peninsula, Higham","Medway, Gravesham"
3,ME4,CHATHAM,"Chatham, Brompton, Luton, St. Mary's Island",Medway
4,ME5,CHATHAM,"Walderslade, Blue Bell Hill, Lordswood, Luton","Medway, Tonbridge and Malling & Maidstone"
5,ME6,SNODLAND,Snodland,Tonbridge and Malling
6,ME7,GILLINGHAM,"Gillingham, Brompton, Hempstead, Bredhurst","Medway, Maidstone"
7,ME8,GILLINGHAM,"Rainham, Parkwood, Twydall, Hempstead, Wigmore",Medway
8,ME9,SITTINGBOURNE,"Newington, Teynham, Iwade and Rural",Swale
9,ME10,SITTINGBOURNE,"Sittingbourne, Kemsley, Milton Regis",Swale


Last row has non-geographical data (Jobceter Plus) which is not required and hence needs to be removed from our analysis

In [11]:
# remove column with 
df_temp = df_temp[~df_temp['Local authority area'] .str.match('non-geographic')].reset_index(drop=True)
df_temp

Unnamed: 0,Postcode district,Post town,Coverage,Local authority area
0,ME1,ROCHESTER,"Rochester, Borstal, Burham, Wouldham","Medway, Tonbridge and Malling"
1,ME2,ROCHESTER,"Strood, Halling, Cuxton, Frindsbury",Medway
2,ME3,ROCHESTER,"Rural, Hoo Peninsula, Higham","Medway, Gravesham"
3,ME4,CHATHAM,"Chatham, Brompton, Luton, St. Mary's Island",Medway
4,ME5,CHATHAM,"Walderslade, Blue Bell Hill, Lordswood, Luton","Medway, Tonbridge and Malling & Maidstone"
5,ME6,SNODLAND,Snodland,Tonbridge and Malling
6,ME7,GILLINGHAM,"Gillingham, Brompton, Hempstead, Bredhurst","Medway, Maidstone"
7,ME8,GILLINGHAM,"Rainham, Parkwood, Twydall, Hempstead, Wigmore",Medway
8,ME9,SITTINGBOURNE,"Newington, Teynham, Iwade and Rural",Swale
9,ME10,SITTINGBOURNE,"Sittingbourne, Kemsley, Milton Regis",Swale


In [13]:
df_temp.shape

(20, 4)

In [14]:
df=df_temp

<a id='item2'></a>

# Part 2. Get latitude and longtiude of each town 

In [15]:
!pip install geocoder



In [16]:
import geocoder

def get_latlon(postal_code):
    lat_lng_coords = None
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Kent, United Kingdom'.format(postal_code))
        lat_lng_coords = g.latlng
    return lat_lng_coords




In [17]:
# test case for geocoder function
addr = 'Rochester, ME1'
result = get_latlon(addr)
result

[51.38916000000006, 0.506080000000054]

In [18]:
# test case for geocoder function
addr = 'Rochester, ME2'
result = get_latlon(addr)
result

[51.35475366552382, 0.4436869656040199]

In [19]:
lat=[]
longitude=[]
#for i in range(20):
for i in range(df.shape[0]):
    address=str( df['Post town'][i] + ", "+ df['Postcode district'][i] )
    result = get_latlon(address)
    lat.append(result[0])
    longitude.append(result[1])

In [20]:
df.head()

Unnamed: 0,Postcode district,Post town,Coverage,Local authority area
0,ME1,ROCHESTER,"Rochester, Borstal, Burham, Wouldham","Medway, Tonbridge and Malling"
1,ME2,ROCHESTER,"Strood, Halling, Cuxton, Frindsbury",Medway
2,ME3,ROCHESTER,"Rural, Hoo Peninsula, Higham","Medway, Gravesham"
3,ME4,CHATHAM,"Chatham, Brompton, Luton, St. Mary's Island",Medway
4,ME5,CHATHAM,"Walderslade, Blue Bell Hill, Lordswood, Luton","Medway, Tonbridge and Malling & Maidstone"


In [21]:
df['Latitude'] = lat

In [22]:
df['Longitude'] = longitude

In [23]:
df

Unnamed: 0,Postcode district,Post town,Coverage,Local authority area,Latitude,Longitude
0,ME1,ROCHESTER,"Rochester, Borstal, Burham, Wouldham","Medway, Tonbridge and Malling",51.38916,0.50608
1,ME2,ROCHESTER,"Strood, Halling, Cuxton, Frindsbury",Medway,51.354754,0.443687
2,ME3,ROCHESTER,"Rural, Hoo Peninsula, Higham","Medway, Gravesham",51.427247,0.556319
3,ME4,CHATHAM,"Chatham, Brompton, Luton, St. Mary's Island",Medway,51.38048,0.521
4,ME5,CHATHAM,"Walderslade, Blue Bell Hill, Lordswood, Luton","Medway, Tonbridge and Malling & Maidstone",51.378091,0.53719
5,ME6,SNODLAND,Snodland,Tonbridge and Malling,51.32977,0.44844
6,ME7,GILLINGHAM,"Gillingham, Brompton, Hempstead, Bredhurst","Medway, Maidstone",51.390518,0.560961
7,ME8,GILLINGHAM,"Rainham, Parkwood, Twydall, Hempstead, Wigmore",Medway,51.368669,0.628575
8,ME9,SITTINGBOURNE,"Newington, Teynham, Iwade and Rural",Swale,51.34084,0.73791
9,ME10,SITTINGBOURNE,"Sittingbourne, Kemsley, Milton Regis",Swale,51.34175,0.73504


<a id='item3'></a>

In [24]:
df.head()

Unnamed: 0,Postcode district,Post town,Coverage,Local authority area,Latitude,Longitude
0,ME1,ROCHESTER,"Rochester, Borstal, Burham, Wouldham","Medway, Tonbridge and Malling",51.38916,0.50608
1,ME2,ROCHESTER,"Strood, Halling, Cuxton, Frindsbury",Medway,51.354754,0.443687
2,ME3,ROCHESTER,"Rural, Hoo Peninsula, Higham","Medway, Gravesham",51.427247,0.556319
3,ME4,CHATHAM,"Chatham, Brompton, Luton, St. Mary's Island",Medway,51.38048,0.521
4,ME5,CHATHAM,"Walderslade, Blue Bell Hill, Lordswood, Luton","Medway, Tonbridge and Malling & Maidstone",51.378091,0.53719


# Part 3. lets explore the Medway and nearby towns

In [25]:
address = 'Medway, Kent, United Kingdom'

geolocator = Nominatim(user_agent="on_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Medway are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Medway are 51.2011154, 0.3053652.


In [26]:
# create map of Medway  Postal Code locations using latitude and longitude values
map_medway = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Post town'], df['Coverage']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_medway)  
    
map_medway

In [27]:
# lets see all Town in Medway
df['Post town'].unique()
dt=df

In [28]:
dt = df_temp[df_temp['Local authority area'] .str.match('Medway')].reset_index(drop=True)
print(dt.shape)
dt



(7, 6)


Unnamed: 0,Postcode district,Post town,Coverage,Local authority area,Latitude,Longitude
0,ME1,ROCHESTER,"Rochester, Borstal, Burham, Wouldham","Medway, Tonbridge and Malling",51.38916,0.50608
1,ME2,ROCHESTER,"Strood, Halling, Cuxton, Frindsbury",Medway,51.354754,0.443687
2,ME3,ROCHESTER,"Rural, Hoo Peninsula, Higham","Medway, Gravesham",51.427247,0.556319
3,ME4,CHATHAM,"Chatham, Brompton, Luton, St. Mary's Island",Medway,51.38048,0.521
4,ME5,CHATHAM,"Walderslade, Blue Bell Hill, Lordswood, Luton","Medway, Tonbridge and Malling & Maidstone",51.378091,0.53719
5,ME7,GILLINGHAM,"Gillingham, Brompton, Hempstead, Bredhurst","Medway, Maidstone",51.390518,0.560961
6,ME8,GILLINGHAM,"Rainham, Parkwood, Twydall, Hempstead, Wigmore",Medway,51.368669,0.628575


In [29]:
dt=df

<a id='item4'></a>

# Part 4. Exploratory Data Analysis: 
Get nearby Venues in each town to establish similarity

### Define four square credentials and Version

In [30]:
# @hidden_cell
CLIENT_ID = 'SQGBVPHDHHUJYZWJFATZAZYID0EUVEUKHFPHE310HVW0GDWW' # your Foursquare ID
CLIENT_SECRET = 'CD1YWEV23WCIQYDN3L20L5V2ZSCMBTXBO5GEZ3ANYNTWIF0Q' # your Foursquare Secret
VERSION = '20180605'


#### Let's create a function to repeat the same process to all the towns in Medway

In [31]:
LIMIT = 100 # limit number of venues 
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Town', 
                  'Town Latitude', 
                  'Town Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *dt_venues*.

In [32]:
# Get venues for each town and create info in new data frame dt_venues

dt_venues = getNearbyVenues(names=dt['Post town'],
                                   latitudes=dt['Latitude'],
                                   longitudes=dt['Longitude']
                                  )



ROCHESTER
ROCHESTER
ROCHESTER
CHATHAM
CHATHAM
SNODLAND
GILLINGHAM
GILLINGHAM
SITTINGBOURNE
SITTINGBOURNE
QUEENBOROUGH
SHEERNESS
FAVERSHAM
MAIDSTONE
MAIDSTONE
MAIDSTONE
MAIDSTONE
MAIDSTONE
WEST MALLING
AYLESFORD


In [33]:
print(dt_venues.shape)
dt_venues.head()

(225, 7)


Unnamed: 0,Town,Town Latitude,Town Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ROCHESTER,51.38916,0.50608,Rochester Castle,51.389378,0.501434,Castle
1,ROCHESTER,51.38916,0.50608,Rochester Cathedral,51.389151,0.502962,Church
2,ROCHESTER,51.38916,0.50608,The Eagle Tavern,51.387821,0.505605,Pub
3,ROCHESTER,51.38916,0.50608,The Coopers Arms,51.387915,0.50172,Pub
4,ROCHESTER,51.38916,0.50608,Don Vincenzo,51.388017,0.505315,Italian Restaurant


Let's check how many venues were returned for each Town

In [61]:
df_venues_per_town  = dt_venues.groupby('Town').count()['Venue'].to_frame()
df_venues_per_town.sort_values('Venue', ascending=0)


Unnamed: 0_level_0,Venue
Town,Unnamed: 1_level_1
SITTINGBOURNE,62
MAIDSTONE,48
ROCHESTER,46
CHATHAM,28
FAVERSHAM,18
GILLINGHAM,7
SNODLAND,5
AYLESFORD,4
QUEENBOROUGH,4
WEST MALLING,2


#### Let's find out how many unique categories can be curated from all the returned venues

In [62]:
print('There are {} uniques categories.'.format(len(dt_venues['Venue Category'].unique())))

There are 70 uniques categories.


In [65]:
dt_venues['Venue Category'].unique()

array(['Castle', 'Church', 'Pub', 'Italian Restaurant', 'Coffee Shop',
       'Café', 'Thai Restaurant', 'Pizza Place', 'Creperie',
       'History Museum', 'Bookstore', 'Train Station',
       'Chinese Restaurant', 'Tea Room', 'Platform', 'Sandwich Place',
       'Tourist Information Center', 'Hotel', 'Breakfast Spot',
       'Harbor / Marina', 'Fried Chicken Joint', 'Park', 'Bar',
       'Restaurant', 'Locksmith', 'Construction & Landscaping',
       'Clothing Store', 'Discount Store', 'Theater', 'Bakery',
       'Fast Food Restaurant', 'Department Store', 'Warehouse Store',
       'Supermarket', 'Bus Station', 'Bus Stop', 'Athletics & Sports',
       'Grocery Store', 'Furniture / Home Store', 'Playground',
       'Boat or Ferry', 'Food Service', 'Video Game Store', 'Pharmacy',
       'Stationery Store', 'Electronics Store', 'Cosmetics Shop',
       'Shopping Mall', 'Gym Pool', 'Gym', 'Indian Restaurant',
       'Pet Store', 'Fish & Chips Shop', 'Market', 'Pool',
       'Kebab Restau

In [66]:
df_gill = dt_venues[dt_venues['Town'] == 'CHATHAM']
df_gill

Unnamed: 0,Town,Town Latitude,Town Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
46,CHATHAM,51.38048,0.521,Spoon World Buffet,51.383694,0.521343,Chinese Restaurant
47,CHATHAM,51.38048,0.521,Primark,51.382261,0.525843,Clothing Store
48,CHATHAM,51.38048,0.521,Costa Coffee,51.381382,0.527545,Coffee Shop
49,CHATHAM,51.38048,0.521,Sun Pier House,51.384422,0.520742,Tea Room
50,CHATHAM,51.38048,0.521,Starbucks,51.380386,0.521085,Coffee Shop
51,CHATHAM,51.38048,0.521,Waterstones,51.382579,0.525236,Bookstore
52,CHATHAM,51.38048,0.521,Tap 'n' Tin,51.381859,0.52226,Bar
53,CHATHAM,51.38048,0.521,TK Maxx,51.382951,0.524378,Clothing Store
54,CHATHAM,51.38048,0.521,Prince of Wales,51.38292,0.523707,Pub
55,CHATHAM,51.38048,0.521,B&M Store,51.384386,0.521586,Discount Store


In [67]:
dt_venues.groupby('Town').

SyntaxError: invalid syntax (<ipython-input-67-9b9cb11af4a4>, line 1)

## Analyze Each Town

In [68]:
# one hot encoding
dt_onehot = pd.get_dummies(dt_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dt_onehot['Town'] = dt_venues['Town'] 

# move neighborhood column to the first column
fixed_columns = [dt_onehot.columns[-1]] + list(dt_onehot.columns[:-1])
dt_onehot = dt_onehot[fixed_columns]

dt_onehot.head()

Unnamed: 0,Town,Asian Restaurant,Athletics & Sports,Bakery,Bar,Bistro,Boat or Ferry,Bookstore,Breakfast Spot,Burger Joint,Bus Station,Bus Stop,Café,Castle,Chinese Restaurant,Church,Clothing Store,Coffee Shop,Construction & Landscaping,Cosmetics Shop,Creperie,Department Store,Discount Store,Electronics Store,English Restaurant,Fast Food Restaurant,Fish & Chips Shop,Food Service,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Golf Course,Grocery Store,Gym,Gym Pool,Harbor / Marina,History Museum,Hotel,Indian Restaurant,Italian Restaurant,Kebab Restaurant,Locksmith,Market,Music Store,Nightclub,Optical Shop,Park,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Pool,Portuguese Restaurant,Pub,Restaurant,Sandwich Place,Seafood Restaurant,Shopping Mall,Shopping Plaza,Stationery Store,Steakhouse,Supermarket,Tea Room,Thai Restaurant,Theater,Tourist Information Center,Train Station,Video Game Store,Warehouse Store,Women's Store
0,ROCHESTER,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,ROCHESTER,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,ROCHESTER,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,ROCHESTER,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,ROCHESTER,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [69]:
dt_onehot.shape

(225, 71)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [70]:
dt_grouped = dt_onehot.groupby('Town').mean().reset_index()
dt_grouped

Unnamed: 0,Town,Asian Restaurant,Athletics & Sports,Bakery,Bar,Bistro,Boat or Ferry,Bookstore,Breakfast Spot,Burger Joint,Bus Station,Bus Stop,Café,Castle,Chinese Restaurant,Church,Clothing Store,Coffee Shop,Construction & Landscaping,Cosmetics Shop,Creperie,Department Store,Discount Store,Electronics Store,English Restaurant,Fast Food Restaurant,Fish & Chips Shop,Food Service,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Golf Course,Grocery Store,Gym,Gym Pool,Harbor / Marina,History Museum,Hotel,Indian Restaurant,Italian Restaurant,Kebab Restaurant,Locksmith,Market,Music Store,Nightclub,Optical Shop,Park,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Pool,Portuguese Restaurant,Pub,Restaurant,Sandwich Place,Seafood Restaurant,Shopping Mall,Shopping Plaza,Stationery Store,Steakhouse,Supermarket,Tea Room,Thai Restaurant,Theater,Tourist Information Center,Train Station,Video Game Store,Warehouse Store,Women's Store
0,AYLESFORD,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0
1,CHATHAM,0.0,0.035714,0.035714,0.035714,0.0,0.0,0.035714,0.0,0.0,0.071429,0.035714,0.0,0.0,0.071429,0.0,0.142857,0.071429,0.0,0.0,0.0,0.035714,0.071429,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0
2,FAVERSHAM,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.222222,0.0,0.055556,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0
3,GILLINGHAM,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,MAIDSTONE,0.020833,0.0,0.0,0.0,0.020833,0.0,0.020833,0.0,0.020833,0.020833,0.0,0.041667,0.0,0.0,0.0,0.041667,0.083333,0.020833,0.0,0.0,0.041667,0.0,0.0,0.020833,0.020833,0.0,0.0,0.020833,0.0,0.0,0.020833,0.041667,0.0,0.0,0.0,0.020833,0.020833,0.0,0.041667,0.0,0.0,0.0,0.020833,0.020833,0.020833,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.020833,0.083333,0.020833,0.020833,0.0,0.020833,0.020833,0.020833,0.020833,0.0625,0.0,0.0,0.020833,0.0,0.0,0.020833,0.0,0.020833
5,QUEENBOROUGH,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
6,ROCHESTER,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.043478,0.0,0.0,0.0,0.065217,0.021739,0.021739,0.021739,0.0,0.043478,0.043478,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.021739,0.043478,0.043478,0.0,0.086957,0.0,0.021739,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.043478,0.0,0.0,0.0,0.173913,0.021739,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.021739,0.0,0.021739,0.043478,0.0,0.0,0.0
7,SHEERNESS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,SITTINGBOURNE,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.032258,0.064516,0.0,0.032258,0.0,0.0,0.0,0.032258,0.0,0.080645,0.0,0.0,0.0,0.0,0.064516,0.0,0.064516,0.032258,0.032258,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.064516,0.016129,0.0,0.0,0.0,0.0,0.129032,0.0,0.064516,0.0,0.032258,0.0,0.032258,0.0,0.080645,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0
9,SNODLAND,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0


#### Let's confirm the new size

In [71]:
dt_grouped.shape

(11, 71)

#### Let's print each Town along with the top 5 most common venues

In [73]:
num_top_venues = 7

for hood in dt_grouped['Town']:
    print("----"+hood+"----")
    temp = dt_grouped[dt_grouped['Town'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----AYLESFORD----
                    venue  freq
0           Train Station  0.50
1       Fish & Chips Shop  0.25
2  Furniture / Home Store  0.25
3        Asian Restaurant  0.00
4            Optical Shop  0.00
5                Platform  0.00
6             Pizza Place  0.00


----CHATHAM----
                venue  freq
0      Clothing Store  0.14
1  Chinese Restaurant  0.07
2      Sandwich Place  0.07
3      Discount Store  0.07
4         Coffee Shop  0.07
5         Bus Station  0.07
6                 Pub  0.07


----FAVERSHAM----
           venue  freq
0            Pub  0.33
1       Platform  0.22
2            Bar  0.11
3  Grocery Store  0.06
4           Park  0.06
5         Market  0.06
6           Pool  0.06


----GILLINGHAM----
           venue  freq
0            Pub  0.29
1           Park  0.14
2     Playground  0.14
3      Locksmith  0.14
4  Boat or Ferry  0.14
5   Food Service  0.14
6       Platform  0.00


----MAIDSTONE----
                venue  freq
0                 Pub  0.08

First, let's write a function to sort the venues in descending order.

In [74]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [75]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Town']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Town'] = dt_grouped['Town']

for ind in np.arange(dt_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dt_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AYLESFORD,Train Station,Fish & Chips Shop,Furniture / Home Store,Fast Food Restaurant,Cosmetics Shop,Creperie,Department Store,Discount Store,Electronics Store,English Restaurant
1,CHATHAM,Clothing Store,Chinese Restaurant,Sandwich Place,Discount Store,Coffee Shop,Pub,Bus Station,Tea Room,Bus Stop,Athletics & Sports
2,FAVERSHAM,Pub,Platform,Bar,Grocery Store,Market,Park,Pool,Kebab Restaurant,Train Station,Fish & Chips Shop
3,GILLINGHAM,Pub,Playground,Boat or Ferry,Food Service,Park,Locksmith,Electronics Store,Cosmetics Shop,Creperie,Department Store
4,MAIDSTONE,Coffee Shop,Pub,Supermarket,Pizza Place,Italian Restaurant,Café,Clothing Store,Department Store,Grocery Store,Women's Store


<a id='item5'></a>

# Part 5. Clustering Towns

Run *k*-means to cluster the Towns into 6 clusters.

In [85]:
# set number of clusters
kclusters = 5

dt_grouped_clustering = dt_grouped.drop('Town', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 2, 2, 0, 1, 0, 4, 0, 1], dtype=int32)

In [86]:
dt_grouped_clustering

Unnamed: 0,Asian Restaurant,Athletics & Sports,Bakery,Bar,Bistro,Boat or Ferry,Bookstore,Breakfast Spot,Burger Joint,Bus Station,Bus Stop,Café,Castle,Chinese Restaurant,Church,Clothing Store,Coffee Shop,Construction & Landscaping,Cosmetics Shop,Creperie,Department Store,Discount Store,Electronics Store,English Restaurant,Fast Food Restaurant,Fish & Chips Shop,Food Service,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Golf Course,Grocery Store,Gym,Gym Pool,Harbor / Marina,History Museum,Hotel,Indian Restaurant,Italian Restaurant,Kebab Restaurant,Locksmith,Market,Music Store,Nightclub,Optical Shop,Park,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Pool,Portuguese Restaurant,Pub,Restaurant,Sandwich Place,Seafood Restaurant,Shopping Mall,Shopping Plaza,Stationery Store,Steakhouse,Supermarket,Tea Room,Thai Restaurant,Theater,Tourist Information Center,Train Station,Video Game Store,Warehouse Store,Women's Store
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0
1,0.0,0.035714,0.035714,0.035714,0.0,0.0,0.035714,0.0,0.0,0.071429,0.035714,0.0,0.0,0.071429,0.0,0.142857,0.071429,0.0,0.0,0.0,0.035714,0.071429,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0
2,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.222222,0.0,0.055556,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.020833,0.0,0.0,0.0,0.020833,0.0,0.020833,0.0,0.020833,0.020833,0.0,0.041667,0.0,0.0,0.0,0.041667,0.083333,0.020833,0.0,0.0,0.041667,0.0,0.0,0.020833,0.020833,0.0,0.0,0.020833,0.0,0.0,0.020833,0.041667,0.0,0.0,0.0,0.020833,0.020833,0.0,0.041667,0.0,0.0,0.0,0.020833,0.020833,0.020833,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.020833,0.083333,0.020833,0.020833,0.0,0.020833,0.020833,0.020833,0.020833,0.0625,0.0,0.0,0.020833,0.0,0.0,0.020833,0.0,0.020833
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
6,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.043478,0.0,0.0,0.0,0.065217,0.021739,0.021739,0.021739,0.0,0.043478,0.043478,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.021739,0.043478,0.043478,0.0,0.086957,0.0,0.021739,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.043478,0.0,0.0,0.0,0.173913,0.021739,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.021739,0.0,0.021739,0.043478,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.032258,0.064516,0.0,0.032258,0.0,0.0,0.0,0.032258,0.0,0.080645,0.0,0.0,0.0,0.0,0.064516,0.0,0.064516,0.032258,0.032258,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.064516,0.016129,0.0,0.0,0.0,0.0,0.129032,0.0,0.064516,0.0,0.032258,0.0,0.032258,0.0,0.080645,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0


In [87]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Clusters Labelss', kmeans.labels_)

dt_merged = dt

# merge medway_grouped with medway_data to add latitude/longitude for each neighborhood
dt_merged = dt_merged.join(neighborhoods_venues_sorted.set_index('Town'), on='Post town')

dt_merged.head() # check the last columns!

Unnamed: 0,Postcode district,Post town,Coverage,Local authority area,Latitude,Longitude,Clusters Labelss,Clusters Labels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ME1,ROCHESTER,"Rochester, Borstal, Burham, Wouldham","Medway, Tonbridge and Malling",51.38916,0.50608,0,4,0,Pub,Italian Restaurant,Café,Construction & Landscaping,Tea Room,Breakfast Spot,Coffee Shop,Platform,Hotel,Train Station
1,ME2,ROCHESTER,"Strood, Halling, Cuxton, Frindsbury",Medway,51.354754,0.443687,0,4,0,Pub,Italian Restaurant,Café,Construction & Landscaping,Tea Room,Breakfast Spot,Coffee Shop,Platform,Hotel,Train Station
2,ME3,ROCHESTER,"Rural, Hoo Peninsula, Higham","Medway, Gravesham",51.427247,0.556319,0,4,0,Pub,Italian Restaurant,Café,Construction & Landscaping,Tea Room,Breakfast Spot,Coffee Shop,Platform,Hotel,Train Station
3,ME4,CHATHAM,"Chatham, Brompton, Luton, St. Mary's Island",Medway,51.38048,0.521,0,4,0,Clothing Store,Chinese Restaurant,Sandwich Place,Discount Store,Coffee Shop,Pub,Bus Station,Tea Room,Bus Stop,Athletics & Sports
4,ME5,CHATHAM,"Walderslade, Blue Bell Hill, Lordswood, Luton","Medway, Tonbridge and Malling & Maidstone",51.378091,0.53719,0,4,0,Clothing Store,Chinese Restaurant,Sandwich Place,Discount Store,Coffee Shop,Pub,Bus Station,Tea Room,Bus Stop,Athletics & Sports


Finally, let's visualize the resulting clusters

In [84]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dt_merged['Latitude'], dt_merged['Longitude'], dt_merged['Post town'], dt_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [210]:
dt_merged['Cluster Labels'].unique()

array([3, 4, 1, 0, 2, 5], dtype=int32)

<a id='item6'></a>

# Part 6. Examine Clusters

As we can see above from above map, Cluster 3 with Rochester, Chatham, Maidstone, towns is quite  popular and similar in terms of their venue characteristics. 

Since we are interested in finding the prospects of opening a restaurant, lets find out the number of existing similar business in our cluster

In [106]:
df_food = dt_venues[dt_venues['Venue Category'] .str.contains('Food|Restaurant|Pizza|Sandwich')]
df_food.groupby('Town').count()

Unnamed: 0_level_0,Town Latitude,Town Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Town,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
CHATHAM,6,6,6,6,6,6
FAVERSHAM,1,1,1,1,1,1
GILLINGHAM,1,1,1,1,1,1
MAIDSTONE,11,11,11,11,11,11
ROCHESTER,9,9,9,9,9,9
SITTINGBOURNE,12,12,12,12,12,12
WEST MALLING,1,1,1,1,1,1


In [114]:
df_food = dt_venues[dt_venues['Venue Category'] .str.contains('Restaurant|Pizza|Sandwich|Cafe|Coffee|Pub|Hotel|Food|Sandwich')]
df_food_1 = df_food.groupby('Town').count()['Venue'].to_frame()
df_food_1.sort_values('Venue', ascending=0)

Unnamed: 0_level_0,Venue
Town,Unnamed: 1_level_1
SITTINGBOURNE,24
ROCHESTER,21
MAIDSTONE,20
CHATHAM,10
FAVERSHAM,7
GILLINGHAM,3
SHEERNESS,1
WEST MALLING,1


Rochester and Sittingbourne have highest Food Businesses followed by Maidstone.

Now lets compare other venues between Rochester and Sittingbourne

In [103]:
np_rochester = dt_venues[dt_venues['Town'] == 'ROCHESTER']['Venue Category'].unique()
df_rochester = pd.DataFrame(data=np_rochester, columns=['Rochester'])
df_rochester

Unnamed: 0,Rochester
0,Castle
1,Church
2,Pub
3,Italian Restaurant
4,Coffee Shop
5,Café
6,Thai Restaurant
7,Pizza Place
8,Creperie
9,History Museum


In [104]:
np_sittingbourne = dt_venues[dt_venues['Town'] == 'SITTINGBOURNE']['Venue Category'].unique()
df_sittingbourne = pd.DataFrame(data=np_sittingbourne, columns=['Sittingbourne'])
df_sittingbourne

Unnamed: 0,Sittingbourne
0,Coffee Shop
1,Pub
2,Video Game Store
3,Pharmacy
4,Clothing Store
5,Bakery
6,Grocery Store
7,Supermarket
8,Sandwich Place
9,Furniture / Home Store


In [270]:
dt_venues[dt_venues['Town'] == 'SITTINGBOURNE']['Venue Category'].unique()

array(['Coffee Shop', 'Pub', 'Video Game Store', 'Pharmacy',
       'Clothing Store', 'Bakery', 'Grocery Store', 'Supermarket',
       'Sandwich Place', 'Furniture / Home Store', 'Stationery Store',
       'Electronics Store', 'Fast Food Restaurant', 'Shopping Mall',
       'Gym Pool', 'Café', 'Gym', 'Indian Restaurant',
       'Chinese Restaurant', 'Pet Store', 'Pizza Place'], dtype=object)

As seen above, Rochester has varied characteristic such as History Museum, Castle and Church along with the train station and hence it is **highly likely** to attract more visitors. 

So let us select ***Rochester*** as our destination spot for opening the resturant

let us explore current Restaurant types already established in Rochester. This will give us an idea of variety of food taste existing in the local population as well

In [278]:
df_food = df_food.sort_values(['Town'])
df_food.reset_index(drop=True)
df_food_rochester = df_food[df_food['Town']=='ROCHESTER']
df_food_rochester.drop(columns=['Town Latitude', 'Town Longitude'], inplace=True)
df_food_rochester.reset_index(drop=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Unnamed: 0,Town,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ROCHESTER,Don Vincenzo,51.388017,0.505315,Italian Restaurant
1,ROCHESTER,Simply Italian,51.389762,0.503775,Italian Restaurant
2,ROCHESTER,The Garden House Cafe,51.388314,0.504903,Sandwich Place
3,ROCHESTER,Don Vincenzo,51.388017,0.505315,Italian Restaurant
4,ROCHESTER,PizzaExpress,51.390688,0.503061,Pizza Place
5,ROCHESTER,Olive E Capperi,51.391203,0.50255,Italian Restaurant
6,ROCHESTER,Thai Four Two,51.385634,0.510519,Thai Restaurant
7,ROCHESTER,Simply Italian,51.389762,0.503775,Italian Restaurant
8,ROCHESTER,Delizie Di Mamma Mia,51.389219,0.504963,Italian Restaurant
9,ROCHESTER,The Garden House Cafe,51.388314,0.504903,Sandwich Place


We can already see that Rochester already have thriving Italian, Chinese and Thai restaurant businesses

<a id='item7'></a>

# Part 7. Conclusion

In order to start a restaurant business in the North Central Kent area (which includes borough of Maidstone, Swale, Tonbridge, Gravesham and Medway Towns, **Rochester** is highly recommended for following reasons:
1. It has highest number of similar venues and characteristics with respect to other postal towns
2. It has nearby train station for easy commute
3. It includes varied types of venues such as Cathedral, Castle and Museum and hence likely to attract more visitors providing better business prospects
