<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Overland Park</font></h1>

## Introduction

In this notebook, I convert Overland Park neighborhood names into their corresponding latitude and longitude values. I then use the Foursquare API's **explore** endpoint to find the most common venue categories in each neighborhood, and use those category frequencies as features to group the neighborhoods into clusters with the *k*-means clustering algorithm. Finally, I use the Folium library to visualize the neighborhoods in Overland Park and their emerging clusters.

## Table of Contents

<div  style="margin-top: 20px">

<font size = 3>

1. Setup Dataset

2. Explore Neighborhoods in Overland Park

3. Analyze Each Neighborhood

4. Cluster Neighborhoods

5. Examine Clusters   
</font>
</div>

## 1. Setup Dataset

To explore neighborhoods in Overland Park, we need a dataset that contains every neighborhood name together with its latitude and longitude coordinates. Since these data are not readily available, I decided to scrape the neighborhood names from the Internet and then look up the coordinates of each one.

First, import all the necessary libraries.

In [15]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older versions: from pandas.io.json import json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library
from bs4 import BeautifulSoup # HTML parsing for the neighborhood list
import csv
print('Libraries imported.')

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.



I found a website with a comprehensive list of neighborhoods in Overland Park. I use the BeautifulSoup package to parse the page contents and extract the list, as shown below.


In [6]:
url='https://nextdoor.com/city/overland-park--ks/'
source=requests.get(url).text
soup=BeautifulSoup(source,'lxml')
print(soup.prettify()) 

#Print out the whole website text in html format!!

<!DOCTYPE html>
<html>
 <head>
  <meta charset="utf-8"/>
  <meta content="#19975d" name="theme-color"/>
  <meta content="68022CCBA99514E28D9C825C3B4D4BA6" name="msvalidate.01"/>
  <meta content="nextdoor.com" property="og:site_name"/>
  <meta content="114611681929998" property="fb:app_id"/>
  <meta content="website" property="og:type"/>
  <meta content="345600" property="og:ttl"/>
  <meta content="https://d19rpgkrjeba2z.cloudfront.net/3c2589717ef9ae21/static/images/fb_share_logo2.png" property="og:image"/>
  <meta content="800" property="og:image:width"/>
  <meta content="450" property="og:image:height"/>
  <meta content="The green Nextdoor house logo on a white background" property="og:image:alt"/>
  <meta content="image/png" property="og:image:type"/>
  <meta content="app-id=640360962" name="apple-itunes-app"/>
  <meta content="
    Overland Park, KS neighborhoods, events and more | Nextdoor
  " property="og:title"/>
  <meta content="
  
      
          Overland Park, KS has 252 nei

Find and verify the first neighborhood on the website

In [70]:
hditem=soup.find('div', class_='hood_group')
hdname=hditem.a.text
print(hdname)

151st/Metcalf Ave


In [20]:
#initiate a neighborhood list
nhList=[]

#### Find all neighborhoods in Overland Park and store them in a list

In [71]:
nhList.clear()
for x in soup.find_all('div', class_='hood_group'):
    for y in x.find_all('a'):
        nbhood=y.text
        nhList.append(nbhood)
#nhList
nh_count=len(nhList)
print('There are {} neighborhoods in Overland Park.'.format(nh_count))

There are 252 neighborhoods in Overland Park.
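
The extraction above relies on the page grouping its neighborhood links inside `div` elements with the class `hood_group`. A minimal offline sketch of the same pattern follows. The HTML snippet is a made-up stand-in for the real page (only the class name and the three neighborhood names come from this notebook), and it uses Python's built-in `html.parser` backend instead of `lxml` so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source; the real markup is more complex.
sample_html = """
<div class="hood_group">
  <a href="#">151st/Metcalf Ave</a>
  <a href="#">75th and Metcalf</a>
</div>
<div class="hood_group">
  <a href="#">75th and Woodson</a>
</div>
<div class="other"><a href="#">Not a neighborhood</a></div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Collect the text of every link inside each hood_group block,
# ignoring links that sit in other containers.
sample_names = [
    link.get_text(strip=True)
    for group in sample_soup.find_all("div", class_="hood_group")
    for link in group.find_all("a")
]

print(sample_names)
```

Restricting the search to `hood_group` blocks is what keeps unrelated links on the page out of the list.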


### Use geopy library to get the latitude and longitude values of Overland Park

In [72]:
address ="Overland Park, Kansas"

geolocator = Nominatim(user_agent="op_explorer") # geocoder used to look up the coordinates of Overland Park
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Overland Park are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Overland Park are 38.9742502, -94.6851702.


In [73]:
# create an empty dataframe
columns = ['Neighborhood', 'Latitude', 'Longitude']
op_data = pd.DataFrame(columns=columns)
op_data

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|


### Use the geopy library to get the latitude and longitude values of all neighborhoods in Overland Park

In [74]:
geolocator = Nominatim(user_agent="op_explorer") # create the geocoder once and reuse it
rows = []
for i in nhList:
    try:
        # look up the neighborhood name on its own
        location = geolocator.geocode(i)
        latitude = location.latitude
        longitude = location.longitude
        rows.append({"Neighborhood": i,
                     "Latitude": latitude,
                     "Longitude": longitude})
    except Exception:
        # skip names the geocoder cannot resolve (geocode returns None) or that raise a lookup error
        pass

# build the dataframe in one step; DataFrame.append was removed in pandas 2.0
op_data = pd.DataFrame(rows, columns=columns)
# print number of rows of dataframe
op_data.shape[0]


229

Out of the 252 neighborhoods in Overland Park, 229 were assigned coordinates by the geocoding query. We will work with these 229 neighborhoods.

In [75]:
op_data.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | 151st/Metcalf Ave | 38.853988 | -94.668514 |
| 1 | Access Rd | 44.027783 | -103.068434 |
| 2 | Adara | 8.366667 | 8.3 |
| 3 | Amber Meadows | 35.916468 | -84.166861 |
| 4 | Amesbury Lake | 42.857954 | -70.930092 |


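The coordinates seem off: Adara, for example, resolves to (8.366667, 8.3), which is nowhere near Overland Park (38.9742502, -94.6851702). The address needs to be refined before it is sent to the geocoder.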
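
One way to make this check systematic rather than visual is to measure how far each geocoded point lies from the city center. The sketch below is a minimal illustration using the haversine formula and the first rows of the table above. The 50 km cut-off (`MAX_KM`) and the helper name `haversine_km` are assumptions made for this example, not part of the original analysis.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km is the mean Earth radius

# Overland Park center as returned by the geocoder earlier in this notebook.
CITY_LAT, CITY_LON = 38.9742502, -94.6851702
MAX_KM = 50  # assumed cut-off for "plausibly inside the city"

# First rows of the geocoding result: (name, latitude, longitude).
candidates = [
    ("151st/Metcalf Ave", 38.853988, -94.668514),
    ("Access Rd", 44.027783, -103.068434),
    ("Adara", 8.366667, 8.3),
    ("Amber Meadows", 35.916468, -84.166861),
    ("Amesbury Lake", 42.857954, -70.930092),
]

# Keep only the points that fall within the cut-off distance.
plausible = [
    name for name, lat, lon in candidates
    if haversine_km(CITY_LAT, CITY_LON, lat, lon) <= MAX_KM
]
print(plausible)
```

Of these five rows, only the first passes the filter, which supports the conclusion that bare neighborhood names are too ambiguous for the geocoder.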

In [78]:
address ="Adara, Overland Park, Kansas"

geolocator = Nominatim(user_agent="op_explorer") # geocoder used to look up the refined address
location1 = geolocator.geocode(address)
latitude1 = location1.latitude # read from location1, not from the `location` left over from the loop
longitude1 = location1.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude1, longitude1))


Requery all neighborhoods, this time with ", Overland Park, Kansas" appended to each search string.

In [79]:
#empty op_data
op_data=op_data[0:0]

In [80]:
print(op_data.shape)

(0, 3)


In [81]:
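geolocator = Nominatim(user_agent="op_explorer") # create the geocoder once and reuse it
rows = []
for i in nhList:
    try:
        print(i)
        # qualify the name with city and state so the geocoder stays in Overland Park
        address = i + ", Overland Park, Kansas"
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
        rows.append({"Neighborhood": i,
                     "Latitude": latitude,
                     "Longitude": longitude})
    except Exception:
        # skip names the geocoder cannot resolve (geocode returns None) or that raise a lookup error
        pass

# build the dataframe in one step; DataFrame.append was removed in pandas 2.0
op_data = pd.DataFrame(rows, columns=columns)
# print number of rows of dataframe
op_data.shape[0]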

151st/Metcalf Ave
75th and Metcalf
75th and Woodson
Access Rd
Adara
Amber Meadows
Amesbury Lake
Apple Valley Estates
Arium
Arlington Estates
Arrowhead Trails
Autumn Ridge
Barrington Park
Bel-air Heights
Bentwood Park
Birchwood Hills
Birchwood Place
Blackthorne Estates
Bluestem
Blue Valley
Bridgestone
Brittany Highlands/Pointe
Brittany Park
Brookridge Estates
Broxton Square
Caenen
Camden Woods
Canterbury Estates
Carpenters Circle
Century
Chapel Hill
Cherbourg
Cherokee Hills
Cherry Hill Estates
Claremont
Clear Creek
Cobblestone Park
Coffee Creek Crossing
College Meadows
College Park Estates
College View
Colony West
Colton Lake Estates
Corbin Crossing
Country Oaks
Craigmont
Creekside
Creek Side
Crestview
Cross Creek
Crowne Chase
Deerbrook
Deer Creek
Deer Run
Deer Valley
Downtown East
Eastland Meadows
Elmhurst
Empire Estates
Estates of Ironhorse
Executive Hills
Fairfield Manor
Fairway Woods
Farmers - Matt North
Fieldstone Court
Forest Creek Estates
Forest Glen
Foxfield Estates
Gladacres So

52

In [82]:
print(op_data.shape)

(52, 3)
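
Both geocoding loops skip failed lookups silently, so it is hard to tell afterward which neighborhoods were dropped. A minimal sketch of a variant that records the failures is shown below. The helper `geocode_all` and the offline stand-in `fake_geocode` are hypothetical names introduced for illustration; in the notebook the callable would be `geolocator.geocode`, which returns `None` when nothing is found.

```python
from collections import namedtuple

# Minimal stand-in for the object a geocoder returns.
Location = namedtuple("Location", ["latitude", "longitude"])

def geocode_all(names, geocode, suffix=", Overland Park, Kansas"):
    """Geocode each name and return (rows, failed) instead of skipping silently."""
    rows, failed = [], []
    for name in names:
        try:
            location = geocode(name + suffix)
        except Exception:  # for example a timeout or a service error
            location = None
        if location is None:
            failed.append(name)
        else:
            rows.append({"Neighborhood": name,
                         "Latitude": location.latitude,
                         "Longitude": location.longitude})
    return rows, failed

# Hypothetical offline lookup table, seeded with two rows from this notebook.
KNOWN = {
    "Access Rd, Overland Park, Kansas": Location(39.013294, -94.663835),
    "Adara, Overland Park, Kansas": Location(38.88521, -94.730738),
}

def fake_geocode(address):
    return KNOWN.get(address)  # None when the address is unknown

demo_rows, demo_failed = geocode_all(["Access Rd", "Adara", "Amber Meadows"], fake_geocode)
print(len(demo_rows), demo_failed)
```

Keeping the list of failed names makes it possible to retry them later or to report exactly which neighborhoods are missing from the analysis.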


In [83]:
op_data.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Access Rd | 39.013294 | -94.663835 |
| 1 | Adara | 38.88521 | -94.730738 |
| 2 | Apple Valley Estates | 38.951122 | -94.646621 |
| 3 | Brittany Park | 38.848433 | -94.671561 |
| 4 | Caenen | 38.882212 | -94.727341 |


The coordinates now lie close to those of Overland Park, so we will proceed with the analysis using these 52 neighborhoods.

Let's visualize Overland Park with the neighborhoods in it.

In [84]:
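# create map of Overland Park, centered on the mean of the geocoded neighborhood coordinates
# (latitude and longitude were overwritten inside the geocoding loop, so they no longer hold the city center)
map_op = folium.Map(location=[op_data['Latitude'].astype(float).mean(), op_data['Longitude'].astype(float).mean()], zoom_start=11)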

# add markers to map
for lat, lng, label in zip(op_data['Latitude'], op_data['Longitude'], op_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_op)

map_op    

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [85]:
CLIENT_ID = 'BJDFH1OSS54FT5SXLG5UOC0PCFMAJWK0524344UZFI2HHY5G' # your Foursquare ID
CLIENT_SECRET = 'RUXN0AKNYKHO1WPA051GWGQ1T2XWBTSNWWPPABYTXQCUTHRO' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: BJDFH1OSS54FT5SXLG5UOC0PCFMAJWK0524344UZFI2HHY5G
CLIENT_SECRET:RUXN0AKNYKHO1WPA051GWGQ1T2XWBTSNWWPPABYTXQCUTHRO


#### Let's explore the first neighborhood in our dataframe.

In [86]:
op_data.loc[0,'Neighborhood']

'Access Rd'

Get the neighborhood's latitude and longitude values.

In [87]:
neighborhood_latitude = op_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = op_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = op_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Access Rd are 39.0132935, -94.6638354.


#### Now, let's get the top 100 venues that are in Access Rd within a radius of 500 meters.

First, let's create the GET request URL. Name your URL **url**.

In [88]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=BJDFH1OSS54FT5SXLG5UOC0PCFMAJWK0524344UZFI2HHY5G&client_secret=RUXN0AKNYKHO1WPA051GWGQ1T2XWBTSNWWPPABYTXQCUTHRO&v=20180605&ll=39.0132935,-94.6638354&radius=500&limit=100'
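
The same request URL can also be assembled from a dictionary of query parameters, which keeps each parameter visible and handles the encoding. The sketch below uses only the standard library. The credential strings are placeholders, and `build_explore_url` is a hypothetical helper name, not part of the Foursquare API.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore request URL from named query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter unescaped, as in the URL above.
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")

# Placeholder credentials; substitute real ones before sending a request.
demo_url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                             "20180605", 39.0132935, -94.6638354)
print(demo_url)
```

With the `requests` library, the same dictionary can instead be passed through the `params` argument of `requests.get`, which performs the encoding itself.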

Send the GET request and examine the results.

In [89]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cd6ff18dd57972447a42004'},
 'response': {'headerLocation': 'Overland Park',
  'headerFullLocation': 'Overland Park',
  'headerLocationGranularity': 'city',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 39.017793504500005,
    'lng': -94.65805470130026},
   'sw': {'lat': 39.0087934955, 'lng': -94.66961609869973}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e6defd7fa768e6cee452b83',
       'name': 'Downtown Mission',
       'location': {'crossStreet': 'Johnson Drive btwn Nall Ave and Lamar Ave',
        'lat': 39.01490919802102,
        'lng': -94.66237445615708,
        'labeledLatLngs': [{'label': 'display',
          'lat': 39.01490919802102,
          'lng': -94.66237445615708}],
        'distance': 219,
        'c

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [90]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [91]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Downtown Mission | Historic Site | 39.014909 | -94.662374 |
| 1 | ARC | Gym / Fitness Center | 39.013159 | -94.663269 |
| 2 | Henhouse | Grocery Store | 39.010678 | -94.667633 |
| 3 | 98.9 The Rock! | Rock Club | 39.01693 | -94.66671 |


And how many venues were returned by Foursquare?

In [92]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


## 2. Explore Neighborhoods in Overland Park



#### Let's create a function to repeat the same process for all the neighborhoods in Overland Park

In [93]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                    'Venue Category']
    
    return(nearby_venues)            
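
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` if a venue comes back without a category. A hedged sketch of a more defensive extraction step follows. `extract_venue_rows` is a hypothetical helper, and the sample payload is a hand-trimmed imitation of the response structure shown earlier (the second venue is invented), not real API output.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Turn one explore response into row tuples, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to None when the venue has no category at all.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# Imitation payload: one venue from this notebook, one invented venue without a category.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Downtown Mission",
                   "location": {"lat": 39.014909, "lng": -94.662374},
                   "categories": [{"name": "Historic Site"}]}},
        {"venue": {"name": "Uncategorized Place",
                   "location": {"lat": 39.0, "lng": -94.66},
                   "categories": []}},
    ]}]}
}

sample_rows = extract_venue_rows(sample_payload, "Access Rd", 39.013294, -94.663835)
print(sample_rows)
```

Rows with a `None` category can then be dropped or labeled before the one-hot encoding step, instead of aborting the whole download.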

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *op_venues*.

In [94]:
op_venues = getNearbyVenues(names=op_data['Neighborhood'],
                                   latitudes=op_data['Latitude'],
                                   longitudes=op_data['Longitude']
                                  )

Access Rd
Adara
Apple Valley Estates
Brittany Park
Caenen
Century
Cherry Hill Estates
Cobblestone Park
Creekside
Crestview
Cross Creek
Elmhurst
Empire Estates
Fairway Woods
Glenwood St
Hawthorne
Highland Village
Indian Creek Village
Maple Hill
Marbella
Melrose Reserve
Metcalf 56
Mills Farm
Mission
Nall Hills
North Park
Oakshire
Outlook
Parkway 103
Pawnee
Pinehurst
Pointe Royal
Prairie View
Rosana Square
Southwoods
St Andrews
St. Andrews
Switzer Rd
Tall Grass Creek
Terrace
The Orchards
The Ridge
The Village
The Wilderness
Town Center
W 142nd St
W 83rd St
Walmer
Wellington Park
Woods Of Cherry Creek
Woodstock
Young's Park


#### Let's check the size of the resulting dataframe

In [95]:
print(op_venues.shape)
op_venues.head()

(449, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Access Rd | 39.013294 | -94.663835 | Downtown Mission | 39.014909 | -94.662374 | Historic Site |
| 1 | Access Rd | 39.013294 | -94.663835 | ARC | 39.013159 | -94.663269 | Gym / Fitness Center |
| 2 | Access Rd | 39.013294 | -94.663835 | Henhouse | 39.010678 | -94.667633 | Grocery Store |
| 3 | Access Rd | 39.013294 | -94.663835 | 98.9 The Rock! | 39.01693 | -94.66671 | Rock Club |
| 4 | Adara | 38.88521 | -94.730738 | Mai Thai | 38.884119 | -94.726609 | Thai Restaurant |


Let's check how many venues were returned for each neighborhood

In [96]:
op_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Access Rd | 4 | 4 | 4 | 4 | 4 | 4 |
| Adara | 7 | 7 | 7 | 7 | 7 | 7 |
| Apple Valley Estates | 6 | 6 | 6 | 6 | 6 | 6 |
| Brittany Park | 4 | 4 | 4 | 4 | 4 | 4 |
| Caenen | 16 | 16 | 16 | 16 | 16 | 16 |
| Century | 7 | 7 | 7 | 7 | 7 | 7 |
| Cherry Hill Estates | 1 | 1 | 1 | 1 | 1 | 1 |
| Cobblestone Park | 3 | 3 | 3 | 3 | 3 | 3 |
| Creekside | 19 | 19 | 19 | 19 | 19 | 19 |
| Crestview | 8 | 8 | 8 | 8 | 8 | 8 |


#### Let's find out how many unique categories can be curated from all the returned venues

In [98]:
print('There are {} unique categories.'.format(len(op_venues['Venue Category'].unique())))

There are 134 unique categories.


## 3. Analyze Each Neighborhood



In [99]:
# one hot encoding
op_onehot = pd.get_dummies(op_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
op_onehot['Neighborhood'] = op_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [op_onehot.columns[-1]] + list(op_onehot.columns[:-1])
op_onehot = op_onehot[fixed_columns]

op_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bar,Basketball Court,Big Box Store,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Building,Bus Stop,Business Service,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Club House,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Dog Run,Donut Shop,Dry Cleaner,Electronics Store,Fabric Shop,Fast Food Restaurant,Food,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gift Shop,Golf Course,Golf Driving Range,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,Historic Site,Hobby Shop,Home Service,Hotel,IT Services,Ice Cream Shop,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Market,Massage Studio,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Multiplex,Music Store,Nightclub,Office,Optical Shop,Other Repair Shop,Outdoor Supply Store,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pool,Pool Hall,Pub,Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Sausage Shop,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Ski Lodge,Smoke Shop,Smoothie Shop,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Tennis Court,Thai Restaurant,Toy / Game Store,Trail,Video Game Store,Video Store,Weight Loss Center,Wine Bar,Wings Joint,Women's Store
0,Access Rd,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Access Rd,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Access Rd,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Access Rd,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Adara,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [100]:
op_onehot.shape

(449, 135)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [101]:
op_grouped = op_onehot.groupby('Neighborhood').mean().reset_index()
op_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bar,Basketball Court,Big Box Store,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Building,Bus Stop,Business Service,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Club House,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Dog Run,Donut Shop,Dry Cleaner,Electronics Store,Fabric Shop,Fast Food Restaurant,Food,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gift Shop,Golf Course,Golf Driving Range,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,Historic Site,Hobby Shop,Home Service,Hotel,IT Services,Ice Cream Shop,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Market,Massage Studio,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Multiplex,Music Store,Nightclub,Office,Optical Shop,Other Repair Shop,Outdoor Supply Store,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pool,Pool Hall,Pub,Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Sausage Shop,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Ski Lodge,Smoke Shop,Smoothie Shop,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Tennis Court,Thai Restaurant,Toy / Game Store,Trail,Video Game Store,Video Store,Weight Loss Center,Wine Bar,Wings Joint,Women's Store
0,Access Rd,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Adara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Apple Valley Estates,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Brittany Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Caenen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0625,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0
5,Century,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857
6,Cherry Hill Estates,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Cobblestone Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Creekside,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Crestview,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [102]:
op_grouped.shape


(51, 135)
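
To see what the one-hot encoding and the group mean produce, the two steps can be traced on a toy frame. The sketch below reuses the four Access Rd venues and one Adara venue listed earlier; it is a simplified illustration with its own variable names, not the full dataset.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Neighborhood": ["Access Rd"] * 4 + ["Adara"],
    "Venue Category": ["Historic Site", "Gym / Fitness Center",
                       "Grocery Store", "Rock Club", "Thai Restaurant"],
})

# One indicator column per category (dtype=int gives 0/1 instead of booleans).
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]],
                            prefix="", prefix_sep="", dtype=int)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of the 0/1 indicators per neighborhood is the share of
# that neighborhood's venues falling in each category.
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Each Access Rd category gets 0.25 because the neighborhood has four venues in four different categories, which matches the Access Rd row of `op_grouped`.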

In [145]:
# drop the 33rd row (positional index 32) since it seems to cause trouble
op_grouped=op_grouped.drop(op_grouped.index[32])
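
Note that `op_grouped.index[32]` selects by position, so the call removes the 33rd row and leaves a gap in the index labels. A small sketch on a toy frame (made-up values, hypothetical names) shows the effect of dropping by position and then renumbering the index:

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Park": [0.0, 0.5, 1.0, 0.25],
})

# index[2] is the label of the third row; drop() removes the row with that label.
toy_dropped = toy.drop(toy.index[2])
labels_after_drop = list(toy_dropped.index)  # the label 2 is now missing

# reset_index(drop=True) renumbers the rows and discards the old labels.
toy_clean = toy_dropped.reset_index(drop=True)
print(labels_after_drop, list(toy_clean.index))
```

Renumbering matters whenever later steps align rows by position, for example when cluster labels are attached to the grouped frame.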

In [148]:
op_grouped=op_grouped.reset_index(drop=True)
op_grouped


Unnamed: 0,Neighborhood,ATM,Accessories Store,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bar,Basketball Court,Big Box Store,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Building,Bus Stop,Business Service,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Club House,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Dog Run,Donut Shop,Dry Cleaner,Electronics Store,Fabric Shop,Fast Food Restaurant,Food,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gift Shop,Golf Course,Golf Driving Range,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,Historic Site,Hobby Shop,Home Service,Hotel,IT Services,Ice Cream Shop,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Market,Massage Studio,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Multiplex,Music Store,Nightclub,Office,Optical Shop,Other Repair Shop,Outdoor Supply Store,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pool,Pool Hall,Pub,Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Sausage Shop,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Ski Lodge,Smoke Shop,Smoothie Shop,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Tennis Court,Thai Restaurant,Toy / Game Store,Trail,Video Game Store,Video Store,Weight Loss Center,Wine Bar,Wings Joint,Women's Store
0,Access Rd,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Adara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Apple Valley Estates,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Brittany Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Caenen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0625,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0
5,Century,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857
6,Cherry Hill Estates,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Cobblestone Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Creekside,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Crestview,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's print each neighborhood along with the top 5 most common venues

In [149]:
num_top_venues = 5

for hood in op_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = op_grouped[op_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Access Rd----
                  venue  freq
0             Rock Club  0.25
1  Gym / Fitness Center  0.25
2         Grocery Store  0.25
3         Historic Site  0.25
4            Playground  0.00


----Adara----
                  venue  freq
0  Gym / Fitness Center  0.29
1      Basketball Court  0.14
2            Kids Store  0.14
3    Salon / Barbershop  0.14
4           Sports Club  0.14


----Apple Valley Estates----
                 venue  freq
0        Grocery Store  0.17
1  Arts & Crafts Store  0.17
2          Gas Station  0.17
3          Pizza Place  0.17
4         Liquor Store  0.17


----Brittany Park----
                     venue  freq
0               Playground  0.25
1     Gym / Fitness Center  0.25
2                     Park  0.25
3  Health & Beauty Service  0.25
4                      ATM  0.00


----Caenen----
                  venue  freq
0  Gym / Fitness Center  0.12
1    Salon / Barbershop  0.06
2           Sports Club  0.06
3                   Gym  0.06
4           

                  venue  freq
0                   Gym  0.11
1      Asian Restaurant  0.11
2        Sandwich Place  0.11
3           Video Store  0.11
4  Fast Food Restaurant  0.11


----W 142nd St----
                        venue  freq
0                         Pub  0.33
1                Soccer Field  0.33
2  Construction & Landscaping  0.33
3                        Park  0.00
4                      Office  0.00


----W 83rd St----
                  venue  freq
0          Tennis Court   0.2
1                  Pool   0.2
2                   Gym   0.2
3              Pharmacy   0.2
4  Gym / Fitness Center   0.2


----Walmer----
                  venue  freq
0  Outdoor Supply Store  0.17
1  Herbs & Spices Store  0.17
2         Grocery Store  0.17
3  Gym / Fitness Center  0.17
4                  Park  0.17


----Wellington Park----
                           venue  freq
0                      Gift Shop  0.33
1                           Pool  0.33
2                   Soccer Field  0.33
3  P

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [150]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
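
To make the behavior of this helper concrete, here is a minimal sketch that calls it on a toy row. The row name `toy_row` and its frequencies are made up for illustration and are not part of the Overland Park data.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and keep only the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row: one neighborhood name followed by three venue frequencies
toy_row = pd.Series({'Neighborhood': 'Example', 'Bar': 0.1, 'Gym': 0.6, 'Park': 0.3})

top_two = return_most_common_venues(toy_row, 2)
print(list(top_two))
```

Note that the function always returns `num_top_venues` names. When a neighborhood has fewer distinct categories than that (Access Rd has only four non-zero frequencies), the remaining slots are filled with zero-frequency categories, and the order among equal frequencies is not guaranteed.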

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [156]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = op_grouped['Neighborhood']

for ind in np.arange(op_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(op_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Access Rd,Gym / Fitness Center,Rock Club,Historic Site,Grocery Store,Women's Store,Fried Chicken Joint,Food Truck,Food,Fast Food Restaurant,Fabric Shop
1,Adara,Gym / Fitness Center,Basketball Court,Salon / Barbershop,Kids Store,Sports Club,Thai Restaurant,Women's Store,Furniture / Home Store,Fried Chicken Joint,Food Truck
2,Apple Valley Estates,Gas Station,Sushi Restaurant,Pizza Place,Arts & Crafts Store,Grocery Store,Liquor Store,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck
3,Brittany Park,Playground,Park,Health & Beauty Service,Gym / Fitness Center,Women's Store,Fabric Shop,Furniture / Home Store,Fried Chicken Joint,Food Truck,Food
4,Caenen,Gym / Fitness Center,Gas Station,Gym,Cosmetics Shop,Coffee Shop,Salon / Barbershop,Fast Food Restaurant,Smoothie Shop,Gift Shop,Sports Club
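
The column headers above are built from ordinal suffixes. As a sketch of an alternative, the suffix rule can be written as a small standalone helper; the name `ordinal` is hypothetical and not part of the notebook. Unlike a three-item suffix list, it also covers values such as 11th, 12th, and 22nd, which only matters if `num_top_venues` grows beyond 10.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 10th through 20th (and 110th through 120th, ...) always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10

# same column layout as above: the neighborhood name, then one column per rank
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```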


## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [157]:
# set number of clusters
kclusters = 5

op_grouped_clustering = op_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(op_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 3, 0, 0, 4, 3, 0, 0], dtype=int32)
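
A quick way to see how balanced the clusters are is to count how many neighborhoods received each label. The sketch below uses only the ten labels printed above; in the notebook one would presumably pass the full `kmeans.labels_` array instead.

```python
import numpy as np

# the first ten labels, copied from the output above
labels = np.array([0, 0, 0, 3, 0, 0, 4, 3, 0, 0])
kclusters = 5

# minlength ensures clusters with no members still appear with a count of 0
sizes = np.bincount(labels, minlength=kclusters)

for cluster, size in enumerate(sizes):
    print('cluster {}: {} neighborhoods'.format(cluster, size))
```

Very uneven counts, such as a cluster with a single member, can be a hint that a different value of `kclusters` is worth trying.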

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [158]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'ClusterLabels', kmeans.labels_)

op_merged = op_data

# join the sorted venues onto op_data so each neighborhood has latitude/longitude, cluster label, and top venues
op_merged = op_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

op_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Access Rd,39.013294,-94.663835,0.0,Gym / Fitness Center,Rock Club,Historic Site,Grocery Store,Women's Store,Fried Chicken Joint,Food Truck,Food,Fast Food Restaurant,Fabric Shop
1,Adara,38.88521,-94.730738,0.0,Gym / Fitness Center,Basketball Court,Salon / Barbershop,Kids Store,Sports Club,Thai Restaurant,Women's Store,Furniture / Home Store,Fried Chicken Joint,Food Truck
2,Apple Valley Estates,38.951122,-94.646621,0.0,Gas Station,Sushi Restaurant,Pizza Place,Arts & Crafts Store,Grocery Store,Liquor Store,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck
3,Brittany Park,38.848433,-94.671561,3.0,Playground,Park,Health & Beauty Service,Gym / Fitness Center,Women's Store,Fabric Shop,Furniture / Home Store,Fried Chicken Joint,Food Truck,Food
4,Caenen,38.882212,-94.727341,0.0,Gym / Fitness Center,Gas Station,Gym,Cosmetics Shop,Coffee Shop,Salon / Barbershop,Fast Food Restaurant,Smoothie Shop,Gift Shop,Sports Club
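
The labels above print as `0.0` and `3.0` rather than as integers. A likely cause, assuming at least one neighborhood in `op_data` has no matching row in `neighborhoods_venues_sorted`, is that the join fills the missing label with NaN, which forces the column to float. The sketch below reproduces this on toy data; the frames `coords` and `labels` are hypothetical stand-ins, not the notebook's dataframes.

```python
import pandas as pd

# hypothetical stand-ins for op_data and neighborhoods_venues_sorted
coords = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                       'Latitude': [38.9, 38.8, 39.0]})
labels = pd.DataFrame({'Neighborhood': ['A', 'C'],
                       'ClusterLabels': [0, 3]})

# 'B' has no match on the right side, so its label becomes NaN
merged = coords.join(labels.set_index('Neighborhood'), on='Neighborhood')
print(merged['ClusterLabels'].tolist())

# drop unmatched rows by content rather than by position, then restore integers
cleaned = merged.dropna(subset=['ClusterLabels']).reset_index(drop=True)
cleaned['ClusterLabels'] = cleaned['ClusterLabels'].astype(int)
print(cleaned)
```

Dropping with `dropna(subset=...)` selects rows by what they contain, so it gives the same result if the cell is run more than once, whereas dropping by position removes a further row on each run.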


In [159]:
op_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Access Rd,39.013294,-94.663835,0.0,Gym / Fitness Center,Rock Club,Historic Site,Grocery Store,Women's Store,Fried Chicken Joint,Food Truck,Food,Fast Food Restaurant,Fabric Shop
1,Adara,38.88521,-94.730738,0.0,Gym / Fitness Center,Basketball Court,Salon / Barbershop,Kids Store,Sports Club,Thai Restaurant,Women's Store,Furniture / Home Store,Fried Chicken Joint,Food Truck
2,Apple Valley Estates,38.951122,-94.646621,0.0,Gas Station,Sushi Restaurant,Pizza Place,Arts & Crafts Store,Grocery Store,Liquor Store,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck
3,Brittany Park,38.848433,-94.671561,3.0,Playground,Park,Health & Beauty Service,Gym / Fitness Center,Women's Store,Fabric Shop,Furniture / Home Store,Fried Chicken Joint,Food Truck,Food
4,Caenen,38.882212,-94.727341,0.0,Gym / Fitness Center,Gas Station,Gym,Cosmetics Shop,Coffee Shop,Salon / Barbershop,Fast Food Restaurant,Smoothie Shop,Gift Shop,Sports Club
5,Century,38.907523,-94.72646,0.0,Women's Store,Playground,American Restaurant,Arts & Crafts Store,Park,Grocery Store,Gym,Food,Furniture / Home Store,Fried Chicken Joint
6,Cherry Hill Estates,38.946258,-94.637306,4.0,Bar,Women's Store,Food,Garden Center,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fast Food Restaurant,Gift Shop
7,Cobblestone Park,38.918275,-94.716822,3.0,Playground,Park,Business Service,Women's Store,Food,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fabric Shop
8,Creekside,38.911307,-94.729195,0.0,Grocery Store,Bar,Sandwich Place,Building,Salon / Barbershop,Sporting Goods Shop,Sports Bar,Cosmetics Shop,Chinese Restaurant,Gym
9,Crestview,39.017989,-94.67651,0.0,American Restaurant,Home Service,Athletics & Sports,Grocery Store,Korean Restaurant,Coffee Shop,Hawaiian Restaurant,Liquor Store,Women's Store,Food


Row 33 contains NaN values and was excluded from clustering to avoid trouble, so we drop it from the merged dataframe as well.

In [162]:
op_merged=op_merged.drop(op_merged.index[32])

op_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Access Rd,39.013294,-94.663835,0.0,Gym / Fitness Center,Rock Club,Historic Site,Grocery Store,Women's Store,Fried Chicken Joint,Food Truck,Food,Fast Food Restaurant,Fabric Shop
1,Adara,38.88521,-94.730738,0.0,Gym / Fitness Center,Basketball Court,Salon / Barbershop,Kids Store,Sports Club,Thai Restaurant,Women's Store,Furniture / Home Store,Fried Chicken Joint,Food Truck
2,Apple Valley Estates,38.951122,-94.646621,0.0,Gas Station,Sushi Restaurant,Pizza Place,Arts & Crafts Store,Grocery Store,Liquor Store,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck
3,Brittany Park,38.848433,-94.671561,3.0,Playground,Park,Health & Beauty Service,Gym / Fitness Center,Women's Store,Fabric Shop,Furniture / Home Store,Fried Chicken Joint,Food Truck,Food
4,Caenen,38.882212,-94.727341,0.0,Gym / Fitness Center,Gas Station,Gym,Cosmetics Shop,Coffee Shop,Salon / Barbershop,Fast Food Restaurant,Smoothie Shop,Gift Shop,Sports Club
5,Century,38.907523,-94.72646,0.0,Women's Store,Playground,American Restaurant,Arts & Crafts Store,Park,Grocery Store,Gym,Food,Furniture / Home Store,Fried Chicken Joint
6,Cherry Hill Estates,38.946258,-94.637306,4.0,Bar,Women's Store,Food,Garden Center,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fast Food Restaurant,Gift Shop
7,Cobblestone Park,38.918275,-94.716822,3.0,Playground,Park,Business Service,Women's Store,Food,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fabric Shop
8,Creekside,38.911307,-94.729195,0.0,Grocery Store,Bar,Sandwich Place,Building,Salon / Barbershop,Sporting Goods Shop,Sports Bar,Cosmetics Shop,Chinese Restaurant,Gym
9,Crestview,39.017989,-94.67651,0.0,American Restaurant,Home Service,Athletics & Sports,Grocery Store,Korean Restaurant,Coffee Shop,Hawaiian Restaurant,Liquor Store,Women's Store,Food


In [164]:
op_merged=op_merged.drop(op_merged.index[32])

op_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Access Rd,39.013294,-94.663835,0.0,Gym / Fitness Center,Rock Club,Historic Site,Grocery Store,Women's Store,Fried Chicken Joint,Food Truck,Food,Fast Food Restaurant,Fabric Shop
1,Adara,38.88521,-94.730738,0.0,Gym / Fitness Center,Basketball Court,Salon / Barbershop,Kids Store,Sports Club,Thai Restaurant,Women's Store,Furniture / Home Store,Fried Chicken Joint,Food Truck
2,Apple Valley Estates,38.951122,-94.646621,0.0,Gas Station,Sushi Restaurant,Pizza Place,Arts & Crafts Store,Grocery Store,Liquor Store,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck
3,Brittany Park,38.848433,-94.671561,3.0,Playground,Park,Health & Beauty Service,Gym / Fitness Center,Women's Store,Fabric Shop,Furniture / Home Store,Fried Chicken Joint,Food Truck,Food
4,Caenen,38.882212,-94.727341,0.0,Gym / Fitness Center,Gas Station,Gym,Cosmetics Shop,Coffee Shop,Salon / Barbershop,Fast Food Restaurant,Smoothie Shop,Gift Shop,Sports Club


In [165]:
print(op_merged.shape)

(50, 14)


In [166]:
op_merged=op_merged.reset_index(drop=True)

op_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Access Rd,39.013294,-94.663835,0.0,Gym / Fitness Center,Rock Club,Historic Site,Grocery Store,Women's Store,Fried Chicken Joint,Food Truck,Food,Fast Food Restaurant,Fabric Shop
1,Adara,38.88521,-94.730738,0.0,Gym / Fitness Center,Basketball Court,Salon / Barbershop,Kids Store,Sports Club,Thai Restaurant,Women's Store,Furniture / Home Store,Fried Chicken Joint,Food Truck
2,Apple Valley Estates,38.951122,-94.646621,0.0,Gas Station,Sushi Restaurant,Pizza Place,Arts & Crafts Store,Grocery Store,Liquor Store,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck
3,Brittany Park,38.848433,-94.671561,3.0,Playground,Park,Health & Beauty Service,Gym / Fitness Center,Women's Store,Fabric Shop,Furniture / Home Store,Fried Chicken Joint,Food Truck,Food
4,Caenen,38.882212,-94.727341,0.0,Gym / Fitness Center,Gas Station,Gym,Cosmetics Shop,Coffee Shop,Salon / Barbershop,Fast Food Restaurant,Smoothie Shop,Gift Shop,Sports Club
5,Century,38.907523,-94.72646,0.0,Women's Store,Playground,American Restaurant,Arts & Crafts Store,Park,Grocery Store,Gym,Food,Furniture / Home Store,Fried Chicken Joint
6,Cherry Hill Estates,38.946258,-94.637306,4.0,Bar,Women's Store,Food,Garden Center,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fast Food Restaurant,Gift Shop
7,Cobblestone Park,38.918275,-94.716822,3.0,Playground,Park,Business Service,Women's Store,Food,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fabric Shop
8,Creekside,38.911307,-94.729195,0.0,Grocery Store,Bar,Sandwich Place,Building,Salon / Barbershop,Sporting Goods Shop,Sports Bar,Cosmetics Shop,Chinese Restaurant,Gym
9,Crestview,39.017989,-94.67651,0.0,American Restaurant,Home Service,Athletics & Sports,Grocery Store,Korean Restaurant,Coffee Shop,Hawaiian Restaurant,Liquor Store,Women's Store,Food


In [167]:
op_merged.ClusterLabels = op_merged.ClusterLabels.astype(int)

In [169]:
op_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Access Rd,39.013294,-94.663835,0,Gym / Fitness Center,Rock Club,Historic Site,Grocery Store,Women's Store,Fried Chicken Joint,Food Truck,Food,Fast Food Restaurant,Fabric Shop
1,Adara,38.88521,-94.730738,0,Gym / Fitness Center,Basketball Court,Salon / Barbershop,Kids Store,Sports Club,Thai Restaurant,Women's Store,Furniture / Home Store,Fried Chicken Joint,Food Truck
2,Apple Valley Estates,38.951122,-94.646621,0,Gas Station,Sushi Restaurant,Pizza Place,Arts & Crafts Store,Grocery Store,Liquor Store,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck
3,Brittany Park,38.848433,-94.671561,3,Playground,Park,Health & Beauty Service,Gym / Fitness Center,Women's Store,Fabric Shop,Furniture / Home Store,Fried Chicken Joint,Food Truck,Food
4,Caenen,38.882212,-94.727341,0,Gym / Fitness Center,Gas Station,Gym,Cosmetics Shop,Coffee Shop,Salon / Barbershop,Fast Food Restaurant,Smoothie Shop,Gift Shop,Sports Club
5,Century,38.907523,-94.72646,0,Women's Store,Playground,American Restaurant,Arts & Crafts Store,Park,Grocery Store,Gym,Food,Furniture / Home Store,Fried Chicken Joint
6,Cherry Hill Estates,38.946258,-94.637306,4,Bar,Women's Store,Food,Garden Center,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fast Food Restaurant,Gift Shop
7,Cobblestone Park,38.918275,-94.716822,3,Playground,Park,Business Service,Women's Store,Food,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fabric Shop
8,Creekside,38.911307,-94.729195,0,Grocery Store,Bar,Sandwich Place,Building,Salon / Barbershop,Sporting Goods Shop,Sports Bar,Cosmetics Shop,Chinese Restaurant,Gym
9,Crestview,39.017989,-94.67651,0,American Restaurant,Home Service,Athletics & Sports,Grocery Store,Korean Restaurant,Coffee Shop,Hawaiian Restaurant,Liquor Store,Women's Store,Food


In [170]:
print(op_merged.shape)

(50, 14)


Finally, let's visualize the resulting clusters.

In [172]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(op_merged['Latitude'], op_merged['Longitude'], op_merged['Neighborhood'], op_merged['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters

#### Cluster 1

In [185]:
op_merged.loc[op_merged['ClusterLabels'] == 0, op_merged.columns[[0] + list(range(4, op_merged.shape[1]))]]



Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Access Rd,Gym / Fitness Center,Rock Club,Historic Site,Grocery Store,Women's Store,Fried Chicken Joint,Food Truck,Food,Fast Food Restaurant,Fabric Shop
1,Adara,Gym / Fitness Center,Basketball Court,Salon / Barbershop,Kids Store,Sports Club,Thai Restaurant,Women's Store,Furniture / Home Store,Fried Chicken Joint,Food Truck
2,Apple Valley Estates,Gas Station,Sushi Restaurant,Pizza Place,Arts & Crafts Store,Grocery Store,Liquor Store,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck
4,Caenen,Gym / Fitness Center,Gas Station,Gym,Cosmetics Shop,Coffee Shop,Salon / Barbershop,Fast Food Restaurant,Smoothie Shop,Gift Shop,Sports Club
5,Century,Women's Store,Playground,American Restaurant,Arts & Crafts Store,Park,Grocery Store,Gym,Food,Furniture / Home Store,Fried Chicken Joint
8,Creekside,Grocery Store,Bar,Sandwich Place,Building,Salon / Barbershop,Sporting Goods Shop,Sports Bar,Cosmetics Shop,Chinese Restaurant,Gym
9,Crestview,American Restaurant,Home Service,Athletics & Sports,Grocery Store,Korean Restaurant,Coffee Shop,Hawaiian Restaurant,Liquor Store,Women's Store,Food
12,Empire Estates,Pharmacy,Intersection,Coffee Shop,Fast Food Restaurant,Spa,Gas Station,Juice Bar,Dry Cleaner,Electronics Store,Donut Shop
13,Fairway Woods,IT Services,Trail,Pool,Liquor Store,Golf Course,Wine Bar,Hobby Shop,Garden,Dentist's Office,Department Store
14,Glenwood St,Grocery Store,Historic Site,Gym / Fitness Center,Women's Store,Food,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fast Food Restaurant


#### Cluster 2

In [186]:
op_merged.loc[op_merged['ClusterLabels'] == 1, op_merged.columns[[0] + list(range(4, op_merged.shape[1]))]]




Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Elmhurst,Pizza Place,Supermarket,Business Service,Video Store,Asian Restaurant,Liquor Store,Sandwich Place,Fast Food Restaurant,Gym,Fried Chicken Joint
19,Marbella,Pizza Place,Supermarket,Business Service,Video Store,Asian Restaurant,Liquor Store,Sandwich Place,Fast Food Restaurant,Gym,Fried Chicken Joint
21,Metcalf 56,Home Service,Athletics & Sports,Gym,Women's Store,Fast Food Restaurant,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Food
28,Parkway 103,Pizza Place,Supermarket,Business Service,Video Store,Asian Restaurant,Liquor Store,Sandwich Place,Fast Food Restaurant,Gym,Fried Chicken Joint
30,Pinehurst,Pizza Place,Supermarket,Business Service,Video Store,Asian Restaurant,Liquor Store,Sandwich Place,Fast Food Restaurant,Gym,Fried Chicken Joint
31,Pointe Royal,Gym,Restaurant,Gym / Fitness Center,Gas Station,Hotel,Department Store,Dog Run,Donut Shop,Dry Cleaner,Electronics Store
33,St Andrews,Pizza Place,Supermarket,Business Service,Video Store,Asian Restaurant,Liquor Store,Sandwich Place,Fast Food Restaurant,Gym,Fried Chicken Joint
34,St. Andrews,Pizza Place,Supermarket,Business Service,Video Store,Asian Restaurant,Liquor Store,Sandwich Place,Fast Food Restaurant,Gym,Fried Chicken Joint
35,Switzer Rd,Other Repair Shop,Pizza Place,Salon / Barbershop,Liquor Store,Women's Store,Furniture / Home Store,Fried Chicken Joint,Food Truck,Food,Fast Food Restaurant
37,Terrace,Pizza Place,Supermarket,Business Service,Video Store,Asian Restaurant,Liquor Store,Sandwich Place,Fast Food Restaurant,Gym,Fried Chicken Joint


#### Cluster 3

In [187]:
op_merged.loc[op_merged['ClusterLabels'] == 2, op_merged.columns[[0] + list(range(4, op_merged.shape[1]))]]




Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,Pawnee,Restaurant,Gas Station,Dentist's Office,Department Store,Dog Run,Donut Shop,Dry Cleaner,Electronics Store,Fabric Shop,Fast Food Restaurant


#### Cluster 4

In [188]:
op_merged.loc[op_merged['ClusterLabels'] == 3, op_merged.columns[[0] + list(range(4, op_merged.shape[1]))]]




Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Brittany Park,Playground,Park,Health & Beauty Service,Gym / Fitness Center,Women's Store,Fabric Shop,Furniture / Home Store,Fried Chicken Joint,Food Truck,Food
7,Cobblestone Park,Playground,Park,Business Service,Women's Store,Food,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fabric Shop
10,Cross Creek,American Restaurant,Business Service,Park,Gym,Women's Store,Food,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck
25,North Park,Business Service,Park,Pool Hall,Women's Store,Food,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fabric Shop


#### Cluster 5

In [189]:
op_merged.loc[op_merged['ClusterLabels'] == 4, op_merged.columns[[0] + list(range(4, op_merged.shape[1]))]]




Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Cherry Hill Estates,Bar,Women's Store,Food,Garden Center,Garden,Furniture / Home Store,Fried Chicken Joint,Food Truck,Fast Food Restaurant,Gift Shop


**Observation**: It is interesting that most neighborhoods in Overland Park fall into clusters 1 and 2. Cluster 1 is a Living Haven, with access to nearly every kind of everyday venue, while cluster 2 is a Pizza and Shopping Haven. In cluster 3, you can eat at restaurants and conveniently fill up the car's tank too. I label clusters 4 and 5 as Leisure Haven and Happy Hour Place, as below:
1. Cluster 1: Living Haven
2. Cluster 2: Pizza and Shopping Haven
3. Cluster 3: Refill Haven (Both Human and Car)
4. Cluster 4: Leisure Haven
5. Cluster 5: Happy Hour Place
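
Labels like these can be cross-checked by asking which venue category most often ranks first within each cluster. The following is a minimal sketch on a handful of rows copied from the cluster tables above; the frame name `sample` is hypothetical, and in the notebook the same grouping could presumably be applied to `op_merged`.

```python
import pandas as pd

# a few rows copied from the cluster tables above
sample = pd.DataFrame({
    'Neighborhood': ['Access Rd', 'Adara', 'Elmhurst', 'Marbella',
                     'Pawnee', 'Brittany Park', 'Cobblestone Park',
                     'Cherry Hill Estates'],
    'ClusterLabels': [0, 0, 1, 1, 2, 3, 3, 4],
    '1st Most Common Venue': ['Gym / Fitness Center', 'Gym / Fitness Center',
                              'Pizza Place', 'Pizza Place', 'Restaurant',
                              'Playground', 'Playground', 'Bar'],
})

# for each cluster label, pick the category that appears most often in first place
summary = (sample.groupby('ClusterLabels')['1st Most Common Venue']
                 .agg(lambda s: s.value_counts().idxmax()))
print(summary)
```

Keep in mind that `ClusterLabels` is 0-based, so the heading "Cluster 1" corresponds to label 0, "Cluster 2" to label 1, and so on.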