# Capstone Project: The Battle of the Neighborhoods (Week 2)
### Applied Data Science Captstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#intro)
* [Data](#data)
* [Setting up](#setup)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name='introduction'></a>

It is well known that **pizza** is quite popular in **New York City**. With lots of pizzerias available throughout the city, stakeholders interested in opening a new parlor may encounter heavy competition. I will aim to use **data science** techniques to determine areas within the city with not too many options for pizza.

## Data <a name='data'></a>

The main data used will come from the **Foursquare Places API**. It allows developers to search explore venues based on a cetnered location. We can search for venues available within a radius for each neighborhood in NYC.

Additionally, I will use a **GeoJSON file** containing information about the neighborhoods in NYC to extract the borough, name, latitude and longitude values for each neighborhood. 

## Setting up <a name='setup'></a>

### Importing our Libraries

In [1]:
#pandas and numpy will be sued to read our .csv, json, and geojson files
import numpy as np 
import pandas as pd 
import json

#requests will be used to do our API reuests for the Foursquare API
import requests

#KMeans for neighborhood clustering
from sklearn.cluster import KMeans

#Mapping tool
import folium

#Set up pandas options to diaply the whole dataframe
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

### Collecting neighborhood Data

Now that we have all our libraries imported, we can collect our data. Let's load the data from **newyork_data.json** on a pandas dataframe.

In [2]:
#Load json
nyc_data = json.load(open('newyork_data.json'))

#Initialize dataframe with custom named columns
column_names = ['Borough','Name', 'Latitude', 'Longitude']
df_neighborhoods = pd.DataFrame(columns=column_names)

#Iterate through json features and append each neighborhood to the dataframe
for i, row in enumerate(nyc_data['features']):
    df_neighborhoods.loc[i,:] = ([row['properties']['borough'],
                                 row['properties']['name'],
                                 row['geometry']['coordinates'][1],
                                 row['geometry']['coordinates'][0]])
    
#Verify that we have 306 neighborhoods saved on the dataframe
print(df_neighborhoods.shape)

df_neighborhoods.head()

(306, 4)


Unnamed: 0,Borough,Name,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


### API setup
We have our neighborhoods, now let's set up our API url. In order to use the Foursquare API, you need a **client ID, client secret and access_token** provided by <a href='https://foursquare.com/'>Foursquare</a> once you create a developer account. All my keys are saved on an external file called **keys.py** for security.

In [3]:
import keys

# Replace the values here for your own keys in order to call the API
CLIENT_ID = keys.CLIENT_ID
CLIENT_SECRET = keys.CLIENT_SECRET
ACCESS_TOKEN = keys.ACCESS_TOKEN
VERSION = '20180604'
LIMIT = 500
credentials = "client_id={}&client_secret={}&oauth_token={}".format(CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN)
parameters = "v={}&limit={}".format(VERSION, LIMIT)

search_url = "https://api.foursquare.com/v2/venues/explore?{}&{}".format(credentials, parameters)
print(search_url)

https://api.foursquare.com/v2/venues/explore?client_id=JHTTDJBJUFNGC3N1HBTYRWA5PBBIPAG04DOWOCMZACIGD2VS&client_secret=RRJE5IIXHSWYNP5ZFA2MXM40N4BNFU5AQQQQYAEZMGKEE1NM&oauth_token=31MZ5WDVPNAUOBFVXDBQPVARI2V31JM1VETVND4GAZOYILXX&v=20180604&limit=500


The only thing needed to get venue information is to add our query parameters. Let's create a method that, given a pandas dataframe with neighborhood data, makess an API call for each row's latitude and longitude values to get the venues within 500 meters within the center of those coordinates. 

In [4]:
#The categories field is composed of other children fields. We are only interested in the name
def get_cat_name(row):
    try:
        #Extract name of venue category
        return row['Category'][0]['name']
    except:
        #If there is no category available, return None
        return np.NaN


def get_nearby_venues(neighborhoods, url, radius): 
    
    #Initialize dataframe    
    df = pd.DataFrame()

    for b, n, lat, lon in zip(neighborhoods['Borough'],neighborhoods['Name'], neighborhoods['Latitude'], 
                              neighborhoods['Longitude']):

        search_query = url + "&ll={},{}&radius={}".format(lat, lon, radius)

        response = requests.get(search_query).json()["response"]['groups'][0]['items']  
        
        #Print neighborhood name and number of found venues
        print(n, len(response), 'venues')

        df = df.append([(b, n, v['venue']['name'],
                               v['venue']['categories'],
                               v['venue']['location']['lat'],
                               v['venue']['location']['lng']) for v in response])

    
    #Assign columns
    columns = ['Borough', 'Neighborhood', 'Name', 'Category', 'Latitude', 'Longitude']
    df.columns = columns
    
    #Extract category name from categories
    df['Category'] = df.apply(get_cat_name, axis=1)
    
    #Remove misleading venue categories
    df = df[~df['Category'].isin(['Food','Neighborhood'])]
    
    return df   

Our function it's ready. Let's make our call. It will take a couple of minutes to build the whole dataframe 

In [5]:
df_venues = get_nearby_venues(df_neighborhoods, search_url, 750)
df_venues.to_csv('venues.csv',index=False)

print(df_venues.shape)
df_venues.head()

Wakefield 66 venues
Co-op City 77 venues
Eastchester 69 venues
Fieldston 17 venues
Riverdale 73 venues
Kingsbridge 100 venues
Marble Hill 100 venues
Woodlawn 79 venues
Norwood 89 venues
Williamsbridge 62 venues
Baychester 82 venues
Pelham Parkway 64 venues
City Island 79 venues
Bedford Park 95 venues
University Heights 80 venues
Morris Heights 53 venues
Fordham 100 venues
East Tremont 90 venues
West Farms 77 venues
High  Bridge 58 venues
Melrose 97 venues
Mott Haven 56 venues
Port Morris 52 venues
Longwood 75 venues
Hunts Point 58 venues
Morrisania 52 venues
Soundview 68 venues
Clason Point 18 venues
Throgs Neck 44 venues
Country Club 58 venues
Parkchester 100 venues
Westchester Square 90 venues
Van Nest 59 venues
Morris Park 88 venues
Belmont 100 venues
Spuyten Duyvil 35 venues
North Riverdale 64 venues
Pelham Bay 85 venues
Schuylerville 71 venues
Edgewater Park 76 venues
Castle Hill 54 venues
Olinville 83 venues
Pelham Gardens 61 venues
Concourse 92 venues
Unionport 83 venues
Edenwal

Unnamed: 0,Borough,Neighborhood,Name,Category,Latitude,Longitude
0,Bronx,Wakefield,Lollipops Gelato,Dessert Shop,40.894123,-73.845892
1,Bronx,Wakefield,Jackie's West Indian Bakery,Caribbean Restaurant,40.889283,-73.84331
2,Bronx,Wakefield,Rite Aid,Pharmacy,40.896649,-73.844846
3,Bronx,Wakefield,Carvel Ice Cream,Ice Cream Shop,40.890487,-73.848568
4,Bronx,Wakefield,Walgreens,Pharmacy,40.896528,-73.8447


## Methodology <a name='methodology'></a>

We now have a dataframe with venue information in New York City. To find the best neighborhoods to open a pizza parlor, we will utilize the k-means algorith to group the neighborhoods in 5 clusters based on the list of most frequent venues by neighborhood.

In order to do so, we need to convert the neighborhood category data that is currently categorical to numerical values. For that we will apply **one-hot encoding** to our **Category** row. This will basically create a new column named after each different value for Category and the values will be 1 or 0.

In [6]:
# Get the one-hot encodign dataframe
df_onehot = pd.get_dummies(df_venues['Category'], prefix='', prefix_sep='')

# Add Borough and Neighborhood to the dataframe. Since some neighborhoods in different boroughs are named the same. The borough will belp
# make that distinction
df_onehot['Borough'] = df_venues['Borough']
df_onehot['Neighborhood'] = df_venues['Neighborhood']

# Rearrange columns moving Borough and Neighborhood to the left
columns = list(df_onehot.columns[-2:]) + list(df_onehot.columns[:-2])
df_onehot = df_onehot[columns]

# Group by neighborhood
df_onehot = df_onehot.groupby(by=['Borough','Neighborhood']).mean()

df_onehot.reset_index(inplace=True)


print(df_onehot.shape)
df_onehot.head()

(306, 506)


Unnamed: 0,Borough,Neighborhood,ATM,Accessories Store,Acupuncturist,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Lounge,Airport Service,Airport Terminal,Airport Tram,American Restaurant,Animal Shelter,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Astrologer,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Baggage Locker,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Border Crossing,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Car Wash,Caribbean Restaurant,Carpet Store,Caucasian Restaurant,Cha Chaan Teng,Check Cashing Service,Cheese Shop,Child Care Service,Chinese Restaurant,Chiropractor,Chocolate Shop,Christmas Market,Church,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Basketball Court,College Cafeteria,College Gym,College Theater,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cooking School,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Distribution Center,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Dosa Place,Driving School,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Elementary School,Empanada Restaurant,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Film Studio,Financial or Legal Service,Fire Station,Fish & Chips Shop,Fish Market,Fishing Spot,Fishing Store,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Service,Food Stand,Food Truck,Forest,Fountain,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Herbs & Spices Store,High School,Hill,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Field,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Kosher Restaurant,Lake,Laser Tag,Latin American Restaurant,Laundromat,Laundry Service,Lawyer,Leather Goods Store,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Locksmith,Lounge,Luggage Store,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mattress Store,Medical Center,Medical Supply Store,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Festival,Music School,Music Store,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Newsstand,Nightclub,Nightlife Spot,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Gym,Outdoor Sculpture,Outdoor Supply Store,Outdoors & Recreation,Outlet Mall,Outlet Store,Paella Restaurant,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Pawn Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Lab,Photography Studio,Physical Therapist,Piano Bar,Pie Shop,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Pop-Up Shop,Portuguese Restaurant,Post Office,Print Shop,Private School,Professional & Other Places,Pub,Public Art,Public Bathroom,Puerto Rican Restaurant,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Recycling Facility,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Roller Rink,Romanian Restaurant,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Sauna / Steam Room,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Squash Court,Sri Lankan Restaurant,State / Provincial Park,Stationery Store,Steakhouse,Storage Facility,Street Art,Strip Club,Summer Camp,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,TV Station,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Taxi,Taxi Stand,Tea Room,Tech Startup,Tennis Court,Tennis Stadium,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Toll Booth,Toll Plaza,Tour Provider,Tourist Information Center,Toy / Game Store,Track,Trade School,Trail,Train,Train Station,Tram Station,Transportation Service,Tree,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,University,Used Auto Dealership,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waste Facility,Watch Shop,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Bronx,Allerton,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0375,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0125,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0375,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0875,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bronx,Baychester,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.025,0.05,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.05,0.0,0.0,0.0,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0,0.0125,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0
2,Bronx,Bedford Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.032967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.032967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.065934,0.0,0.0,0.0,0.0,0.0,0.043956,0.032967,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.010989,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.010989,0.0,0.0,0.0,0.021978,0.0,0.0,0.010989,0.0,0.021978,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043956,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.010989,0.032967,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.065934,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bronx,Belmont,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.17,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
4,Bronx,Bronxdale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.030303,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.010101,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040404,0.0,0.0,0.0,0.0,0.0,0.040404,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.010101,0.010101,0.0,0.010101,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.040404,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040404,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.020202,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.010101,0.040404,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.050505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.010101,0.0,0.010101,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Now let's create a dataframe with the top 10 most frequent venues by neighborhood. 

In [7]:
# Create our columns
indicators = ['st','nd','rd','th']
columns = ['Borough', 'Neighborhood'] + [str(i+1) + (indicators[i] if i < len(indicators)-1 else indicators[-1]) + ' Most Frequent Venue' for i in range(0,10)]

df_top10 = pd.DataFrame(columns=columns)

for b,n in zip(df_onehot['Borough'], df_onehot['Neighborhood']):
    
    # Get dataframe for specific neighborhood and transpose it
    df_n = df_onehot[(df_onehot['Neighborhood'] == n) & (df_onehot['Borough'] == b)]
    
    # Remove Neighborhood & Borough columns
    df_n = df_n.drop(['Borough','Neighborhood'], axis=1)
    
    # Transpose dataframe
    df_n = df_n.transpose().reset_index()
        
    # Rename columns
    df_n.columns = ['Venue', 'Frequency']
        
    # Get 10 most requent locations and keep only Venue
    df_n = df_n.sort_values(by='Frequency', ascending=False)[['Venue']].head(10) 
    
    # Transpose again, add neighborhood and rename columns
    df_n = df_n.transpose().reset_index(drop=True)
    
    # Add borough and neighborhood and move to the left
    df_n['Borough'] = b 
    df_n['Neighborhood'] = n    
    df_n = df_n[list(df_n.columns[-2:]) + list(df_n.columns[:-2])]
    
    df_n.columns = columns
    
    #Append to DataFrame
    df_top10 = df_top10.append(df_n)
    
    del(df_n)

print(df_top10.shape)
df_top10.head()

(306, 12)


Unnamed: 0,Borough,Neighborhood,1st Most Frequent Venue,2nd Most Frequent Venue,3rd Most Frequent Venue,4th Most Frequent Venue,5th Most Frequent Venue,6th Most Frequent Venue,7th Most Frequent Venue,8th Most Frequent Venue,9th Most Frequent Venue,10th Most Frequent Venue
0,Bronx,Allerton,Pizza Place,Deli / Bodega,Donut Shop,Supermarket,Pharmacy,Mexican Restaurant,Sandwich Place,Construction & Landscaping,Bar,Clothing Store
0,Bronx,Baychester,Bank,Department Store,Electronics Store,Donut Shop,Discount Store,Furniture / Home Store,Pharmacy,Cosmetics Shop,Health & Beauty Service,Bakery
0,Bronx,Bedford Park,Deli / Bodega,Pizza Place,Grocery Store,Diner,Mexican Restaurant,Sandwich Place,Coffee Shop,Chinese Restaurant,Bank,Discount Store
0,Bronx,Belmont,Italian Restaurant,Pizza Place,Deli / Bodega,Bakery,Dessert Shop,Shoe Store,Bank,Mobile Phone Shop,Liquor Store,Coffee Shop
0,Bronx,Bronxdale,Pizza Place,Sandwich Place,Home Service,Pharmacy,Diner,Deli / Bodega,Italian Restaurant,Bank,Grocery Store,Mobile Phone Shop


Now let's merge df_top10 and df_neighborhoods.

In [8]:
# Our original nieghborhoods dataframe had Name instead of Neighborhood. Let's rename our columns to merge our datasets
df_neighborhoods.columns = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
df_cluster = pd.merge(df_neighborhoods, df_top10, on=['Borough','Neighborhood'])
df_cluster.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,1st Most Frequent Venue,2nd Most Frequent Venue,3rd Most Frequent Venue,4th Most Frequent Venue,5th Most Frequent Venue,6th Most Frequent Venue,7th Most Frequent Venue,8th Most Frequent Venue,9th Most Frequent Venue,10th Most Frequent Venue
0,Bronx,Wakefield,40.894705,-73.847201,Supermarket,Pizza Place,Pharmacy,Caribbean Restaurant,Business Service,Mobile Phone Shop,Deli / Bodega,Construction & Landscaping,Food Truck,Chinese Restaurant
1,Bronx,Co-op City,40.874294,-73.829939,Bus Station,Pharmacy,Pizza Place,Fast Food Restaurant,Locksmith,Health & Beauty Service,Men's Store,Liquor Store,Mattress Store,Bank
2,Bronx,Eastchester,40.887556,-73.827806,Construction & Landscaping,Auto Garage,Fast Food Restaurant,Caribbean Restaurant,Bus Station,Furniture / Home Store,Intersection,Grocery Store,Chinese Restaurant,Deli / Bodega
3,Bronx,Fieldston,40.895437,-73.905643,Park,River,Art Gallery,Noodle House,Spa,Basketball Court,Pizza Place,Coffee Shop,Playground,Plaza
4,Bronx,Riverdale,40.890834,-73.912585,Pharmacy,Bank,Athletics & Sports,Italian Restaurant,Diner,Sandwich Place,Playground,Convenience Store,Pizza Place,Bagel Shop


We are now ready to run kmeans clustering on our dataframe. For that, we will utilize the KMeans package from **sci-kit learn**.

In [9]:
# set number of clusters
kclusters = 5

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_onehot.drop(['Borough','Neighborhood'], 1))

# Add cluster values to dataframe
df_cluster['Cluster'] = kmeans.labels_

## Analysis <a name='analysis'></a>

Our neighborhoods are now clustered. However, we haven't found our neighborhoods yet. Let's visualize our clusters on the map. For that we will create function that will create a Folium map object.

In [10]:
# NYC Coordinates
latitude = 40.712
longitude = -74.005

import folium
from matplotlib import cm
import matplotlib.colors as colors
from shapely.geometry import Polygon

def map_clusters(df):

    cmap = list(cm.rainbow(np.linspace(0, 1, len(set(kmeans.labels_)))))
    cmap = [colors.rgb2hex(i) for i in cmap]
    map_nyc_clustered = folium.Map(location=[latitude, longitude], zoom_start=11.2, tiles='cartodbpositron')

    for neighborhood, cluster, lat, lon, in zip(df['Neighborhood'],
                                                df['Cluster'],
                                                df['Latitude'],
                                                df['Longitude']):

        label = folium.Popup('{}, {}'.format(neighborhood, cluster), parse_html = False)

        folium.CircleMarker(
                [lat, lon],
                radius=5,
                popup=label,
                color=None,
               fill=True,
                fill_color=cmap[cluster],
                fill_opacity=0.7,
                parse_html=False).add_to(map_nyc_clustered)
        
    return map_nyc_clustered

In [11]:
map_clusters(df_cluster)

Let's understand a little better our clusters. We can group our dataframe by cluster count the appearances of Pizza Place as nth most frequent venue.

In [12]:
new_columns = ['Pizza Place as ' + c for c in columns[2:]]
df_count = pd.DataFrame(columns=new_columns)

for c, nc in zip(columns[2:], new_columns):     
    values = df_cluster.groupby('Cluster')[c].value_counts().values.tolist()
    indices = [[c[0], values[i]] for i,c in enumerate(df_cluster.groupby('Cluster')[c].value_counts().index.tolist()) if 'Pizza Place' in c[1]]
    for i in indices:
        df_count.loc[i[0],nc] = i[1]

# Replace NaN values with 0
df_count = df_count.fillna(0)

# Sum all values into total
df_count['Total'] = df_count['Total'] = df_count[new_columns].sum(axis=1)

df_count.sort_values(by='Total', ascending=True, inplace=True)
df_count

Unnamed: 0,Pizza Place as 1st Most Frequent Venue,Pizza Place as 2nd Most Frequent Venue,Pizza Place as 3rd Most Frequent Venue,Pizza Place as 4th Most Frequent Venue,Pizza Place as 5th Most Frequent Venue,Pizza Place as 6th Most Frequent Venue,Pizza Place as 7th Most Frequent Venue,Pizza Place as 8th Most Frequent Venue,Pizza Place as 9th Most Frequent Venue,Pizza Place as 10th Most Frequent Venue,Total
2,1,0,0,0,0,0,0,0,0,0,1
4,4,2,0,0,0,2,0,0,0,0,8
0,7,3,0,3,0,1,2,0,0,1,17
3,18,7,10,7,4,1,7,3,3,3,63
1,28,23,21,9,9,12,4,8,5,2,121


## Results <a name='results'></a>
As we can see, the top clusters hae the least amount of pizzerias. The more rows we choose we can expand our options of neighborhoods. Let's stick with the top 3 clusters.

In [13]:
map_clusters(df_cluster[df_cluster['Cluster'].isin(df_count.head(3).index.values.tolist())])

## Conclusion <a name='conclusion'></a>
KMeans clustering is a powerful algorithm that can provide us with good insights of how data is distributed. Of course, there are other factors that can improve our decision making such as demographics and psycographic data.