## Battle of the Neighborhoods
### Exploring New York: A Brewhaus for Pizza Enthusiasts


### Introduction/Business Problem

As an expert in selecting locations for new business venues, I have been retained to help find the ideal neighborhood for a new beer garden concept. Beerizza Inc. has perfected a brew whose full bouquet of aromas and flavors is best enjoyed alongside pizza. Unfortunately, while Beerizza Inc. are master brewers, they lack expertise in pizza making.

To minimize running costs while still attracting the right clientele, Beerizza Inc. would like to open a brewery without a kitchen that fully welcomes pizza delivery and customers bringing their own pizza, or any other food of their choosing.

My new business partner, Beerizza Inc., has asked for a layout of the parts of the city that have the appropriate clientele and a high density of takeout restaurants near the new beer garden, so that customers can freely bring their own food, with a particularly strong focus on pizza shop density. The client would also like to avoid saturated markets that already have other beer gardens.

### Data

#### What data will be used

To build our model we will use the following data sources (links will be finalized once the sources have been confirmed):
- Foursquare API venue data for exploring venues in the various neighborhoods and boroughs
- New York City neighborhood data, including boroughs and latitude/longitude coordinates, to help explore the neighborhoods, from https://cocl.us/new_york_dataset
- New York City geospatial data (borough boundaries) for visualization, from https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm

#### Methodology

The first step is to collect the data and build the relevant dataframes from the sources above.
Next we will map the city and try to identify additional features that may be useful for location selection (for example, proximity to public transport or nightlife).

The questions we will answer, and the requirements we must meet, for Beerizza Inc.:
1. Which neighborhood is best for the new beer garden with respect to pizza shop density?
2. Where is restaurant density highest (in case beer customers prefer other foods)?
3. Where is the pizza shop "graveyard" that should be avoided completely?
4. The client would like the selected location to be on Staten Island.

We will use clustering to segment the data to help us find the best location.
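As a rough illustration of the clustering step, the sketch below runs scikit-learn's `KMeans` on a tiny, made-up table of per-neighborhood category frequencies (the same shape we build later with one-hot encoding). The neighborhood names, the three categories and `n_clusters=2` are assumptions for illustration only, not values from the analysis.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical frequency table: one row per neighborhood, one column per venue
# category, each value being that category's share of the neighborhood's venues
grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Pizza Place':  [0.40, 0.35, 0.00, 0.05],
    'Park':         [0.00, 0.05, 0.50, 0.45],
    'Beer Bar':     [0.10, 0.00, 0.00, 0.05],
})

# Cluster on the numeric frequency columns only
features = grouped.drop(columns='Neighborhood')

# n_clusters=2 is an illustrative choice, not a tuned value
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Attach each neighborhood's cluster label as the first column
grouped.insert(0, 'Cluster Labels', kmeans.labels_)
print(grouped)
```

Pizza-heavy neighborhoods end up in one cluster and park-heavy ones in the other; on the real data the number of clusters still has to be chosen, for example by inspecting the resulting segments.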

## Approach
- Import the required libraries
- Collect the New York City data from https://cocl.us/new_york_dataset
- Use the Foursquare API to find all venues in each neighborhood
- Explore pizza shops
- Visualize restaurants across the city

In [57]:
##Here we import the packages needed for this exercise.

##packages for working with df
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

##library for managing APIS
import requests
##Libraries for scraping data
from bs4 import BeautifulSoup

##Libraries for mapping
import geocoder
import os
import folium
from geopy.geocoders import Nominatim

# Plotting libraries
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import wget
import json

print('Done')

Done


### Collecting, processing and inspecting the data

In [58]:
##downloading the NYC dataset
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [59]:
##Next we start exploring the data by loading the json file and then taking a look at it
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [6]:
##explore the json
newyork_data ## this is the data that we downloaded as json file

##in the newyork_data json output we see that we have the coordinates of boroughs and neighborhoods
##we can proceed to the next steps of exploration and define the feature variables

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

In [60]:
## Reviewing the features key of the neighborhoods data and looking at the first entry to make sense of it
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

##this shows us that all the features relevant to our analysis are located in the features key.
##We can manipulate this information into a pandas data frame

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
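The explicit loop in the next cells works, but pandas can also flatten the whole `features` list in one call. The sketch below applies `pd.json_normalize` to the two features shown above, trimmed to the fields we use; the dotted column names it produces (such as `properties.borough` and `geometry.coordinates`) come from json_normalize's default separator.

```python
import pandas as pd

# Two features copied from the output above (bbox and annoline fields omitted)
features = [
    {'type': 'Feature', 'id': 'nyu_2451_34572.1',
     'geometry': {'type': 'Point', 'coordinates': [-73.84720052054902, 40.89470517661]},
     'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
    {'type': 'Feature', 'id': 'nyu_2451_34572.2',
     'geometry': {'type': 'Point', 'coordinates': [-73.82993910812398, 40.87429419303012]},
     'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
]

# Flatten nested keys into dotted column names such as 'properties.borough';
# the coordinates list is kept intact as a single cell value
flat = pd.json_normalize(features)

neighborhoods = pd.DataFrame({
    'Borough': flat['properties.borough'],
    'Neighborhood': flat['properties.name'],
    # GeoJSON coordinates are ordered [longitude, latitude]
    'Latitude': flat['geometry.coordinates'].apply(lambda c: c[1]),
    'Longitude': flat['geometry.coordinates'].apply(lambda c: c[0]),
})
print(neighborhoods)
```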

#### Transforming the NYC data into a pandas data frame so that we can work with it

In [61]:
# define the dataframe columns; this creates a new empty data frame
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|


In [62]:
#Now that we have an empty data frame with the columns we will need later on,
##we fill it by looping through the data one neighborhood at a time.
##Rows are collected in a list and converted in one step (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)
neighborhoods.head()
## this gives us a data frame listing borough, neighborhood, latitude and longitude
##which can later be used alongside venue data to map venues

The dataframe has 5 boroughs and 306 neighborhoods.


| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


#### We have created the DF that will form the backbone of our exploration. We have confirmed that all 5 boroughs are present and that 306 neighborhoods are represented. We can now start mapping the city and analyzing the data.

In [63]:
## Use the geopy library to get the latitude and longitude values of New York City.
##Nominatim requires a user_agent: a string that identifies our application to the geocoding service
##We will name our agent ny_explorer
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [64]:
# create map of New York using latitude and longitude values
# create map of New York using latitude and longitude values (zoom 10 shows all five boroughs)
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

## After interim discussions with the client, it was determined that they want to focus on Staten Island, as they would also like to make the beer garden Wu-Tang Clan themed



In [76]:
##keeping only the Staten Island neighborhoods and dropping the other boroughs
staten_island_data = neighborhoods[neighborhoods['Borough'] == 'Staten Island'].reset_index(drop=True)
staten_island_data.head(45)

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Staten Island | St. George | 40.644982 | -74.079353 |
| 1 | Staten Island | New Brighton | 40.640615 | -74.087017 |
| 2 | Staten Island | Stapleton | 40.626928 | -74.077902 |
| 3 | Staten Island | Rosebank | 40.615305 | -74.069805 |
| 4 | Staten Island | West Brighton | 40.631879 | -74.107182 |
| 5 | Staten Island | Grymes Hill | 40.624185 | -74.087248 |
| 6 | Staten Island | Todt Hill | 40.597069 | -74.111329 |
| 7 | Staten Island | South Beach | 40.580247 | -74.079553 |
| 8 | Staten Island | Port Richmond | 40.633669 | -74.129434 |
| 9 | Staten Island | Mariner's Harbor | 40.632546 | -74.150085 |


In [152]:
##obtaining the coordinates for Staten Island
address = 'Staten Island, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Staten Island are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Staten Island are 40.5834557, -74.1496048.


In [85]:
##create new map for Staten Island only
map_staten_island = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(staten_island_data['Latitude'], staten_island_data['Longitude'], staten_island_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_staten_island)  
    
map_staten_island

#### Next we import the Foursquare venue data

In [153]:
###NOTE Personal details removed for publication of the notebook

#### FINALLY we get to explore our data and crunch numbers. We are going to look at all venues on Staten Island

In [154]:
###NOTE Personal details removed for publication of the notebook

#### Now that we have downloaded the venue information, we want to examine the results and extract the category of each venue.

In [88]:
## Examining the results (url is built in the cell above, whose credentials were removed)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ee71189237de16992eeb1a7'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 139,
  'suggestedBounds': {'ne': {'lat': 40.664455781000086,
    'lng': -74.04314898799743},
   'sw': {'lat': 40.50245561899992, 'lng': -74.25606061200259}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4be06d0198f2a593ce34c25a',
       'name': 'Greenbelt Nature Center',
       'location': {'address': '501 Brielle Ave',
        'crossStreet': 'Rockland Ave',
        'lat': 40.586615957446355,
        'lng': -74.1469170064425,
        'labeledLatLngs':

In [89]:
# function that extracts the category of the venue using the items key
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # rows produced by json_normalize carry the nested key as 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
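A quick check of `get_category_type` on two hypothetical rows shaped like the flattened Foursquare items, one of them with an empty category list. This copy catches `KeyError` specifically rather than using a bare `except`, so unrelated errors are not silently swallowed.

```python
import pandas as pd

def get_category_type(row):
    # Raw venue dicts use 'categories'; json_normalize output uses 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Some venues have no category at all
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Hypothetical rows shaped like the flattened Foursquare items
rows = pd.DataFrame({
    'venue.name': ["Trader Joe's", 'Unnamed Spot'],
    'venue.categories': [[{'name': 'Grocery Store'}], []],
})
rows['category'] = rows.apply(get_category_type, axis=1)
print(rows[['venue.name', 'category']])
```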

In [155]:
#### At this stage we can start viewing our Staten Island venues

In [91]:
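##json_normalize is available at the top level of pandas (pandas.io.json.json_normalize is deprecated)
from pandas import json_normalize

##extract the list of recommended venues from the response
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues)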

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Greenbelt Nature Center | Trail | 40.586616 | -74.146917 |
| 1 | Trader Joe's | Grocery Store | 40.589997 | -74.165715 |
| 2 | High Rock Park | Park | 40.584024 | -74.12483 |
| 3 | Historic Richmond Town | History Museum | 40.572803 | -74.13343 |
| 4 | Holtermanns | Bakery | 40.564533 | -74.155411 |


In [92]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
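`getNearbyVenues` mixes the HTTP request with parsing, and `v['venue']['categories'][0]` raises `IndexError` for a venue without categories. A minimal sketch, assuming the response layout shown earlier (`response['groups'][0]['items']`), that moves the parsing into a hypothetical helper `parse_explore_items` so it can be tested offline on a hand-built response. The second venue below is made up to exercise the empty-category case.

```python
import pandas as pd

COLUMNS = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

def parse_explore_items(name, lat, lng, items):
    """Turn the 'items' list of one explore response into row tuples.

    Hypothetical helper: venues with an empty 'categories' list are skipped
    instead of raising IndexError.
    """
    rows = []
    for v in items:
        venue = v['venue']
        if not venue.get('categories'):
            continue
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     venue['categories'][0]['name']))
    return rows

# Hand-built items shaped like response['groups'][0]['items']
items = [
    {'venue': {'name': 'A&S Pizzeria',
               'location': {'lat': 40.64394, 'lng': -74.077626},
               'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'No Category Venue',  # hypothetical
               'location': {'lat': 40.645, 'lng': -74.078},
               'categories': []}},
]

df = pd.DataFrame(parse_explore_items('St. George', 40.644982, -74.079353, items),
                  columns=COLUMNS)
print(df)
```

Building the frame with `columns=COLUMNS` also copes with a run where no venues come back at all, whereas assigning `.columns` to an empty DataFrame raises a length-mismatch error.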

## Here we start pulling all the data pieces together

In [93]:
staten_island_venues = getNearbyVenues(names=staten_island_data['Neighborhood'],
                                       latitudes=staten_island_data['Latitude'],
                                       longitudes=staten_island_data['Longitude']
                                      )
print('done')

St. George
New Brighton
Stapleton
Rosebank
West Brighton
Grymes Hill
Todt Hill
South Beach
Port Richmond
Mariner's Harbor
Port Ivory
Castleton Corners
New Springville
Travis
New Dorp
Oakwood
Great Kills
Eltingville
Annadale
Woodrow
Tottenville
Tompkinsville
Silver Lake
Sunnyside
Park Hill
Westerleigh
Graniteville
Arlington
Arrochar
Grasmere
Old Town
Dongan Hills
Midland Beach
Grant City
New Dorp Beach
Bay Terrace
Huguenot
Pleasant Plains
Butler Manor
Charleston
Rossville
Arden Heights
Greenridge
Heartland Village
Chelsea
Bloomfield
Bulls Head
Richmond Town
Shore Acres
Clifton
Concord
Emerson Hill
Randall Manor
Howland Hook
Elm Park
Manor Heights
Willowbrook
Sandy Ground
Egbertville
Prince's Bay
Lighthouse Hill
Richmond Valley
Fox Hills
done


In [94]:
##examining the new data frame. We requested up to 1000 venue records, so the row count should be around that
print(staten_island_venues.shape)
staten_island_venues.head()

(830, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | St. George | 40.644982 | -74.079353 | A&S Pizzeria | 40.64394 | -74.077626 | Pizza Place |
| 1 | St. George | 40.644982 | -74.079353 | Beso | 40.643306 | -74.076508 | Tapas Restaurant |
| 2 | St. George | 40.644982 | -74.079353 | Staten Island September 11 Memorial | 40.646767 | -74.07651 | Monument / Landmark |
| 3 | St. George | 40.644982 | -74.079353 | Richmond County Bank Ballpark | 40.645056 | -74.076864 | Baseball Stadium |
| 4 | St. George | 40.644982 | -74.079353 | Shake Shack | 40.64366 | -74.075891 | Burger Joint |
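
The explore requests above used `radius=500`, which Foursquare interprets in meters. A small spot check, using the haversine formula on a spherical Earth (radius 6,371 km, an approximation), measures how far the first venue in the table sits from the St. George neighborhood center.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters, assuming a spherical Earth of radius 6,371 km."""
    r = 6_371_000
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# St. George center vs. A&S Pizzeria, values taken from the first row above
d = haversine_m(40.644982, -74.079353, 40.64394, -74.077626)
print(f'{d:.0f} m')
```

For A&S Pizzeria this comes to a little under 200 m, comfortably inside the search radius.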


## Before getting too deep into the analysis, it is a good idea to spot check the venue types within each neighborhood to see if any can be ruled out immediately

In [95]:
staten_island_venues.groupby('Venue Category').count()

| Venue Category | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|
| American Restaurant | 13 | 13 | 13 | 13 | 13 | 13 |
| Arcade | 1 | 1 | 1 | 1 | 1 | 1 |
| Art Gallery | 1 | 1 | 1 | 1 | 1 | 1 |
| Art Museum | 1 | 1 | 1 | 1 | 1 | 1 |
| Arts & Crafts Store | 1 | 1 | 1 | 1 | 1 | 1 |
| Asian Restaurant | 2 | 2 | 2 | 2 | 2 | 2 |
| Athletics & Sports | 4 | 4 | 4 | 4 | 4 | 4 |
| BBQ Joint | 2 | 2 | 2 | 2 | 2 | 2 |
| Bagel Shop | 22 | 22 | 22 | 22 | 22 | 22 |
| Bakery | 8 | 8 | 8 | 8 | 8 | 8 |


In [96]:
staten_island_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Annadale | 14 | 14 | 14 | 14 | 14 | 14 |
| Arden Heights | 4 | 4 | 4 | 4 | 4 | 4 |
| Arlington | 8 | 8 | 8 | 8 | 8 | 8 |
| Arrochar | 18 | 18 | 18 | 18 | 18 | 18 |
| Bay Terrace | 10 | 10 | 10 | 10 | 10 | 10 |
| Bloomfield | 4 | 4 | 4 | 4 | 4 | 4 |
| Bulls Head | 45 | 45 | 45 | 45 | 45 | 45 |
| Butler Manor | 6 | 6 | 6 | 6 | 6 | 6 |
| Castleton Corners | 19 | 19 | 19 | 19 | 19 | 19 |
| Charleston | 31 | 31 | 31 | 31 | 31 | 31 |
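
To move from raw counts toward the client's questions, the same data can be summarized per neighborhood as a pizza-place count alongside a count of competing beer venues. A sketch on a hypothetical slice of `staten_island_venues` (same column names, made-up rows); the set of categories counted as competition (`Beer Bar`, `Beer Garden`, `Brewery`) is an assumption, not something defined earlier in this notebook.

```python
import pandas as pd

# Hypothetical slice of staten_island_venues (same column names as above)
venues = pd.DataFrame({
    'Neighborhood': ['Hood A', 'Hood A', 'Hood A',
                     'Hood B', 'Hood B', 'Hood B', 'Hood C'],
    'Venue Category': ['Pizza Place', 'Beer Bar', 'Pizza Place',
                       'Pizza Place', 'Pizza Place', 'Bagel Shop', 'Pizza Place'],
})

# Categories treated as competing beer venues: an illustrative assumption
BEER_CATEGORIES = {'Beer Bar', 'Beer Garden', 'Brewery'}

# Flag each venue, then count flags per neighborhood
summary = (venues
           .assign(pizza=venues['Venue Category'].eq('Pizza Place'),
                   beer=venues['Venue Category'].isin(BEER_CATEGORIES))
           .groupby('Neighborhood')[['pizza', 'beer']]
           .sum()
           .astype(int))

# Keep neighborhoods with no competing beer venue, most pizza places first
candidates = summary[summary['beer'] == 0].sort_values('pizza', ascending=False)
print(candidates)
```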


In [98]:
print('There are {} unique categories.'.format(len(staten_island_venues['Venue Category'].unique())))

There are 180 unique categories.


#### Now we want to evaluate venues at the neighborhood level 

In [99]:
# one hot encoding
staten_island_onehot = pd.get_dummies(staten_island_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
staten_island_onehot['Neighborhood'] = staten_island_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [staten_island_onehot.columns[-1]] + list(staten_island_onehot.columns[:-1])
staten_island_onehot = staten_island_onehot[fixed_columns]

staten_island_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Beer Bar,Big Box Store,Board Shop,Boarding House,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Business Service,Cafeteria,Café,Cajun / Creole Restaurant,Campground,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comedy Club,Construction & Landscaping,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Dry Cleaner,Eastern European Restaurant,Event Space,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Flower Shop,Food,Food & Drink Shop,Food Truck,French Restaurant,Furniture / Home Store,Gas Station,Gastropub,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Lawyer,Liquor Store,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motorcycle Shop,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightlife Spot,Optical Shop,Outdoors & Recreation,Outlet Mall,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Pub,Racetrack,Recording Studio,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Russian 
Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toll Plaza,Tourist Information Center,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Yoga Studio
0,St. George,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,St. George,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,St. George,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,St. George,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,St. George,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [100]:
staten_island_onehot.shape


(830, 181)

#### Next we can start making groupings

In [151]:
## Group rows by neighborhood and take the mean of each one-hot column, i.e. the frequency of each category
staten_island_grouped = staten_island_onehot.groupby('Neighborhood').mean().reset_index()
staten_island_grouped

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Beer Bar,Big Box Store,Board Shop,Boarding House,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Business Service,Cafeteria,Café,Cajun / Creole Restaurant,Campground,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comedy Club,Construction & Landscaping,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Dry Cleaner,Eastern European Restaurant,Event Space,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Flower Shop,Food,Food & Drink Shop,Food Truck,French Restaurant,Furniture / Home Store,Gas Station,Gastropub,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Lawyer,Liquor Store,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motorcycle Shop,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightlife Spot,Optical Shop,Outdoors & Recreation,Outlet Mall,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Pub,Racetrack,Recording Studio,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Russian 
Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toll Plaza,Tourist Information Center,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Yoga Studio
0,Annadale,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.071429,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.214286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0
1,Arden Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Arlington,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Arrochar,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bay Terrace,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0
5,Bloomfield,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bulls Head,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.022222,0.0,0.0,0.0,0.044444,0.0,0.022222,0.0,0.0,0.022222,0.022222,0.0,0.044444,0.0,0.0,0.0,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.044444,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.044444,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.022222,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0
7,Butler Manor,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Castleton Corners,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.105263,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Charleston,0.032258,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.064516,0.0,0.0,0.0,0.064516,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.032258,0.0,0.032258,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0


In [102]:
staten_island_grouped.shape

(62, 181)
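The shape `(62, 181)` is consistent with one row per neighborhood that returned venues, plus one `Neighborhood` column and 180 venue-category columns. Each value is the share of that neighborhood's returned venues that fall in the category; Bloomfield's four `0.25` entries above, for example, correspond to four venues. The cell that built this table is not part of this section. The sketch below assumes it follows the usual one-hot-then-average approach, using a hypothetical long-format venues table with `Neighborhood` and `Venue Category` columns and toy rows:

```python
import pandas as pd

# Hypothetical long-format venue list: one row per venue returned by Foursquare (toy data).
venues = pd.DataFrame({
    "Neighborhood": ["Bloomfield", "Bloomfield", "Bloomfield", "Bloomfield", "Annadale", "Annadale"],
    "Venue Category": ["Theme Park", "Recreation Center", "Bus Stop", "Burger Joint", "Pizza Place", "Park"],
})

# One-hot encode the categories: one float column per category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Average per neighborhood: each value becomes the share of that neighborhood's venues in the category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Because each row is a set of shares, the category values in every row sum to 1. This lets neighborhoods with many venues and neighborhoods with few be compared on the same scale.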

### Now we can start looking at neighborhoods and their top venues

In [104]:
num_top_venues = 5

for hood in staten_island_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = staten_island_grouped[staten_island_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Annadale----
                 venue  freq
0          Pizza Place  0.21
1  American Restaurant  0.07
2                 Park  0.07
3           Restaurant  0.07
4        Deli / Bodega  0.07


----Arden Heights----
         venue  freq
0  Pizza Place  0.25
1     Bus Stop  0.25
2  Coffee Shop  0.25
3     Pharmacy  0.25
4         Pier  0.00


----Arlington----
                 venue  freq
0             Bus Stop  0.25
1  American Restaurant  0.12
2         Home Service  0.12
3        Boat or Ferry  0.12
4           Playground  0.12


----Arrochar----
                venue  freq
0  Italian Restaurant  0.11
1       Deli / Bodega  0.11
2            Bus Stop  0.11
3               Hotel  0.06
4      Sandwich Place  0.06


----Bay Terrace----
                venue  freq
0         Supermarket   0.2
1  Italian Restaurant   0.1
2       Train Station   0.1
3    Sushi Restaurant   0.1
4        Home Service   0.1


----Bloomfield----
                 venue  freq
0           Theme Park  0.25
1    Recr

In [105]:
# return the names of the num_top_venues most frequent venue categories in a grouped row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the Neighborhood column
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [116]:
# create the dataframe of the top venues for each neighborhood
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ordinals beyond 3rd take the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = staten_island_grouped['Neighborhood']

for ind in np.arange(staten_island_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(staten_island_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(10)

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Annadale | Pizza Place | American Restaurant | Sports Bar | Food | Park |
| 1 | Arden Heights | Coffee Shop | Pizza Place | Bus Stop | Pharmacy | Yoga Studio |
| 2 | Arlington | Bus Stop | American Restaurant | Boat or Ferry | Home Service | Playground |
| 3 | Arrochar | Deli / Bodega | Italian Restaurant | Bus Stop | Supermarket | Mediterranean Restaurant |
| 4 | Bay Terrace | Supermarket | Insurance Office | Italian Restaurant | Home Service | Train Station |
| 5 | Bloomfield | Recreation Center | Bus Stop | Burger Joint | Theme Park | Diner |
| 6 | Bulls Head | Pizza Place | Bus Stop | Sandwich Place | Gift Shop | Pharmacy |
| 7 | Butler Manor | Baseball Field | Pool | Convenience Store | Bus Stop | Yoga Studio |
| 8 | Castleton Corners | Pizza Place | Bank | Ice Cream Shop | Go Kart Track | Sandwich Place |
| 9 | Charleston | Big Box Store | Coffee Shop | Cosmetics Shop | Irish Pub | Grocery Store |
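The Arden Heights and Bloomfield rows show that the order among tied categories is not stable. The printed Arden Heights list puts Pizza Place first, while the table ranks Coffee Shop first. When a neighborhood has fewer than five categories, the remaining slots are filled with arbitrary zero-frequency categories: Pier in one output, Yoga Studio in the other. The sketch below shows a deterministic alternative. The helper `top_venues` is hypothetical and assumes a row indexed by category name with frequencies as values, that is, a grouped row without its leading `Neighborhood` entry:

```python
import numpy as np
import pandas as pd

def top_venues(row: pd.Series, n: int) -> list:
    """Return the n most frequent categories, breaking ties alphabetically.
    Slots with no nonzero category left are filled with NaN rather than an arbitrary category."""
    freqs = row[row > 0]
    # Sort by frequency (descending), then by category name (ascending), for a deterministic order.
    ordered = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    names = [name for name, _ in ordered[:n]]
    return names + [np.nan] * (n - len(names))

# Toy row mirroring Arden Heights: four categories tied at 0.25, everything else 0.
row = pd.Series({"Pizza Place": 0.25, "Bus Stop": 0.25, "Coffee Shop": 0.25,
                 "Pharmacy": 0.25, "Yoga Studio": 0.0, "Pier": 0.0})
result = top_venues(row, 5)
print(result)
```

To apply it to the whole table, the function could be called on each row of `staten_island_grouped.set_index('Neighborhood')`. An empty slot makes it clear that a neighborhood simply has fewer than five venue types, which avoids reading meaning into a zero-frequency "5th most common venue".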


#### We now have a dataframe that can be used to cluster neighborhoods by their most common venues and start identifying where to open up shop.

In [117]:
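from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

# drop the name column so that only the venue-frequency features are clustered
staten_island_grouped_clustering = staten_island_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 keeps the older scikit-learn default explicit)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(staten_island_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 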

array([2, 2, 1, 2, 2, 1, 2, 2, 2, 2], dtype=int32)
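The notebook sets `kclusters = 5` without showing how that value was chosen. The elbow method is one common heuristic: fit k-means for a range of k and look for the point where the within-cluster sum of squares (`inertia_`) stops dropping sharply. The sketch below uses random data as a stand-in for `staten_island_grouped_clustering` (same 62 rows, fewer columns). It is an illustration only, not the procedure used here:

```python
import numpy as np
from sklearn.cluster import KMeans

# Random stand-in for the venue-frequency matrix (rows = neighborhoods, columns = categories).
rng = np.random.default_rng(0)
X = rng.random((62, 10))

# Within-cluster sum of squares (inertia) for k = 1..10.
inertias = {}
for k in range(1, 11):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.3f}")
```

Inertia always tends to fall as k grows, so the aim is the bend in the curve rather than its minimum. On sparse venue-share data the bend can be shallow, which is one reason to sanity-check clusters against the venue tables below.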

In [123]:
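# add clustering labels (drop an existing column first so the cell can be re-run)
if 'Cluster Labels' in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop(columns='Cluster Labels')
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

staten_island_merged = staten_island_data.copy()

# join the cluster labels and top venues onto the neighborhood table, which already holds latitude/longitude
staten_island_merged = staten_island_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')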


### Note: running the cell a second time raised the error below because the `Cluster Labels` column had already been inserted into `neighborhoods_venues_sorted`

ValueError: cannot insert Cluster Labels, already exists

In [124]:
staten_island_merged.head(100)

| | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Staten Island | St. George | 40.644982 | -74.079353 | 2.0 | Clothing Store | Park | Sporting Goods Shop | Italian Restaurant | Bar |
| 1 | Staten Island | New Brighton | 40.640615 | -74.087017 | 1.0 | Deli / Bodega | Bus Stop | Park | Discount Store | Playground |
| 2 | Staten Island | Stapleton | 40.626928 | -74.077902 | 2.0 | Pizza Place | Discount Store | Mexican Restaurant | Bank | Bar |
| 3 | Staten Island | Rosebank | 40.615305 | -74.069805 | 2.0 | Italian Restaurant | Grocery Store | Cosmetics Shop | Discount Store | Café |
| 4 | Staten Island | West Brighton | 40.631879 | -74.107182 | 2.0 | Bank | Coffee Shop | Italian Restaurant | Music Store | Diner |
| 5 | Staten Island | Grymes Hill | 40.624185 | -74.087248 | 0.0 | Deli / Bodega | Dog Run | Fast Food Restaurant | French Restaurant | Food Truck |
| 6 | Staten Island | Todt Hill | 40.597069 | -74.111329 | 4.0 | Trail | Park | Yoga Studio | Farmers Market | Food Truck |
| 7 | Staten Island | South Beach | 40.580247 | -74.079553 | 0.0 | Deli / Bodega | Pier | Athletics & Sports | Beach | Fast Food Restaurant |
| 8 | Staten Island | Port Richmond | 40.633669 | -74.129434 | 2.0 | Rental Car Location | Food | Donut Shop | Pizza Place | Bus Stop |
| 9 | Staten Island | Mariner's Harbor | 40.632546 | -74.150085 | 0.0 | Deli / Bodega | Italian Restaurant | Supermarket | Athletics & Sports | Fast Food Restaurant |


#### While reviewing the dataframes ahead of the cluster visualization, we found that one neighborhood (Howland Hook) returned no venues, so it was dropped.

In [130]:
## drop the row with no venue data (Howland Hook): its cluster label is NaN after the join
staten_island_clean = staten_island_merged.dropna(subset=['Cluster Labels'])
staten_island_clean.head(100)



| | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Staten Island | St. George | 40.644982 | -74.079353 | 2.0 | Clothing Store | Park | Sporting Goods Shop | Italian Restaurant | Bar |
| 1 | Staten Island | New Brighton | 40.640615 | -74.087017 | 1.0 | Deli / Bodega | Bus Stop | Park | Discount Store | Playground |
| 2 | Staten Island | Stapleton | 40.626928 | -74.077902 | 2.0 | Pizza Place | Discount Store | Mexican Restaurant | Bank | Bar |
| 3 | Staten Island | Rosebank | 40.615305 | -74.069805 | 2.0 | Italian Restaurant | Grocery Store | Cosmetics Shop | Discount Store | Café |
| 4 | Staten Island | West Brighton | 40.631879 | -74.107182 | 2.0 | Bank | Coffee Shop | Italian Restaurant | Music Store | Diner |
| 5 | Staten Island | Grymes Hill | 40.624185 | -74.087248 | 0.0 | Deli / Bodega | Dog Run | Fast Food Restaurant | French Restaurant | Food Truck |
| 6 | Staten Island | Todt Hill | 40.597069 | -74.111329 | 4.0 | Trail | Park | Yoga Studio | Farmers Market | Food Truck |
| 7 | Staten Island | South Beach | 40.580247 | -74.079553 | 0.0 | Deli / Bodega | Pier | Athletics & Sports | Beach | Fast Food Restaurant |
| 8 | Staten Island | Port Richmond | 40.633669 | -74.129434 | 2.0 | Rental Car Location | Food | Donut Shop | Pizza Place | Bus Stop |
| 9 | Staten Island | Mariner's Harbor | 40.632546 | -74.150085 | 0.0 | Deli / Bodega | Italian Restaurant | Supermarket | Athletics & Sports | Fast Food Restaurant |
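Instead of hard-coding a positional index such as 53, the neighborhoods with no venues can be found by comparing the full neighborhood list with the venue table, and then dropped by value. The sketch below uses hypothetical miniature tables. The neighborhood names and the two cluster labels come from the outputs above, and the three-row subset is for illustration only:

```python
import pandas as pd

# Hypothetical miniature tables: every neighborhood in the borough,
# and the subset that returned at least one venue (with its cluster label).
all_hoods = pd.DataFrame({"Neighborhood": ["St. George", "Howland Hook", "Annadale"]})
with_venues = pd.DataFrame({"Neighborhood": ["Annadale", "St. George"],
                            "Cluster Labels": [2, 2]})

# Neighborhoods missing from the venue table returned no venues.
missing = sorted(set(all_hoods["Neighborhood"]) - set(with_venues["Neighborhood"]))
print(missing)

# After a left join those rows carry NaN labels, so they can be dropped by value.
merged = all_hoods.merge(with_venues, on="Neighborhood", how="left")
clean = merged.dropna(subset=["Cluster Labels"])
print(clean)
```

Dropping by value keeps the cell correct if the neighborhood list is reordered or another neighborhood also comes back empty.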


## Now we have our data set in one frame, along with the cluster labels, and can visualize the clusters

In [131]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np

# create map centered on Staten Island (latitude/longitude defined earlier in the notebook)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# one distinct color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its (0-based) cluster label
for lat, lon, poi, cluster in zip(staten_island_clean['Latitude'], staten_island_clean['Longitude'], staten_island_clean['Neighborhood'], staten_island_clean['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

## Now we can examine each cluster and identify the venue categories that distinguish it. Based on these defining categories, the clusters provide enough information to deliver a recommendation

## Cluster 1 (label 0)

In [134]:
staten_island_merged.loc[staten_island_merged['Cluster Labels'] == 0, staten_island_merged.columns[[1] + list(range(5, staten_island_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 5 | Grymes Hill | Deli / Bodega | Dog Run | Fast Food Restaurant | French Restaurant | Food Truck |
| 7 | South Beach | Deli / Bodega | Pier | Athletics & Sports | Beach | Fast Food Restaurant |
| 9 | Mariner's Harbor | Deli / Bodega | Italian Restaurant | Supermarket | Athletics & Sports | Fast Food Restaurant |


## Cluster 2 (label 1)

In [136]:
staten_island_merged.loc[staten_island_merged['Cluster Labels'] == 1, staten_island_merged.columns[[1] + list(range(5, staten_island_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 1 | New Brighton | Deli / Bodega | Bus Stop | Park | Discount Store | Playground |
| 24 | Park Hill | Bus Stop | Gym / Fitness Center | Hotel | Athletics & Sports | Coffee Shop |
| 26 | Graniteville | Bus Stop | Boat or Ferry | Grocery Store | Yoga Studio | Gas Station |
| 27 | Arlington | Bus Stop | American Restaurant | Boat or Ferry | Home Service | Playground |
| 45 | Bloomfield | Recreation Center | Bus Stop | Burger Joint | Theme Park | Diner |
| 56 | Willowbrook | Bus Stop | Chinese Restaurant | Intersection | Bagel Shop | Filipino Restaurant |
| 62 | Fox Hills | Sandwich Place | Bus Stop | Yoga Studio | Farmers Market | Food Truck |


## Cluster 3 (label 2)

In [140]:
staten_island_merged.loc[staten_island_merged['Cluster Labels'] == 2, staten_island_merged.columns[[2] + list(range(5, staten_island_merged.shape[1]))]]

| | Latitude | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | 40.644982 | Clothing Store | Park | Sporting Goods Shop | Italian Restaurant | Bar |
| 2 | 40.626928 | Pizza Place | Discount Store | Mexican Restaurant | Bank | Bar |
| 3 | 40.615305 | Italian Restaurant | Grocery Store | Cosmetics Shop | Discount Store | Café |
| 4 | 40.631879 | Bank | Coffee Shop | Italian Restaurant | Music Store | Diner |
| 8 | 40.633669 | Rental Car Location | Food | Donut Shop | Pizza Place | Bus Stop |
| 11 | 40.613336 | Pizza Place | Bank | Ice Cream Shop | Go Kart Track | Sandwich Place |
| 12 | 40.594252 | Pizza Place | Mobile Phone Shop | Bagel Shop | Chinese Restaurant | Coffee Shop |
| 13 | 40.586314 | Hotel | Deli / Bodega | Bowling Alley | Gym / Fitness Center | Spanish Restaurant |
| 14 | 40.572572 | Italian Restaurant | Pizza Place | Dessert Shop | Bakery | Sandwich Place |
| 15 | 40.558462 | Bar | Nightlife Spot | Lawyer | Playground | Farmers Market |


## Cluster 4 (label 3)

In [139]:
staten_island_merged.loc[staten_island_merged['Cluster Labels'] == 3, staten_island_merged.columns[[1] + list(range(5, staten_island_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 10 | Port Ivory | Bus Station | Yoga Studio | Gas Station | French Restaurant | Food Truck |


## Cluster 5 (label 4)

In [141]:
staten_island_merged.loc[staten_island_merged['Cluster Labels'] == 4, staten_island_merged.columns[[1] + list(range(5, staten_island_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 6 | Todt Hill | Trail | Park | Yoga Studio | Farmers Market | Food Truck |
| 52 | Randall Manor | Playground | Pizza Place | Bus Stop | Park | Yoga Studio |
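The cluster cells above select columns by position, which is easy to get wrong. The Cluster 3 cell uses index 2 (`Latitude`) where the others use index 1 (`Neighborhood`), so its table shows latitudes instead of names. Selecting columns by name avoids this. The sketch below uses a hypothetical helper, `cluster_members`, and a toy frame that copies three rows (two venue ranks only) from the merged table above. It also counts neighborhoods per cluster:

```python
import pandas as pd

def cluster_members(df: pd.DataFrame, label: int) -> pd.DataFrame:
    """Rows of one cluster, showing the neighborhood name and its ranked venue columns.
    Columns are selected by name, so the result does not depend on column positions."""
    venue_cols = [c for c in df.columns if c.endswith("Most Common Venue")]
    return df.loc[df["Cluster Labels"] == label, ["Neighborhood"] + venue_cols]

# Toy frame with the same column layout as staten_island_merged (values copied from the outputs above).
df = pd.DataFrame({
    "Borough": ["Staten Island"] * 3,
    "Neighborhood": ["St. George", "New Brighton", "Stapleton"],
    "Latitude": [40.644982, 40.640615, 40.626928],
    "Longitude": [-74.079353, -74.087017, -74.077902],
    "Cluster Labels": [2.0, 1.0, 2.0],
    "1st Most Common Venue": ["Clothing Store", "Deli / Bodega", "Pizza Place"],
    "2nd Most Common Venue": ["Park", "Bus Stop", "Discount Store"],
})

members = cluster_members(df, 2)
sizes = df["Cluster Labels"].value_counts().sort_index()  # neighborhoods per cluster
print(members)
print(sizes)
```

On the real data, `cluster_members(staten_island_merged, 2)` would list the Cluster 3 neighborhoods by name. The cluster sizes make it easy to spot near-empty clusters such as Cluster 4, which contains only Port Ivory.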


## Results: Interpreting the Clustering Model

Our most important criterion is the density of pizza places, followed by other restaurant types, the absence of competing breweries, and potentially access to public transportation. Cluster 3 groups together the neighborhoods whose most common venues best match these interests:

- Cluster 3 is the only cluster with neighborhoods where Pizza Place is the most common venue.
- Of the neighborhoods where pizza is the most common venue, only three have restaurants as both of their top two venues.
- Eltingville and Annadale are the two neighborhoods that meet the most criteria.
- We recommend selecting either of these two neighborhoods, with the nod going to Annadale, because it is close to a park that could host the beer garden, depending on zoning.
- The two neighborhoods are close to each other, so whichever one is selected, customers will likely order from the same pizza places.
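The selection logic above (favor pizza density and exclude neighborhoods that already have a beer garden) can be made explicit as a ranking. The sketch below assumes three things: the grouped frequency table has a `Pizza Place` column and possibly a `Beer Garden` column, named after Foursquare categories; cluster labels are available as a Series indexed by neighborhood; and the function `rank_candidates` is hypothetical. The neighborhood names and shares are invented toy values:

```python
import pandas as pd

def rank_candidates(grouped: pd.DataFrame, labels: pd.Series, cluster: int,
                    target: str = "Pizza Place", competitor: str = "Beer Garden") -> pd.DataFrame:
    """Rank the neighborhoods of one cluster by the share of `target` venues,
    excluding any neighborhood where at least one `competitor` venue was returned."""
    df = grouped.set_index("Neighborhood")
    # A category that Foursquare never returned has no column at all; treat it as a share of 0.
    competitor_share = df.get(competitor, pd.Series(0.0, index=df.index))
    in_cluster = labels.reindex(df.index) == cluster
    keep = in_cluster & (competitor_share == 0)
    return df.loc[keep, [target]].sort_values(target, ascending=False)

# Toy data with hypothetical neighborhood names and invented shares.
grouped = pd.DataFrame({
    "Neighborhood": ["Hood A", "Hood B", "Hood C", "Hood D"],
    "Pizza Place": [0.15, 0.30, 0.25, 0.40],
    "Beer Garden": [0.0, 0.0, 0.05, 0.0],
})
labels = pd.Series({"Hood A": 2, "Hood B": 2, "Hood C": 2, "Hood D": 1})

result = rank_candidates(grouped, labels, cluster=2)
print(result)
```

Hood C is excluded for its beer garden and Hood D because it sits in another cluster, leaving Hood B ahead of Hood A. Turning the criteria into an explicit filter and sort keeps the recommendation reproducible if the venue data is refreshed.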



## Discussion

Data science, together with key resources such as Foursquare's venue data, can support decision making in many ways, from choosing a neighborhood to move into to selecting a site for a new business. In this exercise we explored selecting a location for a new business based on a few key criteria. Data science tools and ever-growing data sets make decisions like this much easier. Where a painstaking survey of neighborhoods and curation of the resulting data would previously have been required, here we were able to build a clustering model with a few keystrokes (relatively speaking). Following the data science approach, we identified a business need, a problem-solving approach, and the data required to solve the problem. We then iterated through the process, reviewed our outputs, and delivered a business recommendation backed by data and a clear line of reasoning. Using the available data, we segmented a large metropolitan area into its constituent neighborhoods and identified the best place for a new venue whose success will depend on the venues around it.

## Conclusion

In this project we worked with a new company, Beerizza Inc., to determine the best location for a new beer garden. The client picked New York City for its pizza history and asked that the search be limited to Staten Island. We used location data for New York City neighborhoods and venue data from Foursquare to explore neighborhoods and venues against our criteria: a high density of pizza restaurants, then of other fast food places, and an absence of beer gardens. We used k-means clustering to build our model. The model showed that Eltingville and Annadale are both likely to be good locations for a new beer garden under the criteria given. Both neighborhoods have a high density of pizza places, fast food restaurants, and other activities, and because they are neighbors, either choice would be adequate.