# Staten Island Battle of Neighborhoods

## Introduction

##### In this project I consider a couple's request to find, among the 63 neighborhoods of Staten Island, the one that best matches their search criteria.

##### The **three features** I have to bear in mind, in order of importance, are:
##### a. the neighborhood needs to have an excellent school
##### b. the housing price needs to fit their budget
##### c. there should be no venue in the neighborhood that already caters for vegetarians, as the couple would like to start their own business and avoid competition.

##### In this project, I will analyze Staten Island, New York, by converting addresses into their equivalent latitude and longitude values and by using the Foursquare API to explore the neighborhoods. I will use the **explore** function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters with the *k*-means clustering algorithm. The **Folium library** will then be used to visualize the neighborhoods in Staten Island and their emerging clusters. Furthermore, I will provide a DataFrame with the **best schools** and the **average housing price per neighborhood**. The final stage will be identifying the neighborhood that does not have a **Vegetarian Restaurant**.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in Staten Island</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>   
    
6. <a href="#item6">Focus on the features required by the clients</a>
    
7. <a href="#item7">Conclusion and recommendations</a>
    
</font>
</div>

Before we get the data and start exploring it, let's install and import all the dependencies that we will need.

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.0.2p             |       h470a237_1         3.1 MB  conda-forge
    certifi-2018.10.15         |        py36_1000         138 KB  conda-forge
    geopy-1.17.0               |             py_0          49 KB  conda-forge
    ca-certificates-2018.10.15 |       ha4d7672_0         135 KB  conda-forge
    conda-4.5.11               |        py36_1000         651 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.1 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0            conda-forge
    geopy:           

<a id='item1'></a>

## 1. Download and Explore Dataset

St. George, Staten Island, is part of New York City, so I can conveniently use a dataset that is freely available on the web. Here is the link to the dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

For the student's convenience, the instructor of this course has already downloaded the file and placed it on the IBM server, so I can simply run a `wget` command and access the data. 

In [17]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

In [18]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [4]:
# for a quick look at the data
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

All the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [19]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

In [6]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
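
As a small sketch of how the fields of one such feature can be pulled out, the snippet below works on a hand-written sample shaped like the item shown above. The helper name `feature_to_row` is my own and not part of the dataset or of any library. Note that GeoJSON stores point coordinates as longitude first, then latitude.

```python
# A hand-written sample shaped like one entry of newyork_data['features'].
feature = {
    'type': 'Feature',
    'geometry': {'type': 'Point',
                 'coordinates': [-73.84720052054902, 40.89470517661]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}


def feature_to_row(feature):
    """Return (borough, neighborhood, latitude, longitude) for one feature.

    Hypothetical helper, written only for this illustration.
    """
    props = feature['properties']
    # GeoJSON order is [longitude, latitude], so unpack in that order.
    lon, lat = feature['geometry']['coordinates']
    return props['borough'], props['name'], lat, lon


row = feature_to_row(feature)
print(row)
```

Keeping the unpacking in one place makes it harder to swap latitude and longitude by accident when the rows are later written into a dataframe.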

#### Transform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [20]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Let's look at the empty dataframe to confirm that the columns are as intended.

In [21]:
neighborhoods

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|


It is time to loop through the data, collect one row per neighborhood, and build the dataframe from those rows.

In [22]:
# collect the rows first; DataFrame.append was removed in pandas 2.0
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

Examine the resulting dataframe.

In [23]:
neighborhoods.tail()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 301 | Manhattan | Hudson Yards | 40.756658 | -74.000111 |
| 302 | Queens | Hammels | 40.587338 | -73.80553 |
| 303 | Queens | Bayswater | 40.611322 | -73.765968 |
| 304 | Queens | Queensbridge | 40.756091 | -73.945631 |
| 305 | Staten Island | Fox Hills | 40.617311 | -74.08174 |


And make sure that the dataset has all 5 boroughs and 306 neighborhoods.

In [11]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


#### Use the geopy library to get the latitude and longitude values of New York City.

In [24]:
address = 'New York City, NY'

# recent geopy releases require an explicit user_agent; the name below is an arbitrary placeholder
geolocator = Nominatim(user_agent='staten_island_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))



The geographical coordinates of New York City are 40.7308619, -73.9871558.
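
The geocoding step can fail quietly, because geopy's `geocode` returns `None` when it finds no match. Here is a minimal sketch of a guard around that call. The function `geocode_or_fail` and the offline stand-in geocoder are assumptions made for illustration; in the notebook one would pass the `geocode` method of the `Nominatim` object instead.

```python
from collections import namedtuple


def geocode_or_fail(geocode, address):
    """Call geocode(address) and return (latitude, longitude).

    `geocode` is any callable that returns an object with .latitude and
    .longitude attributes, or None when the address is not found.
    Hypothetical helper, written only for this illustration.
    """
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude


# Offline stand-in so the sketch runs without network access.
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])
known = {'Staten Island, NY': FakeLocation(40.5834557, -74.1496048)}

coords = geocode_or_fail(known.get, 'Staten Island, NY')
print(coords)
```

Failing loudly here avoids an `AttributeError` further down, where `location.latitude` would otherwise be read from `None`.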


#### Create a map of New York with neighborhoods superimposed on top.

In [13]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### However, since my project concerns only Staten Island, I can narrow the map above and segment and cluster only the neighborhoods in this borough. So I will slice the original dataframe and create a new dataframe, `StGeorge_data`, that holds the Staten Island data.

In [25]:
StGeorge_data = neighborhoods[neighborhoods['Borough'] == 'Staten Island'].reset_index(drop=True)
StGeorge_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Staten Island | St. George | 40.644982 | -74.079353 |
| 1 | Staten Island | New Brighton | 40.640615 | -74.087017 |
| 2 | Staten Island | Stapleton | 40.626928 | -74.077902 |
| 3 | Staten Island | Rosebank | 40.615305 | -74.069805 |
| 4 | Staten Island | West Brighton | 40.631879 | -74.107182 |


Let's get the geographical coordinates of Staten Island.

In [26]:
address = 'Staten Island, NY'

# recent geopy releases require an explicit user_agent; the name below is an arbitrary placeholder
geolocator = Nominatim(user_agent='staten_island_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Staten Island are {}, {}.'.format(latitude, longitude))



The geographical coordinates of Staten Island are 40.5834557, -74.1496048.


## Let's visualize the neighborhoods of Staten Island on a map, to have a better grasp of the data in space.

In [27]:
# create map of Staten Island using latitude and longitude values
map_StGeorge = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(StGeorge_data['Latitude'], StGeorge_data['Longitude'], StGeorge_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_StGeorge)  
    
map_StGeorge

## It's time to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version
##### For safety reasons, the following code cell is hidden with **# @hidden_cell**.

In [28]:
# @hidden_cell
CLIENT_ID = 'APZK5RIP2EXYOCWIYIKWT0HUQTW0RY5SZI4CSQAAP1GLHEB4' # your Foursquare ID
CLIENT_SECRET = 'KU2QIFH2KK5OYITMWUOP23ARPMQIW0BU5LNARWZJS35IDPQN' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: APZK5RIP2EXYOCWIYIKWT0HUQTW0RY5SZI4CSQAAP1GLHEB4
CLIENT_SECRET:KU2QIFH2KK5OYITMWUOP23ARPMQIW0BU5LNARWZJS35IDPQN


#### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [29]:
StGeorge_data.loc[0, 'Neighborhood']

'St. George'

Get the neighborhood's latitude and longitude values.

In [30]:
neighborhood_latitude = StGeorge_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = StGeorge_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = StGeorge_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of St. George are 40.6449815710044, -74.07935312512797.


#### Now, let's get the top 100 venues that are in St. George within a radius of 500 meters.

First, let's create the GET request URL. Name your URL **url**.

In [31]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=APZK5RIP2EXYOCWIYIKWT0HUQTW0RY5SZI4CSQAAP1GLHEB4&client_secret=KU2QIFH2KK5OYITMWUOP23ARPMQIW0BU5LNARWZJS35IDPQN&v=20180605&ll=40.6449815710044,-74.07935312512797&radius=500&limit=100'
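
The request URL above is assembled by hand with `str.format`. As a hedged alternative, the query string can be built from a dictionary with the standard library's `urlencode`, which escapes any unusual characters. The function name `build_explore_url` and the placeholder credentials `MY_ID` and `MY_SECRET` are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore request URL from named parameters.

    Hypothetical helper, written only for this illustration.
    """
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the 'll' value readable instead of %2C.
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')


example_url = build_explore_url('MY_ID', 'MY_SECRET', '20180605',
                                40.6449815710044, -74.07935312512797, 500, 100)
print(example_url)
```

Naming each parameter makes it harder to pass the positional arguments in the wrong order, which is easy to do with seven `{}` placeholders.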

Send the GET request and examine the results.

In [32]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5bd0ef491ed2194287c9789f'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 23,
  'suggestedBounds': {'ne': {'lat': 40.6494815755044,
    'lng': -74.07343346476772},
   'sw': {'lat': 40.6404815665044, 'lng': -74.08527278548821}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e62c75a483bd9a9747d8cd8',
       'name': 'Richmond County Bank Ballpark',
       'location': {'address': '75 Richmond Ter',
        'crossStreet': 'at Wall St',
        'lat': 40.645055836227534,
        'lng': 

### All the information is in the *items* key. Before I proceed, I will borrow the **get_category_type** function from a Foursquare lab done earlier in this course.

In [33]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened rows store the list under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now it is time to clean the JSON and structure it into a *pandas* dataframe.

In [34]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Richmond County Bank Ballpark | Baseball Stadium | 40.645056 | -74.076864 |
| 1 | Beso | Tapas Restaurant | 40.643306 | -74.076508 |
| 2 | Staten Island September 11 Memorial | Monument / Landmark | 40.646767 | -74.07651 |
| 3 | A&S Pizzeria | Pizza Place | 40.64394 | -74.077626 |
| 4 | Enoteca Maria | Italian Restaurant | 40.641941 | -74.07732 |
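
To make the flattening step easier to follow, here is a minimal sketch on two hand-written items shaped like the entries under the *items* key. It assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`; the second venue is invented to show what happens when the category list is empty.

```python
import pandas as pd

# Two hand-written items shaped like entries of the 'items' list.
items = [
    {'venue': {'name': 'Beso',
               'categories': [{'name': 'Tapas Restaurant'}],
               'location': {'lat': 40.643306, 'lng': -74.076508}}},
    {'venue': {'name': 'Unnamed Spot',  # hypothetical venue with no category
               'categories': [],
               'location': {'lat': 40.64, 'lng': -74.07}}},
]

# Nested dictionaries become dotted column names; lists are kept as-is.
flat = pd.json_normalize(items)
flat = flat[['venue.name', 'venue.categories',
             'venue.location.lat', 'venue.location.lng']].copy()

# Keep only the name of the first category, or None when the list is empty.
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)

# Drop the 'venue.' and 'venue.location.' prefixes from the column names.
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```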


How many venues were returned by Foursquare?

In [31]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

23 venues were returned by Foursquare.


<a id='item2'></a>

## 2. Explore Neighborhoods in Staten Island.

#### Let's create a function to repeat the same process for all the neighborhoods in Staten Island.

In [35]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
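
The function above indexes straight into the response, so a failed request or a venue without categories would raise an exception part-way through the loop. The sketch below shows one possible defensive version of the extraction step, working on an already parsed response. The helper name `extract_venues` and both sample payloads are assumptions made for illustration, not part of the Foursquare API.

```python
def extract_venues(payload, name, lat, lng):
    """Turn one parsed explore response into a list of row tuples.

    Hypothetical helper: returns an empty list when the response has no
    groups, and uses None when a venue has no category.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


# Hand-written payload shaped like a successful response.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Beso',
               'location': {'lat': 40.643306, 'lng': -74.076508},
               'categories': [{'name': 'Tapas Restaurant'}]}},
    {'venue': {'name': 'No Category Spot',
               'location': {'lat': 40.64, 'lng': -74.07},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'St. George', 40.64, -74.08)
print(rows)
```

Rows produced this way have the same seven fields, in the same order, as the tuples built inside `getNearbyVenues`.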

#### Now I have to create a new dataframe called *StatenIsland_venues*.

In [36]:

StatenIsland_venues = getNearbyVenues(names=StGeorge_data['Neighborhood'],
                                   latitudes=StGeorge_data['Latitude'],
                                   longitudes=StGeorge_data['Longitude']
                                  )

St. George
New Brighton
Stapleton
Rosebank
West Brighton
Grymes Hill
Todt Hill
South Beach
Port Richmond
Mariner's Harbor
Port Ivory
Castleton Corners
New Springville
Travis
New Dorp
Oakwood
Great Kills
Eltingville
Annadale
Woodrow
Tottenville
Tompkinsville
Silver Lake
Sunnyside
Park Hill
Westerleigh
Graniteville
Arlington
Arrochar
Grasmere
Old Town
Dongan Hills
Midland Beach
Grant City
New Dorp Beach
Bay Terrace
Huguenot
Pleasant Plains
Butler Manor
Charleston
Rossville
Arden Heights
Greenridge
Heartland Village
Chelsea
Bloomfield
Bulls Head
Richmond Town
Shore Acres
Clifton
Concord
Emerson Hill
Randall Manor
Howland Hook
Elm Park
Manor Heights
Willowbrook
Sandy Ground
Egbertville
Prince's Bay
Lighthouse Hill
Richmond Valley
Fox Hills


#### Check the size of the resulting dataframe

In [34]:
print(StatenIsland_venues.shape)
StatenIsland_venues.head()

(807, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | St. George | 40.644982 | -74.079353 | Richmond County Bank Ballpark | 40.645056 | -74.076864 | Baseball Stadium |
| 1 | St. George | 40.644982 | -74.079353 | Beso | 40.643306 | -74.076508 | Tapas Restaurant |
| 2 | St. George | 40.644982 | -74.079353 | Staten Island September 11 Memorial | 40.646767 | -74.07651 | Monument / Landmark |
| 3 | St. George | 40.644982 | -74.079353 | A&S Pizzeria | 40.64394 | -74.077626 | Pizza Place |
| 4 | St. George | 40.644982 | -74.079353 | Enoteca Maria | 40.641941 | -74.07732 | Italian Restaurant |


Let's check how many venues were returned for each neighborhood

In [35]:
StatenIsland_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Annadale | 9 | 9 | 9 | 9 | 9 | 9 |
| Arden Heights | 4 | 4 | 4 | 4 | 4 | 4 |
| Arlington | 4 | 4 | 4 | 4 | 4 | 4 |
| Arrochar | 18 | 18 | 18 | 18 | 18 | 18 |
| Bay Terrace | 10 | 10 | 10 | 10 | 10 | 10 |
| Bloomfield | 4 | 4 | 4 | 4 | 4 | 4 |
| Bulls Head | 46 | 46 | 46 | 46 | 46 | 46 |
| Butler Manor | 5 | 5 | 5 | 5 | 5 | 5 |
| Castleton Corners | 13 | 13 | 13 | 13 | 13 | 13 |
| Charleston | 30 | 30 | 30 | 30 | 30 | 30 |


#### How many unique categories can be curated from all the returned venues?

In [36]:
print('There are {} unique categories.'.format(len(StatenIsland_venues['Venue Category'].unique())))

There are 169 unique categories.


<a id='item3'></a>

## 3. Analyze Each Neighborhood in Staten Island

In [37]:
# one hot encoding
StatenIsland_onehot = pd.get_dummies(StatenIsland_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
StatenIsland_onehot['Neighborhood'] = StatenIsland_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [StatenIsland_onehot.columns[-1]] + list(StatenIsland_onehot.columns[:-1])
StatenIsland_onehot = StatenIsland_onehot[fixed_columns]

StatenIsland_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Beer Bar,Big Box Store,Board Shop,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Burger Joint,Bus Station,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,Campground,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comedy Club,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Eastern European Restaurant,Event Space,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Flower Shop,Food,Food & Drink Shop,Food Truck,French Restaurant,Furniture / Home Store,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Health & Beauty Service,History Museum,Hobby Shop,Home Service,Hookah Bar,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Laundromat,Liquor Store,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Optical Shop,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Pub,Recording Studio,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Soup Place,Spa,Spanish Restaurant,Sporting 
Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toll Plaza,Tourist Information Center,Toy / Game Store,Trail,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,St. George,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,St. George,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,St. George,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,St. George,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,St. George,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [38]:
StatenIsland_onehot.shape

(807, 170)

#### I will group the rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [38]:
StatenIsland_grouped = StatenIsland_onehot.groupby('Neighborhood').mean().reset_index()
StatenIsland_grouped.tail()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Beer Bar,Big Box Store,Board Shop,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Burger Joint,Bus Station,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,Campground,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comedy Club,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Eastern European Restaurant,Event Space,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Flower Shop,Food,Food & Drink Shop,Food Truck,French Restaurant,Furniture / Home Store,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Health & Beauty Service,History Museum,Hobby Shop,Home Service,Hookah Bar,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Laundromat,Liquor Store,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Optical Shop,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Pub,Recording Studio,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Soup Place,Spa,Spanish Restaurant,Sporting 
Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toll Plaza,Tourist Information Center,Toy / Game Store,Trail,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
58,Travis,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
59,West Brighton,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.027027,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.081081,0.027027,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0
60,Westerleigh,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
61,Willowbrook,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.6,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
62,Woodrow,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Confirm the new size

In [41]:
StatenIsland_grouped.shape

(63, 170)
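
A tiny worked example may help show why the mean of the one-hot columns is a frequency. The neighborhoods `A` and `B` and their venues below are invented for illustration. `dtype=int` is passed so the dummy columns are plain 0/1 integers regardless of the pandas version.

```python
import pandas as pd

# Invented data: neighborhood A has three venues, neighborhood B has one.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Pizza Place', 'Pizza Place', 'Park', 'Park'],
})

# One row per venue, one 0/1 column per category.
onehot = pd.get_dummies(venues[['Venue Category']],
                        prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column within a group is the share of that group's
# venues that fall in the category.
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

For neighborhood `A`, two of the three venues are pizza places, so its `Pizza Place` value is 2/3 and its `Park` value is 1/3. Each row of the grouped frame sums to 1.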

#### Print each neighborhood along with the top 5 most common venues

In [42]:
num_top_venues = 5

for hood in StatenIsland_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = StatenIsland_grouped[StatenIsland_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Annadale----
           venue  freq
0          Diner  0.11
1  Train Station  0.11
2     Restaurant  0.11
3     Sports Bar  0.11
4    Pizza Place  0.11


----Arden Heights----
               venue  freq
0           Pharmacy  0.25
1           Bus Stop  0.25
2        Coffee Shop  0.25
3        Pizza Place  0.25
4  Accessories Store  0.00


----Arlington----
                 venue  freq
0             Bus Stop  0.50
1  American Restaurant  0.25
2        Deli / Bodega  0.25
3    Accessories Store  0.00
4                Plaza  0.00


----Arrochar----
                venue  freq
0         Pizza Place  0.11
1  Italian Restaurant  0.11
2       Deli / Bodega  0.11
3            Bus Stop  0.11
4        Liquor Store  0.06


----Bay Terrace----
                venue  freq
0  Italian Restaurant   0.2
1         Supermarket   0.2
2    Sushi Restaurant   0.1
3  Salon / Barbershop   0.1
4    Insurance Office   0.1


----Bloomfield----
               venue  freq
0     Discount Store  0.25
1  Recreation

#### To put all the previously acquired information into a *pandas* dataframe, I need a function that sorts the venues by frequency.

In [43]:
# this function will sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
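As a quick check of how this helper behaves, here is a minimal sketch that applies it to a single made-up row. The neighborhood name `Toyville` and its frequencies are hypothetical; only the function itself comes from the notebook.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and keep the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row, shaped like one row of StatenIsland_grouped
toy_row = pd.Series({
    'Neighborhood': 'Toyville',
    'Bakery': 0.1,
    'Bus Stop': 0.5,
    'Park': 0.0,
    'Pizza Place': 0.4,
})

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```

Note that the function always returns `num_top_venues` names, so a neighborhood with fewer distinct categories than that is padded with categories whose frequency is 0.0. This is why rows such as Westerleigh, with only two non-zero categories, still show ten "most common" venues.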

I create the new dataframe and display the top 10 venues for each neighborhood.

In [45]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = StatenIsland_grouped['Neighborhood']

for ind in np.arange(StatenIsland_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(StatenIsland_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.tail()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
58,Travis,Deli / Bodega,Bowling Alley,Gym / Fitness Center,Donut Shop,Comedy Club,Pizza Place,Café,Park,Spanish Restaurant,Sports Club
59,West Brighton,Coffee Shop,Italian Restaurant,Music Store,Diner,Pharmacy,Bar,German Restaurant,Supermarket,Salon / Barbershop,Mexican Restaurant
60,Westerleigh,Arcade,Convenience Store,Women's Store,Falafel Restaurant,Furniture / Home Store,French Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop
61,Willowbrook,Bus Stop,Spa,Chinese Restaurant,Women's Store,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop
62,Woodrow,Grocery Store,Pharmacy,Sushi Restaurant,Mexican Restaurant,Coffee Shop,Miscellaneous Shop,Chinese Restaurant,Diner,Liquor Store,Bakery
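The column labels in this table rely on a `try`/`except` around a three-item suffix list. A possible alternative, shown here only as a sketch, is a small helper that returns the ordinal directly. The function name `ordinal` is my own and is not part of the notebook.

```python
def ordinal(n):
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10

# same labels as in the notebook: 'Neighborhood', '1st Most Common Venue', ...
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```

For the ten columns used here the result is identical to the original loop; the helper only differs for larger values such as 21 or 22.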


<a id='item4'></a>

## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 4 clusters.

In [53]:
# set number of clusters
kclusters = 4

StatenIsland_grouped_clustering = StatenIsland_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(StatenIsland_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 1, 3, 3, 1, 0, 0, 0, 0], dtype=int32)
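The value `kclusters = 4` is fixed by hand. One common way to sanity-check such a choice is the elbow heuristic: fit *k*-means for several values of *k* and look at where the inertia (within-cluster sum of squares) stops dropping sharply. The sketch below uses synthetic points as a stand-in for `StatenIsland_grouped_clustering`, so its numbers say nothing about the real data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# synthetic stand-in for the venue-frequency matrix: three well-separated blobs
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.randn(20, 2) for c in centers])

# fit k-means for k = 1..6 and record the inertia of each fit
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On the real matrix, the same loop could be run over `StatenIsland_grouped_clustering` to see whether 4 sits near the bend of the curve.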

Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [60]:
StatenIsland_merged = StGeorge_data

# add clustering labels (assigned by position, so the rows must be in the same order as StatenIsland_grouped)
StatenIsland_merged['Cluster Labels'] = kmeans.labels_

# join neighborhoods_venues_sorted to add the top 10 venues for each neighborhood
StatenIsland_merged = StatenIsland_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

StatenIsland_merged.tail() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
58,Staten Island,Egbertville,40.579119,-74.127272,0,Bagel Shop,Dance Studio,Italian Restaurant,Cosmetics Shop,Tree,Clothing Store,Falafel Restaurant,French Restaurant,Food Truck,Food & Drink Shop
59,Staten Island,Prince's Bay,40.526264,-74.201526,0,Pizza Place,Pharmacy,Italian Restaurant,Sushi Restaurant,Liquor Store,Bagel Shop,Women's Store,French Restaurant,Food Truck,Food & Drink Shop
60,Staten Island,Lighthouse Hill,40.576506,-74.137927,0,Moving Target,Italian Restaurant,Art Museum,Spa,Trail,Café,Massage Studio,Fast Food Restaurant,French Restaurant,Food Truck
61,Staten Island,Richmond Valley,40.519541,-74.229571,1,Bank,Mexican Restaurant,Deli / Bodega,Convenience Store,Sandwich Place,Train Station,Coffee Shop,Fast Food Restaurant,Smoothie Shop,Women's Store
62,Staten Island,Fox Hills,40.617311,-74.08174,0,Bus Stop,Intersection,Sandwich Place,Falafel Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant
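One point to watch in this step: `kmeans.labels_` follows the row order of `StatenIsland_grouped`, which is alphabetical by neighborhood, while the coordinates table starts with St. George. Assigning the labels directly therefore pairs them by position rather than by name. A minimal sketch of a name-based alternative follows; the three-row tables and the label values are hypothetical stand-ins, not data from the notebook.

```python
import numpy as np
import pandas as pd

# hypothetical coordinates table, in its original (non-alphabetical) order
coords = pd.DataFrame({
    'Neighborhood': ['St. George', 'Annadale', 'Travis'],
    'Latitude': [40.64, 40.54, 40.59],
})

# hypothetical grouped table, sorted alphabetically like StatenIsland_grouped
grouped = pd.DataFrame({'Neighborhood': ['Annadale', 'St. George', 'Travis']})

# stand-in for kmeans.labels_: one label per row of `grouped`
labels = np.array([1, 0, 2])

# attach each label to the neighborhood it was computed for ...
labelled = grouped[['Neighborhood']].assign(**{'Cluster Labels': labels})

# ... then join on the name, keeping the row order of the coordinates table
merged = coords.merge(labelled, on='Neighborhood', how='left')
print(merged)
```

In the notebook, the same idea would amount to adding the labels to `neighborhoods_venues_sorted` (which shares the row order of `StatenIsland_grouped`) before the join on `'Neighborhood'`.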


## Visualize the resulting clusters

In [56]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(StatenIsland_merged['Latitude'], StatenIsland_merged['Longitude'], StatenIsland_merged['Neighborhood'], StatenIsland_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<a id='item5'></a>

## 5. Examine Clusters

Now I can determine the discriminating venue categories that distinguish each cluster and based on the defining categories, I can then assign a name to each cluster. 
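One way to find the discriminating categories, sketched here under my own assumptions, is to average the venue frequencies inside each cluster and list the categories with the highest mean. The table `freq` below is a hypothetical miniature of `StatenIsland_grouped` with a label column added.

```python
import pandas as pd

# hypothetical frequency table shaped like StatenIsland_grouped, plus cluster labels
freq = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Bus Stop': [0.6, 0.5, 0.0, 0.1],
    'Pizza Place': [0.1, 0.0, 0.5, 0.4],
    'Park': [0.3, 0.5, 0.5, 0.5],
})
freq['Cluster Labels'] = [0, 0, 1, 1]

# mean frequency of every venue category inside each cluster
cluster_profile = freq.drop(columns='Neighborhood').groupby('Cluster Labels').mean()

# the two categories with the highest mean frequency in each cluster
top_per_cluster = {
    label: list(row.nlargest(2).index)
    for label, row in cluster_profile.iterrows()
}
print(top_per_cluster)
```

A profile like this gives a more direct basis for naming a cluster than reading the per-neighborhood top-10 lists row by row.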

#### Cluster 1

In [70]:
df1=StatenIsland_merged.loc[StatenIsland_merged['Cluster Labels'] == 0, StatenIsland_merged.columns[[1] + list(range(4, StatenIsland_merged.shape[1]))]]
df1

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,St. George,0,American Restaurant,Pizza Place,Italian Restaurant,Bar,Monument / Landmark,Harbor / Marina,Scenic Lookout,Steakhouse,Baseball Stadium,Tapas Restaurant
1,New Brighton,0,Bus Stop,Park,Deli / Bodega,Bowling Alley,Convenience Store,Playground,Chinese Restaurant,Discount Store,Donut Shop,Eastern European Restaurant
6,Todt Hill,0,Park,Trail,Women's Store,Falafel Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant
7,South Beach,0,Pier,Beach,Deli / Bodega,Bus Stop,Athletics & Sports,Furniture / Home Store,French Restaurant,Food Truck,Food & Drink Shop,Food
8,Port Richmond,0,Rental Car Location,Bus Station,Donut Shop,Martial Arts Dojo,Pizza Place,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food
9,Mariner's Harbor,0,Deli / Bodega,Italian Restaurant,Bus Stop,Food,Athletics & Sports,Falafel Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop
11,Castleton Corners,0,Pizza Place,Japanese Restaurant,Mini Golf,Tattoo Parlor,Grocery Store,Bank,Hardware Store,Bagel Shop,Burger Joint,Ice Cream Shop
13,Travis,0,Deli / Bodega,Bowling Alley,Gym / Fitness Center,Donut Shop,Comedy Club,Pizza Place,Café,Park,Spanish Restaurant,Sports Club
16,Great Kills,0,Pizza Place,Italian Restaurant,Bar,Japanese Restaurant,Bakery,Falafel Restaurant,Chinese Restaurant,Mexican Restaurant,Grocery Store,Pharmacy
20,Tottenville,0,Italian Restaurant,Home Service,Cosmetics Shop,Bus Stop,Mexican Restaurant,Thrift / Vintage Store,Deli / Bodega,Hookah Bar,Event Space,Food & Drink Shop


#### Cluster 2

In [71]:
df2=StatenIsland_merged.loc[StatenIsland_merged['Cluster Labels'] == 1, StatenIsland_merged.columns[[1] + list(range(4, StatenIsland_merged.shape[1]))]]
df2

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Stapleton,1,Pizza Place,Harbor / Marina,Sandwich Place,Café,Bank,Discount Store,Fast Food Restaurant,Italian Restaurant,Sri Lankan Restaurant,New American Restaurant
5,Grymes Hill,1,Bus Stop,American Restaurant,Dog Run,Gym,Women's Store,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food
10,Port Ivory,1,Bar,Women's Store,Fast Food Restaurant,Furniture / Home Store,French Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant
15,Oakwood,1,Bar,Women's Store,Fast Food Restaurant,Furniture / Home Store,French Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant
18,Annadale,1,Restaurant,Train Station,Food,Sports Bar,Park,Liquor Store,Pizza Place,Diner,Cosmetics Shop,Discount Store
19,Woodrow,1,Grocery Store,Pharmacy,Sushi Restaurant,Mexican Restaurant,Coffee Shop,Miscellaneous Shop,Chinese Restaurant,Diner,Liquor Store,Bakery
21,Tompkinsville,1,Deli / Bodega,Park,Chinese Restaurant,Supermarket,Sri Lankan Restaurant,Food Truck,Spanish Restaurant,Bus Stop,Café,Caribbean Restaurant
24,Park Hill,1,Bus Stop,Park,Hotel,Athletics & Sports,Coffee Shop,Gym / Fitness Center,Women's Store,Fast Food Restaurant,Food Truck,Food & Drink Shop
32,Midland Beach,1,Restaurant,Beach,Deli / Bodega,Dessert Shop,Bus Stop,Liquor Store,Bookstore,Food,Basketball Court,Chinese Restaurant
38,Butler Manor,1,Pool,Baseball Field,Convenience Store,Women's Store,Fast Food Restaurant,Furniture / Home Store,French Restaurant,Food Truck,Food & Drink Shop,Food


#### Cluster 3

In [72]:
df3=StatenIsland_merged.loc[StatenIsland_merged['Cluster Labels'] == 2, StatenIsland_merged.columns[[1] + list(range(4, StatenIsland_merged.shape[1]))]]
df3

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,Huguenot,2,Italian Restaurant,Bridal Shop,Ice Cream Shop,Sandwich Place,Train Station,Spa,Donut Shop,Bank,Fast Food Restaurant,Food Truck
40,Rossville,2,Pizza Place,Bagel Shop,Liquor Store,Grocery Store,Pharmacy,Chinese Restaurant,Deli / Bodega,Convenience Store,Ice Cream Shop,Food Truck


#### Cluster 4

In [73]:
df4=StatenIsland_merged.loc[StatenIsland_merged['Cluster Labels'] == 3, StatenIsland_merged.columns[[1] + list(range(4, StatenIsland_merged.shape[1]))]]
df4

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Rosebank,3,Grocery Store,Pizza Place,Italian Restaurant,Breakfast Spot,Deli / Bodega,Donut Shop,Restaurant,Cosmetics Shop,Eastern European Restaurant,Sandwich Place
4,West Brighton,3,Coffee Shop,Italian Restaurant,Music Store,Diner,Pharmacy,Bar,German Restaurant,Supermarket,Salon / Barbershop,Mexican Restaurant
12,New Springville,3,Chinese Restaurant,Pizza Place,Bagel Shop,Accessories Store,Donut Shop,Spa,Soup Place,Shopping Mall,Coffee Shop,Sandwich Place
14,New Dorp,3,Italian Restaurant,Pizza Place,Chinese Restaurant,Indian Restaurant,Bakery,Dessert Shop,Dim Sum Restaurant,Salon / Barbershop,Sandwich Place,Mexican Restaurant
17,Eltingville,3,Pizza Place,Sushi Restaurant,Italian Restaurant,Pharmacy,Fast Food Restaurant,Bank,Diner,Sandwich Place,Grocery Store,Gourmet Shop
26,Graniteville,3,Food Truck,Grocery Store,Bus Stop,Women's Store,Fast Food Restaurant,French Restaurant,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant
27,Arlington,3,Bus Stop,Deli / Bodega,American Restaurant,Fast Food Restaurant,Furniture / Home Store,French Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop
28,Arrochar,3,Pizza Place,Bus Stop,Italian Restaurant,Deli / Bodega,Bagel Shop,Middle Eastern Restaurant,Mediterranean Restaurant,Food Truck,Sandwich Place,Supermarket
30,Old Town,3,Italian Restaurant,Liquor Store,Grocery Store,Mattress Store,Pharmacy,Optical Shop,Bank,Bakery,Pizza Place,Donut Shop
31,Dongan Hills,3,Pizza Place,Bank,Italian Restaurant,Pharmacy,Bagel Shop,Ice Cream Shop,Eastern European Restaurant,Smoke Shop,Fast Food Restaurant,Sushi Restaurant


#### Even though the information is quite overwhelming at this stage, it can easily be organised in one DataFrame

In [74]:
clusters = pd.DataFrame({"Cluster1":df1["Neighborhood"],
                        "Cluster2":df2["Neighborhood"], 
                        "Cluster3":df3["Neighborhood"],
                        "Cluster4":df4["Neighborhood"]})

### Some data cleaning is needed, and I also need to visualize the data

In [75]:
clusters = clusters.replace(np.nan, ' ', regex=True)
clusters

Unnamed: 0,Cluster1,Cluster2,Cluster3,Cluster4
0,St. George,,,
1,New Brighton,,,
2,,Stapleton,,
3,,,,Rosebank
4,,,,West Brighton
5,,Grymes Hill,,
6,Todt Hill,,,
7,South Beach,,,
8,Port Richmond,,,
9,Mariner's Harbor,,,
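Because each column keeps the original row index, the table above has one neighborhood per row and is mostly blank. A more compact layout, shown here as a sketch, groups the names by label first and then places the lists side by side. The input rows are copied from the merged table above; the names `members` and `compact` are my own.

```python
import pandas as pd

# a few rows of StatenIsland_merged, reduced to the two columns needed here
merged = pd.DataFrame({
    'Neighborhood': ['St. George', 'New Brighton', 'Stapleton', 'Rosebank',
                     'West Brighton', 'Grymes Hill', 'Todt Hill'],
    'Cluster Labels': [0, 0, 1, 3, 3, 1, 0],
})

# one list of neighborhoods per cluster label
members = merged.groupby('Cluster Labels')['Neighborhood'].apply(list)

# re-index each list from 0 so the columns line up, and blank out the padding
compact = pd.DataFrame({
    'Cluster{}'.format(label + 1): pd.Series(names)
    for label, names in members.items()
}).fillna('')
print(compact)
```

Clusters with no members in the input (cluster 3 in this small sample) simply do not get a column.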


In [76]:
new_StatenIsland = StatenIsland_merged.set_index("Neighborhood", drop=True)
new_StatenIsland

Unnamed: 0_level_0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
St. George,Staten Island,40.644982,-74.079353,0,American Restaurant,Pizza Place,Italian Restaurant,Bar,Monument / Landmark,Harbor / Marina,Scenic Lookout,Steakhouse,Baseball Stadium,Tapas Restaurant
New Brighton,Staten Island,40.640615,-74.087017,0,Bus Stop,Park,Deli / Bodega,Bowling Alley,Convenience Store,Playground,Chinese Restaurant,Discount Store,Donut Shop,Eastern European Restaurant
Stapleton,Staten Island,40.626928,-74.077902,1,Pizza Place,Harbor / Marina,Sandwich Place,Café,Bank,Discount Store,Fast Food Restaurant,Italian Restaurant,Sri Lankan Restaurant,New American Restaurant
Rosebank,Staten Island,40.615305,-74.069805,3,Grocery Store,Pizza Place,Italian Restaurant,Breakfast Spot,Deli / Bodega,Donut Shop,Restaurant,Cosmetics Shop,Eastern European Restaurant,Sandwich Place
West Brighton,Staten Island,40.631879,-74.107182,3,Coffee Shop,Italian Restaurant,Music Store,Diner,Pharmacy,Bar,German Restaurant,Supermarket,Salon / Barbershop,Mexican Restaurant
Grymes Hill,Staten Island,40.624185,-74.087248,1,Bus Stop,American Restaurant,Dog Run,Gym,Women's Store,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food
Todt Hill,Staten Island,40.597069,-74.111329,0,Park,Trail,Women's Store,Falafel Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant
South Beach,Staten Island,40.580247,-74.079553,0,Pier,Beach,Deli / Bodega,Bus Stop,Athletics & Sports,Furniture / Home Store,French Restaurant,Food Truck,Food & Drink Shop,Food
Port Richmond,Staten Island,40.633669,-74.129434,0,Rental Car Location,Bus Station,Donut Shop,Martial Arts Dojo,Pizza Place,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food
Mariner's Harbor,Staten Island,40.632546,-74.150085,0,Deli / Bodega,Italian Restaurant,Bus Stop,Food,Athletics & Sports,Falafel Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Flower Shop


<a id='item6'></a>

## 6. Focus on the features required by the clients

**Having analyzed the neighborhoods of Staten Island and learned the area well enough to make recommendations, I now turn to my clients' first requirement: as a young family, their priority is a good school for their children.**

**Therefore I will look for the five best-rated schools in Staten Island.**

**This narrows the search considerably and simplifies the process of identifying the best neighborhoods among the 62.**

**After many hours of searching the internet for this information in a format that Python can process (.csv, JSON, etc.), I decided that the easiest way was to create my own DataFrame from the data on www.greatschools.org.**

In [6]:
# source: https://www.greatschools.org/new-york/staten-island/schools/
StatenIsland_school_ratings = pd.DataFrame(columns=['Neighborhood', 'School', 'Rating'], \
                                         data=[['Huguenot', 'Ps 5 Huguenot', 10], \
                                               ['Sunnyside', 'Ps 35 The Clove Valley School', 10], \
                                               ['Lighthouse Hill', 'Ps 23 Richmondtown', 9], \
                                               ['Travis', 'Ps 26 The Carteret School', 9], \
                                               ['Castleton Corners', 'Ps 29 Bardwell', 8]])
StatenIsland_school_ratings

Unnamed: 0,Neighborhood,School,Rating
0,Huguenot,Ps 5 Huguenot,10
1,Sunnyside,Ps 35 The Clove Valley School,10
2,Lighthouse Hill,Ps 23 Richmondtown,9
3,Travis,Ps 26 The Carteret School,9
4,Castleton Corners,Ps 29 Bardwell,8


### The second requirement is a reasonable housing price in the range of 250k to 400k

**Background information for housing prices in Staten Island.** 

The information needed to build a dataframe with housing prices can be found at https://www.zillow.com/staten-island-new-york-ny/

In [3]:
StatenIsland_Avg_Housing_Prices=pd.DataFrame(columns=['Neighborhood','Average Housing Price in thousands of dollars'], \
                                            data=[['Huguenot', 655.900], \
                                                 ['Sunnyside', 740.000 ], \
                                                 ['Lighthouse Hill', 965.600], \
                                                 ['Travis', 180.000 ], \
                                                 ['Castleton Corners', 582.200]])

# to make it easy to see which neighborhood has the lowest housing price I need to sort the df
StatenIsland_Avg_Housing_Prices.sort_values(by=['Average Housing Price in thousands of dollars'])

Unnamed: 0,Neighborhood,Average Housing Price in thousands of dollars
3,Travis,180.0
4,Castleton Corners,582.2
0,Huguenot,655.9
1,Sunnyside,740.0
2,Lighthouse Hill,965.6


Comparing the two dataframes, **Travis** stands out as an excellent option for our clients, with a school rating of 9 and the lowest average housing price among the five neighborhoods with top-rated schools. At this point, however, I also have to consider the venues in this neighborhood, because if it already has a vegetarian restaurant I will have to look for another location.
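The comparison of the two tables can also be done in code by merging them on `Neighborhood` and filtering by budget. This is a sketch under one assumption of mine: the upper end of the stated range (400k) is treated as a hard cap and the lower end is not enforced. The data values are the ones entered above.

```python
import pandas as pd

price_col = 'Average Housing Price in thousands of dollars'

schools = pd.DataFrame(
    columns=['Neighborhood', 'School', 'Rating'],
    data=[['Huguenot', 'Ps 5 Huguenot', 10],
          ['Sunnyside', 'Ps 35 The Clove Valley School', 10],
          ['Lighthouse Hill', 'Ps 23 Richmondtown', 9],
          ['Travis', 'Ps 26 The Carteret School', 9],
          ['Castleton Corners', 'Ps 29 Bardwell', 8]])

prices = pd.DataFrame(
    columns=['Neighborhood', price_col],
    data=[['Huguenot', 655.9],
          ['Sunnyside', 740.0],
          ['Lighthouse Hill', 965.6],
          ['Travis', 180.0],
          ['Castleton Corners', 582.2]])

max_budget = 400  # thousands of dollars; assumed hard cap

# one row per neighborhood with both the school rating and the price
candidates = schools.merge(prices, on='Neighborhood')

# keep what fits the budget, best school first, cheapest first among ties
affordable = (candidates[candidates[price_col] <= max_budget]
              .sort_values(['Rating', price_col], ascending=[False, True]))
print(affordable)
```

With these five rows, only Travis remains after the filter.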

### The third requirement is to find a neighborhood that does not have a Vegetarian restaurant, so that the family can set up their own business.

In [41]:
# visualize Travis neighborhood to check if it has a Vegetarian Restaurant
Travis_df=StatenIsland_grouped.iloc[[58]]
Travis_df

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Beer Bar,Big Box Store,Board Shop,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Burger Joint,Bus Station,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,Campground,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Comedy Club,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Eastern European Restaurant,Event Space,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Flower Shop,Food,Food & Drink Shop,Food Truck,French Restaurant,Furniture / Home Store,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Health & Beauty Service,History Museum,Hobby Shop,Home Service,Hookah Bar,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Laundromat,Liquor Store,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Optical Shop,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Pub,Recording Studio,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Soup Place,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toll Plaza,Tourist Information Center,Toy / Game Store,Trail,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
58,Travis,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


I need to make sure that this neighborhood does not have any type of vegetarian restaurant. Searching manually through so many columns would be counterproductive, so instead I check whether the string 'Vegetarian' is part of any column name.


In [47]:
# check if the string 'vegetarian' is part of the name of any venue in Travis neighborhood
vegetarian_columns = [col for col in Travis_df.columns if 'Vegetarian' in col]

print(vegetarian_columns)

['Vegetarian / Vegan Restaurant']


Only one column name contains 'Vegetarian'. Now I need to check whether Travis has any venue in that category. If there is already a vegetarian restaurant, I will have to look at another neighborhood.


In [48]:
Travis_df['Vegetarian / Vegan Restaurant']

58    0.0
Name: Vegetarian / Vegan Restaurant, dtype: float64

The frequency is 0.0, so none of the venues returned for Travis is a vegetarian restaurant. I can therefore conclude that this neighborhood answers the clients' needs.
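The same check can be run for several candidate neighborhoods at once, selecting rows by name instead of by a fixed position such as 58. The sketch below uses a hypothetical three-row table with made-up neighborhoods `A`, `B` and `C`. Its values are illustrative only and are not results for Staten Island. Matching on both 'vegetarian' and 'vegan', case-insensitively, is my own choice.

```python
import pandas as pd

# hypothetical miniature of StatenIsland_grouped (values are illustrative only)
grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Pizza Place': [0.2, 0.0, 0.0625],
    'Vegetarian / Vegan Restaurant': [0.0, 0.1, 0.0],
})

candidates = ['A', 'B', 'C']

# every column whose name mentions vegetarian or vegan food
veg_cols = [col for col in grouped.columns
            if 'vegetarian' in col.lower() or 'vegan' in col.lower()]

# select the candidate rows by name rather than by row position
subset = grouped[grouped['Neighborhood'].isin(candidates)].set_index('Neighborhood')

# a neighborhood has competition if any matching category has a non-zero frequency
has_veg = subset[veg_cols].sum(axis=1) > 0
no_competition = list(has_veg[~has_veg].index)
print(no_competition)
```

A frequency of 0.0 only means that no such venue appeared among the venues returned by the query, so the result is as complete as the underlying venue list.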

<a id='item7'></a>

## 7. Conclusion
##### **With all the data analyzed and an obvious conclusion reached I can finally recommend the best neighborhood for buying or renting property in Staten Island according to the desired characteristics.**

**The three features I had to take into consideration, in order of importance, were a neighborhood with a good school, affordable housing, and no existing Vegetarian Restaurant, so that the couple could find a proper home for their young family and also a location for their business.**

**As shown in this analysis, the best option for our clients is Travis, because it fully meets all three requirements.**

And here we are at the end of this project. Thank you for your time and patience!