The Battle of the Neighborhoods: Brooklyn Edition

Part 1: Introduction/Business Problem

Introduction:

Your friend is considering opening a coffee shop in Brooklyn, an up-and-coming borough of New York City, and wants help choosing a neighborhood where the shop is likely to succeed. They have asked you to apply your expertise in data analytics with Python to help select that location.

Business Problem:

The problem is to choose which Brooklyn neighborhood to recommend for the coffee shop. You should use data analytics with Python to select the neighborhood, based on neighborhood segmentation and clustering as well as analysis of the types of venues in each neighborhood.

Target Audience/Who Would Care About It:

The target audience is your friend, along with any investors or stakeholders involved in opening the coffee shop. This presentation recommends a neighborhood and documents the data analysis behind that recommendation. Your friend, investors, and stakeholders will care about the recommendation and its supporting analysis because it gives them confidence that they are making a data-informed decision that improves their chances of success.

Part 2: Data

Dataset:

To solve this business problem, we will use the dataset of New York City neighborhoods and boroughs collected and stored as a shapefile by NYU, available at the following link: https://geo.nyu.edu/catalog/nyu_2451_34572

Example of Dataset Contents:

This dataset consists of the 306 neighborhoods in New York City including the neighborhood name, borough, latitude, longitude, geometry type, and annotation. 

What Can Be Extracted from the Dataset:

We can extract each neighborhood's name, borough, latitude, and longitude into a pandas data frame and then filter the data frame to include only the Brooklyn borough.

How Will It Be Used:

The resulting data frame can be combined with Foursquare venue data to analyze each neighborhood and make a recommendation.

Part 3: Data Analysis With Python

We will need to first import and clean the data. Let's start by importing all necessary libraries. 

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
#pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')

In [3]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import ipywidgets as widgets
from IPython import display

Then, let's download the New York JSON data.

In [4]:
# Download New York JSON data
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


Next, let's load and explore the data.

In [5]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

Notice that all the relevant data is in the features key. So, let's define a new variable, neighborhoods_data, that holds this data and then look at the first entry in the list.

In [6]:
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
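
Before flattening the list, it can help to confirm its shape. The sketch below tallies features per borough with collections.Counter. The two-entry sample_features list is a stand-in trimmed from the output above, not the full dataset, and count_by_borough is a hypothetical helper name.

```python
from collections import Counter

# Stand-in for newyork_data['features'], trimmed to the fields used here.
sample_features = [
    {'geometry': {'coordinates': [-73.84720052054902, 40.89470517661]},
     'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
    {'geometry': {'coordinates': [-73.82993910812398, 40.87429419303012]},
     'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
]

def count_by_borough(features):
    """Return a Counter mapping borough name to number of features."""
    return Counter(f['properties']['borough'] for f in features)

borough_counts = count_by_borough(sample_features)
print(borough_counts)
```

In the notebook, the same helper could be called on neighborhoods_data to see how many neighborhoods each borough contributes.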

Next, let's transform the list into a pandas data frame. First, let's create an empty data frame. 

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Then, let's loop through the data, fill the data frame one row at a time, and then look at the first five lines. 

In [8]:
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
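
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above only runs on older pandas versions. A minimal sketch of an alternative, assuming the same feature structure: collect the rows in a plain list and build the data frame once. The sample_features list and the neighborhoods_sketch name are stand-ins for illustration.

```python
import pandas as pd

column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# Stand-in for neighborhoods_data, trimmed to the fields used here.
sample_features = [
    {'geometry': {'coordinates': [-73.84720052054902, 40.89470517661]},
     'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
    {'geometry': {'coordinates': [-73.82993910812398, 40.87429419303012]},
     'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
]

rows = []
for data in sample_features:
    # GeoJSON stores coordinates as [longitude, latitude].
    lon, lat = data['geometry']['coordinates']
    rows.append({'Borough': data['properties']['borough'],
                 'Neighborhood': data['properties']['name'],
                 'Latitude': lat,
                 'Longitude': lon})

# Build the data frame in one step instead of appending row by row.
neighborhoods_sketch = pd.DataFrame(rows, columns=column_names)
print(neighborhoods_sketch)
```

Building the frame once also avoids copying the whole data frame on every iteration, which is what row-by-row appending does.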

In [9]:
neighborhoods.head()

  Borough Neighborhood   Latitude  Longitude
0   Bronx    Wakefield  40.894705 -73.847201
1   Bronx   Co-op City  40.874294 -73.829939
2   Bronx  Eastchester  40.887556 -73.827806
3   Bronx    Fieldston  40.895437 -73.905643
4   Bronx    Riverdale  40.890834 -73.912585

Next, let's ensure that the data for all 5 boroughs and 306 neighborhoods has been loaded into the data frame.

In [10]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


Let's use the geopy library to get the latitude and longitude for New York City. In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent ny_explorer, as shown below.

In [11]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
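
geopy's geocode method returns None when it finds no match, in which case location.latitude would raise an AttributeError. The sketch below guards against that. geocode_or_default and StubGeocoder are hypothetical names, and the stub only imitates the geocode(address) call so the example runs without network access.

```python
from collections import namedtuple

def geocode_or_default(geocoder, address, default=None):
    """Return (latitude, longitude) for address, or default if nothing is found."""
    location = geocoder.geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)

# Stub with the same geocode(address) shape as the geolocator used above.
Location = namedtuple('Location', ['latitude', 'longitude'])

class StubGeocoder:
    def __init__(self, known):
        self.known = known

    def geocode(self, address):
        # Mirrors the "no result" case by returning None for unknown addresses.
        return self.known.get(address)

stub = StubGeocoder({'New York City, NY': Location(40.7127281, -74.0060152)})
found = geocode_or_default(stub, 'New York City, NY')
missing = geocode_or_default(stub, 'Unknown Place', default=(0.0, 0.0))
print(found, missing)
```

In the notebook, the real geolocator object could be passed in place of the stub.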


Then, let's create a map of New York with neighborhoods superimposed on top.

In [12]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

The business problem considers only neighborhoods in the Brooklyn borough. So, we need to simplify the map and data frame to include only Brooklyn neighborhoods.

First, let's slice the original data frame to create a new data frame of only Brooklyn data.

In [13]:
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()

    Borough Neighborhood   Latitude  Longitude
0  Brooklyn    Bay Ridge  40.625801 -74.030621
1  Brooklyn  Bensonhurst  40.611009 -73.995180
2  Brooklyn  Sunset Park  40.645103 -74.010316
3  Brooklyn   Greenpoint  40.730201 -73.954241
4  Brooklyn    Gravesend  40.595260 -73.973471

Then, let's get the geographical coordinates of Brooklyn.

In [14]:
address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Brooklyn are 40.6501038, -73.9495823.


As we did with all of New York City, let's visualize the neighborhoods in Brooklyn. 

In [15]:
# create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  
    
map_brooklyn

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

First, let's define the Foursquare API credentials.

In [16]:
CLIENT_ID = '2RBQQAOS5JLFA5XFXJHJH4HYVK0BVLJY1IXCVI234QOZ4QZO' # your Foursquare ID
CLIENT_SECRET = 'VLNLIVI1NUQN1ICJQRLED0DHAE4E4K1ZZTQ2O4IHEQEXNMO4' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 2RBQQAOS5JLFA5XFXJHJH4HYVK0BVLJY1IXCVI234QOZ4QZO
CLIENT_SECRET:VLNLIVI1NUQN1ICJQRLED0DHAE4E4K1ZZTQ2O4IHEQEXNMO4
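
Hard-coding and printing API credentials exposes them to anyone who can read the notebook. A minimal sketch of one alternative, assuming the credentials are stored in environment variables: the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET, and the helpers load_credentials and mask, are hypothetical choices for illustration.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from a mapping (environment variables by default)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')          # hypothetical variable name
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters when printing a secret."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demo with a stand-in mapping; in the notebook, call load_credentials() with no arguments.
demo_env = {'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
            'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456'}
demo_id, demo_secret = load_credentials(demo_env)
print('CLIENT_ID: ' + mask(demo_id))
```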


Then, let's explore the first neighborhood in the data frame.

In [17]:
brooklyn_data.loc[0, 'Neighborhood']

'Bay Ridge'

Next, let's get the latitude and longitude values of this neighborhood.

In [18]:
neighborhood_latitude = brooklyn_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = brooklyn_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = brooklyn_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Bay Ridge are 40.625801065010656, -74.03062069353813.


Now, let's get the top 100 venues that are in Bay Ridge within a radius of 500 meters.

First, let's create the GET request URL and store it in a variable named url.

In [19]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL


'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180605&ll=40.625801065010656,-74.03062069353813&radius=500&limit=100'
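
The same URL can also be assembled from named query parameters, which avoids the stray `&` after the `?` and makes each value easier to check. This is a sketch using only the standard library; build_explore_url and EXPLORE_ENDPOINT are names introduced here for illustration, and 'ID' and 'SECRET' are stand-in values.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the explore URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" readable instead of encoding it as %2C
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


demo_url = build_explore_url('ID', 'SECRET', '20180605', 40.625801065010656, -74.03062069353813)
print(demo_url)
```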

Send the GET request and examine the results. 

In [20]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f35f15c6dd9a7231db4ccf0'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Bay Ridge',
  'headerFullLocation': 'Bay Ridge, Brooklyn',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 80,
  'suggestedBounds': {'ne': {'lat': 40.63030106951066,
    'lng': -74.02470273356597},
   'sw': {'lat': 40.62130106051065, 'lng': -74.03653865351028}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b895827f964a5206c2d32e3',
       'name': 'Pilo Arts Day Spa and Salon',
       'location': {'address': '8412 3rd Ave',
        'lat': 40.62474788273414,
        'lng': -74.03059056940135,
        'labeledLatL
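
The venues sit several levels deep in this response, under response, then groups, then items. Below is a small sketch of pulling them out defensively, checking the status code in meta first. The helper name extract_items is hypothetical, and the sample dictionary is a trimmed stand-in for the response shown above.

```python
def extract_items(results):
    """Return the list of venue items from an explore response, or [] if absent."""
    code = results.get('meta', {}).get('code')
    if code != 200:
        raise ValueError('Foursquare request failed with code {}'.format(code))
    groups = results.get('response', {}).get('groups', [])
    return groups[0].get('items', []) if groups else []


# Trimmed stand-in for the response shown above
sample = {
    'meta': {'code': 200},
    'response': {'totalResults': 1,
                 'groups': [{'type': 'Recommended Places',
                             'items': [{'venue': {'name': 'Pilo Arts Day Spa and Salon'}}]}]},
}
items = extract_items(sample)
print(len(items), items[0]['venue']['name'])
```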

Then, let's define the get_category_type function, which extracts the category name of a venue from the Foursquare response.

In [21]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now, let's clean the JSON and structure it into a pandas dataframe.

In [22]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON (pd.json_normalize requires pandas >= 1.0)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns].copy()

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

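
To see what the flattening step does, here is a small self-contained sketch. It assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize. The first record reuses the Bagel Boy values from this notebook; the second record, with an empty category list, is invented to show the None case.

```python
import pandas as pd

venues = [
    {'venue': {'name': 'Bagel Boy',
               'categories': [{'name': 'Bagel Shop'}],
               'location': {'lat': 40.627896, 'lng': -74.029335}}},
    {'venue': {'name': 'Unnamed Spot',  # invented record with no category
               'categories': [],
               'location': {'lat': 40.62, 'lng': -74.03}}},
]

flat = pd.json_normalize(venues)  # nested keys become dotted column names
flat = flat[['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']].copy()

# keep the first category name, or None when the list is empty
flat['venue.categories'] = flat['venue.categories'].apply(lambda cats: cats[0]['name'] if cats else None)

# keep only the last part of each dotted name
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```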

Then, let's print the number of venues returned by Foursquare for that neighborhood.

In [23]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

80 venues were returned by Foursquare.


Let's use Foursquare to examine all neighborhoods in Brooklyn. 

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        # (a venue without any category gets None instead of raising an IndexError)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
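
The function above sends one request per neighborhood, so a single failed request stops the whole run. One possible safeguard, sketched here as an assumption rather than as part of the original notebook, is a small retry wrapper. fetch_with_retry is a hypothetical helper, and the demo uses a stand-in fetcher so that it runs without a network connection.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, wait_seconds=1.0):
    """Call fetch(url) until it succeeds or the attempts run out.

    `fetch` is any callable that returns the parsed JSON response, for example
    lambda u: requests.get(u, timeout=10).json()
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as error:  # broad on purpose: network and JSON errors alike
            last_error = error
            time.sleep(wait_seconds * (attempt + 1))  # wait a little longer each time
    raise RuntimeError('Request failed after {} attempts'.format(attempts)) from last_error


# Stand-in fetcher that fails once and then succeeds
calls = {'count': 0}


def flaky_fetch(url):
    calls['count'] += 1
    if calls['count'] == 1:
        raise ConnectionError('temporary failure')
    return {'response': {'groups': [{'items': []}]}}


result = fetch_with_retry(flaky_fetch, 'https://example.invalid', wait_seconds=0)
print(calls['count'], result['response']['groups'][0]['items'])
```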

Then, let's use this function to create a new data frame called brooklyn_venues.

In [25]:
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude']
                                  )



Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


Let's check the shape and first few rows of the data frame. 

In [26]:
print(brooklyn_venues.shape)
brooklyn_venues.head()

(2760, 7)


  Neighborhood  Neighborhood Latitude  Neighborhood Longitude  \
0    Bay Ridge              40.625801              -74.030621   
1    Bay Ridge              40.625801              -74.030621   
2    Bay Ridge              40.625801              -74.030621   
3    Bay Ridge              40.625801              -74.030621   
4    Bay Ridge              40.625801              -74.030621   

                         Venue  Venue Latitude  Venue Longitude  \
0  Pilo Arts Day Spa and Salon       40.624748       -74.030591   
1                    Bagel Boy       40.627896       -74.029335   
2                 Pegasus Cafe       40.623168       -74.031186   
3          Leo's Casa Calamari       40.624200       -74.030931   
4                Cocoa Grinder       40.623967       -74.030863   

   Venue Category  
0             Spa  
1      Bagel Shop  
2  Breakfast Spot  
3     Pizza Place  
4       Juice Bar  

Next, let's check how many venues were returned for each neighborhood.

In [27]:
brooklyn_venues.groupby('Neighborhood').count()

                    Neighborhood Latitude  Neighborhood Longitude  Venue  \
Neighborhood                                                               
Bath Beach                             47                      47     47   
Bay Ridge                              80                      80     80   
Bedford Stuyvesant                     30                      30     30   
Bensonhurst                            27                      27     27   
Bergen Beach                            6                       6      6   
...                                   ...                     ...    ...   
Vinegar Hill                           29                      29     29   
Weeksville                             16                      16     16   
Williamsburg                           32                      32     32   
Windsor Terrace                        28                      28     28   
Wingate                                22                      22     22   
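
The counts vary widely: Bay Ridge returned 80 venues while Bergen Beach returned only 6. With so few venues, one or two of them can dominate a neighborhood's frequency profile, so it may help to flag sparse neighborhoods before clustering. In the sketch below, the cut-off of 10 venues is an arbitrary value chosen for illustration, and demo_venues is a stand-in built from three of the counts shown above.

```python
import pandas as pd

# Stand-in for brooklyn_venues, using three of the per-neighborhood counts shown above
counts = {'Bay Ridge': 80, 'Bensonhurst': 27, 'Bergen Beach': 6}
demo_venues = pd.DataFrame(
    [{'Neighborhood': hood, 'Venue': 'venue {}'.format(i)}
     for hood, n in counts.items() for i in range(n)]
)

MIN_VENUES = 10  # arbitrary cut-off chosen for illustration
venue_counts = demo_venues.groupby('Neighborhood').size()
sparse = venue_counts[venue_counts < MIN_VENUES].index.tolist()
print(sparse)
```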

           

Finally, let's find out how many unique categories can be curated from all the returned venues.

In [28]:
print('There are {} unique categories.'.format(len(brooklyn_venues['Venue Category'].unique())))

There are 292 unique categories.


Now, we have imported and cleaned all necessary data into Python as well as created a map of Brooklyn. We are ready to proceed with analyzing each neighborhood to select the optimal one to recommend to your friend. 

Let's begin analyzing each neighborhood.

In [29]:
# one hot encoding
brooklyn_onehot = pd.get_dummies(brooklyn_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
brooklyn_onehot['Neighborhood'] = brooklyn_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [brooklyn_onehot.columns[-1]] + list(brooklyn_onehot.columns[:-1])
brooklyn_onehot = brooklyn_onehot[fixed_columns]

brooklyn_onehot.head()

   Yoga Studio  Accessories Store  Adult Boutique  American Restaurant  \
0            0                  0               0                    0   
1            0                  0               0                    0   
2            0                  0               0                    0   
3            0                  0               0                    0   
4            0                  0               0                    0   

   Antique Shop  Arepa Restaurant  Argentinian Restaurant  Art Gallery  \
0             0                 0                       0            0   
1             0                 0                       0            0   
2             0                 0                       0            0   
3             0                 0                       0            0   
4             0                 0                       0            0   

   Arts & Crafts Store  Arts & Entertainment  ...  \
0                    0                     0  ...   
1   

And let's examine the new data frame size. 

In [30]:
brooklyn_onehot.shape

(2760, 292)
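
Two details in the output are worth a second look. The first column shown is Yoga Studio, not Neighborhood, and the data frame has 292 columns even though 292 categories plus a neighborhood column would give 293. A likely explanation, inferred from the output rather than stated anywhere in the notebook, is that one venue category is itself named 'Neighborhood'. If so, the assignment overwrote that dummy column in place, and the "move to first column" step moved Yoga Studio instead. Here is a sketch of one way to avoid the clash, using a small stand-in data frame; the replacement column name 'Neighborhood (category)' is an arbitrary choice.

```python
import pandas as pd

# Stand-in for brooklyn_venues; the second venue has the category 'Neighborhood'
demo_venues = pd.DataFrame({
    'Neighborhood': ['Bay Ridge', 'Bay Ridge', 'Dumbo'],
    'Venue Category': ['Spa', 'Neighborhood', 'Yoga Studio'],
})

onehot = pd.get_dummies(demo_venues[['Venue Category']], prefix='', prefix_sep='')

# rename the clashing category so it survives as its own column
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})

# insert the neighborhood names explicitly as the first column
onehot.insert(0, 'Neighborhood', demo_venues['Neighborhood'])
print(list(onehot.columns))
```

With this change the real data frame would have one more column than the shapes reported in this notebook.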

Next, let's group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category in each neighborhood.

In [31]:
brooklyn_grouped = brooklyn_onehot.groupby('Neighborhood').mean().reset_index()
brooklyn_grouped

          Neighborhood  Yoga Studio  Accessories Store  Adult Boutique  \
0           Bath Beach      0.00000                0.0             0.0   
1            Bay Ridge      0.00000                0.0             0.0   
2   Bedford Stuyvesant      0.00000                0.0             0.0   
3          Bensonhurst      0.00000                0.0             0.0   
4         Bergen Beach      0.00000                0.0             0.0   
..                 ...          ...                ...             ...   
65        Vinegar Hill      0.00000                0.0             0.0   
66          Weeksville      0.00000                0.0             0.0   
67        Williamsburg      0.03125                0.0             0.0   
68     Windsor Terrace      0.00000                0.0             0.0   
69             Wingate      0.00000                0.0             0.0   

    American Restaurant  Antique Shop  Arepa Restaurant  \
0              0.000000      0.000000               

Let's confirm the new size. 

In [32]:
brooklyn_grouped.shape

(70, 292)
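
Taking the mean of a one-hot column is the same as dividing the number of venues in that category by the total number of venues in the neighborhood. The toy example below illustrates this; the neighborhoods A and B and their venues are made up.

```python
import pandas as pd

# Made-up data: neighborhood A has 4 venues, neighborhood B has 2
demo_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Bar', 'Bar', 'Park', 'Coffee Shop', 'Coffee Shop'],
})

onehot = pd.get_dummies(demo_venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', demo_venues['Neighborhood'])

# mean of a 0/1 column = share of venues in that category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

In neighborhood A, 2 of 4 venues are bars, so the Bar frequency is 0.5. Read the same way, the 0.10 reported for Coffee Shop in Bedford Stuyvesant corresponds to 3 of its 30 venues.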

Let's print each neighborhood along with its top 5 most common venues.

In [33]:
num_top_venues = 5

for hood in brooklyn_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = brooklyn_grouped[brooklyn_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bath Beach----
                  venue  freq
0            Donut Shop  0.04
1  Fast Food Restaurant  0.04
2       Bubble Tea Shop  0.04
3  Cantonese Restaurant  0.04
4    Italian Restaurant  0.04


----Bay Ridge----
                 venue  freq
0   Italian Restaurant  0.08
1                  Spa  0.06
2          Pizza Place  0.06
3                  Bar  0.04
4  American Restaurant  0.04


----Bedford Stuyvesant----
           venue  freq
0    Coffee Shop  0.10
1            Bar  0.07
2    Pizza Place  0.07
3  Deli / Bodega  0.07
4           Café  0.07


----Bensonhurst----
                venue  freq
0                Park  0.07
1          Donut Shop  0.07
2  Italian Restaurant  0.07
3    Sushi Restaurant  0.07
4         Pizza Place  0.07


----Bergen Beach----
                venue  freq
0     Harbor / Marina  0.33
1          Playground  0.17
2      Baseball Field  0.17
3        Hockey Field  0.17
4  Athletics & Sports  0.17


----Boerum Hill----
               venue  freq
0       Da

Let's put that into a pandas dataframe.

First, let's write a function to sort the venues in descending order.

In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [35]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = brooklyn_grouped['Neighborhood']

for ind in np.arange(brooklyn_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(brooklyn_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

         Neighborhood 1st Most Common Venue 2nd Most Common Venue  \
0          Bath Beach    Italian Restaurant           Gas Station   
1           Bay Ridge    Italian Restaurant           Pizza Place   
2  Bedford Stuyvesant           Coffee Shop                  Café   
3         Bensonhurst            Donut Shop        Ice Cream Shop   
4        Bergen Beach       Harbor / Marina    Athletics & Sports   

  3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue  \
0              Pharmacy       Bubble Tea Shop  Cantonese Restaurant   
1                   Spa            Bagel Shop   American Restaurant   
2           Pizza Place                   Bar         Deli / Bodega   
3           Pizza Place                  Park    Chinese Restaurant   
4        Baseball Field            Playground          Hockey Field   

  6th Most Common Venue 7th Most Common Venue 8th Most Common Venue  \
0            Donut Shop      Sushi Restaurant  Fast Food Restaurant   
1      Greek Res
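
The column labels above rely on a three-item suffix list and fall back to 'th' for everything else. That works for the top 10, but would label a 21st column as '21th'. Here is a sketch of a more general helper; ordinal is a name introduced for this example.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i + 1))
                              for i in range(num_top_venues)]
print(columns[1], '...', columns[-1])
```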

Cluster Neighborhoods:
    
Run k-means to cluster the neighborhoods into 5 clusters.

In [38]:
# set number of clusters
kclusters = 5

brooklyn_grouped_clustering = brooklyn_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(brooklyn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 3, 3, 3, 3, 3, 3, 2, 3], dtype=int32)
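
The number of clusters is fixed at 5 here without a stated reason, and three of the resulting clusters turn out to contain a single neighborhood each. One common way to compare candidate values of k is the silhouette score. The sketch below runs on synthetic data with three well-separated groups, not on the Brooklyn data, and it is an illustration of the technique rather than the method used in this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three well-separated groups of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([center + rng.normal(scale=0.5, size=(30, 2)) for center in centers])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(k, round(scores[k], 3), np.bincount(labels))  # score and cluster sizes

best_k = max(scores, key=scores.get)  # k with the highest silhouette score
print('best k:', best_k)
```

On the real data, X would be brooklyn_grouped_clustering. Printing the cluster sizes alongside the score makes single-neighborhood clusters visible immediately.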

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [39]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

brooklyn_merged = brooklyn_data

# join the sorted top venues and cluster labels onto brooklyn_data, which holds the latitude/longitude for each neighborhood
brooklyn_merged = brooklyn_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

brooklyn_merged.head() # check the last columns!

    Borough Neighborhood   Latitude  Longitude  Cluster Labels  \
0  Brooklyn    Bay Ridge  40.625801 -74.030621               3   
1  Brooklyn  Bensonhurst  40.611009 -73.995180               3   
2  Brooklyn  Sunset Park  40.645103 -74.010316               3   
3  Brooklyn   Greenpoint  40.730201 -73.954241               3   
4  Brooklyn    Gravesend  40.595260 -73.973471               3   

  1st Most Common Venue 2nd Most Common Venue      3rd Most Common Venue  \
0    Italian Restaurant           Pizza Place                        Spa   
1            Donut Shop        Ice Cream Shop                Pizza Place   
2           Pizza Place                  Bank  Latin American Restaurant   
3                   Bar           Pizza Place                Coffee Shop   
4                Bakery         Metro Station                     Lounge   

  4th Most Common Venue 5th Most Common Venue 6th Most Common Venue  \
0            Bagel Shop   American Restaurant      Greek Restaurant   
1   

Finally, let's visualize the resulting clusters.

In [47]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters)) # one evenly spaced color per cluster
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(brooklyn_merged['Latitude'], brooklyn_merged['Longitude'], brooklyn_merged['Neighborhood'], brooklyn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Examine Clusters:

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. 

Cluster 1: 

This cluster contains a single neighborhood, Dyker Heights, whose most common venue is a park. A cluster of one neighborhood is an outlier rather than a representative group.

In [42]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 0, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

     Neighborhood 1st Most Common Venue 2nd Most Common Venue  \
35  Dyker Heights                  Park           Golf Course   

   3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue  \
35            Bagel Shop        Cosmetics Shop          Burger Joint   

   6th Most Common Venue 7th Most Common Venue 8th Most Common Venue  \
35            Food Truck            Food Stand            Food Court   

   9th Most Common Venue 10th Most Common Venue  
35     Food & Drink Shop                   Food  

Cluster 2: 

This cluster also contains a single neighborhood, Mill Island, whose most common venue is a pool, so it is likewise not very representative.

In [43]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 1, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

   Neighborhood 1st Most Common Venue 2nd Most Common Venue  \
30  Mill Island                  Pool         Women's Store   

   3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue  \
30           Flea Market                 Field   Filipino Restaurant   

   6th Most Common Venue 7th Most Common Venue 8th Most Common Venue  \
30     Fish & Chips Shop           Fish Market          Fishing Spot   

   9th Most Common Venue 10th Most Common Venue  
30         Fishing Store            Flower Shop  

Cluster 3: 

The most common venues in this cluster are restaurants of various types, including Latin American, fast food, Chinese, fried chicken, and Caribbean restaurants.

In [44]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 2, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

           Neighborhood      1st Most Common Venue 2nd Most Common Venue  \
10        East Flatbush              Moving Target           Supermarket   
14          Brownsville                 Restaurant   Fried Chicken Joint   
25        Cypress Hills  Latin American Restaurant  Fast Food Restaurant   
26        East New York       Fast Food Restaurant    Spanish Restaurant   
28             Canarsie         Chinese Restaurant                   Gym   
29            Flatlands                   Pharmacy   Fried Chicken Joint   
43           Ocean Hill              Deli / Bodega   Fried Chicken Joint   
47  Prospect Park South       Caribbean Restaurant           Pizza Place   
55              Wingate        Fried Chicken Joint            Donut Shop   
56                Rugby              Grocery Store         Deli / Bodega   
57       Remsen Village       Caribbean Restaurant  Fast Food Restaurant   
58             New Lots        Fried Chicken Joint           Pizza Place   
64    Broadw

Cluster 4: 

The most common venues in this cluster are restaurants and other food venues, including Italian restaurants, pizza places, Chinese restaurants, cafés, delis/bodegas, and a few coffee shops.

In [45]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 3, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

                 Neighborhood 1st Most Common Venue  \
0                   Bay Ridge    Italian Restaurant   
1                 Bensonhurst            Donut Shop   
2                 Sunset Park           Pizza Place   
3                  Greenpoint                   Bar   
4                   Gravesend                Bakery   
5              Brighton Beach    Russian Restaurant   
6              Sheepshead Bay          Dessert Shop   
7           Manhattan Terrace        Ice Cream Shop   
8                    Flatbush    Chinese Restaurant   
9               Crown Heights           Pizza Place   
11                 Kensington         Grocery Store   
12            Windsor Terrace                  Café   
13           Prospect Heights                   Bar   
15               Williamsburg           Pizza Place   
16                   Bushwick                   Bar   
17         Bedford Stuyvesant           Coffee Shop   
18           Brooklyn Heights           Yoga Studio   
19        

Cluster 5:

This cluster contains a single neighborhood, Paerdegat Basin, whose most common venue falls under the generic Food category, so it is not very representative.

In [46]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 4, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

       Neighborhood 1st Most Common Venue 2nd Most Common Venue  \
59  Paerdegat Basin                  Food      Asian Restaurant   

   3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue  \
59         Deli / Bodega      Business Service   Filipino Restaurant   

   6th Most Common Venue 7th Most Common Venue 8th Most Common Venue  \
59     Fish & Chips Shop           Fish Market          Fishing Spot   

   9th Most Common Venue 10th Most Common Venue  
59         Fishing Store            Flea Market  

Results/Discussion: 
    
The code above splits the Brooklyn neighborhoods into five clusters, shown on the color-coded map, and lists the most common venue types in each cluster. The following summarizes the number of neighborhoods and the most common venue type for each cluster.

Cluster 1 (1 neighborhood): Park

Cluster 2 (1 neighborhood): Pool

Cluster 3 (14 neighborhoods): Restaurants of various types, including Latin American, fast food, Chinese, fried chicken, and Caribbean restaurants.

Cluster 4 (53 neighborhoods): Restaurants and other food venues, including Italian restaurants, pizza places, Chinese restaurants, cafés, delis/bodegas, and a few coffee shops.

Cluster 5 (1 neighborhood): Food

Based on this information, I would recommend opening the coffee shop in one of the neighborhoods in cluster 3. The venue data for these neighborhoods show many restaurants of varying scale and price. Venue counts do not measure revenue, but this concentration of food businesses suggests that the community supports them, which is evidence that a coffee shop could also succeed there.
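
One way to check this recommendation against the data is to compare how common coffee shops and cafés already are in each cluster, since a high share suggests both demand and competition. The sketch below uses invented frequencies for four hypothetical neighborhoods. On the real data, demo_grouped would be brooklyn_grouped and demo_labels would be kmeans.labels_.

```python
import pandas as pd

# Stand-in for brooklyn_grouped (category frequencies); all values are invented
demo_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Coffee Shop': [0.10, 0.02, 0.00, 0.04],
    'Café': [0.07, 0.00, 0.01, 0.03],
})
demo_labels = pd.Series([3, 3, 2, 2], name='Cluster Labels')  # stand-in for kmeans.labels_

# combined share of coffee shops and cafés among each neighborhood's venues
coffee_share = demo_grouped[['Coffee Shop', 'Café']].sum(axis=1)

# summarize that share per cluster
by_cluster = coffee_share.groupby(demo_labels).agg(['mean', 'max', 'count'])
print(by_cluster)
```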