# <font color=red> Battle of Neighborhoods Capstone Assignment </font>
***

## <font color=blue>Introduction & Background </font>

### Background and Problem Description
Foursquare is a localized search engine which provides results to users based on location input. It operates on large datasets of specific co-ordinates relating to various venues. Foursquare powers many other mobile applications like Apple Maps, Uber, Snapchat etc.

In this project, we use Foursquare to determine which neighborhood in Manhattan, NY has the fewest venues listed on Foursquare. The result will be useful for people who want to live in the city but prefer a quieter lifestyle.
It may also be useful for individuals who want to open a business with as few competitors as possible.

### Data Description and Problem Solving
As in the previous lab, location data on New York City's neighborhoods will be imported into the notebook.
The location data will be processed, cleaned and filtered down to Manhattan. Once the location data is prepared, we will query Foursquare's API for each neighborhood and record the number of venues it returns.
The neighborhood with the fewest venues answers our question.
For further insight, the neighborhoods will be sorted in ascending order of venue count, in case the neighborhood with the fewest venues does not suit the stakeholder's needs.

## <font color=blue> Methodology  </font>

### 1. Download and Organize Manhattan Data

First, install and import the required libraries:

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')


Download and explore the dataset:

In [3]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')

Data downloaded!


In [4]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
neighborhoods_data = newyork_data['features']
    
neighborhoods_data[0]

{'geometry': {'coordinates': [-73.84720052054902, 40.89470517661],
  'type': 'Point'},
 'geometry_name': 'geom',
 'id': 'nyu_2451_34572.1',
 'properties': {'annoangle': 0.0,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661],
  'borough': 'Bronx',
  'name': 'Wakefield',
  'stacked': 1},
 'type': 'Feature'}

Transform the data into a DataFrame:

In [5]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [6]:
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

# Check borough and neighborhoods of the data
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.
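
As a side note, `DataFrame.append` was removed in pandas 2.0, so the row-by-row loop above only runs on older pandas versions. A minimal sketch of an alternative, assuming the same GeoJSON layout as `neighborhoods_data[0]`, is to collect plain dictionaries first. The helper name `features_to_rows` is hypothetical and not part of the original notebook.

```python
def features_to_rows(features):
    """Flatten GeoJSON point features into plain dict rows (hypothetical helper)."""
    rows = []
    for feature in features:
        # GeoJSON stores coordinates as [longitude, latitude]
        lon, lat = feature['geometry']['coordinates']
        rows.append({
            'Borough': feature['properties']['borough'],
            'Neighborhood': feature['properties']['name'],
            'Latitude': lat,
            'Longitude': lon,
        })
    return rows


# Sample feature copied from the first record shown earlier in the notebook
sample = [{
    'geometry': {'coordinates': [-73.84720052054902, 40.89470517661], 'type': 'Point'},
    'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
}]
rows = features_to_rows(sample)
print(rows[0])
```

The resulting list can then be passed to pandas in a single call, for example `pd.DataFrame(rows, columns=column_names)`, which avoids growing the DataFrame one row at a time.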


In [7]:
# Check the head of the data
neighborhoods.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---------|--------------|----------|-----------|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


Get the coordinates of NYC from the geopy library:

In [8]:
address = 'New York City, NY'

# Nominatim expects an application-specific user agent; the value here is arbitrary
geolocator = Nominatim(user_agent='ny_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7308619, -73.9871558.


Select only the neighborhoods in Manhattan for analysis and obtain the borough's geographical coordinates.

In [9]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

address = 'Manhattan, NY'

# Nominatim expects an application-specific user agent; the value here is arbitrary
geolocator = Nominatim(user_agent='ny_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7900869, -73.9598295.


Let's create a map of Manhattan showing all the neighborhoods for venue analysis.

In [18]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

### 2. Explore Neighborhoods with Foursquare

Define Foursquare credentials:

In [11]:
CLIENT_ID = 'NNSDLLJBQUZHHNIPUDNF4IZCZXKDQYM4C3BK5MRCLQ0BEEIS' # your Foursquare ID
CLIENT_SECRET = 'FXTVJ3E3I3BZ2S0453NMGXKGBS3NQCD2T1EEF5FTMEMALUNO' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: NNSDLLJBQUZHHNIPUDNF4IZCZXKDQYM4C3BK5MRCLQ0BEEIS
CLIENT_SECRET:FXTVJ3E3I3BZ2S0453NMGXKGBS3NQCD2T1EEF5FTMEMALUNO
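
Hard-coding and printing API credentials makes them visible to anyone who reads the notebook. One possible alternative, sketched below, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions made for this illustration, not something Foursquare prescribes.

```python
import os

# Hypothetical environment variable names chosen for this sketch
ID_VAR = 'FOURSQUARE_CLIENT_ID'
SECRET_VAR = 'FOURSQUARE_CLIENT_SECRET'


def load_credentials(env=None):
    """Return (client_id, client_secret) from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    missing = [name for name in (ID_VAR, SECRET_VAR) if not env.get(name)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return env[ID_VAR], env[SECRET_VAR]


# Demonstration with a plain dict so the sketch runs without real credentials
demo_env = {ID_VAR: 'demo-id', SECRET_VAR: 'demo-secret'}
client_id, client_secret = load_credentials(demo_env)
print('CLIENT_ID loaded:', bool(client_id))
```

In the notebook itself, `CLIENT_ID, CLIENT_SECRET = load_credentials()` would then replace the literal strings, and the secret would not need to be printed.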


Create a function to explore neighborhood venues in Manhattan:

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
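
The request URL in the function above is assembled with string formatting. A minimal sketch of the same step using the standard library's `urlencode` is shown below; it takes the limit as an explicit argument instead of reading the global `LIMIT`. The helper name `build_explore_url` and the placeholder credentials are hypothetical, while the endpoint and parameter names are the ones used in the function above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore request URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials; the coordinates are Marble Hill's from the data above
url = build_explore_url('demo-id', 'demo-secret', '20180605', 40.876551, -73.91066)
query = parse_qs(urlparse(url).query)
print(query['ll'], query['radius'], query['limit'])
```

Passing `radius` and `limit` as arguments keeps the function independent of notebook-level variables, so it can be defined and tested before those variables exist.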

Create a DataFrame to store Manhattan venues by running the above function with a limit of 100 venues per neighborhood within a 500 m radius.

In [13]:
LIMIT = 100
radius = 500

manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


Check the size of the data frame and print the head.

In [14]:
print(manhattan_venues.shape)
manhattan_venues.head()

(3311, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|--------------|-----------------------|------------------------|-------|----------------|-----------------|----------------|
| 0 | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 1 | Marble Hill | 40.876551 | -73.91066 | Bikram Yoga | 40.876844 | -73.906204 | Yoga Studio |
| 2 | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 3 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.877531 | -73.905582 | Coffee Shop |
| 4 | Marble Hill | 40.876551 | -73.91066 | Land & Sea Restaurant | 40.877885 | -73.905873 | Seafood Restaurant |
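
Each row above comes from a nested JSON response, and `getNearbyVenues` indexes into it directly, so a missing group or an empty category list would raise an exception. The sketch below is a more defensive extraction that assumes the same response layout the function relies on (`response` → `groups[0]` → `items` → `venue`). The helper name `extract_venue_rows` and the second, uncategorized venue in the sample payload are made up for illustration.

```python
def extract_venue_rows(payload, neighborhood):
    """Turn one explore response into flat rows, tolerating missing fields."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        rows.append({
            'Neighborhood': neighborhood,
            'Venue': venue.get('name'),
            'Venue Latitude': location.get('lat'),
            'Venue Longitude': location.get('lng'),
            # Fall back to None when a venue has no category
            'Venue Category': categories[0]['name'] if categories else None,
        })
    return rows


# First venue mirrors row 0 of the table above; the second is a hypothetical edge case
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': "Arturo's",
               'location': {'lat': 40.874412, 'lng': -73.910271},
               'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Uncategorized Example',
               'location': {'lat': 40.876, 'lng': -73.91},
               'categories': []}},
]}]}}

venue_rows = extract_venue_rows(sample_payload, 'Marble Hill')
print(len(venue_rows), venue_rows[0]['Venue Category'], venue_rows[1]['Venue Category'])
```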


Now we need to check how many venues were returned for each neighborhood and sort them in ascending order.

In [15]:
venue_count = manhattan_venues.groupby('Neighborhood').count()
venue_count = venue_count[['Venue']].sort_values('Venue')
venue_count

| Neighborhood | Venue |
|--------------|-------|
| Stuyvesant Town | 19 |
| Marble Hill | 24 |
| Roosevelt Island | 27 |
| Manhattanville | 40 |
| Morningside Heights | 42 |
| Central Harlem | 43 |
| East Harlem | 44 |
| Inwood | 58 |
| Hudson Yards | 58 |
| Hamilton Heights | 59 |
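
One caveat of counting with `groupby` is that a neighborhood for which the API returned no venues at all has no rows in `manhattan_venues`, so it would be absent from the result instead of appearing with a count of zero. The sketch below shows one way to keep such neighborhoods, using only the standard library. The helper name `count_venues` and the toy data, including the neighborhood `Empty Example`, are hypothetical.

```python
from collections import Counter


def count_venues(rows, all_neighborhoods):
    """Count venues per neighborhood, keeping neighborhoods with zero venues."""
    counts = Counter(row['Neighborhood'] for row in rows)
    pairs = [(name, counts.get(name, 0)) for name in all_neighborhoods]
    # Ascending by count, then by name for a stable, readable order
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))


# Toy data for illustration only; these are not the real venue counts
toy_rows = [
    {'Neighborhood': 'Marble Hill'}, {'Neighborhood': 'Marble Hill'},
    {'Neighborhood': 'Chinatown'}, {'Neighborhood': 'Chinatown'},
    {'Neighborhood': 'Chinatown'},
]
toy_names = ['Marble Hill', 'Chinatown', 'Empty Example']
ranking = count_venues(toy_rows, toy_names)
print(ranking)
```

In this analysis all 40 Manhattan neighborhoods returned at least 19 venues, so the `groupby` result is complete, but the check becomes relevant if the radius is reduced.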


## <font color=blue> Results </font>

The 10 neighborhoods with the fewest venues are as follows:

In [16]:
venue_count.head(10)

| Neighborhood | Venue |
|--------------|-------|
| Stuyvesant Town | 19 |
| Marble Hill | 24 |
| Roosevelt Island | 27 |
| Manhattanville | 40 |
| Morningside Heights | 42 |
| Central Harlem | 43 |
| East Harlem | 44 |
| Inwood | 58 |
| Hudson Yards | 58 |
| Hamilton Heights | 59 |


In [17]:
venue_count.describe()

|       | Venue |
|-------|-------|
| count | 40.0 |
| mean | 82.775 |
| std | 26.531537 |
| min | 19.0 |
| 25% | 62.75 |
| 50% | 100.0 |
| 75% | 100.0 |
| max | 100.0 |


The descriptive statistics show that the mean number of venues per neighborhood was approximately 83, while the median was 100, which is also the maximum because each request was capped at 100 venues. Since the mean lies below the median, the distribution is negatively skewed (a tail of low counts), with a large standard deviation of about 27 venues. The neighborhood with the fewest venues was Stuyvesant Town, followed by Marble Hill and Roosevelt Island.
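
Because results are capped at 100 venues per request, it helps to report how many neighborhoods reach the cap alongside the mean and median. Comparing mean and median is only a rule of thumb for the direction of skew: a mean below the median suggests a tail of low values. The sketch below illustrates this with made-up counts; the helper name `summarize_counts` and the toy list are assumptions, not the notebook's data.

```python
from statistics import mean, median


def summarize_counts(counts, cap=100):
    """Summarize venue counts and flag how many values sit at the request cap."""
    avg = mean(counts)
    mid = median(counts)
    if avg < mid:
        skew = 'negative'   # tail of low counts
    elif avg > mid:
        skew = 'positive'   # tail of high counts
    else:
        skew = 'none'
    return {
        'mean': avg,
        'median': mid,
        'at_cap': sum(1 for c in counts if c >= cap),
        'skew_hint': skew,
    }


# Toy counts for illustration only
summary = summarize_counts([19, 24, 100, 100, 100])
print(summary)
```

Applied to the real counts, for example `summarize_counts(venue_count['Venue'].tolist())`, the `at_cap` value would show how much of the distribution is hidden behind the limit.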

## <font color=blue> Discussion </font>

Based on the results, we can recommend a list of neighborhoods for someone who wants to live in a quieter part of Manhattan or to set up a business without much competition. Stuyvesant Town has the fewest venues of any neighborhood, so it is the quietest and the most likely to have the least business competition. Marble Hill follows with just 5 more venues, and Roosevelt Island with 8 more. The number of venues per request was capped at 100; this cap does not affect our study, since we are only searching for the neighborhoods with the fewest venues. Another thing to note is that the search areas of some neighborhoods may overlap, so the same venue can be counted for more than one neighborhood. The number of venues unique to each neighborhood may therefore be lower than reported.
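
Whether two search areas overlap can be estimated from the distance between neighborhood centers: two circles of radius 500 m overlap when their centers are less than 1000 m apart. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km. The helper names are hypothetical, and the second and third points are artificial offsets from Marble Hill's coordinates, not real neighborhood centers.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def circles_overlap(p, q, radius_m=500):
    """True when two search circles of the same radius share some area."""
    return haversine_m(*p, *q) < 2 * radius_m


marble_hill = (40.876551, -73.91066)
near_point = (40.877551, -73.91066)   # artificial point 0.001 degrees north
far_point = (40.896551, -73.91066)    # artificial point 0.02 degrees north

print(round(haversine_m(*marble_hill, *near_point), 1))
print(circles_overlap(marble_hill, near_point), circles_overlap(marble_hill, far_point))
```

Running `circles_overlap` over every pair of rows in `manhattan_data` would indicate which neighborhoods are likely to share venues in their counts.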

## <font color=blue> Conclusion </font>

Overall, the aim of the study was achieved, and the results provide useful data to stakeholders. The 10 neighborhoods with the fewest venues were obtained from the Foursquare API and are displayed above.