# Data Science Capstone - Battle of the Neighborhoods

## Problem
A client wishes to open a new pizza place in New York City and is looking for the best location to do so. The request is to maximize foot traffic while minimizing competition.

## Data
To fulfil the request, we will use a dataset of New York City neighborhood coordinates to segment the city into its neighborhoods, then use Foursquare data to count the pizza shops in each neighborhood.

In [3]:
#first, we will import all the libraries we will need
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-3.2.0               |           py36_0         770 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    ca-certificates-2019.9.11  |       hecc5488_0         144 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.2 MB

The following NEW packages will be 

In [4]:
#now we'll import the data for NYC's neighborhood coordinates
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Data downloaded!


In [5]:
#separating out the important data
neighborhoods_data = newyork_data['features']

#Let's look at a little of the data, just to make sure it imported properly
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
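The feature printed above follows the GeoJSON convention of storing a point as [longitude, latitude], which is easy to swap by accident. A minimal sketch of pulling one flat record out of such a feature is shown here. The helper name `feature_to_row` is hypothetical, and the sample dictionary is trimmed from the output above.

```python
def feature_to_row(feature):
    """Return one flat record for a GeoJSON point feature.

    Hypothetical helper, not part of the original notebook.
    """
    props = feature['properties']
    # GeoJSON points are ordered [longitude, latitude]
    lon, lat = feature['geometry']['coordinates']
    return {
        'Borough': props['borough'],
        'Neighborhood': props['name'],
        'Latitude': lat,
        'Longitude': lon,
    }


# sample trimmed from the first feature shown above
sample = {
    'type': 'Feature',
    'geometry': {'type': 'Point',
                 'coordinates': [-73.84720052054902, 40.89470517661]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}
row = feature_to_row(sample)
print(row)
```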

In [6]:
#convert the JSON file into a dataframe

column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe once; DataFrame.append was removed in pandas 2.0
neighborhoods = pd.DataFrame(rows, columns=column_names)
    
#make sure it worked by looking at the head of our new dataframe
neighborhoods.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---------|--------------|----------|-----------|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [7]:
#time to connect to Foursquare
CLIENT_ID = '41UVGBKCELUFGH5URUEESMYG0MDPAMQDWYB0RXERKYZNN1JC' # your Foursquare ID
CLIENT_SECRET = '4CQASKIGY3V2NKDDME3X1LCLZUR02SZ2ISWMQ1QZJFBWXAUW' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
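Hard-coding API credentials in a notebook exposes them to anyone who can read the file. One common alternative, sketched here as an assumption rather than as what this notebook does, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are hypothetical.

```python
import os


def load_credentials(env=None):
    """Read Foursquare credentials from a mapping (defaults to os.environ).

    Hypothetical helper; the variable names are assumptions.
    """
    env = os.environ if env is None else env
    required = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET')
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']


# explicit mapping so the sketch runs without real credentials
demo_id, demo_secret = load_credentials({
    'FOURSQUARE_CLIENT_ID': 'demo-id',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret',
})
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_credentials()` could then stand in for the two literal assignments.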


From Foursquare, we want a count of pizza places in each neighborhood, followed by a list of those places with their visit counts, which we will use to gauge the popularity of particular locations. We'll start at a high level, looking at the number of pizza places in each borough along with the total visits to all pizza shops in that borough. Once we select a borough, we will repeat the analysis for its neighborhoods.


In [20]:
categoryId = '4bf58dd8d48988d1ca941735'  #categoryId for Pizza Place
near = 'Manhattan_NY'

#build a search url for Manhattan
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&categoryId={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    near,
    categoryId)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=41UVGBKCELUFGH5URUEESMYG0MDPAMQDWYB0RXERKYZNN1JC&client_secret=4CQASKIGY3V2NKDDME3X1LCLZUR02SZ2ISWMQ1QZJFBWXAUW&v=20180605&near=Manhattan_NY&categoryId=4bf58dd8d48988d1ca941735'
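String formatting works here because none of the values need escaping, but a value containing a space or an ampersand would corrupt the query string. A sketch of the same URL built with the standard library's `urllib.parse.urlencode` follows. The function name `build_explore_url` and the placeholder credentials are hypothetical, and the `near` value with a space and a comma is used only to show the escaping.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, near, category_id):
    """Build an explore URL with percent-encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'near': near,
        'categoryId': category_id,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# placeholder credentials, for illustration only
demo_url = build_explore_url('demo-id', 'demo-secret', '20180605',
                             'Staten Island, NY', '4bf58dd8d48988d1ca941735')
print(demo_url)
```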

In [22]:
Manhattan = requests.get(url).json()


In [26]:
M_venues = Manhattan['response']['groups'][0]['items']
    
M_nearby_venues = json_normalize(M_venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.location.lat', 'venue.location.lng']
M_nearby_venues = M_nearby_venues.loc[:, filtered_columns]

# clean columns
M_nearby_venues.columns = [col.split(".")[-1] for col in M_nearby_venues.columns]

M_nearby_venues.head()

|   | name | lat | lng |
|---|------|-----|-----|
| 0 | B Side | 40.763986 | -73.988145 |
| 1 | Joe's Pizza | 40.733234 | -73.987672 |
| 2 | Marta | 40.744452 | -73.984675 |
| 3 | Joe's Pizza | 40.754679 | -73.987029 |
| 4 | New York Pizza Suprema | 40.750124 | -73.994992 |

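The flatten, filter, and rename steps above are repeated for every borough, so they can be wrapped in one function. The sketch below assumes pandas 1.0 or later, where `json_normalize` is available as `pd.json_normalize`. The helper name `venues_to_frame` is hypothetical, and the sample payload only imitates the nesting of the response, using the first row of the output above.

```python
import pandas as pd

FILTERED_COLUMNS = ['venue.name', 'venue.location.lat', 'venue.location.lng']


def venues_to_frame(response_json):
    """Flatten an explore response into a name/lat/lng dataframe."""
    items = response_json['response']['groups'][0]['items']
    frame = pd.json_normalize(items)          # nested keys become dotted columns
    frame = frame.loc[:, FILTERED_COLUMNS]    # keep only the columns we need
    # 'venue.location.lat' -> 'lat', and so on
    frame.columns = [col.split('.')[-1] for col in frame.columns]
    return frame


# stand-in payload with the same nesting as the API response
sample_response = {'response': {'groups': [{'items': [
    {'venue': {'name': 'B Side',
               'location': {'lat': 40.763986, 'lng': -73.988145}}},
]}]}}
sample_frame = venues_to_frame(sample_response)
print(sample_frame)
```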

In [29]:
#repeat the process for the other 4 boroughs
near = 'Bronx_NY'

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&categoryId={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    near,
    categoryId)

Bronx = requests.get(url).json()

B_venues = Bronx['response']['groups'][0]['items']
    
B_nearby_venues = json_normalize(B_venues) # flatten JSON

# filter columns
B_nearby_venues = B_nearby_venues.loc[:, filtered_columns]

# clean columns
B_nearby_venues.columns = [col.split(".")[-1] for col in B_nearby_venues.columns]

B_nearby_venues.head()

|   | name | lat | lng |
|---|------|-----|-----|
| 0 | Zero Otto Nove | 40.854714 | -73.888388 |
| 1 | Kingsbridge Social Club | 40.884545 | -73.901964 |
| 2 | Sam's Pizza | 40.879435 | -73.905859 |
| 3 | Louie & Ernie's Pizza | 40.83831 | -73.828785 |
| 4 | Nicks Pizza | 40.870352 | -73.846171 |

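The remaining boroughs repeat the same request with only `near` changed. A sketch of the same idea as a loop follows, storing each result in a dictionary keyed by borough so that no variable is reused by accident. `collect_by_borough` and `fetch_json` are hypothetical names: `fetch_json` stands for any callable that takes a `near` string and returns the parsed response, for example a small wrapper around `requests.get(url).json()`. The `fake_fetch` stub exists only so the loop can run offline.

```python
BOROUGHS = ['Manhattan_NY', 'Bronx_NY', 'Queens_NY', 'Staten_Island_NY', 'Brooklyn_NY']


def collect_by_borough(fetch_json, boroughs=BOROUGHS):
    """Call fetch_json once per borough and keep the venue items per borough."""
    results = {}
    for near in boroughs:
        payload = fetch_json(near)
        results[near] = payload['response']['groups'][0]['items']
    return results


def fake_fetch(near):
    """Offline stand-in for the API call, used only to exercise the loop."""
    return {'response': {'groups': [{'items': [
        {'venue': {'name': 'placeholder for ' + near}},
    ]}]}}


venues_by_borough = collect_by_borough(fake_fetch)
print(sorted(venues_by_borough))
```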

In [30]:
near = 'Queens_NY'

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&categoryId={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    near,
    categoryId)

Queens = requests.get(url).json()

Q_venues = Queens['response']['groups'][0]['items']
    
Q_nearby_venues = json_normalize(Q_venues) # flatten JSON

# filter columns
Q_nearby_venues = Q_nearby_venues.loc[:, filtered_columns]

# clean columns
Q_nearby_venues.columns = [col.split(".")[-1] for col in Q_nearby_venues.columns]

Q_nearby_venues.head()

|   | name | lat | lng |
|---|------|-----|-----|
| 0 | Levante | 40.747518 | -73.94159 |
| 1 | Houdini Kitchen Laboratory | 40.694077 | -73.902269 |
| 2 | sLICe | 40.743741 | -73.953689 |
| 3 | Retro Pizza Cafe | 40.758495 | -73.918077 |
| 4 | Rosa's Pizza | 40.712168 | -73.900103 |


In [33]:
near = 'Staten_Island_NY'

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&categoryId={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    near,
    categoryId)

Staten_Island = requests.get(url).json()

S_venues = Staten_Island['response']['groups'][0]['items']
    
S_nearby_venues = json_normalize(S_venues) # flatten JSON

# filter columns
S_nearby_venues = S_nearby_venues.loc[:, filtered_columns]

# clean columns
S_nearby_venues.columns = [col.split(".")[-1] for col in S_nearby_venues.columns]

S_nearby_venues.head()

|   | name | lat | lng |
|---|------|-----|-----|
| 0 | Brother's Pizza | 40.625096 | -74.14399 |
| 1 | Denino's Pizzeria Tavern | 40.630174 | -74.140226 |
| 2 | Lee's Tavern | 40.588978 | -74.09552 |
| 3 | Pizzeria Giove | 40.572256 | -74.113123 |
| 4 | Joe & Pat Pizzeria and Restaurant | 40.613046 | -74.122128 |


In [34]:
near = 'Brooklyn_NY'

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&categoryId={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    near,
    categoryId)

Brooklyn = requests.get(url).json()

# use a BK_ prefix so the Bronx results in B_venues / B_nearby_venues are not overwritten
BK_venues = Brooklyn['response']['groups'][0]['items']

BK_nearby_venues = json_normalize(BK_venues) # flatten JSON

# filter columns
BK_nearby_venues = BK_nearby_venues.loc[:, filtered_columns]

# clean columns
BK_nearby_venues.columns = [col.split(".")[-1] for col in BK_nearby_venues.columns]

BK_nearby_venues.head()

|   | name | lat | lng |
|---|------|-----|-----|
| 0 | Paulie Gee’s | 40.729801 | -73.95852 |
| 1 | Roberta's Pizza | 40.705015 | -73.933617 |
| 2 | Lucali | 40.681822 | -74.000352 |
| 3 | Barboncino | 40.672104 | -73.957412 |
| 4 | Emmy Squared | 40.712166 | -73.955705 |

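To compare boroughs, the per-borough frames can be stacked into one frame with a `Borough` column and then counted. The sketch below uses a few rows copied from the outputs above rather than live API results, and `combine_borough_frames` is a hypothetical helper. Because the explore endpoint caps the number of results returned per call (it takes a `limit` parameter), counts taken from a single request are likely a lower bound on the number of pizza places in a borough.

```python
import pandas as pd


def combine_borough_frames(frames):
    """Stack per-borough frames, tagging each row with its borough name."""
    labeled = [frame.assign(Borough=name) for name, frame in frames.items()]
    return pd.concat(labeled, ignore_index=True)


# a few rows copied from the outputs above, not a full result set
frames = {
    'Manhattan': pd.DataFrame({'name': ['B Side', "Joe's Pizza"],
                               'lat': [40.763986, 40.733234],
                               'lng': [-73.988145, -73.987672]}),
    'Bronx': pd.DataFrame({'name': ['Zero Otto Nove'],
                           'lat': [40.854714],
                           'lng': [-73.888388]}),
}
all_venues = combine_borough_frames(frames)

# number of venues returned per borough
counts = all_venues.groupby('Borough')['name'].count()
print(counts)
```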