
#Introduction
The USA is a large and ethnically diverse country, and its largest city, New York, has a long history of international immigration. People of Asian, European, African, and other origins make up a large share of the city's population. Cultural diversity brings a variety of preferences, opinions, backgrounds, and tastes. Our goal is to analyze the diversity and popularity of restaurants across the neighborhoods of New York in order to devise a business strategy and identify the most suitable neighborhood for the business. We will use the Foursquare Places API to find venues within a given radius of each neighborhood and collect their details. From these we identify the popular cuisines of the different neighborhoods, with the aim of establishing a fusion-style restaurant that caters to the tastes of a multi-ethnic population in a diverse neighborhood.

#Data
We require New York neighborhood data along with the latitude and longitude of each neighborhood. New York City has 5 boroughs and 306 neighborhoods. To segment and explore the neighborhoods, we need a dataset that lists the 5 boroughs, the neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood.

We will use this freely available dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

The dataset contains the locations of the neighborhoods in each borough in JSON format. We transform the JSON into a pandas dataframe holding the location data of all the neighborhoods. We then define a Foursquare API URL containing the credentials and use each neighborhood's coordinates to make requests. We define and transform the dataframes to select the features we need. Finally, we find the top 3 cuisines of each neighborhood and combine the results to find the most popular cuisines overall for a fusion restaurant.

For example, when making an API call to explore the surroundings of a given neighborhood, we collect the top venues within a given radius, group them by cuisine, and build a similar dataset for every neighborhood.
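
The "top 3 cuisines per neighborhood, then combine" step described above can be sketched with a pandas `groupby`. This is a minimal illustration, not the notebook's actual implementation: the rows are invented toy data, and the helper name `top_categories` is hypothetical. The column names mirror the venue dataframe built later in the notebook.

```python
import pandas as pd

# Toy venue rows, invented purely for illustration.
rows = (
    [("A", "Pizza Place")] * 4 + [("A", "Coffee Shop")] * 3
    + [("A", "Taco Place")] * 2 + [("A", "Diner")]
    + [("B", "Deli")] * 2 + [("B", "Bakery")]
)
venues = pd.DataFrame(rows, columns=["Neighborhood", "Venue Category"])


def top_categories(df, n=3):
    """Return the n most frequent venue categories in each neighborhood."""
    counts = (
        df.groupby(["Neighborhood", "Venue Category"])
        .size()
        .reset_index(name="Count")
        .sort_values(["Neighborhood", "Count"], ascending=[True, False])
    )
    # head(n) per group keeps the first n rows, i.e. the n highest counts
    return counts.groupby("Neighborhood").head(n).reset_index(drop=True)


top3 = top_categories(venues)

# Combine the per-neighborhood results into an overall ranking
overall = top3.groupby("Venue Category")["Count"].sum().sort_values(ascending=False)
print(top3)
print(overall)
```

Summing the per-neighborhood counts is only one way to combine the results; counting how many neighborhoods list a cuisine in their top 3 would be an equally reasonable choice.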

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [0]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)


In [0]:
neighborhoods_data = newyork_data['features']


In [4]:
neighborhoods_data[0]

{'geometry': {'coordinates': [-73.84720052054902, 40.89470517661],
  'type': 'Point'},
 'geometry_name': 'geom',
 'id': 'nyu_2451_34572.1',
 'properties': {'annoangle': 0.0,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661],
  'borough': 'Bronx',
  'name': 'Wakefield',
  'stacked': 1},
 'type': 'Feature'}

In [5]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [6]:
# collect one row per neighborhood, then build the dataframe in a single step
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build from the list instead
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
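
As an alternative to the explicit loop, the same table can be produced by flattening the feature list with `pd.json_normalize` (available in pandas 1.0 and later). The sketch below is an illustration under assumptions: `sample_features` is a hypothetical two-item stand-in for `neighborhoods_data`, trimmed to the fields used, and the second entry uses the rounded coordinates shown in the output above.

```python
import pandas as pd

# Hypothetical stand-in for neighborhoods_data, trimmed to the fields used.
sample_features = [
    {"geometry": {"coordinates": [-73.84720052054902, 40.89470517661]},
     "properties": {"borough": "Bronx", "name": "Wakefield"}},
    {"geometry": {"coordinates": [-73.829939, 40.874294]},
     "properties": {"borough": "Bronx", "name": "Co-op City"}},
]

# Nested dicts become dotted column names; the coordinate list stays a list.
flat = pd.json_normalize(sample_features)

table = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    # GeoJSON order is [longitude, latitude]
    "Latitude": flat["geometry.coordinates"].apply(lambda c: c[1]),
    "Longitude": flat["geometry.coordinates"].apply(lambda c: c[0]),
})
print(table)
```

The main thing to watch is the coordinate order: swapping the two indices silently places every neighborhood at the wrong location.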


In [7]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)
neighborhoods['Borough']

The dataframe has 5 boroughs and 306 neighborhoods.


0              Bronx
1              Bronx
2              Bronx
3              Bronx
4              Bronx
5              Bronx
6          Manhattan
7              Bronx
8              Bronx
9              Bronx
10             Bronx
11             Bronx
12             Bronx
13             Bronx
14             Bronx
15             Bronx
16             Bronx
17             Bronx
18             Bronx
19             Bronx
20             Bronx
21             Bronx
22             Bronx
23             Bronx
24             Bronx
25             Bronx
26             Bronx
27             Bronx
28             Bronx
29             Bronx
30             Bronx
31             Bronx
32             Bronx
33             Bronx
34             Bronx
35             Bronx
36             Bronx
37             Bronx
38             Bronx
39             Bronx
40             Bronx
41             Bronx
42             Bronx
43             Bronx
44             Bronx
45             Bronx
46          Brooklyn
47          B

In [8]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [9]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
 
    folium.Circle([lat, lng], radius=300,popup=label, color='blue', fill=False).add_to(map_newyork) 
    
map_newyork

In [10]:
CLIENT_ID = 'GUAY1GCI335YAPNHNMK5U3CXNCUM1JWM3W4OT1GL3EYQM5KB' # your Foursquare ID
CLIENT_SECRET = 'YVIS5XX2IWORL2FMQ2M3QZV5VZKN2PCRCVESRYJGJSY13LH1' # your Foursquare Secret
VERSION = '20200609' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: GUAY1GCI335YAPNHNMK5U3CXNCUM1JWM3W4OT1GL3EYQM5KB
CLIENT_SECRET:YVIS5XX2IWORL2FMQ2M3QZV5VZKN2PCRCVESRYJGJSY13LH1
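
Credentials written into a notebook cell end up in every shared copy and in the printed output. A common alternative is to read them from environment variables and print only a masked form. The sketch below is one possible approach, not part of the original notebook: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are all hypothetical.

```python
import os


def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables (hypothetical names)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first."
        )
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demonstration with a plain dict standing in for the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "abcdefgh", "FOURSQUARE_CLIENT_SECRET": "12345678"}
demo_id, demo_secret = load_credentials(demo_env)
print("CLIENT_ID: " + mask(demo_id))
print("CLIENT_SECRET: " + mask(demo_secret))
```

In the notebook itself, `CLIENT_ID, CLIENT_SECRET = load_credentials()` would replace the hard-coded assignments once the two variables are set in the runtime.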


In [11]:
LIMIT = 200 # limit of number of venues returned by Foursquare API

radius = 1000 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/search?categoryId=4d4b7105d754a06374d81259&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/search?categoryId=4d4b7105d754a06374d81259&client_id=GUAY1GCI335YAPNHNMK5U3CXNCUM1JWM3W4OT1GL3EYQM5KB&client_secret=YVIS5XX2IWORL2FMQ2M3QZV5VZKN2PCRCVESRYJGJSY13LH1&v=20200609&ll=40.7127281,-74.0060152&radius=1000&limit=200'
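
The request URL can also be assembled from a dictionary of query parameters, which keeps each parameter visible and handles escaping. The following is a minimal sketch using only the standard library; the helper name `build_search_url` and the placeholder credentials are hypothetical, while the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, version, lat, lng, radius, limit,
                     category_id="4d4b7105d754a06374d81259"):
    """Build a venues/search URL from named query parameters."""
    params = {
        "categoryId": category_id,
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


# Placeholder credentials, for illustration only.
demo_url = build_search_url("ID", "SECRET", "20200609",
                            40.7127281, -74.0060152, 1000, 200)
print(demo_url)
```

Note that `urlencode` percent-encodes the comma in `ll` as `%2C`, so the resulting string differs textually from the hand-formatted URL even though it carries the same parameters.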

In [0]:
results = requests.get(url).json()
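
Before using the parsed response, it can help to check that the request succeeded. The sketch below validates the `meta.code` field and returns the venue list; it is an illustration under assumptions, with a hypothetical helper name `extract_venues`, and it assumes the response layout shown in this notebook's output (`meta.code`, `response.venues`). The `errorDetail` field is an assumption about what a failed response may carry.

```python
def extract_venues(payload):
    """Return the venue list from a parsed response, or raise if the call failed."""
    meta = payload.get("meta", {})
    code = meta.get("code")
    if code != 200:
        # errorDetail is assumed to be present on failures; None is shown otherwise
        raise ValueError(
            "Foursquare request failed: code={}, detail={}".format(
                code, meta.get("errorDetail")
            )
        )
    return payload.get("response", {}).get("venues", [])


# Hand-written payloads that mimic the shape of the real response.
ok_payload = {"meta": {"code": 200},
              "response": {"venues": [{"name": "Los Tacos No. 1"}]}}
bad_payload = {"meta": {"code": 429, "errorDetail": "Quota exceeded"}}

print(extract_venues(ok_payload))
try:
    extract_venues(bad_payload)
except ValueError as err:
    print(err)
```

With a helper like this, a quota or authentication failure surfaces as a clear error instead of a `KeyError` further down the notebook.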


In [0]:
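def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # rows from the 'explore' endpoint nest the field under 'venue.'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']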

In [15]:
results

{'meta': {'code': 200, 'requestId': '5edf7bc669babe001b81206f'},
 'response': {'confident': False,
  'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/taco_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d151941735',
      'name': 'Taco Place',
      'pluralName': 'Taco Places',
      'primary': True,
      'shortName': 'Tacos'}],
    'hasPerk': False,
    'id': '5d5f24ec09484500079aee00',
    'location': {'address': '136 Church St',
     'cc': 'US',
     'city': 'New York',
     'country': 'United States',
     'distance': 287,
     'formattedAddress': ['136 Church St',
      'New York, NY 10007',
      'United States'],
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.714267,
       'lng': -74.008756}],
     'lat': 40.714267,
     'lng': -74.008756,
     'postalCode': '10007',
     'state': 'NY'},
    'name': 'Los Tacos No. 1',
    'referralId': 'v-1591704643'},
   {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net

In [0]:
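def get_category_id(row):
    """Return the id of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # rows from the 'explore' endpoint nest the field under 'venue.'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['id']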

In [17]:
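venues = results['response']['venues']
k = json_normalize(venues) # flatten the nested JSON into columns
filt_columns = ['id', 'name', 'categories', 'location.lat', 'location.lng']
k_venues = k.loc[:, filt_columns].copy() # copy to avoid SettingWithCopyWarning
# extract the category id first, because the next line overwrites 'categories'
k_venues['place categories'] = k_venues.apply(get_category_id, axis=1)
k_venues['categories'] = k_venues.apply(get_category_type, axis=1)
# strip the 'location.' prefix from the column names
k_venues.columns = [col.split(".")[-1] for col in k_venues.columns]

k_venues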



  


Unnamed: 0,id,name,categories,lat,lng,place categories
0,5d5f24ec09484500079aee00,Los Tacos No. 1,Taco Place,40.714267,-74.008756,4bf58dd8d48988d151941735
1,49ccd495f964a52091591fe3,Kaffe 1668,Coffee Shop,40.715045,-74.011509,4bf58dd8d48988d1e0931735
2,5c6f03f30802d4002c16884c,Joe’s Pizza,Pizza Place,40.710318,-74.007694,4bf58dd8d48988d1ca941735
3,4b7de017f964a52049d82fe3,Starbucks,Coffee Shop,40.710922,-74.010284,4bf58dd8d48988d1e0931735
4,4ea0afbf9adf1e334e4cc0e6,Laughing Man Coffee & Tea,Coffee Shop,40.717394,-74.010103,4bf58dd8d48988d1e0931735
5,4afd9156f964a520a82822e3,Burger King,Fast Food Restaurant,40.709677,-74.011887,4bf58dd8d48988d16e941735
6,4b0c402af964a520c33923e3,Stage Door Delicatessen,Sandwich Place,40.711796,-74.010027,4bf58dd8d48988d1c5941735
7,4a56a32ff964a52090b51fe3,Jubilee Marketplace,Grocery Store,40.708241,-74.006468,4bf58dd8d48988d118951735
8,5d754979ea4bf40007c080c4,Tonii’s Fresh Rice Noodle,Chinese Restaurant,40.715671,-73.999074,4bf58dd8d48988d145941735
9,5d39f3d5052ad00008304116,Taco Bell Cantina,Taco Place,40.708428,-74.004985,4bf58dd8d48988d151941735


In [0]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe of food venues within `radius` meters of each neighborhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL (categoryId restricts the search to one category)
        url = 'https://api.foursquare.com/v2/venues/search?categoryId=4d4b7105d754a06374d81259&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['venues']

        # keep only the relevant information for each nearby venue;
        # a venue with no categories gets None instead of raising IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['name'],
            v['location']['lat'],
            v['location']['lng'],
            v['categories'][0]['name'] if v['categories'] else None) for v in results])

    # flatten the per-neighborhood lists into one dataframe
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues

In [47]:
# note: despite its name, manhattan_venues covers the neighborhoods of all five boroughs
manhattan_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

In [50]:
print(manhattan_venues.shape)
manhattan_venues

(10471, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Pitman Deli,40.894149,-73.845748,Food
1,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
2,Wakefield,40.894705,-73.847201,Dunkin',40.890459,-73.849089,Donut Shop
3,Wakefield,40.894705,-73.847201,Cooler Runnings Jamaican Restaurant Inc,40.898083,-73.850259,Caribbean Restaurant
4,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
5,Wakefield,40.894705,-73.847201,Chef Central,40.891625,-73.844531,Diner
6,Wakefield,40.894705,-73.847201,New China Gardens,40.897796,-73.853388,Asian Restaurant
7,Wakefield,40.894705,-73.847201,Cool Running Restaurant,40.898399,-73.84881,Food
8,Wakefield,40.894705,-73.847201,Louis Pizza,40.898399,-73.84881,Pizza Place
9,Wakefield,40.894705,-73.847201,Baychester Avenue Food Truck,40.890479,-73.842725,Food Truck
