# Capstone Project The Battle of Neighborhoods (Week 1)

## Problem Description
Nowadays, there are a lot of food bloggers in each and every city, who go around discovering places to fillup our stomach and heart everytime we feel hungry.<br>
So here I planned to group together places where you would find different genres of food establishments. <br>
I plan to use a density based clustering algorithm for this project, since I wouldn't know how many such clusters would form in each borough.

## Data Description
The dataset to be used would be the New-York dataset which was previously used in Optional Assignment of Week-2.<br>
The types of establishments I'll be targeting are (but not limited to) restaurants, cafes, bars, delis, bakeries, bistros and more.<br>
I plan segregate these places from others for each borough, and group them together. If possible, I may also provide a ranking for each cluster.

Loading the libarries needed for the project, alongwith my Foursquare API credentials

In [17]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


CLIENT_ID = 'GOB3JCHBYQXPYCANCHMHZFU1JD3JHVQ1JKTDUYKSZQRGG1IH' # your Foursquare ID
CLIENT_SECRET = 'TYXODU3E0FYDVNFBAECEEV2AMDBIK0NYPYRN5MH2FBKNFEVZ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print("Credentials Approved !")

Libraries imported.
Credentials Approved !


Loading the JSON file

In [18]:
#!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Data downloaded!


Creating a dataframe for the data

In [19]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

neighborhoods_data = newyork_data['features']

Loading data into the dataframe

In [20]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

Shuffling the rows within the dataframe<br>(This step is totally optional)<br>Then we view how the dataframe looks like

In [21]:
neighborhoods = neighborhoods.sample(frac=1).reset_index(drop=True)
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Queens,Hillcrest,40.723825,-73.797603
1,Staten Island,Great Kills,40.54948,-74.149324
2,Bronx,Kingsbridge Heights,40.870392,-73.901523
3,Queens,Bellaire,40.733014,-73.738892
4,Queens,Oakland Gardens,40.745619,-73.75495


In [22]:
LIMIT = 500 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

In [23]:
def getNearbyVenues(names, latitudes, longitudes, radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [24]:
ny_venues = getNearbyVenues(names=neighborhoods.Borough.unique(),
                                   latitudes=neighborhoods.Latitude,
                                   longitudes=neighborhoods.Longitude,
                                radius=radius)

Queens
Staten Island
Bronx
Brooklyn
Manhattan


In [25]:
ny_venues

Unnamed: 0,Neighborhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Queens,40.723825,-73.797603,D'Angelo Center,40.722717,-73.796049,College Academic Building
1,Queens,40.723825,-73.797603,Carnesecca Arena,40.724029,-73.794458,College Basketball Court
2,Queens,40.723825,-73.797603,Wo Kee Noodle,40.723045,-73.799786,Dim Sum Restaurant
3,Queens,40.723825,-73.797603,Baskin-Robbins,40.723131,-73.799519,Ice Cream Shop
4,Queens,40.723825,-73.797603,Dunkin' Donuts,40.722962,-73.799463,Donut Shop
5,Queens,40.723825,-73.797603,AT&T,40.7228,-73.8009,Mobile Phone Shop
6,Queens,40.723825,-73.797603,7-Eleven,40.723224,-73.800074,Convenience Store
7,Queens,40.723825,-73.797603,The UPS Store,40.725784,-73.792534,Shipping Store
8,Queens,40.723825,-73.797603,Starbucks,40.722781,-73.79608,Coffee Shop
9,Queens,40.723825,-73.797603,Chang Xin Foods Market Inc.,40.723056,-73.80055,Grocery Store
