# IBM Data Science Capstone project
This notebook is the capstone project built as part of the IBM Data Science Certification by Coursera.

# Introduction:

## Business Problem:

Several important factors must be considered before opening a restaurant in a city. The stakeholders in this problem are entrepreneurs who want to launch a new restaurant in New Delhi. The objective of this notebook is to suggest a suitable location in New Delhi for a restaurant of a given cuisine.

In this notebook the cuisine is Italian.

So what are the important characteristics of a good location for a restaurant?

- Visibility of the restaurant. An ideal location is in a well-known market with good customer footfall, so that people visiting the area see the restaurant without seeking it out.
- Location should not already have many restaurants for the same cuisine.
- Availability of parking space.

# Dataset Description: 

This notebook uses the Foursquare API to get location data for venues in New Delhi. The key features of the dataset are:

- Neighborhoods in New Delhi
- Latitude and longitude for each neighborhood
- Names of Venues
- Latitude and longitude values for each venue
- Venue categories

Latitude and longitude values for each neighborhood are obtained with the __geopy__ library. These coordinates are then passed to the Foursquare API to fetch venue details.

Venue details are fetched for each neighborhood in New Delhi and include the venue name, coordinates and category. The venues are then filtered to keep only restaurants.


# Methodology:

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas 1.0 or later)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Let's define the neighborhoods of New Delhi, as obtained from Wikipedia.

In [2]:
new_delhi_neighs = [
    "Barakhamba Road", "Chanakyapuri", "Connaught Place",
    "Gautampuri", "Gole Market", "Golf Links",
    "INA Colony", "Khan Market", "Laxmibai Nagar",
    "Pragati Maidan",
]

## Get the coordinates of Connaught Place using geopy

In [4]:
# geocode a single neighborhood in New Delhi

neighborhood_address = 'Connaught Place, New Delhi, India'

geolocator = Nominatim(user_agent="delhi_explorer")
location = geolocator.geocode(neighborhood_address)
neighborhood_latitude = location.latitude
neighborhood_longitude = location.longitude
print('The geographical coordinates of Connaught Place, New Delhi are {}, {}.'.format(neighborhood_latitude, neighborhood_longitude))

The geographical coordinates of Connaught Place, New Delhi are 28.6313827, 77.2197924.
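geopy's `geocode` returns `None` when an address cannot be resolved, in which case reading `location.latitude` raises an `AttributeError`. The following is a minimal sketch of a guard against that. The helper name `geocode_or_none` and the two stub classes are hypothetical; the stubs exist only so the example runs without network access.

```python
from typing import Optional, Tuple


def geocode_or_none(geolocator, address: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for an address, or None if it is not found.

    `geolocator` is any object with a geopy-style `geocode(address)` method.
    """
    location = geolocator.geocode(address)
    if location is None:  # nothing matched the address
        return None
    return (location.latitude, location.longitude)


# Hypothetical stand-ins for a geopy geocoder, used only for this demonstration.
class _StubLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


class _StubGeocoder:
    def __init__(self, known):
        self.known = known

    def geocode(self, address):
        coords = self.known.get(address)
        return _StubLocation(*coords) if coords else None


stub = _StubGeocoder({'Connaught Place, New Delhi, India': (28.6313827, 77.2197924)})
found = geocode_or_none(stub, 'Connaught Place, New Delhi, India')
missing = geocode_or_none(stub, 'Nowhere, New Delhi, India')
print(found, missing)
```

In the notebook, the same helper could be called with the real `Nominatim` instance in place of the stub.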


## Define Foursquare credentials

In [5]:
CLIENT_ID = 'LGQEEPA4FF4WWTQF2IZD2VKT5RGGY2RSPQE22WQNFE3B10VC' # your Foursquare ID
CLIENT_SECRET = '5LB5MHDYW25FFTP4RRELXW35GX3NU0NWM1FS5VCVLXGZANLW' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: LGQEEPA4FF4WWTQF2IZD2VKT5RGGY2RSPQE22WQNFE3B10VC
CLIENT_SECRET:5LB5MHDYW25FFTP4RRELXW35GX3NU0NWM1FS5VCVLXGZANLW
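Hard-coding and printing API secrets in a notebook exposes them to anyone who reads it. One common alternative, sketched here, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper names are assumptions made for illustration.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read Foursquare credentials from a mapping of environment variables.

    The variable names used here are assumptions, not a Foursquare convention.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + '*' * (len(secret) - visible)


# Demonstration with a fake mapping; real code would call
# load_foursquare_credentials() with no arguments.
demo_id, demo_secret = load_foursquare_credentials(
    {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'})
print('CLIENT_ID: ' + mask(demo_id))
```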


## Build the Foursquare API URL for getting venues around Connaught Place

In [6]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=LGQEEPA4FF4WWTQF2IZD2VKT5RGGY2RSPQE22WQNFE3B10VC&client_secret=5LB5MHDYW25FFTP4RRELXW35GX3NU0NWM1FS5VCVLXGZANLW&v=20180605&ll=28.6313827,77.2197924&radius=500&limit=100'
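As an alternative to formatting the query string by hand, the parameters can be kept in a dictionary and encoded with the standard library. This is a sketch under that assumption; `build_explore_url` is a hypothetical helper and the credentials shown are placeholders. Note that `urlencode` percent-encodes the comma in `ll`.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL from named parameters instead of positional formatting."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials, coordinates taken from the geocoding step above.
demo_url = build_explore_url('ID', 'SECRET', '20180605', 28.6313827, 77.2197924, 500, 100)
print(demo_url)
```

With `requests`, the same dictionary can also be passed directly as `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.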

## Call the URL and get venues around Connaught Place

In [7]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ecfe56e60ba08001be5e358'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'N.D. Charge 1',
  'headerFullLocation': 'N.D. Charge 1, Delhi',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 71,
  'suggestedBounds': {'ne': {'lat': 28.635882704500006,
    'lng': 77.22490974853339},
   'sw': {'lat': 28.626882695499994, 'lng': 77.21467505146661}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b489b54f964a520595026e3',
       'name': 'Connaught Place | कनॉट प्लेस (Connaught Place)',
       'location': {'address': 'Connaught Place',
        'crossStreet': 'Many streets meet here',
        'lat': 28.6327309443951,
        'lng': 77.22

In [8]:
# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the dataframe
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

Now we are ready to load the venue data into a pandas dataframe.

In [9]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# replace the list of category dicts in each row with the first category name
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Connaught Place \| कनॉट प्लेस (Connaught Place) | Plaza | 28.632731 | 77.220018 |
| 1 | Rajdhani Thali | Indian Restaurant | 28.629999 | 77.220401 |
| 2 | Fabindia | Clothing Store | 28.632012 | 77.217729 |
| 3 | Johnny Rockets | Bistro | 28.630457 | 77.219594 |
| 4 | Farzi Cafe | Molecular Gastronomy Restaurant | 28.632581 | 77.221125 |


## Let's filter the dataset to contain only restaurants

In [10]:
restaurants = nearby_venues[nearby_venues['categories'].str.contains('restaurant', case=False, na=False)].reset_index(drop=True)

In [11]:
restaurants.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Rajdhani Thali | Indian Restaurant | 28.629999 | 77.220401 |
| 1 | Farzi Cafe | Molecular Gastronomy Restaurant | 28.632581 | 77.221125 |
| 2 | Pind Balluchi | North Indian Restaurant | 28.630318 | 77.2176 |
| 3 | HOTEL SARAVANA BHAVAN | South Indian Restaurant | 28.632319 | 77.216445 |
| 4 | Nando's | Portuguese Restaurant | 28.630947 | 77.219721 |
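`get_category_type` returns `None` for a venue with no categories, and `str.contains` yields a missing value for such rows, which cannot be used as a boolean mask. Passing `na=False` treats them as non-matches. A small sketch on made-up rows (the third venue is invented to show the missing-category case):

```python
import pandas as pd

# Two rows taken from the output above plus one invented row without a category.
demo = pd.DataFrame({
    'name': ['Rajdhani Thali', 'Fabindia', 'Unknown Spot'],
    'categories': ['Indian Restaurant', 'Clothing Store', None],
})

# na=False makes rows with a missing category count as "not a restaurant".
is_restaurant = demo['categories'].str.contains('restaurant', case=False, na=False)
demo_restaurants = demo[is_restaurant].reset_index(drop=True)
print(demo_restaurants)
```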


Let's add the Neighborhood column.

In [12]:
restaurants['Neighborhood'] = "Connaught Place"
restaurants.head()

| | name | categories | lat | lng | Neighborhood |
|---|---|---|---|---|---|
| 0 | Rajdhani Thali | Indian Restaurant | 28.629999 | 77.220401 | Connaught Place |
| 1 | Farzi Cafe | Molecular Gastronomy Restaurant | 28.632581 | 77.221125 | Connaught Place |
| 2 | Pind Balluchi | North Indian Restaurant | 28.630318 | 77.2176 | Connaught Place |
| 3 | HOTEL SARAVANA BHAVAN | South Indian Restaurant | 28.632319 | 77.216445 | Connaught Place |
| 4 | Nando's | Portuguese Restaurant | 28.630947 | 77.219721 | Connaught Place |


## Let's create a function to repeat the above process for all neighborhoods in New Delhi

In [13]:
def getNearbyVenues(names, radius=500):
    
    venues_list=[]
    
    for name in names:        
        
        print('Fetching venues in: ',name)        
        neighborhood_address = '{}, New Delhi, India'.format(name)
        geolocator = Nominatim(user_agent="delhi_explorer")
        location = geolocator.geocode(neighborhood_address)
        lat = location.latitude
        lng = location.longitude
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()
#         print(results)
        
        try:
            results = results["response"]['groups'][0]['items']
        except (KeyError, IndexError):
            print('The following error occurred while fetching venues for {}'.format(name))
            print(results)
            continue
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
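The list comprehension in `getNearbyVenues` indexes `categories[0]`, which raises an `IndexError` for any venue that comes back with an empty category list. A minimal sketch of a more defensive extraction follows. The helper names and the `'Uncategorized'` default label are assumptions, and the sample items are made up to mimic the response structure shown earlier.

```python
def first_category_name(venue, default='Uncategorized'):
    """Return the first category name of a venue dict, or a default label."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else default


def flatten_items(neighborhood, lat, lng, items):
    """Turn explore-style items into rows shaped like those built in getNearbyVenues."""
    rows = []
    for item in items:
        venue = item['venue']
        rows.append((
            neighborhood, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            first_category_name(venue),
        ))
    return rows


# Made-up items: the second one has no category on purpose.
sample_items = [
    {'venue': {'name': 'Rajdhani Thali',
               'location': {'lat': 28.629999, 'lng': 77.220401},
               'categories': [{'name': 'Indian Restaurant'}]}},
    {'venue': {'name': 'Unlabelled Spot',
               'location': {'lat': 28.63, 'lng': 77.22},
               'categories': []}},
]
rows = flatten_items('Connaught Place', 28.6313827, 77.2197924, sample_items)
for row in rows:
    print(row)
```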

In [14]:
# Fetch venues for all New Delhi neighborhoods into a single DataFrame
new_dl_venues = getNearbyVenues(new_delhi_neighs)

new_dl_venues.head()

Fetching venues in:  Barakhamba Road
Fetching venues in:  Chanakyapuri
Fetching venues in:  Connaught Place
Fetching venues in:  Gautampuri
Fetching venues in:  Gole Market
Fetching venues in:  Golf Links
Fetching venues in:  INA Colony
Fetching venues in:  Khan Market
Fetching venues in:  Laxmibai Nagar
Fetching venues in:  Pragati Maidan


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Barakhamba Road | 28.629589 | 77.225138 | Tamasha | 28.629663 | 77.221835 | Gastropub |
| 1 | Barakhamba Road | 28.629589 | 77.225138 | Rajdhani Thali | 28.629999 | 77.220401 | Indian Restaurant |
| 2 | Barakhamba Road | 28.629589 | 77.225138 | Chew | 28.632002 | 77.222706 | Asian Restaurant |
| 3 | Barakhamba Road | 28.629589 | 77.225138 | Cha Bar \| चा बार | 28.63092 | 77.222194 | Tea Room |
| 4 | Barakhamba Road | 28.629589 | 77.225138 | Barbeque Nation | 28.630253 | 77.220985 | BBQ Joint |


Let's count the venues in each category across New Delhi.

In [15]:
new_dl_venues["Venue Category"].value_counts()

Indian Restaurant                  16
Café                               15
Bar                                10
Chinese Restaurant                  8
Coffee Shop                         7
Hotel                               5
Asian Restaurant                    5
BBQ Joint                           4
Lounge                              4
Pub                                 3
Fast Food Restaurant                3
Italian Restaurant                  3
Market                              3
Restaurant                          3
Gastropub                           2
Clothing Store                      2
Train Station                       2
Bakery                              2
Tea Room                            2
Plaza                               2
Deli / Bodega                       2
Portuguese Restaurant               2
Dessert Shop                        2
South Indian Restaurant             2
Salon / Barbershop                  2
Bookstore                           2
Museum      

## Analyze each Neighborhood

In [16]:
# One hot encoding
new_dl_onehot = pd.get_dummies(new_dl_venues["Venue Category"], prefix="", prefix_sep="")

#Add Neighborhood column
new_dl_onehot["Neighborhood"] = new_dl_venues["Neighborhood"]

#Move Neighborhood column to first position
fixed_columns = [new_dl_onehot.columns[-1]] + list(new_dl_onehot.columns[:-1])

new_dl_onehot = new_dl_onehot[fixed_columns]
new_dl_onehot.head()

Unnamed: 0,Neighborhood,ATM,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,Beer Garden,Bistro,Bookstore,Boutique,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Deli / Bodega,Dessert Shop,Diner,Donut Shop,Eastern European Restaurant,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Truck,French Restaurant,Furniture / Home Store,Gastropub,Golf Course,Historic Site,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jazz Club,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Molecular Gastronomy Restaurant,Multicuisine Indian Restaurant,Museum,North Indian Restaurant,Northeast Indian Restaurant,Performing Arts Venue,Plaza,Portuguese Restaurant,Pub,Restaurant,Salon / Barbershop,Sandwich Place,Snack Place,South Indian Restaurant,Spa,Tea Room,Theater,Trail,Train Station,Women's Store
0,Barakhamba Road,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Barakhamba Road,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Barakhamba Road,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Barakhamba Road,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,Barakhamba Road,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## Now group the venues by Neighborhood

In [17]:
new_dl_venues_grouped = new_dl_onehot.groupby('Neighborhood').mean().reset_index()
new_dl_venues_grouped.head(10)

Unnamed: 0,Neighborhood,ATM,Art Gallery,Asian Restaurant,BBQ Joint,Bakery,Bar,Beer Garden,Bistro,Bookstore,Boutique,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Deli / Bodega,Dessert Shop,Diner,Donut Shop,Eastern European Restaurant,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Truck,French Restaurant,Furniture / Home Store,Gastropub,Golf Course,Historic Site,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jazz Club,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Molecular Gastronomy Restaurant,Multicuisine Indian Restaurant,Museum,North Indian Restaurant,Northeast Indian Restaurant,Performing Arts Venue,Plaza,Portuguese Restaurant,Pub,Restaurant,Salon / Barbershop,Sandwich Place,Snack Place,South Indian Restaurant,Spa,Tea Room,Theater,Trail,Train Station,Women's Store
0,Barakhamba Road,0.0,0.0,0.1,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0
1,Chanakyapuri,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0
2,Connaught Place,0.0,0.0,0.014085,0.028169,0.014085,0.084507,0.014085,0.014085,0.0,0.0,0.084507,0.070423,0.014085,0.042254,0.028169,0.014085,0.014085,0.014085,0.0,0.0,0.014085,0.014085,0.014085,0.0,0.014085,0.014085,0.0,0.0,0.014085,0.0,0.0,0.028169,0.0,0.014085,0.140845,0.014085,0.0,0.028169,0.0,0.014085,0.0,0.042254,0.0,0.0,0.0,0.0,0.014085,0.014085,0.014085,0.014085,0.0,0.0,0.014085,0.014085,0.028169,0.0,0.014085,0.0,0.0,0.028169,0.0,0.014085,0.0,0.0,0.0,0.014085
3,Gautampuri,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Gole Market,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0
5,Golf Links,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0
6,INA Colony,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
7,Khan Market,0.0,0.0,0.065217,0.021739,0.0,0.065217,0.0,0.0,0.043478,0.021739,0.173913,0.065217,0.021739,0.086957,0.0,0.021739,0.0,0.0,0.021739,0.0,0.0,0.021739,0.0,0.021739,0.0,0.0,0.021739,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.043478,0.0,0.021739,0.021739,0.0,0.0,0.0,0.021739,0.021739,0.021739,0.021739,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.021739,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Laxmibai Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
9,Pragati Maidan,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
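Taking the mean of one-hot columns per neighborhood gives the share of that neighborhood's venues that fall in each category, which is why every row above sums to 1. A toy sketch with made-up neighborhoods `A` and `B` illustrates this, with `pd.crosstab(..., normalize='index')` shown as an equivalent shortcut:

```python
import pandas as pd

# Made-up data: neighborhood A has four venues, B has two.
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Hotel', 'Indian Restaurant', 'Hotel', 'Hotel'],
})

# One-hot encode the category, then average per neighborhood.
onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])
shares = onehot.groupby('Neighborhood').mean()

# Equivalent result in one step: row-normalized contingency table.
shares_alt = pd.crosstab(toy['Neighborhood'], toy['Venue Category'], normalize='index')

print(shares)
```

Here `A` gets 0.5 for Café and 0.25 each for Hotel and Indian Restaurant, while `B` gets 1.0 for Hotel.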


In [18]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Let's create a new DataFrame and display the top 10 venue categories for each neighborhood.

In [19]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = new_dl_venues_grouped['Neighborhood']

for ind in np.arange(new_dl_venues_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(new_dl_venues_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(20)

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barakhamba Road | Indian Restaurant | Hotel Bar | Gastropub | Asian Restaurant | Tea Room | BBQ Joint | Bar | Salon / Barbershop | Historic Site | Flea Market |
| 1 | Chanakyapuri | Trail | Performing Arts Venue | Women's Store | Flea Market | Donut Shop | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Food |
| 2 | Connaught Place | Indian Restaurant | Café | Bar | Chinese Restaurant | Lounge | Coffee Shop | Pub | South Indian Restaurant | Hotel | Deli / Bodega |
| 3 | Gautampuri | ATM | Diner | Historic Site | Golf Course | Gastropub | Furniture / Home Store | French Restaurant | Food Truck | Food & Drink Shop | Food |
| 4 | Gole Market | Hotel | Fabric Shop | Theater | Indian Restaurant | Bakery | Snack Place | Fast Food Restaurant | Café | Japanese Restaurant | Food |
| 5 | Golf Links | Golf Course | Spa | Women's Store | Flea Market | Donut Shop | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Food & Drink Shop |
| 6 | INA Colony | Restaurant | Train Station | Northeast Indian Restaurant | Market | Women's Store | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Flea Market |
| 7 | Khan Market | Café | Coffee Shop | Asian Restaurant | Chinese Restaurant | Bar | Indian Restaurant | Bookstore | Mexican Restaurant | Dessert Shop | Market |
| 8 | Laxmibai Nagar | Restaurant | Train Station | Indian Restaurant | Market | Women's Store | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Flea Market |
| 9 | Pragati Maidan | Hotel | Light Rail Station | Art Gallery | Plaza | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Flea Market | Women's Store |
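For neighborhoods with few venues, the table is padded with categories whose share is zero. Chanakyapuri, for example, has only two non-zero shares in the grouped table (Trail and Performing Arts Venue, 0.5 each), yet ten categories are listed for it. The sketch below is one possible variant of `return_most_common_venues` that keeps only categories actually present. The function name `top_present_categories` is hypothetical, and the toy row reproduces just a few of Chanakyapuri's columns.

```python
import pandas as pd


def top_present_categories(row, num_top_venues):
    """Return up to num_top_venues category names whose share is above zero."""
    shares = row.iloc[1:].astype(float)  # skip the Neighborhood label
    present = shares[shares > 0].sort_values(ascending=False)
    return present.index.tolist()[:num_top_venues]


# A shortened version of the Chanakyapuri row from the grouped table.
toy_row = pd.Series({
    'Neighborhood': 'Chanakyapuri',
    'ATM': 0.0,
    'Performing Arts Venue': 0.5,
    'Trail': 0.5,
    "Women's Store": 0.0,
})
top = top_present_categories(toy_row, 10)
print(top)
```

The result has two entries instead of ten, which makes it clearer that categories such as Women's Store do not actually occur in that neighborhood's data.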


## The table above helps answer the following question from our business problem
Does the location already have many restaurants for the same cuisine?

Indian Restaurant is the most common venue category in Barakhamba Road and Connaught Place and appears in the top 10 of three other neighborhoods, so a new Indian restaurant would face plenty of existing competition.
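The same question can be asked directly for the target cuisine by counting matching venues per neighborhood. This is a minimal sketch: `cuisine_counts` is a hypothetical helper, the column names follow `new_dl_venues`, and the rows are made up for illustration.

```python
import pandas as pd


def cuisine_counts(venues, cuisine):
    """Count venues whose category mentions the cuisine, per neighborhood.

    Neighborhoods without any match are reported with a count of 0.
    """
    is_cuisine = venues['Venue Category'].str.contains(
        cuisine, case=False, na=False, regex=False)
    counts = venues[is_cuisine].groupby('Neighborhood').size()
    all_neighborhoods = venues['Neighborhood'].unique()
    return counts.reindex(all_neighborhoods, fill_value=0).sort_values()


# Made-up venues in three placeholder neighborhoods.
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'B', 'C'],
    'Venue Category': ['Italian Restaurant', 'Café', 'Hotel', 'Indian Restaurant', None],
})
counts = cuisine_counts(toy_venues, 'Italian')
print(counts)
```

On the notebook's data, the call would be `cuisine_counts(new_dl_venues, 'Italian')`; neighborhoods at the top of the result have the least direct competition.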


## Let's cluster the neighborhoods

We will use clustering, an unsupervised machine learning technique, to group neighborhoods with similar venue profiles. Neighborhoods are clustered on the share of each venue category in the grouped table above, and we then look for the cluster that best matches our criteria for an Italian restaurant.

__K-means__ is used because it is a simple and widely used clustering algorithm.

In [20]:
# Set number of clusters
k = 5

# Drop the Neighborhood column: it is a string and cannot be used as a k-means feature
new_dl_venues_grouped_clustering = new_dl_venues_grouped.drop(columns='Neighborhood')

# Run k-means clustering
kmeans = KMeans(n_clusters=k, random_state=0).fit(new_dl_venues_grouped_clustering)

# Check the cluster labels generated for the first 10 rows
kmeans.labels_[0:10]


array([1, 2, 1, 0, 1, 4, 3, 1, 3, 1])
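The value `k = 5` is set by hand, and with only 10 neighborhoods it leaves three clusters containing a single neighborhood. One way to compare candidate values of `k`, sketched here on made-up two-dimensional points rather than the notebook's data, is the silhouette score from scikit-learn. The helper name `silhouette_by_k` is hypothetical, and `n_init=10` is set explicitly as an assumption so the result does not depend on the library version's default.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores


# Two well-separated made-up groups of three points each.
toy_features = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],
])
scores = silhouette_by_k(toy_features, range(2, 5))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The silhouette score is only defined when the number of clusters is between 2 and one less than the number of samples, so for the 10 neighborhoods here the candidate range has to stay below 10.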

In [21]:
# Add Clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)

new_dl_neigh = new_dl_venues[['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude']].drop_duplicates().reset_index(drop=True)
new_dl_neigh.head(20)

new_dl_neigh_merged = new_dl_neigh.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
new_dl_neigh_merged.head(20)

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Cluster Label | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barakhamba Road | 28.629589 | 77.225138 | 1 | Indian Restaurant | Hotel Bar | Gastropub | Asian Restaurant | Tea Room | BBQ Joint | Bar | Salon / Barbershop | Historic Site | Flea Market |
| 1 | Chanakyapuri | 28.594677 | 77.188521 | 2 | Trail | Performing Arts Venue | Women's Store | Flea Market | Donut Shop | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Food |
| 2 | Connaught Place | 28.631383 | 77.219792 | 1 | Indian Restaurant | Café | Bar | Chinese Restaurant | Lounge | Coffee Shop | Pub | South Indian Restaurant | Hotel | Deli / Bodega |
| 3 | Gautampuri | 28.51157 | 77.302623 | 0 | ATM | Diner | Historic Site | Golf Course | Gastropub | Furniture / Home Store | French Restaurant | Food Truck | Food & Drink Shop | Food |
| 4 | Gole Market | 28.633719 | 77.205627 | 1 | Hotel | Fabric Shop | Theater | Indian Restaurant | Bakery | Snack Place | Fast Food Restaurant | Café | Japanese Restaurant | Food |
| 5 | Golf Links | 28.59597 | 77.231163 | 4 | Golf Course | Spa | Women's Store | Flea Market | Donut Shop | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Food & Drink Shop |
| 6 | INA Colony | 28.573514 | 77.209754 | 3 | Restaurant | Train Station | Northeast Indian Restaurant | Market | Women's Store | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Flea Market |
| 7 | Khan Market | 28.600135 | 77.226491 | 1 | Café | Coffee Shop | Asian Restaurant | Chinese Restaurant | Bar | Indian Restaurant | Bookstore | Mexican Restaurant | Dessert Shop | Market |
| 8 | Laxmibai Nagar | 28.575419 | 77.205109 | 3 | Restaurant | Train Station | Indian Restaurant | Market | Women's Store | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Flea Market |
| 9 | Pragati Maidan | 28.623475 | 77.242528 | 1 | Hotel | Light Rail Station | Art Gallery | Plaza | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Flea Market | Women's Store |


## Finally, let's visualize the resulting clusters

In [22]:
# Let's create a map of neighborhoods centered on New Delhi

# Get the coordinates of New Delhi
address = 'New Delhi, India'
geolocator = Nominatim(user_agent="delhi_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New Delhi are {}, {}.'.format(latitude, longitude))

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(new_dl_neigh_merged['Neighborhood Latitude'], new_dl_neigh_merged['Neighborhood Longitude'], new_dl_neigh_merged['Neighborhood'], new_dl_neigh_merged['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to k-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

The geographical coordinates of New Delhi are 28.6141793, 77.2022662.


## Let's examine the clusters now

Cluster 0

In [23]:
new_dl_neigh_merged.loc[new_dl_neigh_merged['Cluster Label'] == 0, new_dl_neigh_merged.columns[[0] + list(range(4, new_dl_neigh_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Gautampuri | ATM | Diner | Historic Site | Golf Course | Gastropub | Furniture / Home Store | French Restaurant | Food Truck | Food & Drink Shop | Food |


Cluster 1

In [24]:
new_dl_neigh_merged.loc[new_dl_neigh_merged['Cluster Label'] == 1, new_dl_neigh_merged.columns[[0] + list(range(4, new_dl_neigh_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barakhamba Road | Indian Restaurant | Hotel Bar | Gastropub | Asian Restaurant | Tea Room | BBQ Joint | Bar | Salon / Barbershop | Historic Site | Flea Market |
| 2 | Connaught Place | Indian Restaurant | Café | Bar | Chinese Restaurant | Lounge | Coffee Shop | Pub | South Indian Restaurant | Hotel | Deli / Bodega |
| 4 | Gole Market | Hotel | Fabric Shop | Theater | Indian Restaurant | Bakery | Snack Place | Fast Food Restaurant | Café | Japanese Restaurant | Food |
| 7 | Khan Market | Café | Coffee Shop | Asian Restaurant | Chinese Restaurant | Bar | Indian Restaurant | Bookstore | Mexican Restaurant | Dessert Shop | Market |
| 9 | Pragati Maidan | Hotel | Light Rail Station | Art Gallery | Plaza | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Flea Market | Women's Store |


Cluster 2

In [26]:
new_dl_neigh_merged.loc[new_dl_neigh_merged['Cluster Label'] == 2, new_dl_neigh_merged.columns[[0] + list(range(4, new_dl_neigh_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Chanakyapuri | Trail | Performing Arts Venue | Women's Store | Flea Market | Donut Shop | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Food |


Cluster 3

In [27]:
new_dl_neigh_merged.loc[new_dl_neigh_merged['Cluster Label'] == 3, new_dl_neigh_merged.columns[[0] + list(range(4, new_dl_neigh_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | INA Colony | Restaurant | Train Station | Northeast Indian Restaurant | Market | Women's Store | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Flea Market |
| 8 | Laxmibai Nagar | Restaurant | Train Station | Indian Restaurant | Market | Women's Store | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Flea Market |


Cluster 4

In [28]:
new_dl_neigh_merged.loc[new_dl_neigh_merged['Cluster Label'] == 4, new_dl_neigh_merged.columns[[0] + list(range(4, new_dl_neigh_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Golf Links | Golf Course | Spa | Women's Store | Flea Market | Donut Shop | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Food & Drink Shop |
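The five cells above repeat the same selection with a different label each time. As a sketch of an alternative, a single helper can split the merged table by cluster label. The name `neighborhoods_by_cluster` is hypothetical, and the sample frame reuses a few rows and only the first venue column from the merged table.

```python
import pandas as pd


def neighborhoods_by_cluster(merged, label_column='Cluster Label'):
    """Return {cluster label: rows of that cluster}, without coordinates or label."""
    hidden = {'Neighborhood Latitude', 'Neighborhood Longitude', label_column}
    keep = [col for col in merged.columns if col not in hidden]
    return {int(label): group[keep] for label, group in merged.groupby(label_column)}


# A few rows from the merged table, shortened to one venue column.
sample = pd.DataFrame({
    'Neighborhood': ['Barakhamba Road', 'Gautampuri', 'INA Colony', 'Laxmibai Nagar'],
    'Neighborhood Latitude': [28.629589, 28.51157, 28.573514, 28.575419],
    'Neighborhood Longitude': [77.225138, 77.302623, 77.209754, 77.205109],
    'Cluster Label': [1, 0, 3, 3],
    '1st Most Common Venue': ['Indian Restaurant', 'ATM', 'Restaurant', 'Restaurant'],
})

clusters = neighborhoods_by_cluster(sample)
for label, table in clusters.items():
    print('Cluster', label)
    print(table.to_string(index=False))
```

On the notebook's data the call would be `neighborhoods_by_cluster(new_dl_neigh_merged)`.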


## Discussion

From the above examination of the clusters, the neighborhoods in Cluster 1 look most suitable for opening a restaurant, as this cluster resembles the restaurant hub of New Delhi. It should have good customer footfall, and a restaurant here is likely to be noticed.

In [68]:
new_dl_neigh_merged.loc[new_dl_neigh_merged['Cluster Label'] == 1, new_dl_neigh_merged.columns[[0] + list(range(4, new_dl_neigh_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barakhamba Road | Indian Restaurant | Historic Site | Tea Room | Asian Restaurant | BBQ Joint | Hotel Bar | Bar | Salon / Barbershop | Gastropub | Food |
| 2 | Connaught Place | Indian Restaurant | Café | Bar | Chinese Restaurant | Coffee Shop | Lounge | BBQ Joint | South Indian Restaurant | Pub | Clothing Store |
| 4 | Gole Market | Hotel | Theater | Fabric Shop | Fast Food Restaurant | Snack Place | Bakery | Indian Restaurant | Café | Japanese Restaurant | Flea Market |
| 7 | Khan Market | Café | Coffee Shop | Chinese Restaurant | Asian Restaurant | Bar | Hotel | Bookstore | Indian Restaurant | Italian Restaurant | Lounge |
| 9 | Pragati Maidan | Hotel | Art Gallery | Plaza | Light Rail Station | Flea Market | Eastern European Restaurant | Fabric Shop | Falafel Restaurant | Fast Food Restaurant | Food |


The first two neighborhoods, Barakhamba Road and Connaught Place, are dominated by Indian restaurants, so an Italian restaurant might not perform well there.

On the other hand, Gole Market and Pragati Maidan have hotels as the most common venue, which suggests a higher share of tourists, including foreign visitors. This could work in favor of an Italian restaurant, which could attract both locals and foreigners in these neighborhoods.

## Conclusion

From the above discussion and analysis, __Gole Market__ appears to be the most suitable neighborhood for opening an Italian restaurant, for the following reasons:
1. It is a market area that already draws a good number of visitors daily, so a restaurant here is likely to be noticed and should need little advertising.
2. Being a market area, it probably already has shared parking nearby, although the data used here does not verify this.
3. Hotels are its most common venue, followed by theaters, fabric shops, restaurants and cafés, so both tourists and locals are likely to notice an Italian restaurant, which means more business opportunities.
4. No Italian restaurant appears among its top 10 venue categories, so ours would stand out and give customers something new to try.


Thanks for taking the time to read this notebook. I hope you find the analysis meaningful!