<h1>Analysis of Populous Cities of India to find cities with similar restaurant culture</h1>

There are many things one has to consider and plan accordingly when venturing into the Restaurant Business. Some of the important steps involved are deciding the concept of your restaurant, getting investors to fund your restaurant business, evaluating restaurant costs involved, and deciding the location of your restaurant.

Selecting a suitable city for a particular type of restaurant or expansion of an existing restaurant group is a crucial step. The existing restaurant culture of that city influences the acceptance of new restaurants, the inflow of customers, and local competition. Although the restaurant business is growing in India, eating-out culture is not so popular in villages and less populous cities.

So, through this project, we are aiming to find a solution to the problem – "which cities of India are suitable for establishing a particular type of chain of restaurants or for expansion of an already existing restaurant group to other cities and restaurant categories."

**1) Importing the necessary packages and modules** 

In [1]:
import pandas as pd
import requests
! pip install BeautifulSoup4
from bs4 import BeautifulSoup #for scraping data from html pages
! pip install tabulate
from tabulate import tabulate
! pip install lxml



In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.21.0               |             py_0          58 KB  conda-forge
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.2 MB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-1.21.0-py_0

The following packages will be UPDATED:

  openssl                                 1.1.1f-h516909a_0 --> 1.1.1g-h51

**2) Collecting the list of populous cities in India from a HTML page(Wikipedia)**

In [3]:
res = requests.get("https://en.wikipedia.org/wiki/List_of_cities_in_India_by_population")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0] 
df1 = pd.read_html(str(table))[0]

In [4]:
df1

Unnamed: 0,Rank,City,Population(2011)[3],Population(2001),State or union territory
0,1,Mumbai,12442373,11978450,Maharashtra
1,2,Delhi,11007835,9879172,Delhi
2,3,Bangalore,8436675,4301326,Karnataka
3,4,Hyderabad,6809970,3637483,Telangana
4,5,Ahmedabad,5570585,3520085,Gujarat
5,6,Chennai,4681087,4343645,Tamil Nadu
6,7,Kolkata,4486679,4572876,West Bengal
7,8,Surat,4467797,2433835,Gujarat
8,9,Pune,3124458,2538473,Maharashtra
9,10,Jaipur,3046163,2322575,Rajasthan


**3) Extracting the necessary data and cleaning**

In [5]:
#extracting city data from dataframe and removing external links from columns
df1['City'] = df1['City'].str.replace(r"\[.*\]", "")

In [6]:
#top 45 populous cities in india
df1=df1.head(45)

In [7]:
indian_city= df1[['City']].copy()
indian_city

Unnamed: 0,City
0,Mumbai
1,Delhi
2,Bangalore
3,Hyderabad
4,Ahmedabad
5,Chennai
6,Kolkata
7,Surat
8,Pune
9,Jaipur


**4) Adding location coordinates to each city**

In [11]:

indian_cities=[]
for i in range(len(indian_city)):
    address = indian_city.loc[i, "City"] 
    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    indian_cities.append([address, latitude,longitude])

In [12]:
indian_cities= pd.DataFrame(indian_cities, columns=["City", "latitude","longitude"])

In [13]:
indian_cities

Unnamed: 0,City,latitude,longitude
0,Mumbai,18.938771,72.835335
1,Delhi,28.651718,77.221939
2,Bangalore,12.97912,77.5913
3,Hyderabad,17.388786,78.461065
4,Ahmedabad,23.021624,72.579707
5,Chennai,13.080172,80.283833
6,Kolkata,22.545412,88.356775
7,Surat,45.9383,3.2553
8,Pune,18.521428,73.854454
9,Jaipur,26.916194,75.820349


**5) Foursquare credentials for data retrieval**

In [14]:
CLIENT_ID = 'U4ABPKYTJD0LOAAOK33XZTGWEYQ4X3BQ22MQEUPWIVLZO3ST' # Foursquare ID
CLIENT_SECRET = 'ZJDMFCB2AOJQEAHHZ33RU1U0L2M0M0MOV0ZR2D2BCK2HU0PR' #  Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: U4ABPKYTJD0LOAAOK33XZTGWEYQ4X3BQ22MQEUPWIVLZO3ST
CLIENT_SECRET:ZJDMFCB2AOJQEAHHZ33RU1U0L2M0M0MOV0ZR2D2BCK2HU0PR


In [15]:
LIMIT=100

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=60000):
    
     
    restaurants_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?query=restaurant&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        restaurants_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for restaurants_list in restaurants_list for item in restaurants_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [17]:
indian_restaurants = getNearbyVenues(names=indian_cities['City'],
                                   latitudes=indian_cities['latitude'],
                                   longitudes=indian_cities['longitude']
                                  )


Mumbai
Delhi
Bangalore
Hyderabad
Ahmedabad
Chennai
Kolkata
Surat
Pune
Jaipur
Kanpur
Nagpur
Lucknow
Visakhapatnam
Thane
Bhopal
Indore
Pimpri-Chinchwad
Patna
Vadodara
Ghaziabad
Ludhiana
Agra
Nashik
Faridabad
Meerut
Rajkot
Kalyan-Dombivli
Vasai-Virar
Varanasi
Srinagar
Aurangabad
Dhanbad
Amritsar
Navi Mumbai
Allahabad
Howrah
Ranchi
Gwalior
Jabalpur
Coimbatore
Vijayawada
Jodhpur
Madurai
Raipur


**6) Venue details of each City**

In [18]:
print(indian_restaurants.shape)
indian_restaurants.head()

(2261, 7)


Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mumbai,18.938771,72.835335,Food for Thought,18.932031,72.831667,Café
1,Mumbai,18.938771,72.835335,Café Mondegar,18.924219,72.832106,Café
2,Mumbai,18.938771,72.835335,Trishna,18.928619,72.832356,Seafood Restaurant
3,Mumbai,18.938771,72.835335,Narayan Dosa,18.957445,72.813251,Fast Food Restaurant
4,Mumbai,18.938771,72.835335,Shree Thaker Bhojnalay,18.951217,72.828326,Indian Restaurant


In [19]:
indian_restaurants.groupby('City').count()

Unnamed: 0_level_0,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
City,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agra,41,41,41,41,41,41
Ahmedabad,70,70,70,70,70,70
Allahabad,9,9,9,9,9,9
Amritsar,49,49,49,49,49,49
Aurangabad,10,10,10,10,10,10
Bangalore,81,81,81,81,81,81
Bhopal,25,25,25,25,25,25
Chennai,91,91,91,91,91,91
Coimbatore,72,72,72,72,72,72
Delhi,100,100,100,100,100,100


**7) segmenting and visualizing each venue in terms of venue category**

In [20]:
# one hot encoding
indian_restaurants_onehot = pd.get_dummies(indian_restaurants[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
indian_restaurants_onehot['City'] = indian_restaurants['City'] 

# move neighborhood column to the first column
fixed_columns = [indian_restaurants_onehot.columns[-1]] + list(indian_restaurants_onehot.columns[:-1])
indian_restaurants_onehot = indian_restaurants_onehot[fixed_columns]

indian_restaurants_onehot.head()

Unnamed: 0,City,Afghan Restaurant,African Restaurant,American Restaurant,Andhra Restaurant,Asian Restaurant,Awadhi Restaurant,BBQ Joint,Bagel Shop,Bakery,Bengali Restaurant,Bistro,Brasserie,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burmese Restaurant,Cafeteria,Café,Chinese Restaurant,Comfort Food Restaurant,Creperie,Deli / Bodega,Dhaba,Diner,Donut Shop,Dumpling Restaurant,English Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gujarati Restaurant,Hot Dog Joint,Hyderabadi Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Irani Cafe,Italian Restaurant,Japanese Restaurant,Karnataka Restaurant,Kebab Restaurant,Korean Restaurant,Lebanese Restaurant,Maharashtrian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,Mughlai Restaurant,Multicuisine Indian Restaurant,North Indian Restaurant,Northeast Indian Restaurant,Pakistani Restaurant,Parsi Restaurant,Persian Restaurant,Pizza Place,Portuguese Restaurant,Punjabi Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,South Indian Restaurant,Southern / Soul Food Restaurant,Steakhouse,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Turkish Restaurant,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Mumbai,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Mumbai,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Mumbai,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,Mumbai,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Mumbai,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [21]:
indian_restaurants_onehot.shape

(2261, 85)

**8) Mean value of the data values groupedby City**

In [22]:
indian_restaurants_grouped = indian_restaurants_onehot.groupby('City').mean().reset_index()
indian_restaurants_grouped

Unnamed: 0,City,Afghan Restaurant,African Restaurant,American Restaurant,Andhra Restaurant,Asian Restaurant,Awadhi Restaurant,BBQ Joint,Bagel Shop,Bakery,Bengali Restaurant,Bistro,Brasserie,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burmese Restaurant,Cafeteria,Café,Chinese Restaurant,Comfort Food Restaurant,Creperie,Deli / Bodega,Dhaba,Diner,Donut Shop,Dumpling Restaurant,English Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gujarati Restaurant,Hot Dog Joint,Hyderabadi Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Irani Cafe,Italian Restaurant,Japanese Restaurant,Karnataka Restaurant,Kebab Restaurant,Korean Restaurant,Lebanese Restaurant,Maharashtrian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,Mughlai Restaurant,Multicuisine Indian Restaurant,North Indian Restaurant,Northeast Indian Restaurant,Pakistani Restaurant,Parsi Restaurant,Persian Restaurant,Pizza Place,Portuguese Restaurant,Punjabi Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,South Indian Restaurant,Southern / Soul Food Restaurant,Steakhouse,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Turkish Restaurant,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Agra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.121951,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.073171,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.365854,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.073171,0.02439,0.0,0.0,0.0,0.0,0.073171,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0
1,Ahmedabad,0.0,0.0,0.014286,0.0,0.0,0.0,0.014286,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.014286,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.228571,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.0,0.071429,0.0,0.071429,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0
2,Allahabad,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Amritsar,0.0,0.0,0.040816,0.0,0.020408,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.020408,0.081633,0.0,0.0,0.183673,0.040816,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.081633,0.020408,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.183673,0.0,0.0,0.020408,0.020408,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0
4,Aurangabad,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bangalore,0.012346,0.0,0.012346,0.012346,0.024691,0.0,0.0,0.0,0.049383,0.0,0.0,0.0,0.0,0.024691,0.061728,0.0,0.0,0.111111,0.0,0.0,0.024691,0.012346,0.0,0.012346,0.0,0.0,0.0,0.0,0.049383,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.259259,0.0,0.0,0.024691,0.012346,0.012346,0.0,0.0,0.0,0.0,0.0,0.024691,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.0,0.0,0.049383,0.0,0.024691,0.012346,0.037037,0.012346,0.0,0.012346,0.012346,0.0,0.0,0.0,0.0,0.0,0.037037,0.0
6,Bhopal,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Chennai,0.0,0.010989,0.0,0.0,0.021978,0.0,0.032967,0.0,0.021978,0.0,0.010989,0.0,0.0,0.021978,0.021978,0.0,0.0,0.131868,0.021978,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.054945,0.010989,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.21978,0.0,0.0,0.054945,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.010989,0.021978,0.0,0.0,0.032967,0.010989,0.0,0.0,0.0,0.0,0.043956,0.0,0.0,0.043956,0.0,0.043956,0.054945,0.021978,0.032967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0
8,Coimbatore,0.0,0.0,0.0,0.0,0.027778,0.0,0.013889,0.0,0.069444,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.097222,0.027778,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.013889,0.013889,0.0,0.0,0.013889,0.013889,0.0,0.0,0.0,0.0,0.013889,0.0,0.319444,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.013889,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.027778,0.0,0.0,0.013889,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.097222,0.0
9,Delhi,0.0,0.0,0.04,0.0,0.03,0.0,0.02,0.01,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.17,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.18,0.0,0.01,0.1,0.02,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0


In [23]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

**9) Information of 20 most common venues of each City**

In [24]:
num_top_venues = 20

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
indian_restaurants_sorted = pd.DataFrame(columns=columns)
indian_restaurants_sorted['City'] = indian_restaurants_grouped['City']

for ind in np.arange(indian_restaurants_grouped.shape[0]):
    indian_restaurants_sorted.iloc[ind, 1:] = return_most_common_venues(indian_restaurants_grouped.iloc[ind, :], num_top_venues)

indian_restaurants_sorted.head()

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Agra,Indian Restaurant,Café,Fast Food Restaurant,Multicuisine Indian Restaurant,Pizza Place,Restaurant,Bistro,Mughlai Restaurant,Fried Chicken Joint,Dhaba,North Indian Restaurant,South Indian Restaurant,Italian Restaurant,Food,Vegetarian / Vegan Restaurant,Comfort Food Restaurant,Fish & Chips Shop,Deli / Bodega,Falafel Restaurant,English Restaurant
1,Ahmedabad,Indian Restaurant,Café,Fast Food Restaurant,Restaurant,Sandwich Place,Vegetarian / Vegan Restaurant,Pizza Place,Diner,Food Court,Bakery,Snack Place,Fried Chicken Joint,Moroccan Restaurant,Chinese Restaurant,Italian Restaurant,Indian Sweet Shop,Mexican Restaurant,BBQ Joint,American Restaurant,Mediterranean Restaurant
2,Allahabad,Fast Food Restaurant,Pizza Place,Indian Restaurant,Café,Restaurant,Vietnamese Restaurant,Dhaba,Diner,Donut Shop,Dumpling Restaurant,English Restaurant,Falafel Restaurant,Fish & Chips Shop,Creperie,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant
3,Amritsar,Café,Pakistani Restaurant,Fast Food Restaurant,Burger Joint,American Restaurant,Chinese Restaurant,Bakery,Indian Restaurant,Italian Restaurant,Restaurant,Lebanese Restaurant,Pizza Place,Portuguese Restaurant,Punjabi Restaurant,Breakfast Spot,Diner,Fish & Chips Shop,Food,Vegetarian / Vegan Restaurant,Food Court
4,Aurangabad,Café,Indian Restaurant,Pizza Place,Restaurant,Fast Food Restaurant,Vietnamese Restaurant,Dhaba,Diner,Donut Shop,Dumpling Restaurant,English Restaurant,Falafel Restaurant,Fish & Chips Shop,Creperie,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant


**10) Applying K-means clustering algorithm to the above data**

In [25]:
# set number of clusters
kclusters = 5

indian_restaurants_clustering = indian_restaurants_grouped.drop('City', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(indian_restaurants_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20] 

array([1, 2, 4, 2, 4, 2, 2, 2, 1, 2, 0, 2, 2, 3, 2, 2, 4, 3, 1, 1],
      dtype=int32)

In [26]:
# add clustering labels
indian_restaurants_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

indian_restaurants_merged = indian_cities

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
indian_restaurants_merged = indian_restaurants_merged.join(indian_restaurants_sorted.set_index('City'), on='City')

indian_restaurants_merged # check the last columns!

Unnamed: 0,City,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Mumbai,18.938771,72.835335,2,Indian Restaurant,Restaurant,Café,Seafood Restaurant,Bakery,Chinese Restaurant,Fast Food Restaurant,Donut Shop,Italian Restaurant,Pizza Place,Diner,Deli / Bodega,Food Truck,Food Court,Snack Place,Comfort Food Restaurant,Breakfast Spot,Mediterranean Restaurant,Vegetarian / Vegan Restaurant,Mexican Restaurant
1,Delhi,28.651718,77.221939,2,Indian Restaurant,Café,Italian Restaurant,Restaurant,American Restaurant,Thai Restaurant,Bistro,Asian Restaurant,Bakery,South Indian Restaurant,Japanese Restaurant,Chinese Restaurant,BBQ Joint,French Restaurant,Burmese Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Deli / Bodega,Multicuisine Indian Restaurant,Donut Shop
2,Bangalore,12.97912,77.5913,2,Indian Restaurant,Café,Burger Joint,Bakery,Restaurant,Fast Food Restaurant,Snack Place,Vegetarian / Vegan Restaurant,Mexican Restaurant,Asian Restaurant,Sandwich Place,Creperie,Italian Restaurant,Breakfast Spot,Middle Eastern Restaurant,Pakistani Restaurant,Diner,Deli / Bodega,Afghan Restaurant,Pizza Place
3,Hyderabad,17.388786,78.461065,2,Indian Restaurant,Bakery,Café,Restaurant,BBQ Joint,Diner,South Indian Restaurant,Snack Place,Chinese Restaurant,Pizza Place,Mediterranean Restaurant,Italian Restaurant,Middle Eastern Restaurant,Deli / Bodega,Food,Food Court,Burger Joint,Hyderabadi Restaurant,Breakfast Spot,American Restaurant
4,Ahmedabad,23.021624,72.579707,2,Indian Restaurant,Café,Fast Food Restaurant,Restaurant,Sandwich Place,Vegetarian / Vegan Restaurant,Pizza Place,Diner,Food Court,Bakery,Snack Place,Fried Chicken Joint,Moroccan Restaurant,Chinese Restaurant,Italian Restaurant,Indian Sweet Shop,Mexican Restaurant,BBQ Joint,American Restaurant,Mediterranean Restaurant
5,Chennai,13.080172,80.283833,2,Indian Restaurant,Café,Seafood Restaurant,Italian Restaurant,Fast Food Restaurant,Pizza Place,Restaurant,Sandwich Place,Multicuisine Indian Restaurant,South Indian Restaurant,BBQ Joint,Burger Joint,Snack Place,Bakery,Modern European Restaurant,Asian Restaurant,Chinese Restaurant,Breakfast Spot,Middle Eastern Restaurant,North Indian Restaurant
6,Kolkata,22.545412,88.356775,2,Chinese Restaurant,Indian Restaurant,Café,Dhaba,Mughlai Restaurant,Pizza Place,Restaurant,Bakery,Indian Sweet Shop,Sandwich Place,Bengali Restaurant,Italian Restaurant,Fast Food Restaurant,Dumpling Restaurant,Fried Chicken Joint,Asian Restaurant,Awadhi Restaurant,Snack Place,BBQ Joint,Gastropub
7,Surat,45.9383,3.2553,2,French Restaurant,Restaurant,Italian Restaurant,Steakhouse,Fast Food Restaurant,Burger Joint,Japanese Restaurant,Bistro,Diner,Creperie,Pizza Place,Vietnamese Restaurant,Mediterranean Restaurant,Gastropub,Brasserie,Bakery,Fondue Restaurant,Tapas Restaurant,American Restaurant,Fish & Chips Shop
8,Pune,18.521428,73.854454,2,Café,Indian Restaurant,Fast Food Restaurant,Restaurant,Snack Place,Breakfast Spot,Asian Restaurant,Bakery,Italian Restaurant,South Indian Restaurant,Pizza Place,Maharashtrian Restaurant,Seafood Restaurant,American Restaurant,Food Truck,Dhaba,Bistro,English Restaurant,Mexican Restaurant,Chinese Restaurant
9,Jaipur,26.916194,75.820349,1,Café,Indian Restaurant,Bakery,Pizza Place,Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,BBQ Joint,Snack Place,Fast Food Restaurant,Sandwich Place,Turkish Restaurant,Food Court,Food,Deli / Bodega,Food Stand,Food Truck,Fondue Restaurant,Fish & Chips Shop,French Restaurant


**11) Visualizing the obtained clusters on the map**

In [27]:
address = 'India'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
lat_india = location.latitude
long_india = location.longitude
print('The geograpical coordinate of India are {}, {}.'.format(lat_india, long_india))

The geograpical coordinate of India are 22.3511148, 78.6677428.


In [28]:
# create map
map_clusters = folium.Map(location=[lat_india, long_india], zoom_start=5)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(indian_restaurants_merged['latitude'], indian_restaurants_merged['longitude'], indian_restaurants_merged['City'], indian_restaurants_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

**11) Details & summary of each cluster**

In [29]:
indian_restaurants_merged.loc[indian_restaurants_merged['Cluster Labels'] == 0, indian_restaurants_merged.columns[[1] + list(range(indian_restaurants.shape[1]))]]

Unnamed: 0,latitude,City,latitude.1,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
32,23.795281,Dhanbad,23.795281,86.430964,0,Pizza Place,Indian Restaurant,Asian Restaurant


In [30]:
df_cluster_1=indian_restaurants_merged.loc[indian_restaurants_merged['Cluster Labels'] == 1, indian_restaurants_merged.columns[[1] + list(range(indian_restaurants.shape[1]))]]
df_cluster_1.reset_index(drop=True)

Unnamed: 0,latitude,City,latitude.1,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,26.916194,Jaipur,26.916194,75.820349,1,Café,Indian Restaurant,Bakery
1,21.149813,Nagpur,21.149813,79.082056,1,Indian Restaurant,Fast Food Restaurant,Sandwich Place
2,17.723128,Visakhapatnam,17.723128,83.301284,1,Indian Restaurant,Café,Restaurant
3,27.175255,Agra,27.175255,78.009816,1,Indian Restaurant,Café,Fast Food Restaurant
4,20.011247,Nashik,20.011247,73.790236,1,Indian Restaurant,Café,Pizza Place
5,22.305199,Rajkot,22.305199,70.802834,1,Indian Restaurant,Fast Food Restaurant,Pizza Place
6,25.335649,Varanasi,25.335649,83.007629,1,Indian Restaurant,Café,Pizza Place
7,11.001812,Coimbatore,11.001812,76.962842,1,Indian Restaurant,Vegetarian / Vegan Restaurant,Café
8,16.508759,Vijayawada,16.508759,80.61851,1,Indian Restaurant,Fast Food Restaurant,Café
9,26.296772,Jodhpur,26.296772,73.035143,1,Indian Restaurant,Restaurant,Café


In [31]:
print("% of Venue types in 1st most common restaurant types\n")
print((((df_cluster_1['1st Most Common Venue'].value_counts())/len(df_cluster_1))*100),str("%"))
print("\n\n")
print("% of Venue types in 2nd most common restaurant types\n")
print((((df_cluster_1['2nd Most Common Venue'].value_counts())/len(df_cluster_1))*100),str("%"))
print("\n\n")
print("% of Venue types in 3rd most common restaurant types\n")
print((((df_cluster_1['3rd Most Common Venue'].value_counts())/len(df_cluster_1))*100),str("%"))
print("\n\n")


% of Venue types in 1st most common restaurant types

Indian Restaurant    90.909091
Café                  9.090909
Name: 1st Most Common Venue, dtype: float64 %



% of Venue types in 2nd most common restaurant types

Café                             45.454545
Fast Food Restaurant             27.272727
Restaurant                        9.090909
Vegetarian / Vegan Restaurant     9.090909
Indian Restaurant                 9.090909
Name: 2nd Most Common Venue, dtype: float64 %



% of Venue types in 3rd most common restaurant types

Café                    27.272727
Pizza Place             27.272727
Restaurant              18.181818
Fast Food Restaurant     9.090909
Sandwich Place           9.090909
Bakery                   9.090909
Name: 3rd Most Common Venue, dtype: float64 %





In [32]:
df_cluster_2=indian_restaurants_merged.loc[indian_restaurants_merged['Cluster Labels'] == 2, indian_restaurants_merged.columns[[1] + list(range(indian_restaurants.shape[1]))]]
df_cluster_2.reset_index(drop=True)

Unnamed: 0,latitude,City,latitude.1,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,18.938771,Mumbai,18.938771,72.835335,2,Indian Restaurant,Restaurant,Café
1,28.651718,Delhi,28.651718,77.221939,2,Indian Restaurant,Café,Italian Restaurant
2,12.97912,Bangalore,12.97912,77.5913,2,Indian Restaurant,Café,Burger Joint
3,17.388786,Hyderabad,17.388786,78.461065,2,Indian Restaurant,Bakery,Café
4,23.021624,Ahmedabad,23.021624,72.579707,2,Indian Restaurant,Café,Fast Food Restaurant
5,13.080172,Chennai,13.080172,80.283833,2,Indian Restaurant,Café,Seafood Restaurant
6,22.545412,Kolkata,22.545412,88.356775,2,Chinese Restaurant,Indian Restaurant,Café
7,45.9383,Surat,45.9383,3.2553,2,French Restaurant,Restaurant,Italian Restaurant
8,18.521428,Pune,18.521428,73.854454,2,Café,Indian Restaurant,Fast Food Restaurant
9,19.194329,Thane,19.194329,72.970178,2,Indian Restaurant,Restaurant,Seafood Restaurant


In [33]:
print("% of Venue types in 1st most common restaurant types\n")
print((((df_cluster_2['1st Most Common Venue'].value_counts())/len(df_cluster_2))*100),str("%"))
print("\n\n")
print("% of Venue types in 2nd most common restaurant types\n")
print((((df_cluster_2['2nd Most Common Venue'].value_counts())/len(df_cluster_2))*100),str("%"))
print("\n\n")
print("% of Venue types in 3rd most common restaurant types\n")
print((((df_cluster_2['3rd Most Common Venue'].value_counts())/len(df_cluster_2))*100),str("%"))
print("\n\n")

% of Venue types in 1st most common restaurant types

Indian Restaurant     71.428571
Café                  19.047619
French Restaurant      4.761905
Chinese Restaurant     4.761905
Name: 1st Most Common Venue, dtype: float64 %



% of Venue types in 2nd most common restaurant types

Café                    33.333333
Restaurant              23.809524
Indian Restaurant       14.285714
Fast Food Restaurant     9.523810
Chinese Restaurant       4.761905
Bakery                   4.761905
Pakistani Restaurant     4.761905
Pizza Place              4.761905
Name: 2nd Most Common Venue, dtype: float64 %



% of Venue types in 3rd most common restaurant types

Café                    23.809524
Fast Food Restaurant    19.047619
Italian Restaurant      19.047619
Seafood Restaurant      14.285714
Fried Chicken Joint      4.761905
Indian Restaurant        4.761905
Bakery                   4.761905
Burger Joint             4.761905
Restaurant               4.761905
Name: 3rd Most Common Venue, dtype

In [34]:
df_cluster_3=indian_restaurants_merged.loc[indian_restaurants_merged['Cluster Labels'] == 3, indian_restaurants_merged.columns[[1] + list(range(indian_restaurants.shape[1]))]]
df_cluster_3.reset_index(drop=True)

Unnamed: 0,latitude,City,latitude.1,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,26.460914,Kanpur,26.460914,80.321759,3,Fast Food Restaurant,Café,Pizza Place
1,26.203725,Gwalior,26.203725,78.157363,3,Fast Food Restaurant,Café,Pizza Place
2,23.160894,Jabalpur,23.160894,79.94977,3,Café,Chinese Restaurant,Asian Restaurant
3,21.237947,Raipur,21.237947,81.633683,3,Café,Pizza Place,Fast Food Restaurant


In [35]:
print("% of Venue types in 1st most common restaurant types\n")
print((((df_cluster_3['1st Most Common Venue'].value_counts())/len(df_cluster_3))*100),str("%"))
print("\n\n")
print("% of Venue types in 2nd most common restaurant types\n")
print((((df_cluster_3['2nd Most Common Venue'].value_counts())/len(df_cluster_3))*100),str("%"))
print("\n\n")
print("% of Venue types in 3rd most common restaurant types\n")
print((((df_cluster_3['3rd Most Common Venue'].value_counts())/len(df_cluster_3))*100),str("%"))
print("\n\n")


% of Venue types in 1st most common restaurant types

Fast Food Restaurant    50.0
Café                    50.0
Name: 1st Most Common Venue, dtype: float64 %



% of Venue types in 2nd most common restaurant types

Café                  50.0
Pizza Place           25.0
Chinese Restaurant    25.0
Name: 2nd Most Common Venue, dtype: float64 %



% of Venue types in 3rd most common restaurant types

Pizza Place             50.0
Asian Restaurant        25.0
Fast Food Restaurant    25.0
Name: 3rd Most Common Venue, dtype: float64 %





In [36]:
df_cluster_4=indian_restaurants_merged.loc[indian_restaurants_merged['Cluster Labels'] == 4, indian_restaurants_merged.columns[[1] + list(range(indian_restaurants.shape[1]))]]
df_cluster_4.reset_index(drop=True)

Unnamed: 0,latitude,City,latitude.1,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,26.8381,Lucknow,26.8381,80.9346,4,Indian Restaurant,Fast Food Restaurant,Café
1,22.720362,Indore,22.720362,75.8682,4,Indian Restaurant,Fast Food Restaurant,Café
2,25.609324,Patna,25.609324,85.123525,4,Café,Pizza Place,Indian Restaurant
3,22.297314,Vadodara,22.297314,73.194257,4,Indian Restaurant,Café,Fast Food Restaurant
4,30.909016,Ludhiana,30.909016,75.851601,4,Fast Food Restaurant,Indian Restaurant,Café
5,19.877263,Aurangabad,19.877263,75.339024,4,Café,Indian Restaurant,Pizza Place
6,25.43813,Allahabad,25.43813,81.8338,4,Fast Food Restaurant,Pizza Place,Indian Restaurant
7,23.370035,Ranchi,23.370035,85.325013,4,Indian Restaurant,Café,Pizza Place


In [37]:
print("% of Venue types in 1st most common restaurant types\n")
print((((df_cluster_4['1st Most Common Venue'].value_counts())/len(df_cluster_4))*100),str("%"))
print("\n\n")
print("% of Venue types in 2nd most common restaurant types\n")
print((((df_cluster_4['2nd Most Common Venue'].value_counts())/len(df_cluster_4))*100),str("%"))
print("\n\n")
print("% of Venue types in 3rd most common restaurant types\n")
print((((df_cluster_4['3rd Most Common Venue'].value_counts())/len(df_cluster_4))*100),str("%"))
print("\n\n")


% of Venue types in 1st most common restaurant types

Indian Restaurant       50.0
Fast Food Restaurant    25.0
Café                    25.0
Name: 1st Most Common Venue, dtype: float64 %



% of Venue types in 2nd most common restaurant types

Fast Food Restaurant    25.0
Pizza Place             25.0
Café                    25.0
Indian Restaurant       25.0
Name: 2nd Most Common Venue, dtype: float64 %



% of Venue types in 3rd most common restaurant types

Café                    37.5
Pizza Place             25.0
Indian Restaurant       25.0
Fast Food Restaurant    12.5
Name: 3rd Most Common Venue, dtype: float64 %



