# Indian Cities And What Different People Could Get From Them

## Introduction

#### Background
India is a developing country with a very diverse population. Both its stage of development and the size of its population shape how different groups of people make decisions, from choosing a city to live in that suits their needs to judging how a particular city might affect a business or company. In a developing country like India, these decisions can be crucial both for people's lives and for the growth of the country as a whole.

#### Problems / Questions To Consider
Different groups of people view a city or region from different perspectives: residents who want to move to a city similar to their current one, business owners who want to know which business could be most profitable in a particular city, and anyone whose decision or business depends on the city's current state. Each of them needs a clear idea of how well a particular city suits their needs. It is therefore useful to know how the cities of India are distributed in terms of development and of the demands of different groups, where these cities currently stand, and what potential they have for meeting those needs. To provide these insights, this project uses the venues of each city to identify which cities are similar to one another and how they are distributed across the factors that influence people's decisions. The project also highlights where a particular city is lacking, because such gaps are an important part of a city's characteristics.

#### Audience
The primary audience of this exploratory project includes people deciding whether to move to another city, based on how similar it is to their current city of residence or on whether it offers a better standard of living and more opportunities, and business owners who want to know whether a particular city could be an ideal centre for a certain business.
Based on how the cities differ from or resemble one another, the project as a whole provides recommendations to a broader audience according to their perspectives and needs.

## Data

The dataset for this project was gathered from Kaggle. It was curated by merging the 2011 census data for Indian cities with a population of more than 1 lakh (100,000) with the city-wise number of graduates from the same census, in order to visualize where the future cities of India stand today. It lists the top 500 Indian cities.<br>
Link to source of dataset : <a href="https://www.kaggle.com/zed9941/top-500-indian-cities">click here</a>

<h3>Importing the required libraries</h3>

In [185]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from geopy.geocoders import Nominatim

import requests # library to handle requests

import json # library to handle JSON files
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

from sklearn.cluster import KMeans # import k-means from clustering stage

import folium  # map rendering library

import bs4  # beautiful soup library for web scraping purposes

<h3>Preprocessing / Preparation:</h3>

Grabbing the dataset and converting it into a relevant dataframe:

In [205]:
df_ind=pd.read_csv('datasets_557_1096_cities_r2.csv')

In [206]:
df_ind.head()

Unnamed: 0,name_of_city,state_code,state_name,dist_code,population_total,population_male,population_female,0-6_population_total,0-6_population_male,0-6_population_female,literates_total,literates_male,literates_female,sex_ratio,child_sex_ratio,effective_literacy_rate_total,effective_literacy_rate_male,effective_literacy_rate_female,location,total_graduates,male_graduates,female_graduates
0,Abohar,3,PUNJAB,9,145238,76840,68398,15870,8587,7283,103319,58347,44972,890,848,79.86,85.49,73.59,"30.1452928,74.1993043",16287,8612,7675
1,Achalpur,27,MAHARASHTRA,7,112293,58256,54037,11810,6186,5624,92433,49347,43086,928,909,91.99,94.77,89.0,"21.257584,77.5086754",8863,5269,3594
2,Adilabad,28,ANDHRA PRADESH,1,117388,59232,58156,13103,6731,6372,83955,46295,37660,982,947,80.51,88.18,72.73,"19.0809075,79.560344",10565,6797,3768
3,Adityapur,20,JHARKHAND,24,173988,91495,82493,23042,12063,10979,125985,71470,54515,902,910,83.46,89.98,76.23,"22.7834741,86.1576889",19225,12189,7036
4,Adoni,28,ANDHRA PRADESH,21,166537,82743,83794,18406,9355,9051,101292,56203,45089,1013,968,68.38,76.58,60.33,"15.6322227,77.2728368",11902,7871,4031


In [218]:
# let's sort the name of the cities in descending order based on the total population of those cities.
df_ind.sort_values('population_total',ascending=False,inplace=True)
df_ind.reset_index(inplace=True)
df_ind.head()

Unnamed: 0,index,name_of_city,state_code,state_name,dist_code,population_total,population_male,population_female,0-6_population_total,0-6_population_male,0-6_population_female,literates_total,literates_male,literates_female,sex_ratio,child_sex_ratio,effective_literacy_rate_total,effective_literacy_rate_male,effective_literacy_rate_female,location,total_graduates,male_graduates,female_graduates
0,185,Greater Mumbai,27,MAHARASHTRA,99,12478447,6736815,5741632,1139146,599007,540139,10237586,5727774,4509812,852,902,90.28,93.32,86.7,"19.0760,72.8777",1802371,964964,837407
1,141,Delhi,7,NCT OF DELHI,99,11007835,5871362,5136473,1209275,647938,561337,8583105,4776490,3806615,875,866,87.6,91.44,83.2,"28.7041,77.1025",2221137,1210040,1011097
2,72,Bengaluru,29,KARNATAKA,18,8425970,4401299,4024671,862493,444639,417854,6775942,3664959,3110983,914,940,89.59,92.63,86.25,"12.9716,77.5946",1591163,908363,682800
3,184,Greater Hyderabad,28,ANDHRA PRADESH,99,6809970,3500802,3309168,725816,373794,352022,5047705,2688111,2359594,945,942,82.96,85.96,79.79,"17.3850,78.4867",1164149,685402,478747
4,7,Ahmadabad,24,GUJARAT,7,5570585,2935869,2634716,589076,317917,271159,4464303,2459823,2004480,897,853,89.62,93.96,84.81,"23.022505,72.5713621",769858,435267,334591


In [219]:
df_ind.shape

(493, 23)

The dataframe above lists the cities sorted by total population, along with indicators of where the future cities of India stand today. Note that it contains 493 rows rather than a full 500.
For our analysis, we are mainly concerned with the city names and their latitude and longitude values, so we'll extract this information from the dataframe above into a new dataframe.

In [220]:
# define the dataframe columns.
column_names=['City','Latitude','Longitude']

# instantiate the dataframe
ind_cities=pd.DataFrame(columns=column_names)
ind_cities

Unnamed: 0,City,Latitude,Longitude


In [221]:
# building our new dataframe.

ind_cities['City']=df_ind['name_of_city']

# getting the 'location' values from df_ind and appending them into the list
loc_vals=[]
for index,rows in df_ind.iterrows():
    temp=rows['location']
    temp=temp.split(',')
    loc_vals.append(temp)

# generating the separate lists for latitude and longitude values
lats=[]
lons=[]
for rows in loc_vals:
    lats.append(float(rows[0]))
    lons.append(float(rows[1]))
    
# feeding them into the dataframe.
ind_cities['Latitude']=lats
ind_cities['Longitude']=lons

In [222]:
ind_cities.head(10)

Unnamed: 0,City,Latitude,Longitude
0,Greater Mumbai,19.076,72.8777
1,Delhi,28.7041,77.1025
2,Bengaluru,12.9716,77.5946
3,Greater Hyderabad,17.385,78.4867
4,Ahmadabad,23.022505,72.571362
5,Chennai,13.08268,80.270718
6,Kolkata,22.572646,88.363895
7,Surat,21.17024,72.831061
8,Pune,18.52043,73.856744
9,Jaipur,26.912434,75.787271


We now have the dataframe that the rest of the analysis builds on.
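
The two loops used above to split the `location` strings can also be expressed with pandas string methods. The following is a minimal sketch on a toy frame that assumes the same `"lat,lon"` layout as the `location` column; the names `sample`, `coords` and `cities` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Toy frame with the same "lat,lon" string layout as the `location` column.
sample = pd.DataFrame({
    "name_of_city": ["Greater Mumbai", "Delhi"],
    "location": ["19.0760,72.8777", "28.7041,77.1025"],
})

# Split once on the comma, expand into two columns, and convert to floats.
coords = sample["location"].str.split(",", expand=True).astype(float)
coords.columns = ["Latitude", "Longitude"]

# Assemble the City / Latitude / Longitude frame in one step.
cities = pd.concat([sample["name_of_city"].rename("City"), coords], axis=1)
print(cities)
```

This avoids `iterrows`, which tends to be slow on larger frames, and keeps the parsing in a single expression.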

Visualize the cities by mapping them out:

In [196]:
# Use geopy library to get the latitude and longitude values of India
address = 'India'

geolocator = Nominatim(user_agent="indian_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of India are 22.3511148, 78.6677428.


In [223]:
# create map of India using latitude and longitude values
map_india = folium.Map(location=[latitude, longitude], zoom_start=5)

# add markers to map
for lat, lng, name in zip(ind_cities['Latitude'], ind_cities['Longitude'],ind_cities['City']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_india)  
    
map_india

Now we are in a position to retrieve data from the Foursquare API:

In [224]:
CLIENT_ID = 'GBEGDDOOT1441SIX3IVP5YIW4IH34KBVW4UYDQZ22XA2GBUY' # your Foursquare ID
CLIENT_SECRET = 'FFLO4QO2HOVI3CME2KCC5YGQTYYKOUH24QBB5MGLPMFAPDKS' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: GBEGDDOOT1441SIX3IVP5YIW4IH34KBVW4UYDQZ22XA2GBUY
CLIENT_SECRET:FFLO4QO2HOVI3CME2KCC5YGQTYYKOUH24QBB5MGLPMFAPDKS
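
The cell above hardcodes the API credentials and prints them in full. A common alternative is to read them from environment variables and mask them when printing. The sketch below illustrates that idea only; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are assumptions made for this example, not part of the Foursquare API or of this notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def mask(secret, visible=4):
    """Show only the first few characters so output does not leak the full key."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Possible usage once the variables are set in the shell:
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
# print('CLIENT_ID: ' + mask(CLIENT_ID))
```

With this approach the notebook can be shared without exposing the keys, as long as the variables are set before it is run.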


Let's explore the first city in the dataframe to check that everything is fine:

In [230]:
city_latitude = ind_cities.loc[0, 'Latitude'] # City latitude value
city_longitude = ind_cities.loc[0, 'Longitude'] # City longitude value

city_name = ind_cities.loc[0, 'City'] # Name

print('Latitude and longitude values of {}are: {}, {}.'.format(city_name,
                                                               city_latitude, 
                                                               city_longitude))

Latitude and longitude values of Greater Mumbai are: 19.076, 72.8777.


In [231]:
# Get the top 100 venues within the default city radius
LIMIT = 100 # limit of number of venues returned by Foursquare API

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    city_latitude, 
    city_longitude, 
    LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ed0e15a6d8c56001b8d8600'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'suggestedRadius': 2323,
  'headerLocation': 'Mumbai',
  'headerFullLocation': 'Mumbai',
  'headerLocationGranularity': 'city',
  'totalResults': 140,
  'suggestedBounds': {'ne': {'lat': 19.098228803806727,
    'lng': 72.9004750523354},
   'sw': {'lat': 19.057313626941358, 'lng': 72.85927090352504}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '54d38a72498e7d4deee4e4c4',
       'name': 'The Bar Stock Exchange',
       'location': {'address': 'BKC',
        'lat': 19.071165828845604,
        'lng': 72.87635864580997,
        'labeledLatLngs': [{'label': 'display',
          'lat': 19.071

Everything seems to be correct up to this point.<br>
Let's extract the category of a venue with the function defined below.

In [232]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Cleaning the json and structuring it into a pandas dataframe:

In [233]:
# Clean the data and structure it as a dataframe
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,The Bar Stock Exchange,Bar,19.071166,72.876359
1,Hitchki,Bar,19.06973,72.869761
2,Sofitel Mumbai BKC,Hotel,19.067448,72.869006
3,Starbucks Coffee: A Tata Alliance,Coffee Shop,19.069457,72.869375
4,Masala Library,Indian Restaurant,19.068931,72.869738


We now carry out the same process for all of our cities:

In [234]:
# Create a function to repeat the same process to all cities
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
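
`getNearbyVenues` indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a failed request or a venue without categories would raise an exception partway through the loop. The following is a minimal sketch of a more tolerant parsing step, assuming the response shape shown earlier in this notebook. The function name `extract_venue_rows` and both sample payloads are hypothetical and hand-made for illustration; no request is sent.

```python
def extract_venue_rows(payload, city, city_lat, city_lng):
    """Flatten one explore response into row tuples, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        location = venue.get("location", {})
        rows.append((city, city_lat, city_lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-made payloads that mimic the response structure (not real API output).
ok = {"response": {"groups": [{"items": [
    {"venue": {"name": "Hitchki",
               "location": {"lat": 19.06973, "lng": 72.869761},
               "categories": [{"name": "Bar"}]}},
    {"venue": {"name": "Unnamed spot",
               "location": {"lat": 19.07, "lng": 72.87},
               "categories": []}},
]}]}}
empty = {"meta": {"code": 429}, "response": {}}

rows = extract_venue_rows(ok, "Greater Mumbai", 19.076, 72.8777)
no_rows = extract_venue_rows(empty, "Greater Mumbai", 19.076, 72.8777)
print(len(rows), len(no_rows))
```

Keeping the parsing separate from the HTTP call also makes it possible to check the logic without network access.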

In [235]:
# Run the function on each city and store in new dataframe
ind_venues = getNearbyVenues(names=ind_cities['City'],
                                   latitudes=ind_cities['Latitude'],
                                   longitudes=ind_cities['Longitude'])

Greater Mumbai 
Delhi 
Bengaluru
Greater Hyderabad 
Ahmadabad 
Chennai 
Kolkata 
Surat 
Pune 
Jaipur 
Lucknow 
Kanpur 
Nagpur 
Indore 
Thane 
Bhopal 
Visakhapatnam
Pimpri Chinchwad 
Patna 
Vadodara 
Ludhiana 
Agra 
Nashik 
Faridabad 
Meerut 
Rajkot 
Vasai Virar City 
Varanasi 
Srinagar 
Aurangabad 
Dhanbad 
Amritsar 
Navi Mumbai 
Allahabad 
Ranchi 
Haora 
Coimbatore 
Jabalpur 
Gwalior 
Vijayawada 
Jodhpur 
Madurai 
Raipur 
Kota 
Guwahati 
Chandigarh 
Solapur 
Hubli-Dharwad 
Bareilly 
Moradabad 
Mysore 
Gurgaon 
Aligarh 
Jalandhar 
Tiruchirappalli 
Bhubaneswar Town 
Salem 
Mira Bhayander 
Thiruvananthapuram 
Bhiwandi 
Saharanpur 
Gorakhpur 
Guntur 
Bikaner 
Amravati 
Noida 
Jamshedpur 
Bhilai Nagar 
Warangal 
Cuttack 
Firozabad 
Kochi 
Bhavnagar 
Dehradun 
Durgapur 
Asansol 
Nanded Waghala 
Kolhapur 
Ajmer 
Gulbarga 
Jamnagar 
Ujjain 
Loni 
Siliguri 
Jhansi 
Ulhasnagar 
Nellore 
Jammu 
Sangli Miraj Kupwad 
Belgaum 
Mangalore 
Ambattur 
Tirunelveli 
Malegaon 
Gaya 
Jalgaon 
Udaipur 
Mahe

In [295]:
#save the result in case we need it.
ind_venues.to_csv('IndianVenues.csv',sep=',',index=False)

In [296]:
# Have a look at the venues dataframe that we generated and its shape
print(ind_venues.shape)
ind_venues.head()

(1163, 7)


Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Greater Mumbai,19.076,72.8777,Delhi Zaika,19.077054,72.87826,Indian Restaurant
1,Greater Mumbai,19.076,72.8777,Pizza Hut,19.075984,72.877656,Pizza Place
2,Greater Mumbai,19.076,72.8777,Nawab Sheek Corner,19.076933,72.87826,Middle Eastern Restaurant
3,Greater Mumbai,19.076,72.8777,Sahara Restaurant,19.079532,72.880152,Mughlai Restaurant
4,Greater Mumbai,19.076,72.8777,Mithi Nadi,19.076005,72.87468,River


Number of venues returned for each city by the Foursquare API:

In [297]:
ind_venues.groupby('City').count().head()

City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Abohar,6,6,6,6,6,6
Adityapur,1,1,1,1,1,1
Agartala,3,3,3,3,3,3
Agra,5,5,5,5,5,5
Ahmadabad,14,14,14,14,14,14
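
Several cities return only a handful of venues, which matters later because a city with a single venue gets a frequency of 1.0 for that one category. The sketch below shows one way to inspect the counts, using a toy frame that reuses the counts shown above for Abohar, Adityapur and Agartala. The threshold `MIN_VENUES` is a hypothetical value chosen for illustration, not something the notebook defines.

```python
import pandas as pd

# Toy venues frame mirroring the per-city counts shown above (6, 1 and 3).
venues = pd.DataFrame({
    "City": ["Abohar"] * 6 + ["Adityapur"] + ["Agartala"] * 3,
    "Venue": [f"venue_{i}" for i in range(10)],
})

# size() gives one count per city instead of repeating it for every column.
counts = venues.groupby("City").size().sort_values(ascending=False)

MIN_VENUES = 3  # hypothetical threshold, chosen only for this example
sparse_cities = counts[counts < MIN_VENUES].index.tolist()

print(counts)
print("Cities with fewer than", MIN_VENUES, "venues:", sparse_cities)
```

Whether to drop, keep or re-query such cities (for example with a larger `radius`) is a judgment call that depends on the goal of the analysis.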


Number of unique categories that we got:

In [298]:
print('There are {} unique categories.'.format(len(ind_venues['Venue Category'].unique())))

There are 185 unique categories.


<h3>Analyzing the cities</h3>

Now that the data is prepared, we can analyze it.
First we need to calculate the frequency of each venue category within each city. We can do this quickly in pandas by converting each venue category into an indicator column using one-hot encoding.

In [299]:
# one hot encoding
ind_onehot = pd.get_dummies(ind_venues[['Venue Category']], prefix="", prefix_sep="")

# Add city column back to dataframe
ind_onehot['City'] = ind_venues['City'] 

# move city column to the first column
fixed_columns = [ind_onehot.columns[-1]] + list(ind_onehot.columns[:-1])
ind_onehot = ind_onehot[fixed_columns]

# Check size of new dataframe
ind_onehot.shape

(1163, 185)

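The dataframe has 185 columns, the same as the number of unique categories that we collected. Note that adding a separate `City` column would normally give 186 columns. The count most likely stays at 185 because one of the venue categories is itself named `City`, so the assignment above overwrites that category's column instead of adding a new one.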

Next, we group the rows by city and take the mean of each category column, which gives the frequency of each category in each city.

In [300]:
ind_grouped = ind_onehot.groupby('City').mean().reset_index()
ind_grouped

Unnamed: 0,City,Zoo,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Service,American Restaurant,Andhra Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bakery,Bank,Bar,Bed & Breakfast,Belgian Restaurant,Bengali Restaurant,Big Box Store,Bistro,Board Shop,Boarding House,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Buffet,Burger Joint,Bus Station,Business Service,Café,Camera Store,Campground,Cantonese Restaurant,Chaat Place,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Costume Shop,Cupcake Shop,Currency Exchange,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,Fabric Shop,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Grocery Store,Gujarati Restaurant,Gym,Gym / Fitness Center,Halal Restaurant,Health & Beauty Service,Heliport,Historic Site,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karnataka Restaurant,Kebab Restaurant,Lake,Light Rail Station,Lighting Store,Lounge,Market,Martial Arts Dojo,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Monument / Landmark,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Music Venue,National Park,Neighborhood,Nightclub,North Indian Restaurant,Optical Shop,Outdoors & Recreation,Outlet Store,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Photography Studio,Pizza Place,Planetarium,Platform,Playground,Plaza,Pool,Pool Hall,Print Shop,Pub,Rajasthani Restaurant,Rental Car Location,Resort,Restaurant,River,Sake Bar,Sandwich Place,Scenic Lookout,Science Museum,Shoe Store,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,South Indian Restaurant,Spa,Spiritual Center,Stadium,Supermarket,Taco Place,Tea Room,Tennis Stadium,Theater,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Toy / Game Store,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Women's Store
0,Abohar,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Adityapur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Agartala,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Agra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Ahmadabad,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.357143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Ahmadnagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Aizawl,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Ajmer,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Akbarpur,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Akola,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [301]:
ind_grouped.shape

(352, 185)
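
To make the encoding and grouping steps concrete, here is a minimal sketch on a toy dataset. It also uses a key column called `City Name` so that a venue category that happens to be called `City` is not overwritten. That column name, like the toy data, is an assumption made for this example and is not what the notebook uses.

```python
import pandas as pd

# Toy venues: city "A" has three venues, city "B" has two.
venues = pd.DataFrame({
    "City": ["A", "A", "A", "B", "B"],
    "Venue Category": ["ATM", "Hotel", "ATM", "City", "Hotel"],
})

# One indicator column per category (as floats so the mean is a frequency).
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# A category literally named "City" exists here, so the grouping key gets
# a distinct name and is inserted as the first column.
onehot.insert(0, "City Name", venues["City"])

# Mean of the indicators per city = share of that city's venues per category.
grouped = onehot.groupby("City Name").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, which is a quick sanity check that the frequencies were computed per city.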

Next, we find the top 5 venue categories for each city we are examining. These categories largely determine the clustering patterns between the cities:

In [302]:
num_top_venues = 5

for city in ind_grouped['City']:
    print("----"+city+"----")
    temp = ind_grouped[ind_grouped['City'] == city].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abohar ----
                     venue  freq
0                      ATM  0.17
1                     Park  0.17
2        Martial Arts Dojo  0.17
3  Health & Beauty Service  0.17
4         Asian Restaurant  0.17


----Adityapur ----
                 venue  freq
0    Indian Restaurant   1.0
1                  Zoo   0.0
2          Music Venue   0.0
3  Monument / Landmark   0.0
4                Motel   0.0


----Agartala ----
               venue  freq
0                ATM  0.33
1              Hotel  0.33
2        Pizza Place  0.33
3          Multiplex  0.00
4  Mobile Phone Shop  0.00


----Agra ----
               venue  freq
0  Indian Restaurant   0.2
1              Hotel   0.2
2     Clothing Store   0.2
3      Tour Provider   0.2
4           Tea Room   0.2


----Ahmadabad ----
               venue  freq
0              Hotel  0.50
1  Indian Restaurant  0.36
2        Snack Place  0.07
3         Restaurant  0.07
4                Zoo  0.00


----Ahmadnagar ----
            venue  freq
0 

4  Multicuisine Indian Restaurant  0.00


----Barddhaman ----
                            venue  freq
0                             ATM  0.33
1                     Pizza Place  0.33
2               Convenience Store  0.33
3   Paper / Office Supplies Store  0.00
4  Multicuisine Indian Restaurant  0.00


----Bareilly ----
                 venue  freq
0        Women's Store   1.0
1            Multiplex   0.0
2    Mobile Phone Shop   0.0
3  Monument / Landmark   0.0
4                Motel   0.0


----Barrackpur ----
                            venue  freq
0                             ATM   0.6
1                        Pharmacy   0.2
2  Multicuisine Indian Restaurant   0.2
3                     Music Venue   0.0
4             Monument / Landmark   0.0


----Barshi ----
                 venue  freq
0    Mobile Phone Shop   1.0
1                  Zoo   0.0
2            Multiplex   0.0
3  Monument / Landmark   0.0
4                Motel   0.0


----Basirhat ----
                 venue  freq
0

            venue  freq
0     Bus Station  0.33
1  Ice Cream Shop  0.33
2     Coffee Shop  0.33
3             Zoo  0.00
4           Motel  0.00


----Ganganagar ----
                 venue  freq
0                Diner   1.0
1          Music Venue   0.0
2  Monument / Landmark   0.0
3                Motel   0.0
4      Motorcycle Shop   0.0


----Gangawati ----
                 venue  freq
0             Pharmacy   1.0
1            Multiplex   0.0
2    Mobile Phone Shop   0.0
3  Monument / Landmark   0.0
4                Motel   0.0


----Gaya ----
                 venue  freq
0    Convenience Store   0.5
1      Motorcycle Shop   0.5
2          Music Venue   0.0
3  Monument / Landmark   0.0
4                Motel   0.0


----Ghazipur ----
                 venue  freq
0                  ATM   1.0
1                  Zoo   0.0
2            Multiplex   0.0
3    Mobile Phone Shop   0.0
4  Monument / Landmark   0.0


----Gokal Pur ----
                 venue  freq
0          Pizza Place   1.0
1 

                 venue  freq
0    Mobile Phone Shop   1.0
1                  Zoo   0.0
2            Multiplex   0.0
3  Monument / Landmark   0.0
4                Motel   0.0


----Jalpaiguri ----
                 venue  freq
0                  ATM   1.0
1                  Zoo   0.0
2            Multiplex   0.0
3    Mobile Phone Shop   0.0
4  Monument / Landmark   0.0


----Jamalpur ----
                 venue  freq
0        Train Station   1.0
1                  Zoo   0.0
2            Multiplex   0.0
3    Mobile Phone Shop   0.0
4  Monument / Landmark   0.0


----Jammu ----
               venue  freq
0      Shopping Mall  0.25
1              Hotel  0.25
2  Indian Restaurant  0.25
3        Pizza Place  0.25
4          Multiplex  0.00


----Jamnagar ----
                 venue  freq
0    Indian Restaurant   0.5
1            Multiplex   0.5
2                  Zoo   0.0
3          Music Venue   0.0
4  Monument / Landmark   0.0


----Jamshedpur ----
                  venue  freq
0          

                 venue  freq
0       Breakfast Spot   1.0
1                  Zoo   0.0
2            Multiplex   0.0
3  Monument / Landmark   0.0
4                Motel   0.0


----Madavaram ----
               venue  freq
0                ATM   0.2
1          Bookstore   0.2
2         Smoke Shop   0.2
3  Food & Drink Shop   0.2
4        Bus Station   0.2


----Madhyamgram ----
                 venue  freq
0             Pharmacy   0.5
1                Plaza   0.5
2                  Zoo   0.0
3            Multiplex   0.0
4  Monument / Landmark   0.0


----Madurai ----
                 venue  freq
0    Indian Restaurant  0.75
1                Hotel  0.25
2                  Zoo  0.00
3          Music Venue  0.00
4  Monument / Landmark  0.00


----Maheshtala ----
                 venue  freq
0                  ATM   1.0
1                  Zoo   0.0
2            Multiplex   0.0
3    Mobile Phone Shop   0.0
4  Monument / Landmark   0.0


----Malegaon ----
                     venue  freq
0  N

               venue  freq
0  Indian Restaurant  0.33
1             Market  0.33
2          Gift Shop  0.33
3                Zoo  0.00
4          Multiplex  0.00


----Noida ----
               venue  freq
0        Pizza Place  0.50
1              Diner  0.17
2  Electronics Store  0.17
3  Mobile Phone Shop  0.17
4          Multiplex  0.00


----North Barrackpur ----
                 venue  freq
0                  ATM   1.0
1                  Zoo   0.0
2            Multiplex   0.0
3    Mobile Phone Shop   0.0
4  Monument / Landmark   0.0


----North Dum Dum ----
               venue  freq
0                ATM  0.33
1      Train Station  0.33
2   Department Store  0.33
3          Multiplex  0.00
4  Mobile Phone Shop  0.00


----Ongole ----
               venue  freq
0      Shopping Mall  0.50
1              Hotel  0.17
2  Indian Restaurant  0.17
3           Mountain  0.17
4                Zoo  0.00


----Palakkad ----
                       venue  freq
0  Indian Chinese Restaurant   0.2


4  Monument / Landmark   0.0


----Robertson Pet ----
                 venue  freq
0         Costume Shop   0.5
1    Electronics Store   0.5
2                  Zoo   0.0
3            Multiplex   0.0
4  Monument / Landmark   0.0


----Rohtak ----
                 venue  freq
0    Indian Restaurant  0.75
1            Multiplex  0.25
2                  Zoo  0.00
3          Music Venue  0.00
4  Monument / Landmark  0.00


----Roorkee ----
                            venue  freq
0                   Grocery Store  0.67
1                             ATM  0.33
2  Multicuisine Indian Restaurant  0.00
3               Mobile Phone Shop  0.00
4             Monument / Landmark  0.00


----Rudrapur ----
                 venue  freq
0        Women's Store  0.33
1          Pizza Place  0.33
2    Convenience Store  0.33
3            Multiplex  0.00
4  Monument / Landmark  0.00


----Sagar ----
                 venue  freq
0         Dessert Shop   1.0
1                  Zoo   0.0
2          Music Venue 

4  Fast Food Restaurant   0.2


----Tiruppur ----
              venue  freq
0    Clothing Store   0.2
1      Dessert Shop   0.1
2  Asian Restaurant   0.1
3              Food   0.1
4          Platform   0.1


----Tiruvannamalai ----
                            venue  freq
0                           Hotel   0.5
1                   Shopping Mall   0.5
2                             Zoo   0.0
3  Multicuisine Indian Restaurant   0.0
4               Mobile Phone Shop   0.0


----Tiruvottiyur ----
                 venue  freq
0                  ATM   1.0
1                  Zoo   0.0
2            Multiplex   0.0
3    Mobile Phone Shop   0.0
4  Monument / Landmark   0.0


----Titagarh ----
                 venue  freq
0          IT Services   1.0
1            Multiplex   0.0
2    Mobile Phone Shop   0.0
3  Monument / Landmark   0.0
4                Motel   0.0


----Tonk ----
                 venue  freq
0          IT Services   1.0
1            Multiplex   0.0
2    Mobile Phone Shop   0.0
3  M

Sorting the venues and structuring them for further processing:

In [283]:
# Function to sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
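To make the behaviour of `return_most_common_venues` concrete, here is a minimal sketch that applies it to a single made-up row. The city name and the frequencies are illustrative assumptions, not values from the dataset.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the city name) and keep only the frequency columns
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: one city followed by its venue-category frequencies
toy_row = pd.Series({'City': 'Example City', 'ATM': 0.1, 'Hotel': 0.6,
                     'Zoo': 0.0, 'Pizza Place': 0.3})

top3 = list(return_most_common_venues(toy_row, 3))
print(top3)
```

When a city has fewer non-zero categories than `num_top_venues`, the remaining slots are filled with zero-frequency categories, as the 0.0 rows in the per-city listings show. Those trailing entries should not be read as venues that actually exist in the city.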

In [303]:
# Create a dataframe with top 5 venues for each city
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # No dedicated suffix beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# Create a new dataframe
cities_venues_sorted = pd.DataFrame(columns=columns)
cities_venues_sorted['City'] = ind_grouped['City']

for ind in np.arange(ind_grouped.shape[0]):
    cities_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ind_grouped.iloc[ind, :], num_top_venues)

cities_venues_sorted.head()

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Abohar,Park,Health & Beauty Service,Martial Arts Dojo,Café,Asian Restaurant
1,Adityapur,Indian Restaurant,Women's Store,Dhaba,Food,Flea Market
2,Agartala,Hotel,ATM,Pizza Place,Dhaba,Food
3,Agra,Hotel,Clothing Store,Indian Restaurant,Tour Provider,Tea Room
4,Ahmadabad,Hotel,Indian Restaurant,Snack Place,Restaurant,Diner
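The column labels above come from a three-item suffix list with a fallback to "th". That is fine for five columns, but it would produce labels such as "21th" for larger values of `num_top_venues`. A minimal sketch of a more general helper follows; the name `ordinal` is hypothetical and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # Numbers ending in 11, 12 or 13 always take 'th'
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 5
columns = ['City'] + ['{} Most Common Venue'.format(ordinal(i))
                      for i in range(1, num_top_venues + 1)]
print(columns)
```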


In [340]:
cities_venues_sorted.shape

(352, 7)

<h3>Clustering the cities</h3>
Now that all the preprocessing and preparation is done, it is time to cluster the cities in our dataframe based on their venue-category frequencies, so that we can see which cities are similar to one another:

In [318]:
# Run K-means to break up into clusters
kclusters = 6

# Keep only the numeric frequency columns (axis must be passed by keyword in recent pandas)
ind_grouped_clustering = ind_grouped.drop(columns='City')

# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(ind_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 2, 3, 3, 3, 1, 1, 1, 0, 1])
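This section fixes `kclusters = 6` without showing how that value was chosen. One common sanity check is to compare silhouette scores across several values of k. The sketch below is an illustration only: it uses synthetic data as a stand-in for `ind_grouped_clustering`, so the resulting best k says nothing about the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for ind_grouped_clustering: three well-separated groups
rng = np.random.RandomState(0)
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
X = np.vstack([c + 0.05 * rng.randn(20, 3) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the real frequency matrix, the same loop could be run with `X = ind_grouped_clustering` to see whether 6 is a reasonable choice.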

Generating a new dataframe that includes the cluster as well as the top 5 venues for each city.

In [320]:
# Create dataframe that includes the cluster and top 5 venues

# Add clustering labels
cities_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

ind_merged = ind_cities

# Join ind_cities with cities_venues_sorted so each city has latitude/longitude, cluster label and top venues
ind_merged = ind_merged.join(cities_venues_sorted.set_index('City'), on='City')

# Drop cities with no venue data
ind_merged = ind_merged.dropna()

ind_merged

Unnamed: 0,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Greater Mumbai,19.076,72.8777,3.0,Indian Restaurant,Middle Eastern Restaurant,River,Pizza Place,Diner
1,Delhi,28.7041,77.1025,3.0,Fast Food Restaurant,Indian Restaurant,Music Venue,Café,Burger Joint
2,Bengaluru,12.9716,77.5946,3.0,Italian Restaurant,Mexican Restaurant,Hotel,Café,Lounge
3,Greater Hyderabad,17.385,78.4867,3.0,Bus Station,Chaat Place,Coffee Shop,Cosmetics Shop,Costume Shop
4,Ahmadabad,23.022505,72.571362,4.0,Hotel,Indian Restaurant,Snack Place,Restaurant,Diner
5,Chennai,13.08268,80.270718,3.0,Indian Restaurant,Hotel,Bookstore,Stadium,Metro Station
6,Kolkata,22.572646,88.363895,3.0,Juice Bar,Bookstore,Café,Plaza,Dhaba
7,Surat,21.17024,72.831061,3.0,Shopping Plaza,Diner,Food & Drink Shop,Food,Flea Market
8,Pune,18.52043,73.856744,3.0,Historic Site,Performing Arts Venue,Ice Cream Shop,Indian Restaurant,Bakery
9,Jaipur,26.912434,75.787271,4.0,Hotel,Indian Restaurant,Dessert Shop,Restaurant,Café
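In the table above, the cluster labels appear as 3.0 and 4.0 rather than as integers. The most likely cause is that the left join introduces NaN for cities without venue data, which forces the column to a float dtype before `dropna()` runs. A minimal sketch with made-up rows shows the effect, along with an inner merge as one possible alternative that keeps the labels as integers:

```python
import pandas as pd

# Made-up stand-ins for ind_cities and cities_venues_sorted
cities = pd.DataFrame({'City': ['A', 'B', 'C'],
                       'Latitude': [1.0, 2.0, 3.0],
                       'Longitude': [4.0, 5.0, 6.0]})
clustered = pd.DataFrame({'Cluster Labels': [2, 0],
                          'City': ['A', 'C'],
                          '1st Most Common Venue': ['Hotel', 'ATM']})

# Left join then dropna: city 'B' gets NaN first, so labels become floats
left = cities.join(clustered.set_index('City'), on='City').dropna()

# Inner merge: unmatched cities are never introduced, so labels stay integers
inner = cities.merge(clustered, on='City', how='inner')

print(left['Cluster Labels'].dtype, inner['Cluster Labels'].dtype)
```

Casting with `.astype(int)` after `dropna()` would be another way to get the same result while keeping the original join.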


Finally, let's have a look at the resulting clusters:

In [321]:
# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=5)

# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(ind_merged['Latitude'], ind_merged['Longitude'], ind_merged['City'], ind_merged['Cluster Labels']):
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<h3>Examining the clusters</h3>

<h4>Cluster 1:</h4>

In [335]:
ind_merged.loc[ind_merged['Cluster Labels'] == 0, ind_merged.columns[[0] + list(range(4,ind_merged.shape[1]))]]

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
38,Gwalior,Train Station,Coffee Shop,Gastropub,Diner,Food
73,Dehradun,Train Station,IT Services,Café,Women's Store,Diner
88,Sangli Miraj Kupwad,Train Station,Women's Store,Dhaba,Food,Flea Market
111,Muzaffarnagar,Train Station,Shoe Store,Restaurant,Lighting Store,Bakery
117,Korba,Pizza Place,Women's Store,Dhaba,Food,Flea Market
141,Parbhani,Train Station,Women's Store,Dhaba,Food,Flea Market
174,North Dum Dum,Train Station,ATM,Department Store,Women's Store,Diner
185,Mirzapur-cum-Vindhyachal,Train Station,Women's Store,Dhaba,Food,Flea Market
291,Batala,Pizza Place,Women's Store,Dhaba,Food,Flea Market
305,Hindupur,Train Station,Pharmacy,Light Rail Station,Women's Store,Dhaba


<h4>Cluster 2:</h4>

In [336]:
ind_merged.loc[ind_merged['Cluster Labels'] == 1, ind_merged.columns[[0] + list(range(4,ind_merged.shape[1]))]]

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
33,Allahabad,ATM,Bank,Women's Store,Donut Shop,Food & Drink Shop
44,Guwahati,ATM,Women's Store,Diner,Food & Drink Shop,Food
75,Asansol,ATM,Women's Store,Diner,Food & Drink Shop,Food
91,Ambattur,ATM,Soccer Field,Women's Store,Diner,Food
97,Maheshtala,ATM,Women's Store,Diner,Food & Drink Shop,Food
148,Bally,ATM,Women's Store,Diner,Food & Drink Shop,Food
170,Arrah,ATM,Athletics & Sports,Women's Store,Donut Shop,Food & Drink Shop
180,Tiruvottiyur,ATM,Women's Store,Diner,Food & Drink Shop,Food
200,Naihati,ATM,Women's Store,Diner,Food & Drink Shop,Food
218,Hospet,ATM,Fabric Shop,Women's Store,Diner,Food & Drink Shop


<h4>Cluster 3:</h4>

In [337]:
ind_merged.loc[ind_merged['Cluster Labels'] == 2, ind_merged.columns[[0] + list(range(4,ind_merged.shape[1]))]]

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
22,Nashik,Indian Restaurant,Women's Store,Dhaba,Food,Flea Market
41,Madurai,Indian Restaurant,Hotel,Diner,Food & Drink Shop,Food
46,Solapur,Pool Hall,Indian Restaurant,Dhaba,Food,Flea Market
79,Gulbarga,Shopping Mall,Indian Restaurant,Diner,Food & Drink Shop,Food
80,Jamnagar,Multiplex,Indian Restaurant,Dhaba,Food,Flea Market
107,Patiala,Indian Restaurant,Hotel,Market,Diner,Food
116,Rohtak,Indian Restaurant,Multiplex,Dhaba,Food,Flea Market
143,Hisar,Indian Restaurant,Train Station,Art Gallery,Women's Store,Diner
150,Dewas,Indian Restaurant,Movie Theater,Women's Store,Diner,Food
154,Bathinda,Indian Restaurant,Shoe Store,Women's Store,Diner,Food & Drink Shop


<h4>Cluster 4:</h4>

In [338]:
ind_merged.loc[ind_merged['Cluster Labels'] == 3, ind_merged.columns[[0] + list(range(4,ind_merged.shape[1]))]]

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Greater Mumbai,Indian Restaurant,Middle Eastern Restaurant,River,Pizza Place,Diner
1,Delhi,Fast Food Restaurant,Indian Restaurant,Music Venue,Café,Burger Joint
2,Bengaluru,Italian Restaurant,Mexican Restaurant,Hotel,Café,Lounge
3,Greater Hyderabad,Bus Station,Chaat Place,Coffee Shop,Cosmetics Shop,Costume Shop
5,Chennai,Indian Restaurant,Hotel,Bookstore,Stadium,Metro Station
6,Kolkata,Juice Bar,Bookstore,Café,Plaza,Dhaba
7,Surat,Shopping Plaza,Diner,Food & Drink Shop,Food,Flea Market
8,Pune,Historic Site,Performing Arts Venue,Ice Cream Shop,Indian Restaurant,Bakery
10,Lucknow,Café,Fast Food Restaurant,Hotel,Restaurant,Neighborhood
11,Kanpur,Café,Dessert Shop,Women's Store,Donut Shop,Food & Drink Shop


<h4>Cluster 5:</h4>

In [339]:
ind_merged.loc[ind_merged['Cluster Labels'] == 4, ind_merged.columns[[0] + list(range(4,ind_merged.shape[1]))]]

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Ahmadabad,Hotel,Indian Restaurant,Snack Place,Restaurant,Diner
9,Jaipur,Hotel,Indian Restaurant,Dessert Shop,Restaurant,Café
16,Visakhapatnam,Hotel,Jewelry Store,Diner,Food & Drink Shop,Food
31,Amritsar,Hotel,Indian Restaurant,Shoe Store,Monument / Landmark,Dhaba
43,Kota,Hotel,Belgian Restaurant,Donut Shop,Food & Drink Shop,Food
61,Gorakhpur,Hotel,Fast Food Restaurant,Diner,Food & Drink Shop,Food
77,Kolhapur,Hotel,Diner,Italian Restaurant,Department Store,Food & Drink Shop
109,Agartala,Hotel,ATM,Pizza Place,Dhaba,Food
120,Muzaffarpur,Hotel,Pizza Place,Lighting Store,Motorcycle Shop,Donut Shop
132,Shimoga,Hotel,Indian Restaurant,Food,Bus Station,Convention Center


<h4>Cluster 6:</h4>

In [341]:
ind_merged.loc[ind_merged['Cluster Labels'] == 5, ind_merged.columns[[0] + list(range(4,ind_merged.shape[1]))]]

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
49,Moradabad,Pharmacy,Women's Store,Dhaba,Food,Flea Market
124,Avadi,Pharmacy,Women's Store,Dhaba,Food,Flea Market
217,Kharagpur,Pharmacy,Art Gallery,Women's Store,Dhaba,Food
228,Madhyamgram,Pharmacy,Plaza,Women's Store,Dhaba,Flea Market
272,Medinipur,Pharmacy,Women's Store,Dhaba,Food,Flea Market
334,Chitradurga,Pharmacy,Women's Store,Dhaba,Food,Flea Market
381,Ashoknagar Kalyangarh,Pharmacy,Women's Store,Dhaba,Food,Flea Market
403,Gudivada,Pharmacy,Mattress Store,Women's Store,Diner,Food
464,Gangawati,Pharmacy,Women's Store,Dhaba,Food,Flea Market
486,Kalyani,Pharmacy,Women's Store,Dhaba,Food,Flea Market
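The tables above show only the first rows of each cluster, so the dominant venue has to be judged by eye. A short aggregation can summarize every cluster at once. The sketch below uses made-up rows shaped like `ind_merged`; the values are illustrative assumptions, not results from the dataset.

```python
import pandas as pd

# Made-up rows in the same shape as ind_merged (values are illustrative only)
toy = pd.DataFrame({
    'City': ['A', 'B', 'C', 'D', 'E'],
    'Cluster Labels': [0, 0, 1, 1, 1],
    '1st Most Common Venue': ['Train Station', 'Train Station', 'ATM', 'ATM', 'Bank'],
})

# Most frequent top venue within each cluster
summary = (
    toy.groupby('Cluster Labels')['1st Most Common Venue']
       .agg(lambda s: s.value_counts().idxmax())
)

# Number of cities per cluster
sizes = toy['Cluster Labels'].value_counts().sort_index()

print(summary)
print(sizes)
```

Running the same two statements on `ind_merged` would give one line per cluster, which makes it easier to check the characterizations drawn in the conclusions.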


<h3>Conclusions:</h3>

The first thing to notice is that, although the analysis started from the top 493 cities by population, the Foursquare API returned far fewer unique venues than that number of cities would suggest. From the clustering we can observe that the cities in cluster 4 offer much more variety to people who are looking to move to another city (this is consistent with their population levels, since they rank near the top by overall population). The number of venues returned for these cities is also higher than for the others (this can be inferred from the dataframe), which points to their potential to attract more diverse groups of people. These cities are also the corporate hubs of India and attract more job seekers. From the type and variety of venues, it can reasonably be said that these cities would be preferred choices for people who want a good standard of living and relatively more variety.
As for the other clusters, broadly speaking, they are less promising for people who want to move in search of a high standard of living, but they are potential centers for starting various businesses and opening venues of greater variety, possibly because they currently lack them. Cluster 5 is mainly about hotels, whereas the cities in the last cluster (cluster 6) are dominated by pharmacies, which may suit pharmacists and people with medical or pharmacy-related businesses.
Cluster 2 cities mainly have ATMs in common, which may indicate a higher number of average daily digital transactions.
Cluster 3 cities are characterized mainly by Indian restaurants.
Cluster 1 is mainly centered around train stations.
Looking at this another way, and also considering their population levels, these cities are in a situation where more demand for development still needs to be met. For a developing country like India, these observations seem fairly reasonable. Through this project, I tried to show how the cities of India are distributed in terms of development and of the demands of different groups of people, and how those groups can fulfill their needs by knowing where these cities currently stand and what potential they have.
This project has considerable scope for improvement, because much more data could be fed into the algorithm to support a larger-scale analysis and visualization.