# Introduction

Bangalore, or Bengaluru as it is officially known these days, was until some years ago called the ‘Silicon Valley of India’ because of its IT and startup culture, which produced one disruption after another. Many Indian startups have been incubated here, while their global counterparts (startups as well as mega corporations) have chosen to locate their India hub of operations in the city, which is also the capital of the southern Indian state of Karnataka. Uber, Amazon, Google, Zomato, Ola and Apple are just a few examples. Of late, however, Bangalore has been losing its sheen as the so-called startup hub because of various problems, primarily infrastructure-related, clogging the city. Meanwhile, incentives and policies at the local and state government level, designed to attract investment into other cities, have thrown up quite a few challengers to Bangalore. A very prominent one among them is Pune, located in the western Indian state of Maharashtra. Pune has traditionally been known as the cultural hub of Maharashtra, as opposed to its commercial counterpart and state capital, Mumbai. Home to renowned educational institutions, the city is now also emerging as a prominent home to startups, both IT and non-IT. A lower cost of living and fast connectivity with Mumbai, thanks to an expressway, are among the factors causing entrepreneurs and young urban professionals to look at Pune as a viable alternative to Bangalore.

While a full comparison of Pune and Bangalore as innovation drivers or as places to live and work in may be beyond the scope of this project, I’ll still attempt to compare two areas/neighbourhoods in the cities from an urban culture perspective and try to show which area/city today offers better choices for the young urban professional, in terms of entertainment and relaxation facilities. The two areas chosen for my study are Koregaon Park in Pune and Indiranagar in Bangalore.

# Methodology

At the outset, I have to state that the limitations of working with data, especially geospatial data, in India are immense. This may not be very different from working with data anywhere else or in any other sector, yet some limitations are specific to the Indian context. The primary one is the seemingly random way in which streets are named and city areas are clustered under districts. In addition, there are no ready datasets, such as Wikipedia pages or .csv files, for the neighbourhood/postcode organisation of Indian cities, unlike for their bigger global counterparts such as New York City or Toronto. I will still attempt to work with what I have, viz. the Google Maps and Foursquare APIs. Rather than being a limitation, these APIs can be leveraged for the project, given that both apps have found large-scale acceptance in Indian cities. That, in turn, is thanks to smartphones reaching Indian users on an unprecedented scale. So, as a workaround for gaps in administrative action, the crowdsourcing behind Google Maps has helped make many Indian city neighbourhoods properly identifiable locations. The folium, json and matplotlib packages in Python have been used to visualise the data. Apart from these, the shapely package has been used to manipulate the geographical coordinates, and scikit-learn to cluster and analyse neighbourhoods using k-means clustering.


# Data

## Pune - Exploring Koregaon Park

We begin by getting the geographical coordinates using the Google Maps API and then use reverse geocoding to get the addresses/neighbourhoods of our locations, given that geospatial datasets for Indian cities are not readily available yet. After getting the coordinates, we convert them to Cartesian (UTM) coordinates so that distances can be measured in metres.

In [1]:
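import requests

api_key = 'abcdef'  # placeholder; substitute your own Google Maps API key

def get_coordinates(api_key, address, verbose=False):
    """Geocode an address; returns [lat, lon], or [None, None] on failure."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        # take the first (best) match returned by the API
        geographical_data = results[0]['geometry']['location']
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except Exception:
        # network errors, malformed JSON or an empty result list all end up here
        return [None, None]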
    
address = 'Koregaon Park, Pune, India'
Koregaon_Park = get_coordinates(api_key, address)
print('Coordinates of {}: {}'.format(address, Koregaon_Park))

Coordinates of Koregaon Park, Pune, India: [18.5362084, 73.8939748]
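
The geocoding URL is assembled with `str.format`, so the spaces and commas in the address are left for `requests` to escape. A minimal sketch of encoding the query explicitly with the standard library is given here; `build_geocode_url` is a hypothetical helper name and is not part of the notebook.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(api_key, address):
    """Return a geocoding URL with the key and address percent-encoded."""
    query = urlencode({'key': api_key, 'address': address})
    return '{}?{}'.format(GEOCODE_ENDPOINT, query)

# 'abcdef' is the same placeholder key used in the notebook.
url = build_geocode_url('abcdef', 'Koregaon Park, Pune, India')
print(url)

# Decoding the query string recovers the original address unchanged.
decoded_address = parse_qs(urlsplit(url).query)['address'][0]
print(decoded_address)
```

The same effect can be had by passing a `params` dictionary to `requests.get`, which encodes the values in a similar way.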


In [2]:
!pip install folium


In [3]:
import folium

In [4]:
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors

In [5]:
import json

In [6]:
! pip install shapely
import shapely.geometry
! pip install pyproj
import pyproj
import math

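# NOTE: zone=33 spans 12°E to 18°E, whereas Pune (about 73.9°E) lies in UTM zone 43,
# so this projection distorts distances. It is kept because the outputs shown
# in this notebook were produced with it.
def lonlat_to_xy(lon, lat):
    """Project WGS84 longitude/latitude (degrees) to UTM x/y (metres)."""
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    """Inverse of lonlat_to_xy: UTM x/y (metres) back to longitude/latitude (degrees)."""
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]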

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Koregaon_Park longitude={}, latitude={}'.format(Koregaon_Park[1], Koregaon_Park[0]))
x, y = lonlat_to_xy(Koregaon_Park[1], Koregaon_Park[0])
print('Koregaon_Park UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Koregaon Park longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Koregaon_Park longitude=73.8939748, latitude=18.5362084
Koregaon_Park UTM X=7727395.3159307465, Y=3668556.0294573694
Koregaon Park longitude=73.89397479999988, latitude=18.536208399995218
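
The UTM easting printed above runs into millions of metres, which suggests that the point lies far from the central meridian of the zone used in the projection. Standard UTM zones are 6 degrees of longitude wide, so the zone for a given longitude can be worked out directly. The helper below is a small sketch of that arithmetic; `utm_zone` is a hypothetical name, and the special-case zones around Norway and Svalbard are ignored.

```python
import math

def utm_zone(longitude):
    """Return the standard 6-degree UTM zone number (1-60) for a longitude in degrees."""
    return int(math.floor((longitude + 180.0) / 6.0)) % 60 + 1

pune_zone = utm_zone(73.8939748)   # longitude of Koregaon Park from the geocoding step
reference_zone = utm_zone(13.4)    # a longitude that falls inside zone 33, for comparison

print('Koregaon Park falls in UTM zone', pune_zone)
print('Longitude 13.4 falls in UTM zone', reference_zone)
```

Passing the computed zone to the projection, instead of a hard-coded one, would keep the 600 m grid spacing and the distance filter close to their nominal values.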


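To keep the coverage from being too narrow, we generate a hexagonal grid of candidate locations, spaced 600 m apart, around Koregaon Park. Note that the distance filter in the code below keeps grid points up to about 6 km from the centre, not just 1 km.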

In [7]:
Koregaon_Park_x, Koregaon_Park_y = lonlat_to_xy(Koregaon_Park[1], Koregaon_Park[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2
x_min = Koregaon_Park_x - 1000
x_step = 600
y_min = Koregaon_Park_y - 4000 - (int(21/k)*k*600 - 2000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(Koregaon_Park_x, Koregaon_Park_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'suitable neighborhood centers generated.')

178 suitable neighborhood centers generated.
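
The grid above is hexagonal: rows are `600 * sqrt(3) / 2` metres apart and every other row is shifted by half a step. The sketch below shows the same idea as a self-contained function centred on the point of interest. `hex_grid` and its parameters are hypothetical names, and the coordinates are assumed to be in a metric (projected) system.

```python
import math

def hex_grid(center_x, center_y, spacing, radius):
    """Return (x, y) points of a hexagonal grid within `radius` of the centre.

    Rows are spacing * sqrt(3) / 2 apart and every other row is shifted by
    half a step, so each interior point has six neighbours at `spacing`.
    """
    row_height = spacing * math.sqrt(3) / 2
    n_rows = int(radius // row_height)
    n_cols = int(radius // spacing) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * row_height
        x_offset = spacing / 2 if i % 2 else 0.0  # shift odd rows by half a step
        for j in range(-n_cols, n_cols + 1):
            x = center_x + j * spacing + x_offset
            if math.hypot(x - center_x, y - center_y) <= radius:
                points.append((x, y))
    return points

# Grid of points 600 m apart, limited to 1 km around an origin at (0, 0).
grid = hex_grid(0.0, 0.0, spacing=600, radius=1000)
print(len(grid), 'grid points within 1 km of the centre')
```

Centring the loops on the point of interest keeps the grid symmetric, so the `radius` argument alone controls how far the coverage extends.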


Data gathering and clean up: generating the address of Koregaon Park from its coordinates (reverse geocoding), and then the addresses of all the grid locations around it.

In [8]:
def get_address(api_key, latitude, longitude, verbose=False):
    """Reverse-geocode a coordinate pair; returns the formatted address, or None on failure."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        # take the first (most specific) match returned by the API
        address = results[0]['formatted_address']
        return address
    except Exception:
        return None

addr = get_address(api_key, Koregaon_Park[0], Koregaon_Park[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(Koregaon_Park[0], Koregaon_Park[1], addr))

Reverse geocoding check
-----------------------
Address of [18.5362084, 73.8939748] is: 94/102, Kavadewadi, Koregaon Park, Pune, Maharashtra 411001, India
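
The `formatted_address` field is a full street address, so almost every grid point receives its own unique label. Each Geocoding API result also carries an `address_components` list whose entries are tagged with `types`, which can be used to pick a coarser locality name. The following is only a sketch: `pick_neighbourhood` is a hypothetical helper, and the sample dictionary is hand-written in the shape of an API result rather than being real output.

```python
def pick_neighbourhood(result, preferred=('neighborhood', 'sublocality_level_1',
                                          'sublocality', 'locality')):
    """Return the long_name of the first component matching a preferred type."""
    components = result.get('address_components', [])
    for wanted in preferred:
        for component in components:
            if wanted in component.get('types', []):
                return component.get('long_name')
    return None

# Hand-written sample in the shape of one Geocoding API result (illustrative only).
sample_result = {
    'formatted_address': '94/102, Kavadewadi, Koregaon Park, Pune, Maharashtra 411001, India',
    'address_components': [
        {'long_name': 'Kavadewadi',
         'types': ['political', 'sublocality', 'sublocality_level_2']},
        {'long_name': 'Koregaon Park',
         'types': ['political', 'sublocality', 'sublocality_level_1']},
        {'long_name': 'Pune', 'types': ['locality', 'political']},
    ],
}

label = pick_neighbourhood(sample_result)
print(label)
```

Grouping venues by such a label, instead of by street address, would merge nearby grid points into the same neighbourhood.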


In [10]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(api_key, lat, lon)
    if address is None:
        address = 'No Address'
    address = address.replace(', India', '') 
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [11]:
addresses[150:170]

['81/1, Koregaon Park Annexe, Mundhwa, Pune, Maharashtra 411001',
 'N Main Rd, Fatima Nagar, Koregaon Park Annexe, Mundhwa, Pune, Maharashtra 411001',
 'North Main Road Ext Mundwa Road Mundwa, Koregaon Park Annexe, Mundhwa, Pune, Maharashtra 411036',
 '90, Mundhwa Rd, Near Yash Bharat Petrol Pump, Pingale Wasti, Koregaon Park Annexe, Mundhwa, Pune, Maharashtra 411036',
 'Gat No 61, Koregaon Park Annex,, Mundhwa Rd, Mundhwa, Pune, Maharashtra 411036',
 'Unnamed Road, Jadhav Nagar, Mundhwa, Pune, Maharashtra 411036',
 'Sai Nagari, Vithalrao Vandekar Rd, Nilanjali Society, Kalyani Nagar, Pune, Maharashtra 411006',
 '2, Rd Number 1, Carnation Society, Nilanjali Society, Kalyani Nagar, Pune, Maharashtra 411006',
 '66, Rd Number 5, Nilanjali Society, Kalyani Nagar, Pune, Maharashtra 411006',
 'Shop No. 101, Fortaleza Co-Operative Housing Society, Vitoria II Near Gold Adlabs, Kalyani Nagar, Pune, Maharashtra 411006',
 'Ivy Glen Palatial Apartments, Marigold Premises, Marigold complex, Kalyani

Now that the addresses have been generated, we organise the data further by first loading it into a pandas dataframe.

In [12]:
import pandas as pd

df_locations = pd.DataFrame({'Neighbourhood': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,})

df_locations.head(30)

Unnamed: 0,Latitude,Longitude,Neighbourhood
0,18.512688,73.874594,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ..."
1,18.511205,73.877515,"1617/18, Kedari Path, Camp, Pune, Maharashtra ..."
2,18.509722,73.880436,"Shop No. 4 , Radiant Builders, Near Usha Nursi..."
3,18.50824,73.883357,"2, General Bhagat Marg, Stavely Road, Stavely ..."
4,18.506757,73.886277,"Pune-Solapur Road, Next To, Nalini Apartments,..."
5,18.505275,73.889197,"Turf Club Rd, Pune Cantonment, Pune, Maharasht..."
6,18.514359,73.877401,"572, Sachapir St, Camp, Pune, Maharashtra 411001"
7,18.512877,73.880322,"Bombay Garage, Camp, Pune, Maharashtra 411001"
8,18.511394,73.883242,"2420, Exhibition Rd, Camp, Pune, Maharashtra 4..."
9,18.509911,73.886163,"5B, General Bhagat Marg, Camp, Pune, Maharasht..."


In [13]:
df_locations.shape

(178, 3)

Visualising Koregaon Park using folium and markers

In [14]:
map_koregaon = folium.Map(location=[18.5362084, 73.8939748], zoom_start=13)


for lat, lng, label in zip(df_locations['Latitude'], df_locations['Longitude'], df_locations['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_koregaon)  
    
map_koregaon

## Methodology

Data cleanup: because roads and localities in India are not named very systematically, we first clean up the data by removing the locations for which no address was found (the 'No Address' entries).

In [15]:
import numpy as np
df_locations.replace("No Address", np.nan, inplace = True)
df_locations.reset_index(drop=True, inplace=True)
df_locations=df_locations.dropna()
df_locations

Unnamed: 0,Latitude,Longitude,Neighbourhood
0,18.512688,73.874594,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ..."
1,18.511205,73.877515,"1617/18, Kedari Path, Camp, Pune, Maharashtra ..."
2,18.509722,73.880436,"Shop No. 4 , Radiant Builders, Near Usha Nursi..."
3,18.508240,73.883357,"2, General Bhagat Marg, Stavely Road, Stavely ..."
4,18.506757,73.886277,"Pune-Solapur Road, Next To, Nalini Apartments,..."
5,18.505275,73.889197,"Turf Club Rd, Pune Cantonment, Pune, Maharasht..."
6,18.514359,73.877401,"572, Sachapir St, Camp, Pune, Maharashtra 411001"
7,18.512877,73.880322,"Bombay Garage, Camp, Pune, Maharashtra 411001"
8,18.511394,73.883242,"2420, Exhibition Rd, Camp, Pune, Maharashtra 4..."
9,18.509911,73.886163,"5B, General Bhagat Marg, Camp, Pune, Maharasht..."


In [39]:
df_locations.shape

(178, 3)

We now use the Foursquare API to explore the spots in our areas

In [1]:
CLIENT_ID = 'xyz'
CLIENT_SECRET = '123'
VERSION = '20190320'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: xyz
CLIENT_SECRET:123


Looking at the first entry in the dataframe

In [41]:
df_locations.loc[0, 'Neighbourhood']

'1002, Dada Chelaram Path, Sadar Bazaar, Pune, Maharashtra 411001'

In [42]:
Neighbourhood_latitude = df_locations.loc[0, 'Latitude'] 
Neighbourhood_longitude = df_locations.loc[0, 'Longitude']

neighborhood_name = df_locations.loc[0, 'Neighbourhood']
print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               Neighbourhood_latitude, 
                                                               Neighbourhood_longitude))

Latitude and longitude values of 1002, Dada Chelaram Path, Sadar Bazaar, Pune, Maharashtra 411001 are 18.51268755323604, 73.87459441840421.


## Analysis

Generating a request URL to explore the top 100 venues within 1 km of the first location in the dataframe

In [43]:
limit=100
radius=1000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    Neighbourhood_latitude, 
    Neighbourhood_longitude, 
    radius, 
    limit)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=xyz&client_secret=123&v=20190320&ll=18.51268755323604,73.87459441840421&radius=1000&limit=100'
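
Displaying the request URL, as in the cell above, also displays the client ID and secret embedded in it. One way to avoid that in a shared notebook is to mask those query parameters before printing. The sketch below uses only the standard library; `redact_url` is a hypothetical helper and the example URL reuses the placeholder credentials.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def redact_url(url, secret_params=('client_id', 'client_secret')):
    """Return `url` with the values of the named query parameters masked."""
    parts = urlsplit(url)
    query = [(key, '***' if key in secret_params else value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)]
    # keep ',' and '*' literal so coordinates and the mask stay readable
    return urlunsplit(parts._replace(query=urlencode(query, safe=',*')))

example = ('https://api.foursquare.com/v2/venues/explore?client_id=xyz'
           '&client_secret=123&v=20190320&ll=18.5127,73.8746&radius=1000&limit=100')
redacted = redact_url(example)
print(redacted)
```

The unredacted URL is still the one to pass to `requests.get`; only the printed copy is masked.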

In [44]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c9204e14434b961192e1c7c'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4b769f30f964a52000552ee3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/bakery_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d16a941735',
         'name': 'Bakery',
         'pluralName': 'Bakeries',
         'primary': True,
         'shortName': 'Bakery'}],
       'id': '4b769f30f964a52000552ee3',
       'location': {'address': 'East Street, Opp. Victory Cinema, Camp',
        'cc': 'IN',
        'city': 'Pune',
        'country': 'India',
        'crossStreet': 'Gurudwara Road',
        'distance': 603,
        'formattedAddress': ['East Street, Opp. Victory Cinema, Camp (Gurudwara Road)',
         'Pune 411001',
         'Mahārāshtra'

Rendering the result data into a dataframe

In [45]:
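def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # the flattened JSON uses the prefixed column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']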

In [46]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(30)

Unnamed: 0,name,categories,lat,lng
0,Kayani Bakery,Bakery,18.514797,73.87986
1,Thousand Oaks / 1000 Oaks,Bar,18.510672,73.879993
2,Venky's Chicken,Fast Food Restaurant,18.513503,73.879766
3,Marz O Rin,Café,18.511859,73.880156
4,11 East Street Cafe,Bar,18.512403,73.880315
5,Hite Bar,Fast Food Restaurant,18.515298,73.873944
6,Cafe Toons,Sports Bar,18.516272,73.878876
7,Madinah Hotel,Indian Restaurant,18.507211,73.875016
8,Main Street,Plaza,18.516229,73.878812
9,Dorabjee & Sons Restaurant,Parsi Restaurant,18.515215,73.876923


In [25]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

75 venues were returned by Foursquare.


Now we'll explore the top 50 venues within a radius of 500 m of the same location, narrowing the coverage

In [48]:
LIMIT=50
radius=500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    Neighbourhood_latitude, 
    Neighbourhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=xyz&client_secret=123&v=20190320&ll=18.51268755323604,73.87459441840421&radius=500&limit=50'

In [49]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c9205659fb6b73bc30af321'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4bd1afbd9854d13a40f4f94d-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/fastfood_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d16e941735',
         'name': 'Fast Food Restaurant',
         'pluralName': 'Fast Food Restaurants',
         'primary': True,
         'shortName': 'Fast Food'}],
       'id': '4bd1afbd9854d13a40f4f94d',
       'location': {'address': 'Convent Street, Camp',
        'cc': 'IN',
        'city': 'Pune',
        'country': 'India',
        'crossStreet': 'St. Vincent Street',
        'distance': 298,
        'formattedAddress': ['Convent Street, Camp (St. Vincent Street)',
         'Pune 411001',
         'Mahārāsh

In [50]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare around each location and return all venues in one dataframe."""
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # build the explore request for this location
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # keep only the fields needed for the analysis
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # build the dataframe once, after all requests are done
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighbourhood', 
                 'Neighbourhood Latitude', 
                 'Neighbourhood Longitude', 
                 'Venue', 
                 'Venue Latitude', 
                 'Venue Longitude', 
                 'Venue Category'])
    
    return nearby_venues
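
The expression `v['venue']['categories'][0]['name']` assumes that every venue has at least one category, whereas `get_category_type` earlier allows for an empty list. A more defensive way of flattening the items is sketched below. `venue_rows` is a hypothetical helper, and the sample items are hand-written in the shape of the explore response rather than taken from it.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare-style explore items into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # fall back to None when the venue carries no category at all
        category = categories[0].get('name') if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# Hand-written items in the shape of the explore response (illustrative only).
sample_items = [
    {'venue': {'name': 'Kayani Bakery',
               'location': {'lat': 18.514797, 'lng': 73.87986},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Uncategorised Place',
               'location': {'lat': 18.51, 'lng': 73.88},
               'categories': []}},
]

rows = venue_rows('Camp', 18.512688, 73.874594, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or relabelled before the one-hot encoding step.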

In [52]:
koregaon_venues = getNearbyVenues(names=df_locations['Neighbourhood'],
                                   latitudes=df_locations['Latitude'],
                                   longitudes=df_locations['Longitude']
                                  )

1002, Dada Chelaram Path, Sadar Bazaar, Pune, Maharashtra 411001
1617/18, Kedari Path, Camp, Pune, Maharashtra 411001
Shop No. 4 , Radiant Builders, Near Usha Nursing Home,, Behind Indusind Bank, 150, M. G. Road, Camp, Pune, Maharashtra 411001
2, General Bhagat Marg, Stavely Road, Stavely Road, Pune, Maharashtra 411001
Pune-Solapur Road, Next To, Nalini Apartments, Pune, Maharashtra 411001
Turf Club Rd, Pune Cantonment, Pune, Maharashtra 411040
572, Sachapir St, Camp, Pune, Maharashtra 411001
Bombay Garage, Camp, Pune, Maharashtra 411001
2420, Exhibition Rd, Camp, Pune, Maharashtra 411001
5B, General Bhagat Marg, Camp, Pune, Maharashtra 411001
Arjun Marg, Camp, Pune, Maharashtra 411001
AFMC, Pune, Maharashtra
Racecourse, Pune Cantonment, Pune, Maharashtra 411040
Shop No. 251, Fashion Street, Camp, Pune, Maharashtra 411001
subway, Hulshur, Camp, Pune, Maharashtra 411001
Gurudwara Rd, Camp, Pune, Maharashtra 411001
Pattinson Rd, Camp, Pune, Maharashtra 411001
shop no.889, nr. gunaji bull

In [53]:
print(koregaon_venues.shape)
koregaon_venues.head(30)

(1812, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",18.512688,73.874594,Hite Bar,18.515298,73.873944,Fast Food Restaurant
1,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",18.512688,73.874594,Dorabjee & Sons Restaurant,18.515215,73.876923,Parsi Restaurant
2,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",18.512688,73.874594,Shivaji market,18.513451,73.876464,Department Store
3,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",18.512688,73.874594,Priya Restaurant,18.511606,73.87883,Snack Place
4,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",18.512688,73.874594,Radio Hotel,18.513901,73.877329,Indian Restaurant
5,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",18.512688,73.874594,Cafe Yezdan,18.514896,73.876739,Breakfast Spot
6,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",18.512688,73.874594,Poona Cold Drink & Coffee House,18.511447,73.878611,Ice Cream Shop
7,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",18.512688,73.874594,Kohinoor restaurant,18.511419,73.878736,Breakfast Spot
8,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",18.512688,73.874594,Zaika The Veg Treat,18.512242,73.878778,Indian Chinese Restaurant
9,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",18.512688,73.874594,C.T. Pundole & sons,18.512303,73.878877,Watch Shop


In [55]:
koregaon_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"1002, Dada Chelaram Path, Sadar Bazaar, Pune, Maharashtra 411001",14,14,14,14,14,14
"108, Suyojana Society, Kavadewadi, Koregaon Park, Pune, Maharashtra 411001",20,20,20,20,20,20
"11-B, Iricen Railway Colony, Koregaon Park, Pune, Maharashtra 411001",5,5,5,5,5,5
"12, Hulshur, Camp, Pune, Maharashtra 411001",49,49,49,49,49,49
"12, Koregaon Park, Pune, Maharashtra 411001",8,8,8,8,8,8
"124/510, K. G. Lines, Agarwal Farm, Mansarovar, Jaipur, Maharashtra 302020",1,1,1,1,1,1
"142, Vithalrao Vandekar Rd, Prathamesh Society, Kalyani Nagar, Pune, Maharashtra 411006",33,33,33,33,33,33
"15, Marigold complex, Kalyani Nagar, Pune, Maharashtra 411014",37,37,37,37,37,37
"15, Prince of Wales Dr Rd, Camp, Pune, Maharashtra 411001",7,7,7,7,7,7
"16, Rajendrasinhji Rd, Camp, Pune, Maharashtra 411001",5,5,5,5,5,5


In [56]:
print('There are {} unique categories.'.format(len(koregaon_venues['Venue Category'].unique())))

There are 114 unique categories.


One hot encoding

In [57]:
koregaon_onehot = pd.get_dummies(koregaon_venues[['Venue Category']], prefix="", prefix_sep="")
koregaon_onehot['Neighbourhood'] = koregaon_venues['Neighbourhood']
fixed_columns = [koregaon_onehot.columns[-1]] + list(koregaon_onehot.columns[:-1])
koregaon_onehot = koregaon_onehot[fixed_columns]
koregaon_onehot.head(40)

Unnamed: 0,Neighbourhood,American Restaurant,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beach Bar,Beer Garden,...,Szechuan Restaurant,Tea Room,Theater,Thrift / Vintage Store,Track Stadium,Train Station,Vegetarian / Vegan Restaurant,Watch Shop,Wine Shop,Yoga Studio
0,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0


In [58]:
koregaon_onehot.shape

(1812, 115)

Grouping neighbourhood by venue categories and creating a new dataframe

In [59]:
koregaon_grouped = koregaon_onehot.groupby('Neighbourhood').mean().reset_index()
koregaon_grouped

Unnamed: 0,Neighbourhood,American Restaurant,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beach Bar,Beer Garden,...,Szechuan Restaurant,Tea Room,Theater,Thrift / Vintage Store,Track Stadium,Train Station,Vegetarian / Vegan Restaurant,Watch Shop,Wine Shop,Yoga Studio
0,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.071429,0.00,0.000000
1,"108, Suyojana Society, Kavadewadi, Koregaon Pa...",0.000000,0.0,0.000000,0.000000,0.000000,0.100000,0.000000,0.00,0.000000,...,0.000000,0.050000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
2,"11-B, Iricen Railway Colony, Koregaon Park, Pu...",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
3,"12, Hulshur, Camp, Pune, Maharashtra 411001",0.000000,0.0,0.000000,0.000000,0.000000,0.020408,0.000000,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.040816,0.000000,0.00,0.000000
4,"12, Koregaon Park, Pune, Maharashtra 411001",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.125000,0.000000,0.00,0.125000
5,"124/510, K. G. Lines, Agarwal Farm, Mansarovar...",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
6,"142, Vithalrao Vandekar Rd, Prathamesh Society...",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
7,"15, Marigold complex, Kalyani Nagar, Pune, Mah...",0.000000,0.0,0.054054,0.000000,0.000000,0.000000,0.027027,0.00,0.027027,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.027027
8,"15, Prince of Wales Dr Rd, Camp, Pune, Maharas...",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
9,"16, Rajendrasinhji Rd, Camp, Pune, Maharashtra...",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000


In [60]:
koregaon_grouped.shape

(150, 115)
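
Taking the mean of the one-hot columns per neighbourhood gives the share of that neighbourhood's venues in each category. For instance, the first neighbourhood has 14 venues, one of which is a Watch Shop, hence the value 1/14 ≈ 0.071429 in the table above. The sketch below reproduces the calculation without pandas on made-up data; `category_frequencies` and `top_categories` are hypothetical helper names.

```python
from collections import Counter, defaultdict

def category_frequencies(pairs):
    """Map each neighbourhood to {category: share of its venues in that category}."""
    counts = defaultdict(Counter)
    for neighbourhood, category in pairs:
        counts[neighbourhood][category] += 1
    return {hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
            for hood, counter in counts.items()}

def top_categories(frequencies, n):
    """Return the n most frequent categories, ties broken alphabetically."""
    return sorted(frequencies, key=lambda cat: (-frequencies[cat], cat))[:n]

# Toy (neighbourhood, category) pairs, made up for illustration.
pairs = [('A', 'Café'), ('A', 'Café'), ('A', 'Bakery'), ('A', 'Hotel'),
         ('B', 'Indian Restaurant'), ('B', 'Bar')]
freqs = category_frequencies(pairs)
print(freqs['A'])
print(top_categories(freqs['A'], 2))
```

Because the values are shares rather than counts, a neighbourhood with five venues and one with fifty are put on the same scale before clustering.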

Looking at the top 5 venues and creating another dataframe for that

In [61]:
num_top_venues = 5

for hood in koregaon_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = koregaon_grouped[koregaon_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----1002, Dada Chelaram Path, Sadar Bazaar, Pune, Maharashtra 411001----
                  venue  freq
0     Indian Restaurant  0.21
1        Breakfast Spot  0.21
2  Fast Food Restaurant  0.07
3           Snack Place  0.07
4      Department Store  0.07


----108, Suyojana Society, Kavadewadi, Koregaon Park, Pune, Maharashtra 411001----
               venue  freq
0               Café  0.15
1             Bakery  0.10
2  Indian Restaurant  0.10
3              Hotel  0.05
4    Organic Grocery  0.05


----11-B, Iricen Railway Colony, Koregaon Park, Pune, Maharashtra 411001----
                      venue  freq
0  Mediterranean Restaurant   0.2
1            Clothing Store   0.2
2      Fast Food Restaurant   0.2
3             Movie Theater   0.2
4                      Café   0.2


----12, Hulshur, Camp, Pune, Maharashtra 411001----
                  venue  freq
0     Indian Restaurant  0.27
1  Fast Food Restaurant  0.08
2        Ice Cream Shop  0.06
3    Chinese Restaurant  0.04
4            

In [62]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [63]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = koregaon_grouped['Neighbourhood']

for ind in np.arange(koregaon_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(koregaon_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",Indian Restaurant,Breakfast Spot,Indian Chinese Restaurant,Spiritual Center,Parsi Restaurant,Department Store,Fast Food Restaurant,Snack Place,Watch Shop,Ice Cream Shop
1,"108, Suyojana Society, Kavadewadi, Koregaon Pa...",Café,Indian Restaurant,Bakery,Park,Bookstore,Organic Grocery,Creperie,Hotel,Convenience Store,Cocktail Bar
2,"11-B, Iricen Railway Colony, Koregaon Park, Pu...",Movie Theater,Clothing Store,Fast Food Restaurant,Mediterranean Restaurant,Café,Yoga Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
3,"12, Hulshur, Camp, Pune, Maharashtra 411001",Indian Restaurant,Fast Food Restaurant,Ice Cream Shop,Clothing Store,Juice Bar,Lounge,Chinese Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Paper / Office Supplies Store
4,"12, Koregaon Park, Pune, Maharashtra 411001",Café,Yoga Studio,Italian Restaurant,Park,Hotel,Vegetarian / Vegan Restaurant,Athletics & Sports,French Restaurant,Deli / Bodega,Department Store
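The column labels above rely on a three-item suffix list plus a fallback to 'th', which works for ten columns but would label a 21st column '21th'. A more general suffix helper could look like the sketch below; the function name `ordinal` is my own and is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Build the same column labels as the notebook cell above.
num_top_venues = 10
columns = ['Neighbourhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```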


Using k-means clustering to analyse neighbourhood clusters

In [64]:
!pip install numpy scipy scikit-learn
from sklearn.cluster import KMeans
kclusters = 5
# drop the label column so that only the venue frequencies are clustered
koregaon_grouped_clustering = koregaon_grouped.drop(columns='Neighbourhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(koregaon_grouped_clustering)
kmeans.labels_[0:10]



array([0, 1, 1, 0, 4, 1, 1, 1, 4, 0], dtype=int32)
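Before looking at individual clusters it can help to check how many neighbourhoods fall into each one. The sketch below counts the ten labels printed above; with the fitted model one would presumably pass `kmeans.labels_` instead of the hard-coded list.

```python
from collections import Counter

# The first ten labels as printed above; the full array has one label per neighbourhood.
first_labels = [0, 1, 1, 0, 4, 1, 1, 1, 4, 0]

sizes = Counter(first_labels)
for cluster, count in sorted(sizes.items()):
    print('Cluster {}: {} of the first ten neighbourhoods'.format(cluster, count))
```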

In [65]:
neighbourhoods_venues_sorted.insert(0,'Cluster Labels',kmeans.labels_)
koregaon_merged = df_locations
koregaon_merged = koregaon_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
koregaon_merged.head()

Unnamed: 0,Latitude,Longitude,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,18.512688,73.874594,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",0.0,Indian Restaurant,Breakfast Spot,Indian Chinese Restaurant,Spiritual Center,Parsi Restaurant,Department Store,Fast Food Restaurant,Snack Place,Watch Shop,Ice Cream Shop
1,18.511205,73.877515,"1617/18, Kedari Path, Camp, Pune, Maharashtra ...",0.0,Indian Restaurant,Breakfast Spot,Bar,Department Store,Parsi Restaurant,Donut Shop,Fast Food Restaurant,Café,Snack Place,Indian Chinese Restaurant
2,18.509722,73.880436,"Shop No. 4 , Radiant Builders, Near Usha Nursi...",0.0,Indian Restaurant,Bar,Indian Chinese Restaurant,Snack Place,Hookah Bar,Fast Food Restaurant,Donut Shop,Café,Burger Joint,Breakfast Spot
3,18.50824,73.883357,"2, General Bhagat Marg, Stavely Road, Stavely ...",0.0,Indian Restaurant,Thrift / Vintage Store,Bar,Donut Shop,Burger Joint,Yoga Studio,French Restaurant,Deli / Bodega,Department Store,Dessert Shop
4,18.506757,73.886277,"Pune-Solapur Road, Next To, Nalini Apartments,...",2.0,Racetrack,Yoga Studio,Food Truck,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Donut Shop


Visualising the clusters

In [66]:
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location=[18.5362084, 73.8939748], zoom_start=13)

# build one hex colour per cluster
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster in zip(koregaon_merged['Latitude'], koregaon_merged['Longitude'], koregaon_merged['Neighbourhood'], koregaon_merged['Cluster Labels']):
    if pd.isna(cluster):
        continue  # no venues were found for this location, so it has no cluster label
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],       # a single colour, not the whole list
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
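Each marker needs exactly one colour, chosen by its cluster label. After the join the labels are floats (0.0, 2.0, ...), which suggests that some locations may have no label at all. The sketch below shows one way to do the lookup safely; the palette values and the helper name `colour_for` are assumptions for illustration, not the colours the notebook derives from matplotlib.

```python
import math

# Hypothetical fixed palette, one entry per cluster (kclusters = 5).
PALETTE = ['#8000ff', '#00b5eb', '#80ffb4', '#ffb360', '#ff0000']

def colour_for(cluster, palette=PALETTE, fallback='#808080'):
    """Return the hex colour for a cluster label, or a grey fallback if the label is missing."""
    if cluster is None or (isinstance(cluster, float) and math.isnan(cluster)):
        return fallback
    return palette[int(cluster) % len(palette)]

for label in [0.0, 2.0, 4.0, float('nan')]:
    print(label, '->', colour_for(label))
```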

In [67]:
koregaon_merged.loc[koregaon_merged['Cluster Labels'] == 0, koregaon_merged.columns[[2] + list(range(5, koregaon_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"1002, Dada Chelaram Path, Sadar Bazaar, Pune, ...",Breakfast Spot,Indian Chinese Restaurant,Spiritual Center,Parsi Restaurant,Department Store,Fast Food Restaurant,Snack Place,Watch Shop,Ice Cream Shop
1,"1617/18, Kedari Path, Camp, Pune, Maharashtra ...",Breakfast Spot,Bar,Department Store,Parsi Restaurant,Donut Shop,Fast Food Restaurant,Café,Snack Place,Indian Chinese Restaurant
2,"Shop No. 4 , Radiant Builders, Near Usha Nursi...",Bar,Indian Chinese Restaurant,Snack Place,Hookah Bar,Fast Food Restaurant,Donut Shop,Café,Burger Joint,Breakfast Spot
3,"2, General Bhagat Marg, Stavely Road, Stavely ...",Thrift / Vintage Store,Bar,Donut Shop,Burger Joint,Yoga Studio,French Restaurant,Deli / Bodega,Department Store,Dessert Shop
6,"572, Sachapir St, Camp, Pune, Maharashtra 411001",Fast Food Restaurant,Breakfast Spot,Lounge,Chinese Restaurant,Bar,Ice Cream Shop,Snack Place,Plaza,Café
7,"Bombay Garage, Camp, Pune, Maharashtra 411001",Breakfast Spot,Fast Food Restaurant,Chinese Restaurant,Ice Cream Shop,Snack Place,Bar,Indian Chinese Restaurant,Plaza,Lounge
8,"2420, Exhibition Rd, Camp, Pune, Maharashtra 4...",Bar,Indian Chinese Restaurant,Fast Food Restaurant,Café,Breakfast Spot,Snack Place,Ice Cream Shop,Watch Shop,Thrift / Vintage Store
13,"Shop No. 251, Fashion Street, Camp, Pune, Maha...",Fast Food Restaurant,Ice Cream Shop,Chinese Restaurant,Breakfast Spot,Coffee Shop,Department Store,Vegetarian / Vegan Restaurant,Shopping Mall,Snack Place
14,"subway, Hulshur, Camp, Pune, Maharashtra 411001",Fast Food Restaurant,Ice Cream Shop,Department Store,Breakfast Spot,Chinese Restaurant,Indian Chinese Restaurant,Plaza,Lounge,Juice Bar
15,"Gurudwara Rd, Camp, Pune, Maharashtra 411001",Fast Food Restaurant,Chinese Restaurant,Café,Plaza,Vegetarian / Vegan Restaurant,Motorcycle Shop,Bakery,Bar,Snack Place


In [68]:
koregaon_merged.loc[koregaon_merged['Cluster Labels'] == 1, koregaon_merged.columns[[2] + list(range(5, koregaon_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,"D5, NH 9, Camp, Pune, Maharashtra 411001",Pizza Place,Department Store,Electronics Store,Shopping Mall,Motorcycle Shop,Yoga Studio,Cricket Ground,Dance Studio,Deli / Bodega
21,"Fatima Nagar, Fatima Nagar, Wanowrie, Pune, Ma...",Multiplex,Park,Pizza Place,Nightclub,Department Store,Dessert Shop,Electronics Store,Motorcycle Shop,Breakfast Spot
25,"124/510, K. G. Lines, Agarwal Farm, Mansarovar...",Yoga Studio,Food Truck,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Donut Shop
29,"Cv Town Pune 13, Jambhulkar Mala, Wanowrie, Pu...",Park,Pizza Place,Department Store,Electronics Store,Motorcycle Shop,Yoga Studio,Cupcake Shop,Dance Studio,Deli / Bodega
30,"Blossom Leap Building, Solapur Rd, Jambhulkar ...",Park,Pizza Place,Multiplex,Shopping Mall,Fast Food Restaurant,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega
33,"8, Parade, Ground Road, Near, Milkha Singh Spo...",Pool,Café,Fast Food Restaurant,Yoga Studio,Food Truck,Dance Studio,Deli / Bodega,Department Store,Dessert Shop
34,"Milkha Singh Sports Complex Rd, Camp, Pune, Ma...",Pool,Fast Food Restaurant,Yoga Studio,Food Truck,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
39,"Canal Rd, Empress Garden View Society, Uday Ba...",Food Truck,Yoga Studio,Cricket Ground,Dance Studio,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Donut Shop,Electronics Store
40,"Bhat Nagar Hadapsar, Jambhulkar Mala, Wanowrie...",Shopping Mall,Park,Yoga Studio,Food Truck,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop
42,"Kahun Road, Camp, Pune, Maharashtra 411001",Other Great Outdoors,Pool,Golf Course,Movie Theater,Yoga Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop


In [69]:
koregaon_merged.loc[koregaon_merged['Cluster Labels'] == 4, koregaon_merged.columns[[2] + list(range(5, koregaon_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,"Race Course Empress Gardan, Camp, Pune, Mahara...",Garden,Gym Pool,Yoga Studio,French Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
28,"15, Prince of Wales Dr Rd, Camp, Pune, Maharas...",Botanical Garden,Motorcycle Shop,Indian Restaurant,Café,English Restaurant,Food Truck,Food Stand,Fast Food Restaurant,Yoga Studio
36,"15, Prince of Wales Dr Rd, Camp, Pune, Maharas...",Botanical Garden,Motorcycle Shop,Indian Restaurant,Café,English Restaurant,Food Truck,Food Stand,Fast Food Restaurant,Yoga Studio
37,"Race Course Empress Gardan, Camp, Pune, Mahara...",Garden,Gym Pool,Yoga Studio,French Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
38,"21, Empress Garden View Society, Kavade Mala, ...",Botanical Garden,Garden,Yoga Studio,French Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
45,"Princess Of Wales Road,, Ghorpuri Lines, Dobar...",Dance Studio,Electronics Store,Yoga Studio,French Restaurant,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Donut Shop
46,"Lalbahadur Shastri Marg, Ghorpuri Lines, Dobar...",Café,Botanical Garden,Garden,Yoga Studio,French Restaurant,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
47,"Disputed Canal Rd, Empress Garden, Kavade Mala...",Botanical Garden,Garden,Yoga Studio,French Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
55,"Princess Of Wales Road,, Ghorpuri Lines, Dobar...",Dance Studio,Electronics Store,Yoga Studio,French Restaurant,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Donut Shop
56,"near kalyani nagar signal, Pune, Maharashtra",Yoga Studio,French Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Donut Shop,Electronics Store


# Bengaluru/Bangalore

## Repeating the same procedure for Bengaluru, with Indiranagar as our chosen neighbourhood this time

In [70]:
import requests

In [71]:
api_key='google'
def get_coordinates(api_key, address, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location']
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except:
        return [None, None]

address = 'Indiranagar, Bangalore'
Indiranagar = get_coordinates(api_key, address)
print('Coordinates of {}: {}'.format(address, Indiranagar))

Coordinates of Indiranagar, Bangalore: [12.9718915, 77.6411545]
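The address above contains a comma and a space, and the request URL is built with plain string formatting. A slightly more robust variant would percent-encode the query parameters first. The sketch below only builds the URL and makes no network call; the helper name `build_geocode_url` and the key `YOUR_API_KEY` are placeholders of mine.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(api_key, address):
    """Return the geocoding request URL with both query parameters percent-encoded."""
    query = urlencode({'key': api_key, 'address': address})
    return '{}?{}'.format(GEOCODE_ENDPOINT, query)

url = build_geocode_url('YOUR_API_KEY', 'Indiranagar, Bangalore')
print(url)
```

With `requests`, passing the same dictionary through the `params` argument of `requests.get` achieves the same encoding.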


In [72]:
! pip install shapely
import shapely.geometry
! pip install pyproj
import pyproj
import math
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Indiranagar longitude={}, latitude={}'.format(Indiranagar[1], Indiranagar[0]))
x, y = lonlat_to_xy(Indiranagar[1], Indiranagar[0])
print('Indiranagar UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Indiranagar longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Indiranagar longitude=77.6411545, latitude=12.9718915
Indiranagar UTM X=8897933.64661394, Y=2970476.5377279078
Indiranagar longitude=77.64115449994686, latitude=12.971891500001675
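One thing worth checking in the transformation above is the hard-coded `zone=33`. UTM zone 33 spans 12°E to 18°E, while Indiranagar sits at about 77.6°E. The round trip still recovers the original coordinates, but this far from the zone's central meridian the X/Y values are probably not true metres, so the 600-unit grid step used later may not correspond to 600 m on the ground. The standard zone for a given longitude can be computed as sketched below; `utm_zone` is a hypothetical helper name.

```python
import math

def utm_zone(longitude_deg):
    """Standard UTM zone number (1 to 60) for a longitude in degrees.

    Ignores the special-case zones around Norway and Svalbard.
    """
    return int(math.floor((longitude_deg + 180.0) / 6.0)) % 60 + 1

print('Indiranagar:', utm_zone(77.6411545))
print('Koregaon Park:', utm_zone(73.8939748))
```

Both cities fall in the same zone, so a single `zone` value would serve the Pune and Bengaluru halves of the analysis.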


In [73]:
Indiranagar_x, Indiranagar_y = lonlat_to_xy(Indiranagar[1], Indiranagar[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = Indiranagar_x - 1000
x_step = 600
y_min = Indiranagar_y - 6000 - (int(21/k)*k*600 - 2000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(Indiranagar_x, Indiranagar_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'suitable neighborhood centers generated.')

133 suitable neighborhood centers generated.
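The grid construction above can be hard to follow because of the offsets in `x_min` and `y_min`. The following is a simplified sketch of the same idea, centred on the origin: rows are spaced `step * sqrt(3) / 2` apart, every other row is shifted by half a step, and only points within a given radius are kept. It does not reproduce the notebook's exact bounds or its count of 133 points, and `hex_grid` is a name I introduce here.

```python
import math

def hex_grid(center_x, center_y, step, radius):
    """Return (x, y) points of a hexagonal lattice lying within `radius` of the centre."""
    row_height = step * math.sqrt(3) / 2          # vertical distance between rows
    n = int(math.ceil(radius / row_height)) + 1   # rows/columns to scan on each side
    points = []
    for i in range(-n, n + 1):
        y = center_y + i * row_height
        x_offset = step / 2 if i % 2 else 0.0     # shift every other row by half a step
        for j in range(-n, n + 1):
            x = center_x + j * step + x_offset
            if math.hypot(x - center_x, y - center_y) <= radius:
                points.append((x, y))
    return points

points = hex_grid(0.0, 0.0, step=600, radius=1300)
print(len(points), 'grid points')
```

In such a lattice every point's nearest neighbours are exactly one `step` away, which is what makes equally sized search circles around the points cover the area evenly.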


In [74]:
import folium

In [75]:
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except:
        return None

addr = get_address(api_key, Indiranagar[0], Indiranagar[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(Indiranagar[0], Indiranagar[1], addr))

Reverse geocoding check
-----------------------
Address of [12.9718915, 77.6411545] is: No. 3802/B, 7th Main, Near ESI Hospital, HAL 2nd Stage,, Sodepur, Appareddipalya, Indiranagar, Bengaluru, Karnataka 560038, India
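The reverse-geocoding helper above returns `None` for any failure, which hides whether the request was rejected or simply had no match. Geocoding API responses carry a `status` field next to `results`, so the parsing step could be separated out and checked explicitly, as in this sketch. The function name and both sample responses are made up for illustration.

```python
def parse_formatted_address(response):
    """Return the first formatted address from a Geocoding API response dict, or None."""
    if response.get('status') != 'OK':
        return None  # e.g. ZERO_RESULTS or REQUEST_DENIED
    results = response.get('results') or []
    if not results:
        return None
    return results[0].get('formatted_address')

# Hypothetical responses, shaped like the API's JSON.
ok_response = {
    'status': 'OK',
    'results': [{'formatted_address': 'Indiranagar, Bengaluru, Karnataka 560038, India'}],
}
empty_response = {'status': 'ZERO_RESULTS', 'results': []}

print(parse_formatted_address(ok_response))
print(parse_formatted_address(empty_response))
```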


In [83]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', India', '')
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [87]:
addresses[50:70]

['32, 2nd Main Rd, KR Garden, K R Garden, Murgesh Pallya, Bengaluru, Karnataka 560017',
 'MAP QTR-226 Block Command Hospital,Air Force, Cambridge Rd, Domlur Village, Domlur, Bengaluru, Karnataka 560007',
 '101/4A, 4th Main Rd, BDA Colony, Domlur Village, Domlur, Bengaluru, Karnataka 560071',
 '191/3, Domlur Village, Domlur, Bengaluru, Karnataka 560071',
 'GPRA Quarters, Old Airport Road, Domlur, Bengaluru, Karnataka 560071',
 '2D-101, HAL Old Airport Rd, Domlur Village, Domlur, Bengaluru, Karnataka 560071',
 'I Block, Diamond District, Golf Course Rd, ISRO Colony, Domlur, Bengaluru, Karnataka 560008',
 'Golf Course Rd, Kodihalli, Bengaluru, Karnataka 560008',
 '37/5, Wind Tunnel Rd, N R Layout, Rustam Bagh Layout, Bengaluru, Karnataka 560017',
 '297/13, N R Layout, Rustam Bagh Layout, Bengaluru, Karnataka 560017',
 '46, D Cross Rd, K R Garden, Murgesh Pallya, Bengaluru, Karnataka 560017',
 '5, 16th Cross Road, KR Garden, Vinayaka Nagar, Murgesh Pallya, Bengaluru, Karnataka 560017',
 '6

In [90]:
import pandas as pd

df_bengaluru = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,})

df_bengaluru.head(30)
df_bengaluru.shape

(133, 3)

In [94]:
import numpy as np
# mark the placeholder address as missing and drop those rows; the original index labels are kept
df_bengaluru = df_bengaluru.replace("Unnamed Road, Agram, Bengaluru, Karnataka 560007", np.nan).dropna()
df_bengaluru

Unnamed: 0,Address,Latitude,Longitude
0,"25, 1st Cross Rd, Agram, Bengaluru, Karnataka ...",12.950986,77.626719
1,"26/1, Ground Floor, Koramangala Intermediate R...",12.949896,77.629238
4,"Unnamed Road, Challaghatta, Bengaluru, Karnata...",12.946625,77.636794
5,"Embossy golf link/Dell, Challaghatta, Bengalur...",12.945535,77.639312
10,"Intermediate Ring Rd, Challaghatta, Bengaluru,...",12.948219,77.639016
11,"Military Bridge, Agram, Bengaluru, Karnataka 5...",12.947128,77.641534
12,"Unnamed Road, Challaghatta, Bengaluru, Karnata...",12.946038,77.644052
13,"209/3, Agram, Bengaluru, Karnataka 560007",12.955266,77.628646
14,"560068, Agram, Bengaluru, Karnataka 560007",12.954175,77.631164
16,"ASC, Opp Command Hospital, Old Airport Road, A...",12.951994,77.636202
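The gaps in the index above (0, 1, 4, 5, 10, ...) come from dropping rows without renumbering afterwards. The toy example below, with made-up addresses, shows the effect and how `reset_index(drop=True)` would renumber the rows if a contiguous index were wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'PLACEHOLDER' stands in for an address we want to discard.
toy = pd.DataFrame({
    'Address': ['A Road', 'PLACEHOLDER', 'B Road', 'PLACEHOLDER', 'C Road'],
    'Latitude': [12.95, 12.96, 12.97, 12.98, 12.99],
})

cleaned = toy.replace('PLACEHOLDER', np.nan).dropna()
print(list(cleaned.index))      # index labels of the surviving rows

renumbered = cleaned.reset_index(drop=True)
print(list(renumbered.index))   # contiguous labels starting at 0
```

Label-based lookups such as `df.loc[0, 'Address']` keep working after `dropna()` only as long as the row labelled 0 survives the filtering.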


In [97]:
map_indira = folium.Map(location=[12.9718915, 77.6411545], zoom_start=13)


for lat, lng, label in zip(df_bengaluru['Latitude'], df_bengaluru['Longitude'], df_bengaluru['Address']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_indira)  
    
map_indira

In [98]:
df_bengaluru.loc[0,'Address']

'25, 1st Cross Rd, Agram, Bengaluru, Karnataka 560047'

In [99]:
Address_latitude = df_bengaluru.loc[0, 'Latitude'] 
Address_longitude = df_bengaluru.loc[0, 'Longitude']

neighbourhood_name = df_bengaluru.loc[0, 'Address']
print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               Address_latitude, 
                                                               Address_longitude))

Latitude and longitude values of 25, 1st Cross Rd, Agram, Bengaluru, Karnataka 560047 are 12.950986449985916, 77.62671945055658.


In [2]:
CLIENT_ID = 'foursq'
CLIENT_SECRET = 'foursq'
VERSION = '20190320'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: foursq
CLIENT_SECRET:foursq


In [101]:
LIMIT=100
radius=1000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    Address_latitude, 
    Address_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=foursq&client_secret=foursq&v=20190320&ll=12.950986449985916,77.62671945055658&radius=1000&limit=100'

In [102]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c9209acdd579725f86a0cd3'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4cc2a4e4bde8f04d0bd59f4b-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/indian_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d10f941735',
         'name': 'Indian Restaurant',
         'pluralName': 'Indian Restaurants',
         'primary': True,
         'shortName': 'Indian'}],
       'id': '4cc2a4e4bde8f04d0bd59f4b',
       'location': {'address': 'Sri rama temple road, Ejipura',
        'cc': 'IN',
        'city': 'Bagalore',
        'country': 'India',
        'distance': 853,
        'formattedAddress': ['Sri rama temple road, Ejipura',
         'Bagalore',
         'Karnātaka',
         'India'],
        'labeledLatLngs': [{'label'

In [103]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [104]:
from pandas import json_normalize  # top-level import; available in pandas 1.0 and later

In [105]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(60)

Unnamed: 0,name,categories,lat,lng
0,Peppers,Indian Restaurant,12.943495,77.628401
1,Fitness Cafe,Gym / Fitness Center,12.943175,77.628976
2,"barista, koramangala",Café,12.948603,77.620339
3,Builders NGV Club,General Entertainment,12.942139,77.625657
4,Malabar Food Plaza,Fast Food Restaurant,12.950426,77.624473
5,Coco Bongo,Nightclub,12.944024,77.62358
6,NGV Park,Park,12.948047,77.622039
7,Mast Kalandar,Indian Restaurant,12.94611,77.62308
8,Star Bazaar,Department Store,12.945987,77.623147
9,"ASC Centre, Domlur",Golf Course,12.950482,77.633711


In [106]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

15 venues were returned by Foursquare.


In [107]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    # note: radius defaults to 500 m, whereas the single test request above used 1000 m
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # build the explore request for this grid point
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # keep only the fields needed for the analysis
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-address lists once, after all requests have finished
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Address', 
                  'Address Latitude', 
                  'Address Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
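The function above indexes `categories[0]` directly, so a venue returned with an empty category list would raise an `IndexError` and stop the whole loop. A defensive lookup along the lines of the sketch below could be used instead. The helper name, the default label and the second sample item are assumptions of mine.

```python
def first_category_name(item, default='Uncategorised'):
    """Return the first category name of a Foursquare explore item, or a default label."""
    categories = item.get('venue', {}).get('categories') or []
    return categories[0]['name'] if categories else default

# Items shaped like the entries of results['response']['groups'][0]['items'].
with_category = {'venue': {'name': 'Peppers', 'categories': [{'name': 'Indian Restaurant'}]}}
without_category = {'venue': {'name': 'Hypothetical Venue', 'categories': []}}

print(first_category_name(with_category))
print(first_category_name(without_category))
```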

In [108]:
bengaluru_venues = getNearbyVenues(names=df_bengaluru['Address'],
                                   latitudes=df_bengaluru['Latitude'],
                                   longitudes=df_bengaluru['Longitude']
                                  )


25, 1st Cross Rd, Agram, Bengaluru, Karnataka 560047
26/1, Ground Floor, Koramangala Intermediate Ring Road, Near Sony World, Junction, Next KFC, Koramangala Bangalore, Karnataka, Bengaluru, Karnataka 560047
Unnamed Road, Challaghatta, Bengaluru, Karnataka 560007
Embossy golf link/Dell, Challaghatta, Bengaluru, Karnataka 560007
Intermediate Ring Rd, Challaghatta, Bengaluru, Karnataka 560071
Military Bridge, Agram, Bengaluru, Karnataka 560007
Unnamed Road, Challaghatta, Bengaluru, Karnataka 560071
209/3, Agram, Bengaluru, Karnataka 560007
560068, Agram, Bengaluru, Karnataka 560007
ASC, Opp Command Hospital, Old Airport Road, Agram, Bengaluru, Karnataka 560007
Intermediate Ring Rd, Krishna Reddy Layout, Domlur, Bengaluru, Karnataka 560007
Cherry Hills, Embassy Golf Links Business Park, Challaghatta, Bengaluru, Karnataka 560071
Sunquest Information Systems, Eagle Ridge, Embassy Golf Links Road, Embassy Golf Links Business Park, Challaghatta, Bengaluru, Karnataka 560071
24/7, Nagsandra Rd,

In [109]:
print (bengaluru_venues.shape)
bengaluru_venues.head()

(2389, 7)


Unnamed: 0,Address,Address Latitude,Address Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"25, 1st Cross Rd, Agram, Bengaluru, Karnataka ...",12.950986,77.626719,Elite Supermarket,12.950333,77.62553,Department Store
1,"25, 1st Cross Rd, Agram, Bengaluru, Karnataka ...",12.950986,77.626719,Malabar Food Plaza,12.950426,77.624473,Fast Food Restaurant
2,"25, 1st Cross Rd, Agram, Bengaluru, Karnataka ...",12.950986,77.626719,Goli Vada Pav No 1,12.947082,77.628441,Snack Place
3,"26/1, Ground Floor, Koramangala Intermediate R...",12.949896,77.629238,Squash Court at ACS Center,12.948315,77.631174,Athletics & Sports
4,"26/1, Ground Floor, Koramangala Intermediate R...",12.949896,77.629238,Goli Vada Pav No 1,12.947082,77.628441,Snack Place


In [110]:
bengaluru_venues.groupby('Address').count()

Address,Address Latitude,Address Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"#44, 2nd Floor, NAL Wind Tunnel Road,, Opp Golf View Campus, Murugeshpalya, Bengaluru, Karnataka 560017",5,5,5,5,5,5
"1, Golf Avenue, Off. Airport Road, P.B.No. 817, Embassy Golf Links Business Park, Challaghatta, Bengaluru, Karnataka 560008",20,20,20,20,20,20
"1, Wind Tunnel Rd, Rustam Bagh Layout, Bengaluru, Karnataka 560017",2,2,2,2,2,2
"101/4A, 4th Main Rd, BDA Colony, Domlur Village, Domlur, Bengaluru, Karnataka 560071",5,5,5,5,5,5
"1045, 4th Cross Rd, HAL 2nd Stage, Doopanahalli, Indiranagar, Bengaluru, Karnataka 560008",68,68,68,68,68,68
"1075/I, 5th Cross Rd, HAL 2nd Stage, Appareddipalya, Indiranagar, Bengaluru, Karnataka 560008",47,47,47,47,47,47
"1079/4, 13th A Main Rd, HAL 2nd Stage, Indiranagar, Bengaluru, Karnataka 560008",36,36,36,36,36,36
"11, 3rd Cross Rd, S R Layout, Rustam Bagh Layout, Bengaluru, Karnataka 560017",5,5,5,5,5,5
"113, 6th Cross Rd, Stage 2, Hoysala Nagar, Indiranagar, Bengaluru, Karnataka 560038",26,26,26,26,26,26
"12/1, Embassy Golf Links Business Park, Domlur, Bengaluru, Karnataka 560071",26,26,26,26,26,26


In [111]:
print('There are {} unique categories.'.format(len(bengaluru_venues['Venue Category'].unique())))

There are 98 unique categories.


One Hot Encoding

In [112]:
bengaluru_onehot = pd.get_dummies(bengaluru_venues[['Venue Category']], prefix="", prefix_sep="")
bengaluru_onehot['Address'] = bengaluru_venues['Address']
fixed_columns = [bengaluru_onehot.columns[-1]] + list(bengaluru_onehot.columns[:-1])
bengaluru_onehot = bengaluru_onehot[fixed_columns]

bengaluru_onehot.head()

Unnamed: 0,Address,ATM,Accessories Store,Andhra Restaurant,Antique Shop,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,...,Sporting Goods Shop,Sports Bar,Steakhouse,Summer Camp,Tea Room,Tex-Mex Restaurant,Trail,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,"25, 1st Cross Rd, Agram, Bengaluru, Karnataka ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"25, 1st Cross Rd, Agram, Bengaluru, Karnataka ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"25, 1st Cross Rd, Agram, Bengaluru, Karnataka ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"26/1, Ground Floor, Koramangala Intermediate R...",0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"26/1, Ground Floor, Koramangala Intermediate R...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [113]:
bengaluru_onehot.shape

(2389, 99)

In [114]:
bengaluru_grouped = bengaluru_onehot.groupby('Address').mean().reset_index()
bengaluru_grouped

Unnamed: 0,Address,ATM,Accessories Store,Andhra Restaurant,Antique Shop,Arcade,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,...,Sporting Goods Shop,Sports Bar,Steakhouse,Summer Camp,Tea Room,Tex-Mex Restaurant,Trail,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,"#44, 2nd Floor, NAL Wind Tunnel Road,, Opp Gol...",0.0,0.000000,0.000000,0.00,0.000000,0.200000,0.0,0.000000,0.200000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,"1, Golf Avenue, Off. Airport Road, P.B.No. 817...",0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,"1, Wind Tunnel Rd, Rustam Bagh Layout, Bengalu...",0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,"101/4A, 4th Main Rd, BDA Colony, Domlur Villag...",0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.200000
4,"1045, 4th Cross Rd, HAL 2nd Stage, Doopanahall...",0.0,0.000000,0.014706,0.00,0.000000,0.014706,0.0,0.000000,0.014706,...,0.000000,0.014706,0.029412,0.0,0.029412,0.014706,0.000000,0.000000,0.000000,0.029412
5,"1075/I, 5th Cross Rd, HAL 2nd Stage, Appareddi...",0.0,0.000000,0.021277,0.00,0.000000,0.021277,0.0,0.000000,0.021277,...,0.000000,0.000000,0.000000,0.0,0.042553,0.021277,0.000000,0.021277,0.000000,0.021277
6,"1079/4, 13th A Main Rd, HAL 2nd Stage, Indiran...",0.0,0.027778,0.000000,0.00,0.000000,0.027778,0.0,0.027778,0.055556,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.027778,0.000000
7,"11, 3rd Cross Rd, S R Layout, Rustam Bagh Layo...",0.0,0.000000,0.000000,0.00,0.000000,0.200000,0.0,0.000000,0.200000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8,"113, 6th Cross Rd, Stage 2, Hoysala Nagar, Ind...",0.0,0.000000,0.076923,0.00,0.000000,0.000000,0.0,0.000000,0.038462,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.038462,0.038462,0.000000
9,"12/1, Embassy Golf Links Business Park, Domlur...",0.0,0.000000,0.000000,0.00,0.038462,0.000000,0.0,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


In [115]:
bengaluru_grouped.shape

(114, 99)
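To see why the grouped values can be read as frequencies, consider a toy version of the two steps above. Each venue becomes a row of 0/1 indicators, and the mean of an indicator column per address is the share of that address's venues in that category. The addresses and categories below are invented for illustration.

```python
import pandas as pd

# Hypothetical venues: four near 'A Road', one near 'B Road'.
toy_venues = pd.DataFrame({
    'Address': ['A Road', 'A Road', 'A Road', 'A Road', 'B Road'],
    'Venue Category': ['Café', 'Café', 'Bakery', 'Hotel', 'Hotel'],
})

# One indicator column per category, cast to float so the mean is numeric on any pandas version.
onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix='', prefix_sep='').astype(float)
onehot.insert(0, 'Address', toy_venues['Address'])

# Mean of the indicators per address = relative frequency of each category.
grouped = onehot.groupby('Address').mean().reset_index()
print(grouped)
```

Each row of the grouped table sums to 1, so the clustering step compares the mix of venue types rather than the raw number of venues at an address.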

In [116]:
num_top_venues = 5

for hood in bengaluru_grouped['Address']:
    print("----"+hood+"----")
    temp = bengaluru_grouped[bengaluru_grouped['Address'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----#44, 2nd Floor, NAL Wind Tunnel Road,, Opp Golf View Campus, Murugeshpalya, Bengaluru, Karnataka 560017----
               venue  freq
0  Indian Restaurant   0.4
1  Korean Restaurant   0.2
2   Asian Restaurant   0.2
3             Bakery   0.2
4             Lounge   0.0


----1, Golf Avenue, Off. Airport Road, P.B.No. 817, Embassy Golf Links Business Park, Challaghatta, Bengaluru, Karnataka 560008----
                venue  freq
0   Indian Restaurant  0.25
1               Hotel  0.10
2         Coffee Shop  0.10
3                Café  0.10
4  Chinese Restaurant  0.05


----1, Wind Tunnel Rd, Rustam Bagh Layout, Bengaluru, Karnataka 560017----
          venue  freq
0         Hotel   0.5
1    Restaurant   0.5
2           ATM   0.0
3  Liquor Store   0.0
4     Multiplex   0.0


----101/4A, 4th Main Rd, BDA Colony, Domlur Village, Domlur, Bengaluru, Karnataka 560071----
                   venue  freq
0  Vietnamese Restaurant   0.2
1      Indian Restaurant   0.2
2            Pizza Place   

Rendering into pandas dataframe

In [117]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
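
To see what `return_most_common_venues` returns, here is a minimal sketch on a single made-up row. The address and frequencies in `toy_row` are invented for illustration; only the function body comes from the cell above.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the 'Address' label) and keep only the frequencies
    row_categories = row.iloc[1:]
    # Sort categories from most to least frequent
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row shaped like one row of bengaluru_grouped
toy_row = pd.Series(
    {'Address': 'Toy Street', 'Bakery': 0.2, 'Café': 0.5, 'Pub': 0.3}
)

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```

The function returns category names only, ordered by frequency, which is why the table built in the next cell holds venue types rather than numbers.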

In [118]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']


columns = ['Address']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions 4 and up take the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))


neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Address'] = bengaluru_grouped['Address']

for ind in np.arange(bengaluru_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bengaluru_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Address,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"#44, 2nd Floor, NAL Wind Tunnel Road,, Opp Gol...",Indian Restaurant,Asian Restaurant,Bakery,Korean Restaurant,Vietnamese Restaurant,Fast Food Restaurant,Coffee Shop,Cupcake Shop,Deli / Bodega,Department Store
1,"1, Golf Avenue, Off. Airport Road, P.B.No. 817...",Indian Restaurant,Coffee Shop,Café,Hotel,Snack Place,Chinese Restaurant,Golf Course,Bus Station,Pizza Place,Pub
2,"1, Wind Tunnel Rd, Rustam Bagh Layout, Bengalu...",Hotel,Restaurant,Vietnamese Restaurant,Fast Food Restaurant,Cocktail Bar,Coffee Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop
3,"101/4A, 4th Main Rd, BDA Colony, Domlur Villag...",Vietnamese Restaurant,Café,Pizza Place,Indian Restaurant,Food & Drink Shop,Farmers Market,Cocktail Bar,Coffee Shop,Cupcake Shop,Deli / Bodega
4,"1045, 4th Cross Rd, HAL 2nd Stage, Doopanahall...",Indian Restaurant,Café,Restaurant,Italian Restaurant,Lounge,Clothing Store,Pub,Vietnamese Restaurant,Coffee Shop,Cupcake Shop


In [119]:
!pip install scikit-learn
from sklearn.cluster import KMeans
kclusters = 5
bengaluru_grouped_clustering = bengaluru_grouped.drop(columns='Address')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bengaluru_grouped_clustering)
kmeans.labels_[0:10]



array([4, 1, 1, 1, 1, 1, 4, 4, 1, 1], dtype=int32)
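
The value `kclusters = 5` is set by hand in the cell above. One common way to sanity-check such a choice is the silhouette score. The sketch below is not part of the analysis in this project: it runs on synthetic 2-D points rather than on `bengaluru_grouped_clustering`, and the range of `k` values tried is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three well-separated blobs in 2-D
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    # n_init is passed explicitly so the result does not depend on the
    # default, which differs between scikit-learn versions
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest average silhouette is the best-separated clustering
best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```

On real venue-frequency data the scores are usually much closer together than on clean blobs, so the result is a hint rather than a verdict.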

In [120]:
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
bengaluru_merged = df_bengaluru
bengaluru_merged = bengaluru_merged.join(neighbourhoods_venues_sorted.set_index('Address'), on='Address')
bengaluru_merged.head()

Unnamed: 0,Address,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"25, 1st Cross Rd, Agram, Bengaluru, Karnataka ...",12.950986,77.626719,1.0,Snack Place,Fast Food Restaurant,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Cupcake Shop,Deli / Bodega,Dessert Shop,Diner
1,"26/1, Ground Floor, Koramangala Intermediate R...",12.949896,77.629238,1.0,ATM,Golf Course,Athletics & Sports,Department Store,Burger Joint,Snack Place,Farmers Market,Cocktail Bar,Coffee Shop,Cupcake Shop
2,"Unnamed Road, Challaghatta, Bengaluru, Karnata...",12.946625,77.636794,2.0,Bus Station,Vietnamese Restaurant,Fast Food Restaurant,Cocktail Bar,Coffee Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner
3,"Embossy golf link/Dell, Challaghatta, Bengalur...",12.945535,77.639312,2.0,Bus Station,Vietnamese Restaurant,Fast Food Restaurant,Cocktail Bar,Coffee Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner
4,"Intermediate Ring Rd, Challaghatta, Bengaluru,...",12.948219,77.639016,1.0,Gym,Coffee Shop,Trail,Café,Hotel,Cafeteria,Furniture / Home Store,Farmers Market,Cupcake Shop,Deli / Bodega
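
The `Cluster Labels` column shows up as `1.0`, `2.0` rather than as integers. A likely cause is that the join leaves `NaN` for any address in `df_bengaluru` that has no matching row in `neighbourhoods_venues_sorted`, which forces the column to float. The sketch below reproduces this on two made-up frames (`points` and `labels` are hypothetical names) and shows one way to drop the unmatched rows and restore integer labels.

```python
import pandas as pd

# Hypothetical stand-ins for df_bengaluru and neighbourhoods_venues_sorted
points = pd.DataFrame({'Address': ['A', 'B', 'C'],
                       'Latitude': [12.95, 12.96, 12.97]})
labels = pd.DataFrame({'Address': ['A', 'C'],
                       'Cluster Labels': [1, 4]})

# Left join on 'Address': 'B' has no match, so its label becomes NaN
merged = points.join(labels.set_index('Address'), on='Address')
print(merged['Cluster Labels'].dtype)  # float64 because of the NaN

# Drop unmatched rows, then cast the labels back to int
clean = merged.dropna(subset=['Cluster Labels']).copy()
clean['Cluster Labels'] = clean['Cluster Labels'].astype(int)
print(clean)
```

Integer labels matter later on, because a float such as `1.0` cannot be used directly as a list index when picking a colour per cluster.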


In [121]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
map_clusters = folium.Map(location=[12.9718915, 77.6411545], zoom_start=13)
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
markers_colors = []
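for lat, lon, poi, cluster in zip(bengaluru_merged['Latitude'], bengaluru_merged['Longitude'], bengaluru_merged['Address'], bengaluru_merged['Cluster Labels']):
    if pd.isna(cluster):
        continue  # skip addresses that did not receive a cluster label
    cluster = int(cluster)  # labels are floats after the join; list indices must be ints
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # one colour per cluster, not the whole list
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)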
       
map_clusters

In [123]:
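# Column 1 is Latitude and range(5, ...) starts at the 2nd venue, so Address and the 1st Most Common Venue are not shown; [0] + list(range(4, ...)) would include them
bengaluru_merged.loc[bengaluru_merged['Cluster Labels'] == 4, bengaluru_merged.columns[[1] + list(range(5, bengaluru_merged.shape[1]))]]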

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,12.957363,Vegetarian / Vegan Restaurant,Flea Market,Cocktail Bar,Coffee Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner
23,12.956272,Lounge,Breakfast Spot,Furniture / Home Store,Arcade,Hotel,Pizza Place,Irish Pub,Bar,Vietnamese Restaurant
28,12.950817,Indian Restaurant,Bakery,Korean Restaurant,Vietnamese Restaurant,Fast Food Restaurant,Coffee Shop,Cupcake Shop,Deli / Bodega,Department Store
29,12.949726,Asian Restaurant,Bakery,Korean Restaurant,Vietnamese Restaurant,Fast Food Restaurant,Coffee Shop,Cupcake Shop,Deli / Bodega,Department Store
31,12.960047,Indian Restaurant,Bakery,Vietnamese Restaurant,Fast Food Restaurant,Cocktail Bar,Coffee Shop,Cupcake Shop,Deli / Bodega,Department Store
32,12.958956,Vegetarian / Vegan Restaurant,Park,Pizza Place,Breakfast Spot,Vietnamese Restaurant,Farmers Market,Cocktail Bar,Coffee Shop,Cupcake Shop
33,12.957865,Hotel,Pizza Place,Breakfast Spot,Irish Pub,Coffee Shop,Café,Park,Bus Station,Pub
37,12.953501,Indian Restaurant,Bakery,Snack Place,Korean Restaurant,Vietnamese Restaurant,Fast Food Restaurant,Coffee Shop,Cupcake Shop,Deli / Bodega
38,12.95241,Asian Restaurant,Pizza Place,Bakery,Korean Restaurant,Vietnamese Restaurant,Farmers Market,Coffee Shop,Cupcake Shop,Deli / Bodega
39,12.951319,Breakfast Spot,Department Store,Bar,Korean Restaurant,Snack Place,Vietnamese Restaurant,Coffee Shop,Cupcake Shop,Deli / Bodega


In [124]:
# Same column selection as for cluster 4: Latitude plus the 2nd to 10th most common venues
bengaluru_merged.loc[bengaluru_merged['Cluster Labels'] == 1, bengaluru_merged.columns[[1] + list(range(5, bengaluru_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,12.950986,Fast Food Restaurant,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Cupcake Shop,Deli / Bodega,Dessert Shop,Diner
1,12.949896,Golf Course,Athletics & Sports,Department Store,Burger Joint,Snack Place,Farmers Market,Cocktail Bar,Coffee Shop,Cupcake Shop
4,12.948219,Coffee Shop,Trail,Café,Hotel,Cafeteria,Furniture / Home Store,Farmers Market,Cupcake Shop,Deli / Bodega
5,12.947128,Gym,Ice Cream Shop,Mexican Restaurant,Cafeteria,Coffee Shop,Pizza Place,Juice Bar,Restaurant,Indian Restaurant
6,12.946038,Food Court,Sandwich Place,Hotel,Indian Restaurant,Juice Bar,Coffee Shop,Mexican Restaurant,Cafeteria,Pizza Place
9,12.951994,Lounge,Café,Hotel,Furniture / Home Store,Irish Pub,Vietnamese Restaurant,Farmers Market,Coffee Shop,Cupcake Shop
10,12.950903,Café,Lounge,Indian Restaurant,Ice Cream Shop,Breakfast Spot,Irish Pub,Snack Place,Coffee Shop,Arcade
11,12.949812,Hotel,Lounge,Restaurant,Gym,Sandwich Place,Food Court,Ice Cream Shop,Irish Pub,Juice Bar
12,12.948722,Food Court,Cafeteria,Sandwich Place,Hotel,Ice Cream Shop,Indian Restaurant,Juice Bar,Coffee Shop,Mexican Restaurant
13,12.947631,Food Court,Restaurant,Sandwich Place,Hotel,Indian Restaurant,Juice Bar,Coffee Shop,Mexican Restaurant,Cafeteria
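
Reading the per-cluster tables row by row makes it hard to say what sets one cluster apart from another. A rough summary is to count which category most often appears as the top venue in each cluster. The sketch below uses a small made-up frame with the same column names as `bengaluru_merged`; the rows are illustrative and not taken from the data.

```python
import pandas as pd

# Hypothetical rows with the same column names as bengaluru_merged
toy = pd.DataFrame({
    'Cluster Labels': [1, 1, 1, 4, 4],
    '1st Most Common Venue': ['Café', 'Café', 'Hotel',
                              'Indian Restaurant', 'Indian Restaurant'],
})

# Rows are clusters, columns are categories, cells are how often
# that category is the top venue within the cluster
counts = pd.crosstab(toy['Cluster Labels'], toy['1st Most Common Venue'])
print(counts)

# The most frequent top venue per cluster
dominant = counts.idxmax(axis=1)
print(dominant)
```

A cluster whose dominant category covers most of its rows is easy to name; one with a flat distribution is probably a catch-all group.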


# Results and Discussion

The results are interesting, if only because Pune, long known as a laid-back college town compared with its never-sleeping cousin Mumbai, seems to have more spots to let your hair down at the end of a long day. To be fair, Bangalore too once had the reputation of being a city of retirees and pensioners. Yet, if a basic data-driven comparison of two very prominent areas of the two cities is any indicator, Indiranagar may not be as ‘happening’ as its reputation suggests. Koregaon Park, by contrast, seems to live up to its image as the spot to be in town if you’re a young urban professional earning enough to afford a decent lifestyle. That said, less commercialization also means more space for greenery and less pollution, though that is beyond the scope of this project. Allowing for inaccuracies in the data (which I have tried to clean up to the best of my abilities), Koregaon Park looks like a clear winner over Indiranagar if you are planning a move to a new city to get your social life into top gear.
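
One way to make a comparison like this checkable is to compute, for each area, the share of venues that fall into nightlife categories. The sketch below is only an illustration: the category set `NIGHTLIFE`, the helper `nightlife_share`, and both venue lists are made-up placeholders, not the venues collected in this project.

```python
from collections import Counter

# Hypothetical definition of what counts as a nightlife venue
NIGHTLIFE = {'Pub', 'Bar', 'Lounge', 'Cocktail Bar', 'Nightclub'}

def nightlife_share(categories):
    """Return the fraction of venues whose category is in NIGHTLIFE."""
    if not categories:
        return 0.0
    counts = Counter(categories)
    hits = sum(n for cat, n in counts.items() if cat in NIGHTLIFE)
    return hits / len(categories)

# Placeholder venue categories for two imaginary areas
area_a = ['Pub', 'Café', 'Lounge', 'Indian Restaurant']
area_b = ['Café', 'Hotel', 'Bakery', 'Pub']

print(nightlife_share(area_a))
print(nightlife_share(area_b))
```

A share is easier to compare across areas than a raw count, since the two areas need not return the same number of venues.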