# Analyzing Neighborhoods in Bengaluru, India, to Open a Shopping Mall

## Introduction

Bengaluru, the city that has been adjudged the most livable city in India, is the capital of the Indian State of Karnataka. It is known for its pleasant climate throughout the year. The city hosts numerous prestigious institutions and a large number of Tech Parks.  

As the third most populous city in the country, Bengaluru offers property developers considerable opportunity to build more malls. This project aims to produce recommendations for these stakeholders based on an analysis of the city's neighborhoods.

### Data Collection

The data required for this project has been collected from multiple sources. A summary of the data required for this project is given below.

#### Neighborhoods Data

The neighborhood data for Bengaluru was scraped from https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Bangalore. The data is read into a pandas dataframe using the read_html() function, because the Wikipedia page presents the neighborhoods in comprehensive, detailed tables that read_html() can parse directly.

#### Geographical Coordinates

The Wikipedia page lacks geographical coordinates. To fill this gap, the Python geocoder package is used to look up the latitude and longitude of each neighborhood.

#### Foursquare API Data

Next, the Foursquare API is used to get venue data for these neighborhoods. Foursquare has one of the largest databases of places, with 105+ million entries, and is used by over 125,000 developers. The Foursquare API returns venues in many categories. For this project, the shopping mall category is used to address the business problem described above.

### Importing required libraries

In [14]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim
import geocoder
from pandas import json_normalize
import folium
import json

### Scraping data from the Wikipedia page into a dataframe

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Bangalore"
html_data = requests.get(url).text

In [3]:
from io import StringIO
temp_data = pd.read_html(StringIO(html_data))  # wrap the HTML string in a file-like object

In [4]:
# combine the first 8 tables returned by read_html into one dataframe
blr_data = pd.concat(temp_data[:8], ignore_index=True)
blr_data

Unnamed: 0,Name,Image,Summary
0,Cantonment area,,The Cantonment area in Bangalore was used as a...
1,Domlur,,"Formerly part of the Cantonment area, Domlur h..."
2,Indiranagar,,Indiranagar is a sought-after residential and ...
3,Rajajinagar,,Established in 1949 on the birthday of C. Raja...
4,Malleswaram,,
...,...,...,...
60,Nandini Layout,,
61,Nayandahalli,,Nayandahalli is a transport junction in the we...
62,Rajajinagar,,
63,Rajarajeshwari Nagar,,Located in the south-western part of the city ...


In [5]:
blr_data.drop(['Image', 'Summary'], axis=1, inplace=True)
blr_data.rename(columns={'Name':"Neighborhood"}, inplace=True)
blr_data.at[0,'Neighborhood'] = "Bangalore Cantonment"
blr_data

Unnamed: 0,Neighborhood
0,Bangalore Cantonment
1,Domlur
2,Indiranagar
3,Rajajinagar
4,Malleswaram
...,...
60,Nandini Layout
61,Nayandahalli
62,Rajajinagar
63,Rajarajeshwari Nagar


### Getting the geographical coordinates

In [6]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initializing variable to None
    lat_lng_coords = None
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Bangalore, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
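
The loop above retries until ArcGIS returns a result, so a neighborhood name that never resolves would keep it running indefinitely. A minimal sketch of a bounded retry is given here. The names `get_latlng_bounded`, `max_tries`, `pause`, and the injected `lookup` callable are hypothetical and introduced only for illustration, and the stub lookup stands in for the real geocoding call so that the example runs without network access.

```python
import time

def get_latlng_bounded(neighborhood, lookup, max_tries=3, pause=0.0):
    """Return [lat, lng] for a neighborhood, or None after max_tries failures.

    `lookup` is any callable that takes a query string and returns
    [lat, lng] or None (hypothetical interface, not part of geocoder).
    """
    query = '{}, Bangalore, India'.format(neighborhood)
    for _ in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause)  # wait briefly before the next attempt
    return None


# Stub lookup (assumption): fails once, then succeeds, to mimic a flaky service.
calls = {'n': 0}

def flaky_lookup(query):
    calls['n'] += 1
    return None if calls['n'] < 2 else [12.94329, 77.65602]

found = get_latlng_bounded('Domlur', flaky_lookup)
missing = get_latlng_bounded('Nowhere', lambda q: None)
print(found, missing)
```

In the notebook, `lookup` could be `lambda q: geocoder.arcgis(q).latlng`, and any neighborhood that comes back as `None` can then be inspected or dropped before building the dataframe.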

In [7]:
coords = [ get_latlng(neighborhood) for neighborhood in blr_data["Neighborhood"].tolist() ]
coords

[[12.975660000000062, 77.60542000000004],
 [12.943290000000047, 77.65602000000007],
 [13.030060000000049, 77.49526000000003],
 [13.005440000000021, 77.55693000000008],
 [13.00632005596653, 77.56839983128529],
 [12.966180000000065, 77.58690000000007],
 [13.014830000000075, 77.57771000000008],
 [12.993550000000027, 77.57988000000006],
 [12.987180000000023, 77.60398000000004],
 [12.989080000000058, 77.62795000000006],
 [12.99105000000003, 77.58855000000005],
 [12.927340000000072, 77.67169000000007],
 [12.978999697242791, 77.65613184800841],
 [12.99201000000005, 77.71506000000005],
 [13.000390000000039, 77.68368000000004],
 [12.994090000000028, 77.66633000000007],
 [12.954660000000047, 77.70752000000005],
 [12.943490000000054, 77.74701000000005],
 [12.975230000000067, 77.75238000000007],
 [13.019526511351998, 77.65502797845224],
 [13.026410000000055, 77.62437000000006],
 [13.038700000000063, 77.66192000000007],
 [12.968020000000024, 77.52114000000006],
 [13.014260000000036, 77.636740000000

In [8]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
blr_data['Latitude'] = df_coords['Latitude']
blr_data['Longitude'] = df_coords['Longitude']
blr_data

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Bangalore Cantonment,12.97566,77.60542
1,Domlur,12.94329,77.65602
2,Indiranagar,13.03006,77.49526
3,Rajajinagar,13.00544,77.55693
4,Malleswaram,13.00632,77.56840
...,...,...,...
60,Nandini Layout,13.01481,77.53891
61,Nayandahalli,12.94205,77.52100
62,Rajajinagar,13.00544,77.55693
63,Rajarajeshwari Nagar,12.93178,77.52668


In [9]:
# save the DataFrame as CSV file
blr_data.to_csv("blr_data.csv", index=False)

### Create a map of Bengaluru with neighborhoods superimposed on top

In [10]:
# get the coordinates of Bangalore
address = 'Bangalore, India'

geolocator = Nominatim(user_agent="bengaluru_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bengaluru, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Bengaluru, India are 12.9791198, 77.5912997.


In [11]:
# create map of Bengaluru using latitude and longitude values
map_blr = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(blr_data['Latitude'], blr_data['Longitude'], blr_data['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_blr)  
    
map_blr

### Using the Foursquare API to explore the neighborhoods

In [16]:
# define Foursquare credentials and version
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


Getting the top 100 venues that are within a radius of 2000 meters.

In [17]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(blr_data['Latitude'], blr_data['Longitude'], blr_data['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
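
The request loop above indexes straight into the response and into each venue's category list, so a neighborhood with no results or a venue with no category would raise an error. The sketch below shows one possible way to build the same query string from a dictionary and to read the response defensively. The helper names `build_explore_url`, `extract_items`, and `first_category` are hypothetical, and the payload is a hand-written stand-in that mirrors only the fields the notebook uses, not a real API response.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL; urlencode takes care of escaping."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

def extract_items(payload):
    """Return the venue items, or an empty list if the response has no groups."""
    groups = payload.get("response", {}).get("groups", [])
    return groups[0].get("items", []) if groups else []

def first_category(venue):
    """Return the first category name, or None if the venue has no categories."""
    categories = venue.get("categories", [])
    return categories[0]["name"] if categories else None


url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
                        12.97566, 77.60542, 2000, 100)

# Hand-written stand-in payload (assumption), shaped like the fields used above.
payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Blossom Book House", "categories": [{"name": "Bookstore"}]}},
    {"venue": {"name": "Unlabelled venue", "categories": []}},
]}]}}

items = extract_items(payload)
names = [(item["venue"]["name"], first_category(item["venue"])) for item in items]
empty = extract_items({"response": {}})
print(url)
print(names, empty)
```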

In [18]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(3537, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Bangalore Cantonment,12.97566,77.60542,M.G Road Boulevard,12.975771,77.603979,Plaza
1,Bangalore Cantonment,12.97566,77.60542,Blossom Book House,12.975042,77.604813,Bookstore
2,Bangalore Cantonment,12.97566,77.60542,Hysteria,12.974843,77.605426,Music Store
3,Bangalore Cantonment,12.97566,77.60542,Coast 2 Coast,12.975305,77.605625,Indian Restaurant
4,Bangalore Cantonment,12.97566,77.60542,The 13th Floor,12.975364,77.604995,Lounge


Checking how many venues were returned for each neighborhood

In [19]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Anjanapura,5,5,5,5,5,5
Arekere,82,82,82,82,82,82
BTM Layout,91,91,91,91,91,91
Banashankari,100,100,100,100,100,100
Banaswadi,56,56,56,56,56,56
...,...,...,...,...,...,...
Vidyaranyapura,7,7,7,7,7,7
Vijayanagar,9,9,9,9,9,9
Whitefield,44,44,44,44,44,44
Yelahanka,23,23,23,23,23,23


Finding out how many unique categories can be curated from all the returned venues

In [21]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 212 unique categories.


In [22]:
# print out the list of categories
venues_df['VenueCategory'].unique()[:50]

array(['Plaza', 'Bookstore', 'Music Store', 'Indian Restaurant', 'Lounge',
       'Café', 'Burger Joint', 'American Restaurant', 'Ice Cream Shop',
       'Toy / Game Store', 'Afghan Restaurant', 'Deli / Bodega',
       'Italian Restaurant', 'Gym', 'Brewery', 'Park', 'Sushi Restaurant',
       'Cricket Ground', 'Breakfast Spot', 'Pub', 'Hotel', 'Cocktail Bar',
       'Cupcake Shop', 'Gym / Fitness Center', 'Chinese Restaurant',
       'Japanese Restaurant', 'Soccer Stadium', 'French Restaurant',
       'Shopping Mall', 'Coffee Shop', 'Andhra Restaurant',
       'Eastern European Restaurant', 'Tea Room', 'Asian Restaurant',
       'Department Store', 'Dessert Shop', 'Road', 'Hotel Bar',
       'Thai Restaurant', 'Arcade', 'Korean Restaurant', 'BBQ Joint',
       'Steakhouse', 'Bakery', 'Bed & Breakfast',
       'Mediterranean Restaurant', "Women's Store", 'Concert Hall',
       'Hockey Arena', 'Wine Bar'], dtype=object)

In [23]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

### Analyzing Each Neighborhood

In [24]:
# one hot encoding
blr_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
blr_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [blr_onehot.columns[-1]] + list(blr_onehot.columns[:-1])
blr_onehot = blr_onehot[fixed_columns]

print(blr_onehot.shape)
blr_onehot.head()

(3537, 213)


Unnamed: 0,Neighborhoods,Afghan Restaurant,Airport,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Toy / Game Store,Track Stadium,Trail,Train Station,Travel & Transport,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Women's Store
0,Bangalore Cantonment,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Bangalore Cantonment,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Bangalore Cantonment,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Bangalore Cantonment,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bangalore Cantonment,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Grouping rows by neighborhood and taking the mean of the frequency of occurrence of each category

In [25]:
blr_grouped = blr_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(blr_grouped.shape)
blr_grouped

(64, 213)


Unnamed: 0,Neighborhoods,Afghan Restaurant,Airport,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Toy / Game Store,Track Stadium,Trail,Train Station,Travel & Transport,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Women's Store
0,Anjanapura,0.0,0.0,0.000000,0.000000,0.00,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.00
1,Arekere,0.0,0.0,0.012195,0.000000,0.00,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.024390,0.0,0.0,0.00
2,BTM Layout,0.0,0.0,0.000000,0.010989,0.00,0.0,0.0,0.000000,0.010989,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.032967,0.0,0.0,0.00
3,Banashankari,0.0,0.0,0.000000,0.000000,0.01,0.0,0.0,0.000000,0.020000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.01
4,Banaswadi,0.0,0.0,0.000000,0.017857,0.00,0.0,0.0,0.017857,0.017857,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.035714,0.0,0.0,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59,Vidyaranyapura,0.0,0.0,0.000000,0.000000,0.00,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.00
60,Vijayanagar,0.0,0.0,0.000000,0.000000,0.00,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.00
61,Whitefield,0.0,0.0,0.022727,0.000000,0.00,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.022727,0.0,0.0,0.00
62,Yelahanka,0.0,0.0,0.043478,0.000000,0.00,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.00
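
To make the meaning of these values concrete: the mean of a one-hot column within a neighborhood is simply the share of that neighborhood's returned venues that fall in the category. The toy example below is a sketch on made-up data (the neighborhoods `A` and `B` and their venues are assumptions, not results from the notebook) and follows the same get_dummies and groupby steps.

```python
import pandas as pd

# Toy venue list (assumption): 4 venues in A, 2 venues in B.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Café", "Shopping Mall", "Café", "Pub", "Pub", "Park"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(toy["VenueCategory"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The mean of each one-hot column is the category's share of venues.
freq = onehot.groupby("Neighborhood").mean()

mall_share_a = freq.loc["A", "Shopping Mall"]  # 1 mall out of 4 venues
mall_share_b = freq.loc["B", "Shopping Mall"]  # no malls among 2 venues
print(freq)
```

Because the value is a share rather than a count, a neighborhood with few returned venues can show a high mall frequency from a single mall, which is worth keeping in mind when reading the clusters.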


In [26]:
len(blr_grouped[blr_grouped["Shopping Mall"] > 0])

26

Creating a new DataFrame for shopping mall data only

In [27]:
blr_mall = blr_grouped[["Neighborhoods","Shopping Mall"]]
blr_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Anjanapura,0.0
1,Arekere,0.02439
2,BTM Layout,0.010989
3,Banashankari,0.02
4,Banaswadi,0.0


### Clustering Neighborhoods

In [28]:
# set number of clusters
kclusters = 5

blr_clustering = blr_mall.drop(["Neighborhoods"], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(blr_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 2, 3, 2, 0, 3, 0, 0, 0, 3])

In [29]:
# create a new dataframe that includes the cluster label and the shopping mall frequency for each neighborhood
blr_merged = blr_mall.copy()

# add clustering labels
blr_merged["Cluster Labels"] = kmeans.labels_

In [30]:
blr_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
blr_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Anjanapura,0.0,0
1,Arekere,0.02439,2
2,BTM Layout,0.010989,3
3,Banashankari,0.02,2
4,Banaswadi,0.0,0


In [31]:
# merging blr_merged with blr_data to add latitude/longitude for each neighborhood
blr_merged = blr_merged.join(blr_data.set_index("Neighborhood"), on="Neighborhood")

print(blr_merged.shape)
blr_merged.head()

(65, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Anjanapura,0.0,0,12.85811,77.5591
1,Arekere,0.02439,2,12.88567,77.59673
2,BTM Layout,0.010989,3,12.91495,77.60999
3,Banashankari,0.02,2,12.92231,77.56988
4,Banaswadi,0.0,0,13.019527,77.655028
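
The row count grows from 64 to 65 in this step. That is consistent with Rajajinagar appearing twice in blr_data (rows 3 and 62): a join against a lookup table with a repeated key produces one output row per match. The sketch below reproduces the effect on two neighborhoods taken from the tables above and shows one possible remedy, dropping duplicate keys before joining. The variable names `lookup`, `malls`, `with_dupes`, and `deduped` are hypothetical.

```python
import pandas as pd

# Coordinate lookup with a repeated key, as in blr_data.
lookup = pd.DataFrame({
    "Neighborhood": ["Anjanapura", "Rajajinagar", "Rajajinagar"],
    "Latitude": [12.85811, 13.00544, 13.00544],
    "Longitude": [77.55910, 77.55693, 77.55693],
})

# One row per neighborhood, as in the grouped mall frequencies.
malls = pd.DataFrame({
    "Neighborhood": ["Anjanapura", "Rajajinagar"],
    "Shopping Mall": [0.0, 0.011494],
})

# Joining against the repeated key duplicates the Rajajinagar row.
with_dupes = malls.join(lookup.set_index("Neighborhood"), on="Neighborhood")

# Dropping duplicate keys first keeps one row per neighborhood.
deduped = malls.join(
    lookup.drop_duplicates(subset="Neighborhood").set_index("Neighborhood"),
    on="Neighborhood",
)
print(len(with_dupes), len(deduped))
```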


In [32]:
# sort the results by Cluster Labels
print(blr_merged.shape)
blr_merged.sort_values(["Cluster Labels"], inplace=True)
blr_merged

(65, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Anjanapura,0.000000,0,12.85811,77.55910
27,Kalyan Nagar,0.000000,0,12.96802,77.52114
28,Kamakshipalya,0.000000,0,12.98699,77.52484
30,Kengeri,0.000000,0,12.90868,77.48718
31,Koramangala,0.000000,0,12.92004,77.62546
...,...,...,...,...,...
5,Bangalore Cantonment,0.010000,3,12.97566,77.60542
43,Nandini Layout,0.012500,3,13.01481,77.53891
46,Peenya,0.043478,4,13.03188,77.52654
57,Varthur,0.033333,4,12.94349,77.74701


Visualizing resulting clusters

In [33]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(blr_merged['Latitude'], blr_merged['Longitude'], blr_merged['Neighborhood'], blr_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examining Clusters

Cluster 1 (label 0)

In [34]:
blr_merged.loc[blr_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Anjanapura,0.0,0,12.85811,77.5591
27,Kalyan Nagar,0.0,0,12.96802,77.52114
28,Kamakshipalya,0.0,0,12.98699,77.52484
30,Kengeri,0.0,0,12.90868,77.48718
31,Koramangala,0.0,0,12.92004,77.62546
62,Yelahanka,0.0,0,13.09931,77.59259
34,Kumaraswamy Layout,0.0,0,12.89819,77.55927
36,Madiwala,0.0,0,12.92052,77.6209
40,Marathahalli,0.0,0,12.95466,77.70752
26,Jayanagar,0.0,0,12.92868,77.5827


Cluster 2 (label 1)

In [35]:
blr_merged.loc[blr_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
25,Jalahalli,0.142857,1,13.0545,77.52658


Cluster 3 (label 2)

In [36]:
blr_merged.loc[blr_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
33,Krishnarajapuram,0.028571,2,13.00039,77.68368
20,Hoodi,0.021739,2,12.99201,77.71506
55,Ulsoor,0.02,2,12.98908,77.62795
3,Banashankari,0.02,2,12.92231,77.56988
45,Padmanabhanagar,0.029412,2,12.91818,77.55925
1,Arekere,0.02439,2,12.88567,77.59673
39,Malleswaram,0.02,2,13.00632,77.5684
61,Whitefield,0.022727,2,12.97523,77.75238
35,Lingarajapuram,0.029412,2,13.00548,77.62597
24,J. P. Nagar,0.02,2,12.90831,77.59024


Cluster 4 (label 3)

In [37]:
blr_merged.loc[blr_merged['Cluster Labels'] == 3]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
2,BTM Layout,0.010989,3,12.91495,77.60999
54,Shivajinagar,0.01,3,12.98718,77.60398
53,Seshadripuram,0.01,3,12.99355,77.57988
58,Vasanth Nagar,0.01,3,12.99105,77.58855
22,Hulimavu,0.015873,3,12.88063,77.60147
49,Rajajinagar,0.011494,3,13.00544,77.55693
49,Rajajinagar,0.011494,3,13.00544,77.55693
47,Pete area,0.01,3,12.96618,77.5869
9,Bellandur,0.011236,3,12.92734,77.67169
38,Mahalakshmi Layout,0.016667,3,13.01635,77.54481


Cluster 5 (label 4)

In [38]:
blr_merged.loc[blr_merged['Cluster Labels'] == 4]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
46,Peenya,0.043478,4,13.03188,77.52654
57,Varthur,0.033333,4,12.94349,77.74701
37,Mahadevapura,0.043478,4,12.99409,77.66633
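
K-means labels are arbitrary integers, so the label itself says nothing about how many malls a cluster has. One way to read the clusters is to rank them by their mean mall frequency. The sketch below does this in plain Python for a handful of rows copied from the tables above. It covers only a subset of the neighborhoods, so the means are illustrative rather than the full cluster averages, and the names `rows`, `means`, and `ranked` are hypothetical.

```python
from collections import defaultdict

# (neighborhood, mall frequency, cluster label): a subset of the rows shown above.
rows = [
    ("Anjanapura", 0.0, 0),
    ("Kengeri", 0.0, 0),
    ("Jalahalli", 0.142857, 1),
    ("Arekere", 0.02439, 2),
    ("Banashankari", 0.02, 2),
    ("BTM Layout", 0.010989, 3),
    ("Shivajinagar", 0.01, 3),
    ("Peenya", 0.043478, 4),
    ("Varthur", 0.033333, 4),
]

# Collect the frequencies per cluster label.
by_label = defaultdict(list)
for _name, freq, label in rows:
    by_label[label].append(freq)

# Mean mall frequency per label, then labels sorted from highest to lowest mean.
means = {label: sum(values) / len(values) for label, values in by_label.items()}
ranked = sorted(means, key=means.get, reverse=True)

for label in ranked:
    # The headings above number clusters from 1, so cluster number = label + 1.
    print("Cluster {} (label {}): mean mall frequency {:.4f}".format(
        label + 1, label, means[label]))
```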


### Observations

Most of the shopping malls are concentrated in the northern and eastern areas of Bengaluru, with the highest concentration in cluster 2 and a moderate concentration in cluster 5 and cluster 3. Cluster 1 has few to no malls in its neighborhoods, which makes it a high-potential area for new shopping malls because there is hardly any competition from existing ones. Meanwhile, shopping malls in clusters 2 and 5 face high competition, so it is advisable to avoid investing in or building new shopping malls in these neighborhoods.

This project therefore recommends that property developers capitalize on these findings and open new shopping malls in the neighborhoods of cluster 1. Property developers with unique selling propositions can also open new shopping malls in the neighborhoods of clusters 3 and 4, where competition is moderate. Lastly, property developers are advised to avoid the neighborhoods in cluster 2 and cluster 5, which already have a high concentration of shopping malls and face intense competition.