# Capstone Project - Opening a New Shopping Mall in Hyderabad, India
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Opening a Shopping Mall in Hyderabad, India](#introduction)
* [Build a dataframe of neighborhoods in Hyderabad, India by web scraping the data from a Wikipedia page](#data)
* [Obtain the venue data for the neighborhoods from the Foursquare API](#methodology)
* [Explore and cluster the neighborhoods](#analysis)
* [Select the best cluster to open a new shopping mall](#results)
* [Conclusion](#conclusion)

## Introduction: Opening a Shopping Mall in Hyderabad, India <a name="introduction"></a>

For many shoppers, visiting a shopping mall is a great way to relax and enjoy themselves on weekends and holidays. They can buy groceries, dine at restaurants, browse fashion outlets, watch movies and more. Shopping malls are a one-stop destination for all types of shoppers. For retailers, the central location and large crowds at shopping malls provide a great distribution channel to market their products and services. Property developers are taking advantage of this trend by building more shopping malls to meet the demand. As a result, there are many shopping malls in the city of Hyderabad and many more are being built. Opening shopping malls allows property developers to earn consistent rental income. Of course, as with any business decision, opening a new shopping mall requires serious consideration and is far more complicated than it seems. In particular, the location of the shopping mall is one of the most important decisions and will largely determine whether the mall succeeds or fails.

The objective of this capstone project is to analyze the city of Hyderabad, India and select the best locations to open a new shopping mall. Using data science methodology and machine learning techniques such as clustering, this project aims to answer the business question: if a property developer is looking to open a new shopping mall in Hyderabad, India, where would you recommend that they open it?

## Data <a name="data"></a>

To solve the problem, we will need the following data:

* List of neighborhoods in Hyderabad. This defines the scope of this project, which is confined to the city of Hyderabad, India.

* Latitude and longitude coordinates of those neighborhoods. These are required to plot the map and to get the venue data.

* Venue data, particularly data related to shopping malls. We will use this data to perform clustering on the neighborhoods.


## Sources of data and methods to extract them

[This Wikipedia page](https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Hyderabad,_India) contains a list of neighborhoods in Hyderabad. We will use web scraping to extract the data from the page with the help of the Python requests and beautifulsoup packages. We will then get the latitude and longitude of each neighborhood using the Python Geocoder package.
After that, we will use the Foursquare API to get the venue data for those neighborhoods. Foursquare has one of the largest databases of places, with over 105 million entries, and is used by more than 125,000 developers.
The Foursquare API returns venues in many categories; we are particularly interested in the Shopping Mall category, which helps us answer the business question. This project makes use of many data science skills: web scraping (Wikipedia), working with an API (Foursquare), data cleaning, data wrangling, machine learning (K-means clustering) and map visualization (Folium). The Methodology section below discusses the steps taken in this project, the data analysis performed and the machine learning technique used.


In [1]:
#pip install geocoder

In [2]:
#INSTALL IF REQUIREMENT NOT SATISFIED
#pip install -U beautifulsoup4

# IMPORTING LIBRARIES

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
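# use the fully qualified option names; the short forms are ambiguous in recent pandas
pd.set_option("display.max_columns",None)
pd.set_option("display.max_rows",None)
from geopy.geocoders import Nominatim
import geocoder
from bs4 import BeautifulSoup
# json_normalize is exposed at the top level of pandas since version 1.0
from pandas import json_normalize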
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
print("LIBRARIES IMPORTED SUCCESSFULLY")

LIBRARIES IMPORTED SUCCESSFULLY


# SCRAPING DATA FROM WIKIPEDIA PAGE TO A DATAFRAME

In [4]:
#SENDING GET REQUEST
data=requests.get("https://en.wikipedia.org/w/index.php?title=Category:Neighbourhoods_in_Hyderabad,_India&pageuntil=Shivam+Road#mw-pages").text

In [5]:
# parse data from the html into a beautifulsoup object
soup=BeautifulSoup(data,"html.parser")

In [6]:
# create a list to store the neighborhood names
neighborhood_list=[]

In [7]:
# collect the text of every list item in the first "mw-category" block
for row in soup.find_all("div",class_="mw-category")[0].find_all("li"):
    neighborhood_list.append(row.text)
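
The extraction step above can be tried offline on a small HTML fragment. The sketch below uses the standard library's `html.parser` instead of BeautifulSoup, purely so that it runs without extra packages. The HTML fragment is a hand-written stand-in for the category page (an assumption, not the real page source), and the class name `CategoryListParser` is hypothetical.

```python
from html.parser import HTMLParser

# Hand-written stand-in for the Wikipedia category markup (assumption).
SAMPLE_HTML = """
<div class="mw-category">
  <ul><li><a href="/wiki/Abids">Abids</a></li>
      <li><a href="/wiki/Adikmet">Adikmet</a></li></ul>
</div>
<div class="footer"><ul><li>Not a neighborhood</li></ul></div>
"""


class CategoryListParser(HTMLParser):
    """Collect the text of every <li> inside the first div.mw-category."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0      # > 0 while inside the target div
        self.in_li = False
        self.done = False       # set once the target div has closed
        self.items = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1
            elif not self.done and "mw-category" in classes:
                self.div_depth = 1
        elif tag == "li" and self.div_depth > 0:
            self.in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
            if self.div_depth == 0:
                self.done = True
        elif tag == "li":
            self.in_li = False

    def handle_data(self, data):
        if self.in_li:
            self.items[-1] += data


parser = CategoryListParser()
parser.feed(SAMPLE_HTML)
neighborhood_names = [name.strip() for name in parser.items]
print(neighborhood_names)
```

Only list items inside the first `mw-category` block are kept, which mirrors the `[0]` index used in the cell above.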

In [8]:
hyd_data=pd.DataFrame({"Neighborhood":neighborhood_list})
hyd_data

Unnamed: 0,Neighborhood
0,A. S. Rao Nagar
1,A.C. Guards
2,Abhyudaya Nagar
3,Abids
4,Adibatla
5,Adikmet
6,Afzal Gunj
7,Aghapura
8,"Aliabad, Hyderabad"
9,Alijah Kotla


In [9]:
hyd_data.shape

(200, 1)

# GETTING GEOGRAPHICAL CO-ORDINATES

In [13]:
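def get_latlng(neighborhood, max_attempts=5):
    # geocode "<neighborhood>, Hyderabad, India" with the ArcGIS provider
    lat_lng_coords = None
    attempts = 0
    # retry a bounded number of times so a failed lookup cannot loop forever
    while lat_lng_coords is None and attempts < max_attempts:
        g = geocoder.arcgis('{}, Hyderabad, India'.format(neighborhood))
        lat_lng_coords = g.latlng
        attempts += 1
    # fall back to missing values if no coordinates were returned
    return lat_lng_coords if lat_lng_coords is not None else [None, None]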

In [14]:
coords = [ get_latlng(neighborhood) for neighborhood in hyd_data["Neighborhood"].tolist()]

In [15]:
coords

[[17.411200000000065, 78.50824000000006],
 [17.393000949133675, 78.45689980427697],
 [17.337650000000053, 78.56414000000007],
 [17.389800000000037, 78.47658000000007],
 [17.235790000000065, 78.54132000000004],
 [17.410610000000077, 78.51513000000006],
 [17.37751000000003, 78.48005000000006],
 [17.38738496982723, 78.46699458034638],
 [17.34259000000003, 78.47626000000008],
 [17.36068000000006, 78.47998000000007],
 [17.503370000000075, 78.41602000000006],
 [17.535430000000076, 78.54427000000004],
 [17.385820000000024, 78.51836000000003],
 [17.53332000000006, 78.32529000000005],
 [17.435350000000028, 78.44861000000003],
 [17.45787000000007, 78.53882000000004],
 [17.40784000000002, 78.49150000000003],
 [17.385140000000035, 78.44738000000007],
 [17.369170000000054, 78.43683000000004],
 [17.40710000000007, 78.50233000000003],
 [17.372720000000072, 78.49047000000007],
 [17.38897000000003, 78.48681000000005],
 [17.39931000000007, 78.49964000000006],
 [17.339920000000063, 78.54553000000004],
 [

In [16]:
df_coords=pd.DataFrame(coords,columns=['Latitude','Longitude'])
df_coords

Unnamed: 0,Latitude,Longitude
0,17.4112,78.50824
1,17.393001,78.4569
2,17.33765,78.56414
3,17.3898,78.47658
4,17.23579,78.54132
5,17.41061,78.51513
6,17.37751,78.48005
7,17.387385,78.466995
8,17.34259,78.47626
9,17.36068,78.47998


In [17]:
hyd_data['Latitude']=df_coords['Latitude']
hyd_data['Longitude']=df_coords['Longitude']

In [18]:
hyd_data

Unnamed: 0,Neighborhood,Latitude,Longitude
0,A. S. Rao Nagar,17.4112,78.50824
1,A.C. Guards,17.393001,78.4569
2,Abhyudaya Nagar,17.33765,78.56414
3,Abids,17.3898,78.47658
4,Adibatla,17.23579,78.54132
5,Adikmet,17.41061,78.51513
6,Afzal Gunj,17.37751,78.48005
7,Aghapura,17.387385,78.466995
8,"Aliabad, Hyderabad",17.34259,78.47626
9,Alijah Kotla,17.36068,78.47998


In [19]:
hyd_data.to_csv("Hyd_data",index=False)

# CREATING A MAP FOR VISUALIZING NEIGHBORHOODS

In [57]:
address=' Hyderabad, India'
geolocator=Nominatim(user_agent='Hyd_explorer')
location=geolocator.geocode(address)
Latitude=location.latitude
Longitude=location.longitude
print(Latitude)
print(Longitude)

17.3616079
78.4746286


In [30]:
map_hyd=folium.Map(location=[Latitude,Longitude],zoom_start=11)

for lat,lng,neighborhood in zip(hyd_data['Latitude'],hyd_data['Longitude'],hyd_data['Neighborhood']):
    label='{}'.format(neighborhood)
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker([lat,lng],radius=5,popup=label,color='blue',fill=True,fill_color='#3186cc',fill_opacity=0.7).add_to(map_hyd)
map_hyd

In [31]:
map_hyd.save('map_hyd.html')
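
The map gives a visual check of the geocoded points. A numeric check can complement it. The sketch below flags rows whose coordinates are missing or fall outside a rough bounding box. The box limits are assumed values chosen to loosely enclose the coordinates printed above, not an official city boundary, and `outside_bbox` is a hypothetical helper name.

```python
# Rough bounding box around Hyderabad (assumed values, not an official boundary).
LAT_RANGE = (17.0, 17.8)
LNG_RANGE = (78.1, 78.9)


def outside_bbox(rows, lat_range=LAT_RANGE, lng_range=LNG_RANGE):
    """Return names whose coordinates are missing or fall outside the box."""
    flagged = []
    for name, lat, lng in rows:
        if lat is None or lng is None:
            flagged.append(name)
        elif not (lat_range[0] <= lat <= lat_range[1]
                  and lng_range[0] <= lng <= lng_range[1]):
            flagged.append(name)
    return flagged


# The first two rows come from the table above; the last row is a made-up outlier.
sample_rows = [
    ("A. S. Rao Nagar", 17.4112, 78.50824),
    ("Adibatla", 17.23579, 78.54132),
    ("Hypothetical outlier", 28.6139, 77.2090),
]
print(outside_bbox(sample_rows))
```

With the notebook's dataframe, the rows could be supplied as `hyd_data[['Neighborhood','Latitude','Longitude']].itertuples(index=False)`.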

# METHODOLOGY <a name="methodology"></a>

First, we need the list of neighborhoods in the city of Hyderabad, India. Fortunately, the list is available on [Wikipedia](https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Hyderabad,_India).

We scrape the page using the Python requests and beautifulsoup packages to extract the neighborhood names. However, this is just a list of names. To use the Foursquare API we also need geographical coordinates, so we use the Geocoder package to convert each address into latitude and longitude. After gathering the data, we load it into a pandas DataFrame and visualize the neighborhoods on a map using the Folium package. This serves as a sanity check that the coordinates returned by Geocoder are plotted within the city of Hyderabad, India.
Next, we use the Foursquare API to get up to 100 venues within a radius of 2000 meters of each neighborhood. A Foursquare Developer Account is required to obtain the Foursquare ID and secret key. We then call the API in a Python loop, passing in the coordinates of each neighborhood. Foursquare returns the venue data in JSON format, from which we extract the venue name, category, latitude and longitude. With this data, we can check how many venues were returned for each neighborhood and how many unique categories they span. We then analyze each neighborhood by grouping the rows by neighborhood and taking the mean of the frequency of occurrence of each venue category, which also prepares the data for clustering. Since we are analyzing shopping malls, we keep only the “Shopping Mall” category for each neighborhood.
Lastly, we cluster the data using k-means. The k-means algorithm picks k centroids, assigns every data point to the nearest centroid, and moves each centroid to the mean of its assigned points, repeating until the assignments stop changing; the effect is to keep each cluster as compact as possible. It is one of the simplest and most popular unsupervised machine learning algorithms and is well suited to this project. We cluster the neighborhoods into 3 clusters based on their frequency of occurrence of “Shopping Mall”. The results show which neighborhoods have a higher concentration of shopping malls and which have fewer. Based on the occurrence of shopping malls in different neighborhoods, we can answer the question of which neighborhoods are most suitable for a new shopping mall.
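
To make the k-means step concrete, here is a minimal sketch of the assign-and-update loop (Lloyd's algorithm) on one-dimensional data in plain Python. It is not scikit-learn's implementation; the starting centroids are picked by hand (an assumption), and the sample values are a handful of “Shopping Mall” frequencies of the kind this notebook produces.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm for 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else old)
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids


freqs = [0.0, 0.0, 0.012346, 0.035714, 0.058824, 0.1, 0.166667, 0.2]
labels, centroids = kmeans_1d(freqs, centroids=[0.0, 0.05, 0.2])
print(labels)
print(centroids)
```

Because the feature is a single number per neighborhood, the clusters are simply three contiguous ranges of mall frequency.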


### FOURSQUARE API

In [33]:
# define Foursquare Credentials and Version
CLIENT_ID = '5MBWV4OQ0FJFENHYJFXVMBZTKELQZVOSY1BUWWVJWO1XXXXX' # your Foursquare ID
CLIENT_SECRET = '3TQXJM2Z54LUCNEDKGPSQUI2KPEWSSAM10FELHMXIFYXXXXX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 5MBWV4OQ0FJFENHYJFXVMBZTKELQZVOSY1BUWWVJWO1XXXXX
CLIENT_SECRET:3TQXJM2Z54LUCNEDKGPSQUI2KPEWSSAM10FELHMXIFYXXXXX
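
Hard-coding and printing credentials makes them easy to leak when a notebook is shared. One common alternative, sketched below, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are illustrative assumptions, not names required by Foursquare.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables.

    The variable names below are illustrative choices, not names
    required by Foursquare.
    """
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]


def mask(value, visible=4):
    """Show only the first few characters when printing a secret."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


# Demonstration with a fake environment so the sketch runs anywhere.
fake_env = {"FOURSQUARE_CLIENT_ID": "abcd1234", "FOURSQUARE_CLIENT_SECRET": "wxyz5678"}
client_id, client_secret = load_foursquare_credentials(fake_env)
print("CLIENT_ID:", mask(client_id))
```

In a real run, calling `load_foursquare_credentials()` with no argument reads from the process environment.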


In [43]:
radius=2000
LIMIT=100

venues=[]


for lat,lng,neighborhood in zip(hyd_data['Latitude'],hyd_data['Longitude'],hyd_data['Neighborhood']):
    url= "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        lng,
        radius, 
        LIMIT)
    
    results=requests.get(url).json()["response"]["groups"][0]['items']
    
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            lng, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
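
The loop above indexes straight into the response and would raise if a venue had no category or a request returned an unexpected body. The sketch below shows one way to build the same URL with encoded parameters and to parse a response defensively. It makes no network call; `build_explore_url`, `extract_venues` and the sample payload are hypothetical, written only to mirror the structure the loop relies on.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


def extract_venues(payload):
    """Pull (name, lat, lng, category) tuples out of a decoded response.

    Venues with no category or no coordinates are skipped rather than raising.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        if not categories or "lat" not in location or "lng" not in location:
            continue
        rows.append((venue.get("name"), location["lat"], location["lng"],
                     categories[0]["name"]))
    return rows


# Hand-written payload mimicking the structure the loop above indexes into.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Bawarchi",
               "location": {"lat": 17.406369, "lng": 78.497662},
               "categories": [{"name": "Indian Restaurant"}]}},
    {"venue": {"name": "Uncategorised place",
               "location": {"lat": 17.4, "lng": 78.5},
               "categories": []}},
]}]}}
url = build_explore_url("ID", "SECRET", "20180605", 17.4112, 78.50824, 2000, 100)
print(url)
print(extract_venues(sample_payload))
```

Skipping uncategorised venues is a design choice for this sketch; an alternative would be to keep them under a placeholder category.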

In [60]:
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(6445, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,A. S. Rao Nagar,17.4112,78.50824,Bawarchi,17.406369,78.497662,Indian Restaurant
1,A. S. Rao Nagar,17.4112,78.50824,Sudharshan Theatre 35mm,17.40653,78.49515,Movie Theater
2,A. S. Rao Nagar,17.4112,78.50824,Subway,17.404173,78.51495,Sandwich Place
3,A. S. Rao Nagar,17.4112,78.50824,Devi 70 MM,17.406329,78.495409,Movie Theater
4,A. S. Rao Nagar,17.4112,78.50824,Baskin-Robbins,17.404311,78.510034,Ice Cream Shop


In [24]:
venues_df['VenueCategory'].value_counts()

Indian Restaurant                           882
Café                                        315
Fast Food Restaurant                        299
Hotel                                       267
Coffee Shop                                 258
Bakery                                      239
Pizza Place                                 211
Ice Cream Shop                              195
Restaurant                                  161
Chinese Restaurant                          156
Multiplex                                   139
Department Store                            125
Dessert Shop                                115
Movie Theater                               111
Snack Place                                  93
South Indian Restaurant                      92
Juice Bar                                    88
Sandwich Place                               87
Vegetarian / Vegan Restaurant                87
Asian Restaurant                             80
Breakfast Spot                          

In [61]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A. S. Rao Nagar,28,28,28,28,28,28
A.C. Guards,60,60,60,60,60,60
Abhyudaya Nagar,11,11,11,11,11,11
Abids,81,81,81,81,81,81
Adikmet,25,25,25,25,25,25
Afzal Gunj,45,45,45,45,45,45
Aghapura,56,56,56,56,56,56
"Aliabad, Hyderabad",10,10,10,10,10,10
Alijah Kotla,17,17,17,17,17,17
Allwyn Colony,17,17,17,17,17,17


In [19]:
len(venues_df['VenueCategory'].unique())

174

In [20]:

# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Indian Restaurant', 'Movie Theater', 'Sandwich Place',
       'Ice Cream Shop', 'Coffee Shop', 'Convenience Store', 'Café',
       'Asian Restaurant', 'Gym', 'Electronics Store', 'Food Court',
       'Shopping Mall', 'Light Rail Station', 'Bookstore', 'Dessert Shop',
       'Bakery', 'Hyderabadi Restaurant', 'Lounge', 'Juice Bar',
       'South Indian Restaurant', 'Bistro', 'Park', 'Science Museum',
       'Snack Place', 'Middle Eastern Restaurant', 'Stadium',
       'Vegetarian / Vegan Restaurant', 'Performing Arts Venue', 'Hotel',
       'Hotel Bar', 'Pizza Place', 'Fast Food Restaurant',
       'Mobile Phone Shop', 'Fried Chicken Joint', 'Department Store',
       'Hookah Bar', 'Clothing Store', 'Restaurant', 'Train Station',
       'Chinese Restaurant', 'Paper / Office Supplies Store',
       'Bus Station', 'Fruit & Vegetable Store', 'Food Truck',
       'Shoe Store', 'Diner', 'Neighborhood', 'Burger Joint',
       'Chaat Place', 'Smoke Shop', 'Breakfast Spot', 'Bar', 'Food

In [21]:
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

In [65]:
venues_df.to_csv("venues_hyd_df",index=False)

# Analyzing Each Neighborhood <a name="analysis"></a>

In [36]:
#ONEHOTENCODING
hyd_onehot=pd.get_dummies(venues_df[['VenueCategory']],prefix="",prefix_sep="")


In [37]:
#ASSIGNING NEIGHBORHOOD COLUMN TO hyd_onehot
hyd_onehot['Neighborhoods']=venues_df['Neighborhood']
#MOVING NEIGHBORHOOD COLUMN TO FIRST PLACE
fixed_columns = [hyd_onehot.columns[-1]] + list(hyd_onehot.columns[:-1])
hyd_onehot = hyd_onehot[fixed_columns]
#PRINTING SIZE AND DATA
print(hyd_onehot.shape)
hyd_onehot.head()

(6445, 175)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Afghan Restaurant,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Bengali Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Candy Store,Castle,Chaat Place,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Hunan Restaurant,Hyderabadi Restaurant,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Irish Pub,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Liquor Store,Lounge,Market,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Motorcycle Shop,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,New American Restaurant,Night Market,Nightclub,North Indian Restaurant,Office,Outdoors & Recreation,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Rajasthani Restaurant,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,River,Sandwich Place,Scenic Lookout,Science Museum,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Social Club,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Tea Room,Tech Startup,Temple,Thai Restaurant,Train Station,Vegetarian / Vegan Restaurant,Wings Joint,Women's Store,Zoo
0,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [38]:
#group the neighborhood column
hyd_grouped=hyd_onehot.groupby(['Neighborhoods']).mean().reset_index()
print(hyd_grouped.shape)
hyd_grouped.head()

(198, 175)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Afghan Restaurant,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Bengali Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Candy Store,Castle,Chaat Place,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Hunan Restaurant,Hyderabadi Restaurant,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Irish Pub,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Liquor Store,Lounge,Market,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Motorcycle Shop,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,New American Restaurant,Night Market,Nightclub,North Indian Restaurant,Office,Outdoors & Recreation,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Rajasthani Restaurant,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,River,Sandwich Place,Scenic Lookout,Science Museum,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Social Club,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Tea Room,Tech Startup,Temple,Thai Restaurant,Train Station,Vegetarian / Vegan Restaurant,Wings Joint,Women's Store,Zoo
0,A. S. Rao Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.107143,0.0,0.0,0.107143,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.107143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,A.C. Guards,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.016667,0.016667,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.066667,0.016667,0.0,0.0,0.033333,0.033333,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.016667,0.0,0.016667,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.016667,0.0,0.0,0.0
2,Abhyudaya Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Abids,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.024691,0.0,0.0,0.012346,0.024691,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.012346,0.0,0.049383,0.012346,0.0,0.024691,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024691,0.037037,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.061728,0.0,0.0,0.0,0.012346,0.0,0.012346,0.0,0.012346,0.0,0.012346,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.049383,0.012346,0.0,0.0,0.0,0.061728,0.123457,0.0,0.012346,0.0,0.0,0.0,0.0,0.024691,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.024691,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.012346,0.012346,0.0,0.012346,0.012346,0.024691,0.0,0.0,0.024691,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Adikmet,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.28,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
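
One-hot encoding followed by a per-neighborhood mean gives, for each category, the share of that neighborhood's returned venues that fall in the category. The sketch below reproduces that arithmetic in plain Python on made-up rows so the calculation is visible; `category_frequencies` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for neighborhood, counter in counts.items()
    }


# Made-up venue rows for two hypothetical neighborhoods.
sample_rows = [
    ("North", "Indian Restaurant"),
    ("North", "Shopping Mall"),
    ("North", "Indian Restaurant"),
    ("North", "Café"),
    ("South", "Bakery"),
    ("South", "Bakery"),
]
freqs = category_frequencies(sample_rows)
print(freqs["North"]["Shopping Mall"])            # 1 of 4 venues
print(freqs["South"].get("Shopping Mall", 0.0))   # no malls returned
```

By the same arithmetic, the value 0.035714 for A. S. Rao Nagar corresponds to 1 shopping mall among its 28 returned venues. Note that this is a share of returned venues, not an absolute count of malls.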


In [39]:
len(hyd_grouped[hyd_grouped['Shopping Mall'] > 0])

59

In [40]:
hyd_mall = hyd_grouped[["Neighborhoods","Shopping Mall"]]

In [41]:
hyd_mall.head(10)

Unnamed: 0,Neighborhoods,Shopping Mall
0,A. S. Rao Nagar,0.035714
1,A.C. Guards,0.0
2,Abhyudaya Nagar,0.0
3,Abids,0.012346
4,Adikmet,0.0
5,Afzal Gunj,0.022222
6,Aghapura,0.017857
7,"Aliabad, Hyderabad",0.0
8,Alijah Kotla,0.0
9,Allwyn Colony,0.0


In [42]:
hyd_mall.to_csv("hyd_mall",index=False)

# CLUSTERING THE NEIGHBORHOODS BY APPLYING KMEANS

In [44]:
#SETTING NUMBER OF CLUSTERS
k=3
#DROPPING NEIGHBORHOOD COLUMN (keyword form; the positional axis argument was removed in pandas 2.0)
k_clustering=hyd_mall.drop(columns=['Neighborhoods'])

In [45]:
k_clustering.head()

Unnamed: 0,Shopping Mall
0,0.035714
1,0.0
2,0.0
3,0.012346
4,0.0


In [46]:
#APPLYING KMEANS
Kmeans=KMeans(n_clusters=k,random_state=0).fit(k_clustering)
#CHECK CLUSTER LABELS
Kmeans.labels_[0:10]

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
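
K-means label numbers are arbitrary: which cluster is called 0, 1 or 2 depends on the initialization, not on the mall frequency. If the labels are later read as low, moderate and high, it can help to renumber them by centroid value. The sketch below shows one way to do that; `relabel_by_centroid` is a hypothetical helper and the inputs are made up.

```python
def relabel_by_centroid(labels, centroids):
    """Renumber clusters so that 0 has the smallest centroid, 1 the next, etc."""
    order = sorted(range(len(centroids)), key=lambda j: centroids[j])
    remap = {old: new for new, old in enumerate(order)}
    return [remap[label] for label in labels]


# Made-up example: raw cluster 2 happens to hold the lowest frequencies.
raw_labels = [2, 2, 0, 1, 0]
raw_centroids = [0.06, 0.18, 0.004]   # indexed by raw cluster label
print(relabel_by_centroid(raw_labels, raw_centroids))
```

With scikit-learn, the centroids for this one-feature model could be taken from `Kmeans.cluster_centers_.ravel()`.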

In [48]:
# create a new dataframe that holds the Shopping Mall frequency and the cluster label for each neighborhood
hyd_merged = hyd_mall.copy()
# add clustering labels
hyd_merged["Cluster Labels"] = Kmeans.labels_

In [49]:
hyd_merged

Unnamed: 0,Neighborhoods,Shopping Mall,Cluster Labels
0,A. S. Rao Nagar,0.035714,1
1,A.C. Guards,0.0,0
2,Abhyudaya Nagar,0.0,0
3,Abids,0.012346,0
4,Adikmet,0.0,0
5,Afzal Gunj,0.022222,0
6,Aghapura,0.017857,0
7,"Aliabad, Hyderabad",0.0,0
8,Alijah Kotla,0.0,0
9,Allwyn Colony,0.0,0


In [50]:
hyd_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
hyd_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,A. S. Rao Nagar,0.035714,1
1,A.C. Guards,0.0,0
2,Abhyudaya Nagar,0.0,0
3,Abids,0.012346,0
4,Adikmet,0.0,0


In [52]:
hyd_merged=hyd_merged.join(hyd_data.set_index("Neighborhood"),on="Neighborhood")
print(hyd_merged.shape)
hyd_merged.head()

(198, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,A. S. Rao Nagar,0.035714,1,17.4112,78.50824
1,A.C. Guards,0.0,0,17.393001,78.4569
2,Abhyudaya Nagar,0.0,0,17.33765,78.56414
3,Abids,0.012346,0,17.3898,78.47658
4,Adikmet,0.0,0,17.41061,78.51513


In [53]:
# sort the results by Cluster Labels
print(hyd_merged.shape)
hyd_merged.sort_values(["Cluster Labels"], inplace=True)
hyd_merged

(198, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
98,Kondapur,0.0,0,17.4666,78.35685
126,Manikonda,0.0,0,17.40139,78.39163
127,Marredpally,0.0,0,17.44777,78.50873
128,Masab Tank,0.01,0,17.40093,78.45362
129,Meerpet–Jillelguda,0.0,0,17.32964,78.53303
130,"Mehboob ki Mehendi, Hyderabad",0.0,0,17.362015,78.470795
131,Mehdipatnam,0.0,0,17.39263,78.44219
132,Mettuguda,0.0,0,17.42774,78.52892
133,"Minister Road, Hyderabad",0.0,0,17.432718,78.484523
134,Mir Alam Tank,0.0,0,17.355107,78.454118


# FINALLY, LET'S VISUALIZE THE RESULTING CLUSTERS

In [63]:
# create map
map_clusters = folium.Map(location=[Latitude,Longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k)
ys = [i+x+(i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lng, neigh, cluster in zip(hyd_merged['Latitude'], hyd_merged['Longitude'], hyd_merged['Neighborhood'], hyd_merged['Cluster Labels']):
    label = folium.Popup(str(neigh) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [59]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

# EXAMINING THE CLUSTERS

In [60]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
98,Kondapur,0.0,0,17.4666,78.35685
126,Manikonda,0.0,0,17.40139,78.39163
127,Marredpally,0.0,0,17.44777,78.50873
128,Masab Tank,0.01,0,17.40093,78.45362
129,Meerpet–Jillelguda,0.0,0,17.32964,78.53303
130,"Mehboob ki Mehendi, Hyderabad",0.0,0,17.362015,78.470795
131,Mehdipatnam,0.0,0,17.39263,78.44219
132,Mettuguda,0.0,0,17.42774,78.52892
133,"Minister Road, Hyderabad",0.0,0,17.432718,78.484523
134,Mir Alam Tank,0.0,0,17.355107,78.454118


In [61]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
11,Amberpet,0.058824,1,17.38582,78.51836
53,Dilsukhnagar,0.055556,1,17.36857,78.53515
99,"Kothapet, Hyderabad",0.071429,1,17.36883,78.54229
87,"Kamala Nagar, Hyderabad",0.05,1,17.36561,78.53305
138,Moosapet,0.090909,1,17.46705,78.42858
102,Kukatpally,0.1,1,17.48735,78.42087
91,Karwan,0.076923,1,17.37907,78.43668
0,A. S. Rao Nagar,0.035714,1,17.4112,78.50824
106,Lab quarters,0.056604,1,17.4907,78.392
163,Parsigutta,0.037037,1,17.41663,78.51093


In [62]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
59,Ferozguda,0.166667,2,17.474121,78.426397
17,Attapur,0.2,2,17.36917,78.43683
90,Karmanghat,0.2,2,17.34061,78.53258
81,Jalal Baba Nagar,0.166667,2,17.35442,78.43255


# RESULTS <a name="results"></a>

• Cluster 0: Neighborhoods with few or no shopping malls

• Cluster 1: Neighbourhoods with a moderate concentration of shopping malls

• Cluster 2: Neighbourhoods with a high concentration of shopping malls

# CONCLUSION <a name="conclusion"></a>

Most of the shopping malls are concentrated in the central area of Hyderabad, with the highest concentration in cluster 2 and a moderate concentration in cluster 1. Cluster 0, on the other hand, has few or no shopping malls in its neighborhoods. These neighborhoods represent a great opportunity to open new shopping malls, as there is very little competition from existing malls.

Meanwhile, shopping malls in cluster 2 are likely suffering from intense competition due to oversupply and the high concentration of malls. From another perspective, this also shows that the oversupply of shopping malls has mostly occurred in the central area of the city, while the suburbs still have very few shopping malls.

Therefore, this project recommends that property developers capitalize on these findings and open new shopping malls in cluster 0 neighborhoods, where there is little competition. Property developers with unique selling propositions that stand out from the competition can also open new shopping malls in cluster 1 neighborhoods, where competition is moderate. Lastly, property developers are advised to avoid cluster 2 neighborhoods, which already have a high concentration of shopping malls and suffer from intense competition.