# Week 5 Report

## Setting up a new restaurant in Hyderabad, India

- Build a dataframe of neighborhoods in Hyderabad, India by scraping a Wikimedia Commons category page
- Get the geographical coordinates of the neighborhoods
- Obtain the venue data for the neighborhoods from the Foursquare API
- Explore and cluster the neighborhoods
- Select the best cluster to open a new restaurant

##### Import

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

!pip install geocoder
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install folium
import folium # map rendering library

print("Libraries imported.")

Libraries imported.


##### Scrape Wikimedia Commons

In [341]:
# send the GET request
data = requests.get("https://commons.wikimedia.org/wiki/Category:Suburbs_of_Hyderabad,_India").text

In [342]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [343]:
# create a list to store neighborhood data
neighborhoodList = []

In [344]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)

In [345]:
# create a new DataFrame from the list
hb_df = pd.DataFrame({"Neighborhood": neighborhoodList})

hb_df.head()

Unnamed: 0,Neighborhood
0,"► Abids‎ (1 C, 12 F)"
1,"► Alwal‎ (1 C, 1 F)"
2,"► Ameerpet, Hyderabad‎ (2 C, 20 F)"
3,"► Bandlaguda, Rangareddy‎ (1 C, 2 F)"
4,"► Banjara Hills‎ (2 C, 21 F)"


In [346]:
# print the number of rows of the dataframe
hb_df.shape

(50, 1)
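
The scraped list items still carry the category page's markup: a leading ►, an invisible left-to-right mark, and subcategory/file counts such as "(1 C, 12 F)". A minimal sketch of stripping that residue before geocoding could look like the following. `clean_neighborhood` is a hypothetical helper that is not part of the original notebook, and the pattern assumes the counts only ever use the letters C, P, or F.

```python
import re

# Matches a trailing count block such as "(7 F)" or "(1 C, 12 F)".
_COUNTS = re.compile(r"\s*\(\d+ [CPF](?:, \d+ [CPF])*\)\s*$")

def clean_neighborhood(raw: str) -> str:
    """Return the bare neighborhood name from a scraped list item."""
    # Drop the expand arrow (U+25BA) and the left-to-right mark (U+200E).
    name = raw.replace("\u25ba", "").replace("\u200e", "")
    # Drop the trailing subcategory/file counts, then tidy whitespace.
    return _COUNTS.sub("", name).strip()

print(clean_neighborhood("\u25ba Abids\u200e (1 C, 12 F)"))
```

In the notebook this could be applied with `hb_df["Neighborhood"].map(clean_neighborhood)` before the geocoding step, so the geocoder receives plain names.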

##### Obtain coordinates

In [347]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Hyderabad, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
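
The `while` loop above retries forever if the geocoder never returns a result for a name. A bounded variant is sketched below, under the assumption that giving up after a few attempts and returning `None` is acceptable. `get_latlng_bounded`, `lookup`, `max_attempts`, and `pause_seconds` are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, List, Optional

def get_latlng_bounded(
    neighborhood: str,
    lookup: Callable[[str], Optional[List[float]]],
    max_attempts: int = 5,
    pause_seconds: float = 0.0,
) -> Optional[List[float]]:
    """Call `lookup` up to `max_attempts` times; return None if it never succeeds."""
    query = "{}, Hyderabad, India".format(neighborhood)
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause_seconds)  # optional pause between attempts
    return None
```

With the notebook's geocoder, `lookup` could be `lambda q: geocoder.arcgis(q).latlng`. Rows that come back as `None` would then need to be dropped or filled in before building the coordinate dataframe.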

In [348]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in hb_df["Neighborhood"].tolist() ]

In [349]:
coords

[[17.389800000000037, 78.47658000000007],
 [17.535430000000076, 78.54427000000004],
 [17.434800000000052, 78.44953000000004],
 [17.37367000000006, 78.57104000000004],
 [17.42538312421386, 78.43498965940728],
 [17.40211000000005, 78.47770000000008],
 [17.445230000000038, 78.46202000000005],
 [17.536214413606626, 78.23504495152397],
 [17.4089385, 78.32673900000002],
 [17.40301000000005, 78.49793000000005],
 [17.4089385, 78.32673900000002],
 [17.36860000000007, 78.53515000000004],
 [17.409950000000038, 78.48229000000003],
 [17.45330000000007, 78.43035000000003],
 [17.431920000000048, 78.38558000000006],
 [17.522760000000062, 78.43862000000007],
 [17.49519217211755, 78.60749410384328],
 [17.389370000000042, 78.40420000000006],
 [17.334250000000054, 78.61262000000005],
 [17.447330000000022, 78.37872000000004],
 [17.399230000000045, 78.48073000000005],
 [17.36838000000006, 78.39999000000006],
 [17.427640000000054, 78.40830000000005],
 [17.386870000000044, 78.49553000000003],
 [17.40592000000

In [350]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [351]:
# merge the coordinates into the original dataframe
hb_df['Latitude'] = df_coords['Latitude']
hb_df['Longitude'] = df_coords['Longitude']

In [352]:
# check the neighborhoods and the coordinates
print(hb_df.shape)
hb_df

(50, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,"► Abids‎ (1 C, 12 F)",17.3898,78.47658
1,"► Alwal‎ (1 C, 1 F)",17.53543,78.54427
2,"► Ameerpet, Hyderabad‎ (2 C, 20 F)",17.4348,78.44953
3,"► Bandlaguda, Rangareddy‎ (1 C, 2 F)",17.37367,78.57104
4,"► Banjara Hills‎ (2 C, 21 F)",17.425383,78.43499
5,► Basheerbagh‎ (7 F),17.40211,78.4777
6,"► Begumpet‎ (5 C, 1 F)",17.44523,78.46202
7,"► Bolarum‎ (3 C, 1 F)",17.536214,78.235045
8,"► Cavalry Barracks, Hyderabad‎ (1 C)",17.408939,78.326739
9,► Chikkadpally‎ (7 F),17.40301,78.49793


In [353]:
# save the DataFrame as CSV file
hb_df.to_csv("hb_df.csv", index=False)

##### Make the map

In [354]:
# get the coordinates 
address = 'Hyderabad, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hyderabad, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hyderabad, India are 17.3616079, 78.4746286.


In [355]:
# create map using latitude and longitude values
map_hb = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(hb_df['Latitude'], hb_df['Longitude'], hb_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_hb)  
    
map_hb

In [356]:
map_hb.save('map_hb.html')


##### Foursquare API to view the neighborhoods

In [357]:
# define Foursquare Credentials and Version
CLIENT_ID = '3C4I35AFHSLH5PFJT11M4SCEWTU0JNHSHA2HPZCYXOXUQRXJ' # your Foursquare ID
CLIENT_SECRET = '5RQ3TVGTRVP21COKNONJM53MVKTC120HH0WZP5NLCM0UWLZU' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 3C4I35AFHSLH5PFJT11M4SCEWTU0JNHSHA2HPZCYXOXUQRXJ
CLIENT_SECRET:5RQ3TVGTRVP21COKNONJM53MVKTC120HH0WZP5NLCM0UWLZU
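
Hard-coding and printing the credentials leaves them readable in any shared copy of the notebook. One common alternative, sketched here, is to read them from environment variables and print only a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables (hypothetical names)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret

def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

The cell could then assign `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and print `mask(CLIENT_SECRET)` instead of the raw value.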


##### Find up to 100 venues within a 2000-meter radius of each neighborhood



In [358]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(hb_df['Latitude'], hb_df['Longitude'], hb_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']

    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
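
The loop above assumes every response contains at least one group and every venue has at least one category. If either is missing, the indexing raises an error and the whole run stops. A more defensive way to flatten one response is sketched below. `extract_venues` is a hypothetical helper, and the payload shape is taken only from the keys the loop already reads.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into
    (neighborhood, lat, lng, name, venue_lat, venue_lng, category) tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue carries no category instead of failing.
        category = categories[0]["name"] if categories else None
        rows.append((
            neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows
```

Inside the loop this would replace the direct indexing, for example `venues.extend(extract_venues(requests.get(url).json(), neighborhood, lat, long))`.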
        
        

In [359]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2244, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,"► Abids‎ (1 C, 12 F)",17.3898,78.47658,Mayur Pan Shop,17.388894,78.480578,Juice Bar
1,"► Abids‎ (1 C, 12 F)",17.3898,78.47658,Pragati,17.388088,78.481134,South Indian Restaurant
2,"► Abids‎ (1 C, 12 F)",17.3898,78.47658,Santosh Dhaba,17.388485,78.479509,Indian Restaurant
3,"► Abids‎ (1 C, 12 F)",17.3898,78.47658,Taj Mahal Hotel,17.391942,78.476915,Hotel
4,"► Abids‎ (1 C, 12 F)",17.3898,78.47658,Famous Ice Cream,17.384321,78.474796,Ice Cream Shop


##### See how many categories are unique

In [360]:
print('There are {} categories that are unique.'.format(len(venues_df['VenueCategory'].unique())))


There are 151 categories that are unique.


In [361]:
# print out the list of categories
venues_df['VenueCategory'].unique()[:50]

array(['Juice Bar', 'South Indian Restaurant', 'Indian Restaurant',
       'Hotel', 'Ice Cream Shop', 'Bakery', 'Diner', 'Food Truck',
       'Shoe Store', 'Chaat Place', 'Neighborhood', 'Lounge',
       'Burger Joint', 'Café', 'Dessert Shop', 'Mobile Phone Shop',
       'Science Museum', 'Snack Place', 'Smoke Shop', 'Multiplex',
       'Fast Food Restaurant', 'Breakfast Spot', 'Stadium', 'Food',
       'Chinese Restaurant', 'Hotel Bar', 'Restaurant', 'Coffee Shop',
       'Department Store', 'Bar', 'Vegetarian / Vegan Restaurant',
       'Shopping Mall', 'Pizza Place', 'Gaming Cafe',
       'Performing Arts Venue', 'Indie Movie Theater',
       'Fried Chicken Joint', 'Farmers Market', 'Clothing Store',
       'Bus Station', 'Sporting Goods Shop', 'Tea Room', 'Golf Course',
       'Asian Restaurant', 'Pharmacy', 'ATM', 'Pub', 'Bookstore',
       'American Restaurant', 'Sandwich Place'], dtype=object)

In [362]:
# check if the results contain "Restaurant"
"Restaurant" in venues_df['VenueCategory'].unique()

True
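
The check above only confirms that the generic "Restaurant" label exists. The printed list also contains cuisine-specific labels such as "Indian Restaurant", which the rest of this notebook does not count. As an aside, the sketch below shows how every restaurant-like label could be picked out by substring. The list is a subset copied from the output above, and treating any label that contains "Restaurant" as a restaurant is an assumption.

```python
# Subset of the categories printed above, copied for illustration.
printed_categories = [
    "Juice Bar", "South Indian Restaurant", "Indian Restaurant", "Hotel",
    "Fast Food Restaurant", "Chinese Restaurant", "Restaurant", "Coffee Shop",
    "Vegetarian / Vegan Restaurant", "Asian Restaurant", "American Restaurant",
]

# Keep every label that mentions "Restaurant" anywhere in its name.
restaurant_like = [c for c in printed_categories if "Restaurant" in c]

for name in restaurant_like:
    print(name)
```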

##### Analyze Neighborhoods

In [363]:
# one hot encoding
hb_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
hb_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [hb_onehot.columns[-1]] + list(hb_onehot.columns[:-1])
hb_onehot = hb_onehot[fixed_columns]

print(hb_onehot.shape)
hb_onehot.head()

(2244, 152)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,BBQ Joint,Bakery,Bank,Bar,Bed & Breakfast,Beer Garden,Bengali Restaurant,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Bus Station,Café,Chaat Place,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Dumpling Restaurant,Duty-free Shop,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Hotel Pool,Hunan Restaurant,Hyderabadi Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irish Pub,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Laser Tag,Light Rail Station,Liquor Store,Lounge,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Music Store,Neighborhood,New American Restaurant,Nightclub,North Indian Restaurant,Office,Outdoors & Recreation,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Plaza,Pool,Pub,Racetrack,Rajasthani Restaurant,Recreation Center,Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Social Club,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Taxi Stand,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theme Park,Tibetan Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Volleyball Court,Women's Store
0,"► Abids‎ (1 C, 12 F)",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"► Abids‎ (1 C, 12 F)",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"► Abids‎ (1 C, 12 F)",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"► Abids‎ (1 C, 12 F)",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"► Abids‎ (1 C, 12 F)",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


##### Group rows by neighborhood to find frequencies

In [364]:
hb_grouped = hb_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(hb_grouped.shape)
hb_grouped

(49, 152)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,BBQ Joint,Bakery,Bank,Bar,Bed & Breakfast,Beer Garden,Bengali Restaurant,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Bus Station,Café,Chaat Place,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Dumpling Restaurant,Duty-free Shop,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Hotel Pool,Hunan Restaurant,Hyderabadi Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irish Pub,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Laser Tag,Light Rail Station,Liquor Store,Lounge,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Music Store,Neighborhood,New American Restaurant,Nightclub,North Indian Restaurant,Office,Outdoors & Recreation,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Plaza,Pool,Pub,Racetrack,Rajasthani Restaurant,Recreation Center,Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Social Club,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Taxi Stand,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theme Park,Tibetan Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Volleyball Court,Women's Store
0,"► Abids‎ (1 C, 12 F)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.049383,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024691,0.0,0.012346,0.024691,0.037037,0.012346,0.049383,0.012346,0.0,0.024691,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024691,0.037037,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.061728,0.0,0.012346,0.0,0.0,0.012346,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.012346,0.0,0.0,0.0,0.0,0.061728,0.123457,0.0,0.012346,0.0,0.0,0.0,0.024691,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.012346,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.024691,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.012346,0.012346,0.012346,0.012346,0.024691,0.0,0.024691,0.0,0.012346,0.0,0.0,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0
1,"► Alwal‎ (1 C, 1 F)",0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"► Ameerpet, Hyderabad‎ (2 C, 20 F)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.04,0.03,0.01,0.07,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.07,0.01,0.0,0.0,0.01,0.0,0.01,0.13,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.0
3,"► Bandlaguda, Rangareddy‎ (1 C, 2 F)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"► Banjara Hills‎ (2 C, 21 F)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.03,0.0,0.0,0.0,0.07,0.0,0.04,0.01,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.03,0.0,0.0,0.0,0.0,0.0,0.03,0.09,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
5,► Basheerbagh‎ (7 F),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.03,0.0,0.04,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.06,0.02,0.0,0.0,0.02,0.0,0.04,0.11,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.08,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0
6,"► Begumpet‎ (5 C, 1 F)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.015873,0.0,0.0,0.015873,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.015873,0.047619,0.0,0.047619,0.047619,0.015873,0.031746,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.015873,0.079365,0.0,0.0,0.0,0.015873,0.0,0.031746,0.206349,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.031746,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.079365,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.063492,0.0,0.0
7,"► Bolarum‎ (3 C, 1 F)",0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"► Cavalry Barracks, Hyderabad‎ (1 C)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.235294,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,► Chikkadpally‎ (7 F),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.015873,0.015873,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.015873,0.0,0.079365,0.0,0.0,0.015873,0.0,0.063492,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.015873,0.015873,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.031746,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.063492,0.15873,0.0,0.015873,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.063492,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.031746,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.015873,0.0,0.0
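
Each value in the grouped table is the share of a neighborhood's venues that fall in a category, because the mean of a 0/1 column is the fraction of ones. For instance, the four 0.25 entries in the Alwal row mean that Alwal returned four venues, each in a different category. The pure-Python sketch below reproduces that calculation on toy data. The neighborhoods "A" and "B" and the helper `category_frequencies` are hypothetical; this is not how pandas computes the result internally.

```python
from collections import Counter, defaultdict

def category_frequencies(rows):
    """rows: iterable of (neighborhood, category) pairs.
    Returns {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for neighborhood, c in counts.items()
    }

# Toy venue list: neighborhood "A" has four venues, "B" has two.
toy_rows = [
    ("A", "Restaurant"), ("A", "Hotel"), ("A", "Hotel"), ("A", "Café"),
    ("B", "ATM"), ("B", "Café"),
]
freq = category_frequencies(toy_rows)
print(freq["A"])
```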


In [365]:
len(hb_grouped[hb_grouped["Restaurant"] > 0])


27

##### Make a new dataframe for Restaurants

In [366]:
hb_restaurant = hb_grouped[["Neighborhoods","Restaurant"]]

hb_restaurant.head()


Unnamed: 0,Neighborhoods,Restaurant
0,"► Abids‎ (1 C, 12 F)",0.037037
1,"► Alwal‎ (1 C, 1 F)",0.0
2,"► Ameerpet, Hyderabad‎ (2 C, 20 F)",0.01
3,"► Bandlaguda, Rangareddy‎ (1 C, 2 F)",0.0
4,"► Banjara Hills‎ (2 C, 21 F)",0.01


##### Cluster the Neighborhoods

In [388]:
# set number of clusters
kclusters = 3

hb_clustering = hb_restaurant.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hb_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 0, 0, 0, 1, 0, 0, 1, 0], dtype=int32)

In [389]:
# create a new dataframe that holds the restaurant frequency and the cluster label for each neighborhood
hb_merged = hb_restaurant.copy()

# add clustering labels
hb_merged["Cluster Labels"] = kmeans.labels_

In [390]:
hb_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
hb_merged.head()

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels
0,"► Abids‎ (1 C, 12 F)",0.037037,1
1,"► Alwal‎ (1 C, 1 F)",0.0,0
2,"► Ameerpet, Hyderabad‎ (2 C, 20 F)",0.01,0
3,"► Bandlaguda, Rangareddy‎ (1 C, 2 F)",0.0,0
4,"► Banjara Hills‎ (2 C, 21 F)",0.01,0


In [391]:
# merge grouped with data to add latitude/longitude for each neighborhood
hb_merged = hb_merged.join(hb_df.set_index("Neighborhood"), on="Neighborhood")

print(hb_merged.shape)
hb_merged.head() # check the last columns!

(49, 5)


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
0,"► Abids‎ (1 C, 12 F)",0.037037,1,17.3898,78.47658
1,"► Alwal‎ (1 C, 1 F)",0.0,0,17.53543,78.54427
2,"► Ameerpet, Hyderabad‎ (2 C, 20 F)",0.01,0,17.4348,78.44953
3,"► Bandlaguda, Rangareddy‎ (1 C, 2 F)",0.0,0,17.37367,78.57104
4,"► Banjara Hills‎ (2 C, 21 F)",0.01,0,17.425383,78.43499


In [392]:
# sort the results by Cluster Labels
print(hb_merged.shape)
hb_merged.sort_values(["Cluster Labels"], inplace=True)
hb_merged

(49, 5)


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
16,"► Ghatkesar‎ (1 C, 2 F)",0.0,0,17.495192,78.607494
34,► Miyapur‎ (5 F),0.0,0,17.42102,78.58244
30,"► Malkajgiri‎ (3 C, 6 F)",0.0,0,17.439137,78.529172
29,"► Malakpet‎ (3 C, 2 F)",0.0,0,17.37491,78.51569
35,► Moazzam Jahi Market‎ (15 F),0.0,0,17.38448,78.47442
27,► L. B. Nagar‎ (11 F),0.0,0,17.352926,78.555107
26,► Kukatpally‎ (16 F),0.0,0,17.48735,78.42087
25,"► Koti, Hyderabad‎ (3 C, 6 F)",0.015152,0,17.38594,78.48338
47,► Somajiguda‎ (5 F),0.02,0,17.42073,78.46303
23,"► Kachiguda‎ (1 C, 3 F)",0.020408,0,17.38687,78.49553


In [393]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(hb_merged['Latitude'], hb_merged['Longitude'], hb_merged['Neighborhood'], hb_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
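
One detail of the marker loop is easy to miss: it indexes `rainbow[cluster-1]`, so label 0 becomes index -1 and picks up the last color in the list. Every cluster still gets its own color, only not in label order. The sketch below shows the difference, using placeholder hex strings in place of the values actually stored in `rainbow`.

```python
# Placeholder colors standing in for the three entries of `rainbow`.
palette = ["#8000ff", "#80ffb4", "#ff0000"]
labels = [0, 1, 2]

# The notebook's indexing: label 0 wraps around to the last color.
wrapped = {label: palette[label - 1] for label in labels}

# Direct indexing: colors follow the label order.
direct = {label: palette[label] for label in labels}

print(wrapped)
print(direct)
```

Either mapping is fine for the map itself. Direct indexing is mainly useful if a legend is added later and the colors should follow the label order.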

In [394]:
# save the map as HTML file
map_clusters.save('map_clusters.html')


##### Review

###### Cluster 0

In [395]:
hb_merged.loc[hb_merged['Cluster Labels'] == 0]


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
16,"► Ghatkesar‎ (1 C, 2 F)",0.0,0,17.495192,78.607494
34,► Miyapur‎ (5 F),0.0,0,17.42102,78.58244
30,"► Malkajgiri‎ (3 C, 6 F)",0.0,0,17.439137,78.529172
29,"► Malakpet‎ (3 C, 2 F)",0.0,0,17.37491,78.51569
35,► Moazzam Jahi Market‎ (15 F),0.0,0,17.38448,78.47442
27,► L. B. Nagar‎ (11 F),0.0,0,17.352926,78.555107
26,► Kukatpally‎ (16 F),0.0,0,17.48735,78.42087
25,"► Koti, Hyderabad‎ (3 C, 6 F)",0.015152,0,17.38594,78.48338
47,► Somajiguda‎ (5 F),0.02,0,17.42073,78.46303
23,"► Kachiguda‎ (1 C, 3 F)",0.020408,0,17.38687,78.49553


###### Cluster 1

In [396]:
hb_merged.loc[hb_merged['Cluster Labels'] == 1]


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
46,"► Sitaphalmandi‎ (1 C, 1 F)",0.058824,1,17.408939,78.326739
41,"► Old City (Hyderabad, India)‎ (8 C, 1 F)",0.040404,1,17.408398,78.47846
0,"► Abids‎ (1 C, 12 F)",0.037037,1,17.3898,78.47658
24,"► Khairtabad‎ (1 C, 2 F)",0.03,1,17.40592,78.45856
22,"► Jubilee Hills‎ (3 C, 7 F)",0.030928,1,17.42764,78.4083
20,► Hyderguda‎ (2 F),0.032258,1,17.39923,78.48073
18,"► HITEC City‎ (5 C, 28 F)",0.05,1,17.44733,78.37872
17,"► Golconda‎ (5 C, 3 F)",0.066667,1,17.38937,78.4042
14,"► Gachibowli‎ (4 C, 16 F)",0.03,1,17.43192,78.38558
12,► Domalguda‎ (3 C),0.047059,1,17.40995,78.48229


###### Cluster 2

In [397]:
hb_merged.loc[hb_merged['Cluster Labels'] == 2]


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
28,"► Madhapur‎ (1 C, 16 F)",0.094118,2,17.458598,78.368032
21,► Hydershakote‎ (14 F),0.125,2,17.36838,78.39999
42,► Sanathnagar‎ (8 F),0.102041,2,17.46518,78.37119
31,► Manikonda‎ (1 F),0.1,2,17.40144,78.39166
48,"► Trimulgherry‎ (1 C, 3 F)",0.090909,2,17.470719,78.504503
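
A per-cluster summary makes the three tables easier to compare. The sketch below averages the "Restaurant" share by cluster using only the rows displayed above. The full dataframe has 49 rows and is not reproduced here, so these figures describe the displayed rows, not the complete clusters. `summarize` is a hypothetical helper.

```python
from statistics import mean

# (cluster label, "Restaurant" share) pairs copied from the rows shown above.
displayed = [
    (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0),
    (0, 0.015152), (0, 0.02), (0, 0.020408),
    (1, 0.058824), (1, 0.040404), (1, 0.037037), (1, 0.03), (1, 0.030928),
    (1, 0.032258), (1, 0.05), (1, 0.066667), (1, 0.03), (1, 0.047059),
    (2, 0.094118), (2, 0.125), (2, 0.102041), (2, 0.1), (2, 0.090909),
]

def summarize(rows):
    """Return {label: (number of rows, mean share)} ordered by label."""
    by_cluster = {}
    for label, share in rows:
        by_cluster.setdefault(label, []).append(share)
    return {label: (len(v), mean(v)) for label, v in sorted(by_cluster.items())}

summary = summarize(displayed)
for label, (n, avg) in summary.items():
    print("cluster {}: {} rows shown, mean share {:.3f}".format(label, n, avg))
```

On the full dataframe the same summary is available with `hb_merged.groupby("Cluster Labels")["Restaurant"].agg(["count", "mean"])`.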


# Conclusion

Cluster 2, on the outskirts of Hyderabad, has the highest concentration of restaurants: in the neighborhoods listed above, roughly 9 to 12.5 percent of the venues returned by Foursquare are categorized as "Restaurant". Cluster 1 has the second-highest concentration (about 3 to 7 percent in the rows shown), while cluster 0, located in the middle of the city, has the lowest (about 2 percent or less, and often none).

This makes cluster 0 the best place to open a new restaurant. Competition there is weak, its central location means customers need to travel less to reach it, and it contains the most neighborhoods, which gives it the largest pool of potential customers. Cluster 2 has the most restaurants even though it has the fewest neighborhoods. That is unusual, and it suggests that people from clusters 0 and 1 may be traveling to restaurants on the outskirts of the city. A restaurant in cluster 0 would spare them that trip while avoiding the competition in cluster 2.

Cluster 1 could also support a new restaurant, since it has less competition than cluster 2 and more neighborhoods. The need is greater in cluster 0, however.