# NascentVenue.com


As the number of venues in a city (restaurants, cafes, shopping malls, gyms, etc.) keeps growing, a stakeholder who wants to open a new venue of a particular category needs to know how densely the existing venues of that category are distributed across the city's neighbourhoods. This web application, which is under development, lets the user select a city and then a venue category from the list of all unique venue categories found in that city. It then produces maps of the city's neighbourhoods, clustered by the k-means algorithm and color-coded according to the density of the chosen venue category.


In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
import json # library to handle JSON files
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
!pip install geocoder
import geocoder # to get coordinates
import requests # library to handle requests
!pip install bs4
from bs4 import BeautifulSoup # library to parse HTML and XML documents
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
!pip install folium
import folium # map rendering library
print("Libraries imported.")


# Getting the Data

In [110]:
data = {'City Name':['Hyderabad','Jaipur','Kolkata','Ahmedabad','Surat','Pune','Visakhapatnam','Delhi','Chennai','Mumbai','Bangalore','Lagos','Nairobi',\
                    'Hong Kong','Sydney','Singapore','Toronto','New York','Tokyo','Los Angeles','San Francisco','Berlin','Boston','Chicago','Shanghai','Karachi','Dallas','London'],\
        'Wiki_link':['https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Hyderabad','https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Jaipur',\
                    'https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Kolkata','https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Ahmedabad',\
                    'https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Surat','https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Pune',\
                    'https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Visakhapatnam','https://en.wikipedia.org/wiki/Neighbourhoods_of_Delhi',\
                    'https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Chennai','https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai',\
                    'https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Bangalore','https://en.wikipedia.org/wiki/Category:Neighborhoods_of_Lagos',\
                     'https://en.wikipedia.org/wiki/Category:Suburbs_of_Nairobi','https://en.wikipedia.org/wiki/Districts_of_Hong_Kong',\
                     'https://en.wikipedia.org/wiki/List_of_Sydney_suburbs','https://en.wikipedia.org/wiki/List_of_places_in_Singapore',\
                    'https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto','https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City',\
                     'https://en.wikipedia.org/wiki/Category:Neighborhoods_of_Tokyo','https://en.wikipedia.org/wiki/List_of_districts_and_neighborhoods_in_Los_Angeles',\
                     'https://en.wikipedia.org/wiki/List_of_neighborhoods_in_San_Francisco','https://en.wikipedia.org/wiki/Boroughs_and_neighborhoods_of_Berlin',
                     'https://en.wikipedia.org/wiki/Neighborhoods_in_Boston','https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago',\
                     'https://en.wikipedia.org/wiki/Category:Neighbourhoods_of_Shanghai','https://en.wikipedia.org/wiki/Category:Neighbourhoods_of_Karachi',\
                     'https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Dallas','https://en.wikipedia.org/wiki/List_of_areas_of_London']}
df = pd.DataFrame(data)
pd.set_option("display.max_colwidth", None)
df=df.sort_values(by=['City Name']).reset_index(drop=True)
print('                                                            '+'Menu\n')
for i in range(df.shape[0]):
    print('                                                          ' +df.iloc[i,0])
cn=input("\n\n               Select a city from the above Menu - ")
while True:
    try:
        i=df[df['City Name']==cn].index.values
        data=df.iloc[i,1].iloc[0]
        break
    except IndexError: # no matching row: the city is not in the menu
        print("\n                    Sorry, we do not serve the city you have selected")
        cn=input("\n    Please select a city from the above Menu - ")
#the GET request
data = requests.get(data).text
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')
#list to store neighbourhood data
nList = []
if(cn in ['Mumbai']):
    try:
        tables=soup.find_all('table')
        for table in tables:
            rows = table.find_all('tr')

        for row in rows:
            cells = row.find_all('td')
            if len(cells) > 1:
                nei = cells[0]
                nList.append((nei.text)[0:-1])

    except:
        print("sorry we are unable to gather neighbourhood data for your selected city")
    
if(cn in ['Delhi']):
    try:
        for row in soup.find_all('td',class_="navbox-list navbox-odd")[0].findAll("li"):
            nList.append(row.text)
    except:
            print("sorry we are unable to gather neighbourhood data for your selected city")
if(cn in ['Ahmedabad','Nairobi','Surat','Visakhapatnam','Jaipur','Karachi','Kolkata'] ):
    try:
        for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
            nList.append(row.text)
    except:
            print("sorry we are unable to gather neighbourhood data for your selected city")
if(cn in ['Hyderabad','Toronto']):
    try:
        for row in soup.find_all('td',class_="navbox-list navbox-even hlist")[0].findAll("li"):
            nList.append(row.text)        
    except:
            print("sorry we are unable to gather neighbourhood data for your selected city")
if(cn in ['Chennai','Pune']):
    try:
        for row in soup.find_all('td',class_="navbox-list navbox-odd hlist")[0].findAll("li"):
            nList.append(row.text)
    except:
            print("sorry we are unable to gather neighbourhood data for your selected city")

# create a DataFrame from the list
Vn_df = pd.DataFrame({"Neighbourhood": nList})
print('\n                         Neighbourhoods in '+cn+'  are:\n')
for i in range(Vn_df.shape[0]):
    print('                                                          ' +Vn_df.iloc[i,0])
print("             There are a total of {} neighbourhoods in {} ".format(Vn_df.shape[0],cn))

                                                            Menu

                                                          Ahmedabad
                                                          Bangalore
                                                          Berlin
                                                          Boston
                                                          Chennai
                                                          Chicago
                                                          Dallas
                                                          Delhi
                                                          Hong Kong
                                                          Hyderabad
                                                          Jaipur
                                                          Karachi
                                                          Kolkata
                                                          Lagos
          



               Select a city from the above Menu -  Delhi



                         Neighbourhoods in Delhi  are:

                                                          Ashok Nagar
                                                          Ashok Vihar
                                                          Ashram Chowk
                                                          Ber Sarai
                                                          Chanakyapuri
                                                          Chandni Chowk
                                                          Chawri Bazar
                                                          Chittaranjan Park
                                                          Civil Lines
                                                          Connaught Place
                                                          Daryaganj
                                                          Dayanand Colony
                                                          Defence Colony
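
The scraping cell above repeats the same try/except block for each page layout and differs only in the tag and CSS class it searches for. A minimal sketch of how that could be collapsed into a lookup table follows. The names `SELECTORS`, `selector_for` and `extract_neighbourhoods` are hypothetical and not part of the original notebook; the tag/class pairs are copied from the cell above, and `soup` is assumed to be the BeautifulSoup object built there. Mumbai is left out because it is parsed from table rows rather than list items.

```python
# Hypothetical lookup table: page layout (tag, CSS class) -> cities that use it.
# The tag/class pairs are taken from the scraping cell above.
SELECTORS = {
    ("td", "navbox-list navbox-odd"): ["Delhi"],
    ("div", "mw-category"): ["Ahmedabad", "Nairobi", "Surat", "Visakhapatnam",
                             "Jaipur", "Karachi", "Kolkata"],
    ("td", "navbox-list navbox-even hlist"): ["Hyderabad", "Toronto"],
    ("td", "navbox-list navbox-odd hlist"): ["Chennai", "Pune"],
}


def selector_for(city):
    """Return the (tag, css_class) pair for a city, or None if it is unsupported."""
    for selector, cities in SELECTORS.items():
        if city in cities:
            return selector
    return None


def extract_neighbourhoods(soup, city):
    """Collect the text of every <li> inside the first matching container.

    `soup` is assumed to be a BeautifulSoup object; an empty list is returned
    when the city has no selector or the container is missing from the page.
    """
    selector = selector_for(city)
    if selector is None:
        return []
    tag, css_class = selector
    containers = soup.find_all(tag, class_=css_class)
    if not containers:
        return []
    return [item.text for item in containers[0].find_all("li")]
```

With this in place, the per-city `if` blocks could be reduced to a single call such as `nList = extract_neighbourhoods(soup, cn)`, with an empty result signalling that no data could be gathered.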
               

In [111]:
# a function to get coordinates
def ltlg(neighbourhood):
    g = geocoder.arcgis('{}, {}, India'.format(neighbourhood,cn))
    lt_lg_cds = g.latlng
    return lt_lg_cds
# calling function to get coordinates
cds = [ ltlg(neighbourhood) for neighbourhood in Vn_df["Neighbourhood"].tolist() ]
#temporary dataframe to store coordinates
df_cds = pd.DataFrame(cds, columns=['latitude', 'longitude'])
# merging coordinates into the original dataframe
Vn_df['latitude'] = df_cds['latitude']
Vn_df['longitude'] = df_cds['longitude']
print(Vn_df.shape)
Vn_df

(99, 3)


Unnamed: 0,Neighbourhood,latitude,longitude
0,Ashok Nagar,28.69223,77.30124
1,Ashok Vihar,28.69037,77.17609
2,Ashram Chowk,28.710598,77.326965
3,Ber Sarai,28.54954,77.18167
4,Chanakyapuri,28.59506,77.18573
5,Chandni Chowk,28.65627,77.23232
6,Chawri Bazar,28.64858,77.23071
7,Chittaranjan Park,28.5384,77.24832
8,Civil Lines,28.67671,77.21767
9,Connaught Place,28.63394,77.21968
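
The `ltlg` function above always appends `India` to the geocoding query, which only suits the Indian cities in the menu. A minimal sketch of a country-aware query string is shown below. The `CITY_COUNTRY` table and the `geocode_query` helper are hypothetical names, and only a few cities from the menu are listed for illustration.

```python
# Hypothetical mapping from menu city to country; extend it to cover the full menu.
CITY_COUNTRY = {
    "Delhi": "India",
    "Mumbai": "India",
    "Toronto": "Canada",
    "London": "United Kingdom",
    "Tokyo": "Japan",
}


def geocode_query(neighbourhood, city):
    """Build the free-text address passed to the geocoder.

    Falls back to "neighbourhood, city" when the country is unknown.
    """
    country = CITY_COUNTRY.get(city)
    parts = [neighbourhood, city] + ([country] if country else [])
    return ", ".join(parts)


print(geocode_query("Ashok Nagar", "Delhi"))
print(geocode_query("The Annex", "Toronto"))
```

The returned string could be passed to `geocoder.arcgis(...)` in place of the hard-coded format string.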


In [112]:
# getting the coordinates of selected city
address = cn+',India,Asia'
geolocator = Nominatim(user_agent="V-explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('\n\nThe geographical coordinates of {}, India are {}, {}.'.format(cn,latitude, longitude))
# creating map of selected city
V_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# adding markers
for lt, lg, n in zip(Vn_df['latitude'], Vn_df['longitude'], Vn_df['Neighbourhood']):
    label = '{}'.format(n)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lt, lg],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(V_map)  
print("\n\nA map showing {} and its neighbourhoods".format(cn))
display(V_map)




The geographical coordinates of Delhi, India are 28.5359988, 77.2122279.


A map showing Delhi and its neighbourhoods


# Use the Foursquare API to explore the neighbourhoods

In [113]:
# let us now use the Foursquare API to explore the neighbourhoods
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (never publish real credentials)

CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
radius = 2000
limit = 500

venues = []

for lt, lg, n in zip(Vn_df['latitude'], Vn_df['longitude'], Vn_df['Neighbourhood']):
    
    # creating API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lt,
        lg,
        radius, 
        limit)
    
    # making GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # appending only relevant information for each venue
    for venue in results:
        venues.append((
            n,
            lt, 
            lg, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
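
The request loop above builds the URL by string formatting and indexes straight into the response, so a venue without a category or a response without groups would raise an exception. A minimal sketch of a more defensive version follows, using only the standard library to encode the query and to flatten a response dictionary. The function names `build_explore_url` and `parse_items` and the environment variable names are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
import os
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"  # endpoint used above


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=2000, limit=500):
    """Assemble the explore request URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


def parse_items(payload, neighbourhood, lat, lng):
    """Flatten an explore response into venue tuples, skipping incomplete entries."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        if not categories or "lat" not in location or "lng" not in location:
            continue  # a venue without a category or coordinates cannot be used later
        rows.append((neighbourhood, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Hypothetical environment variable names; the defaults are placeholders only.
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "demo-id")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "demo-secret")
url = build_explore_url(28.69223, 77.30124, client_id, client_secret)
```

In the notebook, `parse_items(requests.get(url).json(), n, lt, lg)` could then replace the inner loop, and reading the credentials from the environment keeps them out of the shared notebook.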

In [125]:
# convert the venues list into a new DataFrame
vv_df = pd.DataFrame(venues)

# define the column names
vv_df.columns = ['Neighbourhood', 'latitude', 'longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print('\n\nTotal number of venues in {}: {}\n\n'.format(cn,vv_df.shape[0]))
display(vv_df.head(30))
print("Dimension of above table - ", vv_df.shape)
vv_df=Vn_df.join(vv_df.set_index('Neighbourhood'),on='Neighbourhood', lsuffix='', rsuffix='x')
vv_df.drop(['latitudex','longitudex'],axis=1,inplace=True)
vv_df['VenueName']=vv_df['VenueName'].replace(np.nan,'None')
vv_df[['VenueLatitude','VenueLongitude']]=vv_df[['VenueLatitude','VenueLongitude']].fillna(0)
vv_df[['VenueCategory']]=vv_df[['VenueCategory']].fillna('no popular venues')



Total number of venues in Delhi: 5024




Unnamed: 0,Neighbourhood,latitude,longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Ashok Nagar,28.69223,77.30124,Sutta Chowk,28.697897,77.30001,Smoke Shop
1,Ashok Nagar,28.69223,77.30124,Haldirams Crossriver Mall,28.687241,77.293538,Indian Restaurant
2,Ashok Nagar,28.69223,77.30124,yamuna vihar,28.689816,77.283876,Park
3,Ashok Nagar,28.69223,77.30124,the gym,28.682996,77.315775,Gym
4,Ashok Nagar,28.69223,77.30124,Shivaji park,28.682657,77.285503,Park
5,Ashok Vihar,28.69037,77.17609,Major Dhyan Chand Sports Complex,28.684029,77.167487,Athletics & Sports
6,Ashok Vihar,28.69037,77.17609,Bellagio,28.696361,77.180021,Asian Restaurant
7,Ashok Vihar,28.69037,77.17609,Subway,28.696321,77.179983,Sandwich Place
8,Ashok Vihar,28.69037,77.17609,Rahul Egg Corner,28.68824,77.168599,Snack Place
9,Ashok Vihar,28.69037,77.17609,Subway.,28.695571,77.171964,Sandwich Place


Dimension of above table -  (5024, 7)



# Now let's check how many venues were returned for each neighbourhood

In [115]:
t=vv_df.groupby(["Neighbourhood"]).count().reset_index() 
# rename on a fresh copy to avoid pandas' SettingWithCopyWarning
t2=t[["Neighbourhood",'VenueCategory']].rename(columns = {'VenueCategory':'Count of all venues'})
print(t2.shape)
t2

(99, 2)


Unnamed: 0,Neighbourhood,Count of all venues
0,Ashok Nagar,5
1,Ashok Vihar,25
2,Ashram Chowk,4
3,Ber Sarai,100
4,Chanakyapuri,76
5,Chandni Chowk,62
6,Chawri Bazar,100
7,Chittaranjan Park,97
8,Civil Lines,47
9,Connaught Place,100


# Let's find out how many unique categories can be curated from all the returned venues

In [126]:
print('\n\n\nThere are only {} unique categories of venues among the {} total number of venues.\n\n'.format(len(vv_df['VenueCategory'].unique()),vv_df.shape[0]))
print(vv_df['VenueCategory'].unique()) #displays all the unique category names
#count of each category
s=vv_df.pivot_table(index=['VenueCategory'], aggfunc='size')
s=s.to_frame().reset_index()
s.columns=['category of venue','count']
s=s.sort_values(['count'],ascending=False).reset_index(drop=True)
print("\n\nA table showing all unique categories of venues existing in {} and the total count of each category:\n".format(cn))
display(s)
vn=input("select a venue category from the above table")
while True:
    try:
        i=s[s['category of venue']==vn].index.values
        data=s.iloc[i,1].iloc[0]
        break
    except IndexError: # no matching row: the category is not in the table
        print("\n                    Sorry, we cannot find the venue category that you have selected  ")
        vn=input("\n\n    Please select a category of venue from the above table - ")




There are only 211 unique categories of venues among the 5024 total number of venues.


['Smoke Shop' 'Indian Restaurant' 'Park' 'Gym' 'Athletics & Sports'
 'Asian Restaurant' 'Sandwich Place' 'Snack Place' 'Pizza Place'
 'Department Store' 'South Indian Restaurant' 'Fast Food Restaurant'
 'Coffee Shop' 'Market' 'Dessert Shop' 'Hobby Shop' 'Garden'
 'Train Station' 'Light Rail Station' 'Restaurant' 'Bakery' 'ATM'
 'Print Shop' 'Tourist Information Center' 'Art Gallery'
 'Mediterranean Restaurant' 'Café' 'Tibetan Restaurant' 'Tea Room'
 'Lounge' 'Ice Cream Shop' 'American Restaurant' 'Beer Garden'
 'Gourmet Shop' 'Historic Site' 'Food & Drink Shop' 'Tapas Restaurant'
 'Chinese Restaurant' 'Italian Restaurant' 'Jazz Club' 'Food Truck'
 'Scandinavian Restaurant' 'Turkish Restaurant' 'Pub' 'Donut Shop' 'Bar'
 'Middle Eastern Restaurant' 'Movie Theater' 'Moroccan Restaurant'
 'Bagel Shop' 'Hotel' 'History Museum' 'Clothing Store'
 'Mexican Restaurant' 'Stadium' 'Grocery Store' 'Nightlife

Unnamed: 0,category of venue,count
0,Indian Restaurant,577
1,Café,364
2,Coffee Shop,263
3,Hotel,259
4,Fast Food Restaurant,213
5,Pizza Place,192
6,Restaurant,169
7,Chinese Restaurant,155
8,Market,131
9,Bar,114


select a venue category from the above table Indian Restaurant
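
Both selection loops in this notebook compare the typed text with the table entries exactly, so a stray space or a different letter case is rejected. A minimal sketch of a more forgiving lookup is given below; `match_choice` is a hypothetical helper, not part of the original notebook.

```python
def match_choice(user_input, options):
    """Return the option matching the input, ignoring case and outer whitespace.

    Returns None when nothing matches, so the caller can prompt again.
    """
    wanted = user_input.strip().casefold()
    for option in options:
        if option.casefold() == wanted:
            return option
    return None


categories = ["Indian Restaurant", "Café", "Coffee Shop"]
print(match_choice("  indian restaurant ", categories))
print(match_choice("Bakery", categories))
```

In the cell above, the options would be `s['category of venue']`, and the loop would repeat the prompt while the helper returns `None`.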


# Analyze each neighbourhood

In [127]:
# one hot encoding
vv_onehot = pd.get_dummies(vv_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
vv_onehot['Neighbourhoods'] = vv_df['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [vv_onehot.columns[-1]] + list(vv_onehot.columns[:-1])
vv_onehot = vv_onehot[fixed_columns]

print(vv_onehot.shape)
vv_grouped = vv_onehot.groupby(["Neighbourhoods"]).sum().reset_index()
display(vv_grouped)

print("\n\nTotal count of {}s - {}".format(vn,vv_grouped[vn].sum()))
print("\n\nThese {} {}s are confined to only {} neighbourhoods of the total {} neighbourhoods".\
      format(vv_grouped[vn].sum(),vn,len((vv_grouped[vv_grouped[vn] > 0])),(vv_grouped.shape[0])))
vm_df = vv_grouped[["Neighbourhoods",vn]] # Creating a dataframe for Chosen Venue Category data only

(5024, 212)


Unnamed: 0,Neighbourhoods,ATM,Accessories Store,Airport,Airport Food Court,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Bed & Breakfast,Beer Garden,Bengali Restaurant,Bike Shop,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Burger Joint,Burmese Restaurant,Bus Station,Business Service,Cafeteria,Café,Campground,Candy Store,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Cricket Ground,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Event Space,Fabric Shop,Falafel Restaurant,Farm,Fast Food Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,Health Food Store,Hindu Temple,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Karnataka Restaurant,Korean Restaurant,Lake,Light Rail Station,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Mosque,Motel,Motorcycle Shop,Movie Theater,Moving Target,Mughlai Restaurant,Multiplex,Museum,Music Store,Music Venue,Neighborhood,Nightclub,Nightlife Spot,North Indian Restaurant,Northeast Indian Restaurant,Other Great Outdoors,Other Nightlife,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pizza Place,Planetarium,Platform,Playground,Plaza,Pool,Portuguese Restaurant,Print Shop,Pub,Public Art,Punjabi Restaurant,Racetrack,Recreation Center,Rental Car Location,Resort,Restaurant,River,Road,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South Indian Restaurant,Spa,Speakeasy,Spiritual Center,Sports Bar,Stadium,Steakhouse,Sushi Restaurant,Tapas Restaurant,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Tibetan Restaurant,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Shop,Women's Store,Yoga Studio
0,Ashok Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Ashok Vihar,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,2,0,0,0,0,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0
2,Ashram Chowk,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
3,Ber Sarai,0,0,0,0,0,2,0,0,2,0,0,2,0,0,0,0,1,2,0,3,0,1,0,0,0,1,0,0,0,0,0,0,0,0,13,0,0,0,2,1,1,5,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,3,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,1,0,12,0,0,0,2,0,1,0,0,0,0,0,0,0,2,4,2,0,0,1,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,2,0,0,0,0,0,0,0,2,0,0,0,0,0,0,8,0,0,0,0,0,2,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,2,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0
4,Chanakyapuri,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,1,0,0,6,0,0,0,3,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,5,2,1,0,0,0,13,0,0,0,1,1,0,0,0,1,0,0,0,0,1,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,2,0,0,0,3,0,1,2,0,0,0,1,0,0,0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,2,0,0,0,1,0,0,0,1,1,0,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
5,Chandni Chowk,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,4,2,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,3,0,7,0,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,1,1,0,0,2,0,0,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,2,0,2,0,0,0,0,0,0,0,0,0,0,0,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0
6,Chawri Bazar,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,7,0,0,0,1,0,0,1,0,0,0,0,0,1,0,1,0,4,0,0,0,0,0,0,0,0,0,0,0,0,7,1,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,2,0,12,0,0,0,1,1,20,0,0,0,0,1,1,0,0,0,1,0,0,0,2,2,0,0,0,0,0,1,0,0,0,1,0,1,1,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,1,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0
7,Chittaranjan Park,0,0,0,0,0,0,0,0,0,0,0,2,0,1,0,3,0,4,0,3,0,0,1,0,1,0,0,0,0,0,0,0,0,0,5,0,0,0,4,0,0,7,0,0,0,2,1,0,0,0,0,3,0,0,0,2,0,0,0,0,0,0,0,0,4,1,0,1,0,2,0,1,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,3,0,8,0,0,0,3,1,0,0,0,0,0,0,0,0,2,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,3,0,0,0,2,0,0,0,0,0,0,0,0,0,0,6,0,0,0,1,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0
8,Civil Lines,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,3,1,0,1,0,1,3,0,0,0,3,0,0,3,0,0,0,0,0,0,0,0,0,1,0,0,0,2,1,0,0,0,0,0,0,0,6,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Connaught Place,0,0,0,0,0,0,0,1,2,0,0,2,0,0,0,1,0,3,0,4,0,1,0,0,1,0,0,0,1,0,0,0,0,0,8,0,0,0,3,1,0,3,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,1,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,1,0,12,0,0,0,1,1,18,0,0,0,2,0,0,0,0,0,1,0,0,0,4,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,2,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0




Total count of Indian Restaurants - 577


These 577 Indian Restaurants are confined to only 85 neighbourhoods of the total 99 neighbourhoods
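
The per-neighbourhood counts above come from one-hot encoding the category column and summing the indicator columns within each neighbourhood. A minimal sketch on invented toy data is given below to make that step concrete; it also shows that `pd.crosstab` produces the same table in one call. The toy frame and its values are assumptions for illustration and stand in for `vv_df`.

```python
import pandas as pd

# Toy data standing in for vv_df; the values are invented for illustration.
toy = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B"],
    "VenueCategory": ["Café", "Café", "Gym", "Gym", "Park"],
})

# Same steps as the cell above: one-hot encode, then sum per neighbourhood.
onehot = pd.get_dummies(toy["VenueCategory"])
onehot.insert(0, "Neighbourhood", toy["Neighbourhood"])
grouped = onehot.groupby("Neighbourhood").sum()

# pd.crosstab yields the same counts in a single call.
counts = pd.crosstab(toy["Neighbourhood"], toy["VenueCategory"])

print(grouped)
print(counts)
```

Either table has one row per neighbourhood and one column per category, so selecting a single column gives the counts that are clustered in the next section.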


# Now cluster the neighbourhoods

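Run k-means to cluster the neighbourhoods of the selected city (Delhi in this run) into 5 clusters, based on the number of venues of the chosen category in each neighbourhood.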

In [128]:
# set number of clusters
kclusters = 5
vm_clustering = vm_df.drop(columns=["Neighbourhoods"])
# run k-means clustering
kmeans = KMeans(init="k-means++", n_clusters=kclusters, n_init=12).fit(vm_clustering)
# count how many neighbourhoods fall into each cluster
qw=pd.DataFrame({"labels":kmeans.labels_,"count":kmeans.labels_})
qw.groupby(["labels"]).count()

labels,count
0,14
1,40
2,17
3,6
4,22
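
The cluster ids returned by k-means carry no order: nothing guarantees that cluster 0 is the sparsest or cluster 4 the densest. A minimal sketch of renumbering the labels by their cluster centers is given below, assuming one-dimensional centers as in this notebook, where a single count column is clustered. The helper name `rank_labels_by_center` and the example values are hypothetical.

```python
import numpy as np


def rank_labels_by_center(labels, centers):
    """Renumber cluster labels so that 0 is the cluster with the smallest center.

    `labels` is an array of cluster ids per row and `centers` holds one
    1-D center per cluster (e.g. shape (k, 1)), indexed by cluster id.
    """
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float).ravel()
    order = np.argsort(centers)            # cluster ids from smallest to largest center
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))    # old id -> position in that ordering
    return rank[labels]


# Invented example: cluster 1 has the lowest center, cluster 3 the highest.
labels = np.array([1, 1, 2, 3, 0, 4])
centers = np.array([[8.0], [0.5], [11.7], [18.8], [3.9]])
print(rank_labels_by_center(labels, centers))
```

With the fitted model above, the call would be `rank_labels_by_center(kmeans.labels_, kmeans.cluster_centers_)`, after which a higher label always means a denser cluster.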


In [129]:
# create a new dataframe that includes the cluster as well 
vm_merged = vm_df.copy()
# add clustering labels
vm_merged["Cluster Labels"] = kmeans.labels_
vm_merged.rename(columns={"Neighbourhoods": "Neighbourhood",vn:("Total number of "+ vn+"s")}, inplace=True)
vm_merged

Unnamed: 0,Neighbourhood,Total number of Indian Restaurants,Cluster Labels
0,Ashok Nagar,1,1
1,Ashok Vihar,2,1
2,Ashram Chowk,0,1
3,Ber Sarai,12,2
4,Chanakyapuri,13,2
5,Chandni Chowk,11,2
6,Chawri Bazar,20,3
7,Chittaranjan Park,8,0
8,Civil Lines,2,1
9,Connaught Place,18,3


In [130]:
#Add latitude and longitude values by using the join operation(the new dataframe with the old dataframe containing the latitude and longitude values)
vm=Vn_df.join(vm_merged.set_index('Neighbourhood'), on='Neighbourhood')
# sorting the results by the count of the chosen venue category
print(vm.shape)
vm.sort_values([("Total number of "+ vn+"s")], inplace=True)
vm=vm.reset_index(drop=True)
vm

(99, 5)


Unnamed: 0,Neighbourhood,latitude,longitude,Total number of Indian Restaurants,Cluster Labels
0,Yamuna Vihar,28.70059,77.27212,0,1
1,Laxmi Nagar,28.63875,77.27592,0,1
2,West Patel Nagar,28.64783,77.16449,0,1
3,Najafgarh,28.6251,76.9974,0,1
4,Nizamuddin East,28.60124,77.264521,0,1
5,Noida,28.53342,77.3819,0,1
6,Delhi Cantonment,28.59151,77.12945,0,1
7,Okhla,28.53247,77.27839,0,1
8,Patel Nagar,28.64783,77.16449,0,1
9,Palam,28.59106,77.09117,0,1


In [131]:
# rows whose count equals the maximum count of the chosen venue category
i=vm[vm[("Total number of "+ vn+"s")]==vm[("Total number of "+ vn+"s")].max()].index.values
print("Table showing the neighbourhood(s) with the highest number of {}s".format(vn))
vm.iloc[i,[0,3]].reset_index(drop= True)


Table showing the neighbourhood(s) with the highest number of Indian Restaurants


Unnamed: 0,Neighbourhood,Total number of Indian Restaurants
0,Paharganj,21


# Now we visualize the resulting clusters

In [132]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# one fixed color per cluster label
rainbow = [
    'red',
    'blue',
    'green',
    'black',
    'orange',
]

# add markers to the map
for lat, lon, poi, cluster in zip(vm['latitude'], vm['longitude'], vm['Neighbourhood'], vm['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Examine clusters

In [133]:
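# competition levels in ascending order; vm is sorted by venue count,
# so unique() yields the clusters from the sparsest to the densest
com=['low','medium-low','medium','medium-high','high']
c=0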
for u in vm['Cluster Labels'].unique():
    s=vm.loc[vm['Cluster Labels'] == u]
    print("\n\nMap showing {} neighbourhoods of cluster {}\n\nThese are the neighbourhoods of {} competition\n".format(s.shape[0],u,com[c]))
    c=c+1
    map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

    # set color scheme for the clusters
    rainbow = ['red','blue','green','black','orange']

    # add markers to the map
    for lat, lon, poi, cluster in zip(s['latitude'], s['longitude'], s['Neighbourhood'], s['Cluster Labels']):
        label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster],
            fill=True,
            fill_color=rainbow[cluster],
            fill_opacity=0.7).add_to(map_clusters)
       
    display(map_clusters)
    print("\n The table below shows the details of neighbourhoods in the above map\n")
    display(s.reset_index(drop=True))
    




Map showing 40 neighbourhoods of cluster 1

These are the neighbourhoods of low competition




 The table below shows the details of neighbourhoods in the above map



Unnamed: 0,Neighbourhood,latitude,longitude,Total number of Indian Restaurants,Cluster Labels
0,Yamuna Vihar,28.70059,77.27212,0,1
1,Laxmi Nagar,28.63875,77.27592,0,1
2,West Patel Nagar,28.64783,77.16449,0,1
3,Najafgarh,28.6251,76.9974,0,1
4,Nizamuddin East,28.60124,77.264521,0,1
5,Noida,28.53342,77.3819,0,1
6,Delhi Cantonment,28.59151,77.12945,0,1
7,Okhla,28.53247,77.27839,0,1
8,Patel Nagar,28.64783,77.16449,0,1
9,Palam,28.59106,77.09117,0,1




Map showing 22 neighbourhoods of cluster 4

These are the neighbourhoods of medium-low competition




 The table below shows the details of neighbourhoods in the above map



Unnamed: 0,Neighbourhood,latitude,longitude,Total number of Indian Restaurants,Cluster Labels
0,Vasundhara Enclave,28.60015,77.31663,3,4
1,Vasant Kunj,28.53152,77.1502,3,4
2,Punjabi Bagh,28.66634,77.125,3,4
3,Nehru Place,28.60074,77.29248,3,4
4,Faridabad,28.483505,77.313725,3,4
5,Ghaziabad,28.67816,77.40861,3,4
6,Kirti Nagar,28.64821,77.14273,3,4
7,Vasant Vihar,28.56494,77.16131,4,4
8,Kalkaji,28.53662,77.26094,4,4
9,Rama Krishna Puram,28.56553,77.17719,4,4




Map showing 14 neighbourhoods of cluster 0

These are the neighbourhoods of medium competition




 The table below shows the details of neighbourhoods in the above map



Unnamed: 0,Neighbourhood,latitude,longitude,Total number of Indian Restaurants,Cluster Labels
0,Sarita Vihar,28.55038,77.28341,7,0
1,Saket,28.52407,77.20677,7,0
2,Munirka,28.55504,77.17132,7,0
3,Sarojini Nagar,28.5756,77.19364,8,0
4,Chittaranjan Park,28.5384,77.24832,8,0
5,Indirapuram,28.63951,77.36271,8,0
6,Jangpura,28.5834,77.24719,8,0
7,Shahpur Jat,28.54854,77.21393,9,0
8,Gulmohar Park,28.55439,77.21252,9,0
9,Lajpat Nagar,28.57026,77.247,10,0




Map showing 17 neighbourhoods of cluster 2

These are the neighbourhoods of medium-high competition




 The table below shows the details of neighbourhoods in the above map



Unnamed: 0,Neighbourhood,latitude,longitude,Total number of Indian Restaurants,Cluster Labels
0,Hauz Khas,28.55109,77.20399,11,2
1,Rajendra Nagar,28.59075,77.22749,11,2
2,Kailash Colony,28.55613,77.2406,11,2
3,Chandni Chowk,28.65627,77.23232,11,2
4,Kotla Mubarakpur,28.57435,77.22419,12,2
5,Urdu Bazaar,28.648881,77.238692,12,2
6,Ber Sarai,28.54954,77.18167,12,2
7,Laxmibai Nagar,28.57815,77.20618,12,2
8,Old Delhi,28.65434,77.23258,12,2
9,Safdarjung,28.56583,77.19907,12,2




Map showing 6 neighbourhoods of cluster 3

These are the neighbourhoods of high competition




 The table below shows the details of neighbourhoods in the above map



Unnamed: 0,Neighbourhood,latitude,longitude,Total number of Indian Restaurants,Cluster Labels
0,Connaught Place,28.63394,77.21968,18,3
1,New Delhi,28.63095,77.21721,18,3
2,Palika Bazaar,28.63156,77.21959,18,3
3,Raisina Hill,28.6184,77.215481,18,3
4,Chawri Bazar,28.64858,77.23071,20,3
5,Paharganj,28.64596,77.21492,21,3
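
The competition levels above are assigned in the order in which the cluster labels appear in the sorted table. A minimal sketch of deriving the same mapping explicitly, by summarising the count range of each cluster, is given below. The sample rows are copied from the cluster tables above; the `summary` frame and the `competition` column name are hypothetical.

```python
import pandas as pd

# A few rows copied from the cluster tables above.
sample = pd.DataFrame({
    "Neighbourhood": ["Yamuna Vihar", "Laxmi Nagar", "Vasant Kunj", "Kalkaji",
                      "Saket", "Lajpat Nagar", "Hauz Khas", "Safdarjung",
                      "Connaught Place", "Paharganj"],
    "Total number of Indian Restaurants": [0, 0, 3, 4, 7, 10, 11, 12, 18, 21],
    "Cluster Labels": [1, 1, 4, 4, 0, 0, 2, 2, 3, 3],
})

# Smallest and largest count per cluster, ordered from sparsest to densest.
summary = (
    sample.groupby("Cluster Labels")["Total number of Indian Restaurants"]
    .agg(["min", "max", "count"])
    .sort_values("min")
)
levels = ["low", "medium-low", "medium", "medium-high", "high"]
summary["competition"] = levels[: len(summary)]
print(summary)
```

Applied to the full `vm` table, such a summary makes the boundaries between the competition levels visible and does not rely on the row order of the data.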
