## Business problem
Chennai has a wide variety of food lovers, so a well-placed department store is an attractive business for interested investors. The city centre is densely populated and already has many stores, but they are expensive for the average shopper. Our problem is to find the most suitable location in Chennai, preferably in the South and East part, which is home to many middle-class households whose residents commute to other parts of the city for work.

Opening a department store in such a location would serve residents well and offer stakeholders a good return, since commuters returning from work and local households would shop there for low-cost goods. Our task is to identify a suitable location by gathering neighbourhood and venue data from our data sources, cleaning and analysing it, and grouping the neighbourhoods with K-means clustering.


We import the required libraries for use in this data exploration.

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

!pip install geocoder
!pip install folium
import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

After the initial load and cleanup, including column renaming, the head of the dataset looks like this.

In [2]:
URL = "https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Chennai"
tables = pd.read_html(URL,match="Area")
ch_df = pd.DataFrame(tables[0], columns=["Area", "Location", "Latitude", "Longitude"])
ch_df.rename(columns={"Area":"Neighborhood", "Location":"Borough"}, inplace=True)
ch_df.head()

Unnamed: 0,Neighborhood,Borough,Latitude,Longitude
0,Adambakkam,South and East Chennai,12.988,80.2047
1,Adyar,South and East Chennai,13.0012,80.2565
2,Alandur,South and East Chennai,12.9975,80.2006
3,Alapakkam,West Chennai,13.049,80.1673
4,Alwarthirunagar,West Chennai,13.0426,80.184


Using the Nominatim geocoder, we get the latitude and longitude of Chennai.

In [3]:
address = 'Chennai'

geolocator = Nominatim(user_agent="ch-explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chennai are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chennai are 13.0836939, 80.270186.
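
`geolocator.geocode` returns `None` when Nominatim finds no match, in which case reading `.latitude` would fail. A minimal sketch of a guard follows, assuming a hypothetical helper name `coordinates_or_default` and a caller-supplied fallback. The stub geocoder stands in for the real one so the example runs offline.

```python
from types import SimpleNamespace

def coordinates_or_default(geocode, address, default):
    """Return (latitude, longitude) for address, or default if nothing is found.

    geocode is any callable shaped like geolocator.geocode.
    coordinates_or_default is a hypothetical helper, not part of geopy.
    """
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)

# Stub standing in for Nominatim so the sketch runs without network access.
def fake_geocode(address):
    if address == "Chennai":
        return SimpleNamespace(latitude=13.0836939, longitude=80.270186)
    return None

found = coordinates_or_default(fake_geocode, "Chennai", default=(0.0, 0.0))
missing = coordinates_or_default(fake_geocode, "Nowhere", default=(0.0, 0.0))
```

In the notebook, the first argument would be `geolocator.geocode` itself.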


A folium map of Chennai with all the neighbourhoods marked looks like this.

In [4]:
map_ch = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(ch_df['Latitude'], ch_df['Longitude'], ch_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_ch)  
    
map_ch

Our interest is in South and East Chennai, so we filter the data to that borough.

In [5]:
sec_data = ch_df[ch_df['Borough'] == 'South and East Chennai'].reset_index(drop=True)
sec_data = sec_data[['Borough','Neighborhood','Latitude','Longitude']]
sec_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,South and East Chennai,Adambakkam,12.988,80.2047
1,South and East Chennai,Adyar,13.0012,80.2565
2,South and East Chennai,Alandur,12.9975,80.2006
3,South and East Chennai,Besant Nagar,13.0003,80.2667
4,South and East Chennai,Chetpet,13.0714,80.2417


Next we set up the Foursquare credentials and limit each query to the top 100 venues within a 500 m radius.

In [6]:
# define Foursquare Credentials and Version
CLIENT_ID = 'JB0WXWMLJYDUJ2O3ZTF2UEUAINWKV2X44GLOL12YD0LRV15Z'  # your Foursquare ID
CLIENT_SECRET = '4ADSZPWXMIKB5IQNO0ZYNAFV42KBWNW0CODBA5Y3VKTPRUGG'  # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
radius = 500
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: JB0WXWMLJYDUJ2O3ZTF2UEUAINWKV2X44GLOL12YD0LRV15Z
CLIENT_SECRET:4ADSZPWXMIKB5IQNO0ZYNAFV42KBWNW0CODBA5Y3VKTPRUGG
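
Hard-coding the client ID and secret in a notebook, and printing them, exposes them to anyone who sees the file. A minimal sketch of one alternative is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices for illustration.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from env, or raise if either is missing.

    The environment variable names below are assumptions, not a Foursquare convention.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret
```

With this in place, the cell could assign `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and skip printing the values.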


Define the function getNearbyVenues, which collects the venues near each neighbourhood.

In [7]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
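
The list comprehension above indexes `v['venue']['categories'][0]`, which would raise an `IndexError` if a venue came back with an empty category list. A hedged sketch of a more defensive row builder follows, assuming the same response shape as above. The helper name `venue_rows` and the `"Unknown"` placeholder are illustrative choices, not part of the Foursquare API.

```python
def venue_rows(name, lat, lng, items):
    """Build one tuple per venue, tolerating venues without a category.

    items is the list found at response['groups'][0]['items'].
    """
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # "Unknown" is an assumed placeholder for venues with no category
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((
            name,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Small hand-written sample in the same shape as the API items
sample_items = [
    {"venue": {"name": "Ibaco",
               "location": {"lat": 12.988729, "lng": 80.205646},
               "categories": [{"name": "Dessert Shop"}]}},
    {"venue": {"name": "Uncategorised place",
               "location": {"lat": 12.9881, "lng": 80.2049},
               "categories": []}},
]
rows = venue_rows("Adambakkam", 12.988, 80.2047, sample_items)
```

Inside `getNearbyVenues`, the comprehension could then be replaced by `venues_list.append(venue_rows(name, lat, lng, results))`.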

We fetch the nearby venues for the area of interest (South and East Chennai) with the code below.

In [8]:
chennai_venues = getNearbyVenues(names=sec_data['Neighborhood'],
                                   latitudes=sec_data['Latitude'],
                                   longitudes=sec_data['Longitude']
                                  )

Adambakkam
Adyar
Alandur
Besant Nagar
Chetpet
Egmore
Gopalapuram
Guindy
Hastinapuram
Injambakkam
Irumbuliyur
Kadaperi
Keelkattalai
Kolappakkam
Kottivakkam
Kovilambakkam
Madipakkam
Mambakkam
Medavakkam
Mudichur
Mylapore
Nagalkeni
Nanganallur
Neelankarai
Palavakkam
Pallavaram
Pallikaranai
Pammal
Pazhavanthangal
Peerkankaranai
Perungalathur
Perungudi
Pozhichalur
Saidapet
Selaiyur
Sholinganallur
T. Nagar
Tambaram
Taramani
Teynampet
Thiruvanmiyur
Thoraipakkam
Thousand Lights
Triplicane
Vandalur
Varadharajapuram
Velachery
West Mambalam


A quick look at the head of the dataset shows the following.

In [9]:
chennai_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Adambakkam,12.988,80.2047,Venkateshwara Super Market,12.98632,80.205168,Department Store
1,Adambakkam,12.988,80.2047,Ibaco,12.988729,80.205646,Dessert Shop
2,Adambakkam,12.988,80.2047,Deepam Restaurant,12.98538,80.205281,Indian Restaurant
3,Adambakkam,12.988,80.2047,Shreeji Foods,12.985735,80.20253,Fast Food Restaurant
4,Adambakkam,12.988,80.2047,ibaco Adambakkam,12.987358,80.200504,Ice Cream Shop


Grouping by neighbourhood shows the number of venues returned for each one.

In [10]:
chennai_venues.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adambakkam,5,5,5,5,5,5
Adyar,16,16,16,16,16,16
Alandur,6,6,6,6,6,6
Besant Nagar,20,20,20,20,20,20
Chetpet,12,12,12,12,12,12
Egmore,11,11,11,11,11,11
Gopalapuram,33,33,33,33,33,33
Guindy,16,16,16,16,16,16
Hastinapuram,4,4,4,4,4,4
Injambakkam,5,5,5,5,5,5


Counting the unique values in the Venue Category column, as returned by the Foursquare API, gives the following.

In [11]:
print('There are {} unique categories.'.format(len(chennai_venues['Venue Category'].unique())))

There are 100 unique categories.


In [12]:
chennai_venues['Venue Category'].unique()

array(['Department Store', 'Dessert Shop', 'Indian Restaurant',
       'Fast Food Restaurant', 'Ice Cream Shop', 'Café', 'Pizza Place',
       'Middle Eastern Restaurant', 'Asian Restaurant', 'Sandwich Place',
       'Electronics Store', 'Restaurant', 'Arcade', 'Bus Station',
       'Breakfast Spot', 'Train Station', 'Metro Station', 'Juice Bar',
       'Bistro', 'Beach', 'Chinese Restaurant', 'Italian Restaurant',
       'Burger Joint', 'Coffee Shop', 'Snack Place',
       'Herbs & Spices Store', 'Concert Hall', 'Bakery',
       'Performing Arts Venue', "Women's Store", 'Thai Restaurant',
       'Farmers Market', 'Motel', 'Movie Theater', 'Theater',
       'Shopping Mall', 'Vegetarian / Vegan Restaurant', 'Pool Hall',
       'Bar', 'Hotel', 'African Restaurant', 'Tea Room',
       'South Indian Restaurant', 'Lounge', 'Russian Restaurant', 'Bank',
       'Mexican Restaurant', 'Arts & Crafts Store', 'Whisky Bar',
       'Gym Pool', 'Golf Course', 'Spa', 'Athletics & Sports',
       'Mol

In [13]:
"Department Store" in chennai_venues['Venue Category'].unique()

True

Now it's time to analyze each neighborhood using one-hot encoding.

In [14]:
# Analyzing each neighborhood
# one hot encoding
ch_onehot = pd.get_dummies(chennai_venues[['Venue Category']], prefix="", prefix_sep="")

# add area column back to dataframe
ch_onehot['Neighborhood'] = chennai_venues['Neighborhood'] 

# move area column to the first column
fixed_columns = [ch_onehot.columns[-1]] + list(ch_onehot.columns[:-1])
ch_onehot = ch_onehot[fixed_columns]

print(ch_onehot.shape)
ch_onehot.head()

(355, 101)


Unnamed: 0,Neighborhood,ATM,Afghan Restaurant,African Restaurant,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bakery,Bank,Bar,Beach,Bistro,Boarding House,Boutique,Bowling Alley,Breakfast Spot,Buffet,Burger Joint,Bus Station,Cafeteria,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food Court,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Golf Course,Grocery Store,Gym Pool,Herbs & Spices Store,Hotel,Hyderabadi Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Curry Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Lounge,Luggage Store,Market,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Motel,Motorcycle Shop,Movie Theater,Music Store,Office,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Pool,Pool Hall,Portuguese Restaurant,Resort,Restaurant,Russian Restaurant,Sandwich Place,Seafood Restaurant,Shoe Repair,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Stadium,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar,Women's Store
0,Adambakkam,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Adambakkam,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Adambakkam,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Adambakkam,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Adambakkam,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Grouping the rows by neighbourhood, taking the mean of each category column, and resetting the index gives the frequency of each category per neighbourhood.

In [15]:
ch_grouped = ch_onehot.groupby(["Neighborhood"]).mean().reset_index()

print(ch_grouped.shape)
ch_grouped.head()

(44, 101)


Unnamed: 0,Neighborhood,ATM,Afghan Restaurant,African Restaurant,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bakery,Bank,Bar,Beach,Bistro,Boarding House,Boutique,Bowling Alley,Breakfast Spot,Buffet,Burger Joint,Bus Station,Cafeteria,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food Court,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Golf Course,Grocery Store,Gym Pool,Herbs & Spices Store,Hotel,Hyderabadi Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Curry Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Lounge,Luggage Store,Market,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Motel,Motorcycle Shop,Movie Theater,Music Store,Office,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Pool,Pool Hall,Portuguese Restaurant,Resort,Restaurant,Russian Restaurant,Sandwich Place,Seafood Restaurant,Shoe Repair,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Stadium,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar,Women's Store
0,Adambakkam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Adyar,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alandur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0
3,Besant Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.1,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.15,0.2,0.0,0.05,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Chetpet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.0,0.0,0.083333,0.083333,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333
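
The mean of a one-hot column within a neighbourhood is simply that category's share of the neighbourhood's venues. A minimal plain-Python sketch makes the arithmetic explicit, using the five Adambakkam rows from the head of the venue table. The helper name `category_frequencies` is hypothetical.

```python
from collections import Counter, defaultdict

def category_frequencies(pairs):
    """Map each neighbourhood to {category: share of its venues}.

    pairs is an iterable of (neighbourhood, venue_category).
    """
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }

# The five Adambakkam venues shown earlier
pairs = [
    ("Adambakkam", "Department Store"),
    ("Adambakkam", "Dessert Shop"),
    ("Adambakkam", "Indian Restaurant"),
    ("Adambakkam", "Fast Food Restaurant"),
    ("Adambakkam", "Ice Cream Shop"),
]
freqs = category_frequencies(pairs)
```

Each of the five categories gets a share of 1/5, which matches the 0.2 entries in the Adambakkam row above.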


Let's print each neighborhood along with the top 5 most common venues in each of them.

In [16]:
num_top_venues = 5

for hood in ch_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = ch_grouped[ch_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adambakkam----
                  venue  freq
0        Ice Cream Shop   0.2
1     Indian Restaurant   0.2
2      Department Store   0.2
3          Dessert Shop   0.2
4  Fast Food Restaurant   0.2


----Adyar----
               venue  freq
0  Indian Restaurant  0.19
1        Pizza Place  0.12
2               Café  0.12
3   Asian Restaurant  0.06
4        Bus Station  0.06


----Alandur----
                venue  freq
0   Indian Restaurant  0.33
1       Train Station  0.33
2      Breakfast Spot  0.17
3       Metro Station  0.17
4  Italian Restaurant  0.00


----Besant Nagar----
                venue  freq
0   Indian Restaurant  0.20
1      Ice Cream Shop  0.15
2           Juice Bar  0.10
3              Bistro  0.10
4  Italian Restaurant  0.05


----Chetpet----
              venue  freq
0     Women's Store  0.08
1            Bakery  0.08
2       Coffee Shop  0.08
3  Department Store  0.08
4              Café  0.08


----Egmore----
               venue  freq
0               Café  0.09
1

First, let's write a function to sort the venues in descending order.

In [17]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
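
Note that when a neighbourhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with zero-frequency categories (Alandur's fifth entry above, Italian Restaurant at 0.00, is one such case). A hedged plain-Python sketch of a ranking that drops those padded entries follows. The helper name `top_nonzero_categories` is hypothetical and the alphabetical tie-break is an assumption.

```python
def top_nonzero_categories(freqs, num_top_venues):
    """Return up to num_top_venues category names with a frequency above zero.

    freqs maps category name to frequency. Ties are broken alphabetically.
    """
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, freq in ranked if freq > 0][:num_top_venues]

# Rounded Alandur frequencies from the printed output above
alandur = {
    "Indian Restaurant": 0.33,
    "Train Station": 0.33,
    "Breakfast Spot": 0.17,
    "Metro Station": 0.17,
    "Italian Restaurant": 0.00,
}
top = top_nonzero_categories(alandur, 10)
```

Here only four names are returned even though ten were requested, because only four categories actually occur in Alandur.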

Then create a new dataframe and display the top 10 venues for each neighbourhood.

In [18]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = ch_grouped['Neighborhood']

for ind in np.arange(ch_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ch_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adambakkam,Ice Cream Shop,Indian Restaurant,Department Store,Dessert Shop,Fast Food Restaurant,Women's Store,Farmers Market,Clothing Store,Coffee Shop,Concert Hall
1,Adyar,Indian Restaurant,Café,Pizza Place,Asian Restaurant,Middle Eastern Restaurant,Restaurant,Electronics Store,Sandwich Place,Fast Food Restaurant,Dessert Shop
2,Alandur,Train Station,Indian Restaurant,Breakfast Spot,Metro Station,Women's Store,Farm,Clothing Store,Coffee Shop,Concert Hall,Convenience Store
3,Besant Nagar,Indian Restaurant,Ice Cream Shop,Bistro,Juice Bar,Herbs & Spices Store,Snack Place,Café,Coffee Shop,Burger Joint,Beach
4,Chetpet,Women's Store,Bakery,Ice Cream Shop,Indian Restaurant,Department Store,Concert Hall,Coffee Shop,Café,Performing Arts Venue,Restaurant
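
The `indicators` list with a fallback to `th` works for ten columns but would label an 11th or 21st column incorrectly if `num_top_venues` were raised. A minimal sketch of a general ordinal helper follows. The name `ordinal` is hypothetical and this is one possible approach, not the notebook's method.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

# Column names equivalent to those built in the loop above
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
```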


Checking how many neighbourhoods have at least one department store gives the following.

In [19]:
len(ch_grouped[ch_grouped["Department Store"] > 0])

5

## Clustering the neighbourhoods

Using K-means clustering, we group the neighbourhoods into 3 clusters.

In [27]:
# set number of clusters
kclusters = 3

ch_clustering = ch_grouped.drop(columns=["Neighborhood"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(ch_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20]


array([1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 0, 2, 1, 2, 2, 0],
      dtype=int32)
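
A quick way to sanity-check the result is to count how many neighbourhoods fall under each label. A minimal sketch follows, using only the 20 labels printed above (not the full label array), with the standard library's `Counter`.

```python
from collections import Counter

# The first 20 labels exactly as printed by kmeans.labels_[0:20]
first_labels = [1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 0, 2, 1, 2, 2, 0]

label_counts = Counter(first_labels)
# Sort by label so the summary reads cluster 0, 1, 2
summary = sorted(label_counts.items())
```

In the notebook, passing `kmeans.labels_` to `Counter` would give the sizes of all three clusters.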

In [28]:

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

#ch_merged = chennai_venues
ch_merged = sec_data

# join the cluster labels and top venues onto sec_data, which holds latitude/longitude for each neighborhood
ch_merged = ch_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
# drop neighborhoods for which no venues were returned (they have no cluster label)
ch_merged = ch_merged.dropna()
ch_merged # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,South and East Chennai,Adambakkam,12.988,80.2047,1.0,Ice Cream Shop,Indian Restaurant,Department Store,Dessert Shop,Fast Food Restaurant,Women's Store,Farmers Market,Clothing Store,Coffee Shop,Concert Hall
1,South and East Chennai,Adyar,13.0012,80.2565,2.0,Indian Restaurant,Café,Pizza Place,Asian Restaurant,Middle Eastern Restaurant,Restaurant,Electronics Store,Sandwich Place,Fast Food Restaurant,Dessert Shop
2,South and East Chennai,Alandur,12.9975,80.2006,2.0,Train Station,Indian Restaurant,Breakfast Spot,Metro Station,Women's Store,Farm,Clothing Store,Coffee Shop,Concert Hall,Convenience Store
3,South and East Chennai,Besant Nagar,13.0003,80.2667,2.0,Indian Restaurant,Ice Cream Shop,Bistro,Juice Bar,Herbs & Spices Store,Snack Place,Café,Coffee Shop,Burger Joint,Beach
4,South and East Chennai,Chetpet,13.0714,80.2417,1.0,Women's Store,Bakery,Ice Cream Shop,Indian Restaurant,Department Store,Concert Hall,Coffee Shop,Café,Performing Arts Venue,Restaurant
5,South and East Chennai,Egmore,13.0732,80.2609,1.0,Farmers Market,Indian Restaurant,Café,Pool Hall,Motel,Asian Restaurant,Shopping Mall,Movie Theater,Vegetarian / Vegan Restaurant,Thai Restaurant
6,South and East Chennai,Gopalapuram,13.0489,80.2586,2.0,Indian Restaurant,Hotel,Chinese Restaurant,Café,Sandwich Place,Mexican Restaurant,Concert Hall,Lounge,Dessert Shop,Juice Bar
7,South and East Chennai,Guindy,13.0067,80.2206,1.0,Indian Restaurant,Gym Pool,Molecular Gastronomy Restaurant,Golf Course,Concert Hall,Lounge,Italian Restaurant,Athletics & Sports,Asian Restaurant,Bakery
8,South and East Chennai,Hastinapuram,12.9387,80.1461,1.0,Bakery,Burger Joint,Department Store,Smoke Shop,Women's Store,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop
9,South and East Chennai,Injambakkam,12.9198,80.2511,1.0,Café,Burger Joint,Art Gallery,Art Museum,Farm,Farmers Market,Clothing Store,Coffee Shop,Concert Hall,Convenience Store


In [29]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(ch_merged['Latitude'], ch_merged['Longitude'], ch_merged['Neighborhood'], ch_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examining the clusters
The first cluster, cluster 0, contains the following neighbourhoods.

In [30]:
ch_merged.loc[ch_merged['Cluster Labels'] == 0, ch_merged.columns[[1] + list(range(5, ch_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Kovilambakkam,ATM,Bus Station,Herbs & Spices Store,Farm,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop
21,Nagalkeni,ATM,Farmers Market,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop
32,Pozhichalur,ATM,Farmers Market,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop
34,Selaiyur,ATM,Burger Joint,Farmers Market,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop


Cluster 1 contains the following neighbourhoods.

In [31]:
ch_merged.loc[ch_merged['Cluster Labels'] == 1, ch_merged.columns[[1] + list(range(5, ch_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adambakkam,Ice Cream Shop,Indian Restaurant,Department Store,Dessert Shop,Fast Food Restaurant,Women's Store,Farmers Market,Clothing Store,Coffee Shop,Concert Hall
4,Chetpet,Women's Store,Bakery,Ice Cream Shop,Indian Restaurant,Department Store,Concert Hall,Coffee Shop,Café,Performing Arts Venue,Restaurant
5,Egmore,Farmers Market,Indian Restaurant,Café,Pool Hall,Motel,Asian Restaurant,Shopping Mall,Movie Theater,Vegetarian / Vegan Restaurant,Thai Restaurant
7,Guindy,Indian Restaurant,Gym Pool,Molecular Gastronomy Restaurant,Golf Course,Concert Hall,Lounge,Italian Restaurant,Athletics & Sports,Asian Restaurant,Bakery
8,Hastinapuram,Bakery,Burger Joint,Department Store,Smoke Shop,Women's Store,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop
9,Injambakkam,Café,Burger Joint,Art Gallery,Art Museum,Farm,Farmers Market,Clothing Store,Coffee Shop,Concert Hall,Convenience Store
10,Irumbuliyur,Coffee Shop,Asian Restaurant,Juice Bar,Restaurant,Women's Store,Farmers Market,Clothing Store,Concert Hall,Convenience Store,Cosmetics Shop
12,Keelkattalai,Bus Station,Electronics Store,Pizza Place,Fried Chicken Joint,Fast Food Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Convenience Store
14,Kottivakkam,Ice Cream Shop,Indian Restaurant,Arcade,Cosmetics Shop,Boarding House,Fast Food Restaurant,Women's Store,Farmers Market,Coffee Shop,Concert Hall
17,Mambakkam,IT Services,Stadium,Snack Place,Cafeteria,Restaurant,Women's Store,Electronics Store,Chinese Restaurant,Clothing Store,Coffee Shop


Cluster 2 contains the following neighbourhoods.

In [32]:
ch_merged.loc[ch_merged['Cluster Labels'] == 2, ch_merged.columns[[1] + list(range(5, ch_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Adyar,Indian Restaurant,Café,Pizza Place,Asian Restaurant,Middle Eastern Restaurant,Restaurant,Electronics Store,Sandwich Place,Fast Food Restaurant,Dessert Shop
2,Alandur,Train Station,Indian Restaurant,Breakfast Spot,Metro Station,Women's Store,Farm,Clothing Store,Coffee Shop,Concert Hall,Convenience Store
3,Besant Nagar,Indian Restaurant,Ice Cream Shop,Bistro,Juice Bar,Herbs & Spices Store,Snack Place,Café,Coffee Shop,Burger Joint,Beach
6,Gopalapuram,Indian Restaurant,Hotel,Chinese Restaurant,Café,Sandwich Place,Mexican Restaurant,Concert Hall,Lounge,Dessert Shop,Juice Bar
11,Kadaperi,Indian Restaurant,Jewelry Store,Restaurant,Women's Store,Farm,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Convenience Store
16,Madipakkam,Indian Restaurant,Grocery Store,Coffee Shop,Department Store,Women's Store,Farm,Chinese Restaurant,Clothing Store,Concert Hall,Convenience Store
18,Medavakkam,Vegetarian / Vegan Restaurant,Chinese Restaurant,Pizza Place,Indian Restaurant,Women's Store,Farm,Clothing Store,Coffee Shop,Concert Hall,Convenience Store
20,Mylapore,Vegetarian / Vegan Restaurant,Park,Buffet,Indian Restaurant,Fried Chicken Joint,Flea Market,Clothing Store,Music Store,Hotel,Arts & Crafts Store
22,Nanganallur,Indian Restaurant,Bakery,Grocery Store,Jewelry Store,Dessert Shop,Park,Farm,Chinese Restaurant,Clothing Store,Coffee Shop
23,Neelankarai,Indian Restaurant,Chinese Restaurant,Coffee Shop,American Restaurant,Pizza Place,Department Store,Café,Beach,Farm,Clothing Store


Because kclusters is 3, the labels run from 0 to 2, so filtering on label 3 returns an empty dataset.

In [33]:
ch_merged.loc[ch_merged['Cluster Labels'] == 3, ch_merged.columns[[1] + list(range(5, ch_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


Comparing the clusters, cluster 0 has the fewest large department stores, so the neighbourhoods in that cluster are candidates for opening a new one.
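
One way to back this comparison with numbers is to compute, for each cluster, the share of neighbourhoods that have at least one department store. The sketch below is hedged: the helper name `department_store_share` is hypothetical, and the input values are made-up toy numbers, not results from this notebook.

```python
from collections import defaultdict

def department_store_share(rows):
    """Return {cluster: share of its neighbourhoods with a department store}.

    rows is an iterable of (cluster_label, department_store_frequency),
    one pair per neighbourhood.
    """
    totals = defaultdict(int)
    with_store = defaultdict(int)
    for cluster, freq in rows:
        totals[cluster] += 1
        if freq > 0:
            with_store[cluster] += 1
    return {c: with_store[c] / totals[c] for c in totals}

# Made-up toy values for illustration only; not taken from the notebook's results.
toy_rows = [(0, 0.0), (0, 0.0), (1, 0.2), (1, 0.0),
            (2, 0.1), (2, 0.0), (2, 0.0), (2, 0.25)]
shares = department_store_share(toy_rows)
```

In the notebook, the rows could plausibly be built with `zip(kmeans.labels_, ch_grouped["Department Store"])`, since the labels are in the same order as the rows of `ch_grouped`.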