In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy -y
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 -y
import folium # map rendering library

! pip install lxml html5lib beautifulsoup4

print('Libraries imported.')





# Capstone Project - The Battle of the Neighborhoods
### Course: Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, the objective is to find a set of hot-spot locations for a hang-out themed snack bar. This report is targeted at stakeholders who are interested in opening a chain of **hang-out themed snack bars** in **Chennai**, India.

A 'hang-out themed snack bar' is defined here as a place where people unwind in the evenings, on weekends, or after a long day at work, school, or college, with tasty snacks (light evening food) and a fun time with friends and family. The place is designed as an alternative for the same crowd that visits any of the following venues: cafe bar, coffee bar, snack bar, dessert bar, juice bar, ice-cream parlour, pizza bar, fast food center, sandwich shop, bakery, chaat corner, food truck, tea stall. We will try to detect **locations that have demand for such hang-out places in Chennai city**. The rationale is to come up with a list of potential locations that already have venues providing similar experiences, which indicates that there is demand (an audience) for these **hang-out themed snack bars**. The new hang-out themed snack bars could then serve as a viable alternative for these segments, highlighting the **wholesome experience they bring**.

Using the Foursquare API, we will identify the hot-spot tier-1 locations (with high demand for these hang-out bars) and the next-best tier-2 locations (moderate demand) across Chennai. The focus is to first identify the hot regions (boroughs with high demand for these venues) in Chennai for setting up a chain of these **hang-out themed snack bars**, and then to provide the neighborhood clusters within them so that an informed decision can be made. In a real-world scenario, a deeper analysis would be needed that considers real-estate prices, rents, transport, and nearby landmarks (offices, schools, colleges). As part of this project we therefore narrow down the high-demand pockets within Chennai in which a chain of hang-out themed snack bars could potentially be opened.

## Data <a name="data"></a>

Based on the definition of our problem, the critical factor that will influence the decision is:
* the number of existing venues from this list (cafe bar, coffee bar, snack bar, dessert bar, juice bar, ice-cream parlour, pizza bar, fast food center, sandwich shop, bakery, chaat corner, food truck, tea stall) in each neighborhood and its corresponding region (borough)


The following data sources will be needed to extract/generate the required information:
* geospatial coordinates of the Chennai neighborhoods, along with their regions, from this wiki page: https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Chennai
* the number of restaurants, their type, and their location in every neighborhood, obtained using the **Foursquare API**


## Methodology <a name="methodology"></a>


Comment 1: Get the list of Chennai neighborhoods along with their corresponding regions from this wiki page and load it into a dataframe.


In [3]:
# Webpage url
url = 'https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Chennai'

# Extract all tables from the page
chennai_wiki = pd.read_html(url)

chennai_list = chennai_wiki[1] # get the table containing the neighborhoods and their regions
print('Chennai neighborhoods and their regions are loaded into the dataframe')
chennai_list.rename(columns={"Area":"Neighborhood","Location":"Region"},inplace=True)
chennai_list.head()



Chennai neighborhoods and their regions are loaded into the dataframe


Unnamed: 0,Neighborhood,Region,Latitude,Longitude
0,Adambakkam,South and East Chennai,12.988,80.2047
1,Adyar,South and East Chennai,13.0012,80.2565
2,Alandur,South and East Chennai,12.9975,80.2006
3,Alapakkam,West Chennai,13.049,80.1673
4,Alwarthirunagar,West Chennai,13.0426,80.184


In [4]:
print('The dataframe has {} regions and {} neighborhoods.'.format(
        len(chennai_list['Region'].unique()),
        chennai_list.shape[0]
    )
)

The dataframe has 8 regions and 176 neighborhoods.


In [5]:

# use the geolocator to get the coordinates of Chennai
address = 'Chennai, India'
geolocator = Nominatim(user_agent="chennai-nei")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chennai, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chennai, India are 13.0836939, 80.270186.
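
The city-centre coordinates returned by the geocoder can also serve as a sanity check on the neighborhood coordinates scraped from the wiki page. The following is a minimal sketch, not part of the original analysis: the helper names `haversine_km` and `flag_far_rows` and the 100 km threshold are assumptions chosen for illustration, and the sample rows are copied from tables in this notebook.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * np.arcsin(np.sqrt(a))

def flag_far_rows(df, centre_lat, centre_lon, max_km=100.0):
    """Return the rows whose coordinates lie more than max_km from the centre."""
    dist = haversine_km(df['Latitude'], df['Longitude'], centre_lat, centre_lon)
    return df.assign(distance_km=dist)[dist > max_km]

# Sample rows copied from tables in this notebook
sample = pd.DataFrame({
    'Neighborhood': ['Adambakkam', 'Adyar', 'Broadway'],
    'Latitude': [12.988, 13.0012, 19.2274],
    'Longitude': [80.2047, 80.2565, 72.9739],
})

# Centre coordinates as printed by the geocoder cell
suspect = flag_far_rows(sample, 13.0836939, 80.270186)
print(suspect[['Neighborhood', 'distance_km']])
```

Rows flagged this way are worth checking against the source page before they are sent to the venue search, because a wrong coordinate returns venues from a different place.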


Comment 2: Map of the Chennai neighborhoods

In [165]:
map_ch = folium.Map(location=[latitude, longitude], zoom_start=12)
# Add a marker for each neighborhood
for lat, lng, neighborhood in zip(chennai_list['Latitude'], chennai_list['Longitude'], chennai_list['Neighborhood']):
    label = folium.Popup('{}'.format(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_ch)
map_ch

In [166]:
CLIENT_ID = 'FYRYFNPWI233M4NHY1U4L3NKC4QIRK4YFRBQ00H3MT3CYUUZ' # your Foursquare ID
CLIENT_SECRET = 'JBY25GHOQYVR1BPJIHE4RIFSAVK5DB4K0VIUMM0V1UCEL2EQ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: FYRYFNPWI233M4NHY1U4L3NKC4QIRK4YFRBQ00H3MT3CYUUZ
CLIENT_SECRET: JBY25GHOQYVR1BPJIHE4RIFSAVK5DB4K0VIUMM0V1UCEL2EQ
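
The cell above stores the credentials in the notebook and prints them in full. One possible alternative, sketched here under assumptions, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical and not part of the original notebook.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the credentials from environment variables (hypothetical names)."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Usage with a stand-in mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'ABCD1234', 'FOURSQUARE_CLIENT_SECRET': 'WXYZ5678'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(client_id))
print('CLIENT_SECRET: ' + mask(client_secret))
```

With this approach the notebook can be shared without the secret values appearing in either the code or the cell output.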


Comment 3: Get the list of venues from the Foursquare API for each Chennai neighborhood

In [167]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
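
The function above assumes that every response contains at least one group and that every venue has at least one category. A more defensive variant could look like the sketch below. It relies only on the response structure that the function above already reads. The helper names `build_explore_params` and `parse_explore_items`, the fallback label `'Unknown'`, and the second sample venue are assumptions made for illustration.

```python
def build_explore_params(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Query parameters for the explore request, as a dict for requests.get(url, params=...)."""
    return {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }

def parse_explore_items(payload, name, lat, lng):
    """Turn one explore response into rows; tolerate missing groups or categories."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or [{}]
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0].get('name', 'Unknown')))  # 'Unknown' is a placeholder label
    return rows

# Sample payload: the first venue is taken from the output of this notebook,
# the second is a made-up venue without a category
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Ibaco', 'location': {'lat': 12.988729, 'lng': 80.205646},
               'categories': [{'name': 'Dessert Shop'}]}},
    {'venue': {'name': 'Uncategorised venue', 'location': {'lat': 12.99, 'lng': 80.20},
               'categories': []}},
]}]}}

rows = parse_explore_items(sample_payload, 'Adambakkam', 12.988, 80.2047)
params = build_explore_params('my-id', 'my-secret', '20180605', 12.988, 80.2047)
print(rows)
```

Passing the parameters as a dict (for example `requests.get('https://api.foursquare.com/v2/venues/explore', params=params, timeout=10)`) lets `requests` handle the URL encoding, and keeping the parsing in a separate function makes it testable without network access.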

In [168]:
# fetch the nearby venues for every neighborhood in the list
chennai_neig_venues = getNearbyVenues(names=chennai_list['Neighborhood'],
                                   latitudes=chennai_list['Latitude'],
                                   longitudes=chennai_list['Longitude']
                                  )
chennai_neig_venues

Adambakkam
Adyar
Alandur
Alapakkam
Alwarthirunagar
Ambattur
Aminjikarai
Anna Nagar
Annanur
Arumbakkam
Ashok Nagar
Avadi
Ayappakkam
Basin Bridge
Besant Nagar
Broadway
Central
Chetpet
Choolai
MMDA Colony
Defence Colony
Egmore
Ennore
Erukanchery
George Town
Gerugambakkam
Gopalapuram
Guindy
Hastinapuram
ICF Colony
Injambakkam
Irumbuliyur
Iyyapanthangal
Jamalia
K.K. Nagar
Kadaperi
Kallikuppam
Karambakkam
Kathirvedu
Kathivakkam
Keelkattalai
Kodungaiyur
Kolappakkam
Kolathur
Korattur
Korukkupet
Kosapet
Kottivakkam
Kovilambakkam
Koyambedu
Kundrathur
Lakshmipuram
M.G.R. Garden
M.G.R. Nagar
M.K.B. Nagar
Madhavaram
Madhavaram Milk Colony
Madipakkam
Maduravoyal
Mambakkam
Manali
Manali New Town
manapakkam
Mangadu
Manjambakkam
Mannadi
Mathur MMDA
Medavakkam
Minjur
Mogappair
Moolakadai
Mowlivakkam
Mudichur
Mugalivakkam
Mylapore
Nagalkeni
Nandambakkam
Nanganallur
Naravarikuppam
Neelankarai
Nerkundrum
Nesapakkam
New Washermenpet
Nolambur
Old Washermenpet
Oragadam
Otteri
Padi
Palavakkam
Pallavaram
Pallik

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Adambakkam,12.988,80.2047,Venkateshwara Super Market,12.98632,80.205168,Department Store
1,Adambakkam,12.988,80.2047,Ibaco,12.988729,80.205646,Dessert Shop
2,Adambakkam,12.988,80.2047,Deepam Restaurant,12.98538,80.205281,Indian Restaurant
3,Adambakkam,12.988,80.2047,Kings Gym Unisex Fitness Centre,12.991093,80.206011,Gym
4,Adambakkam,12.988,80.2047,ibaco Adambakkam,12.987358,80.200504,Ice Cream Shop


In [169]:
chennai_neig_venues.shape


(891, 7)

Comment 4: Merge the region and latitude/longitude values with the venue details obtained from Foursquare into a single dataframe

In [75]:
chennai_venues= chennai_neig_venues.join(chennai_list.set_index('Neighborhood'), on='Neighborhood')
chennai_venues = chennai_venues[ ['Neighborhood','Region'] + [ col for col in chennai_venues.columns if (col != 'Neighborhood' and col != 'Region') ] ]
chennai_venues.head()



Unnamed: 0,Neighborhood,Region,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Latitude,Longitude
0,Adambakkam,South and East Chennai,12.988,80.2047,Venkateshwara Super Market,12.98632,80.205168,Department Store,12.988,80.2047
1,Adambakkam,South and East Chennai,12.988,80.2047,Ibaco,12.988729,80.205646,Dessert Shop,12.988,80.2047
2,Adambakkam,South and East Chennai,12.988,80.2047,Deepam Restaurant,12.98538,80.205281,Indian Restaurant,12.988,80.2047
3,Adambakkam,South and East Chennai,12.988,80.2047,Kings Gym Unisex Fitness Centre,12.991093,80.206011,Gym,12.988,80.2047
4,Adambakkam,South and East Chennai,12.988,80.2047,ibaco Adambakkam,12.987358,80.200504,Ice Cream Shop,12.988,80.2047


Comment 5: Filter the dataframe to focus on the 'hangout' venues by keeping only the columns whose names match the pattern Neighborhood|Region|Cafe|Coffee|Snack|Dessert|Juice|Ice|Pizza|Fast|Sand|Bakery|Chaat|Truck|Tea

In [117]:
# one hot encoding
chennai_onehot = pd.get_dummies(chennai_venues[['Venue Category']], prefix="", prefix_sep="")
chennai_onehot['Neighborhood'] = chennai_venues['Neighborhood']
chennai_onehot['Region'] = chennai_venues['Region']
chennai_onehot = chennai_onehot[ ['Neighborhood'] +['Region'] + [ col for col in chennai_onehot.columns if ((col != 'Neighborhood') and (col != 'Region') ) ] ]

chennai_snack_venues=chennai_onehot.filter(regex='Neighborhood|Region|Cafe|Coffee|Snack|Dessert|Juice|Ice|Pizza|Fast|Sand|Bakery|Chaat|Truck|Tea')

chennai_snack_venues.head()


Unnamed: 0,Neighborhood,Region,Bakery,Cafeteria,Chaat Place,Coffee Shop,Dessert Shop,Fast Food Restaurant,Food Truck,Gaming Cafe,Ice Cream Shop,Juice Bar,Pizza Place,Sandwich Place,Snack Place,Tea Room
0,Adambakkam,South and East Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Adambakkam,South and East Chennai,0,0,0,0,1,0,0,0,0,0,0,0,0,0
2,Adambakkam,South and East Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Adambakkam,South and East Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Adambakkam,South and East Chennai,0,0,0,0,0,0,0,0,1,0,0,0,0,0
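
The `filter(regex=...)` call above keeps every column whose name contains one of the alternatives anywhere in the name, and the match is case-sensitive. The sketch below illustrates this on a toy frame. The column `'Breakfast Spot'` and `'Gym'` are included only for illustration; the other column names come from the output above.

```python
import pandas as pd

pattern = 'Neighborhood|Region|Cafe|Coffee|Snack|Dessert|Juice|Ice|Pizza|Fast|Sand|Bakery|Chaat|Truck|Tea'

# Toy one-hot frame with a single row of zeros
cols = ['Neighborhood', 'Region', 'Cafeteria', 'Gaming Cafe',
        'Ice Cream Shop', 'Breakfast Spot', 'Gym', 'Tea Room']
toy = pd.DataFrame(0, index=[0], columns=cols)

kept = list(toy.filter(regex=pattern).columns)
dropped = [c for c in toy.columns if c not in kept]
print('kept:   ', kept)
print('dropped:', dropped)
```

Both 'Cafeteria' and 'Gaming Cafe' are kept because they contain 'Cafe', while 'Breakfast Spot' is dropped because 'Fast' does not match the lowercase 'fast'. If substring matches like these are not wanted, an explicit list of category names is a stricter alternative.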


Comment 6: Group by region and identify the 'most happening' regions. This shows where the demand is and hence where hangouts can be established.

In [118]:
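# drop the text column before summing so that only the one-hot venue counts are aggregated
chennai_grouped = chennai_snack_venues.drop(columns='Neighborhood').groupby('Region').sum().reset_index()
chennai_grouped['Total'] = chennai_grouped.sum(axis=1, numeric_only=True)
chennai_grouped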

Unnamed: 0,Region,Bakery,Cafeteria,Chaat Place,Coffee Shop,Dessert Shop,Fast Food Restaurant,Food Truck,Gaming Cafe,Ice Cream Shop,Juice Bar,Pizza Place,Sandwich Place,Snack Place,Tea Room,Total
0,North Chennai,5,0,0,3,1,6,1,0,5,3,10,3,4,0,41
1,Northern Suburbs of Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,South and East Chennai,10,1,1,10,7,10,1,0,10,8,9,5,5,1,78
3,Southern-Eastern Suburbs of Chennai,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,Suburban Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Suburbs along ECR and OMR of Chennai,0,0,0,3,2,2,0,0,3,0,2,1,0,0,13
6,West Chennai,9,0,0,4,2,13,0,1,9,2,11,4,3,3,61
7,Western Suburbs of Chennai,0,0,0,0,0,0,0,0,0,0,1,0,0,2,3


Comment 7: Focus on 3 regions ('North', 'South and East' and 'West'), as these are the most happening regions

In [331]:
chennai_tar_north = chennai_snack_venues.loc[chennai_snack_venues['Region'] == 'North Chennai']
chennai_tar_south_east = chennai_snack_venues.loc[chennai_snack_venues['Region'] == 'South and East Chennai']
chennai_tar_west = chennai_snack_venues.loc[chennai_snack_venues['Region'] == 'West Chennai']

chennai_tar_north.head()

Unnamed: 0,Neighborhood,Region,Bakery,Cafeteria,Chaat Place,Coffee Shop,Dessert Shop,Fast Food Restaurant,Food Truck,Gaming Cafe,Ice Cream Shop,Juice Bar,Pizza Place,Sandwich Place,Snack Place,Tea Room
105,Basin Bridge,North Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0
127,Broadway,North Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0
128,Broadway,North Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0
129,Broadway,North Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0
130,Broadway,North Chennai,0,0,0,0,0,0,0,0,0,0,0,1,0,0
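
The choice of these three regions can be reproduced from the regional totals instead of being read off the table by eye. The sketch below copies the `Total` column from the regional summary table in this notebook into a Series; the variable names are illustrative.

```python
import pandas as pd

# Totals copied from the regional summary table in this notebook
region_totals = pd.Series({
    'North Chennai': 41,
    'Northern Suburbs of Chennai': 0,
    'South and East Chennai': 78,
    'Southern-Eastern Suburbs of Chennai': 1,
    'Suburban Chennai': 0,
    'Suburbs along ECR and OMR of Chennai': 13,
    'West Chennai': 61,
    'Western Suburbs of Chennai': 3,
}, name='Total')

# Three regions with the most hangout venues, and their share of all such venues
top_regions = region_totals.nlargest(3)
share = top_regions.sum() / region_totals.sum()
print(top_regions)
print('Share of all hangout venues: {:.1%}'.format(share))
```

In the notebook itself the same ranking could be obtained from `chennai_grouped` with `nlargest(3, 'Total')`.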


Comment 8: Create 'Total_hangouts', the sum of all the hangout places in each neighborhood of these 3 regions

In [332]:
# drop the text column before summing so that only the one-hot venue counts are aggregated
chennai_tar_north_grouped = chennai_tar_north.drop(columns='Region').groupby('Neighborhood').sum().reset_index()
chennai_tar_south_east_grouped = chennai_tar_south_east.drop(columns='Region').groupby('Neighborhood').sum().reset_index()
chennai_tar_west_grouped = chennai_tar_west.drop(columns='Region').groupby('Neighborhood').sum().reset_index()

chennai_tar_north_grouped['Total_hangouts'] = chennai_tar_north_grouped.sum(axis=1, numeric_only=True)
chennai_tar_south_east_grouped['Total_hangouts'] = chennai_tar_south_east_grouped.sum(axis=1, numeric_only=True)
chennai_tar_west_grouped['Total_hangouts'] = chennai_tar_west_grouped.sum(axis=1, numeric_only=True)


chennai_tar_north_hangouts = chennai_tar_north_grouped[['Neighborhood','Total_hangouts']]
chennai_tar_south_east_hangouts = chennai_tar_south_east_grouped[['Neighborhood','Total_hangouts']]
chennai_tar_west_hangouts = chennai_tar_west_grouped[['Neighborhood','Total_hangouts']]
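
The step from one-hot rows (one row per venue) to a per-neighborhood total can be illustrated on a toy frame. This is a minimal sketch with made-up neighborhoods 'A' and 'B'; it selects the venue columns explicitly so that the text columns never enter the sums.

```python
import pandas as pd

# Toy one-hot rows: one row per venue; a row of zeros is a non-hangout venue
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Region': ['X', 'X', 'X', 'X'],
    'Bakery': [1, 0, 0, 0],
    'Coffee Shop': [0, 1, 0, 0],
})

venue_cols = [c for c in toy.columns if c not in ('Neighborhood', 'Region')]
totals = (toy.groupby('Neighborhood')[venue_cols].sum()  # venues per category
             .sum(axis=1)                                # all categories combined
             .rename('Total_hangouts')
             .reset_index())
print(totals)
```

A neighborhood whose venues are all outside the hangout categories, such as 'B' here, still appears with a total of 0, which is why the cluster tables in this notebook contain neighborhoods with zero hangouts.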



Comment 9: Cluster the neighborhoods of each region separately (one k-means model per region), then identify the neighborhoods in each region with the most hangouts. This gives a better decision point for identifying the 'demand' for such places.

In [220]:
# set number of clusters
kclusters = 5

chennai_tar_north_hangouts_clusters = chennai_tar_north_hangouts.drop(columns='Neighborhood')
chennai_tar_south_east_hangouts_clusters = chennai_tar_south_east_hangouts.drop(columns='Neighborhood')
chennai_tar_west_hangouts_clusters = chennai_tar_west_hangouts.drop(columns='Neighborhood')

# run k-means clustering
kmeans_north = KMeans(n_clusters=kclusters, random_state=0).fit(chennai_tar_north_hangouts_clusters)
kmeans_south_east = KMeans(n_clusters=kclusters, random_state=0).fit(chennai_tar_south_east_hangouts_clusters)
kmeans_west = KMeans(n_clusters=kclusters, random_state=0).fit(chennai_tar_west_hangouts_clusters)
# check cluster labels generated for each row in the dataframe
kmeans_north.labels_[0:48]
kmeans_south_east.labels_[0:46]
kmeans_west.labels_[0:41]

# add clustering labels
columns = ['Neighborhood','Total_hangouts']

neighborhoods_venues_sorted_north = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_south_east = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_west = pd.DataFrame(columns=columns)

neighborhoods_venues_sorted_north['Neighborhood'] = chennai_tar_north_hangouts['Neighborhood']
neighborhoods_venues_sorted_north['Total_hangouts'] = chennai_tar_north_hangouts['Total_hangouts']
neighborhoods_venues_sorted_south_east['Neighborhood'] = chennai_tar_south_east_hangouts['Neighborhood']
neighborhoods_venues_sorted_south_east['Total_hangouts'] = chennai_tar_south_east_hangouts['Total_hangouts']
neighborhoods_venues_sorted_west['Neighborhood'] = chennai_tar_west_hangouts['Neighborhood']
neighborhoods_venues_sorted_west['Total_hangouts'] = chennai_tar_west_hangouts['Total_hangouts']

neighborhoods_venues_sorted_north.insert(0, 'Cluster Labels', kmeans_north.labels_)
neighborhoods_venues_sorted_south_east.insert(0, 'Cluster Labels', kmeans_south_east.labels_)
neighborhoods_venues_sorted_west.insert(0, 'Cluster Labels', kmeans_west.labels_)

chennai_tar_north_hangouts_merged = chennai_list.loc[chennai_list['Region'] == 'North Chennai']
chennai_tar_south_east_hangouts_merged = chennai_list.loc[chennai_list['Region'] == 'South and East Chennai']
chennai_tar_west_hangouts_merged = chennai_list.loc[chennai_list['Region'] == 'West Chennai']

chennai_tar_north_hangouts_merged = chennai_tar_north_hangouts_merged.join(neighborhoods_venues_sorted_north.set_index('Neighborhood'), on='Neighborhood',how='right')
chennai_tar_south_east_hangouts_merged = chennai_tar_south_east_hangouts_merged.join(neighborhoods_venues_sorted_south_east.set_index('Neighborhood'), on='Neighborhood',how='right')
chennai_tar_west_hangouts_merged = chennai_tar_west_hangouts_merged.join(neighborhoods_venues_sorted_west.set_index('Neighborhood'), on='Neighborhood',how='right')

chennai_tar_north_hangouts_merged = chennai_tar_north_hangouts_merged[ ['Neighborhood'] + [ col for col in chennai_tar_north_hangouts_merged.columns if col != 'Neighborhood' ] ]
chennai_tar_south_east_hangouts_merged = chennai_tar_south_east_hangouts_merged[ ['Neighborhood'] + [ col for col in chennai_tar_south_east_hangouts_merged.columns if col != 'Neighborhood' ] ]
chennai_tar_west_hangouts_merged = chennai_tar_west_hangouts_merged[ ['Neighborhood'] + [ col for col in chennai_tar_west_hangouts_merged.columns if col != 'Neighborhood' ] ]
print(chennai_tar_north_hangouts_merged.shape,chennai_tar_south_east_hangouts_merged.shape, chennai_tar_west_hangouts_merged.shape)
chennai_tar_north_hangouts_merged

(48, 6) (46, 6) (41, 6)


Unnamed: 0,Neighborhood,Region,Latitude,Longitude,Cluster Labels,Total_hangouts
13,Basin Bridge,North Chennai,13.1014,80.2704,0,0
15,Broadway,North Chennai,19.2274,72.9739,2,8
16,Central,North Chennai,13.0825,80.2755,3,1
18,Choolai,North Chennai,13.0919,80.2642,3,1
22,Ennore,North Chennai,13.2146,80.3203,0,0
24,George Town,North Chennai,13.0969,80.2865,3,1
33,Jamalia,North Chennai,13.1048,80.2533,0,0
39,Kathivakkam,North Chennai,13.2046,80.31674,0,0
41,Kodungaiyur,North Chennai,13.14096,80.24818,0,0
43,Kolathur,North Chennai,13.124,80.2121,0,0


In [221]:
# create map
map_clusters_north = folium.Map(location=[13.0827, 80.2707], zoom_start=11)
map_clusters_south_east = folium.Map(location=[13.0827, 80.2707], zoom_start=11)
map_clusters_west = folium.Map(location=[13.0827, 80.2707], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(chennai_tar_north_hangouts_merged['Latitude'], chennai_tar_north_hangouts_merged['Longitude'], chennai_tar_north_hangouts_merged['Neighborhood'], chennai_tar_north_hangouts_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_north)
    
for lat, lon, poi, cluster in zip(chennai_tar_south_east_hangouts_merged['Latitude'], chennai_tar_south_east_hangouts_merged['Longitude'], chennai_tar_south_east_hangouts_merged['Neighborhood'], chennai_tar_south_east_hangouts_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_south_east)
    
for lat, lon, poi, cluster in zip(chennai_tar_west_hangouts_merged['Latitude'], chennai_tar_west_hangouts_merged['Longitude'], chennai_tar_west_hangouts_merged['Neighborhood'], chennai_tar_west_hangouts_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_west)       


Comment 10: Create 3 cluster maps within these 3 regions

In [222]:
map_clusters_north


In [223]:
map_clusters_south_east


In [224]:
map_clusters_west


In [241]:
columns = ['Neighborhood','Total_hangouts']

chennai_north_cluster0 = pd.DataFrame(columns=columns)
chennai_north_cluster0=chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Cluster Labels'] == 0, chennai_tar_north_hangouts_merged.columns[[0] + list(range(5, chennai_tar_north_hangouts_merged.shape[1]))]]
chennai_north_cluster0

Unnamed: 0,Neighborhood,Total_hangouts
13,Basin Bridge,0
22,Ennore,0
33,Jamalia,0
39,Kathivakkam,0
41,Kodungaiyur,0
43,Kolathur,0
45,Korukkupet,0
51,Lakshmipuram,0
54,M.K.B. Nagar,0
55,Madhavaram,0


In [275]:
columns = ['Neighborhood','Total_hangouts']

chennai_north_cluster1 = pd.DataFrame(columns=columns)
chennai_north_cluster1=chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Cluster Labels'] == 1, chennai_tar_north_hangouts_merged.columns[[0] + list(range(5, chennai_tar_north_hangouts_merged.shape[1]))]]
chennai_north_cluster1

Unnamed: 0,Neighborhood,Total_hangouts
113,Royapuram,3


In [276]:
columns = ['Neighborhood','Total_hangouts']

chennai_north_cluster2 = pd.DataFrame(columns=columns)
chennai_north_cluster2=chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Cluster Labels'] == 2, chennai_tar_north_hangouts_merged.columns[[0] + list(range(5, chennai_tar_north_hangouts_merged.shape[1]))]]
chennai_north_cluster2

Unnamed: 0,Neighborhood,Total_hangouts
15,Broadway,8


In [244]:
columns = ['Neighborhood','Total_hangouts']

chennai_north_cluster3 = pd.DataFrame(columns=columns)
chennai_north_cluster3=chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Cluster Labels'] == 3, chennai_tar_north_hangouts_merged.columns[[0] + list(range(5, chennai_tar_north_hangouts_merged.shape[1]))]]
chennai_north_cluster3

Unnamed: 0,Neighborhood,Total_hangouts
16,Central,1
18,Choolai,1
24,George Town,1
46,Kosapet,1
65,Mannadi,1
78,Naravarikuppam,1
82,New Washermenpet,1
84,Old Washermenpet,1
86,Otteri,1
92,Park Town,1


In [245]:
columns = ['Neighborhood','Total_hangouts']

chennai_north_cluster4 = pd.DataFrame(columns=columns)
chennai_north_cluster4=chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Cluster Labels'] == 4 , chennai_tar_north_hangouts_merged.columns[[0] + list(range(5, chennai_tar_north_hangouts_merged.shape[1]))]]
chennai_north_cluster4

Unnamed: 0,Neighborhood,Total_hangouts
56,Madhavaram Milk Colony,2
70,Moolakadai,2
85,Oragadam,2
99,Perambur,2
123,T.V.K. Nagar,2


In [246]:
columns = ['Neighborhood','Total_hangouts']

chennai_south_east_cluster0 = pd.DataFrame(columns=columns)
chennai_south_east_cluster0=chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Cluster Labels'] == 0, chennai_tar_south_east_hangouts_merged.columns[[0] + list(range(5, chennai_tar_south_east_hangouts_merged.shape[1]))]]
chennai_south_east_cluster0

Unnamed: 0,Neighborhood,Total_hangouts
0,Adambakkam,2
27,Guindy,2
40,Keelkattalai,2
47,Kottivakkam,2
57,Madipakkam,3
67,Medavakkam,2
77,Nanganallur,3
79,Neelankarai,2
119,Sholinganallur,2
144,Velachery,3


In [247]:
columns = ['Neighborhood','Total_hangouts']

chennai_south_east_cluster1 = pd.DataFrame(columns=columns)
chennai_south_east_cluster1=chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Cluster Labels'] == 1, chennai_tar_south_east_hangouts_merged.columns[[0] + list(range(5, chennai_tar_south_east_hangouts_merged.shape[1]))]]
chennai_south_east_cluster1

Unnamed: 0,Neighborhood,Total_hangouts
2,Alandur,0
21,Egmore,0
28,Hastinapuram,0
30,Injambakkam,0
35,Kadaperi,0
48,Kovilambakkam,0
72,Mudichur,0
75,Nagalkeni,0
89,Pallavaram,0
97,Pazhavanthangal,0


In [248]:
columns = ['Neighborhood','Total_hangouts']

chennai_south_east_cluster2 = pd.DataFrame(columns=columns)
chennai_south_east_cluster2=chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Cluster Labels'] == 2, chennai_tar_south_east_hangouts_merged.columns[[0] + list(range(5, chennai_tar_south_east_hangouts_merged.shape[1]))]]
chennai_south_east_cluster2

Unnamed: 0,Neighborhood,Total_hangouts
1,Adyar,9
14,Besant Nagar,9
26,Gopalapuram,8


In [249]:
columns = ['Neighborhood','Total_hangouts']

chennai_south_east_cluster3 = pd.DataFrame(columns=columns)
chennai_south_east_cluster3=chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Cluster Labels'] == 3, chennai_tar_south_east_hangouts_merged.columns[[0] + list(range(5, chennai_tar_south_east_hangouts_merged.shape[1]))]]
chennai_south_east_cluster3

Unnamed: 0,Neighborhood,Total_hangouts
17,Chetpet,5
122,T. Nagar,5
125,Taramani,5
135,Thousand Lights,4


In [250]:
columns = ['Neighborhood','Total_hangouts']

chennai_south_east_cluster4 = pd.DataFrame(columns=columns)
chennai_south_east_cluster4=chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Cluster Labels'] == 4, chennai_tar_south_east_hangouts_merged.columns[[0] + list(range(5, chennai_tar_south_east_hangouts_merged.shape[1]))]]
chennai_south_east_cluster4

Unnamed: 0,Neighborhood,Total_hangouts
31,Irumbuliyur,1
59,Mambakkam,1
74,Mylapore,1
88,Palavakkam,1
91,Pammal,1
102,Perungudi,1
124,Tambaram,1
134,Thoraipakkam,1


In [251]:
columns = ['Neighborhood','Total_hangouts']

chennai_west_cluster0 = pd.DataFrame(columns=columns)
chennai_west_cluster0=chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Cluster Labels'] == 0, chennai_tar_west_hangouts_merged.columns[[0] + list(range(5, chennai_tar_west_hangouts_merged.shape[1]))]]
chennai_west_cluster0

Unnamed: 0,Neighborhood,Total_hangouts
5,Ambattur,0
8,Annanur,0
11,Avadi,0
20,Defence Colony,0
25,Gerugambakkam,0
37,Karambakkam,0
49,Koyambedu,0
19,MMDA Colony,0
58,Maduravoyal,0
63,Mangadu,0


In [252]:
columns = ['Neighborhood','Total_hangouts']

chennai_west_cluster1 = pd.DataFrame(columns=columns)
chennai_west_cluster1=chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Cluster Labels'] == 1, chennai_tar_west_hangouts_merged.columns[[0] + list(range(5, chennai_tar_west_hangouts_merged.shape[1]))]]
chennai_west_cluster1

Unnamed: 0,Neighborhood,Total_hangouts
34,K.K. Nagar,4
53,M.G.R. Nagar,3
81,Nesapakkam,3
139,Vadapalani,4


In [253]:
columns = ['Neighborhood','Total_hangouts']

chennai_west_cluster2 = pd.DataFrame(columns=columns)
chennai_west_cluster2=chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Cluster Labels'] == 2, chennai_tar_west_hangouts_merged.columns[[0] + list(range(5, chennai_tar_west_hangouts_merged.shape[1]))]]
chennai_west_cluster2

Unnamed: 0,Neighborhood,Total_hangouts
4,Alwarthirunagar,6
7,Anna Nagar,8
10,Ashok Nagar,6


In [254]:
columns = ['Neighborhood','Total_hangouts']

chennai_west_cluster3 = pd.DataFrame(columns=columns)
chennai_west_cluster3=chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Cluster Labels'] == 3, chennai_tar_west_hangouts_merged.columns[[0] + list(range(5, chennai_tar_west_hangouts_merged.shape[1]))]]
chennai_west_cluster3

Unnamed: 0,Neighborhood,Total_hangouts
3,Alapakkam,2
6,Aminjikarai,2
32,Iyyapanthangal,2
44,Korattur,2
50,Kundrathur,2
83,Nolambur,2
107,Pudur,2
115,Saligramam,2
62,manapakkam,2
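
The per-cluster cells above all repeat the same selection with a different region and label. As a hedged alternative, the selection could be wrapped in a small helper and applied in a loop. The function name `cluster_members` is hypothetical, and the sample rows are copied from the North Chennai table earlier in this notebook.

```python
import pandas as pd

def cluster_members(merged, label, columns=('Neighborhood', 'Total_hangouts')):
    """Rows of a merged frame that belong to one cluster, restricted to the given columns."""
    return merged.loc[merged['Cluster Labels'] == label, list(columns)]

# Rows copied from the North Chennai table earlier in this notebook
north = pd.DataFrame({
    'Neighborhood': ['Basin Bridge', 'Broadway', 'Central', 'Choolai'],
    'Cluster Labels': [0, 2, 3, 3],
    'Total_hangouts': [0, 8, 1, 1],
}, index=[13, 15, 16, 18])

# One small frame per cluster label, keyed by label
by_cluster = {label: cluster_members(north, label)
              for label in sorted(north['Cluster Labels'].unique())}
print(by_cluster[3])
```

Selecting the columns by name also avoids the positional `columns[[0] + list(range(5, ...))]` indexing, which silently depends on the column order of the merged frame.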


In [298]:
chennai_north_cluster1_neighborhoods=chennai_north_cluster1['Neighborhood'].tolist()
chennai_north_cluster2_neighborhoods=chennai_north_cluster2['Neighborhood'].tolist()
chennai_west_cluster1_neighborhoods=chennai_west_cluster1['Neighborhood'].tolist()
chennai_west_cluster2_neighborhoods=chennai_west_cluster2['Neighborhood'].tolist()
chennai_south_east_cluster2_neighborhoods=chennai_south_east_cluster2['Neighborhood'].tolist()
chennai_south_east_cluster3_neighborhoods=chennai_south_east_cluster3['Neighborhood'].tolist()
cols = ['Neighborhood', 'Region', 'Cluster Labels']

# copy the selections so that columns can be added without touching the merged frames
chennai_north_cluster1_neighborhood_loc = chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Neighborhood'].isin(chennai_north_cluster1_neighborhoods)].copy()
chennai_north_cluster2_neighborhood_loc = chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Neighborhood'].isin(chennai_north_cluster2_neighborhoods)].copy()
chennai_north_cluster1_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_north_cluster1_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_north_cluster1_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_north_cluster1_neighborhood_loc = chennai_north_cluster1_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_north_cluster1_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]
chennai_north_cluster2_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_north_cluster2_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_north_cluster2_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_north_cluster2_neighborhood_loc = chennai_north_cluster2_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_north_cluster2_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]

chennai_west_cluster1_neighborhood_loc = chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Neighborhood'].isin(chennai_west_cluster1_neighborhoods)]
chennai_west_cluster2_neighborhood_loc = chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Neighborhood'].isin(chennai_west_cluster2_neighborhoods)]
chennai_west_cluster1_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_west_cluster1_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_west_cluster1_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_west_cluster1_neighborhood_loc = chennai_west_cluster1_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_west_cluster1_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]
chennai_west_cluster2_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_west_cluster2_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_west_cluster2_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_west_cluster2_neighborhood_loc = chennai_west_cluster2_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_west_cluster2_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]

chennai_south_east_cluster2_neighborhood_loc = chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Neighborhood'].isin(chennai_south_east_cluster2_neighborhoods)]
chennai_south_east_cluster3_neighborhood_loc = chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Neighborhood'].isin(chennai_south_east_cluster3_neighborhoods)]
chennai_south_east_cluster2_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_south_east_cluster2_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_south_east_cluster2_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_south_east_cluster2_neighborhood_loc = chennai_south_east_cluster2_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_south_east_cluster2_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]
chennai_south_east_cluster3_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_south_east_cluster3_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_south_east_cluster3_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_south_east_cluster3_neighborhood_loc = chennai_south_east_cluster3_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_south_east_cluster3_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]
chennai_south_east_cluster3_neighborhood_loc

Unnamed: 0,Neighborhood,Neighbourhood-Region-Cluster,Latitude,Longitude,Total_hangouts
17,Chetpet,Chetpet_South and East Chennai_3_Cluster,13.0714,80.2417,5
122,T. Nagar,T. Nagar_South and East Chennai_3_Cluster,13.0418,80.2341,5
125,Taramani,Taramani_South and East Chennai_3_Cluster,12.9863,80.2432,5
135,Thousand Lights,Thousand Lights_South and East Chennai_3_Cluster,13.0617,80.2544,4
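
The cell above filters a DataFrame and then adds a label column to the result. Assigning to a filtered result is the pattern pandas can flag with `SettingWithCopyWarning`; taking an explicit `.copy()` first makes the result independent of its source. The following is a minimal sketch on made-up data (the frame `toy` and its values are hypothetical):

```python
import pandas as pd

# Toy stand-in for one of the merged hangouts frames; values are made up.
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Region': ['North', 'North', 'North'],
    'Cluster Labels': [1, 2, 1],
})

# Filter, then copy, so the new column is written to an independent frame.
subset = toy.loc[toy['Neighborhood'].isin(['A', 'C'])].copy()

# Join the three fields of each row with underscores and append '_Cluster'.
subset['Neighbourhood-Region-Cluster'] = (
    subset[['Neighborhood', 'Region', 'Cluster Labels']]
    .apply(lambda row: '_'.join(row.values.astype(str)), axis=1)
    + '_Cluster'
)
print(subset['Neighbourhood-Region-Cluster'].tolist())
```

The source frame `toy` is left unchanged; only `subset` gains the label column.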


In [299]:
# Lists of the cluster DataFrames that make up each tier
temp_data_list_tier1 = [chennai_north_cluster2_neighborhood_loc, chennai_south_east_cluster2_neighborhood_loc, chennai_west_cluster2_neighborhood_loc]
temp_data_list_tier2 = [chennai_south_east_cluster3_neighborhood_loc, chennai_west_cluster1_neighborhood_loc, chennai_north_cluster1_neighborhood_loc]
# Combine each tier into a single DataFrame, ranked by number of hangouts (highest first)
chennai_potential_tier1_locations = pd.concat(temp_data_list_tier1).sort_values(by='Total_hangouts', ascending=False)
chennai_potential_tier2_locations = pd.concat(temp_data_list_tier2).sort_values(by='Total_hangouts', ascending=False)

In [301]:
chennai_potential_tier2_locations

Unnamed: 0,Neighborhood,Neighbourhood-Region-Cluster,Latitude,Longitude,Total_hangouts
17,Chetpet,Chetpet_South and East Chennai_3_Cluster,13.0714,80.2417,5
122,T. Nagar,T. Nagar_South and East Chennai_3_Cluster,13.0418,80.2341,5
125,Taramani,Taramani_South and East Chennai_3_Cluster,12.9863,80.2432,5
135,Thousand Lights,Thousand Lights_South and East Chennai_3_Cluster,13.0617,80.2544,4
34,K.K. Nagar,K.K. Nagar_West Chennai_1_Cluster,13.041,80.1994,4
139,Vadapalani,Vadapalani_West Chennai_1_Cluster,13.05,80.2121,4
53,M.G.R. Nagar,M.G.R. Nagar_West Chennai_1_Cluster,13.0352,80.1973,3
81,Nesapakkam,Nesapakkam_West Chennai_1_Cluster,13.0379,80.192,3
113,Royapuram,Royapuram_North Chennai_1_Cluster,13.1137,80.2954,3
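
Each tier table is built by stacking several cluster frames with `pd.concat` and ranking the result by `Total_hangouts`. The following is a minimal sketch of that step on made-up data; the frames `south_east` and `west`, their index labels, and their values are hypothetical.

```python
import pandas as pd

# Two toy cluster frames; index labels mimic row numbers from a larger table.
south_east = pd.DataFrame(
    {'Neighborhood': ['A', 'B'], 'Total_hangouts': [5, 3]}, index=[17, 135]
)
west = pd.DataFrame(
    {'Neighborhood': ['C', 'D'], 'Total_hangouts': [4, 2]}, index=[34, 81]
)

# Stack the frames, then rank all rows together, highest count first.
tier = pd.concat([south_east, west]).sort_values(by='Total_hangouts', ascending=False)
print(tier)
```

The rows keep their original index labels, which is why the tier table above shows non-consecutive numbers in its first column. Passing `ignore_index=True` to `pd.concat` would renumber them from 0 instead.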


In [329]:
map_ch_tier1 = folium.Map(location=[latitude, longitude], zoom_start=12)
# Add one circle marker per tier-1 neighborhood; the popup shows its Neighborhood_Region_Cluster label
for lat, lng, neigh_region_cluster in zip(chennai_potential_tier1_locations['Latitude'],
                                          chennai_potential_tier1_locations['Longitude'],
                                          chennai_potential_tier1_locations['Neighbourhood-Region-Cluster']):
    label = folium.Popup('{}'.format(neigh_region_cluster), parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7).add_to(map_ch_tier1)
    # Alternative: show the label as text on the map instead of in a popup
    #folium.Marker([lat, lng], icon=folium.DivIcon(icon_size=(150,36), icon_anchor=(5,5),html='<div style="font-size: 8pt; color : red">{}</div>'.format(neigh_region_cluster))).add_to(map_ch_tier1)
map_ch_tier1

In [330]:
map_ch_tier2 = folium.Map(location=[latitude, longitude], zoom_start=12)
# Add one circle marker per tier-2 neighborhood; the popup shows its Neighborhood_Region_Cluster label
for lat, lng, neigh_region_cluster in zip(chennai_potential_tier2_locations['Latitude'],
                                          chennai_potential_tier2_locations['Longitude'],
                                          chennai_potential_tier2_locations['Neighbourhood-Region-Cluster']):
    label = folium.Popup('{}'.format(neigh_region_cluster), parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7).add_to(map_ch_tier2)
    # Alternative: show the label as text on the map instead of in a popup
    #folium.Marker([lat, lng], icon=folium.DivIcon(icon_size=(150,36), icon_anchor=(5,5),html='<div style="font-size: 8pt; color : red">{}</div>'.format(neigh_region_cluster))).add_to(map_ch_tier2)
map_ch_tier2