In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy -y
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 -y
import folium # map rendering library

! pip install lxml html5lib beautifulsoup4

print('Libraries imported.')




# Capstone Project - The Battle of the Neighborhoods_Project
### Course: Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, the objective is to find a set of hot-spot locations for a hang-out themed snack bar. This report is targeted at stakeholders who are interested in opening a chain of **hang-out themed snack bars** in **Chennai**, India.

A 'hang-out themed snack bar' is defined here as a place where people unwind in the evenings, on weekends, or after a long day at work, school, or college, with tasty snacks (light evening food) and a fun time with friends and family. The place is designed as an alternative for the same crowd that visits any of the following venues: Cafe Bar, Coffee Bar, Snack Bar, Dessert Bar, Juice Bar, Ice-cream Parlor, Pizza Bar, Fast Food Center, Sandwich Shop, Bakery, Chaat Corner, Food Truck, Tea Stall. We will try to detect **locations that have demand for such hang-out places in Chennai city**. The rationale is to come up with a list of potential locations that already have venues offering similar experiences, which indicates that there is demand (an audience) for these **hang-out themed snack bars**. The new snack bars could then serve as a viable alternative for these segments, highlighting the **wholesome experience they bring**.

Using the Foursquare API, we identify the list of hot-spot tier-1 locations (with high demand for these hang-out bars) and the next-best tier-2 locations (moderate demand) across Chennai. The focus is to first identify the hot regions (boroughs with high demand for these venues) in Chennai for setting up a chain of **hang-out themed snack bars**, and then to provide the neighborhood clusters within them so that stakeholders can make an informed decision. In a real-world scenario, a deeper analysis would also consider real-estate prices, rents, transport, and nearby landmarks (offices, schools, colleges). As part of this project, we narrow down the high-demand pockets within Chennai in which a chain of hang-out themed snack bars could potentially be opened.

## Data <a name="data"></a>

Based on the definition of our problem, the critical factor that will influence the decision is:
* the number of existing venues of the types listed above (Cafe Bar, Coffee Bar, Snack Bar, Dessert Bar, Juice Bar, Ice-cream Parlor, Pizza Bar, Fast Food Center, Sandwich Shop, Bakery, Chaat Corner, Food Truck, Tea Stall) in each neighborhood and its corresponding region (borough)
 

The following data sources are needed to extract or generate the required information:
* the geospatial coordinates of the Chennai neighborhoods, along with their regions, from this wiki page: https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Chennai
* the number of venues (restaurants and similar places), with their type and location, in every neighborhood, obtained using the **Foursquare API**


## Methodology <a name="methodology"></a>

In this project, the high-demand areas for 'hangout-themed snack bars' in Chennai are identified.

In the first step, the required **data from the wiki page** https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Chennai is collected. Then, using the Foursquare API, we obtain the **count of venues such as 'Cafe Bar, Coffee Bar, Snack Bar, Dessert Bar, Juice Bar, Ice-cream Parlor, Pizza Bar, Fast Food Center, Sandwich Shop, Bakery, Chaat Corner, Food Truck, Tea Stall' across the neighborhoods of the different regions (boroughs) of Chennai** (these venues are categorized according to the Foursquare categorization).

The second step in the analysis is to find the regions (boroughs) of Chennai with high demand for this kind of hang-out place, so that only the neighborhoods in those regions are considered further. For this, we group the Foursquare results by region and sum the counts of these hangout venues. Based on the totals, the 'high-demand' regions are identified for further analysis.

The third and final step in the analysis is to find the most happening neighborhood clusters within these 'high-demand' regions. The primary objective of the neighborhood clusters is to establish a chain of the new 'hang-out themed snack bars' in those clusters, based on the target audience and the demand for such hangout places. For this, we map the identified 'high-demand' regions and create clusters (using **k-means clustering with 5 clusters**) within these regions (boroughs) of Chennai. Each 'high-demand' region will have 5 neighborhood clusters, with the 'Total_hangouts' value as the clustering parameter. 'Total_hangouts' of a neighborhood is the sum of the counts of these venues (Cafe Bar, Coffee Bar, Snack Bar, Dessert Bar, Juice Bar, Ice-cream Parlor, Pizza Bar, Fast Food Center, Sandwich Shop, Bakery, Chaat Corner, Food Truck, Tea Stall). From these, the most happening neighborhood clusters are identified across the 'high-demand' regions. The final list of 'tier-1' hotspot locations is then identified from these neighborhood clusters based on a threshold value, and the next-best 'tier-2' locations are identified based on the next-best threshold value. These will be shared with stakeholders for a further deep dive into the 'tier-1' and 'tier-2' locations, considering real-estate prices, rents, transport, and nearby landmarks (offices, schools, colleges), to make an informed decision.



## Analysis <a name="analysis"></a>

Basic exploratory data analysis is performed in the following steps to derive the required information from the raw data.


**Step-1**: Get the list of Chennai neighborhoods along with the corresponding region from the wiki page and load it into a dataframe (chennai_list).


In [85]:
# Webpage URL
url = 'https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Chennai'

# Extract all tables from the page
chennai_wiki = pd.read_html(url)

chennai_list = chennai_wiki[1] # table listing each neighborhood with its region and coordinates
print('Chennai neighborhoods and their regions are loaded into the dataframe')
chennai_list.rename(columns={"Area":"Neighborhood","Location":"Region"},inplace=True)
chennai_list.head()



Chennai neighborhoods and their regions are loaded into the dataframe


Unnamed: 0,Neighborhood,Region,Latitude,Longitude
0,Adambakkam,South and East Chennai,12.988,80.2047
1,Adyar,South and East Chennai,13.0012,80.2565
2,Alandur,South and East Chennai,12.9975,80.2006
3,Alapakkam,West Chennai,13.049,80.1673
4,Alwarthirunagar,West Chennai,13.0426,80.184


**Step-2**: Get to know the dataset by counting the unique regions and the neighborhoods in it.

In [86]:
print('The dataframe has {} regions and {} neighborhoods.'.format(
        len(chennai_list['Region'].unique()),
        chennai_list.shape[0]
    )
)

The dataframe has 8 regions and 176 neighborhoods.


**Step-3**: Get the coordinates of Chennai, India using the geolocator.

In [87]:

# use the geolocator to get the coordinates of Chennai
address = 'Chennai, India'
geolocator = Nominatim(user_agent="chennai-nei")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chennai, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chennai, India are 13.0836939, 80.270186.


**Step-4**: Map the Chennai neighborhoods using folium.

In [88]:
map_ch = folium.Map(location=[latitude, longitude], zoom_start=12)
# Adding markers to map
for lat, lng, neighborhood in zip(chennai_list['Latitude'], chennai_list['Longitude'], chennai_list['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7).add_to(map_ch)
map_ch

**Step-5**: Establish the connection to the Foursquare API to deep-dive into these neighborhoods.

In [89]:
CLIENT_ID = 'FYRYFNPWI233M4NHY1U4L3NKC4QIRK4YFRBQ00H3MT3CYUUZ' # your Foursquare ID
CLIENT_SECRET = 'JBY25GHOQYVR1BPJIHE4RIFSAVK5DB4K0VIUMM0V1UCEL2EQ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: FYRYFNPWI233M4NHY1U4L3NKC4QIRK4YFRBQ00H3MT3CYUUZ
CLIENT_SECRET: JBY25GHOQYVR1BPJIHE4RIFSAVK5DB4K0VIUMM0V1UCEL2EQ
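
The cell above hard-codes the credentials and prints them in full. As a hedged alternative, the sketch below reads them from environment variables and masks them before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical and are not part of the original notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; use whatever your setup defines.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# A plain dict stands in for os.environ so the example runs anywhere.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-123",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```

In the notebook, calling `load_foursquare_credentials()` without arguments would read from the real environment, so the secret never has to appear in a cell or its output.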


**Step-6**: Get the list of venues from the Foursquare API for all Chennai neighborhoods by creating the 'getNearbyVenues' function.

In [90]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for venues within `radius` of each neighborhood and return one row per venue."""

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep only the list of returned items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
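
The expression `v['venue']['categories'][0]['name']` in the function above raises an `IndexError` if a venue comes back with an empty category list. The following is a minimal sketch of a more defensive row builder. The helper name `venue_row` and the fallback label `'Unknown'` are assumptions for illustration, and the sample item only mimics the fields the function already reads.

```python
def venue_row(name, lat, lng, item):
    """Flatten one response item into the tuple used for a dataframe row.

    Falls back to 'Unknown' (an assumed label) when no category is present.
    """
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else "Unknown"
    return (
        name,
        lat,
        lng,
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )


# Sample item shaped like the fields used above (values taken from the Step-8 output).
sample_item = {
    "venue": {
        "name": "Ibaco",
        "location": {"lat": 12.988729, "lng": 80.205646},
        "categories": [{"name": "Dessert Shop"}],
    }
}
print(venue_row("Adambakkam", 12.988, 80.2047, sample_item))
```

Inside `getNearbyVenues`, the list comprehension could call such a helper for each `v in results` instead of indexing the category directly.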

**Step-7**: The above function is called with the values from the original dataframe (chennai_list), which contains the neighborhood latitudes and longitudes, and the Foursquare venue data is stored in a new dataframe (chennai_neig_venues).

In [91]:
chennai_neig_venues = getNearbyVenues(names=chennai_list['Neighborhood'],
                                   latitudes=chennai_list['Latitude'],
                                   longitudes=chennai_list['Longitude']
                                  )


Adambakkam
Adyar
Alandur
Alapakkam
Alwarthirunagar
Ambattur
Aminjikarai
Anna Nagar
Annanur
Arumbakkam
Ashok Nagar
Avadi
Ayappakkam
Basin Bridge
Besant Nagar
Broadway
Central
Chetpet
Choolai
MMDA Colony
Defence Colony
Egmore
Ennore
Erukanchery
George Town
Gerugambakkam
Gopalapuram
Guindy
Hastinapuram
ICF Colony
Injambakkam
Irumbuliyur
Iyyapanthangal
Jamalia
K.K. Nagar
Kadaperi
Kallikuppam
Karambakkam
Kathirvedu
Kathivakkam
Keelkattalai
Kodungaiyur
Kolappakkam
Kolathur
Korattur
Korukkupet
Kosapet
Kottivakkam
Kovilambakkam
Koyambedu
Kundrathur
Lakshmipuram
M.G.R. Garden
M.G.R. Nagar
M.K.B. Nagar
Madhavaram
Madhavaram Milk Colony
Madipakkam
Maduravoyal
Mambakkam
Manali
Manali New Town
manapakkam
Mangadu
Manjambakkam
Mannadi
Mathur MMDA
Medavakkam
Minjur
Mogappair
Moolakadai
Mowlivakkam
Mudichur
Mugalivakkam
Mylapore
Nagalkeni
Nandambakkam
Nanganallur
Naravarikuppam
Neelankarai
Nerkundrum
Nesapakkam
New Washermenpet
Nolambur
Old Washermenpet
Oragadam
Otteri
Padi
Palavakkam
Pallavaram
Pallik

**Step-8**: Merge the region, latitude, and longitude values with the venue details obtained from Foursquare into a single dataframe, 'chennai_venues'.

In [92]:
chennai_venues= chennai_neig_venues.join(chennai_list.set_index('Neighborhood'), on='Neighborhood')
chennai_venues = chennai_venues[ ['Neighborhood','Region'] + [ col for col in chennai_venues.columns if (col != 'Neighborhood' and col != 'Region') ] ]
chennai_venues.head()



Unnamed: 0,Neighborhood,Region,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Latitude,Longitude
0,Adambakkam,South and East Chennai,12.988,80.2047,Venkateshwara Super Market,12.98632,80.205168,Department Store,12.988,80.2047
1,Adambakkam,South and East Chennai,12.988,80.2047,Ibaco,12.988729,80.205646,Dessert Shop,12.988,80.2047
2,Adambakkam,South and East Chennai,12.988,80.2047,Deepam Restaurant,12.98538,80.205281,Indian Restaurant,12.988,80.2047
3,Adambakkam,South and East Chennai,12.988,80.2047,visakan mess,12.985661,80.201748,Restaurant,12.988,80.2047
4,Adambakkam,South and East Chennai,12.988,80.2047,ibaco Adambakkam,12.987358,80.200504,Ice Cream Shop,12.988,80.2047


**Step-9**: Apply one-hot encoding to the 'chennai_venues' dataframe with all the venues in it, then filter the dataframe to focus on the 'hangout' venues by keeping the columns whose names match Neighborhood|Region|Cafe|Coffee|Snack|Dessert|Juice|Ice|Pizza|Fast|Sand|Bakery|Chaat|Truck|Tea,
and put them into a new dataframe, 'chennai_snack_venues'.

In [93]:
# one hot encoding
chennai_onehot = pd.get_dummies(chennai_venues[['Venue Category']], prefix="", prefix_sep="")
chennai_onehot['Neighborhood'] = chennai_venues['Neighborhood']
chennai_onehot['Region'] = chennai_venues['Region']
chennai_onehot = chennai_onehot[ ['Neighborhood'] +['Region'] + [ col for col in chennai_onehot.columns if ((col != 'Neighborhood') and (col != 'Region') ) ] ]

chennai_snack_venues=chennai_onehot.filter(regex='Neighborhood|Region|Cafe|Coffee|Snack|Dessert|Juice|Ice|Pizza|Fast|Sand|Bakery|Chaat|Truck|Tea')

chennai_snack_venues.head()


Unnamed: 0,Neighborhood,Region,Bakery,Cafeteria,Coffee Shop,Dessert Shop,Fast Food Restaurant,Food Truck,Gaming Cafe,Ice Cream Shop,Juice Bar,Pizza Place,Sandwich Place,Snack Place,Tea Room
0,Adambakkam,South and East Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Adambakkam,South and East Chennai,0,0,0,1,0,0,0,0,0,0,0,0,0
2,Adambakkam,South and East Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Adambakkam,South and East Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Adambakkam,South and East Chennai,0,0,0,0,0,0,0,1,0,0,0,0,0
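
`DataFrame.filter(regex=...)` keeps every column whose name contains a match for the pattern, and the match is case-sensitive. The sketch below applies the same pattern with Python's `re` module to a list of column names, which makes it easier to see why 'Cafeteria' and 'Gaming Cafe' appear in the table above. The helper `keep_columns` is hypothetical, and 'Breakfast Spot' is an illustrative category that does not appear in the notebook output.

```python
import re

HANGOUT_PATTERN = re.compile(
    "Neighborhood|Region|Cafe|Coffee|Snack|Dessert|Juice|Ice|Pizza|Fast|Sand|Bakery|Chaat|Truck|Tea"
)


def keep_columns(columns, pattern=HANGOUT_PATTERN):
    """Return the column names that contain a match anywhere in the name."""
    return [name for name in columns if pattern.search(name)]


columns = [
    "Neighborhood", "Region", "Department Store", "Dessert Shop",
    "Indian Restaurant", "Restaurant", "Ice Cream Shop",
    "Cafeteria", "Gaming Cafe",
    "Breakfast Spot",  # illustrative only: lowercase 'fast' does not match 'Fast'
]
kept = keep_columns(columns)
print(kept)
```

Because the match is a substring search, a short token such as 'Cafe' or 'Ice' selects any category containing it, so it is worth reviewing the resulting column list before summing the counts.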


**Step-10**: Group by region and identify the 'most happening' regions by creating a new column 'Total' that sums the counts of the chosen venues. This indicates where there is demand and where the new hangouts could be established. The result is stored in the 'chennai_grouped' dataframe.

In [94]:
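# drop the text column before summing so only the venue-count columns are aggregated
chennai_grouped = chennai_snack_venues.drop(columns='Neighborhood').groupby('Region').sum().reset_index()
chennai_grouped['Total'] = chennai_grouped.drop(columns='Region').sum(axis=1)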
chennai_grouped

Unnamed: 0,Region,Bakery,Cafeteria,Coffee Shop,Dessert Shop,Fast Food Restaurant,Food Truck,Gaming Cafe,Ice Cream Shop,Juice Bar,Pizza Place,Sandwich Place,Snack Place,Tea Room,Total
0,North Chennai,6,0,3,1,11,1,0,2,1,7,5,2,0,39
1,Northern Suburbs of Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,South and East Chennai,6,2,10,5,10,1,0,9,7,8,5,6,1,70
3,Southern-Eastern Suburbs of Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Suburban Chennai,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Suburbs along ECR and OMR of Chennai,0,0,3,2,2,0,0,3,0,2,1,0,0,13
6,West Chennai,8,0,4,3,11,0,1,8,2,10,4,4,2,57
7,Western Suburbs of Chennai,0,0,0,0,3,0,0,0,0,1,0,0,2,6


**Step-11**: Choose 3 regions, 'North', 'South East' and 'West', as these are the most happening regions based on the 'Total' column. Their totals are North: 39, South East: 70 and West: 57. Create 3 dataframes, one for each of these regions (chennai_tar_north, chennai_tar_south_east, chennai_tar_west), from the 'chennai_snack_venues' dataframe.

In [95]:
chennai_tar_north = chennai_snack_venues.loc[chennai_snack_venues['Region'] == 'North Chennai']
chennai_tar_south_east = chennai_snack_venues.loc[chennai_snack_venues['Region'] == 'South and East Chennai']
chennai_tar_west = chennai_snack_venues.loc[chennai_snack_venues['Region'] == 'West Chennai']
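
The choice of these three regions can be checked from the 'Total' column of the Step-10 output. The sketch below is plain Python with the totals copied from that table. The helper `top_regions` is hypothetical and simply ranks the regions by total.

```python
# Totals copied from the 'Total' column of the chennai_grouped output in Step-10.
region_totals = {
    "North Chennai": 39,
    "Northern Suburbs of Chennai": 0,
    "South and East Chennai": 70,
    "Southern-Eastern Suburbs of Chennai": 0,
    "Suburban Chennai": 0,
    "Suburbs along ECR and OMR of Chennai": 13,
    "West Chennai": 57,
    "Western Suburbs of Chennai": 6,
}


def top_regions(totals, n=3):
    """Return the n region names with the highest totals, largest first."""
    return sorted(totals, key=totals.get, reverse=True)[:n]


selected = top_regions(region_totals)
# Share of all counted hangout venues that fall in the selected regions.
share = sum(region_totals[r] for r in selected) / sum(region_totals.values())
print(selected, round(share, 2))
```

With these numbers the three selected regions account for 166 of the 185 counted venues, which supports restricting the clustering to them.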



**Step-12**: Now group each of these 3 regions by neighborhood into 'chennai_tar_north_grouped', 'chennai_tar_south_east_grouped' and 'chennai_tar_west_grouped'. Create 'Total_hangouts' for each neighborhood by summing the venue counts. Then create 3 dataframes (chennai_tar_north_hangouts, chennai_tar_south_east_hangouts, chennai_tar_west_hangouts) with only 'Neighborhood' and 'Total_hangouts'. 'Total_hangouts' is the parameter used to cluster the neighborhoods of each region.

In [96]:
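# drop the text column 'Region' so that only the venue counts are summed per neighborhood
chennai_tar_north_grouped = chennai_tar_north.drop(columns='Region').groupby('Neighborhood').sum().reset_index()
chennai_tar_south_east_grouped = chennai_tar_south_east.drop(columns='Region').groupby('Neighborhood').sum().reset_index()
chennai_tar_west_grouped = chennai_tar_west.drop(columns='Region').groupby('Neighborhood').sum().reset_index()

# row-wise sum over the venue-count columns only
chennai_tar_north_grouped['Total_hangouts'] = chennai_tar_north_grouped.drop(columns='Neighborhood').sum(axis=1)
chennai_tar_south_east_grouped['Total_hangouts'] = chennai_tar_south_east_grouped.drop(columns='Neighborhood').sum(axis=1)
chennai_tar_west_grouped['Total_hangouts'] = chennai_tar_west_grouped.drop(columns='Neighborhood').sum(axis=1)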


chennai_tar_north_hangouts = chennai_tar_north_grouped[['Neighborhood','Total_hangouts']]
chennai_tar_south_east_hangouts = chennai_tar_south_east_grouped[['Neighborhood','Total_hangouts']]
chennai_tar_west_hangouts = chennai_tar_west_grouped[['Neighborhood','Total_hangouts']]
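
To make the meaning of 'Total_hangouts' concrete, here is a small sketch in plain Python that counts, per neighborhood, the venues whose category is one of the hangout columns from Step-9. The function `total_hangouts` is hypothetical, and the records are the five Adambakkam rows shown in the Step-8 output.

```python
from collections import Counter

# Column names of the filtered one-hot table from Step-9.
HANGOUT_CATEGORIES = {
    "Bakery", "Cafeteria", "Coffee Shop", "Dessert Shop", "Fast Food Restaurant",
    "Food Truck", "Gaming Cafe", "Ice Cream Shop", "Juice Bar", "Pizza Place",
    "Sandwich Place", "Snack Place", "Tea Room",
}


def total_hangouts(records):
    """Count hangout venues per neighborhood; neighborhoods without any get 0."""
    totals = Counter()
    for neighborhood, category in records:
        # a bool adds as 0 or 1, so every neighborhood gets an entry
        totals[neighborhood] += category in HANGOUT_CATEGORIES
    return dict(totals)


# (Neighborhood, Venue Category) pairs from the head of chennai_venues.
records = [
    ("Adambakkam", "Department Store"),
    ("Adambakkam", "Dessert Shop"),
    ("Adambakkam", "Indian Restaurant"),
    ("Adambakkam", "Restaurant"),
    ("Adambakkam", "Ice Cream Shop"),
]
print(total_hangouts(records))
```

Only the dessert shop and the ice cream shop count toward the total here; the other three venues are ignored.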



**Step-13**: Fit 3 k-means models (kmeans_north, kmeans_south_east, kmeans_west), one for each region, store the cluster labels in 3 dataframes (neighborhoods_venues_sorted_north, neighborhoods_venues_sorted_south_east, neighborhoods_venues_sorted_west), and merge in the 'Latitude' and 'Longitude' values from the original dataframe.
           

In [98]:
# set number of clusters
kclusters = 5

chennai_tar_north_hangouts_clusters = chennai_tar_north_hangouts.drop(columns='Neighborhood')
chennai_tar_south_east_hangouts_clusters = chennai_tar_south_east_hangouts.drop(columns='Neighborhood')
chennai_tar_west_hangouts_clusters = chennai_tar_west_hangouts.drop(columns='Neighborhood')

# run k-means clustering
kmeans_north = KMeans(n_clusters=kclusters, random_state=0).fit(chennai_tar_north_hangouts_clusters)
kmeans_south_east = KMeans(n_clusters=kclusters, random_state=0).fit(chennai_tar_south_east_hangouts_clusters)
kmeans_west = KMeans(n_clusters=kclusters, random_state=0).fit(chennai_tar_west_hangouts_clusters)
# check cluster labels generated for each row in the dataframe
kmeans_north.labels_[0:48]
kmeans_south_east.labels_[0:46]
kmeans_west.labels_[0:41]

# add clustering labels
columns = ['Neighborhood','Total_hangouts']

neighborhoods_venues_sorted_north = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_south_east = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_west = pd.DataFrame(columns=columns)

neighborhoods_venues_sorted_north['Neighborhood'] = chennai_tar_north_hangouts['Neighborhood']
neighborhoods_venues_sorted_north['Total_hangouts'] = chennai_tar_north_hangouts['Total_hangouts']
neighborhoods_venues_sorted_south_east['Neighborhood'] = chennai_tar_south_east_hangouts['Neighborhood']
neighborhoods_venues_sorted_south_east['Total_hangouts'] = chennai_tar_south_east_hangouts['Total_hangouts']
neighborhoods_venues_sorted_west['Neighborhood'] = chennai_tar_west_hangouts['Neighborhood']
neighborhoods_venues_sorted_west['Total_hangouts'] = chennai_tar_west_hangouts['Total_hangouts']

neighborhoods_venues_sorted_north.insert(0, 'Cluster Labels', kmeans_north.labels_)
neighborhoods_venues_sorted_south_east.insert(0, 'Cluster Labels', kmeans_south_east.labels_)
neighborhoods_venues_sorted_west.insert(0, 'Cluster Labels', kmeans_west.labels_)

chennai_tar_north_hangouts_merged = chennai_list.loc[chennai_list['Region'] == 'North Chennai']
chennai_tar_south_east_hangouts_merged = chennai_list.loc[chennai_list['Region'] == 'South and East Chennai']
chennai_tar_west_hangouts_merged = chennai_list.loc[chennai_list['Region'] == 'West Chennai']

chennai_tar_north_hangouts_merged = chennai_tar_north_hangouts_merged.join(neighborhoods_venues_sorted_north.set_index('Neighborhood'), on='Neighborhood',how='right')
chennai_tar_south_east_hangouts_merged = chennai_tar_south_east_hangouts_merged.join(neighborhoods_venues_sorted_south_east.set_index('Neighborhood'), on='Neighborhood',how='right')
chennai_tar_west_hangouts_merged = chennai_tar_west_hangouts_merged.join(neighborhoods_venues_sorted_west.set_index('Neighborhood'), on='Neighborhood',how='right')

chennai_tar_north_hangouts_merged = chennai_tar_north_hangouts_merged[ ['Neighborhood'] + [ col for col in chennai_tar_north_hangouts_merged.columns if col != 'Neighborhood' ] ]
chennai_tar_south_east_hangouts_merged = chennai_tar_south_east_hangouts_merged[ ['Neighborhood'] + [ col for col in chennai_tar_south_east_hangouts_merged.columns if col != 'Neighborhood' ] ]
chennai_tar_west_hangouts_merged = chennai_tar_west_hangouts_merged[ ['Neighborhood'] + [ col for col in chennai_tar_west_hangouts_merged.columns if col != 'Neighborhood' ] ]
print(chennai_tar_north_hangouts_merged.shape,chennai_tar_south_east_hangouts_merged.shape, chennai_tar_west_hangouts_merged.shape)


(46, 6) (45, 6) (39, 6)
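
Because the only feature is 'Total_hangouts', the clustering above is effectively one-dimensional: neighborhoods with the same or similar counts end up together. The sketch below is a simplified stand-in for illustration, not scikit-learn's implementation. It runs plain Lloyd iterations in one dimension with evenly spaced starting centroids (an assumption; `KMeans` uses a different initialization), so its labels need not match the notebook's. The function `kmeans_1d` and the sample counts are hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups with Lloyd's algorithm in one dimension."""
    if k < 2:
        raise ValueError("k must be at least 2 for this sketch")
    lo, hi = min(values), max(values)
    # assumed initialization: k evenly spaced centroids between min and max
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def nearest(v):
        # index of the closest centroid; ties go to the lower index
        return min(range(k), key=lambda c: abs(v - centroids[c]))

    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v)].append(v)
        # move each centroid to the mean of its group; keep it if the group is empty
        centroids = [
            sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)
        ]
    return [nearest(v) for v in values], centroids


# Illustrative counts in the same range as the Total_hangouts values.
counts = [0, 0, 1, 1, 2, 2, 4, 5, 6, 9]
labels, centroids = kmeans_1d(counts, k=5)
print(labels)
print(centroids)
```

The label numbers themselves carry no order or meaning; only the grouping matters, which is why the cluster tables later in the notebook have to be inspected to see which label holds the high counts.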


**Step-14**: Create 3 cluster maps, one for each of these 3 regions (map_clusters_north, map_clusters_south_east, map_clusters_west).

In [99]:
# create map
map_clusters_north = folium.Map(location=[13.0827, 80.2707], zoom_start=11)
map_clusters_south_east = folium.Map(location=[13.0827, 80.2707], zoom_start=11)
map_clusters_west = folium.Map(location=[13.0827, 80.2707], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(chennai_tar_north_hangouts_merged['Latitude'], chennai_tar_north_hangouts_merged['Longitude'], chennai_tar_north_hangouts_merged['Neighborhood'], chennai_tar_north_hangouts_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_north)
    
for lat, lon, poi, cluster in zip(chennai_tar_south_east_hangouts_merged['Latitude'], chennai_tar_south_east_hangouts_merged['Longitude'], chennai_tar_south_east_hangouts_merged['Neighborhood'], chennai_tar_south_east_hangouts_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_south_east)
    
for lat, lon, poi, cluster in zip(chennai_tar_west_hangouts_merged['Latitude'], chennai_tar_west_hangouts_merged['Longitude'], chennai_tar_west_hangouts_merged['Neighborhood'], chennai_tar_west_hangouts_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_west)       


**Step-15**: Display the cluster maps for these 3 regions (map_clusters_north, map_clusters_south_east, map_clusters_west).

In [100]:
map_clusters_north


In [101]:
map_clusters_south_east


In [102]:
map_clusters_west


**Step-16**: Create a dataframe for each cluster across these 3 region cluster maps. There are 5 clusters per region, for a total of 15 neighborhood clusters, which are stored in 15 dataframes.

In [104]:
columns = ['Neighborhood','Total_hangouts']

chennai_north_cluster0 = pd.DataFrame(columns=columns)
chennai_north_cluster0=chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Cluster Labels'] == 0, chennai_tar_north_hangouts_merged.columns[[0] + list(range(5, chennai_tar_north_hangouts_merged.shape[1]))]]
chennai_north_cluster0

Unnamed: 0,Neighborhood,Total_hangouts
16,Central,1
24,George Town,1
33,Jamalia,1
38,Kathirvedu,1
70,Moolakadai,1
85,Oragadam,1
92,Park Town,1
100,Periamet,1
117,Selavoyal,1
118,Sembiam,1


In [105]:
columns = ['Neighborhood','Total_hangouts']

chennai_north_cluster1 = pd.DataFrame(columns=columns)
chennai_north_cluster1=chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Cluster Labels'] == 1, chennai_tar_north_hangouts_merged.columns[[0] + list(range(5, chennai_tar_north_hangouts_merged.shape[1]))]]
chennai_north_cluster1

Unnamed: 0,Neighborhood,Total_hangouts
13,Basin Bridge,0
22,Ennore,0
39,Kathivakkam,0
41,Kodungaiyur,0
43,Kolathur,0
45,Korukkupet,0
46,Kosapet,0
51,Lakshmipuram,0
54,M.K.B. Nagar,0
55,Madhavaram,0


In [106]:
columns = ['Neighborhood','Total_hangouts']

chennai_north_cluster2 = pd.DataFrame(columns=columns)
chennai_north_cluster2=chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Cluster Labels'] == 2, chennai_tar_north_hangouts_merged.columns[[0] + list(range(5, chennai_tar_north_hangouts_merged.shape[1]))]]
chennai_north_cluster2

Unnamed: 0,Neighborhood,Total_hangouts
15,Broadway,7


In [107]:
columns = ['Neighborhood','Total_hangouts']

chennai_north_cluster3 = pd.DataFrame(columns=columns)
chennai_north_cluster3=chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Cluster Labels'] == 3, chennai_tar_north_hangouts_merged.columns[[0] + list(range(5, chennai_tar_north_hangouts_merged.shape[1]))]]
chennai_north_cluster3

Unnamed: 0,Neighborhood,Total_hangouts
18,Choolai,2
56,Madhavaram Milk Colony,2
65,Mannadi,2
94,Pattabiram,2
95,Pattalam,2
99,Perambur,2
113,Royapuram,2
120,Sowcarpet,2


In [108]:
columns = ['Neighborhood','Total_hangouts']

chennai_north_cluster4 = pd.DataFrame(columns=columns)
chennai_north_cluster4=chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Cluster Labels'] == 4 , chennai_tar_north_hangouts_merged.columns[[0] + list(range(5, chennai_tar_north_hangouts_merged.shape[1]))]]
chennai_north_cluster4

Unnamed: 0,Neighborhood,Total_hangouts
93,Parry's Corner,3


In [109]:
columns = ['Neighborhood','Total_hangouts']

chennai_south_east_cluster0 = pd.DataFrame(columns=columns)
chennai_south_east_cluster0=chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Cluster Labels'] == 0, chennai_tar_south_east_hangouts_merged.columns[[0] + list(range(5, chennai_tar_south_east_hangouts_merged.shape[1]))]]
chennai_south_east_cluster0

Unnamed: 0,Neighborhood,Total_hangouts
0,Adambakkam,2
17,Chetpet,3
27,Guindy,2
28,Hastinapuram,2
40,Keelkattalai,2
47,Kottivakkam,2
57,Madipakkam,2
59,Mambakkam,2
77,Nanganallur,2
79,Neelankarai,2


In [110]:
columns = ['Neighborhood','Total_hangouts']

chennai_south_east_cluster1 = pd.DataFrame(columns=columns)
chennai_south_east_cluster1=chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Cluster Labels'] == 1, chennai_tar_south_east_hangouts_merged.columns[[0] + list(range(5, chennai_tar_south_east_hangouts_merged.shape[1]))]]
chennai_south_east_cluster1

Unnamed: 0,Neighborhood,Total_hangouts
1,Adyar,5
26,Gopalapuram,6
122,T. Nagar,5
125,Taramani,5
135,Thousand Lights,4


In [111]:
columns = ['Neighborhood','Total_hangouts']

chennai_south_east_cluster2 = pd.DataFrame(columns=columns)
chennai_south_east_cluster2=chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Cluster Labels'] == 2, chennai_tar_south_east_hangouts_merged.columns[[0] + list(range(5, chennai_tar_south_east_hangouts_merged.shape[1]))]]
chennai_south_east_cluster2

Unnamed: 0,Neighborhood,Total_hangouts
2,Alandur,0
21,Egmore,0
30,Injambakkam,0
35,Kadaperi,0
48,Kovilambakkam,0
75,Nagalkeni,0
89,Pallavaram,0
90,Pallikaranai,0
97,Pazhavanthangal,0
98,Peerkankaranai,0


In [112]:
columns = ['Neighborhood','Total_hangouts']

chennai_south_east_cluster3 = pd.DataFrame(columns=columns)
chennai_south_east_cluster3=chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Cluster Labels'] == 3, chennai_tar_south_east_hangouts_merged.columns[[0] + list(range(5, chennai_tar_south_east_hangouts_merged.shape[1]))]]
chennai_south_east_cluster3

Unnamed: 0,Neighborhood,Total_hangouts
14,Besant Nagar,9


In [113]:
columns = ['Neighborhood','Total_hangouts']

chennai_south_east_cluster4 = pd.DataFrame(columns=columns)
chennai_south_east_cluster4=chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Cluster Labels'] == 4, chennai_tar_south_east_hangouts_merged.columns[[0] + list(range(5, chennai_tar_south_east_hangouts_merged.shape[1]))]]
chennai_south_east_cluster4

Unnamed: 0,Neighborhood,Total_hangouts
31,Irumbuliyur,1
67,Medavakkam,1
74,Mylapore,1
88,Palavakkam,1
91,Pammal,1
102,Perungudi,1
119,Sholinganallur,1
124,Tambaram,1


In [114]:
columns = ['Neighborhood','Total_hangouts']

chennai_west_cluster0 = pd.DataFrame(columns=columns)
chennai_west_cluster0=chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Cluster Labels'] == 0, chennai_tar_west_hangouts_merged.columns[[0] + list(range(5, chennai_tar_west_hangouts_merged.shape[1]))]]
chennai_west_cluster0

Unnamed: 0,Neighborhood,Total_hangouts
3,Alapakkam,2
6,Aminjikarai,2
32,Iyyapanthangal,2
44,Korattur,2
50,Kundrathur,2
53,M.G.R. Nagar,2
81,Nesapakkam,2
83,Nolambur,2
62,manapakkam,2


In [115]:
columns = ['Neighborhood','Total_hangouts']

chennai_west_cluster1 = pd.DataFrame(columns=columns)
chennai_west_cluster1=chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Cluster Labels'] == 1, chennai_tar_west_hangouts_merged.columns[[0] + list(range(5, chennai_tar_west_hangouts_merged.shape[1]))]]
chennai_west_cluster1

Unnamed: 0,Neighborhood,Total_hangouts
4,Alwarthirunagar,4
10,Ashok Nagar,6
34,K.K. Nagar,4
139,Vadapalani,5


In [116]:
columns = ['Neighborhood','Total_hangouts']

chennai_west_cluster2 = pd.DataFrame(columns=columns)
chennai_west_cluster2=chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Cluster Labels'] == 2, chennai_tar_west_hangouts_merged.columns[[0] + list(range(5, chennai_tar_west_hangouts_merged.shape[1]))]]
chennai_west_cluster2

Unnamed: 0,Neighborhood,Total_hangouts
5,Ambattur,0
8,Annanur,0
11,Avadi,0
20,Defence Colony,0
37,Karambakkam,0
49,Koyambedu,0
63,Mangadu,0
80,Nerkundrum,0
87,Padi,0
104,Poonamallee,0


In [117]:
columns = ['Neighborhood','Total_hangouts']

chennai_west_cluster3 = pd.DataFrame(columns=columns)
chennai_west_cluster3=chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Cluster Labels'] == 3, chennai_tar_west_hangouts_merged.columns[[0] + list(range(5, chennai_tar_west_hangouts_merged.shape[1]))]]
chennai_west_cluster3

Unnamed: 0,Neighborhood,Total_hangouts
9,Arumbakkam,1
52,M.G.R. Garden,1
19,MMDA Colony,1
69,Mogappair,1
71,Mowlivakkam,1
73,Mugalivakkam,1
107,Pudur,1
115,Saligramam,1
127,Thirumangalam,1
140,Valasaravakkam,1


In [119]:
columns = ['Neighborhood','Total_hangouts']

chennai_west_cluster4 = pd.DataFrame(columns=columns)
chennai_west_cluster4=chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Cluster Labels'] == 4, chennai_tar_west_hangouts_merged.columns[[0] + list(range(5, chennai_tar_west_hangouts_merged.shape[1]))]]
chennai_west_cluster4

Unnamed: 0,Neighborhood,Total_hangouts
7,Anna Nagar,9
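The cells above repeat one selection fifteen times, changing only the region frame and the cluster label. A minimal sketch of how that could be folded into one helper follows. The name `select_cluster`, the toy frame and its column order are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

def select_cluster(merged, label, first_feature_col=5):
    """Return column 0 plus every column from `first_feature_col` on, for one cluster label."""
    keep = [0] + list(range(first_feature_col, merged.shape[1]))
    return merged.loc[merged['Cluster Labels'] == label, merged.columns[keep]]

# Hypothetical frame with an assumed layout:
# Neighborhood, Region, Latitude, Longitude, Cluster Labels, Total_hangouts
toy = pd.DataFrame({
    'Neighborhood': ['N1', 'N2', 'N3'],
    'Region': ['West Chennai'] * 3,
    'Latitude': [13.0, 13.1, 13.2],
    'Longitude': [80.0, 80.1, 80.2],
    'Cluster Labels': [4, 2, 2],
    'Total_hangouts': [9, 0, 0],
})

# One dictionary entry per cluster label instead of one cell per cluster
clusters = {int(label): select_cluster(toy, label)
            for label in sorted(toy['Cluster Labels'].unique())}
print(clusters[2])
```

Keeping the clusters in a dictionary keyed by label also makes it easier to loop over them in later steps.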


**Step-17** - Identify the most-happening clusters among these 15 clusters. The most-happening clusters are chennai_north_cluster2, chennai_north_cluster4, chennai_west_cluster1, chennai_west_cluster4, chennai_south_east_cluster1 and chennai_south_east_cluster3. Put their neighborhoods into dataframes for further analysis.

In [120]:
chennai_north_cluster2_neighborhoods=chennai_north_cluster2['Neighborhood'].tolist()
chennai_north_cluster4_neighborhoods=chennai_north_cluster4['Neighborhood'].tolist()
chennai_west_cluster1_neighborhoods=chennai_west_cluster1['Neighborhood'].tolist()
chennai_west_cluster4_neighborhoods=chennai_west_cluster4['Neighborhood'].tolist()
chennai_south_east_cluster1_neighborhoods=chennai_south_east_cluster1['Neighborhood'].tolist()
chennai_south_east_cluster3_neighborhoods=chennai_south_east_cluster3['Neighborhood'].tolist()
cols = ['Neighborhood', 'Region', 'Cluster Labels']

# Keep 'Neighborhood' as a regular column; the .isin() filters below select on it.

# .copy() makes independent frames, so the column assignments below do not raise SettingWithCopyWarning
chennai_north_cluster2_neighborhood_loc = chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Neighborhood'].isin(chennai_north_cluster2_neighborhoods)].copy()
chennai_north_cluster4_neighborhood_loc = chennai_tar_north_hangouts_merged.loc[chennai_tar_north_hangouts_merged['Neighborhood'].isin(chennai_north_cluster4_neighborhoods)].copy()
chennai_north_cluster2_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_north_cluster2_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_north_cluster2_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_north_cluster2_neighborhood_loc = chennai_north_cluster2_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_north_cluster2_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]
chennai_north_cluster4_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_north_cluster4_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_north_cluster4_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_north_cluster4_neighborhood_loc = chennai_north_cluster4_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_north_cluster4_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]

# .copy() makes independent frames, so the column assignments below do not raise SettingWithCopyWarning
chennai_west_cluster1_neighborhood_loc = chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Neighborhood'].isin(chennai_west_cluster1_neighborhoods)].copy()
chennai_west_cluster4_neighborhood_loc = chennai_tar_west_hangouts_merged.loc[chennai_tar_west_hangouts_merged['Neighborhood'].isin(chennai_west_cluster4_neighborhoods)].copy()
chennai_west_cluster1_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_west_cluster1_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_west_cluster1_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_west_cluster1_neighborhood_loc = chennai_west_cluster1_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_west_cluster1_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]
chennai_west_cluster4_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_west_cluster4_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_west_cluster4_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_west_cluster4_neighborhood_loc = chennai_west_cluster4_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_west_cluster4_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]

# .copy() makes independent frames, so the column assignments below do not raise SettingWithCopyWarning
chennai_south_east_cluster1_neighborhood_loc = chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Neighborhood'].isin(chennai_south_east_cluster1_neighborhoods)].copy()
chennai_south_east_cluster3_neighborhood_loc = chennai_tar_south_east_hangouts_merged.loc[chennai_tar_south_east_hangouts_merged['Neighborhood'].isin(chennai_south_east_cluster3_neighborhoods)].copy()
chennai_south_east_cluster1_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_south_east_cluster1_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_south_east_cluster1_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_south_east_cluster1_neighborhood_loc = chennai_south_east_cluster1_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_south_east_cluster1_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]
chennai_south_east_cluster3_neighborhood_loc['Neighbourhood-Region-Cluster'] = chennai_south_east_cluster3_neighborhood_loc[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)+'_Cluster'
chennai_south_east_cluster3_neighborhood_loc.drop(columns=['Region','Cluster Labels'],inplace=True)
chennai_south_east_cluster3_neighborhood_loc = chennai_south_east_cluster3_neighborhood_loc[ ['Neighborhood','Neighbourhood-Region-Cluster'] + [ col for col in chennai_south_east_cluster3_neighborhood_loc.columns if (col != 'Neighborhood' and col != 'Neighbourhood-Region-Cluster') ] ]
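The cell above applies the same filter, label, drop and reorder sequence to six clusters. A sketch of one way to express that sequence once, assuming every merged frame has 'Neighborhood', 'Region' and 'Cluster Labels' columns. The helper name `label_cluster_rows` and the second row of the example frame are hypothetical; the Anna Nagar row uses values shown in this notebook's outputs.

```python
import pandas as pd

LABEL_COLS = ['Neighborhood', 'Region', 'Cluster Labels']
LABEL_NAME = 'Neighbourhood-Region-Cluster'

def label_cluster_rows(merged, neighborhoods):
    """Filter `merged` to `neighborhoods` and add the combined label column."""
    # .copy() returns an independent frame, so adding a column below
    # does not trigger pandas' SettingWithCopyWarning
    out = merged.loc[merged['Neighborhood'].isin(neighborhoods)].copy()
    out[LABEL_NAME] = out[LABEL_COLS].astype(str).apply(
        lambda row: '_'.join(row), axis=1) + '_Cluster'
    out = out.drop(columns=['Region', 'Cluster Labels'])
    # Identifying columns first, remaining columns in their original order
    rest = [c for c in out.columns if c not in ('Neighborhood', LABEL_NAME)]
    return out[['Neighborhood', LABEL_NAME] + rest]

merged = pd.DataFrame({
    'Neighborhood': ['Anna Nagar', 'N2'],            # 'N2' is a made-up row
    'Region': ['West Chennai', 'West Chennai'],
    'Latitude': [13.085, 13.1],
    'Longitude': [80.2101, 80.1],
    'Cluster Labels': [4, 2],
    'Total_hangouts': [9, 0],
})
result = label_cluster_rows(merged, ['Anna Nagar'])
print(result)
```

Calling the helper once per (frame, neighborhood list) pair would replace the six repeated blocks.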



**Step-18**: Split these clusters into tier-1 and tier-2 locations using a threshold on Total_hangouts: clusters whose neighborhoods have more than 6 hangouts are tier-1, the rest are tier-2.

In [121]:
# Dataframes of the clusters assigned to each tier
temp_data_list_tier1 = [chennai_north_cluster2_neighborhood_loc, chennai_south_east_cluster3_neighborhood_loc, chennai_west_cluster4_neighborhood_loc]
temp_data_list_tier2 = [chennai_south_east_cluster1_neighborhood_loc, chennai_west_cluster1_neighborhood_loc, chennai_north_cluster4_neighborhood_loc]
# Combine each tier into one dataframe, busiest neighborhoods first
chennai_potential_tier1_locations = pd.concat(temp_data_list_tier1).sort_values(by='Total_hangouts', ascending=False)
chennai_potential_tier2_locations = pd.concat(temp_data_list_tier2).sort_values(by='Total_hangouts', ascending=False)
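The cell above assigns whole clusters to a tier by hand. As an alternative sketch, the same split could be derived from the threshold directly, row by row. This assumes tier-1 means strictly more than 6 hangouts; the counts are a subset of rows from this notebook's output tables, and the name `THRESHOLD` is introduced here for illustration.

```python
import pandas as pd

THRESHOLD = 6  # assumed rule: tier-1 means Total_hangouts strictly above this value

# A subset of rows taken from this notebook's output tables
candidates = pd.DataFrame({
    'Neighborhood': ['Besant Nagar', 'Anna Nagar', 'Broadway',
                     'Gopalapuram', 'Ashok Nagar', "Parry's Corner"],
    'Total_hangouts': [9, 9, 7, 6, 6, 3],
})

is_tier1 = candidates['Total_hangouts'] > THRESHOLD
tier1 = candidates[is_tier1].sort_values(by='Total_hangouts', ascending=False)
tier2 = candidates[~is_tier1].sort_values(by='Total_hangouts', ascending=False)

print(tier1['Neighborhood'].tolist())
print(tier2['Neighborhood'].tolist())
```

Splitting on the value rather than on cluster names keeps the tiers correct if the clustering is rerun and the cluster labels change.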

**Step-19**: Display these tier-1 and tier-2 locations.
- Tier-1: chennai_north_cluster2_neighborhood_loc, chennai_south_east_cluster3_neighborhood_loc, chennai_west_cluster4_neighborhood_loc
- Tier-2: chennai_south_east_cluster1_neighborhood_loc, chennai_west_cluster1_neighborhood_loc, chennai_north_cluster4_neighborhood_loc

In [122]:
chennai_potential_tier1_locations

Unnamed: 0,Neighborhood,Neighbourhood-Region-Cluster,Latitude,Longitude,Total_hangouts
14,Besant Nagar,Besant Nagar_South and East Chennai_3_Cluster,13.0003,80.2667,9
7,Anna Nagar,Anna Nagar_West Chennai_4_Cluster,13.085,80.2101,9
15,Broadway,Broadway_North Chennai_2_Cluster,19.2274,72.9739,7
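In the table above, the Broadway row's coordinates (19.2274, 72.9739) lie far from the other two rows, which both sit near 13°N, 80°E. This may indicate a geocoding mismatch. A minimal sketch of a sanity check that flags points outside a rough bounding box follows; the box limits are illustrative assumptions, not values from this notebook.

```python
import pandas as pd

# Assumed, deliberately generous bounding box around Chennai (hypothetical limits)
LAT_MIN, LAT_MAX = 12.5, 13.5
LON_MIN, LON_MAX = 79.8, 80.6

# Coordinates as shown in the tier-1 output table
tier1 = pd.DataFrame({
    'Neighborhood': ['Besant Nagar', 'Anna Nagar', 'Broadway'],
    'Latitude': [13.0003, 13.085, 19.2274],
    'Longitude': [80.2667, 80.2101, 72.9739],
})

inside = (tier1['Latitude'].between(LAT_MIN, LAT_MAX)
          & tier1['Longitude'].between(LON_MIN, LON_MAX))
suspect = tier1[~inside]  # rows whose coordinates fall outside the box
print(suspect)
```

Rows flagged this way could be geocoded again with a more specific query before they are plotted.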


In [123]:
chennai_potential_tier2_locations

Unnamed: 0,Neighborhood,Neighbourhood-Region-Cluster,Latitude,Longitude,Total_hangouts
26,Gopalapuram,Gopalapuram_South and East Chennai_1_Cluster,13.0489,80.2586,6
10,Ashok Nagar,Ashok Nagar_West Chennai_1_Cluster,13.0373,80.2123,6
1,Adyar,Adyar_South and East Chennai_1_Cluster,13.0012,80.2565,5
122,T. Nagar,T. Nagar_South and East Chennai_1_Cluster,13.0418,80.2341,5
125,Taramani,Taramani_South and East Chennai_1_Cluster,12.9863,80.2432,5
139,Vadapalani,Vadapalani_West Chennai_1_Cluster,13.05,80.2121,5
135,Thousand Lights,Thousand Lights_South and East Chennai_1_Cluster,13.0617,80.2544,4
4,Alwarthirunagar,Alwarthirunagar_West Chennai_1_Cluster,13.0426,80.184,4
34,K.K. Nagar,K.K. Nagar_West Chennai_1_Cluster,13.041,80.1994,4
93,Parry's Corner,Parry's Corner_North Chennai_4_Cluster,13.0896,80.2882,3


**Step-20** - Create folium maps for these tier-1 and tier-2 locations.

In [125]:
map_ch_tier1 = folium.Map(location=[latitude, longitude], zoom_start=9)
# Add one circle marker per tier-1 neighborhood, labelled with its region and cluster
for lat, lng, neigh_region_cluster in zip(chennai_potential_tier1_locations['Latitude'],
                                          chennai_potential_tier1_locations['Longitude'],
                                          chennai_potential_tier1_locations['Neighbourhood-Region-Cluster']):
    label = folium.Popup(str(neigh_region_cluster), parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7).add_to(map_ch_tier1)
    # Alternative: permanent text labels instead of popups
    #folium.Marker([lat, lng], icon=folium.DivIcon(icon_size=(150,36), icon_anchor=(5,5),html='<div style="font-size: 8pt; color : red">{}</div>'.format(neigh_region_cluster))).add_to(map_ch_tier1)
map_ch_tier1

In [126]:
map_ch_tier2 = folium.Map(location=[latitude, longitude], zoom_start=12)
# Add one circle marker per tier-2 neighborhood, labelled with its region and cluster
for lat, lng, neigh_region_cluster in zip(chennai_potential_tier2_locations['Latitude'],
                                          chennai_potential_tier2_locations['Longitude'],
                                          chennai_potential_tier2_locations['Neighbourhood-Region-Cluster']):
    label = folium.Popup(str(neigh_region_cluster), parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7).add_to(map_ch_tier2)
    # Alternative: permanent text labels instead of popups
    #folium.Marker([lat, lng], icon=folium.DivIcon(icon_size=(150,36), icon_anchor=(5,5),html='<div style="font-size: 8pt; color : red">{}</div>'.format(neigh_region_cluster))).add_to(map_ch_tier2)
map_ch_tier2
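Both maps are centred on `latitude, longitude` with a hand-picked `zoom_start`. A sketch of deriving the centre and a bounding box from the plotted points instead, using a subset of the tier-2 coordinates shown earlier. The resulting `bounds` list has the `[[south, west], [north, east]]` shape that folium's `Map.fit_bounds` takes; whether that behaves as wanted with the folium version pinned in this notebook is an assumption to verify.

```python
import pandas as pd

# A subset of the tier-2 rows from the output table above
points = pd.DataFrame({
    'Neighborhood': ['Gopalapuram', 'Ashok Nagar', 'Taramani', "Parry's Corner"],
    'Latitude': [13.0489, 13.0373, 12.9863, 13.0896],
    'Longitude': [80.2586, 80.2123, 80.2432, 80.2882],
})

# Centre of the plotted points
center = [points['Latitude'].mean(), points['Longitude'].mean()]

# Smallest box that contains every plotted point
bounds = [
    [points['Latitude'].min(), points['Longitude'].min()],  # south-west corner
    [points['Latitude'].max(), points['Longitude'].max()],  # north-east corner
]
print(center)
print(bounds)
```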

## Results and Discussion <a name="results"></a>

The analysis shows three hot-spot regions in Chennai: North, South-East and West. They are identified from the 'Total_hangouts' count summed per 'Region':
- North: 39
- South-East: 70
- West: 57
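A minimal sketch of the per-region aggregation described above. The frame below is hypothetical (region names A to D and their counts are made up); the real per-neighborhood frame is built earlier in the notebook.

```python
import pandas as pd

# Hypothetical per-neighborhood hangout counts
hangouts = pd.DataFrame({
    'Region': ['A', 'A', 'B', 'B', 'C', 'D'],
    'Total_hangouts': [7, 3, 9, 9, 12, 2],
})

# Sum hangouts per region, largest total first
region_totals = (hangouts.groupby('Region')['Total_hangouts']
                 .sum()
                 .sort_values(ascending=False))

# Keep the three busiest regions as hot spots
hot_spots = region_totals.head(3)
print(hot_spots)
```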

Once the regions are identified, each region is divided into 5 clusters, giving 15 clusters in total. Among these 15 clusters, the most-happening neighborhood clusters are identified based on 'Total_hangouts': chennai_north_cluster2, chennai_north_cluster4, chennai_west_cluster1, chennai_west_cluster4, chennai_south_east_cluster1 and chennai_south_east_cluster3.
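One way such a selection could be made reproducible is to rank each region's clusters by their mean 'Total_hangouts'. The sketch below uses a subset of rows from the West Chennai cluster tables earlier in this notebook. Taking the top two clusters per region is an assumption, although it does match the two West Chennai clusters named above.

```python
import pandas as pd

# A subset of rows from the West Chennai cluster tables
west = pd.DataFrame({
    'Neighborhood': ['Alapakkam', 'Ashok Nagar', 'Vadapalani',
                     'Ambattur', 'Arumbakkam', 'Anna Nagar'],
    'Cluster Labels': [0, 1, 1, 2, 3, 4],
    'Total_hangouts': [2, 6, 5, 0, 1, 9],
})

# Mean hangouts per cluster, busiest cluster first
cluster_means = (west.groupby('Cluster Labels')['Total_hangouts']
                 .mean()
                 .sort_values(ascending=False))

# Assumed rule: the two busiest clusters count as "most-happening"
most_happening = cluster_means.head(2).index.tolist()
print(most_happening)
```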

Tier-1 and tier-2 locations are split on a 'Total_hangouts' threshold: neighborhoods with 7 or more hangouts (that is, more than 6) are tier-1 and the rest are tier-2.

Tier-1 locations come from chennai_north_cluster2_neighborhood_loc, chennai_south_east_cluster3_neighborhood_loc and chennai_west_cluster4_neighborhood_loc:

| Neighborhood | Neighbourhood-Region-Cluster | Total_hangouts |
|---|---|---|
| Besant Nagar | Besant Nagar_South and East Chennai_3_Cluster | 9 |
| Anna Nagar | Anna Nagar_West Chennai_4_Cluster | 9 |
| Broadway | Broadway_North Chennai_2_Cluster | 7 |

Tier-2 locations come from chennai_south_east_cluster1_neighborhood_loc, chennai_west_cluster1_neighborhood_loc and chennai_north_cluster4_neighborhood_loc:

| Neighborhood | Neighbourhood-Region-Cluster | Total_hangouts |
|---|---|---|
| Gopalapuram | Gopalapuram_South and East Chennai_1_Cluster | 6 |
| Ashok Nagar | Ashok Nagar_West Chennai_1_Cluster | 6 |
| Adyar | Adyar_South and East Chennai_1_Cluster | 5 |
| T. Nagar | T. Nagar_South and East Chennai_1_Cluster | 5 |
| Taramani | Taramani_South and East Chennai_1_Cluster | 5 |
| Vadapalani | Vadapalani_West Chennai_1_Cluster | 5 |
| Thousand Lights | Thousand Lights_South and East Chennai_1_Cluster | 4 |
| Alwarthirunagar | Alwarthirunagar_West Chennai_1_Cluster | 4 |
| K.K. Nagar | K.K. Nagar_West Chennai_1_Cluster | 4 |
| Parry's Corner | Parry's Corner_North Chennai_4_Cluster | 3 |



The recommended tier-1 and tier-2 locations should therefore be considered only as a starting point for more detailed analysis. Setting up the 'hang-out themed snack bar' at any of them would still require taking other factors into account (real estate, nearby landmarks such as offices and colleges) and meeting all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The purpose of this project is to identify the most-happening locations among Chennai neighborhoods, in order to help stakeholders narrow down the search for an optimal location for a new 'hang-out themed snack bar'. By calculating venue density distributions from Foursquare data, the hot-spot regions are identified first. The neighborhoods in each hot-spot region are then clustered, and the busiest clusters yield the major neighborhoods of interest (the tier-1 and tier-2 potential locations). These will be used as starting points for final exploration by stakeholders.

The final decision on the optimal '**hang-out themed snack bar**' location will be made by stakeholders based on the specific characteristics of each neighborhood, taking into consideration additional factors such as real estate availability, prices, the social and economic dynamics of every neighborhood, and nearby landmarks such as offices, colleges and schools.