# Capstone Project

### Problem Description

#### The client wants to open a restaurant in Chennai, Tamil Nadu, India. He plans a large Chinese restaurant, so he wants it in a main area of the city with high footfall, but not in a place that already has many established Chinese restaurants. The goal of the project is to find an area in Chennai that has the most crowd-drawing places and the fewest Chinese restaurants.

In [1]:
#to get location data using pincodes
pip install pgeocode

Collecting pgeocode
  Downloading pgeocode-0.3.0-py3-none-any.whl (8.5 kB)
Installing collected packages: pgeocode
Successfully installed pgeocode-0.3.0
Note: you may need to restart the kernel to use updated packages.


In [2]:
#for maps
pip install folium

Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1
Note: you may need to restart the kernel to use updated packages.


In [3]:
import pgeocode
import folium
import pandas as pd
import numpy as np
import requests #request for Foursquare API
import json
from pandas import json_normalize # pandas.io.json.json_normalize is deprecated
from sklearn.cluster import KMeans #clustering algorithm
import matplotlib.cm as cm #for maps
import matplotlib.colors as colors #for maps

### Creating a dataframe which has the pincodes for all of the areas in Chennai.

In [4]:
read_page = pd.read_html('https://www.mapsofindia.com/pincode/india/tamil-nadu/chennai/')

In [5]:
pcodes_df = pd.DataFrame(data = read_page[0])
pcodes_df.columns = pcodes_df.iloc[0]
pcodes_df.head()

In [6]:
only_pincodes = pcodes_df[['Pincode']]
pincodes = only_pincodes.dropna() #dropping rows because some of the pincodes didn't have values
pincodes.reset_index(inplace = True)

In [7]:
location_list = []

### Using the pincodes from above to get the areas and their associated latitudes and longitudes using the pgeocode library

In [8]:
# The areas in Chennai are given a pincode. Some of the areas may have the same pincode, so they are combined as one area. 
# (like pincode 600018 for Pr. Accountant General and Teynampet)
nomi = pgeocode.Nominatim('in') # create the geocoder once instead of on every iteration
for pin in pincodes['Pincode']:
    result = nomi.query_postal_code(pin)
    location_list.append([result.postal_code, result.place_name, result.latitude, result.longitude])

In [9]:
location_list[0:5] #list which contains all the details as a list

[['600018', 'Pr. Accountant General, Teynampet', 13.0433, 80.2528],
 ['600020',
  'Shastri Nagar (Chennai), Kasturibai Nagar, Adyar (Chennai)',
  12.9967,
  80.2603],
 ['600082', 'Periyar Nagar, Kumaran Nagar, G K M Colony', 13.0572, 80.2554],
 ['600029', nan, nan, nan],
 ['600040', 'Anna Nagar (Chennai)', 12.8819, 80.0885]]

In [10]:
places_with_location_df = pd.DataFrame(location_list, columns = ['Pincode','Places','Latitude','Longitude']) #converting the list into dataframe

In [11]:
places_with_location_df.head()

,Pincode,Places,Latitude,Longitude
0,600018,"Pr. Accountant General, Teynampet",13.0433,80.2528
1,600020,"Shastri Nagar (Chennai), Kasturibai Nagar, Ady...",12.9967,80.2603
2,600082,"Periyar Nagar, Kumaran Nagar, G K M Colony",13.0572,80.2554
3,600029,,,
4,600040,Anna Nagar (Chennai),12.8819,80.0885


### Removing the areas with empty location values

In [12]:
chennai_data = places_with_location_df.dropna()

In [13]:
chennai_data.shape

(55, 4)

In [14]:
#Chennai city latitude and longitude to view in the map
chennai_lat = 13.067439
chennai_long = 80.237617

### Creating a map to view the areas within Chennai

In [15]:
chennai_map = folium.Map(location=[chennai_lat, chennai_long], zoom_start=10)
for lat, lng, places in zip(chennai_data['Latitude'], chennai_data['Longitude'], chennai_data['Places']):
    label = folium.Popup(places, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(chennai_map)

chennai_map

### Creating a Foursquare connection to get the venues in each area

In [16]:
CLIENT_ID = 'YOUR_CLIENT_ID' # Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # Foursquare Secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # Foursquare Access Token
VERSION = '20180605' # API version

### Exploring one area (Anna Nagar) using the Foursquare API

In [17]:
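# Create the URL to make a request to the Foursquare API
# A limit of 100 venues is used for all the areas
LIMIT = 100
# coordinates returned by pgeocode for "Anna Nagar (Chennai)" (pincode 600040);
# the first venue returned below is Vandalur zoo, so they may not point at Anna Nagar itself
latitude = 12.8819
longitude = 80.0885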
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude,  
    LIMIT)

results = requests.get(url).json()
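
The request URL above is assembled with `.format`, and no `radius` is passed. A minimal sketch of the same URL built from a dict of query parameters is shown below; it makes optional parameters easy to add. `build_explore_url` is a hypothetical helper, and the `radius` parameter (assumed to be in metres) is an assumption about the v2 `venues/explore` endpoint, not something this notebook uses.

```python
from urllib.parse import urlencode

# Base endpoint, copied from the request used in this notebook
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, limit, radius=None):
    """Hypothetical helper: build a venues/explore URL from query parameters.

    `radius` (assumed to be metres) is only added when it is given.
    """
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',
        'limit': limit,
    }
    if radius is not None:
        params['radius'] = radius
    # urlencode escapes each value, e.g. the comma in 'll' becomes %2C
    return EXPLORE_URL + '?' + urlencode(params)


url_default = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                                12.8819, 80.0885, 100)
url_1km = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                            12.8819, 80.0885, 100, radius=1000)
print(url_1km)
```

`requests.get(EXPLORE_URL, params=params)` would build the same query string itself, so the dict could also be passed to `requests` directly.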

In [18]:
# function to retrieve the category of the venue from the json result
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # after json_normalize the key is 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### Converting the resulting venues of Anna Nagar to a dataframe

In [19]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


,name,categories,lat,lng
0,Vandalur zoo,Zoo Exhibit,12.879441,80.081328
1,Kumbakonam Degree Coffee,Coffee Shop,12.833756,80.048721
2,ibaco selaiyur,Ice Cream Shop,12.92228,80.138609
3,Domino's Pizza,Pizza Place,12.926,80.108
4,Honey Spice,Multicuisine Indian Restaurant,12.870879,80.07668
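
The flattening step can be tried offline on a small hand-made response. The sketch below assumes the response items only carry the keys this notebook reads (`name`, `location.lat`, `location.lng`, `categories`); the venues are made up, and `get_category_type` is a simplified variant that only handles the flattened `venue.categories` column.

```python
import pandas as pd

# Made-up stand-in for results['response']['groups'][0]['items'];
# only the keys used in this notebook are included.
items = [
    {'venue': {'name': 'Cafe A',
               'location': {'lat': 12.88, 'lng': 80.08},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Stall B',
               'location': {'lat': 12.89, 'lng': 80.09},
               'categories': []}},
]


def get_category_type(row):
    # json_normalize keeps the list of categories as a single column value
    categories_list = row['venue.categories']
    return categories_list[0]['name'] if len(categories_list) > 0 else None


flat = pd.json_normalize(items)  # nested keys become dotted column names
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
flat = flat.loc[:, filtered_columns]
flat['venue.categories'] = flat.apply(get_category_type, axis=1)
flat.columns = [col.split('.')[-1] for col in flat.columns]  # 'venue.location.lat' -> 'lat'
print(flat)
```

Lists inside the JSON are not expanded by `json_normalize`, which is why the category name still has to be pulled out of the list row by row.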


### Creating the function to explore all the areas in Chennai

In [20]:
def getNearbyVenues(names, latitude, longitude):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitude, longitude):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        # (a venue without categories gets None instead of raising IndexError)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Places', 
                  'Latitude of the place', 
                  'Longitude of the place', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [21]:
#running the above function
chennai_venues = getNearbyVenues(names=chennai_data['Places'],
                                   latitude=chennai_data['Latitude'],
                                   longitude=chennai_data['Longitude']
                                  )

Pr. Accountant General, Teynampet
Shastri Nagar (Chennai), Kasturibai Nagar, Adyar (Chennai)
Periyar Nagar, Kumaran Nagar, G K M Colony
Anna Nagar (Chennai)
Anna Nagar East
Anna Nagar Western Extn
Chintadripet, Anna Road H.O, Madras Electricity System
Arumbakkam, D G Vaishnav College
Jafferkhanpet, Ashoknagar (Chennai)
Aynavaram
Besantnagar, Rajaji Bhavan
Washermanpet, Washermanpet East
Tidel Park, TTTI Taramani
Ekkaduthangal, Guindy Industrial Estate
Sowcarpet, Mannady (Chennai), Chennai G.P.O., Flower Bazaar, Mint Building, Govt Stanley Hospital
Chepauk, Tiruvallikkeni, Parthasarathy Koil, Madras University
Chetput, World University Centre
Choolaimedu
Venkatesapuram, Puliyanthope, Perambur Barracks
Engineering College (Chennai)
Madras Medical College, Park Town H.O, Edapalayam, Ripon Buildings
Egmore, Ethiraj Salai, Egmore ND
Erukkancheri, Kodungaiyur, Rv Nagar
Flowers Road
Raja Annamalaipuram, Ramakrishna Nagar (Chennai)
Fort St George
Royapettah, Lloyds Estate
Gopalapuram (Chennai)

In [23]:
chennai_venues.shape #all the venues from all the areas in the dataset

(4570, 7)

In [24]:
chennai_venues.head()

,Places,Latitude of the place,Longitude of the place,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Pr. Accountant General, Teynampet",13.0433,80.2528,Stix,13.042898,80.24864,Chinese Restaurant
1,"Pr. Accountant General, Teynampet",13.0433,80.2528,Hyatt Regency Chennai,13.042874,80.248593,Hotel
2,"Pr. Accountant General, Teynampet",13.0433,80.2528,Ente Keralam,13.042188,80.25588,Kerala Restaurant
3,"Pr. Accountant General, Teynampet",13.0433,80.2528,Z The Tapas Bar & Restaurant,13.045808,80.258013,Bar
4,"Pr. Accountant General, Teynampet",13.0433,80.2528,Nando's,13.046237,80.256267,African Restaurant


In [25]:
chennai_venues.groupby('Places').count() #to view how many venues Foursquare has returned for each area

Places,Latitude of the place,Longitude of the place,Venue,Venue Latitude,Venue Longitude,Venue Category
Anna Nagar (Chennai),39,39,39,39,39,39
Anna Nagar East,100,100,100,100,100,100
Anna Nagar Western Extn,100,100,100,100,100,100
"Arumbakkam, D G Vaishnav College",100,100,100,100,100,100
Aynavaram,100,100,100,100,100,100
"Besantnagar, Rajaji Bhavan",30,30,30,30,30,30
"Chepauk, Tiruvallikkeni, Parthasarathy Koil, Madras University",100,100,100,100,100,100
"Chetput, World University Centre",100,100,100,100,100,100
"Chintadripet, Anna Road H.O, Madras Electricity System",100,100,100,100,100,100
Choolaimedu,30,30,30,30,30,30


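### As no radius was given in the URL request, each search can cover a wide area, so the same venue name may appear several times for one area (for example, several outlets of the same chain). These repeated rows must be cleaned.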

In [27]:
chennai_venues.loc[chennai_venues['Venue'] == 'Subway'] #checking for duplicate values for the venue Subway

,Places,Latitude of the place,Longitude of the place,Venue,Venue Latitude,Venue Longitude,Venue Category
38,"Pr. Accountant General, Teynampet",13.0433,80.2528,Subway,13.054787,80.249618,Sandwich Place
96,"Pr. Accountant General, Teynampet",13.0433,80.2528,Subway,13.049846,80.256958,Sandwich Place
162,"Shastri Nagar (Chennai), Kasturibai Nagar, Ady...",12.9967,80.2603,Subway,12.989579,80.248830,Sandwich Place
175,"Shastri Nagar (Chennai), Kasturibai Nagar, Ady...",12.9967,80.2603,Subway,12.999040,80.254743,Sandwich Place
193,"Shastri Nagar (Chennai), Kasturibai Nagar, Ady...",12.9967,80.2603,Subway,13.017396,80.271303,Sandwich Place
...,...,...,...,...,...,...,...
4200,"Valmiki Nagar, Tiruvanmiyur, Palavakkam (Kanch...",12.9695,80.2561,Subway,12.999040,80.254743,Sandwich Place
4275,Vadapalani,13.0511,80.2125,Subway,13.038853,80.212274,Sandwich Place
4279,Vadapalani,13.0511,80.2125,Subway,13.047275,80.194957,Sandwich Place
4348,Velacheri,13.0647,80.2523,Subway,13.061403,80.248521,Sandwich Place


### Subway appears more than once for the same area (for example, twice for Teynampet, at different coordinates), and other venue names are probably repeated in the same way. The dataframe currently contains 4570 rows.

In [28]:
chennai_venues.drop_duplicates(subset = ['Places', 'Venue'], keep = 'last', inplace = True) #dropping duplicate values

In [29]:
chennai_venues.shape

(4252, 7)
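
The choice of `subset` decides what counts as a duplicate. The toy sketch below (made-up coordinates) compares the notebook's `subset=['Places', 'Venue']`, which keeps one row per venue name per area, with a subset that also includes the venue coordinates, which only drops rows that repeat the same outlet.

```python
import pandas as pd

# Hypothetical rows: two Subway outlets in one area, one of them listed twice
df = pd.DataFrame({
    'Places': ['Teynampet', 'Teynampet', 'Teynampet', 'Adyar'],
    'Venue': ['Subway', 'Subway', 'Subway', 'Subway'],
    'Venue Latitude': [13.0548, 13.0498, 13.0498, 12.9990],
    'Venue Longitude': [80.2496, 80.2570, 80.2570, 80.2547],
})

# One row per venue name in each area (what the notebook does)
by_name = df.drop_duplicates(subset=['Places', 'Venue'], keep='last')

# Distinct outlets are kept; only exact repeats of the same outlet are dropped
by_name_and_position = df.drop_duplicates(
    subset=['Places', 'Venue', 'Venue Latitude', 'Venue Longitude'], keep='last')

print(len(by_name))               # 2
print(len(by_name_and_position))  # 3
```

Which variant fits depends on whether several branches of the same chain in one area should be counted separately.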

In [30]:
chennai_venues.loc[chennai_venues['Venue'] == 'Subway']

,Places,Latitude of the place,Longitude of the place,Venue,Venue Latitude,Venue Longitude,Venue Category
96,"Pr. Accountant General, Teynampet",13.0433,80.2528,Subway,13.049846,80.256958,Sandwich Place
196,"Shastri Nagar (Chennai), Kasturibai Nagar, Ady...",12.9967,80.2603,Subway,12.986004,80.245865,Sandwich Place
288,"Periyar Nagar, Kumaran Nagar, G K M Colony",13.0572,80.2554,Subway,13.058682,80.264231,Sandwich Place
323,Anna Nagar (Chennai),12.8819,80.0885,Subway,12.823322,80.044616,Sandwich Place
405,Anna Nagar East,13.0974,80.195,Subway,13.094722,80.169145,Fast Food Restaurant
505,Anna Nagar Western Extn,13.0974,80.195,Subway,13.094722,80.169145,Fast Food Restaurant
603,"Chintadripet, Anna Road H.O, Madras Electricit...",13.0744,80.2714,Subway,13.058682,80.264231,Sandwich Place
639,"Arumbakkam, D G Vaishnav College",13.0734,80.2069,Subway,13.082455,80.210927,Sandwich Place
822,"Jafferkhanpet, Ashoknagar (Chennai)",13.0582,80.24,Subway,13.069381,80.237896,Restaurant
915,Aynavaram,13.0484,80.2473,Subway,13.049846,80.256958,Sandwich Place


In [31]:
chennai_venues.loc[chennai_venues['Venue'] == 'Subway'].count()

Places                    44
Latitude of the place     44
Longitude of the place    44
Venue                     44
Venue Latitude            44
Venue Longitude           44
Venue Category            44
dtype: int64

### Each area now has at most one Subway row (44 in total), and the dataframe has shrunk from 4570 to 4252 rows, so every venue name now appears at most once per area.

### The dataset is one-hot encoded in order to find the most common venue categories in each area of Chennai.
### This is needed because we want to know how many restaurants and other common places each area has, so that we can decide which area to choose.
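
A small sketch of the encoding and the grouping step that follows it, on made-up venues for two hypothetical areas: each category becomes a 0/1 column, and the mean of that column within an area is the share of the area's returned venues in that category.

```python
import pandas as pd

# Made-up venues for two hypothetical areas
venues = pd.DataFrame({
    'Places': ['Area A', 'Area A', 'Area A', 'Area A', 'Area B', 'Area B'],
    'Venue Category': ['Indian Restaurant', 'Indian Restaurant', 'Chinese Restaurant',
                       'Multiplex', 'Bakery', 'Multiplex'],
})

# One column per category; the empty prefix keeps the plain category names
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'Places', venues['Places'])  # put Places first in one step

# Mean of a 0/1 column = share of that area's venues in the category
shares = onehot.groupby('Places').mean()
print(shares)
```

Here Area A gets 0.5 for Indian Restaurant (2 of its 4 venues). `DataFrame.insert` places the column first directly, which has the same effect as rebuilding the column list.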

In [32]:
# one hot encoding
chennai_onehot = pd.get_dummies(chennai_venues[['Venue Category']], prefix="", prefix_sep="")

# add the Places column back to the dataframe
chennai_onehot['Places'] = chennai_venues['Places']

# move the Places column to the first position
fixed_columns = [chennai_onehot.columns[-1]] + list(chennai_onehot.columns[:-1])
chennai_onehot = chennai_onehot[fixed_columns]

chennai_onehot.head()

,Places,African Restaurant,Airport Lounge,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,...,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Video Store,Women's Store,Zoo,Zoo Exhibit
0,"Pr. Accountant General, Teynampet",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Pr. Accountant General, Teynampet",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Pr. Accountant General, Teynampet",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Pr. Accountant General, Teynampet",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Pr. Accountant General, Teynampet",1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [33]:
chennai_onehot.shape

(4252, 149)

### The rows are grouped by area and averaged, so each value is the share of that area's venues in a category, and the most common category in an area has the highest value.

In [34]:
chennai_onehot_grouped = chennai_onehot.groupby('Places').mean().reset_index()
chennai_onehot_grouped.head()

,Places,African Restaurant,Airport Lounge,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,...,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Video Store,Women's Store,Zoo,Zoo Exhibit
0,Anna Nagar (Chennai),0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.03125
1,Anna Nagar East,0.0,0.0,0.01087,0.0,0.0,0.01087,0.0,0.01087,0.032609,...,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0
2,Anna Nagar Western Extn,0.0,0.0,0.01087,0.0,0.0,0.01087,0.0,0.01087,0.032609,...,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0
3,"Arumbakkam, D G Vaishnav College",0.0,0.0,0.011236,0.0,0.0,0.011236,0.0,0.011236,0.022472,...,0.0,0.0,0.0,0.0,0.0,0.067416,0.0,0.0,0.0,0.0
4,Aynavaram,0.010309,0.0,0.0,0.0,0.010309,0.020619,0.0,0.020619,0.010309,...,0.0,0.0,0.0,0.0,0.0,0.020619,0.0,0.010309,0.0,0.0
5,"Besantnagar, Rajaji Bhavan",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.035714,0.035714
6,"Chepauk, Tiruvallikkeni, Parthasarathy Koil, M...",0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.01087,...,0.01087,0.01087,0.0,0.0,0.021739,0.01087,0.0,0.0,0.0,0.0
7,"Chetput, World University Centre",0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.020833,...,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.010417,0.0,0.0
8,"Chintadripet, Anna Road H.O, Madras Electricit...",0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.010753,...,0.010753,0.010753,0.0,0.010753,0.010753,0.010753,0.0,0.0,0.0,0.0
9,Choolaimedu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.035714,0.035714


In [35]:
chennai_onehot_grouped.shape

(55, 149)

### A function is created to list the top 10 venue categories in each area

In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
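
The next cell labels the columns "1st", "2nd", "3rd" and then falls back to "th". A sketch of the same idea with a general ordinal helper (a hypothetical `ordinal` function, not part of the notebook) and a top-N selection on one made-up row of category shares:

```python
import pandas as pd


def ordinal(n):
    # 1st, 2nd, 3rd, 4th ... 11th, 12th, 13th ... 21st, 22nd ...
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


def top_categories(row, n):
    # row: one area's category shares (index = category names)
    return list(row.sort_values(ascending=False).index[:n])


# Made-up shares for one area
freqs = pd.Series({'Indian Restaurant': 0.5, 'Chinese Restaurant': 0.3,
                   'Multiplex': 0.2, 'Bakery': 0.0})
columns = [f'{ordinal(i)} Most Common Venue' for i in range(1, 4)]
print(dict(zip(columns, top_categories(freqs, 3))))
```

When several categories have the same share (common for areas with few venues), their relative order in the top 10 is not meaningful; `sort_values(kind='mergesort')` at least makes that order stable between runs.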

In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Places']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # 4th onwards get the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
chennai_venues_sorted = pd.DataFrame(columns=columns)
chennai_venues_sorted['Places'] = chennai_onehot_grouped['Places']

for ind in np.arange(chennai_onehot_grouped.shape[0]):
    chennai_venues_sorted.iloc[ind, 1:] = return_most_common_venues(chennai_onehot_grouped.iloc[ind, :], num_top_venues)

chennai_venues_sorted

,Places,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Anna Nagar (Chennai),Indian Restaurant,Market,Coffee Shop,Pizza Place,Indie Movie Theater,Zoo Exhibit,Sports Club,Sandwich Place,Movie Theater,Multicuisine Indian Restaurant
1,Anna Nagar East,Indian Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Multiplex,Café,Clothing Store,Bus Station,Bakery,Pizza Place,Chinese Restaurant
2,Anna Nagar Western Extn,Indian Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Multiplex,Café,Clothing Store,Bus Station,Bakery,Pizza Place,Chinese Restaurant
3,"Arumbakkam, D G Vaishnav College",Indian Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Clothing Store,Multiplex,Electronics Store,Shopping Mall,Hotel,Department Store
4,Aynavaram,Indian Restaurant,Hotel,Café,Chinese Restaurant,Italian Restaurant,Ice Cream Shop,Middle Eastern Restaurant,Multiplex,Dessert Shop,South Indian Restaurant
5,"Besantnagar, Rajaji Bhavan",Indian Restaurant,Market,Indie Movie Theater,Train Station,Fast Food Restaurant,Coffee Shop,Café,Zoo,Juice Bar,Multicuisine Indian Restaurant
6,"Chepauk, Tiruvallikkeni, Parthasarathy Koil, M...",Indian Restaurant,Hotel,Café,Multiplex,Fast Food Restaurant,Middle Eastern Restaurant,Clothing Store,Beach,Movie Theater,Juice Bar
7,"Chetput, World University Centre",Indian Restaurant,Ice Cream Shop,Chinese Restaurant,Café,Coffee Shop,Italian Restaurant,Fast Food Restaurant,Asian Restaurant,Bakery,Sandwich Place
8,"Chintadripet, Anna Road H.O, Madras Electricit...",Indian Restaurant,Hotel,Clothing Store,Pizza Place,Middle Eastern Restaurant,Sandwich Place,Café,Multiplex,Seafood Restaurant,Bookstore
9,Choolaimedu,Indian Restaurant,Market,Indie Movie Theater,Train Station,Fast Food Restaurant,Coffee Shop,Café,Zoo,Juice Bar,Multicuisine Indian Restaurant


### The areas are clustered into groups with similar venue profiles.
### This way the most suitable cluster can be identified and the other clusters ignored, instead of checking every area one by one.
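
A toy sketch of what k-means does with the grouped shares: areas whose category-share vectors are close in Euclidean distance get the same label. The numbers are made up and `k=2` is chosen only for the example; the label numbers themselves are arbitrary. `n_init` is passed explicitly because recent scikit-learn versions changed its default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows = hypothetical areas, columns = shares of three venue categories
X = np.array([
    [0.60, 0.30, 0.10],
    [0.55, 0.35, 0.10],
    [0.10, 0.10, 0.80],
    [0.15, 0.05, 0.80],
])

kmeans_toy = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
print(kmeans_toy.labels_)  # first two areas share a label, last two share the other
```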

In [38]:
kclusters = 5

chennai_onehot_grouped_clustering = chennai_onehot_grouped.drop(columns='Places')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(chennai_onehot_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:] 

array([1, 2, 2, 2, 0, 1, 3, 0, 3, 1, 3, 0, 0, 3, 0, 3, 3, 2, 0, 0, 0, 1,
       4, 3, 0, 3, 3, 0, 2, 0, 4, 3, 0, 0, 0, 4, 3, 2, 3, 0, 2, 4, 3, 2,
       1, 4, 3, 0, 3, 2, 4, 3, 4, 4, 3], dtype=int32)

In [39]:
chennai_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_) #merging the cluster labels with the data

chennai_merged = chennai_data

chennai_merged = chennai_merged.join(chennai_venues_sorted.set_index('Places'), on='Places')

chennai_merged.head()

,Pincode,Places,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,600018,"Pr. Accountant General, Teynampet",13.0433,80.2528,0,Hotel,Indian Restaurant,Café,Lounge,Restaurant,Italian Restaurant,Ice Cream Shop,Chinese Restaurant,Tea Room,Park
1,600020,"Shastri Nagar (Chennai), Kasturibai Nagar, Ady...",12.9967,80.2603,0,Indian Restaurant,Ice Cream Shop,Café,Beach,Chinese Restaurant,Hotel,Fast Food Restaurant,Dessert Shop,Snack Place,Multiplex
2,600082,"Periyar Nagar, Kumaran Nagar, G K M Colony",13.0572,80.2554,3,Indian Restaurant,Hotel,Café,Multiplex,Clothing Store,Middle Eastern Restaurant,Movie Theater,Ice Cream Shop,Italian Restaurant,Donut Shop
4,600040,Anna Nagar (Chennai),12.8819,80.0885,1,Indian Restaurant,Market,Coffee Shop,Pizza Place,Indie Movie Theater,Zoo Exhibit,Sports Club,Sandwich Place,Movie Theater,Multicuisine Indian Restaurant
5,600102,Anna Nagar East,13.0974,80.195,2,Indian Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Multiplex,Café,Clothing Store,Bus Station,Bakery,Pizza Place,Chinese Restaurant


### Visualizing the Chennai map with the clusters to see how they are grouped

In [40]:
# create map
cluster_map = folium.Map(location=[chennai_lat, chennai_long], zoom_start=11) # centre on Chennai, not on the single area explored earlier

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(chennai_merged['Latitude'], chennai_merged['Longitude'], chennai_merged['Places'], chennai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(cluster_map)

cluster_map

### Analyzing each cluster to know about the most common places

### Looking at the data as a whole, Indian Restaurant is the most common venue in almost every area.
### So it can be ignored, and each area is analyzed starting from its 2nd most common venue.

In [41]:
c1 = chennai_merged.loc[chennai_merged['Cluster Labels'] == 0, chennai_merged.columns[[1] + list(range(5, chennai_merged.shape[1]))]]

In [42]:
c2 = chennai_merged.loc[chennai_merged['Cluster Labels'] == 1, chennai_merged.columns[[1] + list(range(5, chennai_merged.shape[1]))]]

In [43]:
c3 = chennai_merged.loc[chennai_merged['Cluster Labels'] == 2, chennai_merged.columns[[1] + list(range(5, chennai_merged.shape[1]))]]

In [44]:
c4 = chennai_merged.loc[chennai_merged['Cluster Labels'] == 3, chennai_merged.columns[[1] + list(range(5, chennai_merged.shape[1]))]]

In [45]:
c5 = chennai_merged.loc[chennai_merged['Cluster Labels'] == 4, chennai_merged.columns[[1] + list(range(5, chennai_merged.shape[1]))]]
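
The cluster judgements below are made by reading the tables. A sketch that would make them checkable: count how many of each area's top venues fall in a list of competing categories and average that per cluster (averaging because the clusters differ in size). `competition_per_cluster` is a hypothetical helper; treating Chinese and fast food restaurants as competition follows the reasoning in this notebook, and the rows below are made up.

```python
import pandas as pd


def competition_per_cluster(merged, competitor_categories):
    """Hypothetical helper: average number of competing venue types among each
    area's top venues, per cluster."""
    venue_cols = [c for c in merged.columns if c.endswith('Most Common Venue')]
    hits = merged[venue_cols].isin(competitor_categories).sum(axis=1)
    return hits.groupby(merged['Cluster Labels']).mean()


# Made-up areas with only the top 3 venues
merged = pd.DataFrame({
    'Places': ['Area A', 'Area B', 'Area C'],
    'Cluster Labels': [0, 0, 1],
    '1st Most Common Venue': ['Indian Restaurant'] * 3,
    '2nd Most Common Venue': ['Chinese Restaurant', 'Fast Food Restaurant', 'Train Station'],
    '3rd Most Common Venue': ['Fast Food Restaurant', 'Bakery', 'Multiplex'],
})

result = competition_per_cluster(merged, ['Chinese Restaurant', 'Fast Food Restaurant'])
print(result)  # cluster 0 -> 1.5, cluster 1 -> 0.0
```

On the notebook's data this would be called as `competition_per_cluster(chennai_merged, ['Chinese Restaurant', 'Fast Food Restaurant'])`.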

In [56]:
c1

,Places,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Pr. Accountant General, Teynampet",Hotel,Indian Restaurant,Café,Lounge,Restaurant,Italian Restaurant,Ice Cream Shop,Chinese Restaurant,Tea Room,Park
1,"Shastri Nagar (Chennai), Kasturibai Nagar, Ady...",Indian Restaurant,Ice Cream Shop,Café,Beach,Chinese Restaurant,Hotel,Fast Food Restaurant,Dessert Shop,Snack Place,Multiplex
9,"Jafferkhanpet, Ashoknagar (Chennai)",Indian Restaurant,Café,Ice Cream Shop,Chinese Restaurant,Coffee Shop,Hotel,Italian Restaurant,Asian Restaurant,BBQ Joint,Lounge
10,Aynavaram,Indian Restaurant,Hotel,Café,Chinese Restaurant,Italian Restaurant,Ice Cream Shop,Middle Eastern Restaurant,Multiplex,Dessert Shop,South Indian Restaurant
15,"Ekkaduthangal, Guindy Industrial Estate",Indian Restaurant,Ice Cream Shop,Café,Chinese Restaurant,Restaurant,Italian Restaurant,Pizza Place,Asian Restaurant,Coffee Shop,South Indian Restaurant
18,"Chetput, World University Centre",Indian Restaurant,Ice Cream Shop,Chinese Restaurant,Café,Coffee Shop,Italian Restaurant,Fast Food Restaurant,Asian Restaurant,Bakery,Sandwich Place
22,Engineering College (Chennai),Indian Restaurant,Hotel,Café,Chinese Restaurant,Italian Restaurant,Ice Cream Shop,Middle Eastern Restaurant,Multiplex,Dessert Shop,South Indian Restaurant
26,Flowers Road,Indian Restaurant,Café,Ice Cream Shop,Chinese Restaurant,Coffee Shop,Hotel,Italian Restaurant,Asian Restaurant,BBQ Joint,Lounge
27,"Raja Annamalaipuram, Ramakrishna Nagar (Chennai)",Indian Restaurant,Restaurant,Hotel,Bakery,Café,Italian Restaurant,Lounge,Juice Bar,Chinese Restaurant,Dessert Shop
35,Icf Colony,Indian Restaurant,Ice Cream Shop,Café,Chinese Restaurant,Restaurant,Italian Restaurant,Pizza Place,Asian Restaurant,Coffee Shop,South Indian Restaurant


### In the first cluster, the top 5 venues of the areas shown are mostly Indian restaurants, cafés, ice cream shops, hotels and Chinese restaurants, and most of these areas have a Chinese restaurant in their top 5.
### Fast food restaurants often serve Chinese dishes too, so they also count as competition.
### This cluster also has few crowd-drawing places such as multiplexes or theatres, parks, zoos, and bus or train stations.
### Since it goes against our criteria, the whole cluster is ignored.

In [57]:
c2

,Places,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Anna Nagar (Chennai),Indian Restaurant,Market,Coffee Shop,Pizza Place,Indie Movie Theater,Zoo Exhibit,Sports Club,Sandwich Place,Movie Theater,Multicuisine Indian Restaurant
11,"Besantnagar, Rajaji Bhavan",Indian Restaurant,Market,Indie Movie Theater,Train Station,Fast Food Restaurant,Coffee Shop,Café,Zoo,Juice Bar,Multicuisine Indian Restaurant
14,"Tidel Park, TTTI Taramani",Indian Restaurant,Indie Movie Theater,Sports Club,Zoo,Food,Multicuisine Indian Restaurant,Department Store,Pizza Place,Racetrack,Coffee Shop
20,Choolaimedu,Indian Restaurant,Market,Indie Movie Theater,Train Station,Fast Food Restaurant,Coffee Shop,Café,Zoo,Juice Bar,Multicuisine Indian Restaurant
37,Kalaignar Karunanidhi Nagar,Indian Restaurant,Pizza Place,Fast Food Restaurant,Ice Cream Shop,Movie Theater,Café,Bookstore,Department Store,Pub,Road


### The 2nd cluster has more crowd-drawing places such as movie theaters, zoos and markets, so it is kept for consideration.

In [58]:
c3

,Places,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Anna Nagar East,Indian Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Multiplex,Café,Clothing Store,Bus Station,Bakery,Pizza Place,Chinese Restaurant
6,Anna Nagar Western Extn,Indian Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Multiplex,Café,Clothing Store,Bus Station,Bakery,Pizza Place,Chinese Restaurant
8,"Arumbakkam, D G Vaishnav College",Indian Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Clothing Store,Multiplex,Electronics Store,Shopping Mall,Hotel,Department Store
21,"Venkatesapuram, Puliyanthope, Perambur Barracks",Indian Restaurant,Bus Station,Train Station,Café,South Indian Restaurant,Fast Food Restaurant,Racetrack,Bakery,Sporting Goods Shop,Theme Park
32,"Saidapet (Chennai), Guindy North",Indian Restaurant,Fast Food Restaurant,Bakery,Vegetarian / Vegan Restaurant,Café,Restaurant,Chinese Restaurant,Coffee Shop,Park,Middle Eastern Restaurant
33,High Court Building (Chennai),Indian Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Multiplex,Café,Clothing Store,Bus Station,Bakery,Pizza Place,Chinese Restaurant
34,"Thygarayanagar North ND, Thygarayanagar South ...",Indian Restaurant,Fast Food Restaurant,Bakery,Vegetarian / Vegan Restaurant,Café,Restaurant,Chinese Restaurant,Coffee Shop,Park,Middle Eastern Restaurant
42,"Nerkundram, Koyambedu Wholesale Market Com, Ko...",Indian Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Fast Food Restaurant,Coffee Shop,Clothing Store,Café,Hotel,Burger Joint,Restaurant
49,"Shenoy Nagar, Aminjikarai",Indian Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Chinese Restaurant,Café,Fast Food Restaurant,Bakery,Restaurant,Clothing Store,Bookstore


### The third cluster has more fast food restaurants (the 2nd most common venue in most of its areas) and fewer crowd-drawing places, so it is ignored.

In [59]:
c4

,Places,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Periyar Nagar, Kumaran Nagar, G K M Colony",Indian Restaurant,Hotel,Café,Multiplex,Clothing Store,Middle Eastern Restaurant,Movie Theater,Ice Cream Shop,Italian Restaurant,Donut Shop
7,"Chintadripet, Anna Road H.O, Madras Electricit...",Indian Restaurant,Hotel,Clothing Store,Pizza Place,Middle Eastern Restaurant,Sandwich Place,Café,Multiplex,Seafood Restaurant,Bookstore
17,"Chepauk, Tiruvallikkeni, Parthasarathy Koil, M...",Indian Restaurant,Hotel,Café,Multiplex,Fast Food Restaurant,Middle Eastern Restaurant,Clothing Store,Beach,Movie Theater,Juice Bar
23,"Madras Medical College, Park Town H.O, Edapala...",Indian Restaurant,Clothing Store,Hotel,Café,Middle Eastern Restaurant,Pizza Place,Sandwich Place,Multiplex,Juice Bar,Bookstore
24,"Egmore, Ethiraj Salai, Egmore ND",Indian Restaurant,Hotel,Café,Italian Restaurant,Ice Cream Shop,Chinese Restaurant,Pizza Place,Middle Eastern Restaurant,Donut Shop,Theater
25,"Erukkancheri, Kodungaiyur, Rv Nagar",Indian Restaurant,Café,Hotel,Multiplex,Middle Eastern Restaurant,Movie Theater,Chinese Restaurant,Italian Restaurant,Ice Cream Shop,Clothing Store
28,Fort St George,Indian Restaurant,Clothing Store,Hotel,Café,Middle Eastern Restaurant,Pizza Place,Sandwich Place,Multiplex,Juice Bar,Bookstore
29,"Royapettah, Lloyds Estate",Indian Restaurant,Hotel,Café,Restaurant,Bar,Multiplex,Clothing Store,Juice Bar,Movie Theater,Fast Food Restaurant
30,Gopalapuram (Chennai),Indian Restaurant,Hotel,Café,Multiplex,Restaurant,Ice Cream Shop,Juice Bar,Clothing Store,Chinese Restaurant,South Indian Restaurant
31,"Teynampet West, DPI, Shastri Bhavan, Greams Road",Indian Restaurant,Clothing Store,Hotel,Café,Middle Eastern Restaurant,Pizza Place,Sandwich Place,Multiplex,Juice Bar,Bookstore


### This cluster does not have as many crowded places, but it has fewer fast food and Chinese restaurants, so we can include it.

In [60]:
c5

Unnamed: 0,Places,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,"Washermanpet, Washermanpet East",Indian Restaurant,Train Station,Hotel,Multiplex,Farmers Market,Coffee Shop,Sandwich Place,Bus Station,Nightclub,Restaurant
16,"Sowcarpet, Mannady (Chennai), Chennai G.P.O., ...",Indian Restaurant,Hotel,Pizza Place,Italian Restaurant,Farmers Market,Donut Shop,Multiplex,Museum,Platform,Coffee Shop
38,Rayapuram,Indian Restaurant,Train Station,Italian Restaurant,Hotel,Farmers Market,Multiplex,Electronics Store,Market,Snack Place,Motel
40,"Kilpauk, Kilpauk Medical College",Indian Restaurant,Bakery,Restaurant,Coffee Shop,Ice Cream Shop,Clothing Store,Department Store,Café,Fast Food Restaurant,Chinese Restaurant
50,"Perambur, Sembiam, Perambur North",Indian Restaurant,Bakery,Coffee Shop,Café,Train Station,Multiplex,Fast Food Restaurant,Middle Eastern Restaurant,Department Store,Restaurant
54,"Tondiarpet, Tondiarpet West, Tondiarpet Bazaar",Indian Restaurant,Hotel,Train Station,Multiplex,Farmers Market,Nightclub,Pizza Place,Museum,Market,Coffee Shop
57,Vepery,Indian Restaurant,Hotel,Multiplex,Café,Snack Place,Sandwich Place,Fast Food Restaurant,Seafood Restaurant,Platform,Italian Restaurant
58,"Vyasarpadi, Vyasar Nagar Colony",Indian Restaurant,Coffee Shop,Italian Restaurant,Train Station,Hotel,Multiplex,Farmers Market,Fast Food Restaurant,Food,Market


### The last cluster also fits our criteria, so we include it as well.

In [62]:
final_df = pd.concat([c2, c4, c5], axis=0)


In [63]:
final_df.shape

(30, 11)
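The concatenation above stitches together hand-picked cluster frames. As an alternative, assuming the clustered dataframe keeps a cluster-label column, the chosen clusters can be selected in one step with `Series.isin`. The column name `Cluster Labels`, the label values and the toy rows below are hypothetical and only illustrate the selection step.

```python
import pandas as pd

# Toy stand-in for the clustered dataframe (hypothetical names and labels).
clustered = pd.DataFrame({
    "Places": ["Area A", "Area B", "Area C", "Area D", "Area E"],
    "Cluster Labels": [0, 1, 2, 3, 4],
})

# Clusters judged favourable after inspection (hypothetical label values).
chosen_clusters = [1, 3, 4]

# Keep only the rows whose cluster label is in the chosen list;
# the original index labels are preserved, as with pd.concat.
selected = clustered[clustered["Cluster Labels"].isin(chosen_clusters)]
print(selected["Places"].tolist())
```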

### Counting, for each area, how many of its top 10 venues are crowded places, while excluding the venue types to be avoided.

In [64]:
final_df["Key counts"] = np.nan

In [65]:
key_words = ['Train', 'Multiplex', 'Theatre', 'Bus', 'Clothing Store', 'Park', 'Zoo', 'Market', 'Mall'] #for most crowded areas
not_key_words = ['Fast Food', 'Chinese'] #to avoid Chinese and fast food restaurants
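The counting step relies on plain, case-sensitive substring tests (`ele in venue`). A small helper makes that behaviour explicit. Note that the Foursquare category names in the tables above are spelled "Theater", so the keyword `'Theatre'` never matches them. The helper name `matches_any` is illustrative and not part of the original notebook.

```python
from typing import Sequence

# Keyword lists as defined in the notebook.
key_words = ['Train', 'Multiplex', 'Theatre', 'Bus', 'Clothing Store', 'Park', 'Zoo', 'Market', 'Mall']
not_key_words = ['Fast Food', 'Chinese']


def matches_any(venue: str, words: Sequence[str]) -> bool:
    """Return True if any keyword occurs as a (case-sensitive) substring of the venue category."""
    return any(word in venue for word in words)


for venue in ["Train Station", "Farmers Market", "Movie Theater", "Chinese Restaurant"]:
    print(venue, matches_any(venue, key_words), matches_any(venue, not_key_words))
```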

In [66]:
final_df.head()

Unnamed: 0,Places,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Key counts
4,Anna Nagar (Chennai),Indian Restaurant,Market,Coffee Shop,Pizza Place,Indie Movie Theater,Zoo Exhibit,Sports Club,Sandwich Place,Movie Theater,Multicuisine Indian Restaurant,
11,"Besantnagar, Rajaji Bhavan",Indian Restaurant,Market,Indie Movie Theater,Train Station,Fast Food Restaurant,Coffee Shop,Café,Zoo,Juice Bar,Multicuisine Indian Restaurant,
14,"Tidel Park, TTTI Taramani",Indian Restaurant,Indie Movie Theater,Sports Club,Zoo,Food,Multicuisine Indian Restaurant,Department Store,Pizza Place,Racetrack,Coffee Shop,
20,Choolaimedu,Indian Restaurant,Market,Indie Movie Theater,Train Station,Fast Food Restaurant,Coffee Shop,Café,Zoo,Juice Bar,Multicuisine Indian Restaurant,
37,Kalaignar Karunanidhi Nagar,Indian Restaurant,Pizza Place,Fast Food Restaurant,Ice Cream Shop,Movie Theater,Café,Bookstore,Department Store,Pub,Road,
2,"Periyar Nagar, Kumaran Nagar, G K M Colony",Indian Restaurant,Hotel,Café,Multiplex,Clothing Store,Middle Eastern Restaurant,Movie Theater,Ice Cream Shop,Italian Restaurant,Donut Shop,
7,"Chintadripet, Anna Road H.O, Madras Electricit...",Indian Restaurant,Hotel,Clothing Store,Pizza Place,Middle Eastern Restaurant,Sandwich Place,Café,Multiplex,Seafood Restaurant,Bookstore,
17,"Chepauk, Tiruvallikkeni, Parthasarathy Koil, M...",Indian Restaurant,Hotel,Café,Multiplex,Fast Food Restaurant,Middle Eastern Restaurant,Clothing Store,Beach,Movie Theater,Juice Bar,
23,"Madras Medical College, Park Town H.O, Edapala...",Indian Restaurant,Clothing Store,Hotel,Café,Middle Eastern Restaurant,Pizza Place,Sandwich Place,Multiplex,Juice Bar,Bookstore,
24,"Egmore, Ethiraj Salai, Egmore ND",Indian Restaurant,Hotel,Café,Italian Restaurant,Ice Cream Shop,Chinese Restaurant,Pizza Place,Middle Eastern Restaurant,Donut Shop,Theater,


In [67]:
# Fill the "Key counts" column: for each area, count how many of its top 10 venues
# contain a crowded-place keyword and none of the avoided keywords (Chinese, Fast Food).
for i in range(len(final_df)):
    count = 0  # renamed from `sum` to avoid shadowing the built-in
    for j in range(10):
        venue = final_df.iloc[i, j + 1]  # columns 1..10 are the ranked venues
        if any(ele in venue for ele in key_words) and all(ele not in venue for ele in not_key_words):
            count += 1
    final_df.iloc[i, 11] = count  # column 11 is "Key counts"
final_df.head()

Unnamed: 0,Places,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Key counts
4,Anna Nagar (Chennai),Indian Restaurant,Market,Coffee Shop,Pizza Place,Indie Movie Theater,Zoo Exhibit,Sports Club,Sandwich Place,Movie Theater,Multicuisine Indian Restaurant,2.0
11,"Besantnagar, Rajaji Bhavan",Indian Restaurant,Market,Indie Movie Theater,Train Station,Fast Food Restaurant,Coffee Shop,Café,Zoo,Juice Bar,Multicuisine Indian Restaurant,3.0
14,"Tidel Park, TTTI Taramani",Indian Restaurant,Indie Movie Theater,Sports Club,Zoo,Food,Multicuisine Indian Restaurant,Department Store,Pizza Place,Racetrack,Coffee Shop,1.0
20,Choolaimedu,Indian Restaurant,Market,Indie Movie Theater,Train Station,Fast Food Restaurant,Coffee Shop,Café,Zoo,Juice Bar,Multicuisine Indian Restaurant,3.0
37,Kalaignar Karunanidhi Nagar,Indian Restaurant,Pizza Place,Fast Food Restaurant,Ice Cream Shop,Movie Theater,Café,Bookstore,Department Store,Pub,Road,0.0
2,"Periyar Nagar, Kumaran Nagar, G K M Colony",Indian Restaurant,Hotel,Café,Multiplex,Clothing Store,Middle Eastern Restaurant,Movie Theater,Ice Cream Shop,Italian Restaurant,Donut Shop,2.0
7,"Chintadripet, Anna Road H.O, Madras Electricit...",Indian Restaurant,Hotel,Clothing Store,Pizza Place,Middle Eastern Restaurant,Sandwich Place,Café,Multiplex,Seafood Restaurant,Bookstore,2.0
17,"Chepauk, Tiruvallikkeni, Parthasarathy Koil, M...",Indian Restaurant,Hotel,Café,Multiplex,Fast Food Restaurant,Middle Eastern Restaurant,Clothing Store,Beach,Movie Theater,Juice Bar,2.0
23,"Madras Medical College, Park Town H.O, Edapala...",Indian Restaurant,Clothing Store,Hotel,Café,Middle Eastern Restaurant,Pizza Place,Sandwich Place,Multiplex,Juice Bar,Bookstore,2.0
24,"Egmore, Ethiraj Salai, Egmore ND",Indian Restaurant,Hotel,Café,Italian Restaurant,Ice Cream Shop,Chinese Restaurant,Pizza Place,Middle Eastern Restaurant,Donut Shop,Theater,0.0
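The nested loop can also be written column-wise with pandas. The sketch below assumes the same layout as `final_df` (a `Places` column followed by ten venue columns) and applies the same rule as the loop (a crowded-place keyword present, no avoided keyword present) to every venue cell, then sums each row. The three sample rows are copied from the output above; `is_counted` is an illustrative helper name.

```python
import pandas as pd

key_words = ['Train', 'Multiplex', 'Theatre', 'Bus', 'Clothing Store', 'Park', 'Zoo', 'Market', 'Mall']
not_key_words = ['Fast Food', 'Chinese']

# "1st Most Common Venue" ... "10th Most Common Venue"
ordinals = ["1st", "2nd", "3rd"] + [f"{n}th" for n in range(4, 11)]
venue_cols = [f"{o} Most Common Venue" for o in ordinals]


def is_counted(venue: str) -> bool:
    # Same rule as the loop: crowded-place keyword present, avoided keywords absent.
    return any(w in venue for w in key_words) and all(w not in venue for w in not_key_words)


sample = pd.DataFrame(
    [
        ["Anna Nagar (Chennai)", "Indian Restaurant", "Market", "Coffee Shop", "Pizza Place",
         "Indie Movie Theater", "Zoo Exhibit", "Sports Club", "Sandwich Place", "Movie Theater",
         "Multicuisine Indian Restaurant"],
        ["Besantnagar, Rajaji Bhavan", "Indian Restaurant", "Market", "Indie Movie Theater",
         "Train Station", "Fast Food Restaurant", "Coffee Shop", "Café", "Zoo", "Juice Bar",
         "Multicuisine Indian Restaurant"],
        ["Egmore, Ethiraj Salai, Egmore ND", "Indian Restaurant", "Hotel", "Café", "Italian Restaurant",
         "Ice Cream Shop", "Chinese Restaurant", "Pizza Place", "Middle Eastern Restaurant",
         "Donut Shop", "Theater"],
    ],
    columns=["Places"] + venue_cols,
)

# Map the rule over every venue cell, then count the True values in each row.
flags = sample[venue_cols].apply(lambda col: col.map(is_counted))
sample["Key counts"] = flags.sum(axis=1)
print(sample[["Places", "Key counts"]])
```

This yields integer counts rather than the floats produced by filling an `np.nan` column, but the values agree with the loop's output for these rows.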


In [68]:
final_df[final_df['Key counts']>3]

Unnamed: 0,Places,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Key counts
13,"Washermanpet, Washermanpet East",Indian Restaurant,Train Station,Hotel,Multiplex,Farmers Market,Coffee Shop,Sandwich Place,Bus Station,Nightclub,Restaurant,4.0
38,Rayapuram,Indian Restaurant,Train Station,Italian Restaurant,Hotel,Farmers Market,Multiplex,Electronics Store,Market,Snack Place,Motel,4.0
54,"Tondiarpet, Tondiarpet West, Tondiarpet Bazaar",Indian Restaurant,Hotel,Train Station,Multiplex,Farmers Market,Nightclub,Pizza Place,Museum,Market,Coffee Shop,4.0
58,"Vyasarpadi, Vyasar Nagar Colony",Indian Restaurant,Coffee Shop,Italian Restaurant,Train Station,Hotel,Multiplex,Farmers Market,Fast Food Restaurant,Food,Market,4.0


### Finally, the dataframe is narrowed down to four areas in Chennai that are best suited for the client's Chinese restaurant.

### According to the data, Washermanpet is the best choice: its top 5 venues include a train station, a multiplex and a farmers market, and its top 10 also include a bus station.
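All four areas tie at a key count of 4, so preferring Washermanpet rests on how high the crowded places appear in its venue ranking. One way to make that tie-break explicit, offered as an illustrative assumption rather than part of the original analysis, is to weight each matching venue by its rank (11 minus its position, so the 1st venue scores 10 and the 10th scores 1). The venue lists below are copied from the four rows in the output above; `weighted_score` is a hypothetical helper.

```python
key_words = ['Train', 'Multiplex', 'Theatre', 'Bus', 'Clothing Store', 'Park', 'Zoo', 'Market', 'Mall']
not_key_words = ['Fast Food', 'Chinese']

# Top 10 venues per shortlisted area, in rank order (from the table above).
top_venues = {
    "Washermanpet": ["Indian Restaurant", "Train Station", "Hotel", "Multiplex", "Farmers Market",
                     "Coffee Shop", "Sandwich Place", "Bus Station", "Nightclub", "Restaurant"],
    "Rayapuram": ["Indian Restaurant", "Train Station", "Italian Restaurant", "Hotel", "Farmers Market",
                  "Multiplex", "Electronics Store", "Market", "Snack Place", "Motel"],
    "Tondiarpet": ["Indian Restaurant", "Hotel", "Train Station", "Multiplex", "Farmers Market",
                   "Nightclub", "Pizza Place", "Museum", "Market", "Coffee Shop"],
    "Vyasarpadi": ["Indian Restaurant", "Coffee Shop", "Italian Restaurant", "Train Station", "Hotel",
                   "Multiplex", "Farmers Market", "Fast Food Restaurant", "Food", "Market"],
}


def weighted_score(venues):
    """Sum (11 - position) over venues that pass the same keyword rule as "Key counts"."""
    score = 0
    for position, venue in enumerate(venues, start=1):
        if any(w in venue for w in key_words) and all(w not in venue for w in not_key_words):
            score += 11 - position
    return score


scores = {area: weighted_score(venues) for area, venues in top_venues.items()}
for area, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(area, score)
```

Under this weighting Washermanpet scores highest (25), in line with the conclusion above, while Rayapuram and Tondiarpet tie at 23. A different weighting could order the runners-up differently.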