
# Table of Contents
 - ## Introduction
 - ## Objectives
 - ## Data
 - ## <a id="methodology">Methodology</a>
      - ### Analyze Kannur
      - ### K-mean Cluster Kannur
      - ### Analyze Kozhikode
      - ### K-mean Cluster Kozhikode
 - ## <a id="Results">Results</a>
 - ## <a id="Discussion">Discussion</a>
 - ## <a id="Conclusion">Conclusion</a>
 
**Introduction**

**Kannur** and **Kozhikode** are two major cities in Kerala, India. Both cities become a center of attention for residential, job employment, tourism, education, and shopping and sports activity. Both cities are well known in India.
Brief information about both cities:
**Kannur** is a city and a Municipal Corporation in Kannur district, state of Kerala, India. Kannur is the largest city of North Malabar region.[1] As of 2011 census population of Kannur was 232,486.[2] Kannur is one of the million-plus urban agglomerations in India with a population of 1,642,892 in 2011. (Source: https://en.wikipedia.org/wiki/Kannur)
**Kozhikode** , also known as Calicut, is an Indian city, second-largest urban agglomeration in the State of Kerala and 20th largest in the country with a population of 2 million as of 2011 (Source: https://en.wikipedia.org/wiki/Kozhikode)
Objective
In this project, we will study in details the area classification using Foursquare data and machine learning segmentation and clustering. The aim of this project is to segment areas of Kannur and Kozhikode based on the most common places captured from Foursquare.
Using segmentation and clustering, we hope we can determine:
1.The similarity or dissimilarity of both cities
2.Classification of area located inside the city whether it is residential, tourism places, or others

Data
The data acquired from wikipedia pages and restructure to csv file for easier manipulation and reading. Both files uploaded to my github for references. Link to the files are:
Another aspect to consider for this project is the foursquare data. I believe that the data as good as provided, meaning although we are using Foursquare data for segmentation and clustering, the amount and accuracy of data captured can't 100% determine correct classification in real world.
To start, let's get and look at the data. I've already downloaded it, so let's read it (from local drive) and load it to data frame:

In [1]:
#import the required library
import numpy as np
import pandas as pd

df_kn = pd.read_csv('Kannur.csv')
df_kn.head()

Unnamed: 0,Pincode,Taluk,Area
0,670571,Taliparamba,Alakode
1,670571,Taliparamba,Alakode kuttaramba
2,670008,Kannur,Alavil
3,670331,Kannur,Anchampeedika
4,670612,Thalassery,Anjarakandy


In [2]:
#examine data
print('Kannur dataframe has {} Taluk and {} areas.'.format(
        len(df_kn['Taluk'].unique()),
        df_kn.shape[0]
    )
)

#grouping data to find District with highest number of area
df_kn.groupby('Taluk').count()

Kannur dataframe has 7 Taluk and 378 areas.


Unnamed: 0_level_0,Pincode,Area
Taluk,Unnamed: 1_level_1,Unnamed: 2_level_1
Kannur,76,76
Taliaparamba,1,1
Taliparamba,152,152
Thalasery,3,3
Thalasseery,1,1
Thalassery,140,140


In [3]:
#read and load Kozhikode data
df_kz = pd.read_csv("Kozhikode.csv")
df_kz.head()

Unnamed: 0,Pincode,Taluk,Area
0,673586,Kozhikode,Adivaram Pudupadi
1,673602,Kozhikode,Alli
2,673603,Kozhikode,Anakampoyil
3,673028,Kozhikode,Arakinar
4,673572,Kozhikode,Avilora


In [4]:
#examine data
print('Kozhikode dataframe has {} Taluk and {} areas.'.format(
        len(df_kz['Taluk'].unique()),
        df_kz.shape[0]
    )
)

#group by district
df_kz.groupby('Taluk').count()

Kozhikode dataframe has 5 Taluk and 408 areas.


Unnamed: 0_level_0,Pincode,Area
Taluk,Unnamed: 1_level_1,Unnamed: 2_level_1
Koyilandi,36,36
Koyilani,1,1
Kozhikode,190,190
Quilandy,67,67
Vadakara,114,114


In [6]:
pip install geocoder

Note: you may need to restart the kernel to use updated packages.


In [5]:
#now, using Geocoder and Google API, we get the Latitude and Longitude of each area
import geocoder
GOOGLE_API_KEY='AIzaSyA5iFHz6wtIVzu2UWFAQrs-0l8AwfRIUlE'  

#function to get latitude and longitude
def get_latlng(postal_code):
    lat_lng_coords = None
    while(lat_lng_coords is None):
        g = geocoder.google('{}, Kerala'.format(postal_code), key=GOOGLE_API_KEY)
        lat_lng_coords = g.latlng
    return lat_lng_coords

#put new column of latitude and logitude into dataframe
postal_codes1 = df_kn['Area']    
coords = [ get_latlng(postal_code) for postal_code in postal_codes1.tolist() ]

df_kn_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df_kn['Latitude'] = df_kn_coords['Latitude']
df_kn['Longitude'] = df_kn_coords['Longitude']
df_kn.head(10)


Unnamed: 0,Pincode,Taluk,Area,Latitude,Longitude
0,670571,Taliparamba,Alakode,12.190978,75.467331
1,670571,Taliparamba,Alakode kuttaramba,12.190978,75.467331
2,670008,Kannur,Alavil,11.901894,75.346538
3,670331,Kannur,Anchampeedika,11.532411,75.730335
4,670612,Thalassery,Anjarakandy,11.885834,75.485089
5,670307,Taliparamba,Annur,12.132956,75.202438
6,670582,Taliparamba,Arang,8.556909,76.983869
7,670582,Taliparamba,Areekamala,12.132892,75.577157
8,670143,Taliparamba,Aril,10.850516,76.271083
9,670571,Taliparamba,Arivilanjapoyil,11.350407,75.914712


In [6]:

#new column for Kz dataframe
postal_codes2 = df_kz['Area']    
coords = [ get_latlng(postal_code) for postal_code in postal_codes2.tolist() ]

df_kz_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df_kz['Latitude'] = df_kz_coords['Latitude']
df_kz['Longitude'] = df_kz_coords['Longitude']
df_kz.head(10)

Unnamed: 0,Pincode,Taluk,Area,Latitude,Longitude
0,673586,Kozhikode,Adivaram Pudupadi,11.488039,76.013078
1,673602,Kozhikode,Alli,9.967117,76.288968
2,673603,Kozhikode,Anakampoyil,11.436896,76.058772
3,673028,Kozhikode,Arakinar,11.200921,75.79819
4,673572,Kozhikode,Avilora,11.383467,75.908301
5,673015,Kozhikode,Beypore North,11.173585,75.804002
6,673015,Kozhikode,Beypore,11.173585,75.804002
7,673018,Kozhikode,Calicut Arts & Science College,11.213722,75.798346
8,673032,Kozhikode,Calicut Beach,11.262623,75.767309
9,673020,Kozhikode,Calicut Civil Station,11.283644,75.791106


In [7]:
pip install geopy

Note: you may need to restart the kernel to use updated packages.


In [8]:

from geopy.geocoders import Nominatim
import folium

address = 'Kannur, Kerala'
geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Kannur using latitude and longitude values
map_kn = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_kn['Latitude'], df_kn['Longitude'], df_kn['Taluk'], df_kn['Area']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kn)  
    
map_kn

  """


In [9]:
from geopy.geocoders import Nominatim
import folium
address = 'Kozhikode, Kerala'
geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Kozhikode using latitude and longitude values
map_kz = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_kz['Latitude'], df_kz['Longitude'], df_kz['Taluk'], df_kz['Area']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kz)  
    
map_kz

  after removing the cwd from sys.path.



## [Methodology](#Methodology)
In this project, I will use the basic methodology as taught in Week 3 lab.

Above, we have converted addresses into their equivalent latitude and longitude values.
Then we will use the Foursquare API to explore neighborhoods in both cities, Kannur and Kozhikode
After that, explore function to get the most common venue categories in each neighborhood,
and then use this feature to group the neighborhoods into clusters
K-means clustering algorithm will be use to complete this task. And also, the Folium library to visualize the neighborhoods in Kannur and Kozhikode and their emerging clusters.

Based on dataframe analysis above, we found out that Taliparamba area in Kannur and Kozhikode area in Kozhikode are both have the highest number of area within it those district.

In [10]:
#slice the original dataframe and create a new dataframe of the Bukit Bintang
Taliparamba = df_kn[df_kn['Taluk'] == 'Taliparamba'].reset_index(drop=True)

#get the geographical coordinates of Taliparamba, Kannur
address = 'Taliparamba, Kannur'
geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Taliparamba using latitude and longitude values
map_taliparamba = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(Taliparamba['Latitude'], Taliparamba['Longitude'], Taliparamba['Area']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_taliparamba)  
    
map_taliparamba

  


In [11]:
from geopy.geocoders import Nominatim
import folium
#slice the original dataframe and create a new dataframe of the Kozhikode
kzk = df_kz[df_kz['Taluk'] == 'Kozhikode'].reset_index(drop=True)

#get the geographical coordinates of Kozhikode
address = 'Kozhikode, Kozhikode'
geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Kozhikode using latitude and longitude values
map_kzk = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(kzk['Latitude'], kzk['Longitude'], kzk['Area']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kzk)  
    
map_kzk

  


Using Foursquare API to get venues at surounding area of both Taliparamba, Kannur and Kozhikode area.

In [13]:
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

#Define Foursquare Credentials and Version
CLIENT_ID = 'JC4JGWMCFVQFIOPW1E1YCWXFJI25VX252KUIK3KKWGKFVRGO' # your Foursquare ID
CLIENT_SECRET = 'TGDVQV4P2KILF1DGY00WJ0DPDPL5WVTZV0PUJUG3ZPDJ3OMT' # your Foursquare Secret
VERSION = '20180604'

#explore the first neighborhood in our dataframe
#Get the neighborhood's latitude and longitude values.
neighborhood_latitude = Taliparamba.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Taliparamba.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = Taliparamba.loc[0, 'Area'] # neighborhood name

#get the top 100 venues that are in Taliparamba within a radius of 500 meters
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

#Send the GET request and examine the resutls
results = requests.get(url).json()

#borrow the get_category_type function from the Foursquare lab.
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print('{} venues were returned by Foursquare for Taliparamba, Kannur.'.format(nearby_venues.shape[0]))
nearby_venues.head()

1 venues were returned by Foursquare for Taliparamba, Kannur.


Unnamed: 0,name,categories,lat,lng
0,Hotel Plaza,Indian Restaurant,12.189457,75.467127


In [14]:
#explore the first neighborhood in our dataframe
#Get the neighborhood's latitude and longitude values.
neighborhood_latitude = kzk.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = kzk.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = kzk.loc[0, 'Area'] # neighborhood name

#get the top 100 venues that are in Marble Hill within a radius of 500 meters
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 3000 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

#Send the GET request and examine the resutls
results = requests.get(url).json()

#clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print('{} venues were returned by Foursquare for Kozhikode.'.format(nearby_venues.shape[0]))
nearby_venues.head()

4 venues were returned by Foursquare for Kozhikode.


Unnamed: 0,name,categories,lat,lng
0,Thamarassery Ghat,Mountain,11.495398,76.022222
1,Luncheon Restaurant,Indian Restaurant,11.484522,76.012101
2,Adivaram,Neighborhood,11.483999,76.012356
3,Thattukada,Juice Bar,11.484814,76.0098


In [15]:
#function to repeat the same process to all area
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Area', 
                  'Area Latitude', 
                  'Area Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#run the above function on each neighborhood and create a new dataframe
Taliparamba_venues = getNearbyVenues(names=Taliparamba['Area'],
                                   latitudes=Taliparamba['Latitude'],
                                   longitudes=Taliparamba['Longitude']
                                  )

#check the size of the resulting dataframe
print(Taliparamba_venues.shape)
Taliparamba_venues.head()

Alakode
Alakode kuttaramba
Annur
Arang
Areekamala
Aril
Arivilanjapoyil
Chamathachal
Chandanakampara
Chattuvapara
Chekkikulam
Cheleri
Chempanthotty
Chemperi
Chengalayi
Chepparapadava
Cherikode
Cherupazhassi
Cherupuzha
Chithapilapoyil
Chittodi
Chunda
Chundakunnu
Chuzhali
CRPF Camp Aravanchal
Edakkom
Edavaramba
Eramam Desom
Eruvassi
Eruvatty chapparapadava
Ettikulam
Ettukudukka
Ezhilode
Ezhimala Naval Academy
Irukkur
Josegiri
Kadannapally
Kaithapram
Kakkara
Kalliad
Kanamvayal
Kanayi
Kandakai
Kandankali
Kandoth
Kanhirangad
Kanjirakolly
Kankol
Kanul
Karanthat
Karimbam
Karippal
Karivellur
Karthikapuram
Karuvanchal
Kavvayi
Kayalampara
Kayaralam
Kokkanisseri
Kolacherry
Kootumugham
Kooveri
Korom
Kottayad
Koyyam
Kozhichal
Kozhummal
Kudiyanmala
Kunhimangalam
Kuniyampuzha
Kuniyan
Kuppam taliparamba
Kurumathur
Kusavanvayal
Kuttikol
Kuttiyattur
Kuttiyeri
Kuttur
Madayi
Malapattam
Manakadavu
Mandalam
Mandur kannur
Mathamangalam Bazar
Mathil
Mayyil
Morazha
Mullakodi
Muyyam
Naduvil
Nanichery
Nareekamval

Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Alakode,12.190978,75.467331,Hotel Plaza,12.189457,75.467127,Indian Restaurant
1,Alakode kuttaramba,12.190978,75.467331,Hotel Plaza,12.189457,75.467127,Indian Restaurant
2,Arang,8.556909,76.983869,"Vasantham videos,Eanikkara",8.559857,76.982046,Video Store
3,Aril,10.850516,76.271083,God's Own,10.853096,76.269638,Indian Restaurant
4,Chengalayi,12.044759,75.489816,Chengalayi Town,12.042412,75.490222,Plaza


In [16]:
#run the above function on each neighborhood and create a new dataframe
kzk_venues = getNearbyVenues(names=kzk['Area'],
                                   latitudes=kzk['Latitude'],
                                   longitudes=kzk['Longitude']
                                  )

#check the size of the resulting dataframe
print(kzk_venues.shape)
kzk_venues.head()

Adivaram Pudupadi
Alli
Anakampoyil
Arakinar
Avilora
Beypore North
Beypore
Calicut Arts & Science College
Calicut Beach
Calicut Civil Station
Calicut Collectorate
Calicut Courts
Calicut
Calicut Medical College
Calicut R.S.
Chalapuram
Chaliyam
Chamal
Chathamangalam
Chelannur
Chelavur
Chembu Kadavu
Chennamangallur
Cherooppa
Cherukulathur
Cheruvadi
Cheruvannur
Chevarambalam
Chevayur
Chulur
Devagiri College
East Hill
Edakkad West Hill
Edakkara Quilandy
Elathur Kozhikode
Elettil
Eranhikkal
Eranhipalam
Eravannur
Farook College
Feroke Pettah
Feroke
Govinda Puram
Guruvayurappan College
Iim Kozhikode Campus
Iringallur
Irivallur
Kadalundi
Kakkad Pudupadi
Kakkatampoyil
Kakkoti
Kakkur
Kallai kozhikode
Kallurutty
Kannan Kara
Kannancheri
Kanni Paramba
Kannoth
Karanthur
Karaparamba
Karasseri
Karuvampoyil
Karuvanthuruthy
Karuvasseri
Kattippara
Kayal
Kilakkoth
Kizhakkumuri
Kodiyathur
Kolathara
Kommeri
Konott
Koombara
Koombara Bazar
Kotancheri tamaracheri
Kottamparamba
Kottuli
Kotuvalli
Kudathai
Kudathai

Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Adivaram Pudupadi,11.488039,76.013078,Luncheon Restaurant,11.484522,76.012101,Indian Restaurant
1,Adivaram Pudupadi,11.488039,76.013078,Adivaram,11.483999,76.012356,Neighborhood
2,Alli,10.850516,76.271083,God's Own,10.853096,76.269638,Indian Restaurant
3,Arakinar,11.200921,75.79819,Calicut Foodbook,11.203721,75.800835,Hotel
4,Beypore North,11.173585,75.804002,Ration shop beypore,11.17702,75.806962,Grocery Store


In [17]:
#check how many venues were returned for each area
print('There are {} uniques categories in Kannur.'.format(len(Taliparamba_venues['Venue Category'].unique())))
Taliparamba_venues.groupby('Area').count()

There are 30 uniques categories in Kannur.


Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Alakode,1,1,1,1,1,1
Alakode kuttaramba,1,1,1,1,1,1
Arang,1,1,1,1,1,1
Aril,1,1,1,1,1,1
Chengalayi,1,1,1,1,1,1
Cherikode,1,1,1,1,1,1
Cherupuzha,3,3,3,3,3,3
Chundakunnu,1,1,1,1,1,1
Irukkur,2,2,2,2,2,2
Kandankali,1,1,1,1,1,1


In [18]:
#check how many venues were returned for each area
print('There are {} uniques categories in Kozhikode.'.format(len(kzk_venues['Venue Category'].unique())))
kzk_venues.groupby('Area').count()

There are 65 uniques categories in Kozhikode.


Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adivaram Pudupadi,2,2,2,2,2,2
Alli,1,1,1,1,1,1
Arakinar,1,1,1,1,1,1
Beypore,1,1,1,1,1,1
Beypore North,1,1,1,1,1,1
...,...,...,...,...,...,...
Velliparamba,1,1,1,1,1,1
Vrindavan Colony,1,1,1,1,1,1
West Hill,5,5,5,5,5,5
West Hill Beach,3,3,3,3,3,3



## Analyze Kannur

In [19]:
# one hot encoding
Taliparamba_onehot = pd.get_dummies(Taliparamba_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Taliparamba_onehot['Area'] =Taliparamba_venues['Area'] 

# move neighborhood column to the first column
fixed_columns = [Taliparamba_onehot.columns[-1]] + list(Taliparamba_onehot.columns[:-1])
Taliparamba_onehot = Taliparamba_onehot[fixed_columns]

#examine the new dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(Taliparamba_onehot.shape[0]))

#group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
Taliparamba_grouped = Taliparamba_onehot.groupby('Area').mean().reset_index()

#examine the new dataframe size after one hot encoding
print('{} rows were returned after grouping.'.format(Taliparamba_grouped.shape[0]))

57 rows were returned after one hot encoding.
29 rows were returned after grouping.


In [20]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in Taliparamba_grouped['Area']:
    print("----"+hood+"----")
    temp = Taliparamba_grouped[Taliparamba_grouped['Area'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alakode----
                           venue  freq
0              Indian Restaurant   1.0
1                            ATM   0.0
2              Mobile Phone Shop   0.0
3  Vegetarian / Vegan Restaurant   0.0
4                  Train Station   0.0


----Alakode kuttaramba----
                           venue  freq
0              Indian Restaurant   1.0
1                            ATM   0.0
2              Mobile Phone Shop   0.0
3  Vegetarian / Vegan Restaurant   0.0
4                  Train Station   0.0


----Arang----
                             venue  freq
0                      Video Store   1.0
1                           Bakery   0.0
2    Vegetarian / Vegan Restaurant   0.0
3                    Train Station   0.0
4  Southern / Soul Food Restaurant   0.0


----Aril----
                           venue  freq
0              Indian Restaurant   1.0
1                            ATM   0.0
2              Mobile Phone Shop   0.0
3  Vegetarian / Vegan Restaurant   0.0
4              

In [51]:
#put into a pandas dataframe

#write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#create the new dataframe and display the top 10 venues for each neighborhood
num_top_venues = 8

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
areas_venues_sorted1 = pd.DataFrame(columns=columns)
areas_venues_sorted1['Area'] = Taliparamba_grouped['Area']

for ind in np.arange(Taliparamba_grouped.shape[0]):
    areas_venues_sorted1.iloc[ind, 1:] = return_most_common_venues(Taliparamba_grouped.iloc[ind, :], num_top_venues)

areas_venues_sorted1.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Alakode,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café
1,Alakode kuttaramba,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café
2,Arang,Video Store,Cluster Labels,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café
3,Aril,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café
4,Chengalayi,Plaza,Cluster Labels,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café



K-mean Cluster Kannur

In [53]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 3

Taliparamba_grouped_clustering = Taliparamba_grouped.drop('Area', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Taliparamba_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

#create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
Taliparamba_merged = Taliparamba_grouped
#Taliparamba_grouped_clustering

# add clustering labels
Taliparamba_merged['Cluster Labels'] = kmeans.labels_

# merge Taliparamba_grouped with Taliparamba_data to add latitude/longitude for each neighborhood
Taliparamba_merged = Taliparamba_merged.join(areas_venues_sorted1.set_index('Area'), on='Area')
Taliparamba_merged['Latitude'] = df_kn_coords['Latitude']
Taliparamba_merged['Longitude'] = df_kn_coords['Longitude']
Taliparamba_merged.head()

Unnamed: 0,Area,ATM,Bakery,Basketball Court,Burger Joint,Bus Station,Café,Clothing Store,Coffee Shop,Cosmetics Shop,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Latitude,Longitude
0,Alakode,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,12.190978,75.467331
1,Alakode kuttaramba,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,12.190978,75.467331
2,Arang,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Video Store,Cluster Labels,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,11.901894,75.346538
3,Aril,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,11.532411,75.730335
4,Chengalayi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Plaza,Cluster Labels,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,11.885834,75.485089


In [55]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#Finally, let's visualize the resulting clusters
# create map 12.0351° N, 75.3611° E
tp_clusters = folium.Map(location=[12.03515, 75.3611], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Taliparamba_merged['Latitude'], Taliparamba_merged['Longitude'], Taliparamba_merged['Area'], Taliparamba_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(tp_clusters)
       
tp_clusters

## Analyze Kozhikode

In [24]:
# one hot encoding
kzk_onehot = pd.get_dummies(kzk_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
kzk_onehot['Area'] = kzk_venues['Area'] 

# move neighborhood column to the first column
fixed_columns = [kzk_onehot.columns[-1]] + list(kzk_onehot.columns[:-1])
kzk_onehot = kzk_onehot[fixed_columns]

#examine the new dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(kzk_onehot.shape[0]))

#group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
kzk_grouped = kzk_onehot.groupby('Area').mean().reset_index()

#examine the new dataframe size after one hot encoding
print('{} rows were returned after grouping.'.format(kzk_grouped.shape[0]))

215 rows were returned after one hot encoding.
77 rows were returned after grouping.


In [25]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in kzk_grouped['Area']:
    print("----"+hood+"----")
    temp = kzk_grouped[kzk_grouped['Area'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adivaram Pudupadi----
               venue  freq
0  Indian Restaurant   0.5
1       Neighborhood   0.5
2     Ice Cream Shop   0.0
3      Jewelry Store   0.0
4          Juice Bar   0.0


----Alli----
               venue  freq
0  Indian Restaurant   1.0
1     Ice Cream Shop   0.0
2      Jewelry Store   0.0
3          Juice Bar   0.0
4  Kerala Restaurant   0.0


----Arakinar----
               venue  freq
0              Hotel   1.0
1     Ice Cream Shop   0.0
2      Jewelry Store   0.0
3          Juice Bar   0.0
4  Kerala Restaurant   0.0


----Beypore----
           venue  freq
0  Grocery Store   1.0
1            ATM   0.0
2       Pharmacy   0.0
3  Jewelry Store   0.0
4      Juice Bar   0.0


----Beypore North----
           venue  freq
0  Grocery Store   1.0
1            ATM   0.0
2       Pharmacy   0.0
3  Jewelry Store   0.0
4      Juice Bar   0.0


----Calicut----
                           venue  freq
0              Indian Restaurant  0.18
1                          Hotel  0.12
2

In [47]:
#create the new dataframe and display the top 10 venues for each neighborhood
num_top_venues = 8

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
areas_venues_sorted2 = pd.DataFrame(columns=columns)
areas_venues_sorted2['Area'] = kzk_grouped['Area']

for ind in np.arange(kzk_grouped.shape[0]):
    areas_venues_sorted2.iloc[ind, 1:] = return_most_common_venues(kzk_grouped.iloc[ind, :], num_top_venues)

areas_venues_sorted2.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Adivaram Pudupadi,Cluster Labels,Indian Restaurant,Neighborhood,Fish Market,Coffee Shop,Convenience Store,Department Store,Dessert Shop
1,Alli,Cluster Labels,Indian Restaurant,Café,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store
2,Arakinar,Hotel,Café,Harbor / Marina,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store
3,Beypore,Grocery Store,Cluster Labels,Café,Harbor / Marina,Halal Restaurant,Gym / Fitness Center,Gym,Furniture / Home Store
4,Beypore North,Grocery Store,Cluster Labels,Café,Harbor / Marina,Halal Restaurant,Gym / Fitness Center,Gym,Furniture / Home Store


## K-mean Cluster Kozhikode

In [54]:
# set number of clusters
kclusters = 3

kzk_grouped_clustering = kzk_grouped.drop('Area', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kzk_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

#create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
kzk_merged = kzk_grouped

# add clustering labels
kzk_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
kzk_merged = kzk_merged.join(areas_venues_sorted2.set_index('Area'), on='Area')
kzk_merged['Latitude'] = df_kz_coords['Latitude']
kzk_merged['Longitude'] = df_kz_coords['Longitude']
kzk_merged.head() # check the last columns!


Unnamed: 0,Area,ATM,Accessories Store,Airport,Airport Terminal,Arts & Crafts Store,Asian Restaurant,Bakery,Beach,Boat or Ferry,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Latitude,Longitude
0,Adivaram Pudupadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Neighborhood,Fish Market,Coffee Shop,Convenience Store,Department Store,Dessert Shop,11.488039,76.013078
1,Alli,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Café,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store,10.850516,76.271083
2,Arakinar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Hotel,Café,Harbor / Marina,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store,11.436896,76.058772
3,Beypore,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Grocery Store,Cluster Labels,Café,Harbor / Marina,Halal Restaurant,Gym / Fitness Center,Gym,Furniture / Home Store,11.200921,75.79819
4,Beypore North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Grocery Store,Cluster Labels,Café,Harbor / Marina,Halal Restaurant,Gym / Fitness Center,Gym,Furniture / Home Store,11.383467,75.908301


In [28]:
#Finally, let's visualize the resulting clusters
# create map
kzk_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kzk_merged['Latitude'], kzk_merged['Longitude'], kzk_merged['Area'], kzk_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(kzk_clusters)
       
kzk_clusters

## [Results](#Results)

In [62]:
#Cluster 1 for Kannur
Taliparamba_merged.loc[Taliparamba_merged['Cluster Labels'] == 0, Taliparamba_merged.columns[[0] + list(range(5, Taliparamba_merged.shape[1]))]]
#Taliparamba_merged

Unnamed: 0,Area,Bus Station,Café,Clothing Store,Coffee Shop,Cosmetics Shop,Currency Exchange,Fabric Shop,Fast Food Restaurant,Gym / Fitness Center,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Latitude,Longitude
2,Arang,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Video Store,Cluster Labels,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,11.901894,75.346538
4,Chengalayi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Plaza,Cluster Labels,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,11.885834,75.485089
6,Cherupuzha,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Bus Station,Music Store,Indian Restaurant,Cluster Labels,Jewelry Store,Bakery,Basketball Court,Burger Joint,8.556909,76.983869
7,Chundakunnu,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,Currency Exchange,Cluster Labels,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,12.132892,75.577157
8,Irukkur,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Bus Station,Playground,Cluster Labels,Jewelry Store,Bakery,Basketball Court,Burger Joint,Café,10.850516,76.271083
10,Kandoth,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,Cosmetics Shop,Cluster Labels,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,11.984557,75.380936
11,Kokkanisseri,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Basketball Court,Cluster Labels,Jewelry Store,Bakery,Burger Joint,Bus Station,Café,Clothing Store,11.867886,75.431233
12,Kottayad,0.1,0.0,0.2,0.1,0.0,0.0,0.0,0.0,0.0,...,Bakery,Clothing Store,Jewelry Store,Vegetarian / Vegan Restaurant,Shopping Mall,Bus Station,Coffee Shop,Outlet Mall,11.917065,75.335387
13,Kunhimangalam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Train Station,Platform,Cluster Labels,Indian Restaurant,Bakery,Basketball Court,Burger Joint,Bus Station,11.917065,75.335387
14,Kuniyampuzha,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,Motorcycle Shop,Indian Restaurant,Bus Station,Café,Restaurant,Gym / Fitness Center,Cluster Labels,Bakery,11.908579,75.332472


In [63]:
#Cluster 2 for Kannur
Taliparamba_merged.loc[Taliparamba_merged['Cluster Labels'] == 1, Taliparamba_merged.columns[[0] + list(range(5, Taliparamba_merged.shape[1]))]]

Unnamed: 0,Area,Bus Station,Café,Clothing Store,Coffee Shop,Cosmetics Shop,Currency Exchange,Fabric Shop,Fast Food Restaurant,Gym / Fitness Center,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Latitude,Longitude
0,Alakode,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,12.190978,75.467331
1,Alakode kuttaramba,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,12.190978,75.467331
3,Aril,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,11.532411,75.730335
5,Cherikode,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,12.132956,75.202438
9,Kandankali,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,11.350407,75.914712
18,Nareekamvally,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Bus Station,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Café,11.883045,75.348506
20,Pathampara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,12.111562,75.608442
23,Pazhassikari,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,11.946222,75.414078
24,Pilathara,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Bus Station,Indian Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Café,12.089595,75.49564


In [64]:
#Cluster 3 for Kannur
Taliparamba_merged.loc[Taliparamba_merged['Cluster Labels'] == 2, Taliparamba_merged.columns[[0] + list(range(5, Taliparamba_merged.shape[1]))]]

Unnamed: 0,Area,Bus Station,Café,Clothing Store,Coffee Shop,Cosmetics Shop,Currency Exchange,Fabric Shop,Fast Food Restaurant,Gym / Fitness Center,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Latitude,Longitude
21,Payyan R.S.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,Cluster Labels,Fast Food Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,11.958785,75.467972
22,Payyavur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,Cluster Labels,Fast Food Restaurant,Jewelry Store,Bakery,Basketball Court,Burger Joint,Bus Station,Café,11.949375,75.447582


In [66]:
#Cluster 1 for Kozhikode
kzk_merged.loc[kzk_merged['Cluster Labels'] == 0, kzk_merged.columns[[0] + list(range(5, kzk_merged.shape[1]))]]

Unnamed: 0,Area,Arts & Crafts Store,Asian Restaurant,Bakery,Beach,Boat or Ferry,Boutique,Bowling Alley,Bridal Shop,Burger Joint,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Latitude,Longitude
2,Arakinar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Hotel,Café,Harbor / Marina,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store,11.436896,76.058772
3,Beypore,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Grocery Store,Cluster Labels,Café,Harbor / Marina,Halal Restaurant,Gym / Fitness Center,Gym,Furniture / Home Store,11.200921,75.79819
4,Beypore North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Grocery Store,Cluster Labels,Café,Harbor / Marina,Halal Restaurant,Gym / Fitness Center,Gym,Furniture / Home Store,11.383467,75.908301
5,Calicut,0.0,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,...,Indian Restaurant,Hotel,Fast Food Restaurant,Juice Bar,Mobile Phone Shop,Pizza Place,Dessert Shop,Bowling Alley,11.173585,75.804002
7,Calicut Beach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Ice Cream Shop,Indian Restaurant,Restaurant,Café,Cluster Labels,Fast Food Restaurant,Coffee Shop,Convenience Store,11.213722,75.798346
10,Calicut Courts,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Bus Stop,Kerala Restaurant,Middle Eastern Restaurant,Cluster Labels,Fish Market,Convenience Store,Department Store,Dessert Shop,11.284319,75.791957
12,Calicut R.S.,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Movie Theater,Indian Restaurant,Platform,Market,Asian Restaurant,Hotel,Train Station,Juice Bar,11.258753,75.78041
15,Cheruvadi,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,Boutique,Cluster Labels,Fish Market,Coffee Shop,Convenience Store,Department Store,Dessert Shop,Fast Food Restaurant,11.240626,75.790925
19,East Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Airport,Movie Theater,Cluster Labels,Fish Market,Coffee Shop,Convenience Store,Department Store,Dessert Shop,11.357631,75.807721
20,Edakkad West Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Café,Cluster Labels,Harbor / Marina,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store,11.26957,75.825751


In [67]:
#Cluster 2 for Kozhikode
kzk_merged.loc[kzk_merged['Cluster Labels'] == 1, kzk_merged.columns[[0] + list(range(5, kzk_merged.shape[1]))]]

Unnamed: 0,Area,Arts & Crafts Store,Asian Restaurant,Bakery,Beach,Boat or Ferry,Boutique,Bowling Alley,Bridal Shop,Burger Joint,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Latitude,Longitude
0,Adivaram Pudupadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Neighborhood,Fish Market,Coffee Shop,Convenience Store,Department Store,Dessert Shop,11.488039,76.013078
1,Alli,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Café,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store,10.850516,76.271083
11,Calicut Medical College,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Clothing Store,Cafeteria,Airport Terminal,Coffee Shop,Accessories Store,Halal Restaurant,11.27579,75.782236
14,Chennamangallur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Café,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store,11.248657,75.779806
39,Kutaranni,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Café,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store,11.192389,75.85049
41,Mailellampara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Café,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store,11.173455,75.835243
43,Marikunnu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Café,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store,11.233603,75.822446
57,Puduppadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Café,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Furniture / Home Store,11.455389,75.997118


In [68]:
#Cluster 3 for Kozhikode
kzk_merged.loc[kzk_merged['Cluster Labels'] == 2, kzk_merged.columns[[0] + list(range(5, kzk_merged.shape[1]))]]

Unnamed: 0,Area,Arts & Crafts Store,Asian Restaurant,Bakery,Beach,Boat or Ferry,Boutique,Bowling Alley,Bridal Shop,Burger Joint,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,Latitude,Longitude
6,Calicut Arts & Science College,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Market,Bakery,Furniture / Home Store,Gym,Grocery Store,Gym / Fitness Center,Halal Restaurant,11.173585,75.804002
8,Calicut Civil Station,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Restaurant,Furniture / Home Store,Bakery,Fast Food Restaurant,Coffee Shop,Convenience Store,Department Store,11.262623,75.767309
9,Calicut Collectorate,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Restaurant,Furniture / Home Store,Bakery,Fast Food Restaurant,Coffee Shop,Convenience Store,Department Store,11.283644,75.791106
13,Chelavur,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Restaurant,Bakery,Café,Fast Food Restaurant,Coffee Shop,Convenience Store,11.272211,75.837198
16,Cheruvannur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,River,Furniture / Home Store,Fast Food Restaurant,Clothing Store,Coffee Shop,Convenience Store,Department Store,11.155988,75.811267
17,Chevarambalam,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Gym,Bakery,Fish Market,Coffee Shop,Convenience Store,Department Store,Dessert Shop,11.46284,75.943003
18,Chevayur,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Indian Restaurant,Restaurant,Bakery,Café,Fast Food Restaurant,Coffee Shop,Convenience Store,11.294967,75.914387
27,Kakkoti,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Bakery,Fried Chicken Joint,Coffee Shop,Convenience Store,Department Store,Dessert Shop,Fast Food Restaurant,11.277539,75.811267
29,Kannancheri,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Bakery,Gym / Fitness Center,Indian Restaurant,Supermarket,Juice Bar,Market,Fish Market,11.296241,75.940664
31,Karaparamba,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,...,Cluster Labels,Men's Store,Fast Food Restaurant,Bakery,Coffee Shop,Convenience Store,Department Store,Dessert Shop,11.29032,75.774641


## [Discussion](#Discussion)
Based on cluster for each cities above, we believe that classification for each cluster can be done better with calculation of venues categories (most common) in each cities. Referring to each cluster, we can't determine clearly what represent in each cluster by using Foursquare - Most Common Venue data.

However, for the sake of this project we assumed each cluster as follow:
Cluster 1: Kannur: Tourism
Cluster 2: Kannur: Residential
Cluster 3: Kannur: Mix
Cluster 1: Kozhikode: Tourism
Cluster 2: Kozhikode: Residential
Cluster 3: Kozhikode: Mix
What is lacking at this point is a systematic, quantitative way to identify and distinguish different district and to describe the correlation most common venues as recorded in Foursquare. The reality is however more complex: similar cities might have or might not have similar common venues. A further step in this classification would be to find a method to extract these common venues and integrate the spatial correlations between different of areas or district.

We believe that the classification we propose is an encouraging step towards a quantitative and systematic comparison of the different cities. Further studies are indeed needed in order to relate the data acquired, then observe it to more meaningful and objective results.

## [Conclusion](#Conclusion)
Using Foursquare API, we can captured data of common places all around the world. Using it, we refer back to our main objectives, which is to determine; the similarity or dissimilarity of both cities classification of area located inside the city whether it is residential, tourism places, or others
In conclusion, both cities Kannur and Kozhikode are the center of attraction among Kerala. However, to declare both cities are similar or dissimilar base on common venues visited is quite difficult. Both cities is similar in some venues also dissimilar in certain venues. And for classification based on common venues, again we must have more systematic or quantitative way to identify and declare this. Comparison can be made, but no such method or quantitative data to determine this. We hope in the future, a method to determine it can be establish and explore for references.

Thank you,
Sharika
Kannur