# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, we will try to find the air quality index of Kolkata, India, and its relationship with the neighbourhoods of the city. Specifically, this report is targeted at stakeholders interested in **climate change** and the ever-increasing concern of **degrading air quality**, i.e. government bodies, NGOs or independent organizations.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans


import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## Data <a name="data"></a>

Based on the definition of the problem, the factors that will influence the decision are:
* PM 2.5 readings from a particular neighbourhood
* intrinsic characteristics of the neighbourhood that influence air quality
* location data of the neighbourhood

I decided to use coordinates to define our neighborhoods.

The following data sources will be needed to extract/generate the required information:
* coordinates of Kolkata, obtained using the **geopy geolocator**
* the venues in every neighbourhood, with their type and location, obtained using the **Foursquare API**
* coordinates of Kolkata neighbourhoods and their respective PM 2.5 emission data, from a CSV file found online at https://cleair.io/

### Finding the latitude and longitude of Kolkata using geolocator

In [2]:
address = 'Kolkata, IN'

geolocator = Nominatim(user_agent="kol_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Kolkata are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Kolkata are 22.54541245, 88.3567751581234.
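
`geocode` returns `None` when Nominatim finds no match, in which case reading `location.latitude` would raise an `AttributeError`. A minimal defensive sketch follows. The helper name `geocode_or_raise` and the stand-in geocoder used to exercise it are illustrative assumptions, not part of geopy.

```python
from collections import namedtuple

# Stand-in for the object returned by a geocoder; only the two attributes
# used in this notebook are modelled (hypothetical, for illustration).
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])


def geocode_or_raise(geocode, address):
    """Call a geocoding function and fail loudly if no match is found.

    `geocode` is any callable shaped like `geolocator.geocode`: it takes an
    address string and returns an object with `latitude` and `longitude`
    attributes, or None when nothing matches.
    """
    location = geocode(address)
    if location is None:
        raise ValueError("No geocoding result for {!r}".format(address))
    return location.latitude, location.longitude


def fake_geocode(address):
    # Hypothetical lookup table standing in for the network call.
    known = {"Kolkata, IN": FakeLocation(22.54541245, 88.3567751581234)}
    return known.get(address)


lat, lng = geocode_or_raise(fake_geocode, "Kolkata, IN")
print(lat, lng)
```

With the real geolocator, the call would be `geocode_or_raise(geolocator.geocode, address)`.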


### Loading the pollution dataset of Kolkata

In [43]:
df = pd.read_csv("locations.csv")
df.rename(columns={"icon":"Neighborhood"}, inplace = True) 
#df.drop(df.columns[3], axis = 1, inplace = True) 
df

Unnamed: 0,Neighborhood,longitude,latitude,PM2.5
0,Victoria Memorial,88.34556,22.545673,92.56
1,Howrah Station,88.3373,22.583,156.78
2,Taratala Road (Marine Engineering & Research I...,88.309656,22.515289,45.65
3,Chetla (Desher Khabar),88.337631,22.51727,115.95
4,Lords More,88.357841,22.502047,84.96
5,"Adarsha Palli (Ray Bahadur Road, Lions Club)",88.327682,22.499827,65.96
6,City Centre 2,88.4501,22.6223,88.65
7,Karunamoyee Crossing,88.4214,22.5865,101.27
8,Pallisree (Nabarun Club),88.375265,22.483984,73.69
9,Garia (Depot),88.377689,22.465832,91.68


### The data gives us each neighbourhood, its coordinates and its average PM2.5 emissions over a certain period of time. Let's visualize it

In [86]:
# create map of Kolkata using latitude and longitude values
map_kolkata = folium.Map(location=[latitude, longitude], zoom_start=10.5)

# add markers to map
for lat, lng, neighborhood, pm in zip(df['latitude'], df['longitude'], df['Neighborhood'], df['PM2.5']):
    label = '{}, {}'.format(neighborhood, pm)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_kolkata)
    
map_kolkata

### Initializing Foursquare credentials

In [45]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


### Checking the coordinates of the first neighborhood on the dataframe

In [46]:
neighborhood_latitude = df.loc[0, 'latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[0, 'longitude'] # neighborhood longitude value

neighborhood_name = df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Victoria Memorial are 22.5456733, 88.3455603.


### Initializing the URL for the GET call to Foursquare

In [47]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180605&ll=22.5456733,88.3455603&radius=500&limit=100'
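
Formatting the query string by hand works here, but it is easy to mistype (note the stray `&` right after `?` in the URL). As a hedged alternative, the sketch below builds the same request URL from a dictionary using the standard library's `urlencode`. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL from a dict of query parameters (illustrative helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes the values, so the comma in `ll` becomes %2C
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
    22.5456733, 88.3455603, 500, 100,
)

# Parse the query string back to check that every parameter survived intact.
query = parse_qs(urlsplit(example_url).query)
print(example_url)
print(query["ll"])
```

With `requests`, the same dictionary can be passed directly as `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.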

In [48]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eb2a786963d29001b37a240'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Park Street - Taltola - Shakespeare Sarani',
  'headerFullLocation': 'Park Street - Taltola - Shakespeare Sarani, Kolkata',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 11,
  'suggestedBounds': {'ne': {'lat': 22.550173304500007,
    'lng': 88.35042358284053},
   'sw': {'lat': 22.541173295499995, 'lng': 88.34069701715947}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c0218258ef2c9b66d9c16fc',
       'name': 'Victoria Memorial',
       'location': {'address': "1, Queen's Way",
        'lat': 22.545844129353117,
        'lng': 88.34289036897952,
   

## Methodology <a name="methodology"></a>

In the first step, we have **collected the required data**: the location and name of every venue in our given neighbourhoods.

In the second step, we have **found the type and category of every venue** (according to Foursquare categorization).

In the third step, we will **explore the neighbourhoods and their venue characteristics**, to get an idea of what kind of neighbourhood each one is.

In the fourth step, we will **explore each neighbourhood separately** and find the top 10 venues of each neighbourhood.

In the fifth step, we will perform **k-means clustering** on the data to **find clusters of similar neighbourhoods**.

In the sixth step, we will find **the average PM 2.5 readings** of these neighbourhoods, and check whether they are **similar within a given cluster and how they differ from cluster to cluster.**

In the seventh step, we will examine what the **venue data says about each neighbourhood** and how it is **related to that neighbourhood's PM 2.5 emissions.**

## Analysis <a name="analysis"></a>

Let's perform some basic exploratory data analysis and derive some additional info from our raw data.

In [49]:
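# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories',
    # depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may come back with no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']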

### Getting the top few venues around Victoria Memorial, the first neighbourhood on our df

In [50]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Victoria Memorial,History Museum,22.545844,88.34289
1,Maidan,Field,22.549906,88.344219
2,Kenilworth Hotel,Hotel,22.546211,88.350133
3,Academy of Fine Arts,Art Gallery,22.543275,88.345138
4,Nandan,Indie Theater,22.542034,88.34544


In [51]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

11 venues were returned by Foursquare.


### Exploring the neighbourhoods of Kolkata

In [52]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
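
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue whose category list is empty (the case that `get_category_type` guards against). The sketch below separates the parsing step into a pure function that can be exercised without calling the API. The name `venue_rows` and the sample `sample_items` list are illustrative assumptions modelled on the response structure shown earlier.

```python
def venue_rows(name, lat, lng, items):
    """Turn the `items` list of an explore response into flat tuples.

    Venues without a category get None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((
            name,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written sample in the shape of response['groups'][0]['items'].
# The second venue is made up to show the empty-category case.
sample_items = [
    {"venue": {"name": "Victoria Memorial",
               "location": {"lat": 22.545844129353117, "lng": 88.34289036897952},
               "categories": [{"name": "History Museum"}]}},
    {"venue": {"name": "Uncategorised venue",
               "location": {"lat": 22.5460, "lng": 88.3450},
               "categories": []}},
]

rows = venue_rows("Victoria Memorial", 22.5456733, 88.3455603, sample_items)
for row in rows:
    print(row)
```

Inside the request loop, the list comprehension could then be replaced by `venues_list.append(venue_rows(name, lat, lng, results))`.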

In [53]:
kolkata_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['latitude'],
                                   longitudes=df['longitude']
                                  )

Victoria Memorial
Howrah Station
Taratala Road (Marine Engineering & Research Institute)
Chetla (Desher Khabar)
Lords More
Adarsha Palli (Ray Bahadur Road, Lions Club)
City Centre 2
Karunamoyee Crossing
Pallisree (Nabarun Club)
Garia (Depot)
Ajoynagar
Ruby More
Safui Para
Ballygunge Phari
Topsia more
Moulali (Kolkata Youth Center)
Beleghata (Building more)
Esplanade (park in front of Victoria House)
Phoolbagan
Ultadanga (below foot bridge)
Girish Park
Shyambazar (Five Points)
BNR (Engine Gate)
Rabindra Bharati University
IIM Calcutta
Asoka Cinema Hall
Sodepur
Thakurpukur Cancer Hospital
Santragachi Station
Mandirtala
Indian Botanic Garden
Sonarpur
Agarpara
Dhalai Bridge
Kavi Subhash metro station
Mother's Wax Museum
TCS Gitanjali Park
Sarsuna
One Rajarhat
Cossipore Gun Shell Factory
Khardaha
Belgharia Head Post Office
Birati
St. Xavier's University, Kolkata


### A list of all the neighbourhoods, the venues they have and their categories

In [54]:
print(kolkata_venues.shape)
kolkata_venues.head()

(213, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Victoria Memorial,22.545673,88.34556,Victoria Memorial,22.545844,88.34289,History Museum
1,Victoria Memorial,22.545673,88.34556,Maidan,22.549906,88.344219,Field
2,Victoria Memorial,22.545673,88.34556,Kenilworth Hotel,22.546211,88.350133,Hotel
3,Victoria Memorial,22.545673,88.34556,Academy of Fine Arts,22.543275,88.345138,Art Gallery
4,Victoria Memorial,22.545673,88.34556,Nandan,22.542034,88.34544,Indie Theater


In [55]:
kolkata_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adarsha Palli (Ray Bahadur Road, Lions Club)",1,1,1,1,1,1
Agarpara,1,1,1,1,1,1
Ajoynagar,4,4,4,4,4,4
Asoka Cinema Hall,4,4,4,4,4,4
BNR (Engine Gate),2,2,2,2,2,2
Ballygunge Phari,11,11,11,11,11,11
Beleghata (Building more),4,4,4,4,4,4
Belgharia Head Post Office,2,2,2,2,2,2
Birati,3,3,3,3,3,3
Chetla (Desher Khabar),6,6,6,6,6,6


In [56]:
print('There are {} unique categories.'.format(len(kolkata_venues['Venue Category'].unique())))

There are 76 unique categories.


### Now analyzing each neighbourhood

In [57]:
# one hot encoding
kolkata_onehot = pd.get_dummies(kolkata_venues[['Venue Category']], prefix="", prefix_sep="")

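# add neighborhood column back to dataframe
# (the venue categories include one that is itself named "Neighborhood", so this
# assignment overwrites that dummy column in place instead of appending a new one)
kolkata_onehot['Neighborhood'] = kolkata_venues['Neighborhood'] 

# move the last column to the first position
# (because of the overwrite above, the column moved here is "Watch Shop", not "Neighborhood")
fixed_columns = [kolkata_onehot.columns[-1]] + list(kolkata_onehot.columns[:-1])
kolkata_onehot = kolkata_onehot[fixed_columns]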

kolkata_onehot.head()

Unnamed: 0,Watch Shop,ATM,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Bakery,Bank,Bengali Restaurant,Bookstore,Botanical Garden,Breakfast Spot,Brewery,Bus Station,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Discount Store,Dumpling Restaurant,Electronics Store,Fast Food Restaurant,Field,Flea Market,Food Court,Food Truck,Furniture / Home Store,Gym,History Museum,Hookah Bar,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Indie Theater,Italian Restaurant,Jewelry Store,Karaoke Bar,Market,Men's Store,Metro Station,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Neighborhood,Park,Performing Arts Venue,Pharmacy,Pizza Place,Planetarium,Platform,Plaza,Residential Building (Apartment / Condo),Restaurant,Sandwich Place,Shopping Mall,South Indian Restaurant,Tea Room,Tex-Mex Restaurant,Tibetan Restaurant,Train Station,Vegetarian / Vegan Restaurant
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Victoria Memorial,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Victoria Memorial,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Victoria Memorial,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Victoria Memorial,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,Victoria Memorial,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [58]:
kolkata_onehot.shape

(213, 76)
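
The one-hot table has 76 columns for 76 categories, and `Neighborhood` sits in its alphabetical position rather than first. This indicates that one venue category is itself called `Neighborhood`, so assigning the key column replaced that dummy column. One possible way to keep both is sketched below on a toy frame. The toy data and the renamed label `Neighborhood (category)` are assumptions for illustration.

```python
import pandas as pd

# Toy venues table; one venue category is literally named "Neighborhood".
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Venue Category": ["Café", "Neighborhood", "ATM"],
})

toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)

# Rename the clashing dummy column before adding the key column.
toy_onehot = toy_onehot.rename(columns={"Neighborhood": "Neighborhood (category)"})

# insert() places the key column first and raises if the name already exists.
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Mean of the 0/1 indicators = share of each category within a neighbourhood.
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Because `insert` refuses to create a duplicate column name, a future name clash would surface as an error instead of silently dropping a category.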

### Grouping the rows by neighborhood by taking the mean of the frequency of occurrence of each category

In [59]:
kolkata_grouped = kolkata_onehot.groupby('Neighborhood').mean().reset_index()
kolkata_grouped

Unnamed: 0,Neighborhood,Watch Shop,ATM,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Bakery,Bank,Bengali Restaurant,Bookstore,Botanical Garden,Breakfast Spot,Brewery,Bus Station,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Discount Store,Dumpling Restaurant,Electronics Store,Fast Food Restaurant,Field,Flea Market,Food Court,Food Truck,Furniture / Home Store,Gym,History Museum,Hookah Bar,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Indie Theater,Italian Restaurant,Jewelry Store,Karaoke Bar,Market,Men's Store,Metro Station,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Park,Performing Arts Venue,Pharmacy,Pizza Place,Planetarium,Platform,Plaza,Residential Building (Apartment / Condo),Restaurant,Sandwich Place,Shopping Mall,South Indian Restaurant,Tea Room,Tex-Mex Restaurant,Tibetan Restaurant,Train Station,Vegetarian / Vegan Restaurant
0,"Adarsha Palli (Ray Bahadur Road, Lions Club)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agarpara,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ajoynagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Asoka Cinema Hall,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BNR (Engine Gate),0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Ballygunge Phari,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909
6,Beleghata (Building more),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0
7,Belgharia Head Post Office,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Birati,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0
9,Chetla (Desher Khabar),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [60]:
kolkata_grouped.shape

(44, 76)

### Each neighborhood along with the top 5 most common venues

In [61]:
num_top_venues = 5

for hood in kolkata_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = kolkata_grouped[kolkata_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adarsha Palli (Ray Bahadur Road, Lions Club)----
                venue  freq
0        Dance Studio   1.0
1          Watch Shop   0.0
2              Market   0.0
3  Mughlai Restaurant   0.0
4       Movie Theater   0.0


----Agarpara----
                venue  freq
0                 ATM   1.0
1         Karaoke Bar   0.0
2  Mughlai Restaurant   0.0
3       Movie Theater   0.0
4     Motorcycle Shop   0.0


----Ajoynagar----
                  venue  freq
0  Fast Food Restaurant  0.25
1    Mughlai Restaurant  0.25
2           Bus Station  0.25
3                Bakery  0.25
4            Watch Shop  0.00


----Asoka Cinema Hall----
                 venue  freq
0                  ATM  0.25
1   Mughlai Restaurant  0.25
2    Indian Restaurant  0.25
3  Indie Movie Theater  0.25
4               Market  0.00


----BNR (Engine Gate)----
                venue  freq
0                 ATM   1.0
1         Karaoke Bar   0.0
2  Mughlai Restaurant   0.0
3       Movie Theater   0.0
4     Motorcycle Shop 

In [62]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

### A dataframe to display the top 10 venues for each neighborhood

In [63]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = kolkata_grouped['Neighborhood']

for ind in np.arange(kolkata_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(kolkata_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adarsha Palli (Ray Bahadur Road, Lions Club)",Dance Studio,Vegetarian / Vegan Restaurant,Clothing Store,Coffee Shop,Convenience Store,Deli / Bodega,Department Store,Dhaba,Discount Store,Dumpling Restaurant
1,Agarpara,ATM,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Discount Store
2,Ajoynagar,Bus Station,Fast Food Restaurant,Bakery,Mughlai Restaurant,Vegetarian / Vegan Restaurant,Dhaba,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega
3,Asoka Cinema Hall,ATM,Indian Restaurant,Indie Movie Theater,Mughlai Restaurant,Vegetarian / Vegan Restaurant,Discount Store,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega
4,BNR (Engine Gate),ATM,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Discount Store
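
For neighbourhoods with very few venues, most of the ten slots are filled by categories whose frequency is zero. Agarpara, for instance, has a single venue yet lists ten "most common" categories, and the order of the zero-frequency entries is just a by-product of sorting ties. A hedged variant that keeps only categories that actually occur is sketched below. The name `top_present_venues` is illustrative and the toy row is made up.

```python
import pandas as pd


def top_present_venues(row, num_top_venues):
    """Return up to `num_top_venues` category names with non-zero frequency.

    `row` is one row of the grouped frame: the first entry is the
    neighbourhood name, the rest are per-category frequencies.
    """
    freqs = row.iloc[1:].astype(float)
    present = freqs[freqs > 0].sort_values(ascending=False)
    return list(present.index[:num_top_venues])


# Made-up row with two categories present and two absent.
toy_row = pd.Series({
    "Neighborhood": "Toy Para",
    "ATM": 0.25,
    "Bakery": 0.0,
    "Café": 0.75,
    "Dhaba": 0.0,
})

print(top_present_venues(toy_row, 10))
```

The returned list can be shorter than `num_top_venues`, so filling a fixed-width table with it would need padding (for example with `None`).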


### Running k-means to cluster the neighborhoods into 5 clusters

In [64]:
# set number of clusters
kclusters = 5

kolkata_grouped_clustering = kolkata_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kolkata_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 1, 1, 2, 1, 1, 2, 1, 1])
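
The value `kclusters = 5` is fixed by hand. One common way to sanity-check such a choice is to compare silhouette scores across candidate values of k. The sketch below does this on synthetic points rather than on `kolkata_grouped_clustering`, so its numbers say nothing about the Kolkata data. The synthetic data and the candidate range 2 to 5 are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated groups of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
points = np.vstack([c + rng.normal(scale=0.05, size=(20, 2)) for c in centers])

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    # Silhouette ranges from -1 to 1; higher means tighter, better separated clusters.
    scores[k] = silhouette_score(points, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

Applied to the notebook's data, `points` would be replaced by `kolkata_grouped_clustering`; with only 44 sparse rows, the scores should be read as a rough guide.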

In [65]:
kolkata_grouped_clustering

Unnamed: 0,Watch Shop,ATM,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Bakery,Bank,Bengali Restaurant,Bookstore,Botanical Garden,Breakfast Spot,Brewery,Bus Station,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Discount Store,Dumpling Restaurant,Electronics Store,Fast Food Restaurant,Field,Flea Market,Food Court,Food Truck,Furniture / Home Store,Gym,History Museum,Hookah Bar,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Indie Theater,Italian Restaurant,Jewelry Store,Karaoke Bar,Market,Men's Store,Metro Station,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Park,Performing Arts Venue,Pharmacy,Pizza Place,Planetarium,Platform,Plaza,Residential Building (Apartment / Condo),Restaurant,Sandwich Place,Shopping Mall,South Indian Restaurant,Tea Room,Tex-Mex Restaurant,Tibetan Restaurant,Train Station,Vegetarian / Vegan Restaurant
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0
7,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### A new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

In [66]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

kolkata_merged = df

# join the cluster labels and top venues onto the pollution dataframe, matching rows on neighborhood name
kolkata_merged = kolkata_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

kolkata_merged.head() 

Unnamed: 0,Neighborhood,longitude,latitude,PM2.5,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Victoria Memorial,88.34556,22.545673,92.56,1,Shopping Mall,Food Court,Performing Arts Venue,History Museum,Art Gallery,Hotel,Field,Planetarium,Indie Theater,Coffee Shop
1,Howrah Station,88.3373,22.583,156.78,1,Platform,Vegetarian / Vegan Restaurant,Discount Store,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Dumpling Restaurant
2,Taratala Road (Marine Engineering & Research I...,88.309656,22.515289,45.65,2,ATM,Restaurant,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba
3,Chetla (Desher Khabar),88.337631,22.51727,115.95,1,Indian Sweet Shop,Bengali Restaurant,Park,Pharmacy,Jewelry Store,Vegetarian / Vegan Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega
4,Lords More,88.357841,22.502047,84.96,1,Café,Clothing Store,Chinese Restaurant,Dumpling Restaurant,Multiplex,Department Store,Indian Restaurant,Plaza,Coffee Shop,Sandwich Place


In [67]:
kolkata_merged

Unnamed: 0,Neighborhood,longitude,latitude,PM2.5,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Victoria Memorial,88.34556,22.545673,92.56,1,Shopping Mall,Food Court,Performing Arts Venue,History Museum,Art Gallery,Hotel,Field,Planetarium,Indie Theater,Coffee Shop
1,Howrah Station,88.3373,22.583,156.78,1,Platform,Vegetarian / Vegan Restaurant,Discount Store,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Dumpling Restaurant
2,Taratala Road (Marine Engineering & Research I...,88.309656,22.515289,45.65,2,ATM,Restaurant,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba
3,Chetla (Desher Khabar),88.337631,22.51727,115.95,1,Indian Sweet Shop,Bengali Restaurant,Park,Pharmacy,Jewelry Store,Vegetarian / Vegan Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega
4,Lords More,88.357841,22.502047,84.96,1,Café,Clothing Store,Chinese Restaurant,Dumpling Restaurant,Multiplex,Department Store,Indian Restaurant,Plaza,Coffee Shop,Sandwich Place
5,"Adarsha Palli (Ray Bahadur Road, Lions Club)",88.327682,22.499827,65.96,0,Dance Studio,Vegetarian / Vegan Restaurant,Clothing Store,Coffee Shop,Convenience Store,Deli / Bodega,Department Store,Dhaba,Discount Store,Dumpling Restaurant
6,City Centre 2,88.4501,22.6223,88.65,1,Fast Food Restaurant,Restaurant,Watch Shop,Hotel,Bookstore,Department Store,Indian Restaurant,Pizza Place,Cocktail Bar,Asian Restaurant
7,Karunamoyee Crossing,88.4214,22.5865,101.27,1,Food Truck,Bus Station,Park,Arts & Crafts Store,Fast Food Restaurant,Market,Vegetarian / Vegan Restaurant,Department Store,Convenience Store,Dance Studio
8,Pallisree (Nabarun Club),88.375265,22.483984,73.69,4,Department Store,Vegetarian / Vegan Restaurant,Clothing Store,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Dhaba,Discount Store,Dumpling Restaurant
9,Garia (Depot),88.377689,22.465832,91.68,1,Multicuisine Indian Restaurant,Chinese Restaurant,Metro Station,Plaza,Flea Market,Field,Fast Food Restaurant,Electronics Store,Dumpling Restaurant,Discount Store


### Visualizing the resultant clusters

In [88]:
# create a map centred on Kolkata (latitude and longitude are defined earlier in the notebook)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10.5)

# one distinct hex colour per cluster, sampled evenly from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighbourhood, coloured by its cluster label
for lat, lon, poi, cluster in zip(kolkata_merged['latitude'], kolkata_merged['longitude'], kolkata_merged['Neighborhood'], kolkata_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        # labels run from 0 to kclusters-1, so they index the palette directly
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

## Results and Discussion <a name="results"></a>

### Analyzing each cluster separately

#### Cluster 1 (label 0)

In [69]:
c0 = kolkata_merged.loc[kolkata_merged['Cluster Labels'] == 0]
c0

Unnamed: 0,Neighborhood,longitude,latitude,PM2.5,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,"Adarsha Palli (Ray Bahadur Road, Lions Club)",88.327682,22.499827,65.96,0,Dance Studio,Vegetarian / Vegan Restaurant,Clothing Store,Coffee Shop,Convenience Store,Deli / Bodega,Department Store,Dhaba,Discount Store,Dumpling Restaurant


In [70]:
c0['PM2.5'].mean()

65.96

#### Cluster 2 (label 1)

In [71]:
c1 = kolkata_merged.loc[kolkata_merged['Cluster Labels'] == 1]
c1

Unnamed: 0,Neighborhood,longitude,latitude,PM2.5,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Victoria Memorial,88.34556,22.545673,92.56,1,Shopping Mall,Food Court,Performing Arts Venue,History Museum,Art Gallery,Hotel,Field,Planetarium,Indie Theater,Coffee Shop
1,Howrah Station,88.3373,22.583,156.78,1,Platform,Vegetarian / Vegan Restaurant,Discount Store,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Dumpling Restaurant
3,Chetla (Desher Khabar),88.337631,22.51727,115.95,1,Indian Sweet Shop,Bengali Restaurant,Park,Pharmacy,Jewelry Store,Vegetarian / Vegan Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega
4,Lords More,88.357841,22.502047,84.96,1,Café,Clothing Store,Chinese Restaurant,Dumpling Restaurant,Multiplex,Department Store,Indian Restaurant,Plaza,Coffee Shop,Sandwich Place
6,City Centre 2,88.4501,22.6223,88.65,1,Fast Food Restaurant,Restaurant,Watch Shop,Hotel,Bookstore,Department Store,Indian Restaurant,Pizza Place,Cocktail Bar,Asian Restaurant
7,Karunamoyee Crossing,88.4214,22.5865,101.27,1,Food Truck,Bus Station,Park,Arts & Crafts Store,Fast Food Restaurant,Market,Vegetarian / Vegan Restaurant,Department Store,Convenience Store,Dance Studio
9,Garia (Depot),88.377689,22.465832,91.68,1,Multicuisine Indian Restaurant,Chinese Restaurant,Metro Station,Plaza,Flea Market,Field,Fast Food Restaurant,Electronics Store,Dumpling Restaurant,Discount Store
10,Ajoynagar,88.394577,22.488743,80.14,1,Bus Station,Fast Food Restaurant,Bakery,Mughlai Restaurant,Vegetarian / Vegan Restaurant,Dhaba,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega
11,Ruby More,88.401792,22.513483,124.49,1,Café,Sandwich Place,Coffee Shop,Bus Station,Indian Sweet Shop,Plaza,Hotel,Fast Food Restaurant,Field,Flea Market
12,Safui Para,88.384202,22.502452,78.21,1,Market,IT Services,Bus Station,Vegetarian / Vegan Restaurant,Discount Store,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba


In [72]:
c1['PM2.5'].mean()

95.92709677419352

#### Cluster 3 (label 2)

In [73]:
c2 = kolkata_merged.loc[kolkata_merged['Cluster Labels'] == 2]
c2

Unnamed: 0,Neighborhood,longitude,latitude,PM2.5,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Taratala Road (Marine Engineering & Research I...,88.309656,22.515289,45.65,2,ATM,Restaurant,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba
22,BNR (Engine Gate),88.310434,22.54335,48.97,2,ATM,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Discount Store
23,Rabindra Bharati University,88.373197,22.622906,30.32,2,ATM,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Discount Store
24,IIM Calcutta,88.2997,22.4447,42.36,2,ATM,Sandwich Place,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba
27,Thakurpukur Cancer Hospital,88.3197,22.4591,50.69,2,ATM,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Discount Store
32,Agarpara,88.3931,22.6834,57.19,2,ATM,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Discount Store
33,Dhalai Bridge,88.3927,22.466,37.64,2,ATM,Metro Station,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Discount Store
39,Cossipore Gun Shell Factory,88.3709,22.6194,43.67,2,ATM,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Discount Store
41,Belgharia Head Post Office,88.384,22.6604,47.73,2,ATM,Pharmacy,Vegetarian / Vegan Restaurant,Discount Store,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba
43,"St. Xavier's University, Kolkata",88.4875,22.5619,40.36,2,ATM,Discount Store,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba


In [74]:
c2['PM2.5'].mean()

44.458000000000006

#### Cluster 4 (label 3)

In [75]:
c3 = kolkata_merged.loc[kolkata_merged['Cluster Labels'] == 3]
c3

Unnamed: 0,Neighborhood,longitude,latitude,PM2.5,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,Indian Botanic Garden,88.2855,22.5565,62.69,3,Botanical Garden,Vegetarian / Vegan Restaurant,Discount Store,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dhaba,Dumpling Restaurant


In [76]:
c3['PM2.5'].mean()

62.69

#### Cluster 5 (label 4)

In [77]:
c4 = kolkata_merged.loc[kolkata_merged['Cluster Labels'] == 4]
c4

Unnamed: 0,Neighborhood,longitude,latitude,PM2.5,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Pallisree (Nabarun Club),88.375265,22.483984,73.69,4,Department Store,Vegetarian / Vegan Restaurant,Clothing Store,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Dhaba,Discount Store,Dumpling Restaurant


In [78]:
c4['PM2.5'].mean()

73.69

In [79]:
# collect the mean PM2.5 of each cluster (k-means labels are 0-based; the report numbers clusters from 1)
result = [{'Cluster': 'Cluster 1', 'PM2.5': round(c0['PM2.5'].mean(), 2)},
          {'Cluster': 'Cluster 2', 'PM2.5': round(c1['PM2.5'].mean(), 2)},
          {'Cluster': 'Cluster 3', 'PM2.5': round(c2['PM2.5'].mean(), 2)},
          {'Cluster': 'Cluster 4', 'PM2.5': round(c3['PM2.5'].mean(), 2)},
          {'Cluster': 'Cluster 5', 'PM2.5': round(c4['PM2.5'].mean(), 2)}]

In [80]:
result

[{'Cluster': 'Cluster 1', 'PM2.5': 65.96},
 {'Cluster': 'Cluster 2', 'PM2.5': 95.93},
 {'Cluster': 'Cluster 3', 'PM2.5': 44.46},
 {'Cluster': 'Cluster 4', 'PM2.5': 62.69},
 {'Cluster': 'Cluster 5', 'PM2.5': 73.69}]

### Mean PM2.5 concentration for each cluster

In [83]:
result_df = pd.DataFrame(result) 
result_df

Unnamed: 0,Cluster,PM2.5
0,Cluster 1,65.96
1,Cluster 2,95.93
2,Cluster 3,44.46
3,Cluster 4,62.69
4,Cluster 5,73.69
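
The per-cluster means above were computed one cluster at a time. As a minimal sketch, the same summary could be produced in a single step with `groupby`, which also reports how many neighbourhoods fall in each cluster. The `sample` frame below is an assumption made for illustration: it holds only a subset of the rows shown in `kolkata_merged`, and label 1 is deliberately incomplete, so its mean will not match the 95.93 reported above.

```python
import pandas as pd

# Illustrative subset of the PM2.5 readings and cluster labels shown above.
# Labels 0, 2, 3 and 4 are complete; label 1 holds only three of its rows.
sample = pd.DataFrame({
    "Cluster Labels": [0, 3, 4] + [2] * 10 + [1] * 3,
    "PM2.5": [65.96, 62.69, 73.69,
              45.65, 48.97, 30.32, 42.36, 50.69,
              57.19, 37.64, 43.67, 47.73, 40.36,
              92.56, 156.78, 115.95],
})

# Mean PM2.5 and number of neighbourhoods per cluster label, in one pass.
summary = (
    sample.groupby("Cluster Labels")["PM2.5"]
    .agg(["mean", "count"])
    .round(2)
)

# The report numbers clusters from 1, while the k-means labels start at 0.
summary.index = ["Cluster " + str(label + 1) for label in summary.index]
print(summary)
```

Applied to the full `kolkata_merged` frame instead of `sample`, the `count` column would make it immediately visible that three of the five clusters contain a single neighbourhood.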


## Conclusion <a name="conclusion"></a>

As we can see, **Cluster 2 and Cluster 3** are the **main clusters** that have been formed, each with distinct characteristics. The other three clusters each contain a single neighbourhood.

**Cluster 2** consists of neighbourhoods where the most common venues are shopping malls, cafés and office buildings (see dataframe `c1`), which suggests that these are **commercial areas**. The PM2.5 concentrations in these neighbourhoods are **considerably higher** than in the other clusters: the average PM2.5 reading across all neighbourhoods in Cluster 2 is **95.93 µg/m³**, the highest of the five clusters.

**Cluster 3** consists of neighbourhoods where the most common venues are ATMs, convenience stores and restaurants (see dataframe `c2`), which suggests that these are **residential areas**. Here the PM2.5 concentrations are **considerably lower** than in the other clusters: the average PM2.5 reading across all neighbourhoods in Cluster 3 is **44.46 µg/m³**, the lowest of the five clusters.
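
To put the gap between the two main clusters in concrete terms, the short sketch below compares their mean PM2.5 readings. It uses only the two cluster means reported in the results table; the variable names are illustrative and do not appear in the notebook.

```python
# Cluster means as reported in the results table (µg/m³).
commercial_mean = 95.93   # Cluster 2
residential_mean = 44.46  # Cluster 3

# Absolute gap and ratio between the two cluster means.
difference = round(commercial_mean - residential_mean, 2)
ratio = round(commercial_mean / residential_mean, 2)

print(f"Absolute gap: {difference} µg/m³")
print(f"Commercial / residential ratio: {ratio}")
```

On these figures, the mean reading in the commercial cluster is a little more than twice that of the residential cluster. This is a descriptive comparison of means only; no statistical test is implied.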

### *This result suggests that PM2.5 pollution in the residential areas of Kolkata, India is much lower than in its commercial areas.*