<a href="https://colab.research.google.com/github/mjalalimanesh/IBM-Data-Science-Foursquare-Project/blob/main/Tehran_Venue_Segmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Tehran</font></h1>

Notebook for IBM Cognitive Class Data Science Capstone Course

## Introduction

In this lab, you will learn how to convert addresses into their equivalent latitude and longitude values. You will also use the Foursquare API to explore neighborhoods in Tehran: you will call its **explore** endpoint to get the most common venue categories in each neighborhood, and then use these features to group the neighborhoods into clusters with the _k_-means clustering algorithm. Finally, you will use the Folium library to visualize the neighborhoods in Tehran and their emerging clusters.


## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1.  <a href="#item1">Download and Explore Dataset</a>

2.  <a href="#item2">Explore Neighborhoods in Tehran</a>

3.  <a href="#item3">Analyze Each Neighborhood</a>

4.  <a href="#item4">Cluster Neighborhoods</a>

5.  <a href="#item5">Examine Clusters</a>  
    </font>
    </div>


Before we get the data and start exploring it, let's import all the dependencies that we will need.


In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>


## 1. Download and Explore Dataset


Tehran has 22 districts and around 376 neighborhoods. In order to segment the neighborhoods and explore them, we need a dataset that contains the 22 districts, the neighborhoods in each district, and the latitude and longitude coordinates of each neighborhood.

I compiled a dataset for this task: (https://github.com/mjalalimanesh/IBM-Data-Science-Foursquare-Project/blob/main/tehran_neighborhoods.csv)


#### Load and explore the data


Next, let's load the data.


In [2]:
neighborhoods = pd.read_csv('https://raw.githubusercontent.com/mjalalimanesh/IBM-Data-Science-Foursquare-Project/main/tehran_neighborhoods.csv')

Quickly examine the resulting dataframe.


In [3]:
neighborhoods.head()

|   | Unnamed: 0 | name | district | lat | lng |
|---|---|---|---|---|---|
| 0 | 0 | اراج | 1 | 35.794309 | 51.487576 |
| 1 | 1 | ازگل | 1 | 35.788786 | 51.515819 |
| 2 | 2 | امام زاده قاسم | 1 | 35.812907 | 51.43947 |
| 3 | 3 | اوین | 1 | 35.799515 | 51.393836 |
| 4 | 4 | باغ فردوس | 1 | 35.794955 | 51.423576 |


In [4]:
neighborhoods.dtypes

Unnamed: 0      int64
name           object
district        int64
lat           float64
lng           float64
dtype: object

In [5]:
neighborhoods.drop('Unnamed: 0', axis=1, inplace=True)
neighborhoods.columns = ['Neighborhood', 'District', 'Latitude', 'Longitude']
neighborhoods.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 377 entries, 0 to 376
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Neighborhood  377 non-null    object 
 1   District      377 non-null    int64  
 2   Latitude      377 non-null    float64
 3   Longitude     377 non-null    float64
dtypes: float64(2), int64(1), object(1)
memory usage: 11.9+ KB


Check whether there are any duplicate rows, i.e. the same neighborhood listed more than once within a district.

In [6]:
neighborhoods[neighborhoods.duplicated(subset=['Neighborhood', 'District'],keep=False)]

Unnamed: 0,Neighborhood,District,Latitude,Longitude
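The `keep=False` argument is what makes this check useful: it flags every row of a duplicated (Neighborhood, District) pair rather than only the repeats, so all copies show up for inspection. A small sketch on made-up data contrasting it with the default `keep='first'`:

```python
import pandas as pd

# Made-up rows: 'A' appears twice in district 1 and once in district 2.
df = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'District': [1, 1, 2, 1],
})

# keep=False flags every member of a duplicate group ...
all_copies = df[df.duplicated(subset=['Neighborhood', 'District'], keep=False)]
# ... whereas the default keep='first' flags only the later repeats.
repeats_only = df[df.duplicated(subset=['Neighborhood', 'District'])]

print(all_copies.index.tolist())    # [0, 1]
print(repeats_only.index.tolist())  # [1]
```

Since the check above returns an empty frame, no (Neighborhood, District) pair occurs twice in the dataset.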


And make sure that the dataset has all 22 districts and around 376 neighborhoods.


In [7]:
print('The dataframe has {} districts and {} neighborhoods.'.format(
        len(neighborhoods['District'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 22 districts and 377 neighborhoods.
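The check above counts distinct districts with `len(...unique())`. As a sketch on a hypothetical miniature table, `nunique()` gives the same count, and `value_counts()` additionally shows how many neighborhoods each district contributes, which makes a missing or sparsely covered district easy to spot:

```python
import pandas as pd

# Hypothetical miniature neighborhoods table
sample = pd.DataFrame({
    'Neighborhood': ['a1', 'a2', 'b1', 'c1', 'c2', 'c3'],
    'District': [1, 1, 2, 3, 3, 3],
})

n_districts = sample['District'].nunique()          # same result as len(...unique())
per_district = sample['District'].value_counts().sort_index()

print(n_districts)              # 3
print(per_district.to_dict())   # {1: 2, 2: 1, 3: 3}
```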


#### Use geopy library to get the latitude and longitude values of Tehran.


In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>tehran_explorer</em>, as shown below.


In [8]:
address = 'Tehran, Iran'

geolocator = Nominatim(user_agent="tehran_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Tehran are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Tehran are 35.7006177, 51.4013785.


#### Create a map of Tehran with neighborhoods superimposed on top.


In [9]:
# install the development version of branca so that Folium popups display Persian (Farsi) characters correctly
!pip install git+https://github.com/python-visualization/branca.git

Collecting git+https://github.com/python-visualization/branca.git
  Cloning https://github.com/python-visualization/branca.git to /tmp/pip-req-build-0l2zqot_
  Running command git clone -q https://github.com/python-visualization/branca.git /tmp/pip-req-build-0l2zqot_
Building wheels for collected packages: branca
  Building wheel for branca (setup.py) ... done
  Created wheel for branca: filename=branca-0.4.1+4.gac45f1e-cp36-none-any.whl size=24517 sha256=6ccbd458742a018eb3a3aa27214f81bf0171a701633810830351b1ed3778d759
  Stored in directory: /tmp/pip-ephem-wheel-cache-1u2qqzbq/wheels/35/53/0e/e948a7acd723b43de05b811ef71128b8f561c5b8d15b621383
Successfully built branca


In [10]:
# create a map of Tehran centered on the geocoded latitude and longitude
map_tehran = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a circle marker for each neighborhood
for lat, lng, district, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['District'], neighborhoods['Neighborhood']):
    label = 'منطقه {}, {}'.format(district, neighborhood)  # 'منطقه' means 'District'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_tehran)

map_tehran

**Folium** is a great visualization library. Feel free to zoom into the map above and click on each circle marker to reveal the name of the neighborhood and its district.


Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.


#### Define Foursquare Credentials and Version


In [14]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20201106' # Foursquare API version date
LIMIT = 100 # maximum number of venues to request
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


#### Let's explore the first neighborhood in our dataframe.


Get the neighborhood's name.


In [53]:
neighborhoods.loc[0, 'Neighborhood']

'اراج'

Get the neighborhood's latitude and longitude values.


In [54]:
neighborhood_latitude = neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of اراج are 35.794309337615026, 51.48757607936932.


#### Now, let's get up to 500 venues in اراج within a radius of 1000 meters.


First, let's create the GET request URL. Name your URL **url**.


In [77]:
LIMIT = 500 # limit of number of venues returned by Foursquare API

radius = 1000 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url 

'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20201106&ll=35.794309337615026,51.48757607936932&radius=1000&limit=500'
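Instead of formatting the query string by hand, it can be assembled with the standard library's `urlencode`, which joins the parameters with `&` and percent-encodes reserved characters (the comma in `ll` becomes `%2C`). A minimal sketch with placeholder credentials; `requests.get(base_url, params=params)` performs the same encoding internally:

```python
from urllib.parse import urlencode

base_url = 'https://api.foursquare.com/v2/venues/explore'

# Placeholder credentials; substitute your own.
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'v': '20201106',
    'll': '{},{}'.format(35.794309, 51.487576),
    'radius': 1000,
    'limit': 500,
}

# Dict order is preserved, so the parameters appear in the order written above.
url = base_url + '?' + urlencode(params)
print(url)
```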

Send the GET request and examine the results.


In [81]:
results = requests.get(url).json()
results['meta'].get('errorDetail')  # present only when the request failed

'Quota exceeded'
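Because a failed request has no `groups` under `response`, it can help to check the `meta` block before parsing. A sketch with a hypothetical helper `extract_items`, tested on hand-written payloads that mimic the two cases seen in this notebook; treating `meta['code'] == 200` as success and using 429 as the quota error code in the fake payload are assumptions:

```python
def extract_items(payload):
    """Return the venue items from an explore response, or raise if the
    request failed (for example because the quota was exceeded)."""
    meta = payload.get('meta', {})
    if meta.get('code') != 200:
        raise RuntimeError('Foursquare error {}: {}'.format(
            meta.get('code'), meta.get('errorDetail', 'unknown error')))
    return payload['response']['groups'][0]['items']

# Hand-written payloads (assumed shapes, not real API responses)
ok = {'meta': {'code': 200},
      'response': {'groups': [{'items': [{'venue': {'name': 'Golchin Square'}}]}]}}
quota = {'meta': {'code': 429, 'errorDetail': 'Quota exceeded'}, 'response': {}}

print(len(extract_items(ok)))

try:
    extract_items(quota)
    quota_message = None
except RuntimeError as err:
    quota_message = str(err)
print(quota_message)
```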

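This request failed because the API quota was exceeded, so the cells below show the results of an earlier successful run (note their lower execution counts). From the Foursquare lab in the previous module, we know that all the information is in the _items_ key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.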


In [57]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the JSON and structure it into a _pandas_ dataframe.


In [58]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | مجموعه ورزشی سازمان صنایع دفاع | Soccer Field | 35.792839 | 51.485819 |
| 1 | Yaran Daryan Supermarket \| سوپرمارکت یاران دری... | Supermarket | 35.791378 | 51.489182 |
| 2 | Golchin Square \| میدان گلچین (میدان گلچین) | Plaza | 35.791313 | 51.489184 |
| 3 | زمين چمن صنايع دفاع | Soccer Field | 35.790771 | 51.485688 |
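To see what `json_normalize` and the category cleanup do in isolation, here is a sketch on two made-up `items` shaped like the explore response used above (the second venue deliberately has an empty category list). Nested keys become dotted column names, while lists such as `categories` are left intact, which is why a separate step is needed to pull out the first category name:

```python
import pandas as pd
from pandas import json_normalize

# Two made-up items; the second venue has no category.
items = [
    {'venue': {'name': 'Golchin Square', 'categories': [{'name': 'Plaza'}],
               'location': {'lat': 35.7913, 'lng': 51.4892}}},
    {'venue': {'name': 'Unnamed spot', 'categories': [],
               'location': {'lat': 35.7900, 'lng': 51.4850}}},
]

flat = json_normalize(items)  # columns: venue.name, venue.categories, venue.location.lat, ...
print(list(flat.columns))

# Same idea as get_category_type: first category name, or None if the list is empty
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if len(cats) > 0 else None)

# Keep only the last part of each dotted column name
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```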


And how many venues were returned by Foursquare?


In [59]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


<a id='item2'></a>


## 2. Explore Neighborhoods in Tehran


#### Let's create a function to repeat the same process for all the neighborhoods in Tehran


In [87]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each neighborhood center
    and return a dataframe with one row per venue found."""
    venues_list = []
    for i, (name, lat, lng) in enumerate(zip(names, latitudes, longitudes), start=1):
        print(i, ', ', name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # make the GET request; skip neighborhoods whose request failed (e.g. quota exceeded)
        req = requests.get(url).json()
        if 'groups' not in req.get('response', {}):
            print('name : ', name, ', ', i, 'skipped, ', 'Error Detail : ', req['meta'].get('errorDetail'))
            continue
        results = req['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue (venues without a category are dropped)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    # passing the column names to the constructor also works when no venues were collected
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood', 
                 'Neighborhood Latitude', 
                 'Neighborhood Longitude', 
                 'Venue', 
                 'Venue Latitude', 
                 'Venue Longitude', 
                 'Venue Category'])

    return nearby_venues
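Once the quota is exhausted, every remaining request fails the same way, as the run below shows. A sketch of a hypothetical variant, `collect_venue_items`, that stops at the first "Quota exceeded" error instead of issuing the remaining requests; the `fetch` argument is an assumed stand-in for `requests.get(url).json()`, so the logic can be exercised with a fake fetcher:

```python
def collect_venue_items(coords, fetch):
    """Call fetch(lat, lng) for each (name, lat, lng) and collect the items.

    Stops at the first 'Quota exceeded' error, since later requests would
    fail the same way; other failures are skipped individually.
    """
    collected = {}
    for name, lat, lng in coords:
        payload = fetch(lat, lng)
        response = payload.get('response', {})
        if 'groups' not in response:
            detail = payload.get('meta', {}).get('errorDetail')
            if detail == 'Quota exceeded':
                print('Quota exceeded at', name, '- stopping early')
                break
            print('skipped', name, ':', detail)
            continue
        collected[name] = response['groups'][0]['items']
    return collected

# Fake fetcher: the first call succeeds, later calls report an exhausted quota.
calls = []
def fake_fetch(lat, lng):
    calls.append((lat, lng))
    if len(calls) == 1:
        return {'meta': {'code': 200}, 'response': {'groups': [{'items': [{'venue': {}}]}]}}
    return {'meta': {'errorDetail': 'Quota exceeded'}, 'response': {}}

coords = [('اراج', 35.79, 51.49), ('ازگل', 35.79, 51.52), ('اوین', 35.80, 51.39)]
result = collect_venue_items(coords, fake_fetch)
print(list(result), len(calls))
```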

#### Now write the code to run the above function on each neighborhood and create a new dataframe called _tehran_venues_.


In [88]:
tehran_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

1 ,  اراج
name :  اراج ,  1 skipped,  Error Detail :  Quota exceeded
2 ,  ازگل
name :  ازگل ,  2 skipped,  Error Detail :  Quota exceeded
3 ,  امام زاده قاسم
name :  امام زاده قاسم ,  3 skipped,  Error Detail :  Quota exceeded
4 ,  اوین
name :  اوین ,  4 skipped,  Error Detail :  Quota exceeded
5 ,  باغ فردوس
name :  باغ فردوس ,  5 skipped,  Error Detail :  Quota exceeded
6 ,  تجریش
name :  تجریش ,  6 skipped,  Error Detail :  Quota exceeded
7 ,  جماران
name :  جماران ,  7 skipped,  Error Detail :  Quota exceeded
8 ,  جوزستان
name :  جوزستان ,  8 skipped,  Error Detail :  Quota exceeded
9 ,  چیذر
name :  چیذر ,  9 skipped,  Error Detail :  Quota exceeded
10 ,  حصار بوعلی
name :  حصار بوعلی ,  10 skipped,  Error Detail :  Quota exceeded
11 ,  حکمت-دزاشیب
name :  حکمت-دزاشیب ,  11 skipped,  Error Detail :  Quota exceeded
12 ,  دارآباد
name :  دارآباد ,  12 skipped,  Error Detail :  Quota exceeded
13 ,  دربند
name :  دربند ,  13 skipped,  Error Detail :  Quota exceeded
14 ,  درکه
name :  

KeyboardInterrupt: ignored

#### Let's check the size of the resulting dataframe


In [89]:
print(tehran_venues.shape)
tehran_venues.head()

(3689, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | اراج | 35.794309 | 51.487576 | مجموعه ورزشی سازمان صنایع دفاع | 35.792839 | 51.485819 | Soccer Field |
| 1 | اراج | 35.794309 | 51.487576 | Yaran Daryan Supermarket \| سوپرمارکت یاران دری... | 35.791378 | 51.489182 | Supermarket |
| 2 | اراج | 35.794309 | 51.487576 | Golchin Square \| میدان گلچین (میدان گلچین) | 35.791313 | 51.489184 | Plaza |
| 3 | اراج | 35.794309 | 51.487576 | زمين چمن صنايع دفاع | 35.790771 | 51.485688 | Soccer Field |
| 4 | ازگل | 35.788786 | 51.515819 | دورهمی \| Dorehami | 35.790456 | 51.514138 | Comedy Club |


Let's check how many venues were returned for each neighborhood


In [73]:
tehran_venues.groupby('Neighborhood').count()['Venue']

Neighborhood
 خانی آباد شمالی                1
 شریعتی جنوبی                   4
آبشار                           4
آذربایجان                       6
آذری                            5
آرارات                          4
آرزانتین-ساعی                  33
آزادشهر-پیکان شهر               1
آسمان                          22
آشتیانی                         6
آلستوم                         17
آهنگران                         4
آپادانا                         4
ابن باویه و ظهیر آباد           3
ابوذر                          27
ابوذر شرقی                      6
ابوذر غربی                      4
ابوذز                           4
اتابک                           1
اراج                            4
اراضی عباس آباد                16
ارامنه جنوبی                    4
ارامنه شمالی                    3
اردیبهشت                        1
ارم                             5
ارگ پامنار                     13
ازگل                            1
استادمعین                       4
استخر                           4
ا
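Two names at the top of this list (' خانی آباد شمالی' and ' شریعتی جنوبی') start with a stray space, which is why they sort before the rest. If the same neighborhood were spelled both with and without that space, `groupby` would treat the two spellings as different groups. A sketch on made-up venues showing how `str.strip()` merges them:

```python
import pandas as pd

# Made-up venue table: one name carries a leading space.
venues = pd.DataFrame({
    'Neighborhood': [' شریعتی جنوبی', 'شریعتی جنوبی', 'آبشار'],
    'Venue': ['v1', 'v2', 'v3'],
})

before = venues.groupby('Neighborhood').size()  # the padded name forms its own group
print(before)

# Strip surrounding whitespace, then count again
venues['Neighborhood'] = venues['Neighborhood'].str.strip()
counts = venues.groupby('Neighborhood').size()
print(counts)
```

Applying the same `str.strip()` to `neighborhoods['Neighborhood']` before querying the API would keep the names consistent throughout the notebook.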

#### Let's find out how many unique categories can be curated from all the returned venues


In [90]:
print('There are {} unique categories.'.format(tehran_venues['Venue Category'].nunique()))

There are 281 unique categories.


<a id='item3'></a>


## 3. Analyze Each Neighborhood


In [27]:
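# restrict the analysis to the neighborhoods of District 6
district6_names = neighborhoods.loc[neighborhoods['District'] == 6, 'Neighborhood']
district6_venues = tehran_venues[tehran_venues['Neighborhood'].isin(district6_names)].reset_index(drop=True)

# one hot encoding (dtype=int gives 0/1 columns; recent pandas versions default to booleans)
district6_onehot = pd.get_dummies(district6_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
district6_onehot['Neighborhood'] = district6_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [district6_onehot.columns[-1]] + list(district6_onehot.columns[:-1])
district6_onehot = district6_onehot[fixed_columns]

district6_onehot.head()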

Unnamed: 0,Neighborhood,Accessories Store,Amphitheater,Art Gallery,Art Museum,Arts & Crafts Store,Ash and Haleem Place,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bed & Breakfast,Bookstore,Breakfast Spot,Buffet,Burger Joint,Bus Line,Bus Station,Butcher,Café,Candy Store,Chinese Restaurant,Chocolate Shop,Climbing Gym,Coffee Shop,College Cafeteria,College Gym,Comfort Food Restaurant,Concert Hall,Cosmetics Shop,Cultural Center,Department Store,Dizi Place,Donut Shop,Drugstore,Dry Cleaner,Electronics Store,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,General Entertainment,Gift Shop,Gilaki Restaurant,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Historic Site,Hookah Bar,Hostel,Hot Dog Joint,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Jegaraki,Jewelry Store,Juice Bar,Kebab Restaurant,Korean Restaurant,Lawyer,Leather Goods Store,Lounge,Market,Mattress Store,Mediterranean Restaurant,Men's Store,Movie Theater,Music Store,Music Venue,Office,Optical Shop,Palace,Park,Pastry Shop,Performing Arts Venue,Perfume Shop,Persian Restaurant,Pet Café,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Restaurant,Russian Restaurant,Salon / Barbershop,Sandwich Place,Shopping Mall,Shopping Plaza,Snack Place,Soccer Field,Soup Place,Sporting Goods Shop,Sports Club,Stationery Store,Steakhouse,Supermarket,Tabbakhi,Taxi Stand,Tea Room,Tennis Stadium,Theater,Vegetarian / Vegan Restaurant,Video Store,Volleyball Court,Women's Store,Yoga Studio
0,ایرانشهر,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
1,ایرانشهر,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,ایرانشهر,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,ایرانشهر,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
4,ایرانشهر,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
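A toy version of the encoding step, on made-up venues: `prefix=''` and `prefix_sep=''` keep the raw category names as column headers, and `DataFrame.insert(0, ...)` places the neighborhood column first in one step, as an alternative to the column reordering in the cell above:

```python
import pandas as pd

# Made-up venues for two neighborhoods
venues = pd.DataFrame({
    'Neighborhood': ['X', 'X', 'Y'],
    'Venue Category': ['Café', 'Park', 'Café'],
})

# One 0/1 column per category, named after the category itself
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])  # neighborhood as first column
print(onehot)
```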


And let's examine the new dataframe size.


In [28]:
district6_onehot.shape

(565, 118)

#### Next, let's group rows by neighborhood and take the mean of the one-hot columns, which gives the frequency of occurrence of each category


In [29]:
district6_grouped = district6_onehot.groupby('Neighborhood').mean().reset_index()
district6_grouped

Unnamed: 0,Neighborhood,Accessories Store,Amphitheater,Art Gallery,Art Museum,Arts & Crafts Store,Ash and Haleem Place,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bed & Breakfast,Bookstore,Breakfast Spot,Buffet,Burger Joint,Bus Line,Bus Station,Butcher,Café,Candy Store,Chinese Restaurant,Chocolate Shop,Climbing Gym,Coffee Shop,College Cafeteria,College Gym,Comfort Food Restaurant,Concert Hall,Cosmetics Shop,Cultural Center,Department Store,Dizi Place,Donut Shop,Drugstore,Dry Cleaner,Electronics Store,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,General Entertainment,Gift Shop,Gilaki Restaurant,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Historic Site,Hookah Bar,Hostel,Hot Dog Joint,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Jegaraki,Jewelry Store,Juice Bar,Kebab Restaurant,Korean Restaurant,Lawyer,Leather Goods Store,Lounge,Market,Mattress Store,Mediterranean Restaurant,Men's Store,Movie Theater,Music Store,Music Venue,Office,Optical Shop,Palace,Park,Pastry Shop,Performing Arts Venue,Perfume Shop,Persian Restaurant,Pet Café,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Restaurant,Russian Restaurant,Salon / Barbershop,Sandwich Place,Shopping Mall,Shopping Plaza,Snack Place,Soccer Field,Soup Place,Sporting Goods Shop,Sports Club,Stationery Store,Steakhouse,Supermarket,Tabbakhi,Taxi Stand,Tea Room,Tennis Stadium,Theater,Vegetarian / Vegan Restaurant,Video Store,Volleyball Court,Women's Store,Yoga Studio
0,آرزانتین-ساعی,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.030303,0.060606,0.0,0.181818,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.060606,0.0,0.090909,0.060606,0.0,0.0,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,امیرآباد,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
2,ایرانشهر,0.0,0.0,0.137255,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.0,0.196078,0.0,0.0,0.0,0.019608,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.019608,0.0,0.0,0.019608,0.019608,0.0,0.078431,0.0,0.0,0.019608,0.019608,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.019608,0.0,0.019608,0.0,0.019608,0.0,0.078431,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.078431,0.0,0.0,0.0,0.0,0.0
3,بهجت آباد,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.0,0.038462,0.0,0.0,0.019231,0.0,0.0,0.0,0.288462,0.0,0.019231,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.019231,0.019231,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.019231,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.019231,0.0,0.0,0.0,0.019231,0.019231,0.019231,0.0,0.057692,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0.115385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,دانشگاه تهران,0.0,0.0,0.042553,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.12766,0.0,0.0,0.021277,0.0,0.0,0.0,0.425532,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.042553,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0
5,سنایی,0.0,0.0,0.079365,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.0,0.269841,0.015873,0.0,0.015873,0.0,0.031746,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.031746,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.015873,0.0,0.0,0.015873,0.0,0.0,0.015873,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.047619,0.0,0.0,0.015873,0.047619,0.031746,0.0,0.015873,0.0,0.0,0.079365,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,شریعتی,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,شیراز,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.04,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.02,0.02,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.06,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.02,0.0,0.0,0.02,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
8,فاطمی,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,قزل قلعه,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
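Averaging one-hot rows yields, for each neighborhood and category, the number of venues in that category divided by the number of venues in the neighborhood, so every row sums to 1. This also explains the coarse profiles of neighborhoods with few venues, such as امیرآباد (four venues, hence values of 0.25). A sketch on made-up data:

```python
import pandas as pd

# Made-up one-hot table: X has 3 cafés and 1 park, Y has 1 park
onehot = pd.DataFrame({
    'Neighborhood': ['X', 'X', 'X', 'X', 'Y'],
    'Café':         [1, 1, 1, 0, 0],
    'Park':         [0, 0, 0, 1, 1],
})

# Mean of 0/1 indicators = share of each category among the neighborhood's venues
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```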


#### Let's confirm the new size


In [30]:
district6_grouped.shape

(18, 118)

#### Let's print each neighborhood along with the top 5 most common venues


In [31]:
num_top_venues = 5

for hood in district6_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = district6_grouped[district6_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----آرزانتین-ساعی----
                venue  freq
0                Café  0.18
1        Perfume Shop  0.09
2         Bus Station  0.06
3  Persian Restaurant  0.06
4      Cosmetics Shop  0.06


----امیرآباد----
                      venue  freq
0                  Pharmacy  0.25
1            Ice Cream Shop  0.25
2            Tennis Stadium  0.25
3                    Bakery  0.25
4  Mediterranean Restaurant  0.00


----ایرانشهر----
                venue  freq
0                Café  0.20
1         Art Gallery  0.14
2  Persian Restaurant  0.08
3               Hotel  0.08
4             Theater  0.08


----بهجت آباد----
                venue  freq
0                Café  0.29
1      Sandwich Place  0.12
2  Persian Restaurant  0.06
3      Ice Cream Shop  0.04
4         Coffee Shop  0.04


----دانشگاه تهران----
         venue  freq
0         Café  0.43
1    Bookstore  0.13
2  Art Gallery  0.04
3        Plaza  0.04
4      Theater  0.04


----سنایی----
            venue  freq
0            Café  0.2
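The loop above transposes a one-row slice for each neighborhood and resets its index before sorting. A sketch of a more direct alternative on illustrative values: iterate over the rows and call `Series.nlargest`, which returns the top categories and, with its default `keep='first'`, breaks ties by column order:

```python
import pandas as pd

# One row of frequencies per neighborhood (illustrative values)
grouped = pd.DataFrame({
    'Neighborhood': ['X', 'Y'],
    'Café': [0.5, 0.0],
    'Park': [0.25, 1.0],
    'Bakery': [0.25, 0.0],
}).set_index('Neighborhood')

num_top_venues = 2
tops = {}
for hood, freqs in grouped.iterrows():
    top = freqs.nlargest(num_top_venues).round(2)
    tops[hood] = top.index.tolist()
    print('----' + hood + '----')
    print(top.to_string())
```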

#### Let's put that into a _pandas_ dataframe


First, let's write a function to sort the venues in descending order.


In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.


In [33]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = district6_grouped['Neighborhood']

for ind in np.arange(district6_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(district6_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,آرزانتین-ساعی,Café,Perfume Shop,Cosmetics Shop,Persian Restaurant,Pastry Shop,Bus Station,Bakery,Park,Candy Store,Restaurant
1,امیرآباد,Ice Cream Shop,Pharmacy,Tennis Stadium,Bakery,Yoga Studio,Dry Cleaner,Food & Drink Shop,Flower Shop,Fast Food Restaurant,Electronics Store
2,ایرانشهر,Café,Art Gallery,Persian Restaurant,Theater,Hotel,Sandwich Place,Movie Theater,Coffee Shop,Gym Pool,Intersection
3,بهجت آباد,Café,Sandwich Place,Persian Restaurant,Coffee Shop,Ice Cream Shop,Jewelry Store,Bookstore,Burger Joint,Music Store,Music Venue
4,دانشگاه تهران,Café,Bookstore,Art Gallery,Theater,Plaza,Hookah Bar,Palace,College Cafeteria,Coffee Shop,Persian Restaurant
5,سنایی,Café,Sandwich Place,Art Gallery,Bakery,Pizza Place,Persian Restaurant,Gift Shop,Coffee Shop,Furniture / Home Store,Burger Joint
6,شریعتی,Park,Restaurant,Buffet,Yoga Studio,Concert Hall,Cultural Center,Department Store,Dizi Place,Donut Shop,Drugstore
7,شیراز,Café,Park,Ice Cream Shop,Pizza Place,Burger Joint,Asian Restaurant,Pharmacy,Fast Food Restaurant,Bookstore,Electronics Store
8,فاطمی,Café,Persian Restaurant,Bookstore,Hookah Bar,Gym,Department Store,Comfort Food Restaurant,Plaza,Salon / Barbershop,Hotel
9,قزل قلعه,Park,Café,Concert Hall,Bakery,Department Store,Gym / Fitness Center,Pizza Place,Auto Garage,Drugstore,Fast Food Restaurant
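For neighborhoods with only a handful of venues, the later columns of this table are filled with zero-frequency categories: امیرآباد has four categories, so its 5th to 10th "most common venues" (Yoga Studio, Dry Cleaner, ...) do not occur there at all, and their order merely reflects how the unstable default sort arranges ties. A sketch of a hypothetical variant, `most_common_nonzero`, that leaves those slots empty and uses a stable sort so ties keep column order:

```python
import pandas as pd

def most_common_nonzero(row, num_top_venues):
    """Like return_most_common_venues, but returns None for slots beyond
    the categories that actually occur (frequency > 0)."""
    freqs = row.iloc[1:].astype(float)
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind='stable')
    top = list(freqs.index[:num_top_venues])
    return top + [None] * (num_top_venues - len(top))

# A neighborhood where only two of four categories occur (made-up values)
row = pd.Series({'Neighborhood': 'X', 'Bakery': 0.5, 'Café': 0.0, 'Park': 0.5, 'Zoo': 0.0})
result = most_common_nonzero(row, 4)
print(result)
```

Swapping it in for `return_most_common_venues` in the loop above would leave blank cells instead of misleading category names.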


<a id='item4'></a>


## 4. Cluster Neighborhoods


Run _k_-means to cluster the neighborhoods into 5 clusters.


In [34]:
# set number of clusters
kclusters = 5

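district6_grouped_clustering = district6_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 pins the number of restarts, whose default differs across scikit-learn versions)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(district6_grouped_clustering)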

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 0, 0, 0, 0, 3, 2, 0, 2], dtype=int32)
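What `KMeans` does with the category-frequency vectors can be illustrated with a plain NumPy sketch of Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. This simplified version uses random initial centroids and a single run, whereas scikit-learn's `KMeans` uses k-means++ initialization and several restarts (`n_init`):

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on the rows of X (a simplified sketch)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k distinct rows as initial centroids
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two obvious groups of 2-D "frequency profiles"
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centroids = lloyd_kmeans(X, k=2)
print(labels)
```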

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.


In [35]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

district6_merged = neighborhoods[neighborhoods['District'] == 6].reset_index(drop=True)

# join the cluster labels and top 10 venues onto the District 6 neighborhoods, which carry latitude/longitude
district6_merged = district6_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

district6_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ایرانشهر,6,35.708586,51.421945,0,Café,Art Gallery,Persian Restaurant,Theater,Hotel,Sandwich Place,Movie Theater,Coffee Shop,Gym Pool,Intersection
1,آرزانتین-ساعی,6,35.734849,51.415833,2,Café,Perfume Shop,Cosmetics Shop,Persian Restaurant,Pastry Shop,Bus Station,Bakery,Park,Candy Store,Restaurant
2,بهجت آباد,6,35.71904,51.412128,0,Café,Sandwich Place,Persian Restaurant,Coffee Shop,Ice Cream Shop,Jewelry Store,Bookstore,Burger Joint,Music Store,Music Venue
3,پارک لاله,6,35.713637,51.399122,0,Café,Restaurant,Coffee Shop,Movie Theater,Pool,Bed & Breakfast,Mediterranean Restaurant,Athletics & Sports,Auto Garage,Gym / Fitness Center
4,دانشگاه تهران,6,35.705399,51.399013,0,Café,Bookstore,Art Gallery,Theater,Plaza,Hookah Bar,Palace,College Cafeteria,Coffee Shop,Persian Restaurant
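The cell above attaches labels with `DataFrame.insert`, which assumes `neighborhoods_venues_sorted` has the same row order as `district6_grouped` and raises a `ValueError` if the cell is re-run (the column already exists). The `join` also leaves `NaN` labels for any neighborhood in `district6` that returned no venues. A minimal sketch that keys labels by neighborhood name instead; `attach_labels` and the toy frames are hypothetical.

```python
import pandas as pd


def attach_labels(venues_sorted, neighborhoods, labels, column='Cluster Labels'):
    """Return a copy of venues_sorted with a label column looked up by neighborhood name.

    Safe to re-run (the column is overwritten, not inserted) and independent
    of whether the two frames share the same row order.
    """
    label_by_name = pd.Series(list(labels), index=list(neighborhoods))
    out = venues_sorted.copy()
    out[column] = out['Neighborhood'].map(label_by_name)
    return out


# Hypothetical toy data: labels follow grouped's row order (A, B, C),
# while venues_sorted lists the same neighborhoods in a different order.
grouped = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
labels = [0, 1, 0]
venues_sorted = pd.DataFrame({'Neighborhood': ['C', 'A', 'B'],
                              '1st Most Common Venue': ['Park', 'Café', 'Bakery']})
district = pd.DataFrame({'Neighborhood': ['A', 'B', 'C', 'D'],  # D has no venues
                         'Latitude': [35.70, 35.71, 35.72, 35.73]})

sorted_with_labels = attach_labels(venues_sorted, grouped['Neighborhood'], labels)
merged = district.join(sorted_with_labels.set_index('Neighborhood'), on='Neighborhood')
print(merged)  # D gets NaN in 'Cluster Labels' because it has no venue row
```

In the notebook this would read `attach_labels(neighborhoods_venues_sorted, district6_grouped['Neighborhood'], kmeans.labels_)`.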


Next, let's visualize the resulting clusters on a map.
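Marker colors stay consistent when they are looked up by label value rather than by list position: label 0 then gets the first color, and a DBSCAN noise label of -1 gets a color of its own. A minimal sketch, assuming the same `matplotlib.cm` and `matplotlib.colors` modules the notebook imports; `cluster_palette` is a hypothetical helper.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_palette(cluster_labels):
    """Map each distinct cluster label to a '#rrggbb' color from the rainbow colormap."""
    unique_labels = sorted(set(cluster_labels))
    # evenly spaced positions along the colormap, one per distinct label
    rgba = cm.rainbow(np.linspace(0, 1, len(unique_labels)))
    return {label: colors.rgb2hex(c) for label, c in zip(unique_labels, rgba)}


# first ten k-means labels printed above
palette = cluster_palette([2, 1, 0, 0, 0, 0, 3, 2, 0, 2])
print(palette)
```

Inside the marker loop, `color=palette[cluster]` and `fill_color=palette[cluster]` would take the place of the list lookup.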


In [36]:
import folium  # map rendering library

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters: one rainbow color per k-means cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; labels run 0..kclusters-1, so they index rainbow directly
for lat, lon, poi, cluster in zip(district6_merged['Latitude'], district6_merged['Longitude'], district6_merged['Neighborhood'], district6_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=7,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

<a id='item5'></a>


## 5. Examine Clusters


Now you can examine each cluster and identify the venue categories that distinguish it from the others. Based on those defining categories, you can then assign a name to each cluster. I will leave this exercise to you.
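The per-cluster cells below pick columns by position, and position 1 in `district6_merged` is `District` (every row shows 6), so the neighborhood names do not appear in their output. A minimal sketch that selects columns by name instead, plus a crosstab of each cluster's 1st most common venue. The helper names are hypothetical, and the demo frame copies a few rows and columns from the notebook's merged output.

```python
import pandas as pd


def cluster_members(merged, label, label_col='Cluster Labels', top_n=3):
    """Return the neighborhoods in one cluster with their first top_n venue columns."""
    # '1st Most Common Venue', '2nd Most Common Venue', ... in frame order
    venue_cols = [c for c in merged.columns if c.endswith('Most Common Venue')][:top_n]
    rows = merged[merged[label_col] == label]
    return rows[['Neighborhood'] + venue_cols]


def first_venue_counts(merged, label_col='Cluster Labels'):
    """Count how often each category is the 1st most common venue within each cluster."""
    return pd.crosstab(merged['1st Most Common Venue'], merged[label_col])


merged_demo = pd.DataFrame({
    'Neighborhood': ['ایرانشهر', 'آرزانتین-ساعی', 'بهجت آباد', 'شریعتی'],
    'District': [6, 6, 6, 6],
    'Cluster Labels': [0, 2, 0, 3],
    '1st Most Common Venue': ['Café', 'Café', 'Café', 'Park'],
    '2nd Most Common Venue': ['Art Gallery', 'Perfume Shop', 'Sandwich Place', 'Restaurant'],
    '3rd Most Common Venue': ['Persian Restaurant', 'Cosmetics Shop', 'Persian Restaurant', 'Buffet'],
})

print(cluster_members(merged_demo, 0))
print(first_venue_counts(merged_demo))
```

On the real frame, `cluster_members(district6_merged, k, top_n=10)` for `k` in `range(kclusters)` would reproduce the tables below with names in place of the district number.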


#### Cluster 1


In [37]:
district6_merged.loc[district6_merged['Cluster Labels'] == 0, district6_merged.columns[[1] + list(range(5, district6_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,6,Café,Art Gallery,Persian Restaurant,Theater,Hotel,Sandwich Place,Movie Theater,Coffee Shop,Gym Pool,Intersection
2,6,Café,Sandwich Place,Persian Restaurant,Coffee Shop,Ice Cream Shop,Jewelry Store,Bookstore,Burger Joint,Music Store,Music Venue
3,6,Café,Restaurant,Coffee Shop,Movie Theater,Pool,Bed & Breakfast,Mediterranean Restaurant,Athletics & Sports,Auto Garage,Gym / Fitness Center
4,6,Café,Bookstore,Art Gallery,Theater,Plaza,Hookah Bar,Palace,College Cafeteria,Coffee Shop,Persian Restaurant
6,6,Café,Sandwich Place,Art Gallery,Bakery,Pizza Place,Persian Restaurant,Gift Shop,Coffee Shop,Furniture / Home Store,Burger Joint
8,6,Café,Persian Restaurant,Bookstore,Hookah Bar,Gym,Department Store,Comfort Food Restaurant,Plaza,Salon / Barbershop,Hotel
11,6,Café,Hotel,Steakhouse,Food Court,Video Store,Park,Pastry Shop,Pizza Place,Shopping Mall,Tabbakhi
12,6,Café,Persian Restaurant,Bookstore,Gym / Fitness Center,Hotel,Theater,Pastry Shop,Tabbakhi,Men's Store,Pizza Place
13,6,Café,Hookah Bar,Theater,Coffee Shop,Pastry Shop,Hotel,Italian Restaurant,French Restaurant,Park,Art Gallery


#### Cluster 2


In [38]:
district6_merged.loc[district6_merged['Cluster Labels'] == 1, district6_merged.columns[[1] + list(range(5, district6_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,6,Ice Cream Shop,Pharmacy,Tennis Stadium,Bakery,Yoga Studio,Dry Cleaner,Food & Drink Shop,Flower Shop,Fast Food Restaurant,Electronics Store


#### Cluster 3


In [39]:
district6_merged.loc[district6_merged['Cluster Labels'] == 2, district6_merged.columns[[1] + list(range(5, district6_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,6,Café,Perfume Shop,Cosmetics Shop,Persian Restaurant,Pastry Shop,Bus Station,Bakery,Park,Candy Store,Restaurant
7,6,Café,Park,Ice Cream Shop,Pizza Place,Burger Joint,Asian Restaurant,Pharmacy,Fast Food Restaurant,Bookstore,Electronics Store
9,6,Park,Café,Concert Hall,Bakery,Department Store,Gym / Fitness Center,Pizza Place,Auto Garage,Drugstore,Fast Food Restaurant
14,6,Pet Store,Café,Persian Restaurant,Bakery,Drugstore,Cultural Center,Park,Coffee Shop,Pizza Place,Sandwich Place
15,6,Bakery,Juice Bar,Pastry Shop,Ice Cream Shop,Burger Joint,Bookstore,Plaza,Fried Chicken Joint,Italian Restaurant,Park
16,6,Soccer Field,Fast Food Restaurant,Bakery,Park,Pool,Plaza,Drugstore,Supermarket,Italian Restaurant,Gym


#### Cluster 4


In [40]:
district6_merged.loc[district6_merged['Cluster Labels'] == 3, district6_merged.columns[[1] + list(range(5, district6_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,6,Park,Restaurant,Buffet,Yoga Studio,Concert Hall,Cultural Center,Department Store,Dizi Place,Donut Shop,Drugstore


#### Cluster 5


In [41]:
district6_merged.loc[district6_merged['Cluster Labels'] == 4, district6_merged.columns[[1] + list(range(5, district6_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,6,Snack Place,Pizza Place,Gym / Fitness Center,Drugstore,Café,Kebab Restaurant,Yoga Studio,Cultural Center,Department Store,Dizi Place


## Clustering with DBSCAN

In [42]:
import numpy as np 
from sklearn.cluster import DBSCAN 
from sklearn.preprocessing import StandardScaler 

In [43]:
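# DBSCAN parameters
epsilon = 0.3
minimumSamples = 1  # min_samples counts the point itself, so 1 makes every point a core point (no -1 noise labels)

# standardized copy of the venue frequencies (an alternative input; not used by the fit below)
district6_grouped_clustering_dbscan = StandardScaler().fit_transform(district6_grouped_clustering)

# fit on the unscaled venue frequencies, the same features used for k-means
db = DBSCAN(eps=epsilon, min_samples=minimumSamples).fit(district6_grouped_clustering)
labels = db.labels_
labels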

array([0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0])
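In DBSCAN a point is a core point when at least `min_samples` points, itself included, lie within `eps` of it. With `minimumSamples = 1` every neighborhood is trivially a core point, so no row can be labeled -1 and `eps` alone decides which neighborhoods merge. A common heuristic for choosing `eps` is the k-distance curve: sort each point's distance to its `min_samples`-th nearest neighbor and look for an elbow. This sketch assumes scikit-learn's `NearestNeighbors`; `k_distances` and the four-point toy data are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors


def k_distances(X, min_samples):
    """Sorted distance from each point to its min_samples-th nearest neighbor, itself included.

    Querying the training points themselves returns each point as its own
    first neighbor (distance 0), matching DBSCAN's convention that
    min_samples counts the point itself.
    """
    nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
    distances, _ = nn.kneighbors(X)
    return np.sort(distances[:, -1])


# three nearby points and one far-away point (toy data)
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])

print(k_distances(X_demo, min_samples=1))  # all zeros: every point is its own neighbor
print(k_distances(X_demo, min_samples=3))  # small values, then a jump for the outlier

# an eps between the small values and the jump separates the outlier as noise
demo_labels = DBSCAN(eps=0.5, min_samples=3).fit(X_demo).labels_
print(demo_labels)
```

For the notebook, `k_distances(district6_grouped_clustering, min_samples=m)` (plotted with matplotlib) for a few candidate `m` values would show where `eps = 0.3` sits on the curve.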

In [44]:
# add clustering labels
neighborhoods_venues_sorted['DBSCAN Cluster Labels'] = labels

district6_merged = district6

# merge neighborhoods_venues_sorted with district6 to add latitude/longitude for each neighborhood
district6_merged = district6_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

district6_merged # check the last columns!

Unnamed: 0,Neighborhood,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,DBSCAN Cluster Labels
0,ایرانشهر,6,35.708586,51.421945,0,Café,Art Gallery,Persian Restaurant,Theater,Hotel,Sandwich Place,Movie Theater,Coffee Shop,Gym Pool,Intersection,0
1,آرزانتین-ساعی,6,35.734849,51.415833,2,Café,Perfume Shop,Cosmetics Shop,Persian Restaurant,Pastry Shop,Bus Station,Bakery,Park,Candy Store,Restaurant,0
2,بهجت آباد,6,35.71904,51.412128,0,Café,Sandwich Place,Persian Restaurant,Coffee Shop,Ice Cream Shop,Jewelry Store,Bookstore,Burger Joint,Music Store,Music Venue,0
3,پارک لاله,6,35.713637,51.399122,0,Café,Restaurant,Coffee Shop,Movie Theater,Pool,Bed & Breakfast,Mediterranean Restaurant,Athletics & Sports,Auto Garage,Gym / Fitness Center,0
4,دانشگاه تهران,6,35.705399,51.399013,0,Café,Bookstore,Art Gallery,Theater,Plaza,Hookah Bar,Palace,College Cafeteria,Coffee Shop,Persian Restaurant,0
5,شریعتی,6,35.71766,51.383313,3,Park,Restaurant,Buffet,Yoga Studio,Concert Hall,Cultural Center,Department Store,Dizi Place,Donut Shop,Drugstore,2
6,سنایی,6,35.721523,51.420858,0,Café,Sandwich Place,Art Gallery,Bakery,Pizza Place,Persian Restaurant,Gift Shop,Coffee Shop,Furniture / Home Store,Burger Joint,0
7,شیراز,6,35.747199,51.403301,2,Café,Park,Ice Cream Shop,Pizza Place,Burger Joint,Asian Restaurant,Pharmacy,Fast Food Restaurant,Bookstore,Electronics Store,0
8,فاطمی,6,35.719942,51.394577,0,Café,Persian Restaurant,Bookstore,Hookah Bar,Gym,Department Store,Comfort Food Restaurant,Plaza,Salon / Barbershop,Hotel,0
9,قزل قلعه,6,35.728669,51.394446,2,Park,Café,Concert Hall,Bakery,Department Store,Gym / Fitness Center,Pizza Place,Auto Garage,Drugstore,Fast Food Restaurant,0
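To see how the DBSCAN partition relates to the k-means one, a contingency table plus the adjusted Rand index (1.0 for identical partitions, around 0 for chance-level agreement) gives a compact summary. This sketch uses only the first ten labels printed by In [34] and In [43], which both follow `district6_grouped`'s row order; `compare_clusterings` is a hypothetical helper.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score


def compare_clusterings(labels_a, labels_b, name_a='k-means', name_b='DBSCAN'):
    """Return a contingency table of two labelings and their adjusted Rand index."""
    table = pd.crosstab(pd.Series(labels_a, name=name_a), pd.Series(labels_b, name=name_b))
    return table, adjusted_rand_score(labels_a, labels_b)


kmeans_first10 = [2, 1, 0, 0, 0, 0, 3, 2, 0, 2]  # printed by In [34]
dbscan_first10 = [0, 1, 0, 0, 0, 0, 2, 0, 0, 0]  # first ten entries printed by In [43]

table, ari = compare_clusterings(kmeans_first10, dbscan_first10)
print(table)
print("adjusted Rand index:", ari)
```

On the full data the call would be `compare_clusterings(kmeans.labels_, db.labels_)`.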


In [45]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme: one rainbow color per distinct DBSCAN label
# (looked up by label value, so a noise label of -1 would also get its own color)
db_labels = sorted(district6_merged['DBSCAN Cluster Labels'].unique())
colors_array = cm.rainbow(np.linspace(0, 1, len(db_labels)))
label_color = {lab: colors.rgb2hex(c) for lab, c in zip(db_labels, colors_array)}

# overlay the Tehran neighborhood boundaries
geodata = 'https://raw.githubusercontent.com/mjalalimanesh/tehran-districts-neighborhoods-map-geojson/main/tehran_neighborhoods_376.geojson'

folium.GeoJson(
    geodata,
    name='geojson'
).add_to(map_clusters)


# add markers to the map
for lat, lon, poi, cluster in zip(district6_merged['Latitude'], district6_merged['Longitude'], district6_merged['Neighborhood'], district6_merged['DBSCAN Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=label_color[cluster],
        fill=True,
        fill_color=label_color[cluster],
        fill_opacity=0.7).add_to(map_clusters)


#folium.LayerControl().add_to(map_clusters)

map_clusters

### Thank you for completing this lab!

**Modified By Mohamad Jalalimanesh with Tehran Data**

This notebook was created by [Alex Aklson](https://www.linkedin.com/in/aklson?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ) and [Polong Lin](https://www.linkedin.com/in/polonglin?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ). I hope you found this lab interesting and educational. Feel free to contact us if you have any questions!


In [51]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

geodata = 'https://raw.githubusercontent.com/mjalalimanesh/tehran-districts-neighborhoods-map-geojson/main/tehran_neighborhoods_376.geojson'

# overlay the Tehran neighborhood boundaries
folium.GeoJson(
    geodata,
    name='geojson'
).add_to(map_clusters)


# add one small red marker per venue
for lat, lon, venue_name, neighb in zip(district6_venues['Venue Latitude'], district6_venues['Venue Longitude'], district6_venues['Venue'], district6_venues['Neighborhood']):
    label = folium.Popup(str(venue_name) + ',  ' + str(neighb), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=2,
        popup=label,
        color='red',
        fill=True,
        fill_opacity=0.7).add_to(map_clusters)

# add one green marker per neighborhood center
# (district6_venues repeats each center on every venue row, so keep one row per neighborhood)
neighborhood_centers = district6_venues.drop_duplicates(subset='Neighborhood')
for lat, lon, neighb in zip(neighborhood_centers['Neighborhood Latitude'], neighborhood_centers['Neighborhood Longitude'], neighborhood_centers['Neighborhood']):
    label = folium.Popup(str(neighb), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=4,
        popup=label,
        color='green',
        fill=True,
        fill_opacity=0.7).add_to(map_clusters)

#folium.LayerControl().add_to(map_clusters)

map_clusters

This notebook is part of a course on **Coursera** called _Applied Data Science Capstone_. If you accessed this notebook outside the course, you can take this course online by clicking [here](http://cocl.us/DP0701EN_Coursera_Week3_LAB2).


<hr>

Copyright © 2018 [Cognitive Class](https://cognitiveclass.ai?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ).
