## Capstone Project: The Battle Of Neighbourhoods

## *Segmenting and Clustering selected Neighbourhoods of Hyderabad*
 
 

#### Methodology
1. Importing Libraries
2. Data Acquisition
3. Explore the selected Neighborhoods in Hyderabad
4. Define Foursquare Credentials and Version
5. Cluster Neighborhoods
6. Examine Clusters

### 1. Importing Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas.io.json.json_normalize is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print("Libraries downloaded. ")

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries downloaded. 


### 2. Data Acquisition and Cleaning

### Download and Explore Dataset

Load the data

In [2]:
hyd_data = pd.read_csv(r"C:\Users\johny\Desktop\capstone_hyd_gps_data.csv") # raw string avoids invalid escape sequences in the Windows path
hyd_data

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Hyderabad,Osmania University,17.40558,78.51615
1,Hyderabad,EFLU,17.42365,78.52601
2,Hyderabad,NIN,17.42776,78.52791
3,Hyderabad,IICT,17.42196,78.53956
4,Hyderabad,CCMB,17.42102,78.54104
5,Hyderabad,IIITH,17.4448,78.34976
6,Hyderabad,UoH,17.45674,78.32638
7,Hyderabad,ISB,17.43536,78.34075


In [4]:
#Removing last two rows

#hyd_data.dropna(subset = ['Borough'], inplace = True)
#hyd_data

### 3. Explore and Cluster the selected Neighborhoods in Hyderabad

#### Use geopy library to get the latitude and longitude values of Hyderabad City


In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent hyd_explorer, as shown below.

In [5]:
address = 'Hyderabad, India'

geolocator = Nominatim(user_agent="hyd_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hyderabad City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hyderabad City are 17.3616079, 78.4746286.


#### Create a map of Hyderabad with neighborhoods superimposed on top

In [15]:
# create map of Hyderabad using latitude and longitude values
map_hyderabad = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(hyd_data['Latitude'], hyd_data['Longitude'], hyd_data['Borough'], hyd_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='darkblue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hyderabad)  
    
map_hyderabad

### 4. Define Foursquare Credentials and Version

In [17]:
CLIENT_ID = 'ODLODYLV4RTQ3RIDVAU00NPTWHXXKJSO0NOVKKBRG3GDH4JG' # your Foursquare ID
CLIENT_SECRET = 'T1P4RQTIGL2V2QMWEJMSOD12O0LMYRDK32O2ABAM1FSHH3J2' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ODLODYLV4RTQ3RIDVAU00NPTWHXXKJSO0NOVKKBRG3GDH4JG
CLIENT_SECRET: T1P4RQTIGL2V2QMWEJMSOD12O0LMYRDK32O2ABAM1FSHH3J2
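
Hard-coding and printing the secret makes it easy to leak when the notebook is shared. Here is a minimal sketch of one alternative, assuming the credentials are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical and not part of the original notebook.

```python
import os


def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; use whatever names your setup defines.
    """
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def mask(secret, visible=4):
    """Show only the first few characters so the value is safe to print."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

With this in place, the cell could call `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and print `mask(CLIENT_SECRET)` instead of the full value.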


#### Let's explore the first neighborhood in our dataframe

Get the neighborhood's name

In [18]:
hyd_data.loc[0, 'Neighborhood']

'Osmania University'

In [19]:
neighborhood_name = hyd_data.loc[0, 'Neighborhood'] # neighborhood name

neighborhood_latitude = hyd_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = hyd_data.loc[0, 'Longitude'] # neighborhood longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Osmania University are 17.40558, 78.51615.


#### Now, let's get the top 100 venues that are in Osmania University within a radius of 1000 meters.

First, let's create the GET request URL. Name your URL 'url'.

In [20]:
LIMIT = 100
radius = 1000

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=ODLODYLV4RTQ3RIDVAU00NPTWHXXKJSO0NOVKKBRG3GDH4JG&client_secret=T1P4RQTIGL2V2QMWEJMSOD12O0LMYRDK32O2ABAM1FSHH3J2&v=20180605&ll=17.40558,78.51615&radius=1000&limit=100'
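
The same request URL can also be assembled from a parameter dictionary, which avoids the stray `&` after `?` and URL-encodes each value. This is a minimal sketch using only the standard library. The helper name `build_explore_url` and the placeholder credentials `MY_ID` and `MY_SECRET` are assumptions for illustration.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from a dict so each value is URL-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter unescaped
    return f"{EXPLORE_ENDPOINT}?{urlencode(params, safe=',')}"


# Placeholder credentials; the coordinates are those of Osmania University
example_url = build_explore_url("MY_ID", "MY_SECRET", "20180605",
                                17.40558, 78.51615, 1000, 100)
print(example_url)
```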

Send the GET request and examine the results

In [21]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f153a2c58c198104530d432'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Erstwhile Circle-III',
  'headerFullLocation': 'Erstwhile Circle-III, Hyderabad',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 13,
  'suggestedBounds': {'ne': {'lat': 17.41458000900001,
    'lng': 78.52556427322143},
   'sw': {'lat': 17.396579990999992, 'lng': 78.50673572677856}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '51f3c3e0498e37a1baab8fb8',
       'name': 'Subway',
       'location': {'lat': 17.40417263478385,
        'lng': 78.51494973189583,
        'labeledLatLngs': [{'label': 'display',
          'lat': 17.40417263478385,
          ...

All the information is in the 'items' key. Before we proceed, let's borrow the 'get_category_type' function from the Foursquare lab.

In [22]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
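
The same empty-list guard can be applied directly to the raw `items` list before it reaches pandas. The following is a minimal sketch, assuming each item follows the `venue` layout used in this notebook (`name`, `categories`, `location.lat`, `location.lng`). The helper name `extract_venue_rows` is hypothetical, and the second sample venue is invented to exercise the empty-category case.

```python
def extract_venue_rows(items):
    """Turn Foursquare 'items' into (name, category, lat, lng) tuples.

    A venue with an empty or missing 'categories' list gets category None
    instead of raising an IndexError.
    """
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        location = venue["location"]
        rows.append((venue["name"], category, location["lat"], location["lng"]))
    return rows


# Hand-made sample in the shape of the response; the second venue is invented
sample_items = [
    {"venue": {"name": "Subway",
               "categories": [{"name": "Sandwich Place"}],
               "location": {"lat": 17.40417263478385, "lng": 78.51494973189583}}},
    {"venue": {"name": "Uncategorised venue",
               "categories": [],
               "location": {"lat": 17.405, "lng": 78.516}}},
]
sample_rows = extract_venue_rows(sample_items)
print(sample_rows)
```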

Now we are ready to clean the json and structure it into a pandas dataframe.

In [23]:
venues = results['response']['groups'][0]['items']
#venues
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

Unnamed: 0,name,categories,lat,lng
0,Subway,Sandwich Place,17.404173,78.51495
1,Daily Bread,Café,17.403554,78.514961
2,Baskin-Robbins,Ice Cream Shop,17.404311,78.510034
3,cafe coffee day,Coffee Shop,17.405558,78.515897
4,Surabhi Grand,Indian Restaurant,17.404938,78.515171
5,Raghavendra Tiffins,Indian Restaurant,17.399157,78.512198
6,Satya Supermarket,Convenience Store,17.401208,78.513491
7,Reliance Digital,Electronics Store,17.405578,78.510056
8,Heritage Fresh Super Market,Convenience Store,17.402173,78.52147
9,Kwality Walls Express,Ice Cream Shop,17.407347,78.509956


In [24]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

13 venues were returned by Foursquare.


In [25]:
nearby_venues.shape

(13, 4)
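
As a sanity check on the `radius` parameter, the distance between the neighborhood coordinates and a returned venue can be computed with the haversine formula. This is an independent sketch, not how Foursquare itself measures distance. The helper name `haversine_m` and the mean Earth radius of 6,371 km are assumptions, and the venue coordinates are those of the Subway row in the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed value)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Osmania University coordinates vs. the Subway venue from the table above
distance = haversine_m(17.40558, 78.51615, 17.404173, 78.51495)
print(f"{distance:.0f} m from the neighborhood coordinates")
print("within request radius:", distance <= 1000)
```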

### Explore remaining Neighborhoods 

Let's create a function to repeat the same process for all 8 selected neighborhoods in Hyderabad.

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

#### Now write the code to run the above function on each neighborhood and create a new dataframe called hyderabad_venues

In [27]:
hyderabad_venues = getNearbyVenues(names = hyd_data['Neighborhood'],
                                   latitudes = hyd_data['Latitude'],
                                   longitudes = hyd_data['Longitude']
                                  )

Osmania University
EFLU
NIN
IICT
CCMB
IIITH
UoH
ISB


In [28]:
hyderabad_venues.shape

(38, 7)

In [29]:
hyderabad_venues.head(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Osmania University,17.40558,78.51615,Subway,17.404173,78.51495,Sandwich Place
1,Osmania University,17.40558,78.51615,Daily Bread,17.403554,78.514961,Café
2,Osmania University,17.40558,78.51615,cafe coffee day,17.405558,78.515897,Coffee Shop
3,Osmania University,17.40558,78.51615,Surabhi Grand,17.404938,78.515171,Indian Restaurant
4,Osmania University,17.40558,78.51615,Xtacy,17.404745,78.515083,Coffee Shop
5,Osmania University,17.40558,78.51615,Adikmet Cafe,17.409298,78.516197,Café
6,Osmania University,17.40558,78.51615,Shade Restaurant,17.405024,78.51227,Asian Restaurant
7,EFLU,17.42365,78.52601,University Garden,17.422848,78.525102,Garden Center
8,EFLU,17.42365,78.52601,Metro,17.42437,78.528351,Convenience Store
9,EFLU,17.42365,78.52601,Cake Basket,17.426081,78.523987,Bakery


In [30]:
hyderabad_venues.tail(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
28,IIITH,17.4448,78.34976,Kritunga Restaurant,17.446314,78.352824,Indian Restaurant
29,IIITH,17.4448,78.34976,99 variety dosa,17.447721,78.347584,South Indian Restaurant
30,IIITH,17.4448,78.34976,Tea point DLF - Gate 1,17.447006,78.353119,Café
31,IIITH,17.4448,78.34976,TCS Cafeteria Synergy Park,17.448286,78.35178,Cafeteria
32,ISB,17.43536,78.34075,Barista,17.432642,78.343286,Coffee Shop
33,ISB,17.43536,78.34075,Recreation Center,17.435251,78.338943,College Rec Center
34,ISB,17.43536,78.34075,Executive Housing Bar,17.437007,78.338598,Bar
35,ISB,17.43536,78.34075,ISB Swimming Pool,17.432777,78.338204,Pool
36,ISB,17.43536,78.34075,MS Campus - Gym 1,17.431699,78.3427,Gym
37,ISB,17.43536,78.34075,08 lounge,17.43229,78.337498,Lounge


Let's check how many venues were returned for each neighborhood

In [31]:
hyderabad_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
CCMB,7,7,7,7,7,7
EFLU,4,4,4,4,4,4
IICT,5,5,5,5,5,5
IIITH,5,5,5,5,5,5
ISB,6,6,6,6,6,6
NIN,4,4,4,4,4,4
Osmania University,7,7,7,7,7,7


In [32]:
len(hyderabad_venues['Venue Category'].unique())

21

### Analyze Each Neighborhood

In [34]:
# one hot encoding
hyderabad_onehot = pd.get_dummies(hyderabad_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
hyderabad_onehot['Neighborhood'] = hyderabad_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [hyderabad_onehot.columns[-1]] + list(hyderabad_onehot.columns[:-1])
hyderabad_onehot = hyderabad_onehot[fixed_columns]

hyderabad_onehot.head()

Unnamed: 0,Neighborhood,Asian Restaurant,Bakery,Bar,Cafeteria,Café,Chinese Restaurant,Coffee Shop,College Rec Center,Convenience Store,Electronics Store,Garden Center,Gym,Indian Restaurant,Lounge,Metro Station,Pool,Restaurant,Sandwich Place,South Indian Restaurant,Stadium,Vegetarian / Vegan Restaurant
0,Osmania University,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
1,Osmania University,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Osmania University,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Osmania University,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
4,Osmania University,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [35]:
hyderabad_onehot.tail()

Unnamed: 0,Neighborhood,Asian Restaurant,Bakery,Bar,Cafeteria,Café,Chinese Restaurant,Coffee Shop,College Rec Center,Convenience Store,Electronics Store,Garden Center,Gym,Indian Restaurant,Lounge,Metro Station,Pool,Restaurant,Sandwich Place,South Indian Restaurant,Stadium,Vegetarian / Vegan Restaurant
33,ISB,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
34,ISB,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
35,ISB,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
36,ISB,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
37,ISB,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0


In [36]:
hyderabad_onehot.shape

(38, 22)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [37]:
hyderabad_grouped = hyderabad_onehot.groupby('Neighborhood').mean().reset_index()
hyderabad_grouped

Unnamed: 0,Neighborhood,Asian Restaurant,Bakery,Bar,Cafeteria,Café,Chinese Restaurant,Coffee Shop,College Rec Center,Convenience Store,Electronics Store,Garden Center,Gym,Indian Restaurant,Lounge,Metro Station,Pool,Restaurant,Sandwich Place,South Indian Restaurant,Stadium,Vegetarian / Vegan Restaurant
0,CCMB,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.428571,0.0,0.142857,0.0,0.285714,0.0,0.0,0.0,0.142857
1,EFLU,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,IICT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.2
3,IIITH,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0
4,ISB,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0
5,NIN,0.0,0.5,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Osmania University,0.142857,0.0,0.0,0.0,0.285714,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0
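
Taking the mean of the one-hot columns per neighborhood is the same as computing each category's share of that neighborhood's venues. Here is a small plain-Python sketch of that arithmetic. The category counts are inferred from the CCMB row above (7 venues, with 0.428571 = 3/7, 0.285714 = 2/7 and 0.142857 = 1/7), not read from the raw venue data.

```python
from collections import Counter

# Category counts implied by the CCMB row (7 venues in total)
ccmb_categories = (
    ["Indian Restaurant"] * 3
    + ["Restaurant"] * 2
    + ["Metro Station"]
    + ["Vegetarian / Vegan Restaurant"]
)

counts = Counter(ccmb_categories)
total = sum(counts.values())

# Relative frequency of each category = count / number of venues
frequencies = {category: n / total for category, n in counts.items()}

for category, freq in sorted(frequencies.items(), key=lambda kv: -kv[1]):
    print(f"{category:<30} {freq:.6f}")
```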


#### Let's print each neighborhood along with the top 5 most common venues

In [38]:
num_top_venues = 5

for hood in hyderabad_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = hyderabad_grouped[hyderabad_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----CCMB----
                           venue  freq
0              Indian Restaurant  0.43
1                     Restaurant  0.29
2  Vegetarian / Vegan Restaurant  0.14
3                  Metro Station  0.14
4                            Gym  0.00


----EFLU----
               venue  freq
0      Garden Center  0.25
1               Café  0.25
2  Convenience Store  0.25
3             Bakery  0.25
4  Indian Restaurant  0.00


----IICT----
                           venue  freq
0              Indian Restaurant   0.4
1  Vegetarian / Vegan Restaurant   0.2
2                     Restaurant   0.2
3                  Metro Station   0.2
4                            Gym   0.0


----IIITH----
                     venue  freq
0        Indian Restaurant   0.2
1                  Stadium   0.2
2                Cafeteria   0.2
3                     Café   0.2
4  South Indian Restaurant   0.2


----ISB----
                venue  freq
0              Lounge  0.17
1                 Bar  0.17
2         Coffee Shop  0.17
...

#### Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order

In [39]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [71]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
hyderabad_venues_sorted = pd.DataFrame(columns=columns)
hyderabad_venues_sorted['Neighborhood'] = hyderabad_grouped['Neighborhood']

for ind in np.arange(hyderabad_grouped.shape[0]):
    hyderabad_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hyderabad_grouped.iloc[ind, :], num_top_venues)

hyderabad_venues_sorted.head(8)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CCMB,Indian Restaurant,Restaurant,Vegetarian / Vegan Restaurant,Metro Station,Convenience Store,Bakery,Bar,Cafeteria,Café,Chinese Restaurant
1,EFLU,Garden Center,Bakery,Café,Convenience Store,Electronics Store,Bar,Cafeteria,Chinese Restaurant,Coffee Shop,College Rec Center
2,IICT,Indian Restaurant,Vegetarian / Vegan Restaurant,Restaurant,Metro Station,Convenience Store,Bakery,Bar,Cafeteria,Café,Chinese Restaurant
3,IIITH,South Indian Restaurant,Cafeteria,Café,Indian Restaurant,Stadium,Vegetarian / Vegan Restaurant,Convenience Store,Bakery,Bar,Chinese Restaurant
4,ISB,Bar,Pool,Lounge,Gym,Coffee Shop,College Rec Center,Vegetarian / Vegan Restaurant,Convenience Store,Bakery,Cafeteria
5,NIN,Bakery,Electronics Store,Chinese Restaurant,Vegetarian / Vegan Restaurant,Bar,Cafeteria,Café,Coffee Shop,College Rec Center,Convenience Store
6,Osmania University,Café,Coffee Shop,Asian Restaurant,Sandwich Place,Indian Restaurant,Convenience Store,Bakery,Bar,Cafeteria,Chinese Restaurant
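
The column labels above come from indexing a three-element suffix list and falling back to "th" when the index runs out. That works for the 10 columns used here but would produce labels such as "21th" for longer lists. Below is a minimal sketch of an explicit alternative; the helper name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix: 1st, 2nd, 3rd, 4th, ..."""
    if 11 <= n % 100 <= 13:  # 11th, 12th and 13th are irregular
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```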


### 5. Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 4 clusters.

In [260]:
# set number of clusters
kclusters = 4

hyderabad_grouped_clustering = hyderabad_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hyderabad_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 1, 2, 2, 3, 2])
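
The labels follow the row order of `hyderabad_grouped` (alphabetical by neighborhood), not the order of `hyd_data`. The short sketch below pairs the printed labels with the neighborhood names from the grouped table. Both lists are copied from the outputs in this notebook. UoH is absent because no venues were returned for it.

```python
from collections import defaultdict

# Row order of hyderabad_grouped and the labels printed by k-means
neighborhoods = ["CCMB", "EFLU", "IICT", "IIITH", "ISB", "NIN", "Osmania University"]
labels = [1, 0, 1, 2, 2, 3, 2]

# Collect the neighborhoods that share each cluster label
clusters = defaultdict(list)
for name, label in zip(neighborhoods, labels):
    clusters[label].append(name)

for label in sorted(clusters):
    print(label, clusters[label])
# 0 ['EFLU']
# 1 ['CCMB', 'IICT']
# 2 ['IIITH', 'ISB', 'Osmania University']
# 3 ['NIN']
```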

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [261]:
# add clustering labels (run once; re-running raises ValueError because the column already exists)
hyderabad_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

hyderabad_merged = hyd_data

# join hyd_data with hyderabad_venues_sorted to add the cluster label and top venues for each neighborhood
hyderabad_merged = hyderabad_merged.join(hyderabad_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

hyderabad_merged.head(8) # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Hyderabad,Osmania University,17.40558,78.51615,2.0,Café,Coffee Shop,Asian Restaurant,Sandwich Place,Indian Restaurant,Convenience Store,Bakery,Bar,Cafeteria,Chinese Restaurant
1,Hyderabad,EFLU,17.42365,78.52601,0.0,Garden Center,Bakery,Café,Convenience Store,Electronics Store,Bar,Cafeteria,Chinese Restaurant,Coffee Shop,College Rec Center
2,Hyderabad,NIN,17.42776,78.52791,3.0,Bakery,Electronics Store,Chinese Restaurant,Vegetarian / Vegan Restaurant,Bar,Cafeteria,Café,Coffee Shop,College Rec Center,Convenience Store
3,Hyderabad,IICT,17.42196,78.53956,1.0,Indian Restaurant,Vegetarian / Vegan Restaurant,Restaurant,Metro Station,Convenience Store,Bakery,Bar,Cafeteria,Café,Chinese Restaurant
4,Hyderabad,CCMB,17.42102,78.54104,1.0,Indian Restaurant,Restaurant,Vegetarian / Vegan Restaurant,Metro Station,Convenience Store,Bakery,Bar,Cafeteria,Café,Chinese Restaurant
5,Hyderabad,IIITH,17.4448,78.34976,2.0,South Indian Restaurant,Cafeteria,Café,Indian Restaurant,Stadium,Vegetarian / Vegan Restaurant,Convenience Store,Bakery,Bar,Chinese Restaurant
6,Hyderabad,UoH,17.45674,78.32638,,,,,,,,,,,
7,Hyderabad,ISB,17.43536,78.34075,2.0,Bar,Pool,Lounge,Gym,Coffee Shop,College Rec Center,Vegetarian / Vegan Restaurant,Convenience Store,Bakery,Cafeteria


In [262]:
#Remove UoH
#hyderabad_merged.drop([6], axis = 0)

In [263]:
# The map cell below picks colors with rainbow[cluster - 1], which needs integer indices,
# so convert 'Cluster Labels' from float to int.
# fillna(0.0) handles the missing label for UoH (no venues returned); note that this places UoH in cluster 0.
hyderabad_merged["Cluster Labels"] = hyderabad_merged["Cluster Labels"].fillna(0.0).astype(int) 

Finally, let's visualize the resulting clusters



In [264]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(hyderabad_merged['Latitude'], hyderabad_merged['Longitude'], hyderabad_merged['Neighborhood'], hyderabad_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[cluster - 1],
        fill=True,
        fill_color=rainbow[cluster - 1],
        fill_opacity=0.7).add_to(map_clusters)
    

map_clusters

### Examine each of the four clusters

In [265]:
hyderabad_merged.loc[hyderabad_merged['Cluster Labels'] == 0, hyderabad_merged.columns[[1] + list(range(5, hyderabad_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,EFLU,Garden Center,Bakery,Café,Convenience Store,Electronics Store,Bar,Cafeteria,Chinese Restaurant,Coffee Shop,College Rec Center
6,UoH,,,,,,,,,,


In [266]:
hyderabad_merged.loc[hyderabad_merged['Cluster Labels'] == 1, hyderabad_merged.columns[[1] + list(range(5, hyderabad_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,IICT,Indian Restaurant,Vegetarian / Vegan Restaurant,Restaurant,Metro Station,Convenience Store,Bakery,Bar,Cafeteria,Café,Chinese Restaurant
4,CCMB,Indian Restaurant,Restaurant,Vegetarian / Vegan Restaurant,Metro Station,Convenience Store,Bakery,Bar,Cafeteria,Café,Chinese Restaurant


In [267]:
hyderabad_merged.loc[hyderabad_merged['Cluster Labels'] == 2, hyderabad_merged.columns[[1] + list(range(5, hyderabad_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Osmania University,Café,Coffee Shop,Asian Restaurant,Sandwich Place,Indian Restaurant,Convenience Store,Bakery,Bar,Cafeteria,Chinese Restaurant
5,IIITH,South Indian Restaurant,Cafeteria,Café,Indian Restaurant,Stadium,Vegetarian / Vegan Restaurant,Convenience Store,Bakery,Bar,Chinese Restaurant
7,ISB,Bar,Pool,Lounge,Gym,Coffee Shop,College Rec Center,Vegetarian / Vegan Restaurant,Convenience Store,Bakery,Cafeteria


In [268]:
hyderabad_merged.loc[hyderabad_merged['Cluster Labels'] == 3, hyderabad_merged.columns[[1] + list(range(5, hyderabad_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,NIN,Bakery,Electronics Store,Chinese Restaurant,Vegetarian / Vegan Restaurant,Bar,Cafeteria,Café,Coffee Shop,College Rec Center,Convenience Store


### Explore Trending Venues

In [77]:
# define URL
url = 'https://api.foursquare.com/v2/venues/trending?client_id={}&client_secret={}&ll={},{}&v={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION)

# send GET request and get trending venues
results = requests.get(url).json()
results


{'meta': {'code': 200, 'requestId': '5f153ea6bea21c0c7b0fbb0a'},
 'response': {'venues': []}}

#### Check if any venues are trending at this time

In [78]:
if len(results['response']['venues']) == 0:
    trending_venues_df = 'No trending venues are available at the moment!'
    
else:
    trending_venues = results['response']['venues']
    trending_venues_df = json_normalize(trending_venues)

    # filter columns
    columns_filtered = ['name', 'categories'] + ['location.distance', 'location.city', 'location.postalCode', 'location.state', 'location.country', 'location.lat', 'location.lng']
    trending_venues_df = trending_venues_df.loc[:, columns_filtered]

    # filter the category for each row
    trending_venues_df['categories'] = trending_venues_df.apply(get_category_type, axis=1)

In [79]:
# display trending venues
trending_venues_df

'No trending venues are available at the moment!'