# Planning A Concert In Mumbai

## 1. Introduction/Business Problem

Architects, a well-known UK metal band, want to play a show in India. They have chosen Mumbai because of its large population and strong local fan base. As their agency, we have been asked to provide details of the following:

 - Venues
 - Accommodation
 - Advertising Agencies
 - Travel Agencies

## 2. Data Description

All data is fetched from the Foursquare API. Each venue record has the following fields:

 - Name
 - Latitude
 - Longitude
 - Category

## 3. Importing Libraries

In [1]:
# Load in the env file
%load_ext dotenv
%dotenv

from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelEncoder
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np
import pandas as pd
import foursquare
import os
import folium

## 4. Instantiating the Foursquare client

In [2]:
fq_client = foursquare.Foursquare(
    client_id=os.getenv('FOURSQUARE_CLIENT_ID'), 
    client_secret=os.getenv('FOURSQUARE_CLIENT_SECRET')
)

## 5. Creating required functions

First, we need a function that fetches venues for a given category ID near a given latitude and longitude.

In [3]:
def retrieve_venues_by_category_id(category_id: str, latlong: str) -> list:
    response = fq_client.venues.search(params={'ll': latlong, 'categoryId': category_id})
    venues = response['venues']
    
    final_venues = list()
    
    for venue in venues:
        venue_dict = {
            'Name': venue['name'],
            'Latitude': venue['location']['lat'],
            'Longitude': venue['location']['lng'],
            'Category': venue['categories'][0]['shortName']
        }
        
        final_venues.append(venue_dict)
    
    return final_venues
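
The function above assumes that every venue carries at least one category. A minimal sketch of a more defensive flattening step follows, assuming the same response shape the function already relies on (`name`, `location.lat`, `location.lng`, `categories[].shortName`). The helper name `flatten_venue`, the `"Unknown"` fallback, and the sample records are illustrative assumptions, not part of the original notebook or of the Foursquare API.

```python
from typing import Any, Dict, List


def flatten_venue(venue: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one raw venue dict to the four fields used in this notebook."""
    categories = venue.get("categories") or []
    # Fall back to a placeholder when a venue has no category attached.
    category = categories[0]["shortName"] if categories else "Unknown"
    return {
        "Name": venue["name"],
        "Latitude": venue["location"]["lat"],
        "Longitude": venue["location"]["lng"],
        "Category": category,
    }


# Hand-written sample records in the assumed response shape (not real API output).
sample_venues: List[Dict[str, Any]] = [
    {
        "name": "Hall A",
        "location": {"lat": 19.07, "lng": 72.87},
        "categories": [{"shortName": "Concert Hall"}],
    },
    {
        "name": "Hall B",
        "location": {"lat": 18.93, "lng": 72.82},
        "categories": [],
    },
]

flattened = [flatten_venue(v) for v in sample_venues]
print(flattened)
```

With a helper like this, a venue without categories ends up with a placeholder category instead of raising an `IndexError`.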

Next, we need a function that plots the venues on a map and saves the map as an HTML file.

In [4]:
def plot_on_map(
    latitude: float,
    longitude: float,
    start_zoom: int,
    venue_dataframe: pd.DataFrame,
    color: str,
    name: str,
    ) -> folium.Map:
    
    map_ = folium.Map(location=[latitude, longitude], zoom_start=start_zoom)
    
    for lat, long, label in zip(
        venue_dataframe['Latitude'], 
        venue_dataframe['Longitude'], 
        venue_dataframe['Name']
    ):
        label = folium.Popup(label, parse_html=True)
        
        folium.CircleMarker(
            [lat, long],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7,
            parse_html=False
        ).add_to(map_)
        
    map_.save(f"./../visualizations/{name}.html")
    return map_

Lastly, we need a function that creates the dataframe, saves it in CSV format, and returns the dataframe itself.

In [5]:
def create_and_save_dataframe(data: list, name: str) -> pd.DataFrame:
    df = pd.DataFrame(data)
    df.to_csv(f"./../datasets/{name}.csv")
    
    return df
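
Both `plot_on_map` and `create_and_save_dataframe` write to relative directories (`./../visualizations` and `./../datasets`), and a write will typically fail if the target directory does not exist yet. The following is a small sketch of a path helper that creates the directory first and normalizes names such as `"Concert Halls"` into file-friendly slugs. The helper name `output_path` and the slug rule are assumptions made for illustration; the original notebook uses the names as given.

```python
import re
import tempfile
from pathlib import Path


def output_path(directory: str, name: str, suffix: str) -> Path:
    """Build a file path, creating the directory if it is missing."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    # Replace runs of non-alphanumeric characters with a single underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").lower()
    return folder / f"{slug}{suffix}"


# Demonstrated in a temporary directory so nothing is written to the project tree.
with tempfile.TemporaryDirectory() as tmp:
    path = output_path(f"{tmp}/visualizations", "Concert Halls", ".html")
    file_name = path.name
    folder_exists = path.parent.is_dir()

print(file_name)
```

The returned path could then be passed to `map_.save(str(path))` or `df.to_csv(path)`.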

## 6. Fetching all required data from Foursquare

Fetching coordinates for Mumbai

In [6]:
address = 'Mumbai, India'

geolocator = Nominatim(user_agent="project_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
latlng_string = f"{latitude},{longitude}"
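
geopy's `geocode` returns `None` when it finds no match for the address, in which case the attribute access above would raise an `AttributeError`. A hedged sketch of a guard is shown below. The function name `to_latlng_string` is hypothetical, and a `SimpleNamespace` with illustrative coordinates stands in for a real geocoding result so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Any


def to_latlng_string(location: Any) -> str:
    """Format a geocoding result as 'lat,lng', failing loudly on a missing result."""
    if location is None:
        raise ValueError("Geocoder returned no result for the address")
    return f"{location.latitude},{location.longitude}"


# Stand-in for a geocoding result; the coordinates are illustrative values only.
fake_location = SimpleNamespace(latitude=19.076, longitude=72.8777)
latlng = to_latlng_string(fake_location)
print(latlng)

# A missing result produces a clear error instead of an AttributeError.
try:
    to_latlng_string(None)
    missing_result_raised = False
except ValueError:
    missing_result_raised = True
```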

Now we fetch the data for each category from the Foursquare API, build a dataframe from it, and save it.

In [7]:
# VENUES
concert_hall_id = "5032792091d4c4b30a586d5c"
concert_hall_data = retrieve_venues_by_category_id(category_id=concert_hall_id, latlong=latlng_string)
concert_hall_df = create_and_save_dataframe(data=concert_hall_data, name="concerthall")

# ACCOMMODATION
accomodation_id = "4bf58dd8d48988d1fa931735"
accomodation_data = retrieve_venues_by_category_id(category_id=accomodation_id, latlong=latlng_string)
accomodation_df = create_and_save_dataframe(data=accomodation_data, name="accomodation")

# ADVERTISING AGENCIES
ad_agency_id = "52e81612bcbc57f1066b7a3d"
ad_agency_data = retrieve_venues_by_category_id(category_id=ad_agency_id, latlong=latlng_string)
ad_agency_df = create_and_save_dataframe(data=ad_agency_data, name="adagency")

# TRAVEL AGENCIES
travel_agency_id = "4f04b08c2fb6e1c99f3db0bd"
travel_agency_data = retrieve_venues_by_category_id(category_id=travel_agency_id, latlong=latlng_string)
travel_agency_df = create_and_save_dataframe(data=travel_agency_data, name="travelagency")

Now we combine all of them into one dataset

In [8]:
final_data = list()

final_data.extend(concert_hall_data)
final_data.extend(accomodation_data)
final_data.extend(ad_agency_data)
final_data.extend(travel_agency_data)

In [9]:
final_df = create_and_save_dataframe(data=final_data, name="final")

final_df.shape

(100, 4)
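
Because each category is queried separately and the lists are simply concatenated, the same place can in principle appear more than once in the combined data. The sketch below shows one way to drop exact repeats before clustering, keyed on name and coordinates. The function name `drop_duplicate_venues` and the sample records are illustrative assumptions; on the dataframe itself, `final_df.drop_duplicates(subset=["Name", "Latitude", "Longitude"])` would be a comparable option.

```python
from typing import Any, Dict, List, Set, Tuple


def drop_duplicate_venues(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first record for each (Name, Latitude, Longitude) combination."""
    seen: Set[Tuple[Any, Any, Any]] = set()
    unique: List[Dict[str, Any]] = []
    for record in records:
        key = (record["Name"], record["Latitude"], record["Longitude"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up records in the notebook's format; the second one repeats the first.
sample_records = [
    {"Name": "Hall A", "Latitude": 19.10, "Longitude": 72.85, "Category": "Concert Hall"},
    {"Name": "Hall A", "Latitude": 19.10, "Longitude": 72.85, "Category": "Concert Hall"},
    {"Name": "Hotel B", "Latitude": 19.09, "Longitude": 72.86, "Category": "Hotel"},
]

unique_records = drop_duplicate_venues(sample_records)
print(len(sample_records), "->", len(unique_records))
```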

## 7. Displaying all locations on Map

Concert Halls Map

In [10]:
map_ = plot_on_map(
        latitude=latitude, 
        longitude=longitude, 
        start_zoom=11, 
        venue_dataframe=concert_hall_df, 
        color="blue", 
        name="Concert Halls"
    )

map_

Accommodation Map

In [11]:
map_ = plot_on_map(
        latitude=latitude, 
        longitude=longitude, 
        start_zoom=11, 
        venue_dataframe=accomodation_df, 
        color="green", 
        name="Accomodations"
    )

map_

Advertising Agencies Map

In [12]:
map_ = plot_on_map(
        latitude=latitude, 
        longitude=longitude, 
        start_zoom=11, 
        venue_dataframe=ad_agency_df, 
        color="red", 
        name="Advertising Agencies"
    )

map_

Travel Agencies Map

In [41]:
map_ = plot_on_map(
        latitude=latitude, 
        longitude=longitude, 
        start_zoom=11, 
        venue_dataframe=travel_agency_df, 
        color="yellow", 
        name="Travel Agencies"
    )

map_

## 8. Cluster Analysis

We'll group the venues into 10 clusters using k-means on their latitude and longitude.

In [14]:
k_clusters = 10

# Keep only the numeric coordinates; the positional axis argument to drop() is not accepted by recent pandas versions
final_df_stripped = final_df.drop(columns=['Name', 'Category'])
kmeans = KMeans(n_clusters=k_clusters, random_state=0).fit(final_df_stripped)
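
The cluster count of 10 is fixed by hand. One common way to sanity-check such a choice is the elbow heuristic: fit k-means for several values of k and look at where the inertia (within-cluster sum of squared distances) stops dropping sharply. The sketch below is illustrative only. It uses a handful of coordinates taken from the dataframe output in this section rather than the full dataset, and the range of k values is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# A few (latitude, longitude) pairs copied from the dataframe output in this section.
coords = np.array([
    [19.071101, 72.874889],
    [18.925631, 72.819861],
    [19.006190, 72.828195],
    [19.089628, 72.851738],
    [19.240656, 72.860563],
    [19.100964, 72.918282],
    [18.937053, 72.827427],
    [19.104370, 72.853830],
    [19.085346, 73.009686],
])

# Fit one model per candidate k and record its inertia.
inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(coords)
    inertias[k] = float(model.inertia_)

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.6f}")
```

On the full dataset, the same loop could be run over `final_df_stripped` with a wider range of k.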

In [15]:
kmeans.labels_

array([8, 6, 1, 0, 7, 0, 7, 0, 4, 4, 1, 7, 2, 6, 5, 2, 7, 0, 1, 0, 6, 0,
       2, 1, 5, 7, 0, 8, 0, 1, 9, 1, 1, 6, 0, 9, 0, 8, 6, 8, 9, 9, 0, 0,
       3, 6, 8, 0, 0, 8, 0, 8, 2, 0, 6, 0, 2, 6, 6, 9, 8, 4, 4, 0, 7, 5,
       6, 4, 8, 6, 5, 2, 9, 0, 7, 4, 1, 8, 1, 1, 1, 1, 1, 2, 0, 7, 0, 6,
       8, 9, 8, 0, 6, 9, 7, 8, 6, 0, 5, 6])

In [16]:
final_df["Cluster Labels"] = kmeans.labels_

In [17]:
final_df

| | Name | Latitude | Longitude | Category | Cluster Labels |
|---|---|---|---|---|---|
| 0 | Youtube Fanfest | 19.071101 | 72.874889 | Concert Hall | 8 |
| 1 | National Centre for the Performing Arts (NCPA) | 18.925631 | 72.819861 | Performing Arts | 6 |
| 2 | 18.99 Latitude | 19.006190 | 72.828195 | Concert Hall | 1 |
| 3 | aditya birla hall, babulnath | 19.089628 | 72.851738 | Concert Hall | 0 |
| 4 | balashram | 19.240656 | 72.860563 | Concert Hall | 7 |
| ... | ... | ... | ... | ... | ... |
| 95 | Flight Shop | 19.100964 | 72.918282 | Travel Agency | 8 |
| 96 | Global Aviation Service | 18.937053 | 72.827427 | Travel Agency | 6 |
| 97 | Anjali Travels And Tours | 19.104370 | 72.853830 | Travel Agency | 0 |
| 98 | Rto Office | 19.085346 | 73.009686 | Travel Agency | 5 |


Saving the dataframe as a CSV file

In [42]:
final_df.to_csv("./../datasets/clustered.csv")

Now let's view the venues cluster by cluster

In [43]:
for i in range(k_clusters):
    print(f"CLUSTER {i + 1}")
    print(final_df[final_df["Cluster Labels"] == i][["Name", "Category"]])

CLUSTER 1
                                                 Name            Category
3                        aditya birla hall, babulnath        Concert Hall
5                                   Chatwani Banquets        Concert Hall
7   JRM Grounds (Vile Parle), Colosseum Pre Fest E...        Concert Hall
17                        JRM Grounds, Vile Parle (W)        Concert Hall
19                                          Filmalaya        Concert Hall
21                            Navin Bhai Thakkar Hall        Concert Hall
26                               Swami Nityanand Hall        Concert Hall
28                                  Chatwani Banquets        Concert Hall
34             Courtyard Mumbai International Airport               Hotel
36                            JW Marriott Mumbai Juhu               Hotel
42                                 Hotel Sea Princess               Hotel
43                                         Ibis Hotel               Hotel
47                       Rad
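
The clusters are computed from raw latitude and longitude, so k-means measures closeness in degrees rather than in kilometres. To get a feel for how far apart two venues in the same cluster really are, a great-circle distance can be computed with the haversine formula. The sketch below is an illustrative addition, not part of the original analysis; it compares two rows that both carry cluster label 6 in the dataframe output of this section.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


# Coordinates copied from the dataframe output (rows 1 and 96, both in cluster 6).
ncpa = (18.925631, 72.819861)
global_aviation = (18.937053, 72.827427)

distance_km = haversine_km(*ncpa, *global_aviation)
print(f"{distance_km:.2f} km")
```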

Plotting a map displaying all clusters

In [44]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# One distinct color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster in zip(final_df['Latitude'], final_df['Longitude'], final_df['Name'], final_df['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters.save("./../visualizations/clusters.html")
map_clusters