# Distinctive Venue Characteristics of Pilgrimage and Non-Pilgrimage Cities - A Case Study of Saudi Arabia

### Introduction
The Kingdom of Saudi Arabia is one of the world's major producers of petroleum products, and its economy depends mainly on the export of petroleum and petrochemical products. It is also an important country for Muslims because two of their pilgrimage sites, the Holy Kaaba and the Mosque of the Prophet Muhammad (peace be upon him), are situated in the holy cities of Mecca and Madinah, respectively.



### Problem Description
Every year a great number of people visit these two cities. In 2019, more than 2 million people visited Mecca to perform the hajj (an Islamic ritual). The number of religious tourists in the country has been increasing, and it is expected to rise to between 25 and 30 million in 2025. This increase will support the country's economy, but planners and decision makers must also understand the challenges of meeting the tourists' requirements. It is therefore important for them to understand the venue characteristics of these cities and how they differ from those of non-pilgrimage cities. It is generally perceived that the venue characteristics of pilgrimage and non-pilgrimage cities differ.

### Data 
The main data source for this project is Foursquare. A data frame was created from the six selected cities of Saudi Arabia (i.e., Mecca, Madinah, Khobar, Dammam, Riyad, and Jeddah) and their location information. The venue information for each city was then collected from Foursquare.

### Methodology
In this project, six cities are selected: Mecca, Madinah, Khobar, Dammam, Riyad, and Jeddah. Mecca and Madinah are known as pilgrimage cities; the other four are not. For each city, up to 100 venues within a radius of 5 kilometers of the city's coordinates are retrieved, and the 10 most common venue categories are ranked. A clustering technique (K-means) is then used to cluster the cities based on their venue category frequencies. The input data for the clustering are first pre-processed with principal component analysis (PCA) to reduce the number of variables, retaining the components that explain 95% of the variance. Finally, the clustering results are used to describe the distinctive characteristics of pilgrimage and non-pilgrimage cities.
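
The two numerical steps in this pipeline can be stated compactly. As a sketch under standard textbook definitions (the symbols $\lambda_i$, $m$, $x_c$, $\mu_j$, and $S_j$ are introduced here and do not appear in the notebook): if $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ are the variances along the principal components, PCA keeps the smallest number of components $m$ whose cumulative share of the variance reaches the 95% threshold, and K-means with two clusters then partitions the reduced city vectors $x_c$ so as to minimize the within-cluster sum of squares.

$$
m = \min\left\{ r : \frac{\sum_{i=1}^{r} \lambda_i}{\sum_{i=1}^{p} \lambda_i} \ge 0.95 \right\},
\qquad
\min_{S_1, S_2} \sum_{j=1}^{2} \sum_{c \in S_j} \lVert x_c - \mu_j \rVert^2,
\quad
\mu_j = \frac{1}{|S_j|} \sum_{c \in S_j} x_c .
$$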

In [1]:
# Import necessary libraries

# Install third-party packages (notebook shell commands)
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes

from collections import deque  # used to reorder dataframe columns

import requests  # library to handle HTTP requests
import numpy as np  # library to handle data in a vectorized manner
import pandas as pd  # library for data analysis

import folium  # map plotting library
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude values
from tqdm import tqdm  # progress bar for loops

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

import matplotlib.cm as cm
import matplotlib.colors as colors
from matplotlib import pyplot as plt

%matplotlib inline

print('Libraries imported.')

### Location of Saudi Arabia

In [74]:
# location of Saudi Arabia (used later to center the map)
geolocator = Nominatim(user_agent="foursquare_agent")
loc_SA = geolocator.geocode('Saudi Arabia')
lat_SA = loc_SA.latitude
long_SA = loc_SA.longitude

### Dataframe of Cities and Their Locations

In [39]:
import time

# build the dataframe of cities; the coordinates are filled in by the geocoder below
city_names = ['Mecca', 'Madinah', 'Khobar', 'Dammam', 'Riyad', 'Jeddah']
cities = pd.DataFrame({'city': city_names, 'latitude': 0.0, 'longitude': 0.0})
print(cities['city'][1])

for x in range(len(cities)):
    location = geolocator.geocode(cities['city'][x])
    time.sleep(2)  # pause between requests to avoid overloading the geocoding service
    cities.at[x, 'latitude'] = location.latitude
    cities.at[x, 'longitude'] = location.longitude
print("data", cities, sep='\n')

Madinah
data
      city latitude longitude
0    Mecca  21.4208   39.8269
1  Madinah  24.4712   39.6111
2   Khobar   26.304    50.196
3   Dammam  26.4368    50.104
4    Riyad   24.632   46.7151
5   Jeddah  21.5822    39.164


#### Define Foursquare Credentials

In [40]:
import os

# Credentials are read from environment variables instead of being hard-coded and printed;
# the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumed convention
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20180604'
LIMIT = 30 # global default; getNearbyVenues below uses its own LIMIT argument
print('Credentials loaded.')

Credentials loaded.


### Function for Getting Nearby Venues

In [41]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000, LIMIT=100):
    """Return a dataframe of up to LIMIT venues within radius meters of each city."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in tqdm(zip(names, latitudes, longitudes), total=len(names)):
        # query parameters for the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and fail early on an HTTP error
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['City',
                 'City Latitude',
                 'City Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
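
The list comprehension in `getNearbyVenues` assumes that every item in the response carries at least one category. As a hedged sketch (the helper name `extract_venue_rows` and the sample payload are hypothetical, shaped only after the fields the function above reads), the same extraction can be written so that incomplete records are skipped instead of raising an `IndexError`:

```python
def extract_venue_rows(city, lat, lng, items):
    """Flatten explore-style items into row tuples, skipping incomplete venues.

    `items` is assumed to look like the list the function above reads from
    response['groups'][0]['items']; this helper is illustrative only.
    """
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        location = venue.get('location', {})
        if not categories or 'lat' not in location or 'lng' not in location:
            continue  # incomplete record: leave it out rather than fail
        rows.append((city, lat, lng, venue.get('name'),
                     location['lat'], location['lng'], categories[0]['name']))
    return rows


# Hypothetical payload for illustration (not real API output)
sample_items = [
    {'venue': {'name': 'Clock Tower Museum',
               'location': {'lat': 21.418184, 'lng': 39.825581},
               'categories': [{'name': 'Museum'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 21.41, 'lng': 39.82},
               'categories': []}},
]

rows = extract_venue_rows('Mecca', 21.420847, 39.826869, sample_items)
print(rows)
```

Whether skipping such venues is preferable to labeling them with a placeholder category depends on the analysis; dropping them keeps the category frequencies restricted to venues that Foursquare has actually classified.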

### Venue Information

In [44]:
City_venues = getNearbyVenues(cities.city,
                            cities.latitude,
                            cities.longitude)

100%|██████████| 6/6 [00:03<00:00,  1.80it/s]


In [45]:
City_venues.head()

| | City | City Latitude | City Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Mecca | 21.420847 | 39.826869 | Makkah Clock Royal Tower (برج الساعة) | 21.419935 | 39.825555 | Building |
| 1 | Mecca | 21.420847 | 39.826869 | Ice Cream Alasema (ايسكريم العاصمة) | 21.418833 | 39.827217 | Ice Cream Shop |
| 2 | Mecca | 21.420847 | 39.826869 | Raffles Makkah Palace (قصر مكة رافلز) | 21.419659 | 39.825415 | Hotel |
| 3 | Mecca | 21.420847 | 39.826869 | King Abdulaziz Endowment (وقف الملك عبدالعزيز) | 21.419938 | 39.825561 | Hotel |
| 4 | Mecca | 21.420847 | 39.826869 | Clock Tower Museum | 21.418184 | 39.825581 | Museum |
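
The 5 km search radius can be sanity-checked against the coordinates returned for each venue. The following sketch uses the haversine great-circle formula with an assumed mean Earth radius of 6,371 km; the function name `haversine_m` is introduced here for illustration and is not part of the notebook.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Coordinates taken from the first row of the table above
mecca_center = (21.420847, 39.826869)
clock_tower = (21.419935, 39.825555)

distance = haversine_m(*mecca_center, *clock_tower)
within_radius = distance <= 5000  # 5000 m is the default radius of getNearbyVenues
print(round(distance), within_radius)
```

Applying the same function to every row would show how tightly the returned venues cluster around the geocoded city center, which matters because a single 5 km circle covers only part of a large city.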


In [46]:

City_venues.groupby("City").Venue.count().sort_values(ascending=False).head()

City
Riyad      100
Mecca      100
Madinah    100
Khobar     100
Jeddah     100
Name: Venue, dtype: int64

In [47]:
# Number of unique venue categories
print('There are {} unique categories.'.format(len(City_venues['Venue Category'].unique())))

There are 123 unique categories.


In [48]:
City_t = pd.get_dummies(City_venues["Venue Category"],
                             prefix = "",
                             prefix_sep = "")

City_t["City"] = City_venues["City"]


nindex = list(City_t.columns).index("City")
cols = deque(City_t.columns)
cols.rotate(-nindex)
cols = list(cols)
City_t = City_t[cols]

City_t.head()

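| | City | African Restaurant | American Restaurant | Arepa Restaurant | Art Gallery | Arts & Crafts Store | Asian Restaurant | Athletics & Sports | Auto Workshop | BBQ Joint | ... | Tea Room | Theme Restaurant | Trail | Train Station | Tram Station | Turkish Restaurant | Vegetarian / Vegan Restaurant | Video Store | Waterfront | Wings Joint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mecca | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Mecca | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Mecca | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Mecca | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Mecca | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |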


In [63]:
City_grouped = City_t.groupby('City').mean().reset_index()
City_grouped.head()

| | City | African Restaurant | American Restaurant | Arepa Restaurant | Art Gallery | Arts & Crafts Store | Asian Restaurant | Athletics & Sports | Auto Workshop | BBQ Joint | ... | Tea Room | Theme Restaurant | Trail | Train Station | Tram Station | Turkish Restaurant | Vegetarian / Vegan Restaurant | Video Store | Waterfront | Wings Joint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dammam | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.02 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 |
| 1 | Jeddah | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | ... | 0.01 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 |
| 2 | Khobar | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | ... | 0.02 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Madinah | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 | ... | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Mecca | 0.02 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |


In [64]:
City_grouped.shape

(6, 124)
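
Because each venue contributes exactly one 1 to its one-hot row, the per-city mean computed by `groupby('City').mean()` is simply the share of a city's venues that fall in each category. A minimal standard-library sketch of the same quantity follows; the function name `category_shares` and the toy venue list are made up for illustration.

```python
from collections import Counter, defaultdict


def category_shares(venues):
    """Map each city to {category: share of that city's venues in the category}."""
    counts = defaultdict(Counter)
    for city, category in venues:
        counts[city][category] += 1
    shares = {}
    for city, counter in counts.items():
        total = sum(counter.values())
        shares[city] = {cat: n / total for cat, n in counter.items()}
    return shares


# Toy data for illustration only (not the Foursquare results)
toy_venues = [
    ('A', 'Hotel'), ('A', 'Hotel'), ('A', 'Café'), ('A', 'Coffee Shop'),
    ('B', 'Coffee Shop'), ('B', 'Bakery'),
]

shares = category_shares(toy_venues)
print(shares['A']['Hotel'])  # 2 of the 4 venues in city A are hotels
```

When a city has 100 venues, each share is a multiple of 0.01, which is consistent with the values shown above.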

In [65]:

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
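
`return_most_common_venues` sorts a city's category frequencies in descending order and keeps the first `num_top_venues` labels. Categories with equal frequency can come out in an order that depends on the sort implementation, so the lower ranks are not necessarily stable. A hedged sketch of an explicit tie-break (alphabetical within equal frequencies) is shown below; `top_categories` is a hypothetical helper, not part of the notebook.

```python
def top_categories(frequencies, n):
    """Return the n most frequent categories, breaking ties alphabetically."""
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:n]]


# Toy frequencies for illustration only
toy_frequencies = {'Hotel': 0.20, 'Café': 0.05, 'Bakery': 0.05, 'Coffee Shop': 0.10}

top3 = top_categories(toy_frequencies, 3)
print(top3)
```

Sorting on the pair (negative frequency, name) keeps the ranking reproducible across runs, which is useful when the ranked lists are compared between cities.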

### Top Common Venues 

In [68]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
City_venues_sorted = pd.DataFrame(columns=columns)
City_venues_sorted['City'] = City_grouped['City']

for ind in np.arange(City_grouped.shape[0]):
    City_venues_sorted.iloc[ind, 1:] = return_most_common_venues(City_grouped.iloc[ind, :], num_top_venues)

City_venues_sorted

| | City | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dammam | Coffee Shop | Dessert Shop | Bakery | Ice Cream Shop | Juice Bar | Café | Donut Shop | Burger Joint | Middle Eastern Restaurant | Gym / Fitness Center |
| 1 | Jeddah | Coffee Shop | Dessert Shop | Breakfast Spot | Hotel | Ice Cream Shop | Café | Gym / Fitness Center | Fried Chicken Joint | Lounge | Chinese Restaurant |
| 2 | Khobar | Coffee Shop | Bakery | Food Truck | Café | Gym / Fitness Center | Donut Shop | Dessert Shop | Park | Hotel | Middle Eastern Restaurant |
| 3 | Madinah | Hotel | Café | Dessert Shop | Coffee Shop | Ice Cream Shop | Breakfast Spot | Middle Eastern Restaurant | Bagel Shop | Fried Chicken Joint | Juice Bar |
| 4 | Mecca | Hotel | Coffee Shop | Restaurant | Middle Eastern Restaurant | Café | Fast Food Restaurant | Breakfast Spot | Fried Chicken Joint | Ice Cream Shop | African Restaurant |
| 5 | Riyad | Coffee Shop | Donut Shop | Breakfast Spot | Middle Eastern Restaurant | Bakery | Hotel | Historic Site | Candy Store | Sandwich Place | History Museum |
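
The column labels above rely on a three-element suffix list plus a fallback to 'th', which is correct for ranks 1 to 10 but would yield labels such as '21th' for larger values of `num_top_venues`. As an illustrative sketch, a general ordinal helper (the name `ordinal` is an assumption introduced here) could look like this:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 11 <= n % 100 <= 13:  # 11th, 12th, and 13th are irregular
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Rebuild the same column labels as the notebook for the top 10 venues
columns = ['City'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[1], '|', columns[10])
```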


### Clustering Analysis

In [56]:
# City clustering
# Preprocessing: keep the principal components that explain 95% of the variance
pca = PCA(n_components=0.95)
City_grouped_clustering = pca.fit_transform(City_grouped.drop(columns='City'))
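
Passing a fraction to `PCA` asks scikit-learn to keep enough components to cover that share of the variance. The selection rule can be sketched in plain Python as below; the function name `components_for_variance` and the variance ratios are invented for illustration and are not the values from this dataset. Note also that with six cities the centered data has rank at most five, so no more than five components can carry variance.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, threshold=0.95):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    for count, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= threshold - 1e-12:  # small tolerance for floating-point sums
            return count
    return len(explained_variance_ratio)


# Hypothetical ratios, sorted in descending order as PCA reports them
toy_ratios = [0.50, 0.25, 0.15, 0.07, 0.03]

n_keep = components_for_variance(toy_ratios)
print(n_keep)
```

In the notebook itself, the number of retained components can be read from the shape of `City_grouped_clustering` after `fit_transform`.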


In [69]:

# set number of clusters
kclusters = 2

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(City_grouped_clustering)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_[0:10])
print(kmeans.labels_.shape)

[1 1 1 0 0 1]
(6,)
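
To make the clustering step concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the iterative procedure commonly used for K-means. It is not scikit-learn's implementation, which by default adds k-means++ initialization and other refinements; the function `kmeans`, its naive initialization from the first `k` points, and the toy points are assumptions for illustration.

```python
def kmeans(points, k, max_iter=100):
    """Cluster points (tuples of floats) into k groups; returns (labels, centroids)."""
    centroids = [points[i] for i in range(k)]  # naive initialization: first k points
    labels = None
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # update step: each centroid moves to the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Toy 2-D points forming two well-separated groups (illustration only)
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]

labels, centroids = kmeans(toy_points, k=2)
print(labels)
```

As in the notebook output, the label numbers themselves are arbitrary; only the grouping of points that share a label is meaningful.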


In [70]:
#cluster label
City_grouped["Cluster Labels"] = kmeans.labels_ + 1

# add clustering labels
City_combined = cities.merge(City_grouped, left_on = "city", right_on = "City", how = "outer")
City_combined = City_combined.join(City_venues_sorted.set_index('City'), on='City')

City_combined["Cluster Labels"] = City_combined["Cluster Labels"].fillna(0).astype("int")

City_combined.head() # check the last columns!

| | city | latitude | longitude | City | African Restaurant | American Restaurant | Arepa Restaurant | Art Gallery | Arts & Crafts Store | Asian Restaurant | ... | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mecca | 21.4208 | 39.8269 | Mecca | 0.02 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | ... | Hotel | Coffee Shop | Restaurant | Middle Eastern Restaurant | Café | Fast Food Restaurant | Breakfast Spot | Fried Chicken Joint | Ice Cream Shop | African Restaurant |
| 1 | Madinah | 24.4712 | 39.6111 | Madinah | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | ... | Hotel | Café | Dessert Shop | Coffee Shop | Ice Cream Shop | Breakfast Spot | Middle Eastern Restaurant | Bagel Shop | Fried Chicken Joint | Juice Bar |
| 2 | Khobar | 26.304 | 50.196 | Khobar | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | ... | Coffee Shop | Bakery | Food Truck | Café | Gym / Fitness Center | Donut Shop | Dessert Shop | Park | Hotel | Middle Eastern Restaurant |
| 3 | Dammam | 26.4368 | 50.104 | Dammam | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.02 | ... | Coffee Shop | Dessert Shop | Bakery | Ice Cream Shop | Juice Bar | Café | Donut Shop | Burger Joint | Middle Eastern Restaurant | Gym / Fitness Center |
| 4 | Riyad | 24.632 | 46.7151 | Riyad | 0.0 | 0.0 | 0.01 | 0.01 | 0.0 | 0.0 | ... | Coffee Shop | Donut Shop | Breakfast Spot | Middle Eastern Restaurant | Bakery | Hotel | Historic Site | Candy Store | Sandwich Place | History Museum |


### Results

In [71]:
# create map

map_clusters = folium.Map(location=[lat_SA, long_SA], zoom_start=5)

# one color per cluster label, plus one for label 0 (cities without a cluster);
# a separate variable avoids changing kclusters each time this cell is re-run
n_colors = kclusters + 1

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, n_colors))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, cluster in zip(City_combined['latitude'],
                                  City_combined['longitude'],
                                  City_combined['Cluster Labels']):
    label = folium.Popup(' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.9).add_to(map_clusters)
       
map_clusters

In [72]:
City_combined.loc[City_combined['Cluster Labels'] == 1, 
                     "1st Most Common Venue":"10th Most Common Venue"].head()

| | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hotel | Coffee Shop | Restaurant | Middle Eastern Restaurant | Café | Fast Food Restaurant | Breakfast Spot | Fried Chicken Joint | Ice Cream Shop | African Restaurant |
| 1 | Hotel | Café | Dessert Shop | Coffee Shop | Ice Cream Shop | Breakfast Spot | Middle Eastern Restaurant | Bagel Shop | Fried Chicken Joint | Juice Bar |


In [73]:
City_combined.loc[City_combined['Cluster Labels'] == 2, 
                     "1st Most Common Venue":"10th Most Common Venue"].head()

| | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Coffee Shop | Bakery | Food Truck | Café | Gym / Fitness Center | Donut Shop | Dessert Shop | Park | Hotel | Middle Eastern Restaurant |
| 3 | Coffee Shop | Dessert Shop | Bakery | Ice Cream Shop | Juice Bar | Café | Donut Shop | Burger Joint | Middle Eastern Restaurant | Gym / Fitness Center |
| 4 | Coffee Shop | Donut Shop | Breakfast Spot | Middle Eastern Restaurant | Bakery | Hotel | Historic Site | Candy Store | Sandwich Place | History Museum |
| 5 | Coffee Shop | Dessert Shop | Breakfast Spot | Hotel | Ice Cream Shop | Café | Gym / Fitness Center | Fried Chicken Joint | Lounge | Chinese Restaurant |


### Discussion
The most common venue category in Dammam, Jeddah, Khobar, and Riyad is the coffee shop, whereas in the pilgrimage cities (i.e., Mecca and Madinah) it is the hotel. The gym / fitness center is among the 10 most common venue categories in three non-pilgrimage cities (Dammam, Jeddah, and Khobar) but in neither pilgrimage city. Based on the venue ranking, the pilgrimage and non-pilgrimage cities appear to be distinct.
Further studies on the venue characteristics of pilgrimage and non-pilgrimage cities will help planners plan the new development required to welcome additional tourists, including religious tourists.
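
The two observations above can be checked directly against the ranking table in the Top Common Venues section. The sketch below copies the top-10 lists from that table into a plain dictionary (the variable and function names are introduced here for illustration) and looks up the first-ranked category and the cities whose top 10 includes a given category.

```python
# Top-10 venue categories per city, copied from the ranking table
top10 = {
    'Dammam': ['Coffee Shop', 'Dessert Shop', 'Bakery', 'Ice Cream Shop', 'Juice Bar',
               'Café', 'Donut Shop', 'Burger Joint', 'Middle Eastern Restaurant',
               'Gym / Fitness Center'],
    'Jeddah': ['Coffee Shop', 'Dessert Shop', 'Breakfast Spot', 'Hotel', 'Ice Cream Shop',
               'Café', 'Gym / Fitness Center', 'Fried Chicken Joint', 'Lounge',
               'Chinese Restaurant'],
    'Khobar': ['Coffee Shop', 'Bakery', 'Food Truck', 'Café', 'Gym / Fitness Center',
               'Donut Shop', 'Dessert Shop', 'Park', 'Hotel', 'Middle Eastern Restaurant'],
    'Madinah': ['Hotel', 'Café', 'Dessert Shop', 'Coffee Shop', 'Ice Cream Shop',
                'Breakfast Spot', 'Middle Eastern Restaurant', 'Bagel Shop',
                'Fried Chicken Joint', 'Juice Bar'],
    'Mecca': ['Hotel', 'Coffee Shop', 'Restaurant', 'Middle Eastern Restaurant', 'Café',
              'Fast Food Restaurant', 'Breakfast Spot', 'Fried Chicken Joint',
              'Ice Cream Shop', 'African Restaurant'],
    'Riyad': ['Coffee Shop', 'Donut Shop', 'Breakfast Spot', 'Middle Eastern Restaurant',
              'Bakery', 'Hotel', 'Historic Site', 'Candy Store', 'Sandwich Place',
              'History Museum'],
}


def cities_with(category, rankings):
    """Return the sorted list of cities whose ranking contains the category."""
    return sorted(city for city, cats in rankings.items() if category in cats)


first_ranked = {city: cats[0] for city, cats in top10.items()}
gym_cities = cities_with('Gym / Fitness Center', top10)
print(first_ranked)
print(gym_cities)
```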


### Conclusion
This project investigated the most common venues of six selected cities in Saudi Arabia. Based on their venues, the cities were clustered into two groups using the K-means clustering technique. The clustering model separated the pilgrimage cities from the non-pilgrimage cities. Based on this very limited study, it is concluded that the most common venues of pilgrimage and non-pilgrimage cities are different. Planners should therefore investigate this issue comprehensively so that the large number of tourists expected in the pilgrimage cities can be accommodated appropriately.