# INTRODUCTION
Saudi Arabia has been in the news a lot recently, many of the stories have been about the reforms MBS is instituting. Many don't beleive that Saudi Arabia can modernize. I will screen the major cities in Saudi Arabia using KNN to determine how many cities are similar to Mecca. My hypothesis is Mecca is the most international and most open city. Because over two million tourists come to Mecca for their Hajj every year, the city must cater to people from many backgrounds. By using the Foursquare API I will analyze which cities are similar to Mecca. If many cities are similar, then we can conclude that other cities are becoming more international. This will also be visualized using the folium libary

In [1]:
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim 
from sklearn.cluster import KMeans
import folium
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors

# Data
By using the top 11 most populated cities in Saudi Arabia I will analyze the most popular venues in each.

In [2]:
data = pd.read_csv("SaudiArabia.csv")
data

Unnamed: 0,City,Province,Population,Density,Urban Area(km2),Metro Area(km2),Latitude,Longitude
0,Riyadh,Riyadh,6500000,"3,024/km2",1000.0,1815.0,24.68216,46.68719
1,Jeddah,Makkah,3900000,"2,921/km2",1500.0,3000.0,21.48169,39.18284
2,Mecca,Makkah,1800000,"4,200/km2",850.0,1200.0,21.42111,39.80692
3,Medina,Al Madinah,1600000,,589.0,,24.46728,39.60641
4,Dammam,Eastern,1300000,,800.0,,26.283,50.2
5,Tabuk,Tabuk,800000,,,,28.3613,36.5692
6,Buraidah,Al-Qassim,700000,360/km2,1291.0,1290.0,26.34888,43.95771
7,Khamis Mushait,Asir,600000,,,,18.30609,42.73392
8,Abha,Asir,500000,,,,18.21691,42.50088
9,Al-Khobar,Eastern,400000,,,,26.28664,50.21435


In [3]:
data.shape

(12, 8)

In [4]:
data.drop(['Province', 'Population', 'Density', 'Urban Area(km2)', 'Metro Area(km2)'], axis=1, inplace=True)
data.head()

Unnamed: 0,City,Latitude,Longitude
0,Riyadh,24.68216,46.68719
1,Jeddah,21.48169,39.18284
2,Mecca,21.42111,39.80692
3,Medina,24.46728,39.60641
4,Dammam,26.283,50.2


# Methodology
Import Foursquare credentials, and visualize data using the Folium Library

In [5]:
CLIENT_ID = 'VJ5SPP3S0SZGHZ2EEUWXFPFHRDDYJHSJ0GMYDTFQ2K3MBKKT' # your Foursquare ID
CLIENT_SECRET = 'Z2HIKBSVLZDT1LCYENZNJWEETSGTN2HA32ZUZMIJJPZP2QFT' # your Foursquare Secret
VERSION = '20181022'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: VJ5SPP3S0SZGHZ2EEUWXFPFHRDDYJHSJ0GMYDTFQ2K3MBKKT
CLIENT_SECRET:Z2HIKBSVLZDT1LCYENZNJWEETSGTN2HA32ZUZMIJJPZP2QFT


## Query Nominatim to Get Longitude and Latitude coordinates

In [6]:
address = 'Saudi Arabia'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of SaudiArabia are {}, {}.'.format(latitude, longitude))

  This is separate from the ipykernel package so we can avoid doing imports until


The geograpical coordinate of SaudiArabia are 25.6242618, 42.3528328.


## Import Folium and related libaries

In [None]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize


print('Libraries imported.')

## Visualize location of each city Using Folium

In [11]:
map_Saudi = folium.Map(location=[latitude, longitude], zoom_start=4)

for lat, lng, city in zip(data['Latitude'], data['Longitude'],data['City']):
    label = '{}'.format(city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Saudi)  
    
map_Saudi

## Query Foursquare for each city 

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=10000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

## Explore each city

In [13]:
saudi_venues = getNearbyVenues(names=data['City'],
                                   latitudes=data['Latitude'],
                                   longitudes=data['Longitude']
                                  )

Riyadh
Jeddah
Mecca
Medina
Dammam
Tabuk
Buraidah
Khamis Mushait
Abha
Al-Khobar
Al Bahah
Najran


In [14]:
saudi_venues.rename(columns={'Neighborhood':'City'},inplace=True)
saudi_venues.head()

Unnamed: 0,City,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Riyadh,24.68216,46.68719,dr.CAFE COFFEE (د.كيف),24.679473,46.686706,Coffee Shop
1,Riyadh,24.68216,46.68719,Pizza Roma,24.686092,46.686299,Pizza Place
2,Riyadh,24.68216,46.68719,Starbucks (ستاربكس),24.689529,46.685666,Coffee Shop
3,Riyadh,24.68216,46.68719,Harvey Nichols (هارڤي نيكلز),24.688972,46.684727,Department Store
4,Riyadh,24.68216,46.68719,Zara (زارا),24.688991,46.683231,Boutique


## Number of Venues returned from each city

In [15]:
saudi_venues.groupby('City').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
City,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abha,30,30,30,30,30,30
Al Bahah,12,12,12,12,12,12
Al-Khobar,30,30,30,30,30,30
Buraidah,30,30,30,30,30,30
Dammam,30,30,30,30,30,30
Jeddah,30,30,30,30,30,30
Khamis Mushait,30,30,30,30,30,30
Mecca,30,30,30,30,30,30
Medina,30,30,30,30,30,30
Najran,14,14,14,14,14,14


In [16]:
print('There are {} uniques categories.'.format(len(saudi_venues['Venue Category'].unique())))

There are 88 uniques categories.


In [18]:
# one hot encoding
saudi_onehot = pd.get_dummies(saudi_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
saudi_onehot['City'] = data['City'] 

# move neighborhood column to the first column
index = saudi_onehot.columns.get_loc("City")
fixed_columns = [saudi_onehot.columns[int(index)]] + list(saudi_onehot.columns[:int(index)]) + list(saudi_onehot.columns[int(index)+1:saudi_onehot.shape[1]])
saudi_onehot = saudi_onehot[fixed_columns]

saudi_onehot.head(20)

Unnamed: 0,City,African Restaurant,Airport,American Restaurant,Antique Shop,Arepa Restaurant,Arts & Crafts Store,Asian Restaurant,Bagel Shop,Bakery,...,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Tea Room,Track,Trail,Turkish Restaurant,Watch Shop,Waterfront
0,Riyadh,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Jeddah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Mecca,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Medina,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Dammam,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Tabuk,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Buraidah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Khamis Mushait,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Abha,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Al-Khobar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [19]:
city_grouped = saudi_onehot.groupby('City').mean().reset_index()
city_grouped

Unnamed: 0,City,African Restaurant,Airport,American Restaurant,Antique Shop,Arepa Restaurant,Arts & Crafts Store,Asian Restaurant,Bagel Shop,Bakery,...,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Tea Room,Track,Trail,Turkish Restaurant,Watch Shop,Waterfront
0,Abha,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Al Bahah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Al-Khobar,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Buraidah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Dammam,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Jeddah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Khamis Mushait,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Mecca,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Medina,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Najran,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Find top 5 Venues for each city

In [20]:
num_top_venues = 5

for hood in city_grouped['City']:
    print("----"+hood+"----")
    temp = city_grouped[city_grouped['City'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abha----
                       venue  freq
0                Pizza Place   1.0
1         African Restaurant   0.0
2                     Museum   0.0
3                     Mosque   0.0
4  Middle Eastern Restaurant   0.0


----Al Bahah----
                  venue  freq
0  Gym / Fitness Center   1.0
1    African Restaurant   0.0
2         Jewelry Store   0.0
3              Mountain   0.0
4                Mosque   0.0


----Al-Khobar----
           venue  freq
0    Coffee Shop   1.0
1  Grocery Store   0.0
2         Museum   0.0
3       Mountain   0.0
4         Mosque   0.0


----Buraidah----
                venue  freq
0         Men's Store   1.0
1  African Restaurant   0.0
2             Airport   0.0
3            Mountain   0.0
4              Mosque   0.0


----Dammam----
                venue  freq
0            Boutique   1.0
1  African Restaurant   0.0
2         Music Venue   0.0
3            Mountain   0.0
4              Mosque   0.0


----Jeddah----
                       venue  f

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# Final Dataframe

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
city_venues_sorted = pd.DataFrame(columns=columns)
city_venues_sorted['City'] = city_grouped['City']

for ind in np.arange(city_grouped.shape[0]):
    city_venues_sorted.iloc[ind, 1:] = return_most_common_venues(city_grouped.iloc[ind, :], num_top_venues)

city_venues_sorted

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abha,Pizza Place,Waterfront,Clothing Store,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant
1,Al Bahah,Gym / Fitness Center,Waterfront,Eastern European Restaurant,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant
2,Al-Khobar,Coffee Shop,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop
3,Buraidah,Men's Store,Waterfront,Garden,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant
4,Dammam,Boutique,Waterfront,Coffee Shop,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop
5,Jeddah,Pizza Place,Waterfront,Clothing Store,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant
6,Khamis Mushait,Coffee Shop,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop
7,Mecca,Coffee Shop,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop
8,Medina,Department Store,Waterfront,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Dessert Shop,Diner,Doner Restaurant,Donut Shop
9,Najran,Shopping Mall,Clothing Store,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop


# KNN To cluster Cities

In [34]:
# set number of clusters
kclusters = 3

city_grouped_clustering = city_grouped.drop('City', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(city_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 0, 1, 0, 0, 2, 1, 1, 0, 0], dtype=int32)

In [35]:
city_merged = data

# add clustering labels
city_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
city_merged = city_merged.join(city_venues_sorted.set_index('City'), on='City')

city_merged.head() # check the last columns!

Unnamed: 0,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Riyadh,24.68216,46.68719,2,Coffee Shop,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop
1,Jeddah,21.48169,39.18284,0,Pizza Place,Waterfront,Clothing Store,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant
2,Mecca,21.42111,39.80692,1,Coffee Shop,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop
3,Medina,24.46728,39.60641,0,Department Store,Waterfront,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Dessert Shop,Diner,Doner Restaurant,Donut Shop
4,Dammam,26.283,50.2,0,Boutique,Waterfront,Coffee Shop,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop


# Folium visualization of KNN Analysis 

In [36]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=4)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(city_merged['Latitude'], city_merged['Longitude'], city_merged['City'], city_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Cluster 1

In [37]:
city_merged.loc[city_merged['Cluster Labels'] == 0, city_merged.columns[[0] + list(range(4, city_merged.shape[1]))]]

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Jeddah,Pizza Place,Waterfront,Clothing Store,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant
3,Medina,Department Store,Waterfront,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Dessert Shop,Diner,Doner Restaurant,Donut Shop
4,Dammam,Boutique,Waterfront,Coffee Shop,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop
8,Abha,Pizza Place,Waterfront,Clothing Store,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant
9,Al-Khobar,Coffee Shop,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop
11,Najran,Shopping Mall,Clothing Store,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop


# Cluster 2

In [39]:
city_merged.loc[city_merged['Cluster Labels'] == 1, city_merged.columns[[0] + list(range(4, city_merged.shape[1]))]]


Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Mecca,Coffee Shop,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop
6,Buraidah,Men's Store,Waterfront,Garden,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant
7,Khamis Mushait,Coffee Shop,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop
10,Al Bahah,Gym / Fitness Center,Waterfront,Eastern European Restaurant,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant


# Cluster 3

In [40]:
city_merged.loc[city_merged['Cluster Labels'] == 2, city_merged.columns[[0] + list(range(4, city_merged.shape[1]))]]

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Riyadh,Coffee Shop,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop
5,Tabuk,Electronics Store,Watch Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Diner,Doner Restaurant,Donut Shop


# Discussion:

Cluster 1 Returned:<br>
Jeddah <br> Medina<br>
Damman
<br>Abha
<br>Al-Khobar
<br>Najran

Cluster 2 returned:<br>
Mecca
<br>Buraidah
<br>Khamis Mushait
<br>Al Bahah
<br><br>
Cluster 3 returned:
<br>Riyadh
<br>Tabuk

# Conclusion

Given that neither Riyadh nor Jeddah ended up in the same cluster as Mecca we can't conclude that the rest of the coutnry is opening up based only on the foursquare data. Instead we find that coastal cities have similar venues and we find inland cities have similar venues. Further analysis could be done by doing a historical analysis on the changes going on within each city.