<h1 align=center><img src = "https://www.weather-atlas.com/weather/images/city/4/3/2042934-1500.jpg" width = 400></h1>

<h1 align=center><font size = 5>Riyadh Gym & Fitness centers analysis</font></h1>

## Introduction

Riyadh accounts for the largest number of organized fitness centers in KSA, primarily because it is the commercial capital of the country.

Here, we present some insights to help business owners who are interested in this industry and want to open a gym in Riyadh decide which districts to target.

## Data 

Data on Riyadh districts was tricky to find. After going through several GitHub repositories, I found some large datasets in JSON and SQL formats.

The JSON file includes the boundaries of all districts in Saudi Arabia; however, we are only interested in the coordinates (latitude and longitude) of the Riyadh districts.

The links:

https://github.com/Faisal0sal/Saudi-Arabia-Regions-Cities-and-Districts
https://github.com/aalmangour/Saudi_GIS_Data/blob/master/sa_neighborhoods.sql

The SQL file was preprocessed in Excel Power Query to clean it and bring it into the desired format (shown in the Data processing section).
The Foursquare API is used to retrieve popular venues and their categories in each district.
The GeoPy Python library is used to geocode Riyadh, and Folium is used for the map visualizations.

 


*Disclaimer:*

*The overall data quality is rather poor because places in Saudi Arabia do not provide Foursquare with the level of detail that this study needs. It is also worth mentioning that inconsistent spelling of district names makes it difficult to match records, which results in missing data points in some cases.*


## Table of Contents

[Data section](#Data-processing)

[Results](#Results)

[Clustering](#Clustering)
 

### Import necessary Libraries

In [24]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.display import HTML 
    
# function for transforming a JSON file into a pandas dataframe
from pandas import json_normalize # pandas >= 1.0; older versions expose it as pandas.io.json.json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')


### Define Foursquare Credentials and Version

In [25]:
import os

# Read the credentials from environment variables instead of hardcoding them in the notebook.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumption; use whatever names you export.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180604'
LIMIT = 500
print('Credentials loaded.')

Credentials loaded.


#### Getting Riyadh City address (latitude and longitude coordinates)

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>foursquare_agent</em>, as shown below.

In [26]:
address = 'Riyadh, Saudi Arabia'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Riyadh are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Riyadh are 24.6319692, 46.7150648.


<a id="item1"></a>

#### Search for a specific venue category (Gyms)
> `https://api.foursquare.com/v2/venues/`**search**`?client_id=`**CLIENT_ID**`&client_secret=`**CLIENT_SECRET**`&ll=`**LATITUDE**`,`**LONGITUDE**`&v=`**VERSION**`&query=`**QUERY**`&radius=`**RADIUS**`&limit=`**LIMIT**
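
The template above can also be assembled from a dictionary of parameters, which avoids long positional `format` calls. The following is a minimal sketch under stated assumptions: `build_search_url` is a hypothetical helper that is not part of the original notebook, and the credential strings are placeholders.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble a venues/search URL from its query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-escapes each value (the comma in "ll" becomes %2C),
    # so queries containing spaces or other special characters stay valid
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

# Placeholder credentials; the coordinates are the ones geocoded for Riyadh
example_url = build_search_url("CLIENT_ID", "CLIENT_SECRET", 24.6319692, 46.7150648,
                               "20180604", "gym", 500000, 500)
print(example_url)
```

Passing the same dictionary to `requests.get(base_url, params=...)` should have an equivalent effect, since `requests` encodes the query string itself.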

#### Now, we define a query to search for gyms within 500,000 metres of the center of Riyadh

In [27]:
search_query = 'gym'
radius = 500000
print(search_query + ' .... OK!')

gym .... OK!


#### Define the corresponding URL

In [28]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=24.6319692,46.7150648&v=20180604&query=gym&radius=500000&limit=500'

#### Send the GET Request and examine the results

In [29]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ee35e709ca1e36a52e22bc9'},
 'response': {'venues': [{'id': '5b8ab80be96d0c00397ae123',
    'name': 'Gold’s Gym',
    'location': {'lat': 24.665898,
     'lng': 46.681739,
     'labeledLatLngs': [{'label': 'display',
       'lat': 24.665898,
       'lng': 46.681739}],
     'distance': 5062,
     'postalCode': '12714',
     'cc': 'SA',
     'city': 'الرياض',
     'state': 'منطقة الرياض\u200e',
     'country': 'المملكة العربية السعودية',
     'formattedAddress': ['الرياض 12714', 'المملكة العربية السعودية']},
    'categories': [{'id': '4bf58dd8d48988d176941735',
      'name': 'Gym',
      'pluralName': 'Gyms',
      'shortName': 'Gym',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/gym_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1591959186',
    'hasPerk': False},
   {'id': '53b309c3498ee6ba6d20a978',
    'name': 'Royal Guard Gym',
    'location': {'lat': 24.648900104919694,
     'lng': 46.69550
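

The response carries a status code under `meta`, so it may be worth checking it before reading `response.venues`. This is a small sketch, assuming the payload shape shown in the output above; `venues_from_response` is a hypothetical helper and the sample payload is an abbreviated stand-in, not a real API reply.

```python
def venues_from_response(payload):
    """Return the venue list, or raise if the API reported a non-200 code."""
    code = payload.get("meta", {}).get("code")
    if code != 200:
        raise RuntimeError(f"Foursquare request failed with meta code {code}")
    return payload.get("response", {}).get("venues", [])

# Abbreviated stand-in shaped like the response printed above
sample = {
    "meta": {"code": 200, "requestId": "example"},
    "response": {"venues": [{"id": "5b8ab80be96d0c00397ae123", "name": "Gold’s Gym"}]},
}
print([venue["name"] for venue in venues_from_response(sample)])
```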

#### Get relevant part of JSON and transform it into a *pandas* dataframe

In [30]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()



Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.address,location.crossStreet
0,5b8ab80be96d0c00397ae123,Gold’s Gym,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",v-1591959186,False,24.665898,46.681739,"[{'label': 'display', 'lat': 24.665898, 'lng':...",5062,12714.0,SA,الرياض,منطقة الرياض‎,المملكة العربية السعودية,"[الرياض 12714, المملكة العربية السعودية]",,
1,53b309c3498ee6ba6d20a978,Royal Guard Gym,[],v-1591959186,False,24.6489,46.695501,"[{'label': 'display', 'lat': 24.64890010491969...",2733,,SA,,,المملكة العربية السعودية,[المملكة العربية السعودية],,
2,5d2c85c4c386450030ee8e6e,Gold’s Gym (جولدز جيم),"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...",v-1591959186,False,24.666643,46.681526,"[{'label': 'display', 'lat': 24.666643, 'lng':...",5139,12714.0,SA,الرياض,منطقة الرياض‎,المملكة العربية السعودية,"[الرياض 12714, المملكة العربية السعودية]",,
3,52ada10311d2696d665f2f2c,Sheraton Gym,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",v-1591959186,False,24.638206,46.712072,"[{'label': 'display', 'lat': 24.63820564155472...",757,,SA,Rhyad,,المملكة العربية السعودية,"[Rhyad, المملكة العربية السعودية]",,
4,5223630011d2750f437ce89a,Male Gym / Fitness Center - Armed Forces Offic...,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...",v-1591959186,False,24.701574,46.716957,"[{'label': 'display', 'lat': 24.70157393814730...",7750,,SA,,,المملكة العربية السعودية,[المملكة العربية السعودية],,


#### Define information of interest and filter dataframe

In [31]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the column assignment below does not write to a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,address,crossStreet,id
0,Gold’s Gym,Gym,24.665898,46.681739,"[{'label': 'display', 'lat': 24.665898, 'lng':...",5062,12714.0,SA,الرياض,منطقة الرياض‎,المملكة العربية السعودية,"[الرياض 12714, المملكة العربية السعودية]",,,5b8ab80be96d0c00397ae123
1,Royal Guard Gym,,24.6489,46.695501,"[{'label': 'display', 'lat': 24.64890010491969...",2733,,SA,,,المملكة العربية السعودية,[المملكة العربية السعودية],,,53b309c3498ee6ba6d20a978
2,Gold’s Gym (جولدز جيم),Gym / Fitness Center,24.666643,46.681526,"[{'label': 'display', 'lat': 24.666643, 'lng':...",5139,12714.0,SA,الرياض,منطقة الرياض‎,المملكة العربية السعودية,"[الرياض 12714, المملكة العربية السعودية]",,,5d2c85c4c386450030ee8e6e
3,Sheraton Gym,Gym,24.638206,46.712072,"[{'label': 'display', 'lat': 24.63820564155472...",757,,SA,Rhyad,,المملكة العربية السعودية,"[Rhyad, المملكة العربية السعودية]",,,52ada10311d2696d665f2f2c
4,Male Gym / Fitness Center - Armed Forces Offic...,Gym / Fitness Center,24.701574,46.716957,"[{'label': 'display', 'lat': 24.70157393814730...",7750,,SA,,,المملكة العربية السعودية,[المملكة العربية السعودية],,,5223630011d2750f437ce89a
5,Extreme Gym,Gym,24.640476,46.717504,"[{'label': 'display', 'lat': 24.64047648159066...",978,,SA,الرياض,منطقة الرياض‎,المملكة العربية السعودية,"[الرياض, المملكة العربية السعودية]",,,4e91d6c577c89cb9223fd9ec
6,Alfrusiya Gym (نادي الرياضي - الفروسية),Gym / Fitness Center,24.66353,46.734733,"[{'label': 'display', 'lat': 24.66353026147303...",4037,,SA,,,المملكة العربية السعودية,[المملكة العربية السعودية],,,56b75dad498e16a7309bf58a
7,Holiday Inn Izdihar - Gym,Gym,24.788135,46.714072,"[{'label': 'display', 'lat': 24.78813501889443...",17384,,SA,الرياض,منطقة الرياض‎,المملكة العربية السعودية,"[2907 Sh. Abdulwahab Bin Abdullah St,, الرياض,...","2907 Sh. Abdulwahab Bin Abdullah St,",,51f6d12d498e576541bccece
8,Tulip Inn Fitness Gym,Gym,24.643001,46.717967,"[{'label': 'display', 'lat': 24.64300083373137...",1262,11416.0,SA,Riyadh,KSA,المملكة العربية السعودية,"[Al Batha (Bin Jalawi), Riyadh 11416, المملكة ...",Al Batha,Bin Jalawi,507a99dde4b0e1f138c292fe
9,Gym of knig salman Air Base,Gym,24.716215,46.740398,"[{'label': 'display', 'lat': 24.71621531112510...",9721,,SA,الرياض,منطقة الرياض‎,المملكة العربية السعودية,"[الرياض, المملكة العربية السعودية]",,,59663aa665cdf80755ca9b72


#### Let's visualize the Gyms that are nearby

In [32]:
dataframe_filtered.name

0                                            Gold’s Gym
1                                       Royal Guard Gym
2                                Gold’s Gym (جولدز جيم)
3                                          Sheraton Gym
4     Male Gym / Fitness Center - Armed Forces Offic...
5                                           Extreme Gym
6               Alfrusiya Gym (نادي الرياضي - الفروسية)
7                             Holiday Inn Izdihar - Gym
8                                 Tulip Inn Fitness Gym
9                           Gym of knig salman Air Base
10                               Gym Kitchen (جيم كتشن)
11                                           Hilton Gym
12                                GYM - Executive Hotel
13    LAVA Fitness at KFMC | Ladies Gym (النادي الري...
14                     Gym, Spa&Sauna. Akariya Compound
15                             Nwc's GYM (Life Fitness)
16                           Vitamin Gym (نادي فيتامين)
17                                           gym

In [33]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around Riyadh

# add a red circle marker to represent Riyadh
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Riyadh',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the GYMs as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

# Data processing

In [34]:
import pandas as pd
import numpy as np

# the data were obtained from an SQL file found on GitHub that was scraped and processed
# all the preprocessing was done in Power Query to turn the SQL file into a suitable dataframe
Dist=pd.read_excel('https://srv-file10.gofile.io/download/Y9FcIf/RiyadhDistrictsData2.xlsx')
Dist

Unnamed: 0,ID,DistrictAR,DistrictEN,City,Latitude,Longitude
0,763,حطين,Hitien,Riyadh,26.331606,44.883862
1,764,قرطبة,Qurtubah,Riyadh,26.327879,44.848296
2,765,الفيصلية,Al Faysaliyyah,Riyadh,26.293811,44.80228
3,766,العزيزية,Al Aziziyyah,Riyadh,26.307914,44.815816
4,767,القدس,Al Quds,Riyadh,26.302859,44.804438
5,768,الخالدية,Al Khalidiyyah,Riyadh,26.299103,44.825027
6,769,الروضة,Ar Rawdah,Riyadh,26.293558,44.825436
7,770,الربوة,Al Rabwah,Riyadh,26.321627,44.88754
8,771,الريان,Ar Riyan,Riyadh,26.312589,44.891405
9,772,النهضة,An Nahdah,Riyadh,26.303259,44.896583
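
The district coordinates listed above sit near 26.3°N, 44.8°E, whereas the geocoder returned roughly 24.63°N, 46.72°E for Riyadh. A quick sanity check on how far a district centroid lies from the geocoded city center can be sketched as follows. `haversine_km` is a hypothetical helper that is not part of the original notebook, and the Earth radius is the usual mean value.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

city_center = (24.6319692, 46.7150648)   # geocoded earlier for 'Riyadh, Saudi Arabia'
first_district = (26.331606, 44.883862)  # first row of the table above

distance_km = haversine_km(*city_center, *first_district)
print(f"{distance_km:.0f} km from the geocoded city center")
```

A distance of a few hundred kilometres would suggest that these rows describe another town in the wider Riyadh region rather than the city itself, which is worth verifying against the source dataset before interpreting the results.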


In [35]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
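
`getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` for any venue that has no category. A more defensive way to flatten the response items might look like the sketch below. `extract_venue_rows` is a hypothetical helper, and the second sample item is invented purely to exercise the empty-category case.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'explore' items into tuples for one district."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of raising IndexError on an empty list
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Stand-in items shaped like response['groups'][0]['items'];
# the first mirrors a row from this notebook, the second is invented
items = [
    {"venue": {"name": "Elmasah GYM", "location": {"lat": 26.294446, "lng": 44.804535},
               "categories": [{"name": "Gym / Fitness Center"}]}},
    {"venue": {"name": "Uncategorized place", "location": {"lat": 26.29, "lng": 44.80},
               "categories": []}},
]
rows = extract_venue_rows("Al Faysaliyyah", 26.293811, 44.80228, items)
print(rows)
```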

In [36]:
# Return districts names

Riyadh_venues = getNearbyVenues(names=Dist['DistrictEN'],
                                   latitudes=Dist['Latitude'],
                                   longitudes=Dist['Longitude']
                                  )

 Hitien
 Qurtubah
 Al Faysaliyyah
 Al Aziziyyah
 Al Quds
 Al Khalidiyyah
 Ar Rawdah
 Al Rabwah
 Ar Riyan
 An Nahdah
 Margh
 As Sanaiyyah
 Badr
 Uhd
 Tuwik
 Al Farouq
 As Sediq
 Al Yarmok
 Al Falih
 Al Yamamah
 Alkah
 Al Deriyah
 Al Muntazah
 Al Andalus
 Semnan
 As Salam
 As Sieh
 Urierah


In [37]:
# Top venues with categories of the neighborhoods
print(Riyadh_venues.shape)
Riyadh_venues

(47, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Al Faysaliyyah,26.293811,44.80228,توب تشكن/top Chicken,26.293546,44.805734,Fast Food Restaurant
1,Al Faysaliyyah,26.293811,44.80228,twin coffee (توأم القهوة),26.294876,44.806953,Coffee Shop
2,Al Faysaliyyah,26.293811,44.80228,Othaim Market (أسواق العثيم),26.296068,44.798515,Grocery Store
3,Al Faysaliyyah,26.293811,44.80228,Max (ماكس),26.293068,44.806973,Clothing Store
4,Al Faysaliyyah,26.293811,44.80228,Caif Cafè,26.295068,44.806593,Café
5,Al Faysaliyyah,26.293811,44.80228,Pizza Hut,26.2941,44.806561,Pizza Place
6,Al Faysaliyyah,26.293811,44.80228,Elmasah GYM,26.294446,44.804535,Gym / Fitness Center
7,Al Faysaliyyah,26.293811,44.80228,Lovers Coffee,26.294905,44.805279,Coffee Shop
8,Al Faysaliyyah,26.293811,44.80228,Chicken Plus,26.295092,44.805944,Burger Joint
9,Al Faysaliyyah,26.293811,44.80228,سوق الزلفي,26.294917,44.806076,Flea Market


In [38]:
Riyadh_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Al Aziziyyah,2,2,2,2,2,2
Al Farouq,4,4,4,4,4,4
Al Faysaliyyah,11,11,11,11,11,11
Al Khalidiyyah,3,3,3,3,3,3
Al Muntazah,2,2,2,2,2,2
Al Quds,4,4,4,4,4,4
Al Yamamah,1,1,1,1,1,1
Al Yarmok,4,4,4,4,4,4
Alkah,3,3,3,3,3,3
Ar Rawdah,5,5,5,5,5,5


In [39]:
print('There are {} unique categories.'.format(len(Riyadh_venues['Venue Category'].unique())))

There are 33 unique categories.


In [40]:
# one hot encoding
Riyadh_onehot = pd.get_dummies(Riyadh_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Riyadh_onehot['Neighborhood'] = Riyadh_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Riyadh_onehot.columns[-1]] + list(Riyadh_onehot.columns[:-1])
Riyadh_onehot = Riyadh_onehot[fixed_columns]

Riyadh_onehot.head()

Unnamed: 0,Neighborhood,BBQ Joint,Bakery,Burger Joint,Café,Candy Store,Castle,Clothing Store,Coffee Shop,Exhibit,...,Park,Pizza Place,Plaza,Resort,Restaurant,River,Shawarma Place,Supermarket,Turkish Restaurant,Windmill
0,Al Faysaliyyah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Al Faysaliyyah,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,Al Faysaliyyah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Al Faysaliyyah,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Al Faysaliyyah,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [41]:
Riyadh_onehot.shape

(47, 34)

In [42]:
Riyadh_grouped = Riyadh_onehot.groupby('Neighborhood').mean().reset_index()
Riyadh_grouped

Unnamed: 0,Neighborhood,BBQ Joint,Bakery,Burger Joint,Café,Candy Store,Castle,Clothing Store,Coffee Shop,Exhibit,...,Park,Pizza Place,Plaza,Resort,Restaurant,River,Shawarma Place,Supermarket,Turkish Restaurant,Windmill
0,Al Aziziyyah,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Al Farouq,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
2,Al Faysaliyyah,0.0,0.0,0.090909,0.090909,0.0,0.0,0.090909,0.181818,0.0,...,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Al Khalidiyyah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Al Muntazah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Al Quds,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
6,Al Yamamah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Al Yarmok,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Alkah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,...,0.0,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,0.0,0.0
9,Ar Rawdah,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0


In [43]:
Riyadh_grouped.shape

(13, 34)
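
Taking the mean of the one-hot columns per neighborhood gives the share of each category among that neighborhood's venues. The same arithmetic can be sketched in plain Python, independent of pandas. `category_shares` is a hypothetical helper, and the category lists below are chosen to be consistent with the frequencies this notebook reports for Al Aziziyyah (2 venues) and Al Farouq (4 venues).

```python
from collections import Counter, defaultdict

def category_shares(venue_rows):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venue_rows:
        counts[neighborhood][category] += 1
    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares

venue_rows = [
    ("Al Aziziyyah", "Candy Store"),
    ("Al Aziziyyah", "Flower Shop"),
    ("Al Farouq", "BBQ Joint"),
    ("Al Farouq", "Falafel Restaurant"),
    ("Al Farouq", "Shawarma Place"),
    ("Al Farouq", "Fast Food Restaurant"),
]
shares = category_shares(venue_rows)
print(shares["Al Aziziyyah"])
print(shares["Al Farouq"])
```

With so few venues per district, each share moves in large steps (0.5 or 0.25 here), so small differences between districts should be read with caution.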

In [44]:
#Top venues per district

num_top_venues = 5

for hood in Riyadh_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Riyadh_grouped[Riyadh_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Al Aziziyyah----
                       venue  freq
0                Candy Store   0.5
1                Flower Shop   0.5
2                  BBQ Joint   0.0
3                      Plaza   0.0
4  Middle Eastern Restaurant   0.0


---- Al Farouq----
                       venue  freq
0                  BBQ Joint  0.25
1         Falafel Restaurant  0.25
2             Shawarma Place  0.25
3       Fast Food Restaurant  0.25
4  Middle Eastern Restaurant  0.00


---- Al Faysaliyyah----
                  venue  freq
0           Coffee Shop  0.18
1  Gym / Fitness Center  0.09
2           Pizza Place  0.09
3         Grocery Store  0.09
4           Flea Market  0.09


---- Al Khalidiyyah----
                       venue  freq
0                     Market  0.33
1  Middle Eastern Restaurant  0.33
2       Fast Food Restaurant  0.33
3                  BBQ Joint  0.00
4                Pizza Place  0.00


---- Al Muntazah----
                venue  freq
0                Park   1.0
1           BBQ 

In [45]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
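
`return_most_common_venues` ranks every category column, so a district with fewer than `num_top_venues` distinct categories has its remaining slots filled with zero-frequency categories in tie order. One possible alternative, shown here as a plain-Python sketch, keeps only categories that actually occur. `top_nonzero_categories` is a hypothetical helper, and the sample dictionary uses values from the Al Aziziyyah printout.

```python
def top_nonzero_categories(frequencies, num_top_venues):
    """Return up to num_top_venues categories, skipping zero-frequency ones."""
    present = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    # Sort by descending frequency, then alphabetically so ties are stable
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in present[:num_top_venues]]

# Frequencies for one district, taken from the Al Aziziyyah printout
al_aziziyyah = {"Candy Store": 0.5, "Flower Shop": 0.5, "BBQ Joint": 0.0, "Plaza": 0.0}
print(top_nonzero_categories(al_aziziyyah, 5))  # ['Candy Store', 'Flower Shop']
```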

# Results

In [46]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Riyadh_grouped['Neighborhood']

for ind in np.arange(Riyadh_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Riyadh_grouped.iloc[ind, :], num_top_venues)

#extracting to Excel for analysis to be presented in the report
neighborhoods_venues_sorted.to_excel("output.xlsx")  
neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Al Aziziyyah,Candy Store,Flower Shop,Windmill,Gym,Bakery,Burger Joint,Café,Castle,Clothing Store,Coffee Shop
1,Al Farouq,BBQ Joint,Shawarma Place,Fast Food Restaurant,Falafel Restaurant,Coffee Shop,Flower Shop,Flea Market,Farm,Exhibit,Clothing Store
2,Al Faysaliyyah,Coffee Shop,Gym / Fitness Center,Clothing Store,Flea Market,Fast Food Restaurant,Mobile Phone Shop,Grocery Store,Pizza Place,Café,Burger Joint
3,Al Khalidiyyah,Fast Food Restaurant,Market,Middle Eastern Restaurant,Windmill,Exhibit,Flea Market,Farm,Falafel Restaurant,Coffee Shop,Grocery Store
4,Al Muntazah,Park,Windmill,Flower Shop,Flea Market,Fast Food Restaurant,Farm,Falafel Restaurant,Exhibit,Coffee Shop,Gym
5,Al Quds,Coffee Shop,Falafel Restaurant,Turkish Restaurant,Bakery,Burger Joint,Café,Candy Store,Castle,Clothing Store,Gym
6,Al Yamamah,Fast Food Restaurant,Windmill,Gym,Bakery,Burger Joint,Café,Candy Store,Castle,Clothing Store,Coffee Shop
7,Al Yarmok,Farm,Ice Cream Shop,Optical Shop,Windmill,Exhibit,Flea Market,Fast Food Restaurant,Falafel Restaurant,Coffee Shop,Grocery Store
8,Alkah,Resort,Exhibit,Windmill,Coffee Shop,Flower Shop,Flea Market,Fast Food Restaurant,Farm,Falafel Restaurant,Clothing Store
9,Ar Rawdah,Supermarket,Café,Fast Food Restaurant,Castle,Farm,Windmill,Exhibit,Flower Shop,Flea Market,Falafel Restaurant


# Clustering

In [47]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 10

Riyadh_grouped_clustering = Riyadh_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Riyadh_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[:] 

array([5, 0, 2, 9, 1, 2, 3, 8, 6, 2, 7, 2, 4])
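
With 13 districts and `kclusters = 10`, most clusters can hold only one district. A small sketch for inspecting cluster sizes follows, using the labels printed above; it relies only on the standard library.

```python
from collections import Counter

labels = [5, 0, 2, 9, 1, 2, 3, 8, 6, 2, 7, 2, 4]  # copied from the output above

cluster_sizes = Counter(labels)
singletons = sorted(c for c, size in cluster_sizes.items() if size == 1)
print("cluster sizes:", dict(sorted(cluster_sizes.items())))
print("single-district clusters:", singletons)
```

Nine of the ten clusters contain a single district, so a smaller `kclusters` may produce more informative groups; the right value remains a judgment call for the analyst.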

In [48]:
# add clustering labels
#neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Riyadh_merged = Dist

# join the sorted venue table onto the district data, which carries latitude/longitude for each neighborhood
Riyadh_merged = Riyadh_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='DistrictEN')

Riyadh_merged # check the last columns!

Unnamed: 0,ID,DistrictAR,DistrictEN,City,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,763,حطين,Hitien,Riyadh,26.331606,44.883862,,,,,,,,,,
1,764,قرطبة,Qurtubah,Riyadh,26.327879,44.848296,,,,,,,,,,
2,765,الفيصلية,Al Faysaliyyah,Riyadh,26.293811,44.80228,Coffee Shop,Gym / Fitness Center,Clothing Store,Flea Market,Fast Food Restaurant,Mobile Phone Shop,Grocery Store,Pizza Place,Café,Burger Joint
3,766,العزيزية,Al Aziziyyah,Riyadh,26.307914,44.815816,Candy Store,Flower Shop,Windmill,Gym,Bakery,Burger Joint,Café,Castle,Clothing Store,Coffee Shop
4,767,القدس,Al Quds,Riyadh,26.302859,44.804438,Coffee Shop,Falafel Restaurant,Turkish Restaurant,Bakery,Burger Joint,Café,Candy Store,Castle,Clothing Store,Gym
5,768,الخالدية,Al Khalidiyyah,Riyadh,26.299103,44.825027,Fast Food Restaurant,Market,Middle Eastern Restaurant,Windmill,Exhibit,Flea Market,Farm,Falafel Restaurant,Coffee Shop,Grocery Store
6,769,الروضة,Ar Rawdah,Riyadh,26.293558,44.825436,Supermarket,Café,Fast Food Restaurant,Castle,Farm,Windmill,Exhibit,Flower Shop,Flea Market,Falafel Restaurant
7,770,الربوة,Al Rabwah,Riyadh,26.321627,44.88754,,,,,,,,,,
8,771,الريان,Ar Riyan,Riyadh,26.312589,44.891405,,,,,,,,,,
9,772,النهضة,An Nahdah,Riyadh,26.303259,44.896583,,,,,,,,,,
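
The merged table has empty venue columns for districts where the API returned no venues, and the line that would add cluster labels is commented out. Attaching labels by district name while leaving unclustered districts empty can be sketched in plain Python as below. `attach_cluster_labels` is a hypothetical helper; the names and labels are the leading entries of the tables and the label array shown earlier, assuming the labels follow the row order of `Riyadh_grouped`.

```python
def attach_cluster_labels(district_names, clustered_names, labels):
    """Pair every district with its cluster label, or None if it was not clustered."""
    label_by_name = dict(zip(clustered_names, labels))
    return [(name, label_by_name.get(name)) for name in district_names]

# Leading rows of the district table and of Riyadh_grouped
district_names = ["Hitien", "Qurtubah", "Al Faysaliyyah", "Al Aziziyyah", "Al Quds"]
clustered_names = ["Al Aziziyyah", "Al Farouq", "Al Faysaliyyah", "Al Khalidiyyah",
                   "Al Muntazah", "Al Quds"]
labels = [5, 0, 2, 9, 1, 2]  # first six entries of kmeans.labels_

labelled = attach_cluster_labels(district_names, clustered_names, labels)
for name, label in labelled:
    print(name, label)
```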


# Thank you for your time

<a id="item2"></a>