# The Battle of Neighborhoods - Coursera Capstone Project

This notebook constitutes the capstone project for the IBM Data Science Professional Certificate offered through Coursera.

The goal of this notebook is to determine optimal opening locations for different kinds of businesses, focused on the twin cities of Al-Khobar and Dammam, Saudi Arabia.

The end result is a simple way to get recommendations on where to open a given kind of business.

## Importing Dependencies

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import folium
import os
import time
import json
import requests
from copy import deepcopy
from dotenv import load_dotenv
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from folium.plugins import HeatMap
import uuid
from IPython.display import display_javascript, display_html, display

#JSON Renderer

class RenderJSON(object):
    def __init__(self, json_data):
        if isinstance(json_data, dict) or isinstance(json_data, list):
            self.json_str = json.dumps(json_data)
        else:
            self.json_str = json_data
        self.uuid = str(uuid.uuid4())

    def _ipython_display_(self):
        display_html('<div id="{}" style="height: 600px; width:100%;font: 12px/18px monospace !important;"></div>'.format(self.uuid), raw=True)
        display_javascript("""
        require(["https://rawgit.com/caldwell/renderjson/master/renderjson.js"], function() {
            renderjson.set_show_to_level(2);
            document.getElementById('%s').appendChild(renderjson(%s))
        });
      """ % (self.uuid, self.json_str), raw=True)

## Creating GeoJSON for Saudi Arabia

To do any kind of analysis, we need to divide the map into cities and districts, define the boundaries of each district and mark its center point.

Saudi Arabia can be neatly divided into provinces, cities and districts. We are particularly interested in Al-Khobar and Dammam, both of which are in the Eastern Province of Saudi Arabia.

Fortunately, a contributor on GitHub has already gathered the coordinate data, saving us the time of scraping it ourselves. However, we will have to build the GeoJSON ourselves.

### Importing coordinates JSON

Source: https://github.com/homaily/Saudi-Arabia-Regions-Cities-and-Districts

Let us import that into pandas.

In [2]:
with open(r"/home/mohammed/Desktop/Musaddiq/Coursera_Capstone/json/cities.json", 'r', encoding='utf8', errors='ignore') as file:
    cities = json.load(file)
    
with open(r"/home/mohammed/Desktop/Musaddiq/Coursera_Capstone/json/districts.json", 'r', encoding='utf8', errors='ignore') as file:
    districts = json.load(file)

Let's take a look at how the JSON is structured. 

We can see city_id and district_id serve as the primary keys.

In [3]:
cities[1] #showing only 1 record out of many

{'city_id': 2,
 'region_id': 7,
 'name_ar': 'نعمي',
 'name_en': "Na'mi",
 'center': [28.30507995, 35.74931003]}

In [4]:
districts[1] #showing only 1 record out of many

{'district_id': 10100003002,
 'city_id': 3,
 'region_id': 1,
 'name_ar': 'حي النموذجية',
 'name_en': 'Al Namudhajiyah Dist.',
 'boundaries': [[[24.65018372, 46.70227584],
   [24.64939455, 46.7014039],
   [24.64915715, 46.70115918],
   [24.64892224, 46.70091159],
   [24.64868987, 46.70066116],
   [24.64857349, 46.70053129],
   [24.64846099, 46.70039739],
   [24.64835249, 46.70025959],
   [24.6482481, 46.70011803],
   [24.64817484, 46.70000666],
   [24.6481001, 46.69989647],
   [24.64616862, 46.69704741],
   [24.64689243, 46.693556],
   [24.64695517, 46.69322908],
   [24.64696023, 46.69320269],
   [24.6469764, 46.69308145],
   [24.64699883, 46.69296141],
   [24.64702746, 46.69284296],
   [24.64706219, 46.69272648],
   [24.6471029, 46.69261235],
   [24.64714948, 46.69250093],
   [24.64720176, 46.69239258],
   [24.64731083, 46.69223089],
   [24.64742364, 46.69207231],
   [24.64754013, 46.69191695],
   [24.64766023, 46.6917649],
   [24.64778384, 46.69161628],
   [24.64785942, 46.69151051],


### Getting city_id of Al Khobar and Dammam

In [5]:
for city in cities:
    if "Khobar" in city["name_en"]:
        print("The city_id of " + city["name_en"] + " is " + str(city["city_id"]) + ".")
    if "Dammam" in city["name_en"]:
        print("The city_id of " + city["name_en"] + " is " + str(city["city_id"]) + ".")

The city_id of Dammam is 13.
The city_id of Al Khobar is 31.


### Getting districts in Al Khobar and Dammam

Now we can get the districts that constitute Al Khobar and Dammam.

In [6]:
khobar_districts = []
dammam_districts = []

for district in districts:
    if district["city_id"] == 31:
        khobar_districts.append(district)
        
for district in districts:
    if district["city_id"] == 13:
        dammam_districts.append(district)
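
The two loops above scan `districts` once per city. As an alternative sketch, the records can be grouped by `city_id` in a single pass. The helper name `group_by_city` and the sample records are illustrative assumptions, not part of the original notebook.

```python
from collections import defaultdict

def group_by_city(districts, city_ids):
    """Group district records by city_id, keeping only the requested cities."""
    wanted = set(city_ids)
    grouped = defaultdict(list)
    for district in districts:
        if district["city_id"] in wanted:
            grouped[district["city_id"]].append(district)
    return grouped

# Hypothetical records shaped like entries of districts.json
sample = [
    {"district_id": 1, "city_id": 31, "name_en": "A"},
    {"district_id": 2, "city_id": 13, "name_en": "B"},
    {"district_id": 3, "city_id": 3, "name_en": "C"},
    {"district_id": 4, "city_id": 31, "name_en": "D"},
]
by_city = group_by_city(sample, city_ids=[31, 13])
khobar_sample = by_city[31]  # districts with city_id 31
dammam_sample = by_city[13]  # districts with city_id 13
```

With the real data this would be called as `group_by_city(districts, [31, 13])`, using the two IDs found in the previous step.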

### Converting into GeoJSON format

GeoJSON stores each position as [longitude, latitude], whereas the source data uses [latitude, longitude]. We will create a copy of the data and swap the coordinate order in the copy, leaving the original untouched.

More information here, see "Position" section: https://macwright.com/2015/03/23/geojson-second-bite.html 

In [7]:
khobar_districts_xy = deepcopy(khobar_districts)
dammam_districts_xy = deepcopy(dammam_districts)

Reversing boundaries coordinates as per GeoJSON format

In [8]:
for district in khobar_districts_xy:
    for point in district["boundaries"][0]:
        point.reverse()  # [lat, lng] -> [lng, lat]

for district in dammam_districts_xy:
    for point in district["boundaries"][0]:
        point.reverse()  # [lat, lng] -> [lng, lat]
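
The in-place `reverse()` calls work because every point is a two-element list. A non-mutating version of the same swap might look like the sketch below. `swap_ring` is a hypothetical helper, and the two points are taken from the `districts[1]` record printed earlier.

```python
def swap_ring(ring):
    """Return a new ring with every [lat, lng] point flipped to [lng, lat]."""
    return [[lng, lat] for lat, lng in ring]

ring_latlng = [[24.65018372, 46.70227584], [24.64939455, 46.7014039]]
ring_lnglat = swap_ring(ring_latlng)  # the input list is left unchanged
```

Because the input is never modified, this variant would not need the `deepcopy` step, at the cost of rebuilding each record's `boundaries` explicitly.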

Let's also define the center point of each district, computed as the mean of its boundary vertices.

In [9]:
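# NOTE: the *_xy copies hold [longitude, latitude] points, so center[0] is the
# longitude there and the "latitude"/"longitude" keys of the *_xy copies are swapped.
# The venue queries further down read these keys from the non-xy data, where they are correct.
for district in range(len(khobar_districts_xy)):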
    khobar_districts_xy[district]["center"] = [sum(x)/len(x) for x in zip(*khobar_districts_xy[district]["boundaries"][0])]
    khobar_districts_xy[district]["latitude"] = khobar_districts_xy[district]["center"][0]
    khobar_districts_xy[district]["longitude"] = khobar_districts_xy[district]["center"][1]    
    
for district in range(len(khobar_districts)):
    khobar_districts[district]["center"] = [sum(x)/len(x) for x in zip(*khobar_districts[district]["boundaries"][0])]
    khobar_districts[district]["latitude"] = khobar_districts[district]["center"][0]
    khobar_districts[district]["longitude"] = khobar_districts[district]["center"][1]
    
for district in range(len(dammam_districts_xy)):
    dammam_districts_xy[district]["center"] = [sum(x)/len(x) for x in zip(*dammam_districts_xy[district]["boundaries"][0])]
    dammam_districts_xy[district]["latitude"] = dammam_districts_xy[district]["center"][0]
    dammam_districts_xy[district]["longitude"] = dammam_districts_xy[district]["center"][1]
    
for district in range(len(dammam_districts)):
    dammam_districts[district]["center"] = [sum(x)/len(x) for x in zip(*dammam_districts[district]["boundaries"][0])]
    dammam_districts[district]["latitude"] = dammam_districts[district]["center"][0]
    dammam_districts[district]["longitude"] = dammam_districts[district]["center"][1]
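
The center used above is the plain average of the boundary vertices, which drifts toward stretches of the boundary that happen to have many closely spaced points. As a hedged comparison, the sketch below sets that vertex mean beside an area-weighted centroid computed with the shoelace formula. Both helper names are assumptions, and coordinates are treated as planar, which is only an approximation for latitude/longitude.

```python
def vertex_mean(ring):
    """Average the vertices of a ring (the approach used in the cell above)."""
    xs, ys = zip(*ring)
    return [sum(xs) / len(xs), sum(ys) / len(ys)]

def area_centroid(ring):
    """Area-weighted centroid via the shoelace formula (planar approximation).

    Raises ZeroDivisionError for a degenerate ring with zero area.
    """
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return [cx / (3 * area2), cy / (3 * area2)]

# A 2x2 square with one extra vertex on its bottom edge
square = [[0, 0], [1, 0], [2, 0], [2, 2], [0, 2]]
mean_center = vertex_mean(square)    # pulled toward the bottom edge
area_center = area_centroid(square)  # geometric middle of the square
```

For small, roughly convex districts the two results are usually close, so the vertex mean is a reasonable shortcut for choosing a query point.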

Let's go ahead and put that into a Pandas DataFrame

In [10]:
khobar = pd.DataFrame(khobar_districts)
dammam = pd.DataFrame(dammam_districts)

khobar_xy = pd.DataFrame(khobar_districts_xy)
dammam_xy = pd.DataFrame(dammam_districts_xy)


khobar_xy

Unnamed: 0,district_id,city_id,region_id,name_ar,name_en,boundaries,center,latitude,longitude
0,10500031001,31,5,حي التحلية,At Tahliyah Dist.,"[[[50.21638279, 26.18209534], [50.21611424, 26...","[50.19358769394619, 26.17781863856502]",50.193588,26.177819
1,10500031002,31,5,حي ابن سيناء,Ibn Sina Dist.,"[[[50.21139022, 26.2507773], [50.21288113, 26....","[50.201773003593736, 26.241028340781252]",50.201773,26.241028
2,10500031003,31,5,حي الحزام الاخضر,Al Hizam Al Akhdar Dist.,"[[[50.19060022, 26.3089098], [50.19145503, 26....","[50.20295822975609, 26.305178597317074]",50.202958,26.305179
3,10500031004,31,5,حي صناعية الثقبة,Sinaiyah Ath Thuqbah Dist.,"[[[50.19749806, 26.24944191], [50.19719675, 26...","[50.19638499861113, 26.253792497222218]",50.196385,26.253792
4,10500031005,31,5,حي التعاون,At Taawun Dist.,"[[[50.19458339, 26.23354342], [50.1963495, 26....","[50.18608636999999, 26.226340009404773]",50.186086,26.22634
5,10500031006,31,5,حي الراكة الجنوبية,Ar Rakah Al Janubiyah Dist.,"[[[50.20035662, 26.33674744], [50.19986525, 26...","[50.203745644269645, 26.350680400337087]",50.203746,26.35068
6,10500031007,31,5,حي الخبر الشمالية,Al Khubar Ash Shamaliyah Dist.,"[[[50.21811139, 26.28045052], [50.21783621, 26...","[50.214117997064186, 26.29401644211008]",50.214118,26.294016
7,10500031008,31,5,حي مدينة العمال,Madinat Al Ummal Dist.,"[[[50.20487945, 26.28181475], [50.20483605, 26...","[50.205489442898546, 26.294228553913037]",50.205489,26.294229
8,10500031009,31,5,حي العقربية,Al Aqrabiyah Dist.,"[[[50.19759817, 26.28393879], [50.19757992, 26...","[50.189789413188414, 26.297885457971017]",50.189789,26.297885
9,10500031010,31,5,حي الخبر الجنوبية,Al Khubar Al Janubiyah Dist.,"[[[50.21598118, 26.26997671], [50.21521935, 26...","[50.20615011927536, 26.27328286536233]",50.20615,26.273283


Finally, we can parse all the above data into the GeoJSON format

In [11]:
features = []

#create Khobar GeoJSON
for district in range(len(khobar_districts_xy)):
    feature = {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": khobar_districts_xy[district]["boundaries"]},
            "properties": {
                "district_id": khobar_districts_xy[district]["district_id"],
                "city_id": khobar_districts_xy[district]["city_id"],
                "name_en": khobar_districts_xy[district]["name_en"]}
        }

    features.append(feature)
    
khobar_geojson = {
    "type": "FeatureCollection",
    "features": features
}

khobar_geojson = json.dumps(khobar_geojson)

In [12]:
features = []

#create Dammam GeoJSON
for district in range(len(dammam_districts_xy)):
    feature = {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": dammam_districts_xy[district]["boundaries"]},
            "properties": {
                "district_id": dammam_districts_xy[district]["district_id"],
                "city_id": dammam_districts_xy[district]["city_id"],
                "name_en": dammam_districts_xy[district]["name_en"]}
        }

    features.append(feature)
    
dammam_geojson = {
    "type": "FeatureCollection",
    "features": features
}

dammam_geojson = json.dumps(dammam_geojson)

The GeoJSON is now in the proper format for usage.
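
The Khobar and Dammam cells above differ only in their input list. A minimal sketch of a shared helper follows. The name `districts_to_geojson` and the sample district are assumptions for illustration; the feature layout mirrors the cells above.

```python
import json

def districts_to_geojson(districts_xy):
    """Build a GeoJSON FeatureCollection string from district records whose
    boundaries are already in [longitude, latitude] order."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": d["boundaries"]},
            "properties": {
                "district_id": d["district_id"],
                "city_id": d["city_id"],
                "name_en": d["name_en"],
            },
        }
        for d in districts_xy
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})

# Hypothetical single-district input
sample = [{
    "district_id": 1, "city_id": 31, "name_en": "Sample Dist.",
    "boundaries": [[[50.0, 26.0], [50.1, 26.0], [50.1, 26.1], [50.0, 26.0]]],
}]
sample_geojson = districts_to_geojson(sample)
```

With such a helper, the two cells would reduce to `khobar_geojson = districts_to_geojson(khobar_districts_xy)` and the equivalent call for Dammam.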

### Plotting districts and boundaries

In [13]:
khobar_data = khobar[["name_en", "district_id", "center"]]
dammam_data = dammam[["name_en", "district_id", "center"]]

In [14]:
# create a plain map
khobar_map = folium.Map(location=[26.2172,50.1971], zoom_start=12)

folium.GeoJson(khobar_geojson).add_to(khobar_map)

# display map
khobar_map

In [15]:
# create a plain map
dammam_map = folium.Map(location=[26.4207,50.0888], zoom_start=12)

folium.GeoJson(dammam_geojson).add_to(dammam_map)

# display map
dammam_map

## Using Foursquare API to retrieve popular venues in each district 

Having registered beforehand for the Foursquare developer program (https://developer.foursquare.com/), we can use the API to get a list of popular venues in each district.

But first, security. We will use the dotenv package to safely load our client ID and client secret to pass to the Foursquare API.

In [16]:
# using python-dotenv to protect Foursquare credentials
%load_ext dotenv
%dotenv
import os

CLIENT_ID = os.getenv("CLIENT_ID") # your Foursquare ID
CLIENT_SECRET = os.getenv("CLIENT_SECRET") # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID SIZE: ' + str(len(CLIENT_ID)))
print('CLIENT_SECRET SIZE: ' + str(len(CLIENT_SECRET)))

Your credentials:
CLIENT_ID SIZE: 48
CLIENT_SECRET SIZE: 48
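
If the `.env` file is missing or a key is misspelled, `os.getenv` returns `None` and the `len(...)` calls above fail with a `TypeError` that does not say which variable is absent. A small guard, sketched here with the hypothetical helper `require_env`, can make that failure clearer.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise a descriptive
    error if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            "Environment variable {} is not set; check your .env file.".format(name)
        )
    return value
```

In the cell above it would be used as `CLIENT_ID = require_env("CLIENT_ID")` and `CLIENT_SECRET = require_env("CLIENT_SECRET")`.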


### Function to get nearby popular venues

This function loops over the districts and calls the API once per district center to retrieve nearby venues.

In [17]:
def getNearbyVenues(
    names,
    latitudes,
    longitudes,
    radius=500,
    ):

    venues_list = []
    for (name, lat, lng) in zip(names, latitudes, longitudes):

        # create the API request URL

        url = \
            'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT,
            )

        # make the GET request

        results = requests.get(url).json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue

        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'],
            ) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list
                                 for item in venue_list])
    nearby_venues.columns = [
        'District',
        'District Latitude',
        'District Longitude',
        'Venue',
        'Venue Latitude',
        'Venue Longitude',
        'Venue Category',
        ]

    return nearby_venues
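
The function above assumes every response contains a group and every venue has at least one category. If either is missing, it raises a `KeyError` or `IndexError`. The sketch below shows one defensive way to build the URL and flatten a response. The helper names and the sample payload are assumptions; only the endpoint and the `response -> groups -> items -> venue` layout are taken from the code above.

```python
from urllib.parse import urlencode

# Same endpoint as in the cell above
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Assemble the request URL, letting urlencode handle escaping."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

def extract_venue_rows(name, lat, lng, payload):
    """Flatten one response payload into rows, skipping venues without a category."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # nothing to report for an uncategorised venue
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows

# Hypothetical payload with one categorised and one uncategorised venue
payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "A", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "B", "location": {"lat": 3.0, "lng": 4.0},
               "categories": []}},
]}]}}
url = build_explore_url("id", "secret", "20180605", 26.1, 50.2)
rows = extract_venue_rows("Demo Dist.", 26.1, 50.2, payload)
```

Separating URL construction from parsing also makes it possible to exercise the parsing logic without spending API quota.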


### Getting nearby venues

Let's run our function and use the Foursquare API to get nearby venues for each district.

In [18]:
khobar_venues = getNearbyVenues(names=khobar['name_en'],
                                   latitudes=khobar['latitude'],
                                   longitudes=khobar['longitude']
                                  )

khobar_venues

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,At Tahliyah Dist.,26.177819,50.193588,ملعب اسكان تحليه المياه - الخبر,26.178656,50.193112,Soccer Field
1,At Tahliyah Dist.,26.177819,50.193588,حديقة التحليه,26.179251,50.195127,Garden
2,At Tahliyah Dist.,26.177819,50.193588,معجنات الناعورة,26.177397,50.195792,Breakfast Spot
3,At Tahliyah Dist.,26.177819,50.193588,اسكان تحلية المياة - الخبر,26.179223,50.195842,Harbor / Marina
4,Ibn Sina Dist.,26.241028,50.201773,Al-Seef Cafe & Restaurant (مقهى ومطعم السيف),26.245261,50.201800,Hookah Bar
...,...,...,...,...,...,...,...
807,Al Amwaj Dist.,26.143114,50.145837,Alsubaie farm,26.142485,50.142363,Farm
808,Al Amwaj Dist.,26.143114,50.145837,three sixty,26.142226,50.141362,Food Truck
809,As Sadafah Dist.,26.368823,50.210278,CUE NINE billiard Club,26.365770,50.207950,Pool Hall
810,As Sadafah Dist.,26.368823,50.210278,استاد مدينة الأمير سعود بن جلوي الرياضية بالرا...,26.368982,50.205988,Soccer Stadium


In [19]:
dammam_venues = getNearbyVenues(names=dammam['name_en'],
                                   latitudes=dammam['latitude'],
                                   longitudes=dammam['longitude']
                                  )

dammam_venues

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,An Nasriyah Dist.,26.424017,50.122047,Jawsq Festival Hall (قاعة الجوسق للإحتفالات),26.422153,50.118145,Event Space
1,An Nasriyah Dist.,26.424017,50.122047,بوفية السعادة,26.424839,50.118922,Breakfast Spot
2,An Nasriyah Dist.,26.424017,50.122047,Al Mira Centre,26.422722,50.120174,Furniture / Home Store
3,An Nasriyah Dist.,26.424017,50.122047,حلويات جرير القباني patisserie jareer,26.422789,50.123262,Pastry Shop
4,An Nasriyah Dist.,26.424017,50.122047,Qatif Mall,26.423454,50.121995,Shopping Mall
...,...,...,...,...,...,...,...
1065,Al Maha Dist.,26.485221,49.933393,Yousif Farm,26.481787,49.935994,Farm
1066,An Nada Dist.,26.376905,50.072901,Macdonald’s,26.377904,50.072297,Fast Food Restaurant
1067,An Nada Dist.,26.376905,50.072901,تموينات واحة النرجس,26.374269,50.074375,Grocery Store
1068,An Nada Dist.,26.376905,50.072901,تموينات القحطاني,26.374203,50.074840,Grocery Store


### Unique Venue Categories

Let's list the unique categories gathered from each city and then merge them.

In [20]:
khobar_unique_cat = khobar_venues['Venue Category'].unique()

print('There are {} unique categories in Al-Khobar.'.format(len(khobar_unique_cat)))

khobar_unique_cat

There are 160 unique categories in Al-Khobar.


array(['Soccer Field', 'Garden', 'Breakfast Spot', 'Harbor / Marina',
       'Hookah Bar', 'Shipping Store', 'Vape Store', 'Smoke Shop',
       'Auto Workshop', 'Café', 'Bakery', 'Coffee Shop', 'Chocolate Shop',
       'Seafood Restaurant', 'Italian Restaurant', 'Spa',
       'Organic Grocery', 'Salad Place', 'Salon / Barbershop',
       'Smoothie Shop', 'Sushi Restaurant', 'Flower Shop',
       'Frozen Yogurt Shop', 'Sandwich Place', 'Creperie', 'Juice Bar',
       'French Restaurant', 'Burger Joint', 'Kebab Restaurant',
       'Dessert Shop', 'Cosmetics Shop', 'Health Food Store',
       'Donut Shop', 'Shawarma Place', 'Fast Food Restaurant', 'Hotel',
       'Asian Restaurant', 'Food & Drink Shop', 'Pastry Shop',
       'Tourist Information Center', 'Motorcycle Shop', 'Farmers Market',
       'Athletics & Sports', 'Auto Garage', 'Stables', 'Restaurant',
       'Gym', 'Campground', 'Insurance Office', 'Photography Studio',
       'Grocery Store', 'Middle Eastern Restaurant', 'Falafel 

In [21]:
dammam_unique_cat = dammam_venues['Venue Category'].unique()

print('There are {} unique categories in Al-dammam.'.format(len(dammam_unique_cat)))

dammam_unique_cat

There are 174 unique categories in Al-dammam.


array(['Event Space', 'Breakfast Spot', 'Furniture / Home Store',
       'Pastry Shop', 'Shopping Mall', 'Chocolate Shop', 'Intersection',
       'Park', 'Cricket Ground', 'Construction & Landscaping',
       'Outdoor Gym', 'Burger Joint', 'IT Services',
       'Middle Eastern Restaurant', 'Café', 'Juice Bar', 'Food Truck',
       'Bakery', 'Smoothie Shop', 'Falafel Restaurant',
       'Afghan Restaurant', 'Supermarket', 'Soccer Stadium', 'Campground',
       'Cupcake Shop', 'Fruit & Vegetable Store', 'Hotel',
       'Fried Chicken Joint', 'Cosmetics Shop', 'Restaurant',
       'Clothing Store', 'Turkish Restaurant', 'Fast Food Restaurant',
       'Tea Room', 'Lebanese Restaurant', 'Coffee Shop', 'Market',
       'Smoke Shop', 'Discount Store', "Men's Store", 'BBQ Joint',
       'Asian Restaurant', 'Electronics Store', 'Bookstore',
       'Fish & Chips Shop', 'Pharmacy', 'Medical Supply Store',
       'Department Store', 'Jewelry Store', 'Ice Cream Shop',
       'Turkish Home Cooking R

In [22]:
unique_venue_categories = khobar_unique_cat.tolist() + dammam_unique_cat.tolist()

unique_venue_categories = np.unique(unique_venue_categories)

print('There are {} unique categories in both Al Khobar and Dammam overall.'.format(len(unique_venue_categories)))

unique_venue_categories

There are 220 unique categories in both Al Khobar and Dammam overall.


array(['ATM', 'Afghan Restaurant', 'African Restaurant',
       'Airport Terminal', 'American Restaurant', 'Antique Shop',
       'Arcade', 'Arepa Restaurant', 'Art Gallery', 'Arts & Crafts Store',
       'Asian Restaurant', 'Astrologer', 'Athletics & Sports',
       'Auto Garage', 'Auto Workshop', 'BBQ Joint', 'Baby Store',
       'Bakery', 'Bank', 'Bar', 'Basketball Court', 'Bathing Area', 'Bay',
       'Beach', 'Big Box Store', 'Bike Rental / Bike Share', 'Bistro',
       'Board Shop', 'Bookstore', 'Boutique', 'Bowling Alley',
       'Boxing Gym', 'Breakfast Spot', 'Bridal Shop', 'Bubble Tea Shop',
       'Buffet', 'Burger Joint', 'Bus Station', 'Business Service',
       'Butcher', 'Cafeteria', 'Café', 'Camera Store', 'Campground',
       'Canal', 'Candy Store', 'Cantonese Restaurant',
       'Chinese Restaurant', 'Chocolate Shop', 'Clothing Store',
       'Coffee Roaster', 'Coffee Shop', 'Concert Hall',
       'Construction & Landscaping', 'Convenience Store',
       'Cosmetics Sh

### One-hot encoding

One-hot encoding converts each venue's category into a set of 0/1 indicator columns, one per category, which our algorithms can analyze easily.
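
To make the idea concrete, here is a minimal pure-Python sketch of one-hot encoding on a tiny, made-up list of categories. The helper `one_hot` is hypothetical; the notebook itself relies on `pd.get_dummies` for the same purpose.

```python
def one_hot(categories):
    """Return (column_names, rows); each row has a single 1 marking its category."""
    columns = sorted(set(categories))
    index = {c: i for i, c in enumerate(columns)}
    rows = []
    for c in categories:
        row = [0] * len(columns)
        row[index[c]] = 1
        rows.append(row)
    return columns, rows

# Four venues falling into three categories
columns, rows = one_hot(["Café", "Bakery", "Café", "Hotel"])
```

Averaging such rows per district gives the share of venues in each category, which is what the `groupby('District').mean()` step computes.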

In [23]:
## KHOBAR
# one hot encoding
khobar_onehot = pd.get_dummies(khobar_venues[['Venue Category']], prefix="", prefix_sep="")

# add District column back to dataframe
khobar_onehot['District'] = khobar_venues['District'] 

# move District column to the first column
khobar_onehot = khobar_onehot[ ['District'] + [ col for col in khobar_onehot.columns if col != 'District' ] ]


## DAMMAM
# one hot encoding
dammam_onehot = pd.get_dummies(dammam_venues[['Venue Category']], prefix="", prefix_sep="")

# add District column back to dataframe
dammam_onehot['District'] = dammam_venues['District'] 

# move District column to the first column
dammam_onehot = dammam_onehot[ ['District'] + [ col for col in dammam_onehot.columns if col != 'District' ] ]



khobar_onehot.head()

Unnamed: 0,District,ATM,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,...,Theme Restaurant,Tourist Information Center,Track,Trail,Turkish Restaurant,Vape Store,Video Game Store,Watch Shop,Waterfront,Yoga Studio
0,At Tahliyah Dist.,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,At Tahliyah Dist.,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,At Tahliyah Dist.,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,At Tahliyah Dist.,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ibn Sina Dist.,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
khobar_grouped = khobar_onehot.groupby('District').mean().reset_index()
dammam_grouped = dammam_onehot.groupby('District').mean().reset_index()

khobar_grouped

Unnamed: 0,District,ATM,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,...,Theme Restaurant,Tourist Information Center,Track,Trail,Turkish Restaurant,Vape Store,Video Game Store,Watch Shop,Waterfront,Yoga Studio
0,Al Amwaj Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Al Andalus Dist.,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,...,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Al Aqiq Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Al Aqrabiyah Dist.,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.030303,...,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0
4,Al Bahar Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Al Bandariyah Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.075,0.0,0.05
6,Al Buhayrah Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0
7,Al Bustan Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Al Hada Dist.,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Al Hamra Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0


### Getting most common venues in each district

The venue data gathered up to this point is just a list. We can rank the venue categories within each district by how frequently they occur to get an idea of what kind of place the district is.

In [25]:
#Function to return most common venues in each District
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
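
Note that sorting a full row always yields `num_top_venues` labels, even for a district with fewer distinct categories. At Tahliyah Dist., for example, has only four venues in the data above, so its remaining slots are filled with categories whose frequency is zero. A hedged alternative that drops those entries is sketched below. `top_categories` and the sample frequencies are illustrative assumptions.

```python
def top_categories(freqs, n):
    """Return up to n category names with non-zero frequency, most frequent first."""
    nonzero = [(name, f) for name, f in freqs.items() if f > 0]
    # Sort by descending frequency, then by name so ties are ordered deterministically
    nonzero.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in nonzero[:n]]

# Illustrative frequencies for a district with four venues in four categories
freqs = {"Soccer Field": 0.25, "Garden": 0.25, "Breakfast Spot": 0.25,
         "Harbor / Marina": 0.25, "Yoga Studio": 0.0, "Farm": 0.0}
top = top_categories(freqs, 10)
```

The trade-off is that rows would no longer have a fixed length, so the fixed ten-column table used below would need padding with empty values.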

In [26]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # beyond 1st/2nd/3rd, fall back to the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
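
The suffix lookup above is correct for the ten columns used here, but it would label a 21st column as "21th". If `num_top_venues` were ever raised that far, a general ordinal helper such as the following sketch could replace it. The function name `ordinal` is an assumption.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["District"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
```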

### Top Venues in Khobar's Districts

In [27]:
# create a new dataframe
khobar_venues_sorted = pd.DataFrame(columns=columns)
khobar_venues_sorted['District'] = khobar_grouped['District']

for ind in np.arange(khobar_grouped.shape[0]):
    khobar_venues_sorted.iloc[ind, 1:] = return_most_common_venues(khobar_grouped.iloc[ind, :], num_top_venues)

khobar_venues_sorted

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Al Amwaj Dist.,Food Truck,Farm,Waterfront,Food Service,Food & Drink Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Service
1,Al Andalus Dist.,Coffee Shop,Café,Restaurant,Bakery,Indian Restaurant,Lebanese Restaurant,Butcher,Breakfast Spot,Falafel Restaurant,Plaza
2,Al Aqiq Dist.,Resort,Yoga Studio,Ethiopian Restaurant,Food & Drink Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Service
3,Al Aqrabiyah Dist.,Breakfast Spot,Coffee Shop,Shawarma Place,Tea Room,Bakery,Fast Food Restaurant,Indian Restaurant,Ice Cream Shop,Sandwich Place,Juice Bar
4,Al Bahar Dist.,Ice Cream Shop,Bay,Café,Yoga Studio,Event Service,Food & Drink Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Farm
5,Al Bandariyah Dist.,Coffee Shop,Watch Shop,Boutique,Gym / Fitness Center,Food Truck,Dessert Shop,Restaurant,Seafood Restaurant,Yoga Studio,Track
6,Al Buhayrah Dist.,Lake,Trail,Yoga Studio,Ethiopian Restaurant,Food & Drink Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
7,Al Bustan Dist.,Coffee Shop,Café,Furniture / Home Store,Electronics Store,Hotel,Indian Restaurant,Restaurant,Breakfast Spot,Eastern European Restaurant,Coffee Roaster
8,Al Hada Dist.,Coffee Shop,Plaza,Burger Joint,Asian Restaurant,Kids Store,Gym / Fitness Center,Lawyer,Park,Mediterranean Restaurant,Sandwich Place
9,Al Hamra Dist.,Waterfront,Hotel,Shopping Mall,Beach,Entertainment Service,Lake,Event Service,Yoga Studio,Farm,Food & Drink Shop


### Top Venues in Dammam's Districts

In [28]:
# create a new dataframe
dammam_venues_sorted = pd.DataFrame(columns=columns)
dammam_venues_sorted['District'] = dammam_grouped['District']

for ind in np.arange(dammam_grouped.shape[0]):
    dammam_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dammam_grouped.iloc[ind, :], num_top_venues)

dammam_venues_sorted

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1St Industrial Dist.,Furniture / Home Store,Construction & Landscaping,Outdoor Gym,Yemeni Restaurant,Farm,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop
1,2Nd Industrial City,Fast Food Restaurant,Restaurant,Fried Chicken Joint,Middle Eastern Restaurant,Mountain,Discount Store,Fish & Chips Shop,Department Store,Food Court,Food & Drink Shop
2,Ad Danah Dist.,Middle Eastern Restaurant,Café,Pizza Place,Restaurant,Food Court,Fast Food Restaurant,Candy Store,Steakhouse,Supermarket,Falafel Restaurant
3,Ad Dawasir Dist.,Jewelry Store,Men's Store,Clothing Store,Shop & Service,Market,ATM,Lawyer,Convenience Store,Coffee Shop,Restaurant
4,Al Adamah Dist.,Coffee Shop,Hotel,Pool Hall,Diner,Restaurant,Bakery,Yemeni Restaurant,Falafel Restaurant,Food & Drink Shop,Food
...,...,...,...,...,...,...,...,...,...,...,...
69,Madinat Al Ummal Dist.,Café,Coffee Shop,Indian Restaurant,Thai Restaurant,Middle Eastern Restaurant,Cafeteria,Furniture / Home Store,Beach,Burger Joint,Pet Store
70,Prince Muhammed Ibn Saud Dist.,Café,Breakfast Spot,Department Store,Gym / Fitness Center,Seafood Restaurant,Soccer Stadium,Stadium,Pakistani Restaurant,Shopping Mall,Juice Bar
71,Qasr Al Khalij Dist.,Middle Eastern Restaurant,Café,Palace,Yemeni Restaurant,Fast Food Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market
72,Taibah Dist.,Insurance Office,Coffee Shop,Juice Bar,Tea Room,Donut Shop,Smoke Shop,Yemeni Restaurant,Farm,Food Court,Food & Drink Shop


## K-means Clustering

We can utilize the K-means Clustering machine learning algorithm to group the different districts into clusters based on their most common venues.

We will use 5 clusters; going any higher leads to ineffective clustering. You can rerun this notebook with a different number of clusters k and observe the changes on the maps below.
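
One common way to sanity-check a choice of k is to compare the within-cluster sum of squares (inertia) across several values and look for the point where improvements level off. The sketch below implements a basic Lloyd's k-means in pure Python on made-up 2-D points, purely to illustrate that comparison. The evenly spaced initialisation and the toy data are assumptions, and the notebook itself uses scikit-learn's `KMeans`, whose fitted `inertia_` attribute reports the same quantity.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_inertia(points, k, iterations=50):
    """Run a basic Lloyd's k-means and return the within-cluster sum of squares."""
    # Naive deterministic initialisation: k points at evenly spaced indices
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    for _step in range(iterations):
        # Assignment step: attach each point to its nearest center
        clusters = [[] for _j in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                new_centers.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

# Toy data: three well-separated groups of three points each
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
inertia_by_k = {k: kmeans_inertia(points, k) for k in (1, 2, 3)}
```

On this toy data the inertia drops sharply up to k = 3, the true number of groups. Repeating the comparison on `khobar_grouped_clustering` and `dammam_grouped_clustering` would show whether 5 is a reasonable choice for the real data.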

In [29]:
khobar_data = khobar_data.rename(columns={"name_en": "District"})
dammam_data = dammam_data.rename(columns={"name_en": "District"})

### Khobar Clusters

In [30]:
# set number of clusters
kclusters = 5

khobar_grouped_clustering = khobar_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(khobar_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

# add clustering labels
khobar_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)

khobar_merged = khobar_data

# join khobar_venues_sorted onto khobar_data to attach the cluster label and top venues to each District
khobar_merged = khobar_merged.join(khobar_venues_sorted.set_index('District'), on='District')

khobar_merged.dropna(axis=0, inplace = True)

khobar_merged.head() # check the last columns!

Unnamed: 0,District,district_id,center,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,At Tahliyah Dist.,10500031001,"[26.17781863856502, 50.19358769394619]",1.0,Breakfast Spot,Harbor / Marina,Garden,Soccer Field,Yoga Studio,Event Service,Flower Shop,Fast Food Restaurant,Farmers Market,Farm
1,Ibn Sina Dist.,10500031002,"[26.241028340781252, 50.201773003593736]",1.0,Auto Workshop,Hookah Bar,Smoke Shop,Vape Store,Shipping Store,Event Service,Flower Shop,Fast Food Restaurant,Farmers Market,Farm
2,Al Hizam Al Akhdar Dist.,10500031003,"[26.305178597317074, 50.20295822975609]",1.0,Coffee Shop,Burger Joint,Juice Bar,Café,Bakery,Shawarma Place,Seafood Restaurant,Salad Place,Hotel,Organic Grocery
3,Sinaiyah Ath Thuqbah Dist.,10500031004,"[26.253792497222218, 50.19638499861113]",1.0,Auto Workshop,Hookah Bar,Smoke Shop,Motorcycle Shop,Tourist Information Center,Juice Bar,Falafel Restaurant,Flower Shop,Fast Food Restaurant,Farmers Market
4,At Taawun Dist.,10500031005,"[26.226340009404773, 50.18608636999999]",0.0,Hookah Bar,Restaurant,Spa,Stables,Farmers Market,Café,Auto Garage,Athletics & Sports,Entertainment Service,Ethiopian Restaurant


### Dammam Clusters

In [31]:
# set number of clusters
kclusters = 5

dammam_grouped_clustering = dammam_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dammam_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

# add clustering labels
dammam_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)

dammam_merged = dammam_data

# join dammam_venues_sorted onto dammam_data to attach the cluster label and top venues to each District
dammam_merged = dammam_merged.join(dammam_venues_sorted.set_index('District'), on='District')

dammam_merged.dropna(axis=0, inplace = True)

dammam_merged.head() # check the last columns!

Unnamed: 0,District,district_id,center,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,An Nasriyah Dist.,10500013002,"[26.424016825090902, 50.12204671218182]",0.0,Furniture / Home Store,Cricket Ground,Pastry Shop,Shopping Mall,Chocolate Shop,Park,Event Space,Breakfast Spot,Intersection,Fish & Chips Shop
2,1St Industrial Dist.,10500013003,"[26.39701435782051, 50.14309465038462]",0.0,Furniture / Home Store,Construction & Landscaping,Outdoor Gym,Yemeni Restaurant,Farm,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop
3,Al Fanar Dist.,10500013004,"[26.40588090352941, 50.19358098470589]",0.0,IT Services,Burger Joint,Yemeni Restaurant,Farm,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop
4,Al Athir Dist.,10500013005,"[26.434276787415733, 50.06036651483147]",0.0,Food Truck,Middle Eastern Restaurant,Juice Bar,Café,Yemeni Restaurant,Fast Food Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop
5,Al Jalawiyah Dist.,10500013006,"[26.43651751632076, 50.07545844943395]",0.0,Breakfast Spot,Outdoor Gym,Cupcake Shop,Fruit & Vegetable Store,Soccer Stadium,Campground,Supermarket,Falafel Restaurant,Smoothie Shop,Bakery
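
One way to sanity-check a value such as `kclusters = 5` is to compare silhouette scores across several candidate values. The sketch below is only an illustration under stated assumptions: it uses synthetic points (`toy_features`, a hypothetical stand-in) rather than `dammam_grouped_clustering`, and the range of `k` values is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for a grouped district table: three well-separated blobs
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
toy_features = np.vstack(
    [center + rng.normal(scale=0.5, size=(20, 2)) for center in blob_centers]
)

# Fit k-means for several candidate k values and record the silhouette score
silhouette_by_k = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(toy_features)
    silhouette_by_k[k] = silhouette_score(toy_features, labels)

# The k with the highest silhouette score is a reasonable first candidate
best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print(best_k, silhouette_by_k)
```

Applied to the real data, a cluster that contains a single district is usually a hint that a smaller `k` may describe the city just as well.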


In [32]:
#Adding Latitude and Longitude of each District's Center
khobar_merged["Latitude"] = [ x[0] for x in khobar_merged["center"].tolist() ]
khobar_merged["Longitude"] = [ x[1] for x in khobar_merged["center"].tolist() ]

dammam_merged["Latitude"] = [ x[0] for x in dammam_merged["center"].tolist() ]
dammam_merged["Longitude"] = [ x[1] for x in dammam_merged["center"].tolist() ]

khobar_merged

Unnamed: 0,District,district_id,center,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
0,At Tahliyah Dist.,10500031001,"[26.17781863856502, 50.19358769394619]",1.0,Breakfast Spot,Harbor / Marina,Garden,Soccer Field,Yoga Studio,Event Service,Flower Shop,Fast Food Restaurant,Farmers Market,Farm,26.177819,50.193588
1,Ibn Sina Dist.,10500031002,"[26.241028340781252, 50.201773003593736]",1.0,Auto Workshop,Hookah Bar,Smoke Shop,Vape Store,Shipping Store,Event Service,Flower Shop,Fast Food Restaurant,Farmers Market,Farm,26.241028,50.201773
2,Al Hizam Al Akhdar Dist.,10500031003,"[26.305178597317074, 50.20295822975609]",1.0,Coffee Shop,Burger Joint,Juice Bar,Café,Bakery,Shawarma Place,Seafood Restaurant,Salad Place,Hotel,Organic Grocery,26.305179,50.202958
3,Sinaiyah Ath Thuqbah Dist.,10500031004,"[26.253792497222218, 50.19638499861113]",1.0,Auto Workshop,Hookah Bar,Smoke Shop,Motorcycle Shop,Tourist Information Center,Juice Bar,Falafel Restaurant,Flower Shop,Fast Food Restaurant,Farmers Market,26.253792,50.196385
4,At Taawun Dist.,10500031005,"[26.226340009404773, 50.18608636999999]",0.0,Hookah Bar,Restaurant,Spa,Stables,Farmers Market,Café,Auto Garage,Athletics & Sports,Entertainment Service,Ethiopian Restaurant,26.22634,50.186086
5,Ar Rakah Al Janubiyah Dist.,10500031006,"[26.350680400337087, 50.203745644269645]",1.0,Coffee Shop,Café,Gym,Hookah Bar,Stables,Restaurant,Campground,Flower Shop,Breakfast Spot,Photography Studio,26.35068,50.203746
6,Al Khubar Ash Shamaliyah Dist.,10500031007,"[26.29401644211008, 50.214117997064186]",1.0,Coffee Shop,Middle Eastern Restaurant,Tailor Shop,Café,Pakistani Restaurant,Seafood Restaurant,Spa,Burger Joint,Bakery,Furniture / Home Store,26.294016,50.214118
7,Madinat Al Ummal Dist.,10500031008,"[26.294228553913037, 50.205489442898546]",1.0,Seafood Restaurant,Coffee Shop,Clothing Store,Fast Food Restaurant,Flower Shop,Pastry Shop,Café,Stadium,Print Shop,Athletics & Sports,26.294229,50.205489
8,Al Aqrabiyah Dist.,10500031009,"[26.297885457971017, 50.189789413188414]",1.0,Breakfast Spot,Coffee Shop,Shawarma Place,Tea Room,Bakery,Fast Food Restaurant,Indian Restaurant,Ice Cream Shop,Sandwich Place,Juice Bar,26.297885,50.189789
9,Al Khubar Al Janubiyah Dist.,10500031010,"[26.27328286536233, 50.20615011927536]",1.0,Bakery,Middle Eastern Restaurant,Fried Chicken Joint,Pizza Place,Italian Restaurant,Coffee Shop,Clothing Store,Sandwich Place,Market,Furniture / Home Store,26.273283,50.20615
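
The list comprehensions above assume that every `center` value is already a Python list. If the dataframe is read back from a CSV file, the column may arrive as text such as `"[26.17, 50.19]"` instead. The following sketch shows one way to handle both cases; `toy_merged` and `parse_center` are hypothetical names used only for this illustration.

```python
import ast
import pandas as pd

# Two rows in the shape of khobar_merged: one center stored as text, one as a list
toy_merged = pd.DataFrame({
    "District": ["At Tahliyah Dist.", "Ibn Sina Dist."],
    "center": [
        "[26.17781863856502, 50.19358769394619]",
        [26.241028340781252, 50.201773003593736],
    ],
})

def parse_center(value):
    """Return [latitude, longitude] whether value is a list or its string form."""
    return ast.literal_eval(value) if isinstance(value, str) else list(value)

centers = toy_merged["center"].apply(parse_center)
toy_merged["Latitude"] = centers.apply(lambda c: c[0])
toy_merged["Longitude"] = centers.apply(lambda c: c[1])
print(toy_merged[["District", "Latitude", "Longitude"]])
```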


### Plotting Clusters and listing the districts in each cluster

Let's plot the clusters obtained from our K-means clustering on a Folium map. This gives a quick visual impression of how the clusters are distributed across each city.

What do you think each cluster represents? For example, the districts listed for Al Khobar's Cluster 0 all have Hookah Bar as their most common venue. What about the rest?
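
A quick way to get a first answer is to look at the most frequent top venue within each cluster. This is a minimal sketch on a hand-made table (`toy_merged` is hypothetical and holds only a few rows in the shape of `khobar_merged`); it is not a substitute for reading the full listings.

```python
import pandas as pd

# A few rows in the shape of khobar_merged
toy_merged = pd.DataFrame({
    "District": [
        "At Taawun Dist.", "Al Jawharah Dist.", "Ibn Sina Dist.",
        "Al Hizam Al Akhdar Dist.", "Ar Rakah Al Janubiyah Dist.",
    ],
    "Cluster Label": [0, 0, 1, 1, 1],
    "1st Most Common Venue": [
        "Hookah Bar", "Hookah Bar", "Auto Workshop", "Coffee Shop", "Coffee Shop",
    ],
})

# For each cluster, pick the venue category that is most often ranked first
cluster_summary = (
    toy_merged.groupby("Cluster Label")["1st Most Common Venue"]
    .agg(lambda venues: venues.value_counts().idxmax())
)
print(cluster_summary)
```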

### Al Khobar

In [33]:
# create map
map_clusters = folium.Map(location=[26.2172,50.1971], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per district, colored by its cluster label
for lat, lon, poi, cluster in zip(khobar_merged['Latitude'], khobar_merged['Longitude'], khobar_merged['District'], khobar_merged['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [34]:
# Cluster 0:

khobar_merged.loc[khobar_merged['Cluster Label'] == 0, khobar_merged.columns[[0] + list(range(4, khobar_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
4,At Taawun Dist.,Hookah Bar,Restaurant,Spa,Stables,Farmers Market,Café,Auto Garage,Athletics & Sports,Entertainment Service,Ethiopian Restaurant,26.22634,50.186086
13,Al Jawharah Dist.,Hookah Bar,Gym / Fitness Center,Convenience Store,Hotel,Herbs & Spices Store,Entertainment Service,Flower Shop,Fast Food Restaurant,Farmers Market,Farm,26.330016,50.195725
33,Ash Sheraa Dist.,Hookah Bar,Soccer Field,Farm,History Museum,Airport Terminal,Event Service,Food Service,Food & Drink Shop,Flower Shop,Fast Food Restaurant,26.164534,50.148142
36,As Sawari Dist.,Hookah Bar,Waterfront,Food Service,Food & Drink Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Service,26.183197,50.153691


In [35]:
# Cluster 1

khobar_merged.loc[khobar_merged['Cluster Label'] == 1, khobar_merged.columns[[0] + list(range(4, khobar_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
0,At Tahliyah Dist.,Breakfast Spot,Harbor / Marina,Garden,Soccer Field,Yoga Studio,Event Service,Flower Shop,Fast Food Restaurant,Farmers Market,Farm,26.177819,50.193588
1,Ibn Sina Dist.,Auto Workshop,Hookah Bar,Smoke Shop,Vape Store,Shipping Store,Event Service,Flower Shop,Fast Food Restaurant,Farmers Market,Farm,26.241028,50.201773
2,Al Hizam Al Akhdar Dist.,Coffee Shop,Burger Joint,Juice Bar,Café,Bakery,Shawarma Place,Seafood Restaurant,Salad Place,Hotel,Organic Grocery,26.305179,50.202958
3,Sinaiyah Ath Thuqbah Dist.,Auto Workshop,Hookah Bar,Smoke Shop,Motorcycle Shop,Tourist Information Center,Juice Bar,Falafel Restaurant,Flower Shop,Fast Food Restaurant,Farmers Market,26.253792,50.196385
5,Ar Rakah Al Janubiyah Dist.,Coffee Shop,Café,Gym,Hookah Bar,Stables,Restaurant,Campground,Flower Shop,Breakfast Spot,Photography Studio,26.35068,50.203746
6,Al Khubar Ash Shamaliyah Dist.,Coffee Shop,Middle Eastern Restaurant,Tailor Shop,Café,Pakistani Restaurant,Seafood Restaurant,Spa,Burger Joint,Bakery,Furniture / Home Store,26.294016,50.214118
7,Madinat Al Ummal Dist.,Seafood Restaurant,Coffee Shop,Clothing Store,Fast Food Restaurant,Flower Shop,Pastry Shop,Café,Stadium,Print Shop,Athletics & Sports,26.294229,50.205489
8,Al Aqrabiyah Dist.,Breakfast Spot,Coffee Shop,Shawarma Place,Tea Room,Bakery,Fast Food Restaurant,Indian Restaurant,Ice Cream Shop,Sandwich Place,Juice Bar,26.297885,50.189789
9,Al Khubar Al Janubiyah Dist.,Bakery,Middle Eastern Restaurant,Fried Chicken Joint,Pizza Place,Italian Restaurant,Coffee Shop,Clothing Store,Sandwich Place,Market,Furniture / Home Store,26.273283,50.20615
10,Ar Rawabi Dist.,Pool,Coffee Shop,Gym / Fitness Center,Gym,Movie Theater,Shipping Store,Café,Yoga Studio,Ethiopian Restaurant,Farmers Market,26.3332,50.206892


In [36]:
# Cluster 2

khobar_merged.loc[khobar_merged['Cluster Label'] == 2, khobar_merged.columns[[0] + list(range(4, khobar_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
31,Ishbiliya Dist.,Beach,French Restaurant,Food Truck,Food Service,Food & Drink Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,26.191814,50.215393


In [37]:
# Cluster 3

khobar_merged.loc[khobar_merged['Cluster Label'] == 3, khobar_merged.columns[[0] + list(range(4, khobar_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
34,Al Kawthar Dist.,Farm,Yoga Studio,Waterfront,Food Service,Food & Drink Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Service,26.152318,50.12429
39,Al Amwaj Dist.,Food Truck,Farm,Waterfront,Food Service,Food & Drink Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Service,26.143114,50.145837


In [38]:
# Cluster 4

khobar_merged.loc[khobar_merged['Cluster Label'] == 4, khobar_merged.columns[[0] + list(range(4, khobar_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
37,Al Aqiq Dist.,Resort,Yoga Studio,Ethiopian Restaurant,Food & Drink Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Service,26.095232,50.145751


### Dammam

In [39]:
# create map, centered on the mean of the Dammam district centers
map_clusters = folium.Map(location=[dammam_merged['Latitude'].mean(), dammam_merged['Longitude'].mean()], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per district, colored by its cluster label
for lat, lon, poi, cluster in zip(dammam_merged['Latitude'], dammam_merged['Longitude'], dammam_merged['District'], dammam_merged['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [40]:
# Cluster 0

dammam_merged.loc[dammam_merged['Cluster Label'] == 0, dammam_merged.columns[[0] + list(range(4, dammam_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
1,An Nasriyah Dist.,Furniture / Home Store,Cricket Ground,Pastry Shop,Shopping Mall,Chocolate Shop,Park,Event Space,Breakfast Spot,Intersection,Fish & Chips Shop,26.424017,50.122047
2,1St Industrial Dist.,Furniture / Home Store,Construction & Landscaping,Outdoor Gym,Yemeni Restaurant,Farm,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,26.397014,50.143095
3,Al Fanar Dist.,IT Services,Burger Joint,Yemeni Restaurant,Farm,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,26.405881,50.193581
4,Al Athir Dist.,Food Truck,Middle Eastern Restaurant,Juice Bar,Café,Yemeni Restaurant,Fast Food Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop,26.434277,50.060367
5,Al Jalawiyah Dist.,Breakfast Spot,Outdoor Gym,Cupcake Shop,Fruit & Vegetable Store,Soccer Stadium,Campground,Supermarket,Falafel Restaurant,Smoothie Shop,Bakery,26.436518,50.075458
...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,King Abdul Aziz Seaport Dist.,Bakery,Port,Middle Eastern Restaurant,Harbor / Marina,Yemeni Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,26.473577,50.193578
73,Al Amal Dist.,Soccer Field,Market,Frozen Yogurt Shop,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,26.343128,50.015997
78,Al Fursan Dist.,Bookstore,Resort,Health & Beauty Service,Park,Falafel Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,26.352721,49.960809
79,Al Maha Dist.,Farm,Yemeni Restaurant,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,26.485221,49.933393


In [41]:
# Cluster 1

dammam_merged.loc[dammam_merged['Cluster Label'] == 1, dammam_merged.columns[[0] + list(range(4, dammam_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
75,Al Hadabah Dist.,Campground,Yemeni Restaurant,Cricket Ground,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,26.334186,49.930036
76,Al Matar Dist.,Campground,Trail,Yemeni Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,26.380527,49.938546
77,Al Amanah Dist.,Lounge,Campground,Yemeni Restaurant,Fried Chicken Joint,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,26.378152,49.969845


In [42]:
# Cluster 2

dammam_merged.loc[dammam_merged['Cluster Label'] == 2, dammam_merged.columns[[0] + list(range(4, dammam_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
74,Ash Sharq Dist.,Bakery,Yemeni Restaurant,Frozen Yogurt Shop,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,26.365736,49.968355


In [43]:
# Cluster 3

dammam_merged.loc[dammam_merged['Cluster Label'] == 3, dammam_merged.columns[[0] + list(range(4, dammam_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
19,Dahiyat Al Malik Fahd Dist.,Music Venue,Yemeni Restaurant,Farm,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,26.430747,49.998841


In [44]:
# Cluster 4

dammam_merged.loc[dammam_merged['Cluster Label'] == 4, dammam_merged.columns[[0] + list(range(4, dammam_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
21,As Sinaiyah Dist.,Auto Garage,Palace,Falafel Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,Fast Food Restaurant,26.450579,50.018045
42,Al Kuthriah Dist.,Auto Workshop,Auto Garage,Yemeni Restaurant,Fast Food Restaurant,Fried Chicken Joint,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,26.447342,50.049518
70,Al Khalidiyah Al Janubiyah Dist.,Auto Garage,Café,Auto Workshop,Yemeni Restaurant,Fast Food Restaurant,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,26.403955,50.166384


## Recommending Districts, Locations and Clusters

The final stage is to recommend districts, clusters and possible locations for opening a given kind of business.

## Recommending Districts for Businesses

A district is recommended if similar venues are numerous and thriving there. Since Foursquare returns popular venues by default, we assume that the list of venues represents the popular venues in that area.

Hence, we can simply sort the districts in descending order of venues in the chosen category and present the results as a top 10 list.

We can then output a map showing the recommended districts and their boundaries.
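
The ranking step can be sketched on a small table. Everything here is hypothetical: `toy_grouped` stands in for `khobar_grouped` or `dammam_grouped`, the district names and values are invented for illustration, and dropping districts with a value of zero is an extra step that the notebook's function does not perform.

```python
import pandas as pd

# Hypothetical per-district values for two venue categories
toy_grouped = pd.DataFrame({
    "District": ["A Dist.", "B Dist.", "C Dist.", "D Dist."],
    "Jewelry Store": [0.00, 0.10, 0.05, 0.00],
    "Coffee Shop": [0.30, 0.00, 0.20, 0.10],
})

def top_districts(grouped, venue_category, n=10):
    """Return up to n districts ranked by the chosen category, highest first."""
    ranked = grouped.sort_values(venue_category, ascending=False)
    # Assumption: a district without any venue of this category is not a recommendation
    ranked = ranked[ranked[venue_category] > 0]
    return ranked["District"].head(n).tolist()

print(top_districts(toy_grouped, "Jewelry Store"))
print(top_districts(toy_grouped, "Coffee Shop", n=2))
```

Without the zero filter, a rare category still yields ten districts, most of which contain no venue of that kind.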

In [67]:
def get_district_recommendation(city, venue_category, folium_map):
    """Return the top 10 districts for a venue category and draw their boundaries on the map."""

    #pick the dataframes that belong to the requested city
    if city == "Khobar":
        grouped, xy = khobar_grouped, khobar_xy
    elif city == "Dammam":
        grouped, xy = dammam_grouped, dammam_xy
    else:
        raise ValueError(f"Unknown city: {city}")

    #sort districts by the venue category column, highest first
    top_districts = grouped.sort_values([venue_category], ascending=[False])

    #return top 10 districts
    top_districts_list = top_districts["District"].tolist()[:10]

    #add districts to map
    for district in top_districts_list:

        #the single row of the boundary table that describes this district
        row = xy.loc[xy["name_en"] == district]

        #Create district boundary GeoJSON
        feature = {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": row["boundaries"].item()},
            "properties": {
                "district_id": row["district_id"].item(),
                "city_id": row["city_id"].item(),
                "name_en": row["name_en"].item()}
            }

        geojson = {
            "type": "FeatureCollection",
            "features": [feature]
        }

        #Add district boundaries to map
        folium.GeoJson(geojson).add_to(folium_map)

    return top_districts_list

In [88]:
#Enter search criteria
CITY = "Dammam"
VENUE_CATEGORY = "Jewelry Store"

#Choose between Khobar and Dammam
#NOTE: DISTRICT is assigned in a later cell (In [78]) and is used here only to center the map
if CITY == "Khobar":

    recommendation_map = folium.Map(location = (((khobar_xy.loc[khobar_xy["name_en"] == DISTRICT, "longitude"].values)),((khobar_xy.loc[khobar_xy["name_en"] == DISTRICT, "latitude"].values))), zoom_start = 12)

if CITY == "Dammam":
    
    recommendation_map = folium.Map(location = (((dammam_xy.loc[dammam_xy["name_en"] == DISTRICT, "longitude"].values)),((dammam_xy.loc[dammam_xy["name_en"] == DISTRICT, "latitude"].values))), zoom_start = 12)

#Add title to Map
loc = VENUE_CATEGORY + " Locations in " + CITY

title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(loc)   

recommendation_map.get_root().html.add_child(folium.Element(title_html))

print(f"We recommend the following districts to open a {VENUE_CATEGORY} in {CITY}:")
get_district_recommendation(CITY, VENUE_CATEGORY, recommendation_map)

We recommend the following districts to open a Jewelry Store in Dammam:


['Ad Dawasir Dist.',
 'As Salam Dist.',
 'As Suq Dist.',
 '1St Industrial Dist.',
 'Ar Rabie Dist.',
 'As Safa Dist.',
 'Ar Rayyan Dist.',
 'Ar Rawdah Dist.',
 'Ar Rakah Ash Shamaliyah Dist.',
 'Ar Rabiyah Dist.']

In [90]:
recommendation_map

## Recommending Locations for Businesses

If similar venues already operate in an area, competition there is higher. Sometimes, however, we want to open near similar businesses to benefit from their footfall. To recommend possible locations, we will therefore do two things:

1) Show the user a heatmap of similar businesses.

2) Future Improvement: Take an input Coordinate, a Competition Importance Factor (CIF) and a Footfall Importance Factor (FIF), and return a Recommendation Factor (RF).
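
The notebook does not define how the Recommendation Factor would be computed. Purely as a hypothetical sketch of the idea, one could count similar venues (competition) and all venues (a rough footfall proxy) within a radius of the input coordinate and combine the two counts linearly. The formula, the radius, the function names and the sample coordinates below are all assumptions made for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def count_within(point, venues, radius_km):
    """Number of (lat, lon) venues within radius_km of point."""
    return sum(1 for lat, lon in venues if haversine_km(point[0], point[1], lat, lon) <= radius_km)

def recommendation_factor(point, similar_venues, all_venues, cif, fif, radius_km=1.0):
    """Hypothetical RF: footfall weighted by FIF minus competition weighted by CIF."""
    competition = count_within(point, similar_venues, radius_km)
    footfall = count_within(point, all_venues, radius_km)
    return fif * footfall - cif * competition

# Invented sample coordinates, used only to exercise the function
point = (26.3, 50.2)
similar = [(26.3, 50.2)]
everything = [(26.3, 50.2), (26.301, 50.2), (27.0, 50.2)]
print(recommendation_factor(point, similar, everything, cif=1.0, fif=1.0))
```

A higher CIF penalises crowded markets more strongly, while a higher FIF rewards busy areas. Any real scoring rule would need to be validated against actual business outcomes.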

In [77]:
#Function to get a Location Recommendation Map

def get_location_recommendation(city, district, venue_category, folium_map):
    """Draw the district boundary (unless district is "NA") and a heatmap of matching venues."""

    #pick the dataframes that belong to the requested city
    if city == "Khobar":
        venues, xy = khobar_venues, khobar_xy
    elif city == "Dammam":
        venues, xy = dammam_venues, dammam_xy
    else:
        raise ValueError(f"Unknown city: {city}")

    #keep only venues of the requested category
    venues_copy = venues[venues["Venue Category"] == venue_category].copy()

    if district != "NA":
        venues_copy = venues_copy[venues_copy["District"] == district].copy()

        #the single row of the boundary table that describes this district
        row = xy.loc[xy["name_en"] == district]

        #Create district boundary GeoJSON
        feature = {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": row["boundaries"].item()},
            "properties": {
                "district_id": row["district_id"].item(),
                "city_id": row["city_id"].item(),
                "name_en": row["name_en"].item()}
            }

        geojson = {
            "type": "FeatureCollection",
            "features": [feature]
        }

        #Add district boundaries to map
        folium.GeoJson(geojson).add_to(folium_map)

    venues_copy['count'] = 1

    #Add heatpoints to map where venues are located, weighted by venues per coordinate
    heat_data = venues_copy[['Venue Latitude', 'Venue Longitude', 'count']].groupby(['Venue Latitude', 'Venue Longitude']).sum().reset_index().values.tolist()
    HeatMap(data=heat_data, radius=20).add_to(folium_map)
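
The data passed to `HeatMap` is a list of `[latitude, longitude, weight]` rows, where the weight is the number of venues sharing a coordinate. The aggregation can be tried in isolation on a small table; `toy_venues` and its coordinates are invented for this sketch.

```python
import pandas as pd

# Three hypothetical venues, two of which share the same coordinate
toy_venues = pd.DataFrame({
    "Venue Latitude": [26.42, 26.42, 26.43],
    "Venue Longitude": [50.10, 50.10, 50.11],
    "Venue Category": ["Jewelry Store"] * 3,
})
toy_venues["count"] = 1

# One row per distinct coordinate, with the number of venues as the weight
heat_data = (
    toy_venues[["Venue Latitude", "Venue Longitude", "count"]]
    .groupby(["Venue Latitude", "Venue Longitude"])
    .sum()
    .reset_index()
    .values.tolist()
)
print(heat_data)
```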

Let's choose one of the recommended districts:

In [78]:
#Enter search criteria
DISTRICT = "Ad Dawasir Dist."

Now we can get our recommendation map:

In [79]:
#Choose between Khobar and Dammam
if CITY == "Khobar":

    recommendation_map = folium.Map(location = (((khobar_xy.loc[khobar_xy["name_en"] == DISTRICT, "longitude"].values)),((khobar_xy.loc[khobar_xy["name_en"] == DISTRICT, "latitude"].values))), zoom_start = 15)

if CITY == "Dammam":
    
    recommendation_map = folium.Map(location = (((dammam_xy.loc[dammam_xy["name_en"] == DISTRICT, "longitude"].values)),((dammam_xy.loc[dammam_xy["name_en"] == DISTRICT, "latitude"].values))), zoom_start = 15)

get_location_recommendation(CITY, DISTRICT, VENUE_CATEGORY, recommendation_map)

#Add title to Map
loc = DISTRICT

title_html = '''
             <h3 align="center" style="font-size:16px"><b>{}</b></h3>
             '''.format(loc)   

recommendation_map.get_root().html.add_child(folium.Element(title_html))

#Display map
recommendation_map
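
`folium.Map` expects `location` as `[latitude, longitude]`, and plain floats are easier to reason about than one-element arrays. The sketch below shows one way to look up a district's center as two scalars. `toy_xy`, `district_location` and the coordinates are hypothetical placeholders, and the sketch assumes that the `latitude` and `longitude` columns hold what their names say, which should be checked against `khobar_xy` and `dammam_xy`.

```python
import pandas as pd

# Placeholder coordinates, not the real district centers
toy_xy = pd.DataFrame({
    "name_en": ["Ad Dawasir Dist.", "As Salam Dist."],
    "latitude": [26.43, 26.41],
    "longitude": [50.10, 50.08],
})

def district_location(xy, district):
    """Return [latitude, longitude] of the first row matching the district name."""
    row = xy.loc[xy["name_en"] == district].iloc[0]
    return [float(row["latitude"]), float(row["longitude"])]

location = district_location(toy_xy, "Ad Dawasir Dist.")
print(location)
```

The resulting list can be passed directly as the `location` argument when creating the map.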

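## Recommending Clusters for Businesses

A cluster is recommended if the chosen venue category appears among the ten most common venues of at least one of its districts. The function below reports, for each rank column, the districts and cluster labels where the category was found.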



In [192]:
def get_cluster_recommendation(city, venue_category):
    """Print and return, per rank column, the [district index, cluster label] pairs that match venue_category."""

    #pick the dataframe that belongs to the requested city
    if city == "Khobar":
        venues_sorted = khobar_venues_sorted
    elif city == "Dammam":
        venues_sorted = dammam_venues_sorted
    else:
        raise ValueError(f"Unknown city: {city}")

    prefixes = ["1st","2nd","3rd","4th","5th","6th","7th","8th","9th","10th"]

    recommended_clusters = []

    for prefix in prefixes:

        column = prefix + " Most Common Venue"

        #match against the value of venue_category, not the literal text "venue_category"
        matches = venues_sorted["Cluster Label"][venues_sorted[column].str.contains(venue_category, regex=False)]

        clust = [list(a) for a in matches.items()]

        if clust:
            print(column)
            print(clust)
            recommended_clusters.append(clust)

    #nothing was found in any of the ten rank columns
    if not recommended_clusters:
        print("Apologies")

    return recommended_clusters

In [199]:
#Enter search criteria
CITY = "Khobar"
VENUE_CATEGORY = "Café"

recommended_clusters = get_cluster_recommendation(CITY, VENUE_CATEGORY)
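
To turn the per-rank matches into a single answer, one option is to count how often the category appears in the top-ten lists of each cluster. This is a minimal sketch using `DataFrame.melt`; `toy_sorted` and `cluster_counts` are hypothetical names, and the table holds only two rank columns for brevity.

```python
import pandas as pd

# A few rows in the shape of khobar_venues_sorted, with only two rank columns
toy_sorted = pd.DataFrame({
    "Cluster Label": [1, 1, 0],
    "District": ["Al Hizam Al Akhdar Dist.", "Ar Rakah Al Janubiyah Dist.", "At Taawun Dist."],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Hookah Bar"],
    "2nd Most Common Venue": ["Burger Joint", "Café", "Restaurant"],
})

def cluster_counts(venues_sorted, venue_category):
    """Count, per cluster label, how many rank cells equal venue_category."""
    rank_columns = [c for c in venues_sorted.columns if c.endswith("Most Common Venue")]
    # Reshape to one row per (district, rank) so a single comparison covers every rank
    long_form = venues_sorted.melt(
        id_vars=["Cluster Label", "District"],
        value_vars=rank_columns,
        var_name="Rank",
        value_name="Venue",
    )
    hits = long_form[long_form["Venue"] == venue_category]
    return hits["Cluster Label"].value_counts().to_dict()

print(cluster_counts(toy_sorted, "Café"))
print(cluster_counts(toy_sorted, "Coffee Shop"))
```

An empty dictionary means the category does not appear in any top-ten list, which corresponds to the "Apologies" case.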
