# The Battle of Neighborhoods - Coursera Capstone Project

This notebook is the capstone project for the IBM Data Science Professional Certificate offered on Coursera.

The goal of this notebook is to determine optimal opening locations for different kinds of businesses, focused on the twin cities of Al-Khobar and Dammam, Saudi Arabia.

The end result is a simple way to get recommendations on where to open a given type of business.

## Importing Dependencies

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import folium
import os
import time
import json
import requests
from copy import deepcopy
from dotenv import load_dotenv
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

## Creating GeoJSON for Saudi Arabia

To do any kind of analysis, we need to divide the map into cities and districts, define their boundaries, and mark the center point of each district.

Saudi Arabia can be neatly divided into provinces, cities and districts. We are particularly interested in Al-Khobar and Dammam, both of which are in the Eastern Province of Saudi Arabia.

Fortunately, a contributor on GitHub has already gathered the coordinate data, saving us the time of scraping it ourselves. However, we will have to build the GeoJSON ourselves.

### Importing coordinates JSON

Source: https://github.com/homaily/Saudi-Arabia-Regions-Cities-and-Districts

Let us import that into pandas.

In [2]:
with open(r"/home/mohammed/Desktop/Musaddiq/Coursera_Capstone/json/cities.json", 'r', encoding='utf8', errors='ignore') as file:
    cities = json.load(file)
    
with open(r"/home/mohammed/Desktop/Musaddiq/Coursera_Capstone/json/districts.json", 'r', encoding='utf8', errors='ignore') as file:
    districts = json.load(file)

Let's take a look at how the JSON is structured. 

We can see city_id and district_id serve as the primary keys.

In [3]:
cities

[{'city_id': 1,
  'region_id': 7,
  'name_ar': 'تبوك',
  'name_en': 'Tabuk',
  'center': [28.41463997, 36.53387003]},
 {'city_id': 2,
  'region_id': 7,
  'name_ar': 'نعمي',
  'name_en': "Na'mi",
  'center': [28.30507995, 35.74931003]},
 {'city_id': 3,
  'region_id': 1,
  'name_ar': 'الرياض',
  'name_en': 'Riyadh',
  'center': [24.69999996, 46.73333003]},
 {'city_id': 4,
  'region_id': 7,
  'name_ar': 'حميط',
  'name_en': 'Humayt',
  'center': [28.65152001, 35.38013]},
 {'city_id': 5,
  'region_id': 2,
  'name_ar': 'الطائف',
  'name_en': 'At Taif',
  'center': [21.26848005, 40.41667003]},
 {'city_id': 6,
  'region_id': 2,
  'name_ar': 'مكة المكرمة',
  'name_en': 'Makkah Al Mukarramah',
  'center': [21.42717994, 39.84349001]},
 {'city_id': 7,
  'region_id': 7,
  'name_ar': 'رجم الطيارة',
  'name_en': 'Rajm At Tayarah',
  'center': [29.60751999, 37.23556001]},
 {'city_id': 8,
  'region_id': 7,
  'name_ar': 'الثميد',
  'name_en': 'Ath Thumayd',
  'center': [29.93183995, 37.16726002]},
 {'c

In [4]:
districts[1] #showing only 1 record out of many thousands

{'district_id': 10100003002,
 'city_id': 3,
 'region_id': 1,
 'name_ar': 'حي النموذجية',
 'name_en': 'Al Namudhajiyah Dist.',
 'boundaries': [[[24.65018372, 46.70227584],
   [24.64939455, 46.7014039],
   [24.64915715, 46.70115918],
   [24.64892224, 46.70091159],
   [24.64868987, 46.70066116],
   [24.64857349, 46.70053129],
   [24.64846099, 46.70039739],
   [24.64835249, 46.70025959],
   [24.6482481, 46.70011803],
   [24.64817484, 46.70000666],
   [24.6481001, 46.69989647],
   [24.64616862, 46.69704741],
   [24.64689243, 46.693556],
   [24.64695517, 46.69322908],
   [24.64696023, 46.69320269],
   [24.6469764, 46.69308145],
   [24.64699883, 46.69296141],
   [24.64702746, 46.69284296],
   [24.64706219, 46.69272648],
   [24.6471029, 46.69261235],
   [24.64714948, 46.69250093],
   [24.64720176, 46.69239258],
   [24.64731083, 46.69223089],
   [24.64742364, 46.69207231],
   [24.64754013, 46.69191695],
   [24.64766023, 46.6917649],
   [24.64778384, 46.69161628],
   [24.64785942, 46.69151051],


### Getting city_id of Al Khobar and Dammam

In [5]:
for city in cities:
    if "Khobar" in city["name_en"]:
        print("The city_id of " + city["name_en"] + " is " + str(city["city_id"]) + ".")
    if "Dammam" in city["name_en"]:
        print("The city_id of " + city["name_en"] + " is " + str(city["city_id"]) + ".")

The city_id of Dammam is 13.
The city_id of Al Khobar is 31.


### Getting districts in Al Khobar and Dammam

Now we can get the districts that constitute Al Khobar and Dammam.
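
As a side note, the two steps above (finding the `city_id` values, then filtering districts) can be sketched without hardcoding the ids. The helper names `find_city_ids` and `group_districts_by_city` are our own, and the sample records are toy data that only mimic the shape of `cities.json` and `districts.json`.

```python
from collections import defaultdict

def find_city_ids(cities, keywords):
    """Map each keyword to the city_id of the first city whose English name contains it."""
    ids = {}
    for keyword in keywords:
        for city in cities:
            if keyword in city["name_en"]:
                ids[keyword] = city["city_id"]
                break
    return ids

def group_districts_by_city(districts, city_ids):
    """Collect the districts of the wanted cities in a single pass."""
    wanted = set(city_ids)
    grouped = defaultdict(list)
    for district in districts:
        if district["city_id"] in wanted:
            grouped[district["city_id"]].append(district)
    return dict(grouped)

# Toy records (hypothetical district names) shaped like the JSON entries.
sample_cities = [
    {"city_id": 13, "name_en": "Dammam"},
    {"city_id": 31, "name_en": "Al Khobar"},
    {"city_id": 3, "name_en": "Riyadh"},
]
sample_districts = [
    {"district_id": 1, "city_id": 31, "name_en": "A Dist."},
    {"district_id": 2, "city_id": 13, "name_en": "B Dist."},
    {"district_id": 3, "city_id": 3, "name_en": "C Dist."},
]

city_ids = find_city_ids(sample_cities, ["Khobar", "Dammam"])
by_city = group_districts_by_city(sample_districts, city_ids.values())
print(city_ids)
```

With the real data, `by_city[31]` and `by_city[13]` would play the role of `khobar_districts` and `dammam_districts`.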

In [6]:
khobar_districts = []
dammam_districts = []

for district in districts:
    if district["city_id"] == 31:
        khobar_districts.append(district)
        
for district in districts:
    if district["city_id"] == 13:
        dammam_districts.append(district)

### Converting into GeoJSON format

We will create a copy of the data so that we can convert the [latitude, longitude] coordinate pairs into the [longitude, latitude] order that GeoJSON expects.

For more information, see the "Position" section of https://macwright.com/2015/03/23/geojson-second-bite.html
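
To make the coordinate flip concrete, here is a minimal sketch that returns a new list rather than modifying the data in place, so no `deepcopy` is needed. The function names are our own. Unlike the notebook, which reverses only the first ring, this version flips every ring it is given.

```python
def to_geojson_ring(ring):
    """Return a new ring with each [lat, lon] pair flipped to [lon, lat]."""
    return [[lon, lat] for lat, lon in ring]

def to_geojson_polygon(boundaries):
    """Flip every ring of a polygon; the input is left untouched."""
    return [to_geojson_ring(ring) for ring in boundaries]

# Two points from the At Tahliyah district, in the source [lat, lon] order.
boundaries_latlon = [[[26.18209534, 50.21638279], [26.18118287, 50.21611424]]]
boundaries_lonlat = to_geojson_polygon(boundaries_latlon)
print(boundaries_lonlat[0][0])  # [50.21638279, 26.18209534]
```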

In [7]:
khobar_districts_xy = deepcopy(khobar_districts)
dammam_districts_xy = deepcopy(dammam_districts)

Next, we reverse each boundary coordinate pair to match the GeoJSON order.

In [8]:
# flip each [lat, lon] pair in place to [lon, lat]
for district in khobar_districts_xy:
    for point in district["boundaries"][0]:
        point.reverse()

for district in dammam_districts_xy:
    for point in district["boundaries"][0]:
        point.reverse()

Let's also define the center point of each district.
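
The notebook approximates each center point by averaging the boundary vertices. That is usually close enough at district scale, but the mean is pulled towards edges that happen to have many vertices. As a hedged comparison, the sketch below contrasts it with the area-weighted centroid from the shoelace formula. It treats coordinates as planar, which is our simplifying assumption, and the function names are our own.

```python
def vertex_mean(ring):
    """Average of the ring's vertices (the approach used in this notebook)."""
    xs, ys = zip(*ring)
    return [sum(xs) / len(xs), sum(ys) / len(ys)]

def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon via the shoelace formula."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        return vertex_mean(ring)  # degenerate ring: fall back to the mean
    return [cx / (3 * area2), cy / (3 * area2)]

# A 4x4 square with extra vertices along its bottom edge.
ring = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [4, 4], [0, 4]]
print(vertex_mean(ring))       # y is pulled towards the densely sampled edge
print(polygon_centroid(ring))  # [2.0, 2.0]
```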

In [9]:
# The center point is approximated by the mean of the boundary vertices.
# In the *_xy copies each pair is [longitude, latitude], so the indices are swapped.
for district in khobar_districts_xy + dammam_districts_xy:
    district["center"] = [sum(x) / len(x) for x in zip(*district["boundaries"][0])]
    district["latitude"] = district["center"][1]
    district["longitude"] = district["center"][0]

# The original records keep the [latitude, longitude] order.
for district in khobar_districts + dammam_districts:
    district["center"] = [sum(x) / len(x) for x in zip(*district["boundaries"][0])]
    district["latitude"] = district["center"][0]
    district["longitude"] = district["center"][1]

Let's go ahead and put that into a pandas DataFrame.

In [10]:
khobar = pd.DataFrame(khobar_districts)
dammam = pd.DataFrame(dammam_districts)

khobar_xy = pd.DataFrame(khobar_districts_xy)
dammam_xy = pd.DataFrame(dammam_districts_xy)


khobar_xy

Unnamed: 0,district_id,city_id,region_id,name_ar,name_en,boundaries,center,latitude,longitude
0,10500031001,31,5,حي التحلية,At Tahliyah Dist.,"[[[50.21638279, 26.18209534], [50.21611424, 26...","[50.19358769394619, 26.17781863856502]",26.177819,50.193588
1,10500031002,31,5,حي ابن سيناء,Ibn Sina Dist.,"[[[50.21139022, 26.2507773], [50.21288113, 26....","[50.201773003593736, 26.241028340781252]",26.241028,50.201773
2,10500031003,31,5,حي الحزام الاخضر,Al Hizam Al Akhdar Dist.,"[[[50.19060022, 26.3089098], [50.19145503, 26....","[50.20295822975609, 26.305178597317074]",26.305179,50.202958
3,10500031004,31,5,حي صناعية الثقبة,Sinaiyah Ath Thuqbah Dist.,"[[[50.19749806, 26.24944191], [50.19719675, 26...","[50.19638499861113, 26.253792497222218]",26.253792,50.196385
4,10500031005,31,5,حي التعاون,At Taawun Dist.,"[[[50.19458339, 26.23354342], [50.1963495, 26....","[50.18608636999999, 26.226340009404773]",26.22634,50.186086
5,10500031006,31,5,حي الراكة الجنوبية,Ar Rakah Al Janubiyah Dist.,"[[[50.20035662, 26.33674744], [50.19986525, 26...","[50.203745644269645, 26.350680400337087]",26.35068,50.203746
6,10500031007,31,5,حي الخبر الشمالية,Al Khubar Ash Shamaliyah Dist.,"[[[50.21811139, 26.28045052], [50.21783621, 26...","[50.214117997064186, 26.29401644211008]",26.294016,50.214118
7,10500031008,31,5,حي مدينة العمال,Madinat Al Ummal Dist.,"[[[50.20487945, 26.28181475], [50.20483605, 26...","[50.205489442898546, 26.294228553913037]",26.294229,50.205489
8,10500031009,31,5,حي العقربية,Al Aqrabiyah Dist.,"[[[50.19759817, 26.28393879], [50.19757992, 26...","[50.189789413188414, 26.297885457971017]",26.297885,50.189789
9,10500031010,31,5,حي الخبر الجنوبية,Al Khubar Al Janubiyah Dist.,"[[[50.21598118, 26.26997671], [50.21521935, 26...","[50.20615011927536, 26.27328286536233]",26.273283,50.20615


Finally, we can parse all the above data into the GeoJSON format
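
Since the same conversion is needed for both cities, it can be wrapped in one helper. The sketch below is one possible shape for it; `districts_to_feature_collection` is our own name, and the sample ring reuses the first three At Tahliyah points, closed by repeating the first point for illustration (GeoJSON linear rings are expected to be closed).

```python
import json

def districts_to_feature_collection(districts):
    """Wrap district records (with [lon, lat] boundaries) in a GeoJSON FeatureCollection."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": d["boundaries"]},
            "properties": {
                "district_id": d["district_id"],
                "city_id": d["city_id"],
                "name_en": d["name_en"],
            },
        }
        for d in districts
    ]
    return {"type": "FeatureCollection", "features": features}

sample = [{
    "district_id": 10500031001,
    "city_id": 31,
    "name_en": "At Tahliyah Dist.",
    "boundaries": [[[50.21638279, 26.18209534], [50.21611424, 26.18118287],
                    [50.2156978, 26.18140407], [50.21638279, 26.18209534]]],
}]
collection = districts_to_feature_collection(sample)
print(len(json.dumps(collection)))  # size of the serialized GeoJSON string
```

Calling it once with `khobar_districts_xy` and once with `dammam_districts_xy` would produce the same structures as the two cells that follow.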

In [11]:
features = []

for district in range(len(khobar_districts_xy)):
    feature = {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": khobar_districts_xy[district]["boundaries"]},
            "properties": {
                "district_id": khobar_districts_xy[district]["district_id"],
                "city_id": khobar_districts_xy[district]["city_id"],
                "name_en": khobar_districts_xy[district]["name_en"]}
        }

    features.append(feature)
    
khobar_geojson = {
    "type": "FeatureCollection",
    "features": features
}

khobar_geojson = json.dumps(khobar_geojson)

print(json.dumps(json.loads(khobar_geojson), indent=2))

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              50.21638279,
              26.18209534
            ],
            [
              50.21611424,
              26.18118287
            ],
            [
              50.2156978,
              26.18140407
            ],
            [
              50.21392926,
              26.17970601
            ],
            [
              50.21352359,
              26.17971959
            ],
            [
              50.2118197,
              26.17803992
            ],
            [
              50.21144898,
              26.17732234
            ],
            [
              50.20759408,
              26.17361962
            ],
            [
              50.20861849,
              26.17277879
            ],
            [
              50.2061992,
              26.17056918
            ],
            [
     

In [12]:
features = []

for district in range(len(dammam_districts_xy)):
    feature = {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": dammam_districts_xy[district]["boundaries"]},
            "properties": {
                "district_id": dammam_districts_xy[district]["district_id"],
                "city_id": dammam_districts_xy[district]["city_id"],
                "name_en": dammam_districts_xy[district]["name_en"]}
        }

    features.append(feature)
    
dammam_geojson = {
    "type": "FeatureCollection",
    "features": features
}

dammam_geojson = json.dumps(dammam_geojson)

print(json.dumps(json.loads(dammam_geojson), indent=2))

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              50.13177952,
              26.42026955
            ],
            [
              50.13205218,
              26.42399558
            ],
            [
              50.13260661,
              26.43157151
            ],
            [
              50.13320911,
              26.43118517
            ],
            [
              50.13579565,
              26.42954304
            ],
            [
              50.13773151,
              26.42825795
            ],
            [
              50.13859498,
              26.42774527
            ],
            [
              50.13929657,
              26.42740346
            ],
            [
              50.14018704,
              26.42698971
            ],
            [
              50.14057188,
              26.42680773
            ],
            [
  

### Plotting districts and boundaries

In [13]:
khobar_data = khobar[["name_en", "district_id", "center"]]
dammam_data = dammam[["name_en", "district_id", "center"]]

In [14]:
# create a plain map
khobar_map = folium.Map(location=[26.2172,50.1971], zoom_start=12)

folium.GeoJson(khobar_geojson).add_to(khobar_map)

# display map
khobar_map

In [15]:
# create a plain map
dammam_map = folium.Map(location=[26.4207,50.0888], zoom_start=12)

folium.GeoJson(dammam_geojson).add_to(dammam_map)

'''
folium.Choropleth(
    geo_data=dammam_geojson,
    name='Dammam',
    data=dammam_data,
    columns=['district_id', 'randNumCol'],
    key_on='feature.properties.district_id',
    fill_color='YlGnBu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Musaddiq'
).add_to(dammam_map)
'''

# display map
dammam_map

## Using Foursquare API to retrieve popular venues in each district 

Having registered beforehand for the Foursquare developer program (https://developer.foursquare.com/), we can use the API to get a list of popular venues in each district.

But first, security. We will use the python-dotenv package to load our client ID and client secret from the environment, so that they never appear in the notebook itself.
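
One thing to watch for: if the `.env` file is missing, `os.getenv` returns `None` and the failure only shows up later as a confusing error. A minimal sketch of a fail-fast check is shown here; `read_credentials` is our own helper, and the placeholder values exist only so the example runs on its own.

```python
import os

def read_credentials(names=("CLIENT_ID", "CLIENT_SECRET")):
    """Return the named environment variables, raising if any is missing or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}

# Placeholder values for demonstration only; in the notebook these come from .env.
os.environ.setdefault("CLIENT_ID", "demo-id")
os.environ.setdefault("CLIENT_SECRET", "demo-secret")

creds = read_credentials()
# Print only the lengths, never the secrets themselves.
print({name: len(value) for name, value in creds.items()})
```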

In [16]:
# using python-dotenv to protect Foursquare credentials
%load_ext dotenv
%dotenv
import os

CLIENT_ID = os.getenv("CLIENT_ID") # your Foursquare ID
CLIENT_SECRET = os.getenv("CLIENT_SECRET") # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID SIZE: ' + str(len(CLIENT_ID)))
print('CLIENT_SECRET SIZE: ' + str(len(CLIENT_SECRET)))

Your credentials:
CLIENT_ID SIZE: 48
CLIENT_SECRET SIZE: 48


### Function to get nearby popular venues

This function loops over the districts, queries the API once per district center, and collects the nearby venues into a single DataFrame.
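
The parsing step is the fragile part: a venue without any category, or a response without groups, would raise an exception partway through the loop. The sketch below isolates that step in a helper of our own, `extract_venues`, and runs it on a hand-written payload. The `response` → `groups` → `items` layout is taken from the function in this notebook; the missing-category and empty-response cases are assumptions about what the API might return.

```python
def extract_venues(district, lat, lng, payload):
    """Flatten one explore-style response into rows, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            district, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-written payload; the second venue is hypothetical and has no category.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "حديقة التحليه",
               "location": {"lat": 26.179251, "lng": 50.195127},
               "categories": [{"name": "Garden"}]}},
    {"venue": {"name": "Unnamed venue",
               "location": {"lat": 26.178, "lng": 50.194},
               "categories": []}},
]}]}}

rows = extract_venues("At Tahliyah Dist.", 26.177819, 50.193588, fake_payload)
print(len(rows))  # 2
```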

In [17]:
def getNearbyVenues(
    names,
    latitudes,
    longitudes,
    radius=500,
    ):

    venues_list = []
    for (name, lat, lng) in zip(names, latitudes, longitudes):

        # create the API request URL

        url = \
            'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT,
            )

        # make the GET request

        results = requests.get(url).json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue

        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'],
            ) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list
                                 for item in venue_list])
    nearby_venues.columns = [
        'District',
        'District Latitude',
        'District Longitude',
        'Venue',
        'Venue Latitude',
        'Venue Longitude',
        'Venue Category',
        ]

    return nearby_venues


### Getting nearby venues

In [18]:
khobar_venues = getNearbyVenues(names=khobar['name_en'],
                                   latitudes=khobar['latitude'],
                                   longitudes=khobar['longitude']
                                  )

khobar_venues

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,At Tahliyah Dist.,26.177819,50.193588,ملعب اسكان تحليه المياه - الخبر,26.178656,50.193112,Soccer Field
1,At Tahliyah Dist.,26.177819,50.193588,حديقة التحليه,26.179251,50.195127,Garden
2,At Tahliyah Dist.,26.177819,50.193588,معجنات الناعورة,26.177397,50.195792,Breakfast Spot
3,At Tahliyah Dist.,26.177819,50.193588,اسكان تحلية المياة - الخبر,26.179223,50.195842,Harbor / Marina
4,Ibn Sina Dist.,26.241028,50.201773,Al-Seef Cafe & Restaurant (مقهى ومطعم السيف),26.245261,50.201800,Hookah Bar
...,...,...,...,...,...,...,...
819,Al Amwaj Dist.,26.143114,50.145837,Alsubaie farm,26.142485,50.142363,Farm
820,As Sadafah Dist.,26.368823,50.210278,CUE NINE billiard Club,26.365770,50.207950,Pool Hall
821,As Sadafah Dist.,26.368823,50.210278,استاد مدينة الأمير سعود بن جلوي الرياضية بالرا...,26.368982,50.205988,Soccer Stadium
822,As Sadafah Dist.,26.368823,50.210278,Cue Nine Billiard Q9 كيو ناين بليارد,26.365792,50.208045,Pool Hall


In [19]:
dammam_venues = getNearbyVenues(names=dammam['name_en'],
                                   latitudes=dammam['latitude'],
                                   longitudes=dammam['longitude']
                                  )

dammam_venues

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,An Nasriyah Dist.,26.424017,50.122047,Jawsq Festival Hall (قاعة الجوسق للإحتفالات),26.422153,50.118145,Event Space
1,An Nasriyah Dist.,26.424017,50.122047,بوفية السعادة,26.424839,50.118922,Breakfast Spot
2,An Nasriyah Dist.,26.424017,50.122047,حلويات جرير القباني patisserie jareer,26.422789,50.123262,Pastry Shop
3,An Nasriyah Dist.,26.424017,50.122047,Qatif Mall,26.423454,50.121995,Shopping Mall
4,An Nasriyah Dist.,26.424017,50.122047,Al Mira Centre,26.422722,50.120174,Furniture / Home Store
...,...,...,...,...,...,...,...
1076,Al Maha Dist.,26.485221,49.933393,Yousif Farm,26.481787,49.935994,Farm
1077,An Nada Dist.,26.376905,50.072901,Macdonald’s,26.377904,50.072297,Fast Food Restaurant
1078,An Nada Dist.,26.376905,50.072901,تموينات واحة النرجس,26.374269,50.074375,Grocery Store
1079,An Nada Dist.,26.376905,50.072901,تموينات القحطاني,26.374203,50.074840,Grocery Store


### Unique Venue Categories

Let's list the unique venue categories gathered from each city, then merge them.

In [20]:
khobar_unique_cat = khobar_venues['Venue Category'].unique()

print('There are {} unique categories in Al-Khobar.'.format(len(khobar_unique_cat)))

khobar_unique_cat

There are 161 unique categories in Al-Khobar.


array(['Soccer Field', 'Garden', 'Breakfast Spot', 'Harbor / Marina',
       'Hookah Bar', 'Shipping Store', 'Vape Store', 'Smoke Shop',
       'Auto Workshop', 'Café', 'Bakery', 'Coffee Shop', 'Chocolate Shop',
       'Seafood Restaurant', 'Italian Restaurant', 'Organic Grocery',
       'Spa', 'Salad Place', 'Salon / Barbershop', 'Smoothie Shop',
       'Flower Shop', 'Sushi Restaurant', 'Frozen Yogurt Shop',
       'Sandwich Place', 'Creperie', 'Juice Bar', 'French Restaurant',
       'Burger Joint', 'Kebab Restaurant', 'Dessert Shop',
       'Cosmetics Shop', 'Donut Shop', 'Shawarma Place', 'Hotel',
       'Fast Food Restaurant', 'Asian Restaurant', 'Food & Drink Shop',
       'Pastry Shop', 'Tourist Information Center', 'Rental Car Location',
       'Farmers Market', 'Athletics & Sports', 'Auto Garage', 'Stables',
       'Restaurant', 'Gym', 'Insurance Office', 'Photography Studio',
       'Ice Cream Shop', 'Grocery Store', 'Middle Eastern Restaurant',
       'Falafel Restaurant', 

In [21]:
dammam_unique_cat = dammam_venues['Venue Category'].unique()

print('There are {} unique categories in Al-dammam.'.format(len(dammam_unique_cat)))

dammam_unique_cat

There are 182 unique categories in Al-dammam.


array(['Event Space', 'Breakfast Spot', 'Pastry Shop', 'Shopping Mall',
       'Furniture / Home Store', 'Chocolate Shop', 'Intersection', 'Park',
       'Cricket Ground', 'Construction & Landscaping', 'Outdoor Gym',
       'Burger Joint', 'Playground', 'IT Services', 'Hookah Bar',
       'Juice Bar', 'Afghan Restaurant', 'Bakery', 'Smoothie Shop',
       'Falafel Restaurant', 'Supermarket', 'Soccer Stadium',
       'Campground', 'Garden', 'Cupcake Shop', 'Fruit & Vegetable Store',
       'Cosmetics Shop', 'Hotel', 'Fried Chicken Joint',
       'Middle Eastern Restaurant', 'Restaurant', 'Turkish Restaurant',
       'Clothing Store', 'Fast Food Restaurant', 'Tea Room',
       'Lebanese Restaurant', 'Coffee Shop', 'Market', 'Smoke Shop',
       'Discount Store', "Men's Store", 'BBQ Joint', 'Baby Store',
       'Tailor Shop', 'Electronics Store', 'Bookstore',
       'Fish & Chips Shop', 'Beach', 'Rest Area', 'Pharmacy',
       'Medical Supply Store', 'Department Store', 'Jewelry Store',
 

In [22]:
unique_venue_categories = khobar_unique_cat.tolist() + dammam_unique_cat.tolist()

unique_venue_categories = np.unique(unique_venue_categories)

print('There are {} unique categories in both Al Khobar and Dammam overall.'.format(len(unique_venue_categories)))

unique_venue_categories

There are 228 unique categories in both Al Khobar and Dammam overall.


array(['ATM', 'Afghan Restaurant', 'African Restaurant',
       'Airport Terminal', 'American Restaurant', 'Amphitheater',
       'Antique Shop', 'Arcade', 'Arepa Restaurant', 'Art Gallery',
       'Arts & Crafts Store', 'Asian Restaurant', 'Astrologer',
       'Athletics & Sports', 'Auto Garage', 'Auto Workshop', 'BBQ Joint',
       'Baby Store', 'Bakery', 'Bank', 'Bar', 'Basketball Court',
       'Bathing Area', 'Bay', 'Beach', 'Bed & Breakfast', 'Big Box Store',
       'Bike Rental / Bike Share', 'Bistro', 'Board Shop',
       'Boat or Ferry', 'Bookstore', 'Boutique', 'Bowling Alley',
       'Boxing Gym', 'Breakfast Spot', 'Bridal Shop', 'Bubble Tea Shop',
       'Buffet', 'Burger Joint', 'Business Service', 'Butcher',
       'Cafeteria', 'Café', 'Campground', 'Canal', 'Candy Store',
       'Cantonese Restaurant', 'Chinese Restaurant', 'Chocolate Shop',
       'Clothing Store', 'Coffee Roaster', 'Coffee Shop', 'Concert Hall',
       'Construction & Landscaping', 'Convenience Store',

### One-hot encoding

One-hot encoding turns the categorical "Venue Category" column into one 0/1 indicator column per category, a numeric form that our algorithms can work with.
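
When these indicator rows are later averaged per district, each value becomes the share of the district's venues that fall into that category. The plain-Python sketch below shows the same computation without pandas, using the first few venue rows from the Al-Khobar results; `category_shares` is our own helper name.

```python
from collections import Counter, defaultdict

def category_shares(rows):
    """Per-district share of each category, equal to the mean of the one-hot rows."""
    counts = defaultdict(Counter)
    for district, category in rows:
        counts[district][category] += 1
    return {
        district: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for district, counter in counts.items()
    }

rows = [
    ("At Tahliyah Dist.", "Soccer Field"),
    ("At Tahliyah Dist.", "Garden"),
    ("At Tahliyah Dist.", "Breakfast Spot"),
    ("At Tahliyah Dist.", "Harbor / Marina"),
    ("Ibn Sina Dist.", "Hookah Bar"),
]
shares = category_shares(rows)
print(shares["At Tahliyah Dist."]["Garden"])  # 0.25
```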

In [23]:
## KHOBAR
# one hot encoding
khobar_onehot = pd.get_dummies(khobar_venues[['Venue Category']], prefix="", prefix_sep="")

# add District column back to dataframe
khobar_onehot['District'] = khobar_venues['District'] 

# move District column to the first column
khobar_onehot = khobar_onehot[ ['District'] + [ col for col in khobar_onehot.columns if col != 'District' ] ]


## DAMMAM
# one hot encoding
dammam_onehot = pd.get_dummies(dammam_venues[['Venue Category']], prefix="", prefix_sep="")

# add District column back to dataframe
dammam_onehot['District'] = dammam_venues['District'] 

# move District column to the first column
dammam_onehot = dammam_onehot[ ['District'] + [ col for col in dammam_onehot.columns if col != 'District' ] ]



khobar_onehot.head()

Unnamed: 0,District,ATM,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Arcade,Art Gallery,...,Theme Restaurant,Tourist Information Center,Track,Trail,Turkish Restaurant,Vape Store,Video Game Store,Watch Shop,Waterfront,Yoga Studio
0,At Tahliyah Dist.,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,At Tahliyah Dist.,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,At Tahliyah Dist.,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,At Tahliyah Dist.,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ibn Sina Dist.,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
khobar_grouped = khobar_onehot.groupby('District').mean().reset_index()
dammam_grouped = dammam_onehot.groupby('District').mean().reset_index()

khobar_grouped

Unnamed: 0,District,ATM,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Arcade,Art Gallery,...,Theme Restaurant,Tourist Information Center,Track,Trail,Turkish Restaurant,Vape Store,Video Game Store,Watch Shop,Waterfront,Yoga Studio
0,Al Amwaj Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Al Andalus Dist.,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,...,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Al Aqiq Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Al Aqrabiyah Dist.,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0
4,Al Bahar Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Al Bandariyah Dist.,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,...,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.081081,0.0,0.054054
6,Al Buhayrah Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
7,Al Bustan Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Al Hada Dist.,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Al Hamra Dist.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0


### Getting most common venues in each district

The venue data gathered up to this point is just a flat list. Grouping it by how frequently each venue category occurs within a district gives us an idea of what kind of place the district is.
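
A caveat worth keeping in mind: a district with fewer than ten distinct categories still gets ten "most common" entries, and the extra slots are likely filled by categories whose frequency is zero. A hedged sketch of a ranking that skips those is shown here; `top_categories` is our own helper and the frequencies are hypothetical.

```python
def top_categories(frequencies, n=10):
    """Return up to n category names with non-zero frequency, most frequent first."""
    present = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in present[:n]]

# Hypothetical frequencies for one district.
district_row = {"Coffee Shop": 0.5, "Bakery": 0.25, "Café": 0.25, "Yoga Studio": 0.0}
print(top_categories(district_row))  # ['Coffee Shop', 'Bakery', 'Café']
```

The result can be shorter than `n`, so it would need padding (for example with `None`) before being written into a fixed-width table.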

In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [26]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

### Top Venues in Khobar's Districts

In [27]:
# create a new dataframe
khobar_venues_sorted = pd.DataFrame(columns=columns)
khobar_venues_sorted['District'] = khobar_grouped['District']

for ind in np.arange(khobar_grouped.shape[0]):
    khobar_venues_sorted.iloc[ind, 1:] = return_most_common_venues(khobar_grouped.iloc[ind, :], num_top_venues)

khobar_venues_sorted

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Al Amwaj Dist.,Farm,Yoga Studio,Hookah Bar,Flower Shop,Fishing Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Service,Ethiopian Restaurant
1,Al Andalus Dist.,Coffee Shop,Café,Restaurant,Bakery,Theme Restaurant,Clothing Store,Breakfast Spot,Lebanese Restaurant,Butcher,Falafel Restaurant
2,Al Aqiq Dist.,Resort,Yoga Studio,Entertainment Service,Fishing Store,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Service,Ethiopian Restaurant
3,Al Aqrabiyah Dist.,Coffee Shop,Shawarma Place,Breakfast Spot,Bakery,Restaurant,Tea Room,Ice Cream Shop,Fast Food Restaurant,Café,Juice Bar
4,Al Bahar Dist.,Ice Cream Shop,Bay,Beach,Falafel Restaurant,Food Truck,Food & Drink Shop,Flower Shop,Fishing Store,Fast Food Restaurant,Farmers Market
5,Al Bandariyah Dist.,Coffee Shop,Watch Shop,Seafood Restaurant,Boutique,Gym / Fitness Center,Dessert Shop,Yoga Studio,Track,Modern European Restaurant,Jewelry Store
6,Al Buhayrah Dist.,Lake,Trail,Gift Shop,Yoga Studio,Ethiopian Restaurant,Fishing Store,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
7,Al Bustan Dist.,Coffee Shop,Café,Furniture / Home Store,Breakfast Spot,Indian Restaurant,Restaurant,Lounge,Supermarket,Boxing Gym,Flower Shop
8,Al Hada Dist.,Coffee Shop,Burger Joint,Plaza,Kids Store,Asian Restaurant,Gym / Fitness Center,Food Truck,Lawyer,Flower Shop,Mediterranean Restaurant
9,Al Hamra Dist.,Beach,Event Service,Shopping Mall,Lake,Entertainment Service,Waterfront,Hotel,Diner,Falafel Restaurant,Fishing Store


### Top Venues in Dammam's Districts

In [28]:
# create a new dataframe
dammam_venues_sorted = pd.DataFrame(columns=columns)
dammam_venues_sorted['District'] = dammam_grouped['District']

for ind in np.arange(dammam_grouped.shape[0]):
    dammam_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dammam_grouped.iloc[ind, :], num_top_venues)

dammam_venues_sorted

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1St Industrial Dist.,Furniture / Home Store,Construction & Landscaping,Outdoor Gym,Yemeni Restaurant,Falafel Restaurant,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Film Studio
1,2Nd Industrial City,Fast Food Restaurant,Restaurant,Middle Eastern Restaurant,Mountain,Fried Chicken Joint,Yemeni Restaurant,Farm,Flower Shop,Flea Market,Fishing Spot
2,Ad Danah Dist.,Pizza Place,Restaurant,Steakhouse,Supermarket,Food Court,Fast Food Restaurant,Falafel Restaurant,Café,Candy Store,Middle Eastern Restaurant
3,Ad Dawasir Dist.,Jewelry Store,Men's Store,Market,Afghan Restaurant,Sporting Goods Shop,Ice Cream Shop,Intersection,Lawyer,Food Court,Clothing Store
4,Al Adamah Dist.,Coffee Shop,Hotel,Pool Hall,Restaurant,Yemeni Restaurant,Event Space,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop
...,...,...,...,...,...,...,...,...,...,...,...
69,Madinat Al Ummal Dist.,Burger Joint,Coffee Shop,Café,Go Kart Track,Indian Restaurant,Convenience Store,Middle Eastern Restaurant,Furniture / Home Store,Beach,Market
70,Prince Muhammed Ibn Saud Dist.,Café,Breakfast Spot,Department Store,Convenience Store,Soccer Stadium,Stadium,Seafood Restaurant,Juice Bar,Pakistani Restaurant,Coffee Shop
71,Qasr Al Khalij Dist.,Middle Eastern Restaurant,Art Gallery,Palace,Café,Farm,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot
72,Taibah Dist.,Insurance Office,Coffee Shop,Tea Room,Juice Bar,Donut Shop,Business Service,Yemeni Restaurant,Farm,Food,Flower Shop


## K-means Clustering

We can utilize the K-means Clustering machine learning algorithm to group the different districts into clusters based on their most common venues.

We will use 5 clusters; anything higher than that leads to ineffective clustering. You can rerun this notebook with a different number of clusters k and observe the changes on the maps below.
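
If you would rather compare values of k numerically than by eye, one common approach is to record the inertia and the silhouette score for each candidate. This is not what the notebook does; it is a sketch on synthetic data, where `score_cluster_counts` is our own helper and the three blobs stand in for the grouped district frequencies.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def score_cluster_counts(features, candidates, random_state=0):
    """Fit k-means for each candidate k and record inertia and silhouette score."""
    scores = {}
    for k in candidates:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(features)
        scores[k] = {
            "inertia": model.inertia_,
            "silhouette": silhouette_score(features, model.labels_),
        }
    return scores

# Synthetic stand-in for the clustering input: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
features = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = score_cluster_counts(features, range(2, 7))
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print(best_k)
```

On the real data, `features` would be `khobar_grouped_clustering` or `dammam_grouped_clustering`, and the scores may well be less clear-cut than on this toy example.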

In [29]:
khobar_data = khobar_data.rename(columns={"name_en": "District"})
dammam_data = dammam_data.rename(columns={"name_en": "District"})

### Khobar Clusters

In [30]:
# set number of clusters
kclusters = 5

khobar_grouped_clustering = khobar_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(khobar_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

# add clustering labels
khobar_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)

khobar_merged = khobar_data

# merge khobar_grouped with khobar_data to add latitude/longitude for each District
khobar_merged = khobar_merged.join(khobar_venues_sorted.set_index('District'), on='District')

khobar_merged.dropna(axis=0, inplace = True)

khobar_merged.head() # check the last columns!

Unnamed: 0,District,district_id,center,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,At Tahliyah Dist.,10500031001,"[26.17781863856502, 50.19358769394619]",1.0,Breakfast Spot,Harbor / Marina,Garden,Soccer Field,Yoga Studio,Event Service,Fishing Store,Fast Food Restaurant,Farmers Market,Farm
1,Ibn Sina Dist.,10500031002,"[26.241028340781252, 50.201773003593736]",0.0,Auto Workshop,Hookah Bar,Smoke Shop,Vape Store,Shipping Store,Ethiopian Restaurant,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
2,Al Hizam Al Akhdar Dist.,10500031003,"[26.305178597317074, 50.20295822975609]",1.0,Coffee Shop,Burger Joint,Juice Bar,Café,Fast Food Restaurant,Organic Grocery,Bakery,Hotel,Salad Place,Seafood Restaurant
3,Sinaiyah Ath Thuqbah Dist.,10500031004,"[26.253792497222218, 50.19638499861113]",0.0,Auto Workshop,Hookah Bar,Smoke Shop,Juice Bar,Rental Car Location,Tourist Information Center,Fishing Store,Fast Food Restaurant,Farmers Market,Farm
4,At Taawun Dist.,10500031005,"[26.226340009404773, 50.18608636999999]",3.0,Hookah Bar,Restaurant,Spa,Farmers Market,Stables,Auto Garage,Athletics & Sports,Café,Electronics Store,Entertainment Service


### Dammam Clusters

In [31]:
# set number of clusters
kclusters = 5

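# drop the label column so only the numeric venue frequencies are clustered
# (the positional axis argument `drop('District', 1)` no longer works in pandas 2.x)
dammam_grouped_clustering = dammam_grouped.drop(columns='District')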

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dammam_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

# add clustering labels
dammam_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)

dammam_merged = dammam_data

# join the sorted venues (with cluster labels) onto dammam_data, which holds the center coordinates of each District
dammam_merged = dammam_merged.join(dammam_venues_sorted.set_index('District'), on='District')

dammam_merged.dropna(axis=0, inplace = True)

dammam_merged.head() # check the last columns!

Unnamed: 0,District,district_id,center,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,An Nasriyah Dist.,10500013002,"[26.424016825090902, 50.12204671218182]",0.0,Furniture / Home Store,Cricket Ground,Breakfast Spot,Pastry Shop,Intersection,Park,Chocolate Shop,Shopping Mall,Event Space,Fast Food Restaurant
2,1St Industrial Dist.,10500013003,"[26.39701435782051, 50.14309465038462]",0.0,Furniture / Home Store,Construction & Landscaping,Outdoor Gym,Yemeni Restaurant,Falafel Restaurant,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Film Studio
3,Al Fanar Dist.,10500013004,"[26.40588090352941, 50.19358098470589]",0.0,IT Services,Burger Joint,Playground,History Museum,Event Space,Flea Market,Fishing Spot,Fish & Chips Shop,Home Service,Film Studio
4,Al Athir Dist.,10500013005,"[26.434276787415733, 50.06036651483147]",0.0,Hookah Bar,Juice Bar,Afghan Restaurant,Antique Shop,Arepa Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market
5,Al Jalawiyah Dist.,10500013006,"[26.43651751632076, 50.07545844943395]",0.0,Breakfast Spot,Afghan Restaurant,Supermarket,Cupcake Shop,Falafel Restaurant,Campground,Soccer Stadium,Fruit & Vegetable Store,Garden,Bakery


In [32]:
khobar_merged["Latitude"] = [ x[0] for x in khobar_merged["center"].tolist() ]
khobar_merged["Longitude"] = [ x[1] for x in khobar_merged["center"].tolist() ]

dammam_merged["Latitude"] = [ x[0] for x in dammam_merged["center"].tolist() ]
dammam_merged["Longitude"] = [ x[1] for x in dammam_merged["center"].tolist() ]

khobar_merged

Unnamed: 0,District,district_id,center,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
0,At Tahliyah Dist.,10500031001,"[26.17781863856502, 50.19358769394619]",1.0,Breakfast Spot,Harbor / Marina,Garden,Soccer Field,Yoga Studio,Event Service,Fishing Store,Fast Food Restaurant,Farmers Market,Farm,26.177819,50.193588
1,Ibn Sina Dist.,10500031002,"[26.241028340781252, 50.201773003593736]",0.0,Auto Workshop,Hookah Bar,Smoke Shop,Vape Store,Shipping Store,Ethiopian Restaurant,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,26.241028,50.201773
2,Al Hizam Al Akhdar Dist.,10500031003,"[26.305178597317074, 50.20295822975609]",1.0,Coffee Shop,Burger Joint,Juice Bar,Café,Fast Food Restaurant,Organic Grocery,Bakery,Hotel,Salad Place,Seafood Restaurant,26.305179,50.202958
3,Sinaiyah Ath Thuqbah Dist.,10500031004,"[26.253792497222218, 50.19638499861113]",0.0,Auto Workshop,Hookah Bar,Smoke Shop,Juice Bar,Rental Car Location,Tourist Information Center,Fishing Store,Fast Food Restaurant,Farmers Market,Farm,26.253792,50.196385
4,At Taawun Dist.,10500031005,"[26.226340009404773, 50.18608636999999]",3.0,Hookah Bar,Restaurant,Spa,Farmers Market,Stables,Auto Garage,Athletics & Sports,Café,Electronics Store,Entertainment Service,26.22634,50.186086
5,Ar Rakah Al Janubiyah Dist.,10500031006,"[26.350680400337087, 50.203745644269645]",1.0,Gym,Coffee Shop,Café,Photography Studio,Hookah Bar,Flower Shop,Ice Cream Shop,Insurance Office,Grocery Store,Breakfast Spot,26.35068,50.203746
6,Al Khubar Ash Shamaliyah Dist.,10500031007,"[26.29401644211008, 50.214117997064186]",1.0,Coffee Shop,Middle Eastern Restaurant,Pakistani Restaurant,Breakfast Spot,Café,Tailor Shop,Furniture / Home Store,Fried Chicken Joint,Turkish Restaurant,Bakery,26.294016,50.214118
7,Madinat Al Ummal Dist.,10500031008,"[26.294228553913037, 50.205489442898546]",1.0,Seafood Restaurant,Coffee Shop,Gym / Fitness Center,Flower Shop,Athletics & Sports,Organic Grocery,Clothing Store,Café,Middle Eastern Restaurant,Fast Food Restaurant,26.294229,50.205489
8,Al Aqrabiyah Dist.,10500031009,"[26.297885457971017, 50.189789413188414]",1.0,Coffee Shop,Shawarma Place,Breakfast Spot,Bakery,Restaurant,Tea Room,Ice Cream Shop,Fast Food Restaurant,Café,Juice Bar,26.297885,50.189789
9,Al Khubar Al Janubiyah Dist.,10500031010,"[26.27328286536233, 50.20615011927536]",1.0,Bakery,Clothing Store,Middle Eastern Restaurant,Pizza Place,Italian Restaurant,Coffee Shop,Sandwich Place,Market,Fried Chicken Joint,Furniture / Home Store,26.273283,50.20615


### Plotting Clusters and listing the districts in each cluster

Let's plot the clusters obtained from our K-means clustering in a Folium map. This provides a nice visual for the different kinds of clusters.

What do you think each cluster represents? For example, Al Khobar's Cluster 1 seems to represent areas with many coffee shops, restaurants and other dining options. What about the rest?
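To help answer that question, it can be useful to count which venue categories show up most often among the top-ranked columns of each cluster. The following is a minimal sketch, not part of the original notebook: `top_categories_per_cluster` is a hypothetical helper, and the small `demo` frame only copies a few rows and columns from the Al Khobar output for illustration. It assumes a DataFrame shaped like `khobar_merged`, with a `Cluster Label` column and the `... Most Common Venue` columns.

```python
import pandas as pd


def top_categories_per_cluster(merged, n_top_columns=3, n_categories=3):
    """For each cluster, list the venue categories that appear most often
    in the first `n_top_columns` 'Most Common Venue' columns."""
    venue_columns = [c for c in merged.columns if c.endswith("Most Common Venue")]
    venue_columns = venue_columns[:n_top_columns]
    summary = {}
    for label, group in merged.groupby("Cluster Label"):
        # Flatten the selected columns into one Series and count each category.
        counts = pd.Series(group[venue_columns].to_numpy().ravel()).value_counts()
        summary[int(label)] = [(cat, int(n)) for cat, n in counts.head(n_categories).items()]
    return summary


# A few rows taken from the Al Khobar output, trimmed to three venue columns.
demo = pd.DataFrame({
    "District": ["Ibn Sina Dist.", "Sinaiyah Ath Thuqbah Dist.",
                 "Sinaiyah Al Fawaziyah Dist.", "At Taawun Dist.", "Ash Sheraa Dist."],
    "Cluster Label": [0.0, 0.0, 0.0, 3.0, 3.0],
    "1st Most Common Venue": ["Auto Workshop", "Auto Workshop", "Auto Garage",
                              "Hookah Bar", "Hookah Bar"],
    "2nd Most Common Venue": ["Hookah Bar", "Hookah Bar", "Motorcycle Shop",
                              "Restaurant", "Farm"],
    "3rd Most Common Venue": ["Smoke Shop", "Smoke Shop", "Auto Workshop",
                              "Spa", "Soccer Field"],
})

demo_summary = top_categories_per_cluster(demo)
for label, categories in demo_summary.items():
    print(label, categories)
```

Restricting the count to the first few columns is a design choice: the lower-ranked columns are often padded with the same low-frequency categories, which would otherwise blur the picture.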

### Al Khobar

In [33]:
# create map
map_clusters = folium.Map(location=[26.2172,50.1971], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(khobar_merged['Latitude'], khobar_merged['Longitude'], khobar_merged['District'], khobar_merged['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [34]:
# Cluster 0:

khobar_merged.loc[khobar_merged['Cluster Label'] == 0, khobar_merged.columns[[0] + list(range(4, khobar_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
1,Ibn Sina Dist.,Auto Workshop,Hookah Bar,Smoke Shop,Vape Store,Shipping Store,Ethiopian Restaurant,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,26.241028,50.201773
3,Sinaiyah Ath Thuqbah Dist.,Auto Workshop,Hookah Bar,Smoke Shop,Juice Bar,Rental Car Location,Tourist Information Center,Fishing Store,Fast Food Restaurant,Farmers Market,Farm,26.253792,50.196385
28,Sinaiyah Al Fawaziyah Dist.,Auto Garage,Motorcycle Shop,Auto Workshop,Yoga Studio,Falafel Restaurant,Flower Shop,Fishing Store,Fast Food Restaurant,Farmers Market,Farm,26.241888,50.214588


In [35]:
# Cluster 1

khobar_merged.loc[khobar_merged['Cluster Label'] == 1, khobar_merged.columns[[0] + list(range(4, khobar_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
0,At Tahliyah Dist.,Breakfast Spot,Harbor / Marina,Garden,Soccer Field,Yoga Studio,Event Service,Fishing Store,Fast Food Restaurant,Farmers Market,Farm,26.177819,50.193588
2,Al Hizam Al Akhdar Dist.,Coffee Shop,Burger Joint,Juice Bar,Café,Fast Food Restaurant,Organic Grocery,Bakery,Hotel,Salad Place,Seafood Restaurant,26.305179,50.202958
5,Ar Rakah Al Janubiyah Dist.,Gym,Coffee Shop,Café,Photography Studio,Hookah Bar,Flower Shop,Ice Cream Shop,Insurance Office,Grocery Store,Breakfast Spot,26.35068,50.203746
6,Al Khubar Ash Shamaliyah Dist.,Coffee Shop,Middle Eastern Restaurant,Pakistani Restaurant,Breakfast Spot,Café,Tailor Shop,Furniture / Home Store,Fried Chicken Joint,Turkish Restaurant,Bakery,26.294016,50.214118
7,Madinat Al Ummal Dist.,Seafood Restaurant,Coffee Shop,Gym / Fitness Center,Flower Shop,Athletics & Sports,Organic Grocery,Clothing Store,Café,Middle Eastern Restaurant,Fast Food Restaurant,26.294229,50.205489
8,Al Aqrabiyah Dist.,Coffee Shop,Shawarma Place,Breakfast Spot,Bakery,Restaurant,Tea Room,Ice Cream Shop,Fast Food Restaurant,Café,Juice Bar,26.297885,50.189789
9,Al Khubar Al Janubiyah Dist.,Bakery,Clothing Store,Middle Eastern Restaurant,Pizza Place,Italian Restaurant,Coffee Shop,Sandwich Place,Market,Fried Chicken Joint,Furniture / Home Store,26.273283,50.20615
10,Ar Rawabi Dist.,Soccer Field,Coffee Shop,Pool,Gym / Fitness Center,Gym,Shipping Store,Movie Theater,Café,Entertainment Service,Farmers Market,26.3332,50.206892
11,Al Yarmok Dist.,Coffee Shop,Café,Tailor Shop,Restaurant,Watch Shop,Hotel,Donut Shop,Coffee Roaster,Pizza Place,Japanese Restaurant,26.312833,50.219619
12,Qurtubah Dist.,Pool Hall,Bakery,Soccer Field,Coffee Shop,Trail,Gym,Lebanese Restaurant,Bistro,Event Service,Fishing Store,26.340919,50.187923


In [36]:
# Cluster 2

khobar_merged.loc[khobar_merged['Cluster Label'] == 2, khobar_merged.columns[[0] + list(range(4, khobar_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
37,Al Aqiq Dist.,Resort,Yoga Studio,Entertainment Service,Fishing Store,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Service,Ethiopian Restaurant,26.095232,50.145751


In [37]:
# Cluster 3

khobar_merged.loc[khobar_merged['Cluster Label'] == 3, khobar_merged.columns[[0] + list(range(4, khobar_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
4,At Taawun Dist.,Hookah Bar,Restaurant,Spa,Farmers Market,Stables,Auto Garage,Athletics & Sports,Café,Electronics Store,Entertainment Service,26.22634,50.186086
33,Ash Sheraa Dist.,Hookah Bar,Farm,Soccer Field,History Museum,Department Store,Dessert Shop,Flower Shop,Fishing Store,Fast Food Restaurant,Farmers Market,26.164534,50.148142
36,As Sawari Dist.,Hookah Bar,Farm,Yoga Studio,Flower Shop,Fishing Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Service,Ethiopian Restaurant,26.183197,50.153691


In [38]:
# Cluster 4

khobar_merged.loc[khobar_merged['Cluster Label'] == 4, khobar_merged.columns[[0] + list(range(4, khobar_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
34,Al Kawthar Dist.,Farm,Yoga Studio,Hookah Bar,Flower Shop,Fishing Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Service,Ethiopian Restaurant,26.152318,50.12429
39,Al Amwaj Dist.,Farm,Yoga Studio,Hookah Bar,Flower Shop,Fishing Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Service,Ethiopian Restaurant,26.143114,50.145837


### Dammam

In [39]:
# create map
map_clusters = folium.Map(location=[26.2172,50.1971], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dammam_merged['Latitude'], dammam_merged['Longitude'], dammam_merged['District'], dammam_merged['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [40]:
# Cluster 0

dammam_merged.loc[dammam_merged['Cluster Label'] == 0, dammam_merged.columns[[0] + list(range(4, dammam_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
1,An Nasriyah Dist.,Furniture / Home Store,Cricket Ground,Breakfast Spot,Pastry Shop,Intersection,Park,Chocolate Shop,Shopping Mall,Event Space,Fast Food Restaurant,26.424017,50.122047
2,1St Industrial Dist.,Furniture / Home Store,Construction & Landscaping,Outdoor Gym,Yemeni Restaurant,Falafel Restaurant,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Film Studio,26.397014,50.143095
3,Al Fanar Dist.,IT Services,Burger Joint,Playground,History Museum,Event Space,Flea Market,Fishing Spot,Fish & Chips Shop,Home Service,Film Studio,26.405881,50.193581
4,Al Athir Dist.,Hookah Bar,Juice Bar,Afghan Restaurant,Antique Shop,Arepa Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,26.434277,50.060367
5,Al Jalawiyah Dist.,Breakfast Spot,Afghan Restaurant,Supermarket,Cupcake Shop,Falafel Restaurant,Campground,Soccer Stadium,Fruit & Vegetable Store,Garden,Bakery,26.436518,50.075458
...,...,...,...,...,...,...,...,...,...,...,...,...,...
67,Al Hussam Dist.,Dessert Shop,Cafeteria,Yemeni Restaurant,Farm,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,26.392597,50.173185
68,King Abdul Aziz Seaport Dist.,Bakery,Port,Middle Eastern Restaurant,Harbor / Marina,Food,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Film Studio,26.473577,50.193578
73,Al Amal Dist.,Soccer Field,Market,Gym,Yemeni Restaurant,Farm,Food,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,26.343128,50.015997
78,Al Fursan Dist.,Health & Beauty Service,Resort,Park,Bookstore,Yemeni Restaurant,Falafel Restaurant,Food,Flower Shop,Flea Market,Fishing Spot,26.352721,49.960809


In [41]:
# Cluster 1

dammam_merged.loc[dammam_merged['Cluster Label'] == 1, dammam_merged.columns[[0] + list(range(4, dammam_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
75,Al Hadabah Dist.,Campground,Yemeni Restaurant,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Film Studio,Fast Food Restaurant,26.334186,49.930036
76,Al Matar Dist.,Campground,Trail,Yemeni Restaurant,Food,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Film Studio,Fast Food Restaurant,26.380527,49.938546
77,Al Amanah Dist.,Campground,Lounge,Yemeni Restaurant,Food,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Film Studio,Fast Food Restaurant,26.378152,49.969845


In [42]:
# Cluster 2

dammam_merged.loc[dammam_merged['Cluster Label'] == 2, dammam_merged.columns[[0] + list(range(4, dammam_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
74,Ash Sharq Dist.,Farm,Diner,Yemeni Restaurant,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Film Studio,26.365736,49.968355
79,Al Maha Dist.,Farm,Yemeni Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Film Studio,26.485221,49.933393


In [43]:
# Cluster 3

dammam_merged.loc[dammam_merged['Cluster Label'] == 3, dammam_merged.columns[[0] + list(range(4, dammam_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
19,Dahiyat Al Malik Fahd Dist.,Music Venue,Yemeni Restaurant,Food Court,Food,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Film Studio,Fast Food Restaurant,26.430747,49.998841


In [44]:
# Cluster 4

dammam_merged.loc[dammam_merged['Cluster Label'] == 4, dammam_merged.columns[[0] + list(range(4, dammam_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude
21,As Sinaiyah Dist.,Auto Garage,Palace,Event Space,Food,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Film Studio,Fast Food Restaurant,26.450579,50.018045
42,Al Kuthriah Dist.,Auto Garage,Auto Workshop,Yemeni Restaurant,Farm,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,26.447342,50.049518
70,Al Khalidiyah Al Janubiyah Dist.,Auto Garage,Café,Auto Workshop,Yemeni Restaurant,Farm,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,26.403955,50.166384


## Recommending Districts for Businesses

The final stage is to recommend both a District and a Cluster in which to open a given kind of business.

To recommend a district or a cluster, we need to define a target. We'll call that Feasibility. The higher the Feasibility score, the better that area is for opening that particular business.

How do we calculate the Feasibility? Let's brainstorm.

Firstly, I would like my business to be in an area where similar businesses are thriving. Let's call that the Similarity. We can formalize that by saying:

Similarity = (Number of Businesses of the Same Category in Top 10 Common Venues)/10
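As a rough illustration of the Similarity formula, the sketch below counts how many of a district's top 10 venue categories match a business category and divides by 10. How "the same category" is decided is an assumption here: the hypothetical `similarity` function uses a simple case-insensitive keyword match, which may differ from the matching rule used later in the notebook. The example row is taken from the Al Khubar Ash Shamaliyah Dist. output.

```python
TOP_N = 10  # the formula divides by the 10 most common venues


def similarity(top_venues, category_keyword):
    """Share of a district's top-10 venue categories that contain the keyword.

    Keyword matching is an assumption made for this sketch.
    """
    keyword = category_keyword.lower()
    matches = sum(1 for venue in top_venues[:TOP_N] if keyword in venue.lower())
    return matches / TOP_N


# Top 10 venues of Al Khubar Ash Shamaliyah Dist., copied from the table above.
demo_top_venues = [
    "Coffee Shop", "Middle Eastern Restaurant", "Pakistani Restaurant",
    "Breakfast Spot", "Café", "Tailor Shop", "Furniture / Home Store",
    "Fried Chicken Joint", "Turkish Restaurant", "Bakery",
]

print(similarity(demo_top_venues, "Restaurant"))  # three matching venues out of 10
print(similarity(demo_top_venues, "Coffee"))      # one matching venue out of 10
```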

However, I would also like not to compete directly with other businesses doing the same thing. Let's call that the Competition.

