# IBM Applied Data Science Capstone Course by Coursera

## Week 5: Business Case Idea and Data
## What would be the best areas in Dubai to open a new restaurant?

### 1. Introduction

Dubai is a world-class city with residents and visitors from all parts of the globe. As a result, it has a large number of restaurants serving cuisines to satisfy any palate, offering dining experiences from the very basic to the most exquisite.
This also makes it a great opportunity for someone to open a new restaurant. However, given the thriving restaurant scene, the competition and the diverse population, it is challenging to find the best location for a restaurant and a particular cuisine.

In this project we will analyze this business case to find the best places for a new restaurant, using available data on existing restaurants and clustering the Dubai communities.

### 2. Data

#### 2.1 Dubai Communities

We will get the list of Dubai communities from Wikipedia: https://en.wikipedia.org/wiki/List_of_communities_in_Dubai.
For each community, the page provides the following data:
1. Name
2. Area
3. Population
4. Population Density

For this project we will use these features to cluster the communities.
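The Area, Population and Population density columns come back from Wikipedia as text such as `1.27 km²` or `16,861.4/km²`, so they need converting to numbers before they can serve as clustering features. A minimal sketch, assuming the values always look like those in the scraped table (a number with optional thousands separators followed by a unit); `to_number` is a hypothetical helper, not a pandas function.

```python
import re

import numpy as np
import pandas as pd


def to_number(value):
    """Convert strings such as '1.27 km²' or '16,861.4/km²' to float.

    Returns NaN for missing or unparseable values.
    """
    if pd.isna(value):
        return np.nan
    # Remove thousands separators, then take the first decimal number in the text.
    # [0-9] is used instead of \d so the superscript in 'km²' is never matched.
    match = re.search(r"[0-9]+(?:\.[0-9]+)?", str(value).replace(",", ""))
    return float(match.group()) if match else np.nan


# Sample values in the format of the scraped table
raw = pd.DataFrame({
    "Area(km2)": ["1.27 km²", None, "0.82 km²"],
    "Population density(/km2)": ["16,861.4/km²", None, "22946/km²"],
})
clean = raw.apply(lambda col: col.map(to_number))
print(clean)
```

Rows with no data on Wikipedia stay NaN, so they can still be dropped with `dropna` afterwards.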

#### 2.2 Foursquare Data

To analyze and cluster the existing restaurants, we will use the Foursquare API to query each neighborhood for nearby venues and their categories.
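The `venues/explore` response nests each venue under `response`, `groups`, `items`, `venue`, which is the path the notebook walks later. A minimal sketch of flattening that structure into rows, using a hand-made sample payload (an assumption for illustration, trimmed to the fields used here, not a real API response); `flatten_explore` is a hypothetical helper. It also skips venues with an empty `categories` list, which would otherwise raise an `IndexError`.

```python
import pandas as pd


def flatten_explore(payload, neighborhood):
    """Turn one venues/explore JSON payload into a list of row tuples."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # no category to report for this venue
        rows.append((
            neighborhood,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"],  # primary category only
        ))
    return rows


# Hypothetical, heavily trimmed payload in the shape the notebook expects
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Habib Bakery",
               "location": {"lat": 25.281124, "lng": 55.332774},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 25.28, "lng": 55.33},
               "categories": []}},
]}]}}

rows = flatten_explore(sample, "Abu Hail")
df = pd.DataFrame(rows, columns=["Neighborhood", "VenueName",
                                 "VenueLatitude", "VenueLongitude", "VenueCategory"])
print(df)
```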


## Data Science Process

### 1. Import Libraries

In [2]:
! pip install folium

Successfully installed branca-0.4.0 folium-0.10.1


In [3]:
! pip install geocoder

Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [4]:
import numpy as np
import pandas as pd
import json
from geopy.geocoders import Nominatim
import geocoder
import requests
from bs4 import BeautifulSoup
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

print("Libraries imported.")

Libraries imported.


### 2. Scrape data from Wikipedia into a DataFrame

In [134]:
url='https://en.wikipedia.org/wiki/List_of_communities_in_Dubai'
df=pd.read_html(url, header=0)[0]

In [135]:
df

| | Community Number | Community (English) | Community (Arabic) | Area(km2) | Population(2000) | Population density(/km2) |
|---|---|---|---|---|---|---|
| 0 | 126 | Abu Hail | أبو هيل | 1.27 km² | 21414 | 16,861.4/km² |
| 1 | 711 | Al Awir First | العوير الأولى | | | |
| 2 | 721 | Al Awir Second | العوير الثانية | | | |
| 3 | 333 | Al Bada | البدع | 0.82 km² | 18816 | 22946/km² |
| 4 | 122 | Al Baraha | البراحة | 1.104 km² | 7823 | 7,086/km² |
| 5 | 373 | Al Barsha First | البرشاء الأولى | | | |
| 6 | 376 | Al Barsha Second | البرشاء الثانية | | | |
| 7 | 671 | Al Barsha South First | البرشاء جنوب الاولى | | | |
| 8 | 672 | Al Barsha South Second | البرشاء جنوب الثانية | | | |
| 9 | 673 | Al Barsha South Third | البرشاء جنوب الثالثة | | | |


In [136]:
df.shape

(130, 6)

In [46]:
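# Drop NaN population rows (disabled here; NaN rows are dropped after geocoding instead)
#df.dropna(inplace=True)

In [47]:
# NOTE: the (90, 6) below is left over from an earlier run in which the dropna above was enabled
df.shape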

(90, 6)

In [137]:
df_dxb = df[['Community Number', 'Community (English)', 'Population(2000)']].copy()
df_dxb.columns = ['Num', 'Neighborhood', 'Population']

In [138]:
df_dxb.head()

| | Num | Neighborhood | Population |
|---|---|---|---|
| 0 | 126 | Abu Hail | 21414.0 |
| 1 | 711 | Al Awir First | |
| 2 | 721 | Al Awir Second | |
| 3 | 333 | Al Bada | 18816.0 |
| 4 | 122 | Al Baraha | 7823.0 |


In [139]:
df_dxb.shape

(130, 3)

### 3. Get the geographical coordinates

In [140]:
import time

# define a function to get coordinates, retrying a bounded number of times
def get_latlng(neighborhood, max_retries=10):
    for _ in range(max_retries):
        g = geocoder.arcgis('{}, Dubai'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
        time.sleep(1)  # brief pause before retrying
    # give up: rows with missing coordinates are removed later by dropna
    return [None, None]

In [141]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in df_dxb["Neighborhood"].tolist() ]

In [142]:
coords

[[25.28308000000004, 55.33435000000003],
 [25.18593000000004, 55.54126000000008],
 [25.167920000000038, 55.543310000000076],
 [25.21861000000007, 55.26406000000003],
 [25.282800000000066, 55.31678000000005],
 [25.11483000000004, 55.19136000000003],
 [25.107230000000072, 55.20485000000008],
 [25.08958000000007, 55.23424000000006],
 [25.077390000000037, 55.24267000000003],
 [25.062290000000075, 55.23995000000008],
 [25.093420000000037, 55.19044000000008],
 [25.269250000000056, 55.29944000000006],
 [25.272170000000074, 55.30157000000003],
 [25.243370000000027, 55.352670000000046],
 [25.269510000000025, 55.30884000000003],
 [25.25696000000005, 55.30246000000005],
 [25.29871000000003, 55.33546000000007],
 [25.237130000000036, 55.27707000000004],
 [25.220540000000028, 55.34166000000005],
 [25.233420000000024, 55.29001000000005],
 [25.245290000000068, 55.30364000000003],
 [25.27177000000006, 55.33762000000007],
 [25.24282000000005, 55.48440000000005],
 [25.22784000000007, 55.522320000000036],

In [143]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df_coords.shape

(130, 2)

In [144]:
# merge the coordinates into the original dataframe
df_dxb['Latitude'] = df_coords['Latitude']
df_dxb['Longitude'] = df_coords['Longitude']

In [145]:
# check the neighborhoods and the coordinates
print(df_dxb.shape)
print(df_dxb)

(130, 5)
     Num                   Neighborhood  Population   Latitude  Longitude
0    126                       Abu Hail       21414  25.283080  55.334350
1    711                  Al Awir First         NaN  25.185930  55.541260
2    721                 Al Awir Second         NaN  25.167920  55.543310
3    333                        Al Bada       18816  25.218610  55.264060
4    122                      Al Baraha        7823  25.282800  55.316780
5    373                Al Barsha First         NaN  25.114830  55.191360
6    376               Al Barsha Second         NaN  25.107230  55.204850
7    671          Al Barsha South First         NaN  25.089580  55.234240
8    672         Al Barsha South Second         NaN  25.077390  55.242670
9    673          Al Barsha South Third         NaN  25.062290  55.239950
10   375                Al Barsha Third         NaN  25.093420  55.190440
11   114                      Al Buteen        2364  25.269250  55.299440
12   113                     

In [146]:
# Drop NaN population rows
df_dxb.dropna(inplace=True)

In [147]:
df_dxb.shape

(101, 5)

In [149]:
print(df_dxb)

     Num                 Neighborhood  Population  Latitude  Longitude
0    126                     Abu Hail       21414  25.28308   55.33435
3    333                      Al Bada       18816  25.21861   55.26406
4    122                    Al Baraha        7823  25.28280   55.31678
11   114                    Al Buteen        2364  25.26925   55.29944
12   113                   Al Dhagaya       10896  25.27217   55.30157
13   214                   Al Garhoud        4466  25.24337   55.35267
15   313            Al Hamriya, Dubai       15104  25.25696   55.30246
16   131              Al Hamriya Port          83  25.29871   55.33546
17   322                   Al Hudaiba        7699  25.23713   55.27707
18   326                    Al Jaddaf        2990  25.22054   55.34166
19   323                  Al Jafiliya       11619  25.23342   55.29001
20   318                    Al Karama       45674  25.24529   55.30364
21   128                   Al Khabisi        6737  25.27177   55.33762
24   3

In [155]:
# find the row whose Population cell holds a density string instead of a number
df_dxb[(df_dxb.Population == "12,374/km²")]

| | Num | Neighborhood | Population | Latitude | Longitude |
|---|---|---|---|---|---|
| 54 | 316 | Al Raffa | 12,374/km² | 25.2573 | 55.2867 |


In [157]:
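#pd.set_option('display.max_rows', 1000)
# 12,374/km² is a density figure; use it as a stand-in population and make the column numeric
df_dxb.replace(to_replace="12,374/km²", value=12374, inplace=True)
df_dxb['Population'] = pd.to_numeric(df_dxb['Population'])
print(df_dxb)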


     Num                 Neighborhood Population  Latitude  Longitude
0    126                     Abu Hail      21414  25.28308   55.33435
3    333                      Al Bada      18816  25.21861   55.26406
4    122                    Al Baraha       7823  25.28280   55.31678
11   114                    Al Buteen       2364  25.26925   55.29944
12   113                   Al Dhagaya      10896  25.27217   55.30157
13   214                   Al Garhoud       4466  25.24337   55.35267
15   313            Al Hamriya, Dubai      15104  25.25696   55.30246
16   131              Al Hamriya Port         83  25.29871   55.33546
17   322                   Al Hudaiba       7699  25.23713   55.27707
18   326                    Al Jaddaf       2990  25.22054   55.34166
19   323                  Al Jafiliya      11619  25.23342   55.29001
20   318                    Al Karama      45674  25.24529   55.30364
21   128                   Al Khabisi       6737  25.27177   55.33762
24   324            

In [77]:
# save the DataFrame as CSV file
df_dxb.to_csv("df_dxb.csv", index=False)

In [158]:
# get the coordinates of Dubai
address = 'Dubai, UAE'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Dubai, UAE are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Dubai, UAE are 25.0657, 55.1713.


In [214]:
# create map of Dubai using latitude and longitude values
map_dxb = folium.Map(location=[latitude, longitude], zoom_start=11)
# one marker color per population quartile (0 = lowest, 3 = highest)
colordict = {0: 'lightblue', 1: 'green', 2: 'orange', 3: 'red'}
df_dxb['pop_quartile'] = pd.qcut(df_dxb['Population'], 4, labels=False)

In [160]:
pd.qcut(df_dxb['Population'], 4, labels=False).value_counts()

2    28
0    26
1    25
3    22
Name: Population, dtype: int64
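`pd.qcut(..., 4, labels=False)` ranks the values and assigns quartile codes 0 to 3, so the codes are only meaningful when `Population` is numeric. A small sketch with made-up values (an assumption, not the Dubai data) shows the codes for a numeric series, and how sorting some of the population figures above as text orders them differently from their numeric order.

```python
import pandas as pd

# Numeric values: equal-sized quartile bins, codes 0 (lowest) to 3 (highest)
population = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
quartiles = pd.qcut(population, 4, labels=False)
print(quartiles.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3]

# As text, values are compared character by character, so '7823' sorts after '21414'
as_text = sorted(["21414", "7823", "10896", "2364"])
print(as_text)  # ['10896', '21414', '2364', '7823']
```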

In [161]:
df_dxb.head()

| | Num | Neighborhood | Population | Latitude | Longitude | pop_quartile |
|---|---|---|---|---|---|---|
| 0 | 126 | Abu Hail | 21414 | 25.28308 | 55.33435 | 1 |
| 3 | 333 | Al Bada | 18816 | 25.21861 | 55.26406 | 1 |
| 4 | 122 | Al Baraha | 7823 | 25.2828 | 55.31678 | 3 |
| 11 | 114 | Al Buteen | 2364 | 25.26925 | 55.29944 | 1 |
| 12 | 113 | Al Dhagaya | 10896 | 25.27217 | 55.30157 | 0 |


In [177]:
#index = df_dxb.Population.max()
#df_dxb.ix[df_dxb.idxmax()]
#df_dxb.iloc[df_dxb.Population.idxmax(), 0:2]
#df_dxb[df_dxb['Population']==df_dxb['Population'].max()]

In [226]:
map_dxb = folium.Map(location=[latitude, longitude], zoom_start=11)
# add one marker per neighborhood, colored by population quartile
for lat, lng, neighborhood, pop, popq in zip(df_dxb['Latitude'], df_dxb['Longitude'],
                                            df_dxb['Neighborhood'], df_dxb['Population'],
                                            df_dxb['pop_quartile']):
    # use the name as-is: str.capitalize() would turn 'Al Barsha First' into 'Al barsha first'
    popup = ('{}<br>'
             'Population: {}<br>'
             'Lat, Long: {}, {}').format(neighborhood, pop, lat, lng)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=popup,
        color=colordict[popq],
        fill=True,
        fill_color=colordict[popq],
        fill_opacity=0.5).add_to(map_dxb)

map_dxb

In [227]:
map_dxb = folium.Map(location=[latitude, longitude], zoom_start=11)
for lat, lng, neighborhood in zip(df_dxb['Latitude'], df_dxb['Longitude'], df_dxb['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_dxb)

map_dxb


In [228]:
# Experiment: MarkerCluster demo on US Census ZIP code (ZCTA) data, unrelated to the Dubai analysis
# (note: this cell reassigns df)
import pandas as pd
import folium
from folium.plugins import FastMarkerCluster, MarkerCluster

file_url = 'http://www2.census.gov/geo/docs/maps-data/data/gazetteer/2016_Gazetteer/2016_Gaz_zcta_national.zip'
# pandas would read ZIP codes as numbers and drop their leading zeroes, so keep GEOID as object (string) dtype
df = pd.read_csv(file_url, sep='\t', dtype={'GEOID' : object}) 
df.columns = df.columns.str.strip() #some column names have some padding

df = df.sample(1000)

folium_map = folium.Map(location=[38, -97],
                        zoom_start=4.4,
                        tiles='CartoDB dark_matter')

mc = MarkerCluster(name="Marker Cluster")

for index, row in df.dropna().iterrows():
    popup_text = "{}<br> ALAND: {:,}<br> AWATER: {:,}".format(
                      index,
                      row["ALAND_SQMI"],
                      row["AWATER_SQMI"]
                      )
    folium.CircleMarker(location=[row["INTPTLAT"],row["INTPTLONG"]],
                        radius= 10,
                        color="red",
                        popup=popup_text,
                        fill=True).add_to(mc)

mc.add_to(folium_map)

folium.LayerControl().add_to(folium_map)




In [229]:
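# This cell defined the Foursquare credentials (CLIENT_ID, CLIENT_SECRET) and the API VERSION used below.
# The code was removed by Watson Studio for sharing.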
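The removed cell defined `CLIENT_ID`, `CLIENT_SECRET` and `VERSION`, which the next cell uses. A minimal sketch of an equivalent cell that keeps the secrets out of the notebook, assuming they are stored in environment variables with the hypothetical names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`. The `VERSION` value is a placeholder date in the `YYYYMMDD` form that the v2 API's `v` parameter takes, not the value the author used.

```python
import os

# Hypothetical environment-variable names; set them before starting the notebook
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
VERSION = "20180605"  # placeholder API version date (YYYYMMDD)

if not CLIENT_ID or not CLIENT_SECRET:
    print("Foursquare credentials are not set; the venue queries below will fail.")
```

Reading the credentials from the environment means the notebook can be shared without Watson Studio having to strip a cell.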

In [231]:
LIMIT = 100    # maximum number of venues returned per request
radius = 2000  # search radius in meters

venues = []

for lat, lng, neighborhood in zip(df_dxb['Latitude'], df_dxb['Longitude'], df_dxb['Neighborhood']):

    # query parameters for the Foursquare venues/explore endpoint
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': LIMIT,
    }

    # make the GET request and stop on HTTP errors instead of failing later on a missing key
    response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
    response.raise_for_status()
    results = response.json()['response']['groups'][0]['items']

    # keep only the relevant information for each nearby venue
    for item in results:
        venue = item['venue']
        if not venue.get('categories'):
            continue  # skip venues without a category
        venues.append((
            neighborhood,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            venue['categories'][0]['name']))

In [232]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(7585, 7)


| | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Abu Hail | 25.28308 | 55.33435 | Habib Bakery | 25.281124 | 55.332774 | Bakery |
| 1 | Abu Hail | 25.28308 | 55.33435 | Gold's Gym | 25.282698 | 55.341019 | Gym |
| 2 | Abu Hail | 25.28308 | 55.33435 | Al Douri Roastery | 25.277057 | 55.328223 | Bakery |
| 3 | Abu Hail | 25.28308 | 55.33435 | Union Co-Operative Society | 25.282769 | 55.340896 | Department Store |
| 4 | Abu Hail | 25.28308 | 55.33435 | Fitness Time (وقت اللياقة) | 25.289077 | 55.347913 | Gym |


In [233]:
venues_df.groupby(["Neighborhood"]).count()

| Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|
| Abu Hail | 100 | 100 | 100 | 100 | 100 | 100 |
| Al Bada | 100 | 100 | 100 | 100 | 100 | 100 |
| Al Baraha | 100 | 100 | 100 | 100 | 100 | 100 |
| Al Buteen | 100 | 100 | 100 | 100 | 100 | 100 |
| Al Dhagaya | 100 | 100 | 100 | 100 | 100 | 100 |
| Al Garhoud | 100 | 100 | 100 | 100 | 100 | 100 |
| Al Hamriya Port | 55 | 55 | 55 | 55 | 55 | 55 |
| Al Hamriya, Dubai | 100 | 100 | 100 | 100 | 100 | 100 |
| Al Hudaiba | 100 | 100 | 100 | 100 | 100 | 100 |
| Al Jaddaf | 70 | 70 | 70 | 70 | 70 | 70 |


In [234]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 303 unique categories.


In [235]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Bakery', 'Gym', 'Department Store', 'Market',
       'Performing Arts Venue', 'Fast Food Restaurant',
       'Seafood Restaurant', 'Indian Restaurant', 'Hotel',
       'Mediterranean Restaurant', 'Restaurant', 'Café',
       'Fried Chicken Joint', 'Asian Restaurant', 'Iraqi Restaurant',
       'Ice Cream Shop', 'Middle Eastern Restaurant',
       'American Restaurant', 'Dessert Shop', 'Bowling Alley',
       'Nightclub', 'Lounge', 'Burger Joint', 'Burrito Place',
       'BBQ Joint', 'Comedy Club', 'Pool Hall', 'Buffet', 'Pizza Place',
       'Fishing Store', 'Bavarian Restaurant', 'Italian Restaurant',
       'Gym / Fitness Center', 'Coffee Shop', 'Thai Restaurant',
       'Hotel Bar', 'Hookah Bar', 'Convenience Store',
       'Moroccan Restaurant', 'Filipino Restaurant', 'Chinese Restaurant',
       'Smoke Shop', 'Sports Bar', 'Food Court', 'Soccer Field',
       'Tea Room', 'Arepa Restaurant', 'Bar', 'Beach', 'Shopping Plaza'],
      dtype=object)

In [236]:
# check if the results contain "Restaurant"
"Restaurant" in venues_df['VenueCategory'].unique()

True
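The check above only confirms that the generic `Restaurant` category exists. Foursquare also returns cuisine-specific categories such as `Indian Restaurant` or `Seafood Restaurant`, and the clustering below uses only the generic `Restaurant` column. If every restaurant type should count, one option (not what the notebook does) is to select each category whose name ends with `Restaurant`; note this still excludes places such as `Burger Joint` or `Café`. A sketch on a toy frame with made-up rows:

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["Abu Hail", "Abu Hail", "Abu Hail", "Al Bada"],
    "VenueCategory": ["Bakery", "Indian Restaurant", "Restaurant", "Seafood Restaurant"],
})

# True for every category whose name ends with "Restaurant"
is_restaurant = venues["VenueCategory"].str.endswith("Restaurant").astype(float)

# Share of each neighborhood's venues that are restaurants of any type
restaurant_share = is_restaurant.groupby(venues["Neighborhood"]).mean()
print(restaurant_share)
```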

In [237]:
# one hot encoding
dxb_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dxb_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dxb_onehot.columns[-1]] + list(dxb_onehot.columns[:-1])
dxb_onehot = dxb_onehot[fixed_columns]

print(dxb_onehot.shape)
dxb_onehot.head()

(7585, 304)


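| | Neighborhoods | Accessories Store | Afghan Restaurant | African Restaurant | Airport | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Amphitheater | ... | Volleyball Court | Watch Shop | Water Park | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abu Hail | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Abu Hail | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Abu Hail | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Abu Hail | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Abu Hail | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |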


In [238]:
dxb_grouped = dxb_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(dxb_grouped.shape)
dxb_grouped

(101, 304)


| | Neighborhoods | Accessories Store | Afghan Restaurant | African Restaurant | Airport | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Amphitheater | ... | Volleyball Court | Watch Shop | Water Park | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abu Hail | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Al Bada | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Al Baraha | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 |
| 3 | Al Buteen | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 |
| 4 | Al Dhagaya | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 |
| 5 | Al Garhoud | 0.01 | 0.0 | 0.0 | 0.01 | 0.1 | 0.02 | 0.0 | 0.01 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Al Hamriya Port | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.018182 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Al Hamriya, Dubai | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Al Hudaiba | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Al Jaddaf | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.028571 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
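Taking the mean of the one-hot columns per neighborhood turns them into the share of that neighborhood's returned venues in each category; for example, a `Restaurant` value of 0.04 for Abu Hail, which returned 100 venues, means 4 of them are in the generic `Restaurant` category. A toy sketch of the same two steps, with made-up neighborhoods and categories:

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Café", "Restaurant", "Café", "Gym", "Restaurant", "Restaurant"],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues["VenueCategory"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of 0/1 columns = fraction of each neighborhood's venues in that category
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, since every venue has exactly one (primary) category.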


In [240]:
len(dxb_grouped[dxb_grouped["Restaurant"] > 0])

85

In [246]:
dxb_rest = dxb_grouped[["Neighborhoods","Restaurant"]]

In [247]:
dxb_rest

| | Neighborhoods | Restaurant |
|---|---|---|
| 0 | Abu Hail | 0.04 |
| 1 | Al Bada | 0.02 |
| 2 | Al Baraha | 0.06 |
| 3 | Al Buteen | 0.05 |
| 4 | Al Dhagaya | 0.05 |
| 5 | Al Garhoud | 0.02 |
| 6 | Al Hamriya Port | 0.018182 |
| 7 | Al Hamriya, Dubai | 0.04 |
| 8 | Al Hudaiba | 0.06 |
| 9 | Al Jaddaf | 0.028571 |


In [248]:
# set number of clusters
kclusters = 3

dxb_clustering = dxb_rest.drop(columns=["Neighborhoods"])

# run k-means clustering (n_init set explicitly; its default changed in newer scikit-learn)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(dxb_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 0, 0, 0, 1, 1, 0, 0, 1], dtype=int32)
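`kclusters = 3` is chosen by hand. One common way to sanity-check that choice (not something the original notebook does) is the elbow method: fit k-means for several values of k and look for the point where the within-cluster sum of squares (`inertia_`) stops dropping sharply. A sketch on made-up one-dimensional restaurant shares, using scikit-learn as the notebook does:

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up restaurant shares for ten neighborhoods (one feature per row)
shares = np.array([0.01, 0.02, 0.02, 0.03, 0.05, 0.05, 0.06, 0.08, 0.08, 0.09]).reshape(-1, 1)

inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(shares)
    inertias.append(model.inertia_)  # within-cluster sum of squared distances

for k, inertia in enumerate(inertias, start=1):
    print(k, round(inertia, 5))
```

With the real data, the same loop would run on `dxb_clustering` instead of `shares`.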

In [249]:
# create a new dataframe that holds each neighborhood's restaurant share and its cluster label
dxb_merged = dxb_rest.copy()

# add clustering labels
dxb_merged["Cluster Labels"] = kmeans.labels_

In [250]:
dxb_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
dxb_merged.head()

| | Neighborhood | Restaurant | Cluster Labels |
|---|---|---|---|
| 0 | Abu Hail | 0.04 | 0 |
| 1 | Al Bada | 0.02 | 1 |
| 2 | Al Baraha | 0.06 | 0 |
| 3 | Al Buteen | 0.05 | 0 |
| 4 | Al Dhagaya | 0.05 | 0 |


In [252]:

# merge dxb_merged with df_dxb to add population and latitude/longitude for each neighborhood
dxb_merged = dxb_merged.join(df_dxb.set_index("Neighborhood"), on="Neighborhood")

print(dxb_merged.shape)
dxb_merged.head() # check the last columns!


(101, 8)


| | Neighborhood | Restaurant | Cluster Labels | Num | Population | Latitude | Longitude | pop_quartile |
|---|---|---|---|---|---|---|---|---|
| 0 | Abu Hail | 0.04 | 0 | 126 | 21414 | 25.28308 | 55.33435 | 1 |
| 1 | Al Bada | 0.02 | 1 | 333 | 18816 | 25.21861 | 55.26406 | 1 |
| 2 | Al Baraha | 0.06 | 0 | 122 | 7823 | 25.2828 | 55.31678 | 3 |
| 3 | Al Buteen | 0.05 | 0 | 114 | 2364 | 25.26925 | 55.29944 | 1 |
| 4 | Al Dhagaya | 0.05 | 0 | 113 | 10896 | 25.27217 | 55.30157 | 0 |


In [253]:
# sort the results by Cluster Labels
print(dxb_merged.shape)
dxb_merged.sort_values(["Cluster Labels"], inplace=True)
dxb_merged

(101, 8)


| | Neighborhood | Restaurant | Cluster Labels | Num | Population | Latitude | Longitude | pop_quartile |
|---|---|---|---|---|---|---|---|---|
| 0 | Abu Hail | 0.04 | 0 | 126 | 21414 | 25.28308 | 55.33435 | 1 |
| 36 | Al Qusais Industrial First | 0.051546 | 0 | 242 | 2099 | 25.29176 | 55.38661 | 1 |
| 37 | Al Qusais Industrial Fourth | 0.084746 | 0 | 247 | 206 | 25.29533 | 55.39675 | 1 |
| 38 | Al Qusais Industrial Second | 0.082192 | 0 | 243 | 2090 | 25.29026 | 55.39364 | 1 |
| 39 | Al Qusais Industrial Third | 0.083333 | 0 | 246 | 162 | 25.29585 | 55.3954 | 0 |
| 40 | Al Qusais Second | 0.046154 | 0 | 233 | 7657 | 25.26563 | 55.38771 | 3 |
| 43 | Al Ras | 0.05 | 0 | 112 | 6812 | 25.26758 | 55.29459 | 3 |
| 45 | Al Rigga | 0.07 | 0 | 119 | 5684 | 25.26706 | 55.3089 | 3 |
| 46 | Al Sabkha | 0.06 | 0 | 115 | 2627 | 25.26895 | 55.30257 | 1 |
| 48 | Al Safa Second | 0.0375 | 0 | 357 | 6291 | 25.16633 | 55.23183 | 3 |
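The sorted table is easier to interpret with one summary row per cluster: how many neighborhoods each cluster holds and the range of restaurant shares within it. A sketch on a toy frame with the same column names as `dxb_merged` (neighborhood names and values made up):

```python
import pandas as pd

merged = pd.DataFrame({
    "Neighborhood": ["N1", "N2", "N3", "N4", "N5"],
    "Restaurant": [0.04, 0.02, 0.06, 0.05, 0.018],
    "Cluster Labels": [0, 1, 0, 0, 1],
})

# One row per cluster: size and spread of the restaurant share
summary = merged.groupby("Cluster Labels")["Restaurant"].agg(["count", "mean", "min", "max"])
print(summary)
```

Because the clustering uses a single feature, each cluster corresponds to a band of restaurant shares, and this summary shows where those bands lie.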


In [254]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; cluster labels run from 0 to kclusters - 1, so they index rainbow directly
for lat, lon, poi, cluster in zip(dxb_merged['Latitude'], dxb_merged['Longitude'], dxb_merged['Neighborhood'], dxb_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters