<a href="https://colab.research.google.com/github/makhlouf279/Capstone-Project---The-Battle-of-Neighborhoods/blob/main/The_Battle_of_Neighborhoods_Dubai_Coffe_Shop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Objective:**

As part of the Capstone project, we will perform an exploratory analysis of location data, using the Foursquare API to look up the venues around each community.

Our objective is to find an ideal location in Dubai to open a coffee shop.

**Stage 1: Data:**

For this objective, we will use open data from the Dubai Statistics Center. The data is available as an Excel sheet, which requires a considerable amount of cleaning. The data source is available at the following location:

Report URL https://www.dsc.gov.ae/Report/DSC_SYB_2019_01%20_%2002.xlsx

We chose this data source because it lists Dubai's communities and their estimated populations as of 2019.

In [125]:
import requests # library to handle HTTP requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
from bs4 import BeautifulSoup # library for web scraping
import geocoder
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
# libraries for displaying images
from IPython.display import Image, HTML
# pd.json_normalize is used later to flatten JSON, so the deprecated
# `from pandas.io.json import json_normalize` import is not needed
import folium # plotting library
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
import json
print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


In [126]:
%pip install geocoder




Let's set the credentials we will use for Foursquare API calls.

In [127]:
CLIENT_ID = 'DN4KQ4LBO2NND0FBGSE4CFCNI1DZ4X5YUQFGZGIT5X4V01ZD' #  my Foursquare ID
CLIENT_SECRET = 'P2KICLYC0BCWJUWHEIVSAUXN0OLUGH5V0MHCVGKNO4DOMW13' #  my Foursquare Secret
VERSION = '20201124'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: DN4KQ4LBO2NND0FBGSE4CFCNI1DZ4X5YUQFGZGIT5X4V01ZD
CLIENT_SECRET:P2KICLYC0BCWJUWHEIVSAUXN0OLUGH5V0MHCVGKNO4DOMW13


**Step 1.1: Extract data**

In [128]:
# reading excel report from the source.

data_url = 'https://www.dsc.gov.ae/Report/DSC_SYB_2019_01%20_%2002.xlsx'
df_dubai = pd.read_excel(data_url)

# determining structure
df_dubai.shape

(247, 5)

**Step 1.2: Data Wrangling:**

Because the report has a considerable amount of header and footer data, we will be removing it.

In [129]:
df_dubai

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,,,,,
1,عدد السكان المقدر حسب القطاع والمنطقة - إمارة ...,,,,
2,Number of Estimated Population by Sector and C...,,,,
3,` (2019),,,,
4,جـــدول ( 02 - 01 ) Table,,,,
5,رقم \nالمنطقة,القطاع والمنطقة,مجموع السكان\nTotal population,Sector & Community,Community Code
6,,,,,
7,101,نخلة ديرة,2,NAKHLAT DEIRA,101
8,111,الكورنيش,1735,AL CORNICHE,111
9,112,الرأس,7460,AL RASS,112


In [130]:
# removing header information

df_dubai = df_dubai.iloc[7:]
df_dubai.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4
7,101,نخلة ديرة,2,NAKHLAT DEIRA,101
8,111,الكورنيش,1735,AL CORNICHE,111
9,112,الرأس,7460,AL RASS,112
10,113,الضغاية,15899,AL DHAGAYA,113
11,114,البطين,2841,AL BUTEEN,114


In [131]:
# removing footer from the report

df_dubai = df_dubai[:-6]
df_dubai.tail()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4
236,978,سيح شعيله,3,SAIH SHUA'ALAH,978
237,981,مقطره,804,MUGATRAH,981
238,987,الليان 1,10,AL LAYAN 1,987
239,988,الليان 2,0,AL LAYAN 2,988
240,991,حفير,0,HEFAIR,991
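The fixed offsets used above (`iloc[7:]` and `[:-6]`) depend on the exact layout of the 2019 report. As a hedged alternative, the sketch below keeps only the rows whose community code parses as a number. The toy frame and the helper name `keep_data_rows` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def keep_data_rows(df, code_col):
    """Keep only rows whose value in `code_col` can be parsed as a number."""
    codes = pd.to_numeric(df[code_col], errors="coerce")  # non-numeric -> NaN
    return df[codes.notna()]

# Toy stand-in for the raw report: title rows, two data rows, and a footer note.
raw = pd.DataFrame({
    "Unnamed: 0": [None, "Table 01-02", 101, 111, "Source: DSC"],
    "Unnamed: 3": [None, None, "NAKHLAT DEIRA", "AL CORNICHE", None],
})

clean = keep_data_rows(raw, "Unnamed: 0")
print(clean["Unnamed: 3"].tolist())
```

Whether this rule also handles the sector divider rows depends on how they are coded in the real sheet, so it is worth checking the result against the offsets used above.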


The report also contains additional columns which we do not require as they represent the same information in Arabic.

In [133]:
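# .copy() gives an independent frame, so the in-place edits that follow do not act on a slice
df_dubai = df_dubai[['Unnamed: 2', 'Unnamed: 3']].copy()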

Renaming column headers

In [134]:
df_dubai.rename(columns = {'Unnamed: 2':'population', 'Unnamed: 3':'community'}, inplace = True)

**Step 1.3: Data Wrangling Continues:**

If you look at the report, the communities are split into sectors. These sectors appear in the report as divider rows, which we need to remove.

In [135]:
# Get the indexes of the rows whose community column holds a sector label such as 'Sector 1'

sector_index = df_dubai[df_dubai['community'].isin(['Sector 1', 'Sector 2', 'Sector 3', 'Sector 4', 'Sector 5', 'Sector 6', 'Sector 7', 'Sector 8'])].index

# dropping the rows based on the indexes found
df_dubai.drop(sector_index, inplace = True)

df_dubai.shape

(226, 2)
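Listing every sector label by hand works for this report, but a pattern match is less brittle if the number of sectors changes. The sketch below is a minimal illustration on made-up rows; it assumes the divider rows always read "Sector" followed by a number.

```python
import pandas as pd

# Toy frame: two sector divider rows mixed in with two real communities.
df = pd.DataFrame({
    "community": ["Sector 1", "AL CORNICHE", "Sector 12", "AL RASS"],
    "population": [None, 1735, None, 7460],
})

# Match any "Sector <number>" row, tolerating stray surrounding whitespace.
is_sector = df["community"].str.strip().str.match(r"^Sector \d+$")
communities = df[~is_sector].reset_index(drop=True)
print(communities["community"].tolist())
```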

Let's change the order of the columns in our dataframe.

In [136]:
df_dubai = df_dubai[['community', 'population']]
df_dubai.head()

Unnamed: 0,community,population
7,NAKHLAT DEIRA,2
8,AL CORNICHE,1735
9,AL RASS,7460
10,AL DHAGAYA,15899
11,AL BUTEEN,2841


Let's sort the dataframe by population (descending).

In [137]:
df_dubai.sort_values(by = ['population'], inplace = True, ascending = False)
df_dubai.head(10)

Unnamed: 0,community,population
56,MUHAISANAH SECOND,196316
107,AL GOZE IND. SECOND,159978
153,JABAL ALI INDUSTRIAL FIRST,128975
163,WARSAN FIRST,106072
23,HOR AL ANZ,83187
147,JABAL ALI FIRST,75287
77,AL KARAMA,75066
152,DUBAI INVESTMENT PARK1,69956
20,AL MURQABAT,69771
51,MURDAF,64355



We will extract coordinates with GeoPy, which can query Google Maps or another geocoding provider. While scouting the data, I noticed that the area names in our report carry suffixes such as FIRST, SECOND, and THIRD, while the same areas are labeled with the numbers 1, 2, and 3 in Google Maps.

This means that if I pass WARSAN FIRST to GeoPy, it won't find the coordinates. To solve this problem, we will replace the suffixes with numerals.

In [138]:
df_dubai.replace('FIRST', '1', regex = True, inplace = True)
df_dubai.replace('SECOND', '2', regex = True, inplace = True)
df_dubai.replace('THIRD', '3', regex = True, inplace = True)
df_dubai.replace('FOURTH', '4', regex = True, inplace = True)
df_dubai.replace('FIFTH', '5', regex = True, inplace = True)
df_dubai.replace('SIXTH', '6', regex = True, inplace = True)
df_dubai.head(5)

Unnamed: 0,community,population
56,MUHAISANAH 2,196316
107,AL GOZE IND. 2,159978
153,JABAL ALI INDUSTRIAL 1,128975
163,WARSAN 1,106072
23,HOR AL ANZ,83187
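The six separate replacements above can also be expressed as one pass with a lookup table. The following is a small sketch using only the standard library; the helper name `ordinal_to_digit` is an assumption made for illustration, and the word boundaries (`\b`) are an added safeguard so that an ordinal embedded in a longer word is left alone.

```python
import re

# Ordinal suffixes used in the report and the digits map providers use.
ORDINALS = {"FIRST": "1", "SECOND": "2", "THIRD": "3",
            "FOURTH": "4", "FIFTH": "5", "SIXTH": "6"}
PATTERN = re.compile(r"\b(" + "|".join(ORDINALS) + r")\b")

def ordinal_to_digit(name):
    """Replace whole-word ordinals (FIRST..SIXTH) with their digit."""
    return PATTERN.sub(lambda m: ORDINALS[m.group(1)], name)

print(ordinal_to_digit("WARSAN FIRST"))
print(ordinal_to_digit("AL GOZE IND. SECOND"))
```

Applied to the dataframe, this would presumably look like `df_dubai['community'].map(ordinal_to_digit)`.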



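Removing industrial areas from our list of communities, as we are only interested in commercial and residential areas for our coffee shop.

In [139]:
# str.contains treats the pattern as a regex, so the '.' matches any character:
# this drops names containing 'IND.' as well as 'INDUSTRIAL'
df_dubai = df_dubai[~df_dubai.community.str.contains('IND.')]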
df_dubai.head()

Unnamed: 0,community,population
56,MUHAISANAH 2,196316
163,WARSAN 1,106072
23,HOR AL ANZ,83187
147,JABAL ALI 1,75287
77,AL KARAMA,75066



Some locality names in this dataset differ from the way map providers spell them. For example, 'Al Quoz' appears as 'Al Goze'. This inconsistency could cause populated areas to be excluded from our analysis. The following naming corrections were needed.

In [141]:
df_dubai.replace('JABAL ALI 1', 'JEBEL ALI', regex = True, inplace = True)
df_dubai.replace('AL KALIJ AL TEJARI', 'BUSINESS BAY', inplace = True)
df_dubai.replace('AL WAHEDA', 'AL WUHEIDA', inplace = True)
df_dubai.replace('AL THANYAH 3 (EMIRATE HILLS 2)', 'EMIRATES HILLS 2', inplace = True)
df_dubai.replace('NADD HESSA', 'DUBAI SILICON OASIS', inplace = True)
df_dubai.replace('AL THANYAH 1 (V. RABIE SAHRA\'A)', 'TECOM', inplace = True)
df_dubai.replace('MENA JABAL ALI', 'JEBEL ALI NORTH FREE ZONE', inplace = True)
df_dubai.replace('AL HEBIAH 4', 'DUBAI SPORTS CITY', inplace = True)
df_dubai.replace('UM SOUQAIM 2', 'UMM SUQEIM 2', inplace = True)
df_dubai.replace('UM SOUQAIM 1', 'UMM SUQUEIM 1', inplace = True)
df_dubai.replace('AL HEBIAH 1', 'MOTOR CITY', inplace = True)
df_dubai.replace('MURDAF', 'MIRDIF', regex = True, inplace = True)
df_dubai.replace('PARK1', 'PARK 1', regex = True, inplace = True)
df_dubai.replace('PARK2', 'PARK 2', regex = True, inplace = True)
df_dubai.replace('MURQABAT', 'MURAQABAT', regex = True, inplace = True)
df_dubai.replace('MARSA DUBAI (AL MINA AL SEYAHI) ', 'MARSA DUBAI', inplace = True)
df_dubai.replace('AL BADA', 'AL BADA\'A', regex = True, inplace = True)
df_dubai.replace('SUQ', 'SOUQ', regex = True, inplace = True)
df_dubai.replace('AL THANYAH 5 (EMIRATE HILLS 1) ', 'EMIRATES HILLS 1', inplace = True)
df_dubai.replace('AL THANYAH 4 (EMIRATE HILLS 3) ', 'EMIRATES HILLS 3', inplace = True)
df_dubai.replace('NAD AL HAMAR', 'NADD AL HAMAR', inplace = True)
df_dubai.replace('AL SOUQ AL KABEER', 'BUR DUBAI', inplace = True)
df_dubai.replace('AL BAESHAA 2', 'AL BARSHA 2', inplace = True)
df_dubai.replace('MADINAT DUBAI AL MELAHEYAH (AL MINA)', 'DUBAI MARITIME CITY', inplace = True)
df_dubai.replace('AL DHAGAYA', 'AL RAS', inplace = True)
df_dubai.replace('GOZE', 'QUOZ', regex = True, inplace = True)
df_dubai.replace('AL REGA', 'AL RIGGA', inplace = True)
df_dubai.replace('WADI AL SAFA 3', 'LIVING LEGENDS', inplace = True)
df_dubai.replace('AL HEBIAH 5', 'REMRAAM', inplace = True)
df_dubai.replace('AL SAFFA 1', 'AL SAFA 1', inplace = True)
df_dubai.replace('UM SOUQAIM 3', 'UMM SUQEIM 3', inplace = True)
df_dubai.replace('REGA AL BUTEEN', 'RIGGAT AL BUTEEN', inplace = True)
pd.set_option('display.max_rows', None)
df_dubai.replace('MUHAISANAH 4', 'MUHAISNAH 4', inplace = True)
df_dubai.replace('OUD AL MUTEEN 1', 'OUD AL MUTEENA 1', inplace = True)
df_dubai.replace('WADI AL SAFA 6 (ARABIAN RANCHES)', 'ARABIAN RANCHES', inplace = True)

Now that we have our desired dataframe, we will proceed to Stage 2 of our work.

**Stage 2: Coordinates:**

In stage 2, we will extract each community's coordinates and append it to our data frame.

To limit the time spent on geocoding, we will only obtain the coordinates of the 100 most populous communities.

**Step 2.1: Top 100**

In [142]:
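# keeping the 200 most populous communities as a working set (only the first 100 are geocoded below);
# taking an explicit copy so that new columns can be added safely

df_communities = df_dubai.head(200).copy()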

**Step 2.2: GeoPy**

In [143]:
# defining function to get coordinates based on community name

def get_latitude_longitude(community_name):
    geolocator = Nominatim(user_agent="waqa5_ahm3d_capstone")
    location = geolocator.geocode('{}, Dubai, United Arab Emirates'.format(community_name))
    
    # geocode() returns None when no match is found
    if location is None:
        raise ValueError('No coordinates found for {}'.format(community_name))
    
    return location.latitude, location.longitude
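Two practical points apply to the helper above: a geocoder can return no result for a name, and the public Nominatim service asks clients to stay at or below roughly one request per second. The sketch below shows one way to handle both. The function `geocode_all` and the stand-in `fake_geocode` are hypothetical names used so the example runs without network access; in the notebook, a real `geolocator.geocode` would be passed in instead.

```python
import time
from collections import namedtuple

Location = namedtuple("Location", ["latitude", "longitude"])

def geocode_all(names, geocode_fn, delay_seconds=1.0):
    """Return {name: (lat, lng)} for every name that geocode_fn can resolve."""
    coords = {}
    for name in names:
        location = geocode_fn("{}, Dubai, United Arab Emirates".format(name))
        if location is not None:          # unresolved names are simply skipped
            coords[name] = (location.latitude, location.longitude)
        time.sleep(delay_seconds)         # pause between requests
    return coords

def fake_geocode(query):
    """Offline stand-in for a geocoder; knows a single address."""
    known = {"AL KARAMA, Dubai, United Arab Emirates": Location(25.238448, 55.303458)}
    return known.get(query)

result = geocode_all(["AL KARAMA", "NOT A REAL PLACE"], fake_geocode, delay_seconds=0)
print(result)
```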

Now it is time to loop through the top 100 communities and add their coordinates to the dataframe.

In [144]:

for i, row in df_communities.head(100).iterrows():
    community_name = row['community']
    
    # Function call
    try:
        lat, lng = get_latitude_longitude(community_name)
    except Exception:
        # skip communities that cannot be geocoded; their NaN rows are dropped later
        continue
    
    # Appending to dataframe
    df_communities.loc[i, 'latitude'] = lat
    df_communities.loc[i, 'longitude'] = lng


In [146]:
#Dropping NaN entries from our dataset

df_communities.dropna(inplace = True)
df_communities.head(10)


Unnamed: 0,community,population,latitude,longitude
56,MUHAISANAH 2,196316,25.280555,55.410502
163,WARSAN 1,106072,25.163154,55.422077
23,HOR AL ANZ,83187,25.279548,55.341053
147,JEBEL ALI,75287,25.040996,55.13356
77,AL KARAMA,75066,25.238448,55.303458
152,DUBAI INVESTMENT PARK 1,69956,25.010873,55.165855
20,AL MURAQABAT,69771,25.265104,55.329721
51,MIRDIF,64355,25.220229,55.423
43,AL NAHDA 2,61936,25.290592,55.376731
121,MARSA DUBAI,61047,25.087754,55.146172



As you can see above, it takes a lot of effort to make the data usable for our requirements.

I will save this dataset and publish it on Kaggle for anyone looking for the top 100 communities in Dubai along with their populations.

In [147]:
print('The dataframe has {} communities.'.format(df_communities['community'].nunique()))

#Resetting index

df_communities.reset_index(drop=True, inplace=True)

The dataframe has 87 communities.



**Stage 3: Mapping:**

Let's take a look at Dubai and see where all the communities in our dataset are located.

For mapping, we will be using Folium.

**Step 3.1: Get Dubai city coordinates**

In [148]:
#Using Nominatim, we will get latitude and longitude for Dubai city

dxb_address = 'Dubai, United Arab Emirates'

geolocator = Nominatim(user_agent="dxb_explorer")
location = geolocator.geocode(dxb_address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Dubai are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Dubai are 25.0750095, 55.18876088183319.


**Step 3.2: Mapping Dubai via Folium**

With Folium, we will map out Dubai and then place markers for each community we have in our dataframe

In [149]:
# create map of Dubai using latitude and longitude values
map_dubai = folium.Map(location = [latitude, longitude], zoom_start = 11)

# add markers to map
for lat, lng, label in zip(df_communities['latitude'], df_communities['longitude'], df_communities['community']):
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 7,
        popup = label,
        color = 'red',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_dubai)  
    
map_dubai

Because we sorted our dataframe by population and picked the top 100 communities from the complete dataset, we cover most of the residential and commercial communities, although we did miss a few of them.

For the purpose at hand, I think we are good to go.

**Stage 4: Foursquare**

Now that we have everything we need, let's proceed to the next step: Foursquare.

**Step 4.1: Start Small**

Let's start with a single community and see what we get from Foursquare

In [150]:

df_communities.loc[0, 'community']

'MUHAISANAH 2 '

Getting coordinates

In [151]:
community_latitude = df_communities.loc[0, 'latitude'] # neighborhood latitude value
community_longitude = df_communities.loc[0, 'longitude'] # neighborhood longitude value

community_name = df_communities.loc[0, 'community'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(community_name, community_latitude, community_longitude))

Latitude and longitude values of MUHAISANAH 2  are 25.2805548, 55.4105021.



Let's generate the GET URL for the Foursquare API call. We will request up to 100 venues within a 500-meter radius of the community's coordinates.

In [152]:

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, community_latitude, community_longitude, VERSION, 500, 100)
url

'https://api.foursquare.com/v2/venues/explore?client_id=DN4KQ4LBO2NND0FBGSE4CFCNI1DZ4X5YUQFGZGIT5X4V01ZD&client_secret=P2KICLYC0BCWJUWHEIVSAUXN0OLUGH5V0MHCVGKNO4DOMW13&ll=25.2805548,55.4105021&v=20201124&radius=500&limit=100'
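Formatting the URL by hand places the client secret in the notebook source and in its printed output. A hedged alternative is to read the credentials from environment variables and let the standard library encode the query string. In the sketch below, the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are assumptions introduced for illustration; the endpoint and parameter names are those used in the cell above.

```python
import os
from urllib.parse import urlencode

def build_explore_url(lat, lng, radius=500, limit=100, version="20201124"):
    """Build the venues/explore URL with credentials taken from the environment."""
    params = {
        # Hypothetical environment variable names; placeholders are used if unset.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID"),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET"),
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes values, so the comma in `ll` becomes %2C.
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

url = build_explore_url(25.2805548, 55.4105021)
print(url.split("?")[0])
```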

In [153]:
small_results = requests.get(url).json()

In [154]:
# function that extracts the category of the venue

def get_category_type(row):
    # the column is called 'categories' or 'venue.categories',
    # depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [155]:
# Cleaning the results

venues = small_results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

Unnamed: 0,name,categories,lat,lng
0,McDonald's (ماكدونالدز),Fast Food Restaurant,25.281423,55.411649
1,LuLu Center,Grocery Store,25.280531,55.410506
2,McDonald's LuLu Village,Fast Food Restaurant,25.281949,55.411271
3,Al Ansari Exchange,Currency Exchange,25.281536,55.411369
4,Lion Gym,Gym,25.278308,55.412009
5,Amer Quick Plus,Business Service,25.280714,55.41544



Let's see the total number of venues returned by Foursquare

In [156]:
print('{} venues were returned by Foursquare for {}.'.format(nearby_venues.shape[0], community_name))

6 venues were returned by Foursquare for MUHAISANAH 2 .



**Step 4.2: Explore Dubai**

As our initial test for a single community turned out well, let's get the list of venues across all communities.

For this purpose, let's create a function that loops through all communities and compiles the list of venues.

In [157]:
def getDubaiVenues(names, latitudes, longitudes, radius = 500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue,
        # skipping any venue that comes back without a category
        venues_list.append([(
            name, 
            lat, 
            lng,
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Community',
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Category']
    
    return nearby_venues

Let's Loop

In [158]:
LIMIT = 100

dubai_venues = getDubaiVenues(names = df_communities['community'], latitudes = df_communities['latitude'], longitudes = df_communities['longitude'])


Let's take a peek at the venues.

In [160]:

print('{} venues were returned by Foursquare.'.format(dubai_venues.shape[0]))

dubai_venues.head()

1496 venues were returned by Foursquare.


Unnamed: 0,Community,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Category
0,MUHAISANAH 2,25.280555,55.410502,McDonald's (ماكدونالدز),25.281423,55.411649,Fast Food Restaurant
1,MUHAISANAH 2,25.280555,55.410502,LuLu Center,25.280531,55.410506,Grocery Store
2,MUHAISANAH 2,25.280555,55.410502,McDonald's LuLu Village,25.281949,55.411271,Fast Food Restaurant
3,MUHAISANAH 2,25.280555,55.410502,Al Ansari Exchange,25.281536,55.411369,Currency Exchange
4,MUHAISANAH 2,25.280555,55.410502,Lion Gym,25.278308,55.412009,Gym


Let's check how many venues per community were returned by Foursquare

In [162]:
dubai_venues.groupby('Community').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Category
Community,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
ABU HAIL,4,4,4,4,4,4
AL BARAHA,13,13,13,13,13,13
AL BARSHA 2,7,7,7,7,7,7
AL BARSHA SOUTH 1,3,3,3,3,3,3
AL BARSHA SOUTH 2,3,3,3,3,3,3
AL BARSHA SOUTH 4,3,3,3,3,3,3
AL BARSHA SOUTH 5,3,3,3,3,3,3
AL BARSHAA 1,56,56,56,56,56,56
AL BARSHAA 3,56,56,56,56,56,56
AL GARHOUD,3,3,3,3,3,3



Because we are interested in the Coffee Shop category, let's see how many distinct venue categories Foursquare returned.

In [163]:
print('There are {} unique categories.'.format(len(dubai_venues['Category'].unique())))

There are 198 unique categories.


**Stage 5: Prepare & Analyze**

Let's start analyzing our data for each community and transform it so we can use it efficiently in the ML process.

**Step 5.1: Prepare**

Let's prepare our data so it conforms to what ML algorithms expect.

The categories provided by Foursquare are labels, and machine learning algorithms cannot operate on label data directly. They require all input and output variables to be numeric.

To transform our data into numerical form, we will perform one-hot encoding. This turns each category label into a feature column in our dataframe, with a value of 0 or 1.

In [164]:
# one hot encoding
dubai = pd.get_dummies(dubai_venues[['Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dubai['Community'] = dubai_venues['Community'] 

# move neighborhood column to the first column
fixed_columns = [dubai.columns[-1]] + list(dubai.columns[:-1])
dubai = dubai[fixed_columns]

print(dubai.shape)
dubai.head()

(1496, 199)


Unnamed: 0,Community,Accessories Store,Afghan Restaurant,African Restaurant,American Restaurant,Aquarium,Arcade,Art Gallery,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bakery,Bar,Basketball Court,Beach,Bed & Breakfast,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Buffet,Building,Burger Joint,Business Service,Cafeteria,Café,Campground,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Cupcake Shop,...,Scenic Lookout,Science Museum,Seafood Restaurant,Shawarma Place,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stables,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Syrian Restaurant,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theme Park Ride / Attraction,Theme Restaurant,Tour Provider,Tourist Information Center,Toy / Game Store,Track,Train Station,Tram Station,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Women's Store
0,MUHAISANAH 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,MUHAISANAH 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,MUHAISANAH 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,MUHAISANAH 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,MUHAISANAH 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Let's find the mean of each category for each community.

In [165]:
dubai_grouped = dubai.groupby(["Community"]).mean().reset_index()

dubai_grouped.head(10)

Unnamed: 0,Community,Accessories Store,Afghan Restaurant,African Restaurant,American Restaurant,Aquarium,Arcade,Art Gallery,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bakery,Bar,Basketball Court,Beach,Bed & Breakfast,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Buffet,Building,Burger Joint,Business Service,Cafeteria,Café,Campground,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Cupcake Shop,...,Scenic Lookout,Science Museum,Seafood Restaurant,Shawarma Place,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stables,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Syrian Restaurant,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theme Park Ride / Attraction,Theme Restaurant,Tour Provider,Tourist Information Center,Toy / Game Store,Track,Train Station,Tram Station,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Women's Store
0,ABU HAIL,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,AL BARAHA,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,AL BARSHA 2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,AL BARSHA SOUTH 1,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,AL BARSHA SOUTH 2,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,AL BARSHA SOUTH 4,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,AL BARSHA SOUTH 5,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,AL BARSHAA 1,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.035714,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.053571,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,...,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0
8,AL BARSHAA 3,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.035714,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.053571,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,...,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0
9,AL GARHOUD,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
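To make the two steps above concrete, here is a toy example with made-up communities "A" and "B". It shows that after one-hot encoding, the per-community mean of a category column is the share of that community's returned venues falling in the category, not a count. This is why a community with only three venues, one of them a coffee shop, scores 0.333.

```python
import pandas as pd

# Made-up venues: community A has 2 venues, community B has 4.
venues = pd.DataFrame({
    "Community": ["A", "A", "B", "B", "B", "B"],
    "Category": ["Coffee Shop", "Gym", "Coffee Shop", "Gym", "Gym", "Hotel"],
})

# One column per category, with 1 where the venue belongs to it.
onehot = pd.get_dummies(venues["Category"], dtype=int)
onehot.insert(0, "Community", venues["Community"])

# The mean of a 0/1 column is the fraction of venues in that category.
grouped = onehot.groupby("Community").mean()
print(grouped)
```

Each row of `grouped` sums to 1, so the values compare the mix of venues between communities rather than the absolute number of coffee shops.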



**Step 5.2: Filter**

Because our dataset includes nearly 200 categories and we are only interested in analyzing Coffee Shop, let's filter our dataframe.

Let's see all the categories available to us.

In [166]:
dubai_venues['Category'].unique()

array(['Fast Food Restaurant', 'Grocery Store', 'Currency Exchange',
       'Gym', 'Business Service', 'Indian Restaurant',
       'Chinese Restaurant', 'Moroccan Restaurant', 'Pool Hall',
       'Pizza Place', 'Kitchen Supply Store', 'Garden Center',
       'Department Store', 'Fried Chicken Joint', 'Spa',
       'African Restaurant', 'Restaurant', 'Building', 'Clothing Store',
       'Sandwich Place', 'Supermarket', 'Coffee Shop', 'Park',
       'Middle Eastern Restaurant', 'Vegetarian / Vegan Restaurant',
       'Seafood Restaurant', 'Café', 'Hookah Bar',
       'Indonesian Restaurant', 'Asian Restaurant', 'Japanese Restaurant',
       'South Indian Restaurant', 'Pakistani Restaurant', 'Juice Bar',
       'Korean Restaurant', 'Dessert Shop', 'Hotel',
       'Filipino Restaurant', 'Diner', 'Burger Joint', 'Ice Cream Shop',
       'Convenience Store', 'Dumpling Restaurant',
       'North Indian Restaurant', 'American Restaurant', 'Cosmetics Shop',
       'Paper / Office Supplies Store

As we can see above, we have a category called Coffee Shop. Let's see how many communities have at least one coffee shop.

In [167]:
len(dubai_grouped[dubai_grouped['Coffee Shop'] > 0])

31

Let's filter and get the Coffee Shop share for each community.

In [168]:
coffee_venues = dubai_grouped[['Community', 'Coffee Shop']]
coffee_venues.sort_values(by = 'Coffee Shop', ascending=False).head(22)

Unnamed: 0,Community,Coffee Shop
32,AL SATWA,0.5
3,AL BARSHA SOUTH 1,0.333333
4,AL BARSHA SOUTH 2,0.333333
5,AL BARSHA SOUTH 4,0.333333
6,AL BARSHA SOUTH 5,0.333333
62,MIRDIF,0.263158
39,AL WASL,0.25
12,AL JAFLIYA,0.214286
55,JEBEL ALI,0.166667
63,MOTOR CITY,0.148148



**Stage 6: Clustering & Analysis**

Now let's create clusters of communities based on their share of coffee shops. Once we have a visual of the clusters, we can break them down and see which communities fall into each one.

**Step 6.1: Clustering**

Let's cluster our communities into 5 groups.

In [169]:
# set number of clusters
k = 5

dxb_clustering = coffee_venues.drop(columns = ["Community"])

# run k-means clustering
kmeans = KMeans(n_clusters = k, random_state = 0).fit(dxb_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([3, 3, 2, 1, 1, 1, 1, 3, 3, 3], dtype=int32)
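The cluster ids that k-means assigns are arbitrary, so cluster 3 does not mean "more coffee shops" than cluster 1. Since we cluster on a single feature, one optional step is to renumber the clusters in ascending order of their centers. The sketch below uses NumPy only; the helper `relabel_by_center` is hypothetical and the center values are made up for illustration. In the notebook, the inputs would presumably be `kmeans.labels_` and `kmeans.cluster_centers_`.

```python
import numpy as np

def relabel_by_center(labels, centers):
    """Renumber clusters so that 0 has the smallest center and k-1 the largest."""
    order = np.argsort(np.asarray(centers).ravel())  # old ids sorted by center value
    remap = np.empty_like(order)
    remap[order] = np.arange(len(order))             # old id -> new ordered id
    return remap[np.asarray(labels)]

# Illustrative values only: five clusters with one-dimensional centers.
labels = np.array([3, 3, 2, 1, 1])
centers = np.array([[0.03], [0.33], [0.15], [0.0], [0.5]])

print(relabel_by_center(labels, centers))
```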


Create a new dataframe that holds the cluster label alongside the Coffee Shop share for each community.

In [170]:
dubai_merged = coffee_venues.copy()

# add clustering labels
dubai_merged["Cluster"] = kmeans.labels_

In [171]:
dubai_merged.head()

Unnamed: 0,Community,Coffee Shop,Cluster
0,ABU HAIL,0.0,3
1,AL BARAHA,0.0,3
2,AL BARSHA 2,0.142857,2
3,AL BARSHA SOUTH 1,0.333333,1
4,AL BARSHA SOUTH 2,0.333333,1



Now we join dubai_merged with dubai_venues to add the latitude/longitude of each community. Note that the result has one row per venue, so each community appears multiple times.

In [172]:
dubai_merged = dubai_merged.join(dubai_venues.set_index("Community"), on="Community")

dubai_merged.sort_values(by = 'Coffee Shop', inplace = True)
dubai_merged.head()

Unnamed: 0,Community,Coffee Shop,Cluster,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Category
0,ABU HAIL,0.0,3,25.286029,55.328865,Lively,25.285194,55.325276,Track
30,AL RIGGA,0.0,3,25.267208,55.31024,Mark Inn Hotel Deira Dubai,25.270367,55.309826,Hotel
30,AL RIGGA,0.0,3,25.267208,55.31024,KFC,25.26884,55.305971,Fried Chicken Joint
30,AL RIGGA,0.0,3,25.267208,55.31024,Apple Cafe Restaurant,25.266124,55.30877,Restaurant
30,AL RIGGA,0.0,3,25.267208,55.31024,Chongqing Liu Yi Shou Steamboat,25.269595,55.313515,Asian Restaurant
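Because the venues table has one row per venue, the join repeats each community once per venue, which is why the same index value appears several times in the output above. A small sketch with made-up data shows this and one possible way to get back to one row per community; `toy_clusters`, `toy_venues`, and all values in them are hypothetical.

```python
import pandas as pd

# Hypothetical per-community table (one row per community).
toy_clusters = pd.DataFrame({
    "Community": ["A", "B"],
    "Coffee Shop": [0.0, 0.5],
    "Cluster": [1, 0],
})

# Hypothetical venues table (one row per venue).
toy_venues = pd.DataFrame({
    "Community": ["A", "A", "A", "B", "B"],
    "Neighborhood Latitude": [25.1, 25.1, 25.1, 25.2, 25.2],
    "Neighborhood Longitude": [55.1, 55.1, 55.1, 55.2, 55.2],
    "Venue": ["v1", "v2", "v3", "v4", "v5"],
})

# Same join pattern as above: the result has one row per venue.
joined = toy_clusters.join(toy_venues.set_index("Community"), on="Community")

# Keep the first row of each community when only the
# community-level columns are needed (for example, for map markers).
per_community = joined.drop_duplicates(subset="Community")[
    ["Community", "Cluster", "Neighborhood Latitude", "Neighborhood Longitude"]
]

print(len(joined), len(per_community))
```

Plotting from a de-duplicated frame like `per_community` would draw one marker per community instead of one per venue.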


Let's see what the clusters look like on a map.

In [173]:
# create map
map_clus = folium.Map(location = [latitude, longitude], zoom_start = 11)

# set color scheme for the clusters
x = np.arange(k)

ys = [ i + x + ( i * x ) ** 2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dubai_merged['Neighborhood Latitude'], dubai_merged['Neighborhood Longitude'], dubai_merged['Community'], dubai_merged['Cluster']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster))
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clus)
       
map_clus
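When reading the map, it helps to know which color each label receives. The cell above indexes the palette with `cluster-1`, so label 0 picks the last palette entry through Python's negative indexing. The sketch below prints that mapping next to a direct `palette[label]` mapping for comparison; the names `palette`, `shifted`, and `direct` are introduced here for illustration only.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

k = 5

# Same palette construction as the map cell: k evenly spaced rainbow colors.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, k))]

# Mapping used by the map cell: label 0 wraps around to the last color.
shifted = {label: palette[label - 1] for label in range(k)}

# Alternative mapping where label i takes the i-th color.
direct = {label: palette[label] for label in range(k)}

for label in range(k):
    print(label, shifted[label], direct[label])
```

Either mapping gives every cluster a distinct color; what matters is that the legend or the written interpretation uses the same mapping as the plot.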

In [174]:
# Cluster: 0

dubai_merged.loc[dubai_merged['Cluster'] == 0].head(25)

Unnamed: 0,Community,Coffee Shop,Cluster,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Category
70,OUD METHA,0.032787,0,25.237476,55.311828,Deans Fujiya Supermarket,25.23703,55.307489,Gourmet Shop
70,OUD METHA,0.032787,0,25.237476,55.311828,Black Iris Cafe - Al Nasr Sports Club,25.24124,55.311765,Café
70,OUD METHA,0.032787,0,25.237476,55.311828,Zoom,25.234211,55.314323,Convenience Store
70,OUD METHA,0.032787,0,25.237476,55.311828,Golden Hall,25.237804,55.30735,Hookah Bar
70,OUD METHA,0.032787,0,25.237476,55.311828,Lemon Grass Thai Restaurant,25.233959,55.309179,Thai Restaurant
70,OUD METHA,0.032787,0,25.237476,55.311828,Bay Leaf,25.234664,55.308185,Indian Restaurant
70,OUD METHA,0.032787,0,25.237476,55.311828,crystal lounge,25.240404,55.314103,Cocktail Bar
70,OUD METHA,0.032787,0,25.237476,55.311828,Ortai Spa Traditional Thai Retreat,25.237102,55.307165,Spa
70,OUD METHA,0.032787,0,25.237476,55.311828,Puranmal Sweets,25.237075,55.307288,Indian Restaurant
70,OUD METHA,0.032787,0,25.237476,55.311828,Fakhreldine Restaurant,25.23424,55.312397,Restaurant


In [175]:
# Cluster: 1

dubai_merged.loc[dubai_merged['Cluster'] == 1].head(25)

Unnamed: 0,Community,Coffee Shop,Cluster,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Category
39,AL WASL,0.25,1,25.195933,55.255737,Contessa,25.194424,55.259766,Bridal Shop
39,AL WASL,0.25,1,25.195933,55.255737,Elevation Burger - Jumeirah Beach road,25.198777,55.252414,Burger Joint
39,AL WASL,0.25,1,25.195933,55.255737,Starbucks (ستاربكس),25.194726,55.254548,Coffee Shop
39,AL WASL,0.25,1,25.195933,55.255737,كافتيريا سلسبيل,25.195233,55.25726,Food
62,MIRDIF,0.263158,1,25.220229,55.423,Chicken Tikka Inn,25.217606,55.419267,Indian Restaurant
62,MIRDIF,0.263158,1,25.220229,55.423,Paperchase,25.222976,55.425049,Paper / Office Supplies Store
62,MIRDIF,0.263158,1,25.220229,55.423,Uptown Park,25.222071,55.42702,Park
62,MIRDIF,0.263158,1,25.220229,55.423,Mellow Yellow Bakeshop & Cafe,25.223908,55.424755,Bakery
62,MIRDIF,0.263158,1,25.220229,55.423,Mary Foot Spa,25.223972,55.424718,Nail Salon
62,MIRDIF,0.263158,1,25.220229,55.423,Starbucks (ستاربكس),25.2229,55.4254,Coffee Shop


In [176]:
# Cluster: 2

dubai_merged.loc[dubai_merged['Cluster'] == 2].head(25)

Unnamed: 0,Community,Coffee Shop,Cluster,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Category
15,AL MANARA,0.1,2,25.144818,55.214453,Choithrams,25.147682,55.211904,Convenience Store
15,AL MANARA,0.1,2,25.144818,55.214453,Sandella's Flatbread Cafe,25.147171,55.210378,Café
10,AL HAMRIYA,0.1,2,25.260774,55.304996,Starbucks,25.264496,55.303989,Coffee Shop
10,AL HAMRIYA,0.1,2,25.260774,55.304996,Al Seef Hotel by Jumeirah,25.263773,55.305246,Hotel
10,AL HAMRIYA,0.1,2,25.260774,55.304996,Alfanar Restaurant & Cafe (مطعم و مقهى الفنر),25.263778,55.305463,Middle Eastern Restaurant
10,AL HAMRIYA,0.1,2,25.260774,55.304996,Sul Fiume Al Seef,25.262957,55.307083,Restaurant
10,AL HAMRIYA,0.1,2,25.260774,55.304996,Dukkan Burger,25.260909,55.309132,Burger Joint
10,AL HAMRIYA,0.1,2,25.260774,55.304996,Khofo,25.263495,55.306126,Theme Restaurant
10,AL HAMRIYA,0.1,2,25.260774,55.304996,Bombay Star Juice Center,25.256975,55.306228,Juice Bar
15,AL MANARA,0.1,2,25.144818,55.214453,Lepont لي بون,25.146936,55.210419,Dessert Shop


In [177]:
# Cluster: 3

dubai_merged.loc[dubai_merged['Cluster'] == 3].head(25)

Unnamed: 0,Community,Coffee Shop,Cluster,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Category
0,ABU HAIL,0.0,3,25.286029,55.328865,Lively,25.285194,55.325276,Track
30,AL RIGGA,0.0,3,25.267208,55.31024,Mark Inn Hotel Deira Dubai,25.270367,55.309826,Hotel
30,AL RIGGA,0.0,3,25.267208,55.31024,KFC,25.26884,55.305971,Fried Chicken Joint
30,AL RIGGA,0.0,3,25.267208,55.31024,Apple Cafe Restaurant,25.266124,55.30877,Restaurant
30,AL RIGGA,0.0,3,25.267208,55.31024,Chongqing Liu Yi Shou Steamboat,25.269595,55.313515,Asian Restaurant
30,AL RIGGA,0.0,3,25.267208,55.31024,McDonald's,25.270704,55.309751,Fast Food Restaurant
30,AL RIGGA,0.0,3,25.267208,55.31024,Danial Restaurant,25.265841,55.309189,Restaurant
30,AL RIGGA,0.0,3,25.267208,55.31024,Carlton Tower Hotel,25.267162,55.306881,Hotel
30,AL RIGGA,0.0,3,25.267208,55.31024,Day To Day,25.267561,55.308448,Department Store
30,AL RIGGA,0.0,3,25.267208,55.31024,Yanshuang Restaurant,25.266269,55.308514,Chinese Restaurant


In [178]:
# Cluster: 4

dubai_merged.loc[dubai_merged['Cluster'] == 4].head(25)

Unnamed: 0,Community,Coffee Shop,Cluster,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Category
32,AL SATWA,0.5,4,25.22086,55.273762,Beverly Perfumes,25.221928,55.269627,Cosmetics Shop
32,AL SATWA,0.5,4,25.22086,55.273762,Starbucks,25.220988,55.278277,Coffee Shop


**Conclusion**

As the Folium map above shows, the communities in cluster 3, marked in blue, do not have any Coffee Shops, which gives us plenty of options for where to start our business.

Communities in clusters 2 and 4 also have few Coffee Shops, so there is potential for a successful business there as well.

