# Predicting a suitable Neighboorhood for a Hotel using Foursquare data 

### Introduction 

My client owns a successful Hotel in "Stuyvesant Town" in Manhattan, NY and now has the opportunity to gow their business and open a second Hotel in the city of Düsseldorf in Germany. Often the success of a Hotel is linked to its location since it is one of the most important criteria for most customers staying at a Hotel. Given the success of the Hotel in Manhattan, my client would like to open the 2nd Hotel in Düsseldorf city center. Before doing any further in depth analysis on any neighborhoods in Düsseldorf, my client would like to narrow down their options by focusing on neighborhoods that are comparable to "Stuyvesant Town" using data. 

My task is to use and analyze the available online data to give location recommendations to my client based on their requirements. 

### The Data

There is plenty of data available online but we will be using the official data povided by the city's official website "https://opendata.duesseldorf.de/" for location and neighborhood information. We will also be exploring the the venues in target neighborhoods using the Foursquare services. The client has also provided a reference venue data for "Stuyvesant Town" on which the anaysis should be based.

Client Reference Data

![Client Reference Data](https://github.com/mloukhieh/testrepo/blob/master/Capture.PNG?raw=true)

### Importing all the necessary packages and libraries

In [1]:
#!conda install conda=4.6.14 (incase there are problems with installing conda, uncomment this code to run an older version)
#!conda update --all --yes

!conda install -c conda-forge folium=0.5.0 --yes 
!conda install -c conda-forge geopy --yes 

print("packages installed!")

Collecting package metadata: done
Solving environment: done


  current version: 4.6.14
  latest version: 4.9.0

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /opt/conda/envs/Python-3.7-main

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.1.0               |             py_1         614 KB  conda-forge
    branca-0.4.1               |             py_0          26 KB  conda-forge
    certifi-2020.6.20          |   py37he5f6b98_2         151 KB  conda-forge
    conda-4.9.0                |   py37he5f6b98_1         3.0 MB  conda-forge
    conda-package-handling-1.7.2|   py37hb5d75c8_0         915 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    openssl-1.1.1h             |       h516909a_0         2.1 MB  conda

In [2]:
import numpy as np 

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 

from geopy.geocoders import Nominatim 

import requests 
from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium 

print("Libraries Imported!")

Libraries Imported!


### Downloading and previewing the Data from the Düsseldorf city website

In [3]:
url = "https://opendata.duesseldorf.de/sites/default/files/Stadtteile%20D%C3%BCsseldorf%202017.csv"
df = pd.read_csv(url)

In [4]:
df.head()

Unnamed: 0,Stadtteil;Stadtteilnummer;Stadtbezirksnummer
0,Altstadt;011;1
1,Angermund;055;5
2,Benrath;095;9
3,Bilk;036;3
4,Carlstadt;012;1


As we can see there the data has everything we need but it needs some adjustments which we will outline in the next section "Data Wrangling"

## 1- Exploring and Wrangling Data

Our target here is have a clean data frame that is listing all the target neighborhoods together with their coordinates

In [5]:
# first we split the column "Stadtteil;Stadtteilnummer;Stadtbezirksnummer" into 3 separate columns

new = df["Stadtteil;Stadtteilnummer;Stadtbezirksnummer"].str.split(";", n = -1, expand = True) 
  
# making separate Stadtteil column from new data frame 
df["Stadtteil"]= new[0] 
  
# making separate Stadtteilnummer column from new data frame 
df["Stadtteilnummer"]= new[1] 

# making separate Stadtbezirksnummer column from new data frame 
df["Stadtbezirksnummer"]= new[2]

df.head()

Unnamed: 0,Stadtteil;Stadtteilnummer;Stadtbezirksnummer,Stadtteil,Stadtteilnummer,Stadtbezirksnummer
0,Altstadt;011;1,Altstadt,11,1
1,Angermund;055;5,Angermund,55,5
2,Benrath;095;9,Benrath,95,9
3,Bilk;036;3,Bilk,36,3
4,Carlstadt;012;1,Carlstadt,12,1


Our client would like to open the Hotel in Düsseldorf City Center which means in Neighborhoods that are in Disctrict Number (Stadtbezirksnummer) 1, 2 or 3

In [6]:
# Dropping column "Stadtteil;Stadtteilnummer;Stadtbezirksnummer" that was split into 3 
df.drop(columns =["Stadtteil;Stadtteilnummer;Stadtbezirksnummer"], inplace = True) 

#rename "Stadtteil" "Neighborhood" 
df.rename(columns={"Stadtteil":"Neighborhood"}, inplace=True)

# convert Stadtbezirksnummer to integer
df["Stadtbezirksnummer"] = df["Stadtbezirksnummer"].astype("int")

# Delete rows where Stadtbezirksnummer is 4 to 10
df_duss = df.drop(df[df.Stadtbezirksnummer.isin([4, 5, 6, 7, 8, 9, 10])].index)

# Dropping unecessary columns
df_duss.drop(columns =["Stadtteilnummer","Stadtbezirksnummer"], inplace = True) 

df_duss

Unnamed: 0,Neighborhood
0,Altstadt
3,Bilk
4,Carlstadt
5,Derendorf
6,Düsseltal
8,Flehe
9,Flingern Nord
10,Flingern Süd
11,Friedrichstadt
14,Golzheim


Now that we have all the target Neighborhoods in one dataframe we need to append the coordinates for each neighnorhood.To do that we will use the geolocator function and and a for loop

In [7]:
#get the coordinates of the differnt neighborhoods using geolocator
for n in df_duss["Neighborhood"]:
    address = n, ', Dusseldorf'
    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    
    print (n, "/", latitude, longitude)

Altstadt / 51.2259125 6.7735672
Bilk / 51.2027583 6.7851015
Carlstadt / 51.2221416 6.7733942
Derendorf / 51.2445487 6.7922488
Düsseltal / 51.2378412 6.812116
Flehe / 51.1922044 6.7717128
Flingern Nord / 51.2313815 6.8132378
Flingern Süd / 51.2210094 6.8100603
Friedrichstadt / 51.2135645 6.7816997
Golzheim / 51.2507945 6.7599633
Hafen / 51.2170292 6.7335758
Hamm / 51.2035725 6.7388087
Oberbilk / 51.2136887 6.8024279
Pempelfort / 51.2396009 6.7796845
Stadtmitte / 51.2219385 6.7844229
Unterbilk / 51.210055 6.7669651
Volmerswerth / 51.1885784 6.7490097


Now that we have the coordinates for all target neighborhoods we can create the necessary data frame "df_duss_coor" that has the neighborhood names and coordinates

In [8]:
# Put it all in one dataframe "df_duss_coor"
d = {"Neighborhood" : ["Altstadt", "Bilk", "Carlstadt", "Derendorf", "Düsseltal", "Flehe", "Flingern Nord", "Flingern Süd", "Friedrichstadt", "Golzheim", "Hafen", "Hamm", "Oberbilk", "Pempelfort", "Stadtmitte", "Unterbilk", "Volmerswerth"], 
     "latitude" : [51.2259125, 51.2027583, 51.2221416, 51.2445487, 51.2378412, 51.1922044, 51.2313815, 51.2210094, 51.2135645, 51.2507945, 51.2170292, 51.2035725, 51.2136887, 51.2396009, 51.2219385, 51.210055, 51.1885784], 
     "longitude": [6.7735672, 6.7851015, 6.7733942, 6.7922488, 6.812116, 6.7717128, 6.8132378, 6.8100603, 6.7816997, 6.7599633, 6.7335758, 6.7388087, 6.8024279, 6.7796845, 6.7844229, 6.7669651, 6.7490097]}
df_duss_coor = pd.DataFrame(data=d)
df_duss_coor


Unnamed: 0,Neighborhood,latitude,longitude
0,Altstadt,51.225912,6.773567
1,Bilk,51.202758,6.785101
2,Carlstadt,51.222142,6.773394
3,Derendorf,51.244549,6.792249
4,Düsseltal,51.237841,6.812116
5,Flehe,51.192204,6.771713
6,Flingern Nord,51.231381,6.813238
7,Flingern Süd,51.221009,6.81006
8,Friedrichstadt,51.213564,6.7817
9,Golzheim,51.250794,6.759963


### Visualizing the target neighborhoods on the map

First we need to get the coordinates for Düsseldorf using geolocator

In [9]:
address = 'Dusseldorf, DE'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Dusseldorf are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Dusseldorf are 51.2254018, 6.7763137.


Now we proceed to ploting the map and the markers for each target neighborhood

In [10]:
# create map of Dusseldorf using latitude and longitude values
map_duss = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, neighborhood in zip(df_duss_coor['latitude'], df_duss_coor['longitude'], df_duss_coor['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_duss)  
    
map_duss

### Exploring the target neighborhoods using Foursquare 

First connect to foursquare using client ID and secret

In [11]:
#Define Foursquare Credentials and Version
CLIENT_ID = 'IVDOAISHPYL3MC3OPERMAGK5JOMYWHJSOSSAOLMSJZORU0E2' # your Foursquare ID
CLIENT_SECRET = 'ZBX2ZOLBXLFWZBKHEHCLHA3PXF4B5DIU5BFM1XQUQ5NJ5LHQ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: IVDOAISHPYL3MC3OPERMAGK5JOMYWHJSOSSAOLMSJZORU0E2
CLIENT_SECRET:ZBX2ZOLBXLFWZBKHEHCLHA3PXF4B5DIU5BFM1XQUQ5NJ5LHQ


Now let's call the foursquare data for Düsseldorf and explore it 

In [16]:
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f96b0c908bd407f2d4310d1'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Altstadt',
  'headerFullLocation': 'Altstadt, Düsseldorf',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 188,
  'suggestedBounds': {'ne': {'lat': 51.2299018045, 'lng': 6.783485825221867},
   'sw': {'lat': 51.2209017955, 'lng': 6.769141574778134}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '56eb2d9bcd10c4efdfb581cf',
       'name': 'Casita Mexicana',
       'location': {'address': 'Hunsrückenstr. 15',
        'lat': 51.22667595657295,
        'lng': 6.775478313848207,
        'labeledLatLngs': [{'label': 'display',
          'lat': 51.2266759565729

Looking at the data in the .json file we can see that all the information we need is in the "items" key. Like venue name, latitude, longitude, category type. Let's define a function that will return all that information into our "df_duss_coor" dataframe

In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now we can proceed to filling in our attributes into the newly created function to get all the venues names in our target neighborhoods together with latitude, longitude, category type

In [20]:
duss_venues = getNearbyVenues(names=df_duss_coor['Neighborhood'],
                                   latitudes=df_duss_coor['latitude'],
                                   longitudes=df_duss_coor['longitude']
                                  )

Altstadt
Bilk
Carlstadt
Derendorf
Düsseltal
Flehe
Flingern Nord
Flingern Süd
Friedrichstadt
Golzheim
Hafen
Hamm
Oberbilk
Pempelfort
Stadtmitte
Unterbilk
Volmerswerth


Now we visualize the results 

In [21]:
print(duss_venues.shape)
duss_venues.head()

(668, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Altstadt,51.225912,6.773567,Casita Mexicana,51.226676,6.775478,Mexican Restaurant
1,Altstadt,51.225912,6.773567,Rösterei VIER,51.224536,6.773703,Coffee Shop
2,Altstadt,51.225912,6.773567,Rösterei VIER,51.22594,6.772294,Coffee Shop
3,Altstadt,51.225912,6.773567,Elephant Bar,51.226851,6.772636,Cocktail Bar
4,Altstadt,51.225912,6.773567,Bar Chérie,51.226886,6.772424,Bar


## 2- Data Analysis

Now that we have put the data into the required dataframe format we can begin analysing the data. Let's see how the neighborhoods compare when it comes to abundance of venues  

In [22]:
#Let's check how many venues were returned for each neighborhood
duss_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Altstadt,100,100,100,100,100,100
Bilk,20,20,20,20,20,20
Carlstadt,100,100,100,100,100,100
Derendorf,24,24,24,24,24,24
Flehe,8,8,8,8,8,8
Flingern Nord,37,37,37,37,37,37
Flingern Süd,28,28,28,28,28,28
Friedrichstadt,87,87,87,87,87,87
Golzheim,14,14,14,14,14,14
Hafen,3,3,3,3,3,3


In [23]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} uniques categories.'.format(len(duss_venues['Venue Category'].unique())))

There are 152 uniques categories.


The aim of this analysis is to be able to identify a suitable location for the Hotel based on the reference data provided by the client. The reference data is a table listing the 10 most popular venues in "Stuyvesant Town". To be able to find a similar neigborhood in Düsseldorf we need to cluster and classify the target neighborhoods based on the 10 most poplular venues for each neighborhood. In order to do that we need to turn all the venues categories into separate attributes and calculate the frequency of each category per neighborhood. For that we can use on hot encoding method which will return a table with venue categories as attributes and a "0" or "1" as values. With "0" meaning no such venue existst in the neighborhood and "1" meaning such venue exists.

In [24]:
# one hot encoding
duss_onehot = pd.get_dummies(duss_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
duss_onehot['Neighborhood'] = duss_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [duss_onehot.columns[-1]] + list(duss_onehot.columns[:-1])
duss_onehot = duss_onehot[fixed_columns]

duss_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Baby Store,Bakery,Bank,Bar,Beach,Beer Bar,Beer Garden,Beer Store,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Business Service,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Club House,Cocktail Bar,Coffee Shop,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Cosmetics Shop,Costume Shop,Cupcake Shop,Deli / Bodega,Dessert Shop,Dive Bar,Doner Restaurant,Drugstore,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,Gay Bar,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Kids Store,Korean Restaurant,Laundromat,Liquor Store,Lounge,Massage Studio,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Museum,Music Venue,Nightclub,North Indian Restaurant,Office,Opera House,Organic Grocery,Park,Pastry Shop,Pedestrian Plaza,Perfume Shop,Pharmacy,Pizza Place,Plaza,Poke Place,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Rhenisch Restaurant,River,Rock Club,Russian Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Slovak Restaurant,Soba Restaurant,Soccer Field,Soup Place,Souvlaki Shop,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Tapas Restaurant,Taverna,Thai Restaurant,Theater,Tram Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Wine Bar,Yoga Studio
0,Altstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Altstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Altstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Altstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Altstadt,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [112]:
# And let's examine the new dataframe size.
duss_onehot.shape

(669, 153)

The new dataframe "duss_onehot" has 153 columns which validates our code since one column is the neighborhoods name and the remaining 152 columns represent the 152 unique categories found in the target neighborhoods.
Now let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category. This will help is later in identfying the most popular venues per neighborhoods based. The higher the frequency of a category the more popular is is. 


In [25]:
duss_grouped = duss_onehot.groupby('Neighborhood').mean().reset_index()
duss_grouped

Unnamed: 0,Neighborhood,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Baby Store,Bakery,Bank,Bar,Beach,Beer Bar,Beer Garden,Beer Store,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Business Service,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Club House,Cocktail Bar,Coffee Shop,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Cosmetics Shop,Costume Shop,Cupcake Shop,Deli / Bodega,Dessert Shop,Dive Bar,Doner Restaurant,Drugstore,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,Gay Bar,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Kids Store,Korean Restaurant,Laundromat,Liquor Store,Lounge,Massage Studio,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Museum,Music Venue,Nightclub,North Indian Restaurant,Office,Opera House,Organic Grocery,Park,Pastry Shop,Pedestrian Plaza,Perfume Shop,Pharmacy,Pizza Place,Plaza,Poke Place,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Rhenisch Restaurant,River,Rock Club,Russian Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Slovak Restaurant,Soba Restaurant,Soccer Field,Soup Place,Souvlaki Shop,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Tapas Restaurant,Taverna,Thai Restaurant,Theater,Tram Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Wine Bar,Yoga Studio
0,Altstadt,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.05,0.0,0.01,0.01,0.0,0.01,0.0,0.02,0.03,0.0,0.0,0.05,0.02,0.0,0.0,0.06,0.0,0.01,0.0,0.0,0.02,0.05,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.02,0.0,0.01,0.02,0.0,0.01,0.02,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.03,0.01,0.02,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.02,0.05,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0
1,Bilk,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Carlstadt,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.03,0.01,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.06,0.0,0.0,0.03,0.02,0.01,0.0,0.05,0.0,0.01,0.04,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.01,0.0,0.01,0.03,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.03,0.02,0.01,0.0,0.0,0.07,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.01,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0
3,Derendorf,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0
4,Flehe,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Flingern Nord,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.162162,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.027027,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.054054,0.0,0.0,0.0
6,Flingern Süd,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.071429,0.035714,0.035714,0.035714,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0
7,Friedrichstadt,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.057471,0.0,0.045977,0.0,0.0,0.0,0.011494,0.011494,0.0,0.011494,0.0,0.0,0.0,0.0,0.011494,0.0,0.022989,0.091954,0.011494,0.0,0.011494,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.022989,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.011494,0.0,0.011494,0.022989,0.0,0.0,0.0,0.0,0.0,0.011494,0.103448,0.0,0.011494,0.0,0.0,0.0,0.0,0.034483,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.011494,0.0,0.0,0.0,0.0,0.011494,0.022989,0.0,0.022989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.011494,0.045977,0.022989,0.0,0.0,0.034483,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.022989,0.0,0.0,0.0,0.011494,0.011494,0.0,0.022989,0.022989,0.011494,0.0,0.057471,0.0,0.0,0.011494
8,Golzheim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Hafen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Let's put the top 10 venues in each Neighborhood into a pandas dataframe. First, let's write a function to sort the venues in descending order.

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [28]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = duss_grouped['Neighborhood']

for ind in np.arange(duss_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(duss_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Altstadt,Café,Plaza,Brewery,Coffee Shop,Bar,Italian Restaurant,Steakhouse,Boutique,Ice Cream Shop,German Restaurant
1,Bilk,Tram Station,Hotel,Bakery,Greek Restaurant,Doner Restaurant,Shipping Store,Costume Shop,Café,Supermarket,Italian Restaurant
2,Carlstadt,Italian Restaurant,Coffee Shop,Boutique,Café,Clothing Store,Plaza,Ice Cream Shop,German Restaurant,Bakery,Brewery
3,Derendorf,Liquor Store,Supermarket,Restaurant,Colombian Restaurant,Shipping Store,Gym / Fitness Center,Juice Bar,Kids Store,German Restaurant,Gastropub
4,Flehe,Soccer Field,Skate Park,Tram Station,Gym,Park,Bakery,Drugstore,Doner Restaurant,Electronics Store,Dive Bar


## 3- Modeling and Clustering the Data

Now that the Data has been properly structured we can choose to compare line by line and each neighborhood to our data reference and give recommendations. This is only possible because we only have 16 target neighborhoods but this is tedious work and would be impossible in case we had a larger number of neighborhoods. In this case we can choose to cluster the neighborhoods in different clusters depending on the similarity in venue frequency between the neighborhoods. To do that we will use k-means to cluster the neighborhoods into 5 different clusters and then we will recommend the cluster that best matches our reference data. 

In [29]:
# Run k-means to cluster the neighborhood into 5 clusters.

# set number of clusters
kclusters = 5

duss_grouped_clustering = duss_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(duss_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 1, 1, 2, 1, 1, 1, 1, 3], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

In [30]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge neighborhoods_venues_sorted with df_duss_coor to add latitude/longitude for each neighborhood
duss_merged = df_duss_coor.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

duss_merged.head() 

Unnamed: 0,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Altstadt,51.225912,6.773567,1.0,Café,Plaza,Brewery,Coffee Shop,Bar,Italian Restaurant,Steakhouse,Boutique,Ice Cream Shop,German Restaurant
1,Bilk,51.202758,6.785101,0.0,Tram Station,Hotel,Bakery,Greek Restaurant,Doner Restaurant,Shipping Store,Costume Shop,Café,Supermarket,Italian Restaurant
2,Carlstadt,51.222142,6.773394,1.0,Italian Restaurant,Coffee Shop,Boutique,Café,Clothing Store,Plaza,Ice Cream Shop,German Restaurant,Bakery,Brewery
3,Derendorf,51.244549,6.792249,1.0,Liquor Store,Supermarket,Restaurant,Colombian Restaurant,Shipping Store,Gym / Fitness Center,Juice Bar,Kids Store,German Restaurant,Gastropub
4,Düsseltal,51.237841,6.812116,,,,,,,,,,,


Some values are missing and display "NaN" instead of a binary number. Let's get ride of the rows that do not contain useful data 

In [31]:
# Delete rows containing NaN
duss_merged.dropna(inplace=True)
duss_merged.reset_index(drop=True, inplace=True)
duss_merged.shape

(15, 14)

The cluster labels in the table are floats. Let's cast them into integers otherwise we will have a problem later on when trying to display the clusters on the map

In [32]:
duss_merged["Cluster Labels"] = duss_merged["Cluster Labels"].astype("int")
duss_merged.dtypes

Neighborhood               object
latitude                  float64
longitude                 float64
Cluster Labels              int64
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object

## 4- Examining the results 

Finally, let's visualize the resulting clusters on the Düssedorf map

In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(duss_merged['latitude'], duss_merged['longitude'], duss_merged['Neighborhood'], duss_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

We can see the distribution of the clusters on the map. Now let's examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories.

### Cluster 0

In [35]:
duss_merged.loc[duss_merged['Cluster Labels'] == 0, duss_merged.columns[[0] + list(range(4, duss_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Bilk,Tram Station,Hotel,Bakery,Greek Restaurant,Doner Restaurant,Shipping Store,Costume Shop,Café,Supermarket,Italian Restaurant


### Cluster 1

In [135]:
duss_merged.loc[duss_merged['Cluster Labels'] == 1, duss_merged.columns[[0] + list(range(4, duss_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Altstadt,Café,Plaza,Brewery,Coffee Shop,Bar,Italian Restaurant,Steakhouse,Boutique,Ice Cream Shop,German Restaurant
2,Carlstadt,Italian Restaurant,Coffee Shop,Boutique,Café,Clothing Store,Plaza,Ice Cream Shop,German Restaurant,Bakery,Brewery
3,Derendorf,Liquor Store,Supermarket,Restaurant,Colombian Restaurant,Shipping Store,Gym / Fitness Center,Juice Bar,Kids Store,German Restaurant,Gastropub
5,Flingern Nord,Café,Asian Restaurant,Bakery,Pizza Place,Greek Restaurant,Italian Restaurant,Vietnamese Restaurant,Hotel,German Restaurant,Office
6,Flingern Süd,Hotel,Portuguese Restaurant,Electronics Store,Greek Restaurant,Italian Restaurant,Gourmet Shop,Doner Restaurant,Music Venue,Rock Club,Chinese Restaurant
7,Friedrichstadt,Hotel,Café,Vietnamese Restaurant,Bakery,Pizza Place,Bar,Italian Restaurant,Pub,Miscellaneous Shop,Middle Eastern Restaurant
8,Golzheim,Italian Restaurant,Bakery,Café,Metro Station,Salad Place,Fast Food Restaurant,Steakhouse,Supermarket,Beach,Modern European Restaurant
11,Oberbilk,Hotel,Bar,Hookah Bar,Korean Restaurant,Massage Studio,Metro Station,Mobile Phone Shop,Drugstore,Doner Restaurant,Comfort Food Restaurant
12,Pempelfort,Italian Restaurant,Café,Ice Cream Shop,Bakery,Hotel,Drugstore,Supermarket,Restaurant,Trattoria/Osteria,Vietnamese Restaurant
13,Stadtmitte,Japanese Restaurant,Korean Restaurant,Café,Hotel,Ramen Restaurant,Italian Restaurant,Coffee Shop,Grocery Store,Shopping Mall,Sushi Restaurant


### Cluster 2

In [36]:
duss_merged.loc[duss_merged['Cluster Labels'] == 2, duss_merged.columns[[0] + list(range(4, duss_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Flehe,Soccer Field,Skate Park,Tram Station,Gym,Park,Bakery,Drugstore,Doner Restaurant,Electronics Store,Dive Bar


### Cluster 3

In [37]:
duss_merged.loc[duss_merged['Cluster Labels'] == 3, duss_merged.columns[[0] + list(range(4, duss_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Hafen,Scenic Lookout,Boat or Ferry,River,Yoga Studio,Fried Chicken Joint,French Restaurant,Fountain,Food,Fast Food Restaurant,Farmers Market


### Cluster 4

In [39]:
duss_merged.loc[duss_merged['Cluster Labels'] == 4, duss_merged.columns[[0] + list(range(4, duss_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Hamm,Beach,Café,Gastropub,Garden,German Restaurant,Greek Restaurant,Bank,Hotel,Restaurant,Drugstore


## 5- Conclusion 

Client Reference Data

![Client Reference Data](https://github.com/mloukhieh/testrepo/blob/master/Capture.PNG?raw=true)

Looking at the refernce Data we can recommend the most suitable location to be in Cluster 3 which is neighborhood "Hafen". Stuyvesant and Hafen are similar neighborhoods since they are both on the river side, have harbors/boat/ferries, they contain public recreational venues like Fountains and Parks and they both have a farmer's market. In addition to the similarities it is worth noting that "Hafen" does not have hotels which minimizes potential competition and has few restaurants which is beneficial for the hotel's restaurant. 

The second choice would be Cluster 4 or neighborhood "Hamm" which is adjacent to "Hafen" and has similar attributes. 