# Data Science Capstone by Coursera

### Coding for Final Report - Opening a new Cafe in Kassel, Germany

+ Build a dataframe of neighborhoods in Kassel, Germany, from the corresponding Wikipedia category page
+ Get the geographical coordinates of the neighborhoods
+ Obtain the venue data for the neighborhoods from Foursquare API
+ Explore and cluster the neighborhoods
+ Select the best cluster to open a new cafe

#### 1. Install packages 

In [2]:
pip install BeautifulSoup4

Collecting BeautifulSoup4
  Downloading https://files.pythonhosted.org/packages/e8/b5/7bb03a696f2c9b7af792a8f51b82974e51c268f15e925fc834876a4efa0b/beautifulsoup4-4.9.0-py3-none-any.whl (109kB)
Collecting soupsieve>1.2 (from BeautifulSoup4)
  Downloading https://files.pythonhosted.org/packages/05/cf/ea245e52f55823f19992447b008bcbb7f78efc5960d77f6c34b5b45b36dd/soupsieve-2.0-py2.py3-none-any.whl
Installing collected packages: soupsieve, BeautifulSoup4
Successfully installed BeautifulSoup4-4.9.0 soupsieve-2.0
Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install geopy

Collecting geopy
  Downloading https://files.pythonhosted.org/packages/53/fc/3d1b47e8e82ea12c25203929efb1b964918a77067a874b2c7631e2ec35ec/geopy-1.21.0-py2.py3-none-any.whl (104kB)
Collecting geographiclib<2,>=1.49 (from geopy)
  Downloading https://files.pythonhosted.org/packages/8b/62/26ec95a98ba64299163199e95ad1b0e34ad3f4e176e221c40245f211e425/geographiclib-1.50-py3-none-any.whl
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-1.21.0
Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install geocoder

Collecting geocoder
  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
Collecting click (from geocoder)
  Downloading https://files.pythonhosted.org/packages/d2/3d/fa76db83bf75c4f8d338c2fd15c8d33fdd7ad23a9b5e57eb6c5de26b430e/click-7.1.2-py2.py3-none-any.whl (82kB)
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Collecting future (from geocoder)
  Downloading https://files.pythonhosted.org/packages/45/0b/38b06fd9b92dc2b68d58b75f900e97884c45bedd2ff83203d933cf5851c9/future-0.18.2.tar.gz (829kB)
Building wheel

#### 2. Import libraries 

In [5]:

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


#### 3. From Wikipedia into DataFrame

In [6]:
data = requests.get("https://de.wikipedia.org/wiki/Kategorie:Stadtteil_von_Kassel").text
soup = BeautifulSoup(data, 'html.parser')
neighborhoodList = []
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)
kl_df = pd.DataFrame({"Neighborhood": neighborhoodList})
kl_df.head()

Unnamed: 0,Neighborhood
0,Bad Wilhelmshöhe
1,Bettenhausen (Kassel)
2,Brasselsberg (Kassel)
3,Fasanenhof (Kassel)
4,Forstfeld (Kassel)
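
Several scraped names carry a " (Kassel)" disambiguation suffix from Wikipedia. The sketch below shows one possible cleanup step before geocoding. It is an illustration only: `clean_neighborhood_name` is a hypothetical helper that is not part of the notebook, and whether dropping the suffix improves the geocoding results has not been tested here.

```python
import re


def clean_neighborhood_name(name: str) -> str:
    """Drop a trailing parenthesised qualifier such as ' (Kassel)'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", name).strip()


# Names copied from the table above.
scraped = ["Bad Wilhelmshöhe", "Bettenhausen (Kassel)", "Brasselsberg (Kassel)"]
cleaned = [clean_neighborhood_name(n) for n in scraped]
print(cleaned)
```

Names without a parenthesised suffix pass through unchanged.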


#### 4. Get geographical coordinates

In [7]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Kassel, Germany'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
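
The `while` loop above never terminates if the geocoder keeps returning `None` for a name. A bounded variant could look like the sketch below. The names `get_latlng_bounded`, `max_attempts`, and `pause_seconds` are assumptions introduced for illustration, and the `geocode` argument is a stand-in callable so that the example runs without network access.

```python
import time


def get_latlng_bounded(neighborhood, geocode, max_attempts=5, pause_seconds=1.0):
    """Return [lat, lng], or None after max_attempts failed lookups.

    `geocode` is any callable that takes a query string and returns
    [lat, lng] or None (a hypothetical stand-in for the geocoder call).
    """
    query = "{}, Kassel, Germany".format(neighborhood)
    for attempt in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        if attempt < max_attempts - 1:
            time.sleep(pause_seconds)  # brief pause before the next try
    return None


# Stand-in geocoder that fails twice, then succeeds (demonstration only).
calls = {"n": 0}


def flaky_geocode(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else [51.3166, 9.4207]


result = get_latlng_bounded("Bad Wilhelmshöhe", flaky_geocode, pause_seconds=0)
print(result, "after", calls["n"], "attempts")
```

In the notebook, the callable could be `lambda q: geocoder.arcgis(q).latlng`, which mirrors the call used in `get_latlng`. Rows that come back as `None` would then need explicit handling before the DataFrame is built.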

In [8]:
coords = [ get_latlng(neighborhood) for neighborhood in kl_df["Neighborhood"].tolist() ]

In [9]:
coords

[[51.31660206723354, 9.420743286192975],
 [51.313760923175366, 9.507323713649857],
 [51.288650000000075, 9.415340000000072],
 [51.33063008050087, 9.519509995528002],
 [51.29713299480513, 9.537739072728195],
 [51.36275042584529, 9.432810105233953],
 [51.34066000000007, 9.466560000000072],
 [51.31696952607233, 9.441507816915529],
 [51.35951194980332, 9.55925750393354],
 [51.31602650255086, 9.501225751639833],
 [51.28327000000007, 9.467520000000036],
 [51.31853000000007, 9.511830000000032],
 [51.28043000000008, 9.431680000000028],
 [49.24248000000006, 8.26484000000005],
 [51.277600000000064, 9.452090000000055],
 [51.34541000000007, 9.496480000000076],
 [51.34808250000002, 9.46928250000001],
 [51.31853000000007, 9.511830000000032],
 [51.29676000000006, 9.451780000000042],
 [51.31470000000007, 9.511140000000069],
 [51.624627050284055, 9.617859832926344],
 [51.32021142771955, 9.44433656148823],
 [51.28606158954837, 9.496284983849655],
 [51.31232356263808, 9.465765777081616],
 [51.31580810883

In [10]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
kl_df['Latitude'] = df_coords['Latitude']
kl_df['Longitude'] = df_coords['Longitude']
kl_df

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Bad Wilhelmshöhe,51.316602,9.420743
1,Bettenhausen (Kassel),51.313761,9.507324
2,Brasselsberg (Kassel),51.28865,9.41534
3,Fasanenhof (Kassel),51.33063,9.51951
4,Forstfeld (Kassel),51.297133,9.537739
5,Harleshausen,51.36275,9.43281
6,Jungfernkopf,51.34066,9.46656
7,Kirchditmold,51.31697,9.441508
8,Kragenhof,51.359512,9.559258
9,Mitte (Kassel),51.316027,9.501226
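
Some of the geocoded points in this notebook's output lie far from Kassel (for example a latitude of about 49.24). A quick distance check can flag such rows before they reach the Foursquare step. The sketch below is a rough check under stated assumptions: the city-centre coordinates and the 15 km threshold are approximations chosen for illustration, and the rows are copied from the notebook's output.

```python
from math import asin, cos, radians, sin, sqrt

# Assumed approximate centre of Kassel (not taken from the notebook).
KASSEL_CENTER = (51.316, 9.498)
MAX_KM = 15.0  # assumed plausibility threshold


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


# (name, latitude, longitude) copied from the notebook's output.
rows = [
    ("Bad Wilhelmshöhe", 51.316602, 9.420743),
    ("Mitte (Kassel)", 51.316027, 9.501226),
    ("Oberneustadt", 49.24248, 8.26484),
    ("Vorderer Westen", 51.624627, 9.61786),
]

suspicious = [
    name for name, lat, lng in rows
    if haversine_km(KASSEL_CENTER[0], KASSEL_CENTER[1], lat, lng) > MAX_KM
]
print(suspicious)
```

Flagged rows are candidates for a manual lookup or a more specific geocoding query.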


#### 5. Create map of Kassel 

In [12]:
address = 'Kassel, Germany'
geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [30]:
map_kl = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(kl_df['Latitude'], kl_df['Longitude'], kl_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_kl)  
    
map_kl

#### 6. Foursquare API

In [14]:
# define Foursquare Credentials and Version
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20200605' # Foursquare API version

##### Get the top venues within a 500-meter radius

In [33]:
radius = 500
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(kl_df['Latitude'], kl_df['Longitude'], kl_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
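
The loop above assumes that every response contains `groups`, `items`, and at least one category per venue, so a single incomplete record raises a `KeyError` or `IndexError`. The sketch below shows a more defensive variant of the same two steps. `build_explore_url` and `extract_venues` are hypothetical helper names, the sample payload is invented to exercise the parser, and only the response fields already used in the notebook are assumed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


def extract_venues(payload, neighborhood, lat, lng):
    """Turn one parsed JSON response into rows, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        category = categories[0].get("name") if categories else None
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


url = build_explore_url("ID", "SECRET", "20200605", 51.316602, 9.420743, 500, 100)
query = parse_qs(urlsplit(url).query)

# Invented payload: the second venue has no category on purpose.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Café", "location": {"lat": 51.317, "lng": 9.416},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Uncategorised Place", "location": {"lat": 51.315, "lng": 9.423},
               "categories": []}},
]}]}}
rows = extract_venues(sample_payload, "Bad Wilhelmshöhe", 51.316602, 9.420743)
print(rows)
```

Venues without a category end up with `None` in the last column and can be filtered out before one-hot encoding.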

In [34]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(163, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Bad Wilhelmshöhe,51.316602,9.420743,Schloss Wilhelmshöhe,51.315103,9.416313,Palace
1,Bad Wilhelmshöhe,51.316602,9.420743,Schlosshotel Wilhelmshöhe,51.317628,9.415837,Hotel
2,Bad Wilhelmshöhe,51.316602,9.420743,Route 66,51.313061,9.423532,American Restaurant
3,Bad Wilhelmshöhe,51.316602,9.420743,H Wilhelmshöhe (Park),51.315419,9.423587,Tram Station
4,Bad Wilhelmshöhe,51.316602,9.420743,Schlosscafé Wilhelmshöhe,51.317075,9.416721,Café


##### What categories are there? 

In [17]:
venues_df['VenueCategory'].unique()[:50]

array(['Park', 'Palace', 'Hot Spring', 'German Restaurant', 'Brewery',
       'Greek Restaurant', 'Fountain', 'Monument / Landmark',
       'Indian Restaurant', 'Castle', 'Hotel', 'Historic Site', 'Café',
       'Ice Cream Shop', 'American Restaurant', 'Steakhouse',
       'Restaurant', 'Drugstore', 'Chinese Restaurant', 'Tram Station',
       'Food', 'Coffee Shop', 'Bakery', 'College Gym', 'Bookstore',
       'Trattoria/Osteria', 'Platform', 'Bank', 'Supermarket',
       'Art Museum', 'Art Gallery', 'Farmers Market',
       'African Restaurant', 'Music Venue', 'Italian Restaurant',
       'Nightclub', 'Plaza', 'Bagel Shop', 'Theater',
       'Gym / Fitness Center', 'History Museum', 'Cocktail Bar',
       'Science Museum', 'Pool', 'Multiplex', 'Clothing Store',
       'Automotive Shop', 'School', 'Scenic Lookout', 'Pub'], dtype=object)

#### 7. Analyze the neighborhoods

In [18]:

# one hot encoding
kl_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
kl_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [kl_onehot.columns[-1]] + list(kl_onehot.columns[:-1])
kl_onehot = kl_onehot[fixed_columns]

print(kl_onehot.shape)
kl_onehot.head()

(1075, 116)


Unnamed: 0,Neighborhoods,African Restaurant,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beer Garden,Big Box Store,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Bus Stop,Business Service,Café,Castle,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,Concert Hall,Construction & Landscaping,Department Store,Dessert Shop,Doner Restaurant,Drugstore,Electronics Store,Event Service,Event Space,Farmers Market,Fast Food Restaurant,Flower Shop,Food,Food Truck,Fountain,Furniture / Home Store,Garden Center,Gas Station,Gastropub,German Restaurant,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Hookah Bar,Hot Spring,Hotel,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Liquor Store,Locksmith,Medical Center,Mediterranean Restaurant,Mexican Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Venue,Nightclub,Palace,Park,Perfume Shop,Pet Store,Pharmacy,Pizza Place,Platform,Plaza,Pool,Pub,Rest Area,Restaurant,Scenic Lookout,School,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Soccer Field,Soccer Stadium,Spanish Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Theater,Toy / Game Store,Trail,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vietnamese Restaurant,Wine Shop
0,Bad Wilhelmshöhe,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Bad Wilhelmshöhe,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Bad Wilhelmshöhe,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Bad Wilhelmshöhe,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Bad Wilhelmshöhe,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [19]:
kl_grouped = kl_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(kl_grouped.shape)
kl_grouped

(26, 116)


Unnamed: 0,Neighborhoods,African Restaurant,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beer Garden,Big Box Store,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Bus Stop,Business Service,Café,Castle,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,Concert Hall,Construction & Landscaping,Department Store,Dessert Shop,Doner Restaurant,Drugstore,Electronics Store,Event Service,Event Space,Farmers Market,Fast Food Restaurant,Flower Shop,Food,Food Truck,Fountain,Furniture / Home Store,Garden Center,Gas Station,Gastropub,German Restaurant,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Hookah Bar,Hot Spring,Hotel,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Liquor Store,Locksmith,Medical Center,Mediterranean Restaurant,Mexican Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Venue,Nightclub,Palace,Park,Perfume Shop,Pet Store,Pharmacy,Pizza Place,Platform,Plaza,Pool,Pub,Rest Area,Restaurant,Scenic Lookout,School,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Soccer Field,Soccer Stadium,Spanish Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Theater,Toy / Game Store,Trail,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vietnamese Restaurant,Wine Shop
0,Bad Wilhelmshöhe,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.027778,0.0,0.0,0.0,0.055556,0.027778,0.027778,0.0,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.055556,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.027778,0.083333,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.083333,0.027778,0.0,0.0,0.0,0.0
1,Bettenhausen (Kassel),0.010101,0.010101,0.030303,0.040404,0.010101,0.010101,0.0,0.010101,0.020202,0.010101,0.030303,0.010101,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.050505,0.0,0.0,0.030303,0.040404,0.050505,0.0,0.0,0.0,0.0,0.0,0.020202,0.040404,0.0,0.0,0.0,0.010101,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.010101,0.0,0.020202,0.0,0.0,0.0,0.030303,0.010101,0.010101,0.010101,0.010101,0.0,0.030303,0.010101,0.0,0.0,0.0,0.030303,0.0,0.0,0.010101,0.010101,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.010101,0.010101,0.010101,0.020202,0.0,0.010101,0.010101,0.010101,0.0,0.0,0.0,0.020202,0.010101,0.020202,0.0,0.0,0.010101,0.010101,0.020202,0.0,0.0,0.010101,0.010101,0.0,0.0,0.0,0.010101,0.0,0.060606,0.0,0.010101,0.0,0.0,0.0,0.0,0.020202,0.0,0.030303,0.010101,0.0
2,Brasselsberg (Kassel),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0
3,Fasanenhof (Kassel),0.026316,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.210526,0.0,0.0,0.0,0.026316,0.0,0.078947,0.0,0.0,0.078947,0.0,0.0
4,Forstfeld (Kassel),0.0,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.130435,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.347826,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Harleshausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0
6,Jungfernkopf,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.136364,0.136364,0.0,0.0,0.0,0.0,0.0
7,Kirchditmold,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.071429,0.0,0.02381,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.02381,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.02381,0.071429,0.02381,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.095238,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0
8,Kragenhof,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0
9,Mitte (Kassel),0.010204,0.010204,0.030612,0.040816,0.0,0.010204,0.0,0.010204,0.020408,0.010204,0.030612,0.010204,0.010204,0.0,0.0,0.010204,0.0,0.0,0.0,0.0,0.05102,0.0,0.0,0.030612,0.040816,0.05102,0.0,0.0,0.0,0.0,0.0,0.020408,0.040816,0.0,0.0,0.0,0.010204,0.0,0.0,0.010204,0.0,0.0,0.0,0.0,0.010204,0.010204,0.020408,0.0,0.0,0.0,0.020408,0.010204,0.010204,0.010204,0.010204,0.0,0.030612,0.010204,0.0,0.0,0.0,0.030612,0.0,0.0,0.0,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010204,0.010204,0.010204,0.010204,0.020408,0.0,0.010204,0.010204,0.0,0.0,0.0,0.0,0.020408,0.010204,0.030612,0.0,0.0,0.010204,0.010204,0.020408,0.0,0.0,0.010204,0.010204,0.0,0.0,0.0,0.020408,0.0,0.05102,0.010204,0.010204,0.0,0.0,0.0,0.0,0.020408,0.0,0.030612,0.010204,0.0
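
Taking the mean of one-hot columns per neighborhood yields each category's share of that neighborhood's venues; a value of 0.027778, for instance, corresponds to 1 venue out of 36. The dependency-free sketch below reproduces the same calculation with counters. The venue list is toy data invented for illustration and `category_share` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# Toy data (invented): (neighborhood, venue category)
venues = [
    ("Neighborhood A", "Palace"),
    ("Neighborhood A", "Hotel"),
    ("Neighborhood A", "Café"),
    ("Neighborhood A", "Hotel"),
    ("Neighborhood B", "Café"),
    ("Neighborhood B", "Supermarket"),
]

# Count categories per neighborhood.
counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1


def category_share(neighborhood, category):
    """Share of a neighborhood's venues in the given category.

    Equals the mean of that category's one-hot column within the group.
    """
    total = sum(counts[neighborhood].values())
    return counts[neighborhood][category] / total if total else 0.0


print(category_share("Neighborhood A", "Hotel"))
print(category_share("Neighborhood B", "Café"))
```

Because the values are shares, a neighborhood with very few venues can show a large frequency from a single venue, which is worth keeping in mind when comparing clusters.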


In [20]:
len(kl_grouped[kl_grouped["Coffee Shop"] > 0])

9

In [21]:
# keep only the Café and Coffee Shop frequency columns (all neighborhoods are retained)
kl_cafe = kl_grouped[["Neighborhoods","Café","Coffee Shop"]]
kl_cafe.head()

Unnamed: 0,Neighborhoods,Café,Coffee Shop
0,Bad Wilhelmshöhe,0.055556,0.027778
1,Bettenhausen (Kassel),0.050505,0.050505
2,Brasselsberg (Kassel),0.0,0.0
3,Fasanenhof (Kassel),0.052632,0.026316
4,Forstfeld (Kassel),0.0,0.0


#### 8. Cluster the neighborhoods with k-means

In [37]:

# set number of clusters
kclusters = 3

kl_clustering = kl_cafe.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 0, 1, 0, 0, 2, 2, 0, 1], dtype=int32)
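
To make the clustering step more concrete, the sketch below runs a plain Lloyd-style k-means on six (Café, Coffee Shop) pairs copied from the tables in this notebook. It is a simplified illustration, not the scikit-learn implementation: it uses a deterministic farthest-point initialisation (an assumption made for reproducibility), whereas `KMeans` defaults to k-means++ initialisation, so label numbers need not match the output above.

```python
from math import dist  # Python 3.8+


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# (Café share, Coffee Shop share) copied from the notebook's tables.
names = ["Brasselsberg", "Forstfeld", "Mitte", "Bettenhausen", "Kirchditmold", "Wahlershausen"]
points = [(0.0, 0.0), (0.0, 0.0), (0.05102, 0.05102),
          (0.050505, 0.050505), (0.071429, 0.0), (0.069767, 0.0)]

labels, centroids = kmeans(points, k=3)
for name, label in zip(names, labels):
    print(name, label)
```

On this subset the three groups correspond to "no cafés or coffee shops", "both present", and "cafés only", which matches the structure of the clusters listed in the results.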

In [38]:
# create a new dataframe that includes the cluster label as well as the café and coffee shop frequencies for each neighborhood
kl_merged = kl_cafe.copy()

# add clustering labels
kl_merged["Cluster Labels"] = kmeans.labels_

In [39]:
kl_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
kl_merged.head()

Unnamed: 0,Neighborhood,Café,Coffee Shop,Cluster Labels
0,Bad Wilhelmshöhe,0.055556,0.027778,1
1,Bettenhausen (Kassel),0.050505,0.050505,1
2,Brasselsberg (Kassel),0.0,0.0,0
3,Fasanenhof (Kassel),0.052632,0.026316,1
4,Forstfeld (Kassel),0.0,0.0,0


In [40]:
kl_merged = kl_merged.join(kl_df.set_index("Neighborhood"), on="Neighborhood")

print(kl_merged.shape)
kl_merged.head() # check the last columns!

(26, 6)


Unnamed: 0,Neighborhood,Café,Coffee Shop,Cluster Labels,Latitude,Longitude
0,Bad Wilhelmshöhe,0.055556,0.027778,1,51.316602,9.420743
1,Bettenhausen (Kassel),0.050505,0.050505,1,51.313761,9.507324
2,Brasselsberg (Kassel),0.0,0.0,0,51.28865,9.41534
3,Fasanenhof (Kassel),0.052632,0.026316,1,51.33063,9.51951
4,Forstfeld (Kassel),0.0,0.0,0,51.297133,9.537739


In [41]:

# sort the results by Cluster Labels
print(kl_merged.shape)
kl_merged.sort_values(["Cluster Labels"], inplace=True)
kl_merged

(26, 6)


Unnamed: 0,Neighborhood,Café,Coffee Shop,Cluster Labels,Latitude,Longitude
12,Nordshausen,0.0,0.0,0,51.28043,9.43168
22,Waldau (Kassel),0.0,0.0,0,51.286062,9.496285
20,Vorderer Westen,0.0,0.0,0,51.624627,9.61786
16,Rothenditmold,0.0,0.0,0,51.348083,9.469283
15,Philippinenhof-Warteberg,0.0,0.0,0,51.34541,9.49648
14,Oberzwehren,0.0,0.0,0,51.2776,9.45209
13,Oberneustadt,0.0,0.0,0,49.24248,8.26484
10,Niederzwehren,0.0,0.0,0,51.28327,9.46752
25,Wolfsanger / Hasenhecke,0.0,0.0,0,51.343815,9.542803
2,Brasselsberg (Kassel),0.0,0.0,0,51.28865,9.41534


##### Visualize the clustering

In [42]:

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_merged['Latitude'], kl_merged['Longitude'], kl_merged['Neighborhood'], kl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
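
The map code indexes the palette with `rainbow[cluster-1]`, so cluster 0 picks the last color through negative indexing. That works, but a direct label-to-color mapping is easier to read. The sketch below builds such a mapping with the standard library's `colorsys` module instead of the Matplotlib colormap used above. `cluster_palette` is a hypothetical helper, and the saturation and brightness values are arbitrary choices.

```python
import colorsys


def cluster_palette(k):
    """Return k evenly spaced hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


kclusters = 3
palette = cluster_palette(kclusters)
color_for = {cluster: palette[cluster] for cluster in range(kclusters)}
print(color_for)
```

In the marker loop, `color_for[cluster]` could then be passed to both `color` and `fill_color`.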

#### 9. Cluster Results

##### Cluster 0

In [43]:
kl_merged.loc[kl_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Café,Coffee Shop,Cluster Labels,Latitude,Longitude
12,Nordshausen,0.0,0.0,0,51.28043,9.43168
22,Waldau (Kassel),0.0,0.0,0,51.286062,9.496285
20,Vorderer Westen,0.0,0.0,0,51.624627,9.61786
16,Rothenditmold,0.0,0.0,0,51.348083,9.469283
15,Philippinenhof-Warteberg,0.0,0.0,0,51.34541,9.49648
14,Oberzwehren,0.0,0.0,0,51.2776,9.45209
13,Oberneustadt,0.0,0.0,0,49.24248,8.26484
10,Niederzwehren,0.0,0.0,0,51.28327,9.46752
25,Wolfsanger / Hasenhecke,0.0,0.0,0,51.343815,9.542803
2,Brasselsberg (Kassel),0.0,0.0,0,51.28865,9.41534


##### Cluster 1

In [44]:
kl_merged.loc[kl_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Café,Coffee Shop,Cluster Labels,Latitude,Longitude
9,Mitte (Kassel),0.05102,0.05102,1,51.316027,9.501226
1,Bettenhausen (Kassel),0.050505,0.050505,1,51.313761,9.507324
19,Unterneustadt,0.04,0.05,1,51.3147,9.51114
17,Südstadt (Kassel),0.05,0.05,1,51.31853,9.51183
24,Wesertor (Kassel),0.04,0.04,1,51.315808,9.505281
11,Nord-Holland (Kassel),0.05,0.05,1,51.31853,9.51183
3,Fasanenhof (Kassel),0.052632,0.026316,1,51.33063,9.51951
0,Bad Wilhelmshöhe,0.055556,0.027778,1,51.316602,9.420743


##### Cluster 2

In [45]:
kl_merged.loc[kl_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Café,Coffee Shop,Cluster Labels,Latitude,Longitude
6,Jungfernkopf,0.045455,0.0,2,51.34066,9.46656
18,Süsterfeld-Helleböhn,0.028571,0.0,2,51.29676,9.45178
21,Wahlershausen,0.069767,0.0,2,51.320211,9.444337
7,Kirchditmold,0.071429,0.0,2,51.31697,9.441508
23,Wehlheiden,0.038961,0.012987,2,51.312324,9.465766


#### 10. Results

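Cluster 1 has the highest combined concentration of cafés and coffee shops: roughly 4 to 6% of its venues are cafés and 3 to 5% are coffee shops. Cluster 2 has a comparable share of cafés (about 3 to 7%) but almost no coffee shops. The cluster 0 neighborhoods listed above have neither cafés nor coffee shops.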
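
The comparison between clusters can be checked by averaging the frequencies per cluster. The sketch below does this for the rows displayed in the cluster tables above. Only the ten cluster 0 rows shown in the output are included, so it is a rough summary under that assumption rather than a full recomputation.

```python
from statistics import mean

# (cluster label, Café share, Coffee Shop share), copied from the cluster tables.
rows = [(0, 0.0, 0.0)] * 10  # the ten cluster 0 rows shown in the output
rows += [
    (1, 0.05102, 0.05102), (1, 0.050505, 0.050505), (1, 0.04, 0.05),
    (1, 0.05, 0.05), (1, 0.04, 0.04), (1, 0.05, 0.05),
    (1, 0.052632, 0.026316), (1, 0.055556, 0.027778),
    (2, 0.045455, 0.0), (2, 0.028571, 0.0), (2, 0.069767, 0.0),
    (2, 0.071429, 0.0), (2, 0.038961, 0.012987),
]

summary = {}
for cluster in sorted({label for label, _, _ in rows}):
    cafe = mean(c for label, c, _ in rows if label == cluster)
    coffee = mean(s for label, _, s in rows if label == cluster)
    summary[cluster] = {"cafe": cafe, "coffee_shop": coffee, "combined": cafe + coffee}

# Clusters ordered from highest to lowest combined share.
ranking = sorted(summary, key=lambda c: summary[c]["combined"], reverse=True)
for cluster in ranking:
    print(cluster, summary[cluster])
```

A high combined share indicates existing competition, while a share of zero indicates an area without established cafés. Which of the two is preferable for a new cafe depends on the business assumptions behind the report.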