# Find the optimal Restaurant Location in Ulm using Machine Learning

## Introduction and Business Case

#### Setting:

Ulm is a city in the South of Germany. Its population is ca. 120.000 (EU notation). <br>
The city is divided up into 18 districts, which can further be split up into 55 neighborhoods. <br>
Just like in every other city, each district and neighborhood has its own characteristics regarding attractiveness, venues and population. <br>
This notebook tries to find locations for new restaurants in Ulm that promise high revenues.

#### Objective

This notebook tries to find the perfect neighborhood location for a new restaurant. <br>
Two questions are asked: <br>
* In which neighborhoods is the average revenue per restaurant especially high?
* Are these neighborhoods similar in their characteristics (i.e. venues and therefore attractiveness they offer that lead to people visiting these neighborhoods)? <br>

##### In case similar neighborhoods have similar restaurant revenues, a good location for a new restaurant can be found in those neighborhoods that are similar and have a high average restaurant revenue. The user can choose from the best neighborhoods.

#### Audience

The location of a restaurant is crucial for its revenue. <br>
This analysis addresses people who want to open a restaurant in Ulm and try to find a location that boosts their revenue.

#### Tools

* Folium maps (including choropleth)
* Foursquare API calls to obtain restaurant and venue information
* Geopy for geocoding
* K-Means Clustering from sklearn

## Data

#### Geospatial Information

Geospatial information, i.e. the 55 neighborhoods of Ulm as well as their borders are obtained from a <b>GeoJSON</b>-file found on this website http://daten.ulm.de/datenkatalog/offene_daten/31. <br>
This file is provided by the city of Ulm. <br>
It will be loaded into the notebook with the help of the "json" package contained in python. <br>
See the image below:

<img src="JSON.PNG">

#### Neighborhood Venue Information

Information on the venues of each neighborhood identified is obtained from the Places-Endpoint of Foursquare, a company that accumulates gigantic spatial datasets, see https://enterprise.foursquare.com/products/places <br> 
Foursquare powers Apple Maps, amongst others. <br>
The notebook will communicate with the Places-Endpoint of Foursquare via RESTful API calls handled by python package "requests".

#### Restaurant Revenue Information

The average restaurant revenue per neighborhood in Ulm is obtained from a csv-file found in a data catalogue provided by the city of Ulm. See the website http://daten.ulm.de/datenkatalog/offene_daten/40. <br>
The average revenue is an estimate based on statistical inference, since not every restaurant reports its revenue. <br>
The dataset contains revenue values for 36 from 55 neighborhoods. Missing values will be filled with the average reported revenue in my code. <br>
The dataset matches to the GeoJSON-file based on the "Neighborhood ID". <br>
See the image below:

<img src="Revenue.PNG">

## Code

### 1. Load data and pre-process

##### import relevant libraries

In [10]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


##### Load GeoJSON-file of Ulm neighborhoods, saved in a txt-document and write it into a JSON-file

In [9]:
# Load article, turn it into soup and get the <table>s
article = open("ulm_data.txt", encoding='utf-8').read()

# Extract the columns and write to a semicolon-delimited text file
with open("ulm_data.json", "w", encoding='utf-8') as json_file:
        print(article, file=json_file)

##### Load data from the JSON-file

In [71]:
with open('ulm_data.json', encoding='utf-8') as json_data:
    ulm_data = json.load(json_data)

##### Check data

In [49]:
neighborhoods_data = ulm_data['features']
neighborhoods_data

[{'type': 'Feature',
  'id': 0,
  'properties': {'name': 'Altstadt',
   'cartodb_id': 'ul-stv110',
   'created_at': '2013-02-20T04:06:07.501Z',
   'updated_at': '2013-02-20T04:06:07.744Z'},
  'geometry': {'type': 'Polygon',
   'coordinates': [[[9.999865, 48.397205],
     [9.999026, 48.396803],
     [9.998381, 48.396523],
     [9.998004, 48.396379],
     [9.997414, 48.396159],
     [9.996761, 48.395983],
     [9.996015, 48.39579],
     [9.995257, 48.395606],
     [9.994497, 48.395336],
     [9.994353, 48.395268],
     [9.993352, 48.39503],
     [9.992015, 48.394713],
     [9.990917, 48.394405],
     [9.990457, 48.394233],
     [9.989975, 48.393986],
     [9.989417, 48.393627],
     [9.98904, 48.393362],
     [9.98861, 48.393018],
     [9.986351, 48.394098],
     [9.985164, 48.394822],
     [9.984603, 48.395146],
     [9.984537, 48.395119],
     [9.984504, 48.395104],
     [9.983396, 48.395957],
     [9.981522, 48.395177],
     [9.981459, 48.395751],
     [9.981362, 48.397486],
     [9.9

##### Load neighborhood names and IDs from JSON

In [165]:
neighborhood_name_list=[]
neighborhood_id_list = []
for data in neighborhoods_data: 
    neighborhood_name = data["properties"]["name"]   
    neighborhood_id = data["properties"]["cartodb_id"]
    neighborhood_name_list.append(neighborhood_name)
    neighborhood_id_list.append(neighborhood_id)

##### Calculate the centroid of each neighborhood based on coordinates of the neighborhood's borders reported in the GeoJSON-file

In [74]:
lat_list_neighborhoods = []
long_list_neighborhoods = []

for data in neighborhoods_data: 
    lat_list_neigh=[]
    long_list_neigh=[]
    
    total_coordinates = data['geometry']['coordinates'][0]
    
    counter = len(total_coordinates)
    
    for i in range(counter):
        long_list_neigh.append(total_coordinates[i][0])
        lat_list_neigh.append(total_coordinates[i][1])
    
    lat_array = np.array(lat_list_neigh)
    long_array = np.array(long_list_neigh)
    
    lat_mean = np.mean(lat_array)
    long_mean = np.mean(long_array)
    
    lat_list_neighborhoods.append(lat_mean)
    long_list_neighborhoods.append(long_mean)

##### Create a dataframe that contains the neighborhoods' names, IDs, and centroid coordinates

In [166]:
column_names = ['Neighborhood', 'Neighborhood ID', 'Latitude', 'Longitude'] 

# instantiate the dataframe
ulm_neighborhoods = pd.DataFrame(columns=column_names)
ulm_neighborhoods['Neighborhood'] = neighborhood_name_list
ulm_neighborhoods['Neighborhood ID'] = neighborhood_id_list
ulm_neighborhoods['Latitude'] = lat_list_neighborhoods
ulm_neighborhoods['Longitude'] = long_list_neighborhoods

ulm_neighborhoods.head()

Unnamed: 0,Neighborhood,Neighborhood ID,Latitude,Longitude
0,Altstadt,ul-stv110,48.399682,9.994659
1,Neustadt,ul-stv111,48.403614,9.994706
2,Karlstraße,ul-stv112,48.405194,9.986365
3,Michelsberg,ul-stv113,48.409727,9.986235
4,Gaisenberg,ul-stv114,48.408025,9.998274


##### Check the data types of the dataframe

In [167]:
ulm_neighborhoods.dtypes

Neighborhood        object
Neighborhood ID     object
Latitude           float64
Longitude          float64
dtype: object

##### Replace German special characters "ß", "ä", "ö", "ü" with their English counter-parts

In [168]:
ulm_neighborhoods["Neighborhood"] = ulm_neighborhoods["Neighborhood"].str.replace(pat="ß", repl="ss", regex=True)
ulm_neighborhoods["Neighborhood"] = ulm_neighborhoods["Neighborhood"].str.replace(pat="ä", repl="ae", regex=True)
ulm_neighborhoods["Neighborhood"] = ulm_neighborhoods["Neighborhood"].str.replace(pat="ö", repl="oe", regex=True)
ulm_neighborhoods["Neighborhood"] = ulm_neighborhoods["Neighborhood"].str.replace(pat="ü", repl="ue", regex=True)
ulm_neighborhoods.head()

Unnamed: 0,Neighborhood,Neighborhood ID,Latitude,Longitude
0,Altstadt,ul-stv110,48.399682,9.994659
1,Neustadt,ul-stv111,48.403614,9.994706
2,Karlstrasse,ul-stv112,48.405194,9.986365
3,Michelsberg,ul-stv113,48.409727,9.986235
4,Gaisenberg,ul-stv114,48.408025,9.998274


### 2. Create a map of Ulm with all neighborhoods

##### Use Geopy to obtain latitude and longitude of Ulm. I call my user agent 'Ulm_explorer'

In [169]:
ulm_address = "Ulm, Germany"

geolocator = Nominatim(user_agent="ulm_explorer")
location = geolocator.geocode(ulm_address)
ulm_latitude = location.latitude
ulm_longitude = location.longitude
print("The geograpical coordinates of Ulm, Germany are {}, {}.".format(ulm_latitude, ulm_longitude))

The geograpical coordinates of Ulm, Germany are 48.3974003, 9.9934336.


##### The following cell will render a map of Ulm including all neighborhoods. If you click on a marker on the map, a popup will show up indicating the neighborhood name.

In [170]:
# create map of Ulm using latitude and longitude values
map_ulm = folium.Map(location=[ulm_latitude, ulm_longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(ulm_neighborhoods["Latitude"], ulm_neighborhoods["Longitude"], ulm_neighborhoods["Neighborhood"]):
    label = "{}".format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color="blue",
        fill=True,
        fill_color="#3186cc",
        fill_opacity=0.7,
        parse_html=False).add_to(map_ulm)  

# show map    
map_ulm

### 3. Fetch restaurant information for each neighborhood of Ulm using Foursquare API calls

##### Initialize Foursquare credentials

In [171]:
CLIENT_ID = 'AG3PTHAAGC2E5BJNTYTYFZ0024IBFGGT223CQ1RFRS1RL0CS' 
CLIENT_SECRET = 'SRA2OIUMNQ3B2JZSF5UPEA1Q2Z3ELKJVQ4PQIKAGPRSU05YF'
VERSION = '20180605' # Foursquare API version

In [130]:
url = "https://api.foursquare.com/v2/venues/explore?&categoryId=4d4b7105d754a06374d81259&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            ulm_latitude, 
            ulm_longitude, 
            500, 
            10)
            
# make the GET request
results = requests.get(url).json()["response"]["groups"][0]#["items"][1]#["venue"]["name"]
results

{'type': 'Recommended Places',
 'name': 'recommended',
 'items': [{'reasons': {'count': 0,
    'items': [{'summary': 'This spot is popular',
      'type': 'general',
      'reasonName': 'globalInteractionReason'}]},
   'venue': {'id': '5927f40f4a1cc037f29af7ba',
    'name': 'Barfüßer - Die Hausbrauerei',
    'location': {'address': 'Neue Straße 87-89',
     'lat': 48.3977110748162,
     'lng': 9.99371978343288,
     'labeledLatLngs': [{'label': 'display',
       'lat': 48.3977110748162,
       'lng': 9.99371978343288}],
     'distance': 40,
     'postalCode': '89073',
     'cc': 'DE',
     'city': 'Ulm',
     'state': 'Baden-Württemberg',
     'country': 'Deutschland',
     'formattedAddress': ['Neue Straße 87-89', '89073 Ulm', 'Deutschland']},
    'categories': [{'id': '56aa371ce4b08b9a8d573576',
      'name': 'Swabian Restaurant',
      'pluralName': 'Swabian Restaurants',
      'shortName': 'Swabian',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/german_',
  

##### Create a function that scrapes all restaurants for each neighborhood in a radius of 500 m from its centroid from Foursquare

In [172]:
def getNearbyRestaurants(names, IDs, latitudes, longitudes, radius=500, LIMIT=15):
    
    restaurants_list=[]
    for name, ID, lat, lng in zip(names, IDs, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = "https://api.foursquare.com/v2/venues/explore?&categoryId=4d4b7105d754a06374d81259&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]["groups"][0]["items"]
        
        # return only relevant information for each nearby venue
        for counter in range(len(results)):
            restaurants_list.append([(
                name, 
                ID, 
                lat, 
                lng, 
                results[counter]["venue"]["name"], 
                results[counter]["venue"]["id"],
                results[counter]["venue"]["categories"][0]["name"])])

    # create the dataframe of venues    
    nearby_restaurants = pd.DataFrame([item for restaurant_list in restaurants_list for item in restaurant_list])
    nearby_restaurants.columns = ["Neighborhoods", 
                                  "Neighborhood ID",
                                  "Neighborhood Latitude", 
                                  "Neighborhood Longitude", 
                                  "Restaurant", 
                                  "Restaurant ID",
                                  "Restaurant Category"]
    
    return(nearby_restaurants)

##### Call the function from above with Ulm data and save result in dataframe "ulm_restaurants"

In [312]:
ulm_restaurants = getNearbyRestaurants(names=ulm_neighborhoods["Neighborhood"],
                                       IDs=ulm_neighborhoods["Neighborhood ID"],
                                       latitudes=ulm_neighborhoods["Latitude"],
                                       longitudes=ulm_neighborhoods["Longitude"]
                                      )

Altstadt
Neustadt
Karlstrasse
Michelsberg
Gaisenberg
Wilhelmsburg
Wielandstrasse
Friedrichsau
Safranberg
Eberhardtstrasse
Eichenplatz
Braunland
Boefingen-Gewebegebiet
Boefingen Sued
Boefingen Mitte
Boefingen Ost
Obertalfingen
Boefingen Nord
Noerdliche Wagnerstrasse
Blaubeurer Strasse-Gewerbegebiet
Schillerstrasse
Suedliche Wagnerstrasse
Donaubastion
Galgenberg
Unterer Kuhberg
Sedanstrasse
Saarlandstrasse
Mittlerer Kuhberg
Maehringer Weg
Eselsberg Mitte
Hetzenbaeumle
Lehrer Tal
Universitaet
Hasenkopf
Am Weinberg
Wanne
Tuermle
Haeringsacker
Alt-Soeflingen
Sonnenstrasse
Auf der Laue
Soeflingen-Gewerbegebiet
Roter Berg-Alt
Harthausen
Roter Berg-Neu
Alt-Wiblingen
Erenlauh
Wiblingen-Gewerbegebiet
Tannenplatz West
Tannenhof
Tannenplatz Sued
Tannenplatz Mitte
Eschwiesen
Daimlerstrasse
Riedhof


##### Check the new dataframe

In [313]:
print("Shape of dataframe:", ulm_restaurants.shape)
ulm_restaurants.head()

Shape of dataframe: (134, 7)


Unnamed: 0,Neighborhoods,Neighborhood ID,Neighborhood Latitude,Neighborhood Longitude,Restaurant,Restaurant ID,Restaurant Category
0,Altstadt,ul-stv110,48.399682,9.994659,simit cafe&bakery,5676d1cb498e2ee090e70fa8,Bakery
1,Altstadt,ul-stv110,48.399682,9.994659,Barfüßer - Die Hausbrauerei,5927f40f4a1cc037f29af7ba,Swabian Restaurant
2,Altstadt,ul-stv110,48.399682,9.994659,Zuckerbäcker,4bd9b1dc0115c9b60d3a7880,Bakery
3,Altstadt,ul-stv110,48.399682,9.994659,Damn Burger Co.,4df101c245ddbf3897d4ba81,Burger Joint
4,Altstadt,ul-stv110,48.399682,9.994659,Crêperie Kornhäusle,4ca3606d7f84224ba548c458,Creperie


In [134]:
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format("5927f40f4a1cc037f29af7ba", CLIENT_ID, CLIENT_SECRET, VERSION)
# make the GET request
result = requests.get(url).json()
result

{'meta': {'code': 429,
  'errorType': 'quota_exceeded',
  'errorDetail': 'Quota exceeded',
  'requestId': '5f203908bef74815759353b9'},
 'response': {}}

##### Define a function that scrapes the rating for each restaurant

In [123]:
def getRestaurantRating(neighborhoods, restaurants, IDs):
    rating_list=[]
    for neighborhood, restaurant, ID in zip(neighborhoods, restaurants, IDs):
        print(ID + " ---------- " + restaurant)
            
        # create the API request URL
        url = "https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}".format(ID, CLIENT_ID, CLIENT_SECRET, VERSION)
           
        # make the GET request
        result = requests.get(url).json()["response"]["venue"]
                        
        try:
            rating = result["response"]["venue"]["rating"]
        except:
            rating = 0
        
        # return only relevant information for each nearby venue
        rating_list.append([(neighborhood, restaurant, ID, rating)])
        
    # create the dataframe of venues    
    rating_frame = pd.DataFrame([item for rating in rating_list for item in rating])
    rating_frame.columns = ["Neighborhoods",
                  "Restaurant", 
                  "Restaurant ID",
                  "Rating"]
    
    return(rating_frame)

##### Call the function from above with Ulm restaurants and save result in dataframe "ulm_restaurants" (not used for analysis, but interesting to know)

In [125]:
ulm_restaurant_ratings = getRestaurantRating(neighborhoods=ulm_restaurants["Neighborhoods"], restaurants=ulm_restaurants["Restaurant"], IDs=ulm_restaurants["Restaurant ID"])

5676d1cb498e2ee090e70fa8 ---------- simit cafe&bakery
5927f40f4a1cc037f29af7ba ---------- Barfüßer - Die Hausbrauerei
4bd9b1dc0115c9b60d3a7880 ---------- Zuckerbäcker
4df101c245ddbf3897d4ba81 ---------- Damn Burger Co.
4ca3606d7f84224ba548c458 ---------- Crêperie Kornhäusle
53494c58498eb9a16c71ed9b ---------- Fräulein Berger
4be56792bcef2d7f22ba03e5 ---------- Café im Kornhauskeller
4ba4e828f964a52088c138e3 ---------- L'Osteria
4bf40dc0e5eba593db2d1f90 ---------- Asia Van
573f56f0498e85d47c8c673e ---------- Taj Mahal
4e2222a8d4c0d32590f6d757 ---------- Arslan
4b450231f964a520250126e3 ---------- Choclet
4d4c5e30f523a143ffca759d ---------- BellaVista
53e2a351498eea60fec99235 ---------- Ouzeria
4bb604592f70c9b657c08430 ---------- John Benton
4bd9b1dc0115c9b60d3a7880 ---------- Zuckerbäcker
53494c58498eb9a16c71ed9b ---------- Fräulein Berger
4ca3606d7f84224ba548c458 ---------- Crêperie Kornhäusle
4bf40dc0e5eba593db2d1f90 ---------- Asia Van
50c22f14e4b08b77f93fca44 ---------- Steak Restaur

KeyError: 'venue'

##### Read average revenue per restaurant for each neighborhood from csv-file

In [314]:
revenue = pd.read_csv("Revenue.csv")
revenue.set_index("Unnamed: 0", drop=True, inplace=True)
revenue.head()

Unnamed: 0_level_0,ID,Revenue
Unnamed: 0,Unnamed: 1_level_1,Unnamed: 2_level_1
0,ul-stv160,406370.0
1,ul-stv180,269584.0
2,ul-stv110,334637.0
3,ul-stv156,615312.0
4,ul-stv141,113757.0


##### Join revenue information to existing restaurant dataframe

In [315]:
ulm_restaurants = ulm_restaurants.join(revenue.set_index("ID"), on="Neighborhood ID")
ulm_restaurants["Revenue"].fillna(ulm_restaurants["Revenue"].mean(), inplace=True)
ulm_restaurants.head()

Unnamed: 0,Neighborhoods,Neighborhood ID,Neighborhood Latitude,Neighborhood Longitude,Restaurant,Restaurant ID,Restaurant Category,Revenue
0,Altstadt,ul-stv110,48.399682,9.994659,simit cafe&bakery,5676d1cb498e2ee090e70fa8,Bakery,334637.0
1,Altstadt,ul-stv110,48.399682,9.994659,Barfüßer - Die Hausbrauerei,5927f40f4a1cc037f29af7ba,Swabian Restaurant,334637.0
2,Altstadt,ul-stv110,48.399682,9.994659,Zuckerbäcker,4bd9b1dc0115c9b60d3a7880,Bakery,334637.0
3,Altstadt,ul-stv110,48.399682,9.994659,Damn Burger Co.,4df101c245ddbf3897d4ba81,Burger Joint,334637.0
4,Altstadt,ul-stv110,48.399682,9.994659,Crêperie Kornhäusle,4ca3606d7f84224ba548c458,Creperie,334637.0


In [316]:
print(ulm_restaurants["Revenue"].nunique())

38


In [317]:
ulm_neighs_grouped = ulm_restaurants.groupby(["Neighborhoods", "Neighborhood ID"], as_index=False).mean()
print(ulm_neighs_grouped.shape)
ulm_neighs_grouped.head()

(38, 5)


Unnamed: 0,Neighborhoods,Neighborhood ID,Neighborhood Latitude,Neighborhood Longitude,Revenue
0,Alt-Soeflingen,ul-stv160,48.39737,9.949305,406370.0
1,Alt-Wiblingen,ul-stv180,48.356191,9.984316,269584.0
2,Altstadt,ul-stv110,48.399682,9.994659,334637.0
3,Am Weinberg,ul-stv156,48.411689,9.952772,615312.0
4,Blaubeurer Strasse-Gewerbegebiet,ul-stv141,48.401384,9.971846,113757.0


### 4. Draw choropleth map of Ulm based on avg. restaurant revenue per neighborhood

In [318]:
ulm_geojson = r'ulm_data.json' # geojson file

# create a plain ulm map
ulm_choropleth = folium.Map(location=[ulm_latitude, ulm_longitude], zoom_start=12)

# generate choropleth map using the total immigration of each country to Canada from 1980 to 2013
ulm_choropleth.choropleth(
    geo_data=ulm_geojson,
    data=ulm_neighs_grouped,
    columns=["Neighborhood ID", "Revenue"],
    key_on="feature.properties.cartodb_id",
    fill_color='YlOrRd', 
    fill_opacity=0.9, 
    line_opacity=0.2,
    legend_name="Average Restaurant Revenue per Neighborhood"
)

# display map
ulm_choropleth

##### Add markers on the map. If you click on the marker, a popup will show up containing the neighborhood name

In [319]:
for lat, lng, neighborhood in zip(ulm_neighs_grouped["Neighborhood Latitude"], ulm_neighs_grouped["Neighborhood Longitude"], ulm_neighs_grouped["Neighborhoods"]):
    label = "{}".format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color="blue",
        fill=True,
        fill_color="#3186cc",
        fill_opacity=0.7,
        parse_html=False).add_to(ulm_choropleth)  
    
ulm_choropleth

### 5. Scrape venue information for the neighborhoods of Ulm in order to calculate the similarity of the neighborhoods based on their venue offerings

##### Define a function that calls Foursquare API to obtain venues

In [275]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=30):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # create the dataframe of venues    
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ["Neighborhoods", 
                  "Neighborhood Latitude", 
                  "Neighborhood Longitude", 
                  "Venue", 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

##### Call the function from above with Ulm data and save result in dataframe "ulm_venues"

In [276]:
ulm_venues = getNearbyVenues(names=ulm_neighs_grouped["Neighborhoods"],
                                   latitudes=ulm_neighs_grouped["Neighborhood Latitude"],
                                   longitudes=ulm_neighs_grouped["Neighborhood Longitude"]
                                  )

Alt-Soeflingen
Alt-Wiblingen
Altstadt
Am Weinberg
Blaubeurer Strasse-Gewerbegebiet
Boefingen Sued
Boefingen-Gewebegebiet
Braunland
Daimlerstrasse
Donaubastion
Eberhardtstrasse
Eichenplatz
Eselsberg Mitte
Friedrichsau
Gaisenberg
Galgenberg
Haeringsacker
Hasenkopf
Hetzenbaeumle
Karlstrasse
Lehrer Tal
Maehringer Weg
Michelsberg
Neustadt
Noerdliche Wagnerstrasse
Riedhof
Saarlandstrasse
Safranberg
Schillerstrasse
Sedanstrasse
Soeflingen-Gewerbegebiet
Sonnenstrasse
Suedliche Wagnerstrasse
Tannenplatz Mitte
Tuermle
Universitaet
Wanne
Wielandstrasse


##### Check the new dataframe

In [320]:
print(ulm_venues.shape)
ulm_venues.head()

(255, 7)


Unnamed: 0,Neighborhoods,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Alt-Soeflingen,48.39737,9.949305,Klosterhof,48.396469,9.954176,Beer Garden
1,Alt-Soeflingen,48.39737,9.949305,Hotel Löwen,48.396795,9.954449,Hotel
2,Alt-Soeflingen,48.39737,9.949305,Zum Schatten,48.395436,9.955066,Diner
3,Alt-Soeflingen,48.39737,9.949305,Eiscafé De Marco,48.395458,9.954849,Ice Cream Shop
4,Alt-Soeflingen,48.39737,9.949305,Autohaus Kreisser GmbH & Co. KG,48.398592,9.952741,Auto Dealership


##### Check how many venue categories were found in Ulm

In [321]:
print('There are {} unique categories.'.format(len(ulm_venues['Venue Category'].unique())))

There are 88 unique categories.


### 6. Prepare dataframe by processing the venue features to be used to calculate the similarity of neighborhoods

##### Create a dataframe that lists for each neighborhood whether the venue category can be found in the neighborhood or not

In [279]:
# one hot encoding
ulm_onehot = pd.get_dummies(ulm_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ulm_onehot['Neighborhoods'] = ulm_venues['Neighborhoods'] 

# move neighborhood column to the first column
fixed_columns = [ulm_onehot.columns[-1]] + list(ulm_onehot.columns[:-1])
ulm_onehot = ulm_onehot[fixed_columns]

ulm_onehot.head()

Unnamed: 0,Neighborhoods,Aquarium,Asian Restaurant,Auto Dealership,Bagel Shop,Bakery,Bar,Beer Garden,Big Box Store,Bistro,Bookstore,Brewery,Burger Joint,Bus Stop,Café,Chinese Restaurant,Church,Cocktail Bar,Coffee Shop,Concert Hall,Construction & Landscaping,Creperie,Dessert Shop,Diner,Drugstore,Electronics Store,Event Service,Exhibit,Farmers Market,Forest,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,German Restaurant,Greek Restaurant,Gym,Gym / Fitness Center,Gym Pool,Historic Site,Home Service,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Jewelry Store,Kebab Restaurant,Light Rail Station,Lounge,Mediterranean Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Monument / Landmark,Museum,Nightclub,Organic Grocery,Outdoor Supply Store,Park,Pet Store,Pizza Place,Plaza,Pool,Print Shop,Restaurant,Rock Climbing Spot,Rock Club,Salad Place,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shipping Store,Smoothie Shop,Spanish Restaurant,Stadium,Steakhouse,Supermarket,Swabian Restaurant,Tennis Court,Theater,Theme Park Ride / Attraction,Toy / Game Store,Train Station,Trattoria/Osteria,Turkish Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,Alt-Soeflingen,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Alt-Soeflingen,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Alt-Soeflingen,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Alt-Soeflingen,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Alt-Soeflingen,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


##### Check the size of the new dataframe

In [280]:
ulm_onehot.shape

(255, 89)

##### Group the neighborhoods and calculate the mean of occurence of each category per neighborhood

In [281]:
ulm_venues_grouped = ulm_onehot.groupby('Neighborhoods').mean().reset_index()
ulm_venues_grouped.head()

Unnamed: 0,Neighborhoods,Aquarium,Asian Restaurant,Auto Dealership,Bagel Shop,Bakery,Bar,Beer Garden,Big Box Store,Bistro,Bookstore,Brewery,Burger Joint,Bus Stop,Café,Chinese Restaurant,Church,Cocktail Bar,Coffee Shop,Concert Hall,Construction & Landscaping,Creperie,Dessert Shop,Diner,Drugstore,Electronics Store,Event Service,Exhibit,Farmers Market,Forest,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,German Restaurant,Greek Restaurant,Gym,Gym / Fitness Center,Gym Pool,Historic Site,Home Service,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Jewelry Store,Kebab Restaurant,Light Rail Station,Lounge,Mediterranean Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Monument / Landmark,Museum,Nightclub,Organic Grocery,Outdoor Supply Store,Park,Pet Store,Pizza Place,Plaza,Pool,Print Shop,Restaurant,Rock Climbing Spot,Rock Club,Salad Place,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shipping Store,Smoothie Shop,Spanish Restaurant,Stadium,Steakhouse,Supermarket,Swabian Restaurant,Tennis Court,Theater,Theme Park Ride / Attraction,Toy / Game Store,Train Station,Trattoria/Osteria,Turkish Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,Alt-Soeflingen,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alt-Wiblingen,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Altstadt,0.0,0.0,0.0,0.0,0.066667,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.133333,0.066667,0.033333,0.0,0.033333,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0
3,Am Weinberg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Blaubeurer Strasse-Gewerbegebiet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.076923,0.0,0.0,0.0,0.0,0.076923,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.076923,0.0,0.0,0.0,0.0,0.0


##### Check the size of the new dataframe

In [282]:
ulm_venues_grouped.shape

(37, 89)

##### Define a function that returns the most common venues per neighborhood

In [283]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:] #go through the category columns
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [284]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd'] # to get 1st, 2nd, 3rd as column headers

# create columns according to number of top venues
columns = ['Neighborhoods']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind])) # top three most common venues
    except:
        columns.append('{}th Most Common Venue'.format(ind+1)) # fourth, fifth and higher most common venues

# create a new dataframe with column neighborhood and 10 most common venues
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhoods'] = ulm_venues_grouped['Neighborhoods']

for ind in np.arange(ulm_venues_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ulm_venues_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alt-Soeflingen,Burger Joint,Beer Garden,Diner,Ice Cream Shop,Print Shop,Italian Restaurant,Hotel,Auto Dealership,Home Service,Furniture / Home Store
1,Alt-Wiblingen,Hotel,Bakery,Miscellaneous Shop,Yoga Studio,Garden,Electronics Store,Event Service,Exhibit,Farmers Market,Forest
2,Altstadt,Café,Chinese Restaurant,Plaza,Bakery,Indian Restaurant,Dessert Shop,Church,Salad Place,Burger Joint,Mediterranean Restaurant
3,Am Weinberg,Bus Stop,Bistro,Yoga Studio,Gas Station,Electronics Store,Event Service,Exhibit,Farmers Market,Forest,Fried Chicken Joint
4,Blaubeurer Strasse-Gewerbegebiet,Hotel,Big Box Store,Drugstore,Electronics Store,Organic Grocery,Nightclub,Scandinavian Restaurant,Fried Chicken Joint,Furniture / Home Store,Supermarket


### 7. Cluster the neighborhoods with k-Means (similarity-based)

##### Run k-Means to cluster the neighborhoods. Five clusters will do the job for me. I will use the grouped dataframe with the mean occurence of each venue category per neighborhood. <br> K-Means will cluster similar neighborhoods (i.e. similar occurence of same venues) into the same cluster.

In [285]:
# set number of clusters
kclusters = 5

ulm_grouped_clustering = ulm_venues_grouped.drop('Neighborhoods', axis=1) # get rid of this feature as it is categorical

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(ulm_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 3, 0, 3, 1, 0, 3, 3, 3])

##### Create a dataframe including the cluster for each neighborhood

In [286]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)
neighborhoods_venues_sorted['Cluster Label'] = neighborhoods_venues_sorted['Cluster Label'].astype('int32')
ulm_merged = ulm_venues_grouped.drop(ulm_venues_grouped.columns[1:90], axis=1)

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
ulm_merged = ulm_merged.join(neighborhoods_venues_sorted.set_index('Neighborhoods'), on='Neighborhoods')
ulm_merged = ulm_merged.join(ulm_neighborhoods.set_index('Neighborhood'), on='Neighborhoods')

ulm_merged.head() #check columns to the right

Unnamed: 0,Neighborhoods,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Neighborhood ID,Latitude,Longitude
0,Alt-Soeflingen,3,Burger Joint,Beer Garden,Diner,Ice Cream Shop,Print Shop,Italian Restaurant,Hotel,Auto Dealership,Home Service,Furniture / Home Store,ul-stv160,48.39737,9.949305
1,Alt-Wiblingen,3,Hotel,Bakery,Miscellaneous Shop,Yoga Studio,Garden,Electronics Store,Event Service,Exhibit,Farmers Market,Forest,ul-stv180,48.356191,9.984316
2,Altstadt,3,Café,Chinese Restaurant,Plaza,Bakery,Indian Restaurant,Dessert Shop,Church,Salad Place,Burger Joint,Mediterranean Restaurant,ul-stv110,48.399682,9.994659
3,Am Weinberg,0,Bus Stop,Bistro,Yoga Studio,Gas Station,Electronics Store,Event Service,Exhibit,Farmers Market,Forest,Fried Chicken Joint,ul-stv156,48.411689,9.952772
4,Blaubeurer Strasse-Gewerbegebiet,3,Hotel,Big Box Store,Drugstore,Electronics Store,Organic Grocery,Nightclub,Scandinavian Restaurant,Fried Chicken Joint,Furniture / Home Store,Supermarket,ul-stv141,48.401384,9.971846


##### Check number of neighborhoods per cluster

In [322]:
ulm_merged['Cluster Label'].value_counts()

3    19
1    10
0     6
4     1
2     1
Name: Cluster Label, dtype: int64

### 8. Visualize the clusters

##### Create choropleth map

In [323]:
# create map
map_clusters = folium.Map(location=[ulm_latitude, ulm_longitude], zoom_start=12) # Ulm map

map_clusters.choropleth(
    geo_data=ulm_geojson,
    data=ulm_neighs_grouped,
    columns=["Neighborhood ID", "Revenue"],
    key_on="feature.properties.cartodb_id",
    fill_color='YlOrRd', 
    fill_opacity=0.9, 
    line_opacity=0.2,
    legend_name="Average Restaurant Revenue per Neighborhood"
)

map_clusters

##### Add markers on the map that show the different clusters. If you click on the marker, a popup will show up containing the neighborhood name and cluster number

In [324]:
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, neigh, cluster in zip(ulm_merged['Latitude'], ulm_merged['Longitude'], ulm_merged['Neighborhoods'], ulm_merged['Cluster Label']):
    label = folium.Popup(str(neigh) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
# display map
map_clusters

## Results

We can see that **neighborhoods of cluster number 0** (red circles on the map above) on average have the **highest restaurant revenue** (dark red choropleth color). <br>
**What does this mean?** Well, first of all: **neighborhoods of the same circle color are similar with respect to the venues they offer**. This is why k-Means has placed them in the same cluster. <br> 
Neighborhoods of cluster number 0 (red circle) seem to offer similar venues that attract people to visit these neighborhoods. Once people visit these neighborhoods, they become hungry and end up in  a restaurant. This behavior **boosts the revenues of the restaurants of these neighborhoods**. 

**What venues are the ones boosting revenues?** <br>
Each neighborhood contained in cluster number 0 possesses venues that are connected to **"nature"**, i.e. parks, rivers, forest, etc.. <br>
**People seem to enjoy restaurants that are placed in "green" neighborhoods**. 

## Discussion

The outcome of this analysis is interesting. I would recommend people to open up a restaurant in one of the neighborhoods of cluster number 0. <br>
Nevertheless, this analysis should not be the only indicator for choosing a proper restaurant location. Not taken into account are property prices. I assume that the identified locations have high rental prices, which is a significant cost factor that has to be set into relation to the expected high revenue. <br>
In addition, I do not know how accurate the average restaurant revenue per neighborhood data is. It is the best I could find and should be workable. <br>
**In total, I think that the analysis helps to find good restaurant locations and should be taken into account as one of several factors that lead to a decision.**

## Conclusion

We have found a way to compare neighborhoods based on their similarity regarding venues. The neighborhoods are clustered and set into relation to expected restaurant revenues. This was done using k-Means and has created interesting results. <br>
In order to deepen the understanding in the next step, further analyses should apply **additional machine learning algorithms**. Since I have assigned numerical values to venue features, one could run regression models that identify the numerical impact of features on expected revenues. <br>
**Thanks for taking the time and I hope you found this notebook educational.**