# The Battle of Neighborhoods (Analysing a City of choice: Bonn, Germany)

Acknowledgement: This Jupyter Notebook is based on a notebook created by [Alex Aklson](https://www.linkedin.com/in/aklson/) and [Polong Lin](https://www.linkedin.com/in/polonglin/), Copyright &copy; 2018 [Cognitive Class](https://cognitiveclass.ai/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). The original notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/). 
<hr>
I will use the original notebook and adapt it for completing this Capstone project. 



## Introduction
This project applies the skills and tools from the previous sessions, in particular the use of location data to explore a geographical area, to the neighborhoods of the city of Bonn, Germany.


## 1. A description of the problem and a discussion of the background. 

For the "story telling" I will follow suggestion 2 (of the project description) and try to find answers for questions such as:

    "If someone is looking to open a restaurant, where would you recommend that they open it? If a contractor is trying to start their own business, where would you recommend that they setup their office? If you would move for a job which neighborhood would you choose for housing?"
    
As this project will be <b>peer-reviewed</b>, I assume that the reader knows the basics of Python programming, API calls, the Folium library, choropleth maps, and k-means clustering. Readers unfamiliar with my city of choice, Bonn, can consult its Wikipedia page for additional information (https://en.wikipedia.org/wiki/Bonn).

With around 300 000 inhabitants, Bonn is much smaller than the previously studied cities of New York and Toronto. Nevertheless, Bonn was the capital of West Germany from 1949 to 1990. Roughly a third of all ministerial jobs in Germany are still located in Bonn, and the headquarters of Deutsche Post DHL and Deutsche Telekom, both DAX-listed corporations, are in Bonn.

The neighborhood analyses carried out for New York and Toronto have shown that location data retrieved from Foursquare servers provide valuable information about the local distribution of leisure venues. We analysed this distribution in the context of <b>tourism</b> and of <b>moving from one neighborhood to another for a job offer</b>. 

In addition to providing such information for the City of Bonn, I will demonstrate how visualizing the spatial distribution of venues such as cafés and restaurants helps to <b>understand the field of competitors</b>. This information is valuable not only for someone who wants to start such a business, but also for investors who have to decide whether such a business might be successful.

## 2. A description of the data and how it will be used to solve the problem

Impressed by the provided examples that use <b>choropleth maps</b> to visualize crime rates in San Francisco or migration to Canada, I will use an <b>open data GeoJSON file</b> (https://opendata.bonn.de/dataset/fl%C3%A4chen-der-ortsteile) to enable choropleth map visualization for the City of Bonn.

Additionally, I will use publicly available data about the <b>population distribution</b> per <b>municipal district</b> from Wikipedia (https://en.wikipedia.org/wiki/Bonn).

The <b>geocoder package</b>, which retrieves <b>ArcGIS data</b> through API calls, will be used to obtain longitude and latitude values.

As some of the geo data had to be corrected, additional coordinates were copied manually from Wikipedia. A parser approach, as used for the Toronto neighborhood analysis, was not efficient here because the information is spread over several pages rather than a single one.

The core data for this analysis will be retrieved by API calls from <b>Foursquare</b> servers as for the Toronto and New York analysis.


The <b>use of the data to solve the problem</b>, i.e. providing information and answers relevant for tourists, people who want to move to another neighborhood, business founders and investors, will be <b>similar</b> to the approach applied in the Toronto and New York analyses. The subsequent code and explanatory markdown cells illustrate how the data will be used to solve the problem.


## Import necessary Libraries

In [1]:
!pip install geocoder
!pip install lxml
%matplotlib inline

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import geocoder # import geocoder
import folium # map rendering library
import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; formerly pandas.io.json)

import matplotlib.cm as cm    # Matplotlib and associated plotting modules
import matplotlib.colors as colors
from sklearn.cluster import KMeans  # import k-means from clustering stage

print('Libraries imported.')


In [2]:
# download the GeoJSON file with the districts of Bonn
!wget --quiet https://opendata.arcgis.com/datasets/ec56dd4de6374c54b92b4eb5763edfce_0.geojson -O districts_bonn.json
#https://hub.arcgis.com/datasets/esri-de-content::ortsteile-bonn?geometry=6.463%2C50.627%2C7.771%2C50.780   Ortsteile - Bonn (districts of the City Bonn) 
#https://opendata.bonn.de/dataset/fl%C3%A4chen-der-ortsteile

print('GeoJSON file downloaded!')

GeoJSON file downloaded!


In [3]:
#Next, let's load the data. # reading the JSON data using json.load()
with open('districts_bonn.json',"r") as json_file:
    districts_bonn = json.load(json_file)
    
#districts_bonn #uncomment for testing

"Notice how all the relevant data is in the features key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data."

In [4]:
districts_bonn_data = districts_bonn['features']
#districts_bonn_data[0] #Uncomment to take a look at the first item in this list.
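
To make the structure of the features list concrete, here is a minimal sketch with a hand-written stand-in for the GeoJSON file. The two sample features and their empty geometries are invented for illustration; only the property names ortsteil_bez and bezirk_bez are taken from the dataset used in this notebook.

```python
# Hypothetical miniature of the GeoJSON structure; the real file holds one feature per neighborhood.
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"ortsteil_bez": "Auerberg", "bezirk_bez": "Bonn"},
         "geometry": {"type": "Polygon", "coordinates": []}},
        {"type": "Feature",
         "properties": {"ortsteil_bez": "Mehlem", "bezirk_bez": "Bad Godesberg"},
         "geometry": {"type": "Polygon", "coordinates": []}},
    ],
}

def extract_names(geojson):
    """Return one (neighborhood, district) pair per feature."""
    return [(f["properties"]["ortsteil_bez"], f["properties"]["bezirk_bez"])
            for f in geojson["features"]]

pairs = extract_names(sample_geojson)
print(pairs)
```

A list of such pairs (or of dictionaries) can be handed directly to the pandas DataFrame constructor, which is what the following cells build up step by step.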

"The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe."

In [5]:
# define the dataframe columns
column_names = ['ortsteil_bez', 'bezirk_bez', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods.head()

Unnamed: 0,ortsteil_bez,bezirk_bez,Latitude,Longitude


In [6]:
#Test
#g = geocoder.arcgis('{},{}, Germany'.format(neighborhoods['ortsteil_bez'][0],neighborhoods['bezirk_bez'][0]))
#latitude=g.latlng[0]
#longitude=g.latlng[1]
#print(latitude, longitude)

def Lat_Long_coordinates_for_city(district,city): #function to get coordinates
    lat_lng_coords = None     # initialize your variable to None
    while(lat_lng_coords is None):  # loop until you get the coordinates
        g = geocoder.arcgis('{},{}, Germany'.format(district,city))
        lat_lng_coords=g.latlng
    latitude=g.latlng[0]
    longitude=g.latlng[1]
    return latitude, longitude    
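
The while loop above keeps asking the geocoder until it answers, which can hang if the service never returns coordinates. A possible variant with a bounded number of attempts is sketched below. The names geocode_with_retry, max_attempts and wait_seconds are my own, and the geocoder is passed in as a plain callable so the sketch runs without network access; in the notebook that callable might wrap geocoder.arcgis(query).latlng, which is an assumption.

```python
import time

def geocode_with_retry(geocode, query, max_attempts=5, wait_seconds=0.0):
    """Call geocode(query) until it returns coordinates or the attempts run out.

    geocode is any callable returning a [lat, lng] pair or None.
    """
    for _ in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords[0], coords[1]
        time.sleep(wait_seconds)  # optional pause before the next attempt
    raise RuntimeError('No coordinates for {!r} after {} attempts'.format(query, max_attempts))

# Stand-in geocoder that fails twice before answering (coordinates of the Auerberg row).
calls = {'n': 0}

def fake_geocode(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [50.7569, 7.06992]

result = geocode_with_retry(fake_geocode, 'Auerberg,Bonn, Germany')
print(result, calls['n'])
```

Raising an error after the last attempt makes a persistent failure visible instead of blocking the notebook.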

In [315]:
#Then let's loop through the data and collect one row per neighborhood. #function calling for coordinates takes quite a while
rows = []
for data in districts_bonn_data:
    ortsteil_bez = data['properties']['ortsteil_bez'] 
    bezirk_bez = data['properties']['bezirk_bez']      
    # call the geocoder only once per neighborhood and unpack both coordinates
    neighborhood_lat, neighborhood_lon = Lat_Long_coordinates_for_city(ortsteil_bez, bezirk_bez)
        
    rows.append({'ortsteil_bez': ortsteil_bez,
                 'bezirk_bez': bezirk_bez,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so the dataframe is built from the collected rows
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

Unnamed: 0,ortsteil_bez,bezirk_bez,Latitude,Longitude
0,Auerberg,Bonn,50.7569,7.06992
1,Bonn-Castell,Bonn,50.74368,7.10035
2,Bonn-Zentrum,Bonn,50.73599,7.10488
3,Buschdorf,Bonn,50.75914,7.05421
4,Dottendorf,Bonn,50.70425,7.11436


In [318]:
#neighborhoods.to_csv('neighborhoods_bonn_latlon.csv', index=False)  #save data to avoid API calls all the time the notebook is started anew.

In [7]:
g = geocoder.arcgis('Bonn, Germany')
latitude=g.latlng[0]
longitude=g.latlng[1]
print('The geographical coordinates of Bonn, Germany are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Bonn, Germany are 50.73242000000005, 7.101860000000045.


In [8]:
neighborhoods=pd.read_csv("neighborhoods_bonn_latlon.csv")
neighborhoods.head()

Unnamed: 0,ortsteil_bez,bezirk_bez,Latitude,Longitude
0,Auerberg,Bonn,50.7569,7.06992
1,Bonn-Castell,Bonn,50.74368,7.10035
2,Bonn-Zentrum,Bonn,50.73599,7.10488
3,Buschdorf,Bonn,50.75914,7.05421
4,Dottendorf,Bonn,50.70425,7.11436


In [38]:
#https://de.wikipedia.org/wiki/Bonn #manual insertion of data
overview_bonn={'Districts': ['Bad Godesberg', 'Beuel', 'Bonn', 'Hardtberg'], 
               'Population': [73172 , 66695 , 149733, 33360],
               'Latitude': [50.684722 , 50.734167 , 50.712222, 50.713526],
               'Longitude': [7.155 , 7.121667 , 7.087555, 7.053823]}
df_overview_bonn=pd.DataFrame.from_dict(overview_bonn)
df_overview_bonn


Unnamed: 0,Districts,Population,Latitude,Longitude
0,Bad Godesberg,73172,50.684722,7.155
1,Beuel,66695,50.734167,7.121667
2,Bonn,149733,50.712222,7.087555
3,Hardtberg,33360,50.713526,7.053823


In [10]:
print('There are {} Neighborhoods in {} Districts.'.format(len(neighborhoods['ortsteil_bez'].unique()),len(df_overview_bonn['Districts'].unique()) ))

There are 51 Neighborhoods in 4 Districts.


#### Visualization of the Districts with Folium.

In [49]:
# create map of a City using latitude and longitude values
map_bonn = folium.Map(location=[latitude-0.03, longitude+0.04], zoom_start=11.5)

# add markers to map
for lat, lng, district in zip(df_overview_bonn['Latitude'], df_overview_bonn['Longitude'], df_overview_bonn['Districts']):
    label = '{}'.format(district) #district
    label = folium.Popup(label, parse_html=True, max_width='100%')
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bonn)  
   
map_bonn

#### Visualization of the Population per District with Choropleth.

In [50]:
# generate choropleth map and show the population per district
districts_bonn = r'districts_bonn.json' # geojson file

# folium.Choropleth replaces the deprecated Map.choropleth method
folium.Choropleth(
    geo_data=districts_bonn,
    data=df_overview_bonn,
    columns=['Districts', 'Population'],
    key_on='feature.properties.bezirk_bez', #.ortsteil_bez'
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Population'
).add_to(map_bonn)


# display map
map_bonn
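
The choropleth joins the dataframe to the GeoJSON through the property named in key_on, so the names on both sides have to match exactly. Before plotting, it can help to compare the two sets of keys. The sketch below does this on an invented miniature GeoJSON; the helper name unmatched_keys and the deliberately misspelled sample entry are assumptions for illustration.

```python
def unmatched_keys(geojson, property_name, data_keys):
    """Compare the GeoJSON property values with the keys of the data table."""
    geo_keys = {f['properties'][property_name] for f in geojson['features']}
    data_keys = set(data_keys)
    return {'only_in_geojson': geo_keys - data_keys,
            'only_in_data': data_keys - geo_keys}

# Hypothetical miniature GeoJSON; 'Hardtberg ' with a trailing space simulates a mismatch.
sample = {'features': [
    {'properties': {'bezirk_bez': 'Bonn'}},
    {'properties': {'bezirk_bez': 'Beuel'}},
    {'properties': {'bezirk_bez': 'Hardtberg '}},
]}
report = unmatched_keys(sample, 'bezirk_bez',
                        ['Bad Godesberg', 'Beuel', 'Bonn', 'Hardtberg'])
print(report)
```

Two empty sets mean that every region has a data row and every data row has a region.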

In [17]:
map_bonn.save('map_bonn.html') #no direct export to png seems to exist
print("Saving of map to html file was successful.")

Saving of map to html file was successful.


### Adding Markers for each Neighborhood with Folium (superimposed on existing Folium Map).

In [51]:
# add markers for each neighborhood

for lat, lng, district, city in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['ortsteil_bez'], neighborhoods['bezirk_bez']):
    label = '{}, {}'.format(district, city) #district
    label = folium.Popup(label, parse_html=True, max_width='100%')
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bonn)  

# display map
map_bonn

Inspecting the markers reveals that the function Lat_Long_coordinates_for_city(district, city), which relies on geocoder.arcgis, returned clustered latitude and longitude values, particularly in the districts Bad Godesberg and Hardtberg. Before the Foursquare data is retrieved, these values are corrected manually:
1. Wikipedia provides distinct latitude and longitude values for each neighborhood.
2. The values from Wikipedia are used to overwrite the latitude and longitude values of the neighborhoods in the districts Bad Godesberg and Hardtberg.
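
If the clustering means that several neighborhoods received identical or nearly identical coordinates (my reading of the map), such cases can also be found without looking at the map. The following sketch groups names by rounded coordinates; the helper name find_shared_coordinates and the sample rows are invented for illustration and are not the values returned by the geocoder.

```python
from collections import defaultdict

def find_shared_coordinates(rows, ndigits=5):
    """Group names by rounded (lat, lng) and keep groups with more than one name."""
    groups = defaultdict(list)
    for name, lat, lng in rows:
        groups[(round(lat, ndigits), round(lng, ndigits))].append(name)
    return {coords: names for coords, names in groups.items() if len(names) > 1}

# Invented sample: two neighborhoods geocoded to the same point, one distinct.
sample_rows = [
    ('Neighborhood A', 50.68472, 7.15500),
    ('Neighborhood B', 50.68472, 7.15500),
    ('Neighborhood C', 50.70425, 7.11436),
]
shared = find_shared_coordinates(sample_rows)
print(shared)
```

Every group in the result is a candidate for manual correction.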


In [52]:
#corrections of the dataset
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Alt-Godesberg', ['Latitude','Longitude']] = [50.692014,7.140461]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Friesdorf', ['Latitude','Longitude']] = [50.697639,7.128411]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Godesberg-Nord', ['Latitude','Longitude']] = [50.690574 ,7.149065]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Godesberg-Villenviertel', ['Latitude','Longitude']] = [50.692679,7.160125]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Heiderhof', ['Latitude','Longitude']] = [50.684570 ,7.153930]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Hochkreuz', ['Latitude','Longitude']] = [50.702791,7.140512]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Lannesdorf', ['Latitude','Longitude']] = [50.663889 ,7.170278]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Mehlem', ['Latitude','Longitude']] = [50.660833,7.191944]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Muffendorf', ['Latitude','Longitude']] = [50.671401 ,7.160618]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Pennenfeld', ['Latitude','Longitude']] = [50.674722,7.167222]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Plittersdorf', ['Latitude','Longitude']] = [50.700556 ,7.157778]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Rüngsdorf', ['Latitude','Longitude']] = [50.683877,7.170929]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Schweinheim', ['Latitude','Longitude']] = [50.682047,7.139783]

neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Brüser Berg', ['Latitude','Longitude']] = [50.698578,7.056227]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Duisdorf', ['Latitude','Longitude']] = [50.716406 ,7.051206]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Hardthöhe', ['Latitude','Longitude']] = [50.699167,7.040278]
neighborhoods.loc[neighborhoods['ortsteil_bez'] == 'Lengsdorf', ['Latitude','Longitude']] = [50.710944,7.068629]

neighborhoods.head()

Unnamed: 0,ortsteil_bez,bezirk_bez,Latitude,Longitude
0,Auerberg,Bonn,50.7569,7.06992
1,Bonn-Castell,Bonn,50.74368,7.10035
2,Bonn-Zentrum,Bonn,50.73599,7.10488
3,Buschdorf,Bonn,50.75914,7.05421
4,Dottendorf,Bonn,50.70425,7.11436


### Visualization of the corrected dataset.

In [54]:
# create map of a City using latitude and longitude values
map_bonn_corr_data  = folium.Map(location=[latitude-0.03, longitude+0.04], zoom_start=11.5)


# folium.Choropleth replaces the deprecated Map.choropleth method
folium.Choropleth(
    geo_data=districts_bonn,
    data=df_overview_bonn,
    columns=['Districts', 'Population'],
    key_on='feature.properties.bezirk_bez', #.ortsteil_bez'
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Population'
).add_to(map_bonn_corr_data)
    

for lat, lng, district, city in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['ortsteil_bez'], neighborhoods['bezirk_bez']):
    label = '{}, {}'.format(district, city) #district
    label = folium.Popup(label, parse_html=True, max_width='100%')
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bonn_corr_data)  
   
map_bonn_corr_data

## Analyzing Neighborhoods: First obtain relevant data from Foursquare database.

#### Define Foursquare Credentials and Version

In [55]:
CLIENT_ID = 'FZEJNJVOKE2JM34PNMP5EWOJW1G4YVWDIYXUEF0TZ3EI4DS1' # your Foursquare ID
CLIENT_SECRET = 'WBECOC31NVB1HW0EGEG5NOQB0B0VHYLVL3HW1CO3JWKBYMAN' # your Foursquare Secret
VERSION = '20200424' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: FZEJNJVOKE2JM34PNMP5EWOJW1G4YVWDIYXUEF0TZ3EI4DS1
CLIENT_SECRET:WBECOC31NVB1HW0EGEG5NOQB0B0VHYLVL3HW1CO3JWKBYMAN


#### Let's explore the first neighborhood in our dataframe.

In [56]:
neighborhoods.loc[0, 'ortsteil_bez']

'Auerberg'

Get the neighborhood's latitude and longitude values.

In [57]:
neighborhood_latitude = neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, 'ortsteil_bez'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Auerberg are 50.75690000000003, 7.069920000000025.


#### Now, let's get the top 25 venues that are in 'Auerberg' within a radius of 500 meters.

In [58]:
LIMIT = 25 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

url # display URL


'https://api.foursquare.com/v2/venues/explore?&client_id=FZEJNJVOKE2JM34PNMP5EWOJW1G4YVWDIYXUEF0TZ3EI4DS1&client_secret=WBECOC31NVB1HW0EGEG5NOQB0B0VHYLVL3HW1CO3JWKBYMAN&v=20200424&ll=50.75690000000003,7.069920000000025&radius=500&limit=25'
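
As an alternative to formatting the URL by hand, the query string can be assembled from named parameters with the standard library. This is only a sketch: the function name build_explore_url is my own, and the credentials are placeholders rather than real keys.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore request URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials, not real ones.
example_url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20200424',
                                50.7569, 7.06992, 500, 25)
print(example_url)
```

Note that urlencode percent-encodes the comma between latitude and longitude as %2C, which is the standard encoding of that character in a query string. Keeping the real credentials outside the notebook, for example in environment variables, avoids printing them in cell outputs.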

Send the GET request and examine the results.

In [59]:
results = requests.get(url).json()

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [60]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened rows use the 'venue.categories' column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [61]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,H Kopenhagener Straße,Tram Station,50.757418,7.071644
1,REWE,Supermarket,50.755959,7.07682
2,PENNY,Supermarket,50.7563,7.076302
3,H Auerberger Mitte,Tram Station,50.755102,7.076088


In [62]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


Observation: Not every neighborhood has 25 venue entries.

#### Let's create a function to repeat the same process to all the neighborhoods in Bonn

In [63]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
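
The list comprehension in getNearbyVenues reads the first category of every venue, which raises an IndexError if a venue comes back without any category. A more defensive way to flatten the items is sketched below. The helper name parse_explore_items and the mock items are my own; the nesting of the keys follows the response structure used in the function above.

```python
def parse_explore_items(neighborhood, lat, lng, items):
    """Turn the 'items' list of an explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None instead of raising IndexError on an empty category list
        category = categories[0]['name'] if categories else None
        rows.append((neighborhood, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hand-written stand-in for two response items; the second one has no category.
mock_items = [
    {'venue': {'name': 'REWE', 'location': {'lat': 50.755959, 'lng': 7.07682},
               'categories': [{'name': 'Supermarket'}]}},
    {'venue': {'name': 'Unnamed venue', 'location': {'lat': 50.7563, 'lng': 7.0763},
               'categories': []}},
]
parsed = parse_explore_items('Auerberg', 50.7569, 7.06992, mock_items)
print(parsed)
```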

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *bonn_venues*.

In [27]:
bonn_venues = getNearbyVenues(names=neighborhoods['ortsteil_bez'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Auerberg
Bonn-Castell
Bonn-Zentrum
Buschdorf
Dottendorf
Dransdorf
Endenich
Graurheindorf
Gronau
Ippendorf
Kessenich
Lessenich/Meßdorf
Nordstadt
Poppelsdorf
Röttgen
Südstadt
Tannenbusch
Ückesdorf
Venusberg
Weststadt
Alt-Godesberg
Friesdorf
Godesberg-Nord
Godesberg-Villenviertel
Heiderhof
Hochkreuz
Lannesdorf
Mehlem
Muffendorf
Pennenfeld
Plittersdorf
Rüngsdorf
Schweinheim
Beuel-Mitte
Beuel-Ost
Geislar
Hoholz
Holtorf
Holzlar
Küdinghoven
Limperich
Oberkassel
Pützchen/Bechlinghoven
Ramersdorf
Schwarzrheindorf / Vilich-Rheindorf
Vilich
Vilich- Müldorf
Brüser Berg
Duisdorf
Hardthöhe
Lengsdorf


#### Let's check the size of the resulting dataframe

In [64]:
print(bonn_venues.shape)
bonn_venues.head()

NameError: name 'bonn_venues' is not defined

Observation: The API call sometimes failed and sometimes succeeded, so the retrieved data was saved to a backup file.

In [29]:
#bonn_venues.to_csv('bonn_venues_backup.csv', index=False)  

The saved backup file with all the relevant data can be used without making an API call.

In [65]:
bonn_venues=pd.read_csv("bonn_venues_backup.csv")
bonn_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Auerberg,50.7569,7.06992,H Kopenhagener Straße,50.757418,7.071644,Tram Station
1,Auerberg,50.7569,7.06992,REWE,50.755959,7.07682,Supermarket
2,Auerberg,50.7569,7.06992,PENNY,50.7563,7.076302,Supermarket
3,Auerberg,50.7569,7.06992,H Auerberger Mitte,50.755102,7.076088,Tram Station
4,Bonn-Castell,50.74368,7.10035,Asia Viet Thai Bistro,50.741892,7.096418,Asian Restaurant
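
The backup approach can be wrapped into a small load-or-fetch helper, so that the expensive call only runs when no saved file exists. The sketch below uses the standard csv module and a temporary directory so that it runs on its own; the names load_or_fetch, fake_fetch and venues_cache.csv are hypothetical, and in the notebook pandas read_csv and to_csv play the same role.

```python
import csv
import os
import tempfile

def load_or_fetch(path, fetch, fieldnames):
    """Return rows from path if it exists; otherwise call fetch() and save the result."""
    if os.path.exists(path):
        with open(path, newline='', encoding='utf-8') as f:
            return list(csv.DictReader(f))
    rows = fetch()
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows

# Stand-in for the expensive API call; records how often it is invoked.
fetch_calls = []

def fake_fetch():
    fetch_calls.append(1)
    return [{'Neighborhood': 'Auerberg', 'Venue': 'REWE', 'Venue Category': 'Supermarket'}]

cache_path = os.path.join(tempfile.mkdtemp(), 'venues_cache.csv')
fields = ['Neighborhood', 'Venue', 'Venue Category']
first = load_or_fetch(cache_path, fake_fetch, fields)
second = load_or_fetch(cache_path, fake_fetch, fields)
print(len(fetch_calls), second[0]['Venue'])
```

The second call reads the file instead of fetching again. Values read back by csv.DictReader are strings, so numeric columns would need converting.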


#### Let's find out how many unique categories can be curated from all the returned venues

In [66]:
print('There are {} unique categories.'.format(len(bonn_venues['Venue Category'].unique())))
#print(bonn_venues['Venue Category'].unique())

There are 135 unique categories.


Some of the venue categories, e.g. 'Wine Bar' and 'Bar', belong to the same main category.

## "Choropleth analysis" - e.g. illustration of retrieved venues per neighborhood or cafés per neighborhood

Get an overview of the bonn_venues data retrieved from Foursquare.

In [67]:
data=bonn_venues.groupby('Neighborhood').count().reset_index()
data.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Alt-Godesberg,11,11,11,11,11,11
1,Auerberg,4,4,4,4,4,4
2,Beuel-Mitte,5,5,5,5,5,5
3,Beuel-Ost,5,5,5,5,5,5
4,Bonn-Castell,19,19,19,19,19,19


Some comments: Clearly, the Foursquare servers do not hold as many as 25 venues for every neighborhood. This might influence the subsequent analysis.

### First Visualization of the data with choropleth (Venues per Neighborhood)

In [68]:
map_bonn_corr_data_venues = folium.Map(location=[latitude-0.03, longitude+0.04], zoom_start=11.5)

# folium.Choropleth replaces the deprecated Map.choropleth method
folium.Choropleth(
    geo_data=districts_bonn,
    data=bonn_venues.groupby('Neighborhood').count().reset_index(),
    columns=['Neighborhood', 'Venue Category'],
    key_on='feature.properties.ortsteil_bez', #.ortsteil_bez'bezirk_bez
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Number of Venues'
).add_to(map_bonn_corr_data_venues)
map_bonn_corr_data_venues

### Second Visualization of the data with choropleth (e.g. 'Café' per Neighborhood)
Let's check for specific venue categories, e.g. 'Supermarket', 'Bar', 'Cafe', etc...

In [69]:
data_specific=bonn_venues.loc[bonn_venues['Venue Category']=='Café'].groupby('Neighborhood').count().reset_index()
data_specific.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bonn-Zentrum,2,2,2,2,2,2
1,Duisdorf,1,1,1,1,1,1
2,Endenich,1,1,1,1,1,1
3,Graurheindorf,1,1,1,1,1,1
4,Gronau,1,1,1,1,1,1


In [90]:
map_bonn_corr_data_specific_venue = folium.Map(location=[latitude-0.03, longitude+0.04], zoom_start=11.5)

# select both categories with isin(); ('Café' or 'Coffee Shop') evaluates to 'Café' only
folium.Choropleth(
    geo_data=districts_bonn,
    data=bonn_venues.loc[bonn_venues['Venue Category'].isin(['Café', 'Coffee Shop'])].groupby('Neighborhood').count().reset_index(),
    columns=['Neighborhood', 'Venue Category'],
    key_on='feature.properties.ortsteil_bez', #.ortsteil_bez'bezirk_bez
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Number of Cafes and Coffee Shops per Neighborhood'
).add_to(map_bonn_corr_data_specific_venue)
map_bonn_corr_data_specific_venue
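
A note on selecting several categories at once: in Python, or between two non-empty strings simply returns the first string, so an equality test against such an expression matches only one category. A membership test is needed to count both. The small sketch below shows the difference on invented sample rows; in pandas the corresponding tool is Series.isin.

```python
# `or` between two non-empty strings returns the first one.
selector = ('Café' or 'Coffee Shop')
print(selector)

# Invented sample of (neighborhood, category) pairs.
venues = [
    ('Bonn-Zentrum', 'Café'),
    ('Bonn-Zentrum', 'Coffee Shop'),
    ('Endenich', 'Café'),
    ('Endenich', 'Supermarket'),
]

wanted = {'Café', 'Coffee Shop'}
equality_matches = [v for v in venues if v[1] == selector]   # matches 'Café' only
membership_matches = [v for v in venues if v[1] in wanted]   # matches both categories
print(len(equality_matches), len(membership_matches))
```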

## Analyze Each Neighborhood (same approach as for New York and Toronto Datasets)

In [91]:
# one hot encoding
bonn_onehot = pd.get_dummies(bonn_venues[['Venue Category']], prefix="", prefix_sep="")

# add values to first column
bonn_onehot['ortsteil_bez'] = bonn_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bonn_onehot.columns[-1]] + list(bonn_onehot.columns[:-1])
bonn_onehot = bonn_onehot[fixed_columns]

bonn_onehot.head()

Unnamed: 0,ortsteil_bez,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Automotive Shop,Bagel Shop,Bakery,Bank,Bar,...,Theater,Tibetan Restaurant,Toy / Game Store,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar
0,Auerberg,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
1,Auerberg,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Auerberg,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Auerberg,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
4,Bonn-Castell,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [89]:
#list(bonn_onehot.columns.values) #Useful to identify Categories such as Coffee shops etc.

In [87]:
bonn_onehot.shape

(459, 136)

#### Next, let's group rows by neighborhood by taking the mean of the frequency of occurrence of each category

In [92]:
bonn_grouped = bonn_onehot.groupby('ortsteil_bez').mean().reset_index()
print(bonn_grouped.shape)
bonn_grouped.head()

(51, 136)


Unnamed: 0,ortsteil_bez,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Automotive Shop,Bagel Shop,Bakery,Bank,Bar,...,Theater,Tibetan Restaurant,Toy / Game Store,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar
0,Alt-Godesberg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0
1,Auerberg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0
2,Beuel-Mitte,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Beuel-Ost,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bonn-Castell,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.105263,...,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0
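
The mean of a one-hot column is simply the share of a neighborhood's venues that fall into that category. To make this explicit, the sketch below computes the same shares in plain Python for the four Auerberg venues listed earlier; the helper name category_frequencies is my own.

```python
from collections import Counter, defaultdict

def category_frequencies(venue_rows):
    """Share of each venue category within each neighborhood."""
    counts = defaultdict(Counter)
    for neighborhood, category in venue_rows:
        counts[neighborhood][category] += 1
    return {hood: {cat: n / sum(c.values()) for cat, n in c.items()}
            for hood, c in counts.items()}

# The four Auerberg venues from the Foursquare result shown earlier.
auerberg_rows = [
    ('Auerberg', 'Tram Station'),
    ('Auerberg', 'Supermarket'),
    ('Auerberg', 'Supermarket'),
    ('Auerberg', 'Tram Station'),
]
freq = category_frequencies(auerberg_rows)
print(freq)
```

Two of the four venues are tram stations, which gives the value 0.5 in the Tram Station column of the Auerberg row above.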


#### Let's print each neighborhood along with the top 5 most common venues

In [300]:
#num_top_venues = 5

#for hood in bonn_grouped['ortsteil_bez']:
#    print("----"+hood+"----")
#    temp = bonn_grouped[bonn_grouped['ortsteil_bez'] == hood].T.reset_index()
#    temp.columns = ['venue','freq']
#    temp = temp.iloc[1:]
#    temp['freq'] = temp['freq'].astype(float)
#    temp = temp.round({'freq': 2})
#    temp= temp[temp['freq'] != 0]   #modification necessary to avoid filling up with random data
#    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
#    print('\n') 
    

#### Let's put that into a *pandas* dataframe
First, let's write a function to sort the venues in descending order.

In [93]:
def return_most_common_venues(row, index, num_top_venues):   
    # sort the category frequencies of one neighborhood in descending order
    x = pd.DataFrame({'mean': row.iloc[index, 1:]}).sort_values(by='mean', ascending=False).reset_index()

    top = x.head(num_top_venues)
    # For some neighborhoods only one or two venue categories could be retrieved.
    # Categories with zero frequency are replaced by None instead of being listed in arbitrary order.
    # (A new list is built to avoid chained assignment such as x['index'][ind] = None.)
    names = [name if float(mean) != 0 else None for name, mean in zip(top['index'], top['mean'])]

    return np.array(names, dtype=object)

In [82]:
#x=bonn_grouped.iloc[1, 1:].sort_values(ascending=False) #x #x.index.values[0:10] #Code used for testing purposes

In [94]:
return_most_common_venues(bonn_grouped,1, 10) #Code used for testing purposes

array(['Supermarket', 'Tram Station', None, None, None, None, None, None,
       None, None], dtype=object)
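
The same ranking with None padding can be illustrated without pandas. The sketch below is a simplified stand-in for the function above, not a replacement: the name top_categories and the 'Bakery' entry with zero frequency are invented, and ties are broken alphabetically, which is an assumption of this sketch.

```python
def top_categories(frequencies, num_top):
    """Return the num_top most frequent categories, padded with None."""
    ranked = sorted(((cat, f) for cat, f in frequencies.items() if f > 0),
                    key=lambda pair: (-pair[1], pair[0]))
    names = [cat for cat, _ in ranked[:num_top]]
    return names + [None] * (num_top - len(names))

auerberg = {'Supermarket': 0.5, 'Tram Station': 0.5, 'Bakery': 0.0}
top5 = top_categories(auerberg, 5)
print(top5)
```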

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [95]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['ortsteil_bez']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['ortsteil_bez'] = bonn_grouped['ortsteil_bez']

for ind in np.arange(bonn_grouped.shape[0]): #all rows
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bonn_grouped,ind, num_top_venues) #.iloc[] is primarily integer position based (from 0 to length-1 of the axis)

neighborhoods_venues_sorted.head()

Unnamed: 0,ortsteil_bez,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alt-Godesberg,Supermarket,Pet Store,Restaurant,Spanish Restaurant,Candy Store,Food & Drink Shop,Beer Store,Hardware Store,Gas Station,Turkish Restaurant
1,Auerberg,Supermarket,Tram Station,,,,,,,,
2,Beuel-Mitte,Theater,Pizza Place,Clothing Store,Sandwich Place,,,,,,
3,Beuel-Ost,Theater,Pizza Place,Clothing Store,Sandwich Place,,,,,,
4,Bonn-Castell,Pub,Bar,Beer Garden,Taverna,Grocery Store,Burger Joint,Concert Hall,Hookah Bar,Modern European Restaurant,Indian Restaurant


Observation: For 'Auerberg', only 4 venues were retrieved from the database, and they belong to 2 venue categories. The approach used here fills the remaining "empty" places with None. Because these categories have a value of zero in the "mean"-value dataframe, they should not influence the subsequent clustering.
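The padding behaviour described above can be reproduced on a toy row. This is a minimal sketch, not the notebook's `return_most_common_venues`: the helper name `top_venues_or_none` and the frequencies are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

def top_venues_or_none(row, num_top_venues):
    """Return the most frequent categories in `row`, padded with None.

    `row` is a Series of mean frequencies indexed by venue category.
    Hypothetical helper for illustration only.
    """
    # keep only categories that actually occur, most frequent first
    nonzero = row[row.astype(float) > 0].sort_values(ascending=False)
    names = list(nonzero.index[:num_top_venues])
    # fill the remaining places with None
    names += [None] * (num_top_venues - len(names))
    return np.array(names, dtype=object)

# toy frequencies for one neighborhood (made-up numbers)
toy_row = pd.Series({'Supermarket': 0.75, 'Tram Station': 0.25, 'Bakery': 0.0, 'Pub': 0.0})
top = top_venues_or_none(toy_row, 5)
print(top)
```

Filtering before sorting avoids writing None into the dataframe itself, so the frequency table used for clustering stays untouched.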

## Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 4 clusters.

In [96]:
# set number of clusters
kclusters = 4

bonn_grouped_clustering = bonn_grouped.drop('ortsteil_bez', axis=1)  # axis must be passed by keyword in recent pandas versions

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bonn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 2, 2, 2, 2, 0, 2, 0, 0], dtype=int32)
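Counting how many neighborhoods fall into each cluster is a quick sanity check before interpreting the result. The sketch below uses only the first ten labels printed above; on the full data one would pass `kmeans.labels_` instead, so the counts here are not the final cluster sizes.

```python
import numpy as np

# first ten labels as printed above; on the full data, use kmeans.labels_ instead
first_labels = np.array([2, 0, 2, 2, 2, 2, 0, 2, 0, 0])
kclusters = 4

# minlength makes sure clusters without members still get a count of zero
cluster_sizes = np.bincount(first_labels, minlength=kclusters)
for cluster, size in enumerate(cluster_sizes):
    print('Cluster {}: {} neighborhoods'.format(cluster, size))
```

Very small counts on the full label array would indicate clusters that are better read as outliers.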

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [97]:
# add clustering labels 

# uncomment the next line if the 'Cluster Labels' column has already been inserted (e.g. when re-running this cell)
#neighborhoods_venues_sorted=neighborhoods_venues_sorted.drop(['Cluster Labels'], axis=1)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,ortsteil_bez,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,Alt-Godesberg,Supermarket,Pet Store,Restaurant,Spanish Restaurant,Candy Store,Food & Drink Shop,Beer Store,Hardware Store,Gas Station,Turkish Restaurant
1,0,Auerberg,Supermarket,Tram Station,,,,,,,,
2,2,Beuel-Mitte,Theater,Pizza Place,Clothing Store,Sandwich Place,,,,,,
3,2,Beuel-Ost,Theater,Pizza Place,Clothing Store,Sandwich Place,,,,,,
4,2,Bonn-Castell,Pub,Bar,Beer Garden,Taverna,Grocery Store,Burger Joint,Concert Hall,Hookah Bar,Modern European Restaurant,Indian Restaurant


In [98]:
bonn_merged = neighborhoods

# merge neighborhoods with neighborhoods_venues_sorted to add latitude/longitude for each neighborhood
bonn_merged = bonn_merged.join(neighborhoods_venues_sorted.set_index('ortsteil_bez'), on='ortsteil_bez')

bonn_merged.head() # check the last columns!

Unnamed: 0,ortsteil_bez,bezirk_bez,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Auerberg,Bonn,50.7569,7.06992,0,Supermarket,Tram Station,,,,,,,,
1,Bonn-Castell,Bonn,50.74368,7.10035,2,Pub,Bar,Beer Garden,Taverna,Grocery Store,Burger Joint,Concert Hall,Hookah Bar,Modern European Restaurant,Indian Restaurant
2,Bonn-Zentrum,Bonn,50.73599,7.10488,2,Café,Hotel,Korean Restaurant,Doner Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Gym / Fitness Center,Ice Cream Shop,Market
3,Buschdorf,Bonn,50.75914,7.05421,2,Playground,Supermarket,Automotive Shop,Soccer Field,Gym,,,,,
4,Dottendorf,Bonn,50.70425,7.11436,0,Supermarket,Bakery,Tram Station,Italian Restaurant,,,,,,
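Because `join` keeps every row of the left dataframe, a neighborhood without any retrieved venues would end up with a missing cluster label. Whether this occurs in the Bonn data is not shown here; the following sketch uses made-up stand-ins for `neighborhoods` and `neighborhoods_venues_sorted` to show how such rows could be detected and dropped.

```python
import pandas as pd

# toy stand-ins for `neighborhoods` and `neighborhoods_venues_sorted` (made-up rows)
toy_neighborhoods = pd.DataFrame({
    'ortsteil_bez': ['A', 'B', 'C'],
    'Latitude': [50.70, 50.71, 50.72],
    'Longitude': [7.10, 7.11, 7.12],
})
toy_sorted = pd.DataFrame({
    'Cluster Labels': [0, 2],
    'ortsteil_bez': ['A', 'C'],
})

# same join pattern as above: left rows are kept, unmatched ones get NaN
toy_merged = toy_neighborhoods.join(toy_sorted.set_index('ortsteil_bez'), on='ortsteil_bez')
missing = toy_merged['Cluster Labels'].isna()
print(toy_merged.loc[missing, 'ortsteil_bez'].tolist())

# drop unmatched rows and restore the integer dtype needed for list indexing
toy_clean = toy_merged.dropna(subset=['Cluster Labels']).copy()
toy_clean['Cluster Labels'] = toy_clean['Cluster Labels'].astype(int)
```

Integer labels matter later on, because the map code uses the label to index a list of colors.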


In [111]:
# create map
map_clusters = folium.Map(location=[latitude-0.03, longitude+0.04], zoom_start=11.5)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bonn_merged['Latitude'], bonn_merged['Longitude'], bonn_merged['ortsteil_bez'], bonn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

"Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster."

#### Cluster 0 "Supermarket is dominant venue"

In [100]:
bonn_merged.loc[bonn_merged['Cluster Labels'] == 0, bonn_merged.columns[[0] + list(range(5, bonn_merged.shape[1]))]].head()

Unnamed: 0,ortsteil_bez,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Auerberg,Supermarket,Tram Station,,,,,,,,
4,Dottendorf,Supermarket,Bakery,Tram Station,Italian Restaurant,,,,,,
5,Dransdorf,Tram Station,Supermarket,Drugstore,Rental Car Location,Chinese Restaurant,Bus Stop,Casino,Bakery,,
14,Röttgen,Supermarket,Bakery,Hotel,Pizza Place,,,,,,
18,Venusberg,Supermarket,Pharmacy,Bus Stop,Bakery,Gastropub,,,,,


#### Cluster 1 "Just one entry, so it is an exception rather than a real cluster"

In [101]:
bonn_merged.loc[bonn_merged['Cluster Labels'] == 1, bonn_merged.columns[[0] + list(range(5, bonn_merged.shape[1]))]].head()

Unnamed: 0,ortsteil_bez,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
38,Holzlar,Motorcycle Shop,,,,,,,,,


#### Cluster 2 "Leisure venues, not dominated by supermarkets or tram stations"

In [102]:
bonn_merged.loc[bonn_merged['Cluster Labels'] == 2, bonn_merged.columns[[0] + list(range(5, bonn_merged.shape[1]))]].head()

Unnamed: 0,ortsteil_bez,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Bonn-Castell,Pub,Bar,Beer Garden,Taverna,Grocery Store,Burger Joint,Concert Hall,Hookah Bar,Modern European Restaurant,Indian Restaurant
2,Bonn-Zentrum,Café,Hotel,Korean Restaurant,Doner Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Gym / Fitness Center,Ice Cream Shop,Market
3,Buschdorf,Playground,Supermarket,Automotive Shop,Soccer Field,Gym,,,,,
6,Endenich,Bus Stop,Bakery,Supermarket,River,German Restaurant,Organic Grocery,Café,Rock Club,Discount Store,Movie Theater
7,Graurheindorf,Café,Harbor / Marina,Soccer Field,Electronics Store,,,,,,


#### Cluster 3 "Just one entry, so it is an exception rather than a real cluster"

In [103]:
bonn_merged.loc[bonn_merged['Cluster Labels'] == 3, bonn_merged.columns[[0] + list(range(5, bonn_merged.shape[1]))]].head()

Unnamed: 0,ortsteil_bez,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
49,Hardthöhe,IT Services,,,,,,,,,


### Conclusion from cluster analysis

Two dominant clusters can be found. Cluster 0 is dominated by supermarkets and Cluster 2 by leisure venues. Clusters 1 and 3 each contain a single neighborhood and are better regarded as exceptions.
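The cluster names were assigned by inspecting the tables. A complementary check is to average the category frequencies within each cluster and list the highest ones. The sketch below does this on made-up numbers; `toy_freq` and `toy_labels` are hypothetical stand-ins for `bonn_grouped_clustering` and `kmeans.labels_`.

```python
import numpy as np
import pandas as pd

# toy frequency table: 4 neighborhoods x 4 venue categories (made-up values)
toy_freq = pd.DataFrame(
    {
        'Supermarket': [0.6, 0.5, 0.1, 0.0],
        'Tram Station': [0.4, 0.3, 0.0, 0.1],
        'Pub': [0.0, 0.2, 0.5, 0.6],
        'Café': [0.0, 0.0, 0.4, 0.3],
    },
    index=['N1', 'N2', 'N3', 'N4'],
)
toy_labels = np.array([0, 0, 2, 2])  # one cluster label per neighborhood

# mean frequency of every category within each cluster
cluster_profile = toy_freq.groupby(toy_labels).mean()

# the two categories with the highest mean frequency per cluster
top_per_cluster = {
    int(cluster): row.sort_values(ascending=False).index[:2].tolist()
    for cluster, row in cluster_profile.iterrows()
}
print(top_per_cluster)
```

Categories that rank high in one cluster and low in the others are the ones that distinguish it.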


The following choropleth map overlays the number of supermarkets and tram stations per neighborhood on the cluster map to illustrate the previous conclusion.

In [112]:
# folium.Choropleth is used in place of the deprecated Map.choropleth method
folium.Choropleth(
    geo_data=districts_bonn,
    # isin() selects both categories; ('Supermarket' or 'Tram Station') would evaluate to 'Supermarket' only
    data=bonn_venues.loc[bonn_venues['Venue Category'].isin(['Supermarket', 'Tram Station'])].groupby('Neighborhood').count().reset_index(),
    columns=['Neighborhood', 'Venue Category'],
    key_on='feature.properties.ortsteil_bez',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Supermarket or Tram Station'
).add_to(map_clusters)

map_clusters
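A note on the data passed to the choropleth: when filtering on more than one venue category, a Python expression such as `('Supermarket' or 'Tram Station')` evaluates to `'Supermarket'` alone, whereas `Series.isin` tests membership in the whole list. The sketch below shows the difference on a made-up stand-in for `bonn_venues`.

```python
import pandas as pd

# toy stand-in for `bonn_venues` (made-up rows)
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'B', 'C'],
    'Venue Category': ['Supermarket', 'Tram Station', 'Supermarket', 'Pub', 'Tram Station'],
})

# `or` returns its first truthy operand, so only 'Supermarket' is compared
only_first = toy_venues[toy_venues['Venue Category'] == ('Supermarket' or 'Tram Station')]

# isin() keeps rows matching either category
selected = toy_venues[toy_venues['Venue Category'].isin(['Supermarket', 'Tram Station'])]

# venues per neighborhood, in the two-column shape a choropleth expects
counts = selected.groupby('Neighborhood')['Venue Category'].count().reset_index()
print(counts)
```

In this toy data, neighborhood C has only a tram station and would be missing from the counts under the `or` comparison.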