# Data section

### List of Berlin neighborhoods

First, the neighborhoods that belong to Berlin have to be identified. <br>
To get this list, the following wiki page is scraped: https://de.wikipedia.org/wiki/Verwaltungsgliederung_Berlins <br>
The middle of that page contains the following table (in German):
<img src="Berlin neighborhoods.png">

In [1]:
#import all relevant packages

import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate
import time
import numpy as np

#!conda install -c conda-forge folium=0.5.0 --yes
import folium
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [2]:
#reading the wiki table of Berlin neighborhoods
res = requests.get("https://de.wikipedia.org/wiki/Verwaltungsgliederung_Berlins")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[2] 
pd.read_html(str(table))

[     Nr.                   Ortsteil                      Bezirk  Fläche(km²)  \
 0    101                      Mitte                       Mitte         1070   
 1    102                     Moabit                       Mitte          772   
 2    103               Hansaviertel                       Mitte           53   
 3    104                 Tiergarten                       Mitte          517   
 4    105                    Wedding                       Mitte          923   
 5    106              Gesundbrunnen                       Mitte          613   
 6    201             Friedrichshain    Friedrichshain-Kreuzberg          978   
 7    202                  Kreuzberg    Friedrichshain-Kreuzberg         1040   
 8    301            Prenzlauer Berg                      Pankow         1100   
 9    302                  Weißensee                      Pankow          793   
 10   303                Blankenburg                      Pankow          603   
 11   304                Hei

In [70]:
# define the columns of a new dataframe
column_names = ['NH-number','Neighborhood','Borough', 'Area (km²)','Inhabitants','Inhabitants per km²'] 

#get rows of data table
table_rows = table.find_all('tr')

#iterate through the table rows and scrape the data
#(separate names for rows and cells avoid overwriting the response object 'res' and the loop variable 'tr')
rows = []
for tr in table_rows:
    cells = tr.find_all('td')
    row = [cell.text.strip() for cell in cells if cell.text.strip()]
    if row:
        rows.append(row)
   
#create data frame including column names and rows
neighborhoods_Berlin = pd.DataFrame(rows,columns=column_names)    
neighborhoods_Berlin.head(5)

Unnamed: 0,NH-number,Neighborhood,Borough,Area (km²),Inhabitants,Inhabitants per km²
0,101,Mitte,Mitte,1070,101.932,9526.0
1,102,Moabit,Mitte,772,79.512,10.299
2,103,Hansaviertel,Mitte,53,5.894,11.121
3,104,Tiergarten,Mitte,517,14.753,2854.0
4,105,Wedding,Mitte,923,86.688,9392.0


In [71]:
neighborhoods_Berlin.shape

(96, 6)
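
The values in `Inhabitants` and `Inhabitants per km²` appear to follow German number notation, where `.` groups thousands and `,` marks the decimal point, so `101.932` stands for 101932 inhabitants rather than roughly 102. The following is a minimal sketch of a converter under that assumption. `parse_german_number` is a hypothetical helper that is not part of the original notebook, and the decimal-comma value `'10,70'` is an assumed example of how a raw area cell may look.

```python
def parse_german_number(text):
    """Convert a German-formatted number string to a float.

    '.' is treated as a thousands separator and ',' as the decimal mark.
    """
    cleaned = str(text).strip().replace('.', '').replace(',', '.')
    return float(cleaned)


# Inhabitants values as they appear in the table above
print(parse_german_number('101.932'))
print(parse_german_number('79.512'))
# Assumed raw form of an area cell with a decimal comma
print(parse_german_number('10,70'))
```

Applied to a column, this could look like `neighborhoods_Berlin['Inhabitants'].map(parse_german_number)`, provided the column still holds the raw strings.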

### Adding geo coordinates to the neighborhoods
The coordinates of the individual neighborhoods are needed to retrieve the Foursquare data. <br>
Therefore the geolocator is called for each neighborhood, and the returned latitude and longitude are saved.

In [4]:
#Request the geo coordinates of each neighborhood from the geolocator and store them in two lists
geolocator = Nominatim(user_agent="berlin_explorer")  # create the geocoder once instead of once per request
latitude_list = []
longitude_list = []

for data in neighborhoods_Berlin['Neighborhood']:
    address = 'Berlin, ' + data + ' , Germany'
    location = geolocator.geocode(address)
    if location is None:
        # keep both lists aligned with the dataframe rows when nothing is found
        latitude_list.append(np.nan)
        longitude_list.append(np.nan)
        print('No coordinates found for {}.'.format(address))
    else:
        latitude_list.append(location.latitude)
        longitude_list.append(location.longitude)
        print('The geographical coordinates of {} are {}, {}.'.format(address, location.latitude, location.longitude))
    time.sleep(1)  # Nominatim's usage policy allows at most one request per second

The geographical coordinates of Berlin, Mitte , Germany are 52.5199818, 13.4041591.
The geographical coordinates of Berlin, Moabit , Germany are 52.5249451, 13.3696614.
The geographical coordinates of Berlin, Hansaviertel , Germany are 52.519985, 13.3480704.
The geographical coordinates of Berlin, Tiergarten , Germany are 52.5202262, 13.3704874.
The geographical coordinates of Berlin, Wedding , Germany are 52.5427866, 13.3669996.
The geographical coordinates of Berlin, Gesundbrunnen , Germany are 52.5491748, 13.3900758.
The geographical coordinates of Berlin, Friedrichshain , Germany are 52.5107448, 13.4351709.
The geographical coordinates of Berlin, Kreuzberg , Germany are 52.5022467, 13.395148581915826.
The geographical coordinates of Berlin, Prenzlauer Berg , Germany are 52.549243, 13.4155955.
The geographical coordinates of Berlin, Weißensee , Germany are 52.558329, 13.439551280343037.
The geographical coordinates of Berlin, Blankenburg , Germany are 52.5914655, 13.4434822.
The geographical co
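
The pause between geocoder calls can also be moved out of the loop body into a small wrapper. The sketch below is one possible way to do that. `make_throttled`, `collect_coordinates` and `fake_geocode` are hypothetical names introduced for illustration, and `fake_geocode` only stands in for `geolocator.geocode` so that the example runs without network access.

```python
import time
from types import SimpleNamespace


def make_throttled(func, min_delay_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so that consecutive calls are at least min_delay_seconds apart."""
    last_call = [None]  # mutable cell holding the time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        return func(*args, **kwargs)

    return wrapper


def collect_coordinates(names, geocode):
    """Return (latitudes, longitudes); entries are None where geocode found nothing."""
    latitudes, longitudes = [], []
    for name in names:
        location = geocode('Berlin, ' + name + ' , Germany')
        latitudes.append(location.latitude if location is not None else None)
        longitudes.append(location.longitude if location is not None else None)
    return latitudes, longitudes


# Stand-in for geolocator.geocode (assumption: same call shape, no network access).
# It "finds" only Mitte, using the coordinates printed above.
def fake_geocode(address):
    if 'Mitte' in address:
        return SimpleNamespace(latitude=52.5199818, longitude=13.4041591)
    return None


throttled = make_throttled(fake_geocode, min_delay_seconds=0.01)
latitudes, longitudes = collect_coordinates(['Mitte', 'Atlantis'], throttled)
print(latitudes, longitudes)
```

With the real geocoder, `fake_geocode` would be replaced by `geolocator.geocode` and the delay set to at least one second. geopy also ships a `RateLimiter` class in `geopy.extra.rate_limiter` that serves a similar purpose.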

In [72]:
#adding the coordinates to the dataframe by creating two new columns 'Latitude' and 'Longitude'
neighborhoods_Berlin['Latitude'] = latitude_list
neighborhoods_Berlin['Longitude'] = longitude_list
neighborhoods_Berlin.head()

Unnamed: 0,NH-number,Neighborhood,Borough,Area (km²),Inhabitants,Inhabitants per km²,Latitude,Longitude
0,101,Mitte,Mitte,1070,101.932,9526.0,52.519982,13.404159
1,102,Moabit,Mitte,772,79.512,10.299,52.524945,13.369661
2,103,Hansaviertel,Mitte,53,5.894,11.121,52.519985,13.34807
3,104,Tiergarten,Mitte,517,14.753,2854.0,52.520226,13.370487
4,105,Wedding,Mitte,923,86.688,9392.0,52.542787,13.367


In [73]:
neighborhoods_Berlin.shape

(96, 8)

In [74]:
#Saving the dataframe to a csv file to avoid scraping the data again
path = 'Berlin_neighborhood.csv'
neighborhoods_Berlin.to_csv(path, index = False, header = True)

### Checking on a map whether the coordinates were retrieved correctly

In [9]:
# convert Berlin address into latitude and longitude values
address = 'Berlin, Germany'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address,latitude, longitude))

The geographical coordinates of Berlin, Germany are 52.5170365, 13.3888599.


In [69]:
#exploring Berlin by creating a map of Berlin that includes the neighborhoods
map_berlin = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods_Berlin['Latitude'], neighborhoods_Berlin['Longitude'], neighborhoods_Berlin['Borough'], neighborhoods_Berlin['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_berlin)

# display the map as the cell output
map_berlin


### Retrieving Foursquare data for all neighborhoods
This section lists all venues within a given radius (1000 m by default) of each neighborhood's center. <br>

In [12]:
#Setting Foursquare credentials
CLIENT_ID = 'SWTQ1NMF0WWY2N4WNNUYAFNXQ2FBND3GTG2DNP4G5HT4U3PI' # your Foursquare ID
CLIENT_SECRET = 'UNNZPWCXSF1LQWMNNMEZBWMIVRP5TG0WC552ANYGB32FOSMN' # your Foursquare Secret
VERSION = '20200221'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: SWTQ1NMF0WWY2N4WNNUYAFNXQ2FBND3GTG2DNP4G5HT4U3PI
CLIENT_SECRET:UNNZPWCXSF1LQWMNNMEZBWMIVRP5TG0WC552ANYGB32FOSMN
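
Hard-coding and printing the credentials exposes them to anyone who can read the notebook. One common alternative is to read them from environment variables and to mask the secret when printing it. The sketch below assumes this approach. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `read_credentials` and `mask` are hypothetical and not part of the original notebook.

```python
import os


def read_credentials(environ=os.environ):
    """Read the Foursquare credentials from environment variables.

    The variable names are assumptions made for this sketch.
    """
    try:
        return environ['FOURSQUARE_CLIENT_ID'], environ['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing.args[0]))


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Demonstration with placeholder values instead of real credentials
demo_environ = {'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123', 'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456'}
client_id, client_secret = read_credentials(demo_environ)
print('CLIENT_ID: ' + mask(client_id))
print('CLIENT_SECRET: ' + mask(client_secret))
```

In the notebook, calling `read_credentials()` without arguments would read from the real environment, so the values never appear in the saved cells or outputs.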


In [35]:
#Function to retrieve the venues of each neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=1000, LIMIT=1000):
    venues_list=[]
    # enumerate from 1 so the printed progress counter matches the neighborhood number
    for i, (name, lat, lng) in enumerate(zip(names, latitudes, longitudes), start=1):
        print(str(i) + ": " + name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
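
The list comprehension in `getNearbyVenues` assumes that every response contains a group and that every venue has at least one category. A more defensive variant of the extraction step could look like the sketch below. `venue_rows` is a hypothetical helper, the payload layout is taken from the function above, and the second venue in the sample payload is invented to show the missing-category case.

```python
def venue_rows(name, lat, lng, payload):
    """Turn one explore response into a list of
    (neighborhood, lat, lng, venue, venue lat, venue lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # fall back to None instead of failing when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Sample payload in the layout used above; the second venue is made up
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Buchhandlung Walther König',
               'location': {'lat': 52.521301, 'lng': 13.400758},
               'categories': [{'name': 'Bookstore'}]}},
    {'venue': {'name': 'Venue without category',
               'location': {'lat': 52.52, 'lng': 13.40},
               'categories': []}},
]}]}}

rows = venue_rows('Mitte', 52.519982, 13.404159, sample_payload)
for row in rows:
    print(row)
```

An empty or unexpected response then yields an empty list instead of raising an `IndexError` or `KeyError` halfway through the loop over all neighborhoods.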

In [75]:
#Calling the venues function to finally store all venues into a new dataframe
venues_Berlin = getNearbyVenues(names=neighborhoods_Berlin['Neighborhood'],
                                   latitudes=neighborhoods_Berlin['Latitude'],
                                   longitudes=neighborhoods_Berlin['Longitude']
                                  )

1: Mitte
2: Moabit
3: Hansaviertel
4: Tiergarten
5: Wedding
6: Gesundbrunnen
7: Friedrichshain
8: Kreuzberg
9: Prenzlauer Berg
10: Weißensee
11: Blankenburg
12: Heinersdorf
13: Karow
14: Stadtrandsiedlung Malchow
15: Pankow
16: Blankenfelde
17: Buch
18: Französisch Buchholz
19: Niederschönhausen
20: Rosenthal
21: Wilhelmsruh
22: Charlottenburg
23: Wilmersdorf
24: Schmargendorf
25: Grunewald
26: Westend
27: Charlottenburg-Nord
28: Halensee
29: Spandau
30: Haselhorst
31: Siemensstadt
32: Staaken
33: Gatow
34: Kladow
35: Hakenfelde
36: Falkenhagener Feld
37: Wilhelmstadt
38: Steglitz
39: Lichterfelde
40: Lankwitz
41: Zehlendorf
42: Dahlem
43: Nikolassee
44: Wannsee
45: Schöneberg
46: Friedenau
47: Tempelhof
48: Mariendorf
49: Marienfelde
50: Lichtenrade
51: Neukölln
52: Britz
53: Buckow
54: Rudow
55: Gropiusstadt
56: Alt-Treptow
57: Plänterwald
58: Baumschulenweg
59: Johannisthal
60: Niederschöneweide
61: Altglienicke
62: Adlershof
63: Bohnsdorf
64: Oberschöneweide
65: Köpenick
66: Friedr

In [76]:
#The size of the resulting dataframe
print(venues_Berlin.shape)
venues_Berlin.head()

(2989, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mitte,52.519982,13.404159,Buchhandlung Walther König,52.521301,13.400758,Bookstore
1,Mitte,52.519982,13.404159,Kuppelumgang Berliner Dom,52.518966,13.400981,Scenic Lookout
2,Mitte,52.519982,13.404159,Radisson Blu,52.519561,13.402857,Hotel
3,Mitte,52.519982,13.404159,Waffel oder Becher,52.521007,13.403815,Ice Cream Shop
4,Mitte,52.519982,13.404159,Fat Tire Bike Tours,52.521233,13.40911,Bike Rental / Bike Share
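
As a sanity check, the distance between a venue and its neighborhood center can be computed with the haversine formula and compared with the search radius. The sketch below uses the first row of the table above. `haversine_m` is a hypothetical helper, and the Earth radius of 6371 km is the usual mean value, so the result is an approximation.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Center of Mitte and the venue 'Buchhandlung Walther König' from the table above
distance = haversine_m(52.519982, 13.404159, 52.521301, 13.400758)
print('Distance to neighborhood center: {:.0f} m'.format(distance))
print('Within the 1000 m radius:', distance <= 1000)
```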


In [77]:
#Number of returned venues per neighborhood
venues_Berlin.groupby('Neighborhood')['Venue'].count()

Neighborhood
Adlershof                     35
Alt-Hohenschönhausen          47
Alt-Treptow                   85
Altglienicke                  11
Baumschulenweg                30
Biesdorf                      18
Blankenburg                    9
Blankenfelde                   4
Bohnsdorf                     17
Borsigwalde                   26
Britz                         11
Buch                           9
Buckow                        11
Charlottenburg               100
Charlottenburg-Nord           35
Dahlem                        40
Falkenberg                    17
Falkenhagener Feld             4
Fennpfuhl                     31
Französisch Buchholz          17
Friedenau                     79
Friedrichsfelde               27
Friedrichshagen                7
Friedrichshain               100
Frohnau                        7
Gatow                          5
Gesundbrunnen                 93
Gropiusstadt                  34
Grunewald                     33
Grünau                        

In [78]:
#Number of unique venue categories
print('There are {} unique categories.'.format(venues_Berlin['Venue Category'].nunique()))

There are 319 unique categories.


In [79]:
#Adding the number of inhabitants per km² as a required parameter for the upcoming analysis
#The value is looked up by neighborhood name; a plain column assignment would align on the row index
#and give each venue row the value of an unrelated neighborhood
density = neighborhoods_Berlin.set_index('Neighborhood')['Inhabitants per km²']
venues_Berlin['Inhabitants per km²'] = venues_Berlin['Neighborhood'].map(density)
venues_Berlin.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Inhabitants per km²
0,Mitte,52.519982,13.404159,Buchhandlung Walther König,52.521301,13.400758,Bookstore,9526.0
1,Mitte,52.519982,13.404159,Kuppelumgang Berliner Dom,52.518966,13.400981,Scenic Lookout,9526.0
2,Mitte,52.519982,13.404159,Radisson Blu,52.519561,13.402857,Hotel,9526.0
3,Mitte,52.519982,13.404159,Waffel oder Becher,52.521007,13.403815,Ice Cream Shop,9526.0
4,Mitte,52.519982,13.404159,Fat Tire Bike Tours,52.521233,13.40911,Bike Rental / Bike Share,9526.0
