# Clustering Neighborhoods in Hamburg

## 1. Introduction

The Hamburg U-Bahn is a rapid transit system serving the cities of Hamburg, Norderstedt and Ahrensburg in Germany. Although technically an underground, most of the system's track length is above ground. The network is interconnected with the city's S-Bahn system, which also has underground sections. The metro makes it easier and faster to reach other parts of the city, so many people look for flats located near a metro station. At the same time, they want to live close to shops, restaurants, coffee shops or parks.
   
A company specializing in long-term and short-term apartment rentals wants to gain deeper knowledge of the city in order to build a housing recommendation system based on clients' preferences. There are many recommendation systems on the market, but while researching the German market we found none designed to meet all needs in one place. People looking for housing need to search many websites to gather the information they need. Different people have their own preferences, such as favourite stores or shopping locations. In this project I analyze different neighborhoods of Hamburg, which will help to build the final recommendation system in the future.

## 2. Data 

### 2.1. Data needed for analysis

The project requires the following data:

* Geolocation data for the metro stations in Hamburg: the latitude and longitude of every station. The station names will be scraped from the Wikipedia page https://en.wikipedia.org/wiki/List_of_Hamburg_U-Bahn_stations, and the coordinates will be looked up from those names using the geopy library.
* Venue data from the Foursquare API. The explore endpoint will be used to get the most common venue categories near each metro station, such as restaurants, art galleries and shops.

With the collected data we will be able to compare neighborhoods and find the differences between them, which will help the rental company personalize its offer.

After the data is collected, some visualization and statistical analysis will be carried out. The station locations will be shown on a map prepared with the folium library. As the next step, this data will be used to group the metro neighborhoods into clusters.
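
To make the planned comparison more concrete, the sketch below shows one way the venue categories around each station could be summarised as relative frequencies, which could later serve as input features for clustering. The station names and categories in `sample_venues` are made-up illustration data, and `category_profile` is a hypothetical helper, not part of the project code.

```python
from collections import Counter, defaultdict

# Hypothetical (station, venue category) pairs, for illustration only
sample_venues = [
    ("Station A", "Bakery"),
    ("Station A", "Bakery"),
    ("Station A", "Park"),
    ("Station B", "Hotel"),
    ("Station B", "Hotel"),
    ("Station B", "Café"),
]

def category_profile(venues):
    """Return {station: {category: share of that station's venues}}."""
    counts = defaultdict(Counter)
    for station, category in venues:
        counts[station][category] += 1
    profiles = {}
    for station, counter in counts.items():
        total = sum(counter.values())
        profiles[station] = {cat: n / total for cat, n in counter.items()}
    return profiles

profiles = category_profile(sample_venues)
# The most frequent category per station
most_common = {s: max(p, key=p.get) for s, p in profiles.items()}
print(most_common)
```

Using shares rather than raw counts keeps stations with many venues comparable to stations with only a few.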


### 2.2. Data preparation

Importing libraries:

In [26]:
# Data manipulation
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import numpy as np
import itertools

# Data gathering
from bs4 import BeautifulSoup
import lxml  # parser backend used by BeautifulSoup
import json
import requests

# Geospatial tools
import folium
from geopy.geocoders import Nominatim

# Additional libraries
from progressbar import ProgressBar
from time import sleep


The data used in the analysis are scraped from the Wikipedia page with the BeautifulSoup library:

In [15]:
website_url=requests.get('https://en.wikipedia.org/wiki/List_of_Hamburg_U-Bahn_stations').text

In [16]:
soup = BeautifulSoup(website_url,'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of Hamburg U-Bahn stations - Wikipedia
  </title>

In [17]:

data = []
columns = []

# Find the first table with the class 'wikitable sortable'
My_table = soup.find('table', {'class': 'wikitable sortable'})

for index, tr in enumerate(My_table.find_all('tr')):
    # Collect the text of every header or data cell in this row
    section = []
    for td in tr.find_all(['th', 'td']):
        section.append(td.text.rstrip('\n'))

    # First row is the header; all other rows are data
    if index == 0:
        columns = section
    else:
        data.append(section)

# Convert to a pandas DataFrame
df_hamburg = pd.DataFrame(data=data, columns=columns)

# Rename the header that carries a footnote marker
df_hamburg = df_hamburg.rename(columns={'Line(s)[B]': 'Lines'})

# Drop the columns not needed for the analysis (passed as a list rather than a set)
df_hamburg = df_hamburg.drop(columns=["Location[C]", "Fare zone(s)", "Other connections[D]", "Date opened"])

df_hamburg.head(10)

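|   | Station | Lines | Fare zone ring(s) |
|---|---|---|---|
| 0 | Ahrensburg Ost | U1 | B |
| 1 | Ahrensburg West | U1 | B |
| 2 | Alsterdorf | U1 | A |
| 3 | Alter Teichweg | U1 | A |
| 4 | Barmbek | U3 | A |
| 5 | Baumwall | U3 | A |
| 6 | Berliner Tor | U2, U3 | A |
| 7 | Berne | U1 | B |
| 8 | Billstedt | U2 | A |
| 9 | Borgweg | U3 | A |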
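
The scraped headers carry Wikipedia footnote markers such as `[B]` and `[C]`. As an alternative to renaming each column by hand, a small helper could strip these markers. `strip_footnote` below is a hypothetical name, and the sketch assumes a marker always appears as a single bracketed token at the end of the header.

```python
import re

def strip_footnote(header):
    """Remove a trailing footnote marker like '[B]' from a column header."""
    return re.sub(r'\s*\[[^\]]+\]\s*$', '', header)

headers = ['Station', 'Line(s)[B]', 'Location[C]', 'Other connections[D]', 'Fare zone(s)']
cleaned = [strip_footnote(h) for h in headers]
print(cleaned)
```

On a DataFrame the same helper could be applied with `df_hamburg.columns = [strip_footnote(c) for c in df_hamburg.columns]`. Parentheses, as in `Fare zone(s)`, are left untouched because only square brackets are matched.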


In [18]:
hamburg_locations = df_hamburg[['Station']]
hamburg_locations = hamburg_locations.drop_duplicates(subset='Station')
hamburg_locations = hamburg_locations.reset_index(drop=True)
print(hamburg_locations.shape)
hamburg_locations.head()

(92, 1)


|   | Station |
|---|---|
| 0 | Ahrensburg Ost |
| 1 | Ahrensburg West |
| 2 | Alsterdorf |
| 3 | Alter Teichweg |
| 4 | Barmbek |


Preparing the dataframe with empty latitude and longitude columns:

In [19]:
hamburg_locations['Latitude'] = np.nan
hamburg_locations['Longitude'] = np.nan
hamburg_locations.head()

|   | Station | Latitude | Longitude |
|---|---|---|---|
| 0 | Ahrensburg Ost | NaN | NaN |
| 1 | Ahrensburg West | NaN | NaN |
| 2 | Alsterdorf | NaN | NaN |
| 3 | Alter Teichweg | NaN | NaN |
| 4 | Barmbek | NaN | NaN |


In [20]:
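pbar = ProgressBar()
# Nominatim needs an identifying user_agent; the name used here is a placeholder
geolocator = Nominatim(user_agent="hamburg_station_clustering")
for index in pbar(range(len(hamburg_locations))):
    address = hamburg_locations.loc[index, 'Station'] + ", Germany"
    location = geolocator.geocode(address, timeout=None)
    # Stations that cannot be geocoded keep their NaN coordinates
    if location is not None:
        hamburg_locations.loc[index, 'Latitude'] = location.latitude
        hamburg_locations.loc[index, 'Longitude'] = location.longitude
    # Pause between requests to respect the Nominatim limit of one request per second
    sleep(1)

print(hamburg_locations.shape)
hamburg_locations.head()
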
100% |########################################################################|

(92, 3)





|   | Station | Latitude | Longitude |
|---|---|---|---|
| 0 | Ahrensburg Ost | 53.661347 | 10.24224 |
| 1 | Ahrensburg West | 53.664639 | 10.219403 |
| 2 | Alsterdorf | 53.610541 | 10.003889 |
| 3 | Alter Teichweg | 53.586202 | 10.064931 |
| 4 | Barmbek | 53.587386 | 10.044942 |
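
Because each query consists only of the station name plus ", Germany", a geocoder may in principle return a place with the same name elsewhere in the country. One quick plausibility check is to flag coordinates that fall outside a box around the Hamburg area. The bounding box values below are a rough assumption chosen to include Norderstedt and Ahrensburg, not official boundaries, and `outside_hamburg_area` is a hypothetical helper. The third sample row is invented to show a flagged result.

```python
import math

# Rough bounding box around the Hamburg U-Bahn network (assumed values)
LAT_MIN, LAT_MAX = 53.3, 53.8
LON_MIN, LON_MAX = 9.7, 10.4

def outside_hamburg_area(lat, lon):
    """Return True if the coordinates are missing or fall outside the box."""
    if lat is None or lon is None or math.isnan(lat) or math.isnan(lon):
        return True
    return not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX)

stations = [
    ("Ahrensburg Ost", 53.661347, 10.24224),  # from the output above
    ("Barmbek", 53.587386, 10.044942),        # from the output above
    ("Example elsewhere", 48.137, 11.575),    # invented: a point far from Hamburg
]
suspicious = [name for name, lat, lon in stations if outside_hamburg_area(lat, lon)]
print(suspicious)
```

Flagged stations could then be re-queried with a more specific address or corrected by hand.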


The next step is to define the Foursquare credentials:

In [22]:
CLIENT_ID = 'E1UVNEQUN2JFJUUAGEXNKARGIIFEJVTRON50YA0MT5XEA1BJ' # your Foursquare ID
CLIENT_SECRET = 'MNQ0VWBXZJPEHOPVYU0QXCLP53ZFAVVJA5GLBLWGHPZTCOXI' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: E1UVNEQUN2JFJUUAGEXNKARGIIFEJVTRON50YA0MT5XEA1BJ
CLIENT_SECRET: MNQ0VWBXZJPEHOPVYU0QXCLP53ZFAVVJA5GLBLWGHPZTCOXI
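
Hard-coding and printing API keys makes them easy to leak when a notebook is shared. One common alternative is to read them from environment variables. The sketch below assumes the hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; `load_credentials` is likewise an illustrative helper, not part of the original notebook.

```python
import os

def load_credentials(env=None):
    """Read Foursquare credentials from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None

# Example with an explicit mapping instead of the real environment
CLIENT_ID, CLIENT_SECRET = load_credentials(
    {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
)
print("CLIENT_ID loaded:", bool(CLIENT_ID))
```

In a real session the call would simply be `load_credentials()`, with both variables exported in the shell before the notebook is started.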


Defining a function that extracts the category of a venue:


In [23]:
# function that extracts the category of the venue
def get_category_type(row):
    # The category list may be stored under 'categories' or 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Venues without any category yield None
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request; skip this station if the request or parsing fails,
        # otherwise `results` would be undefined or left over from the previous station
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']
        except Exception:
            # report the station name only, so the credentials in the URL are not printed
            print("ERROR: request failed for", name)
            continue

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            # venues without a category get None instead of raising IndexError
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None
        ) for v in results])

    # flatten the per-station lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Station',
                  'Station Latitude',
                  'Station Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
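
The parsing step inside `getNearbyVenues` can be checked without network access by separating it into a pure function. The sketch below works on a list shaped like the `["response"]["groups"][0]["items"]` structure that the function above reads. `flatten_items` is a hypothetical helper and `sample_items` is invented data that only mirrors the nesting. It returns `None` as the category for venues with an empty category list.

```python
def flatten_items(station, lat, lng, items):
    """Turn explore-style items into flat tuples, one per venue."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            station, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Invented items mirroring the nesting used above
sample_items = [
    {"venue": {"name": "Venue One", "location": {"lat": 53.60, "lng": 10.00},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Venue Two", "location": {"lat": 53.61, "lng": 10.01},
               "categories": []}},
]
rows = flatten_items("Sample Station", 53.6, 10.0, sample_items)
for row in rows:
    print(row)
```

Keeping the request and the parsing apart also makes it possible to store raw responses and re-parse them later without calling the API again.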

The venues around each metro station are gathered into a dataframe:

In [25]:
hamburg_venues = getNearbyVenues(names=hamburg_locations['Station'],
                                   latitudes=hamburg_locations['Latitude'],
                                   longitudes=hamburg_locations['Longitude']
                                  )

print(hamburg_venues.shape)
hamburg_venues.head()

Ahrensburg Ost
Ahrensburg West
Alsterdorf
Alter Teichweg
Barmbek
Baumwall
Berliner Tor
Berne
Billstedt
Borgweg
Buchenkamp
Buckhorn
Burgstraße
Christuskirche
Dehnhaide
Emilienstraße
Elbbrücken
Eppendorfer Baum
Farmsen
Feldstraße
Fuhlsbüttel
Fuhlsbüttel Nord
Garstedt
Großhansdorf
Gänsemarkt
Habichtstraße
Hagenbecks Tierpark
Hagendeel
Hallerstraße
Hamburger Straße
Hammer Kirche
Hauptbahnhof Nord
Hauptbahnhof Süd
Hoheluftbrücke
Hoisbüttel
Horner Rennbahn
Hudtwalckerstraße
Joachim-Mähl-Straße
Jungfernstieg
Kellinghusenstraße
Kiekut
Kiwittsmoor
Klein Borstel
Klosterstern
Landungsbrücken
Langenhorn Markt
Langenhorn Nord
Lattenkamp
Legienstraße
Lohmühlenstraße
Lutterothstraße
Lübecker Straße
Meiendorfer Weg
Merkenstraße
Messehallen
Meßberg
Mundsburg
Mönckebergstraße
Mümmelmannsberg
Niendorf Markt
Niendorf Nord
Norderstedt Mitte
Ochsenzoll
Ohlsdorf
Ohlstedt
Osterstraße
Rathaus
Rauhes Haus
Richtweg
Ritterstraße
Rödingsmarkt
Saarlandstraße
Schippelsweg
Schlump
Schmalenbeck
Sengelmannstraße
Sieric

|   | Station | Station Latitude | Station Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Ahrensburg Ost | 53.661347 | 10.24224 | Ringhotel Ahrensburg | 53.661984 | 10.243314 | Hotel |
| 1 | Ahrensburg Ost | 53.661347 | 10.24224 | Petit Muës | 53.660455 | 10.239907 | Gourmet Shop |
| 2 | Ahrensburg Ost | 53.661347 | 10.24224 | U Ahrensburg Ost | 53.661227 | 10.242712 | Metro Station |
| 3 | Ahrensburg West | 53.664639 | 10.219403 | Hansebäckerei Junge | 53.663482 | 10.220533 | Bakery |
| 4 | Ahrensburg West | 53.664639 | 10.219403 | Zum Griechen | 53.66491 | 10.22069 | Greek Restaurant |
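
Stations may differ in how many venues Foursquare returns, which matters for any later comparison. A minimal sketch of counting venues per station with `groupby` is shown below. It uses only the five rows displayed above, so the counts do not reflect the full dataframe.

```python
import pandas as pd

# The five rows shown in the output above (subset of columns)
head_rows = pd.DataFrame(
    [
        ("Ahrensburg Ost", "Ringhotel Ahrensburg", "Hotel"),
        ("Ahrensburg Ost", "Petit Muës", "Gourmet Shop"),
        ("Ahrensburg Ost", "U Ahrensburg Ost", "Metro Station"),
        ("Ahrensburg West", "Hansebäckerei Junge", "Bakery"),
        ("Ahrensburg West", "Zum Griechen", "Greek Restaurant"),
    ],
    columns=["Station", "Venue", "Venue Category"],
)

# Number of venues recorded for each station
venues_per_station = head_rows.groupby("Station")["Venue"].count()
print(venues_per_station)
```

On the full data the same call would be `hamburg_venues.groupby('Station')['Venue'].count()`.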
