#  Capstone Project: The Battle of the Neighborhoods

This is the introductory proposal of my Capstone Project.

## Definition of the Problem

My friends living in Cologne, Germany, wish to relocate to Berlin, Germany, for professional reasons. They reached out to me for recommendations on the Berlin suburbs. Rental prices are not their priority. Instead, they mentioned metrics like:

* number of shops
* number of restaurants
* number of cafes
* overall similarity to their favorite suburb in Cologne.

They know such a decision is very subjective, but since I have also lived in Cologne and know its suburbs quite well, they asked me whether I could map Berlin suburbs to Cologne suburbs. Because they are familiar with the Cologne suburbs, such a mapping would give them a first impression of the Berlin suburbs and help them reach a decision better tailored to their needs.

##  Description of the Data

In order to tackle such a problem, I will do the following:

1. I will identify all suburbs of Berlin and Cologne. The **Foursquare API** does not provide such information, so I will use the **OpenStreetMap API**. This way I will geolocate each suburb by a representative point.

2. I will query the **Foursquare API** around each representative suburb point, using a radius that remains to be determined, such that enough of each suburb's *character* is captured. Queries from neighboring suburbs will overlap and blur the suburb boundaries, which is a desired effect, because the reality on the ground is not shaped by administrative boundaries. A neighborhood can extend across a suburb boundary and keep its character. If most of such a neighborhood lies on one side of the boundary, its similarity with the neighboring suburb would be missed unless the query includes a certain buffer.

3. For the coordinate reference system (CRS) transformation from World Geodetic System (WGS 84) latitude/longitude to projected Cartesian coordinates I will use [EPSG:5243](https://epsg.io/5243) (ETRS89 / LCC Germany, a Lambert conformal conic projection rather than a Universal Transverse Mercator zone), which is appropriate for Germany.
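The projection in step 3 could be sketched as follows. This is a minimal illustration, assuming the `pyproj` package is available (it is not imported in the notebook itself, although `geopandas` builds on it). The helper name `project_points` and the sample coordinates are hypothetical and only approximate the two city centers.

```python
from pyproj import Transformer

# always_xy=True fixes the axis order to (longitude, latitude) on input
# and (easting, northing) on output, regardless of the CRS definitions.
to_lcc = Transformer.from_crs("EPSG:4326", "EPSG:5243", always_xy=True)

def project_points(lons, lats):
    """Project WGS 84 lon/lat (degrees) to EPSG:5243 easting/northing (meters)."""
    eastings, northings = to_lcc.transform(lons, lats)
    return list(zip(eastings, northings))

# Illustrative coordinates only (approximate city centers), not project data.
sample = {"Berlin": (13.405, 52.52), "Cologne": (6.96, 50.94)}
projected = project_points([p[0] for p in sample.values()],
                           [p[1] for p in sample.values()])
for name, (x, y) in zip(sample, projected):
    print(f"{name}: easting={x:.0f} m, northing={y:.0f} m")
```

Once the points are in a projected system, distances and query radii can be handled in meters with plain Euclidean geometry, which is the main reason for leaving latitude/longitude.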

## Data Collection

Here I demonstrate the initial steps of my project. I query the **OpenStreetMap API** for the suburbs of Berlin and Cologne.

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sb
import re

plt.rcParams.update({
    'figure.figsize': (20,10),
    'figure.dpi': 300,
    'savefig.dpi': 300,
    'font.size': 22,
    'axes.spines.left': True,
    'axes.spines.bottom': True,
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.linewidth': 1.1,
    'lines.linewidth': 1.5,
    'lines.markersize': 4,
    'xtick.labelsize': 22,
    'ytick.labelsize': 22,
    'xtick.major.size': 0,
    'xtick.minor.size': 0,
    'ytick.major.size': 0,
    'ytick.minor.size': 0,
    'grid.linewidth': 0,
    'legend.frameon': False,
    })




In [2]:
import json
import folium
from OSMPythonTools.overpass import Overpass, overpassQueryBuilder
from OSMPythonTools.nominatim import Nominatim

In [3]:
#  wrapper function to query OSM REST API once for data and then save them for reuse
def query_OSM(areaId, elementType='node', selector='', out='body', json_file='out.json'):
    try:
        with open(json_file, 'r') as f:
            query = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):  # no usable cache file yet, so query the API
        query = Overpass().query(overpassQueryBuilder(area=areaId, elementType=elementType,
                                                      selector=selector, out=out)).toJSON()['elements']
        with open(json_file, 'w') as f:
            json.dump(query, f)
    return query

In [4]:
#  geolocate Berlin and Cologne; the areaId is what we need for all OSM queries
nominatim = Nominatim()
Berlin = nominatim.query('Berlin, Germany')
Cologne = nominatim.query('Cologne, Germany')

###  Collecting the Suburbs

In [5]:
#  [Berlin] get suburbs names and center points
be_suburbs = query_OSM(areaId=Berlin.areaId(), elementType='node', selector='"place"="suburb"', out='body', 
          json_file='be_suburbs.json')

In [6]:
#  [Cologne] get suburbs names and center points
co_suburbs = query_OSM(areaId=Cologne.areaId(), elementType='node', selector='"place"="suburb"', out='body', 
          json_file='co_suburbs.json')

In [7]:
#  [Berlin] extract center coordinates and suburb names and convert to DataFrames
be_suburbs = pd.DataFrame(be_suburbs)
be_suburbs['suburb'] = be_suburbs.tags.apply(lambda t: t['name'])
be_suburbs = be_suburbs[['suburb', 'lat', 'lon']]

#  [Cologne] extract center coordinates and suburb names and convert to DataFrames
co_suburbs = pd.DataFrame(co_suburbs)
co_suburbs['suburb'] = co_suburbs.tags.apply(lambda t: t['name'])
co_suburbs = co_suburbs[['suburb', 'lat', 'lon']]

Let's visualize the representative points of the suburbs for the two cities.

In [8]:
#  Berlin suburbs
be_map = folium.Map(location=[float(Berlin.toJSON()[0]['lat']), float(Berlin.toJSON()[0]['lon'])], zoom_start=11)
for lat, lng, label in zip(be_suburbs.lat, be_suburbs.lon, be_suburbs.suburb):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(be_map)
be_map

In [9]:
#  Cologne suburbs
co_map = folium.Map(location=[float(Cologne.toJSON()[0]['lat']), float(Cologne.toJSON()[0]['lon'])], zoom_start=11)
for lat, lng, label in zip(co_suburbs.lat, co_suburbs.lon, co_suburbs.suburb):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(co_map)
co_map
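
With the representative points collected, the open question of the Foursquare query radius could be approached by looking at how far apart neighboring suburb points are. The sketch below is one possible heuristic, not a method fixed by this proposal: it computes the great-circle distance from each point to its nearest neighbor. The helper names and the sample points are hypothetical. In the notebook the input would come from something like `list(zip(be_suburbs.lat, be_suburbs.lon))`.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))

def nearest_neighbor_distances(points):
    """For each (lat, lon) point, distance in meters to its closest other point."""
    return [
        min(haversine_m(lat, lon, lat2, lon2)
            for j, (lat2, lon2) in enumerate(points) if j != i)
        for i, (lat, lon) in enumerate(points)
    ]

# Hypothetical points for illustration only, not actual suburb centers.
points = [(52.50, 13.40), (52.50, 13.43), (52.53, 13.40)]
nn = nearest_neighbor_distances(points)
median_nn = sorted(nn)[len(nn) // 2]
print(f"median nearest-neighbor distance: {median_nn:.0f} m")
```

Two query circles of equal radius overlap as soon as the radius exceeds half the distance between their centers, so the distribution of these nearest-neighbor distances gives a lower bound for a radius that blends neighboring suburbs.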