# Data Preparation

As described in the data section, to compare the areas/boroughs of London and Frankfurt we first have to get to know both cities a little better. London has a population of approximately 9 million people, covers 1,572 km$^{2}$, and is organised into the City of London and 32 boroughs.

Frankfurt, on the other hand, has a population of approximately 0.8 million people and covers only 248.31 km$^{2}$. It is organised into 46 "Stadtteile" (city districts, called boroughs here), but these are significantly smaller than the boroughs of London, which makes a borough-to-borough comparison difficult.

I decided to compare the [London areas](https://en.wikipedia.org/wiki/List_of_areas_of_London) with the [boroughs of Frankfurt](https://de.wikipedia.org/wiki/Liste_der_Stadtteile_von_Frankfurt_am_Main) because their areas and populations are more comparable.

For example:
* London area: Barnes, area: 4.50 km$^{2}$, population: 21,218
* Frankfurt borough: Ostend, area: 5.56 km$^{2}$, population: 29,171
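
To make the comparison a bit more concrete, here is a small sketch that turns the figures quoted above into population densities. The numbers are the ones from the text; the dictionary keys and the `density` helper are just illustrative names.

```python
# Figures quoted in the text above: (area in km², population).
places = {
    "London (city)": (1572.0, 9_000_000),
    "Frankfurt (city)": (248.31, 800_000),
    "Barnes (London area)": (4.50, 21_218),
    "Ostend (Frankfurt borough)": (5.56, 29_171),
}


def density(area_km2, population):
    """Return inhabitants per km²."""
    return population / area_km2


densities = {name: density(area, pop) for name, (area, pop) in places.items()}

for name, value in densities.items():
    print(f"{name}: {value:,.0f} inhabitants per km²")
```

Barnes and Ostend end up in the same order of magnitude for area, population and density, which is why pairing London areas with Frankfurt boroughs seems reasonable.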

In [1]:
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium

## Getting Areas of London

In [2]:
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_areas_of_London")
df = df[1]
df.head()

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


In [3]:
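# Keep only the area name plus two placeholder columns; "Post town" and "OS grid ref"
# are renamed here and overwritten with real coordinates further down.
# axis must be passed by keyword (the positional form was removed in pandas 2.0).
df.drop(df.columns.difference(['Location','Post town', 'OS grid ref']), axis=1, inplace=True)
df.columns = ["Area", "Latitude", "Longitude"]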
df.head()

Unnamed: 0,Area,Latitude,Longitude
0,Abbey Wood,LONDON,TQ465785
1,Acton,LONDON,TQ205805
2,Addington,CROYDON,TQ375645
3,Addiscombe,CROYDON,TQ345665
4,Albany Park,"BEXLEY, SIDCUP",TQ478728


In [4]:
# Strip the parenthetical suffix, e.g. " (also ...)", from each area name.
for i, s in enumerate(df["Area"]):
    if(" (" in s):
        num = s.find(" (")
        s = s[:num]
        df.at[i, "Area"] = s
df.head(50)

Unnamed: 0,Area,Latitude,Longitude
0,Abbey Wood,LONDON,TQ465785
1,Acton,LONDON,TQ205805
2,Addington,CROYDON,TQ375645
3,Addiscombe,CROYDON,TQ345665
4,Albany Park,"BEXLEY, SIDCUP",TQ478728
5,Aldborough Hatch,ILFORD,TQ455895
6,Aldgate,LONDON,TQ334813
7,Aldwych,LONDON,TQ307810
8,Alperton,WEMBLEY,TQ185835
9,Anerley,LONDON,TQ345695
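
The same clean-up can probably be written without an explicit loop by using the pandas string accessor. This is only a sketch: the sample rows are hypothetical stand-ins for the real "Area" column, and the regular expression is chosen to mimic the loop above (cut at the first " (").

```python
import pandas as pd

# Hypothetical sample rows in the same shape as the "Area" column above.
sample = pd.DataFrame(
    {"Area": ["Abbey Wood", "Bexley (also Old Bexley, Bexley Village)", "Acton"]}
)

# Remove everything from the first " (" to the end of the string.
# Names without a parenthetical suffix are left unchanged.
sample["Area"] = sample["Area"].str.replace(r" \(.*$", "", regex=True)

print(sample["Area"].tolist())
```

The vectorised form avoids writing back cell by cell with `df.at`, which also makes it independent of the index labels.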


### Adding latitude and longitude

In [5]:
# Create the geocoder once and reuse it for every lookup.
geolocator = Nominatim(user_agent="l_explorer", timeout=None)
for i, area in enumerate(df["Area"]):
    address = f'{area}, London, Great Britain'
    try:
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
    except Exception:  # no match (location is None) or a geocoder/network error
        latitude = float('nan')
        longitude = float('nan')
    print(address, latitude, longitude)
    # .at expects a single column label, not a list.
    df.at[i, "Latitude"] = latitude
    df.at[i, "Longitude"] = longitude

df.head()


Abbey Wood, London, Great Britain 51.487621 0.1140504
Acton, London, Great Britain 51.5081402 -0.2732607
Addington, London, Great Britain 51.3586365 -0.0316347
Addiscombe, London, Great Britain 51.3796916 -0.0742821
Albany Park, London, Great Britain 51.4353837 0.1259653
Aldborough Hatch, London, Great Britain nan nan
Aldgate, London, Great Britain 51.5142477 -0.0757186
Aldwych, London, Great Britain 51.5116345 -0.1190235
Alperton, London, Great Britain 51.5408036 -0.3000963
Anerley, London, Great Britain 51.4075993 -0.0619394
Angel, London, Great Britain 51.5319458 -0.1061056
Aperfield, London, Great Britain 51.3166288 0.0327167
Archway, London, Great Britain 51.5654371 -0.1349977
Ardleigh Green, London, Great Britain 51.5841911 0.2209904
Arkley, London, Great Britain 51.6455827 -0.236258
Arnos Grove, London, Great Britain 51.6164024 -0.1332873
Balham, London, Great Britain 51.4456449 -0.1503643
Bankside, London, Great Britain 51.5125884 -0.3904739
Barbican, London, Great Britain 51.5

East Dulwich, London, Great Britain 51.4585551 -0.0727322
East Finchley, London, Great Britain 51.5871881 -0.1648137
East Ham, London, Great Britain 51.5329628 0.0553201
East Sheen, London, Great Britain 51.4623706 -0.267094
East Wickham, London, Great Britain 51.3773424 -0.0256802
Eastcote, London, Great Britain 51.5795418 -0.4016534
Eden Park, London, Great Britain 51.3901801 -0.0263691
Edgware, London, Great Britain 51.613673 -0.2749398
Edmonton, London, Great Britain 51.6203392 -0.0659791
Eel Pie Island, London, Great Britain 51.445126450000004 -0.3254075826905488
Elephant and Castle, London, Great Britain 51.4948884 -0.1005731
Elm Park, London, Great Britain 51.549579 0.1974303
Elmers End, London, Great Britain 51.3994675 -0.0416048
Elmstead, London, Great Britain 51.4181888 0.0469182
Eltham, London, Great Britain 51.4505687 0.0541253
Emerson Park, London, Great Britain 51.5686007 0.2202375
Enfield Highway, London, Great Britain 51.6570215 -0.0470996
Enfield Lock, London, Great Br

Loxford, London, Great Britain 51.5481943 0.084051
Maida Vale, London, Great Britain 51.527592 -0.1905916
Malden Rushett, London, Great Britain 51.3410523 -0.3190757
Manor House, London, Great Britain 51.5703208 -0.0963971
Manor Park, London, Great Britain 51.4269492 -0.1355069
Marks Gate, London, Great Britain 51.5863692 0.1348482
Maryland, London, Great Britain 51.5460532 0.0059223
Marylebone, London, Great Britain 51.5220589 -0.1504553
Mayfair, London, Great Britain 51.5110872 -0.147058
Maze Hill, London, Great Britain 51.482607 0.0038458
Merton Park, London, Great Britain 51.4079938 -0.2019654
Middle Park, London, Great Britain 51.4453135 0.0400849
Mile End, London, Great Britain 51.5253378 -0.033435
Mill Hill, London, Great Britain 51.6154423 -0.2330675
Millbank, London, Great Britain 51.4926121 -0.1290435
Millwall, London, Great Britain 51.4933423 -0.0212187
Mitcham, London, Great Britain 51.4058006 -0.1640787
Monken Hadley, London, Great Britain 51.66121575 -0.19100962162691934


Stockwell, London, Great Britain 51.472211 -0.1225014
Stoke Newington, London, Great Britain 51.5576971 -0.0772816
Stonebridge, London, Great Britain 51.5441099 -0.2762285
Stratford, London, Great Britain 51.541289 -0.0035472
Strawberry Hill, London, Great Britain 51.4385925 -0.3399367
Streatham, London, Great Britain 51.429769 -0.1311105
Stroud Green, London, Great Britain 51.5716864 -0.1114972
Sudbury, London, Great Britain 51.5143613 0.0640345
Sundridge, London, Great Britain 51.4198856 0.0202065
Surbiton, London, Great Britain 51.3937557 -0.3033105
Surrey Quays, London, Great Britain 51.4934207 -0.0478319
Sutton, London, Great Britain 51.3575114 -0.173640122600872
Swiss Cottage, London, Great Britain 51.543722 -0.174982
Sydenham, London, Great Britain 51.4269433 -0.0545387
Sydenham Hill, London, Great Britain 51.4326038 -0.072322
Teddington, London, Great Britain 51.4277844 -0.333653
Temple, London, Great Britain 51.5109659 -0.1143345
Temple Fortune, London, Great Britain 51.581912

Unnamed: 0,Area,Latitude,Longitude
0,Abbey Wood,51.4876,0.11405
1,Acton,51.5081,-0.273261
2,Addington,51.3586,-0.0316347
3,Addiscombe,51.3797,-0.0742821
4,Albany Park,51.4354,0.125965
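
The public Nominatim service asks clients to stay at or below one request per second, so for several hundred areas it may be worth throttling the lookups. Below is a minimal sketch of that idea. `geocode_names` and its parameters are names made up for this example, not part of geopy; the function accepts any callable that behaves like `geolocator.geocode`. geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that can wrap `geolocator.geocode` for the same purpose.

```python
import math
import time


def geocode_names(names, geocode, suffix, min_delay_seconds=1.0):
    """Return {name: (latitude, longitude)}; failed lookups map to (nan, nan).

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None when nothing is found.
    Repeated names are looked up again but share one dictionary entry.
    """
    results = {}
    for idx, name in enumerate(names):
        if idx > 0:
            time.sleep(min_delay_seconds)  # pause between consecutive requests
        try:
            location = geocode(f"{name}, {suffix}")
        except Exception:  # timeouts, network or service errors
            location = None
        if location is None:
            results[name] = (math.nan, math.nan)
        else:
            results[name] = (location.latitude, location.longitude)
    return results


# Wiring it to Nominatim (needs network access, so it is left commented out):
# from geopy.geocoders import Nominatim
# geolocator = Nominatim(user_agent="l_explorer", timeout=None)
# coords = geocode_names(df["Area"], geolocator.geocode, "London, Great Britain")
```

Passing the geocoding function in as an argument keeps the loop testable without network access and lets the same code serve both London and Frankfurt by changing only `suffix`.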


In [6]:
df_l = df

## Deleting duplicates and NaN values

In [7]:
print(df.shape)
print(df.isna().any())
df.dropna(axis=0, inplace=True)  # drop rows with a failed lookup; axis must be a keyword in pandas 2.0+
print(df.isna().any())

print("Duplicates?",df.duplicated().any())
df.drop_duplicates(inplace=True)
print(df.duplicated().any())
df.reset_index(inplace=True, drop=True)
df.shape

(533, 3)
Area         False
Latitude      True
Longitude     True
dtype: bool
Area         False
Latitude     False
Longitude    False
dtype: bool
Duplicates? True
False


(522, 3)
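
Before dropping rows it can be useful to note which areas could not be geocoded, so they can be checked by hand later. Here is a small sketch on a hypothetical four-row miniature of the table (one failed lookup, one repeated row); the frame `demo` and its values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the geocoded table.
demo = pd.DataFrame({
    "Area": ["Acton", "Aldborough Hatch", "Acton", "Aldgate"],
    "Latitude": [51.5081, np.nan, 51.5081, 51.5142],
    "Longitude": [-0.2733, np.nan, -0.2733, -0.0757],
})

# Names whose lookup failed, kept for later inspection.
failed = demo.loc[demo["Latitude"].isna(), "Area"].tolist()

# Same clean-up as in the cell above, written as one chain without inplace=True.
clean = (
    demo.dropna(subset=["Latitude", "Longitude"])
        .drop_duplicates()
        .reset_index(drop=True)
)

print(f"failed lookups: {failed}")
print(f"{len(demo)} rows -> {len(clean)} rows")
```

Returning a new frame instead of modifying `demo` in place keeps the original available for comparison.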

In [12]:
df.head(10)

Unnamed: 0,Area,Latitude,Longitude
0,Abbey Wood,51.4876,0.11405
1,Acton,51.5081,-0.273261
2,Addington,51.3586,-0.0316347
3,Addiscombe,51.3797,-0.0742821
4,Albany Park,51.4354,0.125965
5,Aldgate,51.5142,-0.0757186
6,Aldwych,51.5116,-0.119024
7,Alperton,51.5408,-0.300096
8,Anerley,51.4076,-0.0619394
9,Angel,51.5319,-0.106106


In [14]:
df.to_csv("london_areas_latlong.csv", index=False)    # save the cleaned dataframe; comment this line out to skip saving
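
Since geocoding several hundred names takes a while, the saved CSV can double as a cache. The sketch below shows one way to do that; `load_or_build` and `build_demo` are hypothetical helpers, and the demo writes into a throwaway directory rather than the notebook folder.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_or_build(csv_path, build):
    """Return the dataframe cached at csv_path, or build it and cache it."""
    path = Path(csv_path)
    if path.exists():
        return pd.read_csv(path)
    frame = build()
    frame.to_csv(path, index=False)
    return frame


# Demonstration with a stand-in for the slow geocoding step.
calls = []


def build_demo():
    calls.append(1)  # record that the expensive step actually ran
    return pd.DataFrame(
        {"Area": ["Acton"], "Latitude": [51.5081], "Longitude": [-0.2733]}
    )


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "london_areas_latlong.csv"
    first = load_or_build(target, build_demo)   # builds the frame and writes the file
    second = load_or_build(target, build_demo)  # reads the file, build_demo is not called

print(f"build_demo ran {len(calls)} time(s)")
```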

## Getting Frankfurt Stadtteile

In [9]:
# df_f means dataframe frankfurt
df_f = pd.read_html("https://de.wikipedia.org/wiki/Liste_der_Stadtteile_von_Frankfurt_am_Main")
df_f = df_f[0]
df_f.head()

Unnamed: 0,Nr.,Stadtteil,Fläche[3]in km²,Einwohner,Weiblich,Männlich,Deutsche,Ausländer,Ausländerin Prozent,Einwohnerje km²,Ortsbezirk,Stadtgebietseit,Vorherige Zugehörigkeit
0,1,Altstadt,,,,,,,373,8204,01 Innenstadt I,1866[Anm. 1],Freie Stadt Frankfurt
1,2,Innenstadt,,,,,,,468,4430,01 Innenstadt I,1866[Anm. 2],Freie Stadt Frankfurt
2,3,Bahnhofsviertel,,,,,,,54,6570,01 Innenstadt I,1866[Anm. 3],Freie Stadt Frankfurt
3,4,Westend-Süd,,,,,,,275,7538,02 Innenstadt II,1866[Anm. 3],Freie Stadt Frankfurt
4,5,Westend-Nord,,,,,,,293,6249,02 Innenstadt II,1866[Anm. 3],Freie Stadt Frankfurt


In [10]:
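# Keep the borough name plus two empty columns that serve as coordinate placeholders.
# axis must be passed by keyword (the positional form was removed in pandas 2.0).
df_f.drop(df_f.columns.difference(['Stadtteil','Weiblich', 'Männlich']), axis=1, inplace=True)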
df_f.columns = ["Borough", "Latitude", "Longitude"]
df_f.head()

Unnamed: 0,Borough,Latitude,Longitude
0,Altstadt,,
1,Innenstadt,,
2,Bahnhofsviertel,,
3,Westend-Süd,,
4,Westend-Nord,,


### Adding latitude and longitude

In [11]:
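# Create the geocoder once and reuse it for every lookup.
geolocator = Nominatim(user_agent="f_explorer", timeout=None)
for i, borough in enumerate(df_f["Borough"]):
    address = f'{borough}, Frankfurt, Germany'
    location = geolocator.geocode(address)  # returns None if nothing is found
    # Fall back to NaN instead of raising AttributeError on a missing match.
    latitude = location.latitude if location is not None else float('nan')
    longitude = location.longitude if location is not None else float('nan')
    # .at expects a single column label, not a list.
    df_f.at[i, "Latitude"] = latitude
    df_f.at[i, "Longitude"] = longitude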

#df_f.to_csv("frankfurt_boroughs_latlong.csv", index=False)    # uncomment if you want to save the dataframe
df_f.head()


Unnamed: 0,Borough,Latitude,Longitude
0,Altstadt,50.1104,8.6829
1,Innenstadt,50.113,8.67434
2,Bahnhofsviertel,50.1077,8.66868
3,Westend-Süd,50.1152,8.66227
4,Westend-Nord,50.1264,8.66792
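
A geocoder can return a match in the wrong place for an ambiguous name, so a quick plausibility check on the coordinates may help. The sketch below flags points outside a rough bounding box. The box limits are an assumption chosen for illustration, not official city boundaries, and the last row is a made-up bad match.

```python
# Rough bounding box around Frankfurt am Main (assumed values, for illustration only).
FRANKFURT_BBOX = {"lat_min": 50.0, "lat_max": 50.25, "lon_min": 8.45, "lon_max": 8.85}


def outside_bbox(rows, bbox):
    """Return the names whose (lat, lon) lies outside bbox.

    NaN coordinates fail every comparison and are therefore flagged as well.
    """
    return [
        name
        for name, lat, lon in rows
        if not (
            bbox["lat_min"] <= lat <= bbox["lat_max"]
            and bbox["lon_min"] <= lon <= bbox["lon_max"]
        )
    ]


rows = [
    ("Altstadt", 50.1104, 8.6829),
    ("Innenstadt", 50.113, 8.67434),
    ("Bahnhofsviertel", 50.1077, 8.66868),
    ("Misplaced example", 51.5081, -0.2733),  # hypothetical bad match (a London coordinate)
]

suspicious = outside_bbox(rows, FRANKFURT_BBOX)
print(suspicious)
```

With the real dataframe, the rows could be supplied via `df_f.itertuples(index=False)`, since the columns are already in the order name, latitude, longitude.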
