# OpenStreetMap Data Case Study for the Metro Area of Berlin, Germany

## Map Area

Berlin Metro Area, Germany. Berlin was chosen because it is my hometown.

## Data Wrangling

I downloaded the available data from https://mapzen.com/data/metro-extracts/ (May 2nd, 2016), extracted the nodes and ways, and imported the data into a SQLite database (see the file data_preparation.py; for the database schema see schema.txt). Some unrecognized characters in usernames (Cyrillic) led to problems extracting the 'nodes' and 'ways' from the OSM file, and SQL import errors led to an incomplete database.

Therefore, the count of rows was checked against the csv files:

In [76]:
import sqlite3
from collections import Counter, defaultdict
import datetime

conn = sqlite3.connect('data1.db')
c = conn.cursor()

def time_from_timestamp(timestamp_input):
    year = int(timestamp_input[:4])
    month = int(timestamp_input[5:7])
    day = int(timestamp_input[8:10])
    return datetime.datetime(year, month, day)


In [2]:
# Warning: Takes some time with the big csv files.

import pandas as pd


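# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0
# keeps the first column (the id) as the index.
df_nodes = pd.read_csv('nodes.csv', index_col=0)
df_nodes_tags = pd.read_csv('nodes_tags.csv', index_col=0)
df_ways = pd.read_csv('ways.csv', index_col=0)
df_ways_tags = pd.read_csv('ways_tags.csv', index_col=0)
df_ways_nodes = pd.read_csv('ways_nodes.csv', index_col=0)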

In [3]:
print("Count of rows in csv files")
print("nodes: ", len(df_nodes.index.values))
print("nodes_tags: ", len(df_nodes_tags.index.values))
print("ways: ", len(df_ways.index.values))
print("ways_tags: ", len(df_ways_tags.index.values))
print("ways_nodes: ", len(df_ways_nodes.index.values))

Count of rows in csv files
nodes:  10460000
nodes_tags:  3658234
ways:  1596861
ways_tags:  4191676
ways_nodes:  13362536


Count of rows in database:

    SELECT Count(*) FROM nodes;
    > 10,460,000
    
    SELECT Count(*) FROM nodes_tags;
    > 3,658,235
    
    SELECT Count(*) FROM ways;
    > 1,596,861
    
    SELECT Count(*) FROM ways_tags;
    > 4,191,677
    
    SELECT Count(*) FROM ways_nodes;
    > 13,362,537

The tables nodes_tags, ways_tags and ways_nodes are each one row longer than their CSV files. A manual inspection revealed that the column names had been imported as a data row. These rows were removed manually.
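
The stray header rows could also be avoided at import time instead of being deleted afterwards. The following is a minimal sketch, assuming the CSV files start with a header line and are UTF-8 encoded. The helper name `import_csv` and the column list passed to it are illustrative and are not taken from data_preparation.py.

```python
import csv
import sqlite3


def import_csv(conn, csv_path, table, columns):
    """Load a CSV file into an existing table, skipping the header row.

    `table` and `columns` are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), placeholders)
    # newline="" lets the csv module handle '\r\n' line endings itself;
    # encoding="utf-8" keeps non-ASCII values (e.g. Cyrillic names) intact.
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # consumes the header line
        rows = ([row[col] for col in columns] for row in reader)
        conn.executemany(sql, rows)
    conn.commit()
```

Because `csv.DictReader` reads the first line as field names, the header never reaches the table, and the row counts in the database and the CSV files should then agree without manual cleanup.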

Furthermore, since nodes_tags and ways_tags are subordinate to nodes and ways, ids from the tags files should always refer to a valid id in the nodes or ways file.

In [4]:
print(set(df_nodes_tags.index.values) <= set(df_nodes.index.values))
print(set(df_ways_tags.index.values) <= set(df_ways.index.values))

True
True


In [2]:
conn = sqlite3.connect('data1.db')
c = conn.cursor()

nodes_set = set([n[0] for n in c.execute("SELECT id FROM nodes").fetchall()])
nodes_tags_set = set([n[0] for n in c.execute("SELECT id FROM nodes_tags").fetchall()])
ways_set = set([n[0] for n in c.execute("SELECT id FROM ways").fetchall()])
ways_tags_set = set([n[0] for n in c.execute("SELECT id FROM ways_tags").fetchall()])

print(nodes_tags_set <= nodes_set)
print(ways_tags_set <= ways_set)

True
True
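
The same referential check can be pushed into the database, which avoids loading millions of ids into Python sets. This is a sketch under the assumption that both tables have an `id` column as in the queries above; the helper name `count_orphans` is hypothetical.

```python
import sqlite3


def count_orphans(conn, child, parent):
    """Count rows in `child` whose id has no matching row in `parent`.

    Table names are interpolated into the SQL text and must be trusted.
    """
    sql = """
        SELECT COUNT(*)
        FROM {child} AS t
        LEFT JOIN {parent} AS p ON p.id = t.id
        WHERE p.id IS NULL
    """.format(child=child, parent=parent)
    return conn.execute(sql).fetchone()[0]
```

A result of 0 for `count_orphans(conn, "nodes_tags", "nodes")` and `count_orphans(conn, "ways_tags", "ways")` corresponds to the two `True` values printed above.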


## File Sizes

* 'berlin.osm':    2.29 GB (uncompressed)
* 'nodes.csv':      833 MB
* 'nodes_tags.csv': 131 MB
* 'ways.csv':        93 MB
* 'ways_nodes.csv': 316 MB
* 'ways_tags.csv':  140 MB

## Evaluating the data

While evaluating the data the following problems were encountered:

* (nodes table) Columns lat and lon use different precision. 
* (nodes_tags table) Column key has values that are probably inconsistencies, like 'addr' and 'address' or 'abbr' and 'abrevation'
* (nodes_tags table) The keys 'fixme', 'FIXME' and 'TODO' were found.
* (ways_tags table) For the key maxspeed, the value column holds unexpected values. 250 (39 times) is unlikely, as are 210 and 190. The limit of 30 also seems to be encoded in several different ways (30, DE:zone30, DE:zone:30, DE:30, PL:zone30, DE:zone(:30), zone30).
* (ways_tags table) For the key postcode, the value column holds unexpected values. Postcodes in Berlin have five digits and start with a 1. '66-470' (1,632 times), '74-500' (1,486 times) and '74-505' (938 times) do not match these criteria. There are also codes starting with a '0' (mostly from the area around Berlin), and one code is '39264' (a place called Deetz, quite far from Berlin).

## Evaluating the problems

### nodes table : columns lat and lon

An evaluation found that 20 rows in the nodes table have a precision of three decimal places or fewer. For most uses this would not be considered accurate enough; most coordinates are given with about four more decimal places. Examining the related nodes_tags and ways_tags rows revealed that most of these rows represent a real-world object with a big 'footprint' that cannot be pinpointed to a very precise location. Five rows are villages and eleven are lakes. For the rest, we cannot rule out the possibility that trailing zeros were simply removed and that lat and lon actually are the most precise coordinates.

In [1]:
conn = sqlite3.connect('data1.db')
c = conn.cursor()

statement = """
SELECT lat, lon, id
FROM nodes
"""

data = c.execute(statement).fetchall()

data = [n for n in data if len(str(n[0])) <= 6 and len(str(n[1])) <= 6]

In [2]:
for n in data:
    print(n)

(52.889, 14.511, 75809260)
(52.925, 14.359, 75811800)
(52.931, 14.37, 75811870)
(52.929, 14.368, 75811875)
(52.912, 14.507, 75811998)
(52.909, 14.507, 75812002)
(52.876, 14.507, 75812076)
(52.829, 14.692, 75812951)
(52.834, 14.688, 75812984)
(52.837, 14.685, 75812992)
(52.47, 14.539, 240030792)
(52.95, 14.0, 240041259)
(52.745, 12.4, 240045498)
(51.85, 13.75, 240070698)
(52.45, 13.05, 240093837)
(52.249, 14.435, 792865253)
(52.36, 14.303, 1112433449)
(52.345, 14.59, 2584081061)
(52.75, 12.9, 3029244870)
(52.513, 13.388, 4469846192)
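
The string-length filter above works because every latitude and longitude in this extract has a two-digit integer part. A more explicit variant counts the decimal places directly. This is a minimal sketch; the helper names `decimal_places` and `is_low_precision` are illustrative, and the threshold of three places is the one used in the text.

```python
from decimal import Decimal


def decimal_places(value):
    """Number of digits after the decimal point in the shortest repr of a float."""
    exponent = Decimal(repr(value)).as_tuple().exponent
    return max(0, -exponent)


def is_low_precision(lat, lon, max_places=3):
    """True if both coordinates have at most `max_places` decimal places."""
    return (decimal_places(lat) <= max_places
            and decimal_places(lon) <= max_places)
```

With these helpers the filter could be written as `[n for n in data if is_low_precision(n[0], n[1])]`, independent of how many digits precede the decimal point.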


In [4]:
result = 0
for lat, lon, id_ in data:
    # data rows are (lat, lon, id); the id is passed as a bound parameter
    rows = c.execute("SELECT * FROM nodes_tags WHERE id = ?", (id_,)).fetchall()
    if not rows:
        result += 1

print("No tags attached: ", result)

No tags attached:  9


In [7]:
for lat, lon, id_ in data:
    # data rows are (lat, lon, id); the id is passed as a bound parameter
    d = c.execute("SELECT * FROM nodes_tags WHERE id = ?", (id_,)).fetchall()
    if d:
        print(id_,  "---------------")
        for n in d:
            if n[1] == 'place':          
                print(n)

75812951 ---------------
75812984 ---------------
75812992 ---------------
240030792 ---------------
(240030792, 'place', 'village', 'regular\r')
240041259 ---------------
(240041259, 'place', 'village', 'regular\r')
240045498 ---------------
(240045498, 'place', 'village', 'regular\r')
240070698 ---------------
(240070698, 'place', 'village', 'regular\r')
240093837 ---------------
(240093837, 'place', 'village', 'regular\r')
2584081061 ---------------
3029244870 ---------------
4469846192 ---------------


In [8]:
# Tags of all ways that contain the given node
statement = """SELECT ways_tags.key, ways_tags.value, ways_tags.type, nodes.id
    FROM ways_tags
    JOIN ways ON ways.id = ways_tags.id
    JOIN ways_nodes ON ways_nodes.id = ways.id
    JOIN nodes ON ways_nodes.node_id = nodes.id
    WHERE nodes.id = ?;"""

for lat, lon, id_ in data:
    # data rows are (lat, lon, id); the id is passed as a bound parameter
    d = c.execute(statement, (id_,)).fetchall()
    for n in d:
        print(n)

('name', 'Jezioro Narost', 'regular\r', 75809260)
('water', 'lake', 'regular\r', 75809260)
('source', 'Dshpak_landsat_lakes', 'regular\r', 75809260)
('de', 'Nordhausener See', 'name\r', 75809260)
('natural', 'water', 'regular\r', 75809260)
('name', 'Jezioro Mętno', 'regular\r', 75811800)
('water', 'lake', 'regular\r', 75811800)
('source', 'Dshpak_landsat_lakes', 'regular\r', 75811800)
('de', 'Mantelsee', 'name\r', 75811800)
('natural', 'water', 'regular\r', 75811800)
('name', 'Jezioro Mętno', 'regular\r', 75811870)
('water', 'lake', 'regular\r', 75811870)
('source', 'Dshpak_landsat_lakes', 'regular\r', 75811870)
('de', 'Mantelsee', 'name\r', 75811870)
('natural', 'water', 'regular\r', 75811870)
('name', 'Jezioro Mętno', 'regular\r', 75811875)
('water', 'lake', 'regular\r', 75811875)
('source', 'Dshpak_landsat_lakes', 'regular\r', 75811875)
('de', 'Mantelsee', 'name\r', 75811875)
('natural', 'water', 'regular\r', 75811875)
('name', 'Jezioro Jeleńskie', 'regular\r', 75811998)
('water', '

### nodes_tags table:  Inconsistent keys

In [11]:
statement = """
SELECT key, Count(key)
FROM nodes_tags
GROUP BY key
ORDER BY Count(key) DESC
LIMIT 30;
"""

nodes_tags_fetch = c.execute(statement).fetchall()

In [12]:
nodes_tags_fetch

[('housenumber', 387564),
 ('street', 386634),
 ('city', 383870),
 ('postcode', 381582),
 ('country', 364269),
 ('suburb', 322496),
 ('source', 263220),
 ('natural', 128327),
 ('leaf_type', 114582),
 ('leaf_cycle', 109711),
 ('name', 85603),
 ('amenity', 54410),
 ('highway', 37567),
 ('wheelchair', 31144),
 ('entrance', 27180),
 ('power', 24791),
 ('type', 21607),
 ('barrier', 19195),
 ('operator', 18235),
 ('shop', 17823),
 ('railway', 17451),
 ('created_by', 17023),
 ('ref', 16778),
 ('website', 15989),
 ('public_transport', 15660),
 ('inclusion', 13228),
 ('phone', 12625),
 ('bus', 11037),
 ('opening_hours', 9746),
 ('tourism', 7783)]

In [17]:
statement = """
SELECT key
FROM nodes_tags
"""

nodes_tags_fetch2 = c.execute(statement).fetchall()

In [18]:
nodes_tags_list = list(set([n[0] for n in nodes_tags_fetch2]))


In [24]:
possible_doubles = []
for tag in nodes_tags_list:
    for other_tag in nodes_tags_list:
        if tag == other_tag:
            pass
        elif ":" in other_tag:
            pass
        elif len(tag) == 2:
            pass
        elif other_tag.startswith(tag):
            possible_doubles.append([tag, other_tag])

# Uncomment to see possible doubles
# possible_doubles

I'm looking at:
* ['art', 'artist']
* ['drink_water', 'drinking_water']
* ['addr', 'address']

In [29]:
statement = """
SELECT * 
FROM nodes_tags
WHERE key = "art" OR key = "artist"
LIMIT 20;
"""

for n in c.execute(statement).fetchall():
    print(n)

(277525548, 'artist', 'Hermann Hosaeus', 'regular\r')
(313826576, 'artist', 'Brigitte und Martin Matschinsky-Denninghoff', 'regular\r')
(437150946, 'artist', 'Henry Moore', 'regular\r')
(440937810, 'artist', 'Gerhard Thieme', 'regular\r')
(1038089721, 'artist', 'Heinz Mack', 'regular\r')
(1049418449, 'artist', 'Rainer Fest', 'regular\r')
(1181494510, 'artist', 'Gottfried Gruner', 'regular\r')
(1375278177, 'art', 'yes', 'museum\r')
(1482191891, 'artist', 'Mark di Suvero', 'regular\r')
(1482191892, 'artist', 'Frank Stella', 'regular\r')
(1482191893, 'artist', 'Keith Haring', 'regular\r')
(1489096811, 'artist', 'Bernhard Heiliger', 'regular\r')
(1670240199, 'art', 'exhibition_space', 'regular\r')
(1977116866, 'art', 'gallery', 'regular\r')
(1979988661, 'artist', 'Ernst Leonhardt', 'regular\r')
(2123579006, 'artist', 'Egbert Broerken', 'regular\r')
(2123643791, 'artist', 'Egbert Broerken', 'regular\r')
(2123680364, 'artist', 'Egbert Broerken', 'regular\r')
(2554278342, 'artist', 'Gerhard S

'art' does not seem to be an abbreviation for 'artist'.

In [32]:
statement = """
SELECT * 
FROM nodes_tags
WHERE key = "drink_water"
LIMIT 10;
"""

for n in c.execute(statement).fetchall():
    print(n)

(2459611822, 'drink_water', 'yes', 'regular\r')


In [34]:
statement = """
SELECT * 
FROM nodes_tags
WHERE key = "drinking_water"
GROUP BY value
LIMIT 25;
"""

for n in c.execute(statement).fetchall():
    print(n)

(3627125409, 'drinking_water', 'Bah', 'regular\r')
(4450877764, 'drinking_water', 'no', 'regular\r')
(3872785663, 'drinking_water', 'pump', 'regular\r')
(4367704200, 'drinking_water', 'seasonal', 'regular\r')
(4396179327, 'drinking_water', 'yes', 'regular\r')


In [35]:
statement = """
SELECT * 
FROM nodes_tags
WHERE key = "drinking_water"
AND value = "Bah"
LIMIT 25;
"""

for n in c.execute(statement).fetchall():
    print(n)

(3627125409, 'drinking_water', 'Bah', 'regular\r')


The single entry "drink_water" does not seem to differ from "drinking_water". Therefore "drink_water" could be changed into "drinking_water".

The value also does not seem to be standardized. It is sometimes a boolean value (yes or no) and sometimes a further description. The value "Bah" seems pointless.
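
One way to make the mixed values easier to work with is to split them into a boolean flag and an optional free-text detail. The function below is only a sketch; the name `classify_drinking_water` and the rule that anything other than yes/no is treated as a description are assumptions, not an OSM convention.

```python
def classify_drinking_water(value):
    """Split a raw drinking_water value into (flag, detail).

    flag is True/False for yes/no and None when the value is a description;
    detail carries the original text in that case.
    """
    cleaned = value.strip()
    lowered = cleaned.lower()
    if lowered == "yes":
        return True, None
    if lowered == "no":
        return False, None
    # Anything else ('pump', 'seasonal', 'Bah', ...) is kept as free text.
    return None, cleaned
```

Values such as "Bah" would still need a manual decision; the function only separates them from the clean yes/no cases.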

In [37]:
statement = """
SELECT key, Count(*)
FROM nodes_tags
WHERE key = "addr" OR key = "address"
GROUP BY key
"""

for n in c.execute(statement).fetchall():
    print(n)

('addr', 5486)
('address', 3)


In [38]:
statement = """
SELECT *
FROM nodes_tags
WHERE key = "address"
"""

for n in c.execute(statement).fetchall():
    print(n)

(1879818772, 'address', 'Am Parsteinsee 1, 16278 Angermünde', 'regular\r')
(1880630233, 'address', 'Haus Lindenstraße, Lindenstraße 43, 16278 Angermünde', 'regular\r')
(1901746292, 'address', 'Carrée Seestrasse, Geb.B, Aufg. 12/13; 3.OG', 'description\r')


In [39]:
statement = """
SELECT *
FROM nodes_tags
WHERE key = "addr"
LIMIT 10;
"""

for n in c.execute(statement).fetchall():
    print(n)

(59954787, 'addr', 'Müllerhaus, Alt-Marzahn 63, 12685 Berlin - Jürgen Wolf', 'contact\r')
(84644791, 'addr', 'Geoportal Berlin / Hauskoordinaten', 'source\r')
(105184468, 'addr', 'Geoportal Berlin / Hauskoordinaten', 'source\r')
(304565290, 'addr', 'Geoportal Berlin / Hauskoordinaten', 'source\r')
(310462222, 'addr', 'Geoportal Berlin / Hauskoordinaten', 'source\r')
(518840681, 'addr', 'Geoportal Berlin / Hauskoordinaten', 'source\r')
(522469205, 'addr', 'Geoportal Berlin / Hauskoordinaten', 'source\r')
(522472057, 'addr', 'Geoportal Berlin / Hauskoordinaten', 'source\r')
(523257110, 'addr', 'Geoportal Berlin / Hauskoordinaten', 'source\r')
(541075880, 'addr', 'Geoportal Berlin / Hauskoordinaten', 'source\r')


'addr' seems to be the predominantly used key. An inspection of the 3 'address' rows did not reveal anything special. Therefore the 3 rows should be changed to 'addr'.

### nodes_tags table: fixme and todo keys

In [41]:
statement = """
SELECT key, Count(*)
FROM nodes_tags
WHERE key = "fixme" OR key = "FIXME" OR key = "todo" OR key = "TODO"
GROUP BY key
LIMIT 10;
"""
for n in c.execute(statement).fetchall():
    print(n)

('FIXME', 194)
('TODO', 1)
('fixme', 968)


In [42]:
statement = """
SELECT *
FROM nodes_tags
WHERE key = "fixme" OR key = "FIXME" OR key = "todo" OR key = "TODO"
LIMIT 3;
"""
for n in c.execute(statement).fetchall():
    print(n)

(484061, 'FIXME', 'according to my observation on 2016-07-29, there is no device anymore. Confirm? -- it is temporary', 'regular\r')
(27411605, 'fixme', 'Zs-Anzeiger?', 'regular\r')
(29123772, 'fixme', 'tracks not accurate', 'regular\r')


In [43]:
statement = """
SELECT *
FROM nodes_tags, nodes
WHERE nodes_tags.id = nodes.id AND nodes_tags.id = 484061
LIMIT 3;
"""
for n in c.execute(statement).fetchall():
    print(n)

(484061, 'FIXME', 'according to my observation on 2016-07-29, there is no device anymore. Confirm? -- it is temporary', 'regular\r', 484061, 52.290366, 12.9181252, 'gadacz', 121453, 13, 42375942, '2016-09-23T14:45:47Z\r')
(484061, 'highway', 'speed_camera', 'regular\r', 484061, 52.290366, 12.9181252, 'gadacz', 121453, 13, 42375942, '2016-09-23T14:45:47Z\r')
(484061, 'maxspeed', '100', 'regular\r', 484061, 52.290366, 12.9181252, 'gadacz', 121453, 13, 42375942, '2016-09-23T14:45:47Z\r')


An inspection revealed no simple way to resolve the 'fixme' entries programmatically. I looked at the first row, and I don't know whether the speed camera at the A9/A10 highway is temporary or not.

'FIXME', 'TODO' and 'fixme' could be merged into the single key 'fixme'.
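
The key renames proposed in this and the preceding subsection could be applied with a few UPDATE statements. The following is a minimal sketch, assuming the nodes_tags columns seen in the outputs above (`id`, `key`, `value`, `type`); the mapping `KEY_RENAMES` and the helper `rename_keys` are hypothetical names.

```python
import sqlite3

# old key -> new key, as suggested in the text
KEY_RENAMES = {
    "drink_water": "drinking_water",
    "address": "addr",
    "FIXME": "fixme",
    "TODO": "fixme",
}


def rename_keys(conn, table, renames):
    """Apply each old -> new key rename and return the number of rows changed.

    `table` is interpolated into the SQL text and must be a trusted name.
    """
    changed = 0
    for old, new in renames.items():
        cur = conn.execute(
            "UPDATE {} SET key = ? WHERE key = ?".format(table), (new, old))
        changed += cur.rowcount
    conn.commit()
    return changed
```

Running such an update on a copy of the database first makes it easy to compare the key counts before and after.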

### ways_tags table: maxspeed

Are a speed limit of 30 and a "Zone 30" the same thing?

The evaluation of the ways_tags table showed that the meaning of the key "maxspeed" is ambiguous. It sometimes refers to the speed limit imposed by law (e.g. 30, 50, 100) and sometimes to the technical maximum speed as commissioned (230). The two have different meanings and should not be mixed into one key. The type of the value also varied between plain numbers (30, 50, ...), numbers with a unit, and sometimes a long descriptive text.

The evaluation also showed that a speed limit of 30 km/h was predominantly expressed by the value '30'. There were six more kinds of values that apparently meant the same (DE:zone30, etc.).


In [19]:
# Find everything related to maxspeed 30:
statement = """
SELECT key, value, COUNT(key)
FROM ways_tags
WHERE key = "maxspeed"
AND value LIKE "%30%"
GROUP BY value
ORDER BY COUNT(key) DESC;
"""

print("key: maxspeed")
for n in c.execute(statement).fetchall():
    f = n[1][:15]
    print(f, " " * (20 - len(f) + (6 - len(str(n[2])))), n[2])

key: maxspeed
30                     31626
DE:zone30                830
DE:zone:30               824
DE:30                    201
130                      156
230                       91
Laut Anlage 13.           28
PL:zone30                 23
DE:zone(:30)               2
Zs3 "4" am Sign            1
zone30                     1
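
The variants in this list follow a small number of patterns, so they could be normalised with a regular expression. The sketch below is an assumption about how the values might be parsed, not an official OSM rule; `parse_maxspeed` is a hypothetical helper. It keeps a flag for the zone variants, since it is an open question above whether a zone and a plain limit mean the same.

```python
import re

# Optional country prefix ("DE:"), optional "zone", optional "(" and ":",
# then the digits, then an optional ")". The pattern is deliberately loose.
_MAXSPEED_RE = re.compile(r"(?:[A-Z]{2}:)?(zone)?\(?:?(\d+)\)?")


def parse_maxspeed(value):
    """Return (speed, is_zone) for recognised values, otherwise None."""
    match = _MAXSPEED_RE.fullmatch(value.strip())
    if match is None:
        return None
    return int(match.group(2)), match.group(1) is not None
```

Free-text values such as 'Laut Anlage 13.' are not matched and return `None`, so they can be reviewed separately instead of being silently converted.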


### ways_tags table: postcode

In [47]:
statement = """
SELECT value, COUNT(key)
FROM ways_tags
WHERE key = "postcode"
GROUP BY value
ORDER BY COUNT(key) DESC;
"""

for n in c.execute(statement).fetchall():
    if len(n[0]) != 5:
        print(n)

('66-470', 1632)
('74-500', 1486)
('74-505', 938)
('74-520', 860)
('74-510', 785)
('74-503', 671)
('74-405', 101)
('74-406', 78)
('69-113', 76)
('74-400', 69)
('66-629', 61)
('69-100', 12)
('74-404', 7)
('74-407', 4)
('74-311', 1)
('operator website, needs verification, might be 15366 Hoppegarten', 1)


In [52]:
statement = """
SELECT *
FROM ways_tags
WHERE key = "postcode"
AND value = '66-470'
LIMIT 1;
"""

for n in c.execute(statement).fetchall():
    print(n)

(88408900, 'postcode', '66-470', 'addr\r')


In [54]:
statement = """
SELECT *
FROM ways_tags, ways
WHERE ways_tags.id = ways.id
AND ways_tags.id = 88408900
"""

for n in c.execute(statement).fetchall():
    print(n)

(88408900, 'name', 'Intermarché', 'regular\r', 88408900, 'ziomek_', 3266826, '7', 37922512, '2016-03-18T15:53:07Z\r')
(88408900, 'shop', 'supermarket', 'regular\r', 88408900, 'ziomek_', 3266826, '7', 37922512, '2016-03-18T15:53:07Z\r')
(88408900, 'building', 'retail', 'regular\r', 88408900, 'ziomek_', 3266826, '7', 37922512, '2016-03-18T15:53:07Z\r')
(88408900, 'operator', 'Groupement des Mousquetaires', 'regular\r', 88408900, 'ziomek_', 3266826, '7', 37922512, '2016-03-18T15:53:07Z\r')
(88408900, 'city', 'Kostrzyn nad Odrą', 'addr\r', 88408900, 'ziomek_', 3266826, '7', 37922512, '2016-03-18T15:53:07Z\r')
(88408900, 'shape', 'flat', 'roof\r', 88408900, 'ziomek_', 3266826, '7', 37922512, '2016-03-18T15:53:07Z\r')
(88408900, 'wheelchair', 'limited', 'regular\r', 88408900, 'ziomek_', 3266826, '7', 37922512, '2016-03-18T15:53:07Z\r')
(88408900, 'street', 'Władysława Sikorskiego', 'addr\r', 88408900, 'ziomek_', 3266826, '7', 37922512, '2016-03-18T15:53:07Z\r')
(88408900, 'addr', 'mkostrzynn

My first guess was historical Berlin postcodes, but an evaluation showed that these are correct Polish postcodes for a small town just across the border from Germany.
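
Since the extract crosses the border, postcodes could be classified by format before they are audited. This is a minimal sketch, assuming German postcodes are exactly five digits and Polish postcodes have the form NN-NNN as in the output above; `classify_postcode` is a hypothetical helper.

```python
import re

GERMAN_POSTCODE = re.compile(r"\d{5}")
POLISH_POSTCODE = re.compile(r"\d{2}-\d{3}")


def classify_postcode(value):
    """Return 'DE', 'PL' or 'other' based on the postcode format alone."""
    value = value.strip()
    if GERMAN_POSTCODE.fullmatch(value):
        return "DE"
    if POLISH_POSTCODE.fullmatch(value):
        return "PL"
    return "other"
```

Only the 'other' group, such as the free-text entry mentioning Hoppegarten, would then need manual inspection.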

## Evaluating the contributors

### Number of unique contributors

In [11]:
print(c.execute("SELECT Count(*) FROM (SELECT uid FROM nodes UNION SELECT uid FROM ways) tmp;").fetchall()[0][0])

7903


### Top 15 contributors by count

The top 15 entries each exceed 100,000 contributions. The top contributor has over 2.378 million. Amounts like this can realistically only be achieved programmatically.

In [23]:
statement = """
SELECT user, COUNT(*) FROM nodes
  GROUP BY user
UNION ALL
SELECT user, COUNT(*) FROM ways
  GROUP BY user
ORDER BY COUNT(*) DESC
LIMIT 15;
"""

for n in c.execute(statement).fetchall():
    nr = "{:,}".format(n[1])
    print(n[0], " " * (20 - len(n[0])), " " * (9 - len(nr)), nr)

atpl_pilot             2,378,801
jacobbraeutigam          574,371
r-michael                337,015
streckenkundler          335,778
anbr                     329,417
atpl_pilot               312,716
WegefanHB                281,135
Bot45715                 242,853
Konrad Aust              166,110
toaster                  156,494
Elwood                   151,421
g0ldfish                 145,945
geozeisig                120,498
Polarbear                116,260
Randbewohner             102,982
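
In the list above atpl_pilot appears twice, because the query groups nodes and ways separately and then concatenates the two results. To get one total per user, the rows can be combined first and grouped afterwards. This is a sketch assuming both tables have a `user` column as in the query above; `top_contributors` is a hypothetical helper.

```python
import sqlite3

COMBINED_TOP_CONTRIBUTORS = """
SELECT user, COUNT(*) AS contributions
FROM (
    SELECT user FROM nodes
    UNION ALL
    SELECT user FROM ways
) AS combined
GROUP BY user
ORDER BY contributions DESC
LIMIT ?;
"""


def top_contributors(conn, limit=15):
    """Return (user, total nodes + ways) pairs, largest total first."""
    return conn.execute(COMBINED_TOP_CONTRIBUTORS, (limit,)).fetchall()
```

The totals from this query will differ from the per-table counts printed above, since each user's node and way counts are added together.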


### Duration of contribution for top 15

In [74]:
statement = """
SELECT user, timestamp 
FROM nodes
"""

users_w_time = [[n[0], time_from_timestamp(n[1])] for n in c.execute(statement).fetchall()] 

In [79]:
users = defaultdict(list)

for name, time in users_w_time:
    users[name] += [time]



In [80]:
users

defaultdict(list,
            {'': [datetime.datetime(2007, 10, 1, 0, 0),
              datetime.datetime(2007, 10, 1, 0, 0),
              datetime.datetime(2009, 3, 29, 0, 0),
              datetime.datetime(2009, 3, 29, 0, 0),
              datetime.datetime(2008, 10, 13, 0, 0),
              datetime.datetime(2008, 11, 2, 0, 0),
              datetime.datetime(2008, 11, 2, 0, 0),
              datetime.datetime(2008, 11, 2, 0, 0),
              datetime.datetime(2008, 11, 2, 0, 0)],
             'StefanNoebel': [datetime.datetime(2011, 9, 3, 0, 0),
              datetime.datetime(2011, 9, 3, 0, 0),
              datetime.datetime(2011, 9, 4, 0, 0),
              datetime.datetime(2011, 9, 4, 0, 0),
              datetime.datetime(2011, 9, 4, 0, 0),
              datetime.datetime(2011, 9, 4, 0, 0),
              datetime.datetime(2011, 9, 4, 0, 0),
              datetime.datetime(2011, 9, 3, 0, 0),
              datetime.datetime(2011, 9, 3, 0, 0),
              datetime.datetime(2
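
The per-user date lists can be reduced to an activity span, i.e. the time between a user's first and last node edit in this extract. The following is a sketch that works on the `users_w_time` list of `[name, datetime]` pairs built above; the helper names `activity_spans` and `longest_active` are illustrative.

```python
import datetime


def activity_spans(users_w_time):
    """Map each user to (first_edit, last_edit, span_in_days)."""
    first_last = {}
    for name, when in users_w_time:
        if name in first_last:
            first, last = first_last[name]
            first_last[name] = (min(first, when), max(last, when))
        else:
            first_last[name] = (when, when)
    return {
        name: (first, last, (last - first).days)
        for name, (first, last) in first_last.items()
    }


def longest_active(users_w_time, top=15):
    """Return the `top` users with the longest span, longest first."""
    spans = activity_spans(users_w_time)
    return sorted(spans.items(), key=lambda item: item[1][2], reverse=True)[:top]
```

Tracking only the first and last date per user avoids keeping every timestamp in memory, which matters with more than ten million node rows.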

### Top 15 longest active contributors

## Additional Evaluations

### Amenities

### Cluster of Italian places

## Ideas for Improvement

The data quality for Berlin is generally high. Common standards are partly missing for values such as maxspeed on ways. It should be possible to address this programmatically.

## Conclusion

Berlin is big, and cleaning up all the "fixme" tags and other loose ends is a lifetime job. Considering its size, the data is generally quite good.