## Spatial join of the OSM highway dataset without `service` and the SimRa dataset

In this notebook, the two datasets `osm_highway_without_service` and `simra_within_berlin` are joined spatially.

- For further explanation of the approach, see the notebook [osm_highway_plus_simra](osm_highway_plus_simra)

In [2]:
import pandas as pd

import matplotlib.pyplot as plt
import geopandas as gpd

In [5]:
simra_data = gpd.read_file("../../data/processed_data/simra_within_berlin.geojson")

In [31]:
osm_highway = gpd.read_file("../../data/processed_data/osm_highway_without_service.geojson")

### Comparing the coordinate reference systems

In [32]:
simra_data.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [33]:
osm_highway.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
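Both datasets report EPSG:4326, so no reprojection is needed here. If the two CRS ever differed, a guard along the following lines could be used before joining. This is a minimal sketch: the helper `align_crs` and the toy points are hypothetical and not part of the notebook.

```python
import geopandas as gpd
from shapely.geometry import Point


def align_crs(left, right):
    """Return `right` reprojected to the CRS of `left` if the two differ (hypothetical helper)."""
    if left.crs is None or right.crs is None:
        raise ValueError("Both GeoDataFrames need a defined CRS")
    if left.crs != right.crs:
        right = right.to_crs(left.crs)
    return right


# Toy data: the same point in WGS 84 and in ETRS89 / UTM zone 33N
a = gpd.GeoDataFrame(geometry=[Point(13.4, 52.5)], crs="EPSG:4326")
b = a.to_crs("EPSG:25833")

b_aligned = align_crs(a, b)
print(b_aligned.crs)
```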

## Spatial join

To combine the relevant information from both datasets, we perform a spatial join with the GeoPandas function `gpd.sjoin`. It links rows of two GeoDataFrames based on the spatial relationship between their geometries.

In [34]:
# Spatial join between the SimRa polygon data and the OSM highway data
intersections = gpd.sjoin(simra_data, osm_highway, how='left', predicate='intersects')

In [35]:
print(intersections.columns)

Index(['id', 'type', 'score', 'incidents', 'rides', 'markers', 'geometry',
       'index_right', 'highway'],
      dtype='object')


In [36]:
intersections.head()

Unnamed: 0,id,type,score,incidents,rides,markers,geometry,index_right,highway
0,[79310].0,Street,0.0,0,57,[ ],"POLYGON ((13.37410 52.53031, 13.37421 52.53020...",23564.0,residential
0,[79310].0,Street,0.0,0,57,[ ],"POLYGON ((13.37410 52.53031, 13.37421 52.53020...",68267.0,residential
0,[79310].0,Street,0.0,0,57,[ ],"POLYGON ((13.37410 52.53031, 13.37421 52.53020...",162.0,residential
1,"[196724641, 196725586, 866264912].0",Junction,0.000649,1,1541,"[ [ [ 13.417860660000001, 52.514469009999999 ]...","POLYGON ((13.41751 52.51461, 13.41779 52.51442...",11276.0,secondary
1,"[196724641, 196725586, 866264912].0",Junction,0.000649,1,1541,"[ [ [ 13.417860660000001, 52.514469009999999 ]...","POLYGON ((13.41751 52.51461, 13.41779 52.51442...",11278.0,secondary


### Multiple highway types are assigned to one polygon/segment

For further explanation, see the notebook [osm_highway_plus_simra](osm_highway_plus_simra)
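To illustrate why one polygon can appear in several rows of the join result, and where missing `highway` values come from, here is a minimal sketch on toy geometries. The polygons, lines, and ids are made up for illustration; only `gpd.sjoin` with `how='left'` and `predicate='intersects'` mirrors the call used above.

```python
import geopandas as gpd
from shapely.geometry import LineString, Polygon

# Two toy polygons: "a" is crossed by two lines, "b" by none
polygons = gpd.GeoDataFrame(
    {"id": ["a", "b"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(10, 10), (12, 10), (12, 12), (10, 12)]),
    ],
)
lines = gpd.GeoDataFrame(
    {"highway": ["residential", "cycleway"]},
    geometry=[
        LineString([(-1, 1), (3, 1)]),
        LineString([(1, -1), (1, 3)]),
    ],
)

# Left join: every polygon is kept, once per intersecting line
joined = gpd.sjoin(polygons, lines, how="left", predicate="intersects")
print(joined[["id", "index_right", "highway"]])
```

Polygon `a` yields one row per intersecting line, while polygon `b` is kept with missing values in `index_right` and `highway`.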

### Checking the NaN values and their geometries

In [37]:
# Rows that contain NaN in 'highway' (and therefore in 'index_right')
nan_values = intersections[intersections['highway'].isna()]

print(f"Number of rows with NaN in 'highway': {len(nan_values)}")

Number of rows with NaN in 'highway': 572


### Removing the NaN values

In [38]:
cleaned_intersections = intersections.dropna(subset=['highway'])

print(f"Number of remaining rows after removing the NaN values: {len(cleaned_intersections)}")

Number of remaining rows after removing the NaN values: 72095


In [39]:
# Make sure that no NaN values remain in the 'index_right' column
nan_values_after_cleaning = cleaned_intersections[cleaned_intersections['index_right'].isna()]

print(f"Number of NaN values after cleaning: {len(nan_values_after_cleaning)}")

Number of NaN values after cleaning: 0


In [44]:
cleaned_intersections.head(3)

Unnamed: 0,id,type,score,incidents,rides,markers,geometry,index_right,highway
0,[79310].0,Street,0.0,0,57,[ ],"POLYGON ((13.37410 52.53031, 13.37421 52.53020...",23564.0,residential
0,[79310].0,Street,0.0,0,57,[ ],"POLYGON ((13.37410 52.53031, 13.37421 52.53020...",68267.0,residential
0,[79310].0,Street,0.0,0,57,[ ],"POLYGON ((13.37410 52.53031, 13.37421 52.53020...",162.0,residential


### Grouping the polygons and combining the `highway` values

In [45]:
cleaned_intersections = gpd.GeoDataFrame(cleaned_intersections, geometry='geometry')

In [46]:
# Function that combines the 'highway' values of one group
def combine_highways(x):
    return ', '.join(x)  # Duplicate entries are kept and joined

# Group by 'id' and 'geometry' and aggregate
grouped_data = cleaned_intersections.groupby(['id', 'geometry']).agg({
    'type': 'first',       # First value (all values within a group are identical)
    'score': 'first',
    'incidents': 'first',
    'rides': 'first',
    'markers': 'first',
    'index_right': 'first',
    'highway': combine_highways  # Combine the 'highway' values
}).reset_index()

In [47]:
# Convert back into a GeoDataFrame
grouped_data = gpd.GeoDataFrame(grouped_data, geometry='geometry', crs=cleaned_intersections.crs)
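The aggregation pattern above can be tried on a small table. The following sketch uses plain pandas and invented toy rows (the column values and ids are assumptions for illustration); it groups by `id` only, takes the first value of a constant column, and joins the `highway` strings while keeping duplicates.

```python
import pandas as pd

# Toy stand-in for the join result: id "a" matched three highway segments
rows = pd.DataFrame({
    "id": ["a", "a", "a", "b"],
    "rides": [57, 57, 57, 200],
    "highway": ["residential", "cycleway", "residential", "secondary"],
})

grouped = rows.groupby("id").agg({
    "rides": "first",        # constant within a group, so the first value suffices
    "highway": ", ".join,    # duplicates are kept on purpose
}).reset_index()

print(grouped)
```

Keeping the duplicates matters for the later step, because the share of each highway type is derived from how often it occurs per polygon.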

In [61]:
grouped_data.head(3)

Unnamed: 0,id,geometry,type,score,incidents,rides,markers,index_right,highway,highway_list
0,[100049].0,"POLYGON ((13.45412 52.54035, 13.45320 52.53977...",Street,0.0,0,138,[ ],35281.0,"secondary, secondary, cycleway, secondary","[secondary, secondary, cycleway, secondary]"
1,[100069498].0,"POLYGON ((13.52273 52.50704, 13.52248 52.50690...",Junction,0.0,0,200,[ ],44754.0,"residential, residential, residential","[residential, residential, residential]"
2,"[100078509, 288268004, 3888645535].0","POLYGON ((13.47754 52.51457, 13.47782 52.51438...",Junction,0.0,0,54,[ ],41983.0,"secondary, secondary, cycleway, secondary, sec...","[secondary, secondary, cycleway, secondary, se..."


In [49]:
grouped_data.shape

(15722, 9)

### Creating dummy variables for the `highway` column

In [50]:
# Split the 'highway' strings into lists
grouped_data['highway_list'] = grouped_data['highway'].apply(lambda x: x.split(', '))

In [52]:
grouped_data.head(2)

Unnamed: 0,id,geometry,type,score,incidents,rides,markers,index_right,highway,highway_list
0,[100049].0,"POLYGON ((13.45412 52.54035, 13.45320 52.53977...",Street,0.0,0,138,[ ],35281.0,"secondary, secondary, cycleway, secondary","[secondary, secondary, cycleway, secondary]"
1,[100069498].0,"POLYGON ((13.52273 52.50704, 13.52248 52.50690...",Junction,0.0,0,200,[ ],44754.0,"residential, residential, residential","[residential, residential, residential]"


In [53]:
# Find all unique `highway` types
unique_highways = sorted(set(sum(grouped_data['highway_list'].tolist(), [])))

print(unique_highways)

['cycleway', 'footway', 'highway_rare', 'living_street', 'path', 'primary', 'residential', 'secondary', 'tertiary', 'track']
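Flattening with `sum(..., [])` builds a new list at every step, which can become slow for many rows. One possible alternative, sketched here on toy data (the values are illustrative only), is `Series.explode` followed by `unique`.

```python
import pandas as pd

# Toy stand-in for grouped_data['highway_list']
highway_lists = pd.Series([
    ["secondary", "secondary", "cycleway"],
    ["residential", "residential"],
])

# explode() yields one row per list element; unique() drops the repeats
unique_highways = sorted(highway_lists.explode().unique())
print(unique_highways)
```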


### Computing and assigning the share of each highway type per polygon

In [54]:
# Compute the share of each `highway` type for one row
def calculate_highway_ratios(row, highway_types):
    total_count = len(row['highway_list'])
    counts = pd.Series(row['highway_list']).value_counts()
    return {highway: counts.get(highway, 0) / total_count for highway in highway_types}

In [55]:
# Apply the function to the GeoDataFrame
highway_ratios = grouped_data.apply(calculate_highway_ratios, axis=1, highway_types=unique_highways) # axis=1: the function is applied to each row
ratios_df = pd.DataFrame(list(highway_ratios))

In [56]:
# Merge the results with the original GeoDataFrame
gdf = pd.concat([grouped_data, ratios_df], axis=1)

In [57]:
# Remove the temporary column 'highway_list'
gdf.drop(columns=['highway_list'], inplace=True)

In [58]:
gdf.head(3)

Unnamed: 0,id,geometry,type,score,incidents,rides,markers,index_right,highway,cycleway,footway,highway_rare,living_street,path,primary,residential,secondary,tertiary,track
0,[100049].0,"POLYGON ((13.45412 52.54035, 13.45320 52.53977...",Street,0.0,0,138,[ ],35281.0,"secondary, secondary, cycleway, secondary",0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.0,0.0
1,[100069498].0,"POLYGON ((13.52273 52.50704, 13.52248 52.50690...",Junction,0.0,0,200,[ ],44754.0,"residential, residential, residential",0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
2,"[100078509, 288268004, 3888645535].0","POLYGON ((13.47754 52.51457, 13.47782 52.51438...",Junction,0.0,0,54,[ ],41983.0,"secondary, secondary, cycleway, secondary, sec...",0.384615,0.0,0.0,0.0,0.0,0.0,0.153846,0.461538,0.0,0.0


In [59]:
gdf.shape

(15722, 19)
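As a cross-check on the row-wise computation, the same shares can be sketched in a vectorised way with `Series.explode` and `pd.crosstab(..., normalize="index")`. The toy data and the names `toy`, `long_form`, and `shares` are assumptions made for illustration; this is not the method used in the notebook.

```python
import pandas as pd

# Toy stand-in for grouped_data['highway_list'] (illustrative values only)
toy = pd.Series(
    [
        ["secondary", "secondary", "cycleway", "secondary"],
        ["residential", "residential", "residential"],
    ],
    name="highway_list",
)

# One row per (polygon, highway) pair; 'row' keeps the original row position
long_form = toy.explode().rename_axis("row").reset_index()

# Count the highway types per row and divide by the row total
shares = pd.crosstab(long_form["row"], long_form["highway_list"], normalize="index")
print(shares)
```

Unlike `calculate_highway_ratios`, this variant only creates columns for types that occur at least once in the data.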

In [62]:
# Save the gdf
output_filename = "../../data/processed_data/osm_highway_without_service_ratios.geojson"
gdf.to_file(output_filename, driver='GeoJSON')

print("File saved.")

File saved.


### Remarks
Comparison with the osm_highway_plus_simra dataset (the dataset that includes `service`):
 - highway without `service`, joined with SimRa, has 15721 entries
 - highway with `service`, joined with SimRa, has 15917 entries