# 📝 Notebook Documentation

**Notebook:** `11_attraktionen__prepare_via_osmium_filter_pbf_v02_germany.ipynb`  
**Description:**  
This notebook processes a complete `.pbf` file (raw OpenStreetMap data for Germany) in three steps:  
1. **Filter relevant POIs with `osmium`**,  
2. **Convert to GeoPackage with `ogr2ogr`**,  
3. **Merge all layers into a single Parquet file**.

The result is a ready-to-use geodataset for further processing in the accessibility analysis.

---

### ⚠️ Prerequisites

- **[Osmium Tool](https://osmcode.org/osmium-tool/)**  
  Used to quickly filter the relevant OSM tags.
- **[ogr2ogr (GDAL)](https://gdal.org/download.html)**  
  Converts `.pbf` to `.gpkg`. Alternatively, the `ogr2ogr` bundled with a QGIS installation can be used.
- Environment variables (in particular `PATH`) must be set correctly so that the tools are available globally.
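
A quick way to confirm that both tools are reachable before running the notebook is to look them up on `PATH`. The sketch below uses only the standard library; the helper name `find_tools` is hypothetical and not part of the notebook.

```python
import shutil

def find_tools(names=("osmium", "ogr2ogr")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

# Report the lookup result for every required tool
for tool, path in find_tools().items():
    print(f"{tool}: {path if path else 'NOT FOUND (check PATH)'}")
```

If a tool is reported as not found, an absolute path (as used for the QGIS binaries further down) is a workable alternative.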

---

### 📥 Input

- **Raw data:**  
  `germany-latest.osm-25-03-14.pbf`  
  Source: [Geofabrik download service](https://download.geofabrik.de/europe/germany.html)

- **Configuration file:**  
  `osmconf.ini` for `ogr2ogr` (defines how OSM tags are mapped to fields)
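
To see which tags end up as columns, it can help to inspect the `attributes` entry of each layer section in `osmconf.ini`. The sketch below parses an illustrative excerpt with the standard-library `configparser`. The sample text is an assumption modelled on the general layout of GDAL's default file, not the project's actual configuration, and `layer_attributes` is a hypothetical helper. Because the file may contain global keys before the first section, a dummy `[global]` header is prepended.

```python
import configparser

# Illustrative excerpt only; a real osmconf.ini has more keys and layers.
SAMPLE_OSMCONF = """\
closed_ways_are_polygons=aeroway,amenity,building,landuse,leisure,shop
[points]
osm_id=yes
attributes=name,amenity,shop,leisure
other_tags=yes
[multipolygons]
osm_id=yes
osm_way_id=yes
attributes=name,amenity,landuse,leisure,shop
other_tags=yes
"""

def layer_attributes(ini_text):
    """Return {layer_name: [attribute, ...]} for every layer section."""
    parser = configparser.ConfigParser(interpolation=None)
    # Keys before the first section header would otherwise raise an error.
    parser.read_string("[global]\n" + ini_text)
    result = {}
    for section in parser.sections():
        if section == "global":
            continue
        raw = parser.get(section, "attributes", fallback="")
        result[section] = [a.strip() for a in raw.split(",") if a.strip()]
    return result

print(layer_attributes(SAMPLE_OSMCONF))
```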

---

### 🔧 Processing steps

1. **Filter with Osmium (`tags-filter`)**
   - Only the relevant POI categories are extracted:
     - Education (e.g. `school`, `university`)
     - Health (e.g. `hospital`, `doctors`)
     - Leisure & culture (e.g. `cinema`, `museum`, selected `leisure=*` values)
     - Administration & public authorities (e.g. `townhall`, `police`)
     - Retail & services (e.g. `shop=*`, `atm`, `hairdresser`)
     - Open space & nature (selected `landuse=*` values: `forest`, `meadow`, `farmland`, `cemetery`)
   - Result: a filtered `.pbf` file, e.g. `full_germany_osm-25-03-14.pbf`

2. **Convert with `ogr2ogr`**
   - Converts the `.pbf` into a **GeoPackage file** (`.gpkg`)
   - Uses `osmconf.ini` to control how tags are evaluated
   - Format: GeoPackage with separate layers for `points`, `lines`, `multipolygons`, etc.

3. **Merge the layers**
   - All layers of the `.gpkg` are loaded into one combined GeoDataFrame
   - The layer name is stored in a column (`layer`)
   - Export as a single `.parquet` file for fast access in Pandas / GeoPandas
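
The category-to-tag mapping of step 1 can also be kept as data and turned into `osmium tags-filter` expressions programmatically. The following is a minimal sketch, not the notebook's implementation: the dictionary `CATEGORIES` is a shortened, hypothetical subset, and `build_filter_expressions` and `build_filter_command` are illustrative helpers. An empty value list stands for "any value of this key", as with `nwr/shop`.

```python
CATEGORIES = {
    "education": {"amenity": ["school", "university"]},
    "health": {"amenity": ["hospital", "doctors"]},
    "retail": {"shop": []},  # empty list: match any value of the key
    "nature": {"landuse": ["forest", "meadow"]},
}

def build_filter_expressions(categories, object_types="nwr"):
    """Build one osmium filter expression per (category, key) pair."""
    expressions = []
    for tags in categories.values():
        for key, values in tags.items():
            if values:
                expressions.append(f"{object_types}/{key}={','.join(values)}")
            else:
                expressions.append(f"{object_types}/{key}")
    return expressions

def build_filter_command(input_pbf, output_pbf, categories):
    """Assemble the argument list for subprocess.run()."""
    return ["osmium", "tags-filter", input_pbf,
            *build_filter_expressions(categories), "-o", output_pbf]

print(build_filter_command("germany-latest.osm-25-03-14.pbf",
                           "osm_files/full_germany_osm-25-03-14.pbf",
                           CATEGORIES))
```

Keeping the categories in a dictionary makes it easier to add or remove tags without editing a long argument list by hand.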

---

### 📤 Output

- `osm_files/full_germany_osm-25-03-14.pbf`  
  (filtered PBF file with the relevant objects)

- `osm_files/full_germany_osm-25-03-14.gpkg`  
  (structured OSM data as GeoPackage, multiple layers)

- `osm_files/full_germany_osm-25-03-14.parquet`  
  (combined geodataset for efficient further processing)
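
Since all three outputs share one file stem, their paths can be derived from a single name and checked after a run. A small sketch using `pathlib`; the helpers `output_paths` and `missing_outputs` are hypothetical names, not part of the notebook.

```python
from pathlib import Path

def output_paths(stem="full_germany_osm-25-03-14", base_dir="osm_files"):
    """Return the expected output path for each file type."""
    base = Path(base_dir)
    # The extension is appended to the stem directly, so dots inside a
    # stem are never mistaken for an existing suffix.
    return {ext: base / f"{stem}.{ext}" for ext in ("pbf", "gpkg", "parquet")}

def missing_outputs(paths):
    """List the expected output files that do not exist yet."""
    return [p for p in paths.values() if not p.exists()]

for path in missing_outputs(output_paths()):
    print("missing:", path)
```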

---

### 🧾 Notes

- The `osmium` filter expression is fully customizable (e.g. extended with further tags)
- The Parquet file can be used directly in Jupyter or in scoring pipelines
- For large country files, the entire process takes only about **1–2 minutes** on an SSD
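
Runtime depends on hardware and on the size of the extract, so it may be worth measuring each step rather than relying on the figure above. A minimal sketch with a context manager; the name `timed` is an assumption, and the summation is only a placeholder for a real step such as the `osmium` call.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results=None):
    """Print the wall-clock time of the enclosed block and optionally store it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.1f} s")

timings = {}
with timed("demo step", timings):
    sum(range(100_000))  # placeholder for e.g. subprocess.run(filter_command, ...)
```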

---


In [1]:
import geopandas as gpd
import pandas as pd


import subprocess

### First Step: Filter Data from OSM PBF based on the given tags

In [None]:
# Takes about 1.5 minutes for Germany.

# Requires the osmium tool: https://osmcode.org/osmium-tool/


# Set file paths inside the "osm_files" folder
base_dir = "osm_files/"

input_pbf = "germany-latest.osm-25-03-14.pbf"  # downloaded from https://download.geofabrik.de/europe/germany.html
extracted_pbf = base_dir + "full_germany_osm-25-03-14.pbf"


filter_command = [
    "osmium", "tags-filter", input_pbf,
    
    # Education (schools, universities, further education)
    "nwr/amenity=school,kindergarten,library,college,university,driving_school,language_school,training",
    
    # Health care (doctors, hospitals, pharmacies, nursing homes)
    "nwr/amenity=doctors,dentist,clinic,hospital,pharmacy,nursing_home",
    "nwr/healthcare=alternative,counselling,dialysis,hospice,occupational_therapist,physiotherapist,psychotherapist,rehabilitation",
    
    # Culture & leisure (cinemas, theatres, museums, restaurants, bars, cafés, places of worship)
    # NOTE: OSM usually tags museums as tourism=museum, so amenity=museum will match few objects.
    "nwr/amenity=cinema,theatre,museum,arts_centre,place_of_worship,restaurant,bar,cafe",
    "nwr/leisure=fitness_centre,swimming_pool,sauna,sports_centre,park,playground,pitch,garden",
    
    # Administration & public authorities (town hall, courts, fire brigade, police, post office, social facilities)
    "nwr/amenity=townhall,courthouse,fire_station,police,post_office,social_facility",
    "nwr/office=government,employment_agency",
    
    # Retail & services (banks, ATMs, hairdressers, opticians, grocery stores, fuel stations)
    "nwr/amenity=bank,atm,hairdresser,optician,fuel,ice_cream,fast_food,food,frozen_food,greengrocer,parcel_locker",
    "nwr/shop",  # all shops
    
    # Nature & open space (forests, meadows, farmland, cemeteries)
    "nwr/landuse=forest,meadow,farmland,cemetery",
    
    "-o", extracted_pbf
]

print("🔹 Running: ", " ".join(filter_command))

try:
    result = subprocess.run(filter_command, capture_output=True, text=True, check=True)
    print(result.stdout)  # osmium output on success
except subprocess.CalledProcessError as e:
    print("❌ osmium error:", e.stderr)  # error output from osmium
    raise  # do not continue with a missing or incomplete extract

🔹 Running:  osmium tags-filter germany-latest.osm-25-03-14.pbf nwr/amenity=school,kindergarten,library,college,university,driving_school,language_school,training nwr/amenity=doctors,dentist,clinic,hospital,pharmacy,nursing_home nwr/healthcare=alternative,counselling,dialysis,hospice,occupational_therapist,physiotherapist,psychotherapist,rehabilitation nwr/amenity=cinema,theatre,museum,arts_centre,place_of_worship,restaurant,bar,cafe nwr/leisure=fitness_centre,swimming_pool,sauna,sports_centre,park,playground,pitch,garden nwr/amenity=townhall,courthouse,fire_station,police,post_office,social_facility nwr/office=government,employment_agency nwr/amenity=bank,atm,hairdresser,optician,fuel,ice_cream,fast_food,food,frozen_food,greengrocer,parcel_locker nwr/shop nwr/landuse=forest,meadow,farmland,cemetery -o osm_files/full_germany_osm-25-03-14.pbf
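
The same run-and-report pattern is repeated for `osmium`, `ogrinfo`, and `ogr2ogr` in this notebook, so it could be factored into one helper. This is a sketch, not part of the original notebook, and `run_tool` is a hypothetical name. It returns the captured standard output and re-raises on failure after printing the tool's error output.

```python
import subprocess
import sys

def run_tool(cmd):
    """Run an external command, return its stdout, and re-raise on failure."""
    print("Running:", " ".join(cmd))
    try:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as e:
        print(f"ERROR: {cmd[0]} failed:")
        print(e.stderr)
        raise
    return result.stdout

# Demonstration with the Python interpreter standing in for an external tool
print(run_tool([sys.executable, "-c", "print('ok')"]))
```

With such a helper, the filter step would reduce to `run_tool(filter_command)`.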



In [8]:
import subprocess

def ogrinfo_pbf(input_pbf):
    ogrinfo_path = r"C:\Program Files\QGIS 3.40.5\bin\ogrinfo.exe"
    
    cmd = [
        ogrinfo_path, 
        input_pbf
    ]

    print("Running:", " ".join(cmd))
    try:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        print("OGRINFO Output:")
        print(result.stdout)  # This will display the information about the layers
    except subprocess.CalledProcessError as e:
        print("❌ ogrinfo failed:")
        print(e.stderr)
        raise

# Example usage
input_pbf = r"C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\Wildau_attractions_new.pbf"
ogrinfo_pbf(input_pbf)

Running: C:\Program Files\QGIS 3.40.5\bin\ogrinfo.exe C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\Wildau_attractions_new.pbf
OGRINFO Output:
INFO: Open of `C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\Wildau_attractions_new.pbf'
      using driver `OSM' successful.
1: points (Point)
2: lines (Line String)
3: multilinestrings (Multi Line String)
4: multipolygons (Multi Polygon)
5: other_relations (Geometry Collection)
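
If the layer list is needed programmatically, the summary lines printed by `ogrinfo` can be parsed. The sketch below applies a regular expression to the output shown above, with the file path shortened for readability. It assumes the `N: name (Geometry Type)` line format seen there, which may differ between GDAL versions; `parse_layers` is a hypothetical helper.

```python
import re

OGRINFO_OUTPUT = """\
INFO: Open of `Wildau_attractions_new.pbf'
      using driver `OSM' successful.
1: points (Point)
2: lines (Line String)
3: multilinestrings (Multi Line String)
4: multipolygons (Multi Polygon)
5: other_relations (Geometry Collection)
"""

# One summary line per layer: index, layer name, geometry type in parentheses
LAYER_LINE = re.compile(r"^\d+: (?P<name>\S+) \((?P<geom>[^)]+)\)$")

def parse_layers(text):
    """Return {layer_name: geometry_type} from ogrinfo summary output."""
    layers = {}
    for line in text.splitlines():
        match = LAYER_LINE.match(line.strip())
        if match:
            layers[match["name"]] = match["geom"]
    return layers

print(parse_layers(OGRINFO_OUTPUT))
```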



## Transform the filtered Germany PBF to GeoPackage

`ogr2ogr` is required. Either install it via GDAL (https://gdal.org/download.html) or use the version bundled with the QGIS installation on your machine.  
Make sure the environment variable points to the GDAL version you want to use.


In [33]:
import subprocess

def ogr2ogr_to_geopackage(input_pbf, output_gpkg, osmconf_path=None):
    ogr2ogr_path = r"C:\Program Files\QGIS 3.40.5\bin\ogr2ogr.exe"

    cmd = [ogr2ogr_path, "-overwrite"]  # Overwrite the output file if it exists
    if osmconf_path is not None:
        # Point the OSM driver at a custom osmconf.ini (tag-to-field mapping);
        # without it, GDAL falls back to its default osmconf.ini
        cmd += ["--config", "OSM_CONFIG_FILE", osmconf_path]
    cmd += [
        "-f", "GPKG",  # Output format: GeoPackage
        output_gpkg,   # Output GeoPackage file
        input_pbf,     # Input .pbf file
    ]
    print("Running:", " ".join(cmd))  # Print the full command for debugging

    try:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        print("GeoPackage created successfully:")
        print(result.stdout)
    except subprocess.CalledProcessError as e:
        print("❌ ogr2ogr failed during conversion to GeoPackage:")
        print(e.stderr)
        raise

# Example usage
input_pbf = r"C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\full_germany_osm-25-03-14.pbf"
output_gpkg = r"C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\full_germany_osm-25-03-14.gpkg"
osmconf_path = r"C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\osmconf.ini"

ogr2ogr_to_geopackage(input_pbf, output_gpkg, osmconf_path)


Running: C:\Program Files\QGIS 3.40.5\bin\ogr2ogr.exe -overwrite --config OSM_CONFIG_FILE C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\osmconf.ini -f GPKG C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\full_germany_osm-25-03-14.gpkg C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\full_germany_osm-25-03-14.pbf
GeoPackage created successfully:
0...10...20...30...40...50...60...70...80...90...100 - done.



Now it's time to convert the GeoPackage into GeoParquet.

In [41]:
import fiona

def list_gpkg_layers(gpkg_file):
    # List all layers in the GeoPackage
    layers = fiona.listlayers(gpkg_file)
    print("Layers in GeoPackage:")
    for layer in layers:
        print(layer)

# Example usage
gpkg_file = r"C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\full_germany_osm-25-03-14.gpkg"
list_gpkg_layers(gpkg_file)

Layers in GeoPackage:
points
lines
multilinestrings
multipolygons
other_relations


In [None]:
import geopandas as gpd
import fiona
import pandas as pd

def save_all_layers_to_parquet(gpkg_file, output_parquet):
    # List all layers in the GeoPackage
    layers = fiona.listlayers(gpkg_file)
    
    # List to store individual GeoDataFrames
    all_layers = []

    # Loop through all layers and read each one
    for layer_name in layers:
        print(f"Reading layer: {layer_name}")
        
        # Read the layer into a GeoDataFrame
        gdf = gpd.read_file(gpkg_file, layer=layer_name)
        
        # Add the layer name as a column to identify the layer
        gdf['layer'] = layer_name
        
        # Append the GeoDataFrame to the list
        all_layers.append(gdf)

    # Concatenate all GeoDataFrames into a single one
    combined_gdf = pd.concat(all_layers, ignore_index=True)

    # Convert to GeoDataFrame (necessary to keep the geometry column)
    combined_gdf = gpd.GeoDataFrame(combined_gdf, geometry='geometry')

    # Save to Parquet
    combined_gdf.to_parquet(output_parquet)
    print(f"GeoDataFrame saved as Parquet: {output_parquet}")
    return combined_gdf

# Example usage
gpkg_file = r"C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\full_germany_osm-25-03-14.gpkg"
output_parquet = r"C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\full_germany_osm-25-03-14.parquet"

combined_gdf = save_all_layers_to_parquet(gpkg_file, output_parquet)


Reading layer: points
Reading layer: lines
Reading layer: multilinestrings
Reading layer: multipolygons
Reading layer: other_relations
GeoDataFrame saved as Parquet: C:\Users\simon\Nextcloud3\Analysen\erreichbarad\attractions_w_osmium\osm_files\full_germany_osm-25-03-14.parquet
