# 📝 Notebook Documentation

**Notebook:** `22_isochronen__`  
**Description:**  
For each combination of postal code (PLZ) and transport mode (`bike`, `my_bike_cycleways`, `cargo_bike`), an **isodonut file** is generated from the existing isochrones (`.parquet`).  
**Isodonuts** are punched-out isochrones, i.e. ring-shaped zones between two time buckets. They are more intuitive to display and provide a better basis for scoring calculations.

---

### 📥 Input

- Isochrone files per PLZ and mode  
  e.g. `/isochronen/{scenario_name}/86830_25-05-09_isochrones_6x5min__bike.parquet`
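The file name carries the PLZ, the creation date, the bucket layout and the mode. The following is a minimal sketch of how such a name could be split into its parts. The regular expression is an assumption inferred from the single example above, and `parse_isochrone_name` as well as the `demo` scenario folder are hypothetical names used only for illustration.

```python
import os
import re

# Assumed naming scheme, inferred from the example file name above:
# {plz}_{yy-mm-dd}_isochrones_{buckets}x{minutes}min__{mode}.parquet
ISO_NAME_RE = re.compile(
    r"(?P<plz>\d{5})_(?P<date>\d{2}-\d{2}-\d{2})_isochrones_"
    r"(?P<buckets>\d+)x(?P<minutes>\d+)min__(?P<mode>\w+)\.parquet"
)


def parse_isochrone_name(path):
    """Split an isochrone file name into its parts; return None if it does not match."""
    # Only the base name is inspected, so digits in directory names are ignored.
    match = ISO_NAME_RE.fullmatch(os.path.basename(path))
    if match is None:
        return None
    parts = match.groupdict()
    parts["buckets"] = int(parts["buckets"])
    parts["minutes"] = int(parts["minutes"])
    return parts


# "demo" is a hypothetical scenario folder.
info = parse_isochrone_name("isochronen/demo/86830_25-05-09_isochrones_6x5min__bike.parquet")
print(info)
```

Matching on the base name rather than the full path keeps a scenario folder that itself contains a five-digit number from being mistaken for a PLZ.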

---

### 🔧 Processing Steps

- Iterate over all valid postal codes (`valid_plz_difference`) and the defined modes (`bike`, `my_bike_cycleways`, `cargo_bike`)
- For each combination:
  - Load the corresponding isochrone file (`*.parquet`)
  - Sort the isochrones by `id` (raster cell) and `time_bucket` in descending order
  - Create the **isodonuts** by:
    - Taking the difference of adjacent isochrone geometries (e.g. 10-min zone minus 5-min zone)
    - Removing geometries that are empty after the difference
  - Simplify each group (topology-based) via `TopoJSON`:
    - Apply `toposimplify()` to the grouped geometries (per raster cell)
    - The result is a clean, compact geometry that is well suited for visualization
  - Save the resulting isodonut geometries as a `.parquet` file
- Logging of all steps and error handling via a dedicated log file per day
- Note: at some point, Shapely's coverage validation and simplification might be usable to simplify this process: https://shapely.readthedocs.io/en/stable/release/2.x.html#coverage-validation-and-simplification
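
The sort-and-subtract steps listed above can be sketched on toy data. This is a minimal illustration, not the notebook's implementation: concentric circles stand in for real isochrone polygons, plain dictionaries replace the GeoDataFrame, and `make_isodonuts` is a hypothetical helper. It also checks that the neighbouring row belongs to the same raster cell, which is an added assumption about the intended behaviour.

```python
from shapely.geometry import Point

# Hypothetical toy input: one raster cell ("id") with three nested isochrones.
# Circles stand in for real isochrone polygons; the radii are arbitrary.
center = Point(0, 0)
isochrones = [
    {"id": "cell_1", "time_bucket": 5, "geometry": center.buffer(1.0)},
    {"id": "cell_1", "time_bucket": 10, "geometry": center.buffer(2.0)},
    {"id": "cell_1", "time_bucket": 15, "geometry": center.buffer(3.0)},
]


def make_isodonuts(rows):
    """Subtract from each isochrone the next-smaller one of the same id."""
    # Largest bucket first, so the following row of the same id is the next-smaller zone.
    ordered = sorted(rows, key=lambda r: (r["id"], r["time_bucket"]), reverse=True)
    donuts = []
    for current, following in zip(ordered, ordered[1:] + [None]):
        geom = current["geometry"]
        # Only subtract when the neighbour belongs to the same raster cell.
        if following is not None and following["id"] == current["id"]:
            geom = geom.difference(following["geometry"])
        # Drop geometries that became empty through the subtraction.
        if not geom.is_empty:
            donuts.append({**current, "geometry": geom})
    return donuts


donuts = make_isodonuts(isochrones)
for d in donuts:
    print(d["time_bucket"], round(d["geometry"].area, 2))
```

The smallest zone of each raster cell has nothing to subtract and therefore stays a filled polygon, while every larger zone becomes a ring.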
---

### 📤 Output

- Isodonut files per PLZ and mode  
  e.g. `isochronen/{scenario_name}/isodon/{plz}_{date_today}_isoDonuts_6x5min__{m}_simp0002.parquet`
---



In [4]:
# input: isochrones
# output: isochrones as donuts

In [16]:
# Standard library
import glob
import os
import re
import warnings
from collections import Counter
from datetime import datetime

# Third-party
import geopandas as gpd
import pandas as pd
import topojson
from tqdm import tqdm

In [18]:
scenario_name="test_plz_88636"

# Directory containing the isochrone files of this scenario
isochronen_dir  = f"isochronen/{scenario_name}/"

### PLZs for which isochrones exist

In [19]:


# Collect all files in the isochrone directory
# Expected file name pattern: {p}_{current_date}_isochrones_{buckets}x{time_limit_bucket}min__{m}.parquet
files = glob.glob(f"{isochronen_dir}*")

# Extract the PLZ from the start of each file name (5 digits followed by "_").
# Matching on the base name avoids picking up digits from the directory path,
# e.g. from a scenario name such as "test_plz_88636".
plz_list = []
for file in files:
    match = re.match(r"(\d{5})_", os.path.basename(file))
    if match:
        plz_list.append(match.group(1))

# Count files per PLZ
plz_counts = Counter(plz_list)

# Keep PLZs with at least 3 files (one per transport mode is expected)
valid_plz_isochrone = [plz for plz, count in plz_counts.items() if count >= 3]

# PLZs with fewer than 3 files
invalid_plz = [plz for plz, count in plz_counts.items() if count < 3]

# Print results
print("PLZs that appeared at least 3 times:", valid_plz_isochrone)
print("PLZs that appeared at least 3 times, number:", len(valid_plz_isochrone))
print("PLZs that did NOT appear 3 times:", invalid_plz)

PLZs that appeared at least 3 times: ['88636']
PLZs that appeared at least 3 times, number: 1
PLZs that did NOT appear 3 times: []


### PLZs for which isodonuts exist

In [32]:
output_dir_isodon = f"isochronen/{scenario_name}/isodon"
# Ensure output directory exists
os.makedirs(output_dir_isodon, exist_ok=True)

# Collect all files in the isodonut output directory
files = glob.glob(f"{output_dir_isodon}/*")

# Extract the PLZ from the start of each file name (5 digits followed by "_").
# Matching on the base name ignores digits in the directory path and skips the log file.
plz_list = []
for file in files:
    match = re.match(r"(\d{5})_", os.path.basename(file))
    if match:
        plz_list.append(match.group(1))

# Count files per PLZ
plz_counts = Counter(plz_list)

# Keep PLZs with at least 3 files (one per transport mode is expected)
valid_plz_isodon = [plz for plz, count in plz_counts.items() if count >= 3]

# PLZs with fewer than 3 files
invalid_plz = [plz for plz, count in plz_counts.items() if count < 3]

# Print results
print("PLZs that appeared at least 3 times:", valid_plz_isodon)
print("PLZs that appeared at least 3 times, number:", len(valid_plz_isodon))
print("PLZs that did NOT appear 3 times:", invalid_plz)

PLZs that appeared at least 3 times: []
PLZs that appeared at least 3 times, number: 0
PLZs that did NOT appear 3 times: []


In [33]:
# PLZs that have isochrones but no complete set of isodonuts yet
valid_plz_difference = list(set(valid_plz_isochrone) - set(valid_plz_isodon))

In [34]:
len(valid_plz_difference)

1

In [35]:






# Transport modes
modes = ["bike", "my_bike_cycleways", "cargo_bike"]

# Daily log file; every log() call appends one line to it
log_file_path = f"{output_dir_isodon}/processing_isodons_log_{datetime.now().strftime('%Y-%m-%d')}.txt"

def log(message):
    """Print a timestamped message and append it to the daily log file."""
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    full_message = f"[{timestamp}] {message}"
    print(full_message)  # still print to console
    with open(log_file_path, "a", encoding="utf-8") as f:
        f.write(full_message + "\n")

# Iterate over valid PLZs with monitoring
for i, plz in enumerate(valid_plz_difference, start=1):
    log(f"Processing PLZ {i}/{len(valid_plz_difference)}: {plz}")
    for m in modes:
    
        log(f" Processing Mode: {m}")

        # Load the isochrone file for this PLZ and mode (the date part of the name varies)
        file_pattern = os.path.join(isochronen_dir, f"{plz}_*_isochrones_6x5min__{m}.parquet")

        matching_files = glob.glob(file_pattern)
        if not matching_files:
            log(f"    No isochrone file found for pattern {file_pattern}, skipping.")
            continue
        file_path = matching_files[0]  # Take the first match, should be unique

        isos_all = gpd.read_parquet(file_path)

        #isos_all = isos_all[:30].copy()  # Keep only a sample for testing

        log(f"    Loaded {len(isos_all)} records.")
    
        # Start subtraction process
        log(f" Subtraction Start: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

        # Sort descending so that, within each raster cell, every isochrone is
        # directly followed by the next-smaller one; then reset the index
        subtraction_gdf_all = isos_all.sort_values(['id', 'time_bucket'], ascending=False).reset_index().copy()
        all_raster_ids = subtraction_gdf_all['id'].unique()

        log(f"    Unique Raster IDs: {len(all_raster_ids)}")

        buckets_subs = subtraction_gdf_all['bucket'].max()

        # Loop-based subtraction (slow): per raster cell, subtract the next-smaller isochrone
        for r in tqdm(all_raster_ids, desc="Subtracting Geometries"):
            for index, row in subtraction_gdf_all[subtraction_gdf_all['id'] == r][:buckets_subs].iterrows():
                if index + 1 >= len(subtraction_gdf_all):
                    continue  # Prevent out-of-bounds index error
                if subtraction_gdf_all.at[index + 1, 'id'] != r:
                    continue  # Next row belongs to another raster cell; nothing to subtract

                geom1 = subtraction_gdf_all.loc[index, 'geometry']
                geom2 = subtraction_gdf_all.loc[index + 1, 'geometry']

                subtraction_gdf_all.at[index, 'geometry'] = geom1.difference(geom2)

        # Remove geometries that became empty through the subtraction
        subtraction_gdf_all = subtraction_gdf_all[~subtraction_gdf_all['geometry'].is_empty].copy()

        log(f" Subtraction Done: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    
        # Topology-based cleanup and simplification via TopoJSON
        log(" Starting TopoJSON Simplification...")

        warnings.filterwarnings("ignore", category=RuntimeWarning)

        # Collect the simplified result of each raster cell, then concatenate once
        simplified_parts = []

        for r in tqdm(all_raster_ids, desc="Simplifying Polygons"):
            subset = subtraction_gdf_all[subtraction_gdf_all['id'] == r]

            if subset.empty:
                continue  # Skip empty groups

            topo = topojson.Topology(subset, prequantize=False, topology=True)
            topo_simplified = topo.toposimplify(0.0002).to_gdf()
            topo_simplified.crs = 'epsg:4326'

            simplified_parts.append(topo_simplified)

        subtraction_gdf_all_tj___all = pd.concat(simplified_parts) if simplified_parts else gpd.GeoDataFrame()

        log(f" TopoJSON Simplification Done: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    
        # Save as Parquet
        date_today = datetime.now().strftime('%Y-%m-%d')
        output_path = f"{output_dir_isodon}/{plz}_{date_today}_isoDonuts_6x5min__{m}_simp0002.parquet"

        subtraction_gdf_all_tj___all.to_parquet(output_path)

        log(f" Saved to: {output_path}\n")


[2025-05-21 12:59:26] Processing PLZ 1/1: 88636
[2025-05-21 12:59:26]  Processing Mode: bike
[2025-05-21 12:59:27]     Loaded 996 records.
[2025-05-21 12:59:27]  Subtraction Start: 2025-05-21 12:59:27
[2025-05-21 12:59:27]     Unique Raster IDs: 166


Subtracting Geometries: 100%|██████████| 166/166 [00:01<00:00, 159.96it/s]


[2025-05-21 12:59:28]  Subtraction Done: 2025-05-21 12:59:28
[2025-05-21 12:59:28]  Starting TopoJSON Simplification...


Simplifying Polygons: 100%|██████████| 166/166 [00:23<00:00,  6.97it/s]


[2025-05-21 12:59:51]  TopoJSON Simplification Done: 2025-05-21 12:59:51
[2025-05-21 12:59:52]  Saved to: isochronen/test_plz_88636/isodon/88636_2025-05-21_isoDonuts_6x5min__bike_simp0002.parquet

[2025-05-21 12:59:52]  Processing Mode: my_bike_cycleways
[2025-05-21 12:59:52]     Loaded 996 records.
[2025-05-21 12:59:52]  Subtraction Start: 2025-05-21 12:59:52
[2025-05-21 12:59:52]     Unique Raster IDs: 166


Subtracting Geometries: 100%|██████████| 166/166 [00:00<00:00, 174.04it/s]


[2025-05-21 12:59:53]  Subtraction Done: 2025-05-21 12:59:53
[2025-05-21 12:59:53]  Starting TopoJSON Simplification...


Simplifying Polygons: 100%|██████████| 166/166 [00:23<00:00,  7.14it/s]


[2025-05-21 13:00:16]  TopoJSON Simplification Done: 2025-05-21 13:00:16
[2025-05-21 13:00:16]  Saved to: isochronen/test_plz_88636/isodon/88636_2025-05-21_isoDonuts_6x5min__my_bike_cycleways_simp0002.parquet

[2025-05-21 13:00:16]  Processing Mode: cargo_bike
[2025-05-21 13:00:16]     Loaded 996 records.
[2025-05-21 13:00:16]  Subtraction Start: 2025-05-21 13:00:16
[2025-05-21 13:00:16]     Unique Raster IDs: 166


Subtracting Geometries: 100%|██████████| 166/166 [00:01<00:00, 162.07it/s]


[2025-05-21 13:00:17]  Subtraction Done: 2025-05-21 13:00:17
[2025-05-21 13:00:17]  Starting TopoJSON Simplification...


Simplifying Polygons: 100%|██████████| 166/166 [00:18<00:00,  9.17it/s]


[2025-05-21 13:00:35]  TopoJSON Simplification Done: 2025-05-21 13:00:35
[2025-05-21 13:00:35]  Saved to: isochronen/test_plz_88636/isodon/88636_2025-05-21_isoDonuts_6x5min__cargo_bike_simp0002.parquet
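
Because every log line starts with a timestamp, the runtime per mode can be read back from the log. The following is a rough sketch under the assumption that the `[timestamp] message` layout shown above stays stable. `mode_durations` is a hypothetical helper and the sample lines are copied from the `bike` run above.

```python
import re
from datetime import datetime

# A few lines copied from the log output above (bike mode only).
log_lines = [
    "[2025-05-21 12:59:26]  Processing Mode: bike",
    "[2025-05-21 12:59:27]     Loaded 996 records.",
    "[2025-05-21 12:59:51]  TopoJSON Simplification Done: 2025-05-21 12:59:51",
    "[2025-05-21 12:59:52]  Saved to: isochronen/test_plz_88636/isodon/"
    "88636_2025-05-21_isoDonuts_6x5min__bike_simp0002.parquet",
]

# Assumed line layout: "[YYYY-MM-DD HH:MM:SS] message"
LINE_RE = re.compile(r"^\[(?P<ts>[\d-]+ [\d:]+)\]\s+(?P<msg>.*)$")


def mode_durations(lines):
    """Return seconds between 'Processing Mode: X' and the following 'Saved to:' line."""
    durations = {}
    current_mode, started = None, None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # progress bars and blank lines carry no timestamp
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        msg = match["msg"]
        if msg.startswith("Processing Mode:"):
            current_mode, started = msg.split(":", 1)[1].strip(), ts
        elif msg.startswith("Saved to:") and current_mode is not None:
            durations[current_mode] = (ts - started).total_seconds()
            current_mode = None
    return durations


durations = mode_durations(log_lines)
print(durations)  # {'bike': 26.0}
```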

