# 📝 Notebook Documentation

**Notebook:** `32_scoring__2b_add_scoring_to_rasters_v04.ipynb`  
**Description:**  
Links the computed accessibility scores (from CSV files) to the corresponding raster geometries (100 m grid) and enriches them with demographic attributes from the census data.  
The result is a complete, spatially referenced dataset for further analysis or visualization.

---

### 📥 Input

- Accessibility scores (per raster cell and mode):  
  `output/test_plz_XXXXX_output_raster/{plz}_acc2raster_.csv`

- Geometries of the 100 m raster cells:  
  e.g. `data/base_data/Zensus2022_100m_poly_GER_wPLZ_wRS.parquet` (contains columns such as `id`, `geometry`, `plz`)

- Census data (e.g. population, age structure, household types):  
  either part of the raster geometries or joinable separately via `id`

---

### 🔧 Processing steps

1. **Read the CSV scoring data**
   - One file per PLZ (postal code) of a scenario
   - Contains `id`, `mode`, `coeff` (accessibility score), and possibly further columns such as `attr`

2. **Join with the raster geometries**
   - Via the `id` column (grid cell ID)
   - The result is a GeoDataFrame with geometry and scoring values

3. **Integrate census attributes**
   - Adds sociodemographic attributes (e.g. population, density)
   - Supports analysing the level of service relative to the population

4. **Optional: pivot the modes**
   - Converts to wide format with one column per mode (e.g. `coeff_bike`, `coeff_cargo_bike`, ...)
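
The steps above can be sketched on toy data as follows. This is only a minimal illustration, not the notebook's exact code: the cell IDs, the `population` column, and the plain `DataFrame` standing in for the grid (geometry omitted) are assumptions made for the example.

```python
import pandas as pd

# Toy long-format scores with the columns id, attr, coeff, mode.
scores = pd.DataFrame({
    "id": ["cell_a", "cell_a", "cell_a", "cell_a", "cell_b", "cell_b"],
    "attr": ["attr_pois", "pt_stops", "attr_pois", "pt_stops", "attr_pois", "pt_stops"],
    "coeff": [30.0, 8.0, 50.0, 10.0, 20.0, 5.0],
    "mode": ["bike", "bike", "cargo_bike", "cargo_bike", "bike", "bike"],
})

# Hypothetical stand-in for the grid table (geometry left out, made-up census column).
grid = pd.DataFrame({"id": ["cell_a", "cell_b"], "population": [12, 40]})

# Sum the per-attribute scores into one score per cell and mode.
per_mode = scores.groupby(["id", "mode"], as_index=False)["coeff"].sum()

# Pivot to wide format: one coeff_<mode> column per mode.
wide = (
    per_mode.pivot(index="id", columns="mode", values="coeff")
    .add_prefix("coeff_")
    .reset_index()
)

# Attach the grid attributes via the cell id.
result = wide.merge(grid, on="id", how="left")
print(result)
```

Cells that lack a score for a mode (here `cell_b` for `cargo_bike`) end up with `NaN` in the corresponding wide column.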

---

### 📤 Output

- Accessibility raster joined with geometry, per-mode scores, and census data:  
  e.g. `{output_folder_csv_raster}/{scenario_name}_coeff_rasters.parquet`  
  (this notebook writes Parquet)

---

### 🧾 Notes

- The result can be used directly for maps, dashboards, or rankings.
- Weights or scoring thresholds can be added if needed.
- For cross-regional analyses, a combined dataset across all PLZs can be assembled.
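
As one possible reading of the note on weights and thresholds, the sketch below weights the two `attr` components before summing and then flags cells above a cut-off. The weights, the threshold of 30, and the `well_served` column name are purely hypothetical; the notebook itself uses an unweighted sum.

```python
import pandas as pd

# Toy long-format scores for a single mode.
scores = pd.DataFrame({
    "id": ["cell_a", "cell_a", "cell_b", "cell_b"],
    "attr": ["attr_pois", "pt_stops", "attr_pois", "pt_stops"],
    "coeff": [40.0, 10.0, 20.0, 5.0],
    "mode": ["bike"] * 4,
})

# Hypothetical weights per attribute and a hypothetical cut-off.
weights = {"attr_pois": 1.0, "pt_stops": 0.5}
threshold = 30.0

# Weight each component, then sum per cell and mode.
scores["weighted"] = scores["coeff"] * scores["attr"].map(weights)
weighted = scores.groupby(["id", "mode"], as_index=False)["weighted"].sum()

# Flag cells whose weighted score reaches the threshold.
weighted["well_served"] = weighted["weighted"] >= threshold
print(weighted)
```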

---

In [2]:
import json
import pandas as pd
import geopandas as gpd
import os

import glob
import re
from collections import Counter

### get number of processed PLZs

In [3]:
#output_folder_csv = "data/germany/"
#output_folder_csv = "../../../Simon/erreichbarad_isos/data/germany/"
#output_folder_csv = "/home/jupyter-sime8802/Simon/erreichbarad_isos/data/germany/"

scenario_name="test_plz_88636"

output_folder_csv = f"output/{scenario_name}/"



# Get all CSV files in the scenario output folder
#files = glob.glob("data/germany/*")
files = glob.glob(output_folder_csv + "*.csv")
#files = glob.glob(output_folder_csv)


# Extract the PLZ from the file name only; the folder path may itself
# contain a 5-digit number (e.g. the scenario name)
plz_list = []
for file in files:
    match = re.search(r"(\d{5})", os.path.basename(file))  # first 5-digit number in the file name
    if match:
        plz_list.append(match.group(1))

# Count occurrences of each PLZ
plz_counts = Counter(plz_list)


# Print results
#print("PLZs with a finished scoring raster:", plz_list)
print("PLZs with a finished scoring raster, number:", len(plz_list))

PLZs with a finished scoring raster, number: 1
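
A point worth checking with this kind of extraction: when the folder name itself contains a 5-digit number (as `test_plz_88636` does), a search over the full path returns the folder's number for every file. The sketch below compares both variants; the second file name and the helper functions `plz_from_path` and `plz_from_name` are hypothetical.

```python
import os
import re
from collections import Counter

# Hypothetical paths following the {plz}_acc2raster_.csv naming pattern.
files = [
    "output/test_plz_88636/88636_acc2raster_.csv",
    "output/test_plz_88636/88633_acc2raster_.csv",
]

def plz_from_path(path):
    """First 5-digit number anywhere in the path (may hit the folder name)."""
    m = re.search(r"(\d{5})", path)
    return m.group(1) if m else None

def plz_from_name(path):
    """5-digit number at the start of the file name only."""
    m = re.match(r"(\d{5})", os.path.basename(path))
    return m.group(1) if m else None

from_path = [plz_from_path(f) for f in files]
from_name = [plz_from_name(f) for f in files]

# Counting distinct PLZs rather than files avoids double counting.
unique_plz = Counter(from_name)
print(from_path, from_name, len(unique_plz))
```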


### get the coeffs per raster

In [4]:

#folder_path_="data/germany"



def read_and_concat_csv(folder_path):
    """
    Reads all CSV files in the specified folder and concatenates them into a single DataFrame.

    Args:
        folder_path (str): The path to the folder containing CSV files.

    Returns:
        pd.DataFrame: A concatenated DataFrame of all CSV files in the folder.
    """
    all_files = [f for f in os.listdir(folder_path) if f.endswith(".csv")]
    dataframes = []

    for file in all_files:
        file_path = os.path.join(folder_path, file)
        df = pd.read_csv(file_path)  # Read CSV
        dataframes.append(df)

    # Concatenate all DataFrames
    big_dataframe = pd.concat(dataframes, ignore_index=True)
    
    return big_dataframe

# Example usage
#folder_path = output_folder_csv
#folder_path = "data/rbz_koeln/alt_falsch"
coeff2raster_all = read_and_concat_csv(output_folder_csv)

coeff2raster_all

Unnamed: 0.1,Unnamed: 0,id,attr,coeff,mode
0,0,100mN27454E42759,attr_pois,36.70,bike
1,1,100mN27454E42759,pt_stops,8.80,bike
2,2,100mN27457E42760,attr_pois,38.10,bike
3,3,100mN27457E42760,pt_stops,8.80,bike
4,4,100mN27457E42761,attr_pois,39.75,bike
...,...,...,...,...,...
991,327,100mN27536E42726,pt_stops,7.10,my_bike_cycleways
992,328,100mN27536E42727,attr_pois,30.25,my_bike_cycleways
993,329,100mN27536E42727,pt_stops,7.10,my_bike_cycleways
994,330,100mN27537E42727,attr_pois,29.80,my_bike_cycleways


In [5]:
# Combine the attr_pois and pt_stops scores into one value per cell and mode

coeff2raster_all = coeff2raster_all.drop(columns=["Unnamed: 0"])

# The attr_pois / pt_stops distinction is not needed from here on
coeff2raster_all_grouped = coeff2raster_all.groupby(["id", "mode"], as_index=False)["coeff"].sum()
coeff2raster_all_grouped

Unnamed: 0,id,mode,coeff
0,100mN27454E42759,bike,45.50
1,100mN27454E42759,cargo_bike,70.40
2,100mN27454E42759,my_bike_cycleways,37.20
3,100mN27457E42760,bike,46.90
4,100mN27457E42760,cargo_bike,69.70
...,...,...,...
493,100mN27536E42727,cargo_bike,62.55
494,100mN27536E42727,my_bike_cycleways,37.35
495,100mN27537E42727,bike,37.60
496,100mN27537E42727,cargo_bike,59.55


In [6]:
coeff2raster_all_wide = coeff2raster_all_grouped.pivot(index="id", columns="mode", values="coeff")

# Optionally, rename the columns to include "coeff_" prefix
coeff2raster_all_wide = coeff2raster_all_wide.add_prefix("coeff_").reset_index()

In [7]:
coeff2raster_all_wide

mode,id,coeff_bike,coeff_cargo_bike,coeff_my_bike_cycleways
0,100mN27454E42759,45.50,70.40,37.20
1,100mN27457E42760,46.90,69.70,36.20
2,100mN27457E42761,48.55,69.60,36.45
3,100mN27458E42756,41.35,69.10,31.50
4,100mN27458E42761,48.00,69.40,36.80
...,...,...,...,...
161,100mN27535E42726,42.50,62.80,38.35
162,100mN27536E42723,40.40,63.70,38.15
163,100mN27536E42726,38.65,62.20,37.85
164,100mN27536E42727,39.30,62.55,37.35


In [8]:
# Absolute differences
coeff2raster_all_wide["diff_cargo_bike_abs"] = coeff2raster_all_wide["coeff_cargo_bike"] - coeff2raster_all_wide["coeff_bike"]
coeff2raster_all_wide["diff_my_bike_cycleways_abs"] = coeff2raster_all_wide["coeff_my_bike_cycleways"] - coeff2raster_all_wide["coeff_bike"]

# Relative differences (as ratio to coeff_bike)
coeff2raster_all_wide["diff_cargo_bike_rel"] = coeff2raster_all_wide["diff_cargo_bike_abs"] / coeff2raster_all_wide["coeff_bike"]
coeff2raster_all_wide["diff_my_bike_cycleways_rel"] = coeff2raster_all_wide["diff_my_bike_cycleways_abs"] / coeff2raster_all_wide["coeff_bike"]

In [9]:
coeff2raster_all_wide

mode,id,coeff_bike,coeff_cargo_bike,coeff_my_bike_cycleways,diff_cargo_bike_abs,diff_my_bike_cycleways_abs,diff_cargo_bike_rel,diff_my_bike_cycleways_rel
0,100mN27454E42759,45.50,70.40,37.20,24.90,-8.30,0.547253,-0.182418
1,100mN27457E42760,46.90,69.70,36.20,22.80,-10.70,0.486141,-0.228145
2,100mN27457E42761,48.55,69.60,36.45,21.05,-12.10,0.433574,-0.249228
3,100mN27458E42756,41.35,69.10,31.50,27.75,-9.85,0.671100,-0.238210
4,100mN27458E42761,48.00,69.40,36.80,21.40,-11.20,0.445833,-0.233333
...,...,...,...,...,...,...,...,...
161,100mN27535E42726,42.50,62.80,38.35,20.30,-4.15,0.477647,-0.097647
162,100mN27536E42723,40.40,63.70,38.15,23.30,-2.25,0.576733,-0.055693
163,100mN27536E42726,38.65,62.20,37.85,23.55,-0.80,0.609314,-0.020699
164,100mN27536E42727,39.30,62.55,37.35,23.25,-1.95,0.591603,-0.049618
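
The relative difference is `(coeff_mode - coeff_bike) / coeff_bike`; for the first row this gives `(70.40 - 45.50) / 45.50 ≈ 0.547253`. If a cell ever had `coeff_bike == 0`, the division would yield `inf` or `NaN`. The sketch below shows one way to guard against that, on toy data with an assumed zero baseline in the second row:

```python
import numpy as np
import pandas as pd

# Toy wide table; the zero baseline in cell_b is made up for illustration.
wide = pd.DataFrame({
    "id": ["cell_a", "cell_b"],
    "coeff_bike": [45.5, 0.0],
    "coeff_cargo_bike": [70.4, 12.0],
})

wide["diff_cargo_bike_abs"] = wide["coeff_cargo_bike"] - wide["coeff_bike"]

# Replace a zero baseline with NaN so the ratio becomes NaN instead of inf.
baseline = wide["coeff_bike"].replace(0, np.nan)
wide["diff_cargo_bike_rel"] = wide["diff_cargo_bike_abs"] / baseline
print(wide)
```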


### get the grid

In [10]:

#grid_100m_nrw= gpd.read_parquet("data/base_data/Zensus2022_100m_cent_NRW_wPLZ.parquet")
#grid_100m_nrw= gpd.read_file("data/base_data/Zensus2022_grid_final_8736829269455000_100m.gpkg")

#grid_100m_nrw= gpd.read_parquet("data/base_data/Zensus2022_100m_poly_NRW_wPLZ.parquet")

grid_100m= gpd.read_parquet("data/base_data/Zensus2022_100m_poly_GER_wPLZ_wRS.parquet")


grid_100m=grid_100m.to_crs(4326)

### merge grid with coeff2raster

In [11]:
#grouped_df_bike=coeff2raster_all_grouped[coeff2raster_all_grouped["mode"]=="bike"]
coeff2raster_all_wide_grid=coeff2raster_all_wide.merge(grid_100m, on="id")

coeff2raster_all_wide_grid_gdf = gpd.GeoDataFrame(
    coeff2raster_all_wide_grid, geometry="geometry", crs="EPSG:4326"
)
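
Because `merge` defaults to an inner join, scored cells whose `id` is missing from the grid are dropped silently. A small check along the following lines can make that visible. It is a sketch on toy data, with plain DataFrames standing in for the real tables and `cell_c` as a made-up unmatched ID.

```python
import pandas as pd

# Toy stand-ins for the wide score table and the grid (geometry omitted).
wide = pd.DataFrame({
    "id": ["cell_a", "cell_b", "cell_c"],
    "coeff_bike": [45.5, 46.9, 48.55],
})
grid = pd.DataFrame({"id": ["cell_a", "cell_b"], "plz": ["88636", "88636"]})

# Left join with an indicator column; validate raises if ids are duplicated.
checked = wide.merge(
    grid, on="id", how="left", indicator=True, validate="one_to_one"
)

# Scored cells that have no counterpart in the grid.
missing_ids = checked.loc[checked["_merge"] == "left_only", "id"].tolist()

# Rows that would survive the inner join used in the notebook.
matched = checked[checked["_merge"] == "both"].drop(columns="_merge")
print(missing_ids, len(matched))
```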

In [22]:

# Create new folder name by stripping trailing slash and appending suffix
output_folder_csv_raster = output_folder_csv.rstrip("/") + "_output_raster/"

# Ensure directory exists
os.makedirs(output_folder_csv_raster, exist_ok=True)

In [25]:
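# output_folder_csv_raster already ends with "/", so join the parts instead of adding another slash
coeff2raster_all_wide_grid_gdf.to_parquet(os.path.join(output_folder_csv_raster, f"{scenario_name}_coeff_rasters.parquet"))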


_____