# ged_building_layout — Quick Start

This notebook shows how to run **Step0–Step5** either **individually** or in **custom combinations**.

## Expected input folder layout

```
DATA_ROOT/
├─ json/          # Annotated floor plans in a JSON format
├─ jpg/           # Floor plan images with same filename stem as JSON
└─ behavior_csv/  # Users' requirements matrices (correlations between different functions)
```
We provide a small example dataset (5 samples) to demonstrate the full pipeline execution.
The complete dataset (150 images) is not publicly released due to data usage agreements.
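
Because every JSON annotation is expected to have an image with the same filename stem, it can help to check the pairing before running any step. The following is a minimal sketch using only the standard library; the helper name `find_unpaired_stems` is hypothetical and not part of `ged_building_layout`, and it assumes lowercase `.json` and `.jpg` extensions.

```python
from pathlib import Path


def find_unpaired_stems(data_root):
    """Return (json_without_jpg, jpg_without_json) as sorted lists of stems.

    Hypothetical helper, not part of ged_building_layout.
    Assumes the DATA_ROOT/json and DATA_ROOT/jpg layout described above.
    """
    root = Path(data_root)
    json_stems = {p.stem for p in (root / "json").glob("*.json")}
    jpg_stems = {p.stem for p in (root / "jpg").glob("*.jpg")}
    return sorted(json_stems - jpg_stems), sorted(jpg_stems - json_stems)
```

Calling `find_unpaired_stems(DATA_ROOT)` returns two lists; both should be empty when the dataset is consistent.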

## Recommended output layout

```
OUTPUT_ROOT/
├─ step0_checks/      # Invalid polygons, corridor labels, functional labels, isolated nodes
├─ step1_behavior/    # Behavior graphs
├─ step2_basegraphs/  # Basic graphs reflecting circulation structure
├─ step3_transform/   # Connectivity-aware graphs (CaGs) reflecting functional proximity
│  ├─ variants/       # All CaGs generated based on different distance thresholds
│  └─ selected/       # Optimal CaGs selected automatically by threshold
├─ step4_prototype/   # Layout prototypes
└─ step5_faged/       # ToGED, nGED, FaGED values and retrieval rankings
```
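
If you prefer to create this tree up front, a short sketch such as the one below would do it. The step functions may well create their own output folders, so this is optional; `STEP_FOLDERS` and `make_output_tree` are illustrative names, not part of the package.

```python
from pathlib import Path

# Folder names copied from the recommended layout above.
STEP_FOLDERS = (
    "step0_checks",
    "step1_behavior",
    "step2_basegraphs",
    "step3_transform/variants",
    "step3_transform/selected",
    "step4_prototype",
    "step5_faged",
)


def make_output_tree(output_root):
    """Create the recommended folders (hypothetical helper) and return their paths."""
    root = Path(output_root)
    created = []
    for rel in STEP_FOLDERS:
        folder = root / rel
        folder.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(folder)
    return created
```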


In [None]:
pip install shapely networkx pandas numpy tqdm matplotlib infomap opencv-python




In [4]:
pip install scipy



In [1]:
import sys, os
sys.path.append(os.path.abspath(".."))  
from ged_building_layout import run_step0, run_step1, run_step2, run_step3, run_step4_then_step5


In [2]:
# If you're running from the repo root, install in editable mode:
# !pip install -e .

from pathlib import Path

# ---- set your paths here ----
PROJECT_ROOT = Path(r"D:\6-FaGED\submission\1stRevision\faged")
DATA_ROOT = PROJECT_ROOT / "DATA_ROOT"          # <-- change to your dataset root
OUTPUT_ROOT   = PROJECT_ROOT / "OUTPUT_ROOT"    # <-- change to your desired output root

JSON_DIR = DATA_ROOT / "json"
JPG_DIR  = DATA_ROOT / "jpg"
BEHAVIOR_CSV_DIR = DATA_ROOT / "behavior_csv"   # optional if you only need graph representation and prototype extraction for layouts

OUTPUT_ROOT.mkdir(parents=True, exist_ok=True)
print("DATA_ROOT:", DATA_ROOT.resolve())
print("OUTPUT_ROOT:", OUTPUT_ROOT.resolve())

DATA_ROOT: D:\6-FaGED\submission\1stRevision\faged\DATA_ROOT
OUTPUT_ROOT: D:\6-FaGED\submission\1stRevision\faged\OUTPUT_ROOT


## Step0 — Annotation checks (manual QA)

Step0 is designed to generate **visual inspection outputs** (PNGs) so you can quickly spot
annotation issues and fix your code/labels.

Available checks:
- `"invalid"`: flags invalid polygons and polygons missing a `group_id`
- `"corridor"`: shows corridor polygons (label=12) with their group IDs, to verify that corridor segments are grouped correctly
- `"labels"`: overlays function labels in color, to verify that functional labels are correct
- `"connectivity"`: visualizes connectivity and reports isolated nodes, to catch wrong connections
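
To give a feel for what the `"invalid"` check looks for, here is a simplified, dependency-free sketch that scans a LabelMe-style document (a dict with a `shapes` list whose entries carry `label`, `points`, and `group_id`). It is not the package's implementation: it only catches polygons with fewer than three vertices, zero area, or no `group_id`, and it would miss self-intersections, which a geometry library such as shapely can detect through `Polygon.is_valid`. The function names are hypothetical.

```python
def shoelace_area(points):
    """Absolute polygon area from a list of (x, y) vertices (shoelace formula)."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def find_suspect_shapes(labelme_doc):
    """Return (index, label, reason) tuples for shapes failing the basic checks.

    Assumption: every shape is meant to be a polygon.
    """
    problems = []
    for i, shape in enumerate(labelme_doc.get("shapes", [])):
        points = shape.get("points", [])
        label = shape.get("label")
        if len(points) < 3:
            problems.append((i, label, "fewer than 3 vertices"))
        elif shoelace_area(points) == 0.0:
            problems.append((i, label, "zero area"))
        if shape.get("group_id") is None:
            problems.append((i, label, "missing group_id"))
    return problems
```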


In [3]:

from ged_building_layout.step0_checks import run_step0_checks

run_step0_checks(
    json_folder=str(JSON_DIR),
    jpg_folder=str(JPG_DIR),
    out_root=str(OUTPUT_ROOT / "step0_checks"),
    # choose any subset, e.g. ("invalid",) or ("labels", "connectivity")
    checks=("invalid", "corridor", "labels", "connectivity"),
)


{'invalid': 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step0_checks\\invalid_polygons',
 'corridor': 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step0_checks\\corridors',
 'labels': 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step0_checks\\function_labels',
 'connectivity': 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step0_checks\\connectivity'}

## Step1 — Build behavioral graphs

If you only need layout analysis, you can skip this step.

In default mode, this step takes the co-occurrence frequency matrix between behaviors as input, so you must supply `people_counts`: the total number of individuals who provided behavioral data in each user group. Multiple user groups are supported, labeled “A”, “B”, “C”, “D”, and so on.

The input matrix can also be replaced by a matrix representing functional relationship strengths obtained through other methods, as long as it is in matrix form. In this case, `people_counts` can be directly set to 1.

- Input: behavior matrices CSVs (e.g., `A.csv`)
- Output: pickled graphs (`.pkl`) + optional visualizations
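
The role of `people_counts` can be illustrated with a small sketch. It assumes that each co-occurrence count is divided by the group's head count to obtain an edge weight and that zero entries produce no edge; both are assumptions made for illustration, and the package may weight or filter edges differently. The function name and the example numbers are made up.

```python
def cooccurrence_to_edges(labels, counts, people_num):
    """Turn a symmetric co-occurrence count matrix into weighted edges.

    Assumed weighting (not confirmed by the package): weight = count / people_num.
    With a precomputed strength matrix, people_num=1 leaves values unchanged.
    """
    if people_num <= 0:
        raise ValueError("people_num must be positive")
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):  # upper triangle only: undirected graph
            if counts[i][j] > 0:
                edges.append((labels[i], labels[j], counts[i][j] / people_num))
    return edges


# Made-up example: 3 behaviors reported by 20 people.
example_edges = cooccurrence_to_edges(
    ["a", "b", "c"],
    [[0, 10, 0], [10, 0, 5], [0, 5, 0]],
    people_num=20,
)
```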


In [3]:
from ged_building_layout.step1_behavior import run_step1_build_behavior_graphs

# Example: you must fill this dict based on your experiment
people_counts = {
    "A": 339,
    "B": 70,
}
node_categories = {
    'a': "3", 'b': "6", 'c': "2", 'd': "9",
    'e': "9", 'f': "9", 'g': "5", 'h': "8", 'i': "3"
}

run_step1_build_behavior_graphs(
    csv_dir=str(BEHAVIOR_CSV_DIR),
    output_dir=str(OUTPUT_ROOT / "step1_behavior"),
    people_counts=people_counts,
    node_categories=node_categories
)

| | file | group | people_num | n_nodes | n_edges | saved_pkl | saved_png | reason |
|---|---|---|---|---|---|---|---|---|
| 0 | A.csv | A | 339.0 | 5 | 6 | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | ok |
| 1 | B.csv | B | 70.0 | 5 | 9 | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | ok |


## Step2 — Build base graphs (BG)

Build function- and area-aware base graphs from JSON files annotated with LabelMe (Wada, 2021) or X-AnyLabeling.

- Input: `DATA_ROOT/json/*.json`
- Output: `OUTPUT_ROOT/step2_basegraphs/*.pkl` (+ optional `.png`)


In [3]:
from ged_building_layout import run_step2

run_step2(
    json_folder=str(JSON_DIR),
    output_folder=str(OUTPUT_ROOT / "step2_basegraphs"),
    save_png=True,
)



| | file | stem | n_nodes | n_edges | pkl_path | saved | reason |
|---|---|---|---|---|---|---|---|
| 0 | 17_918099.json | 17_918099 | 40 | 46 | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | True | ok |
| 1 | 20_306098.json | 20_306098 | 52 | 62 | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | True | ok |
| 2 | 31_644920.json | 31_644920 | 66 | 67 | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | True | ok |
| 3 | 33_459135.json | 33_459135 | 66 | 72 | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | True | ok |
| 4 | 71_523598.json | 71_523598 | 66 | 66 | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | True | ok |


## Step3 — Transform to CaG + auto-select best variant

Generates several graph variants using relative thresholds (20%, 25%, 30% of the maximum distance in the current plan) and absolute thresholds (20 m, 25 m, 30 m), then selects the best
variant per file using the default heuristic (target average degree of 6–8).

- Input: `OUTPUT_ROOT/step2_basegraphs/*.pkl`
- Output:
  - variants: `OUTPUT_ROOT/step3_transform/variants/<variant_name>/*.pkl`
  - selected: `OUTPUT_ROOT/step3_transform/selected/*.pkl`
  - selection log: `step3_selected_variant.csv`
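
The arithmetic behind the selection heuristic can be sketched as follows. In the selection log printed by this notebook, `edge_min` and `edge_max` equal 6 and 8 times half the Step2 node count (40 nodes gives 120 to 160 edges), and `best_avg_degree` equals `2 * best_n_edges / best_n_nodes`. The functions below reproduce those numbers; how the package actually breaks ties or ranks variants outside the range is not documented here, so `pick_variant` is only an assumption (closest edge count to the range, first candidate wins ties).

```python
def edge_range(n_base_nodes, avg_degree_min=6.0, avg_degree_max=8.0):
    """Edge-count range implied by a target average degree (avg degree = 2E / N)."""
    return (avg_degree_min * n_base_nodes / 2.0, avg_degree_max * n_base_nodes / 2.0)


def average_degree(n_nodes, n_edges):
    """Average node degree of an undirected graph."""
    return 2.0 * n_edges / n_nodes


def distance_to_range(n_edges, lo, hi):
    """0.0 inside [lo, hi], otherwise the gap to the nearest bound."""
    if n_edges < lo:
        return lo - n_edges
    if n_edges > hi:
        return n_edges - hi
    return 0.0


def pick_variant(edge_counts, n_base_nodes):
    """Hypothetical selection rule: the variant whose edge count is closest to the range."""
    lo, hi = edge_range(n_base_nodes)
    return min(edge_counts, key=lambda name: distance_to_range(edge_counts[name], lo, hi))


# Edge counts below are invented for illustration, not taken from a real run.
chosen = pick_variant({"relative20": 90, "absolute20": 152, "absolute30": 210}, 40)
```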


In [5]:
from ged_building_layout import run_step3, Step3Config

cfg3 = Step3Config(
    # avg_degree is the average node degree of the generated CaG. The 6–8 default follows
    # experiments on university libraries; uncomment and adjust the lines below to customize.
    # avg_degree_min=6.0,
    # avg_degree_max=8.0,
)

df_selected = run_step3(
    basegraph_folder=str(OUTPUT_ROOT / "step2_basegraphs"),
    json_folder=str(JSON_DIR),
    output_root=str(OUTPUT_ROOT / "step3_transform" / "variants"),
    selected_output_folder=str(OUTPUT_ROOT / "step3_transform" / "selected"),
    cfg=cfg3,
    save_png=True,
)
df_selected.head()

Step3 transform:   0%|          | 0/5 [00:00<?, ?it/s]

Step3 transform: 100%|██████████| 5/5 [00:09<00:00,  1.84s/it]


| | file | best_variant | best_kind | best_value | best_n_nodes | best_n_edges | best_avg_degree | edge_min | edge_max | in_edge_range | edge_dist_to_range |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17_918099.pkl | absolute20 | absolute | 20.0 | 39 | 152 | 7.794872 | 120.0 | 160.0 | True | 0.0 |
| 1 | 20_306098.pkl | relative25 | relative | 0.25 | 44 | 157 | 7.136364 | 156.0 | 208.0 | True | 0.0 |
| 2 | 31_644920.pkl | relative20 | relative | 0.2 | 65 | 227 | 6.984615 | 198.0 | 264.0 | True | 0.0 |
| 3 | 33_459135.pkl | relative20 | relative | 0.2 | 63 | 221 | 7.015873 | 198.0 | 264.0 | True | 0.0 |
| 4 | 71_523598.pkl | relative20 | relative | 0.2 | 63 | 249 | 7.904762 | 198.0 | 264.0 | True | 0.0 |


## Step4 — Extract layout prototypes

Runs community detection (Infomap by default; Edler, Holmgren, & Rosvall, 2025) at one or more Markov times and saves simplified prototype graphs.

The function of each prototype node is determined based on the nodes within the corresponding originally detected community, using either the dominant function by count or the function associated with the largest area.

- Input: selected CaG graphs from Step3
- Output: per-parameter folders under `OUTPUT_ROOT/step4_prototype/graphs/markov_*`
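
The two ways of assigning a function to a prototype node can be sketched in a few lines. This is an illustration rather than the package's code: the function name is hypothetical, and `"max_area"` is read here as "the function with the largest total area inside the community", which is one possible interpretation.

```python
from collections import Counter, defaultdict


def main_function(members, mode="count"):
    """Pick one function label for a community.

    members: list of (function_label, area) pairs, one per original node.
    mode="count": most frequent label.
    mode="max_area": label with the largest summed area (assumed interpretation).
    """
    if not members:
        raise ValueError("community has no members")
    if mode == "count":
        return Counter(label for label, _ in members).most_common(1)[0][0]
    if mode == "max_area":
        area_by_label = defaultdict(float)
        for label, area in members:
            area_by_label[label] += area
        return max(area_by_label, key=area_by_label.get)
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up community: two small rooms of function "3" and one large room of function "9".
community = [("3", 10.0), ("3", 12.0), ("9", 50.0)]
by_count = main_function(community, mode="count")
by_area = main_function(community, mode="max_area")
```

The two modes can disagree, as in this example, which is why the choice is exposed as `main_function_mode`.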


In [4]:
from ged_building_layout.step4_prototype import run_step4_infomap, InfomapConfig

cfg4 = InfomapConfig(
    input_folder=str(OUTPUT_ROOT / "step3_transform" / "selected"),
    graph_output_root=str(OUTPUT_ROOT / "step4_prototype" / "graphs"),
    community_img_output_root=str(OUTPUT_ROOT / "step4_prototype" / "communities"),
    markov_times=(0.7, 0.75, 0.8),
    main_function_mode="count",  # or "max_area"
)

run_step4_infomap(cfg4)


Step4 Prototype extraction:   0%|          | 0/5 [00:00<?, ?it/s]

Step4 Prototype extraction: 100%|██████████| 5/5 [00:11<00:00,  2.39s/it]


['D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step4_prototype\\graphs\\markov_0_7\\17_918099.pkl',
 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step4_prototype\\graphs\\markov_0_75\\17_918099.pkl',
 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step4_prototype\\graphs\\markov_0_8\\17_918099.pkl',
 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step4_prototype\\graphs\\markov_0_7\\20_306098.pkl',
 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step4_prototype\\graphs\\markov_0_75\\20_306098.pkl',
 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step4_prototype\\graphs\\markov_0_8\\20_306098.pkl',
 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step4_prototype\\graphs\\markov_0_7\\31_644920.pkl',
 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step4_prototype\\graphs\\markov_0_75\\31_644920.pkl',
 'D:\\6-FaGED\\submission\\1stRevision\\faged\\OUTPUT_ROOT\\step4_prototype\\graphs\\markov_0

## Step5 — Compute ToGED / nGED / FaGED 

Compares *target graphs* (e.g., Step1 behavior graphs) to *reference prototype graphs* (Step4).

- Target: `OUTPUT_ROOT/step1_behavior/*.pkl`
- Reference: `OUTPUT_ROOT/step4_prototype/graphs/markov_*/`
- Output: `OUTPUT_ROOT/step5_faged/markov_*/...`


In [4]:
from ged_building_layout.step5_faged import Step5BatchConfig, run_step5_batch_from_markov_folders, merge_step5_csvs_to_long_table

cfg5 = Step5BatchConfig(
    step4_graph_output_root=str(OUTPUT_ROOT / "step4_prototype" / "graphs"),
    target_folder=str(OUTPUT_ROOT / "step1_behavior"),
    step5_output_root=str(OUTPUT_ROOT / "step5_faged"),
    markov_folders=["markov_0_7"],  # or e.g. ["markov_0_7", "markov_0_75"]
    do_ged=True,   # set False if you do not need graph edit distance
    do_nged=True,  # set False if you do not need normalized graph edit distance
    do_faged=True,
    timeout=30,
)

summary_df = run_step5_batch_from_markov_folders(cfg5)
summary_df.head()


                                                                     

| | markov_folder | target_folder | reference_folder | output_dir | num_target_graphs | num_reference_graphs | ran_ok |
|---|---|---|---|---|---|---|---|
| 0 | markov_0_7 | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | 2 | 5 | True |


In [5]:
long_df = merge_step5_csvs_to_long_table(
    csv_root=str(OUTPUT_ROOT / "step5_faged" / "markov_0_7"),
    output_csv=str(OUTPUT_ROOT / "step5_faged" / "markov_0_7" / "step5_long_ranked.csv")
)

long_df.head()


| | Reference_Graph | Edit_Distance_Type | Edit_Distance | Rank | Group |
|---|---|---|---|---|---|
| 0 | 17_918099.pkl | FaGED | 0.32 | 1 | Group_A |
| 1 | 33_459135.pkl | FaGED | 0.35 | 2 | Group_A |
| 2 | 20_306098.pkl | FaGED | 0.36 | 3 | Group_A |
| 3 | 71_523598.pkl | FaGED | 0.36 | 3 | Group_A |
| 4 | 31_644920.pkl | FaGED | 0.509091 | 5 | Group_A |
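
The `Rank` column above follows a competition-style scheme: smaller distances rank first, tied distances share the lowest rank, and the following rank is skipped (3, 3, then 5). A dependency-free sketch of that scheme is shown below; it is written for illustration, and the package may compute ranks differently, for instance through pandas, where `Series.rank(method="min")` gives the same ordering.

```python
def competition_rank(distances):
    """Rank ascending; ties share the lowest rank and the next rank is skipped."""
    first_position = {}
    for position, value in enumerate(sorted(distances), start=1):
        first_position.setdefault(value, position)  # keep the first (lowest) position
    return [first_position[value] for value in distances]


# FaGED values copied from the Group_A rows of the table above.
ranks = competition_rank([0.32, 0.35, 0.36, 0.36, 0.509091])
```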


## Optional: common custom combinations

### A) Step2 → Step3 (layout graph representation only)


In [6]:
from ged_building_layout import run_step2_then_step3, Step3Config

df_sel = run_step2_then_step3(
    json_folder=str(JSON_DIR),
    step2_output_folder=str(OUTPUT_ROOT / "step2_basegraphs"),
    step3_output_root=str(OUTPUT_ROOT / "step3_transform" / "variants"),
    step3_selected_folder=str(OUTPUT_ROOT / "step3_transform" / "selected"),
    step3_cfg=Step3Config(),
    save_png_step2=True,
    save_selection_csv_step3=True,
)
df_sel.head()

Step3 CaGs transform: 100%|██████████| 5/5 [00:00<00:00,  7.58it/s]


| | file | best_variant | best_kind | best_value | best_n_nodes | best_n_edges | best_avg_degree | edge_min | edge_max | in_edge_range | edge_dist_to_range |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17_918099.pkl | absolute20 | absolute | 20.0 | 39 | 152 | 7.794872 | 120.0 | 160.0 | True | 0.0 |
| 1 | 20_306098.pkl | relative25 | relative | 0.25 | 44 | 157 | 7.136364 | 156.0 | 208.0 | True | 0.0 |
| 2 | 31_644920.pkl | relative20 | relative | 0.2 | 65 | 227 | 6.984615 | 198.0 | 264.0 | True | 0.0 |
| 3 | 33_459135.pkl | relative20 | relative | 0.2 | 63 | 221 | 7.015873 | 198.0 | 264.0 | True | 0.0 |
| 4 | 71_523598.pkl | relative20 | relative | 0.2 | 63 | 249 | 7.904762 | 198.0 | 264.0 | True | 0.0 |


### B) Step4 → Step5 (if you already have Step3 CaGs and Step1 behavior graphs)


In [10]:
from ged_building_layout.pipeline import Step4Step5Config, run_step4_then_step5
from ged_building_layout.step4_prototype import InfomapConfig
from ged_building_layout.step5_faged import Step5BatchConfig

cfg = Step4Step5Config(
    step4=InfomapConfig(
        input_folder=str(OUTPUT_ROOT / "step3_transform" / "selected"),
        graph_output_root=str(OUTPUT_ROOT / "step4_prototype" / "graphs"),
        community_img_output_root=str(OUTPUT_ROOT / "step4_prototype" / "communities"),
        markov_times=(0.7, 0.75, 0.8),
        main_function_mode="count",
    ),
    step5=Step5BatchConfig(
        step4_graph_output_root=str(OUTPUT_ROOT / "step4_prototype" / "graphs"),
        target_folder=str(OUTPUT_ROOT / "step1_behavior"),   
        step5_output_root=str(OUTPUT_ROOT / "step5_faged"),
        do_ged=True,
        do_nged=True,
        do_faged=True,
        timeout=30,
        markov_folders=["markov_0_7"],  # or ["markov_0_7", "markov_0_75"]; None runs all folders
    ),
)

summary_df = run_step4_then_step5(cfg)
summary_df.head()


Step4 Prototype extraction: 100%|██████████| 5/5 [00:12<00:00,  2.41s/it]
                                                                     

| | markov_folder | target_folder | reference_folder | output_dir | num_target_graphs | num_reference_graphs | ran_ok |
|---|---|---|---|---|---|---|---|
| 0 | markov_0_7 | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | `D:\6-FaGED\submission\1stRevision\faged\OUTPUT...` | 2 | 5 | True |
