Problem
Synthesized dataset sizes fall ~17% short of targets set in #255. Last setup run (release 2026-05-16.1):
| size | target | actual | multiplier (incl. originals) |
| --- | --- | --- | --- |
| small | ~5M | 4,166,773 | 1× (passthrough) |
| medium | ~40M | 33,334,184 | 8× (7 clones + original) |
| large | ~100M | 83,335,460 | 20× (19 clones + original) |
Root cause: the clone counts in DatasetSize.clones_per_polygon were sized assuming a ~5M base, but the conflated source dataset has only 4.17M polygons.
Fix
Bump clones_per_polygon in src/domain/enums/dataset_size.py:10:
MEDIUM: 7 → 9 clones → ~41.7M rows
LARGE: 19 → 23 clones → ~100.0M rows
Update docstring targets in the same enum.
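The new clone counts can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, not the code in dataset_size.py: `BASE_POLYGONS`, `rows_for`, and `min_clones_for` are hypothetical names, and it assumes every clone survives (no invalid clones dropped).

```python
import math

# Row count of the passthrough (small) dataset from the last setup run.
BASE_POLYGONS = 4_166_773


def rows_for(clones: int, base: int = BASE_POLYGONS) -> int:
    """Rows produced when each polygon is kept once and cloned `clones` times."""
    return base * (clones + 1)


def min_clones_for(target: int, base: int = BASE_POLYGONS) -> int:
    """Smallest clone count whose output reaches at least `target` rows."""
    return max(0, math.ceil(target / base) - 1)


if __name__ == "__main__":
    for name, target in (("MEDIUM", 40_000_000), ("LARGE", 100_000_000)):
        clones = min_clones_for(target)
        print(f"{name}: {clones} clones -> {rows_for(clones):,} rows")
```

Under these assumptions the smallest counts that reach the targets are 9 and 23 clones, matching the proposed values.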
Validation
Re-run setup container, confirm buildings_medium ≈ 40M and buildings_large ≈ 100M (minus dropped invalid clones).
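One way to make "≈" concrete when checking the re-run is a relative tolerance. The helper below is a hypothetical sketch: `within_target` and the 5% tolerance are assumptions, not something defined in the repo or in #255, and the row counts would come from whatever the setup container reports.

```python
def within_target(actual: int, target: int, tolerance: float = 0.05) -> bool:
    """True if `actual` is within `tolerance` (relative) of `target`."""
    return abs(actual - target) / target <= tolerance


if __name__ == "__main__":
    # Current medium size from the last run: ~17% short, so the check fails.
    print(within_target(33_334_184, 40_000_000))  # False
    # Expected medium size after the bump, before any invalid clones are dropped.
    print(within_target(41_667_730, 40_000_000))  # True
```

The tolerance leaves room for a small number of dropped invalid clones while still flagging a shortfall of the current size.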
Refs #255.