Conversation

@FBumann commented Jan 27, 2026

Summary

Major refactoring of the clustering and expansion logic to improve code clarity, maintainability, and encapsulation.

1. Clean Keys and Unified Dimensions

  • Use clean keys (e.g., (2024,) or (2024, 'high')) instead of sentinel patterns with None (see the sketch after this list)
  • Replace the separate periods and scenarios parameters with a unified dim_names list
  • Consolidate repeated expand_dims + combine_by_coords patterns into helper functions
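
For illustration, a minimal sketch of the clean-key idea (toy values, not code from the PR):

# Before: sentinel keys carried None placeholders for absent dimensions.
old_keys = [(2024, None), (None, 'high')]

# After: dim_names lists only the dimensions that actually exist, and each
# key is a tuple whose length matches dim_names exactly.
dim_names = ['period']                      # periods present, no scenarios
keys = [(2024,), (2025,)]

dim_names_both = ['period', 'scenario']     # both dimensions present
keys_both = [(2024, 'high'), (2024, 'low')]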

2. Eliminated Redundant Dictionaries

  • Removed 3 redundant dicts (occurrences, metrics, tsam_clustering_results)
  • Single aggregation_results dict remains as the source of truth
  • Data is now derived on-demand rather than duplicated

3. Consistent Dimension Order

  • Fixed dimension order for single-slice vs multi-slice DataArray combining
  • Both paths now use an explicit transpose(*base_dims, *extra_dims) (illustrated in the sketch below)
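
For illustration, a minimal sketch of why the explicit transpose matters (toy data; not code from the PR):

import numpy as np
import xarray as xr

base_dims, extra_dims = ['time'], ['period']

slices = [
    xr.DataArray(np.zeros(3), dims=['time'], coords={'time': np.arange(3)})
    .expand_dims(period=[p])
    .to_dataset(name='x')
    for p in (2024, 2025)
]

# The dimension order that expand_dims / combine_by_coords produce is an
# implementation detail; pinning it explicitly makes the single-slice and
# multi-slice paths agree.
combined = xr.combine_by_coords(slices)['x']
result = combined.transpose(*base_dims, *extra_dims)
assert result.dims == ('time', 'period')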

4. Robust Constant Array Handling

  • Filter constant arrays once on full dataset before slicing (not per-slice)
  • Prevents index mismatch errors when different periods have different constant arrays
  • Added warning suppression for expected all-NaN slices (see the sketch below)
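
The suppression presumably follows the standard warnings.catch_warnings pattern; a minimal sketch (not the verbatim flixopt/core.py code):

import warnings
import numpy as np

data = np.full((2, 3), np.nan)  # a slice that is expected to be all-NaN

with warnings.catch_warnings():
    # The All-NaN RuntimeWarning is expected here, so silence it locally
    # rather than filtering it process-wide.
    warnings.filterwarnings('ignore', message='All-NaN slice encountered', category=RuntimeWarning)
    spread = np.nanmax(data, axis=0) - np.nanmin(data, axis=0)  # ptp-style range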

5. Encapsulated Expansion Logic (_Expander class)

New _Expander class that fully encapsulates expansion from clustered to original timesteps:

_Expander
├── __init__(fs, clustering)                  # Pre-computes all shared state
├── _is_state_variable()                      # Category checks
├── _is_first_timestep_variable()
├── _build_segment_total_varnames()           # Backward compatibility
├── _append_final_state()                     # State variable N+1 handling
├── _interpolate_charge_state_segmented()     # Segmented interpolation
├── _expand_first_timestep_only()             # Startup/shutdown expansion
├── expand_dataarray()                        # Single DataArray expansion
├── _combine_intercluster_charge_states()     # SOC boundary handling
├── _apply_soc_decay()                        # Self-discharge decay
└── expand_flow_system()                      # Main entry point

TransformAccessor.expand() is now a thin wrapper:

def expand(self) -> FlowSystem:
    clustering = self._validate_for_expansion()
    expander = _Expander(self._fs, clustering)
    return expander.expand_flow_system()

Files Changed

  • flixopt/transform_accessor.py - Major refactoring with new _Expander class
  • flixopt/clustering/base.py - Dimension order fix, helper simplification
  • flixopt/core.py - Warning suppression for all-NaN slices
  • flixopt/dim_iterator.py - Deleted (no longer needed)

Benefits

  • Clarity: Clean keys eliminate confusion from None sentinels
  • Maintainability: Single source of truth for dimension handling
  • Encapsulation: Expansion logic isolated in dedicated class
  • Testability: _Expander methods can be unit tested independently
  • Performance: State pre-computed once, not per-call

Testing

  • All 98 clustering tests pass (test_cluster_reduce_expand.py)
  • All 43 IO tests pass (test_clustering_io.py)

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

  Created flixopt/dim_iterator.py

  New utility class DimIterator with:
  - Factory methods: from_dataset(), from_flow_system(), from_sentinel_lists()
  - Iteration: iter_slices(ds), iter_keys()
  - Key conversion: key_from_values(p, s), selector_for_key(key)
  - Combination: combine(slices, base_dims, ...), combine_arrays(slices, base_dims, ...)

  Refactored flixopt/clustering/base.py

  1. Added DimIterator import
  2. Removed unused combine_slices() function (~70 lines)
  3. Added _iterator cached property to ClusteringResults
  4. Simplified _build_property_array() from ~35 lines to ~3 lines
  5. Simplified apply() to use selector_for_key() (sketched below)
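
  For illustration, selector_for_key() plausibly reduces to zipping dim_names with a clean key (a sketch, not the actual DimIterator method):

  def selector_for_key(dim_names, key):
      """Build an xarray .sel()-style selector from a clean key."""
      return dict(zip(dim_names, key))

  assert selector_for_key(['period', 'scenario'], (2024, 'high')) == {'period': 2024, 'scenario': 'high'}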

  Refactored flixopt/transform_accessor.py

  1. Added DimIterator import
  2. Refactored _build_cluster_weight_da() to use DimIterator
  3. Refactored _build_segment_durations_da() to use DimIterator
  4. Refactored _build_clustering_metrics() to use DimIterator
  5. Refactored _build_reduced_dataset() to use DimIterator
  6. Refactored _build_cluster_assignments_da() to use DimIterator
  7. Removed _combine_slices_to_dataarray_generic() (~62 lines)
  8. Removed _combine_slices_to_dataarray_2d() (~40 lines)

  Lines of Code Impact

  - Added: ~400 lines (new dim_iterator.py module)
  - Removed: ~170 lines (old combine methods + combine_slices)
  - Net: +230 lines, but the new code is reusable across the codebase and eliminates repetitive patterns

  Benefits

  - Eliminates [None] sentinel value confusion
  - Eliminates 4-way if/elif branching for period/scenario combinations (contrasted in the sketch after this list)
  - Consistent key format throughout (tuples matching dims)
  - Single source of truth for period/scenario iteration logic
  - Easier to add new functionality (e.g., parallel processing)
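
  To make the branching bullet concrete, a hypothetical before/after contrast (the old branches are reconstructed from the description, not copied from the codebase):

  period, scenario = 2024, None

  # Before (reconstructed): one branch per period/scenario combination.
  if period is not None and scenario is not None:
      key = (period, scenario)
  elif period is not None:
      key = (period,)
  elif scenario is not None:
      key = (scenario,)
  else:
      key = ()

  # After: one path; the present dimensions drive both dim_names and the key.
  present = [('period', period), ('scenario', scenario)]
  dim_names = [name for name, value in present if value is not None]
  key = tuple(value for _, value in present if value is not None)
  assert dim_names == ['period'] and key == (2024,)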

  dim_iterator.py - Simplified to a single helper function:
  def add_slice_dims(da, period=None, scenario=None):
      """Add period/scenario dims back to a transformed DataArray."""
      if period is not None:
          da = da.expand_dims(period=[period])
      if scenario is not None:
          da = da.expand_dims(scenario=[scenario])
      return da

  The new pattern used throughout the codebase:
  results = []
  for p in periods:
      for s in scenarios:
          da = transform_data(...)  # Create DataArray for this slice
          results.append(add_slice_dims(da, period=p, scenario=s))

  combined = xr.combine_by_coords(results) if len(results) > 1 else results[0]

  This removes the complex DimIterator class with its key conversion methods, making the code more straightforward and easier to follow.

  Deleted:
  - flixopt/dim_iterator.py - no longer needed

  Key changes:

  1. _build_typical_das now returns dict[str, xr.DataArray] (pre-combined) instead of dict[str, dict[tuple, xr.DataArray]] (nested by keys)
  2. _build_reduced_dataset simplified from ~80 lines to ~35 lines:
    - Removed partial slice handling (was complex, likely unused)
    - No longer iterates over periods/scenarios
    - Just uses pre-combined DataArrays directly
  3. All helper methods now use the same pattern:
  for key, data in results_dict.items():
      da = xr.DataArray(...)
      for dim_name, coord_val in zip(dim_names, key):
          da = da.expand_dims({dim_name: [coord_val]})
      results.append(da)
  combined = xr.combine_by_coords(results)
  4. No more [None] sentinel pattern - keys are clean tuples matching dim_names
  5. Removed add_slice_dims - no longer needed since dims are added inline

  The code is now much simpler: convert keys once at entry, add dims with expand_dims, and combine with xr.combine_by_coords (see the sketch below).
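
  A self-contained sketch of that three-step pipeline on toy data (variable names are illustrative):

  import numpy as np
  import xarray as xr

  dim_names = ['period', 'scenario']
  results_dict = {
      (2024, 'high'): np.array([1.0, 2.0, 3.0]),
      (2024, 'low'): np.array([4.0, 5.0, 6.0]),
  }

  slices = []
  for key, data in results_dict.items():
      da = xr.DataArray(data, dims=['time'], coords={'time': np.arange(3)})
      for dim_name, coord_val in zip(dim_names, key):
          da = da.expand_dims({dim_name: [coord_val]})
      slices.append(da.to_dataset(name='value'))

  combined = xr.combine_by_coords(slices)['value'].transpose('time', *dim_names)
  assert combined.dims == ('time', 'period', 'scenario')
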
@coderabbitai bot commented Jan 27, 2026

📝 Walkthrough

Removed legacy slice-combiner and replaced per-(period,scenario) stacking with per-key xarray.DataArray expansion/merge; updated ClusteringResults serialization/selectors and propagated generic dim_names through transform builders; tests for the removed utility were deleted and one IO assertion was relaxed.

Changes

  • Core clustering logic (flixopt/clustering/base.py): Removed combine_slices(); _build_property_array() now builds per-key xarray.DataArray slices and merges them with xr.combine_by_coords; _key_to_str now returns '__single__' for empty keys; added _str_to_key() and __repr__(); apply() now constructs selectors using dim_names.
  • Transform accessor & multi-dim utilities (flixopt/transform_accessor.py): Added _expand_dims_for_key() and _combine_dataarray_slices(); unified period/scenario handling into a generic dim_names; updated builders (_build_cluster_weight_da, _build_typical_das, _build_segment_durations_da, _build_clustering_metrics, _build_reduced_flow_system, _build_reduced_dataset) to accept and produce DataArrays with *dim_names; propagated dim_names across cluster/reduce/expand flows.
  • Tests, legacy combine_slices suite (tests/test_cluster_reduce_expand.py): Deleted the TestCombineSlices suite and its assertions validating combine_slices behaviour (coords, attrs, datetime handling).
  • Tests, IO assertion tolerance (tests/test_clustering_io.py): Relaxed the exact ordering assertion for the scenario index to a set-based comparison, tolerating ordering changes after a roundtrip.
  • Minor warning handling (flixopt/core.py): Replaced the NumPy warnings filter with warnings.catch_warnings to suppress the specific "All-NaN slice" RuntimeWarning during ptp computation.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant ClusteringResults
  participant TransformAccessor
  participant XArray as "xarray.combine_by_coords"
  participant Reduced as "Reduced Dataset"

  Caller->>ClusteringResults: build/apply(aggregation_results, dim_names)
  ClusteringResults->>ClusteringResults: _build_property_array -> per-key DataArrays
  ClusteringResults->>TransformAccessor: send per-key DataArrays + dim_names
  TransformAccessor->>TransformAccessor: _expand_dims_for_key per key
  TransformAccessor->>XArray: _combine_dataarray_slices (xr.combine_by_coords)
  XArray-->>TransformAccessor: merged DataArray(s)
  TransformAccessor->>Reduced: assemble reduced dataset / typical_das with *dim_names
  Reduced-->>Caller: return reduced dataset

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 I hop through dims both wide and small,
Keys unfurl and every slice finds place,
Combine_slices gone — per-key merges for all,
Coordinates line up in tidy grace,
I nibble code and leave a happy trace.

🚥 Pre-merge checks | ✅ 3 passed

  • Docstring Coverage: ✅ Passed. Docstring coverage is 92.00%, above the required threshold of 80.00%.
  • Title check: ✅ Passed. The title concisely summarizes the main refactoring: cleaner keys, unified dimension handling, and better encapsulation through helper functions.
  • Description check: ✅ Passed. The description is comprehensive and well-structured, covering all major changes with clear sections, specific examples, benefits, and test results matching the template format.



  Changes made:

  1. cluster() method:
    - Added dim_names list and to_clean_key() helper at the start (sketched below)
    - Dicts now use clean keys like (2024,) or (2024, 'high') instead of (2024, None) or (None, 'high')
    - Removed duplicate dim_names building and to_cr_key() function from the data_vars branch
    - Simplified data_vars branch since ClusteringResults can receive the dicts directly
    - Updated call to _build_reduced_flow_system to use new parameter names (aggregation_results, occurrences, metrics, dim_names)
  2. apply_clustering() method:
    - Uses dim_names directly from clustering.results.dim_names
    - Removed the key conversion logic (full_key = (cr_key[0], None) etc.)
    - Updated call to _build_reduced_flow_system with new parameter names

  The code is now cleaner with consistent use of clean keys throughout the clustering workflow. All 141 tests pass (98 in test_cluster_reduce_expand.py + 43 in test_clustering_io.py).
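
  A plausible shape for the to_clean_key() helper mentioned above (inferred from the examples; not the verbatim implementation):

  def to_clean_key(period, scenario):
      """Drop absent dimensions so keys match dim_names, e.g. (2024, None) -> (2024,)."""
      return tuple(v for v in (period, scenario) if v is not None)

  assert to_clean_key(2024, None) == (2024,)
  assert to_clean_key(None, 'high') == ('high',)
  assert to_clean_key(2024, 'high') == (2024, 'high')
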
@FBumann changed the title from "Feature/rework clustering" to "Simplify clustering iteration with clean keys and dim_names" on Jan 27, 2026

@coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flixopt/transform_accessor.py (1)

316-341: Preserve extra dims even in the single-slice metrics case.

When dim_names is non-empty but only one slice exists, the current code returns metrics with dims ['time_series'] only, whereas the multi-slice path adds period/scenario dimensions via _expand_dims_for_key and _combine_dataarray_slices. This creates shape inconsistency that breaks downstream selection logic expecting consistent dimensions across all scenarios.

🛠️ Suggested fix
-        # Single slice case
-        if len(clustering_metrics_all) == 1 and len(non_empty_metrics) == 1:
-            metrics_df = next(iter(non_empty_metrics.values()))
-            return xr.Dataset(
-                {
-                    col: xr.DataArray(
-                        metrics_df[col].values,
-                        dims=['time_series'],
-                        coords={'time_series': metrics_df.index},
-                    )
-                    for col in metrics_df.columns
-                }
-            )
+        # Single slice case (still keep dim_names for consistency)
+        if len(clustering_metrics_all) == 1 and len(non_empty_metrics) == 1:
+            key, metrics_df = next(iter(non_empty_metrics.items()))
+            data_vars = {}
+            for col in metrics_df.columns:
+                da = xr.DataArray(
+                    metrics_df[col].values,
+                    dims=['time_series'],
+                    coords={'time_series': metrics_df.index},
+                )
+                da = _expand_dims_for_key(da, dim_names, key)
+                data_vars[col] = _combine_dataarray_slices([da], ['time_series'], dim_names, name=col)
+            return xr.Dataset(data_vars)
🤖 Fix all issues with AI agents
In `@flixopt/clustering/base.py`:
- Around line 485-498: Single-slice results skip the final transpose causing
inconsistent dimension order: after building the single DataArray in the loop
(uses get_data(cr), base_dims, base_coords, name and expands dims via
self._dim_names), apply the same transpose as the multi-slice path before
returning; i.e., when len(slices) == 1, call .transpose(*base_dims,
*self._dim_names) on slices[0] (or assign and then transpose) so the
single-slice DataArray has identical dimension ordering to the combined branch.

In `@flixopt/transform_accessor.py`:
- Around line 48-57: The single-slice branch currently skips applying result =
result.transpose(*base_dims, *extra_dims), causing incorrect dim order; after
the if-else that sets result (the branch that checks isinstance(combined,
xr.Dataset) and the else that assigns combined or slices[0]), move or add a
single transpose call so that result.transpose(*base_dims, *extra_dims) is
executed for both the single-slice and combined-slice paths (refer to variables
slices, combined, result, base_dims, extra_dims and the xr.combine_by_coords
logic).

…g_results)

  - Single aggregation_results dict remains (required for Clustering)
…le-slice path

  2. transform_accessor.py: _combine_dataarray_slices now transposes for both paths

@coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flixopt/transform_accessor.py (1)

301-352: Ensure metrics keep extra dims in single-slice case.

When dim_names is non-empty but only one slice exists, the single-slice path drops those dims, unlike other arrays that keep [time_series, *dim_names]. This can break downstream selection or alignment.

🛠️ Proposed fix
-        # Single slice case
-        if len(metrics_dfs) == 1 and len(non_empty_metrics) == 1:
-            metrics_df = next(iter(non_empty_metrics.values()))
-            return xr.Dataset(
-                {
-                    col: xr.DataArray(
-                        metrics_df[col].values,
-                        dims=['time_series'],
-                        coords={'time_series': metrics_df.index},
-                    )
-                    for col in metrics_df.columns
-                }
-            )
+        # Single slice case: keep extra dims for consistency
+        if len(metrics_dfs) == 1 and len(non_empty_metrics) == 1:
+            key, metrics_df = next(iter(non_empty_metrics.items()))
+            data_vars = {}
+            for col in metrics_df.columns:
+                da = xr.DataArray(
+                    metrics_df[col].values,
+                    dims=['time_series'],
+                    coords={'time_series': metrics_df.index},
+                )
+                da = _expand_dims_for_key(da, dim_names, key)
+                data_vars[col] = _combine_dataarray_slices([da], ['time_series'], dim_names, name=col)
+            return xr.Dataset(data_vars)

…n logic:

  1. Pre-computed state in __init__:
    - Clustering dimensions (timesteps_per_cluster, n_segments, n_clusters, etc.)
    - Timesteps (original_timesteps, original_timesteps_extra)
    - Variable category sets (state_vars, first_timestep_vars, segment_total_vars)
    - Expansion divisor for segmented systems
  2. Methods moved from nested functions/helpers:
    - _is_state_variable() - checks if variable needs interpolation
    - _is_first_timestep_variable() - checks if variable is startup/shutdown
    - _build_segment_total_varnames() - backward compatibility fallback
    - _append_final_state() - appends N+1th state value
    - _interpolate_charge_state_segmented() - interpolates within segments
    - _expand_first_timestep_only() - places events at first timestep of segment
    - expand_dataarray() - main expansion logic (was expand_da)
  3. Simplified TransformAccessor.expand():
    - Creates _Expander instance with pre-computed state
    - Uses expander.expand_dataarray() instead of nested closure
    - Accesses expander attributes for logging

  Benefits:
  - State is pre-computed once, not on every call
  - Methods are testable in isolation
  - No more nested function closures capturing external state
  - Clear separation: _Expander handles expansion, TransformAccessor orchestrates
  _Expander
  ├── __init__(fs, clustering)           # Pre-computes all shared state
  ├── _is_state_variable()               # Category checks
  ├── _is_first_timestep_variable()
  ├── _build_segment_total_varnames()    # Backward compatibility fallback
  ├── _append_final_state()              # State variable N+1 handling
  ├── _interpolate_charge_state_segmented()   # Segmented charge_state interpolation
  ├── _expand_first_timestep_only()      # Startup/shutdown expansion
  ├── expand_dataarray()                 # Single DataArray expansion
  ├── _fast_get_da()                     # Performance helper
  ├── _combine_intercluster_charge_states()   # SOC boundary handling
  ├── _apply_soc_decay()                 # Self-discharge decay
  └── expand_flow_system()               # Main entry point - returns expanded FlowSystem

  TransformAccessor.expand() (lines 1944-2048) - thin wrapper:

  def expand(self) -> FlowSystem:
      """...(docstring preserved)..."""
      clustering = self._validate_for_expansion()
      expander = _Expander(self._fs, clustering)
      return expander.expand_flow_system()

  Benefits:
  - Complete encapsulation: all expansion logic in one class
  - Self-contained: _Expander doesn't depend on TransformAccessor internals
  - Clean separation: accessor validates, expander does the work
  - Testable: _Expander could be unit tested independently
  - Simpler signatures: methods use instance state instead of passing many parameters
@FBumann changed the title from "Simplify clustering iteration with clean keys and dim_names" to "Refactor clustering: clean keys, unified dims, and encapsulated expansion" on Jan 28, 2026
@FBumann merged commit 5d91402 into main on Jan 28, 2026
8 checks passed