Conversation

@FBumann commented Jan 27, 2026

Summary

Major refactoring of the clustering and expansion logic to improve code clarity, maintainability, and encapsulation.

1. Clean Keys and Unified Dimensions

  • Use clean keys (e.g., (2024,) or (2024, 'high')) instead of sentinel patterns with None (see the sketch after this list)
  • Replace the separate periods and scenarios parameters with a unified dim_names list
  • Consolidate repeated expand_dims + combine_by_coords patterns into helper functions
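
For illustration, a minimal sketch of the clean-key idea (toy values, not code from the PR):

# Before: sentinel keys carried None placeholders for absent dimensions.
old_keys = [(2024, None), (None, 'high')]

# After: dim_names lists only the dimensions that actually exist, and each
# key is a tuple whose length matches dim_names exactly.
dim_names = ['period']                      # periods present, no scenarios
keys = [(2024,), (2025,)]

dim_names_both = ['period', 'scenario']     # both dimensions present
keys_both = [(2024, 'high'), (2024, 'low')]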

2. Eliminated Redundant Dictionaries

  • Removed 3 redundant dicts (occurrences, metrics, tsam_clustering_results)
  • Single aggregation_results dict remains as the source of truth
  • Data is now derived on-demand rather than duplicated

3. Consistent Dimension Order

  • Fixed dimension order for single-slice vs multi-slice DataArray combining
  • Both paths now use an explicit transpose(*base_dims, *extra_dims) (illustrated in the sketch below)
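
For illustration, a minimal sketch of why the explicit transpose matters (toy data; not code from the PR):

import numpy as np
import xarray as xr

base_dims, extra_dims = ['time'], ['period']

slices = [
    xr.DataArray(np.zeros(3), dims=['time'], coords={'time': np.arange(3)})
    .expand_dims(period=[p])
    .to_dataset(name='x')
    for p in (2024, 2025)
]

# The dimension order that expand_dims / combine_by_coords produce is an
# implementation detail; pinning it explicitly makes the single-slice and
# multi-slice paths agree.
combined = xr.combine_by_coords(slices)['x']
result = combined.transpose(*base_dims, *extra_dims)
assert result.dims == ('time', 'period')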

4. Robust Constant Array Handling

  • Filter constant arrays once on full dataset before slicing (not per-slice)
  • Prevents index mismatch errors when different periods have different constant arrays
  • Added warning suppression for expected all-NaN slices (see the sketch below)
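
The suppression presumably follows the standard warnings.catch_warnings pattern; a minimal sketch (not the verbatim flixopt/core.py code):

import warnings
import numpy as np

data = np.full((2, 3), np.nan)  # a slice that is expected to be all-NaN

with warnings.catch_warnings():
    # The All-NaN RuntimeWarning is expected here, so silence it locally
    # rather than filtering it process-wide.
    warnings.filterwarnings('ignore', message='All-NaN slice encountered', category=RuntimeWarning)
    spread = np.nanmax(data, axis=0) - np.nanmin(data, axis=0)  # ptp-style range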

5. Encapsulated Expansion Logic (_Expander class)

New _Expander class that fully encapsulates expansion from clustered to original timesteps:

_Expander
├── __init__(fs, clustering)                  # Pre-computes all shared state
├── _is_state_variable()                      # Category checks
├── _is_first_timestep_variable()
├── _build_segment_total_varnames()           # Backward compatibility
├── _append_final_state()                     # State variable N+1 handling
├── _interpolate_charge_state_segmented()     # Segmented interpolation
├── _expand_first_timestep_only()             # Startup/shutdown expansion
├── expand_dataarray()                        # Single DataArray expansion
├── _combine_intercluster_charge_states()     # SOC boundary handling
├── _apply_soc_decay()                        # Self-discharge decay
└── expand_flow_system()                      # Main entry point

TransformAccessor.expand() is now a thin wrapper:

def expand(self) -> FlowSystem:
    clustering = self._validate_for_expansion()
    expander = _Expander(self._fs, clustering)
    return expander.expand_flow_system()

Files Changed

  • flixopt/transform_accessor.py - Major refactoring with new _Expander class
  • flixopt/clustering/base.py - Dimension order fix, helper simplification
  • flixopt/core.py - Warning suppression for all-NaN slices
  • flixopt/dim_iterator.py - Deleted (no longer needed)

Benefits

  • Clarity: Clean keys eliminate confusion from None sentinels
  • Maintainability: Single source of truth for dimension handling
  • Encapsulation: Expansion logic isolated in dedicated class
  • Testability: _Expander methods can be unit tested independently
  • Performance: State pre-computed once, not per-call

Testing

  • All 98 clustering tests pass (test_cluster_reduce_expand.py)
  • All 43 IO tests pass (test_clustering_io.py)

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

  Created flixopt/dim_iterator.py

  New utility class DimIterator with:
  - Factory methods: from_dataset(), from_flow_system(), from_sentinel_lists()
  - Iteration: iter_slices(ds), iter_keys()
  - Key conversion: key_from_values(p, s), selector_for_key(key)
  - Combination: combine(slices, base_dims, ...), combine_arrays(slices, base_dims, ...)

  Refactored flixopt/clustering/base.py

  1. Added DimIterator import
  2. Removed unused combine_slices() function (~70 lines)
  3. Added _iterator cached property to ClusteringResults
  4. Simplified _build_property_array() from ~35 lines to ~3 lines
  5. Simplified apply() to use selector_for_key() (sketched below)
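
  For illustration, selector_for_key() plausibly reduces to zipping dim_names with a clean key (a sketch, not the actual DimIterator method):

  def selector_for_key(dim_names, key):
      """Build an xarray .sel()-style selector from a clean key."""
      return dict(zip(dim_names, key))

  assert selector_for_key(['period', 'scenario'], (2024, 'high')) == {'period': 2024, 'scenario': 'high'}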

  Refactored flixopt/transform_accessor.py

  1. Added DimIterator import
  2. Refactored _build_cluster_weight_da() to use DimIterator
  3. Refactored _build_segment_durations_da() to use DimIterator
  4. Refactored _build_clustering_metrics() to use DimIterator
  5. Refactored _build_reduced_dataset() to use DimIterator
  6. Refactored _build_cluster_assignments_da() to use DimIterator
  7. Removed _combine_slices_to_dataarray_generic() (~62 lines)
  8. Removed _combine_slices_to_dataarray_2d() (~40 lines)

  Lines of Code Impact

  - Added: ~400 lines (new dim_iterator.py module)
  - Removed: ~170 lines (old combine methods + combine_slices)
  - Net: +230 lines, but the new code is reusable across the codebase and eliminates repetitive patterns

  Benefits

  - Eliminates [None] sentinel value confusion
  - Eliminates 4-way if/elif branching for period/scenario combinations (contrasted in the sketch after this list)
  - Consistent key format throughout (tuples matching dims)
  - Single source of truth for period/scenario iteration logic
  - Easier to add new functionality (e.g., parallel processing)
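
  To make the branching bullet concrete, a hypothetical before/after contrast (the old branches are reconstructed from the description, not copied from the codebase):

  period, scenario = 2024, None

  # Before (reconstructed): one branch per period/scenario combination.
  if period is not None and scenario is not None:
      key = (period, scenario)
  elif period is not None:
      key = (period,)
  elif scenario is not None:
      key = (scenario,)
  else:
      key = ()

  # After: one path; the present dimensions drive both dim_names and the key.
  present = [('period', period), ('scenario', scenario)]
  dim_names = [name for name, value in present if value is not None]
  key = tuple(value for _, value in present if value is not None)
  assert dim_names == ['period'] and key == (2024,)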

  dim_iterator.py - Simplified to a single helper function:
  def add_slice_dims(da, period=None, scenario=None):
      """Add period/scenario dims back to a transformed DataArray."""
      if period is not None:
          da = da.expand_dims(period=[period])
      if scenario is not None:
          da = da.expand_dims(scenario=[scenario])
      return da

  The new pattern used throughout the codebase:
  results = []
  for p in periods:
      for s in scenarios:
          da = transform_data(...)  # Create DataArray for this slice
          results.append(add_slice_dims(da, period=p, scenario=s))

  combined = xr.combine_by_coords(results) if len(results) > 1 else results[0]

  This removes the complex DimIterator class with its key conversion methods, making the code more straightforward and easier to follow.

  Deleted:
  - flixopt/dim_iterator.py - no longer needed

  Key changes:

  1. _build_typical_das now returns dict[str, xr.DataArray] (pre-combined) instead of dict[str, dict[tuple, xr.DataArray]] (nested by keys)
  2. _build_reduced_dataset simplified from ~80 lines to ~35 lines:
    - Removed partial slice handling (was complex, likely unused)
    - No longer iterates over periods/scenarios
    - Just uses pre-combined DataArrays directly
  3. All helper methods now use the same pattern:
  for key, data in results_dict.items():
      da = xr.DataArray(...)
      for dim_name, coord_val in zip(dim_names, key):
          da = da.expand_dims({dim_name: [coord_val]})
      results.append(da)
  combined = xr.combine_by_coords(results)
  4. No more [None] sentinel pattern - keys are clean tuples matching dim_names
  5. Removed add_slice_dims - no longer needed since dims are added inline

  The code is now much simpler: convert keys once at entry, add dims with expand_dims, and combine with xr.combine_by_coords (see the sketch below).
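
  A self-contained sketch of that three-step pipeline on toy data (variable names are illustrative):

  import numpy as np
  import xarray as xr

  dim_names = ['period', 'scenario']
  results_dict = {
      (2024, 'high'): np.array([1.0, 2.0, 3.0]),
      (2024, 'low'): np.array([4.0, 5.0, 6.0]),
  }

  slices = []
  for key, data in results_dict.items():
      da = xr.DataArray(data, dims=['time'], coords={'time': np.arange(3)})
      for dim_name, coord_val in zip(dim_names, key):
          da = da.expand_dims({dim_name: [coord_val]})
      slices.append(da.to_dataset(name='value'))

  combined = xr.combine_by_coords(slices)['value'].transpose('time', *dim_names)
  assert combined.dims == ('time', 'period', 'scenario')
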
@coderabbitai bot commented Jan 27, 2026

📝 Walkthrough

Removed legacy slice-combiner and replaced per-(period,scenario) stacking with per-key xarray.DataArray expansion/merge; updated ClusteringResults serialization/selectors and propagated generic dim_names through transform builders; tests for the removed utility were deleted and one IO assertion was relaxed.

Changes

  • Core clustering logic (flixopt/clustering/base.py): Removed combine_slices(); _build_property_array() now builds per-key xarray.DataArray slices and merges them with xr.combine_by_coords; _key_to_str now returns '__single__' for empty keys; added _str_to_key() and __repr__(); apply() now constructs selectors using dim_names.
  • Transform accessor & multi-dim utilities (flixopt/transform_accessor.py): Added _expand_dims_for_key() and _combine_dataarray_slices(); unified period/scenario handling into a generic dim_names; updated builders (_build_cluster_weight_da, _build_typical_das, _build_segment_durations_da, _build_clustering_metrics, _build_reduced_flow_system, _build_reduced_dataset) to accept and produce DataArrays with *dim_names; propagated dim_names across cluster/reduce/expand flows.
  • Tests, legacy combine_slices suite (tests/test_cluster_reduce_expand.py): Deleted the TestCombineSlices suite and its assertions validating combine_slices behaviour (coords, attrs, datetime handling).
  • Tests, IO assertion tolerance (tests/test_clustering_io.py): Relaxed the exact ordering assertion for the scenario index to a set-based comparison, tolerating ordering changes after a roundtrip.
  • Minor warning handling (flixopt/core.py): Replaced the NumPy warnings filter with warnings.catch_warnings to suppress the specific "All-NaN slice" RuntimeWarning during ptp computation.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant ClusteringResults
  participant TransformAccessor
  participant XArray as "xarray.combine_by_coords"
  participant Reduced as "Reduced Dataset"

  Caller->>ClusteringResults: build/apply(aggregation_results, dim_names)
  ClusteringResults->>ClusteringResults: _build_property_array -> per-key DataArrays
  ClusteringResults->>TransformAccessor: send per-key DataArrays + dim_names
  TransformAccessor->>TransformAccessor: _expand_dims_for_key per key
  TransformAccessor->>XArray: _combine_dataarray_slices (xr.combine_by_coords)
  XArray-->>TransformAccessor: merged DataArray(s)
  TransformAccessor->>Reduced: assemble reduced dataset / typical_das with *dim_names
  Reduced-->>Caller: return reduced dataset

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 I hop through dims both wide and small,
Keys unfurl and every slice finds place,
Combine_slices gone — per-key merges for all,
Coordinates line up in tidy grace,
I nibble code and leave a happy trace.

🚥 Pre-merge checks | ✅ 3 passed

  • Docstring Coverage: ✅ Passed. Docstring coverage is 92.00%, above the required threshold of 80.00%.
  • Title check: ✅ Passed. The title concisely summarizes the main refactoring: cleaner keys, unified dimension handling, and better encapsulation through helper functions.
  • Description check: ✅ Passed. The description is comprehensive and well-structured, covering all major changes with clear sections, specific examples, benefits, and test results matching the template format.



  Changes made:

  1. cluster() method:
    - Added dim_names list and to_clean_key() helper at the start (sketched below)
    - Dicts now use clean keys like (2024,) or (2024, 'high') instead of (2024, None) or (None, 'high')
    - Removed duplicate dim_names building and to_cr_key() function from the data_vars branch
    - Simplified data_vars branch since ClusteringResults can receive the dicts directly
    - Updated call to _build_reduced_flow_system to use new parameter names (aggregation_results, occurrences, metrics, dim_names)
  2. apply_clustering() method:
    - Uses dim_names directly from clustering.results.dim_names
    - Removed the key conversion logic (full_key = (cr_key[0], None) etc.)
    - Updated call to _build_reduced_flow_system with new parameter names

  The code is now cleaner with consistent use of clean keys throughout the clustering workflow. All 141 tests pass (98 in test_cluster_reduce_expand.py + 43 in test_clustering_io.py).
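
  A plausible shape for the to_clean_key() helper mentioned above (inferred from the examples; not the verbatim implementation):

  def to_clean_key(period, scenario):
      """Drop absent dimensions so keys match dim_names, e.g. (2024, None) -> (2024,)."""
      return tuple(v for v in (period, scenario) if v is not None)

  assert to_clean_key(2024, None) == (2024,)
  assert to_clean_key(None, 'high') == ('high',)
  assert to_clean_key(2024, 'high') == (2024, 'high')
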
@FBumann changed the title from "Feature/rework clustering" to "Simplify clustering iteration with clean keys and dim_names" on Jan 27, 2026

@coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flixopt/transform_accessor.py (1)

316-341: Preserve extra dims even in the single-slice metrics case.

When dim_names is non-empty but only one slice exists, the current code returns metrics with dims ['time_series'] only, whereas the multi-slice path adds period/scenario dimensions via _expand_dims_for_key and _combine_dataarray_slices. This creates shape inconsistency that breaks downstream selection logic expecting consistent dimensions across all scenarios.

🛠️ Suggested fix
-        # Single slice case
-        if len(clustering_metrics_all) == 1 and len(non_empty_metrics) == 1:
-            metrics_df = next(iter(non_empty_metrics.values()))
-            return xr.Dataset(
-                {
-                    col: xr.DataArray(
-                        metrics_df[col].values,
-                        dims=['time_series'],
-                        coords={'time_series': metrics_df.index},
-                    )
-                    for col in metrics_df.columns
-                }
-            )
+        # Single slice case (still keep dim_names for consistency)
+        if len(clustering_metrics_all) == 1 and len(non_empty_metrics) == 1:
+            key, metrics_df = next(iter(non_empty_metrics.items()))
+            data_vars = {}
+            for col in metrics_df.columns:
+                da = xr.DataArray(
+                    metrics_df[col].values,
+                    dims=['time_series'],
+                    coords={'time_series': metrics_df.index},
+                )
+                da = _expand_dims_for_key(da, dim_names, key)
+                data_vars[col] = _combine_dataarray_slices([da], ['time_series'], dim_names, name=col)
+            return xr.Dataset(data_vars)
🤖 Fix all issues with AI agents
In `@flixopt/clustering/base.py`:
- Around line 485-498: Single-slice results skip the final transpose causing
inconsistent dimension order: after building the single DataArray in the loop
(uses get_data(cr), base_dims, base_coords, name and expands dims via
self._dim_names), apply the same transpose as the multi-slice path before
returning; i.e., when len(slices) == 1, call .transpose(*base_dims,
*self._dim_names) on slices[0] (or assign and then transpose) so the
single-slice DataArray has identical dimension ordering to the combined branch.

In `@flixopt/transform_accessor.py`:
- Around line 48-57: The single-slice branch currently skips applying result =
result.transpose(*base_dims, *extra_dims), causing incorrect dim order; after
the if-else that sets result (the branch that checks isinstance(combined,
xr.Dataset) and the else that assigns combined or slices[0]), move or add a
single transpose call so that result.transpose(*base_dims, *extra_dims) is
executed for both the single-slice and combined-slice paths (refer to variables
slices, combined, result, base_dims, extra_dims and the xr.combine_by_coords
logic).

…g_results)

  - Single aggregation_results dict remains (required for Clustering)
…le-slice path

  2. transform_accessor.py: _combine_dataarray_slices now transposes for both paths

@coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flixopt/transform_accessor.py (1)

301-352: Ensure metrics keep extra dims in single-slice case.

When dim_names is non-empty but only one slice exists, the single-slice path drops those dims, unlike other arrays that keep [time_series, *dim_names]. This can break downstream selection or alignment.

🛠️ Proposed fix
-        # Single slice case
-        if len(metrics_dfs) == 1 and len(non_empty_metrics) == 1:
-            metrics_df = next(iter(non_empty_metrics.values()))
-            return xr.Dataset(
-                {
-                    col: xr.DataArray(
-                        metrics_df[col].values,
-                        dims=['time_series'],
-                        coords={'time_series': metrics_df.index},
-                    )
-                    for col in metrics_df.columns
-                }
-            )
+        # Single slice case: keep extra dims for consistency
+        if len(metrics_dfs) == 1 and len(non_empty_metrics) == 1:
+            key, metrics_df = next(iter(non_empty_metrics.items()))
+            data_vars = {}
+            for col in metrics_df.columns:
+                da = xr.DataArray(
+                    metrics_df[col].values,
+                    dims=['time_series'],
+                    coords={'time_series': metrics_df.index},
+                )
+                da = _expand_dims_for_key(da, dim_names, key)
+                data_vars[col] = _combine_dataarray_slices([da], ['time_series'], dim_names, name=col)
+            return xr.Dataset(data_vars)

…n logic:

  1. Pre-computed state in __init__:
    - Clustering dimensions (timesteps_per_cluster, n_segments, n_clusters, etc.)
    - Timesteps (original_timesteps, original_timesteps_extra)
    - Variable category sets (state_vars, first_timestep_vars, segment_total_vars)
    - Expansion divisor for segmented systems
  2. Methods moved from nested functions/helpers:
    - _is_state_variable() - checks if variable needs interpolation
    - _is_first_timestep_variable() - checks if variable is startup/shutdown
    - _build_segment_total_varnames() - backward compatibility fallback
    - _append_final_state() - appends N+1th state value
    - _interpolate_charge_state_segmented() - interpolates within segments
    - _expand_first_timestep_only() - places events at first timestep of segment
    - expand_dataarray() - main expansion logic (was expand_da)
  3. Simplified TransformAccessor.expand():
    - Creates _Expander instance with pre-computed state
    - Uses expander.expand_dataarray() instead of nested closure
    - Accesses expander attributes for logging

  Benefits:
  - State is pre-computed once, not on every call
  - Methods are testable in isolation
  - No more nested function closures capturing external state
  - Clear separation: _Expander handles expansion, TransformAccessor orchestrates
  _Expander
  ├── __init__(fs, clustering)           # Pre-computes all shared state
  ├── _is_state_variable()               # Category checks
  ├── _is_first_timestep_variable()
  ├── _build_segment_total_varnames()    # Backward compatibility fallback
  ├── _append_final_state()              # State variable N+1 handling
  ├── _interpolate_charge_state_segmented()   # Segmented charge_state interpolation
  ├── _expand_first_timestep_only()      # Startup/shutdown expansion
  ├── expand_dataarray()                 # Single DataArray expansion
  ├── _fast_get_da()                     # Performance helper
  ├── _combine_intercluster_charge_states()   # SOC boundary handling
  ├── _apply_soc_decay()                 # Self-discharge decay
  └── expand_flow_system()               # Main entry point - returns expanded FlowSystem

  TransformAccessor.expand() (lines 1944-2048) - thin wrapper:

  def expand(self) -> FlowSystem:
      """...(docstring preserved)..."""
      clustering = self._validate_for_expansion()
      expander = _Expander(self._fs, clustering)
      return expander.expand_flow_system()

  Benefits:
  - Complete encapsulation: all expansion logic in one class
  - Self-contained: _Expander doesn't depend on TransformAccessor internals
  - Clean separation: accessor validates, expander does the work
  - Testable: _Expander could be unit tested independently
  - Simpler signatures: methods use instance state instead of passing many parameters
@FBumann changed the title from "Simplify clustering iteration with clean keys and dim_names" to "Refactor clustering: clean keys, unified dims, and encapsulated expansion" on Jan 28, 2026
@FBumann merged commit 5d91402 into main on Jan 28, 2026
8 checks passed