
fix: validate user-provided clustering weight keys#627

Merged
FBumann merged 3 commits into main from validate-clustering-weight-keys
Mar 24, 2026

Conversation


@FBumann FBumann commented Mar 24, 2026

Summary

  • Raise ValueError when ClusterConfig.weights references variables that don't exist in the FlowSystem, catching typos early
  • Weights for constant (but valid) variables are still silently filtered — they exist in the system, they're just not useful for clustering
  • Updated tests to cover both unknown-key rejection and constant-column filtering
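The unknown-key check described above can be sketched roughly as follows. This is a minimal sketch with illustrative names (`validate_weight_keys`, `available_columns`); the actual check lives inside `TransformAccessor.cluster()`:

```python
# Minimal sketch of the unknown-key validation; names are illustrative,
# the real check is implemented inside TransformAccessor.cluster().
def validate_weight_keys(weights, available_columns):
    """Raise ValueError if any weight key is not an available clustering variable."""
    if weights is None:
        return
    unknown = set(weights) - set(available_columns)
    if unknown:
        raise ValueError(
            f'ClusterConfig.weights contains unknown variables: {sorted(unknown)}. '
            'See transform.clustering_data() for the available clustering variables.'
        )
```

A typo such as `'heat_demnd'` then fails immediately with a message listing the offending keys, instead of being silently ignored.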

Test plan

  • test_unknown_weight_keys_raise — bogus keys raise ValueError
  • test_extra_weight_keys_filtered_with_constant_column — constant columns silently filtered
  • test_unknown_weight_keys_raise_multiperiod — bogus key in multi-period raises
  • test_valid_weight_keys_multiperiod — valid keys in multi-period work
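The constant-column behavior that the filtering tests rely on can be illustrated with a small sketch. The helper name `filter_constant_weights` is hypothetical; in flixopt the filtering happens when `drop_constant_arrays()` strips constant variables from the clustering dataset:

```python
# Hypothetical sketch of constant-column filtering: columns whose values never
# change carry no clustering information, so their weights are dropped silently.
def filter_constant_weights(weights, data):
    varying = {name for name, values in data.items() if len(set(values)) > 1}
    return {key: w for key, w in weights.items() if key in varying}

data = {'demand': [1.0, 2.0, 3.0], 'constant_profile': [5.0, 5.0, 5.0]}
weights = {'demand': 2.0, 'constant_profile': 1.0}
print(filter_constant_weights(weights, data))  # {'demand': 2.0}
```

The weight for `constant_profile` is dropped without an error because the variable exists in the system; only references to variables that do not exist at all are rejected.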

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved weight validation for clustering: unknown weight keys are now detected and rejected with clear error messages that point users to the available clustering variables; validation runs early to prevent confusing downstream failures. Single-period clustering still silently ignores constant columns.
  • Tests

    • Updated tests to cover error handling for invalid weight keys and valid-weight success cases across single- and multi-period clustering.

Raise ValueError when ClusterConfig.weights references variables that
don't exist in the FlowSystem, catching typos early with a clear message.
Weights for constant (but valid) variables are still silently filtered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Mar 24, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 88efbe94-5c2e-44c2-a81b-82cb3676d42c

📥 Commits

Reviewing files that changed from the base of the PR and between 2af82d7 and df746c2.

📒 Files selected for processing (1)
  • tests/test_clustering/test_integration.py
📝 Walkthrough

Added validation in TransformAccessor.cluster() to verify that any non-None cluster.weights keys exist in the selected clustering dataset's variables; unknown keys now cause a ValueError listing the unknown keys and pointing to transform.clustering_data(). The check runs before the dataset-empty guard.

Changes

  • Weight Key Validation (flixopt/transform_accessor.py): Added a pre-check in TransformAccessor.cluster() that computes unknown keys from cluster.weights against the clustering input dataset's variables and raises a ValueError listing the unknown keys and referencing transform.clustering_data(). The check runs before the empty-dataset guard.
  • Clustering Tests (tests/test_clustering/test_integration.py): Replaced tests that previously allowed unknown weight keys with tests expecting ValueError (test_unknown_weight_keys_raise and its multi-period variant), added a passing multi-period test limited to valid weight keys, and updated the single-period constant-column handling assertions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title 'fix: validate user-provided clustering weight keys' clearly and concisely summarizes the main change of adding validation for clustering weight keys.
  • Description check: ✅ Passed. The PR description covers the key changes (validation logic, constant-column handling) and includes a comprehensive test plan, though it deviates from the template structure with a custom summary and test-plan format.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1


⚠️ Outside diff range comments (1)
tests/test_clustering/test_integration.py (1)

399-405: ⚠️ Potential issue | 🟠 Major

This test never exercises the dropped-constant-column path.

Line 400 calls transform.clustering_data(), which already strips constant arrays, so weights is built only from varying columns. The test will still pass even if constant-column filtering in cluster() breaks. Pull the constant time-series name from fs.to_dataset(include_solution=False) (or add the known constant profile key explicitly) before invoking cluster().

Suggested change
-        # The constant column name (find it from clustering_data)
-        all_data = fs.transform.clustering_data()
-        all_columns = set(all_data.data_vars)
+        full_data = fs.to_dataset(include_solution=False)
+        varying_columns = set(fs.transform.clustering_data().data_vars)
+        constant_columns = {
+            name for name, da in full_data.data_vars.items() if 'time' in da.dims and name not in varying_columns
+        }
+        assert constant_columns, 'test setup failed: expected at least one constant time series'

         # Build weights that reference ALL columns including the constant one
         # that will be dropped — these are valid variables, just constant over time
-        weights = {col: 1.0 for col in all_columns}
+        weights = {col: 1.0 for col in varying_columns | constant_columns}
Inline comment on flixopt/transform_accessor.py (around lines 1701-1709):

The weight validation currently compares cluster.weights against the full dataset (ds.data_vars), which lets weights for excluded variables slip through. Validate cluster.weights against the actual selected clustering input instead: compute the selected input set immediately after the data_vars selection and before drop_constant_arrays() is called, so constant columns remain accepted as valid keys while weights for excluded variables are caught, and raise the same ValueError for any weight keys not in that selected set.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 87f28107-3d5e-4881-82fd-6d7e5691c52d

📥 Commits

Reviewing files that changed from the base of the PR and between 27cce42 and 626727c.

📒 Files selected for processing (2)
  • flixopt/transform_accessor.py
  • tests/test_clustering/test_integration.py

… dataset

- Validate against ds_for_clustering (after data_vars selection) instead
  of ds (full dataset), so weights for excluded variables are caught
- Move validation before drop_constant_arrays so constant columns remain
  accepted as valid keys
- Fix test to use to_dataset() instead of clustering_data() so the
  constant-column path is actually exercised

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
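The ordering this commit describes can be sketched as follows. This is a rough sketch with illustrative names (`prepare_clustering_inputs`, plain dicts instead of xarray datasets); `selected` stands in for the dataset after the data_vars selection:

```python
# Rough sketch of the check ordering from the commit message: select data_vars,
# validate weights against the selection, then drop constant columns.
def prepare_clustering_inputs(ds_vars, data_vars, weights):
    # 1. Apply the data_vars selection first.
    selected = {k: v for k, v in ds_vars.items() if data_vars is None or k in data_vars}
    # 2. Validate weights against the *selected* set, before constants are
    #    dropped, so constant columns remain accepted as valid keys while
    #    weights for excluded variables are caught.
    if weights is not None:
        unknown = set(weights) - set(selected)
        if unknown:
            raise ValueError(f'Unknown weight keys: {sorted(unknown)}')
    # 3. Drop constant columns afterwards; their weights are filtered silently.
    return {k: v for k, v in selected.items() if len(set(v)) > 1}
```

Moving step 2 ahead of step 3 is the crux of the fix: a weight on a constant (but existing) variable passes validation and is then filtered, whereas a weight on an excluded or nonexistent variable raises.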

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
tests/test_clustering/test_integration.py (2)

338-361: Please cover the data_vars allow-list directly.

These additions cover typo keys and the happy path, but they still do not exercise the case where a weight key exists in the FlowSystem and should fail only because data_vars excluded it. One targeted regression for that path would lock in the behavior this change is primarily relying on.

Also applies to: 418-497


399-405: Assert the known constant column explicitly.

all_columns > clustering_columns can already be true because to_dataset() contains other non-time-varying data vars, so this can pass without proving that the constant profile is the column being filtered.

🧪 Tighten the assertion
         all_data = fs.to_dataset(include_solution=False)
         all_columns = set(all_data.data_vars)
         clustering_columns = set(fs.transform.clustering_data().data_vars)
-        assert all_columns > clustering_columns, 'Test requires at least one constant column to be meaningful'
+        constant_column = 'constant_load(constant_out)|fixed_relative_profile'
+        assert constant_column in all_columns
+        assert constant_column not in clustering_columns

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dfae4c89-a44e-43d4-8212-84201f13b394

📥 Commits

Reviewing files that changed from the base of the PR and between 626727c and 2af82d7.

📒 Files selected for processing (2)
  • flixopt/transform_accessor.py
  • tests/test_clustering/test_integration.py

- New test_weight_keys_excluded_by_data_vars_raise: verifies that weight
  keys present on the FlowSystem but excluded by data_vars are rejected
- Replace loose set check with explicit constant-column identification
  and per-column assertions in test_extra_weight_keys_filtered_with_constant_column

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@FBumann FBumann merged commit 4cba143 into main Mar 24, 2026
8 checks passed