fix: validate user-provided clustering weight keys#627
Conversation
Raise ValueError when ClusterConfig.weights references variables that don't exist in the FlowSystem, catching typos early with a clear message. Weights for constant (but valid) variables are still silently filtered. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdded validation in TransformAccessor.cluster() to verify that any non-None Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_clustering/test_integration.py (1)
399-405:⚠️ Potential issue | 🟠 MajorThis test never exercises the dropped-constant-column path.
Line 400 calls
transform.clustering_data(), which already strips constant arrays, soweightsis built only from varying columns. The test will still pass even if constant-column filtering incluster()breaks. Pull the constant time-series name fromfs.to_dataset(include_solution=False)(or add the known constant profile key explicitly) before invokingcluster().Suggested change
- # The constant column name (find it from clustering_data) - all_data = fs.transform.clustering_data() - all_columns = set(all_data.data_vars) + full_data = fs.to_dataset(include_solution=False) + varying_columns = set(fs.transform.clustering_data().data_vars) + constant_columns = { + name for name, da in full_data.data_vars.items() if 'time' in da.dims and name not in varying_columns + } + assert constant_columns, 'test setup failed: expected at least one constant time series' # Build weights that reference ALL columns including the constant one # that will be dropped — these are valid variables, just constant over time - weights = {col: 1.0 for col in all_columns} + weights = {col: 1.0 for col in varying_columns | constant_columns}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_clustering/test_integration.py` around lines 399 - 405, The test builds weights from all columns using all_data = fs.transform.clustering_data(), but clustering_data() already strips constant arrays so the test never verifies the code path where cluster() must drop constant columns; instead, obtain the constant time-series key from fs.to_dataset(include_solution=False) (or add the known constant profile key explicitly) before constructing weights, then pass that full-keys weights dict into cluster() so the constant-column-dropping logic is exercised (update where weights is created and ensure cluster() is called with that weights mapping).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@flixopt/transform_accessor.py`:
- Around line 1701-1709: The weight validation currently compares
cluster.weights against the full dataset (ds.data_vars) which lets weights for
excluded variables slip through; update the check to validate cluster.weights
against the actual selected clustering input allow-list immediately after the
data_vars selection and before drop_constant_arrays() is called so that constant
columns and excluded variables are handled correctly; specifically, compute the
selected input set (the allow-list used to build available_columns / clustering
inputs) and use that set to detect unknown weights (referencing cluster.weights,
data_vars selection, available_columns, and drop_constant_arrays()) and raise
the same ValueError if any weight keys are not in that selected input set.
---
Outside diff comments:
In `@tests/test_clustering/test_integration.py`:
- Around line 399-405: The test builds weights from all columns using all_data =
fs.transform.clustering_data(), but clustering_data() already strips constant
arrays so the test never verifies the code path where cluster() must drop
constant columns; instead, obtain the constant time-series key from
fs.to_dataset(include_solution=False) (or add the known constant profile key
explicitly) before constructing weights, then pass that full-keys weights dict
into cluster() so the constant-column-dropping logic is exercised (update where
weights is created and ensure cluster() is called with that weights mapping).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 87f28107-3d5e-4881-82fd-6d7e5691c52d
📒 Files selected for processing (2)
flixopt/transform_accessor.pytests/test_clustering/test_integration.py
… dataset - Validate against ds_for_clustering (after data_vars selection) instead of ds (full dataset), so weights for excluded variables are caught - Move validation before drop_constant_arrays so constant columns remain accepted as valid keys - Fix test to use to_dataset() instead of clustering_data() so the constant-column path is actually exercised Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tests/test_clustering/test_integration.py (2)
338-361: Please cover thedata_varsallow-list directly.These additions cover typo keys and the happy path, but they still do not exercise the case where a weight key exists in the
FlowSystemand should fail only becausedata_varsexcluded it. One targeted regression for that path would lock in the behavior this change is primarily relying on.Also applies to: 418-497
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_clustering/test_integration.py` around lines 338 - 361, The test currently checks unknown weight keys but doesn't exercise the case where a weight key exists on the FlowSystem but is excluded by the clustering_data.data_vars allow-list; add a new assertion (or modify test_unknown_weight_keys_raise) to build a weights dict that includes a variable name that is present on basic_flow_system but intentionally omitted from clustering_data.data_vars, then call basic_flow_system.transform.cluster(...) with ClusterConfig(weights=weights) and assert it raises ValueError (match 'unknown variables' or similar); to locate code update, reference the test function test_unknown_weight_keys_raise, the clustering_data = basic_flow_system.transform.clustering_data() call and the ClusterConfig class so the test explicitly validates the data_vars allow-list enforcement.
399-405: Assert the known constant column explicitly.
all_columns > clustering_columnscan already be true becauseto_dataset()contains other non-time-varying data vars, so this can pass without proving that the constant profile is the column being filtered.🧪 Tighten the assertion
all_data = fs.to_dataset(include_solution=False) all_columns = set(all_data.data_vars) clustering_columns = set(fs.transform.clustering_data().data_vars) - assert all_columns > clustering_columns, 'Test requires at least one constant column to be meaningful' + constant_column = 'constant_load(constant_out)|fixed_relative_profile' + assert constant_column in all_columns + assert constant_column not in clustering_columns🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_clustering/test_integration.py` around lines 399 - 405, Instead of the loose set check, explicitly assert the constant column is being filtered: find the constant column(s) from all_data (e.g., by checking which data_vars have only a single unique value across time) and then assert at least one such constant column exists in all_columns and that each of those constant column names is not in clustering_columns; reference all_data, all_columns, clustering_columns and the transform.clustering_data() path when making these assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/test_clustering/test_integration.py`:
- Around line 338-361: The test currently checks unknown weight keys but doesn't
exercise the case where a weight key exists on the FlowSystem but is excluded by
the clustering_data.data_vars allow-list; add a new assertion (or modify
test_unknown_weight_keys_raise) to build a weights dict that includes a variable
name that is present on basic_flow_system but intentionally omitted from
clustering_data.data_vars, then call basic_flow_system.transform.cluster(...)
with ClusterConfig(weights=weights) and assert it raises ValueError (match
'unknown variables' or similar); to locate code update, reference the test
function test_unknown_weight_keys_raise, the clustering_data =
basic_flow_system.transform.clustering_data() call and the ClusterConfig class
so the test explicitly validates the data_vars allow-list enforcement.
- Around line 399-405: Instead of the loose set check, explicitly assert the
constant column is being filtered: find the constant column(s) from all_data
(e.g., by checking which data_vars have only a single unique value across time)
and then assert at least one such constant column exists in all_columns and that
each of those constant column names is not in clustering_columns; reference
all_data, all_columns, clustering_columns and the transform.clustering_data()
path when making these assertions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dfae4c89-a44e-43d4-8212-84201f13b394
📒 Files selected for processing (2)
flixopt/transform_accessor.pytests/test_clustering/test_integration.py
- New test_weight_keys_excluded_by_data_vars_raise: verifies that weight keys present on the FlowSystem but excluded by data_vars are rejected - Replace loose set check with explicit constant-column identification and per-column assertions in test_extra_weight_keys_filtered_with_constant_column Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
ValueErrorwhenClusterConfig.weightsreferences variables that don't exist in the FlowSystem, catching typos earlyTest plan
test_unknown_weight_keys_raise— bogus keys raiseValueErrortest_extra_weight_keys_filtered_with_constant_column— constant columns silently filteredtest_unknown_weight_keys_raise_multiperiod— bogus key in multi-period raisestest_valid_weight_keys_multiperiod— valid keys in multi-period work🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests