Skip to content

fix: filter user-provided clustering weights to available columns#625

Merged
FBumann merged 2 commits intomainfrom
fix/filter-user-clustering-weights
Mar 23, 2026
Merged

fix: filter user-provided clustering weights to available columns#625
FBumann merged 2 commits intomainfrom
fix/filter-user-clustering-weights

Conversation

@FBumann
Copy link
Member

@FBumann FBumann commented Mar 23, 2026

Summary

  • When users pass a ClusterConfig with explicit weights to cluster(), those weights may reference time series that were removed before clustering (e.g., constant arrays dropped by drop_constant_arrays)
  • tsam raises errors when weights contain keys not present in the DataFrame
  • Now both user-provided and auto-calculated weights are filtered through the same available_columns parameter in _build_cluster_config_with_weights

Test plan

  • All 245 existing clustering tests pass
  • Manual test: pass ClusterConfig(weights={...}) with extra keys to cluster() on a multi-period system

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Clustering now tolerates weight settings that reference missing or dropped columns—weights are automatically ignored for unavailable features so clustering runs reliably.
  • Tests
    • Added integration tests covering single-period, dropped-column, and multi-period clustering to ensure stable behavior and correct cluster counts.

When users pass a ClusterConfig with explicit weights, those weights
may reference time series that were removed (e.g., constant arrays
dropped before clustering). tsam raises errors on extra weights.
Now both user-provided and auto-calculated weights are filtered to
only include columns present in the clustering DataFrame.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 23, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e7a72982-0844-44f1-a98d-f024514cc8df

📥 Commits

Reviewing files that changed from the base of the PR and between dc5a220 and 445a478.

📒 Files selected for processing (1)
  • tests/test_clustering/test_integration.py

📝 Walkthrough

Walkthrough

Refactors clustering weight resolution by adding an optional available_columns parameter to _build_cluster_config_with_weights(), so resolved weights (user-provided or auto) are filtered inside the helper. Removes local filtered_weights from cluster() and centralizes filtering logic.

Changes

Cohort / File(s) Summary
Weight filtering centralization
flixopt/transform_accessor.py
Added optional available_columns parameter to _build_cluster_config_with_weights(); resolve weights by priority (user cluster.weights or auto_weights), then filter to available columns and apply consistently for cluster present or absent. Removed local filtered_weights construction from cluster().
Clustering integration tests
tests/test_clustering/test_integration.py
Added three integration tests (TestClusterAdvancedOptions) ensuring ClusterConfig(weights=...) tolerates keys absent from clustering input across single-period, constant-column, and multi-period scenarios; asserts cluster counts and period-dimension behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰
I hopped through code with careful paws,
Gathered weights and trimmed the claws,
Now columns pruned in one small nook,
No stray keys slipping past the hook,
A playful refactor—done! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: filtering user-provided clustering weights to match available columns, which is the core fix addressed in this PR.
Description check ✅ Passed The description covers the problem, solution, and test status, but is missing key template sections like 'Type of Change' and 'Related Issues'.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/filter-user-clustering-weights

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
flixopt/transform_accessor.py (1)

1776-1780: Please add a regression test for extra user weight keys.

This call path is the right place for the fix; adding one automated test for ClusterConfig(weights=...) containing keys absent from df_for_clustering.columns (especially with constant-array dropping and multi-slice clustering) would lock the behavior in.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flixopt/transform_accessor.py` around lines 1776 - 1780, Add a regression
test that ensures extra keys in ClusterConfig(weights=...) that are not present
in df_for_clustering.columns are ignored/filtered out when building
cluster_config via _calculate_clustering_weights and
_build_cluster_config_with_weights; specifically construct a test that passes a
ClusterConfig with extra weight keys, exercise the same call path around
df_for_clustering used in transform_accessor (including a case where some
columns are constant and dropped, and a multi-slice clustering scenario) and
assert the resulting cluster_config only contains weights for
available/non-constant columns and that no exception is raised for unknown keys.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@flixopt/transform_accessor.py`:
- Around line 1776-1780: Add a regression test that ensures extra keys in
ClusterConfig(weights=...) that are not present in df_for_clustering.columns are
ignored/filtered out when building cluster_config via
_calculate_clustering_weights and _build_cluster_config_with_weights;
specifically construct a test that passes a ClusterConfig with extra weight
keys, exercise the same call path around df_for_clustering used in
transform_accessor (including a case where some columns are constant and
dropped, and a multi-slice clustering scenario) and assert the resulting
cluster_config only contains weights for available/non-constant columns and that
no exception is raised for unknown keys.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a5830606-0044-4d77-b928-99e07c45e14b

📥 Commits

Reviewing files that changed from the base of the PR and between 1615ee8 and dc5a220.

📒 Files selected for processing (1)
  • flixopt/transform_accessor.py

Three tests covering extra keys in ClusterConfig.weights being filtered:
- Extra keys not in FlowSystem
- Keys for constant (dropped) columns
- Multi-period clustering with extra keys

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@FBumann FBumann merged commit ae6b099 into main Mar 23, 2026
1 check was pending
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant