
Feature/probe discrete variables #929

Open
BALOGUN-DAVID wants to merge 4 commits into feature-engine:main from BALOGUN-DAVID:feature/probe-discrete-variables

@BALOGUN-DAVID

Resolves #847.

This PR expands the ProbeFeatureSelection functionality to allow distinguishing between discrete and continuous variables during threshold evaluations.

Proposed Changes

  • Added a variables_discrete parameter to ProbeFeatureSelection (defaulting to None).
  • Modified _get_features_to_drop() so that if variables_discrete is provided, discrete variables are compared strictly against the aggregate importance of generated discrete probes (e.g., binary, discrete_uniform, poisson), while continuous variables are evaluated against continuous probes (e.g., gaussian, uniform).
  • Maintained backward compatibility: if variables_discrete is None, all features are compared against all generated probes combined, exactly as before.
  • Added comprehensive unit test coverage to ensure correct partitioning, error handling for mismatched distribution configurations, and backwards compatibility.
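
The partitioned comparison described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: the helper name `features_to_drop`, the prefix-based probe classification, and the use of the mean as the aggregate are all assumptions made for the example.

```python
import numpy as np

# Assumption: probe columns are named after their distribution,
# e.g. "binary_probe_0" or "gaussian_probe_0".
DISCRETE_PROBE_PREFIXES = ("binary", "discrete_uniform", "poisson")


def features_to_drop(importances, probe_names, variables_discrete=None):
    """Hypothetical helper: list features scoring below their probe threshold.

    importances: dict of column name -> importance (features and probes).
    probe_names: names of the generated probe columns.
    variables_discrete: optional list of discrete feature names.
    """
    probes = set(probe_names)
    features = [name for name in importances if name not in probes]

    def threshold(group):
        # Assumption: the aggregate is the mean importance of the group.
        return float(np.mean([importances[p] for p in group]))

    if variables_discrete is None:
        # Backward-compatible path: one threshold from all probes combined.
        cutoff = threshold(probe_names)
        return [f for f in features if importances[f] < cutoff]

    discrete_probes = [
        p for p in probe_names if p.startswith(DISCRETE_PROBE_PREFIXES)
    ]
    continuous_probes = [p for p in probe_names if p not in discrete_probes]
    discrete = [f for f in features if f in set(variables_discrete)]
    continuous = [f for f in features if f not in set(variables_discrete)]

    # A feature group with no matching probes cannot be evaluated.
    if discrete and not discrete_probes:
        raise ValueError("Discrete variables given but no discrete probes.")
    if continuous and not continuous_probes:
        raise ValueError("Continuous variables present but no continuous probes.")

    to_drop = set()
    if discrete:
        cutoff = threshold(discrete_probes)
        to_drop.update(f for f in discrete if importances[f] < cutoff)
    if continuous:
        cutoff = threshold(continuous_probes)
        to_drop.update(f for f in continuous if importances[f] < cutoff)
    # Preserve the original feature order in the result.
    return [f for f in features if f in to_drop]
```

With this sketch, a discrete feature that clears the combined threshold can still be dropped once it is measured only against discrete probes, which is the situation the new parameter is meant to address.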

How to Test

  • Run pytest tests/test_selection/test_probe_feature_selection.py to confirm all existing and newly added test cases pass successfully.

@BALOGUN-DAVID

Hi @solegalli! 👋

I've opened this PR to resolve issue #847 by adding support for discrete variables in ProbeFeatureSelection.

The changes allow the selector to dynamically assign thresholds based on variable type: if variables_discrete is provided, discrete features are compared strictly against the aggregate importance of discrete probes, while continuous variables are evaluated against continuous probes. If no discrete variables are specified, the backward-compatible behavior is preserved.

I've included comprehensive tests for this new behavior and ensured all existing tests continue to pass. The code has also been formatted with black to match the repository's style guidelines. Note that there appears to be an unrelated, preexisting mypy failure in datetime_subtraction.py causing the test_type CI check to fail, but the new code type-checks successfully on its own.

Could you please review this when you have a chance? I'm happy to make any necessary adjustments. Thanks!

Development

Successfully merging this pull request may close these issues.

expand probe feature selection to compare discrete variables to discrete probes and continuous variables to continuous probes

1 participant