[BUG] Duplicated _random_ss_ix helper function across bootstrap.py, enbpi.py, and _bagging.py #1002

@krsatyamthakur-droid

Describe the issue

The internal helper function _random_ss_ix, which samples indices uniformly at random from a list of indices, is duplicated across three files in the regression module:

  1. skpro/regression/bootstrap.py (lines 208–215)
  2. skpro/regression/enbpi.py (lines 343–350)
  3. skpro/regression/ensemble/_bagging.py (lines 253–257)

The bootstrap.py and enbpi.py versions are identical (with a random_state parameter), while the _bagging.py version is a simplified variant (no random_state parameter, uses np.random.choice directly on the global state).
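
To illustrate why the difference matters, here is a minimal sketch (not code from the skpro repository) contrasting sampling from NumPy's global state with sampling from a local RandomState. The array `ix` and the seed values are made up for the demonstration.

```python
import numpy as np

ix = np.arange(100)  # hypothetical index array, for illustration only

# Global-state sampling, as described for the _bagging.py variant:
# the draw depends on every other consumer of np.random.
np.random.seed(0)
first = np.random.choice(ix, size=5)
np.random.seed(0)
_ = np.random.rand()  # an unrelated call advances the global stream
second = np.random.choice(ix, size=5)  # no longer tied to the seed alone

# Local-state sampling: a dedicated RandomState is isolated from other callers.
local_a = np.random.RandomState(0).choice(ix, size=5)
_ = np.random.rand()  # does not affect the local generator below
local_b = np.random.RandomState(0).choice(ix, size=5)

print("local draws reproducible:", bool((local_a == local_b).all()))
```

With a local state, an estimator's resampling can be reproduced from its own seed regardless of what other code does with np.random.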

This duplication increases maintenance burden — any bug fix or enhancement must be applied in all three places.

Suggested fix

Extract a single canonical _random_ss_ix function into skpro/utils/sampling.py (or another appropriate shared utility module), and update all three files to import from there.

The consolidated version should support the random_state parameter (as in bootstrap.py / enbpi.py), so that the _bagging.py estimator also gains proper support for a local random state.

Files affected

  • skpro/regression/bootstrap.py — delete local _random_ss_ix, import from utils
  • skpro/regression/enbpi.py — delete local _random_ss_ix, import from utils
  • skpro/regression/ensemble/_bagging.py — delete local _random_ss_ix, import from utils
  • skpro/utils/sampling.py [NEW] shared utility module

Expected impact

  • Reduced code duplication
  • Easier maintenance — single point of truth for the sampling logic
  • Consistent random state handling across all three estimators

I am happy to submit a PR for this.
