[BUG] Fix MultiIndex quantile bug in BaseDistribution.loc indexing by arnavk23 · Pull Request #697 · sktime/skpro

arnavk23 · 2026-01-09T21:49:04Z

Reference Issues/PRs

Fixes #678

What does this implement/fix? Explain your changes.

- Add `_is_index_like()` helper function to check for `Index` objects
- Update `_get_indexer_like_pandas` to handle `pd.Index` inputs early
- Fix `_Indexer.__getitem__` to skip the `MultiIndex` special case when elements are Index-like
- Add test for `MultiIndex` loc indexing with `Index` objects
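For readers unfamiliar with the pattern, here is a minimal sketch of what "handle `pd.Index` inputs early" can look like. It is not the code from this PR: `is_index_like` and `positions_like_loc` are hypothetical stand-ins, and the sketch assumes a unique index so that `Index.get_indexer` applies.

```python
import numpy as np
import pandas as pd


def is_index_like(obj):
    """Hypothetical stand-in for a helper such as `_is_index_like`."""
    # pd.MultiIndex is a subclass of pd.Index, so one check covers both.
    return isinstance(obj, pd.Index)


def positions_like_loc(index, keys):
    """Return integer positions of `keys` in `index` (assumed unique)."""
    if is_index_like(keys) or isinstance(keys, list):
        # Index-like and list keys are resolved in one vectorized call,
        # before any special handling of tuples as MultiIndex labels.
        pos = index.get_indexer(pd.Index(keys))
    else:
        # A single label (for a MultiIndex, a full tuple) is looked up directly.
        pos = np.array([index.get_loc(keys)])
    if (pos == -1).any():
        raise KeyError("some keys were not found in the index")
    return pos


mi = pd.MultiIndex.from_product([["a", "b"], [0, 1]])
subset = mi[[0, 3]]  # itself a MultiIndex, i.e. Index-like
print(positions_like_loc(mi, subset))
print(positions_like_loc(mi, ("b", 0)))
```

The point of the early branch is that an `Index` passed as a key is treated as a collection of labels, not as a single tuple-like label.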

fkiraly

Looks good to me, FYI @marrov.

Question: why are you changing _show_version? That does not look relevant or related?
Should we split that off? Is there a related issue?

marrov · 2026-01-12T16:03:36Z

> Looks good to me, FYI @marrov.
>
> Question: why are you changing _show_version? That does not look relevant or related? Should we split that off? Is there a related issue?

Ah, did not see this one! Given mine has some errors in tests, if this is a go I can close my PR after merging this one.

arnavk23 · 2026-01-12T16:10:33Z

> why are you changing _show_version? That does not look relevant or related?

The _show_versions.py is indeed unrelated to #678. Here's what happened:

When I ran the full test suite after implementing the MultiIndex fix, test_deps_info was failing because numba is installed but has an import-time error (incompatible with numpy 2.4). The existing _get_deps_info would return None for packages that fail to import, even if they're installed.

The fix makes _get_deps_info fall back to importlib.metadata to report the installed version when a package can't be imported (or lacks version). This is useful for diagnosing environment issues—you can see that numba 0.61.0 is installed even though it won't import.
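A minimal sketch of the fallback described above, using only the standard library. The function name `get_dep_version` is hypothetical and this is not the code in `_show_versions.py`; it also assumes the import name and the distribution name coincide, which is not true for every package.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def get_dep_version(name):
    """Return the version of `name`, or None if it is not installed."""
    try:
        module = import_module(name)
        ver = getattr(module, "__version__", None)
        if ver is not None:
            return ver
    except Exception:
        # Import-time failures are not always ImportError, so catch broadly
        # and fall through to the metadata lookup.
        pass
    try:
        # Reads the installed distribution's metadata without importing it.
        return version(name)
    except PackageNotFoundError:
        return None


print(get_dep_version("surely_not_installed_pkg_xyz_123"))
```

With this shape, a package that is installed but broken at import time still reports its installed version, while a package that is genuinely absent reports `None`.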

fkiraly · 2026-01-14T14:49:58Z

> The _show_versions.py is indeed unrelated to #678.

I see - good spot!

Could you kindly move this to a different pull request? Mid-term, I think this should be deduplicated and use the dependencies mechanism rather than importlib. That is, we ought to deduplicate this with the _get_deps_info from sktime and move it to a single location in scikit-base.

arnavk23 · 2026-01-14T20:51:24Z

> Could you kindly move this to a different pull request?

@fkiraly I have complied with your suggestion and will open a separate PR for _show_versions.py in the future. All tests pass.

marrov · 2026-01-15T16:59:41Z

@arnavk23 - Had a look at the PR and LGTM.

Edit: I also tried installing skpro from this branch in the toml of my MC proba forecaster "project" that made me discover this bug (related PR) and it worked perfectly. So yeah, I'd green-light this for merging @fkiraly.

fkiraly

Yes, looks good!

Although the if/else looks highly redundant, can you please check that?

fkiraly

Looks good to me!

#704)

#### Reference Issues/PRs

Addresses performance issues with EnbpiRegressor found when used with [Monte Carlo recursive probabilistic forecaster](sktime/sktime#9242). This optimization enables practical usage of conformal prediction intervals in large-scale MC forecasting scenarios.

#### What does this implement/fix? Explain your changes.

This PR updates ~~introduces `FastEnbpiRegressor`, a drop-in replacement for~~ `EnbpiRegressor` that provides ~10-100x speedup through two optimizations

~~1. Vectorized Aggregation in `_predict_proba()`: Replaces the O(n_train × n_test) nested loop with chunked vectorized numpy operations that process predictions in batches. This avoids the quadratic complexity of the original implementation.~~

~~2. Optimized `_FastEmpiricalEnbpi` Distribution Class: A specialized `Empirical` subclass that:~~

- Skips expensive `_init_sorted()` initialization (~10x faster)
- Implements custom `_sample()` using direct numpy indexing instead of pandas MultiIndex operations (~1000x faster sampling)
- Stores raw prediction and error arrays for efficient resampling

Edit: optimizations mentioned above (strikethrough) still apply but are now used on `EnbpiRegressor` and `Empirical` directly.

**Performance Metrics:**

- Single batch prediction: 1.2-1.5x faster
- Sampling 100 from distribution: 2500x faster
- Small MC workflow (10 steps, 100 samples each): 22x faster (9.9s → 0.45s)
- Realistic-scale MC forecasting: ~100x faster (10+ minutes → 5 seconds)

#### Does your contribution introduce a new dependency?

No. The implementation uses only existing dependencies: NumPy, Pandas, and skpro's existing classes.

#### What should a reviewer concentrate their feedback on?

- ~~Correctness of numerical results: Verify that `_FastEmpiricalEnbpi` produces statistically equivalent samples and statistics as standard `Empirical`~~
- Appropriateness of skipping `_init_sorted()`: Confirm that quantiles computed on-demand from raw arrays are accurate
- Custom `_sample()` implementation: Review the NumPy indexing logic for correctness across different batch and sample sizes
- ~~Generalizability: The optimizations in the `_FastEmpiricalEnbpi` class could potentially benefit other estimators creating large empirical distributions (see "Any other comments" section)~~

#### Did you add any tests for the change?

No, but I should. Tested with throw-away scripts for this draft. If ok-ed will produce tests.

#### Any other comments?

~~The optimizations in `_FastEmpiricalEnbpi` are not EnbPI-specific. The performance findings suggest that:~~

- ~~Skipping `_init_sorted()` benefits any large Empirical distribution~~
- ~~Custom sampling using raw arrays is a general pattern applicable to distributions where parameters can be efficiently stored~~

~~A follow-up enhancement (separate PR) could add a `skip_init_sorted` parameter to the base `Empirical` class or create a general `FastEmpirical` class in the distributions module to benefit other estimators.~~

~~Note: the base branch here includes my fix for `quantile()` in the base distribution as I was unable to test without it. When #697 is merged, I will discard those changes and use main as the base branch.~~

Edit: base branch fixed and optimizations are now applied to original classes.
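To illustrate what "direct numpy indexing" for sampling can mean, here is a rough sketch. It is not the `_sample()` implementation from #704: the function `sample_empirical`, the array layout (one column of support points per instance) and the equal weighting are all assumptions made for the example.

```python
import numpy as np


def sample_empirical(values, n_samples, seed=None):
    """Draw samples with replacement from equally weighted support points.

    values : array of shape (n_support, n_instances); column j holds the
        support points of the empirical distribution for instance j.
    Returns an array of shape (n_samples, n_instances).
    """
    rng = np.random.default_rng(seed)
    n_support, n_instances = values.shape
    # One random row index per (sample, instance) pair.
    rows = rng.integers(0, n_support, size=(n_samples, n_instances))
    cols = np.arange(n_instances)
    # Fancy indexing broadcasts `cols` against `rows`, so entry (i, j)
    # of the result is values[rows[i, j], j]; no pandas index is involved.
    return values[rows, cols]


support = np.arange(12.0).reshape(4, 3)
draws = sample_empirical(support, n_samples=5, seed=0)
print(draws.shape)
```

Because the draw is a single vectorized indexing operation on a raw array, its cost does not depend on building or aligning a pandas MultiIndex.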

arnavk23 requested review from SaiRevanth25, felipeangelimvieira and fkiraly as code owners January 9, 2026 21:49

arnavk23 added 4 commits January 10, 2026 03:23

Fix pydocstyle error: remove blank line after sample() docstring

b8d2285

Robust deps info: fall back to metadata version when import fails

752b96e

fixes issue with MultiIndex quantile indexing in loc

4ccf94b

isort

492c46b

fkiraly requested changes Jan 12, 2026

arnavk23 requested a review from fkiraly January 12, 2026 16:16

Update test_proba_basic.py

65a4505

marrov mentioned this pull request Jan 14, 2026

Fix quantile() in BaseDistribution #701

Closed

Update _show_versions.py

956e85a

arnavk23 added 2 commits January 15, 2026 03:14

Merge branch 'sktime:main' into fix/issue-678-multiindex-quantile

85e2090

Update test_proba_basic.py

622f55c

marrov mentioned this pull request Jan 16, 2026

[ENH] Optimize EnbpiRegressor through faster sampling in Empirical #704

Merged

fkiraly reviewed Jan 17, 2026

Comment thread skpro/distributions/base/_base.py Outdated

fkiraly requested changes Jan 17, 2026

arnavk23 added 4 commits January 18, 2026 02:24

refactored base.py

24ed1f5

compiling with checks

b7881c8

Merge branch 'sktime:main' into fix/issue-678-multiindex-quantile

9f9cc2e

black

46283ae

arnavk23 requested a review from fkiraly January 17, 2026 21:16

fkiraly added bug module:probability&simulation probability distributions and simulators labels Jan 18, 2026

fkiraly changed the title ~~[ENH] Fix MultiIndex quantile bug in BaseDistribution.loc indexing~~ [BUG] Fix MultiIndex quantile bug in BaseDistribution.loc indexing Jan 18, 2026

fkiraly approved these changes Jan 18, 2026

fkiraly merged commit c1d5c56 into sktime:main Jan 18, 2026
38 checks passed

arnavk23 deleted the fix/issue-678-multiindex-quantile branch January 18, 2026 20:51

fkiraly merged 13 commits into sktime:main from arnavk23:fix/issue-678-multiindex-quantile

3 participants
