[BUG] Fix MultiIndex quantile bug in BaseDistribution.loc indexing#697

Merged
fkiraly merged 13 commits into
sktime:mainfrom
arnavk23:fix/issue-678-multiindex-quantile
Jan 18, 2026

Conversation

@arnavk23
Contributor

@arnavk23 arnavk23 commented Jan 9, 2026

Reference Issues/PRs

Fixes #678

What does this implement/fix? Explain your changes.

  • Add _is_index_like() helper function to check for Index objects
  • Update _get_indexer_like_pandas to handle pd.Index inputs early
  • Fix _Indexer.__getitem__ to skip MultiIndex special case when elements
    are Index-like
  • Add test for MultiIndex loc indexing with Index objects
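
A minimal sketch of the idea behind the first two points, using only public pandas calls. The names `is_index_like` and `positions_in` are hypothetical stand-ins for illustration; the actual `_is_index_like` and `_get_indexer_like_pandas` in skpro may be implemented differently.

```python
import pandas as pd


def is_index_like(obj):
    """Hypothetical stand-in for the PR's `_is_index_like` helper."""
    return isinstance(obj, pd.Index)


def positions_in(index, keys):
    """Resolve `keys` to integer positions in `index` (illustrative only).

    Index-like keys are handled first, so a MultiIndex passed as the key
    is treated as a collection of row labels and not as a single tuple.
    """
    if is_index_like(keys):
        # returns one integer position per element of `keys`
        return index.get_indexer_for(keys)
    # fallback: a single label (a full tuple for a MultiIndex)
    return index.get_loc(keys)


rows = pd.MultiIndex.from_product([["a", "b"], [0, 1, 2]])
subset = rows[[1, 4]]  # itself a MultiIndex: ("a", 1) and ("b", 1)

print(positions_in(rows, subset))
print(positions_in(rows, ("b", 2)))
```

Checking for `pd.Index` before any tuple or MultiIndex special case is the design choice the sketch is meant to show: the key's type, not the type of its elements, decides which lookup path is taken.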

Collaborator

@fkiraly fkiraly left a comment

Looks good to me, FYI @marrov.

Question: why are you changing _show_version? That does not look relevant or related?
Should we split that off? Is there a related issue?

@marrov
Member

marrov commented Jan 12, 2026

> Looks good to me, FYI @marrov.
>
> Question: why are you changing _show_version? That does not look relevant or related? Should we split that off? Is there a related issue?

Ah, did not see this one! Given mine has some errors in tests, if this is a go I can close my PR after merging this one.

@arnavk23
Contributor Author

arnavk23 commented Jan 12, 2026

> why are you changing _show_version? That does not look relevant or related?

The _show_versions.py is indeed unrelated to #678. Here's what happened:

When I ran the full test suite after implementing the MultiIndex fix, test_deps_info was failing because numba is installed but has an import-time error (incompatible with numpy 2.4). The existing _get_deps_info would return None for packages that fail to import, even if they're installed.

The fix makes _get_deps_info fall back to importlib.metadata to report the installed version when a package can't be imported (or lacks version). This is useful for diagnosing environment issues—you can see that numba 0.61.0 is installed even though it won't import.
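
A rough sketch of the fallback described above, not the code from the PR. The function name `get_dep_version` is hypothetical, and it assumes the import name and the distribution name are the same, which is not true for every package.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def get_dep_version(package_name):
    """Return the version of `package_name`, or None if it is not installed.

    Hypothetical helper for illustration; assumes import name equals
    distribution name.
    """
    try:
        module = import_module(package_name)
        ver = getattr(module, "__version__", None)
        if ver is not None:
            return ver
    except Exception:
        # the package may be installed but fail at import time
        pass

    # fall back to installed-distribution metadata, which needs no import
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


print(get_dep_version("numba"))
```

With this shape, a package that is installed but broken at import time still reports its version string, while a package that is absent reports `None`.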

@arnavk23 arnavk23 requested a review from fkiraly January 12, 2026 16:16
@fkiraly
Collaborator

fkiraly commented Jan 14, 2026

> The _show_versions.py is indeed unrelated to #678.

I see - good spot!

Could you kindly move this to a different pull request? Mid-term, I think this should be deduplicated and use the dependencies mechanism rather than importlib. That is, we ought to deduplicate this with the _get_deps_info from sktime and move it to a single location in scikit-base.

@arnavk23
Contributor Author

arnavk23 commented Jan 14, 2026

> Could you kindly move this to a different pull request?

@fkiraly I have complied with your request and will open a separate PR for _show_versions.py in the future. All tests pass.

@marrov
Member

marrov commented Jan 15, 2026

@arnavk23 - Had a look at the PR and LGTM.

Edit: I also tried installing skpro from this branch in the toml of my MC proba forecaster "project" that made me discover this bug (related PR) and it worked perfectly. So yeah, I'd green-light this for merging @fkiraly.

Comment thread skpro/distributions/base/_base.py Outdated
Collaborator

@fkiraly fkiraly left a comment

Yes, looks good!

Although the if/else looks highly redundant, can you please check that?

@arnavk23 arnavk23 requested a review from fkiraly January 17, 2026 21:16
@fkiraly fkiraly added bug module:probability&simulation probability distributions and simulators labels Jan 18, 2026
@fkiraly fkiraly changed the title [ENH] Fix MultiIndex quantile bug in BaseDistribution.loc indexing [BUG] Fix MultiIndex quantile bug in BaseDistribution.loc indexing Jan 18, 2026
Collaborator

@fkiraly fkiraly left a comment

Looks good to me!

@fkiraly fkiraly merged commit c1d5c56 into sktime:main Jan 18, 2026
38 checks passed
@arnavk23 arnavk23 deleted the fix/issue-678-multiindex-quantile branch January 18, 2026 20:51
marrov pushed a commit to marrov/skpro that referenced this pull request Jan 19, 2026
…ktime#697)

Fixes sktime#678

- Add `_is_index_like()` helper function to check for `Index` objects
- Update `_get_indexer_like_pandas` to handle `pd.Index` inputs early
- Fix `_Indexer.__getitem__` to skip `MultiIndex` special case when elements
are Index-like
- Add test for `MultiIndex` loc indexing with Index objects
fkiraly pushed a commit that referenced this pull request Jan 23, 2026
#704)

#### Reference Issues/PRs

Addresses performance issues with EnbpiRegressor found when used with
[Monte Carlo recursive probabilistic
forecaster](sktime/sktime#9242). This
optimization enables practical usage of conformal prediction intervals
in large-scale MC forecasting scenarios.

#### What does this implement/fix? Explain your changes.

This PR updates ~~introduces `FastEnbpiRegressor`, a drop-in replacement
for~~ `EnbpiRegressor` to provide a ~10-100x speedup through two
optimizations:

~~1. Vectorized Aggregation in `_predict_proba()`: Replaces the
O(n_train × n_test) nested loop with chunked vectorized numpy operations
that process predictions in batches. This avoids the quadratic
complexity of the original implementation.~~

~~2. Optimized `_FastEmpiricalEnbpi` Distribution Class: A specialized
`Empirical` subclass that:~~
   - Skips expensive `_init_sorted()` initialization (~10x faster)
   - Implements custom `_sample()` using direct NumPy indexing instead of
     pandas MultiIndex operations (~1000x faster sampling)
   - Stores raw prediction and error arrays for efficient resampling
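
A minimal sketch of sampling by direct NumPy indexing from stored raw arrays. It is not the PR's `_sample()`; the function name `sample_from_raw` is hypothetical, and it assumes each draw is a point prediction plus a uniformly resampled residual.

```python
import numpy as np


def sample_from_raw(preds, errors, n_samples, seed=None):
    """Draw samples as prediction + resampled residual (illustrative only).

    preds  : 1D array of point predictions, shape (n_test,)
    errors : 1D array of stored residuals, shape (n_errors,)
    Returns an array of shape (n_samples, n_test).
    """
    rng = np.random.default_rng(seed)
    # one random residual position per (sample, test point) pair
    idx = rng.integers(0, errors.shape[0], size=(n_samples, preds.shape[0]))
    # fancy indexing gathers all residuals at once; no pandas objects involved
    return preds[None, :] + errors[idx]


preds = np.array([1.0, 2.0, 3.0])
errors = np.array([-0.5, 0.0, 0.5, 1.0])
draws = sample_from_raw(preds, errors, n_samples=5, seed=0)
print(draws.shape)
```

Any index or column labels can be attached once to the final array, so the per-draw cost stays in plain array operations.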

Edit: optimizations mentioned above (strikethrough) still apply but are
now used on `EnbpiRegressor` and `Empirical` directly.

**Performance Metrics:**
- Single batch prediction: 1.2-1.5x faster
- Sampling 100 from distribution: 2500x faster
- Small MC workflow (10 steps, 100 samples each): 22x faster (9.9s →
0.45s)
- Realistic-scale MC forecasting: ~100x faster (10+ minutes → 5 seconds)

#### Does your contribution introduce a new dependency?

No. The implementation uses only existing dependencies: NumPy, Pandas,
and skpro's existing classes.

#### What should a reviewer concentrate their feedback on?

- ~~Correctness of numerical results: Verify that `_FastEmpiricalEnbpi`
produces statistically equivalent samples and statistics as standard
`Empirical`~~
- Appropriateness of skipping `_init_sorted()`: Confirm that quantiles
computed on-demand from raw arrays are accurate
- Custom `_sample()` implementation: Review the NumPy indexing logic for
correctness across different batch and sample sizes
- ~~Generalizability: The optimizations in the `_FastEmpiricalEnbpi`
class could potentially benefit other estimators creating large
empirical distributions (see "Any other comments" section)~~
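
One way a reviewer might check on-demand quantiles against a brute-force reference is sketched below. It assumes each test point's empirical sample is its prediction plus every stored residual, and it uses NumPy's default linear interpolation, which may not match the quantile convention of skpro's `Empirical`. The function names are hypothetical.

```python
import numpy as np


def quantiles_on_demand(preds, errors, q):
    """Quantiles of {pred_i + e_j} per test point, without building the sample.

    Shifting a sample by a constant shifts its quantiles by the same constant,
    so only the residual quantiles need to be computed.
    Returns an array of shape (n_test, n_quantiles).
    """
    q = np.atleast_1d(q)
    return preds[:, None] + np.quantile(errors, q)[None, :]


def quantiles_brute_force(preds, errors, q):
    """Reference: materialize the full (n_test, n_errors) sample first."""
    q = np.atleast_1d(q)
    full = preds[:, None] + errors[None, :]
    return np.quantile(full, q, axis=1).T


rng = np.random.default_rng(42)
preds = rng.normal(size=20)
errors = rng.normal(size=200)
q = [0.05, 0.5, 0.95]
print(np.allclose(quantiles_on_demand(preds, errors, q),
                  quantiles_brute_force(preds, errors, q)))
```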

#### Did you add any tests for the change?

No, but I should. So far this draft has only been tested with throw-away
scripts. If the approach is approved, I will add tests.

#### Any other comments?

~~The optimizations in `_FastEmpiricalEnbpi` are not EnbPI-specific. The
performance findings suggest that:~~
- ~~Skipping `_init_sorted()` benefits any large Empirical
distribution~~
- ~~Custom sampling using raw arrays is a general pattern applicable to
distributions where parameters can be efficiently stored~~

~~A follow-up enhancement (separate PR) could add a `skip_init_sorted`
parameter to the base `Empirical` class or create a general
`FastEmpirical` class in the distributions module to benefit other
estimators.~~

~~Note: the base branch here includes my fix for `quantile()` in the
base distribution as I was unable to test without it. When #697 is
merged, I will discard those changes and use main as the base branch.~~

Edit: base branch fixed and optimizations are now applied to original
classes.
Labels

bug module:probability&simulation probability distributions and simulators

Development

Successfully merging this pull request may close these issues.

[BUG] quantile method of BaseDistribution fails for multi-index inputs

3 participants