
[ENH] _pmf_support method for BaseDistribution returning inspectable mass support#711

Merged
fkiraly merged 23 commits into sktime:main from arnavk23:fix/issue-416-binomial-plot
Feb 15, 2026

Conversation

@arnavk23
Contributor

@arnavk23 arnavk23 commented Jan 24, 2026

Reference Issues/PRs

Related to #416 (discrete PMF plotting improvements). This PR provides the foundational infrastructure that enables more robust discrete distribution plotting in follow-up PRs.

What does this implement/fix? Explain your changes.

This PR introduces a new _pmf_support(lower, upper, max_points) method to the BaseDistribution class that provides a standardized, extensible way to obtain support points for discrete distributions within specified bounds.
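A minimal sketch of the default behavior described here, written as a standalone function so it runs without skpro. The name `default_pmf_support`, the `discrete` flag, and the rule that `max_points` keeps the smallest admissible integers are illustrative assumptions, not the code merged in this PR.

```python
import math

import numpy as np


def default_pmf_support(lower, upper, max_points=1000, discrete=True):
    """Sketch of the default rule: non-negative integers in [lower, upper].

    ``discrete`` is a hypothetical flag standing in for however the
    distribution signals that it has a pmf; ``max_points`` is assumed to
    cap the output, keeping the smallest admissible integers.
    """
    empty = np.array([], dtype=int)
    # Continuous distributions carry no point masses; also reject empty ranges
    if not discrete or max_points <= 0 or lower > upper:
        return empty
    # No non-negative integer can lie in a range that is entirely negative
    # or starts at +inf
    if upper < 0 or lower == math.inf:
        return empty
    # Clamp to the non-negative integers; this also avoids ceil(-inf)
    start = 0 if lower <= 0 else math.ceil(lower)
    cap = start + max_points - 1
    # An infinite upper bound (e.g. Poisson) is handled by the cap alone
    stop = cap if upper == math.inf else min(math.floor(upper), cap)
    if stop < start:
        return empty
    return np.arange(start, stop + 1)
```

For example, `default_pmf_support(-2.5, 3.7)` yields the integers 0 to 3, while a continuous distribution (`discrete=False`) yields an empty array.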

Key Changes:

  • Base Implementation (skpro/distributions/base/_base.py): Added a _pmf_support method that returns an empty array for continuous distributions and non-negative integers for discrete distributions by default
  • Empirical Distribution Override (skpro/distributions/empirical.py): Custom implementation that returns actual empirical support points within bounds
  • Delta Distribution Override (skpro/distributions/delta.py): Implementation that returns point mass locations within bounds
  • Comprehensive Tests (skpro/distributions/tests/test_proba_basic.py): Added test_pmf_support_method() covering all distribution types
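Both overrides amount to filtering known atoms by the bounds. A hedged sketch follows, using hypothetical standalone functions (`empirical_pmf_support`, `delta_pmf_support`) that operate on plain arrays rather than on skpro objects; it assumes bounds are inclusive.

```python
import numpy as np


def empirical_pmf_support(samples, lower, upper, max_points=1000):
    """Distinct sample values inside [lower, upper], in ascending order."""
    atoms = np.unique(np.asarray(samples, dtype=float))  # unique() also sorts
    inside = atoms[(atoms >= lower) & (atoms <= upper)]
    return inside[:max_points]


def delta_pmf_support(c, lower, upper):
    """The single point mass at c, if it lies inside [lower, upper]."""
    c = float(c)
    return np.array([c]) if lower <= c <= upper else np.array([])
```

Unlike the integer default, these return non-integer points such as 0.5 or 2.25 when those are where the mass actually sits.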

This addresses the need for extensible support point detection across different discrete distribution types, replacing hardcoded assumptions with a clean, overridable API.

Does your contribution introduce a new dependency? If yes, which one?

No, this change introduces no new dependencies.

What should a reviewer concentrate their feedback on?

  • API design and method signature consistency
  • Extensibility for future distribution types
  • Test coverage completeness
  • Performance implications for large empirical distributions
  • Backward compatibility

Did you add any tests for the change?

Yes, I added comprehensive tests in test_support_method() that cover:

  • Continuous distributions (should return empty arrays)
  • Discrete distributions with default integer support
  • Empirical distributions with custom support points
  • Delta distributions with point masses
  • Boundary condition handling

Any other comments?

This PR establishes the core infrastructure for proper discrete distribution support handling. It serves as a foundation for improved plotting capabilities (as demonstrated in related PRs) and future enhancements like full support API implementation (issue #244). The design prioritizes extensibility while maintaining backward compatibility.

PR checklist

For all contributions

  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

For new estimators

  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured dependency isolation, see the estimator dependencies guide.

- Modified _plot_single() in BaseDistribution to handle discrete distributions
- For discrete PMF plots, extract support from scipy distribution and evaluate at integer points
- Use stem plots instead of line plots for discrete PMF visualization
- Handle infinite support bounds (e.g., Poisson) with reasonable limits
- Maintain backward compatibility for continuous distributions

This resolves the issue where Binomial and other discrete distribution PMFs
were plotted as continuous curves instead of proper discrete mass functions.
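The commit above describes taking the support from a scipy distribution, truncating infinite bounds, and drawing stems. Below is a sketch of that idea using public scipy and matplotlib calls. The helper names and the `tail` quantile used to cut off infinite bounds are assumptions, not the values used in the PR.

```python
import numpy as np
from scipy import stats


def discrete_pmf_points(dist, tail=1e-3):
    """Integer grid and pmf values for a frozen scipy discrete distribution."""
    lo, hi = dist.support()
    # Replace infinite bounds (e.g. Poisson's upper bound) by tail quantiles
    if not np.isfinite(lo):
        lo = dist.ppf(tail)
    if not np.isfinite(hi):
        hi = dist.ppf(1 - tail)
    x = np.arange(int(lo), int(hi) + 1)
    return x, dist.pmf(x)


def plot_discrete_pmf(dist, ax=None):
    import matplotlib.pyplot as plt  # imported lazily, only needed to draw

    x, p = discrete_pmf_points(dist)
    if ax is None:
        _, ax = plt.subplots()
    ax.stem(x, p)  # stems at the mass points instead of a continuous line
    return ax
```

For `stats.binom(10, 0.3)` the grid is 0 to 10; for `stats.poisson(3)` it stops at the 0.999 quantile.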
- Added test_discrete_pmf_plotting() to verify that discrete distributions
  use stem plots for PMF visualization
- Ensures the fix for issue sktime#416 doesn't regress
@arnavk23 arnavk23 changed the title Fix/issue 416 binomial plot [BUG] Fix incorrect PMF plotting Jan 24, 2026
- Replace bare except Exception with specific exception types
- Shorten long docstring line to comply with flake8 E501
- Maintain black formatting
@fkiraly fkiraly added enhancement module:probability&simulation probability distributions and simulators labels Jan 24, 2026
Collaborator

@fkiraly fkiraly left a comment

Really nice, good idea!

Can we try to avoid the try/except though?

@arnavk23 arnavk23 requested a review from fkiraly January 25, 2026 18:39
Collaborator

@fkiraly fkiraly left a comment

Thanks!

I think this is opening a can of worms though... the plotting assumes that the support is at integers, right? In general that need not be the case, e.g., for Empirical.

To deal with this programmatically, we would need to be able to inspect each discrete - and also, potentially, mixed - distribution for the point masses.

Currently, this is not part of the API - see #244 for the issue to add this.

For now, I do think this PR is an improvement above the status quo, but I wonder whether we can do something quick and intermediate? E.g., a private function _support(lower, upper) that generates all discrete points between lower and upper?

- Add _support(lower, upper, max_points) method to BaseDistribution that returns support points within bounds
- Default implementation assumes integer support (for distributions like Binomial)
- Override _support in Empirical distribution to return actual empirical support points
- Update _plot_single to use _support method for discrete PMF plotting instead of hardcoded integer range
- This allows distributions with non-integer support (like Empirical) to be plotted correctly
- Override _support method in Delta distribution to return the actual support point(s), i.e., its location parameter c
- Handles both scalar and array cases, filtering points within the given bounds
- Ensures Delta distributions with non-integer support points are plotted correctly
- Remove unnecessary blank lines in Delta and Empirical _support methods
- Maintain consistent code formatting
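Routing `_plot_single` through an overridable `_support` hook means the plotting code no longer hardcodes an integer grid. The classes below (`SketchBase`, `SketchEmpirical`) and the helper `pmf_plot_points` are hypothetical stand-ins for skpro's classes, kept minimal to show the dispatch pattern; they assume finite plotting bounds.

```python
import numpy as np


class SketchBase:
    """Hypothetical stand-in for BaseDistribution (not skpro's actual class)."""

    discrete = True

    def _support(self, lower, upper, max_points=1000):
        # Default: assume integer support (fits Binomial, Poisson, ...)
        if not self.discrete:
            return np.array([])
        start = max(0, int(np.ceil(lower)))
        stop = int(np.floor(upper))
        return np.arange(start, stop + 1)[:max_points]

    def pmf(self, x):
        raise NotImplementedError


class SketchEmpirical(SketchBase):
    """Hypothetical empirical distribution with arbitrary (float) atoms."""

    def __init__(self, samples):
        self.atoms, counts = np.unique(samples, return_counts=True)
        self.weights = counts / counts.sum()

    def _support(self, lower, upper, max_points=1000):
        # Override: return the actual atoms, not an integer grid
        mask = (self.atoms >= lower) & (self.atoms <= upper)
        return self.atoms[mask][:max_points]

    def pmf(self, x):
        return np.array([self.weights[self.atoms == xi].sum() for xi in x])


def pmf_plot_points(dist, lower, upper):
    """Plotting code asks the distribution for its mass points."""
    x = dist._support(lower, upper)
    return x, dist.pmf(x)
```

With samples `[0.5, 0.5, 1.25, 3.0]` and bounds `[0, 2]`, the plot points are 0.5 and 1.25 with masses 0.5 and 0.25, which an integer grid would miss entirely.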
@arnavk23 arnavk23 requested a review from fkiraly January 27, 2026 11:55
Comment thread skpro/distributions/base/_base.py Outdated
Comment thread skpro/distributions/empirical.py Outdated
Comment thread skpro/distributions/tests/test_proba_basic.py
Collaborator

@fkiraly fkiraly left a comment

Thanks for the additional thoughts about _support, I think this is great.

May I suggest to split this PR into two, one with _support and specific tests for it, and one about plotting, which stacks on it? I anticipate some discussion about _support.

Main questions about this PR as-is:

  • how do we handle the issue that each entry can have different support, in a 2D shaped distribution, e.g., for Empirical or Delta? This feels a bit tricky.
  • why are you deleting some of the existing tests?

@arnavk23
Contributor Author

arnavk23 commented Jan 28, 2026

Thanks for the additional thoughts about _support, I think this is great.

Q1 (2D support handling): The _support method is called on scalar distributions (after iloc[i, j] extracts each entry), not on the full 2D distribution. This means each entry gets its own support calculation.
For distributions with uniform support across entries (like Binomial), the base implementation works fine, since all entries share the same support characteristics.
For distributions with entry-specific support (like Empirical or Delta), subclasses override _support to compute the appropriate support points for that specific entry, based on its parameters.
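To illustrate this per-entry calling convention, here is a sketch in which a 2D grid of Delta locations is handled entry by entry, with plain NumPy indexing standing in for `iloc[i, j]`. The function names are hypothetical and not part of skpro.

```python
import numpy as np


def delta_entry_support(c, lower, upper):
    """Support of one scalar Delta entry: its location c, if inside bounds."""
    return np.array([c]) if lower <= c <= upper else np.array([])


def per_entry_supports(c_grid, lower, upper):
    """Compute support separately for each (i, j) entry of a 2D parameter grid."""
    c_grid = np.asarray(c_grid, dtype=float)
    n_rows, n_cols = c_grid.shape
    return {
        (i, j): delta_entry_support(c_grid[i, j], lower, upper)
        for i in range(n_rows)
        for j in range(n_cols)
    }
```

Each entry ends up with its own, possibly empty, support, which is why no single array can describe the support of the full 2D distribution.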

Q2 (test changes, now in the new PR titled "Incorrect pmf plotting"): The original test only checked that a stem plot was created, but did not verify that it contained actual data points (I added that check back). The updated version checks for multiple support points to ensure the plotting works correctly.

@arnavk23 arnavk23 force-pushed the fix/issue-416-binomial-plot branch from 17010db to 77d7ac3 Compare January 28, 2026 04:17
…ibution support detection

- Add _support(lower, upper, max_points) method to BaseDistribution
- Returns empty array for continuous distributions
- Returns non-negative integers for discrete distributions by default
- Override in Empirical to return actual empirical support points within bounds
- Override in Delta to return point mass locations within bounds
- Add comprehensive test for _support method across distribution types
@arnavk23 arnavk23 changed the title [BUG] Fix incorrect PMF plotting [BUG] Add _sopprt method to BaseDistribution Jan 28, 2026
@arnavk23 arnavk23 changed the title [BUG] Add _sopprt method to BaseDistribution [BUG] Add _support method to BaseDistribution Jan 28, 2026
@arnavk23 arnavk23 requested a review from fkiraly January 28, 2026 11:31
@fkiraly
Collaborator

fkiraly commented Jan 28, 2026

The _support method is called on scalar distributions (after iloc[i,j] extracts each entry), not on the full 2D distribution. This means each entry gets its own support calculation.

Ah, I see! So it can be called only on 0D-array (i.e., scalar) distributions, right?

Although where all entries share the same support, it is a bit unfortunate that it cannot be called from the 2D one. Not sure how to solve this. Perhaps distinguish potential (class level) and actual (instance & entry level)? Anyway, that might be best discussed in a design issue and is not blocking for this PR.

Collaborator

@fkiraly fkiraly left a comment

Great!

Could you kindly expand the docstring to state that _support (a) applies to the pmf only, and (b) applies to scalar distributions only?

@arnavk23 arnavk23 requested a review from fkiraly January 29, 2026 12:12
Collaborator

@fkiraly fkiraly left a comment

Great, thanks!

Small change - to make the purpose clearer, could we rename to _pmf_support?

@arnavk23 arnavk23 requested a review from fkiraly February 2, 2026 14:59
@arnavk23 arnavk23 changed the title [BUG] Add _support method to BaseDistribution [BUG] Add _pmf_support method to BaseDistribution Feb 2, 2026
@fkiraly fkiraly changed the title [BUG] Add _pmf_support method to BaseDistribution [ENH] _pmf_support method for BaseDistribution returning inspectable mass support Feb 5, 2026
Comment thread skpro/distributions/delta.py
Comment thread skpro/distributions/empirical.py Outdated
Collaborator

@fkiraly fkiraly left a comment

Thanks!

I noticed that some implementations still deal with the 2D case. Can you explain why this is there, if, according to your previous statements, the _pmf_support method is only for the 0D case?

@arnavk23 arnavk23 requested a review from fkiraly February 6, 2026 16:56
@fkiraly
Collaborator

fkiraly commented Feb 6, 2026

can you explain why the 2D code was there and how you want to resolve it?

@arnavk23
Contributor Author

can you explain why the 2D code was there and how you want to resolve it?

The 2D logic in _pmf_support handled multi-dimensional distributions by aggregating support points across all elements. This contradicted the base class, which defines _pmf_support as 0D-only. I enforced this by raising NotImplementedError for ndim > 0 and removed redundant 2D handling. The code is now aligned with the docs, simpler, and all tests pass.
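A hedged sketch of how this 0D-only contract might look when combined with the docstring clarifications requested earlier in the review (pmf only, scalar distributions only). `ScalarOnlySketch` is a hypothetical Delta-like class, not skpro code; only the `NotImplementedError` for `ndim > 0` is taken from the comment above.

```python
import numpy as np


class ScalarOnlySketch:
    """Hypothetical Delta-like distribution illustrating the 0D-only contract."""

    def __init__(self, c):
        self.c = np.asarray(c, dtype=float)
        self.ndim = self.c.ndim

    def _pmf_support(self, lower, upper, max_points=1000):
        """Return the points carrying probability mass within [lower, upper].

        Applies to the probability mass function (pmf) only, and to
        scalar (0D) distributions only; for array-shaped distributions,
        extract a scalar entry first, e.g., via ``iloc[i, j]``.

        Parameters
        ----------
        lower, upper : float
            Inclusive bounds of the search interval.
        max_points : int, optional
            Upper limit on the number of returned points.

        Returns
        -------
        np.ndarray
            Sorted support points within the bounds.
        """
        if self.ndim > 0:
            raise NotImplementedError(
                "_pmf_support is only defined for scalar (0D) distributions"
            )
        c = float(self.c)
        points = np.array([c]) if lower <= c <= upper else np.array([])
        return points[:max_points]
```

Failing loudly on `ndim > 0` keeps the behavior aligned with the docstring instead of silently aggregating supports across entries.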

Collaborator

@fkiraly fkiraly left a comment

Ok, thanks!

@fkiraly fkiraly merged commit 3107046 into sktime:main Feb 15, 2026
39 checks passed
@fkiraly fkiraly mentioned this pull request Feb 15, 2026
4 tasks
@arnavk23 arnavk23 deleted the fix/issue-416-binomial-plot branch February 15, 2026 14:59

2 participants