[ENH] Add explicit energy computations for multiple distributions #688

Merged
fkiraly merged 12 commits into sktime:main from arnavk23:fix/issue-267-clean
Dec 21, 2025

Conversation

Contributor

@arnavk23 arnavk23 commented Dec 19, 2025

Reference Issues/PRs

Towards #267

What does this implement/fix? Explain your changes.

  • Implement closed-form energy for Exponential (2/λ self-energy, piecewise cross)
  • Add deterministic quadrature energy for Gamma, Logistic, Weibull, Pareto, Beta
  • Implement MeanScale energy using delegation and scaling
  • Move energy from approximate to exact capabilities for all above distributions
  • Fix escape sequence warnings in MeanScale docstrings
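
The "delegation and scaling" idea for a location-scale wrapper can be sketched as follows. This is a minimal illustration and not skpro's MeanScale code: the classes `StandardNormal` and `ShiftedScaled` are hypothetical, the standard normal is used only because its energies have well-known closed forms, and the scale is assumed positive. The shift cancels in X − Y, so the self-energy is the scale times the base self-energy, and the cross-energy at x is the scale times the base cross-energy at the standardized point.

```python
import math


class StandardNormal:
    """Hypothetical base distribution (mean 0, sd 1) with closed-form energies."""

    def energy_self(self):
        # E|X - Y| for i.i.d. standard normals is 2 / sqrt(pi).
        return 2.0 / math.sqrt(math.pi)

    def energy_x(self, x):
        # E|X - x| = x * (2 * Phi(x) - 1) + 2 * phi(x) for a standard normal X.
        cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        return x * (2.0 * cdf - 1.0) + 2.0 * pdf


class ShiftedScaled:
    """Hypothetical wrapper for mu + sigma * Z, where Z follows `base`."""

    def __init__(self, base, mu, sigma):
        # sigma is assumed to be strictly positive.
        self.base, self.mu, self.sigma = base, mu, sigma

    def energy_self(self):
        # The shift cancels in X - Y, so only the scale factor survives.
        return self.sigma * self.base.energy_self()

    def energy_x(self, x):
        # |mu + sigma * Z - x| = sigma * |Z - (x - mu) / sigma|
        return self.sigma * self.base.energy_x((x - self.mu) / self.sigma)


dist = ShiftedScaled(StandardNormal(), mu=3.0, sigma=2.0)
self_energy = dist.energy_self()
cross_energy_at_mean = dist.energy_x(3.0)
```

The wrapper never integrates anything itself; whatever exactness the base distribution offers carries over unchanged.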

Does your contribution introduce a new dependency? If yes, which one?

What should a reviewer concentrate their feedback on?

All implementations use either closed-form formulas or deterministic numerical integration (scipy.integrate.quad) instead of Monte Carlo approximation.

Did you add any tests for the change?

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

- Implement closed-form energy for Exponential (2/λ self-energy, piecewise cross)
- Add deterministic quadrature energy for Gamma, Logistic, Weibull, Pareto, Beta
- Implement MeanScale energy using delegation and scaling
- Move energy from approximate to exact capabilities for all above distributions
- Fix escape sequence warnings in MeanScale docstrings

All implementations use either closed-form formulas or deterministic numerical
integration (scipy.integrate.quad) instead of Monte Carlo approximation.

Fixes sktime#267
@arnavk23 arnavk23 changed the title ENH: Add explicit energy computations for multiple distributions [ENH] Add explicit energy computations for multiple distributions Dec 19, 2025
- Add pydocstyle-compliant Examples sections to 7 distributions showing exact energy computations
- Exponential: Closed-form self-energy and cross-energy formulas (E|X-Y| = 2/λ)
- Beta, Gamma, Weibull, Pareto, Logistic: Deterministic quadrature-based energy
- MeanScale: Energy delegation with scaling formula
- Update distributions API reference with "Energy computations" section
- Document shift from Monte Carlo approximation to exact/deterministic methods
- Fixes sktime#267
- Fix pydocstyle D202: Remove blank lines after docstrings (Exponential, Gamma, Logistic)
- Fix pydocstyle D209: Move closing quotes to separate line (Logistic, Weibull)
- Fix flake8 E501: Break long lines in docstrings and energy implementations
- Add noqa: E731 comments for lambda assignments in energy callbacks
- These lambdas are required for quad() integration, not simple assignments
- Break long formula line in _energy_self docstring
- Complies with 88 character limit
- Fix Logistic, Weibull docstrings: use correct parameter names (scale, k)
- Add doctest output expectations with # doctest: +ELLIPSIS to all energy examples
- Fix Logistic _energy_x formula: handle both x > mean and x < mean cases properly
- Now returns non-negative energy values as required
- Fixes test_doctest_examples and test_methods_x failures
- Fix doctest directive syntax: remove space after # (now #doctest: not # doctest:)
- Rewrite Logistic _energy_x to use direct numerical integration of |t - x| * f(t)
- Logistic PDF properly integrated as 1/(4*s*cosh^2((t-m)/(2*s)))
- Now returns non-negative energy values for all x values
- Fixes doctest syntax error and negative energy assertion
- Break docstring formula to separate line
- Already had correct line breaks in quad calls from previous commit
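
The direct integration of |t - x| * f(t) described in the commits above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the PR uses scipy.integrate.quad, while this sketch substitutes a hand-written composite Simpson rule on a truncated window to stay dependency-free. The helper names `simpson` and `logistic_energy_x` and the `width` parameter are hypothetical.

```python
import math


def logistic_pdf(t, mu, s):
    # Logistic density written as 1 / (4 s cosh^2((t - mu) / (2 s))).
    return 1.0 / (4.0 * s * math.cosh((t - mu) / (2.0 * s)) ** 2)


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def logistic_energy_x(x, mu, s, width=40.0):
    """Approximate E|X - x| by integrating |t - x| * pdf(t) over a wide window."""
    lo = min(x, mu) - width * s
    hi = max(x, mu) + width * s

    def integrand(t):
        return abs(t - x) * logistic_pdf(t, mu, s)

    # Split at t = x, where |t - x| has a kink, so each piece is smooth.
    return simpson(integrand, lo, x) + simpson(integrand, x, hi)


energy_at_mean = logistic_energy_x(0.0, 0.0, 1.0)
energy_left_of_mean = logistic_energy_x(-3.0, 0.0, 1.0)
```

Because the integrand is non-negative everywhere, the result cannot go negative for x on either side of the mean. As a cross-check, with z = (x - mu)/s the integral has the closed form s * (2 ln(1 + e^z) - z).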
Comment thread docs/source/api_reference/distributions.rst Outdated
@fkiraly fkiraly added enhancement module:probability&simulation probability distributions and simulators labels Dec 20, 2025
Collaborator

@fkiraly fkiraly left a comment

This is super useful! May I ask where you got the energy computations from? Derived yourself or from a book/paper?

Did you test against the Monte Carlo approximate computations?

- Keep API reference focused; move such highlights to release notes
…date docs/examples

- Exponential: set self-energy to 1/lambda (was 2/lambda)
- Gamma/Beta/Weibull/Pareto: use factor 2 for non-negative support in CDF integral (was 4)
- Logistic: make `energy_x` a non-negative integral of |t-x|·pdf(t)
- Docstrings: add pydocstyle-compliant Examples with doctest outputs; fix parameter names
- Lint: resolve flake8/pydocstyle issues (E501, E731, D202, D209)
- Docs: remove non-API "Energy computations" section from distributions API ref

Monte Carlo validation matches exact implementations within ~0.1–0.6% relative error across distributions.
@arnavk23
Contributor Author

> This is super useful! May I ask where you got the energy computations from? Derived yourself or from a book/paper?
>
> Did you test against the Monte Carlo approximate computations?

Thanks! The energy computations come from a standard probability identity.

For i.i.d. X, Y with CDF F on support S, E|X−Y| = 2 ∫_S F(t)·(1−F(t)) dt.
For a distribution on the full real line that is symmetric about 0, this can be written as 4 ∫_0^∞ F(t)·(1−F(t)) dt; for non-negative support (Gamma, Weibull, Beta, Pareto) the correct factor is 2 over the support.
Exponential has the closed form E|X−Y| = 1/λ. For Logistic, I compute energy_x by numerically integrating |t−x|·pdf(t), which guarantees non-negativity.
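
The CDF identity above can be sketched numerically as follows. This is a minimal illustration, not the PR's implementation: it swaps scipy.integrate.quad for a hand-written composite Simpson rule on a truncated interval, and the helper names and the truncation bounds (30 and 10) are assumptions chosen so the neglected tail is negligible for these parameters.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def energy_self_from_cdf(cdf, lower, upper, n=4000):
    """E|X - Y| = 2 * integral of F(t) * (1 - F(t)) over [lower, upper]."""
    return 2.0 * simpson(lambda t: cdf(t) * (1.0 - cdf(t)), lower, upper, n)


def exponential_cdf(rate):
    return lambda t: 1.0 - math.exp(-rate * t)


def weibull_cdf(k, scale):
    return lambda t: 1.0 - math.exp(-((t / scale) ** k))


# Non-negative support, so the factor is 2 and integration starts at 0.
exp_energy = energy_self_from_cdf(exponential_cdf(2.0), 0.0, 30.0)
weibull_energy = energy_self_from_cdf(weibull_cdf(2.0, 1.0), 0.0, 10.0)
```

For Exponential(rate=2) the result can be compared against the closed form 1/λ = 0.5 stated above.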

Yes, validated against Monte Carlo approximations. With N=200k samples:
Exponential(rate=2): exact 0.5000, MC 0.5005
Gamma(α=2, β=1): exact 1.5000, MC 1.4979
Beta(α=2, β=3): exact 0.2286, MC 0.2293
Logistic(μ=0, scale=1): exact 2.0000, MC 1.9975
Weibull(k=2, scale=1): exact 0.5191, MC 0.5208
Pareto(α=2.5, scale=1): exact 0.8333, MC 0.8285
All are within ~0.1–0.6% relative error.
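
A Monte Carlo check of this kind can be sketched as follows. This is not the script used for the numbers above: it relies only on Python's standard `random` module, the seed is arbitrary, the helper names are hypothetical, and the Weibull reference value 0.5191 is simply copied from the list above.

```python
import random


def mc_energy_self(sampler, n, rng):
    """Estimate E|X - Y| from n independent pairs drawn with sampler(rng)."""
    total = 0.0
    for _ in range(n):
        total += abs(sampler(rng) - sampler(rng))
    return total / n


def relative_error(estimate, exact):
    return abs(estimate - exact) / abs(exact)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
n = 200_000

# Exponential(rate=2): closed-form self-energy 1 / rate.
exp_exact = 1.0 / 2.0
exp_mc = mc_energy_self(lambda r: r.expovariate(2.0), n, rng)

# Weibull(k=2, scale=1): reference value quoted from the list above.
# random.weibullvariate takes (scale, shape) in that order.
weibull_exact = 0.5191
weibull_mc = mc_energy_self(lambda r: r.weibullvariate(1.0, 2.0), n, rng)

exp_rel_err = relative_error(exp_mc, exp_exact)
weibull_rel_err = relative_error(weibull_mc, weibull_exact)
```

With 200k pairs the standard error of such an estimate is on the order of a few tenths of a percent, so agreement at that level is what sampling noise alone would produce.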

Comment thread skpro/distributions/exponential.py Outdated
Collaborator

@fkiraly fkiraly left a comment

Really great! Excellent that you know about the identity, that is very helpful :-)

Some comments:

  • could you remove the call of the energy function from the distribution examples?
  • it should be possible to compute logistic without quadrature, you need to integrate tanh-squared and tanh which have analytical expressions
  • for Pareto, it should also be possible to derive the integral, you integrate 1/x to some power

Let me know if you do not see this and I can help work it out. (or maybe I made a mistake)

…Examples

- Logistic: closed-form E|X-Y| = 2s (was quadrature)
- Pareto: closed-form E|X-Y| = 2ma/[(a-1)(2a-1)] (was quadrature)
- Remove .energy() calls from all distribution Examples sections
- Validated via Monte Carlo: all <0.6% relative error

The analytical formulas eliminate numerical integration overhead and improve accuracy.
@arnavk23
Contributor Author

@fkiraly Removed .energy() calls from Examples.

Logistic: analytical self-energy. Replaced quadrature with a closed-form formula:
Formula: E|X-Y| = 2s for Logistic(μ, s)
Derivation: Using the identity F(t)(1-F(t)) = 1/(4 cosh²((t-μ)/(2s))), the integral of F(t)(1-F(t)) over the real line evaluates to s, and the factor 2 in the energy identity gives E|X-Y| = 2s.
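
Written out step by step, the Logistic derivation runs as follows (a worked sketch; the substitution variable u is introduced here for illustration):

```latex
\begin{align*}
F(t) &= \frac{1}{1 + e^{-(t-\mu)/s}}, \qquad
F(t)\,\bigl(1 - F(t)\bigr) = \frac{1}{4\cosh^2\!\bigl(\tfrac{t-\mu}{2s}\bigr)}, \\
\int_{-\infty}^{\infty} F(t)\,\bigl(1 - F(t)\bigr)\,dt
  &= \int_{-\infty}^{\infty} \frac{2s}{4\cosh^2 u}\,du
  \qquad \bigl(u = \tfrac{t-\mu}{2s},\ dt = 2s\,du\bigr) \\
  &= \frac{s}{2}\,\bigl[\tanh u\bigr]_{-\infty}^{\infty} = s, \\
\mathbb{E}\lvert X - Y\rvert
  &= 2\int_{-\infty}^{\infty} F(t)\,\bigl(1 - F(t)\bigr)\,dt = 2s.
\end{align*}
```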

Pareto: analytical self-energy. Replaced quadrature with a closed-form formula:
Formula: E|X-Y| = 2ma/[(a-1)(2a-1)] for Pareto(scale=m, alpha=a) with a > 1
Derivation: Direct integration of (m/t)ᵃ - (m/t)²ᵃ from m to ∞, which gives m/(a-1) - m/(2a-1) = ma/[(a-1)(2a-1)], then multiplied by 2
Edge case: Returns inf for a ≤ 1
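
A minimal sketch of the Pareto formula and its edge case follows. The function name `pareto_self_energy` is hypothetical and this is not the code merged in the PR; it only restates the formula given above.

```python
import math


def pareto_self_energy(scale, alpha):
    """E|X - Y| for i.i.d. Pareto(scale=m, alpha=a), per the formula above."""
    if alpha <= 1.0:
        # The mean is infinite for alpha <= 1, so the energy diverges.
        return math.inf
    # With F(t) = 1 - (m/t)^a on [m, inf):
    #   integral of (m/t)^a    from m to inf = m / (a - 1)
    #   integral of (m/t)^(2a) from m to inf = m / (2a - 1)
    # Their difference is m*a / ((a - 1)(2a - 1)); the identity doubles it.
    return 2.0 * scale * alpha / ((alpha - 1.0) * (2.0 * alpha - 1.0))


energy = pareto_self_energy(1.0, 2.5)
energy_heavy_tail = pareto_self_energy(1.0, 1.0)
```

For scale 1 and alpha 2.5 this evaluates to 5/6, matching the 0.8333 reported earlier in the thread.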

Validation:
Monte Carlo (N=200k) confirms all formulas within < 0.6% relative error:
Exponential: 0.09%, Gamma: 0.14%, Beta: 0.33%, Logistic: 0.12%, Weibull: 0.32%, Pareto: 0.58%

Energy computation examples are too specialized for general usage examples.
The energy method remains documented via capability tags and API reference.
Collaborator

@fkiraly fkiraly left a comment

Great, thanks!

@fkiraly fkiraly merged commit 6f66a82 into sktime:main Dec 21, 2025
38 checks passed
@arnavk23 arnavk23 deleted the fix/issue-267-clean branch December 21, 2025 13:23
@arnavk23 arnavk23 restored the fix/issue-267-clean branch December 24, 2025 06:54
Labels

enhancement module:probability&simulation probability distributions and simulators

2 participants