[ENH] Add explicit energy computations for multiple distributions #688
- Implement closed-form energy for Exponential (2/λ self-energy, piecewise cross)
- Add deterministic quadrature energy for Gamma, Logistic, Weibull, Pareto, Beta
- Implement MeanScale energy using delegation and scaling
- Move energy from approximate to exact capabilities for all above distributions
- Fix escape sequence warnings in MeanScale docstrings

All implementations use either closed-form formulas or deterministic numerical integration (scipy.integrate.quad) instead of Monte Carlo approximation.

Fixes sktime#267
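The MeanScale item above relies on a scaling property: for Z = mu + sigma·X, the shift cancels in the difference, so E|Z − Z'| = |sigma|·E|X − X'|. A minimal sketch of that delegation, using `scipy.stats` frozen distributions rather than skpro's own classes; `base_self_energy` and `mean_scale_self_energy` are hypothetical helper names and not part of the skpro API.

```python
import numpy as np
from scipy import integrate, stats


def base_self_energy(dist):
    """E|X - Y| for i.i.d. X, Y ~ dist, computed as 2 * integral of F * (1 - F)."""
    lo, hi = dist.support()

    def integrand(t):
        # sf(t) is the survival function 1 - cdf(t)
        return dist.cdf(t) * dist.sf(t)

    value, _ = integrate.quad(integrand, lo, hi)
    return 2.0 * value


def mean_scale_self_energy(base, sigma):
    """Hypothetical delegation: the shift drops out, the scale multiplies."""
    return abs(sigma) * base_self_energy(base)


base = stats.norm()  # standard normal as the inner distribution
mu, sigma = 3.0, 2.5

delegated = mean_scale_self_energy(base, sigma)
direct = base_self_energy(stats.norm(loc=mu, scale=sigma))
print(delegated, direct)
```

Both numbers should agree up to quadrature tolerance, and for the standard normal the base value equals 2/sqrt(pi).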
- Add pydocstyle-compliant Examples sections to 7 distributions showing exact energy computations
- Exponential: Closed-form self-energy and cross-energy formulas (E|X-Y| = 2/λ)
- Beta, Gamma, Weibull, Pareto, Logistic: Deterministic quadrature-based energy
- MeanScale: Energy delegation with scaling formula
- Update distributions API reference with "Energy computations" section
- Document shift from Monte Carlo approximation to exact/deterministic methods
- Fixes sktime#267
- Fix pydocstyle D202: Remove blank lines after docstrings (Exponential, Gamma, Logistic)
- Fix pydocstyle D209: Move closing quotes to separate line (Logistic, Weibull)
- Fix flake8 E501: Break long lines in docstrings and energy implementations
- Add noqa: E731 comments for lambda assignments in energy callbacks
- These lambdas are required for quad() integration, not simple assignments
- Break long formula line in _energy_self docstring
- Complies with 88 character limit
- Fix Logistic, Weibull docstrings: use correct parameter names (scale, k)
- Add doctest output expectations with # doctest: +ELLIPSIS to all energy examples
- Fix Logistic _energy_x formula: handle both x > mean and x < mean cases properly
- Now returns non-negative energy values as required
- Fixes test_doctest_examples and test_methods_x failures
- Fix doctest directive syntax: remove space after # (now #doctest: not # doctest:)
- Rewrite Logistic _energy_x to use direct numerical integration of |t - x| * f(t)
- Logistic PDF properly integrated as 1/(4*s*cosh^2((t-m)/(2*s)))
- Now returns non-negative energy values for all x values
- Fixes doctest syntax error and negative energy assertion
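The rewritten Logistic `_energy_x` described in this commit integrates |t − x|·f(t) directly. A minimal sketch of that approach with `scipy.integrate.quad`, not the PR's actual code: the function names are hypothetical, and the softplus cross-check is an addition here (it follows from E|X − x| = ∫_{−∞}^{x} F(t) dt + ∫_{x}^{∞} (1 − F(t)) dt), not something taken from the PR.

```python
import numpy as np
from scipy import integrate


def logistic_pdf(t, m, s):
    """Logistic density written as 1 / (4 s cosh^2((t - m) / (2 s)))."""
    with np.errstate(over="ignore"):  # cosh overflows to inf far in the tails
        return 1.0 / (4.0 * s * np.cosh((t - m) / (2.0 * s)) ** 2)


def logistic_energy_x_quad(x, m, s):
    """E|X - x| for X ~ Logistic(m, s) by integrating |t - x| * f(t)."""

    def integrand(t):
        return abs(t - x) * logistic_pdf(t, m, s)

    # Split at the kink t = x so that each piece is smooth for quad.
    left, _ = integrate.quad(integrand, -np.inf, x)
    right, _ = integrate.quad(integrand, x, np.inf)
    return left + right


def logistic_energy_x_softplus(x, m, s):
    """Cross-check (assumption of this sketch): s * (softplus(z) + softplus(-z))."""
    z = (x - m) / s
    return s * (np.logaddexp(0.0, z) + np.logaddexp(0.0, -z))


m, s = 1.0, 2.0
points = [-3.0, 1.0, 4.5]  # below, at, and above the mean
quad_values = [logistic_energy_x_quad(x, m, s) for x in points]
closed_values = [logistic_energy_x_softplus(x, m, s) for x in points]
print(quad_values, closed_values)
```

Because the integrand is non-negative everywhere, the result cannot come out negative on either side of the mean, which is the property the earlier formula violated.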
- Break docstring formula to separate line
- Already had correct line breaks in quad calls from previous commit
fkiraly
left a comment
This is super useful! May I ask where you got the energy computations from? Derived yourself or from a book/paper?
Did you test against the Monte Carlo approximate computations?
- Keep API reference focused; move such highlights to release notes
…date docs/examples

- Exponential: set self-energy to 1/lambda (was 2/lambda)
- Gamma/Beta/Weibull/Pareto: use factor 2 for non-negative support in CDF integral (was 4)
- Logistic: make `energy_x` a non-negative integral of |t-x|·pdf(t)
- Docstrings: add pydocstyle-compliant Examples with doctest outputs; fix parameter names
- Lint: resolve flake8/pydocstyle issues (E501, E731, D202, D209)
- Docs: remove non-API "Energy computations" section from distributions API ref

Monte Carlo validation matches exact implementations within ~0.1–0.6% relative error across distributions.
Thanks! The energy computations come from a standard probability identity. For i.i.d. X, Y with CDF F on support S, E|X−Y| = 2 ∫_S F(t)·(1−F(t)) dt. Yes, I validated against the Monte Carlo approximations, using N=200k samples.
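As an illustration of the identity and of the Monte Carlo check, here is a minimal sketch for the Exponential case, where the self-energy is 1/λ. It uses plain NumPy/SciPy rather than skpro; the function names, the rate `lam = 2.5`, and the seed are assumptions chosen for the example.

```python
import numpy as np
from scipy import integrate


def exponential_self_energy_quad(lam):
    """E|X - Y| = 2 * integral over [0, inf) of F(t) * (1 - F(t)) dt."""

    def integrand(t):
        cdf = -np.expm1(-lam * t)  # F(t) = 1 - exp(-lam * t)
        return cdf * np.exp(-lam * t)  # 1 - F(t) = exp(-lam * t)

    value, _ = integrate.quad(integrand, 0.0, np.inf)
    return 2.0 * value


def exponential_self_energy_mc(lam, n=200_000, seed=0):
    """Monte Carlo estimate of E|X - Y| from n independent pairs."""
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=1.0 / lam, size=n)
    y = rng.exponential(scale=1.0 / lam, size=n)
    return np.abs(x - y).mean()


lam = 2.5
exact = 1.0 / lam  # closed form for the Exponential self-energy
quad_value = exponential_self_energy_quad(lam)
mc_value = exponential_self_energy_mc(lam)
rel_err = abs(mc_value - exact) / exact
print(exact, quad_value, mc_value, rel_err)
```

With 200k pairs the standard error of the Monte Carlo mean is about 1/(λ·sqrt(N)), so relative errors of a few tenths of a percent are what one would expect.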
fkiraly
left a comment
Really great! Excellent that you know about the identity, that is very helpful :-)
Some comments:
- could you remove the call of the energy function from the distribution examples?
- it should be possible to compute logistic without quadrature, you need to integrate tanh-squared and tanh which have analytical expressions
- for Pareto, it should also be possible to derive the integral, you integrate 1/x to some power
Let me know if you do not see this and I can help work it out. (or maybe I made a mistake)
…Examples

- Logistic: closed-form E|X-Y| = 2s (was quadrature)
- Pareto: closed-form E|X-Y| = 2ma/[(a-1)(2a-1)] (was quadrature)
- Remove .energy() calls from all distribution Examples sections
- Validated via Monte Carlo: all <0.6% relative error

The analytical formulas eliminate numerical integration overhead and improve accuracy.
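The two closed forms in this commit can be checked against the CDF identity E|X−Y| = 2 ∫ F(t)·(1−F(t)) dt by quadrature. A minimal sketch, assuming `scipy.stats` parameterisations (`logistic(loc, scale=s)`, `pareto(a, scale=m)`) stand in for skpro's distributions; the helper names are hypothetical, and the Pareto formula requires a > 1.

```python
from scipy import integrate, stats


def quad_self_energy(dist):
    """Reference value: 2 * integral of F * (1 - F) over the support."""
    lo, hi = dist.support()

    def integrand(t):
        return dist.cdf(t) * dist.sf(t)

    value, _ = integrate.quad(integrand, lo, hi)
    return 2.0 * value


def logistic_self_energy(s):
    """Closed form from the commit: E|X - Y| = 2 s (location drops out)."""
    return 2.0 * s


def pareto_self_energy(m, a):
    """Closed form from the commit: 2 m a / ((a - 1)(2 a - 1)), for a > 1."""
    if a <= 1.0:
        raise ValueError("E|X - Y| is infinite for shape a <= 1")
    return 2.0 * m * a / ((a - 1.0) * (2.0 * a - 1.0))


s = 0.7
logistic_closed = logistic_self_energy(s)
logistic_quad = quad_self_energy(stats.logistic(loc=1.0, scale=s))

m, a = 2.0, 3.0
pareto_closed = pareto_self_energy(m, a)
pareto_quad = quad_self_energy(stats.pareto(a, scale=m))

print(logistic_closed, logistic_quad)
print(pareto_closed, pareto_quad)
```

For the Pareto case the integral splits into m/(a−1) − m/(2a−1), which is where the (a−1)(2a−1) denominator comes from.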
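@fkiraly

- Removed the .energy() calls from the distribution Examples sections.
- Logistic: analytical self-energy. Replaced quadrature with the closed-form formula E|X-Y| = 2s.
- Pareto: analytical self-energy. Replaced quadrature with the closed-form formula E|X-Y| = 2ma/[(a-1)(2a-1)].
- Validation: Monte Carlo checks agree within 0.6% relative error.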
Energy computation examples are too specialized for general usage examples. The energy method remains documented via capability tags and API reference.
Reference Issues/PRs
Towards #267
What does this implement/fix? Explain your changes.
Does your contribution introduce a new dependency? If yes, which one?
What should a reviewer concentrate their feedback on?
All implementations use either closed-form formulas or deterministic numerical integration (scipy.integrate.quad) instead of Monte Carlo approximation.
Did you add any tests for the change?
Any other comments?
PR checklist
For all contributions
How to: add yourself to the all-contributors file in the `skpro` root directory (not the `CONTRIBUTORS.md`). Common badges:
- `code` - fixing a bug, or adding code logic.
- `doc` - writing or improving documentation or docstrings.
- `bug` - reporting or diagnosing a bug (get this plus `code` if you also fixed the bug in the PR).
- `maintenance` - CI, test framework, release.

See here for full badge reference
For new estimators
- `docs/source/api_reference/taskname.rst`, follow the pattern.
- `Examples` section.
- `python_dependencies` tag and ensured dependency isolation, see the estimator dependencies guide.