levy-stable-benchmarks

Stable distributions (sometimes called Lévy alpha-stable distributions) are important for modelling data across several disciplines including signal processing, physics, and finance. Despite this, many Python libraries provide buggy and/or inaccurate implementations for computing their PDFs and CDFs.

This repository attempts to define a benchmark to test the accuracy of such implementations. We also provide some alternative calculation methods.

[Figure: accuracy comparison of the tested methods]

Among other things, this repository has helped to identify and fix inaccuracies in existing implementations (see, e.g., the scipy issues discussed in the FAQ below).


CDF accuracy percentages

CDF table (-100 <= x <= 100)

Absolute Tolerance

| Method             | 1E-2   | 1E-3   | 1E-4  | 1E-5  |
|--------------------|--------|--------|-------|-------|
| simple_quadrature  | 99.9%  | 99.9%  | 99.6% | 99.4% |
| simple_monte_carlo | 100.0% | 98.9%  | 69.4% | 21.3% |
| larger_monte_carlo | 100.0% | 100.0% | 99.6% | 76.8% |
| scipy_piecewise    | 100.0% | 99.7%  | 98.1% | 97.3% |
| pylevy_miotto      | 83.5%  | 78.0%  | 68.9% | 58.1% |

Relative Tolerance

| Method             | 1E-1  | 1E-2  | 1E-3  | 1E-4  |
|--------------------|-------|-------|-------|-------|
| simple_quadrature  | 98.9% | 98.9% | 98.9% | 98.6% |
| simple_monte_carlo | 91.8% | 80.5% | 55.7% | 34.0% |
| larger_monte_carlo | 99.6% | 94.6% | 80.1% | 58.6% |
| scipy_piecewise    | 96.7% | 96.7% | 96.7% | 96.3% |
| pylevy_miotto      | 81.8% | 66.7% | 54.0% | 42.3% |

CDF quantile table (0.001 <= p <= 0.999)

Absolute Tolerance

| Method             | 1E-2   | 1E-3   | 1E-4  | 1E-5  |
|--------------------|--------|--------|-------|-------|
| simple_quadrature  | 99.0%  | 99.0%  | 99.0% | 99.0% |
| simple_monte_carlo | 100.0% | 95.1%  | 15.4% | 1.6%  |
| larger_monte_carlo | 100.0% | 100.0% | 99.9% | 36.3% |
| scipy_piecewise    | 100.0% | 99.9%  | 99.8% | 99.7% |
| pylevy_miotto      | 85.9%  | 80.7%  | 78.8% | 77.5% |

Relative Tolerance

| Method             | 1E-1   | 1E-2  | 1E-3  | 1E-4  |
|--------------------|--------|-------|-------|-------|
| simple_quadrature  | 99.0%  | 99.0% | 99.0% | 99.0% |
| simple_monte_carlo | 100.0% | 99.5% | 49.9% | 9.1%  |
| larger_monte_carlo | 100.0% | 99.9% | 99.6% | 81.2% |
| scipy_piecewise    | 99.9%  | 99.9% | 99.8% | 99.8% |
| pylevy_miotto      | 92.7%  | 83.0% | 79.8% | 78.1% |

Nolan CDF quantile table (0.00001 <= p <= 0.99999)

Absolute Tolerance

| Method             | 1E-2  | 1E-3  | 1E-4  | 1E-5  |
|--------------------|-------|-------|-------|-------|
| simple_quadrature  | 95.7% | 95.4% | 95.2% | 95.1% |
| simple_monte_carlo | 99.4% | 96.5% | 42.1% | 11.6% |
| larger_monte_carlo | 99.4% | 99.1% | 98.9% | 55.0% |
| scipy_piecewise    | 99.4% | 99.0% | 98.1% | 95.0% |
| pylevy_miotto      | 91.5% | 87.9% | 83.4% | 77.7% |

Relative Tolerance

| Method             | 1E-1  | 1E-2  | 1E-3  | 1E-4  |
|--------------------|-------|-------|-------|-------|
| simple_quadrature  | 95.7% | 95.4% | 95.3% | 95.1% |
| simple_monte_carlo | 94.3% | 87.5% | 50.9% | 21.6% |
| larger_monte_carlo | 99.4% | 96.6% | 87.2% | 68.7% |
| scipy_piecewise    | 97.6% | 97.2% | 97.0% | 96.6% |
| pylevy_miotto      | 93.7% | 86.0% | 79.5% | 71.5% |

PDF accuracy percentages

PDF table (-100 <= x <= 100)

Absolute Tolerance

| Method            | 1E-4  | 1E-5  | 1E-6  | 1E-7  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 99.9% | 99.9% | 99.9% | 99.9% |
| scipy_piecewise   | 99.9% | 99.8% | 99.8% | 99.8% |
| scipy_dni         | 87.5% | 87.5% | 87.5% | 87.5% |
| pylevy_miotto     | 90.8% | 79.8% | 67.7% | 37.5% |

Relative Tolerance

| Method            | 1E-1  | 1E-2  | 1E-3  | 1E-4  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 97.5% | 97.5% | 97.4% | 97.4% |
| scipy_piecewise   | 99.3% | 99.3% | 99.3% | 99.3% |
| scipy_dni         | 84.0% | 84.0% | 84.0% | 84.0% |
| pylevy_miotto     | 79.5% | 48.8% | 24.4% | 11.5% |

PDF quantile table (0.001 <= p <= 0.999)

Absolute Tolerance

| Method            | 1E-4  | 1E-5  | 1E-6  | 1E-7  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 99.8% | 99.8% | 99.8% | 99.7% |
| scipy_piecewise   | 99.9% | 99.8% | 99.8% | 99.8% |
| scipy_dni         | 67.6% | 67.5% | 67.5% | 67.5% |
| pylevy_miotto     | 85.4% | 82.5% | 77.5% | 55.9% |

Relative Tolerance

| Method            | 1E-1  | 1E-2  | 1E-3  | 1E-4  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 98.6% | 98.5% | 98.4% | 98.4% |
| scipy_piecewise   | 99.9% | 99.9% | 99.9% | 99.8% |
| scipy_dni         | 67.0% | 66.7% | 66.6% | 66.5% |
| pylevy_miotto     | 85.2% | 79.1% | 77.7% | 75.5% |

Nolan PDF quantile table (0.00001 <= p <= 0.99999)

Absolute Tolerance

| Method            | 1E-4  | 1E-5  | 1E-6  | 1E-7  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 99.9% | 99.9% | 99.9% | 99.8% |
| scipy_piecewise   | 99.9% | 99.8% | 99.7% | 99.6% |
| scipy_dni         | 64.5% | 64.5% | 64.5% | 64.5% |
| pylevy_miotto     | 91.9% | 87.0% | 77.9% | 54.4% |

Relative Tolerance

| Method            | 1E-1  | 1E-2  | 1E-3  | 1E-4  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 92.5% | 91.6% | 91.6% | 91.4% |
| scipy_piecewise   | 99.1% | 99.1% | 99.0% | 99.0% |
| scipy_dni         | 58.5% | 57.8% | 57.6% | 57.1% |
| pylevy_miotto     | 88.3% | 79.8% | 67.5% | 55.3% |

Average runtimes

The average per-call (single-threaded) runtimes on the benchmark tables are as follows.

| Method             | CDF average runtime | PDF average runtime |
|--------------------|---------------------|---------------------|
| simple_quadrature  | 2.18 ms             | 1.77 ms             |
| simple_monte_carlo | 0.055 ms*           | -                   |
| scipy_piecewise    | 0.70 ms             | 1.15 ms             |
| scipy_dni          | -                   | 4.89 ms             |
| pylevy_miotto      | 1.23 ms             | 1.22 ms             |

* This method takes ~100-150 ms for single calls, but is very fast on repeated (alpha, beta) values. See the FAQ question on simple_monte_carlo below for more details.

FAQ: notes and limitations

How are "accuracy percentage" and "composite accuracy" defined?

"Accuracy percentage" is the percentage of tabulated values computed to the desired tolerance.

When listed above, the accuracy percentages are truncated (not rounded), so a method will only have 100.0% accuracy if it is within the specified tolerance on all the test cases.

"Composite accuracy" is the average accuracy percentage on all the PDF or CDF tables. There are three tables each for PDF and CDF, so ~33% of the weighting goes to each.

Where did these PDF/CDF tables come from? Are they accurate?

These tables are obtained from Nolan's STABLE program, which has been the gold standard for computing this distribution's functions since its release.

STABLE rounds certain inputs to avoid parameter regions where his method has difficulties. The specifics are complicated (see his README), and certain parts of the algorithm are undocumented, though they can be partially reverse-engineered from the program's debug messages. Regardless, I believe these have very little effect on the overall accuracy of the tables.

More importantly, Nolan's methods have significant inaccuracies in the distribution tails for some parameter values. For instance, alpha=1.0 with nonzero beta often has an (incorrect) discontinuity, resulting in incorrect values beyond p=1% and p=99%.

We plotted this specific issue in a related scipy PR and reproduce the figure inline below.

[Figure: STABLE CDF inaccuracy near alpha=1]

We have also compiled plots of all STABLE CDF errors larger than 1e-4 in the tables that use the most recent (publicly available) version of the program (v3.14.02).

Nolan's published CDF/PDF quantile tables use a much older version of the program and contain other/larger inaccuracies.

There are more inaccuracies further in the tails for other parameter values (notably as alpha gets smaller), but they are less impactful. For such parameter choices, Nolan's integrands become very pathological and are beyond the capabilities of most general quadrature routines.

What are some known limitations of this benchmark?

Our tables lack beta < 0 values, so we implicitly assume that implementations handle negative beta values correctly. This distribution has a certain "symmetry" in beta (see the sketch after this list), so one could easily test these values as well, but it would double the running time.

The behavior very far out (p < 0.00001) into the tails is not tested, but this is probably a minor concern in most applications.

There are known errors in the tables that we use to test accuracy, but I believe this also has little effect (see the FAQ question on the PDF/CDF tables).
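For reference, the beta "symmetry" mentioned above is the reflection property F(x; alpha, beta) = 1 - F(-x; alpha, -beta) (and f(x; alpha, beta) = f(-x; alpha, -beta)) for the standardized distribution. A quick sanity check of the identity with scipy (the tolerance here is arbitrary):

```python
import numpy as np
from scipy.stats import levy_stable

alpha, beta, x = 1.5, 0.5, 2.0
# Reflection property: F(x; alpha, beta) = 1 - F(-x; alpha, -beta)
lhs = levy_stable.cdf(x, alpha, beta)
rhs = 1 - levy_stable.cdf(-x, alpha, -beta)
assert np.isclose(lhs, rhs, atol=1e-6)
```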

Why is the range of tested absolute tolerances different for CDF vs. PDF?

This is an ad-hoc choice. It was assumed that absolute accuracies of 0.01 for the CDF and 0.0001 for the PDF are near the lower end of usefulness.

It's worth noting that simply returning the normal distribution (with e.g. norm.pdf(1, scale=sqrt(2))) yields composite accuracies of ~20% and ~30% at this accuracy level for the PDF and CDF, respectively.
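That is, a trivial baseline along these lines, which ignores the stability parameters altogether, already gets a nontrivial fraction of the tables right at these coarse tolerances:

```python
from math import sqrt
from scipy.stats import norm

def baseline_pdf(x, alpha, beta):
    # Ignore alpha and beta entirely; alpha=2 corresponds to a
    # normal distribution with scale sqrt(2), i.e. N(0, 2).
    return norm.pdf(x, scale=sqrt(2))
```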

Where can I find the libraries tested?

There are six methods tested here: simple_quadrature, simple_monte_carlo, larger_monte_carlo, scipy_piecewise, scipy_dni, and pylevy_miotto. See the code in the algorithms directory.

The literature is very inconsistent/fragmented with respect to parameterizing stable distributions. Are you sure the libraries are actually consistent in their calculations here?

Mark Veillette says this well:

One of the most frustrating issues in dealing with alpha-stable distributions is that its parameterization is not consistent across the literature (there are over half a dozen parameterizations). [...] The most common way to specify a parameterization is to look at the characteristic function of the alpha-stable random variable.

One further annoyance is that the names of the 4 parameters are also inconsistent. [...] The letters alpha and beta are used almost everywhere you look, while the other two parameters are almost always different.

To this end, we've written some tests to prove our transformations are good. At the moment, all the libraries here appear to use either the S0 or S1 parameterization (in Nolan's notation).
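For illustration, S0 and S1 differ only by a location shift that depends on alpha, beta, and the scale. A sketch of the conversion following Nolan's relations (our own illustration, not the repository's test code):

```python
import numpy as np

def delta0_from_delta1(alpha, beta, gamma, delta1):
    """Convert an S1 location parameter to the S0 parameterization.

    The two parameterizations share alpha, beta, and the scale gamma;
    only the location parameter shifts.
    """
    if alpha == 1:
        return delta1 + beta * (2 / np.pi) * gamma * np.log(gamma)
    return delta1 + beta * gamma * np.tan(np.pi * alpha / 2)
```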

simple_quadrature usually seems accurate. When/where is it inaccurate?

Most notably, simple_quadrature can completely fail near x = ? (TODO: this has been observed, but where and why?) and in the tails, where the oscillatory components of the integrand become incredibly unwieldy.

One could probably create a hybrid scheme that uses known asymptotic tail results to improve accuracy here, though it's unclear how successful or performant such an approach would be.

If accuracy is critical, one should really use a custom integrator for computing this distribution's functions as there are some nice properties of the integrands that so far have not been exploited. To date, only general quadrature routines have been used. Indeed, most of the significant issues in Nolan's STABLE program appear to arise due to numerical quadrature failures.
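For a flavor of the general-quadrature approach, here is a simplified sketch for the symmetric (beta = 0) standard case via characteristic function inversion. The actual implementations must handle the general case and are more involved:

```python
import numpy as np
from scipy.integrate import quad

def symmetric_stable_pdf(x, alpha):
    """PDF of a standard symmetric (beta=0) stable distribution.

    Inverts the characteristic function exp(-|t|**alpha):
        f(x) = (1/pi) * integral_0^inf exp(-t**alpha) * cos(x*t) dt
    The cos(x*t) factor oscillates rapidly for large |x|, which is
    exactly where general-purpose quadrature begins to struggle.
    """
    integrand = lambda t: np.exp(-t**alpha) * np.cos(x * t)
    result, _ = quad(integrand, 0, np.inf)
    return result / np.pi
```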

simple_monte_carlo appears far slower in practice than listed here. Why?

simple_monte_carlo is an incredibly inefficient method for computing the CDF in general. It must generate new random samples for each unique pair of (alpha, beta) values.

Here, the test tables include many repeated (alpha, beta) pairs, so the method appears to run more quickly on average than it otherwise might.
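A minimal sketch of the caching effect (using scipy's sampler for illustration; the benchmark's actual implementation may differ):

```python
import numpy as np
from scipy.stats import levy_stable

_sample_cache = {}

def monte_carlo_cdf(x, alpha, beta, n=1_000_000, seed=0):
    """Estimate the CDF as the fraction of random samples <= x.

    Drawing the samples dominates the cost, so they are cached per
    (alpha, beta) pair; repeated calls with the same pair are cheap.
    """
    key = (alpha, beta)
    if key not in _sample_cache:
        rng = np.random.default_rng(seed)
        _sample_cache[key] = levy_stable.rvs(alpha, beta, size=n,
                                             random_state=rng)
    return np.mean(_sample_cache[key] <= x)
```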

These methods vary greatly in their speed. What is a "good" average time per call?

This is highly domain-specific -- some applications might need very quick calculations and can tolerate large inaccuracies, while others might require high accuracy and treat speed as only a secondary consideration.

The tests here were run on a machine with an i7-9700K CPU (8 cores, stock, up to 4.9 GHz) and 16 GB of 3200 MHz DDR4 RAM.

Nolan claims to have

code to quickly approximate stable densities. This routine is much faster than the regular density calculations: approximately 1 million density evaluations/second can be performed on a 1 GHz Pentium.

which appears to suggest that an average time per call of <1 µs (!) is "easily" feasible on modern hardware. However, this is several orders of magnitude faster than any of the methods tested here.

Some of the methods only appear in the PDF or CDF tests. Why?

Some methods only support one or the other. In general, computing the CDF of this distribution is much easier than computing the PDF.

I know of a Python library that is missing from this benchmark. Can you add it?

Yes, assuming it's publicly available. Please raise an issue and I'll try to add it.
