levy-stable-benchmarks

Stable distributions (sometimes called Lévy alpha-stable distributions) are important for modelling data across several disciplines including signal processing, physics, and finance. Despite this, many Python libraries provide buggy and/or inaccurate implementations for computing their PDFs and CDFs.

This repository attempts to define a benchmark to test the accuracy of such implementations. We also provide some alternative calculation methods.

[Figure: accuracy comparison of the tested methods]

Among other things, this repository has helped to identify and fix inaccuracies in existing implementations (see, e.g., the scipy issues discussed in the FAQ below).


CDF accuracy percentages

CDF table (-100 <= x <= 100)

Absolute Tolerance

| Method             | 1E-2   | 1E-3   | 1E-4  | 1E-5  |
|--------------------|--------|--------|-------|-------|
| simple_quadrature  | 99.9%  | 99.9%  | 99.6% | 99.4% |
| simple_monte_carlo | 100.0% | 98.9%  | 69.4% | 21.3% |
| larger_monte_carlo | 100.0% | 100.0% | 99.6% | 76.8% |
| scipy_piecewise    | 100.0% | 99.7%  | 98.1% | 97.3% |
| pylevy_miotto      | 83.5%  | 78.0%  | 68.9% | 58.1% |

Relative Tolerance

| Method             | 1E-1  | 1E-2  | 1E-3  | 1E-4  |
|--------------------|-------|-------|-------|-------|
| simple_quadrature  | 98.9% | 98.9% | 98.9% | 98.6% |
| simple_monte_carlo | 91.8% | 80.5% | 55.7% | 34.0% |
| larger_monte_carlo | 99.6% | 94.6% | 80.1% | 58.6% |
| scipy_piecewise    | 96.7% | 96.7% | 96.7% | 96.3% |
| pylevy_miotto      | 81.8% | 66.7% | 54.0% | 42.3% |

CDF quantile table (0.001 <= p <= 0.999)

Absolute Tolerance

| Method             | 1E-2   | 1E-3   | 1E-4  | 1E-5  |
|--------------------|--------|--------|-------|-------|
| simple_quadrature  | 99.0%  | 99.0%  | 99.0% | 99.0% |
| simple_monte_carlo | 100.0% | 95.1%  | 15.4% | 1.6%  |
| larger_monte_carlo | 100.0% | 100.0% | 99.9% | 36.3% |
| scipy_piecewise    | 100.0% | 99.9%  | 99.8% | 99.7% |
| pylevy_miotto      | 85.9%  | 80.7%  | 78.8% | 77.5% |

Relative Tolerance

| Method             | 1E-1   | 1E-2  | 1E-3  | 1E-4  |
|--------------------|--------|-------|-------|-------|
| simple_quadrature  | 99.0%  | 99.0% | 99.0% | 99.0% |
| simple_monte_carlo | 100.0% | 99.5% | 49.9% | 9.1%  |
| larger_monte_carlo | 100.0% | 99.9% | 99.6% | 81.2% |
| scipy_piecewise    | 99.9%  | 99.9% | 99.8% | 99.8% |
| pylevy_miotto      | 92.7%  | 83.0% | 79.8% | 78.1% |

Nolan CDF quantile table (0.00001 <= p <= 0.99999)

Absolute Tolerance

| Method             | 1E-2  | 1E-3  | 1E-4  | 1E-5  |
|--------------------|-------|-------|-------|-------|
| simple_quadrature  | 95.7% | 95.4% | 95.2% | 95.1% |
| simple_monte_carlo | 99.4% | 96.5% | 42.1% | 11.6% |
| larger_monte_carlo | 99.4% | 99.1% | 98.9% | 55.0% |
| scipy_piecewise    | 99.4% | 99.0% | 98.1% | 95.0% |
| pylevy_miotto      | 91.5% | 87.9% | 83.4% | 77.7% |

Relative Tolerance

| Method             | 1E-1  | 1E-2  | 1E-3  | 1E-4  |
|--------------------|-------|-------|-------|-------|
| simple_quadrature  | 95.7% | 95.4% | 95.3% | 95.1% |
| simple_monte_carlo | 94.3% | 87.5% | 50.9% | 21.6% |
| larger_monte_carlo | 99.4% | 96.6% | 87.2% | 68.7% |
| scipy_piecewise    | 97.6% | 97.2% | 97.0% | 96.6% |
| pylevy_miotto      | 93.7% | 86.0% | 79.5% | 71.5% |

PDF accuracy percentages

PDF table (-100 <= x <= 100)

Absolute Tolerance

| Method            | 1E-4  | 1E-5  | 1E-6  | 1E-7  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 99.9% | 99.9% | 99.9% | 99.9% |
| scipy_piecewise   | 99.9% | 99.8% | 99.8% | 99.8% |
| scipy_dni         | 87.5% | 87.5% | 87.5% | 87.5% |
| pylevy_miotto     | 90.8% | 79.8% | 67.7% | 37.5% |

Relative Tolerance

| Method            | 1E-1  | 1E-2  | 1E-3  | 1E-4  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 97.5% | 97.5% | 97.4% | 97.4% |
| scipy_piecewise   | 99.3% | 99.3% | 99.3% | 99.3% |
| scipy_dni         | 84.0% | 84.0% | 84.0% | 84.0% |
| pylevy_miotto     | 79.5% | 48.8% | 24.4% | 11.5% |

PDF quantile table (0.001 <= p <= 0.999)

Absolute Tolerance

| Method            | 1E-4  | 1E-5  | 1E-6  | 1E-7  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 99.8% | 99.8% | 99.8% | 99.7% |
| scipy_piecewise   | 99.9% | 99.8% | 99.8% | 99.8% |
| scipy_dni         | 67.6% | 67.5% | 67.5% | 67.5% |
| pylevy_miotto     | 85.4% | 82.5% | 77.5% | 55.9% |

Relative Tolerance

| Method            | 1E-1  | 1E-2  | 1E-3  | 1E-4  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 98.6% | 98.5% | 98.4% | 98.4% |
| scipy_piecewise   | 99.9% | 99.9% | 99.9% | 99.8% |
| scipy_dni         | 67.0% | 66.7% | 66.6% | 66.5% |
| pylevy_miotto     | 85.2% | 79.1% | 77.7% | 75.5% |

Nolan PDF quantile table (0.00001 <= p <= 0.99999)

Absolute Tolerance

| Method            | 1E-4  | 1E-5  | 1E-6  | 1E-7  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 99.9% | 99.9% | 99.9% | 99.8% |
| scipy_piecewise   | 99.9% | 99.8% | 99.7% | 99.6% |
| scipy_dni         | 64.5% | 64.5% | 64.5% | 64.5% |
| pylevy_miotto     | 91.9% | 87.0% | 77.9% | 54.4% |

Relative Tolerance

| Method            | 1E-1  | 1E-2  | 1E-3  | 1E-4  |
|-------------------|-------|-------|-------|-------|
| simple_quadrature | 92.5% | 91.6% | 91.6% | 91.4% |
| scipy_piecewise   | 99.1% | 99.1% | 99.0% | 99.0% |
| scipy_dni         | 58.5% | 57.8% | 57.6% | 57.1% |
| pylevy_miotto     | 88.3% | 79.8% | 67.5% | 55.3% |

Average runtimes

The average per-call (single-threaded) runtimes on the benchmark tables are as follows.

| Method             | CDF average runtime | PDF average runtime |
|--------------------|---------------------|---------------------|
| simple_quadrature  | 2.18 ms             | 1.77 ms             |
| simple_monte_carlo | 0.055 ms*           | -                   |
| scipy_piecewise    | 0.70 ms             | 1.15 ms             |
| scipy_dni          | -                   | 4.89 ms             |
| pylevy_miotto      | 1.23 ms             | 1.22 ms             |

* This method takes ~100-150 ms for single calls, but is very fast on repeated (alpha, beta) values. See the FAQ question on simple_monte_carlo below for more details.

FAQ: notes and limitations

How are "accuracy percentage" and "composite accuracy" defined?

"Accuracy percentage" is the percentage of tabulated values computed to the desired tolerance.

When listed above, the accuracy percentages are truncated (not rounded), so a method will only have 100.0% accuracy if it is within the specified tolerance on all the test cases.

"Composite accuracy" is the average accuracy percentage on all the PDF or CDF tables. There are three tables each for PDF and CDF, so ~33% of the weighting goes to each.

Where did these PDF/CDF tables come from? Are they accurate?

These tables are obtained from Nolan's STABLE program, which has been the gold standard for computing this distribution's functions since its release.

STABLE rounds certain inputs to avoid parameter regions where his method has difficulties. The specifics are complicated (see his README), and certain parts of the algorithm are undocumented, though they can be partially reverse-engineered from the program's debug messages. Regardless, I believe these have very little effect on the overall accuracy of the tables.

More importantly, Nolan's methods have significant inaccuracies in the distribution tails for some parameter values. For instance, alpha=1.0 with nonzero beta often has an (incorrect) discontinuity, resulting in incorrect values beyond p=1% and p=99%.

We plotted this specific issue in a related scipy PR and reproduce the figure inline below.

[Figure: STABLE CDF inaccuracy near alpha=1]

We have also compiled plots of all STABLE CDF errors larger than 1e-4 in the tables that use the most recent (publicly available) version of the program (v3.14.02).

Nolan's published CDF/PDF quantile tables use a much older version of the program and contain other/larger inaccuracies.

There are more inaccuracies further in the tails for other parameter values (notably as alpha gets smaller), but they are less impactful. For such parameter choices, Nolan's integrands become very pathological and are beyond the capabilities of most general quadrature routines.

What are some known limitations of this benchmark?

Our tables lack beta < 0 values, so we implicitly assume that implementations handle negative beta values correctly. This distribution has a certain "symmetry" in beta (see the sketch after this list), so one could easily test these values as well, but it would double the running time.

The behavior very far out (p < 0.00001) into the tails is not tested, but this is probably a minor concern in most applications.

There are known errors in the tables that we use to test accuracy, but I believe this also has little effect (see the FAQ question on the PDF/CDF tables).
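For reference, the beta "symmetry" mentioned above is the reflection property F(x; alpha, beta) = 1 - F(-x; alpha, -beta) (and f(x; alpha, beta) = f(-x; alpha, -beta)) for the standardized distribution. A quick sanity check of the identity with scipy (the tolerance here is arbitrary):

```python
import numpy as np
from scipy.stats import levy_stable

alpha, beta, x = 1.5, 0.5, 2.0
# Reflection property: F(x; alpha, beta) = 1 - F(-x; alpha, -beta)
lhs = levy_stable.cdf(x, alpha, beta)
rhs = 1 - levy_stable.cdf(-x, alpha, -beta)
assert np.isclose(lhs, rhs, atol=1e-6)
```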

Why is the range of tested absolute tolerances different for CDF vs. PDF?

This is an ad-hoc choice. It was assumed that absolute accuracies of 0.01 for the CDF and 0.0001 for the PDF are near the lower end of usefulness.

It's worth noting that simply returning the normal distribution (with e.g. norm.pdf(1, scale=sqrt(2))) yields composite accuracies of ~20% and ~30% at this accuracy level for the PDF and CDF, respectively.
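That is, a trivial baseline along these lines, which ignores the stability parameters altogether, already gets a nontrivial fraction of the tables right at these coarse tolerances:

```python
from math import sqrt
from scipy.stats import norm

def baseline_pdf(x, alpha, beta):
    # Ignore alpha and beta entirely; alpha=2 corresponds to a
    # normal distribution with scale sqrt(2), i.e. N(0, 2).
    return norm.pdf(x, scale=sqrt(2))
```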

Where can I find the libraries tested?

There are six methods tested here: simple_quadrature, simple_monte_carlo, larger_monte_carlo, scipy_piecewise, scipy_dni, and pylevy_miotto. See the code in the algorithms directory.

The literature is very inconsistent/fragmented with respect to parameterizing stable distributions. Are you sure the libraries are actually consistent in their calculations here?

Mark Veillette says this well:

One of the most frustrating issues in dealing with alpha-stable distributions is that its parameterization is not consistent across the literature (there are over half a dozen parameterizations). [...] The most common way to specify a parameterization is to look at the characteristic function of the alpha-stable random variable.

One further annoyance is that the names of the 4 parameters are also inconsistent. [...] The letters alpha and beta are used almost everywhere you look, while the other two parameters are almost always different.

To this end, we've written some tests to prove our transformations are good. At the moment, all the libraries here appear to use either the S0 or S1 parameterization (in Nolan's notation).
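For illustration, S0 and S1 differ only by a location shift that depends on alpha, beta, and the scale. A sketch of the conversion following Nolan's relations (our own illustration, not the repository's test code):

```python
import numpy as np

def delta0_from_delta1(alpha, beta, gamma, delta1):
    """Convert an S1 location parameter to the S0 parameterization.

    The two parameterizations share alpha, beta, and the scale gamma;
    only the location parameter shifts.
    """
    if alpha == 1:
        return delta1 + beta * (2 / np.pi) * gamma * np.log(gamma)
    return delta1 + beta * gamma * np.tan(np.pi * alpha / 2)
```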

simple_quadrature usually seems accurate. When/where is it inaccurate?

Most notably, simple_quadrature can completely fail near x = ? (TODO: this has been observed, but where and why?) and in the tails, where the oscillatory components of the integrand become incredibly unwieldy.

One could probably create a hybrid scheme that uses known asymptotic tail results to improve accuracy here, though it's unclear how successful or performant such an approach would be.

If accuracy is critical, one should really use a custom integrator for computing this distribution's functions as there are some nice properties of the integrands that so far have not been exploited. To date, only general quadrature routines have been used. Indeed, most of the significant issues in Nolan's STABLE program appear to arise due to numerical quadrature failures.
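For a flavor of the general-quadrature approach, here is a simplified sketch for the symmetric (beta = 0) standard case via characteristic function inversion. The actual implementations must handle the general case and are more involved:

```python
import numpy as np
from scipy.integrate import quad

def symmetric_stable_pdf(x, alpha):
    """PDF of a standard symmetric (beta=0) stable distribution.

    Inverts the characteristic function exp(-|t|**alpha):
        f(x) = (1/pi) * integral_0^inf exp(-t**alpha) * cos(x*t) dt
    The cos(x*t) factor oscillates rapidly for large |x|, which is
    exactly where general-purpose quadrature begins to struggle.
    """
    integrand = lambda t: np.exp(-t**alpha) * np.cos(x * t)
    result, _ = quad(integrand, 0, np.inf)
    return result / np.pi
```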

simple_monte_carlo appears far slower in practice than listed here. Why?

simple_monte_carlo is an incredibly inefficient method for computing the CDF in general. It must generate new random samples for each unique pair of (alpha, beta) values.

Here, the test tables include many repeated (alpha, beta) pairs, so the method appears to run more quickly on average than it otherwise might.
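A minimal sketch of the caching effect (using scipy's sampler for illustration; the benchmark's actual implementation may differ):

```python
import numpy as np
from scipy.stats import levy_stable

_sample_cache = {}

def monte_carlo_cdf(x, alpha, beta, n=1_000_000, seed=0):
    """Estimate the CDF as the fraction of random samples <= x.

    Drawing the samples dominates the cost, so they are cached per
    (alpha, beta) pair; repeated calls with the same pair are cheap.
    """
    key = (alpha, beta)
    if key not in _sample_cache:
        rng = np.random.default_rng(seed)
        _sample_cache[key] = levy_stable.rvs(alpha, beta, size=n,
                                             random_state=rng)
    return np.mean(_sample_cache[key] <= x)
```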

These methods vary greatly in their speed. What is a "good" average time per call?

This is highly domain-specific -- some applications might need very quick calculations and can tolerate large inaccuracies, while others might require high accuracy and treat speed as only a secondary consideration.

The tests here were run on a machine with an i7-9700K CPU (8 cores, stock, up to 4.9 GHz) and 16 GB of 3200 MHz DDR4 RAM.

Nolan claims to have

code to quickly approximate stable densities. This routine is much faster than the regular density calculations: approximately 1 million density evaluations/second can be performed on a 1 GHz Pentium.

which appears to suggest that an average time per call of <1 µs (!) is "easily" feasible on modern hardware. However, this is several orders of magnitude faster than any of the methods tested here.

Some of the methods only appear in the PDF or CDF tests. Why?

Some methods only support one or the other. In general, computing the CDF of this distribution is much easier than computing the PDF.

I know of a Python library that is missing from this benchmark. Can you add it?

Yes, assuming it's publicly available. Please raise an issue and I'll try to add it.
