This repository provides a software implementation in Python of the computation of the Non-Parametric estimation of Fisher Information (NPFI), as presented in [1]. The repository contains a simple example of how npfi can be used to compute the Fisher information of the Gaussian distribution, in addition to the scripts recreating Figs. 2-4 in [1].
Written by Omri Har-Shemesh, Computational Science Lab, University of Amsterdam
To compute the Fisher information from samples there are two steps necessary -
computing probability density functions (pdfs) from the samples that are taken
at known parameter values and combining these to compute the Fisher
information. The file npfi.py
provides three two functions, one for each of
these steps and one to integrate the FI along a line. To use these functions
simply place npfi.py
in the same directory as your script and import the
necessary functions. For example:
from npfi import npfi, get_pdfs_from_data
The process of extracting the PDFs and computing the Fisher information is described in the following subsections:
The function get_pdfs_from_data
currently uses either gaussian_kde
from the
scipy
Python package or deft
(if present) to estimate the PDFs for each of
the data provided. It takes as first argument a list
of numpy
arrays, each
array assumed to be a list of samples from the same parameter range and returns
a list of pdfs with the corresponding non-parametric estimate for each of
these. The default estimation method is deft
and the rest of the parameters
it accepts control the estimation process. See the code for a detailed
explanation of each of the parameters. In order for the deft method to work,
the file deft.py
which is available at https://github.com/jbkinney/13_deft
has to be placed in the same directory as npfi.py
and your script.
The function npfi
accepts either three or five pdfs and computes either the
diagonal FI element or the off-diagonal element respectively of the Fisher
information matrix. See the source code for exact implementation details and
documentation of each of the input parameters.
from npfi import npfi, get_pdfs_from_data
# Compute g_ss for the Gaussian distribution
s = 1.0
ds = 0.1
N = 5000
rep = 30
analytic_value = 2.0 / s
FIMs_kde = []
FIMs_deft = []
epsilons_kde = []
epsilons_deft = []
for i in range(rep):
Xa = normal(size=N, scale=s)
Xb = normal(size=N, scale=s-ds)
Xc = normal(size=N, scale=s+ds)
pdfs_deft, bbox_deft = get_pdfs_from_data([Xa, Xb, Xc], method="deft") # DEFT
pdfs_kde, bbox_kde = get_pdfs_from_data([Xa, Xb, Xc], method="gaussian_kde")
FIM_deft, int_err_deft, epsilon_deft = npfi(pdfs_deft, ds, N=N, bounds=bbox_deft, logarithmic=False)
FIM_kde, int_err_kde, epsilon_kde = npfi(pdfs_kde, ds, N=N, bounds=bbox_kde, logarithmic=True)
FIMs_deft.append(FIM_deft)
FIMs_kde.append(FIM_kde)
epsilons_deft.append(FIM_deft)
epsilons_kde.append(FIM_kde)
print("#" * 50)
print("Estimation of the FI after %d repetitions:" % rep)
print("Analytic value: %.2f" % analytic_value)
print("FIM from DEFT: %.3f, epsilon=%.3f" % (np.mean(FIMs_deft), np.mean(epsilon_deft)))
print("FIM from KDE: %.3f, epsilon=%.3f" % (np.mean(FIMs_kde), np.mean(epsilon_kde)))
rel_deft = (np.mean(FIMs_deft) - analytic_value) / analytic_value
rel_deft_95 = (np.percentile(FIMs_deft, 95) - analytic_value) / analytic_value - rel_deft
rel_deft_5 = rel_deft - (np.percentile(FIMs_deft, 5) - analytic_value) / analytic_value
print("Relative error DEFT: %.5f + %.5f - %.5f" % (rel_deft, rel_deft_95, rel_deft_5))
rel_kde = (np.mean(FIMs_kde) - analytic_value) / analytic_value
rel_kde_95 = (np.percentile(FIMs_kde, 95) - analytic_value) / analytic_value - rel_kde
rel_kde_5 = rel_kde - (np.percentile(FIMs_kde, 5) - analytic_value) / analytic_value
print("Relative error KDE: %.5f + %.5f - %.5f" % (rel_kde, rel_kde_95, rel_kde_5))
print("#" * 50)
To reproduce the figures, simply run fig2.py
, fig3.py
, or fig4.py
using
python2
.
npfi.py:
The main file. It provides two functions npfi
and get_pdfs_from_data
It is independent from the rest of the files in the repository and should
be placed in the same directory as your code. If deft.py
is available [2],
it will be able to use DEFT for the density estimation.
fig2.py: Reproduces Fig. 2 in [1] by simulating the data and computing the FI. Note: this does not use the same seed as the original plot in the publication.
fig3.py: Reproduces Fig. 3 in [1] by simulating the data and computing the FI. Note: this does not use the same seed as the original plot in the publication. (On my laptop the runtime with the parameters of the paper took around 3.9 hours).
fig4.py: Reproduces Fig. 4 in [1] by simulating the data and computing the FI. Note: this does not use the same seed as the original plot in the publication.
https://github.com/omrihar/1_npfi
[1] O. Har-Shemesh, R. Quax, B. Miñano, A.G. Hoekstra, P.M.A. Sloot, Non-parametric estimation of Fisher information from real data, (2015) arxiv:1507.00964[stat.CO]