Nonparametric plotting and analysis tool for estimating a one-dimensional data sample
jennyfarmer/PDFAnalyze
================================================================
PDFAnalyze, Version 1.0, September 2022
Jenny Farmer     jfarmer@carolina.rr.com
Donald Jacobs    djacobs1@uncc.edu
University of North Carolina at Charlotte
================================================================
GENERAL INFORMATION
================================================================
The PDFAnalyze package includes the following high-level MATLAB
functions:

1. PDFe.m: Computes a nonparametric probability density estimate
   for a multivariate data sample of 1 to 5 variables.
   (Calls the EstimatePDFmv.mex function, which can also be called
   directly.)

2. PDFAnalyze.m: Computes a probability density estimate for a
   one-dimensional data sample and produces optional plots for
   analysis.
   (Calls the EstimatePDF.mex function, which can also be called
   directly.)

Please cite at least one of these publications if you use this code
for your research:

Farmer, J. and D. Jacobs, High throughput nonparametric probability
density estimation. PLoS ONE, 2018. 13(5): p. e0196937.

Farmer, Jenny, and Donald J. Jacobs. “MATLAB Tool for Probability
Density Assessment and Nonparametric Estimation.” SoftwareX, vol. 18,
Elsevier BV, June 2022, p. 101017, doi:10.1016/j.softx.2022.101017.

================================================================
INSTALLATION FOR MATLAB (2018 or greater)
================================================================
Installation Steps

1. Before installing PDFAnalyze, install the MinGW C/C++ compiler for
   Windows as a MATLAB Add-On. To install, select
   [Add-Ons/Get Add-Ons] from the HOME menu within MATLAB and search
   for 'MinGW'. Select and install MinGW-w64.
2. Copy all source files into a single folder.
3. Run the CompilePDF.m script in MATLAB to create a MATLAB
   Executable (mex).
4. Run the CompilePDFmv.m script in MATLAB to create a MATLAB
   Executable (mex).
5. (optional) Verify the installation by running the example.m script
   in MATLAB.

The PDFAnalyze package consists of the following files:

    PDFAnalyze.m
    PlotBeta.m
    EstimatePDF.m
    FigureSettings.m
    GetTargets.m
    example.m
    CompilePDF.m
    CompilePDFmv.m
    EstimatePDF.cpp; EstimatePDF.h
    EstimatePDFmv.cpp; EstimatePDFmv.h
    JointProbability.cpp; JointProbability.h
    Variable.cpp; Variable.h
    callPDF.cpp; callPDF.h
    ChebyShev.cpp; ChebyShev.h
    InputData.cpp; InputData.h
    InputParameters.cpp; InputParameters.h
    MinimizeScore.cpp; MinimizeScore.h
    Score.cpp; Score.h
    ScoreQZ.cpp; ScoreQZ.h
    WriteResults.cpp; WriteResults.h
    OutputControl.cpp; OutputControl.h
    README.txt

================================================================
PDFe USAGE
================================================================
    [pdfPoints, pdfEst] = PDFe(r);

Input Parameters
    r            random data sample, one column of data for each
                 variable

Output Parameters
    pdfPoints    evaluation points for pdfEst; one row for each
                 variable
    pdfEst       the joint probability density function; a matrix of
                 nVariable dimensions:
                 [nGrids x nGrids x ... x nGrids]

================================================================
PDFAnalyze USAGE
================================================================
[F, XI] = PDFAnalyze(X) computes the density estimate of the data in
sample X. F contains the density estimate at points XI. The number of
points and their relative spacing are determined automatically from
the features of the data sample.

[F, XI, CDF, SQR] = PDFAnalyze(...) also returns the cumulative
density and the scaled quantile residual for each sample data point.

PDFAnalyze(...) with no output arguments produces a plot of the
density estimate.

[...] = PDFAnalyze(..., 'param1', val1, 'param2', val2, ...)
specifies parameter name/value pairs to control the density
estimation.
Valid parameters are as follows:

    Parameter          Value
    'PlotType'         Produces any combination of three plot types:
                         'pdf'       probability density function
                         'sqr'       scaled quantile residual
                                     (see NOTES)
                         'combined'  pdf and sqr plotted on one figure
                       To produce multiple plot types, specify
                       multiple name/value pairs.
    'EstimationType'   The default estimation method is PDFEstimate.
                       Additional KDE methods are available:
                         'kde1'      built-in MATLAB function
                                     ksdensity
                         'kde2'      Zdravko Botev (2020). Kernel
                                     Density Estimator
                                     (https://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-est
                                     MATLAB Central File Exchange.
                                     Retrieved March 17, 2020.
    'distribution'     A two column matrix, [F, XI], representing a
                       distribution to plot on the same figure as the
                       estimate, for use with the 'pdf' plot type.
                       Useful for comparison to a known distribution.

================================================================
EXAMPLES
================================================================
Example 1: Plot the estimate of a random sample from a Normal
distribution along with the true Normal distribution:

    data = randn(1000, 1);
    x = min(data):0.1:max(data);
    f = normpdf(x);
    d = [x(:), f(:)];
    PDFAnalyze(data, 'distribution', d);

Example 2: Plot the scaled quantile residual (SQR) for an estimate of
the Normal distribution, showing confidence thresholds and
uncertainty estimates:

    PDFAnalyze(randn(10000, 1), 'PlotType', 'sqr');

================================================================
EstimatePDFmv USAGE
================================================================
EstimatePDFmv is invoked from within PDFe.m but can be called
directly to customize the resolution.

Usage
    [jp, x] = EstimatePDFmv(r, nSamples, nVariables, nGrids);

Input Parameters (all required)
    r             random data sample, one column of data for each
                  variable
    nSamples      the number of rows in r, representing the number of
                  samples per variable
    nVariables    the number of columns in r, representing the number
                  of variables
    nGrids        the resolution, per variable, of the desired output

Output Parameters
    jp            the joint probability density function; an array of
                  size (nGrids)^(nVariables)
    x             evaluation points for jp; an array of size
                  (nVariables) * (nGrids)

================================================================
EstimatePDF USAGE
================================================================
EstimatePDF is invoked from within PDFAnalyze.m and can be customized
through a collection of advanced input and output options.

Usage
    [failed, y, pdf, cdf, sqr, lagrange, score, confidence, SURD] =
        EstimatePDF(data, parameters)

    data          (required) a single vector of random sample data
    parameters    (optional) a MATLAB structure of the options listed
                  below

Optional Input Parameters
    Name                            Default Value
    parameters.SURDtarget           [40]
    parameters.SURDmin              [5]
    parameters.SURDmax              [100]
    parameters.LagrangeMin          [1]
    parameters.LagrangeMax          [200]
    parameters.lowBound             [calculated]
    parameters.highBound            [calculated]
    parameters.integrationPoints    [calculated]
    parameters.debug                [false]
    parameters.partition            [1025]
    parameters.scoreType            ['QZ']
    parameters.outlierCutoff        [7]
    parameters.adaptiveDx           [true]

Output Parameters
    failed        non-zero if a solution was not found
    y             range of values in the PDF (independent variable)
    pdf           Probability Density Function (PDF)
    cdf           Cumulative Density Function
    sqr           Scaled Quantile Residual
    lagrange      Lagrange coefficients
    score         value returned by the selected score type
    confidence    SURD threshold achieved
    SURD          Sample Uniform Random Data

================================================================
NOTES
================================================================
The following section includes a few brief notes on the more advanced
input and output options and how they may affect the performance of
the estimation. For a greater understanding of the methodology used,
please see the publications referenced in the GENERAL INFORMATION
section.

1. SURD Threshold Targets
   Sample Uniform Random Data (SURD) loosely correlates with the
   strength of the solution, with higher thresholds indicating more
   probable solutions for the PDF.

2. Scaled Quantile Residual
   The equation for the Scaled Quantile Residual (SQR) is given by
       SQR = sqrt(N+2)*(u - uniform-u)
   where N is the number of data samples. SQR plots are very useful
   as a diagnostic measure because they are sample-size invariant and
   have universal characteristics independent of the true PDF. The
   SQR plot type plots the SQR for each data sample by position,
   highlighting in red those that fall outside of the expected 98%
   threshold.

3. Lagrange Coefficients
   Each Lagrange multiplier returned as output is an expansion
   coefficient in the series of orthogonal functions within an
   exponential. The more complex the shape of the distribution, the
   more Lagrange multipliers are required to accurately define the
   PDF.

4. The accuracy of numerical integration can be controlled.
   Increasing the number of integration points will improve the
   resolution of the PDF, but could increase runtime. Decreasing the
   number of integration points is not recommended, as it may produce
   poor solutions.

5. Failed solutions
   Two safety measures prevent the program from running for an
   unreasonably long time without finding a solution. The program
   stops
     i)  if progress stalls and the score is not improving
         significantly after many attempts, or
     ii) if the maximum number of Lagrange multipliers has been
         reached.
   Reaching the maximum number of Lagrange multipliers indicates that
   the solution has likely not yet converged. The user can increase
   the maximum. However, the default maximum of 200 is set to prevent
   cases that may never converge.

6. The parametric maximum entropy method can be used with this
   program.
   If a user desires an exact number of Lagrange multipliers, the
   LagrangeMin and LagrangeMax parameter options can be set to equal
   values. For example, if the user knows the distribution is a
   Gaussian, then the user could set both the minimum and maximum
   number of Lagrange multipliers to 3. In this case, the output will
   be equivalent to a parametric maximum entropy method, where the
   number of Lagrange multipliers is known in advance.

7. Verbose outputs for debugging
   For more details on the progress of the program and explanations
   of possible warnings and outcomes, set the debug parameter option
   to true.
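
The SQR formula in note 2 can be illustrated with a minimal sketch that
uses base MATLAB only and does not call any PDFAnalyze function. It
assumes, beyond what this README states, that "uniform-u" means the
expected position k/(N+1) of the k-th sorted value in uniform data, and
it uses the known standard Normal CDF in place of an estimated one. The
variable names (uniformU, sqrValues) are illustrative and are not part
of the package.

```matlab
% Sketch of the SQR formula from note 2, using base MATLAB only.
% Assumption (not stated in the README): "uniform-u" is taken to be the
% expected position k/(N+1) of the k-th sorted value of uniform data.
rng(0);                                   % fixed seed for repeatability
N = 1000;
data = randn(N, 1);                       % sample from a standard Normal

% Map the sample through the known CDF and sort. If the CDF matches the
% data, these values behave like sorted uniform random data on [0, 1].
u = sort(0.5 * erfc(-data ./ sqrt(2)));   % standard Normal CDF via erfc

k = (1:N)';
uniformU = k ./ (N + 1);                  % assumed expected positions
sqrValues = sqrt(N + 2) .* (u - uniformU);  % scaled quantile residual

fprintf('largest |SQR| in this sample: %.3f\n', max(abs(sqrValues)));
```

Plotting sqrValues against uniformU gives a picture comparable in
spirit to the 'sqr' plot type, without the threshold lines. The SQR
output of PDFAnalyze could be inspected in the same way, with the
estimated CDF taking the place of the known one.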