MADlib Release Notes
These release notes contain the significant changes in each MADlib release,
with most recent versions listed at the top.
A complete list of changes for each release can be obtained by viewing the git
commit history located at
Current list of bugs and issues can be found at
MADlib v0.4.1
Release Date: 2012-Aug-9
Bug Fixes:
- Fixed installation problem that could occur on some platforms (MADLIB-589)
New Features/Improvements:
* C++ Abstraction Layer:
- Increased ABI compatibility across multiple Greenplum versions
* Hypothesis Tests:
- Tests that are not implemented as ordered aggregates are now also
installed on PostgreSQL 8.4 and Greenplum 4.0.
MADlib v0.4
Release Date: 2012-Jun-18
Bug Fixes:
* Association Rules:
- assoc_rules() now uses schema-qualified function calls (MADLIB-435)
* Decision Trees:
- Enhanced correctness (MADLIB-409, 502, 503)
- Improved handling of invalid arguments (MADLIB-331)
* k-Means:
- Improved handling of invalid arguments (MADLIB-336, 364, 459)
- Improved robustness (MADLIB-474)
* Sparse Vectors:
- svec_sfv() now uses locale-aware sorting (MADLIB-457)
- Operators now install to MADlib schema (MADLIB-470)
New Features/Improvements:
* C++ Abstraction Layer:
- Support for "function pointers" (MADLIB-370)
- Support for sparse vectors (MADLIB-371)
- Support for more Eigen (linear algebra) types (MADLIB-533)
* Decision Trees:
- Code refactoring and optimization (MADLIB-410, 476, 504, 509)
- Documentation improvements (MADLIB-507)
- Output table now contains unencoded information (MADLIB-434)
- Enhanced missing-value handling for continuous features (MADLIB-493)
* Hypothesis Tests:
- Pearson chi-square test (MADLIB-390)
- One- and two-sample t-Tests (MADLIB-391)
- F-test (MADLIB-392)
- Mann-Whitney U-test (MADLIB-393)
- Kolmogorov-Smirnov test (MADLIB-394)
- Wilcoxon-Signed-Rank test (MADLIB-405)
- One-way ANOVA (MADLIB-406)
* PostgreSQL Extensibility:
- Support for CREATE EXTENSION in PostgreSQL >= 9.1 (MADLIB-316)
- Availability on PGXN (MADLIB-334)
* Probability Functions:
- Wrap all distribution functions implemented by Boost (MADLIB-412)
- Wrap Kolmogorov distribution function from CERN ROOT project (MADLIB-413)
* Random Forests:
- New module (MADLIB-419)
* Support:
- Added elementary matrix/vector functions (e.g., norms and distances)
* Viterbi Feature Extraction:
- New module (MADLIB-478)
Known issues:
- svec_sfv() does not support collations, as introduced with PostgreSQL 9.1
- Invalid arguments are not always guaranteed to be handled gracefully and
may lead to confusing error messages (MADLIB-28, 359, 361, 363)
MADlib v0.3
Release Date: 2012-Feb-9
New features:
* Installer:
- Single installer package targeting all supported DBMSs per OS (MADLIB-218)
* C++ Abstraction Layer:
- Switched from using Armadillo to using Eigen for linear-algebra
operations, thereby eliminating the dependency on LAPACK/BLAS (MADLIB-275)
- Reimplemented as a template library for performance improvements
* Decision Trees:
- Major update
- Now supports multiple split criteria (information gain, gini, gain ratio)
- Now supports tree pruning using a validation set to address overfitting
- Now supports additional functions for tree output
- Now supports continuous features in addition to categorical features
- Additional support for handling null values
- Improved scalability and performance
* k-Means Clustering:
- Now handles any input that is convertible to SVEC. (MADLIB-42)
- Multiple distance functions (L1-norm, L2-norm, cosine similarity, Tanimoto
similarity) (MADLIB-43)
- Supports multiple seeding methods (kmeans++, random, user-specified list
of centroids)
- Replaced goodness of fit with the (simplified) Silhouette coefficient
- New run-time parameters (MADLIB-47)
* Linear Regression:
- Major speed improvement
* Logistic Regression:
- Major speed improvement
- Now handles any input that is convertible to BOOLEAN (dependent variable)
or DOUBLE PRECISION[] (independent variables). (MADLIB-283)
- An under-/overflow safe version to evaluate the (usual) logistic function,
for scoring logistic regression (MADLIB-271)
- A third optimizer: Incremental-gradient-descent (MADLIB-303)
* Support:
- For Greenplum <= 4.2.0, added a workaround for INSERT INTO in the same way
as the existing CREATE TABLE AS workaround. This workaround is no longer
needed in Greenplum >= 4.2.1. (MADLIB-265)
- Function version() returns MADlib build information (MADLIB-309)
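As an illustration of the under-/overflow-safe logistic function listed under
Logistic Regression above (MADLIB-271), here is a minimal Python sketch. It is
not MADlib's implementation; the function name logistic() is hypothetical, and
the approach shown (branching on the sign of the argument so that exp() is
only ever called on a non-positive value) is one common way to get this
behavior.

```python
import math


def logistic(x: float) -> float:
    """Evaluate 1 / (1 + exp(-x)) without overflowing exp().

    Hypothetical helper for illustration; not taken from MADlib.
    """
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form exp(x) / (1 + exp(x)),
    # where exp(x) is at most 1 and may harmlessly underflow to 0.0.
    z = math.exp(x)
    return z / (1.0 + z)
```

A naive 1 / (1 + math.exp(-x)) raises OverflowError in Python for large
negative x, whereas the sketch above returns a value in [0, 1] for any finite
input.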
Bug fixes:
* Sparse vectors:
- Fixed sparse-vector type cast problems (MADLIB-282, MADLIB-305)
- Fixed a situation where using svec_sfv() could cause a segmentation fault
- Increased compatibility with internal PostgreSQL conventions (MADLIB-257)
* Logistic regression:
- Handle numerical instability more gracefully (MADLIB-343, MADLIB-345)
- Handle unexpected inputs more gracefully (MADLIB-284, MADLIB-344)
- Fixed "Random variate x is nan, but must be finite" issue (MADLIB-356)
Known issues:
- Decision Trees not supported on Greenplum 4.0 (MADLIB-346, MADLIB-347)
- k-Means: the error '"nan" does not exist' may be raised when input vectors
contain NaN. (MADLIB-364)
- Association Rules require the madlib schema to be in the search path
- Invalid arguments are not always guaranteed to be handled gracefully and
may lead to confusing error messages (MADLIB-28, 336, 359, 361, 363, 364)
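For readers unfamiliar with the four k-Means distance functions named in the
v0.3 notes above (L1-norm, L2-norm, cosine similarity, Tanimoto similarity),
the following Python sketch gives their textbook definitions on dense vectors.
The function names are hypothetical and this is not MADlib code; MADlib
operates on sparse vectors inside the database. The two similarity functions
assume non-zero input vectors.

```python
import math
from typing import Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l1_norm_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_norm_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (assumes non-zero vectors)."""
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))


def tanimoto_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """a.b / (|a|^2 + |b|^2 - a.b) (assumes non-zero vectors)."""
    ab = _dot(a, b)
    return ab / (_dot(a, a) + _dot(b, b) - ab)
```

The similarities grow as vectors become more alike, so a clustering routine
would typically turn them into a distance (for example, 1 minus the
similarity) before comparing points to centroids; whether MADlib does exactly
this is not stated in these notes.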
MADlib v0.2.1beta
Release Date: 2011-Sep-14
General changes:
* numerous improvements to the C++ abstraction layer:
- code clean-up
- fixed issue where incorrect values were returned when used with
debug builds of PostgreSQL/Greenplum (MADLIB-253)
- fixed issue where returning arrays to PostgreSQL/Greenplum could lead
to a crash (MADLIB-250)
- allocated memory is now 16-byte aligned for improved stability and
performance (MADLIB-236)
* advanced compiler warnings are now enabled by default
* all C/C++ code now free of warnings. On gcc <= 4.6, there might still be
warnings due to "unclean" macros in DBMS header files (MADLIB-228)
* prepared for Solaris support in a later release (MADLIB-204)
- added support for Sun Compiler in CMake build script
- fixed all compilation errors with Sun compiler
* added UDF to mimic "CREATE TABLE AS ...", as a workaround for a Greenplum
issue (MADLIB-241). Included as the GP Compatibility module.
* madpack utility:
- dropped madpack dependency on PygreSQL (MADLIB-217)
- improved security in madpack install-check (MADLIB-229)
- fixed bashism in madpack (MADLIB-222)
- fixed install-check not running on non-default schema (MADLIB-251)
* SVM (kernel_machines):
- fixed cumulative error count in svm_cls_update() function
- improved memory management in SVM module
* Linear regression (regress):
- fixed unexpected behavior for some edge cases (MADLIB-214)
- fixed crash with a huge number of independent variables (MADLIB-250)
* Logistic regression (regress):
- added support for arbitrary expressions for dep./indep. variables, not
just column names (MADLIB-255)
* Quantile:
- fixed quantile() function to be exact
- added simple version for small data sets
* Sparse Vectors:
- added check for sorted dictionary to svec_sfv (MADLIB-187)
* Decision Tree (decision_tree):
- now can be run multiple times in one session (MADLIB-156)
Known issues:
* non-unified API for several SQL UDFs (MADLIB-208)
* performance of the conjugate-gradient optimizer in logistic regression
can be very poor (MADLIB-164)
MADlib v0.2.0beta
Release Date: 2011-Jul-8
General changes:
* new build and installation framework based on CMake
* new C++ abstraction layer for easy and secure method development
* new database installation utility (madpack)
* new: Association Rules (assoc_rules)
* new: Array Operators (array_ops)
* new: Decision Tree (decision_tree)
* new: Conjugate Gradient (conjugate_gradient)
* new: Parallel LDA (plda)
* improved: all methods from previous release
Known issues:
* non-unified API for several SQL UDFs (MADLIB-208)
* running decision tree more than once in one session fails (MADLIB-156)
* performance of the conjugate-gradient optimizer in logistic regression
can be very poor (MADLIB-164)
* svec_sfv function doesn't check for sorted dictionary (MADLIB-187)
MADlib v0.1.0alpha
Release Date: 2011-Jan-31
Initial release.
Included modules/methods:
* Naive-Bayes Classification (bayes)
* k-Means Clustering (kmeans)
* Support Vector Machines (kernel_machines)
* Sketch-based Estimators (sketch)
* Sketch-based Profile (data_profile)
* Quantile (quantile)
* Linear & Logistic Regression (regress)
* SVD Matrix Factorisation (svdmf)
* Sparse Vectors (svec)
MADlib v0.1.0prerelease
Release Date: 2011-Jan-25
Demo release.