Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
Fetching contributors…

Cannot retrieve contributors at this time

234 lines (203 sloc) 9.414 kB
MADlib Release Notes
--------------------
These release notes contain the significant changes in each MADlib release,
with most recent versions listed at the top.
A complete list of changes for each release can be obtained by viewing the git
commit history located at https://github.com/madlib/madlib/commits/master.
Current list of bugs and issues can be found at http://jira.madlib.net.
--------------------------------------------------------------------------------
MADlib v0.4
Release Date: 2012-Jun-18
Bug Fixes:
* Association Rules:
- assoc_rules() now uses schema-qualified function calls (MADLIB-435)
* Decision Trees:
- Enhanced correctness (MADLIB-409, 502, 503)
- Improved handling of invalid arguments (MADLIB-331)
* k-Means:
- Improved handling of invalid arguments (MADLIB-336, 364, 459)
* PLDA:
- Improved robustness (MADLIB-474)
* Sparse Vectors:
- svec_sfv() now uses locale-aware sorting (MADLIB-457)
- Operators now install to MADlib schema (MADLIB-470)
New Features/Improvements:
* C++ Abstraction Layer:
- Support for "function pointers" (MADLIB-370)
- Support for sparse vectors (MADLIB-371)
- Support for more Eigen (linear algebra) types (MADLIB-533)
* Decision Trees:
- Code refactoring and optimization (MADLIB-410, 476, 504, 509)
- Documentation improvments (MADLIB-507)
- Output table now contains unencoded information (MADLIB-434)
- Enhance the missing value handling for continuous features (MADLIB-493)
* Hypothesis Tests:
- Pearson chi-square test (MADLIB-390)
- One- and two-sample t-Tests (MADLIB-391)
- F-test (MADLIB-392)
- Mann-Whitney U-test (MADLIB-393)
- Kolmogorov-Smirnov test (MADLIB-394)
- Wilcoxon-Signed-Rank test (MADLIB-405)
- One-way ANOVA (MADLIB-406)
* PostgreSQL Extensibility:
- Support for CREATE EXTENSION in PostgreSQL >= 9.1 (MADLIB-316)
- Availability on PGXN (MADLIB-334)
* Probability Functions:
- Wrap all distribution functions implemented by Boost (MADLIB-412)
- Wrap Kolmogorov distribution function from CERN ROOT project (MADLIB-413)
* Random Forests:
- New module (MADLIB-419)
* Support:
- Add elementary matrix/vector functions (e.g., norm/distances etc.)
(MADLIB-532)
* Viterbi Feature Extraction:
- New module (MADLIB-478)
Known issues:
- svec_sfv() does not support collations, as introduced with PostgreSQL 9.1
(MADLIB-558)
- Invalid arguments are not always guaranteed to be handled gracefully and
may lead to confusing error messages (MADLIB-28, 359, 361, 363)
--------------------------------------------------------------------------------
MADlib v0.3
Release Date: 2012-Feb-9
New features:
* Installer:
- Single installer package targeting all supported DBMSs per OS (MADLIB-218)
* C++ Abstraction Layer:
- Switched from using Armadillo to using Eigen for linear-algebra
operations, thereby eliminating the dependency on LAPACK/BLAS (MADLIB-275)
- Reimplemented as a template library for performance improvements
(MADLIB-295)
* Decision Trees:
- Major update
- Now supports multiple split criteria (information gain, gini, gain ratio)
- Now supports tree pruning using a validation set to address over fitting
- Now supports additional functions for tree output
- Now supports continuous features in addition to categorical features
- Additional support for handling null values
- Improved scalability and performance
* k-Means Clustering:
- Now handles any input that is convertible to SVEC. (MADLIB-42)
- Multiple distance functions (L1-norm, L2-norm, cosine similarity, Tanimoto
similarity) (MADLIB-43)
- Supports multiple seedings methods (kmeans++, random, user-specified list
of centroids)
- Replaced goodness of fit with the (simplified) Silhouette coefficient
(MADLIB-45)
- New run-time parameters (MADLIB-47)
* Linear Regression:
- Major speed improvement
* Logistic Regression:
- Major speed improvement
- Now handles any input that is convertible to BOOLEAN (dependent variable)
or DOUBLE PRECISION[] (independent variables). (MADLIB-283)
- An under-/overflow safe version to evaluate the (usual) logistic function,
for scoring logistic regression (MADLIB-271)
- A third optimizer: Incremental-gradient-descent (MADLIB-303)
* Support:
- For Greenplum <= 4.2.0, added a workaround for INSERT INTO in the same way
as the existing CREATE TABLE AS workaround. This workaround is not needed
in Greenplum >= 4.2.1 any more. (MADLIB-265)
- Function version() returns Madlib build information (MADLIB-309)
Bug fixes:
* Sparse vectors:
- Fixed sparse-vector type case problems (MADLIB-282, MADLIB-305)
- Fixed a situation where using svec_svf() could cause a segmentation fault
(MADLIB-350)
- Increased compatibility with internal PostgreSQL conventions (MADLIB-257)
* Logistic regression:
- Handle numerical instability more gracefully (MADLIB-343, MADLIB-345)
- Handle unexpected inputs more gracefully (MADLIB-284, MADLIB-344)
- Fixed "Random variate x is nan, but must be finite" issue (MADLIB-356)
Known issues:
- Decision Trees not supported on Greenplum 4.0 (MADLIB-346, MADLIB-347)
- K-means: the error '"nan" does not exist' may be raised when input vectors
contain NaN. (MADLIB-364)
- Association Rules require the madlib schema to be in the search path
(MADLIB-353)
- Invalid arguments are not always guaranteed to be handled gracefully and
may lead to confusing error messages (MADLIB-28, 336, 359, 361, 363, 364)
--------------------------------------------------------------------------------
MADlib v0.2.1beta
Release Date: 2011-Sep-14
General changes:
* numerous improvements to the C++ abstraction layer:
- code clean-up
- fixed issue where incorrect values were returned when used with
debug builds of PostgreSQL/Greenplum (MADLIB-253)
- fixed issue where returning arrays to PostgreSQL/Greenplum could lead
to a crash (MADLIB-250)
- allocated memory is now 16-byte aligned for improved stability and
performance (MADLIB-236)
* compiling with advanced warnings enabled by default now
* all C/C++ code now free of warnings. On gcc <= 4.6, there might still be
warnings due to "unclean" macros in DBMS header files (MADLIB-228)
* prepared Solaris support in a later release (MADLIB-204)
- added support for Sun Compiler in CMake build script
- fixed all compilation errors with Sun compiler
* added UDF to mimic "CREATE TABLE AS ...", as a workaround for a Greenplum
issue (MADLIB-241). Included this as GP Compatibility module.
* madpack utility:
- dropped madpack dependency on PygreSQL (MADLIB-217)
- improved security in madpack install-check (MADLIB-229)
- fixed bashism in madpack (MADLIB-222)
- fixed install-check not running on non-default schema (MADLIB-251)
Modules/methods:
* SVM (kernel_machines):
- fixed cumulative error count in svm_cls_update() function
- improved memory management in SVM module
* Linear regression (regress):
- fixed unexpected behavior for some edge cases (MADLIB-214)
- fixed crashing with huge number of independent vars (MADLIB-250)
* Logistic regression (regress):
- added support for arbitrary expressions for dep./indep. variables, not
just column names (MADLIB-255)
* Quantile:
- fixed quantile() function to be exact
- added simple version for small data sets
* Sparse Vectors:
- added check for sorted dictionary to svec_sfv (MADLIB-187)
* Decision Tree (decision_tree):
- now can be run multiple times in one session (MADLIB-156)
Known issues:
* non-unified API for several SQL UDFs (MADLIB-208)
* performance of the conjugate-gradient optimizer in logistic regression
can be very poor (MADLIB-164)
--------------------------------------------------------------------------------
MADlib v0.2.0beta
Release Date: 2011-Jul-8
General changes:
* new build and installation framework based on CMake
* new C++ abstraction layer for easy and secure method development
* new database installation utility (madpack)
Modules/methods:
* new: Association Rules (assoc_rules)
* new: Array Operators (array_ops)
* new: Decision Tree (decision_tree)
* new: Conjugate Gradient (conjugate_gradient)
* new: Parallel LDA (plda)
* improved: all methods from previous release
Known issues:
* non-unified API for several SQL UDFs (MADLIB-208)
* running decision tree more than once in one session fails (MADLIB-156)
* performance of the conjugate-gradient optimizer in logistic regression
can be very poor (MADLIB-164)
* svec_sfv function doesn't check for sorted dictionary (MADLIB-187)
--------------------------------------------------------------------------------
MADlib v0.1.0alpha
Release Date: 2011-Jan-31
Initial release.
Included modules/methods:
* Naive-Bayes Classification (bayes)
* k-Means Clustering (kmeans)
* Support Vector Machines (kernel_machines)
* Sketch-based Estimators (sketch)
* Sketch-based Profile (data_profile)
* Quantile (quantile)
* Linear & Logistic Regression (regress)
* SVD Matrix Factorisation (svdmf)
* Sparse Vectors (svec)
--------------------------------------------------------------------------------
MADlib v0.1.0prerelease
Release date: 2011-Jan-25
Demo release.
Jump to Line
Something went wrong with that request. Please try again.