MADlib Release Notes
--------------------
These release notes contain the significant changes in each MADlib release,
with the most recent versions listed at the top.
A complete list of changes for each release can be obtained by viewing the git
commit history located at https://github.com/madlib/madlib/commits/master.
The current list of bugs and issues can be found at http://jira.madlib.net.
--------------------------------------------------------------------------------
MADlib v0.6
Release Date: 2013-Apr-01
New Features / Improvements:
* Generic cross-validation:
- Support for k-fold cross-validation of any supervised learning
algorithm
* Heteroskedasticity of linear regression
- Support for calculating heteroskedasticity via Breusch-Pagan test
* Grouping support for linear regression
- Support for linear regression on each group of data grouped by
one or multiple columns
* Grouping support for logistic regression
- Refactor of logistic regression code
- Support for logistic regression on each group of data grouped by
one or multiple columns
- Added grouping support to the convex optimization framework
* LDA:
- Improved performance and scalability (MADLIB-480)
* Elastic net regularization for both linear and logistic regressions
- Support for FISTA and IGD optimizers
* Summary function
- Support for generating an overview of a data table
* Eigen package upgrade
- MADlib v0.6 now uses Eigen 3.1.2
* Unit testing framework:
- Added a new unit testing framework for the C++ abstraction layer
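As an illustration of the k-fold cross-validation item above, here is a minimal Python sketch of how row indices can be partitioned into k folds. It is not MADlib's SQL interface or implementation; the function name `k_fold_indices` and its parameters are hypothetical and chosen only for this example.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Hypothetical helper for illustration only; not part of MADlib.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must satisfy 2 <= k <= n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at offset i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Training rows are all rows that are not in the validation fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

Each row lands in exactly one validation fold, so a model trained and scored once per fold is evaluated on every row exactly once.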
Bug Fixes:
* C++ abstraction layer:
- Improved handling of NULL values in the input array (MADLIB-773)
* Naive Bayes:
- Improved the handling of NULL values (MADLIB-749)
Known Issues:
* K-means:
- K-means crashes on some datasets in which the points do not all have the
same dimensionality (MADLIB-789)
* Distribution Functions:
- Certain quantile functions abort the session on invalid input (MADLIB-786)
* Multinomial Logistic Regression:
- Signs of coefficient outputs are inconsistent with other tools such as R
and Stata (MADLIB-785)
--------------------------------------------------------------------------------
MADlib v0.5
Release Date: 2012-Nov-15
Bug Fixes:
* K-means:
- Improved handling of invalid arguments (MADLIB-359, 361)
* Sketch-based estimators:
- Addressed security vulnerability (MADLIB-630)
New Features / Improvements:
* Association Rules (Apriori):
- Improved reporting output format for better usability (MADLIB-411)
- Significant improvement in performance (MADLIB-638)
* C++ (Database) Abstraction Layer:
- Extension to support modular transition states (MADLIB-499)
- Extension to support functions returning set of values (MADLIB-638)
* Conditional Random Fields:
- Support for Linear Chain Conditional Random Fields for NLP (MADLIB-628)
* Decision Tree:
- Improved performance for C4.5 and Random Forests (MADLIB-605)
- Improved encoding (MADLIB-590)
* Infrastructure:
- Convex optimization framework
* K-means:
- Code refactoring and improved performance
(MADLIB-454, MADLIB-522, MADLIB-678)
- Silhouette function for k-means (MADLIB-681)
* Low-rank Matrix Factorization:
- New module
* Logistic Regression:
- Support for Multinomial Logistic Regression (MADLIB-575)
* Naive Bayes:
- Significant improvement in performance (MADLIB-611, 619, 626)
* Regression Analysis:
- Support for Cox Proportional Hazards test (MADLIB-576)
* Sampling:
- Added weighted sampling of a single row (MADLIB-584)
* SVD Matrix Factorization:
- Improved performance (MADLIB-578)
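To illustrate the "weighted sampling of a single row" item above, the following Python sketch picks one value in a single pass, which is the access pattern an aggregate function has to work with. This is an assumption-laden illustration, not MADlib's implementation; the name `weighted_sample_one` and the (value, weight) input format are hypothetical.

```python
import random


def weighted_sample_one(rows, rng=None):
    """Pick one value from (value, weight) pairs in a single pass.

    Hypothetical helper for illustration only; not part of MADlib.
    Each value is returned with probability weight / sum(weights).
    """
    if rng is None:
        rng = random.Random()
    total = 0.0
    chosen = None
    for value, weight in rows:
        if weight < 0:
            raise ValueError("weights must be non-negative")
        if weight == 0:
            continue  # zero-weight rows can never be selected
        total += weight
        # Replace the current choice with probability weight / total.
        if rng.random() < weight / total:
            chosen = value
    return chosen
```

Only the running weight total and the current choice are kept as state, so the input never needs to be materialized or scanned twice.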
Documentation:
* Conditional Random Fields:
- Example added for CRF module (MADLIB-731)
* SVD Matrix Factorization:
- Incremental-gradient SVD algorithm (MADLIB-572)
Known issues:
* Multinomial Logistic Regression:
- Number of independent variables cannot exceed 65535 (MADLIB-665)
* Naive Bayes:
- Current implementation of Naive Bayes is only suitable for
categorical attributes (MADLIB-679)
- NULL input values not accepted for attributes (MADLIB-614)
- NULL probabilities given for test set values not seen in
training set (MADLIB-523)
--------------------------------------------------------------------------------
MADlib v0.4.1
Release Date: 2012-Aug-9
Bug Fixes:
* PGXN:
- Fixed installation problem that could occur on some platforms (MADLIB-589)
New Features/Improvements:
* C++ Abstraction Layer:
- Increased ABI compatibility across multiple Greenplum versions
(MADLIB-606)
* Hypothesis Tests:
- Tests that are not implemented as ordered aggregates are now also
installed on PostgreSQL 8.4 and Greenplum 4.0.
--------------------------------------------------------------------------------
MADlib v0.4
Release Date: 2012-Jun-18
Bug Fixes:
* Association Rules:
- assoc_rules() now uses schema-qualified function calls (MADLIB-435)
* Decision Trees:
- Enhanced correctness (MADLIB-409, 502, 503)
- Improved handling of invalid arguments (MADLIB-331)
* k-Means:
- Improved handling of invalid arguments (MADLIB-336, 364, 459)
* PLDA:
- Improved robustness (MADLIB-474)
* Sparse Vectors:
- svec_sfv() now uses locale-aware sorting (MADLIB-457)
- Operators now install to MADlib schema (MADLIB-470)
New Features/Improvements:
* C++ Abstraction Layer:
- Support for "function pointers" (MADLIB-370)
- Support for sparse vectors (MADLIB-371)
- Support for more Eigen (linear algebra) types (MADLIB-533)
* Decision Trees:
- Code refactoring and optimization (MADLIB-410, 476, 504, 509)
- Documentation improvements (MADLIB-507)
- Output table now contains unencoded information (MADLIB-434)
- Enhanced missing-value handling for continuous features (MADLIB-493)
* Hypothesis Tests:
- Pearson chi-square test (MADLIB-390)
- One- and two-sample t-Tests (MADLIB-391)
- F-test (MADLIB-392)
- Mann-Whitney U-test (MADLIB-393)
- Kolmogorov-Smirnov test (MADLIB-394)
- Wilcoxon-Signed-Rank test (MADLIB-405)
- One-way ANOVA (MADLIB-406)
* PostgreSQL Extensibility:
- Support for CREATE EXTENSION in PostgreSQL >= 9.1 (MADLIB-316)
- Availability on PGXN (MADLIB-334)
* Probability Functions:
- Wrap all distribution functions implemented by Boost (MADLIB-412)
- Wrap Kolmogorov distribution function from CERN ROOT project (MADLIB-413)
* Random Forests:
- New module (MADLIB-419)
* Support:
- Add elementary matrix/vector functions (e.g., norm/distances etc.)
(MADLIB-532)
* Viterbi Feature Extraction:
- New module (MADLIB-478)
Known issues:
- svec_sfv() does not support collations, as introduced with PostgreSQL 9.1
(MADLIB-558)
- Invalid arguments are not always guaranteed to be handled gracefully and
may lead to confusing error messages (MADLIB-28, 359, 361, 363)
--------------------------------------------------------------------------------
MADlib v0.3
Release Date: 2012-Feb-9
New features:
* Installer:
- Single installer package targeting all supported DBMSs per OS (MADLIB-218)
* C++ Abstraction Layer:
- Switched from using Armadillo to using Eigen for linear-algebra
operations, thereby eliminating the dependency on LAPACK/BLAS (MADLIB-275)
- Reimplemented as a template library for performance improvements
(MADLIB-295)
* Decision Trees:
- Major update
- Now supports multiple split criteria (information gain, gini, gain ratio)
- Now supports tree pruning using a validation set to address overfitting
- Now supports additional functions for tree output
- Now supports continuous features in addition to categorical features
- Additional support for handling null values
- Improved scalability and performance
* k-Means Clustering:
- Now handles any input that is convertible to SVEC. (MADLIB-42)
- Multiple distance functions (L1-norm, L2-norm, cosine similarity, Tanimoto
similarity) (MADLIB-43)
- Supports multiple seeding methods (kmeans++, random, user-specified list
of centroids)
- Replaced goodness of fit with the (simplified) Silhouette coefficient
(MADLIB-45)
- New run-time parameters (MADLIB-47)
* Linear Regression:
- Major speed improvement
* Logistic Regression:
- Major speed improvement
- Now handles any input that is convertible to BOOLEAN (dependent variable)
or DOUBLE PRECISION[] (independent variables). (MADLIB-283)
- An under-/overflow safe version to evaluate the (usual) logistic function,
for scoring logistic regression (MADLIB-271)
- A third optimizer: Incremental-gradient-descent (MADLIB-303)
* Support:
- For Greenplum <= 4.2.0, added a workaround for INSERT INTO in the same way
as the existing CREATE TABLE AS workaround. This workaround is no longer
needed in Greenplum >= 4.2.1. (MADLIB-265)
- Function version() returns MADlib build information (MADLIB-309)
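The "under-/overflow safe" logistic function mentioned above can be illustrated with a short Python sketch. It shows one common way to evaluate 1 / (1 + exp(-x)) without overflow; it is not taken from the MADlib source, and the function name `logistic` is used here only for the example.

```python
import math


def logistic(x):
    """Evaluate 1 / (1 + exp(-x)) without overflowing for large |x|.

    Illustrative sketch only; not MADlib's implementation.
    """
    if x >= 0:
        # exp(-x) is in (0, 1] and at worst underflows to 0.0.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is in (0, 1), so it cannot overflow either.
    z = math.exp(x)
    return z / (1.0 + z)
```

The naive form would call `math.exp(-x)` with a large positive argument when x is very negative, which raises `OverflowError` in Python; branching on the sign of x avoids ever exponentiating a positive number.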
Bug fixes:
* Sparse vectors:
- Fixed sparse-vector type cast problems (MADLIB-282, MADLIB-305)
- Fixed a situation where using svec_sfv() could cause a segmentation fault
(MADLIB-350)
- Increased compatibility with internal PostgreSQL conventions (MADLIB-257)
* Logistic regression:
- Handle numerical instability more gracefully (MADLIB-343, MADLIB-345)
- Handle unexpected inputs more gracefully (MADLIB-284, MADLIB-344)
- Fixed "Random variate x is nan, but must be finite" issue (MADLIB-356)
Known issues:
- Decision Trees not supported on Greenplum 4.0 (MADLIB-346, MADLIB-347)
- K-means: the error '"nan" does not exist' may be raised when input vectors
contain NaN. (MADLIB-364)
- Association Rules require the madlib schema to be in the search path
(MADLIB-353)
- Invalid arguments are not always guaranteed to be handled gracefully and
may lead to confusing error messages (MADLIB-28, 336, 359, 361, 363, 364)
--------------------------------------------------------------------------------
MADlib v0.2.1beta
Release Date: 2011-Sep-14
General changes:
* numerous improvements to the C++ abstraction layer:
- code clean-up
- fixed issue where incorrect values were returned when used with
debug builds of PostgreSQL/Greenplum (MADLIB-253)
- fixed issue where returning arrays to PostgreSQL/Greenplum could lead
to a crash (MADLIB-250)
- allocated memory is now 16-byte aligned for improved stability and
performance (MADLIB-236)
* advanced compiler warnings are now enabled by default
* all C/C++ code now free of warnings. On gcc <= 4.6, there might still be
warnings due to "unclean" macros in DBMS header files (MADLIB-228)
* prepared for Solaris support in a later release (MADLIB-204)
- added support for Sun Compiler in CMake build script
- fixed all compilation errors with Sun compiler
* added UDF to mimic "CREATE TABLE AS ...", as a workaround for a Greenplum
issue (MADLIB-241). Included this as GP Compatibility module.
* madpack utility:
- dropped madpack dependency on PygreSQL (MADLIB-217)
- improved security in madpack install-check (MADLIB-229)
- fixed bashism in madpack (MADLIB-222)
- fixed install-check not running on non-default schema (MADLIB-251)
Modules/methods:
* SVM (kernel_machines):
- fixed cumulative error count in svm_cls_update() function
- improved memory management in SVM module
* Linear regression (regress):
- fixed unexpected behavior for some edge cases (MADLIB-214)
- fixed crashing with huge number of independent vars (MADLIB-250)
* Logistic regression (regress):
- added support for arbitrary expressions for dep./indep. variables, not
just column names (MADLIB-255)
* Quantile:
- fixed quantile() function to be exact
- added simple version for small data sets
* Sparse Vectors:
- added check for sorted dictionary to svec_sfv (MADLIB-187)
* Decision Tree (decision_tree):
- now can be run multiple times in one session (MADLIB-156)
Known issues:
* non-unified API for several SQL UDFs (MADLIB-208)
* performance of the conjugate-gradient optimizer in logistic regression
can be very poor (MADLIB-164)
--------------------------------------------------------------------------------
MADlib v0.2.0beta
Release Date: 2011-Jul-8
General changes:
* new build and installation framework based on CMake
* new C++ abstraction layer for easy and secure method development
* new database installation utility (madpack)
Modules/methods:
* new: Association Rules (assoc_rules)
* new: Array Operators (array_ops)
* new: Decision Tree (decision_tree)
* new: Conjugate Gradient (conjugate_gradient)
* new: Parallel LDA (plda)
* improved: all methods from previous release
Known issues:
* non-unified API for several SQL UDFs (MADLIB-208)
* running decision tree more than once in one session fails (MADLIB-156)
* performance of the conjugate-gradient optimizer in logistic regression
can be very poor (MADLIB-164)
* svec_sfv function doesn't check for sorted dictionary (MADLIB-187)
--------------------------------------------------------------------------------
MADlib v0.1.0alpha
Release Date: 2011-Jan-31
Initial release.
Included modules/methods:
* Naive-Bayes Classification (bayes)
* k-Means Clustering (kmeans)
* Support Vector Machines (kernel_machines)
* Sketch-based Estimators (sketch)
* Sketch-based Profile (data_profile)
* Quantile (quantile)
* Linear & Logistic Regression (regress)
* SVD Matrix Factorisation (svdmf)
* Sparse Vectors (svec)
--------------------------------------------------------------------------------
MADlib v0.1.0prerelease
Release Date: 2011-Jan-25
Demo release.