MADlib Release Notes
--------------------

These release notes contain the significant changes in each MADlib release,
with most recent versions listed at the top.

A complete list of changes for each release can be obtained by viewing the git
commit history located at https://github.com/madlib/madlib/commits/master.

Current list of bugs and issues can be found at http://jira.madlib.net.

--------------------------------------------------------------------------------
MADlib v0.5

Release Date: 2012-Nov-15

Bug Fixes:
* K-means:
    - Improved handling of invalid arguments (MADLIB-359, 361)
* Sketch-based estimators:
    - Addressed security vulnerability (MADLIB-630)

New Features / Improvements:
* Association Rules (Apriori):
    - Improved reporting output format for better usability (MADLIB-411)
    - Significant improvement in performance (MADLIB-638)
* C++ (Database) Abstraction Layer:
    - Extension to support modular transition states (MADLIB-499)
    - Extension to support functions returning set of values (MADLIB-638)
* Conditional Random Fields:
    - Support for Linear Chain Conditional Random Fields for NLP (MADLIB-628)
* Decision Tree:
    - Improved performance for C4.5 and Random Forests (MADLIB-605)
    - Improved encoding (MADLIB-590)
* Infrastructure:
    - Convex optimization framework
* K-means:
    - Code refactoring and improved performance
      (MADLIB-454, MADLIB-522, MADLIB-678)
    - Silhouette function for k-means (MADLIB-681)
* Low-rank Matrix Factorization:
    - New module
* Logistic Regression:
    - Support for Multinomial Logistic Regression (MADLIB-575)
* Naive Bayes:
    - Significant improvement in performance (MADLIB-611, 619, 626)
* Regression Analysis:
    - Support for Cox Proportional Hazards test (MADLIB-576)
* Sampling:
    - Added weighted sampling of a single row (MADLIB-584)
* SVD Matrix Factorization:
    - Improved performance (MADLIB-578)
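
For the weighted sampling of a single row listed above (MADLIB-584), the idea
can be sketched in Python as a single pass over the input. This is an
illustration only and not MADlib's implementation; the function name
weighted_sample_one and the (value, weight) row layout are hypothetical.

```python
import random
from typing import Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def weighted_sample_one(
    rows: Iterable[Tuple[T, float]],
    rng: Optional[random.Random] = None,
) -> Optional[T]:
    """Pick one value with probability proportional to its weight.

    Scans the rows once and keeps a single candidate, so the input
    never has to be held in memory. Returns None if no row has a
    positive weight.
    """
    rng = rng or random.Random()
    total = 0.0
    chosen: Optional[T] = None
    for value, weight in rows:
        if weight <= 0:
            continue  # rows without positive weight can never be picked
        total += weight
        # Replace the current candidate with probability weight / total.
        # For the first positive-weight row this ratio is 1, so a
        # candidate is always set.
        if rng.random() < weight / total:
            chosen = value
    return chosen
```

A call such as weighted_sample_one([("x", 1.0), ("y", 3.0)]) would be expected
to return "y" about three times as often as "x".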

Documentation:
* Conditional Random Fields:
    - Example added for CRF module (MADLIB-731)
* SVD Matrix Factorization:
    - Incremental-gradient SVD algorithm (MADLIB-572)

Known issues:
* Multinomial Logistic Regression:
    - Number of independent variables cannot exceed 65535 (MADLIB-665)
* Naive Bayes:
    - Current implementation of Naive Bayes is only suitable for
      categorical attributes (MADLIB-679)
    - NULL input values not accepted for attributes (MADLIB-614)
    - NULL probabilities given for test set values not seen in
      training set (MADLIB-523)

--------------------------------------------------------------------------------
MADlib v0.4.1

Release Date: 2012-Aug-9

Bug Fixes:
* PGXN:
    - Fixed installation problem that could occur on some platforms (MADLIB-589)

New Features/Improvements:
* C++ Abstraction Layer:
    - Increased ABI compatibility across multiple Greenplum versions
      (MADLIB-606)
* Hypothesis Tests:
    - Tests that are not implemented as ordered aggregates are now also
      installed on PostgreSQL 8.4 and Greenplum 4.0.

--------------------------------------------------------------------------------
MADlib v0.4

Release Date: 2012-Jun-18

Bug Fixes:
* Association Rules:
    - assoc_rules() now uses schema-qualified function calls (MADLIB-435)
* Decision Trees:
    - Enhanced correctness (MADLIB-409, 502, 503)
    - Improved handling of invalid arguments (MADLIB-331)
* k-Means:
    - Improved handling of invalid arguments (MADLIB-336, 364, 459)
* PLDA:
    - Improved robustness (MADLIB-474)
* Sparse Vectors:
    - svec_sfv() now uses locale-aware sorting (MADLIB-457)
    - Operators now install to MADlib schema (MADLIB-470)

New Features/Improvements:
* C++ Abstraction Layer:
    - Support for "function pointers" (MADLIB-370)
    - Support for sparse vectors (MADLIB-371)
    - Support for more Eigen (linear algebra) types (MADLIB-533)
* Decision Trees:
    - Code refactoring and optimization (MADLIB-410, 476, 504, 509)
    - Documentation improvements (MADLIB-507)
    - Output table now contains unencoded information (MADLIB-434)
    - Enhanced missing-value handling for continuous features (MADLIB-493)
* Hypothesis Tests:
    - Pearson chi-square test (MADLIB-390)
    - One- and two-sample t-Tests (MADLIB-391)
    - F-test (MADLIB-392)
    - Mann-Whitney U-test (MADLIB-393)
    - Kolmogorov-Smirnov test (MADLIB-394)
    - Wilcoxon-Signed-Rank test (MADLIB-405)
    - One-way ANOVA (MADLIB-406)
* PostgreSQL Extensibility:
    - Support for CREATE EXTENSION in PostgreSQL >= 9.1 (MADLIB-316)
    - Availability on PGXN (MADLIB-334)
* Probability Functions:
    - Wrap all distribution functions implemented by Boost (MADLIB-412)
    - Wrap Kolmogorov distribution function from CERN ROOT project (MADLIB-413)
* Random Forests:
    - New module (MADLIB-419)
* Support:
    - Add elementary matrix/vector functions (e.g., norm/distances etc.)
      (MADLIB-532)
* Viterbi Feature Extraction:
    - New module (MADLIB-478)

Known issues:
    - svec_sfv() does not support collations, as introduced with PostgreSQL 9.1
      (MADLIB-558)
    - Invalid arguments are not always guaranteed to be handled gracefully and
      may lead to confusing error messages (MADLIB-28, 359, 361, 363)

--------------------------------------------------------------------------------
MADlib v0.3

Release Date: 2012-Feb-9

New features:
* Installer:
    - Single installer package targeting all supported DBMSs per OS (MADLIB-218)
* C++ Abstraction Layer:
    - Switched from using Armadillo to using Eigen for linear-algebra
      operations, thereby eliminating the dependency on LAPACK/BLAS (MADLIB-275)
    - Reimplemented as a template library for performance improvements
      (MADLIB-295)
* Decision Trees:
    - Major update
    - Now supports multiple split criteria (information gain, gini, gain ratio)
    - Now supports tree pruning using a validation set to address overfitting
    - Now supports additional functions for tree output
    - Now supports continuous features in addition to categorical features
    - Additional support for handling null values
    - Improved scalability and performance
* k-Means Clustering:
    - Now handles any input that is convertible to SVEC. (MADLIB-42)
    - Multiple distance functions (L1-norm, L2-norm, cosine similarity, Tanimoto
      similarity) (MADLIB-43)
    - Supports multiple seeding methods (kmeans++, random, user-specified list
      of centroids)
    - Replaced goodness of fit with the (simplified) Silhouette coefficient
      (MADLIB-45)
    - New run-time parameters (MADLIB-47)
* Linear Regression:
    - Major speed improvement
* Logistic Regression:
    - Major speed improvement
    - Now handles any input that is convertible to BOOLEAN (dependent variable)
      or DOUBLE PRECISION[] (independent variables). (MADLIB-283)
    - An under-/overflow safe version to evaluate the (usual) logistic function,
      for scoring logistic regression (MADLIB-271)
    - A third optimizer: Incremental-gradient-descent (MADLIB-303)
* Support:
    - For Greenplum <= 4.2.0, added a workaround for INSERT INTO in the same way
      as the existing CREATE TABLE AS workaround. This workaround is not needed
      in Greenplum >= 4.2.1 any more. (MADLIB-265)
    - Function version() returns MADlib build information (MADLIB-309)
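
The under-/overflow safe evaluation of the logistic function mentioned above
(MADLIB-271) can be illustrated with a short Python sketch. It shows one common
way to avoid overflow, namely branching on the sign of the argument; it is not
taken from the MADlib sources, and the function name logistic is chosen here
for illustration.

```python
import math


def logistic(x: float) -> float:
    """Evaluate 1 / (1 + exp(-x)) without overflowing math.exp."""
    if x >= 0:
        # exp(-x) lies in (0, 1] here, so it cannot overflow; for very
        # large x it underflows to 0.0 and the result is 1.0.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form exp(x) / (1 + exp(x)),
    # where exp(x) lies in (0, 1) and again cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)
```

A naive 1 / (1 + math.exp(-x)) raises OverflowError in Python for strongly
negative x (for example x = -1000), whereas the version above returns 0.0.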

Bug fixes:
* Sparse vectors:
    - Fixed sparse-vector type cast problems (MADLIB-282, MADLIB-305)
    - Fixed a situation where using svec_sfv() could cause a segmentation fault
      (MADLIB-350)
    - Increased compatibility with internal PostgreSQL conventions (MADLIB-257)
* Logistic regression:
    - Handle numerical instability more gracefully (MADLIB-343, MADLIB-345)
    - Handle unexpected inputs more gracefully (MADLIB-284, MADLIB-344)
    - Fixed "Random variate x is nan, but must be finite" issue (MADLIB-356)

Known issues:
    - Decision Trees not supported on Greenplum 4.0 (MADLIB-346, MADLIB-347)
    - K-means: the error '"nan" does not exist' may be raised when input vectors
      contain NaN. (MADLIB-364)
    - Association Rules require the madlib schema to be in the search path
      (MADLIB-353)
    - Invalid arguments are not always guaranteed to be handled gracefully and
      may lead to confusing error messages (MADLIB-28, 336, 359, 361, 363, 364)

--------------------------------------------------------------------------------
MADlib v0.2.1beta

Release Date: 2011-Sep-14

General changes:
* numerous improvements to the C++ abstraction layer:
    - code clean-up
    - fixed issue where incorrect values were returned when used with
      debug builds of PostgreSQL/Greenplum (MADLIB-253)
    - fixed issue where returning arrays to PostgreSQL/Greenplum could lead
      to a crash (MADLIB-250)
    - allocated memory is now 16-byte aligned for improved stability and
      performance (MADLIB-236)
* advanced compiler warnings are now enabled by default
* all C/C++ code now free of warnings. On gcc <= 4.6, there might still be
  warnings due to "unclean" macros in DBMS header files (MADLIB-228)
* prepared for Solaris support in a later release (MADLIB-204)
    - added support for Sun Compiler in CMake build script
    - fixed all compilation errors with Sun compiler
* added UDF to mimic "CREATE TABLE AS ...", as a workaround for a Greenplum
  issue (MADLIB-241). Included this as GP Compatibility module.
* madpack utility:
    - dropped madpack dependency on PygreSQL (MADLIB-217)
    - improved security in madpack install-check (MADLIB-229)
    - fixed bashism in madpack (MADLIB-222)
    - fixed install-check not running on non-default schema (MADLIB-251)

Modules/methods:
* SVM (kernel_machines):
    - fixed cumulative error count in svm_cls_update() function
    - improved memory management in SVM module
* Linear regression (regress):
    - fixed unexpected behavior for some edge cases (MADLIB-214)
    - fixed crashing with huge number of independent vars (MADLIB-250)
* Logistic regression (regress):
    - added support for arbitrary expressions for dep./indep. variables, not
      just column names (MADLIB-255)
* Quantile:
    - fixed quantile() function to be exact
    - added simple version for small data sets
* Sparse Vectors:
    - added check for sorted dictionary to svec_sfv (MADLIB-187)
* Decision Tree (decision_tree):
    - now can be run multiple times in one session (MADLIB-156)

Known issues:
* non-unified API for several SQL UDFs (MADLIB-208)
* performance of the conjugate-gradient optimizer in logistic regression
  can be very poor (MADLIB-164)

--------------------------------------------------------------------------------
MADlib v0.2.0beta

Release Date: 2011-Jul-8

General changes:
* new build and installation framework based on CMake
* new C++ abstraction layer for easy and secure method development
* new database installation utility (madpack)

Modules/methods:
* new: Association Rules (assoc_rules)
* new: Array Operators (array_ops)
* new: Decision Tree (decision_tree)
* new: Conjugate Gradient (conjugate_gradient)
* new: Parallel LDA (plda)
* improved: all methods from previous release

Known issues:
* non-unified API for several SQL UDFs (MADLIB-208)
* running decision tree more than once in one session fails (MADLIB-156)
* performance of the conjugate-gradient optimizer in logistic regression
  can be very poor (MADLIB-164)
* svec_sfv function doesn't check for sorted dictionary (MADLIB-187)

--------------------------------------------------------------------------------
MADlib v0.1.0alpha

Release Date: 2011-Jan-31

Initial release.

Included modules/methods:
* Naive-Bayes Classification (bayes)
* k-Means Clustering (kmeans)
* Support Vector Machines (kernel_machines)
* Sketch-based Estimators (sketch)
* Sketch-based Profile (data_profile)
* Quantile (quantile)
* Linear & Logistic Regression (regress)
* SVD Matrix Factorization (svdmf)
* Sparse Vectors (svec)

--------------------------------------------------------------------------------
MADlib v0.1.0prerelease

Release Date: 2011-Jan-25

Demo release.