1 MADlib Release Notes
2 --------------------
3
4 These release notes contain the significant changes in each MADlib release,
5 with most recent versions listed at the top.
6
7 A complete list of changes for each release can be obtained by viewing the git
8 commit history located at https://github.com/madlib/madlib/commits/master.
9
10 The current list of bugs and issues can be found at http://jira.madlib.net.
11 --------------------------------------------------------------------------------
12 MADlib v1.7
13
14 Release Date: 2014-December-31
15
16 New features:
17 * Generalized Linear Model:
18 - Added a new generic module for GLM functions that allow for response
19 variables that have arbitrary distributions (rather than simply
20 Gaussian distributions), and for an arbitrary function of the response
21 variable (the link function) to vary linearly with the predicted values
22 (rather than assuming that the response itself must vary linearly).
23 - Available distribution families: gaussian (link functions: identity,
24 inverse and log), binomial (link functions: probit and logit),
25 poisson (link functions: log, identity and square-root), gamma (link
26 functions: inverse, identity and log) and inverse gaussian (link functions:
27 square-inverse, inverse, identity and log).
28 - Deprecated 'mlogregr_train' in favor of 'multinom' available as part of
29 the new GLM functionality.
30 - Added a new 'ordinal' function for ordered logit and probit regression.
31 * Decision Tree: Reimplemented the decision tree module, which includes the following
32 changes:
33 - Improved usability due to a new interface.
34 - Performance enhancements: up to 40 times faster than the old interface.
35 - Additional features like pruning methods, surrogate variables for
36 NULL handling, cross validation, and various new tree tuning parameters.
37 - Addition of a new display function to visualize the trained tree and new
38 prediction function for scoring of new datasets.
39 * Random Forest: Reimplemented the random forest module, which includes the following
40 changes:
41 - New random forest module based on the new decision tree module.
42 - Better variable importance metrics and ability to explore each tree
43 in the forest independently.
44 - Ability to get class probabilities of all classes and not just the max
45 class during prediction.
46 - Improved visualization with export capabilities using Graphviz dot format.
47 * PMML:
48 - Upgraded compatible PMML version to 4.1.
49 - Moved PMML export out of early stage development with new functionality
50 available to export GLM, decision tree, and random forest models.
51 * Updated Eigen from 3.1.2 to 3.2.2.
52 * Updated PyXB from 1.2.3 to 1.2.4.
53 * Added finer granularity control for running specific install-check tests.
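
To illustrate the link functions named in the Generalized Linear Model item
above, here is a minimal Python sketch of a few links and their inverses. It is
not MADlib code and does not use the MADlib interface; the names LINKS and
predict_mean are hypothetical and exist only for this illustration.

```python
import math

# Each entry maps a link name to (link, inverse_link).
# The link g maps the mean mu to the linear predictor: eta = g(mu).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "sqrt": (lambda mu: math.sqrt(mu), lambda eta: eta * eta),
}


def predict_mean(link_name, coefficients, features):
    """Apply the inverse link to the linear predictor to get the mean."""
    _, inverse_link = LINKS[link_name]
    eta = sum(c * x for c, x in zip(coefficients, features))
    return inverse_link(eta)
```

With a logit link and all-zero coefficients the linear predictor is 0, so the
predicted mean is 0.5 regardless of the feature values.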
54
55 Bug fixes:
56 - Fixed bug in K-means allowing use of user-defined metric functions
57 (MADLIB-874, MADLIB-875).
58 - Fixed issues related to header files included in the build system
59 (MADLIB-855, MADLIB-879, MADLIB-884).
60
61 Known issues:
62 - Performance for decision tree with cross-validation is poor on a HAWQ
63 multi-node system.
64
65 --------------------------------------------------------------------------------
66 MADlib v1.6
67
68 Release Date: 2014-June-30
69
70 New features:
71 - Added a new unified 'margins' function that computes marginal effects for
72 linear, logistic, multilogistic, and cox proportional hazards regression. The
73 new function also introduces support for interaction terms in the independent
74 array.
75 - Updated convergence for 'elastic_net_train' by checking the change in the
76 loglikelihood instead of the l2-norm of the change in coefficients. This allows
77 for faster convergence in problems with multiple optimal solutions.
78 The default threshold for convergence has been reduced from 1e-4 to 1e-6.
79 - Added a new helper function to convert categorical variables to indicator
80 variables which can be used directly in regression methods. The function
81 currently only supports dummy encoding.
82 - Improved performance for cox proportional hazards: average improvement of
83 20 fold on GPDB and 2.5 fold on HAWQ.
84 - Improved performance on ARIMA by 30%.
85 - Added new functionality to export linear and logistic regression models as a
86 PMML object. The new module relies on PyXB to create PMML elements.
87 - Added a function ('array_scalar_add') to 'add' a scalar to an array.
88 - Added 'numeric' type support for all functions that take 'anyarray' as
89 argument.
90 - Made usability and aesthetic enhancements to documentation.
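
The change in the 'elastic_net_train' convergence test described above can be
sketched as follows. This is an illustrative comparison, not MADlib's
implementation: the function names are hypothetical, and the notes do not say
whether the change is measured in absolute or relative terms (the sketch
assumes an absolute change). The default tolerances mirror the 1e-4 and 1e-6
values quoted above.

```python
import math


def converged_by_coefficients(old_coef, new_coef, tolerance=1e-4):
    """Earlier test: l2-norm of the change in coefficients."""
    change = math.sqrt(sum((n - o) ** 2 for o, n in zip(old_coef, new_coef)))
    return change < tolerance


def converged_by_loglikelihood(old_loglik, new_loglik, tolerance=1e-6):
    """Newer test: absolute change in the log-likelihood (an assumption)."""
    return abs(new_loglik - old_loglik) < tolerance


# With two perfectly collinear features, many coefficient vectors give the
# same fit, so the coefficients can keep moving while the log-likelihood
# stays flat. The values below are made up for illustration.
old_coef, new_coef = [1.0, 0.0], [0.5, 0.5]
old_loglik, new_loglik = -12.3456789, -12.3456789
```

In this made-up case the coefficient-based test reports no convergence while
the log-likelihood test does, which is the situation the release note refers
to as a problem with multiple optimal solutions.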
91
92 Bug Fixes:
93 - Prepended python module name to sys.path before executing madlib function
94 to avoid conflicts with user-defined modules.
95 - Added a check in K-Means to ensure the dimensionality of all data points is
96 the same and also equal to the dimensionality of any provided initial centroids
97 (MADLIB-713, MADLIB-789).
98 - Added a check in multinomial regression to quit early and cleanly if model
99 size is greater than the maximum permissible memory (MADLIB-667).
100 - Fixed a minor bug with incorrect column names in the decision trees module
101 (MADLIB-763).
102 - Fixed a bug in Kmeans that resulted in incorrect number of centroids for
103 particular datasets (MADLIB-857).
104 - Fixed a bug that occurred when a grouping column has the same name as one of
105 the output table column names (MADLIB-833).
106
107 Deprecated Functions:
108 - Modules profile and quantile have been deprecated in favor of the 'summary'
109 function.
110 - Module 'svd_mf' has been deprecated in favor of the improved 'svd' function.
111 - Functions 'margins_logregr' and 'margins_mlogregr' have been deprecated in
112 favor of the 'margins' function.
113
114 --------------------------------------------------------------------------------
115 MADlib v1.5
116
117 Release Date: 2014-Mar-05
118
119 New features:
120 - Added a new port 'HAWQ'. MADlib can now be used with the Pivotal
121 Distribution of Hadoop (PHD) through HAWQ
122 (see http://www.gopivotal.com/big-data/pivotal-hd for more details).
123 - Implemented performance improvements for linear and logistic predict functions.
124 - Moved Conditional Random Fields (CRFs) out of early stage development, and
125 updated the design and APIs to enable ease of use and better functionality.
126 API changes include lincrf replaced by lincrf_train, crf_train_fgen and
127 crf_test_fgen with updated arguments, and format of segment tables.
128 - Improved linear support vector machines (SVMs) by enabling iterations, and
129 removed lsvm_predict and svm_predict, which are not useful in GPDB and HAWQ.
130 - Added new functions, with improved performance compared to svec_sfv, for
131 document vectorization into sparse vectors.
132 - Removed the bool-to-text cast and updated all functions depending on it to
133 explicitly convert variable to text.
134 - Added function properties for all SQL functions to allow the database optimizer
135 to make better plans.
136
137 Bug Fixes:
138 - Set client_min_messages to 'notice' during database installation to ensure
139 that log messages don't get logged to STDERR.
140 - Fixed elastic net prediction to predict using all features instead of just
141 the selected features to avoid an error when no feature is selected as relevant
142 in the trained model.
143 - For corner probability values, p=0 and p=1, in bernoulli and binomial
144 distributions, the quantile values should be 0 and num_of_trials (=1 in the case
145 of bernoulli) respectively, independent of the probability of success.
146 - Changed install script to explicitly use /bin/bash instead of /bin/sh to avoid
147 problems in Ubuntu where /bin/sh is linked to 'dash'.
148 - Fixed issue in Elastic Net to take any array expression as input instead of
149 specifically expecting the expression 'ARRAY[...]'.
150 - Fixed wrong output in percentile of count-min (CM) sketches.
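
The corner-case rule for bernoulli and binomial quantiles stated above can be
sketched in Python as follows. This is an illustration of the rule only, not
MADlib's code; the function names and argument order are hypothetical.

```python
import math


def binomial_quantile(p, num_of_trials, prob_success):
    """Smallest k with P(X <= k) >= p for X ~ Binomial(num_of_trials, prob_success)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Corner cases are decided up front, independent of prob_success.
    if p == 0.0:
        return 0
    if p == 1.0:
        return num_of_trials
    cumulative = 0.0
    for k in range(num_of_trials + 1):
        cumulative += (math.comb(num_of_trials, k)
                       * prob_success ** k
                       * (1.0 - prob_success) ** (num_of_trials - k))
        if cumulative >= p:
            return k
    return num_of_trials  # guards against floating-point round-off


def bernoulli_quantile(p, prob_success):
    """A Bernoulli variable is a binomial with a single trial."""
    return binomial_quantile(p, 1, prob_success)
```

For example, bernoulli_quantile(1.0, 0.0) returns 1 even though the probability
of success is zero, matching the rule that p=1 maps to num_of_trials.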
151
152 Known issues:
153 - Elastic net prediction wrapper function elastic_net_prediction is not
154 available in HAWQ. Instead, prediction functionality is available for both
155 families via elastic_net_gaussian_predict and elastic_net_binomial_predict.
156 - Distance metrics functions in K-Means for the HAWQ port are restricted to the
157 in-built functions, specifically squaredDistNorm2, distNorm2, distNorm1,
158 distAngle, and distTanimoto.
159 - Functions in Quantile and Profile modules of Early Stage Development are not
160 available in HAWQ. Replacement of these functions is available as built-in
161 functions (percentile_cont) in HAWQ and Summary module in MADlib, respectively.
162
163 --------------------------------------------------------------------------------
164 MADlib v1.4.1
165
166 Release Date: 2013-Dec-13
167
168 Bug Fixes:
169 - Fixed problem in Elastic Net for 'binomial' family if an 'integer' column was
170 passed for dependent variable instead of a 'boolean' column.
171 - '*' support in Elastic Net lacked checks for the columns being combined. Now
172 we check if the column for '*' is already an array, in which case we don't wrap
173 it with an 'array' modifier. If there are multiple columns we check that they
174 are of the same numeric type before building an array.
175 - Fixed a software regression in Robust Variance, Clustered Variance and
176 Marginal Effects for multinomial regression introduced in v1.4 when
177 output table name is schema-qualified.
178 - We now also support schema-qualified output table prefixes for SVD and PCA.
179 - Added warning message when deprecated functions are run. Also added a list of
180 deprecated functions in the ReadMe.
181 - Added a Markdown Readme along with the text version for better rendering on
182 Github.
183
184 --------------------------------------------------------------------------------
185 MADlib v1.4
186
187 Release Date: 2013-Nov-25
188
189 New Features:
190 * Improved interface for Multinomial logistic regression:
191 - Added a new interface that accepts an 'output_table' parameter and
192 stores the model details in the output table instead of returning as a struct
193 data type. The updated function also builds a summary table that includes
194 all parameters and meta-parameters used during model training.
195 - The output table has been reformatted to present the model coefficients
196 and related metrics for each category in a separate row. This replaces the
197 old output format of model stats for all categories combined in a
198 single array.
199 * Variance Estimators
200 - Added Robust Variance estimator for Cox PH models (Lin and Wei, 1989).
201 It is useful in calculating variances in a dataset with potentially
202 noisy outliers. Namely, the standard errors are asymptotically normal even
203 if the model is wrong due to outliers.
204 - Added Clustered Variance estimator for Cox PH models. It is used
205 when the data contains extra clustering information besides covariates, and
206 it produces asymptotically normal estimates.
207 * NULL Handling:
208 - Modified behavior of regression modules to 'omit' rows containing NULL
209 values for any of the dependent and independent variables. The number of
210 rows skipped is provided as part of the output table.
211 This release includes NULL handling for the following modules:
212 - Linear, Logistic, and Multinomial logistic regression, as well as
213 Cox Proportional Hazards
214 - Huber-White sandwich estimators for linear, logistic, and multinomial
215 logistic regression as well as Cox Proportional Hazards
216 - Clustered variance estimators for linear, logistic, and multinomial
217 logistic regression as well as Cox Proportional Hazards
218 - Marginal effects for logistic and multinomial logistic regression
219
220 Deprecated functions:
221 - Multinomial logistic regression function has been renamed to
222 'mlogregr_train'. Old function ('mlogregr') has been deprecated,
223 and will be removed in the next major version update.
224
225 - For all multinomial regression estimator functions (list given below),
226 changes in the argument list were made to collate all optimizer specific
227 arguments in a single string. An example of the new optimizer parameter is
228 'max_iter=20, optimizer=irls, precision=0.0001'.
229 This is in contrast to the original argument list that contained 3 arguments:
230 'max_iter', 'optimizer', and 'precision'. This change allows adding new
231 optimizer-specific parameters without changing the argument list.
232 Affected functions:
233 - robust_variance_mlogregr
234 - clustered_variance_mlogregr
235 - margins_mlogregr
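
The collated optimizer parameter string described above is a comma-separated
list of key=value pairs. As a rough sketch of how such a string can be split
into individual settings, consider the following Python snippet. The helper
name parse_optimizer_params is hypothetical and this is not how MADlib itself
parses the argument; only the example string is taken from the notes.

```python
def parse_optimizer_params(params):
    """Split 'key=value, key=value' into a dict of stripped strings."""
    parsed = {}
    for item in params.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        parsed[key.strip()] = value.strip()
    return parsed


settings = parse_optimizer_params("max_iter=20, optimizer=irls, precision=0.0001")
max_iter = int(settings["max_iter"])
precision = float(settings["precision"])
```

Because every setting travels in one string, a new optimizer-specific key can
be added later without changing the number of function arguments, which is the
motivation given above.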
236
237 Bug Fixes:
238 - Fixed an overflow problem in LDA by using INT64 instead of INT32.
239 - Fixed integer to boolean cast bug in clustered variance for logistic
240 regression. After this fix, integer columns are accepted for binary
241 dependent variable using the 'integer to bool' cast rules.
242 - Fixed two bugs in SVD:
243 - The 'example' option for online help has been fixed
244 - Column names for sparse input tables in the 'svd_sparse' and
245 'svd_sparse_native' functions are no longer restricted to 'row_id',
246 'col_id' and 'value'.
247
248 --------------------------------------------------------------------------------
249 MADlib v1.3
250
251 Release Date: 2013-October-03
252
253 New Features:
254 * Cox Proportional Hazards:
255 - Added stratification support for Cox PH models. Stratification is used as
256 shorthand for building a Cox model that allows for more than one stratum,
257 and hence, allows for more than one baseline hazard function.
258 Stratification provides two pieces of key, flexible functionality for the
259 end user of Cox models:
260 -- Allows a categorical variable Z to be appropriately accounted for in
261 the model without estimating its predictive impact on the response
262 variable.
263 -- Handles a categorical variable Z that is predictive of (associated with) the
264 response variable but may not satisfy the proportional hazards assumption
265 - Added a new function (cox_zph) that tests the proportional hazards
266 assumption of a Cox model. This allows the user to build Cox models and then
267 verify the relevance of the model.
268 * NULL Handling:
269 - Modified behavior of linear and logistic regression to 'omit' rows
270 containing NULL values for any of the dependent and independent variables.
271 The number of rows skipped is provided as part of the output table.
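
The 'omit' behavior described above amounts to dropping every row that has a
NULL in the dependent or any independent variable and reporting how many rows
were dropped. A minimal Python sketch of that idea follows, with None standing
in for NULL. The helper name, the column names, and the sample rows are all
made up for illustration.

```python
def omit_null_rows(rows, columns):
    """Return (kept_rows, num_rows_skipped), dropping rows that have a None
    in any of the given columns."""
    kept = [row for row in rows if all(row[c] is not None for c in columns)]
    return kept, len(rows) - len(kept)


rows = [
    {"y": 1.0, "x1": 2.0, "x2": 3.0},
    {"y": None, "x1": 1.0, "x2": 0.5},  # NULL dependent variable
    {"y": 0.0, "x1": None, "x2": 4.0},  # NULL independent variable
    {"y": 2.0, "x1": 5.0, "x2": 6.0},
]
kept, num_rows_skipped = omit_null_rows(rows, ["y", "x1", "x2"])
```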
272
273 Deprecated functions:
274 - Cox Proportional Hazard function has been renamed to 'coxph_train'.
275 Old function names ('cox_prop_hazards' and 'cox_prop_hazards_regr')
276 have been deprecated, and will be removed in the next major version update.
277 - The aggregate form of linear regression ('linregr') has been deprecated.
278 The stored-procedure form ('linregr_train') should be used instead.
279
280 Bug Fixes:
281 - Fixed a memory leak in the Apriori algorithm.
282
283
284 --------------------------------------------------------------------------------
285 MADlib v1.2
286
287 Release Date: 2013-September-06
288
289 New Features:
290 * ARIMA Timeseries modeling
291 - Added auto-regressive integrated moving average (ARIMA) modeling for
292 non-seasonal, univariate timeseries data.
293 - Module includes a training function to compute an ARIMA model and a
294 forecasting function to predict future values in the timeseries
295 - Training function employs the Levenberg-Marquardt algorithm (LMA) to
296 compute a numerical solution for the parameters of the model. The
297 observations and innovations for time before the first timestamp
298 are assumed to be zero leading to minimization of the conditional sum of
299 squares. This produces estimates referred to as conditional maximum likelihood
300 estimates (also referred to as 'CSS' in some statistical packages).
301 * Documentation updates:
302 - Introduced a new format for documentation improving usability.
303 - Upgraded to Doxygen v1.8.4.
304 - Updated documentation improving consistency for multiple modules including
305 Regression methods, SVD, PCA, Summary function, and Linear systems.
306 Bug fixes:
307 - Checking out-of-bounds access of a 'svec' even if the size of svec is zero.
308 - Fixed a minor bug allowing use of GCC 4.7 and higher to build from source.
309 --------------------------------------------------------------------------------
310 MADlib v1.1
311
312 Release Date: 2013-August-09
313
314 New Features:
315 * Singular Value Decomposition:
316 - Added Singular Value Decomposition using the Lanczos bidiagonalization
317 iterative method to decompose the original matrix into PBQ^t, where B is
318 a bidiagonalized matrix. We assume that the original matrix is too big to
319 load into memory but B can be loaded into the memory. B is then further
320 decomposed into XSY^T using Eigen's JacobiSVD function. This restricts the
321 number of features in the data matrix to about 5000.
322 - This implementation provides SVD (for dense matrix), SVD_BLOCK (also for
323 dense matrix but faster), SVD_SPARSE (convert a sparse matrix into a
324 dense one, slower) and SVD_SPARSE_NATIVE (directly operate on the sparse
325 matrix, much faster for really sparse matrices).
326
327 * Principal Component Analysis:
328 - Added a PCA training function that generates the top-K principal
329 components for an input matrix. The original data is mean-centered by the
330 function with the mean matrix returned by the function as a separate table.
331 - The module also includes the projection function that projects a test data
332 set to the principal components returned by the train function.
333
334 * Linear Systems:
335 - Added a module to solve linear system of equations (Ax = b).
336 - The module utilizes various direct methods from the Eigen library for
337 dense systems. Given below is a summary of the methods (more details at
338 http://eigen.tuxfamily.org/dox-devel/group__TutorialLinearAlgebra.html):
339 - Householder QR
340 - Partial Pivoting LU
341 - Full Pivoting LU
342 - Column Pivoting Householder QR
343 - Full Pivoting Householder QR
344 - Standard Cholesky decomposition (LLT)
345 - Robust Cholesky decomposition (LDLT)
346 - The module also includes direct and iterative methods for sparse linear
347 systems:
348 Direct:
349 - Standard Cholesky decomposition (LLT)
350 - Robust Cholesky decomposition (LDLT)
351 Iterative:
352 - In-memory Conjugate gradient
353 - In-memory Conjugate gradient with diagonal preconditioners
354 - In-memory Bi-conjugate gradient
355 - In-memory Bi-conjugate gradient with incomplete LU preconditioners
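
As background for the iterative methods listed above, the following is a
minimal pure-Python sketch of the plain (unpreconditioned) conjugate gradient
method for a small symmetric positive-definite system Ax = b. MADlib relies on
the Eigen library for its solvers; this sketch does not reproduce that code,
and the function name and defaults are assumptions chosen for illustration.

```python
def conjugate_gradient(matrix, rhs, tolerance=1e-10, max_iterations=None):
    """Solve matrix * x = rhs for a symmetric positive-definite matrix."""
    n = len(rhs)
    max_iterations = max_iterations or 10 * n

    def matvec(vector):
        return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    x = [0.0] * n
    residual = list(rhs)        # r = b - A x, with the initial guess x = 0
    direction = list(residual)
    residual_sq = dot(residual, residual)
    for _ in range(max_iterations):
        if residual_sq ** 0.5 < tolerance:
            break
        a_direction = matvec(direction)
        step = residual_sq / dot(direction, a_direction)
        x = [xi + step * di for xi, di in zip(x, direction)]
        residual = [ri - step * adi for ri, adi in zip(residual, a_direction)]
        new_residual_sq = dot(residual, residual)
        # Next search direction is conjugate to the previous ones.
        direction = [ri + (new_residual_sq / residual_sq) * di
                     for ri, di in zip(residual, direction)]
        residual_sq = new_residual_sq
    return x


solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

For this 2x2 system the exact solution is (1/11, 7/11), and the method reaches
it, up to round-off, in two iterations.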
356
357 Bug fixes and other changes:
358 * Robust input validation:
359 - Validation of input parameters to various functions has been improved to
360 ensure that it does not fail if double quotes are included as part of the
361 table name.
362 * Random Forest
363 - The ID field in rf_train has been expanded from INT to BIGINT (MADLIB-764)
364 * Various documentation updates:
365 - Documentation updated for various modules including elastic net, linear
366 and logistic regression.
367 --------------------------------------------------------------------------------
368 MADlib v1.0
369
370 Release Date: 2013-July-03
371
372 New Features:
373 * Cox Proportional Hazards:
374 - Added Right Censoring support for Cox Prop Hazards
375 * Robust Variance Tests - Huber White:
376 - Added a method of calculating robust variance statistic by utilizing the
377 Huber-White sandwich estimator for linear regression, logistic regression,
378 and multinomial logistic regression
379 - Robust variance for linear and logistic regression also includes
380 grouping support
381 * Clustered Sandwich Estimators:
382 - Added clustered robust variance statistic by utilizing a clustered sandwich
383 estimator for linear regression, logistic regression, and multinomial
384 logistic regression
385 - Grouping is currently not implemented for clustered variance, and the
386 parameter is only a placeholder at present
387 * Marginal Effects Estimator:
388 - Added a method for computing the marginal effects for logistic regression
389 and multinomial logistic regression
390 - Grouping is currently not implemented for marginal effects and the
391 parameter is only a placeholder at present
392 * Multinomial logistic regression:
393 - Added a parameter in multinomial logistic regression, to enable picking
394 the reference category. Input for number of categories has been removed
395 due to redundancy
396 * Linear regression:
397 - Updated grouping columns to input as a comma delimited string rather
398 than as an array
399 - Resolved an issue with highly collinear data to produce results consistent
400 with other statistical packages. Threshold on condition number to use an
401 approximation for computing the pseudo-inverse was increased.
402 * Logistic regression:
403 - Changed behavior to error out if the output table already exists
404
405 Bug fixes:
406 * Summary:
407 - Summary function (when used with quartiles) used high memory when the number
408 of columns is large. This has been fixed by computing quartiles in an
409 iterative manner for a fixed number of columns (Pivotal-170)
410 - Fixed a problem with incorrect number of rows returned for Summary when
411 all values in a column are NULL (Pivotal-171)
412 --------------------------------------------------------------------------------
413 MADlib v0.7
414
415 Release Date: 2013-May-01
416
417 New Features:
418 * Correlation function:
419 - Function to compute Pearson's cross-correlation for numeric columns in a
420 relational table
421 * Upgrade capability:
422 - All new versions since v0.7 are installed in a version-specific folder
423 (/usr/local/madlib/Versions/)
424 - Upgrade from v0.5/v0.6 to v0.7 on the database is now supported without
425 uninstalling previous MADlib database installation.
426 - Dependencies on updated functions, types, and other operators are caught
427 and upgrade is aborted with an appropriate message
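
For reference, the Pearson correlation computed by the correlation function
listed above can be sketched in plain Python as follows. This illustrates the
statistic only; it is not the MADlib function, and the names
pearson_correlation and correlation_matrix are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name to values."""
    names = list(columns)
    return {(a, b): pearson_correlation(columns[a], columns[b])
            for a in names for b in names}
```

A column is perfectly correlated with any positive multiple of itself, so
pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]) evaluates to 1 up to round-off.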
428
429 Bug fixes:
430 * Linear Regression:
431 - Improved matrix inversion method to compute coefficients comparable to R
432 for regression problems with high multicollinearity (MADLIB-790)
433 * Logistic Regression:
434 - Fixed a problem in logistic regression with grouping on 'text' datatype
435 columns (MADLIB-791)
436
437 Known issues:
438 * Upgrade:
439 - Views dependent on MADlib functions being updated will be dropped during
440 the upgrade and restored after finishing upgrade. If upgrade fails for
441 any reason, these views and the original MADlib schema will *not* be
442 restored. Before initiating upgrade, we recommend taking a backup of
443 the MADlib schema, moving all views dependent on MADlib to a separate
444 schema, and performing a backup with:
445 pg_dump -n 'schema_name'
446
447 - Upgrade is currently not supported for the PostgreSQL platform and will
448 abort with an error
449
450 - Upgrade currently does not detect functions defined by the user that
451 depend upon MADlib functions. Semantic/API changes to these MADlib
452 functions could lead to undefined results in such user-defined functions
453
454 - Some important changes for the upgrade from v0.5 to v0.7 are given below
455 (Upgrade will raise an error and abort if there exist user-defined views
456 that depend on these changes. User-defined functions are not validated
457 with this check. An aborted upgrade does not affect the installed version
458 of MADlib.)
459 -- Logistic regression renamed from 'logregr' to 'logregr_train'
460 -- All internal and external aggregates in logistic regression
461 have been updated
462 -- PLDA module replaced with a refactored LDA module. Due to the
463 renaming all functions using PLDA need to be updated
464 -- Updated MADlib types:
465 logregr_result, plda_topics_t, plda_word_distrn,
466 plda_word_weight
467 --------------------------------------------------------------------------------
468 MADlib v0.6
469
470 Release Date: 2013-Apr-01
471
472 New Features / Improvements:
473 * Generic cross-validation:
474 - Support for k-fold cross-validation of any supervised learning
475 algorithm
476 * Heteroskedasticity of linear regression
477 - Support for calculating heteroskedasticity via Breusch-Pagan test
478 * Grouping support for linear regression
479 - Support for linear regression on each group of data grouped by
480 one or multiple columns
481 * Grouping support for logistic regression
482 - Refactor of logistic regression code
483 - Support for logistic regression on each group of data grouped by
484 one or multiple columns
485 - Grouping support is added to the convex optimization framework
486 * LDA:
487 - Improved performance and scalability (MADLIB-480)
488 * Elastic net regularization for both linear and logistic regressions
489 - Support FISTA and IGD optimizers
490 * Summary function
491 - Support for an overview of data table
492 * Eigen package upgrade
493 - Now Eigen 3.1.2 is used by MADlib v0.6
494 * Unit testing framework:
495 - A new unit testing framework is added for C++ abstraction layer
496
497 Bug Fixes:
498 * C++ abstraction layer:
499 - Improved handling of NULL values in the input array (MADLIB-773)
500 * Naive Bayes:
501 - Improved the handling of NULL values. (MADLIB-749)
502
503 Known Issues:
504
505 * K-means:
506 - K-means crashes on some datasets, when the dimensionality of the points
507 is not uniform on the data set. (MADLIB-789)
508
509 * Distribution Functions:
510 - Certain quantile functions will abort their session on invalid input
511 (MADLIB-786)
512
513 * Multinomial Logistic Regression:
514 - Signs of coefficient outputs are inconsistent with other tools like R and
515 Stata (MADLIB-785)
516
517
518 --------------------------------------------------------------------------------
519 MADlib v0.5
520
521 Release Date: 2012-Nov-15
522
523 Bug Fixes:
524 * K-means:
525 - Improved handling of invalid arguments (MADLIB-359, 361)
526 * Sketch-based estimators:
527 - Addressed security vulnerability (MADLIB-630)
528
529 New Features / Improvements:
530 * Association Rules (Apriori):
531 - Improved reporting output format for better usability (MADLIB-411)
532 - Significant improvement in performance (MADLIB-638)
533 * C++ (Database) Abstraction Layer:
534 - Extension to support modular transition states (MADLIB-499)
535 - Extension to support functions returning set of values (MADLIB-638)
536 * Conditional Random fields:
537 - Support for Linear Chain Conditional Random Fields for NLP (MADLIB-628)
538 * Decision Tree:
539 - Improved performance for C4.5 and Random forests (MADLIB-605)
540 - Improved encoding (MADLIB-590)
541 * Infrastructure:
542 - Convex optimization framework
543 * K-means:
544 - Code refactoring and Improved performance
545 (MADLIB-454, MADLIB-522, MADLIB-678)
546 - Silhouette function for k-means (MADLIB-681)
547 * Low-rank Matrix Factorization
548 - New module
549 * Logistic Regression:
550 - Support for Multinomial Logistic Regression (MADLIB-575)
551 * Naive Bayes
552 - Significant improvement in performance (MADLIB-611, 619, 626)
553 * Regression Analysis:
554 - Support for Cox Proportional Hazards test (MADLIB-576)
555 * Sampling:
556 - Added weighted sampling of a single row (MADLIB-584)
557 * SVD Matrix Factorization:
558 - Improved performance (MADLIB-578)
559
560 Documentation:
561 * Conditional Random Fields:
562 - Example added for CRF module (MADLIB-731)
563 * SVD Matrix Factorization:
564 - Incremental-gradient SVD algorithm (MADLIB-572)
565
566 Known issues:
567 * Multinomial Logistic Regression:
568 - Number of independent variables cannot exceed 65535 (MADLIB-665)
569 * Naive Bayes:
570 - Current implementation of Naive Bayes is only suitable for
571 categorical attributes (MADLIB-679)
572 - NULL input values not accepted for attributes (MADLIB-614)
573 - NULL probabilities given for test set values not seen in
574 training set (MADLIB-523)
575
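Illustration (not part of the MADlib sources): the v0.5 notes above list
"weighted sampling of a single row" (MADLIB-584) without describing how it
works. One common way to draw a single row in one pass, which suits an
aggregate-style computation, is to keep a running total of the weights and
replace the current pick with probability weight / total. The sketch below
assumes that approach; the function name and the (value, weight) input shape
are hypothetical and are not MADlib's API.

```python
import random


def weighted_sample_one(rows, rng=None):
    """Pick one value from (value, weight) pairs in a single pass.

    Hypothetical helper for illustration only. Each value is returned
    with probability weight / sum(weights). Rows with non-positive
    weight are skipped. Returns None if no row qualifies.
    """
    rng = rng or random.Random()
    chosen = None
    total = 0.0
    for value, weight in rows:
        if weight <= 0:
            continue
        total += weight
        # Replace the current pick with probability weight / total.
        # For the first qualifying row this probability is 1.
        if rng.random() < weight / total:
            chosen = value
    return chosen
```

Only the running total and the current pick are kept as state, so the input
never has to be materialized. That is the property that makes this style of
sampling a natural fit for an aggregate.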
576 --------------------------------------------------------------------------------
577 MADlib v0.4.1
578
579 Release Date: 2012-Aug-9
580
581 Bug Fixes:
582 * PGXN:
583 - Fixed installation problem that could occur on some platforms (MADLIB-589)
584
585 New Features/Improvements:
586 * C++ Abstraction Layer:
587 - Increased ABI compatibility across multiple Greenplum versions
588 (MADLIB-606)
589 * Hypothesis Tests:
590 - Tests that are not implemented as ordered aggregates are now also
591 installed on PostgreSQL 8.4 and Greenplum 4.0.
592
593 --------------------------------------------------------------------------------
594 MADlib v0.4
595
596 Release Date: 2012-Jun-18
597
598 Bug Fixes:
599 * Association Rules:
600 - assoc_rules() now uses schema-qualified function calls (MADLIB-435)
601 * Decision Trees:
602 - Enhanced correctness (MADLIB-409, 502, 503)
603 - Improved handling of invalid arguments (MADLIB-331)
604 * k-Means:
605 - Improved handling of invalid arguments (MADLIB-336, 364, 459)
606 * PLDA:
607 - Improved robustness (MADLIB-474)
608 * Sparse Vectors:
609 - svec_sfv() now uses locale-aware sorting (MADLIB-457)
610 - Operators now install to MADlib schema (MADLIB-470)
611
612 New Features/Improvements:
613 * C++ Abstraction Layer:
614 - Support for "function pointers" (MADLIB-370)
615 - Support for sparse vectors (MADLIB-371)
616 - Support for more Eigen (linear algebra) types (MADLIB-533)
617 * Decision Trees:
618 - Code refactoring and optimization (MADLIB-410, 476, 504, 509)
619 - Documentation improvements (MADLIB-507)
620 - Output table now contains unencoded information (MADLIB-434)
621 - Enhance the missing value handling for continuous features (MADLIB-493)
622 * Hypothesis Tests:
623 - Pearson chi-square test (MADLIB-390)
624 - One- and two-sample t-Tests (MADLIB-391)
625 - F-test (MADLIB-392)
626 - Mann-Whitney U-test (MADLIB-393)
627 - Kolmogorov-Smirnov test (MADLIB-394)
628 - Wilcoxon-Signed-Rank test (MADLIB-405)
629 - One-way ANOVA (MADLIB-406)
630 * PostgreSQL Extensibility:
631 - Support for CREATE EXTENSION in PostgreSQL >= 9.1 (MADLIB-316)
632 - Availability on PGXN (MADLIB-334)
633 * Probability Functions:
634 - Wrap all distribution functions implemented by Boost (MADLIB-412)
635 - Wrap Kolmogorov distribution function from CERN ROOT project (MADLIB-413)
636 * Random Forests:
637 - New module (MADLIB-419)
638 * Support:
639 - Add elementary matrix/vector functions (e.g., norm/distances etc.)
640 (MADLIB-532)
641 * Viterbi Feature Extraction:
642 - New module (MADLIB-478)
643
644 Known issues:
645 - svec_sfv() does not support collations, as introduced with PostgreSQL 9.1
646 (MADLIB-558)
647 - Invalid arguments are not always guaranteed to be handled gracefully and
648 may lead to confusing error messages (MADLIB-28, 359, 361, 363)
649
650 --------------------------------------------------------------------------------
651 MADlib v0.3
652
653 Release Date: 2012-Feb-9
654
655 New features:
656 * Installer:
657 - Single installer package targeting all supported DBMSs per OS (MADLIB-218)
658 * C++ Abstraction Layer:
659 - Switched from using Armadillo to using Eigen for linear-algebra
660 operations, thereby eliminating the dependency on LAPACK/BLAS (MADLIB-275)
661 - Reimplemented as a template library for performance improvements
662 (MADLIB-295)
663 * Decision Trees:
664 - Major update
665 - Now supports multiple split criteria (information gain, gini, gain ratio)
666 - Now supports tree pruning using a validation set to address overfitting
667 - Now supports additional functions for tree output
668 - Now supports continuous features in addition to categorical features
669 - Additional support for handling null values
670 - Improved scalability and performance
671 * k-Means Clustering:
672 - Now handles any input that is convertible to SVEC. (MADLIB-42)
673 - Multiple distance functions (L1-norm, L2-norm, cosine similarity, Tanimoto
674 similarity) (MADLIB-43)
675 - Supports multiple seeding methods (kmeans++, random, user-specified list
676 of centroids)
677 - Replaced goodness of fit with the (simplified) Silhouette coefficient
678 (MADLIB-45)
679 - New run-time parameters (MADLIB-47)
680 * Linear Regression:
681 - Major speed improvement
682 * Logistic Regression:
683 - Major speed improvement
684 - Now handles any input that is convertible to BOOLEAN (dependent variable)
685 or DOUBLE PRECISION[] (independent variables). (MADLIB-283)
686 - An under-/overflow safe version to evaluate the (usual) logistic function,
687 for scoring logistic regression (MADLIB-271)
688 - A third optimizer: Incremental-gradient-descent (MADLIB-303)
689 * Support:
690 - For Greenplum <= 4.2.0, added a workaround for INSERT INTO in the same way
691 as the existing CREATE TABLE AS workaround. This workaround is not needed
692 in Greenplum >= 4.2.1 any more. (MADLIB-265)
693 - Function version() returns MADlib build information (MADLIB-309)
694
695 Bug fixes:
696 * Sparse vectors:
697 - Fixed sparse-vector type cast problems (MADLIB-282, MADLIB-305)
698 - Fixed a situation where using svec_sfv() could cause a segmentation fault
699 (MADLIB-350)
700 - Increased compatibility with internal PostgreSQL conventions (MADLIB-257)
701 * Logistic regression:
702 - Handle numerical instability more gracefully (MADLIB-343, MADLIB-345)
703 - Handle unexpected inputs more gracefully (MADLIB-284, MADLIB-344)
704 - Fixed "Random variate x is nan, but must be finite" issue (MADLIB-356)
705
706 Known issues:
707 - Decision Trees not supported on Greenplum 4.0 (MADLIB-346, MADLIB-347)
708 - K-means: the error '"nan" does not exist' may be raised when input vectors
709 contain NaN. (MADLIB-364)
710 - Association Rules require the madlib schema to be in the search path
711 (MADLIB-353)
712 - Invalid arguments are not always guaranteed to be handled gracefully and
713 may lead to confusing error messages (MADLIB-28, 336, 359, 361, 363, 364)
714
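Illustration (not part of the MADlib sources): the v0.3 notes above mention
an under-/overflow safe way to evaluate the logistic function
(MADLIB-271). The notes do not say how MADlib implements it. A common
technique, sketched here as an assumption, is to branch on the sign of the
argument so that exp() is only ever called on a non-positive value. The
function name is hypothetical.

```python
import math


def stable_logistic(x: float) -> float:
    """Evaluate 1 / (1 + exp(-x)) without overflowing math.exp.

    Hypothetical helper for illustration only.
    """
    if x >= 0:
        # exp(-x) lies in (0, 1] here, so it cannot overflow; for very
        # large x it underflows to 0.0 and the result becomes 1.0.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form exp(x) / (1 + exp(x)),
    # where exp(x) lies in [0, 1) and again cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)
```

The naive form 1 / (1 + math.exp(-x)) raises OverflowError in Python for
large negative x (for example x = -1000), because exp(1000) exceeds the
double-precision range. The branched form returns 0.0 in that case.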
715 --------------------------------------------------------------------------------
716 MADlib v0.2.1beta
717
718 Release Date: 2011-Sep-14
719
720 General changes:
721 * numerous improvements to the C++ abstraction layer:
722 - code clean-up
723 - fixed issue where incorrect values were returned when used with
724 debug builds of PostgreSQL/Greenplum (MADLIB-253)
725 - fixed issue where returning arrays to PostgreSQL/Greenplum could lead
726 to a crash (MADLIB-250)
727 - allocated memory is now 16-byte aligned for improved stability and
728 performance (MADLIB-236)
729 * advanced compiler warnings are now enabled by default
730 * all C/C++ code now free of warnings. On gcc <= 4.6, there might still be
731 warnings due to "unclean" macros in DBMS header files (MADLIB-228)
732 * prepared for Solaris support in a later release (MADLIB-204)
733 - added support for Sun Compiler in CMake build script
734 - fixed all compilation errors with Sun compiler
735 * added UDF to mimic "CREATE TABLE AS ...", as a workaround for a Greenplum
736 issue (MADLIB-241). Included this as GP Compatibility module.
737 * madpack utility:
738 - dropped madpack dependency on PygreSQL (MADLIB-217)
739 - improved security in madpack install-check (MADLIB-229)
740 - fixed bashism in madpack (MADLIB-222)
741 - fixed install-check not running on non-default schema (MADLIB-251)
742
743 Modules/methods:
744 * SVM (kernel_machines):
745 - fixed cumulative error count in svm_cls_update() function
746 - improved memory management in SVM module
747 * Linear regression (regress):
748 - fixed unexpected behavior for some edge cases (MADLIB-214)
749 - fixed crashing with huge number of independent vars (MADLIB-250)
750 * Logistic regression (regress):
751 - added support for arbitrary expressions for dep./indep. variables, not
752 just column names (MADLIB-255)
753 * Quantile:
754 - fixed quantile() function to be exact
755 - added simple version for small data sets
756 * Sparse Vectors:
757 - added check for sorted dictionary to svec_sfv (MADLIB-187)
758 * Decision Tree (decision_tree):
759 - now can be run multiple times in one session (MADLIB-156)
760
761 Known issues:
762 * non-unified API for several SQL UDFs (MADLIB-208)
763 * performance of the conjugate-gradient optimizer in logistic regression
764 can be very poor (MADLIB-164)
765
766 --------------------------------------------------------------------------------
767 MADlib v0.2.0beta
768
769 Release Date: 2011-Jul-8
770
771 General changes:
772 * new build and installation framework based on CMake
773 * new C++ abstraction layer for easy and secure method development
774 * new database installation utility (madpack)
775
776 Modules/methods:
777 * new: Association Rules (assoc_rules)
778 * new: Array Operators (array_ops)
779 * new: Decision Tree (decision_tree)
780 * new: Conjugate Gradient (conjugate_gradient)
781 * new: Parallel LDA (plda)
782 * improved: all methods from previous release
783
784 Known issues:
785 * non-unified API for several SQL UDFs (MADLIB-208)
786 * running decision tree more than once in one session fails (MADLIB-156)
787 * performance of the conjugate-gradient optimizer in logistic regression
788 can be very poor (MADLIB-164)
789 * svec_sfv function doesn't check for sorted dictionary (MADLIB-187)
790
791 --------------------------------------------------------------------------------
792 MADlib v0.1.0alpha
793
794 Release Date: 2011-Jan-31
795
796 Initial release.
797
798 Included modules/methods:
799 * Naive-Bayes Classification (bayes)
800 * k-Means Clustering (kmeans)
801 * Support Vector Machines (kernel_machines)
802 * Sketch-based Estimators (sketch)
803 * Sketch-based Profile (data_profile)
804 * Quantile (quantile)
805 * Linear & Logistic Regression (regress)
806 * SVD Matrix Factorisation (svdmf)
807 * Sparse Vectors (svec)
808
809 --------------------------------------------------------------------------------
810 MADlib v0.1.0prerelease
811
812 Release Date: 2011-Jan-25
813
814 Demo release.