
Release Notes: Update to include RF, SVM, Sketch

Pivotal Tracker: 65516280

    - Added note for not supporting quantile, profile in HAWQ
    - Added information for linear SVM
    - More details for API change (CRF, SVM)
1 parent ce0f62c commit a7e0f6572cffacaf921c309c59c0a32c896c8fd7 @haying haying committed with Rahul Iyer Mar 7, 2014
@@ -67,8 +67,8 @@ Optional Settings
one, the installer uses the value stored in the environment variable
---prefix <MADLIB_INSTALL_PATH> Indicates MADlib installation path. If not set, the default value in
- the MADlib RPM is used.
+--prefix <MADLIB_INSTALL_PATH> Indicates MADlib installation path. If not set, the default value
+ ${GPHOME}/madlib is used.
-h | -? | --help Displays help.
@@ -11,39 +11,51 @@ Current list of bugs and issues can be found at
MADlib v1.5
-Release Date: 2014-Feb-27
+Release Date: 2014-Mar-05
New features:
- Added a new port 'HAWQ'. MADlib can now be used with the Pivotal
Distribution of Hadoop (PHD) through HAWQ
(see for more details).
- Implemented performance improvements for linear and logistic predict functions.
-- Updated the design and API for Conditional Random Fields to enable ease of use
-and better functionality.
+- Moved Conditional Random Fields (CRFs) out of early stage development, and
+updated the design and APIs to enable ease of use and better functionality.
+API changes: lincrf is replaced by lincrf_train, crf_train_fgen and
+crf_test_fgen take updated arguments, and the format of segment tables has changed.
+- Improved linear support vector machines (SVMs) by enabling iterations, and
+removed lsvm_predict and svm_predict, which are not useful in GPDB and HAWQ.
+- Added new functions, with improved performance compared to svec_sfv, for
+document vectorization into sparse vectors.
- Removed the bool-to-text cast and updated all functions depending on it to
explicitly convert variable to text.
- Added function properties for all SQL functions to allow the database optimizer
to make better plans.
Bug Fixes:
- Set client_min_messages to 'notice' during database installation to ensure
-that log messages don't get logged to STDERR
+that log messages don't get logged to STDERR.
- Fixed elastic net prediction to predict using all features instead of just
the selected features to avoid an error when no feature is selected as relevant
in the trained model.
- For corner probability values, p=0 and p=1, in bernoulli and binomial
distributions, the quantile values should be 0 and num_of_trials (=1 in the case
of bernoulli) respectively, independent of the probability of success.
- Changed install script to explicitly use /bin/bash instead of /bin/sh to avoid
-problems in Ubuntu where /bin/sh is linked to 'dash'
+problems in Ubuntu where /bin/sh is linked to 'dash'.
- Fixed issue in Elastic Net to take any array expression as input instead of
-specifically expecting the expression 'ARRAY[...]'
+specifically expecting the expression 'ARRAY[...]'.
+- Fixed wrong output in percentile of count-min (CM) sketches.
Known issues:
-- Random forest and SVM are currently not available in the HAWQ port of MADlib
+- Elastic net prediction wrapper function elastic_net_prediction is not
+available in HAWQ. Instead, prediction functionality is available for both
+families via elastic_net_gaussian_predict and elastic_net_binomial_predict.
- Distance metrics functions in K-Means for the HAWQ port are restricted to the
in-built functions, specifically squaredDistNorm2, distNorm2, distNorm1,
distAngle, and distTanimoto.
+- Functions in the Quantile and Profile modules of Early Stage Development are not
+available in HAWQ. As replacements, use the built-in function percentile_cont
+in HAWQ and the Summary module in MADlib, respectively.
MADlib v1.4.1
@@ -52,8 +52,8 @@ OPTIONS
use the environment variable.
- Optional. Expected MADlib installation path. If not set, a default value
- in the RPM file will be used.
+ Optional. Expected MADlib installation path. If not set, the default value
+ \${GPHOME}/madlib is used.
-s | --skip-localhost
Optional. If not set, the RPM file will be installed to localhost as well
@@ -187,4 +187,13 @@ if [ 0 -ne $? ]; then
echo "MADlib successfully installed."
+echo "Please run the following command to deploy MADlib"
+echo "usage: madpack install -p hawq -c user@host:port/database"
+echo "Example:"
+echo " \$ \${GPHOME}/madlib/bin/madpack install -p hawq -c gpadmin@mdw:5432/testdb"
+echo " This will install MADlib objects into a Greenplum database named \"testdb\""
+echo " running on server \"mdw\" on port 5432. Installer will try to login as \"gpadmin\""
+echo " and will prompt for password. The target schema will be \"madlib\"."
+echo "For additional options run: madpack --help"
+echo "Release notes and additional documentation can be found at"
@@ -131,6 +131,9 @@ arrays, which are not easily handled in pl/python, we are using
<c>array_collapse</c> function to collapse the n-dim arrays to 1-dim arrays.
All values of 2 and upper dimensions are separated with ':' character.
+Note that this module is not available in HAWQ.
+Instead, we suggest the Summary module in Descriptive Statistics as a replacement for this module.
@anchor related
@par Related Topics
File profile.sql_in documenting SQL functions.
@@ -207,3 +210,4 @@ BEGIN
$$ LANGUAGE plpgsql
@@ -83,6 +83,11 @@ Result:
(1 row)
+Note that this module is not available in HAWQ.
+GPDB 4.2+ and HAWQ 1.2+ support the SQL Standard percentile_cont() inverse distribution function, which should be used in preference to this implementation.
+It is also planned for support in Postgres 9.4.
+This implementation will be retired once the functionality is available in Postgres.
@anchor related
@par Related Topics
File quantile.sql_in documenting the SQL function.
@@ -291,4 +296,5 @@ begin
return res;
$$ LANGUAGE plpgsql
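
The installer hunks above change the documented default for --prefix to ${GPHOME}/madlib. As a rough illustration of that fallback, here is a minimal bash sketch. The function name resolve_prefix and its argument handling are assumptions made for this example and are not taken from the actual install script; only the --prefix option and the ${GPHOME}/madlib default come from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of the MADlib installer: it only illustrates
# the documented fallback from --prefix to ${GPHOME}/madlib.
resolve_prefix() {
    local prefix=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --prefix)
                # Take the following argument as the path, if one was given.
                prefix="${2:-}"
                shift
                if [ $# -gt 0 ]; then shift; fi
                ;;
            *)
                # Ignore every other option in this sketch.
                shift
                ;;
        esac
    done
    # Fall back to the documented default when --prefix was not supplied.
    echo "${prefix:-${GPHOME}/madlib}"
}
```

Calling resolve_prefix with no arguments prints the path under GPHOME, while resolve_prefix --prefix /opt/madlib prints the explicit path unchanged.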
