Commits on Mar 17, 2015
  1. @haying

    Documentation: add v1.7 to main page

    haying authored
    Pivotal Tracker: #87321430
  2. @haying

    Deprecated DT/RF: fix __validate_input_table

    haying authored
    Pivotal Tracker: #87321430
    
    - Found in install-check of GPDB 4.3.3.0 build 1 in OS X
Commits on Mar 16, 2015
  1. @haying
  2. @iyerr3 @haying

    PMML: Add RF as valid export module

    iyerr3 authored haying committed
  3. @haying

    Documentation: LMF example

    haying authored
  4. @haying

    Documentation: per-table predict is needed for CV

    haying authored
    Pivotal Tracker: #85717520
  5. @haying

    Elastic Net Documentation: predict functions

    haying authored
    Pivotal Tracker: #85717520
    
    Changes:
    - classify two types of predict function: per-tuple and per-table
    - add examples to explain the usages of above
  6. @haying

    Release Notes of v1.7.1

    haying authored
    Pivotal Tracker: #87321430
    
    Changes:
    - add release notes
    - fix some stale readme
  7. @haying

    One Way ANOVA: fix wrong answers

    haying authored
    Pivotal Tracker: #90320376
    JIRA: MADLIB-866
    
    Changes:
    - remove the call of idxOfGroup in merge function
    - use quote_ident to check schema names in assoc_rules
Commits on Mar 13, 2015
  1. @iyerr3

    K-means: Add subsampling in kmeans++ seeding

    iyerr3 authored
    Pivotal tracker: 88636158
    
    Change:
        - Kmeans++ runs through the data 'k' times to compute the k initial
          centroids. This is an incredibly slow process for big data. We can
          speed things up by running the seeding only on a subsample of the
          data (size controlled by a user-defined parameter).
        - The default behavior is to seed from the complete dataset (as
          described in the original algorithm). A subsampling ratio
          parameter allows the user to set the size of the subsample for
          seeding.
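The seeding change described above can be illustrated with a minimal pure-Python sketch. This is not the MADlib implementation: the function names, the `seeding_sample_ratio` parameter name, and the in-memory list of points are assumptions made for illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(points, k, seeding_sample_ratio=1.0, rng=None):
    """Choose k initial centroids with k-means++ seeding.

    seeding_sample_ratio is a hypothetical parameter: 1.0 seeds from the
    complete dataset, smaller values seed from a random subsample.
    """
    rng = rng or random.Random()
    if not 0.0 < seeding_sample_ratio <= 1.0:
        raise ValueError("seeding_sample_ratio must be in (0, 1]")

    if seeding_sample_ratio < 1.0:
        # Keep at least k points so that k centroids can still be chosen.
        size = max(k, int(len(points) * seeding_sample_ratio))
        sample = rng.sample(points, min(size, len(points)))
    else:
        sample = list(points)
    if k > len(sample):
        raise ValueError("k exceeds the number of available points")

    # First centroid: uniform at random.
    centroids = [rng.choice(sample)]
    # Squared distance from each point to its closest chosen centroid.
    dists = [squared_distance(p, centroids[0]) for p in sample]

    # Each further centroid costs one pass over the (sub)sample.
    while len(centroids) < k:
        if sum(dists) == 0:
            # Every point coincides with a chosen centroid; pick any.
            nxt = rng.choice(sample)
        else:
            # Sample proportionally to the squared distance.
            nxt = rng.choices(sample, weights=dists, k=1)[0]
        centroids.append(nxt)
        dists = [min(d, squared_distance(p, nxt))
                 for p, d in zip(sample, dists)]
    return centroids


points = [(float(i), float(i % 7)) for i in range(1000)]
seeds = kmeanspp_seed(points, 5, seeding_sample_ratio=0.1,
                      rng=random.Random(0))
```

With a ratio of 0.1, each of the k passes touches roughly a tenth of the points, which is where the speedup would come from in this sketch.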
Commits on Mar 11, 2015
  1. @haying

    Upgrade: from v1.x to v1.7.1

    haying authored
    Additional author: Rahul Iyer <riyer@pivotal.io>
    
    Pivotal Tracker: #87321270
    JIRA: MADLIB-901
    
    Changes:
    - allow expression as id column in multinom_predict,
      mlogregr_predict, and coxph_predict
    - fix _filter_recursive_view_dependency (MADLIB-901)
    - fix coxph_result in old change lists
    - drop old lmf, coxph functions for hawq reinstall
    - remove .block(), still problematic in OSX
    - add coxph_predict help function with no arguments
Commits on Mar 5, 2015
  1. @oberstet @haying

    CMake: add FreeBSD specific m4 options

    oberstet authored haying committed
    Pull request: 308
  2. @no0p @haying

    Validation: allow views and materialized views

    no0p authored haying committed
    Additional author: Feng, Xixuan (Aaron) <xfeng@pivotal.io>
    
    Pull request: 307
  3. @haying

    Madpack: do not ask for password when not necessary

    haying authored
    JIRA: MADLIB-357
    Pivotal Tracker: #70138798, #55443434
    
    Changes:
    - raise and catch EnvironmentError for password errors,
      then ask for the password and proceed
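The retry pattern named in this change could look roughly like the sketch below. It is an illustration under assumptions, not madpack's code: `run_with_optional_password`, `fake_run_query`, and the password value are hypothetical stand-ins.

```python
def run_with_optional_password(run_query, ask_password):
    """Try without a password first; prompt only if that attempt fails.

    run_query and ask_password are hypothetical callables supplied by
    the caller; run_query raises EnvironmentError on a password error.
    """
    try:
        return run_query(None)
    except EnvironmentError:
        # Only now is the user asked for a password.
        return run_query(ask_password())


def fake_run_query(password):
    """Stand-in for a database call that requires the password 'secret'."""
    if password != "secret":
        raise EnvironmentError("password authentication failed")
    return "ok"


prompts = []


def fake_prompt():
    """Stand-in for an interactive prompt; records that it was called."""
    prompts.append("asked")
    return "secret"


result = run_with_optional_password(fake_run_query, fake_prompt)
```

A server that accepts the connection without a password would never trigger the prompt in this sketch.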
  4. @haying

    Random Forest: improve performance for variable importance

    haying authored
    Additional author: Rahul Iyer <riyer@pivotal.io>
    
    Pivotal Tracker: #87237284
    
    Changes:
    - keep var_imp_score in oob_prediction table
    - vectorize distribution_agg for permutation
    - edit array_add for use as merge/transition function
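As an illustration of why one function can serve as both transition and merge function, here is a minimal Python sketch of an element-wise array sum. The handling of empty (`None`) states is an assumption for illustration and is not taken from the MADlib source.

```python
from functools import reduce


def array_add(state, value):
    """Element-wise sum of two arrays, treating None as 'no state yet'.

    Both arguments have the same type, so the same function can fold
    rows into a state (transition) or combine two partial states (merge).
    """
    if state is None:
        return None if value is None else list(value)
    if value is None:
        return list(state)
    if len(state) != len(value):
        raise ValueError("arrays must have the same length")
    return [a + b for a, b in zip(state, value)]


rows = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Transition: fold every row into a single state.
full = reduce(array_add, rows, None)

# Merge: aggregate two segments separately, then combine the states.
segment_a = reduce(array_add, rows[:2], None)
segment_b = reduce(array_add, rows[2:], None)
merged = array_add(segment_a, segment_b)
```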
Commits on Mar 4, 2015
  1. @iyerr3 @haying

    Validation: Add option in table_exists to search only 1st schema in path

    iyerr3 authored haying committed
    Pivotal Tracker: #87646110
    
    Details:
    The table_exists function in validate_args checks for the table in all
    schemas in the search path. For output tables, we only want to check in
    the current schema. We add a boolean flag to differentiate the two
    situations and update all calls that check for an output table.
    
    The commit also includes updates to all modules that use table_exists,
    with the flag set to True when the output table is validated.
    
    Others:
    Add online help message to coxph_predict function
    Add drop functions and change lists for coxph functions for HAWQ
    reinstall and upgrade, missed in v1.6
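The table_exists behavior described in the Details above can be sketched as follows. This is a hypothetical stand-in: the `catalog` dictionary replaces the database catalog, and the signature and the `only_first_schema` flag name are assumptions rather than the actual validate_args API.

```python
def table_exists(table_name, catalog, search_path, only_first_schema=False):
    """Return True if table_name is visible under the given search path.

    catalog maps schema name -> set of table names (a stand-in for the
    database catalog). With only_first_schema=True, only the first schema
    in the search path is checked, as wanted for output tables.
    """
    if "." in table_name:
        # A schema-qualified name bypasses the search path.
        schema, name = table_name.split(".", 1)
        return name in catalog.get(schema, set())
    schemas = search_path[:1] if only_first_schema else search_path
    return any(table_name in catalog.get(s, set()) for s in schemas)


catalog = {"public": {"houses"}, "madlib": {"model_out"}}
search_path = ["public", "madlib"]

# Input table: found anywhere on the search path.
input_found = table_exists("model_out", catalog, search_path)

# Output table: only the first schema counts, so the name is free.
output_clash = table_exists("model_out", catalog, search_path,
                            only_first_schema=True)
```

In this sketch an output table named `model_out` would not be rejected just because a table of the same name exists in a later schema on the path.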
Commits on Feb 27, 2015
  1. @iyerr3

    GLM: Fix in memory group controller to detect empty states

    iyerr3 authored
    Pivotal tracker: 87646110
    
    Changes:
        - If the final group state is terminated (or the single state in the
          no grouping case is terminated) for GLM controllers then the
          iteration convergence test query returns an empty result. This can be
          avoided by automatically finishing when there are no active
          states.
        - Validation for columns_exist_in_table was not comparing the
          unquoted column names of the columns in the table. This has
          been changed to compare the unquoted input with the unquoted
          column names.
        - A couple of minor validation bugs were fixed in ordinal() and
          multinom().
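The column-name comparison from the second bullet can be sketched as below. The helper names and signatures are hypothetical, and the unquoting rules (fold unquoted names to lower case, strip quotes and collapse doubled quotes otherwise) follow standard PostgreSQL identifier handling rather than the MADlib source.

```python
def unquote_ident(name):
    """Convert a user-supplied identifier to the name stored in the catalog."""
    name = name.strip()
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        # Quoted: keep case, drop outer quotes, collapse doubled quotes.
        return name[1:-1].replace('""', '"')
    # Unquoted identifiers are folded to lower case.
    return name.lower()


def columns_exist_in_table(catalog_columns, requested):
    """Check that every requested column exists.

    catalog_columns are names as stored in the catalog (already unquoted);
    requested are names as typed by the user (possibly quoted).
    """
    existing = set(catalog_columns)
    return all(unquote_ident(col) in existing for col in requested)


catalog_columns = ["id", "Sale Price"]
ok = columns_exist_in_table(catalog_columns, ["ID", '"Sale Price"'])
missing = columns_exist_in_table(catalog_columns, ['"ID"'])
```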
Commits on Feb 21, 2015
  1. @haying

    Random Forests: disable optimizer and hashagg

    haying authored
    Pivotal Tracker: #88568704
    
    Changes:
    - disable hashagg for tree_predict, forest_predict and forest_train
    - remove an unnecessary check for excluding features in RF and DT
Commits on Feb 13, 2015
  1. @haying

    CMake: use -mno-sse2 in CMAKE_CXX_FLAGS

    haying authored
    Pivotal Tracker: #74929244
    
    Changes:
    - use -mno-sse2 in CMakeLists.txt
    - use .block() which had error with SSE2
    - verify no performance regression
  2. @haying

    Random Forest: performance improvement by 30%

    haying authored
    Additional author: Rahul Iyer <riyer@pivotal.io>
    
    Pivotal Tracker: #88080334
    
    Changes:
    - remove views w/ duplicate rows that were passed to tree_train
        (use src_view with poisson count directly instead)
    - enable updating stats using weights_as_counts for RF
Commits on Feb 11, 2015
  1. @iyerr3

    Summary: Add quantile computation for HAWQ and PG 9.4

    iyerr3 authored
    Pivotal tracker: 86025702
    
    Summary function uses PERCENTILE_CONT to compute the percentiles.
    This functionality was available only on GPDB 4.2.2 or higher and this
    was explicitly checked for in the code. The function is now available in
    PostgreSQL 9.4 and on HAWQ 1.2.0, so we add those platforms for
    quantiles.
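The platform check described above might be sketched like this. The minimum versions come from the commit message; the function names, the platform keys, and the version-string parsing are assumptions for illustration.

```python
# Minimum versions that provide PERCENTILE_CONT, per the commit message.
# The platform keys are hypothetical labels.
MIN_PERCENTILE_CONT_VERSION = {
    "greenplum": (4, 2, 2),
    "postgresql": (9, 4, 0),
    "hawq": (1, 2, 0),
}


def parse_version(text, width=3):
    """Turn '4.3.3.0' into (4, 3, 3) and '9.4' into (9, 4, 0)."""
    parts = [int(p) for p in text.split(".")[:width]]
    return tuple(parts + [0] * (width - len(parts)))


def supports_percentile_cont(platform, version_text):
    """Return True if quantiles can be computed on this platform version."""
    minimum = MIN_PERCENTILE_CONT_VERSION.get(platform)
    if minimum is None:
        return False
    return parse_version(version_text) >= minimum


pg94 = supports_percentile_cont("postgresql", "9.4")
pg93 = supports_percentile_cont("postgresql", "9.3.5")
```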
  2. @haying @iyerr3

    Random forest: Improve perf by eliminating join, add sample ratio

    haying authored iyerr3 committed
    Pivotal tracker: 86653930
    
    In random forest, we join the original source table with a Poisson count
    table as part of the bootstrap. This is a join between two big tables.
    Replacing that join by building a single temporary table gives about 20%
    speedup.
    
    Further, we provide a user option to run RF on only a random
    subsample of the dataset. The bootstrapping is then performed on that
    subset, which improves the runtime of the method.
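A minimal sketch of the idea above, assuming an in-memory list of rows: each row gets a Poisson(1) count in a single pass instead of being joined against a separate count table, and an optional sampling ratio restricts the bootstrap to a random subset. The function names and the `sample_ratio` parameter name are hypothetical, and the Poisson draw uses Knuth's multiplication method rather than whatever the database implementation uses.

```python
import math
import random


def poisson_draw(rng, lam=1.0):
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def bootstrap_with_counts(rows, sample_ratio=1.0, rng=None):
    """Build one table of (row, count) pairs in a single pass.

    The count acts as a weight, so duplicate rows are never materialized.
    sample_ratio is a hypothetical parameter: the fraction of rows that
    take part in the bootstrap at all.
    """
    rng = rng or random.Random()
    weighted = []
    for row in rows:
        if rng.random() >= sample_ratio:
            continue  # row is outside the random subsample
        count = poisson_draw(rng)
        if count > 0:
            weighted.append((row, count))
    return weighted


rows = list(range(1000))
weighted = bootstrap_with_counts(rows, sample_ratio=1.0,
                                 rng=random.Random(1))
total_weight = sum(count for _, count in weighted)
```

A tree trainer that accepts per-row weights could consume the `(row, count)` pairs directly, which mirrors the weights-as-counts approach mentioned in the log.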
Commits on Feb 6, 2015
  1. @adirastogi @haying

    Gaussian Naive Bayes: allow continuous variables

    adirastogi authored haying committed
    Additional author: Feng, Xixuan (Aaron) <xfeng@pivotal.io>
    
    Pivotal Tracker: #86332928
    JIRA: MADLIB-753
    
    Changes:
    - add install-check for correctness test on iris data
    - add toy dataset install-check testcases,
      install_test_4 and install_test_5
    - add overloaded SQL functions to support numeric/categorical variables
      - create_nb_prepared_data_tables
      - create_nb_classify_view
      - create_nb_probs_view
Commits on Feb 3, 2015
  1. @haying

    PG 9.4: adjust for changes in PostgreSQL 9.4

    haying authored
    Pivotal Tracker: #86341718
    
    Changes:
    - PG94 allows empty target_list in SELECT
    - numeric is converted to decimal in plpy
  2. @haying

    PCA: allow any column names for an input matrix

    haying authored
    Pivotal Tracker: #86821174
    
    Changes:
    - use 'row_id' for densified pca call
Commits on Jan 30, 2015
  1. @haying

    Deploy: support PostgreSQL 9.4

    haying authored
    Pivotal Tracker: #86341718
    
    Changes:
    - add necessary cmake files
    - fix return type of array_contains_null
Commits on Jan 21, 2015
  1. @iyerr3
  2. @haying
Commits on Jan 16, 2015
  1. PivotalR for Random Forest: Support get_tree function

    Preethi Jayaram authored
    Pivotal Tracker: #75480818
    
    Changes:
    - added a function to retrieve information on a particular
    tree in the forest, in the format required by R's randomForest
    library.
Commits on Jan 14, 2015
  1. @haying

    Cross-Validation: fix the parameter replacement

    haying authored
    Pivotal Tracker: #86045166
    JIRA: MADLIB-898
    
    Changes:
    - backup and restore the original modeling parameters
  2. @haying

    Cross-validation: fix train with NULL args

    haying authored
    Pivotal Tracker: #85937818
    JIRA: MADLIB-896
    
    Changes:
    - add checking for NULLs
  3. @iyerr3
Commits on Jan 6, 2015
  1. @iyerr3
Commits on Dec 29, 2014
  1. @iyerr3

    Build: Release notes for v1.7

    iyerr3 authored
  2. @haying @iyerr3

    Doc: clarify output_schema in assoc_rules()

    haying authored iyerr3 committed
    JIRA: MADLIB-878