Commits on Feb 24, 2016
  1. Add move message in readme

    iyerr3 committed Feb 24, 2016
Commits on Feb 23, 2016
  1. SVM: Update docs and online help

    JIRA: MADLIB-956
    - Moved epsilon and eps_table into reg_params section of model summary table
    - Online help for prediction using SVM model is available through svm_predict()
    - Examples are available as part of the online help, e.g., svm_regression('example')
    - Epsilon value checking (one test) is removed from install check
    Closes #19
    mktal committed Feb 23, 2016
Commits on Feb 22, 2016
  1. PCA: Add proportion of variance for train

    JIRA: MADLIB-948
    - Added a new functionality where the user can specify the proportion of
      variance to be covered by the principal components. This function
      accepts a float value (between 0 and 1) instead of an integer k value.
    - The interface has been updated with new parameter names to reflect the
    - The sparse and block variants of PCA have been updated to employ this
    - The proportion of variance covered by each principal component has
      been added to the output.
    - The implementation required splitting the SVD function into two parts
      and applying various levels of wrappers so that the general SVD
      interface does not change while giving PCA enough access to manipulate
      the intermediate tables.
    Closes #17
    orhankislal committed with iyerr3 Feb 17, 2016
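The selection rule described in this commit can be illustrated with a minimal sketch. It assumes the per-component variances (for example, eigenvalues of the covariance matrix) are already available; the function names components_for_variance and variance_proportions are hypothetical and are not part of the MADlib PCA interface.

```python
def variance_proportions(variances):
    """Return the fraction of total variance covered by each component."""
    total = float(sum(variances))
    return [v / total for v in variances]


def components_for_variance(variances, proportion):
    """Return the smallest k whose leading components cover `proportion`.

    `variances` holds one non-negative value per principal component
    (hypothetical input; the real module derives these from the SVD).
    """
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    ordered = sorted(variances, reverse=True)
    total = float(sum(ordered))
    covered = 0.0
    for k, value in enumerate(ordered, start=1):
        covered += value
        if covered / total >= proportion:
            return k
    # Floating-point round-off fallback: keep every component.
    return len(ordered)
```

Called as components_for_variance([4.0, 3.0, 2.0, 1.0], 0.6), the sketch keeps the two largest components, since together they cover 70% of the total variance.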
Commits on Feb 20, 2016
  1. Release 1.9alpha: Add release notes

    Closes #18
    orhankislal committed with iyerr3 Feb 20, 2016
Commits on Feb 12, 2016
  1. SVM: Ensure clean code and consistency

    Closes #10
    iyerr3 committed Feb 11, 2016
  2. SVM: Add Gaussian Kernel

    JIRA: MADLIB-937, MADLIB-958
    This implements explicit random feature mapping described by Rahimi and
    Recht. A few distributed random matrix generation utilities used by
    the transformation process are added to the MADlib Matrix module.
    The commit provides two forms of kernel transformation:
    - in-memory transformation, applied to problems where the
      resultant model fits in the Postgres 1GB limit
    - out-of-memory transformation, where the random transformation matrix
      is stored in a distributed table.
    mktal committed with iyerr3 Dec 9, 2015
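For readers unfamiliar with the random feature mapping of Rahimi and Recht, the following is a minimal in-memory sketch in plain Python. It assumes the kernel form exp(-gamma * ||x - y||^2); the names make_random_features and gaussian_kernel are hypothetical, and this is not the MADlib implementation, which generates the random matrix with its own distributed utilities.

```python
import math
import random


def gaussian_kernel(x, y, gamma):
    """Exact Gaussian kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def make_random_features(dim, n_features, gamma, seed=0):
    """Build an explicit feature map z such that z(x).z(y) ~ kernel(x, y).

    Rows of the random matrix are drawn from N(0, 2 * gamma * I) and the
    offsets from Uniform[0, 2 * pi], which matches the kernel form above.
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def transform(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform
```

A linear SVM trained on transform(x) then approximates a Gaussian-kernel SVM on x; the approximation improves as n_features grows.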
  3. Build: Remove doxypy to be compatible with Apache

    JIRA: MADLIB-962
    doxypy builds Doxygen-compatible documentation from Python
    docstrings and comment blocks. It is licensed under the GPL, which is
    incompatible with the MADlib Apache license. We need to find or write an
    alternative Python script to create the documentation.
    Closes #16
    iyerr3 committed Feb 12, 2016
  4. Release: Add Apache license headers and disclaimer

    JIRA: MADLIB-962
    - ASL headers are added to files created after the grant date.
    - RAT checks are modified to accommodate the new files.
    - DISCLAIMER is added as the release guide requires.
    orhankislal committed with iyerr3 Feb 10, 2016
Commits on Feb 10, 2016
  1. Term Freq: Allow custom col names, avoid temp vocab

    JIRA: MADLIB-933
    - Fixed a minor bug that forced users to use "doc_id" as a column name.
    - Fixed an incorrect temp table output for the vocabulary.
    iyerr3 committed Dec 7, 2015
  2. Remove swap file and outdated examples

    JIRA: MADLIB-961
    - Unnecessary vim swap file was removed and swap files were added to gitignore
    - Outdated example files referred to code that is either deprecated or
      deleted. Current examples can be found on each doc page.
    iyerr3 committed Feb 10, 2016
Commits on Jan 22, 2016
  1. Kmeans: Skip NULL feature values

    JIRA: MADLIB-946
    Closest column used to throw an exception if the matrix or the vector argument
    had any null values. It now returns Null() in this case. An additional check
    for null values was added to the compute_kmeans function to accommodate
    the change.
    orhankislal committed Jan 22, 2016
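The behavior change can be sketched in Python as follows. This is an illustration only: the actual closest-column routine and compute_kmeans live in MADlib's own code, and the helper assign_points is a hypothetical stand-in for the caller-side check.

```python
def closest_column(columns, vector):
    """Return (index, distance) of the nearest column, or None on NULL input.

    Python's None stands in for SQL NULL in this sketch.
    """
    if any(v is None for v in vector):
        return None
    if any(v is None for col in columns for v in col):
        return None
    best_index, best_dist = None, float("inf")
    for i, col in enumerate(columns):
        dist = sum((a - b) ** 2 for a, b in zip(col, vector)) ** 0.5
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist


def assign_points(points, centroids):
    """Assign each point to its nearest centroid, skipping NULL features."""
    assignments = []
    for point in points:
        result = closest_column(centroids, point)
        if result is None:
            continue  # the caller-side check: skip instead of failing
        assignments.append((point, result[0]))
    return assignments
```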
Commits on Jan 21, 2016
Commits on Jan 20, 2016
  1. Elastic Net: Check only if features are numeric

    JIRA: MADLIB-952
    Columns were being checked to ensure every column is of the same numeric
    type. While raising an error for non-numeric types is correct, there is
    no need to require a single numeric type, as the columns are
    cast to float8[] inside the function. The input analyzer has been
    changed to relax this condition.
    This closes #12.
    orhankislal committed with iyerr3 Jan 20, 2016
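A minimal sketch of the relaxed check is shown below. The set NUMERIC_TYPES and the function validate_numeric_columns are assumptions made for illustration; the type names follow what PostgreSQL's information_schema reports, and the real input analyzer may recognize a different set.

```python
# Assumed set of numeric type names; not taken from the MADlib source.
NUMERIC_TYPES = frozenset([
    "smallint", "integer", "bigint", "real", "double precision", "numeric",
])


def validate_numeric_columns(col_types):
    """Accept any mix of numeric types; reject non-numeric columns.

    `col_types` is a list of (column_name, type_name) pairs. Mixed numeric
    types are fine because the values are cast to float8[] later anyway.
    """
    bad = [(name, dtype) for name, dtype in col_types
           if dtype.lower() not in NUMERIC_TYPES]
    if bad:
        raise ValueError("non-numeric feature columns: %r" % (bad,))
    return [name for name, _ in col_types]
```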
  2. Minor: Update return value in correlation + matrix

    - In correlation, a special UDT was returned containing run time stats.
      This has been replaced by a TEXT that contains required information.
    - In matrix, the decomposition operations did not provide enough
      information about how to use suffixes to query the result tables. This
      has now been added.
    iyerr3 committed Jan 18, 2016
Commits on Jan 15, 2016
  1. Correlation: Return columns sorted in ordinal position

    JIRA: MADLIB-941
    A couple of minor issues are fixed here:
    1. The get_cols_and_types utility function returned columns
    in an arbitrary order due to dictionary creation at the end. This has
    been fixed by returning a list of tuples. An OrderedDict would be the
    best choice here, but some platforms are still on Python 2.6.
    2. Multiple modules that depended on this function had to be updated,
    either by creating the dict in the calling function or by using the list
    instead of a dict.
    iyerr3 committed Jan 15, 2016
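The ordering fix can be illustrated with a small sketch. The function cols_and_types_in_order and its input shape are hypothetical; the real utility queries the database catalog rather than taking rows as an argument.

```python
def cols_and_types_in_order(catalog_rows):
    """Return [(column_name, type_name), ...] sorted by ordinal position.

    `catalog_rows` is an iterable of (ordinal_position, name, type) tuples.
    A list of tuples keeps the order and works on Python 2.6, which has
    no collections.OrderedDict.
    """
    ordered = sorted(catalog_rows, key=lambda row: row[0])
    return [(name, dtype) for _, name, dtype in ordered]


rows = [(3, "c", "text"), (1, "a", "integer"), (2, "b", "double precision")]
pairs = cols_and_types_in_order(rows)

# Callers that only need lookups can still build a dict themselves.
lookup = dict(pairs)
```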
Commits on Jan 14, 2016
  1. Summary: Lower case for unquoted table names

    JIRA: MADLIB-954
    - Columns were being filtered by comparing all column names with the
      provided target names in Python. This led to issues when names were
      not quoted properly. This is now fixed by moving the compare to SQL.
    - Table validation performed using utility functions.
    - Minor PEP8 errors fixed.
    This closes #11
    orhankislal committed with iyerr3 Jan 13, 2016
Commits on Jan 8, 2016
  1. Utilities: Revert 60a07eb + use unquoted table names

    The previous commit (60a07eb) used pg_attribute instead of
    information_schema to get column names and types. This led
    to different type names, causing issues in the rest of the code. It's
    better to use information_schema since it gives general (non-Postgres
    specific) results, but it requires unquoted table and schema names as
    iyerr3 committed Jan 8, 2016
Commits on Dec 30, 2015
  1. Covariance: Add covariance matrix function (Pearson)

    JIRA: MADLIB-941
    Added new function covariance() which returns the covariance matrix. The
    implementation is an update to the Pearson's correlation method, where
    the scaling at the final step is avoided.
    iyerr3 committed Dec 30, 2015
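The relationship between the two functions can be sketched as follows: the covariance matrix is computed first, and Pearson's correlation is obtained by one extra scaling step. This is an illustrative in-memory version; the function names are hypothetical, and dividing by n (rather than n - 1) is an assumption, since the commit message does not say which normalization is used.

```python
def covariance_matrix(rows):
    """Covariance matrix of the columns of `rows` (list of equal-length lists)."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / float(n) for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / float(n)
             for j in range(d)]
            for i in range(d)]


def correlation_from_covariance(cov):
    """Apply the final scaling step that covariance() skips."""
    d = len(cov)
    std = [cov[i][i] ** 0.5 for i in range(d)]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(d)]
            for i in range(d)]
```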
Commits on Dec 23, 2015
  1. Matrix: updated docs and examples

    This closes #5, closes #8
    fmcquillan99 committed with iyerr3 Dec 1, 2015
  2. Path: Minor edit to doc

    iyerr3 committed Dec 23, 2015
Commits on Dec 21, 2015
  1. Path: Add default partition + fix output col order

    JIRA: MADLIB-916
    - NULL value for the partition leads to match on the whole table
    - Utilities function "get_cols" updated to return columns in the same
      order as present in table
    iyerr3 committed Dec 21, 2015
Commits on Dec 16, 2015
  1. Path: Match a pattern in a subset of partition

    JIRA: MADLIB-916
    This commit adds the functionality to match a subset of a partition
    instead of just the complete partition. This is performed by
    array-agging the symbols and then finding the position of a match within
    the agg. This position is then compared to an array of ids to get the
    actual rows that correlate to this match.
    This approach will only work if each symbol is a single character. Since
    we allow the user to set symbols to be arbitrary strings, the
    user-supplied symbol is replaced with a single character (in the row
    match and in the pattern expression).
    Other: Parent commit (30e9286) closes #9
    iyerr3 committed Dec 16, 2015
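The approach described in this commit can be sketched in Python. The function match_path is hypothetical and runs in memory, whereas the commit performs the aggregation in SQL. As a simplification, it assumes symbol names consist of word characters and that every word in the pattern is a defined symbol.

```python
import re
import string


def match_path(rows, symbols, pattern):
    """Return the row ids covered by each match of `pattern`.

    rows:    ordered list of (row_id, symbol_name) pairs for one partition
    symbols: user-defined symbol names, e.g. ["LOGIN", "VIEW", "BUY"]
    pattern: regular expression written in terms of symbol names
    """
    if len(symbols) > len(string.ascii_letters):
        raise ValueError("too many symbols for single-character encoding")
    # Replace each arbitrary symbol name with a single character so that
    # a position in the string equals a position in the list of ids.
    code = dict(zip(symbols, string.ascii_letters))
    ids = [row_id for row_id, _ in rows]
    encoded_rows = "".join(code[symbol] for _, symbol in rows)
    encoded_pattern = re.sub(r"\w+", lambda m: code[m.group(0)], pattern)
    matches = []
    for m in re.finditer(encoded_pattern, encoded_rows):
        if m.end() > m.start():
            matches.append(ids[m.start():m.end()])
    return matches
```

With rows VIEW, LOGIN, VIEW, VIEW, BUY, VIEW and the pattern (LOGIN)(VIEW)+(BUY), only the middle four rows are returned, which is a match on a subset of the partition.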
  2. SVM: Fix how grouping cols are validated

    SVM prediction expects grouping_col to be None if no grouping is
    performed. This assumption was not applied in the training function.
    Xiaocheng Tang committed with iyerr3 Dec 16, 2015
Commits on Dec 8, 2015
  1. Matrix: Fix minor issue with sparse LU output

    matrix_lu function used incorrect arguments in the sparsify operation
    leading to an error if a sparse output matrix was requested.
    This commit fixes that issue by using 'out_args' for column names for
    the dense and sparse output. The commit also cleans some repetitive code
    in the function.
    iyerr3 committed Dec 8, 2015
Commits on Dec 4, 2015
  1. Matrix: Fix multiple input/output issues

    JIRA: MADLIB-932
    This commit contains fixes for the following issues:
        - 'inf' or 'infinity' was read by Python as a float value. This has
          been fixed by checking for supported strings before checking for
        - matrix_inverse and matrix_pinv were not using out_args for output
          column names.
        - matrix_eigen did not accept an out_args parameter that determines the
          output column names.
        - Multiple methods did not provide a default value for out_args.
        - All decomposition methods had incorrect usage for in_args and
          out_args. These have been fixed with the output being sparse or
          dense (determined by the fmt specifier in out_args).
    rahiyer committed with iyerr3 Dec 3, 2015
  2. SVM: Allow grouping_col to be empty

    This closes #6
    - Fix "verbose" KeyError in in_mem_group_control
    - SVM: explicitly set grouping_col to None so that
    grouping_col can be empty string or NULL.
    mktal committed with iyerr3 Dec 4, 2015
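The normalization described here can be sketched as a small helper. The name normalize_grouping_col is hypothetical, and treating whitespace-only strings as empty is an assumption not stated in the commit.

```python
def normalize_grouping_col(grouping_col):
    """Map NULL (None) and empty strings to None; keep real column lists.

    Training and prediction can then both test `grouping_col is None`.
    """
    if grouping_col is None:
        return None
    stripped = grouping_col.strip()
    return stripped if stripped else None
```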
Commits on Dec 1, 2015
  1. SVM: Add CV support with generic class

    JIRA: MADLIB-915
    PR: Closes #4
        Xiaocheng Tang
        Rahul Iyer
    - Add cross validation support on lambda, epsilon, init_stepsize,
    max_iter, and decay_factor
    - Add support for optionally writing validation results to a sql table
    - Add support for lazy-generation of cv datasets
    - Add internal generic CrossValidator class which is used for implementing
    this issue
    - Refactoring SVM for better modularity
    - Ignore cv on epsilon for classification
    - Cross validation now works when independent variables are queries
    - Fixed "zero length field name in format" error in python < 2.7
    Xiaocheng Tang committed with iyerr3 Dec 1, 2015
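The idea of lazily generated cross-validation datasets combined with a parameter search can be sketched as follows. The names k_fold_splits and cross_validate, and the score_fn callback, are hypothetical; this is not the interface of the internal CrossValidator class, and the real module works on database tables rather than index lists.

```python
def k_fold_splits(n_rows, n_folds):
    """Lazily yield (train_indices, validation_indices), one fold at a time."""
    for fold in range(n_folds):
        validation = [i for i in range(n_rows) if i % n_folds == fold]
        train = [i for i in range(n_rows) if i % n_folds != fold]
        yield train, validation


def cross_validate(score_fn, param_grid, n_rows, n_folds):
    """Return (best, results), where lower mean validation score is better.

    score_fn(params, train, validation) is a caller-supplied function that
    trains on `train` and returns an error measured on `validation`.
    """
    results = []
    for params in param_grid:
        scores = [score_fn(params, train, validation)
                  for train, validation in k_fold_splits(n_rows, n_folds)]
        results.append((params, sum(scores) / float(len(scores))))
    best = min(results, key=lambda item: item[1])
    return best, results
```

Because k_fold_splits is a generator, each fold is only materialized when it is needed, and the results list plays the role of the optional validation output table.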
Commits on Nov 19, 2015
  1. New module: Add basic path capabilities

    JIRA: MADLIB-916
    This fixes MADLIB-916.
    Added a function to find paths where the complete partition is matched.
    This commit does not add capability to produce multiple matches per
    iyerr3 committed Nov 19, 2015