Open-source library for scalable in-database analytics.

Matrix ops: Support id of any type that can cast to int

Pivotal tracker: 96013656

Changes:
    - Add casts in matrix ops, SVD, PCA to allow row_id/col_id of
      any type that can be cast to int.
    - Improve performance in dense multiplication by transposing the
      second matrix only if required.

After this change, the row_id for a dense matrix and the row_id and
col_id for a sparse matrix can be of any type that can be cast to
integer. For some operations (like matrix_densify), the id is cast to
type integer. Further, in some cases, we check that the size of the
row/col dimension is within the range for integer.
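
The cast and range check described above can be sketched as follows. This is a minimal illustration, not MADlib's actual code: the helper names `cast_id_to_int` and `check_dimension` are hypothetical, and "integer" is assumed to mean a 32-bit signed integer, as in PostgreSQL.

```python
# Hypothetical helpers, not part of MADlib: they only illustrate casting an
# id to an integer and checking that a value fits in the integer range.
# Assumption: "integer" is a 32-bit signed integer, as in PostgreSQL.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def cast_id_to_int(value):
    """Cast a row/col id to int, rejecting values outside the 32-bit range."""
    as_int = int(value)  # raises ValueError/TypeError if no cast is possible
    if not INT32_MIN <= as_int <= INT32_MAX:
        raise OverflowError("id %r is outside the 32-bit integer range" % (value,))
    return as_int


def check_dimension(dim):
    """Ensure a row/col dimension is positive and fits in a 32-bit integer."""
    as_int = cast_id_to_int(dim)
    if as_int < 1:
        raise ValueError("dimension must be positive, got %r" % (dim,))
    return as_int
```

For example, `cast_id_to_int("7")` returns the integer 7, while `cast_id_to_int(2**31)` raises `OverflowError` because the value does not fit in the assumed 32-bit range.
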
Latest commit f3b235232b, authored by iyerr3 (@iyerr3)

README.md

MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.

Installation and Contribution

See the project webpage, MADlib Home, for links to the latest binary and source packages. For installation and contribution guides, please see the MADlib Wiki.

User and Developer Documentation

The latest documentation of MADlib modules can be found at MADlib Docs or can be accessed directly from the MADlib installation directory by opening doc/user/html/index.html.

Architecture

The following block-diagram gives a high-level overview of MADlib's architecture.

MADlib Architecture

Third Party Components

MADlib incorporates material from the following third-party components:

  1. argparse 1.2.1 "provides an easy, declarative interface for creating command line tools"
  2. Boost 1.47.0 (or newer) "provides peer-reviewed portable C++ source libraries"
  3. CERN ROOT "is an object oriented framework for large scale data analysis"
  4. doxypy 0.4.2 "is an input filter for Doxygen"
  5. Eigen 3.2.2 "is a C++ template library for linear algebra"
  6. PyYAML 3.10 "is a YAML parser and emitter for Python"
  7. PyXB 1.2.4 "is a Python library for XML Schema Bindings"

Licensing

License information regarding MADlib and included third-party libraries can be found inside the license directory.

Release Notes

Changes between MADlib versions are described in the ReleaseNotes.txt file.

Papers and Talks

Related Software

  • PivotalR - PivotalR lets the user run the functions of the open-source big-data machine learning package MADlib directly from R.
  • PyMADlib - PyMADlib is a Python wrapper for MADlib, which brings you the power and flexibility of Python with the number-crunching power of MADlib.