MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.
Installation and Contribution
User and Developer Documentation
The latest documentation of MADlib modules can be found at
Docs or can be accessed directly from the MADlib
installation directory by opening
The following block diagram gives a high-level overview of MADlib's architecture.
Third Party Components
MADlib incorporates material from the following third-party components:
argparse 1.2.1 "provides an easy, declarative interface for creating command line tools"
Boost 1.47.0 (or newer) "provides peer-reviewed portable C++ source libraries"
CERN ROOT "is an object oriented framework for large scale data analysis"
doxypy 0.4.2 "is an input filter for Doxygen"
Eigen 3.2.2 "is a C++ template library for linear algebra"
PyYAML 3.10 "is a YAML parser and emitter for Python"
PyXB 1.2.4 "is a Python library for XML Schema Bindings"
License information regarding MADlib and included third-party libraries can be
found inside the
Changes between MADlib versions are described in the
Papers and Talks
MAD Skills: New Analysis Practices for Big Data (VLDB 2009)
Hybrid In-Database Inference for Declarative Information Extraction (SIGMOD 2011)
Towards a Unified Architecture for In-Database Analytics (SIGMOD 2012)
The MADlib Analytics Library or MAD Skills, the SQL (VLDB 2012)