Modular, extensible implementation of the Random Forest (TM) algorithm.

Arborist: Parallelized, Extensible Random Forests

The Arborist provides a fast, open-source implementation of Leo Breiman's Random Forest algorithm. The Arborist achieves its speed through efficient C++ code and parallel, distributed tree construction.

Bindings are available for R and Python (the Python binding is still evolving).

R

The Arborist is available on CRAN as the Rborist package.

Installation of Release Version:

> install.packages('Rborist')

Installation of Development Version:

> ./ArboristBridgeR/Package/Rborist.CRAN.sh
> R CMD INSTALL Rborist_*.*-*.tar.gz

Notes

  • Rborist version 0.1-8 now on CRAN.
  • Rborist version 0.1-9 under development.

Python

  • Some users report successful training and prediction.
  • Out-of-bag validation is not yet implemented.
  • Contributors sought.
  • Test cases sought.

Performance

Performance metrics will be measured soon using benchm-ml. Partial results can be found here

References

News/Changes

  • New option 'maxLeaf' to bound the number of terminal (leaf) nodes.
  • New option 'autoCompress' to fine-tune sparse internal representation.
  • New function 'Streamline' to clear unused fields of trained forest.
  • Sparse R-style 'dgCMatrix' format now accepted, with "i/p" encoding.
  • Autocompression employed on a per-predictor basis.
  • Space-saving 'thinLeaves' option suppresses creation of summary data for narrow workflows.
  • Research-inspired option 'splitQuantile' provides fine-tuning of numeric split positions.
  • Improved scaling with row count.
  • Correctness errors are being addressed as they are received.
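
For readers unfamiliar with the "i/p" encoding mentioned in the list above: in the column-compressed sparse class of R's Matrix package ('dgCMatrix'), the slot `i` holds the zero-based row index of each nonzero entry, `p` holds the offsets at which each column begins, and `x` holds the values. The sketch below illustrates that layout in plain Python. The helper names `to_csc` and `column` are hypothetical and are not part of the Arborist API; this is an illustration of the format only, not of how the Arborist reads it internally.

```python
def to_csc(dense):
    """Encode a dense matrix (list of rows) in column-compressed form.

    Returns (i, p, x), mirroring the slot names of a column-compressed
    sparse matrix:
      i: zero-based row index of each nonzero, in column-major order
      p: column pointers; column c occupies positions p[c] .. p[c+1]-1
      x: the nonzero values, in the same order as i
    Hypothetical helper for illustration only.
    """
    n_row = len(dense)
    n_col = len(dense[0]) if n_row else 0
    i, p, x = [], [0], []
    for col in range(n_col):
        for row in range(n_row):
            value = dense[row][col]
            if value != 0:
                i.append(row)
                x.append(value)
        # Close the column: record where the next column starts.
        p.append(len(i))
    return i, p, x


def column(i, p, x, col, n_row):
    """Rebuild one dense column from the (i, p, x) encoding."""
    out = [0.0] * n_row
    for k in range(p[col], p[col + 1]):
        out[i[k]] = x[k]
    return out


dense = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
    [3.0, 0.0, 4.0],
]
i, p, x = to_csc(dense)
print(i)  # row indices of the nonzeros
print(p)  # column pointers; equal neighbours mark an empty column
print(x)  # nonzero values
```

An all-zero column costs nothing beyond one repeated pointer in `p`, so this layout suits predictors that are mostly zero.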