Arborist: Parallelized, Extensible Random Forests
The Arborist provides a fast, open-source implementation of Leo Breiman's Random Forest algorithm. The Arborist achieves its speed through efficient C++ code and parallel, distributed tree construction.
Bindings are available for R and Python (evolving).
The Arborist is available on CRAN as the Rborist package.
Installation of Release Version:
> install.packages('Rborist')
Installation of Development Version:
> ./ArboristBridgeR/Package/Rborist.CRAN.sh
> R CMD INSTALL Rborist_*.*-*.tar.gz
- Rborist version 0.1-8 now on CRAN.
- Rborist version 0.1-9 under development.
- Some users report successful training and prediction.
- Out-of-bag validation not yet implemented.
- Contributors sought.
- Test cases sought.
- Scalability Issues in Training Decision Trees (video), Nimbix Developer Summit, 2017.
- Controlling for Monotonicity in Random Forest Regressors (PDF), R in Finance, May 2016.
- Accelerating the Random Forest algorithm for commodity parallel hardware (video), PyData, July 2015.
- The Arborist: A High-Performance Random Forest (TM) Implementation, R in Finance, May 2015.
- Training Random Forests on the GPU: Tree Unrolling (PDF), GTC, March 2015.
- New option 'maxLeaf' to bound the number of terminal (leaf) nodes.
- New option 'autoCompress' to fine-tune sparse internal representation.
- New function 'Streamline' to clear unused fields of trained forest.
- Sparse R-style 'dgCMatrix' format now accepted, with "i/p" encoding.
- Autocompression employed on a per-predictor basis.
- Space-saving 'thinLeaves' option suppresses creation of summary data for narrow workflows.
- Research-inspired option 'splitQuantile' provides fine-tuning of numeric split positions.
- Improved scaling with row count.

Correctness errors are being addressed as they are received.
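
For readers unfamiliar with the "i/p" encoding mentioned in the sparse-format item above, the following is a minimal sketch of column-compressed storage as used by the R Matrix package's 'dgCMatrix' class: `i` holds the zero-based row index of each nonzero, `x` the matching values, and `p` the offsets at which each column starts. It uses plain Python lists and is not the Arborist's own code; the function names `dense_to_ip` and `column` are hypothetical and chosen only for illustration.

```python
def dense_to_ip(dense):
    """Encode a dense, row-major matrix in column-compressed "i/p" form.

    Returns (i, p, x): i holds the zero-based row index of every nonzero,
    x the corresponding values, and p[j]:p[j + 1] is the slice of i and x
    that belongs to column j.
    """
    n_row = len(dense)
    n_col = len(dense[0]) if n_row else 0
    i, p, x = [], [0], []
    for col in range(n_col):
        for row in range(n_row):
            value = dense[row][col]
            if value != 0:
                i.append(row)
                x.append(value)
        p.append(len(i))  # running nonzero count marks the end of this column
    return i, p, x


def column(i, p, x, col, n_row):
    """Recover one dense column from the i/p/x encoding."""
    out = [0] * n_row
    for k in range(p[col], p[col + 1]):
        out[i[k]] = x[k]
    return out


# Hypothetical 3x3 example with an all-zero middle column.
dense = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
    [3.0, 0.0, 4.0],
]
i, p, x = dense_to_ip(dense)
```

An empty column shows up as two equal consecutive entries in `p`, so columns with few nonzeros cost very little storage.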