Arborist: Parallelized, Extensible Random Forests


The Arborist provides a fast, open-source implementation of Leo Breiman's Random Forest algorithm. The Arborist achieves its speed through efficient C++ code and parallel, distributed tree construction.

Bindings are available for R and, in evolving form, for Python.

R

The Arborist is available on CRAN as the Rborist package.

Installation of Release Version:

> install.packages('Rborist')

Installation of Development Version:

> ./Rborist/Package/Rborist.CRAN.sh
> R CMD INSTALL Rborist_*.*-*.tar.gz

Notes

  • Rborist version 0.2-3 passes all checks on CRAN.

Python

  • Version 0.1-0 has been archived.
  • Version 0.2-4 is under active development.
  • Test cases sought.

Performance

Performance metrics have been measured using benchm-ml. Partial results can be found here

Some users have reported diminished performance when running single-threaded. We recommend running with at least two cores, as frequently executed inner loops have been written specifically to take advantage of multiple cores. In particular, when using a scaffold such as caret, please allow Rborist to claim more cores than the scaffold does.

This paper compares several implementations of the Random Forest algorithm, including Rborist: (https://www.jstatsoft.org/article/view/v077i01/v77i01.pdf). Benchmarks used in the study are also provided at https://www.jstatsoft.org/article/view/v077i01.

A recent paper compares several categories of regression tools, including Random Forests. Rborist is among the faster packages offering high prediction accuracy: (https://doi.org/10.1109/ACCESS.2019.2933261). Based on the findings, we are investigating changes to the package's default settings. In particular, fixed-number predictor sampling (mtry) appears to provide more accurate predictions at low dimension than the current approach of Bernoulli sampling.
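The difference between the two sampling schemes can be sketched in a few lines of Python. This is an illustration only, not the package's implementation: the function names `sample_bernoulli` and `sample_fixed` are hypothetical, and the Bernoulli probability is simply assumed to be `mtry / n_pred` so that both schemes select the same number of predictors on average.

```python
import random

def sample_bernoulli(n_pred, prob, rng):
    """Each predictor becomes a split candidate independently with probability `prob`."""
    return [j for j in range(n_pred) if rng.random() < prob]

def sample_fixed(n_pred, mtry, rng):
    """Exactly `mtry` distinct predictors become split candidates."""
    return sorted(rng.sample(range(n_pred), mtry))

rng = random.Random(0)          # fixed seed so the sketch is repeatable
n_pred, mtry = 4, 2             # a deliberately low-dimensional example
trials = 10_000

# Record how many candidates each scheme yields per draw.
bern_sizes = [len(sample_bernoulli(n_pred, mtry / n_pred, rng)) for _ in range(trials)]
fixed_sizes = [len(sample_fixed(n_pred, mtry, rng)) for _ in range(trials)]

# Fraction of Bernoulli draws that select no predictor at all.
empty_frac = sum(size == 0 for size in bern_sizes) / trials
mean_bern = sum(bern_sizes) / trials

print("fixed sizes seen:", sorted(set(fixed_sizes)))
print("Bernoulli sizes seen:", sorted(set(bern_sizes)))
print("mean Bernoulli size:", mean_bern, "empty fraction:", empty_frac)
```

With few predictors, the Bernoulli candidate count varies widely from draw to draw and can even be zero, whereas the fixed-number scheme always yields exactly `mtry` candidates. Whether this accounts for the accuracy difference reported in the paper is not established here.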

References

News/Changes

  • Prediction and validation support large (> 32 bits) observation counts.
  • Trained forest index ranges may now exceed 32 bits. Index ranges for individual trees remain constrained to 32 bits, for now.
  • New option 'keyed' identifies predictors by name, rather than position within frame.
  • New option 'impPermute' introduces permutation-based variable importance.
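For readers unfamiliar with permutation-based variable importance, the general technique can be sketched as follows. This is a generic, self-contained illustration and does not reproduce how 'impPermute' is implemented or invoked: the helper names are hypothetical, and a simple closed-form `predict` function stands in for a trained forest.

```python
import random

def mse(predict, rows, y):
    """Mean squared error of `predict` over the given rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

def permutation_importance(predict, rows, y, col, rng):
    """Increase in error after shuffling the values of one predictor column."""
    baseline = mse(predict, rows, y)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    # Rebuild the rows with only column `col` permuted; other columns are untouched.
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    return mse(predict, permuted, y) - baseline

rng = random.Random(1)
rows = [[rng.random(), rng.random()] for _ in range(200)]
y = [3.0 * r[0] for r in rows]        # response depends on column 0 only

def predict(r):
    """Stand-in for a trained model (assumption: it has learned the true relation)."""
    return 3.0 * r[0]

imp = [permutation_importance(predict, rows, y, c, rng) for c in range(2)]
print(imp)
```

Shuffling the informative column raises the error, so its importance is positive; shuffling the irrelevant column leaves predictions unchanged, so its importance is zero.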

Correctness and runtime errors are addressed as received. With reproducible test cases, repairs are typically uploaded to GitHub within several days.

Feature requests are addressed on a case-by-case basis.
