One Class Splitting Criteria for Random Forests ============

This repository provides the code for the article One Class Splitting Criteria for Random Forests, along with other anomaly detection algorithms.
Abstract =======
Random Forests (RFs) are strong machine learning tools for classification and regression. However, they remain supervised algorithms, and no extension of RFs to the one-class setting has been proposed, except for techniques based on second-class sampling. This work fills this gap by proposing a natural methodology to extend standard splitting criteria to the one-class setting, structurally generalizing RFs to one-class classification. An extensive benchmark of seven state-of-the-art anomaly detection algorithms is also presented. This empirically demonstrates the relevance of our approach.
The implementation is based on a fork of scikit-learn. To keep a working copy of both the original scikit-learn and the OCRF fork, use Conda to create a virtual environment dedicated to OCRF. The original scikit-learn installation then stays untouched.
This package uses distutils, which is the default way of installing Python modules. To build the package in place, inside the source folder, use:
python setup.py build_ext --inplace
and run your own code from inside the OCRF folder. To use OCRF from outside the OCRF folder, add that folder to the PYTHONPATH environment variable or create a virtual environment with Conda.
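As an alternative to editing PYTHONPATH, a script can extend the module search path itself before importing the package. The sketch below is only an illustration: `add_source_folder` is a hypothetical helper, not part of OCRF, and the `~/OCRF` location is an assumed clone path that you should adapt to your setup.

```python
import os
import sys


def add_source_folder(folder, path_list=None):
    """Prepend `folder` to the module search path if it is not already there.

    Returns the absolute path that was added (or was already present).
    """
    if path_list is None:
        path_list = sys.path
    folder = os.path.abspath(folder)
    if folder not in path_list:
        # Insert at the front so this folder takes precedence over
        # any other installed copy of the same package.
        path_list.insert(0, folder)
    return folder


# Assumed location of the cloned repository; adjust to your setup.
ocrf_folder = add_source_folder(os.path.join(os.path.expanduser("~"), "OCRF"))
```

Because the folder is placed first in the search path, an in-place build there is found before any other installed copy, which is the same effect as putting the folder at the front of PYTHONPATH.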
Install with Conda =======
First install Conda and update it:
conda update conda
conda update --all
Then create a virtual environment for OCRF, activate it, and install OCRF and its dependencies in the new environment:
conda create -n OCRF_env python=2.7 anaconda
source activate OCRF_env
conda install -n OCRF_env numpy scipy cython matplotlib
git clone https://github.com/ngoix/OCRF
cd OCRF
pip install --upgrade pip
pip install pyper
python setup.py install
cd ..
OCRF is now installed. To check the installation, run the script benchmark_oneclassrf.py:
python benchmarks/benchmark_oneclassrf.py
To quit the environment and revert to the original scikit-learn use:
source deactivate
To return to the OCRF environment use:
source activate OCRF_env
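When switching between environments, it can help to confirm which copy of scikit-learn Python actually picks up. The following is a minimal sketch, assuming the fork installs under the usual `sklearn` package name; `describe_package` is a hypothetical helper and not part of OCRF or scikit-learn.

```python
from __future__ import print_function

import importlib


def describe_package(name):
    """Return (version, file location) for an importable package, else None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        # The package is not available in the active environment.
        return None
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", None)
    return version, location


# Prints None if scikit-learn is not installed in the active environment.
print(describe_package("sklearn"))
```

Inside OCRF_env the reported location should point into that environment's folder. After deactivating, it should point to the original installation.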
scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.
The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the AUTHORS.rst file for a complete list of contributors.
It is currently maintained by a team of volunteers.
Note: scikit-learn was previously referred to as scikits.learn.