ksanu1998/pidforest

 
 

PIDForest

Code for the PIDForest algorithm for anomaly detection.

The PIDForest algorithm is based on the Partial Identification framework for anomaly detection. Partial Identification captures the intuition that anomalies can be distinguished from the overwhelming majority of points using relatively few attribute values. PIDScore is a geometric anomaly score based on this framework: it measures the minimum density of data points over all subcubes containing a given point. PIDForest is a random-forest-based algorithm that finds anomalies using PIDScore.
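To make the "minimum density over all subcubes" idea concrete, here is a minimal brute-force sketch. It is not the PIDForest implementation from this repository: the function name `sparsest_box`, the uniform grid of candidate edges, and the `n_edges` parameter are assumptions made for illustration only. The sketch also assumes every feature has a nonzero range and that the query point lies inside the data's bounding box.

```python
import itertools

import numpy as np


def sparsest_box(data, x, n_edges=9):
    """Brute-force search for the lowest-density axis-aligned box containing x.

    Illustrative only: candidate boxes are restricted to a uniform grid.
    Returns (density, box), where box is a list of (low, high) pairs,
    one pair per feature.
    """
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)

    # For each feature, list the grid intervals [a, b] that contain x.
    per_dim = []
    for d in range(data.shape[1]):
        edges = np.linspace(lo[d], hi[d], n_edges)
        per_dim.append(
            [(a, b) for a, b in itertools.combinations(edges, 2) if a <= x[d] <= b]
        )

    best_density, best_box = np.inf, None
    # Every combination of per-feature intervals is one candidate box.
    for box in itertools.product(*per_dim):
        lows = np.array([a for a, _ in box])
        highs = np.array([b for _, b in box])
        inside = np.all((data >= lows) & (data <= highs), axis=1)
        density = inside.sum() / np.prod(highs - lows)  # points per unit volume
        if density < best_density:
            best_density, best_box = density, list(box)
    return best_density, best_box


# Toy data: a tight 5x5 cluster near the origin plus one far-away point.
cluster = [[0.1 * i, 0.1 * j] for i in range(5) for j in range(5)]
data = np.array(cluster + [[10.0, 10.0]])

outlier_density, outlier_box = sparsest_box(data, data[-1])
inlier_density, inlier_box = sparsest_box(data, data[0])
print(outlier_density < inlier_density)  # compares the isolated point to a cluster point
```

The exhaustive search grows exponentially with the number of features, so it is only practical on toy data. The returned box lists one range per feature for the sparsest region found around the point.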

The accompanying paper shows that PIDForest performs favorably in comparison to several popular anomaly detection methods across a broad range of benchmarks. PIDForest also provides a succinct explanation for why a point is labelled anomalous: it reports a set of features, and value ranges for them, that are relatively uncommon in the dataset.

The associated data files, in .mat format, are also attached. Many of these datasets come with their own citation requests; please follow them if you use the data in your research.
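A hedged sketch of reading one of the .mat files with SciPy follows. The helper `load_mat_dataset` is hypothetical, and the variable names `X` (features) and `y` (labels) are an assumption about how the files are organised, so inspect the keys of the loaded dictionary before relying on them. The demo writes and reads a small synthetic file, which keeps it runnable without any of the real datasets.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_mat_dataset(path, x_key="X", y_key="y"):
    """Load a feature matrix and label vector from a MATLAB .mat file.

    The key names are assumptions; check list(loadmat(path).keys()) first.
    """
    mat = loadmat(path)
    X = np.asarray(mat[x_key], dtype=float)
    y = np.asarray(mat[y_key]).ravel().astype(int)  # flatten labels to shape (n,)
    return X, y


# Round-trip demo on a synthetic file (not one of the repository's datasets).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.mat")
    savemat(demo_path, {"X": np.arange(12.0).reshape(6, 2),
                        "y": np.array([0, 0, 0, 0, 0, 1])})
    X, y = load_mat_dataset(demo_path)

print(X.shape, y.shape)
```

Note that `scipy.io.loadmat` does not read MATLAB v7.3 files, which are HDF5-based; such files need an HDF5 reader such as `h5py` instead.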

The current implementation is in Python. We are working on releasing a much faster C++ implementation soon.

NOTE: The C++ code is still under development. Many of the existing classes defined in the Python scripts have been re-implemented in C++. Current work focuses on building the tree reliably and on developing a Python interface for calling these C++ classes.

Languages

  • Jupyter Notebook 56.7%
  • Python 26.6%
  • C++ 16.4%
  • Other 0.3%