# CLX LODA Anomaly Detection

This is an introduction to CLX LODA Anomaly Detection.

## Introduction

Anomaly detection is an important problem that has been studied across many research areas and application domains. Some anomaly detection algorithms are generic, while many are developed specifically for a particular domain. In practice, several ensemble-based algorithms have shown superior performance on many benchmark datasets, among them Isolation Forest, the Lightweight Online Detector of Anomalies (LODA), and ensembles of Gaussian mixture models.

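LODA is one of the better-performing generic anomaly detection algorithms. It projects each data point onto a set of sparse random vectors, builds a one-dimensional histogram for each projection, and scores a point by its likelihood under this ensemble of histograms. Points that fall into low-density bins are likely anomalies.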
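The idea can be illustrated with a small CPU sketch in NumPy. It is not the CLX implementation: the helpers `loda_fit` and `loda_score` are hypothetical, and the sketch assumes sparse Gaussian projection vectors with about √d nonzero entries (following the original LODA paper), a fixed number of equal-width bins, and a score equal to the negative mean log-density across the histograms.

```python
import numpy as np


def loda_fit(x, n_projections=100, n_bins=10, seed=0):
    """Build sparse random projections and one histogram per projection (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    n_nonzero = max(1, int(np.sqrt(n_features)))  # ~sqrt(d) active features per projection
    projections = rng.standard_normal((n_projections, n_features))
    for w in projections:  # each row is a view, so zeroing it edits `projections` in place
        zero_idx = rng.choice(n_features, n_features - n_nonzero, replace=False)
        w[zero_idx] = 0.0
    projected = x @ projections.T  # shape (n_samples, n_projections)
    histograms = [
        np.histogram(projected[:, k], bins=n_bins, density=True)
        for k in range(n_projections)
    ]
    return projections, histograms


def loda_score(x, projections, histograms, eps=1e-12):
    """Negative mean log-density over all histograms; larger means more anomalous."""
    projected = x @ projections.T
    log_density = np.empty_like(projected)
    for k, (density, edges) in enumerate(histograms):
        z = projected[:, k]
        # Locate the bin of each projected value, clamping the right edge into the last bin
        idx = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, len(density) - 1)
        inside = (z >= edges[0]) & (z <= edges[-1])
        p = np.where(inside, density[idx], 0.0)  # zero density outside the fitted range
        log_density[:, k] = np.log(p + eps)
    return -log_density.mean(axis=1)


# Usage: fit on standard-normal data, then score the origin and a distant point
rng = np.random.default_rng(1)
train = rng.standard_normal((1000, 5))
projections, histograms = loda_fit(train)
demo_scores = loda_score(np.array([[0.0] * 5, [8.0] * 5]), projections, histograms)
print(demo_scores)  # the distant point should receive the larger score
```

The small `eps` keeps the logarithm finite for points that land in empty bins or outside every histogram, which is exactly where the largest scores come from.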

## How to train LODA Anomaly Detection model

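First, initialize a new model. The `n_random_cuts` argument sets the number of random projections, and therefore histograms, in the ensemble; `n_bins=None` leaves the number of histogram bins for the library to determine.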

```python
from clx.analytics.loda import Loda

loda_ad = Loda(n_bins=None, n_random_cuts=100)
```

Next, train the LODA anomaly detector. The example below uses a small synthetic 5-dimensional dataset (100 samples drawn from a standard normal distribution) for demonstration only; in practice you will want a much larger training set.

```python
import cupy as cp

# 100 samples with 5 features, drawn from a standard normal distribution on the GPU
x = cp.random.randn(100, 5)
loda_ad.fit(x)
```
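Because the standard-normal sample above contains no deliberately anomalous points, it says little about detection quality. One option, sketched here in NumPy with a hypothetical helper `make_dataset_with_outliers`, is to append a few far-away points with known labels so the scores can be checked against them later. The sizes and the `shift` value are arbitrary assumptions.

```python
import numpy as np


def make_dataset_with_outliers(n_inliers=1000, n_outliers=10, n_features=5, shift=6.0, seed=0):
    """Return (data, labels) where label 1 marks a planted outlier (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    inliers = rng.standard_normal((n_inliers, n_features))
    # Outliers: standard-normal noise moved `shift` units away along every axis
    outliers = rng.standard_normal((n_outliers, n_features)) + shift
    data = np.vstack([inliers, outliers])
    labels = np.concatenate([np.zeros(n_inliers, dtype=int), np.ones(n_outliers, dtype=int)])
    order = rng.permutation(len(data))  # shuffle so the outliers are not all at the end
    return data[order], labels[order]


data, labels = make_dataset_with_outliers()
print(data.shape, labels.sum())  # (1010, 5) 10
```

To train the CLX model on such data, move the array to the GPU first, for example with `cp.asarray(data)`, and keep `labels` on the host for evaluation.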

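## Evaluate the detector

Score the samples with the trained model. The `score` method returns negative log-likelihood (NLL) scores, so a larger score indicates a less likely, more anomalous point.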

```python
score = loda_ad.score(x)  # negative log-likelihood (NLL) score per sample
print(score)
```

```
[0.02504323 0.04019007 0.02562371 0.03061367 0.04752786 0.04289815
 0.02754808 0.04728209 0.03528147 0.03084071 0.04800468 0.03810805
 0.06416966 0.03079894 0.03384362 0.02584911 0.03283915 0.04372543
 0.02679274 0.03050299 0.03536936 0.02609486 0.03020831 0.02585407
 0.03935717 0.03466724 0.02478028 0.03598875 0.02926487 0.04528794
 0.03143523 0.02618153 0.02640285 0.04027725 0.05122316 0.02747226
 0.02948695 0.0377711  0.03794401 0.0341601  0.03190703 0.04000252
 0.02768167 0.03384439 0.03125174 0.04467097 0.05840884 0.05366506
 0.02665807 0.02636387 0.02587865 0.0507532  0.02846542 0.02756456
 0.0322691  0.0435649  0.03847129 0.02806225 0.02990485 0.03676192
 0.02981161 0.04271214 0.03094535 0.04040348 0.03780686 0.04097244
 0.02685846 0.04776255 0.03761849 0.04382971 0.02780935 0.02530128
 0.02547234 0.02792817 0.02624567 0.03750734 0.02563193 0.03189411
 0.02779641 0.03145703 0.03154918 0.04150905 0.02863377 0.03065881
 0.03108079 0.02714798 0.04061623 0.0379528  0.04409691 0.0299
```
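The scores only rank the samples; flagging anomalies still needs a cutoff. A common choice, sketched below in NumPy, is to flag the highest-scoring fraction of samples, assuming (as the NLL comment indicates) that larger scores mean less likely points. The helper name `flag_top_fraction` and the fraction are illustrative assumptions. For the CuPy array above, `cp.asnumpy(score)` gives a NumPy copy; alternatively, the same logic can be written with `cp.percentile`.

```python
import numpy as np


def flag_top_fraction(scores, fraction=0.05):
    """Return a boolean mask marking the `fraction` highest-scoring samples as anomalies."""
    scores = np.asarray(scores)
    threshold = np.percentile(scores, 100 * (1 - fraction))
    return scores >= threshold


# A few of the scores printed above
sample_scores = np.array([0.02504323, 0.04019007, 0.06416966, 0.02562371, 0.05840884])
mask = flag_top_fraction(sample_scores, fraction=0.2)
print(np.flatnonzero(mask))  # → [2]
```

A fixed fraction is simple but presumes a known contamination rate; with labelled data, such as planted outliers, the cutoff can instead be tuned against precision and recall.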

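## Explanation of anomalies

To explain why a point is anomalous, LODA uses the contribution of each feature across the histograms. Because each random projection involves only a subset of the features, comparing the scores from projections that use a given feature with the scores from projections that do not shows how strongly that feature drives the anomaly.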
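The original LODA paper makes this concrete with a two-sample t-test: for each feature it compares the per-projection anomaly scores (negative log-densities) of the projections that use the feature against those that do not, and a large positive t statistic marks a feature that pushes the point toward being anomalous. The NumPy sketch below follows that recipe. It is an illustration rather than the CLX code; the helper name `explain_point` and the final min-max scaling to [0, 1], which the output further below appears to use, are assumptions.

```python
import numpy as np


def explain_point(projections, per_projection_nll):
    """t-statistic feature importance for one point, min-max scaled to [0, 1] (hypothetical helper).

    projections: (n_projections, n_features) sparse projection matrix
    per_projection_nll: (n_projections,) negative log-density of the point in each histogram
    """
    n_features = projections.shape[1]
    t_stats = np.zeros(n_features)
    for j in range(n_features):
        uses_j = projections[:, j] != 0
        with_j = per_projection_nll[uses_j]
        without_j = per_projection_nll[~uses_j]
        if len(with_j) < 2 or len(without_j) < 2:
            continue  # too few projections in one group to estimate a variance
        se = np.sqrt(with_j.var(ddof=1) / len(with_j) + without_j.var(ddof=1) / len(without_j))
        if se > 0:
            t_stats[j] = (with_j.mean() - without_j.mean()) / se
    span = t_stats.max() - t_stats.min()
    return (t_stats - t_stats.min()) / span if span > 0 else np.zeros(n_features)


# Toy example: projections that use feature 0 give this point much higher NLL
projections = np.array(
    [[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 0, 1],
     [0, 1, 1], [0, 1, 0], [0, 0, 1], [0, 1, 1]], dtype=float
)
per_projection_nll = np.array([5.0, 5.5, 6.0, 5.2, 1.0, 1.2, 0.9, 1.1])
importance = explain_point(projections, per_projection_nll)
print(importance.round(3))
```

In the toy data, feature 0 appears only in the high-NLL projections, so it receives the top importance of 1.0.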

```python
# Per-feature importance scores explaining the sixth sample, x[5]
feature_explanation = loda_ad.explain(x[5])
print("Feature importance scores: {}".format(feature_explanation.ravel()))
```

```
Feature importance scores: [0.22249588 0.         0.12161014 1.         0.69199443]
```
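To read the explanation as a ranked list of features, sort the importance scores in descending order. A minimal sketch in NumPy using the values printed above (for the CuPy result, `cp.asnumpy(feature_explanation.ravel())` yields such an array); the `feature_names` list is hypothetical, since the synthetic data has no named columns.

```python
import numpy as np

importance = np.array([0.22249588, 0.0, 0.12161014, 1.0, 0.69199443])
feature_names = ["f0", "f1", "f2", "f3", "f4"]  # hypothetical column names

ranking = np.argsort(importance)[::-1]  # feature indices from most to least important
for idx in ranking:
    print(f"{feature_names[idx]}: {importance[idx]:.3f}")
```

For this sample, features `f3` and `f4` (indices 3 and 4) carry the highest importance scores.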


## Conclusion

This example shows a GPU implementation of the LODA algorithm for anomaly detection and explanation. Users can apply the RAPIDS-based model to other datasets to identify anomalies and explain which features contribute to them.