## Novelty detection
- Novelty detection systems make **predictions on unseen** data;
- they flag data points that do not look like the data seen before.
- Anomalous samples must be **absent** during training and should appear only in the test set;
- Novelty detection is also known as **one-class classification** because only one class (the normal one) is present in the training data;
- A **One-class Support Vector Machine** can be tuned by accessing its raw scores via `clf.score_samples(X_test)`. The lower the value, the more anomalous the data point.
- **Isolation Forest** is a tree-ensemble method, related to random forest, designed for anomaly detection.
- Because results are **sensitive** to the choice of threshold, novelty detection algorithms should be evaluated on the raw scores using `roc_auc_score`, which does not depend on a single threshold.
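
To make the last two points concrete, here is a minimal sketch of scoring a one-class SVM on raw scores. The data is synthetic and purely hypothetical (two Gaussian blobs), and the variable names are illustrative. Because lower scores mean "more anomalous", the scores are negated before being passed to `roc_auc_score`, so that the novelty class (label 1) is treated as the positive class.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical data: the training set contains only the normal class
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# The test set mixes normal points (label 0) with novelties (label 1)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
X_novel = rng.normal(loc=6.0, scale=1.0, size=(20, 2))
X_test = np.vstack([X_normal, X_novel])
y_test = np.r_[np.zeros(100, dtype=int), np.ones(20, dtype=int)]

# Fit on normal data only, then get raw scores for the test set
clf = OneClassSVM(gamma="scale").fit(X_train)
scores = clf.score_samples(X_test)

# Lower score = more anomalous, so negate to rank novelties highest
auc = roc_auc_score(y_test, -scores)
print(f"AUC on raw scores: {auc:.3f}")
```

No threshold is chosen anywhere in this sketch: the AUC summarizes how well the scores rank novelties above normal points across all possible thresholds.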

In [4]:
from sklearn.neighbors import LocalOutlierFactor as lof

import pandas as pd

**Create dataset**

In [6]:
# Create a list of thirty 1s and cast to a dataframe
X = pd.DataFrame([1.0]*30)

**Fit Model**

In [6]:
# Create an instance of a lof novelty detector
detector = lof(novelty = True, contamination = 0.1)
# Fit the detector to the data
detector.fit(X)

**Predict**

In [9]:
# Use it to predict the label of an example with value 10.0
print(detector.predict(pd.DataFrame([10.0])))

[-1]


Note that the anomaly is labeled `-1`; normal points are labeled `1`.
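
The label convention can be checked on both kinds of points at once. The sketch below assumes a hypothetical Gaussian training sample (rather than the constant column used above) so that the detector has a meaningful notion of density; the `is_novelty` mask is an illustrative name, not part of the scikit-learn API.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Hypothetical training sample drawn from the normal class only
X_train = rng.normal(loc=0.0, scale=1.0, size=(100, 1))

# novelty=True enables predict() on data not seen during fit
detector = LocalOutlierFactor(novelty=True, contamination=0.1).fit(X_train)

# One point in the dense region of the training data, one far away from it
X_new = np.array([[0.0], [10.0]])
labels = detector.predict(X_new)

# Convert the 1 / -1 labels into a boolean "is novelty" mask
is_novelty = labels == -1
print(labels, is_novelty)
```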

## Comparison of Three Novelty Detectors
Arrhythmia dataset

In [None]:

# Import the novelty detector
from sklearn.svm import OneClassSVM as onesvm

# Fit it to the training data and score the test data
svm_detector = onesvm().fit(X_train)
scores = svm_detector.score_samples(X_test)



# Import the novelty detector
from sklearn.ensemble import IsolationForest as isof

# Fit it to the training data and score the test data
isof_detector = isof().fit(X_train)
scores = isof_detector.score_samples(X_test)



# Import the novelty detector
from sklearn.neighbors import LocalOutlierFactor as lof

# Fit it to the training data and score the test data
lof_detector = lof(novelty=True).fit(X_train)
scores = lof_detector.score_samples(X_test)
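
The three cells above each produce raw scores, but they do not yet compare the detectors. One possible way to do so, sketched here on hypothetical synthetic data instead of the arrhythmia dataset, is to compute `roc_auc_score` on the negated scores of each detector. All three `score_samples` methods return lower values for more abnormal points, which is why the sign is flipped.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)

# Hypothetical stand-in data: normal class only in training,
# normal points (0) plus novelties (1) in the test set
X_train = rng.normal(0.0, 1.0, size=(300, 2))
X_test = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 2)),
    rng.normal(6.0, 1.0, size=(20, 2)),
])
y_test = np.r_[np.zeros(100, dtype=int), np.ones(20, dtype=int)]

detectors = {
    "one-class SVM": OneClassSVM(),
    "isolation forest": IsolationForest(random_state=0),
    "local outlier factor": LocalOutlierFactor(novelty=True),
}

aucs = {}
for name, detector in detectors.items():
    # Fit on the training data and score the test data
    scores = detector.fit(X_train).score_samples(X_test)
    # Negate so that higher values indicate the novelty class
    aucs[name] = roc_auc_score(y_test, -scores)
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

On real data the ranking of the three detectors can differ considerably; the well-separated blobs used here are only meant to show the mechanics.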

You notice that one-class SVM does not have a contamination parameter. But you know well by now that you really need a way to control the proportion of examples that are labeled as novelties in order to control your false positive rate. So you decide to experiment with thresholding the scores. 

In [None]:
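# Imports needed by this cell
import numpy as np
from sklearn.metrics import confusion_matrix

# Fit a one-class SVM detector and score the test data
# (X_train, X_test, y_test and onesvm come from the earlier cells)
nov_det = onesvm().fit(X_train)
scores = nov_det.score_samples(X_test)

# Find the observed proportion of outliers in the test data
prop = np.mean(y_test==1.0)

# Compute the appropriate threshold
threshold = np.quantile(scores, prop)

# Print the confusion matrix for the thresholded scores
# Lower scores are more anomalous, so scores below the threshold are flagged
print(confusion_matrix(y_test, scores < threshold))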
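
For a version that runs without the arrhythmia data, the following sketch applies the same quantile-thresholding idea to hypothetical synthetic data. It flags the lowest-scoring fraction `prop` of test points as novelties, since lower scores are more anomalous. Note that `prop` is computed from the test labels here only for illustration; in practice the expected proportion would have to come from prior knowledge.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)

# Hypothetical data: normal class only in training,
# 100 normal points (0) and 20 novelties (1) in the test set
X_train = rng.normal(0.0, 1.0, size=(200, 2))
X_test = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 2)),
    rng.normal(6.0, 1.0, size=(20, 2)),
])
y_test = np.r_[np.zeros(100, dtype=int), np.ones(20, dtype=int)]

scores = OneClassSVM().fit(X_train).score_samples(X_test)

# Proportion of novelties we want to flag
prop = np.mean(y_test == 1)

# The prop-quantile of the scores separates the lowest-scoring fraction
threshold = np.quantile(scores, prop)

# Flag points whose score falls below the threshold
y_pred = (scores < threshold).astype(int)

cm = confusion_matrix(y_test, y_pred)
print(cm)
print("Flagged proportion:", y_pred.mean())
```

Choosing the threshold by quantile fixes the share of flagged points, which plays the role that `contamination` plays for the other two detectors.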