### Anomaly detection with One-class SVM

Example: we know the properties of a single plant species (from the Iris dataset). We now want to train a model that recognizes when a plant belongs to a different species than the one we already know.

In [1]:
from sklearn import datasets
iris = datasets.load_iris()

We set the test data aside.

In [2]:
from sklearn.model_selection import train_test_split

X = iris['data']
y = iris['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

We know the properties of only one plant species (here, for simplicity, we take it from the dataset).

In [3]:
flower = 1  # class index 1 is "versicolor" in iris['target_names']

From the training set we keep only the instances of the flower we know.

In [4]:
X_train_oneclass = X_train[y_train == flower]

In [5]:
X_train_oneclass

array([[6.7, 3.1, 4.4, 1.4],
       [6.4, 3.2, 4.5, 1.5],
       [6. , 3.4, 4.5, 1.6],
       [6.7, 3.1, 4.7, 1.5],
       [5.5, 2.4, 3.7, 1. ],
       [6.6, 3. , 4.4, 1.4],
       [5.7, 2.9, 4.2, 1.3],
       [5.6, 3. , 4.5, 1.5],
       [5.8, 2.7, 4.1, 1. ],
       [5. , 2. , 3.5, 1. ],
       [5.6, 2.7, 4.2, 1.3],
       [5.7, 3. , 4.2, 1.2],
       [6.2, 2.9, 4.3, 1.3],
       [6. , 2.2, 4. , 1. ],
       [5.4, 3. , 4.5, 1.5],
       [5.5, 2.3, 4. , 1.3],
       [5. , 2.3, 3.3, 1. ],
       [5.5, 2.4, 3.8, 1.1],
       [6.7, 3. , 5. , 1.7],
       [5.9, 3.2, 4.8, 1.8],
       [5.1, 2.5, 3. , 1.1],
       [6. , 2.7, 5.1, 1.6],
       [5.5, 2.5, 4. , 1.3],
       [6.3, 2.5, 4.9, 1.5],
       [6.1, 2.9, 4.7, 1.4],
       [6.5, 2.8, 4.6, 1.5],
       [7. , 3.2, 4.7, 1.4],
       [5.9, 3. , 4.2, 1.5],
       [5.7, 2.6, 3.5, 1. ],
       [5.2, 2.7, 3.9, 1.4],
       [6.1, 3. , 4.6, 1.4],
       [6.6, 2.9, 4.6, 1.3],
       [5.5, 2.6, 4.4, 1.2],
       [5.6, 3. , 4.1, 1.3],
       [4.9, 2

In [6]:
from sklearn.svm import OneClassSVM
one = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
one.fit(X_train_oneclass)

OneClassSVM(gamma=0.1, nu=0.1)
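For context on the `nu=0.1` setting: the scikit-learn documentation describes `nu` as an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. The sketch below is not part of the original notebook. It refits the same model for a few `nu` values (chosen arbitrarily for illustration, not tuned) and reports what share of the known-species training points each model flags as anomalous. The names `X_known` and `flagged_fraction` are introduced here only for the example.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM

# Same data preparation as in the notebook cells above.
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], test_size=0.2, random_state=42
)
X_known = X_train[y_train == 1]  # training rows of the known flower only

flagged_fraction = {}
for nu in (0.05, 0.1, 0.3):  # illustrative values, an assumption of this sketch
    model = OneClassSVM(nu=nu, kernel="rbf", gamma=0.1).fit(X_known)
    # Share of the training points that the fitted model itself rejects.
    flagged_fraction[nu] = float((model.predict(X_known) == -1).mean())

for nu, fraction in flagged_fraction.items():
    print(f"nu={nu}: {fraction:.2f} of training points flagged as anomalies")
```

A larger `nu` generally tightens the boundary around the training data, so more genuine members of the known class may be rejected in exchange for catching more outsiders.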

In [7]:
y_preds = one.predict(X_test)

Each prediction is either 1 (normal) or -1 (anomaly).
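The hard labels come from a continuous score. As a minimal sketch (not part of the original notebook), `decision_function` returns a signed value per sample, which scikit-learn documents as positive for inliers and negative for outliers. The variable names `model`, `scores` and `labels` are chosen here for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM

# Rebuild the model from the notebook so the example runs on its own.
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], test_size=0.2, random_state=42
)
model = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
model.fit(X_train[y_train == 1])

scores = model.decision_function(X_test)  # signed score, one per test sample
labels = model.predict(X_test)            # hard labels: 1 or -1

# Show the first few samples: the sign of the score matches the label.
for score, label in list(zip(scores, labels))[:5]:
    print(f"score={score:+.3f} -> label={label}")
```

Inspecting the scores instead of only the labels shows how close a sample lies to the boundary, which can help when choosing `nu` or `gamma`.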

In [8]:
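# Map the multiclass test labels to the model's convention:
# 1 for the known flower, -1 for every other species.
y_test_oneclass = y_test.copy()
y_test_oneclass[y_test != flower] = -1
y_test_oneclass[y_test == flower] = 1

y_test_oneclass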

array([ 1, -1, -1,  1,  1, -1,  1, -1,  1,  1, -1, -1, -1, -1, -1,  1, -1,
        1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1])
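The relabeling in this cell can also be written as a single expression with `numpy.where`, which avoids modifying a copy in place. A minimal sketch on a small made-up label array (the values below are illustrative, not the notebook's `y_test`):

```python
import numpy as np

flower = 1
# Hypothetical multiclass labels, used only for this illustration.
y_test = np.array([1, 0, 2, 1, 1, 0])

# 1 where the label equals the known flower, -1 everywhere else.
y_test_oneclass = np.where(y_test == flower, 1, -1)
print(y_test_oneclass)
```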

In [9]:
y_preds

array([ 1, -1, -1,  1,  1, -1,  1, -1,  1,  1, -1, -1, -1, -1, -1,  1, -1,
        1,  1, -1, -1,  1, -1, -1, -1, -1, -1, -1, -1, -1], dtype=int64)

In [10]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test_oneclass, y_preds)

array([[20,  1],
       [ 0,  9]], dtype=int64)
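To read this matrix: scikit-learn orders rows (true labels) and columns (predicted labels) by sorted label value, so with labels -1 and 1 the first row and first column correspond to anomalies. The sketch below re-enters the two arrays printed above, passes `labels=[-1, 1]` to make that order explicit, and unpacks the four cells. The cell names are chosen here for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# True and predicted labels copied from the notebook outputs above.
y_true = np.array([ 1, -1, -1,  1,  1, -1,  1, -1,  1,  1, -1, -1, -1, -1, -1,
                    1, -1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1])
y_pred = np.array([ 1, -1, -1,  1,  1, -1,  1, -1,  1,  1, -1, -1, -1, -1, -1,
                    1, -1,  1,  1, -1, -1,  1, -1, -1, -1, -1, -1, -1, -1, -1])

# Rows are true labels, columns are predicted labels, both ordered [-1, 1].
cm = confusion_matrix(y_true, y_pred, labels=[-1, 1])
(anomaly_caught, anomaly_missed), (normal_rejected, normal_accepted) = cm

print(f"anomalies detected:       {anomaly_caught}")
print(f"anomalies taken as known: {anomaly_missed}")
print(f"known flowers rejected:   {normal_rejected}")
print(f"known flowers accepted:   {normal_accepted}")
```

So the single error in this run is one foreign plant that the model accepted as the known species.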

In [11]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test_oneclass, y_preds)

0.9666666666666667
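The test set is imbalanced (21 anomalies versus 9 known flowers), so accuracy alone does not show which kind of mistake the model makes. As a hedged complement, this sketch computes precision and recall for the anomaly class. The label arrays are reconstructed from the confusion-matrix counts above rather than taken in their original order, which does not affect these metrics.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Reconstructed from the counts [[20, 1], [0, 9]]; sample order is arbitrary.
y_true = np.array([1] * 9 + [-1] * 21)
y_pred = np.array([1] * 9 + [1] + [-1] * 20)

accuracy = accuracy_score(y_true, y_pred)
# Treat the anomaly label (-1) as the "positive" class of interest.
precision_anomaly = precision_score(y_true, y_pred, pos_label=-1)
recall_anomaly = recall_score(y_true, y_pred, pos_label=-1)

print(f"accuracy:            {accuracy:.3f}")
print(f"precision (anomaly): {precision_anomaly:.3f}")
print(f"recall (anomaly):    {recall_anomaly:.3f}")
```

Here every flagged sample really is an anomaly, while one of the 21 anomalies goes undetected.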