## One Class Learning on German Credit Risk Assessment

## Hatice Erdogan

In [20]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.metrics import classification_report, roc_auc_score


### Introduction 

Credit risk assessment influences financial decisions, such as loan approvals, interest rates, and credit limits. Understanding one's credit risk is pivotal for making informed financial choices.

For banks and financial institutions, accurately assessing credit risk is foundational to wise lending practices. It ensures the sustainability of their operations and helps prevent financial crises.


### Dataset 

- The dataset is retreived from UCI Machine Learning Repository titled as German Credit Data
- Individuals classified based on a set of attributes, such as, status of existing checking account, credit history, savings account, present employment since, marital status & sex, housing, job, other debtors/gurantors, total of 24 features that categorizes individuals as either good or bad credit risk.
- The dataset was already converted into numerical format so it was already available to use for machine learning models. 
- Total of 1000 instances; 700 good, 300 bad 



In [2]:
ccard = pd.read_csv("german_creditcard.csv", header = None) 
ccard.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,15,16,17,18,19,20,21,22,23,24
0,1,6,4,12,5,5,3,4,1,67,...,0,0,1,0,0,1,0,0,1,1
1,2,48,2,60,1,3,2,2,1,22,...,0,0,1,0,0,1,0,0,1,2
2,4,12,4,21,1,4,3,3,1,49,...,0,0,1,0,0,1,0,1,0,1
3,1,42,2,79,1,4,3,4,2,45,...,0,0,0,0,0,0,0,0,1,1
4,1,24,3,49,1,3,3,4,4,53,...,1,0,1,0,0,0,0,0,1,2


The last column is the target variable. It is classified as 1 = "good" and 2 = "bad", but it needs to be converted into 0 = "good" and 1 = "bad"

In [3]:
# Transforn target variable into 0's and 1's
ccard.iloc[:, -1] = ccard.iloc[:, -1].apply(lambda x: 1 if x == 2 else 0)

In [4]:
# Checkk the number of observations classified as 0 and 1 
ccard.iloc[:, -1].value_counts()

0    700
1    300
Name: 24, dtype: int64

There are 700 normal cases and 300 anomaly cases.

In [5]:
# Shape of the entire dataset
print(np.shape(ccard))

X = ccard.iloc[:,0:24].values
print(np.shape(X))

y = ccard.iloc[:,24].values
print(np.shape(y))

print(y)

(1000, 25)
(1000, 24)
(1000,)
[0 1 0 0 1 0 0 0 0 1 1 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0
 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0 0
 1 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0
 0 0 1 0 0 1 0 1 0 1 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 1 1 0 0 0 0 1 1 1 0 1
 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
 0 0 0 0 1 1 1 0 1 0 0 0 0 1 1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0
 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1
 0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1
 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 1 0 0 0 1 0
 0 0 0 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0
 0 0 1 0 0 1 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1
 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 1 0 0 0

### One-Class Learning Approach for Credit Risk Assessment

In this project, a distinctive approach known as one-class learning was employed for analyzing the credit risk dataset. Unlike traditional methods that predict whether an individual is a good or bad credit risk, one-class learning focuses on training the model solely on normal classes and subsequently testing it on both normal and anomaly classes.

The methodology involved the following steps:

### Data Preprocessing:

The dataset was initially categorized into negative-only (0, indicating good credit) and positive-only (1, indicating bad credit) classes for both feature variables (X) and the target variable (y).

In [6]:
X_neg_only = X[y == 0]
y_neg_only = y[y == 0]
print(np.shape(X_neg_only))
print(np.shape(y_neg_only))

X_pos_only = X[y == 1]
y_pos_only = y[y == 1]
print(np.shape(X_pos_only))
print(np.shape(y_pos_only))

X_train, X_test, y_train, y_test = train_test_split(X_neg_only, y_neg_only, test_size=0.2, random_state=42)

(700, 24)
(700,)
(300, 24)
(300,)


### Data Splitting:

The dataset was divided into training and testing sets, utilizing only the negative class (0) to train the model. This emphasizes learning from instances of good credit.

### Model Evaluation:

The trained model was then tested on both the negative (good credit) and positive (bad credit) classes to assess its performance in distinguishing between normal and anomalous instances.
This one-class learning approach aims to enhance the model's ability to identify potential credit risks by focusing on the characteristics of good credit instances during the training phase. The inclusion of both positive and negative classes in the testing phase allows for a comprehensive evaluation of the model's generalization to different credit risk scenarios.

In [7]:
print(np.shape(X_train))
print(np.shape(X_test))
print(np.shape(y_test))

# Adding the positive labels to full X_test
X_test_both = np.vstack((X_test, X_pos_only))
print(np.shape(X_test_both))

print(len(y_test))
print(len(y_pos_only))

# Adding the positive labels to full t_test
y_test_both = np.concatenate((y_test, y_pos_only))
print(np.shape(y_test_both))

(560, 24)
(140, 24)
(140,)
(440, 24)
140
300
(440,)


### Methadology

To effectively assess credit risk using the one-class learning approach, three distinct anomaly detection algorithms—Local Outlier Factor, Isolation Forest, and One-Class Support Vector Machine (SVM)—were implemented. The primary goal was to evaluate their performance on key metrics such as recall, precision, F-1 score, and ROC-AUC. The metrics were computed using three different aggregation methods: micro, macro, and weighted.

### Performance Evaluation:

The implemented algorithms were rigorously evaluated based on the following metrics:

- Accuracy: The ratio of correctly predicted instances to the total number of instances, offering a comprehensive measure of overall model correctness.
- Recall: The ability of the model to correctly identify true positives among all actual positive instances.
- Precision: The proportion of correctly identified positive instances among all instances predicted as positive.
- F-1 Score: The harmonic mean of precision and recall, providing a balanced measure of the model's performance.
- ROC-AUC Score: The Area Under the Receiver Operating Characteristic curve, assessing the model's ability to distinguish between positive and negative instances across different thresholds.

### Aggregation Methods:

The computed metrics were aggregated using three different methods:

- Macro: Calculating metrics independently for each class and then averaging them.
- Weighted: Computing a weighted average based on the number of instances in each class.
This comprehensive evaluation strategy aims to provide insights into the strengths and weaknesses of each algorithm under consideration, facilitating the selection of the most suitable approach for credit risk assessment based on the defined evaluation criteria.

### Local Outlier Factor (LOF):

LOF, a density-based anomaly detection algorithm, was implemented with varying parameters to identify potential outliers in the credit risk dataset.

In [100]:
# Local Outlier Factor

n_neig = [5, 10, 15, 20, 25, 50, 100]
result_lof =[]

for k in n_neig:
    lof = LocalOutlierFactor(n_neighbors = k, novelty = True)
    lof.fit(X_train)
    preds = lof.predict(X_test_both)
    #print(preds)

    preds_new = np.where(preds, preds == 1, 0)
    preds_new = np.where(preds_new, preds_new == -1, 1)
    #print(preds_new)
    
    print("Neighbors:", k)
    
    print("Precision:", precision_score(y_test_both, preds_new, average = "weighted"))
    print("Recall:", recall_score(y_test_both, preds_new, average = "weighted"))
    print("F1 Score:", f1_score(y_test_both, preds_new, average = "weighted"))
    print()

    # Classification Report
    report = classification_report(y_test_both, preds_new, target_names=['Good', 'Bad'], output_dict=True)
    
    resultlof_dict = {
        "N Neighbors": k,
        "Accuracy": report['accuracy'],
        "Precision (Macro)": report['macro avg']['precision'],
        "Recall (Macro)": report['macro avg']['recall'],
        "F1 Score (Macro)": report['macro avg']['f1-score'],
        "Precision (Weighted)": report['weighted avg']['precision'],
        "Recall (Weighted)": report['weighted avg']['recall'],
        "F1 Score (Weighted)": report['weighted avg']['f1-score'],
        "ROC-AUC": roc_auc_score(y_test_both, preds)
    }

    result_lof.append(resultlof_dict)

# Create a DataFrame
resultlof_df = pd.DataFrame(result_lof)

# Display the DataFrame
resultlof_df


Neighbors: 5
Precision: 0.7851719046614638
Recall: 0.3386363636363636
F1 Score: 0.19574367448136853

Neighbors: 10
Precision: 0.7854122621564482
Recall: 0.3409090909090909
F1 Score: 0.20028811030508822

Neighbors: 15
Precision: 0.7856537401991947
Recall: 0.3431818181818182
F1 Score: 0.20480604392697027

Neighbors: 20
Precision: 0.7863849765258216
Recall: 0.35
F1 Score: 0.21820350656073464

Neighbors: 25
Precision: 0.7868782161234992
Recall: 0.35454545454545455
F1 Score: 0.22700748394257686

Neighbors: 50
Precision: 0.7636534839924669
Recall: 0.375
F1 Score: 0.2683776564858901

Neighbors: 100
Precision: 0.7411794452872759
Recall: 0.3977272727272727
F1 Score: 0.31370682103819936



Unnamed: 0,N Neighbors,Accuracy,Precision (Macro),Recall (Macro),F1 Score (Macro),Precision (Weighted),Recall (Weighted),F1 Score (Weighted),ROC-AUC
0,5,0.338636,0.662413,0.515,0.27431,0.785172,0.338636,0.195744,0.485
1,10,0.340909,0.662791,0.516667,0.277872,0.785412,0.340909,0.200288,0.483333
2,15,0.343182,0.66317,0.518333,0.281415,0.785654,0.343182,0.204806,0.481667
3,20,0.35,0.664319,0.523333,0.291936,0.786385,0.35,0.218204,0.476667
4,25,0.354545,0.665094,0.526667,0.29886,0.786878,0.354545,0.227007,0.473333
5,50,0.375,0.649762,0.539762,0.330867,0.763653,0.375,0.268378,0.460238
6,100,0.397727,0.635094,0.552619,0.365611,0.741179,0.397727,0.313707,0.447381


### Isolation Forest:

The Isolation Forest algorithm, known for its ability to isolate anomalies efficiently, was applied with different configurations to capture the peculiarities of credit risk instances.

In [99]:
# Isolation Forest 
n_est = [50, 100, 150, 200]
result_iso = []

for est in n_est:
    isf = IsolationForest(random_state=0, n_estimators = est).fit(X_train)
    preds = isf.predict(X_test_both)
    #print(preds)

    preds_new = np.where(preds, preds == 1, 0)
    preds_new = np.where(preds_new, preds_new == -1, 1)
    #print(preds_new)

    print("Estimator:", est)
    print("Precision:", precision_score(y_test_both, preds_new, average = "weighted"))
    print("Recall:", recall_score(y_test_both, preds_new, average = "weighted"))
    print("F1 Score:", f1_score(y_test_both, preds_new, average = "weighted"))   
    print()

# Classification Report
    report = classification_report(y_test_both, preds_new, target_names=['Good', 'Bad'], output_dict=True)
    
    resultiso_dict = {
        "N Estimators": est,
        "Accuracy": report['accuracy'],
        "Precision (Macro)": report['macro avg']['precision'],
        "Recall (Macro)": report['macro avg']['recall'],
        "F1 Score (Macro)": report['macro avg']['f1-score'],
        "Precision (Weighted)": report['weighted avg']['precision'],
        "Recall (Weighted)": report['weighted avg']['recall'],
        "F1 Score (Weighted)": report['weighted avg']['f1-score'],
        "ROC-AUC": roc_auc_score(y_test_both, preds)
    }

    result_iso.append(resultiso_dict)

# Create a DataFrame
resultiso_df = pd.DataFrame(result_iso)

# Display the DataFrame
resultiso_df

Estimator: 50
Precision: 0.6660405709992486
Recall: 0.5772727272727273
F1 Score: 0.5904187908224999

Estimator: 100
Precision: 0.6617382849654415
Recall: 0.5613636363636364
F1 Score: 0.5735355656982065

Estimator: 150
Precision: 0.6473829201101929
Recall: 0.5363636363636364
F1 Score: 0.547126141185547

Estimator: 200
Precision: 0.6539902534377673
Recall: 0.5477272727272727
F1 Score: 0.5592250329092434



Unnamed: 0,N Estimators,Accuracy,Precision (Macro),Recall (Macro),F1 Score (Macro),Precision (Weighted),Recall (Weighted),F1 Score (Weighted),ROC-AUC
0,50,0.577273,0.596419,0.61,0.569798,0.666041,0.577273,0.590419,0.39
1,100,0.561364,0.590677,0.602143,0.556301,0.661738,0.561364,0.573536,0.397857
2,150,0.536364,0.575758,0.58381,0.533239,0.647383,0.536364,0.547126,0.41619
3,200,0.547727,0.582553,0.592143,0.543766,0.65399,0.547727,0.559225,0.407857


### One-Class Support Vector Machine (SVM):

The One-Class SVM, a support vector machine variant designed for one-class classification, was employed to discern normal credit instances from potential risks.

In [98]:
# One Class SVM

kernels = ['linear', 'poly', 'rbf', 'sigmoid']
results = []

for ker in kernels:
    ocsvm = OneClassSVM(gamma='auto', kernel=ker).fit(X_train)
    preds = ocsvm.predict(X_test_both)
#    print(preds)

    preds_new = np.where(preds, preds == 1, 0)
    preds_new = np.where(preds_new, preds_new == -1, 1)
#    print(preds_new)

    print("Kernel:", ker)
    print("Precision:", precision_score(y_test_both, preds_new, average = "weighted"))
    print("Recall:", recall_score(y_test_both, preds_new, average = "weighted"))
    print("F1 Score:", f1_score(y_test_both, preds_new, average = "weighted"))
    print()


    # Classification Report
    report = classification_report(y_test_both, preds_new, target_names=['Good', 'Bad'], output_dict=True)
    
    result_dict = {
        "Kernel": ker,
        "Accuracy": report['accuracy'],
        "Precision (Macro)": report['macro avg']['precision'],
        "Recall (Macro)": report['macro avg']['recall'],
        "F1 Score (Macro)": report['macro avg']['f1-score'],
        "Precision (Weighted)": report['weighted avg']['precision'],
        "Recall (Weighted)": report['weighted avg']['recall'],
        "F1 Score (Weighted)": report['weighted avg']['f1-score'],
        "ROC-AUC": roc_auc_score(y_test_both, preds)
    }

    results.append(result_dict)

# Create a DataFrame
results_df = pd.DataFrame(results)

# Display the DataFrame
results_df


Kernel: linear
Precision: 0.5744203412043493
Recall: 0.4863636363636364
F1 Score: 0.5017821011821557

Kernel: poly
Precision: 0.57772974230569
Recall: 0.48863636363636365
F1 Score: 0.5038359442340763

Kernel: rbf
Precision: 0.6307784477427336
Recall: 0.6772727272727272
F1 Score: 0.6197605125166311

Kernel: sigmoid
Precision: 0.4648760330578512
Recall: 0.6818181818181818
F1 Score: 0.5528255528255529



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Unnamed: 0,Kernel,Accuracy,Precision (Macro),Recall (Macro),F1 Score (Macro),Precision (Weighted),Recall (Weighted),F1 Score (Weighted),ROC-AUC
0,linear,0.486364,0.507962,0.509048,0.478509,0.57442,0.486364,0.501782,0.490952
1,poly,0.488636,0.511118,0.512619,0.481108,0.57773,0.488636,0.503836,0.487381
2,rbf,0.677273,0.590349,0.540476,0.519739,0.630778,0.677273,0.619761,0.459524
3,sigmoid,0.681818,0.340909,0.5,0.405405,0.464876,0.681818,0.552826,0.5


### Model Performance Analysis

After implementing Local Outlier Factor (LOF), Isolation Forest, and One-Class Support Vector Machine (SVM) with various parameters, a thorough analysis of their performance metrics revealed valuable insights.

Baseline Model - Isolation Forest:

Isolation Forest, although performing the worst among the three models, was selected as the baseline model. Its optimal configuration, with 50 estimators, demonstrated a weighted precision score of 0.666.

One-Class SVM (RBF Kernel):

Among the SVM variants, the One-Class SVM with an RBF kernel emerged as the top performer. It achieved the highest recall (weighted) score of 0.681 and highest precision (weighted) score of 0.630. This suggests its effectiveness in identifying instances of credit risk, especially those that might be missed by other models.

Local Outlier Factor (LOF):

Local Outlier Factor exhibited the best overall performance among the three models. The optimal configuration of Local Outlier Factor (LOF), with 25 neighbors, yielded a remarkable weighted precision score of 0.786. This suggests that LOF excelled in accurately classifying instances, particularly in identifying credit risk cases. The high weighted precision indicates the model's proficiency in minimizing false positives within the credit risk category, showcasing its robustness in correctly identifying instances of anomaly or credit risk, which is crucial for effective credit risk assessment.

These findings indicate that, despite the Isolation Forest serving as the baseline model, both the One-Class SVM with an RBF kernel and Local Outlier Factor outperformed it in specific metrics. The weighted precision of 0.786 achieved by LOF showcases its superiority in handling credit risk scenarios.

### Extended Model Evaluation with COPOD and HBOS

The analysis was further enriched by incorporating COPOD and HBOS models from the prod library. These models were tested with various parameters to assess their performance in the context of credit risk assessment.

In [48]:
import pyod
from pyod import model


COPOD Model:

Cluster-based Outlier Factor with Population Outlier Detection algorithm combines the concepts of cluster-based outlier factors and population outlier detection to identify outliers in a dataset. The COPOD model was implemented with different parameter configurations to identify its optimal settings. Evaluation metrics such as accuracy, recall, precision, F-1, and ROC-AUC were computed, utilizing macro, and weighted aggregation methods.

In [97]:
from pyod.models.copod import COPOD

# Parameters to try
contamination_values = [0.05, 0.1, 0.15]
results_copod = []

for contamination in contamination_values:
    copod_model = COPOD(contamination=contamination)
    copod_model.fit(X_train)
    preds = copod_model.predict(X_test_both)

    print(f"Contamination: {contamination}")
    print("Precision:", precision_score(y_test_both, preds, average = "weighted"))
    print("Recall:", recall_score(y_test_both, preds, average = "weighted"))
    print("F1 Score:", f1_score(y_test_both, preds, average = "weighted"))
    print()

    # Classification Report
    report = classification_report(y_test_both, preds, target_names=['Good', 'Bad'], output_dict=True)
    
    resultcp_dict = {
        "Contamination Values": contamination,
        "Accuracy": report['accuracy'],
        "Precision (Macro)": report['macro avg']['precision'],
        "Recall (Macro)": report['macro avg']['recall'],
        "F1 Score (Macro)": report['macro avg']['f1-score'],
        "Precision (Weighted)": report['weighted avg']['precision'],
        "Recall (Weighted)": report['weighted avg']['recall'],
        "F1 Score (Weighted)": report['weighted avg']['f1-score'],
        "ROC-AUC": roc_auc_score(y_test_both, preds)
    }

    results_copod.append(resultcp_dict)

# Create a DataFrame
results_cp = pd.DataFrame(results_copod)

# Display the DataFrame
results_cp

Contamination: 0.05
Precision: 0.7487433577480971
Recall: 0.35454545454545455
F1 Score: 0.2302912289483162

Contamination: 0.1
Precision: 0.6402746656035632
Recall: 0.3840909090909091
F1 Score: 0.3105814580294732

Contamination: 0.15
Precision: 0.6500911092938266
Recall: 0.4159090909090909
F1 Score: 0.36893435841672595



Unnamed: 0,Contamination Values,Accuracy,Precision (Macro),Recall (Macro),F1 Score (Macro),Precision (Weighted),Recall (Weighted),F1 Score (Weighted),ROC-AUC
0,0.05,0.354545,0.636914,0.524762,0.30079,0.748743,0.354545,0.230291,0.524762
1,0.1,0.384091,0.557967,0.527381,0.357816,0.640275,0.384091,0.310581,0.527381
2,0.15,0.415909,0.56756,0.543095,0.402044,0.650091,0.415909,0.368934,0.543095


For COPOD model, accuracy increased with higher contamination values. Precision (weighted) was highest 0.75 at 0.05 contamination value. ROC-AUC also increased with higher contamination values however, precision decreased.

HBOS Model:

Histogram-based Outlier Score is an unsupervised anomaly detection algorithm that calculates an outlier score based on the histogram of feature values. The HBOS model, known for its simplicity and efficiency, was also employed with varying parameters to explore its performance. Similar to other models, performance metrics were assessed using macro, and weighted aggregation methods.

In [96]:
from pyod.models.hbos import HBOS

# Parameters to try
contamination_values = [0.05, 0.1, 0.15]
num_bins_values = [20, 30, 40]  

results_hb = []

for contamination in contamination_values:
    for num_bins in num_bins_values:
        hbos_model = HBOS(contamination=contamination, n_bins=num_bins)
        hbos_model.fit(X_train)
        preds_hb = hbos_model.predict(X_test_both)

        print(f"Contamination: {contamination}, Bins: {num_bins}")
        print("Precision:", precision_score(y_test_both, preds_hb, average = "weighted"))
        print("Recall:", recall_score(y_test_both, preds_hb, average = "weighted"))
        print("F1 Score:", f1_score(y_test_both, preds_hb, average = "weighted"))
        print()

        # Classification Report
        report = classification_report(y_test_both, preds_hb, target_names=['Good', 'Bad'], output_dict=True)
    
        resulthb_dict = {
            "Contamination Values": contamination,
            "Bin Values": num_bins,
            "Accuracy": report['accuracy'],
            "Precision (Macro)": report['macro avg']['precision'],
            "Recall (Macro)": report['macro avg']['recall'],
            "F1 Score (Macro)": report['macro avg']['f1-score'],
            "Precision (Weighted)": report['weighted avg']['precision'],
            "Recall (Weighted)": report['weighted avg']['recall'],
            "F1 Score (Weighted)": report['weighted avg']['f1-score'],
            "ROC-AUC": roc_auc_score(y_test_both, preds)
        }

        results_hb.append(resulthb_dict)

# Create a DataFrame
results_hbos = pd.DataFrame(results_hb)

# Display the DataFrame
results_hbos

Contamination: 0.05, Bins: 20
Precision: 0.6595617188837527
Recall: 0.3568181818181818
F1 Score: 0.24709409740184335

Contamination: 0.05, Bins: 30
Precision: 0.6855846601609313
Recall: 0.3613636363636364
F1 Score: 0.252414987172855

Contamination: 0.05, Bins: 40
Precision: 0.6444904621345598
Recall: 0.3568181818181818
F1 Score: 0.2500901423170412

Contamination: 0.1, Bins: 20
Precision: 0.6190041233884858
Recall: 0.375
F1 Score: 0.29796728883708634

Contamination: 0.1, Bins: 30
Precision: 0.6133119753809408
Recall: 0.3886363636363636
F1 Score: 0.3291877807439615

Contamination: 0.1, Bins: 40
Precision: 0.6155107778819119
Recall: 0.37727272727272726
F1 Score: 0.30414944903581265

Contamination: 0.15, Bins: 20
Precision: 0.6070270942303765
Recall: 0.4113636363636364
F1 Score: 0.3765343419572393

Contamination: 0.15, Bins: 30
Precision: 0.6139393939393939
Recall: 0.41818181818181815
F1 Score: 0.3862137862137862

Contamination: 0.15, Bins: 40
Precision: 0.5971586882288512
Recall: 0.402272

Unnamed: 0,Contamination Values,Bin Values,Accuracy,Precision (Macro),Recall (Macro),F1 Score (Macro),Precision (Weighted),Recall (Weighted),F1 Score (Weighted),ROC-AUC
0,0.05,20,0.356818,0.570846,0.51881,0.311401,0.659562,0.356818,0.247094,0.5
1,0.05,30,0.361364,0.590575,0.524048,0.316268,0.685585,0.361364,0.252415,0.5
2,0.05,40,0.356818,0.559569,0.516905,0.313103,0.64449,0.356818,0.25009,0.5
3,0.1,20,0.375,0.541234,0.51881,0.347072,0.619004,0.375,0.297967,0.5
4,0.1,30,0.388636,0.537472,0.52119,0.369322,0.613312,0.388636,0.329188,0.5
5,0.1,40,0.377273,0.53866,0.518571,0.351326,0.615511,0.377273,0.304149,0.5
6,0.15,20,0.411364,0.533538,0.524524,0.402697,0.607027,0.411364,0.376534,0.5
7,0.15,30,0.418182,0.539365,0.529524,0.41057,0.613939,0.418182,0.386214,0.5
8,0.15,40,0.402273,0.525311,0.517857,0.39207,0.597159,0.402273,0.363432,0.5


For HBOS model, accuracy, weighted recall and F-1 scores increased higher bin and contamination values. Highest precision (weighted) is 0.685 with 0.05 contamination and 30 bins. ROC-AUC scores are consistently around 0.5, indicating that the models are not effectively distinguishing between positive and negative instances

Both models show a trade-off between precision and recall. The performance of both models depends on the choice of hyperparameters, specifically contamination values. As contamination values increase, the models become more sensitive to anomalies but might generate more false positives. COPOD generally outperformed HBOS in terms of precision and ROC-AUC scores.

#### Challenges Faced: ABOD Model

While exploring different anomaly detection models, attempts were made to implement the Angle-Based Outlier Detection (ABOD) model. However, it is noteworthy that the ABOD model encountered technical issues and did not run successfully on the system

### Continual Learning Approach

The purpose of continual learning is to contuniously adapt to new challenges in dynamic environments while retaining past knowledge. It is important to model the normal behavior and identify anomalies as observations that differ from the modeled behavior. 

### Scenarios for Continual Learning

The remaining data is divided into 'normal' and 'anomaly' subsets based on the binary 'anomaly' column which includes "good" and "bad" credit risks. K-Means clustering is separately applied to both the 'normal' and 'anomaly' subsets, creating clusters for each. The function iterates over the clusters, creating training and evaluation batches for the 'normal' subset. For the 'anomaly' subset, anomalies are only included in the evaluation batches. This process results in distinct clusters for normal and anomaly instances, facilitating the creation of batches for continual learning.

In [6]:
import sys
from sklearn.cluster import KMeans

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

####################################################################################
def clustering(filename, train_eval_perc, n_clusters, dataset_name, drop_attrs):

    data = pd.read_csv(filename)

    for a in drop_attrs:
        data.drop(columns=a, inplace=True)

    print(data)
    print(data.head)
    normal = data[data["anomaly"] == 0]
    print(f'Normal {len(normal)}')

    anomalies = data[data["anomaly"] == 1]
    print(f'Anomalies {len(anomalies)}')

    normal = clean_dataset(normal)
    anomalies = clean_dataset(anomalies)

    clust = KMeans(n_clusters=n_clusters)

    normal_clustering = clust.fit(normal)
    normal_ids = normal_clustering.predict(normal)
    print(f'{len(set(normal_ids))} clusters extracted for Normal')
    print(normal_ids)

    anomalies_clustering = clust.fit(anomalies)
    anomalies_ids = anomalies_clustering.predict(anomalies)
    print(f'{len(set(anomalies_ids))} clusters extracted for Anomalies')
    print(anomalies_ids)

    # TODO: closest, random : add param

    for c in range(len(set(normal_ids))):
        print(f'Cluster {c}')
        normal_idx = [i for i in range(len(normal_ids)) if normal_ids[i] == c]
        normal_data_c = normal.iloc[normal_idx]
        print(len(normal_data_c))

        # Divide normal in training and evaluation batches
        normal_train, normal_eval = divide(normal_data_c, train_eval_perc)
        print(f'Training: {train_eval_perc}% - {len(normal_train)} - Evaluation: {100-train_eval_perc}% - {len(normal_eval)}')
        normal_train.to_csv(f'{dataset_name}_{c}_train.csv')

        # Anomalies are only in evaluation batches
        anomalies_idx = [i for i in range(len(anomalies_ids)) if anomalies_ids[i] == c]
        anomalies_data_c = anomalies.iloc[anomalies_idx]
        print(f'Anomalies: {len(anomalies_data_c)}')

        eval_batch_c = pd.concat([normal_eval, anomalies_data_c])
        print(f'Evaluation batch: {len(eval_batch_c)}')
        eval_batch_c.to_csv(f'{dataset_name}_{c}_eval.csv')


####################################################################################
def divide(cluster_data, perc):
    num_rows = cluster_data.shape[0]
    num_rows_p1 = int(np.rint((num_rows / 100) * perc))

    part_1 = cluster_data.iloc[0:num_rows_p1]
    part_2 = cluster_data.iloc[num_rows_p1 + 1:]

    return part_1, part_2

####################################################################################
def reduce_anomalies(filename, dataset_name, num_tasks, anomaly_ratio):

    for i in range(num_tasks): # Open files containing anomalies (eval batches)
        eval_b = pd.read_csv(f'{dataset_name}_{i}_eval.csv')
        eval_labels = eval_b["anomaly"]

        normal = eval_b[eval_b["anomaly"] == 0]
        print(f'Normal {len(normal)}')

        anomalies = eval_b[eval_b["anomaly"] == 1]
        print(f'Anomalies {len(anomalies)}')

        total_size = len(normal) + len(anomalies)
        curr_anom_ratio = len(anomalies) / total_size
        print(f'Curr anom ratio {curr_anom_ratio}')

        #num_to_sample = anomaly_ratio * total_size
        # rand_indices = np.random.choice(range(len(anomalies)), num_to_sample, replace=False)
        # sampled_anomalies = anomalies.iloc[rand_indices]
        sampled_anomalies = anomalies.sample(frac=anomaly_ratio)

        new_batch = pd.concat([normal, sampled_anomalies])
        new_batch.to_csv(f'{dataset_name}-r-{anomaly_ratio}_{i}_eval.csv')

In [4]:
# Saving the dataset with the updated labels
ccard.to_csv("german_creditcard1.csv", index = False)

In [9]:
pd.read_csv("german_creditcard1.csv")

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,15,16,17,18,19,20,21,22,23,anomaly
0,1,6,4,12,5,5,3,4,1,67,...,0,0,1,0,0,1,0,0,1,0
1,2,48,2,60,1,3,2,2,1,22,...,0,0,1,0,0,1,0,0,1,1
2,4,12,4,21,1,4,3,3,1,49,...,0,0,1,0,0,1,0,1,0,0
3,1,42,2,79,1,4,3,4,2,45,...,0,0,0,0,0,0,0,0,1,0
4,1,24,3,49,1,3,3,4,4,53,...,1,0,1,0,0,0,0,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,4,12,2,17,1,4,2,4,1,31,...,0,0,1,0,0,1,0,1,0,0
996,1,30,2,39,1,3,1,4,2,40,...,0,1,1,0,0,1,0,0,0,0
997,4,12,2,8,1,5,3,4,3,38,...,0,0,1,0,0,1,0,0,1,0
998,1,45,2,18,1,3,3,4,4,23,...,0,0,1,0,0,0,0,0,1,1


In [10]:
filename = "german_creditcard1.csv"
dataset_name = "ccard"
drop_attrs = []


n_clusters = 5
train_eval_perc = 70

clustering(filename, train_eval_perc, n_clusters,  dataset_name, drop_attrs)

     0   1  2   3  4  5  6  7  8   9  ...  15  16  17  18  19  20  21  22  23  \
0    1   6  4  12  5  5  3  4  1  67  ...   0   0   1   0   0   1   0   0   1   
1    2  48  2  60  1  3  2  2  1  22  ...   0   0   1   0   0   1   0   0   1   
2    4  12  4  21  1  4  3  3  1  49  ...   0   0   1   0   0   1   0   1   0   
3    1  42  2  79  1  4  3  4  2  45  ...   0   0   0   0   0   0   0   0   1   
4    1  24  3  49  1  3  3  4  4  53  ...   1   0   1   0   0   0   0   0   1   
..  ..  .. ..  .. .. .. .. .. ..  ..  ...  ..  ..  ..  ..  ..  ..  ..  ..  ..   
995  4  12  2  17  1  4  2  4  1  31  ...   0   0   1   0   0   1   0   1   0   
996  1  30  2  39  1  3  1  4  2  40  ...   0   1   1   0   0   1   0   0   0   
997  4  12  2   8  1  5  3  4  3  38  ...   0   0   1   0   0   1   0   0   1   
998  1  45  2  18  1  3  3  4  4  23  ...   0   0   1   0   0   0   0   0   1   
999  2  45  4  46  2  1  3  4  3  27  ...   0   1   1   0   0   1   0   0   1   

     anomaly  
0          0

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.dropna(inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.dropna(inplace=True)


5 clusters extracted for Normal
[1 1 3 2 1 3 1 4 4 1 3 0 1 1 1 1 4 4 4 4 4 4 0 3 1 4 3 4 4 0 4 3 3 4 0 4 3
 4 0 3 4 0 4 2 4 4 1 0 3 4 4 0 3 1 1 3 1 0 2 0 1 4 1 1 4 4 4 1 1 0 1 1 0 0
 3 4 0 4 4 4 3 3 4 4 4 3 4 0 4 4 0 4 1 4 4 4 2 0 1 2 4 3 0 1 4 3 0 4 0 4 1
 3 1 4 4 3 3 3 1 1 0 4 0 4 4 3 4 4 4 0 4 4 1 1 4 1 0 4 1 0 4 4 0 1 4 3 4 2
 1 4 3 1 0 0 0 4 0 0 0 1 4 4 4 0 0 0 0 1 4 4 4 4 0 1 4 0 4 1 4 4 4 1 0 4 3
 0 0 1 4 1 3 1 4 3 4 4 0 4 2 4 4 1 4 0 1 1 1 0 2 3 3 1 4 3 3 3 0 1 1 0 1 1
 4 0 4 3 3 0 4 4 0 4 0 0 0 4 0 3 4 0 4 3 4 0 4 3 0 0 0 0 1 4 0 4 4 0 1 1 4
 1 0 4 4 4 0 0 0 0 3 4 2 4 4 1 4 4 4 0 4 4 3 4 4 4 4 0 4 4 2 1 0 1 4 4 4 0
 1 4 0 4 3 4 3 3 0 0 4 1 4 0 0 1 4 1 4 0 4 4 4 1 4 4 0 4 4 1 2 4 0 0 0 0 4
 0 0 4 0 4 4 1 3 0 0 4 0 3 4 1 0 4 1 4 1 1 1 4 0 4 4 4 4 4 4 4 3 1 4 4 2 0
 0 4 1 4 4 1 4 3 0 4 3 4 4 4 0 4 4 0 1 0 0 1 1 4 4 2 4 1 4 1 1 4 0 4 0 0 3
 3 4 4 0 0 4 4 1 4 1 4 4 1 1 4 0 1 4 0 1 4 1 0 0 2 2 0 1 0 4 4 4 0 1 0 4 1
 4 2 4 4 4 4 0 3 0 0 2 0 3 4 1 4 1 3 0 0 0 0 3 2 4 0 3 1 4 1 4 4 3 2

In [15]:
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix
from joblib import dump, load
from sklearn.cluster import SpectralClustering, OPTICS
from utils import *
import pickle
import csv
import sys
import os

The implementation of continual learning scenario for anomaly detection is presented below, employing models like Local Outlier Factor and Isolation Forest. It sequentially trains these models on tasks and evaluates their performance using ROC-AUC scores. Different strategies, such as "Naive", which forgets the previous task and considered as the baseline, "Replay", which is the summarized version of the task, and "Cumulative", which uses all the tasks with low forgetting and slow training, are explored. The evaluation includes metrics like Lifelong ROC (LROC), Backward Transfer (BWT), and Forward Transfer (FWT). Results are logged, and heatmaps illustrating ROC-AUC scores are generated for analysis and reporting.

In [17]:
num_tasks = 5

# dataset_name = "pv-italy"
# drop_cols = ["anomaly", "id"]

eval_standard_metrics = False
dataset_name = "ccard"
drop_cols = ["anomaly"]

models = [LocalOutlierFactor(n_neighbors=10, novelty=True),
          #LocalOutlierFactor(n_neighbors=25, novelty=True),
          #LocalOutlierFactor(n_neighbors=50, novelty=True),
          #LocalOutlierFactor(n_neighbors=100, novelty=True),
            IsolationForest(random_state=0),
        #    ABOD(contamination=0.01),
        #    HBOS(contamination=0.01)
          #AutoEncoder(hidden_neurons=[20, 10, 5, 10, 20], epochs=20, contamination=0.05, batch_size=64, verbose=True)
          # OneClassSVM(gamma='auto', kernel='rbf'),
          # OneClassSVM(gamma='auto', kernel='poly'),
          # OneClassSVM(gamma='auto', kernel='sigmoid'),
          # OneClassSVM(gamma='auto', kernel='linear')
          ]

#strategy = 'Replay Graph'
rep_budget = 0.2
rep_compact = True      # Use a sample from last task instead of full data
const_ratio_replay = False
min_samples_cluster = False
skip_noise = False

strategies = ["Naive", "Replay", "Cumulative"]
#strategies = ["MSTE"]


params = {
    "Naive": "",
    "Replay": f'budget={rep_budget}_compact_{rep_compact}',
    "Replay Graph": f'budget={rep_budget}_compact_{rep_compact}_const_ratio_replay_{const_ratio_replay}_min_samples_cluster_{min_samples_cluster}_skip_noise_{skip_noise}',
    "Cumulative": "",
    "MSTE": ""
}


# params = ''


tasks_train = []
tasks_eval = []
labels_eval = []

MSTE = []

# Load data
for i in range(num_tasks):
    train_b = pd.read_csv(f'{dataset_name}_{i}_train.csv')
    train_b.drop(columns=drop_cols, inplace=True)   # Remove class attribute from training sets
    tasks_train.append(train_b)

    eval_b = pd.read_csv(f'{dataset_name}_{i}_eval.csv')
    eval_labels = eval_b["anomaly"]
    eval_b.drop(columns=drop_cols, inplace=True)
    tasks_eval.append(eval_b)
    labels_eval.append(eval_labels)

# Run scenario (train and predict)
for s in strategies:
    print(f'Strategy: {s}')
    logger = []
    logger.append('strategy,params,model,LROC,BWT,FWT')

    for m in models:
        clf = m
        mat_f1 = np.zeros((num_tasks, num_tasks))
        mat_roc = np.zeros((num_tasks, num_tasks))
        print(f'__________________\n{m}\n__________________\n')

        for i in range(num_tasks):
            print(f'Current task: {i}')
            train_b = tasks_train[i]

            if s == 'Naive':
                print(f'Train {len(train_b)}')
                clf = m.fit(train_b)
            elif s == 'Cumulative':
                if i > 0:
                    train_tasks_dfs = []
                    for k in range(0, i+1):  # including current task
                        train_tasks_dfs.append(tasks_train[k])
                    train_b = pd.concat(train_tasks_dfs)
                print(f'Train {len(train_b)}')
                clf = clf.fit(train_b)
            elif s == 'MSTE':
                clf = m.fit(train_b)
                MSTE.append(clf)
            elif s == 'Replay':
                if i > 0:
                    train_tasks_dfs = []
                    for k in range(0, i):
                        rb = tasks_train[k].sample(frac=rep_budget)
                        train_tasks_dfs.append(rb)
                    if rep_compact:
                        train_tasks_dfs.append(tasks_train[i].sample(frac=rep_budget))
                    else:
                        train_tasks_dfs.append(tasks_train[i])
                    train_b = pd.concat(train_tasks_dfs)
                print(f'Train {len(train_b)}')
                clf = clf.fit(train_b)
            elif s == 'Replay Graph':    # Just clusters task i-1 and appends it to previous clusters
                if i == 0:
                    rb_prev_tasks = []
                if i > 0:
                    train_tasks_dfs = []
                    c_sampled_all = []          # Clusters for considered task
                    budget = int(rep_budget * len(tasks_train[i-1]))    # Budget for all clusters within a single task
                    budget_left = budget
                    if min_samples_cluster:
                        clustering = OPTICS(min_samples=int(np.sqrt(len(tasks_train[i-1]))/2)).fit_predict(tasks_train[i-1])  # Clusters prev task
                    else:
                        clustering = OPTICS(min_samples=20).fit_predict(tasks_train[i-1])  # Clusters prev task

                    budget_per_cluster = int(budget / len(np.unique(clustering)))

                    while budget_left > 0:
                        print(f'Budget still available for task: {budget_left}')
                        for c in np.unique(clustering):
                            if skip_noise:
                                if c == -1:
                                    budget_per_cluster = int(budget / len(np.unique(clustering) - 1))
                                    continue        # Discarding noise cluster
                            print(f'Sampling from cluster {c}...')
                            c_indices = [i for i in range(len(clustering)) if clustering[i] == c]
                            c_data = tasks_train[i-1].iloc[c_indices]
                            print(len(c_data))
                            if const_ratio_replay:  # vanilla: sample frac of data without caring about num of samples
                                c_sampled = c_data.sample(frac=rep_budget)
                            else:
                                if len(c_data) >= budget_per_cluster:
                                    c_sampled = c_data.sample(budget_per_cluster) # chance of smaller cluster than budget
                                else:
                                    c_sampled = c_data

                            budget_left = budget_left - len(c_sampled)
                            c_sampled_all.append(c_sampled)

                    rb = pd.concat(c_sampled_all)           # Append all clusters for a single task (previous)
                    rb_prev_tasks.append(rb)                # Append all clusters for a single task to RB of previous tasks
                    rb_all_except_last = pd.concat(rb_prev_tasks)
                    print(f'RB all except last size: {np.shape(rb_all_except_last)}')
                    if rep_compact:
                        train_b = rb_all_except_last.append(tasks_train[i].sample(frac=rep_budget))    # Append sampled current task to clustered prev tasks
                    else:
                        train_b = rb_all_except_last.append(tasks_train[i])    # Append full current task to clustered prev tasks
                    print(f'RB all including last size: {np.shape(train_b)}')
                print(f'Train {len(train_b)}')

                clf = clf.fit(train_b)
            else:
                print('Unknown strategy')
                sys.exit(1)

            # Evaluation on all tasks (all strategies except MSTE)
            if 'MSTE' not in s:
                for j in range(num_tasks):
                    eval_b = tasks_eval[j]
                    print(f'Eval {j} - Len: {len(eval_b)}')
                    eval_labels_b = labels_eval[j]
                    print(f'{sum(1 for anom in eval_labels_b if anom == 1)}/{len(eval_b)} anomalies')

                    preds_raw = clf.decision_function(eval_b)
                    ROC = roc_auc_score(eval_labels_b, preds_raw)
                    print(f'ROC: {ROC}')

                    preds = clf.predict(eval_b)

                    if not str(m).__contains__("contamination"):
                        np.place(preds, preds == 1, 0)
                        np.place(preds, preds == -1, 1)

                    # print(preds)
                    # print(eval_labels_b)

                    if eval_standard_metrics:
                        cf = confusion_matrix(eval_labels_b, preds, labels=[0, 1])
                        print(cf)

                        [precision_micro, recall_micro, fscore_micro, support_RF_micro] = precision_recall_fscore_support(eval_labels_b, preds, average='micro')
                        [precision_macro, recall_macro, fscore_macro, support_RF_macro] = precision_recall_fscore_support(eval_labels_b, preds, average='macro')
                        [precision_weighted, recall_weighted, fscore_weighted, support_RF_weighted] = precision_recall_fscore_support(eval_labels_b, preds, average='weighted')
                        print(f'{precision_micro},{recall_micro},{fscore_micro}')
                        print(f'{precision_macro},{recall_macro},{fscore_macro}')
                        print(f'{precision_weighted},{recall_weighted},{fscore_weighted}')
                        mat_f1[i][j] = np.round(fscore_weighted, 2)

                    mat_roc[i][j] = np.round(ROC, 2)

        # Special out-of-loop evaluation (MSTE only)
        if 'MSTE' in s:
            for i in range(num_tasks):
                for j in range(num_tasks):
                    clf = MSTE[j]       # Select expert model from the pool
                    eval_b = tasks_eval[j]
                    print(f'Eval {j} - Len: {len(eval_b)}')
                    eval_labels_b = labels_eval[j]
                    print(f'{sum(1 for anom in eval_labels_b if anom == 1)}/{len(eval_b)} anomalies')

                    preds_raw = clf.decision_function(eval_b)
                    ROC = roc_auc_score(eval_labels_b, preds_raw)
                    print(f'ROC: {ROC}')

                    preds = clf.predict(eval_b)

                    if not str(m).__contains__("contamination"):
                        np.place(preds, preds == 1, 0)
                        np.place(preds, preds == -1, 1)

                    # print(preds)
                    # print(eval_labels_b)

                    if eval_standard_metrics:
                        cf = confusion_matrix(eval_labels_b, preds, labels=[0, 1])
                        print(cf)

                        [precision_micro, recall_micro, fscore_micro, support_RF_micro] = precision_recall_fscore_support(
                            eval_labels_b, preds, average='micro')
                        [precision_macro, recall_macro, fscore_macro, support_RF_macro] = precision_recall_fscore_support(
                            eval_labels_b, preds, average='macro')
                        [precision_weighted, recall_weighted, fscore_weighted,
                         support_RF_weighted] = precision_recall_fscore_support(eval_labels_b, preds, average='weighted')
                        print(f'{precision_micro},{recall_micro},{fscore_micro}')
                        print(f'{precision_macro},{recall_macro},{fscore_macro}')
                        print(f'{precision_weighted},{recall_weighted},{fscore_weighted}')
                        mat_f1[i][j] = np.round(fscore_weighted, 2)

                    mat_roc[i][j] = np.round(ROC, 2)

        # matrices_f1["naive"][str(m)] = mat_f1
        # matrices_roc["naive"][str(m)] = mat_roc

        print(mat_roc)

        l_roc = lifelong_roc(mat_roc)
        bwt = backward_transfer(mat_roc)
        fwt = forward_transfer(mat_roc)

        heatmap(mat_roc, dataset_name, s, params[s], str(m), "f")
        heatmap(mat_roc, dataset_name, s, params[s], str(m), "b")
        heatmap(mat_roc, dataset_name, s, params[s], str(m), "all")

        if "AutoEncoder" in str(m):
            m = "AutoEncoder"

        print('strategy,params,model,LROC,BWT,FWT')
        print(f'{s},{params[s]},{m},{l_roc},{bwt},{fwt}')
        logger.append(f'{s},{params[s]},{str(m).replace(",","").replace(" ","_")},{l_roc},{bwt},{fwt}')

        with open(f'logs/{dataset_name}_{s}_{params[s]}_{m}_rocmat.pkl', "wb") as fp:
            pickle.dump(mat_roc, fp)

    print(logger)

    if os.path.isfile(f'logs/{dataset_name}_{s}_{params[s]}_metrics.csv'):
        with(open(f'logs/{dataset_name}_{s}_{params[s]}_metrics.csv', "a")) as f:
            for line in logger:
                f.write(f'{line}\n')
        f.close()
    else:
        np.savetxt(f'logs/{dataset_name}_{s}_{params[s]}_metrics.csv', logger, delimiter=',', fmt='%s')

    # Load saved ROC matrix and print
    with open(f'logs/{dataset_name}_{s}_{params[s]}_{m}_rocmat.pkl', "rb") as fp:
        b = pickle.load(fp)
        print(b)

# TODO: naive, replay, and cumulative: train together or separately?
# TODO: latex code for tables (acc, bwt, fwt) based on previous papers

Strategy: Naive
__________________
LocalOutlierFactor(n_neighbors=10, novelty=True)
__________________

Current task: 0
Train 127
Eval 0 - Len: 104
50/104 anomalies
ROC: 0.7970370370370371
Eval 1 - Len: 159
120/159 anomalies
ROC: 0.864957264957265
Eval 2 - Len: 29
22/29 anomalies
ROC: 0.8376623376623378
Eval 3 - Len: 101
77/101 anomalies
ROC: 0.821969696969697
Eval 4 - Len: 111
31/111 anomalies
ROC: 0.769758064516129
Current task: 1
Train 94
Eval 0 - Len: 104
50/104 anomalies
ROC: 0.8096296296296297
Eval 1 - Len: 159
120/159 anomalies
ROC: 0.8762820512820512
Eval 2 - Len: 29
22/29 anomalies
ROC: 0.8116883116883117
Eval 3 - Len: 101
77/101 anomalies
ROC: 0.8214285714285715
Eval 4 - Len: 111
31/111 anomalies
ROC: 0.8419354838709677
Current task: 2
Train 20
Eval 0 - Len: 104
50/104 anomalies
ROC: 0.8062962962962963
Eval 1 - Len: 159
120/159 anomalies
ROC: 0.8476495726495726
Eval 2 - Len: 29
22/29 anomalies
ROC: 0.8441558441558441
Eval 3 - Len: 101
77/101 anomalies
ROC: 0.7738095238095237


<Figure size 432x432 with 0 Axes>

In [101]:
naive_metrics = pd.read_csv("logs/ccard_Naive__metrics.csv")
cumulative_metrics = pd.read_csv("logs/ccard_Cumulative__metrics.csv")
replay_metrics = pd.read_csv("logs/ccard_Replay_budget=0.2_compact_True_metrics.csv")

In [102]:
naive_metrics

Unnamed: 0,strategy,params,model,LROC,BWT,FWT
0,Naive,,LocalOutlierFactor(n_neighbors=10_novelty=True),0.810667,-0.018,0.802
1,Naive,,IsolationForest(random_state=0),0.446667,0.096,0.447


In [103]:
cumulative_metrics

Unnamed: 0,strategy,params,model,LROC,BWT,FWT
0,Cumulative,,LocalOutlierFactor(n_neighbors=10_novelty=True),0.831333,0.007,0.815
1,Cumulative,,IsolationForest(random_state=0),0.416667,0.013,0.462


In [104]:
replay_metrics

Unnamed: 0,strategy,params,model,LROC,BWT,FWT
0,Replay,budget=0.2_compact_True,LocalOutlierFactor(n_neighbors=10_novelty=True),0.846,0.018,0.822
1,Replay,budget=0.2_compact_True,IsolationForest(random_state=0),0.443333,0.017,0.479


The "Replay" strategy tends to perform the best across both models and consistently achieves high LROC scores, indicating good overall performance.

Best Performing Model: The "Local Outlier Factor (LOF)" model generally outperforms the "Isolation Forest" model in terms of LROC across all strategies.

Trade-offs: While "Naive" has a decent LROC for LOF, it experiences negative BWT, indicating some forgetting of the previous tasks. "Replay" strategy achieves a good balance between BWT and FWT.

In summary, the "Replay" strategy with the Local Outlier Factor model appears to be the most effective for the given tasks based on the provided metrics.

### Conclusion 

The Local Outlier Factor model with the replay strategy demonstrated a substantial improvement in ROC-AUC, achieving a score of 0.846. This represents a significant advancement over one of the baseline models, Isolation Forest, which had an ROC-AUC score of 0.390. Ranking the models based on weighted precision scores from lowest to highest, the order is as follows: One-Class SVM (0.63), Isolation Forest (0.66), HBOS (0.685), COPOD (0.75), Local Outlier Factor (0.78), and the highest performing model, Local Outlier Factor with the replay strategy and continual learning.

For future steps, it would be beneficial to explore additional anomaly detection algorithms and fine-tune hyperparameters to further enhance model performance. The observed success of Local Outlier Factor with the replay strategy may be attributed to its ability to adapt and learn from previous tasks, providing improved anomaly detection capabilities over standalone models.