Split the dataset into training, validation, and testing sets.  

Design and train a deep autoencoder with multiple layers to encode the input data and decode it back to its original  form.  

Use the trained autoencoder to detect anomalies in the test set by comparing the input and output data and calculating reconstruction error.  

Evaluate the performance of the autoencoder by measuring the accuracy, precision, recall, and F1 score of the anomaly detection.  

Discuss the limitations and potential applications of deep autoencoders for anomaly detection in credit card transactions and other domains.

In [26]:
import tensorflow as tf
from tensorflow.keras import layers, models
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import  StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [11]:
df = pd.read_csv('creditcard.csv')
display(df.head())

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,...,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,...,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0


In [15]:
#
print("Class Values:", df['Class'].unique())
print("Amount Range:", min(df['Amount'].unique()), ":", max(df['Amount'].unique()) )
print("Time Range:", min(df['Time'].unique()), ":",max(df['Time'].unique()) )

Class Values: [0 1]
Amount Range: 0.0 : 25691.16
Amount Range: 0.0 : 172792.0


In [81]:
# X = df.drop(columns=['Class'])  # Features
y = df['Class']  # Labels

scaler = StandardScaler()
# X = scaler.fit_transform(X.astype('float32'))
X = scaler.fit_transform(df.drop(columns=['Class']).astype('float32'))


# Split the data into training, validation, and test sets (70% train, 15% validation, 15% test)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Convert the datasets into TensorFlow datasets
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, X_train))
val_dataset = tf.data.Dataset.from_tensor_slices((X_val, X_val))
test_dataset = tf.data.Dataset.from_tensor_slices((X_test, X_test))

# Batch the datasets
batch_size = 32
train_dataset = train_dataset.batch(batch_size)
val_dataset = val_dataset.batch(batch_size)
test_dataset = test_dataset.batch(batch_size)

In [82]:
# define the model
input_dim = X_train.shape[1]  # Number of features
print("number of features", input_dim)

autoencoder = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(32, activation='relu'),  # Encoder layers
    layers.Dense(16, activation='relu'),
    layers.Dense(8, activation='relu'),   # Bottleneck layer
    layers.Dense(16, activation='relu'),  # Decoder layers
    layers.Dense(32, activation='relu'),
    layers.Dense(input_dim, activation='sigmoid')  # Output layer
])

number of features 30


In [83]:
autoencoder.compile(optimizer='adam', loss=['mse'])
autoencoder.summary()

# Train the model
autoencoder.fit(X_train, X_train,
                epochs=15,
                batch_size=256,
                shuffle=True,
                validation_data=(X_test, X_test))

Epoch 1/15
[1m779/779[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 460us/step - loss: 1.0082 - val_loss: 0.8330
Epoch 2/15
[1m779/779[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 421us/step - loss: 0.8604 - val_loss: 0.7958
Epoch 3/15
[1m779/779[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 415us/step - loss: 0.8301 - val_loss: 0.7811
Epoch 4/15
[1m779/779[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 409us/step - loss: 0.8220 - val_loss: 0.7748
Epoch 5/15
[1m779/779[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 405us/step - loss: 0.8134 - val_loss: 0.7695
Epoch 6/15
[1m779/779[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 407us/step - loss: 0.7895 - val_loss: 0.7661
Epoch 7/15
[1m779/779[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 402us/step - loss: 0.7918 - val_loss: 0.7619
Epoch 8/15
[1m779/779[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 400us/step - loss: 0.7993 - val_loss: 0.7590
Epoch 9/15
[1m779/779[

<keras.src.callbacks.history.History at 0x31c263f40>

In [84]:
# Predict the reconstruction of the input
reconstructions = autoencoder.predict(test_dataset)

# Calculate the reconstruction error
mse = np.mean(np.power(X_test - reconstructions, 2), axis=1)
print('reconstruction error: ', mse)

# Set a threshold for anomaly detection (e.g., 95th percentile of MSE)
threshold = np.percentile(mse, 95)

# Predict anomalies
anomalies = mse > threshold


# Binary labels for evaluation
y_pred = anomalies.astype(int)
y_true = y_test.values

# Calculate evaluation metrics
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print(f'Accuracy: {accuracy:.4f}')
print(f'Precision: {precision:.4f}')
print(f'Recall: {recall:.4f}')
print(f'F1 Score: {f1:.4f}')

[1m1336/1336[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 260us/step
reconstruction error:  [1.1148175  0.39986694 0.23857005 ... 0.21938515 0.13780019 1.3256209 ]
Accuracy: 0.9511
Precision: 0.0253
Recall: 0.9153
F1 Score: 0.0492


### Limitations
Data Imbalance, such as Credit card fraud datasets, often have imbalanced classes, which can affect the autoencoder's ability to distinguish between normal and anomalous data.

Interpretability: Autoencoders are black-box models, making it hard to interpret the reasons behind certain anomaly detections.

The performance of deep autoencoders heavily depends on the choice of hyperparameters. Poorly chosen hyperparameters can lead to underfitting or overfitting, reducing the model's ability to detect anomalies.

### Potential Applications
Autoencoders can capture the complex, non-linear relationships in the transaction data, helping to identify unusual patterns that may indicate fraud. This can help reduce false positives and improved detection rates compared to simpler models.

Similarly, by monitoring network traffic to detect malicious activities such as unauthorized access or data breaches, autoencoders can model the normal traffic patterns and identify deviations, which may indicate potential security threats.This will help proactive enhanced security by detecting intrusions in real-time or near real-time.