# Soft-SVM Implementation for Detecting Credit Card Fraud

Dataset: [Kaggle-Credit Card Fraud Dataset](https://paperswithcode.com/dataset/kaggle-credit-card-fraud-dataset)

_The data has already undergone PCA and preprocessing for anonymization._

The original classes for the data are:

* Fraudulent Transaction: _+1_

* Non-fraudulent Transaction: _0_

In [12]:
import pandas as pd
import numpy as np

Load the data and remap the class labels to the $\{-1, +1\}$ convention that the SVM formulation expects.

The modified classes for the data are:

* Fraudulent Transaction: _+1_

* Non-fraudulent Transaction: _-1_

In [13]:
data: pd.DataFrame = pd.read_csv("../creditcard.csv")
data["Class"] = np.where(data["Class"] <= 0, -1, 1)

data.shape

(284807, 31)

Preview the data

In [14]:
data.head()

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | -1 |
| 1 | 0.0 | 1.191857 | 0.266151 | 0.16648 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.16717 | 0.125895 | -0.008983 | 0.014724 | 2.69 | -1 |
| 2 | 1.0 | -1.358354 | -1.340163 | 1.773209 | 0.37978 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | -1 |
| 3 | 1.0 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | ... | -0.1083 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.5 | -1 |
| 4 | 2.0 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | ... | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.20601 | 0.502292 | 0.219422 | 0.215153 | 69.99 | -1 |


## Split the Data for Training and Testing

In [15]:
# Split data into training/testing
training_data_mask: np.ndarray = np.random.rand(len(data)) < 0.8

train: pd.DataFrame = data[training_data_mask]
test: pd.DataFrame = data[~training_data_mask]
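
The mask above is drawn from NumPy's global random state, so the split (and every number that follows) changes from run to run. A minimal sketch of a reproducible variant is shown below. The helper name `seeded_mask`, the seed value `42`, and the small synthetic arrays are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np


def seeded_mask(n_rows: int, seed: int, train_fraction: float = 0.8) -> np.ndarray:
    """Hypothetical helper: a boolean training mask from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.random(n_rows) < train_fraction


# Synthetic stand-in for the real data: 1,000 rows with 3 features each
data_rng = np.random.default_rng(0)
features = data_rng.normal(size=(1000, 3))
labels = np.where(data_rng.random(1000) < 0.02, 1, -1)

# Same idea as the cell above, but the mask is now repeatable
mask = seeded_mask(len(labels), seed=42)
X_tr, X_te = features[mask], features[~mask]
y_tr, y_te = labels[mask], labels[~mask]

# Every row lands in exactly one of the two sets
print(len(y_tr), len(y_te))
```

Because each row is assigned independently, the training share is only approximately 80%, and nothing guarantees that both sets contain fraud cases in the same proportion.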

## Implement Soft-SVM

### Soft-SVM Objective

$$\min_{w,b} \frac{1}{2} ||w||^2 + C \sum_i \xi_i$$

$$\text{s.t. } y_i(w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad \text{for } i \in \{1, 2, ..., N\}$$

### Slack variable $\xi$

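The slack variable $\xi_i$ measures how far sample $i$ falls short of the margin requirement $y_i(w^T x_i + b) \geq 1$:

$$\xi_i = \max(1 - y_i(w^T x_i + b), 0)$$

* If $\xi_i = 0$, the point is correctly classified and lies on or outside the margin.

* If $0 < \xi_i < 1$, the point is correctly classified but lies inside the margin.

* If $\xi_i \geq 1$, the point lies on the decision boundary ($\xi_i = 1$) or is misclassified ($\xi_i > 1$).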
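
The cases above can be checked numerically. The sketch below uses a hypothetical one-dimensional model with $w = 1$ and $b = 0$ (values chosen only for illustration), so the decision value of each sample is simply its feature value.

```python
import numpy as np

# Hypothetical 1-D model: the decision value w^T x + b equals x itself
w = np.array([1.0])
b = 0.0

X = np.array([[2.0], [0.5], [-0.5], [-3.0]])
Y = np.array([1, 1, 1, -1])

# Slack of every sample: max(1 - y_i (w^T x_i + b), 0)
slack = np.maximum(1 - Y * (X @ w + b), 0)

# Sample 0: y * f(x) = 2.0  -> slack 0.0 (correct, outside the margin)
# Sample 1: y * f(x) = 0.5  -> slack 0.5 (correct, inside the margin)
# Sample 2: y * f(x) = -0.5 -> slack 1.5 (misclassified)
# Sample 3: y * f(x) = 3.0  -> slack 0.0 (correct, outside the margin)
print(slack)
```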

### Hinge-Loss Function

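- The hinge loss sums the slack of every point that violates the margin. Substituting $\xi_i = \max(1 - y_i(w^T x_i + b), 0)$ into the objective removes the constraints:

$$\min_{w,b} \frac{1}{2} ||w||^2 + C \sum_i \max(1 - y_i(w^T x_i + b), 0)$$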

In [16]:
def hinge_loss(
    X: np.ndarray, Y: np.ndarray, weights: np.ndarray, bias: float, C: float
) -> float:
    """Computes the loss for the current weights and bias.

    Args:
        X (np.ndarray): The features.
        Y (np.ndarray): The labels (either -1 or 1).
        weights (np.ndarray): The current weights of the model.
        bias (float): The current bias of the model.
        C (float): Soft-SVM penalty on margin violations (larger values tolerate fewer violations).

    Returns:
        float: The regularized hinge-loss value for the current model.
    """
    # Calculate the slack variable
    hinge = np.maximum(1 - Y * (np.dot(X, weights) + bias), 0)

    # Return the hinge-loss
    return 0.5 * np.linalg.norm(weights) ** 2 + C * np.sum(hinge)

### Fitting the Weights and Bias

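1. Initialize the weights and bias to 0.

2. Compute the margin violation of each sample, i.e. how far it falls short of the margin:
    - $m_i = 1 - y_i (w^T x_i + b)$

3. Compute the weight gradient:
    - $\nabla_w = w - \frac{C}{N} \sum_{i:\, m_i > 0} y_i x_i$
    - Only points that lie inside the margin or are misclassified ($m_i > 0$) contribute.

4. Compute the bias gradient:
    - $\nabla_b = - \frac{C}{N} \sum_{i:\, m_i > 0} y_i$
    - Only points that lie inside the margin or are misclassified ($m_i > 0$) contribute.

5. Update the weights and bias with learning rate $\eta$:
    - $w \leftarrow w - \eta \nabla_w$
    - $b \leftarrow b - \eta \nabla_b$

6. Repeat steps 2-5 for each epoch.

Note that these gradients average the hinge term over the $N$ training samples (a factor of $C/N$), whereas the objective and `hinge_loss` use $C$ times the unnormalized sum. The updates are therefore subgradient steps on the objective with $C$ replaced by $C/N$.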
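
The gradient formulas above can be sanity-checked against finite differences. The sketch below is an illustration under stated assumptions: the helper names `objective` and `gradients`, the toy data, and the parameter values are all hypothetical, and the objective averages the hinge term over the samples so that it matches the update rule.

```python
import numpy as np


def objective(X, Y, w, b, C):
    """0.5 * ||w||^2 plus C times the hinge term averaged over the samples."""
    hinge = np.maximum(1 - Y * (X @ w + b), 0)
    return 0.5 * (w @ w) + C * hinge.sum() / len(Y)


def gradients(X, Y, w, b, C):
    """Subgradients from steps 2-4: only samples with m_i > 0 contribute."""
    active = (1 - Y * (X @ w + b)) > 0
    grad_w = w - C * ((active * Y) @ X) / len(Y)
    grad_b = -C * np.sum(active * Y) / len(Y)
    return grad_w, grad_b


# Toy data (an assumption for illustration): 20 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)
w = np.array([0.3, -0.2, 0.1])
b = 0.05
C = 2.0

grad_w, grad_b = gradients(X, Y, w, b, C)

# Central finite differences of the objective, one coordinate at a time
eps = 1e-6
numeric_w = np.array(
    [
        (objective(X, Y, w + eps * e, b, C) - objective(X, Y, w - eps * e, b, C))
        / (2 * eps)
        for e in np.eye(3)
    ]
)
numeric_b = (
    objective(X, Y, w, b + eps, C) - objective(X, Y, w, b - eps, C)
) / (2 * eps)

print(np.max(np.abs(grad_w - numeric_w)), abs(grad_b - numeric_b))
```

The two estimates should agree closely as long as no sample sits exactly on the margin ($m_i = 0$), where the hinge term is not differentiable.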



In [17]:
def fit(
    X: np.ndarray,
    Y: np.ndarray,
    learning_rate: float,
    C: float,
    epochs: int,
) -> tuple[np.ndarray, float]:
    """Fits weights and bias for the input data and labels.

    A linear SVM model.

    Weights and bias are initialized to zero.

    The function does the following for each epoch:
    - Computes the margin violation of every sample for the current weights and bias.
    - Updates the weights using the samples that violate the margin (inside it or misclassified).
    - Updates the bias using the same margin-violating samples.
    - Displays the current hinge loss every 50 epochs and on the final epoch.

    Args:
        X (np.ndarray): The features.
        Y (np.ndarray): The labels.
        learning_rate (float): The learning rate.
        C (float): Soft-SVM penalty on margin violations (larger values tolerate fewer violations).
        epochs (int): The number of epochs the model will run.

    Returns:
        tuple[np.ndarray, float]: The final weights and bias.
    """

    num_samples, num_features = X.shape

    weights: np.ndarray = np.zeros(num_features)
    bias: float = 0.0

    for epoch in range(epochs):
        # Determine SVM Margins
        margins: np.ndarray = 1 - Y * (np.dot(X, weights) + bias)

        # Determine Gradients - updated using points that are in the margin or misclassified.
        grad_weights: np.ndarray = (
            weights - np.dot(C * (margins > 0) * Y, X) / num_samples
        )
        grad_bias: float = np.sum(-C * (margins > 0) * Y) / num_samples

        # Update Weights and Biases
        weights -= learning_rate * grad_weights
        bias -= learning_rate * grad_bias

        # Print the loss every 50 epochs and on the final epoch
        if epoch % 50 == 0 or epoch == epochs - 1:
            loss: float = hinge_loss(X, Y, weights, bias, C)
            print(f"Epoch {epoch}, Loss = {loss}")

    return weights, bias

### Make Predictions

- Compute the sign of the decision value $w^T x + b$ for each sample $x$.

- If the sign is _positive_, the transaction is predicted to be **fraudulent**.

- If the sign is _negative_, the transaction is predicted to be **not fraudulent**.

In [18]:
def predict(X: np.ndarray, weights: np.ndarray, bias: float) -> np.ndarray:
    """Use input weights/bias to predict the label for the input data.

    Args:
        X (np.ndarray): Data used for predictions.
        weights (np.ndarray): Weights used to make predictions.
        bias (float): Bias used to make predictions.

    Returns:
        np.ndarray: The predicted labels for the input data.
    """

    return np.sign(np.dot(X, weights) + bias)
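
One edge case is worth noting: `np.sign` returns `0` when the decision value is exactly zero, which is neither of the two class labels. The sketch below shows one possible way to handle it. The function name `predict_no_zero` and the choice to map a zero score to `-1` (not fraudulent) are assumptions for illustration, not part of the original notebook.

```python
import numpy as np


def predict_no_zero(X: np.ndarray, weights: np.ndarray, bias: float) -> np.ndarray:
    """Hypothetical variant of predict: a score of exactly 0 is labeled -1."""
    scores = X @ weights + bias
    return np.where(scores > 0, 1, -1)


# Toy inputs: the third sample lies exactly on the decision boundary
X_demo = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 0.0]])
w_demo = np.array([1.0, 1.0])
b_demo = 0.0

sign_labels = np.sign(X_demo @ w_demo + b_demo)      # third entry is 0
safe_labels = predict_no_zero(X_demo, w_demo, b_demo)  # third entry is -1
print(sign_labels, safe_labels)
```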

## Train the Model

In [19]:
X_train = train.drop("Class", axis=1).to_numpy()
Y_train = train["Class"].to_numpy()

w, b = fit(X_train, Y_train, learning_rate=0.001, C=0.01, epochs=500)

Epoch 0, Loss = 291541.02071554144
Epoch 50, Loss = 256962.55344048317
Epoch 100, Loss = 224071.3340884018
Epoch 150, Loss = 192785.0334918487
Epoch 200, Loss = 163025.33783914064
Epoch 250, Loss = 134717.75864996642
Epoch 300, Loss = 107791.43880870726
Epoch 350, Loss = 82178.9789103346
Epoch 400, Loss = 57816.27388898845
Epoch 450, Loss = 34642.3431957212
Epoch 499, Loss = 13029.425142910251


## Test the Model

In [20]:
X_test = test.drop("Class", axis=1).to_numpy()
Y_test = test["Class"].to_numpy()

# Accuracy: fraction of test labels that are predicted correctly
np.mean(Y_test == predict(X_test, w, b))

np.float64(0.9980359147025814)
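
Accuracy alone can be hard to interpret here, because fraud is rare in this dataset (roughly 0.17% of transactions according to the dataset description), so a classifier that labels every transaction as non-fraudulent would already score above 99%. The sketch below shows how precision and recall for the fraud class could be computed. The helper name `fraud_metrics` and the toy labels are assumptions for illustration; the numbers it prints say nothing about the model trained above.

```python
import numpy as np


def fraud_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Hypothetical helper: confusion counts, precision and recall for class +1."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == -1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == -1)))
    tn = int(np.sum((y_true == -1) & (y_pred == -1)))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Toy labels: 2 fraud cases among 1,000 transactions
y_true = np.full(1000, -1)
y_true[:2] = 1

# A trivial "model" that predicts non-fraudulent for every transaction
y_pred = np.full(1000, -1)

accuracy = np.mean(y_true == y_pred)
metrics = fraud_metrics(y_true, y_pred)

# High accuracy, yet no fraud case is caught (recall is 0)
print(accuracy, metrics)
```

Applying the same helper to `Y_test` and `predict(X_test, w, b)` would show how many fraudulent transactions the trained model actually flags.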