<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Training-a-Classifier-to-Detect-Fraudulent-Financial-Transactions" data-toc-modified-id="Training-a-Classifier-to-Detect-Fraudulent-Financial-Transactions-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Training a Classifier to Detect Fraudulent Financial Transactions</a></span><ul class="toc-item"><li><span><a href="#Creating-the-Test-Set" data-toc-modified-id="Creating-the-Test-Set-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Creating the Test Set</a></span></li><li><span><a href="#Fitting-a-Dummy-Classifier" data-toc-modified-id="Fitting-a-Dummy-Classifier-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Fitting a Dummy Classifier</a></span></li><li><span><a href="#Training-the-model" data-toc-modified-id="Training-the-model-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Training the model</a></span><ul class="toc-item"><li><span><a href="#Logistic-Regression" data-toc-modified-id="Logistic-Regression-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Logistic Regression</a></span></li><li><span><a href="#Neural-Network" data-toc-modified-id="Neural-Network-1.3.2"><span class="toc-item-num">1.3.2&nbsp;&nbsp;</span>Neural Network</a></span></li></ul></li><li><span><a href="#Implementing-the-Predict-Function" data-toc-modified-id="Implementing-the-Predict-Function-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Implementing the Predict Function</a></span></li><li><span><a href="#Testing-the-Classifier" data-toc-modified-id="Testing-the-Classifier-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Testing the Classifier</a></span></li></ul></li></ul></div>

# Training a Classifier to Detect Fraudulent Financial Transactions
First, we load our dataset. Note that due to privacy concerns, all features except Time and Amount have generic names and were obtained through [PCA](https://en.wikipedia.org/wiki/Principal_component_analysis).

In [1]:
import pandas as pd
import os
path = "/data/mlproject21" if os.path.exists("/data/mlproject21") else "."
df = pd.read_csv(os.path.join(path, "transactions.csv.zip"))
df.head()

Unnamed: 0,Time,Feature0,Feature1,Feature2,Feature3,Feature4,Feature5,Feature6,Feature7,Feature8,...,Feature20,Feature21,Feature22,Feature23,Feature24,Feature25,Feature26,Feature27,Amount,Class
0,12187.0,1.127257,0.170387,1.675702,1.662017,-1.093046,-0.447651,-0.590031,-0.071291,2.015259,...,-0.17089,0.009832,0.066699,0.877103,0.35003,-0.482569,0.052777,0.038219,4.99,0
1,149717.0,-0.723098,-1.307087,1.119492,-2.486829,-1.781857,0.382495,0.221389,-0.02155,-1.964369,...,-0.457047,-0.980797,0.006519,-0.488353,0.336841,-0.335149,-0.027314,-0.226558,299.0,0
2,72288.0,1.357358,-0.802677,1.135552,-0.490788,-1.672022,-0.509976,-1.192288,0.044009,-0.0016,...,0.299976,0.911314,-0.096112,0.427782,0.436567,-0.044389,0.045821,0.024275,10.0,0
3,168435.0,1.891806,-0.123111,-1.791275,0.342303,0.235308,-0.723866,0.0828,-0.114664,0.767009,...,-0.186402,-0.408728,0.091938,-0.518938,-0.082267,-0.088265,-0.015244,-0.022998,61.19,0
4,55416.0,-0.378806,0.449422,0.154983,-0.89931,-0.678177,-1.419243,0.130648,0.087002,0.509679,...,0.053215,0.266919,0.514422,0.4122,-1.279881,-0.166479,0.071403,0.150137,27.75,0


## Creating the Test Set
We perform a split into a train and test set:

In [2]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df.drop(columns = "Class"),
                                                    df["Class"],
                                                    test_size = 0.2,
                                                    stratify = df["Class"])
print(f"{y_train.size} train samples\n {y_test.size} test samples")

182276 train samples
 45569 test samples
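
Because fraud is rare, `stratify` matters: it keeps the share of fraudulent transactions the same in both parts of the split. The following is a minimal pure-Python sketch of that idea, not scikit-learn's implementation. The helper `stratified_split` and the toy labels are hypothetical and only serve as illustration.

```python
import random

def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # split every class separately
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Toy labels (assumption): 1000 transactions, 10 of them fraudulent (1 %)
labels = [1] * 10 + [0] * 990
train_idx, test_idx = stratified_split(labels)
train_fraud = sum(labels[i] for i in train_idx)
test_fraud = sum(labels[i] for i in test_idx)
print(len(train_idx), train_fraud)  # 800 8
print(len(test_idx), test_fraud)    # 200 2
```

Both parts keep the 1 % fraud rate. A purely random split of such rare labels could leave the test set with very few fraud cases, or none at all.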


## Fitting a Dummy Classifier
As a baseline, we fit a dummy classifier: regardless of the input, its predicted probability of fraud always equals the fraction of fraudulent transactions in our training set.

In [3]:
from sklearn.dummy import DummyClassifier
dummy = DummyClassifier().fit(X_train, y_train)
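
To make the baseline concrete, here is a small pure-Python sketch of a classifier that behaves as described above. It assumes the `strategy="prior"` default of recent scikit-learn versions. The class `PriorDummy` and the toy data are hypothetical stand-ins, not part of scikit-learn.

```python
class PriorDummy:
    """Hypothetical stand-in: ignores the features and returns the training fraud rate."""

    def fit(self, X, y):
        # The only thing "learned" is the share of positive labels
        self.fraud_rate = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        # Same fraud probability for every input row
        return [self.fraud_rate for _ in X]

# Toy data (assumption): one fraudulent transaction out of four
toy_X = [[0.1], [5.0], [-3.2], [42.0]]
toy_y = [0, 0, 0, 1]

prior_dummy = PriorDummy().fit(toy_X, toy_y)
scores = prior_dummy.predict_proba([[1000.0], [-1000.0]])
print(scores)  # [0.25, 0.25]
```

Since every transaction receives the same score, such a baseline cannot rank fraudulent transactions above legitimate ones. Any trained model should beat it on a ranking metric.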

## Training the model

In [4]:
import pickle

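def save_model(model, filename='model.sav'):
    # Use a context manager so the file handle is closed after writing
    with open(filename, 'wb') as f:
        pickle.dump(model, f)

def load_model(filename='model.sav'):
    # Only unpickle files you created yourself: pickle can execute arbitrary code
    with open(filename, 'rb') as f:
        return pickle.load(f)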

In [5]:
import numpy as np

from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE

### Logistic Regression

In [6]:
class Classifier:
    """
    Basic classifier defining methods train and predict.
    This class does not classify anything itself; use a subclass instead.
    """
    def train(self, X, y):
        pass

    def predict(self, X):
        pass

In [7]:
from tqdm.notebook import trange

class LogisticRegression(Classifier):

    def __init__(self):
        self.w = None

    def sigmoid(self, z):
        """
        Function that computes the sigmoid of the input values.

        :param z: input values
        :returns: sigmoid values for each input value
        """

        return 1 / (1 + np.exp(-z))

    def loss_function_gradient(self, w, x, y):
        """
        Function that computes the gradient of the empirical loss for a logistic regression model.

        :param w: Weights vector
        :param x: Training input data
        :param y: Training target labels
        :returns: gradient of the loss
        """

        N = y.shape[0]

        f_x = self.sigmoid(x @ w)

        gradient = np.dot(x.T, (f_x - y)) / N 

        return gradient

    def batch_gradient_descent(self, x, y, alpha=0.01, num_steps=2000):
        """
        Implementation of the gradient descent algorithm for logistic regression

        :param x: Training input data
        :param y: Training target labels
        :param alpha: Scalar learning rate
        :param num_steps: Number of gradient descent steps
        :returns: weight vector 'w'
        """

        # Initialize the weights to zero
        w = np.zeros((x.shape[1]))

        for i in trange(num_steps):
            w = w - alpha * self.loss_function_gradient(w, x, y)

        return w

    def train(self, X, y):
        self.w = self.batch_gradient_descent(X, y)
        print("Finished training")

    def predict(self, x):
        """
        Assign input to a class using the logistic regression model.

        :param x: Test input data
        :returns: Predicted class labels (0 or 1)
        """

        if self.w is None:
            print("Weights not specified. Train the model first")
            return

        f_x = self.sigmoid(x @ self.w)
        predictions = np.round(f_x)

        return predictions
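
The gradient used in `loss_function_gradient` corresponds to the mean binary cross-entropy loss: for predictions `f = sigmoid(x @ w)` the gradient is `x.T @ (f - y) / N`. As a sanity check, the sketch below compares this analytic gradient with central finite differences on random toy data. The standalone functions and the data are assumptions made for illustration and are independent of the class above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, x, y):
    # Mean binary cross-entropy of the logistic model
    p = sigmoid(x @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradient(w, x, y):
    # Same expression as in loss_function_gradient
    return x.T @ (sigmoid(x @ w) - y) / y.shape[0]

def numeric_gradient(w, x, y, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))                  # toy inputs (assumption)
y = (rng.random(50) < 0.5).astype(float)      # toy 0/1 labels (assumption)
w = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_gradient(w, x, y) - numeric_gradient(w, x, y)))
print(max_diff < 1e-6)
```

If the two gradients disagreed by more than numerical noise, the update in `batch_gradient_descent` would not be descending the cross-entropy loss.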

In [8]:
def train_LogisticRegression():
    """
    Trains a LogisticRegression model using RandomUnderSampler, StandardScaler and the LogisticRegression class.
    :returns: the fitted StandardScaler and the trained model
    """
    clf = LogisticRegression()
    standardScaler = StandardScaler()

    X, y = RandomUnderSampler().fit_resample(X_train, y_train)
    X = standardScaler.fit_transform(X, y)

    clf.train(X, y)
    
    return standardScaler, clf

# UNCOMMENT this line if you want to train the logistic regression model (Neural Network is preferred) 
# scaler, clf = train_LogisticRegression()
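
Random undersampling balances the classes by discarding majority-class rows until every class has as many rows as the smallest one. The sketch below illustrates the idea in pure Python. The helper `random_undersample` is hypothetical and is not the imbalanced-learn implementation.

```python
import random

def random_undersample(X, y, seed=0):
    """Keep as many rows per class as the smallest class has (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)

    n_keep = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_keep):   # sample without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

# Toy data (assumption): 8 legitimate rows, 2 fraudulent rows
X_toy = [[i] for i in range(10)]
y_toy = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X_toy, y_toy)
print(y_bal.count(0), y_bal.count(1))  # 2 2
```

The price of this balance is that most legitimate transactions are thrown away, which is one reason to consider oversampling instead.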

In [9]:
# save standardscaler and model

# SCALER_NAME = 'scaler.sav'

# save_model(scaler, filename=SCALER_NAME)
# save_model(clf)

### Neural Network

In [10]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Check https://pytorch.org/docs/stable/notes/randomness.html#reproducibility
torch.manual_seed(123)
print("gpu available:", torch.cuda.is_available())
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

gpu available: False



In [11]:
# Define hyperparameters
LEARNING_RATE = 0.0002
INPUT_SIZE = 30
HIDDEN_SIZE = 11
OUTPUT_SIZE = 1
NUM_EPOCHS = 2
BATCH_SIZE = 128

MODEL_NAME = 'model.pt'
SCALER_NAME = 'scaler.sav'

In [12]:
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        self.hidden_size = hidden_size
        self.input_size = input_size
        self.output_size = output_size
        
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        self.fc4 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Flatten the input x, keeping the batch dimension
        # Apply ReLU after each of the first three linear layers
        # The last layer returns raw logits (no activation); the sigmoid is
        # applied later by BCEWithLogitsLoss and at prediction time

        x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x  # Return x (logits)
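
With the hyperparameters defined earlier (`INPUT_SIZE = 30`, `HIDDEN_SIZE = 11`, `OUTPUT_SIZE = 1`), the network is small. The arithmetic below counts its trainable parameters, using the fact that a fully connected layer with a bias has `n_in * n_out + n_out` of them. The helper `linear_params` is a hypothetical name used only for this calculation.

```python
def linear_params(n_in, n_out):
    # Weight matrix entries plus one bias per output unit
    return n_in * n_out + n_out

# Layer widths of Net: input -> fc1 -> fc2 -> fc3 -> fc4
layer_sizes = [30, 11, 11, 11, 1]
per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer)     # [341, 132, 132, 12]
print(total_params)  # 617
```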

In [13]:
def binary_acc(y_pred, y_test):
    """
    Calculates the accuracy of the predicted values y_pred in comparison to y_test.
    :returns: the accuracy
    """
    predictions = torch.round(torch.sigmoid(y_pred))

    correct_results_sum = (predictions == y_test).sum().float()
    acc = correct_results_sum / y_test.shape[0]
    acc = torch.round(acc * 100)
    
    return acc

In [14]:
from tqdm.notebook import tqdm

def train_neural_network_pytorch(net, train_loader, optimizer, criterion, num_epochs):
    """
    Function for training the PyTorch network.
    
    :param net: the neural network object
    :param train_loader: DataLoader yielding (inputs, labels) batches of training data
    :param optimizer: PyTorch optimizer instance
    :param criterion: PyTorch loss function
    :param num_epochs: number of passes over the training data
    """    
    net.train()  # Before training, set the network to training mode
    
    for epoch in range(num_epochs):
        epoch_loss = 0.0
        epoch_acc = 0.0
        for batch_idx, (inputs, labels) in enumerate(tqdm(train_loader)):
            inputs = inputs.to(device)
            labels = labels.unsqueeze(1).to(device)

            # 1. Zero parameter gradients
            # 2. Forward
            # 3. Compute loss
            # 4. Backward
            # 5. Update step

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            # calculate current accuracy
            acc = binary_acc(outputs, labels)
            loss.backward()
            optimizer.step()

            epoch_loss += loss.item()
            epoch_acc += acc.item()
            if batch_idx % 1000 == 999:
                print(f'Loss: {loss.item():.5f}')
            
        print(f'Epoch {epoch}: | Loss: {epoch_loss/len(train_loader):.5f} | Acc: {epoch_acc/len(train_loader):.3f}')
        
    print('Finished Training')

In [15]:
# Initialize the network
net = Net(INPUT_SIZE, HIDDEN_SIZE, OUTPUT_SIZE)
net = net.to(device)

# Define the loss criterion and the training algorithm
criterion = nn.BCEWithLogitsLoss().to(device)  # binary cross entropy

# Use the Adam optimizer instead of plain SGD:
# in our runs it converged faster to a (nearly) optimal solution.
# Adam keeps its own running moment estimates, so no separate momentum argument is set.
optimizer = optim.Adam(net.parameters(), lr=LEARNING_RATE)
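
Working on logits rather than probabilities is numerically safer. The sketch below contrasts a naive "sigmoid, then cross-entropy" computation with a standard stable rewriting, `max(x, 0) - x*y + log(1 + exp(-|x|))`. This is a textbook formulation for a single logit, shown under the assumption that it mirrors the idea behind `BCEWithLogitsLoss`. It is not PyTorch's actual implementation, and both function names are hypothetical.

```python
import math

def bce_naive(logit, target):
    # Sigmoid first, then cross-entropy: breaks down when p rounds to 0 or 1
    p = 1 / (1 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def bce_with_logits(logit, target):
    # Algebraically equivalent form that never takes log(0)
    return max(logit, 0) - logit * target + math.log1p(math.exp(-abs(logit)))

# Both agree for moderate logits
print(abs(bce_naive(2.0, 1) - bce_with_logits(2.0, 1)) < 1e-12)  # True

# A confident but wrong prediction: the naive version fails, the stable one does not
try:
    bce_naive(40.0, 0)
except ValueError as err:
    print("naive version failed:", err)
stable_loss = bce_with_logits(40.0, 0)
print(stable_loss)  # 40.0
```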

In [16]:
import torch
from torch.utils.data import Dataset

class trainData(Dataset):
    """
    Minimal torch Dataset that wraps feature and label tensors so a DataLoader can batch them.
    """
    
    def __init__(self, X_data, y_data):
        self.X_data = X_data
        self.y_data = y_data
        
    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]
        
    def __len__ (self):
        return len(self.X_data)

In [17]:
from torch.utils.data import DataLoader

def train_nn():
    """
    Preprocesses data, transforms data to torch datasets and starts training the NN-model.
    """
    X, y = SMOTE(random_state=12).fit_resample(X_train.values, y_train.values.ravel())

    scaler = StandardScaler()
    X = scaler.fit_transform(X, y)

    train_data = trainData(torch.FloatTensor(X), 
                           torch.FloatTensor(y))
    
    train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
    
    # Train the PyTorch network
    train_neural_network_pytorch(net, train_loader, optimizer, criterion, NUM_EPOCHS)
    
    return scaler
    
scaler = train_nn()

Loss: 0.15573
Loss: 0.05282

Epoch 0: | Loss: 0.19045 | Acc: 92.044


Loss: 0.01943
Loss: 0.05768

Epoch 1: | Loss: 0.06137 | Acc: 97.753
Finished Training
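
Unlike random oversampling, SMOTE does not duplicate fraud rows. It creates synthetic ones by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows that single step in pure Python. The helper `smote_like_sample`, the toy points and the choice `k=2` are assumptions for illustration, and the imbalanced-learn implementation differs in detail.

```python
import random

def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and a near neighbour."""
    if rng is None:
        rng = random.Random(0)
    base_idx = rng.randrange(len(minority))
    base = minority[base_idx]

    # k nearest minority neighbours of the base point (squared Euclidean distance)
    others = [p for i, p in enumerate(minority) if i != base_idx]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])

    # Random position on the line segment from base to neighbour
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]

# Toy minority class (assumption): four fraud samples with two features
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
rng = random.Random(12)
synthetic = [smote_like_sample(minority, rng=rng) for _ in range(5)]
for point in synthetic:
    print(point)
```

Every synthetic point lies on a segment between two real fraud samples, so the oversampled class fills in the region between known fraud cases instead of repeating identical rows.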


In [18]:
# save standardScaler
save_model(scaler, filename=SCALER_NAME)
# save nn-model
torch.save(net, MODEL_NAME)

## Implementing the Predict Function
You will have to implement the predict function that we will run in order to check the performance of your model on a secret test set. The better you perform, the higher you'll be on the leaderboard! Our solution loads the saved scaler and the trained neural network and uses them to make predictions based on the input values.

Note that this predict function should return the value of the decision function of your model for each transaction in `values`. In the probabilistic case, these are the probabilities that the input values are of target class 1 (i.e. fraud). The higher the value of the decision function, the more likely that a transaction is fraudulent.

In [19]:
from sklearn.preprocessing import StandardScaler

# StandardScaler for scaling the data (preprocessing)
scaler = load_model(filename=SCALER_NAME)
net = torch.load(MODEL_NAME, map_location='cpu')

def leader_board_predict_fn(values):
    
    decision_function_values = np.zeros(values.shape[0])
    
    # YOUR CODE HERE
    X = scaler.transform(values.values.astype(np.float32))
    
    net.eval()
    
    # Forward pass: the network returns one raw logit per transaction
    X = torch.tensor(X)
    logits = net(X)
    # The sigmoid maps logits to probabilities; rounding thresholds them at 0.5,
    # so this function returns hard 0/1 labels rather than probabilities
    predictions = torch.round(torch.sigmoid(logits))
    
    decision_function_values = predictions.int().numpy()
    decision_function_values = decision_function_values.ravel()
    
    return decision_function_values
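
ROC AUC measures how well scores rank fraudulent above legitimate transactions, so rounding to 0/1 discards ranking information. The pure-Python sketch below compares the AUC of raw probabilities with the AUC of their rounded versions on made-up toy values. The helper `roc_auc` is a hypothetical pairwise implementation, not scikit-learn's `roc_auc_score`.

```python
def roc_auc(y_true, scores):
    """Fraction of (fraud, legitimate) pairs ranked correctly; ties count as half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy values (assumption): the second fraud case gets probability 0.40
y_true = [0, 0, 0, 0, 1, 1]
probabilities = [0.05, 0.10, 0.20, 0.30, 0.40, 0.90]
hard_labels = [round(p) for p in probabilities]

auc_probabilities = roc_auc(y_true, probabilities)
auc_hard_labels = roc_auc(y_true, hard_labels)
print(auc_probabilities)  # 1.0
print(auc_hard_labels)    # 0.75
```

In this toy case the probabilities rank both fraud cases above all legitimate ones, while the rounded labels tie one fraud case with every legitimate transaction. A possible variant of `leader_board_predict_fn`, stated here as an assumption and not as the recorded solution, would return `torch.sigmoid(logits)` computed inside `torch.no_grad()` instead of the rounded values.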

In [20]:
# UNCOMMENT the cell and comment the cell above if you want to test LogisticRegression

# from sklearn.preprocessing import StandardScaler

# # StandardScaler for scaling the data (preprocessing)
# scaler = load_model(filename=SCALER_NAME)
# clf = load_model()

# def leader_board_predict_fn(values):
#     # YOUR CODE HERE
#     values = scaler.transform(values)
#     return clf.predict(values)

## Testing the Classifier
To measure the classifier's performance on the test set, we will use the [ROC AUC score](https://scikit-learn.org/stable/modules/model_evaluation.html#roc-metrics). The best possible score is 1.

In [21]:
### LEADER BOARD TEST
from sklearn.metrics import roc_auc_score
score = roc_auc_score(y_test, leader_board_predict_fn(X_test))
print(f"Leaderboard Score: {score}")
### LEADER BOARD TEST

Leaderboard Score: 0.94423409234468
