<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Financial-Transactions" data-toc-modified-id="Financial-Transactions-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Financial Transactions</a></span><ul class="toc-item"><li><span><a href="#The-Leaderboard-Predict-function" data-toc-modified-id="The-Leaderboard-Predict-function-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>The Leaderboard Predict function</a></span></li><li><span><a href="#Testing-your-Implementation" data-toc-modified-id="Testing-your-Implementation-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Testing your Implementation</a></span></li></ul></li></ul></div>

# Financial Transactions

The ability to identify fraudulent transactions is of great interest to the payments industry. In this notebook, you will make use of the binary classifier you trained on the transcations dataset to detect fraud.

## The Leaderboard Predict function
Replace the comment and `NotImplementedError` in the `leader_board_predict_fn` with code that loads your model parameters and returns the likelyhood of fraud for each transaction (i.e. row) in the values dataframe. Note that the returned array should contain a single decision function value for each transaction, indicating whether the transaction is fraudulent (i.e. it belongs to target class $1$). The higher the decision function value, the more likely that the transaction is fraud.
You can import the packages you require.

In [1]:
# Added Cell
import pickle

def save_model(model, filename='model.sav'):
    pickle.dump(model, open(filename, 'wb'))
    
def load_model(filename='model.sav'):
    model = pickle.load(open(filename, 'rb'))
    return model

In [2]:
# Define hyperparameters
LEARNING_RATE = 0.0002
MOMENTUM = 0.92
INPUT_SIZE = 30
HIDDEN_SIZE = 11
OUTPUT_SIZE = 1
NUM_EPOCHS = 10
BATCH_SIZE = 128

MODEL_NAME = 'model.pt'
SCALER_NAME = 'scaler.sav'

In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        self.hidden_size = hidden_size
        self.input_size = input_size
        self.output_size = output_size
        
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        self.fc4 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Flatten the input x keeping the batch dimension the same
        # Use the relu activation on the output of self.fc1(x)
        # Use the relu activation on the output of self.fc2(x)
        # Use the relu activation on the output of self.fc3(x)
        # Pass x through fc4 but do not apply any activation function
        
        x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x  # Return x (logits)

In [4]:
class Classifier:
    """
    Basic classifier defining methods train and predict.
    This class does no classifications. Use Sub-classes instead
    """
    def train(self, X, y):
        pass

    def predict(self, X):
        pass

In [5]:
class LogisticRegression(Classifier):

    def __init__(self):
        self.w = None

    def sigmoid(self, z):
        """
        Function that computes the sigmoid of the input values.

        :param z: input values
        :returns: sigmoid values for each input value
        """

        return 1 / (1 + np.exp(-z))

    def train(self, X, y):
        print("Train in _training")

    def predict(self, x):
        """
        Assign input to a class using the logistic regression model.

        :param: w: Weight vector
        :param: x: Test input data
        :returns: Predicted class labels (0 or 1)
        """

        if self.w is None:
            print("Weights not specified. Train the model first")
            return

        f_x = self.sigmoid(x @ self.w)
        predictions = np.round(f_x)

        return predictions

In [6]:
from sklearn.preprocessing import StandardScaler

# standardscalar for scaling the data (preprocessing)
scaler = load_model(filename=SCALER_NAME)
net = torch.load(MODEL_NAME, map_location='cpu')

def leader_board_predict_fn(values):
    
    decision_function_values = np.zeros(values.shape[0])
    
    # YOUR CODE HERE
    X = scaler.transform(values.values.astype(np.float32))
    
    net.eval()
    
    # Make predictions (class 0 or 1) using the learned parameters
    
    # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
    X = torch.tensor(X)
    logits = net(X)
    # class 0 if < 0.5, class 1 if >= 0.5 and <= 1
    predictions = torch.round(torch.sigmoid(logits))
    
    decision_function_values = predictions.int().numpy()
    decision_function_values = decision_function_values.ravel()
    
    return decision_function_values

In [7]:
# UNCOMMENT the cell and comment the cell above if you want to test LogisticRegression

# from sklearn.preprocessing import StandardScaler

# # standardscalar for scaling the data (preprocessing)
# scaler = load_model(filename=SCALER_NAME)
# clf = load_model()

# def leader_board_predict_fn(values):
#     # YOUR CODE HERE
#     values = scaler.transform(values)
#     return clf.predict(values)

## Testing your Implementation
Your model should return the probability or decision function value that indicates the likelyhood of fraud for each input transaction. To verify that this is the case, we run your model on a subset of the transactions dataset it was trained on. There is a hidden cell that performs the actual test on the unseen test set and computes your score for the leaderboard using the [ROC AUC](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html) score.

In [8]:
import pandas as pd
import numpy as np
import os
from sklearn.metrics import roc_auc_score
import pathlib

try:
    path = "/data/mlproject21" if os.path.exists("/data/mlproject21") else "."
    test_data = pd.read_csv(os.path.join(path, "transactions.csv.zip"))
    X_test = test_data.drop(columns = "Class")
    y_test = test_data["Class"]
    decision_function_values = leader_board_predict_fn(X_test)
    assert decision_function_values.shape == (X_test.shape[0],)
    dataset_score = roc_auc_score(y_test, decision_function_values)
    assert dataset_score >= 0.0 and dataset_score <= 1.0
except Exception:
    dataset_score = np.float("nan")
print(f"Train Dataset Score: {dataset_score}")


Train Dataset Score: 0.949058643679086
