# Question 4: Model (Cont'd)

Fraud is a problem for any bank. It can take many forms, from someone stealing a single credit card, to large batches of stolen credit card numbers being used on the web, to a mass compromise of credit card numbers stolen from a merchant via tools like credit card skimming devices.

Each of the transactions in the dataset has a field called isFraud. Please build a predictive model to determine whether a given transaction will be fraudulent or not. Use as much of the data as you like (or all of it).

Provide an estimate of performance using an appropriate sample, and show your work.

Please explain your methodology (the modeling algorithm or method used and why, what features or data you found useful, what questions you have, and what you would do next with more time).

# Train models

This report trains five models: `decision tree`, `logistic regression`, `random forest`, `svm`, and `xgboost` (see the `.py` scripts in the `models` folder).

Since training the models takes 1-2 days, I ran this step from a terminal. The trained models were saved in the `results` folder as `dec_tree`, `log_reg`, `rf`, `svc`, and `xgboost`.

In [None]:
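import pickle
from models import decision_tree, logistic_reg, random_forest, svm, xgboost

def train():
    """Train every model on the preprocessed data and save the results."""

    # Load the preprocessed dataset produced by the earlier steps.
    with open('results/preprocessed_data', 'rb') as file:
        data = pickle.load(file)

    # Run each model script; the value each one returns is stored under a short key.
    results = {
        'dec_tree': decision_tree.run(data),
        'log_reg': logistic_reg.run(data),
        'rf': random_forest.run(data),
        'svm': svm.run(data),
        'xgboost': xgboost.run_optimal(data)
    }

    # Save all results in a single file so they can be reloaded without retraining.
    with open('results/result_dict', 'wb') as file:
        pickle.dump(results, file, protocol=4)

    print(results)

if __name__ == '__main__':
    train()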
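
Because training is slow, later steps will most likely reload the saved results instead of rerunning `train()`. The following is a minimal sketch of that save-and-load round trip, not the code used in this report. The helper names `save_results` and `load_results` are hypothetical, the dictionary holds placeholder values rather than real model output, and a temporary directory stands in for the `results` folder so the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def save_results(results, path):
    """Serialize a results dictionary with pickle protocol 4, as train() does."""
    with open(path, 'wb') as file:
        pickle.dump(results, file, protocol=4)


def load_results(path):
    """Read a results dictionary back from disk."""
    with open(path, 'rb') as file:
        return pickle.load(file)


# Placeholder values only; the real entries are whatever each run() returns.
example_results = {
    'dec_tree': None,
    'log_reg': None,
    'rf': None,
    'svm': None,
    'xgboost': None,
}

# A temporary directory stands in for the `results` folder (assumption).
with tempfile.TemporaryDirectory() as tmp_dir:
    result_path = Path(tmp_dir) / 'result_dict'
    save_results(example_results, result_path)
    loaded = load_results(result_path)

# List the model keys that survived the round trip.
print(sorted(loaded))
```

In practice the path would be `results/result_dict`. Since `pickle.load` can execute arbitrary code while deserializing, it should only be used on files you created yourself or otherwise trust.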
