# Paysim

From Kaggle's **[Synthetic Financial Datasets For Fraud Detection](https://www.kaggle.com/ntnu-testimon/paysim1)** page:

>PaySim simulates mobile money transactions based on a sample of real transactions extracted from one month of financial logs from a mobile money service implemented in an African country. The original logs were provided by a multinational company, who is the provider of the mobile financial service which is currently running in more than 14 countries all around the world.

>This synthetic dataset is scaled down 1/4 of the original dataset and it is created just for Kaggle.

The following notebook rewrites the original using the `mlrun` package. At the end of this demonstration, we will have a **deployed fraud detection service**.

## Data Description

**A sample row, followed by an explanation of each column:**

1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0

**step** - maps a unit of time in the real world; here, 1 step is 1 hour. There are 744 steps in total, which at 24 steps per day spans 31 days (the Kaggle description calls this a 30-day simulation).

**type** - one of CASH_IN, CASH_OUT, DEBIT, PAYMENT, or TRANSFER.

**amount** - amount of the transaction in local currency.

**nameOrig** - customer who started the transaction

**oldbalanceOrg** - initial balance before the transaction

**newbalanceOrig** - new balance after the transaction

**nameDest** - customer who is the recipient of the transaction

**oldbalanceDest** - recipient's initial balance before the transaction. Note that there is no information for recipients whose names start with M (merchants).

**newbalanceDest** - recipient's new balance after the transaction. Note that there is no information for recipients whose names start with M (merchants).

**isFraud** - marks transactions made by the fraudulent agents inside the simulation. In this dataset, the fraudulent agents aim to profit by taking control of customers' accounts and trying to empty them by transferring the funds to another account and then cashing out of the system.

**isFlaggedFraud** - The business model aims to control massive transfers from one account to another and flags illegal attempts. In this dataset, an illegal attempt is an attempt to transfer more than 200,000 in a single transaction.
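The column descriptions above can be made concrete with a small standard-library sketch that parses the sample row into a typed record. The names `Transaction`, `parse_row`, `step_to_day_hour`, `dest_is_merchant`, and `exceeds_flag_threshold` are illustrative and not part of the dataset or of `mlrun`. The mapping of step 1 to hour 0 of day 1 is an assumption, and the threshold check only restates the rule as described, so it may not reproduce the dataset's `isFlaggedFraud` column exactly.

```python
import csv
from dataclasses import dataclass
from typing import Tuple

SAMPLE_ROW = "1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0"

# Threshold quoted in the isFlaggedFraud description.
FLAG_THRESHOLD = 200_000.0

# One converter per column, in the order the columns appear in the CSV.
CONVERTERS = (int, str, float, str, float, float, str, float, float, int, int)


@dataclass
class Transaction:
    step: int
    type: str
    amount: float
    nameOrig: str
    oldbalanceOrg: float
    newbalanceOrig: float
    nameDest: str
    oldbalanceDest: float
    newbalanceDest: float
    isFraud: int
    isFlaggedFraud: int


def parse_row(line: str) -> Transaction:
    """Parse one comma-separated row into a typed Transaction."""
    values = next(csv.reader([line]))
    if len(values) != len(CONVERTERS):
        raise ValueError(f"expected {len(CONVERTERS)} fields, got {len(values)}")
    return Transaction(*(conv(v) for conv, v in zip(CONVERTERS, values)))


def step_to_day_hour(step: int) -> Tuple[int, int]:
    """Map a 1-based hourly step to (day, hour of day).

    Assumption: step 1 is hour 0 of day 1.
    """
    return (step - 1) // 24 + 1, (step - 1) % 24


def dest_is_merchant(tx: Transaction) -> bool:
    """Recipients whose names start with 'M' are merchants (no balance info)."""
    return tx.nameDest.startswith("M")


def exceeds_flag_threshold(tx: Transaction) -> bool:
    """Restate the described rule: a TRANSFER of more than 200,000."""
    return tx.type == "TRANSFER" and tx.amount > FLAG_THRESHOLD


tx = parse_row(SAMPLE_ROW)
print(tx)
print("day/hour of last step:", step_to_day_hour(744))
print("merchant recipient:", dest_is_merchant(tx))
print("over transfer threshold:", exceeds_flag_threshold(tx))
```

Under the stated assumption, step 744 falls on day 31, hour 23.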

In [2]:
from os import path, makedirs
import pandas as pd
import numpy as np
import pyarrow.parquet as pq
import pyarrow as pa

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns

In [3]:
target_path = path.join('/v3io', 'bigdata', 'parquet', 'paysim')

# Create the target directory if it does not already exist
makedirs(target_path, exist_ok=True)

### Timing a pandas read and a Parquet write

In [3]:
%%time
paysim = pd.read_csv('/v3io/bigdata/csv/PS_20174392719_1491204439457_log.csv.zip')

CPU times: user 11.6 s, sys: 1.47 s, total: 13.1 s
Wall time: 13.2 s


In [4]:
paysim.head()

| | step | type | amount | nameOrig | oldbalanceOrg | newbalanceOrig | nameDest | oldbalanceDest | newbalanceDest | isFraud | isFlaggedFraud |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | PAYMENT | 9839.64 | C1231006815 | 170136.0 | 160296.36 | M1979787155 | 0.0 | 0.0 | 0 | 0 |
| 1 | 1 | PAYMENT | 1864.28 | C1666544295 | 21249.0 | 19384.72 | M2044282225 | 0.0 | 0.0 | 0 | 0 |
| 2 | 1 | TRANSFER | 181.0 | C1305486145 | 181.0 | 0.0 | C553264065 | 0.0 | 0.0 | 1 | 0 |
| 3 | 1 | CASH_OUT | 181.0 | C840083671 | 181.0 | 0.0 | C38997010 | 21182.0 | 0.0 | 1 | 0 |
| 4 | 1 | PAYMENT | 11668.14 | C2048537720 | 41554.0 | 29885.86 | M1230701703 | 0.0 | 0.0 | 0 | 0 |


In [7]:
%%time
pq.write_table(
    pa.Table.from_pandas(paysim),
    path.join(target_path, 'paysim'))

CPU times: user 3.64 s, sys: 442 ms, total: 4.08 s
Wall time: 5.66 s


### Timing a Parquet read

In [4]:
%%time
paysim = pq.read_table(path.join(target_path, 'paysim')).to_pandas()

CPU times: user 5.37 s, sys: 2.52 s, total: 7.88 s
Wall time: 6.35 s
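The `%%time` cell magic used above is only available in IPython and Jupyter. As a hedged sketch of how similar CPU and wall-clock readings could be taken in a plain script, the hypothetical `timed` context manager below uses only the standard library; the summation inside the `with` block is a stand-in for a call such as `pd.read_csv(...)` or `pq.read_table(...)`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print CPU and wall-clock time for the enclosed block.

    If `results` is a dict, the timings are also stored under `label`.
    """
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    try:
        yield
    finally:
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        if results is not None:
            results[label] = {"wall": wall, "cpu": cpu}
        print(f"{label}: CPU {cpu:.2f} s, wall {wall:.2f} s")


timings = {}
with timed("sum of squares", timings):
    # Stand-in workload; replace with the read or write being measured.
    total = sum(i * i for i in range(100_000))
```

Collecting the readings in a dictionary makes it easy to compare the CSV read, Parquet write, and Parquet read side by side after a run.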


In [5]:
paysim.dtypes

step                int64
type               object
amount            float64
nameOrig           object
oldbalanceOrg     float64
newbalanceOrig    float64
nameDest           object
oldbalanceDest    float64
newbalanceDest    float64
isFraud             int64
isFlaggedFraud      int64
dtype: object

In [6]:
paysim.isnull().values.any()

False
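`isnull().values.any()` collapses the whole frame into a single boolean. When that check returns `True`, a per-column count is usually the next thing to look at. The sketch below shows both on a tiny made-up frame (the `sample` data is hypothetical and deliberately contains missing values, unlike `paysim`).

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with missing values, standing in for `paysim`.
sample = pd.DataFrame({
    "amount": [181.0, np.nan, 11668.14],
    "type": ["TRANSFER", "CASH_OUT", None],
    "isFraud": [1, 1, 0],
})

# Single yes/no answer, as in the cell above.
has_nulls = bool(sample.isnull().values.any())

# Number of missing values in each column.
null_counts = sample.isnull().sum()

# Names of the columns that contain at least one missing value.
columns_with_nulls = null_counts[null_counts > 0].index.tolist()

print(has_nulls)
print(null_counts)
print(columns_with_nulls)
```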

In [7]:
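# numeric_only=True keeps the string columns (type, nameOrig, nameDest) out of the sum
paysim.groupby('step').sum(numeric_only=True)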

step,amount,oldbalanceOrg,newbalanceOrig,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
1,2.854292e+08,2.392695e+09,2.449811e+09,2.019706e+09,2.670425e+09,16,0
2,8.592160e+07,1.207902e+09,1.230152e+09,9.204119e+08,9.555345e+08,8,0
3,4.329388e+07,4.259244e+08,4.431034e+08,5.542691e+08,5.603481e+08,4,0
4,7.291003e+07,6.151320e+08,6.100317e+08,7.106773e+08,7.800897e+08,10,0
5,4.554809e+07,7.416566e+08,7.632521e+08,7.214833e+08,7.208779e+08,6,0
...,...,...,...,...,...,...,...
739,1.658783e+07,1.658783e+07,0.000000e+00,8.510574e+06,1.680449e+07,10,0
740,7.632964e+06,7.632964e+06,0.000000e+00,2.930014e+06,6.746496e+06,6,0
741,8.782899e+07,1.705272e+08,8.837274e+07,4.420524e+06,4.549775e+07,22,1
742,1.432374e+07,1.432374e+07,0.000000e+00,1.875033e+06,9.036900e+06,14,0
