# Credit Card Fraud Detection

Download the dataset from Kaggle:

https://www.kaggle.com/mlg-ulb/creditcardfraud

# Dataset description

The dataset contains transactions made by credit cards in September 2013 by European cardholders.
It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
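
The imbalance figure can be checked directly from the two counts given above. The same arithmetic shows why plain accuracy is a weak yardstick for this data: a model that labels every transaction as legitimate is already right well over 99% of the time. A minimal check (the variable names are illustrative):

```python
# Counts quoted in the dataset description.
n_transactions = 284_807
n_frauds = 492

# Share of the positive (fraud) class.
fraud_rate = n_frauds / n_transactions
print(f"fraud rate: {fraud_rate:.3%}")

# Accuracy of a trivial baseline that always predicts class 0 (no fraud).
baseline_accuracy = 1 - fraud_rate
print(f"always-0 baseline accuracy: {baseline_accuracy:.3%}")
```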

It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. The feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning.


### Feature 'Class' is the response variable and it takes value 1 in case of fraud and 0 otherwise.

# Workflow

1. Load the data.

2. Check for missing values (if any exist, fill each one with the mean of its feature).

3. Standardize the input variables.

4. Split into 50% training (samples, labels), 30% test (samples, labels) and 20% validation (samples, labels) data.

5. Model: input layer (number of features), 3 hidden layers with 10, 8 and 6 units using relu or tanh activation (choose by experiment), and an output layer.

6. Compilation step (note: this is a binary classification problem, so select the loss and metrics accordingly).

7. Train the model for 100 epochs.

8. If the model overfits, tune it by changing the units, number of layers or epochs, or by adding a dropout layer or a regularizer as needed.

9. Prediction accuracy should be > 92%.

10. Evaluation step.

11. Prediction.
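
The split in step 4 can be sketched with integer arithmetic alone. The helper below is hypothetical (its name and return value are not part of the notebook); it turns a row count into 50% training, 20% validation and 30% test sizes, and gives any rounding remainder to the test set.

```python
def split_sizes(n_rows):
    """Return (train, validation, test) row counts for a 50/20/30 split."""
    train_and_val = n_rows * 7 // 10      # 70% used for fitting and tuning
    n_train = n_rows * 5 // 10            # 50% for training proper
    n_val = train_and_val - n_train       # the remaining ~20% for validation
    n_test = n_rows - train_and_val       # everything left (~30%) for testing
    return n_train, n_val, n_test


n_train, n_val, n_test = split_sizes(1000)
print(n_train, n_val, n_test)  # 500 200 300
```

Because the three counts always sum to `n_rows`, slicing a shuffled dataset at these offsets uses every row exactly once.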


# Task

## Identify fraudulent credit card transactions.

In [6]:
from google.colab import files 

uploaded = files.upload()


Saving creditcard.csv to creditcard (1).csv


In [7]:
import pandas as pd
import numpy as np

In [8]:
dataframe = pd.read_csv('creditcard.csv')
dataframe.shape

(5052, 31)

In [9]:
dataframe.isnull().any()

Time      False
V1        False
V2        False
V3        False
V4        False
V5        False
V6        False
V7        False
V8        False
V9        False
V10       False
V11       False
V12       False
V13       False
V14        True
V15        True
V16        True
V17        True
V18        True
V19        True
V20        True
V21        True
V22        True
V23        True
V24        True
V25        True
V26        True
V27        True
V28        True
Amount     True
Class      True
dtype: bool
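
The output above reports missing values in `V14` through `Class`, so step 2 of the workflow applies. The following is a sketch rather than the notebook's own code. It assumes that rows without a label are unusable and drops them, then fills any remaining gaps in the feature columns with that column's mean. The name `fill_missing` is hypothetical, and the small frame is made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_missing(df, label_col="Class"):
    """Drop unlabeled rows, then mean-fill missing feature values."""
    # A mean-filled label would no longer be 0 or 1, so drop those rows instead.
    cleaned = df.dropna(subset=[label_col]).copy()
    feature_cols = cleaned.columns.drop(label_col)
    # DataFrame.mean skips NaN by default, so each mean uses observed values only.
    cleaned[feature_cols] = cleaned[feature_cols].fillna(cleaned[feature_cols].mean())
    return cleaned


# Toy data, made up for illustration.
toy = pd.DataFrame({
    "V1": [1.0, np.nan, 3.0, 4.0],
    "Amount": [10.0, 20.0, np.nan, 5.0],
    "Class": [0.0, 0.0, 1.0, np.nan],
})
filled = fill_missing(toy)
print(filled)
print(filled.isnull().any().any())
```

In this notebook, such a step would run on `dataframe` before the rows are shuffled and split.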

In [10]:
dataframe.head().T

             0         1         2         3         4
Time       0.0       0.0       1.0       1.0       2.0
V1   -1.359807  1.191857 -1.358354 -0.966272 -1.158233
V2   -0.072781  0.266151 -1.340163 -0.185226  0.877737
V3    2.536347   0.16648  1.773209  1.792993  1.548718
V4    1.378155  0.448154   0.37978 -0.863291  0.403034
V5   -0.338321  0.060018 -0.503198 -0.010309 -0.407193
V6    0.462388 -0.082361  1.800499  1.247203  0.095921
V7    0.239599 -0.078803  0.791461  0.237609  0.592941
V8    0.098698  0.085102  0.247676  0.377436 -0.270533
V9    0.363787 -0.255425 -1.514654 -1.387024  0.817739


In [11]:
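# Shuffle the rows. np.random.permutation returns a plain NumPy array,
# so the DataFrame is rebuilt with the original column names.
dataframe_shuffled = pd.DataFrame(np.random.permutation(dataframe), columns = dataframe.columns)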

In [12]:
target_labels = dataframe_shuffled['Class'].copy()

In [13]:
data = dataframe_shuffled.drop('Class', axis = 1).copy()

In [14]:


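# 70% of the rows for training (split later into 50% training + 20% validation),
# the remaining 30% for testing.
train_len = len(dataframe)*7//10
test_len = len(dataframe) - train_len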

In [15]:
train_data = data[0 : train_len].copy()
test_data = data[train_len : train_len + test_len].copy()

In [16]:
train_labels = target_labels[0 : train_len].copy()
test_labels = target_labels[train_len : train_len + test_len].copy()

In [17]:


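# Standardize with statistics computed on the training rows only,
# then apply the same shift and scale to the test rows.
mean = np.mean(train_data, axis = 0)
std = np.std(train_data, axis = 0)

train_data -= mean
train_data /= std

test_data -= mean
test_data /= std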

In [18]:
train_data = np.array(train_data).astype('float32')
train_labels = np.array(train_labels).astype('float32')

In [19]:
test_data = np.array(test_data).astype('float32')
test_labels = np.array(test_labels).astype('float32')

In [20]:
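# Of the 70% training rows, 50% of the full dataset is used for fitting
# and the remaining 20% is held out for validation.
len_partial_train = len(dataframe)*5//10
len_validation = train_len - len_partial_train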


In [21]:
partial_train_data = train_data[0 : len_partial_train].copy()
val_data = train_data[len_partial_train : ].copy()

In [22]:
partial_train_labels = train_labels[0 : len_partial_train].copy()
val_labels = train_labels[len_partial_train : ].copy()

In [23]:
partial_train_data.shape

(2526, 30)

In [24]:
data.shape

(5052, 30)

In [25]:
train_data.shape

(3536, 30)

In [26]:
from keras import models 
from keras import layers
from keras import regularizers

modelCr = models.Sequential()

modelCr.add(layers.Dense(32, activation = 'relu', input_shape = (train_data.shape[-1],)))

modelCr.add(layers.Dense(10, activation = 'relu'))

modelCr.add(layers.Dense(8, activation = 'relu'))

modelCr.add(layers.Dense(6, activation = 'relu'))

modelCr.add(layers.Dense(1, activation = 'sigmoid'))

In [28]:
from keras import metrics
import tensorflow as tf
import keras


modelCr.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

In [29]:
modelCr.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 32)                992       
_________________________________________________________________
dense_1 (Dense)              (None, 10)                330       
_________________________________________________________________
dense_2 (Dense)              (None, 8)                 88        
_________________________________________________________________
dense_3 (Dense)              (None, 6)                 54        
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 7         
Total params: 1,471
Trainable params: 1,471
Non-trainable params: 0
_________________________________________________________________


In [30]:
modelCr.fit(partial_train_data,
            partial_train_labels,
            epochs = 20,
            batch_size = 512,
            verbose = 1,
            validation_data = (val_data,val_labels))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fbd613ee7d0>

In [31]:

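# Rebuild the model from scratch and retrain it on the full training set
# (partial training + validation rows) before evaluating on the test set.
modelCr = models.Sequential()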

modelCr.add(layers.Dense(32, activation = 'relu', input_shape = (train_data.shape[-1],)))

modelCr.add(layers.Dense(10, activation = 'relu'))
modelCr.add(layers.Dense(8, activation = 'relu'))
modelCr.add(layers.Dense(6, activation = 'relu'))

modelCr.add(layers.Dense(1, activation = 'sigmoid'))


modelCr.compile(optimizer = 'rmsprop',
                loss = 'binary_crossentropy',
                metrics = ['accuracy'])

modelCr.fit(train_data, train_labels, epochs = 5, batch_size = 512, verbose = 0)

modelCr.evaluate(test_data, test_labels)



[nan, 0.9993403553962708]
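
A `nan` loss like the one above is consistent with missing values reaching the model, since a single NaN in the inputs or labels propagates through the loss. One way to check, sketched here with a hypothetical helper `count_nan_rows` and made-up arrays, is to count the affected samples before calling `fit` or `evaluate`:

```python
import numpy as np


def count_nan_rows(features, labels):
    """Count samples whose feature vector or label contains NaN."""
    features = np.asarray(features, dtype="float32")
    labels = np.asarray(labels, dtype="float32")
    bad_features = np.isnan(features).any(axis=1)   # one flag per row
    bad_labels = np.isnan(labels)
    return int(np.sum(bad_features | bad_labels))


# Made-up arrays for illustration: the second row has a missing feature,
# the third row has a missing label.
x = np.array([[0.1, 0.2], [np.nan, 0.5], [0.3, 0.4]])
y = np.array([0.0, 1.0, np.nan])
print(count_nan_rows(x, y))  # 2
```

A non-zero count for `train_data`/`train_labels` or `test_data`/`test_labels` would point to the missing values found earlier by `dataframe.isnull().any()`.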