# Bank Note Neural Network Project

This project uses a bank note data set to build a model that distinguishes authentic dollar bills from fake ones. This is a binary classification problem, because we are predicting whether a dollar bill is real or fake. The data set comes with 4 features: variance, skewness, curtosis (kurtosis) and entropy. These features are calculated by applying mathematical operations to images of the dollar bills. The labels are found in the dataframe's class column, with 1 representing a fake bill and 0 representing a real bill.

The data set is available on Kaggle at: https://www.kaggle.com/vivekgediya/banknote-authenticationcsv


## Explore the Data

The data is imported below and the first few rows are displayed, followed by summary statistics for each column.

In [16]:
#import the relevant functions
import pandas as pd

#read the data set
data = pd.read_csv('BankNote_Authentication.csv')

#view first few lines of data
data.head()

   variance  skewness  curtosis   entropy  class
0    3.6216    8.6661   -2.8073  -0.44699      0
1    4.5459    8.1674   -2.4586   -1.4621      0
2     3.866   -2.6383    1.9242   0.10645      0
3    3.4566    9.5228   -4.0112   -3.5944      0
4   0.32924   -4.4552    4.5718   -0.9888      0


In [17]:
# Describe the data
print('Dataset stats: \n', data.describe())

Dataset stats: 
           variance     skewness     curtosis      entropy        class
count  1372.000000  1372.000000  1372.000000  1372.000000  1372.000000
mean      0.433735     1.922353     1.397627    -1.191657     0.444606
std       2.842763     5.869047     4.310030     2.101013     0.497103
min      -7.042100   -13.773100    -5.286100    -8.548200     0.000000
25%      -1.773000    -1.708200    -1.574975    -2.413450     0.000000
50%       0.496180     2.319650     0.616630    -0.586650     0.000000
75%       2.821475     6.814625     3.179250     0.394810     1.000000
max       6.824800    12.951600    17.927400     2.449500     1.000000
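
Because the labels are 0 and 1, the mean of the class column in the summary above is also the fraction of bills labelled 1. A minimal sketch of that arithmetic, using only the count and mean printed by describe() (the variable names are illustrative and not part of the notebook):

```python
# Values copied from the describe() output above
n_rows = 1372
class_mean = 0.444606  # mean of a 0/1 column = fraction of rows labelled 1

# Number of rows in each class
n_class_1 = round(n_rows * class_mean)
n_class_0 = n_rows - n_class_1

# Accuracy of always predicting the more common class, a simple baseline
baseline_accuracy = max(n_class_0, n_class_1) / n_rows

print(n_class_1, n_class_0, round(baseline_accuracy, 3))  # prints 610 762 0.555
```

This gives a reference point for any later accuracy figure: a classifier is only useful here if it clearly beats the majority-class baseline of roughly 55.5%.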


## Constructing a Neural Network 

One way to perform this binary classification is to construct a neural network.

The network has an input layer of 4 values, one for each of the features variance, skewness, curtosis and entropy, and a single neuron as its output. The model's output represents the probability that a bill belongs to class 1 (fake). The sigmoid activation function squashes the neuron's weighted sum of the inputs to a floating point number between 0 and 1. The model is then compiled before training, using stochastic gradient descent as the optimizer and binary cross-entropy as the loss function.

In [18]:
# Import the sequential model and dense layer
from keras.models import Sequential
from keras.layers import Dense

# create a sequential model
model = Sequential()

# add the dense layer 
model.add(Dense(1, input_shape=(4,), activation="sigmoid"))

# compile the model
model.compile(loss='binary_crossentropy', optimizer="sgd", metrics=['accuracy'])

# show summary of model
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 1)                 5         
Total params: 5
Trainable params: 5
Non-trainable params: 0
_________________________________________________________________
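
The 5 parameters in the summary are the 4 input weights plus 1 bias of the single output neuron. As a rough sketch of what this layer computes, the NumPy code below reimplements the forward pass and the binary cross-entropy loss by hand. The helper names and the all-zero weights are illustrative assumptions, not values taken from the Keras model, which starts from randomly initialized weights.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Forward pass of one neuron: sigmoid of the weighted sum plus bias."""
    return sigmoid(X @ w + b)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1 - y_true) * np.log(1 - y_prob)))

# First two rows shown by data.head(): variance, skewness, curtosis, entropy
X = np.array([[3.6216, 8.6661, -2.8073, -0.44699],
              [4.5459, 8.1674, -2.4586, -1.4621]])
y = np.array([0.0, 0.0])  # both rows have class 0

w = np.zeros(4)  # hypothetical untrained weights, one per feature
b = 0.0          # hypothetical untrained bias

p = predict_proba(X, w, b)       # with zero weights every probability is 0.5
loss = binary_cross_entropy(y, p)
n_params = w.size + 1            # 4 weights + 1 bias = 5

print(p, loss, n_params)
```

With all-zero weights the neuron is maximally unsure, so the loss equals ln 2 (about 0.693). Training aims to move the weights so that this loss goes down.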


## Binary Classification Problem

The data set is split into a training set and a testing set with the train_test_split function. The class column holds the labels rather than a feature, so it is dropped from the feature matrix X and assigned to y before the data is split.
The model is then trained with the .fit() method for 20 epochs, passing the training features and labels as parameters, so that it learns to tell real bills from fake ones.


In [19]:
#import relevant packages 
import numpy as np
from sklearn.model_selection import train_test_split

# drop the class column and assign the remaining feature columns to X
X = data.drop('class',axis=1)

# assign class column to y 
y = data['class']

# split data set to a train and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# train model for 20 epochs
model.fit(X_train, y_train, epochs = 20)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7f90d23c23a0>
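
To give a feel for what .fit() does with the sgd optimizer, here is a minimal NumPy sketch of mini-batch gradient descent for a single sigmoid neuron. It is not Keras's implementation: the toy data is randomly generated rather than taken from the bank note data set, and the function name train_sgd, the learning rate and the batch size are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p, eps=1e-7):
    """Mean binary cross-entropy with clipping to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def train_sgd(X, y, epochs=20, lr=0.1, batch_size=32, seed=0):
    """Hypothetical helper: mini-batch SGD for one sigmoid neuron."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    losses = []
    for _ in range(epochs):
        order = rng.permutation(len(X))          # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            p = sigmoid(X[idx] @ w + b)
            error = p - y[idx]                   # d(loss)/d(weighted sum)
            w -= lr * (X[idx].T @ error) / len(idx)
            b -= lr * error.mean()
        losses.append(bce(y, sigmoid(X @ w + b)))  # loss after each epoch
    return w, b, losses

# Toy, randomly generated data with 4 features (NOT the bank note data)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 4))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)

w, b, losses = train_sgd(X_toy, y_toy)
print(losses[0], losses[-1])
```

Each epoch is one full pass over the training rows in small batches, and the loss recorded at the end of each epoch should shrink as the weights improve.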

The accuracy of the model is tested with the evaluate() method on the test data, where the model reaches an accuracy of about 95%.

In [20]:
# evaluate accuracy on the test set
accuracy = model.evaluate(X_test, y_test)[1]

# print accuracy
print('Accuracy:', accuracy)

Accuracy: 0.9527272582054138
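
The reported accuracy can be sanity-checked against the size of the test set. The sketch below assumes that train_test_split rounds a fractional test size up to the next whole row, and that accuracy is simply the fraction of correct test predictions. The variable names are illustrative.

```python
import math

n_rows = 1372                            # row count from data.describe()
test_size = 0.2                          # value passed to train_test_split
reported_accuracy = 0.9527272582054138   # value printed above

# Assumed split sizes: test rows rounded up, the rest used for training
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test

# Number of test bills classified correctly and incorrectly
n_correct = round(reported_accuracy * n_test)
n_wrong = n_test - n_correct

print(n_train, n_test, n_correct, n_wrong)  # prints 1097 275 262 13
```

Under these assumptions the model misclassifies 13 of the 275 test bills. The tiny gap between 262/275 and the printed value is consistent with the metric being stored as a 32-bit float.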


The result of this project is a neural network that reaches just over 95% accuracy on the test set when predicting whether a dollar bill is real or fake.