# Machine Learning

# Exploring Banknote Authentication

## TensorFlow - Classification

## Introduction

The objective is to develop a classification model using the Bank Authentication dataset and TensorFlow.

The data set has 5 columns:

- Image.Var: variance of the wavelet-transformed image (continuous)
- Image.Skew: skewness of the wavelet-transformed image (continuous)
- Image.Curt: kurtosis of the wavelet-transformed image (continuous)
- Entropy: entropy of image (continuous)
- Class: class (integer)

The Bank Authentication dataset contains two classes:

- Authentic banknote
- Inauthentic banknote

##### Source: Pierian Data

## Analysis

Import libraries

In [65]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import tensorflow as tf

Load the Bank Authentication data set

In [66]:
df_bank = pd.read_csv("bank_note_data.csv")

Explore header/first 5 rows

In [44]:
df_bank.head()

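|   | Image.Var | Image.Skew | Image.Curt | Entropy | Class |
|---|-----------|------------|------------|---------|-------|
| 0 | 3.6216 | 8.6661 | -2.8073 | -0.44699 | 0 |
| 1 | 4.5459 | 8.1674 | -2.4586 | -1.4621 | 0 |
| 2 | 3.866 | -2.6383 | 1.9242 | 0.10645 | 0 |
| 3 | 3.4566 | 9.5228 | -4.0112 | -3.5944 | 0 |
| 4 | 0.32924 | -4.4552 | 4.5718 | -0.9888 | 0 |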


Explore dimensions and missing data

In [45]:
df_bank.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1372 entries, 0 to 1371
Data columns (total 5 columns):
Image.Var     1372 non-null float64
Image.Skew    1372 non-null float64
Image.Curt    1372 non-null float64
Entropy       1372 non-null float64
Class         1372 non-null int64
dtypes: float64(4), int64(1)
memory usage: 53.7 KB


In [46]:
df_bank["Class"].value_counts()

0    762
1    610
Name: Class, dtype: int64
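The class counts above give a useful reference point for any later accuracy figure. The following minimal sketch, which only reuses the two counts printed by `value_counts()`, computes each class's share and the accuracy of a trivial model that always predicts the most frequent class. The variable names are illustrative and not part of the original notebook.

```python
# Class counts copied from the value_counts() output above.
class_counts = {0: 762, 1: 610}

total = sum(class_counts.values())
shares = {label: count / total for label, count in class_counts.items()}

# Accuracy of a trivial model that always predicts the most frequent class.
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"class {label}: {share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

The classes are only mildly imbalanced, so a classifier has to do clearly better than roughly 56% accuracy to be worth anything.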

The column names should be converted to snake case. There are no missing values, and the target has only two values to classify.

In [47]:
df_bank.columns

Index(['Image.Var', 'Image.Skew', 'Image.Curt', 'Entropy', 'Class'], dtype='object')

In [48]:
df_bank.columns = df_bank.columns.str.lower()
# regex=False replaces the literal "." instead of treating it as a regex wildcard
df_bank.columns = df_bank.columns.str.replace(".", "_", regex=False)

In [49]:
df_bank.columns

Index(['image_var', 'image_skew', 'image_curt', 'entropy', 'class'], dtype='object')
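One detail of the renaming step deserves care: whether pandas `str.replace` interprets its pattern as a regular expression by default depends on the pandas version, and an unescaped `.` in a regular expression matches any character. The sketch below uses only the standard library `re` module to show the difference between literal and regex replacement on the original column names. It illustrates the pitfall and is not a reproduction of pandas internals.

```python
import re

original_columns = ["Image.Var", "Image.Skew", "Image.Curt", "Entropy", "Class"]

# Literal replacement: only the dot character is replaced.
literal = [name.lower().replace(".", "_") for name in original_columns]

# Regex replacement: an unescaped "." matches every character.
as_regex = [re.sub(".", "_", name.lower()) for name in original_columns]

# Escaping the dot restores the intended behaviour under regex matching.
escaped = [re.sub(r"\.", "_", name.lower()) for name in original_columns]

print(literal)
print(as_regex[0])
print(escaped == literal)
```

Requesting a literal match explicitly (in pandas, via the `regex=False` argument) or escaping the dot keeps the result independent of the library default.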

Define training and testing sets

In [50]:
X = df_bank[["image_var", "image_skew", "image_curt", "entropy"]]

In [51]:
y = df_bank["class"]

Split the data set into a training set (70%) and a test set (30%)

In [52]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
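The resulting set sizes can be checked with simple arithmetic. The sketch below assumes that the test count is the requested fraction rounded up and that the remaining rows go to the training set. That assumption is consistent with the support of 412 reported in the classification report later in this notebook. Note also that the call above sets no `random_state`, so the rows in each set change from run to run.

```python
import math

n_samples = 1372   # rows reported by df_bank.info()
test_size = 0.3    # fraction passed to train_test_split

# Assumption: the test count is the fraction rounded up, the rest is training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```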

## Train Model

Define numeric feature columns using TensorFlow

In [53]:
feature_columns = []

In [54]:
for col in X.columns:
    feature_columns.append(tf.feature_column.numeric_column(col))

In [55]:
feature_columns

[_NumericColumn(key='image_var', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='image_skew', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='image_curt', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='entropy', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

Define an input function

In [56]:
input_fn = tf.estimator.inputs.pandas_input_fn(x=X_train, y=y_train, batch_size=20, num_epochs=100, shuffle=True)

Because there are two target values to classify, `n_classes` is set to 2. The structure of the neural network can be tuned; currently it has three hidden layers of 10 neurons each.

In [57]:
estimator = tf.estimator.DNNClassifier(hidden_units=[10,10,10],n_classes=2,feature_columns=feature_columns)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/z6/vzs6bsnx4hn2qr27y9pzrrmw0000gn/T/tmp4yls9s2y', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0xb313d7160>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
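To get a feel for how small this network is, the number of trainable parameters can be estimated by hand. The sketch below assumes fully connected layers with one bias per unit and a single output logit for the binary case, which matches the one-element `logits` arrays in the prediction output further down. It is a back-of-the-envelope count, not a value read from the estimator.

```python
n_features = 4               # image_var, image_skew, image_curt, entropy
hidden_units = [10, 10, 10]  # same structure as the estimator above
n_outputs = 1                # assumption: a single logit for two classes

layer_sizes = [n_features] + hidden_units + [n_outputs]

# Each dense layer has (inputs * outputs) weights plus one bias per output.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

print(params_per_layer)
print(sum(params_per_layer))
```

Under these assumptions the model has only a few hundred parameters, so changing `hidden_units` is a cheap experiment.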


In [58]:
estimator.train(input_fn=input_fn,steps=30)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /var/folders/z6/vzs6bsnx4hn2qr27y9pzrrmw0000gn/T/tmp4yls9s2y/model.ckpt.
INFO:tensorflow:loss = 20.002594, step = 1
INFO:tensorflow:Saving checkpoints for 30 into /var/folders/z6/vzs6bsnx4hn2qr27y9pzrrmw0000gn/T/tmp4yls9s2y/model.ckpt.
INFO:tensorflow:Loss for final step: 0.3022642.


<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0xb313d7cc0>
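The relationship between `batch_size`, `num_epochs` and `steps` is easy to misread, so here is the arithmetic as a short sketch. It assumes a training set of 960 rows (70% of 1372). The variable names mirror the arguments used above, but the calculation itself is only illustrative.

```python
n_train = 960      # assumption: 70% of the 1372 rows
batch_size = 20
num_epochs = 100
steps = 30

# Each step consumes one batch.
examples_seen = steps * batch_size
fraction_of_epoch = examples_seen / n_train

# Steps it would take to exhaust the input function if `steps` were not set.
steps_per_epoch = n_train // batch_size
max_steps_from_epochs = steps_per_epoch * num_epochs

print(examples_seen, fraction_of_epoch, max_steps_from_epochs)
```

With `steps=30` the model sees fewer examples than one full pass over the training data, so `num_epochs=100` acts only as an upper bound here.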

## Test Model

Define an input function

In [59]:
prediction_fn = tf.estimator.inputs.pandas_input_fn(x=X_test,batch_size=len(X_test),shuffle=False)

In [60]:
predictions = list(estimator.predict(input_fn=prediction_fn))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/z6/vzs6bsnx4hn2qr27y9pzrrmw0000gn/T/tmp4yls9s2y/model.ckpt-30
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


Review the predictions

In [61]:
predictions

[{'logits': array([-10.277134], dtype=float32),
  'logistic': array([3.4409826e-05], dtype=float32),
  'probabilities': array([9.9996555e-01, 3.4409826e-05], dtype=float32),
  'class_ids': array([0]),
  'classes': array([b'0'], dtype=object)},
 {'logits': array([2.3437798], dtype=float32),
  'logistic': array([0.9124386], dtype=float32),
  'probabilities': array([0.08756146, 0.9124386 ], dtype=float32),
  'class_ids': array([1]),
  'classes': array([b'1'], dtype=object)},
 {'logits': array([2.0754838], dtype=float32),
  'logistic': array([0.8884974], dtype=float32),
  'probabilities': array([0.11150261, 0.8884974 ], dtype=float32),
  'class_ids': array([1]),
  'classes': array([b'1'], dtype=object)},
 {'logits': array([-13.092796], dtype=float32),
  'logistic': array([2.0600128e-06], dtype=float32),
  'probabilities': array([9.999980e-01, 2.060013e-06], dtype=float32),
  'class_ids': array([0]),
  'classes': array([b'0'], dtype=object)},
 {'logits': array([-8.068062], dtype=float32),
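
The fields of each prediction are related in a simple way: `logistic` is the sigmoid of `logits`, `probabilities` is the pair (1 - logistic, logistic), and `class_ids` is the index of the larger probability. The sketch below checks this with the first two logits printed above, using only the standard library. The helper name `sigmoid` is illustrative.

```python
import math

def sigmoid(logit):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

# Logits copied from the first two predictions shown above.
logits = [-10.277134, 2.3437798]

for logit in logits:
    p_class_1 = sigmoid(logit)
    probabilities = [1.0 - p_class_1, p_class_1]
    class_id = probabilities.index(max(probabilities))
    print(f"logit={logit:+.4f}  logistic={p_class_1:.6g}  class_id={class_id}")
```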
 

## Results

A confusion matrix and a classification report are used to evaluate the results

In [62]:
# Keep only the predicted class id (0 or 1) from each prediction dictionary
predictions_single_vals = [prediction["class_ids"][0] for prediction in predictions]

In [63]:
print(confusion_matrix(y_test,predictions_single_vals))

[[230   1]
 [  2 179]]


409 of the 412 test predictions are correct. One class-0 sample is predicted as class 1, and two class-1 samples are predicted as class 0.
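These counts follow directly from the matrix: the diagonal holds the correct predictions and the off-diagonal cells hold the errors. A minimal sketch of that arithmetic, with the matrix values copied from the output above:

```python
# Confusion matrix from above: rows are true classes, columns are predictions.
matrix = [[230, 1],
          [2, 179]]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(matrix)))
accuracy = correct / total

print(correct, total, round(accuracy, 4))
```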

In [64]:
print(classification_report(y_test,predictions_single_vals))

              precision    recall  f1-score   support

           0       0.99      1.00      0.99       231
           1       0.99      0.99      0.99       181

   micro avg       0.99      0.99      0.99       412
   macro avg       0.99      0.99      0.99       412
weighted avg       0.99      0.99      0.99       412
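The per-class rows of the report can be reproduced by hand from the confusion matrix, which makes clear what each column measures. The sketch below is plain Python and does not call scikit-learn. The function name `per_class_metrics` is illustrative.

```python
# Confusion matrix from above: rows are true classes, columns are predictions.
matrix = [[230, 1],
          [2, 179]]

def per_class_metrics(matrix, label):
    """Return precision, recall, F1 and support for one class."""
    true_pos = matrix[label][label]
    predicted = sum(row[label] for row in matrix)  # column sum
    actual = sum(matrix[label])                    # row sum (support)
    precision = true_pos / predicted
    recall = true_pos / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, actual

for label in (0, 1):
    precision, recall, f1, support = per_class_metrics(matrix, label)
    print(f"{label}  {precision:.2f}  {recall:.2f}  {f1:.2f}  {support}")
```

Precision divides by the column total (everything predicted as that class), while recall divides by the row total (everything that truly belongs to it).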



The model reaches an F1 score of 0.99 for both classes, and 409 of the 412 test predictions (about 99.3%) are correct, so its accuracy on this test set is high as well.