In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import os

Banknote Authentication Dataset
--------------------------------------------

Data were extracted from images taken of genuine and forged banknote-like specimens. For digitization, an industrial camera normally used for print inspection was used. The final images have 400 x 400 pixels. Due to the object lens and the distance to the investigated object, gray-scale pictures with a resolution of about 660 dpi were obtained. A Wavelet Transform tool was used to extract features from the images.


Attribute Information:

1. variance of Wavelet Transformed image (continuous)
2. skewness of Wavelet Transformed image (continuous)
3. curtosis (the dataset's spelling of kurtosis) of Wavelet Transformed image (continuous)
4. entropy of image (continuous)
5. class (integer)

This is a copy of the UCI Machine Learning banknote authentication dataset. https://archive.ics.uci.edu/ml/datasets/banknote+authentication

In [2]:
# Read the CSV file into a pandas DataFrame

notes = pd.read_csv('../Resources/data_banknote_authentication.csv', header=None, names=['variance','skewness','curtosis', 'entropy', 'class'])
notes.head()

,variance,skewness,curtosis,entropy,class
0,3.6216,8.6661,-2.8073,-0.44699,0
1,4.5459,8.1674,-2.4586,-1.4621,0
2,3.866,-2.6383,1.9242,0.10645,0
3,3.4566,9.5228,-4.0112,-3.5944,0
4,0.32924,-4.4552,4.5718,-0.9888,0
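
Before modeling, it can help to check the shape, the class balance, and missing values. The sketch below is only an illustration: it re-enters the five rows printed above by hand instead of reading the CSV, and `summarize` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# The five rows shown in the output above, re-entered by hand for illustration
sample = pd.DataFrame(
    [
        [3.6216, 8.6661, -2.8073, -0.44699, 0],
        [4.5459, 8.1674, -2.4586, -1.4621, 0],
        [3.866, -2.6383, 1.9242, 0.10645, 0],
        [3.4566, 9.5228, -4.0112, -3.5944, 0],
        [0.32924, -4.4552, 4.5718, -0.9888, 0],
    ],
    columns=["variance", "skewness", "curtosis", "entropy", "class"],
)


def summarize(df):
    """Return the shape, per-class row counts, and total missing values."""
    return {
        "shape": df.shape,
        "class_counts": df["class"].value_counts().to_dict(),
        "missing": int(df.isna().sum().sum()),
    }


summary = summarize(sample)
print(summary)
```

Calling `summarize(notes)` on the full DataFrame would report the same three quantities for the whole file.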


In [3]:
# Assign the data to X and y

X = notes[["variance", "skewness", "curtosis", "entropy"]]
y = notes["class"]

Split the data into training and testing sets

In [5]:
# Split the data by using train_test_split()

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
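
When neither `test_size` nor `train_size` is given, `train_test_split` holds out 25% of the rows for testing. The sketch below makes that explicit on made-up data and also passes `stratify`, which the notebook does not use; it is shown only as an optional way to keep the class proportions equal in both subsets. The `_demo` names and the 60/40 class mix are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 4 features, 60 of class 0 and 40 of class 1
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = np.array([0] * 60 + [1] * 40)

# test_size=0.25 matches the default; stratify keeps the 60/40 class ratio
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=1
)

print(len(X_tr), len(X_te))              # rows in each subset
print(int(y_tr.sum()), int(y_te.sum()))  # class-1 rows in each subset
```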

Create a logistic regression model

In [4]:
# Create a logistic regression model
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()

Fit (train) our model by using the training data

In [6]:
# Fit the model to the data
classifier.fit(X_train, y_train)

LogisticRegression()

Validate the model by using the test data

In [7]:
# Print the accuracy score for the test data
classifier.score(X_test, y_test)

0.9912536443148688
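
`score` returns plain accuracy, the fraction of test rows classified correctly. If the file holds all 1,372 rows of the UCI dataset (an assumption; the notebook never prints the shape), the default 25% split leaves 343 test rows, and the score above corresponds to 340 correct predictions. A confusion matrix shows which kind of error the remaining rows are. The sketch below uses synthetic data from `make_classification`, not the banknote file, so its numbers say nothing about this model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic 4-feature binary problem standing in for the banknote data
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1)

clf = LogisticRegression().fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_te, pred)
print(cm)

# Accuracy is the diagonal (correct predictions) divided by all predictions
acc_from_cm = cm.trace() / cm.sum()
print(acc_from_cm, clf.score(X_te, y_te))
```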

Make predictions

In [9]:
# Make predictions from the X_test data and compare them with the y_test labels
# Print at least 10 predictions vs. their actual labels
predictions = classifier.predict(X_test)
preds_df = pd.DataFrame({"Prediction": predictions, "Actual": y_test})
preds_df.head(15)

,Prediction,Actual
1240,1,1
703,0,0
821,1,1
1081,1,1
37,0,0
167,0,0
223,0,0
647,0,0
325,0,0
558,0,0
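
`predict` returns only the class label. `predict_proba` returns the estimated probability of each class, which can be useful for seeing how confident a prediction is. A minimal sketch on synthetic data (not the banknote file; the `_demo` names are placeholders):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic 4-feature binary problem standing in for the banknote data
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1)

clf = LogisticRegression().fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)  # one column per class, in clf.classes_ order

# Put labels and class-1 probabilities side by side
proba_df = pd.DataFrame(
    {"Prediction": pred, "Actual": y_te, "P(class=1)": proba[:, 1]}
)
print(proba_df.head(10))
```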
