## Multivariate Logistic Regression
- The notebook implements a logistic regression model to classify bank notes as 'authentic' or 'fake'.
- We use a data set with four features and a class label:
    - Variance of Wavelet Transformed image (continuous)
    - Skewness of Wavelet Transformed image (continuous)
    - Kurtosis of Wavelet Transformed image (continuous; the column is spelled `Curtosis` in the data)
    - Entropy of image (continuous)
    - Class (integer, the target)
- Total instances: 1372
- Data Source
    - https://archive.ics.uci.edu/ml/datasets/banknote+authentication

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing

In [2]:
path = "02_bankNote.csv"
df = pd.read_csv(path)

In [3]:
df.columns

Index(['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class'], dtype='object')

In [5]:
df.head()

,Variance,Skewness,Curtosis,Entropy,Class
0,3.6216,8.6661,-2.8073,-0.44699,0
1,4.5459,8.1674,-2.4586,-1.4621,0
2,3.866,-2.6383,1.9242,0.10645,0
3,3.4566,9.5228,-4.0112,-3.5944,0
4,0.32924,-4.4552,4.5718,-0.9888,0


In [6]:
# Features: every column except the last; target: the Class column
X = df.iloc[:, :-1]
Y = df["Class"]

In [7]:
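from sklearn.preprocessing import RobustScaler
# Scale each feature using its median and interquartile range.
# Note: the scaler is fitted on the full data set, before the train/test split.
scaler = RobustScaler()
X = scaler.fit_transform(X)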

In [8]:
X

array([[ 0.68025618,  0.74464159, -0.72018678,  0.04973186],
       [ 0.88143259,  0.68612813, -0.64684149, -0.31174108],
       [ 0.7334505 , -0.58172613,  0.27503326,  0.24680763],
       ...,
       [-0.92425794, -1.85129344,  3.57083857, -0.78000256],
       [-0.88364394, -1.255728  ,  2.47703253, -0.24771567],
       [-0.66124639, -0.34937829,  0.43489107,  0.63450322]])
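
To make the scaling step concrete, here is a minimal pure-Python sketch of what robust scaling does to one column: subtract the median and divide by the interquartile range. It assumes the default `RobustScaler` settings (centering on the median, 25th to 75th percentile range). `robust_scale` is a hypothetical helper, and the input is only the five `Variance` values from `df.head()`, so the numbers will not match the array above, which was scaled using all 1372 rows.

```python
import statistics

def robust_scale(values):
    """Scale values as (x - median) / IQR (hypothetical helper, not sklearn's code)."""
    median = statistics.median(values)
    # Inclusive quartiles use linear interpolation between data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]

# The five Variance values shown by df.head()
variance_head = [3.6216, 4.5459, 3.866, 3.4566, 0.32924]
scaled = robust_scale(variance_head)

for raw, s in zip(variance_head, scaled):
    print(f"{raw:8.5f} -> {s:8.4f}")
```

The median value maps to 0, and the scale is set by the middle half of the data rather than by the extreme value 0.32924.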

In [9]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

In [10]:
print(len(X_train))
print(len(Y_train))
print(len(X_test))
print(len(Y_test))

1029
1029
343
343
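
The sizes printed above follow from `test_size=0.25` applied to the 1372 rows. A small arithmetic sketch, assuming (as we understand `train_test_split` to behave for a float `test_size`) that the test count is the fraction rounded up and the training set takes the remainder:

```python
import math

n_samples = 1372   # total instances in the data set
test_size = 0.25   # fraction passed to train_test_split

# Assumed rule: round the test count up, give the rest to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 1029 343
```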


In [11]:
model = LogisticRegression()
model.fit(X_train,Y_train)

print(model.intercept_)
print(model.coef_)

[-0.71459806]
[[-6.25222021 -5.67891883 -3.90587697  0.17434859]]
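
The intercept and coefficients printed above define the model's decision rule: a weighted sum of the scaled features is passed through the logistic (sigmoid) function to give the probability of class 1. Here is a minimal sketch using the printed values and the first scaled row from the `X` output earlier. The 0.5 threshold is stated as an assumption about the default binary decision, and `sigmoid` is a helper defined only for this illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Values copied from the printed model.intercept_ and model.coef_
intercept = -0.71459806
coef = [-6.25222021, -5.67891883, -3.90587697, 0.17434859]

# First row of the scaled feature matrix X (its label in df.head() is 0)
x0 = [0.68025618, 0.74464159, -0.72018678, 0.04973186]

# Linear score, then probability of class 1
z = intercept + sum(w * x for w, x in zip(coef, x0))
p_class1 = sigmoid(z)
predicted = int(p_class1 >= 0.5)  # assumed 0.5 threshold

print(f"z = {z:.3f}, P(class 1) = {p_class1:.4f}, predicted class = {predicted}")
```

The three large negative coefficients mean that higher scaled variance, skewness, and kurtosis push the prediction toward class 0, while entropy contributes very little.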


In [12]:
# Accuracy
score = model.score(X_test, Y_test)

In [13]:
score

0.9795918367346939
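
For a classifier, `model.score` returns accuracy, the fraction of test samples predicted correctly. With 343 test rows, the score above corresponds to 336 correct predictions. A minimal sketch, with a hypothetical `accuracy` helper and toy labels invented for illustration:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy labels, invented for illustration only
y_true = [0, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 0.8

# The reported score is consistent with 336 of 343 test rows being correct
print(336 / 343)
```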

## Out-of-Sample Testing

In [14]:
path = "02_bankNoteTest.csv"
df = pd.read_csv(path)

In [15]:
df

,Variance,Skewness,Curtosis,Entropy,Class
0,1.352,5.367,-1.63,-0.44,1
1,3.45,5.34,-3.34,-2.87,0
2,4.456,-3.01,1.254,1.01,1
3,2.54,4.87,-2.345,-2.4,0
4,1.92,-3.57,3.56,-1.32,0
5,2.23,5.63,-1.345,-2.345,0
6,-4.567,2.456,-0.763,-1.14,0
7,-2.434,-1.52,2.356,1.023,1
8,1.023,-1.034,-2.345,0.034,0
9,0.567,1.34,-2.034,-0.786,1


In [16]:
X_OS = df.iloc[:, :-1]
Y_OS = df.iloc[:, 4]

In [17]:
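# Caution: fit_transform refits the scaler on these rows; scaler.transform(X_OS) would reuse the training scaling.
X_OS = scaler.fit_transform(X_OS)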
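
When new data is scaled, the statistics can either be reused from the earlier fit or re-estimated from the new rows, and the two choices give the model different inputs. The sketch below illustrates the difference on a single column in pure Python. `fit_stats` and `apply_stats` are hypothetical helpers standing in for a scaler's fit and transform steps, and the "training" column is only the five `Variance` values from the first `df.head()`, not the full data set.

```python
import statistics

def fit_stats(values):
    """Return (median, IQR) of a column: the 'fit' step."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1

def apply_stats(values, stats):
    """Scale a column with previously computed stats: the 'transform' step."""
    median, iqr = stats
    return [(v - median) / iqr for v in values]

train_col = [3.6216, 4.5459, 3.866, 3.4566, 0.32924]  # Variance, training head
new_col = [1.352, 3.45, 4.456, 2.54, 1.92]            # Variance, first five test rows

reused = apply_stats(new_col, fit_stats(train_col))   # analogous to transform
refit = apply_stats(new_col, fit_stats(new_col))      # analogous to fit_transform

for raw, a, b in zip(new_col, reused, refit):
    print(f"{raw:6.3f}  reused: {a:8.3f}  refit: {b:8.3f}")
```

Refitting always centers the new column on its own median, so the same raw value maps to a different model input than it did during training.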

In [18]:
score = model.score(X_OS, Y_OS)

In [19]:
score

0.6363636363636364