In [1]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import math
import csv
from scipy.optimize import minimize
from matplotlib import cm

The file credit_data.csv contains information about people submitting credit card applications. The file credit_label.csv records whether each application was approved (1) or denied (-1). 
(The values in the data set have been changed by the authors to protect the confidentiality of the data.)

We use the Perceptron Learning Algorithm to determine the qualifications that were most heavily considered in the approval/rejection. 

We will start by splitting the data set into training and testing sets. The first 500 rows are used for training, and the remaining 153 are set aside for testing.

In [2]:
# Import data
# https://realpython.com/python-csv/
credit_data = np.loadtxt(open("credit_data.csv", "rb"), delimiter=",", skiprows=1)
credit_label = np.loadtxt(open("credit_label.csv", "rb"), delimiter=",", skiprows=1)

training_data = np.hstack((np.ones((500,1)),credit_data[0:500, :])) # a 500 x 16 matrix
training_label = np.vstack(credit_label[0:500])

test_data = np.hstack((np.ones((153,1)),credit_data[500:653, :])) # a 153 x 16 matrix
test_label = np.vstack(credit_label[500:653])

w_old = 1 x (n+1) initial-guess weight vector

data = k x (n+1) matrix. Column 1 contains 1's, Columns 2 through n+1 contain the data

label = k x 1 column. 1 for Class 1 (approved), -1 for Class 2 (denied)

gamma = learning rate (0<gamma<1)

EPOCH = maximum # of cycles through the data points
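
With these definitions, the update applied at each misclassified point can be written compactly. Here $x_i$ denotes row $i$ of data and $y_i$ the matching entry of label (both symbols are shorthand introduced for this formula, not names used in the code):
$$
\hat{y}_i = \operatorname{sign}(w\, x_i^T), \qquad w \leftarrow w + \gamma\,(y_i - \hat{y}_i)\,x_i \quad \text{whenever } \hat{y}_i \neq y_i
$$
When $\hat{y}_i = \pm 1$ and the point is misclassified, $y_i - \hat{y}_i = 2 y_i$, so each correction moves $w$ by $2\gamma\, y_i\, x_i$.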

In [6]:
# Perceptron Learning Algorithm 
def w_fit3(w_old, data, label, gamma, EPOCH):
    N = np.shape(data)[0]
    M = np.zeros((N,1))
    c = 0 # counts number of times weight changes
    for j in range(EPOCH):
        k=0 # stops us once we reach separation of data
        for i in range(N):
            M[i] = np.sign(w_old @ np.transpose(data[i,:]))
            if M[i] != label[i]: # label is incorrect - adjust towards data[i,:]
                #print('Adjusting to', data[i,1:3])
                w_new = w_old + gamma * (label[i] - M[i]) * data[i,:]
                w_old = w_new
                #print("w_new =", w_old)
                # update counters
                k=0
                c=c+1
            else: # label is correct - no need to adjust w
                k=k+1
                #print('w_new =', w_old)
        if k==N:
            #Separation achieved
            return w_old
        if (k<N and j+1==EPOCH):
            #EPOCH attained
            return w_old
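    # only reached if EPOCH < 1: return the initial weights unchanged
    # (in every case the returned weight vector has n+1 entries)
    return w_old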
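
As a quick sanity check of the update rule, here is a minimal sketch on a tiny linearly separable toy set. The function name `perceptron_sketch` and the toy arrays are made up for illustration; the loop is a compact restatement of the same rule, not the function used for the results in this notebook.

```python
import numpy as np

def perceptron_sketch(w, data, label, gamma, max_epochs):
    """Compact restatement of the perceptron update (illustrative only)."""
    w = np.asarray(w, dtype=float).copy()
    label = np.asarray(label, dtype=float).ravel()
    for _ in range(max_epochs):
        updates = 0
        for x, y in zip(data, label):
            pred = np.sign(w @ x)
            if pred != y:
                # move the weight vector towards (or away from) this point
                w += gamma * (y - pred) * x
                updates += 1
        if updates == 0:
            # a full pass with no mistakes: the data are separated
            break
    return w

# toy data: a bias column of ones plus two features
toy_data = np.array([[1.0, 2.0, 1.0],
                     [1.0, 1.5, 2.0],
                     [1.0, -1.0, -1.5],
                     [1.0, -2.0, -0.5]])
toy_label = np.array([1, 1, -1, -1])

w_toy = perceptron_sketch(np.zeros(3), toy_data, toy_label, gamma=0.1, max_epochs=100)
toy_pred = np.sign(toy_data @ w_toy)
print(toy_pred)
```

On separable data like this the loop stops as soon as one full pass produces no corrections, which is the "separation achieved" exit.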

In [7]:
# Initialize parameters
learning_rate = 0.1
EPOCH = 1000

Note: Altering the hyperparameters $\gamma$ and EPOCH for parts (a) and (b) doesn't seem to make a significant difference in the percent error. However, when the random seed isn't set to zero at the beginning of the problem, it affects the Nelder-Mead initial guess and the accuracy changes. Over a few simulations, the best $(\%~error)_{testing}$ on the unseen data that I found was 18.95 for (a) and 13.73 for (b). The worst was 33.3 for (a) and 29.4 for (b) (these runs were very harsh and rejected a lot of people). 
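
The dependence on the seed can be illustrated with a small sketch. The helper `draw_start` is a hypothetical name; it draws 16 uniform values on $[-1, 1]$, the same recipe used for the starting weight guess in this notebook.

```python
import numpy as np

def draw_start(seed):
    # 16 uniform draws on [-1, 1] after fixing the global seed
    np.random.seed(seed)
    return np.array([np.random.uniform(-1, 1) for _ in range(16)])

same_a = draw_start(0)
same_b = draw_start(0)
other = draw_start(1)

print(np.array_equal(same_a, same_b))  # same seed, same starting guess
print(np.array_equal(same_a, other))   # different seed, different guess
```

A different starting guess gives Nelder-Mead a different starting point, which is one plausible reason the reported error varies between runs.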

# unchanged data

In [8]:
############## Compute the 'optimal' weight vector ###################

np.random.seed(0)
w0 = [np.random.uniform(-1, 1) for p in range(0, 16)] # starting weight guess from uniform distribution on -1 to 1
#print(w0)

# Use Nelder-Mead to find good starting weight
# https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html
def loss(w):
    return (1/500)* np.sum(np.square(np.subtract(w @ training_data.T, training_label)))

nmw = minimize(loss, w0, method='nelder-mead',
               options={'disp': False})
print('nelder-mead weight= ', np.around(nmw.x, decimals = 2))


#Perceptron Learning
#w1 = w_fit3(w0, training_data, training_label, learning_rate,EPOCH) # using naive starting weight
w2 = w_fit3(nmw.x, training_data, training_label, learning_rate,EPOCH) # using Nelder-Mead starting weight
print('perceptron weight= ', np.around(w2, decimals = 2))

nelder-mead weight=  [ 0.16 -0.31 -0.18  0.5  -0.28  0.54  0.08  0.5  -0.03 -0.03  0.13  0.11
  0.16  2.15 -0.   -0.  ]
perceptron weight=  [  -572.64  -2586.11  -1169.49    884.84  -3397.08  -3396.26   -867.32
 -12119.3   12800.61 -12447.83  -3450.87   8655.11  -2122.24   2195.75
  -1324.4   10229.6 ]
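
One shape detail in the least-squares loss is worth checking. Assuming `training_label` has shape (500, 1) (which is what `np.vstack` produces from a 1-D label array) while `w @ training_data.T` has shape (500,), NumPy broadcasts their difference to a 500 x 500 array, so the sum runs over every prediction/label pair rather than over matched pairs. The weights printed above come from the loss as written. A minimal sketch with made-up toy arrays shows the difference:

```python
import numpy as np

pred = np.array([0.5, -0.2, 0.9])              # shape (3,), like w @ data.T
label_col = np.array([[1.0], [-1.0], [1.0]])   # shape (3, 1), like np.vstack(labels)

pairwise = np.subtract(pred, label_col)        # broadcasts to shape (3, 3)
matched = pred - label_col.ravel()             # shape (3,), one residual per point

loss_pairwise = np.sum(np.square(pairwise)) / 3
loss_matched = np.sum(np.square(matched)) / 3

print(pairwise.shape, matched.shape)
print(loss_pairwise, loss_matched)
```

If matched residuals are what is intended, flattening the labels (for example with `.ravel()`) before subtracting is one way to get them; doing so would change the fitted Nelder-Mead weights.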


It's interesting to note that after Nelder-Mead the largest weights corresponded to citizenship, ethnicity, bank, and amount of debt. 

Yet after the Perceptron the largest weights corresponded to employment, prior defaults, ethnicity, and income. 

This could indicate the Perceptron considers these four factors the most when deciding if an applicant is approved or denied.
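
One way to read off the most heavily weighted features is to sort the weights by absolute value. The sketch below uses the rounded Nelder-Mead weights printed above and reports column indices only, since the mapping from columns to feature names is not recoverable from this notebook. The variable names are illustrative.

```python
import numpy as np

# rounded Nelder-Mead weights as printed above (index 0 is the bias term)
nm_weights = np.array([0.16, -0.31, -0.18, 0.5, -0.28, 0.54, 0.08, 0.5,
                       -0.03, -0.03, 0.13, 0.11, 0.16, 2.15, -0.0, -0.0])

feature_w = nm_weights[1:]                              # drop the bias weight
order = np.argsort(-np.abs(feature_w), kind="stable")   # largest magnitude first
top4_columns = order[:4] + 1                            # back to column indices of the data matrix

print(top4_columns)
```

Weight magnitudes are only comparable across features when the features share a common scale, which is one motivation for normalizing the data.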

In [29]:
############## Use the Perceptron's weight to determine error ###################

#how'd it do with training set?
self_error = np.count_nonzero((np.transpose([np.sign(w2 @ np.transpose(training_data))])-training_label))/500
print("% training error = ", self_error)


#how'd it do with testing set?
unseen_error = np.count_nonzero((np.transpose([np.sign(w2 @ np.transpose(test_data))])-test_label))/153
misapproved = sum(1 for x in (np.transpose([np.sign(w2 @ np.transpose(test_data))])-test_label) if x > 0)
misdenied = sum(1 for x in (np.transpose([np.sign(w2 @ np.transpose(test_data))])-test_label) if x < 0)
print("% testing error = ", unseen_error , "\n" ,"misapproved =" ,misapproved ,"\n", "misdenied =" ,misdenied)



% training error =  0.352
% testing error =  0.3202614379084967 
 misapproved = 31 
 misdenied = 18


To test whether the calculated weight vector, $w^*$, separates the training data, we take the hyperplane $(w^*)^T x = 0$ and count the number of points this hyperplane mis-identifies. 
$$
\begin{aligned}
(\%~error)_{training} &= \frac{\text{number of points mis-identified in training set}}{500}\\
&= \frac{\text{number of nonzero entries of }[\operatorname{sign}(w^* ~training\_data^T) - training\_labels^T]}{500}\\
&= 35.2 \%
\end{aligned}
$$


Applying the same concept to the testing data, we find the percent error is 
$$
\begin{aligned}
(\%~error)_{testing} &= \frac{\text{number of points mis-identified in testing set}}{153}\\
&= \frac{\text{number of nonzero entries of }[\operatorname{sign}(w^* ~test\_data^T) - test\_labels^T]}{153}\\
&\approx 32.0 \%
\end{aligned}
$$

Surprisingly, the Perceptron mislabels credit applicants fairly frequently. Counting the positive and negative entries of $\operatorname{sign}(w^* ~test\_data^T) - test\_labels^T$, we find 31 positives and 18 negatives. This means the Perceptron approved 31 applicants who should have been denied, and rejected 18 who should have been approved. So we might consider this Perceptron overly lenient: it should reject more people. 
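
The same breakdown can be computed with boolean masks instead of counting signs of a difference. This is a minimal sketch on made-up toy data; `error_breakdown` is a hypothetical helper, and unlike the sign-difference count it does not assign a prediction of exactly 0 to either error type.

```python
import numpy as np

def error_breakdown(w, data, label):
    """Return (error rate, wrongly approved, wrongly denied) for weights w."""
    pred = np.sign(data @ w)
    truth = np.asarray(label).ravel()
    misapproved = int(np.sum((pred == 1) & (truth == -1)))  # approved, should be denied
    misdenied = int(np.sum((pred == -1) & (truth == 1)))    # denied, should be approved
    error_rate = float(np.mean(pred != truth))
    return error_rate, misapproved, misdenied

# toy example: bias column plus one feature
toy_w = np.array([0.0, 1.0])
toy_data = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 3.0], [1.0, -2.0]])
toy_label = np.array([1, 1, -1, -1])

rate, n_misapproved, n_misdenied = error_breakdown(toy_w, toy_data, toy_label)
print(rate, n_misapproved, n_misdenied)
```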

# data normalized by the maximum value of each feature

In [22]:
# normalize data
norm = np.amax(training_data[0:500,1:16],axis=0)

# build normalized training & testing sets: a bias column of ones, then each
# feature divided by its maximum over the training set
norm_training = np.hstack((np.ones((500,1)), training_data[:,1:16]/norm))
norm_test = np.hstack((np.ones((153,1)), test_data[:,1:16]/norm))

# Nelder-Mead
def norm_loss(w):
    return (1/500)* np.sum(np.square(np.subtract(w @ norm_training.T, training_label)))

norm_nmw = minimize(norm_loss, w0, method='nelder-mead',
               options={'disp': False})
print('normed Nelder-Mead = ', np.around(norm_nmw.x,decimals = 2))


#Perceptron Algorithm
#w1n = w_fit3(w0, norm_training, training_label, learning_rate,EPOCH) # using naive starting vector
w2n = w_fit3(norm_nmw.x, norm_training, training_label, learning_rate,EPOCH) # using Nelder-mead starting vector
print('normed Perceptron = ', np.around(w2n, decimals = 2))

normed Nelder-Mead =  [ 0.51 -0.13  0.17 -0.23 -0.52  0.44 -0.25  0.13 -0.32 -0.17 -0.19 -0.17
  0.   -0.04 -0.02 -0.51]
normed Perceptron =  [ 2.710e+00 -4.300e-01 -5.500e-01 -1.070e+00  1.000e-02  9.700e-01
 -2.900e-01  5.500e-01  1.920e+00 -2.370e+00 -1.290e+00 -3.300e-01
 -5.000e-01 -1.100e-01 -1.180e+00  1.366e+01]


The factors that most influence approval or rejection in the Perceptron are the applicant's income, prior defaults, employment status, and years of employment. 

In [28]:
############## Use the Perceptron's weight to determine error ###################

# on itself
norm_self_error = np.count_nonzero((np.transpose([np.sign(w2n @ np.transpose(norm_training))])-training_label))/500
print("% training error = ",norm_self_error)

# on unseen data        
norm_unseen_error = np.count_nonzero((np.transpose([np.sign(w2n @ np.transpose(norm_test))])-test_label))/153
misapproved = sum(1 for x in (np.transpose([np.sign(w2n @ np.transpose(norm_test))])-test_label) if x > 0)
misdenied = sum(1 for x in (np.transpose([np.sign(w2n @ np.transpose(norm_test))])-test_label) if x < 0)
print("% testing error = ", norm_unseen_error , "\n" ,"misapproved =" ,misapproved ,"\n", "misdenied =" ,misdenied)

% training error =  0.236
% testing error =  0.1830065359477124 
 misapproved = 4 
 misdenied = 24


Testing how well this Perceptron's hyperplane, $\tilde{w}^T x = 0$, categorizes the data,
$$
\begin{aligned}
(\%~error)_{training} &= \frac{\text{number of points mis-identified in training set}}{500} = 23.6 \%\\
(\%~error)_{testing} &= \frac{\text{number of points mis-identified in testing set}}{153} \approx 18.3 \%
\end{aligned}
$$
Upon further investigation, we find that this normalized Perceptron approved 4 people when it shouldn't have, but it denied 24 people whom the banker approved. Thus, the normalized Perceptron is more accurate overall, but also stricter than the un-normalized version. 
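
As a consistency check, the misapproved and misdenied counts reported in the two experiments should add up to the testing error rates. A short sketch using only the numbers printed in this notebook (variable names are illustrative):

```python
n_test = 153

raw_errors = 31 + 18    # un-normalized data: misapproved + misdenied
norm_errors = 4 + 24    # normalized data: misapproved + misdenied

raw_rate = raw_errors / n_test
norm_rate = norm_errors / n_test

print(round(100 * raw_rate, 1), round(100 * norm_rate, 1))
```

Both rates agree with the reported testing errors of about 32.0% and 18.3%.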