Neural Networks :

In this problem, we will work with the Poker Hand dataset available on the UCI repository. We will use the entire dataset for the purpose of this assignment. The training set contains 25010 examples whereas the test set contains 1000000 examples each. The dataset consists of 10 categorical attributes. The last entry in each row denotes the class label.

In [326]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

In [327]:
pokerTrainData = pd.read_csv('./data/poker-hand-training-true.data', header=None, delimiter=',')
pokerTrainData

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,1,10,1,11,1,13,1,12,1,1,9
1,2,11,2,13,2,10,2,12,2,1,9
2,3,12,3,11,3,13,3,10,3,1,9
3,4,10,4,11,4,1,4,13,4,12,9
4,4,1,4,13,4,12,4,11,4,10,9
...,...,...,...,...,...,...,...,...,...,...,...
25005,3,9,2,6,4,11,4,12,2,4,0
25006,4,1,4,10,3,13,3,4,1,10,1
25007,2,1,2,10,4,4,4,1,4,13,1
25008,2,12,4,3,1,10,1,12,4,9,1


Each record is an example of a hand consisting of five playing cards drawn from a standard deck of 52. Each card is described using two attributes (suit and rank), for a total of 10 predictive attributes. There is one Class attribute that describes the Poker Hand. The order of cards is important, which is why there are 480 possible Royal Flush hands as compared to 4 (one for each suit)

Attribute Information:
   1) S(i) Suit of card 
   
      #S(i) - Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}


   2) C(i) Rank of card 
   
      #C(i) - Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)

   i goes from 1 to 5

In [328]:
X = pokerTrainData.iloc[:,:-1]
X

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,1,10,1,11,1,13,1,12,1,1
1,2,11,2,13,2,10,2,12,2,1
2,3,12,3,11,3,13,3,10,3,1
3,4,10,4,11,4,1,4,13,4,12
4,4,1,4,13,4,12,4,11,4,10
...,...,...,...,...,...,...,...,...,...,...
25005,3,9,2,6,4,11,4,12,2,4
25006,4,1,4,10,3,13,3,4,1,10
25007,2,1,2,10,4,4,4,1,4,13
25008,2,12,4,3,1,10,1,12,4,9


In [329]:
y = pokerTrainData.iloc[:,-1]
y

0        9
1        9
2        9
3        9
4        9
        ..
25005    0
25006    1
25007    1
25008    1
25009    1
Name: 10, Length: 25010, dtype: int64

CLASS Poker Hand
      Ordinal (0-9)

      0: Nothing in hand; not a recognized poker hand 
      1: One pair; one pair of equal ranks within five cards
      2: Two pairs; two pairs of equal ranks within five cards
      3: Three of a kind; three equal ranks within five cards
      4: Straight; five cards, sequentially ranked with no gaps
      5: Flush; five cards with the same suit
      6: Full house; pair + different rank three of a kind
      7: Four of a kind; four equal ranks within five cards
      8: Straight flush; straight + flush
      9: Royal flush; {Ace, King, Queen, Jack, Ten} + flush

Using One Hot Encoding to convert the categorical data feature to binary

In [330]:
enc = OneHotEncoder()

In [331]:
XEnc = pd.DataFrame(enc.fit_transform(X).toarray())
totalSize = XEnc.shape[0]
XEnc

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,75,76,77,78,79,80,81,82,83,84
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25005,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25006,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
25007,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
25008,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0


In [332]:
yEnc = np.array(y).reshape(totalSize,1)
yEnc = pd.DataFrame(enc.fit_transform(yEnc).toarray())
yEnc

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...
25005,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25006,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25007,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25008,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [333]:
XTrainEnc, XTestEnc, yTrainEnc, yTestEnc = train_test_split(XEnc, yEnc, test_size=0.1, random_state=0)

In [334]:
XTrainEnc

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,75,76,77,78,79,80,81,82,83,84
22256,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
24698,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
24064,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
16774,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
17634,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13123,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
19648,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9845,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10799,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [335]:
yTrainEnc

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
22256,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
24698,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
24064,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
16774,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
17634,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...
13123,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
19648,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9845,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10799,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Implementing a generic neural network architecture to learn a model for multi-class classification using one-hot encoding as described above. We will implement the backpropagation algorithm (from first principles) to train your network. We should use mini-batch Stochastic Gradient Descent (mini-batch SGD) algorithm to train your network. Use the Mean Squared Error 2(MSE) over each mini-batch as your loss function. Given a total of m examples, and M samples in each batch

Here each y (i) is represented using one-hot encoding as described above. We will use the sigmoid as activation function for the units in output layer as well as in the hidden layer (we will experiment with other activation units in one of the parts below). Our implementation (including back-propagation) is from the first principles and not used any pre-existing library in Python for the same. It is generic enough to create an architecture based on the following input parameters:

    • Number of features/attributes (n)
    • Hidden layer architecture: List of numbers denoting the number of perceptrons in the corresponding hidden layer. Eg. a list [100 50] specifies two hidden layers; first one with 100 units and second one with 50 units.
    • Number of target classes (r)
Assume a fully connected architecture i.e., each unit in a hidden layer is connected to every unit in
the next layer.

In [336]:
m = XTrainEnc.shape[0]
inputSize = XTrainEnc.shape[1]
outputSize = len(np.unique(y))

In [337]:
layers = [inputSize, 200, 100, 50, outputSize]
layers

[85, 200, 100, 50, 10]

In [338]:
def initialize_params(layers):
    len_ = len(layers)
    params = {}
    for i in range(1,len_):
        params['theta'+str(i)] = np.random.rand(layers[i], layers[i-1])*0.01
        params['bias'+str(i)] = np.random.rand(layers[i])
    return params


In [339]:
parameter = initialize_params(layers)
parameter['theta1'].shape, parameter['bias1'].shape, parameter['theta2'].shape, parameter['bias2'].shape

((200, 85), (200,), (100, 200), (100,))

In [340]:
parameter['theta1']

array([[2.64685346e-03, 2.07180455e-03, 5.65412408e-03, ...,
        9.66801394e-03, 8.89732100e-04, 5.50399442e-03],
       [7.83745071e-04, 7.59780011e-03, 7.77806621e-05, ...,
        6.56049529e-03, 1.52150298e-03, 4.93125262e-03],
       [3.44526937e-03, 3.91277073e-03, 2.71010808e-03, ...,
        9.30866318e-03, 5.54893401e-03, 5.44370906e-03],
       ...,
       [8.84389799e-03, 7.20319402e-03, 7.39172162e-03, ...,
        6.35956659e-03, 2.24880445e-03, 5.22025384e-03],
       [7.02233254e-03, 8.00303650e-03, 3.44120737e-03, ...,
        5.49853085e-03, 9.66166975e-03, 2.57598912e-03],
       [6.32337674e-03, 8.24290508e-03, 2.91854845e-03, ...,
        1.36016260e-03, 7.43752614e-03, 6.91218490e-03]])

In [341]:
parameter['theta1'].shape, parameter['theta2'].shape ,XTrainEnc.shape

((200, 85), (100, 200), (22509, 85))

In [342]:
def forwardPropagation(X, params, aFunc):
    layers = len(params)//2
    values = {}
    
    for i in range(1, layers + 1):
        if i == 1:
            temp = params['theta'+str(i)]@X.T
            values['Z' + str(i)] = pd.Series(params['bias'+str(i)]) + temp.T
            values['A' + str(i)] = values['Z' + str(i)].apply(aFunc)

        else:
            temp = params['theta'+str(i)]@values['Z' + str(i-1)].T
            values['Z' + str(i)] = pd.Series(params['bias'+str(i)]) + temp.T
            if i == layers:
                values['A' + str(i)] = values['Z' + str(i)]
            else:
                values['A' + str(i)] = values['Z' + str(i)].apply(aFunc)
    return values

In [343]:
identityFunc = lambda x:x
sigFunc = lambda x : 1/(1+np.exp(-x))
dictOutput = forwardPropagation(XTrainEnc, parameter, sigFunc)

In [344]:
dictOutput['A1'].shape, dictOutput['A2'].shape

((22509, 200), (22509, 100))

In [345]:
yTrainEnc

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
22256,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
24698,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
24064,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
16774,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
17634,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...
13123,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
19648,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9845,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10799,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [346]:
def computeCost(values, y):
    noOflayers = len(values)//2
    y_pred = values['A' + str(noOflayers)]
    len_ = y.shape[0]
    cost = 1/(2*len_)*np.sum(np.sum(np.square(y_pred - y)))
    return cost

In [347]:
computeCost(dictOutput, yTrainEnc)

2.802388340091599

In [348]:
def backwardPropagation(params, values, X, y):
    noOfLayers = len(params)//2
    m = len(y)
    grads = {}
    for i in range(noOfLayers, 0, -1):
        if i == noOfLayers:
            dA = 1/m * (values['A'+str(i)] - y)
            dZ = dA
        else:
            dA = dZ@params['theta'+str(i+1)]
            dZ = np.multiply(dA, np.multiply(dictOutput['A'+str(i)], (1-dictOutput['A'+str(i)])))
        
        if i == 1:
            grads['theta' + str(i)] = 1/m * dZ.T@X
            grads['bias' + str(i)] = np.sum(dZ, axis=0)
        else:
            grads['theta' + str(i)] = 1/m * dZ.T@values['A'+str(i-1)]
            grads['bias' + str(i)] = np.sum(dZ, axis=0)
    return grads

In [349]:
deltas = backwardPropagation(parameter, dictOutput, XTrainEnc, yTrainEnc)

In [350]:
parameter['theta1'].shape, deltas['theta1'].shape

((200, 85), (200, 85))

In [351]:
def update_params(params, grads, lr):
    layers = len(params)//2
    updatedParams = {}
    for i in range(1, layers+1):
        updatedParams['theta'+str(i)] = params['theta'+str(i)] - lr*grads['theta'+str(i)]
        updatedParams['bias'+str(i)] = params['bias'+str(i)] - lr*grads['bias'+str(i)]
    return updatedParams

In [352]:
newParameter = update_params(parameter, deltas, 100000)

In [353]:
dictOutput = forwardPropagation(XTrainEnc, newParameter, sigFunc)
computeCost(dictOutput, yTrainEnc)

  result = getattr(ufunc, method)(*inputs, **kwargs)


81880368.50056581

In [354]:
def model(X, y, layer_sizes, no_iters, lr, func):
    params = initialize_params(layer_sizes)
    for i in range(no_iters):
        vals = forwardPropagation(X, params, func)
        cost = computeCost(vals, y)
        grads = backwardPropagation(params, vals, X, y)
        params = update_params(params, grads, lr)
        print('Cost at iter:' + str(i+1) + ' = ' + str(cost))
    return params


In [355]:
newParam = model(XTrainEnc, yTrainEnc, layers, 500, 0.1, sigFunc)

Cost at iter:1 = 3.2103208368434175
Cost at iter:2 = 2.653375023532553
Cost at iter:3 = 2.202423685052298
Cost at iter:4 = 1.8372946099846694
Cost at iter:5 = 1.5416546371465076
Cost at iter:6 = 1.3022790313460595
Cost at iter:7 = 1.1084599070166103
Cost at iter:8 = 0.9515272369971381
Cost at iter:9 = 0.8244610199526912
Cost at iter:10 = 0.7215772576872488
Cost at iter:11 = 0.6382736953078523
Cost at iter:12 = 0.5708239505446087
Cost at iter:13 = 0.516210823102373
Cost at iter:14 = 0.4719913275433472
Cost at iter:15 = 0.4361874122748362
Cost at iter:16 = 0.4071974762218513
Cost at iter:17 = 0.3837247250980964
Cost at iter:18 = 0.3647191624673621
Cost at iter:19 = 0.3493306207065311
Cost at iter:20 = 0.33687073082455704
Cost at iter:21 = 0.32678212994975525
Cost at iter:22 = 0.3186135290572379
Cost at iter:23 = 0.3119995256519158
Cost at iter:24 = 0.306644258376534
Cost at iter:25 = 0.3023081723733868
Cost at iter:26 = 0.29879730338035254
Cost at iter:27 = 0.2959546012113451
Cost at ite

In [356]:
newParam

{'theta1':            0         1         2         3         4         5         6   \
 0    0.001050  0.008927  0.007953  0.008470  0.007064  0.001176  0.006963   
 1    0.009464  0.008651  0.004396  0.000906  0.003833  0.008752  0.000930   
 2    0.003872  0.003529  0.008086  0.002151  0.008326  0.000730  0.007054   
 3    0.008651  0.009645  0.005459  0.003195  0.000047  0.000386  0.004926   
 4    0.009526  0.004149  0.008008  0.001117  0.001262  0.002135  0.009045   
 ..        ...       ...       ...       ...       ...       ...       ...   
 195  0.006157  0.000925  0.009827  0.000840  0.005926  0.008888  0.008772   
 196  0.000450  0.003293  0.003046  0.005206  0.006891  0.000168  0.000928   
 197  0.004226  0.000620  0.002429  0.008589  0.008386  0.008894  0.002275   
 198  0.007911  0.001391  0.003201  0.007151  0.001349  0.006614  0.001504   
 199  0.007031  0.000851  0.003569  0.002569  0.002631  0.005434  0.006485   
 
            7         8         9   ...        75   

In [357]:
def compute_accuracy(X, y, param, aFunc):
    noOfParam = len(newParam)//2
    aValues = forwardPropagation(X, param, aFunc)
    len_ = aValues['A'+str(noOfParam)].shape[0]
    correct_pred = 0
    for i in range(len_):
        predY = aValues['A'+str(noOfParam)].iloc[i][aValues['A'+str(noOfParam)].iloc[i] == np.max(aValues['A'+str(noOfParam)].iloc[i])].index[0]
        realY = y.iloc[i][y.iloc[i] == 1.0].index[0]
        if predY == realY:
            correct_pred += 1
    accuracy = correct_pred/len_
    return accuracy

In [358]:
compute_accuracy(XTrainEnc, yTrainEnc, newParam, sigFunc)

0.4992669598827136

In [359]:
compute_accuracy(XTestEnc, yTestEnc, newParam, sigFunc)

0.5017992802878849