# MNIST classification

This notebook explores MNIST classification performance using an object-oriented deep neural network that I implemented, with softmax regression for the output layer.
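For reference, softmax regression turns the final layer's scores into class probabilities and is usually trained with a cross-entropy cost. The block below is a minimal NumPy sketch of that idea, not the code from `neural_network.py` (which is not shown here). The function names `softmax` and `cross_entropy` and the toy inputs are illustrative assumptions; the shapes follow this notebook's convention of classes in rows and examples in columns.

```python
import numpy as np

def softmax(Z):
    """Column-wise softmax for scores Z of shape (classes, examples)."""
    # Subtracting each column's max keeps exp() from overflowing
    # and does not change the resulting probabilities.
    shifted = Z - np.max(Z, axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=0, keepdims=True)

def cross_entropy(A, Y):
    """Mean cross-entropy between probabilities A and one-hot labels Y."""
    m = Y.shape[1]
    # The small constant guards against log(0).
    return -np.sum(Y * np.log(A + 1e-12)) / m

# Toy example: 3 classes, 2 examples (hypothetical values).
Z = np.array([[2.0, 0.0],
              [1.0, 0.0],
              [0.0, 0.0]])
Y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

A = softmax(Z)             # each column sums to 1
cost = cross_entropy(A, Y)
print(A)
print(cost)
```

Each column of `A` is a probability distribution over the classes, and the cost only looks at the probability assigned to the true class of each example.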

In [1]:
from neural_network import NeuralNetwork
import matplotlib.pyplot as plt  # needed by show_numbers below
import numpy as np
import struct

# read in mnist dataset
def read_idx(filename):
    with open(filename, 'rb') as f:
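        # IDX header: two zero bytes, a data-type code, and the number of dimensions
        zero, data_type, dims = struct.unpack('>HBB', f.read(4))
        # each dimension size is a big-endian unsigned 32-bit integer
        shape = tuple(struct.unpack('>I', f.read(4))[0] for d in range(dims))
        # np.fromstring is deprecated for binary data; np.frombuffer reads the same bytes
        array = np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)
        # flatten each example and transpose so that examples are columns
        return array.reshape(array.shape[0],-1).T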
        
def convert_to_multiclass(Y):
    '''
    Convert Y from shape(1 x m) to shape (10 x m)
    '''
    Y_list = []
    for each in Y[0,:]:
        this_Y = np.zeros((10,))
        this_Y[each] = 1
        Y_list.append(this_Y)
    
    multiclass = np.array(Y_list).T
    return multiclass


def show_numbers(data, start_idx, end_idx):
    side = int(np.sqrt(data.shape[0]))  # images are square, e.g. 784 pixels -> 28 x 28
    for each in range(start_idx, end_idx):
        pixels = ((data[:,each].reshape(side,side) * 255).astype(np.uint8))
        plt.imshow(pixels, cmap=plt.cm.binary)
        plt.show()
        
def model(train_X, test_X, train_Y, test_Y, layer_dims, init_method = 'standard', learning_rate = 0.05, 
          batch_size = 64, num_epochs = 50, optimizer = 'gd', lambd = 0, keep_prob = 1, 
          beta1 = 0.9, beta2 = 0.999, epsilon = 10**-8, print_int = 1, print_costs = True):
    '''Convenience method used to perform modeling'''
    
    nn = NeuralNetwork(layer_dims,init_method=init_method)
    
    costs = nn.train(train_X, train_Y, learning_rate = learning_rate, batch_size = batch_size, 
                     num_epochs = num_epochs, optimizer = optimizer, lambd = lambd,
                     keep_prob = keep_prob, beta1 = beta1, beta2 = beta2, epsilon = epsilon,
                     print_int = print_int, print_costs = print_costs)
    
    
    train_predict = nn.predict(train_X)
    NeuralNetwork.print_accuracy(train_predict, train_Y, dataset_name="Train set")

    test_predict = nn.predict(test_X)
    NeuralNetwork.print_accuracy(test_predict, test_Y, dataset_name="Test set")
    
    return nn

In [2]:
# load
train_X = read_idx('dataset/train_images_ubyte')
train_Y = read_idx('dataset/train_labels_ubyte')
test_X = read_idx('dataset/test_images_ubyte')
test_Y = read_idx('dataset/test_labels_ubyte')

# normalize
train_X = train_X / np.max(train_X)
test_X = test_X / np.max(test_X)

print("Train shape:\nX: {}\nY: {}".format(train_X.shape,train_Y.shape))


Train shape:
X: (784, 60000)
Y: (1, 60000)


In [3]:
# one-hot encode the labels, from shape (1 x m) to (10 x m), for use with softmax

multiclass_train_Y = convert_to_multiclass(train_Y)
multiclass_test_Y = convert_to_multiclass(test_Y)

print(train_Y[:,:2])
print(multiclass_train_Y[:,:2])

[[5 0]]
[[0. 1.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [1. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
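The same one-hot encoding can be written without an explicit Python loop. The block below is a small sketch of an alternative, not a replacement for `convert_to_multiclass`. The helper name `one_hot` and its `num_classes` parameter are hypothetical, and it assumes labels arrive with shape (1 x m) as above.

```python
import numpy as np

def one_hot(Y, num_classes=10):
    """Convert labels of shape (1, m) to a one-hot matrix of shape (num_classes, m)."""
    labels = Y.reshape(-1).astype(int)
    # Row i of the identity matrix is the one-hot vector for class i.
    # Indexing picks one row per label; the transpose puts examples in columns.
    return np.eye(num_classes)[labels].T

# Same two labels as printed above.
Y = np.array([[5, 0]], dtype=np.uint8)
encoded = one_hot(Y)
print(encoded.shape)
print(encoded[:, 0])
```

Indexing the identity matrix avoids building a Python list of 60,000 small arrays, which matters more as the dataset grows.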


In [4]:
n_x = train_X.shape[0]
n_y = multiclass_train_Y.shape[0]

### Single layer NN (logistic regression)

In [5]:
layer_dims = [n_x,n_y]
nn = model(train_X, test_X, multiclass_train_Y, multiclass_test_Y, layer_dims)

Cost after epoch 0/50: 5.936088549185273
Cost after epoch 1/50: 9.097407274554952
Cost after epoch 2/50: 11.063471526602306
Cost after epoch 3/50: 14.338950909919895
Cost after epoch 4/50: 13.00460203378837
Cost after epoch 5/50: 11.211778738255283
Cost after epoch 6/50: 5.204176630521559
Cost after epoch 7/50: 4.739740649963984
Cost after epoch 8/50: 14.2458075121803
Cost after epoch 9/50: 10.316136264787444
Cost after epoch 10/50: 8.797722249407478
Cost after epoch 11/50: 12.773882787790978
Cost after epoch 12/50: 4.049367174258591
Cost after epoch 13/50: 2.8786409988723802
Cost after epoch 14/50: 10.131550730605678
Cost after epoch 15/50: 1.956743233520785
Cost after epoch 16/50: 12.926262204400317
Cost after epoch 17/50: 6.918485318397735
Cost after epoch 18/50: 9.353000633089614
Cost after epoch 19/50: 2.720676530688244
Cost after epoch 20/50: 8.073585055481978
Cost after epoch 21/50: 9.676306537325704
Cost after epoch 22/50: 7.254928848911033
Cost after epoch 23/50: 8.46622061504

#### Overall, not much overfitting here. The model just needs to perform better on the training data (high bias), so it needs a better architecture or longer training.
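The bias/variance reading above comes from comparing error on the train and test sets: a high train error points to bias, and a large gap between test and train error points to variance. The block below is a rough sketch of that comparison. The helper `error_rate` is hypothetical, and it assumes predictions and targets are both (classes x m) matrices; the actual return format of `NeuralNetwork.predict` is not shown here. The toy arrays are made up for illustration.

```python
import numpy as np

def error_rate(predictions, targets):
    """Fraction of examples whose predicted class differs from the true class."""
    # argmax over rows recovers the class index of each column (example).
    pred_labels = np.argmax(predictions, axis=0)
    true_labels = np.argmax(targets, axis=0)
    return float(np.mean(pred_labels != true_labels))

# Toy data: 3 classes, 4 examples (hypothetical values).
targets = np.eye(3)[[0, 1, 2, 1]].T
train_pred = np.eye(3)[[0, 1, 2, 1]].T   # all four correct
test_pred = np.eye(3)[[0, 1, 1, 1]].T    # third example is wrong

train_err = error_rate(train_pred, targets)
test_err = error_rate(test_pred, targets)
gap = test_err - train_err               # a large gap suggests variance

print(train_err, test_err, gap)
```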

### 2 layer NN with 100 HU

In [6]:
layer_dims = [n_x,100,n_y]
nn = model(train_X, test_X, multiclass_train_Y, multiclass_test_Y, layer_dims)

Cost after epoch 0/50: 9.613306056424634
Cost after epoch 1/50: 9.122927167419721
Cost after epoch 2/50: 4.202536282923826
Cost after epoch 3/50: 5.623462360216738
Cost after epoch 4/50: 9.467037999129062
Cost after epoch 5/50: 9.599000766280955
Cost after epoch 6/50: 8.834463333758139
Cost after epoch 7/50: 4.91717113681793
Cost after epoch 8/50: 1.5495119669034985
Cost after epoch 9/50: 1.1868075312243227
Cost after epoch 10/50: 2.2973349809640675
Cost after epoch 11/50: 5.91045718531282
Cost after epoch 12/50: 4.512645788659789
Cost after epoch 13/50: 0.48005315508658775
Cost after epoch 14/50: 2.64028532640309
Cost after epoch 15/50: 6.110973503386994
Cost after epoch 16/50: 3.57921696787658
Cost after epoch 17/50: 3.819206034867924
Cost after epoch 18/50: 0.7187427159359527
Cost after epoch 19/50: 1.0820061337734148
Cost after epoch 20/50: 0.7136652579810604
Cost after epoch 21/50: 5.217090761735295
Cost after epoch 22/50: 0.8545827937499285
Cost after epoch 23/50: 2.1070698245105

#### Wow, great performance on the train set (bias much improved, with only 0.27% error). Let's try some more architectures.

### 2 layer NN with 300 HU

In [7]:
layer_dims = [n_x,300,n_y]
nn = model(train_X, test_X, multiclass_train_Y, multiclass_test_Y, layer_dims)

Cost after epoch 0/50: 5.9681816138079675
Cost after epoch 1/50: 5.167716966832673
Cost after epoch 2/50: 4.910781525010577
Cost after epoch 3/50: 6.7333971309139145
Cost after epoch 4/50: 5.749005001595094
Cost after epoch 5/50: 2.2129575228460556
Cost after epoch 6/50: 10.706185930849177
Cost after epoch 7/50: 8.130948564518759
Cost after epoch 8/50: 4.099771939347459
Cost after epoch 9/50: 7.383392178413406
Cost after epoch 10/50: 3.1406043334121367
Cost after epoch 11/50: 8.737075335510475
Cost after epoch 12/50: 3.075514307247179
Cost after epoch 13/50: 3.0473505046829827
Cost after epoch 14/50: 1.8790954958433765
Cost after epoch 15/50: 5.055506990998065
Cost after epoch 16/50: 1.2284294284579094
Cost after epoch 17/50: 1.5942712468122633
Cost after epoch 18/50: 4.884040039531151
Cost after epoch 19/50: 0.8086234076014627
Cost after epoch 20/50: 1.054636923441886
Cost after epoch 21/50: 0.30792123527533855
Cost after epoch 22/50: 0.6441207033131798
Cost after epoch 23/50: 0.59611

### 2 layer NN with 800 HU

In [8]:
layer_dims = [n_x,800,n_y]
nn = model(train_X, test_X, multiclass_train_Y, multiclass_test_Y, layer_dims)

Cost after epoch 0/50: 8.355840472283738
Cost after epoch 1/50: 15.92123096217971
Cost after epoch 2/50: 3.2095645730185067
Cost after epoch 3/50: 9.606212319937553
Cost after epoch 4/50: 6.152624362678135
Cost after epoch 5/50: 6.646742401001461
Cost after epoch 6/50: 0.9709668343366863
Cost after epoch 7/50: 2.502584622632905
Cost after epoch 8/50: 11.411141494152067
Cost after epoch 9/50: 0.649530298894871
Cost after epoch 10/50: 0.6172386289189543
Cost after epoch 11/50: 3.1409180978432234
Cost after epoch 12/50: 4.197036391659313
Cost after epoch 13/50: 2.210493375539399
Cost after epoch 14/50: 6.746504895788266
Cost after epoch 15/50: 0.7528228353754426
Cost after epoch 16/50: 2.994275552379687
Cost after epoch 17/50: 0.2813141425831936
Cost after epoch 18/50: 2.3465754379492334
Cost after epoch 19/50: 0.33658043677449967
Cost after epoch 20/50: 0.6976421799186874
Cost after epoch 21/50: 0.8575875387408904
Cost after epoch 22/50: 0.9272635980002919
Cost after epoch 23/50: 0.36200

### 3 layer NN with 300 + 100 HU

In [10]:
layer_dims = [n_x,300,100,n_y]
nn = model(train_X, test_X, multiclass_train_Y, multiclass_test_Y, layer_dims)

Cost after epoch 0/50: 17.720276486414523
Cost after epoch 1/50: 14.573733221394559
Cost after epoch 2/50: 5.112767288863813
Cost after epoch 3/50: 1.8586629864132553
Cost after epoch 4/50: 1.9696872883423233
Cost after epoch 5/50: 6.126213833778041
Cost after epoch 6/50: 2.1240206099920096
Cost after epoch 7/50: 3.6507795609068827
Cost after epoch 8/50: 1.0813468429669422
Cost after epoch 9/50: 0.8124234402030417
Cost after epoch 10/50: 0.159510154099897
Cost after epoch 11/50: 0.38548194017194104
Cost after epoch 12/50: 0.42184306773344743
Cost after epoch 13/50: 1.2377491430995766
Cost after epoch 14/50: 0.8190509697253914
Cost after epoch 15/50: 3.9692684583066833
Cost after epoch 16/50: 0.47191151961582706
Cost after epoch 17/50: 0.031214338619493076
Cost after epoch 18/50: 0.5683466958490055
Cost after epoch 19/50: 0.19429074689752446
Cost after epoch 20/50: 0.35153023382126813
Cost after epoch 21/50: 0.5498093350646287
Cost after epoch 22/50: 0.27093805031621543
Cost after epoch

#### As expected, increasing the complexity of the model has only a marginal impact on results, since training set performance is already high (bias is already low). Regularization or a larger dataset may help decrease variance.
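One common form of regularization is an L2 penalty on the weights, added to the cost. The block below is a minimal sketch of that penalty term under the usual convention of scaling by `lambd / (2 * m)`. It is an assumption about how a parameter like `lambd` typically acts, not a description of how `NeuralNetwork.train` uses it. The helper `l2_penalty` and the toy weights are hypothetical.

```python
import numpy as np

def l2_penalty(weights, lambd, m):
    """L2 regularization term: (lambd / (2 * m)) * sum of squared weights."""
    squared_sum = sum(np.sum(np.square(W)) for W in weights)
    return (lambd / (2 * m)) * squared_sum

# Toy weights for a two-layer network (hypothetical values).
weights = [np.ones((2, 3)), 2 * np.ones((1, 2))]

# Squared sums are 6 and 8, so the penalty is (0.5 / 8) * 14.
penalty = l2_penalty(weights, lambd=0.5, m=4)
print(penalty)
```

The `model` helper defined at the top of this notebook already exposes `lambd` and `keep_prob`, so a natural next experiment would be to rerun one of the architectures above with a non-zero `lambd` or a `keep_prob` below 1 and compare train and test accuracy.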