# Challenge

Now take your Keras skills and go build another neural network. Pick your data set, but it should be one of abstract types, possibly even nonnumeric, and use Keras to make five implementations of your network. Compare them both in computational complexity as well as in accuracy and given that tradeoff decide which one you like best.

Your dataset should be sufficiently large for a neural network to perform well (samples should really be in the thousands here) and try to pick something that takes advantage of neural networks’ ability to have both feature extraction and supervised capabilities, so don’t pick something with an easy to consume list of features already generated for you (though neural networks can still be useful in those contexts).

Note that if you want to use an unprocessed image dataset, scikit-image is a useful package for converting to importable numerics.

In [11]:
import os
import numpy as np
import skimage
from skimage import io, transform
from tqdm import tqdm
from sklearn.model_selection import train_test_split

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import LSTM, Input
from keras.models import Model
from keras.optimizers import Adam

img_size = 50
train_dir = './data/asl_train/'
test_dir =  './data/asl_test/'

def get_data(folder_path):
    imgs = []
    labels = []
    for folder_name in os.listdir(folder_path):
        if not folder_name.startswith('.'):
            if ord(folder_name[0]) >= 65 and ord(folder_name[0]) <= 90:
                label = ord(folder_name[0]) - 75
            elif folder_name == 'del':
                label = 26
            elif folder_name == 'nothing':
                label = 27
            elif folder_name == 'space':
                label = 28           
            else:
                label = 29
            for file_name in tqdm(os.listdir(folder_path + folder_name)):
                img_file = io.imread(folder_path + folder_name + '/' + file_name)
                if img_file is not None:
                    img_file = transform.resize(img_file, (img_size, img_size))
                    imgs.append(np.asarray(img_file))
                    labels.append(label)
    imgs = np.asarray(imgs)
    labels = np.asarray(labels)
    return imgs, labels

X_train, y_train = get_data(train_dir)
X_train, X_test, y_train, y_test = train_test_split(X_train, y_train, test_size=0.2) 

  warn("The default mode, 'constant', will be changed to 'reflect' in "
100%|██████████| 3000/3000 [00:11<00:00, 265.39it/s]
100%|██████████| 3000/3000 [00:11<00:00, 262.49it/s]
100%|██████████| 3000/3000 [00:11<00:00, 261.70it/s]
100%|██████████| 3000/3000 [00:11<00:00, 263.18it/s]
100%|██████████| 3000/3000 [00:11<00:00, 259.46it/s]
100%|██████████| 3000/3000 [00:11<00:00, 258.32it/s]
100%|██████████| 3000/3000 [00:11<00:00, 258.37it/s]
100%|██████████| 3000/3000 [00:11<00:00, 267.60it/s]
100%|██████████| 3000/3000 [00:11<00:00, 254.75it/s]
100%|██████████| 3000/3000 [00:11<00:00, 263.82it/s]
100%|██████████| 3000/3000 [00:11<00:00, 256.85it/s]
100%|██████████| 3000/3000 [00:11<00:00, 255.30it/s]
100%|██████████| 3000/3000 [00:11<00:00, 265.89it/s]
100%|██████████| 3000/3000 [00:10<00:00, 298.26it/s]
100%|██████████| 3000/3000 [00:10<00:00, 285.91it/s]
100%|██████████| 3000/3000 [00:10<00:00, 280.35it/s]
100%|██████████| 3000/3000 [00:11<00:00, 250.86it/s]
100%|██████████| 3000/3000 

In [12]:
X_train.shape

(69600, 50, 50, 3)

In [13]:
# 69600 images with 50*50 pixels (2500 pixels total)
# 17400 images with 50*50 pixels
new_X_train = X_train.reshape(X_train.shape[0], 2500 * 3).astype('float32')
new_X_test = X_test.reshape(X_test.shape[0], 2500 * 3).astype('float32')

# new_X_train /= 255
# new_X_test /= 255

print(new_X_train.shape[0], 'train samples')
print(new_X_test.shape[0], 'test samples')

new_y_train = keras.utils.to_categorical(y_train, 30)
new_y_test = keras.utils.to_categorical(y_test, 30)

69600 train samples
17400 test samples


In [14]:
model = Sequential()

model.add(Dense(100, activation='relu', input_shape=(2500 * 3,)))
model.add(Dropout(0.1))
model.add(Dense(100, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(30, activation='softmax'))

model.summary()
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_8 (Dense)              (None, 100)               750100    
_________________________________________________________________
dropout_5 (Dropout)          (None, 100)               0         
_________________________________________________________________
dense_9 (Dense)              (None, 100)               10100     
_________________________________________________________________
dropout_6 (Dropout)          (None, 100)               0         
_________________________________________________________________
dense_10 (Dense)             (None, 30)                3030      
Total params: 763,230
Trainable params: 763,230
Non-trainable params: 0
_________________________________________________________________


In [15]:
history = model.fit(new_X_train, new_y_train, batch_size=180, epochs=10, verbose=1, validation_data=(new_X_test, new_y_test))
score = model.evaluate(new_X_test, new_y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Train on 69600 samples, validate on 17400 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 1.872857763301367
Test accuracy: 0.3508045977011494
