The final layer of our Reuters network has 46 units. We might guess that intermediate layers with fewer than 46 units could cause an information bottleneck. Let's explore what happens with a very small intermediate representation: just 4 dimensions.
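The bottleneck intuition can be made concrete. If the last hidden layer has only 4 units, the 46 output logits (the values before the softmax) are an affine function of just 4 numbers, so across all inputs they are confined to an affine subspace of at most 4 dimensions inside the 46-dimensional output space. The minimal NumPy sketch below checks this by measuring the rank of the mean-centered logits. It uses random stand-ins for the hidden activations and output weights, not the trained network's, and `logit_subspace_dim` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def logit_subspace_dim(width, n_samples=1000, n_classes=46):
    """Dimension spanned by the output logits when the last hidden layer has `width` units."""
    # Random stand-in for post-ReLU activations of the last hidden layer (non-negative)
    h = np.maximum(rng.standard_normal((n_samples, width)), 0.0)
    # Random weights and bias for a Dense(n_classes) output layer, before the softmax
    w = rng.standard_normal((width, n_classes))
    b = rng.standard_normal(n_classes)
    logits = h @ w + b
    # Centering removes the bias offset; the rank of what remains is the subspace dimension
    centered = logits - logits.mean(axis=0)
    return int(np.linalg.matrix_rank(centered))

print(logit_subspace_dim(4))   # 4: the 46 logits move in only 4 independent directions
print(logit_subspace_dim(64))  # 46: a 64-unit layer can reach the full output space
```

The softmax that follows cannot add dimensions back, so in the 4-unit network every class-probability vector is determined by 4 numbers.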

In [1]:
import numpy as np
from tensorflow.keras.datasets import reuters
from tensorflow.keras.utils import to_categorical

# Keep only the 10,000 most frequent words
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)

def vectorize_sequences(sequences, dimension=10000):
    """Multi-hot encode lists of word indices into a (samples, dimension) array."""
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # 1 at every word index present in the sequence
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')

# One-hot encode the integer topic labels (46 classes)
one_hot_train_labels = to_categorical(train_labels)
one_hot_test_labels = to_categorical(test_labels)

# Hold out the first 1,000 training samples for validation
x_val = x_train[:1000]
partial_x_train = x_train[1000:]

y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]
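To see what the two encodings in the cell above produce, here is a toy sketch with the dimension shrunk to 5 so the arrays are readable. `multi_hot` mirrors `vectorize_sequences`; `one_hot` is a hypothetical NumPy stand-in that behaves like `to_categorical` when the number of classes is given explicitly.

```python
import numpy as np

def multi_hot(sequences, dimension):
    # Same idea as vectorize_sequences: a 1.0 at every word index present in a sequence
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0
    return results

def one_hot(labels, num_classes):
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype='float32')[labels]

toy_sequences = [[1, 3, 3], [0, 4]]  # word indices; index 3 appears twice
print(multi_hot(toy_sequences, dimension=5))
# [[0. 1. 0. 1. 0.]
#  [1. 0. 0. 0. 1.]]
print(one_hot(np.array([2, 0]), num_classes=3))
# [[0. 0. 1.]
#  [1. 0. 0.]]
```

Note that the repeated index 3 still yields a single 1.0: multi-hot encoding records which words occur, not how often.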

In [2]:
from tensorflow.keras import models, layers

def build(layer_1, layer_2, layer_3):
    """Return a compiled three-layer Dense network; each argument is a layer's number of units."""
    model = models.Sequential()
    model.add(layers.Dense(layer_1, activation='relu', input_shape=(10000,)))
    model.add(layers.Dense(layer_2, activation='relu'))
    # layer_3 must match the one-hot label width: 46 units, one per topic
    model.add(layers.Dense(layer_3, activation='softmax'))
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

A network-building function that declutters the notebook and makes experimentation easy. The size of each layer (its number of units) is passed as a function argument.
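Because `build` takes the layer widths as arguments, it is easy to reason about how those widths translate into weights. A minimal sketch, assuming standard Dense layers with a bias (each layer holds `inputs * units + units` parameters); `dense_param_counts` is a hypothetical helper, not part of Keras:

```python
def dense_param_counts(input_dim, *units):
    """Weights + biases for a stack of Dense layers: each has fan_in * n + n parameters."""
    counts = []
    fan_in = input_dim
    for n in units:
        counts.append(fan_in * n + n)
        fan_in = n  # the next layer's input is this layer's output
    return counts

original = dense_param_counts(10000, 64, 64, 46)
bottleneck = dense_param_counts(10000, 64, 4, 46)
print(original, sum(original))      # [640064, 4160, 2990] 647214
print(bottleneck, sum(bottleneck))  # [640064, 260, 230] 640554
```

Almost all of the weights sit in the first layer, so shrinking the middle layer to 4 units removes only about 1% of the parameters. This suggests that any accuracy drop comes from the narrow representation rather than from a smaller model overall.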

In [3]:
def train(model):
    # 20 epochs on the partial training set, validating on the 1,000 held-out samples
    return model.fit(partial_x_train,
                     partial_y_train,
                     epochs=20,
                     batch_size=512,
                     validation_data=(x_val, y_val),
                     verbose=0)

A training function. It could be extended to take training hyperparameters, such as the number of epochs or the batch size, as arguments.
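One way that extension might look is sketched below. `fit_model` is a hypothetical name chosen so it does not clash with `train`. The data are passed in explicitly rather than read from notebook globals, and the defaults reproduce the current settings (20 epochs, batch size 512, silent output). It only forwards arguments to the model's `fit` method.

```python
def fit_model(model, x, y, validation_data, epochs=20, batch_size=512, verbose=0):
    """Fit `model` and return whatever model.fit returns (a History object for Keras models)."""
    return model.fit(x, y,
                     epochs=epochs,
                     batch_size=batch_size,
                     validation_data=validation_data,
                     verbose=verbose)
```

In this notebook it could be called as `fit_model(model, partial_x_train, partial_y_train, (x_val, y_val), epochs=9)` to try a shorter run without editing the function body.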

In [4]:
model = build(layer_1=64, layer_2=64, layer_3=46)
history = train(model)
print(max(history.history['val_accuracy']))

0.8199999928474426


Our original Reuters network. A maximum validation accuracy of...
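`max()` reports the best validation accuracy but not when it occurred; if the model starts to overfit, the peak can come well before epoch 20. A small sketch that also returns the epoch, assuming the `history.history` dict layout used above (one list entry per epoch); `best_epoch` is a hypothetical helper:

```python
import numpy as np

def best_epoch(history_dict, metric='val_accuracy'):
    """Return (1-based epoch, value) of the highest `metric` in a History.history dict."""
    values = np.asarray(history_dict[metric])
    idx = int(np.argmax(values))
    return idx + 1, float(values[idx])

print(best_epoch({'val_accuracy': [0.70, 0.82, 0.80]}))  # (2, 0.82)
```

In the notebook, `best_epoch(history.history)` would replace the `print(max(...))` line.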

In [5]:
model = build(layer_1=64, layer_2=4, layer_3=46)
history = train(model)
print(max(history.history['val_accuracy']))

0.6349999904632568


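A bottlenecked network: just 4 units in the middle layer. A maximum validation accuracy of... The drop in validation accuracy (from 0.82 to roughly 0.635 in these runs) is largely due to squeezing the information needed to separate 46 topics into a crowded 4-dimensional intermediate representation.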
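A single run at one width cannot show where the bottleneck starts to bite. A sketch of a sweep over the middle-layer width follows. It assumes `build` and `train` behave like the functions defined above, and they are passed in as arguments so the helper stays self-contained. `sweep_bottleneck` and the default list of widths are illustrative choices, not part of the original notebook. Each width is trained once from a random initialization, so expect some run-to-run noise.

```python
def sweep_bottleneck(build, train, widths=(4, 8, 16, 32, 64)):
    """Return {width: best val_accuracy} for networks built as build(64, width, 46)."""
    results = {}
    for width in widths:
        model = build(layer_1=64, layer_2=width, layer_3=46)
        history = train(model)
        # Best validation accuracy over all epochs, as in the cells above
        results[width] = max(history.history['val_accuracy'])
    return results
```

In the notebook it would be called as `sweep_bottleneck(build, train)`; printing the returned dict gives one best validation accuracy per middle-layer width.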