# Part 9: Hither to Train, Thither to Test
OK, now we know a bit about perceptrons. We'll return to that subject again. But now let's do a couple of things with our 48 colors from lesson 7:

* We're going to wiggle some more - perturb the color data - in order to generate even more data.
* But now we're going to randomly split the data into two parts, 80% for training and 20% for testing.

Why split for training and testing?

## Repeating the Same Things Too Much Makes Jack a Dull Network
It's possible to *overtrain* your network: give it so much similar data and so many epoch repetitions that it learns only the data you gave it. Then when you give it something new, it can't deal with it very well and its guesses come out wrong. A network that can make good predictions on data it hasn't seen is a *generalized* network.

So if you have a lot of data - enough not to worry about *undertraining* by providing too few examples - you can keep some of it aside as a test for after all your epochs are done. The test shows whether the network, given data it was not trained on, still produces similar loss and accuracy.

This testing is called *scoring* the network against test data.

## But why weren't we splitting data before?
In the beginning we had no colors, then 3, then 11, then 24, then 48. Splitting with so little data does not do much good, as you'll keep important information about some colors out of training, and the network won't know what they are since it never saw them as an example. When we started perturbing - wiggling - the original colors to multiply how much data we have, splitting started to become possible.

What we're going to do below is increase the amount of data even more, by adding more wiggle points. And then we're going to keep 20% of the data aside for testing the network to see whether it's becoming too focused - not generalized enough - and can't figure out what to do with the test data.
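
To make the idea of "keeping 20% aside" concrete, here is a minimal sketch of a random split in plain Python. The training code in this lesson uses scikit-learn's `train_test_split` for this; the `splitTrainTest` helper and its `seed` parameter below are hypothetical stand-ins written only for illustration. The key point is that samples and labels are shuffled together so each sample keeps its label.

```python
import random

def splitTrainTest(samples, labels, testFraction=0.2, seed=None):
    """Hypothetical helper: shuffle samples and labels together, then cut off a test portion."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must be the same length")

    # Shuffle index positions rather than the lists themselves, so that
    # sample i and label i always travel together.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)

    numTest = int(round(len(indices) * testFraction))
    testIndices = indices[:numTest]
    trainIndices = indices[numTest:]

    trainSamples = [samples[i] for i in trainIndices]
    testSamples = [samples[i] for i in testIndices]
    trainLabels = [labels[i] for i in trainIndices]
    testLabels = [labels[i] for i in testIndices]
    return (trainSamples, testSamples, trainLabels, testLabels)

# Ten toy "colors", each with its own made-up label.
samples = [(i / 10.0, 0.0, 0.0) for i in range(10)]
labels = ["color-%d" % i for i in range(10)]

(trainSamples, testSamples, trainLabels, testLabels) = splitTrainTest(samples, labels, seed=42)
print(len(trainSamples), "training samples,", len(testSamples), "test samples")
```

With ten samples this leaves 8 for training and 2 for testing. Which samples land on which side depends on the shuffle.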

## Slightly Different Network
There are a bunch of differences in the network code below, based on what we learned in lesson 7:

* We use only 4 perceptrons per color; it used to be 8.
* We use a batch size of 8 to avoid waiting too long for training. As we increase the data we can usually increase the batch size without making things much worse.
* We split the data into training and test, then train on the training data.
* Then after training we score the network against the test data which the network hasn't seen yet.
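
To get a feel for how big the network is with 4 perceptrons per color, here is a small sketch that counts the weights and biases for our 48 colors. It assumes the usual fully connected layer arithmetic (one weight per input per perceptron, plus one bias per perceptron); `denseParamCount` is a hypothetical helper, not a Keras function.

```python
def denseParamCount(numInputs, numUnits):
    # Each perceptron has one weight per input, plus one bias.
    return numInputs * numUnits + numUnits

numColors = 48
perceptronsPerColorName = 4
hiddenUnits = numColors * perceptronsPerColorName

# Layer 1 takes the 3 RGB values as input; the output layer takes
# the hidden layer's outputs and produces one value per color name.
hiddenParams = denseParamCount(3, hiddenUnits)
outputParams = denseParamCount(hiddenUnits, numColors)

print("hidden perceptrons:", hiddenUnits)
print("layer 1 params:", hiddenParams)
print("output layer params:", outputParams)
print("total params:", hiddenParams + outputParams)
```

The totals (768, 9264, and 10,032) line up with the model summary printed during training further down.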

In [1]:
from keras.layers import Activation, Dense, Dropout
from keras.models import Sequential
import keras.optimizers, keras.utils, numpy
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

def train(rgbValues, colorNames, epochs = 3, perceptronsPerColorName = 4, batchSize = 8):
    """
    Trains a neural network to map RGB triples to color names.
    The provided list of RGB triples must be floating point triples with each
    value in the range [0.0, 1.0], and the list of color names must be the same length.
    Different names are allowed to map to the same RGB triple.
    Returns a trained model that can be used for recognize().
    """

    # Convert the Python list of RGB values into a numpy array needed for training.
    rgbNumpyArray = numpy.array(rgbValues, numpy.float64)
    
    # Convert the color labels into a one-hot feature array.
    # Text labels for each array position are in the classes_ list on the binarizer.
    labelBinarizer = LabelBinarizer()
    oneHotLabels = labelBinarizer.fit_transform(colorNames)
    numColors = len(labelBinarizer.classes_)
    colorLabels = labelBinarizer.classes_
    
    # Partition the data into training and testing splits using 80% of
    # the data for training and the remaining 20% for testing.
    (trainingColors, testColors, trainingOneHotLabels, testOneHotLabels) = train_test_split(
        rgbNumpyArray, oneHotLabels, test_size=0.2)

    # Hyperparameters to define the network shape.
    numFullyConnectedPerceptrons = numColors * perceptronsPerColorName
    
    model = Sequential([
        # Layer 1: Fully connected layer with ReLU activation.
        Dense(numFullyConnectedPerceptrons, activation='relu', kernel_initializer='TruncatedNormal', input_shape=(3,)),

        # Outputs: SoftMax activation to get probabilities by color.
        Dense(numColors, activation='softmax')
    ])

    print(model.summary())

    # Compile for categorization.
    model.compile(
        optimizer = keras.optimizers.SGD(lr = 0.01, momentum = 0.9, decay = 1e-6, nesterov = True),
        loss = 'categorical_crossentropy',
        metrics = [ 'accuracy' ])

    history = model.fit(trainingColors, trainingOneHotLabels, epochs=epochs, batch_size=batchSize)

    print("")
    print("Scoring result against test data after training with training data:")
    score = model.evaluate(testColors, testOneHotLabels, batch_size=batchSize)
    
    print("")
    print("Score: loss=%1.4f, accuracy=%1.4f" % (score[0], score[1]))
    return (model, colorLabels)

Using TensorFlow backend.
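
The `train()` function above turns color names into a one-hot array using `LabelBinarizer`. If that step feels opaque, here is a plain-Python sketch of the idea. It is not scikit-learn's implementation; `oneHotEncode` is a hypothetical helper, and the sketch assumes more than two distinct names, as we have here.

```python
def oneHotEncode(names):
    # The distinct names, sorted, define the column order
    # (similar in spirit to the binarizer's classes_ list).
    classes = sorted(set(names))
    indexOf = {name: i for i, name in enumerate(classes)}

    rows = []
    for name in names:
        row = [0] * len(classes)
        row[indexOf[name]] = 1  # exactly one "hot" column per sample
        rows.append(row)
    return (rows, classes)

(rows, classes) = oneHotEncode(['red', 'blue', 'green', 'red'])
print(classes)
for row in rows:
    print(row)
```

Each row has a single 1 in the column for its color name, which is the shape the softmax output layer is trained to match.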


Here's our createMoreTrainingData() function, mostly the same but we've doubled the number of perturbValues by adding points in between the previous ones.

In [2]:
def createMoreTrainingData(colorNameToRGBMap):
    # The incoming color map is not typically going to be oversubscribed with e.g.
    # extra 'red' samples pointing to slightly different colors. We generate a
    # training dataset by perturbing each color by a small amount positive and
    # negative. We do this for each color individually, by pairs, and for all three
    # at once, for each positive and negative value, resulting in a dataset that is
    # many times as large.
    perturbValues = [ 0.0, 0.005, 0.01, 0.015, 0.02, 0.025, 0.03 ]
    rgbValues = []
    labels = []
    for colorName, rgb in colorNameToRGBMap.items():
        reds = []
        greens = []
        blues = []
        for perturb in perturbValues:
            if rgb[0] + perturb <= 1.0:
                reds.append(rgb[0] + perturb)
            if perturb != 0.0 and rgb[0] - perturb >= 0.0:
                reds.append(rgb[0] - perturb)
            if rgb[1] + perturb <= 1.0:
                greens.append(rgb[1] + perturb)
            if perturb != 0.0 and rgb[1] - perturb >= 0.0:
                greens.append(rgb[1] - perturb)
            if rgb[2] + perturb <= 1.0:
                blues.append(rgb[2] + perturb)
            if perturb != 0.0 and rgb[2] - perturb >= 0.0:
                blues.append(rgb[2] - perturb)
        for red in reds:
            for green in greens:
                for blue in blues:
                    rgbValues.append((red, green, blue))
                    labels.append(colorName)
    return (rgbValues, labels)
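
How much data does this produce per color? Here is a small sketch that applies the same bounds checks as the function above to a single channel and counts the results. `perturbChannel` is a hypothetical helper that just isolates the inner logic for one channel value.

```python
def perturbChannel(value, perturbValues):
    # Same checks as createMoreTrainingData(): keep value + perturb if it
    # stays at or below 1.0, and value - perturb if it stays at or above 0.0.
    results = []
    for perturb in perturbValues:
        if value + perturb <= 1.0:
            results.append(value + perturb)
        if perturb != 0.0 and value - perturb >= 0.0:
            results.append(value - perturb)
    return results

perturbValues = [0.0, 0.005, 0.01, 0.015, 0.02, 0.025, 0.03]

midCount = len(perturbChannel(0.5, perturbValues))  # nothing gets clipped
topCount = len(perturbChannel(1.0, perturbValues))  # upward wiggles are clipped

print("values for a mid-range channel:", midCount)
print("values for a channel already at 1.0:", topCount)
print("samples for a mid-range color:", midCount ** 3)
print("samples for a color with one channel at 1.0:", topCount * midCount * midCount)
```

A mid-range channel gets 13 values, so a color with no clipping yields 13 x 13 x 13 = 2197 samples. A color with one channel at 1.0 (255) gets only 7 values for that channel and yields 1183 samples, which is why the total is somewhat less than 48 x 2197.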

Here are our previous 48 crayon colors. Let's try training:

In [3]:
def rgbToFloat(r, g, b):  # r, g, b in 0-255 range
    return (float(r) / 255.0, float(g) / 255.0, float(b) / 255.0)

# http://www.jennyscrayoncollection.com/2017/10/complete-list-of-current-crayola-crayon.html
colorMap = {
    # 8-crayon box colors
    'red': rgbToFloat(238, 32, 77),
    'yellow': rgbToFloat(252, 232, 131),
    'blue': rgbToFloat(31, 117, 254),
    'brown': rgbToFloat(180, 103, 77),
    'orange': rgbToFloat(255, 117, 56),
    'green': rgbToFloat(28, 172, 20),
    'violet': rgbToFloat(146, 110, 174),
    'black': rgbToFloat(35, 35, 35),

    # Additional for 16-count box
    'red-violet': rgbToFloat(192, 68, 143),
    'red-orange': rgbToFloat(255, 117, 56),
    'yellow-green': rgbToFloat(197, 227, 132),
    'blue-violet': rgbToFloat(115, 102, 189),
    'carnation-pink': rgbToFloat(255, 170, 204),
    'yellow-orange': rgbToFloat(255, 182, 83),
    'blue-green': rgbToFloat(25, 158, 189),
    'white': rgbToFloat(237, 237, 237),

    # Additional for 24-count box
    'violet-red': rgbToFloat(247, 83 ,148),
    'apricot': rgbToFloat(253, 217, 181),
    'cerulean': rgbToFloat(29, 172, 214),
    'indigo': rgbToFloat(93, 118, 203),
    'scarlet': rgbToFloat(242, 40, 71),
    'green-yellow': rgbToFloat(240, 232, 145),
    'bluetiful': rgbToFloat(46, 80, 144),
    'gray': rgbToFloat(149, 145, 140),
    
    # Additional for 32-count box
    'chestnut': rgbToFloat(188, 93, 88),
    'peach': rgbToFloat(255, 207, 171),
    'sky-blue': rgbToFloat(128, 215, 235),
    'cadet-blue': rgbToFloat(176, 183, 198),
    'melon': rgbToFloat(253, 188, 180),
    'tan': rgbToFloat(250, 167, 108),
    'wisteria': rgbToFloat(205, 164, 222),
    'timberwolf': rgbToFloat(219, 215, 210),

    # Additional for 48-count box
    'lavender': rgbToFloat(252, 180, 213),
    'burnt-sienna': rgbToFloat(234, 126, 93),
    'olive-green': rgbToFloat(186, 184, 108),
    'purple-mountains-majesty': rgbToFloat(157, 129, 186),
    'salmon': rgbToFloat(255, 155, 170),
    'macaroni-and-cheese': rgbToFloat(255, 189, 136),
    'granny-smith-apple': rgbToFloat(168, 228, 160),
    'sepia': rgbToFloat(165, 105, 79),
    'mauvelous': rgbToFloat(239, 152, 170),
    'goldenrod': rgbToFloat(255, 217, 117),
    'sea-green': rgbToFloat(159, 226, 191),
    'raw-sienna': rgbToFloat(214, 138, 89),
    'mahogany': rgbToFloat(205, 74, 74),
    'spring-green': rgbToFloat(236, 234, 190),
    'cornflower': rgbToFloat(154, 206, 235),
    'tumbleweed': rgbToFloat(222, 170, 136),
}

(rgbValues, colorNames) = createMoreTrainingData(colorMap)
(colorModel, colorLabels) = train(rgbValues, colorNames)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 192)               768       
_________________________________________________________________
dense_2 (Dense)              (None, 48)                9264      
=================================================================
Total params: 10,032
Trainable params: 10,032
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/3
Epoch 2/3
Epoch 3/3

Scoring result against test data after training with training data:

Score: loss=0.1417, accuracy=0.9565


Not bad: we quickly got our training loss down to about 0.17 in only 3 epochs, and the larger batch size kept training from taking a really long time.

But let's examine our new addition, the test data scoring result. From my machine:

 `Score: loss=0.1681, accuracy=0.9464`

Note that we trained with about 74,000 data points, and we kept aside about 18,000 more as test data the network was not allowed to train with. And when we ask the network to predict with the test data, the loss we get - 0.168 on my machine - is pretty close to the 0.172 loss I got on training.

This is good news! It means our network is well generalized: not overtrained, not too focused to deal with making predictions on new data.
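
One simple way to put a number on "pretty close" is the relative gap between test loss and training loss. This is only a sketch: `relativeLossGap` is a hypothetical helper, and the 10% tolerance is an arbitrary assumption for illustration, not a standard threshold.

```python
def relativeLossGap(trainingLoss, testLoss):
    # Positive means the test loss is worse (higher) than the training loss.
    return (testLoss - trainingLoss) / trainingLoss

# Losses quoted above from my machine.
trainingLoss = 0.172
testLoss = 0.168

gap = relativeLossGap(trainingLoss, testLoss)
tolerance = 0.10  # hypothetical threshold, chosen only for illustration
looksOvertrained = gap > tolerance

print("Relative gap: %+.1f%%" % (gap * 100.0))
print("Looks overtrained:", looksOvertrained)
```

Here the test loss is actually a little lower than the training loss, so the gap is slightly negative. A test loss much higher than the training loss would be the warning sign.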

Try it out to make sure it still seems like a good result:

In [4]:
from ipywidgets import interact
from IPython.display import display, HTML

def displayColor(r, g, b):
    rInt = min(255, max(0, int(r * 255.0)))
    gInt = min(255, max(0, int(g * 255.0)))
    bInt = min(255, max(0, int(b * 255.0)))
    hexColor = "#%02X%02X%02X" % (rInt, gInt, bInt)
    display(HTML('<div style="width: 50%; height: 50px; background: ' + hexColor + ';"></div>'))

numPredictionsToShow = 5
@interact(r = (0.0, 1.0, 0.01), g = (0.0, 1.0, 0.01), b = (0.0, 1.0, 0.01))
def getTopPredictionsFromModel(r, g, b):
    testColor = numpy.array([ (r, g, b) ])
    predictions = colorModel.predict(testColor, verbose=0)  # Predictions shape (1, numColors)
    predictions *= 100.0
    predColorTuples = []
    for i in range(0, len(colorLabels)):
        predColorTuples.append((predictions[0][i], colorLabels[i]))
    predAndNames = numpy.array(predColorTuples, dtype=[('pred', float), ('colorName', 'U50')])
    sortedPredictions = numpy.sort(predAndNames, order=['pred', 'colorName'])
    sortedPredictions = sortedPredictions[::-1]  # reverse rows to get highest on top
    for i in range(0, numPredictionsToShow):
        print("%2.1f" % sortedPredictions[i][0] + "%", sortedPredictions[i][1])
    displayColor(r, g, b)


In my opinion the extra perturbation data made quite a bit of difference. It guesses over 70% for gray at (0.5, 0.5, 0.5), better than before.
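
If you want the top guesses without the sliders, the ranking step can be sketched in plain Python. `topPredictions` is a hypothetical helper, and the probabilities below are made-up numbers for illustration, not output from the trained model.

```python
def topPredictions(probabilities, names, count=5):
    # Pair each probability with its color name, then sort highest first.
    # Ties are broken by name, also in descending order.
    pairs = sorted(zip(probabilities, names), reverse=True)
    return pairs[:count]

# Made-up softmax output for four color names.
names = ['black', 'gray', 'timberwolf', 'white']
probabilities = [0.03, 0.72, 0.20, 0.05]

best = topPredictions(probabilities, names, count=2)
for (probability, name) in best:
    print("%2.1f%%" % (probability * 100.0), name)
```

With a real model you would pass in one row of `colorModel.predict(...)` and the `colorLabels` list returned by `train()`.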

Here's the hyperparameter slider version so you can try out different epochs, batch sizes, and perceptrons:

In [5]:
@interact(epochs = (1, 10), perceptronsPerColorName = (1, 12), batchSize = (1, 50))
def trainModel(epochs=4, perceptronsPerColorName=3, batchSize=16):
    global colorModel
    global colorLabels
    (colorModel, colorLabels) = train(rgbValues, colorNames, epochs=epochs, perceptronsPerColorName=perceptronsPerColorName, batchSize=batchSize)


In [6]:
interact(getTopPredictionsFromModel, r = (0.0, 1.0, 0.01), g = (0.0, 1.0, 0.01), b = (0.0, 1.0, 0.01))


### Coming up...
We'll begin to understand why neurons and perceptrons by themselves are not enough: the connections matter too. And we'll begin to learn how training works to create weights and biases that let us ask the network for new predictions.