# Part 7: 48 Crayons
Let's review where we've been so far:

* We got a basic neural network up and running and trained it to predict the name of a few colors.
* We added more colors until we ran into problems: training took too many repetitions (*epochs*) and stopped getting much better after a while.
* So we *perturbed* our data, wiggling the color values around a little bit and calling the new values the same color as the original. This gave the network lots more examples - which networks typically like a lot! - and helped us converge to a lower loss in only a few epochs, but at the cost of each epoch taking a lot of time.

We haven't yet discussed exactly what goes on inside the network. There's a lot of detail there, and we'll start getting into it soon.

In the meantime, as promised last time, it's time to try doubling our crayon colors from 24 to 48. Let's give it a go and see where things end up.

Hint: There are two changes in this code since last time. See if you can spot them; we'll use them below.

```python
from keras.layers import Activation, Dense, Dropout
from keras.models import Sequential
import keras.optimizers, keras.utils, numpy
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

def train(rgbValues, colorNames, epochs = 16, perceptronsPerColorName = 8, batchSize = 1):
    """
    Trains a neural network to map RGB triples to color names.
    The provided RGB triples must be floating point triples with each
    value in the range [0.0, 1.0], and colorNames must be the same length as rgbValues.
    Different names are allowed to map to the same RGB triple.
    Returns a (model, colorLabels) tuple: the trained model, plus the color name
    for each of the model's output positions.
    """

    # Convert the Python list of RGB triples into the numpy array needed for training.
    rgbNumpyArray = numpy.array(rgbValues, numpy.float64)
    
    # Convert the color labels into a one-hot feature array.
    # Text labels for each array position are in the classes_ list on the binarizer.
    labelBinarizer = LabelBinarizer()
    oneHotLabels = labelBinarizer.fit_transform(colorNames)
    numColors = len(labelBinarizer.classes_)
    colorLabels = labelBinarizer.classes_
    
    # Hyperparameters to define the network shape: the hidden layer gets
    # perceptronsPerColorName perceptrons for every color name.
    numFullyConnectedPerceptrons = numColors * perceptronsPerColorName
    
    model = Sequential([
        # Layer 1: Fully connected layer with ReLU activation.
        Dense(numFullyConnectedPerceptrons, activation='relu', kernel_initializer='TruncatedNormal', input_shape=(3,)),

        # Outputs: SoftMax activation to get probabilities by color.
        Dense(numColors, activation='softmax')
    ])

    print(model.summary())

    # Compile for categorization.
    # Note: newer Keras releases call lr "learning_rate" and no longer accept decay.
    model.compile(
        optimizer = keras.optimizers.SGD(lr = 0.01, momentum = 0.9, decay = 1e-6, nesterov = True),
        loss = 'categorical_crossentropy',
        metrics = [ 'accuracy' ])

    history = model.fit(rgbNumpyArray, oneHotLabels, epochs=epochs, batch_size=batchSize)

    return (model, colorLabels)

def createMoreTrainingData(colorNameToRGBMap):
    # The incoming color map is not typically going to be oversubscribed with e.g.
    # extra 'red' samples pointing to slightly different colors. We generate a
    # training dataset by perturbing each color by a small amount positive and
    # negative. We do this for each color individually, by pairs, and for all three
    # at once, for each positive and negative value, resulting in a dataset that is
    # many times as large.
    perturbValues = [ 0.0, 0.01, 0.02, 0.03 ] # TODO: Experiment with adding 0.04, 0.05
    rgbValues = []
    labels = []
    for colorName, rgb in colorNameToRGBMap.items():
        reds = []
        greens = []
        blues = []
        for perturb in perturbValues:
            if rgb[0] + perturb <= 1.0:
                reds.append(rgb[0] + perturb)
            if perturb != 0.0 and rgb[0] - perturb >= 0.0:
                reds.append(rgb[0] - perturb)
            if rgb[1] + perturb <= 1.0:
                greens.append(rgb[1] + perturb)
            if perturb != 0.0 and rgb[1] - perturb >= 0.0:
                greens.append(rgb[1] - perturb)
            if rgb[2] + perturb <= 1.0:
                blues.append(rgb[2] + perturb)
            if perturb != 0.0 and rgb[2] - perturb >= 0.0:
                blues.append(rgb[2] - perturb)
        # Every combination of the perturbed red, green, and blue values becomes a sample.
        for red in reds:
            for green in greens:
                for blue in blues:
                    rgbValues.append((red, green, blue))
                    labels.append(colorName)
    return (rgbValues, labels)
```
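To get a feel for how much data createMoreTrainingData produces, here is a small standalone sketch that repeats its per-channel bounds checks and counts the red × green × blue combinations for a single color. The helper names channelCandidates and samplesPerColor are hypothetical and not part of the notebook.

```python
# Perturbation amounts used by createMoreTrainingData.
PERTURB_VALUES = [0.0, 0.01, 0.02, 0.03]

def channelCandidates(value, perturbValues=PERTURB_VALUES):
    """Return the perturbed values for one channel that stay inside [0.0, 1.0]."""
    candidates = []
    for perturb in perturbValues:
        if value + perturb <= 1.0:
            candidates.append(value + perturb)
        if perturb != 0.0 and value - perturb >= 0.0:
            candidates.append(value - perturb)
    return candidates

def samplesPerColor(rgb):
    """Number of (red, green, blue) combinations generated for one color."""
    numReds, numGreens, numBlues = (len(channelCandidates(c)) for c in rgb)
    return numReds * numGreens * numBlues

print(samplesPerColor((0.5, 0.5, 0.5)))             # 343 = 7 * 7 * 7
print(samplesPerColor((1.0, 117 / 255, 56 / 255)))  # 'orange': 196 = 4 * 7 * 7
```

With four perturbation values, a channel in the middle of the range gets seven candidates, so one color can expand into as many as 343 samples. Channels within 0.03 of 0.0 or 1.0, like orange's red of 255, get fewer.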


Next comes our newly expanded color list, followed by training for a small number of epochs:

```python
def rgbToFloat(r, g, b):  # r, g, b in 0-255 range
    return (float(r) / 255.0, float(g) / 255.0, float(b) / 255.0)

# http://www.jennyscrayoncollection.com/2017/10/complete-list-of-current-crayola-crayon.html
colorMap = {
    # 8-crayon box colors
    'red': rgbToFloat(238, 32, 77),
    'yellow': rgbToFloat(252, 232, 131),
    'blue': rgbToFloat(31, 117, 254),
    'brown': rgbToFloat(180, 103, 77),
    'orange': rgbToFloat(255, 117, 56),
    'green': rgbToFloat(28, 172, 20),
    'violet': rgbToFloat(146, 110, 174),
    'black': rgbToFloat(35, 35, 35),

    # Additional for 16-count box
    'red-violet': rgbToFloat(192, 68, 143),
    # Note: same value as 'orange' above, so these two names get identical training inputs.
    'red-orange': rgbToFloat(255, 117, 56),
    'yellow-green': rgbToFloat(197, 227, 132),
    'blue-violet': rgbToFloat(115, 102, 189),
    'carnation-pink': rgbToFloat(255, 170, 204),
    'yellow-orange': rgbToFloat(255, 182, 83),
    'blue-green': rgbToFloat(25, 158, 189),
    'white': rgbToFloat(237, 237, 237),

    # Additional for 24-count box
    'violet-red': rgbToFloat(247, 83, 148),
    'apricot': rgbToFloat(253, 217, 181),
    'cerulean': rgbToFloat(29, 172, 214),
    'indigo': rgbToFloat(93, 118, 203),
    'scarlet': rgbToFloat(242, 40, 71),
    'green-yellow': rgbToFloat(240, 232, 145),
    'bluetiful': rgbToFloat(46, 80, 144),
    'gray': rgbToFloat(149, 145, 140),
    
    # Additional for 32-count box
    'chestnut': rgbToFloat(188, 93, 88),
    'peach': rgbToFloat(255, 207, 171),
    'sky-blue': rgbToFloat(128, 215, 235),
    'cadet-blue': rgbToFloat(176, 183, 198),
    'melon': rgbToFloat(253, 188, 180),
    'tan': rgbToFloat(250, 167, 108),
    'wisteria': rgbToFloat(205, 164, 222),
    'timberwolf': rgbToFloat(219, 215, 210),

    # Additional for 48-count box
    'lavender': rgbToFloat(252, 180, 213),
    'burnt-sienna': rgbToFloat(234, 126, 93),
    'olive-green': rgbToFloat(186, 184, 108),
    'purple-mountains-majesty': rgbToFloat(157, 129, 186),
    'salmon': rgbToFloat(255, 155, 170),
    'macaroni-and-cheese': rgbToFloat(255, 189, 136),
    'granny-smith-apple': rgbToFloat(168, 228, 160),
    'sepia': rgbToFloat(165, 105, 79),
    'mauvelous': rgbToFloat(239, 152, 170),
    'goldenrod': rgbToFloat(255, 217, 117),
    'sea-green': rgbToFloat(159, 226, 191),
    'raw-sienna': rgbToFloat(214, 138, 89),
    'mahogany': rgbToFloat(205, 74, 74),
    'spring-green': rgbToFloat(236, 234, 190),
    'cornflower': rgbToFloat(154, 206, 235),
    'tumbleweed': rgbToFloat(222, 170, 136),
}

(rgbValues, colorNames) = createMoreTrainingData(colorMap)
(colorModel, colorLabels) = train(rgbValues, colorNames, 5)

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 384)               1536      
_________________________________________________________________
dense_6 (Dense)              (None, 48)                18480     
Total params: 20,016
Trainable params: 20,016
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
```


On my machine, the loss went from 1.2 down to 0.43 within 3 epochs, and by 5 epochs it got down to 0.37. Note that training also takes longer than before, since we have more colors, more perturbed samples generated from them, and a bigger network to handle them.
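The parameter counts in the model summary can be reproduced by hand: a Dense layer has one weight per input per perceptron, plus one bias per perceptron. This sketch assumes the same two-layer shape that train() builds (the function names are hypothetical) and compares 48 colors against 24 colors with the same 8 perceptrons per name.

```python
def denseParams(inputs, units):
    """Weights plus one bias per unit for a fully connected (Dense) layer."""
    return inputs * units + units

def colorNetParams(numColors, perceptronsPerColorName=8, numInputs=3):
    """Total trainable parameters for the two-layer network shape used by train()."""
    hidden = numColors * perceptronsPerColorName
    return denseParams(numInputs, hidden) + denseParams(hidden, numColors)

print(colorNetParams(24))  # 5400
print(colorNetParams(48))  # 20016, matching "Total params" in the summary
```

Because the hidden layer grows with the number of colors and the output layer grows with hidden size times colors, doubling from 24 to 48 colors takes this shape of network from 5,400 to 20,016 parameters, about 3.7 times as many.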

Let's see how it performs by testing with the sliders:

```python
from ipywidgets import interact
from IPython.display import display, HTML

def displayColor(r, g, b):
    # Convert 0.0-1.0 floats to clamped 0-255 integers and show a color swatch.
    rInt = min(255, max(0, int(r * 255.0)))
    gInt = min(255, max(0, int(g * 255.0)))
    bInt = min(255, max(0, int(b * 255.0)))
    hexColor = "#%02X%02X%02X" % (rInt, gInt, bInt)
    display(HTML('<div style="width: 50%; height: 50px; background: ' + hexColor + ';"></div>'))

numPredictionsToShow = 5
@interact(r = (0.0, 1.0, 0.01), g = (0.0, 1.0, 0.01), b = (0.0, 1.0, 0.01))
def getTopPredictionsFromModel(r, g, b):
    testColor = numpy.array([ (r, g, b) ])
    predictions = colorModel.predict(testColor, verbose=0)  # Predictions shape (1, numColors)
    predictions *= 100.0
    predColorTuples = []
    for i in range(0, len(colorLabels)):
        predColorTuples.append((predictions[0][i], colorLabels[i]))
    predAndNames = numpy.array(predColorTuples, dtype=[('pred', float), ('colorName', 'U50')])
    ranked = numpy.sort(predAndNames, order=['pred', 'colorName'])
    ranked = ranked[::-1]  # reverse rows to get highest on top
    for i in range(0, numPredictionsToShow):
        print("%2.1f" % ranked[i][0] + "%", ranked[i][1])
    displayColor(r, g, b)
```


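The percentages printed by getTopPredictionsFromModel are the softmax output layer's probabilities multiplied by 100, which is why they add up to 100% across all 48 names (only the top few are shown). A minimal NumPy sketch of softmax, using made-up scores rather than real model output:

```python
import numpy

def softmax(logits):
    """Convert raw output scores into probabilities that sum to 1."""
    shifted = logits - numpy.max(logits)  # subtract the max for numerical stability
    exps = numpy.exp(shifted)
    return exps / numpy.sum(exps)

# Hypothetical raw scores for three color names; not real model output.
logits = numpy.array([2.0, 1.0, 0.1])
percentages = softmax(logits) * 100.0
print(numpy.round(percentages, 1))
print(round(float(percentages.sum()), 6))  # sums to 100
```

The highest score always gets the largest share, but every name gets some probability, which is why in-between colors show several plausible names with split percentages.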

Rather good results! With so many more colors to choose from, the network does a good job of predicting the in-between colors.

And of course, next is the slider that lets you play around with the epochs. This time, though, there are also new sliders to play with:

* How many perceptrons are added for each color name. Perceptrons are the elements that make up our network, which we'll describe more later on. In train(), the hidden layer gets this many perceptrons for every color, so 8 per name gives 384 for our 48 colors. More perceptrons do not always mean more accuracy, and each perceptron we add increases the amount of math needed to train and use the network.
* The batch size. This is how many color examples are given to the network at a time before it is told to adjust itself to the new data (which relies on *backpropagation*, something we'll be learning more about later). We've been running with a batch size of 1 so far, which is slow but tends to give the best accuracy. Bigger numbers train faster, but the network adjusts itself fewer times per epoch, so they tend to be less accurate after the same number of epochs, meaning you might have to increase the epochs as well. Try running with batch sizes 3 and 5 and see whether the predicted colors are very different.
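Keras makes one weight update per batch, so the batch size directly controls how many adjustments happen in each epoch. A tiny sketch, assuming a hypothetical training set of 10,000 samples (the real count depends on how many perturbed samples createMoreTrainingData generates):

```python
import math

def updatesPerEpoch(numSamples, batchSize):
    """Weight updates in one epoch: one per batch, counting a final partial batch."""
    return math.ceil(numSamples / batchSize)

numSamples = 10000  # hypothetical training-set size, for illustration only
for batchSize in (1, 2, 3, 5, 10):
    print(batchSize, updatesPerEpoch(numSamples, batchSize))
```

Fewer updates per epoch is part of why bigger batches finish an epoch faster, and also why they can need more epochs to reach the same loss.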

Play with the numbers and see the results to get a feel for things.

```python
# Retrain with the slider values, replacing the globals that getTopPredictionsFromModel uses.
@interact(epochs = (1, 10), perceptronsPerColorName = (1, 32), batchSize = (1, 10))
def trainModel(epochs=3, perceptronsPerColorName=8, batchSize=1):
    global colorModel
    global colorLabels
    (colorModel, colorLabels) = train(rgbValues, colorNames, epochs=epochs, perceptronsPerColorName=perceptronsPerColorName, batchSize=batchSize)
```


```python
# Show the prediction sliders again, now using the retrained model.
interact(getTopPredictionsFromModel, r = (0.0, 1.0, 0.01), g = (0.0, 1.0, 0.01), b = (0.0, 1.0, 0.01))
```


What I found was that 2 and 3 perceptrons per color name were too few: the predictions were pretty bad at 3 epochs. At 4, the results were about as good as at 8, and setting it to 28 did not do much better at all.

Similarly, batch size 2 seemed about as good as 1 but trained twice as fast, and 3 was a little less accurate but faster still. By 10 or so, training was a lot faster but relatively inaccurate: a loss of 0.86 instead of 0.4.

Numbers like epochs, perceptron counts, and batch sizes are called the neural network's *hyperparameters*. Adjusting them can make the network better or worse at its job. For a given problem and set of training data there is not always one exact right set of numbers; instead, you often have to run many experiments with different hyperparameters to find the best results.
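One simple, systematic way to run those experiments is a grid search: train once for every combination of candidate values and keep the best. Here is a minimal sketch; gridSearch and fakeScore are hypothetical helpers, and fakeScore is a made-up stand-in for actually training the network, used only so the example runs on its own.

```python
import itertools

def gridSearch(trainAndScore, grid):
    """
    Try every combination of the hyperparameter values in grid and return
    (bestLoss, bestParams). trainAndScore takes the hyperparameters as keyword
    arguments and returns a loss, where lower is better.
    """
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = trainAndScore(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best

# Stand-in scorer so the sketch runs without Keras. A real scorer would train a
# model with these settings and return its loss, ideally on held-out data.
def fakeScore(epochs, perceptronsPerColorName, batchSize):
    return batchSize / epochs + abs(perceptronsPerColorName - 8) / 8

grid = {
    'epochs': [3, 5],
    'perceptronsPerColorName': [4, 8, 16],
    'batchSize': [1, 2, 5],
}
print(gridSearch(fakeScore, grid))
```

Even this small grid means 2 × 3 × 3 = 18 full training runs, and every value you add to any list multiplies that count.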

And if you have a lot of experiments, a lot of data, and a lot of perceptrons, you get even more math...

### Coming up...
We'll start using the hyperparameters to save a bit of training time at the cost of some accuracy. And we'll learn why perceptrons get *triggered*...