In [2]:
import tensorflow as tf
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# **Neurons for Vision**
## **Designing the Neural Network**
First, we'll look at the design of the neural network in *Figure 2-5*.
![Extending our pattern for a more complex example](fig2.5.png)

```python
# Three-layer model: flatten the image, one hidden layer, one output layer.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
```

The first entry, `Flatten`, isn't a layer of neurons but an input layer specification. Our inputs are $28 \times 28$ images, but we want them to be treated as a series of numeric values, like the gray boxes at the top of *Figure 2-5*. `Flatten` takes that "square" value (a 2D array) and turns it into a line (a 1D array of $784$ values).
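To make the flattening step concrete, here is a minimal NumPy sketch of the reshaping that `Flatten` performs on a single image. The array `image` is a made-up stand-in for one input, not data from the dataset.

```python
import numpy as np

# A stand-in 28x28 "image" holding the values 0..783 (hypothetical data).
image = np.arange(28 * 28).reshape(28, 28)

# Flattening lays the rows end to end, giving a 1D array of 784 values.
flat = image.reshape(-1)

print(image.shape)  # (28, 28)
print(flat.shape)   # (784,)
```

The first value of the second row, `image[1, 0]`, ends up at position 28 of `flat`, directly after the 28 values of the first row.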

The next one, `Dense`, is a layer of neurons, and we're specifying that we want $128$ of them. This is the middle layer shown in *Figure 2-5*. You'll often hear such layers described as $\textbf{\textit{hidden layers}}$. Layers that are between the inputs and the outputs aren't seen by a caller, so the term "hidden" is used to describe them. We are asking for $128$ neurons to have their internal parameters randomly initialized. More neurons means the network will run more slowly, as it has to learn more parameters. More neurons could also lead to a network that is great at recognizing the training data but not so good at recognizing data that it hasn't previously seen (this is known as $\textit{overfitting}$). On the other hand, fewer neurons means that the model might not have sufficient parameters to learn.
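As a rough illustration of how the neuron count drives the number of parameters to learn, the sketch below counts weights and biases for the two `Dense` layers. It assumes each neuron has one weight per input plus one bias. The helper `dense_params` is a hypothetical name used only for this example.

```python
def dense_params(n_inputs, n_neurons):
    """Weights (one per input per neuron) plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

hidden = dense_params(28 * 28, 128)  # 784 inputs feeding 128 neurons
output = dense_params(128, 10)       # 128 hidden values feeding 10 neurons

print(hidden, output, hidden + output)  # 100480 1290 101770
```

Under the same assumption, doubling the hidden layer to $256$ neurons would roughly double the total, which is one reason larger layers train more slowly.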

It takes some experimentation over time to pick the right values. This process is typically called $\textbf{\textit{hyperparameter tuning}}$. In machine learning, a hyperparameter is a value that is used to control the training, as opposed to the internal values of the neurons that get trained/learned, which are referred to as parameters.

There's also an $\textit{activation function}$ specified in that layer. The activation function is code that will execute on each neuron in the layer. TensorFlow supports a number of them, but a very common one in middle layers is `relu`, which stands for $\textit{rectified linear unit}$. It's a simple function that returns its input if that input is greater than $0$, and returns $0$ otherwise. In this case, we don't want negative values being passed to the next layer to potentially impact the summing function, so instead of writing a lot of `if-then` code, we can simply activate the layer with `relu`.
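The behavior of `relu` can be sketched in a few lines of NumPy. This is an illustrative stand-in, not TensorFlow's own implementation, and the input values are invented for the example.

```python
import numpy as np

def relu(x):
    # Keep positive values as they are; replace everything else with 0.
    return np.maximum(x, 0)

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(values).tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```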

Finally, there's another `Dense` layer, which is the output layer. This has $10$ neurons, because we have $10$ classes. Each of these neurons will end up with a probability that the input pixels match that class, so our job is to determine which one has the highest value. The `softmax` activation function helps here: it scales the $10$ outputs into probabilities that sum to $1$, so the predicted class is simply the one with the largest output.
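A minimal NumPy sketch of that last step: softmax turns the raw outputs of the $10$ neurons into probabilities, and taking the index of the largest one gives the class. The `logits` values are invented for illustration, and the function is a stand-in rather than TensorFlow's implementation.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

# Hypothetical raw outputs of the 10 output neurons for one image.
logits = np.array([1.0, 0.5, 0.1, 2.0, 0.3, 0.2, 4.0, 0.0, 1.2, 0.7])
probs = softmax(logits)

print(round(float(probs.sum()), 6))  # 1.0
print(int(np.argmax(probs)))         # 6
```

Note that softmax preserves the ordering of its inputs, so the largest raw output always maps to the largest probability.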

Now when we train our neural network, the goal is that we can feed in a $28 \times 28$-pixel array and the neurons in the middle layer will have weights and biases ($m$ and $c$ values) that, when combined, will match those pixels to one of the $10$ output values.

In [4]:
import tensorflow as tf
data = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = data.load_data()

training_images = training_images / 255.0
test_images = test_images / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=5)

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 1ms/step - accuracy: 0.7806 - loss: 0.6321
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8606 - loss: 0.3865
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8741 - loss: 0.3462
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8860 - loss: 0.3136
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8899 - loss: 0.2991


<keras.src.callbacks.history.History at 0x203c2a16570>