In [1]:
import sys
!{sys.executable} -m pip install tensorflow

In [2]:
import tensorflow as tf
from tensorflow import keras
tf.__version__



'2.8.0'

In [3]:
keras.__version__

'2.8.0'

In [4]:
"""
Fashion MNIST: 70,000 grayscale images of 28 × 28 pixels each, in 10 classes.
The pixel intensities are represented as integers (from 0 to 255)
rather than floats (from 0.0 to 255.0).
"""
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

In [5]:
X_train_full.shape

(60000, 28, 28)

X_train_full.dtype

Note that the dataset is already split into a training set and a test set, but there is no validation set, so we’ll create one now. Additionally, since we are going to train the neural network using Gradient Descent, we must scale the input features. For simplicity, we’ll scale the pixel intensities down to the 0–1 range by dividing them by 255.0 (this also converts them to floats):

In [6]:
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.0

Unlike MNIST, where a label such as 5 simply means the digit 5, Fashion MNIST labels are class indices, so we need the list of class names to know what we are dealing with:

In [7]:
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

The first image in the training set represents a coat:

In [8]:
class_names[y_train[0]]

'Coat'

## Creating the model using the Sequential API

Now let's build a classification MLP with two hidden layers:

In [9]:
"""
The first line creates a Sequential model. 
This is the simplest kind of Keras model for neural networks 
that are just composed of a single stack of layers connected sequentially. 
This is called the Sequential API.
"""
model = keras.models.Sequential()
"""
Next, we build the first layer and add it to the model. 
It is a Flatten layer whose role is to convert each input image 
into a 1D array: if it receives input data X, 
it computes X.reshape(-1, 28*28). 
This layer does not have any parameters; 
it is just there to do some simple preprocessing. 
Since it is the first layer in the model, 
you should specify the input_shape, which doesn’t 
include the batch size, only the shape of the instances. 
Alternatively, you could add a keras.layers.InputLayer as the 
first layer, setting input_shape=[28,28].
"""
model.add(keras.layers.Flatten(input_shape=[28, 28]))
"""
Next we add a Dense hidden layer with 300 neurons.
It will use the ReLU activation function.
Each Dense layer manages its own weight matrix,
containing all the connection weights between the neurons
and their inputs. It also manages a vector of bias terms
(one per neuron). When it receives some input data X,
it computes ReLU(XW + b), where W is the weight matrix
and b is the bias vector.
"""
model.add(keras.layers.Dense(300, activation="relu"))
"""
Then we add a second Dense hidden layer with 100 neurons, also using the ReLU activation function.
"""
model.add(keras.layers.Dense(100, activation="relu"))
"""
Finally, we add a Dense output layer with 10 neurons (one per class),
using the softmax activation function
(because the classes are exclusive).
"""
model.add(keras.layers.Dense(10, activation="softmax"))
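
To make the layer-by-layer description concrete, here is a minimal NumPy sketch of the computation this model performs on a batch: flatten, two ReLU Dense layers, then softmax. The weights are random placeholders (an assumption for illustration, not the Keras weights), and `relu`, `softmax`, and `dense` are hypothetical helpers, not Keras functions.

```python
import numpy as np

def relu(z):
    # ReLU: element-wise max(0, z)
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row-wise max for numerical stability, then normalize each row
    z = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def dense(X, W, b, activation):
    # A Dense layer computes activation(XW + b)
    return activation(X @ W + b)

rng = np.random.default_rng(42)
X_batch = rng.random((32, 28, 28))  # a fake batch of 32 "images"

# Placeholder parameters with the same shapes as the Keras model's layers
W1, b1 = rng.normal(scale=0.05, size=(784, 300)), np.zeros(300)
W2, b2 = rng.normal(scale=0.05, size=(300, 100)), np.zeros(100)
W3, b3 = rng.normal(scale=0.05, size=(100, 10)), np.zeros(10)

X_flat = X_batch.reshape(-1, 28 * 28)  # what the Flatten layer does
h1 = dense(X_flat, W1, b1, relu)
h2 = dense(h1, W2, b2, relu)
probas = dense(h2, W3, b3, softmax)    # one probability per class, per instance

print(X_flat.shape, h1.shape, h2.shape, probas.shape)
```

The leading dimension is the batch size, which is why Keras shows it as None in the model summary: the layers work for any number of instances.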

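The model’s summary() method displays all the model’s layers, including each layer’s name (generated automatically unless you set it when creating the layer), its output shape (None means the batch size can be anything), and its number of parameters. The summary ends with the total number of parameters, split into trainable and non-trainable parameters.

Note that Dense layers often have a lot of parameters. For example, the first hidden layer has 784 × 300 connection weights, plus 300 bias terms, which adds up to 235,500 parameters. This gives the model quite a lot of flexibility to fit the training data, but it also means that the model runs the risk of overfitting, especially when you do not have a lot of training data.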
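
The same arithmetic reproduces every entry in the Param # column of the summary. A short sketch; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_neurons):
    # One weight per input-to-neuron connection, plus one bias term per neuron
    return n_inputs * n_neurons + n_neurons

# Flatten output size, then the sizes of the three Dense layers
layer_sizes = [28 * 28, 300, 100, 10]
counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts)       # [235500, 30100, 1010]
print(sum(counts))  # 266610
```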

In [10]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 300)               235500    
                                                                 
 dense_1 (Dense)             (None, 100)               30100     
                                                                 
 dense_2 (Dense)             (None, 10)                1010      
                                                                 
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________


You can easily get a model’s list of layers and fetch a layer by its index, or you can fetch it by name with model.get_layer():

In [11]:
model.layers

[<keras.layers.core.flatten.Flatten at 0x18813cf4f40>,
 <keras.layers.core.dense.Dense at 0x18819974eb0>,
 <keras.layers.core.dense.Dense at 0x18819afb970>,
 <keras.layers.core.dense.Dense at 0x18819b50670>]

In [12]:
hidden1 = model.layers[1]
hidden1.name

'dense'

All the parameters of a layer can be accessed using its get_weights() and set_weights() methods. For a Dense layer, this includes both the connection weights and the bias terms:

In [16]:
weights, biases = hidden1.get_weights()
print(weights)
print(weights.shape)

[[-0.02298503 -0.00524017  0.04122557 ...  0.06297977  0.06131759
   0.06239198]
 [-0.03740768  0.06873949  0.06478673 ... -0.05190772 -0.03348781
  -0.05172724]
 [ 0.02867264  0.02156693  0.00892299 ...  0.01734243  0.01280714
  -0.01115044]
 ...
 [-0.02574976  0.02313401 -0.05127238 ... -0.05165261  0.06724736
  -0.04575659]
 [ 0.06140584  0.00193688  0.00528567 ...  0.04093091  0.04218265
  -0.06759141]
 [ 0.02791654 -0.02682829 -0.0583418  ... -0.0654764  -0.00690526
   0.02158599]]
(784, 300)


In [17]:
print(biases)
print(biases.shape)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
(300,)


Notice that the Dense layer initialized the connection weights randomly (which is needed to break symmetry) and the biases to zeros, which is fine. If you ever want to use a different initialization method, you can set kernel_initializer (kernel is another name for the matrix of connection weights) or bias_initializer when creating the layer.
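
By default, a Keras Dense layer uses kernel_initializer="glorot_uniform" and bias_initializer="zeros". The NumPy sketch below reimplements Glorot (Xavier) uniform initialization to show why, for a 784 → 300 layer, the weights fall within about ±0.0744, consistent with the sample printed above. It also shows why random weights are needed: with identical weights, every neuron in the layer computes the same output. `glorot_limit` and `glorot_uniform` are hypothetical helpers, not the Keras initializer itself.

```python
import numpy as np

def glorot_limit(fan_in, fan_out):
    # Glorot/Xavier uniform bound: sqrt(6 / (fan_in + fan_out))
    return np.sqrt(6.0 / (fan_in + fan_out))

def glorot_uniform(fan_in, fan_out, rng):
    # Sample every weight from U(-limit, limit)
    limit = glorot_limit(fan_in, fan_out)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
limit = glorot_limit(784, 300)
W = glorot_uniform(784, 300, rng)
b = np.zeros(300)                       # biases start at zero
print(W.shape, round(float(limit), 4))  # (784, 300) 0.0744

# Symmetry: identical weights make all 300 neurons produce identical outputs
X = rng.random((5, 784))
same_out = np.maximum(0.0, X @ np.full((784, 300), 0.01) + b)
rand_out = np.maximum(0.0, X @ W + b)
print(np.allclose(same_out, same_out[:, :1]))  # True: every column is the same
print(np.allclose(rand_out, rand_out[:, :1]))  # False: the neurons differ
```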

The shape of the weight matrix depends on the number of inputs. This is why it is recommended to specify the input_shape when creating the first layer in a Sequential model. However, if you do not specify the input shape, it’s OK: Keras will simply wait until it knows the input shape before it actually builds the model. This will happen either when you feed it actual data (e.g., during training), or when you call its build() method.

Until the model is really built, the layers will not have any weights, and you will not be able to do certain things (such as print the model summary or save the model). So, if you know the input shape when creating the model, it is best to specify it.
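
The idea behind deferred building can be mimicked with a toy class. The sketch below is a hypothetical `LazyDense`, not Keras code: it only creates its weight matrix when it first sees data, because that is the first moment the number of inputs is known.

```python
import numpy as np

class LazyDense:
    """Toy Dense layer that defers creating weights until the input size is known."""

    def __init__(self, units, seed=0):
        self.units = units
        self.rng = np.random.default_rng(seed)
        self.W = None  # no weights yet: the input dimension is unknown
        self.b = None

    @property
    def built(self):
        return self.W is not None

    def build(self, n_inputs):
        # The weight matrix shape depends on the number of inputs
        self.W = self.rng.normal(scale=0.05, size=(n_inputs, self.units))
        self.b = np.zeros(self.units)

    def __call__(self, X):
        if not self.built:
            self.build(X.shape[-1])  # build on the first call, when data arrives
        return np.maximum(0.0, X @ self.W + self.b)

layer = LazyDense(300)
print(layer.built)  # False: no weights exist yet
out = layer(np.ones((2, 784)))
print(layer.built, layer.W.shape, out.shape)  # True (784, 300) (2, 300)
```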

## Compiling the model

After a model is created, you must call its compile() method to specify the loss function and the optimizer to use. Optionally, you can specify a list of extra metrics to compute during training and evaluation:

In [18]:
"""
First, we use the "sparse_categorical_crossentropy" loss because 
we have sparse labels (i.e., for each instance, there is just a 
target class index, from 0 to 9 in this case), 
and the classes are exclusive. 
If instead we had one target probability per class for 
each instance (such as one-hot vectors, 
e.g. [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.] to represent class 3), 
then we would need to use the "categorical_crossentropy" loss instead. 
If we were doing binary classification or multilabel binary 
classification, then we would use the "sigmoid" (i.e., logistic) activation 
function in the output layer instead of the "softmax" activation 
function, and we would use the "binary_crossentropy" loss.
"""
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])
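
Sparse labels and one-hot labels encode the same targets, so the two crossentropy losses give the same value. A minimal NumPy sketch; the two functions are hypothetical stand-ins for the Keras losses (batch-averaged, ignoring Keras implementation details such as probability clipping).

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_proba):
    # y_true holds class indices: take the log of the probability of each true class
    return -np.mean(np.log(y_proba[np.arange(len(y_true)), y_true]))

def categorical_crossentropy(y_onehot, y_proba):
    # y_onehot holds one target probability per class
    return -np.mean(np.sum(y_onehot * np.log(y_proba), axis=1))

y_proba = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.1, 0.8]])  # softmax outputs: 2 instances, 3 classes
y_sparse = np.array([0, 2])           # class indices
y_onehot = np.eye(3)[y_sparse]        # the same targets as one-hot vectors

print(sparse_categorical_crossentropy(y_sparse, y_proba))
print(categorical_crossentropy(y_onehot, y_proba))
```

Both calls return the mean of -log(0.7) and -log(0.8); only the label format differs.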


If you want to convert sparse labels (i.e., class indices) to one-hot vector labels, use the keras.utils.to_categorical() function. To go the other way round, use the np.argmax() function with axis=1.
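
For integer class indices, indexing an identity matrix produces the same one-hot matrix as keras.utils.to_categorical(). A NumPy-only sketch of the round trip, using np.eye as a stand-in for to_categorical:

```python
import numpy as np

y_sparse = np.array([3, 0, 9, 3])     # class indices, like y_train
y_onehot = np.eye(10)[y_sparse]       # one row per label, 1. at the class index
print(y_onehot[0])                    # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

y_back = np.argmax(y_onehot, axis=1)  # back to class indices
print(y_back)                         # [3 0 9 3]
```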

Regarding the optimizer, "sgd" means that we will train the model using simple Stochastic Gradient Descent. In other words, Keras will perform the backpropagation algorithm described earlier (i.e., reverse-mode autodiff plus Gradient Descent). 

When using the SGD optimizer, it is important to tune the learning rate. So, you will generally want to use optimizer=keras.optimizers.SGD(learning_rate=...) to set the learning rate, rather than optimizer="sgd", which defaults to learning_rate=0.01. (The older lr argument name is deprecated in TensorFlow 2.)
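
Why the learning rate matters can be seen on a one-parameter toy problem. A minimal sketch, assuming the loss f(w) = (w - 3)² instead of the network's real loss: each Gradient Descent step moves w by -learning_rate × f′(w). `gradient_descent` is a hypothetical helper.

```python
def gradient_descent(learning_rate, n_steps=50, w=0.0):
    # Minimize f(w) = (w - 3)**2, whose gradient is f'(w) = 2 * (w - 3)
    for _ in range(n_steps):
        grad = 2.0 * (w - 3.0)
        w = w - learning_rate * grad
    return w

for lr in (0.01, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

With 0.01 the parameter is still far from the minimum at 3 after 50 steps, with 0.1 it converges, and with 1.1 each step overshoots further and the value diverges. The same trade-off applies when choosing learning_rate for a real network, just without a closed-form loss.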

In [19]:
history = model.fit(X_train, y_train, epochs=30,
                     validation_data=(X_valid, y_valid))

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


We pass it the input features (X_train) and the target classes (y_train), as well as the number of epochs to train (or else it would default to just 1, which would definitely not be enough to converge to a good solution). We also pass a validation set (this is optional). Keras will measure the loss and the extra metrics on this set at the end of each epoch, which is very useful to see how well the model really performs. If the performance on the training set is much better than on the validation set, your model is probably overfitting the training set.

And that’s it! The neural network is trained. At each epoch during training, Keras displays the number of mini-batches (steps) processed so far, along with a progress bar, the mean training time per step, and the loss and accuracy (or any other extra metrics you asked for) on both the training set and the validation set. You can see that the training loss went down, which is a good sign, and the validation accuracy reached 89.26% after 30 epochs. That’s not too far from the training accuracy, so there does not seem to be much overfitting going on.

Instead of passing a validation set using the validation_data argument, you could set validation_split to the ratio of the training set that you want Keras to use for validation. For example, validation_split=0.1 tells Keras to use the last 10% of the data (before shuffling) for validation.
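
A NumPy sketch of how validation_split=0.1 carves out its validation set: the last 10% of the samples, taken before any shuffling. The floor-based split point is an assumption about Keras's exact rounding, and `split_last` is a hypothetical helper.

```python
import math
import numpy as np

def split_last(X, y, validation_split):
    # Keep the first (1 - validation_split) fraction for training and the
    # remaining tail for validation; nothing is shuffled before the split
    split_at = int(math.floor(len(X) * (1.0 - validation_split)))
    return (X[:split_at], y[:split_at]), (X[split_at:], y[split_at:])

X = np.arange(100).reshape(100, 1)
y = np.arange(100) % 10
(X_tr, y_tr), (X_val, y_val) = split_last(X, y, validation_split=0.1)
print(len(X_tr), len(X_val))  # 90 10
print(X_val[:, 0])            # [90 91 92 93 94 95 96 97 98 99]
```

Because the tail is taken as-is, data sorted by class (or by time) would yield an unrepresentative validation set, so it is worth shuffling the data yourself before relying on validation_split.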